
Oh, it will certainly come up, but it's not going to break Reddit if you get an old version of a comment as long as it's eventually consistent. Nobody is going to die, and nobody is going to lose any money.

Data stores like Cassandra and MongoDB don't lose revisions. That's not the kind of consistency we're talking about. CAP consistency is just getting the most recent version. You won't lose data -- data loss is a bug, not expected behavior, just like any other data store -- you just won't always get the most recent version of it. And, keep in mind, when we talk about eventual consistency here we generally mean "consistent on all nodes within a few minutes, but we're not blocking reads to write this data." It's not going to take hours.
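To make the "old version, but never a lost version" point concrete, here's a toy model in Python. It is not how Cassandra or MongoDB actually replicate; `Replica`, `ToyCluster`, and `sync` are names made up for this sketch, and the only assumption is that writes land on one node right away and reach the others a bit later.

```python
class Replica:
    """One node holding key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        # Keep only the newest version; older updates never overwrite newer ones.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class ToyCluster:
    """Hypothetical cluster: writes hit replica 0 now, the rest on sync()."""

    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # queued (replica_index, key, version, value)
        self.version = 0

    def write(self, key, value):
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        # The write is acknowledged here; other replicas catch up later.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.version, value))

    def read(self, key, replica):
        return self.replicas[replica].read(key)

    def sync(self):
        # Stand-in for background replication finishing.
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()


cluster = ToyCluster()
cluster.write("comment:1", "first draft")
cluster.sync()
cluster.write("comment:1", "edited")
stale = cluster.read("comment:1", replica=2)  # replica 2 has not caught up yet
cluster.sync()
fresh = cluster.read("comment:1", replica=2)  # now it has
```

The read between the second write and the sync returns the previous version, not nothing and not garbage. Once replication catches up, every replica agrees.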

That said, if you find you get an old version of your own comment, I'd be more willing to believe that your request failed with a 503 error or otherwise timed out than that it was a data store problem. Next time it happens, wait 5 minutes and try again.

> is there really no other way to cluster their data except "one big table"? Maybe like shard subreddits to specific servers ala Hyperdex?

The whole point of MongoDB or Cassandra is that you can get shards without all the headache that RDBMSs usually put you through. You configure your sharding function and let the system do the rest. You don't have to connect to the right shard or anything of the sort, which some RDBMSs do (or did; it's been a while since I've looked) require with sharding.
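A rough sketch of what "configure your sharding function and let the system do the rest" means, again in Python. This is a plain hash-modulo router, not the real partitioner of either system; the shard names, `shard_for`, and `ShardedStore` are all hypothetical.

```python
import hashlib

# Hypothetical shard names; a real cluster would discover its own nodes.
SHARDS = ["shard-a", "shard-b", "shard-c"]


def shard_for(key, shards=SHARDS):
    """The sharding function: map a key to one shard, deterministically."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


class ShardedStore:
    """Callers use put/get; routing to the right shard happens inside."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.nodes = {name: {} for name in self.shards}

    def put(self, key, value):
        self.nodes[shard_for(key, self.shards)][key] = value

    def get(self, key):
        return self.nodes[shard_for(key, self.shards)].get(key)


store = ShardedStore()
store.put("r/programming:post:42", "some post")
value = store.get("r/programming:post:42")
```

The caller never names a shard. Swapping the key for, say, a subreddit name would give the "shard subreddits to specific servers" layout from the question, with the same client code.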

Reddit has their code and architecture posted. Though it's out of date now, it makes it clear that it's basically just two big tables:

https://github.com/reddit/reddit/wiki/Architecture-Overview

It's PostgreSQL, ThingDB, Cassandra, memcached, and RabbitMQ.


