Reddit: Lessons Learned From Scaling To 1 Billion Pageviews A Month

0xbadcafebee · on Aug 26, 2013

You notice how in these recaps, all you read about is "I learned that X does Y"? They don't seem to have much in the way of lessons to take heed of for all situations. It's more like, "If you use this specific key/value store, tweak the thingimabob to sassyfraz to make sure your dingo does wibblydong." So if my platform doesn't use that store, your lesson is pointless. If it's a problem with an application, it's great that you're pointing it out, but if it was just oversight by lazy engineers, leave it out.

Then there's the wise lessons on general topics, like the idea that you should "wait until your site grows so you can learn where your scaling problems are going to be". I'm pretty sure we know what your scaling problems are going to be. Every single resource in your platform and the way they are used will eventually pose a scaling problem. Wait until they become a problem, or plan for them to become a problem?

I'm not that crazy. It really doesn't take a lot of time to plan ahead. Just think about what you have, take an hour or two and come up with some potential problems. Then sort the problems based on most-imminent most-horrible factors and make a roadmap to fix them. I know nobody likes to take time to reflect before they start banging away, but consider architectural engineering. Without careful planning, the whole building may fall apart. (Granted, nobody's going to die when your site falls apart, but it's a good mindset to be in)

diego · on Aug 26, 2013

No, you have no idea what your scale problems are going to be (if you ever have them). That is because if you get lucky and your application scales, it (and the world) will change significantly from what it is today.

Let me tell you a story: in 1998 at Inktomi (look it up) we had a distributed search engine running on Sun hardware. We could not have anticipated that we'd need to migrate to Linux/PC because Sun's prices would make it impractical for us to continue scaling using their hardware. It took us three years to make the switch, and that's one of the reasons we lost to Google. Had we started two years later (when Gigabit ethernet became available for no-brand PC hardware), then we would have built the entire thing on Linux to begin with.

"It really doesn't take a lot of time to plan ahead."

Have you ever experienced the growth of a startup to see your infrastructure cost soar to five, six, seven figures per month? Two hours will get you as far as "one day we'll probably need to replace MySQL by something else." What you don't know is what that something else will be. Too many writes per second? Reads? Need for geographical distribution? A schema that changes all the time because you need features and fields you never imagined? Will you need to break up your store into a service-oriented architecture with different types of components? Will you run your own datacenter, or will you be on AWS? What will be the maturity level of different pieces of software in two years?

I hope you get the point.

0xbadcafebee · on Aug 27, 2013

Do you really expect me to buy the idea that your company failed because Gigabit was too expensive? Even if 100Mbit was far cheaper, there are plenty of workarounds to cheaply increase throughput.

I assume by no-brand you mean custom-built, and I also assume you mean the cheapest available, in which case even one gigabit interface may have been difficult, seeing as the capacity of a 32-bit 33 MHz PCI bus (32 bits x 33 MHz, roughly 1.06 Gbit/s) is barely above gigabit speed. In any case, the money you saved on Sun gear could have built you a sizeable PC cluster and even with several 100Mbit interfaces would have been more powerful and cheaper. Really I think it wasn't built on Linux because Sun was the more robust, stable platform. But I could be crazy.

While I'm being crazy, I should point out all the other things you mentioned can be planned for. Anybody who's even read about high-performance systems design should be able to account for too many reads/writes! Geographical distribution is simple math: at some point, there are too many upstream clients for one downstream server, capacity fills, latency goes through the roof. A DBA knows all about schema woes. I thought service-oriented architecture was basic CS stuff? (I don't know, I never went to school) AWS didn't exist at the time. And the maturity level of your software in two years will, obviously, be two years more mature.

All of these problems are what someone with no experience or training will run into. But there should be enough material out there now that anyone can read up enough to account for all these operational and design problems, and more. But if your argument is that start-up people shouldn't have to know it because hey, I haven't got time to figure out how to do it right because I have to ship right now, I don't buy that for a minute.

There's a paper that some guy wrote years ago that goes over in great detail every single operational scaling issue you can think of, and it's free. I don't remember where it is. But it should be required reading for anyone who ever works on a network of more than two servers.

As an aside: was it really cost that prohibited you from porting to Linux? This article[1] from 2000 states "adding Linux to the Traffic Server's already impressive phalanx of operating systems, including Solaris, Windows 2000, HP-UX, DEC, Irix and others, shows that Inktomi is dedicated to open-source standards, similar to the way IBM Corp. has readily embraced the technology for its eServers." And this HN thread[2] has a guy claiming that in 1996 "we used Intel because with Sun servers you paid an extreme markup for unnecessary reliability". However, it did take him 4 years to move to Linux. (?!) A lot of other interesting comments on that thread.

[1] http://www.internetnews.com/bus-news/article.php/526691/Inkt... [2] https://news.ycombinator.com/item?id=3924609

asdfjjjjjj · on Aug 27, 2013

Hi Peter, another ex-inktomi/ex-yahoo guy here. I worked on this infrastructure much later than Diego. Traffic Server is not a significant part of the Inktomi environment -- you are looking at the wrong thing. Diego is describing the search engine itself, which ran on Myrinet at that time. It did not run on 100baseT ethernet. Myrinet was costly and difficult to operate, but necessary as the clusters performed an immense amount of network i/o.

It is also extremely non-trivial to replace your entire network fabric alongside new serving hardware and a new OS platform. These are not independent web servers, these are clustered systems which all speak peer to peer during the process of serving a search result. This is very different from running a few thousand web servers.

Even once migrated to gigE and linux, I watched the network topology evolve several times as the serving footprint doubled and doubled.

I assure you, there is no single collection of "every single operational scaling issue you can think of," because some systems have very different architectures and scale demands -- often driven by costs unique to their situation.

0xbadcafebee · on Aug 27, 2013

What you're saying makes total sense in terms of complexity and time for turning out a whole new platform. But to my view it depends a lot on your application.

Was the app myrinet-specific? If so, I can understand increased difficulty in porting. But at the same time, in 1999 and 2000, people were already building real-time clusters on Linux Intel boxes with Myrinet. (I still don't know exactly what time his post was referencing) If Diego's point was that they didn't move to Linux because Gigabit wasn't cheap enough yet, why did they stick with the expensive Sun/Myrinet gear, when they could have used PC/Myrinet for cheaper? I must be missing something.

I can imagine your topology changing as you changed your algorithms or grew your infrastructure to work around the massive load. I think that's natural. My point was simply that making an attempt to understand your limitations and anticipate growth is completely within the realm of possibility. This doesn't sound unrealistic to me [based on working with HPC and web farms].

What I meant to say was "every single issue", as in, individual problems of scale, assumptions made about them, and how they affect your systems and end users. It's a broad paper that generically covers all the basic "pain points" of scaling both a network and the systems on it. You're going to have specific concerns not listed, but it points out all the categories you should look at. I believe it even went into datacenter design...

krenoten · on Aug 27, 2013

> Geographical distribution is simple math

HAHAHAHAHAHAHA.

> And the maturity level of your software in two years will, obviously, be two years more mature.

HAHAHAHAHAHAHA.

0xbadcafebee · on Aug 27, 2013

You don't get token karma on here for being an ass. This isn't reddit.

sbarre · on Aug 27, 2013

If this isn't Reddit, then why are you telling diego that you know more about his personal life experiences than he does?

That's a pretty Reddit thing to do..

gbog · on Aug 27, 2013

> Stay as schemaless as possible. It makes it easy to add features. All you need to do is add new properties without having to alter tables.

And at the same time they use and praise Postgres a lot, so it cannot be about NoSQL.

I am wondering what they mean exactly. My own inclination would be to use a few very big, narrow tables of the form "who - do - what - when - where", e.g. "userA - vote up - comment1 - timestamp - foosubreddit", and also "userB - posted - link1 - timestamp - barsubreddit".

Then in the same table you get more or less all events happening on the site, and you are somewhat schemaless, in the sense that adding new functionality does not require a schema change.
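A minimal sketch of that idea, assuming an in-memory SQLite table standing in for Postgres. The table name, column names, and helper functions are all made up for illustration; this is not reddit's actual schema.

```python
import sqlite3

# Hypothetical "who - did - what - when - where" event table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        actor  TEXT NOT NULL,     -- who
        verb   TEXT NOT NULL,     -- did
        target TEXT NOT NULL,     -- what
        ts     INTEGER NOT NULL,  -- when (Unix seconds)
        place  TEXT NOT NULL      -- where
    )
    """
)


def record(actor, verb, target, ts, place):
    """Append one event row; every feature writes to the same table."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (actor, verb, target, ts, place),
    )


def actions_by(actor):
    """Return (verb, target) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT verb, target FROM events WHERE actor = ? ORDER BY ts",
        (actor,),
    )
    return rows.fetchall()


record("userA", "vote up", "comment1", 1377561600, "foosubreddit")
record("userB", "posted", "link1", 1377561660, "barsubreddit")
# A brand-new feature ("saved") is just a new verb value; no ALTER TABLE needed.
record("userA", "saved", "link1", 1377561720, "barsubreddit")
```

The trade-off is that the database no longer knows which verbs or targets are valid, so that checking moves into application code.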

If someone with inside knowledge can confirm this is not too far from what the reddit team meant, I'd appreciate it.

ketralnis · on Aug 27, 2013

> And at the same time they use and praise Postgres a lot, so it cannot be about NoSQL.

We had a basic schema that basically made postgres into a K/V store. So we had both.

free652 · on Aug 27, 2013

>Postgres is a great database. It makes a wonderful, really fast key-value store

^^ They probably have a basic schema, the rest is in KV store (guess)

_ofdw · on Aug 27, 2013

Reddit is an interesting case; they seem to have almost unlimited amounts of user good will. Case in point: I get the "you broke reddit" pageload failure message an awful lot and I'm sure others do too. How many other sites have userbases that would tolerate such a high number of errors?

hboon · on Aug 27, 2013

Not many perhaps. But Twitter did too.

ape4 · on Aug 27, 2013

It's a discussion site. Not an eCommerce site.

continuations · on Aug 26, 2013

> For comments it’s very fast to tell which comments you didn’t vote on, so the negative answers come back quickly.

Can you get into more details about how this is used? If reddit needs to display a page that has 100 comments, do they query Cassandra on the voting status of the user on those 100 comments?

I thought Cassandra was pretty slow in reads (slower than postgres) so how does using Cassandra make it fast here?

extesy · on Aug 27, 2013

As far as I understand, a user has most likely voted on only a small subset of those 100 comments (say 3), and negative lookups are very fast because of bloom filters [1], so all the lookups combined are fast.

[1] https://en.wikipedia.org/wiki/Bloom_filter
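A rough sketch of why the negative lookups are cheap, assuming a toy in-memory bloom filter in front of a dict that stands in for the on-disk store. The class, the sizes, and the comment IDs are illustrative assumptions and have nothing to do with Cassandra's real implementation.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: can say "definitely absent" or "possibly present"."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Stand-in for the slow store: the user voted on 3 of the 100 comments.
voted = {"c3": "up", "c41": "down", "c77": "up"}
bloom = BloomFilter()
for comment_id in voted:
    bloom.add(comment_id)

stats = {"disk_reads": 0}


def vote_for(comment_id):
    if not bloom.might_contain(comment_id):
        return None  # definite miss, answered from memory
    stats["disk_reads"] += 1  # only "maybe" answers pay for a real lookup
    return voted.get(comment_id)


page = [f"c{i}" for i in range(100)]
results = {comment_id: vote_for(comment_id) for comment_id in page}
```

A bloom filter never gives a false negative, so the three real votes are always found; an occasional false positive only costs one wasted lookup.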

continuations · on Aug 27, 2013

That makes sense. Thanks.

ketralnis · on Aug 27, 2013

> Can you get into more details about how this is used? If reddit needs to display a page that has 100 comments, do they query Cassandra on the voting status of the user on those 100 comments?

Sort of, yeah. There are two versions of this, the old way and the new way.

Old way: keep a cache of the last time they voted. Remove any comments from those 100 that are younger than the last vote (since they can't possibly have voted on them). Then look up the keys containing the remaining ones. Most of them hit the bloom filter; those that pass the bloom filter actually get looked up. In the worst case this is all 100 comments, which can hit up to 1/RF of your Cassandra nodes (RF being the replication factor). The worst case doesn't happen often.

The new way is a little different, you have one Cassandra row (so only one machine) containing all of the votes for the user (perhaps further limited to a given link ID or date). You hit that one node for all 100 comments. If you have per-row bloom filters, see Old Way for the rest.
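The pruning step in the old way can be sketched roughly like this. The function name and the (comment_id, created_at) tuple shape are assumptions for illustration only, not reddit's code.

```python
def comments_to_look_up(comments, last_vote_time):
    """Keep only comments the user could possibly have voted on.

    comments: list of (comment_id, created_at) tuples.
    A comment created after the user's last vote cannot have a vote from
    them, so it is dropped before any datastore lookup happens.
    """
    return [
        comment_id
        for comment_id, created_at in comments
        if created_at <= last_vote_time
    ]
```

For a user who hasn't voted recently on an active thread, this can shrink the 100 keys to a handful before the bloom filter is even consulted.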

> I thought Cassandra was pretty slow in reads (slower than postgres) so how does using Cassandra make it fast here?

"Fast" and "slow" as used here is very naive, performance is usually more complicated than simple statements like this. I guess if you had 1 postgres node with 1 key and one Cassandra node with 1 key, and only 1 client, you could answer simple generalisations like this. But reddit has thousands of concurrent hits spread over hundreds of servers with varying amounts of RAM, I/O performance, network bottlenecks, and usage profiles.

The single biggest win is that you can easily horizontally scale Cassandra. Just add more nodes until it's "fast". But even that's a gross simplification.

For another example, if you scale Postgres by adding a bunch of replicas and choose a random one to read from for a given key, then they all have the same data in RAM, so your block cache miss rate is very high (that is, your effective block cache is the amount of RAM in one machine). Additionally, your write performance is capped to the write performance of your one master. Your replication throughput is capped to his outbound network bandwidth.

So you want a subset of the data on each of them, so that whichever machine you ask has a very high likelihood of having that data in block cache. So you shard it by some function. But then when you want to add more postgres machines, you have to migrate the data somehow, without shutting down the site. You've now more or less written Cassandra.
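To make the migration problem concrete, here is a hedged sketch comparing naive modulo sharding with a consistent-hash ring when a fifth node is added. The node names, key names, and the HashRing class are hypothetical, and Cassandra's real partitioner is considerably more involved.

```python
import bisect
import hashlib


def _hash(value):
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def modulo_shard(key, nodes):
    """Naive sharding: node index is hash(key) mod number of nodes."""
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(before, after, keys):
    """Share of keys whose owning node changes between two placements."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"thing_{i}" for i in range(10_000)]
old_nodes = ["pg1", "pg2", "pg3", "pg4"]
new_nodes = old_nodes + ["pg5"]

modulo_moved = moved_fraction(
    lambda k: modulo_shard(k, old_nodes),
    lambda k: modulo_shard(k, new_nodes),
    keys,
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(old_ring.node_for, new_ring.node_for, keys)
```

With modulo sharding most keys change owner when the node count changes, while on the ring only the keys taken over by the new node move. Doing that migration live is the part you end up building yourself.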

asdasf · on Aug 27, 2013

> You've now more or less written Cassandra.

More like you've now more or less written a tiny piece of postgres-xc: http://wiki.postgresql.org/wiki/Postgres-XC

jzelinskie · on Aug 26, 2013

This looks like a summary of the talk on InfoQ on the subject:

http://www.infoq.com/presentations/scaling-reddit

seiji · on Aug 26, 2013

highscalability is a strange reposty/blogspam aggregation thing that takes information from other places and just puts it up on their own site. I think they started having some original content, but it's still mostly second hand reports of source material found elsewhere.

(Think of it more as somebody's personal notes about how things work and not an exclusive source of breaking news or architecture revelations.)

chaz · on Aug 26, 2013

I find a lot of value in the summarization. Frankly, I'd rather read the notes on a 38 minute video and maybe watch the original source, rather than have to watch 38 minutes without knowing what the value will be.

Aggregation and filtering is value, too. Like HN, it's a channel with the expectation of a certain type of content. I can't possibly discover good tech talks (or any other content) entirely on my own.

nasalgoat · on Aug 26, 2013

The signal/noise ratio on highscalability is sufficiently good that it is well worth following - the quality and usefulness of the content is often much better than even HN, where my click-to-ignore ratio is about 1/100.

ketralnis · on Aug 27, 2013

Yes, nobody was interviewed or anything to put this together. They just cobbled together some (mostly very old!) articles

jedberg · on Aug 27, 2013

I wish they would put this disclaimer at the top of their articles. :(

billybob255 · on Aug 26, 2013

It is, they link to the presentation in the beginning of the article.

pella · on Aug 26, 2013

HN comments: https://news.ycombinator.com/item?id=6222726

727374 · on Aug 26, 2013

"Treat nonlogged in users as second class citizens. By always giving logged out always cached content Akamai bears the brunt for reddit’s traffic. Huge performance improvement. "

This is the lowest of low hanging fruit. Many people don't realize it but a ton of huge media sites use Akamai to offload most of their "read-only" traffic.

ketralnis · on Aug 27, 2013

Definitely true, and one of the earliest and longest-standing optimisations. Even pre-Akamai, we had simple caching, both whole-page and per-object/query.
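The whole-page variant can be sketched in a few lines, assuming a tiny in-process cache with a fixed TTL. PageCache, render_page, and handle_request are hypothetical names for illustration; a real setup would put a CDN or memcached in this role.

```python
import time


class PageCache:
    """Whole-page cache with a fixed time-to-live."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # path -> (expires_at, body)

    def get_or_render(self, path, render):
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no rendering work at all
        body = render(path)
        self._store[path] = (now + self.ttl, body)
        return body


stats = {"renders": 0}


def render_page(path, user=None):
    """Stand-in for the expensive part: queries, templating, and so on."""
    stats["renders"] += 1
    return f"<html>{path} for {user or 'anonymous'}</html>"


cache = PageCache()


def handle_request(path, user=None):
    if user is None:
        # Logged-out visitors all see the same page, so serve it from cache.
        return cache.get_or_render(path, render_page)
    # Logged-in users get a personalised page rendered on every request.
    return render_page(path, user)
```

Two anonymous requests for the same path render once; a logged-in request always renders.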

human_error · on Aug 26, 2013

> Used the Pylons (Django was too slow), a Python based framework, from the start

This isn't quite right. It was web.py at the beginning. They started using Pylons after the Conde Nast acquisition.

showerst · on Aug 26, 2013

Originally it was actually written in LISP. =)

http://blog.reddit.com/2005/12/on-lisp.html

jedberg · on Aug 26, 2013

Yes, I glossed over that part. We didn't use web.py very long.

dmead · on Aug 26, 2013

one time, rob malda commented back to me on something. if you could do that same, that'd be greeaaaat

jedberg · on Aug 26, 2013

I think this kind of thing doesn't really fly on HN, but sure. :)

ketralnis · on Aug 27, 2013

And even calling it Pylons is a bit of a stretch. It's an ancient version of Pylons, of which most of the innards have been replaced over time.

sologoub · on Aug 26, 2013

It's a very interesting assertion to make, but really ambiguous. Slow how? Is it slow to render the page, slow DB access, slow to build?

It would also be interesting to know the versions and any backstory. My guess is that none of this info exists because it, like most things done in a rush/under pressure, was probably attempted, didn't work right away and then tossed.

falcolas · on Aug 26, 2013

I can certainly appreciate what Reddit has accomplished, but the thought of losing the abilities of a full RDBMS for a key-value store makes my hair stand on end.

I've yet to find schema changes limiting in my ability to code against a DB (and I use MySQL, which is one of the most limiting in this regard). Plus, I appreciate the ability to offload things like data consistency and relationships to the database. I understand, however, where others might not feel the same way.

diego · on Aug 26, 2013

The long tail of startups will rarely need something other than a relational database because they won't get to a scale anywhere near Reddit's. It's not that others don't "feel" the same way; there's a reason all those technologies exist. If you want to know, go work for Google, Twitter, Facebook, LinkedIn, etc.

j_baker · on Aug 26, 2013

I would argue the exact opposite. The average startup is likely using MySQL as a glorified key-value store already anyway, and they're likely using it in lieu of a more appropriate datastore because people tell them they don't need a NoSQL database until they get to Google-size.

The lesson is: match your database to your use-case, not the other way around. Need advanced querying/reporting options? Get a warm, fuzzy feeling from a SQL prompt? Use MySQL. Want a plain jane key-value store? Use Voldemort/Kyoto Cabinet. Want flexible schemas? Use MongoDB. Want a Key-value store with secondary indexes and lots of scaling capabilities? Use Cassandra/HBase. Want a powerful datastore that's supported by a BigCo? Use DynamoDB or Cloud Datastore.

diego · on Aug 26, 2013

That's not the exact opposite. Because most people are familiar with SQL databases, that's usually what they use. That's the case with the grandparent post. "Use what you know, and works for your case" is better for a startup than learning a trendy technology because you believe it might be better for your use case.

Three years ago at IndexTank we were looking for a SimpleDB replacement because it just didn't work as advertised. We explored a bunch of options, and we paid a significant cost to find out that deploying Cassandra would not be worth it for us. If you have never used Cassandra and choose it because you "want a Key-value store with secondary indexes and lots of scaling capabilities" then you're in for a world of hurt.

j_baker · on Aug 26, 2013

For some reason, you seem to be reducing my argument to "Use the shiniest technology possible!". Please don't strawman me.

If MySQL works for your use-case, and it's the option you're the most familiar with, use it. You'd be doing yourself a disservice by not at least evaluating other options though.

And Cassandra is a key-value store with secondary indexes and lots of scaling capabilities (such as multi-datacenter deployments, multi-master replication deployment)[1], and some companies who aren't Google or Facebook do need these things. It sounds to me like IndexTank wasn't one of those companies.

I reiterate my point: choose what's best for your company, and don't settle at MySQL just because it's "good enough".

[1] I'd also add that it's particularly suited for large, insert-heavy datasets.

asdasf · on Aug 26, 2013

There are literally zero appropriate use cases for mysql. If you need a relational database, use one. If you need a network hash table, use one. Don't use mysql at all.

falcolas · on Aug 26, 2013

This is a very common (at least on HN), and very misdirected view of MySQL.

MySQL is a high performing, highly scalable, ACID compliant relational database, when configured correctly.

The "MySQL is not production ready" meme was perpetuated by some well meaning, if ill-informed, fans of other RDBMS platforms.

7mediaws · on Aug 26, 2013

I agree. There is no proof that MySQL cannot do the job when configured correctly. Furthermore, another issue is with coding. Sometimes unnecessary nested "if" statements can cause huge problems no matter what type of database you use.

asdasf · on Aug 26, 2013

>MySQL is a high performing, highly scalable, ACID compliant relational database, when configured correctly.

You forgot the "with tons of brokenness, misfeatures, mistakes, problems and otherwise NOTABUG bugs that will never be fixed and cause immense amounts of pain". Literally every other RDBMS is a better option, thus there is no reason to use mysql.

falcolas · on Aug 26, 2013

Simply repeating anti-MySQL rhetoric is not going to convince anybody that it has actual problems, just that you've had bad experiences in the past that have biased you strongly against it.

It's particularly not going to convince people when it's so widely used (from Wordpress installations to Facebook), and perhaps more importantly when it's offered as part of the two largest VPS providers.

On topic, I'd be happy to offer some advice on how to set up MySQL in a way that limits (or eliminates) the concerns proffered by most "MySQL is not Production Ready" comments... the two most oft cited problems being sorted by the following two my.cnf settings:

    sql-mode=TRADITIONAL
    default-storage-engine=InnoDB

jacques_chester · on Aug 27, 2013

The baseline is what counts.

You can avoid buffer overflows in C by using a library that's got safe strings.

Does that make C safe? Nope.

The Windows NT architecture has an enormously rich security mechanism that can allow arbitrarily granular security statements to be made about almost everything. But the default policy until Windows 7 was "pretend you're Windows 95".

Did that make Windows more secure than Unix? Nope.

The baseline is what counts.

falcolas · on Aug 27, 2013

The baseline (reading as default configuration) is the only thing that counts? Then Postgres is unusable for any reasonably sized dataset.

Of course, so is Oracle, SQL Server, and every other database known to man.

You have to tailor the configuration of any database server to meet your needs. MySQL is no different in this regard.

jacques_chester · on Aug 27, 2013

My need is for a database that doesn't silently corrupt my data.

MySQL is different in this regard.

sendob · on Aug 27, 2013

I don't like rhetoric either. I think it is good to consider the facts, and for me generally the situation/environment/problem as much as possible.

I like to consider the problem, before I recommend a solution generally ( I don't mean to accuse you of pushing a solution, I think you are offering help which is always appreciated ), but I think a lot of people are used to having had the choice already made ( and indeed, in some circumstances it is!).

One thing I always try to remember about mysql, because it is less than intuitive to me, is that the client is allowed to alter the sql-mode, and there is no way I am aware of to restrict this behavior directly. ( I do think I have used proxies to filter out this behavior as a sort of guardian, but that was not an ideal fix by any means. ) Generally, if you don't have control of your clients ( or hopefully some good layers in front ) in the RDBMS world you are already sunk, so this has been more of a guard against accidental breakage.

http://dev.mysql.com/doc/refman/5.5/en/faqs-sql-modes.html

This can make it unsuitable for certain situations ( where you may not have control of the client ).

One thing that I think is both a strength, and a weakness ( again depending upon the situation ) is that mysql is very flexible and can be deployed in so many different configurations.

Generally I think it is best for people to carefully consider their situation and needs ( and be prepared to change when the situation does!).

I really enjoy working with Postgresql as well, and have long respected the code produced by that project.

In summary, I'd say there are many great databases ( both relational and otherwise!) which can be a real asset to solving problems. The best thing I think is to learn directly and continually :)

asdasf · on Aug 27, 2013

>On topic, I'd be happy to offer some advice on how to set up MySQL in a way that limits (or eliminates) the concerns proffered by most "MySQL is not Production Ready" comments

There are no settings for "make triggers actually work", or "remove arbitrary limitations like being unable to update a table referenced in a subquery", or "make views with aggregates perform well enough to be used", or to add expression indexes or check constraints or window functions or let you set defaults to functions or to have a transactional DDL or to make rollbacks not corrupt the database or to allow prepare/execute in procedures or to allow recursion in procedures or to allow triggers to modify the table they were fired against. That's the point, mysql is full of crippling limitations. There are non-broken databases available that are superior in every single way. Thus there is no reason to use mysql. I know quite a lot more about mysql than you seem to think I do, and there is a reason that the only thing I do with mysql is conversions from mysql to an appropriate database that actually works.

bcoates · on Aug 26, 2013

Reason to use MySQL: It's easy to find people comfortable with it that won't accidentally shoot off a toe doing simple things.

The "Don't use MySQL" argument smells like the "don't use bcrypt" argument to me. You're letting the perfect be the enemy of the good: for 95% of the use cases where you're doing something dumb like using MongoDB or homebrewing something, MySQL is a better choice--even if it isn't often the best choice.

asdasf · on Aug 27, 2013

I've never seen a single mysql database where there was no toe shooting happening. People just don't seem to miss their toes. In 100% of the cases where you used mysql, postgresql was a better option in every way. It isn't letting perfect be the enemy of good, it is saying "don't use bad software when there's good software available".

hackula1 · on Aug 26, 2013

Are you implying that mysql is not a relational db?

falcolas · on Aug 26, 2013

As a differing example of scaling - Facebook works very well using MySQL at a very large scale; implying that you don't need to move to key/value stores to scale.

Both technologies have their place and reasons to exist, but it's not solely for the ability to scale.

mh- · on Aug 26, 2013

I don't necessarily disagree with your point, but that is a poor example.

Facebook uses MySQL, largely, as a key-value store.

falcolas · on Aug 26, 2013

Do you have a citation for this? The closest I could find is a mention in a gigaom article[1] that mentions that they have some data better suited for a document store tool, but saying "there likely are unstructured or semistructured data currently in MySQL that are better suited for HBase" doesn't imply that the majority of their data is key/value based.

If you'd like another example at slightly less than Facebook size - RightNow (recently acquired by Oracle). They manage customer service for many (1,000+) different clients at huge scale: more than 300 MySQL databases spread throughout the world. If a website has a knowledge base, there's a good chance it's being managed on the backend by RightNow.

[1] http://gigaom.com/2011/12/06/facebook-shares-some-secrets-on...

chrismealy · on Aug 26, 2013

Queues were a saviour. When passing work between components put it into a queue. You get a nice little buffer.

What does reddit use for queuing?

jeffasinger · on Aug 26, 2013

I believe they use RabbitMQ (AMQP).

ptolts · on Aug 26, 2013

Came here to find that out as well!

ketralnis · on Aug 27, 2013

rabbitmq
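The buffering idea itself can be sketched with Python's in-process queue.Queue standing in for the message broker. The message format and the worker are made up for illustration; a real deployment would publish to RabbitMQ and consume from separate processes.

```python
import queue
import threading

# Bounded queue: producers block instead of overwhelming a slow consumer.
work = queue.Queue(maxsize=1000)
processed = []


def worker():
    """Consume items until the None sentinel arrives."""
    while True:
        item = work.get()
        if item is None:
            work.task_done()
            break
        processed.append(item.upper())  # stand-in for the real processing
        work.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

# The web tier only enqueues and returns; it never waits for processing.
for message in ["vote:c1:up", "vote:c2:down", "vote:c3:up"]:
    work.put(message)

work.put(None)   # tell the worker to stop
work.join()      # wait until everything queued has been handled
consumer.join()
```

The producer and consumer can run at different speeds, and a burst of work just makes the queue longer for a while instead of taking the front end down.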

WestCoastJustin · on Aug 26, 2013

This appears to be a summary of an InfoQ presentation, which was discussed about two weeks ago @ https://news.ycombinator.com/item?id=6222726

jjwiseman · on Aug 26, 2013

"Do not keep secret keys on the instance." I'm curious how people deal with this--what approaches do you use?

jedberg · on Aug 26, 2013

Amazon now provides a service to give you on-instance keys: http://aws.amazon.com/iam/faqs/#What_is_IAM_roles_for_EC2_in...

Before that at Netflix we developed a service that would hand out temporary keys to the requestor when they presented a proper certificate.

At reddit we put the secret keys on the instance, which was bad. :)

arohner · on Aug 27, 2013

Very cool. Are there any publicly available options for non-AWS services?

misiti3780 · on Aug 26, 2013

Is it common for people to use Postgres as a key-value store in production (rather than redis)? This is the first time I have heard of it, and I am just starting to use Postgres now, so I was a bit surprised.

falcolas · on Aug 26, 2013

Redis doesn't have a great solution for the D(urability) in ACID yet. PostgreSQL can ensure that changes are durable while being performant; redis can't do that as well.

Redis does offer AOF (the append-only file) for durability, but it's not nearly as mature as PostgreSQL (and comes with all sorts of caveats).

mason55 · on Aug 26, 2013

Depends what other features you need. redis will be better if you need to do things like time-based collections, sorting by score with sorted sets (ZSETs), etc. Postgres gives you mature clustering (at least for master-slave), a mature tooling ecosystem (PgAdmin has been around for a while; I didn't say it was good), better access controls, etc.

Like anything else in engineering, it's all a series of trade-offs to figure out what fits your needs the best.

jeffasinger · on Aug 26, 2013

We're trying out Postgres as a partially schema-less store.

We store arbitrary JSON in some fields, and even build indexes on that data. It's been remarkably fast so far, and other than a few small gotchas relating to type conversions and indexing, is really easy to use.

pjscott · on Aug 26, 2013

For very good engineering reasons, Redis is limited to what it can keep in memory. Postgres, for equally good reasons, can use disk. That makes a difference if you have a lot of key-value pairs to store, and are willing to accept a few disk seeks.

rosser · on Aug 26, 2013

It's still relatively new functionality, so I wouldn't expect to see it in wide use, but we're currently trying it out in a limited, point-solution kind of role. (We're a Postgres-mostly shop already.)

So far, everything's working as well as I'd have expected from something released by the PostgreSQL community.

yummyfajitas · on Aug 27, 2013

The functionality was always there:

    CREATE TABLE kvstore (
        key VARCHAR(128) NOT NULL UNIQUE,
        value VARCHAR(128) NOT NULL
    );

Or are you referring to hstore?
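For anyone wondering what using that table as a key-value store looks like from application code, here is a minimal sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres (an assumption made only so the example runs anywhere), and the helper names `open_store`, `put` and `get` are made up.

```python
import sqlite3


def open_store():
    # In-memory SQLite stands in for Postgres here (assumption for the demo).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kvstore ("
        " key VARCHAR(128) NOT NULL UNIQUE,"
        " value VARCHAR(128) NOT NULL)"
    )
    return conn


def put(conn, key, value):
    # Update the row if the key exists, otherwise insert it.
    # The `with` block wraps both statements in one transaction.
    with conn:
        cur = conn.execute(
            "UPDATE kvstore SET value = ? WHERE key = ?", (value, key)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO kvstore (key, value) VALUES (?, ?)", (key, value)
            )


def get(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM kvstore WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

The update-then-insert in `put` is fine for a single connection, but it is not race-free with concurrent writers; a real deployment would need proper upsert handling.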

rosser · on Aug 27, 2013

I'm talking about the JSON-specific functionality, which tends to make doing JSON-based, document-oriented, key-value-ish things in PostgreSQL significantly better, easier, faster, whatever-er.

ketralnis · on Aug 27, 2013

And this is how reddit does it; reddit doesn't use hstore.

asdasf · on Aug 26, 2013

Yes, in general it is pretty common for web developers to misuse relational databases as key-value stores. There's a common misconception that "joins are slow", so people write the joins in their application code instead, thus making it several orders of magnitude slower.

Shorel · on Aug 26, 2013

In the case of Postgres, it has indexable json and hstore datatypes that enable the use of a table as a key-value store.

And, it has good performance when compared to MongoDB: http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2...

So in this case, it is using Postgres as intended by the datatype designers.

Maybe you are thinking of the Ruby on Rails ORMs.

asdasf · on Aug 27, 2013

I didn't mean people using hstore, I mean people having relational data, putting it in tables like you would with a normal relational model, but just not using foreign keys and joins because they mistakenly think "joins are slow".
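The difference between the two approaches can be sketched as follows. This is an illustration only: the `users` and `comments` tables are hypothetical, and in-memory `sqlite3` stands in for a real database server. It counts queries rather than measuring time.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            body TEXT NOT NULL
        );
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO comments VALUES
            (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
        """
    )
    return conn


def join_in_app(conn):
    """Join in application code: one query for comments, one more per comment."""
    queries = 1
    result = []
    comments = conn.execute(
        "SELECT user_id, body FROM comments ORDER BY id"
    ).fetchall()
    for user_id, body in comments:
        name = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()[0]
        queries += 1
        result.append((name, body))
    return result, queries


def join_in_db(conn):
    """Let the database do the join: a single query."""
    rows = conn.execute(
        "SELECT u.name, c.body FROM comments c"
        " JOIN users u ON u.id = c.user_id ORDER BY c.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows, but the application-side version issues one query per comment on top of the first. Against a database over the network, each of those extra queries is a round trip, so the gap grows with the number of rows.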

exhaze · on Aug 26, 2013

Jeremy also gave a great Airbnb tech talk on this topic:

http://nerds.airbnb.com/reddit-netflix-and-beyond-building-s...

callmeed · on Aug 27, 2013

Can someone elaborate/clarify this:

> Users connect to a web tier which talks to an application tier.

So, I'm assuming the web tier is nginx/haproxy and the application tier is Pylons.

Are the 240 servers mentioned all running both the web tier and the app tier?

computer · on Aug 27, 2013

Presumably the web tier does slightly more than just reverse proxying. For example, it could build (render) pages based on an internal RedditAPI it queries. This RedditAPI (application layer) would then basically be a distributed database front-end with some state, like user sessions.

Separating it at that point allows the web tier to offload much of the work (mostly rendering), while keeping it stateless, thus allowing effortless scaling of that tier.

ketralnis · on Aug 27, 2013

No. Some are postgres, some are cassandra, some are web servers (haproxy), some are app servers (the reddit app running inside of pylons), etc

chum · on Aug 26, 2013

> Recode Python functions in C

From a security standpoint, this sounds like a bad idea

jedberg · on Aug 26, 2013

By the time it hits the C code it should be sanitized, but yes, it does add some security overhead.

ketralnis · on Aug 27, 2013

That's a really general statement, people write C code all of the time. You just have to be more careful. That is, you have to actually be a C programmer instead of a Python programmer cobbling together some C.

That said, reddit uses a mix of straight-C and Cython-ised Python, which is a bit like the best of both worlds.

ivanbrussik · on Aug 26, 2013

Just out of curiosity, what does "stay as schemaless as possible" mean? That did not read right.

skeletonjelly · on Aug 27, 2013

jedberg - you speak of automation, did you use anything (or is there anything in use currently) that handles auto scaling for EC2? puppet/chef/ansible etc? Or was this all done by hand?

srj55 · on Aug 26, 2013

hmm...no love for django here.

falcolas · on Aug 26, 2013

I'm currently using Django, and I can understand where they are coming from. The sheer amount of code that a simple request has to go through to receive an answer is staggering sometimes, such as 18-line tracebacks just to identify an authentication problem...

My own projects are not yet large enough to have this cause an issue, but I can see where something the size of Reddit would indeed have issues that even the most aggressive caching can't resolve.

film42 · on Aug 26, 2013

It's not a matter of rating frameworks, it's a matter of purpose. Django and Rails are designed to go from nothing to product as quickly as possible. Plus, thanks to better hardware and scaling techniques, it's easy and affordable to stick with frameworks long-term (Instagram and GitHub, for example). Sure, they could rewrite their site in C++, Go or Erlang, but they would lose their ability to rapidly prototype new features.