

Steve Huffman on Lessons Learned at Reddit - natsel
http://thinkvitamin.com/code/steve-huffman-on-lessons-learned-at-reddit/

======
shazow
I'm always reluctant about going completely schemaless. Reminds me of the blog
post by FriendFeed about how they use MySQL, highly recommended:
<http://bret.appspot.com/entry/how-friendfeed-uses-mysql>

I feel like there needs to be a better middle ground: having some schema, but
being able to augment it easily with metadata that you're not querying against
(yet), and then later extracting that metadata into queryable columns.

I wrote a post outlining some ideas of how to do this:
<https://github.com/shazow/everything/blob/master/idea/arbitrarily-structured-data-in-rdbms.md>

I've only implemented bits and pieces of this in practice, but it's been a huge
convenience so far.
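
A minimal sketch of that middle ground, assuming SQLite through Python's standard `sqlite3` and `json` modules rather than whatever the linked post proposes. Known fields get real columns, everything else goes into a JSON `meta` column, and a hypothetical `promote_key` helper later moves one key into its own indexed column. The table, column and helper names are illustrative.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a table with a few real columns plus a JSON blob for everything else."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"             # a column we already query against
        " meta TEXT NOT NULL DEFAULT '{}'"  # unqueried metadata, stored as JSON
        ")"
    )
    return conn


def insert_item(conn: sqlite3.Connection, title: str, **meta) -> int:
    """Store known fields in columns and any extra keyword arguments as JSON."""
    cur = conn.execute(
        "INSERT INTO items (title, meta) VALUES (?, ?)", (title, json.dumps(meta))
    )
    conn.commit()
    return cur.lastrowid


def promote_key(conn: sqlite3.Connection, key: str) -> None:
    """Hypothetical helper: move one metadata key into its own indexed column.

    `key` is interpolated into SQL, so it must be a trusted identifier.
    """
    conn.execute(f"ALTER TABLE items ADD COLUMN {key} TEXT")
    for row_id, meta_json in conn.execute("SELECT id, meta FROM items").fetchall():
        meta = json.loads(meta_json)
        value = meta.pop(key, None)  # rows without the key get NULL
        conn.execute(
            f"UPDATE items SET {key} = ?, meta = ? WHERE id = ?",
            (value, json.dumps(meta), row_id),
        )
    conn.execute(f"CREATE INDEX idx_items_{key} ON items ({key})")
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    insert_item(conn, "first", color="red", size=3)
    insert_item(conn, "second", color="blue")
    promote_key(conn, "color")  # "color" is now a real, indexed column
    print(conn.execute("SELECT title FROM items WHERE color = ?", ("red",)).fetchall())
    # [('first',)]
```

Because `promote_key` builds SQL from the key name, it should only be called with validated identifiers, never with user input.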

~~~
joshu
I think this is exactly right. Have a document store and a (separate?) index
store.
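
A rough sketch of that split, loosely modeled on the approach in the FriendFeed post linked above: documents are stored as opaque JSON bodies, and each queryable attribute gets its own index table that the application writes alongside the document. SQLite stands in for MySQL here, and the names (`entities`, `index_user_id`, `put`, `find_by_user`) are assumptions for illustration.

```python
import json
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Document store: each entity is an opaque JSON body keyed by id.
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # Separate index store: one table per queried attribute, pointing back at ids.
    conn.execute(
        "CREATE TABLE index_user_id ("
        " user_id TEXT NOT NULL,"
        " entity_id TEXT NOT NULL,"
        " PRIMARY KEY (user_id, entity_id))"
    )
    return conn


def put(conn: sqlite3.Connection, doc: dict) -> str:
    """Write the document and its index entry in one transaction."""
    entity_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(doc)),
        )
        if "user_id" in doc:
            conn.execute(
                "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (doc["user_id"], entity_id),
            )
    return entity_id


def find_by_user(conn: sqlite3.Connection, user_id: str) -> list:
    """Look up ids in the index table, then load the matching documents."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id AS i"
        " JOIN entities AS e ON e.id = i.entity_id"
        " WHERE i.user_id = ?",
        (user_id,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

Adding another queryable attribute later means creating and backfilling one more index table, without altering the `entities` table.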

------
runningdogx
I don't understand the fetish for caching every page variation for every user.

Here's a thought: dynamic apps should render a page exactly one way in HTML,
and rely on JavaScript and cookies to post-process the page so it APPEARS to
be customized for that particular user. That includes admin widgets.

That applies particularly to displaying how many minutes ago something was
generated. Serve it in a date-time format, and use JavaScript to post-process
it into "x seconds ago" or "x minutes ago".
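
A sketch of that split, assuming the shared HTML carries the time in a `<time datetime="...">` element. The conversion rule is written in Python here to keep it testable; in the setup described above it would run in the browser as JavaScript. The function names and the second/minute/hour/day thresholds are assumptions.

```python
from datetime import datetime, timedelta, timezone


def render_timestamp(created: datetime) -> str:
    """Server side: the same absolute, machine-readable time for every visitor."""
    iso = created.astimezone(timezone.utc).isoformat()
    return f'<time datetime="{iso}">{iso}</time>'


def _ago(n: int, unit: str) -> str:
    # Pluralize the unit unless the count is exactly one.
    return f"{n} {unit}{'' if n == 1 else 's'} ago"


def time_ago(created: datetime, now: datetime) -> str:
    """Client-side rule: turn the absolute timestamp into a relative label."""
    seconds = max(0, int((now - created).total_seconds()))  # clamp clock skew
    if seconds < 60:
        return _ago(seconds, "second")
    minutes = seconds // 60
    if minutes < 60:
        return _ago(minutes, "minute")
    hours = minutes // 60
    if hours < 24:
        return _ago(hours, "hour")
    return _ago(hours // 24, "day")


if __name__ == "__main__":
    posted = datetime(2011, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(render_timestamp(posted))
    print(time_ago(posted, posted + timedelta(minutes=5)))  # 5 minutes ago
```

Since the server output never depends on the current time or the viewer, the rendered page can be cached once and stays correct for everyone.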

If someone isn't logged in, or isn't an admin, and they hack their JavaScript
to display user or admin stuff, who cares? The user/admin requests sent to the
web server won't succeed anyway, because they rely on having an admin session
cookie.
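
To make the "who cares?" argument concrete: the only check that matters is the one the server makes against its own session data. A minimal sketch, assuming an in-memory session table keyed by a random cookie token; `SESSIONS`, `login`, `handle_admin_action` and the `is_admin` flag are hypothetical.

```python
import secrets
from http import HTTPStatus

# Hypothetical server-side session store: cookie token -> session data.
# A real app would keep this in a shared store, not a module-level dict.
SESSIONS = {}


def login(username: str, is_admin: bool) -> str:
    """Create a session and return the random token the browser keeps as a cookie."""
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = {"user": username, "is_admin": is_admin}
    return token


def handle_admin_action(cookies: dict, action: str) -> tuple:
    """Authorize on the server; whatever the page's JavaScript displayed is irrelevant."""
    session = SESSIONS.get(cookies.get("session", ""))
    if session is None:
        return HTTPStatus.UNAUTHORIZED, "not logged in"
    if not session["is_admin"]:
        return HTTPStatus.FORBIDDEN, "admin only"
    return HTTPStatus.OK, f"{action} performed by {session['user']}"


if __name__ == "__main__":
    user = login("visitor", is_admin=False)
    admin = login("mod", is_admin=True)
    print(int(handle_admin_action({"session": user}, "remove_post")[0]))   # 403
    print(int(handle_admin_action({"session": admin}, "remove_post")[0]))  # 200
```

A visitor who unhides the admin widgets can still only send requests with their own cookie, and the server rejects them.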

If users or admins get to see meaningful content rather than just UI elements,
then you have to cache that separately. Even then, you can load it dynamically
with JS, so the publicly visible (cached) page can still be reused and you cut
down on how much your server has to generate. The server can also cache the
separate responses (XML, JSON, whatever) that serve the logged-in user's
content.

------
apu
Note: the talk is from May 2010.

I wonder: if Steve (or rather, jedberg or someone else at reddit) gave the talk
today, would 'memcache' and 'memcachedb' both be replaced by Redis?

~~~
jedberg
We replaced memcachedb with Cassandra a while ago, because memcachedb pretty
much hit a wall at some point.

As for replacing memcached, I'm certainly open to it, but from what I've read,
memcache performs better than Redis for what we use it for.

------
p90x
Another lesson could be: "Don't make a site that appeals to people who use
adblock. Revenue won't keep up with demand."

~~~
patrickod
Many users (myself included) add an exception to adblock for reddit. I don't
mind helping their ad revenue when the ads aren't that intrusive.

------
code_duck
That's a good basic overview of what it takes to keep a site the size of
Reddit (a year ago) afloat. I need to think more about caching, personally.

Reddit has done a great job of serving a massive amount of traffic, especially
given the size of their staff.

------
dwc
Watching the video made me feel a little uncomfortable. I come away with the
impression that they _almost_, but not quite, really understood the important
lessons.

Still, it was well worth watching and I'm glad Huffman decided to go there.

------
natsel
Why did they use Python in the first place? Reddit is still kind of unstable.

~~~
simonw
Are you trolling? Site stability issues very rarely have anything to do with
the underlying programming language, unless you're using some experimental
language that no one else is using for web development (and even then, Arc
seems to be working pretty well for HN).

