

Reddit releases fully functional VM with their source code - jedberg
http://blog.reddit.com/2010/05/admins-never-do-what-you-want-now-it-is.html

======
KirinDave
Am I the only person who thought that instead of “an image for VMware” they
meant “the source code to a functional virtual machine that Reddit uses”?

Don't get me wrong, this is cool. But the other way would have been way
cooler.

~~~
jedberg
We're not quite that talented. ;)

~~~
eru
We are. (I am working on XenServer.)

I did not get confused by the headline.

~~~
apgwoz
I read it as something like a new python vm, not a xen vm.

------
keysersosa
The choice to use VMware was nothing personal against VirtualBox; we just
happen to already be running VMware on our dev boxes. (I'm actually downloading
VirtualBox now to see if cross-compatibility is possible.)

~~~
jacquesm
I'm absolutely amazed at your dedication to this, and how confident you guys
are that you're not helping some competitor.

That goes to prove that closed source is really a dead end: once you have a
sufficient head start, you can even afford to give away a _turn-key_ copy of
your software and still sleep at night.

Impressive!

~~~
keysersosa
Thanks.

We really hope this will help us more than hurt us in the end. Part of the
problem with releasing something like reddit as open source is that it isn't
designed around installation. For the most part, the pieces have been built in
place organically as needed. This means that even though the source is out
there, it's been really hard to get developer contributions as many get stuck
before they get reddit up and running locally.

This should effectively lower the barrier to entry there and let devs actually
think about the code and adding features rather than about whether or not
RabbitMQ or Cassandra are properly configured.

~~~
jacquesm
I would have exactly that problem if I released code to any of my sites (not
that they're worth looking at :) ), and for much the same reason.

You build stuff because you have to when it hits, especially if you experience
'unexpected growth', and things like installation documentation and so on will
suffer, if they exist at all.

So releasing a working VM is a great way to do this; it's about as
user-friendly as you can get.

I think you'll be setting an example here that will be followed many times.

------
blasdel
Reddit itself is hosted on EC2, right?

I use scripts (<http://github.com/wr0ngway/rubber/>) to boot up, configure,
and deploy to my instances from a stock Ubuntu AMI -- but from the looks of
this maybe you set up and maintain AMIs for each role instead?

I much prefer a scripted installation to a frozen hand-rolled installation,
even if the latter is more reliable.

~~~
natrius
This is a little off topic, but can you elaborate on your reasons for
preferring scripted installation? It's what I use too, but hearing someone
else's justification would be helpful.

~~~
jedberg
I'll tell you why I'd like to get there. It makes updating the master image
easier. If you need a new package, you just update on the server side, and
then all the new images get that new package. No rebundling required.

It also helps when you want to upgrade your OS. As long as the packages are
mostly the same, you can just run your update script from the new OS and make
a few changes.

The way I have to do it now, I have to build a whole new master image from
scratch, because I can't upgrade Ubuntu in place on EC2 due to the way they
handle the kernel and kernel modules.
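
A minimal sketch of the scripted approach described above, not reddit's actual
tooling: the package list lives in one place, and every new instance derives
its install commands from it at boot, so adding a package needs no rebundling.
The `Role` class, the `provisioning_commands` helper, and the package names are
all hypothetical; the commands are only generated here, not executed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    """A hypothetical server role (e.g. "app" or "db") and its extra packages."""
    name: str
    packages: List[str] = field(default_factory=list)


def provisioning_commands(role: Role, base_packages: List[str]) -> List[str]:
    """Return the shell commands a fresh instance would run at first boot."""
    # Merge the shared packages with the role-specific ones, dropping duplicates.
    packages = sorted(set(base_packages) | set(role.packages))
    commands = ["apt-get update"]
    if packages:
        commands.append("apt-get install -y " + " ".join(packages))
    return commands


if __name__ == "__main__":
    # Adding a package to this list is the whole "update the master image" step.
    base = ["git", "memcached"]
    for command in provisioning_commands(Role("app", ["postgresql"]), base):
        print(command)
```

Because the commands are computed from the lists rather than baked into an
image, the same script can be pointed at a newer OS release with only the
package names adjusted.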

~~~
apu
Doesn't this increase the time it takes to boot up a new instance, though?

Boot, update and apply patches, copy over source & data, start running
services... _every time_ you start a new instance.

~~~
jedberg
Yes, it does. But you probably aren't starting instances _that_ often, and it
only adds a few minutes. If you really need a fast boot, then you can still
use an image.

However, keep in mind that, on EC2 at least, the images are actually downloaded
from S3 when you boot, so if you have a huge image, it will take almost as
much time.

------
jwegan
Another question: if I wanted to improve reddit's search, it would help to
know what solutions you've already investigated (and what you are currently
doing).

~~~
jedberg
Here is where we publish most of the info about search:

<http://www.reddit.com/help/search>

In short, it used to be a fulltext search through the database, and now it is
Solr built on Lucene.

However, I think PG's essay about why he doesn't have a better search on HN
applies equally to reddit -- because there are much better things to spend
time on.

~~~
soult
PG is wrong, search is a very useful feature. Even more so for a site like
reddit where submissions can be in different subreddits.

If search weren't useful, why is there searchyc? And every few weeks a topic
about why there is no search function?

~~~
samd
Is there some reason that using Google isn't good enough? Most of my trouble
searching reddit is because it's hard to translate "that picture of a guy with
a bacon helmet" into something Google understands.

------
emehrkay
I've been bullshitting with getting this idea out of my mind and into
production -- over thinking how I should build it, getting stuck in design
mode, wondering if it will scale -- a bunch of self-imposed barriers. Now
Reddit has provided a solid blueprint. I am amazed and very thankful. I hope
to help with some of their issues one day.

Thanks reddit

------
ilovecomputers
Ah, thanks for this.

I tried to run reddit on my machine. However, I ran into many dead ends, the
redditdev IRC was quiet, and because I had no idea what I was doing with
MacPorts or the Terminal (or my filesystem or the many different paradigms I
had to deal with) I lost all faith in my ability to understand how sysadmins
(or python developers) got things done.

Essentially, I came to feel contempt for being a noob of a CS student, but at least
I know how to run a virtual machine and prove by induction. That's a job skill
right?

~~~
jedberg
> That's a job skill right?

Depends on what you do with the VM.

------
jwegan
Out of curiosity, how much external contribution is there to reddit's source
code?

~~~
jedberg
Not a lot right now, but hopefully more with this release.

~~~
apgwoz
Any stats as to how many people have reddit clones based on the codebase?

~~~
jedberg
These are the only ones we know about (and I think 1/2 of those are dead):
<http://code.reddit.com/wiki/PoweredByReddit>

The goal with open sourcing was never to have clones -- it was to get people
to dev the features that they felt were important but we did not, or did not
have the time for. Being able to make an easy clone was just a side effect.

~~~
Nwallins
I really like <http://lesswrong.com> -- a spinoff of
<http://overcomingbias.com>

Their 'stacked' view of threaded comments makes a lot of sense to me. It might
be really nice for a code editor too -- the minimal indent width can be
smaller, and context is clearer.

~~~
jedberg
Lesswrong is awesome. It is my favorite reddit derivative.

