

MySQL and Sphinx at Craigslist (presentation by Jeremy Zawodny) - amix
http://www.percona.com/ppc2009/PPC2009_craigslist_search.pdf

======
thorax
I love sphinx (we use a custom version of it to power <http://bug.gd>).

Some choice quotes:

"25 MySQL Boxes to 10 Sphinx"

"50M queries per day w/steady growth"

"1,000+ qps during peak w/room to grow"

It's great to see it standing up well, though it looks like they made (or
sought) patches for issues they ran into.
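
As a rough sanity check on those figures, the quoted daily volume can be turned into an average rate and compared with the quoted peak. This is only back-of-the-envelope arithmetic on the numbers above. It assumes the "1,000+" peak is a lower bound, and the variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

queries_per_day = 50_000_000     # "50M queries per day" from the slides
peak_qps = 1_000                 # "1,000+ qps during peak", treated as a lower bound

# Average rate if the load were spread evenly over the day.
average_qps = queries_per_day / SECONDS_PER_DAY

# How much higher the quoted peak is than that even-load average.
peak_to_average = peak_qps / average_qps

print(f"average: {average_qps:.0f} qps")                        # average: 579 qps
print(f"peak is at least {peak_to_average:.1f}x the average")   # at least 1.7x
```

So the peak figure is a bit under twice the daily average, which is consistent with the two quotes describing the same workload.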

~~~
aaronblohowiak
How hard or easy was it to customize? Would you mind sharing the nature of
your customizations?

~~~
thorax
Well, the customizations are to the C++ code and it doesn't feel particularly
difficult to adjust, but then we have years of senior-level C++ coding behind
us.

One of the first changes we made to sphinx was just to increase the maximum
query length so that queries could be extremely long (as error messages
sometimes are). This meant changing a define and tweaking some other areas
internally.

We also made some changes recommended by others in the sphinx forums and have
been adjusting the weighting algorithms for our own needs.

~~~
leej
Did/Will you make your patches open source?

~~~
thorax
No plan, really. The changes would need to be #ifdef'd or otherwise
conditioned-out because almost no one would want them. We optimized it
specifically for our own error message searching, and that's not really
useful for general use.

For adjusting the maximum query length, you can find tips on that in the
sphinx forums.

------
amix
The most interesting thing about their usage is how they managed to create
almost realtime search on Craigslist by using Sphinx's delta indexes. And they
are doing this on lots of data, which is a good sign.

Sphinx 1.x+ should feature realtime index updates, which will make the Sphinx
deal a lot more impressive.

That said, support for realtime index updates in current search engines is a
joke, and one must resort to lots of hacks to support them properly with lots
of data and updates.
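
For readers who have not met the delta-index idea mentioned above, here is a minimal sketch of the general main+delta scheme. It is a toy in-memory illustration, not Sphinx's API and not Craigslist's actual setup: `ToyIndex`, `MainPlusDelta`, and the id-based checkpoint are all hypothetical names and assumptions made for the example.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny in-memory inverted index: term -> set of document ids."""

    def __init__(self, docs=None):
        self.postings = defaultdict(set)
        for doc_id, text in (docs or {}).items():
            self.add(doc_id, text)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))


class MainPlusDelta:
    """Large main index rebuilt rarely, small delta index rebuilt often."""

    def __init__(self, source):
        self.source = source        # stand-in for the database table: {id: text}
        self.main = ToyIndex()
        self.delta = ToyIndex()
        self.max_indexed_id = 0     # checkpoint recorded at the last full rebuild

    def rebuild_main(self):
        # Expensive: index every row, then remember how far we got.
        self.main = ToyIndex(self.source)
        self.max_indexed_id = max(self.source, default=0)
        self.delta = ToyIndex()

    def rebuild_delta(self):
        # Cheap: index only rows added since the checkpoint.
        new_rows = {i: t for i, t in self.source.items() if i > self.max_indexed_id}
        self.delta = ToyIndex(new_rows)

    def search(self, term):
        # Queries consult both indexes and merge the results.
        return sorted(self.main.search(term) | self.delta.search(term))


postings = {1: "red bicycle for sale", 2: "blue couch free"}
engine = MainPlusDelta(postings)
engine.rebuild_main()

postings[3] = "vintage bicycle parts"   # a new listing arrives
print(engine.search("bicycle"))         # [1]  (not indexed yet)

engine.rebuild_delta()                  # quick, frequent job
print(engine.search("bicycle"))         # [1, 3]
```

The point of the split is that the frequent job only touches the small set of new rows, so freshness is bounded by how often the delta is rebuilt rather than by the cost of a full reindex. The sketch only handles newly added rows. Edited or deleted rows need extra handling that is left out here.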

------
larryfreeman
At Hubpages.Com, we switched to sphinx in December. We found for PHP that the
migration was pretty painless. We had previously used the MySQL full text
search.

Check out the tutorial here for a nice overview of what's involved in setting
up sphinx: <http://www.ibm.com/developerworks/library/os-php-sphinxsearch/>

------
mbrubeck
Slideshare version (via <http://news.ycombinator.com/item?id=583304>):

<http://www.slideshare.net/jzawodn/mysql-and-search-at-craigslist?type=presentation>

------
zandorg
I never understood Craigslist. The most baffling thing is they took on eBay
and won, which I put down to luck.

But the initial concept of a listing for a small town (whether it be San
Francisco or not) is laughable.

~~~
chops
What!? Classifieds have been around forever and were the bread and butter
of the print newspapers for a long time. If craigslist "took on" anything,
it's classifieds.

The "second-hand random stuff" market that eBay had previously was just made
easier (and cheaper) by craigslist, but only for local items.

~~~
zandorg
Sorry, that was a bit flamey of me.

I have read Craig's interview in Founders at Work, though.

The thing I'm pointing out is that, given all the online newspapers and the
startups probably trying to do it, I'm surprised Craigslist could take off.

My second point about it starting as SF is: In 2005 or so, when I read about
CL, the site was only SF, and was pretty small.

So I reasoned: It would be meaningless to people outside SF (eg, no listings)
so how could it eventually become a multi-million dollar business?

Web business is _really_ weird...

~~~
thwarted
_The thing I'm pointing out is that, given all the online newspapers and the
startups probably trying to do it, I'm surprised Craigslist could take off._

You make it sound like craigslist is some recent newcomer to the internet
classified scene. whois craigslist.com shows "Record created on 24-Sep-1997".
Craigslist was most likely the first to do and popularize on-line classifieds,
and figured out all the tech and social changes to do it. Online newspapers
have never concentrated on on-line classifieds and are just now doing that in
order to have a decent on-line offering that keeps people coming back. There
are even whitebox classified engines that companies like radio stations can
stick on their websites to encourage traffic. None of them will beat
craigslist though.

_My second point about it starting as SF is: In 2005 or so, when I read about
CL, the site was only SF, and was pretty small._

I'm sure craigslist has had city listings for places other than San Francisco
since before 2005 (maybe not too long before). But San Francisco is a good
place to build up a userbase of on-line users because the culture is welcoming
to on-line interaction and everyone is so wired (at least compared to other
cities).

