
I Want a New Data Store (for Craigslist) - blasdel
http://blog.zawodny.com/2010/04/27/i-want-a-new-data-store/
======
bensummers
GenieDB <http://www.geniedb.com> is an interesting possibility for these kinds
of migrations. It allows you to use NoSQL type stores with a MySQL interface,
using SQL and even doing JOINs between MySQL and NoSQL tables. The idea is you
can gradually adjust your application over time without a sudden transition
where you have to rewrite all your database code.

(Disclosure: I'm on the advisory board.)

~~~
jzawodn
Hmm... Very interesting idea, thanks!

------
ck2
whoa

 _ALTER TABLE takes an unreasonably long time [...] Changes take over a
month._

How about a RAID of SSD ? Or a Fusion-IO ioDrive?

Shouldn't they be an order of magnitude faster?

I'd love to see a benchmark comparison of a large db with SSD vs conventional
(can't find one).

added: here's an old benchmark with an older slower SSD that is faster than
RAID10 <http://www.bigdbahead.com/?p=68> and an update with more details (and
faster) here <http://www.bigdbahead.com/?p=557>

~~~
jzawodn
Yes and no. Fusion-io removes the io bottleneck but ALTER TABLE is still
single threaded. So you're looking at a CPU bottleneck where only 1 core can
be used.

We just did a lot of ALTER TABLE work on one of our clusters WITH fusion-io
drives (2 in a RAID-0) for a 360GB table and it still took ~8 hours to finish.

~~~
ck2
So what was the CPU on that bottleneck, out of curiosity?

If single core is the problem, throw a massive clock rate at it?

The i5-680 is out this month and is 3.6ghz native ("turbo" near 4ghz) should
do 5ghz with good air cooling (it's 32nm) and then you could use ECC memory to
be certain in a mission-critical environment?

~~~
jzawodn
This was... lemme see. less /proc/cpuinfo

Intel(R) Xeon(R) CPU X5570 @ 2.93GHz

So it was a pretty high-end Nehalem proc already. And, yeah, we do have ECC
RAM in the box, of course. Plus InnoDB does page-level checksumming to guard
against corruption too.

But still, it was ~8 hours or so. Using all 8 cores on the box would have
brought it down to ~1 hour _if_ it could have been perfectly parallelized (and
much of it could have been, I bet).

~~~
ck2
Ah okay, so even with the fastest clocks available today, the best you could
hope for is cutting the 8 hours to 4, or one month to two weeks.

As far as the parallel cores it sounds like you already have thought about
possible "end runs" around MySQL's "transaction safe" ALTER (which is probably
why it's single threaded). InnoDB supposedly has a few performance
improvements for ALTER over MyISAM so you are already taking advantage of
that.

------
lukatmyshu
We've got a few TB in couchdb-lounge (we also wrote couchdb-lounge) so we
think it works. Big pluses ... it's got a RESTful API which means that we
don't need a special library to access it in any language ... if you can speak
http you can talk to couch. This is especially important for us because there
are lots of places where we like to talk to our databases asynchronously. It's
a lot easier to find an async HTTP API than it is to find a random db API.

~~~
jzawodn
That's a great point about async libs. Look no farther than MySQL to see how
much pain it can be to get a good one in your language of choice.

------
tjpick
> to influence the organization of records on disk so that the most common
> queries will require very few seeks

might be revealing my lack of knowledge here but what tools/apis etc does one
use when they want to work at this level? It seems like a good thing to do but
I'd have no idea how to go about it. Enlightenment appreciated.

~~~
JoachimSchipper
SQL has a CLUSTER command, for instance. See e.g.
[http://www.postgresql.org/docs/8.4/interactive/sql-
cluster.h...](http://www.postgresql.org/docs/8.4/interactive/sql-
cluster.html).

~~~
tjpick
ahhh, thanks.

------
aboodman
Reminds me of this article on how friendfeed implemented schemaless storage on
top of MySQL:

<http://bret.appspot.com/entry/how-friendfeed-uses-mysql>

I always thought it was a very clever design.

~~~
jzawodn
Yup. It's a very interesting mix of using a subset of something very well
known (MySQL) to get what you need done.

------
chime
I know it's not NoSQL but wouldn't this be a very good candidate for
RethinkDB? A few TBs of SSD is not that expensive for a major site. It also
supports live schema changes and is a drop-in replacement for MySQL.

~~~
jzawodn
As luck would have it, I'm an advisor for RethinkDB. They're doing great work,
but also aren't quite ready for prime time either. I'm very optimistic for
their technology in an OLTP system, but this is not like an OLTP system at
all.

------
w-ll
Redis says that craigslist uses Redis. You have a link to a redis tutorial on
your blog. Is this not your answer?

~~~
jzawodn
Redis is great for some things but is definitely not the right tool here. It's
built mainly for serving up small chunks of info (or functionality) _very_
quickly. And it kicks ass at that.

This is a system that probably sees even more writes than reads, and they're
batched up mostly at night anyway. The longer term _management_ of that
infrastructure is a big part of the pain.

------
sjs
I think Riak has everything you need. It's worth a look anyway.

~~~
jzawodn
Yeah, a few folks have mentioned it already. And I remember seeing some good
features in the video I watched about it several weeks ago...

------
yarapavan
So it is Cassandra then?

