
RethinkDB (YC S09) Raises $1.2 Million For Its Database For Solid-State Drives - jasonlbaptiste
http://techcrunch.com/2010/04/01/rethinkdb-raises-1-2-million-for-its-database-for-solid-state-drives/
======
pg
Incidentally, if any hackers are looking for jobs working on an interesting
problem, I know the Rethinks are hiring. It would be the perfect job for a lot
of hackers. You'd get to solve big problems starting with a blank slate, and
you'd get to work with smart, totally pragmatic people (Slava and Mike). Plus
they now have enough to actually pay you.

~~~
quickpost
Not to mention they are very transparent with their compensation and stock
options (Scroll down - <http://rethinkdb.com/jobs/>). People who run their
hiring process and business like that seem like very good people to work for
indeed.

~~~
rms
That's really, really cool. I've never seen the stock options/equity listed on
a hiring page before.

~~~
reitzensteinm
And I love that it's a percentage, not, e.g., 10k options.

~~~
byrneseyeview
Do people actually quote the number of options without reference to a
percentage or a stock price?

------
ShabbyDoo
"There’s obviously risk involved with trying to redefine how people structure
their databases"

TechCrunch misses the point that Rethink is explicitly not doing this. The
MySQL engine is below the SQL parsing layer, so as-is MySQL apps should be
able to run against it.

~~~
davidu
That's in theory. In practice, MySQL has different syntax for different
engines when you get into more esoteric queries.

------
andrewcooke
i was looking at this the other day. how do they address (what i assume are)
the larger space requirements of append-only databases with the lower
capacities of ssds? am i wrong about the space requirements, or is moore's law
going to fix it, or is there some kind of background compaction?

~~~
coffeemug
A couple of points:

- We garbage collect (see Mendel Rosenblum's Ph.D. thesis on log-structured
systems)

- Our customers care about cost per IOPS, not cost per GB.

- The hot real-time stuff is usually handled by a different database and/or
storage system than the older, less frequently accessed data anyway.
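
To make the garbage-collection point concrete, here is a minimal sketch of an append-only key-value log with compaction. The `AppendOnlyLog` class and its method names are hypothetical and chosen for illustration; this is not RethinkDB's implementation, and a real log-structured system would typically clean the log segment by segment in the background rather than rewrite it all at once.

```python
class AppendOnlyLog:
    """Toy in-memory append-only key-value store (hypothetical)."""

    def __init__(self):
        self.log = []      # every record ever written: (key, value)
        self.index = {}    # key -> position of its latest record in the log

    def put(self, key, value):
        # Updates never overwrite in place; they append a new record.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def get(self, key):
        return self.log[self.index[key]][1]

    def garbage_ratio(self):
        # Fraction of records that are stale (superseded by a later write).
        if not self.log:
            return 0.0
        return 1.0 - len(self.index) / len(self.log)

    def compact(self):
        # Copy only the live records into a fresh log and drop the rest.
        live_positions = sorted(self.index.values())
        new_log = [self.log[pos] for pos in live_positions]
        self.index = {key: i for i, (key, _) in enumerate(new_log)}
        self.log = new_log


store = AppendOnlyLog()
for i in range(100):
    store.put("counter", i)      # 100 versions of the same key
store.put("name", "rethink")
size_before = len(store.log)     # stale versions still occupy space
store.compact()
size_after = len(store.log)      # only the live records remain
```

The space overhead andrewcooke asks about is the stale records; compaction reclaims them, at the cost of extra copying.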

~~~
runT1ME
Will you guys be able to support all isolation levels?

~~~
coffeemug
Isolation levels are really a poor design decision, because they imply the use
of locks. Serializable is great, but impossible to implement efficiently.
Repeatable read, read committed, and read uncommitted can be implemented
efficiently, but allow for various unpleasant isolation artifacts.

The one we're implementing is really the one everyone wants - snapshot
isolation. It can be implemented very efficiently, and is stronger than
repeatable read, read committed, and read uncommitted (so you should never
want these three). It's not as strong as serializable, but nobody can give you
a scalable serializable isolation level.

Snapshot isolation also guarantees consistency, but requires all transactions
to be idempotent (so they could be rerun in case of a conflict). It's the best
of both worlds; in practice most other databases already behave this way
anyway.
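
As a rough illustration of the snapshot isolation described here, the sketch below keeps multiple versions per key, lets each transaction read from the snapshot taken when it began, and rejects a commit if another transaction has already committed a write to the same key (the "first committer wins" rule). All class and method names are hypothetical, the code is single-threaded, and it is not a description of how RethinkDB implements this.

```python
class ConflictError(Exception):
    """Raised when a transaction must be rolled back and rerun."""


class SnapshotStore:
    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ascending
        self.clock = 0       # timestamp of the latest commit

    def begin(self):
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}     # buffered until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Newest version committed at or before this transaction's snapshot.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.snapshot_ts:
                return value
        return None

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort if any key we wrote has a newer
        # committed version than our snapshot.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append(
                (self.store.clock, value))


store = SnapshotStore()
setup = store.begin()
setup.write("balance", 100)
setup.commit()

t1, t2, reader = store.begin(), store.begin(), store.begin()
t1.write("balance", t1.read("balance") - 10)
t2.write("balance", t2.read("balance") - 20)
t1.commit()
try:
    t2.commit()
    t2_conflicted = False
except ConflictError:
    t2_conflicted = True     # t2 has to be rerun against a fresh snapshot
```

Note that `reader` never blocks and keeps seeing the balance as of its own snapshot, which is why no read locks are needed.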

~~~
andrewcooke
so for inserts do you need to have some kind of uniqueness constraint that
makes sure that a repeated insert is rejected (the first example that came
into my head when i read "idempotent" was a simple insert, which isn't, as far
as i can tell)?

[sorry if this seems like an interrogation - it's just interesting stuff
you're doing...]

~~~
coffeemug
Essentially, it means that _any_ transaction might potentially be rolled back
and rerun. This isn't a problem for SQL, but suppose I select some stuff, get
back into the host programming language, fire off some rockets into space from
the Kennedy Space Center, and then insert some data about the launch into the
database. This is a _big_ problem, because if the insertion fails because of
potential conflicts, the _whole_ thing needs to be rolled back (including the
rocket launch), and rerun. A lot of software is written to account for this
(i.e. don't perform any external state modification you can't roll back until
you've confirmed the transaction is committed), but a lot of software isn't.
To really have great isolation _and_ performance, you need to write software
this way. For people that don't, we'll support the serializable level, but there
are very strong limitations as to how efficient this can be.
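
The pattern described above (hold back external side effects until the commit is confirmed, and rerun the whole transaction on conflict) could look something like the following sketch. The `run_transaction` helper, the `defer` callback, and `flaky_commit` are hypothetical names invented for this example; they are not part of any real database driver.

```python
class ConflictError(Exception):
    """Signals that the commit failed and the transaction must be rerun."""


def run_transaction(body, commit, max_attempts=5):
    """Run body, try to commit, and rerun from scratch on conflict.

    body(defer) does the transactional work and registers external side
    effects with defer(fn) instead of performing them directly.
    """
    for attempt in range(1, max_attempts + 1):
        deferred = []
        result = body(deferred.append)
        try:
            commit(result)
        except ConflictError:
            continue             # drop the deferred effects and rerun
        for effect in deferred:  # commit confirmed: now safe to act
            effect()
        return result, attempt
    raise RuntimeError(
        "transaction did not commit after %d attempts" % max_attempts)


launches = []
commit_calls = []


def flaky_commit(record):
    # Simulated database commit that conflicts on the first two attempts.
    commit_calls.append(record)
    if len(commit_calls) < 3:
        raise ConflictError("simulated write conflict")


def body(defer):
    # The rocket is only "launched" once the commit has gone through.
    defer(lambda: launches.append("rocket launched"))
    return {"launch": "recorded"}


result, attempts = run_transaction(body, flaky_commit)
```

Even though the body ran three times, the rocket launches once, because the side effect is only executed after the successful commit.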

~~~
andrewcooke
ah, ok, i misunderstood how broadly you were using the word "transaction".
makes sense, thanks.

------
ALee
Big congrats. Smart people tackling tough problems get my vote of confidence.

------
helwr
congratulations, Slava & Michael

------
vladocar
This is a very interesting project. All the current databases are optimized
for conventional hard disks (and the hard disk is the slowest part of a PC).
But as solid-state drives develop, we will have more and more fast drives, so
the database that takes advantage of these new SSDs will lead the way in
database design. It is the right time to invest in this technology.

~~~
jrockway
But really, rotating disks are not that bad for most database use cases.
B-trees, the usual on-disk database structure, are designed to keep similar
data on the same disk page, which means that if you request row 42, row 43
will be in memory by the time you need it. So the slowness of the disk is
abstracted away; iterate over your data in index order, and it's always fast.
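
A small sketch of the locality argument: the toy table below stores a fixed number of rows per page and only counts a "disk read" when the requested row is on a different page than the previous one. It is not a B-tree, the page size and the one-page cache are assumptions made for illustration, and the names are hypothetical.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, for illustration only


class PagedTable:
    """Toy table that caches exactly one page at a time."""

    def __init__(self):
        self.cached_page = None
        self.page_reads = 0

    def get(self, row_id):
        page = row_id // ROWS_PER_PAGE
        if page != self.cached_page:
            # Cache miss: this is where a real system would hit the disk.
            self.page_reads += 1
            self.cached_page = page
        return row_id  # placeholder for the row's contents


def count_reads(order):
    table = PagedTable()
    for row_id in order:
        table.get(row_id)
    return table.page_reads


row_ids = list(range(10_000))
sequential_reads = count_reads(row_ids)   # index order: one read per page

shuffled = row_ids[:]
random.Random(0).shuffle(shuffled)
random_reads = count_reads(shuffled)      # random order: mostly misses
```

Iterating in index order touches each page once, while random access pays for a page read on almost every row; that gap is what is expensive on a rotating disk and much cheaper on an SSD.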

Hash tables have a theoretical advantage over balanced trees, and an SSD would
make a naive hash table easier to implement. But if you are
smart (like, say, BerkeleyDB), hash tables and balanced trees have almost the
same real world performance.

RethinkDB might be better for write-heavy operations, but that's because SSDs
are better for random writes.

~~~
runT1ME
But the lack of locking is potentially big for multicore applications.

~~~
cperciva
I'm not convinced. Modern OSes use locking extensively and do perfectly fine
on multicore applications (FreeBSD's pgsql performance scales linearly up to
16 cores last time I saw graphs).

Obviously you need to be smart about how you do your locking (no giant lock!)
but the mere fact of having locking is not automatically a problem.

~~~
runT1ME
I misspoke, I should have said 'many-core'. Yes, you're probably right that no
respectable database is going to have a problem with lock contention on 16
cores. But AMD released 12-core processors this week. It's likely we'll see
the average DB server have 48 cores sometime in the next year or two, and who
knows after that.

~~~
cperciva
I quoted 16 cores because that's the biggest hardware the FreeBSD project had
available when those benchmarks were being run -- I suspect that it scales
linearly quite a bit further than that.

------
known
Can't we implement RethinkDB as _features_ of MySQL or PostgreSQL?

------
mlLK
Congratulations guys, you deserve it, and thank you to anybody else out there
writing drivers or optimizing software for changes happening in hardware that
we all take for granted.

------
ojbyrne
It says "maintanence" on their technical details page. Hopefully they can fix
that.

~~~
mglukhovsky
Fixed, thank you for pointing it out.

------
spicyj
Does the iPhone really have an SSD?

~~~
CrazedGeek
It uses flash memory, so yes. <http://www.apple.com/iphone/specs.html>

