

Answers to common questions about RethinkDB - alexpopescu
http://www.rethinkdb.com/blog/answers-to-common-questions/

======
themgt
If anyone wants to play with rethink's (awesome) web UI, we set up a demo here:
<http://rethinkdb.a.pogoapp.com>

We ran into a little bug w/ the ajax requests using the wrong port when run
behind an nginx reverse proxy, and the team was very helpful and responsive in
getting it fixed: <https://github.com/rethinkdb/rethinkdb/issues/63>

Seems like a very promising project

~~~
woogley
Thanks, but a word of warning - through the Data Explorer interface it is
possible to take down a core or two by passing things like 'while (1) {}' to
the r.js() command.

No good deed goes unpunished, etc.

~~~
jdoliner
There are actually some really insidious queries you can run. For example,
there's a query that will insert an infinite number of documents into a
table! Hats off to the first person to post it.

It's awesome that you're hosting this; just be sure it's on a machine you're
not depending on for other services.

~~~
atnnn
t.forEach(t.insert(t.without("primaryKey")));

Only try this at home.

------
nlh
I'm almost fearful for asking this given the somewhat anti-Rails sentiment I
sometimes see here, but...

Anyone know of any projects to create a Rails ODM for RethinkDB? (a la
Mongoid/MongoMapper?)

Seems to me a natural fit given the (relative) popularity of MongoDB in the
Rails world and the comparisons/improvements of RethinkDB over MongoDB.

~~~
jemeshsu
To compete with MongoDB for the Node crowd, you need an ORM similar to
Mongoose. It seems that Node users now default to MongoDB in their stack,
much like MySQL in the LAMP stack.

For Golang, mgo <http://labix.org/mgo> is a quality driver for MongoDB.
Hopefully there will be something similar for RethinkDB.

~~~
coffeemug
We're working on improving the API spec so people can write clients easily
(we'll publish an RFC soon).

As far as ORMs go, we do plan to do that, but it will take a little bit of
time. It shouldn't be very difficult, so we hope to get to it soon.

------
mintplant
I've got a question that I haven't been able to find an answer to anywhere.

What would be the best way to set up a one-to-many relationship in RethinkDB?
For example, a User has a Store, a Store has Categories, and a Category has
Products.

Along the same lines, is there some functionality similar to MongoDB's
compound indexes?

~~~
coffeemug
There are two ways to set up a one-to-many relationship. The first is to use
an array that holds ids. The second is to use a more traditional db approach
and create a separate mapping table where each document holds the ids of the
documents it maps.

Use the former approach if your array will be relatively small (say, under
~2000 items). Use the latter if the number of relationships is much bigger
than that.
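
The two layouts above can be sketched with plain Python dicts standing in for
RethinkDB's JSON documents (illustrative shapes only - the field names are
made up, and the mapping-table lookup is a client-side join, not actual ReQL):

```python
# Approach 1: embed an array of child ids in the parent document.
# Fine while the array stays relatively small (say, under ~2000 items).
store = {"id": "store1", "owner_id": "user1",
         "category_ids": ["cat1", "cat2"]}

# Approach 2: a separate mapping table, one document per relationship.
# Scales to a much larger number of relationships.
store_categories = [
    {"store_id": "store1", "category_id": "cat1"},
    {"store_id": "store1", "category_id": "cat2"},
]

def categories_for(store_id, mapping):
    """Resolve a store's categories through the mapping table."""
    return [row["category_id"] for row in mapping
            if row["store_id"] == store_id]
```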

As far as compound indexes go, we don't have support for them yet, but it
will happen.

~~~
mintplant
Good to hear that compound indexes are forthcoming. When you do have secondary
index support, will there be functionality to enforce a "unique" index, or
will that have to be handled application-side?

Also, in your API documentation [1], I see there's an `append` function to add
a value to an array. Is there an equivalent function to remove a value from an
array?

[1] <http://www.rethinkdb.com/api/#js>

~~~
jdoliner
Yes, you can do arbitrary slices on an array, like array[:5] in Python. This
is done using the slice command. I believe the Python driver overloads
Python's native slice syntax to work with it.
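
To make that concrete, here is how the overloaded syntax behaves on a plain
Python list, plus one way the original question (removing a value from an
array) can be emulated with slices alone - remove_value is a made-up helper
for illustration, not part of any driver:

```python
# Plain Python lists standing in for ReQL arrays (illustrative only).
array = [10, 20, 30, 40, 50, 60, 70]

first_five = array[:5]   # slice via native syntax -> [10, 20, 30, 40, 50]
middle = array[2:5]      # arbitrary slice -> [30, 40, 50]

def remove_value(arr, value):
    """Drop the first occurrence of value by splicing two slices together."""
    i = arr.index(value)
    return arr[:i] + arr[i + 1:]
```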

------
ra
What has to happen before rethink is production ready?

I have to build an analytics db for commercial launch in February, it'll be a
single machine cluster for the first few months. Would rethink be a risky
choice?

~~~
coffeemug
slava@rethink here. RethinkDB is new software and still has bugs (see:
<https://github.com/rethinkdb/rethinkdb/issues>). We're working very hard to
iron out all reported and known issues, and to improve the testing
infrastructure to find new ones. There is nothing critical wrong with the
server that we know of, but I'd still prefer to wait until the test
infrastructure is much more comprehensive before I can comfortably recommend
that people put it in production. I suspect this process will take about
three months.

That being said, if you choose to build analytics on top of RethinkDB, we'll
support you all the way.

~~~
StavrosK
I'm building exactly that (it's on postgres now, but I'm considering moving to
RethinkDB when it's more stable). It's good to know you have my back.

By the way, for the benefit of the commenters here, I checked RethinkDB two
days ago for <http://www.instahero.com> (the analytics app I'm building), and
I ran into a bug where RethinkDB's count() would only return half my data.

The team was very responsive and fixed the bug in a few hours ( _and_ pushed
the fix so I could test it), so I'm very impressed by the response time. I
can't hold the bug against them, because they do say it's not production-
ready.

Also, I pointed out that the DB listening on interfaces other than localhost
by default was a potential security risk, and that, too, was fixed in around
a day. Kudos.

------
smoyer
"How do I install RethinkDB on other Linux flavors?"

It would be nice to mention that RethinkDB only compiles on x86-64. If you
follow the instructions on the linked "building from source" page, it works
fine on 32-bit machines ... until you actually run "make".

It wasn't that much time, but I wouldn't have tried if I'd known ahead of
time.

Thanks for the effort!

------
KaoruAoiShiho
Is it production ready?

~~~
coffeemug
No, we're still ironing out some issues, improving the test infrastructure,
etc. We'd love for you to use it in personal projects, but I wouldn't put
rethink in production quite yet.

~~~
eikenberry
Timeline/Roadmap? That is, when will it approach being production ready?

~~~
coffeemug
slava@rethink here. I'm hesitant to give a timeline - we won't say the product
is production ready until we know it is, and since complex software systems
can have bugs lurking under the surface, it's difficult to predict how long it
will take.

That being said, the way we're approaching this issue is by overhauling the
test infrastructure to stress the server in as many ways as we possibly can. I
suspect it will take about three months for us to feel satisfied with how
extensive the test infrastructure is. Once that's in place, we'll be able to
recommend rethink for production. Of course if the test infrastructure exposes
hard-to-kill bugs, it will take a little longer.

------
Toshio
I don't know if this is a common question, but it is a question I have about
every data storage and retrieval engine, be it an SQL database or a NoSQL
datastore:

How would you re-architect it if you knew ahead of time it would only sit on
SSDs - never on spinning media?

~~~
jdoliner
Hi, Joe Doliner, engineer at RethinkDB here. RethinkDB is actually structured
in exactly this way, because it was originally conceived as a
memcached-compatible database optimized for solid state drives. Optimizing
for solid state drives is a complicated matter on which we've published some
papers. Here's the HN-comment level-of-detail description:

SSDs, unlike rotational drives, are inherently parallel devices. Each drive
has a number of flash memory units which are capable of doing writes
concurrently. To split the load evenly we divide files up into zones which we
call "extents". The number of extents is configurable, and varying the number
of concurrent extents will give varying performance for a given drive;
normally there's a sweet spot where you have as many extents as the drive has
independent chips.

Extents are written to in a very specific manner: they are append-only, which
means that you start at the top of the extent and fill it up with blocks of a
predetermined size. The block size can be tuned to optimize performance as
well, although I'm not sure off the top of my head which properties of the
drive determine which block size is best. Once an extent has been filled up
top to bottom, we stop writing to it and move on to another extent.
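
A toy sketch of that append-only extent scheme (the sizes and data structures
here are invented purely for illustration, not RethinkDB's actual layout):

```python
BLOCKS_PER_EXTENT = 4  # tunable in the real system, like extent/block size

class Extent:
    """An append-only zone: blocks are added top to bottom, never overwritten."""
    def __init__(self):
        self.blocks = []

    def full(self):
        return len(self.blocks) >= BLOCKS_PER_EXTENT

def write_blocks(blocks):
    """Fill extents one after another, moving on whenever one fills up."""
    extents = [Extent()]
    for b in blocks:
        if extents[-1].full():
            extents.append(Extent())  # stop writing the full extent for good
        extents[-1].blocks.append(b)
    return extents
```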

Garbage collection: each block that we write in an extent is actually a
version of a logical block in a btree. When blocks are changed they are
rewritten and the old block is marked as garbage. This leads to extents which
are filled almost entirely with garbage. When we hit a threshold, we look for
the extent with the most garbage, copy all of the non-garbage blocks to an
active extent, and then reclaim the extent. SSDs have their own garbage
collection mechanisms that can seriously degrade the performance of the drive
when they misbehave. If we get the extent size right, the drive will have a
much easier time of garbage collection (I actually think the drive won't have
to do anything at all for GC in many cases).
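
That collection pass can be sketched like so (the (block, live) pair
representation is invented here for illustration):

```python
def collect(extents, active):
    """Pick the extent with the most garbage, relocate its live blocks into
    the active extent, and reclaim the victim extent for reuse.

    extents: list of extents, each a list of (block_id, is_live) pairs.
    active:  list collecting relocated live block ids.
    """
    victim = max(extents, key=lambda e: sum(1 for _, live in e if not live))
    for block, live in victim:
        if live:
            active.append(block)  # copy surviving blocks out
    extents.remove(victim)        # the whole extent is now free again
    return victim
```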

