RethinkDB raises an $8M Series A (rethinkdb.com)
299 points by coffeemug 923 days ago | past | web | 126 comments



If you're an ambitious engineer thinking about joining a start-up, this is your chance to be smart about this.

The RethinkDB team is the nicest possible group of people one can hope to work with in the Bay Area. They have a great combination of often mutually exclusive things: hacker-friendly business model (no ads, tech for cash), aggressiveness and tech-savviness of founders, yet they're humble, honest and nice.

And the product works, is well-liked, and serves an exploding market, so the probability of failure is quite low by typical start-up standards.

This is very rare. Make a move.


We are aggressively hiring. Most job descriptions aren't up yet, but will be posted in the next few days. Here is a quick list of high priority positions:

  * Reliability engineer
    Skills:
      - Good working knowledge of Bash and Python
      - Good working knowledge of C and/or C++
      - Patience
    Responsibilities:
      - Automate testing and benchmarking infrastructure
      - Make sure long tests/benchmarks reliably run
      - Get to the bottom of stability/performance problems, and
        work with the engineering team to fix them
    More info: http://rethinkdb.com/jobs/reliability/

  * Technical writer/developer advocate
    Skills:
      - Basic programming ability in at least two of ruby/python/javascript
      - Good command of the English language, style, tone, etc.
      - Ability to explain complex ideas in simple ways in written form
    Responsibilities:
      - Improve API documentation, guides, and tutorials
      - Help support users and bring their concerns back into the product

  * Ruby/Rails, Python/Django, Javascript/Node experts
    Skills:
      - Deep understanding of one of the above stacks (you should
        be able to hack on core Rails/Django/Node, know the
        conventions, and respect the community)
    Responsibilities:
      - Write software to make RethinkDB the absolute best possible
        experience for your respective stack
      - Then tell the community about it

  * Visual designer
    Skills:
      - Produce beautiful web designs and illustrations
      - Mastery of Adobe Creative Suite
      - Mastery with a tablet and stylus
      - Exhibit creative taste
    Responsibilities (we take visual design and user experience extremely seriously):
      - Design user interfaces
      - Help rebrand our website for commercial / community versions
      - Incorporate illustration and a unique visual style into our brand
      - Creative one-off projects (logo, design t-shirts, marketing
        materials, postcards, packaging, generally make things
        beautiful)
Email jobs@rethinkdb.com -- we'd love to hear from you.


Is it weird that I find the Technical writer/developer advocate position fun?


For the right person, it is fun! Shoot me an e-mail to slava@rethinkdb.com ? I couldn't find your e-mail in the profile/website/github.


Ah, I'm not in the US. :) Just never heard/thought of the title before, but it caught my eye and I thought I would enjoy that kind of work more than just being a developer.


Are you open to relocation? We can help take care of the logistics for the right candidate.


I don't seem like a good fit for any of these positions (Java/C++ engineer >.<), but I'd love to learn more about database internals and how things work and how to make them better. Is there much to do in that area, or is it pretty much complete? Should I just pick a bug and try to fix it, or is there some process to follow >.<


We're hiring C++ engineers. You don't have to know anything about database internals, but you do have to know C or C++ well and understand all the standard software engineering stuff. (I didn't list this position because it's a less hair on fire sort of thing). Please e-mail me -- slava@rethinkdb.com.

As far as getting started on the codebase, we unfortunately don't have a guide/mentorship program yet for hacking on the internals. The best way is probably to pick a really simple bug and try to fix it. I'll see if we can work on making getting started with core contributions easier.


Is it possible to work remotely on any of these positions?


Unfortunately we're not set up to hire remotely. Hiring remotely has to be strongly baked into the culture and we aren't quite ready to do that yet.

We do, however, help with relocation (and even take care of the logistics for people from overseas).


Does that include helping with a H1B visa?


Yes


I just recently had the opportunity to speak with Slava and Michael after one of my posts hit the front page of HackerNews. They are definitely nice people, I enjoyed speaking with them.

I am no expert on databases, so I don't have much to say on that front, but the admin UI for RethinkDB is definitely beyond anything I've seen for databases before. I know that's also something they value tremendously and I'm very happy to see them doing well.

Keep up the great work!


Rethinkdb is a really well-designed system. I've been using it (not in production currently) as a better-designed MongoDB with a proper query language. I would recommend checking it out for new projects where a document datastore is appropriate, and to migrate away from a troublesome Mongo.


I've been using mongo for a few years now, can you please elaborate more on better designed mongo and troublesome mongo?


Three words: JOIN, JOIN, JOIN. For any non-trivial domain, you'll need to join different entity types. For example, a collection of users is fine for user names, (hashed) passwords, last logins, etc. A collection of documents is fine for text, fonts, URLs. Now what happens if you have multiple users collaborating on multiple documents? Are you going to denormalize the shared documents into user objects, or are you only keeping document IDs in them? If you choose the former, you'll run into inconsistency very quickly. If you choose the latter, you'll need join support, or you'll have to write the code to do the poor man's join, which could be very inefficient.

RethinkDB supports join out of the box.
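
To make the "keep only document IDs, then join" option concrete, here is a minimal in-memory sketch in plain Python. The collections, field names, and the helper `join_users_to_documents` are all made up for illustration; this is not RethinkDB's API, just the shape of the work a join has to do.

```python
# Hypothetical in-memory data; names and fields are illustrative only.
users = [
    {"id": 1, "name": "alice", "doc_ids": [10, 11]},
    {"id": 2, "name": "bob", "doc_ids": [11]},
]
documents = [
    {"id": 10, "title": "Roadmap"},
    {"id": 11, "title": "Budget"},
]


def join_users_to_documents(users, documents):
    """Hash join: index documents by id once, then resolve each reference."""
    docs_by_id = {doc["id"]: doc for doc in documents}
    pairs = []
    for user in users:
        for doc_id in user["doc_ids"]:
            doc = docs_by_id.get(doc_id)
            if doc is not None:  # skip dangling references
                pairs.append((user["name"], doc["title"]))
    return pairs


shared = join_users_to_documents(users, documents)
print(shared)
```

Because each shared document lives in exactly one place, renaming "Budget" is a single write. In the denormalized variant the same rename would have to touch every user object that embeds a copy.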


I actually have been running into this a lot with my documents in my project. Thinking about making the switch to Rethink. The only thing I'm bummed about is having to migrate all my Mongo code over. Anyone here end up doing something similar?


So, if I have 30 nodes and my data is sharded between them, how is that better than having app/client side join, where I have more control what to fetch and what not? I could potentially cache results.

I mean, you realize that if data is distributed at large scale, it may take a while for the database to fetch the data from all nodes and join it...


If you let the DB do the joins, it could handle them more efficiently. For example, it could distribute the joins to those 30 partitions of the main table and then merge the results, so the heavy computation is distributed and there are fewer bits to move around the network.

Now, in the cases where you can optimize the joins yourself, you still have the option of doing it in your code in RethinkDB/CouchDB. I've done that too, and it's usually when I know for sure that I can prune a big collection to a very small subset more efficiently than using an index.

I would still argue that client app is not the right level of abstraction for data join though, unless it is a big performance gain for very little extra complication.
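
A rough sketch of the "join inside each partition, then merge" idea, in plain Python. It assumes both tables are partitioned on the join key so matching rows land in the same partition; that is a generic co-partitioned hash join, not a description of how RethinkDB actually executes joins. All function and field names are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, n_parts):
    """Assign each row to a partition by hashing its join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def local_join(left_rows, right_rows, left_key, right_key):
    """Join performed entirely inside one partition."""
    index = defaultdict(list)
    for right in right_rows:
        index[right[right_key]].append(right)
    return [
        (left, right)
        for left in left_rows
        for right in index.get(left[left_key], [])
    ]


def distributed_join(left, right, left_key, right_key, n_parts=3):
    """Run one local join per partition and merge the partial results."""
    left_parts = partition(left, left_key, n_parts)
    right_parts = partition(right, right_key, n_parts)
    merged = []
    for left_part, right_part in zip(left_parts, right_parts):
        merged.extend(local_join(left_part, right_part, left_key, right_key))
    return merged


orders = [
    {"order": 1, "user_id": 1},
    {"order": 2, "user_id": 2},
    {"order": 3, "user_id": 1},
]
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
joined = distributed_join(orders, users, "user_id", "id")
```

Only the joined pairs leave each partition, which is the "fewer bits over the network" part of the argument.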


Slava @ rethink here. I'd be curious to see a use case where doing the join on the client is more efficient than doing it on the server. I can't think of a single one off the top of my head.


If the join is "compute all f(s, t) for s in S and t in T" then you'll save bandwidth (having O(|S| + |T|) bandwidth over the network instead of O(|S||T|)) by doing it on the client. Of course you could just run `rethinkdb proxy` if you want to save ethernet bandwidth and run the query on RethinkDB while connecting to the local cluster node.
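
A quick worked count of that bandwidth argument, assuming (for simplicity) that every input record and every computed f(s, t) result costs one unit to send:

```python
def records_transferred(size_s, size_t):
    """Units sent over the network for a full cross product of S and T."""
    client_side = size_s + size_t   # ship both inputs, compute f locally
    server_side = size_s * size_t   # ship every f(s, t) result
    return client_side, server_side


client, server = records_transferred(1_000, 1_000)
print(client, server)  # 2000 1000000
```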


Ah, I see, that makes sense. Usually pulling both tables in full (or even a single table in full) to the client is not an option (and nobody does cross products in real-time systems). So people end up pulling a subset of table A, and then for each document in the subset issue a separate get to the db for table B (which is obviously worse than having the db do it).


This was brought up in a big way in the recent firestorm around Mongo, but the lack of joins precludes it from any use as a relational database (which is fine, since it's not intended as one). However, I've never been able to use Mongo as a large-scale store as a result.

It seems like the addition of joins lends some relational capabilities which in my eyes is very impressive. I'll be watching the advance of this one with interest.


How does Rethinkdb handle replication, multi-master and MVCC? That's the main reason to use CouchDB over MongoDB.


Slava @ rethink here.

RethinkDB supports MVCC out of the box -- you don't have to do anything as a user to get it. (It basically means you can do writes and long reads concurrently without any locking issues).
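
For readers who haven't met MVCC before, here is a toy sketch of the general idea in Python: writers append new versions, and a long read pins a snapshot timestamp so it never sees (or waits on) later writes. The class `VersionedStore` is invented for illustration and says nothing about how RethinkDB implements this internally.

```python
class VersionedStore:
    """Toy multi-version store: writers append versions, readers pin a snapshot."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0

    def write(self, key, value):
        """Commit a new version without touching older ones."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))

    def snapshot(self):
        """Return the timestamp a long-running read should use."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.write("balance", 100)
snap = store.snapshot()        # a long read starts here
store.write("balance", 250)    # a concurrent write does not block it
old = store.read("balance", snap)
new = store.read("balance", store.snapshot())
```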

Sharding and replication in Rethink is similar to Mongo's architecturally, but is much easier in practice. You can shard and replicate in Rethink in one click -- check out the 1m highlights video (http://rethinkdb.com/videos/what-is-rethinkdb/) to see how it works. If you have a little more time, take a look at the 10m screencast for more details -- http://rethinkdb.com/screencast/.


I was talking to a guy from Mongo at a conference and he said that Rethinkdb's sharding is as bad as Mongo's because it can't be done in realtime. Is this true? Do you think he meant that it would just be slow to move to sharding for a big database?


In practice today, Rethink's sharding is very similar to Mongo's sharding. This won't remain the case for long -- you'll soon be able to add and remove shards live if that's a requirement in your application. The architecture's already in place, but there are some loose ends we have to tie up to expose this functionality to users.


sharding IS NOT similar to mongo, although it's easier, non-primary key sharding + range sharding(which is going away) is missing

and i think those are very important in high-performance sharding (think-per user_id sharding etc)
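
To illustrate the distinction being drawn here, a small Python sketch of the two generic placement strategies: range sharding on an ordered key versus hashing an arbitrary field such as user_id. Both helpers are hypothetical and are not taken from either database.

```python
import bisect
import zlib


def range_shard(key, split_points):
    """Range sharding: keys below split_points[0] go to shard 0, and so on."""
    return bisect.bisect_right(split_points, key)


def hash_shard(value, n_shards):
    """Hash sharding on an arbitrary field, e.g. a user_id string."""
    return zlib.crc32(value.encode("utf-8")) % n_shards


# Range sharding keeps neighbouring keys together.
print([range_shard(k, ["h", "p"]) for k in ("a", "m", "z")])

# Per-user sharding: every record of one user lands on the same shard.
events = [{"user_id": "user-42", "n": i} for i in range(3)]
print({hash_shard(e["user_id"], 4) for e in events})
```

Range sharding makes range scans cheap but can create hot spots; hashing on user_id spreads users evenly while keeping each user's data on one shard.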


Is that ACID transactions with multiple keys?


No, there is no ACID for multiple keys/documents, only on a single key/document.


How do CouchDB and RethinkDB compare?


In addition to what others posted, CouchDB is one of the few NoSQL solutions that will give you master-master replication.

Mongo and Rethink are both single-master/multi-slave solutions.

One is not better than the other, just depends what you need.

And as someone else mentioned, if you want complexity out of your CouchDB queries, you must write map-reduce functions to provide those "views" for you to query. Rethink you can treat similar to a SQL data store and just execute queries against it.
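
A sketch of the difference, in plain Python rather than either database's real API (CouchDB map functions are actually written in JavaScript; the functions below only mimic their shape). The first half builds a predefined map-reduce "view"; the last line is the kind of ad-hoc filter a query language lets you write on the spot.

```python
from collections import defaultdict

posts = [
    {"author": "ann", "votes": 3},
    {"author": "bob", "votes": 5},
    {"author": "ann", "votes": 4},
]


def map_fn(doc):
    """Emit (key, value) pairs, the way a map-reduce view's map step does."""
    yield doc["author"], doc["votes"]


def reduce_fn(values):
    """Combine all values emitted for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


votes_by_author = build_view(posts, map_fn, reduce_fn)  # predefined view
popular = [p for p in posts if p["votes"] > 3]          # ad-hoc filter
```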


Couch has a much shorter list of operations. Really only get, put, delete, and whatever you can cook up yourself with mapreduce. Rethink has ad-hoc live queries and joins. I don't think Rethink has a tailable update log in the way Couch does.


CouchDB has a HTTP REST API, and speaks JSON natively. You can write your own CouchDB client to meet your needs. You can also access your data from any server using http.


I've been struggling with this one a lot - could someone explain where a document store is a good solution?


Imagine an RDBMS table where many columns are nullable and many rows contain nulls in practice. Dealing with these is really painful in traditional database systems (you don't know how painful until you try a document store).

A document store flips the default. It makes dealing with data that has lots of nullable columns much, much easier. (It also makes dealing with hierarchical data a breeze)

There are lots of details, but this is the gist of it.
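
A tiny illustration of "flipping the default", using made-up contact data: the relational row carries a slot for every column, while the document keeps only the fields that exist.

```python
# Hypothetical sparse table: most columns are NULL for most rows.
COLUMNS = ["id", "name", "phone", "twitter", "fax"]

rows = [
    (1, "ann", "555-0100", None, None),
    (2, "bob", None, "@bob", None),
]


def row_to_document(row, columns=COLUMNS):
    """Drop NULL columns so each document stores only the fields it has."""
    return {col: val for col, val in zip(columns, row) if val is not None}


docs = [row_to_document(r) for r in rows]
print(docs)
```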


How is it better than an XML field in SQL Server, which allows indexing, schemas (if you want), and full querying inside the document? I think Postgres also has similar functionality with JSON, now, too.

Certainly a lot of applications would benefit from having a full RDBMS they can opt-in to document-style data when they feel like it?

Built-in horizontal scaling is one selling point for non-RDBMS stores, but large systems seem to just shard on top of RDBMSes anyways, right?


> How is it better than an XML field in SQL Server

It changes the default, which results in a drastically different programming experience. The difference is difficult to describe in the same way a dynamically typed programming language is difficult to describe to someone who's never tried one.

I'd encourage you to try a document store (Mongo, Rethink, whatever) for a throw-away project. A ten minute tutorial walkthrough is worth a thousand HN comments when it comes to stuff like this :)

Here's the Rethink tutorial: http://rethinkdb.com/docs/guide/python/. Just play with it and see if you like it!


OK, I will try it out for something and see how it feels different.

Related: The comparison to a dynamically typed language makes me suspicious. I spent a bit of time trying to find any examples of dynamic code that actually provided any benefit. Even read "Metaprogramming Ruby" and was dismayed to see examples of reading a CSV - big deal if I save a few quotation marks. The others were just places where the static type system wasn't good enough (duck typing), or dynamic code was a pain to get going (poor reflection/codegen APIs).


I'll try to add some additional color here. My background is that I work for Google (where the vast majority of our data is stored in what amounts to document databases, generally protobufs in a key-value store), I prototype in Python and Javascript, and I write production code in Java and C++.

Both document databases and dynamic typing are at their best when you don't understand your problem domain. They let you express what you do know about your problem domain concisely, and then fill in the blanks later on. So in a document database, when you find that you want to record a new bit of data - just add it as a field to newly-created XML/JSON documents, and only display it in the UI if it's present. Or pick a default value if you need to perform computations with it. Don't bother with data migrations, don't bother with schemas, don't bother trying to backfill previous data. Try out your idea and see if it works first, because chances are, it doesn't.
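
In code, the "just add the field to new documents" workflow looks roughly like this. The field name `reading_time_min` and the default of 3 are invented for the example.

```python
# Old documents predate the new field; new ones carry it. No migration is run.
old_doc = {"title": "First post"}
new_doc = {"title": "Second post", "reading_time_min": 4}


def render(doc):
    """Show the new field in the UI only when it is present."""
    line = doc["title"]
    if "reading_time_min" in doc:
        line += f" ({doc['reading_time_min']} min read)"
    return line


def average_reading_time(docs, default=3):
    """Fall back to a default when a computation needs the missing field."""
    times = [d.get("reading_time_min", default) for d in docs]
    return sum(times) / len(times)


print(render(old_doc))
print(render(new_doc))
print(average_reading_time([old_doc, new_doc]))
```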

If you always work on projects where the requirements are handed to you, specs are complete, and the problem domain is understood, this will seem terribly irresponsible to you. And it is - if you understand your problem domain, you should capture as much of that knowledge in the software system you build to understand it.

But if you are working in startups, or in consumer web, where you absolutely have to be on the leading edge or die and the only opportunities that haven't been picked over yet are the ones that nobody understands - being able to try things out without having to flesh out all your assumptions is crucial. You will run circles around the people who spend time defining their data model and speccing out their objects. And then when consumer tastes change - which happens quite regularly - you can adapt to them immediately instead of throwing out all the work you did under the old assumptions.

The other bit of context I'll toss in is to get in the mindset of solving a problem that you don't know how to solve and assume that your first 10 solutions won't work. For example, if you're reading a CSV - everybody knows how to do that, dynamic typing doesn't really help there. If you're cloning Stack Overflow, you can probably figure out what your database schema should be. But what if you're trying to figure out a new way for people to socialize over mobile phones? Where do you start there? That's the use case for dynamic languages and document DBs. The problems where technology is a tool for understanding & manipulating vaguely-defined social behaviors.


Thank you for the explanation. I can understand the logic.

In F#, adding a field requires "field : type". If I don't care about type checking, I can just add "Props : dict<string, object>" and go to town. Or I can opt-in to the dynamic features and just do "foo?bar <- baz". When I change types around, things either just work due to type inference, or the compiler helpfully points out every place that'd be a runtime error. I've never felt this slows me down. I feel the type checking and autocomplete is worth the tiny amount that specifying a record takes. (I've spent days finding minor issues in JavaScript, stuff that'd instantly be caught by a type checker.)

Databases make it more cumbersome, and it takes more than one line to start using a new field. I totally sympathize with the flexibility issue there. Even with a document type, most syntax I've seen doesn't have truly first-class querying support (not as easy as a column, anyway). And it feels ugly to have some fields defined in schema, and some in a document. But that seems like a minor tooling issue -- there's no fundamental reason SQL can't let me do "WHERE x.SomeDoc.SomeField.OtherField > 5" (perhaps some minor scope resolution issues to ensure I'm not referring to some other multi-part name).
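
For what it's worth, the dotted-path predicate imagined above is easy to express over nested documents. A minimal Python sketch, with `get_path` and `where_gt` as hypothetical helpers (not any database's syntax):

```python
def get_path(doc, path, default=None):
    """Follow a dotted path through nested dicts; return default if any step is missing."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def where_gt(rows, path, threshold):
    """Keep rows whose nested field exists and is greater than threshold."""
    result = []
    for row in rows:
        value = get_path(row, path)
        if value is not None and value > threshold:
            result.append(row)
    return result


rows = [
    {"id": 1, "SomeDoc": {"SomeField": {"OtherField": 7}}},
    {"id": 2, "SomeDoc": {"SomeField": {"OtherField": 2}}},
    {"id": 3, "SomeDoc": {}},  # field absent: treated as no match
]
matches = where_gt(rows, "SomeDoc.SomeField.OtherField", 5)
```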


I'm using it to collect stats from various systems, events and alerts too. It makes it easy to collect new data points and add fields to new events and query things in a consistent way across the entire data set.

When the schema is not rigid and likely to change on a daily basis I prefer a document database over an SQL one. There are probably other use cases as well, this is just my favourite.


One example is a CMS. An article these days is a title, body and 5-10 comments, and whatever other metadata that you would want to present on a page. Don't update the corresponding NoSQL record till there are writes in the RDBMS. Serve from the NoSQL system, with at least another caching layer in front. You get the best of both worlds and just one query instead of many.


Why not just use a materialized view inside your SQL system? No need to cache invalidation; let the system handle it for you. Same benefits, truly a single query, and only one system.

If you're going to concatenate relational data into a document, I'm not sure why a simple KV table doesn't fit the job.


Thanks, had to read up on what exactly is a materialized view. Not something I was familiar with.


I've been looking at Rethinkdb for an upcoming project that has a decent amount of relational data. Rethinkdb seems a lot different from Mongodb in that it's partially designed to accommodate relational queries. Has that been your experience?

Also, does Rethinkdb have any kind of transactional capabilities?


> Does Rethinkdb have any kind of transactional capabilities

You have full ACID on a single document, but not across multiple documents. In this way Rethink is similar to other NoSQL systems (except you can do almost any operation imaginable atomically on a single document in RethinkDB).


Nice, can't wait to try it out! Though in my case (single node installations) hot backup missing feature is a deal breaker, so I'll have to wait till RethinkDB supports it. The current workaround (replication, stop the slave, backup, start the slave) is just not good enough for me...


Hot backup has been available since 1.7. You can learn more about how it works here: http://rethinkdb.com/docs/backup/

Enjoy!


I'm looking forward to the LTS release so I can feel more comfortable using it in a production app.

A slight aside, but I spotted this (currently broken) integration of RethinkDB and Meteor the other day and wanted to share. It does away with the long poll Meteor is doing on Mongo. (I have no involvement in this project at all.)

https://github.com/tuhinc/rethink-livedata

http://www.youtube.com/watch?v=YLu_ROrA0YY


Meteor core dev here! We are super excited about Rethink and Slava and I have been talking for a while about an official integration. We scoped multidatabase support out of the upcoming Meteor 1.0 release just to get it out the door faster -- we need to support people using Meteor in production with frozen APIs, and we need to do it yesterday -- but support for Rethink is something I'm very interested in exploring in 2014.

As for the polling on Mongo, you'll love what Meteor is shipping this week. Meteor now by default connects to Mongo as a replication slave and slurps up the replication log to drive your realtime queries.


Excellent news! Came here to see if anyone is doing meteor-rethink integration, and saw this. It will be a killer combo. Keep up the good work.


++ for multidatabase support.


Oh, I was just reading an article about this (http://www.hackreactor.com/blog/Building-a-RethinkDB-Module-...) and I'm sad to hear it's currently broken. I really hope the meteor guys add support for something better than Mongo. I've already run into scaling issues on a meteor app I use to gather analytic information. You really need to implement sharding very early on with MongoDB in order to keep memory use and locking issues under control. To me that means a large investment in hardware and infrastructure just for a personal project.


It amazes me that there is so much money floating around to fund these niche open source NoSQL products.

(Not that I'm not a fan of RethinkDB. I've been playing with it since one of the earliest released builds and find it a really lightweight nice database to use.)


Used to think the same way about most investing till I figured out better.

I do not know the specifics of this particular deal, nor have I used the product, but I hope these points are going to be of some use.

Let us look at it this way. Assume there are 200 funds out there who can do a series A of this size. That would naturally mean that not every fund is going to be either a leader or someone who spots new trends (there are not enough trends out there). Naturally, a lot of them have to invest in deals in other companies in a hot sector.

A lot of investment is momentum-driven and momentum is often driven by the narrative. You have to remember that as long as a successful exit happens, the fund winds up with a good deal irrespective of whether the public (IPO) or the acquiring company (M&A) eventually profits from it. NoSQL has that momentum at the moment.

A healthy start-up ecosystem can easily support more than a handful of companies in a single domain. Once the narrative for the domain really picks up, even the not-so-great ones (again, I have no clue about RethinkDB) stand a good chance of being acquired as long as there is decent enough traction and the sector is so hot that there is pressure on the GPs to make a play in it.

The later they get into the game, the pricier the ticket becomes, but you get lesser risk too.

And all of this is perfectly OK and fair.


Slava @ Rethink here.

This is a really good breakdown. I can't read our investors' minds, but I'm pretty sure this would be a worst case scenario for them. It's certainly not why we're doing Rethink -- if we thought it would be a #5 company in the space, we'd pack up and do something else (life's too short).

The NoSQL market is reminiscent of "horseless carriages" -- as long as you define a technology by an absence of something, you know you're early in the game. Databases are a fundamental part of the technology stack, and they tend to easily stick around for 20-30 years. We think we can build a long-term open source company that will stick around for that long (incidentally, that's why we take conventions in ReQL so seriously -- we imagine millions of programmers fifteen years from now cursing at us for a stupid naming convention).

It's not hard to imagine groundbreaking features in NoSQL products that nobody is shipping. That's why RethinkDB exists, and we think we won't be a niche product for long.


Slava, I love the "horseless carriage" analogy. I'll have to use that as I try to raise money in the NoSQL space :) Seriously though, it also points to a future where the NoSQL name is going to make less and less sense. Anyone have suggestions for the NoSQL database equivalent of the word 'car'?

-Dave (FoundationDB)


Like my sibling commenter said, I prefer names that are more descriptive. I prefer "schemaless" databases, but it depends on what you do. Redis is called NoSQL but it's not a document database, it's more a key-value store with lots of slicing and dicing features.


> Anyone have suggestions for the NoSQL database equivalent of the word 'car'?

Not really, because "NoSQL" databases aren't really a coherent group of things. "Document databases" is a good name for an important subset, though. As are "Column-oriented datastores".


Hey Slava,

First up, congrats. Whatever the back story is, getting funded (save runaway revenue/profits/margins) is always a crucial inflection point in a company/product's lifecycle as an enabler for bigger and better things. Whether those will eventually happen or not, nobody knows. But there are a lot of things that only money can accomplish and investment is a key enabler for a start-up that needs capital to scale/grow. If you find an investor who is aligned extremely well, it is a massive bonus.

Historically, a lot of good has also happened from a combination of events that may not exactly be awesome. Outcomes always trump everything else. So don't sweat the mind-reading angle much!

Can't comment much on the technical aspects of the product as I am not even remotely qualified to do something like that.

I'm excited for your team and I have immense respect for anyone who builds an OSS company. The world owes much to the numerous companies and individuals who release code like this and don't get enough credit for it. So, thank you and hopefully it will come together very well for everyone involved :)


I was thinking about investments the other day (I'm also trying to wrap my brain around this topic). I think it comes down to this:

An investment happens when the investor thinks the company will be worth more to another investor in the future.

That's basically it (and it's basically what you said). The second investor in the sentence above could be another VC, a buyer (M&A), or the stock market (IPO).


RethinkDB is a credible (perhaps major) competitor to MongoDB, which is valued at $1.2B. So this funding seems very reasonable.

There is actually a real shortage of innovative databases. For example, Redis was released just a few years ago, but all the ingredients had been around for decades.

Databases take years to develop, quality demands are very high, and it takes a long time to build the reputation needed to win enterprise users. Also, experienced people are scarce and get 'hired away' by the big guys. It's a very hard field for a start-up.


There's some serious money to be had in this market. I would actually argue this a very low amount.

http://www.marketresearchmedia.com/?p=568

http://wikibon.org/wiki/v/Big_Data_Database_Revenue_and_Mark...


Thanks, there are some good numbers to chew on there, though I am always skeptical about projections, thanks to IDC/Gartner.

From the little I have seen of NoSQL, I like it as a niche use-case DB, not entirely divorced from an RDBMS. Thus, not surprised it has the potential to grow like crazy.


Well, they're funding on the belief that one of them isn't going to remain niche.


RethinkDB is awesome. I have a stealth project which uses RethinkDB in the backend. I moved it over from Mongo over the weekend. It will be revealed soon. But I'm working with about 100 million records. Currently testing it with node, slamming it with thousands of concurrent requests.

Also, I just started a blog series on Rethink. http://www.realpython.com/blog/python/rethink-flask-a-simple...

Next up will be performance testing.

Congrats, RethinkDB.


I'm doing the same thing, launching a service built on RethinkDB and Flask within the next week. I recommend giving Rethink a go if you're contemplating an easy to setup and easy to deploy (AWS supports them) database solution. It also helps that the community behind Rethink are a friendly bunch and the documentation is steadily growing.

I've barely scratched the surface of Rethink in terms of functionality and features; but I definitely see a market for it as a competent and approachable database for people who want something that 'just works'.


Thanks for the post on Flask and Rethink. I am starting a side project using these two and it is always good to come across helpful resources like yours.


How much code had you written with Mongo before? What were your major pain points in switching?


I've been using mongo for about a year.

The only pain points are that the API documentation is somewhat confusing. But they still haven't really launched so I expect that the documentation will be updated with better examples soon.


Could you shoot me a mail at michel@rethinkdb.com with what you found confusing? We are working on making the docs better, and your feedback would really help a lot!

Also nice tutorial! I


I've already spoken with you all on it. This is Michael Herman. :)

Show inputs and outputs!

Love the theme you have going. Cheers!


Your post on flask and rethinkdb is great. Very helpful indeed.


thanks! what would you like to see in the second post?


I like the idea of LTS release.

Btw, how do you know that "Thousands of developers are already building applications backed by RethinkDB;"?

PS: I hope we'll see soon more official drivers...


> How do you know that "Thousands of developers are already building applications backed by RethinkDB;"?

We know in two ways. First, people have told us via GitHub/IRC/e-mail/etc. and the Shirts For Stories page (http://www.rethinkdb.com/community/shirts-for-stories/).

Second, the administration UI checks for version updates, which gives us some information about how many developers are using RethinkDB and how long they stuck around.

There is no perfect way to know because there are many sources of imprecise information, but we're quite confident about the overall conclusions of the data.

> I hope we'll see soon more official drivers...

We've been trying to keep the surface area of the project low for now. Which specific drivers are you interested in? (we probably won't be able to do anything about it for the next ~6 months or so, but having the info helps enormously)


The part with update check is interesting.

Java! My current stack is spring mvc 3, spring-data-mongodb. It's not perfect, but at least I don't have to hack my way into basic db ops...


I'd like to second the call for an official Java(/Scala) driver. Would be real helpful to push RethinkDB forward in the circles I'm in.


Any specific driver in your mind?


Java in particular... I don't want to use a small community supported driver in production. Which may be the reason that popular frameworks will not release support until there is one.

I'm just trying to compare with my current stack: spring mvc 3, spring data mongodb.


+1 to official Java drivers


Scala/Java so you can hire/contract me ;)


Been trying RethinkDB on and off and really like the query language. Does anybody know if there are any RethinkDB hosting/DBaaS services out there? Sysadmin/devops is not my forte, and with LTS coming, I hope companies like MongoHQ/MongoLab/IrisCouch for RethinkDB start to pop up.


Have you tried www.rethinkdbcloud.com ?


I think I stumbled upon it before, looks rough around the edges and very early preview/beta so I kinda passed on it. Will revisit it soon.


Is that hosted by RethinkDB or another company?


It is not hosted by RethinkDB, it's another company.


Congrats to Slava and the RethinkDB team. I'm really looking forward to your continued progress.


Congrats guys! Well deserved.

I have been using RethinkDB over the last month in a new project. If you know that a document store is the right solution for you, take a look at RethinkDB. I evaluated it against some of its competitors, and I must say that I was really amazed at the deep engineering thinking that is going into RethinkDB. The ease and power of its programming model (the use of AST/lambda functions and similar abstractions is awesome) and the attention to ease of deployment and manageability (great UI!) are unparalleled among similar products. RethinkDB is a young product for sure, but one with very bright potential. In addition, being well funded should help alleviate fears and hopefully help it gain further traction.

Best of luck!


Super excited for the rethinkdb team. I absolutely love their database and am currently trying to stop using it for everything.

They're super helpful on IRC (I've been in there multiple times for help with small problems), and they seem like an awesome team overall.


Very cool. Congrats, guys! This was a great talk - http://www.hakkalabs.co/articles/how-rethinkdb-works/


Congrats to Slava, Mike and the RethinkDB team.


I wonder if the low limits on the number of databases and tables will be fixed in 1.12.

I've seen issue 1648 closed for 1.12 and issue 97 to be completed. Are these two enough to fix the limits?

Will it be possible to have, say, 100K tables in a single database? In 1000 databases? Is it possible to have 10K databases?

I've read @coffeemug's explanation that a table is a heavyweight object requiring a few megs of disk space. But that's just a few TBs for 100K tables, which is perfectly fine for a 32 node cluster.
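As a rough sanity check on that estimate, here is a small Python sketch of the arithmetic. The per-table figures are hypothetical stand-ins for "a few megs" (the exact overhead isn't stated here), the helper name is made up, and replication is ignored.

```python
def table_overhead_gb(n_tables, mb_per_table, n_nodes=1):
    """Disk overhead in GB (decimal units), spread evenly over n_nodes.

    mb_per_table is an assumed per-table cost, not a documented RethinkDB figure.
    """
    total_gb = n_tables * mb_per_table / 1000
    return total_gb / n_nodes


# Hypothetical per-table costs, in MB, for 100K tables on a 32 node cluster.
for mb in (3, 10, 30):
    total = table_overhead_gb(100_000, mb)
    per_node = table_overhead_gb(100_000, mb, n_nodes=32)
    print(f"{mb} MB/table: {total:.0f} GB total, {per_node:.2f} GB per node")
```

Even at the high end of those assumed values, the per-node share stays under 100 GB.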

Also, it would be great if you could make a page like MongoDB's "Limits and Thresholds". I understand that you have lots of other things to do, but that one is key in making a decision to use Rethink vs other options.


Congrats to Rethink! Really glad to see such a fascinating product get the funding it deserves!


This is so awesome. I have been excited about RethinkDB as a real MongoDB alternative and was just hesitating over RethinkDB's ability to last. But now I know you will last. Just awesome. I'm going to start using RethinkDB on my next project.


There are two things that bother me about the Node.js driver. The first is that it doesn't have its own repository, you have to go to https://github.com/rethinkdb/rethinkdb/tree/next/drivers/jav... The second one is that it's written in CoffeeScript. That may have been a good idea at the beginning, but if you want to have more traction and more developers looking into the source code I think you should 'translate' it into raw JS.


Read this article, installed and started messing w/it.. Got any stats or insight how this holds up in a real prod environment?

A very crude measurement: I just threw it on a box that I'm 70ms away from, and I'm getting insert responses back in 90ms. On Mongo, which I HATED (I was trying to understand the "old" query language; a while back it was thought, for just a moment, that we could get by without relational algebra), I was at about 200ms. So far so good, but how much can I hammer a particular node?

Looks to be using ~20MB ram on the svr process, works for me..


Just curious, how many indexes were on the structure you were inserting? Also, I am assuming 90ms per structure; if that is correct, then that is equal to 11 inserts per second, which is unfortunately very slow.


> Also, i am assuming 90ms per structure, if that is correct, then that is equal to 11 inserts per second which is unfortunately very slow..

You can't quite divide like that. In Rethink

  r.table('users').insert({ 'name': 'Bob' })
  r.table('users').insert({ 'name': 'Jim' })
is much slower than the equivalent batched insert

  r.table('users').insert([{ 'name': 'Bob' },
                           { 'name': 'Jim' }])
The latter needs only a single network round trip and can batch disk writes, drastically increasing performance relative to multiple individual writes. You can see a similar effect in most other database systems.
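To put rough numbers on the round-trip point, here is a minimal Python sketch of a latency model. It is not RethinkDB code: the function names are made up for illustration, the 70ms round trip and 20ms write cost are the figures quoted upthread, and treating a whole batch as one disk write is a simplifying assumption.

```python
def individual_inserts_ms(n_docs, round_trip_ms, write_ms):
    """Each document is its own query: one round trip and one write apiece."""
    return n_docs * (round_trip_ms + write_ms)


def batched_insert_ms(n_docs, round_trip_ms, write_ms):
    """All documents go in one query: one round trip, and (assumed) one write.

    n_docs is ignored here; a real batch would cost somewhat more as it grows.
    """
    return round_trip_ms + write_ms


for n in (2, 100):
    print(n, individual_inserts_ms(n, 70, 20), batched_insert_ms(n, 70, 20))
```

Under this model the individual inserts grow linearly with the number of documents while the batch stays near the cost of a single query, which is why dividing one second by a single-insert latency understates what a node can absorb.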

There are other nuances to this -- the complexity of measuring things properly is why we haven't published benchmarks to date. Check out this doc on insert performance for more details: http://rethinkdb.com/docs/troubleshooting/#my-insert-queries...


Assuming a TCP connection can be established in 15ms worst case and another 10ms for data transmission, 65ms spent by the DBMS on a single insert is incredibly slow.

I am very aware of the performance gains from batch inserts, batched disk writes, and so on, having worked on the development of a major DBMS.


No, the 90ms includes time in transit, so that's 20ms of server processing time.


That assumes your 70ms in transit is accurate, although you have not provided details on how you measured it. Even so, 20ms of server processing time is only 50 inserts per second, which is simply untenable for a high-performing Rethink.

In effect, if I wanted to do 10000 single writes per second, typical of most interactive systems, I would need 200 nodes to pull that off.


By default Rethink requires an fsync from disk to guarantee data safety. So a typical default query goes like this: client request -> network trip -> wait on disk fsync -> network trip -> ack. That's going to be really slow in any system, and makes 10k writes/sec pretty much impossible on any rotational disk.

In Rethink you can turn on soft durability which will allow buffering writes in memory. That would drastically speed up write performance. Another option is to use concurrent clients. Rethink is optimized for throughput and concurrency, so if you use multiple concurrent clients the latency won't scale linearly.
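The effect of concurrent clients can be sketched with some simple arithmetic in Python. This is an idealized model, not a benchmark: the function names are hypothetical, the 20ms figure is the server-side latency quoted upthread, and the model assumes per-write latency stays flat as clients are added, which a real server will only approximate up to its saturation point.

```python
import math


def writes_per_second(latency_ms, concurrent_clients=1):
    """Upper bound on throughput when each client waits for its previous write."""
    return concurrent_clients * 1000.0 / latency_ms


def clients_needed(target_writes_per_sec, latency_ms):
    """Concurrent in-flight writes required to reach a target rate."""
    return math.ceil(target_writes_per_sec * latency_ms / 1000.0)


print(writes_per_second(20))        # one sequential client at 20ms per write
print(writes_per_second(20, 200))   # 200 concurrent clients, same latency
print(clients_needed(10_000, 20))   # in-flight writes needed for 10k writes/sec
```

The "200" in the nodes estimate above is really a count of in-flight writes. Whether those need 200 machines or 200 connections to far fewer machines depends on how well a node handles concurrency, which this model does not capture.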


We haven't been publishing benchmarks yet because it's hard to do a good, scientifically sound job. Different hardware, different setups, different conditions make things really difficult.

If you run into any performance issues, please let us know, we'd love to fix them!


Congrats on the deal. It is some cool tech.


Congratulations, can't wait for 2.0!


What are the pros or cons of using RethinkDB versus say Cassandra?


Here is a very short summary.

RethinkDB is much easier to set up and operate, has a much more powerful query language, and is much friendlier for application developers.

Cassandra supports high write availability in case of network partitions. RethinkDB does not. The flipside of that is that you (as an app developer) have to deal with conflicts in Cassandra which makes writing applications much more difficult.

If you need high write availability in case of network partitioning, go with Cassandra or Riak. If you don't, go with RethinkDB or MongoDB.


Much easier to set up? What can be much easier than unpacking an archive and running a script to start a node?

IMHO the main difference is the data model. RethinkDB is a document store, Cassandra is a wide-column store. As for the query language, I agree, but this is because Cassandra never includes anything that would not scale out.


great explanation. thank you!


Any notable companies using RethinkDB in production?


Slava @ rethink here. There are hundreds of really cool production use cases of RethinkDB that we know of. We'll be publishing a "who's using it" page soon (we have tons of work and haven't gotten around to this bit yet, but will soon).

Note that Rethink is still in beta. Lots of companies already use it in production, but we advise people to test carefully until we ship a long term support (LTS) release. We'll also offer commercial support options then.


Congrats! Using RethinkDB on several projects. I love the web interface, I love the query language, I love the features. Keep it up!


I tried to find the restrictions/performance costs of storing and accessing medium to large blobs of text in a RethinkDB store (say, for a simple web crawler), but couldn't find docs related to that. Is there a size restriction, or do any of you RethinkDB users or developers out there have other insights?


A blob is generally restricted to 10MB. We haven't published performance data thus far as we're still working on improving performance. If you try Rethink and run into problems, please let us know -- we'd love to fix them!


Good to know - thanks and best of luck to you guys!


Congrats to the RethinkDB team! These guys rock, and I'm looking forward to their future releases.


Coffeemug, I remember you on IRC in #weblocks. Good stuff, hope you took some off the table.


I met Slava in our office a couple of years ago (which seems like a long time ago), and he is one of the wicked smart folks around. Congratulations! Looking forward to the evolution of RethinkDB.


Good luck, and your first page is golden:

    joe@alchemist~$ rethinkdb
    joe@clockwerk~$ rethinkdb -j alchemist:29015
Just guessing, but joe is probably a Dota fan :)


Does anyone know at what percentage equity was this raised?


What is their revenue model?


Build database, sell support. Like any other open-source DB. Or Riak, for instance, charges for site-to-site replication. Generally if you have an open-source project many people are using, you can make money with it.


RavenDB sells the product itself, so you need to pay every month for it.


OK, this explains part of their plan:

http://venturebeat.com/2013/12/16/rethinkdb-grabs-8m-to-show...



