We added it to the mix because it became increasingly difficult to tune the SQL queries involved in building API responses, especially for endpoints that needed to pull data from many tables.
Our limited experience with MySQL operations was also a factor. We're on 5.5 and couldn't do some table operations that seemed promising without service disruptions. There were tools to perform those operations without downtime, but they scared us a bit. We also looked into upgrading to 5.6 or MariaDB, but that seemed like it would take a long time and require a lot of testing, with no guarantee that we would see performance gains.
We looked for alternative solutions and found RethinkDB. We reused the parts that serialize data for the API and put the resulting documents in RethinkDB. Then we had our API request handlers pull data from there instead of from MySQL and added indexes to support various kinds of filtering, pagination, and so on. We built this for our most problematic endpoint and got the two-server cluster up and running in about a week, tried it out on employees for another week, and then enabled it for everyone (with the option to quickly fall back to pulling data from MySQL).
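A minimal sketch of that read path in Python, with plain dicts standing in for RethinkDB and MySQL (every name here is a hypothetical stand-in, not our actual code):

```python
# Sketch of the read path: serve API reads from a denormalized
# document store, falling back to the primary SQL database when the
# store is disabled or a document is missing.

def serialize_for_api(row):
    """Shape a raw DB row into the document the API returns."""
    return {"id": row["id"], "title": row["title"].strip()}

def fetch_from_rethinkdb(store, doc_id):
    # In production this would be a ReQL get() against an indexed table.
    return store.get(doc_id)

def fetch_from_mysql(db, doc_id):
    row = db.get(doc_id)
    return serialize_for_api(row) if row else None

def handle_request(store, db, doc_id, use_document_store=True):
    """Serve from the document store; fall back to MySQL otherwise."""
    if use_document_store:
        doc = fetch_from_rethinkdb(store, doc_id)
        if doc is not None:
            return doc
    return fetch_from_mysql(db, doc_id)
```

The `use_document_store` flag is the quick-fallback switch mentioned above.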
This turned out to work well and we saw good response times, so we did the same thing for other endpoints.
There's some complexity involved in keeping RethinkDB docs up to date with MySQL (where writes still go) but nothing extreme and we haven't had many sync issues.
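That sync logic can be sketched as a write-through pattern with a periodic reconciliation pass (again, dicts stand in for both stores and all helper names are made up):

```python
# Hypothetical write-through sync: every write to MySQL (the system
# of record) also re-serializes the affected document and upserts it
# into the document store, so reads stay eventually consistent.

def serialize_for_api(row):
    # Drop any fields the API never exposes.
    return {"id": row["id"], "title": row["title"]}

def write(db, store, row):
    db[row["id"]] = row                        # write to MySQL
    store[row["id"]] = serialize_for_api(row)  # keep the read copy in step

def delete(db, store, doc_id):
    db.pop(doc_id, None)
    store.pop(doc_id, None)

def find_stale(db, store):
    """Periodic reconciliation: report documents whose cached copy has
    drifted from what serializing the MySQL row would produce."""
    return [doc_id for doc_id, row in db.items()
            if store.get(doc_id) != serialize_for_api(row)]
```

The reconciliation pass is what catches the occasional drift when a write path forgets to update the document store.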
RethinkDB has been rock solid and it's a joy to operate.
Had you looked into using PostgreSQL's materialized views? You can add indexes to the view, with the additional bonus that the view hides those joins from client code.
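A rough sketch of what that looks like (table and column names are hypothetical):

```sql
-- The materialized view precomputes the joins once:
CREATE MATERIALIZED VIEW api_orders AS
SELECT o.id, o.created_at, c.name AS customer_name,
       SUM(li.amount) AS total
FROM orders o
JOIN customers c   ON c.id = o.customer_id
JOIN line_items li ON li.order_id = o.id
GROUP BY o.id, o.created_at, c.name;

-- Indexes go on the view itself, so filtering and pagination stay cheap:
CREATE UNIQUE INDEX ON api_orders (id);
CREATE INDEX ON api_orders (created_at);

-- Refresh when the base tables change:
REFRESH MATERIALIZED VIEW CONCURRENTLY api_orders;
```

The trade-off is that you have to decide when to refresh; `CONCURRENTLY` (PostgreSQL 9.4+) keeps readers unblocked but requires the unique index above.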
Everyone has this problem. But it's been largely solved in practice by performing the schema changes on a slave and then promoting the slave to master.
Also, if you're just using RethinkDB as a delayed (and almost certainly inconsistent) secondary storage system, why not use Elasticsearch instead?
BTW, 250GB fits in memory on any decent size box. You're not really going to see how things scale till you get into the terabytes.
All of our physical boxes have 256GB+ of RAM.
They run VMs via XenServer for most of our uses, but our production DBs run on bare metal, as they need the RAM and disk performance.
If you are looking at larger purchases (50k+ USD), then you should talk directly with Dell, HP, or a comparable vendor and have them play off against each other for your business :)
Rethink's 'ungroup' method lets you chain multiple reductions, which is incredibly powerful for building aggregation queries. Elasticsearch doesn't have an equivalent, which severely limits its aggregations.
For example, with Rethink it's very easy to compute a metric from metrics computed in a previous reduction. You can't do that with Elasticsearch, since its DSL only allows metrics to be computed from fields in the raw document, not from other aggregation metrics.
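To illustrate the "metric from metrics" point, here's a pure-Python analogue of what `group()...ungroup()` enables (the sample events are made up):

```python
from collections import defaultdict

# Second reduction consumes the metrics produced by the first,
# not the raw documents -- the pattern ungroup() makes possible.
events = [
    {"user": "a", "ms": 120}, {"user": "a", "ms": 80},
    {"user": "b", "ms": 300},
    {"user": "c", "ms": 60}, {"user": "c", "ms": 40}, {"user": "c", "ms": 20},
]

# First reduction: per-user totals (what group().sum() would yield).
totals = defaultdict(int)
for e in events:
    totals[e["user"]] += e["ms"]

# Second reduction over those metrics ("ungroup" turns the grouped
# results into an ordinary sequence): each user's share of the total.
grand_total = sum(totals.values())
shares = {user: t / grand_total for user, t in totals.items()}
```

In ReQL the second stage is just another `map`/`reduce` chained after `ungroup()`; the Elasticsearch DSL (at least as described above) has no slot for that second pass.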