
I've seen a lot of the rest of their code, and most of it is getting better over time. As they grow, they're forced to adopt better habits in order to scale their engineering team. I think you're misunderstanding the type of programmers they are. They didn't use mmap because they are sloppy everywhere; they used mmap because their critical innovation was not in storage. What they really thought was valuable, what they wanted to work on, was the query language and cluster management tools, so they did the simplest thing for storage and moved on (personally I don't understand why they didn't just use BDB; maybe they were afraid of transactions, but I suppose everyone has a little NIH syndrome in their database). Now they're a bit locked into that code, because after bolting on journaling (that architecture is a brilliant but incredibly dirty hack), the code is a mess and I'm sure nobody wants to touch it. In fact, most of the other subsystems have been getting cleaner rewrites; the storage layer is the exception. I think the only way out is a complete replacement, which is what we did, so I feel pretty good about that. So I don't know if I'll convince you, but I've read a lot of their code (especially in the last few weeks, as I've been backporting things from 2.4), and that's the feeling I get about their history and vision. Hope it gives you some insight.


I don't think the problem is that I misunderstand them; I just disagree with them.

I disagree with them on what the minimum viable product for a database is. I come from a storage and service provider background where failures are treated very harshly (a single mistake usually means the death of the company), so I take releasing a product that stores customer data very seriously.

To be honest, this is the biggest attraction of RethinkDB for me. They waited a sufficiently long time, with a commercially backed team of very competent engineers who obviously have the required background, to sit down and DESIGN a database. The query language generates a non-Turing-complete language with a clean AST that has all the right deterministic characteristics to implement a powerful planner/optimizer. Their on-disk format has been a bit in flux, but the core design is excellent and you can see that it has been optimized for very fast range queries. Even the API protocol and serialization were designed with care, not to mention the excellent ReQL language and the attention to detail when integrating drivers into the host language.

Which is the other thing I tend to dislike about Mongo: it reeks of a lack of design. The journalling effort, for instance, is very ad hoc, as you pointed out, and the same goes for GridFS and a lot of the other features they have integrated into the codebase. These are smells that I can't ignore when looking at a product that I need to trust with my data.

The counterargument is to not trust it with your data. But I have yet to find a case where that makes sense and another datastore wouldn't be a better choice.


As I recall it, all the early noise they generated was their excitement about their hot benchmarks and how good mmap was...

They can try and rewrite the web and remove all the silly benchmarks, but they were the loudest "web scale" cowboys back in the beginning and we remember them for it.


This post does not inspire confidence. Sorry, has to be said.



