
My company uses MongoDB. Our biggest pain points are:

1. MongoDB has massive storage overhead per field due to the BSON format. Even if you use single-character field names, you're still looking at space wasted on type bytes and null terminators, and fixed-width int32/int64 values bloat storage further. We solve this by serializing our objects as binary blobs into the DB, and only using extra fields when we need an index.

2. In Mongo, the entire DB is memory-mapped and relies on the OS paging system, which murders performance once your working set no longer fits in RAM. For a humongous DB, that's a real problem.

3. #1 and #2 force #3, which is sharding. MongoDB requires deploying a "config server" cluster - three additional instances to manage sharding metadata (annoying that the data nodes themselves can't manage this, and expensive from an ops/cost standpoint).
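For a sense of where the BSON bytes go, here's a toy sketch that hand-rolls the encoding of a single-int32-field document straight from the BSON spec (plain Python, no driver involved):

```python
import struct

def bson_int32_doc(name: str, value: int) -> bytes:
    """Hand-rolled BSON for {name: value} with an int32 value,
    showing the per-field overhead: type byte + name + NUL terminator."""
    element = b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)
    body = element + b"\x00"  # trailing NUL byte closes the document
    # The length prefix counts itself plus everything after it.
    return struct.pack("<i", 4 + len(body)) + body

doc = bson_int32_doc("a", 1)
# 4 (doc length) + 1 (type) + 1 (name "a") + 1 (NUL) + 4 (int32) + 1 (doc NUL)
print(len(doc))  # 12
```

So even with a one-character field name, storing a single small integer costs 12 bytes; 7 of those are pure framing.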

What I would like to know is:

1. What is the storage overhead per field of a document in RethinkDB? If it's greater than 1 byte, I'm wary.

2. Where is the .Net driver?




1. In the coming release we'll be storing documents on disk via protocol buffers, which, unlike BSON, have extremely low per-field overhead. A few releases after that we'll be able to do much better via compression of attribute-name information (though this feature isn't specced yet).
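To illustrate why protobuf overhead is so much lower (this is just the public wire format, not anything RethinkDB-specific): the field key is a varint of `(field_number << 3) | wire_type`, so a small integer field costs two bytes total.

```python
def varint(n: int) -> bytes:
    """Protobuf base-128 varint encoding (unsigned)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire format: key varint (field number + wire type), then the value.
    Wire type 0 is varint."""
    return varint((field_number << 3) | 0) + varint(value)

print(len(encode_int_field(1, 1)))  # 2: 1-byte key + 1-byte value
```

Compare that with the 8 bytes of framing BSON spends on the same field.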

2. No ETA yet, but we're about to publish an updated, better-documented, better-architected client-driver-to-server API spec, so we'll be seeing many more drivers soon.


If you use proto-bufs, it means you already have a system for internal auto-schematization. Why not pack all the fields together and use a bit-vector header to signify which fields are present and which fields have default values? I'd LOVE to see a document DB with ~1 bit overhead per field.
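The bit-vector idea, as a toy sketch (the schema, field names, and the all-int32 assumption here are mine, purely for illustration): both sides share a schema, the header is one presence bit per field, and field names never touch the disk.

```python
import struct

# Hypothetical shared schema; all fields int32 for simplicity.
SCHEMA = ["user_id", "score", "visits"]

def pack(doc: dict) -> bytes:
    """Presence bitmap (1 bit per schema field), then present values in order."""
    bitmap = 0
    values = b""
    for i, field in enumerate(SCHEMA):
        if field in doc:
            bitmap |= 1 << i
            values += struct.pack("<i", doc[field])
    return bytes([bitmap]) + values  # 1 header byte covers up to 8 fields

def unpack(blob: bytes) -> dict:
    bitmap, offset, doc = blob[0], 1, {}
    for i, field in enumerate(SCHEMA):
        if bitmap & (1 << i):
            doc[field] = struct.unpack_from("<i", blob, offset)[0]
            offset += 4
    return doc

packed = pack({"user_id": 7, "visits": 3})
print(len(packed))  # 9: 1 bitmap byte + two int32s
```

Missing fields cost nothing beyond their presence bit, and fields equal to a schema default could be dropped the same way.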


Yes, that's pretty much what we're going to do. It's a bit hard to guarantee everything in a fully concurrent, sharded environment so it'll take a bit of time, but that's basically the plan.


#1 - you might find it's not just the per-field overhead, but the per-document one. Check out the powerOf2Sizes setting:

http://docs.mongodb.org/manual/reference/command/collMod/#us...

10gen have been thinking about compression but nothing specific has happened yet (https://jira.mongodb.org/browse/SERVER-164). ZFS + compression is interesting, but not 'production' quality if you're using Linux, and last time I tried to get MongoDB running on Solaris I gave up...
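For reference, turning that setting on for an existing collection looks like this in the mongo shell (collection name is a placeholder):

```
// Allocate record space in powers of two, which reduces fragmentation
// and reallocation churn when documents grow and get moved.
db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: true })
```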


What really bugs me about Mongo:

https://jira.mongodb.org/browse/SERVER-863

The issue has been open for over two and a half years, is one of the most highly voted issues, and has yet to even reach active engineering status.

Agree with you that compression is just a workaround for the awful BSON format.



I meant for RethinkDB (BTW - I've been a contributor for the MongoDB driver).



