
I'm using it for the analytics suite at my company (large ecommerce multinational).

It's weird at first, coming from a background of Access, then SQLite, then MySQL/phpMyAdmin, but you get used to it. I essentially treat it like a gigantic Python dictionary object.

The sharding is too much of a ball-ache to set up, so I've created an optimal way of distributing/mapping files across our cluster to make use of all the machines.
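The comment doesn't describe the actual scheme, so here is only a minimal sketch of one common way to spread files across machines without built-in sharding: hash each filename and take it modulo the number of machines. The hostnames and function names are hypothetical, made up for illustration.

```python
import hashlib

# Hypothetical host list; the original comment doesn't name its machines.
MACHINES = ("node-a", "node-b", "node-c", "node-d")


def machine_for(filename, machines=MACHINES):
    """Deterministically map a file to one machine by hashing its name."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer, then take it mod N.
    index = int.from_bytes(digest[:8], "big") % len(machines)
    return machines[index]


def assign(filenames, machines=MACHINES):
    """Group filenames by the machine each one maps to."""
    plan = {m: [] for m in machines}
    for name in filenames:
        plan[machine_for(name, machines)].append(name)
    return plan
```

Plain modulo hashing reassigns most files whenever a machine is added or removed. If the cluster size changes often, consistent hashing is the usual alternative.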

Data integration is nice. As long as you resist the temptation to print each integrated record to the terminal, pymongo and its C extensions can integrate a ~500-byte record in ~0.0001 seconds.
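A minimal sketch of a quiet bulk load, assuming `collection` is a pymongo `Collection` (anything with an `insert_many(documents, ordered=...)` method works). The batch size of 1000 and the helper name are arbitrary choices, not from the comment, and the timing figures above are the poster's own.

```python
from itertools import islice


def insert_in_batches(collection, records, batch_size=1000):
    """Send records to `collection` in chunks via insert_many, printing nothing.

    Returns the number of records handed to insert_many.
    """
    it = iter(records)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        # ordered=False: the server attempts every document in the batch
        # instead of stopping at the first failure (pymongo still raises
        # BulkWriteError afterwards if any insert failed).
        collection.insert_many(batch, ordered=False)
        total += len(batch)
    return total
```

Against a real server, you might call it like `insert_in_batches(MongoClient()["analytics"]["events"], rows)`, where the database and collection names are hypothetical.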

Basically, the main advantage is not having a schema whatsoever: you can just add random attributes to documents whenever the hell you want. But later you have to be careful with exception handling, since documents might not have the attributes you expect.
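Because documents come back from pymongo as plain Python dicts, the "careful handling" usually amounts to reading fields with defaults instead of indexing directly. Here is a small sketch with hypothetical order documents and a hypothetical `get_path` helper for nested fields:

```python
def get_path(doc, path, default=None):
    """Follow a dotted path like 'shipping.address.country' through nested dicts.

    Returns `default` if a key along the path is missing or a non-dict is reached.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


orders = [
    {"_id": 1, "total": 25.0, "shipping": {"address": {"country": "DE"}}},
    {"_id": 2, "total": 12.5},                      # older doc: no shipping info
    {"_id": 3, "total": 40.0, "coupon": "SPRING"},  # attribute added later
]

rows = [
    (o["_id"], get_path(o, "shipping.address.country", "unknown"), o.get("coupon"))
    for o in orders
]
# rows == [(1, "DE", None), (2, "unknown", None), (3, "unknown", "SPRING")]
```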

Where exactly is the net gain in your situation?

So right off the bat, you've lost the querying power that SQL offers. When dealing with data that's intended to be analyzed, that sounds like a pretty big loss.

Clearly its built-in sharding support, which is often touted as one of its biggest benefits, wasn't suitable for you. So you had to invest some time and effort coming up with an alternate system. That sounds like a loss to me.

When it comes to the schema issue, it sounds like you haven't actually reduced the effort or work in any way, but merely pushed it somewhere else. Like you admit, you still do have to deal with the schema, it's just handled within the application logic, rather than the database. That sounds worse to me, especially if there is more than one application using the database.
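To make the point concrete, here is a rough sketch of where the schema ends up: a check inside the application that every writer to the database would need its own copy of. The field names and types are hypothetical.

```python
# Hypothetical schema that now lives in application code, not in the database.
REQUIRED = {"_id": int, "total": float}
OPTIONAL = {"coupon": str}


def validate(doc):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, typ in REQUIRED.items():
        if field not in doc:
            problems.append(f"missing {field}")
        elif not isinstance(doc[field], typ):
            problems.append(f"{field} should be {typ.__name__}")
    for field, typ in OPTIONAL.items():
        if field in doc and not isinstance(doc[field], typ):
            problems.append(f"{field} should be {typ.__name__}")
    return problems
```

A second application writing to the same collection would have to duplicate this logic exactly, or the data drifts.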

I'm just not seeing the benefit.

right off the bat, you've lost the querying power that SQL offers.

Can you point me to any reasonably-priced database systems that allow me to execute SQL queries against a cluster of shared-nothing machines running commodity hardware? I'm not interested in paying $20-100K/TB/year for Vertica and friends.

SQL can be great if you have vast amounts of money or if your data fits on a single machine. When neither of those is true, though, NoSQL DBs become important.

Well, are we talking about built-in sharding, or rolling your own? If you can roll your own, then I suggest looking at Galera (we use it for our production MySQL stuff): http://www.codership.com

If it has to have built-in sharding then I can't think of anything off-hand.

MySQL Cluster?

Auto sharding, shared nothing, SQL support, it does a pretty good job of things.


MySQL Cluster is another option besides Galera (and FYI, Cluster actually has a NoSQL layer on top of it), but I have to say, if we want to talk about scaling issues with MongoDB, you might not be happy with some of the limitations of NDB.
