MongoDB Days (gaiustech.wordpress.com)
70 points by creamyhorror 1430 days ago | 52 comments

Shrug. Oracle suffers from problems that you don't get on z/OS running IMS on top of VSAM files - which, by the way, is hierarchical, not relational, and is what your big ole bank will actually be using.

Talking about the relational model and "sound mathematical underpinnings" is fine, but I've seen dozens.. hundreds?.. of production relational databases and they've all been monsters. Most have been partially or significantly denormalized for good/bad reasons. All have their own warts, usually significant.

If people end up normalizing their MongoDB as the author suggests - well, I'd expect that. It's pretty rare to get your DB model right first time; any thought that you can is probably tinged with madness. If you can have a good stab at it and iterate, you're ahead.

Plus if you've ever worked with a TPS (or TM/2PC) you'll know they're a nightmare. Any volume systems I've worked on dispensed with them and use a reconcile/compensate mechanism.

MongoDB has gone down a replica/sharded model that focuses on Map/Reduce. How successful this is, well that's a different argument - but comparing it to an old-school big-iron mentality is a waste of time.

I think the main revelation in databases was the division of the world into OLTP and OLAP.

So you keep your normalised schema for production and have a methodologically denormalised schema for reporting.
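To make that split concrete, here's a toy sketch in Python/SQLite: a normalised production schema feeding one wide, denormalised table for reporting. The table and column names are invented for illustration, not taken from anything above.

```python
import sqlite3

# OLTP side: normalised tables. OLAP side: one denormalised fact table
# rebuilt from a join, which reporting queries hit instead.
db = sqlite3.connect(":memory:")
db.executescript("""
    -- OLTP: normalised production schema
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id),
                           amount REAL);
    INSERT INTO customer VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
    INSERT INTO orders   VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

    -- OLAP: wide, denormalised table for reporting
    CREATE TABLE report_sales AS
        SELECT o.id AS order_id, c.name, c.region, o.amount
        FROM orders o JOIN customer c ON c.id = o.customer_id;
""")
rows = db.execute(
    "SELECT region, SUM(amount) FROM report_sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 150.0), ('US', 75.0)]
```

Reporting never touches the join; the denormalised table gets rebuilt on whatever schedule the warehouse runs.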

Mind you, computer science courses generally don't teach OLAP -- stuff like dimensional modelling. My university didn't when I was there, though they've rolled it into a larger "Data Mining" course alongside MapReduce and friends.

Agree. & this makes sense from an architectural perspective, since the demands of each domain are different.

I was arguing from practical terms and my own experience. Even in established, DB-heavy, "we have hundreds of DBAs" organisations, you can't have a porcelain schema. It evolves over time, breaks, gets optimised and de-optimised.

The Oracle shops I've worked in don't have one schema, they'll have dozens. Usually because they can't shoehorn emerging requirements into the databases they have. Or because of project timelines. Or they need to be performance tuned in some special way, which in the case of Oracle could be considered a dark art.

These orgs may have just one big data warehouse, but that becomes a dumping ground. It's possible to do your dimensional modelling (I worked on star schemas in the past, but that's about it) -- but it's a big ball of mud to try and weave together.

Even then your OLAP is under stress because it's got a lot of ground to cover. Usually they start out as marketing engines, but get co-opted into all sorts of things. The most common and troublesome is regulatory reporting. Once your data warehouse gets used for something like regulatory reporting you're a bit stuck, as nobody wants to touch it.

Then, because point-to-point integrations between the databases are brittle and cost-prohibitive, they break the golden rule and take data out of the OLAP and put it back into the OLTP. I won't name names, but this is commonplace.

SalesForce is one of the biggest Oracle DB users (I read biggest at one stage, but can't find the reference). Even then the SalesForce data model is so denormalised you could call it one big table[1].

I use MongoDB day-to-day and it's fit for purpose for what I do. That said, I'm not a rabid fan and an alternative/hybrid approach is always on the horizon. What I don't buy is the "the DB world has already solved all these problems", particularly when things like "relational integrity" and "two phase commit" get thrown in the mix - truth is there is plenty to be solved.

[1] http://www.dbms2.com/2011/09/15/database-architecture-salesf...

Does using a document store solve any of these problems, though?

In some limited ways. I don't think the technology and lines of approach are fully mature yet.

I don't know any MongoDB users that I think would be better off on Oracle - especially for the reasons cited in the article.

My own view is the critical pieces will converge. If I could have PostgreSQL with a greatly simplified replication/parallelisation solution I'd be a happy man.

Equally if MongoDB could do a COUNT query in even double the time of psql. Or compress their keys. Or have fine-grained locking... (All of which I'm sure they will get to, just when).

Maybe that's MongoDB 5, PostgreSQL 14, Rethink 3 - I've got no idea and I'm sure I'll look at them all. However, I'm not the guy rocking up to a MongoDB conference and coming away with the conclusion "Oracle had that in 1988".

Once you start adding all those features in, the performance you gained by not doing them evaporates. That is the lesson of MySQL.

My argument is not that MongoDB users would be better off on Oracle. It's that Oracle users would not necessarily be better off on MongoDB, since Oracle actually does many things that the makers of MongoDB claim it can't. If MongoDB had been made by MS, everyone would call that "FUD"...

So we stop trying? The contribution of MySQL to the world of databases is significant.

If your intent is to rebut FUD, then statements like "But just remember that these kids think they’re solving problems that IBM (et al) solved quite literally before they were born in some cases" don't contribute. You're just meeting rhetoric with rhetoric.

I (obviously) disagree. Reinventing the wheel is fun; we all do it. But (to stretch an analogy), if there have been cars for decades, and you claim to have just invented the wheel, and that all previous wheels were square, then I would ask what you (think you) have accomplished - why not a) make a better car and b) tell us why your car is better? If the MongoDB schtick is "we can do 10% of what Oracle can for 1% of the price" then that's great; there is a real market for that (assuming that 10% overlaps with the 10% of Oracle that you happen to use).

I was using MySQL for real work back in the 1990s. I remember them saying back then: you don't need foreign keys - just check it in your application. You don't need transactions either - just handle failures in your application, yadda yadda. And of course, MySQL these days supports all these things (with InnoDB, an Oracle product), because these things weren't there "for lulz" in commercial databases; they were there because people needed them and saw value in them. Now I am getting a complete sense of deja vu with MongoDB. It will have to add transactions. This is inevitable. It will have to add enforced schemas, row-level locking, ACLs, and other features besides. If we remember, let's touch base again in a year, and you'll see I'm right. This is what I mean when I say the lesson of MySQL.
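The "just check it in your application" argument is easy to demonstrate against: an engine-enforced constraint holds no matter which code path does the write. A minimal sketch, with SQLite standing in for any foreign-key-enforcing database (table names invented):

```python
import sqlite3

# With foreign keys enforced by the engine, an orphan row is rejected
# regardless of which application code tried to insert it.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")
db.execute("""CREATE TABLE payment (id INTEGER PRIMARY KEY,
              account_id INTEGER REFERENCES account(id))""")
db.execute("INSERT INTO account VALUES (1)")
db.execute("INSERT INTO payment VALUES (100, 1)")        # valid reference
try:
    db.execute("INSERT INTO payment VALUES (101, 999)")  # no such account
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the engine, not the app, catches it
```

An application-side check can do the same in the happy path, but it races against every other writer that forgets the check; the constraint doesn't.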

And in 10 years, there will be another product, let's call it ZongoDB, and the cycle will repeat again. You don't need this, you can't do that, they'll say, and the MongoDB crew, by now themselves old geezers (this means, in IT, in their 30s) will roll their eyes.

PS I may be an Oracle partisan, but I recognize too when Oracle implements something IBM did back in the 70s...

If by "trying" you mean, random coding expeditions performed by people in too much of a hurry to find out what their forebears discovered, then yes, you stop trying. You stop trying and start reading, start researching, start thinking. There are vast reserves of knowledge just sitting there in the source code, waiting to be read. Looking at the CAP theorem and deciding to make a distributed database to exploit different tradeoffs is very sensible, but hubristically ignoring everything that came before because it's older than a decade is not.

You'll have to be more specific if you want to have a real discussion. Calling MongoDB a random coding expedition doesn't give me anything to go on, except perhaps just to flame back. Same goes with "ignoring everything". It's an absolutist statement that is patently false.

If there is something specific that MongoDB is blatantly or arrogantly ignoring from Oracle (i.e. the original article) -- technical, business model, whatever -- Then call it out.

TokuMX adds compression, fine grained locking, and a lot more to MongoDB. It's a version of MongoDB with the storage code replaced with the same storage core as TokuDB.

Thanks. I'd read about the fractal tree indexes, but not the rest. I'll take a look.

In the 90s, we had "document stores", except we called them object databases.

I'm most interested in the features that commercial DBs like Oracle have that free/open-source DBs like Postgres, MySQL, & NoSQL DBs don't. Are things like "a materialized view (1996!), a continuous query, the result cache" available in any free DBs nowadays?

There's more of this sort of criticism in the following old thread, "SQL Databases Don't Scale":


where a few commenters say (somewhat unpleasant) things like:

- I find that this type of FUD comes about from people that aren't good at designing and implementing large databases, or can't afford the technology that can pull it off, so they slam the technology rather than accept that they, themselves, are the ones lacking. Most of them tend to come from the typical LAMP/SlashDot crowd that only have experience with the minor technologies.

- For me, thousands of transactions per second and 10s of terabytes of data on a single database is normal. It's unremarkable, it's everyday, it's what we do, we have done it for years. And I know of installations handling 10x that. It's only people whose only "experience" is websites who whinge about how RDBMS can't handle their tiny datasets.

- Mr. Wiggins article would be better titled something like "ACID databases have scalability problems, especially cheap ones startups use"

How true are these criticisms nowadays? Is open-source still far behind, or is it (as I think) more than good-enough for 98% of use-cases?

edit: Thanks for the responses, sounds like I'll be trying out Postgres for my upcoming personal project.

> Are things like "a materialized view (1996!), a continuous query, the result cache" available in any free DBs nowadays?

Getting there... http://www.postgresql.org/docs/9.3/static/rules-materialized...

(The rest of his examples are complicated Oracleisms and you'd probably get pretty far with MVs.)

9.3 matviews are a step forward, yet you could probably roll your own matviews with triggers.
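A minimal sketch of that hand-rolled approach, with SQLite standing in for Postgres (all names invented): a summary table kept in step by triggers, which is effectively an eagerly-refreshed materialised view.

```python
import sqlite3

# "Materialised view" maintained by hand: a summary table that triggers
# keep consistent with the base table on every insert and delete.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE sale (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER);
    CREATE TABLE sale_totals (product TEXT PRIMARY KEY, total INTEGER);

    CREATE TRIGGER sale_ins AFTER INSERT ON sale BEGIN
        INSERT OR IGNORE INTO sale_totals VALUES (NEW.product, 0);
        UPDATE sale_totals SET total = total + NEW.qty
            WHERE product = NEW.product;
    END;
    CREATE TRIGGER sale_del AFTER DELETE ON sale BEGIN
        UPDATE sale_totals SET total = total - OLD.qty
            WHERE product = OLD.product;
    END;
""")
db.execute("INSERT INTO sale VALUES (1, 'widget', 3)")
db.execute("INSERT INTO sale VALUES (2, 'widget', 2)")
print(db.execute("SELECT total FROM sale_totals WHERE product='widget'").fetchone())
```

The trade-off versus 9.3's `REFRESH MATERIALIZED VIEW` is the usual one: triggers pay the maintenance cost on every write but are never stale.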

The nice thing with materialized views in DB2 for instance (don't know about oracle) is that they are automatically picked by the optimizer to replace a join. So your logical query is - as it should be - completely exasperated from the physical storage underneath. The DBA just puts in a materialized view and the application will speed up magically :)

I think you mean "completely abstracted from the physical storage underneath" :) Although, I'm sure queries exasperate the storage all the time.

Regardless, I had no idea DB2 was so smart. I guess you get what you pay for.

" so they slam the technology rather than accept that they, themselves, are the ones lacking. Most of them tend to come from the typical LAMP/SlashDot crowd that only have experience with the minor technologies."

Behind Facebook and Twitter there's a big usage of MySQL. Not even PostgreSQL, MySQL.

Of course it's a pain for them, but it works and they're profiting from it.

The question is, how much are they saving by not using Oracle or another proprietary DB? They priced themselves out of the web, and now they can't "cry to mummy" about it.

"For me, thousands of transactions per second and 10s of terabytes of data on a single database is normal"

It helps when you have dedicated top-notch hardware (especially true some years ago), while the startups have to make do with EC2 and EBS (OK, there are some better choices, still).

I just looked at the Oracle price list:


Processor License cost (excludes support):

* Standard Edition: $17,500

* Enterprise Edition: $47,500 (that's the one with Materialized View Query Rewrite)

* Partitioning option: $11,500

* Advanced compression: $11,500 (basic compression is apparently slow?)

Per processor.

This is why all these other databases exist. Very few businesses, certainly not many startups, have the kinds of value of the data stored to warrant that kind of cost.

Yes, very expensive. However, if you're a bank, you may not need too many servers and 50k per server is change.

I believe in the beginning of the web, Oracle wanted to push a per-user pricing model.

Yes: if your website has 10 users, you pay 10 x $PRICE; if it has 100, 100 x $PRICE.

Nobody could work with that (with Oracle)

It all comes down to cost barrier to entry, it's a shame IBM mainframes are still so damn expensive.

Perhaps if somebody could offer {DB2|Oracle|Informix|Sybase} As A Service, like what many providers do with PostgreSQL and MySQL, it would be a different story for startups.

My profession involves moving MSSQL and Sybase databases to MySQL and MariaDB. Then I go and read about some of the features from the article in the Oracle and DB2 documentation and think to myself "I ain't helping fucking nobody" - the higher-ups in enterprises and startups usually only see upfront costs, not long-term benefits.

As per your first question: they are getting there. PostgreSQL now has built-in support for managing materialised views; in MySQL/MariaDB that is still hand-rolled.

MariaDB however does have Virtual Columns and Dynamic columns (to tackle nested tables, as per the article).

Look to MariaDB for the MySQL developments, as Oracle are bringing very little to the table in terms of matching MySQL features to Oracle (not surprising). But they are resolving important security, and infrastructure issues and improving InnoDB heavily.

Also PostgreSQL does have HSTORE, for storing schemaless key/value data, very tasty.

There are a lot of free DB options, so I'd bet against anything but the most esoteric features being available only in a commercial solution.

The foundation of CouchDB queries is the materialized view / continuous query. PgPool provides result caching.

> I'm most interested in the features that commercial DBs like Oracle have that free/open-source DBs [..] don't. Are things like "a materialized view (1996!), a continuous query, the result cache" available in any free DBs nowadays?

I can't answer your question, but looking for feature X in other products is the wrong approach IMHO. You're often better served by looking at what problem feature X is solving, and how that problem is other places. Sometimes the problem doesn't even exist.

You're often better served by looking at what problem feature X is solving, and how that problem is solved other places

Last year I wrote a tool for a bank to suck in MongoDB data from 5 big nodes on physical hardware, into an Oracle database running on a virtual machine. The idea was to make it easier for others to write their reporting queries against an SQL database that they understand. It turns out with the right tweaks the Oracle database also performed a lot better, on a lot less hardware. It was one of the things that really improved my impression of the Oracle database product.

The article also reminds me of how a father and son went to a Microsoft presentation in 2000, where Microsoft showed their solution to the tricky problem of integrating multiple backend servers. Their solution was to have front-end tiers close to the client, with the client getting thinner. The son was very impressed. The father said 'that's what IBM did before the 70s!'

One of the comments in the original article said that developers use MongoDB because they're too lazy to use RDBMS. I don't see anything wrong with that - laziness is a virtue among developers!

Seriously though, we are using MongoDB with great success at StartHQ (https://starthq.com) having done a lot of work with relational databases before. It's a great fit for startups where the schema is constantly evolving & the amount of data stored can be quite small.

Also, by talking to the DB directly, without an ORM, we can keep things really simple. I dread to think of what the same code would look like if we were to use a relational database, either with or without an ORM.

Data locality is not about seek times on disk, it's about network transfer times between different nodes! Linear horizontal scalability often needs sharding of data, and with a document store like MongoDB, you can easily shard a complex denormalized document.

Now, if I have this complex document normalized into several tables, how am I going to easily shard several tables, all such that I can execute successful joins that only need to execute on one leaf node? What if I start reusing a small piece of data in one of these normalized tables? I might be forced to go between network nodes to get this data.
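The routing argument can be sketched in a few lines. Node count, keys, and the hash scheme here are invented for illustration; real systems hash a configured shard key, but the locality point is the same.

```python
import hashlib

# With a denormalised document, one shard-key hash routes the whole
# record to a single node. Normalised rows keyed on different ids can
# land on different nodes, so a join may have to cross the network.
NODES = 4

def shard_for(key: str) -> int:
    """Toy hash-based shard routing."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NODES

# Document store: everything about order "o-42" lives together.
order_doc = {"_id": "o-42", "customer": "c-7",
             "lines": [{"sku": "a", "qty": 2}]}
doc_node = shard_for(order_doc["_id"])

# Normalised: the order row shards by order id, the customer row by
# customer id - no guarantee they end up on the same node.
order_node = shard_for("o-42")
customer_node = shard_for("c-7")
print(doc_node, order_node, customer_node)
```

The document and its own shard key always agree; the reused customer row only co-locates with a given order by luck, which is exactly the cross-node join risk described above.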

Normalization is like premature optimization. I can take any program and modify it such that every piece executes optimally fast, but I am likely to compromise on clarity or to add complexity while doing such a refactoring. In the end, it probably got me no real-world performance boost that mattered. 80/20% rule and all.

Same thing with normalization: automatically making all my data fully normalized from the start is like a bad premature-optimization habit that we are forced into with relational databases out of A) sheer habit and school teachings, and B) lack of easy support for nested structured data.

That is of course what I was referring to when I talked about logical vs physical data models. You define your LDM then selectively denormalize it when you create the PDM.

The MySQL guys are very into sharding too but of course without hash joins, you can't join big tables anyway, so this "optimization" is easy because it costs you nothing.

You can do all of that with a SQL DB. In fact, Postgres works as an amazing key-value store if you want it to. They've even got some damned good performance in hstore compared to many use cases of Mongo. I use MongoDB and SQL DBs every day.

Mongo is much more than a key-value store. Document sub-components and subarrays can be indexed and aggregated against. I don't see how you can do that with hstore.

Yes, I can totally see how Postgres and hstore can run circles around key-value storage.

And by "all that", I would love to know how to use the open-source versions of Postgres or MySQL to have transparently-sharded tables with joins AND have that be in a replicated environment where I can do real-time failover.

I think the partitioning and replication stories with Postgres will have to get a lot less manual before there can be automatic sharding. It doesn't sound beyond the pale to me, but I wouldn't expect it in the next five years.

"We don’t work the way we do because tables are a limitation of the technology, we use the relational model because it has sound mathematical underpinnings, and the technology reflects that†. Where’s the rigour in MongoDB’s model?"

It seems all anti-NoSQL rants are the same whining and refusal to understand. "Sound mathematical base"? Really? So, if I can't describe something mathematically (I can, by the way) it's not worth it? A computer program is a mathematical description, there you have it.

The relational model breaks for very common use cases nowadays. Yes, maybe you think it's fun to do a query across who knows how many tables to get the information you want, but if your website has non-trivial traffic then the solution is usually to add more cache.

That's (one of the reasons) why PostgreSQL has hstore. Beyond the fanboy insistence that you can do everything with relational DBs, Postgres has accepted the reality that you need a more flexible data structure.

Edit: yes, please continue showing your contempt while I have to code around the limitation of relational databases.

A better title might be:

"MongoDB solves problems MySQL didn't solve at the time."

Relational databases are a kernel of mathematical relational beauty buried under a million tons of ugly hacks you have to do to keep them performant.

He has a DBA mindset in a FullStack world. There are many projects now that only require one or two FullStack engineers - not one DBA, one QA, two developers, one deployment/operations, one requirements analyst, etc. In this new FullStack world MongoDB is a great fit: changes are easy (no "alter table add new_thing varchar(200)"); its interface is JavaScript, the same language we use in other places; it's super easy to get started with (definitely not the case with Oracle). Oh, and if we need to scale later, it might scale better too.

Yes, in our brave new FullStack™ world we've gone from having three or four people who are each insanely good at something to having one person who is not insanely good at anything.


Yes, I'd still take "Dave", our database guru at BT - whose first boss was Edsger Dijkstra.

Sounds like the 3-in-1 printer model.

Changes are easy?

How do you migrate documents created before you changed your schema to documents created after? Or do you deal with an entire collection of documents all wildly varying in shape?
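One common answer - not specific to MongoDB - is to version each document and upgrade it lazily at read time, so old and new shapes coexist in the collection. A sketch with invented field names and an invented migration:

```python
# Lazy, read-time schema migration: documents carry a version stamp and
# get upgraded to the current shape whenever they're loaded.

def migrate(doc: dict) -> dict:
    """Upgrade a document to the latest (hypothetical) schema version."""
    v = doc.get("schema_version", 1)
    if v < 2:
        # v1 stored a single 'name'; v2 splits it into two fields.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        doc["schema_version"] = 2
    return doc

old = {"name": "Ada Lovelace"}  # pre-change shape
new = {"first_name": "Grace", "last_name": "Hopper", "schema_version": 2}
print(migrate(old))  # upgraded on read
print(migrate(new))  # already current, left alone
```

The cost is that every reader must route through the migration shim until a background pass (or natural write traffic) has rewritten the whole collection; the alternative is a big one-shot batch rewrite, which is the alter-table moment all over again.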

FullStack? seriously?

I'm not so sure about the criticism of applications being exposed as web services.

If you have a system set up where different parts run on different services, you don't want to have to co-ordinate and resync all the applications when their underlying data structures change.

Web services solve this by having an agreed contract for communicating between systems: service A knows a better way of accessing its own data than service B knows of accessing service A's data.

What if you need to do something that touches both A and B?

Then you would go through both A and B's web services?

Perhaps better-put: MongoDB solves problems OP doesn't understand because he sees them in terms of purely relational SQL databases.

Genuine question: What are these use cases that MongoDB solves better? I'm a fan of several other non-relational databases, but I haven't really encountered any use-cases for MongoDB that make it seem like a good fit.

MongoDB has a nice page which explains the core MongoDB use cases: http://docs.mongodb.org/manual/use-cases/ and also here: https://www.mongodb.com/solutions

What's great specifically about MongoDB (this may apply to others, and/or not be good for your needs, but here goes)

- Easy to set up (including replication)

- Fast

- Data in "JSON" format (it's BSON internally)

- Javascript (used for map/reduce mainly)

- Good availability of libraries

Theoretically, CouchDB would have been better, but I've heard some bad stories (remember UbuntuOne?).

I don't have much experience with other NoSQL DBs (except for Redis - it's great, but I like to call it a "DB Toolkit". I'm not dissing it, I love it).

Mongo is a great way to cache and distribute intermediate results in large computations, for example, if you have an efficient way to serialize and deserialize the actual in-memory representation of your data (e.g. if all your machines are the same architecture, you can just shovel blobs of memory around this way). I wouldn't use Oracle for that, the overheads are too great.
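A sketch of that caching pattern, with a plain dict standing in for the MongoDB collection (with real Mongo the bytes would go in a BSON binary field) and pickle standing in for whatever serializer suits the data:

```python
import pickle

# Intermediate results of a large computation cached as opaque blobs:
# serialise the in-memory object whole and stash it under a key.
cache = {}  # stand-in for a collection keyed by computation id

def put(key: str, value) -> None:
    cache[key] = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)

def get(key: str):
    blob = cache.get(key)
    return None if blob is None else pickle.loads(blob)

intermediate = {"step": 3, "partials": [1.5, 2.25, 3.375]}
put("job-17/step-3", intermediate)
print(get("job-17/step-3"))  # round-trips the exact structure
```

The store never inspects the payload, which is the point: no schema, no mapping layer, just fast key-addressed blobs - the workload where a heavyweight RDBMS's overheads buy you nothing.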

For arbitrarily structured data, document databases are great. Of course, this isn't specific to MongoDB.

Blog author here. I understand perfectly well what problems MongoDB solves. However, I also understand perfectly well what problems RDBMSs solve. This is not about technology; it's a complaint about the hype and spin.

I think you're misrepresenting the OP there. He's not complaining about MongoDB itself. He's complaining that proponents of MongoDB who attended this particular event failed to sell MongoDB because they promoted features that relational databases already solve, but claimed that they don't.
