
MongoDB Days - creamyhorror
http://gaiustech.wordpress.com/2013/04/13/mongodb-days/
======
jwilliams
Shrug. Oracle suffers from problems that you don't get on your z/OS running
IMS on top of VSAM files (which, by the way, is hierarchical, not relational,
and is what your big ole bank will be using).

Talking about the relational model and "sound mathematical underpinnings" is
fine, but I've seen dozens.. hundreds?.. of production relational databases
and they've all been monsters. Most have been partially or significantly
denormalized for good/bad reasons. All have their own warts, usually
significant.

If people end up normalizing their MongoDB as the author suggests, well, I'd
expect that. It's pretty rare to get your DB model right the first time. Any
thought that you can is probably tinged with madness. If you can have a good
stab at it and iterate, you're ahead.

Plus, if you've ever worked with a TPS (or TM/2PC), you'll know they're a
nightmare. The high-volume systems I've worked on all dispensed with them and
used a reconcile/compensate mechanism instead.
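A minimal sketch of the compensate half of a reconcile/compensate approach, assuming each step can be undone by a separate compensating action (the pattern often called a saga). The ledgers, step names and functions below are hypothetical stand-ins for two independent systems. No transaction manager coordinates them, so a failure in a later step is handled by running the compensations for the earlier steps in reverse order.

```python
from typing import Callable, List, Tuple

# A step pairs a forward action with a compensating action that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps newest-first."""
    done: List[Step] = []
    for step in steps:
        _name, action, _compensate = step
        try:
            action()
        except Exception:
            # Roll back what already happened, most recent step first.
            for _n, _a, compensate in reversed(done):
                compensate()
            return False
        done.append(step)
    return True


# Hypothetical in-memory "systems" standing in for two separate databases.
ledger_a = {"balance": 500}
ledger_b = {"balance": 0}


def debit_a() -> None:
    ledger_a["balance"] -= 100


def undo_debit_a() -> None:
    ledger_a["balance"] += 100


def credit_b() -> None:
    raise RuntimeError("system B unavailable")  # simulated failure


def undo_credit_b() -> None:
    ledger_b["balance"] -= 100


ok = run_with_compensation([
    ("debit A", debit_a, undo_debit_a),
    ("credit B", credit_b, undo_credit_b),
])
print(ok, ledger_a, ledger_b)  # False {'balance': 500} {'balance': 0}
```

A production version would also persist which steps completed and run a periodic reconciliation job, since a compensation can itself fail.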

MongoDB has gone down a replica/sharded model that focuses on Map/Reduce. How
successful this is, well that's a different argument - but comparing it to an
old-school big-iron mentality is a waste of time.

~~~
jacques_chester
I think the main revelation in databases was the division of the world into
OLTP and OLAP.

So you keep your normalised schema for production and have a methodically
denormalised schema for reporting.
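As a rough illustration of the OLTP/OLAP split, the sketch below uses Python's built-in sqlite3 as a stand-in for both sides (an assumption made so it is self-contained; real setups usually keep them in separate systems). The table and column names are hypothetical, and the reporting table is a flat denormalised copy rather than a full dimensional (star schema) model with separate fact and dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP side: normalised tables, each fact stored exactly once.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     amount REAL);
INSERT INTO customers VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Reporting side: a denormalised copy rebuilt by a batch job. Customer
# attributes are copied onto every order row so reports need no joins.
cur.executescript("""
DROP TABLE IF EXISTS order_report;
CREATE TABLE order_report AS
SELECT o.id AS order_id, c.name AS customer, c.region AS region, o.amount
FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

totals = cur.execute(
    "SELECT region, SUM(amount) FROM order_report GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('EU', 350.0), ('US', 75.0)]
```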

Mind you, computer science courses generally don't teach OLAP -- stuff like
dimensional modelling. My university didn't when I was there, though they've
rolled it into a larger "Data Mining" course alongside MapReduce and friends.

~~~
jwilliams
Agreed. And this makes sense from an architectural perspective, since the
demands of each domain are different.

I was arguing in practical terms, from my own experience. Even in established,
DB-heavy, "we have hundreds of DBAs" organisations, you can't have a porcelain
schema. It evolves over time, breaks, and gets optimised and de-optimised.

The Oracle shops I've worked in don't have one schema, they'll have dozens.
Usually because they can't shoehorn emerging requirements into the databases
they have. Or because of project timelines. Or they need to be performance
tuned in some special way, which in the case of Oracle could be considered a
dark art.

These orgs may have just one big data warehouse, but that becomes a dumping
ground. It's possible to do your dimensional modelling (I worked on star
schemas in the past, but that's about it) -- but it's a big ball of mud to try
and weave together.

Even then your OLAP is under stress because it's got a lot of ground to cover.
Usually they start out as marketing engines, but get co-opted into all sorts
of things. The most common and troublesome is regulatory reporting. Once your
data warehouse gets used for something like regulatory reporting, you're a bit
stuck, as nobody wants to touch it.

Then, because point-to-point integrations between the databases are brittle
and cost-prohibitive, they break the golden rule and take data _out_ of the
OLAP and put it back into the OLTP. I won't name names, but this is commonplace.

SalesForce is one of the biggest Oracle DB users (I read biggest at one stage,
but can't find the reference). Even then the SalesForce data model is so
denormalised you could call it one big table[1].

I use MongoDB day-to-day and it's fit for purpose for what I do. That said,
I'm not a rabid fan and an alternative/hybrid approach is always on the
horizon. What I don't buy is the line that "the DB world has already solved all
these problems", particularly when things like "relational integrity" and "two
phase commit" get thrown in the mix. Truth is, there is plenty left to be solved.

[1] [http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/](http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/)

~~~
jacques_chester
Does using a document store solve any of these problems, though?

~~~
jwilliams
In some limited ways. I don't think the technology and lines of approach are
fully mature yet.

I don't know any MongoDB users that I think would be better off on Oracle -
especially for the reasons cited in the article.

My own view is the critical pieces will converge. If I could have PostgreSQL
with a greatly simplified replication/parallelisation solution I'd be a happy
man.

Equally, if MongoDB could do a COUNT query in even _double_ the time of psql.
Or compress their keys. Or have fine-grained locking... (All of which I'm sure
they will get to; the only question is when.)

Maybe that's MongoDB 5, PostgreSQL 14, Rethink 3 - I've got no idea and I'm
sure I'll look at them all. However, I'm not the guy rocking up to a MongoDB
conference and coming away with the conclusion "Oracle had that in 1988".

~~~
gaius
Once you start adding all those features in, the performance you gained by not
doing them evaporates. That is the lesson of MySQL.

My argument is not that MongoDB users would be better off on Oracle. It's that
Oracle users would not necessarily be better off on MongoDB, since Oracle
actually does many things that the makers of MongoDB claim it can't. If
MongoDB had been made by MS, everyone would call that "FUD"...

~~~
jwilliams
So we stop trying? The contribution of MySQL to the world of databases is
significant.

If your intent is to rebut FUD, then statements like _"But just remember that
these kids think they’re solving problems that IBM (et al) solved quite
literally before they were born in some cases"_ don't contribute. You're just
meeting rhetoric with rhetoric.

~~~
fusiongyro
If by "trying" you mean, random coding expeditions performed by people in too
much of a hurry to find out what their forebears discovered, then yes, you
stop trying. You stop trying and start reading, start researching, start
thinking. There are vast reserves of knowledge just sitting there in the
source code, waiting to be read. Looking at the CAP theorem and deciding to
make a distributed database to exploit different tradeoffs is very sensible,
but hubristically ignoring everything that came before because it's older than
a decade is not.

~~~
jwilliams
You'll have to be more specific if you want to have a real discussion. Calling
MongoDB a random coding expedition doesn't give me anything to go on, except
perhaps just to flame back. Same goes with "ignoring everything". It's an
absolutist statement that is patently false.

If there is something specific that MongoDB is blatantly or arrogantly
ignoring from Oracle (i.e. the original article) -- technical, business model,
whatever -- then call it out.

------
creamyhorror
I'm most interested in the features that commercial DBs like Oracle have that
free/open-source DBs like Postgres, MySQL, & NoSQL DBs don't. Are things like
"a materialized view (1996!), a continuous query, the result cache" available
in any free DBs nowadays?

There's more of this sort of criticism in the following old thread, "SQL
Databases Don't Scale":

[https://news.ycombinator.com/item?id=690656](https://news.ycombinator.com/item?id=690656)

where a few commenters say (somewhat unpleasant) things like:

- _I find that this type of FUD comes about from people that aren't good at
designing and implementing large databases, or can't afford the technology
that can pull it off, so they slam the technology rather than accept that
they, themselves, are the ones lacking. Most of them tend to come from the
typical LAMP/SlashDot crowd that only have experience with the minor
technologies._

- _For me, thousands of transactions per second and 10s of terabytes of data
on a single database is normal. It's unremarkable, it's everyday, it's what
we do, we have done it for years. And I know of installations handling 10x
that. It's only people who's only "experience" is websites that whinge about
how RDBMS can't handle their tiny datasets._

- _Mr. Wiggins article would be better titled something like "ACID databases
have scalability problems, especially cheap ones startups use"_

How true are these criticisms nowadays? Is open-source still far behind, or is
it (as I think) more than good-enough for 98% of use-cases?

edit: Thanks for the responses, sounds like I'll be trying out Postgres for my
upcoming personal project.

~~~
__alexs
> Are things like "a materialized view (1996!), a continuous query, the result
> cache" available in any free DBs nowadays?

Getting there... [http://www.postgresql.org/docs/9.3/static/rules-materializedviews.html](http://www.postgresql.org/docs/9.3/static/rules-materializedviews.html)

(The rest of his examples are complicated Oracleisms and you'd probably get
pretty far with MVs.)

~~~
omarqureshi
Postgres 9.3 matviews are a step forward, though you could probably roll your
own matviews with triggers.
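One way to roll your own materialized view with triggers, sketched here in SQLite via Python's sqlite3 so it runs without a server; in Postgres the same idea would be written as PL/pgSQL trigger functions. The sales and sales_by_region tables are hypothetical. Unlike a 9.3 matview, which is brought up to date with REFRESH MATERIALIZED VIEW, the summary table here is updated incrementally on every write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT NOT NULL,
                    amount REAL NOT NULL);

-- The hand-rolled "materialized view": one pre-aggregated row per region.
CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL NOT NULL);

CREATE TRIGGER sales_ins AFTER INSERT ON sales BEGIN
    INSERT OR IGNORE INTO sales_by_region (region, total) VALUES (NEW.region, 0);
    UPDATE sales_by_region SET total = total + NEW.amount WHERE region = NEW.region;
END;

CREATE TRIGGER sales_del AFTER DELETE ON sales BEGIN
    UPDATE sales_by_region SET total = total - OLD.amount WHERE region = OLD.region;
END;

-- An update is treated as "remove the old row, add the new row".
CREATE TRIGGER sales_upd AFTER UPDATE ON sales BEGIN
    UPDATE sales_by_region SET total = total - OLD.amount WHERE region = OLD.region;
    INSERT OR IGNORE INTO sales_by_region (region, total) VALUES (NEW.region, 0);
    UPDATE sales_by_region SET total = total + NEW.amount WHERE region = NEW.region;
END;
""")

conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("EU", 10.0), ("EU", 5.0), ("US", 7.5)])
conn.execute("DELETE FROM sales WHERE id = 1")  # removes the first EU row

rows = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region").fetchall()
print(rows)  # [('EU', 5.0), ('US', 7.5)]
```

The trade-off is extra work on every insert, update and delete, and a region whose rows are all deleted is left behind with a total of 0.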

~~~
inflagranti
The nice thing with materialized views in DB2, for instance (don't know about
Oracle), is that they are automatically picked by the optimizer to replace a
join. So your logical query is - as it should be - completely exasperated from
the physical storage underneath. The DBA just puts in a materialized view and
the application will speed up magically :)

~~~
noelherrick
I think you mean "completely abstracted from the physical storage underneath"
:) Although, I'm sure queries exasperate the storage all the time.

Regardless, I had no idea DB2 was so smart. I guess you get what you pay for.

------
kokey
Last year I wrote a tool for a bank to suck in MongoDB data from 5 big nodes
on physical hardware, into an Oracle database running on a virtual machine.
The idea was to make it easier for others to write their reporting queries
against an SQL database that they understand. It turns out with the right
tweaks the Oracle database also performed a lot better, on a lot less
hardware. It was one of the things that really improved my impression of the
Oracle database product.

The article also reminds me of how a father and son went to a Microsoft
presentation in 2000, where Microsoft showed their solution to the tricky
problem of integrating multiple backend servers. Their solution was to have
front-end tiers close to the client, with the client getting thinner. The son
was very impressed. The father said 'that's what IBM did before the 70s!'

------
olegp
One of the comments in the original article said that developers use MongoDB
because they're too lazy to use RDBMS. I don't see anything wrong with that -
laziness is a virtue among developers!

Seriously though, we are using MongoDB with great success at StartHQ
([https://starthq.com](https://starthq.com)) having done a lot of work with
relational databases before. It's a great fit for startups where the schema is
constantly evolving & the amount of data stored can be quite small.

Also, by talking to the DB directly, without an ORM, we can keep things really
simple. I dread to think of what the same code would look like if we were to
use a relational database, either with or without an ORM.

------
astral303
Data locality is not about seek times on disk, it's about network transfer
times between different nodes! Linear horizontal scalability often needs
sharding of data, and with a document store like MongoDB, you can easily shard
a complex denormalized document.

Now, if I have this complex document normalized into several tables, how am I
going to easily shard several tables such that joins only need to execute on
one leaf node? What if I start reusing a small piece of data in one of these
normalized tables? I might be forced to go between network nodes to get this
data.
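To make the co-location point concrete, here is a generic hash-sharding sketch, assuming a fixed number of shards and customer_id as the shard key (both hypothetical; MongoDB's own sharding uses ranged or hashed shard keys split into chunks, which this does not model). A denormalised document routes to one shard as a unit; the normalised rows only land together if every table carries the same shard key.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Denormalised: the whole order, line items included, is one document,
# so routing on customer_id keeps everything for the order on one shard.
order_doc = {
    "customer_id": "cust-42",
    "order_id": "ord-7",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B9", "qty": 1}],
}

# Normalised: the same data split across two "tables". They stay together
# only because every row repeats the shard key (customer_id).
order_row = {"customer_id": "cust-42", "order_id": "ord-7"}
item_rows = [
    {"customer_id": "cust-42", "order_id": "ord-7", "sku": "A1", "qty": 2},
    {"customer_id": "cust-42", "order_id": "ord-7", "sku": "B9", "qty": 1},
]

# A shared lookup table (e.g. product names keyed by sku) has no customer_id
# to route on, so joining against it may have to cross nodes.
shards = {shard_for(r["customer_id"]) for r in [order_row, *item_rows]}
print(shards == {shard_for(order_doc["customer_id"])})  # True
```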

Normalization is like premature optimization. I can take any program and
modify it such that every piece executes optimally fast, but I am likely to
compromise on clarity or to add complexity while doing such a refactoring. In
the end, it probably got me no real-world performance boost that mattered.
80/20 rule and all.

Same thing with normalization: automatically making all my data fully
normalized from the start is like a bad premature-optimization habit that we
are forced into with relational databases out of A) sheer habit and school
teachings, B) lack of easy support for nested structured data.

~~~
wheaties
You can do all of that with a SQL DB. In fact, Postgres works as an amazing
key-value store if you want it to. They've even got some damned good
performance in hstore compared to many use cases of Mongo. I use MongoDB and
SQL DBs every day.

~~~
astral303
Mongo is much more than a key-value store. Document sub-components and
subarrays can be indexed and aggregated against. I don't see how you can do
that with hstore.

Yes, I can totally see how Postgres and hstore can run circles around key-
value storage.

And by "all that", I would love to know how to use the open-source versions of
Postgres or MySQL to have transparently-sharded tables with joins _AND_ have
that be in a replicated environment where I can do real-time failover.

~~~
fusiongyro
I think the partitioning and replication stories with Postgres will have to
get a lot less manual before there can be automatic sharding. It doesn't sound
beyond the pale to me, but I wouldn't expect it in the next five years.

------
raverbashing
"We don’t work the way we do because tables are a limitation of the
technology, we use the relational model because it has sound mathematical
underpinnings, and the technology reflects that†. Where’s the rigour in
MongoDB’s model?"

It seems all anti-NoSQL rants are the same whining and refusal to understand.
"Sound mathematical base"? Really? So, if I can't describe something
mathematically (I can, by the way) it's not worth it? A computer program is a
mathematical description, there you have it.

The relational model breaks for very common use cases nowadays. Yes, maybe you
think it's fun to query across who knows how many tables to get the
information you want, but if your website has non-trivial traffic, the
solution is usually to add more cache.
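Adding cache in this situation usually means a cache-aside read path. A minimal in-process sketch, assuming a TTL-based dictionary cache and a hypothetical load_profile function standing in for the expensive multi-table query; real deployments would typically use an external cache such as memcached or Redis.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Check the cache first; on a miss, run the loader and store the result."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0):
        self.loader = loader  # the expensive query being avoided
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get(self, key: str) -> Any:
        hit = self._store.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]  # fresh cached value
        value = self.loader(key)  # miss or expired: go to the database
        self._store[key] = (time.monotonic() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. after writing that record."""
        self._store.pop(key, None)


calls = []


def load_profile(user_id: str) -> dict:
    # Hypothetical stand-in for a join across several tables.
    calls.append(user_id)
    return {"id": user_id, "name": "example"}


cache = CacheAside(load_profile, ttl_seconds=30)
cache.get("u1")
cache.get("u1")
print(len(calls))  # 1
```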

That's (one of the reasons) why PostgreSQL has hstore. Beyond the fanboy
insistence that you can do everything with relational DBs, Postgres have
accepted the reality that you need a more flexible data structure.

Edit: yes, please continue showing your contempt while I have to code around
the limitations of relational databases.

------
jacques_chester
A better title might be:

"MongoDB solves problems MySQL didn't solve at the time."

------
Pxtl
Relational databases are a kernel of mathematical relational beauty buried
under a million tons of ugly hacks you have to do to keep them performant.

------
don_draper
He has a DBA mindset in a FullStack world. There are many projects now that
only require one or two FullStack engineers, not one DBA, one QA, two
developers, one deployment/operations, one requirements analyst, etc. In this
new FullStack world MongoDB is a great fit: changes are easy (no alter table
add new_thing varchar(200)); its interface is JavaScript, the same language we
use in other places; it's super easy to get started with (definitely not the
case with Oracle); oh, and if we need to scale later, it _might_ scale better
too.

~~~
smacktoward
Yes, in our brave new FullStack™ world we've gone from having three or four
people who are each insanely good at something to having one person who is not
insanely good at anything.

Progress!

~~~
walshemj
Yes, I'd still take "Dave", our database guru at BT, whose first boss was
Edsger Dijkstra.

------
callumjones
I'm not so sure about the criticism of applications being exposed as web
services.

If you have a system set up where different parts run on different services,
you don't want to have to co-ordinate and resync all the applications when
their underlying data structures change.

Web services solve this by having an agreed contract for communicating between
systems: service A knows a better way of accessing its own data than service B
does.

~~~
gaius
What if you need to do something that touches both A and B?

~~~
callumjones
Then you would go through both A and B's web services?

------
area51org
Perhaps better-put: MongoDB solves problems OP doesn't understand because he
sees them in terms of purely relational SQL databases.

~~~
iand675
Genuine question: What are these use cases that MongoDB solves better? I'm a
fan of several other non-relational databases, but I haven't really
encountered any use-cases for MongoDB that make it seem like a good fit.

~~~
a5huynh
MongoDB has a nice page which explains the core MongoDB use cases:
[http://docs.mongodb.org/manual/use-cases/](http://docs.mongodb.org/manual/use-cases/) and also here:
[https://www.mongodb.com/solutions](https://www.mongodb.com/solutions)

