Don't use MongoDB

harryh · on Nov 6, 2011

Hi,

I run engineering for foursquare. About a year and a half ago my colleagues and I and made the decision to migrate to MongoDB for our primary data store. Currently we have dozens of MongoDB instances across several different data clusters storing over a TB of data and handling 10s of thousands of requests per second (mostly reads but the write load is reasonably high as well).

Have we run into problems with MongoDB along the way? Yes, of course we have. It is a new technology and problems happen.

Have they been problematic enough to seriously threaten our data? No they have not.

Has Eliot and the rest of his staff @ 10Gen been extremely responsive and helpful whenever we run into problems? Yes, absolutely. Their level of support is amazing.

MongoDB is a complicated beast (as are most datastores). It makes tradeoffs that you need to understand when thinking about using it. It's not necessarily for everyone. But it most certainly can be used by serious companies building serious products. Foursquare is proof of that.

I'm happy to answer any questions about our experience that the HN community might have.

-harryh

samstokes · on Nov 7, 2011

Would you be able to sum up the things you consider Mongo to be extremely good at? Particularly in comparison to things like Riak (which I believe supports a similar data model), or indeed compared to an RDBMS.

All databases perform poorly if you try to use them for use cases they don't fit, but I find with NoSQL databases it can be hard to find concise, objective statements of which use cases each is ideal for.

fedd · on Nov 6, 2011

have users of foursquare run into problems? were they serious? did someone lose money? let's ask. it would answer whether to use an eventually consistent db.

harryh · on Nov 6, 2011

> have users of foursquare run into problems?

Of course we've run into problems from time to time. No one goes from nothing to foursquare's level of success without running into some bumps along the way.

> were they serious? did someone lose money?

No.

> it would answer whether to use an eventually consistent db

MongoDB actually isn't really an eventually consistent datastore. It doesn't (for example) allow writes to multiple nodes across a network partition and then have a mechanism for resolving conflicts.

willvarfar · on Nov 6, 2011

http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/

You had 11 hours downtime and didn't lose money?

What about opportunity cost? Reputation?

Now you have to share your secret :)

(I guess, if you weren't profitable, you had nothing to lose?)

harryh · on Nov 6, 2011

The 11 hours of downtime was a pretty big deal, but it had very little to do with MongoDB. It was basically a huge failure in proper monitoring.

9oliYQjP · on Nov 6, 2011

Kudos for not blaming the tool when that would have been the easiest route. It's worth mentioning that 10gen has MongoDB Monitoring Service out now. It makes monitoring MongoDB instances a lot more accessible and convenient.

willvarfar · on Nov 7, 2011

That's not how everyone remembered it at the time, nor the picture the blog painted. And the mongo monitoring thing is much newer, right? Its like saying that a fire wasn't a big deal because next time there'll be a fire station.

Stockholm syndrome?

Or perhaps investor pressure to close ranks ;)

jm3 · on Nov 6, 2011

If you don't have paying customers (Foursquare), you're not going to lose much in hard dollars when your service falls over. Reputation points? Sure. Dollars, not so much.

maxmcd · on Nov 7, 2011

Drawing a line between the difference in losing money for a paying service and a free service, while technically correct, is not the best business practice. Any online business looses money by being down, whether they can easily quantify it or not.

gsharm · on Nov 6, 2011

Where do you finally persist the data in that case?

antirez · on Nov 6, 2011

I appreciate the "public service" intend of this blog post, however:

1) It is wrong to evaluate a system for bugs now fixed (but you can evaluate a software development process this way, however it is not the same as MongoDB itself, since the latter got fixed).

2) A few of the problems claimed are hard to verify, like subsystems crashing, but users can verify or deny this just looking at the mailing list if MongoDB has a mailing list like the Redis one that is ran by an external company (google) and people outside 10 gen have the ability to moderate messages. (For instance in Redis two guys from Citrusbytes can look/moderate messages, so even if I and Pieter would like to remove a message that is bad advertising we can't in a deterministic way).

3) New systems fails, especially if they are developed in the current NoSQL arena that is of course also full of interests about winning users ASAP (in other words to push new features fast is so important that perhaps sometimes stability will suffer). I can see this myself as even if my group at VMware is very focused on telling me to ship Redis as stable as possible as first rule, sometimes I get pressures about releasing new stuff ASAP from the user base itself.

IMHO it is a good idea if programmers learn to test very well the systems they are going to use with simulations for the intended use case. Never listen to the Hype, nor to detractors.

On the other side all this stories keep me motivated in being conservative in the development of Redis and try avoiding bloats and things I think will ultimately suck in the context of Redis (like VM and diskstore, two projects I abandoned).

moe · on Nov 6, 2011

1) It is wrong to evaluate a system for bugs now fixed

I disagree. A project's errata is a very good indicator for the overall quality of the code and the team. If a database-systems history is littered with deadlock, data-corruption and data-loss bugs up to the present day then that's telling a story.

2) A few of the problems claimed are hard to verify

The particular bugs mentioned in an anonymous pastie may be hard to verify. However, the number of elaborate horror-stories from independent sources adds up.

3) New systems fails, especially if they are developed in the current NoSQL arena

Bullshit. You, personally, are demonstrating the opposite with redis which is about the same age as MongoDB (~2 years).

catwell · on Nov 7, 2011

> Bullshit. You, personally, are demonstrating the opposite with redis which is about the same age as MongoDB (~2 years).

Apparently you have no idea how many critical bugs have been fixed in Redis...

WayneDB · on Nov 6, 2011

I agree with your responses to 1 and 2. I take issue with the example for 3 though because Redis is nowhere near the complexity or feature set of MongoDB.

moe · on Nov 6, 2011

I don't think that counts as an argument.

When you strip MongoDB down to the parts that actually have a chance of working under load then you end up pretty close to a slow and unreliable version of redis.

Namely, Mongo demonstrably slows to a crawl when your working-set exceeds your available RAM. Thus both redis and mongo are to be considered in-memory databases whereas one of them is honest about it and the other not so much.

Likewise Mongo's advanced data structures demonstrably break down under load unless you craft your access pattern very carefully; i.e. growing records is a nono, atomic updates (transactions) are a huge headache, writes starve reads by design, the map-reduce impl halts the world, indexing halts the world, etc. etc.

My argument is that the feature disparity between mongo and redis stems mostly from the fact that Antirez has better judgement over what can be made work reliably and what can not. This is why redis clearly states its scope and limits on the tin and performs like a swiss watch within those bounds.

Mongo on the other hand promises the world and then degrades into a pile of rubble once you cross one of the various undocumented and poorly understood thresholds.

j_baker · on Nov 6, 2011

If I recall correctly, mongo only requires that the index gets stored in memory. The actual data itself can go on disk.

kanwisher · on Nov 7, 2011

If you actually use Mongo in practice, everything needs to be in ram to have any kind of performance

obfuscate · on Nov 6, 2011

It requires neither.

willvarfar · on Nov 6, 2011

facepalm. Indices on disk is a solved problem.

WayneDB · on Nov 6, 2011

You know, I didn't think about how similar Redis and Mongo are at the core when I first read your comment. The first thing that jumped out at me was the large set of disparities.

Thanks for that explanation. I agree that Mongo seems to have over-promised and under-delivered and that you do have to really craft your access pattern. I'm not a heavy MongoDB user, but from reading the docs and playing around, I was already under the impression that the performance of MongoDB is entirely up to me and that I would need a lot of understanding to get the beast working well at scale.

So, it's a tough call for me to say whether they over-promised or not, but like I said...I'm not a heavy user. I just read a lot. I do think it is easy to be deceived by Mongo's apparent simplicity (ie - usage of JSON, Javascript, schema-lessness, etc).

EDIT: zzzeek made a good point below about spending time in a low-key mode before really selling the huge feature-set, which convinced me, so I think you're right. I do like the idea of Mongo though, so hopefully they can get through it.

zzzeek · on Nov 6, 2011

there's something to be said for promoting an application proportionally to the maturity of its implementation. An application with a larger and more sprawling featureset would need to spend several years in "low key" mode, proving itself in production usage by a relatively low number of shops who treat it with caution. I think the issue here is one of premature overselling.

WayneDB · on Nov 6, 2011

Good point.

gfodor · on Nov 6, 2011

At the end of the post the author notes his concern isn't with the technical bugs per se, but with the deep rooted cultural problems and misplaced priorities the existence of those problems reveal.

antirez · on Nov 6, 2011

That's a fair problem, but I think It is true for other products as well and was true for things that we feel very solid today like MySQL. In other words there is a tention between stability and speed of development, a very "hard" tention indeed. It is up to the developers culture and sensibility to balance the two ingredients in the best way.

One of the reasons I don't want to create a company around Redis, but want to stay with VMware forever as an employee developing Redis, is that I don't want development pressures that are not drive by: users, technical arguments. So that I can balance speed of development and stability as I (and the other developers) feel right.

Without direct reference to 10gen I guess this is harder when there is a product-focused company around the product (but I don't know how true this is for 10gen as I don't follow very closely the development and behavior of other NoSQL products).

gfodor · on Nov 6, 2011

MySQL is a poor analogy because the history of MySQL is very similar to 10gen: a 'hacker' solution originally patched together by people who didn't take their responsibility as database engineers very seriously. It's only after years (decades) of work that MySQL has managed to catch up with database technology of the 80s in terms of reliability and stability (and it still has plenty of issues, as the most recent debacles with 5.5 show.)

On the other hand, commercial vendors like Oracle and open source projects like PostgreSQL recognize their role as database engineers is to first and foremost "do no harm." Ie, the database should never destroy data, period. Bugs that get released that do cause such things can be traced back to issues that are not related to a reckless pursuit of other priorities like performance. Watching the PostgreSQL engineers agonize over data integrity and correctness with any and all features that go out that are meant to improve performance is a re-assuring sight to behold.

This priority list goes without saying for professional database engineers. That there is such a 'tension' between stability and speed says less about a real phenomenon being debated by database engineers and more about the fact that many people who call themselves database engineers have about as much business doing so as so-called doctors who have not gone to medical school or taken the Hippocratic oath.

antirez · on Nov 6, 2011

I agree with you but my comments are more about telling what is going on in my opinion, instead of telling what I think should be the right priority list. Even if I agree I still recognize that MySQL had a much bigger effect to the database world compared to PostgreSQL, so the success of a database can sometimes take strange paths.

But I think a major difference between MySQL and Redis, MongoDB, Cassandra, and all the other NoSQL solutions out there is that MySQL had an impressive test bed: all the GPL LAMP applications, from forums to blogs, shipped and users by a shitload of users. We miss this "database gym" so these new databases are evolving in small companies or other more serious production environments, and this creates all the sort of problems if they are not stable enough in the first place.

So what you say can be more important for the new databases than it was for MySQL indeed.

jvehent · on Nov 6, 2011

> MySQL had a much bigger effect to the database world compared to PostgreSQL

And if MySQL never existed, what would have happened ? Would we have all used PostgreSQL in the first place and avoided years of painful instability ?

I read here all the time that fashion and ease of use are more attractive than reliability. And we introduce plenty of new software in complex architecture just because they are easy to use. We even introduce things like "eventual consistency", as if being eventually consistent was even an option for any business.

The problem is to not use random datastores. Use a database that has a proven record of stability. And if someone builds a database, he/she must prove that ACID rules are taken seriously, and not work around the CAP theorem with timestamps...

10 years ago, MySQL was not stable. PostgreSQL was. Today, most key-value databases are not stable, PostgreSQL is.

zzzeek · on Nov 6, 2011

Interesting to note is that early versions of Postgres, we're talking the pre-6 versions around 1995 here, were awful. Not like I was a very sophisticated user at that time myself but it definitely ate my data back then - we switched to MSQL at that time which at least didn't do that.

marshray · on Nov 6, 2011

Wasn't it still basically a university project for researching MVCC at that point? I love universities of course but we must admit they produce interestingly-architected abandonware sometimes.

My sense was that it got a pretty thorough review and revision/rewrite in the transition from Postgres to PostgreSQL.

einhverfr · on Nov 6, 2011

PostgreSQL has evolved a LOT in the last decade even. I thought the university project was looking at OO paradigms in relational databases (inheritance between relations and the like).

The change from Postgres to PostgreSQL was largely a UI/API change and the move from QUEL to SQL. However, over time virtually all of the software has been reviewed and rewritten. It's an excellent project, and I have been using it since 6.5.......

jvehent · on Nov 6, 2011

That was 16 years ago. Since then, PostgreSQL engineers spent a LOT of time proving the reliability of their engine. And today, 16 years later, we can consider it reliable.

Most key-value databases didn't prove (as in: show me actual resistance tests, not supercompany123 uses it) that they are reliable. The day they do, I'll be the first one to use them. Until then, it's just a toy for devs who don't want to deal with ER models.

zzzeek · on Nov 6, 2011

you misunderstand me. I LOVE postgresql. It is the best database ever and I try to use it as much as possible. My only point was, they started out as unstable and untrustworthy just like anything else would.

fdr · on Nov 6, 2011

I agree. There was no WAL logging, for instance. Most people consider 7.4 the first actually-possibly-not-a-terrible-idea release.

Then again, Postgres -- the project -- did not try to position itself (was there even such a thing as "positioning" for Postgres 16 years ago?) as a mature, stable project that one would credibly bet one's business on.

Lots of early database releases are going to be like Mongo, the question is how much the parties at play own up to the fact that their implementation is still immature and present that starkly real truth to their customers. So far, it seems commercial vendors are less likely to do that.

einhverfr · on Nov 7, 2011

Well, 8.0 is really the first really good release.

However, actually-not-a-terrible-idea is pretty relative, when you look at how the industry has evolved in the mean time. I mean, compared to MySQL at the time, PostgreSQL 6.5 was really not a terrible idea. 7.3 was the first release I didn't have to use MySQL as a prototyping system though.

And with 9.x things are getting even better.

ajsharp · on Nov 6, 2011

> And if MySQL never existed, what would have happened ? Would we have all used PostgreSQL in the first place and avoided years of painful instability ?

I think you're missing the point a little. Yes, MySQL is a heap, and having to work with it in a Postgres world sucks. But, the point antirez is making in that comment (at least how I read into it) is that an active user community in ANY project is hugely important in that project's formation and "maturity" (sarcastically, of course, because Postgres is clearly more mature than MySQL). There's no extrapolation here to the top-level Mongo discussion going on in this thread -- I was just clarifying antirez's point.

einhverfr · on Nov 7, 2011

I still think that solid engineering on any project begins with the engineering and leadership of a few, and the feedback of many. So yes, community is important, but less important than the core of that community which is necessarily small.

lurker17 · on Nov 8, 2011

"eventual consistency" was promoted by Amazon, which seems to run a pretty good business.

JonM · on Nov 6, 2011

Where can I get info on the MySQL 5.5 issues? I'm considering upgrading from 5.1 to get the new InnoDB plugin...

asianexpress · on Nov 6, 2011

Just in case you weren't already aware, you can use the InnoDB plugin in 5.1 http://dev.mysql.com/doc/innodb-plugin/1.0/en/index.html

I know benchmarks don't put this quite as fast as 5.5, but there are still possible gains to be made.

willvarfar · on Nov 6, 2011

Pssst look at tokudb

rdtsc · on Nov 6, 2011

> IMHO it is a good idea if programmers learn to test very well the systems they are going to use ...

Great point. It would also help if the company that makes a DB would put flashing banner on their page to explain the trade-offs in their product. Such as "we don't have single server durability built in as a default".

I understand if they are selling dietary supplements and are touting how users will acquire magic properties for trying the product for 5 easy payments of $29.99. In other words I expect shady bogus claims there. But these people are marketing software, not to end users, but to other developers. A little honesty, won't hurt. It is not bad that they had durability turned off. It is just a choice, and it is fine. What is not fine is not making that clear on the front page.

christkv · on Nov 6, 2011

It's good to see a voice of reason. I think we all win if NoSQL is allowed to survive. Having multiple paths to modeling and designing our applications is an enrichment of our ability to create interesting and valuable applications in our industry. The last 10 years have been about living under the modeling constraints of RDBMS's and the industry is slowly waking up to the realization that it does not need to be like this. Now we got choices. Graph db's, column db's, document db's etc.

I would like to thank you for the great job you have and are doing on Redis. It's an awesome piece of technology and warms my heart as an European :). Are you based in Palermo ?

einhverfr · on Nov 6, 2011

"Allowed to survive" is the wrong approach. "Finds a niche" is better.

The fact that software engineers need to understand is that NoSQL is in no way a replacement for SQL in areas of data with inherent structure. In such areas, the relational model wins hands-down, and NoSQL is a big, heavy foot-gun. The caliber of the foot gun goes up significantly when multiple applications need to access the same data.

On the other hand, the relational model breaks down in some ways in many areas. Some things that you'd think are inherently structured (like world-wide street addresses) turn out to only be semi-structured. Document management, highly performing hierarchical directory stores, and a few other areas also are bad matches for the relational model. Other stores work well in many of these areas, from the filesystem to things like NoSQL databases.

The big problem occurs when semi-structured data (say files which contain printed invoice data in PDF format) have to be linked to inherently structured data (say, vendor invoices). In these cases, tradeoffs have to be made......

I have no doubt that NoSQL is able to find a niche. I doubt it will be one which at best involves inherently structured data.

rdtsc · on Nov 6, 2011

> I think we all win if NoSQL is allowed to survive.

What does that even mean? Is it some sort of cultural practice or religion we are afraid of losing. So we should look over lost data and bad designs just because something falls under the "NoSQL" category?

I think anyone married to a technology like it is a religion is poised for failure. Technology should be evaluated as a tool. "Is this tool useful to me for this job?" Yes/No? Not "it has NoSQL in its title, it must be good, I'll use that".

foobarbazetc · on Nov 6, 2011

No shit, nmongo.

Anyone with half a brain can go look at the MongoDB codebase and deduce that it's amateur hour.

It's start up quality code but it's supposed to keep your data safe. That's pretty much the issue here -- "cultural problems" is just another way of saying the same thing.

Compare the code base of something like PostgreSQL to Mongo, and you'll see how a real database should be coded. Even MySQL looks like it's written by the world's best programmers compared to Mongo.

I'm not trying to hate on Mongo or their programmers here, but you've basically paid the price for falling for HN hype.

Most RDBMSes have been around for 10+ years, so it's going to take a long, long time for Mongo to catch up in quality. But it won't, because once you start removing the write lock and all the other easy wins, you're going to hit the same problems that people solved 30 years ago, and your request rates are going to fall to memory/spindle speed.

Nothing's free.

christkv · on Nov 6, 2011

I think the discussion here also misses an important aspect of the conversation which is about application data modeling. Mongo will sooner or later reach a "stable" level as it matures just as mysql, postgres and all other datastores have done. I picked mongo due to the good fit it had to the problems I needed solved not only from the server perspective but from the modeling perspective. The ease of ad-hoc queries and the schemaless nature of the db lent itself well to the kind of problems I wanted to solve.

So even if in 30 years it's got the same characteristics as our current dominant data storage models I consider it a net win that I will be able to use a document oriented database for development over a more traditional RDBMS for some off my applications.

The richer our toolset is the better we are off as not every problems is a nail to be hammered in with an RDBMS.

So a high five to all the people who dare go against convention and take a chance on a new approach to data modeling being it Mongo, Riak, CouchDb, Redis, Neo4j, Cassandra, HBase or any other awesome opensource project out there.

jacques_chester · on Nov 6, 2011

These are not new approaches to data modelling.

Document databases, network databases and hierarchical databases (IMS, CODASYL etc) predate relational databases by decades.

Relational is the universal default for a simple reason. When first introduced it proved to be far better, in every conceivable way, than the technologies it replaced.

It's as simple as that. Relational is a slam-dunk, no-brainer for 99.99% of use cases.

Still, if you really want a fast, proven system for one of the older models, you can get IBM to host stuff for you on a z/OS or z/TPF instance, running IMS. It'll have more predictable performance than AWS to boot.

Vivtek · on Nov 6, 2011

I agree entirely - I think when people rebel against "relational databases" they're actually just realizing that the normalization fetish can be harmful in many application cases.

You're better off with MySQL or PostgreSQL managing a key-value table where the value is a blob of JSON (or XML, which I've done in the past), then defining a custom index, which is pretty damn easy in PostgreSQL. Then you have hundreds of genius-years of effort keeping everything stable, and you still get NoSQL's benefits. Everybody wins.

einhverfr · on Nov 7, 2011

Normalization is a tricky thing. On one hand, highly normalized databases have better flexibility in reporting, IMHO. On the other, you lose some expressiveness regarding data constraints. High degrees of normalization would be ideal if cross-relation constraints were possible. As they are not, typically one has to normalize in part based on constraint dependencies just as much as data dependencies.

einhverfr · on Nov 6, 2011

First, the more I have looked, the more I have found that non-relational database systems are remarkably common and have been for a long time.

The relational model is ideal in many circumstances. However, it breaks down in semi-structured content, content where---parentheses for grouping---(hierarchical structure is important, data is seldom written and frequently read, and where read performance navigating the hierarchy is most important) and so forth.

So I'd generally agree, but not every problem is in fact a nail.

jacques_chester · on Nov 6, 2011

> However, it breaks down in semi-structured content, content where---parentheses for grouping---(hierarchical structure is important, data is seldom written and frequently read, and where read performance navigating the hierarchy is most important) and so forth.

Again, this problem is not new. Database greybeards call this OLAP and it's been around since the 80s.

There is nothing new under the sun in this trade.

einhverfr · on Nov 6, 2011

No. I am talking about something like LDAP, not OLAP. LDAP may suck badly in many many ways but it is almost exactly not like OLAP.

OLAP is typically used to refer to environments which provide complex reports quickly across huge datasets, so a lot of materialized views, summary tables, and the like may be used (as well as CUBEs and the like). Hierarchical directories are different. In a relational model you have to transfers the hierarchy to get the single record you want and you are not aggregating like you typically do in an OnLine Analytical Processing environment.

This is why OpenLDAP with a PostgreSQL backend sucks, while OpenLDAP with a non-relational backend (say BDB) does ok.

I am not saying anything new is under the sun, just that some of the old structures haven't gone away.

jacques_chester · on Nov 7, 2011

I was referring to the read/write preponderance. Normalisation optimises write performance, storage space and also provides strong confidence of integrity. But it means lots of joins, which can slow things down on the read side.

That's why OLAP came along. Structured denormalisation, usually into star schemata, that provide fast ad-hoc querying. I think part of the enthusiasm for NoSQL arises because most university courses and introductory database books will go into normalisation in great detail, but OLAP might only get name checked. So folk can get an incomplete impression of what relational systems can do.

If I had a purely K/V data problem -- a cache, for example -- I would turn to a pure K/V toolset. Memcache, for example.

Hierarchical datasets have long been the blindside for relational systems. Representable, but usually requiring fiddly schemes. But in the last decade SQL has gotten recursive queries, so it's not as big a problem as it used to be.

einhverfr · on Nov 7, 2011

Normalization is formally defined based on data value dependencies. However, because there is no way to set constraints across joins, in practice, the dependencies of data constraints are as important as the dependencies of data values.

As far as recursive queries, I am not 100% sure this is ideal either from a read performance perspective. There are times when recursive queries are helpful from a performance perspective, but I don't see a good way to index, for example, path to a node. Certainly most databases don't do this well enough to be ideal for hierarchical directories. For example indexing the path to a node might be problematic, and I am not even sure you could do this reliably in PostgreSQL because the function involved is not immutable.

jacques_chester · on Nov 7, 2011

Your replies so far are excellent. You're pointing out things I've overlooked, thanks.

> However, because there is no way to set constraints across joins, in practice, the dependencies of data constraints are as important as the dependencies of data values.

I don't follow your argument here. Could you restate it?

> As far as recursive queries, I am not 100% sure this is ideal either from a read performance perspective. There are times when recursive queries are helpful from a performance perspective, but I don't see a good way to index, for example, path to a node.

Poking around the Oracle documentation and Ask Tom articles, it seems to be more art than science; mostly based on creating compound indices over the relevant fields. Oracle is smart enough to use an index if it's there for a recursive field, but will struggle unless there's a compound index for other fields. I don't see an obvious way to create what you might call 'recursive indices', short of having an MV.

> Certainly most databases don't do this well enough to be ideal for hierarchical directories.

It'll never perform as well as a specialised system. But relational never will. An RDBMS won't outperform a K/V store on K/V problems, won't outperform a file system for blob handling and so on. This is just another example of the No Free Lunch theorem in action.

My contention is that we, as a profession of people who Like Cool Things, tend to discount the value of ACID early and then painfully rediscover its value later on. The business value of ACID is not revealable in a benchmark, so nobody writes breathless blog posts where DrongoDB is 10,000x more atomic than MetaspasmCache.

einhverfr · on Nov 7, 2011

> I don't follow your argument here. Could you restate it?

Sure.

Quick note, will use PostgreSQL SQL for this post.

Ok, take a simple example regarding US street addresses.

A street address contains the following important portions:

1) Street address designation (may or may not start with a digit). We will call this 'address' for relational purposes. 2) City 3) State 4) Zipcode

As for data value dependencies:

zipcode is functionally dependent on (city, state), and so for normalization purposes we might create two relations, assuming this is all the data we ever intend to store (which of course is always a bad assumption):

create table zipcode(zipcode varchar(10) not null primary key, city text not null, state text not null, id serial not null unique);

create table street( id serial not null, address text, zipcode_id int references zipcode(id), primary key(address, zipcode_id));

So far this works fine. However, suppose I need to place an additional constraint on (address) for some subset of (zipcodes), let's say all those in New York City. I can't do it declaratively, because all data constraints must be internal to a relation.

So at that point I have two options:

1) You can write a function which determines whether a zipcode_id matches the constraint and check on that, or

2) You can denormalize your schema and add the constraint declaratively.

I did some searching and determined strangely that although subqueries in check constraints are part of SQL92, the only "database" that seems to support them is MS Access. But while there are obvious issues regarding performance, I don't see why these couldn't be solved using indexes the same way foreign keys are typically addressed.

> Poking around the Oracle documentation and Ask Tom articles, it seems to be more art than science; mostly based on creating compound indices over the relevant fields. Oracle is smart enough to use an index if it's there for a recursive field, but will struggle unless there's a compound index for other fields. I don't see an obvious way to create what you might call 'recursive indices', short of having an MV.

No, there is an inherent problem here. Your index depends on other data in the database to be accurate. You can create an index over parent, etc. but you still end up having to check the hierarchy all the way down to find the path. You can't just index the path.

Consider this:

CREATE TABLE treetest (id int, parent int references treetest(id));

INSERT INTO treetest (id, parent) values (1, null), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 6);

The path to 7 is: 1,2,6,7. To find this, I have to hit 4 records in a recursive query. That means 4 scans.

So suppose we index this value, reducing this to one scan.

Then suppose we: update treetest set parent = 3 where id = 6;

and now our index doesn't match the actual path anymore.

With specialized hierachical databases, you could keep such paths indexed and make sure they are updated when any node in the path changes. There isn't a good way to do this in relational systems though because it is outside the concept of a relational index.

> My contention is that we, as a profession of people who Like Cool Things, tend to discount the value of ACID early and then painfully rediscover its value later on. The business value of ACID is not revealable in a benchmark, so nobody writes breathless blog posts where DrongoDB is 10,000x more atomic than MetaspasmCache.

No doubt about that. I think we are 100% in agreement there!

I'd also add that while RDBMS's aren't really optimal as backings for something like LDAP for a big directory, and while RDBMS's are horribly abused by dev's who don't understand them (ORM's and the like), they really are amazing, valuable tools, which are rarely valued enough or used to their fullest.

Later this week, I expect to write a bit of a blog post on http://ledgersmbdev.blogspot.com on why the intelligent database model (for RDBMS's) is usually the right model for the development of many business applications.

einhverfr · on Nov 7, 2011

In response to PostgreSQL's custom index types, taking a quick look at the API, I don't see a way of telling GiST indexes which entries need to be updated when a row's parent id is changed.

Consequently I don't believe there is a reasonable way to index this because there is no way to ensure the indexes are current and so you don't have a good way of testing that a row is in a path on the tree other than building the tree with recursive subqueries.

The thing is, unless you have a system which is aware of hierarchical relationships between the rows (which by definition is outside the relational model), you have no way of handling this gracefully. So here you have lots of reads, I really think dedicated hierarchical systems will win for hierarchical data.

Of course this wouldn't necessarily mean you couldn't store everything in the RDBMS and periodically export it to the hierarchical store.....

jacques_chester · on Nov 7, 2011

Informative replies, thank you.

> 1) You can write a function which determines whether a zipcode_id matches the constraint and check on that, or

Ah, this old chestnut. Been there, written the PL/SQL trigger, got the t-shirt. Agreed that there isn't a purely declarative approach here.

> You can't just index the path.

Postgres might be the winner here, if someone sufficiently motivated came along and wrote a custom index type for this use case.

gnaritas · on Nov 6, 2011

> When first introduced it proved to be far better, in every conceivable way, than the technologies it replaced.

That's not exactly true; what they did was offer a generic query and constraint model that worked well in all cases while offering reasonable performance. They were not generally faster in optimal cases, but they were much easier to query especially given new requirements after the fact because the queries weren't baked into the data model itself. That generic query ability and general data model always come at the cost of speed; always. Document databases have always been faster in the optimal use case.

SemanticFog · on Nov 6, 2011

You're absolutely right -- RDBMSes were designed to solve problems with the nosql-type approaches that preceded them. The nosql bandwagon is blindly rolling into the past, where it will crash into the old problems of concurrency and consistency under load.

BTW if you want nosql-style schema flexibility within an RDBMS, then a simple solution is to store XML or JSON in in a character blob. Keep the fields you need to search over in separate indexed fields. If you make incompatible version changes, then add a new json/xml field.

desas · on Nov 6, 2011

Another solution is to use the hstore feature in postgres to store key value data.

jacques_chester · on Nov 7, 2011

> BTW if you want nosql-style schema flexibility within an RDBMS, then a simple solution is to store XML or JSON in in a character blob.

In all sincerity, I would strongly recommend against this. If your problem absolutely defies normalisation, don't use a relational database.

christkv · on Nov 6, 2011

very true but it's a resurgence of modeling alternatives which can only help to enrich our ability to write interesting applications. yes you can model a social network in a RDBMS but it's not as efficient or as flexible as using neo4j. or yes you can model a key value document in a RDBMS but again it's not a good fit. The right tool for the right problem. You don't build a house with only a hammer so why should we build applications only on one storage concept ?

drdaeman · on Nov 6, 2011

I've opened the source code (at Github), but didn't really understood it. The code seems readable, though.

Do you care to provide some examples for those not familar with proper C++/Boost development practices, please?

itaborai83 · on Nov 6, 2011

I'm curious and I might be missing more than half of my brain. Would you be willing to show some examples of bad coding on their source tree?

krig · on Nov 6, 2011

I looked at using BSON in a project a while back, and ended up scrapping it mainly due to perceived poor code quality. Plenty of potential errors ignored, unclear error messages, unsafe practices.

I was also turned off by the sloppy use of memory. Heap allocated objects returned from functions with poor checks to see if anyone manages that memory on the other side. Lots of instances of strcmp, strcpy and similar unsafe string/buffer manipulation functions.

It's been a while since I looked at it so I don't have any particular examples at hand, but that was my impression.

mushishi · on Nov 6, 2011

I haven't ever used MongoDB but got interested, and first non-trivial source file I picked is this: https://github.com/mongodb/mongo/blob/master/db/btree.cpp

Take a look at for example: bool BtreeBucket<V>::find

Without even thinking about what it is doing, it's quite clear that it is not readable code, and it's not immediately obvious what the high level structure of the logic is. The function does not even fit into two screens so it's hard to reason about; your short-time memory is overused.

felix_krull · on Nov 6, 2011

this is the implementation of a b+ tree. the underlying logic has been very well researched since the 70s.

if there is a part of mongodb that I am sure does not contain bugs, it is that very file you link to.

if you want to know what it does, go out and read the relevant papers on data base technology. or graduate in CS.

jemfinch · on Nov 6, 2011

Clearly you didn't actually read the source file. I graduated in CS. I know B+ trees.

I also know that an 85-line, 7-argument method in a 1988-line file shouldn't depend on a global variable ("guessIncreasing") modified from several other, unrelated functions. I know that in bt_insert, which (apparently) assigns to "guessIncreasing" and then resets it to false just prior to exit, should be using an RAII class to do so instead of trying to catch every exit path, especially in a codebase that uses exceptions.

This code is amateur hour.

mushishi · on Nov 6, 2011

Thanks for attacking me personally. But I have no interest to pursue it more. I made claims that clearly hold true, and they have nothing to do with what you said (I did not say anything about bugs, for example)

pnathan · on Nov 6, 2011

That is characteristic of mathematical code, like btree. (ranty aside: being able to recognize this and find out information regarding btree for maintenance is(should be) one of the key reasons to get a CS degree)

I found the btree file relatively readable. Some macro stuff is not familiar to me, but I am sure I could figure it out in a few hours if I felt like. And I haven't yet rolled around to implementing a full-on btree, ever.

LeafStorm · on Nov 6, 2011

Just for comparison, CouchDB has had one major bug that could cause the loss of data, detailed here: http://couchdb.apache.org/notice/1.0.1.html

The bug was only triggered when the delayed_commits option was on (holds off on fsyncing when lots of write operations are coming in) and there was both a write conflict and a period of inactivity - when the database was shut down, any writes that happened afterwards would not be saved.

They immediately worked to develop a process that would prevent any data from being lost if you didn't shut down the server, then a week later had released an emergency bugfix version without the bug. Then later they released a tool that could recover any data lost from the bug if the database hadn't been compacted.

That's the kind of attitude database developers need to have towards data integrity.

itaborai83 · on Nov 6, 2011

One of the things that I love about Couch is that the standard way to shutdown the process is simply doing a kill -9 on the server process. No data loss. No Worries. Want to back up your data? rsync it and be done with it.

Couch may have its warts, but it is damn reliable.

maxogden · on Nov 6, 2011

I've heard from many people that with Couch you get "all of your disappointment up front"

itaborai83 · on Nov 6, 2011

I feel that Couch has too much server side programming. It can be off puting sometimes. If anyone wants to make some money, I'd suggest them putting a server on top of a couch cluster that receives mongo queries.

I mean, how hard can it be to

1) Manage some indexes,

2) Keep some metadata around and

3) Build some half-assed single index query planner?

Couch is already a solid piece of technology. It just needs a better API to "sit" on top of it, kinda like what Membase is doing now.

edit: or on top of Riak, Cassandra, PostgreSQL or etc ... on the API side, Mongo has clearly won.

markazevedo · on Nov 6, 2011

Which is to say, not much?

latch · on Nov 6, 2011

There's a lot of anonymity going on here. A new HN account, an unknown company and product, and claims with no evidence.

Why are't links to 10gen's Jira provided? Where's the test code that shows the problems they had with the write lock?

This is an extremely shallow analysis.

angelbob · on Nov 6, 2011

And yet he makes some good points. Pretty much all of this is verifiable.

I don't agree with a lot of his conclusions, but mostly his data is correct.

rmc · on Nov 6, 2011

The original author should provide the verifiable evidence.

latch · on Nov 6, 2011

Look, I'm not the best person to do this..but...good points?

1 - Default writes are unsafe by default:

MongoDB supports a number of "write concerns":

* fire-and-forget or "unsafe"

* safe mode (only written to memory, but the data is checked for "correctness", like unique constraint violations)

* journal commit

* data-file commit

* replicate to N nodes

The last 4 can be mixed and matched. Most (all?) drivers allow this to be specified on a per-write basis. It's an incredible amount of flexibility. I don't know of any other store that lets you do that.

When a user registers, we do a journal commit ({j:true}), 'cuz you don't want to mess that up. When a user submits a score, we do a fire-and-forget, because, if we lose a few scores during the 100ms period between journal commit, it isn't the end of the world (for us, if it is for you, always use j:true)

The complaint is the default-behavior (which I think you can globally configure in most drivers) of the driver? Issue a pull request. Is the default table created in MySQL still MyISAM ?

2 and 6 - Lost Data

This is the most damning point. But what can I say? "No?" My word versus his? I haven't seen those issues in production, I hang out in their google groups and I don't recall seeing anyone bring that up - though I do tend to avoid anything complicated/serious and let the 10gens guys handle that. Maybe they did something wrong? Maybe they were running a development release? Maybe they did hit a really nasty MongoDB bug.

3 - Global Lock

MongoDB works best if your working set fits in memory. That should simply be an operation goal. Beyond that, three points. First, the global lock will yield, i believe (someone more informed can verify this). Second, the story gets better with every version and it's clearly high on 10gen's list.

Most importantly though, it's a constraint of the system. All systems have constraints. You need to test it out for your use-case. For a lot of people, the global lock isn't an issue, and MongoDB's performance tends to be higher than a lot of other systems. Yes it's a fact, but with respect to "don't use MongoDB", its FUD. It's an implementation detail, that you should be aware of, but it's the impact of that implementation details, if any, that we should be talking about.

3 and 4 - Sharding

Sharding is easy, rebalancing shards is hard. Sharding is something else which got better in 1.8 and 2.0, which the author thinks we ought to simply dismiss. I don't have enough experience with MongoDB shard management to comment more. I think the foursquare outage is somewhat relevant though (again, keeping in mind that things have improved a lot since then).

7 - "Things were shipped that should have never been shipped"

This is a good verifiable point? I remember using MySQL cluster when it first shipped. That was a disaster. I also remember using MySQL from a .NET project and opened up a good 3-4 separate bugs about concurrency issues where you could easily deadlock a thread trying to pull a connection from the connection pool.

I once had to use use clearcase. Talk about something that shouldn't have shipped.

This is essentially an attack on 10gen, that ISN'T verifiable. Again, it's his anonymous word versus no ones. Just talking about it is giving it unjust attention.

8 - Replication

It's unclear if this is replica sets or the older master-slave replication. Either way, again, I don't think this is verifiable. In fact, I can say that, relatively speaking, I see very few replica set questions in the groups. It works for me, but I have a very small data set, my data pieces themselves are small. Obviously some people are managing just fine (I'm not going to go through their who's who, I think we all know some of the big MongoDB installations).

9 - The "real" problem

We've all seen some pretty horrible things. I was using MySQL in 5.0 and there was some amazing bugs. There's a bug, which I think still exists, where SQL Server can return you the incorrect inserted id (no, not using @@identify, using scope_identity) when you use a multi-core system. MS spent years trying to fix it.

I guess I can say what 10gen never could...If you were using MongoDB prior to 1.8 on a single server, it's your own fault if you lost data. To me, replication as a means to provide durability never seemed crazy. It just means that you have to understand what's going on.

Look, I don't doubt that this guy really ran into problems. I just think they have a large data set with a heavy workload, they thought MongoDB was a silver bullet, and rather than being accountable for not doing proper testing, they want to try and burn 10gen.

They didn't act responsibly, and now they aren't being accountable.

angelbob · on Nov 6, 2011

If you were using MongoDB prior to 1.8 on a single server, it's your own fault if you lost data. To me, replication as a means to provide durability never seemed crazy. It just means that you have to understand what's going on.

Well, except for that thing where the replication decided that the empty set was the most recent and blew everything else away. And those cases where keys went away.

Losing data, particularly when the server goes down, is fine. Even not writing data isn't terrible, though his points about not knowing whether it has been written in case of failure are really good ones. But corrupting data and then replicating that corrupted data is really, really bad. Often unfixably bad.

They didn't act responsibly, and now they aren't being accountable.

For the complaints about the default write stuff, sure. For everything else... Dunno. He brought up a lot of real, actual issues which were not documented MongoDB behavior. Yes, there's also a fair bit of complaining about the documented bits, and sure, boo-hoo, whatever. But the idea that 10gen is shipping stuff with serious data integrity bugs, and doing so knowing, doesn't seem out of line here.

And while MySQL also has some bad stuff, sure, it has nothing like as many data integrity bugs as MongoDB.

And I say all of this as a serious fan of MongoDB.

einhverfr · on Nov 6, 2011

"This is a good verifiable point? I remember using MySQL cluster when it first shipped. That was a disaster. I also remember using MySQL from a .NET project and opened up a good 3-4 separate bugs about concurrency issues where you could easily deadlock a thread trying to pull a connection from the connection pool."

You can STILL deadlock a transaction against itself in MySQL w/Innodb. How do they let this happen? I do not know. I just know I have been bitten by deadlocks in multi-row inserts quite often there enough to get really really frustrated when I use that db. This is in fact documented in the MySQL manual.

For better or worse, projects which start out without a goal to offer highly reliable software from the start never seem to be able to offer it later.

latch · on Nov 6, 2011

I've also seen a lot of SQL Server developers write large stored procedures that manage to easily deadlock. It's been years since I dealt with it...had something to do with lock escalation, from a read lock to an update lock to an insert lock.

You could say "don't use SQL Server"..or you could say "it's important that you understand SQL Server's locking behavior"

einhverfr · on Nov 6, 2011

It's one thing for two transactions to deadlock against eachother. It takes special talent to allow a transaction to deadlock against itself, which InnoDB apparently allows.

I have NEVER had issues with PostgreSQL transactions deadlocking against themselves, even with monstrous stored procedures.

regularfry · on Nov 6, 2011

I honestly have no dog in this race, but an argument which boils down to "MySQL is just as bad" is not one I'd choose to pursue.

latch · on Nov 6, 2011

I spent the time to write all that, and all you got from it is "MySQL is just as bad"...I obviously did a bad job.

edit:

I brought up MySQL because I think we all know that companies, you, me knowingly ship products with bug. In fact, you can look at public bug tracking for a bunch of major software and see bug fixes scheduled for future releases.

However, if you are going to accuse a database vendor of knowingly shipping data-corruption bugs, I think you absolutely have to back that up. It's slanderous. Obviously, if you think that, you also shouldn't use their product. But you either know something the rest of us don't, or you're a complete ass, if you make those kinds of statements without evidence.

regularfry · on Nov 6, 2011

No, of course that's not all I got from it. I was making a point specifically about the comparison you seemed to be making: that because MySQL did something (shipping with stupid defaults, dataloss bugs, whatever), it doesn't count as a black mark against MongoDB if they do the same.

I didn't comment on the rest because I don't care, not because I don't get it.

nomoremongo · on Nov 6, 2011

Pastebin author here.

Refutations are going to fall into two categories, it seems:

1. Questioning my honesty

2. Questioning my competence

Re #1, I'm not sure what you imagine my incentive to lie might be. I honestly just intended this to benefit the community, nothing more. I'm genuinely troubled that it might cause some problems for 10gen, b/c, again, Eliot & co are nice people.

Re #2, all I can do is attempt to reassure you we're generally smart and capable fellows. For example, these same systems exhibit none of these problems, and we're sleeping quite well through the night, on the new database system they've moved to. I'll omit the name of the database system just so there is no conflict that might undermine my integrity and motives (see #1).

edit:

(also, there are a few comments about "someone unknown/new around here"... trust me, I'm not new or unknown. I'm a regular.)

nullymcnull · on Nov 6, 2011

So, you've got direct engagement from the CTO above, and plenty of other commentary to consider here, but you dropped back in only to announce that "hey, all dissent falls into 2 convenient buckets, and here are my quick rebuttals to those strawmen"? Really?

If intellectual laziness like that is any indication, I doubt anyone is going to be reassured on your point #2. You've dropped a bomb on 10gen here, and done it anonymously to boot. You've got their people sifting through past issues on a Sunday, and for what? Because you fucked up a project by making poor choices, and probably took the well-deserved heat for it? Nevermind your categories. Man up and respond to these people directly, or don't respond at all.

ajsharp · on Nov 6, 2011

Some transparency would be appreciated.

- Who are you?

Your HN acct was created 14 hours ago, and it's name is extremely specific to this particular post -- "nomoremongo". Nothing wrong with it, just a little peculiar considering the subject mater.

- Where did you experience these problems?

I guess I could see some issue around revealing where you work, but honestly, that just sucks. It really doesn't have anything to do with questioning your honesty and integrity; it's more about just being open about things. If you're going to be open about your experiences, why not be open about all of it?

Not to be rude, but the anonymous-nature of this post comes off as a bit over-dramatic.

lawnchair_larry · on Nov 6, 2011

I don't understand why anyone is surprised or bothered that this would be an anonymous post.

Has anyone commenting ever used a piece of software that caused them major problems, while watching others with less experience talk about how great it is? For me, it is beyond my capabilities to refrain from speaking up about it.

His identity does not matter, and it would start a war between people or companies. He is not interested in doing that, and he is not speaking on behalf of a company. There is not really any other way to do it.

Sometimes people need to put information out there but don't want to be personally associated with the information. This is fairly logical, because they are not associated with the information. They just discovered what was already true.

nomoremongo · on Nov 6, 2011

Some are also (fairly) questioning "why the anonymity?", and "where is the evidence?"

Those two things are connected: I can't provide the evidence without revealing identity. And the reason for the anonymity is we still have some small databases with 10gen and a current support contract. I had intended to go public with all this after we had transitioned off the system entirely, but more and more reports have continued to pop up of people having trouble with MongoDB, and it seemed as though delaying would be imprudent. An anonymous warning would be more valuable than saying nothing.

So--if you choose to ignore or dismiss our claims, you're entitled. :-) I still feel satisfied that I did what I needed to do.

price · on Nov 7, 2011

Reading this overall thread, I think you didn't have as much of an impact on people's thinking as you'd probably like to. I hope after your company finishes transitioning you do go public on this, with all the specific evidence.

ajsharp · on Nov 7, 2011

Are you willing to reveal your and your company's identity once you're completely off of Mongo?

nomoremongo · on Nov 7, 2011

Yep. I do regret not GPG signing it or something so we could later claim it without more conspiracy theories. But I'll blog about it on an official blog as soon as we're clear of any interest in MongoDB.

endlessvoid94 · on Nov 6, 2011

So why are you doing it anonymously?

johnx123-up · on Nov 7, 2011

For me, the post is very helpful---when MongoDB markets itself MongoDB is a scalable, high-performance, open source, document-oriented database. Written in C++... http://www.mongodb.org/

I presume that anyone would post such content anonymously.

jonpaul · on Nov 6, 2011

I've used MongoDB in production since the 1.4 days. It should be noted that my apps are NOT write heavy. But, many of the author's points can be refuted by using version 2.0.

Regarding the point of using getLastError(), the author is completely correct. But the problem is not so much that MongoDB isn't good, it's that developers start using it and expect it to behave like a relational DB. Start thinking in an asynchronous programming paradigm, and you'll have less problems.

I got bit my MongoDB early on. When my server crashed, I learned real quickly what fsync, journaling, and friends can do. The best thing a dev can do before using MongoDB is to RTFM and understand its implications.

The #1 reason that I used MongoDB, was because of the schema-less models. That's it. Early on in an applications life-cycle, the data model changes so frequently that I find migrations painful and unnecessary.

My two cents, hopefully it helps.

CarlHoerberg · on Nov 6, 2011

Schema-less is imho a overrated feature. ORMs like DataMapper (Ruby) and NHibernate (.NET) can generate the schema on the fly for RMDBS, so no need for migrations pre-production. But when your application is in production you need migrations even with a "schema-less" db! See, rename a field and "all your data" is lost, unless you migrate the data from the old field to the new one..

vidarh · on Nov 6, 2011

"Schema-less" has the potential (if you use it properly) advantage of allowing gradual migration.

As long as your code can handle all versions of objects in current use, you can deploy new code, then either migrate objects as they're updated/rewritten, and/or slowly migrate objects in the background.

For certain types of schema changes in large enough data stores, this can be a killer feature. I remember one RDBMS setup I had to deal with where we were "stuck" having to do a lot of suboptimal schema changes because the changes we actually wanted to do resulted (based on tests in our dev environment) the system to slow to a crawl where it was unusable for 8+ hours and we just couldn't afford that kind of downtime. We spent a lot of engineering time working our way around something that'd simply be a non-issue in a schema-less system.

fauigerzigerk · on Nov 6, 2011

"Schemaless" most of the time means "code based schema". Dealing with multiple schema versions at the same time is always possible, relational or not, but it causes significant bloat and complexity. When I hear gradual migration I think code decay, but I can see why it could be useful sometimes.

In my view, schemaless models are only desirable if the schema is not known until runtime, e.g. user specified fields or message structures, external file formats that you don't control but might need to query, etc.

einhverfr · on Nov 6, 2011

Well, also there is the issue of highly unstructured data. In LedgerSMB, we put it in PostgreSQL along with highly structured data, and just use key-value modelling. These include things like configuration settings for the database in question and the specifics about what a menu item does. I might migrate some of this to hstore in the future (particular the menus).

There are many shortcomings of this approach but when dealing with highly unstructured data (or basically where the inherent structure is that of key/value pairs) it strikes me as the correct approach, and not different really from using NoSQL, XML, or any other non-relational store.

zzzeek · on Nov 6, 2011

right but was that MySQL ? schema migrations are not a problem on quality systems like Oracle and Postgresql. Altering tables and such doesn't stop the database from running at all.

it's always MySQL's fault in these things.

goldmab · on Nov 6, 2011

Fair enough, but you can also have a schemaless store by using JSON fields in PostgreSQL or MySQL.

angelbob · on Nov 6, 2011

Not indexably. But you can do a hideous many-tables-per-real-table thing where each field gets a tall thin table in PostGRES or MySQL, do a lot of joins to get your data, and index the fields in that.

It's not as awful as it sounds, performance-wise. It is as awful as it sounds in terms of maintainability, of course.

joevandyk · on Nov 6, 2011

You can index hstore fields in PostgreSQL.

deoxxa · on Nov 6, 2011

That's not an unfair comparison at all - indexing the data in a JSON blob is entirely possible and practical.

einhverfr · on Nov 6, 2011

What you want to index regarding a large text file and what indexes you can create and use may be different.

christkv · on Nov 6, 2011

Even more common is when you have a mature application with a lot of users and you need to add new fields to f.ex the user table and you can't because alter table across a sharded db setup will take days or weeks so you end up creating a table that's a hashtable

key, value

and then proceed to pay the cost of joins against it. Most of my excitement around NoSql comes from hard earned pain not from "oh new shiny thing, I got to use it".

mechanical_fish · on Nov 6, 2011

I'll take well-understood pain that I can patiently work around, one time, over the course of days or weeks, if the alternative is random bugs that bite you in the night for years at a time.

Joins are no fun, yes, but as you gritted your teeth and implemented those cute little table-based key-value stores, did you find yourself mentally calculating the time required to restore the whole system from backup while muttering tiny prayers? Probably not. Did your code wake up the ops team an average of once per month for several years? Did you lose data? Did you have to put up an apologetic blog post? Did anyone have to get on the phone and rescue customer accounts, one at a time, with profuse apologies and gifts? (Now that is a non-scalable process...)

But at least this argument about maintenance is a real argument. The one about wanting to save time during initial development by skipping the declaration of schemas reads like the punchline of a Dilbert cartoon that you'd find taped to the wall in the devops lunchroom.

christkv · on Nov 6, 2011

@mechanical_fish yes and it was a mysql installation. Weird things happen with all systems once you push them up to the edge of performance both of the hardware and interconnections between servers.

Slow interconnect between servers caused me headaches in the past with mysql for replication. Shared switched did the same. Problems with locks under high contention did the same. Problems with the client libraries the same. In fact all storage systems have similar problems and pain. Some are just more battle tested than others.

kfool · on Nov 6, 2011

(Disclaimer: I work on ChronicDB)

I second that: schema-less is misunderstood.

There's a difference between flexibility of schema definition and flexibility of schema change[1].

Flexibility of schema change, which NoSQL does not solve, is increasingly more important. Not just for large data stores but also for the data development process and release process. To avoid playing the suboptimal schema-change game both the code and the data need to be updated together. Or at least be given the illusion that they have[2].

A probably obvious question most developers must have asked by now is: if we've built great tools to version source changes, how come we haven't built great tools to version data changes?

[1] - http://chronicdb.com/blogs/nosql_is_technologically_inferior...

[2] - http://chronicdb.com/blogs/change_is_not_the_enemy

stickfigure · on Nov 6, 2011

See, rename a field and "all your data" is lost, unless you migrate the data from the old field to the new one

This is not true.

I wrote Objectify, a popular third-party java API to App Engine's datastore. The data migration primitives worked out building Objectify are what ScottH built into Morphia, the Java "ORM" system for MongoDB. With a small number of primitives (mostly @AlsoLoad and lifecycle callbacks) it's possible to make significant structure changes on-the-fly with zero downtime.

This is, IMHO, the best thing about schemaless datastores. There's no longer any compelling reason (at least, in the datastore) to take down a system for "scheduled maintenance".

For more information, here is the relevant section of Objectify's documentation:

http://code.google.com/p/objectify-appengine/wiki/Introducti...

bborud · on Nov 6, 2011

ORMs are a pain to use. In addition to know the domain you need to map from and the domain you map to, you now also have to understand the mapping process.

zzzeek · on Nov 6, 2011

...in exchange for dramatically pared-down and simplified code, consistent data access practices, and hundreds of hours of developer time saved. Driving a car is tough too - how to steer, drivers license, gas, insurance, what a PITA. Yet somehow it remains preferable to walking in many cases, despite the latter being mastered by most two year olds.

bborud · on Nov 7, 2011

If you need all that code to talk to the database I suspect you are in effect using your database as the integration layer. Ouch.

rapind · on Nov 6, 2011

The same should be said for ODMs as well. A document might be a little more straightforward to map to an object but there is still plenty of miss-match.

latch · on Nov 6, 2011

I'll agree with this. Document stores don't solve the object-relational impedance mismatch, but they do help (and personally, I find they help more than "a little").

einhverfr · on Nov 7, 2011

ORMs encourage bad database design and little interoperability on the db level.

On short folks build their db around the ORM instead of vice versa.

rb2k_ · on Nov 6, 2011

My main problem with schema migrations was that once you reach 100 million records or so, those tend to lock down the DB server and take quite a while

einhverfr · on Nov 7, 2011

Let's see. On Pg:

postgres=CREATE TABLE alter_benchmark(id bigint);

CREATE TABLE

postgres=# explain analyze

postgres-# insert into alter_benchmark (id) select * from generate_series(1, 200000000);

postgres=# create temporary table alter_benchmark(id bigint); CREATE TABLE postgres=# explain analyze insert into alter_benchmark (id) select * from generate_series(1, 200000000); QUERY PLAN

-------------------------------------------------------------------------------- ---------------------------------------------------------

Insert (cost=0.00..12.50 rows=1000 width=4) (actual time=1082180.877..1082180. 877 rows=0 loops=1)

   ->  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=4) (a

ctual time=87400.737..512954.539 rows=200000000 loops=1) Total runtime: 1086336.466 ms (3 rows)

postgres=# alter table alter_benchmark add test text;

ALTER TABLE

takes insignificant time (less than a second).

I feel so spoiled using PostgreSQL :-D

As I understand it PostgreSQL doesn't rewrite the table to change the column. It might to change the data type of a column. EXPLAIN ANALYZE doesn't work with ALTER TABLE because there is no query plan generated, so I have no idea how quickly the statement actually executed. All I know is it completed in under a second.

phaylon · on Nov 7, 2011

You could try `time psql < alter-statement.sql`. I know, it'd not really be useful as it measures lots of overhead. But if it's fast on that, it's fast during an active session.

einhverfr · on Nov 7, 2011

I could have turned on timing too (\timing in psql). All I know is it returned within one sec. Oh well, next time, I suppose.

va_coder · on Nov 6, 2011

Schemaless is awesome. Are you dba or a developer? If you're a developer like me schemaless is awesome because of it's flexibility. I focus less time on the how to do stuff and more time on the what stuff should we do.

I've been using Hibernate for 9 years and I finally came to the conclusion that it's just not worth the pain. When working on RDBMS I'm using straight SQL from now on.

einhverfr · on Nov 7, 2011

Schemaless also dispenses with the ability to declare what correct data is in the schema. For critical apps that's a high caliber footgun. For critical apps that have to integrate with eachother, it's a nice piece of artillery aimed squarely at your foot.

ajsharp · on Nov 6, 2011

> But when your application is in production you need migrations even with a "schema-less" db!

I disagree. The most frequent use-case I come across is adding columns / fields to a table / collection, and not needing to ALTER TABLE and run a database migration as part of the deployment process to add said fields is extremely awesome.

kfool · on Nov 6, 2011

> "because of the schema-less models."

> "I find migrations painful and unnecessary."

A schema-less model neither makes a migration less painful nor eliminates it.

In MongoDB, what did you do when the data model changed?

electic · on Nov 6, 2011

We extensively tested this inside Viralheat with a write heavy load of over 30,000 writes per second and basically it failed our test. It is not robust for the analytics world is the conclusion we came to. Though, I hope it gets better one day...it has potential.

jacques_chester · on Nov 6, 2011

Have you talked to Greenplum? They have a postgres derivative that can cope with Yahoo's clickstream data.

(I am not affiliated with either of them).

latch · on Nov 6, 2011

What was the problem? Performance? I'm not sure what robust means to other people, but to me it implies crashes and data-issues.

jdagostino · on Nov 6, 2011

what did you end up going with?

electic · on Nov 6, 2011

Our company is a big data company. So our amazing engineers are responsible for storing hundreds of millions of pieces of data per week AND also crunching and analyzing that data. So basically we need a system where we can have incredible write and read performance but also a system that is elastic in nature. Most importantly, it has to be available.

Before I go into more details, MongoDB is great for most people who don't have a high transaction volume. It is easy to setup and easy to use. So if you are in this camp, MongoDB is probably a good fit for you.

We did about two months worth of extensive tests in our lab. Basically two things didn't bode well for us. One, the locking killed reading...we just had a hard time keeping the flow of writes and the flow of data to our statistics cluster alive. Yea, you could use replication but that too didn't work too well performance wise. Two, the sharding didn't seem that robust. As the cluster got bigger and bigger, we started noticing the overhead of keeping it up was getting to be too great. Rather than write in detail, I think this article covers some of the scaling issues we experienced:

http://blog.schmichael.com/2011/11/05/failing-with-mongodb/

We finally used a hybrid system. We went with Membase, now CouchBase, to handle immediate storage and we are now implementing Hadoop for our long term storage needs.

P.S. Our entire stack is a KV in nature.

Goldcap · on Nov 6, 2011

Just reading about your transactional volume, it seems like at it's face MongoDB wouldn't be a good fit for this project. 30k per second is not anywhere MongoDB pretends to live, I think by their own admission. And Sharding in MongoDB, while being called a core feature, was bolted on after core development, probably intended to give Mongo some credibility with those who want it to be more scalable. IMHO if you need that kind of scalability, you're already straying from the Mongo Niche, 2.0.0 notwithstanding.

So agreeing with a point earlier, if you don't like a write lock implementation, and have concerns about scaling, and have a huge transactional volume, just really not something that fits well with MongoDB.

I've been using Mongo now (currently using 1.8) for three (is it almost three now?) years, 2 million hits/day, with a replicated set, and while I've needed maintenance, reindexing, and (gasp) restarts on occasion, never had any of the problems identified by the author of this post.

Bottom line, sounds to me like someone was in over someone's head from an architectural standpoint, made a bad choice of MongoDB, and then blamed 10gen for his own lack of foresight. So while I empathize with the struggle, I fault him for not knowing his options in advance, TESTING first, then betting the farm on a fairly new opensource codebase.

LOTS of other database solutions that would scale better. Analyzing lots and lots of transactional stateless data with MongoDB map-reduce? Well, just kinda like killing yourself by trying to sprint up from the bottom of the Grand Canyon. "You really tried to do that?"

christkv · on Nov 6, 2011

there's an initial hadoop plugin for mongo that might be a better fit for doing map-reduce over large datasets https://github.com/mongodb/mongo-hadoop

bdarfler · on Nov 6, 2011

We easily support 10s of millions of writes and reads against Mongo per hour on a very small (single digit) number of shards in the cloud (i.e. crappy disk I/O). While that is around an order of magnitude less than 30k a second I would be surprised if we couldn't scale mostly linearly by adding shards.

P.S. If your stack is KV then you should use a KV store.

electic · on Nov 6, 2011

"I would be surprised if we couldn't scale mostly linearly by adding shards"

MongoDB aside, why should you assume? You should test the heck out of any DB solution before using it to base your product on.

nikcub · on Nov 6, 2011

Links about Foursquare's problems with MongoDB. The site was down for a while when their 1.6 instance crashed:

* http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/

* http://www.infoq.com/news/2010/10/4square_mongodb_outage

* http://groups.google.com/group/mongodb-user/browse_thread/th...

I like MongoDB, it is easy to setup, work with and to understand. I think it has an opportunity to become the mysql of nosql (in more ways than one)

Foursquare and 10gen (the makers of MongoDB) share USV as an investor.

vannevar · on Nov 6, 2011

It should be noted that this was not really a problem with MongoDB. Foursquare used a poorly-chosen shard key that caused a disproportionate load on one of its shards, and on top of that did not have proper system monitoring in place to alert them that a server was running out of RAM. It should also be noted that no data was lost in the process of resolving the problem.

latch · on Nov 6, 2011

And both companies were extremely transparent about it and the community generally appreciated the way it was handled:

https://groups.google.com/forum/#!topic/mongodb-user/UoqU8of...

nikcub · on Nov 6, 2011

I thought both Foursquare and 10gen handled the situation then very well, especially considering how much traction the story got (it had all the elements - a popular service, a popular new database, etc.)

I was sort of suggesting that this anonymous post may have come from somebody at Foursquare, since what is described kinda matches what happen there. The 'politics' element could also match because of the common investor - but I see that both 10gen and 4sq have responded here saying that they do not know who wrote this - which I believe.

ehwizard · on Nov 6, 2011

From CTO of 10gen

First, I tried to find any client of ours with a track record like this and have been unsuccessful. I personally have looked at every single customer case that’s every come in (there are about 1600 of them) and cannot match this story to any of them. I am confused as to the origin here, so answers cannot be complete in some cases.

Some comments below, but the most important thing I wanted to say is if you have an issue with MongoDB please reach out so that we can help. https://groups.google.com/group/mongodb-user is the support forum, or try the IRC channel.

> 1. MongoDB issues writes in unsafe ways by default in order to win benchmarks

The reason for this has absolutely nothing to do with benchmarks, and everything to do with the original API design and what we were trying to do with it. To be fair, the uses of MongoDB have shifted a great deal since then, so perhaps the defaults could change.

The philosophy is to give the driver and the user fine grained control over acknowledgement of write completions. Not all writes are created equal, and it makes sense to be able to check on writes in different ways. For example with replica sets, you can do things like “don’t acknowledge this write until its on nodes in at least 2 data centers.”

> 2. MongoDB can lose data in many startling ways

> 1. They just disappeared sometimes. Cause unknown.

There has never been a case of a record disappearing that we either have not been able to trace to a bug that was fixed immediately, or other environmental issues. If you can link to a case number, we can at least try to understand or explain what happened. Clearly a case like this would be incredibly serious, and if this did happen to you I hope you told us and if you did, we were able to understand and fix immediately.

> 2. Recovery on corrupt database was not successful, pre transaction log.

This is expected, repairing was generally meant for single servers, which itself is not recommended without journaling. If a secondary crashes without journaling, you should resync it from the primary. As an FYI, journaling is the default and almost always used in v2.0.

> 3. Replication between master and slave had gaps in the oplogs, causing slaves to be missing records the master had. Yes, there is no checksum, and yes, the replication status had the slaves current

Do you have the case number? I do not see a case where this happened, but if true would obviously be a critical bug.

> 4. Replication just stops sometimes, without error. Monitor > your replication status!

If you mean that an error condition can occur without issuing errors to a client, then yes, this is possible. If you want verification that replication is working at write time, you can do it with w=2 getLastError parameter.

> 3. MongoDB requires a global write lock to issue any write

> Under a write-heavy load, this will kill you. If you run a blog, you maybe don't care b/c your R:W ratio is so high.

The read/write lock is definitely an issue, but a lot of progress made and more to come. 2.0 introduced better yielding, reducing the scenarios where locks are held through slow IO operations. 2.2 will continue the yielding improvements and introduce finer grained concurrency.

> 4. MongoDB's sharding doesn't work that well under load

> Adding a shard under heavy load is a nightmare. Mongo either moves chunks between shards so quickly it DOSes the production traffic, or refuses to more chunks altogether.

Once a system is at or exceeding its capacity, moving data off is of course going to be hard. I talk about this in every single presentation I’ve ever given about sharding[0]: do no wait too long to add capacity. If you try to add capacity to a system at 100% utilization, it is not going to work.

> 5. mongos is unreliable

> The mongod/config server/mongos architecture is actually pretty reasonable and clever. Unfortunately, mongos is complete garbage. Under load, it crashed anywhere from every few hours to every few days. Restart supervision didn't always help b/c sometimes it would throw some assertion that would bail out a critical thread, but the process would stay running. Double fail.

I know of no such critical thread, can you send more details?

> 6. MongoDB actually once deleted the entire dataset

> MongoDB, 1.6, in replica set configuration, would sometimes determine the wrong node (often an empty node) was the freshest copy of the data available. It would then DELETE ALL THE DATA ON THE REPLICA (which may have been the 700GB of good data)

> They fixed this in 1.8, thank god.

Cannot find any relevant client issue, case nor commit. Can you please send something that we can look at?

> 7. Things were shipped that should have never been shipped

> Things with known, embarrassing bugs that could cause data problems were in "stable" releases--and often we weren't told about these issues until after they bit us, and then only b/c we had a super duper crazy platinum support contract with 10gen.

There is no crazy platinum contract and every issue we every find is put into the public jira. Every fix we make is public. Fixes have cases which are public. Without specifics, this is incredibly hard to discuss. When we do fix bugs we will try to get to users as fast as possible.

> 8. Replication was lackluster on busy servers

This simply sounds like a case of an overloaded server. I mentioned before, but if you want guaranteed replication, use w=2 form of getLastError.

> But, the real problem:

> 1. Don't lose data, be very deterministic with data

> 2. Employ practices to stay available

> 3. Multi-node scalability

> 4. Minimize latency at 99% and 95%

> 5. Raw req/s per resource

> 10gen's order seems to be, #5, then everything else in some order. #1 ain't in the top 3.

This is simply not true. Look at commits, look at what fixes we have made when. We have never shipped a release with a secret bug or anything remotely close to that and then secretly told certain clients. To be honest, if we were focused on raw req/s we would fix some of the code paths that waste a ton of cpu cycles. If we really cared about benchmark performance over anything else we would have dealt with the locking issues earlier so multi-threaded benchmarks would be better. (Even the most naive user benchmarks are usually multi-threaded.)

MongoDB is still a new product, there are definitely rough edges, and a seemingly infinite list of things to do.[1]

If you want to come talk to the MongoDB team, both our offices hold open office hours[2] where you can come and talk to the actual development teams. We try to be incredibly open, so please come and get to know us.

-Eliot

[0] http://www.10gen.com/presentations#speaker__eliot_horowitz [1] http://jira.mongodb.org/ [2] http://www.10gen.com/office-hours

rit · on Nov 6, 2011

One addendum to Eliot's "both our offices hold open office hours"; we (10gen) also recently opened an office in London.

Although we don't yet have a fixed office hours schedule, we typically hold them every 2 weeks. The exact dates are announced via the local MongoDB Meetup Group°; we always hold the hours at "Look Mum No Hands" on Old Street.

At least one (and often several) of our Engineers make themselves available during this time to answer any questions and assist with MongoDB problems.

° http://www.meetup.com/London-MongoDB-User-Group

lubujackson · on Nov 7, 2011

Great response. I'll take this over an anonymous, half-informed screed any day.

wedgemartin · on Nov 7, 2011

We've been using Mongo for almost a year now, and we've not seen any of the major issues such as data loss referred to. We've seen some of the growing pains of a quickly moving, dynamic platform, but nothing outside of the realm of what is reasonable for such a powerful solution. It's true that implementing sharding is no simple task, but with enough planning up front, you'll find yourself able to scale horizontally very quickly. After a couple of weeks of planning, we wound up making a few small changes in our codebase to migrate from master/slave to a sharded environment. Not a huge undertaking by any stretch, provided the current flexibility of our platform. Also, due to the fact that 10gen does make all bug information publicly available, we've managed to get it done with zero surprises.

Wedge Martin CTO Badgeville

tzury · on Nov 6, 2011

Eliot, thanks for coming online and publishing your perspective.

MongoDB simply gets better in any version and it is indeed a reliable platform, at least as human beings (employees) are.

skrebbel · on Nov 6, 2011

> If you want to come talk to the MongoDB team, both our offices hold open office hours[2] where you can come and talk to the actual development teams. We try to be incredibly open, so please come and get to know us.

I envy how all your (potential) customers are from California.

mnutt · on Nov 6, 2011

I've been to their open office hours in NYC and, though we don't have a support contract, they were incredibly welcoming and helpful.

mschireson · on Nov 6, 2011

Besides office hours in California NY and London we also have user groups in many cities http://www.10gen.com/user-groups and have (one day, very inexpensive) developer conferences frequently (next two in Dallas and Seattle).

meghan · on Nov 6, 2011

We try to get as much face to face time with the community as possible. Check out 10gen.com/events and 10gen.com/user-groups.

kanwisher · on Nov 7, 2011

Half the startups in NYC use mongo, but that might be cause they are connected to Union Sq Ventures

martin_sunset · on Nov 7, 2011

Or it might be because MongoDb really shines in the typical start-up use case...

einhverfr · on Nov 7, 2011

Or at least better than MySQL for cases where not all data fits a perfect relational model?

newman314 · on Nov 6, 2011

Given the response, what are some best practices/gotchas for MongoDB then?

It might be helpful for 10gen to put together a short doc on what to watch out for evaluators.

shuzchen · on Nov 6, 2011

Most of the best practices/gotchas can be found by reading the online documentation. Of all the replies Eliot gives they were either plainly obvious (oh, you have a system under heavy load and you're surprised that it gets worse when you give it another task to do?) or mentioned in the documentation. If you're planning on using something - especially for a production system - I sure hope you at least read all the available documentation.

I don't think a short doc is of any help for evaluators. You shouldn't be basing your decision on 400 words and some bullet points. If you're serious about your datastore then you should treat it seriously.

ehwizard · on Nov 6, 2011

In addition to the documentation, videos from the conferences are a great place to start:

http://www.10gen.com/presentations/mongoboston-2011/schema-d... http://www.10gen.com/presentations/mongosf-2011/practical-sc...

sdotsen · on Nov 7, 2011

When I was doing my research and came across a bunch of "Why not to use MongoDB" articles, I looked at alternatives solution to see if there was anything "better." Granted NoSQL is the new kid on the block but I wanted to see what my options were. Guess what I'm using, MongoDB. Why? Their documentation is fan-f'n-tastic. Their newsgroup support is just as good, lots of folks who help troubleshoot issue, including the developers themselves.

k_bx · on Nov 7, 2011

I was even gonna write a big blog post and say something similar to what you just said, but (of course) you said it better. Thank you.

on Nov 7, 2011

[dead]

betaclass · on Nov 7, 2011

The original story was submitted by nomoremongo, not nmongo. The original story was very detailed and identified known problems with MongoDB. This post is a one-liner.

So prove you were the one who wrote the account in pastebin.

wizzard · on Nov 7, 2011

More evidence: nomoremongo and nmongo have some differences in their writing style. nomoremongo uses semicolons properly, nmongo does not. An even bigger difference: nmongo doesn't capitalize his Is.

"Maybe i already work for Fox News"

"and i am the original owner"

"based on the FUD i'm spreading"

"Yes, i am a troll"

"And i think everyone who truly"

It's a different person.

einhverfr · on Nov 7, 2011

A previous discussi ("Failing on Mongo") on had nmongo asking someone to post this with a link to the pastebin.

judofyr · on Nov 7, 2011

I think you have it mixed up:

nomoremongo posted the comment at 2011-11-06T03:43:48Z: http://news.ycombinator.com/item?id=3201772

nmongo posted the story at 2011-11-06T07:05:13Z: http://news.ycombinator.com/item?id=3202081

Produce · on Nov 7, 2011

Ah, but how do we know that you're not a random person who's trying to discredit the OP?

meghan · on Nov 7, 2011

It's the same account as the person who submitted the link.

udp · on Nov 7, 2011

I wonder if someone guessed the password to the throwaway account. I bet it (was) either the same as the username or something like "password".

nmongo · on Nov 7, 2011

This account never had any credibility to begin with, and i am the original owner, that's the point i'm trying to make.

harryf · on Nov 7, 2011

Have you ever heard of Gartners hype cycle? http://en.m.wikipedia.org/wiki/Hype_cycle

Hype, fud, fact and misinformation are all part of determining which technologies succeed and which fail.

Many of HN readers have been through 2 or 3 of those cycles and have well developed instincts for spotting BS and verifying technologies.

In other world all you've really demonstrated here is you're a bit of a dick.

Super_Jambo · on Nov 7, 2011

I hope you get sued for any loss your idiocy has caused.

Torn · on Nov 7, 2011

You submitted an anonymous anti-mongo story under the name 'nmongo'? What was your agenda here?

itaborai83 · on Nov 7, 2011

and now he is openly trying to discredit himself. He is either a troll with a conscious or his cloak of anonymity is wearing thin

nmongo · on Nov 7, 2011

Yes, i am a troll, and things have gotten a little out of hand. Just because a story was very successful at fishing for up-votes, it doesn't have to be true, people around here need to be a lot more sceptical. And i think everyone who truly pays attention will know by now that MongoDB is the next MySQL.

Jterrell · on Nov 7, 2011

"....i think everyone who truly pays attention will know by now that MongoDB is the next MySQL."

You are joking again?

CrownStem · on Nov 7, 2011

Whether you are the original poster or not, you're not a troll, you're a sociopath emboldened by anonymity. Cloak yourself in some idealistic mission if it makes you feel good- but your mission isn't to make the point that "people around here need to be a lot more skeptical"- You're a sociopath that enjoys kicking a hornet's nest just to watch the reaction.

com4 · on Nov 8, 2011

You're a fine troll sir. Well played.

Now please go back to digg/reddit/4chan and tout this fine accomplishment where that's accepted.

nmongo · on Nov 7, 2011

My intention was to troll as many hipsters as possible and make them a little more aware of how easy to manipulate they are, without even providing the slightest bit of evidence. It cracks me up that there are startups out there right now, making foolish architecture decisions based on the FUD i'm spreading. Start thinking for yourself!

fiznool · on Nov 7, 2011

And in the process of discrediting, you might have turned many people away from MongoDB. You actions seem irresponsible to me. Unbeknownst to you at the time of posting, I'm sure, but your blog has gone somewhat viral, and it could take 10gen a while to recover from the negative press. Did you consider this when posting?

Kudos to Eliot for coming on and answering your phony accusations. I feel sorry for him though as he has obviously spent a great deal of time in responding, when he could have been doing other important things, like fixing urgent bugs. As others have pointed out, this is the mark of a company who take very good care of their customers. Customer service is what differentiates chiefs from cowboys.

HN is an important community resource, especially for people with little startup / dev experience. I would urge you to think next time before being so irresponsible.

nmongo · on Nov 7, 2011

There are only a few comments from credible sources in this thread, and none of those had anything negative to say about MongoDB, don't believe blindly.

itaborai83 · on Nov 7, 2011

It´s worth asking. Zed?

brianj · on Nov 9, 2011

I agree, Zed would post under his own name.

SaltwaterC · on Nov 8, 2011

Zed has the balls to post stuff under his own name.

gstar · on Nov 7, 2011

Interesting you characterise mongodb users as hipsters - why is that? (at the risk of engaging a troll)

We use mongodb extensively, but I get the hipster feeling also, mostly because they hold office hours at Look Mum No Hands in Old Street, which is ultra proto-hipster.

sk3tch · on Nov 9, 2011

I think he was more pointing the finger at HN in general.

mbyrne · on Nov 7, 2011

My karma: 160 Account created 552 days ago

nmongo's karma: 678 Account created 1 day ago.

djtriptych · on Nov 7, 2011

No one made any architecture decisions in the few hours this was a story. You managed to cause a dustup and come out looking like a sociopath.

blago · on Nov 7, 2011

Did you intend the flurry of "mee too" comments?

teyc · on Nov 7, 2011

You have too much time on your hands.

Hacker News used to be a place where serious and somewhat time poor programmers gather to exchange ideas and learn from one another in good faith.

We are certainly not here to listen to some dumbfuck spread misinformation.

You should have known better.

Consider joining Fox News. But I'm not sure if they'll stoop this low.

Kind regards.

Aaronontheweb · on Nov 7, 2011

Dude, was the Fox News ad hominem necessary? This doesn't look like Reddit...

teyc · on Nov 7, 2011

When I was about 14, I was cocky as hell. Finally I was given a dressing down. Best thing that happened to me.

nmongo · on Nov 7, 2011

Maybe i already work for Fox News, or even Oracle... That's the whole point, don't be gullible, question everything!

teyc · on Nov 7, 2011

You could do better if you thought about your actions. The boy who cried wolf had his fun laughing off the villagers - "question everything." he said.

mangodrunk · on Nov 7, 2011

If true, you do realize that you falsely tarnished a real company and product. If this was supposed to be some lesson in verifying sources and information, I think you went about it in the wrong way. What if someone started spreading misinformation about nmongo to prove a point (even an insignificant and unrelated one), would you like that?

sigzero · on Nov 7, 2011

asshat

deedubaya · on Nov 7, 2011

Then you are, in fact, a douche.

saturn · on Nov 7, 2011

What exactly was a hoax? The document pasted was rather detailed and, while somewhat overblown, was obviously written by someone who knew what they were talking about. It contains a lot of criticism of design decisions by MongoDB; these are pretty common and being opinion, can't really be called a hoax.

There's also a couple of anecdotes of MongoDB supposedly failing in various ways in the author's experience. Are you saying those were fake?

Just because you submitted the document here does not mean you wrote it. Pastebin logs the document as being submitted on 5th Nov. http://pastebin.com/FD3xe6Jt

I don't buy it. I don't think nmongo wrote the doc on pastebin. Maybe I'm overrating my character-detection abilities, but it didn't smell like it was written by some immature time-wasting kid.

edit: I use mongo in prod; very much a student of the "right tool for the job" school. Not trying to add or subtract weight from the original text; ambivalence reigns supreme regarding internet nosql battles. Just saying that my possibly unreliable circuits detect quite a gulf between the original document and the OP's hysterical, caps-lock-engaged cry for attention here.

megaman821 · on Nov 7, 2011

This admission has my "spider-sense" tingling also. The communication style between this guy and the author of the pastebin log seems so different.

It is plausible that someone guessed the password of nmongo's throwaway account, quickly changed that password, and then started posting the whole thing was a hoax.

nmongo · on Nov 7, 2011

It's hilarious how all my attempts to make people aware of this story being a hoax are flagged or buried, spreading FUD is so much easier.

_ugfj · on Nov 6, 2011

This rant is completely outdated and it shows: "pre transaction log" "fixed this in 1.8". You realize MongoDB is at 2.0 now and the transaction log was introduced in 1.8, right? Yes, MongoDB had problems but since the transaction log it's pretty good. I have used MongoDB since early 1.3 and I knew what I was doing and we never lost a bit of data. There is a tradeoff -- while MongoDB handled write load easily that a MySQL box with 2-3 times the RAM , I/O capability couldn't at all we understood the bleeding edge of using MongoDB back then. We have, for example, kept a snapshot slave which shot itself down often, took an LVM snapshot then continued replicating. Never needed those.

We have meticulously kept a QA server pair around and the only time when I have ran into a data loss problem was when I have hosed one of those -- but only one and even the QA department could continue (and hosing that server was me not knowing that Redhat 5 had separate e4fsprogs and e2fsprogs, only partially MongoDB fault but now it works without O_DIRECT so even this would not be a problem any more) . Never understood for example how could foursquare get where they got to -- didnt they have a QA copy similarly?

dextorious · on Nov 6, 2011

""This rant is completely outdated and it shows: "pre transaction log" "fixed this in 1.8". You realize MongoDB is at 2.0 now and the transaction log was introduced in 1.8, right?""

You do realize that 1.8 vs 2.0 is not eons ago, but just a few months, right? And you do realize that the cavalier-throw-all-caution-to-the-wind development attitude that cause all this problems can and does continue to exist? You don't eliminate that just because you added a transaction log (as late as in 1.6, IIRC).

Also: http://news.ycombinator.com/item?id=3200683

openmosix · on Nov 6, 2011

Well, I worked in Vodafone (and Nokia) in very large (laaarge) projects, serving ~50 milions users. Years ago, no hope for NoSQL, we used MySQL. We hit at least 10/20 bugs, solved by 'hotpatch' from Sun. So? I think as developers we should get used to bugs and patches. Should I write a post "don't use MySQL?". We also hit several bugs in the generational garbage collector. Stop using Java? I don't feel the drama here.