What 10gen nailed with MongoDB (calv.info)
109 points by calvinfo on Aug 29, 2012 | 84 comments

This article is spot-on, but 10gen did one more thing they probably shouldn't have: marketed their not-very-scalable, not-very-durable database as a solution to problems companies were having with scalability and durability.

I'm very curious about this. I feel like the community at large seems to agree mongo is a dangerous choice. But then I read articles like this. At what scale do you replace Mongo? Should it only be used for bootstrapping something? Is it something that 80% of apps will be fine on?

I'm in an interesting spot myself right now. I've been hacking on Django since 0.96. I love the ORM, and hate everything else. Python on the other hand is a wonderful language. Rails... I've been spending 8 hours a day for the past month building a rails project. I love a lot of things about rails, hate the ORM, and dislike a lot of the magic.

On the flip side: I've been a Javascript/front-end developer since before jQuery existed, and I love coffeescript. I understand callback-style evented programming and therefore node.js apps make perfect sense to me.

My co-founder and I have just finished some heavy duty design/wireframe/conceptual work on a new project of ours. It's time to build the API.

I am COMPLETELY torn with what framework/language/database to use for this project.

At first I thought mongodb and coffeescript+node would be fun and reliable. But the community seems to think mongo is unreliable and almost dangerous. So I thought, hmm, Riak looks like it will let me throw whatever data I want into it (like mongo) but it's rock solid. I later dismissed that since I figured I'd need to do a lot of heavy lifting on my own. Rails keeps poking me in the back of my mind but I honest-to-god hate ActiveRecord and am afraid to use DataMapper for fear it won't play nice with a lot of the popular ruby goodies out there. I've modeled most of the project now in Django and I'm starting to play with Tastypie for the REST component. It feels too kludgy.

I'm a wreck. Mommy?

I don't know what kind of project it is, but have you actually considered just using Postgres? It's not hip and takes a bit to set up, but after you are done, you have a reliable workhorse with all the stuff you need (access control, schemas, etc.). It has a vast amount of documentation and reading about its advanced features never gets old. There is a lot of talent around that knows how to use it. If you still consider Ruby but don't like ActiveRecord magic, don't forget to have a look at Sequel.

I would recommend against DataMapper. It is nicely constructed, but has some weird issues. Also, DataMapper 1 is going to be replaced by DM 2, which is a completely different beast (but better, hopefully).

Don't be fooled by ease of setup of the DB too much. It should be rather straightforward, but even if a DB can just be downloaded and run to dabble with it, always remember that this just shifts the moment at which you really have to dig into the details. You don't want to do that 2 days before going to production. Be on the lookout for red flags, though: unreasonable and weird configurations that have to be flipped for no understandable reason before going into production. That shows that the project is not maintained very well or the developers have lost track. Also try to figure out how long it takes you to solve a new problem with the database, documentation reading and all. Pick the one that you can wrap your head around best.

I have a FreeBSD box in my closet (I felt like playing with FreeBSD this weekend) currently running Postgres 9 and PostGIS which is how my current stuff is working.

It's actually pretty rad to have every single location I've ever checked-into via Facebook stored in this database. I can say "find all the spots 2 miles from my house" and it's shockingly quick.
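For anyone curious what a query like that is doing: PostGIS answers it with a spatial index, but the core distance math can be sketched in plain Python. Everything here (point names, coordinates, function names) is made up for illustration:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3956 * 2 * asin(sqrt(a))  # Earth radius is roughly 3956 miles

def spots_within(spots, home, miles):
    """Filter (name, lat, lon) tuples to those within `miles` of `home`."""
    return [name for name, lat, lon in spots
            if haversine_miles(lat, lon, home[0], home[1]) <= miles]
```

PostGIS makes this fast at scale because the index prunes candidates before any distance is computed; the linear scan above is only the conceptual core.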

I have PostgreSQL on my mac. It's easy to install and use.

PostGIS was the killer. Version 2.0 (homebrew installs postgis2) does not play nice with Django locally (on my mac) and the ticket has been open for a year or something to fix it. That's a huge red flag for me right there. https://code.djangoproject.com/ticket/16455

I guess I could install 1.5 manually.

My problem with trusting Ruby/Rails is that I've only been seriously developing in it for a month.

Okay, but that's more a Django problem than a PostGIS problem.

Let's put it this way: feel free to try some new stuff, but don't go all crazy with it. So, if you tried Rails and weren't that convinced by it, maybe default to something that you know, even if it isn't a love relationship. Learning is a great thing, but do it piece by piece, especially if your goal is getting work done.

How quick exactly?

I've been testing ElasticSearch's geo location search and most queries take 50-150ms.

Standard term searches and filters will take like 5ms, with id GETs taking mere fractions of that still.

Well, compared to your numbers, I guess not that quick. Django is responding in 0.28 seconds. That is going over wifi to my server in a closet though. And I'm downstairs, kinda far from the router.

Screenshot: http://wsld.me/J5zu

If I hit it repeatedly for a while it comes back in as quick as 0.14, but usually not that fast.

ping to server:

    michael at Achilles in ~
    ○ ping -c 5 apollo 
    PING apollo ( 56 data bytes
    64 bytes from icmp_seq=0 ttl=64 time=9.497 ms
    64 bytes from icmp_seq=1 ttl=64 time=83.113 ms
    64 bytes from icmp_seq=2 ttl=64 time=7.335 ms
    64 bytes from icmp_seq=3 ttl=64 time=6.998 ms
    64 bytes from icmp_seq=4 ttl=64 time=6.393 ms

    --- apollo ping statistics ---
    5 packets transmitted, 5 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 6.393/22.667/83.113/30.241 ms

I'd call it good enough, I was of the understanding that SQL database geo plugins were usually miserable. I'll add a mental exception for Postgres.

I used PostGIS in proper GIS scenarios and it fared quite well. The advantage of PostGIS is that it supports all the index types you need directly. And it's been around a while, in a good sense.

Setting up Sequel and Postgres is really easy. sudo apt-get install postgresql, gem install sequel... and I think that's about it.

I didn't say it's hard. But setting up all the users properly etc. (which you definitely should get comfortable with _before_ you have your whole dev environment running as 'postgres') takes a few minutes. It is a pretty unsurprising process though, which is why I wrote about the red flags: it doesn't raise any.

You must be joking if you think that counts as 'setting up' a database. Also, I just set up 4 servers:

apt-get install mysql-server redis riak-server mongo-10gen-server

Ease of setup matters. Because unless you are using a hosted database, you will be managing and maintaining that database day to day. Which is why I would never recommend PostgreSQL for startups. It is far, far too convoluted for many of the basic tasks you will be doing every day.

That's why everyone chooses MySQL as a relational database. It is easy to setup, easy to maintain and EVERY problem has been solved and searchable online.

Wait... what?! I don't see any major differences between MySQL and PostgreSQL in terms of setup/maintenance. They have slightly different authorization styles, but Postgres can be configured to behave just like mysql.

Ignoring for a moment that postgres was my example, but not my point: choose what you can wrap your head around. I used postgres successfully in multiple projects and no one ever had a big problem with it. I've seen MySQL setups burning sky-high because of some gotchas the team ran into. That doesn't make either of those bad.

The underlying issue is that (except for the most trivial cases) investing time in properly operating and using your database is a huge gain that many young companies ignore. Six months later, they have a burning datastore on their hands that they don't know how to fix.

Exactly ALL databases will have issues. Which is why I suggested MySQL. Every issue has either been solved online or worked around in Percona, Facebook or Twitter's implementation.

There are huge benefits that come from being the most widely deployed database.

There's no good reason to use MySQL over Postgres these days.

If you are thinking of them like they are interchangeable, you need to stop, use a SQL database and quit messing. Seriously, they are not at all the same category of thing.

Mongo is an update-in-place structure store with powerful atomic operations. Mongo will not grow storage if you are just changing the value of existing fields.

Couch is a compare-and-swap document snapshot store with multi-master replication. In replicated mode it's eventually-consistent with manual conflict resolution.

Riak is an indexable, eventually consistent key-value store with tuneable tradeoffs for speed versus safety, and no central point of failure.

Cassandra is a horizontally scalable, eventually consistent key-value column hybrid designed for huge data.

And so on... they are not "databases", they are not something where you replace one with the other, they are specific tools for specific tasks, usually much more so than unsexy but adaptable SQL.
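To make one of those models concrete: Couch's compare-and-swap can be sketched as a store that rejects any write not carrying the revision it read. This is a toy with an invented API; real CouchDB uses string `_rev` tokens like "2-abc..." and answers conflicts with HTTP 409:

```python
class ConflictError(Exception):
    """Stand-in for CouchDB's 409 document update conflict."""

class TinyCouch:
    """Toy MVCC store: every update must present the revision it read,
    or it is rejected instead of silently overwriting."""
    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # a copy, like a JSON response

    def put(self, doc_id, doc):
        current = self._docs.get(doc_id)
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError("document update conflict")
        doc = dict(doc)
        doc["_rev"] = (current["_rev"] + 1) if current else 1
        self._docs[doc_id] = doc
        return doc["_rev"]
```

A stale writer holding an old `_rev` gets a conflict and must re-read and retry, which is exactly the manual conflict resolution mentioned above.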

IMO you're worrying about the wrong thing right now. Build your API and application without any backend persistence. Make sure your API works. This will result in a nice data API for your backend that you can then plug into various tools.

Use this to test and play with and empirically figure out what the best persistence layer is for your app. There's no possible way you know at the beginning of the app what persistence is best for your app, so don't make that decision yet.
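One way to defer the decision is to hide storage behind a small repository interface and start with an in-memory implementation. Every name here is hypothetical, just to show the shape:

```python
import itertools

class InMemoryUserStore:
    """Stand-in persistence layer: the same interface a Mongo- or
    Postgres-backed store would implement later."""
    def __init__(self):
        self._users = {}
        self._ids = itertools.count(1)

    def save(self, user):
        user_id = user.get("id") or next(self._ids)
        user["id"] = user_id
        self._users[user_id] = user
        return user_id

    def find(self, user_id):
        return self._users.get(user_id)

def signup(store, email):
    """API-level logic stays identical whichever store gets plugged in."""
    return store.save({"email": email})
```

Once the API is stable, swapping the in-memory store for a real one is a localized change, and you can benchmark candidates against real usage patterns.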

I agree with your approach. I feel though that the angle to take would be something like mongo... where I do not need to worry about a schema/migrations etc... and rather focus on the API logic itself.

I'm a big fan of relational databases. The problem with them is the high upfront cost of developing a new product: maintaining your schemas, updating data types, getting bogged down in data mapping code and worrying about what the most efficient SQL implementation is going to be, etc.

I like the fact I can whip something up quickly with MongoDB, get everything working and then once my schemas and services operating thereupon have stabilised then I can start looking at swapping my implementations out for a SQL-based back-end if I think it's worthwhile.

I strongly recommend Sequel for Ruby:


It provides an ActiveModel-compliant interface, so it works fine with Rails. It also has a lot less magic (associations don't use proxy objects, for example) and has a lot of features that ActiveRecord is sorely missing (such as complete support for composite primary keys). I don't know what your other problems with ActiveRecord are, but Sequel is definitely worth a look. I'm very happy with it.

Tastypie developer checking in. What don't you like about it? Personally I like how it separates Resources from ModelResources and lays out your API (that's why I got involved with the project in the first place).

You are thinking too hard. Just pick one, you can always change later.

All of the options you have presented are "Very Good" - they do so many things right. To steal a pithy quote from my co-worker: "It's a balloon squeeze."

This decision will not determine the success or failure of your business, move along.

I pick Rails & Ruby.

Just a quick note -- if you're working in python, I recommend having a look at zodb[1]. Then you won't need an ORM, because it is an actual object database. The main caveat with zope and zodb is that the move to python3/pypy is very slow, and not highly prioritized.

Another, real world, object database is gemstone -- and can be used with smalltalk or ruby (among other things)[2].

[1] See eg:




[2] http://maglev.github.com/



If the main thing you like about Django is ORM yet are considering a DB layer that doesn't require an ORM (like mongo) - have you looked at Pyramid (previously Pylons)?

Besides giving you only the tools you want, it works very well with a driver like PyMongo. If you're starting out small you can also get free developer hosting with MongoHQ to get you up and running quickly.

Oh and you get to keep doing things in Python which is very nice. I've also had no problems with MongoDB for my web needs and I've worked at a few places that use it without the scary data corruption stories told to developers in order to make them keep drinking their Java.

If you have no basis upon which to decide what database to use, you go with the one with the most domain expertise floating around. This means you use the most popular one, or the one that your tech people have used before. Conceptualizers are the last people on the decision list here, sorry, and there's nothing wrong with ActiveRecord outside of academe.

tl;dr: Reflect upon your assumptions before making a decision more complicated than it might need to be.

how about Flask and SQLAlchemy?

I'm surprised this is buried by people suggesting Rails or Postgres or what have you. SQLAlchemy is as good a SQL ORM as there is, and it isn't married to a framework, or a SQL backend.

If you like Python and want to use a SQL backend, SQLAlchemy is a good place to start.

SQLAlchemy is great, but you cannot say that it is simple :)

he says he loves the Django ORM. This is typically predictive of a poor response to SQLAlchemy's model :)

I've looked at SQL Alchemy and think it's pretty radical actually. I think that this evening/weekend I am actually going to begin doing this in flask+sqlalchemy. It was a strong contender from the beginning.

Here is my first flask app ever, https://github.com/whalesalad/arbesko-files

No authentication or session management or anything above basic request/response, but it was fun and serves its purpose inside of Arbesko well.
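For flavor, the put/get core of an app like that can be sketched with stdlib sqlite3 standing in for the SQLAlchemy-managed backend (table name and helper functions are invented for this sketch):

```python
import sqlite3

# In-memory SQLite stands in for the Postgres/SQLAlchemy backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, size INTEGER)")

def add_file(name, size):
    """Insert a row and return its generated primary key."""
    cur = conn.execute("INSERT INTO files (name, size) VALUES (?, ?)", (name, size))
    conn.commit()
    return cur.lastrowid

def files_larger_than(size):
    """Declarative filtering/ordering, the part SQL backends excel at."""
    rows = conn.execute(
        "SELECT name FROM files WHERE size > ? ORDER BY size DESC", (size,))
    return [name for (name,) in rows]
```

SQLAlchemy would express the same queries as Python objects and handle connection pooling and dialects, but the relational model underneath is identical.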

I sincerely advise against using node. (I'll get to that in a bit.) For the DB, I personally like CouchDB. It's easy to get started with, and it lends itself very well to horizontal scaling.

As for why node is a bad choice: you can think of 2 categories of languages:

1) Those which allow you to get things done quickly but in which large code bases are hard to maintain because you need to carry too much information in your head, and they possibly don't scale very well because they tend to run in an interpreter (python, ruby, lisp).

2) Those which require some amount of boilerplate to get simple things done, but provide structure and modularity so it's easier to maintain large code bases, and they possibly produce high-performance code because they tend to be compiled or almost-compiled (C#, Java, Go).

There's a trade off between "getting up to speed quickly" and "maintainable performant code".

Node is bad because it fails at getting you up to speed quickly AND it doesn't make your code maintainable at large scale. This is because async-style programming is not natural to how we think.

The "performance" should not be taken as a plus for Node because you can get it in other languages/platforms (Go, for example) without sacrificing maintainability.

The node debugging experience is quite nice. Unfortunately, I found myself having to use it.

The issue with carrying around information in your head is real IFF you have a large monolithic app. SOA, which is advocated with compiled languages as well, helps to avoid this issue because each service is limited in scope. Then again, SOA itself is counter to "hello world" latency...

I think this is important because if you find yourself dealing with a (example) rails app that is experiencing cognitive burden overload issues, splitting it out into a SOA can be a nice transition (eventually replacing parts with other languages if it is required.)

Also, if you care about performance and quick execution and want a scripting environment, luvit is much faster than node, with much lower memory usage, and IIRC you can use Lua coroutines instead of async style.

IMHO, node.js as an edge server providing connection handling and templating with backend services giving it JSON seems like a reasonable 3-tier architecture, especially because your edge tier can then be maintained by people with front-end experience.

> IMHO, node.js as an edge server providing connection handling and templating with backend services giving it JSON seems like a reasonable 3-tier architecture, especially because your edge tier can then be maintained by people with front-end experience.

I have yet to try it, but this seems like a great approach. Manage most of the computing intensive tasks in whatever environment you like/need. Use Node to talk to APIs and render templates. With a well developed client framework you can even make only the first render in Node and the following in the client talking to the same APIs.

This leaves the most complicated back-end problems in the hands of developers that like and understand those problems and creates a friendly environment for the front-end developer who usually focuses on building features. The front-end developer even has already some experience with async code because of events in the browser, so he'll probably get up to speed quickly with Node.

What's SOA? And how does it help with cognitive overload issues?

Service-Oriented Architecture. It helps because each service is self-contained, and only communicates with other services through defined interfaces and known boundaries. Because they are self contained, you then only have to have the mental overhead of the complexity of the service you are currently editing (so long as you do not break any of its contracts.)

IE: Encapsulation and isolation, which is something that "scripting" languages generally stink at enforcing (on purpose!)

Besides shell and perl (which really do excel at a certain kind of task), calling a language a 'scripting language' is usually not much more than a dismissive insult by someone with a heavy investment in something like Java, which seems more serious mostly because it involves a lot of self-important boilerplate and has high levels of adoption by big old companies where you have to wear a tie.

No insult. I am deeply invested in Ruby and JS. Calling something a scripting language in my book means no a priori verification of the code before beginning to run it. Of course this is a spectrum, from Coq to Ruby.

Completely agree with the CouchDB choice. It has really elegant and unique features combined with rock solid durability. You can configure it to fsync after every update; it does as much as it possibly can together with the OS to make sure your data reaches the disk before returning a result. That is very important if you value your data.

As for other features, I like the REST-ful interface, a continuous changes feed, master to master replication, and a usable interface to see and manage data -- Futon. I haven't yet found a product that comes with all those features.

Another thing I consider a great feature of CouchDB is the cached map-reduce views. It makes it easy to think and reason about indexing.

I've seen this situation quite a bit with traditional SQL databases: somehow you end up with a giant complicated query, and it's becoming a bottle-neck and it's having a terrible impact on performance. Somehow you need to untangle the mess and figure out where the problem is. Sometimes you find out the query is just poorly written. Other times the problem is fixed by simply adding indexes to some table.

This is a terrible situation I don't want to run into.

I have a feeling MongoDB might also suffer from a similar problem, because it doesn't have cached map-reduce views; it has a query language.

In CouchDB, views are pretty much indexes, and they're pretty much the only way to do a query. I'd argue that because it's explicit like that, it's easy to reason about.

Yeah, we call it Incremental MapReduce and we _LOVE_ it. I swear, it's one of the biggest "ah ha!" moments whenever I do training for Cloudant's customers.

The basic idea is that the intermediate Map function results are persisted, while most other systems (read: Hadoop) throw them out.

If you're interested in some of the MapReduce history, where it's going, incremental indexing, etc., then you should read this: http://gigaom.com/cloud/why-the-days-are-numbered-for-hadoop...
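The persisted-map-output idea can be sketched as a toy: per-document map results are stored, so updating one document only re-runs map for that document before re-reducing. This flattens away CouchDB's B-tree re-reduce and grouping machinery, and all names are invented:

```python
class IncrementalView:
    """Toy incremental map-reduce: map output is persisted per document,
    so a changed doc triggers re-mapping of that doc only."""
    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self._mapped = {}  # doc_id -> list of (key, value) rows

    def update_doc(self, doc_id, doc):
        # The incremental step: only this doc's rows are recomputed.
        self._mapped[doc_id] = list(self.map_fn(doc))

    def query(self):
        # Group persisted rows by key, then reduce each group.
        by_key = {}
        for rows in self._mapped.values():
            for key, value in rows:
                by_key.setdefault(key, []).append(value)
        return {k: self.reduce_fn(vs) for k, vs in by_key.items()}
```

In real CouchDB the reduce is also cached along the index's B-tree so queries don't re-reduce everything either; this sketch only shows the map side of the incrementality.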


Yeah, Couch is seriously underrated. HTTP API, Master-Master, append-only, these are things to love if you cherish your web app's data. I wish Couchbase hadn't abandoned the Apache version, or rather, I wish someone else had stepped in to take their place as "the Couch company" the same way 10gen is for Mongo.

Cloudant is close, but their BigCouch stuff still isn't merged yet as far as I know, so if you go with them you're still using some weird Couch fork instead of the open source version. Which might be more of a psychological problem than an actual one, but it makes me really uneasy about the Couch ecosystem that they're the only thing we have that's close to a gold-standard champion.

Yea, it's unbelievable how underrated it is.

I searched Reddit for "couchdb", and there's pretty much nothing new from the past two years. I almost thought Couch was dead.

It's definitely really hard to find out what's going on with Couch lately. Even checking the bug tracker isn't very helpful for finding out what's going into the next version and when it's releasing.

They lost me when they started to shift "everything" towards mobile a while back, maybe 3 years ago. Do they still focus a lot on mobile?

And what's the deal with CouchDB vs. CouchBase?

Just to clear up one point that was semi-made already, "they" meaning Apache CouchDB isn't shifting anything toward mobile. They're still making the same great product from before.

The mobile stuff that you're referring to is CouchBase's. It's a completely different product and their work should not be confused with Apache CouchDB's, though it often is.

Mobile is a focus insofar as the master-master sync working well for mobile apps where the user may be online most of the time, but offline a significant amount as well. I believe TouchDB is where they're focusing exclusively on mobile now, and it's less of a focus with the main branch.

As for CouchDB vs. Couchbase, Couchbase is a fork of CouchDB headed by CouchDB's creators Damien Katz and J. Chris Anderson. Couchbase basically throws a lot of the nice things of CouchDB out the window (such as the HTTP API) in order to integrate Membase features.

This is worth reading: http://www.couchbase.com/couchdb

To be even more specific while still generalizing, Couchbase is the Membase product with a heavily modified CouchDB swapped in for the storage mechanism (previously SQLite).

As for the leaders, it's a smattering of CouchOne/Couch.io and Membase/Northscale people. Or, as a friend put it, CouchOneBase.io.


There are all kinds of considerations with respect to node.

But discounting all interpreted languages because they are interpreted is crazy biased. I don't at all accept (and it is not self-evident) that a high-boilerplate compiled language is necessary to have a large codebase or acceptable performance. Unless most of your storage is in memcached, the database is far more likely to be a bottleneck than your interpreter (!)

There are a number of compilers for Lisp.

Between these two issues with your post, I would not be convinced to accept your advice regarding node, either.

I'm aware that lisp can be compiled, that's why I said that dynamic languages "tend to be" interpreted (the implication being, they're not always interpreted).

My point is about comparing dynamic and non-dynamic languages. Dynamic languages tend to have "magic" where complex logic occurs without you explicitly declaring it, at the expense of either a) maintainability or b) performance or c) both

There's a trade off here: varying degrees of magic vs varying degrees of boilerplate.

Boilerplate code can make things very explicit and thus easier to reason about.

Now, NodeJS has the worst of both worlds. The callback mess is not easy to write, not easy to reason about, not easy to maintain, and there's no magic.

That's a lot of sacrifice; it had better be for good gain. But what exactly do you gain? I don't see any gains. Performance? You can get that with other languages/platforms (like Go, for example) without sacrificing code clarity.

This idea that MongoDB is dangerous is just crazy and seems to often come from the same people who endlessly tout PostgreSQL as the world's most perfect database. The fact is that very large companies are using MongoDB in production environments. Yes. MongoDB still has some issues in particular the write locks but the situation is improving pretty rapidly and almost exclusively affects very high end users.

So the answer to your question is simple. Use what matches your data model. MySQL if your data is more relational as it is proven and well known. MongoDB is perfect when you treat it like a document store. For example I fetch one User object (with all the user's data self contained) from the database and do everything client side. So in my case MongoDB works perfect.

But either way. Do whatever gets you up to speed the fastest.

> The fact is that very large companies are using MongoDB in production environments.

Not as a criticism of Mongo but the thing about very large companies is that they can devote a team to just making sure some bit of technology works. Almost anything is suitable for a large production environment if you can afford to have a couple people elbows deep in it tuning and troubleshooting.

> This idea that MongoDB is dangerous is just crazy

The idea that a database shipped with a mode where it would return after a write without waiting for the result to be written to disk, or where silent data corruption is an easy possibility, and yet still called itself a database: that is what is "crazy".

Firstly that mode is completely valid for many use cases e.g. logging, metrics. Secondly all the drivers handle this behaviour behind the scenes for you. Thirdly it is trivial to set the write concern to specify how many instances of the replica set you want the write to persist to as well.

And silent data corruption is an easy possibility ?

One of the main sticking points for serious MongoDB infrastructure is the global lock -- yes, write operations cause the entire MongoDB instance (every database, every read operation) to block. This is clearly one of the main issues surrounding highly concurrent MongoDB development.

Thankfully with mongodb-2.2 there is now instead a database lock which mitigates the issue.

Personally I feel redis more closely resembles language primitives with simple key-value pairs, hashes, lists (ordered, repeating values allowed), sets (un-ordered, no repeating values), and sorted sets (more interesting).
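That resemblance, spelled out with plain Python primitives standing in for the Redis commands (this is an analogy for illustration, not a Redis client; keys and values are made up):

```python
# Redis type           -> rough Python analogue
# string  SET/GET      -> dict entry
# hash    HSET/HGETALL -> nested dict
# list    LPUSH/RPOP   -> list (ordered, duplicates allowed)
# set     SADD         -> set (unordered, no duplicates)
# zset    ZADD/ZRANGE  -> dict of member -> score, sorted on demand

store = {}

store["page:views"] = "42"                       # SET page:views 42
store["user:1"] = {"name": "ada", "lang": "py"}  # HSET user:1 name ada lang py
store["queue"] = ["job2", "job1"]                # LPUSH queue job1 job2
store["tags"] = {"db", "redis", "db"}            # SADD tags db redis (dedupes)
store["board"] = {"alice": 300, "bob": 120}      # ZADD board 300 alice 120 bob

# ZRANGE board 0 -1: members ascending by score
leaderboard = sorted(store["board"], key=store["board"].get)
```

The sorted set is the interesting one precisely because there's no single Python primitive for it: Redis keeps it ordered by score at insert time, while the sketch has to re-sort on every read.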

Redis has no clustering (yet) so is best used when your data is transient and you are able to repopulate.

I have no experience with Redis or MongoDB, but your statement is only true if the storage layer is transient. Some of us use servers that don't go away upon shutdown, backed by a SAN.

Have you tested how long it would take your redis instance to be usable again after it finishes replaying the commitlog (AOF)?

Agreed. However, the article praises the MongoDB defaults, yet time and time again we've been bitten by open-files issues that are easily solved by shipping better init.d scripts. I also feel that scaling out MongoDB clusters is unnecessarily complicated.

> Think about a web developer who shows up to a hackathon, ready to break out his new side project. He doesn’t want to spend hours planning schema or creating databases and tables. He just wants a quick way to persist and retrieve data.

When you say that, this is what I hear:

> MongoDB is the quickest way to accrue large quantities of technical debt.

In this analogy, fast accrual of debt leads to one place: bankruptcy. Which, in software engineering, is the ground-up rewrite.

I’m questioning whether it’s productive to consider popularity when building components that will underly the long-term architecture of other people’s software. Overnight success can disappear as quickly as it arrives, and I, for one, have a lower time preference for these things. I’d rather be responsible for a tool that gains a lot of respect for being a robust, reliable and high-performance piece of kit over a long period of time, than one which had blazing popularity in the beginning but then proved to be the source of many a developer’s nightmare later.

Popularity of a non-overnight type is one of the biggest reasons to use something like Postgres or Rails: this means it gets beat on a lot, there is documentation and you can find people who know how to use it.

Some projects with this kind of popularity (n.b.: not the ones I mentioned by name) are designed like shit, do irrational things, have performance problems, have security problems, whatever. But they have been used and they are usable and you can find documentation and experts to deal with them.

> In this analogy, fast accrual of debt leads to one place: bankruptcy. Which, in software engineering, is the ground-up rewrite.

Bankruptcy can be just that -- bankruptcy and loss of job, liquidation of a company.

It can go either way. Spending years perfecting a product and polishing it to 100%, only to wake up and find out that someone else used some crappy, fast, easy-to-set-up thing, got the product out, and started getting real customers. Now they have enough to double their team and start rewriting if they want.

Or picking something fast and easy to set up, and after getting a few initial customers finding that scaling out doesn't work. The site crashes and everything goes to shit, data is silently corrupted, the corruption has already been backed up over older backups, and customers are leaving for something else, maybe not as flashy and cool, but something that works.

Uh, what?

What's wrong with using MongoDB to screw around while you are developing it? It's not like you have to marry it. Figure out what schema you are going to be using, and then write your table layouts for Postgres.

Sorry but what you are suggesting is crazy.

MongoDB has a completely different data model to PostgreSQL. You can't just build your app around one approach and then trivially move to another.

Pick the database for the data model not the other way around.

I could not fail to disagree with you less.

(^ that means I agree).

This idea that you can arbitrarily switch between database X and database Y with radically different models is a really harmful fallacy. Code which attempts to ‘abstract away the details’ is often leaky, buggy, complex, and just as prone to tight coupling as any other solution.

At the very least, such has been my experience. And I think certain individuals have a policy of downvoting those comments they disagree with rather than explaining their counterpoints in a comment.

In my experience there isn't an issue. It all comes down to how well you design your application. Here's the interface to a service in a fun side project I'm currently working on (comments removed):

    public interface IAnalysisService
    {
        Guid CreateAnalysis(Guid ownerId, string name);
        void LoadingAnalysisInputs(Guid analysisId);
        void RequestAnalysis(Guid analysisId, string number, IEnumerable<Records> inputs);

        Guid CreateAndRequestAnalysis(Guid ownerId, string name, string number, params Records[] inputs);
        Guid CreateAndRequestAnalysis(Guid ownerId, string name, string number, IEnumerable<Records> inputs);

        Analysis FindAnalysis(Guid analysisId);
        Analysis FindAnalysisForProcessing(Guid analysisId);

        void CompletedAnalysis(Guid analysisId, AnalysisResults results);
        void FailedAnalysis(Guid analysisId, string reason, params object[] args);
    }
These methods implement some business rules, do some database work and throw a few messages onto a service bus. Why is it so hard to believe that the database implementation behind any of these methods doesn't affect the consumers? Here's a sample implementation:

        public void FailedAnalysis(Guid analysisId, string reason, params object[] args)
        {
            AnalysisRepository.Failed(analysisId, string.Format(reason, args));

            Bus.Send(new AnalysisCompleted {AnalysisId = analysisId});
        }
I don't see any obvious bugs, complexity or leaky abstractions.

Your comment is lost on most people here - Ruby doesn't have interfaces.

And ruby doesn't have macros that would make such a db switch even simpler.

> For 90% of web development cases, simply storing and retrieving objects from a persistent store is enough of an API.

"90%", that's quite a specific number.

Most of what I've written for the past fifteen years is web apps, and I'd say only about 10% fall into the "all I need is put and get objects" category. Once I knew how to use SQL effectively, and once I knew how to work with ACID, those features became irreplaceable in almost every project I've worked on.
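
A minimal illustration of the ACID point, using Python's built-in sqlite3 (the account schema and balance check are made up for the example): a transfer that violates an invariant rolls back atomically, so no half-applied write ever becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # the connection context manager commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        # Enforce the invariant by hand; a real schema would use a CHECK constraint.
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

# Neither half of the failed transfer survives the rollback.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# -> {'alice': 100, 'bob': 0}
```

Getting the same guarantee out of a store without multi-document transactions means building it yourself in application code.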

IMHO it's all about what you're familiar with. Rich Hickey thinks we're morons for using OOP, as he's a brilliant functional guy. Shrugs.

Completely agree. Web development is simple, but it's not that simple. Relationships between tables, and the ability to normalize and aggregate data are very useful, even for web development.

> Think about a web developer who shows up to a hackathon [...] He just wants a quick way to persist and retrieve data.

This is the _worst_ reason to use Mongo.

If you are doing a hackathon or just prototyping and need persistence, use pickle. Or store JSON to a text file. Who cares what the solution is. It should:

a) Be built into the standard library.

b) Not require bringing up another service to work.
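
In Python, for instance, both boxes are ticked by json and pickle from the standard library (a throwaway hackathon-grade sketch; the file path and data are made up):

```python
import json
import os
import pickle
import tempfile

# Hackathon-grade persistence with nothing but the standard library:
# no daemon to install, no service to keep running.
state = {"users": [{"name": "ada", "score": 3}], "counter": 42}

# JSON round-trip through a plain text file.
path = os.path.join(tempfile.gettempdir(), "app_state.json")
with open(path, "w") as f:
    json.dump(state, f)
with open(path) as f:
    restored = json.load(f)

# pickle handles arbitrary Python objects (sets, tuples, custom classes),
# at the cost of being Python-only and unsafe on untrusted input.
blob = pickle.dumps(state)
assert pickle.loads(blob) == restored == state
```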

[Edit.. clarity in the quote]

That's the point. It's actually easier to use MongoDB than to store JSON in a text file. And things move faster, and you still have the dream that you can scale it up (even if 99% of the time it doesn't happen).

MongoDB does pretty much everything the opposite of Oracle: it's easy to set up, use and program against.

However, Oracle sure is raking in the dough, so is Mongo succeeding simply because they're fulfilling a niche left by Oracle, or because it's the 'right way' to build a software product company?

I hope the answer is the latter, and that Oracle's billions are simply the result of being entrenched after years and years of doing it the 'old way'. That being said, I think it's a little too early to be championing Mongo's business model (even though I'm rooting for it).

Although I am a heavy user/believer of MongoDB, one caveat that is overlooked in this article is the administration part of Mongo Clusters.

Even the simplest replication setup needs 3 servers; add sharding to the dance for extra performance and the server count jumps up. For startups this is a major decision to weigh, as a full-time ops guy isn't always affordable. Luckily, PaaS services such as MongoLabs and MongoHQ save the day.

I don't understand your point.

You could use replication (master/slave) with 2 servers, and you don't have to use sharding for anything. How is MongoDB any different from every other database?

If anything MongoDB is by far the easiest database I've ever used for setting up clustering/sharding.

To have a legitimate replication setup, an arbiter node is highly recommended: http://docs.mongodb.org/manual/administration/replication-ar...

Master/slave is deprecated. Replica sets can run on 2 nodes; however, running a set on an even number of nodes is a very bad idea, as losing the primary leaves the surviving nodes without a majority and they go read-only. Always run an odd number. Avoid master/slave.
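
For reference, a replica set along those lines can be declared from the mongo shell roughly like this (hostnames are placeholders; the third member is a data-less arbiter, so two data-bearing servers still give you an odd number of voters):

```javascript
// Initiate a three-member replica set "rs0". The arbiter votes in
// elections but stores no data, so it can run on a tiny box.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    { _id: 2, host: "arb.example.com:27017", arbiterOnly: true }
  ]
})
```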

Have you tried CouchBase? The built in admin panel is a dream. Adding and removing servers to a cluster is super easy.


I was nailed by MongoDB. It's a great database for prototyping, but every project of mine that went anywhere ran into its limitations: scale, durability, its intimate relationship with the OS, ease of administration. MongoDB is right for some projects, but in many cases it is not. If you know you can transition away from it at a later point, it gets you off to a flying start. If you foresee that switching later will be hard, I would advise spending some time up front considering future scenarios.

Nailing community and ease of use (drivers for everything, easy to install) and documentation wouldn't have been much use if the product sucked; it doesn't.

A lot of people that I've seen with issues around Mongo haven't understood how it's expected to be used: i.e. it's not a relational database and the schema design is therefore different.

This might be because a lot of people have no idea what the viable use cases are for all the million different new database-things there are. Because it just isn't that clear unless you have used them all or are a total nerd for the design of database software.

In this respect MongoDB is partly a victim of its own success - that means more people using it who aren't sure what it's for, or who are led to believe it's for something it's not really so hot at.

Interestingly, someone just released a new async Scala driver: http://news.ycombinator.com/item?id=4454077
