
Ask HN: Companies using MongoDB, suggestions to deploy at scale? - jyotiska
We have been using MongoDB for the past year, but over time the aggregation framework has been getting slow with 1M+ documents in a collection, even with indexes.

People who are currently using MongoDB in production: what kinds of performance optimizations did you have to make in order to deploy it at scale? Any tips, stories, suggestions are welcome!
======
Ronaldo777
We moved to PostgreSQL.

Our performance problems went away, with almost no work other than the initial
migration. We fixed many data integrity problems during the migration, and
have not noticed any problems since. Our uptime has become near perfect. We
now get to use all of the other great features that PostgreSQL offers.

~~~
bhouston
We have thought about this. We interface with MongoDB via Mongoose. Is there
an easy way to keep using Mongoose while transitioning to PostgreSQL?

~~~
nullspace
Mongoose is such a brilliant library, right? If I were writing something in
nodejs, the existence of Mongoose itself would justify simply going for a
MongoDB backend.

------
nullspace
Want to see other replies as well - but here are some of my thoughts.

1. Run your query with explain - to confirm that your queries are indeed
hitting indexes[1] (see the sketch after this list). There have been a couple
of times where we discovered that a specific branch of a complex query was not
hitting an index - so discovering that was useful.

2. One thought on slow reads with the aggregation framework and 1M+ documents
is that you are bottlenecked on memory. Even though all your queries may be
indexed - you may be hitting disk too often (and doing a lot of random I/O),
which is expensive. Check out MMS - and specifically the page faults graph -
and you will get an idea of how many times a second you hit disk. For my
specific application - response time was very important - so we tried to keep
our page faults close to zero. Is increasing the RAM on your instance a viable
option? If so, this would definitely give you some breathing room.

3. You should try to verify whether your response is slow because the query
processing time itself is high, or because you have several concurrent
queries competing for system resources (CPU / memory etc). If it's the latter
- the fix is easy - just add more machines to your replica set (balance this
with adding more RAM).

4. Check your lock %. My guess - from your brief description - is that this
will not be an issue for you. If it is an issue - you _may_ need to shard your
system - or do 5.

5. I think the most scalable solution would be to rethink your data pipeline.
Why do you need to apply complex aggregation queries to 1M+ documents? Can you
build a processing pipeline that takes in your data - and builds "prepared
views" for the queries you expect out of your aggregation pipeline? In my
application - we used a stream-processing pipeline that aggregated all the
messages we got - in all the different ways we needed - and inserted them into
different databases (to work around the DB-level locking :( ) in Mongo. Each
collection had exactly ONE type of query that would be executed - this would
be an extremely simple query and would be indexed. This pattern is common -
even outside of Mongo.
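
A minimal sketch of the check in point 1, from the mongo shell (collection
and field names here are hypothetical):

    // Plain queries: look for BtreeCursor (2.6) / IXSCAN (3.0+) in the
    // output rather than a full collection scan.
    db.events.find({ userId: 42 }).explain()

    // Aggregation pipelines can be explained too - check that the leading
    // $match stage reports an index scan, not a COLLSCAN.
    db.events.aggregate(
      [ { $match: { userId: 42 } },
        { $group: { _id: "$type", n: { $sum: 1 } } } ],
      { explain: true }
    )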

Good luck :)

[1]
[http://docs.mongodb.org/manual/reference/method/cursor.explain/](http://docs.mongodb.org/manual/reference/method/cursor.explain/)

~~~
boucher
In addition to verifying you are hitting an index for all of your queries, you
should verify that all of your indexes are in memory. Depending on the size of
your dataset, trying to fit more (or all) of it in memory may be an option
worth trying.
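
For reference, you can see how much RAM the indexes need from the mongo shell
(collection name hypothetical):

    // Total size of all indexes on a collection, in bytes.
    db.events.totalIndexSize()

    // Per-index breakdown; the sum across your hot collections should fit
    // comfortably in RAM alongside the working set.
    db.events.stats().indexSizes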

------
bhouston
We've had slowness on [https://Clara.io](https://Clara.io) because of what we
believe was excessive file system fragmentation, as we were running on zfs and
then later btrfs. We were on zfs and btrfs because they allowed for easy and
quick backups via snapshots. But eventually writes slowed, leading to long
lock times and making everything basically unworkable. A reinstall would
always fix the issue for a while. We believe the copy-on-write feature, the
same reason backups were fast, is also the reason we got excessive
fragmentation after a while.

I believe the cause was not using xfs as our file system, which we have
recently moved over to. So far the issue hasn't recurred.

Other than this issue, MongoDB has been pretty good. Indices are important,
and formulating queries so that they are index-based is also important (the
same result can be obtained in multiple ways, some of which likely do not use
the indices, while other formulations will use them).

We use Redis as a cache in front to reduce the unnecessary repetitive load on
MongoDB for common operations.
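
As a rough illustration of that pattern (not our actual code - the model and
key names are hypothetical), a cache-aside lookup in node.js:

    var redis = require("redis").createClient();
    var Scene = require("./models/scene"); // hypothetical Mongoose model

    // Cache-aside: serve repeated reads from Redis, fall back to MongoDB
    // and cache the result with a short TTL.
    function getScene(id, cb) {
      redis.get("scene:" + id, function (err, cached) {
        if (!err && cached) return cb(null, JSON.parse(cached));
        Scene.findById(id, function (err, doc) {
          if (err || !doc) return cb(err, doc);
          redis.setex("scene:" + id, 60, JSON.stringify(doc)); // 60s TTL
          cb(null, doc);
        });
      });
    }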

We have a MongoDB cluster set up and it has saved us a number of times. It is
really also just a live backup system -- we've had a few hardware failures and
we still have never had to restore from an archive backup; we just resync from
one of the failover servers. But using that is likely a no-brainer, and you
are likely already doing it.

~~~
s_kilk
We had similar issues with MongoDB on btrfs. While the snapshot features were
nice, the write performance was abysmal. We moved to xfs at some point and got
a massive increase in write throughput.

So far I've worked at two separate companies deploying MongoDB into production
and we've never had any real issues, mostly because in both cases the lead
engineers really knew what they were doing, had read the manual front-to-back,
and knew exactly what kind of OS and filesystem setup to use for maximum
effectiveness.

------
giaour
There are ways to get around the latency you're seeing without (or while)
moving away from Mongo. If you rewrite your queries as mapreduce operations,
then you can run them at a scheduled interval to update a reports table with
the previous interval's data. This is called incremental mapreduce, and Mongo
has a great guide on how to get started:
[http://docs.mongodb.org/manual/tutorial/perform-incremental-map-reduce/](http://docs.mongodb.org/manual/tutorial/perform-incremental-map-reduce/)

You won't get real-time data anymore, but switching to batch processing will
speed up your users' experience.
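
A minimal sketch of that incremental pattern, with hypothetical
collection/field names - run it on a schedule, restrict the query to documents
since the last run, and fold the results into a report collection:

    var map = function () { emit(this.userId, this.amount); };
    var reduce = function (key, values) { return Array.sum(values); };

    // Only process documents newer than the last run, then merge the
    // output into the existing report collection via reduce.
    db.events.mapReduce(map, reduce, {
      query: { ts: { $gte: lastRun } },  // lastRun is tracked by the caller
      out: { reduce: "report_totals" }
    });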

The cynic in me, however, wants to just tell you that you've outgrown Mongo
and should move on. It happens; not every database is designed to support
every need, and Mongo happens to be a spectacularly shitty fit for the kind of
OLAP scenario you're describing.

If you absolutely must have real-time analytics crunched on the fly, I
recommend that you look at data stores that are a better fit for your
application. InfluxDB is great for time series data. Aerospike is getting a
lot of press in the AdTech industry. And old standbys like MySQL or Postgres
can handle queries over a million records without breaking a sweat.

------
BIackSwan
I worked on a couple of MongoDB projects a year and a half ago. It could not
scale for performance no matter what we did. It was supposed to be a fast
datastore for simple lookups and heavy updating (multiple tens of updates/sec,
spiking to hundreds), and we eventually had to switch to other tools (either
custom in-house ones or off-the-shelf).

I won't claim that we did everything under the sun to optimize it; we probably
could have optimized it to work eventually. After days/weeks of headaches, it
was just easier to switch than to keep maintaining and optimizing the MongoDB
setup.

~~~
MichaelGG
>heavy updating (multiple tens of updates/sec, spiking to hundreds)

In what world is 100 updates/sec "heavy"? I don't mean to come off rude, but
even with a full disk write per update, that fits into normal magnetic disk
IOPS without even batching transactions (a single 15K RPM disk sustains
roughly 175-200 random IOPS).

For comparison, with zero work on our end, running off a 2-disk RAID 1 array
of 10K or 15K RPM HDDs, it was trivial to pass 10K updates/sec.

~~~
gaius
It is heavy for MongoDB.

It's a weird phenomenon you see these days: people talk about 10 updates a
second on a million records as some sort of intense load that needs "sharding"
and God knows what else, when a DBA from 20 years ago wouldn't even blink at
it.

------
exabrial
1M records? We have 25 million+ in normalized MySQL tables with proper
indexing and run 20ms queries on a modest Xeon E5-1650 with 12 GB of RAM
dedicated to the DB and a Samsung 840 Pro.

That seems like a drop in the bucket to be experiencing performance
problems... What are your use cases? Why did you pick MongoDB in the first
place?

~~~
bane
Heck, I have a SQLite DB I'm working with right now that has ~30 million rows
across a few dozen tables. Queries against the indexed fields return in 130ms
on my rMBP.

------
vvpan
I really don't understand why RethinkDB, a seemingly superior tool, is not
getting more traction. MongoDB was and still is a hack, a simple design that's
prone to blow up.

~~~
arthursilva
The last time I checked, using it in production wasn't even recommended by the
authors: [http://rethinkdb.com/stability/](http://rethinkdb.com/stability/)

~~~
vvpan
Hm. They should call it webscale and be done with it.

------
rinens
Change your data model so that you don't need to aggregate. Duplicate data if
necessary, and handle the resulting potential inconsistency.

Have the right indexes; especially with complex aggregations, you might be
surprised to find they aren't being used due to a quirk of your query.

A single Mongo instance can handle a beefy load, but not if every read is an
aggregation over a million by a million by who-knows-how-many docs.
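
One common form of this, sketched with hypothetical names: maintain the
aggregate at write time with an upsert, so the read becomes a single indexed
lookup instead of a pipeline over the raw events.

    // On each event, bump a precomputed per-user, per-day counter...
    db.daily_stats.update(
      { _id: { userId: 42, day: "2015-03-01" } },
      { $inc: { count: 1, amount: 9.99 } },
      { upsert: true }
    );

    // ...so the read side is a single _id lookup, not an aggregation.
    db.daily_stats.find({ _id: { userId: 42, day: "2015-03-01" } });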

~~~
twunde
I second the idea of denormalizing your data. It feels odd because that's
exactly what you're trying to avoid with relational databases, but honestly,
Mongo is designed to handle denormalized data:
[http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3](http://blog.mongodb.org/post/88473035333/6-rules-of-thumb-for-mongodb-schema-design-part-3)

------
late2part
Reading these comments, I get the curious impression that people don't think
MongoDB is webscale.

~~~
exabrial
I loled.

------
twunde
Definitely take a look at Mongo's recommendations:
[http://docs.mongodb.org/manual/administration/monitoring/#diagnosing-performance-issues](http://docs.mongodb.org/manual/administration/monitoring/#diagnosing-performance-issues)
This has gotten much better since I last used Mongo heavily. I ended up using
the slow query log, but it looks like the tooling has improved significantly,
and now there is a list of common problems.

------
cheald
Use TokuMX (I don't have a stake in TokuMX, I'm just a happy customer). It's
mostly a drop-in for Mongo 2.6 (though converting data is a downtime
operation). To your point specifically, we're running aggregation stuff on
collections with >1 billion documents and it hums along great. Massively
better disk usage, clustering keys, ACID guarantees, document-level locking,
tunable compression, and a bunch of other small improvements make it a really
excellent choice.

TokuMX does have one significant problem in that the technology it uses for
indexes experiences performance degradation over time if you do lots of
deletes from the head of the collection (i.e., if you're processing data that
gets inserted and deleted roughly chronologically), but it is fixable with
reIndex() (an async operation analogous to OPTIMIZE TABLE). That's really the
only large problem we've had with it, and it's surmountable with regular
maintenance. In the case of monotonically-ordered data, though, you can
address the problem with partitioned collections, which let you split a
collection up into chunks and then drop chunks as they become unnecessary,
without having to do expensive things to the whole collection.

It looks like WiredTiger should provide many of the same benefits, so you may
want to evaluate that first before taking the plunge, but I haven't personally
compared it with our TokuMX setup yet.

Outside of that, you probably need to start sharding your dataset. Figure out
a shard key that will let you distribute your data well, and start splitting
your load among multiple replica sets. Figure out if your issues are IO, CPU,
or whatever - there's a lot of documentation on how to do this, so I won't
rehash it here, but in general, you first need to understand what is slow
before you can fix it. Chances are very good that your culprit is disk IO,
though (because it always is in databases), and you can probably get some of
that performance back with more RAM (to keep more of your working set in RAM
at a given point), faster disks, and/or a striped disk array. Depending on
your platform, some kernel tuning may help, as well, though it's going to be
specific to your platform and setup.
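
For the sharding route, the basic setup from the mongo shell looks like this
(database, collection, and shard key here are hypothetical - choosing the key
is the hard part):

    // Run against a mongos of an existing sharded cluster.
    sh.enableSharding("mydb");

    // Compound shard key: distributes load across users while keeping each
    // user's documents together in time order.
    sh.shardCollection("mydb.events", { userId: 1, ts: 1 });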

You might also look at other tools and see if they're better-suited for your
purposes. We use Elasticsearch as our ad hoc query layer on top of TokuMX, and
are doing some stuff with Cassandra and BigQuery for bulk analysis of
historical data. I'm not suggesting that you replace your whole stack, but if
you're trying to shove a square peg through a round hole, you're just going to
be frustrated. Scaling is all about specializing your toolset to fit your
problem domain - there is almost never a single solution that is the best
thing for all use cases. If you need to add another tool to the stack to
offload one part of your requirements, do it!

------
squidcactus28
1M records? Bro, you are seriously scaling! Good job!

------
SFjulie1
Well, by asking this question of Mongo users, you miss the audience of the
people successfully scaling up - for less money - who may belong to the group
NOT using Mongo.

Business is about costs and prices, not about the religious adoption of
Mongo, agile, C#, Python, Linux...

This question suggests that your business is ruled by the techies, not by
anyone with the common sense needed to pay the bills (and thus your
paychecks).

The feedback I have for you is that TCO per MB stored increases more than
linearly with MongoDB, along with - weirdly enough - a decrease in SLA (due
to the weird, unexpected incidents that are hard to troubleshoot), so if you
look at your probability of success, it is bad news. Customers expect costs
to diminish with volume, and thus prices (investors too). If the fat in your
business model is around document storage, you have a problem.

You should not trust me: do a real cost analysis of the cost per document for
your business case, with more than one scenario, and make some projections.
Even if someone else's optimization worked for them, it may not be relevant
to your business case: at some point you make money, so you issue a
transaction that cannot be handled in a distributed system (CAP theorem +
NoSQL == NoACID), and that point differs from one shop to the next.

Mongo could be - not is - killing your business. If you use Mongo because it
is the only technology you know, consider that you have been taken hostage,
and call SWAT to have your Mongo guru arrested. They may look nice and
harmless, but you are the hostage of dangerously fanatical, religious techs.
Maybe call an exorcist; it could be just as effective.

And then go back to thinking about the most efficient way to get your bills
paid, and how to reduce costs as your business grows. Forget technical stuff
as a compass for building successful software solutions. Success is about
profits only.

I'm just saying, eh?! My .02 eurocents.

~~~
mikegioia
This is an unbelievable waste of a comment and the OP should not even consider
this as feedback.

You completely ignore the question, make a comment about the "religious
adoption of Mongo", and then do an about-face by showing your OWN religious
zeal AGAINST Mongo.

Mongo is not killing his business. Your comment should be deleted.

To the OP's question: can you post any more specifics? 1M documents with an
index shouldn't be an issue for the aggregation pipeline. What are the specs
of the machine? What are your indexes? I think it would be tough to help
without that info, and sadly a question requiring that type of info may be
best suited for something like Stack Overflow.

~~~
Ronaldo777
SFjulie1's comment is a good one, mikegioia.

Taking a look at the economics of a technology that's being used is a very
sensible thing to do, especially for a business. If this analysis shows that
one solution is inferior to another, it does not mean that the analyst has a
"religious zeal" against the economically-inferior technology.

~~~
mikegioia
My entire point in this entire thread is to _for once_ keep the conversation
on topic about MongoDB without everyone going off on a tangent about
switching to Postgres or something else. I just want one post where we can
actually talk about MongoDB scaling.

~~~
Ronaldo777
Somebody came here asking for help with a real world problem.

It would be doing this person a disservice to not mention alternative
technologies that have solved very similar problems that have affected many
other people/organizations.

It is total absurdity to avoid bringing up these solutions to the problem
merely in order to maintain some vague purity of the discussion, or to
otherwise keep the discussion unnecessarily and unhelpfully focused.

~~~
mikegioia
I feel like you're just ignoring what I'm saying. It is NOT helpful to suggest
alternative technologies all the time, especially in this specific post.

If I asked on a forum for help with the engine on my Dodge Dart, and you
posted "Just buy a Prius", that _would not_ be helpful. Do you see how that
isn't helpful? That's what I'm trying to say: I would like to see, _for once_,
a post stay on topic when it's about MongoDB (heck, even PHP for that
matter!).

~~~
Ronaldo777
If using a Prius instead of a Dart would very likely solve the problem(s)
being experienced, then it is a perfectly reasonable thing to suggest. It
would be wrong not to suggest it.

This submission is about a problem involving MongoDB. It is not about MongoDB
itself. The submitter clearly wants a variety of ideas here. That's why the
submission says, and I quote, "Any tips, stories, suggestions are welcome!"

"Any" means "any", including the suggestion that some other technology be
used.

