

Realtime Analytics with MongoDB - jrosoff
http://www.slideshare.net/jrosoff/scaling-rails-yottaa

======
rrival
I liked this one as well:

[http://www.slideshare.net/jrosoff/scalable-event-analytics-w...](http://www.slideshare.net/jrosoff/scalable-event-analytics-with-mongodb-ruby-on-rails)

------
nessence
I'm working on a system which is similar but higher volume.

Have you done any benchmarks to test thousands of updates per second?

Same question, but for the front-end: what is the impact of generating 10
reports per second for 2 hours? Do the writers get behind?

You won't have scaling issues until the front-end hits some threshold of x
queries per y updates, with x servers.

Good presentation on another application of mongo.

~~~
jrosoff
We have hit thousands of updates per second on our current system during some
high-load periods and did not see any problems. Our steady state is hundreds
per second, but it bursts to thousands for extended durations about once per
week, if not more often.

10 reports per second is actually not that much load and has almost no impact
on writers. We have an alerting system that runs while data is input to the
system. It effectively loads a report for each metric reported in the input
and decides whether or not to send an alert. That system queries about 50
reports per second on an ongoing basis and does not impact the
writers. Our read volume in steady state is about 2x our write volume.
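The per-metric check described above might look something like this. This is a minimal sketch; `should_alert`, the report shape, and the threshold values are all hypothetical illustrations, not details from the actual system:

```python
def should_alert(report, thresholds):
    """Hypothetical per-metric check: compare each metric in an
    incoming report against its configured threshold and collect
    the metrics that breach it."""
    return [metric for metric, value in report.items()
            if metric in thresholds and value > thresholds[metric]]

# One report's metrics vs. illustrative alert thresholds.
alerts = should_alert(
    {"load_time_ms": 4200, "error_rate": 0.001},
    {"load_time_ms": 3000, "error_rate": 0.05})
# → ["load_time_ms"]
```

Because the report is itself just a single document read, running one such check per incoming metric stays cheap relative to the write path.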

We have not seen any queueing problems on writes, and the lock ratio in
MongoDB is typically in the 0.005-0.01 range.
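For context, the usual way to sustain write rates like this in MongoDB is pre-aggregated counters updated with atomic `$inc` upserts, so each event is a single in-place write. A minimal sketch of that pattern, with illustrative field names (not taken from the talk), building the filter/update pair a driver would send:

```python
from datetime import datetime, timezone

def minute_bucket_update(url, metric, value, ts):
    """Build the (filter, update) pair for an upserted per-minute
    counter document. Pre-aggregating one document per URL per
    minute turns each event into a single atomic $inc."""
    bucket = ts.replace(second=0, microsecond=0)
    filter_doc = {"url": url, "minute": bucket}
    update_doc = {"$inc": {
        "metrics.%s.count" % metric: 1,
        "metrics.%s.total" % metric: value,
    }}
    return filter_doc, update_doc

f, u = minute_bucket_update(
    "http://example.com/", "load_time", 420,
    datetime(2010, 7, 1, 12, 34, 56, tzinfo=timezone.utc))
# With pymongo this would be applied as:
#   collection.update_one(f, u, upsert=True)
```

Reports then read back a handful of these pre-aggregated documents instead of scanning raw events, which is why read load stays low relative to writes.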

We have found that we can break this by running lots of map-reduce jobs
simultaneously while processing high write volume but that's a whole other
ball of wax.

Our data access patterns very easily accommodate sharding. Both reads and
writes are pretty evenly distributed across the set of URLs we track. By
activating sharding with the URL as the shard key, we feel we can handle scaling
several orders of magnitude beyond where we are now without anything more than
additional hardware (or virtual machines).
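The reason an evenly distributed key works so well can be seen with a toy routing sketch. This is illustrative only, hash-based routing in pure Python; MongoDB's own balancer splits the shard-key range into chunks rather than hashing, but the load-spreading intuition is the same:

```python
import hashlib
from collections import Counter

def shard_for(url, num_shards):
    """Illustrative only: route a URL to a shard by hashing it.
    A key whose values are spread evenly across documents spreads
    both reads and writes evenly across shards."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# 10,000 distinct URLs land almost uniformly on 4 shards.
urls = ["http://site-%d.example.com/" % i for i in range(10000)]
load = Counter(shard_for(u, 4) for u in urls)
```

A skewed key (say, timestamp) would concentrate all current writes on one shard; a per-URL key avoids that hot spot, which is why adding hardware alone scales the system.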

I'd love to hear how your system scaling goes. Feel free to hit me up via
email if you want to discuss (jrosoff AT yottaa.com)

------
SanjayUttam
If you like this, you may want to check out Hummingbird (Node + Mongo):

<http://webpulp.tv/post/757442457/hummingbird-michael-nutt>

~~~
jrosoff
Hummingbird is awesome, both as a tool and as a case study. We learned a lot
from Hummingbird that we incorporated into the design of our system.

------
rgrieselhuber
Slides 11 and 12 were interesting because they are close to the same solution,
but 11 looks more complicated with Voldemort et al. in the mix. HBase with
Hadoop seems like another good alternative not mentioned.

~~~
jrosoff
Yeah slide 11 depicted what we thought would be a great solution before we
started investigating MongoDB. MongoDB effectively replaced all those other
systems for us and was _significantly_ easier to set up and develop against.

A few people have mentioned HBase as an alternative. We did not consider HBase
at the time we were making our architecture choices, however if we were
starting today, we'd probably look at it too. My first impression of HBase is
that it lacks the level of documentation and community support that MongoDB
has. I am definitely going to dig in some more to see how it would
compare. That being said, we're totally happy with our choice of MongoDB and
would recommend it to anybody considering HBase.

