I have a large amount of data in a MongoDB database and I want to use Apache Hadoop (not Mongo's built-in map-reduce) to analyze it. Does anyone have suggestions/tutorials/etc. on the best way to do this (i.e., exporting the Mongo data to HDFS)?
I was at a meetup where the Foursquare data science team spoke about this problem. If I recall correctly, their solution was to have jobs that pulled the data out of Mongo and stored it in flat files, which the Hadoop jobs then consumed. They found the performance gain was worth the additional storage cost. They had a pretty well-defined Hadoop process, though, so they were able to optimize for it; if you plan on running a wide variety of Hadoop jobs, it may not make as much sense. A minimal sketch of what such an export job might look like is below.
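This is just a sketch of the general flat-file approach, not Foursquare's actual pipeline: it assumes pymongo, a hypothetical `foursquare.checkins` collection, and a local MongoDB instance. It dumps each document as one JSON line so Hadoop's `TextInputFormat` can split the file.

```python
from bson import json_util
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["foursquare"]["checkins"]  # hypothetical db/collection

with open("checkins.json", "w") as out:
    for doc in collection.find():
        # json_util handles BSON types (ObjectId, dates, etc.) that
        # plain json.dumps would choke on
        out.write(json_util.dumps(doc) + "\n")

# Then push the flat file into HDFS for the Hadoop jobs to consume, e.g.:
#   hadoop fs -put checkins.json /data/checkins/
```

For large collections you'd want to shard the export (e.g., one file per `_id` range) so multiple mappers can read in parallel, but the one-document-per-line format is the key idea.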
Note that this information may be outdated, so just treat it as a data point. I'm sure others will have better ideas.