Ask HN: What is the best way to dump mongo data into Hadoop?
2 points by misiti3780 on Aug 13, 2012 | 4 comments
I have a large amount of data in a MongoDB database and I want to use Apache Hadoop (not Mongo's built-in map-reduce) to analyze it. Does anyone have suggestions, tutorials, etc. on the best way to do this (i.e., export the Mongo data to HDFS)?
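
For example, is the simplest workable route just something like the sketch below (calling mongoexport and the hadoop CLI from Python; the database, collection, and path names are only placeholders), or is there a more standard tool for this?

    # Rough sketch of the most direct route I can think of:
    # dump a collection to newline-delimited JSON with mongoexport,
    # then copy the flat file into HDFS with the hadoop CLI.
    # Assumes mongoexport and hadoop are on the PATH; all names are placeholders.
    import subprocess

    LOCAL_FILE = "/tmp/events.json"

    # mongoexport writes one JSON document per line by default.
    subprocess.check_call([
        "mongoexport",
        "--db", "mydb",
        "--collection", "events",
        "--out", LOCAL_FILE,
    ])

    # Push the file into HDFS so MapReduce jobs can read it as text input.
    subprocess.check_call(["hadoop", "fs", "-put", LOCAL_FILE, "/data/events.json"])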



I was at a meetup where the Foursquare data science team spoke about this problem. If I recall correctly, their solution was to have jobs that pull the data out of Mongo and store it in flat files, which the Hadoop jobs then read. They found the performance gain was worth the additional storage cost. They have a pretty well-defined Hadoop process, though, so they were able to optimize for it. If you plan on running a wide variety of Hadoop jobs it may not make as much sense.

Note that this information may be outdated so just treat it as a data point. I'm sure others will have better ideas.
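
If it helps, here is a minimal sketch of that flat-file approach (pymongo plus the hadoop CLI; the collection name, timestamp field, and HDFS layout are made up for illustration):

    # Minimal sketch of the flat-file approach described above: a periodic
    # job that dumps one day's worth of documents to a local JSON file and
    # copies it into a date-partitioned HDFS directory for later Hadoop jobs.
    # Assumes pymongo; "mydb", "events", and "created_at" are placeholders.
    import subprocess
    from datetime import datetime, timedelta

    from bson import json_util          # serializes ObjectId, dates, etc.
    from pymongo import MongoClient

    def export_day(day):
        start = datetime(day.year, day.month, day.day)
        end = start + timedelta(days=1)
        events = MongoClient("mongodb://localhost:27017")["mydb"]["events"]

        local_path = "/tmp/events-%s.json" % start.strftime("%Y-%m-%d")
        with open(local_path, "w") as out:
            for doc in events.find({"created_at": {"$gte": start, "$lt": end}}):
                out.write(json_util.dumps(doc) + "\n")

        # One HDFS directory per day keeps Hadoop input paths easy to select.
        hdfs_dir = "/data/events/dt=%s" % start.strftime("%Y-%m-%d")
        subprocess.check_call(["hadoop", "fs", "-mkdir", "-p", hdfs_dir])
        subprocess.check_call(["hadoop", "fs", "-put", local_path, hdfs_dir])

    if __name__ == "__main__":
        # Export yesterday's data each time the job runs.
        export_day(datetime.utcnow() - timedelta(days=1))

Each dt= directory can then be handed to a Hadoop job as an input path.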


I was looking at this presentation, but I can't really make sense of the two slides on "BSON Data ..."

http://engineering.foursquare.com/2012/06/22/our-hadoop-stac...



Should be trivial since document stores are less relational and the data should be relatively isolated.

You really just need to learn the subject matter; there is no magic wand for loading data from one into the other.

You understand one, then you understand the other, then you understand how to port and grapple with the data.

Just start reading.



