

Hadoop 0.18 Highlights - jwilliams
http://developer.yahoo.com/blogs/hadoop/2008/09/hadoop_018_highlights_1.html

======
FiReaNG3L
Do you guys think Hadoop would be a good solution to parse and convert 16 GB
worth of XML files to SQL? I'm thinking of using EC2 to distribute this in the
cloud, as on my desktop it can take days.

~~~
jwilliams
What is the exact solution you have in mind? If it's a massively parallel
parsing operation, then maybe.

If you want it to parse the XML and put it into a database, probably not.
You're going to be bottlenecked by the database, so there's not much point in
increasing the parallelism of the parsing.

~~~
FiReaNG3L
I have 750 x 150 MB XML files. I just want to generate one huge .sql file and
import it all at once, which is much quicker.

~~~
jwilliams
I think you'll be constrained by the inserts - even if you use bulk insert.

You are probably better off using a bulk loader to get the data in -
SQL*Loader on Oracle. I believe MySQL has an equivalent.

You could use Hadoop to process the data into whatever intermediate format
(SQL or Bulk Data), but my instinct would be that getting the data
loaded/inserted will be the slowest part by far.
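
As a rough sketch of the "intermediate format" step, here is how one XML file
could be streamed into tab-separated rows for a bulk loader. The element name
`record` and the fields `id` and `name` are made up for illustration, since
the real schema isn't given here, and the quoting rules would need to match
whatever your loader expects.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_tsv(xml_source, out, record_tag="record", fields=("id", "name")):
    """Stream records out of xml_source and write one TSV row per record.

    record_tag and fields are hypothetical; adjust them to the real schema.
    Returns the number of rows written.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    # iterparse yields each element when its closing tag is seen, so the
    # whole document is never parsed into memory in one go.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow([elem.findtext(f, default="") for f in fields])
            count += 1
            # Drop the children of the record we just wrote out.
            elem.clear()
    return count


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<root>"
        b"<record><id>1</id><name>a</name></record>"
        b"<record><id>2</id><name>b</name></record>"
        b"</root>"
    )
    buf = io.StringIO()
    xml_to_tsv(sample, buf)
    print(buf.getvalue(), end="")
```

Each input file is independent, so a conversion like this could run as a
map-only job (for example via Hadoop Streaming) or simply as one process per
core. Either way it only speeds up the parsing; the load into the database is
still a separate step.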

