
Crunch – Go-based toolkit for ETL and feature extraction on Hadoop - jondot
https://github.com/jondot/crunch
======
rectang
That's an unfortunate name for a Hadoop-related project.

[https://crunch.apache.org/](https://crunch.apache.org/)

------
HilaPeleg
Looks cool, how's the performance in Go compared to other tools like Python or
Ruby streaming?

~~~
pjmlp
What about compared with Java, a language with a proper native compiler?

~~~
jondot
Unfortunately the standard Java map/reduce interface will win out because the
IPC is more efficient. But then doing ETL that way becomes clumsy and
painfully slow. The idea is to use rapid tools to do the same job. Streaming
is then one step higher on the happiness scale because you can just use Unix
pipes to test your solution (and you can keep piping other tools, as well as
piping Crunch processors, over and over again).

------
bsg75
Is this useful for load targets other than Hadoop? RDBMS for example?

~~~
jondot
Actually yes! Because I'm using Pig to "abstract" I/O, you can use Pig to load
data from anywhere (any format), stream it into Crunch, and then use Pig again
to save the data anywhere (any format).

------
omerh
Just tested it: so easy to use, and it works great! Great stuff.

~~~
jondot
Enjoy! :)

~~~
omerh
Do you have an example of doing ip2location of some kind? We do it today in an
unoptimized way, so an example would be helpful.

~~~
jondot
Sure, will prepare something and push it soon
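
In the meantime, one common way to do this kind of lookup is a binary search over a sorted table of IPv4 ranges. The sketch below is only an illustration under stated assumptions, not the Crunch example mentioned above: the ipRange type, the lookup function, the reserved documentation address blocks, and the "AA"/"BB"/"CC" labels are all invented placeholders, and a real job would load its ranges from an actual IP database.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"sort"
)

// ipRange is a hypothetical table row: an inclusive IPv4 range and a label.
type ipRange struct {
	start, end uint32
	label      string
}

// ipToUint32 converts a dotted IPv4 string to its 32-bit integer value.
func ipToUint32(s string) (uint32, bool) {
	ip := net.ParseIP(s)
	if ip == nil {
		return 0, false
	}
	v4 := ip.To4()
	if v4 == nil {
		return 0, false
	}
	return binary.BigEndian.Uint32(v4), true
}

// mustRange builds a table row and panics on a malformed address.
func mustRange(start, end, label string) ipRange {
	s, ok1 := ipToUint32(start)
	e, ok2 := ipToUint32(end)
	if !ok1 || !ok2 {
		panic("bad range: " + start + " - " + end)
	}
	return ipRange{start: s, end: e, label: label}
}

// sampleRanges must be sorted by start and must not overlap.
// The blocks are reserved documentation ranges; the labels are placeholders.
var sampleRanges = []ipRange{
	mustRange("192.0.2.0", "192.0.2.255", "AA"),
	mustRange("198.51.100.0", "198.51.100.255", "BB"),
	mustRange("203.0.113.0", "203.0.113.255", "CC"),
}

// lookup finds the first range whose end is >= the address, then checks
// that the address is not below that range's start.
func lookup(ranges []ipRange, s string) (string, bool) {
	n, ok := ipToUint32(s)
	if !ok {
		return "", false
	}
	i := sort.Search(len(ranges), func(i int) bool { return ranges[i].end >= n })
	if i < len(ranges) && ranges[i].start <= n {
		return ranges[i].label, true
	}
	return "", false
}

func main() {
	for _, ip := range []string{"198.51.100.7", "10.0.0.1"} {
		label, ok := lookup(sampleRanges, ip)
		fmt.Println(ip, label, ok)
	}
}
```

Each lookup costs O(log n) comparisons once the table is in memory, which is usually the main win over scanning the ranges linearly for every record.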

