
Not sure if you have used it, but Spark is exceptionally good at data movement.

In fact, that is what a lot of people initially started using it for (as a replacement for Hive/Pig). You can write SQL against HCatalog tables, do some transformation work, then write the results out to a different system. We have hundreds of jobs that do just this.
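
For anyone who hasn't seen one of these jobs, here is a minimal PySpark sketch of the pattern: SQL against a Hive/HCatalog table, a small transformation, then a write to another system. The table name, columns, partition key `ds`, JDBC URL, and target table are all hypothetical, and the JDBC sink is just one possible target. Treat it as an illustration under those assumptions, not as how any particular production job is written.

```python
def build_query(table: str, ds: str) -> str:
    """Build the extraction query for one daily partition.

    The columns and the `ds` partition key are hypothetical. A real job
    should validate `table` and `ds` rather than interpolate raw input.
    """
    return (
        "SELECT user_id, event_type, amount "
        f"FROM {table} WHERE ds = '{ds}'"
    )


def run_job(spark, source_table: str, ds: str, jdbc_url: str, target_table: str) -> None:
    """Read from a metastore table, aggregate, and append to a JDBC table."""
    # Imported lazily so build_query() can be used without Spark installed.
    from pyspark.sql import functions as F

    # Extract: plain SQL against a table registered in the Hive metastore.
    events = spark.sql(build_query(source_table, ds))

    # Transform: drop non-positive amounts and total them per user and event type.
    totals = (
        events.filter(F.col("amount") > 0)
        .groupBy("user_id", "event_type")
        .agg(F.sum("amount").alias("total_amount"))
    )

    # Load: append the result to a table in a different system over JDBC.
    # The matching JDBC driver must be on the Spark classpath.
    (
        totals.write.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", target_table)
        .mode("append")
        .save()
    )


def main() -> None:
    """Entry point a spark-submit script would call (all values are placeholders)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("events-etl")
        .enableHiveSupport()  # lets spark.sql() resolve metastore tables
        .getOrCreate()
    )
    try:
        run_job(
            spark,
            source_table="analytics.events",
            ds="2016-01-01",
            jdbc_url="jdbc:postgresql://db.example.com/reporting",
            target_table="daily_totals",
        )
    finally:
        spark.stop()
```

Keeping the query builder separate from the Spark calls means the SQL can be unit-tested without a cluster, which helps once you have many jobs of this shape.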




Well, I guess that is the power of it being so general purpose. I have used Spark (and Spark SQL) more for analytics, but not extensively for ETL. What you're saying makes sense: you're still using Spark as an execution/computation engine, just writing the plumbing code to use it as an intermediary ETL tool.



