
HotelTonight's ETL Pipeline Built w/ Iron.io and Redshift - carimura
http://engineering.hoteltonight.com/ruby-etl-with-ironworker-and-redshift
======
Throwadev
Can you give any detail on what the end result is? What kind of data do you
end up with in Redshift, and what kind of queries? What kind of data do you
extract from Mixpanel?

Is all this running continuously, or on different schedules for each worker?
Are any of these event-based rather than schedule-based?

~~~
harlow
> Can you give any detail on what the end result is? What kind of data do you
> end up with in redshift, and what kind of queries?

I'll have to get our BI team to create a follow-up post.

In a lot of ways it's the most interesting part of this project. I'm not
entirely sure what data we can share, but I'll push to get something out.

> What kind of data do you extract from mixpanel?

The mobile devices push user interactions with the app as events to Mixpanel.
We pull that data daily into Redshift, and this allows us to run historical
reports and discover patterns of user behavior within the mobile app.

> Is all this running continuously, or different schedules for each worker?
> Any of these event based rather than schedule based?

The Extractors are schedule-based. With Mixpanel, for example, we do a daily
dump around 4am (once all the Mixpanel data is available for export).
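To make the scheduling rule concrete, here is a rough sketch of how a daily extractor might decide which day to dump. It assumes the dump covers the previous calendar day in UTC and that 04:00 UTC is the "data is ready" cutoff; both are assumptions, and `daily_export_window` is a hypothetical helper, not HotelTonight's code or a Mixpanel API.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

# Hypothetical cutoff: "around 4am", once the export data is available.
EXPORT_READY_AT = time(4, 0)


def daily_export_window(now: datetime) -> Tuple[date, date]:
    """Return (from_date, to_date) for one daily dump.

    Assumption: each run exports the previous calendar day (UTC) and refuses
    to run before the cutoff, when that day's data may still be incomplete.
    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    if now.time() < EXPORT_READY_AT:
        raise RuntimeError("export data not ready yet; run after 04:00 UTC")
    day = now.date() - timedelta(days=1)
    return day, day


window = daily_export_window(datetime(2013, 4, 10, 4, 30, tzinfo=timezone.utc))
print([d.isoformat() for d in window])  # ['2013-04-09', '2013-04-09']
```

Keeping the window calculation separate from the scheduler means a missed or re-run job can be replayed for any day by passing an explicit timestamp.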

Our Rails events are pushed to IronMQ, and the scheduler kicks off workers
every 15 minutes to pull them off the queue.
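The queue-plus-scheduler pattern can be sketched in memory: events pile up between runs, and each scheduled run drains whatever is queued in fixed-size batches. Here a `collections.deque` stands in for IronMQ, and `drain_batch`, `scheduled_pull`, and `MAX_BATCH` are hypothetical names. A real queue consumer would typically also have to acknowledge or delete each message after processing it, which this sketch leaves out.

```python
from collections import deque
from typing import Any, Deque, List

MAX_BATCH = 100  # hypothetical number of messages pulled per request


def drain_batch(queue: Deque[Any], max_batch: int = MAX_BATCH) -> List[Any]:
    """Pull up to max_batch events off the front of the queue (FIFO order)."""
    batch: List[Any] = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    return batch


def scheduled_pull(queue: Deque[Any], max_batch: int = MAX_BATCH) -> List[int]:
    """One scheduled run (e.g. every 15 minutes): drain everything queued so far.

    Returns the size of each batch pulled; handing each batch to the next
    ETL step is left as a comment because that part is application-specific.
    """
    sizes: List[int] = []
    while queue:
        batch = drain_batch(queue, max_batch)
        # Hypothetical: pass `batch` on to the transform step here.
        sizes.append(len(batch))
    return sizes


# Simulate the Rails app enqueueing 250 events between two scheduled runs.
events: Deque[Any] = deque({"id": i} for i in range(250))
print(scheduled_pull(events))  # [100, 100, 50]
```

Draining to empty on each run keeps the backlog from growing when traffic spikes, while the batch size bounds how much a single request pulls.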

The `Transformers` and `Loaders` are event-based: the Extractors kick them off
once they have completed their work.
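A minimal sketch of that chaining: only the extractor is scheduled, and each step triggers the next one when it finishes. On Iron.io this would presumably be one worker queueing the next worker's task; here plain callbacks stand in for that, a list stands in for Redshift, and `extract`, `transform`, `load`, and the record fields are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def load(rows: List[Record], warehouse: List[Record]) -> None:
    """Loader: append transformed rows to the target (a list stands in for Redshift)."""
    warehouse.extend(rows)


def transform(raw: List[Record], on_done: Callable[[List[Record]], None]) -> None:
    """Transformer: normalize raw events, then trigger the next step."""
    rows = [{"event": r["name"].lower(), "user_id": int(r["uid"])} for r in raw]
    on_done(rows)


def extract(source: List[Record], on_done: Callable[[List[Record]], None]) -> None:
    """Extractor (the only scheduled step): read raw events, then kick off the next step."""
    raw = list(source)
    on_done(raw)


warehouse: List[Record] = []
source = [{"name": "App_Open", "uid": "42"}, {"name": "Search", "uid": "7"}]

# Each step runs only when the previous one has completed its work.
extract(source, lambda raw: transform(raw, lambda rows: load(rows, warehouse)))
print(warehouse)  # [{'event': 'app_open', 'user_id': 42}, {'event': 'search', 'user_id': 7}]
```

Because downstream steps are triggered by completion rather than by their own schedules, a slow extract never leaves a transformer running against partial data.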

------
stephenitis
Pretty great example of using IronWorker to perform an ETL with many complex
pieces.

It is extremely nice to be able to use git to version your workers separately
from your greater application on the Iron.io platform.

------
zman831
Why Iron.io over cron?

~~~
carimura
Iron.io has three functions: the scheduler, the queue, and the processing/worker
platform. Cron would only cover the scheduler part, and you'd still be left
dealing with where to host the workers, how to scale them, how to manage and
monitor them, etc.

IronWorker (and IronMQ) automates a lot of this operational overhead and makes
it easy to just focus on your worker code.

Also, when you need to scale, you can have hundreds, even thousands, of
workers running in parallel.

------
kholmes79
What about dependencies in the workers?

~~~
carimura
I can't speak for HT specifically, but dependency management is fairly
straightforward. You can include any gems or libraries in your worker code
package. If they contain native extensions, we compile and package them at
upload time, and that build is used every time the worker runs until you
change the code and re-upload.

More information here:
[http://dev.iron.io/worker/reference/builds/](http://dev.iron.io/worker/reference/builds/)

