
ETL for Developers - jumpingdeeps
https://blog.stitchdata.com/introducing-stitch-etl-for-developers-f05943b2dc91#.i38e1tf68
======
tomlock
>nobody enjoys writing and maintaining data pipelines or ETL. It’s the
industry’s ultimate hot potato. It really shouldn’t come as a surprise then
that ETL engineering roles are the archetypal breeding ground of mediocrity.

Bullshit. I enjoy it.

Yes, I'm a mediocre programmer, but I love building ETL. I'm also great at
dinner parties.

------
asavinov
The list of connectors is quite impressive, which is very important for data
integration. But I could not find what kind of data transformation operators
are available for building an ETL or data migration workflow.

~~~
cm
I'm an engineer at Stitch. Our approach to transformation is to do just enough
to move data from one system to another without losing precision or fidelity.
So, we transform datatypes and structures into more appropriate forms for the
target system, but we don't have any transformation operators like aggregation
or windowing.
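
To illustrate what "transforming datatypes and structures" without losing precision could look like, here is a minimal sketch. It is not Stitch's actual implementation: the `flatten` helper, the `__` column separator, and the sample record are all assumptions made up for this example.

```python
import json
from decimal import Decimal


def flatten(record, parent_key="", sep="__"):
    """Flatten nested dicts into one level of column-like keys (hypothetical helper)."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so nested objects become prefixed columns.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Made-up source record, as it might arrive from a SaaS API.
raw = '{"id": 7, "amount": 19.99, "customer": {"name": "Ada", "address": {"city": "London"}}}'

# parse_float=Decimal keeps the decimal text exact instead of rounding to a binary float.
record = json.loads(raw, parse_float=Decimal)
row = flatten(record)
```

The structure changes to fit a columnar target, but no values are aggregated or dropped, so any later transformation still has the original data to work from.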

We have found that this approach works well for our users, who prefer to get
the rawest possible data, and for the systems we target, like Redshift, which
are themselves powerful transformation engines. This gives the user unlimited
flexibility for defining transformations, and a full audit trail for
understanding how their data has changed.

We are always evolving, though, so if there's a use case that you think
requires this approach, I would be eager to hear more about it.

~~~
Eridrus
Do you have any sort of SDK for adding integrations that you do not support?

While this looks super useful if you support all of the integrations someone
needs, it seems like the moment that's not the case someone needs to maintain
a complete ETL pipeline for those data sources you don't support, and their
load is only reduced by the fact that they have to maintain fewer data
sources.

~~~
cm
We do have an API for sending data into the Pipeline. Documentation for it can
be found here:
https://docs.stitchdata.com/hc/en-us/categories/203326787-Import-API

Additionally, we'll be releasing a Java client library any day now, with other
languages and platforms to follow.

------
vardump
Seems to be just an advertisement...

------
doug1001
This might be valuable technology, but as far as I can tell from reading the
OP and from a look through the repo, this is _not_ ETL--despite the fact that
"ETL" is in the blog post's title, despite the fact that the technology is
described there as "a fully managed ETL service", and despite the fact that
the term is used over a dozen times in the short blog post. As everyone here
knows, ETL is "extract transform load"; this technology is directed to a
_portion_ of the first of those three, "extract", and indeed only a portion,
because the sole (?) focus seems to be on providing various connectors for
data access from various sources. Stitch doesn't, for instance, provide a
query language API so that you can write a single query that works across
multiple data sources.

Granted, I think providing a library of connectors for many diverse data
sources--e.g., Salesforce CRM, Google Analytics, MySQL--is valuable, but it's
not ETL; it's not even "E", just part of it. What's more, the functionality
Stitch does have is, in my experience, far from the most difficult or
time-consuming component to build; "Transform" is.

~~~
cm
It's accurate to say that we're more focused on the extraction and loading
parts of ETL. In our experience, almost all useful data analysis requires data
to be transformed at multiple stages - for example, once to cleanse and
normalize the data, and again to aggregate it. Our goal is to leave the data
in the rawest form possible without losing accuracy or precision, so that
further transformation can occur, likely using SQL. We recognize this isn't
perfect for every use case, but our customers love it for getting rapid access
to data that would otherwise be locked away in SaaS applications and
transactional databases.
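
As a rough sketch of the multi-stage idea (cleanse and normalize first, then aggregate, all in SQL over raw loaded data), the example below uses an in-memory SQLite database as a stand-in for a warehouse like Redshift. The table, view, and column names are hypothetical and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw data, loaded as-is from the source system.
    CREATE TABLE raw_orders (id INTEGER, email TEXT, amount_cents INTEGER);
    INSERT INTO raw_orders VALUES
        (1, ' Ada@Example.com ', 1999),
        (2, 'ada@example.com', 501),
        (3, 'bob@example.com', 1200);

    -- Stage 1: cleanse and normalize. The raw table is left untouched.
    CREATE VIEW clean_orders AS
        SELECT id, lower(trim(email)) AS email, amount_cents
        FROM raw_orders;

    -- Stage 2: aggregate on top of the cleansed view.
    CREATE VIEW customer_totals AS
        SELECT email, SUM(amount_cents) AS total_cents, COUNT(*) AS order_count
        FROM clean_orders
        GROUP BY email;
""")

totals = conn.execute(
    "SELECT email, total_cents, order_count FROM customer_totals ORDER BY email"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

Because each stage is a view over the one before it, the raw rows stay available for auditing and either stage can be redefined later without reloading anything.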

