
Does my Startup Data Team Need a Data Engineer - asragab
https://blog.fishtownanalytics.com/does-my-startup-data-team-need-a-data-engineer-b6f4d68d7da9
======
asragab

      At this point a pipeline built on top of Stitch / Fivetran / 
      dbt is far more reliable than one built on top of custom- 
      built Airflow tasks.
    

I'd be curious if anyone who has used or integrated these products into their
infrastructure could verify or comment on whether they are as effective as the
author seems to suggest.

    
    
      If you hire a data engineer who just wants to muck around in 
      the backend and hates working with less-technical folks, 
      you’re going to have a bad time.
    

I'm not sure this was the intent, but I found this somewhat dismissive. I
think communication skills are indeed important, as is being able to
effectively explain technical considerations to less-technical parties (or
parties whose technical expertise is not aligned with Data Engineering). But in
my own experience I have encountered data scientists who actively disregard
those considerations, treating them as orthogonal to their needs at best or, at
worst, as details they cannot be bothered with. This is underscored by the
notion that we, as Data Engineers, "muck around in the backend." We do, and we
have to, and it helps to like it.

There are a few other areas of input and contribution that a good data
engineering team can provide that I don't think get enough attention in the
post:

    
    
      1) Machine Learning Productionization
      2) Being a source of data expertise (consulting) with other 
         developers (working on services or the main product) in 
         the organization
    

Regarding 1, while the author seems convinced that the ELT/ETL tooling and
ingestion pipeline building can be taken off-the-shelf, I don't know that
equally mature tooling exists for machine learning model
deployment/integration, though I believe that is changing, slowly.

