Hello!
I just got a new job as a "platform" engineer, which looks like it's going to involve a whole bunch of data engineering (i.e., getting data from point A --> Redshift/etc.). I come from primarily a webops/devops/sysadmin background, so I've been feeling a little out of my depth on "what's all this data stuff and how does it fit together?" I'd welcome any good resources you may have.
Thanks!
What has worked best for me is using BigQuery and building custom tooling around it. Airflow, some simple Python tools, .json.gz archives, and a mix of streaming inserts and hourly/daily exports have worked well and haven't been a huge operational burden. Of course, every situation is different; YMMV.
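To make the .json.gz part concrete, here's a minimal sketch (stdlib only) of the kind of "simple Python tool" I mean: it writes records as gzipped newline-delimited JSON, which is the format BigQuery load jobs accept. The function name, field names, and file naming scheme are just made up for the example.

```python
import gzip
import json

def write_ndjson_gz(records, path):
    """Write records as gzipped newline-delimited JSON
    (BigQuery's NEWLINE_DELIMITED_JSON source format)."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in records:
            # One compact JSON object per line -- no pretty-printing,
            # since BigQuery expects exactly one record per line.
            f.write(json.dumps(rec, separators=(",", ":")) + "\n")

# Hypothetical example: an hourly batch of events
events = [
    {"user_id": 1, "event": "login", "ts": "2024-01-01T00:00:00Z"},
    {"user_id": 2, "event": "click", "ts": "2024-01-01T00:01:00Z"},
]
write_ndjson_gz(events, "events-2024010100.json.gz")
```

From there a load is roughly `bq load --source_format=NEWLINE_DELIMITED_JSON dataset.table events-2024010100.json.gz` (plus a schema), and Airflow just schedules the export/load steps hourly or daily.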