I work at a small (~100 person) bootstrapped company. We are hosted almost exclusively on AWS, and we use RDS for most of our transactional DB needs. Up until now, our analytics has been running on a combination of Metabase (hooked up to our RDS databases) and internal admin reporting. It works, but its all a bit clunky and we're outgrowing its capabilities. At this point in time, this is for internal (i.e. management) analytics only.
We are at the stage now we we need to start doing it "properly". I have a reasonably good understanding of tech in general, including on the infra, database and sql side. I have also done a fair bit of research into data warehousing, ELT/ETL pipelines, etc.
To start off with, I'd like to get a proof of concept up and running, just by extracting data from our RDS databases (we don't have a HUGE amount of data - few hundred million records spread over 2 DBs). There are other data sources that we'd like to connect to, but that can come later.
We would prefer AWS or self hosted, but we are open to any possibilities (Google, Fivetran, Snowflake, etc).
I guess there are 3 pieces to the puzzle:
- Data warehouse (I almost think that we should just be pushing this all to another RDS database (and denormalise, aggregate, etc where appropriate).
- ETL/ELT
- Reporting (Metabase, Superset, Quicksight?)
We're looking for the simplest option that we can use to get up and running, but also have a clear path to growing with us over time. Do you have any suggestions on the best way to get started?
100s of millions of rows is relatively small for ClickHouse, so you don't need to worry about advanced optimizations like compression codecs or materialized views. Standard datatypes and the default LZ4 compression should be just fine. Y
Here's quickstart video that will teach you most of what you need to get started: https://www.youtube.com/watch?v=phTu24qCIw0
I work on ClickHouse at Altinity and created the education video. I'm kind of biased. ClickHouse is not the only OSS data warehouse. There is also Druid and Pinot. I think, however, for this particular case where you are pulling from RDS that ClickHouse will turn out to be a good solution.
Good luck whatever approach you choose!