Hacker News new | past | comments | ask | show | jobs | submit login

Definitely the kind of thing you use a dedicated replica for

We had a little MySQL db, and both the data and the different systems consuming it grew quite rapidly, faster than we could get ahead of given company priorities. We have a read-replica for the BI dashboarding system, and this keeps our world relatively stable and reliable.

Really not a replica. That is a fine shim for early days but it means you are severely limited to what you can do with reporting based on the data structure prod has. A better pattern is prodDB->Kafka/kinesis streams to a reporting DB like redshift/snowflake/big query. That way you can shape the data however you need, and it lets data teams avoid bogging down engineering.

That’s putting the cart miles ahead of the horse at small-medium organizations. SQL scales a long way. It’s not a shim - it’s the best way to do business until the technology is a limit for you. Modern RDMS can go a long way.

Scale enough to have people dedicated to building and maintaining data lakes is a late stage problem. Who’s going to go build and maintain that reshaping of data?

I guess I really value data, especially for early stage companies trying to understand users and find fit. I don’t think the DB needs to be a dedicated analyticsDB, MySQL and especially Postgres work great for analytics. My issue is with read replicas. In most cases it doesn’t make sense to force the prod DB to have an analytics friendly schema for the replica to use. Making all those views and interesting in them as important business questions come up shouldn’t require a production DB migration.

That said I’m helping an early stage company and an AWS read replica plus Metabase is meeting most of our needs fine for today. We’ll probably start pushing events to bigquery soon so we can make some metrics that would otherwise take crazy joins and sub queries.

Most early stage companies will be writing queries directly against OLTP tables - which is why a read-only replica of your master DB is the safest/fastest option.

There’s options though yeah? Materialized views are a great fit for data reshaping for such things.

> prodDB->Kafka/kinesis streams to a reporting DB like redshift/snowflake/big query

Over engineer much? I've worked at trading and advertising analytics firms that had less engineering

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact