Hacker News new | past | comments | ask | show | jobs | submit login

Yo, why are you doing BI queries on MySQL?

I'm no fan of MySQL either, but probably because they were already using MySQL and they had some Business questions they wanted answered Intelligently without setting up a bunch of new infrastructure. Sometimes at a startup you just need to get things done, and fix it later when (if) it becomes a pain point.

Definitely the kind of thing you use a dedicated replica for

We had a little MySQL db, and both the data and the different systems consuming it grew quite rapidly, faster than we could get ahead of given company priorities. We have a read-replica for the BI dashboarding system, and this keeps our world relatively stable and reliable.

Really not a replica. That is a fine shim for early days but it means you are severely limited to what you can do with reporting based on the data structure prod has. A better pattern is prodDB->Kafka/kinesis streams to a reporting DB like redshift/snowflake/big query. That way you can shape the data however you need, and it lets data teams avoid bogging down engineering.

That’s putting the cart miles ahead of the horse at small-medium organizations. SQL scales a long way. It’s not a shim - it’s the best way to do business until the technology is a limit for you. Modern RDMS can go a long way.

Scale enough to have people dedicated to building and maintaining data lakes is a late stage problem. Who’s going to go build and maintain that reshaping of data?

I guess I really value data, especially for early stage companies trying to understand users and find fit. I don’t think the DB needs to be a dedicated analyticsDB, MySQL and especially Postgres work great for analytics. My issue is with read replicas. In most cases it doesn’t make sense to force the prod DB to have an analytics friendly schema for the replica to use. Making all those views and interesting in them as important business questions come up shouldn’t require a production DB migration.

That said I’m helping an early stage company and an AWS read replica plus Metabase is meeting most of our needs fine for today. We’ll probably start pushing events to bigquery soon so we can make some metrics that would otherwise take crazy joins and sub queries.

Most early stage companies will be writing queries directly against OLTP tables - which is why a read-only replica of your master DB is the safest/fastest option.

There’s options though yeah? Materialized views are a great fit for data reshaping for such things.

> prodDB->Kafka/kinesis streams to a reporting DB like redshift/snowflake/big query

Over engineer much? I've worked at trading and advertising analytics firms that had less engineering

Sounds like they hit a paint point.

Curious to hear if they were running SQL queries using some sort of ORM layer, or if they had people who were knowledgeable with SQL itself writing queries or creating Views to extract the data they needed, as well as tune the database and ensure proper INDEXES were in place.

In my past experience, the two methods above can produce wildly different impacts on database performance.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact