If the postgres database is recording business transactions, you don't want to cause your business to stop being able to take credit cards because you generated a report.



Assuming you use a connection pool, why would it stop? Either the query returns a result or it doesn't. Am I missing something?


Reporting queries can put a significant load on the db, to the point that it interrupts service.


Furthermore, Postgres is an OLTP (transactional) database, designed to efficiently perform updates and deletes on individual rows. OLAP (analytical) databases/query engines like Clickhouse, Presto, Druid, etc. are designed for efficient processing across many rows of mostly unchanging data.

Analytical queries (like "find the average sales price across all orders over the past year, grouped by store and region") can be 100-1000x faster in an OLAP database compared to Postgres.
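
For concreteness, here's roughly what that kind of query looks like when issued from application code with psycopg2. The orders/stores schema and the connection string are made up, but the shape of the work is the point: Postgres has to walk every order row from the past year, while a column-oriented engine only reads the few columns involved.

    # Hypothetical reporting query against a Postgres "orders"/"stores" schema.
    import psycopg2

    conn = psycopg2.connect("host=db.internal dbname=shop user=report")

    QUERY = """
        SELECT s.region,
               s.store_id,
               AVG(o.sale_price) AS avg_sale_price
        FROM orders o
        JOIN stores s ON s.store_id = o.store_id
        WHERE o.created_at >= now() - interval '1 year'
        GROUP BY s.region, s.store_id
    """

    with conn, conn.cursor() as cur:
        # On Postgres this is a scan-and-aggregate over a year of orders,
        # competing for buffers and I/O with the transactional workload.
        cur.execute(QUERY)
        for region, store_id, avg_price in cur.fetchall():
            print(region, store_id, avg_price)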


I see, thanks!


What about using a read-only replica for reporting? Are there any downsides to that? It seems easier to manage.


That's the use case for CDC: to make it equally easy to use a DW. As always, the complexity is just air you move around in the balloon. The OLTP database can spit out the events and forget them, but how you load them efficiently is now a data engineer's problem to solve (if it were easy to write event-grain data into an OLAP store, you would not need an OLTP database). Kafka usually enters the room at this stage, and the simplification promise becomes tenuous.
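
To make the loading side concrete, here is a sketch of the consumer end, assuming Debezium-style change events on a Kafka topic. The topic name, consumer group, event shape, and load_batch() helper are all made up for illustration, not any particular tool's API.

    # Consume Postgres change events from Kafka and apply them to the
    # warehouse in batches. Everything named below is a placeholder.
    import json
    from kafka import KafkaConsumer

    def load_batch(events):
        ...  # placeholder: bulk upsert the events into the warehouse

    consumer = KafkaConsumer(
        "pg.public.orders",                       # hypothetical CDC topic
        bootstrap_servers="kafka.internal:9092",
        group_id="dw-loader",                     # hypothetical consumer group
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        enable_auto_commit=False,
    )

    batch = []
    for msg in consumer:
        batch.append(msg.value)                   # e.g. {"op": "u", "after": {...}}
        if len(batch) >= 1000:
            load_batch(batch)
            consumer.commit()
            batch.clear()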


That works great if all the data you need for reporting is in the database you're replicating.

You'd likely want a data warehouse if you also need to report on data that isn't in your prod database (e.g. Stripe, your CRM, marketing data, etc.).

If setting up a data warehouse, ETL, BI, etc. sounds like a lot of work to get reports, you're right, it is.
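
A toy version of just the extract step, to show the shape of it; the replica host, the Stripe call, and the warehouse load are placeholders, and a real pipeline adds scheduling, incremental loads, and schema management on top.

    # Nightly pull from two sources, landed in the warehouse. All hosts,
    # keys, and table names below are placeholders.
    import psycopg2
    import stripe

    stripe.api_key = "sk_live_..."                # placeholder

    def extract_orders():
        conn = psycopg2.connect("host=replica.internal dbname=shop user=etl")
        with conn, conn.cursor() as cur:
            cur.execute("SELECT id, store_id, sale_price, created_at FROM orders")
            return cur.fetchall()

    def extract_charges():
        # Stripe's list endpoint is paginated; auto_paging_iter walks every page.
        return list(stripe.Charge.list(limit=100).auto_paging_iter())

    def load(table, rows):
        ...  # placeholder: bulk insert/upsert into the warehouse, engine-specific

    load("raw.orders", extract_orders())
    load("raw.stripe_charges", extract_charges())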

shameless plug: we're making this all much simpler at https://www.definite.app/



