Show HN: Safe Data Changes in PostgreSQL (github.com/inqueryio)
47 points by ciminelli on March 9, 2023 | 17 comments
Hi HN, we're excited to share our open source tool with the community! We previously posted here with the tagline “real-time events for Postgres” [0]. But after feedback from early users and the community, we’ve shifted our focus to working on tooling for manual database changes.

We've consistently heard teams describe challenges with the way manual data updates are handled. Seemingly every engineer we spoke with had examples of errant queries that ended up causing significant harm in production environments (data loss/service interruptions).

We’ve seen a few different approaches to how changes to production databases are made today:

Option 1: all engineers have production write access (highest speed, highest risk)

Option 2: one or a few engineers have write access (medium speed, high risk)

Option 3: engineers request temporary access to make changes (low speed, medium risk)

Option 4: all updates are checked into version control and run manually or through CI/CD (low speed, low risk)

Option 5: no manual updates are made - all changes must go through an internal endpoint (lowest speed, lowest risk)

Our goal is to enable high speed changes with the lowest risk possible. We’re planning to do this by providing an open-source toolkit for safeguarding databases, including the following features:

- Alerts (available now): Receive notifications any time a manual change occurs

- Audit History (beta): View all historical manual changes with context

- Query Preview (coming soon): Preview affected rows and query plan prior to running changes (a rough sketch of the idea follows this list)

- Approval Flow (coming soon): Require query review before a change can be run
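
For illustration, Query Preview amounts to automating a pattern like the following (a minimal sketch; the table and WHERE clause below are made up):

    -- Inspect the plan the UPDATE would use:
    EXPLAIN UPDATE users SET plan = 'free'
    WHERE last_login < now() - interval '1 year';

    -- Inspect the rows the WHERE clause would touch:
    SELECT id, email, plan FROM users
    WHERE last_login < now() - interval '1 year'
    LIMIT 50;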

We’re starting with alerts. Teams can receive Slack notifications any time an INSERT, UPDATE, or DELETE is executed by a non-application database user. While this doesn’t prevent issues from occurring, it does provide an initial level of traceability: who made an update, what data was changed, and when it occurred.
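
A simplified sketch of the trigger-based technique (not our exact implementation; 'app_user' and 'orders' are placeholders, and a separate listener process would relay the NOTIFY payloads to Slack):

    CREATE OR REPLACE FUNCTION notify_manual_change() RETURNS trigger AS $$
    BEGIN
      -- Only fire for sessions that aren't the application's DB user
      IF session_user <> 'app_user' THEN
        PERFORM pg_notify('manual_changes', json_build_object(
          'user',  session_user,
          'table', TG_TABLE_NAME,
          'op',    TG_OP,
          'at',    now()
        )::text);
      END IF;
      RETURN COALESCE(NEW, OLD);  -- NEW is NULL for DELETE
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER alert_manual_changes
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE FUNCTION notify_manual_change();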

We’d love to hear feedback from the HN community on how you’ve seen database changes handled, pain points you’ve experienced with data change processes, or generally any feedback on our thinking and approach.

[0] https://news.ycombinator.com/item?id=34828169




Sounds like a pretty great idea. Executing queries on production is a necessary but scary flow below a certain company size, until you can afford to be slow about it.

Got a question: presumably this adds a trigger, which adds some amount of extra work to every query while only benefiting the relatively negligible number of manual queries. That makes me wonder about the performance impact of the trigger.

An article about some performance characteristics in the docs would help a lot to assuage that concern.


Inquery co-founder here. Great question regarding performance implications - we're working on an article exactly like the one you described. Additionally, we're exploring a few updates to significantly reduce the impact on the database: (1) doing the filtering at the trigger level, and (2) using the WAL instead of triggers.
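
For the WAL route, the building block would be Postgres logical decoding, which avoids per-statement trigger overhead by reading changes out of the WAL after the fact (requires wal_level = logical; the slot name below is a placeholder):

    -- Create a slot using the built-in test_decoding output plugin
    SELECT pg_create_logical_replication_slot('inquery_audit', 'test_decoding');

    -- A poller can then periodically consume the decoded changes:
    SELECT * FROM pg_logical_slot_get_changes('inquery_audit', NULL, NULL);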


I see this doesn't yet support something I've wanted for some time:

What's the simplest way to snapshot a database, perform an operation in the app, snapshot it again, and get an overview of how many rows were affected in each table? Like an overview of a diff, optionally listing changed pkey ids. I tried googling but couldn't find anything like this, so I was thinking of making a DIY solution but wasn't that desperate yet.


I'd guess that https://neon.tech/ could help with this. Neon is branded as "Serverless Postgres". They have APIs to create branches of your database.

So you could effectively:

1) Create a snapshot of your production DB -> DB_2A

2) Then create a snapshot of that snapshot -> DB_2B

3) Now you have two copies of the exact same database. Run your query/workload/migration on DB_2B.

4) Run some metadata queries against DB_2A and DB_2B and compare the results (see the sketch after this list).

5) If your metadata queries are in line with expectations, delete the snapshots. If not, leave them around for a bit for manual inspection.
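
One possible "metadata query" for step 4, to run against both DB_2A and DB_2B and diff (note that n_live_tup is an estimate maintained by the statistics collector, not an exact count):

    SELECT relname AS table_name, n_live_tup AS approx_rows
    FROM pg_stat_user_tables
    ORDER BY relname;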


Interesting. Will check it out in detail. From a quick glance, though, I'm afraid a vendor-specific cloud is a no-go, because I would rather not upload any client's data there, even anonymized. Besides, it's overkill to convince anyone to switch to a different vendor for the sake of diffing a DB alone :)


We were looking at tooling for snapshotting production data for testing purposes; it's interesting to use that for a diff view based on changes from application actions. Would you use it for testing changes or more for debugging production issues?


Well, my main use case would be to speed up project onboarding: I could play around with the app from a user's perspective and check how my actions impact the database... It could help cut through a lot of frontend/backend layers and just focus on the raw data.

If you have multiple DB snapshots and the WAL segments retained, theoretically you could inspect the log to see what happened in between. There's pg_xlogdump for that, but I think it will output very raw data...
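
(For reference, pg_xlogdump was renamed to pg_waldump in PostgreSQL 10. Rough usage, with a placeholder path and start LSN, filtering to heap records, i.e. row-level inserts/updates/deletes:)

    pg_waldump --path=/var/lib/postgresql/data/pg_wal \
               --start=0/15D5618 --rmgr=Heap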


How would you want to actually look at the database diff? Just a summary view of rows that were changed given a certain time period?


DoltHub does that: https://www.dolthub.com/


Why not use AWS RDS snapshots to create a new replica in no time?


I wonder if you could utilize templates to snapshot the DB.
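
Something like this (with the caveat that CREATE DATABASE fails if anyone else is connected to the source database while it runs; 'mydb' is a placeholder):

    CREATE DATABASE mydb_snapshot TEMPLATE mydb;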


This is what tools like `dslr` do, and it's very fast. My problem wasn't backing up/restoring, though, but rather diffing the data in these 2 databases :)


It seems like it should be doable to write something that generically diffs tables.
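
For tables with identical schemas, plain SQL gets you most of the way, e.g. assuming the two snapshots are restored into two schemas of one database (snap_a/snap_b are made-up names):

    -- Rows in A but not in B (EXCEPT compares whole rows)
    SELECT * FROM snap_a.users EXCEPT SELECT * FROM snap_b.users;

    -- Rows in B but not in A
    SELECT * FROM snap_b.users EXCEPT SELECT * FROM snap_a.users;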


It's something every support desk should have. I often write my own tools like this. Ideally: write a SELECT query, execute it, note how many rows it selects, copy the WHERE clause, start a transaction, update using the WHERE clause, execute, check that the row count matches what you selected earlier, and commit the transaction.
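
Spelled out as SQL (the table and WHERE clause are made up):

    SELECT count(*) FROM orders WHERE status = 'stuck';  -- say it returns 12

    BEGIN;
    UPDATE orders SET status = 'retry' WHERE status = 'stuck';
    -- psql reports "UPDATE 12"; if that matches the count above:
    COMMIT;
    -- ROLLBACK; instead, if the counts don't match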


Inquery co-founder here. Glad to hear the idea resonates with you! Do you usually just run these commands in a local IDE? Would you prefer our solution be a local application or a self-hosted container in your VPC, accessible through your web browser?


In the past I've done it a few different ways, but now I see strict infosec rules; that part is more important than where it runs. E.g., at my last job we had a workflow where you needed a ticket approved by a second pair of eyes, which used CyberArk to create a new remote desktop running a DB IDE where you could do your business. Commands were tracked, but there was no real restriction.

At my new firm, your personal account gets temporary RW permissions via a centralized service.


The "temporary access" approach seems to be pretty popular based on our conversations with engineers at large-ish tech companies. We hadn't heard of anyone using a remote desktop for this problem, though.



