
Downloading Large Heroku Postgres Backups - mooreds
https://blog.testdouble.com/posts/2019-11-12-downloading-large-heroku-postgres-backups/
======
themgt
This is way overcomplicated. All you actually have to do to grab your latest
scheduled Postgres DB dump is run:

    curl -o latest.dump `heroku pg:backups public-url -a [app]`

(and then pg_restore)
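
For the restore half, something along these lines works; the local database name (`mylocaldb`) is a placeholder:

    # restore the custom-format dump into a local Postgres database;
    # --clean drops existing objects first, --no-acl/--no-owner skip
    # roles and grants that only exist on Heroku's side
    pg_restore --verbose --clean --no-acl --no-owner -d mylocaldb latest.dump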

~~~
mjw1007
Hm, is the secrecy of that URL the only thing stopping someone else
downloading the backup?

~~~
haffla
It looks like this
[https://xfrtu.s3.amazonaws.com/29577c2f-fd6c-4794-b7c5-4390u...](https://xfrtu.s3.amazonaws.com/29577c2f-fd6c-4794-b7c5-4390uedghfrtf/2019-11-15T02%3A17%3A43Z/hktdg95r-04ad-4489-88c3-983eykjthgf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=IOAUSHJKFSUWFDSF%2F20191115%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20191115T111307Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=801eb84a9577da86e9de5df0b8676cbf3cf8d7d33764d5b326205071b332f43a)

I believe it expires after 1 hour (X-Amz-Expires=3600).

------
zrail
At a previous gig we wanted to have a regular scrubbed snapshot of our
production Heroku database for dev and QA work. We didn't want unscrubbed data
to hit a developer's laptop, both because it contained PII and because it was
huge. The nightly process we ended up with (after a lot of iteration) was:

1. fork the production db and wait for it to sync

2. run a task within the production Heroku app pointing at the new fork which does the PII scrubbing and compacting[1]

3. `pg_dump` the compressed result onto the Heroku dyno running the task above

4. upload the dump as a release on the GitHub page for our main codebase
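
In Heroku CLI terms, the nightly half might look roughly like the sketch below. The app name, plan, attachment name (`SCRUBBED`), and the `rake db:scrub` task are all invented for illustration, and the GitHub upload is left as a placeholder:

    # 1. fork production into a new attachment and wait for it to catch up
    heroku addons:create heroku-postgresql:standard-0 --fork DATABASE_URL --as SCRUBBED -a prod-app
    heroku pg:wait -a prod-app

    # 2. run the scrub/compact task inside the production app, pointed at the fork
    heroku run -a prod-app "DATABASE_URL=\$SCRUBBED_URL rake db:scrub"

    # 3 + 4. dump the scrubbed fork on a dyno, then push it up as a release asset
    heroku run -a prod-app "pg_dump -Fc \$SCRUBBED_URL -f scrubbed.dump && <upload scrubbed.dump as a GitHub release>"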

When a developer wanted a fresh dev database all they had to do was run a
local rake task that would grab the db dump from the latest GitHub release,
drop their local database, and restore the dump.
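
That local refresh can boil down to a few commands like these (repo, asset, and database names are made up, and the GitHub CLI is just one way to fetch the asset):

    # fetch the newest scrubbed dump attached to the latest release
    gh release download --repo ourorg/main-codebase --pattern 'scrubbed.dump'

    # drop and recreate the local dev database, then restore
    dropdb --if-exists myapp_development
    createdb myapp_development
    pg_restore --no-acl --no-owner -d myapp_development scrubbed.dump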

[1]: This isn't as easy as it sounds because one of the tables we wanted to
compress contained raw financial data, but we wanted a coherent story for dev,
so we dropped 95% of the data, keeping more recent stuff, randomized all the
amounts, and then recalculated summary tables.
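
Concretely, that kind of pass could be a handful of statements like the ones below; the table and column names are invented, and the real summary-table rebuild would be app-specific:

    # hypothetical scrub/compact step run against the forked database
    psql "$FORK_DATABASE_URL" <<'SQL'
    -- drop most of the history, keeping the recent rows
    DELETE FROM transactions WHERE created_at < now() - interval '90 days';
    -- randomize the financial amounts
    UPDATE transactions SET amount = round((random() * 1000)::numeric, 2);
    -- summary tables would be recalculated from the scrubbed rows here (app-specific)
    -- reclaim the space freed by the mass delete
    VACUUM FULL transactions;
    SQL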

~~~
kylecordes
It warms my heart to read of people doing this: development on scrubbed data.

It is sadly not unusual, even at companies with resources and knowledge to do
better, to hand production data (including ample customer personal data) to
every developer.

------
hinkley
Can anyone explain to me why the Heroku model is so singular?

It’s an interesting process, but it’s not one that has been emulated by
competition as far as I’m aware. Which seems like it would make it harder to
switch vendors.

And yet they have been around for ages.

~~~
bdcravens
There are alternatives. Cloud 66 is essentially Heroku on your own
servers/cloud instances. Dokku is an open source alternative that leverages
Heroku buildpacks.

------
danpalmer
It's worth noting that under GDPR this shouldn't be done without
consideration.

- Do you have user data for users in the EU?

- Do you need all of that data on your machine to be able to provide them the service?

- Are you taking reasonable steps to ensure data security?

For many small developers, reasonable steps are disk encryption, a good password, and some basic scrubbing of identifiable data. It's important to build that into your processes so that it doesn't get missed, and so that in the event of a breach you can show you've been taking reasonable steps.

