
Show HN: Automating Postgres Data Recovery - csdvrx
https://github.com/csdvrx/pg_csdvrx
======
csdvrx
Background: On a server with a hardware RAID problem, nearly every large file
got corrupted during the yearly backup. As always, it is not so funny to
discover that less than a week after removing the good copies!

The gzip archives on their way to a new server were sitting on the RAID when
the corruption happened. gzip files with heavy corruption in the middle or at
the beginning can't be salvaged, so only a few tsv files could be recovered.

There were directories from /var/lib/postgres that had been used to create the
gzip files, but most of the data was in /lost+found - about 9T, in big files
of 1G each.

The permissions indicated the files belonged to postgres, and strings showed
they contained a lot of valuable information.
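
That check is nothing fancy, just standard tools; the /lost+found file name
below is made up:

    # compare the numeric uid/gid of the orphaned files to the postgres user
    ls -ln /lost+found | head
    id postgres
    # peek at the printable content of one 1G segment
    strings '/lost+found/#123456' | less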

I found out about pg_filedump, which helps you extract the data but requires
many manual steps. After testing the approach, I realized it was not practical
to do that by hand on tens of thousands of files and many tables.
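
To give an idea of the manual steps, here is a rough sketch, assuming the -D
type-list option of recent pg_filedump versions, a guessed column layout and a
made-up file name; the exact flags and the prefix on the decoded lines may
differ with your version:

    # guess the column types of the table the segment came from,
    # then ask pg_filedump to decode the tuples
    pg_filedump -D int,text,timestamp,inet '/lost+found/#123456' > dump.txt
    # keep only the decoded tuples (prefixed with "COPY:" in the versions
    # I have seen) and strip the prefix to get a TSV
    grep '^COPY:' dump.txt | sed 's/^COPY: //' > recovered.tsv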

So I wrote a script that gives you a command ready to run with pg_filedump. I
also added support for the inet type to pg_filedump. Check the video to see
how to get a TSV ready to import in 3 steps if you do it by hand. You can also
script the script (!), which is very useful with large volumes of data.
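
By "scripting the script" I just mean looping over the segments; a minimal
sketch of the idea, with hypothetical paths and type list (the actual script
works out the right command for you):

    # decode every orphaned segment with the same guessed layout
    # and concatenate the results into one TSV
    for f in /lost+found/#*; do
      pg_filedump -D int,text,timestamp,inet "$f" \
        | grep '^COPY:' | sed 's/^COPY: //'
    done > recovered.tsv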

Some lines will be lost and some records corrupted, but you can fix that
easily by checking the input records. And some data is better than no data!
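
On the import side, a plain \copy is enough, and the lines psql rejects point
you at the input records that need a manual look (database, table and file
names are hypothetical):

    # load the recovered TSV; psql reports the first bad line,
    # which points you at the record to fix in the input
    psql mydb -c "\copy mytable FROM 'recovered.tsv'"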

