
Show HN: PGBackup.com, Postgres backup as a service - EmielMols
https://pgbackup.com
======
koolba
> A dedicated, virtual server that runs your postgres version in replica mode.
> It continuously receives changes from your primary database server, becoming
> an off-site backup.

This isn't an offsite backup. This is an offsite replica that will faithfully
replicate a "DELETE FROM account WHERE true".

Marrying that with a scheduled logical pg_dump and/or physical pg_basebackup
that runs against the local copy (for performance and not impacting the
master) would create a true offsite backup.

Until then, it's just one more place that will get wiped out when the source
data is accidentally destroyed.
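
A minimal sketch of the scheduled logical dump described above, assuming it runs on the replica host via cron or similar. The function names, the output directory, and the file naming scheme are made up for illustration; only the `pg_dump` flags are real. Note that a long dump against a hot standby can be cancelled by replay conflicts, so the standby may need tuning that is not shown here.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_dump_command(dbname, out_dir, now, host="localhost", port=5432):
    """Return (argv, target_path) for a timestamped custom-format pg_dump.

    host/port point at the local replica, not the primary (assumption).
    """
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = Path(out_dir) / f"{dbname}-{stamp}.dump"
    argv = [
        "pg_dump",
        "--host", host,
        "--port", str(port),
        "--format=custom",      # compressed archive, restorable with pg_restore
        "--file", str(target),
        dbname,
    ]
    return argv, target


def dump_replica(dbname, out_dir):
    """Run the dump; raises CalledProcessError if pg_dump fails."""
    argv, target = build_dump_command(dbname, out_dir, datetime.now(timezone.utc))
    subprocess.run(argv, check=True)
    return target
```

Because each run writes a new timestamped file, a destructive statement replicated from the primary cannot reach dumps that were taken before it.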

~~~
mey
My current backup solution for my personal server is to run, on a daily basis,
sudo -u postgres pg_dumpall > /var/base/backups/db/postgres/backup.sql and then
ship that plus many other things off to tarsnap. This is a naive approach that
works because the data set is small, but it could be tuned and improved to be
delta based at some point. The joy is that Tarsnap manages the history of the
data for me on that file and retains it for as long as I wish to pay.

I have a separate script that manages cleaning up of backups older than a
certain age, etc.
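
The cleanup script itself isn't shown, so here is one way age-based pruning could look for dump files kept on local disk. The function name, the `*.sql` pattern, and the use of file modification time as the backup age are all assumptions; pruning archives stored in Tarsnap would go through its own tooling instead.

```python
import time
from pathlib import Path


def prune_old_backups(directory, max_age_days, now=None):
    """Delete *.sql files older than max_age_days; return the removed names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(Path(directory).glob("*.sql")):
        # Age is judged by modification time (assumption, see lead-in).
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```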

~~~
koolba
Within the realm of pg_dump completing in a reasonable amount of time, this is
a fine strategy.

Bonus points if the server that runs pg_dump and uploads to tarsnap has write-
only access (to tarsnap), uses unique unguessable prefixes for the backups,
and is distinct from the server that manages cleaning out old backups.

Course that's probably way overkill for a personal server :D

~~~
cperciva
_uses unique unguessable prefixes for the backups_

Why? If you're worried about someone overwriting old backups, don't: You can't
overwrite or modify archives in tarsnap. Once created they can only be read
(if you have the read keys) or deleted (if you have the delete keys).

~~~
koolba
> Why? If you're worried about someone overwriting old backups, don't: You
> can't overwrite or modify archives in tarsnap. Once created they can only be
> read (if you have the read keys) or deleted (if you have the delete keys).

Nice. For some reason I thought a write key would be able to overwrite as well.
If that's not possible then yes, it's not necessary to have random prefixes.
FYI, most of my tarsnap usage is in "set it and forget it" mode, so it's been a
while since I perused the docs.

The random prefixes would apply if you're using something that doesn't provide
overwrite protection like pushing blobs directly to S3.
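
For the S3 case, a rough sketch of what an unguessable prefix could look like. The key layout and function name are invented for illustration. The idea is that a credential that can write but not list objects cannot name an older backup's key in order to overwrite it.

```python
import secrets
from datetime import datetime, timezone


def backup_object_key(name, now, random_bytes=16):
    """Build an object key with a fresh random prefix for every upload."""
    prefix = secrets.token_hex(random_bytes)  # 2 hex characters per byte
    stamp = now.strftime("%Y/%m/%d")
    return f"{prefix}/{stamp}/{name}"


if __name__ == "__main__":
    print(backup_object_key("db.dump", datetime.now(timezone.utc)))
```

The uploader would still need to record the generated key somewhere safe, since nothing about it can be reconstructed later.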

------
sandGorgon
I would pay for this.

The product you should be building is replication (what you have actually
built) and wal-e configured to backup to amazon s3 (where I provide the
bucket).

you guarantee that the backups are happening, build in some intelligence to
make sure catastrophic "DELETE" statements would trigger a backup first, etc.

Give me the ability to spawn a replica using the exact backup that I choose,
etc.

This is something I could totally pay for!

~~~
craigkerstiens
I can definitely see the demand for something like this, but the engineering
on it becomes quite difficult. Without the ability to cancel long running
queries there can be some replication lag and the WAL disk could fill up.
Without actual administrative access to the instance it becomes very hard to
fully guarantee such replication.

If you do want this more out of the box the real options today are either
build it yourself or choose a provider that's delivering it.

~~~
EmielMols
I don't think this is a huge problem. Yes, long running queries might
introduce lag. When this lag hits a threshold of your choosing, PGBackup (or
any service that would offer it) would then gladly send out pager alerts.

The same would happen if the WAL could not be retrieved in time (because WAL
storage is full): PGBackup would start a new base backup and send out pager
alerts.
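
A sketch of the threshold check described above, not PGBackup's actual implementation. The query uses the real `pg_last_xact_replay_timestamp()` function and would be run on the replica; the function name, the thresholds, and the "ok"/"warn"/"page" labels are assumptions.

```python
from datetime import timedelta

# Run on the replica. Returns NULL if nothing has been replayed yet, and
# overstates the lag while the primary is idle (no new transactions).
LAG_QUERY = "SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag"


def lag_action(replay_lag, warn_after, page_after):
    """Map a measured replay lag (timedelta or None) to an alert level."""
    if replay_lag is None:
        return "page"  # cannot measure lag at all: treat as an incident
    if replay_lag >= page_after:
        return "page"
    if replay_lag >= warn_after:
        return "warn"
    return "ok"


if __name__ == "__main__":
    print(lag_action(timedelta(minutes=3),
                     warn_after=timedelta(minutes=1),
                     page_after=timedelta(minutes=10)))
```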

------
0xmohit
Instead of terming it "backup", wouldn't "replication" be more appropriate?

~~~
EmielMols
Technically, yes. But we figured the name "backup" is easier to remember, and
this would allow us to add a bunch of other 'backup related' services later on
:).

~~~
NegativeLatency
But the difference between a "backup" and a db "replica" is important.

------
JoshTriplett
Seeing a service like this makes me wonder if Postgres could support encrypted
replication. That would allow using a service like this to provide reliable
backup without having to trust it with customer data.

~~~
EmielMols
Thanks for this. That would be an ideal scenario for me as well (it's even
mentioned in the FAQ).

It's technically rather challenging, but if enough users ask for it, putting
some resources into it might be viable.

~~~
adrianpike
This looks like a potential option:

[https://github.com/wal-e/wal-e](https://github.com/wal-e/wal-e)

------
devopsproject
Don't show an actual password in your examples unless you want hundreds of
people using it.

~~~
EmielMols
Haha, thanks. It's just a dummy database with random data :).

------
EmielMols
Author here. To quote from the FAQ:

> Why did you build PGBackup?

> We were tired of constantly configuring a postgres replica for each (small)
> project. And then being pretty unsure if the backup would still be up-to-
> date by the time the primary server crashes.

> When talking with other postgres users about backups, we also found that
> primary and backup servers often run in a single data center or at a single
> (budget) provider. In effect, exposing these users to non-negligible risk of
> losing database+backup.

There are some obvious challenges in getting new users to trust you enough to
send you a copy of their database. At the same time, people host very sensitive
data at (virtual) budget servers without thinking twice. What do you think?

------
EmielMols
Great feedback here, but also through the on-site chat. Again, the current
version is built with very limited resources mainly to check if/how the idea
would resonate.

\- We will try to add Point-in-time-recovery asap (saving base backups+xlogs).
These would run off the replica, not putting load on your database server. As
koolba correctly points out, this will make it more "backup" than "replica".
Will have to figure out how this fits in the pricing model.

\- I would personally love to offer better (guarantees of) encryption of the
backups, ideally encrypt the data pages on the primary server. We would have
to see how this would technically work.

\- First couple of backups are currently replicating :)!!

Thanks, HN!

------
infinite8s
Not sure if the cost of this is worth it. I set up WAL-E
([https://github.com/wal-e/wal-e](https://github.com/wal-e/wal-e)) to do
server-side encrypted backups (it does a daily base backup + continuous WAL
delivery) in about half a day. If you set up lifecycle rules on the S3 bucket
you can have an automatic deletion policy handle cleanup.
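
For reference, a sketch of what such a lifecycle rule could look like as data. The dictionary follows the shape S3 lifecycle configurations use (for example, as the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration`); the prefix, rule ID, and retention period are placeholders, and the upload call is left out so the snippet runs without AWS credentials.

```python
import json


def expiration_lifecycle(prefix, days):
    """Build a lifecycle configuration that expires objects under prefix."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix; use whatever path the backups are written under.
    print(json.dumps(expiration_lifecycle("pg-backups/", 30), indent=2))
```

Purely age-based expiry has a catch: a base backup is only restorable together with the WAL written after it, so the retention period should comfortably exceed the interval between base backups.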

------
roshansingh
Do you expect us to open the port on a public interface?

~~~
EmielMols
We might add proxy-over-ssh/vlan options later on, but it would be
incompatible with the 2-clicks-and-you're-done interface right now.

As far as I know, postgres authentication is rather simple and exposing the
port does not easily add huge liability.

This is really a trade-off between ease-of-setup and better security. We're
eager to talk to (a lot of) users to see what their current setup is and how
PGBackup could add some value for them.

~~~
monksy
> exposing the port would not add liabilities.

Famous last words of security on the internet.

~~~
brandon272
The comment caught me off guard as well. It seems as a customer you're placing
a lot of trust in this company as you're replicating your data to them and
then trusting that they will handle and store it securely.

