Hacker News | melor's comments

CPU usage varies based on the selected compression algorithm and level. Snappy and LZMA are available now. Compression is handled in native code. There are some newer interesting algorithms (zstd and lz4) that we are looking into adding.
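The CPU-versus-ratio trade-off is easy to see with Python's standard-library lzma module (Snappy itself needs a third-party package; lzma is used here only to illustrate compression levels):

```python
import lzma

# Sample payload: repetitive text compresses very well.
data = b"PostgreSQL write-ahead log segment " * 1000

fast = lzma.compress(data, preset=0)  # low CPU cost
best = lzma.compress(data, preset=9)  # more CPU, typically a better ratio

# Both round-trip losslessly and shrink the payload.
assert lzma.decompress(fast) == data
assert lzma.decompress(best) == data
print(len(data), len(fast), len(best))
```

Higher presets mainly trade backup-time CPU for smaller uploads; the right level depends on whether CPU or network/storage is the bottleneck.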


One of the pghoard developers here. We developed pghoard for our use case (https://aiven.io):

* Optimizing for roll-forward upgrades in a fully automated cloud environment

* Streaming: encryption and compression on the fly for the backup streams, without creating temp files on disk

* Solid object storage support (AWS/GCP/Azure)

* Surviving various glitches, such as faulty networks and processes getting restarted

Restore speed is very important for us and pghoard is pretty nice in that respect, e.g. 2.5 terabytes restored from an S3 bucket to an AWS i3.8xlarge in half an hour (1.5 gigabytes per second avg). This means hitting CPU, disk, and network very hard, but at restore time there's typically not much else to do with them.
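A quick back-of-the-envelope check on those figures (decimal units assumed):

```python
# At the quoted 1.5 GB/s average, restoring 2.5 TB takes just
# under half an hour.
size_gb = 2500          # 2.5 TB
rate_gb_s = 1.5         # quoted average restore throughput
minutes = size_gb / rate_gb_s / 60
print(f"{minutes:.1f} minutes")  # ~27.8 minutes
```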


The next part in the series will include read-write benchmarking. Taking suggestions for other benchmark scenarios!


We host a number of our customers' database systems on us-east-1.

What worked well for us (https://aiven.io):

- Architecturally relying on only a few cloud provider services (we only need VMs, disks, and object storage)

- Upfront investment in being able to move services from one region to another without downtime

- Pre-existing tooling for easily (manually) reconfiguring backup destinations on the fly

- Not running everything on just AWS

What did not work so well:

- Backups should automatically reroute to a secondary backup site on N consecutive failures

- Alert spam, need more aggregation

- New failure mode: extremely slow EBS access, some affected VMs were kinda working, but very slowly: need to create a separate alert trigger for this


Only limited impact on Aiven services, thanks to the service migration capability: http://help.aiven.io/announcements/aiven-customer-notice-aws...


We provide UpCloud as one of the cloud options for our SaaS database/metrics/messaging offering at Aiven.io and have been extremely happy with their disk i/o performance.

Here's a quick "hdparm -t" test I just ran on two random low-end nodes:

upcloud-de-fra: 1028 MB in 3.00 seconds = 342.12 MB/sec

aws-us-west-1: 58 MB in 3.02 seconds = 19.17 MB/sec

I would of course recommend that everyone benchmark their actual workload on each cloud option before making the decision.


This is a non-answer. What is the actual underlying hardware? Large SSD arrays? Your serious customers will not trust you unless you answer this question. We will not trust our data to unknown technology.

It is relatively easy to reach 100,000 IOPS in SSD RAID configurations with enough drives. As the GP says, there's no magic here.
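To make that concrete, here's a rough sizing sketch; the per-drive figure is a hypothetical number for illustration, not a measurement of any particular drive:

```python
import math

# Hypothetical per-drive figure for a SATA SSD doing 4K random reads;
# real numbers depend on the drive model, queue depth, and workload.
per_drive_read_iops = 20_000
target_iops = 100_000

# In a striped configuration (RAID-0/10), read IOPS scale roughly
# linearly with the number of drives.
drives_needed = math.ceil(target_iops / per_drive_read_iops)
print(drives_needed)  # 5
```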


From the Release Notes:

Major enhancements in PostgreSQL 9.6 include:

Parallel sequential scans, joins and aggregates

Elimination of repetitive scanning of old data by autovacuum

Synchronous replication now allows multiple standby servers for increased reliability

Full-text search for phrases

Support for remote joins, sorts, and updates in postgres_fdw

Substantial performance improvements, especially in the area of improving scalability on many-CPU servers


A replication slot can be used by defining it in the pghoard.json configuration. However, the slot needs to be created manually (and, importantly, removed once it is no longer needed). We've been planning to add more automatic replication slot management to PGHoard.
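For reference, manually managing a physical slot looks like this in psql (the slot name here is just an example):

```sql
-- Create a physical replication slot for the WAL streamer to use:
SELECT pg_create_physical_replication_slot('pghoard_example');

-- Important: drop the slot when it is no longer needed, otherwise
-- the server retains WAL for it indefinitely and the disk fills up:
SELECT pg_drop_replication_slot('pghoard_example');
```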


Good. Without archiving or slots in place, you really can't rely on such backups...


Both do mostly the same thing with some differences. The biggest difference currently could be that WAL-E uses the PostgreSQL "archive_command" to send incremental backups (WAL files) in complete 16 megabyte chunks, whereas PGHoard uses real-time streaming with "pg_receivexlog", making the data loss window much smaller in case of a disaster.


You can set archive_timeout to something like 1 minute to bound the window.
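For example, in postgresql.conf:

```
archive_timeout = 60    # force a WAL segment switch at least once a minute
```

Note that a forced switch archives the current 16 megabyte segment even if it is mostly empty, though compression makes that fairly cheap.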


Takes care of real-time WAL streaming, compression, encryption, restoration, and backup expiration, among other things. Open source and written in Python.


Curious if it backs up to other cloud storage providers in vendor neutral ways.


Currently S3 (AWS + compatible), Google Cloud, OpenStack Swift, Azure (experimental), local disk and Ceph (via S3 or Swift) are supported. More can be added quite easily as the object storage logic is behind an extendable interface.
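As an illustration, an S3 destination in pghoard.json looks roughly like this (key names may differ between pghoard versions; check the project README for the exact schema):

```json
{
  "backup_sites": {
    "example-site": {
      "object_storage": {
        "storage_type": "s3",
        "region": "us-east-1",
        "bucket_name": "my-pghoard-backups"
      }
    }
  }
}
```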

Which vendor neutral protocol are you interested in using?


What will happen when the Storage (swift or ceph) is offline for some time?


PGHoard can archive PG's WAL segments in two modes: streaming directly using pg_receivexlog or as an archive_command to archive complete segments.

When PGHoard is used in streaming mode it keeps reading new segments from PG and stores them in compressed & encrypted form in a queue ready to be uploaded. The segments will stay there until they can be uploaded.

When using archive_command, PGHoard handles the operation synchronously, so PG won't actually remove or recycle the WAL segment in question until the command completes.

Postgres will keep running normally in both cases, but the files will be queued in different places (compressed or uncompressed). This may eventually fill up your disk, but PGHoard will trigger an alert after a configurable number of upload failures.
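The queue-until-uploadable behaviour can be sketched like this (a minimal illustration; the names, threshold, and error handling are hypothetical, not pghoard's actual code):

```python
from collections import deque

MAX_FAILURES_BEFORE_ALERT = 3  # hypothetical threshold


def drain(segments, upload, alert):
    """Try to upload queued segments; keep them queued on failure."""
    failures = 0
    while segments:
        segment = segments[0]       # peek; pop only after a successful upload
        try:
            upload(segment)
            segments.popleft()
            failures = 0
        except OSError:
            failures += 1
            if failures >= MAX_FAILURES_BEFORE_ALERT:
                alert(f"{failures} consecutive upload failures")
                return              # segments stay queued for a later retry


# Success path: everything drains, nothing is alerted.
sent, alerts = [], []
q = deque([b"seg1", b"seg2"])
drain(q, upload=sent.append, alert=alerts.append)
print(sent, list(q), alerts)
```

Because segments are only popped after a successful upload, a storage outage delays archiving but never loses queued data.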


PGHoard has quite high unit test coverage (85%) and it's pretty easy to add a new object storage configuration to tests to verify that all the APIs used by PGHoard work properly.

