I am seriously considering migrating it to PostgreSQL, rewriting the entire backend.
One of the greatest things about MongoDB is the existence of Studio 3T, a really good administration software for MongoDB, and I wouldn't want to miss it.
I spent a couple of years coding with MySQL and used phpMyAdmin. I wouldn't want to go back to such a tool, and I gather pgAdmin is PostgreSQL's equivalent of phpMyAdmin.
The only software I found to suit me was DataGrip.
Are those the only two reasonable administration tools for PostgreSQL? DataGrip and pgAdmin?
They finally released a native Linux version and I’ve been on cloud nine ever since.
Mastering PostgreSQL Administration
This talk is designed for PostgreSQL administrators. It covers all aspects of PostgreSQL administration, including installation, security, file structure, configuration, reporting, backup, daily maintenance, monitoring activity, disk space computations, and disaster recovery. It shows how to control host connectivity, configure the server, find the query being run by each session, and find the disk space used by each database.
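For instance, the last two items come straight out of the system catalogs; a quick sketch using standard queries (not taken from the talk itself):

    # Query currently being run by each session
    psql -c "SELECT pid, usename, state, query FROM pg_stat_activity;"
    # Disk space used by each database
    psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;"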
I’d like to hear from anyone running their own Postgres. Why did you choose to do that vs pay for a managed instance? How much of a hassle has it been? Is it something you really need significant Postgres experience to do well (to achieve security, backups, high availability, etc)? Biggest gotchas to watch out for? Best material to read?
Thanks in advance.
High availability is a bit more complicated to set up with Postgres, and I don't have much experience with that part. Plain replication isn't too difficult as far as I understand, but automated failover is not in the core tools, so you would need an external tool that handles it.
My experience with Postgres in general is that it doesn't really need any attention if you're at a reasonably low level of traffic. And reasonably low doesn't mean toy database, if you're not doing anything horribly inefficient. If you have a high write volume you might have to pay attention to vacuum settings and tweak them.
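For write-heavy workloads, that tweaking usually means making autovacuum more aggressive. A rough sketch (the values are purely illustrative, not recommendations):

    # Vacuum at 5% dead tuples instead of the 20% default
    psql -c "ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;"
    # Let each autovacuum run do more work before throttling itself
    psql -c "ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;"
    psql -c "SELECT pg_reload_conf();"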
If you have a high income, it might just make sense to pay for a managed version for a side project if you aren't specifically interested in learning more about managing Postgres. If you're e.g. a student, I think you can get pretty far managing your own Postgres on a small VM.
The Postgres documentation is very good, I'd certainly read the parts about backups and replication there.
Useful to look at Barman and see if it can meet one's needs.
No sense re-inventing the wheel if it can be avoided.
Also, Barman offers point-in-time recovery (PITR).
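For reference, a Barman PITR restore looks roughly like this (the server name, target time, and destination path are made up):

    # Restore the latest backup of "pg-main", replaying WAL up to a point in time
    barman recover --target-time "2023-06-01 12:00:00" \
        pg-main latest /var/lib/postgresql/14/main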
Lastly, we do a nightly logical dump which gets uploaded to a bucket (located at another cloud service).
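The shape of such a job, if anyone wants it (database, paths, and bucket name are hypothetical, and the upload command depends on your provider):

    # Nightly logical dump in custom format, shipped off-site
    pg_dump -Fc -f /backups/mydb_$(date +%F).dump mydb
    aws s3 cp /backups/mydb_$(date +%F).dump s3://offsite-db-backups/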
There is Patroni, which has a Docker-based setup template in Spilo. I don't have enough expertise to say much about these, except that I've been running an HA setup with Spilo for several months without issues so far.
Would be interesting to hear what more experienced postgres people have to say about it :)
Also we've opted to just use traefik (based on spilo service registration) rather than pgbouncer.
We manage a rather large cluster using it, and for the most part it works.
Getting functional PITR that you can restore rapidly is honestly our biggest gripe with PG. WAL-G/Barman certainly do work, but when your DB is measured in terabytes, even bootstrapping a replica is very slow.
The primary reason was the desire to use extensions; I'm now able to use Foreign Data Wrappers with great effect to turn the Postgres databases into an enterprise data integration platform/data warehouse.
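As a rough sketch of what that looks like with postgres_fdw (server, schema, and credential names are made up):

    # Wire a remote Postgres database into the warehouse as foreign tables
    psql -c "CREATE EXTENSION IF NOT EXISTS postgres_fdw;"
    psql -c "CREATE SCHEMA IF NOT EXISTS staging;"
    psql -c "CREATE SERVER sales_src FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'sales-db.internal', dbname 'sales');"
    psql -c "CREATE USER MAPPING FOR CURRENT_USER SERVER sales_src OPTIONS (user 'reader', password 'secret');"
    psql -c "IMPORT FOREIGN SCHEMA public FROM SERVER sales_src INTO staging;"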
Additional benefits include a staggering 50x performance increase*, and a much finer-grained user authorization strategy (being able, with superuser permissions, to automatically alter grants and alter default privileges of other users). Also, my business users can now leverage NFS or sFTP shares to load very large datasets directly from the server, and we can actually shut down unused environments outside of business hours (Azure's DBaaS can be scaled down, but not shut down).
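The grants part of that boils down to statements like this one (role and schema names are hypothetical):

    # Tables later created by the ETL role become readable by BI users automatically
    psql -c "ALTER DEFAULT PRIVILEGES FOR ROLE etl_owner IN SCHEMA analytics GRANT SELECT ON TABLES TO bi_readers;"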
The cost of this setup is actually ~20% lower than Azure's managed offering (for the same core count; it could be even lower if you add the enormous performance gains to the comparison). This includes the cost of developers/operators! (We're finding the operational effort is more or less equal to a managed offering; almost non-existent.) The only thing that really caused some discomfort in the beginning was our setup of pgBouncer.
Of course, there are some caveats: we run an OLAP workload that is not business critical, and therefore haven't really put a lot of effort into an HA setup. Still, the advantages of a "real" Postgres server cannot be overstated! Being able to use extensions alone has brought the time to market of new features down from weeks to days.
* Geometric mean of the TPC-DS benchmark suite, on Azure's Ls_v2 machines, leveraging the NVMe disk with bcache, compared to a tuned Azure "Postgres as a Service" instance with equal core count.
That being said, the benefits of managed offerings are clear and sound: you don't waste your time on many DBA tasks, in exchange for paying some extra and losing some control. Many companies choose this. Some change their mind over time and move back to self-managed, or use a hybrid approach.
One more thing. If you need to maintain dozens or more databases, you need automation. This leads to either managed offerings, or k8s (look at StackGres.io).
All of them look interesting and are developing rapidly.
My question: is there any other PostgreSQL managed service faster than Azure and as simple to set up?
If that is not the case then I'm in trouble. I write and benchmark applications on my own laptop and always assume that if my low-end laptop is fast enough, then whatever production-grade infrastructure we deploy on will be on par or faster. Seems like this assumption breaks down when it comes to Azure.
We are thinking about upgrading to a better plan, but we aren't sure it will solve our problems.
I think comparing to your own laptop is unfair because no network connection is needed; everything is on the same CPU / RAM, etc. Remember that when you connect to an external DB, information has to travel from one point to another.
> I think comparing to your own laptop is unfair because no network connection is needed; everything is on the same CPU / RAM, etc. Remember that when you connect to an external DB, information has to travel from one point to another.
Yes indeed, but in my case I was talking about DB query performance where the network plays a minimal part. If a query is near-instant on my laptop I expect it to be the same (or better) on production infra, but with Azure's Basic tier I noticed the opposite. I agree that if you're firing off lots of queries then network latency starts to add up, but in my case I haven't even reached that point - network latency is insignificant compared to the actual delays I'm seeing on the individual queries themselves.
We're lucky to have experience with Postgres, performing backups (of the various types) and restoring DBs on production environments. We don't have things automated, but that would be next.
I think it's hard for startups to do this in-house, not for lack of expertise (or willingness to acquire it), but because there are already so many things to handle.
Can you share your experience with using soyoustart?
The experience is mostly solid, though they don’t provide any kind of console access and I once got locked out after putting a machine to sleep from the command line (and had to open a ticket for someone to reboot it manually).
The "problem" is that this is not a virtual machine, and most assumptions or ways of thinking that work for AWS or similar don't work here. For example, you can't just upload an image or move data volumes around between machines. The storage volumes are physical disks attached to the machine; if you need to copy them, you have to actually boot the machine, SSH into it, and do your thing. Similarly, those disks can break down and you need to plan ahead (they provide 2 disks per machine and expect you to use software RAID 1 or have a backup/restore strategy that can tolerate those disks failing).
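For the two-disk case they provide, the software RAID 1 is essentially a one-liner with mdadm (device names are hypothetical):

    # Mirror the two physical disks so a single disk failure doesn't lose data
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb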
This is where building your own layer on top of it (whether Kubernetes, VMware, or similar) makes sense: it will make your life easier and let you use most of the tools we take for granted with VM providers - you're essentially building your own AWS.
This is a whole other league compared to Linode: Linode provides VM instances with varying levels of performance, while the machines we're talking about are bare-metal, so you are guaranteed the performance of whatever hardware specs you chose. Bandwidth is also unmetered, so you could saturate the pipe 24/7 if you wanted to. Very cost-effective for serving media, though every machine only comes with 250Mbps, so maybe it's better to have an origin server on SYS and lots of downstream instances on Kimsufi acting as a downstream cache for the origin - they are dirt-cheap and come with 100Mbps of unmetered bandwidth each.
I keep a copy via streaming replication and do daily backups.
When setting up backups you'll want to know what you're backing up and how to restore it: do you need to be able to restore to an arbitrary point in time, or to a snapshot of the data at the time of backup? Or do you need to be able to perform partial restores?
For HA, you'll at least need to specify whether any data loss is acceptable, and whether the system should perform automatic recovery in the event of a real failure or the HA merely exists for operational purposes where failovers are controlled events.
For example, PostgreSQL streaming replication is reliable and easy to understand, but it's asynchronous by default, and you do need to monitor that the replica stays current with the primary. It's just fine as-is for basic operational HA, so that you can perform maintenance with minimal (seconds of) downtime, but if you want a fancy auto-healing system that guarantees commits are never lost, you'll likely need a more complicated setup with synchronous replication and real clustering software.
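Concretely, both the monitoring and the synchronous option are small pieces of configuration; a sketch (the standby name is made up, and a real "never lose a commit" setup needs clustering software on top of this):

    # How far behind is each replica? (pg_stat_replication, PG 10+)
    psql -c "SELECT application_name, state, replay_lag FROM pg_stat_replication;"
    # Make commits wait for the named standby
    psql -c "ALTER SYSTEM SET synchronous_standby_names = 'standby1';"
    psql -c "SELECT pg_reload_conf();"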
Generally, the stricter your requirements get (and the more data you deal with), the more effort it takes to implement it yourself.
Initially we used RDS as our PostgreSQL instance while the product was in the pilot phase. RDS cost us $548 for just 2vCPU 16GB 500GB SSD (db.r5.large Multi-AZ). Considering the increasing active customer base and volume of data involved, we found RDS very expensive; it cost us more (> 100%) than the market-affordable estimated product pricing (:facepalm:). As per our performance benchmark, we found that db.r5.large can accommodate 250 customers and scales linearly. To reduce the RDS spend, we had to reduce two costs.
1) Reduce the RDS Instance cost per customer - we aggressively optimised our sync flow, and the final benchmark revealed that 500 customers can be accommodated in a db.r5.large Instance (50% less RDS Instance spend per customer)
2) Reduce the RDS Storage cost per GB of customer data - we could not find any way to reduce the storage cost. Since the RDS Instance is fully managed by AWS, there is no possibility of data compression.
When we compared the total cost based on our usage, the RDS Instance cost is just 10-20% and the Storage cost is 80-90%. So finally we decided to host our PostgreSQL Instance on EC2 with transparent data compression. This is our current configuration and usage metrics (a sketch of the ZFS setup follows):
r5a.xlarge (4vCPU 32GB)
PG Master - Availability Zone 1
PG Slave - Availability Zone 2 (Streaming Replication)
8 x 100GB ZFS RAID0 FileSystem with LZ4 Compression (128KB recordsize)
40GB (wal files) ZFS FileSystem with LZ4 Compression (1MB recordsize)
600GB Compressed Data (3.1TB Uncompressed - x5.18 compression ratio)
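A rough sketch of that ZFS layout (the EBS device names are hypothetical):

    # Striped (RAID0) pool across the eight 100GB volumes, LZ4, 128KB records
    zpool create pgdata /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
                        /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1
    zfs set compression=lz4 pgdata
    zfs set recordsize=128k pgdata
    # Separate 40GB pool for WAL files with a 1MB recordsize
    zpool create pgwal /dev/nvme9n1
    zfs set compression=lz4 pgwal
    zfs set recordsize=1M pgwal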
2 x r5a.xlarge - 2 x $104.68 = $209.36
2 x 8 x 100GB - 2 x 8 x $11.40 = $182.40
2 x 40GB - 2 x $4.56 = $9.12
Total EC2 Cost = $400.88
If we had to use RDS
db.r5.xlarge Multi-AZ = $834.48
3.5TB Multi-AZ = $917.00
Total RDS Cost = $1751.48
So we have reduced our cost from $1751.48/month to $400.88/month (more than x4) by using EC2 instead of RDS. Best of all, we purchased a 3-Year No Upfront Savings Plan, which further reduced our EC2 Instance cost to $105 (a 50% reduction). RDS doesn't have a No Upfront Reserved plan for 3 years, and for 1 Year No Upfront we get just a 32% Instance cost reduction.
Apart from the direct EC2 Instance cost and Storage size reduction, the major benefits we indirectly got by migrating to EC2 Instances are:
- IO throughput increased by x5 due to ZFS LZ4 compression. Importing a 3GB compressed GZIP file would take around 2.5 - 3 hrs in RDS, whereas in EC2 it takes just 30 - 45 min.
- Existing Savings Plan discount automatically applied (50% reduction)
- Ability to migrate to AMD based Instances (r5a.xlarge) - 50% reduction compared to Intel based Instances (r5.xlarge) in Mumbai region. It'll take ages before AMD based Instances are available in RDS.
- Ridiculous EBS Burst Credits by using 8 Volumes in RAID0. Base IOPS - 8 x 300 (100GB) = 2400 IOPS. Burst IOPS Credits - 8 x 3000 = 24000 IOPS :D
- PG Master is used for backup sync write operations and PG Slave is used for reporting and analytics. RDS requires a Read Replica to be created from the already existing db.r5.xlarge Multi-AZ Instance for read operations, which would further increase the estimated RDS cost by x1.5
- Planning to migrate to AWS Graviton 2 ARM64 Instances. AMD (r5a.xlarge) and Intel (r5.xlarge) based Instances have hyperthreading enabled, which leaves us with just 2 real cores and 4 threads, but the basic Graviton 2 Instance r6g.large itself has 2 real cores. So I'm estimating that the basic r6g.large (2vCPU 16GB) Instance alone can support up to 1000 Active Customers (a further 50% EC2 Instance cost reduction).
(I don't use EBS in my day job, but Azure's disk offerings don't really deliver adequate performance when used with any filesystem other than ext4, in my experience.)
I haven't used Azure, but based on my raw benchmarks and real-world usage, I would say the type of filesystem doesn't affect the performance of EBS Volumes. Our peak IO usage is 1200 IOPS and 20 MB/s. I would say a similar RDS configuration would have x4 - x10 write amplification due to the data not being compressed.
It doesn't look like a huge win given how much complexity you added, while RDS manages it for you.
Based on our current customer base for the backup and restore solution with add-ons, AWS spend is about 12% - 16% of the total product revenue. Our company has a 5000+ Active Customer base where the core product offering is different; the backup and restore solution is itself an add-on. If we had priced it considerably higher due to the larger RDS spend, it wouldn't be surprising to get only 10% of the current add-on customers (700+).
Plus, I would say this isn't much complexity; everything is automated using Terraform and Ansible - pg installation, streaming replication setup, ZFS RAID0 setup, etc. Not a single command is executed manually on our EC2 Instances. The only benefit we would get from RDS is the failover capability with minimal downtime, but for that, an x4-x5 increase in RDS cost isn't worth it for us.
We still use RDS for OLTP and service databases, but it's not affordable for the backup offering.
This was our experience when we tried it.
This project makes it super easy to set up a resilient and highly available PostgreSQL cluster.
And since the PostgreSQL client lib handles connections to multiple replicas... no need for some kind of load-balancer (pg_bouncer, pgpool...) in front of it anymore (even if they can still be useful sometimes).
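For what it's worth, the libpq side of that looks like this (hostnames are made up):

    # libpq (PG 10+) tries each host in turn; target_session_attrs selects the writable primary
    psql "host=pg1,pg2,pg3 port=5432 dbname=app target_session_attrs=read-write"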
pgpool is used to split reads and writes (and so far I haven't seen "magic" tools that automate it well; pgpool has lots of issues; this task is better solved in app code -- most modern frameworks have either already solved it, such as RoR, or at least work well with two connections),
while pgbouncer is used to multiplex connections and improve performance, similarly to nginx, haproxy, or envoy, but with native Postgres protocol support.
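For comparison, a minimal pgbouncer setup in transaction-pooling mode looks about like this (paths and names are hypothetical):

    # Minimal pgbouncer.ini: many client connections multiplexed over few server ones
    cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
    [databases]
    app = host=127.0.0.1 port=5432 dbname=app
    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    EOF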
Is anything similar available for postgresql?