Percona Everest: open-source automated database provisioning and management (percona.com)
103 points by petecooper 9 months ago | 51 comments



Most of the pain of running databases in k8s is all of the "day 2" operations like backups, clustering, scaling, upgrading, tuning, etc., so I'm glad to see all that accumulated knowledge built into controllers like this.

One feature I feel is lacking is better handling of database credentials. I see there's a "copy to clipboard" button next to the password, which tells me we're still using the same single, static, plain-text DB passwords that we've been using since the 90s. I'd love to see some kind of cross-platform RBAC system that uses rotating credentials or asymmetric crypto or something.


The problem is typically day-1000 problems: the database breaks, nobody really understands all the components and dependencies pulled in by the Kubernetes Helm chart, and you still have to fix it.

Downtime is now measured in days, not hours.


Recover to a snapshot in one to two hours, then debug

Dump the snapshot into a managed DB short-term if the team can't wrangle the controller


Google Cloud's managed Postgres accepts both old-school passwords and IAM users (where you never see the password; it's baked into the DB).


Same with RDS on AWS (pg and MySQL both)


I got really excited about this and then realised it's only for Kubernetes: the one platform I've never believed you should deploy a database to (relational or otherwise). I guess there are some use cases for such a deployment, but after 20 years of experience across many organisations on three continents, I've never encountered a situation that calls for constantly rolling the database engine forward. Bring the engine in line with updates, sure, but weekly? Even monthly? No.


I wouldn't have put a database on k8s five years ago, but today the options are a lot better for persistent and resilient storage on k8s and there are pretty significant advantages to being able to normalize your deployment platform. I use CrunchyData's Postgres operator on top of Longhorn volumes for most of my stuff and being able to have what amounts to one-click deployment of multi-host-redundant, automatically backed-up databases is really nice.
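For what it's worth, the "one-click" part looks roughly like this: you apply a single manifest and the operator builds the replicated, backed-up cluster around it. A minimal sketch against the CrunchyData operator's v1beta1 API (names and sizes are placeholders; check the operator docs for the full schema):

    apiVersion: postgres-operator.crunchydata.com/v1beta1
    kind: PostgresCluster
    metadata:
      name: app-db
    spec:
      postgresVersion: 16
      instances:
        - name: instance1
          replicas: 2                  # spread across hosts for redundancy
          dataVolumeClaimSpec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi          # e.g. a Longhorn-backed volume
      backups:
        pgbackrest:
          repos:
            - name: repo1
              volume:
                volumeClaimSpec:
                  accessModes: ["ReadWriteOnce"]
                  resources:
                    requests:
                      storage: 20Gi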

The Percona Everest operator doesn't look as full-featured for Postgres, which is the only database I typically use, so this isn't for me, but it might be a better fit if you're a MySQL user (though that's just me guessing).


"Kubernetes is bad for running databases" is a _very_ outdated belief.


As a DBRE, I disagree. See post below [0].

[0]: https://news.ycombinator.com/item?id=41414237


Kubernetes is bad for running anything at all, including databases.

(K8s is second-system effect and job security for sysadmins. From a technical point of view it causes more problems than it solves.)


Can you qualify this statement? It's 2024; Kubernetes is old tech, and bulletproof at that.


Does Kubernetes have reliable and predictable cron jobs yet?



We've seen a multitude of issues, like jobs failing to start or getting too delayed (also the infamous "if your cronjob fails too much, it will stop working forever").

Though it seems they rebuilt the controller to address most of the issues: https://kubernetes.io/blog/2021/04/09/kubernetes-release-1.2...
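For reference, the "stops working forever" behaviour is the controller giving up after too many missed schedules. A sketch of the knobs that mitigate the classic failure modes (the name, image, and command are placeholders):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-backup
    spec:
      schedule: "0 3 * * *"
      # Without a starting deadline, more than 100 missed runs halts
      # scheduling of this CronJob entirely.
      startingDeadlineSeconds: 600
      concurrencyPolicy: Forbid    # don't stack runs if one is slow
      jobTemplate:
        spec:
          backoffLimit: 3          # retries before the Job is marked failed
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: backup
                  image: example/backup:latest
                  command: ["/bin/sh", "-c", "run-backup"]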


This is just not true anymore. I went through the pain of using it early, and there were times it felt like that, but it brings far more to the table than it costs now…


It's fine if you run it at a cloud provider. Setting up a k8s cluster yourself is painful though, and at a cloud provider it costs far more than just using bare metal and/or Docker (we don't even use Docker, as it's another thing to manage, and very boring). We've had auto-deploy scripts in Perl since the 90s and never had any need for any of this stuff; we now host for less than $100/mo, with millions of users and a lot of profit, with very little maintenance. I wonder why IT people like burning money so much, especially here on HN.

There is no need for almost any site; sure, Facebook/Google, but you are not running those, nor is it likely you (not specifically you, obviously; me neither) ever will. VPSs are robust these days and have no downtime besides kernel updates. I cannot fathom why you would want to burn humans or money on this kind of complexity. But then again, we like profit (not growth for the sake of growth) and it seems that most here really are not that interested in that. If I cannot be Gates or Musk (and I cannot, nor can you; again, no attack on you, just statistics), then I'd rather have little work or headache, with millions of $ in profit/mo coming in, instead of 'growth'.

Maybe I'm odd, but I have been free for the past 30+ years because of these choices (currently: Common Lisp, Apache, Perl, PHP, MySQL, HAProxy, Redis, WireGuard; hopefully I get this down to just Common Lisp + WireGuard before I pass). We don't use libraries or tech less than 10 years old unless it's really needed, and we contribute to everything we use (so we use very few things, otherwise we'd have to hire, and that's a waste of $). I sleep very well at night knowing nothing is going to happen.


> It's fine if you run it at a cloud provider. Setting up a k8s cluster yourself is painful though, and at a cloud provider it costs far more than just using bare metal

I think it's almost exactly the opposite: I'd rather use cloud-specific tooling on clouds, but k8s is a Better OpenStack on bare metal. It provides a standardized layer upon which generally-reasonable tools can operate without thinking about it much. There is a cost factor, but it doesn't need to be a high one, and it's also a forcing function for stuff like "actually thinking about redundancy" ahead of time.

I've deployed everything you described in production, and unless I was optimizing, as you are, for cut-to-the-bone opex and personal stress when it breaks badly (which is not a judgment call, but it is certainly not the only reasonable decision to make; investing more in operations to have more "bounce" when things go bad is not a bad thing), a reasonably thought-out k8s environment is going to be easier than shell scripts from the 90s once I need to have anyone who isn't me take over a problem.


No stress. Definitely less than most people I have met who spend their lives doing this kind of over-architected nonsense. But hey, if you say it's easy (it is definitely not, though; and it does go spectacularly wrong even at big companies, where no one knows why, because it's complex), then do whatever: I am guessing your income depends on complexity for clients/your employer/devops gigs, while mine depends on things being simple and never (again: it's been 30 years) breaking. Things don't go bad; there is enough 'bounce' here, we just refuse to spend money or time there as we do not need it. I'd rather work on features than on stuff that should be invisible in the first place.


My income depends on no such thing, if anything it depends on reducing complexity where it doesn't provide value, but it's telling that that's the only place your mind goes. And because of that, I think there's not much value in continuing this conversation.


> ... without thinking about it much

God forbid you had to think and know about your infrastructure and how it works, and whether it is as minimal and simple as possible whilst delivering results. Best to just pile abstractions upon abstractions you don't know well and hope for the best.


This is the biggest issue; if something truly goes wrong with k8s, your only way out is to destroy everything and redeploy; you will (very likely; of course there are people who would know, just not very many) have no clue at all what happened. This started with AWS roughly two decades ago, when they simply said: assume it will break and architect for it; don't try to figure out why things break, just restore and move on. This was absolutely brilliant: now people deploy million-$ projects without actually understanding much of the environment, and pay $$$ to make sure they never have to. Well done, Werner.


...I do know them pretty well, which kind of puts a hole in this kind of snooty nonsense. Because I know the abstractions and what's under them, I don't have to think about it much, because I've internalized what it's going to do.

I've built systems that exist today both ways. There are reasonable arguments for both. Please don't be weird.


I couldn't agree more.


And yet you both throw incredibly weak & broad non-technical aspersions.

Casting such broad, unspecific, unnuanced nets, denying value wholesale where some people clearly do find value, is trolling.


From a technology point of view k8s doesn't do anything better than what Perl scripts used to do 25 years ago.

(Not that Perl scripts are any good. They're crap technology, but unfortunately so is k8s.)

K8s doesn't solve a technical problem. It solves two contradictory social problems:

a) It gives sysadmins a job creation program, full of expensive and opaque stuff that requires expensive sysadmins.

b) It makes sysadmin stuff fungible and replaceable for developers.

Solving both problems is probably an important social issue if you're running a Google scale organization. But it's solving a social and organizational problem, not fulfilling a technological need.


K8s is a fantastic development tool. You can't ask for a better self-service tool to enable developers to ramp up on a platform and develop or test their apps across teams and orgs in a standard, portable, safe way. Its biggest problem arguably is that it's too configurable, and doesn't have enough abstractions to hide the complexity.


Depending on what you're after, the serversideup.net libraries can be pretty handy.

They also have a relatively new project out called Spin, which is a Docker Swarm orchestrator. Seems to be the real deal.

https://serversideup.net/open-source/spin/


How do you patch and update your host machines? You need to roll to a backup host, have it take over as primary, patch and reboot your DB host, and then roll back.

I would much rather have an automated process for doing that. Kubernetes provides a framework for allowing that to happen.
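To make that concrete: in Kubernetes the usual pattern is to cordon and drain the node, and the drain respects PodDisruptionBudgets, so a host patch never evicts more than a set number of database pods at once. A minimal sketch (the "app: db" label is a placeholder for your own selector):

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: db-pdb
    spec:
      maxUnavailable: 1      # a node drain evicts at most one DB pod at a time
      selector:
        matchLabels:
          app: db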


I was managing a database cluster that served 30% of the population of my entire country.

We rarely needed to restart the hosts or the database service (maybe once every two years), and even when we did, the failover process took minutes and was automated. Without Kubernetes.

Nowadays people often think that Kubernetes is the only answer for scalability and automation problems.

As a matter of fact, we had another product that was running on Kubernetes, and they are now migrating off of it; it turned out that the old-school approach with bare EC2 instances, Ansible, Terraform, and Packer is much more reliable, scalable, and cost-effective than Kubernetes.

I have to add that yes, we had to write some custom tools, but it was way less effort than managing a K8S cluster and all the software/operators/controllers that are needed for running your workload on it.


> As a matter of fact, we had another product that was running on Kubernetes, and they are now migrating off of it; it turned out that the old-school approach with bare EC2 instances, Ansible, Terraform, and Packer is much more reliable, scalable, and cost-effective than Kubernetes.

Absolutely it is. I've found this to be the case across a lot of orgs now.


With the exception of kernel patches (and even then, tools like ksplice exist), you generally do not have to restart the OS to apply patches.

If the DB itself is being patched, then yes, you’ll restart the DB process, but it’s also not difficult to automate failover.


K8s is just a complicated, confusing, poorly documented thing for running containers. Database server processes running in a container are totally fine. Not really different from running said processes on a bare kernel like it's 2010.


I have to disagree: k8s is extensively documented, and the reference docs and APIs are easily accessible. K8s at its core is a customizable, extensible, dynamic API server with a focus on containerization. It's built with scale and customization in mind; you're not supposed to use it only for running a few containers. I've worked for people that use it to manage VMs with custom controllers. You can change pretty much anything and fit it to your needs, all with defined, somewhat opinionated sane defaults and conventions.
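To make the "extensible API server" point concrete, here is a minimal sketch of a CustomResourceDefinition; the group and kind are made up for illustration. Once applied, `kubectl get databases` works and a custom controller can reconcile Database objects like any built-in resource:

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: databases.example.com
    spec:
      group: example.com
      scope: Namespaced
      names:
        plural: databases
        singular: database
        kind: Database
      versions:
        - name: v1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    engine:
                      type: string
                    replicas:
                      type: integer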


Poorly documented? Surely you jest, the K8s documentation is _excellent_.

For basically every core resource type, there’s a user-guide, examples, a tour of it, in-depth docs and then the api docs themselves.


Explain why kubernetes isn't a good choice for hosting a relational database.


For small databases (anything with ~< 10,000,000 rows), sure, it's probably fine. Other than that, no.

* Unless you are self-hosting K8s and thus have a large amount of control over the underlying storage, the amount of IOPS you're getting will be hazy at best. Tbf this is also true with every single DBaaS, because the latency on network storage is absurd, so IOPS become somewhat meaningless.

* Unless you have modified the CPU scheduling options [0] in K8s, you have no control over core pinning or NUMA layout (see the sketch below). This is made worse by the fact that your K8s nodes are probably multi-tenant.

* By its nature, K8s is designed to host stateless apps. It is fantastic at doing this, to be clear. I love K8s. But a system where the nodes can (and should, if you're taking advantage of spot pricing) disappear with a few minutes' warning is not a great host for an RDBMS.

* Hot take: it makes provisioning a database even easier, which means people with even less understanding or care of how they operate will be doing so with reckless abandon, which means people like me have even more work to do cleaning up their mess. I am a big fan of gatekeeping things that keep companies afloat. If you want to touch the thing that every service is depending on, learn how it works first – I'd be thrilled to help you. But don't come in and just yolo a copy-pasted YAML into prod and then start chucking JSON blobs into it because you can't be bothered to learn proper data modeling or to read the RDBMS docs.

Re: [0], if you don't care about core pinning, then it's unlikely you're going to care about any of these other points, and you probably also don't understand (or care) how blindingly fast an RDBMS on metal with NVMe can be.
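To be fair to [0], core pinning is possible: with the kubelet's "static" CPU manager policy, a pod in the Guaranteed QoS class with whole-number CPU requests gets exclusive cores (NUMA alignment additionally needs the Topology Manager). A sketch, assuming that kubelet configuration:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pinned-db
    spec:
      containers:
        - name: postgres
          image: postgres:16
          resources:
            requests:
              cpu: "4"        # must be an integer for exclusive cores
              memory: 16Gi
            limits:
              cpu: "4"        # requests == limits => Guaranteed QoS
              memory: 16Gi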

I am not a Luddite. To reiterate, I have administered self-hosted and managed K8s professionally. I also run it at home. I just have strong opinions about understanding fundamentals (and about not causing myself extra work by allowing people who don't care about them to run infra).

[0]: https://kubernetes.io/docs/tasks/administer-cluster/cpu-mana...


> it makes provisioning a database even easier, which means people with even less understanding or care of how they operate will be doing so with reckless abandon, which means people like me have even more work to do cleaning up their mess. I am a big fan of gatekeeping things that keep companies afloat. If you want to touch the thing that every service is depending on, learn how it works first – I'd be thrilled to help you

Couldn’t agree more.

Kubernetes and such tools do not make things easier; they just give you the illusion of it.


That is an excellent way of putting it. Abstractions make you think you've got a handle on things, but when they break, now you have two problems – neither of which you're probably equipped to solve.


When I was doing Econometrics at uni, the lecturer wouldn't let us use the functions in the software that did the mathematics for us. He made us learn using spreadsheets, doing the calculations manually, _then_ he let us use the automated functionality once we understood it.


> By its nature, K8s is designed to host stateless apps. It is fantastic at doing this, to be clear. I love K8s. But a system where the nodes can (and should, if you're taking advantage of spot pricing) disappear with a few minutes' warning is not a great host for an RDBMS.

Why is this a problem? A typical deployment will have multiple replicas, with (hopefully) small replication lag. Those should be able to be promoted to be the new primary within a minute.


> A typical deployment will have multiple replicas, with (hopefully) small replication lag. Those should be able to be promoted to be the new primary within a minute.

What happens within that minute to database writes?


Do you enjoy being paged for something out of your control? I don’t.


I don't get paged for something that is not under my control; it's an organisational problem if you do. But how is it relevant here?

In my example, I will get a page for large replication lag. But not for an unplanned failover. That will be an alert, but not a page.


Well, the problem is a little bit more complicated than just having replicas.

You cannot build operational procedures based on “hope”.

High replication lag occurs for many, many reasons (and it is not a rare event, or something you can prevent), as do network partitions.

Replication and binary logs can get corrupted, there can be deadlocks, duplicated row errors, etc.

The thing is that database administration is a broad and complicated topic, a small mistake or the lack of understanding how these systems work can easily lead to huge data losses.


> typical deployment

Ah yes, HN. You know there are billions of sites (WP mostly), LoB apps, etc. that run on one mysql/pg/etc instance, right? Replicas are not typical; they're a tiny minority.


Exactly. Kubernetes and micro services? Sure, for about 0.5% of the industry. Everyone else needs two servers and a load balancer.


Technically four servers, because you'll want an HA LB as well, but yes.

Tech is rife with people who have never set up an old-school HA solution proffering advice on how a miasma of cloud services makes theirs better.


Why would I want k8s for a simple Wordpress site in that case?


OP was talking about 'typical database setup' being a replicated db. It's not typical. Nor is the use of k8s for stuff outside HN and massive companies. Not that I mentioned k8s anyway.


I think many of the performance points here are a trait of VMs in the cloud more than of K8s, and they would be no different running on EC2, right?


Anyone know how this compares with StackGres and CloudNativePG?

Looks like this supports MySQL and MongoDB, which the others don't.



