
KubeDB – Run production-grade databases easily on Kubernetes - openmaze
https://kubedb.com/
======
stevenacreman
It's good to see a project focusing on production-grade databases on
Kubernetes. Particularly the production-grade part.

There are 33 open-source operators for managing databases on Kubernetes. Out
of that list, only 3 claim to be production-ready.

Out of the 126 Operators I've looked into, the vast majority are abandoned or
unfinished. Most state the project status as Alpha in the readme.

KubeDB itself has a version number of 0.8.0 for the operator and very low
version numbers for the databases, for example 0.2.0 for Redis.

Version numbers can mean anything but they are usually a good indicator of
what the project owner thinks the status is.

It would be cool to see a breakdown of status and expected milestone dates
for KubeDB.

For anyone interested in browsing other Operators, I keep an updated table
halfway down this blog post.

[https://kubedex.com/operators/](https://kubedex.com/operators/)

The project statuses come directly from what the authors have stated. Many
beta status projects are being used in production.

~~~
manigandham
Part of the problem is that Kubernetes itself is still changing rapidly and
already has design-by-committee cracks in the API.

It would help if the community took a break from new features and worked on
stability first so that Operators and other extensions can finally take off.
Some of the things being developed now are so esoteric that it seems to be
more about finding the next exciting thing to add than usability.

~~~
shaklee3
You're using that term in a derogatory sense. Would you rather have Google
decide how everything is designed, and everyone else has to deal with it? I
think you'd see a ton of GCP-specific stuff if that were the case.

I used to think the way you do about Kubernetes because I saw just how long
it took for features I really wanted to get in. Then I attended some of the SIGs,
and realized that there are so many use cases out there unlike mine, and that
doing what I want may break what others want. So instead of making a decision
that screws over everyone but one cloud provider, what I've seen is very
methodical and careful decision making from many companies working together.
This usually means that you get something that may not do exactly what you
want out of the box, but there are hooks to do it if you'd like. I'd much
prefer this over nothing at all.

It would be worth sitting in on a SIG you're interested in and seeing how
@smarterclayton and @thockin handle these kinds of decisions. I see so much
negativity on HN about k8s, and it really seems like people just don't
appreciate the amount of attention that goes into each decision. I think if
you spend the time to trace the history of a feature and understand why things
are done, it may change your mind about how complex k8s is.

------
markbnj
I'm wary of the operator model in general, and we haven't had great success
using operators to deploy complex stateful services in our clusters. But to be
honest we also haven't had great success deploying them using OTS charts from
helm stable either. One of our k8s stateful services is a large elasticsearch
cluster indexing about 150m events per day, and the chart was forked and
heavily modified by us to get it right. I feel that complex stateful services
often have enough devils in the details that trying to implement them through
an abstraction gets you into trouble. Operators aspire to be a "smart agent"
that can translate a CRD resource declaration into a functioning thing,
allowing you to implement your data store at an even higher level of
abstraction than a helm chart provides. Since in my experience charts are
themselves too abstract for this purpose (you either end up forking/modifying
or, if the chart actually provides full coverage of the configuration options,
creating a whole new hard to comprehend API to the k8s resources you're trying
to deploy), I'm not that excited about having a back-end clippie that can do
it for us. It's probably fine for simple use cases, and especially those where
you often need to create and destroy simple dbs, but imo not yet for large
production use cases.
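
To be fair to the idea, the control loop itself is conceptually simple; the
pain is all in the branches. Here's a minimal Python sketch of the reconcile
pattern (not KubeDB's actual code; the CRD fields and helper methods are made
up):

    import time

    def reconcile(desired: dict, cluster):
        """Drive the running database toward the declared spec."""
        name = desired["metadata"]["name"]
        spec = desired["spec"]                  # e.g. {"version": "9.6", "replicas": 3}
        actual = cluster.get_statefulset(name)  # hypothetical helper

        if actual is None:
            cluster.create_statefulset(name, spec)   # first-time provisioning
        elif actual.replicas != spec["replicas"]:
            cluster.scale_statefulset(name, spec["replicas"])
        # ...version upgrades, failover, backups, config drift: the hard 90%

    def run(cluster):
        while True:                             # observe, diff, act, repeat
            for db in cluster.list_custom_resources("postgreses"):
                reconcile(db, cluster)
            time.sleep(30)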

~~~
marcc
The Operator/CRD pattern is promising for autonomously operating simple use
cases of existing software and for operating really complex software that
needs very specific, rare knowledge to operate.

Unfortunately, we aren’t there yet for most software. Let’s take Postgres as
an example. Today you have to manage your pg database manually (or use a
service that manages it for you), but that’s only because the right
automation software hasn’t been built yet. Someday, a Kubernetes Operator (or
equivalent implementation) will exist that can manage a large Postgres
cluster better than a team of DBAs. It’s crazy that there are hundreds
(thousands?) of configuration parameters in Postgres, coupled to the
operating system settings in weird and unexpected ways that most people don’t
know about. We should be building this knowledge into a K8s Operator and
letting it control our pg.conf and OS configuration, instead of handing that
control to a team of humans who might put in some sane defaults but will
always be chasing optimal Postgres performance as the workload changes.
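
To make that concrete, here is the kind of rule such an operator could
encode, deriving pg.conf values from the node a pod lands on (a rough Python
sketch using common rules of thumb; this is not something KubeDB or any
existing operator ships):

    def suggest_pg_settings(node_mem_mb: int, max_connections: int = 100) -> dict:
        """Rule-of-thumb Postgres memory settings derived from node RAM."""
        shared_buffers = node_mem_mb // 4            # ~25% of RAM is the usual start
        effective_cache_size = node_mem_mb * 3 // 4  # ~75% of RAM
        work_mem = max(4, (node_mem_mb - shared_buffers) // (max_connections * 2))
        return {
            "shared_buffers": f"{shared_buffers}MB",
            "effective_cache_size": f"{effective_cache_size}MB",
            "work_mem": f"{work_mem}MB",
            "max_connections": max_connections,
        }

    print(suggest_pg_settings(node_mem_mb=16384))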

This exists in some places already. For example, Rook is a K8s operator that
provisions and manages Ceph in a Kubernetes cluster. As a small startup, if I
need this functionality, I don't want to hire a full time Ceph admin to figure
it out, and I don’t have the expertise to take on operating Ceph myself. Rook
productized operating Ceph for us, and “baked in” all of the needed knowledge
to manage block and object store and even set up concurrent, shared file
systems. I trust Rook to manage Ceph, and I don’t think that I could do a
better job with human intervention.

We have a long way to go. Operators are a tool that might help get us there,
but they’re just one pattern we can use. One thing is for sure: we shouldn’t
assume that human control over complex software is required to achieve
optimal performance.

~~~
andrenth
What's your strategy for handling possible filesystem failure/corruption
scenarios without a team that understands the underlying technology?

~~~
marcc
That's a great point. I do have a team that understands the underlying
technologies and has been successful in troubleshooting several production
problems with Rook/Ceph, including a recent case of file system corruption. My
original post was just trying to say that our engineering team does not
maintain deep operational knowledge of the best way to configure, manage,
monitor, scale, etc. (operate) Ceph in production. We rely on the Rook operator
for this.

Troubleshooting acute outages caused by hardware or software failures requires
a different skill than properly configuring the system to scale and minimize
the chances of corruption or outages. Rook solves the latter, but we do
understand the architecture and what Rook (and Ceph) are doing. We've just
removed the expert-level, specialist knowledge required to operate Ceph,
because we decided, after a thorough evaluation, that the software in this
case is the most capable solution.

~~~
andrenth
I find this unusual because the knowledge required to troubleshoot a complex
piece of software is usually much deeper than the knowledge required to set
it up in the first place. In other words, how can you troubleshoot it if you
don’t know how it’s built?

It’s a bit like debugging software you didn’t write.

------
keypusher
While I have completely embraced running stateless services in Docker, I have
been hesitant to migrate the database layer to containers. I have not tested
it personally, but I have seen numerous reports of performance issues when
using volumes. Is this no longer an issue, or was it limited to bind mounts?
Do volumes not use the storage driver? I have also run into permission issues
when using volumes with Docker, which I'm sure was just my own ignorance, but
it does seem like a source of confusion and potential error. I have read
through the documentation on the linked page, and the quickstart guides for
KubeDB seem great for getting up and running. But I do worry about situations
like an automated PG failover that can't reconcile a timeline; there isn't
much documentation on failover at all, and this could add significant
complexity to something that is already a potential nightmare. Anyone care to
share their experiences running production databases in k8s?

~~~
cookiecaper
There is no good reason to run non-test database workloads in Kubernetes or
Docker. Databases are designed to sit close to the hardware and have a stable,
dedicated chunk of resources for a long time, whereas Kubernetes pods are
subject to vaporization at any moment. Databases traditionally have fought the
_operating system_ to try and maintain enough control to remain performant.
Introducing additional layers into this would be dubious at the best of times,
but when it's something fundamentally contrary to the application's nature
like stateless orchestration, it's pure farce.

There could not be an application worse-suited to running in Kubernetes et al
than a traditional database. Anyone claiming something that rams this square
peg into that round hole is "production ready" is showing that they're an
empty husk and shouldn't be trusted near anything important.

Note the downvotes already rolling in less than two minutes after I posted
this. This subject is a major third rail here. It goes against the agenda of
very powerful people and my account has been censured in the past specifically
for making this particular argument, that database workloads and Kubernetes
don't mix. Keep that in mind when you're asking HN for their experience on
this (or any other topic that YC considers critical to the interest of their
investments -- they've shown that they're willing to taint the discussion if
it gets too dicey).

~~~
ownagefool
I assume you also consider databases on cloud with mounted block storage as
not production ready too?

~~~
cookiecaper
For the record, I have to use my reply allocation sparingly, since usually
when I start talking about this I'm mysteriously throttled for long periods.

That said -- no, that's not the same thing at all. Barring anomalous
conditions, VMs run as long as you keep them running. They won't be reaped and
rescheduled onto some other node in the cluster, whether by automated
rebalancing processes or by manual `kubectl delete po...` or `kubectl drain`.
You can easily set up a VM that will behave more-or-less like conventional
hardware if we ignore the perf hit.

This is a pretty simple thing. The reason people say you need to make your
apps "12 factor" when you go to k8s is because it doesn't work well if your
app cares about state. Databases care _deeply_ about state. You can't just
kill a DB server and spin up a new one to pick up where it left off. You can't
parallelize a DB workload by spinning up 8 little DB nodes. It's not a web
server and it just doesn't work like that. Things like CockroachDB exist
specifically because normal databases don't work like that.

This is where people usually bring up things like annotations, labels,
StatefulSets, etc. First, note that the facilities that accommodate stateful
workloads are _not_ priorities for Kubernetes and are generally not well-
tested or consistent. This wouldn't be a news story or an independent project
if they were.

Second, please realize you're doing all of that work to try and make
Kubernetes do something it's not really designed to do, with potential
negative impact on the availability and scheduling processes for the
applications that _do_ work well on Kubernetes, when you could just spin up a
VM and avoid _all_ of these issues entirely. There's no reason to put a
production DB on k8s other than cargo culting.

~~~
cpuguy83
A database is an application like any other. Containers are about managing the
lifecycle of the process, and container managers assist in getting the right
state to a container. Whether or not a container has state doesn't make it
easier or harder to run in a container.

If you aren't managing your state, then yeah you will run into a nightmare
when trying to containerize stateful apps... or running them at all. You will
literally have the same problems with a VM or physical hardware.

It's important to separate state management from process management. A
stateful application is absolutely not harder to containerize than a stateless
one. Rather, it is simply harder to run stateful applications in any regard.

I would personally argue that it is easier to run a stateful app with a
container manager. I know it sounds crazy, but keep in mind that container
tools are centered around what each individual application requires, and the
tooling tends to make it easier to express, and assist in managing, the state
requirements of that application.

For that matter you can even prevent the scheduler from scheduling your
stateful app on a new node, which seems to be the answer for the crux of the
argument against containerizing a stateful app.
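
For example, a required node affinity on the pod template means the pod can
only ever be scheduled onto the one node that holds its data (sketched here
as a Python dict rather than a YAML manifest; the node name is made up):

    # Fragment of a pod template that pins the pod to a single named node.
    pinned_pod_spec = {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",  # well-known node label
                            "operator": "In",
                            "values": ["db-node-1"],          # hypothetical node name
                        }]
                    }]
                }
            }
        },
        # ...containers, volumes, etc.
    }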

~~~
cookiecaper
> Whether or not a container has state doesn't make it easier or harder to
> run in a container.

I agree, which is why I specifically avoided that language. Containers don't
have to be implemented without regard for state -- but if you're talking about
Docker or k8s, they are. Docker throws away anything not explicitly cemented
in the image or designated as an external volume.

LXC, zones, and jails are containerization techniques that respect state. It's
fine to run a database in these if desired. They behave just like real VMs;
they have an init process, they get real IPs, they don't automatically destroy
the data written to them, and they generally don't mysteriously shut down or
get rescheduled. You can't be confident about any of that with Docker or k8s.

Statefulness is not a primary use case for Kubernetes. It took two years for
StatefulSets to leave beta and there was a substantial false start in PetSets.
As recently as April, which is the last time I seriously looked, there were
still competing APIs for defining access to local volumes.

If you want to run a production database workload in a jail or a zone, that
sounds fine to me. It's not about containerization in the abstract. It's about
the way that Kubernetes and Docker do it.

(I mention Docker and k8s together because for most of k8s history Docker was
the only supported runtime. It supposedly can use other runtimes now, but
they're not widely used afaik, and behave similarly re: state anyway)

~~~
ownagefool
So actually kube and docker throwing away your state (that you haven't
specifically persisted) is basically a good thing, because it makes you very
aware of where your state is.

~~~
cookiecaper
I'm on board insofar as bosses, regulators, and customers find "heightened
awareness of state" an acceptable substitute for the production data that was
sacrificed to the cause.

------
softwaredoug
For those terrified of an AWS-dominated future, projects like this are
crucial. The closer we can get to a push-button, open-source DB cluster on
any cloud, the less we need to fear that AWS will host everything and lock us
into a walled garden of closed-source AWS systems.

~~~
deboflo
You are more likely to get locked in with kubernetes than with AWS. It’s
easier to migrate out of highly decoupled, well documented systems piece by
piece (AWS) than out of monolithic frameworks like k8s.

~~~
shaklee3
That makes no sense. Kubernetes leverages AWS primitives (ELB) if needed, and
at its core it deploys containers. As long as your application runs in a
container, you aren't locked in.

~~~
deboflo
I think we can agree that Kubernetes does far more than schedule containers,
even if “at its core” that’s what it does. How many of the ~2 million lines
of code in the k8s project are directly related to scheduling containers?
Very few. If a scheduler is all that is needed and you want to use any of the
3 different types of load balancers provided by AWS, a simpler architecture
might be to just use AWS ECS. 500 lines of declarative CloudFormation or
Terraform will do the job.
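
To illustrate the scale difference, the core of that is just a couple of API
calls (an imperative boto3 sketch of roughly what such a CloudFormation or
Terraform template would declare; the cluster, image, and service names are
made up):

    import boto3

    ecs = boto3.client("ecs")

    # Register a task definition for the container we want to run.
    ecs.register_task_definition(
        family="myapp",
        containerDefinitions=[{
            "name": "myapp",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:latest",
            "memory": 512,
            "essential": True,
            "portMappings": [{"containerPort": 8080}],
        }],
    )

    # Run N copies of it as a long-lived service on an existing cluster.
    ecs.create_service(
        cluster="my-cluster",
        serviceName="myapp",
        taskDefinition="myapp",
        desiredCount=3,
    )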

~~~
shaklee3
What features are you referring to specifically that lock you in? Sure, it's a
large project. But most LOC are around being modular and pluggable, and
adhering to standards (OCI, CNI, CSI). I can't think of anything that would be
particularly difficult to move out of if needed.

~~~
deboflo
There isn’t sufficient separation between components within Kubernetes to
make it easy to migrate away piece by piece. Documentation also plays an
important role in migrations. I once counted the pages of documentation for
Kubernetes vs. AWS for equivalent functionality (VPC, ECS, Route53, etc.),
and AWS had 20 pages for every page of Kubernetes.

------
lukeqsee
Earlier discussion:
[https://news.ycombinator.com/item?id=18698759](https://news.ycombinator.com/item?id=18698759)

------
an-allen
I’ve always been troubled by production-grade handling of state in containers
- specifically as it pertains to data backup.

This module takes that into account and defines a “backup” k8s object that
will trigger a db dump. But there is still no way to get the point-in-time
recovery/backup that you get from current production-grade managed state
providers. I’m going to say it’s production-grade only if we are using the
standards of 10 years ago. Production-grade today, I feel, is a bit more
robust.

~~~
DasIch
[https://github.com/zalando-incubator/postgres-operator](https://github.com/zalando-incubator/postgres-operator)
supports point-in-time recovery just fine and is used in production for
hundreds of databases at Zalando.

~~~
pritambarhate
It would be good to know the size and scale of these databases.

~~~
DasIch
I don’t have actual numbers, but I did a quick search and most are a few GiB
to tens of GiB, although a few are hundreds of GiB. In practice size is not
the limiting factor; IOPS are, because they all use gp2 EBS volumes.
Databases that have huge IOPS requirements are still deployed outside of
Kubernetes and run on i3 instances. In that case they still use spilo though,
so basically the same system for backups and automatic failover as on
Kubernetes.

That being said, we also have an Elasticsearch operator that is used to
deploy Elasticsearch on Kubernetes, with nodes running on i3 instances and
using the corresponding instance storage. Although used in production, it’s
still very new and sadly not open source.

~~~
bogomipz
>"In that case they still use spilo though, so basically the same system for
backups and automatic failover as on Kubernetes."

What is "spilo"? I am not familiar with this term. Thanks.

~~~
DasIch
Spilo[1] is a Docker image that provides postgres bundled with Patroni[2].

Both the postgres-operator I linked earlier and our setup on AWS (with one
image per EC2 instance) use it to actually run Postgres.

    [1]: https://github.com/zalando/spilo
    [2]: https://github.com/zalando/patroni

------
SoylentBob
Interesting project! Thanks for sharing.

How does this compare to other community efforts, e.g. Zalando's Patroni
project, aside from supporting more databases than just postgres?

------
mosselman
Does anyone know of a Docker alternative to this? Something like KubeDB that
lets me deploy a production-ready Postgres db on Docker Swarm, for example?

~~~
cpuguy83
I would not run a database on swarm. It simply does not have the right APIs
at the cluster level to properly express state requirements.

The original swarm design had some of this but it was pulled just before
release for more design work... which was never completed.

I wrote the only storage support currently in swarm, which is the "mounts"
API in your service spec...

So, technically you could use swarm to do it, but it will be painful and I
don't think any amount of tooling will help until docker includes some support
for cluster-aware storage.

I would be happy to hear if people have successfully done this, though!

~~~
mosselman
Thank you for your reply. Do I understand correctly that the biggest issue is
the fact that containers won't run on the same node and you'd thus have
storage issues? Would these issues be (partially) mitigated if you'd run
postgres on a single node?

~~~
keypusher
If you are running multiple copies of postgres on a single node, then you have
not significantly improved the resiliency of your database to failure, and it
still does not solve the state transition problem. What happens when the
primary database fails (or the node dies)? Whether it is on this node or
another node, you need to have a replica (sync or async) that you can fail
over to, preferably in an automated way. Docker swarm is not equipped to
handle these transitions for you, at which point you are just running your
database in Docker, with no real benefit over running it on actual hardware or
a VM, and with significant added complexity.
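
To make "handle these transitions" concrete, this is roughly the job an
automated failover system (Patroni, for example) performs; the helpers below
are hypothetical, not any real API:

    def failover(cluster):
        """Promote a replica if the primary is gone; a conceptual sketch only."""
        primary = cluster.current_primary()
        if primary is not None and primary.healthy():
            return                              # nothing to do

        # Pick the replica that has replayed the most WAL so the least data is lost.
        candidate = max(cluster.replicas(), key=lambda r: r.replay_lsn())

        candidate.promote()                     # e.g. pg_ctl promote under the hood
        cluster.repoint_clients(candidate)      # update the service/VIP/DNS entry
        # The old primary must be fenced so it can't come back as a second
        # primary (split brain); it has to rejoin as a replica of the new one.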

------
bearjaws
Funny, because I was just baffled by the pricing of HA MongoDB (formerly
mLab); it gets way too pricey way too fast.

When looking at the hardware being provisioned, I realized it wasn't even
anything too crazy and could be had for 1/4 the price at Linode.

I will definitely be using this in the future.

------
rmoriz
How does it handle PG upgrades, for example from PG 9 to PG 10?

~~~
Volundr
Based on my reading of the documentation, it doesn't. So you'd be responsible
for taking a backup via pg_dumpall, and restoring it post-upgrade.
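
Roughly, that manual path would look like this (a sketch with made-up
hostnames; in practice you'd stop writes first and verify the restore before
cutting over):

    import subprocess

    OLD = {"host": "pg9.example.internal", "user": "postgres"}
    NEW = {"host": "pg10.example.internal", "user": "postgres"}

    # 1. Dump every database plus globals (roles, tablespaces) from the old cluster.
    subprocess.run(
        ["pg_dumpall", "-h", OLD["host"], "-U", OLD["user"], "-f", "all.sql"],
        check=True,
    )

    # 2. Replay the dump into the freshly provisioned PG 10 cluster.
    subprocess.run(
        ["psql", "-h", NEW["host"], "-U", NEW["user"], "-d", "postgres", "-f", "all.sql"],
        check=True,
    )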

~~~
rmoriz
Thanks for the confirmation. I was not able to find it, either. Strange what
use cases are called „production-grade“ nowadays...

~~~
elsonrodriguez
Anyone that says "Stateful" and "Production Ready" and "Kubernetes" in the
same sentence is likely to disappoint you.

------
geggam
Performance tests, please?

