
How about managing your own k8s running on VMs / bare metal?

Pretty much anyone who has worked in ops for a while understood from the get-go that it's impossible to be totally provider-agnostic. K8s is just a nice API on top of the provider's API that still requires provider-specific configuration.




Disclaimer: I work for Red Hat and am very biased, but this is my own honest opinion.

If you're going to run on bare metal or in your own VMs, OpenShift is very much worth a look. There are hundreds, maybe thousands, of ways to shoot yourself in the foot, and OpenShift puts up guard rails for you (which you can bypass if you want to). OpenShift 4 runs on top of RHCOS, which makes node management much simpler and lets you scale nodes quickly and easily. It works on bare metal or in the cloud (or both, but make sure you have super low latency between data centers if you're going to do that). It's also pretty valuable to be able to call Red Hat support if something goes wrong. (I still shake my head over the number of days I spent debugging arcane networking issues on EKS before moving to OpenShift; that time alone would have paid for a year or more of support.)


So you move from vendor lock-in with the cloud provider to vendor lock-in with an expensive, proprietary* IBM k8s distribution, with its strange nonstandard opinions about workflows, that you have to manage yourself?

Don't get me wrong, I appreciate RHAT's code contributions very much; they have done a lot for k8s! But running OKD on one's own is a bad idea, while paying for IBM support makes you as much a hostage as anything Google will do to you. Better to just stick with a distribution with better community support and wait for RHAT's useful innovations to be merged upstream (while avoiding the pitfalls of their failed experiments...)

* Yes, it's open source, but the community version (OKD) isn't really supported, nor is it widely used, so if you're serious about running this you're doing so for the enterprise support, and you're going to be writing those checks.


Thanks for the edits (and acknowledging our contributions). I wasn't sure if you were just trolling or not before, so I didn't want to engage.

Your concern is valid, and I agree with you that OKD is not supported enough. I have my own theories as to why, but I will keep my criticism "in the family" (but do know there are people who want to see OKD be a first-class citizen, and know that we are falling short right now). We had some challenges supporting OKD 4.x because the masters in 4.x now require Red Hat CoreOS (and it is highly recommended for the nodes too), but RHCOS was not yet freely available. This is obviously a big problem. Now that Fedora CoreOS is out, there is a freely distributable host OS on which to build OKD, so it will be better supported and usable. FWIW I have a personal goal to have a production-ish OKD cluster running for myself by the end of the year.

I'll admit I am a little offended at being called a "proprietary IBM K8s distribution," but I don't think you meant to be offensive. IBM has nothing to do with OpenShift, beyond the fact that they are becoming customers of it. Every bit of our OpenShift code is open source and available. You are right that it's not in a production-usable state (although there are people using it), but it's a lot better than you'll get from other vendors. We are at least trying to get it to a usable state, unlike many of them. We are strapped for resources like everyone else, and K8s moves a mile a minute and requires significant effort to stay ahead of. This space is still really young and really hot, and I am confident we'll get the open source into a good, usable state, much like Fedora and CentOS are. I also don't think OpenShift is really all that expensive considering the enormous value it provides. The value really does shine at scale, and may not be there for smaller shops.

I don't blame you for waiting; I probably would too. Our current offering is made for enterprise scale, so it isn't tailored to everyone. I've heard OpenShift Online has gotten better, but I haven't tried it myself. Eventually I plan to run all my personal apps on OKD (I have maybe a dozen, mostly with the only users being me and my family), but until then I've been using podman pods with systemd, which will be trivial to port to OKD once it's in a good state.


Yes, I realize I will come across as being overly negative here, and I apologize for this.

It's not that OpenShift is bad per se; I just don't imagine it solves many of the problems an org that is fretting about lock-in or GCP pricing will have. Such an org is probably cost-sensitive and looking for flexibility, but OpenShift is expensive, and if you adopt its differentiating features you are de facto locking yourself in. And if you do not leverage those features out of a desire to avoid lock-in, you are effectively paying a whole lot just for k8s support...

And I really should say, for certain orgs (especially bigcos) this may well be worth it; I just don't think it is a good option for anybody worried primarily about avoiding vendor lock-in and keeping costs in check.


Honestly, the fact that it requires RHEL or CentOS was enough to make it not feasible for us. I wish that would change, since I can't think of any reason the distribution should affect OpenShift.


There are several reasons, chief among them that the nodes themselves are managed by OpenShift operators. If you run (as cluster admin) `oc get clusteroperators` you'll see plenty that are for hosts, such as an upgrader and a tuner. If the operators had to be distro-agnostic it would be a support nightmare, and we wouldn't be able to do it. With RHCOS (an immutable OS) we also have enough guarantees to safely upgrade systems without human intervention. Can you imagine doing that in an enterprise environment while trying to support multiple distributions? I can't.
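For example, you can poke at the host-facing operators and the machine configs they render (a rough sketch; exact operator and resource names vary a bit by OpenShift 4 version):

    # list cluster-level operators (needs cluster-admin)
    oc get clusteroperators

    # the operator that owns node/OS configuration
    oc describe clusteroperator machine-config

    # the rendered configs applied to each node pool
    oc get machineconfigs
    oc get machineconfigpools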


Can you describe what kind of tuning the RHEL tuning tools do that isn't available using the normal kernel constructs? Last I checked, tuna and others did everything you could do in Ubuntu, just without your needing to know the guts of the system.

Again, I think the idea of the OS is great, but you've lost us, and likely other big customers, because of that restriction. Having old kernels is just not an option for some people.


I'm not informed enough to tell you what the tuning tools do, so I'll dodge that question. But "[h]aving old kernels is just not an option for some people" is exactly the type of problem this solves. You literally don't have to know or care what kernel your node runs, because it doesn't matter! The OS is a very thin layer underneath K8s, a layer which is entirely managed by applications running as pods (supervised by an operator) on the system. Whatever apps/daemons/services you need to run move to pods on OpenShift. If you need to manage the node itself, there is an API for it. If you truly need underlying access, then this is not for you, but you'd be amazed at how many people (myself included) started out balking at this, thinking "no way, for compliance we need <tool>," but after re-thinking the system realized they really don't. By "complicating" the system with immutable layers, we actually simplify it. It was much like learning functional programming for me: by "complicating" programming by taking away stuff (global variables, side-effects, etc.), it actually got simpler and bugs dropped by a huge margin.

If you are like me and are old school and think "huh, yeah that makes me nervous" I completely understand that, but we've seen some serious success with it. I'm a skeptical person, and telling me I can't SSH to my node freaks me out a bit, but I'm becoming a convert.

I would also note that if you buy OpenShift you get the infrastructure nodes (masters, and some workers for running OpenShift operator pods) for free (typically, but I'm not a salesperson, so don't hold me to that if I've misspoken :-P), so you aren't paying for the super locked-in OS. I suppose you do have to pay for RHEL 8 or RHCOS on the worker nodes running your pods, and we don't support other distros (because we expect a very specific SELinux config and CRI-O config (container runtime), among other things), so I guess there's some dependence there, although I recommend RHCOS for all your nodes and then just using the Machine API if you need it.
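To give a flavor of what "an API for the node" looks like, here's a rough sketch of a MachineConfig that adds a kernel argument to every worker (the name and the argument are made up for illustration; check the docs for the exact schema in your version):

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: 99-worker-kernel-args               # hypothetical name
      labels:
        machineconfiguration.openshift.io/role: worker   # target the worker pool
    spec:
      kernelArguments:
        - intel_iommu=on   # example arg; the machine-config operator rolls it out and reboots nodes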


BTW, I would never run OpenShift after the CoreOS debacle. That was a really sketchy move and still is.

Yes, RH did a lot for k8s, but killing off a working distribution without a direct migration path (it's basically "start again") will make your customers angry.

Also, I think the OpenShift terminology is way too much, and OpenShift should be a much thinner layer on top of k8s.


I agree, the CoreOS thing went down grossly. It was a technical nightmare tho. They deeply merged CoreOS and Fedora/RHEL and created a hybrid animal. Creating an upgrade path would have been an insane challenge, and in the end the advice would have been to rebuild anyway to avoid unforeseen issues. They could have left CoreOS repos and stuff up tho and given a longer transition period.


I work for Microsoft and I quite like OpenShift (although we have AKS as the default, "vanilla" managed k8s service, you can also run OpenShift on Azure).

Sure it's opinionated, but at this point, which flavor of k8s isn't? Even k3s (which I play around with on https://github.com/rcarmo/azure-k3s-cluster) has its quirks.

Everyone who's targeting the enterprise market has a "value-added" brew of k8s at this point, so kudos to RH for the engineering effort :)


Don't want to sound snarky, but how about an upgrade path from 3.11 to 4.x? I am a heavy OpenShift user and it seems that RH just dumped whatever architecture they had with pre-4 clusters and switched to a Tectonic-like 4.x installation, without any way to upgrade other than a new installation. This makes it hard to migrate with physical nodes.


Not snarky at all; you are more correct than you may realize. The upgrade is a challenge because we moved from RHEL 7 to Red Hat CoreOS 8 as the host OS, and we also moved all the OpenShift code into operators instead of into the binary. OpenShift itself can now also manage the nodes (thanks to RHCOS), as well as fully self-upgrade. For the container runtime we moved from Docker to CRI-O (for a number of reasons, high on the list being security). It's a massive overhaul which is really more akin to a brand new product than a major version. I generally can't stand major overhauls because most of the time they don't give you much new and often bring regressions. However, this really was a tremendous improvement, the fruits of which are not even fully realized.

Because of that major change, the cluster upgrade path is a little more involved than usual. It's a complete reinstall of the OS and rebuild of the cluster. There are tools to help tho, and as the path gets trodden, things will get easier and better supported. If you go through the same wave as me, you'll be annoyed at first, but once it's done and you have a 4.x cluster you'll be really happy with it (especially when you can manage everything as an operator).

Luckily from an app perspective very little will change since Kubernetes is the API. A sibling comment linked to some helpful documentation. I can't be specific right now but I can tell you that your need is known, and some very smart people are working on it. If you want to email me (check my HN bio page) or jump in the Keybase group called "openshift_okd," I'm happy to chat more about it (not in an official Red Hat support capacity, just as friend to friend :-) ). I haven't done a migration myself yet but I know people who have, and I plan to get into it personally soon as well.


There is a free tool to help with migrations from OpenShift 3.x to 4.x - https://access.redhat.com/documentation/en-us/openshift_cont...

It's not a rolling upgrade, but it allows you to move the apps, PVs, policies in as granular a manner as you wish.


I'm running my own on bare-metal dedicated servers. You will need to install a few extra things: MetalLB for LoadBalancer services, cert-manager for SSL, an ingress controller (nginx, Ambassador, Gloo), and one of the CSI plugins for your preferred storage method. It is extra work, but as a personal cluster for hobby work I'm paying $65/mo total; the same specs would probably be $1000/mo at a public cloud provider.
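The MetalLB piece is tiny once it's installed; a minimal Layer 2 setup looks roughly like this (the address range is made up, and note that newer MetalLB releases replaced this ConfigMap with CRDs):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - 203.0.113.10-203.0.113.20   # a range of IPs routed to the nodes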


MetalLB looks fun, hadn't seen that one.

If you want something production-grade (i.e. doesn't say "beta" on the tin) then I think Calico should solve most of the same problems too (it does BGP peering to your ToR switch):

https://docs.projectcalico.org/networking/determine-best-net...

Does MetalLB do something extra I'm missing?


MetalLB had more features than Calico before Calico gained the external service advertisement feature. Now that it has it, services can use ECMP load balancing just as they do with MetalLB.
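If anyone wants to try the Calico route, the service advertisement bit is roughly this (a sketch; the CIDRs are placeholders and the exact fields may differ by Calico version, so check the docs):

    # applied with calicoctl (or kubectl, depending on how Calico is installed)
    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      serviceClusterIPs:
      - cidr: 10.96.0.0/12        # advertise the cluster service range to BGP peers
      serviceExternalIPs:
      - cidr: 203.0.113.0/24      # advertise external IPs for ECMP load balancing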


Traefik can also automatically get Let's Encrypt certs if you don't want to use cert-manager. Traefik gives the added benefit of also doing Ingress.
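The ACME side is just a few lines of Traefik static config (Traefik v2 syntax from memory, so treat it as a sketch; the email and storage path are placeholders):

    # traefik.yml (static configuration)
    certificatesResolvers:
      le:
        acme:
          email: you@example.com       # replace with your address
          storage: /data/acme.json     # persist issued certs across restarts
          tlsChallenge: {}             # or httpChallenge / dnsChallenge

You then reference the "le" resolver on whichever routers should get certificates.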


From what I've seen, managing k8s on your own often ends up requiring a dedicated team just to keep up with its insane release cycle.


Can confirm. Depending on your cluster size, you will need at least two dedicated people on the "Kubernetes" team. You'll probably also end up rolling your own deployment tools, because the K8s API is a little overwhelming for most devs.
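The "rolling your own" part usually amounts to templating something like this minimal Deployment so devs only have to supply an image and a port (everything here, name and image included, is purely illustrative):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                 # hypothetical app name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: registry.example.com/my-app:1.0   # placeholder image
            ports:
            - containerPort: 8080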


To be honest, we are heavily using EKS and AKS in other teams, and each of those teams has a dedicated DevOps sub-team to help them not only with k8s but also with other infrastructure, because bare k8s is pretty useless for the business.

So either way you end up in a situation where you need a dedicated DevOps team or dedicated team members to keep up with changing requirements.


I started learning Kubernetes and was overwhelmed. The biggest problem was the missing docs. I filed a GitHub issue asking for the missing Kubernetes YAML docs:

https://github.com/kubernetes/website/issues/19139

Google will ignore it like all of the tickets I file. The fact is that Google is in the business of making money and they are focused on enterprise users. Enterprise users are not sensitive to integration difficulty since they can just throw people at any problems. So eventually everything Google makes will become extremely time-consuming to learn and difficult to use. They're becoming another Oracle.


"kubectl explain deployment"


Confirmed here too. I don't think management realized the amount of toil that Kube requires to stay up to date.


The big problem with running your own cluster is the extra machines you need for a high-availability control plane, which is expensive. That is why Amazon and now Google feel like they can charge for this; you can't really do it any better yourself.
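For a sense of scale: a minimal HA control plane with kubeadm is three control-plane machines behind a load balancer, roughly like this (flags from memory and the endpoint/token values are placeholders, so verify against the kubeadm docs):

    # on the first control-plane node
    kubeadm init --control-plane-endpoint "lb.example.com:6443" --upload-certs

    # on the other control-plane nodes, using the join command printed above
    kubeadm join lb.example.com:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash> \
        --control-plane --certificate-key <key>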



