
Nice, exactly what I needed today, after firefighting a latency spike on a compute node that appeared after a separate ingress node was restarted.

I would make the taxonomy a bit more precise: failures in running software on the platform and failures in running the platform itself (which of course affect the software running on top of it: high latency, dropped network packets, DNS not resolved, ...)

In general I find that when the platform works as expected, it is not that easy to make the software running on it fail. That is, it is harder than it would be without what Kubernetes provides (granted: you can have the same without Kubernetes, but many of us didn't bother to have capable admins set things up the right way).

What I find extremely fragile is the platform itself. It works, but you have to start and stop things in the right order, and navigate a sea of self-signed TLS certificates that can expire, iptables rules, services, and logs.

All have failure modes you need to learn: it takes a dedicated team. And once you have that, you'll need another team and cluster to perform an upgrade.

But hey, when it works, it is really cool to deploy software on it.




I feel like there needs to be a sanity check. Realistically, what does Kubernetes do to "not make software fail"? Health checks? Autorestarting containers that crash? Enforcing various timeouts? Relatively quick distributed configuration deployments? These can be hard problems for sure, but do you need all of Kubernetes to get these benefits in a production application?
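
To make "health checks" concrete: it's usually just an endpoint the orchestrator (or any external monitor) polls, restarting the container when it stops answering. A minimal sketch of the application side in Go (the path and port are arbitrary placeholders):

    // healthz.go: the application side of a liveness/readiness check.
    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // Whatever orchestrator or monitor you use polls this endpoint
        // and restarts the container when it stops returning 200.
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
            w.Write([]byte("ok"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }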


Essentially why we chose to go with Swarm - fewer features, but all the features we needed are there (restarting containers, scaling them up or down and across several machines).


I made this choice as well and I regret it. I've had my fair share of bugs with DNS and service discovery.

Those bugs seem to be fixed as of version 19.x, but I'm still bitter about all the problems I had because of simple features failing on me.


The same can be said of Hashistack (Consul, Nomad, Vault). Orders of magnitude simpler with the obvious tradeoff of being less featureful.


Consul and Vault offer functionality that Kubernetes is completely lacking, though.


Is it worth going with those? I'm torn between deploying stuff on Kubernetes (on a managed control plane, of course, which you can get for free nowadays with providers as long as you use their own machines) and rolling my own with the Hashistack.


Depends on your use-case, really. IMO, Hashistack is fine if you have well-defined requirements. You can pick and choose the components you need (including non-Hashi stuff) and integrate them with relative ease, keeping resource footprint and operational complexity to a minimum. I’ve found that it’s quite easy to pinpoint problems when they arise.

Kubernetes is more of an everything-and-the-kitchen-sink approach that you can’t really outgrow, but because of its monolithic design, debugging can be quite challenging. It does, however, come with a much larger and much more active community that you can go to for help.


But Swarm only has overlay networking, IIRC?

Swarm is great to get simple orchestration going, but it doesn't really do the same thing.

With k8s you can configure multi-point routing, storage backends and load balancing among other things.

With Swarm, you get... overlay networking (which is shit if you actually need to really scale) and 'scaling', which starts n containers of the same service.

Swarm feels more like an MVP which actually works, even for small teams. K8s is more like a behemoth with a silly amount of features most people don't need, which only dedicated personnel can fully tame.

We've used both at my current job for toy projects (self hosted). Never in production however.

And I'm a personal user of GCP - which works wonderfully... albeit more expensive than simple root servers.


Possibly edgy opinion:

Load balancing should be a solved problem already. Swarm and Kubernetes should be using dead simple off-the-shelf software for ingress and load balancing. Any competitor should be able to use the same solutions. To put it another way, this shouldn't be a differentiator.

The problem is that the functionality in tools like nginx is still tied to static network architectures that evolve slowly, and doesn't take advantage of things like diurnal variability in workloads.


Kubernetes does use dead simple off-the-shelf software for ingress and load balancing. That software though, unfortunately, has a lot of knobs, and what "Ingress" and "Service" resources do is make sure those knobs are turned to the right settings.

The nginx ingress controller for example, under the hood, just generates and applies a big ol' nginx config! You can extract it and view it yourself, or extend it with a Lua script if you want to be fancy and do some FFI inline with processing the request, etc.


> The nginx ingress controller for example, under the hood, just generates and applies a big ol' nginx config!

I learned the hard way that GKE, since it uses GCP's load balancers, doesn't support the same syntax for Ingress patterns as when you use an nginx Ingress. Definitely read the documentation thoroughly!


> That software though, unfortunately, has a lot of knobs

Lots of people have different definitions of 'easy' which is why I didn't say 'easy'. But how did you get this far off the rails with 'dead simple'?


Same with Swarm: it's just using IPVS at L4. It does not provide anything at L7; it's up to you to provide such a service on the cluster.


That's the opposite of dead simple and k8s tries hard to make things more complex than needed.


Is it hard to integrate a (HA) hardware load-balancer in front instead?


The easy way to do this is with NodePorts, wherein you configure your LB with all the nodes in your cluster being app servers on a certain port for a certain app. However, you will lose some performance, as there's some iptables magic under the hood.

Beyond that, there's a sea of options for more native integrations, which will depend on whether your LB vendor has a K8s integration, how friendly your networking team is, and how much code you're willing to write.


For bare metal you can use MetalLB, or kube-router if you have BGP infra. No need for a hardware HA LB.


Swarm comes with overlay networking; you can install network plugins for whatever else you need.


You can't switch networking for Swarm Mode


Yes you can. The plugin just needs to be swarm capable.

https://docs.docker.com/engine/extend/plugins_network/


Docker Swarm doesn't support basic features such as shared volumes (between separate hosts), though.


Sure it does; you just need a proper volume driver, like REX-Ray with AWS EFS.


The following link states that rexray does not support shared volumes, and that it was designed so that "it forcefully restricts a single volume to only be available to one host at a time."

https://github.com/rexray/rexray/issues/343


> navigate a sea of self-signed TLS certificates that can expire

When certificates become a logistics category of their own, you really do need some monitoring software to warn you when certificates are winding down.

Years ago I worked on a complex code signing system and one set of users were adamant we have alerts starting at 90 days out. It took some convincing to get me to agree it was a priority, but some of my coworkers were never convinced.
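
For the monitoring piece, checking expiry is simple enough to script yourself if you don't want a full monitoring product. A minimal sketch in Go (the cert path and the 90-day threshold are placeholders; adjust for your setup):

    // certcheck.go: warn when a PEM certificate is close to expiry.
    package main

    import (
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "log"
        "os"
        "time"
    )

    func main() {
        // Path and threshold are placeholders.
        data, err := os.ReadFile("/etc/kubernetes/pki/apiserver.crt")
        if err != nil {
            log.Fatal(err)
        }
        block, _ := pem.Decode(data)
        if block == nil {
            log.Fatal("no PEM data found")
        }
        cert, err := x509.ParseCertificate(block.Bytes)
        if err != nil {
            log.Fatal(err)
        }
        left := time.Until(cert.NotAfter)
        fmt.Printf("%s expires in %d days\n", cert.Subject.CommonName, int(left.Hours()/24))
        if left < 90*24*time.Hour {
            fmt.Println("WARNING: certificate expires within 90 days")
        }
    }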


Some people, when faced with a problem, think: I know, I'll use distributed systems!

Now they have a new career.


> a sea of self-signed TLS certificates that can expire

I would like to know more about what's going on here. Is this just a sloppy description and in fact Kubernetes uses a private PKI, so that the certs you're using aren't in fact self-signed but signed by a private CA?


There's (IMO) a mix.

Kubernetes and cloud native software make a lot of use of TLS for mutual auth.

A standard kubeadm cluster (very vanilla k8s) has 3 distinct certificate authorities, all with the CA root keys online.

On top of that, things like Helm, Fluentd and Istio will make use of their own distinct TLS certs.

One of the most "fun" pieces is that k8s does not support certificate revocation, so if you use client certs for AuthN, a user leaving, changing jobs, or losing their laptop can lead to a full re-roll of that certificate authority :)


If you issue user certs, it is best to do it from software and have them live for 8h max.

Also recommended: keep api behind a jumpbox.
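
On the first point: issuing the cert from software is only a handful of lines against your CA key. A rough sketch in Go (the names, the in-memory demo CA, and the 8h lifetime are illustrative only; in practice you would load your cluster's client CA cert and key):

    // issue-user-cert.go: sketch of minting a short-lived (8h) client cert
    // for user auth, signed by an existing CA.
    package main

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/x509"
        "crypto/x509/pkix"
        "encoding/pem"
        "log"
        "math/big"
        "os"
        "time"
    )

    // issueClientCert signs a client cert for user/group with the given CA.
    func issueClientCert(caCert *x509.Certificate, caKey *ecdsa.PrivateKey, user, group string) ([]byte, *ecdsa.PrivateKey, error) {
        key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            return nil, nil, err
        }
        tmpl := &x509.Certificate{
            SerialNumber: big.NewInt(time.Now().UnixNano()),
            // Kubernetes reads the username from CN and groups from O.
            Subject:     pkix.Name{CommonName: user, Organization: []string{group}},
            NotBefore:   time.Now().Add(-5 * time.Minute), // small clock-skew allowance
            NotAfter:    time.Now().Add(8 * time.Hour),    // the short lifetime discussed above
            KeyUsage:    x509.KeyUsageDigitalSignature,
            ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
        }
        der, err := x509.CreateCertificate(rand.Reader, tmpl, caCert, &key.PublicKey, caKey)
        if err != nil {
            return nil, nil, err
        }
        return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), key, nil
    }

    func main() {
        // Throwaway in-memory CA so the sketch runs end to end; in practice
        // you would load your cluster's client CA cert and key instead.
        caKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        caTmpl := &x509.Certificate{
            SerialNumber:          big.NewInt(1),
            Subject:               pkix.Name{CommonName: "demo-ca"},
            NotBefore:             time.Now(),
            NotAfter:              time.Now().Add(24 * time.Hour),
            IsCA:                  true,
            KeyUsage:              x509.KeyUsageCertSign,
            BasicConstraintsValid: true,
        }
        caDER, _ := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
        caCert, _ := x509.ParseCertificate(caDER)

        certPEM, _, err := issueClientCert(caCert, caKey, "jane", "dev-team")
        if err != nil {
            log.Fatal(err)
        }
        os.Stdout.Write(certPEM)
    }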


I've seen short-lived certs as a suggested workaround for the lack of revocation, and, as a user, that might be the best option.

That said, the distributions I've seen that make use of client certs don't do that (typical lifetime for a client cert is 1 year), so I'm guessing a load of people using k8s will have these certs floating about...


It’s a mental shift similar to using IAM AssumeRole sessions rather than static keys with perms right on them.

Powerful creds should be limited in lifespan and machine-issued to users in a transaction that involves MFA.

We built our own, but I think Gravitational Teleport has these patterns in a product.

The other issue, even if you get certs right, is service tokens (an admin could have grabbed a copy of one, and they can be used from outside the cluster), so you want to keep that API wrapped, IMHO.


As an aside, if you want revocable, non-expiring user creds, you would be better served by bearer tokens.


True, the only real built-in option in k8s land for that is service accounts, which aren't designed for user auth but can be used for that purpose.


With certs that short-lived, you're just reinventing Kerberos, badly.


Using a similar pattern (tickets I guess) doesn’t mean you’re reinventing something.

You might equally say we’re reinventing “authenticated sessions”. It has more in common with JWT cookies, tbh.

We don’t want to run krb infrastructure so we don’t do that.

The runtime dependency on LDAP or NIS, plus keeping krb itself HA, fed, and happy, plus the OS-dependent PAM setup, makes krb fairly undesirable in a production cloud environment if SSH certs and kube certs would suffice.


> so that the certs you're using aren't in fact self-signed but signed by a private CA?

I find that most people confuse or combine "self-signed" with "signed by a private CA". For a lot of uses, the configuration pains are the same for the user: "I have to load this cert into the CA root trust store". They don't realize how much better a private CA really is.

And of course, PKI would be so much more useful with "name constraints", so you don't have to trust a private CA for all domains, just the one domain you care about.
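
Name constraints are just an extension on the CA cert, so if you mint your own CA you can add them yourself. A hedged sketch in Go (the domain and lifetime are made up, and how strictly clients enforce the constraint varies with the age of the client):

    // name-constrained-ca.go: a private CA that is only valid for one DNS
    // subtree, via the Name Constraints extension.
    package main

    import (
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/x509"
        "crypto/x509/pkix"
        "encoding/pem"
        "log"
        "math/big"
        "os"
        "time"
    )

    func main() {
        key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        if err != nil {
            log.Fatal(err)
        }
        tmpl := &x509.Certificate{
            SerialNumber:          big.NewInt(1),
            Subject:               pkix.Name{CommonName: "internal-ca"}, // illustrative name
            NotBefore:             time.Now(),
            NotAfter:              time.Now().Add(5 * 365 * 24 * time.Hour),
            IsCA:                  true,
            KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
            BasicConstraintsValid: true,
            // The interesting bit: trusting this CA only means trusting it
            // for names under .corp.example, not for the whole internet.
            PermittedDNSDomainsCritical: true,
            PermittedDNSDomains:         []string{".corp.example"},
        }
        der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
        if err != nil {
            log.Fatal(err)
        }
        pem.Encode(os.Stdout, &pem.Block{Type: "CERTIFICATE", Bytes: der})
    }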


k8s tends to use a private PKI. I do wonder if we're comparing secure versus insecure solutions, though. Nobody ever has TLS issues with Swarm, Puppet, or AWS, or are they just trusting the network?


If anything, this enhances your actual point, but... if you are using self-signed certs with Kubernetes, then your cluster isn't secure... so you shouldn't do that.


Arguably, self-signed certs eliminate possible state-funded man-in-the-middle attacks, making them more secure.


As a technology, properly-implemented self-signed certs are totally fine. The problem is that k8s does not have the features necessary to use self-signed certs securely. Instead, it expects you to create your own CA (or CAs: you can use separate ones for different kinds of communication if you want extra bulkheads) and then to share out your private CA's cert to all the k8s components. This achieves your goal of cutting out MITM attacks via unscrupulous commercial CAs while also making it possible to trust families of certs for a given purpose, rather than having to whitelist every single consumer's private key.


This is great. If a state-funded threat is in your network, in a position to place a certificate on a server, do you think your self-signed certificate will protect you?


Think of it as a cost and effort threshold. It prevents dragnet / fishing methods of eavesdropping. It's trivial to force $Company to let you in with a letter. The effort to break encryption is not trivial. You have to be doing something wrong to get specific attention.


I mean, if you use cert pinning with a public CA you get the same result, i.e. you can easily spot a MITM. I am generally not a big fan of private CAs.
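
For anyone unfamiliar: pinning just means checking the server's key against a hash you ship with the client, on top of normal CA validation. A rough sketch in Go (the pin value and URL are placeholders):

    // pinned-client.go: SPKI pinning on top of normal CA validation.
    package main

    import (
        "crypto/sha256"
        "crypto/tls"
        "crypto/x509"
        "encoding/base64"
        "fmt"
        "log"
        "net/http"
    )

    // Base64 SHA-256 of the server's SubjectPublicKeyInfo; placeholder value.
    const pinnedSPKI = "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="

    func main() {
        tlsCfg := &tls.Config{
            // Called after the normal chain verification against public CAs,
            // so this adds pinning on top of, not instead of, CA validation.
            VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
                leaf, err := x509.ParseCertificate(rawCerts[0])
                if err != nil {
                    return err
                }
                sum := sha256.Sum256(leaf.RawSubjectPublicKeyInfo)
                if base64.StdEncoding.EncodeToString(sum[:]) != pinnedSPKI {
                    return fmt.Errorf("server key does not match pinned key")
                }
                return nil
            },
        }
        client := &http.Client{Transport: &http.Transport{TLSClientConfig: tlsCfg}}
        resp, err := client.Get("https://example.com") // placeholder URL
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        fmt.Println("status:", resp.Status)
    }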


Self-signed certs are absolutely secure, and more secure than non-self-signed ones, assuming you are using proper key management.

Now, whether or not that works for your users...


I think he's implying a whitelisted set of self-signed certs, not "trust any cert".


I think more likely he's implying using a private certificate authority.


Unfortunately, Kubernetes does not support that mode of operation.


You can create your own CA cert and put it in the root store on each instance. This can be incorporated into the provisioning and deployment pipeline for creating the nodes.
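
And at the application level the same idea is only a few lines: load the CA cert into a pool and hand it to your TLS config. A sketch in Go (the path and URL are placeholders):

    // trust-private-ca.go: trust a private CA at the application level
    // (the OS root store route is the same idea, done by provisioning).
    package main

    import (
        "crypto/tls"
        "crypto/x509"
        "log"
        "net/http"
        "os"
    )

    func main() {
        caPEM, err := os.ReadFile("/etc/pki/internal-ca.pem") // placeholder path
        if err != nil {
            log.Fatal(err)
        }
        pool := x509.NewCertPool()
        if !pool.AppendCertsFromPEM(caPEM) {
            log.Fatal("could not parse CA certificate")
        }
        client := &http.Client{
            Transport: &http.Transport{
                // Only server certs chaining to the private CA are accepted.
                TLSClientConfig: &tls.Config{RootCAs: pool},
            },
        }
        resp, err := client.Get("https://internal.example") // placeholder URL
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Println("status:", resp.Status)
    }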


Definitely! That is how you are supposed to do it! That is a private CA, not self-signed certs, though.



