Kubernetes – The Hard Way (github.com/kelseyhightower)
157 points by yarapavan on Sept 12, 2016 | 56 comments



I'm surprised that Red Hat's OpenShift 3 / Origin doesn't get any publicity on HN. It uses Kubernetes (of which Red Hat has quietly become the largest committer) and also fills in several missing pieces, including a CD pipeline.

https://www.openshift.org/

It's amazing to see how many large enterprises have already selected OpenShift, whether in various cloud setups or in on-premise scenarios.

It seems that for many HN readers, only AWS and GCE exist.


Apparently the goal of this guide is to set up a Kubernetes cluster from scratch (on top of VMs provisioned on an orchestration-agnostic public cloud), not to use a managed Kubernetes cluster as provided by OpenShift.


Perhaps the parent was referring to the self-hosted OpenShift option, which acts as a Kubernetes distribution:

https://docs.openshift.org/latest/getting_started/administra...


And Azure.


Many "large enterprises" have also selected Windows as their server OS and Oracle for their database...

After decades of replacing RHEL where possible (and plenty of agony where it wasn't possible), I cannot help but chuckle at the idea of even considering the company that brought us 'yum' for a virtualization platform.


Why did you replace RHEL? You scoff at it without any explanation whatsoever. (BTW they invented RPM, not yum, and they don't position yum as a virtualization platform to my knowledge. Citation, please?)

For many of us, RHEL (or CentOS) is the only distribution we would consider running at scale because we feel they make better choices and have better policies and documentation than the competition.


> they don't position yum as a virtualization platform to my knowledge.

I think moe means "considering (the company that brought us 'yum') for a virtualization platform".


CentOS > Debian for running docker and kubernetes.

The Debian stable kernel has been suffering for many months from critical bugs that result in Docker crashing randomly.

Note: a kernel crash means an unresponsive, stalled system, only fixable with a hard reboot.


I cannot find much on mounting storage into containers. It is exactly that (storage) that has been recognized as a difficult problem when moving an infrastructure to containers.

Given the title "The Hard Way", I kind of expected it would dive deeper into this topic.


Of course - following the first law of technical documentation, the amount written about it is inversely proportional to the complexity of the problem.

Seriously though, it is getting a bit silly how almost every single Kubernetes demo involving persistent data just uses emptyDir or hostPath, and we are all supposed to pretend that those are production-ready solutions that won't lose data. AFAIK the PersistentVolumeClaim and StorageClass work will be the solution to this?
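
For anyone curious, here's a minimal sketch of what the claim side looks like (the names are made up; the storage-class annotation is the beta form I believe the current docs use, and newer releases should get a proper storageClassName field). The claim asks the cluster for storage instead of pointing at a node-local path:

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: app-data                                         # hypothetical claim name
      annotations:
        volume.beta.kubernetes.io/storage-class: "standard"  # beta annotation; later replaced by spec.storageClassName
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

The pod then references the claim instead of baking in an emptyDir/hostPath:

    volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data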


To me it seems that there are many approaches. A storage cluster seems to be a necessity for anything serious. But then the question is: which one? There are not many open source options (Gluster, Ceph, FreeNAS come to mind), and from the closed source camp ScaleIO looks really promising.

But none of these are part of your K8s tutorial :)


These storage systems are meant to be SAN devices, meaning either Amazon EBS volumes or Google persistent disks on their respective clouds.

I recall reading that EBS is being [partially] implemented and tested. (Read: The documentation is full of "known bug: ..." and "please don't try that in production").
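
For what it's worth, the basic plumbing does work: a pod can reference an existing EBS volume directly, provided the volume is in the same availability zone as the node and the cloud provider integration is configured. A rough sketch (the volume ID, names and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: ebs-example                       # hypothetical
    spec:
      containers:
        - name: app
          image: nginx                        # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
      volumes:
        - name: data
          awsElasticBlockStore:
            volumeID: vol-0123456789abcdef0   # pre-existing EBS volume in the node's AZ
            fsType: ext4

The catch is that an EBS volume can only be attached to one node at a time, so it doesn't help with scaling an app out across hosts.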


Probably depends on the scale/scope of your deployment. I know NetApp has a docker plugin for their storage platforms, I would imagine other vendors do as well:

https://github.com/NetApp/netappdvp


And gluster at least has horrible performance and stability issues. And with containers, kernel panics kill an entire host…


Do you have more details on performance and stability issues? The recent releases of gluster have been working very well.

Gluster is completely in userspace. How would stability issues, if any, in userspace cause kernel panics? Can you comment more about that?


> Do you have more details on performance and stability issues?

Tried using it for a while (last November until ~April) on Debian Jessie, with latest versions. Lots of bugs with SSL mode, lots of desynchronization issues, crippling performance issues, …

And no matter what I ran into, the bugs were already known and filed for months at that point, with zero developer reaction.

> Gluster is completely in userspace.

Unless you try using its NFS server to get not entirely disastrous performance.


What workloads were you using Gluster for? How did you reach out to the developers? Usually the mailing lists are very responsive and most bugs brought up on the lists are addressed.

> Unless you try using its NFS server to get not entirely disastrous performance.

Wrong again, Gluster's NFS server is in userspace too. Not sure how that can cause a kernel panic.


> What workloads were you using Gluster for?

Sharing config files. Wordpress sites. Django sites. The performance was too shitty for everything.

> How did you reach out to the developers? Usually the mailing lists are very responsive and most bugs brought up on the lists are addressed.

IRC. I ended up finding all of my errors already reported, and unresolved, in the mailing list archives with zero developer attention, and since I didn't have the time to help Red Hat make their product actually usable, I dropped Gluster.

> Not sure how that can cause a kernel panic.

I guess it's technically not a kernel panic if I/O just stalls so hard that you can neither mount nor unmount nor otherwise access any of your NFS targets… but you have to hard reset the whole machine anyway.


We used to run 3 Gluster nodes with a few hundred GB of shared storage, mostly used to keep logs and as a WAR file directory for JBoss/Tomcat. From what I can tell, Gluster itself seems OK, but someone also enabled CTDB, which seems to hang every half a year or so.


Lol, this is how I've been feeling about every virtualization and container/jail solution since the dawn of this century. The early jail-type solutions were somewhat excused, because the answer was either "local filesystem", "nfs" (later nfs and/or iSCSI), or, for the adventurous, some kind of cluster filesystem.

On that note, is anyone using encrypted NFSv4 in production? On paper it ticks most of the right boxes, but I'm not convinced it actually works...


It's mandated by our security team, but no-one has ever got it working...


I can understand the frustration. I've been watching this space to see a plausible story for having storage volumes available to containers from any host.

hostPath and emptyDir aren't really acceptable as solutions, since they add complexity to your cluster setup.

There are some cluster storage technologies out there, but there are no tutorials or overviews detailing their pros and cons and their performance limitations in a k8s environment.

Networking was the hot topic this time last year, but storage is the topic no one is willing to talk about. CoreOS is attempting to build a storage solution (Torus) designed for the container use case [1]. But that's a long way off.

If you are building cloud-native applications on-prem and you need state you're on your own for now.

[1] https://coreos.com/blog/torus-distributed-storage-by-coreos....


> If you are building cloud-native applications on-prem and you need state you're on your own for now.

Same conclusion I've come to.

I guess K8s is just one part of the solution; it needs to be paired with a storage solution to really be able to replace existing infra. Looking at open source options, the landscape is pretty empty, with Gluster, Ceph and FreeNAS, of which only Gluster and Ceph provide some level of HA.


Check out the "persistent volumes" section of the manual:

http://kubernetes.io/docs/user-guide/persistent-volumes/#par...

as well as the "volume management" section:

http://kubernetes.io/docs/user-guide/volumes/#types-of-volum...

Those two should give a good overview of the available options. Basically, you can use local paths, NFS shares, vendor-specific storage (e.g. AWS EBS) and various other things.
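
For example, an NFS share can be mounted straight into a pod with the built-in nfs volume type. A minimal sketch (server, paths and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: nfs-example                    # hypothetical
    spec:
      containers:
        - name: web
          image: nginx                     # placeholder image
          volumeMounts:
            - name: shared
              mountPath: /usr/share/nginx/html
      volumes:
        - name: shared
          nfs:
            server: nfs.example.internal   # placeholder NFS server
            path: /exports/web
            readOnly: false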


> Those two should give a good overview of the available options. Basically, you can use local paths, NFS shares, vendor-specific storage (e.g. AWS EBS) and various other things.

Yeah, but which of them actually work?


> Yeah, but which of them actually work?

And on top of that, which of them work when scaling the app out. In other words: how does each option deal with having multiple containers mount it.

edit:

From this link -- http://kubernetes.io/docs/user-guide/volumes/#types-of-volum... -- I found that the following volumes allow mounting by multiple writers: nfs, glusterfs and cephfs.
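
For reference, multi-writer access is expressed via the ReadWriteMany access mode when the storage is registered as a PersistentVolume. A sketch using a placeholder NFS server (the glusterfs/cephfs variants look similar):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: shared-pv                      # hypothetical
    spec:
      capacity:
        storage: 50Gi
      accessModes:
        - ReadWriteMany                    # may be mounted read-write by many nodes/pods
      nfs:
        server: nfs.example.internal       # placeholder NFS server
        path: /exports/shared

A claim requesting ReadWriteMany can then bind to it and be mounted by several pods at once.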

But which one of these works well under (production) load?


There is no single "production load"; everyone's load is different, so you'll have to try it and measure. This is always the case with any storage system.


It depends on the environment: if you're on AWS, you can use EBS; if you have an NFS share, you can use that. In the simplest case, you can just use a `hostPath` volume to mount a local directory from the pod's host.
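
Something like this, with the caveat that the data then lives only on whichever node the pod lands on (names, paths and image are made up):

    apiVersion: v1
    kind: Pod
    metadata:
      name: hostpath-example               # hypothetical
    spec:
      containers:
        - name: app
          image: nginx                     # placeholder image
          volumeMounts:
            - name: local-data
              mountPath: /data
      volumes:
        - name: local-data
          hostPath:
            path: /srv/app-data            # directory on the node's filesystem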


If you're on GCE or AWS, GCE PDs or AWS EBS work out of the box (i.e. using kube-up, kops or kube-aws on AWS; kube-up or GKE on GCE).

Everything should work, but it's harder to validate / work around every NFS server's possible set of configuration options (for example).


PetSets (http://kubernetes.io/docs/user-guide/petset/#what-is-a-pet-s...) are addressing the problem of storage persistently linked to the container.
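
Roughly, a PetSet pairs a pod template with a volumeClaimTemplate, so each replica gets its own stable, persistent claim. A sketch of what the alpha API looks like (the apiVersion, image and sizes are from memory / placeholders, so double-check against the linked docs):

    apiVersion: apps/v1alpha1              # PetSet is alpha; verify against the current docs
    kind: PetSet
    metadata:
      name: db                             # hypothetical
    spec:
      serviceName: db                      # headless service giving each pet a stable identity
      replicas: 3
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: db
              image: mysql:5.6             # placeholder image
              volumeMounts:
                - name: data
                  mountPath: /var/lib/mysql
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: [ "ReadWriteOnce" ]
            resources:
              requests:
                storage: 10Gi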


One way to handle persistent container storage is to use a shared filesystem such as glusterfs or objectivefs. It's easy to set up and can be very helpful when moving your infrastructure to containers. Once created you can mount your shared filesystem either on the host and share it with the container or directly in the container, see e.g. https://objectivefs.com/howto/how-to-use-objectivefs-with-do...
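
On the Kubernetes side, Gluster has a built-in volume plugin, so pods can mount it directly; a sketch of that is below (the Endpoints object, volume name and image are placeholders). For ObjectiveFS, which as far as I know has no built-in plugin, you would mount it on the host and expose it to pods with a hostPath volume instead.

    apiVersion: v1
    kind: Pod
    metadata:
      name: gluster-example                # hypothetical
    spec:
      containers:
        - name: app
          image: nginx                     # placeholder image
          volumeMounts:
            - name: shared
              mountPath: /mnt/shared
      volumes:
        - name: shared
          glusterfs:
            endpoints: glusterfs-cluster   # Endpoints object listing the Gluster servers
            path: shared_volume            # name of the Gluster volume
            readOnly: false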


If I understand correctly, you are suggesting a block storage solution. But that would then prohibit scaling out, right? (I'm reading on the ObjectiveFS site that it does allow concurrent access to shares... Hmm.)

The other solution is object storage, which is the real way forward (IMHO). This route makes an app more adherent to the 12-factor principles.


Both GlusterFS and ObjectiveFS are shared filesystems instead of block storage, so they work with POSIX files/directories (like NFS) making them easier to scale out. If your application can work directly with an object store that is of course also a great way to go.



Previous discussion: https://news.ycombinator.com/item?id=12323187 (23 days ago, 79 comments)


Thanks!

Please see author's tweet here - https://twitter.com/kelseyhightower/status/77498377095689830...

The GitHub repo has been updated with a newer DNS add-on, better examples, and AWS support.


Please note, this is not a supported AWS configuration. This is just a way to run Kubernetes on AWS instances, but it is really just treating them as bare-metal.

My advice: You should be using kops or kube-aws for production. (I work on kops, so I am biased to believe it is important)


Thank you! Learned about kops today.


Is there a "Kubernetes - The non-Google Way"? It seems like every tutorial is for GCE.


I updated the tutorial to include AWS a few days ago. Based on your feedback I've added a note regarding AWS support toward the top of the project README and in the project description field.

In some cases you'll find side-by-side labs that focus solely on AWS like this one for bootstrapping the underlying compute: https://github.com/kelseyhightower/kubernetes-the-hard-way/b...

In other parts of the tutorial you'll find sections for both GCE and AWS that highlight the different commands required for each platform - only a small fraction of the tutorial requires something different between GCE and AWS. Both cloud providers are now treated equally in the updated tutorial.


Thanks, Kelsey!


The likely reason for this being posted to HN again today is that Kelsey added AWS instructions alongside the GCE ones.


Oh, I only scanned the top of the README and it looked the same as the last time it was posted. Thanks for the pointer!


There is a step by step guide from CoreOS: https://coreos.com/kubernetes/docs/latest/getting-started.ht...

Never tried it though, so I can't speak of the quality (and of course it talks about running it on CoreOS).


Thanks! I'll give it a shot!


I came across an AWS version while googling the other day. I figured I'd share in case that is more useful to you.

https://github.com/ivx/kubernetes-the-hard-way-aws


The very first Lab link gives a 404 error. Heads up.


That's why it's "the hard way"


Open the README.md file directly. Links will work then.

https://github.com/kelseyhightower/kubernetes-the-hard-way/b...

Apparently, there's something odd in the GitHub support for relative links in the main project README.md.


Thanks for mentioning this. It looks like there is something wrong on GitHub's end. The current, temporary, workaround is to visit the README directly and the relative links should work.


I was able to contact GitHub support and reported the issue. Looks like relative links from the main page are working again. Thanks for the heads up.


Please note that this guide is _only_ for learning purposes - it makes a lot of decisions that do not follow the recommended production approach. If you're installing for production purposes, you should use one of the existing tools if you can (I work on kops and so naturally recommend it for AWS). If none of the tools meet your needs, you should probably first open issues on those tools; if you still must write your own installation method, you should combine this guide with the official documentation to understand a cluster as created by one of the recommended tools, before building your own.


Happy to improve the tutorial to incorporate the production approach that should be taken for each task. The main goal is to allow people to run through a cluster bootstrap step by step so they learn how things work.

If you don't mind filing a few issues on the project I'll be happy to rework each lab to start addressing these concerns. Until then I've added a note to the README regarding production readiness, but that should not prevent people from learning.


You should participate in sig-cluster-lifecycle. The conclusion of that sig (or my interpretation of it) was that having a production-suitable cluster is - today - essentially orthogonal to having a discoverable build-your-own cluster. So we are all best served by having this document for those who want to learn how Kubernetes works, but a separate approach for installing a production cluster that prioritizes correctness over obviousness. I think what you are doing is great, but I worry that using words like "support" implies that you will be offering some support for people who try to run this in production.

However, the long term goal of sig-cluster-lifecycle is to bridge the gap here, to make a production configuration easy and obvious. So I'd love to see you start using the new kubeadm work, for example, so that we can meet in the middle!


Thanks for the feedback. My goal is to augment those tools with step-by-step documentation so people get an opportunity to learn how things work. Yes, I will attempt to "support" people as they attempt to learn things the hard way.

I would also like to note that I'm not in competition with those automation tools. I'm just really focused on helping people learn. Some parts of the Kubernetes community really want to learn this stuff without an automation tool so they can skill up to build their own, custom, tools that match their preferences and tradeoffs.


That's wonderful - I think what you're doing is super important - everyone wants configuration to be easy enough that the automation tools are a comparable effort to doing things yourself! Helping people that go the kubernetes-from-scratch route has proven really difficult in the past, hence the focus on making things easier before encouraging manual installation, but I'll start sending people your way :-)



