
Kubernetes by Example - rbanffy
http://kubernetesbyexample.com/
======
eeZi
For anyone interested in Kubernetes: Red Hat's OpenShift is worth taking a
look at.

It's upstream Kubernetes + a PaaS framework built on top of it.

It takes care of role-based access control, has a secured Docker registry
(prevents applications from pulling each other's source code), offers Jenkins
integration, and can automatically build, push, and deploy your applications.

Our team started using it and it's great. The documentation is top-notch
(probably the best docs I've ever seen in an open source project).

I've seen many teams re-invent the wheel over and over again, when OpenShift
already does most of what they need.

Happy to answer questions!

[https://www.openshift.org/](https://www.openshift.org/) (`oc cluster up` and
a running Docker is all it takes for a first test)

Docs:
[https://docs.openshift.org/latest/welcome/index.html](https://docs.openshift.org/latest/welcome/index.html)

Blog: [https://blog.openshift.com/](https://blog.openshift.com/)

~~~
orf
I just tried to get started with minishift and it doesn't seem to work.

`minishift` seems to be similar to `minikube`. On my mac, running `minikube
start` successfully starts a minikube instance in Virtualbox.

Unfortunately `minishift start` seems to sit there and fail after 120 seconds
(with xhyve and vbox) because "the docker-machine didn't report an IP
address", and it seems that the docker-machine is not even created.

This is a shame; I'd very much like to try out OpenShift. If anyone else has
the same issue, please let me know!

Edit: Someone replied but deleted their comment. I should have run `oc cluster
up --create-machine`!

------
whistlerbrk
As someone who stepped out of the devops world for a minute and is now trying
to convert my company's infrastructure to use these tools, this is very useful
and I'm reading through the whole thing.

However, I'm still confused by how the tools in the ecosystem interact with
the capabilities of various cloud providers. That is, we're using DigitalOcean
and Docker, and I want to get our infra to a point where I can easily spin up
a brand-new staging environment (staging-2, say) with an isolated Postgres
node using an attached volume (which also, say, runs Redis), a proxy node, a
couple of nodes for application servers, and a couple of nodes for background
jobs, all w/ private networking, secure, w/ non-root access, and run a quick
task to seed the DBs.

I just can't seem to find guides that put the whole thing together, only
pieces, and I'm lost researching an overwhelming number of tools, from Ansible
to Terraform to Kubernetes to Helm, etc., etc., etc.

~~~
rocgf
I'm not a very well-versed DevOps guy, but I have used Ansible, Terraform,
and Kubernetes.

It should go a bit like this:

\- Use Terraform to provision VMs, networking resources and storage from
DigitalOcean. Basically, write the scripts that make your whole infrastructure
available with a single command.

\- Then use Ansible for anything you might want installed on those machines -
Kubernetes, security packages, SSH keys for your team.

\- Use Kubernetes to then deploy your application on top of your secure and
replicable infrastructure.

Each of the steps above should be roughly one shell command. If you are
disciplined enough to always provision machines, install packages and deploy
your app via config files, this should be very much achievable.
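
For example, the Ansible step can be a playbook as small as this (a rough
sketch; the inventory group, user, and file paths are made up):

```yaml
# site.yml - run with: ansible-playbook -i inventory site.yml
- hosts: k8s_nodes              # hypothetical inventory group
  become: true
  tasks:
    - name: Install security packages
      apt:
        name: "{{ item }}"
        state: present
      with_items:
        - fail2ban
        - unattended-upgrades
    - name: Add team SSH keys
      authorized_key:
        user: deploy            # hypothetical user
        key: "{{ item }}"
      with_file:
        - keys/alice.pub        # hypothetical key files
        - keys/bob.pub
```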

~~~
ransom1538
Hey just curious:

With a Dockerfile, is there any point to using Ansible/Puppet/Chef anymore? I
have used those tools to keep 'how machines are installed' in code in the
past, but with Dockerfiles these days I don't get the point anymore. Seems
like the future will be having a few Dockerfiles (staging, prod, dev, db,
etc.) and then just finding a place to run them (DigitalOcean, Docker Swarm,
AWS container service, or your MacBook).

~~~
bdcravens
You still have to bootstrap the instances/droplets/etc with Docker,
Kubernetes, etc before Dockerfiles can be used. Unless you're starting with
images where those are already baked in, Ansible/Chef/Puppet are good ways to
accomplish this on a clean OS.

~~~
zimbatm
It's debatable. Ansible/Chef/Puppet themselves need to be bootstrapped as
well.

Bash is perfectly fine if the machine can be installed with less than 100
lines of code and immutable infrastructure is being used (so no need for
idempotent operations). Just make sure to add ShellCheck to the CI pipeline.

~~~
icebraining
_Ansible/Chef/Puppet themselves need to be bootstrapped as well._

Nope, Ansible just needs SSH access.

~~~
zimbatm
Not when using AutoScaling Groups. Then the machine will be provisioned on
boot, and therefore Ansible would have to be installed on the machine.

~~~
icebraining
No, you'd use Ansible to prepare the new image once, then you'd use that image
to scale. Bootstrapping everything each time a VM is created doesn't make much
sense.

------
robotmay
I've been playing with kubernetes for the past month and I'm just now deciding
not to go with it for our new production systems, mostly because I just don't
understand it well enough to know how to fix it if it goes wrong.

There's a lot of cool things about kubernetes (e.g. I had an automated SSL
cert fetcher for LetsEncrypt that applied to any SSL ingress I added) but it
still does some weird things sometimes (like constantly trying to schedule
pods on instances without enough spare memory, and then killing other pods
because of that; fairly certain that's not supposed to happen).

I think I'll revisit it next year and hope that it's a bit easier to get into.
I'm especially hopeful about using it with Spinnaker and some sort of CI,
though I couldn't find anything lighter weight than Jenkins that was
straightforward to get set up on it.

~~~
jcastro
> constantly trying to schedule pods on instances without enough spare memory

I assume you're explicitly declaring the memory requirements for the
application?

EDIT: I ask because I usually see examples just throwing deployments at a
cluster; it turns out the more explicit you are with the app's requirements
(CPU/mem, etc.), the better a job the scheduler can do. I realize this sounds
like advanced common sense.

~~~
robotmay
Aah yeah that might have helped. I hadn't been setting my memory requirements
as I didn't technically know what they were :D

That's good to know for next time!

~~~
jcastro
There's a great example of why setting limits is a good thing in one of
Kelsey Hightower's talks; for some reason your initial comment reminded me of
this advice:
[https://youtu.be/HlAXp0-M6SY?t=21m51s](https://youtu.be/HlAXp0-M6SY?t=21m51s)

~~~
lobster_johnson
Requests/limits settings also decide what "quality of service" class a pod
will use.

Setting requests and limits to the same values puts it in "Guaranteed",
whereas setting requests and limits to different values gives you "Burstable".
The default QoS class is "BestEffort", which makes the pod expendable. (This
isn't the best-documented part of Kubernetes.) (Edit: You can see the QoS
class of a pod with "kubectl describe pod <name> | grep QoS".)

QoS classes are important for scheduling. When Kubernetes needs to evict pods,
it will pick burstable and best-effort pods first.
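
To make that concrete, here is a minimal sketch (name, image, and numbers
invented) of a pod spec that would land in the "Guaranteed" class:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical
spec:
  containers:
  - name: app
    image: example/app:1.0     # hypothetical image
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: 500m              # requests == limits for every container -> "Guaranteed"
        memory: 256Mi          # requests lower than limits -> "Burstable"
                               # no resources at all -> "BestEffort"
```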

Note that setting memory limits can sometimes not work well with apps that use
GC. I have had particularly disappointing experiences with Go apps. Go's GC is
a bit odd in that it reserves a huge chunk of virtual memory (which is why Go
apps, if you look at "vsz" only, often seem to take more RAM than expected),
and is not very aggressive about releasing it. I have a Go app that uses only
a few megs of actual memory, but because of the GC it will allocate half a gig
and get OOM-killed by the kernel before the GC is able to collect.

~~~
xkarga00
You have confused BestEffort with Burstable. Agreed about the docs.

~~~
lobster_johnson
Gah, of course I did. Thanks, edited.

------
cyphar
I would recommend updating this to describe ReplicaSets[1] over
ReplicationControllers. They are very similar and serve the same purpose, but
the huge difference is that ReplicaSets have set-based selector support --
meaning that you can require N replicas of pods that match a selector (rather
than requiring N replicas of pods with the exact same spec).
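
For instance, a minimal sketch (names invented) of a ReplicaSet with a
set-based selector:

```yaml
apiVersion: apps/v1              # apiVersion depends on your cluster version
kind: ReplicaSet
metadata:
  name: frontend                 # hypothetical
spec:
  replicas: 3
  selector:
    matchExpressions:            # set-based; RCs only allow exact key=value matching
    - key: app
      operator: In
      values: [frontend, frontend-canary]
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: web
        image: example/frontend:1.0   # hypothetical image
```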

[1]:
[https://kubernetes.io/docs/concepts/workloads/controllers/re...](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/)

~~~
pcthrowaway
I tend to agree, as ReplicaSets are, as the k8s docs describe, the 'next-
generation replication controller', though I couldn't find a place where they
state that RCs are in fact deprecated. The section on service discovery
constructs an example using RCs, and I'm guessing other sections do as well,
so the change may be non-trivial.

~~~
smarterclayton
RS is still beta though, so while they're the future, RCs aren't going
anywhere for a long time. In general use RS (or even better, deployments) and
don't worry about it.

------
matart
I host several smallish PHP/HTML sites for family, friends, and a few clients.
Is Kubernetes a viable solution? These sites get very little traffic.

What I am looking for is:

\- Ability to easily deploy containers

\- Ability to route by URL

\- Ability to swap out containers without affecting others

Does Kubernetes solve this problem for me? Is there a better option?

~~~
bharani_m
I am also interested in knowing this.

Also, can Kubernetes be used to deploy to a single VPS instance (e.g. a $10
DigitalOcean droplet), or is it only for a multi-node system like GKE?

~~~
cgag
You can use GKE with a single node if it's big enough (it won't let you do it
with the absolute smallest instance). I do it because I just like the k8s
model for deploying apps. If any of my side projects deployed there ever
start to matter, I'll scale it up.

------
rhizome
Might be helpful to have some text that explains the bare list of jargon on
the front page. Is the idea that readers should already know the lingo? I have
thoughts about what's going on here.

~~~
devrelm
Right. I don't even know what Kubernetes is — a link to their homepage[1]
might've been nice.

[1] [https://kubernetes.io/](https://kubernetes.io/)

~~~
collinmanderson
I feel like if it were all on one page, I would at least skim through the
examples, but I don't want to click through to each one.

------
AdrianRossouw
We've been exploring OpenShift and Minishift for a project for the last few
weeks, and we've come away very impressed.

We especially like the interface they built that ties everything together.

------
jalfresi
So, trying to get a clear handle on just what Kubernetes is - it's basically
supervisord but for containers, across a pool of servers? With the addition
of:

\- A complete application, e.g. WordPress + MySQL containers, can be
represented as pods

\- Pods can be "scheduled", e.g. auto-scaled, across "nodes" (i.e. servers),
with load balancing etc.

Is that right?

~~~
bkeroack
More generally, Kubernetes is an abstraction layer between your application
and hardware/cloud that allows you to declaratively define what your app is,
how it runs, where it runs and what dependencies it needs. All in a
standardized format that allows versioning and easy modification.

As opposed to the old-school method of having to describe all that in words to
an ops team (or not and just expecting them to figure it out): "This is a Java
app. It needs JDK 1.6 and at least 1GB of RAM. It expects to write logs at
/var/log/foo.log. It needs a MySQL database and Redis and Elasticsearch at
configured hostnames and ports. We need to run at least 6 instances
horizontally scaled behind a load balancer."
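
As a rough illustration (not a spec from anyone's real setup; the names,
image, and values are invented), part of that description might be declared
like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app                    # hypothetical
spec:
  replicas: 6                       # "at least 6 instances horizontally scaled"
  selector:
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
    spec:
      containers:
      - name: app
        image: example/java-app:1.0 # the image bakes in the JDK the app needs
        resources:
          requests:
            memory: 1Gi             # "at least 1GB of RAM"
        env:
        - name: MYSQL_HOST          # hypothetical variable names
          value: mysql
        - name: REDIS_HOST
          value: redis
        - name: ELASTICSEARCH_HOST
          value: elasticsearch
```

The load balancer would typically be a separate Service object in front of
those pods, and MySQL/Redis/Elasticsearch would be their own workloads or
external endpoints.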

~~~
rhizome
_" This is a Java app. It needs JDK 1.6 and at least 1GB of RAM. It expects to
write logs at /var/log/foo.log. It needs a MySQL database and Redis and
Elasticsearch at configured hostnames and ports..._

Wow, brittle. Can't Java use environment variables? And since when does
development tell ops what hostnames a machine should have, not to mention how
many instances to run?

Sorry, don't mean to jump all over you, but your words apparently jumped all
over me. :)

~~~
StevePerkins
> since when does development tell ops [deployment details]?

Not to be cheeky, but since "devops"!

Docker, Kubernetes, etc... all of these things live on the point of
intersection between developers and operations. Some of the config artifacts
(e.g. dockerfiles) typically live in source control, which is usually the
domain of developers. But the values with which they're populated are
typically set by operations.

It is indeed a dance that the two groups have to work out among themselves,
and different organizations will handle it differently. Ultimately, I think
you find that it really isn't FUNDAMENTALLY different from the dance that they
already do in the old school. There will always have to be a handshake, where
dynamic values are stored someplace and code knows to point to that place. How
you handle that handshake is ultimately a human process thing.

~~~
rhizome
I'm familiar, thanks. Keeping both ops and app in the same repo is one thing,
but combining parts of them in code is a pretty amateur Separation of Concerns
flaw and will inevitably break. I'm not sure what you're defending here.

~~~
dalailambda
The app can still use environment variables and be completely independent (12
factor), but the Dockerfile/Kubernetes config still needs to provide those
things, so there is a clear distinction between ops and dev; they're just
more mingled than previously.

~~~
rhizome
I think maybe you didn't read the top-level comment in this branch of the
thread?

------
beat
Thanks for this. As much interest as there is in kubernetes right now, it's
surprising how little good documentation there is.

~~~
hellbreaker
I just bought Kubernetes in Action (MEAP) last week. I would highly recommend
this book. It clearly spells out what Kubernetes is and how Docker containers
are just units within a distributed system. Compared to trying to figure out
how to properly do networking with Docker Compose, Kubernetes is clearly
thought out and much easier to use and reason about. The final version comes
out in August.

~~~
technofiend
I was going to ask for some stock market tips since KIA isn't supposed to be
published until August 2017, but I see that MEAP means there is an early
access version of it that gives you incremental updates. Thank you for the
pointer.

~~~
raesene9
I'd second the recommendation for KIA. I've been following along with the
MEAP and it's being kept well up to date; the latest update, which came out
yesterday, covers features from 1.6 like RBAC.

One of the problems with books about things like Kubernetes which move quickly
is that they can be well out of date before they hit final release.

~~~
lhuser123
Thanks for the recommendation. Will take a look at KIA.

------
amq
I've found Docker Swarm Mode to be refreshingly simple after playing with
Kubernetes. Am I crazy to have it in production?

~~~
raesene6
From what I've seen of both, Docker swarm mode is smaller in scope and simpler
than Kubernetes.

I think some setups will suit Swarm better, whilst others will benefit from
the richness of what k8s provides.

------
dominotw
> A replication controller (RC) is a supervisor for long-running pods.

> A deployment is a supervisor for pods and replica sets

So what's the difference between these supervisors?

~~~
teraflop
They basically serve the same purpose, but Deployments move more of the
"state" into the Kubernetes API controller.

For example, the standard way to do a rolling update using RCs is to create a
new RC that is responsible for the updated pods, and then gradually
increase/decrease the replica counts to reach the desired state. This is
conceptually simple, but the downside is that the Kubernetes API doesn't know
anything about the relationship between the old and new controllers. All the
responsibility is pushed to the client.

With Deployments, both the old and new configurations are first-class objects.
So you can view the history of previous configurations, and you can query the
progress of a rolling update. You also get better-defined behavior when
multiple clients are trying to concurrently make changes, because Kubernetes
can arbitrate between them at a higher level.
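
For example, a Deployment carries the rollout policy in the object itself (a
minimal sketch; names and values are invented), and you can then ask the API
for progress and history with `kubectl rollout status` / `kubectl rollout
history`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                    # hypothetical
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # at most one pod down during the update
      maxSurge: 1              # at most one extra pod created during the update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:2.0 # bumping this tag is what triggers the rolling update
```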

------
philip1209
For figuring out how to write pod and service specs - I really like looking at
the Helm Charts source code:

[https://github.com/kubernetes/charts/tree/master/stable](https://github.com/kubernetes/charts/tree/master/stable)

(Helm aims to be a package manager for Kubernetes, and its packages are
called Charts)

------
nikon
It'd be great to read something about how people are handling logging in
production with K8S+ELK for example.

~~~
mdaniel
I don't know how much it qualifies as "reading," but we've experienced great
success using a DaemonSet of [https://github.com/rtoma/logspout-redis-logstash#readme](https://github.com/rtoma/logspout-redis-logstash#readme)

Because Kubernetes is great about applying docker labels, we get the k8s
container name, Pod name, Pod namespace, UID, and then the normal docker
metadata provided by logspout-redis-logstash. Then use the normal, and
essential IMHO, multi-line codec on the logstash side of things:
[https://www.elastic.co/guide/en/logstash/5.4/plugins-codecs-...](https://www.elastic.co/guide/en/logstash/5.4/plugins-codecs-multiline.html)

We have a few ``if [docker][image] =~ "foo"`` statements to snowflake the
types of multiline split patterns, but all in all it just works.

The next level up the hierarchy of needs is to _also_ grab the systemd journal
content from the Node itself and send that along, too, but it has not yet
become a priority. Not to mention the likely substantial increase in store
size once the much, much chattier kubelet traffic arrives in ES.
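
For reference, the DaemonSet side of this is just a standard manifest along
these lines (a sketch; the image name is assumed, its Redis/route
configuration should come from the project's README, and the labels and
namespace are invented):

```yaml
apiVersion: apps/v1                           # apiVersion depends on your cluster version
kind: DaemonSet
metadata:
  name: logspout                              # hypothetical
  namespace: logging                          # hypothetical
spec:
  selector:
    matchLabels:
      app: logspout
  template:
    metadata:
      labels:
        app: logspout
    spec:
      containers:
      - name: logspout
        image: rtoma/logspout-redis-logstash  # assumed image name; check the README
        # Redis endpoint / route configuration per the README (omitted here)
        volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock     # logspout reads container logs via the Docker API
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
```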

------
bogomipz
This is great! Thanks, OpenShift team.

------
mjak
I've managed to get Kubernetes up and running on Azure, but I'm stuck on what
the next steps are beyond that.

Anyone have resources they'd like to share for me to pick up the basic
requirements to get a docker stack up?

Up to now I've used Docker Compose.

------
traderalex81
We just recently decided to go with OpenShift Origin after a long debate and
POCs. We're currently using Mesos/Marathon with a bunch of custom deployment
scripts, which are terrible (deployment issues, etc.).

------
skyisblue
Anyone have experience with AWS ECS and Kubernetes? How do they compare?

~~~
puzzle
I briefly evaluated ECS more than a year ago, and it lacked even more
features than Kubernetes did. It has improved quite a bit since then, but
Kubernetes has added even more functionality in the meantime. In a nutshell,
if you run more than a few services, Kubernetes is probably the better choice.

------
cookiecaper
First, please consider whether you _actually need_ any of the Kubernetes
stuff. The odds are that you don't. A huge number of people are switching to
k8s just because it's the cool thing to do, without understanding any of the
implications. Most companies have to undergo major software rearchitectures
and renovations to make good use of the featureset that Kubernetes promises.

Second, _please_ don't run a database in it, omg. See point one. If there were
just one application that is not reasonable to run in a container, it would be
a database.

Third, yes, this seems to be the new way things happen in the software world.
I'm worried about what tools we'll need to develop to overcome the cacophony
of noise and half-solutions that is endemic to the GitHub era.

~~~
eicnix
Why shouldn't you run databases on it? I've been running larger databases
(10TB+) on it for some time.

If your database can deal with the failure of a single node and uses network
storage, it's totally doable to run databases on Kubernetes.

Take a look at Patroni for how to run Postgres on IaaS or Kubernetes.

~~~
cookiecaper
Database servers are generally designed from the ground-up to utilize
basically the entire machine and to be left running for a long time, with
caching and memory utilization techniques designed around these assumptions.
VMs muck with this somewhat but at least the memory region is reserved for
that VM's usage. Docker eliminates that, and k8s eliminates the knowledge even
of what type of hardware/memory the system is being executed on (without
taking several contrived steps to restore this).

Database servers are also designed on the assumption that each node will be
available on a consistent basis, at a consistent address, and that the data
directory will be available to the server as soon as it starts up (except for
rare node bootstrap operations).

This is the _exact opposite_ of what Kubernetes/Docker seek to provide, and in
fact, such things can only be provided within Kubernetes by extensive special
configuration that leans heavily on experimental features.

There could not be a _worse_ ideological match. Yes, you can try really hard
to ram that square peg through that round hole, eventually pushing it through
with significant damage to both the peg and the hole, but _why would you_?

~~~
xkarga00
> This is the exact opposite of what Kubernetes seek to provide

Not true. You can have dedicated machines for your db instances; check out 1)
node affinity and 2) StatefulSets.

StatefulSets are designed to provide consistency (no "split brain"), stable
network identity ("at a consistent address"), and stable storage identity
("and that the data directory will be available to the server as soon as it
starts up"). Those features are beta, but it's a matter of time (hardening and
the like) before they are promoted to stable.
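
A rough sketch of what that can look like (the names, node label, and sizes
are invented):

```yaml
apiVersion: apps/v1            # apiVersion depends on your cluster version
kind: StatefulSet
metadata:
  name: postgres               # hypothetical
spec:
  serviceName: postgres        # stable network identity: pods get names like postgres-0
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      affinity:
        nodeAffinity:          # dedicate machines to the database
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role      # hypothetical node label
                operator: In
                values: [database]
      containers:
      - name: postgres
        image: postgres:9.6
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # stable storage identity: each replica keeps its own volume
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
```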

> why would you?

For all the same reasons you would run any other workload on the same
platform.

~~~
cookiecaper
The reason to run any other workload on Kubernetes is to dynamically schedule
your containers across a cluster of anonymous hardware resources, and to
provide automatic monitoring, recovery, and control of those containers when
certain events occur. The goal, essentially, is to abstract the
hardware/system-level element from the deployment element. That's all well and
good, but applications that make certain architectural assumptions do not lend
themselves well to random scheduling across an anonymous array of system
resources. Databases are absolutely among that class of applications, as are
many other application types (which are now called "stateful" applications).

For k8s to work well for an application, that application has to be anonymous,
without any masters or controllers. It has to be able to tolerate the sudden
vaporization of any member. It has to be willing and able to share host
hardware with any number of other services, including some which may be bad
neighbors/CPU hogs, and it has to be content to be rescheduled in the event
that a node goes down, that a pod is killed to ease the transition, etc.
Databases fail on virtually every point of this.

If the database and Kubernetes start from opposite design paradigms, why does
it make sense to run a database inside Kubernetes? I am still not getting it.
`kubectl delete pod/my-postgres-pod` is not a smart thing; you don't want any
of that scheduling magic that Kubernetes provides.

At most you would want Kubernetes to tell you that your database is failing
health checks and execute the STONITH process to fail over to a replica, but
you hardly need Kubernetes if all you care about is process monitoring.

So can you elaborate on the reasons? Databases are simply not designed for
this kind of infrastructure and I see no value in trying to pretend they are.
Isn't this the reason that CockroachDB exists, so that people can finally run
their DBs in something like Kubernetes without endless headaches?

I think it would be very interesting to analyze the stability and performance
features of a PgSQL k8s deployment, PgSQL VM deployment, and PgSQL bare metal
deployment. The only issue is that you can't expect the failures to be open
with their data.

