
Kubernetes clusters for the hobbyist - pstadler
https://github.com/hobby-kube/guide
======
raesene6
This guide makes an interesting choice with regards to etcd security, which
I'm not sure I'd go with.

etcd stores a load of sensitive cluster information, so unauthorised access to
it is a bad thing.

There's an assumption in the guide that you have a "secure network" and
therefore don't have to worry about etcd authentication/encryption. The thing
is, if you have (say) a compromised container, and that container, which has
an in-cluster IP address, can see your etcd server, then it can easily dump
the etcd database and get access to the information held in it...

Personally I'd recommend setting up a small CA for etcd and using its
authentication features; there's a good guide to this on the CoreOS site:
[https://coreos.com/etcd/docs/latest/op-guide/security.html](https://coreos.com/etcd/docs/latest/op-guide/security.html)
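
To illustrate, here's a minimal sketch of what that looks like on the etcd
and kube-apiserver side. The certificate paths and IP address are hypothetical
placeholders; the flags themselves are real etcd/kube-apiserver options.

    # etcd: serve clients over TLS and require client certificates
    # signed by your private CA (paths are placeholders).
    etcd \
      --cert-file=/etc/etcd/pki/server.pem \
      --key-file=/etc/etcd/pki/server-key.pem \
      --client-cert-auth \
      --trusted-ca-file=/etc/etcd/pki/ca.pem \
      --listen-client-urls https://10.0.1.1:2379 \
      --advertise-client-urls https://10.0.1.1:2379

    # kube-apiserver: authenticate to etcd with a CA-signed client cert.
    kube-apiserver \
      --etcd-servers=https://10.0.1.1:2379 \
      --etcd-cafile=/etc/etcd/pki/ca.pem \
      --etcd-certfile=/etc/etcd/pki/apiserver-etcd-client.pem \
      --etcd-keyfile=/etc/etcd/pki/apiserver-etcd-client-key.pem

With `--client-cert-auth` set, a compromised container that can reach port
2379 can no longer dump the database without also holding a client
certificate signed by your CA.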

~~~
pstadler
This is great input and definitely something worth considering.

Related issue on GitHub: [https://github.com/hobby-kube/guide/issues/6](https://github.com/hobby-kube/guide/issues/6)

~~~
raesene6
No worries. I've got some more info on that on my blog,
[https://raesene.github.io/blog/2017/05/01/Kubernetes-Security-etcd/](https://raesene.github.io/blog/2017/05/01/Kubernetes-Security-etcd/),
which may be of use.

------
_rp6i
The second question:

> Choosing a cloud provider

This really annoys me about Kubernetes. Essentially _all_ the official
documentation is about how to select a cloud and let a cloud-specific tool
magically do everything for you. There's no procedure for setting up a single
host for development purposes, or for running a Dokku-like personal PaaS.

This guide is super useful because it avoids all the magic and lets you set
things up properly (despite assuming you're doing it on a cloud) and
potentially even do it on a single host.

~~~
pstadler
Thanks for that. This is one of the few comments in this thread that truly
captures the idea behind this guide.

I've grown quite a thick skin since I exposed the first project of mine to a
wider audience. But still, it's feedback like yours that keeps me going.

~~~
sagichmal
Let me also chime in and say I deeply appreciate the tone and level of
technical detail in this... repo? report? guide? It's precisely the sort of
thing that's been infuriatingly absent from the Kubernetes community for far
too long. If this or something like it had been released with 1.0 we'd be a
lot further along than we are now, I think.

------
dkarapetyan
I like the juxtaposition of the words "hobbyist" and "kubernetes cluster".

------
oblio
It seems that a proper Kubernetes setup is the modern-day equivalent of the
proper email server setup of the '90s or '00s :)

~~~
erikb
Yeah, the community follows the wrong approach. What two services need in
terms of features isn't so different. There should be an end-to-end solution.
But each tool, including Kubernetes, only delivers something that isn't even
100% of one feature and hopes that someone else comes up with a solution for
the other features.

Usually in my day job I have to work with these tools, and I currently have a
stack of 5 that mostly have incomplete documentation, zero explanation of how
they actually solve the problem, and nearly zero debuggability (e.g. what
value do Kubernetes logs and events have? Usually when you have a problem,
you have no Kubernetes logs yet/anymore, and the events only tell you what
you already know). Now I'll probably need to learn another one, considering
these three options for storage.

On the weekend, when I'm mostly trying to relax and physically have only 2/7
of the time to allot, I learn the basics behind containerization, e.g.
namespaces, cgroups, virtual network adapters, iptables. And I feel that in
this small time slot I make a lot more progress toward the end-to-end
solution that people actually need.

The example I'm using is wordpress+mysql. It's a simple thing that covers >70%
of what anybody wants to deploy. And on-premise with Kubernetes+Docker it's
still not possible without hacks (e.g. for volume claims and logging), after,
I don't know, 4 years of Kubernetes? I bet that by spending 2 years' worth of
weekends any normal person could come up with something better.

---

Re missing features, examples for Kubernetes:

A) Why does Kubernetes not solve the networking part? If I have a cluster and
containers that may run in different places, then of course the tooling I use
to maintain that cluster needs to ensure that containers can talk to each
other. There can be an abstract API and the option for other people to write
plugins, but the core needs to come with one solution that mostly works and
is debuggable when it doesn't.

B) CrashLoopBackOff. Why did no Kubernetes developer get the idea that this
state may require some kind of logging/debugging?

C) Why does Kubernetes assign random ports to services and not provide a
simple way to retrieve them? Of course I can get them after I've learned the
JSON API. But usually that is not considered a solution but a hack. I really
don't care what port the service is running on, I just want to use it.

D) Why do I need to manually say how many containers should run for each
service? There are very distinct options a user may need. E.g., most services
should run 1 instance and replace it if it dies. If a service needs
reliability, I want to define how reliable it should be, and accordingly the
cluster should decide whether it needs 2, 3 or 5 instances. And lastly I want
to run stuff on all my nodes, each node with a container. That's not even
possible afaik.

E) Most Kubernetes tools don't work well with environment proxies. For
instance, the kube-proxy shell tool will completely bug out. But surprise:
clusters make a lot of sense in enterprise environments, and enterprise
environments have proxies. It's also not a hard setup for testing; a
Raspberry Pi can be your home network's proxy.

F) On-premise storage solutions, considering that a restarted container may
not run on the same host.

G) Since not much is really running right now, we haven't run into this
problem yet. But it's entirely possible that the whole cluster runs out of
resources. I haven't seen any piece of info about the overall cluster status
and when I need to increase or replace infrastructure.

Honestly, if I don't have all these things solved, am I really better off
using Kubernetes than writing my own scripts? I think it's currently about
equal in effort. And if that's the case, my own solution has the huge
advantage of being under my control and allowing me to learn a lot of new
things.

~~~
eicnix
It seems you have not discovered the wide possibilities of Kubernetes yet :)

How are you setting up your cluster?

A) Kubernetes abstracts the network solution so people can choose between
multiple implementations. Some people already use OpenStack and want to use
their Open vSwitch network for their containers; others want to use a pure
container network like flannel. Kubernetes distributions usually come with an
integrated network solution.

B) You can still retrieve the logs from a container in CrashLoopBackOff
state. You can even retrieve the logs from the previously failed container by
using `kubectl logs <container> --previous`. Applications can write
information about their failure to /dev/termination-log, which can be used to
debug the failure.

C) You can define the port yourself; otherwise Kubernetes defaults to a
random port to avoid port conflicts. The recommended way to expose HTTP
services is by using an Ingress.

D) You can run an instance on every node by using a DaemonSet (see the sketch
after this list).

E) Are you talking about outbound traffic from your containers to an external
system? You would need to configure this in the container engine and the
container itself. I had little trouble doing this in an enterprise
environment that required an HTTP proxy for all external communication.

F) You can attach your existing SAN solution over iSCSI, Fibre Channel or
even NFS. Another solution would be to run distributed storage like Ceph or
GlusterFS in Kubernetes, for Kubernetes. You then provision persistent
volumes that are attached to the node your pod is running on. If your pod is
rescheduled, the volume will be moved too.

G) If you have resource requests/limits set, Kubernetes will not schedule
your pods if no resources are available.
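
Here's a minimal sketch tying points C, D and G together: a DaemonSet (one
pod per node) with resource requests/limits, plus a Service with an
explicitly chosen port. The names and the busybox image are made-up
placeholders, not anything from the guide.

    kubectl apply -f - <<'EOF'
    # (D) A DaemonSet runs exactly one pod on every node.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-agent            # hypothetical name
    spec:
      selector:
        matchLabels: {app: node-agent}
      template:
        metadata:
          labels: {app: node-agent}
        spec:
          containers:
          - name: agent
            image: busybox        # stand-in image
            command: ["sh", "-c", "while true; do sleep 3600; done"]
            resources:            # (G) lets the scheduler account for
              requests:           #     cluster capacity and refuse pods
                cpu: 100m         #     that don't fit
                memory: 64Mi
              limits:
                cpu: 200m
                memory: 128Mi
    ---
    # (C) A Service with a fixed, stable port instead of a random one.
    apiVersion: v1
    kind: Service
    metadata:
      name: node-agent
    spec:
      selector: {app: node-agent}
      ports:
      - port: 8080
        targetPort: 8080
    EOF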

~~~
erikb
I'll read all of the details you provided. Thanks a lot. It may be a lack of
knowledge on our team, not Kubernetes. Sorry if I was too frustrated and
blamed the tool instead of my lack of knowledge.

Re proxy: Not only that, but also. Let's say you have a single-instance
deployment and run kube-dashboard. You want to access the dashboard on
localhost:8080, so you start the kube-proxy shell command. What actually
happens is that the Go code underneath kube-proxy redirects the request
through your $http_proxy, even if localhost, the dashboard's internal
address, and your host's external address are all in the $no_proxy
environment variable. And if your network proxy doesn't allow the dashboard's
port, you get an HTTP 403 instead of the dashboard.

Re storage: We have a cluster with 10+ nodes, each with 5+ disks of 10 TB
each. How would you make sure that your software talks to the correct disk
after it gets restarted and may end up on another node?

Re limits: The highest goal is not avoiding resource exhaustion, though; the
highest goal is service continuity. It can all run at 100% and die, no
problem. I just need to know that I should exchange the first few
disks/processors/machines before the customer-facing service slows down.

~~~
eicnix
@Proxy: I'm not sure what the issue is that you're facing. `kubectl proxy`
should create a TCP proxy from your loopback device to the pod, so there
should be no http_proxy involved. If you think a Kubernetes component does
not respect the no_proxy setting, you should create an issue.
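
For reference, a quick sketch of reaching the dashboard this way (the exact
dashboard path depends on your version and addon setup):

    # Proxy the API server to the loopback device on a chosen port.
    kubectl proxy --port=8080

    # Then open the dashboard through the proxy, e.g.:
    #   http://localhost:8080/ui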

@storage: If you want to move your disks with your containers, you need an
additional storage system (SAN, cloud provider, or distributed storage in
Kubernetes). You then create persistent volumes in Kubernetes that each
reference a disk from the storage provider. This allows you to assign a disk
to a pod with a persistent volume claim. The disk that is linked to the
container through persistent volume and persistent volume claim will be
attached to whatever node the container is scheduled on. If you want to run
stateful workloads on Kubernetes, I would advise you to use a storage system.
You can use local disks, but you lose some of the flexibility that you gain
from Kubernetes by tying your containers to the nodes that hold the data.
There is also work being done on improving the handling of local storage, to
treat it more like a resource and introduce separation via local persistent
volumes.
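
As a concrete sketch of that persistent volume / claim pairing (the NFS
server, path and sizes are made-up placeholders):

    kubectl apply -f - <<'EOF'
    # A cluster-level PersistentVolume backed by an external NFS share.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: mysql-data
    spec:
      capacity:
        storage: 10Gi
      accessModes: [ReadWriteOnce]
      nfs:
        server: 10.0.1.20       # placeholder storage server
        path: /exports/mysql
    ---
    # The claim a pod mounts; Kubernetes binds it to a matching volume,
    # and the data follows the pod wherever it gets scheduled.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mysql-data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
    EOF

A pod then references the claim by name under `spec.volumes` via
`persistentVolumeClaim`, and whichever node it lands on mounts the share.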

------
bryanlarsen
Why are you doing all of this stuff manually? There are several providers
that will set all of this stuff up automatically for you. I like the Kismatic
toolkit
([https://github.com/apprenda/kismatic](https://github.com/apprenda/kismatic)),
but there are a bunch of others. Sure, maybe once you go to production you'll
want to install manually so that you have everything finely tuned the way you
want, but learn it by using it rather than trying to figure everything out up
front.

Or even better, just use GKE for development/learning purposes. Just stop the
cluster when you're not using it, and it'll be a lot cheaper than something
you won't want to take down because you spent days installing it.

~~~
irontoby
Because Kubernetes is a complex beast with many moving parts, and learning
about all those moving parts becomes more and more important as your usage
grows.

Personally I've used Stackpoint.io to provision some small clusters but I was
very excited to see this project because deploying my own cluster from scratch
is next on my todo list. Kelsey Hightower's "Kubernetes the Hard Way"[1] is
the canonical go-to reference here but it's also very daunting so this looks
like a great middle ground.

Let's face it, even today the k8s docs can be quite sparse or gloss over the
details, so knowing how all of the pieces work from the ground up can be a
big help. Plus, you avoid vendor lock-in for when whatever automated tool
you're using doesn't solve your use case or decides to start charging a lot
of money.

[1] [https://github.com/kelseyhightower/kubernetes-the-hard-way](https://github.com/kelseyhightower/kubernetes-the-hard-way)

~~~
fndrplayer13
I agree with this assessment. We decided to do "kubernetes the 'sorta' hard
way" by leveraging the Saltbase installer with some level of customization
and full control via Terraform over how our infrastructure was being
allocated. I think it's valuable to learn what the tool is doing if you have
to maintain it. When something breaks, an upgrade has issues, or you need to
better understand the system to make a decision, I feel that you gain a lot
from having set up the system yourself. I think you'll be more likely to know
precisely where to look to debug things. You also get closer to the tool,
which makes it easier to contribute back to the community. And you get the
benefit of making your own infrastructure decisions. Yes, k8s can provision
ELBs and EBS volumes (and their equivalents in Google Cloud, Azure, etc.) as
well as autoscale nodes via a cluster addon, but the big moving pieces, such
as instances, VPCs, networking, etc., remain well-defined in Terraform or
some other infra-as-code. That means you can decide how to deploy that etcd
cluster, how it gets backed up, whether or not it's encrypted at rest, etc.
Generally speaking, we just value the level of control and insight that we
get out of controlling the stack definition ourselves. To some extent that
may be antithetical to the purpose of k8s, since the goal of the project
overall seems to be simplification and centralization of deployment best
practices.

With all that being said, kops is an incredible tool (as are others) and we
used it to learn about the system and test some of the functionality for
ourselves. Can't recommend it enough.

------
fndrplayer13
Great set of resources -- I just went through the process of defining a
cluster in AWS with Terraform over the past few weeks, though I'm leveraging
the k8s Saltbase installer for the master and nodes.

I'm curious, why no mention of AWS as a provider for roll-your-own? Is this a
cost thing?

Also, I get the feeling that Ubuntu is _not_ a first-class citizen of the k8s
ecosystem, but perhaps my newness to the ecosystem is to blame here. The
Saltbase installer, for example, only supports Debian and RHEL distros,
`kops` prefers Debian, and the documentation for cluster deployments on
kubernetes.io and elsewhere also seems somewhat suggestive of Debian and
CoreOS. Perhaps that's just a mistaken interpretation on my part. I'm curious
what other people's thoughts on this topic are!

~~~
dustinkirkland
Ubuntu is _absolutely_ a 1st class citizen in the K8s Ecosystem!

The front page of [https://kubernetes.io/docs/](https://kubernetes.io/docs/)
has a bullet that links to a super simple way to deploy Kubernetes to Ubuntu
on any of [localhost, baremetal cluster, public cloud, private cloud]!

See:

* Installing Kubernetes on Ubuntu: Deploy a Kubernetes cluster on-premise, baremetal, cloud providers, or localhost with Charms and conjure-up.

------
gtirloni
I'm surprised a hobbyist K8s administrator is not choosing to use kubeadm
instead.

[https://kubernetes.io/docs/getting-started-guides/kubeadm/](https://kubernetes.io/docs/getting-started-guides/kubeadm/)

~~~
cweagans
[https://github.com/hobby-kube/guide#installing-kubernetes](https://github.com/hobby-kube/guide#installing-kubernetes)

> There are plenty of ways to set up a Kubernetes cluster from scratch. At
> this point however, we settle on kubeadm. This dramatically simplifies the
> setup process by automating the creation of certificates, services and
> configuration files.
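
For a sense of how much kubeadm automates, the bootstrap is roughly this
(flags and the join syntax vary by version; the IP is a placeholder):

    # On the master: generates certificates, static pod manifests and
    # kubeconfig files, then starts the control plane.
    kubeadm init --apiserver-advertise-address=10.0.1.1

    # kubeadm init prints a join command with a generated token;
    # run it on each worker node:
    kubeadm join --token <token> 10.0.1.1:6443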

------
ramshanker
I just had my first read about Kubernetes. Looks doable. Time to jump on the
bandwagon.

------
paukiatwee
Really great resource! I was working on my own version of k8s setup scripts
using Ansible, and I will definitely use this guide to improve mine.

~~~
daviddumenil
Can I ask what you thought of the k8s roles in Ansible Galaxy?

------
guiriduro
Great timing, I was wondering to myself about the feasibility of a 10€
cluster on Scaleway just last week.

------
bryanlarsen
I found gluster-kubernetes quite simple to install. But the install
instructions do assume that you're going to be giving it its own partition,
which you would be doing on any sort of real production deployment.

------
empath75
You can spin up a cluster on gce in a couple of minutes.

~~~
ralmeida
There are a couple of reasons to do it manually and/or outside GKE, notably:

1) Cost. VPS providers like Digital Ocean/Scaleway usually include a large
outbound network transfer quota in the price, which just isn't there with
GCP, where you pay for network usage on a metered basis.

2) Learning. Although you can defer most of the heavy work to GKE, it's still
good to understand the moving parts so you can make better choices as you
grow.

------
tuco86
Exactly what I was looking for! Eureka!

------
jug5
Good to see the author using WireGuard as an additional network security
layer.

