
Bare Metal K8s Clustering at Scale - tdurden
https://medium.com/@cfatechblog/bare-metal-k8s-clustering-at-chick-fil-a-scale-7b0607bd3541
======
zbentley
To everyone asking "why would you ever do this", ask them, or look for videos
of the talk/etc., but don't assume the decision was poorly thought through.
That seems prejudicial at best (and I say that as someone that regularly
pushes back on web software folks I work with trying to build huge-company-
scale orchestration layers when they don't need them).

I saw these guys talk at QCon. It was a fascinating talk, and an excellent
example of SRE adaptability and nonstandard, uncommon innovation given unusual
constraints.

Not speaking for them, just from my memories of the talk and the following
Q/A, but their reasons for this stack were primarily:

\- They couldn't run in the cloud because connectivity to their sites is often
terrible.

\- They mostly ran IoT stuff from the k8s clusters--automated kitchen
equipment like fryers and fridges, order tracking/status screens, building
control systems, and metrics aggregation so they can see how businesses are
doing.

\- Because of the bad connectivity, a "fetch"/"push" (from the k8s clusters at
the edge) model was needed for deployments/logging/administration/getting
business data back up to the cloud.

\- They explicitly did _not_ process payments.

\- k8s was used primarily for ease of deployment and providing a base layer of
clustered reliability for pretty simple services. Since the boxes in the
cluster were running in often-unventilated racks/closets full of junk in
random restaurants, having that base layer was very important to them. Other
solutions were evaluated and they chose k8s after consideration.

\- Unlike typical IoT/automation setups here, they wanted to be able to
experiment, monitor, and deploy software without the traditional industrial
control practice of "take shit down, flash your controller (call a tech if you
don't understand that), spin it up, and if it breaks you're down until we ship
a new control unit or you manually fail over to a backup".

\- However, they didn't want to fall into the IoT over-the-air update security
pitfalls (it would really suck if someone hacked your fridge's temperature
control system and gave a week's worth of customers salmonella). As a result
they spent a ton of time making very good (and simultaneously very simple)
deployment/update authorization and tracking tools. They chose the "pull"
model and keying/security layers explicitly to avoid having to think about
tons of open remote-access vectors and/or site hijacking.

\- The k8s tooling (and some of their own) allowed easy, remote rollbacks to
"default/clean state" in case something went wrong, which was critical given
that downtime might compromise a restaurant and having a "reset button"
automated in was important for ease-of-use by nontechnical, overworked site
managers.

\- The clustering allowed individual nodes to fail (which they will, because
unreliable environments), and people to manually yank ones with confidence.

\- While, as some commenters pointed out, the leader (re)election system
chosen might be unacceptably slow/randomized for, say, a cloud database, it is
perfectly sufficient for failing over a control system in a restaurant. A few
seconds of delay on an order tracking screen, or a system reboot/state-loss of
in-flight orders, is vastly preferable to some split-brain situation making
the restaurant accidentally cook 1.25x the correct number of sandwiches for
hours, to go to waste.
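
The "pull" model with authorized updates described in the list above can be sketched roughly as follows. This is a minimal illustration, not how the team actually did it: the shared key `SITE_KEY`, the manifest fields, and the function names are all hypothetical. The idea is that the edge node fetches a manifest on its own schedule (no inbound remote access), and applies it only if the signature checks out and the version moves forward.

```python
import hashlib
import hmac
import json

# Hypothetical shared key, assumed to be provisioned on the edge node at install time.
SITE_KEY = b"example-site-key"


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accept_update(manifest: dict, signature: str, current_version: int, key: bytes) -> bool:
    """Edge-side check run after the node has pulled a manifest.

    Accept only manifests that are authentic and newer than what is running.
    """
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, signature):
        return False  # tampered with, or signed with the wrong key
    return manifest.get("version", -1) > current_version
```

A manifest that has been altered after signing, or one that would roll the node back to an older version, is rejected; anything else is left for the deployment tooling to apply.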

It's important to understand their use case: they needed to basically ship
something with the reliability equivalent of a Comcast modem (totally
nontechnical users unboxed it, plugged it in, turned it on, and their
restaurant worked) to extremely poorly-provisioned spaces (not server rooms)
in very unreliable network environments. For them, k8s is an (important)
implementation detail. It lets them get close to the substrate-level
reliability of a much more expensive industrial control system in their sites
(with clustering/reset/making sure everything is containerized and therefore
less likely to totally break a host), while also letting them
deploy/iterate/manage/experiment with much more confidence and flexibility
than such systems provide.

I think this is a great story of using new tools for a novel (or at least
unusual) purpose, and getting big benefits from it.

Brian, Caleb: great talk, great writeup. Sorry HN is . . . being HN. Keep at
it.

Edit: QCon talk summary is here:
[https://www.infoq.com/news/2017/07/iot-edge-compute-chick-fi...](https://www.infoq.com/news/2017/07/iot-edge-compute-chick-fil-a).
If you have any employees/friends that went, they should have
access to the video. It may be made public at some point, too.

~~~
oblio
> Brian, Caleb: great talk, great writeup. Sorry HN is . . . being HN. Keep at
> it.

I don't think HN was super vicious. They presented an out-of-the-box solution
to a problem but they didn't define the problem fully. Based on what we saw,
their solution seemed way overkill.

Glad to hear that there was a solid reason behind it, not just hype and
recruiting buzz.

------
reacharavindh
Wish the article had more context for readers.

These days it feels like everybody needs to throw Kubernetes at everything,
introducing complexity for the sake of being cool.

I guess those of us who like to run non-distributed software for small-scale
applications are the new grumpy grey beards....

~~~
chx
> for small scale applications

I am a grumpy grey beard no doubt but I still maintain: most websites do not
need more than a single server -- certainly not more than a single database
server. And, for most, a few-hundred-dollar dedicated server is plenty.
Apply YAGNI until blue in the face.

~~~
dvanduzer
A five minute outage of the point-of-sale system during the lunch rush can
easily cost even the smallest of restaurants several hundred dollars.

True, _most websites_ do not have this problem, because most websites do not
drive revenue like that. There are _plenty_ of use cases where you need five
nines, but only within limited not-24/7 time windows.
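
To make the "five nines, but only within limited time windows" point concrete, here is a small arithmetic sketch. The function name and the 3-hour daily lunch window are assumptions for illustration only, not figures from the thread.

```python
def downtime_budget_seconds(
    availability: float, window_hours_per_day: float, days: int = 365
) -> float:
    """Seconds of allowed downtime per `days` for a given availability target,
    counting only the hours per day during which the service must be up."""
    total_seconds = window_hours_per_day * 3600 * days
    return total_seconds * (1.0 - availability)


# Five nines around the clock vs. five nines during an assumed 3-hour lunch window.
full_time = downtime_budget_seconds(0.99999, 24)
lunch_only = downtime_budget_seconds(0.99999, 3)
```

Under these assumptions the 24/7 budget is about 315 seconds per year, while the lunch-window budget is about 39 seconds per year, so a single five-minute outage at lunch blows through either one.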

~~~
chx
Two servers... still doesn't need Kubernetes.

~~~
coffeesn0b
It's actually 3 nodes, with plans to expand out to more based on workload.

------
zimbatm
This type of deployment is a perfect fit for NixOS. Immutable deployments with
zero configuration drift, easy rollback and options to both push and pull the
system configuration updates. It's also easy to customize the system to the
hardware unlike CoreOS or Rancher while providing pre-built binaries of all
the dependencies.

Setting up a single-node kubernetes is basically adding one line to the system
config:

    
    
        services.kubernetes.roles = ["master" "node"];

------
beepbeepbeep1
The interesting bit they miss detail on is why they are running k8s at the edge
in restaurants.

The only reason I can think of is that they get to push point-of-sale software
out by using K8s from some central system. I can't think of a worse use/abuse
of k8s than as a software update system, if that's what they are doing.

The other reason is that they distributed their compute and restaurants pay the
power bill, but that sounds just as silly.

Curious to know why you would use k8s at the edge.

~~~
ealexhudson
They made a comment about having some kind of IoT infrastructure in each
restaurant.

It absolutely smells of over-engineering, though. There are a lot easier ways
of pushing software out than maintaining k8s locally; and they're almost
certainly going to need to build a system which manages and monitors all these
clusters...

~~~
coffeesn0b
We deal with a challenge of frequent network outages or latency issues... keep
in mind that we have locations out in the middle of nowhere with no QoS. There
are a variety of loads at the restaurant that require low latency and high
uptime. On-site K8s clusters were a natural fit for that.

Trust me, it would have been way easier to just hook this stuff up to the
cloud :-P . I still dream that we will be able to some day.

------
fredsted
I'm looking forward to the article detailing _why_ they decided to do this.

~~~
FrancoisBosun
I'm pretty sure this is related to being able to continue running the
applications even if the venue loses Internet access. You can't stop
processing orders if your Internet access has a hiccup.

------
roncohen
This caught my eye: Home made leader election protocol that relies on UDP.

~~~
madmax96
This one was kind of troubling, if you ask me:

    
    
        >If the leader ever dies, a new leader will be elected
        >through a simple protocol that uses random sleeps and
        >leader declarations.
    

Why not have each node self-generate a UUID and engage in some gossip process
that ends with the cluster becoming aware that some node's corresponding UUID
is uniquely significant, therefore recognizing that node as a leader?
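
A toy version of that UUID idea, as a sketch only: each node generates its own UUID, nodes repeatedly exchange the lowest ID they have seen, and whoever owns the lowest ID ends up as leader. The `Node` class, the ring-shaped gossip order, and the "lowest wins" rule are assumptions made for illustration; a real protocol would also need to handle nodes dying and rejoining.

```python
import uuid


class Node:
    def __init__(self) -> None:
        self.node_id = uuid.uuid4()    # self-generated identity
        self.best_seen = self.node_id  # lowest ID this node has heard of so far

    def gossip_with(self, peer: "Node") -> None:
        """Exchange views with one peer; both sides keep the lower of the two IDs."""
        lowest = min(self.best_seen, peer.best_seen)
        self.best_seen = lowest
        peer.best_seen = lowest

    @property
    def is_leader(self) -> bool:
        return self.best_seen == self.node_id


def run_gossip(nodes: list[Node], rounds: int) -> None:
    """Gossip around a ring; with enough rounds the lowest ID reaches every node."""
    for _ in range(rounds):
        for i, node in enumerate(nodes):
            node.gossip_with(nodes[(i + 1) % len(nodes)])


cluster = [Node() for _ in range(3)]
run_gossip(cluster, rounds=len(cluster))
leaders = [n for n in cluster if n.is_leader]
```

After the gossip rounds, every node holds the same `best_seen` value and exactly one node considers itself leader.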

I have some really bad memories of "random sleeps" at scale.

~~~
kevin_nisbet
Well, from the sounds of it they're not running at scale (always 3 nodes), and
it sounds similar in principle to the way HSRP/VRRP works, which is a
well-defined and well-understood protocol for doing leader election on a local
network.

I suppose the question might be why not use VRRP itself, but if this works for
them and has conflict resolution I don't think it's all that troubling.
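
For a feel of how "random sleeps and leader declarations" plus conflict resolution could fit together, here is a small simulation. It is a guess at the general shape, not the article's actual protocol: the 3-second sleep window, the `latency` value, the node names, and the "highest ID wins" tie-break are all hypothetical.

```python
import random


def elect(node_ids: list[str], rng: random.Random, latency: float = 0.05) -> str:
    """Simulate one election round and return the winning node ID."""
    # Each node sleeps for a random interval before declaring itself leader.
    wake_times = {node: rng.uniform(0.0, 3.0) for node in node_ids}
    first = min(wake_times.values())
    # A node whose timer fires before the first declaration has had time to
    # arrive (within `latency` seconds) will also declare, causing a conflict.
    claimants = [n for n, t in wake_times.items() if t - first < latency]
    # Resolve conflicts deterministically: the highest node ID wins, loosely
    # in the spirit of VRRP preferring the higher priority.
    return max(claimants)


winner = elect(["nuc-a", "nuc-b", "nuc-c"], random.Random(42))
```

With three nodes and a latency that is small compared to the sleep window, conflicts are rare, and when they do happen the tie-break still yields a single leader.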

------
alainchabat
Does anyone have a solution/tool to easily run Kubernetes on a single bare metal
server? Kubernetes or any other Docker container "orchestration" tool.
I tried to google it (certainly with the wrong keywords), and found only some
quite complex processes, or maintained tools that are mainly for AWS/GCP.

~~~
kryptk
Minikube works great to get your feet wet, but it's not suitable for anything
except a playground.

~~~
dboreham
Minikube is a VM, so not the bare metal solution requested.

~~~
rraghur
minikube start --vm-driver=none doesn't use a VM AFAIK

------
yanslookup
This is great. Does anyone have guides on how to do the cluster creation
bootstrap on public clouds where you don't get a known DNS name ahead of time
and master nodes may come and go? I.e., I want to bake an AMI and create an ASG
so that we can turn it on and it will self-cluster, create certs, etc., and can
add and remove nodes at the whim of the ASG.

~~~
coffeesn0b
We haven't open sourced how we do this yet... we have an MVP way of doing it
by using Ansible to provision the NUCs, and nmap (please don't laugh!) so that
they can find each other on a specific virtual network at the restaurants.

We're replacing a lot of these solutions with "better ways" over the next
weeks and months, but I'd be happy to share how we went about it. You can
contact me on LinkedIn:
[https://www.linkedin.com/in/calebrhurd/](https://www.linkedin.com/in/calebrhurd/)

The biggest key was that we use RKE for the clustering/certs on bare metal.
That's definitely our secret sauce (pun intended).

------
stuff4ben
This is the first I've heard of RKE as a K8s installer. I always just thought
it was another name for Rancher 2.0. Would love to see a good comparison of
Kubeadm vs RKE. This article briefly mentions kubeadm and that they didn't
choose it.

------
Symmetry
I went into this thinking they were using old AMD K8s clustered into a budget
supercomputer.

------
alexmorse
I don't understand at all why you would do this for a restaurant.

What challenge is this addressing, what problem does this solve? Is there a
problem to solve here?

I do assume there's a good reason for this, but as presented it seems like a
very stupid waste of money.

~~~
danpalmer
It sounds like a huge cost saving to me. Being able to install a few dumb
machines in the restaurant and then have remote installation and management of
applications running on them would be great. I imagine that kubernetes would
be more reliable than PXE booting images across the internet (as that often
requires physically rebooting machines which requires involvement of the
restaurant staff, will be error prone, etc), not to mention that building
bootable images with your software on is not a very modern practice.

Bear in mind that in terms of cost, this is competing with a person driving to
each restaurant and fiddling around with computers for an hour, which is a
very expensive process.

~~~
oblio
> not to mention that building bootable images with your software on is not a
> very modern practice.

1\. Why not?

2\. Who cares if it's not modern if it does the job?

And they wouldn't even need to make a special app, they could just make it a
webapp ergo make a 1-time image with a browser...

~~~
danpalmer
> 1\. Why not?

It's becoming more common to distribute applications with orchestration
software like Kubernetes. The technology around PXE booting is quite old, and
mired in enterprise cruft.

> 2\. Who cares if it's not modern if it does the job?

Developers love new tech, especially if they can get a Medium post out of it.
This doesn't make it a good reason of course, but if this is the tech that
more developers are familiar with, that's a good reason.

I personally wouldn't want to learn how to boot 6000 remote machines off built
disk images over the internet, I'd rather use the skills I already have around
Ansible or learn Kubernetes.

> And they wouldn't even need to make a special app, they could just make it a
> webapp ergo make a 1-time image with a browser...

I've never been to a Chick-fil-A, but if the setups are anything like my local
McDonald's, that's a complex 5 screen setup showing a fluid mix of static
images, videos, animations, and applications, not to mention that other stores
have different setups/layouts/display types/etc - I don't think you'd be able
to _reliably_ do this in a browser. My guess is that it's a multi-screen aware
wrapper around video components and web views. That will need re-deploying
regularly I would imagine. And that's not to mention the kitchen ordering
system, the self-service machines, the tills, etc.

On-site machines totally make sense, smart applications deployed locally,
frequently, make sense.

------
robert_foss
As a note, Chick-fil-A is notoriously anti-LGBTQ.

[https://thinkprogress.org/chick-fil-a-still-anti-gay-970f079...](https://thinkprogress.org/chick-fil-a-still-anti-gay-970f079bf85/)

~~~
majewsky
Without taking a political stance here, how is this relevant to this
submission?

~~~
owly
I assume to counter their appeal to work there at the end of the article.

