
Talos: OS for Kubernetes - stevetodd
https://www.talos-systems.com/
======
rainyMammoth
It took me a lot of reading before understanding that this is an OS on which
you run a Kubernetes node/master and not a so-called OS on top of Kubernetes.

Somehow, when I read "OS for Kubernetes" I thought it was yet one more layer of
abstraction on top of Kubernetes.

~~~
andrewrynhard
Ahh thanks for the feedback. One thing that is clear from all this is that we
can do a better job on documentation.

~~~
cwojno
I wish these sales-type pages had a "click here if you're an engineer." The
page would contain a buzzword-free, straightforward explanation of what the
product's purpose is, what its major features are, and what interfaces are
exposed for control and flow.

I have passed on countless products when evaluating solutions for having very
flashy sales pages with no actual description of the product itself. "Improve
your ROI! Decrease Downtime!" aren't features, per se, but potential benefits
that can only be determined if the product is a fit, and I can only determine
fit if I know what the hell it does.

/rant

~~~
andrewrynhard
As an engineer I understand this completely. We are already discussing
internally how we can improve our site and documentation. Our goal is to
create a vibrant community and getting all the sales fluff out of the way is
something I support!

------
jaytaylor
The actual code for it may be more interesting and informative compared to
OP's (somewhat sparse) link:

[https://github.com/talos-systems/talos](https://github.com/talos-systems/talos)

------
andrewrynhard
Hey folks, Talos creator here. Happy to answer any questions you guys may
have. Sounds like some confusion about exactly what Talos is. A lot of good
feedback here that we will take and improve our documentation.

Talos is a Linux distribution built specifically for Kubernetes. The short
version is that we have stripped out absolutely everything that is not
required to make a machine a Kubernetes node, including SSH and console access
(I will explain why).

Here goes the long version. We have done a number of things to improve
security, including a read-only filesystem, except for what the Kubelet needs
(/var/lib/kubelet, /etc/cni, etc.). It runs entirely in RAM from a Squashfs,
and only Kubernetes makes use of a disk. We have stripped SSH/Console access
and added a gRPC API that gives engineers the ability to debug and remediate
issues.
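
To make the "read-only except for what the Kubelet needs" idea concrete, here is a minimal sketch of that policy as a path allowlist. It is not how Talos enforces it (that happens at the filesystem/mount level); `WRITABLE_PREFIXES` and `is_writable` are hypothetical names, and the list contains only the two paths named above, not the full set hidden behind "etc.".

```python
import posixpath

# Hypothetical allowlist: only the two paths named in the comment above.
# The real set of writable locations is longer and is not reproduced here.
WRITABLE_PREFIXES = ("/var/lib/kubelet", "/etc/cni")


def is_writable(path: str) -> bool:
    """Return True if `path` falls under one of the writable prefixes."""
    # Normalize first so "../" tricks cannot escape an allowed prefix.
    normalized = posixpath.normpath(path)
    return any(
        normalized == prefix or normalized.startswith(prefix + "/")
        for prefix in WRITABLE_PREFIXES
    )
```

Everything outside the allowlist is treated as immutable, which is the property that keeps nodes from drifting apart over time.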

We didn't just stop at this. We are writing everything, including the init
system, in Golang, which allows us to integrate deeply with Kubernetes.
Everything about Talos is API driven.

Some of the highlights include:

\- SSH/console access replaced with gRPC API that is secured via mutual TLS.

\- Immutable. Immutability prevents drift, making the cluster consistent
across the board.

\- Automated upgrades that can be orchestrated in an intelligent way. By using
Kubernetes events, and our API, we can roll out upgrades from an operator
(currently a WIP and planned for the Talos 0.3 release) and do so in a safe
manner.

\- Cluster API (CAPI) integration that allows rapid creation of Kubernetes
clusters using Kubernetes style declarative YAML.

\- Support for AWS, GCP, Azure, Packet, vSphere, Bare Metal, and Docker. The
experience for each is consistent, making it easy to reason about Talos
regardless of where you run it.

\- CIS and KSPP security configuration enforcements.

\- Keeping current by supporting the latest and greatest version of
Kubernetes, while writing upgrade paths into the system.

\- Support for local Docker based clusters, easily created using our CLI. This
is super useful for creating CI pipelines where you might want to run
integration tests against the same Talos/Kubernetes versions running in
production.

\- Installs and upgrades are performed via containers.
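
As a rough illustration of what "orchestrated in an intelligent way" could mean for the automated upgrades item above, here is a minimal sketch of a one-node-at-a-time rollout that halts on the first unhealthy node. It is a toy model, not the Talos operator (which is described above as a WIP): `Node`, `rolling_upgrade`, and the `is_healthy` callback are hypothetical names, the version strings are placeholders, and assigning `node.version` stands in for a real upgrade call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    version: str
    schedulable: bool = True


def rolling_upgrade(
    nodes: List[Node],
    target: str,
    is_healthy: Callable[[Node], bool],
) -> List[str]:
    """Upgrade nodes one at a time; stop at the first node that is unhealthy.

    Returns the names of nodes that were upgraded and returned to service.
    """
    upgraded: List[str] = []
    for node in nodes:
        if node.version == target:
            continue  # already on the target version, nothing to do
        node.schedulable = False  # cordon: keep new workloads off the node
        node.version = target  # stand-in for the real upgrade call
        if not is_healthy(node):
            # Halt the rollout and leave the node cordoned for inspection.
            return upgraded
        node.schedulable = True  # uncordon once the node reports healthy
        upgraded.append(node.name)
    return upgraded
```

Stopping at the first failure limits the blast radius of a bad release to a single node, which is the main reason to roll upgrades out sequentially rather than all at once.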

We feel that by removing SSH/console, making the core of Talos read-only, and
treating the nodes as ephemeral machines, we are creating a much more secure
way to run Kubernetes. A really good talk on these ideas was given at Black Hat
this year:
[https://swagitda.com/speaking/us-19-Shortridge-Forsgren-Cont...](https://swagitda.com/speaking/us-19-Shortridge-Forsgren-Controlled-Chaos-the-Inevitable-Marriage-of-DevOps-and-Security.pdf).
We feel we align with the recommendations made there.

In addition to security, we envision a system that will be self-healing and
intelligent. By having an API and integrating with Kubernetes, the sky is
really the limit on the tooling we can build to create this self-healing
system.

Our goal with Talos is to allow engineers to more or less forget about each
individual node. Managing the OS alongside Kubernetes is a lot of work.

I will address the questions and comments as replies. Feel free to ask more as
a reply to this comment.

Feel free to join our meetings every Monday and Thursday at 17:00 UTC on
[https://zoom.us/j/3595189922](https://zoom.us/j/3595189922). Also, join our
slack and I'd be more than happy to talk some more about Talos!
[https://slack.dev.talos-systems.io](https://slack.dev.talos-systems.io)

~~~
alex-mohr
Generally seems like a great offering!

I see immutable, but also upgradable? Is that via in-place upgrades or do
upgrades require a reboot?

Example: severe bug or vulnerability in kubelet or containerd/docker. Can I
use the API to roll out a fix to existing nodes such that running workloads
have no disruption?

~~~
ztjio
The whole point of Kubernetes is that you don't think this way. Replacing a
node is not an impactful event if you're using K8S correctly.

~~~
andrewrynhard
I agree with this to an extent. There are certainly places where replacing can
be expensive. For example, bare metal, or if the machine contains a large
amount of data and moving that data to a new node is time consuming.

~~~
hbogert
Your storage should be separated from the worker nodes. Unless you have some
hyper-converged setup, then you make the deliberate choice that your node
became special. (sorry for using the term hyper-converged)

------
cat199
while an 'os for kubernetes' would be interesting, the documentation doesn't
seem to really explain how:

"Talos lets you treat the cluster as the machine, so you can focus on your
applications instead of managing the OS."

all the examples seem to basically be kubectl, etc. commands with different
syntax. How exactly is this an 'operating system' and not 'yet another
kubernetes build / deployment utility'?

Maybe it's there. I don't see it from a cursory glance (read: clicked 10-20
pages) of the docs. Or, curmudgeonly old me is expecting the term 'operating
system' to mean something else? idk. Happy to be corrected.

~~~
andrewrynhard
The idea is that since

\- we go to great lengths to make the OS secure and immutable

\- we have an API

\- we will have automated upgrades

We can allow those who are operating clusters to care far less about the OS.
Managing SSH, packages, auditing requirements, etc. at the host layer is a job
in itself. We aim to remove that concern and allow you to focus on Kubernetes.

~~~
OJFord
What if it's mainly but not only k8s that you want installed - wireguard for
inter-node networking for example, or something to support GPUs/other
hardware?

~~~
andrewrynhard
We're working through a plugin system that might allow for things like this.
GPUs/other hardware will obviously not be pluggable, but we have had requests
for both and we are interested in adding support for them. We have really
powerful tooling that makes building kernels with modifications simple.

~~~
gtaylor
Is this plugin system going to end up being a package manager?

~~~
andrewrynhard
It is not. The specifics are still a WIP to be honest, but the plugin system
will almost certainly have a scope to it.

------
jacques_chester
If there are any folks here from Talos, I think there's a lot of confusion
about what it is, where it fits in the stack and so on.

I think part of the confusion comes from the scarcity of diagrams. I
personally also find it helpful to have design motivations laid out. There is
a list of "capabilities and benefits", but it would help me to understand what
the current state of the world is before Talos is added to it.

~~~
andrewrynhard
Hi. I am writing up a response now :)

~~~
utopian3
FYI: Talos is a trademarked name. In tech. So is Telos. This will be very
confusing and will likely cause you issues.

~~~
peterwwillis
Things that are called Talos from a basic web search: 1) Cisco's security
group, 2) an energy company, 3) an Elder Scrolls character, 4) a
cryptocurrency, 5) a high school website, 6) some sort of data science thing,
7) body armor manufacturer, 8) a recurring Marvel Universe character, 9) a
biochem program, 10) a secure IBM workstation, 11) a beer dispenser, 12) a
Swiss management consulting firm, ...

------
outworlder
OS for K8s? I'm getting CoreOS vibes.

Sounds neat, but is it really needed? K8s doesn't really care about what it is
running on. Not sure what problems it would solve. And might create others
(compliance?)

~~~
weberc2
Last I checked k8s required you to have swap disabled and some cgroups
settings configured. As I recall, installing Kubernetes was quite difficult,
and having something that's ready to go sounds great.

~~~
stuff4ben
Installing Kubernetes has gotten easier since the time you last looked at it
then. Having used Kubeadm and then RKE from Rancher, I can get a 3 node (non-
HA) cluster up in under 10 minutes. This is on regular RHEL 7.4 machines when
I last did this in a non-automated way.

~~~
kgilpin
How difficult is an HA cluster? Because that’s more of a fair test. Setting up
a dev-only instance of anything ought to be easy.

~~~
meddlepal
I build and maintain large-scale Kubernetes infrastructure for a living. The
way I put it is: easy to set up, somewhat more challenging to maintain.

It's not rocket science, but like any complex computing tool it requires
dedicated attention especially if you are going to run more than a handful of
clusters. A lot of the tooling in the ecosystem falls flat here: it all solves
the Day One: Getting Started problem but often punts on Day 2: Operations, and
then, once it realizes that is actually a problem, ham-fistedly engineers a
bolt-on solution.

So back to my point: Provisioning and setup easy... maintenance moderately
more complex.

~~~
andrewrynhard
Agree 100%. Part of the value add we feel we bring is going to be an OS that
keeps pace with Kubernetes. Also, automated upgrades. Making the maintenance
of a cluster a little easier over time.

------
asdfe
Security wise sounds a bit light. I like the idea but I'd be more comfortable
with a more security-first approach instead of "we are infra people" as we
know how that usually ends.

I'll be following the project closely as I think the idea is good.

~~~
andrewrynhard
I think you might be pleasantly surprised about our approach to security.
Security is very much a priority and built into the OS.

~~~
asdfe
That's great! Keen to see your threat model and architecture.

------
streetcat1
So this is more like terraform, just automatic.

This is basically an implementation of the cluster API sig (which is also
being promoted by VMWARE).

~~~
andrewrynhard
We indeed integrate with CAPI, but the OS itself is not a CAPI implementation.
We have a CAPI provider that works hand-in-hand with the OS.

~~~
streetcat1
Yes, I read the code.

I am not sure that you want to call it an OS per se, since kubernetes itself
is the OS (manage memory, schedule processes, etc).

So there is no default CAPI provider?

~~~
andrewrynhard
We have our provider here:
[https://github.com/talos-systems/cluster-api-provider-talos](https://github.com/talos-systems/cluster-api-provider-talos)

Perhaps OS isn't the right thing to call it, but I don't know a better
alternative :D

~~~
mattigames
You said it yourself: "Linux distro", it may not be the best for the marketing
side but it's way easier to understand for engineers

------
ypcx
Tried it, but the overhead of running a few nodes in a single local VM was
larger than running those nodes in multiple local VMs.

(I'm using this to run multiple local VMs:
[https://github.com/youurayy/hyperctl](https://github.com/youurayy/hyperctl))

~~~
andrewrynhard
Please file an issue on GitHub and we would be happy to look into this.

------
ggm
Can I put a pod back inside, with SSH, and use it to kubectl exec my way into
an interactive state?

I get why the "outside" has no SSH, but I would not preclude the existence of
a shell inside, busybox, you're back in the land of the UNIX living and all
that entails.

~~~
andrewrynhard
You absolutely could. The rootfs is read only so what you can do is limited.
Kubernetes also landed support for ephemeral containers in 1.16:
[https://github.com/kubernetes/enhancements/issues/277](https://github.com/kubernetes/enhancements/issues/277).
It still needs more work but the idea is that you will be able to attach a
debug container (your image of choice) to a pod on the fly.

------
cosmotic
You had me at "a modern"

~~~
blotter_paper
Compared to all those crusty old Kubernetes operating systems that popped up
in the '80s and '90s, Talos is a breath of fresh air!

------
_RPM
"legacy operating systems"

------
Keyframe
I understand it’s about k8s, but I’d like to see pros/cons vs OpenShift /
namely, OKD.

------
zapita
Does anyone have a tldr; of how this differentiates from Linuxkit, RancherOS,
and CoreOS?

~~~
andrewrynhard
Very similar, but with a few key differentiators:

\- no SSH/console

\- has an API

\- not for general use

\- runs from squashfs in RAM

\- has a custom init system

------
9nGQluzmnq3M
Can somebody give me(/HN) a TL;DR of how this differs from existing OSs
targeted for Kubernetes, like GCP's Container-Optimized OS?

[https://cloud.google.com/container-optimized-os/](https://cloud.google.com/container-optimized-os/)

The big difference seems to be removing SSH. I understand the theoretical
rationale, but in practice it seems like this would complicate troubleshooting
quite a bit. Yes, maybe you shouldn't be SSHing into prod ever, but how do you
keep the environment consistent if you do want to allow SSH in dev/test?

~~~
andrewrynhard
We have added an API to help with the practical issues in removing SSH. In
doing that it has also opened up interesting opportunities in automation that
we are currently fleshing out.
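
Since the API stands in for SSH, it has to authenticate callers itself, and elsewhere in this thread it is described as secured via mutual TLS. As a minimal sketch of what that means on the server side (using Python's standard `ssl` module rather than Talos's actual Go/gRPC code; the function name and file parameters are hypothetical):

```python
import ssl
from typing import Optional


def mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a TLS server context that rejects clients without a valid cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: the server demands and verifies a client certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # The server's own identity, presented to connecting clients.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # The CA that client certificates must chain to.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

With plain TLS only the server proves its identity; setting `verify_mode` to `CERT_REQUIRED` makes the handshake fail for any client that cannot present a certificate signed by the trusted CA, which is what lets a certificate take the place of an SSH key.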

The difference here is that Talos is purpose-built for Kubernetes. What that
means is that we will pour resources into automated upgrade paths for our
users. We also aim for tighter integration with Kubernetes, where we envision a
self-healing system that makes use of the Kubernetes and Talos APIs to make
decisions.

Also the things I mentioned in
[https://news.ycombinator.com/item?id=21066732](https://news.ycombinator.com/item?id=21066732)

