
The sad state of sysadmin in the age of containers (2015) - xg15
https://www.vitavonni.de/blog/201503/2015031201-the-sad-state-of-sysadmin-in-the-age-of-containers.html
======
ex_amazon_sde
Ex Amazon here. Most grumpy system engineers did not disappear: we got hired
by Google/Amazon/etc to build large-scale infrastructure... and sometimes sell
it back to you as a service.

Believe it or not, most of the underlying infra does not run on the popular
technology of the year. Far, far from it. That's why it works.

Modern devops, with its million tools that break backward compatibility every
month, sometimes becomes the running joke at lunch.

~~~
spiderjerusalem
What fascinates me about this, and sorry for being morbid, is: what happens
when y'all die? Does knowledge of the lower levels of the stack go away with
your generation, or will there be enough of us young ones picking the
important stuff up?

~~~
scurvy
It's a legit concern. There was a NANOG panel about this exact thing. I
believe the quote was, "Take a look around. We're all old and greying. We have
a severe pipeline problem." And then, much to the AWS dude's dismay, the topic
shifted towards blaming cloud services because no one takes the time to learn
how any of this works anymore.

Want to guarantee your child's future employment? Don't just teach them to
code (the machines will do that). Teach them how to build networks and truly
understand network protocols.

~~~
setquk
I’m going to teach my children how to navigate the world of insane Harry-
Potter-esque rules which all IaaS/PaaS platforms enforce upon you. They will
become software language lawyers and be masters of the electric Disney dollar.

You know like “ahh don’t call the messaging endpoint more than 800 mega-milli-
times per mega-nano-second or it will cost you three bazillion CPU credits,
but only on three and a half cores which will starve all your instances, issue
an invoice and proceed to melt your credit card.”

~~~
w8rbt
DDoS is _now_ a billing issue.

------
loteck
The clearest explanation of why this happens is at the end:

 _Before, admins would try hard to prevent security holes, now they call
themselves “devops” and happily introduce them to the network themselves!_

1) The merging of devs into the sysadmin role was a product of the work of
sysadmins (particularly systems change control and security compliance) not
being valued in our culture.

2) Devs were delighted to be free of the shackles placed upon them by sysadmins
who were encumbered by the concerns expressed in this article.

If you were a devop who resolved to fix the problems bemoaned in this article,
my guess is you would turn around in 60 days to discover you'd become a
sysadmin.

~~~
agentultra
I recall the idea of "devops" from this book:
[https://landing.google.com/sre/book.html](https://landing.google.com/sre/book.html)

The stated goal of putting both systems administrators and software engineers
on the same team is to reduce friction and increase communication. One of the
worst, productivity-killing situations you can find yourself in when
developing network software and services is caused by the traditional "old
school" mentality of separating the two camps. When your software developers
operate independently of your systems engineers and administrators they're
forced to make assumptions about infrastructure, operations, and compliance
goals. Both teams have the same goals so why are they not on the same team? I
think some "old school" system administrators don't realize how costly such
communication mistakes are. Getting 6 months into a development project to be
told you cannot have a critical piece of infrastructure _for reasons_ is a
costly, costly mistake.

Containers are a smart solution to the build problem. Don't build your
containers from public, un-trusted images! Build your own images. Run your
own, protected, registry. You still have all of the compliance and validation
necessary and you don't end up debugging failed builds because one machine out
of a thousand is running on some minor shared library version not supported by
your software.
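
A rough sketch of that workflow, with made-up registry and image names:

    # run a registry you control
    docker run -d -p 5000:5000 --restart=always --name registry registry:2

    # build a base image from a Dockerfile you wrote and audited, then push it
    docker build -t registry.internal:5000/base/hardened:16.04 .
    docker push registry.internal:5000/base/hardened:16.04

    # application Dockerfiles then start FROM your image, not a public one:
    #   FROM registry.internal:5000/base/hardened:16.04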

~~~
notyourday
> Don't build your containers from public, un-trusted images! Build your own
> images. Run your own, protected, registry. You still have all of the
> compliance and validation necessary and you don't end up debugging failed
> builds because one machine out of a thousand is running on some minor shared
> library version not supported by your software.

You have just lost all the speed to production advantages of containers.

~~~
mst
"speed to production" is not meant to be the primary advantage of containers.

"knowing exactly what you're running and being able to reproduce it" is meant
to be the primary advantage of containers.

What you're basically saying is "if your container system admins do their job
properly rather than throwing security and reliability out of the window, it
can take a bit longer than not bothering". This is trivially true, but not
really the point agentultra was making.

------
mr_tristan
This is definitely a rant that obscures the underlying point: the introduction
of _untrusted_ or _unreliable_ network resources, frequently hidden in a
string of dependencies.

I'm baffled by how often I see someone throw this sort of craziness - "go
fetch this thing from some random third party" - into very important places,
such as the startup procedures of a container. It's something I see in the
culture of the two-person startup just trying to get something out the door.
It's definitely "technical debt", and frequently, it won't get removed. Thus,
you try to scale up to meet load, and all these new instances time out on
the same external resource that's randomly having problems... boom! At the
worst possible time. Never mind the potential huge security gaps.

But the specific _tools_ aren't the issue here. It's the culture of "ship
something now we'll deal with fallout later". A lot of people start using
Docker and won't ever look at the Dockerfile, or, will add a Maven dependency
and won't even check licenses or security updates for _any_ of the transitive
dependencies.

Cloud technologies and containerization make everyone just think "we can do
things so fast now" and never, ever pay attention to details that can come
back to bite you.

On the flip side, it's a good time to be in cybersecurity; because this
cultural problem will never, ever, get solved. :)

~~~
borplk
At the end of the day it comes down to the fact that businesses just simply
don't care (Equifax etc).

They like the idea of security and that's where it ends.

In many places if you try to "do things right" you will get fired in two
months for being too slow/strict and they will happily replace you with a
clueless easily trusting person who "goes and fetches things from random third
parties".

Many times they get lucky enough to survive and they don't appreciate the
risks that they took. That pace becomes the expected norm and sets the theme
in the industry.

And when shit hits the fan the PR person writes a "we are oh so very sorry ..
security is totally our number one priority" blog post. They blame and fire
the poor bastard and replace him with another warm body.

When it comes to these "hidden" things like security, companies do not reward
(and sometimes even punish) "doing things right", so on average and over the
long term we end up where we are today.

When the culture sufficiently shifts towards being sloppy you will get
hammered down quick if you try to be the voice of reason because it ends up
being you vs everyone else (the norm).

~~~
commandlinefan
> just simply don't care (Equifax etc).

And, honestly, why should they? Security breaches have yet to hurt an actual
company (they hurt users plenty, but not the organization that's actually
responsible).

~~~
mr_tristan
Data breaches are climbing in cost to organizations. Here's a claim that, in
2017, the average breach cost $3.62 million.
[https://www.scrypt.com/blog/average-cost-data-breach-2017-3-...](https://www.scrypt.com/blog/average-cost-data-breach-2017-3-62-million/)

(I've seen similar claims in different ranges. Costs of breaches in the US are
pretty high - over $7 million.)

Even Equifax probably wants back the $3-4 billion in valuation it's lost since
the breach.

The solution appears to be buying tools to avoid and respond to breaches
quickly, instead of engaging and building in security awareness. (Microsoft's
security development lifecycle comes to mind.)

IMO, both approaches are likely cost effective, though I have no numbers or
research to back that up.

------
rdsubhas
As a "major theme", the author takes:

> Consider for example Hadoop. Nobody seems to know how to build Hadoop from
> scratch. It’s an incredible mess of dependencies, version requirements and
> build tools.

And as the major introduction to the blog post:

> I’m not complaining about old-school sysadmins. They know how to keep
> systems running, manage update and upgrade paths.

Huh? Old-school sysadmins know how to keep systems running, manage updates and
upgrades. At the same time nobody knows how to build Hadoop from scratch. At
the same time, Hadoop build instructions themselves have curl|sh scripts or
mirrors and the wiki page is outdated. And it uses Java (and thus maven/ivy).
And that downloads the internet.

According to the blog, Hadoop, maven/ivy/sbt/any dependency manager, package
managers, and everything is broken. But the tagline is:

> This rant is about containers, prebuilt VMs

What does any of this have to do with the "Age of containers" and pre-built
VMs? Is the author just talking about Gentoo/LFS-style "compile the whole
system from scratch"?

This feels like an incredibly rushed rant. I can only imagine the author
having to set up Hadoop for the first time, banging their head against it for a
few days (it happens), and taking it out on everything.

~~~
emilsedgh
I think the logic is: if we didn't rely on containers and prebuilt VMs,
Hadoop would have to be easier to build to be useful.

~~~
badloginagain
The point everyone seems to be missing, and the one I think most important, is
that we're no longer building from trusted sources.

Build systems just download and run random code from the internet without
verifying that it's the correct code, from the correct source.

It's a ticking time bomb.

~~~
dullgiulio
There is SSL/TLS; unless it's done wrong (invalid certificates ignored by the
dependency manager), it's safer than the old "md5 of the file" systems.

Now, some dependencies are fraudulent (especially true in the Javascript world
because it eventually targets a lot of user browsers), but nobody ever checked
the sources anyway...

~~~
cesarb
TLS only verifies that you have connected to the correct server. It can't verify
whether the package on the server has been replaced by a malicious one. For
that, you need a "md5 of the file" (these days, a sha256, because md5 has long
been broken).

~~~
dullgiulio
You need to make sure the hash is also not tampered with, both on the server and
in flight to the user. How do you do that?

If the answer is: use TLS, there is no point in having the file hash at all.

~~~
dijit
No, the answer is to use PGP and a manifest hash.

This is how package managers work. TLS doesn't replace those.
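
Roughly what apt does under the hood; done by hand it looks something like this
(the file names here are just illustrative):

    # verify the signed manifest against a key you already trust
    gpg --verify Release.gpg Release

    # the manifest lists hashes of the actual artifacts; check them locally
    sha256sum -c SHA256SUMS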

------
pc86
> _None of these "fancy" tools still builds by a traditional make command._

Is there anything more "get-off-my-lawn" than "These tools don't use the thing
I like!"

~~~
megaman22
But I just don't understand why we have to have 47 half-built over-complicated
build systems or job runners or whatever the new fad term is for every
language, when there's something that does what they all do, is battle-tested,
and has been around for decades.

Everyone repeat after me. Makefiles are not scary. I can write a shell script.
Do I really need to learn grunt/gulp/webpack/npm/rake/fake/maven/gradle/ant
and on and on and on?

Probably somebody has released another one in the time it's taken me to write
this comment.

~~~
eropple
Makefiles aren't scary. But they're also not particularly good.

I use Rake (or Gulp, or whatever) because _then I can use Ruby_ (or
JavaScript, or whatever). Shell plumbing is fine for informal and small-scale
stuff, and I make my code conform if somebody down the line (who may be me)
wants to get out their duct tape, but the world is more complex than what
/bin/sh can see. Shell is the lowest common denominator. Expecting everything
to at all times be written in and for that lowest common denominator is not
reasonable. We're a tool-using species and we refine tools over time to make
them better. The profusion of tools happens because they iterate on each other
to be better. If old tools were sufficient, people would use them because
learning new ones is hard.

So, yes, you do need to learn those tools. Or invent a shell that isn't tooth-
pullingly difficult to use with a JSON file (and do _not_ say `jq`, I love
`jq` as an inspector but it does not step to `JSON.parse` and a working
subscript operator). Or change `make` so that a git checkout won't trigger a
full rebuild. Lots of baseline, stump-simple things that `make` is just not
going to do for you because it's built for a frankly outmoded method of
development.

~~~
adamc
Your second paragraph got to the heart of it. If we want to use some standard
build toolchain, it needs to use a nice language and not feel obscure. I was
explaining to someone a bash script I wrote, and he said "why not use Python".
There were reasons but... he was right, Python would be much easier to use and
maintain, and we have a lot more developers who know it.

That said, Maven is incredibly suck-tastic.

~~~
eropple
Eh. It's not my favorite thing out there, but Maven's fine for what it is.
It's designed explicitly for _well-behaved_ Java artifacts. If your
Java artifacts are not well-behaved, you're going to have a bad time--in my
experience, most of those cases are doing things you probably shouldn't be
doing.

(You may be a wizard and have a reason to do them, for sure--but that's what
writing Maven plugins is for. Or not using Maven. You've got choices.)

~~~
clhodapp
Given the limitations of the platform, there really isn't such a thing as a
well-behaved JVM library that depends on other libraries, unfortunately.
Oracle _really_ dropped the ball by only serving their own needs with the
module system.

~~~
eropple
Can you expand on this? Having done a pretty decent bit of JVM development,
I've never really run into issues even doing some not-out-of-the-box stuff.

------
Animats
If you thought having to deal with old COBOL programs was a problem, it gets
much worse. When there are 10 year old containers in production, and the parts
to rebuild them are long gone, then you have a real problem.

There's a reason that Google has internal systems which can and do rebuild
everything from source.

~~~
kazen44
This is actually a very important point that gets ignored a lot.

What will happen if public container images are no longer available for
building your application?

You're in for a world of hurt if a major dependency of your application
doesn't work anymore because someone pulled the image.

~~~
nunez
The right answer is to use your own repository and pull those images in-house
where they can be scanned and verified, i.e., run a container from that
image, scan it using security tools, then action from there.

You can also kind-of recreate a Dockerfile from an image using docker history:
[https://stackoverflow.com/questions/19104847/how-to-
generate...](https://stackoverflow.com/questions/19104847/how-to-generate-a-
dockerfile-from-an-image)
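
For example (the image name is a placeholder):

    # print the command that created each layer, untruncated
    docker history --no-trunc some/image:tag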

In the general case, it’s the same as a Ruby Gem or NodeJS module going away
from rubygems or npmjs; once it’s gone, it’s gone.

------
jchw
Hadoop is a rather extreme example... It's bad, but not everything nowadays
is. Many newer pieces of software install entirely from source with one
command.

Also, this is not at all endemic to containers; there's simply zero
connection. Dockerfiles tend to be very simple and easy to reason about. The
most complex application I have is around 50 lines of Dockerfile, and that's
mostly just made more complicated to arrange things for the best layer
caching.

I suppose we're supposed to believe that this is somehow worse than the days
of debugging m4 macros and autotools just to get a build that doesn't work.

~~~
gaadd33
Does that Dockerfile build a container that only has that app and its
required dependencies in it? Almost all of the ones I've seen given as examples
seem to have an entire copy of the OS in them.

~~~
jchw
A container is basically a glorified chroot, so there are a few things not
strictly needed, but I typically use Alpine as a base system, which has a
shell, some core utils, and musl libc in just ~5 MB. Since it's on its own
layer, it gets deduplicated both in build and at runtime with other Alpine
containers (and many Dockerhub images have an Alpine option.)

That being said, since Go binaries have no inherent dependencies, I have
indeed made Docker images containing exactly one file: the Go binary. These
containers are basically the same as fat binaries, with the benefits of Docker
scheduling and networking at runtime.

~~~
seabrookmx
> I have indeed made Docker images containing exactly one file

To anyone reading this: you need some magic compiler flags in both Rust and
Go to make sure it's a statically compiled binary (one that doesn't dynamically
link against glibc).

But yes, this is super neat. I also like how it reads in the Dockerfile:

FROM scratch ...

~~~
jchw
Ah yeah, without CGO_ENABLED=0, you'll get a very cryptic error when the ELF
binfmt can't find the linker binary...

Never tried it with Rust, but I'm looking to use Rust in the future, so I guess
I'd better find out what the flags are for Rust.

Sidenote: It's often useful to have ca certs and timezone info. At that point
it's probably not a bad idea to just use Alpine and apk add those things.
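
For the Go case, the whole thing is roughly this (paths and tags made up, and
it assumes no cgo dependencies):

    # produce a fully static binary; no libc needed at runtime
    CGO_ENABLED=0 go build -o app .

    # Dockerfile (three lines, nothing else):
    #   FROM scratch
    #   COPY app /app
    #   ENTRYPOINT ["/app"]
    docker build -t app:latest .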

~~~
steveklabnik
By default, Rust compiles all Rust code statically, but the standard library
depends on a libc. If you want to use MUSL, you can. If you bind to C
libraries, you may need to configure it or not, it depends on how the wrapper
is written.
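
With current tooling that usually looks something like this (using the standard
musl target triple):

    rustup target add x86_64-unknown-linux-musl
    cargo build --release --target x86_64-unknown-linux-musl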

~~~
seabrookmx
Thanks for the clarification!

There was a recent HN article that did a step-by-step using rust+musl on
Alpine. The "why" makes more sense now.

------
cwyers
> And since nobody is still able to compile things from scratch, everybody
> just downloads precompiled binaries from random websites. Often without any
> authentication or signature.

Apache has official mirrors that host repo files for various package managers,
so you can install using apt-get or whatever it is that replaced yum (dnf?):

[https://www.apache.org/dyn/closer.lua/bigtop/bigtop-1.2.1/re...](https://www.apache.org/dyn/closer.lua/bigtop/bigtop-1.2.1/repos/)

So precompiled binaries from official sources are certainly available.

~~~
bovermyer
I think the author's point was this:

"Unless you compile it yourself, you can't trust it."

~~~
cwyers
There are over 2.9 million lines of code in Apache Hadoop alone, not counting
dependencies. If you can't trust Apache, you can't trust Hadoop, regardless of
whether or not you can compile it yourself.

~~~
lbenes
There are nearly 10 million lines of code in LibreOffice, and yet I can and
have built it from source just by typing:

1. $ git clone git://anongit.freedesktop.org/libreoffice/core

2. $ apt-get build-dep libreoffice

3. $ ./autogen.sh && make

Just because something has a large code base doesn't mean we shouldn't be able
to build it from source ourselves.

~~~
lostcolony
Did you read all those lines yourself? Did you even confirm checksums matched
before running them?

I think that's the parent's point. You can build from source, but how do you
trust the source? Is it any more egregious to trust a prebuilt binary from a
specific website than it is the raw source? If you can't trust the binary
being hosted by the author/caretaker, can you really trust the source being
hosted or maintained by the author/caretaker?

~~~
pjz
I don't think his point is so much about the source as it is about updating N
containers. For instance, say there's a known libssl bug. Can you tell how
many of your containers are running that version of libssl? And how do they
get updated?

~~~
WiseWeasel
1) List the number of containers running pre-fix versions of images of libssl-
using server software. 2) Bump the version of the images you're using as a
base for your server images to post-libssl-fix and push.
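
Concretely, something along these lines (registry and image names invented):

    # what are the running containers actually based on?
    docker ps --format '{{.Names}} {{.Image}}'
    # spot-check the library inside one of them (assumes the binary is present)
    docker exec some-container openssl version

    # then bump the FROM line in the base image, rebuild fresh, push, redeploy
    docker build --pull -t registry.example.com/base:fixed .
    docker push registry.example.com/base:fixed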

------
kokey
I think part of the problem is that Docker is now the popular answer to the
problem of 'it works on my machine'. Unfortunately, 'works on my machine' also
involved going to the web site of a particular tool or library and following
the steps recommended for quickly trying the tool out, which gives you the
curl [http://somewebsite.com](http://somewebsite.com) | sudo bash situation, or
following the steps in a blog post where someone quickly compiled a bleeding-
edge version of it, with all the dependencies to build it, on their Ubuntu
laptop.

------
Fritsdehacker
I work for a hosting company and we host our own Openstack based public cloud.
We have a hard requirement that for production systems we build all binaries we use
from source. We actually build these in docker and use that to deploy to
production.

What I'm trying to say is that the one doesn't exclude the other.

And we actually use make quite extensively.

I do, however, see the OP's point: building from source hasn't gotten easier.

~~~
Fritsdehacker
In the case of Openstack though, the community made it really easy to build
docker containers from source.

------
endymi0n
I think the only reasonable answer is "it depends".

After the new updated base OS images for our systems after Meltdown and
Spectre were out, it literally took us ten minutes of human and about an hour
of machine time to recompile all of our containers, run the tests and deploy
them to production on our Kubernetes cluster, replacing the old insecure ones.
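
(For the curious, the redeploy side of that on Kubernetes is roughly the
following; deployment and image names are placeholders.)

    # point the deployment at the rebuilt image and let the rollout replace pods
    kubectl set image deployment/myapp myapp=registry.example.com/myapp:rebuilt
    kubectl rollout status deployment/myapp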

At the scale of our systems, any grumpy sysadmin would have spent at least
several days untangling the dependencies and carefully restarting all servers
in the correct order after some manually `sudo apt-get`ing (and probably
forgetting a few of the lesser used systems).

Sure, typing "FROM ubuntu" (which by the way _is_ a trusted and
cryptographically signed image, contrary to the OP's concerns) leaves me at the
mercy of whoever I trusted to compile that image.

Then again, what difference does it make to trust whoever compiled that
ubuntu-17.04.iso image I put on my CD?

Or, as Brian Tracy taught me to say in these situations: "You may be right."

~~~
kazen44
> At the scale of our systems, any grumpy sysadmin would have spent at least
> several days untangling the dependencies and carefully restarting all
> servers in the correct order after some manually `sudo apt-get`ing (and
> probably forgetting a few of the lesser used systems).

if you are operating at such a large scale, your sysadmin should be
automating things (not necessarily with Docker, mind you).

This automation has been possible on *nix for the better part of a decade by
now.

~~~
mmt
Agreed. As a (potentially) "grumpy" sysadmin, I wouldn't be manually apt-
getting or forgetting anything. In fact, one of the lessons I've learned is
that, at large enough scale, automated mass upgrades can do the forgetting for
me, so verifying (also automated) is important.

As for any "tangle" of dependencies that may exist, I've never seen that be
caused by any choice made by "Ops" but, rather, solely by those writing the
application code being deployed. As such, it would apply just as much to a
Docker image as to (what I view as) a traditional deployment.

------
komali2
I _do_ often think about how, by using large JS packages, it's very feasible
that in the daisy chain of NPM dependencies, somebody's managed to slip in
malware.

I'm not sure what to do about it other than just not using NPM at all!

~~~
Can_Not
Check out retire and auditjs from npm (there are also more non-free options).
You'll likely get some false positives (such as: who would host jQuery on a
public/untrusted CDN? or why would you even pass user input into that?), but
if real malware actually showed up, you can find out as part of your CI
process. Using the lockfile provided by yarn/npm is also a good way to reduce
accidental unnecessary package updates.
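
Roughly (output will vary, and npm audit needs a reasonably recent npm):

    # install exactly what the lockfile says, nothing newer
    npm ci

    # check installed dependencies against known-vulnerability databases
    npx retire
    npm audit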

------
nwmcsween
This has been my complaint from day one. Instead of Docker, have something like
ports or pkgsrc, and simply create tools that simplify sandboxing, like cgexec or
a Google Kafel -> eBPF filter; then all the package manager has to do is, well,
package. Docker IMO is a mudball of concerns that need to be separated.

~~~
gm-conspiracy
BSD jails?

~~~
kazen44
jails are actually very mature if you compare them to docker. They have a well
working security system and sane networking. (docker just does endless NAT
abstraction, which is terrible for certain use cases. Not to mention it breaks
a ton of useful networking features)

~~~
gm-conspiracy
I was asking a coworker the other day about using containers with a "shared"
mapped filesystem subdirectory for UNIX file socket communication versus
encapsulating everything in a TCP/IP network stack, but my coworker was
concerned about the security risks of a mapped filesystem on the containers.

I sometimes feel like a crazy person.

------
voltagex_
The rantiness and constant NSA references obscure some really good points.

------
_Kristijan_
"Feels like downloading Windows shareware in the 90s to me."

Bull's eye!

~~~
sofaofthedamned
That really is a superb quote.

------
sonaltr
not to be pedantic but why should containers be patched for security issues?

the entire point of containers (and orchestration systems) is the ability to
push updates without downtime.

Just update with a newer container....am I missing something?

Also about container security - a strict process internally can easily help
counter that (I believe Shopify had a nice talk about it at the Google Cloud
Platform event in Toronto - everything from using trusted images, running only
signed images and going through a security check for each layer).

EDIT: To add to it, please don't patch containers. The entire point of an
"image" is that if I run it locally and in my datacenter, it should behave
the same; live-patching them just voids this concept.

~~~
claviola
How many people do you see setting up a deploy pipeline that includes pulling
security updates into the base image and redeploying as needed? In my
experience, it's _much_ more common to see docker images that have been
untouched for months with zero accountability of what exactly is running
there.

~~~
secabeen
That's my real concern: old, out of date images. How will we handle another
OpenSSL-level vulnerability in 7 years, with bad code buried in containers
that haven't been updated in 4, and for which the build infrastructure is no
longer functional?

~~~
Spivak
This really isn't that different from having some pre-built statically linked
app still kicking around on your system with the source and/or build tooling long
gone.

There aren't really easy answers here. You can't fix bad software with more
tooling.

------
wwweston
Perhaps interestingly, I've had some similar complaints about package managers
like homebrew.

I've noticed over the last decade that a certain level of knowledge about
building from source seems harder to find via search or engaging in a
community forum/chat. The assumption is often that _everyone_ will be using
the package manager (I'd say that's _especially_ true on macOS, but it might
be an artifact of me spending more time there since it's been my most used dev
machine), so if you have trouble building something, the answer will
frequently be "just use homebrew -- whoever maintains the formula will already
have solved your problems for you."

There are two problems with this: first, that's presumably true for the source
build, and if you can hit a case that the source build doesn't work for,
chances are pretty good you can hit a case that brew doesn't work for (my
experience is that more often than not they go together). Second... I'm
happier to use package managers for applications meant to live on my machine
and never go live elsewhere (heaven knows there's lots of stuff I just want to
install and get on with my life rather than fiddle with), but for applications
I'm deploying, it seems to me it's generally wise if someone involved in the
project has a picture of the build details and dependency graphs in their
head.

Some of the automation we're throwing at operations is really convenient. It's
not _simple_ , though. And depending on how much forethought is put into it,
I'm starting to think of it as ... maybe not technical _debt_ , but a
technical _credit card_ which is super convenient at times, but also can
easily become debt if you're not careful with it.

~~~
FraKtus
I never install homebrew on any Mac I develop on. When you see all the
possibilities to customize projects, it's just ridiculous to want to build
something with a one-liner and expect it to take thousands of decisions for
you.

------
kbrwn
There are plenty of mechanisms in the container ecosystem to address each of
the problems the author states. Building software isn't all that it is cracked
up to be and it isn't very fun. Oftentimes build steps for open source
projects are not well documented, or the build process is kept inherently
difficult in order to push a paid product. Example: nginx.

------
JohnStudio
I think the entire stratosphere of DevOps is just about dead on the whole in
2018. In retrospect, after working with things like Docker and more specific
industry variations beyond the Amazon tech, it makes no sense to dwell on the
security/control of a dedicated systems admin professional, since the tools
are all outside the local domain anyway. The rest, from VoIP to IoT to
container services, is managed wholesale. SysAdmin is a dino in the age of
distributed tech and outsourced IT resources.

I'm a programmer, so I'll take heat for it .. but I don't see a need for them
anymore.

~~~
nunez
Nah, you’re right

The entire purpose of DevOps IMO was to close the gap between sysadmins and
devs through code. Devs doing everything, including infrastructure, was and is
the entire plan! Public cloud made this super duper easy.

The problem is devs don’t want to manage core infrastructure (VPCs,
networking, modules for deploying lambdas and database clusters and container
orchestration clusters, etc) and _somebody_ has to do that stuff

Ideally, those would just be features like any other software team, as it’s
all API calls at the end of the day. But lots of companies have issues with
structuring their platform teams like software teams because its “not
software” even though it is

This problem is more deeply entrenched at large companies with hundreds of
millions of dollars of compute that they own, managed by an old-school IT
function that can’t fathom the idea of either giving it up or making it
accessible like cloud, and would rather pay VMware tons for tools that make
teams even slower than have their sysadmins become developers

Then there’s the whole protectionist “You’re taking my job” and “devs can’t
possibly know this much about $infra” that isn’t dying off anytime soon

It’s complicated

~~~
mmt
> Ideally, those would just be features like any other software team, as it’s
> all API calls at the end of the day. But lots of companies have issues with
> structuring their platform teams like software teams because its “not
> software” even though it is

Just because it is _implemented_ as API calls at the end of the day doesn't
necessarily make it not "not software" (if you'll pardon the double negative),
at least in the sense that I believe you mean.

To wit, I believe you're suggesting that if something can be expressed as
code, it's all "software" and can therefore be designed, written, and
maintained by the same kinds of experts, software developers.

I disagree, because the nature of the infrastructure-as-code code is too
different from the application software code.

One could, similarly, express an FPGA configuration in code, but a software
developer would not automatically be good at programming one. This is even
likely to be true for less extreme examples, such as programming expertise not
automatically transferring from general software (for lack of a better term)
to code that works well on, say, GPUs.

In the case of IAC "software", a more mature design is more likely to resemble
traditional sysadmin/network/security best practices than application software
features. It could also have significant financial side effects if there's an
error, assuming public cloud, which could require more stringent standards of
control, review, and quality, especially if a company ends up in SOX
territory.

>Then there’s the whole protectionist “You’re taking my job” and “devs can’t
possibly know this much about $infra” that isn’t dying off anytime soon

I'm sure some of this exists, but my own experience is an attitude not that
devs _can 't_ know a certain amount about infrastructure but that they simply
_don 't_, often because they actually don't _want_ to.

Perhaps they fear that if they do end up knowing that much, they'll end up
being the ones to manage that core infrastructure, which you identified that
they don't want to do!

------
philjohn
Run your own artifactory, have that as your docker repository too, only allow
the use of jars that have been vetted ... it's what we do where I work, it's
not hard, and it solves 99% of these gripes.

------
rb808
He's right that security right now is a bit unreliable, but random sysadmins
writing scripts and manually configuring things is never a good guarantee
either.

As a dev I'd like nothing more than k8s or similar to be the standard platform
to run applications, so that everything is standardized and I don't ever have to
require a sysadmin. I think this is already possibly more secure than the
"good old days", and in the future I expect it to be even more so.

------
ppeetteerr
Regardless of your take on the article, could we, at least, agree that all
build tools suck in their own special way after a certain level of complexity
is reached?

------
nineteen999
Thank heavens people are starting to point this out.

The last devops team I worked on had an obsession with shiny. Never mind that
they couldn't bootstrap a new base database for their application anymore or
automate the entire application deployment (even with a phased approach) on
either VMware or AWS. They wanted to keep piling new tools (often with sub 1.0
version numbers) of new tools on top of an unstable foundation, and would just
shrug when it fell apart in production (which it commonly did).

I tried pointing out to them that by giving commit access to their internal
puppet git repository to every developer in the building, they had effectively
given root access to them as well. All I received were shrugs and blank looks
all around.

One thing from the article that doesn't seem to be discussed enough here in
the comments is the trend of pulling random Docker images from the internet and
deploying on your infrastructure simply because it's easier to integrate
random versions with feature X than maintain your own builds, or work with the
vendor's provided packages to achieve the same result. The security
implications of this in particular has been bugging me for years.

~~~
mmt
That helps to explain why there are _so_ many DevOps job postings with nearly
identical lists of tools, especially for startups here in the SFBA.

------
89vision
To bake a pie from scratch, first you must compile Hadoop.

------
zuni
It is a mess, but one part of the ever-transitioning "sysadmin role" that is
valuable is being able to code and understand it. That being said, good
sysadmins (rare) can and do.

Isn't most of this being driven by society in general though? Everyone wants
everything now... we're generally feeding this trend to do whatever it takes
in the shortest amount of time.

------
nunez
I think a lot of stuff that I used to need to know (OS kernel internals,
hardware specifics, relatively deep network knowledge) is less useful in the
age of public cloud, containers and immutable infrastructure. There isn’t
really a need to tune a kernel for performance or do deep troubleshooting to
root-cause OS issues and ensure uptime; if something seems off, kill the
machine or container and let auto scaling or the container orchestrator take
care of it. If something is wrong with cloud networking, call AWS, as the most
we can do is prove that it’s on them. Etc.

While I still know these things, I haven’t had to employ them in a while. I
will probably use those even less as I start getting into management. That’s a
bit saddening.

But the new stuff I’ve learned over the years, namely, designing systems like
I’d design software, is amazing. Applying TDD to infrastructure isn’t
something I thought I’d be doing, but here we are, and we have “modern DevOps”
tooling (and lots of other things) to thank for that.

~~~
auslander
> There isn’t really a need to tune a kernel for performance

Maybe, but knowing SELinux, kernel namespaces and the syscalls behind Docker
seccomp profiles is crucial for security, and security is the Next Big Thing.
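
For example, a custom profile is just a flag away (the JSON profile itself is
yours to write and maintain):

    docker run --security-opt seccomp=/path/to/profile.json myimage:tag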

Sad you are getting into management. Let's push to undo the notion that a
manager is worth more salary than a lead engineer. I'm pushing :D

~~~
nunez
Why is that sad?

~~~
auslander
Well, you said yourself:

> I start getting into management. That’s a bit saddening

Less one engineer in the world?

~~~
nunez
Oh! I’m not at all sad about getting into management; I’m saddened that I
haven’t put the lower-level stuff I know (sort of) into good use in a while.

I want to go into management! We need more engineers in management!

------
arca_vorago
I was all ready to rant and then read the first part: "I’m not complaining
about old-school sysadmins. They know how to keep systems running, manage
update and upgrade paths.

This rant is about containers, prebuilt VMs, and the incredible mess they
cause because their concept lacks notions of “trust” and “upgrades”."

Fair enough, rant preemptively avoided.

------
myWindoonn
Reproducible build tools are a thing. Try Nix sometime. nixpkgs has, say,
Hadoop. nixpkgs can make Docker containers too.
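
For example (the attribute name may differ between nixpkgs versions):

    # build the hadoop derivation from nixpkgs; the result is a symlink into
    # the immutable /nix/store
    nix-build '<nixpkgs>' -A hadoop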

~~~
davexunit
nixpkgs isn't rigorous about reproducible builds. Hadoop is actually a great
example of this. They do not build it from source, a prerequisite for calling
a build reproducible. Instead, they download the binaries that the Apache
project has already built and run patchelf on them to make them work.

[https://github.com/NixOS/nixpkgs/blob/master/pkgs/applicatio...](https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/networking/cluster/hadoop/default.nix)

~~~
ixxie
That is true, but in my experience in small scale personal desktop and cloud
computing, NixOS is in practice reasonably reproducible on the system level.

For me in terms of interface, the strength of the Nix ecosystem is declarative
system, ops and service configs in the same language used for package and
build specification. The technical strength is striving towards reproducible
builds by hashing the dependency tree to build an immutable store. Yes
sometimes this means getting binaries but you can always pin the package
version or even have multiple versions in tandem. The practical upshot of this
immutability is system level rollbacks, which are generally reliable although
there are ways to break it. Yes there is garbage collection.

Nixpkgs is quite an achievement, and yes it has its warts, but we are
working hard to make it better. If we manage to shape up the data science side,
I will try it out at work too. I'm very curious how it might scale.

------
jcastro
> Essentially, the Docker approach boils down to downloading an unsigned
> binary, running it, and hoping it doesn’t contain any backdoor into your
> companies network.

I don't get why people still claim that this is "the Docker approach". This is
not the Docker approach; everyone hopefully knows this is an anti-pattern by
now.

~~~
gaius
_I don 't get why people still claim that this is "the Docker approach"._

Because it is the Docker approach (as distinct from the _container_ approach
necessarily).

~~~
Ceezy
You can download images from trusted sources with Docker

------
lasermike026
I'm out! I'm done. I started as a developer. I then migrated to sysadmin, then
systems engineer, then devops, and back to developer. I am done with playing
the platform game. None of it matters. What matters is writing code that does
work leading to profits. Always be coding. CAPEX over OPEX.

------
nunez
> None of these “fancy” tools still builds by a traditional make command.
> Every tool has to come up with their own, incompatible, and non-portable
> “method of the day” of building.

100% this. Learning Make a few years ago was one of the best decisions I’ve
ever made. It’s simple (until it’s not), straightforward and available on just
about any Linux and UNIX installation (version compatibility aside).

Trying to make the case for POSIX-compatible Bash scripts has been tough,
though.

I also agree re: Docker. While you _can_ secure a container image with USER
statements in the Dockerfile and by giving containers just the capabilities
that they need to do their job, it is WAY too easy to run them as
root and give them privileged access to things. It should’ve been the other
way around. Also, every container orchestration platform seems like a really
elaborate hack.
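
A rough sketch of the non-root pattern (names invented; the adduser flags are
the Alpine/busybox ones):

    # in the Dockerfile:
    #   RUN adduser -D -u 10001 app
    #   USER app

    # at run time, drop everything and add back only what the process needs
    docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE --read-only myimage:tag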

------
Floegipoky
I feel like this has a lot to do with the "cattle, not pets" mindset and
disposability of modern infrastructure. For instance, the author talks about
patching a container. But that's not idiomatic; instead, you would include the
patch in your build pipeline and deploy the new image.
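
In other words, something like this in the pipeline instead of apt-get upgrade
inside a long-running container (the tag scheme is made up):

    # re-pull the base image and ignore cached layers so fixes actually come in
    docker build --pull --no-cache -t myapp:2018-01-05 .
    docker push myapp:2018-01-05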

------
rambossa
> Ever tried to security update a container?

Wouldn't the approach for security updates be to replace the container?

~~~
icebraining
Depends on the approach, you can use mutable or immutable containers. In fact,
the OpenVZ VPSs that were at one time reasonably popular were just containers.

------
batoure
I don't disagree with the OP here per se.

However, I do think the OP made a mistake by using Hadoop as the example. Hadoop
has much more in common with an OS at this point than with a single application.
The ability to download a single component of the ecosystem is mainly to
support small-scale testing and development.

You wouldn't build Debian by going and getting each piece of the Linux kernel
from its source, building those, then building all the pieces of the middleware
up, and so forth... that is the whole point of having Debian in the first place.

Do certain complex software ecosystems need better support for fingerprinting
their builds? Yes. Does all this mean the sky is falling... probably not.

------
disordinary
There's always been proprietary code installed on systems; who knows what that
big Oracle database is actually doing?

Are these sysadmins actually looking through the open source code that they're
compiling to make sure that there isn't a security flaw in plain sight?

With Docker you can get containers maintained by a trusted source, as a
sysadmin you don't have to deal with all the hassle of upgrading things, you
can just replace the container with the latest version. With Docker content
trust the containers are signed.

There have always been cowboys out there and that has nothing to do with the
tech stack.

------
pmontra
If a company has a threat model and a list of business goals then they would
have at least a risk matrix and they might decide what to do: either go the
slow way and build software they can trust or accept the risks, backdoors and
all the rest. Most companies skip all of that and hope for the best. Not
always a conscious decision.

Sometimes they get their unpatched servers encrypted by some ransomware,
remember they don't have any backup, close shop and move on to the next
business idea. I've seen that happen.

------
hodgesrm
I see this as more of an opportunity than a problem. The fact that Hadoop,
Kubernetes, and other platform-like systems are complex to manage properly
with good attention to security implies they should be delivered as cloud
services rather than having everyone run their own. This enables K8s users to
focus on apps while offloading management to specialists who can focus on
running the services well.

If you're operating at large enough scale, you can bring the "cloud service"
in-house.

~~~
scurvy
One of the gating factors here is both the speed of Kubernetes development
(move fast and break all the things) and the terrible state of accompanying
documentation.

If "they should be delivered as cloud services" is some sort of k8s apologist
stance for its sorry state of maturity, then we got issues. OTOH, if it's "You
shouldn't run it in house unless you have an army of people to read every new
commit", that's wrong, too.

~~~
hodgesrm
First of all let me be clear I'm not an expert in K8s and certainly not an
apologist for bad software.

On the other hand there's a level of complexity in distributed systems that is
impossible to avoid even in stable infrastructure. You have a design choice of
trying to make the system as easy as possible to operate (at the cost of other
features) vs. finding operating models that make it less of an issue.

Personally I would rather spend time futzing around with my applications that
run on kubernetes vs. trying to run kubernetes itself. It would be sufficient
if Kubernetes services were portable across a marketplace of providers so I
could pick a place to run my applications.

~~~
hodgesrm
Also as far as security is concerned it appears to me that a lot of people are
deploying technology that they simply don't understand. This is not just a
problem with Docker but with apps from ecosystems running on npm and pip.

You _can_ build images securely with Docker but it requires building them
yourself, using private registries, checking carefully for vulnerabilities,
and testing. If you don't want to do this, pay somebody else to do it right.
There's no free lunch.

------
YouAreGreat
> The first internet worm spreading via flawed docker images?

Good question: why don't we _see_ exploits of all that implicit trust to the
degree that, e.g., the DOS shareware scene gave your PC visible virus
infections, or the early internet gave us worms that would bog down the whole
net?

My attempt at an answer: Because the black hats aren't hobbyists anymore.
_Visibility_ is for amateurs.

~~~
icebraining
Are you saying that every new black hat is immediately a professional? Or that
there aren't any new black hats? In every other activity of human life, new
amateurs appear as the older ones become professionals. Where are the visible
exploits from the new amateurs?

~~~
YouAreGreat
> Are you saying that every new black hat is immediately a professional

To a degree that's what I'm saying. Sentencing for "computer crimes" when
perpetrated by non-corporate entities has reached epic levels. I'm sure
qualified talent takes that into account.

~~~
icebraining
Has sentencing significantly increased around the world? The US is not the only
source of malware.

------
voltagex_
Uh, the Bigtop project mentioned in the Debian wiki doesn't seem to be working
either.

[https://ci.bigtop.apache.org/job/Docker-
Sandbox/112/BUILD_EN...](https://ci.bigtop.apache.org/job/Docker-
Sandbox/112/BUILD_ENVIRONMENTS=debian-8,STACK=hdfs,label=docker-
slave-06/console)

Does anyone care?

------
awakeasleep
I want a prediction market to offer bets on which big company is affected by
the breach that starts the reckoning.

~~~
politician
After the reckoning, how do you propose to collect your winnings?

~~~
ddingus
Whuffie

------
_bxg1
Seems like the security issue is easy to fix? Only installing binaries you
trust and comparing them against a hash is basic computer admin 101; it's not
specific to Docker. Just make sure you trust whoever made the docker image,
and that they give you a hash to compare it against.
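
In Docker terms that can be as simple as pinning digests and turning on content
trust (the digest below is a placeholder, not a real one):

    # pin to an exact content digest instead of a mutable tag
    docker pull alpine@sha256:<digest-published-by-someone-you-trust>

    # refuse unsigned images entirely
    export DOCKER_CONTENT_TRUST=1

    # for plain downloads, check the published checksum
    sha256sum -c myapp-1.0.tar.gz.sha256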

------
tw1010
Isn't this just an example of automation? Other than sunk cost, by what
argument is it reasonable that the sysadmin job should in any way deserve to
be protected or ought to continue to be a thing? To my ears it sounds like
containers have kind of solved the problem.

~~~
kazen44
How have containers solved maintaining production systems?

It's just moving the abstraction layer higher up the stack.

Also, who will design and build security for these systems? Or care about low-
level performance?

A ton of devs don't care about prod in my experience, and just want to ship
shiny features.

Containers and VMs are not very different from a deployment or maintenance
standpoint. The deployment strategies used with containers could be done with
(lightweight) VMs a decade ago. (Heck, jails have been used for nearly 25
years.)

Also, sysadmins have been automating stuff since forever. Without automation
one inherently has a very unstable system.

------
Xaena
I'd love to see a revised version of this for 2018 (with fewer NSA tinfoil
references).

~~~
auslander
You think NSA dissolved itself in the last 3 years?

~~~
Xaena
Nothing about my comment implies that unless you are trying to be
argumentative.

------
nova22033
>And since nobody is still able to compile things from scratch

Is it true that we had better security when we had to "compile everything from
scratch", whatever that means? Did people compile the OS from scratch?

~~~
acheron
_Did people compile the OS from scratch?_

...yes.
[https://www.freebsd.org/doc/en/books/handbook/makeworld.html](https://www.freebsd.org/doc/en/books/handbook/makeworld.html)

P.S., make sure you have a spare couple days before running _make buildworld_
on a 150 MHz Pentium.

------
sysbell
The fact is, many open source software projects provide their own Dockerfile
(and in many cases, images). Using these is akin to downloading and deploying
a release tarball.

------
Nano2rad
This complexity could make companies depend on commercial versions even for
open source software.

------
auslander
Security is what the article is about, and it is spot on.

Cloud infra today requires coding skills. That's why so many web developers are
trying to do infra too, without systems or infrastructure knowledge.

The true cloud automation engineer, a.k.a. DevOps, a.k.a. SRE, is the sysadmin
who learned to code more than bash: Python, REST, classes, methods, objects.

Or the web developer who learned Linux and networking in depth, down from
userland to kernel system calls and routing protocols.

These people are rare, hence the ruckus and mayhem. You will be hacked.

------
thepumpkin1979
(2015)

~~~
sctb
Thanks! We've updated the headline.

------
bovermyer
TL;DR version:

"No one cares about verifiable security anymore."

The author defines a category of people and then attacks it for A) not
verifying the integrity of its systems, and B) not doing things with make.

A is valid and something that needs attention. B is missing the point.

------
gigatexal
Ugh, makefiles. Kill me. Give me a bash script or give me death.

~~~
triztian
What's wrong with Makefiles? They're a quick and easy way to describe how to
build simple and _not super_ complicated systems.

I've found this to be a great resource, hope you find it useful:

* [http://gromnitsky.users.sourceforge.net/articles/notes-for-n...](http://gromnitsky.users.sourceforge.net/articles/notes-for-new-make-users/)

~~~
gigatexal
It’s ironic. I love Python, but the syntax of makefiles was a pain. They’re
rather cryptic. I guess they’re a staple; time to conquer them once and for
all.

------
golergka
> Back then, years ago, Linux distributions were trying to provide you with a
> safe operating system. With signed packages, built from a web of trust. Some
> even work on reproducible builds.

> But then, everything got Windows-ized. “Apps” were the rage, which you
> download and run, without being concerned about security, or the ability to
> upgrade the application to the next version. Because “you only live once”.

These two are both very valid and great approaches for solving different
problems. Sometimes you're just a regular user without any valuable data that
just wants to do things in a quick and convenient way. And sometimes, you're a
system administrator that needs to evaluate the whole build pipeline and plug
all the holes for production deployment.

Both alternatives should exist, and one doesn't cancel the other.

~~~
pferde
Yes, but when "system administrators" with valuable data just want to do
things in a quick and convenient way in production, we've got problems.

