
Running servers and services well is not trivial (2018) - kiyanwang
https://utcc.utoronto.ca/~cks/space/blog/sysadmin/RunningServersNotTrivial
======
kev009
Neither is any complex manufacturing technique, chemical or pharmacological
synthesis, drilling for oil, building a mile of interstate highway, assembly
of automobiles, etc., but I don't think someone with 2 years of experience
could cause either significant process damage or significant process
improvement in these complex and expensive activities.

I am therefore regularly shocked by how much money is poured into or otherwise
spent on tech and tech workers and how laughable the result is. I've really
never seen a server infrastructure at any company where I haven't shuddered in
horror within a few hours of looking at the design, practices, or systems
themselves. When I was much more junior I was perplexed by the sudden appeal
of e.g. AWS EC2, but then I saw from the inside multinational conglomerates
try to acquire and deploy servers (i.e. just the logistics that a non-
technical person could handle, no software) and now I understand. The irony to
me is that if you fail at logistics, you have near zero chance of doing
anything else right, so cloud doesn't reduce your liability much unless it is
fully managed like GApps or Salesforce. This should be a huge warning sign for
a variety of stakeholders, but money's cheap and the livin's easy right now.

~~~
nmfisher
> I am therefore regularly shocked by how much money is poured into or
> otherwise spent on tech and tech workers and how laughable the result is.
> I've really never seen a server infrastructure at any company where I
> haven't shuddered in horror within a few hours of looking at the design,
> practices, or systems themselves.

Yet, all these companies remain in existence.

Maybe what you consider to be "laughable" is actually "good enough" when it
comes down to dollars and cents?

~~~
kev009
Existence doesn't mean much in times of quantitative easing; pick your
favorite punching-bag company in recent memory that is still extant. If you
don't understand the economic situation backing all this, it's hard to discuss
your rebuttal further, so we need to checkpoint there before I invest any time
on more responses.

The issue is largely orthogonal to my own skillset because one needs only to
view quality regressions in order to assert my point, not be the cause or hero
of them (ain't philosophy generous?!). The times a system operator with 2
years of experience could go in and wreak havoc on a mainframe operation were
not particularly common, and the standards and practices (e.g. airgapped
backups on tape, geographically dispersed sysplex, etc.) that ensured business
continuity were generally planned in right from the initial purchase order
with the vendors' help, once upon a time. In this decade people can and did
launch massive financial exchanges on MongoDB when it had severe and well-
known issues.

------
geocrasher
I feel this way every time somebody says this about mail. "Why not just
install Postfix on a cheap VPS and run your own mail? It's easy!"

Running a mail server _is_ easy. Running a reliable mail server with good
deliverability is _not_. Not being spammed into the next millennium is _hard_.
Yes, it's possible (I do it on some inexpensive OVH VPSs) but it takes time to
come up with a solution that works without being too restrictive. Plus, spam
is a moving target.
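
As one small illustration of what "deliverability" involves: a rough sketch of
checking whether a sending domain even publishes an SPF record. This assumes
the third-party dnspython package, and the domain is just a placeholder.

    # Sketch: does this domain publish an SPF record? (one of many
    # deliverability prerequisites, alongside DKIM, DMARC, rDNS, ...)
    import dns.resolver  # third-party package: dnspython

    def has_spf(domain: str) -> bool:
        try:
            answers = dns.resolver.resolve(domain, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return False
        # TXT record data arrives as a tuple of byte chunks per record.
        return any(b"v=spf1" in b"".join(r.strings) for r in answers)

    print(has_spf("example.org"))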

~~~
holri
I have been running Debian stable (exim4, spamd) on my own little hardware as
my own email server, essentially out of the box, for years now, and have never
had any issues at all besides the initial config. Spam filtering works better
than at commercial providers like GMX or Posteo. I have unattended-upgrades
installed, and every few years I do an apt-get dist-upgrade. That's it, for
years now.

~~~
throwaway3157
I also used to manage exim4/spamd long ago. Part of the problem, as I
remember, is that email sent from your server can be marked as spam in someone
else's inbox (e.g. you are blacklisted without knowing it). So while you
haven't had issues on your side, it's hard to prove that you didn't have
issues outside your box. This was especially common with VPS datacentres, as
their IPs routinely showed up on blacklists.
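
A minimal sketch of one way to check that from the outside, using only the
standard library; the DNSBL zone and the IP below are examples, not
recommendations.

    # Sketch: is an IPv4 address listed on a DNS blacklist (DNSBL)?
    import socket

    def is_listed(ip: str, dnsbl: str = "zen.spamhaus.org") -> bool:
        # DNSBLs are queried by reversing the octets and appending the zone:
        # 203.0.113.7 -> 7.113.0.203.zen.spamhaus.org
        query = ".".join(reversed(ip.split("."))) + "." + dnsbl
        try:
            socket.gethostbyname(query)  # any A record means "listed"
            return True
        except socket.gaierror:
            return False

    print(is_listed("203.0.113.7"))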

~~~
holri
My mini server (own hardware) is in colocation in a datacenter in the city
where I live. I have never had any issue with missing emails on the receiving
side, nor was my IP ever on a blacklist.

------
oceanghost
This article is talking about the difference between theory and practice.

I admin a small, low-traffic business site for a friend. In theory, running a
small PHP site should be easy.

In practice, I cannot think of a week that has gone by where everything just
"worked": script kiddies, hackers, web components suddenly disappearing,
YouTube audits, random bugs that aren't reproducible.

Large providers will always have an advantage in that they can identify
adversarial activity before it reaches my site, and then apply what they learn
to every site they operate.

------
gfodor
For an upcoming project we're using AWS Marketplace + CloudFormation in
combination with the community edition of Chef Habitat to try to solve this
problem. Once someone deploys the CloudFormation template, it sets up:

\- Email (SES)

\- SSL certificates + renewal (ACM + Let's Encrypt)

\- Backup + Restore (RDS snapshots + AWS Backup for EFS)

\- ALB (if desired)

\- CDN (CloudFront)

\- Firewall

\- Auto scaling database (RDS pgsql serverless) with automatic pausing

\- Auto scaling storage (EFS)

You can adjust capacity by just dialing up or down the ASG, and our Elixir app
auto-clusters using the Habitat ring for service discovery. Packages are
upgraded when new versions are pushed to our package repository. All binaries
run in jailed process environments scoped to Habitat packages, with
configuration management and supervision handled by Habitat.
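
For what it's worth, "dialing up or down the ASG" is a single API call (or a
console knob); a rough sketch with boto3, where the group name is a
placeholder:

    # Sketch: scale the app tier by changing the ASG's desired capacity.
    import boto3

    autoscaling = boto3.client("autoscaling")

    def set_capacity(group_name: str, desired: int) -> None:
        # desired must fall within the group's MinSize/MaxSize. New instances
        # run the user data script, join the Habitat ring, and pick up config.
        autoscaling.update_auto_scaling_group(
            AutoScalingGroupName=group_name,
            DesiredCapacity=desired,
        )

    set_capacity("app-servers", desired=4)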

This is a non-container approach, focusing on VMs. However, the theory is that
this will be pretty much turnkey for a scalable self-hosted product on AWS,
including software updates. It's hard to say how well the theory will hold up
in practice, but I'm optimistic. Avoiding a mess of microservices was fairly
important in making this kind of thing possible: we have a few services, with
a dominant monolith in Elixir.

~~~
mwcampbell
This is the first I've heard about Chef Habitat in a few years. I'm curious
about why you chose it. Also, how do you configure new VMs? Do you have a
custom machine image, or do you use a stock image and have a startup script
(via EC2 user data) that sets it up? I've spent a little time looking at
Habitat docs, and I find it unfortunate that they're trying so hard to jump on
the container bandwagon when apparently it also works well on bare metal or
VMs.

~~~
gfodor
Our project was greenfield 2.5 years ago, and in evaluating the ecosystem we
ran into the work going on with Habitat while trying to figure out if there
was a VM analog to the stuff going on with k8s and containers (which were a
very scary prospect to dive into). Habitat seems to be a good tech stack for
solving a variety of operational problems for which you typically pull
different products off the shelf and duct-tape them together -- particularly
if you are focused on VMs.

It was a bit of a bumpy ride, but the product has stabilized enough that we
have confidence in it, and it turns out to be quite a good fit for the self-
hosting use case. Basically, it's an all-in-one solution for packaging,
isolation, service discovery, configuration management, and deploys. So it was
relatively easy to transition our production deployment bits into a packaged-
up solution deployed to third-party servers, with all of the bells and
whistles.

Our setup is a fairly minimal custom AMI that just installs the basic package
dependencies for Habitat and configures the supervisor. Everything else is
bootstrapped within Habitat, including our own proprietary configuration
management service, which provides a nice web-based GUI for configuring the ring
state. The user data script just does some basic UNIX setup and DNS
initialization and then does a bunch of Habitat service loads. By abstracting
over all configuration via Habitat, a unified interface exists for configuring
all the services across all the machines, regardless of their underlying
configuration management approach, programming language, conventions, etc.

------
lykr0n
This is true to some extent. I'm building out the infrastructure for my
company, and there is a lot of effort on my part required to automate stuff.

Something like Kafka requires figuring out the configuration you want, putting
that into Salt, adding the correct configuration options, deploying it and
making sure ZooKeeper works, and then generating certificates and whatnot.
It's not a simple process.

Setting up monitoring and other things like floating IPs is a pain. Then there
are the custom wrappers for Terraform scripts and the other components
required to deploy the systems you need to run an app. It's a lot.

~~~
anonytrary
Ugh, this comment triggers my hatred for the current state of infrastructure.
It's just configuration hell at this point. We thought all of these wrappers
and services would make infrastructure easy, and they did to a certain degree,
but they also created brand new problems that are arguably just as annoying as
the old ones.

Edit: As a developer, I would rather sketch out the infrastructure that I
need, then hand it off to someone whose entire job is to set up
infrastructure. I work at a small startup where the devs are still doing a lot
of infrastructure, and all it does is create technical debt. Without a solid
team that owns infrastructural concerns, developers just end up digging
themselves a grave.

~~~
weberc2
I think the problems are mostly around configuration that is so complex and
redundant that we need programmatic abstractions to deal with it—we need
something like a general purpose programming language to DRY up and simplify
that configuration, but the powers that be are clinging desperately to YAML
(despite encoding a shitty, half-baked AST on top of it a la CloudFormation or
“generating” YAML with text templates a la Helm) as the human/configuration
interface, presumably because “it's as easy as YAML!” is such a good marketing
schtick.

A certain amount of configuration complexity will always be there, but there’s
still a lot of incidental complexity that could be cut away if we just
generated these configs with a general purpose language.
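
To make that concrete, a toy sketch of the idea: describe the repetition in a
real language and emit the machine format at the end. The resource shape below
is ordinary CloudFormation JSON, but the helper and the queue names are
invented for illustration.

    # Sketch: generate a CloudFormation template instead of hand-writing YAML.
    import json

    def make_queue(name: str, retention_days: int = 4) -> dict:
        # One function captures the boilerplate repeated for every queue.
        return {
            "Type": "AWS::SQS::Queue",
            "Properties": {
                "QueueName": name,
                "MessageRetentionPeriod": retention_days * 86400,  # seconds
            },
        }

    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            f"{name.capitalize()}Queue": make_queue(name)
            for name in ["orders", "emails", "audit"]
        },
    }

    print(json.dumps(template, indent=2))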

~~~
imtringued
There are worse things than YAML. Dealing with hundreds of snow flake
configuration formats that do not follow common sense rules cost me more time
than working around mistakes in YAML. There are obviously some really, really
stupid ideas like generating YAML with text templates but the problem isn't
with YAML. It's the tool that generated the YAML file.

------
michaelbuckbee
This is why I love Heroku - it hits this really sweet spot where you can
deploy apps to it fairly easily (Docker or their packaging system) and skip
huge chunks of that "going to bite you eventually" list of admin tasks.

~~~
ryanar
Depending on your org size, GitHub/GitLab could still be cheaper than Heroku,
and the software is much better than any FOSS thing you can find.

~~~
ativzzz
Typically you use both: GitHub for the git repo and Heroku for automatically
deploying from GitHub.

------
eandre
This is certainly true, but it strikes me that the solution must be a move
towards simpler primitives so that managing a cloud-based piece of software
doesn't require so many moving parts.

~~~
jerf
I've mentally tried to spec out what that would look like. It's weird and very
unlike how we operate today. It is not clear to me that anything I've mentally
sketched out is a win even if it were magically manifested for me with no
effort.

I don't think this is accidental complexity, it's essential complexity. To the
extent that it seemed easier in the past, it's because we ignored some of that
complexity and paid the price. Putting up a truly production-grade service is
fundamentally hard.

Now, I think it probably will get easier over time as we grapple with these
problems. Integrating with auth, for instance, should be easier, and that can
be solved with some more code, and formal and informal standards. It's not
_all_ essential complexity. But I think a good deal of it is, or at least it
is from anything remotely resembling our current perspective.

~~~
avmich
> I don't think this is accidental complexity, it's essential complexity.

I think this is the kind of essential complexity we faced when developing
operating systems. OSes are fundamentally helpers, which don't solve the
application problem but make solving it easier (a good OS makes it easier by a
lot).

So we can and should use the results we got from OS development.

------
galaxyLogic
I think what is needed is a statically typed systems-integration language.
Problems are caused when there is nothing to "type-check" that the values you
write into config files are references to objects of the correct type.

Static typing would also support editors that let you choose from allowed
values.
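
A rough sketch of the kind of checking I mean, using Python type hints; the
field names are invented for illustration, and a checker like mypy (or an IDE)
does the enforcement rather than the language runtime.

    # Sketch: a typed config schema that a checker can validate and an editor
    # can auto-complete (e.g. offering the allowed LogLevel members).
    from dataclasses import dataclass
    from enum import Enum

    class LogLevel(Enum):
        DEBUG = "debug"
        INFO = "info"
        ERROR = "error"

    @dataclass(frozen=True)
    class ServiceConfig:
        port: int
        log_level: LogLevel
        replicas: int = 2

    ok = ServiceConfig(port=8080, log_level=LogLevel.INFO)

    # Rejected by the type checker before anything reaches a server:
    # bad = ServiceConfig(port="8080", log_level="verbose")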

Maybe such a language exists? If not, why not?

~~~
Qasaur
This already sort of exists in the form of Nix/NixOS/NixOps. The entire system
configuration is deterministic and specified in a configuration file, and it
has its own little functional DSL called Nix that is used to specify system
configs and how to build software packages.

I just started using it, and I think it's probably one of the best software
ensembles I've ever used in my career. It completely knocks Docker, K8s, etc.
out of the water.

~~~
nh2
Yes, the Nix ecosystem makes running both servers and services much easier
than any other tools I have used.

I describe the deployment I run with it here:
[https://news.ycombinator.com/item?id=21468506](https://news.ycombinator.com/item?id=21468506)

Nix is not typed (it would be slightly better if it were), but everything is
evaluated before it hits your servers, which allows for lots of static checks.

I started writing a tutorial on NixOps here if you want to learn it:
[https://github.com/nh2/nixops-tutorial](https://github.com/nh2/nixops-tutorial)
(only has 1 part so far, I'd like to show how to bootstrap a Consul cluster
and distributed file system next).

------
feedbeef
A real account of just how "not trivial":

[https://nickcraver.com/blog/2016/02/03/stack-overflow-a-tech...](https://nickcraver.com/blog/2016/02/03/stack-overflow-a-technical-deconstruction/)

That said, thanks to VPSes and the Raspberry Pi, it's fun and extremely
inexpensive to privately deploy and maintain semi-reliable personal services.
Anything beyond that requires immense planning, expense, and dedication.

------
outsomnia
It's a good article and basically correct, but I don't agree with the
conclusion "why not pay someone else to do it". It may scratch your itch in
the short term, but in the long term you have no agency if you stick all your
fingers into the finger-trap of GitHub Actions, metadata, and other EEE. It's
literally Microsoft at the wheel as well.

Similarly, free CI services are very attractive, but when they stop being
free, or die, and you have a lot of investment in tests specific to their
infrastructure spread over all your apps -- and again no agency to keep it
going yourself -- it's less attractive.

"Why not pay someone else to do it" glosses the privacy and security results
are not the same when you pass all your email or IP to a large foreign company
who may compete in some of your markets, compared to doing it in-house.

Yes, it can be unexpectedly difficult to do even a small thing securely and
well. But that doesn't mean it's not the right approach.

------
snicker7
Does anyone here have any experience with Juju? From how it's marketed, it may
alleviate much of the burden of setting up infrastructure (especially if you
use charms provided by the charm store).

------
cmhnn
Bonus points for the understatement in the title, which caught me at just the
right time to lol.

Will skip reading the comments because I know some k8s fanboy or some other
type of religious enthusiast will insist all of it is solved.

------
VvR-Ox
Like in other business domains this is a "make or buy" decision.

To decide, of course, you need enough data about the situation at hand:

\- initial costs

\- costs of keeping the system running

\- other pro/con items like "availability" ( _e.g. you can use a local server
even if there are issues with your company's internet access_ ) that can be
prioritized and rationalized with methods like cost-utility analysis or
cost-benefit analysis

I think it is hard to give general advice on what to do in such cases.

------
nooyurrsdey
To expand on this, not much of what's done in managing software applications
is "easy". People so often talk about spinning up VMs and services at scale
and forget that even running your own well-maintained blog is a big effort.

~~~
ericd
Static site generators make it pretty darn easy, though, especially if you
serve it off S3. You don’t have to think about that site every year if you
don’t want to, and it’s generally fine for blogs.

~~~
dcuthbertson
I get the same feeling about this as the one described in the article about
git servers. Yes, static site generators make it easy to post new content
written in plain text, Markdown, and other markup languages. However, it's the
initial configuration that can be tough. I spent a year playing around with
Hugo. Along the way, I learned a little bit about HTML, CSS, SCSS, the Go
template language, shortcodes, configuration through TOML, TOML/YAML front
matter, reStructuredText, Markdown, and git submodules. When my blog was
Jekyll-based, I had to maintain gems that were compatible with the version of
Ruby & Jekyll used by GitHub.

None of this stuff is easy when you're starting out. Even when you've become
used to whatever environment you've created, there are a lot of moving parts
to monitor and manage.

~~~
ericd
Yeah, I’ve found the purpose-built blog generators I’ve played with to be a
bit overly complicated for what they do. I’ve found Frozen-Flask to be more
straightforward, and it has the advantage of being switchable back to a
dynamic site fairly easily. An alternative would be wgetting WordPress or
similar.

------
bsder
What do people use for 2FA if you only have 10-20 employees?

~~~
adtac
2FA is just something you know and something you have. I have a YubiKey with
an RSA authentication key (plus encryption and signing keys too, but that's
irrelevant) on it that I've hooked ssh-agent and GPG up to. It is the only key
accepted by my servers. Obviously, disable password login. The key has a
password, which is the something I know.

~~~
bsder
But that doesn't help if I want that 2FA key to access their mail, GitLab
instance, AWS servers, etc. What do I do when someone leaves and I want to be
able to revoke it and take over their repositories and accounts?

If I'm a 1000-person company, I've got resources to spend on my corporate 2FA
infrastructure.

What do I do when it's 10 people? 20 people? 50 people?

~~~
toast0
When the company I was working for was small, we had ssh 2fa via duo security,
and used g suite, with mandatory 2fa, and got as much as possible set to do
SSO via g suite. G suite isn't great, but there are/were a lot of hooks to
login with it, so that was nice; and these days, it has a sane way to force
2fa (when I did it, setting the org to mandatory 2fa meant your new users
couldn't login, because they hadn't set up 2fa because they couldn't login;
thanks google).

We self-hosted git though (using gitolite for access control), running servers
was a core competency for the team, so having a little baby server on the side
that just dealt with text files for 50-100 people wasn't a big deal. It was
running on a mac mini at the CEOs house until he forgot to pay his cable bill
once and we couldn't push code for a day.

~~~
bsder
The idea of Google having control over my SSO is _NOT_ appealing given their
current track record.

