
Rolling your own servers with Kubernetes - old-gregg
https://gravitational.com/blog/aws_vs_colocation/
======
ranrotx
AWS employee here--thoughts and opinions are my own.

Prior to AWS, I was in IT Operations at a large financial services company. I
saw the writing on the wall that over time, companies would not want to manage
this part of their IT infrastructure themselves. Keep in mind, I was someone
who was responsible for keeping the lights on for a decent number of Linux
servers.

For an individual company, there really isn't much value in having to maintain
firmware levels on all your hardware, patch hypervisors (and try to coordinate
all of the maintenance around a fixed pool of hardware), perform months-long
evaluation of new hardware before purchasing, test and validate configurations
on new hardware, etc. I used to do all of this. I don't miss it either.

Yes, the items above are important, but doing them right is really table-
stakes for any reliable IT Operations department. You can choose to spend time
getting these right, or delegate that responsibility to a service provider
whose main job is to get that stuff right (and recoups that cost across a much
larger customer base).

~~~
latch
What seems to often go unsaid in these discussions is that the choice isn't
between cloud and colo. There's a third, hugely popular and mature option:
dedicated providers - which address all of your issues.

It's convenient for cloud vendors to have people believe the choice is between
them or having to deal with hardware.

~~~
dhruvkar
what's an example of a dedicated provider?

I also worked in ITOps at a medium-ish company, and we were moving our colo
to Azure when I left.

~~~
acjacobson
Rackspace

~~~
jethro_tell
The guys that are selling managed services on AWS?

~~~
Karunamon
They sell all of those things. RS has their fingers in many pies, even
though some of them appear to conflict at first glance.

------
gambler
_> Should you roll your own servers?

If you are not certain, the answer is most likely “no”. The staggering growth
of AWS happened for a reason._

Funny how for many decades companies and people were running their own
servers. The hardware was getting cheaper each year. More software became
available via open source. Then several ubercorporations entered the
hosting/cloud business, and suddenly no one seems to be able to afford their
own infrastructure.

~~~
jasode
_> The hardware was getting cheaper each year. More software became available
via open source. Then several ubercorporations entered the hosting/cloud
business, and suddenly no one seems to be able to afford their own
infrastructure._

The decision framework the executives use isn't just the
_"hardware+software"_ -- it's the whole _"IT organization"_.

In other words, it's not "in-house cpu" vs "Amazon's cpu". It's in-house IT
employees' speed of tech innovation vs Amazon's engineers'. An example of this
disparity was Guardian's disaster with its in-house organization trying to use
OpenStack.[1]

For many _non-tech_ companies where IT computing is a cost center, their
employees won't be able to match the iteration speed of Google's engineers
constantly improving on GCP or Amazon's employees enhancing the features of
AWS.

We've all heard the stories where a company's project team submits a
requisition to the internal IT department for 2 development servers for their
programmers -- and
then the IT bureaucracy tells them that it will take 2 weeks. Over time, the
internal IT dept treats the other departments as adversaries instead of
customers. Executives get fed up with slow IT departments and get excited when
a few clicks on AWS dashboard gets them servers spun up in 10 minutes. It's
not just a cpu+hardware comparison.

Companies outsource to AWS/GCP/Azure because it's quicker turnaround with more
datacenter features than their internal IT teams can deliver. Most companies
are not like Facebook or Dropbox that can maintain an internal IT organization
at a high level equivalent to AWS.

[1] [https://www.computerworlduk.com/cloud-computing/guardian-
goe...](https://www.computerworlduk.com/cloud-computing/guardian-goes-all-in-
on-aws-public-cloud-after-openstack-disaster-3629790/)

~~~
gambler
_> Most companies are not like Facebook or Dropbox that can maintain an
internal IT organization at a high level equivalent to AWS._

Let's try this with different phrasing. In 200X a lot of companies were able
to maintain their own infrastructure, just like Facebook and Amazon did at the
time. Fast-forward 10-13 years. We have cheaper hardware. We have an extra
10+ years
of development in open-source software. And yet that list of self-hosting
companies _shrunk_ by a huge degree. Doesn't that seem interesting?

~~~
jasode
_> In 200X lots of companies were able to maintain their own infrastructure,
just like Facebook and Amazon did at the time._

But my point is that companies' IT departments did _not_ maintain
infrastructure just like Amazon did.

In ~2005 when companies were first experimenting with AWS cloud, they might
start with dev & test servers. They click a few buttons and are amazed when
new servers get spun up in minutes and their programmers are productive
_immediately_. The natural question that company execs ask is, _"why can't
our own internal IT department spin up servers for us in 10 minutes?!? Why
does it take them so damn long?!?"_

They wouldn't have asked those hard questions if their internal IT capability
was equivalent to AWS. Eventually, their improved experiences with AWS on
Dev&Test&QA convinced them to migrate mission-critical Production workloads to
AWS as well.

_> We have cheaper hardware. We have (supposedly) better software._

You're still focusing on hardware+software and not considering the _IT
employees' speed of execution_ in how company executives compare the
situation.

Even Netflix as a tech company maintained their own datacenters for over 10
years but ended up migrating to AWS. Their "Guardian" moment was a big
database corruption in 2008. They also had ambitious plans to expand into
countries outside of USA. Those were some of the factors that convinced them
they didn't want to invest any more in their own datacenters and preferred
that AWS take care of it. Amazon employees iterated on datacenter
capabilities faster
than Netflix engineers could do it.

~~~
nopzor
netflix does not depend on aws to deliver their video bits. they do it
themselves based on a big network of core and edge pni and caches. this infra,
that netflix has built, in datacenters, is a big competitive advantage. it
would be worse performance and much higher cost to do this over something like
aws cloudfront.

~~~
jasode
_> this infra, that netflix has built, in datacenters,_

Unless things have changed since 2015, reports say Netflix eliminated their
last datacenter already.[1]

The Netflix "edge appliances" for CDN streaming are located in _others '_
datacenters owned by Verizon,Comcast,ISPs,etc.

[1] [https://arstechnica.com/information-
technology/2015/08/netfl...](https://arstechnica.com/information-
technology/2015/08/netflix-shuts-down-its-last-data-center-but-still-runs-a-
big-it-operation/)

~~~
pathseeker
That's still _in datacenters_ that aren't Amazon's, Google's, etc. It's the
most critical component of netflix (the actual video delivery) and it's not
"in the cloud".

~~~
jasode
_> That's still in datacenters_

The point isn't that they are still _in_ datacenters. Yes, of course, they
are. Even the "cloud" ultimately resolves down to somebody's datacenter
somewhere. The point is that Reed Hastings & Netflix wanted to get out of
_managing their own_ datacenters.

Putting their Netflix appliances inside of _ISP owned_ datacenters still lets
them _avoid managing_ their own datacenters. Their critical user accounts
signup, monthly billing, and analytics, etc workloads are at AWS. And as the
article mentions, even updating the cache on the Netflix appliances is
coordinated through AWS. The combination of those strategies keeps them out of
the "datacenter business" and let's them stay focused on their core competency
of "video content".

~~~
yebyen
The important distinction here is fault domains, and whose data centers they
are.

If you are a Comcast customer and your internet goes down, and Netflix is
unavailable, who do you complain to? If you're smart enough to notice that
both are down, the answer is almost certainly not Netflix. That does not make
the Netflix workloads in Comcast's datacenters any less critical; they are
core business functions. But they are well aligned with Comcast, who also
depends
on the proper functioning of those datacenters.

It makes sense for Netflix appliances to be in Comcast datacenters then,
especially given that Comcast cannot outsource their data centers any more
than the Pentagon can reasonably do so.

Joe Company from off the street can outsource their data centers and derives
no competitive advantage from maintaining their own private data centers.
Netflix in that sense is closer to Joe Company than they are to Comcast, I
guess. I'm not sure what all we can learn from this, but it's interesting.

------
diehunde
So before, we had an IT team that would maintain the bare metal servers. Now
we need "Cloud Engineers" to keep the cloud infrastructure working properly.
I don't know if the argument of externalizing the maintenance of the servers
is valid, since the complexity of the cloud services is just increasing every
day.

~~~
jononor
True for those that go for Kubernetes etc straight away. If you just have a
few apps to run and put them on a PaaS like Heroku, things are pretty manageable
as a side-thing for the regular software developers. No dedicated team, or
even person, needed.

~~~
jrumbut
When your infrastructure is too simple, you have simple regrets. When your
infrastructure is too exotic, you get exciting, bleeding edge regrets.

This is my view based on the work I've done. I like the ability to dispense
with capacity planning or dealing with power supplies and fire suppression you
get from cloud hosting, but when in any doubt I set up what I need using VMs
(be those droplets or EC2 instances or whatever).

I like to imagine that I've avoided wasting weeks by wasting a few hours here
and there.

------
argd678
In the last physical datacenter my team ran our app in, we spent weeks
troubleshooting a hardware network driver issue that caused the network to
drop. It was an enormous distraction, and Dell and VMware support were
useless at resolving it for us. I'm glad to have others deal with those low
level "oh it must just be your setup" issues.

~~~
lima
Anecdotal evidence. I've also seen colleagues spend weeks arguing with AWS
support while debugging a weird performance degradation issue that would have
been straightforward to investigate in a bare metal deployment with full
control over everything.

It's not like the cloud is a magical place where no unexpected issues ever
happen. Cloud providers can be surprisingly buggy, especially AWS,
particularly at scale when you start hitting the implicit scaling limits their
docs conveniently forgot to mention.

Each technology has its own unique challenges.

~~~
falcolas
> spend weeks arguing with AWS support

Yup. We've run into that repeatedly. The "we didn't notice anything on our
side, please send more screenshots and logs" gets really old when working with
"managed services".

Network packet losses/truncations, EKS control plane failures, cloudformation
stacks getting stuck in really weird states, inconsistent cloudformation
implementations for new and existing services, ENI weirdness in containers...

Managed services just feel like they aren't - more and more every day. Amazon
(or any other cloud) will never have the same investment in your availability
and infrastructure as you will.

~~~
isatty
Cloudformation is a tragedy in itself.

~~~
falcolas
The _idea_ behind CloudFormation is great, though. Platform-native IaC with
promised first party support for all future projects, and backfilling existing
services. Plus, it supports deep integrations with first-party supported
configuration management services.

The problem is that the reality has not lived up to the promise, and "first
party support" means "only the first party can support".

~~~
devonkim
CloudFormation is a declarative state management language and framework just
like Terraform. The problem is that CF abstracts said state away from you to
the degree that you can't hack around it and you wind up forcibly deleting
resources if you try to use it like a configuration management framework. With
CF Custom Resources you can add all sorts of other stuff and that's pretty
cool at least.
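
For the curious, a Custom Resource is just a Lambda that answers
Create/Update/Delete events from CF. A minimal sketch in Python (the property
names here are made up; cfnresponse is the helper AWS provides for
inline-code Lambdas):

    # Minimal CloudFormation Custom Resource handler (sketch).
    import cfnresponse  # helper AWS injects for inline Lambda code

    def handler(event, context):
        try:
            request_type = event["RequestType"]  # Create | Update | Delete
            props = event.get("ResourceProperties", {})
            data = {}
            if request_type in ("Create", "Update"):
                # do whatever CF can't do natively, e.g. call an external API
                data["Message"] = "handled %s" % request_type
            cfnresponse.send(event, context, cfnresponse.SUCCESS, data,
                             physicalResourceId=props.get("Name", "my-resource"))
        except Exception:
            # always respond, or the stack hangs until CF times out
            cfnresponse.send(event, context, cfnresponse.FAILED, {})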

------
webwanderings
This is basically the "on-prem/hybrid cloud" business which everyone's after
(IBM, Google, AWS, MS). The market seems to be pushing towards this model. The
old businesses were naturally not moving to "cloud" so the cloud wants to come
to their house (which makes sense).

------
rmbryan
The next-to-last line cracked me up: "The costs can be much lower… or much
higher!"

~~~
duxup
The cost of shiny new tech. "Hey look at all these cool things you could / can
do that would lower costs and do other cool things."

Weeks or months later you're still learning about how you do the thing that
will maybe lower costs and do cool things ....

~~~
SteveNuts
Meanwhile the market and mindshare has moved on to the next shiny new tech.

~~~
geezerjay
Kubernetes has been around for a while. I'm sure we can agree that it stopped
being a candidate for a meteoric fad quite some time ago.

~~~
imtringued
I've probably wasted more than three weeks installing kubernetes on 3 nodes.
I've tried Rancher, RKE, kubeadm, kubespray and doing everything myself. I
always failed, and the way it failed was completely opaque and I didn't feel
like I could even understand what was going wrong.

~~~
scarejunba
Don't install k8s. Use it managed. It's best that way. Too many moving parts
for a small op to manage.

------
segmondy
I used to enjoy having a cheap desktop under my table or in a closet serving
traffic to people across the internet. Computers have gotten faster, software
has gotten better, networks have gotten faster, things have gotten cheaper.

Sadly, instead of seeing more of these, most of this is now being outsourced
to cloud providers; we have bought and drunk the Kool-Aid that they can do it
better and cheaper. Which is not true. Perhaps this is why it's so hard to
find good unix and network admins these days.

~~~
Daishiman
As an ex big iron commercial UNIX sysadmin, let me tell you: they can do it
_way_ better than most sysadmins.

Developing IT policies for physical security, server security, update
policies, purchasing, wiring and so on takes a huge amount of knowledge, and
the odds that a middle-of-the-road sysadmin is not only competent in, but
excels at, all of those areas are practically zero.

Cloud providers have world-class experts in each area whose sole
responsibility is providing for their specific area. It's economies of scale
at its best. It doesn't mean that it's always the best choice, but having to
have someone do full-time server maintenance that adds no value for you or
your customers is just a huge drag.

~~~
luckylion
> As an ex big iron commercial UNIX sysadmin, let me tell you: they can do it
> way better than most sysadmins.

I'm pretty sure that is true. However, the parent said "they can do it better
_and cheap_", and cheap it isn't.

------
peterwwillis
Some thoughts:

1) Do you really need to invest a million dollars in bin-packing containerized
stateless microservices?

2) You don't have to use K8s to get the benefits of rolling your own servers.
In fact, I'd argue you should do the latter well before you do the former.

3) Definitely hire someone who has done it before. You will save so much time
and money your head will spin.

4) Do not just build a rack full of random commodity gear. Make sure it is
suited specifically for your purposes, and then weigh the cost of service
contracts and managed colo against a $100k+ cage monkey on call 24/7.

5) Do not fall for the "We've got <insert tech hype>, we don't need redundant
hardware!" lie. The more parts of your system rely on lots of hardware, the
more fragile your system becomes. Distributed decentralized services become a
PITA when the underlying gear is flaky, and centralized services outright
require reliable gear.
Do not underestimate the shittiness of your colo; always design for the most
redundancy you can get for what you have. If you can run dual power, do it.
Dual network stacks, do it. Redundant disks, do it. Remote management, do it.
Outside modem to a management port on the router, do it. Always be postponing
entropy. Later, when you become a FAANG (yeah right) you'll have the time and
money to automate away _some_ of the redundancy issues.

6) It's sometimes harder to upgrade disk or bandwidth on an existing machine
than it is to just buy another machine, but the more machines you have, the
more problems (and overhead costs), and there is an upward limit on the
scaling of most services without rebuilding them. So buy big on local disk and
network, and buy new machines to expand cpu and memory. The redundancy and
performance issues inherent to SAN/NAS often make local disks a better choice,
as they are unlikely to impact your whole network at once, you can fix/upgrade
them piecemeal, and they are less difficult to manage/operate.

7) Don't DIY critical components such as data replication; it's surprisingly
hard. In fact, don't even believe vendors if they claim they can solve X for
you with some magic software or hardware. Get them to show you a live demo on
your own network.

8) Don't forget that in 3 years you'll be replacing it all.

------
acd
There are also kubespray (Ansible provisioning of Kubernetes), kops and
kubeadm.

Rolling Kubernetes on your own is quite hard though, especially the
networking part, so there should be a market for a company helping out with
provisioning.

Using public cloud can be very expensive for some workloads that were not
written for the cloud, due to higher resource usage. It is still quite hard
to forecast cloud costs, due to a lot of moving parts and micro-billing for
each item.

With traditional service providers it can be easier to budget for the service
expenses.

~~~
ljm
From my limited experience, you'd save a lot on the cloud if you replaced the
default IngressController. Google Cloud, for instance, will spin up a fresh
load balancer for every service by default and those cost money.

Of course, it's not an easy task as a beginner to use nginx or traefik in its
place, and to handle the complexity of that deploy.

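A quick way to see how exposed you are: count the Services that each get
their own cloud LB. A sketch with the official kubernetes Python client,
assuming your local kubeconfig already points at the cluster:

    # List every Service of type LoadBalancer (each one is a billed cloud LB).
    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    lbs = [s for s in v1.list_service_for_all_namespaces().items
           if s.spec.type == "LoadBalancer"]
    for s in lbs:
        print("%s/%s" % (s.metadata.namespace, s.metadata.name))
    print("%d Services are each paying for a load balancer" % len(lbs))
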
~~~
YawningAngel
Approximately 100% of the GKE users I know about use a reverse proxy behind
the GCE Ingress. FWIW I'm not especially ops-y and configuring ambassador took
me about a day.

------
rjzzleep
isn't this what they all sell nowadays? A prettier UI on top of kubernetes?

- Kubermatic by loodse

- RancherOS

- even Mesosphere is kinda that now, though unlike the first two it isn't
open source - and these are definitely not the only solutions

I personally deploy kubespray here and there, but I would still recommend
SmartOS for normal humans, unless they want to set up an ELK + Kafka cluster
or whatever else you might want to do.

~~~
jskaggz
upvoted for triton+smartos - hear hear

------
xrisk
The article seems to focus on K8s with reference to microservices. How well
does K8s do if you're running a monolith?

~~~
mfatica
Using kubernetes without microservices is like using Hadoop on a 1000-row CSV
file.

~~~
humbleMouse
I don't know if that's a fair comparison. If you deploy a monolith you have
to set up a custom multi-datacenter deploy and manage all the load
balancing/disaster recovery/volume management (databases) yourself.
Kubernetes isn't that hard to set up and gives you out-of-the-box solutions
for all of that.

It also gives you more flexibility in adding additional services around your
monolith: ELK stack, Kafka, that kind of stuff. It also gives you a standard
API to deploy against. I don't think your metaphor holds up.

------
haolez
I’ve done this for small projects using AWS and Scaleway.

The setup is simple and it just works, but the maintenance and security
aspects were more worrying. One time I found out that one of my nodes failed
to automatically apply security patches from my distro. I couldn’t detect it
sooner because my monitoring was lacking. The overhead escalates fast.

If I start an effort like this again today, I would make sure to get
monitoring right from day 1. Possibly a combination of Prometheus with
Grafana, or something along the lines of ElasticSearch APM.
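
For example, something as small as this exporter would have caught that node.
A sketch, assuming an Ubuntu/Debian host where update-notifier's apt-check is
installed:

    # Tiny Prometheus exporter for pending security updates (sketch).
    import subprocess, time
    from prometheus_client import Gauge, start_http_server

    PENDING = Gauge("apt_security_updates_pending",
                    "Security updates waiting to be applied")

    def pending_security_updates():
        # apt-check prints "total;security" counts on stderr
        out = subprocess.run(["/usr/lib/update-notifier/apt-check"],
                             capture_output=True, text=True, check=True)
        _total, security = out.stderr.strip().split(";")
        return int(security)

    if __name__ == "__main__":
        start_http_server(9101)  # scrape :9101/metrics, alert on > 0
        while True:
            PENDING.set(pending_security_updates())
            time.sleep(300)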

------
huangbong
So what is Gravity? kops/kubespray without an MIT license?

~~~
alexk
Without MIT... but with Apache2! :)

[https://github.com/gravitational/gravity](https://github.com/gravitational/gravity)

In all seriousness though, gravity is an open core toolkit to package and
deliver complicated sets of microservices in air-gapped and restricted
environments as a virtual installable appliance.

It takes care of both application and Kubernetes lifecycle, software
distribution and licensing workflows.

------
shereadsthenews
If you get to build your own servers, why torment yourself with a 2-socket
solution? It's just harder to program, with no benefits. A single-socket UMA
solution will give you more than large enough single nodes and will be a lot
easier to program and manage.

------
vuln
[https://github.com/aquasecurity/kube-
hunter](https://github.com/aquasecurity/kube-hunter)

[https://labs.mwrinfosecurity.com/blog/attacking-
kubernetes-t...](https://labs.mwrinfosecurity.com/blog/attacking-kubernetes-
through-kubelet/)

------
momofarm
I once heard a story that the reason some people would build their own
servers is that they don't trust those service providers on the internet.
They feared that the data stored there could be quite easily breached.

Thanks to these guys, they just lower the unemployment rate a little more!

------
MuffinFlavored
> A “starter pack” 15amp cabinet with a gigabit connection can be rented for
> as little as $400 per month.

Link?...

~~~
mlrtime
Yeah, this isn't possible for an entire rack in any decent datacenter --
maybe for 1U, if it includes network and power.

~~~
MuffinFlavored
How much would a 48U rack like the one described in the article cost a month?
I would guess in the thousands, if not tens of thousands....

~~~
mlrtime
It mostly goes by power, not rack units. If you have 48U and only 15 amps,
you're not going to fill it up. 15A is not a lot.

~~~
snuxoll
15A is more than you think; you'd still probably need more for a full rack,
but a modern server can draw around 150W under full load these days. Hell, my
entire rack at home hangs around ~.3A during normal usage (2x Dell R210 II's,
Dell R320, Dell R520, Lenovo SA120 with 4 bays filled, Juniper EX2200-48T,
Mikrotik CRS317).
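
Back-of-the-envelope, assuming a 120V circuit and the usual 80%
continuous-load derating:

    # Rough capacity math for a 15A / 120V colo feed (sketch).
    volts, amps = 120, 15
    derating = 0.8          # NEC continuous-load rule of thumb
    server_watts = 150      # modern server under full load, per above

    usable = volts * amps * derating      # 1440W usable
    print(int(usable // server_watts))    # ~9 servers, before switches etc.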

~~~
darkwater
Can I ask what you use all those servers for? Looks like a not-so-small
office rack config.

~~~
snuxoll
FreeNAS for media and VM storage on the R320+SA120.

The R520 runs Proxmox and hosts whatever stuff I’m playing with at any given
time along with Kubernetes, FreeIPA, Graylog2, probably my mailserver again
soon, etc.

One R210 II runs Sophos XG to handle router+security duties, the other runs
Windows Server Essentials 2016 for AD and NPS.

The CRS317 switch provides 10Gb networking for storage on a couple of servers,
and everything else is connected to the EX2200.

------
fcantournet
The problem is not running kubernetes on "bare-metal". That's the somewhat
easy and cheap part. The problem is the supply chain to spec, purchase, rack,
validate, provision and keep up to date your servers, their drivers and
firmware. It's your capacity planning. It's keeping up with the rate of
hardware evolution. What happens when you have to go from 10G to 25G
networking? Oh, and you also need a solid, scalable L4 load balancing
solution (Facebook's Katran is a nice open-source eBPF-based data plane).

You can build or assemble most of the classic IaaS pieces (compute, storage,
network LB) from open-source bits. It's doable; the open-source solutions are
pretty good. But unless you have 10k compute nodes, how can you bear the cost
of having the expert knowledge necessary to debug all this?

Hardware sucks and is always on fire. Once you take this into account, the
economic equation changes dramatically.

------
cagenut
I used to specialize in 2-10 rack build-outs, but it's been nearly a decade
since those gigs evaporated. I wonder if k8s creates a new niche for them.

~~~
nineteen999
These kinds of gigs are definitely nowhere near as common as they once were,
but they haven't evaporated completely - I work in emergency services
infrastructure, in the middle of a 10-rack build, and that is something I
don't see changing in the next 10 years, over which we will do a refresh
possibly once or twice more.

The complexity in our case is in the backend transmission network, and unless
we control everything (hardware, network) end-to-end as much as possible, we
can't come close to guaranteeing that we meet our SLAs for our customer. They
wouldn't allow us to run our system in the cloud even if we wanted to.

------
NickBusey
I am curious to see how they recommend handling storage on bare metal k8s.

~~~
q3k
I highly recommend Rook [1], which is based on Ceph, provides
PersistentVolumes to k8s workloads and is also running on k8s itself.

[1] - [https://rook.io/](https://rook.io/)
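
Once the Rook operator is up, consuming it looks like any other PVC. A sketch
with the kubernetes Python client, assuming the stock rook-ceph-block
StorageClass from the Rook examples:

    # Claim a Ceph-backed volume through Rook (sketch).
    from kubernetes import client, config

    config.load_kube_config()
    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="rook-ceph-block",  # from the Rook examples
            resources=client.V1ResourceRequirements(
                requests={"storage": "10Gi"}),
        ),
    )
    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc)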

~~~
the-rc
I worked on Colossus at Google and Ceph is the closest thing out there.
Gregory Farnum gave a great talk about it at the Open Source Summit 2017.

I heard from someone at a large company, though, that it's not getting a lot
of love from Red Hat nowadays, even if you're a large paying customer. Now I'm
curious.

~~~
noahdesu
I'm working on Rook+Ceph at Red Hat. Rook 1.0 was just released last week,
adding support for the very latest Nautilus release of Ceph.

~~~
the-rc
Great to hear. I couldn't get more details from them, but it was something
about both rebalancing and data recovery. They picked SwiftStack, which is at
least in part open source. Maybe by now you have figured who that company is.

My team is still very interested in Ceph and Rook, for the record.

------
prisar
link is broken

~~~
numlock86
Yup, 404. Even manually going to their blog I can't find the entry anymore. I
had bookmarked it because it was supposed to be an ongoing series with
multiple parts. Weird provider. Probably best to avoid.

~~~
compuguy
I have a feeling they deleted the blog post. There is an archived copy here:
[https://web.archive.org/web/20190508123542/https://gravitati...](https://web.archive.org/web/20190508123542/https://gravitational.com/blog/aws_vs_colocation/)

------
a012
You can colo your servers and roll your own k8s clusters, but you can't
afford the luxury of high-throughput networking, EBS, etc., and don't forget
HA options (multi-AZ) for your clusters.

~~~
linuxdude314
Untrue. I purchased a 48-port 10GbE Arista 7148 the other month on eBay for
$320.

~~~
ericd
Nice! OT, but have you seen any reasonably priced RJ45 10GbE out there?
Preferably without screaming fans :-)

~~~
olavgg
You really do not want RJ45 / 10GBase-T for higher network speeds. They
consume A LOT more power and have significantly higher latency than SFP+/SFP28
and fiber cables.

I have a Brocade ICX 6450 with 4 10G ports in my homelab; I replaced the fan
with a whisper-silent one. I can't hear it and it's right next to me. The
Brocade ICX 6610 has 16 10G ports and 2 40G ports; it's not super quiet, but
not that noisy either.

~~~
ericd
The only issue is that I have 3 10 GbE devices, and all are 10GBase-T, so that
makes the SFP option less attractive.

That said, I'm very much a networking noob, so maybe there's a way to make
that work with some sort of copper to SFP bridge or something. Not sure if
that would kill the advantages, even if it existed.

Good idea on replacing the fan rather than hunting for a quiet router!

