
GPUs as a service with Kubernetes Engine are now generally available - rey12rey
https://cloudplatform.googleblog.com/2018/06/GPUs-as-a-service-with-Kubernetes-Engine-are-now-generally-available.html
======
minimaxir
With the new discounts on preemptible GPUs
([https://cloudplatform.googleblog.com/2018/06/Introducing-
imp...](https://cloudplatform.googleblog.com/2018/06/Introducing-improved-
pricing-for-Preemptible-GPUs.html)), the economics of quickly spinning up a
fleet of GPUs with Kubernetes for a short, parallelizable ML task become very
_interesting_ (assuming that Google allows non-enterprise users enough GPU
quota for a fleet of GPUs, anyway).
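
The mechanics are simple enough; something like this (machine type, GPU type,
and pool sizes are just placeholder guesses) gets you a preemptible GPU pool
that autoscales down to zero when idle (you still need to apply the NVIDIA
driver-installer DaemonSet from the GKE docs afterwards):

    gcloud container node-pools create gpu-pool \
        --cluster=my-cluster --zone=us-central1-a \
        --machine-type=n1-standard-4 \
        --accelerator=type=nvidia-tesla-k80,count=1 \
        --preemptible \
        --enable-autoscaling --min-nodes=0 --max-nodes=8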

What I want to use Kubernetes + an instant GPU fleet for is deep learning
hyperparameter grid search (i.e. spin up a lot of preemptible GPUs and, for
each parameter config, train the model on a single GPU, so the search speed
scales linearly with the number of GPUs).
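
As a sketch of what I mean (the trainer image, train.py, and its --lr flag are
hypothetical), it could be as simple as one Job per config, each requesting a
single GPU:

    # one Job per hyperparameter config, each pinned to one GPU
    for lr in 0.1 0.01 0.001; do
    cat <<EOF | kubectl apply -f -
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: train-lr-${lr//./-}
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: trainer
            image: gcr.io/my-project/trainer:latest
            command: ["python", "train.py", "--lr", "$lr"]
            resources:
              limits:
                nvidia.com/gpu: 1
    EOF
    done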

Kubeflow
([https://github.com/kubeflow/kubeflow](https://github.com/kubeflow/kubeflow))
is _close_ to this functionality, but not quite there yet in terms of
user-friendliness (you have to package everything in a huge Docker container
and launch jobs from the CLI; ideally I'd spawn containers and start training
directly from a JupyterHub notebook on the master node).

~~~
boulos
Disclosure: I work on Google Cloud (and helped launch Preemptible VMs).

> (assuming that Google allows non-enterprise users enough GPU quota for a
> fleet of GPUs, anyway)

This is actually why we have separate preemptible quota [1], which we grant
more freely. You can't stock out our full-price customers, so we're happy to
let you spin up tons of V100s (and as of this morning TPUs!).

[1]
[https://cloud.google.com/compute/quotas#quotas_for_preemptib...](https://cloud.google.com/compute/quotas#quotas_for_preemptible_resources)
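
If you want to see what you already have, the preemptible quotas show up as
their own metrics (e.g. PREEMPTIBLE_NVIDIA_V100_GPUS) in the region
description; roughly:

    gcloud compute regions describe us-central1 | grep -B1 -A1 PREEMPTIBLE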

~~~
minimaxir
I was wondering why there was a separate quota. That makes much more sense!

------
kozikow
We have been using GPUs with GKE for a while. At some point we were running
20+ GPUs in a production workflow without any problems.

Everything generally works well, except maybe for the initial phase, when some
containers don't port cleanly from nvidia-docker-compose due to problems with
CUDA libraries; ideally, you match the CUDA version everywhere.

My dev setup for quick experimentation with GPU Docker containers on GKE:
[https://tensorflight.blog/2018/02/23/dev-environment-for-
gke...](https://tensorflight.blog/2018/02/23/dev-environment-for-gke/) .
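
A quick sanity check before porting (assuming nvidia-docker2 locally; the
image tag is just an example) is to run the exact CUDA base image you plan to
ship and make sure the driver accepts it:

    docker run --rm --runtime=nvidia nvidia/cuda:9.0-base nvidia-smi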

------
rjain15
Are these GPUs on bare metal, or virtualized GPUs attached to VMs?

~~~
wmf
Everything in GCE/GKE is running inside VMs, but when you attach a GPU you get
the whole GPU (via PCI passthrough).

------
fpgaminer
On a related note, last week I took a dive into Kubernetes on gcloud for a
personal project and came out with some interesting knowledge.

First off, this was for a _small_ personal project. Something that I
originally intended to run on an f1-micro. I decided to check out Kubernetes
mostly to learn, but also to see if it could offer a more maintainable setup
(typically I just write a pile of provisioning shell scripts and cloud-init
scripts to bring up servers, which gets messy to maintain long-term). So
basically I was using Kubernetes "wrong"; its target audience is workloads
that run across a fleet of machines. But I trudged forward anyway.

This resulted in the first problem. You can't spin up a Kubernetes cluster
with just one f1-micro. Google won't let you. I could either do a 3x f1-micro
cluster, which would be ~$12/month, or 1x g1-small, which would be about the
same price. Contrast that with my original plan of a single f1-micro, which is
~$4/mo. Hmm...

Well after playing around I discovered a "bug" in gcloud's tooling. You can
spin up a 3x f1-micro cluster. Then add a node pool with just one or two
f1-micros in it. Then kill the original node pool. This is all allowed, and
results in a final cluster with only one or two nodes in it. Nice. "I know
what I'm doing, Google is just being a dick!" I thought. I could still spin up
Pods on the cluster, no problem.
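
Roughly the sequence, from memory (cluster/pool names and zone are
placeholders):

    gcloud container clusters create tiny --zone=us-central1-a \
        --machine-type=f1-micro --num-nodes=3
    gcloud container node-pools create micro-pool --cluster=tiny \
        --zone=us-central1-a --machine-type=f1-micro --num-nodes=1
    gcloud container node-pools delete default-pool --cluster=tiny \
        --zone=us-central1-a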

Then the second discovery. The Kubernetes console was reporting unschedulable
system pods. Turns out, Google has a reason for these minimums.

All the system pods, the ones that orchestrate the show, provide logging,
metrics, etc., together claim a whopping 700 MB of RAM and a good chunk of CPU
as well. I was a bit shocked.
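
If you want to see where that number comes from on your own cluster, the
per-pod requests are listed in the node description:

    kubectl describe nodes | sed -n '/Non-terminated Pods/,/Allocated resources/p'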

I'm sure most developers are just shrugging right now. 700 MB is nothing these
days. But remember, my original plan was a single f1-micro, which only has 0.6
GB of RAM. This is a personal project, so every bit counts to keep long-term
costs down. And, in stark contrast to Kubernetes' gluttony, the app I intend
to run on this system only uses ~32 MB under usual loads. That's right: 32 MB.
It's a webapp running on a Rust web server.

So hopefully you can imagine my shock at Kube's RAM usage. As I dug in I
discovered that almost all of the services are built in Go. No wonder. I love
Go, but it's a memory hog. My mind started imagining what the RAM usage would
be like if all these services had been written in Rust...

Point being, 700 MB exceeds what one f1-micro can handle, and it exceeds what
two f1-micros can handle, because a lot of those services are per-node
services. Add to that the base RAM usage of the (surprisingly) bloated
Container-Optimized OS that Google runs on the cluster's nodes (spinning up a
Container-Optimized image on a plain GCE instance, I measured something like
500 MB or more of RAM used on a bare install...). Hence why Google won't let
you spin up a cluster of fewer than three f1-micros. You can, however, use a
single g1-small, since it has 1.7 GB of RAM in one instance.

At this point I resigned myself to just having a cluster of three nodes.
_Shrug._ The cost of learning, I suppose. And perhaps I could reuse the
cluster to host other small projects.

It was at this point that I hit another roadblock. To expose services running
on your cluster you, more or less, have to use Kube's LoadBalancer service
type. It's convenient: a single line of configuration and _BAM_, your service
is live on the internet with a static IP. Except for one small detail that
Google barely mentions: their load balancers cost, at minimum, ~$18/mo. That's
more than my whole cluster! And my original budget was $4/mo...
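
For context, the "single line" is just type: LoadBalancer on a Service;
something like this (names and ports are placeholders) is all it takes to
start paying for a forwarding rule:

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Service
    metadata:
      name: webapp
    spec:
      type: LoadBalancer
      selector:
        app: webapp
      ports:
      - port: 80
        targetPort: 8080
    EOF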

There are workarounds, but they are ugly. NodePort doesn't work because its
port range (30000-32767 by default) doesn't include 80 or 443. You can use
host networking or a host port, something like that: basically build your own
load-balancer Pod, pin it to a specific node, and manually assign a static IP
to that node (hand-waving and roughly recalling the awkward solution I
conjured up). But it requires manual intervention every time you want to
perform maintenance on the cluster. The opposite of what I was trying to
achieve.
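
Something like this is the shape of that workaround (hand-wavy; the label,
names, and nginx image are placeholders):

    kubectl label node my-node-1 role=edge
    cat <<EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: edge-proxy
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: edge-proxy
      template:
        metadata:
          labels:
            app: edge-proxy
        spec:
          hostNetwork: true
          nodeSelector:
            role: edge
          containers:
          - name: proxy
            image: nginx:stable
    EOF
    # ...then reserve a static IP and attach it to that node's VM by hand,
    # which is the manual step that defeats the point.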

To sum it all up: on GKE you need to be willing to spend _at least_ ~$30/mo on
any given Kubernetes-based project.

So I gave up on that idea. For now I've fallen back on provisioning shell
scripts again, though I've shoved my application into containers and am using
Docker Compose to at least make deployment a little nicer.

I also took a few hours to run through the Kubernetes The Hard Way tutorial,
thirsty for a deeper understanding of how Kube works under the hood. It's a
fascinating system. But after working through the tutorial it became _very_
clear that Kube isn't something you'd want to run yourself. Not unless you
have a dedicated sysadmin/devops person to manage it.

Also interesting is that Kube falls over when you need to run a relational
database. The impedance mismatch is too great: Kube is designed for services
that can be spread across many disposable nodes, which is not something
Postgres et al. are designed for. So the current recommendation, if you're
using a relational database, is to just use traditional provisioning or a
managed service like Cloud SQL.

P.S. For as long as I've used Google Cloud, I have been, and continue to be,
eternally frustrated by the service. It's a complete mess. Last week, while
doing this exploration, I ran into a problem where half my nodes were zombies,
never starting and taking an hour to finally die. I had to switch regions to
"fix" the problem. Google Cloud provides _no_ support by default, even though
I'm a paying customer. Rather, you have to pay _more_ for the _privilege_ of
talking to someone about problems with the services you're already paying for.
Incredibly frustrating, but that's Google's typical M.O.

Not to mention: 1) poor, outdated documentation; 2) the gcloud CLI is
abysmally slow to tab-complete even simple stuff; 3) the web console is made
of molasses and eats an ungodly amount of CPU just sitting there doing
nothing; 4) there's little to no way to cap billing; the best you can do for
most services is set up an alert and pray you're awake if shit hits the fan;
5) I can't recall a single gcloud command I've run lately that hasn't spewed
at least one warning or deprecation notice at me.

~~~
puzzle
Were the system pods using all that memory, or just reserving it? It's not
straightforward to scale them, because a node might run just your tiny Rust
server or 20 high-traffic web apps, and you don't want the log agent to keel
over in the latter case. GKE and many other Kubernetes deployments use
something called addon-resizer to determine the CPU and RAM given to cluster
services. The problem is that it typically scales based on node count, and the
settings are fairly conservative at the low end, i.e. in your case of a single
node; I think it assumes clusters are all at least 10-15 nodes. On a test
cluster, I see the metrics server using only 16 MB of RAM but requesting
104 MB. Ironically, the autoscaling nanny in the same pod uses another 8 MB.
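
You can see the knobs it's working with on GKE; the deployment name varies by
version, but something like this shows the nanny's flags:

    kubectl -n kube-system get deploy -l k8s-app=metrics-server -o yaml \
        | grep -E -- "--(extra-)?(cpu|memory)="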

This is a known issue that is not easy to solve in the general case. I think
Tim Hockin led a discussion about how to autoscale at the very low end at last
year's KubeCon, with people like you in mind. The other use case he brought up
is how to set up services in a Minikube cluster that might be running in a
2 GB VM.

~~~
fpgaminer
> Were the system pods using all that memory or just reserving it?

700 MB was the sum of the requested minimum RAM across all those service pods.
So yeah, you're probably right that those are reservations of sorts, not
actual usage. Still, it's a bit crazy to see a logging service, whose job is
merely to haul logs off to a different server, requesting 200 MB.

I'm also bewildered by Container-Optimized OS's memory consumption. IIRC it
was 500 MB+ bare, doing nothing, as reported by top. I forget which, but I
stood up either Debian Stretch or Ubuntu 18.04 and it was only ~200 MB with
Docker installed.

~~~
puzzle
The logging service numbers are easily explained: the remote server might have
transient failures, so the forwarder will cache stuff in memory (the
alternative is to just stop reading the logs, but then you risk losing
messages if the pod dies in the meantime). You complained about Go, but it
doesn't help that the fluentd agent is written in Ruby. There's a new Go
rewrite of it, but I don't think GKE or others use it yet.

Was COS a standalone GCE instance or GKE? In the latter case, memory will be
used by the usual suspects: fluentd, kubelet, kube-proxy, docker, node-
problem-detector. For both, there are also a few Google daemons in Python
(ugh).

------
erikb
How about making vanilla k8s usable on-premise first...

~~~
ethanwillis
I have a lot of bare-metal servers in a rack at my home. Tried getting kube
running well last night, actually. Well, if you're using Ubuntu Server 18.04:
GOOD LUCK. There are plenty of open issues in the various repositories around
kube. I eventually just tried using Canonical's conjure-up tool to install
kube.

You'd think Canonical's tools would work on their own LTS, right? Wrong.
Absolutely wrong. It couldn't even do an OpenStack install either. Absolutely
abysmal.
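
For reference, the conjure-up path I tried was basically the documented one:

    sudo snap install conjure-up --classic
    conjure-up kubernetes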

~~~
erikb
Yep, and there's a whole other set of problems that come up after you get it
running. E.g. you probably want to make use of PVCs, but there is no stable
dynamic storage provisioner yet, and Google isn't even working on one, as far
as GitHub shows.
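
The usual stopgap on bare metal is the local-volume route: a StorageClass with
no provisioner plus hand-created PVs, so PVCs still bind but nothing is
dynamic. A minimal sketch:

    cat <<EOF | kubectl apply -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-storage
    provisioner: kubernetes.io/no-provisioner
    volumeBindingMode: WaitForFirstConsumer
    EOF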

