
Kubernetes 1.9: Apps Workloads GA and Expanded Ecosystem - el_duderino
http://blog.kubernetes.io/2017/12/kubernetes-19-workloads-expanded-ecosystem.html
======
slap_shot
I cannot give enough thanks to everyone that works so hard on Kubernetes. The
speed of these releases, community support, industry adoption, feature set,
and ease of use are beyond anything I could have asked for.

I started a startup about 18 months ago and our product requires a massive
amount of data pipelining. The infrastructure we have running on Kubernetes is
beyond what a small group should be able to deploy and maintain by ourselves,
but Kubernetes makes it not only possible, but enjoyable (really).

To everyone involved (and there are a lot): Thank you SO much. I can't wait to
see what 2018 looks like for Kubernetes.

~~~
ericand
Great post. Could you describe further what you mean by data pipelining? Is
this something you could also use Spark or Kafka for? I'd be interested to
know if Kubernetes would work for me as well.

~~~
lobster_johnson
One promising piece of tech is Pachyderm [1], which runs natively on
Kubernetes. It allows you to set up pipelines of batch jobs (no streaming yet,
as far as I know) that operate on immutable, versioned data.

There are some gotchas, though. For one [2], it requires read-write access to
your storage buckets (!), so on GKE you have to modify your node pool's
access scopes to give the whole node read-write access to _all_ your buckets.
Not only is this terrible security practice, but access scopes are deprecated
on Google Cloud: you should always grant access through service accounts, not
scopes. We spent some time fighting this, as there's literally no way to make
Pachyderm use a service account.

There are some other niggling issues, and the Pachyderm team has been less
than enthusiastic in addressing them (e.g. see [3], which has zero activity
despite being absolutely _crucial_ to run in production), but it does seem
quite promising. The architecture, at least, is sound.

[1] [http://pachyderm.io](http://pachyderm.io)

[2]
[https://github.com/pachyderm/pachyderm/issues/2538](https://github.com/pachyderm/pachyderm/issues/2538)

[3]
[https://github.com/pachyderm/pachyderm/issues/2537](https://github.com/pachyderm/pachyderm/issues/2537)

~~~
jdoliner
Hey lobster_johnson, Pachyderm CEO here. Thanks for the shout out and we're
glad you feel Pachyderm is promising for these use cases. I just wanted to
clarify / respond to a few things.

> no streaming yet, as far as I know

Pachyderm actually does do streaming; there's nothing in the API about it
because it all happens automatically. This is one of the huge advantages of
immutable data: the system can detect whether it has already done a
computation by looking at a hash of the input and the code. The system will
never do the same computation twice, so all pipelines are streaming by default.
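The idea of keying work by a hash of the code plus the input can be sketched in a few lines. This is a toy illustration, not Pachyderm's actual implementation; `transforms`, `run_step`, and the cache are all made-up names:

```python
import hashlib

# Toy memoized pipeline step. Each computation is keyed by a hash of the
# transform's identity and its input, so identical work is never redone.
transforms = {"upper": lambda d: d.upper()}

cache = {}

def run_step(name: str, data: bytes) -> bytes:
    """Run the named transform on `data`, skipping work already done."""
    key = hashlib.sha256(name.encode() + b"\x00" + data).hexdigest()
    if key not in cache:                 # only compute on a cache miss
        cache[key] = transforms[name](data)
    return cache[key]
```

With immutable, versioned data, the same key always maps to the same result, so re-running a pipeline over unchanged inputs is effectively free.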

Sorry we've been less than enthusiastic in addressing these niggling issues.
The truth is it's pretty hard to be enthusiastic about issues that are
specific to one deployment scenario. AFAIK the Google Cloud Storage library
we're using doesn't support service accounts, and the whole library has been
deprecated; it's the second GCS library that's been deprecated underneath us.
So upgrading it is a bit of a pain, but the bigger pain is that users with
existing Google clusters will have their clusters break under them unless we
can figure out a way to get scopes and service accounts to both work. And we
have every reason to believe that all of this will be deprecated within the
next 6 months and we'll be on to our 4th GCS library. And that's only one
deployment scenario; we're doing the same thing for AWS, Azure, and a host of
local deployment options.

Anyways, these are our problems to solve, and we will solve them. We're
thrilled that you want to use Pachyderm and we want to make that experience as
smooth as possible. We just need to ask for a bit of patience regarding these
deployment scenarios.

~~~
dsnuh
Do you guys have any deployment docs or Kubernetes deployment for on-prem
Pachyderm? I tried setting it up a while back on our cluster and it was pretty
challenging.

~~~
jdoliner
We do:
[http://docs.pachyderm.io/en/latest/deployment/on_premises.ht...](http://docs.pachyderm.io/en/latest/deployment/on_premises.html)

Hopefully that should work for you. Deploying on-prem can mean a lot of
different things, so some paths will work better than others. The most common
on-prem deployments we see are backed by Minio.

~~~
dsnuh
Great, thanks for the info. Admittedly, this was a while back and I know docs
change quickly in this space.

I agree that on-prem can be a challenge. You really have to know your stuff,
and your environment. We run it in production and it has been quite a journey.

------
linsomniac
I really wanted to like Kubernetes, but I couldn't figure it out. I'm a pretty
experienced Linux admin, and despite days of reading all the sources of
documentation I could find I couldn't get the networking setup to where I
thought it should be.

I don't know if that was a problem of my mental model being wrong and the
documentation not educating me on the correct model, or the documentation of
the CNI, Flannel/whatever just being too sparse, or what.

I started with the Google Ubuntu packages and was able to get containers up
fairly quickly, but I just couldn't figure out how to access them from
anything other than the host machine. Asking for help on IRC wasn't useful
(aside: what's the deal with IRC channels with tons of users but no messages
for days?).

I've since wiped that cluster and tried the Ubuntu Kubernetes installation
using conjure-up. It seems like it did the right thing, but I'm not sure yet
how to get from a single machine to a cluster.

Trying to decide whether to pursue Kubernetes more, or try Dokku/Flynn/Deis
(which I just saw referenced in a previous HN discussion), because they look
really promising...

I had hoped it would be the ganeti of containerization, but I had far fewer
problems with ganeti.

~~~
slap_shot
FWIW, if you create a GKE cluster and create an nginx deployment:

    kubectl apply -f https://k8s.io/docs/tasks/run-application/deployment.yaml

and then create a load balancer:

[https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/](https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/)

you have nginx exposed via an external IP address in about two minutes. Once
you're comfortable with that, you can start working with Ingress and such to
get more sophisticated.
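For reference, the load balancer step boils down to a Service of `type: LoadBalancer` whose selector matches the deployment's pod labels. The names below are placeholders, not anything from the linked docs:

```yaml
# Hypothetical minimal Service; "nginx-lb" and the "app: nginx" label
# must be adjusted to match your own Deployment.
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
```

Apply it with `kubectl apply -f`, and on GKE an external IP shows up under EXTERNAL-IP in `kubectl get service nginx-lb` once provisioning finishes.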

I agree the IRC/Slack channel communication isn't the best. I prefer the
Kubernetes User Group. You can usually get a great response in less than a
day. I've used it a lot.

~~~
linkmotif
I was going to suggest this. Create a GKE cluster and play with it for a while
to see what you’re supposed to have. Then you can create one yourself from
scratch once you understand the target.

~~~
merb
Well, that's what I tried, and basically I get:

> gcloud container clusters get-credentials cluster-1 ...trimmed

> Fetching cluster endpoint and auth data.

> ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=404,
> message=The resource "projects/infinite-alcove-164114/zones/europe-
> west3-c/clusters/cluster-1" was not found.

> Could not find [cluster-1] in [europe-west3-c].

> Did you mean [cluster-1] in [europe-west3]?

note that neither gcloud alpha, nor gcloud beta, nor europe-west3-(a|b|c)
worked.

Probably a bug when using `Regional (Beta)`.

~~~
swozey
I was in the Regional Alpha so I'm familiar with this, but it's been a while
since I had the issue. The fix, I believe, was:

export CLOUDSDK_CONTAINER_USE_V1_API_CLIENT=false

Note: it's in beta now, and I believe --region was added to the recent gcloud
command, but I could be incorrect. If you have problems, msg me on the k8s
Slack (mikej) and I can run through it with you.

I don't believe you need this at all now that it's beta, but I set up my beta
regional cluster a few weeks ago.

Also, re: IRC, there are 27k people on the k8s Slack. I've used k8s for years
and never gone into IRC. I just don't use IRC anymore.

~~~
merb
Thanks for your help, but I guess I'm fine now (just trying out stuff). The
env var probably would have helped, but there is also a config: `gcloud beta
config set container/use_v1_api_client false`. And I needed --region (I had
always used --zone, since that was what the GUI suggested).
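For anyone hitting the same error: a regional cluster is addressed by its region rather than a zone, so the credentials command looks roughly like this (cluster name and region taken from the error message above; not run against a live project):

```shell
# Regional (not zonal) clusters take --region instead of --zone:
gcloud beta container clusters get-credentials cluster-1 --region europe-west3
```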

------
jcastro
Kubecon just wrapped up and all the videos of the talks are up; lots of good
content here: [https://www.youtube.com/channel/UCvqbFHwN-nwalWPjPUKpvTA/videos](https://www.youtube.com/channel/UCvqbFHwN-nwalWPjPUKpvTA/videos)

------
cirowrc
I'm particularly excited about the advance of the Container Storage Interface
(CSI).

Having developed two custom Docker volume plugins, I wonder whether Docker is
going to adopt it for its cluster offering (Swarm mode) or leave it aside as
it did with CNI (real question). Does anybody know?

~~~
dward
It remains to be seen, but the owners of the spec represent Mesos, Kubernetes,
Docker, and Cloud Foundry, so it seems like Docker is open to the idea.
Source:

[https://github.com/container-storage-interface/spec/blob/master/OWNERS](https://github.com/container-storage-interface/spec/blob/master/OWNERS)

~~~
cirowrc
oh, I didn't look at that. Thanks!

------
eduren
It's really fascinating to see Windows support be added throughout the
container ecosystem. Does anybody have experience actually moving legacy
Windows applications into a containerized infrastructure?

------
reacharavindh
Acknowledging the risk of sounding like "why can't I use rsync scripts instead
of this Dropbox?":

Can someone explain to me, like I just got out of high school, why I need to
complicate my setup with Kubernetes/Docker Swarm/Deis/Dolly/Flynn instead of
spinning containers up and down using scripted LXC commands?

Someone was talking about using Kubernetes for data pipelines. For such a
case, why is Kubernetes better than a script that spins up pre-made container
images?

Not being snarky at all. Genuinely asking "why?". I appreciate any educating
responses!

~~~
nielsole
Kubernetes is not supposed to be run on a single node, but on dozens or
hundreds of machines. Scheduling, networking, and authorizing heterogeneous
highly available workloads by possibly different teams + managing ingress +
managing networked storage is what Kubernetes is built for.

I wouldn't know how to do this with LXC

~~~
colek42
The API for k8s is great. We mostly run large (50-core, 500GB-memory) machines
co-located at client sites. We have hundreds of these to manage, but they are
separated geographically. The k8s API/infrastructure would be great for
managing/monitoring them.

~~~
nielsole
Sounds like an interesting problem. What is your use-case? Caching servers in
ISPs? Kubernetes kind of assumes that any node can fail any time and
reschedules workloads. This assumption might not be in line with your
requirements.

------
odammit
Keep up the good work. You guys/gals make my job easy. Infinity thanks, I owe
you a drink (or three).

------
thomasahle
I have a simple service I would like to put in production. It is a simple ML
code + REST API + testing sandwich.

I only have one (virtual) server to run this on, but I still think Kubernetes
looks really helpful, allowing me to throw new versions in test and update the
production server without downtime.

Minikube is really nice to use, and everything seems to work well. However,
everywhere I read I am told minikube shouldn't be used in production. Can
anybody tell me if there's a reason for this? And if so, what is the
recommended setup for a single-node Kubernetes cluster?

(Side question: Why does minikube run everything in a virtual machine? Doesn't
that hurt performance?)

~~~
rckrd
Minikube runs in a virtual machine because it's meant to be a cross-platform
development environment, and Linux containers aren't natively supported on
macOS or Windows.

Minikube does have a "none" driver, which runs on bare metal and can be used
to provision a cluster on virtualized hardware, such as a CI system or a GCE
instance.
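On a Linux host that looks roughly like the following (flag spelling as I recall it for minikube at the time; check `minikube start --help` on your version):

```shell
# Runs the cluster components directly on the host, with no VM layer:
sudo minikube start --vm-driver=none
```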

Why not minikube in production? Upgrades, downgrades, clean-up, and
maintenance. We don't offer any real guarantees on these lifecycle operations,
unlike a more robust setup like GKE.

Disclaimer: I work on minikube

