Is the answer the same as for the question "who needs a microservice architecture"?
* Self-healing: when you create a Deployment/ReplicaSet, it will be maintained at all costs, so if the app has a memory leak or anything else goes wrong, the failure is contained and the app is kept up and running.
* Rolling update: even when you run only 5 frontends, it is a pain to use Capistrano or other tools just to update from a git repo. It is literally a one-liner in Kubernetes. If you use CI/CD, the setup is just a few lines in any of Jenkins/GitLab/Travis...
* Service discovery: the combination of ENV and predictable DNS endpoints is just awesome
* Ecosystem: PaaS, Serverless... Much of the new-world infra is built on K8s, so it is a door to the next generation, whether or not you already know you will use it.
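To make the self-healing and rolling-update bullets a bit more concrete, here is a minimal sketch in Python that builds a Deployment manifest and prints it as JSON (kubectl accepts JSON as well as YAML). The name `frontend`, the image, port 8080 and the `/healthz` path are all made up for illustration, and `apps/v1` is assumed as the API version; older clusters served Deployments under beta API groups.

```python
import json


def frontend_deployment(image: str, replicas: int = 5) -> dict:
    """Build a Deployment manifest for a hypothetical web frontend."""
    labels = {"app": "frontend"}
    return {
        "apiVersion": "apps/v1",  # assumption: a cluster that serves apps/v1
        "kind": "Deployment",
        "metadata": {"name": "frontend", "labels": labels},
        "spec": {
            # Self-healing: the controller keeps this many pods running.
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Rolling update: replace pods gradually instead of all at once.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "frontend",
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                        # A failing probe gets the container restarted.
                        "livenessProbe": {
                            "httpGet": {"path": "/healthz", "port": 8080},
                            "periodSeconds": 10,
                        },
                    }]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = frontend_deployment("registry.example.com/frontend:1.0.1")
    print(json.dumps(manifest, indent=2))
```

Piping the output into `kubectl apply -f -` would create it, and the one-liner update is then along the lines of `kubectl set image deployment/frontend frontend=<new image>`.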
As for Micro Service Architecture, just starting with the web frontend and a couple of lightly dockerized middleware services makes it sooooo simple that you instantly want to get more out of it.
As the overhead of running K8s vs. a set of servers is relatively low, especially at small scale, it is definitely worth looking at. Happy to do a run-through with you and show you how the deployment of a tiered app works as a demo; ping me on @SaMnCo_23 if interested.
* Run Ceph on separate nodes and connect it to the cluster. With Juju, you can do that from the bundle, as Ceph is also a supported workload. This gives you scale for storage.
* Run Ceph within the cluster with a Helm chart. We see that in openstack-helm, for example. It also gives you scale, but the lack of device discovery makes it less interesting in my opinion.
* Run an NFS server: plain and easy, but not very scalable.
* Use hostPath, which is the default but doesn't get you scale.
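As an illustration of the NFS option from the list above, here is a rough Python sketch that builds a statically provisioned PersistentVolume plus a matching claim and prints them as JSON. The server address, export path, object names and size are placeholders, not anything from a real setup.

```python
import json


def nfs_volume_and_claim(server: str, path: str, size: str = "10Gi"):
    """Return (PersistentVolume, PersistentVolumeClaim) for an NFS export."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": "nfs-data"},  # hypothetical name
        "spec": {
            "capacity": {"storage": size},
            # NFS can be mounted read-write by many pods at once.
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "nfs": {"server": server, "path": path},
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "nfs-data-claim"},  # hypothetical name
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Empty class: bind to a pre-created volume, no dynamic provisioning.
            "storageClassName": "",
            "volumeName": "nfs-data",
            "resources": {"requests": {"storage": size}},
        },
    }
    return pv, pvc


if __name__ == "__main__":
    pv, pvc = nfs_volume_and_claim("nfs.example.internal", "/exports/data")
    bundle = {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}
    print(json.dumps(bundle, indent=2))
```

Pods then reference the claim by name, so swapping NFS for Ceph later only changes the volume definition, not the application manifests.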
Ceph is an object-based SDS solution designed to take servers with local drives and create a SAN out of them. In order to do this, they take each LUN (Ceph volume) and scatter the data across all nodes in the cluster. They do not assume that applications will run on these servers themselves... they assume compute is elsewhere, like a traditional SAN. The goal is to replace a SAN with servers, not create a converged platform. Also, Ceph was designed during an age where an Intel server did NOT have tier-1 capacity (8 - 20 TB), which is why they shard a volume across so many servers.
This causes a problem for modern applications like Cassandra, Mongo, Kafka etc., which like to scale out themselves and want a converged system, where data is not scattered but sits on the node where an instance of that cluster runs. Ceph also disrupts (undoes) the HA capabilities that these scale-out applications have (for example, a Cassandra instance's data will not be on the node it thinks it is on).
Since GlusterFS /can use/ NFSv4 as a client, it should work with the stuff @samco_23 uses
Canonical at this stage only supports Ceph commercially, but it doesn't mean GlusterFS is not a good option. I haven't tried it myself, so can't tell.
Anyone?
- Use the local node storage. This is very simple, but it can get complicated on more complex installations.
- Connect to your existing storage solution using iSCSI or NFS.
- Run your own distributed storage solution on top of Kubernetes, for Kubernetes, e.g. https://github.com/rook/rook
In general and in my opinion, it is always better to run with k8s since you have, for the stateless pieces, cluster awareness. So there is never a downside to it, especially as the control plane is very lightweight for small clusters, and you can colocate many parts.
At this stage, you have no data. Normally, if your app depends on data, the pods will keep failing until the data has been moved and is available (at which point they stabilize in an equilibrium state).
If you run Ceph in both environments, then you may want to use Ceph replication. If you use another storage layer, then it's also up to you to make sure the data is moved around.
Rook should have support for that use case. However, it is still alpha as per the GitHub readme, so use at your own risk.
However, you may want to consider the data replication problem outside of the scope of k8s.
If the 2 sets of clusters are physically close to each other, you may want to just point the new apps to the old data and pull the switch from the first one.
Another option would be to run another beta feature in K8s called Federation, which lets you manage several clusters via a single control plane.
Sorry for the long comment, your question has a broad scope, and it's hard to answer without diving into details.
You might be interested in Deis v1 PAAS as a historical reference. Deis is a company that specializes in Kubernetes (was just bought by Microsoft). They have been in the container orchestration game since before Kubernetes was a kid (and even before containers were really en vogue.) Deis v1 PAAS is the ancestor of Deis Workflow (or v2) which is a product that runs solely on Kubernetes.
Workflow does not do distributed filesystems internally where PAAS v1 did. That is why I'm telling you about it. PAAS v1 had its own storage layer called deis-store, which is (was) essentially CephFS and RBD under the hood. They did the best they could to make sure you did not have to be a competent Ceph admin just to get it started, but as it happens you would be running Ceph and susceptible to all of the Ceph issues.
Distributed filesystems are complicated business.
Deis was running Ceph for internal purposes. Deis used the Store component to take care of log aggregation ("Logger"), container image storage ("Registry"), and Database backups. When Workflow was released, it was targeting Kubernetes and required PVC support (AWS S3, or GCE PD, or one of the other storage drivers).
It still handles Log Aggregation, Database Backups, and Image Storage, but it uses the platform services to do this in an agnostic way (that is, whatever way you have configured to enable PVC support in Kubernetes.)
The Ceph support provided by Deis v1 was never intended to be an end-user service; it was for internal platform support. I thought about using it for my own purposes but never got around to it. The punchline is this: porting your applications to Deis requires you to re-think the way they are built to support 12factor ideology. Porting your applications to Kubernetes requires no such thing... but it helps!
Also, distributed storage is a complicated problem, and if you undertake to solve it for yourself, you should not take it lightly. (OR do take it lightly, but with the understanding that you haven't given much rigor to that part.)
What was good advice for Deis v1 is still good advice for Kubernetes today. If you are building a cluster or distributed architecture to scale, you should really consider separating it into tiers or planes. In Deis v1, the advice was to have a control plane (etcd, database), storage plane (deis-store or Ceph), data plane (your application / worker nodes), and routing mesh plane (deis-router, traefik, or the front-end HTTP serving nodes.) All of those planes may require special attention to make them reliable and scalable.
In my opinion none of this has anything to do with AWS or Google, but those two providers have positioned themselves well to be the people that do the work of solving those hard problems for you. I would certainly start experimenting with Rook; I had good experiences with deis-store and I've been looking for something to fill the void for me.
The first thing that anyone doing serious deployments needs is an image registry. For that to be HA hosted on a cluster, you need some kind of distributed filesystem.
But those PD/EBS solutions are pretty compelling and they're not going away.
If you have an existing SAN solution, you can connect to it via Fibre Channel or iSCSI.
Kubernetes is so much more than just "planet scale". It encourages patterns and mindsets for efficient software delivery that can really pay dividends.
Here are some of my favorite things:
Cloud agnostic:
Your team and business are not at the mercy of pricing, features or availability of a third party. You can run it on everything from a massive cluster on AWS to some cheap mini computers off eBay: https://hackernoon.com/diy-kubernetes-cluster-with-x86-stick... Moving between cloud providers when they both run Kubernetes is fairly trivial. You can also run on multiple clouds at the same time. Kubernetes abstracts the infrastructure away. It's also really easy to run a single-node cluster on your own machine for local development. Try doing that with AWS services in a reliable way.
Immutable infrastructure:
The fact that containers don't hold state FORCES you to develop your applications in a 12-factor pattern. Deploying images by tag forces you to create a pipeline that automates their builds. It also allows you to roll back effortlessly. It's not an afterthought or something you need to glue together.
High availability:
Just define how many replicas of your service you want and k8s does the rest. If they crash, so what? Not only will they be restarted automatically, but they will also be distributed across your fleet for you. Node goes down? Who cares. It's self-healing.
Service discovery:
Just put a k8s Service in front of your application replicas and everything is automatic. Nothing to install: simply refer to the stable DNS service name and everything will be routed. Software agnostic.
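For what "stable DNS service name" and the injected environment variables look like in practice, here is a small Python sketch of the naming conventions. The service names and IP are made up, and `cluster.local` is only the default cluster domain, so treat it as an assumption about your cluster.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_env_vars(service: str, cluster_ip: str, port: int) -> dict:
    """Host/port variables injected into pods started after the Service exists.

    The service name is upper-cased and dashes become underscores.
    """
    prefix = service.upper().replace("-", "_")
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
    }


if __name__ == "__main__":
    print(service_dns_name("redis-master"))
    print(service_env_vars("redis-master", "10.0.0.11", 6379))
```

Inside the same namespace the short name (`redis-master` here) resolves too, which is why application config can stay identical across clusters.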
Config Management:
Very easy to inject secrets and configs as env vars or mounted into the pod. No third party library or framework needed to leverage it.
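A sketch of what "no third party library needed" means on the application side: plain standard-library Python that reads a setting from an env var first and falls back to a file mounted from a Secret or ConfigMap. The mount directory `/etc/app-config` and the key name `DB_PASSWORD` are hypothetical; they depend entirely on how the pod spec is written.

```python
import os
from pathlib import Path
from typing import Optional


def read_setting(name: str, mount_dir: str = "/etc/app-config",
                 default: Optional[str] = None) -> Optional[str]:
    """Look a setting up as an env var, then as a mounted file, then default."""
    value = os.environ.get(name)
    if value is not None:
        return value
    # Mounted Secrets/ConfigMaps appear as one file per key in the mount dir.
    candidate = Path(mount_dir) / name
    if candidate.is_file():
        return candidate.read_text().strip()
    return default


if __name__ == "__main__":
    password = read_setting("DB_PASSWORD", default="")
    print("password configured:", bool(password))
```

The same image can then move through dev, stage and prod unchanged, with only the injected values differing.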
Dev - Stage - Prod envs made easy:
The same container image can move through each env effortlessly and you can be sure there is no "artifact rot".
Extensible and open:
You can run different container runtimes such as rkt, or different pod networks and persistent storage options. There is not a single company trying to steer it in some way. Also, recently, with Helm charts it's becoming very easy to "apt-get install" whatever you want on your cluster. Very powerful and portable.
It does take some time getting ramped up but once it clicks there is no turning back.
Another use case is in media for transcoding. It is not a trivial job to orchestrate transcoding at scale, and Kubernetes with or without GPUs is an excellent solution for that, as it is trivial to set up a completely automated job queue.
Another interesting field will eventually be HPC, but there are some compute constraints that K8s does not yet satisfy scheduling-wise. There is a pluggable scheduler in the works I think, and this will eventually help. Also, the LXD example is a nice optimization, but it would not replace the scheduler in any way.
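One way such a job queue could be sketched, in Python, is a Job manifest in the work-queue style: several identical workers that each pull items from an external queue until it is empty. The image, the `QUEUE_URL` variable and the queue itself are hypothetical; the worker code that actually talks to the queue is not shown.

```python
import json


def transcode_job(queue_url: str, workers: int = 4) -> dict:
    """Build a batch/v1 Job that runs `workers` queue consumers in parallel."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "transcode"},  # hypothetical name
        "spec": {
            # Work-queue style: set parallelism, leave completions unset,
            # and let the workers exit successfully once the queue is drained.
            "parallelism": workers,
            "template": {
                "spec": {
                    # Crashed workers are restarted in place.
                    "restartPolicy": "OnFailure",
                    "containers": [{
                        "name": "worker",
                        "image": "registry.example.com/transcoder:latest",
                        "env": [{"name": "QUEUE_URL", "value": queue_url}],
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    job = transcode_job("redis://queue.example.internal:6379/0")
    print(json.dumps(job, indent=2))
```

Scaling the farm up or down is then a matter of changing `workers`, and GPU-backed workers would only need a resource request added to the container.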
[0]: https://medium.com/intuitionmachine/gpus-kubernetes-for-deep...
[1]: https://hackernoon.com/job-concurrency-in-kubernetes-lxd-and...
A bit OT, but I'd like to see how this works...
Ah, very cool - https://www.youtube.com/watch?v=wyY-lTmgb8c
It works, but the GPUs aren't very stable at 4x vs. a normal 16x.
The PSU is the Corsair AX1500i (1500W), with 10x lines for GPUs. It's robust on paper, didn't have any problem with just 4 plugged in.
But I must say... The T630s are very noisy compared to these, but so much more powerful #NotGoingBack
If you have a PSU that big then that probably isn't the problem. I thought you might be using the PSU that comes with those extender boxes and they usually are very puny (250 W or so).
Do you use it for gaming or for CUDA?
Do you run the 4 GPUs in the extender?
So many potential failure points in there. The sole use case is CUDA. Essentially I wanted a portable cluster with GPUs, and that did the work for a couple of months. Now it's getting more serious so the switch to T630 makes sense, and I repurposed the NUCs into the control plane of the K8s cluster.
https://clustercompute.com/images/image4.jpg
Which was a lot of fun.
Do you have all the GPUs internal to the T630?
Any chance of a picture (of the guts)?
I'm seriously thinking of duplicating your effort.
Replicating is not very hard. You need a lightweight x86 machine for MAAS, which takes ~20min to install, one VLAN for the iDRAC (IPMI), another for networking that can connect to the internet, and off you go. You can also enable KVM power management in MAAS to run the Juju control plane in VMs and save a box if you're limited in compute power.
https://maas.io
https://jujucharms.com/docs
https://www.ubuntu.com/containers/kubernetes for all the goodies.
If you run into problems, I am SaMnCo on #juju in freenode.
I have plenty of other hardware floating around here so no problem on hooking it all up.
Thank you for the image.
By default, the bundle comes with an "auto" setting, which will activate privileged containers only when GPUs are detected.
You can enforce "false" to remove that, but then you won't be able to run GPU workloads.
Or you can enforce "true" and have them activated all the time.
Does that answer the question? Not sure if I understood it right.
Just because I want to use a GPU shouldn't require the power to change the clock, switch UIDs, chown files, mess with logs, reboot the machine, etc.
That said, if you set the allow-privileged flag to false, GPU drivers will still be installed but you may not be able to make use of the CUDA cores.
https://kubernetes.io/docs/concepts/storage/volumes/#hostpat...
I suspect that there is a bug somewhere.
https://github.com/madeden/blogposts/blob/master/k8s-gpu-clo...
You don't need to mount the /dev entries into the container at all. The experimental support creates them automatically for you when you are using GPU resources. Perhaps it's the device nodes, not the libraries, that required privileges?
OK I gave it a try and you are absolutely right. For nvidia-smi, I could run it without mounting /dev/nvidia0, which is cool.
I was also able to run it unprivileged. I guess my mistake was to believe the example from the docs and not test without.
Thanks for sharing that, I'll update my charts and the post accordingly.
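For anyone following along, an unprivileged GPU pod along the lines discussed above might be sketched like this in Python. Big caveats: the image is a placeholder, and the resource name is an assumption. `nvidia.com/gpu` is what the NVIDIA device plugin exposes; the experimental support talked about in this thread used a different, alpha resource name, so pass whatever your cluster actually advertises.

```python
import json


def gpu_pod(gpu_count: int = 1, resource_name: str = "nvidia.com/gpu") -> dict:
    """Build a Pod that requests GPUs as a resource instead of mounting /dev."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "nvidia-smi"},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "nvidia-smi",
                # Placeholder image that is assumed to ship nvidia-smi.
                "image": "registry.example.com/cuda-tools:latest",
                "command": ["nvidia-smi"],
                # Requesting the resource is what gets device nodes created.
                "resources": {"limits": {resource_name: gpu_count}},
                # No privileged mode and no hostPath mounts of /dev/nvidia*.
                "securityContext": {"privileged": False},
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(gpu_pod(), indent=2))
```

Whether the driver libraries still need to be mounted from the host depends on the setup, so this only covers the device-node side of the question.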