Kubernetes at Box: Microservices at Maximum Velocity (box.com)
157 points by robszumski on July 22, 2016 | hide | past | favorite | 31 comments

I haven't used kube in production yet.

However, I'm using Mesos, Marathon and Chronos to manage a production environment, with service discovery glue based on Route53.

Using Docker to ship an application to a well-configured environment is just a delight; the amount of configuration needed is absolutely minimal.

However, I think people need to realize that it's only "easy" if your services aren't talking to each other or depending on one another. If service X calls service Y directly (via HTTP), it gets a bit more challenging.

The way I like to architect microservices is based on messaging: you send a message to a queue, and multiple satellite services can consume that message and act on it.
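A minimal in-process sketch of that fan-out pattern, with an illustrative `Bus` class standing in for a real broker such as RabbitMQ or SNS/SQS (all names here are made up):

```python
import queue

# Each satellite service gets its own queue; publishing a message
# delivers a copy to every subscriber. None of the services know
# about each other -- they only know the message format.

class Bus:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, service_name):
        q = queue.Queue()
        self.subscribers[service_name] = q
        return q

    def publish(self, message):
        for q in self.subscribers.values():
            q.put(message)

bus = Bus()
lang_detect = bus.subscribe("language-detection")
indexer = bus.subscribe("search-indexer")

bus.publish({"event": "recommendation_submitted", "text": "great food"})

# Each service consumes the same event independently.
print(lang_detect.get()["event"])  # recommendation_submitted
print(indexer.get()["event"])      # recommendation_submitted
```

Swapping the in-process queues for a real broker changes the transport, not the shape of the design.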

If your services are dependent on one another, the configuration gets trickier and the maintenance gets a bit harder.

Good job by Box contributing what they needed back to the core of Kube; since it got merged, I'm guessing other people will find it useful as well.

I'm using the Mesos/Marathon stack, and even that part I don't have trouble with. Mesosphere has a service called mesos-dns that takes care of service discovery, and Marathon has built-in support for dependencies (don't launch X until Y is launched).
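For instance, a Marathon app definition can declare that dependency directly via its `dependencies` field (app IDs and resource numbers here are illustrative):

```json
{
  "id": "/service-x",
  "cmd": "python -m http.server $PORT0",
  "instances": 2,
  "cpus": 0.25,
  "mem": 128,
  "dependencies": ["/service-y"]
}
```

Marathon won't start `/service-x` until `/service-y` is up and healthy.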

We have built some tools for port discovery (talking to Mesos to figure out which port service Y is running on), but even with all our tools, we recently did a complete cloud migration (to GCP), and it was as easy as backing up and redeploying Zookeeper on the new nodes. Once the slaves were up, everything was running as if nothing had changed, in under an hour.

Thanks for the feedback - may I ask if you've looked at Kubernetes at all? It appears to do exactly what you describe, and I'd love to hear if you evaluated it but found something missing. You can see some details of how this would work in a recent blog post by CoreOS (https://coreos.com/kubernetes/docs/latest/services.html).

Disclosure: I work at Google on Kubernetes.

I have looked into Kubernetes and am running a couple of POCs on it.

I agree 100%: Kube answers everything I'm missing with the Mesos/Marathon combination, and that's why I'm planning to start moving new services over.

Could you explain a bit more why things get harder if service X is using service Y? I'm sort of the opposite of you, in that I haven't used much of the Mesos stack yet but am familiar with Kubernetes. Services depending on each other is pretty trivial in Kubernetes, so I'm wondering what makes it so hard in Mesos.

It's not that it's hard to do it with Marathon/Mesos, it's just hard to maintain an application with dependent services.

It's just a matter of personal comfort: I like isolated services that communicate only via messaging and fire a message when they're done with their role.

Even if Kube handles everything perfectly, it's still harder to maintain an application with inter-service communication: it's hard to follow problems/errors/stack traces, etc.

Have you checked out DC/OS? It's built on Mesos/Marathon/etc. and has some pretty cool service discovery/routing built in, e.g. Minuteman, which lets you define virtual IPs and ports that map to specific services.

Haven't tried DC/OS. Been using the open source solutions with custom glue code around it.

DC/OS is an open source solution: https://dcos.io/

Anyone working on K8s at Box (or, I guess, anywhere else that has at least partially deployed it), feel free to answer this:

How do you handle gatewaying traffic into Kubernetes from non-K8s services? I've been trying to get a basic cluster out the door with one of our most stateless services, but I'm having a hard time just getting the traffic into it.

The mechanism I'm using is having dedicated K8s nodes that don't run pods hold onto a floating IP to act as gateway routers into K8s. They run kube-proxy and flannel so they can reach the rest of things, but ksoftirqd processes are maxing out CPU cores on relatively recent CPUs while handling about 2Gbps of traffic (2Mpps), which is a bit below the traffic level the non-K8s version of the service is handling. netfilter runs in softirq context, so I figure that's where the problem is.

Are you using Calico+BGP to get routes out to the other hosts? What about kube-proxy?

I work at Box on this project.

Our network setup is constantly evolving due to a number of internal networking limitations related to nearly static IP addressing and network ACLs. I'll describe our current setup and then describe where we'd like to go. The big piece of context is that we already have a number of services being managed via Puppet, and a smaller number of new and transitioned services in Kubernetes, so we need to allow interop through a number of different mechanisms.

We are currently using Flannel for IP-per-pod addressability within our cluster. No services are communicating inside the cluster, so they aren't using kube-proxy yet. For services outside the cluster talking into the cluster, we are using a heavily modified service-loadbalancer (https://github.com/kubernetes/contrib/tree/master/service-lo...), which we haven't contributed back yet. It supports SNI and virtual hosts, and we get HA and throughput for the individual load balancers by using anycast.

We have a number of internal services outside the cluster slowly moving to SmartStack, so I assume we will be figuring out interop with that and running it as a sidecar at some point. We would like to move to Calico, as we have some fairly high-throughput services running outside the cluster, which we need to avoid bottlenecking behind a load balancer. We have a separate project running internally to move our network ACLs from network routers onto every host via Calico.

Hope that is more helpful than confusing.

Thank you for that answer; it's helpful. We've also been considering Calico, but it seems like a fair bit of work, and the project's pretty overdue as it is.

The K8s slack channel is pretty good for things like this.

You can either bind the container to a host port and register the IP of the node (or use the K8s DNS or API to find the IPs). Otherwise, register a service with a NodePort, and all the nodes will accept traffic and load-balance internally.
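A hedged sketch of that second option, a NodePort Service (the name, labels, and ports are all illustrative):

```yaml
# Exposes the matching pods on a fixed port of every node; traffic
# hitting <any-node-ip>:30080 is load-balanced across the pods.
apiVersion: v1
kind: Service
metadata:
  name: my-stateless-svc       # illustrative name
spec:
  type: NodePort
  selector:
    app: my-stateless-svc
  ports:
    - port: 80                 # cluster-internal service port
      targetPort: 8080         # container port
      nodePort: 30080          # must fall in the cluster's NodePort range
```

External systems then only need to know any node IP plus the fixed NodePort.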

You can get a list of IPs from the DNS (instead of just the service IP), and I think that interacts appropriately with host ports.
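A small Python sketch of looking those IPs up via DNS (the service name below is hypothetical; for a Kubernetes headless service, the name resolves to the individual pod IPs rather than one virtual IP):

```python
import socket

def resolve_endpoints(name, port=0):
    """Return the distinct IP addresses a DNS name resolves to."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

# e.g., inside a cluster (hypothetical service name):
# resolve_endpoints("my-svc.default.svc.cluster.local")
```

A client can then pick among the returned addresses itself instead of relying on the service VIP.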

We ran into the same ksoftirqd issue in our own bare-metal deployment. It turns out there's a performance regression in the Linux kernel that manifested when we configured the system with more receive queues than we had physical cores in a single socket.

We dropped the receive queues down to 12, from 48, and hit line rate. More info here:
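For reference, the receive-queue (channel) count can be inspected and changed with ethtool along these lines (interface name and counts are illustrative; check your own NIC's limits first):

```shell
ethtool -l eth0                # show current and maximum channel counts
ethtool -L eth0 combined 12    # cap combined queues at 12 (was 48)
# Some NICs expose separate "rx"/"tx" counts instead of "combined".
```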


I don't work at Box. It has also been 6 months since I touched K8s, so a lot of the details I had about K8s in working memory are gone. I'm also interested in the answers to the question you raised.

Off the top of my head:

Have you thought about putting flanneld on the machines hosting the non-K8s services? Probably impractical, but it's something to consider.

The other is to treat the services inside the cluster as if they were in a different datacenter, and explicitly expose NodePorts that the other services need. If you're using HTTP as the transport, maybe run an HTTP proxy inside the cluster that forwards to the services within it. That's how I got AWS ELB to talk to the services in the cluster I set up.

The trick with flanneld on our other hosts is that, AFAICT, there's no way to run flanneld purely as a "grab routes and install them" agent without it acquiring a totally unnecessary (and completely unused) subnet lease.

I have considered just writing a quick daemon that does only the work of syncing routes without taking a lease (or modifying flanneld to do so).

The service in this case is memcache with a bunch of mcrouter pods in front of it to handle failure and cold cache warming. I still need to get traffic to the mcrouter instances and that's where I'm running into the bottleneck.

Fair enough. I'm not familiar with mcrouter or memcache.

Fronting the mcrouter pods with a service and using a node port (http://kubernetes.io/docs/user-guide/services/#type-nodeport) is not workable?

Are you running on physical hardware?

Yes, Dell 1950s and R420s. The gateways are R420s with Intel 10gbit cards.

This is the first use case I have seen where microservices start to make sense.

My question is: what about network security? How is that part managed?

We are using a product called Calico, which integrates with Kube, OpenStack, and bare metal to set up iptables rules that simulate network ACLs on all the receiving hosts, so that services are only able to reach network endpoints which are whitelisted for them.
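Not from the article, but to illustrate the whitelisting model: in Kubernetes, Calico can enforce NetworkPolicy objects along these lines (all names, labels, and ports here are made up; the API group has changed across Kubernetes versions):

```yaml
# Only pods labeled app=frontend may reach the payments pods on 8443;
# all other ingress to those pods is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: payments
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8443
```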

Disclaimer: I work for Box on Kube.

Does the conntrack module cause performance hits, or do you disable parts of it? Any TCP kernel tuning on the bare-metal side?

Really cool story - it's been awesome to see how Box has contributed back to the community as well!

Disclosure: I work at Google on Kubernetes.

Totally lost it at "We knew we'd ultimately need dozens (even hundreds) of microservices to be successful" and did not read any further. I have a very hard time seeing that as a criterion for success, not to mention imagining how that mess is managed. Is it really common to have so many microservices?

Shame you didn't read any further; it was a good article. Though not necessarily a criterion for success, some businesses have requirements that make many services make sense. We may not need that many services, but it's always interesting to learn how folks solve their engineering problems.

You can easily end up with dozens of services if you split up your application aggressively enough. Think about all the parts of your application that are handled by workers like Sidekiq/Celery; these can all be standalone applications rather than part of the monolith.

For example, at Gogobot, every time a user submits a recommendation we detect its language. This can be a service instead of a worker, and the code can live separately.

If you do this often enough and aggressively enough, you end up with dozens of services. Once you have an environment that makes it easy to test/launch these, it's much more efficient to launch a service than to replicate your monolith and assign a worker to things.
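A toy sketch of the pattern described above: the language-detection worker pulled out as a standalone HTTP service that the monolith (or a message consumer) calls. The detection logic is a deliberately naive placeholder, and all names are illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def detect_language(text):
    # Placeholder heuristic, not a real language detector.
    return "fr" if " le " in f" {text} " else "en"

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body like {"text": "..."}.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(
            {"language": detect_language(body.get("text", ""))}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the sketch quiet

# To run it standalone:
#   HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

The same code could just as easily sit behind a queue consumer; the point is that it deploys and scales independently of the monolith.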

If your application is poorly architected, then when you split it you just spread that pain around to more systems.

Design it, then split it where it makes sense.

I have talked to a number of people at large companies who have dozens to hundreds of services. Each team typically has a few services, but when you have hundreds to thousands of engineers, having dozens to hundreds of services seems totally reasonable to me.

I hope you all appreciate the fact that the kubernetes team was initially required to order new hardware for spinning up services.

s/kubernetes/box no?

s/foota/has no reading comprehension
