Hacker News
Horrors of Using Azure Kubernetes Service in Production (movingfulcrum.com)
371 points by pdeva1 5 months ago | 134 comments

My $DAYJOB is leading a team which develops applications and gateways (for the 1k+ employee B2B market) that integrate deeply with Azure, Azure AD and anything that comes with it. We do have Microsoft employees (who work on Azure) on our payroll, too.

I can tell you, as I'm sure anyone on my team can, that Azure is one big alpha-stage amalgamation of half-baked services. I would never ever recommend Azure to literally any organization, no matter the size. Seeing our customers struggle with it, us struggle with it, and even MS folks struggle with even the most basic tasks gets tiring really fast. We have so many workarounds in our software for inconsistency, unavailability, questionable security, and general quirks in Azure that it's not even funny anymore.

There are some days where random parts of Azure completely fail, like customers not being able to view resources, role assignments or even their directory config.

An automated integration test for one of our apps, which makes heavy use of the Azure Resource Management APIs, fails dozens of times a week, not because we have a bug, but because state within Azure (RBAC changes, resource properties) fails to propagate within a timeout of more than 15 minutes!

Two weeks back, the same test managed to reproducibly produce a state within Azure that completely disabled the Azure Portal resource view. All "blades" in Azure just displayed "unable to access data". Only an ultra-specific sequence of UI interactions and API calls could restore Azure (while uncovering a lot of other issues).

That is the norm, not the exception. In 1.5 years worth of development, there has never been a single week without an Azure issue robbing us of hours of work just debugging their systems and writing workarounds.


On topic though, we've had good experiences with these k8s runtimes:


- Rancher + DO

- IBM Cloud k8s (yeah, I know!)

Haha... I have experience with Azure as well and have seen both good and bad things. As soon as I read the title, I was pretty sure a post like this was coming. When Kubernetes became popular, I tried it on Azure and both the scripts and the documentation were broken. Once I realized that, I stopped trying.

Regarding Azure in general: Azure Websites is c*. Having used Heroku and App Engine for some time before, this feels like a joke. Deployments sometimes work, sometimes they don't. Have to deal with node-gyp? Don't, just don't. If you are ever forced to use Azure Websites (free startup package? ;)), learn Ansible as soon as possible and convince your team to switch to VMs.

The VMs are okay; you can't do much wrong with them. I don't really know where the complexity of Azure Websites comes from, maybe from the fact that it runs on Windows, but that cannot be the full explanation. I have seen people work with Node on Windows (even without Ubuntu on Windows) and they were fine. For anyone interested, this is the Azure Websites backend: https://github.com/projectkudu/kudu

Disclaimer: my long adventure with it was years ago; maybe the service has changed 100%, but I doubt it.

My team uses CosmosDB heavily (so far, though not for much longer) and it is another half-baked service. Support is outsourced (Microsoft handed Azure support to an Indian company, MindTree) and the agents are not very knowledgeable about the CosmosDB service. They always point us to the URL of a web article (from Microsoft, of course), say everything will work if you follow it, and close the ticket. Over time we realized that we know more about CosmosDB than they do, and we started ignoring their replies, though we still raise tickets to make sure they are aware.

I generally like Microsoft, but their official support channels are pretty terrible. Just go looking for something in the MSDN forums - a large percentage of the posts from "Microsoft" people are telling the customers/developers that they posted in the wrong forum (often incorrectly), or suggesting some inane thing that the original post already specified and then closing the thread. GitHub issues are slightly better, although if you get off the beaten path of the new hotness, responses get very thin.

You're much better off trying to find some backchannels via MVPs on Twitter or through blogs, or figure out the developers or evangelists that give talks on this kind of stuff and contact them directly.

My company uses Azure quite heavily, but not Kubernetes.

No crashes, ever. Way more reliable than AWS ever was. (GCP is our failover.)

So it seems that your experience is, from my POV, the exception. Maybe there's something wrong with the way you guys have Azure set up?

Which provisioning model are you using -- ASM or ARM? The last time I used Azure we used the (deprecated) ASM model which was pretty stable instead of the newer and often broken ARM model. We ended up staying on the deprecated model until we moved to AWS (for unrelated reasons).

Wow, our experiences were different. I found ARM to be a great tool compared to ASM.

The moment I removed the last ASM bits, my entire infrastructure became reliably versioned and deployable.

That's really great to hear!

This has been my experience as well, I was surprised to read about some of the issues that were posted. Been using azure app service for about 50 .net/core services for over a year with 100% uptime. Guess I'm just lucky!

In most cases the cause lies with the DevOps team and not with Azure, GCP, or AWS. I can attest to that, having screwed up some configs early on myself. That being said, this is a very new offering from Microsoft and it's possible it has some kinks to work out.

The biggest red flag I saw when working with Azure is that a lot of their CLI commands ("az do-a-thing") actually follow the pattern of "retry until it works"... and the first few tries often actually fail!
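For what it's worth, we ended up wrapping flaky CLI calls in our own bounded retry rather than trusting the tool's built-in retries. A minimal sketch (the `az group show` call and the backoff values are just examples, not anything Azure recommends):

```shell
# Bounded retry with linear backoff for flaky CLI calls.
# Usage: retry <max_attempts> <command> [args...]
retry() {
  attempts=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempts: $*" >&2
      return 1
    fi
    sleep "$n"        # linear backoff; exponential would be gentler on the API
    n=$((n + 1))
  done
}

# Hypothetical usage:
# retry 5 az group show --name my-resource-group
```

Crude, but it keeps the retry policy in one place instead of scattered through scripts.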

Thanks for the shout-out to IKS (IBM Cloud). You have no idea how obsessed we are with this service, so it's always great to see someone notice. :)

We were very surprised by the quality of IBM's Kubernetes service too! We had a cluster there for almost a year and everything ran very smoothly.

We missed having more instance types to choose from, but it was a nice experience.

We are constantly working to add more machine-types based on customer requests. Recently we have added several new flavors including bare metal systems. You may want to check them out. https://console.bluemix.net/docs/containers/cs_clusters.html...

So, just to understand: are you writing the software because the large business clients are already tied to Azure and are going to keep using it? Trying to understand why there is a market, and why people would want to use it.

Can't speak to the Azure Resource Management APIs, but I've had a very different experience with Web Apps, Azure SQL, Storage, and Traffic Manager. Other than a few short-lived bouts of missed writes into Service Bus and Cosmos (their worst product, IMO), the platform has achieved 100% uptime for us for several years. Pretty amazing. From what you're describing, I'm guessing you're REALLY not their average user and they de-prioritize quality on your use cases. Probably better off on another platform if you can help it.

Some stuff works really well; I had a very good experience with Table and Blob Storage, and Traffic Manager as well. But seriously, these are pretty basic services. ;-) What did you host on Web Apps, if I may ask?

It's basically a SaaS eCommerce app. Kind of an enterprise-class OLTP solution, but with pretty unusual traffic bursts and, due to what we're selling, very high DB contention. We test with 100,000 concurrent users and have achieved about 10,000 orders per minute, much more than our credit card gateway (Stripe) will allow us to process. There are some rough edges here and there, but I've experienced just as bad and sometimes worse with AWS.

I've had wildly different results. My shop wasn't large by any means, but Azure worked pretty much perfectly for us. The only issue we ever had was when resizing an Azure SQL DB went from a 20-minute operation to taking up to an hour sometimes. Other than that, it let us scale as we wanted and let the engineers duplicate environments with their local changes arbitrarily. It gave us a $400/month bill for something that would otherwise have needed a full-time DevOps person to handle with 10 engineers.

By DO, do you mean DigitalOcean?

I guess so since RancherOS is available as a fully supported option (as well as Fedora Atomic and CoreOS).

FWIW we're running a simple, custom cluster made of Debian droplets set up using kubeadm.

I think DO stands for DC/OS in this context

No, I actually meant DigitalOcean. We tested both Rancher (k8s) and Portainer (Swarm) backed by DO Infrastructure. Both worked well. Of course it's not a managed solution, but both are operationally very easy. DO also announced native k8s support, so I'm excited for that.

It's in beta and closed access at the moment though: https://www.digitalocean.com/products/kubernetes/ I'd be interested in it for a test/QA environment, but I guess combined with 'CPU Optimized Droplets' it could work for production loads too.

> That is the norm, not the exception.

Have you never used any other Microsoft software?

I mean it is the same software company that made Windows ME, Vista, 7, and 10, along with countless other chocolate covered turds.

(Eng lead for AKS here.) While lots of people have had great success with AKS, we're always concerned when someone has a bad time. In this particular case the AKS engineering team spent over a day helping identify that the user had overscheduled their nodes by running applications without memory limits, resulting in the kernel OOM (out of memory) killer terminating the Docker daemon and the kubelet. As part of this investigation we increased the system reservation for both Docker and the kubelet to ensure that in the future, if a user overschedules their nodes, the kernel will only terminate their applications and not the critical system daemons.

Does it seem weird to anybody else that a vendor would semi-blame the customer in public like this? I can't imagine seeing a statement like this from a Google or Amazon engineer.

It also seems to ignore a number of the points, especially how support was handled. I think it's bad form to respond only to the one thing that can be rebutted, ignoring the rest. And personally, I would have apologized for the bad experience here.

While it might be phrased in a way that implies the customer is partly to blame, the actual details would indicate the main problem was with Azure Kubernetes Service. Critical system daemons going down because the application uses too much memory is not a reasonable failure mode (and the AKS team rightfully fixed it).

Exactly. The whole point of offering a service to the public is that you know more than other people. So of course customers will do wrong things, be confused, etc.

In Microsoft's shoes, I would have strongly avoided anything that sounded like customer blame. E.g.: "We really regret the bad experience they had here. They were using the platform in a way we didn't expect, which led to an obviously unacceptable failure mode. We appreciate their bringing it to our attention; we've made sure it won't happen going forward. We also agree that some of the responses from support weren't what they should have been and will be looking how to improve that for all Azure users."

The goal with a public statement like this isn't to be "right". It isn't even to convince the customer. It's to convince everybody else that their experience will be much better than what is hopefully a bad outlier. The impression I'm left with is that a) Azure isn't really owning their failures, and b) if I use their service in a way that seems "wrong" to them, I shouldn't expect much in the way of support.

...and apparently forgot to notify the customer of it, and in general to communicate with the customer better.

I think this is the main reason for AWS's lead. They simply treat customers right (well, better than G and MS anyway).

Yet Azure is the top cloud provider, and AWS is #2.

They are number one because Microsoft doesn’t break out Azure and Office 365 revenue.

My understanding, which may be incorrect, is also that they consider all SPLA revenue as cloud revenue as well.

(SPLA is the licensing paid by service providers that lease customers infrastructure running Microsoft products. So if you pay some VPS or server provider $30/mo or whatever they charge for Server 2012, and they turn around and send $28 of it to MS, MS reports that $28 as cloud revenue.)

Well, this is on the front page, the top comment is misinformation, the posters left out details that made them look bad, and they seem to be waging a smear campaign out of spite on every platform they have. At what point is any of this in good faith?

What makes you think it's not in good faith? As far as I can tell, Prashant Deva had a series of bad experiences on Azure, including significant downtime. He's mad, and he's saying so.

From his perspective he was using it right; from Azure's apparently he was using it wrong. A difference in perspective isn't bad faith.

Probably the part where he never actually says it's a difference in perspective; that's your take. He says AKS is terrible, etc., etc. You're giving him the benefit of the doubt, which I appreciate, but he's gone too far in his bias. Maybe there's a real issue underlying it, one that Hacker News clearly wants to indulge, but the threshold has been crossed.

He doesn't have to say it's a difference in perspective. He's giving his perspective. That's what blog posts generally are.

I note that you don't say your comments here are just your perspective as you trash-talk him. Does that mean you're pursuing a smear campaign and not acting in good faith? Why should he be held to a standard you yourself aren't willing to follow?

I would prefer a vendor respond publicly rather than request a private message. It's possible that one side was angry, and a blog post that makes it onto HN will surely get a ton of negative attention. If that's the case, they should have the right to clear up anything they'd like. I didn't read it as blame, but as explanation.

I think it's good to listen to both sides, but the response from Azure eng could be more professional. Customers have the right to do anything, whether technically right or wrong. The reply's attitude reads more like blaming and throwing out random tech details than explanation.

Steve Jobs told customers they were holding their phones wrong, so, to me, not really.

Jobs was a dick; Microsoft has PR and developer relations teams that are trained in how to provide feedback to their community.

No. If the customer is at fault there is no problem in blaming them especially if they run a smear campaign.

Well, we really don't know what was said, as the blog didn't actually provide any of the original communications. It's a he-said-she-said thing at this point. Frankly, the author comes across as having a huge axe to grind. That may be for good reason, but it's hard for me to judge the quality of Azure support when we never see any of their communications, just paraphrases.


I like hearing the other side of the story. I wish I got the other side more often.

> AKS engineering team spent over a day helping identify that the user had over scheduled their nodes, by running applications without memory limit, resulting in the kernel oom (out of memory) killer terminating the Docker daemon and kubelet.

I'm a bit confused why the cluster nodes don't come configured like this out of the box... Kubernetes users aren't supposed to have to worry about OOM on the underlying system killing ops-side processes, are they?

They do if the cluster admin didn't set up proper system-reserved and kube-reserved (both are kubelet flags) and configure enforcement.
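For anyone curious what those knobs look like, here's a rough KubeletConfiguration fragment (the values are invented for illustration, not anyone's actual settings):

```yaml
# Fragment of a kubelet config file (KubeletConfiguration).
# systemReserved/kubeReserved carve resources out of node capacity;
# enforceNodeAllocatable makes the kubelet enforce the pod ceiling;
# evictionHard evicts pods before the kernel OOM killer fires.
systemReserved:
  cpu: "500m"
  memory: "1Gi"
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
enforceNodeAllocatable: ["pods"]
evictionHard:
  memory.available: "200Mi"
```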

In this case, the cluster admin would be whoever provisions the cluster nodes. In Google Kubernetes Engine, the "Capacity" and "Allocatable" figures shown on the nodes differ (I see some memory/CPU reserved, probably for system stuff). This makes me think GKE automatically subtracts the system reservation from node capacity.

P.S. I work at Google.

Correct, it should be provisioned by the k8s provider (AKS in this case), and that is what GKE does: https://cloud.google.com/kubernetes-engine/docs/concepts/clu...

Note, it also needs to match the node configuration (specifically how cgroups are set up), so I doubt this works well on EKS, which is BYO node. Maybe that's the issue with AKS too; I don't know enough about how it works...

AKS now reserves 20% of memory on each agent node, plus a very small amount of CPU, to keep the Docker daemon and kubelet functioning alongside misbehaving customer pods. However, that just means the customer's pods will be evicted, or will have no place to schedule, when all resources are used up. This is something we now see in customer support cases.

That seems crazy high. If I have a node with 512G of RAM, will the kubelet/system take 100G? Why would the kubelet ever need that much?

AKS caps at 4G.

4GB of RAM per machine, or 4GB reservation of a 20GB machine?

'My stuff didn't work on AKS' is one thing; 'my stuff brought AKS and the dashboard down' is a fundamental failure that is in no way mitigated by this comment, and it feels very dishonest to try to redirect the blame for it.

My experience with azure has been reasonably positive, but even I've seen some weird stuff where things randomly don't work (AAD) or the dashboard just refuses to show anything for a while.

That this is a widespread endemic problem in Azure seems entirely plausible...

it is unclear what this response hopes to achieve. it is mentioned in the post that our containers do crash. that should under no condition cause the underlying node to go down. this has even been pointed out by others responding to this thread. it is interesting though that none of the other issues in the blog post are brought up.

Setting aside the workarounds and safety margins discussed in other comments, I would expect a reasonable operating system to allow explicitly prioritizing processes so that the important ones can only run out of memory after all user processes have been preemptively terminated to reclaim their memory. I would also expect a good container platform to restart system processes reliably, even if they crash.

Yeah, it should do that. You can read up on how the kubelet and the Linux OOM killer interact in k8s here: https://kubernetes.io/docs/tasks/administer-cluster/out-of-r.... Once the OOM killer kicks in, though, I think you're in a pretty bad place.

Scheduling is only really going to work well if you set limits, requests, and quotas for containers. Please do this if you're running containers in production. I know it's a pain, as it's non-trivial to figure out how much resource your containers need, but the payoff is that you avoid the issues described in the article.
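In pod-spec terms that means something like the following (a minimal sketch; the names, image, and numbers are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app            # hypothetical workload
spec:
  containers:
  - name: app
    image: example/app:1.0     # placeholder image
    resources:
      requests:                # what the scheduler sets aside on a node
        cpu: "250m"
        memory: "256Mi"
      limits:                  # hard caps; exceeding the memory limit
        cpu: "500m"            # gets this container OOM-killed, not the node
        memory: "512Mi"
```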

The system reservation change was very welcome for me as well.

Note that a service like AKS also draws in new customers who may not yet have years of Kubernetes experience. I'm one of those, for example: I created an AKS cluster so we could deploy short-lived environments for branches of our product. We're using GitLab and its 'Review Apps' integration with Kubernetes.

The instability the author of this article experienced is something I ran into as well, and I spent a lot of time draining, rebooting, and scaling nodes to try to find out what was happening. I would never have guessed that the absence of resource limits could kill a node.

Fortunately these instabilities disappeared a couple of weeks ago after a redeployment of the AKS instance, and it has been stable ever since. I guess the system reservation change was included there? From my perspective that was also the moment AKS truly started feeling like a GA product.

Sounds like you're still beta testing

Ah, and Hyper-V supports dynamic memory, so the system reservation backing can effectively be thin provisioned. That's nice. (Hm, dynamic memory probably got switched on from the start.)

Thanks for posting this here. It would be cool if there were a way to hold application users to account without needing to chase viral Internet posts and do your best to pin some accurate reporting on after the fact. A tricky general problem.

If there's one thing I miss on Azure (and AWS), it's the perpetually free 600MB-RAM KVM VM GCloud gives everyone to play with. It only includes 1GB of outbound bandwidth, but inbound bandwidth is free, and I can do pretty much whatever I want with it. But anyways...

I don't think Azure ever uses dynamic memory for VMs: if I SSH into a VM, I see the full allocation of whatever size it was supposed to be right off the bat.

I think this has to do with cgroups and ensuring the OOM killer doesn't target what is essentially the `init` process of a Kubernetes cluster - the docker daemon or kubelet.

Last time I tried to use AKS I just got cryptic errors about the size of VMs available in Europe so I gave up and used GCP.

This is a pretty bad mistake from the customer if this is true. If not done already it would probably be good to expose Prometheus metrics on CPU/Memory usage per node.

Yes, this is true on its face: it's bad to deploy containers to k8s without appropriate resource limits. However, this should in no way affect the operation of the node, so the implied transfer of responsibility for this incident from AKS to the customer is invalid, IMO.

It does if you don't have enough system reserved. It causes the kubelet to stop functioning well, and eventually the node goes sideways due to OOM killing.

Right, I think we're saying the same thing. If the node is properly configured an end-user pod should not be able to take down the kubelet.

lol so aks forgot to provision enough resources and possibly set up enforcement, and you are blaming the user? the user should be able to run as close to the edge of "allocatable" as possible, or even go over it and be OOM-killed, without bringing down the entire node. this functionality is even built into the kubelet already. there's no way you can twist this into user error.

More generally, I should be able to choose to run an interruptible workload that I know leaks memory. I should expect that if I don't, one of my coworkers will, and the node will stay up. Not leaving enough RAM for the node's core processes is a mistake, but far from the worst thing in the world.

We are indeed working on more convenient container monitoring and logging on Azure portal.

At Google you can't even run anything on Borg until you specify how much memory it will use. You also have to specify how many cores you need and how much local (ephemeral) disk. And the memory limit is hard: your task is killed without any warning if it attempts to exceed it. I was actually puzzled to discover that these limits are not required on k8s. Not only does this lead to screwups like this one, it also makes it impossible to schedule workloads optimally, because you simply don't know how much of each resource each job is going to use.
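For what it's worth, a cluster operator can get closer to the Borg behavior with a LimitRange, which injects defaults (and can enforce maximums) for containers that don't declare anything. A sketch with arbitrary values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits   # hypothetical name
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:    # applied when a container omits requests
      cpu: "250m"
      memory: "256Mi"
    default:           # applied when a container omits limits
      cpu: "500m"
      memory: "512Mi"
    max:               # reject containers asking for more than this
      memory: "2Gi"
```

It doesn't make limits mandatory the way Borg does, but combined with a ResourceQuota it gets most of the way there.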

That's not actually how this works on Borg these days (and by "these days" I mean the past 5+ years), and there's nothing about k8s not requiring limits by default that led to this.

I'll let current googlers comment on that. That's how it worked 3 years ago when I was there. You could also let Borg learn how much a job is going to use, but no serious service that I'm aware of used this for anything in Prod.

I left years ago and there were serious services using Autopilot. The name is no longer secret, see https://github.com/kubernetes/kubernetes/issues/44095 or Tim Hockin's slides from two years ago, where he revealed that 2/3 of Borg users rely on it: https://speakerdeck.com/thockin/everything-you-ever-wanted-t...

The slide merely says "most Borg users use Autopilot", which could easily be true. Heck, I used it myself for non-production batch jobs. Those jobs were run as me. Any engineer at Google can spin up a job, and I'd venture to guess that most of them run at least something there every now and then. That's ~40k logical "users" as of 2018. The interesting question (which I admit I don't know the answer to as of today) is whether users that run search, ads, spanner, bigtable, and other shared service behemoths use Autopilot. FWIW my team did not use it at all.

Mental note - create a new cgroup for docker and kubelet.


When I deploy to Amazon ECS, the upper bound of my service's resource geometry is checked, and if it exceeds the capacity available in the underlying cluster, the deploy is refused. I understand k8s has similar features. It reads like Azure doesn't have their k8s configured correctly.

If the containers in a pod request more RAM than is available on any node in the cluster, the pod will fail to schedule and will remain in the Pending state, which can be seen in the events for the controller (ReplicaSet, DaemonSet, etc.) using, for example, `kubectl describe replicaset myreplicaset`. We've gotten ourselves into this situation a few times on GKE. It's easily resolved by tuning the resource requests or scaling the node pool, and it has no adverse effect on the operation of the cluster.

>the AKS engineering team spent over a day helping identify that the user had over scheduled their nodes, by running applications without memory limit, resulting in the kernel oom (out of memory) killer terminating the Docker daemon and kubelet.

Sounds like a bunch of people have just learned about the OOM killer for the first time. I mean, production systems with overcommit and the OOM killer running loose, and I bet without swap... and they blame the customer. Sounds like a PaaS MVP quickly slapped together by an alpha-stage startup. You may want to look into the man pages, in particular OOM scoring and the value -17.

Actually, the kubelet should already be adjusting OOM scores to make sure that user pods (containers) get killed before the kubelet or the Docker daemon. Why didn't that work here?

Adjusting scores for other processes skews the odds but doesn't guarantee anything. The way to guarantee it for a given process is to disable the killer for that particular process.

Interesting. The kubelet seems to use varying negative OOM score adjustments to prioritize killing[0], but if I'm reading the kernel code right, anything at -999/-998 would return 1 from the badness function and essentially be an equally valid kill target unless it was using over 99.9% of available memory.[1]

I see OOMScoreAdjust=-999 being used for the kubelet, but why not -1000? -999 seems like it would be just as likely to be evicted as -998, unless the for_each_process(p) macro always walks processes from first to last?



Seems that way to me too: everybody like the kubelet and "guaranteed" containers gets 1.

>unless the for_each_process(p) macro always goes first to last processes?

It seems it would usually get to the first processes first (see the macro below), i.e. it would reach the "top" processes like the kubelet, docker, etc. before the containers.

  #define next_task(p) \
      list_entry_rcu((p)->tasks.next, struct task_struct, tasks)

  #define for_each_process(p) \
      for (p = &init_task ; (p = next_task(p)) != &init_task ; )

Given that "chosen" is updated only "if (points > chosen_points)", it seems the first listed process with score 1 will remain the "chosen" one in that situation, i.e. it will be one of the top processes like the [-999] kubelet, not a [-998] container.

From a provider of Azure's class, I'd have expected them not to rely on that machinery and instead to disable the killer for the top processes outright.

Interesting. How does one disable the killer for a process? I thought on Linux it was only possible to adjust OOM scores.
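My understanding (a sketch, Linux-specific): writing -1000, i.e. OOM_SCORE_ADJ_MIN, to a process's oom_score_adj makes it ineligible for OOM killing entirely, while values from -999 to 1000 only shift its badness score. Lowering the value requires root or CAP_SYS_RESOURCE:

```shell
# Exempt a process from the OOM killer; the current shell stands in
# as a hypothetical target. -1000 == OOM_SCORE_ADJ_MIN == "never kill".
pid=$$
echo -1000 > "/proc/$pid/oom_score_adj" \
  || echo "not privileged enough to lower oom_score_adj" >&2
cat "/proc/$pid/oom_score_adj"
```

(The old oom_adj interface used -17 for the same thing, which is where that number upthread comes from.)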

Worth noting that both Microsoft's and Amazon's Kubernetes offerings are very new (literally weeks since GA). While "officially" ready, it is pretty naive to rely on them for production-critical workloads just yet, at least compared to Google Kubernetes Engine, which has been running for years.

If you absolutely need managed Kubernetes, stick to GCP for now.

I believe this is a cultural problem with Microsoft. It's probably similar at other companies, but it was very evident at Microsoft: the people responsible for allocating resources (the management chain) rarely dogfood the product.

While the engineers and PMs would complain a lot about quality issues, management wants to prioritize more features. It was a running joke at Microsoft: no one gets promoted for improving existing things; if you want a quick promo, build a new thing.

So when you see a bazillion half-baked things in Azure, that's because someone got promoted for building each of those half-baked things and moving on to the next big thing.

Going from 0 to 90% is the same amount of work as going from 90% to 99%, and the same again from 99.0% to 99.99%. Making things insanely great is hard and requires a lot of dedicated focus and a commitment to set a higher bar for yourself.

Here's a fun fact about Azure Kubernetes:

1. Deploy your Linux service on k8s with redundant nodes

2. Create a k8s PersistentVolumeClaim and mount it in your pods to give your application some long-lived or shared disk storage (e.g. for processing user-uploaded files).

3. Wait until the subtle bugs start to appear in your app.

Because persistent k8s volumes on Azure are provided by Azure disk storage service behind the scenes, lots of weird Windows-isms apply. And this goes beyond stuff like case insensitivity for file names.

For example, if a user tries to upload a file called "COM1" or "PRN1", it will blow up with a disk write error.

Yes, that's right, Azure is the only cloud vendor that is 100% compatible with enforcing DOS 1.0 reserved filenames - on your Linux server in 2018!
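If you're stuck writing user-supplied names to storage with these rules, one workaround is to screen reserved names up front. A hypothetical pre-upload check (the list follows the Microsoft naming rules quoted elsewhere in this thread):

```shell
# Returns 0 (true) if the name is reserved on Windows/SMB shares:
# CON, PRN, AUX, NUL, CLOCK$, COM1-COM9, LPT1-LPT9, "." and "..",
# matched case-insensitively and ignoring any extension.
is_reserved() {
  case "$1" in .|..) return 0 ;; esac
  base=$(printf '%s' "${1%%.*}" | tr '[:lower:]' '[:upper:]')
  case "$base" in
    CON|PRN|AUX|NUL|CLOCK\$|COM[1-9]|LPT[1-9]) return 0 ;;
  esac
  return 1
}

# e.g. guard an upload path by renaming:
# is_reserved "$filename" && filename="_$filename"
```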

You're not using Azure Disks because they are attached to your VM as a block device and have no knowledge of the file system. PVs in AKS using Azure Disks can only be attached to a single node, as clearly stated in the documentation: https://docs.microsoft.com/en-us/azure/aks/azure-disks-dynam...

>> An Azure disk can only be mounted with Access mode type ReadWriteOnce, which makes it available to only a single AKS node. If needing to share a persistent volume across multiple nodes, consider using Azure Files.

So you must be using a file share across multiple nodes using Azure Files, which is a SMB file share service that may have compatibility issues with the Samba protocol as described in the (arguably hard to find) docs: https://docs.microsoft.com/en-us/rest/api/storageservices/na...

>> Directory and file names are case-preserving and case-insensitive.

>> The following file names are not allowed: LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, PRN, AUX, NUL, CON, CLOCK$, dot character (.), and two dot characters (..).

The only way this would make any sense is if you were using Azure Files, rather than Azure Disks. There's virtually never a time when it makes sense to use Azure Files over Azure Disks, and even when it does, a change in the application would be better advised than using Azure Files.

>Yes, that's right, Azure is the only cloud vendor that is 100% compatible with enforcing DOS 1.0 reserved filenames - on your Linux server in 2018!

This is hyperbolic bordering on flatly false. This is more reasonable and accurate:

"Azure is the only cloud vendor that serves their Samba product from Windows boxes, and thus leak Win/NTFS-isms into their Samba shares [that shouldn't be used anyway]."

How would an ext4 filesystem, mounted under Linux, attached as a block device to a VM, be subjected to Windows-isms? What you're implying doesn't even make sense.

I really have to disagree with the statement that one should never use Azure Files over Azure Disks.

1. Most Azure VM types have very stringent limits on attached disks; a K8s worker can easily blow past this limit.

2. You have tremendous complexity to deal with: pick Azure managed disks vs. unmanaged disks on storage accounts (you can't mix them on the same cluster). You have to understand the trade-off of standard vs. premium storage and how they bill (premium rounds up and charges by capacity, not consumption). And you need the right VM types for premium.

3. Managed disks each create a resource object in your resource group. A resource group, last I checked, had hard limits on the number of resources (like 4000?). Each VM is a minimum of 3 to 4 resources (with a NIC, image, and disk)... at scale this gets difficult.

4. Azure disks require significant time to create, mount, and remount. After a StatefulSet pod failure it will sometimes take 3-5 minutes for its PV to move to a different worker, and worse when your Azure region has allocation problems. Azure Files unmount/remount near instantaneously.

5. Azure disks are block storage and thus only ReadWriteOnce. Azure files are RWM.

So, sure, if you’re running a clustered database with dedicated per-node PVs and limited expected redeployments... use Azure Disks. If you need a PV for any other reason... especially for application tiers that churn frequently... use Azure Files.
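The access-mode difference in point 5 shows up directly in the PersistentVolumeClaim spec. A rough sketch, assuming the usual AKS storage class names (`default` for Azure Disk, `azurefile` for Azure Files) and illustrative claim names:

```yaml
# Azure Disk-backed claim: one node at a time
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: disk-claim            # hypothetical name
spec:
  storageClassName: default   # Azure Disk provisioner on AKS
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
# Azure Files-backed claim: mountable from many nodes
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: files-claim           # hypothetical name
spec:
  storageClassName: azurefile
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 10Gi
```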

This is all true; I sometimes forget that I still have a sort of Azure Stockholm syndrome. I'd say it's good feedback, but it's nothing Azure doesn't know about.

Maybe Azure Files performance has improved to the point where it's more usable for storage scenarios. I suppose it probably comes down to the use case and application behavior.

It would be good if Azure had someone testing out these scenarios and interfacing with the larger k8s community, maybe through the SIG, for these sorts of musings and questions.

Is this with the managed-disk volume (which IIRC is formatted ext4) or with `AzureFiles`, which is essentially SMB/CIFS?

I joined a healthcare startup in 2014 that had a small infrastructure on Azure. Back then AWS weren't signing BAAs and Azure was the only player in town. Being an early startup, the company didn't purchase a support plan from Azure. One day Azure suffered a major outage (may have been storage related) and over an hour later, I reached out to Microsoft for written confirmation that we could forward to customers. Since we didn't have a support plan they flat-out refused to provide any documentation whatsoever about the issue. They wanted $10,000.

Azure - never again. Company moved to AWS within a quarter.

FWIW, Amazon's hosted Kubernetes offering (EKS) isn't stable either (DNS failures, HPA is known to be broken, etc.).

I know HPA is a legit issue, but DNS failures seem to be fairly normal in Kubernetes. Scaling up kube-dns helped us resolve that particular issue, as did moving away from Alpine to minimal Debian images. Alpine has its own DNS issues that caused us much pain.

We've had issues with KubeDNS, too. Lots of retries and timeouts on the client side, and lots of conntrack entries.

Libc has pretty slow retries (5s, I think) by default, and until 1.11 hits you can't easily set up resolver configs, though you can inject an envvar separately into each pod. And musl-based distros like Alpine don't even support some of libc's resolver options, iirc.
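For reference, once pod `dnsConfig` is usable (the 1.11 feature alluded to above), the glibc resolver knobs can be tightened per pod. A hedged sketch with illustrative values, not recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-tuned               # hypothetical pod name
spec:
  containers:
    - name: app
      image: debian:stretch-slim
      command: ["sleep", "infinity"]
  dnsConfig:
    options:
      - name: timeout           # per-query timeout, down from glibc's 5s default
        value: "1"
      - name: attempts
        value: "3"
      - name: single-request-reopen   # glibc option that works around some UDP/conntrack races
```

Note the musl caveat applies here too: Alpine-based images ignore some of these resolv.conf options.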

We ended up scaling up KubeDNS to 2 replicas and moving them to a dedicated nodepool just to make sure they weren't competing with other workloads. That fixed our issues for now.

Kube-dns (or CoreDNS in newer clusters) is pretty stable in my experience. It's still a very good idea to run more than one replica so that you can tolerate a single node failure, but if DNS failures are "fairly normal" that definitely warrants some additional investigation.

Most dns problems in kubernetes, in my experience, can be traced to udp failures due to the iptables kubeproxy backend.

Thanks we did look into it but not as thoroughly as probably needed. Switching out from Alpine fixed pretty much all our issues.

This is sad. From what I hear, one of the founders of k8s works on AKS.

Only a matter of time before GCP becomes the #1/2 cloud provider.

I’ve seen at least 4 “founders of Kubernetes” by now. How many are there in total?

Agree. The quality of their software is so much better than the other major players.

Yeah, Brendan Burns is at ms

Had a similar experience with the Azure Cosmos graph API. The API is half baked: it doesn't support all Gremlin operations, and even supported operations give non-standard output. Switched to AWS Neptune immediately when it launched.

> The API is half baked

Doesn't surprise me. Cosmos was too good to be true:

- Serverless

- Infinite scalability

- Mongo API or Gremlin API or SQL API

It's obvious that it can't hold up to all its promises.

"Noone can give you what i can promise you"

There are definitely growing pains with using Kubernetes on Azure. I've wondered a few times if other platforms have similar issues and have seen more than a few complaints about EKS.

Microsoft has some great people working on Azure, but I do feel like AKS was released to GA too soon. Without a published roadmap and scant acknowledgment of issues, I'm not sure I could recommend it to my clients or employer. It's disappointing, because I've had few issues with other Azure services.

Full disclosure: I receive a monthly credit through a Microsoft program for Azure.

> I've wondered a few times if other platforms have similar issues and have seen more than a few complaints about EKS.

I can't speak to EKS but we've been running production workloads on GKE for over a year with very good results. There have been a very few really troublesome "growing pains" type issues (an early example: loadbalancer services getting stuck with a pending external IP assignment for days) but Google has been awesome about support, even to the extent of getting Tim Hockin and Brendan Burns on the phone with us at various times to gather information about stuff like the example I gave above. I give them high marks and would recommend the service without hesitation.

Brendan Burns is the lead engineer on AKS AFAIK

Brendan Burns was the Lead Engineer on Kubernetes at Google around 3 years ago, then went to Microsoft to lead this entire AKS/ACS/Container effort.


Hmm, this isn't great. Currently using Azure Kubernetes Service and we haven't had many issues so far, but we just made the shift.

Hope I don't have to move over to Google cloud.

It's heaps and heaps better than Azure.

You'll be fine. We run a number of AKS clusters and they have all been rock solid. I think the problem is people hear "managed cluster" and so they don't think they need to understand how k8s works. Follow best practices (resource limits, etc.) and you'll be just fine. We've even tested out the upgrade flow on a live production cluster and it was butter smooth.

Why? GKE works perfectly over there.

Many people (me included) don't trust Google to stay with any business besides advertising. There have been too many times when they've ended services without giving much time to get off.

That, and countless horror stories where, once your Google account or service is banned for whatever reason their bots detect, you have no way to appeal through their nonexistent customer support.

The Cloud products follow a Deprecation Policy documented here: https://cloud.google.com/terms/

I can understand being upset about their consumer products but it doesn’t really apply here.

This is what always puzzled me about the Google-Alphabet strategy, specifically the idea of having all the assets under a single share ticker (GOOGL).

The more services you put under one banner, the more the stink of one disaster is going to linger, and hinder adoption of the successes.

To me, a far simpler proposition would be a new brand & share issuance for each new sub-company (eg. Waymo), with existing Alphabet shareholders getting pro-rata shares in the new company.

I’d bet on it being about the recognition that Google has been a pioneer and thought leader in the scalable systems/hosting space and they didn’t want to throw out the baby with the bathwater.

DNS failures were almost certainly related to all the k8s system services on the cluster not having CPU or memory reservations, and KubeDNS was flaking.

In general AKS is a vanilla k8s cluster and expects that you know what you're doing. MS arguably should enforce some opinions, e.g. that system services have reservations, but then it wouldn't be vanilla anymore. The trouble is that k8s defaults are pretty poor from a security perspective (no seccomp or AppArmor/SELinux profiles) and a performance perspective (no reservations on key system DaemonSets).
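Concretely, "reservations" here just means requests/limits on the kube-system workloads. A minimal sketch of the resources stanza you'd patch into, say, the kube-dns container spec (the numbers are illustrative, not recommendations):

```yaml
# Illustrative resources stanza for a kube-system container spec
resources:
  requests:
    cpu: 100m        # guarantees scheduling headroom so DNS isn't starved
    memory: 70Mi
  limits:
    memory: 170Mi    # memory limit only; no CPU limit, to avoid throttling lookups
```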

We’ve had this interesting industry pendulum swing between extreme poles of “we hate opinionated platforms! Give me all the knobs!” And “this is too hard, we need opinions and guard rails!”. I think the success of K8s is exposing people to the complexity of supplying all of the config details yourself and we will see a new breed of opinionated platforms on top of it very shortly. It reminds me of the early Linux Slackware and SLS and Debian days where people traded X11 configs and window manager configs like they were treasured artifacts before Red Hat, Gnome and KDE, SuSE, and eventually Ubuntu, started to force opinions.

This is probably why they're releasing OpenShift on Azure. To let Red Hat engineers manage the kubernetes part.


I like openshift, I am not sure why developers are not hot about it.

> I am not sure why developers are not hot about it.

Because it's designed entirely for enterprise customers. If you have a startup you have very little reason to choose OpenShift compared to Heroku or AWS honnestly.

I still love Redhat tho.

What do you like about openshift that lacks in kubernetes?

We have a larger migration project going on for months. So far not a single failure has occurred, and our TEST environment has been fully migrated (quite responsive and rock solid) for 2 weeks.

However, I do share that Azure indeed has released a lot of half-baked features and services lately (last 1.5 to 2 years). I hope this trend does not continue.

Couple of questions to the OP:

1. What version of docker / container runtime is being used?

2. What base image for your containers is being used? eg. alpine has known DNS issues [1]

[1] https://www.youtube.com/watch?v=ZnW3k6m5AY8

Side question: what are the best practices for development? Are you supposed to run a local Kubernetes deployment (it looks pretty hard to set up), or do you run everything outside of containers when developing and then deal with k8s packaging and deployment as a completely separate issue (which looks like it could lead to discovering a lot of issues in the preproduction environment)?

Azure is not bad, but there are definitely some rough edges. We're having trouble with their BizSpark sponsorship billing: https://news.ycombinator.com/item?id=17698948

Key Vault (their HSM product) is even worse.

interesting. thanks

It's a very new offering; the Linux App Services are still in beta. I have no idea why you would roll it into production expecting no hiccups. AWS is also new on this. Give it 6 months and let the kinks get worked out before migrating workloads. Seems like common sense.

App Service on Linux is an unrelated service that actually runs on top of Service Fabric (a stateful microservice and container orchestration platform).

App Service on Linux is not in beta; it has been a GA product for over a year now with an SLA of 99.95%. It does not use Service Fabric on the backend; it uses its own custom orchestrator (which essentially abstracts the quirks of orchestration away from the user).
