Go Production Performance Gotcha – GOMAXPROCS (metoro.io)
57 points by cbatt 5 days ago | 29 comments





Another potential avenue for addressing problems like this, which I'm a fan of, is taking advantage of k8s's static CPU policy:

https://kubernetes.io/docs/tasks/administer-cluster/cpu-mana...

Using this (plus guaranteed QoS), you end up with containers which can only even "see" a subset of cores of the whole node (and they get those cores all to themselves), which is great for reducing noisy neighbors when big machines are running many different services.
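
For anyone wanting to check the effect from inside a pod, here is a minimal Go sketch, assuming cgroup v2 mounted at /sys/fs/cgroup (the cpuset path is the standard v2 layout, but mount points can vary):

    // Sketch: show the CPUs this container can actually "see" under a
    // cpuset (e.g. one assigned by the kubelet's static CPU manager).
    package main

    import (
        "fmt"
        "os"
        "runtime"
        "strings"
    )

    func main() {
        // runtime.NumCPU is affinity-aware on Linux, so inside a pinned
        // container it reports the size of the cpuset, not the whole node.
        fmt.Println("NumCPU:    ", runtime.NumCPU())
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

        // The effective cpuset itself, e.g. "4-7".
        if b, err := os.ReadFile("/sys/fs/cgroup/cpuset.cpus.effective"); err == nil {
            fmt.Println("cpuset:    ", strings.TrimSpace(string(b)))
        }
    }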


I am assuming that in OP's case, they want their Go process to "see" the machine as it is, though, to surface more accurate/better stats?

Interesting link nonetheless, thanks!


Ah interesting. I'll have to dive in deeper here. If I understand correctly, this essentially gives you exclusive core pinning? Do you find that this reduces total utilization when workloads burst but can't leverage the other unused cores?

I thought this was why you are supposed to use ‘nproc’ instead of manually parsing cpuinfo or some other mechanism to determine CPU count. There are various ways in which a given process can be limited to a subset of system resources.
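
In Go terms, the moral equivalent of 'nproc' is the scheduler affinity mask; a sketch using golang.org/x/sys/unix:

    // Counting CPUs via the affinity mask (roughly what `nproc` does)
    // rather than counting entries in /proc/cpuinfo, which reflects the
    // whole machine. Illustrative only.
    package main

    import (
        "fmt"

        "golang.org/x/sys/unix"
    )

    func main() {
        var set unix.CPUSet
        // Affinity of the current process (pid 0 = self); this honors
        // taskset and cpusets, so it can be smaller than the node's count.
        if err := unix.SchedGetaffinity(0, &set); err == nil {
            fmt.Println("usable CPUs:", set.Count())
        }
    }

Note that the affinity mask reflects cpuset/taskset restrictions but not CFS quota limits, and the quota case is the gap discussed in the article.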

Unthinkable crutches in Go land continue.

This is something that needs to be fixed in the runtime itself to be appropriately container-aware and not require the users to write their own libraries to patch this out.

For example: https://github.com/dotnet/runtime/pull/99508

(alternatively, the goroutine runtime could have been made auto-scalable, which would have reduced the impact at the cost of implementation complexity, like .NET's thread pool is)
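
For concreteness, the user-level patch Go folks typically reach for today is uber-go/automaxprocs, wired in with a blank import; a sketch of its documented usage (it sets GOMAXPROCS from the container's CPU quota at init):

    package main

    import (
        "fmt"
        "runtime"

        // Side-effect import: adjusts GOMAXPROCS to match the cgroup
        // CPU quota when the process starts.
        _ "go.uber.org/automaxprocs"
    )

    func main() {
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
    }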


This isn't a gotcha, it's an important aspect of how Go runs. While it isn't highlighted enough, concurrency isn't magic, and GOMAXPROCS is important for controlling your Go app, especially in production environments.

Although it's not well known, benchmarks for programming languages would show even faster results for Go with GOMAXPROCS set to 1.

This lets single-threaded benchmarks run with less overhead.
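
For reference, either the environment variable or the runtime call works; a minimal illustration:

    // Two equivalent ways to pin the Go scheduler to a single P for a
    // single-threaded benchmark run.
    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Option 1: set it before launch:  GOMAXPROCS=1 ./mybench
        // Option 2: set it in code; the call returns the previous value.
        prev := runtime.GOMAXPROCS(1)
        fmt.Println("previous GOMAXPROCS:", prev)
    }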

A real missed opportunity in communication, because this isn't the first time we've seen articles like these pop up.


It's a containers / cgroups / Linux problem. When cgroups limit the CPUs usable within a container, an application in that container that wants to ask the host kernel how many CPUs it can use should have access to a system API informed by the cgroup configuration, but it doesn't, at least not portably by default.
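
A minimal sketch of what such an API has to do by hand on Linux with cgroup v2: read the quota and period from cpu.max and divide. The file format below is the documented cgroup v2 layout; real code would also need to handle cgroup v1 and nested cgroups:

    // Derive a usable CPU count from the cgroup v2 CPU quota, the thing
    // the kernel does not expose through a portable API.
    package main

    import (
        "fmt"
        "math"
        "os"
        "strconv"
        "strings"
    )

    func quotaCPUs() (float64, error) {
        // cpu.max holds "<quota> <period>" in microseconds, or
        // "max <period>" when no limit is set.
        b, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
        if err != nil {
            return 0, err
        }
        fields := strings.Fields(string(b))
        if len(fields) != 2 || fields[0] == "max" {
            return 0, fmt.Errorf("no CPU limit set")
        }
        quota, err := strconv.ParseFloat(fields[0], 64)
        if err != nil {
            return 0, err
        }
        period, err := strconv.ParseFloat(fields[1], 64)
        if err != nil {
            return 0, err
        }
        return quota / period, nil
    }

    func main() {
        if cpus, err := quotaCPUs(); err == nil {
            fmt.Printf("quota allows ~%d CPUs\n", int(math.Ceil(cpus)))
        }
    }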

In busy systems, GOMAXPROCS=cpu.limits will still lead to the process exceeding its CFS quota and getting throttled, since the Go runtime has additional threads beyond just GOMAXPROCS. IME, the only way to (nearly) guarantee there's no throttling is to set cpu.limits=GOMAXPROCS+1 to leave some room for those system threads. Unfortunately, uber/automaxprocs (which makes the same error) is broadly adopted, as is using the downward API to do the same thing in other cases.
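
One way to verify whether the headroom is enough is to watch the throttling counters the kernel already exports; a sketch, again assuming cgroup v2 mounted at /sys/fs/cgroup:

    // Read cgroup v2 throttling counters to see whether the process is
    // exceeding its CFS quota despite GOMAXPROCS being set.
    package main

    import (
        "fmt"
        "os"
        "strings"
    )

    func main() {
        b, err := os.ReadFile("/sys/fs/cgroup/cpu.stat")
        if err != nil {
            return
        }
        for _, line := range strings.Split(string(b), "\n") {
            // nr_throttled: periods in which the group hit its quota;
            // throttled_usec: total time spent throttled.
            if strings.HasPrefix(line, "nr_throttled") ||
                strings.HasPrefix(line, "throttled_usec") {
                fmt.Println(line)
            }
        }
    }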

Happened to be experimenting with similar things at day job recently - very keen to see more

Alas (OFF TOPIC) clicking the burger menu on the site (mobile) gives a rather novel fullscreen black error

> "Application error: a client-side exception has occurred (see the browser console for more information)."


Well that's a little embarrassing, fixing now.

fixed.

super-fast!

Looking forward to having full-brain time for this; super intriguing to see folks so much further down the eBPF tracing path.

Best of luck with the offering!


Rant mode on.

This is what software engineering looks like nowadays (at least based on my last jobs). The time I spend discussing true software engineering topics (modelling, abstractions, algorithms, etc.) is significantly less than the time I spend discussing k8s shenanigans, Go/Java/etc. intricacies, library incompatibilities, build systems, S3 access rights, etc.

I don't like it. I used to work on more interesting topics when I was a junior, precisely because there were at most 3 components involved: the db, the server framework, the client side. So I was spending time on the core of the business (business logic). There was no need to scale, no sharding, no k8s, no aws, no failover. We were making money for the company (unlike nowadays, where every single company I work for is a "unicorn" and is not profitable).


I think this is what software engineering has almost always looked like. Computer science is definitely more pure and at the cutting edge of knowledge and implementations; software engineering, on the other hand, is about understanding the tools available to you for your budget and picking ones that will leave you with enough tolerance for your workload, plus or minus expected variances, while expending the least of your budget. The longer you've been in your career, the more of the existing tools out there you've experienced, so you can discuss those topics and weed out irrelevant details that some marketing department might tout because it's their tool's most favorable angle.

In short, computer science is algorithms, data structures, etc. Engineering is those things applied for a given set of constraints including time and budgetary constraints. Or at least that's how I've come to define the differences.

I generally tell my family my job is more like lego than anything else. It's about knowing the pieces available and how to put them together into something cool. And occasionally, you do get to build a new piece that brings the whole set together nicely if the brick you need doesn't exist yet.


Computer science has never been software engineering, though there's been a lot of cross-pollination.

But there's still a world of difference in my opinion between "how can I put all these tools together to deliver a solution to the problem" versus "oh crap, the service is running out of CPU again, argh, my service provider changed how we specify CPUs because the old way wasn't good for someone who isn't me, and okay, I'm upgrading, and oh shit the upgrade also changes how we get to the encryption keys and now it's busted, let's revert, what do you mean it irrevocably upgraded our key store to the new version and now the old version doesn't work, fix that then, oh, we can't because that's all in the cloud and we missed the upgrade emails in our other big pile of emails argh argh argh fine, call an incident and bring in half the teams in the company".

The latter was not created in the past 5 years, but it has gotten noticeably worse. When things work, they work better than they did before, but when things fail, they fail harder, in the sense that they can create much more complicated snarls than they used to. I much prefer even dull requirements elicitation meetings to too much of the latter.


Containerization has done wonderful things for our industry.

Abstraction has done horrible things to our industry.

We're so many layers up now that hardly anyone (except a blessed few) can really understand the whole thing, so the average engineer is left thrashing against the mountain of abstraction created for the sake of some nebulous "standardization", or even just for abstraction's sake.

I'd argue that public cloud systems are harder to "get right" and present a larger security risk than many appreciate, and that Kubernetes, while very cool, is a massive chunk of complexity that makes it harder to understand what is actually going on; not to mention the opaque, intentionally confusing, and under-documented complexities that come with operating in public clouds in the first place.

It's not always possible to make a simpler system, but it's always possible to make a more complex one. I think modern day systems are firmly in the "more complicated than they need to be" territory. To be completely honest, I don't really see that changing.


I don't think this is related to Go, tbh. For some reason, when it comes to parallelism, most of the literature suggests creating a pool of n threads, n being the number of threads the CPU can run in parallel. That makes sense if the threads are always running. It falls totally apart when the threads spend most of their time idle.
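
A sketch of the distinction in Go; the pool sizes below are illustrative, not recommendations:

    // Pool size as a function of workload shape, not just core count.
    package main

    import (
        "runtime"
        "sync"
        "time"
    )

    func pool(workers int, jobs <-chan int, work func(int)) {
        var wg sync.WaitGroup
        for i := 0; i < workers; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := range jobs {
                    work(j)
                }
            }()
        }
        wg.Wait()
    }

    func main() {
        jobs := make(chan int, 100)
        for i := 0; i < 100; i++ {
            jobs <- i
        }
        close(jobs)

        // CPU-bound work: pool(runtime.NumCPU(), ...) keeps each core busy.
        // IO-bound work: workers mostly wait, so far more than NumCPU can
        // be in flight without oversubscribing the CPUs.
        pool(10*runtime.NumCPU(), jobs, func(int) {
            time.Sleep(10 * time.Millisecond) // stand-in for an IO wait
        })
    }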

> The times I discuss truly software engineering topics (modelling, abstractions, algorithms, etc.) is significantly less than the times I discuss about: k8s shenanigans, Go/Java/etc. intricate details, library incompatibilities, build systems, S3 access rights, etc.

I don't see a distinction, tbh. Coordinating software is part of software engineering even if it's "boring" or "frustrating". If you want to code business logic in a vacuum you'll be left with useless code and nowhere to run it.


I'm guessing you work at a small to medium sized company? Either that or you've likely been pushed into a role you dislike?

There are still people focusing on those 3 components nowadays, but it's because either they're so small they don't have to scale yet, or they're big enough they have an SRE/Devops/Whatever team that deals with the cloud crap so they can focus on the actual applications.


In my experience, even if the company has a platform team, they still expect senior/staff software engineers to know about platform/infra topics and work on them. It’s strange because those very same companies don’t expect platform engineers to know about product related topics.

For example, while a software engineer needs to know about let’s say, DDD, the business logic of the product, and K8s, a platform engineer in the same company only needs to know about K8s.


Knowing about something doesn't mean it should be bogging down half your workday, though. I think it's important for engineers to understand at a high level the platform that runs their code and how it scales, but if they're spending a large chunk of their time orchestrating deployments and screwing around in IAM, as OP insinuates, that's a red flag IMO. My teams certainly don't.

If you're building something from scratch, the architecture/modelling/abstraction work is done first. This sounds like a scaling issue, and a simpler one at that. As for harder problems, we've had to rearchitect parts of our software several times as it scaled.

So yea, I can understand why you’d find this kind of work annoying, but in my experience it’s mixed in with more traditional harder problems.


> This is what software engineering looks like nowadays

when was it different? Java has had a jillion flags to fiddle with in prod for 30 years, every company comes up with its own C++ subset that it and its compiler can tolerate, and every Python shop has idiosyncratic linter settings.

are you saying you wished we'd all ended up writing Eiffel?


Sadly, it might be even worse when the only thing you're doing is connecting SaaS products with Web APIs via some low-code tool.

Which is quite trendy in enterprise consulting nowadays.

BizTalk and BPEL were ahead of their time; apparently replacing XML with REST/JSON made them cool.


My last job was like this, at a startup with about 15 engineers and 100 people overall. Just Python, Postgres, Celery, basic GCP and a few more. There are many companies like this out there.

How long ago were you a junior engineer?

I suspect it was possible because availability/performance requirements were different.

Also, I think the problem you describe is partially caused by developers themselves: if you give them a simple (even if challenging) project, they get bored and start creating problems themselves to make the project "worthy" of their time.


Just go work on a monolith that's slow as molasses and that no team truly wants to own, and the modelling, abstractions, and perf algorithms will naturally take priority.

Also, I'd argue those are more computer science topics; software engineering is much more about practical application and integration with wider systems.


Yep, that is because of cloud. I hate it, with a passion, and am never going to touch it ever again. But you also have to keep in mind that when you were a junior, there were no large websites like Facebook or YouTube like there are today, and these beasts simply have way different requirements. And it's not only about being the top dog: even a corporate website of a larger company these days needs the computational power and infrastructure to serve the whole world. So it's not like there is a single thing we can blame here. Dev and ops have been merged and there is no longer a distinction; you have to write code with distributed computation, DBs, and so on in mind, which means you need to think about all those things that are outside of the business domain.

You can find jobs developing apps for internal workflows. It doesn't pay as well as Silicon Valley startups with fancy technology.

Honestly, if you can get testing, build, and deploy right, you've solved 90% of the problems at any company I've worked at. Most places suffer from low morale and velocity because their software is fragile and poorly factored: tons of haunted forests requiring specialist knowledge to manually test every release.



