> Over the course of 22 kernel builds, I managed to simplify the config so much that the kernel had no networking support, no filesystems, no block device core, and didn’t even support PCI (still works fine on a VM though!).
Flashbacks to a job where I was asked to figure out why a newer kernel was crashing. This was a very frustrating time, because I had (have) basically zero real C/C++ experience but I'd helped out with Bitbake recipes and everyone else was busy or moved to other projects.
To cut a multiweek tale of dozens of recompilations short: The kernel was fine. The headless custom hardware was fine. The problem was a hypervisor misconfiguration, overwriting part of the kernel address space. All of our kernels had been corrupted, but this was the first one where the memory layout meant it mattered.
A month of frustration, two characters to fix, the highest ratio I've encountered so far.
My reward for struggling through a complex problem I was unqualified for? "Great, now we need to backport security patches from the main Linux kernel to the SoC vendor's custom fork..."
Ah, I remember 2017’s HN where the mere word “Go” in the title made posts skyrocket to 498 points. In 2024 the title should’ve contained “Rust” for the same outcome. Quite curious what will be required in 2031.
This is honestly wild. 99% of devs would have found a workaround and moved on. Going so far as to create a multi-kernel test bench to narrow down the source of the instability is a level of dedication I have not personally seen, and I respect it.
Problems like this tend to come back and haunt you though. Sure, you can set max threads to 1 and move on with what you're doing for a while... but a lot of people run Go so they can have a lot more than 1 thread.
I've run into some of these where the bug is much rarer to hit, so it's reasonable to avoid the thing that hurts and just watch out for it in the future. Sometimes you get lucky and it magically fixes itself forever; sometimes the weird case that you only hit with internal traffic ends up getting hit by public traffic a lot.
Crashes like this where a wild write breaks something at a distance are always a PITA to debug (especially here, where the wild write is harmless if there's no data race)
By the same token, you might be the first person I have ever seen give respect to the 1x developer. I respect that. We could no doubt all learn a thing or two from the 1x developer that doesn't rush through everything with quick solutions.
If you check the issue[1], he reported the crash on November 7th and reported on November 8th that the issue was related to gcc and the kernel. At least he was very quick going down the rabbit hole.
> means that a 1x developer will take around 10 hours to get to the same place
A "workaround" isn't an adequate substitute for actual the understanding and fixing of the root cause of a bug.
What you think is a 10x developer is, in fact, a short-term 10x developer, a medium-term 1x developer, and a long-term -10x developer. Their work, while seemingly great at first, is just accrued debt with an incredibly high interest rate. But they're rarely the ones fixing it.
Now, like everything, a balance needs to be struck between spending hours fixing a bug _the right way_ and finding a temporary workaround. The real 10x developer is incredibly good at finding this balance.
> A "workaround" isn't an adequate substitute for actual the understanding and fixing of the root cause of a bug.
Right, hence why we recognize that a 10x developer is a weaker developer. Was there something that implied that a weak developer is a substitute for a talented developer for you to say this, or are you just pulling words out of thin air?
'A weaker developer' is the opposite of the standard definition of 'a 10x developer'. If you are going to use phrases to mean the opposite of their conventional meaning, you should warn people so they can avoid arguing with you on the assumption that you mean what everyone else means.
Why is the onus on me to try and hold back those who don't know how to think straight from getting into an argument with a straw man? They could, you know, just not do it.
Rather than encourage dumbing down the content to the lowest common denominator, why not encourage those with low comprehension not to get into arguments in the first place? Dumbing down the content does not improve the quality of discourse. That is an entirely flawed notion. Hacker News is decidedly intended for a hacker audience, not an audience of high school debate team hopefuls. There should be no harm in speaking to hackers at their level.
Merely arguing about your _definition_ of 10x engineer and 1x engineer.
Your original comment at the top of the thread implies the engineer who wrote the blog post is a 1x engineer because they spent so much time finding and fixing this bug.
Yes, and as the comment before it asserts, most engineers would never take that kind of time. They'd bang out some workaround as quickly as possible and move on with life.
But my original comment at the top also praised the value of the 1x engineer, noting that the rest of us could learn a thing or two from them. There is no denying that 1x developers are the better developers.
The question remains outstanding: where did you pick up the suggestion that prompted the rebuttal, namely that the quick fix is a suitable replacement for the talented engineers who can fully understand a problem?
Emm, the 10x developer is a myth. The real 10x developer is the one who creates the kind of infrastructure/libraries/culture that enables 10 other engineers to move fast. The 10x developer is not the person who is 10x faster than other developers on the code base simply because they can hold the spaghetti code they wrote in their head.
Is the 10x thing really only about speed? I would say this guy is a perfect 10x example as he actually gets to the root cause of a difficult problem. When I think of 1x (or less) devs they are usually the type that don't get things done because they can't (without a lot of help), not because they are slow. I.E. overall technical chops, not just speed.
As long as they're moderately competent, yes. They need to know how to debug normal things, but they don't need to handle every esoteric niche. The thing that matters is ability to put out tons of productive code.
Often that means a speciality where they're an expert, but it doesn't have to mean that.
> When I think of 1x (or less) devs they are usually the type that don't get things done because they can't (without a lot of help), not because they are slow. I.E. overall technical chops, not just speed.
It sounds like you're taking "1x" as a dismissal. Isn't it supposed to be a pretty ordinary dev?
Yeah... didn't mean the 1x as dismissively as it came off. Got too carried away with my point. My mistake.
But to my original point, I just don't buy the 10x story as being only about how fast you write code. I've known developers who designed and wrote code such that much time was saved working with it in the future. These people are, IMO, much more deserving of the 10x moniker, as maintaining code is much more important than just cranking it out... with the only possible exception being an early stage startup (but not really, as they will be crippled in the next stage).
I believe the 10x developer thing is stupid, and based on perspective. Watching someone work faster than you doesn't mean they are doing better work; on the other hand, it could just mean you are a 1/10th developer.
That said, if a 10x developer does exist, Marcan is one of the few of them. What a ridiculous statement.
I'm pretty confident in my computer abilities, but when I read stuff like this I feel like I have no skills at all compared to this guy. Like I'm still a high school athlete and he's an Olympian (also he was only 26 when he wrote the article).
This is very elegant. I’ve had my share of nasty system bugs (compilers and kernels), but the dedication and the speed with which he went through it is quite remarkable.
The explanations are also very clear. Thanks for posting.
One thing I learned from that post back then is that you can instruct GRUB to ignore some part of your physical memory. Really nice trick; not sure whether this is doable on Windows / Mac?
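For reference, the mechanism is (as far as I know) the Linux `memmap` kernel parameter rather than GRUB itself; GRUB just passes it along on the kernel command line. A sketch with a made-up range:

```
# Illustrative only: the 16M size and 0x88000000 base are made-up values.
# Appended to the Linux command line (e.g. on GRUB's "linux" line), this
# marks that range of physical memory as reserved so the kernel won't use it:
memmap=16M$0x88000000
```

Note that the `$` typically needs escaping when the line is written into a GRUB config file or a shell.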
I feel like I still didn't fully understand what's going on here. Is the following correct? "Threads have a 'canonical' stack that the OS auto-grows for you as you use more of it. But you can also create your own stack by putting any value you want in RSP. This is what the Go program did, and the vDSO, assuming it ran on an auto-growing stack, tried to probe it, which led to corruption."
I believe that Golang, as a green-threaded runtime, allocates a separate carrier thread for running syscalls on, so that a blocking syscall won’t block the green threads. These syscall carrier threads are allocated with a different initial stack size and stack-size limit than green-thread-scheduler carrier threads are, because it’s expected that syscalls will always just enter kernel code (which has its own stack).
But vDSO calls don’t enter the kernel. They just run as userland code, and so they depend on the userland-allocated stack being arbitrarily deep in a way that kernel calls don’t.
As shown in the article, Golang seems to have code specifically for dealing with vDSO-type pseudo-syscalls — but this is likely a specialization of the pre-existing syscall-carrier-thread allocation code, and so started off with a bad assumption about how much stack should be allocated for the created threads.
(I should also point out that the OS stack size specified in the ELF executable headers only guarantees the stack size of the initial thread of a process created by exec(2). All further threads get their stacks allocated explicitly in userland code, by libpthread or the like calling mmap(2). Normally these abstractions just reuse the same config params from the executable (unless you override them, using e.g. pthread_attr_setstacksize). But, as the article says, Golang implements its own support for things like this, and so can implement special thread-allocation strategies per carrier thread type.)
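As a simplified illustration of "allocating a thread stack explicitly in userland", here is a sketch using mmap(2) plus pthread_attr_setstack. The 64 KiB size is arbitrary for the example, and Go itself creates its threads at a lower level (with clone) rather than via pthreads, but the idea is the same:

```c
/* Sketch of a runtime allocating a thread stack itself and handing it to
 * the new thread, rather than letting libpthread pick the size.  Sizes
 * and flags here are illustrative, not what Go actually uses. */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>

#define STACK_SIZE (64 * 1024)   /* arbitrary for the example */

static void *worker(void *arg)
{
    (void)arg;
    puts("running on a caller-provided stack");
    return NULL;
}

int main(void)
{
    /* The runtime picks the size and maps the memory itself... */
    void *stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return 1;

    /* ...and tells pthreads to use exactly that region for the new thread. */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, STACK_SIZE);

    pthread_t tid;
    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);

    pthread_attr_destroy(&attr);
    munmap(stack, STACK_SIZE);
    return 0;
}
```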
The problem is that the vDSO (which is compiled as part of the kernel but runs in userspace) does a stack probe for security reasons, trying to see if it will overrun the stack. It does this by checking if at least a page’s worth of data is accessible. If not, it will (typically) fault. However, Go programs use a stack size so small that they may have other data a page away, which means the probe may mess with that data and cause bad things to happen.
In that case it couldn’t corrupt data, but the orq instruction itself could crash if it pointed to an unmapped address. (Which is kind of the point of stack probes.)
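For a rough idea of what such a probe looks like (this is an illustration, not the actual vDSO source; the exact offsets and sequence depend on the gcc version and flags), it is something along these lines emitted at function entry:

```c
/* Illustration only -- not the real vDSO code.  A gcc-style stack probe
 * on x86-64: "or" zero into a word below the stack pointer.  The value
 * is unchanged, so on a normal guard-paged stack this is harmless (or
 * faults early, which is the point of the probe).  On a tiny, tightly
 * packed stack, the target address can hold unrelated live data, and
 * the non-atomic read-modify-write can race with a concurrent writer. */
void probe_below_rsp(void)
{
    __asm__ volatile("orq $0x0, -0x8(%%rsp)" ::: "memory", "cc");
}

int main(void)
{
    probe_below_rsp();   /* harmless here: the probed word is our own stack */
    return 0;
}
```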
Thread stacks in Linux are demand-paged: if you touch the next page, it magically exists, up to a limit. But the machine is not concerned with the convenient properties of this virtual memory area. To the CPU, the register RSP is just an operand, explicit or implicit, of some instructions.
Could someone explain why Google is obsessed with small stack sizes? The musl library also has an extremely small thread stack size, which crashes many applications that are used to the usual 8192K on Linux.
On Linux the 8192K aren't reserved unless they are actually used, so what is the point?
Ok, Golang will allocate green threads from its own allocator, but 104 bytes?!
I don't believe that Google has anything at all to do with musl libc.
Applications should not be assuming a hard-coded stack size; that's a bug. They should be checking the value of PTHREAD_STACK_MIN. That is what it's there for.
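A sketch of checking instead of assuming; the 256 KiB headroom is an arbitrary number for the example, and on most systems the same minimum is also available as PTHREAD_STACK_MIN from <limits.h>:

```c
/* Sketch: size a thread stack from the system-reported minimum plus what
 * the program actually needs, instead of assuming "8 MiB like usual". */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long min = sysconf(_SC_THREAD_STACK_MIN);   /* system minimum thread stack size */
    size_t want = (size_t)min + 256 * 1024;     /* arbitrary headroom for the example */

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, want);

    printf("minimum: %ld bytes, requesting: %zu bytes\n", min, want);
    pthread_attr_destroy(&attr);
    return 0;
}
```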
I wonder if they might be favoring spawning numerous instances of an application with few threads and little memory use to scale at the replica level over just scaling within an application? At least Go defaults do suggest that (and are also the reason for very poor performance on large many-core nodes).