Naturally, this article inspires multiple comments declaring that the way the kernel allocates or arranges memory is so obviously stupid and wrong, and that they know the right way of doing it.
I love the certainty of Hacker News posters! So confident in their own abilities and opinions. Never a moment of concern, introspection or doubt. What a world to live in!
Also, has someone thought of writing a 'it should be re-implemented in Rust' comment auto-generator? Or is that already running here?
It's a nice antidote to Hacker News, on days when Hacker News is too... Hacker Newsy :^)
That is a stunningly apt description of those comments.
And I am definitely guilty of using "generally" in that manner.
When listening to someone else's shared experience, just don't forget to understand it in context. Of course it's helpful to know a little bit about that person.
No. No, there really isn't.
 'reading' being a highly optimistic term.
This is, in my opinion, a rather generous interpretation of events ... What I see when reading this is a system (the Linux kernel and surrounding infrastructure like the C language) that is very poorly designed. ... I'm not trying to suggest the kernel should be rewritten in Rust.
If one cannot mention their opinion on this site, what indeed is the point of the site? Furthermore, your response is typical of what I've seen used to shut down discussion of kernel development: suggesting the author is an idiot who cannot possibly understand the complexities involved, and that if they did they would follow the established development procedures. Do you believe the current state of kernel development is perfect? If not, and if one cannot discuss problems, how will one ever improve? In this particular case it seems to be a well-studied class of bug (use-after-free). Why do you suggest that the many years of work on tools and other methods to address this problem should be ignored?
What I see when reading this is a system (...) that is very poorly designed.
I know it's just your opinion, so what? Everything that people write on here is their opinion. But in true Hacker News style, you're straight out declaring that the low level internals of Linux memory management are rubbish. Not just poorly designed, but very poorly designed. As you say yourself, 'to suggest the author is an idiot'
Before this declaration, did you pause and think that maybe, just maybe, the kernel code authors might not be idiots? Or perhaps, you might have reflected, the kernel developers may have considered various different memory management schemes, maybe they weighed up the pros and cons of different approaches and found the current system to be good? Did you entertain the possibility that there might well be other non-obvious considerations that influence the choice of data structure? Apparently not, because you're happy to quickly declare that it's terrible code.
I'm not declaring you to be an idiot. It's the fact that you and others on here launch straight into the full-on 'this is/they are stupid, they should obviously do this' kind of statements, showing a dearth of empathy and a complete failure to look at the other side's view.
I'm not trying to shut down discussion. I'm not saying the kernel is perfect, or that the developers are beyond reproach. I'm not saying ignore years of work. It's the way in which some HN users, like you, jump to extremes of viewpoint (even your response is full of it, with the questioning about 'perfect', 'cannot discuss problems', etc.)
And the Rust comment was a throwaway joke; HN commentators cannot seem to resist putting a 'do it in Rust' comment somewhere in every code discussion...
I don't think my position is as clearly articulated in my first comment as it might be, but I doubt you or others have much interest in it so I'll drop the conversation here.
those who have an interest in kernel development might get something of technical interest out of your comment. those who don't have an interest in kernel development find a lot of the internal debates in that field to be funny, particularly when they seem to show up in all kinds of tangentially-related threads.
it's a quirk. kernel developers have this quirk and it's funny. just laugh. it's ok.
But my experience is in a particular type of software, and maybe I'm shit at my job. :P So why does the kernel use linked lists so much? Are there cases where I can do better by using them myself?
Edit: someone already asked, here: https://news.ycombinator.com/item?id=14182922
Besides, my Turing machine has an infinite length of ticker tape, so memory allocation is not my concern :)
(Please don't use sarcasm, it's rarely constructive)
I'm saying that HN has a vocal population who are so very sure of themselves and their own opinions that they'll rapidly post messages here declaring that innards of a large, complicated piece of software are so obviously wrong and that they know what the right way is. They seem to live in a world of utter certainty, never a doubt that they might not know all the considerations that could weigh in on the design of some inner working of a complex system.
HN would be a better place if these people could entertain some humility, perhaps a little bit of self-doubt, perhaps an inkling that other people may know something they don't, and that the world is not always the black/white, yes/no certainty that they seem to experience. Or we can just drown in the continual "This is stupid! they should do X" style comments.
The thing was, I didn't actually have any certainty about anything. I just thought that the best way to talk about an issue was to pick a "side" (often at random), and let other people pick their "side", and then defend your "side" every time it gets rebutted, gradually researching the issue from the perspective of your side in order to best steel-man your own position, while assuming others are doing the same in reaction to your own arguments.
In other words, basically pretending internet comments are a debate club, or a court-room, where you're not seeking truth in an Aumann's-agreement-theorem "both people gradually converge onto the truth" way, but rather an adversarial discovery system where the truth comes out for the audience in the compilation of evidence presented by both sides, whether those sides see it themselves or not.
I don't know, it was fun at the time. Maybe I was a teenager with a lot of testosterone.
Until you see such a thread go four or five levels deep, it's not really clear that that's what's happening, and so you might want to join such a thread as a sane, calm individual, only to have your own argument taken apart "by the seams" in response (e.g. asserting that due to errors in your deduction, you didn't prove your point, even if you may be right) without engaging with its content. Good for lawyering, bad for truth-seeking.
If you then are willing to get baited into playing the game in response, honing your own attack to be bulletproof like a lawyer would, then the truth might eventually find its way out. But if you didn't want to play this game, you'll probably just quit when you realize what's happening, maybe with a petulant "you know what? Screw this." at the guy who just can't seem to get your words through their head (when they never even believed in their chosen position, so there's no convincing to be done.)
If you explicitly indicate that you want to play this game, it could be great fun for all parties. But I know from experience that the other side of this—being baited into playing without knowing the game is afoot—is painful, and can burn people out on a forum community.
Personally, I think the adversarial system is inherently and downright bad and wrong. The idea isn't to get to the actual truth, it's merely to win the argument at all costs. There might be an idea that this efficiently gets to the truth, but I don't think it does at all; this system just finds who's better at arguing. An inquisitorial system I think is superior at getting to the actual truth. See your own comment about "good for lawyering, bad for truth-seeking": I think this is exactly what's wrong with the common-law system.
Personally, I do like to play devil's advocate a fair amount, but not to the extent where I need to actually win the argument. I just want to make sure all sides are covered and that people aren't missing something major by hopping on a bandwagon or engaging in group-think. There's a big difference between this and just endlessly arguing one side without actual regard to the truth or the merits of the other sides (and there may be many, which is another problem with this adversarial approach that always assumes only two sides).
Also related, personally I think formal academic debates are similarly bad and wrong, though not quite as bad as lawyering because at least with academic ones someone's life or welfare is not directly impacted by the contest over who's a better arguer.
I disagree with this one. The best, most productive discussions I've seen on HN are those where people have been clear about their views and expressed them succinctly without the hedging and careful politeness that's expected these days.
I've more often heard, or gotten the impression, that discourse is getting more polarized and vitriolic these days. So which is it?
I do quite a bit of work on improving the safety of various code, both professionally and in my side projects. The techniques I advocate are those that I've found effective. But I can't do everything on my own, unless you're offering to pay my contracting rate for me to work on a new kernel.
This is a false dichotomy. Some discussion can be constructive while some is not.
Whereas suggesting a rewrite can lead to results when the tool is new, small with significant use and bugs, or small-to-medium with safety- or security-critical usage. Examples of the last category include DNS, web, key, file, and other servers that are mission-critical and widely deployed. Example rewrites that actually happened... we'll go with DNS... are IRONSIDES DNS in SPARK Ada and bluejekyll's TrustDNS in Rust. Maybe add the Lobo web browser, since it ambitiously implemented a browser in pure Java. Dead now, but impressive how far the project got.
So, yeah, there's certainly pointless discussion that wastes space on HN. Arguing for a Linux rewrite is always one of them, unless someone with a pile of money dedicated to rewrites is asking which mainstream OS to rewrite in a safer language. I'd prefer FreeBSD but would probably recommend Linux for higher impact.
Google does use Rust in the user layer of Fuchsia.
This is not true. There are even bindings written in Rust for the Magenta kernel in Fuchsia.
> Is this supposed to be taken as Google engineers have infinite wisdom on how to build kernels and therefore we should copy what makes sense for them?
The main benefit of using Rust (memory safety) is probably not high enough compared to the eventual costs - that's why they don't use it in the kernel. You're making bold claims about the decisions of the Fuchsia developers.
What's your expertise in building kernels and operating systems, that you think we should not copy it? How many kernels and operating systems have you built? I assure you that this team has a lot more wisdom than you and me combined in this space. Have a little respect.
> and there are lots of explanations for why it is in C other than C is better than Rust.
No one has proved that Rust is better for writing kernels to this day, because there are zero kernels written in Rust used in high-scale production systems today. Until this changes, C is proven to work in this space, and considering that, it's the better/safer choice there today.
>> Why are fundamental things like resource allocation at least not standardised in the kernel?
> I think this is a typical reaction from someone with no clue what is going on inside.
It's not hard, but you do need to understand the idiom. Why aren't there safety mechanisms? Because the kernel is optimized for speed: instead of adding run-time checks, which can be expensive, there are various debugging tools, including slab poisoning and KASAN, to find such issues.
While I was making a baseless assumption, I like to believe that it was baseless because there was not enough available information, and not because I was ignoring obvious sources of information :P
(Singly) linked lists are trivial to implement in an immutable way (the article talks about the difficulty of doubly-linked lists, implemented in a mutable way).
It's trivial to have linked lists share a tail; this makes data structures like cactus stacks really easy.
Linked lists are amenable to reasoning by induction (e.g. in Coq, Agda, Idris, etc.).
Linked lists are amenable to lazy algorithms (e.g. iterators/generators).
Linked lists can be heavily optimised, e.g. using stream fusion.
That's off the top of my head. Note that in many cases it might be preferable to use linked lists in the source, but have them compiled to some other representation like arrays (or, in the case of fusion, a single tight loop).
In the kernel, where you have to handle every single error path, this is a big deal.
* In a multi-threaded context, insertion and deletion can be done lock-free, which is not the case for arrays;
* For intrusive lists, insertion and deletion can't fail due to memory allocation;
* If you only insert at the tail and remove at the head, these operations need only a pointer change, which is more efficient than an array;
* Merging two lists is only one pointer operation;
* Intrusive lists have better memory allocation properties;
The problem is that, on new hardware, linked lists should be used in fewer cases than in the past, and some people have generalized this to "linked lists should never be used". This is false: if you care about performance, you should keep thinking about and testing which data structure is best in each case. If you don't care about performance, you should use an abstraction that implements what you need and not care about the underlying data structure.
I just love when multithreaded code frees the element I currently iterate over in a different thread.
> For intrusive list, insertion and deletion can't fail due to memory allocation ;
As long as you store your elements on the stack or use static values, you mean? Heap allocation happens at some point; you just move the location where it happens.
> If you do only insertion at the tail and removal at the head these operation need only a pointer change which is more efficient than an array
Use a circular buffer. Until the array is full you can just modify the start/end pointers.
Even if I did control it, there are probably several lists a socket might be a part of. Storing the socket in a dynamic array will only work for one of those lists. It's similar to a sparse database index - you can only have one of them. The rest of the lists have to be linked.
Not to mention, when items are shuffled around in a dynamic array, you lose the ability to have a consistent pointer to them.
You say "the same operation just requires pointer changes (after finding where you want to insert or delete)" like it's some trivial thing that doesn't enter the equation. But finding where you want to insert or delete in a non-contiguous linked list is not a consideration that should be an afterthought in brackets.
Furthermore, when you have found the location, it's still not _just_ pointer changes. You actually have to allocate the memory for the node, too.
Traversing pointers and allocating memory is a lot slower than working with a contiguous block of memory (assuming you don't have to re-allocate the contiguous block of memory, of course).
Either way, don't take my word for it. If you're a C++ programmer, just benchmark std::vector against std::list, or watch this video - https://youtu.be/YQs6IC-vgmo
> Furthermore, when you have found the location, it's still not _just_ pointer changes. You actually have to allocate the memory for the node, too.
This is not normally true. In embedded development and game engine memory managers, data nodes already include list pointers, and there are several ways the data nodes can come to exist that don't involve a separate allocation per node. Sometimes it's allocations of blocks of nodes. Sometimes it's slab allocations (to use the terminology from the article) - a memory manager that pre-allocates constant sized memory blocks.
Kernel lists don't normally involve hundreds of thousands of elements, and when using dynamic arrays for small lists, the separate alloc & realloc time may not be even close to amortized away.
> just benchmark std::vector against std::list
This is unconvincing! Kernel & embedded code do not normally use std::list. There are very good reasons people warn against kernel development in C++! And performance benchmarks are the lowest item on the priority & constraint list. The least bad thing that can happen is going slower. Fragmentation is a bigger priority, and dynamic arrays have much worse fragmentation patterns because the allocation size changes over time.
Reallocating a big dynamic array is also a no-go in many time-constrained situations.
> You actually have to allocate the memory for the node, too.
The way this is done is by intrusive linking (the data must include list pointers). The data usually knows in which lists it is linked, so there is no point in having a version of the data without a list head.
Don't forget deletes as well.
We're talking about kernelspace, though, and different rules apply there. Memory allocation works very differently, and you have to handle every possible error path.
I don't know how the slab cache allocation would benchmark against allocating larger chunks; I was just trying to give a classical answer to the question.
on modern cpu's with huge caches, i seriously doubt this claim.
fwiw, i have played around with synthetic problems where: i insert a sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions. and almost always, vectors outperform lists by at least a couple of orders of magnitude.
or to put it another way, it is almost never about either lists / vectors or something else, and boils down to a couple of 'rules', e.g. access data predictably (avoid thrashing the cache), keep data compact (more cache utilization) etc. etc.
Which means that you scan the list twice for each node: first for insertion and then for deletion. Starting at maybe a hundred elements, that would be much more efficient with a balanced search tree. There's a good red-black tree in Linux.
But if it's only arrays vs lists - with linked lists, pointers to items are never invalidated. With dynamic arrays, they are.
Dynamic arrays are fine for plain old data arrays. But it's harder for data that has "identity" and is frequently mutated. And if a node is in multiple lists, I guess arrays are out. You'd need to store the data redundantly. Extra bad if you need atomic mutations.
So it's not surprising that linked lists perform very poorly in such circumstances.
Also, in some cases the linked list header may be embedded in other structures, so you'd have to pull the header out to go with a purely vector-based approach when you need to shift elements (assuming you can't move the entire structure). And if the structure needs to reference itself in the list for quick removal, that's a problem when its index keeps changing. Otherwise you could just search the vector for references, at linear cost.
I thought this same thing when I started doing embedded console game development and saw a lot of linked lists going on. I came to realize there are a bunch of really good reasons to use linked lists.
The biggest difference in practice is that linked list pointers are commonly fields inside the data that you're linking, as opposed to being a separate data structure. With a dynamic array, it's usually a separate data structure. Neither of those is absolute or always, but that's how it normally plays out.
With that in mind:
Sometimes you're loading a block of pre-formatted binary data off disk or from a network that has space for linked list pointers. Rather than allocate anything else, you can fill in the list pointers.
Sometimes you have a lot of small lists to manage. If you're allocating space for your list data, and the dynamic array doesn't get big enough for amortizing to take over, then you're doing double allocations, one for the data and one for the dynamic array & potentially first couple of reallocs.
Fragmentation is worse with a dynamic array (when it's a separate data structure), and re-ordering events are much costlier in arrays than with linked lists.
Consistent performance. Every insert and every delete requires the same amount of time, at the cost of a higher average traversal time. You can ameliorate some of the deletion reordering cost if you can change the order of the array, but not all use-cases allow for this.
Also, intrusively linked lists (that is, the same item represented in multiple lists) are just as poor, performance wise, when implemented with arrays.
I did a quick test of this at one point, comparing a C++ std::vector with a linked list; the performance of the dynamic array was significantly higher for most operations (about one order of magnitude for my test), but the worst case was much, much worse than the linked list (somewhere close to two orders of magnitude).
Perhaps Rust could have prevented the bug? I don't know. I haven't looked into Rust a great deal, but the impression I get from what I've read is that it severely limits expressiveness (read: the ways in which you can define APIs) unless you encapsulate things in blocks annotated "unsafe", at which point you lose any checking that Rust does. Would it even be possible to port the sockets code to Rust without fundamentally changing the APIs and data structures? To the Rust people: it comes off pretty badly when you throw around claims like "C effectively is unsafe everywhere", because "unsafe" in English doesn't mean what it means in Rust nomenclature; calling something "unsafe" implies it has bugs or security holes. It is perfectly possible to write bug-free, secure C code, and also perfectly possible to write buggy or insecure Rust code. What I think you mean is: C code is roughly equivalent to what you'd find in a Rust block annotated "unsafe", and thus the compiler cannot perform some of the compile-time checks that Rust can. Calling it something else like "lowlevel", "expressive", or "tricky" would be better.
I omitted so many details writing this -- otherwise the main story would have dragged on way too long. My comment here gives some context.
Words are hard; your suggestions of "lowlevel", "expressive", or "tricky" have problems too. Rust is not exactly "high level" or "low level": it feels higher level, but can easily go as low-level as you need. "Expressive" is suggested because you see Rust as being very restrictive, but not everyone feels that way. "Tricky" might work, but it has a negative connotation as well.
It is perfectly possible to write bug-free secure C code,
This is, in my opinion, a rather generous interpretation of events. What I see when reading this is a system (the Linux kernel and surrounding infrastructure like the C language) that is very poorly designed.
For example, resource allocation is a fundamental issue and a constant source of problems. We've known this as a field for a very long time. Why are fundamental things like resource allocation at least not standardised in the kernel? Better would be to enforce safe resource usage by static checking. Rust is an example of what can be done here, but much more is possible. (I'm not trying to suggest the kernel should be rewritten in Rust. I'm just using it as an example that resource usage [in particular, memory allocation in Rust] can be done in a way that prevents errors.)
Burning huge amounts of human effort is a popular approach to software development but I really hope we can do better in the next few decades. The worst thing about the current situation is the programmer blames themself ("This one kept me up until 3AM. In hindsight, every bug looks simple, and when I figured it out, I was embarassed that it took me so long") rather than wondering why this problem is even possible in the first place.
> Why are fundamental things like resource allocation at least not standardised in the kernel?
I'm an outsider myself, so take this with a grain of salt. But having implemented a few datastructures in C, this gave me a little chuckle. You can't "standardize" resource allocation. To be efficient and organized you need to tailor allocation to the problem at hand. I bet there are at least half a dozen common strategies for allocation in the kernel. (And most individual methods are of course "standardized". Do you suggest kernel developers are too stupid to see obvious improvements?)
What you can, and probably should, do is have all elements of one type allocated/managed through the same mechanism.
What seems to be the issue here is that the ownership of memory is not clear. The author seems to think they own it (it's not clear from the blog post if they allocated the memory but I assumed they did) while some other part of the kernel is freeing the memory. There are ways to track ownership of memory so this cannot occur. Rust does this. It's independent of how memory is allocated. A weaker form is to use some convention, which doesn't seem to be the case in the example described.
The other issue I have with your response is what I see as a typical knee-jerk reaction whenever the kernel is criticised: your calling me clueless and saying "Do you suggest kernel developers are too stupid to see obvious improvements?" I.e. to suggest that any criticism of the kernel comes from a place of complete ignorance and kernel development could not possibly be improved.
Kernel developers are not gods. They are not perfect. They are not above question. What I have seen is a lot of arrogance amongst kernel developers, the few times I've dipped into kernel development, that makes me think they probably are missing some obvious improvements.
This paper: http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf is both a good read and an example of missed improvements to the Linux kernel scheduler, and it does suggest (though doesn't conclusively demonstrate) that these improvements are missed due to the arrogance of the kernel developers.
These types of problems (getting ownership wrong) might be common, but whether they can be solved using a mechanism like Rust's remains to be shown. I doubt it, but I'm not intimate with Rust.
(Personally most Rust programs I've seen so far are off-putting at a syntactical and sometimes architectural level - the latter is of course not entirely the language's fault. Kernels will probably not be one of the first areas of Rust adoption).
I too find Rust a bit off-putting on the syntactical level, though architecturally what little Rust I've looked at does appeal to me, as I'm in the functional programming camp from which Rust draws inspiration. I'll hopefully get to do some real work with it soon, which will give me a better feel for it.
Agreed that the power of Rust's borrow checker remains to be shown on a large scale. I'm optimistic it will be, but time will tell.
You probably mean popular/used-in-production kernels, but there are OSes being written in Rust, e.g. Redox.
Yes, fundamentally this appears to be the issue, and I agree entirely that the concept of ownership is something you can get, albeit not a great version of it, in a standard way. C++'s unique and shared pointers, as an example, make ownership concepts much clearer. Why is the Linux kernel using raw pointers? Even just having a unique pointer would provide tons of help.
This is not an uncommon issue. The Linux kernel is plagued with use after free vulnerabilities and invalid dereferences etc. But there's very little upstream movement to solve these problems.
There's nothing wrong with scary-potentially-misused-unsafe stuff as long as you can isolate it and are therefore less likely to misuse it unsafely...
C effectively is unsafe everywhere, and other languages that are safe are non-zero-overhead, or so far up the stack that they cannot be used in kernel/system level stuff.
Is that right?
Otherwise GC is basically the only way in Rust. You can have tightly-scoped GC in Rust that doesn't affect the runtime of everything else, though. No good such GCs exist right now.
There are some tricks you can do in hypothetical type systems to make it possible to deal with these without runtime tracking. You can use the type system to ensure Detach traits get implemented, where Detach is for breaking cycles in a pass before destruction.
I'm betting you could do something with a dependent type system - from what I understand you can encode the state of data as a type... but I'm not really familiar enough to say for certain. Take this comment with a huge grain of salt.
Or rather, knew of one without a GC.
They are. The slab allocator mentioned in the article is one part of that. The problem here was not that allocation is ad hoc, but that the author was unaware of the conventions.
It would, of course, be nice to have some static analysis to help out people in that situation.
Don't get me wrong, I learned all the sharp edges by trial-and-error in the C89 days with Pascal and embedded assembly, but I think we've got to have systems languages which provide higher-level abstraction for bare-metal kernels. It's just that even codebases like OpenBSD don't go far enough like seL4 to verify their security model beyond reasonable doubt with formal methods.
Absolutely agree. I think environment is partly to blame. Devs fight fires rather than prevent them because it makes their efforts more visible.
It's the reason you can demand metrics to justify everything - the only metric that justifies defensive programming are examples of bugs/failure, and it would be better to just avoid them in the first place.
The real problem isn't linked lists. It's that the author learned how something worked (the spec), forgot about it, and then essentially coded against the wrong spec. The title might have instead been: "I forgot something and broke a kernel." Much more accurate, given that people who know they're dealing with slab allocators, linked lists, etc. usually use them correctly. They can also sleep a bit better instead of being up until 3am debugging.
Here's one: https://github.com/troydhanson/uthash
There are no distinct allocations for list links; the pointers are embedded (via a small struct) in the struct being dealt with. When walking the list, you normally need to look at that data anyway, and with everything together you can share cache lines.
It isn't normal to run down the entire length of a list without looking at the data. Since you need to look at the data anyway, it is best that the data share cache lines with the linked list info.
There are a number of variations on linked lists which are cache-friendly (e.g. unrolled linked lists).