Linked Lists Are Still Hard

joosters · on April 24, 2017

Ah, Hacker News comments, you are so predictable.

Naturally, this article inspires multiple comments declaring that the way the kernel allocates or arranges memory is so obviously stupid and wrong, and that they know the right way of doing it.

I love the certainty of Hacker News posters! So confident in their own abilities and opinions. Never a moment of concern, introspection or doubt. What a world to live in!

Also, has someone thought of writing a 'it should be re-implemented in Rust' comment auto-generator? Or is that already running here?

fao_ · on April 24, 2017

You might like http://n-gate.com/

It's a nice antidote to Hacker News, on days when Hacker News is too... Hacker Newsy :^)

pc86 · on April 24, 2017

> Now that they've got some hard data to ignore, they have another shitfest of lies, confusion, and using the word "generally" to mean "in the cases I have noticed." One Hackernews wonders what the big deal is with the Constitution. Someone shows up to compare it to software.

That is a stunningly apt description of those comments.

And I am definitely guilty of using "generally" in that manner.

jstimpfle · on April 24, 2017

And I think that use of "generally" is justified, at least if you are (relatively) experienced.

knieveltech · on April 24, 2017

We like anecdata now?

jstimpfle · on April 24, 2017

There is a difference between giving anecdotal evidence and speaking from years of experience. The latter is often the best you can get, and much better than set-up studies, for "soft" controversial or topics.

When listening to someone else's shared experience, just don't forget to understand it in context. Of course it's helpful to know a little bit about that person.

fao_ · on April 24, 2017

> There is a difference between giving anecdotal evidence and speaking from years of experience.

No. No, there really isn't.

https://en.wikipedia.org/wiki/Anecdotal_evidence

Jaruzel · on April 24, 2017

I don't think my eyes are ever going to recover after reading[1] the n-gate.com about page. :O

---

[1] 'reading' being a highly optimistic term.

fao_ · on April 26, 2017

Really? I find it easier to read than most modern "well designed" websites.

gnuvince · on April 24, 2017

My browser never recovered, had to `kill(1)` it.

rdudek · on April 24, 2017

Bookmarked! This should be a great daily dose of laughs.

TurboHaskal · on April 24, 2017

Good laugh. Whenever I stumble upon a website made with werc I know I'm in for a treat.

yeukhon · on April 24, 2017

Are the contents auto-learned and auto-generated? I would be impressed, but most likely human?

knieveltech · on April 24, 2017

Oh that's getting bookmarked. Thanks for the share!

noelwelsh · on April 24, 2017

Since you appear to me to be referring to my comment (as I'm the only top-level comment to mention Rust) let me quote some of my comment, with emphasis added to phrases I used to indicate that I was presenting was my opinion, not presenting my opinion as fact, and indeed not suggesting a rewrite in Rust (which you appear to have incorrectly inferred.)

This is, in my opinion, a rather generous interpretation of events ... What I see when reading this is a system (the Linux kernel and surrounding infrastructure like the C language) that is very poorly designed. ... I'm not trying to suggest the kernel should be rewritten in Rust.

If one cannot mention their opinion on this site what indeed is the point of the site? Furthermore, your response is typical of what I've seen to shutdown discussion of kernel development: to suggest the author is an idiot and cannot possibly understand the complexities involved, and if they did they would follow the established development procedures. Do you believe the current state of kernel development is perfect? If not, if one cannot discuss problems how will one ever improve? In this particular case it seems to be a well studied class of bug (use-after-free). Why do you suggest that the many years of work on tools and other methods to address this problem should be ignored?

joosters · on April 24, 2017

It wasn't specifically you, but IMO your comment is guilty of the charge I was making:

What I see when reading this is a system (...) that is very poorly designed.

I know it's just your opinion, so what? Everything that people write on here is their opinion. But in true Hacker News style, you're straight out declaring that the low level internals of Linux memory management are rubbish. Not just poorly designed, but very poorly designed. As you say yourself, 'to suggest the author is an idiot'

Before this declaration, did you pause and think that maybe, just maybe, the kernel code authors might not be idiots? Or perhaps, you might have reflected, the kernel developers may have considered various different memory management schemes, maybe they weighed up the pros and cons of different approaches and found the current system to be good? Did you entertain the possibility that there might well be other non-obvious considerations that influence the choice of data structure? Apparently not, because you're happy to quickly declare that it's terrible code.

I'm not declaring you to be an idiot. It's the fact that you and others on here launch straight into the full-on 'this is/they are stupid, they should obviously do this' kind of statements, showing a dearth of empathy and a complete failure to look at the other side's view.

I'm not trying to shut down discussion. I'm not saying the kernel is perfect, or that the developers are beyond reproach. I'm not saying ignore years of work. It's the way in which some HN users, like you, jump into extremes of viewpoints (even your response is full of it too, with the questioning about perfect, cannot discuss problems, etc)

And the rust comment was a throwaway joke, HN commentators cannot seem to resist putting a 'do it in rust' comment somewhere in all code discussion...

noelwelsh · on April 24, 2017

Thanks for the response. I appreciate you taking time to clarify your position.

I don't think my position is as clearly articulated in my first comment as it might be, but I doubt you or others have much interest in it so I'll drop the conversation here.

joosters · on April 24, 2017

No worries, re-reading it I think I came across too harsh... no offence meant. I'm guilty at posting the same kind of things that I'm complaining about too :)

metaphorm · on April 24, 2017

or, you could just shrug and laugh.

those who have an interest in kernel development might get something of technical interest out of your comment. those who don't have an interest in kernel development find a lot of the internal debates in that field to be funny, particularly when they seem to show up in all kinds of tangentially-related threads.

it's a quirk. kernel developers have this quirk and it's funny. just laugh. it's ok.

matwood · on April 24, 2017

Completely agree. I try to practice the Principle of Charity[1]. This means when arguing see the counter argument in its best light. I think this can also be expanded when reading others code. Assume they were not idiots (in this case they were kernel developers...), and that they made decisions for a reason. Figure out those reasons, and only then improve/criticize the code.

[1] https://en.wikipedia.org/wiki/Principle_of_charity

zzzcpan · on April 24, 2017

Programming is too dogmatic to have reasonable arguments.

nickpsecurity · on April 24, 2017

Dogmatic programmers are unreasonable. Then there's others who have discussions anywhere from partly to wholly trying to learn something in the process. I've seen people clarify or modify their position numerous times discussing programming on HN.

sqeaky · on April 24, 2017

Only because some programmers are dogmatic. We could be better at resisting this as a community and talking about probably helps.

obstinate · on April 24, 2017

I'll bite! I admit a bias against using linked lists. I've rarely had a need for them in a professional context. In one of the few instances I have used them, I later replaced the list with an array and got better performance. The Doom 3 BFG edition writeup on intrusive lists seems similarly bearish on the concept, unless your lists are very long.

But my experience is in a particular type of software, and maybe I'm shit at my job. :P So why does the kernel use linked lists so much? Are there cases where I can do better by using them myself?

Edit: someone already asked, here: https://news.ycombinator.com/item?id=14182922

zzzcpan · on April 24, 2017

Well, you are participating in this allocation bikeshedding too by implying that some doubt is deserved. It doesn't matter really, networking stack's performance is dominated by other issues and the whole article is not even about performance, but about bugs. And there are definitely ways to prevent bugs or at least prevent them from causing problems.

joosters · on April 24, 2017

I'm not giving an opinion on what the right way to do kernel memory allocation is, I don't know enough on Linux internals to offer a useful opinion. Luckily, HN seems to be populated by many such experts who instantly know the right thing, and it turns out the answer is obvious to all of them and the existing code is dumb and stupid. We are blessed by their genius, apparently.

Besides, my Turing machine has an infinite length of ticker tape, so memory allocation is not my concern :)

lmm · on April 24, 2017

Right back at you: have you considered the possibility that those people are right? Maybe the reason lots of people are saying this is bad design in the kernel and it would be better off written in Rust is simply that this bad design in the kernel and it would be better off written in Rust.

(Please don't use sarcasm, it's rarely constructive)

joosters · on April 24, 2017

I'm not commenting about whether these people are right or wrong, that's not my point.

I'm saying that HN has a vocal population who are so very sure of themselves and their own opinions that they'll rapidly post messages here declaring that innards of a large, complicated piece of software are so obviously wrong and that they know what the right way is. They seem to live in a world of utter certainty, never a doubt that they might not know all the considerations that could weigh in on the design of some inner working of a complex system.

HN would be a better place if these people could entertain some humility, perhaps a little bit of self-doubt, perhaps an inkling that other people may know something they don't, and that the world is not always the black/white, yes/no certainty that they seem to experience. Or we can just drown in the continual "This is stupid! they should do X" style comments.

derefr · on April 24, 2017

I don't do it myself any more (because I don't like having my posts downvoted to the point that nobody replies to them) but I used to be exactly the type of person you're referring to.

The thing was, I didn't actually have any certainty about anything. I just thought that the best way to talk about an issue was to pick a "side" (often at random), and let other people pick their "side", and then defend your "side" every time it gets rebutted, gradually researching the issue from the perspective of your side in order to best steel-man your own position, while assuming others are doing the same in reaction to your own arguments.

In other words, basically pretending internet comments are a debate club, or a court-room, where you're not seeking truth in an Aumann's-agreement-theorem "both people gradually converge onto the truth" way, but rather an adversarial discovery system where the truth comes out for the audience in the compilation of evidence presented by both sides, whether those sides see it themselves or not.

I don't know, it was fun at the time. Maybe I was a teenager with a lot of testosterone.

lmm · on April 24, 2017

I think it works. I mean there's a reason we use this approach for courts and elections, the cases where getting the truth is the most important for society.

derefr · on April 24, 2017

Sure it works, sometimes; the problem is more that it's a particular social game that people (like my younger self) would decide to just start up in the middle of an otherwise not-debate-oriented discussion, without giving any context to what they were doing. I'd just assert a bald-faced claim of something I knew shit-all about, wait for a rebuttal to show up, and then do quick, angry research in order to parry said rebuttal.

Until you see such a thread go four or five levels deep, it's not really clear that that's what's happening, and so you might want to join such a thread as a sane, calm individual, only to have your own argument taken apart "by the seams" in response (e.g. asserting that due to errors in your deduction, you didn't prove your point, even if you may be right) without engaging with its content. Good for lawyering, bad for truth-seeking.

If you then are willing to get baited into playing the game in response, honing your own attack to be bulletproof like a lawyer would, then the truth might eventually find its way out. But if you didn't want to play this game, you'll probably just quit when you realize what's happening, maybe with a petulant "you kno what? Screw this." at the guy who just can't seem to get your words through their head (when they never even believed in their chosen position, so there's no convincing to be done.)

If you explicitly indicate that you want to play this game, it could be great fun for all parties. But I know from experience that the other side of this—being baited into playing without knowing the game is afoot—is painful, and can burn people out on a forum community.

Arizhel · on April 24, 2017

Interesting. Maybe there's a problem because of differing expectations then. I can also see parallels in the legal system, and in differing legal systems: the adversarial system used in English common-law countries is not universal, for instance.

Personally, I think the adversarial system is inherently and downright bad and wrong. The idea isn't to get to the actual truth, it's merely to win the argument at all costs. There might be an idea that this efficiently gets to the truth, but I don't think it does at all; this system just finds who's better at arguing. An inquisitorial system I think is superior at getting to the actual truth. See your own comment about "good for lawyering, bad for truth-seeking": I think this is exactly what's wrong with the common-law system.

Personally, do like to play devil's advocate a fair amount, but not to the extent where I need to actually win the argument, I just want to make sure all sides are covered and that people aren't missing something major by hopping on a bandwagon or engaging in group-think. There's a big difference between this and just endlessly arguing one side without actual regard to the truth or the merits of the other sides (and there may be many--another problem with this adversarial approach that always assumes only 2 sides).

Also related, personally I think formal academic debates are similarly bad and wrong, though not quite as bad as lawyering because at least with academic ones someone's life or welfare is not directly impacted by the contest over who's a better arguer.

lmm · on April 24, 2017

> HN would be a better place if these people could entertain some humility, perhaps a little bit of self-doubt, perhaps an inkling that other people may know something they don't, and that the world is not always the black/white, yes/no certainty that they seem to experience. Or we can just drown in the continual "This is stupid! they should do X" style comments.

I disagree with this one. The best, most productive discussions I've seen on HN are those where people have been clear about their views and expressed them succinctly without the hedging and careful politeness that's expected these days.

mwcampbell · on April 25, 2017

> careful politeness that's expected these days.

I've more often heard, or gotten the impression, that discourse is getting more polarized and vitriolic these days. So which is it?

lmm · on April 25, 2017

HN in the dang era (post-2015) is a lot more polite, if only superficially, than previously.

Danihan · on April 24, 2017

It's easier when people make claims, rather than just being wishy-washy, since specific claims can be disputed / falsified.

mbel · on April 24, 2017

Another thing that does not seem to be very constructive is complaining about all those kernels written in evil C instead of writing one in Rust.

lmm · on April 24, 2017

If we don't believe discussion is constructive then this whole site is pointless.

I do quite a bit of work on improving the safety of various code, both professionally and in my side projects. The techniques I advocate are those that I've found effective. But I can't do everything on my own, unless you're offering to pay my contracting rate for me to work on a new kernel.

benchaney · on April 24, 2017

> If we don't believe discussion is constructive then this whole site is pointless.

This is a false dichotomy. Some discussion can be constructive while some is not.

nickpsecurity · on April 24, 2017

The large monoliths with huge ecosystems almost never get rewritten in anything. They stick around as legacy systems where the programming language & much of its code are preserved over time. Any discussion about rewriting such a thing entirely in a different, safer language would be foolish based on empirical evidence that it won't happen. I know this. It's why you see me on INFOSEC end suggesting to (a) gradually rewrite it in a safer version of C, (b) use compiler tech that automatically ads safety, (c) use monitoring tech or CPU's that spot severe problems, and/or (d) virtualize it as a deprivileged VM on top of something better. You won't see me tell them to rewrite it in Ada, D, or Rust because they'll never do it no matter what I say.

Whereas, suggesting a rewrite can lead to results when the tool is new, small with significant use + bugs, or small to medium with safety- or security-critical usage. Examples of last ones include DNS, web, key, file, and other servers that are mission critical and widely deployed. Example rewrites that actually happened... we'll go with DNS... are IRONSIDES DNS in SPARK Ada and bluejekyll's TrustDNS in Rust. Maybe add Lobo web browser since it ambitiously wrote a browser in pure Java. Dead now but impressive how far project got.

So, yeah, there's certainly pointless discussion that waste space on HN. Arguing for a Linux rewrite is always one of them unless someone with a pile of money dedicated to rewrites is asking which mainstream OS to rewrite in safer language. I'd prefer FreeBSD but probably recommend Linux for higher impact.

Arizhel · on April 24, 2017

I wonder how hard it'd be to do a Linux kernel rewrite in Rust or D, but in a way where you could do it bit-by-bit, keeping the rest of the kernel as-is in C until you get to it. This would be especially useful for the drivers since that's the bulk of the code.

nickpsecurity · on April 25, 2017

Most of the times it's been tried the team gave up. That was usually on NetBSD since it's designed to be easier for newcomers to work with. Changes slower than Linux, too. Still gave up...

steveklabnik · on April 25, 2017

With Rust, you'd have to drop support for some platforms that the kernel can run on but Rust can't just quite yet.

lossolo · on April 24, 2017

Then why google is using kernel written in C mostly in fuchsia? It's new OS in development.

mattnewton · on April 24, 2017

Because Google doesn't use Rust in its code bases? Is this supposed to be taken as Google engineers have infinite wisdom on how to build kernels and therefore we should copy what makes sense for them? Because they are smart but afaik haven't tried this yet, will make plenty of mistakes scaling it up, and there are lots of explanations for why it is in C other than C is better than Rust.

Tyr42 · on April 24, 2017

Counterpoint: https://fuchsia.googlesource.com/magenta-rs/+/88580e68f95830...

Google does use rust in the user layer of Fuchsia.

mattnewton · on April 24, 2017

I should have phrased it less strongly to say that Google doesn't have broad Rust experience AFAIK. I know nothing about their decision making process and the trade offs involved, but if I had to make an unprivledged wild guess I would say the main factor for C is the ability to leverage experience building similar systems in C inside the company, and the amount of prior art outside the company written in C as a close second reason.

lossolo · on April 24, 2017

> Because Google doesn't use Rust in its code bases?

This is not true. There are even bindings written in Rust for Magneta kernel in fuchsia.

> Is this supposed to be taken as Google engineers have infinite wisdom on how to build kernels and therefore we should copy what makes sense for them?

Main benefit of using Rust (memory safety) is probably not high enough comparing to eventual costs - that's why they don't use it in kernel. You make bold claims about decisions of fuchsia developers. I've checked your comments but only small percentage of them are technical and only saw that you write about yourself as "javascript programmer". What's your expertise in building kernels and operating systems that you think we should not copy it? How many kernels and operating systems did you built? I assure you that this team have a lot more wisdom than you and me combined in this space. Have a little respect.

> and there are lots of explanations for why it is in C other than C is better than Rust.

No one proved that Rust is better for writing kernel to this day because there are 0 kernels written in Rust used in high scale production systems today. Until this will change, C is proven to work in this space and considering that - it's better/safer choice there today.

mattnewton · on April 24, 2017

That escalated to ad-hominem attacks quickly. My position was simply that "Google does it" is not an argument in of itself. Source: I am a Google engineer. Never claimed to be an expert on kernels, Rust or C.

bostand · on April 24, 2017

Well, we got both types here:

>> Why are fundamental things like resource allocation at least not standardised in the kernel?

> I think this is a typical reaction from someone with no clue what is going on inside.

jstimpfle · on April 24, 2017

Not entirely sure what "the other type" is here, but that sentence of mine was quite a bit over the top. I hope I'm not coming across as a know-it-all in the rest of the comment and in my other comments.

DigitalJack · on April 24, 2017

I suspect this attitute has to do with the average age of an HN commentor.

tytso · on April 24, 2017

The real problem is that developer (who is a graduate student) doesn't understand how to do use the kernel structure. You can't just have "your own linked list". In order to do what you are doing, you have to add a list_head structure to the struct sock structure just for that linked list, and that struct list_head has to be initialized when the struct sock is initialized, and list_del() is called when the struct sock is released --- oh, and you need to handle locking properly to avoid races between adding a struct sock to the list and removing it. Or you can use RCU, but you do need to avoid races one way or another.

It's not hard, but you do need to understand the idiom. Why isn't there safety mechanisms? Because the kernel is optimized for speed, and instead of adding run-time checks which can be expensive, there are various debugging tools, including slab poisoning and KASAN to find such issues.

brenns10 · on April 24, 2017

Another possibility: the developer understands these things, but wrote a blog post with an explanation that omits some details in order to appeal to a broader audience.

pera · on April 24, 2017

Eh to be fare, because the kernel linked list implementation is slightly different from what most people think of a linked list, I believe it's easy to get a bit confused with it from time to time... I agree with you that it's not hard per se, but some of the bugs one can introduce with this type of linked list are a bit hard to track down, at least in my own experience.

qeternity · on April 24, 2017

I am hardly a kernel programmer, but it seems the real issue here is still that the release_sock() call isn't actually doing what he assumed.

brenns10 · on April 24, 2017

Yeah, this is the real problem. To be clear though (the article skimmed over lots of details to remain generally accessible), I was not discussing the release_sock() function which releases a lock on a socket [1]. Instead, my code registers a set of function pointers in a struct [2]. While release_sock() does have documentation, the function pointer of the same name does not, and the only examples I have are other files of code within the MPTCP protocol implementation. The callback is related to the function, to be sure, and so I should have connected the dots, but this is how we learn.

While I was making a baseless assumption, I like to believe that it was baseless because there was not enough available information, and not because I was ignoring obvious sources of information :P

[1]: http://lxr.free-electrons.com/ident?v=4.1;i=release_sock [2]: https://github.com/multipath-tcp/mptcp/blob/e77e2ca0b5339af2...

andreasvc · on April 24, 2017

Why would one ever prefer a linked list over a dynamic array? I know about the asymptotic performance, but if you take actual memory performance into account (locality, cache, pointers are slow, etc), it seems dynamic arrays should be simpler and perform better.

chriswarbo · on April 24, 2017

> Why would one ever prefer a linked list over a dynamic array?

(Singly) linked lists are trivial to implement in an immutable way (the article talks about the difficulty of doubly-linked lists, implemented in a mutable way).

It's trivial to have linked lists share a tail; this makes datastructures like cactus stacks really easy.

Linked lists are amenable to reasoning by induction (e.g. in Coq, Agda, Idris, etc.).

Linked lists are amenable to lazy algorithms (e.g. iterators/generators).

Linked lists can be heavily optimised, e.g. using stream fusion.

That's off the top of my head. Note that in many cases it might be preferable to use linked lists in the source, but have them compiled to some other representation like arrays (or, in the case of fusion, a single tight loop).

biocomputation · on April 24, 2017

* Also, lists are a reasonable way of managing unrelated types.

jstimpfle · on April 24, 2017

What you say doesn't apply to the kernel, which uses doubly linked lists.

koverstreet · on April 24, 2017

Because with intrusive lists, insertion/removal can't fail due to memory allocation failure - with dynamic arrays, it can.

In the kernel, where you have to handle every single error path, this is a big deal.

jekub · on April 24, 2017

There is a lot of reason why you can prefer a linked-list over an array, even with the hardware changing they are still a lot more efficient in some cases :

  * In multi-threaded context insertion and deletion can be done lock-free which is not the case for arrays ;

  * For intrusive list, insertion and deletion can't fail due to memory allocation ;

  * If you do only insertion at the tail and removal at the head these operation need only a pointer change which is more efficient than an array ;

  * Merging two list is only one pointer operation ;

  * Intrusive list have better memory allocation properties ;

  * ...

If you look at the usage of linked-list in the kernel you will see that they are used for reasons.

The problem is that, on new hardware, linked list should be used in less cases that in the past and some people have generalized this to linked-list should never be used. This is false, if you care about performances you should keep thinking and testing what is the best data-structure in different cases. If you don't care about performances, you should use an abstraction who implement what you need and don't car about the underlying data-structure.

josefx · on April 25, 2017

> In multi-threaded context insertion and deletion can be done lock-free which is not the case for arrays

I just love when multithreaded code frees the element I currently iterate over in a different thread.

> For intrusive list, insertion and deletion can't fail due to memory allocation ;

As long as you store your elements on the stack or use static values you mean? Heap allocation happens at some point, you just move the location where it happens.

> If you do only insertion at the tail and removal at the head these operation need only a pointer change which is more efficient than an array

Use a circular buffer. Until the array is full you can just modify the start/end pointers.

brenns10 · on April 24, 2017

More important than any of the sibling arguments is that we're talking about a socket control block. I don't control where it is allocated or freed, so I can't dictate its location in memory.

Even if I did control it, there are probably several lists a socket might be a part of. Storing the socket in a dynamic array will only work for one of those lists. It's similar to a sparse database index - you can only have one of them. The rest of the lists have to be linked.

Not to mention, when items are shuffled around in a dynamic array, you lose the ability to have a consistent pointer to them.

bmon · on April 24, 2017

Insertion and deletion are prime examples, often requiring you to move large chunks of memory. The same operation just requires pointer changes (after finding where you want to insert or delete). I think in this example it would be common to be removing structs from the center of the list.

sclangdon · on April 24, 2017

Insertion and deletion _were_ prime examples. It's not necessarily the case with today's hardware, especially until you reach some threshold of the number of elements (usually hundreds of thousands).

You say "the same operation just requires pointer changes (after finding where you want to insert or delete)" like it's some trivial thing that doesn't enter the equation. But finding where you want to insert or delete in a non-contiguous linked list is not a consideration that should be an afterthought in brackets.

Furthermore, when you have found the location, it's still not _just_ pointer changes. You actually have to allocate the memory for the node, too.

Traversing pointers and allocating memory is a lot slower than working with a contiguous block of memory (assuming you don't have to re-allocate the contiguous block of memory, of course).

Either way, don't take my word for it. If you're a C++ programmer, just benchmark std::vector against std::list, or watch this video - https://youtu.be/YQs6IC-vgmo

dahart · on April 24, 2017

You are right; all your arguments are valid in user space programming with normal (certain kinds) of data. But to be honest, you sound like someone who hasn't tried kernel or embedded development. You must have some experience with it to understand why these things don't always hold true.

> Furthermore, when you have found the location, it's still not _just_ pointer changes. You actually have to allocate the memory for the node, too.

This is not normally true. In embedded development and game engine memory managers, data nodes already include list pointers, and there are several ways the data nodes can come to exist that don't involve a separate allocation per node. Sometimes it's allocations of blocks of nodes. Sometimes it's slab allocations (to use the terminology from the article) - a memory manager that pre-allocates constant sized memory blocks.

Kernel lists don't normally involve hundreds of thousands of elements, and using dynamic arrays for small lists, the separate alloc & realloc time may not be even close to amortized away.

> just benchmark std::vector against std::list

This is unconvincing! Kernel & embedded code do not normally use std::list. There are very good reasons people warn against kernel development in C++! And performance benchmarks are the lowest item on the priority & constraint list. The least bad thing that can happen is going slower. Fragmentation is a bigger priority, and dynamic arrays have much worse fragmentation patterns because the allocation size changes over time.

jstimpfle · on April 24, 2017

Don't forget that there are valid uses of linked lists even in 2017. If there are more insertions / removals than scans, an array makes no sense.

Reallocating a big dynamic array is also a no-go in many time-constrained situations.

> You actually have to allocate the memory for the node, too.

The way this is done is by intrusive linking (the data must include list pointers). The data usually knows in which lists it is linked, so there is no point in having a version of the data without a list head.

kazagistar · on April 24, 2017

Almost any insertion requires a scan.

wtetzner · on April 24, 2017

Well, given that the pointers are contained in the nodes themselves, if a piece of code is already looking at a node when deciding if an insertion is needed, then no additional scan is required.

jstimpfle · on April 24, 2017

Many do, but there are also insertions in unsorted lists or insertions after a given element...

Retric · on April 24, 2017

Cache means inserts into arrays are often faster than linked lists. Remember RAM is really really big and slow.

PeterisP · on April 24, 2017

Not if the list is allocated within a slab, as described in the article. Imagine list elements allocated continuously (as in an array) but the ordering is decided by the pointers within each node, instead of assuming a fixed order. All the elements still are in the same cache line.

Retric · on April 24, 2017

Linked lists take more memory than arrays, so even in your example it's not as obvious as your making it out to be. Further, your ordering assumes no deletes.

signa11 · on April 24, 2017

>... inserts into arrays are often faster than linked lists.

don't forget, deletes, as well.

Retric · on April 24, 2017

Yea, it's really seeks being horribly slow on out of cache linked lists being horribly slow, and then inserts and deletes being fast once you find the right node.

lorenzhs · on April 24, 2017

I agree that in userspace, linked lists rarely are the way to go. Chasing pointers is too slow.

We're talking about kernelspace, though, and different rules apply there. Memory allocation works very differently, and you have to handle every possible error path.

bmon · on April 24, 2017

I was trying to highlight that the need to find the target for insertion or deletion was still a notable operation, not as an afterthought. Sorry it came out that way! That said, you still might need to perform a similar operation to find the target for deletion when using an array. The same is true for allocations (as you pointed out).

I'm don't know how the slab cache allocation would benchmark against allocating larger chunks, I was just trying to give a classical answer to the question.

ReligiousFlames · on April 24, 2017

That's all well-and-good for benchmarks that don't need to delete elements, but suboptimal for kernel data-structures that don't require O(1) find but are mostly round-robinned like multiple sockets. Using reallocated arrays instead linked lists also increases memory bandwidth and puts kernel data-structures at increased risk of random bit flips (most machines lack Chipkill ECC), in addition to cache evictions.

signa11 · on April 24, 2017

> Insertion and deletion are prime examples, often requiring you to move large chunks of memory.

on modern cpu's with huge caches, i seriously doubt this claim.

fwiw, i have played around with synthetic problems where: i insert sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions. and almost always, vectors outperform lists by at least couple of orders of magnitude.

or to put it another way, it is almost never about either lists / vectors or something else, and boils down to couple of 'rules' e.g. access data predictably (avoid trashing the cache), keep data compact (more cache utilization) etc. etc.

jstimpfle · on April 24, 2017

> i insert sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions.

Which means that you scan the list twice for each node. First for insertion and then for deletion. Starting at maybe a hundred that would be much more efficient with a balanced search tree. They have a good Red-black tree in Linux.

But if it's only arrays vs lists - with linked lists, pointers to items are never invalidated. With dynamic arrays, they are.

Dynamic arrays are fine for plain old data arrays. But it's harder for data that has "identity" and is frequently mutated. And if a node is in multiple lists, I guess arrays are out. You'd need to store the data redundantly. Extra bad if you need atomic mutations.

yongjik · on April 24, 2017

Well on a 64-bit machine each node of a singly-linked list of integers will be 16 bytes (8 byte pointer + 4 byte integer + 4 byte for padding). That's 300% overhead. Becomes 500% for doubly-linked lists.

So it's not surprising that linked lists perform very poorly in such circumstances.

dozzie · on April 24, 2017

Because (doubly) linked lists can do something an array cannot? Like join operations in O(1)? Or removing an arbitrary element in O(1) once you have its handle?

blibble · on April 24, 2017

if it's unordered (like you'd expect a list of sockets to be) you can get O(1) removal for an array by swapping the end item into the removed slot

dozzie · on April 24, 2017

...unless something else holds a pointer to the handle you're swapping with.

staticassertion · on April 24, 2017

This is already a problem if you're removing an item, regardless of whether it's done with swap remove or not.

toast0 · on April 24, 2017

For the item you're removing, you are presumably removing it because all handles have been closed. But the item you're potentially swapping with may certainly have open handles.

staticassertion · on April 24, 2017

True.

blibble · on April 24, 2017

kernels don't tend to give userspace raw pointers into kernel memory

dozzie · on April 24, 2017

But kernels tend to give raw pointers to other kernel code.

blibble · on April 24, 2017

a good chunk of the fun in kernel code is maintaining the various layers of self-referential lists and dealing with the cleanup

robmccoll · on April 24, 2017

If you are using a slab allocator like in this case, you are effectively splitting the middle. Allocation of the nodes is contiguous in the slab, so locality of the nodes overall is likely better than malloc()ing each one. Allocation is cheap since you are likely just grabbing the head of a free list in the slab (or incrementing a counter if you never intend to free intermediate data), and you get less heap fragmentation/ higher utilization than a pure linked list. You still get fast insertion and removal anywhere in the list. There's a bit of storage overhead for pointers and pointer jumping on access, but ideally your list isn't super fragmented, so close to linear cache reads. If the list is long lived / has certain turnover patterns, some of these effects fall away. You could implement your own defrag/resort, but that assumes being able to relocate the data.

Also in some cases the linked list header may be embedded in other structures, so you'd have to pull the header out to go with a purely vector-based approach for when you need to shift (assuming you can't move the entire structure). And if the structure needs to reference itself in the list for quick removal, that's a problem if its index is changing. Otherwise you could just search the vector for references at linear cost.

dahart · on April 24, 2017

> Why would one ever prefer a linked list over a dynamic array?

I thought this same thing when I started doing embedded console game development and saw a lot of linked lists going on. I came to realize there are a bunch of really good reasons to use linked lists.

The biggest difference in practice is that linked list pointers are commonly fields inside the data that you're linking, as opposed to being a separate data structure. With a dynamic array, it's usually a separate data structure. Neither of those is absolute or always, but that's how it normally plays out.

With that in mind:

Sometimes you're loading a block of pre-formatted binary data off disk or from a network that has space for linked list pointers. Rather than allocate anything else, you can fill in the list pointers.

Sometimes you have a lot of small lists to manage. If you're allocating space for your list data, and the dynamic array doesn't get big enough for amortizing to take over, then you're doing double allocations, one for the data and one for the dynamic array & potentially first couple of reallocs.

Fragmentation is worse with a dynamic array (when it's a separate data structure), and re-ordering events are much costlier in arrays than with linked lists.

falcolas · on April 24, 2017

> Why would one ever prefer a linked list over a dynamic array?

Consistent performance. Every insert and every delete requires the same amount of time, at the cost of a higher average traversal time. You can ameliorate some of the deletion reordering cost if you can change the order of the array, but not all use-cases allow for this.

Also, intrusively linked lists (that is, the same item represented in multiple lists) are just as poor, performance wise, when implemented with arrays.

I did a quick test of this at one point, comparing C++ std::list with a linked list; the performance of the dynamic array was significantly higher for most operations (about one order of magnitude for my test), but the worst-case was much, much worse than the linked list (somewhere close to two orders of magnitude).

SamReidHughes · on April 24, 2017

No error handling code when adding an element to the set. Also, a list maintains elements in order, unlike an array pointing to objects that keep track of their position in said array.

dooglius · on April 24, 2017

There seems to be a lot of jumping to conclusions and knee-jerk reactions in this thread. As a C developer, and having worked on Linux drivers, the description here is quite strange; memory is always freed by the same "layer" that allocated it. The implementation of release_sock (http://lxr.free-electrons.com/source/net/core/sock.c#L2536) appears to have a callback, and appears to not free memory, so it doesn't look like we have the full story here. What is clear is that the author misunderstood the "right way" to use the API in some way, and the lack of good documentation for a lot of kernel APIs is a big problem.

Perhaps Rust could have prevented the bug? I don't know. I haven't looked into Rust a great deal, but the impression I get from what I've read is that it severely limits expressiveness (read: the ways in which you can define APIs) unless you encapsulate things in blocks annotated "unsafe", at which point you lose any checking the Rust does. Would it even be possible to port the sockets code to Rust without fundamentally changing the APIs and data structures? To the Rust people: it comes off pretty badly when you throw around claims like "C effectively is unsafe everywhere" because "unsafe" in English doesn't mean what it means in Rust nomenclature; calling something "unsafe" implies it has bugs or security holes. It is perfectly possible to write bug-free secure C code, and also perfectly possible to write buggy or insecure Rust code. What I think you mean is: C code is roughly equivalent to what you'd find in a Rust block annotated "unsafe", and thus cannot perform some of the compile-time checks that Rust can. Calling it something else like "lowlevel", "expressive", or "tricky" would be better.

brenns10 · on April 24, 2017

> so it doesn't look like we have the full story here

I omitted so many details writing this -- otherwise the main story would have dragged on way too long. My comment here [1] gives some context.

[1]: https://news.ycombinator.com/item?id=14184148

steveklabnik · on April 25, 2017

We use "unsafe" because Rust is concerned about memory safety, which is well-defined. Something memory unsafe does have a bug, and often security holes. That is, as you say, you can write code that is memory safe in C, but you get no assistance from the compiler in doing so. In Rust, you do in the vast majority of cases.

Words are hard; your suggestions of "lowlevel", "expressive", or "tricky" have problems too: Rust is not exactly "high level" or "low level", that is, it feels higher level, but can easily go as low-level as you need; "expressive" is suggested because you see Rust as being very restrictive, but not everyone feels that way, "tricky" might work, but also has a negative connotation as well.

sbierwagen · on April 24, 2017

  It is perfectly possible to write bug-free secure C code,

https://cve.mitre.org/data/downloads/allitems.html

noelwelsh · on April 24, 2017

"I guess the moral of this story is that tools are excellent, and you should probably use them."

This is, in my opinion, a rather generous interpretation of events. What I see when reading this is a system (the Linux kernel and surrounding infrastructure like the C language) that is very poorly designed.

For example, resource allocation is a fundamental issue and a constant source of problems. We've known this as a field for a very long time. Why are fundamental things like resource allocation at least not standardised in the kernel? Better would be to enforce safe resource usage by static checking. Rust is an example of what can be done here, but much more is possible. (I'm not trying to suggest the kernel should be rewritten in Rust. I'm just using it as an example that resource usage [in particular, memory allocation in Rust] can be done in a way that prevents errors.)

Burning huge amounts of human effort is a popular approach to software development but I really hope we can do better in the next few decades. The worst thing about the current situation is the programmer blames themself ("This one kept me up until 3AM. In hindsight, every bug looks simple, and when I figured it out, I was embarassed that it took me so long") rather than wondering why this problem is even possible in the first place.

jstimpfle · on April 24, 2017

I think this is a typical reaction from someone with no clue what is going on inside.

> Why are fundamental things like resource allocation at least not standardised in the kernel?

I'm an outsider myself, so take this with a grain of salt. But having implemented a few datastructures in C, this gave me a little chuckle. You can't "standardize" resource allocation. To be efficient and organized you need to tailor allocation to the problem at hand. I bet there are at least half a dozen common strategies for allocation in the kernel. (And most individual methods are of course "standardized". Do you suggest kernel developers are too stupid to see obvious improvements?)

What you can, and probably should, do is have all elements of one type allocated/managed through the same mechanism.

noelwelsh · on April 24, 2017

I wasn't suggesting there is necessarily one single mechanism for performing memory allocation. I'm aware of arena allocators and that sort of stuff, having used them in the past and having been an avid reader of GC/compiler literature in my youth.

What seems to be the issue here is that the ownership of memory is not clear. The author seems to think they own it (it's not clear from the blog post if they allocated the memory but I assumed they did) while some other part of the kernel is freeing the memory. There are ways to track ownership of memory so this cannot occur. Rust does this. It's independent of how memory is allocated. A weaker form is to use some convention, which doesn't seem to be the case in the example described.

The other issue I have with your response is what I see as a typical knee-jerk reaction whenever the kernel is criticised: your calling me clueless and saying "Do you suggest kernel developers are too stupid to see obvious improvements?" I.e. to suggest that any criticism of the kernel comes from a place of complete ignorance and kernel development could not possibly be improved.

Kernel developers are not gods. They are not perfect. They are not above question. What I have seen is a lot of arrogance amongst kernel developers, the few times I've dipped into kernel development, that makes me think they probably are missing some obvious improvements.

This paper: http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf is both a good read and an example of missed improvements to the Linux kernel scheduler, and it does suggest (though doesn't conclusively demonstrate) that these improvements are missed due to the arrogance of the kernel developers.

jstimpfle · on April 24, 2017

Fair! Just wanted to point out they already "standardize" within reason, i.e. function abstraction per datatype and the relevant documentation.

These types of problems (getting ownership wrong) might be common, but whether they can be solved using a mechanism like Rust's remains to be shown. I doubt it, but I'm not intimate with Rust.

(Personally most Rust programs I've seen so far are off-putting at a syntactical and sometimes architectural level - the latter is of course not entirely the language's fault. Kernels will probably not be one of the first areas of Rust adoption).

noelwelsh · on April 24, 2017

A reasonable discussion on the Internet! I'm shocked (but thank-you) ;-)

I too find Rust a bit off-putting on the syntactical level, though architecturally what little Rust I've looked at does appeal to me, as I'm in the functional programming camp from which Rust draws inspiration. I'll hopefully get to do some real work with it soon, which will give me a better feel for it.

Agreed that the power of Rust's borrow checker remains to been shown on a large scale. I'm optimistic it will be, but time will tell.

wtetzner · on April 24, 2017

> Kernels will probably not be one of the first areas of Rust adoption

You probably mean popular/used-in-production kernels, but there are OS's being written in Rust, e.g. Redox [1]

[1] https://github.com/redox-os/redox

jstimpfle · on April 24, 2017

I had heard of that project. I'm sure there are OS's in Haskell too, but yes, that was not what I meant.

staticassertion · on April 24, 2017

> The author seems to think they own it

Yes, fundamentally this appears to be the issue, and I agree entirely that the concept of ownership is something you can get, albeit a not great version of it, in a standard way. C++'s unique and shared pointers, as an example, make ownership concepts much clearer. Why is the Linux kernel using raw pointers? Even just having a unique pointer would provide tons of help.

This is not an uncommon issue. The Linux kernel is plagued with use after free vulnerabilities and invalid dereferences etc. But there's very little upstream movement to solve these problems.

jstimpfle · on April 24, 2017

I think unique pointers are basically RAII for pointers, which does not help for lifetimes that aren't controlled by lexical scope. Shared pointers have reference counting overhead, which I think is used mainly for concurrent access in the kernel. Furthermore often kernel people want to "see" and control where stuff happens.

Too · on April 24, 2017

unique_ptr helps even outside lexical scope when transferring ownership through function arguments, it makes it much clearer that ownership is transferred as opposed to borrowed. It can also be controlled manually (not only by scope) by manually settings its contents to nullptr.

foldr · on April 24, 2017

One issue with Rust (which I know you weren't actually suggesting for use in the kernel) is that it's difficult to implement recursive mutable data structures in a way that makes the borrow checker happy. Recursive mutable data structures are pretty common in the kernel, I should think.

deckiedan · on April 24, 2017

I thought the idea with rust was that it was low-level enough that such problems could be expressed with zero overhead in an unsafe block, and then given a safe interface for users further up the chain?

There's nothing wrong with scary-potentially-misused-unsafe stuff as long as you can isolate it and therefore less likely misuse it unsafely...

C effectively is unsafe everywhere, and other languages that are safe are non-zero-overhead, or so far up the stack that they cannot be used in kernel/system level stuff.

Is that right?

steveklabnik · on April 24, 2017

Yes, your parent is saying that it would be nice if Rust's safe subset could understand this, so you wouldn't need to resort to unsafe. You can absolutely do this stuff with unsafe, it's just, well, unsafe.

deckiedan · on April 24, 2017

How could a safe system actually do it? Without some kind of run time garbage collector? Doesn't this kind of problem really require run time (either manual or automatic) care of memory by definition?

Manishearth · on April 24, 2017

You can do it inefficiently and kinda manually in rust via weak pointers.

Otherwise GC is basically the only way in rust. You can have tightly-scoped GC in rust that doesn't affect the runtime of everything else, though. No good such GCs exist right now.

There are some type system tricks in hypothetical type systems you can do to make it possible to deal with these without runtime tracking. You can use the type system to ensure Detach traits get implemented, where Detach is for breaking cycles in a pass before destruction.

swsieber · on April 24, 2017

To be clear, when stuff's implemented in a safe block, it's usually implemented correctly, ie safe to use. It's just the logic behind it is harder to track than the normal lifetimes stuff.

I'm betting you could do something with a dependent type system - from what I understand you can encode the state of data as a type... but I'm not really familiar enough to say for certain. Take this comment with a huge grain of salt.

steveklabnik · on April 25, 2017

If we knew the answer to that, then maybe safe Rust would be able to handle it :)

Or rather, knew of one without a GC.

Too · on April 24, 2017

"Learning Rust With Entirely Too Many Linked Lists" is a great example of the complications you run into writing such a data structure: http://cglab.ca/~abeinges/blah/too-many-lists/book/

foldr · on April 24, 2017

This is great, thanks for linking to it. But yeah, Rust may be the first programming language where you have to read an entire book before implementing a linked list! (I do like Rust a lot though.)

Rusky · on April 24, 2017

> Why are fundamental things like resource allocation at least not standardised in the kernel?

They are. The slab allocator mentioned in the article is one part of that. The problem here was not that allocation is ad hoc, but that the author was unaware of the conventions.

It would, of course, be nice to have some static analysis to help out people in that situation.

ReligiousFlames · on April 24, 2017

I think it's poor methodology combined with a "dangerous" language: lack of unit-testing/TDD, low-level verbosity, lack of formal behavior prooofs, kitchen-sink features lumped into the kernel, constantly reinventing the wheel and lack of minimal package code re-use.

Don't get me wrong, I learned all the sharp edges by trial-and-error in the C89 days with Pascal and embedded assembly, but I think we've got to have systems languages which provide higher-level abstraction for bare-metal kernels. It's just that even codebases like OpenBSD don't go far enough like seL4 to verify their security model beyond reasonable doubt with formal methods.

Chris2048 · on April 24, 2017

> The worst thing about the current situation is the programmer blames themself ... rather than wondering why this problem is even possible in the first place.

Absolutely agree. I think environment is partly to blame. Devs fight fires rather than prevent them because it makes their efforts more visible.

It's the reason you can demand metrics to justify everything - the only metric that justifies defensive programming are examples of bugs/failure, and it would be better to just avoid them in the first place.

nickpsecurity · on April 24, 2017

"It turns out that MPTCP sockets are allocated using a slab cache. I found this out early on, but forgot about it nearly as soon as I found out."

The real problem isn't linked lists. It's that the author learned how something worked (the spec), forgot about that, and then essentially coded against the wrong spec. The title might have instead been: "I forgot something and broke a kernel." Much more accurate given people who know they're dealing with slab allocators, linked lists, etc usually use them correctly. They can also sleep a bit better instead of being up until 3am debugging.

ReligiousFlames · on April 24, 2017

There are numerous, tested macro libraries to prevent reinventing common data-structure/algorithm bugs.

Here's one: https://github.com/troydhanson/uthash

monomaniar · on April 24, 2017

see the resume of the author. good boy.

hbex5 · on April 24, 2017

A linked list is almost certainly not what you want either. They're so deeply cache-unfriendly that their appearance in kernel code should be viewed as suspect.

justinhj · on April 24, 2017

The article indicates use of a slab cache which means that all the socket structures will be together in memory.

tropo · on April 25, 2017

They are cache-friendly in normal Linux kernel usage.

There are not distinct allocations for list links; the pointers are embedded (via a small struct) in the struct that is being dealt with. In walking the list, you normally need to look at that stuff anyway, and with everything together you can share cache lines.

It isn't normal to run down the entire length of a list without looking at the data. Since you need to look at the data anyway, it is best that the data share cache lines with the linked list info.

UK-AL · on April 24, 2017

I was under the impression, that linked lists are quite common in the Linux kernel.

huhtenberg · on April 24, 2017

They indeed are.

forgotpwtomain · on April 24, 2017

> They're so deeply cache-unfriendly

There a number of variations on linked lists, which are cache friendly (e.g. unrolled linked lists)

Johnny_Brahms · on April 24, 2017

Have you looked at the Linux source? It is full of linked lists everywhere.

panic · on April 24, 2017

What would you use instead?