Hacker News new | past | comments | ask | show | jobs | submit login
Basic Data Structures and Algorithms in the Linux Kernel (cstheory.stackexchange.com)
682 points by jackhammer2022 on Nov 23, 2013 | hide | past | favorite | 59 comments

Great material, but it's been directly taken from the source material (http://cstheory.stackexchange.com/questions/19759/core-algor...) with no added content. I imagine Vijay (the author of the source material) put a lot of work into assembling this information. Vijay's CS Theory answer should replaced as the URL for this HN submission.

EDIT: Removed part of my comment, per the blog author's response below.

All I wanted was to bring attention to Vijay's amazing answer in stackexchange. I disclaimed this by thanking him for the brilliant research on the first line of the blog post. Can anybody change the link of this HN submission to the original answer? Mods?

I didn't meant to blog spam, not sure why the submitter didn't link to the original content.

Those about to hurt themselves because this isn't yet another Bitcoin story might want to search that page for Merkle trees. They've been used in other P2P networks as far back as the heydays of 2002:


Even earlier, check out details of Freenet and GNUnet.

A Coding for Interviews [1] group member mentioned that reading through the Java collections library [2] was the most valuable step he took while preparing for his Google interviews.

In addition to getting a better understanding the standard data structures, hearing a candidate say "well the Java collections library uses this strategy..." is a strong positive signal.

[1]: http://codingforinterviews.com

[2]: He suggested reading the libraries here: http://www.docjar.com/html/api/java/util/HashMap.java.html

I agree. It helped me when I was a bit rusty with data structures and was looking for a new job recently. And if you can try and re-implement the data structures in a different way. For example, the JDK HashMap uses chaining so try and build an open addressing one. Not only does it teach you about data structures themselves but it gets you practicing coding very quickly for these toy coding problems that get thrown at you during interviews.

I fully agree. I usually have a fun time going from method to method just exploring any of the implementations in that library. Josh Bloch (One of the writers of the library I think) uses a lot of examples from the Java collections in his book "Effective Java", which I also recommend to everyone.

Thanks for the docjar link. Is there a way I can see a syntax highlighted version of this?

Wow. Thanks!

Very nice summary.

I encountered many of these while reading through Understanding The Linux Kernel [0] and The Linux Programming Interface [1].

Both are great books which are primarily about the "how" of the kernel, but cover a lot of the "why" of the design and algorithms as well.

0: http://www.amazon.com/dp/0596005652

1: http://www.amazon.com/dp/1593272200

I'd also add Love's Linux Kernel Development[2], published in 2010. Great resource overall for someone wanting to go developing the internals of the kernel.

[2] http://www.amazon.com/dp/0672329468


I actually have that book as well. I don't know how I forget to mention it. As I recall, it was bit less dense than the other two.

Would you recommend those books to someone who is just familiar with the Linux command line and knows the basics of C? I want to start understanding the system I use everyday, in depth, but am not sure where to start.

I bought "Beginning Linux Programming" and I think it's a great book for beginner and intermediate C programmers looking to get a deeper knowledge of Linux. A bit basic if you've been working with C for some time tho.

After buying it I found The Linux Programming Interface... and I certainly regretted having bought the former rather than the later. But for beginners I recommend the first.

On a (slightly) related note, you should also check out the author Vijay's (http://www.eecs.berkeley.edu/~vijayd/#about) answer on the benefits of learning Finite Automata: http://cstheory.stackexchange.com/questions/14811/what-is-th...

Its overall a very good post, although WRT response #9 and #22 of his answer its surprising no one mentions the venn diagram-like effect where its very easy for (real) CS types to solve problems using regex and e-NFAs and all that higher level stuff and the point of the class is teaching how to convert from one form to another. Also its really easy for EE types to implement any DFA using a vast pile of flipflops and logic gates of various types in a simple scalable algorithmic-ish way that doesn't really exist (at least as a simple algo) for converting e-NFAs into NAND gates or whatever. That intersection of those two areas is where the assemblers, interpreters, compilers, disassemblers, and their close buddies all live.

So this whole e-NFA / NFA / DFA / regex topic is at least one provably complete way you can translate from 'ls *.html' into a whole bunch of gates and FFs in hardware.

There are of course usually ways a smart-ish human can make a faster less general more specific hardware implementation. But its real handy for language designers etc to know that based on CS theory its impossible to write a regex that can't, eventually, be turned into operating hardware.

Now you can add features to regex until you screw that up, if you try hard enough, but I can't think of any good examples at this time. And not all added features screw it up either, some pretty obvious syntactic sugar homomorphisms don't do anything other than make regexes easier for humans to write and understand.

There are standard examples where the minimal deterministic automaton for a language is exponentially larger than a minimal non-deterministic automaton

Can anyone provide one such example, please?

"the language of strings over the alphabet {0,1} in which there are at least n characters, the nth from last of which is 1. It can be represented by an (n + 1)-state NFA, but it requires 2n DFA states, one for each n-character suffix of the input."


For reference, that "2n" is supposed to be 2^n.

Thanks for catching that.

The stack exchange comment was amazing. You can't get a better raison d'etre for why studying algorithms is important.

Pardon me, but your french would leave a better impression if you would use it in a grammatically correct way. "reason for existence for why studying algorithms is important" ?

I speak french and this is the correct use.

It's not really french though, more like french words that are used and are now part of the english language.

Raison d'être just means 'why is it there'.

> it's not really french though

I know what you mean, but this is correct french, as in the french use it to denote the exact same thing.[1] There are many French phrases that change meaning as they are ported to English, but this is not one of there.

[1] http://fr.wikipedia.org/wiki/Raison_d%27%C3%AAtre_(homonymie...

Raison d'etre means "reason for existence" and it refers to a particular thing. The problem is that "why studying algorithms is important" does not describe a thing, so you can ask yourself: the reason for existence OF WHAT?. "raison d'etre of algorithm classes" or "raison d'etre of algorithm research" would be grammatically correct phrasing. Why use fancy words only to get basic sentence structure wrong?

See also: http://grammarist.com/usage/raison-detre/

More implementations listed at: http://cstheory.stackexchange.com/a/19773

What a fantastic read. Thank you for posting the original.

An interesting read indeed, I was looking for something like this for quite a while! Thanks for the link!

I like reading these off stack-exchange since I am often to lazy to read the textbook.

My other problem with algorithms textbooks is that I get into arguments with other developers about how much we need them. At least here, I can say "Look bucko, the Linux kernel itself uses them."

I decided we can do programming at the API level and never have to think to how that API gives us the right answer. Lower-level programming is responsible for optimization when our number of data points gets larger.

And we could go even lower level and ask why the algorithms work in the first place - which is the computer science aspect. I routinely deal with developers who feel they do not have time for this.

Also, if the data is small enough scale, we can brute-force it and nobody will notice.

These are great resources. Best advice I was ever given when starting CS and learning C was from the headmaster (hi RR!) asking me "What about Linus's linked-list? Have you looked at them?".

Up to that point, this (new) headmaster was seen by the students as "that non-tech guy here to administrate the school" and he was opening my eyes on the biggest codebase residing on my own computer that I never bothered looking through: the Linux kernel code.

As someone says further down the comments, this is not a specific Linux thing: looking at how Java HashMaps works or how Ruby implements "map" are great resources and you'll always get bonus points in an interview for referencing algorithms from "proven" source codes.

Awesome answer. This is a treasure for anybody wanting to read data structures & algorithms. I always felt bored to read data structures for the sake of reading them or for interviews with some made up examples. I am sure we can quote many other open source projects with interesting uses of these data structures. This is way more interesting than reading source code of data structure libraries in programming languages.

Excellent, I love reading this stuff. Very helpful and informative for those of us who are interested in computer science but studied in a different field.

My favorite algorithm has been the linked list implementation, pretty useful for implementing list on embedded platforms.

I've really got to force myself to learn these data structures. The circular, doubly linked list is one of my favorite structures, I oddly find myself reaching for it all the time. Which is funny, because when I first had to implement it at school, I thought I would never ever have a use for such a thing. But now, years later, I don't know what I'd do without it!

I wonder what I missed out on by glossing over things like red-black trees.

> My favorite algorithm has been the linked list implementation, pretty useful for implementing list on embedded platforms.

You can use GPL'd code for your work?

I'm guessing you mean "can't use GPL'd code" here.

embedded work is different, in that you often embed your container in your data-structure, so you end up with something like:

  typedef struct {
    int x;
    int y;
    Point *next;
  } Point;
this allows you to save on overhead of extra allocations when creating your list (but does mean you need to create extra functions for

  Point* find_list(Point* p, int x, int y);

That's not what he meant.

He meant that GP is so dumb that he is forced to use Linux .h in his project (along with all arch dependencies) rather than to take 5 minutes and code it from scratch. And that he is ignorant of licensing matters of GPL'd code or, more likely, he just doesn't give a f_ck about them. That'd be the gist of what ExpiredLink meant.

I'm not making much sense of that comment.

GP is wondering whether the original poster is allowed to use code under the GPL license by his employer. This license requires derived code to be distributed under the same license, which companies may not want to do because trade secrets etc.

Google for "BSD linked list" and get 293000 results. Its an old data structure, much older than the GPL. Also its very popular. Google for "ruby linked list gem" and simply use a debugged library. Or any other language of your choice. Being a very popular data structure there likely exist implementations under most licenses for most languages. So it just struck me as weird that in 2013 someone would cut-n-paste reimplement one found in the kernel other than for obvious academic or educational interest or direct interface with the kernel (device driver author?)

GP meant to ask: how is that relevant?

well, as long as you can 'appreciate' its effects on caches its all fine ;)

Wow! Does anyone know more about the author Vijay D.? Is this the person: http://www.eecs.berkeley.edu/~vijayd/

Given that he lists Berkeley on his Stack Exchange profile and the same research interests, I'd say yes.

Could someone explain what the utility of bubble sort is? I've read that even in cases where an O(nlogn) sort is impractical, insertion or selection sort is preferred.

Bubble sort is good for sorting particles in a particle system. Particles need to be drawn from the furthest from the camera to the closest to the camera.

Each frame the particles move a little bit, and the camera moves a little bit.

That means that in a given frame most, if not all, of the particles are probably already sorted. In addition, if the sort order has changed, it's probably only requires swaps of adjacent particles.

Because of this, bubble sort is often best sort to use for this operation.

Wouldn't insertion sort be even better then? As far as I recall insertion sort is always better than bubble sort, easier to understand too so I do not see why bubble sort is so popular in CS courses.

bubble sort is useful for two cases:

you have a small (3-5) fixed size array that you need to sort with very simple code.

A classic case is to enforce a defined lock ordering when you have more than two locks in dynamic objects that you all need to acquire. In this case open coded bubble sort is good and beats calling a complicated library function.

The other case is your objects are known to be nearly already sorted. In this case bubble sort is a very good algorithm too (although it may be better to avoid a full sort then)

In short bubble sort is often a good choice when anything else would be over engineered.

To expand on this point: simplicity ("just do what you need") is often under valued.

One simple data structure trick I learned recently (from Knuth) is: people often use complicated balanced tree libraries like b or rb. You only need a balanced tree library when the sort keys are in-balanced. One simple trick is to hash the input keys. As long as your hash function is good enough, the keys will be already spread out and you can then use a much simpler non balanced tree.

Insertion sort is almost always faster than bubble sort and just as easy to implement, so no there is no reason to implement bubble sort that I know of.

If you already have a list of objects, then bubble sort can beat insertion sort in terms of space, because it doesn't have to construct a second list. That's pretty much the only place left where bubble sort has any advantage, but when you need it, nothing else will do.

Insertion sort does swap elements in place so requires zero additional memory. It is like bubble sort but with better good cases and is as easy to implement. I recommend reading the Wikipedia article about it.


It swaps elements in place in the list it builds, but it requires building another list. Bubble sort doesn't.

Amazing answer!, unfortunately 'this is not a good fit for our Q&A format'.

There's a general rule on Stack Exchange that something isn't worth reading unless the deletionists are having a cow about it.

Come on guys, we need to save valuable and expensive disk space for those oh so precious "http://lmgtfy.com/" questions.

You can find on topic high detail accurate cited analysis of technical questions everywhere else on the internet (insert sarcasm); stack exchange is not for that; its for people who are somehow smart enough to use SE but not smart enough to use google.

And that's the value of this HN article; SE has made itself irrelevant, so when a valuable gem floats by in its sewer, unless someone points the gem out, no one will ever see it again.

Its too bad, the tech behind SE, and some of its ideas, and obviously the subject matter, could obviously create a better site than SE.

So the deletionist movement has jumped hosts from wikipedia to SE ?

How sad. How sad and lame.

Yeah, and I agree with VLM; SE sites were amazing once they started but now its just filled by a bunch of ass*.

I recently asked for an opinion on the fastest algorithm to layout elements on a webpage (something I consider should be Web Developers 101 and actually didn't found anything on google) and my question got closed promptly because 'it was not constructive'. Come on...

Wow. That's so cool.

I'd be interested in Basic Data Structures and Algorithms in C that are published under a non-viral license.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact