
Basic Data Structures and Algorithms in the Linux Kernel - jackhammer2022
http://cstheory.stackexchange.com/questions/19759/core-algorithms-deployed/19773#19773
======
timsally
Great material, but it's been directly taken from the source material
([http://cstheory.stackexchange.com/questions/19759/core-algor...](http://cstheory.stackexchange.com/questions/19759/core-algorithms-deployed/19773#19773))
with no added content. I imagine Vijay (the author of the source material) put
a lot of work into assembling this information. Vijay's CS Theory answer
should replace this URL as the link for this HN submission.

EDIT: Removed part of my comment, per the blog author's response below.

~~~
nly
Those about to hurt themselves because this isn't yet another Bitcoin story
might want to search that page for Merkle trees. They've been used in other
P2P networks as far back as the heyday of 2002:

[http://zgp.org/pipermail/p2p-hackers/2002-June/000621.html](http://zgp.org/pipermail/p2p-hackers/2002-June/000621.html)

~~~
Sami_Lehtinen
Even earlier, check out details of Freenet and GNUnet.

------
bcjordan
A Coding for Interviews [1] group member mentioned that reading through the
Java collections library [2] was the most valuable step he took while
preparing for his Google interviews.

In addition to getting a better understanding of the standard data structures,
hearing a candidate say "well the Java collections library uses this
strategy..." is a strong positive signal.

[1]: [http://codingforinterviews.com](http://codingforinterviews.com)

[2]: He suggested reading the libraries here:
[http://www.docjar.com/html/api/java/util/HashMap.java.html](http://www.docjar.com/html/api/java/util/HashMap.java.html)

~~~
niyazpk
Thanks for the docjar link. Is there a way I can see a syntax highlighted
version of this?

~~~
jkscm
[http://grepcode.com/file/repository.grepcode.com/java/root/j...](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/java/util/HashMap.java#HashMap)

~~~
niyazpk
Wow. Thanks!

------
incision
Very nice summary.

I encountered many of these while reading through _Understanding The Linux
Kernel_ [0] and _The Linux Programming Interface_ [1].

Both are great books which are primarily about the "how" of the kernel, but
cover a lot of the "why" of the design and algorithms as well.

0: [http://www.amazon.com/dp/0596005652](http://www.amazon.com/dp/0596005652)

1: [http://www.amazon.com/dp/1593272200](http://www.amazon.com/dp/1593272200)

~~~
aray
I'd also add Love's _Linux Kernel Development_ [2], published in 2010. Great
resource overall for someone wanting to get into developing the internals of
the kernel.

[2] [http://www.amazon.com/dp/0672329468](http://www.amazon.com/dp/0672329468)

~~~
incision
Yes.

I actually have that book as well. I don't know how I forgot to mention it. As
I recall, it was a bit less dense than the other two.

------
netvarun
On a (slightly) related note, you should also check out the author Vijay's
([http://www.eecs.berkeley.edu/~vijayd/#about](http://www.eecs.berkeley.edu/~vijayd/#about))
answer on the benefits of learning Finite Automata:
[http://cstheory.stackexchange.com/questions/14811/what-is-th...](http://cstheory.stackexchange.com/questions/14811/what-is-the-enlightenment-im-supposed-to-attain-after-studying-finite-automata/14818#14818)

~~~
ced
_There are standard examples where the minimal deterministic automaton for a
language is exponentially larger than a minimal non-deterministic automaton_

Can anyone provide one such example, please?

~~~
prutschman
"the language of strings over the alphabet {0,1} in which there are at least n
characters, the nth from last of which is 1. It can be represented by an (n +
1)-state NFA, but it requires 2n DFA states, one for each n-character suffix
of the input."

[http://en.wikipedia.org/wiki/Powerset_construction#Complexit...](http://en.wikipedia.org/wiki/Powerset_construction#Complexity)

~~~
comex
For reference, that "2n" is supposed to be 2^n.

~~~
prutschman
Thanks for catching that.

------
eshvk
The stack exchange comment was amazing. You can't get a better raison d'etre
for why studying algorithms is important.

~~~
stiff
Pardon me, but your French would leave a better impression if you used it
in a grammatically correct way. "reason for existence for why studying
algorithms is important"?

~~~
d0m
I speak French and this is the correct use.

It's not really French though, more like French words that are used and are
now part of the English language.

Raison d'être just means 'why it is there'.

~~~
why-el
> it's not really French though

I know what you mean, but this _is_ correct French, as in the French use it to
denote the exact same thing.[1] There are many French phrases that change
meaning as they are ported to English, but this is not one of them.

[1]
[http://fr.wikipedia.org/wiki/Raison_d%27%C3%AAtre_(homonymie...](http://fr.wikipedia.org/wiki/Raison_d%27%C3%AAtre_\(homonymie\))

------
jackhammer2022
More implementations listed at:
[http://cstheory.stackexchange.com/a/19773](http://cstheory.stackexchange.com/a/19773)

~~~
davexunit
What a fantastic read. Thank you for posting the original.

------
mrcactu5
I like reading these off Stack Exchange since I am often too lazy to read the
textbook.

My other problem with algorithms textbooks is that I get into arguments _with
other developers_ about how much we need them. At least here, I can say "Look
bucko, the Linux kernel itself uses them."

I decided we can do programming at the API level and never have to think about
_how_ that API gives us the right answer. Lower-level programming is
responsible for optimization when our number of data points gets larger.

And we could go even lower level and ask why the algorithms work in the first
place - which is the computer science aspect. I routinely deal with developers
who feel they do not have time for this.

Also, if the data set is small enough, we can brute-force it and nobody will
notice.

------
joshguthrie
These are great resources. Best advice I was ever given when starting CS and
learning C was from the headmaster (hi RR!) asking me "What about Linus's
linked-list? Have you looked at them?".

Up to that point, this (new) headmaster was seen by the students as "that
non-tech guy here to administrate the school", and he was opening my eyes to
the biggest codebase residing on my own computer that I had never bothered
looking through: the Linux kernel code.

As someone says further down the comments, this is not a Linux-specific thing:
looking at how Java's HashMap works or how Ruby implements "map" is a great
resource, and you'll always get bonus points in an interview for referencing
algorithms from "proven" source code.

------
avisk
Awesome answer. This is a treasure for anybody wanting to read up on data
structures & algorithms. I always found it boring to read about data
structures for their own sake, or for interviews with made-up examples. I am
sure we can cite many other open source projects with interesting uses of these
data structures. This is far more interesting than reading the source code of
data structure libraries in programming languages.

------
aceperry
Excellent, I love reading this stuff. Very helpful and informative for those
of us who are interested in computer science but studied in a different field.

------
chintanp
My favorite algorithm has been the linked list implementation, pretty useful
for implementing list on embedded platforms.

~~~
ExpiredLink
> My favorite algorithm has been the linked list implementation, pretty useful
> for implementing list on embedded platforms.

You can use GPL'd code for your work?

~~~
octo_t
I'm guessing you mean "can't use GPL'd code" here.

Embedded work is different, in that you often embed the list link in your
data structure itself, so you end up with something like:

    
    
      typedef struct Point {
        int x;
        int y;
        struct Point *next;
      } Point;
    

This saves the overhead of extra allocations when creating your list, but it
does mean you need to write extra functions for each element type, such as:

    
    
      Point* find_list(Point* p, int x, int y);

~~~
eps
That's not what he meant.

He meant that GP is so dumb that he is forced to use Linux .h in his project
(along with all arch dependencies) rather than to take 5 minutes and code it
from scratch. And that he is ignorant of licensing matters of GPL'd code or,
more likely, he just doesn't give a f_ck about them. That'd be the gist of
what ExpiredLink meant.

------
alok-g
Wow! Does anyone know more about the author Vijay D.? Is this the person:
[http://www.eecs.berkeley.edu/~vijayd/](http://www.eecs.berkeley.edu/~vijayd/)

~~~
alecdbrooks
Given that he lists Berkeley on his Stack Exchange profile and the same
research interests, I'd say yes.

------
topynate
Could someone explain what the utility of bubble sort is? I've read that even
in cases where an O(nlogn) sort is impractical, insertion or selection sort is
preferred.

~~~
Negitivefrags
Bubble sort is good for sorting particles in a particle system. Particles need
to be drawn from the furthest from the camera to the closest to the camera.

Each frame the particles move a little bit, and the camera moves a little bit.

That means that in a given frame most, if not all, of the particles are
probably already sorted. In addition, if the sort order has changed, it
probably only requires swaps of adjacent particles.

Because of this, bubble sort is often the best sort to use for this operation.

~~~
jeltz
Wouldn't insertion sort be even better then? As far as I recall, insertion sort
is always better than bubble sort and easier to understand too, so I do not see
why bubble sort is so popular in CS courses.

------
almosnow
Amazing answer! Unfortunately, 'this is not a good fit for our Q&A format'.

~~~
VLM
There's a general rule on Stack Exchange that something isn't worth reading
unless the deletionists are having a cow about it.

Come on guys, we need to save valuable and expensive disk space for those oh
so precious "[http://lmgtfy.com/](http://lmgtfy.com/)" questions.

You can find on-topic, highly detailed, accurate, cited analysis of technical
questions everywhere else on the internet (insert sarcasm); Stack Exchange is
not for that; it's for people who are somehow smart enough to use SE but not
smart enough to use Google.

And that's the value of this HN article; SE has made itself irrelevant, so
when a valuable gem floats by in its sewer, unless someone points the gem out,
no one will ever see it again.

It's too bad; the tech behind SE, some of its ideas, and obviously the
subject matter could create a better site than SE.

~~~
rsync
So the deletionist movement has jumped hosts from wikipedia to SE ?

How sad. How sad and lame.

~~~
almosnow
Yeah, and I agree with VLM; SE sites were amazing when they started, but now
it's just filled with a bunch of ass***.

I recently asked for an opinion on the fastest algorithm to lay out elements on
a webpage (something I consider Web Developers 101, and I actually didn't find
anything on Google), and my question got closed promptly because 'it was not
constructive'. Come on...

------
blahbl4hblahtoo
Wow. That's so cool.

------
ExpiredLink
I'd be interested in Basic Data Structures and Algorithms in C that are
published under a non-viral license.

