

Linux internals - jasontan
http://duartes.org/gustavo/blog/category/internals

======
xtacy
The illustrations look really nice! He says he uses Visio 2007:
[http://duartes.org/gustavo/blog/post/quick-note-on-
diagrams-...](http://duartes.org/gustavo/blog/post/quick-note-on-diagrams-and-
the-blog).

------
zi
I read through the entire article. Very fascinating. I admit that I don't
understand but more than 5% of what is being explained here, but it is a VERY
in depth look at how things are mapped out. I wasn't aware that operating
systems were so wasteful in the area of redundant resource management. Mad
props on this article.

~~~
danvet
No wasteful redundant resource management. It might look like that, but for
every data structure in the kernel there's a use-case that absolutely needs
the corresponding lookup/access to go fast. A bunch of examples from memory
management:

* You need your pagetables for the cpu to find your memory.

* You need your vma tree in each struct mm to quickly go from virtual address to the backing store to handle missing pagetable-entries. People routinely allocate insane amounts of virtual memory and just use a few bytes, so prefilling all the pagetables is a no-go.

* You need the struct page for every physical page to the backing store (file or anonymous memory) and keep track of it's state (e.g. locked into place due to i/o, dirty and needs to eventually be written back to the filesystem).

* You need the radix pagecache tree to quickly go from file-offset to it's cache page (read/write and page fault handling). Also you need to walk that tree in-order to write back dirty pages efficiently (i.e. avoid tons of random seeks).

* All the connections are used, too. E.g. when shrinking the size of a file, all the vma's that map it are looked up via the file's address_space to shoot down these mappings (and deliver SIGBUS in case userspace access these now invalid virtual address).

* Anonymous memory (i.e. not memory mappings of a file with the pages in the file page cache) also neds to be tracked in a similar way. It's sligthly different because anonymous memory has a few different corner-cases with Fancy Behaviour (tm) like the shoot-down-all-mappings-on-truncate dance necessary for file-backed memory.

And often-used data structures are insanely optimized: Fields that are not
used at the same time overlap (struct page is the most obvious example). And
because pointers are always aligned to 4 bytes, there are 3 bits available in
each that are used to store some information or the type of the object pointed
to (e.g. the color of the red-black tree implementation is stored like this).
Further structures are as often as possible embedded into each another to
avoid the cache-miss when chasing a pointer.

Because it's all extremely optimized, it's easy to assume all memory access is
O(1). It's not, and every time someone blurbs that his hash-table has O(1)
access times my brain immediately goes "BS! ... not on any real hardware and
not on any real kernel." Of course, you need pretty big hash-tables to notice
the effect ...

EDIT: layout fixes.

------
iuguy
This is probably one of the best descriptions of Linux memory management I've
ever seen:

[http://duartes.org/gustavo/blog/post/how-the-kernel-
manages-...](http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-
memory)

Great find!

------
grourk
"The first time I saw FSB packet descriptions I had a huge “ahhh!” moment so I
hope someone out there gets the same benefit."

I think he means "aha!" moment, but it certainly could go either way.

