Evolution of the Unix System Architecture: An Exploratory Case Study

pram · on Sept 23, 2021

"We could not study Unix versions that derive from the Research editions via AT&T System V, such as Solaris, AIX, and HPUX, because most of the corresponding code remains proprietary and inaccessible."

Understandable but disappointing. HPUX and AIX in particular are probably the closest thing to "living" UNIX time capsules.

jjav · on Sept 23, 2021

Solaris was open sourced as OpenSolaris, so they do have access to that. Oracle stopped merging to it, to no one's surprise, but the source of what is essentially Solaris 10 is available and formed the basis of illumos and other related projects.

https://en.wikipedia.org/wiki/OpenSolaris

throwawaylinux · on Sept 23, 2021

There was a very interesting period around 2000 when SMP scalability was a very hot topic among the serious open source kernels, with heated debates in the Linux and BSD communities (it precipitated the DragonFly BSD fork from FreeBSD) about how they should be designed, or whether it was even feasible to have a single kernel design that would work well on both a desktop and a high-end server.

Proprietary UNIXes had already gone through this and scaled up to pretty big machines like the Sun E10K, although none were aimed at small desktop machines, and they tended to have a reputation (deserved or not) for being weighed down by, among other things, locking ("Slowaris"): complex layers of locks, which open source developers feared would cause a complexity explosion that would sink their kernels.

Linux was famously granted the use of the patented RCU algorithm by IBM, and that's the way they went. Linux does have a lot of locks and locking complexity, but RCU helped massively on that front. The free BSDs did trail in scalability, but they ended up developing similar techniques, along with lock management and debugging facilities, that seem to have stood them in pretty good stead despite the huge developer-hours advantage Linux has.
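For readers who haven't met RCU, here is a minimal sketch of its read/publish idea in C11 atomics, not Linux's implementation. Readers take no lock; they load a published pointer and use that snapshot. An updater copies the object, modifies the private copy, and publishes it with a release store. The hard part of real RCU, waiting for a grace period before freeing the old version (what Linux's `synchronize_rcu()` and `call_rcu()` provide), is deliberately omitted: the old version is just handed back to the caller. `struct config`, `read_timeout` and `update_timeout` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical shared object read on a hot path. */
struct config {
    int max_conns;
    int timeout_ms;
};

/* The single published pointer; readers never take a lock. */
static _Atomic(struct config *) current_cfg = NULL;

/* Read side: one acquire load, then use the snapshot. Under real RCU this
 * would sit between rcu_read_lock() and rcu_read_unlock(). */
static int read_timeout(void)
{
    struct config *c = atomic_load_explicit(&current_cfg, memory_order_acquire);
    return c ? c->timeout_ms : -1;
}

/* Update side: copy, modify the private copy, publish with a release store.
 * ASSUMPTION: a single updater (or updaters serialized by a lock).
 * The old version is returned instead of freed because readers may still be
 * using it; real RCU frees it only after a grace period, not modeled here. */
static struct config *update_timeout(int new_timeout)
{
    struct config *old = atomic_load_explicit(&current_cfg, memory_order_relaxed);
    struct config *fresh = malloc(sizeof *fresh);
    assert(fresh != NULL);
    if (old != NULL)
        *fresh = *old;
    else
        fresh->max_conns = 0;
    fresh->timeout_ms = new_timeout;
    atomic_store_explicit(&current_cfg, fresh, memory_order_release);
    return old;
}
```

The point of the design is that the read path is a single load with no shared writes, so it scales with CPU count; all the cost moves to updaters and deferred reclamation.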

All this stuff was pretty interesting to me, so when OpenSolaris was later released I had to take a look at what they did. It was a big, clunky-looking thing. They had big hash tables of locks, layers of them. Hashing locks is a simple trick for scalability that can work very well, but it also significantly constrains your data structures. A nice scalable, cache-friendly tree can't (easily) be used with hashed locks; hash tables are the more natural structure you're steered toward. They can be great and simple in some cases, but they are not very cache-friendly for most accesses that have locality (even cache-friendly hashes are a bit hacky, or at least come with tradeoffs, not that I remember seeing any of those there).
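A minimal sketch of the "hash of locks" (lock striping) idea, assuming a chained hash table where many buckets share a small array of mutexes. It is illustrative only, not Solaris code; the table sizes, the multiplicative hash and all names are hypothetical. Note how the lock is chosen from the key's bucket: that is what ties the locking scheme to a hash-table layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024   /* hypothetical table size */
#define NLOCKS   64     /* far fewer locks than buckets */

struct entry {
    uintptr_t key;      /* e.g. an (object, offset) pair folded into one word */
    int value;
    struct entry *next;
};

static struct entry *buckets[NBUCKETS];
static pthread_mutex_t locks[NLOCKS];

static void table_init(void)
{
    for (int i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&locks[i], NULL);
}

static size_t bucket_of(uintptr_t key)
{
    return (size_t)((key * 2654435761u) % NBUCKETS);
}

/* Each bucket maps to exactly one lock; many buckets share a lock. */
static pthread_mutex_t *lock_of(size_t b)
{
    return &locks[b % NLOCKS];
}

static void table_insert(uintptr_t key, int value)
{
    size_t b = bucket_of(key);
    struct entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->key = key;
    e->value = value;
    pthread_mutex_lock(lock_of(b));
    e->next = buckets[b];
    buckets[b] = e;
    pthread_mutex_unlock(lock_of(b));
}

/* Returns 1 and stores the value in *out if found, else 0. */
static int table_lookup(uintptr_t key, int *out)
{
    size_t b = bucket_of(key);
    int found = 0;
    pthread_mutex_lock(lock_of(b));
    for (struct entry *e = buckets[b]; e != NULL; e = e->next) {
        if (e->key == key) {
            *out = e->value;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(lock_of(b));
    return found;
}
```

Contention is spread over `NLOCKS` mutexes without a lock per entry, which is the appeal. The cost is the constraint described above: an operation on a balanced tree, such as a rebalance, touches nodes whose keys hash to different locks, so this scheme doesn't carry over to trees without extra machinery.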

I went back and looked at it again: a page cache page lookup in Solaris, for example, involves layers of locks, recursive locks, etc.

* inode->i_contents rwlock(R) (may be recursed)

* page pse mutex (protects page selock)

* page selock (shared/exclusive, custom lock)

* page hash lock (hash of locks protecting page cache hash table)

Four locks that all nest, recurse, etc., and they tend to be complex, boutique lock implementations: the pse mutex and hash locks are taken and released by special locking functions (usr/src/uts/common/vm/page_lock.c), hundreds of lines of special lock behavior.

It looks like the hash lock can be avoided in the fast path, but I list it because its nature means the page cache structure is a global hash table, which doesn't scale in terms of memory usage, has poor cache locality, and has problematic collision QoS issues (possibly even DoS-type attacks between privilege domains). It's just a poor, clunky data structure for the job.
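To make the "pse mutex protects the selock" relationship from the list above concrete, here is a heavily simplified sketch of a custom shared/exclusive lock whose state word lives in the page but is guarded by a mutex chosen by hashing the page's address. It is not the page_lock.c code: blocking, waiters and any "exclusive wanted" handling are left out, and every name here is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NPSE 64  /* hypothetical size of the hashed mutex array */

/* Hypothetical, stripped-down page: only the S/E lock word. */
struct page {
    int selock;  /* >0: number of shared holders, -1: exclusive, 0: free */
};

static pthread_mutex_t pse_mutex[NPSE];

static void pse_init(void)
{
    for (int i = 0; i < NPSE; i++)
        pthread_mutex_init(&pse_mutex[i], NULL);
}

/* Hash the page address to pick the mutex that guards its selock, so pages
 * share a small pool of mutexes instead of each embedding one. */
static pthread_mutex_t *pse_of(const struct page *pp)
{
    return &pse_mutex[((uintptr_t)pp / sizeof *pp) % NPSE];
}

static int page_trylock_shared(struct page *pp)
{
    pthread_mutex_t *m = pse_of(pp);
    pthread_mutex_lock(m);
    int ok = pp->selock >= 0;   /* no exclusive holder */
    if (ok)
        pp->selock++;
    pthread_mutex_unlock(m);
    return ok;
}

static int page_trylock_excl(struct page *pp)
{
    pthread_mutex_t *m = pse_of(pp);
    pthread_mutex_lock(m);
    int ok = pp->selock == 0;   /* nobody holds it at all */
    if (ok)
        pp->selock = -1;
    pthread_mutex_unlock(m);
    return ok;
}

static void page_unlock(struct page *pp)
{
    pthread_mutex_t *m = pse_of(pp);
    pthread_mutex_lock(m);
    assert(pp->selock != 0);    /* must be held to be released */
    if (pp->selock == -1)
        pp->selock = 0;         /* drop the exclusive hold */
    else
        pp->selock--;           /* drop one shared hold */
    pthread_mutex_unlock(m);
}
```

Even in this toy form, every shared acquire and release of the page lock takes and drops a second (hashed) mutex, which is the layering the comment is describing.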

The equivalent path in Linux has no locks: lock-free lookup of a tree attached to the inode to find the page and increment its refcount.
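The key primitive in such a lock-free lookup is a "get unless zero" reference increment, along the lines of Linux's `get_page_unless_zero()`: find the page without locks, take a reference only if the page is still live, then re-check that the slot still points to it. Below is a minimal C11 sketch under stated assumptions: a single atomic slot stands in for the per-inode tree (which Linux walks under `rcu_read_lock()`), and the `struct page` memory is assumed to stay valid after removal (in Linux the page descriptors themselves are not freed). All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor: only the reference count matters here. */
struct page {
    atomic_int refcount;  /* 0 means the page is on its way to being freed */
};

/* Take a reference only if the page is still live (count > 0). */
static bool page_get_unless_zero(struct page *pp)
{
    int old = atomic_load_explicit(&pp->refcount, memory_order_relaxed);
    while (old != 0) {
        /* On failure the CAS reloads `old`, so just loop and re-check. */
        if (atomic_compare_exchange_weak_explicit(&pp->refcount, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

static void page_put(struct page *pp)
{
    atomic_fetch_sub_explicit(&pp->refcount, 1, memory_order_release);
}

/* Lock-free lookup of one cache slot; no mutex is ever taken.
 * ASSUMPTION: the struct page memory stays valid even after the page
 * leaves the cache, so reading its refcount is always safe. */
static struct page *lookup_get(_Atomic(struct page *) *slot)
{
    for (;;) {
        struct page *pp = atomic_load_explicit(slot, memory_order_acquire);
        if (pp == NULL)
            return NULL;              /* not cached */
        if (!page_get_unless_zero(pp))
            continue;                 /* being freed: re-read the slot */
        if (atomic_load_explicit(slot, memory_order_acquire) == pp)
            return pp;                /* still the same page: done */
        page_put(pp);                 /* slot changed under us: retry */
    }
}
```

The fast path is two loads and one compare-and-swap on the page's own cache line, with no shared lock word that every CPU has to bounce between caches.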

It really is quite amazing how far these scalable programming techniques have come since the turn of the century, comparing Linux and Solaris (today Linux is much more scalable than Solaris ever was). I can absolutely see why people were so worried that these things would end in a complexity death spiral.

throw0101a · on Sept 23, 2021

> All this stuff was pretty interesting to me so when OpenSolaris was released later, I had to take a look and see what they did. And it was a big clunky looking thing.

SunOS was running on 20-processor (what we'd now call 20-core) systems as early as 1993:

* https://en.wikipedia.org/wiki/Sun4d#SPARCcenter_2000

* https://ieeexplore.ieee.org/document/289692

* https://www.manualslib.com/manual/1731798/Sun-Microsystems-S...

So I'm curious whether there was any refactoring at some point, or whether this was 'just' the best that could be done when they first wrote it, and they then left it or were stuck with it. Certainly Linux had the advantage of more years of research by the time it finally got around to solving the same problem.

throwawaylinux · on Sept 23, 2021

I'm not sure; I don't have any experience with proprietary OS development.

I can guess. The technology was a significant factor (RCU, as mentioned). Possibly more important was that Linux didn't have such a pressing need to scale up, whereas Sun was a hardware company that made these big systems, so they had to get it working at all costs. Linux's main market was single-CPU systems for a long time, and there was strong pushback against complexity or slowdowns caused by scalability improvements, so the bar for them was very high.

bcantrill · on Sept 23, 2021

Your comment reflects a couple of fundamental misunderstandings about both SMP architectures and the locking primitives in the kernel; if you would like technical details that guided some of the implementation decisions in illumos, see the paper that I wrote with Jeff Bonwick in 2008.[0] There are several pieces of advice in there that particularly apply to your comment, first among them: "Intuition is frequently wrong -- be data intensive."

[0] https://queue.acm.org/detail.cfm?id=1454462

throwawaylinux · on Sept 23, 2021

> Your comment reflects a couple of fundamental misunderstandings about both SMP architectures and the locking primitives in the kernel;

Oh? What are my misunderstandings?

pjmlp · on Sept 23, 2021

That said, AIX has quite a few capabilities that aren't very UNIX-like and are more like VMS/Windows.

The way shared libraries work, for example.

trasz · on Sept 23, 2021

Can you say a bit more about the libraries?

pjmlp · on Sept 23, 2021

Easy: it works much like Windows does nowadays. AIX uses COFF rather than ELF, symbols are private by default, and you provide files listing which symbols are public.

https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=libr...

CreateExportList is somewhat similar to creating an import library on Windows:

https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=libr...

Here are all the details: https://download.boulder.ibm.com/ibmdl/pub/software/dw/aix/e...
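For comparison, on ELF platforms the closest everyday analogue to AIX's private-by-default symbols is building with GCC/Clang's `-fvisibility=hidden` and marking the public interface explicitly with `__attribute__((visibility("default")))`; GNU ld version scripts (`-Wl,--version-script=FILE`) are closer still to an explicit export list. A minimal sketch, with the `mylib_*` names and `MYLIB_API` macro as hypothetical examples:

```c
#include <assert.h>

/* Build as a shared object with, e.g.:
 *   cc -shared -fPIC -fvisibility=hidden mylib.c -o libmylib.so
 * Anything not marked MYLIB_API then stays out of the dynamic symbol table. */
#define MYLIB_API __attribute__((visibility("default")))

/* Internal helper: non-static, so other files in the library can call it,
 * but hidden from users of the .so under -fvisibility=hidden. */
int mylib_internal_scale(int x)
{
    return x * 10;
}

/* Public entry point: exported even under -fvisibility=hidden. */
MYLIB_API int mylib_compute(int x)
{
    return mylib_internal_scale(x) + 1;
}
```

Running `nm -D libmylib.so` on the result should list `mylib_compute` but not `mylib_internal_scale`. The difference from AIX is that the ELF default is the opposite: everything is exported unless you opt in to hiding.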

rjsw · on Sept 23, 2021

AT&T SysV UNIX used COFF and had a similar process for creating shared libraries, so I don't think you can claim that it is not UNIX-style.

pjmlp · on Sept 23, 2021

That may be, but by the time I came to AIX 5, all the other UNIXes used ELF, so I never used that classic version.

In fact, I never created shared libraries on either Xenix or DG/UX, and my first Linux was kernel 1.0.9, when ELF was introduced.

So thanks for the correction.

inkyoto · on Sept 23, 2021

AIX introduced an extended COFF object and executable file format, XCOFF, that eliminated the need to have both static and shared library files: the .a files were «both» at once, and a linker switch had to be used to select between a statically or dynamically linked executable. This caused a lot of confusion for some at the time, leading them to believe that AIX had no shared libraries.

userbinator · on Sept 23, 2021

...and even more disappointing because a quick search of the Internets found both AIX and HPUX source. But then again, given IEEE members' pro-DRM, pro-copyright stance, maybe they didn't want to.

marbu · on Sept 23, 2021

Even if you found a single snapshot of allegedly AIX or HPUX source code of unknown origin, you would have a problem referencing it in a way that makes your claims both trustworthy and verifiable. Moreover, it would still be nowhere near the level of detail we have for systems with public source code, where we have snapshots of most versions and, later on, individual commits ...

hulitu · on Sept 23, 2021

If you know too much we will have to ... you. It's sad that in Western civilization, copyright and DRM are used to limit access to information.

throwawaylinux · on Sept 23, 2021

An extremely interesting trend: cyclomatic complexity increases and then drops in what looks like all the systems and programs analyzed.

Interesting that the Linux kernel is unusually low, at about 4.5, while all the others (except GNU coreutils) are around 6.
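For reference, the standard McCabe definition of cyclomatic complexity for a control-flow graph is M = E − N + 2P (edges, nodes, connected components; P = 1 for one function), which for structured code equals the number of decision points plus one. The thread doesn't say exactly how the paper aggregated its figures, so this only illustrates the metric itself on a hypothetical function, with its graph counted by hand.

```c
#include <assert.h>

/* McCabe's cyclomatic complexity: M = E - N + 2P. */
static int cyclomatic(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}

/* Hypothetical example with two decision points (loop test, if test).
 * Control-flow graph, counted by hand:
 *   nodes: entry, loop test, if test, then, else, increment, return -> N = 7
 *   edges: entry->loop, loop->if, loop->return, if->then, if->else,
 *          then->incr, else->incr, incr->loop                       -> E = 8
 * so M = 8 - 7 + 2*1 = 3, matching 2 decisions + 1. */
static int clamp_sum(const int *v, int n, int cap)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {   /* decision 1: loop condition */
        if (v[i] > cap)             /* decision 2: branch */
            sum += cap;
        else
            sum += v[i];
    }
    return sum;
}
```

On this scale, an average around 4.5 versus around 6 means the typical function has one or two fewer branch points.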

cat199 · on Sept 23, 2021

"In this paper we study the evolution of Unix along the FreeBSD lineage"

For those who don't realize that BSD is Unix.

gandalfgeek · on Sept 23, 2021

Short explainer video for those who don’t want to read the whole thing:

https://youtu.be/eCYkzciF28E

dmolony · on Sept 23, 2021

[PDF]