
Decoded: GNU Coreutils - bshanks
https://www.maizure.org/projects/decoded-gnu-coreutils/index.html
======
AstroJetson
Interesting project, this would make a good teaching tool for people
interested in more operating system interaction.

On the website it would be nice if you had a way to flag which ones were fully
annotated. But I like the main page design. The block graphics on the flow is
very helpful.

You've done some very nice work here.

~~~
SamuelAdams
Agreed, this would be very beneficial to students and people learning C.

~~~
saagarjha
I have used the source code for Plan 9's core utilities as a teaching aid for
students learning the basics of C and operating systems to great effect. I
have found GNU's tools to contain a lot of "cruft" to make them run fast,
which is not necessarily conducive to learning for those who are not familiar
with why they work.

~~~
pantalaimon
Busybox and Toybox also contain pretty simple and easy to follow
implementations.

~~~
AstroJetson
Busybox is a fun read on how they got all that functionality into such a small
space. It always amazes me that my home router has all those functions because
of Busybox being loaded.

OTOH, I shouldn't because I think my current router has more CPU / Memory than
my first mainframe.

------
AceJohnny2
Huh, I'd never heard of FTS before, the functions used by Coreutils to
traverse the filesystem:

[http://man7.org/linux/man-pages/man3/fts.3.html](http://man7.org/linux/man-pages/man3/fts.3.html)

~~~
woodruffw
FWIW, the POSIX equivalents are `ftw` and `nftw`[1]. POSIX 2008 deprecates
`ftw`.

[1]:
[https://pubs.opengroup.org/onlinepubs/9699919799//functions/...](https://pubs.opengroup.org/onlinepubs/9699919799//functions/nftw.html)

~~~
wahern
ftw/nftw is a crap interface as it relies on C callbacks and isn't
re-entrant.[1] Android/Bionic, Darwin/macOS, DragonflyBSD, FreeBSD, glibc, musl
libc, OpenBSD, NetBSD, and even Solaris all support the FTS API.[2]

Of the extant Unix environments only AIX and QNX seem to lack it. HP-UX
doesn't seem to support it, either, but I don't count HP-UX as extant. In any
event it's trivial to copy an FTS implementation from a BSD.

I'm a strong advocate for adherence to POSIX, but in this case there's
significant benefit from using FTS and very little, if any, cost.

[1] FTS uses a C callback for its comparator, which can be a headache but not
nearly to the same extent as it is with nftw.

[2] It's originally from BSD Reno. It was adopted by glibc, which in turn has
forced commercial vendors like Solaris to support it as they're chasing
Linux/glibc API compatibility. I would expect AIX to add it eventually.

~~~
woodruffw
Ah, linux-man bites again: the Linux versions of `nftw` and `ftw` claim to be
re-entrant, so I assumed that POSIX specified them as such. Looks like I was
wrong about that.

~~~
Anthony-G
Red Hat-based distributions include POSIX man pages out of the box.

    $ whatis ftw
    ftw (3)              - file tree walk
    ftw (3p)             - traverse (walk) a file tree

Run `man 3p ftw` to read the POSIX version.

On Ubuntu systems, the following packages can be installed to provide the
POSIX man pages:

    manpages-posix
    manpages-posix-dev

------
Tepix
Super interesting. I had a brief look at "head" and "df" and the flows seem to
be complete.

The flow for "tr" is not completely right given that the second parameter
("string2") is optional with certain options.

The text for "yes" mentions how it fills a buffer with multiple copies of the
string to achieve its very high performance. Nice!

~~~
yodon
Reminds me of a time at a large animation studio where I was working on
performance issues with the in-house asset management system.

The asset management tool was fully commandline based, no GUI. An artist
showed me what was involved in committing their scene: launch the command,
type yes, hit return, wait about 30 seconds, type yes, hit return, repeat. I
piped yes to stdin and it ran for 47 hours.

If you need a performance-optimized version of "yes" there are probably other
aspects of your architecture you should look at first.

~~~
masklinn
yes(1) takes a custom "expletive", so in the performance context it's usually
used for flood filling in situations where e.g. /dev/zero or /dev/urandom are
not appropriate sources for some reason.

~~~
saagarjha
yes outputs newlines, though, which makes it somewhat annoying to use for
flood filling in practice.

------
ben165
Nice project. Just looked for the source code of "yes"
([https://github.com/coreutils/coreutils/blob/master/src/yes.c](https://github.com/coreutils/coreutils/blob/master/src/yes.c)),
because it seems to be the easiest code to understand.

I didn't get it. Damn. Still looks complicated.

~~~
yjftsjthsd-h
Yeah, `yes` ended up complex in order to get better performance; if you want
something simpler, maybe try
[https://www.maizure.org/projects/decoded-gnu-coreutils/env.h...](https://www.maizure.org/projects/decoded-gnu-coreutils/env.html)?

------
OJFord
Oh, this is great! And it's one piece of something I've been looking for (so
hopefully a good comment section to ask for more): a book on how (GNU/)Linux
'works'.

i.e. I'm not interested in a book of commands or cheat sheets, I can use `man`
and SO, Arch wiki, etc. for that when needed.

I think part of the problem is that I don't know what I don't know - but I
discovered namespaces (`/proc/<pid>/ns/<type>`) recently and don't know much
about it but thought it was interesting, and managed to use it to do what I
needed to overcome a problem I was having with `ip netns`. A similar 'aha' was
with inodes a while ago.

So, any recommendations for a book on 'how Linux works'? I think it's a gap in
my understanding (academically EE/CS - hardware, up to OS but not Linux-
specific, theoretical CS; professionally software, 'using' Linux). Best
candidate I've found is Brian Ward's 'How Linux Works: what every superuser
should know'.

------
reuben_scratton
I'm not particularly interested in coreutils per se but this is a super-
interesting way to look at any piece of software. I just _love_ the block
diagrams... why can't source code look like this?

Great work!

~~~
stormbrew
There are languages that model code like this. Using one for a while will
likely teach you why. If you think "spaghetti" is an apt description for text
code, wait until you see flow based code that's evolved in place for a while.

~~~
reuben_scratton
Presumably all the detail is shown at once? I'm picturing something more like
Google Maps navigation where you zoom into the source diagram and the code
appears.

Perhaps I'm imagining an alternative way to manage source trees, or modules,
rather than the code in individual source files. A hierarchy like a
traditional file system but with extra kinds of relationships. Perhaps it'd
have to be curated like reference documentation, perhaps it could be
automatically generated from source directives. I dunno, just idle thoughts...

------
hermitdev
This is probably going to be an unpopular opinion, but I don't agree with the
criticism of the use of goto.

I know, I know, the classic "goto Considered Harmful". And, yes, it can be
abused just like every other programming construct.

I see 2 generally good use cases for goto: 1) error handling to clean up on
failure in C (in C++, RAII constructs should be used instead) 2) exiting
nested loops, instead of a flag and a break.

I've also seen C++ code use do-while loops with while(false) so that they
could use "break" to exit the non-loop. It's very confusing and non-obvious,
because you see the do, and thus expect a loop. Meanwhile, the absurdly long
function has the while(false) on a separate screen, not making it clear that
the code doesn't actually loop. Just use the goto. It's clearer and avoids the
abuse of a looping primitive (that doesn't actually loop).

Yes, goto can be abused, so can many other statements.

Sorry, /rant

------
nn3
that's fairly cool. Wish such sites were available for more open source
packages.

Always wanted to know the algorithms used in sort.

So ls has more source lines than sort? Funny, but a bit depressing too.

~~~
Lt_Riza_Hawkeye
ls is kind of complicated, has a billion flags, and has to print pretty so it
has to do some ioctls to get the terminal width and such, then column things
out nicely so they don't overflow. Glad I'm not implementing ls from scratch.

~~~
michaelmior
exa[0] dubs itself a "modern" ls written in Rust. Interesting to compare.
Still over 6,000 lines not including comments.

[0] [https://github.com/ogham/exa](https://github.com/ogham/exa)

------
ketanhwr
This guy has some really nice articles, but I'm unable to find any link
to his RSS feed. Any help?

------
myroon5
I've been trying to contribute additional flags for some coreutils commands,
but I've been having trouble building on macOS. Should I use a certain VM or
container instead? Does anyone have a link to instructions for building on a
mac?

~~~
mattl
Do you have a modern GCC installed?

~~~
myroon5

      $ gcc -v
      Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
      Apple LLVM version 10.0.1 (clang-1001.0.46.4)
      Target: x86_64-apple-darwin18.6.0
      Thread model: posix
      InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

~~~
efficax
I would not be at all surprised to learn that GNU coreutils does not build with
clang/llvm. Try installing a real GNU C compiler with homebrew (brew install
gcc).

------
bshanks
If you like this, the author posts a number of other 'Decoded' episodes:
[https://www.maizure.org/projects/](https://www.maizure.org/projects/)

------
NikkiA
I never noticed `runcon` before; it and `chcon` seem decidedly out of place
in coreutils, given that they're a) Linux-specific and b) SELinux-specific.

------
p0cc
This is amazing. The author needs a donate button!

------
tinktank
Are you doing this by manual inspection or in some semi-automated or automated
way? In any case, very nice work.

------
enriquto
this is very cool. Yet GNU utils have so many options that the interesting
bits are difficult to find. The same display but for openbsd utils would be
still more enlightening.

------
ausjke
The diagram is really nice, is it done with Dia?

I use Dia once in a while, and wish Freeplane could draw arrows between its
nodes, as Freeplane makes it easier to add child nodes.

------
xerxex
This is fantastic work!

------
theqoo
It is true archeology.

------
cat199
Or, just browse the originals:

[https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src](https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src)
[https://www.tuhs.org/cgi-bin/utree.pl?file=4.3BSD/usr/src](https://www.tuhs.org/cgi-bin/utree.pl?file=4.3BSD/usr/src)

and some of their current descendants:

[https://svnweb.freebsd.org/](https://svnweb.freebsd.org/)
[http://cvsweb.openbsd.org/cgi-bin/cvsweb/](http://cvsweb.openbsd.org/cgi-bin/cvsweb/)
[http://cvsweb.netbsd.org/bsdweb.cgi/src/?only_with_tag=MAIN](http://cvsweb.netbsd.org/bsdweb.cgi/src/?only_with_tag=MAIN)
[https://gitweb.dragonflybsd.org/dragonfly.git](https://gitweb.dragonflybsd.org/dragonfly.git)

conveniently all in one source tree, self hosting, and buildable with a single
command.

it is almost as if they were designed as part of a single coherent operating
system along with the very C language itself..

as discussed previously, typically with more intelligible sources:

[https://news.ycombinator.com/item?id=14542938](https://news.ycombinator.com/item?id=14542938)

and usually with better man pages
([https://www.freebsd.org/cgi/man.cgi](https://www.freebsd.org/cgi/man.cgi))

unix came first. gnu and posix came later.

the unix way is the standard, not posix, and not gnu.

from stallman himself
([https://stallman.org/articles/posix.html](https://stallman.org/articles/posix.html)):

" It seemed to me that nobody would ever say "IEEEIX", since the pronunciation
would sound like a shriek of terror; rather, everyone would call it "Unix".
That would have boosted AT&T, the GNU Project's rival, an outcome I did not
want. So I looked for another name, but nothing natural suggested itself to
me.

So I put the initials of "Portable Operating System" together with the same
suffix "ix", and came up with "POSIX". It sounded good and I saw no reason not
to use it, so I suggested it. Although it was just barely in time, the
committee adopted it. "

so, as can be seen, posix is RMS's way of keeping people from calling unix
unix, and as a result today, now that AT&T Unix is nearly dead, people lose
track of the fact that BSD _is_ Unix, and don't know what Unix actually is,
and are forced to 'decode' the sometimes deliberately less clear cloned
sources not knowing to look at the simpler original.

~~~
AceJohnny2
I think you missed the point of the website. It's not about the source code
itself, but providing a higher-level overview of the utilities' architecture.

~~~
cat199
Did I?

"This resource is for novice programmers exploring the design of command-line
utilities. It is best used as an accompaniment providing useful background
while reading the source code of the utility you may be interested in. "

^ "best used as an accompaniment"

I posit that GNU's intentionally obfuscated code design[1]^ requires the use of
such a site, whereas reading the originals and their descendants provides a
similar level of understanding of program functionality _in addition_ to the
actual historical context in which most of the tools were developed. _Also,_
that code typically lives in the same source tree as the kernel, C library, and
build toolchain implementing the OS side of the equation, so it is a better
and more productive thing to do with one's time that will still apply to
reading and understanding coreutils sources.

apologies if unclear. but kudos to the author for trying to do something
positive to spread knowledge of system internals, lest I be misconstrued.

.. [1] [https://www.gnu.org/prep/standards/standards.html#Reading-No...](https://www.gnu.org/prep/standards/standards.html#Reading-Non_002dFree-Code)

.. ^: the design of a utility like 'cat'+ generally has an obvious direct
implementation. coming up with something else to be different necessitates
doing goofy stuff. see also other hn thread referenced.

.. +: see also:
[https://github.com/coreutils/coreutils/blob/master/src/cat.c](https://github.com/coreutils/coreutils/blob/master/src/cat.c)
vs [https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/ca...](https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/cat.c)

~~~
serhart
I'm pretty curious how you conclude that GNU is "intentionally obfuscating
code" from [1]. I read [1] as good advice to avoid inadvertently getting code
into GNU that could be claimed by copyright. Focus on speed instead of memory;
simplicity instead of speed. I don't see "make it different for difference's
sake."

I also conclude the opposite from the cat implementations. I really don't see
how the BSD cat is a more "pure" or straightforward implementation. I find
the GNU cat to have way more options and easier to read source code. My guess
is that novice programmers would find the GNU version easier to grok, which
also probably aligns more to the goal of GNU. Or I just like to read
obfuscated code.

~~~
ggm
_I really don't see how the BSD cat is a more "pure" or straightforward
implementation. I find the GNU cat to have way more options_

Way more options? Is "cat -v considered harmful" not thought any more?

~~~
serhart
If you aren't trying to be Unix I would imagine you aren't thinking that. GNU
probably doesn't subscribe to that notion.

------
ape4
Rather than a flow chart, etc., it would be better to convert the utilities to
simple C with no error handling or optimization. So `cat` would be about 5
lines. No mmap() or fancy stuff.

~~~
zyztem
You might find the source of Xv6 - an educational version of Unix - useful. Here is
their implementation of cat, all 44 lines of it:
[https://github.com/mit-pdos/xv6-public/blob/master/cat.c](https://github.com/mit-pdos/xv6-public/blob/master/cat.c)

