
The Source History of Cat - janvdberg
https://twobithistory.org/2018/11/12/cat.html
======
LukeShu
I was surprised to read that some versions of cat (apparently BSD Net/2 and
derivatives) have special code for _sockets_. What does cat do with sockets!?

Well, AF_UNIX sockets are sockets with paths in the filesystem. You must
either connect() to them or bind() to them instead of open()ing them.
Apparently, BSD-derived versions of cat will try to connect() to a file if
open() fails.

With GNU cat, if you try to cat a socket, it will go like this:

    
    
        $ ls -l test.sock
        srwxr-xr-x 1 luke users 0 Nov 12 21:07 test.sock
        $ cat test.sock 
        cat: test.sock: No such device or address
    

but BSD-derived cats will successfully open the socket for reading. That
behavior can be accomplished on other systems by using socat instead; BSD cat
behaves somewhat like:

    
    
        $ socat UNIX:test.sock STDOUT
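
For readers who want the fallback spelled out, here is a minimal Python sketch of the "open(), else connect()" behavior described above. It is not the BSD source: the helper name `open_like_bsd_cat` is made up for illustration, and it assumes a stream-type AF_UNIX socket.

```python
import os
import socket
import stat


def open_like_bsd_cat(path):
    """Return a binary reader for path, connecting if it is an AF_UNIX socket.

    Hypothetical helper that imitates the described behavior only.
    """
    try:
        return open(path, "rb")
    except OSError:
        # open() on a socket file fails (ENXIO on Linux); only fall back
        # when the path really is a socket, otherwise re-raise the error.
        try:
            is_socket = stat.S_ISSOCK(os.stat(path).st_mode)
        except OSError:
            is_socket = False
        if not is_socket:
            raise
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
        except OSError:
            sock.close()
            raise
        # The file object keeps the connection alive after sock.close().
        reader = sock.makefile("rb")
        sock.close()
        return reader
```

Reading from the returned object until EOF and writing the bytes to stdout would then give roughly what the socat command does.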

~~~
loeg
Hah, I learned something about cat today. Thanks.

Amusingly, the BSD socket behavior can be disabled with the compiler macro
-DNO_UDOM_SUPPORT, but as far as I can tell it is not documented nor hooked
into the rest of the build system in any way since its introduction in 2001:

[https://svnweb.freebsd.org/base?view=revision&revision=83482](https://svnweb.freebsd.org/base?view=revision&revision=83482)

------
enneff
The plan9 cat is nice:
[https://github.com/pete/cats/blob/master/plan9-cat.c](https://github.com/pete/cats/blob/master/plan9-cat.c)

~~~
christophilus
Agreed. But the lack of `{}` raises my blood pressure a few points...

------
zeveb
> But, if you pull up the manual page for something like grep, you will see
> that it has not been updated since 2010 (at least on MacOS).

Well, GNU grep was last released 16 months ago, and the last change to its
master branch was 4 weeks ago:
[http://git.savannah.gnu.org/cgit/grep.git](http://git.savannah.gnu.org/cgit/grep.git)

FreeBSD's grep was last updated back in August:
[https://github.com/freebsd/freebsd/tree/master/usr.bin/grep](https://github.com/freebsd/freebsd/tree/master/usr.bin/grep)

OpenBSD's grep was last updated 11 months ago:
[http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/](http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/)

Oddly, it looks like the Darwin grep was last updated in 2012:
[https://opensource.apple.com/source/text_cmds/text_cmds-99/g...](https://opensource.apple.com/source/text_cmds/text_cmds-99/grep/grep.c.auto.html)

Strange that Apple would be shipping such an ancient grep.

~~~
setr
IIRC, Apple stopped updating its GNU utilities once GPLv3 was attached to
them, but continued shipping the older versions.

~~~
LukeShu
I don't believe that macOS grep was ever GNU grep. I believe that macOS always
used a BSD variant of grep.

~~~
yesenadam
Using OS X 10.4.11 here, the grep file is dated Jan 2006, and the end of the
grep man page says "2002/01/22".

    
    
      $ uname -v
      Darwin Kernel Version 8.11.1: Wed Oct 10 18:23:28 PDT 2007; root:xnu-792.25.20~1/RELEASE_I386
      $ grep --version
      grep (GNU grep) 2.5.1
    

Other man pages: ed says 1993, sed says BSD 2004, cat says 3rd Berkeley
Distribution 1995.

~~~
LukeShu
Interesting. What does `type grep` say? Is it possible that it's
/usr/local/bin/grep from homebrew/macports/…, and that /usr/bin/grep is BSD
grep?

I found a comment claiming that prior to 10.8 (2012, Mountain Lion) it used
GNU grep, but nothing I'd feel comfortable citing.

~~~
yesenadam

      $ type grep
      grep is hashed (/usr/bin/grep)
    

It does seem to be the original grep for this machine (it's a Mac Mini) - it
has the same Jan 2006 date as most of the files in /usr/bin, and nothing has
an earlier date. There's no other file called grep elsewhere.

------
Isamu
Really nice history! I want to applaud the author on this loving treatment.

Also I want to point readers to the commentary of some of the Unix authors:

“Old programs have become encrusted with dubious features. Newer programs are
not always written with attention to proper separation of function and design
for interconnection.”

[http://harmful.cat-v.org/cat-v/unix_prog_design.pdf](http://harmful.cat-v.org/cat-v/unix_prog_design.pdf)

My point being: Unix (and derivatives) encompass a set of people who disagree
about what constitutes Unix philosophy.

~~~
loeg
> My point being: Unix (and derivatives) encompass a set of people who
> disagree about what constitutes Unix philosophy.

That's certainly a Unix truism! It seems everyone has their own subjective
beliefs about what Unix should be and decides their own beliefs constitute
"the" Unix philosophy.

------
akkartik
Interesting to think what a different conclusion the article would have
arrived at if he'd chosen to look at GNU _cat_ on Linux. A few sample points:

* 2002: 833 LoC ([http://landley.net/aboriginal/history.html](http://landley.net/aboriginal/history.html))

* 2013: 36kLoC, 2/3rds of them .h files ([https://news.ycombinator.com/item?id=11340510#11341175](https://news.ycombinator.com/item?id=11340510#11341175))

* 2018: 37kLoC of .c file dependencies going into libcoreutils.a and some LoC of .h files (coreutils has 60kLoC of .h files)

The methodology for counting lines likely isn't consistent across those data
points. But the trend is still unmistakeable. Maybe I'll tree-shake all the
dead code out and come up with an accurate line count one of these days..

~~~
akkartik
I just performed an _ad hoc_ file-level tree-shaking for 'src/cat.c' in GNU
coreutils 8.30, starting with `gcc src/cat.c` and gradually adding arguments
until I got it to build. Here's the command I ended up with.

    
    
        gcc -I. -I./lib \
          src/version.c \
          lib/progname.c \
          lib/safe-read.c \
          lib/safe-write.c \
          lib/quotearg.c \
          lib/xmalloc.c \
          lib/localcharset.c \
          lib/c-strcasecmp.c \
          lib/mbrtowc.c \
          lib/xalloc-die.c \
          lib/c-ctype.c \
          lib/hard-locale.c \
          lib/exitfail.c \
          lib/closeout.c \
          lib/close-stream.c \
          lib/fclose.c \
          lib/fflush.c \
          lib/fseeko.c \
          lib/version-etc.c \
          lib/xbinary-io.c \
          lib/version-etc-fsf.c \
          lib/binary-io.c \
          lib/fadvise.c \
          lib/full-write.c \
          src/cat.c
    

Those .c files add up to 5021 lines.
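
A tally like this can be reproduced with a few lines of Python. This is only a sketch of one way to count; the function names are made up here, and it counts newline bytes the way `wc -l` does, which may differ from whatever tool produced the figure above.

```python
def count_newlines(path):
    """Count newline bytes in one file, which is what `wc -l` reports."""
    total = 0
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded whole.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            total += chunk.count(b"\n")
    return total


def total_lines(paths):
    """Sum the newline counts of every file in paths."""
    return sum(count_newlines(p) for p in paths)
```

Calling `total_lines` with the list of .c files from the gcc command gives the combined count.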

The .c files include 44 header files:

    
    
        lib/binary-io.h
        lib/c-ctype.h
        lib/closeout.h
        lib/close-stream.h
        lib/config.h
        lib/c-strcaseeq.h
        lib/c-strcase.h
        lib/ctype.h
        lib/error.h
        lib/exitfail.h
        lib/fadvise.h
        lib/fcntl.h
        lib/fpending.h
        lib/freading.h
        lib/full-write.h
        lib/gettext.h
        lib/hard-locale.h
        lib/ignore-value.h
        lib/limits.h
        lib/localcharset.h
        lib/locale.h
        lib/minmax.h
        lib/progname.h
        lib/quotearg.h
        lib/quote.h
        lib/safe-read.h
        lib/stdio.h
        lib/stdio-impl.h
        lib/stdlib.h
        lib/string.h
        lib/sys/ioctl.h
        lib/sys-limits.h
        lib/sys/types.h
        lib/unistd.h
        lib/unused-parameter.h
        lib/verify.h
        lib/version-etc.h
        lib/wchar.h
        lib/wctype.h
        lib/xalloc.h
        lib/xbinary-io.h
        src/die.h
        src/ioblksize.h
        src/system.h
    

The header files add up to 19.7k lines.

So the total line count for files GNU cat _actually_ needs to build is _at
least_ ~25k.

(I didn't bother checking for headers including other headers.)
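
Checking for headers that include other headers could be sketched roughly as follows. The names `include_closure` and `INCLUDE_RE` are invented for this example, and the approach is deliberately crude: it ignores `#if` conditionals and `#include_next`, and it looks for every header first next to the including file and then in the given directories, so it only approximates what the compiler would actually pull in.

```python
import re
from pathlib import Path

# Matches `#include "x.h"` and `#include <x.h>`, but not `#include_next`.
INCLUDE_RE = re.compile(
    rb'^[ \t]*#[ \t]*include[ \t]*[<"]([^">\n]+)[">]', re.MULTILINE
)


def include_closure(sources, search_dirs):
    """Return the set of files reachable from sources via #include lines.

    Headers not found next to the including file or in search_dirs
    (for example real system headers) are silently skipped.
    """
    search_dirs = [Path(d) for d in search_dirs]
    seen = set()
    pending = [Path(s).resolve() for s in sources]
    while pending:
        path = pending.pop()
        if path in seen:
            continue
        seen.add(path)
        for raw_name in INCLUDE_RE.findall(path.read_bytes()):
            name = raw_name.decode("utf-8", "replace")
            for directory in [path.parent, *search_dirs]:
                candidate = (directory / name).resolve()
                if candidate.is_file():
                    pending.append(candidate)
                    break
    return seen
```

Running it on `src/cat.c` with `.` and `./lib` as search directories would give a file list whose line counts can then be summed.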

Next step: do this for various versions of GNU coreutils.

~~~
rurban
Much more code for much less functionality than the BSD cat which can do
sockets. Not surprised at all.

------
saagarjha
Strangely, it seems that many versions of macOS on opensource.apple.com are
missing grep. It used to be its own project until 10.7 Lion, after which it
disappeared and then reappeared under text_cmds in 10.12 Sierra.

~~~
LukeShu
Apparently, the 10.7→10.8 update is when macOS switched from GNU grep to
FreeBSD grep.

------
kazinator
> _My aunt and cousin thought of computer technology as a series of
> increasingly elaborate sand castles supplanting one another after each high
> tide clears the beach._

They are basically right though.

The counterexample of some Unix utilities means nothing. You're not getting a
CS degree in order to develop the next version of _cat_, are you?

We have some things with a long history and they are easy to identify. It is
just hindsight being 20/20.

For every one of those things, there are countless that can't be seen or felt.
They aren't here; they got washed away.

Who uses the Michigan Terminal System?

Or a web framework from ten years ago?

~~~
escape_goat
I am not sure that computer technology would have become powerful,
inexpensive, and ubiquitous to the extent that it has become today were his
aunt and his cousin correct.

The aunt and the cousin are thinking that 'computer technology' exists at the
level of abstraction of the sandcastles in the metaphor. To some extent it
does, but the vastly greater part of it is at the level of abstraction of the
knowledge and theory of building sand castles, as gained over the course of
many iterations.

One of the most common themes one hears, when reading what people write about
computer science, is how few _new_ ideas in computer science are actually
involved in nearly anything anyone does on a computer (or teaches at the
undergraduate level).

~~~
kazinator
The people implementing those ideas often _believe_ they are new, though.

------
pjmlp
If you like to have insights into how some UNIXes got built, these books are
quite interesting.

"The Design and Implementation of the 4.4 BSD Operating System"

"The Design and Implementation of the FreeBSD Operating System"

"Mac OS X Internals: A Systems Approach"

"Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture"

"HP-UX 11i Internals"

"IA-64 Linux Kernel: Design and Implementation"

~~~
JdeBP
How could you omit Bach and Comer? (-:

~~~
pjmlp
I never read it, thus cannot express my opinion about its contents.

Xenix manuals and later Stevens' books were my introduction to the UNIX world.

~~~
JdeBP
Not it, them.

* [http://jdebp.eu./FGA/operating-system-books.html](http://jdebp.eu./FGA/operating-system-books.html)

One of these days I shall get around to expressing _my_ opinions, which as you
can see are still missing. Indeed, the list itself is a decade out of date.
(-:

I have some SCO UNIX manuals on the other side of the room as I type this.

~~~
pjmlp
Very nice list.

------
rain1
I think that code bloat, especially in GNU, is a huge problem in our software
because it makes programs difficult to maintain, to understand and modify. I
feel like most people I interacted with online (present company excepted)
don't care about it and don't see it as a problem. I can get that it doesn't
affect them because they only use these projects as black boxes and don't
maintain them, so it isn't relevant to their work.

I created a wiki page to measure the number of lines of code of various types
of software:
[https://softwarecrisis.miraheze.org/wiki/Linecount](https://softwarecrisis.miraheze.org/wiki/Linecount)

LOC is a very, very rough proxy for what I actually want to measure, but the
results are so stunning that even an inaccurate indirect measurement tells a
lot. You can see that for 2 projects that do essentially the same thing there
might be a 1000x difference in LOC.

It's fascinating what can happen to such a simple program like 'cat'. The same
effect is amplified further when you look at projects like gcc. I tried asking
on a couple of sites, like Stack Exchange and Reddit, why gcc takes half an
hour to build instead of a fraction of a second, but the question was not taken
well. I got a lot of resistance: X-Y answers, deletions, etc. I
don't think that the common software engineer wants to take the idea seriously
that the day to day tools we use have a million fold inefficiency built into
them by accident. I also noticed that 'make' has no profiler, nobody has even
really done a breakdown of what takes how long to build in the gcc tree.

There are a lot of brilliant engineers who understand this problem and want to
solve it though. We see that in Alan Kay's STEPS project, aligrudi's work,
musl, toybox, maybe sbase and many of the independent bootstrapping projects
that have popped up. There's a lot of inertia and weight to the standard GNU
toolkit to push back against but I believe these problems are all solvable and
by solving them we can create programming languages and tools with leverage
far beyond what currently exists. I just hope such projects can be integrated
rather than be forgotten.

------
koyote
> [...] but it seems that many people still get most excited about the six
> months of work he put into rewriting cat [...]

Is it me or does 6 months seem like an awfully long time for re-writing such a
small and simple program?

~~~
rauhl
It was a different era, one in which computers were a lot slower, source
control was a lot more primitive, a lot of basic stuff was still being
invented, but … yeah, I feel a _lot_ better about my own productivity now!

~~~
ekun
While I agree about productivity now (although rewriting a source for cat that
is used decades later seems very productive), I think the above commenter has
it correct that it was probably a side project that he worked on and released
after 6 months and not so much the speed of the CPUs.

------
fanbelt
I worked with some version of Unix in 1984 that had a program called dog. It
would silently wait for a <CR> to be pressed after each screen of output. I've
never seen it anywhere else.

~~~
paulddraper
Was it
[http://www.linuxcertif.com/man/1/dog/](http://www.linuxcertif.com/man/1/dog/)
?

~~~
fanbelt
That looks like a different program named dog by coincidence. My dog had no
bells and whistles. Do you happen to know where the source for this is?

------
cestith
The _cat_ utility is among the simplest, but once upon a time _true_ was
about the simplest possible Unix utility.

    
    
        #!/bin/sh
    
    

Yes, that's really it. Fire up the shell, get it to exit with 0, which is
taken as success. That's all that's really necessary for its spec.

GNU's is around 29 KiB compiled, and it uses some of that to support
`--version` and `--help` flags. macOS's is around 17 KiB compiled and
ignores flags.

~~~
rain1
it used to be even simpler, a blank file

------
nuclx
That capital C in the title weirds me out.

------
rawoke083600
Cat is awesome :) There is also 'tac' (reverse of cat) installed on most
systems

~~~
jillesvangurp
I came across something called bat recently. It's a rust clone of cat with a
lot of nice features integrated. This seems to be a thing lately in the Rust
community to put out vastly improved versions of tools we haven't really
touched in ages. Loving it.

~~~
kungtotte
I'm a fan of exa as an ls-replacement :)

`exa -l --git` will list N/M git status flags in the output, and

`exa --git-ignore` will obey .gitignore when you're listing files :)

Works like a charm in my experience.

------
ccannon
I always wondered where the name cat came from which the article doesn’t
address. Any ideas?

~~~
drivers99
It's short for conCATenate.

original man page:
[http://man.cat-v.org/unix-1st/1/cat](http://man.cat-v.org/unix-1st/1/cat)

~~~
Isamu
Actually catenate, which is a real word but less used. But you are still
right.

~~~
LukeShu
In the earliest references, it was "concatenate". It wasn't until 7th edition
UNIX (1979) that "catenate" was given.

References:

* 1971 draft (pre 1st edition) of the paper that would become the well-known
1974 CACM UNIX paper (earliest documentation on `cat` that I can find):
[https://www.tuhs.org/Archive/Distributions/Research/McIlroy_...](https://www.tuhs.org/Archive/Distributions/Research/McIlroy_v0/UnixEditionZero-OCR.pdf)
(tune in on page 28)

* 6th edition cat(1) man page (1975):
[http://man.cat-v.org/unix-6th/1/cat](http://man.cat-v.org/unix-6th/1/cat)

* 7th edition cat(1) man page (1979):
[http://man.cat-v.org/unix_7th/1/cat](http://man.cat-v.org/unix_7th/1/cat)

~~~
swoopitypoop
Latin root word "catena", meaning "chain".

~~~
loeg
Latin root "con-" ("com-") meaning "with," or "together." As in, "concatenate"
means something like, "chain together."

[https://www.etymonline.com/word/com-](https://www.etymonline.com/word/com-)

[https://www.etymonline.com/word/concatenate](https://www.etymonline.com/word/concatenate)

------
mitchtbaum
I only read the beginning and end, and I very much like the closing message
here.

A tldr of the middle would be cool. Maybe there was a pattern.

I'd like to add another OS not mentioned that will hopefully become a
well-appreciated artifact soon too, from Redox OS:
[https://gitlab.redox-os.org/redox-os/coreutils/blob/master/s...](https://gitlab.redox-os.org/redox-os/coreutils/blob/master/src/bin/cat.rs)

I can't find it quickly now, but jackpot51 also has an answer somewhere on
Reddit about how their networking stack's DNS query command departs from a
commonly deployed C program for Windows and Unix, IIRC. Fascinating.

