
Unix V5, OpenBSD, Plan 9, FreeBSD, and GNU implementations of echo.c - dchest
https://gist.github.com/1091803
======
chaosfox
slightly related but, I also find it funny that most of the time we are
running none of these..

    
    
      $ type echo
      echo is a shell builtin

------
scythe
Okay, so... from UNIX v5, OpenBSD added a -n flag that prevents a trailing
newline, Plan 9 adds the -n flag and pushes the argv into a buffer (why?)
before printing, FreeBSD does all that and also prevents a trailing newline if
the last argument ends with "\c" (why?), and GNU does... something
complicated.

~~~
mikelward
"adds the argv into a buffer (why?)"

It's calling "write", which is a system call. Without the buffer, it would
call write once for argv, and once for the newline if nflag is not set.
Calling a system call twice would result in twice as many context switches,
and thus be very slightly slower.
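
To make the buffering idea concrete, here is a minimal sketch; it is not taken from any of the implementations in the gist. `join_args` is a hypothetical helper that assembles the arguments and the optional newline into one heap buffer, so the whole line can go out in a single write(). Its `nflag` parameter is assumed to mirror the "-n" flag.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: join argv[1..argc-1] with single spaces into one
 * heap buffer, appending '\n' unless nflag is set. Stores the byte count
 * in *out_len and returns NULL if allocation fails.
 */
char *join_args(int argc, char **argv, int nflag, size_t *out_len)
{
    size_t len = nflag ? 0 : 1;               /* room for the newline */
    for (int i = 1; i < argc; i++)
        len += strlen(argv[i]) + (i > 1);     /* one space between arguments */

    char *buf = malloc(len ? len : 1);        /* avoid a zero-size request */
    if (buf == NULL)
        return NULL;

    char *p = buf;
    for (int i = 1; i < argc; i++) {
        if (i > 1)
            *p++ = ' ';
        size_t n = strlen(argv[i]);
        memcpy(p, argv[i], n);                /* copy without the terminator */
        p += n;
    }
    if (!nflag)
        *p++ = '\n';

    *out_len = len;
    return buf;
}
```

A caller would then issue `write(1, buf, len)` once and free the buffer, rather than making one system call per piece.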

~~~
scythe
Oh, that makes sense.

Speaking of clever tricks, would it be faster or slower to use C99's
variable-length arrays instead of malloc?

~~~
alexgartrell
IIRC faster, because variable-length arrays in C99 just bump down the stack
pointer, while malloc can be pretty expensive.

But a single malloc is nothing compared to the cost of a context switch into
kernel mode.

~~~
apaprocki
You can simply use alloca() as a replacement for malloc() to avoid depending
on VLA support.

~~~
cpeterso
Or just declare a large static buffer.

~~~
tedunangst
"Nobody will ever echo this much data!"

~~~
tmp43522
It's actually not a bad suggestion

    
    
        #define BUF_INITIAL 1024
        char buf[BUF_INITIAL];
        
        int main(int argc, char** argv)
        {
            char* p;
            ...
            p = len > BUF_INITIAL ? malloc(len) : buf;
            ...
            write(1, p, len);
            ...
        }
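
Filled out a little, the idea might look like the sketch below. The names `static_buf`, `get_buffer` and `release_buffer` are made up for illustration and are not part of the snippet above. The one thing to watch is that the static buffer must never be passed to free().

```c
#include <stdlib.h>

#define BUF_INITIAL 1024

static char static_buf[BUF_INITIAL];

/*
 * Hypothetical helper: hand out the static buffer when len fits in it,
 * and fall back to malloc() only for larger requests (may return NULL).
 */
char *get_buffer(size_t len)
{
    return len > BUF_INITIAL ? malloc(len) : static_buf;
}

/* Release a buffer obtained from get_buffer(); the static one is left alone. */
void release_buffer(char *p)
{
    if (p != static_buf)
        free(p);
}
```

Because there is only one static buffer, this sketch suits a short-lived, single-threaded program like echo; it would not be safe to hold two small buffers at once.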

~~~
cpeterso
djb's allocator alloc() does this. He preallocates a 4 KB static buffer before
hitting system malloc(). Avoiding the overhead of malloc() is pretty important
for the performance of systems like qmail that fork many small processes.

~~~
tedunangst
That's ridiculous. The overhead of fork is about a bajillion times higher than
malloc.

~~~
adestefan
With a COW fork() I bet it's smaller. I smell a test coming on, but alas it's
late here and I'm going to bed.

I'm also guessing that 4k was chosen because a malloc() of 1 page is faster
than a malloc() of >1 page. Of course that's with the assumption that the
systems use a 4k page size.

------
p9idf
cat.c is also interesting. The GNU version is an appalling unreadable mess.
V6's assembly implementation is easier to understand.

V6: <http://www.bsdlover.cn/study/UnixTree/V6/usr/source/s1/cat.s.html>
V7: <http://www.bsdlover.cn/study/UnixTree/V7/usr/src/cmd/cat.c.html>
Plan 9: <http://plan9.bell-labs.com/sources/plan9/sys/src/cmd/cat.c>
BSD: <http://www.koders.com/c/fidF501905968D8BE7BBDD355C3C8DB628048A0DEDE.aspx?s=netbsd#L1>
GNU: <http://git.savannah.gnu.org/cgit/coreutils.git/plain/src/cat.c>

~~~
bdonlan
V6's assembly version also does less. The reason GNU's is so complex is
because it has a line-numbering feature (cat -n) not supported in V6 or V7,
and also tries to read and write large chunks, to avoid overhead from calling
stdio functions in a loop. It also tries to take advantage of non-portable
extensions where possible, but falls back to portable code when not supported.
Yeah, it looks a bit complex at first, but it's not really that bad if you
actually take the time to read it.

~~~
p9idf
Those features simply do not belong in a program whose purpose is to
concatenate its input. If you want to number a file's lines, 'echo ,n | ed
file | sed 1d' or awk '{ print NR " " $0 }' will do just fine. You could
even wrap your ed or awk script into a shell script with a descriptive name
like "lineno" rather than something silly like "cat -n". The reason GNU's is
so complex is because it does many things and does them poorly. The V6
implementation does exactly what it says on the tin, does it well, and does
nothing more: it catenates files.

~~~
ianb
I am confused by your definition of "poorly". Are you asserting that GNU cat
is slow, or unportable, or uses too much memory, or some other actual
noticeable problem?

~~~
p9idf
Yes. GNU cat is slow:
<http://hnwriteup.blogspot.com/2011/07/gnu-cat-vs-plan-9-cat.html>

------
IvarTJ
That must have been one of Ken Thompson’s more productive days.

(alluding to a quote from him that I can’t source, “One of my most productive
days was throwing away 1000 lines of code.”)

~~~
silentbicycle
Nah, the codebase just hadn't been touched by the FSF yet.

See also, "UNIX Style, or cat -v Considered Harmful"
(<http://harmful.cat-v.org/cat-v/>).

It seems telling that the GNU echo's source is "derived from code echo.c in
Bash."

~~~
andrewcooke
i don't get why people are being snide about code that does more and so has
more lines. a small amount of code is very nice and elegant, but if it doesn't
do what people need then it's pointless.

~~~
coliveira
You are making the system more complicated for everyone because of features
that only a few users know about. This is how code bloat starts its life
cycle.

If you need more features from a basic utility like echo or cat you should
create your own version, maybe with a slightly different name, and leave the
original as it is.

~~~
Duff
That's exactly what happened.

Those crazy kids at Berkeley cooked up BSD, which was written to meet their
needs and subsequently forked into a few variants. The GNU people made a GNU
collection of core utilities that met their particular needs and desires.

The Unix nerds at my University felt as you did, and ran a Unix System 5
variant into the late 90's.

------
mikelward
The "-n" option was added to UNIX around v6.

The "-n" special case opened the floodgates for many more options. And what if
I actually wanted to print "-n"? There's no way to do it.

~~~
ilikejam
Sure you can:

    dave@cronus $ ./echo -n "-n
    dave@cronus > "
    -n
    dave@cronus $

Not pretty, though.

~~~
mturmon
Cute, but does not work with bash builtin echo, for which two -n's equals one
-n.

    
    
       bash-3.2$ echo -n -n foo
       foobash-3.2$ 
    

Like always with echo -n, no matter what you try, it's not portable.

~~~
ilikejam
So...

    [dave@mini ~]$ echo -n "-n foo
    > "
    -n foo
    [dave@mini ~]$

Easy!

~~~
mturmon
All respect for persistence, but that's not a solution, because the OP wanted
to echo just "-n". I put the foo in to better show what was happening.

For similar reasons,

    
    
      echo "" -n
    

does not work, etc.

~~~
ilikejam
This works fine with the bash built-in and echoes "-n":

    echo -n "-n
    "

------
mikelward
You can see the original UNIX sources at
<http://minnie.tuhs.org/cgi-bin/utree.pl>

------
pixelbeat
One should avoid echo anyway due to portability issues. Use printf instead.

~~~
robtoo
More *nix systems have a /bin/echo than /bin/printf or /usr/bin/printf

~~~
mikelward
Yes, and they all do something different. Things like "-e" and treatment of
"\c" are not uniform.

If a system has printf, and I think all recent Unix-like systems do, then it
works at least 98% the same.

------
gose
Here's Mac OS X's:

<http://www.opensource.apple.com/source/shell_cmds/shell_cmds-149/echo/echo.c>

It's close to the FreeBSD implementation.

------
alister
I just wish that in the early UNIX days they reserved some of the flags to
mean _one_ thing only, and required all commands to have them (where it made
sense). Like:

-r recursive (i.e., it should always be recursive mode if a command operates on files, and should always exist if it makes sense for that command)

-v verbose

-s sort

-i ignore case

-q quiet (suppress output)

If there were, say, 20 well-chosen standard flags (and they were enforced) it
could have given the UNIX tools another level of nice regularity.

------
brown9-2
I do not have much experience reading C code. Is the use of gotos and labels
in the GNU code common?

~~~
sjs
In C it's generally accepted that forward jumps using goto are okay because
the version that avoids goto would be complicated and confusing. Often used
for error handling and memory management.

~~~
sliverstorm
Error handling because you want to make a beeline to the handler, and memory
management because you need it to be quick and goto is cheap?

~~~
sparky
In both cases, to avoid duplicating code. If you malloc() something in your
function and intend on free()ing it before returning, it's often considered
best practices to write the return block (which includes the free()s) once,
and use "goto returnblocklabel" if you need to return early.

So rather than this:

    
    
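      #include <stdlib.h>

      /* badThing, otherBadThing and doStuff() are placeholders, not defined here. */
      void test(void)
      {
        int *x = malloc(1000);
        int *y = malloc(1000);
        for(int i = 0; i < 999; i++) {
          if(badThing) {
            /* every early exit has to repeat the cleanup */
            free(x);
            free(y);
            return;
          }
          doStuff();
          if(otherBadThing) {
            free(x);
            free(y);
            return;
          }
        }
        free(x);
        free(y);
        return;
      }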
    

You'd have:

    
    
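      #include <stdlib.h>

      /* badThing, otherBadThing and doStuff() are placeholders, not defined here. */
      void test(void)
      {
        int *x = malloc(1000);
        int *y = malloc(1000);
        for(int i = 0; i < 999; i++) {
          if(badThing)
            goto ret;
          doStuff();
          if(otherBadThing)
            goto ret;
        }
      ret:
        /* single cleanup path shared by every exit */
        free(x);
        free(y);
        return;
      }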

~~~
Jach
I realize your example is contrived, but a simple 'break' statement (which in
essence is a goto...) would work just as well. :) I somewhat vaguely recall a
situation in my C class that I wanted to use goto to avoid duplicate code but
the professor had previously threatened huge negative points if his scripts
detected one. (That whole semester was just as much about conforming your code
to his narrow specifics because "that's what happens in the real world." as
learning C.)
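
For what it's worth, a break-based version of the contrived example might look like the sketch below. `run_with_break` and its `bad_at` parameter are hypothetical stand-ins for test() and badThing, added only so the function can be exercised. Note that break leaves only the innermost loop, so with nested loops the goto form still has an edge.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for test(): bad_at is the iteration at which a
 * "bad thing" occurs (negative means never). Returns the number of
 * iterations completed before leaving the loop.
 */
int run_with_break(int bad_at)
{
    int done = 0;
    int *x = malloc(1000 * sizeof *x);
    int *y = malloc(1000 * sizeof *y);

    if (x != NULL && y != NULL) {
        for (int i = 0; i < 999; i++) {
            if (i == bad_at)
                break;          /* leave the loop and fall into the cleanup */
            x[i] = i;
            y[i] = 2 * i;
            done++;
        }
    }

    free(x);                    /* free(NULL) is a no-op */
    free(y);
    return done;
}
```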

------
schiptsov
touch.c is another good example. Especially the one from OpenSolaris. ^_^

------
hackermom
Following the general software trend, it just keeps growing bigger and slower
while still doing nothing new. But is this used anywhere, really? At least
Bash uses its builtin.

~~~
p9idf
Bash's implementation[1] isn't especially fast. It gets its speed by not
having to fork, which is expensive on systems with dynamic linking.

[1]
<http://git.savannah.gnu.org/cgit/bash.git/plain/builtins/echo.def>
<http://git.savannah.gnu.org/cgit/bash.git/plain/support/recho.c>
<http://git.savannah.gnu.org/cgit/bash.git/plain/support/zecho.c>

------
nvictor
wow :O

