Unix V5, OpenBSD, Plan 9, FreeBSD, and GNU Coreutils Implementations of Echo.c

jws · on Feb 22, 2015

Looks like Plan 9 and FreeBSD are the only ones to correctly handle write errors. The others just silently ignore them. I guess the moral is "Don't rely on echo to save important data."

Also, the comment…

    /* This utility may NOT do getopt(3) option parsing. */

… which appears in two of them certainly cries out for an explanation. Was it true in NetBSD? Is it still true? Will people call us names if we do?

Update: bash does not handle errors. zsh might, it checks the result of fclose() or if stdout, fflush(). I'm not sure that is sufficient after ignoring all the fwrite() and fputc() errors. Plus it's in an 850 line function and my monitor is only 20" tall, so I'm not following all of it.

Update2: asveikau is correct. The only ones to attempt error handling miss short writes. FreeBSD also misses EAGAIN/EINTR handling. Plan 9 doesn't need that, but its man page explicitly states that short writes should be considered errors by the caller.

I guess it's official. The unix read/write system calls are too complicated to be used by experts in the operating system. (I've rather thought that for some time now.)

anon1385 · on Feb 22, 2015

There was actually a project for the GNU Coreutils to make sure that write errors were being reported[1]. It looks like echo is using the "atexit (close_stdout);" method described in that talk to close the stream and detect errors. You can see what close_stdout actually does here: http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/closeou...

Arguably GNU echo is the only one to actually get this correct. Which is kinda amusing given how the comments on github are all about how ugly and bloated it is and how elegant the Plan9 and UnixV ones are.

The talk is worth a watch if you care about these things. It kind of reinforces your point about read/write being too complicated to use correctly. Meyering found these kinds of bugs in lots of other programs like perl, python, rsync and emacs.

[1] http://www.irill.org/events/ghm-gnu-hackers-meeting/videos/j... (slides: http://www.gnu.org/ghm/2011/paris/slides/jim-meyering-goodby... )
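For readers who have not seen the pattern, here is a minimal sketch of the "check stdout at exit" idea. It is not gnulib's actual close_stdout code; the names close_stream_checked and check_stdout_at_exit are made up for illustration, and the error message and exit status are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper in the spirit of close_stdout; NOT gnulib's code.
 * Closes a stream and returns 0 only if no write error was seen. */
int close_stream_checked(FILE *fp)
{
    int had_error = ferror(fp);     /* error recorded by an earlier fputs/fputc */
    int close_failed = fclose(fp);  /* fclose flushes, so a failure may surface here */
    return (had_error || close_failed) ? -1 : 0;
}

/* atexit handler: report a failed stdout and force a nonzero exit status. */
void check_stdout_at_exit(void)
{
    if (close_stream_checked(stdout) != 0) {
        fputs("write error\n", stderr);
        _Exit(EXIT_FAILURE);        /* leave immediately with a failure status */
    }
}
```

A program would register it once near the top of main with atexit(check_stdout_at_exit), after which every normal exit path gets the check without touching each individual output call.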

asveikau · on Feb 22, 2015

> Looks like Plan 9 and FreeBSD are the only ones to correctly handle write errors.

Both only check for negative return codes. If stdout were piped/redirected to something where write(2) may return a short byte count (let's say a socket) neither of them will detect it.
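A rough sketch of what catching the short-write case could look like on a POSIX system. The helper name write_all is hypothetical and does not appear in any of the echo.c files under discussion. It retries on EINTR and leaves EAGAIN to the caller, which is a design choice, not a rule.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not taken from any of the echo.c files discussed.
 * Writes all len bytes of buf to fd, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error with errno set. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data was written */
            return -1;              /* real error; EAGAIN is left to the caller */
        }
        if (n == 0) {               /* defensive: avoid spinning forever */
            errno = EIO;
            return -1;
        }
        buf += n;                   /* short write: skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

Checking only for a negative return, as described above, would treat a partial write as success. The loop is what closes that gap.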

> /* This utility may NOT do getopt(3) option parsing. */

I noticed this too. My guess, totally not backed up by anything but by instinct, is getopt(3) tried to do too much which doesn't make sense for echo. For example if it tried to parse "echo -foobar" as anything other than "you spit out the string '-foobar'" I could see it being problematic.

johnny22 · on Feb 22, 2015

as mentioned above, it's a POSIX thing.

tedunangst · on Feb 22, 2015

http://pubs.opengroup.org/onlinepubs/009604599/utilities/ech...

jws · on Feb 22, 2015

So POSIX allows no options to the command, but if the first operand is "-n" then the result is implementation-defined. That compromise between the BSD and SysV behavior must have taken some time to work out.

jcoffland · on Feb 22, 2015

There is nothing wrong with complex code that is correct and readable. I imagine those who have a problem with the GNU version have not done a lot of shell scripting and therefore don't see a need for the complexity. The system V version would really suck in a lot of situations. Of course people love to criticize because it makes it sound like they know something.

angersock · on Feb 22, 2015

Generally no, other than the fact that it is complex--complexity itself is a sin. Always has been, always will be, though it can be forgiven.

Anyways, what's the issue with the System V version, other than not handling newlines?

Also, what is the purpose with the Plan 9 loading the thing into a buffer first, other than perhaps to get the benefit of an all-or-nothing write? I suspect weirdness with the Plan 9 file handling but am not certain.

sjolsen · on Feb 23, 2015

>complexity itself is a sin

Complexity is inherent to any task in some degree. That complexity must be expressed in full in any program which performs that task correctly, whether in terms of language/library constructions or the ways in which those are combined. When the tools with which you implement a task (here, C and the core Unix API) are structured substantially differently than the high-level concepts of which that task is composed, there is additional complexity inherent to mapping those concepts.

In light of that, I don't understand what you mean by, "complexity itself is a sin," unless you're referring specifically to the strictly unnecessary complexity introduced by a sub-optimal mapping of high-level semantics to core constructs. I don't think that's the kind of complexity we're seeing here. I think we're seeing the complexity inherent to mapping shell-level semantics to C semantics and the semantics of the Unix API, specifically error-handling semantics.

I agree with jws that what this code demonstrates is that the core Unix API is difficult to use for writing precisely correct high-level programs, even ones as simple as echo. I feel that this is exactly why it's important that core high-level programs like echo encapsulate as much of the complexity of mapping to C as possible (that is, in the style of GNU rather than SysV): so that one can easily write reliable programs with high-level tools like the shell without worrying about that complexity.

sangnoir · on Feb 22, 2015

Correctness beats complexity any day. Incorrectness is an unforgivable sin.

stonogo · on Feb 22, 2015

Those of us who program in shells that are not broken do not need echo to do tricks for us.

jcoffland · on March 7, 2015

You must be a *BSD user.

f2f · on Feb 22, 2015

what this fine list sorely misses is a true retelling of the "Plan 9 and the echo" story, as found here, on the Plan 9 mailing lists some millennia ago:

The Plan 9 and the Echo:

http://9fans.net/archive/2001/09/54

(for the history buffs: yes, that is dennis ritchie posting in that thread)

BuildTheRobots · on Feb 22, 2015

Excellent link; thank you.

It references _The Unix Programming Environment_, Kernighan & Pike (1984), pp. 77-79, "A digression on echo" which I found in PDF [1] and is an immensely enjoyable and enlightening read :)

[1] http://books.cat-v.org/computer-science/unix-programming-env...

ori_b · on Feb 22, 2015

Note that the Plan 9 version does some extra work so that it can send the entire buffer in one write() operation. This is because Plan 9 guarantees that messages smaller than the size of a pipe buffer are sent as an atomic write, given a sufficiently large receive buffer on the other end.

This simplifies quite a bit of code that reads and writes commands into ctl files.
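The general shape of that approach, as a sketch in portable C and not the Plan 9 source: join all the arguments into one buffer first, then hand the buffer to a single write(). The function name join_args is made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins argv[1..argc-1] with spaces, appends '\n',
 * and returns a malloc'd buffer (not NUL-terminated). The length is stored
 * in *out_len. Returns NULL on allocation failure. */
char *join_args(int argc, char *argv[], size_t *out_len)
{
    size_t len = 0;
    for (int i = 1; i < argc; i++)
        len += strlen(argv[i]) + 1; /* each argument is followed by ' ' or '\n' */
    if (len == 0)
        len = 1;                    /* no arguments: just the newline */

    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;

    char *p = buf;
    for (int i = 1; i < argc; i++) {
        size_t n = strlen(argv[i]);
        memcpy(p, argv[i], n);
        p += n;
        *p++ = (i + 1 < argc) ? ' ' : '\n';
    }
    if (argc <= 1)
        *p++ = '\n';

    *out_len = len;
    return buf;
}
```

The caller then issues one write(1, buf, len) and checks its result, so the reader on the other end sees either the whole message or an error.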

lallysingh · on Feb 22, 2015

Why copy it all instead of writev()?

ori_b · on Feb 22, 2015

http://plan9.bell-labs.com/sources/plan9/sys/src/libc/9sys/w...

In other words, writev on plan9 isn't a system call -- it does the same copying that echo.c does. Also, echo.c probably predates writev.c.
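For comparison, on a POSIX system where writev(2) is a real system call, the no-copy variant might look roughly like this. The function name echo_writev is hypothetical, and the sketch assumes the number of iovec entries stays within IOV_MAX and leaves short-write handling to the caller.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical sketch: echo argv[1..argc-1] with one writev(2) call.
 * Returns whatever writev returns (possibly a short count), or -1 on
 * failure. Assumes 2 * (argc - 1) does not exceed IOV_MAX. */
ssize_t echo_writev(int fd, int argc, char *argv[])
{
    static char space[] = " ";
    static char newline[] = "\n";
    int nargs = argc > 1 ? argc - 1 : 0;
    int cnt = nargs > 0 ? 2 * nargs : 1;

    struct iovec *iov = malloc((size_t)cnt * sizeof *iov);
    if (iov == NULL)
        return -1;

    int k = 0;
    for (int i = 1; i < argc; i++) {
        iov[k].iov_base = argv[i];              /* the argument itself, not a copy */
        iov[k].iov_len = strlen(argv[i]);
        k++;
        iov[k].iov_base = (i + 1 < argc) ? space : newline;
        iov[k].iov_len = 1;
        k++;
    }
    if (nargs == 0) {                           /* no arguments: newline only */
        iov[k].iov_base = newline;
        iov[k].iov_len = 1;
        k++;
    }

    ssize_t n = writev(fd, iov, k);
    free(iov);
    return n;
}
```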

dmm · on Feb 22, 2015

Two months ago OpenBSD removed the unused stdlib.h include which was present since they forked from NetBSD in March 1995.

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/echo/echo.c...

userbinator · on Feb 22, 2015

Interestingly enough there's another implementation of echo on its wiki page, which is as succinct and simple as the Unix V5 version:

http://en.wikipedia.org/wiki/Echo_(command)

I wonder if, like a "Hello world" program, some of these simpler utilities might not even meet the minimum level of creativity to be eligible for copyright, so complexity was added - true.c and false.c are the other two examples I can think of; the "true" utility is basically generated by the default code template provided by many IDEs.

There's also a fun story floating around called "The UNIX and the Echo" about what it should do without any arguments, or whether or not it should accept options or escape sequences:

http://stackoverflow.com/questions/3290683/bloated-echo-comm...

The official POSIX definition leaves it implementation-defined, with XSI forbidding options and requiring escape sequences:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ec...

It's amazing how much variety can be present for functionality that seems so simple at first glance.

_2bgp · on Feb 22, 2015

What's the minimum amount of creativity required to copyright software? Or is it really complexity, and not creativity?

tbirdz · on Feb 22, 2015

It would be more worthwhile to show various shells implementations of echo, because most of the time you are going to be running your shell's builtin echo instead of /bin/echo.

pritambaral · on Feb 22, 2015

Bash: http://git.savannah.gnu.org/cgit/bash.git/tree/builtins/echo...

ZSH: http://sourceforge.net/p/zsh/code/ci/a4ff8e69570cbdb8e7d5bf1...

arh68 · on Feb 22, 2015

Thanks for posting the bash source. It looked a lot like the GNU source, then I noticed at the top of it:

    /* echo.c, derived from code echo.c in Bash.

Zenst · on Feb 22, 2015

Wonderful to see the various flavours of the same simple command. Though GNU using a goto for what at quick glance is a non error situation does not sit well with me, but probably personal taste in code.

Also surprised how many libraries the FreeBSD and GNU flavours want to pull in.

stock_toaster · on Feb 22, 2015

Looking at the history of echo.c in the freebsd tree[1] is pretty interesting too. Seems before this [2] commit (apparently fixes for performance with many arguments) it was pretty simple.

[1]: https://github.com/freebsd/freebsd/blob/master/bin/echo/echo...

[2]: https://github.com/freebsd/freebsd/commit/cbf5708f4336b824b8...

userbinator · on Feb 22, 2015

To me this feels like a classic case of "premature optimisation" - by increasing the complexity of the code quite a bit (even adding dynamic allocation), they managed to save only 40 milliseconds on a benchmark involving the echoing of 1000 arguments.

Using 1000 arguments does not seem at all a typical use case for echo.

jlarocco · on Feb 22, 2015

I'm not sure "premature" is the right word. I mean, they benchmarked, and maybe even profiled, and made it faster.

My guess is that someone had a weird use case for echo where the old version was slow. He benchmarked, profiled, made some changes, and submitted a patch. "The source code is too long," is a crappy argument, so might as well use the faster, longer code.

ild · on Feb 22, 2015

Well 40 ms can be a lot or a little, depending on the number after optimization. If, for example the input was 1000 1-letter arguments, 40 ms to output 2kb is too much.

twic · on Feb 22, 2015

The Rust version, from the coreutils rewrite project:

https://github.com/uutils/coreutils/blob/master/src/echo/ech...

oso2k · on Feb 22, 2015

Here's the 31 line (not sloc) echo in sbase [0].

[0] http://git.suckless.org/sbase/tree/echo.c

muyuu · on Feb 22, 2015

Hehe I went in fully expecting GNU to have managed making something as simple as echo.c relatively complex and with inconceivable options. Was not disappointed. They probably even managed to include a GNUism or two.

caf · on Feb 22, 2015

The GNU echo is designed to be useable on a variety of systems which had different traditional behaviours - this comment in the source code is instructive:

  /* System V machines already have a /bin/sh with a v9 behavior.
  Use the identical behavior for these machines so that the
  existing system shell scripts won't barf. */

(BSD systems supported suppressing newline with -n; UNIX System V systems supported \ escape sequences; POSIX came later and allowed either unless the XSI option is supported).

muyuu · on Feb 22, 2015

Yep, including all that hydra-style-compatibility as part of the very base implementation of anything, is part of what makes GNU code the way it is.

I mean... in main() there's a labeled goto for "just_echo" - even that section alone has something like 6 or 7 indentation levels. If I were writing satire code of GNU style I probably wouldn't have gone so far. Hehehe.

abusque · on Feb 22, 2015

What do you mean by GNUism?

muyuu · on Feb 22, 2015

Non-POSIX options that may break compatibility at the user space (scripts, etc). Some people also include GCC stuff that is not standard compatible.

pjmlp · on Feb 22, 2015

Many in the younger generations, untouched by UNIX, think the way GNU tools behave == UNIX, when they usually offer more features than the real ones.

masklinn · on Feb 22, 2015

Then again, so do the BSD tools. And of course the options are often not compatible between the two e.g. bsd sed's -l enables line buffering, gnu sed's -l specifies line-wrapping length; bsd sed's -i requires an extension (empty for no backup) while gnu sed's does not.

POSIX sed supports exactly 3 options (-n, -e and -f), the latest FreeBSD sed supports 10 (adding -E, -a, -I, -i, -l, -r and -u — the last two being GNUism compatibility options not necessarily available on older versions, they are not on my OSX machine) and GNU sed supports 9 short and an additional 4 long options. And that's not counting the extensions to the sed command set.

pjmlp · on Feb 22, 2015

True, which is why it is hard to keep up with multiple UNIX implementations and to know what is supported on a given box without having to reach for man first.

masklinn · on Feb 23, 2015

I long for a straight POSIX-and-absolutely-nothing-else version of the core binaries, but haven't seen one so far.

Definitely less convenient, but would make building cross-platform scripts much easier.

vezzy-fnord · on Feb 22, 2015

Previously on Hacker News: https://news.ycombinator.com/item?id=2780661

agumonkey · on Feb 22, 2015

I read this as a tv show introduction...

_ZeD_ · on Feb 22, 2015

Following one of the comments:

    @astro Yes. An XSI compliant echo actually supports no options at all. See here for how echo is supposed to work: POSIX. 2008 §echo

the link goes to http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ec... ; here I read

     OPERANDS
     
     The following operands shall be supported:
     
     string
         A string to be written to standard output.
         [...]
         The following character sequences shall be recognized on XSI-conformant systems within any of the arguments:
         [...]
         \c
             Suppress the <newline> that otherwise follows the final argument in the output. All characters following the '\c' in the arguments shall be ignored.

wait, what???
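One possible reading of the quoted \c rule, sketched in C. This handles only \c and deliberately ignores every other escape sequence (including an escaped backslash), so it is an illustration of the quoted paragraph and not a conforming echo. The function name echo_with_c_escape is made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: prints argv[1..argc-1] separated by spaces.
 * A literal backslash followed by 'c' stops all output at that point and
 * suppresses the trailing newline. No other escapes are interpreted. */
void echo_with_c_escape(FILE *out, int argc, char *argv[])
{
    for (int i = 1; i < argc; i++) {
        const char *stop = strstr(argv[i], "\\c");
        if (stop != NULL) {
            /* emit what precedes \c, then ignore everything after it */
            fwrite(argv[i], 1, (size_t)(stop - argv[i]), out);
            return;
        }
        fputs(argv[i], out);
        if (i + 1 < argc)
            fputc(' ', out);
    }
    fputc('\n', out);
}
```

So \c is not an option in the getopt sense. It is part of the string operand, which is how the spec can say "no options" and still offer a way to suppress the newline.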

halayli · on Feb 22, 2015

For the curious, argv[0] can be null if exec* function didn't pass arguments to the program.

augustk · on Feb 22, 2015

Although GCC issues the warning "null argument where non-null required" if -Wnonnull is present.

williammmeyer · on Feb 22, 2015

If you find this sort of thing interesting and are located in NYC there is a meetup devoted to it:

http://www.meetup.com/Classical-Code-Reading-Group-of-New-Yo...

andrewstuart2 · on Feb 22, 2015

Is there a good reason to do `argc--` and then `<=` in the if statement versus just using `<`?

I know those guys were pretty smart, so I'm legitimately curious.

zarriak · on Feb 22, 2015

I think the reason they used it was to make the

i==argc? '\n': ' '

part of the print statement cleaner.
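Spelled out as a small function, the loop being discussed looks roughly like this. It is a sketch in modern C, not the original source; the name echo_v5_style and the FILE parameter are additions for illustration.

```c
#include <stdio.h>

/* Sketch of the loop under discussion, written as a hypothetical function.
 * After argc--, argc equals the index of the last argument, so the
 * comparison i == argc picks '\n' for the last one and ' ' otherwise. */
void echo_v5_style(FILE *out, int argc, char *argv[])
{
    argc--;
    for (int i = 1; i <= argc; i++)
        fprintf(out, "%s%c", argv[i], i == argc ? '\n' : ' ');
}
```

Without the decrement the test would have to be written as i == argc - 1 inside the loop, which is the same thing with the subtraction repeated on every pass. Note that with no arguments this loop prints nothing at all, not even a newline.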

pjmlp · on Feb 22, 2015

A good example of how even simple commands like echo behave differently across multiple UNIX implementations.

arbre · on Feb 22, 2015

What if argc == 1 in the OpenBSD version ? You evaluate *++argv which is not defined.

andrewf · on Feb 22, 2015

argv[argc] is defined to be a null pointer.
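A sketch of a BSD-style loop that leans on that guarantee, written as a function so it can be tried in isolation. The name echo_bsd_style and the FILE parameter are hypothetical, and the sketch assumes argv[0] is present (argc >= 1), which the exec* caveat mentioned elsewhere in this thread can violate.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a BSD-style echo loop. Relies on argv[argc]
 * being a null pointer, and assumes argv[0] exists.
 * Returns 0 on success, -1 if the stream recorded a write error. */
int echo_bsd_style(FILE *out, char *argv[])
{
    int nflag = 0;

    /* With argc == 1, *++argv is argv[1] == argv[argc] == NULL: well defined. */
    if (*++argv && strcmp(*argv, "-n") == 0) {
        ++argv;
        nflag = 1;
    }
    while (*argv) {
        fputs(*argv, out);
        if (*++argv)
            fputc(' ', out);        /* separator only between arguments */
    }
    if (!nflag)
        fputc('\n', out);
    return ferror(out) ? -1 : 0;
}
```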

emmelaich · on Feb 22, 2015

I don't like 'nflag'. Something like noecho reads much more nicely.

mackal · on Feb 22, 2015

Just noecho makes no sense. Passing -n suppresses printing the new line, noecho sounds like it suppresses all output

emmelaich · on March 1, 2015

Ah yeah. Braino.

I meant --no-nl or similar.

cogburnd02 · on Feb 22, 2015

Is there any particular reason the busybox version is missing?

cogburnd02 · on Feb 22, 2015

Well, if you're going to downvote me, at least explain why you disagree; don't hide behind the mouse-click. I was merely asking why another of the freely available versions of echo was missing from this list; what's wrong with that?

Jach · on Feb 22, 2015

Are you honestly expecting the author of this 4 year old commit to respond? Maybe if you included a link to the other version (even though it's already in a github comment) your first comment would have been useful and not downvoted, but as it is it's just pointlessly shared wondering that if serious would be better done as a private message to the author... as for your second, complaining about downvotes generally results in more.

anonbanker · on Feb 23, 2015

At least we can surmise who was responsible for the downvoting.

Jach · on Feb 23, 2015

I downvoted the post complaining about downvotes, but not the first post, since it was already downvoted. (Few things need to be downvoted/flagged to oblivion.) Since my attempt at explaining why the first post was downvoted received upvotes, at least we can surmise it's not totally off the mark to what others may think.

unwind · on Feb 22, 2015

I wondered the same thing, or rather was going to post a link to busybox here, but it's already in a comment to the original article. Not sure that's why you got downvoted though, seems a bit harsh as re-posting interesting links related to the original post is quite common here and not something I thought was frowned upon.