Looks like Plan 9 and FreeBSD are the only ones to correctly handle write errors. The others just silently ignore them. I guess the moral is "Don't rely on echo to save important data."
Also, the comment…
/* This utility may NOT do getopt(3) option parsing. */
… which appears in two of them certainly cries out for an explanation. Was it true in NetBSD? Is it still true? Will people call us names if we do?
Update: bash does not handle errors. zsh might: it checks the result of fclose() or, if writing to stdout, fflush(). I'm not sure that is sufficient after ignoring all the fwrite() and fputc() errors. Plus it's in an 850-line function and my monitor is only 20" tall, so I'm not following all of it.
Update2: asveikau is correct. The only ones to attempt error handling miss short writes. FreeBSD also misses EAGAIN/EINTR handling. Plan9 doesn't need that, but its man page explicitly states that short writes should be considered errors by the caller.
I guess it's official. The unix read/write system calls are too complicated to be used by experts in the operating system. (I've rather thought that for some time now.)
There was actually a project for the GNU Coreutils to make sure that write errors were being reported[1]. It looks like echo is using the "atexit (close_stdout);" method described in that talk to close the stream and detect errors. You can see what close_stdout actually does here: http://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/closeou...
Arguably GNU echo is the only one to actually get this correct. Which is kinda amusing given how the comments on github are all about how ugly and bloated it is and how elegant the Plan9 and UnixV ones are.
The talk is worth a watch if you care about these things. It kind of reinforces your point about read/write being too complicated to use correctly. Meyering found these kinds of bugs in lots of other programs like perl, python, rsync and emacs.
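For anyone who doesn't want to dig through gnulib, here is a minimal sketch of the general idea (flush and close stdout in an atexit handler, and fail loudly if anything went wrong). This is not the gnulib code; the names close_stream_checked and close_stdout_or_die are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close a stream and report any write error that was
   recorded earlier or that shows up during the final flush.
   Returns 0 if no error was seen, -1 otherwise. */
int close_stream_checked(FILE *fp)
{
    int had_error = ferror(fp);     /* sticky flag set by earlier failed writes */
    int close_failed = fclose(fp);  /* fclose flushes buffered data first */

    return (had_error || close_failed != 0) ? -1 : 0;
}

/* Hypothetical atexit handler in the spirit of the approach described above. */
void close_stdout_or_die(void)
{
    if (close_stream_checked(stdout) != 0) {
        fputs("echo: write error\n", stderr);
        _exit(EXIT_FAILURE);        /* _exit: we are already inside exit processing */
    }
}
```

A program would register it once near the top of main() with atexit(close_stdout_or_die); and then use printf/fputs as usual.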
> Looks like Plan 9 and FreeBSD are the only ones to correctly handle write errors.
Both only check for negative return codes. If stdout were piped/redirected to something where write(2) may return a short byte count (let's say a socket) neither of them will detect it.
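To make that concrete, here is a rough sketch of the loop that would be needed to cope with short writes and EINTR. write_all is a hypothetical helper name, not something taken from any of the echo sources; what to do about EAGAIN on a non-blocking descriptor is left to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write(2) until all len bytes are out.
   Returns 0 on success, -1 on error with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;          /* real error (EAGAIN included: caller decides) */
        }
        if (n == 0) {
            errno = EIO;        /* assumption: treat "no progress" as an error */
            return -1;          /* rather than spinning forever */
        }
        p += n;                 /* short write: advance past what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```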
> /* This utility may NOT do getopt(3) option parsing. */
I noticed this too. My guess, totally not backed up by anything but by instinct, is getopt(3) tried to do too much which doesn't make sense for echo. For example if it tried to parse "echo -foobar" as anything other than "you spit out the string '-foobar'" I could see it being problematic.
So POSIX allows no options to the command, but if the first operand is "-n" then the results are implementation-defined. That compromise between the BSD and SysV behavior must have taken some time to work out.
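One way to picture the BSD-style rule without getopt(3), as a sketch only: treat the first argument as an option if and only if it is exactly "-n", and print everything else verbatim. echo_first_operand is a hypothetical name, not a function from any of the listed sources.

```c
#include <string.h>

/* Hypothetical helper: returns the index of the first argument to print and
   sets *no_newline if the first argument is exactly "-n". */
int echo_first_operand(int argc, char *const argv[], int *no_newline)
{
    *no_newline = 0;
    if (argc > 1 && strcmp(argv[1], "-n") == 0) {
        *no_newline = 1;
        return 2;               /* skip the "-n" itself */
    }
    return 1;                   /* "-foobar", "--", "-x" are all ordinary text */
}
```

With getopt(3), "echo -foobar" would be scanned as a cluster of option letters and rejected; with the check above it is just a string to print.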
There is nothing wrong with complex code that is correct and readable. I imagine those who have a problem with the GNU version have not done a lot of shell scripting and therefore don't see a need for the complexity. The system V version would really suck in a lot of situations. Of course people love to criticize because it makes it sound like they know something.
Generally no, other than the fact that it is complex--complexity itself is a sin. Always has been, always will be, though it can be forgiven.
Anyways, what's the issue with the System V version, other than not handling newlines?
Also, what is the purpose of the Plan 9 version loading the thing into a buffer first, other than perhaps to get the benefit of an all-or-nothing write? I suspect weirdness with the Plan 9 file handling but am not certain.
Complexity is inherent to any task in some degree. That complexity must be expressed in full in any program which performs that task correctly, whether in terms of language/library constructions or the ways in which those are combined. When the tools with which you implement a task (here, C and the core Unix API) are structured substantially differently than the high-level concepts of which that task is composed, there is additional complexity inherent to mapping those concepts.
In light of that, I don't understand what you mean by, "complexity itself is a sin," unless you're referring specifically to the strictly unnecessary complexity introduced by a sub-optimal mapping of high-level semantics to core constructs. I don't think that's the kind of complexity we're seeing here. I think we're seeing the complexity inherent to mapping shell-level semantics to C semantics and the semantics of the Unix API, specifically error-handling semantics.
I agree with jws that what this code demonstrates is that the core Unix API is difficult to use for writing precisely correct high-level programs, even ones as simple as echo. I feel that this is exactly why it's important that core high-level programs like echo encapsulate as much of the complexity of mapping to C as possible (that is, in the style of GNU rather than SysV): so that one can easily write reliable programs with high-level tools like the shell without worrying about that complexity.
What this fine list sorely misses is a true retelling of the "Plan 9 and the echo" story, as found here, on the Plan 9 mailing lists some millennia ago:
It references _The Unix Programming Environment_, Kernighan & Pike (1984), pp. 77-79, "A digression on echo" which I found in PDF and is an immensely enjoyable and enlightening read :)
Note that the plan9 version does some extra work so that it can send the entire buffer in one write() operation. This is because Plan 9 guarantees that messages smaller than the size of a pipe buffer are sent as an atomic write, given a sufficiently large receive buffer on the other end.
This simplifies quite a bit of code that reads and writes commands into ctl files.
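Roughly, the buffer-first approach looks like the sketch below, written in portable C rather than copied from the Plan 9 source. echo_join is a hypothetical helper; the atomicity itself comes from the system as described above, not from this code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join argv[first..argc-1] with single spaces, append a
   newline unless no_newline is set, and return a malloc'd buffer.
   *out_len receives the byte count. Returns NULL if allocation fails. */
char *echo_join(int argc, char *const argv[], int first,
                int no_newline, size_t *out_len)
{
    size_t cap = 1;                     /* room for the final newline */
    for (int i = first; i < argc; i++)
        cap += strlen(argv[i]) + 1;     /* argument plus a separator */

    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    size_t n = 0;
    for (int i = first; i < argc; i++) {
        size_t k = strlen(argv[i]);
        memcpy(buf + n, argv[i], k);
        n += k;
        if (i + 1 < argc)
            buf[n++] = ' ';             /* separator between arguments only */
    }
    if (!no_newline)
        buf[n++] = '\n';

    *out_len = n;
    return buf;
}
```

The caller then hands the whole buffer to a single write(1, buf, len) and checks that the return value equals len, instead of issuing one write per argument.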
I wonder if, like a "Hello world" program, some of these simpler utilities might not even meet the minimum level of creativity to be eligible for copyright, so complexity was added - true.c and false.c are the other two examples I can think of; the "true" utility is basically generated by the default code template provided by many IDEs.
There's also a fun story floating around called "The UNIX and the Echo" about what it should do without any arguments, or whether or not it should accept options or escape sequences:
It would be more worthwhile to show various shells' implementations of echo, because most of the time you are going to be running your shell's builtin echo instead of /bin/echo.
Wonderful to see the various flavours of the same simple command. Though GNU using a goto for what at quick glance is a non error situation does not sit well with me, but probably personal taste in code.
Also surprised how many libraries the FreeBSD and GNU flavours want to pull in.
Looking at the history of echo.c in the freebsd tree[1] is pretty interesting too. Seems before this [2] commit (apparently fixes for performance with many arguments) it was pretty simple.
To me this feels like a classic case of "premature optimisation" - by increasing the complexity of the code quite a bit (even adding dynamic allocation), they managed to save only 40 milliseconds on a benchmark involving the echoing of 1000 arguments.
Using 1000 arguments does not seem at all a typical use case for echo.
I'm not sure "premature" is the right word. I mean, they benchmarked, and maybe even profiled, and made it faster.
My guess is that someone had a weird use case for echo where the old version was slow. He benchmarked, profiled, made some changes, and submitted a patch. "The source code is too long," is a crappy argument, so might as well use the faster, longer code.
Well 40 ms can be a lot or a little, depending on the number after optimization. If, for example, the input was 1000 1-letter arguments, 40 ms to output 2 KB is too much.
Hehe I went in fully expecting GNU to have managed making something as simple as echo.c relatively complex and with inconceivable options. Was not disappointed. They probably even managed to include a GNUism or two.
The GNU echo is designed to be useable on a variety of systems which had different traditional behaviours - this comment in the source code is instructive:
/* System V machines already have a /bin/sh with a v9 behavior.
Use the identical behavior for these machines so that the
existing system shell scripts won't barf. */
(BSD systems supported suppressing newline with -n; UNIX System V systems supported \ escape sequences; POSIX came later and allowed either unless the XSI option is supported).
Yep, including all that hydra-style-compatibility as part of the very base implementation of anything, is part of what makes GNU code the way it is.
I mean... in main() there's a labeled goto for "just_echo" - even that section alone has something like 6 or 7 indentation levels. If I were writing satire of GNU-style code, I probably wouldn't have gone so far. Hehehe.
Then again, so do the BSD tools. And of course the options are often not compatible between the two e.g. bsd sed's -l enables line buffering, gnu sed's -l specifies line-wrapping length; bsd sed's -i requires an extension (empty for no backup) while gnu sed's does not.
POSIX sed supports exactly 3 options (-n, -e and -f), the latest FreeBSD sed supports 10 (adding -E, -a, -I, -i, -l, -r and -u — the last two being GNUism compatibility options not necessarily available on older versions, they are not on my OSX machine) and GNU sed supports 9 short and an additional 4 long options. And that's not counting the extensions to the sed command set.
True, which is why it is hard to keep up with multiple UNIX implementations and to know what is supported on a given box without having to reach for man first.
OPERANDS
The following operands shall be supported:
string
A string to be written to standard output.
[...]
The following character sequences shall be recognized on XSI-conformant systems within any of the arguments:
[...]
\c
Suppress the <newline> that otherwise follows the final argument in the output. All characters following the '\c' in the arguments shall be ignored.
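As a sketch of what the '\c' rule implies for an implementation: find the first '\c' in an argument, print only what precedes it, then stop without a newline. echo_cut_at_c is a hypothetical helper; it handles only '\c' (skipping '\\' pairs so an escaped backslash followed by 'c' is not misread) and leaves every other escape sequence untranslated.

```c
#include <stddef.h>

/* Hypothetical helper: return the number of bytes of s that precede the
   first "\c" sequence, or the full length if there is none.
   *found is set to 1 when "\c" was seen, 0 otherwise. */
size_t echo_cut_at_c(const char *s, int *found)
{
    size_t i = 0;

    *found = 0;
    while (s[i] != '\0') {
        if (s[i] == '\\' && s[i + 1] == 'c') {
            *found = 1;         /* caller stops output and drops the newline */
            break;
        }
        if (s[i] == '\\' && s[i + 1] == '\\')
            i++;                /* step over an escaped backslash as a pair */
        i++;
    }
    return i;
}
```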
Well, if you're going to downvote me, at least explain why you disagree; don't hide behind the mouse-click. I was merely asking why another of the freely available versions of echo was missing from this list; what's wrong with that?
Are you honestly expecting the author of this 4-year-old commit to respond? Maybe if you had included a link to the other version (even though it's already in a github comment), your first comment would have been useful and not downvoted. As it is, it's just idle wondering that, if serious, would be better sent as a private message to the author. As for your second comment, complaining about downvotes generally results in more.
I downvoted the post complaining about downvotes, but not the first post, since it was already downvoted. (Few things need to be downvoted/flagged to oblivion.) Since my attempt at explaining why the first post was downvoted received upvotes, at least we can surmise it's not totally off the mark to what others may think.
I wondered the same thing, or rather was going to post a link to busybox here, but it's already in a comment to the original article. Not sure that's why you got downvoted though, seems a bit harsh as re-posting interesting links related to the original post is quite common here and not something I thought was frowned upon.