Well, what exactly is their role? Is there a limit to what we can do with these things?
To test a little programming language I made, I created a testing framework with bash and coreutils. I felt guilty about not using a "proper" language at first but it works so well. In parallel too.
I found that the only thing I couldn't test was the argv[0] of the program. No matter how much I twisted the programs, I couldn't get them to do exactly what I wanted. So I sent a feature request and a patch to coreutils to give env this feature:
Hey, if this makes it in, congratulations! I've never contributed to something so ubiquitous before, but I've often thought how satisfying it'd be.
Like, years down the line, maybe as you retire and hang up your keyboard, you could sit back and smile as you realise your code is still deployed on millions, possibly billions of devices? That the code could far outlast 99.9% of code anyone has ever written?
Well, I don't know if they're gonna use my patch. I reported a bug to gpg once and sent a patch in but in the end the maintainer rearchitected the code and gave me attribution. Perhaps they require copyright reassignment?
I can't deny I'm gonna be really happy if they do use it.
I did try it. Didn't work in combination with env.
# Sets PWD and SHLVL and the latter apparently can't be unset
env -i VARIABLE=value bash -c 'exec -a program ./program'
# Sets env's argv0, not my program's
bash -c 'exec -a program env -i VARIABLE=value ./program'
# SHLVL still set
env -i bash -c 'unset SHLVL; unset PWD; exec -a program ./program'
# SHLVL and _ still set
env -i zsh -c 'unset _ HOME PWD LOGNAME SHLVL OLDPWD; ARGV0=program ./program'
To test my programming language. It's a freestanding lisp interpreter that doesn't link to libc. I wrote the code that handles the environment variables and in order to test it I needed full control over the program's inputs including its environment. The env utility provides this control by emptying the environment and setting only the variables I specify, solving 90% of the problem. Only thing I still can't control is argv[0]. With this new feature upstreamed, my test suite will be complete.
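Concretely, a fully specified run looks something like this (VARIABLE=value standing in for whatever the test case needs):

env -i VARIABLE=value ./program

Everything the program sees is now pinned down, except argv[0].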
I did not know about that possibility! It requires glibc and its dynamic linker though, and it doesn't seem to be as widely available as the GNU coreutils. I developed lone inside Termux: it does not have ld.so but does have env. There's a GNU binutils ld but it did not recognize the --argv0 option when I tried it.
ld.so is the dynamic linker. It is always used when invoking a dynamically linked program, but you can also invoke it manually, which allows you to specify options for the linker. The name and path you need to invoke depend on the architecture and possibly the distro, e.g. for amd64 these days it's /lib64/ld-linux-x86-64.so.2.
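So on a typical amd64 glibc system, overriding argv[0] looks something like this (the --argv0 option needs a reasonably recent glibc, 2.33 or later, and ./program is a placeholder):

/lib64/ld-linux-x86-64.so.2 --argv0 program ./program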
I think that was meant to be understood in context as "meant to be actively developed" (as opposed to maintained) for a long time. The practices the author lists are ones that in larger programs would typically be criticized for damaging maintainability as the program grows.
I think perhaps the author meant "long development life." As in, they are basically write-once utilities. Part of the benefit of doing "one thing well."
Weirdly, I interpreted "long life" as a continually running program vs. something that has a very clear input, short-lived execution, and a clearly defined output.
I noticed at least one error, if the author is here. The short description on the shred page[0] is actually the description for csplit[1]. It should be something along the lines of "overwrite a file to hide its contents, and optionally delete it".
Cool, I didn't know this existed. I think simple ones like `yes` can be very interesting just to see how the base code of a utility (that writes to stdout) is written.
Replying to myself because the thread got trolled:
My point was in part that it's valuable for even a simple utility to be well written and optimized and that it's nice to have these minimal examples to learn about how to, e.g. write output very quickly. The program is so short that presumably the number of lines is unimportant, and if the author knows how to do it they might as well make it as fast as possible so it's never in the way, and so we can learn from it. That's why I think it's a good example.
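To make that concrete, the core trick is roughly the following (a sketch of the general technique, not coreutils' actual code): build one big buffer full of the output line up front, then hand the kernel large chunks forever, so the per-line syscall overhead is amortized away.

#include <string.h>
#include <unistd.h>

#define BUFSIZE (128 * 1024)

int main(void) {
    static char buf[BUFSIZE];
    const char line[] = "y\n";
    size_t len = sizeof(line) - 1;
    size_t used = 0;

    /* Fill the buffer once with whole copies of the output line. */
    while (used + len <= sizeof(buf)) {
        memcpy(buf + used, line, len);
        used += len;
    }

    /* Then write big chunks forever; each write() emits ~64k lines
       instead of one, so syscall cost all but disappears. */
    for (;;) {
        if (write(STDOUT_FILENO, buf, used) < 0)
            return 1;
    }
}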
Um, who cares? Most of the time people need to use "yes" as part of a "configure" script or similar. Even if GNU yes were 10 times slower, it would NEVER be the bottleneck in any situation. So what's the point?
I presume somebody had a need, and scratched the itch.
With modern systems we can have 8 channel RAM and 128 PCIe lanes to feed a system stuffed with NVMe drives. The amount of throughput that can be obtained is nuts, and at that point all sorts of weird things can become an unexpected bottleneck.
This applies even in consumer systems. Suddenly your game loads far slower from an NVMe drive than it could, because it never occurred to the programmer that the JPEG decoder, rather than the disk, can become the bottleneck when you can read compressed data at 7 GB/s.
A decade or so ago, I ran into a problem where "yes" was already too fast. We were feeding it to apt-get install or something (in a context where we knew what the prompts were doing, and if anything unusual happened we'd fail other checks immediately), and apt had some clever input buffering going on... that wasn't clever enough, and locked up on getting a megabyte of yes'es in the first read() call.
I've heard before that a lot of GNU programs are stretched like this so that they don't resemble anything from proprietary AT&T unix. E.g. all the extra options in basically everything and the ridiculous optimization of things like yes.
I remember reading that the GNU project was rather keen on avoiding and removing artificial restrictions. In other words, if a feature was useful and reasonable then it should be seriously considered for inclusion. As for the aggressive optimizations that are sometimes seen, I assume it’s just people wanting to improve a tool that’ll be used everywhere, more for fun and/or notoriety than anything else.
It feels like you're making a bad-faith argument here. You can implement 'yes' in a straightforward way in a couple lines of C, too.
main(int argc, char** argv) {
    while (1) {
        if (argc > 1)
            for (int i = 1; i < argc; i++) printf("%s%c", argv[i], (i == argc - 1) ? '\n' : ' ');
        else puts("y");
    }
}
The point other folks are making is that it's written differently for a reason. Maybe not a reason that's important, but at the very least, let's try to compare apples to apples.
I don't see an include, which means you're ignoring warnings or aren't printing them in the first place. You're also testing against a number instead of a boolean. Also you have a horrible hack instead of proper flag parsing. And you're abusing brace elision just to reduce LOC. And again abusing ternary syntax for the same reason.
It might be ugly, but that is idiomatic C code. C didn't even have boolean types until C99, and even then it's an "extension" of an integer type.
You could argue about the loop itself, after all K&R specified "for(;;)", but the other commonly used (ergo idiomatic) infinite loops use precisely the same number of lines.
"while(1)" is a perfectly idiomatic manner to create an infinite loop.
Likewise a void return type for main was entirely legal until C99. The BSD yes(1) I've got laying around only prints the first argument, so flag parsing? What flag parsing?
Yes in nine lines of C inclusive of preprocessor macro invocations and white space.
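For reference, those nine lines would be roughly of this shape (my reconstruction of the BSD-style behavior, not the actual source):

#include <stdio.h>

int main(int argc, char *argv[])
{
    char *s = argc > 1 ? argv[1] : "y";

    for (;;)
        puts(s);
}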
There are no flags in yes(1) ergo there's no need for "flag parsing". yes(1) takes one optional string as input, and that's exactly what argv provides. I'm not sure what you think "flag parsing" is bringing to the table here, but checking the array of command line parameters and accessing an element is pretty far from a hack.
If it's more comfortable you can also declare argv as an array of character pointers, e.g. char *argv[], but that won't change the line count.
C can obviously parse flags. See for example the plethora of C software that does so, such as many things from GNU Coreutils.
There is no requirement to parse flags for the basic functionality of yes. You imposed that requirement yourself, and you can drop it whenever you want. You don't need to parse a flag; at most you need to pick a string out of the command line arguments.
But you're also arguing in bad faith. Your Go code is shorter, okay, but it doesn't do the same thing as the GNU yes code, so what point are you trying to make? I can also link to Philosophy 101 Wikipedia articles:
I think I have made it pretty clear already, but here it is again:
The Go code has MORE functionality (flag parsing) with LESS code. Yes, it's not as fast, and yes, the executable is larger, but for many, that's a good tradeoff for the extra standard library features and the reduced LOC/code complexity. Sadly, I haven't yet seen any cogent technical arguments against my points.
> the Go code has MORE functionality (flag parsing) with LESS code.
Your code does not have more functionality than GNU's yes as written. It's less code you have to write because of the flag parsing code that has already been written, and it's incompatible with GNU's yes because yours requires -m to change the message.
For an extremely simple utility like the 'yes' command that is compiled and distributed as a binary to trillions of installations, which metric do you consider more important: size and speed, or lines of code in the source? Think about it in engineering terms: everything is a tradeoff, and it's your job to come up with the best solution.
Previous comments have demonstrated this not to be the case, so I will stand by my previous points. I have already made over 10 comments on this one topic, so anyone who isn't already convinced never will be, either because they disagree with the tradeoff or because they just have Stockholm syndrome for C.
I mean, yes, go has proper flag parsing as part of the standard library and C doesn’t. Yes that’s going to make a line count difference but it’s also why code golf arguments are pointless.
> go has proper flag parsing as part of the standard library and C doesn’t
That's the whole point. Every single command line program needs command line parsing. Go helps me get the job done, C forces me to write my own parser, or find some third party one.
Yeah, but it’s horses for courses. The C version can be deployed in far more places and can be far faster than the Go equivalent. Which is “better” is a contextual judgement call. There’s plenty of weird architectures out there that run C and almost nothing else.
Yes takes a single string with no embellishment, and that's what C provides. There's nothing additional to parse. There are no flags, no additional options, nothing else to configure… and that's by design as there's simply no need.
It's not about hard drive space, it's about start up time.
The 10x LOC (itself a huge exaggeration) is also a fixed cost, not a marginal one. The extra lines are only the main loop and the includes, not any fundamental complexity.
This is also a funny argument coming from a Go programmer considering that Go trades off conciseness and expressivity for simplicity. Show me some of your favorite Go and I'm sure we can replace it with some concise C++.
But if everything were written in Go, it might be 1 megabyte times the number of users of coreutils. Might be worth using a few more lines of code to save a little bit of space on a lot of machines.
That isn't an apples-to-apples comparison. A C version could be similarly short, but the GNU yes implementer got carried away with efficiency considerations. If the Go program did the same, it would be longer.
I'm not saying this is the reason, but unnecessary (or sometimes dangerous if used in real life) optimizations have been created to speed up QA runs (of course, that only works if what you optimize away is not part of what you're trying to test), e.g., libeatmydata.
But for 'yes', I'd agree with you, though I guess the answer to "who cares?" is that whoever wrote it cared. They could have a legit performance reason or may just have done it to show they could.
FYI, before anyone else decides to read the idiocy that follows in this thread: this guy is a known troll with a long history of this exact behavior. Same person:
Kind of a fun list of basic utilities. I've been using UNIX for a loooong time and had never heard of e.g. `shred`, `shuf`, or `factor`. Makes me want to try
sudo find / -type f -exec shred {} \;
to see how far it gets before killing itself (on a VM or easily re-flashed machine of course).
> to see how far it gets before killing itself (on a VM or easily re-flashed machine of course).
I did this, but with dd -- it completed. It was very anti-climactic. I was hoping it would crash or at least disconnect me, but the kernel, sshd and bash were still in memory and happily returned me to a prompt where I couldn't really do anything.
On Linux, it'd likely go until completion. You can't write to executables and libraries that are currently running (ETXTBUSY), so shred can't trash either itself or find.
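You can see it for yourself with something like this (mysleep is a placeholder name):

cp /bin/sleep ./mysleep
./mysleep 100 &
echo x > ./mysleep   # fails with "Text file busy" (ETXTBSY)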
The exception is shared libraries on Linux. Shared libraries are mmapped into the address space of the executable that uses them. Back in the day, mmap used to support a `MAP_DENYWRITE` flag, which it no longer does - so you _can_ write to a shared library that is currently in use. I found this out the hard way when a shared library that I'd written (which maintained some state in its constructor) started crashing mysteriously anytime it was used by more than one process. It took two weeks of debugging to figure out why this was happening. [1] has more details about `ETXTBUSY`, `MAP_DENYWRITE` and shared libraries.
rm doesn't actually modify the file. It does an unlink which removes the link from the filename to the file.
Essentially, a file can have 0 or more names linked to it. As long as a file has at least one name or at least one process with it open, it will persevere.
By contrast, shred actually writes to the underlying file.
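A quick way to see the difference (filenames are arbitrary):

echo data > original
ln original alias   # the file now has two names
rm original         # unlinks one name; the data survives
cat alias           # still prints "data"
shred alias         # this, by contrast, overwrites the data itself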
True, but I wonder if it would first get stuck on trying to shred e.g. /dev/stderr or if after shredding the files in /etc some daemon woke up and tried to read a file there and choked.
I do like that /bin/true can actually fail and return false, which technically makes a "Not /bin/false" invocation more resilient: https://github.com/coreutils/coreutils/blob/master/src/true.... (and yes, I know it's the most unlikely thing, I just found it funny)
The other interesting thing about the true command is how much more complicated it got than it needed to be.
First, an exercise:
touch mytrue
chmod u+x mytrue
./mytrue
echo "error code for mytrue is $?"
This is literally how true started life. Yes, it is very zen.
The first offense came from legal: all code had to have a copyright disclaimer.
Even an empty file? Yes. So now it was a file with a copyright disclaimer and nothing else. And the koan-like question that comes to mind is "Can you copyright nothing?" Well, AT&T sure tried.
Then somebody said our programs should be well defined and not depend on a fluke of unix, which at this point was probably a good idea. So true finally had code. It was "exit 0".
Then somebody said we should write our system utilities in C instead of shell so they run faster. OpenBSD still has a good example of how this would look.
At some point GNU bureaucracy got involved and said all programs must support the '--help' flag, so that got added. Then they said all programs must support locales, so that got added. Nowadays GNU true is an astonishing 80 lines long.
The cleverness inside coreutils is mostly around choosing effective ways to interface with the kernel, e.g. using copy_file_range() instead of read()/write() to avoid having to copy the data into userspace.
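A minimal sketch of that technique (Linux-specific, needs glibc 2.27+; src and dst are placeholder names, and this is not cp's actual code):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int in = open("src", O_RDONLY);
    int out = open("dst", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0) { perror("open"); return 1; }

    ssize_t n;
    do {
        /* The kernel moves up to 1 MiB per call directly between the
           two files; the data never enters a userspace buffer. */
        n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
    } while (n > 0);
    return n < 0 ? 1 : 0;
}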
It's more a software engineering endeavour than computer science.
Honest question: why? I'm not really interested in portable scripts, usually I just want to get a job done as fast as possible and move on (for serious automation I prefer Python, shell is for more interactive use for me). Is there any upside to sticking to busybox commands in this case?
That guy is starting from very far away; HN is not the place for it, as there isn't enough room to fill in that many blanks.
And the first thing you learn in real life is humility, but sometimes some people make you feel like you know everything, like here... but I know very well that I am average/normal, which makes his/her case even worse. Yeah, a bit like Jon Snow.
It's a horrible feeling.
Not to mention, this post is borderline passive-aggressive... so...
"They are not designed for long life or to scale beyond their role."
Would love to see some examples from the author of programs he believes are "designed for long life" that have been around 30 years.
Or even ones he thinks will be around for 30 years.