Why bother with argv[0]?

yjftsjthsd-h · 2024-09-03T14:50:41 1725375041

So obviously claiming that there's no good reason for process to read argv[0] is either demonstrating the author's ignorance or needs a much stronger defense; I'd be fascinated to hear how they think busybox should work on an OpenWrt box with a 16MB root filesystem.

However, I am willing to consider the discussion about whether there could be merit to restricting the ability to write that value; I could imagine a system that populated it only from the actual file name and did not allow it to be written by the parent process or the child process at runtime. The obvious place this still falls apart is that an attacker could just

    ln /bin/curl ./some\ other\ name

but there are sometimes security measures that we use even though they're less than 100% effective so it at least conceivable that this might be a trade off worth making.

gwbas1c · 2024-09-03T17:37:33 1725385053

I agree, I think the author really shot themselves in the foot when they, at length, criticized the merits of a program using argv[0].

The real point are the security flaws in a calling program setting argv[0], because it really, really should be set by the operating system. (As a programmer, I shouldn't have to defend against these kinds of attacks. The OS should block it.)

The criticisms of valid programming practices, IMO, hurt the author's credibility and distract from the real point of the article.

nrdvana · 2024-09-03T22:12:28 1725401548

The real security flaw is extracting a value from a process's own memory to identify what the process is. If you want a secure way to identify what a process is and where it came from, that needs to be a new feature in the OS.

argv[0] was designed to be part of the arguments to the program, and it succeeds perfectly at that task. The problem is that it has been abused by external tools as a way to identify the program just because there was no other alternative.

It has to be writable because the entire argv string (in program memory) is writable and declared as

  int main(int argc, char **argv)

not

  int main(int argc, const char **argv)

and needs to preserve back-compat. Classic C code might be calling strtok on the arguments, so that block of memory needs to remain writable.

Joker_vD · 2024-09-03T22:22:10 1725402130

> The real security flaw is extracting a value from a process's own memory to identify what the process is. If you want a secure way to identify what a process is and where it came from, that needs to be a new feature in the OS.

How would that help? After all, even if this info comes from the OS, the decision logic still lives in your process's memory which the parent process still has full access to.

gwbas1c · 2024-09-04T03:03:03 1725418983

Take a closer look at the exploits listed, they all have to do with malware manipulating argv[0] when creating a new process; not with a process manipulating argv after it starts.

There is no mention of mutable memory attacks.

(If I was on a computer I'd fire up a C IDE to even see what happens when I mutate argv. I suspect the OS keeps its own copy of what the process was started with.)

Zironic · 2024-09-04T08:25:42 1725438342

It's not about mutable memory attacks, it's about not understanding the purpose of argv[0]. argv[0] is an argument, you are supposed to be able to set it to whatever you want. You are not supposed to rely on an argument to identify a program, that is nonsensical.

The problem here isn't argv[0], the problem is security software not understanding what argv[0] is and if you want security software to better be able to identify processes, the solution isnt changing argv[0], it's implementing an actual process ID checking.

thayne · 2024-09-04T14:40:08 1725460808

> it has been abused by external tools as a way to identify the program just because there was no other alternative.

There is an alternative, at least on linux: /proc/$pid/exe.

And if your question is "what executable is running" that is a better way to get it. But for a program like busybox, argv[0] is also important.

account42 · 2024-09-05T10:00:19 1725530419

Pretty much any OS lets you examine which binaries have been mapped into the process adddress space meaning there are plenty alternatives.

rerdavies · 2024-09-04T14:50:27 1725461427

It's simply a case of "Do use argv[0] for this, and don't use it for THAT.

Both Windows and Linux provide APIs to get the actual path of the executable. Posix, to the best of my knowledge, does not. Regrettably. And the Linux API is, admittedly, a bit weird. But not that difficult really. Nothing that you can't get Claude to spit out for you in under 45 seconds. ;-P

The contents of argv[0] are yours to use and abuse as you see fit. Operating systems don't know or care if you go trampling recklessly through the contents of argv[0].

And the authors contention that "power --shutdown" and "power --reboot" are viable alternatives to "shutdown" and "reboot" seems.... disingenuous. Is the politest word I can come up with.

And, if you haven't asked yourself, "wait a second, what happens if somebody passes me garbage via execve" before you are halfway through writing the substantial amount of code required to portably normalize argv[0] to an executable path, I don't think you can be trusted to write secure code of any form. Just normalizing the various forms of argv[0] that a Linux shell passes you is a non-trivial effort. So don't use it for THAT.

shadowgovt · 2024-09-04T14:49:00 1725461340

I see a common anti-pattern in security researchers in that they can lose sight of the human beings who operate the software.

argv[0] should be used by any logging message that purports to report the program name, because argv[0] should be a string the human recognizes as something they invoked. Taking it away would break usability.

This does, of course, imply that the program name is non-constant untrusted data. Which means we shouldn't be making security software that depends on knowing that name.

rzwitserloot · 2024-09-03T15:16:40 1725376600

That seems unnecessarily harsh.

I don't think that's the gist of the article, but the throwaway suggestion of 'just make lots of copies, who cares about diskspace' is insufficient and thus distracts. It's.. a single line about solutions in an article that isn't _about_ solving problems, it's about highlighting a problem exists and that it's worth solving.

I read the article more as: There is __often__ no good reason to use argv[0], and it should be avoided if at all possible, and if it cannot be avoided, it would behoove the industry to work on ways to make sure in the future it can be avoidable.

For example, why in the blazes does windows taskman.exe list argv[0] in the GUI table view? That's just asking for trouble. Show the actual file path, and always an absolute one - that way you avoid confusion about which executable you're actually running, and it's just as readable if not more readable for every app _except_ those who care about argv[0], e.g. if you ran `/bin/dd` and it's actually busybox, in taskman you'd see `/bin/busybox` instead which'd be worse than seeing 'dd'. That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly), but, now we're talking about coordinating between OS, glibc, busybox, and so on - lots of parties. I don't mind that the article doesn't delve that deep, as that wasn't the point of it. The point is simply to show the problems the kludge of 'we will show argv[0] instead of the executable name' causes.

This article feels more about explaining that in the distant past, a mistake was made with some history as to why that mistake was made and the deleterious practical effects that this mistake is causing or is likely to cause (most of them security related). It's not really about solving the problem; that presumably comes later and should be sketched out by those who are knowledgable on _that_ subject. That doesn't imply the author is ignorant or that the article is insufficiently defended. Just that it hasn't covered all aspects of what it's writing about.

toast0 · 2024-09-03T16:04:34 1725379474

> Show the actual file path, and always an absolute one - that way you avoid confusion about which executable you're actually running, and it's just as readable if not more readable for every app _except_ those who care about argv[0], e.g. if you ran `/bin/dd` and it's actually busybox, in taskman you'd see `/bin/busybox` instead which'd be worse than seeing 'dd'.

This was kind of in the middle of your complaint about windows, but then you've got unixy busybox discussion.

On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'. All of the names are equally valid. You could show the filesystem and inode number, which should uniquely identify the file, but is pretty user unfriendly.

pdonis · 2024-09-03T17:44:04 1725385444

> On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'.

But each of the multiple names points to the same actual data, so it doesn't matter which one is shown. The obvious choice would be to show the absolute path that the OS used to load the executable.

wongarsu · 2024-09-03T18:55:10 1725389710

> On a unix filesystem, a file that's hard linked with multiple names has no single 'actual name'

The same is true for hard linked files on Windows. That never stops Windows from showing you a path.

There is almost always an "obviously right" path (the one used when opening the file). And if you lost track of that, deterministically choosing one of the possible paths is almost always more user friendly than just chowing inode numbers.

toast0 · 2024-09-03T19:42:57 1725392577

> There is almost always an "obviously right" path (the one used when opening the file).

The path used while opening a file is easy to get confused. If your cwd changed names or was deleted since you entered it, and you open an executable with a relative path, what is the "obviously right" path then?

dwattttt · 2024-09-04T01:14:07 1725412447

In context, there's still a right answer; the absolute path used to run the process should always be right, because the Windows locks the file (I'm sure with enough effort this can be made wrong, but I'm also sure that's not trivial).

Really, this discussion just shows the question is the problem; if you're asking "what path was used the launch the process", that's easy to keep track of & always be right. If you're asking "what is the path right now to the file that launched the process", maybe that has no answer.

kelnos · 2024-09-04T02:14:36 1725416076

I don't think that really follows. If the OS wants to track this, it should canonicalize the path to the executable on startup, and then stash it somewhere.

(And a program could do that itself, if it wants to.)

mbrumlow · 2024-09-03T15:31:01 1725377461

> highlighting a problem exists

Coding bugs into your programs is not a problem it’s a bug. None of the weird arg[0] examples can happen on the shell (without escaping), only when using system calls.

The more I read the article the more I feel this is a reaction to a behavior the author did not expect and fancy them as smart therefore the last 20 years of use age of this feature are obviously wrong.

cesarb · 2024-09-03T18:26:38 1725387998

> None of the weird arg[0] examples can happen on the shell (without escaping), only when using system calls.

  $ help exec
  [...]
  Options:
    -a name pass NAME as the zeroth argument to COMMAND

Even in shell, you can explicitly specify the argv[0] when running an executable.

mbrumlow · 2024-09-03T19:11:08 1725390668

Not in all shells, but in some exec is a pass through to the system call …

Bash is a language, so again we are telling the stream to do something silly and calling it out had a security problem.

The issue is not arg[0] but uninformed expectations on how these systems work.

Relying on the program/command name for security and not the executable path is a bug.

Furthermore if a bad actor has enough access to run exec you probably are in a bad way.

The whole post also seems to not understand that both windows and linux have ways to change this display after the executable is running via SetConseTitle and prctl or simply modify arg[0] directly.

ArchOversight · 2024-09-03T15:38:34 1725377914

There's the `setproctitle` in FreeBSD that is designed exactly for a process to update the information that is presented to tools such as ps.

https://man.freebsd.org/cgi/man.cgi?query=setproctitle&aprop...

asveikau · 2024-09-03T16:02:09 1725379329

There's also getprogname(3) on a lot of systems, and the __progname variable. I seem to recall this is an area where various Unix like systems have slight variations.

account42 · 2024-09-05T10:25:10 1725531910

prctl(PR_SET_NAME) on Linux - sets the thread name, and the name of the main thread is shown as the process name in most tools.

kelnos · 2024-09-04T02:12:57 1725415977

> it's about highlighting a problem exists and that it's worth solving.

If so, then I disagree with the premise of the article, fundamentally. I don't see a problem. If someone is writing security software and doesn't already know about the mutability of argv[0], and doesn't know that (on Linux at least) /proc/$PID/exe is the only correct way to gt the binary backing a process... well, then they have no business writing security software.

There is no problem here. The author is making a big deal about nothing, either because they have a weird axe to grind, or because they're ignorant.

Maxatar · 2024-09-04T00:02:22 1725408142

>Show the actual file path, and always an absolute one

There are numerous reasons why this is not desirable, for example knowing whether an application was called from one symbolic link or a relative path dictates what that application's working directory is.

croes · 2024-09-03T17:30:28 1725384628

It's easy to call something a mistake in hindsight.

You could argue the mistake was done elsewhere so this feature could be abused.

ahoka · 2024-09-03T17:16:36 1725383796

“That is simple enough to solve (add an API call to update _your own process name_ or at least update your own process 'title' which interfaces like ps/taskman can use accordingly)“

We could call it setproctitle, or something. \s

pzmarzly · 2024-09-03T16:03:54 1725379434

Not an author, but there's a good alternative. If busybox was edited to ignore argv[2], then applets could be called via shebangs, instead of symlinks:

    $ echo '#!/path/to/busybox echo' > myecho
    $ chmod +x myecho
    $ ./myecho 123
    ./myecho 123

Right now this doesn't work properly, because "./myecho" (argv[0]) gets placed into argv[2] of the process. Otherwise, this technique IMHO is better than symlinks:

- Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

- Doesn't read or write to argv[0].

- You could finally rename the applets. This is not that useful if busybox is your only posix userspace implementation, but very useful if you want many implementations to live side-by-side. E.g. on macOS, I'd like to have readlink point to BSD/macOS's readlink, greadlink to GNU coreutil's, bbreadlink to busybox's.

But as I said, this doesn't work for now. The best you can do now is to write shell two-liners https://news.ycombinator.com/item?id=41436012. Some of such two-liners may also fit into the inode inlining limit, so that's a plus. But you will have performance penalty on every call (since sh needs to start up).

cesarb · 2024-09-03T18:23:49 1725387829

> Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).

Is that really the case? AFAIK, OpenWRT uses SquashFS by default, and a quick web search tells me that "[...] In addition, inode and directory data are highly compacted, and packed on byte boundaries. Each compressed inode is on average 8 bytes in length [...]" (https://www.kernel.org/doc/html/latest/filesystems/squashfs....). That is, even if the content fits into the inode, it will make the inode use more space (they're variable-size, unlike on traditional filesystems with fixed-size inodes).

And using hardlinks (traditionally, we use hardlinks with busybox, not symlinks) goes even further: all commands use a single inode, the only extra space needed is for the directory entry (which you need anyway).

alerighi · 2024-09-03T18:22:01 1725387721

Well that would be inefficient. For each command you run the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory. That could be a performance problem, since busybox is used typically in embedded systems that doesn't have a lot of resources: imagine a shell script that runs a command in a loop, it has to do a lot of extra work.

Finally, symlinks can be relative, while the solution you proposed is not. This is particularly useful for distributing software, e.g. distributing a tar file with the busybox itself and their symlinks.

In fact, you don't even need symlinks at all: you can even have hard links, that could even save disk space on embedded filesystems, that are readonly images anyway.

cxr · 2024-09-03T19:01:49 1725390109

> Well that would be inefficient. For each command you run the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory.

Those that exist today would, but no kernel would have to work like that.

Once you've agreed that monolithic kernels have merits, you've accepted that the kernel can do whatever it wants to make this efficient—including being complicit in this scheme and leapfrogging over most of what you just described.

kelnos · 2024-09-04T02:09:14 1725415754

> Those that exist today would, but no kernel would have to work like that.

That's a pretty weird argument. "Yes, what you say is completely correct, but let's imagine a world where you were wrong."

We have what we have, today. We should form conclusions and make decisions based on things that exist, not on things that we might dream up.

cxr · 2024-09-04T03:43:00 1725421380

[flagged]

jolmg · 2024-09-04T04:34:22 1725424462

> it's against the rules here to those kinds of fake quotes.

What part of the guidelines are you referring to?

Also, despite the quotation marks, I don't think they mean to quote you. They're just rephrasing you as they understood you.

Coincidentally enough, I've just done that too in another comment:

https://news.ycombinator.com/item?id=41442007

cxr · 2024-09-04T15:40:43 1725464443

<https://news.ycombinator.com/context?id=13602947>

And I didn't mention the guidelines (i.e. newsguidelines.html). On that note, though:

> the site guidelines[...] aren't a list of proscribed behaviors but a set of values to internalize. I'd say "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize" covers this case pretty squarely

(from <https://news.ycombinator.com/item?id=15892014#15893789>)

See also:

<https://news.ycombinator.com/item?id=38688831#38690517>

plus lots (and lots) more.

vlovich123 · 2024-09-03T18:50:09 1725389409

I’m going to challenge you on the performance angle. Instead of doing the shebang line, it has to traverse the filesystem to resolve the link. I suspect that’s probably more expensive than parsing the shebang line. Indeed, a shell script that runs a command in a loop should have busybox detecting the built in command & executing it inline without spawning executables via the file system (this is common in bash as well btw).

There are valid reasons but I think the performance angle is the weakest argument to make.

Denvercoder9 · 2024-09-03T19:12:21 1725390741

> Instead of doing the shebang line, it has to traverse the filesystem to resolve the link. I suspect that’s probably more expensive than parsing the shebang line.

I highly doubt that. Path traversal is one of the most optimized pieces of code in the Linux kernel, especially for commonly accessed places like /bin where everything is most likely already in the dentry cache. For the script with a shebang on the other hand it first has to read it from disk (or the page cache), then parse the path from it, and then do a path traversal anyway to find the referenced file.

Too · 2024-09-04T05:27:40 1725427660

Imagine the performance problems of running 'shutdown' and 'reboot' in a tight loop!

Besides, should one really write something performance critical for embedded in shell in the first place?

soneil · 2024-09-03T19:06:04 1725390364

I was going to say it'd be easier to have a single script, eg

    #!/bin/sh
    busybox $0 $@

and then every command required could just be a hardlink to the same script, instead of replicating it over and over again for hardcoded command names.

Then I realised the whole point is to posit a world where $0 doesn't exist, and we're not allowed to be clever about it.

jenscow · 2024-09-04T01:08:40 1725412120

In such a world, shells would probably have something like a $SCRIPT_NAME to work around this.

account42 · 2024-09-05T10:30:35 1725532235

Are shebangs recursive? Otherwise this means that busybox can no longer provide /bin/sh.

pie_flavor · 2024-09-04T05:59:12 1725429552

Another example program that reads argv[0] is Rustup, the version manager for Rust. Rust versions can be set per directory either as a machine-specific override or via a file. Rustup is symlinked to all the Rust commands like rustc and cargo, and when invoked as one of those commands, it checks what version it is supposed to be using, and then forwards to that version. I don't see how you'd do this without argv[0] (or a dozen slightly different pointlessly recompiled binaries).

wietze · 2024-09-03T15:06:04 1725375964

For busybox/toybox the argv[0] thing is great, and seems to be the prime example of why argv[0] shouldn't go - yet it is a bit of an anomaly in how argv[0] is used.

If there really is a need for having one executable that comprises multiple commands, is `busybox whoami` instead of `whoami` so much more effort? To me, that would make more sense in terms of what is going on; aliases could be used if one-word commands are preferred. In most non busybox contexts, argv[0] is just an unnecessary addition that, as the linked article shows, can introduce weirdness.

It's clear from the comments there are still many who think argv[0] is a good thing, which is great - I'm glad the post sparked this debate.

blenderob · 2024-09-03T15:38:12 1725377892

> is `busybox whoami` instead of `whoami` so much more effort?

It's not the "more effort" that is the deal breaker here. It is a matter of compliance with specs and user expectations. What you're suggesting would make Busybox very non-POSIXy, very non-Unixy. All scripts written over the last many decades would need to be updated to call `busybox ls` instead of `ls`? How is that a viable solution?

> I'm glad the post sparked this debate.

This is a very strange way to deflect concerns about quality of the article!

gary_0 · 2024-09-03T15:49:43 1725378583

Yeah. The whole point of busybox is to provide the POSIX commands in one compact executable. Making things work any other way defeats the entire purpose of busybox.

adw · 2024-09-03T22:26:55 1725402415

In other words: `busybox` is primarily an implementation of a _standard library_ and only secondarily a command line tool, so it _must_ use the standard names.

piaste · 2024-09-04T11:48:44 1725450524

Given that 'alias' is in POSIX, would a combination of

(a) the hypothetical non-argv0 busybox being discussed, and

(b) a POSIX shell of the maintainer's choice, with built-in aliases for 'ls=busybox ls'

be sufficient to make the system POSIX complaint?

blenderob · 2024-09-04T12:14:11 1725452051

aliases are not inherited by subprocesses unfortunately! so the alias solution would not work when a shell script launches other shell scripts. It wouldn't work in a wide range of other scenarios too like Makefiles, bespoke build tools, binary executables that do execve("/usr/bin/cmp", ...) etc.

yellowapple · 2024-09-06T19:15:25 1725650125

In addition to the already-raised issue of subprocesses not inheriting aliases, I'd also be worried about aliases inherently being specific to particular shells. I'd hate to have to redefine those aliases for sh, csh, zsh, fish, and Lord knows what else. It'd also be an issue for invoking those tools without going through a shell in the first place - as is common for programs launching external programs as subprocesses.

That's indeed why I personally don't use shell aliases at all, instead opting for actual shell scripts in my $PATH. Those will work no matter what shell I'm using (if any).

sltkr · 2024-09-03T17:03:19 1725382999

`busybox whoami` is probably fine, but having to write `busybox ls`, `busybox grep`, `busybox cp` etc. would get tedious quickly.

Shell aliases don't solve all problems, even if you do:

    alias rm="busybox rm"
    alias xargs="busybox xargs"
    # etc.

you still have to write `xargs -exec busybox rm`, because xargs won't use the shell alias.

But the main problem with this approach is that POSIX and LSB require certain binaries to be available at certain paths. When they're not, most shell scripts will just break.

The minimal standard solution is probably to create shell scripts for all of these, e.g. in /bin/ls:

    #!/bin/sh
    exec /bin/busybox ls

But this both adds runtime overhead (on every invocation!) and is quite wasteful in terms of disk space. Busybox boasts over 400 tools. At 4 KB per file, that's 1.6 MiB of just shell scripts. Of course that can be less if the file system uses some type of compression which is common on embedded systems where storage space is small, but it still seems to defeat the purpose of using busybox to create a minimal system.

yjftsjthsd-h · 2024-09-03T18:29:54 1725388194

Well /bin/sh is also busybox, so I think you'd need

    #!/bin/busybox sh
    exec /bin/busybox ls

?

sltkr · 2024-09-03T19:05:20 1725390320

Great point!

Actually this observation invalidates the whole setup. Because even though you could define /bin/sh itself as:

    #!/bin/busybox sh
    exec /bin/busybox sh

Then you still cannot use #!/bin/sh in any other shell scripts, because for historical reasons the interpreter of a script is not allowed to be another interpreted script, it must be a binary. So /bin/sh pretty much has to be an actual binary.

mbrumlow · 2024-09-03T15:15:17 1725376517

Yes. Anybody who has shipped software would say.

I really don’t think it is a debate. The usage of arg[0] is massively understated by the article. Just go look at gcc or any modern day compiler. Its use so much that the conversion of should we has been hashes out by many different groups yet they still chose to implement it.

The security concerns are a non issue. As arg[0] was not the problem. It was the lack of technical knowledge of how systems work and a flaw in the security application.

hinkley · 2024-09-03T15:09:33 1725376173

I think you’re both forgetting that bash has been using this trick for decades.

Bash has an sh compatibility mode that runs when you invoke it as sh.

alerighi · 2024-09-03T18:25:24 1725387924

Well of course it's not only a matter of interactive usage (even because the busybox itself shell could do the conversion). The problem are script, or worse programs that invokes commands as subprocesses (programs that maybe you don't have have access to the source code!).

What you do? Replace every single occurrence of each command by prefixing it `busybox`? Not ideal at all...

epcoa · 2024-09-03T15:10:30 1725376230

https://pubs.opengroup.org/onlinepubs/9699919799/

You appear not to realize that busybox is an essential component of a POSIX like system.

jimrandomh · 2024-09-03T17:31:55 1725384715

That's fine for when users are interactively typing commands, but it doesn't work when the command is being run by a non-busybox program which expects commands to exist in the standard locations.

Arch-TK · 2024-09-03T15:25:41 1725377141

Restricting setting it would break login.

Not that it couldn't be fixed by changing how we handle login shells but still. Worth remembering.

Similarly the busybox situation could be solved by having busybox ship posix shell wrapper scripts which use `#!/bin/busybox sh` as the shebang and simply consist of a line like `exec /bin/busybox ls "$@"`.

nrclark · 2024-09-03T15:48:39 1725378519

It can already do that, afaik. When I last checked, BusyBox supported installations via 4 methods:

  - symlink
  - hardlink
  - shell script wrappers
  - executable binary wrappers around libbusybox

Arch-TK · 2024-09-03T19:34:26 1725392066

Nice!

zekica · 2024-09-03T16:02:57 1725379377

You are over-complicating it, you only need `#!/bin/busybox ls` as the entire contents of the file.

Arch-TK · 2024-09-03T19:32:56 1725391976

This doesn't work. The eventual command would be /bin/busybox ls /path/to/ls/wrapper ...

kazinator · 2024-09-03T20:20:43 1725394843

If hard linking (no symbolic) is used to install the BusyBox commands, then instead of argv[0], BusyBox could use the platform-specific means of obtaining the executable name, and take the basename of that path. On Linux this means /proc/self/exe; _NSGetExecutablePath on Drawin; getexecname on Solaris; GetModuleFilename on Windows; ...

mzs · 2024-09-03T20:57:51 1725397071

https://github.com/util-linux/util-linux/blob/master/login-u...

edit: basically login(1) execes your shell with - prepended, so an example where POSIX expects this

marcosdumay · 2024-09-03T15:35:16 1725377716

Is there a good reason for allowing writes to argv at all?

I think any reason one will find are based on backwards compatibility.

red_admiral · 2024-09-04T15:07:44 1725462464

Yeah, never mind shutdown/reboot, has the author heard of busybox?

zokier · 2024-09-03T16:29:35 1725380975

Isn't the reason for busybox multi-call binary mostly just ELF being bloated? So the answer for resource constrained systems would be to have more efficient executable format. I don't see why multi-call binary + bunch of symlinks would be intrisically much more size-efficient than something purpose-built.

yuliyp · 2024-09-03T18:53:53 1725389633

a lot of code is shared between different tools. Busybox has one copy of those. Before you mention shared libraries. There is still overhead, as well as complicating the usage (it needs to find a shared lib when starting instead of just having all it needs in the binary). This isn't really a property of the executable format. Any format would have the same problem.

surajrmal · 2024-09-03T15:38:24 1725377904

You can write a 2 liner shell script that prepends busybox per command. I've done this on a 16MiB restricted system and while ate maybe 4k per command, it wasn't a big deal with only 20-30 commands.

blenderob · 2024-09-03T15:54:49 1725378889

What about compiled binaries that for one reason or another is doing an execve() on "/usr/bin/cmp" or some such thing? Do you propose changing every script and every binary on earth that expects Busybox to be a POSIXy, Unixy environment?

pzmarzly · 2024-09-03T16:06:00 1725379560

On Unixes it doesn't matter if /usr/bin/cmp is a script or a compiled binary. If the script has correct shebang, kernel takes care of executing it.

oguz-ismail · 2024-09-03T16:51:14 1725382274

Shebangs are not part of the UNIX specification. What happens if an executable starts with `#!' is implementation-defined.

account42 · 2024-09-05T10:52:03 1725533523

UNIX/POSIX doesn't specify an executable format at all

immibis · 2024-09-04T12:12:07 1725451927

is being able to run busybox part of that specification? Linux ELF files are tagged for Linux OS, not generic Unix OS.

blenderob · 2024-09-04T12:20:03 1725452403

> is being able to run busybox part of that specification?

It's the other way round. This is like asking if running RHEL is part of the specification? Obviously not. But RHEL provides an environment that is quite close to the specification.

So is running a busybox part of the specification? Obviously not. But Busybox provides an environment that is close to the speficiation.

immibis · 2024-09-04T15:49:38 1725464978

Then it would be busybox's responsibility to make busybox work.

sophacles · 2024-09-03T16:22:31 1725380551

No you make this script:

   #!/bin/sh
   exec /bin/busybox cmp

And place it at /usr/bin/cmp

teddyh · 2024-09-03T16:49:43 1725382183

Surely you mean

  #!/bin/sh
  exec /bin/busybox cmp "$@"

oguz-ismail · 2024-09-03T16:53:48 1725382428

`#!/bin/sh' makes this less portable than it could be, if /bin/sh doesn't exists on my system it won't work, for example. Remove that line and it'll work everywhere.

teddyh · 2024-09-03T17:01:49 1725382909

If /bin/sh does not exist, what in the world is executing the shell script?

oguz-ismail · 2024-09-03T17:08:06 1725383286

The shell, of course. It just might not be (because it doesn't have to be) located in /bin.

teddyh · 2024-09-03T17:11:02 1725383462

I’m pretty sure that /bin/sh is mandated by POSIX.

oguz-ismail · 2024-09-03T17:45:25 1725385525

It's not. See https://pubs.opengroup.org/onlinepubs/9799919799/utilities/s...

    Applications should note that the standard PATH to the shell cannot be assumed to be either /bin/sh or /usr/bin/sh, and should be determined by interrogation of the PATH returned by getconf PATH, ensuring that the returned pathname is an absolute pathname and not a shell built-in.

alerighi · 2024-09-03T18:47:26 1725389246

So now to execute a program that could have been a direct run you have to fire up a shell, have it parse the file, and execute the instruction? Not really a great thing...

Plus, you have to know the absolute path of the executable busybox, not something you always know in advance.

sophacles · 2024-09-03T19:13:50 1725390830

You're not wrong about the extra execution of shell, although a lightweight shell like ash or dash doesn't have a huge overhead.

But realistically, the only people that need to worry about the actual location of the busybox executable are the people who write the install script - it would take that as an argument or variable and spit out all the little scripts as an automated process. The current installer already has to do most of this work as part of setting up the relevant links anyway.

Hizonner · 2024-09-03T15:37:51 1725377871

As the original author says (but seems to forget within a paragraph or two), the program should already know what program it is. If you're looking at argv to find out what program you are, you are doing it deeply wrong. It's an argument.

One good use for it is to make a guess as to where your executable is installed. Yes, it would be nice if there were a more certain way to get that... but not for security purposes. You don't want to rely on filenames for security anyway, because anybody can make copies and symlinks and rename files at will, and it's really, really hard to catch all the cases of that. Much harder than, for instance, remembering that argv[0] is a hint from your caller, not gospel from the OS.

In the same way, I know that it's fashionable nowadays for incompetent idiots to write security tools, but a security tool that trusts an argv value for anything much was obviously written by an incompetent idiot, because that's not what they're for.

mariusor · 2024-09-03T15:58:13 1725379093

Am I missing something, you didn't seem to address the case where you actually need to know which program you are? The way busybox provides the whole suite of linux-utils in one binary and require the command under which it was invoked to know what to do.

Hizonner · 2024-09-03T16:02:40 1725379360

Busybox still knows that it's busybox, and it is using that argument to decide which of its many functions to execute.

This person is arguing that that's somehow wrong because busybox, or more importantly some other software that's trying to monitor it, might get confused about whether it's busybox.

mariusor · 2024-09-03T16:09:30 1725379770

Busybox is quite well known project, but frankly from the way you write about it, it does not look like you know how it works so apologies if I'm explaining something that you already know.

Busybox is a reimplementation of the standard linux utils (ls, find, dir, etc..) for resource limited machines. To quote from the man page:

> BusyBox is a multi-call binary that combines many common Unix

> utilities into a single executable.

How it works is that it symlinks the binary to each of the commands it implements and then it executes the corresponding functionality based on the value of argv[0].

Hizonner · 2024-09-03T16:17:31 1725380251

I know exactly how it works, thanks.

The hangup here seems to be the definition of "program". I'm using it to mean something roughly like "executable", which I think is fairly close to what the original article meant it to mean. You seem to be using some concept of "program" that makes each of busybox's functions a separate program. As far as I'm concerned, on the other hand, busybox is one big program that does a lot of largely unrelated things, choosing which of them to do based on how it's invoked. There's no right answer. You could say that all of the software running on a whole computer is one giant program, and in fact sometimes I do find it convenient to think of it that way.

I don't know that your definition of "program" is wrong, but I do think it's alien to this context.

mariusor · 2024-09-03T16:19:56 1725380396

OK, apologies.

Then your previous statement makes no sense in context. At least to me.

Yes busybox knows it's busybox. But busybox doesn't do anything if it is not invoked in a certain way which relies on argv[0] being what it is today. I am not sure what you're arguing for frankly.

Hizonner · 2024-09-03T16:35:33 1725381333

I'm arguing against the idea that the way argv[0] works is somehow wrong, and/or perhaps should be changed to "more reliably" reflect the filename of the executable that actually got loaded, because some programmer might not understand what argv[0] actually does.

The article's lead argument for the "badness" of argv[0] seems to be, roughly paraphrased, that "the program should already know what it is [true], and this could confuse it [Huh? No I don't really know what that means either]". That's followed by a bunch of other stuff about other programs guessing what executable is running in a given process based on its argv[0], which is of course just deeply ignorant misuse of the value.

I mean, "the name" of the file that got loaded isn't even necessarily either well defined, or useful under any definition.

mariusor · 2024-09-03T17:03:29 1725383009

Then it looks like I'm terrible at reading comprehension today, I understood you were arguing the same thesis as OP. Apologies, again. :)

avidiax · 2024-09-03T13:51:53 1725371513

It is sometimes used to allow one binary to be the symlink target of hundreds of commands.

Android does this for most common shell commands. Toybox and busybox are examples of such implementations.

https://github.com/landley/toybox

https://en.m.wikipedia.org/wiki/BusyBox

cubist_castle · 2024-09-03T14:07:28 1725372448

I just learned that rustup/rustc/cargo etc. work like this too. I couldn't understand why the gentoo formula was symlinking the same binary to a bunch of aliases.

kbolino · 2024-09-03T14:15:36 1725372936

On my system, these are hardlinks (regular files with a link count >1 and the same inode) rather than symlinks, though I'm not sure why.

mostthingsweb · 2024-09-03T14:19:10 1725373150

Maybe to avoid broken links if you move the original files? That's the main benefit of hardlinks vs symlinks in my mind at least.

actionfromafar · 2024-09-03T14:50:23 1725375023

That can also be a downside, you believe you have moved stuff but now you can have different versions of programs that don't expect that to be a possibility.

ForOldHack · 2024-09-03T20:02:25 1725393745

If there is a simlink, a hardlink and an executable, all with the same name, which one will it run? Which one will the shell object to? Which one should the shell object to. If a virus/SUID program overwrites a simlink, no problem, but ift it traces the simlink to the executable, and then over writes that...

alerighi · 2024-09-03T18:33:51 1725388431

And that makes a lot of sense, especially for binaries that are statically linked (as usually are Rust binaries), since that could save a lot of disk space!

duped · 2024-09-03T14:41:16 1725374476

clang does this too.

mistercow · 2024-09-03T14:02:51 1725372171

Also if you want a program to call itself, which is sometimes useful, this way lets you actually call the same program, rather than assuming the name and path.

duped · 2024-09-03T14:20:05 1725373205

Don't do this - if you (reliably) want the path to the current executable there is no portable way to do it, but on Linux you need to readlink /proc/self/exe and on MacOS you call _NSGetExecutablePath. I forget the API on Windows.

theamk · 2024-09-03T14:58:49 1725375529

I would not say it in such absolute way - /proc/self/exe has downsides as well. As this resolves all symlinks, so this breaks all the things that depend on argv[0], like nice help messages, python's virtualenv, name-based dispatch, and seeing if the program which was executed via symlink or not.

A lot of times you know you never called chdir(), in which case I'd actually recommend executing argv[0], as this is nicest thing for admins. If you are really worried, you can use /proc/self/exe for progname and pass argv[0] as-is, but that's overkill a lot of times.

duped · 2024-09-03T15:29:24 1725377364

Those are all cases where you're using argv[0] as an argument to the program where it's appropriate. Using it as the path to spawn a child process is incorrect. You're free to re-use it as an argument.

I have fixed enough software that made this mistake that I'm confident to be absolute about it. It's a very easy mistake to make but it's really annoying when software makes it and someone needs to deal with it at a higher level. It's better for developers to know that argv[0] isn't the path to the executable it's what was used to invoke the executable.

vlovich123 · 2024-09-03T18:52:25 1725389545

What’s the issue with using argv[0] as a way to spawn yourself? I don’t recall running into a lot of issues.

duped · 2024-09-03T19:26:09 1725391569

If it's a relative path, then changing the working directory will break (chdir("/") is a very common tactic at the top of main()).

It's possible/desirable for the parent to change the PATH of a child process, particularly one that spawns other processes. So the argv[0] used to spawn the original process may be garbage for spawning children.

Similarly in any kind of chroot jail (which may or may not be docker these days), relative paths and PATH can be garbage even if they don't change.

The real problem is that I've seen in-house and open source frameworks/libraries that have a function like `get_executable_path` that reads `argv[0]` and this is just incorrect behavior. Spawning yourself is one of the less risky things you can do, but there are gotchas and a way to avoid them!

vlovich123 · 2024-09-03T19:49:24 1725392964

Hmm... I generally have so many issues with chdir (e.g. someone gives you a relative path to a file you need to read and now that's screwed up because you did a previous chdir) that I just avoid all use of it in the first place.

Generally don't run into chroot all that often these days & docker gives you a fully virtualized environment where if a relative path is garbage then you may have other problems too (e.g. given relative paths to files). You certainly have to be careful around chroot / docker anyway as I think resolving /proc/self/exe probably is dangerous too for all the same reasons and you need to be careful to use the literal "/proc/self/exe" string for the spawn command and also require that /proc is mounted and remember to pass through argv[0] unmolested (or mutating as needed depending on use-case).

There's enough corner cases that I'd hesitate given blanket advice as it requires knowing your actual execution environment to a degree that there's lots of valid choices that aren't outright "wrong". And some software may be portable where argv[0] is a fine choice that works 90% of the time without worrying about maintaining a better solution on Linux.

duped · 2024-09-03T21:59:39 1725400779

It's very common for daemons/servers to chdir("/") at the top of main. Relative paths sent by clients getting broken is a feature, not a bug. (In fact I just fixed a bug related to this an hour ago because a relative path was not being canonicalized before being passed to the daemon I'm working on and it caused a file to be written to the wrong place).

There's no way create a process such that /proc/self/exe is incorrect except if the process itself performs a chroot, or someone has overwritten what it points to. I'm talking about some other program running the process where those challenges don't show up.

> . And some software may be portable where argv[0] is a fine choice that works 90% of the time without worrying about maintaining a better solution on Linux

Except it's broken on MacOS and Windows, too!

I'm pretty confident saying that if you want to get the path to an executable, use the bespoke method for your platform because it ain't argv[0]. I have seen that codepath break so many times that there should just be a standard library method for it (and there often is, depending), and I have written this function at several companies.

There are not any edge cases that I'm aware of, except for a few esoteric ones. But there are quite a few edge cases for using argv[0], they exist on all platforms, and it's very annoying for people that have to fix or work around it because a software author didn't understand what argv[0] was.

im3w1l · 2024-09-04T00:11:39 1725408699

For the c-programmer adding a dependency is so difficult that he would rather use a roll his own 99% solution than use a library. It does protect him from supply chain attacks, I suppose.

vlovich123 · 2024-09-04T16:53:53 1725468833

> It's very common for daemons/servers to chdir("/") at the top of main. Relative paths sent by clients getting broken is a feature, not a bug.

I instead put that in the lauhch script / systemd policy. That way when I run the server locally for development weird shit doesn’t happen in my root.

theamk · 2024-09-04T03:24:47 1725420287

yesh, that's why my post literally said, "A lot of times you know you never called chdir()..."

Sure, don't put this in the library, but there is nothing wrong with using it in the app where you know no one makes this call.

mbrumlow · 2024-09-03T15:21:42 1725376902

I think you forget the exec system call’s first argument is a path to an executable, followed by an array of arguments, where arg[0] lives.

I can’t find issue with exec(“/proc/self/exe”, [ program , … ).

alerighi · 2024-09-03T18:32:34 1725388354

Well, it could be for example that /proc is not mounted. A lot of software breaks for this, while really there is no need for it to be so. Also that approach only works on Linux, if you want to write a portable software what you do?

mbrumlow · 2024-09-03T19:28:29 1725391709

I am mainly pointing out that arg[0] is still valid. Writing portable software is an entirely different topic.

sweetjuly · 2024-09-03T17:00:20 1725382820

Note though that both of these solutions are racy and so should not be done if "someone symlinking really fast and swapping the binaries" is in your threat model. Linux proc/self is safe though, just not the result from readlink.

duped · 2024-09-03T17:57:32 1725386252

Well that's true, but also something that can't be addressed within a currently running process afaik.

flohofwoe · 2024-09-03T14:24:21 1725373461

There's also this very handy and tiny cross-platform library:

https://github.com/gpakosz/whereami

ForOldHack · 2024-09-03T19:59:58 1725393598

Four cardinal sins of programming: 1. Self modifying code. ( The word 'recalcitrant' comes to mind. 2. calling your own program to execute itself. 3. Interrupting the flow of control with a jump. 4. Non-graceful exit. 5. Renaming 'hack' as 'vi' or 'ps'

SoftTalker · 2024-09-03T14:50:58 1725375058

There's no guarantee that the name and the path are still the same executable that is running, or that they even exist anymore.

wang_li · 2024-09-03T17:10:54 1725383454

In most of the variants of exec*() there are separate arguments for the thing to be executed and the *argv[] list. Argv[0] being the executable is just a convention. In perl $ARGV[0] is the first positional parameter. In

    $ perl myscript.pl a b c

$ARGV[0] is "a".

mistercow · 2024-09-03T15:22:54 1725376974

I mean sure. All software is built on assumptions. Make sure the assumptions you’re making are appropriate in context.

wongarsu · 2024-09-03T18:44:07 1725389047

Unless you are on Windows

glandium · 2024-09-03T22:19:35 1725401975

You can actually rename an executable that is running, on Windows. That's a way to handle self updates: rename the executable, create its replacement, execute the new one to make it remove the old executable.

akira2501 · 2024-09-03T14:12:50 1725372770

Beware TOC TOU problems when doing this.

fallingsquirrel · 2024-09-03T14:18:06 1725373086

You can do this without assuming the name by execing /proc/$PID/exe. Then you're not vulnerable to the argv[0] spoofing described in the article. (But of course since argv[0] does exist, you should set it properly and pass through your own argv[0] unchanged.)

dpassens · 2024-09-03T14:20:52 1725373252

That's not portable, though. OpenBSD, for example, doesn't have /proc.

hnlmorg · 2024-09-03T17:10:43 1725383443

That’s Linux only. Wouldn’t even work on macOS, which would likely be a significant number of your users.

hi-v-rocknroll · 2024-09-03T16:21:25 1725380485

coreutils-static did this too. The advantage of shared libraries and multiple-use single static binaries is they're only loaded once.

layer8 · 2024-09-03T15:35:59 1725377759

The article discusses this.

travisgriggs · 2024-09-03T14:38:13 1725374293

> “Should a program be allowed to behave differently based on its name?”

I don’t see why not. It’s allowed to behave differently based on the arguments that follow it. I personally think the genericity of including the program name itself as one of its own calling arguments is really meta cool.

SoftTalker · 2024-09-03T14:54:05 1725375245

One other historical reason for this (also the reason that older unix utilities tend to have such short names) is that people often interacted with unix machines over slow terminals or even paper teletypes. Typing "rm" instead of "remove" or "reboot" instead of "systemctl --reboot" was legitimately more convenient.

Arch-TK · 2024-09-03T15:30:25 1725377425

I mean, it's still more convenient to type `rm` rather than `Remove-Item` when doing day-to-day computer tasks on your computer (yes I'm one of those people who lives in a terminal).

It's also certainly better from a readability standpoint to have `Remove-Item` rather than `rm` in a script.

Likewise, I would much rather type `ls -Al` rather than `ls --almost-all --long-listing` (N.B. --long-listing is not the long option for -l, -l has no long option, I just made up an appropriate name) when listing a directory but would probably appreciate the long form in a script.

I think just like we have long options and short options, it would be helpful to have long commands and short commands.

im3w1l · 2024-09-04T00:19:01 1725409141

The best name from a readability standpoint is the shortest name you know by heart. So you might actually want to consider who your intended audience is.

account42 · 2024-09-05T12:03:25 1725537805

Exactly. Any for many commands you need to actually know what the command is doing to fully understand it and can't just go by intuition based on the name. So something more natural sounding can actually be more misleading because it tricks you into thinking the name describes the entirety of what the command does.

ForOldHack · 2024-09-03T20:06:02 1725393962

As someone who started on ASR-33s. I have empathy for Mr Ritchie and Mr Kernigan.

After all, its 50% faster to type a two letter acronym, than a TLA.

https://media.wired.com/photos/59327efdf682204f73696446/mast...

/e

Too · 2024-09-04T05:35:42 1725428142

If i download a new version of foo and rename my old version to foo_old_backup_2, should foo_old_backup_2 start behaving differently, just because it has a different name? NO THANKS!

A program should be sandboxed from its environment, including how the user started it. How a user names and organizes his files is a matter between the user and the operating system, not something individual program should care about.

account42 · 2024-09-05T12:05:57 1725537957

A lot of programs actually do need support files at specific locations (either full paths or relative to the executable) so you already don't get to abitrarily organize your program binaries any way you want (without adjusting the programs).

shermantanktop · 2024-09-03T14:43:49 1725374629

It’s the equivalent of the HTTP Host header, with similar utility. But I agree with the author that an OS provided trustable structure is a much better way.

marcosdumay · 2024-09-03T15:46:17 1725378377

> an OS provided trustable structure

Repeating the OP, your program takes every other parameter from the caller, why do you insist on the executable name to not be set by him too?

Windows defender is the one that is stupid by using it. Every OS has the real executable name in some place, security software should look there instead.

strawhatguy · 2024-09-03T20:53:10 1725396790

Yes, this is useful for backwards compat too, like bash with an 'sh' mode.

dheera · 2024-09-03T19:06:14 1725390374

There are also multiple reasons for a program to read its own executable.

- Decompressing and inflating a compressed binary block with a generic decompressor at the top (e.g. a bash or python script with a binary blob at the end)

- Checksumming its own executable (skipping the checksum string) to resist virus infection. Not bulletproof but viruses aren't usually smart enough to circumvent this

alkonaut · 2024-09-04T13:54:18 1725458058

Really the weirdness isn't that main is invoked with the program name as argv[0]. The weirdness comes not in main() but in execv. Shouldn't execv have just taken the user provided arguments, prepended the program name (as provided by the OS) and then invoke the main function of the program with that array?

The busybox argument or shutdown/reboot explains why the name of a symlinked binary is helpful as argv[0]. But does the busybox/shutdown case explain why the execv lets the user set the argv[0] value to anything other than what the path says?

MPSimmons · 2024-09-03T14:43:37 1725374617

If not, then busybox is going to need to change a TON

dataflow · 2024-09-03T14:47:25 1725374845

> I don’t see why not. It’s allowed to behave differently based on the arguments that follow it.

That's missing the point, I think.

The real question here is, is the name of a program really an argument to the program, from the user's perspective? I certainly don't blame users that disagree. It's more difficult for them to change argv[0], and the fact that this is possible is not necessarily obvious to them, nor to their users.

If it helps, think of it like this: imagine the file timestamp was similarly passed as argv[-1]. And that the file inode number was passed as argv[-2]. Would it make sense to change behavior on those too?

sokoloff · 2024-09-03T19:58:30 1725393510

> is the name of a program really an argument to the program, from the user's perspective?

When I use busybox [invisibly to me], I sure care that it knows whether I called it as "ls" or as "rm" and that it does the operation that I asked it to do.

theamk · 2024-09-03T14:43:03 1725374583

That's a weird take against argv[0] - all arguments are: "goes against modern design principles" and "can confuse programs which use argv[0] when they wanted "exec" instead"

For the former, I don't see how this goes against modern principles - in presence of symlinks, it is pretty reasonable to want to know both "how was this program called", as well as "what's the actual executable we ended up with". And this does more than just giving multiple names to same program - for example python uses argv[0] to tell if it's inside virtualenv and adjust search paths accordingly. This makes it appear like there are multiple python installs on system, with no extra disk space taken.

For the latter, yes, programs can have bugs and OSes can have non-obvious semantics, and if you are security software, it's very important to be aware about them. I would not mark "argv[0]" as something especially bad from security perspective. All the author's examples would still be possible in hypothetical world where argv[0] is set by system - as nothing stops user from creating a symlink in temporary dir with deceiving name (spaces and quotes are OK in filenames!) and exec'ing it directly. Instead, fix your security software so it quotes argv values?

cedilla · 2024-09-03T22:26:06 1725402366

> all arguments are: "goes against modern design principles"

And the key witness is systemd, which is too young to buy a beer - even in Germany.

wietze · 2024-09-03T15:36:21 1725377781

From a living-of-the-land perspective, having to symlink/hardlink/alias a command is much noisier - and thus easier to detect. So although you are right in saying it wouldn't completely solve the problem, making it a system responsibility would still significantly reduce the scope for abuse.

jrockway · 2024-09-03T16:59:59 1725382799

I think argv[0] is fine. It sounds like there is a lot of bad security scanning software that doesn't understand how the `exec` syscall works. That sounds like their problem and not a fundamental problem with argv[0].

Most people use argv[0] so they can do something like:

   $ mycommand help
   Type `mycommand foo bar` to foo bars.

   $ mycommand1.2.3 help
   Type `mycommand1.2.3 foo bar` to foo bars.

This is admittedly less fun when mycommand is /home/jrockway/.cache/bazel/_bazel_jrockway/7f95bd5e6dcc2e75a861133ddc7aee82/execroot/_main/bazel-out/k8-fastbuild/mycommand/mycommand_/mycommand` however.

yencabulator · 2024-09-08T17:03:24 1725815004

I routinely use basename(arg0) in my programs.

denysvitali · 2024-09-03T21:42:37 1725399757

I don't think argv[0] includes the full path (or at least some programming language strip the whole path and keep only the last part)

denysvitali · 2024-09-04T10:13:55 1725444835

I stand corrected - C, Go and Python are all consistent here and show the full path.

I seem to recall there was a language that only provided the stripped part - but I guess my memory is failing me here. Sorry for the wrong information above.

account42 · 2024-09-05T12:17:05 1725538625

It depends on the caller not on the language of the program being called. If you execute something via $PATH then most shells will only pass the command you typed and not the full path. Similarly, when you use a relative path like ./command then usually your argv0 will be that relative path and not the full path to the executable. So in practice argv may or may not be a full path and if it does not contain a slash then it generally isn't even a relative path (well, not relative to the current directory anyway).

For the help case in gp I think it makes sens for programs to always strip away anything up to including the last slash from argv0.

jrockway · 2024-09-04T17:39:05 1725471545

Languages might remove the dirname part, but argv[0] is not necessarily a path, it's just a string passed to the exec system call that is also passed to main(int argc, char *argv). While many languges don't call their main function that, somewhere in the runtime that's what it gets.

kelsey98765431 · 2024-09-03T13:52:59 1725371579

This is how busybox works in 'shim' mode. I am not however concerned with the security argument here, if you have the ability to run code you have the ability to do n to the power of x insidious things, and arg[0] abuse is just one of dozens, (hundreds?) of vectors or useful building blocks in an attack. if we are suddenly giving a shit about security on nixens, we should be looking at deeper SELinux rollouts (ease of use for sysadmins and maintainers so we never see permissive mode instead of just applying the difficult to remember command that will patch your policy settings. We need root capabilities to continue to be separated in the kernel access control scheme and probably we need to start using namespaces much more liberally like projects like silverblue/bluefin which reimplement entire os stack as a series of containers. Stronger container foundations and ease of use for existing security mechanisms will take us much further than worrying about ANYTHING else in the ABI which by the way will never change as long as linus is alive, and he will live on forever as an LLM most likely with the amount of mailing list posts he has made over the years.

js2 · 2024-09-03T15:29:59 1725377399

> Today however, disk space is no longer considered an issue; this is evidenced by macOS Sonoma, where shutdown and reboot are two separate executables.

Try running `ls -li /usr/bin` on macOS and you might be surprised to learn that all of these are a single executable: DeRez, GetFileInfo, Rez, SetFile, SplitForks, ar, as, asa, ... yacc. There's 77 different entries in `/usr/bin` (including `git` and `python3`) that are all links to the same binary (`com.apple.dt.xcode_select.tool-shim`). It's a wrapper that implements the `xcode-select` concept to locate and run the real executable provided by either the Command Line Tools package or a particular Xcode version you may have installed.

And that's not the only one. There's another 68 links starting with `binhex.pl` and ending with `zipdetails` that are a single 811 byte wrapper-script around perl.

Altogether, I see that there are 26 different names that are multiply linked:

  ls -li /usr/bin |
  awk '{print $1}' | 
  sort | uniq -c | 
  sort -n | grep -v "\s*1\s" | wc -l

Some of the other examples: less & more, bc & dc, atrm & batch, stat & readlink.

Having a program behave dynamically based on argv[0] is a useful tool in the Unix toolbox. The alternative would be compiling 77 different versions of `tool-shim,` creating 68 different versions of that perl wrapper, etc.

The `git` binary uses this concept too. You can create an executable named `git-foo`, put it anywhere in your PATH, and then call it as `git foo`.

In the end, argv[0] is just an argument that can be used to improve CLI ergonomics and reduce code duplication. It's not solely about disk space. I think that makes it a more common and useful concept than you give it credit for.

As to the rest of the post: I'm not really sure how argv[0] being in the caller's control is any different than the rest of the execution context being in the caller's control: the remaining arguments, the environment, limits on file descriptors, which file descriptors are open, the program's real and effective uid and gid, signals it might receive and so on. These all amount to untrusted input any executable has to be cognizant of, more or less so depending upon what privileges the executable has and what its goals are.

cryptonector · 2024-09-03T15:55:02 1725378902

Besides, disk space is not an issue, but container image size still can be an issue because those have to be copied around the network, and it's easy to have thousands of 10GB images consume more disk space than you might have thought you'd need.

tsujamin · 2024-09-03T18:07:54 1725386874

Surely the duplication would be (mostly) compressed away?

dcminter · 2024-09-03T14:10:46 1725372646

This lost me at "goes against modern design principles" without citing what principle(s) the author had in mind that would proscribe it.

st_goliath · 2024-09-03T14:30:22 1725373822

Given the tone and assumptions the article makes, and the things that are explicitly explained, this seems to be one of those articles where a novice learnt something new and then decided to write an article about it, despite not having fully grasped the concept yet.

As a result, the author has such strange, absolute positions, calling it a legacy that should be abolished (only tangentially knowing some actual use cases), or that strange quote about design principles.

Despite all the talk about security, the whole debacle that argc can be 0 (and argv[0] can be NULL), is completely left aside. This has caused actual security issues quite recently[1].

[1] https://lwn.net/Articles/882799/

gwbas1c · 2024-09-03T17:39:59 1725385199

The security issues the author points out later in the article do have merit.

Unfortunately, the author shot their credibility in the foot by perseverating on use of argv[0]; instead of glossing over it and getting to the point.

rpcope1 · 2024-09-04T14:59:24 1725461964

Almost any time someone uses the words "legacy" or "modern" in the context of computers, it's a giveaway to me almost always someone has an axe to grind with few or no real deep substantive reasons. I typically read these as:

"legacy" -> anything that has existed for more than a day that I don't understand and don't like that stops me from poorly reinventing the wheel

"modern" -> anything that I dreamt up or heard some other hipster talk about recently that I got hyped about

hiccuphippo · 2024-09-03T14:56:39 1725375399

I would guess the modern principle of disregard for disk space or memory usage :(

astrobe_ · 2024-09-03T17:10:34 1725383434

the Modern principle prescribe that one should never use software that's older than yourself. Some cults even prescribe that that one should not use a framework longer than you would wear a pair of socks.

lanstin · 2024-09-03T14:24:28 1725373468

This article seems to be an example of how some common security practices are kind of surface level. If you want to limit what a box can access on the network, do it in the network. Why is security looking for bad urls in the argv; if you know they are bad just block them? Or better yet if they aren't good, don't allow them. And if you want to know what a process is doing, ask the kernel to log its syscalls. If you take away argv 0 you will lose some valuable stuff (cute little busybox links, error logs that have argv[0] in them, and attackers will just name payload.exe ls.exe. And if your network is allow all, they will still reach CNC or collector end point.

dividuum · 2024-09-03T14:36:11 1725374171

Seriously: Their reason is basically "argv[0] is bad because security snake oil software is garbage":

1) Oh no, the only protection is looking at argv[0]. What kind of clown software is that? Software that notably runs on an already compromised system..

2) No need for argv[0] to fool software that concats argv values with spaces: just run 'curl -o "test.txt |grep" 1.1.1.1'

3) A long argument messes up telemetry? Let's hope that bucket doesn't have more holes.

shermantanktop · 2024-09-03T14:46:30 1725374790

These are all very realistic examples. Should they happen? No, but reality is messy and imperfect. The crappy software you describe would not exist if there were great solutions in this space.

Hizonner · 2024-09-03T15:43:06 1725378186

There are better solutions than that. Off the top of my head, on Linux, you could get what the article is asking for by doing a readlink on /proc/self/exe.

The crappy software exists because the people who write it don't have any idea what they're doing. And the reason for that is that the people who found companies in the security space have discovered that nobody can tell whether their products really work or not, so they can save money on talent and training.

marcosdumay · 2024-09-03T15:50:06 1725378606

Yes, they are realistic. No you shouldn't change your system to satisfy clown development dynamics.

And just as a warning, if you insist on doing so, the rules will get ever more complicated. Expect to not be able to achieve anything at all very soon.

shermantanktop · 2024-09-03T16:48:30 1725382110

If Crowdstrike is an example, then that's not true. Instead, success is not gated by rule quality, and you can get to global scale without a signal as to whether your rules are actually good or effective. And then someone publishes a new template and boom, Delta grounds their planes for days.

skobes · 2024-09-03T14:12:41 1725372761

"Windows’ own API calls for creating new processes (such as CreateProcess [6], ShellExecute [7]) do not allow you to set argv[0]: it sets it for you, based on how the path to the executable was provided."

Isn't this contradicted by the docs? CreateProcess receives lpApplicationName and lpCommandLine, and they can be different.

DSMan195276 · 2024-09-03T16:20:50 1725380450

Yeah they have this incorrect. if you provide `lpApplicationName` and `lpCommandLine` then the application name is not automatically added to the command line string, you have to add it yourself to the string provided as `lpCommandLine`. I checked and the docs for `CreateProcess` briefly mention this issue:

> If both lpApplicationName and lpCommandLine are non-NULL, the null-terminated string pointed to by lpApplicationName specifies the module to execute, and the null-terminated string pointed to by lpCommandLine specifies the command line. The new process can use GetCommandLine to retrieve the entire command line. Console processes written in C can use the argc and argv arguments to parse the command line. _Because argv[0] is the module name, C programmers generally repeat the module name as the first token in the command line._

magicalhippo · 2024-09-03T15:22:27 1725376947

Not the way I understand it. In the execv documentation[1], you pass the program name twice:

int execv(const char *path, char *const argv[]);

The argument path points to a pathname that identifies the new process image file.

The argument argv is an array of character pointers to null-terminated strings. [..] The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.

Windows does not allow you to do that, AFAIK.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/e...

skobes · 2024-09-03T16:33:40 1725381220

> Windows does not allow you to do that, AFAIK.

It does though, using the lpCommandLine parameter to CreateProcess as I said.

CreateProcess("main.exe", "foobar", ...)

argv[0] is "foobar"

magicalhippo · 2024-09-03T17:49:16 1725385756

I stand corrected. Been ages since I used Win32 API a lot, and I realized I can't recall using both of those arguments when calling CreateProcess.

halayli · 2024-09-03T18:26:36 1725387996

> This seems like a questionable design decision. Should a program be allowed to behave differently based on its name? From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

No it doesn't make software less predictable nor does it goes against modern design principles. argv has very handy use cases and can be used to provide better user experience.

Unless you have evidence to back up your claims, you're just turning a subjective opinion to an objective one without any merit.

Either way, it's software developer choice and irrelevant to the user as much as it is irrelevant to the user whether the developer prefers for(;;) over while(1).

JohnFen · 2024-09-03T14:02:41 1725372161

> Today however, disk space is no longer considered an issue

On desktop machines, perhaps, but this is certainly not true on all platforms Linux runs on.

Suppafly · 2024-09-03T14:53:11 1725375191

Plus the whole "space is not an issue" thing along with "you can just add more ram" is the reason everything is so bloated and slow even on well provisioned machines.

Sohcahtoa82 · 2024-09-03T16:30:40 1725381040

These days, Windows Calculator takes up more memory than mIRC.

Tell me why a simple calculator app needs more memory than a complete multi-server implementation of the IRC protocol (including SSL/TLS), not to mention a full scripting engine.

Suppafly · 2024-09-03T18:41:32 1725388892

>These days, Windows Calculator takes up more memory than mIRC.

I'd be somewhat surprised if that's actually true, but I haven't used mirc for years (started using hexchat once I was honest about the fact that I wasn't going to pay for mirc) but I think a lot of that is an inherent part of windows development now, basic c# projects with graphics end up being pretty big.

Interestingly enough, the new windows calculator is mit licensed and on github. But also it has a lot more features than most people think, it's not a 'simple calculator app', it's a full featured graphing calculator even if most people don't use those features.

Sohcahtoa82 · 2024-09-03T19:01:04 1725390064

> I'd be somewhat surprised if that's actually true

It 100% is.

I launched Calculator and according to the Processes tab in Task Manager, "Calculator" is using 31.2 MB of memory, and mIRC is taking 17.2 MB. That's with Calculator being freshly launched and no input given, compared to mIRC being connected to 1 server and in 8 channels.

If I go to the Details tab, then the story it tells is even worse. I include several metrics here:

Working Set: - Calculator: 91 MB - mIRC: 40 MB

Memory (private working set): - Calculator: 30 MB - mIRC: 18 MB

Memory (shared working set): - Calculator: 61 MB - mIRC: 23 MB

Commit size: - Calculator: 67 MB - mIRC: 49 MB

By basically every metric, mIRC uses less memory than Calculator.

> it's not a 'simple calculator app', it's a full featured graphing calculator even if most people don't use those features.

The only feature that should significantly impact the memory usage is the graphing. All its little measurement conversion options shouldn't take more than a few kilobytes. But even with the graphing, it's absurd that it takes more memory than the total memory I would have had in a 486 machine that could easily have run an app with the same features.

> but I think a lot of that is an inherent part of windows development now, basic c# projects with graphics end up being pretty big.

I suppose the price you pay for almost guaranteed memory safety and ease of development through abstractions means your base executable memory footprint includes an entire language runtime.

neonsunset · 2024-09-03T20:19:27 1725394767

> I suppose the price you pay for almost guaranteed memory safety and ease of development through abstractions means your base executable memory footprint includes an entire language runtime.

Surprisingly enough memory safety does not come with extra memory usage. Even GC itself is not incompatible with low RAM footprints even with steady allocation rate as can be seen with both .NET GC configurations and Golang.

One source of base RAM footprint is the presence of JIT compiler - JITing, profiling, then JITing again all those methods is not free, and requires more memory pages, page remapping and additional logic in memory.

It's not impactful and is desirable due to JIT advantages on long-running and complex (often server) applications but is noticeable on something like a Calculator app.

Another source of memory usage is the choices taken by a particular GUI framework - this is by far the biggest contributor of memory use. Quite a lot of them take very heavy-handed, WPF-inspired approach. There is nothing inherently wrong with it, but as was demonstrated with very fast and lightweight immediate mode GUI frameworks like Egui, there is a different way to this. I'm looking forward to putting https://www.pangui.io through its paces whenever it releases, or anything that is even remotely like this and is pure C#.

The Calculator itself seems to use quite a bit of C++ code, I have not profiled it, but I can promise you if you write two implementations of an identical ImGUI-based application with idiomatic constructs of C and C#, the memory footprint will be much closer than you think (for an AOT compiled C# version), maybe within ~5-15MiB delta of RAM once you start adding logic and allocating but unlikely much more. There is no inherent limitation within the language and GC themselves that would prevent that from being possible.

citrin_ru · 2024-09-03T16:42:44 1725381764

SSD relatively recently were not so big (compare to HDD with comparable price) and space is an issue on not so new desktops. I don't want to upgrade a notebook only because someone thinks that disk space is not an issue.

But of course this much more of an issue for embedded platforms like routers with OpenWRT.

Hizonner · 2024-09-03T15:39:55 1725377995

No, not on desktop machines either. Executables these days can be enormous.

The author's just dumb.

blenderob · 2024-09-03T14:59:50 1725375590

> argv[0] is a relic of the past

Busybox says hello.

Seriously though, how is this on the front page? Both the premise and conclusions contradict the reality of how argv[0] is used with symbolic links and hard links.

josefx · 2024-09-03T13:56:53 1725371813

Microsoft defender using broken by design detection rules? One could almost think it is an anti virus program.

dotancohen · 2024-09-03T14:09:39 1725372579

I also use argv[0] for the -h help text, to show examples how to use the command.

st_goliath · 2024-09-03T14:15:09 1725372909

There is also a neat little BSD extension, also supported on a number of other Unix-like systems and GNU userspace (i.e. glibc, but also other libcs like Musl):

    extern char *__progname;

which holds the program name without the (optional) invocation path in front of it. Basically the last path component of argv[0].

dotancohen · 2024-09-03T14:32:27 1725373947

Nice, thank you.

Brian_K_White · 2024-09-03T19:23:17 1725391397

wattttt? nice thank you

anonymousiam · 2024-09-03T14:12:00 1725372720

I've done this too, but you should remove the path elements from the argv[0] string before you include it in your error/help messages.

Brian_K_White · 2024-09-03T19:35:22 1725392122

Sometimes you want it, sometimes you don't, so it needs to be in there, and sometimes you remove it yourself if your context of the moment doesn't want it.

And neither the want-it nor the don't-want-it case is such an outlier that you can disregard and not serve that case.

Sometimes you're talking to the user about general usage and the full path is a distracting detail and not the important part of the message.

Sometimes the full path and truthful invoked filename are an unnecessary security disclosure like telling a web viewer details about the server.

Sometimes the full path and truthful invoked filename is a necessary fact in debugging, or in errors, or even ordinary non-error logs that aren't public.

jmholla · 2024-09-03T14:49:53 1725374993

You don't need to. Keeping them shows the user exactly how to call the program based on how they called it.

patrickmay · 2024-09-03T16:11:10 1725379870

Same here, as well as for showing an example invocation when the user fails to include a required argument.

kelnos · 2024-09-04T02:06:37 1725415597

> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

Says who? I'm not aware of any modern design principles that say anything about this sort of thing.

> argv[0] is ignored (mostly)

Pretty much any program I've written that has a --help option uses argv[0] to print out the usage string, i.e.:

    printf("%s [--some-arg] FILENAME\n", argv[0]);

> First off, argv[0] can be used to fool security software

Then that security software is poorly written. On Linux, the correct way to find the binary of a running process is by calling readlink(2) on /proc/$PID/exe. Assuming security software like this is going to have a lot of OS-specific code, it seems fine to me to expect they use it (and then have to do other things on other OSes).

> Another argument against this design is that if you have two programs that are so similar that it pays off to consolidate them into a single file, is there really a need for two separate programs/program names?

The author is talking about shutdown and restart being symlinks to systemctl on systemd-based systems. But what about something like busybox? busybox contains hundreds of programs, all conveniently in a single, statically-linked binary. On my system it's about 800kB. While I agree that even 250MB is not a big deal for most systems these days, it certainly is a problem for, say, a WiFi router that only has 8MB of flash.

> Ultimately, nobody wants to be bothered by argv[0].

False. I find it useful, and am not "bothered" by it at all. And I suspect security folks aren't really bothered either: the ones that actually know what they're doing look at /proc/$PID/exe when they want to find the binary backing a PID.

This article is kinda lame, and it seems like the author's objections are mostly based on ignorance.

remram · 2024-09-03T20:04:18 1725393858

> From a 2020s standpoint, this seems highly undesirable, as it makes software less predictable and goes against modern design principles.

This is not an argument at all, this is a statement that arguments exist. What are they?

It's like saying we shouldn't do something because it's "against best practices". I'm asking why are other practices preferred...

andrewmcwatters · 2024-09-03T14:07:27 1725372447

I wish amateurs would stop propagating the false idea that disk space and memory are cheap and not a problem.

layer8 · 2024-09-03T15:35:32 1725377732

If nothing else, argv[0] is useful for producing error messages that indicate the name of the executable that is outputting the message.

It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its Posix subsystem here).

alerighi · 2024-09-03T19:05:45 1725390345

> It's probably a good idea to not have it settable to other values by the invoking process, as is generally the case on Windows (ignoring its Posix subsystem here).

Well there is an use case that I sometime use for setting argv[0]. Consider you want to run yourself as a subprocess. Why you want to do that? There are plenty of reasons, but in general the thing is that doing things after a fork() is not safe under some circumstances and thus sometimes you also want to exec yourself.

A technique is to then call yourself using another name in argv[0] for then in the main take a different flow from the normal command line parting, without adding an argument that the user can specify if it know that it exists.

Yes, I know that there are a ton of other methods to do the same thing (perhaps an environment variable, for example), but I find the method of argv[0] quite nice and simple to be fair.