
Why would the author think that the PATH environment variable is being used by the kernel? What an odd assumption.


Ignorance leading to assumptions. Their eureka moment: "The shell, not the Linux kernel, is responsible for searching for executables in PATH!" makes it obvious they haven't read up on operating systems. Shame because you should know how the machine works to understand what is happening in your computer. I always recommend reading Operating Systems: Three Easy Pieces. https://pages.cs.wisc.edu/~remzi/OSTEP/


The thing is, though, that PATH being a userspace concept is a contingent detail, an accident of history, not something inherent to the concept of an operating system. You can imagine a kernel that does path searches. Why not?

There's a difference between something being a certain way because it has to be that way in order to implement the semantics of the system (e.g. interrupt handlers being a privilege transition) and something being a certain way as a result of an arbitrary implementation choice.

OSes differ on these implementation choices all the time. For example,

* in Linux, the kernel is responsible for accepting a list of execve(2) argument-words and passing them to the exec-ed process with word boundaries intact. On Windows, the kernel passes a single string instead and programs chop that string up into argument words in userspace, in libc

* in Linux, the kernel provides a 32-bit system call API for 32-bit programs running on 64-bit kernels; on Windows, the kernel provides only a 64-bit system call API and it's a userspace program that does long-mode switching and system call argument translation

* on Windows, window handles (HWNDs, via user32.dll) and IPC message passing (ALPC, in ntoskrnl) are implemented in the kernel, whereas the corresponding concepts on most Linux systems are pure user-space constructs

And that's not even getting into weirder OSes! Someone familiar with operating systems in general can nevertheless be surprised at how a particular OS chooses to implement this or that feature.


> The thing is, though, that PATH being a userspace concept is a contingent detail, an accident of history, not something inherent to the concept of an operating system. You can imagine a kernel that does path searches. Why not?

Right. You can't be sure that someone didn't stick $PATH expansion into glibc, or something. Because someone did.

QNX gets program loading entirely out of the kernel. When QNX is booted, initial programs and .so files in the boot image are loaded into memory. That's how things get started. Disk drivers, etc. come in that way, assuming the system has a disk.

Calling "exec.." or ".. spawn" merely links to a .so file that knows how to open and read an executable image. Program loading is done entirely by userspace code. Tiny microkernel. The "exec.." functions do not use the PATH variable.[1]

However, "posix_spawn" does read the PATH environment variable, in both QNX [2] and Linux.[3] Linux, for historical reasons, tends not to use "spawn" as much, but those are the defined semantics for it. QNX normally uses "spawn", because it lacks the legacy that encouraged fork/exec type process startup. "posix_spawn" is apparently faster in modern Linux, especially when the parent process is large, but there's a lot of fork/exec legacy code out there.

"posix_spawn" comes from FreeBSD in 2009, but I think the QNX implementation precedes that, because QNX's architecture favors "spawn" over "exec.." It may go back to UCLA Locus.

Windows has different program startup semantics. Someone from Windows land can address that. macOS has a built-in search path if you don't have a PATH variable.[5]

[1] https://www.qnx.com/developers/docs/8.0/com.qnx.doc.neutrino...

[2] https://www.qnx.com/developers/docs/8.0/com.qnx.doc.neutrino...

[3] https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html

[4] https://www.whexy.com/posts/fork

[5] https://developer.apple.com/library/archive/documentation/Sy...


* in Linux, the kernel is responsible for accepting a list of execve(2) argument-words

Yes it does, but the more surprising thing is (coming from AmigaOS with its dos.library function ReadArgs()) that the shell does this. The shell is also responsible for argument expansion - madness!

On AmigaOS, when you type "delete foo#? force", the shell passes the entire command line to the delete command. The delete command calls ReadArgs() with a template (FILE/M/A, ALL/S, QUIET/S, FORCE/S), and the standard OS function parses it into lists of files, flags, keyword arguments, etc. The "file" passed is "foo#?", and the command uses MatchFirst()/MatchNext() to do file pattern matching.

Every command (that uses ReadArgs() and didn't plump for "standard C" parsing) has the same behaviour: running the command with "?" gives you the template, which tells you how to use it. Args are parsed consistently across all programs.

Then you get "standard C", which, because of K&R and main(), ignores this standard Amiga parsing function and just does naive splits. Across multiple Amiga C compilers, quoting rules are inconsistent. An Amiga C compiler has to produce an executable, and it knows that executable will be called with a full command line, so the executable itself has to break that into words before it can call main(), and it's up to each compiler writer how to do that. Urgh.

In unix-land, it's up to the shell to parse the command line, and pass only the words... hence why the shell naturally does all the filename globbing, and why you have gotchas like when these two commands are sometimes the same and sometimes they're not:

    find . -name foo*
    find . -name 'foo*'
Then we have Windows, which is like Amiga C programs: each program is passed a full command string and has its C runtime parse it for main() to consume. There's a vague expectation that it'll do quoting "like COMMAND", which itself has very odd quoting rules. At least most people use the same C compiler on Windows, so it's mostly MSVCRT's implementation and therefore mostly consistent.
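
To make the find gotcha above concrete, here is a minimal sketch in plain sh. The helper show_args is hypothetical (not a standard tool); it just prints the argument words a program would receive. It assumes the default sh/bash behaviour of passing an unmatched pattern through literally; zsh, or bash with nullglob/failglob set, behaves differently.

```shell
# Hypothetical helper: print each argument word exactly as the callee receives it.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Nothing matches yet, so the shell hands the pattern over unchanged.
unquoted_before=$(show_args foo*)     # [foo*]

touch foo1 foo2

# Now the shell expands the unquoted pattern before the program starts.
unquoted_after=$(show_args foo*)      # [foo1] and [foo2], one per line
# Quoting suppresses expansion, so the program sees the pattern itself.
quoted_after=$(show_args 'foo*')      # [foo*]

printf 'before: %s\nquoted: %s\nafter:\n%s\n' \
    "$unquoted_before" "$quoted_after" "$unquoted_after"

# Clean up the scratch directory.
cd / && rm -rf -- "$demo_dir"
```

So the two find commands are the same only while nothing in the current directory matches foo*; once something does, the unquoted form hands find the matching filenames instead of the pattern.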


Username checks out.

I think one of the most surprising things I learned about bash is that you can do this:

    touch ./-rf
    rm *
And now you have rm -rf'd. :)
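
A harmless way to see why this happens, as a sketch: replace rm with a hypothetical show_args function that only prints the words it was given. The scratch directory and file names are made up for illustration.

```shell
# Harmless stand-in for rm: print each argument word the program would receive.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ./-rf ./keep.txt

# The shell expands the glob; the program cannot tell that -rf came from a filename.
bare=$(show_args *)
# With a ./ prefix every word starts with "./", so none can be read as an option.
prefixed=$(show_args ./*)

printf 'bare:\n%s\nprefixed:\n%s\n' "$bare" "$prefixed"

# Clean up the scratch directory.
cd / && rm -rf -- "$demo_dir"
```

The bare expansion contains the word -rf, which rm would parse as options; the prefixed expansion contains ./-rf, which can only be read as a path.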


We should use "--" more, but who has all this time to waste? :)


Indeed, always prefer ./* to *

I often wish there was a convenient way of doing such an operation in the shell: if a path starts with "/", leave it, otherwise prepend "./"


> I often wish there was a convenient way of doing such an operation in the shell: if a path starts with "/", leave it, otherwise prepend "./"

Both bash and zsh have enough functionality exposed via shell functions and variables for you to define a keybinding that does exactly this, interactively. Good idea.

Did you mean an interactive command? Or something else?


I meant non-interactive, for use in scripts which take user input. We already have "--" for end of options, but the support for it is not universal and even with that some programs will interpret certain strings in a special way. On the other hand, prepending the dot-slash should work for any program or argument passing style.
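
For scripts, a small function is probably enough. This is only a sketch; safe_path is a made-up name, not an existing utility.

```shell
# Hypothetical helper: return a path that cannot be mistaken for an option.
safe_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;     # absolute path: leave it untouched
        *)  printf './%s\n' "$1" ;;   # relative path: prepend ./
    esac
}

# Example: a user-supplied name that looks like an option.
user_input='-rf'
target=$(safe_path "$user_input")
printf '%s\n' "$target"               # ./-rf
```

Caveats: command substitution strips trailing newlines, so a filename that ends in a newline would not round-trip, and a path that already starts with "./" becomes "././...", which is redundant but still refers to the same file.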


Prepend for all paths on a command line? Or just for the executable?

For all paths it could be dangerous and should very probably not be done. But for executables it's less dangerous and can easily be done by putting '.' into $PATH.


Well, execve(2) and execvp(3) are both "system" functions. C (which is already black magic for some people) invokes both by calling into functions exported from libc. If you're not super dorky^Wfamiliar with low-level systems stuff, you might guess that the two functions are implemented in the same place and in the same way. That the latter is just a libc wrapper around the former that does a PATH search is arcane detail you don't have to care about 99% of the time.
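
To show how little magic is involved, here is a rough sketch of the search step in plain sh. find_in_path is a hypothetical helper, and it only approximates what execvp(3) does: the real function also has rules for an unset PATH and for error cases such as EACCES and ENOEXEC, which this ignores.

```shell
# Hypothetical approximation of the userspace PATH search.
# Usage: find_in_path NAME [SEARCH_PATH]; prints the first executable match.
# The body runs in a subshell (note the parentheses) so IFS and set -f do not leak.
find_in_path() (
    name=$1
    search_path=${2-$PATH}
    case $name in
        */*)
            # A name containing a slash is used as-is, with no search.
            [ -f "$name" ] && [ -x "$name" ] && printf '%s\n' "$name"
            exit ;;
    esac
    IFS=:      # split the search path on colons
    set -f     # do not glob-expand directory names
    for dir in $search_path; do
        [ -n "$dir" ] || dir=.   # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    exit 1
)

# Example: where would "sh" be found?
find_in_path sh || echo "sh not found in PATH"
```

Once a full path has been found this way, it is that path which gets handed to execve(2); the kernel only ever sees the resolved path.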

It's hard to appreciate how the world looks before you learn a fact. You can't unsee things.


But the man page section tells you which one is a kernel syscall (2) and which is a C library function (3)...


Which is the universally known convention everyone is born with inherent knowledge of. Also, people read man-pages.


What person diving into their shell's source code on Linux doesn't read manpages? Or even man's manpage at least once?


Daniel Huang, the one that wrote TFA? People are different, I don't know what else to tell you. But generally, people don't read man pages or docs.


One thing I was surprised to learn a couple years ago is that users and groups aren't really tracked much by the Linux kernel: they're just numeric IDs that track process and file ownership. So if you setuid() to a user ID that doesn't exist in /etc/passwd or anywhere else, the kernel won't stop you.


If I have a file on machineA with uid 10001 and I copy the file to machineB, I might want it to retain that uid, but it shouldn't matter to machineB that it doesn't map to a real user.


Hopefully that user actually doesn’t exist on the second machine!


You’ll see this observation all the time building containers.


Not if you only run them as the root user.


or with ipa-esque authentication schemes and shared mounts


And NFS!


Unnecessarily rude. There was also a time when you didn’t know this. I can guarantee it!


The question is why the author wrote such a clickbait title and drew such an odd conclusion. Legitimate question IMO, nothing rude there.

It's not about knowledge, but about assumptions. The title and conclusion hint that there are some obvious assumptions, but these are not detailed. Maybe the author assumed that because of the ubiquitous use of PATH across shells, it had to be managed centrally.


I don't think it's an odd assumption at all! The lines between shell, exec calls, globbing, etc, are very blurry if you don't already know how it all fits together.


Why not? Every executable is started with the execve(2) syscall, which takes an array of environment variables that the kernel uses to replace the environment the process inherited from its parent, so obviously the kernel has full knowledge of the environment variables of all of the processes in the system.

Now, there is a reason why the kernel actually does not have such knowledge, but it's not at all unreasonable to assume that the kernel has it.


The thing that really blows minds is the fact Linux does not do name resolution at all. Getting rid of glibc breaks a lot of stuff because everyone depends on glibc to do it.

https://wiki.archlinux.org/title/Domain_name_resolution

https://en.wikipedia.org/wiki/Name_Service_Switch

https://man.archlinux.org/man/getaddrinfo.3


You and I and a bunch of other people know it and take it to be self-evident, but someone discovered it (maybe recently, maybe they have known it for a while) and did a nice write-up for people who had not known that yet. https://xkcd.com/1053/


The lucky 10,000 is a positive take on the situation. But the article using "real," which I think would connote "legitimate" to most, seems a little more polarizing than sharing a discovery.


Click bait title for sure


That's not a truth that'd come from first principles, never mind a trivial truth; it's extremely trivial to imagine a kernel that does parse PATH where it wouldn't be true.

As such, it's a thing one has to explicitly look up to know, which the author did.

