
How does `cd` work? - SteBu
https://blog.safia.rocks/post/171311670379/how-does-cd-work
======
jsnell
As a historical curiosity, in the very first versions of Unix chdir was a
normal command rather than a shell builtin.

You see, at that point Unix had no fork system call. There were multiple
processes, but they were created statically at startup rather than on-demand.
Running a command in the shell would cause the command to replace the shell in
the address space of the process, and the process quitting would put the shell
back in there.

This worked perfectly with cd being a normal command. Then they implemented
fork() and were for a while very confused trying to debug how in the world
fork() could have broken the chdir() system call :)

~~~
clord
This is a really common way for software to become better.

The pattern of development:

    
    
      - Make A   
      - Make B better   
      - A now broken because it had dependencies on B   
      - (maybe) Realize a fundamental pattern that unifies A and B.
      - (maybe) Unify principles so that A and B become lemmas.
    

This is a very scientific way of working. The truth emerges as the byproduct
of keeping all the plates spinning in your model. As long as you remain aware
of all the plates, you'll eventually realize something deep. It's also the
strongest argument I can think of in favour of rewriting a codebase plan-9
style. Once you have the truth in mind, bake your knowledge for big wins.
Baked knowledge can form the foundation upon which you can climb higher.

But many developers don't keep all the plates spinning. If you don't have
everything (or some subset) making sense in some unified theory, don't
rewrite. Without a unified grand theory, you can't make something better.
This is why I write unit tests — doing so keeps your plates spinning. It's not
about proving your language and libraries are doing what is expected (although
that's valuable too in some languages without unified libraries) — it's about
showing that everything is working together the way your grand theory intends.
Unit tests should be written to reject some part of the null hypothesis.

~~~
emmanuel_1234
This is such a cool way of presenting it, I'll reuse that. Thanks!

------
feelin_googley
Interesting how the author was looking at a "cd" obviously sourced from the FreeBSD
project (perhaps found on a Mac computer running the XNU kernel) and then, to
investigate further, she consulted the source code for _the Linux kernel_ to
learn about "the" chdir syscall.

Would the title "How does 'cd' work _in MacOS_?" be better for readers wishing
to learn, or worse? I wonder.

The blog posts from her friend at jvns.ca have been similarly Linux-centric
but never use titles that inform the reader as such, i.e., "How does X work?"
versus "How does X work in Linux?"

It makes me ask: how important is it for students to be aware of the Holy
Grail of UNIX, portability?

How long should one remain blissfully ignorant of incompatibilities between
UNIX-like OSes that work against those "still seeking the Holy Grail"?

As a user, the UNIX programmers I admire the most are the ones who understand
the value of portability, set it as a goal and through broad knowledge and
utmost care can get very close to achieving it, despite the mind-numbing work
or the tradeoffs this might entail.

~~~
arnoooooo
I find it sad that people that are so curious about how things work end up
buying Macs, and therefore unwittingly contributing to less openness and
"studyability".

~~~
stuaxo
It doesn't stop them using Linux - it does make it more of a pain though.

------
_zachs
Not one character in the article is actually devoted to how `cd` works...it's
just the story of how the author found out where to find the code for it.

An article much more worthy of the title "How does `cd` work?" would maybe,
you know, actually go through the code that makes `cd` work.

~~~
Someone1234
I don't know if the "Update" by Julia Evans was in the article when you posted
this. But I'd point to that as a piece of useful insight about how it works at
the syscall level.

~~~
abhishekjha
>SO!! If you had a /usr/bin/cd program that ran chdir, that would be fine, but
when you started it it would change its own working directory and exit which
is not very helpful. It wouldn’t change the working directory of you (the
parent process)

I am a little confused. What is the parent process and why does that matter
here?

~~~
Someone1234
The parent is the shell that called the theoretical /usr/bin/cd. It means
that a cd program would get launched by the shell, change its own working
directory and then terminate, which is completely useless.

What you want cd to do is change the caller's (or parent's) working directory,
or in this case the shell's working directory.

There are hacks to do it that you can find elsewhere in this thread (process
injection and debug APIs to name two) but frankly the shell changing its own
working directory via a builtin is a much simpler and more reliable solution
that is also cross-platform.
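
For a quick hands-on check of the parent/child point, here is a minimal sketch for a POSIX `sh`: it runs `cd` in two kinds of child process (a separate `sh -c` and a subshell) and compares the parent's directory before and after. The helper name `try_child_cd` is made up for the demo.

```sh
#!/bin/sh
# Sketch: a cd performed in a child process cannot move the parent shell.
# try_child_cd is a hypothetical helper name used only for this demo.

try_child_cd() {
    target=$1
    before=$(pwd)

    # A separate program: `sh -c` changes its *own* cwd, prints it, and exits.
    child=$(sh -c 'cd "$1" && pwd' sh "$target")

    # A subshell is also a forked child; its cd is lost when it exits.
    ( cd "$target" )

    after=$(pwd)
    printf 'child saw:  %s\n' "$child"
    printf 'parent was: %s\n' "$before"
    printf 'parent now: %s\n' "$after"
}

try_child_cd /
```

The first line shows the child's new directory, while the last two lines print the same path: the parent never moved.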

~~~
abhishekjha
Ah. This makes sense. Thanks. I will look into the hacks.

------
jwilk
Instead of which(1), you could use type, which is aware of builtins and
aliases:

    
    
      $ type cd
      cd is a shell builtin
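
In scripts, where POSIX leaves the output format of `type` unspecified, `command -v` is the portable way to ask the same question: for a builtin, function, or keyword it prints just the name, and for a program found via $PATH it prints an absolute path. A minimal sketch, with `classify` as an illustrative name; an alias would also land in the second branch, which is fine for a sketch.

```sh
#!/bin/sh
# Sketch: tell builtins from external programs using POSIX `command -v`.
# classify is a hypothetical helper name.
classify() {
    found=$(command -v "$1") || { echo "$1: not found"; return 1; }
    case $found in
        /*) echo "$1: external program at $found" ;;
        *)  echo "$1: builtin, function, or keyword" ;;
    esac
}

classify cd    # cd: builtin, function, or keyword
classify sh    # e.g. sh: external program at /bin/sh (path varies by system)
```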

~~~
chaosfox
if you use zsh on the other hand..

    
    
      % which which
      which: shell built-in command

------
kazinator
> The cd builtin is invoked as part of the Bash shell.

> The Bash shell invokes the chdir function.

Nope! Not quite.

Bash:

    
    
      ~$ ls -ld foo
      lrwxrwxrwx 1 kaz kaz 4 Mar  1 6:50 foo -> /etc
      ~$ cd foo
      ~/foo$ pwd
      /home/kaz/foo
      ~/foo$ cd ..
      ~$ pwd
      /home/kaz
    

Raw chdir and getcwd syscalls:

    
    
      $ pwd
      /home/kaz
      $ txr
      This is the TXR Lisp interactive listener of TXR 190.
      Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
      1> (chdir "foo")
      t
      2> (pwd)
      "/etc"
      3> (chdir "..")
      t
      4> (pwd)
      "/"
    

See the difference? While of course the shell will invoke _chdir_, it has its
own idea of a current working directory, and translates the argument of _cd_
to something else. For instance _cd .._ doesn't translate to _(chdir "..")_.
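
The same contrast can be reproduced with nothing but a POSIX shell, by comparing logical mode (`-L`, the POSIX default for both `cd` and `pwd`) with physical mode (`-P`). A minimal sketch, assuming `mktemp -d` is available and the filesystem supports symlinks; `demo_logical_vs_physical` and the `BASE` placeholder in its output are just for illustration, and the `link` symlink plays the role of kazinator's `foo -> /etc`.

```sh
#!/bin/sh
# Sketch: the shell's logical cd/pwd versus the kernel's physical view.
# Assumes a POSIX shell, mktemp -d, and a filesystem with symlink support.

demo_logical_vs_physical() {
    # Resolve the temp dir physically so a symlinked /tmp doesn't muddy the output.
    base=$(cd "$(mktemp -d)" && pwd -P)
    mkdir -p "$base/real/sub"
    ln -s "$base/real/sub" "$base/link"   # like foo -> /etc in the example above

    # Print a path with the temp prefix replaced by BASE.
    show() { printf '%s BASE%s\n' "$1" "${2#"$base"}"; }

    (
        cd -L "$base/link"
        show "pwd -L:" "$(pwd -L)"    # the path the shell remembers
        show "pwd -P:" "$(pwd -P)"    # the physical path, as getcwd() sees it
        cd -L ..                      # what a plain `cd ..` does by default
        show "cd -L ..:" "$(pwd -P)"
    )
    (
        cd -L "$base/link"
        cd -P ..                      # follow the real '..' entry, like (chdir "..")
        show "cd -P ..:" "$(pwd -P)"
    )
    rm -rf "$base"
}

demo_logical_vs_physical
```

On bash and dash this should print `pwd -L: BASE/link`, `pwd -P: BASE/real/sub`, `cd -L ..: BASE`, and `cd -P ..: BASE/real`. The logical `cd ..` just drops the last component of the remembered path, while the physical one follows the directory's real `..` entry.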

~~~
wodenokoto
She says that it is all a ton of functions that sanitize input and implement
logic before calling chdir.

So the article does point out that `cd` isn't a direct alias for chdir.

------
postit
Nice write-up, but the addendum caught my eye.

-- Can we nominate Julia Evans for the most positive influence in tech prize
or something?

------
saagarjha
Unrelated: it seems like this site has some sort of "debug" script set up that
downloads half a dozen copies of some test image ("r20-100KB.png"). Safia, if
you're on here, you might want to get it checked out since it's more than
doubling the amount of time it takes your site to load.

------
jwilk
Here's cd implemented as a stand-alone program:

[https://github.com/robertswiecki/extcd](https://github.com/robertswiecki/extcd)

~~~
ScottBurson
This looks fiendishly clever, if it really works, but I don't have time to dig
through the 'ptrace' man page to figure out what it's doing. Can someone
summarize?

~~~
saagarjha
Let me give it a shot: ptrace(2) allows processes to control other processes
for debugging (for example, GDB and LLDB use it). What it's doing is gaining
privileges to debug your shell process, using this privilege to gain control
over its memory, and then just copying over the directory string to the right
spot so that the shell thinks it has a new working directory.

~~~
asveikau
Actually looks like it copies the path into the address space of the parent,
backs up some register state, tickles the i386 syscall interface by setting
registers to call chdir(2), then restores the registers to their original
state, resuming the program already in progress.

------
banana_giraffe
I long ago wrote a cd replacement for Windows. It does a few clever things I
like, but I always felt slightly unclean for how it works. Since it's external
to cmd, it has to find the cmd parent-process that launched it, write a thread
into the process space, then launch that thread that does the call to
SetCurrentDirectory and updates the necessary environment variables.

It works, but I'm not entirely sure it should work.

------
forkandwait
I used the first edition, but I think all you need is this:

[https://www.amazon.com/Advanced-Programming-UNIX-Environment...](https://www.amazon.com/Advanced-Programming-UNIX-Environment-3rd/dp/0321637739)

~~~
Firerouge
Or try this one if you'd like to build your own from scratch:

[https://www.amazon.com/Design-Implementation-MTX-Operating-S...](https://www.amazon.com/Design-Implementation-MTX-Operating-System/dp/3319175742)

------
cat199
> I decided to dive into the code for the Bourne shell to see what I might be
> able to figure out about these builtins. I came across the definition of the
> cd builtin here. (link to:
> [http://git.savannah.gnu.org/cgit/bash.git/tree/builtins/cd.d...](http://git.savannah.gnu.org/cgit/bash.git/tree/builtins/cd.def))

no, you came across the definition of bash's 'cd' builtin there..

On FreeBSD with a source checkout, the definition of /bin/sh's (the 'ash'
shell,
[https://en.wikipedia.org/wiki/Almquist_shell](https://en.wikipedia.org/wiki/Almquist_shell))
cd builtin is in:

/usr/src/bin/sh/cd.{c,h}

or, on the web:

[https://svnweb.freebsd.org/base/head/bin/sh/cd.c?revision=32...](https://svnweb.freebsd.org/base/head/bin/sh/cd.c?revision=320340&view=markup)

similarly, most anything in base is available in:

/usr/src/{path with '.' instead of slash}/{command name}

which is, by the way, very handy, and, in combination with having all your
self-built packages in /usr/ports/distfiles/<package>.tar.<whatever>, a big
plus of running a BSD system from source..
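
A minimal sketch of that naming convention: a tiny helper that only computes the conventional source directory for an installed program and does not check that it exists. `src_dir_for` is a hypothetical name, and there are exceptions; per LukeShu's comment further down, /usr/bin/cd is installed from src/usr.bin/alias rather than from a `usr.bin/cd` directory.

```sh
#!/bin/sh
# Sketch of the FreeBSD /usr/src layout convention: a program installed as
# /usr/bin/tr usually lives in /usr/src/usr.bin/tr.
# src_dir_for is hypothetical and only computes the path; it checks nothing.
src_dir_for() {
    dir=${1%/*}          # /usr/bin/tr -> /usr/bin
    name=${1##*/}        # /usr/bin/tr -> tr
    dir=${dir#/}         # /usr/bin    -> usr/bin
    printf '/usr/src/%s/%s\n' "$(printf '%s' "$dir" | tr / .)" "$name"
}

src_dir_for /bin/sh       # /usr/src/bin/sh
src_dir_for /usr/bin/tr   # /usr/src/usr.bin/tr
```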

~~~
wahern
Yet another example where the BSD code is simpler and clearer.

I don't think it's particularly controversial to say that when you want to
explore how the internals of unix work, you're almost always better off
reading the BSD source code first. Even if you don't run a BSD. (musl libc is
great, too, but limited to libc.)

I do most of my systems programming for Linux platforms, but when I have a
question about semantics my first stop is POSIX to learn how it _should_
behave. My second stop is the BSD source so I can quickly grok the mechanics.
Lastly would be the Linux or GNU code, to confirm exact behavior. Setting
aside the fact that Linux or GNU code usually feels like it was written
inside-out and upside-down, you can't well understand why and how something
works without having a more general understanding of the problem and solution
space. This applies to everything in life, but in the context of systems
software programming I've developed a very concrete process.

Having a copy of POSIX locally (greppable, but also the local HTML frames
version is super easy to navigate), as well as easy access to BSD code in
/usr/src, can make this a very fast and efficient process. Much faster than
Googling, wading through Stack Overflow, and other haphazard habits.

------
seiferteric
Huh, makes perfect sense about changing the shell process's directory. I
always wondered why some commands were built-ins. Thanks!

------
ataylor32
Random piece of info about cd:

cd.. (no space between "cd" and "..") works on Windows

~~~
Ao7bei3s
Or try zsh with a good pre-made config like grmlzshrc[1]. It allows changing
to a directory by typing only its name (or path). This includes the ..
directory. Doesn't get faster.

[1] [https://grml.org/zsh/](https://grml.org/zsh/)

~~~
orf
Or use Autojump[1], changed my life

1. [https://github.com/wting/autojump](https://github.com/wting/autojump)

~~~
cdancette
This is like the zsh plugin 'z'

------
Endy
Okay, but now I want to know how it works on something other than (foo)Nix,
because that's not an OS space I care about. What about how it works in CP/M
or foo-DOS? I've never used a *Nix computer, but I know that I used 'cd' going
back to the time of using a TRS-80 CoCo.

I know it's a low-level system call in most on-disk operating systems to
change directories. But, for instance, how does it translate a physical address
to the human-readable name? Do modern (WinXP+) implementations actually take
the time to translate folder names from 8.3 to the extended name field? How?

Heck, I'll be dumb enough to ask - why, specifically at the call level, does
'cd\' stick to one level up/down whereas 'cd ' can just pull from just about
anywhere? And why, dare I ask, can 'cd' not display like 'tree'?

These are questions about how 'cd' works, to me. Not just, oh, in (foo)Nix
it's a system call.

~~~
JdeBP
What you want is a question and answer WWW site.

* [https://superuser.com/a/380231/38062](https://superuser.com/a/380231/38062)

* [https://unix.stackexchange.com/a/251215/5132](https://unix.stackexchange.com/a/251215/5132)

... and so on.

~~~
Endy
I appreciate the information honestly, but I meant it mostly in the sense that
those would be some of the things I'd expect to see in a "how does 'cd' work?"
post.

------
jng
The funny thing here: why does cperciva, of tarsnap fame, show up in the
source code of the `cd` script? I saw it in the article, checked it on my Mac,
and it does indeed show up. Intriguing.

~~~
jwilk
Mac's userland is based on FreeBSD. Colin Percival is a FreeBSD developer who
last modified this file:

[https://github.com/freebsd/freebsd/commit/0bc1bed704cc7b7292...](https://github.com/freebsd/freebsd/commit/0bc1bed704cc7b7292be893f0c8c2b9f8f6a4b60)

------
jsnk
I tried to create a command line tool that mimics some behavior of a bookmark
manager in the terminal.

[https://github.com/serv/lbm](https://github.com/serv/lbm)

Unfortunately, I learned that there is no way to invoke cd programmatically
from a program, but I didn't get an explanation I could understand of why this
is not possible.

Can someone explain why you can't invoke cd from a program?

~~~
jsweojtj
I have had this snippet in my `.bashrc` for years. No idea who to give credit
to:

    
    
        #    eg. save mc
        #    cd mc # no '$' is necessary
    
        if [ ! -f ~/.dirs ]; then  # if it doesn't exist, create it
            touch ~/.dirs
        fi
    
        alias show='cat ~/.dirs'
        save (){
            # Append NAME="current dir"; when ~/.dirs is sourced, the last line for a name wins.
            printf '%s="%s"\n' "$1" "$(pwd)" >> ~/.dirs
            source ~/.dirs  # reload so the new bookmark variable exists in this shell
        }
        source ~/.dirs  # Initialization for the above 'save' facility: source the ~/.dirs file
        shopt -s cdable_vars # set the bash option so that no '$' is required when using the above facility
    

What this does: whenever you're in a directory that you'd like to
"bookmark", as you'd call it, just type `save whatevername`. Then, when you
navigate somewhere else, you can type `cd whatevername` and it'll change you
back there. It's simply adding to this ~/.dirs file and so overwriting is
taken care of by just saving the same name in a new (or the same) directory.
It just appends a line.

Also, you can just type `show` at any point and it'll tell you what you've
saved and where.

The nicest thing is that this persists with new logins (the only drawback is
that other shells that are running don't get the update automatically).
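
The same trick is the usual answer to the question above about invoking cd from a program: a separate program can't change your shell's directory, but a shell function runs inside your shell, so its `cd` sticks. A common pattern is for the external tool to only print the target path and for a small function to do the `cd`. A minimal sketch; `bm`, `bm_lookup`, and the `BM_FILE` format (one `name=/path` line per bookmark, last one wins) are hypothetical, and bookmark names are assumed to be plain words with no regex characters.

```sh
#!/bin/sh
# Sketch: the "program prints, function cds" pattern.
# BM_FILE, bm_lookup and bm are hypothetical names for illustration.
BM_FILE=${BM_FILE:-"$HOME/.bookmarks"}

# Stand-in for an external tool: print the last path saved under a name.
# It is called inside $(...), i.e. in a child process, so it could never cd for you.
bm_lookup() {
    sed -n "s|^$1=||p" "$BM_FILE" | tail -n 1
}

# A shell function runs in the current shell, so this cd persists.
bm() {
    dir=$(bm_lookup "$1")
    if [ -z "$dir" ]; then
        echo "bm: no bookmark named '$1'" >&2
        return 1
    fi
    cd "$dir"
}
```

For example, after adding a line like `proj=/some/dir` to the file, `bm proj` moves the current shell there.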

~~~
orf
[https://github.com/wting/autojump](https://github.com/wting/autojump) does
something similar, just automatically (and better, IMO).

------
LukeShu
Not to be negative, but for learning: There are a few "problems" with this
snippet of the article:

----

    
    
        $ which cd
        /usr/bin/cd
        $ cat /usr/bin/cd
        #!/bin/sh
        # $FreeBSD: src/usr.bin/alias/generic.sh,v 1.2 2005/10/24 22:32:19 cperciva Exp $
        # This file is in the public domain.
        builtin `echo ${0##*/} | tr \[:upper:] \[:lower:]` ${1+"$@"}
    

Oh, bother! Reading shell scripts can be such a hassle sometimes. I know the
tr command is used to translate characters. In this particular case, the
second half of the command, the part after the pipe symbol, basically converts
the command cd dev to CD dev. I have no idea why this is. In any case, this
modified command is passed to the builtin command which is handled by the
shell (Bourne shell) that we are using.

----

1. The first mistake is typing `which cd`. `which` is a separate program that
looks things up in $PATH, which may not actually be what happens when you run
the command. You should have used `type cd`:

    
    
        $ type cd
        cd is a shell builtin
    

As you discover later in the article, `cd` _must_ be a shell builtin. Which
makes it a little mysterious (and interesting!) why the file /usr/bin/cd
exists; it won't really do anything, try it:

    
    
        $ pwd
        /home/lukeshu
        $ /usr/bin/cd /usr
        $ pwd
        /home/lukeshu
        $ # but it will print error messages
        $ /usr/bin/cd /bogus
        /usr/bin/cd: line 4: cd: /bogus: No such file or directory
    

So, why does /usr/bin/cd exist? The comment with the CVS ID gives us a hint:
It's a common "src/usr.bin/alias/generic.sh" that is copied (hard-linked)
into /usr/bin for several shell builtins (
[https://github.com/freebsd/freebsd/blob/0bc1bed704cc7b7292be...](https://github.com/freebsd/freebsd/blob/0bc1bed704cc7b7292be893f0c8c2b9f8f6a4b60/usr.bin/alias/Makefile)
). For other builtins that don't _need_ to be builtins, it makes sense; let
other programs call them with exec. For `cd` it doesn't make much sense
though, and I'm not sure why it exists. Is it just for consistency with other
builtins, or does it serve a real purpose? IDK.

(edit: the short answer is "POSIX says so"
[https://github.com/freebsd/freebsd/commit/55d0b8395514ae4055...](https://github.com/freebsd/freebsd/commit/55d0b8395514ae4055e7af8e4e9812637dbcc463)
, but _why_ does POSIX say so? See my child comment for further citation.)

2. The second mistake is about what `tr` is doing. You claimed it's
converting lowercase to uppercase; but that's backward, it's converting
uppercase to lowercase.

So, _why_ does it convert to lowercase? Recall that we learned that it's the
_same_ script being used for all builtins. If it weren't literally the same
file (at the cost of a few more bytes of disk space), it could have just done a
search/replace within a template, having each be `builtin BUILTIN_NAME
${1+"$@"}`. But they wanted to save a few bytes, and instead the script must
_detect_ the appropriate builtin name by translating its program path to a
builtin name. If you execvp("cd", ...), it will invoke the script with $0 set
to "/usr/bin/cd". If /usr/bin is on a case-insensitive filesystem, and you
execvp("CD", ...), that will also call the script, with $0 set to
"/usr/bin/CD". How is it going to translate from "/usr/bin/CD" to "cd"? The
##*/ bit trims the leading directories, then the tr bit converts the remainder
to lower case.
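
Those two steps can be tried in isolation. A minimal sketch; `builtin_name_for` is a hypothetical name, since the real script does this inline on `$0` and passes the result to `builtin`. The original writes `\[:upper:]` with an escaped bracket where this sketch uses quotes; either way the shell won't treat `[:upper:]` as a glob pattern.

```sh
#!/bin/sh
# Sketch of how generic.sh derives a builtin name from $0:
# strip the leading directories, then lowercase what is left.
builtin_name_for() {
    printf '%s\n' "${1##*/}" | tr '[:upper:]' '[:lower:]'
}

builtin_name_for /usr/bin/cd   # cd
builtin_name_for /usr/bin/CD   # cd  (e.g. exec'd as "CD" on a case-insensitive filesystem)
builtin_name_for CD            # cd  (no directory part to strip)
```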

(as an aside: the `${1+"$@"}` is a little interesting too; why not just write
`"$@"`? "$@" will expand to the full list of arguments (after argv[0]). The
${1+...} bit says to only do that expansion if the first argument exists
(i.e., there are >= 1 arguments). But that should basically be happening
anyway; if there are no arguments, "$@" should expand to a zero-length list.
IDK, perhaps a weird historical shell?)

~~~
jwilk
[https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/a...](https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Shell-Substitutions.html)

 _One of the most famous shell-portability issues is related to "$@". When
there are no positional arguments, Posix says that "$@" is supposed to be
equivalent to nothing, but the original Unix version 7 Bourne shell treated it
as equivalent to "" instead, and this behavior survives in later
implementations like Digital Unix 5.0.

The traditional way to work around this portability problem is to use
${1+"$@"}._
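
A small sketch for poking at this in a modern POSIX shell, where plain `"$@"` and `${1+"$@"}` should agree; the difference only shows up in old Bourne shells, which this can't reproduce. It also shows why the guard uses `+` rather than `:+`: `${1+...}` checks whether $1 is set, so an explicitly empty first argument still gets passed through. `count`, `compare`, and `is_set` are illustrative names.

```sh
#!/bin/sh
# Sketch: compare "$@" with the ${1+"$@"} workaround in a modern POSIX shell.
# count, compare and is_set are illustrative helper names.

count() { echo "$#"; }   # number of arguments it received

compare() {
    printf 'plain: %s  guarded: %s\n' "$(count "$@")" "$(count ${1+"$@"})"
}

# ${1+word} tests whether $1 is *set*; ${1:+word} tests whether it is non-empty.
is_set() { printf '+: [%s]  :+: [%s]\n' "${1+yes}" "${1:+yes}"; }

compare              # plain: 0  guarded: 0
compare a "b c"      # plain: 2  guarded: 2
is_set               # +: []  :+: []
is_set ""            # +: [yes]  :+: []
is_set a             # +: [yes]  :+: [yes]
```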

~~~
LukeShu
Thanks for digging that up! But for scripts written for FreeBSD in 2002 [1], I
have to wonder which non-POSIX-y shells they were worried about.

[1]:
[https://github.com/freebsd/freebsd/commit/55d0b8395514ae4055...](https://github.com/freebsd/freebsd/commit/55d0b8395514ae4055e7af8e4e9812637dbcc463)

------
__david__
> I started, as I usually do, by searching for the term “chdir” using the
> GitHub search bar.

That's not bad, but I'd also suggest checking man pages for syscalls.
`man 2 chdir` will give you some documentation on both Mac and Linux OSes and
call out the specs that are relevant to the call. (Why `man 2`? That searches
section 2 of the man pages which is dedicated to syscalls. How on earth would
someone know that? `man man` of course. :-) )

On Linux, an interesting thing is that you can inspect a process's current
working directory (i.e., the last directory it chdir()ed to) by looking at
/proc/<pid>/cwd. It presents itself as a symlink to the actual current
directory as stored in the kernel.
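
A minimal sketch of reading that link from a shell, assuming Linux with /proc mounted and a `readlink` utility; `cwd_of` is a hypothetical name, and inspecting another user's process generally requires root. Because the kernel stores the resolved directory, the result should match `pwd -P` rather than a logical $PWD that goes through a symlink.

```sh
#!/bin/sh
# Sketch (Linux only): report a process's working directory via /proc/<pid>/cwd.
# cwd_of is a hypothetical helper; reading another user's process needs privileges.
cwd_of() {
    readlink "/proc/$1/cwd"
}

cwd_of $$    # this shell's directory, with symlinks resolved
pwd -P       # should print the same path
```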

------
205guy
On the one hand, I admire the search for knowledge, taking things apart and
seeing how they work. On the other hand, this seems a bit cargo-cultish. It's
as if I took apart a record player, trying to see how it plays music, and I
told you it works because the motor turns the record.

In this case, the author got tangled in the shell script and code, and totally
missed the whole subtlety and complexity of the Unix architecture. I expected
to see something about the processes, file system, and directory entries. Even
after the author added key information that was emailed to her by a reader, she
didn't really follow up or try to understand more. I understand that not all
coders have or need a degree in computer science, but it really surprises me
what this author doesn't know (and doesn't know she doesn't know).

Another example from her next blog post about 'ls': "I’ll admit, scrolling through
all this C-code can be a little tiresome. Oh, how I miss the days when all I
had to do was read JavaScript source! Because C gives you so little out of the
box, a lot of the code that you end up reading is not that interesting. It’s
largely the kind of stuff that higher level languages implement in their
standard library." [[https://blog.safia.rocks/post/171381157060/looking-into-ls](https://blog.safia.rocks/post/171381157060/looking-into-ls)]

Here's a book about Unix, among many (I got this one as a CS graduation
present): [https://www.amazon.com/Magic-Garden-Explained-Internals-Rele...](https://www.amazon.com/Magic-Garden-Explained-Internals-Release/dp/0130981389)

~~~
hzhou321
I had the same sentiment as yours, but the blog is fine (for the author and
appropriate audience).

I am quite surprised/curious how it got to the top page here.

~~~
peterwwillis
HN is not immune to cargo cults. SV and modern "tech culture" applaud people
for writing blog posts like this: "Today I decided to find out what makes the
sky blue! I just thought I'd write a blog post about it." Then the culture is
reinforced, as when you mention how it's stupid to upvote the blog of someone
who literally admits they do not know what they are talking about, you get
shouted down for not handing out participation trophies. And yes, I realize
how rude this comment is, and how stereotypically anti-millennial it is. But
it's what has been happening on HN for years.

------
saagarjha
> For one, the source for Bash does not have a mirror on GitHub

Here's one: [https://github.com/bminor/bash](https://github.com/bminor/bash)

------
RickJWag
Nice! Great article for HN.

------
qubex
Well... that was... anticlimactic.

~~~
arthurcolle
just like reading plot summaries on wikipedia

