
Writing a Unix Shell - luu
https://indradhanush.github.io/blog/writing-a-unix-shell-part-1/
======
chubot
FWIW I wrote a pretty complete shell in ~11K lines of Python:

[https://github.com/oilshell/oil](https://github.com/oilshell/oil) (see the
osh/ and core/ directories).

Right now the goal is to clone bash but have better parse time and runtime
errors. I hope to make an initial release this summer. But the project is much
larger (see the blog if interested).

~~~
enriquto
This project is one of the coolest things in the unix world today. I do not
care too much about the python implementation (writing it in C alone would be
even cooler), but the work is just awesome.

Your blog post about "vectorized, point-free and imperative style" was so
beautiful that it brought tears to my eyes. Thank you!

~~~
chubot
Wow thanks! I appreciate the encouragement. The architecture has settled, and
I'm slogging through all the features now, so that helps.

Yeah Python wasn't the original plan, but it helped me focus on correctness. I
was able to iteratively reverse engineer bash and other shells, while keeping
the code relatively clean.

I had hoped to rewrite it in native code, but it's a huge amount of effort, so
the first release will be Python. Existence and correctness are both higher
priorities than performance :)

I disguised Python by bundling a slice of the Python interpreter, which will
allow me to rewrite it over time without user-visible packaging/deployment
changes. I'm also able to fork the Python language because I have the "OPy"
compiler.

One thing that should make it faster is using integer offsets everywhere
instead of dictionary lookups, and that should be possible without much code
change. It is written in a fairly "static" style, although with some important
metaprogramming.

EDIT: I also hope to resume blogging more regularly once the code is released.
I have a big pile of ideas/notes. But getting it out there in any primitive
form is the most important thing right now.

------
teddyh
There is more to writing a shell than simply reading commands and executing
them after fork()ing. TTY handling must be done right¹ and interrupt and
signal handling is rather tricky to get right as well².

¹
[http://www.linusakesson.net/programming/tty/](http://www.linusakesson.net/programming/tty/)

²
[https://www.cons.org/cracauer/sigint.html](https://www.cons.org/cracauer/sigint.html)

~~~
tyingq
The title does say "Part 1"

------
laumars
Please excuse the self-promotion, but I'm also writing a Unix shell in Go.
It's not designed to be a drop-in replacement for Bash (i.e. it follows a few
Bash-isms but is essentially my own bespoke scripting language); however, it
does spawn processes in their own PTY and supports interactive commands such
as vi and top, so it does work as a proper $SHELL. It's still very much alpha,
but at a stage where I'm ready for feedback.

[https://github.com/lmorg/murex](https://github.com/lmorg/murex)

------
luckydude
Marc Rochkind's Advanced Unix Programming is a very approachable book that
walks you through, among other things, writing a shell.

It's a good book, give it a read.

~~~
vram22
Yes, good recommendation. I have mentioned it here on HN before. I got to
read it while working at an HP joint venture years ago. They (HP) used to ship
a complementary copy of his book with every new HP-UX business server or
workstation (CAD etc.) sold. IIRC the book even shows how to write a simple
DBMS (not RDBMS), with client and server parts. And shows how to write all
that code somewhat portably (in C) across some Unix versions that were common
then. Quite an achievement. The book is still available (in an updated version
- after 19 years!):

[http://basepath.com/aup/](http://basepath.com/aup/)

It also shows how to set up pipes using the pipe(), dup() etc. system calls.
There are some subtleties involved in that, and that machinery is used for the
shell he creates.

Good stuff.

~~~
vram22
typo: s/complementary/complimentary (though the other meaning also holds, in
this case :)

------
xiaq
FWIW, here is a pretty complete shell mostly written by me in ~18K lines of
Go: [https://github.com/elves/elvish](https://github.com/elves/elvish)

------
reacweb
Writing a shell, like writing a compiler, is a very good exercise. Writing a
compiler gives a feeling of power, but I think writing a shell teaches
humility.

------
eriknstr
I have a pretty good idea for a shell but I don't know when I'll ever find
time to implement it.

The idea is that the shell should be able to optimize pipelines. Pipeline here
meaning a chain of commands piped into one another.

So if you have a pipeline like

    
    
        grep '^2017-05-29' /var/log/somefile | grep -Ev 'INFO|WARN' | tail -n5 | cut -f1,3
    

Then the idea is that instead of piping the whole result from command to
command, your shell would compile a set of commands into a single program and
run that.

Now you might say that that sounds like it goes against the Unix philosophy,
but actually it doesn't need to. If all of the core utilities were implemented
in such a way that their logic could be extracted without duplication of code
then the shell can still be doing "one thing and one thing well".

---

Another idea I have is to make these core utilities pipe objects instead of
text, like on Windows. I am very fond of bash but I think one thing that
Microsoft seems to have done right is to have the idea of being able to work
on objects in PowerShell. But I don't want PowerShell. I want a mostly Unix
shell, except, as I said, with objects.

I just wish I had more time :(

~~~
laumars
UNIX shells already optimise pipelines by running the commands along it
concurrently. In your example, the first grep will be outputting its results
while it's still reading in /var/log/somefile, and the second grep will be
parsing the output of the first grep as and when the first grep produces it.

tail is a little different because it needs to find the end of its input, but
the greps certainly don't wait for each other to finish.

As for your "objects" point. The shell I'm working on (discussed elsewhere in
this thread) uses JSON to pass objects around functions written for that shell
while still falling back to standard flat text files when piping out to
traditional UNIX/GNU/whatever tools. Though I still need to make the whole
thing more intuitive.

~~~
eriknstr
>UNIX shells already optimise pipelines by running the commands along it
concurrently.

I am aware of that. Still, performing multiple operations on the data within a
single loop and with less copying would be an optimization.

~~~
laumars
I get what you're saying but technically those tools are running inside the
same single loop (as that file is only iterated the once most of that
pipeline) but distributed across multiple OS threads.

Shell scripts are surprisingly efficient at pipelined work where you're just
dealing with streaming data (as your example _mostly_ does). It's the more
complex logic they fall significantly short on.

The weakest area and why I kept saying "most" of that pipeline would be the
use of tail part way through the chain (as others have mentioned). But that
could be optimised within Bash to something like:

    
    
        tail -r somefile | grep 'expression' | egrep 'expression' | head -n number | cut ...
    

(I think the -r flag on tail is BSD rather than POSIX though)

Your object point is definitely relevant though. For example, how many times
have we seen shell scripts break because they don't consider spaces in
filenames?

Talking about annoyances in Bash, I'd also throw in exception handling as a
major problem for traditional shells. Having Bash fail in expected ways and
handle those failures cleanly is a huge pain in the arse.

Those two points above (and wanting cleaner syntax for iteration) are what
drove me to write my own $SHELL, so it does sound like I'm addressing some of
your annoyances - albeit not all of them. It's still a long way from being
usable in production environments, but I'm happy to accept pull requests if
you ever feel like contributing to something that's working towards at least
some of your goals :)

------
axman6
Mark Hibberd gave a great talk at Yow! LambdaJam this year on writing mundane
code in Haskell, with a tutorial on writing a shell. The code can be found on
[https://github.com/markhibberd/xsh](https://github.com/markhibberd/xsh) and
the talk should be on YouTube in the near future (I hope).

~~~
v3gas
Yeah! :)

------
DroidX86
Not a very good tutorial on fork(), let alone on writing your own shell.

------
Spakman
Seems like a good time to point to Urchin, the Bash-like shell I wrote that
also allows writing Ruby directly on the command line. I've been using it
happily for years but haven't packaged up a release for a recent version:

[https://github.com/Spakman/urchin](https://github.com/Spakman/urchin)

------
bluejekyll
fork() also clones all the file descriptors that might be open to the child.
It is the responsibility of the forking code to close file descriptors it
doesn't need. Easy way to cause "too many open files" errors.

(I read this article quickly, so apologies if I missed it.)

And then there's the fun of dealing with stdout and stderr at the same time,
in a single thread.

One more fun thing to do with fork is to learn to pass ownership of the child
process to pid 1 - what the command nohup does. A parent process, when it
exits, requires that all child processes exit as well.

~~~
geocar
> fork() also clones all the file descriptors that might be open to the child.
> ... Easy way to cause too many open files errors.

fork() cannot fail with EMFILE or ENFILE[1]

[1]:
[http://pubs.opengroup.org/onlinepubs/9699919799/functions/fo...](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html#tag_16_156_05)

> It is the responsibility of the forking code to close file descriptors it
> doesn't need.

FD_CLOEXEC can cause exec() to close file descriptors automatically.

> A parent process when it exits, requires that all child processes exit as
> well.

This is not true

~~~
bluejekyll
> fork() cannot fail with EMFILE or ENFILE[1]

This is true. But if you believe you only need to close files or sockets in
the child process, then running into this issue is very easy.

> > A parent process when it exits, requires that all child processes exit as
> well.

> This is not true

Sorry, you're right. I was thinking of the SIGHUP sent to the child; it
doesn't actually require an exit.

------
temporary_art
We did this as a class assignment in undergrad, definitely one of the most
useful projects in college.

------
wruza
This is an incorrect fork() tutorial, not a guide to writing a Unix shell in
any way.

~~~
b6
Please just explain what's wrong. It's more constructive.

~~~
dewyatt
I can point out one thing:

[http://rachelbythebay.com/w/2014/08/19/fork/](http://rachelbythebay.com/w/2014/08/19/fork/)

~~~
tyingq
Not mentioned there, but aside from kill() doing something special with -1,
waitpid() also does something special with -1. The referenced article code
could/would pass -1 to waitpid().

------
jrimbault
How did this get 86 points (as of writing this)?

It's basically the third chapter of every C course every CS student will get.
And not very well done.

~~~
marssaxman
Perhaps this "third chapter" is not as universal as you expect; nor, I
believe, is a formal CS education as universal among the HN population as you
seem to be assuming.

Diving in and trying it out is a good way to learn. Sharing your results can
be helpful for others, even if you haven't learned very much yet; it can be a
good starting point, and others can correct your misunderstandings and comment
on areas you might investigate further.

I'm always happy to see people who have enough intellectual humility to show
their incomplete work as they go, so that others can benefit from the learning
process.

~~~
LyndsySimon
Agreed.

I've been a professional dev for a decade now, but failed out of college due
to mental health issues (depression). I know a ton about stuff higher in the
stack, but I've never spent much time writing C and mucking around with the OS
itself.

This project is interesting enough to me to be worth the time to read it, and
it's given me the idea of writing my own shell as a means of learning Nim.

