Writing a shell should be significantly easier than, say, writing a compiler, although I think it's less well documented.
Much of the work in writing a shell is parsing -- I would estimate that parsing is 60% of the work, whereas it might only be 10-20% of the work of a compiler. This is both because interpreters are smaller than compilers (fewer transformations), and because shell is harder to parse than most programming languages.
But as you can see from my blog, it requires a few different parsing techniques, and there are some non-obvious choices I made that I think made it easier. (e.g. Lexer modes aka lexical state are a huge win for reducing complexity.)
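To make the lexer-mode idea concrete, here is a minimal sketch in Python. It is not the lexer from the blog: the two modes (OUTER and DQ), the token names, and the tokenize function are all made up for illustration. The point is only that each mode has its own small rule table, so a space is a separator outside double quotes but part of a literal inside them.

```python
import re

# One rule table per mode. The same character can produce a different
# token depending on which mode the lexer is currently in.
RULES = {
    "OUTER": [
        ("SPACE", r"[ \t]+"),
        ("VAR", r"\$[A-Za-z_][A-Za-z0-9_]*"),
        ("DQUOTE", r'"'),
        ("WORD", r'[^ \t"$]+'),
    ],
    "DQ": [
        ("VAR", r"\$[A-Za-z_][A-Za-z0-9_]*"),
        ("DQUOTE", r'"'),
        ("LITERAL", r'[^"$]+'),
    ],
}
COMPILED = {
    mode: [(name, re.compile(pattern)) for name, pattern in rules]
    for mode, rules in RULES.items()
}


def tokenize(line):
    """Return a list of (mode, token_name, text) tuples for one line."""
    mode, pos, tokens = "OUTER", 0, []
    while pos < len(line):
        # Try the rules of the current mode in order; first match wins.
        for name, pattern in COMPILED[mode]:
            match = pattern.match(line, pos)
            if match:
                break
        else:
            raise SyntaxError(f"unexpected {line[pos]!r} at {pos} in mode {mode}")
        tokens.append((mode, name, match.group()))
        pos = match.end()
        # A double quote is the only token that switches modes here.
        if name == "DQUOTE":
            mode = "DQ" if mode == "OUTER" else "OUTER"
    return tokens
```

For example, tokenize('echo "a b $x" c') yields a single LITERAL token for "a b " inside the quotes, while the space before c is a SPACE token. This toy version rejects a bare $ and does not report an unterminated quote; a real shell lexer needs many more modes and rules.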
The shell uses surprisingly few system calls -- fork/exec/wait, open/close/read/write, pipe/dup/fcntl, and that's almost it. These are well worth studying. File descriptors are non-obvious and essential.
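As a rough illustration of how far those few calls go, here is a sketch of a two-command pipeline using Python's os module, which wraps the same system calls. The run_pipeline name and its argument format are assumptions for this example, not code from any particular shell.

```python
import os


def run_pipeline(left, right):
    """Run `left | right` and return the exit status of `right`.

    Both arguments are argv lists, e.g. ["ls", "-l"].
    """
    read_fd, write_fd = os.pipe()

    left_pid = os.fork()
    if left_pid == 0:
        # First child: make the pipe's write end its stdout (fd 1).
        try:
            os.dup2(write_fd, 1)
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(left[0], left)
        finally:
            # Only reached if exec failed; never fall back into parent code.
            os._exit(127)

    right_pid = os.fork()
    if right_pid == 0:
        # Second child: make the pipe's read end its stdin (fd 0).
        try:
            os.dup2(read_fd, 0)
            os.close(read_fd)
            os.close(write_fd)
            os.execvp(right[0], right)
        finally:
            os._exit(127)

    # Parent: close both ends, otherwise the reader never sees EOF.
    os.close(read_fd)
    os.close(write_fd)
    os.waitpid(left_pid, 0)
    _, status = os.waitpid(right_pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)
```

Something like run_pipeline(["ls"], ["wc", "-l"]) should behave like ls | wc -l. The part that tends to surprise people is the bookkeeping: every process that holds a copy of the write end has to close it before the reader gets EOF.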
I never took the OS class that most people take, which shows you how C and Unix work. This thread lists a few, and also check out xv6 in the parent thread:
https://news.ycombinator.com/item?id=13112960
I learned a lot about system calls by using strace on many programs over the years. I think Python helps a lot because it is interpreted and has relatively direct bindings to Unix system calls.
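Here is a small sketch of what "relatively direct bindings" means in practice: an output redirection done with os.open, os.dup, os.dup2 and os.close. The with_stdout_redirected helper is a hypothetical name for this example; a shell would do something similar before running a builtin with > file.

```python
import os


def with_stdout_redirected(path, action):
    """Call action() with fd 1 pointing at `path`, then restore fd 1."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    saved = os.dup(1)          # keep a copy of the original stdout
    try:
        os.dup2(fd, 1)         # fd 1 now refers to the file
        action()
    finally:
        os.dup2(saved, 1)      # put the original stdout back on fd 1
        os.close(saved)
        os.close(fd)
```

Calling with_stdout_redirected("out.txt", lambda: os.write(1, b"hi\n")) writes to the file rather than the terminal. Running a script like this under strace should show calls that line up closely with the Python source, which is a large part of why it is a convenient way to learn them.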
FWIW, although I learned interesting things about parsing, I don't think there is that much "hard" in the computer science sense about shell. It's mainly an exercise in software engineering -- how do you keep your program from degrading into an enormous mass of poorly-debugged if statements? (I would classify bash in that category, unfortunately)
As far as data structures go, you should be able to write a shell without anything fancy at all. In fact some shells are basically just tons of linked lists. Linked lists make memory management easier in C, although I'm using a more modern, high-level style.
I also spent a significant amount of time reading bash, dash, and mksh code for this project. (and to a lesser degree zsh).
To be honest, the biggest "gotcha" I stumbled across when writing my shell was handling PTYs. In the end I used someone else's module instead of writing my own, because it supported several different flavours of UNIX and it seemed a pointless duplication of effort to roll my own. However, it was really humbling to discover just how little I actually knew about PTYs compared to what I thought I knew.
I don't know anything about PTYs either, since my shell is mostly a batch program like Perl right now. I'm viewing it first from a programming language perspective.
As mentioned, I'm deferring to GNU readline for interactive features. I'd be interested to hear what problems you had with PTYs and what module you used.
Are PTYs standardized by POSIX? It feels like you should be able to write a shell against only POSIX APIs (and ANSI C).
I'd also be interested to see your shell. There are a bunch of alternative shells and POSIX shell implementations here:
EDIT: I saw the sibling comment linking to https://github.com/lmorg/murex. Taking a look! A few months ago I started a thread with the authors of elvish, oh, mash, and NGS shells to exchange ideas. (elvish and oh are also written in Go.) I didn't know about your shell or I would have included you!
You'd think PTYs would be standardised - I certainly assumed they would be - but the process of assigning a PTY seemed to differ from UNIX to UNIX and I couldn't find any decent documentation that discussed PTYs in a concise, cross-platform way. So I started to look at whether anyone in the Go community had also played around with PTYs. The package I used was https://github.com/kr/pty. I found the code to be surprisingly readable too.
The problem I had initially was not even realising that many command line tools check whether they're outputting to a PTY, and that this affects their behaviour. I'm sure this is all stuff you're already familiar with, but a few examples I noticed were that grep wouldn't highlight its match, ls wouldn't output in multi-column view, and apt-get wouldn't give you many (any?) of its interactive options. I also wanted tools like vi and top to function the same in my shell as they would in Bash, but they couldn't without me assigning a PTY. (I also needed to put the terminal into "raw mode" to pass through control keys such as Ctrl-C and to disable echoing from STDIN - but thankfully that was very easy in Go.)
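The tty check is easy to see for yourself. The sketch below uses Python's standard pty module rather than the Go package mentioned above, and the child_sees_tty helper and the tiny child command are made up for illustration. It runs the same child twice, once with stdout on a pipe and once with stdout on a PTY, and returns what the child reports about its own stdout.

```python
import os
import pty
import subprocess
import sys

# A stand-in for tools like grep or ls: it asks whether stdout is a terminal.
CHILD = [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"]


def child_sees_tty(use_pty):
    """Run CHILD with stdout on a PTY or a pipe; return what it printed."""
    if use_pty:
        master, slave = pty.openpty()
        try:
            subprocess.run(CHILD, stdout=slave, check=True)
            # Read while the slave is still open so the output is not lost.
            return os.read(master, 1024).decode().strip()
        finally:
            os.close(slave)
            os.close(master)
    result = subprocess.run(CHILD, stdout=subprocess.PIPE, check=True)
    return result.stdout.decode().strip()
```

On a system where pty.openpty() is available, the child should report a terminal only in the PTY case. That one isatty-style check is what makes tools switch between their interactive and plain output.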
I've played around a little with GNU readline. It's a really nice tool but I had a few cross-compilation issues in Go when porting it to other platforms (e.g. FreeBSD). So I used a pure-Go package instead - which isn't without its own bugs, but at least it keeps my shell fully portable.
> A few months ago I started a thread with the authors of elvish, oh, mash, and NGS shells to exchange ideas. (elvish and oh are also written in Go.) I didn't know about your shell or I would have included you!
Thank you. My shell is only about 3 months old though (possibly less, as I only named it 2 months ago), so it likely wouldn't even have existed when you started your thread. :)
Happy to answer any other questions.