
A Lisp interpeter in a thousand lines of Bash - Morgawr
https://github.com/alandipert/gherkin/blob/master/gherkin
======
VLM
As a nostalgic trip: a little over 30 years ago I was playing with Randall
Beer's LISP interpreter, which ran (very slowly) in MS BASIC on a TRS-80
Model III, as seen on page 176 at this link (this is the first in a
multi-article series):

[https://archive.org/details/80-microcomputing-magazine-1983-...](https://archive.org/details/80-microcomputing-magazine-1983-03)

I distinctly remember as a kid it was very slow indeed, but interesting.

I got lost reading the ads. In retrospect, computing used to be a much more
expensive hobby than it is today. Not just in relative terms, but in absolute
terms. Then again, people are much poorer now, so it's required.

Anyway, since 1983 he has become a neuroscience prof, and he mentions his
BASIC LISP on his homepage:

[http://mypage.iu.edu/~rdbeer/](http://mypage.iu.edu/~rdbeer/)

The line numbers are not consecutive, but I think he's well under a thousand
lines of BASIC; there just aren't enough pages of code in the listing to
exceed that.

And yes, this was considered reasonable coding style back then. That is why
this generation never shrank in terror at the sight of bad Perl code. Why yes,
this is a bit hard to read, but I've certainly seen worse...

~~~
Morgawr
Slightly related to the "let's implement Lisp on weird languages/platforms"
theme, here's awklisp, a project that apparently helped/inspired Dipert to
write Gherkin:
[https://github.com/darius/awklisp](https://github.com/darius/awklisp)

~~~
brudgers
In 1983, the TRS-80 was a mainstream computer. BASIC was also a mainstream
language - so was Assembly. Even a decade later, before the explosion of the
internet, getting one's hands on a C compiler typically meant paying for a
commercial implementation...probably on floppies and delivered over the shoe
net.

~~~
VLM
This is true, but I think by the standards of the 2010s most of the defining
characteristics of early 80s MS BASIC would be considered really weird.

Line numbers. Control flow exclusively by if/then and for/next. We Love GOTO
(noobs probably don't know what a GOTO is or why some considered them
harmful). No namespaces, everything was a giant global shared namespace where
T$ meant the same thing everywhere. No naming conventions for variables (like
Hungarian or CamelCase). No one used revision control. No unit tests. No
symbolic debuggers (they were coming, soon, and something like them existed
for assembly, but this was 1983 and not-assembly). For better or worse, no
REGEXes. No object orientation, pure procedural. Everything is in one file
because quite a few people had no disk drive and relied on cassette tapes.
Line editing was a little crude compared to modern vim/emacs and pretenders to
the throne. No IDEs until Turbo Pascal and the like quite a few years later
(or was QuickBASIC first? Either way it would be a long time...)

By modern standards MS BASIC is pretty weird.

~~~
brudgers
BASIC is perhaps one of the less weird things about computing in the early 80's.
It was the high point for women studying computing in universities - well over
a third of all computer science degrees were awarded to women: three times the
ratio found today. [1]

But more to your point, by the 1990's the idea that computer languages should
be understandable by non-programmers with a reasonable education and a general
familiarity with the principles of computing was dead.

Think of COBOL. If its approach hadn't been abandoned for the obfuscations of
C++ and Perl, Eric Evans wouldn't have written a book and gone on the
lecture circuit to spread the gospel of ubiquitous language. There's a reason
the unschooled learned HTML and PHP in the 1990's - they were accessible to
moderately educated people and could do useful work, in the same way that
COBOL and BASIC were by design.

The TRS-80 ran a version of Tiny BASIC. The use of line numbers and GOTO
allowed BASIC to run closer to the metal - GOTO 15 is an Assembly Language
JUMP to the address of wherever the instruction on line 15 was mapped by the
assembler. Without it, BASIC would not have been such a successful path to a
higher level programming language for serious programmers steeped in assembly
language. GOTO is handy if you want to translate from Knuth's MIX without a
lot of fuss.

[1]
[http://en.wikipedia.org/wiki/Women_in_computing#The_Gender_G...](http://en.wikipedia.org/wiki/Women_in_computing#The_Gender_Gap)

~~~
mikeash
Many older BASIC implementations are interpreted, and it looks like Tiny BASIC
was as well:

[http://en.wikipedia.org/wiki/Tiny_BASIC](http://en.wikipedia.org/wiki/Tiny_BASIC)

It's strange to look back and see just how popular virtual machines were at
the time. BASIC typically used one, as did many other languages. Smalltalk is
famous for using a virtual machine, for example. Microsoft's original Mac apps
all ran bytecode in a virtual machine.

It seems crazy, because these computers were already tremendously slow,
relatively speaking, and adding a virtual machine makes it much worse.
However, it was ultimately a useful tradeoff because these machines were even
more limited on RAM than they were on CPU power, and using a virtual machine
with bytecode that allowed for an efficient instruction encoding could save a
lot of space. It doesn't matter how fast your code runs if it doesn't fit in
RAM, after all.
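
To make the space argument concrete, here is a minimal sketch of a toy stack machine with one-byte opcodes. The opcode values and the `run` function are made up for illustration and are not taken from any of the interpreters mentioned above; the point is only how few bytes a compact instruction encoding needs.

```python
# Hypothetical one-byte opcodes for a toy stack machine (illustration only).
HALT, PUSH, ADD, MUL = 0x00, 0x01, 0x02, 0x03


def run(code: bytes) -> int:
    """Interpret the bytecode and return the value left on top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to decode
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            # PUSH is followed by a one-byte immediate operand.
            stack.append(code[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x}")


# 2 + 3 * 4 encoded in nine bytes.
program = bytes([PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD, HALT])
result = run(program)
```

The decode loop is the speed cost; the nine-byte program is the space win.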

~~~
brudgers
Speed is of course relative. Interpreted BASIC is faster than working things
out by hand. For something critical, Assembly Language was always available.

On a TRS-80 Model I, compiling means the compiler, the input and the output
have to live in 4k of RAM (or 16k in the later versions).

Consider that the Level I Tiny BASIC interpreter lived on a 4kB ROM and Level
II lived on a 12kB ROM, and that mass storage for most early machines was
audio tape, which was not only slow but also notoriously prone to not loading
files correctly. Compiling code would have been great for masochists, not so
good for people who were just trying to get something done.

And that's before considering the complexities of tuning a compiler to
optimize code.

~~~
VLM
Having been there, there exists an intermediate step of tokenized code. So you
store ASCII strings as... ASCII, but as you enter source code a "then" as in
if/then gets tokenized into hex 0xD6 or something. So the poor CPU doesn't
have to run a full lexer at runtime to see if the "t" belongs to "to" or
"then"; it just matches hex 0xD6, which is much faster. This works really well
if you have 128 (or so) or fewer tokens in your language. This can also save a
huge amount of memory, depending on your coding style I suppose.
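
Roughly, the idea can be sketched like this. The keyword list and the token values (0x80 and up) are invented for the example and are not the real MS BASIC token table; the sketch also assumes plain ASCII, uppercase-only input.

```python
# Hypothetical keyword table; real MS BASIC token values differ.
KEYWORDS = ["THEN", "TO", "IF", "GOTO", "PRINT", "FOR", "NEXT"]
# Each keyword gets one byte with the high bit set, so it can never
# collide with a 7-bit ASCII character.
TOKENS = {kw: 0x80 + i for i, kw in enumerate(KEYWORDS)}
# Try longer keywords first so a short keyword never shadows a longer one.
BY_LENGTH = sorted(KEYWORDS, key=len, reverse=True)


def tokenize(line: str) -> bytes:
    """Replace keywords outside string literals with one-byte tokens."""
    out = bytearray()
    i = 0
    in_string = False
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string
        if not in_string:
            for kw in BY_LENGTH:
                if line.startswith(kw, i):
                    out.append(TOKENS[kw])
                    i += len(kw)
                    break
            else:
                # No keyword here: store the character as plain ASCII.
                out.append(ord(ch))
                i += 1
        else:
            # Inside a string literal everything is stored verbatim.
            out.append(ord(ch))
            i += 1
    return bytes(out)


def detokenize(data: bytes) -> str:
    """Expand tokens back into keyword text, e.g. for a LIST command."""
    names = {value: kw for kw, value in TOKENS.items()}
    return "".join(names.get(b, chr(b)) for b in data)


line = 'IF A=1 THEN PRINT "TO"'
packed = tokenize(line)
```

At run time the interpreter only has to compare a single byte against the table instead of re-scanning letters, and the stored line is shorter than what was typed.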

Tokenization also allows some syntax error detection to occur as you type code
in, which was interesting. I don't remember enough about this. Obviously some
mistakes won't tokenize at all or will tokenize into gibberish.

So Tiny BASIC in memory stored plain old ASCII and saved plain old ASCII to
cassette tape. Level II MS BASIC stored tokens in memory, although it could
optionally save pure ASCII to cassette tape. This had some interesting
software distribution issues and compatibility issues, as it was sorta kinda
halfway possible to save something on Level I and load it into Level II if you
were careful, and vice versa.

------
mikeash
This is fascinating. I assumed it was yet another ridiculous attempt to build
something in an environment completely unsuited for it, but it seems that they
are serious. But they're also sufficiently aware of the craziness of the
project that the first thing they do is explain just why the heck they're
doing it:

[https://github.com/alandipert/gherkin/wiki/Why-gherkin%3F](https://github.com/alandipert/gherkin/wiki/Why-gherkin%3F)

The short version is that bash is the closest thing to being universally
available on every UNIXoid system no matter what, and so by writing stuff in
bash, you make it so that it can run everywhere. But because bash sucks to
program in, this is a minimalist interpreter for a sane language. You can then
write programs in that language, and they will only depend on bash and on this
interpreter, and the interpreter is simple enough not to need any sort of
complex installation.

I can't quite think of a use case for this where it's not worth e.g.
installing Python first, but it's an interesting project all the same.

~~~
lloeki
> I can't quite think of a use case for this where it's not worth e.g.
> installing Python first

Wherever you get Bash, you can reasonably assume Perl 5 (unless in an initrd
or something). Even on some old AIX 4 I had a readily available Perl 5.005.

Nonetheless I wish there were more actual shells that were not sh descendants.

~~~
mmastrac
Bash is probably a stretch in most initrds anyway (IIRC, Red Hat used a very
light sh-like shell).

------
gaius
I started a project like this about 10 years ago, but then I discovered that
you could just compile Lisp on your own workstation and upload it to prod with
a .sh extension and no-one would actually check; they would just blindly run
it. Not even the size was suspicious. Used the same trick a bit later with
OCaml and Haskell: you just compile them as whatever.py and no-one's any the
wiser.

------
shawndumas
ReadMe ==>
[https://github.com/alandipert/gherkin/blob/master/README.md](https://github.com/alandipert/gherkin/blob/master/README.md)

------
cbsw
+ - * / don't even support multiple arguments. (+ 1 2 3) would be 3, stupid
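
For reference, the difference being complained about can be sketched as follows. This is an illustration of the two behaviours, not gherkin's actual code; `binary_add` and `variadic_add` are hypothetical names.

```python
from functools import reduce
import operator


def binary_add(args):
    """Add only the first two operands; any extras are silently ignored."""
    a, b = args[0], args[1]
    return a + b


def variadic_add(args):
    """Fold + over every operand, as most Lisps do; (+) with no args is 0."""
    return reduce(operator.add, args, 0)


two_arg_result = binary_add([1, 2, 3])     # behaves like the reported (+ 1 2 3)
folded_result = variadic_add([1, 2, 3])    # what a variadic + would return
```

With a binary-only primitive, (+ 1 2 3) comes out as 3; folding over the whole argument list gives 6.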

~~~
crnixon
Cool, awesome, great point. I tried Googling for your implementation and
couldn't find it. Could you drop a link?

------
mzs
There's a UUOC (useless use of cat) in strmap_file; in fact, all those uses of
head, tr, and tail could likely be handled by sed alone.

------
finin
interpeter => interpreter

