
X86 assembler in Bash (2001) - gioele
http://lists.gnu.org/archive/html/bug-bash/2001-02/msg00054.html
======
s_tec
It warms my heart to see such a brilliant hack. It's especially impressive
that this doesn't use any external tools like sed.

If you scroll to the bottom, you can see how it works. Each x86 instruction is
actually a shell function, and the input to the assembler is itself a
shell script. Each line of the input script basically tells the assembler
which bytes to output. There is some fun stuff for dealing with variable-length
x86 jump instructions (a jump's encoded size depends on how far it goes), but
otherwise that's the basic idea.
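
A minimal sketch of the same idea in Python. It is not a port of the Bash code, and the class `Asm` and its method names are hypothetical. Each instruction is a method that records bytes, and the "assembly program" is just code calling those methods. The jump-size problem is handled by starting every `jmp` in its 2-byte short form (EB rel8) and widening it to the 5-byte near form (E9 rel32) only when the displacement doesn't fit. The layout is then redone until nothing changes. The Bash assembler may well resolve jumps differently.

```python
# Sketch of an "instructions are functions" assembler (hypothetical API).
# Only a few 32-bit x86 encodings are modeled.

class Asm:
    def __init__(self) -> None:
        # Each item is ("bytes", b), ("label", name) or ("jmp", name).
        self.items: list = []

    # --- "instructions": each one just records what to emit ---
    def label(self, name: str) -> None:
        self.items.append(("label", name))

    def nop(self) -> None:
        self.items.append(("bytes", b"\x90"))

    def ret(self) -> None:
        self.items.append(("bytes", b"\xc3"))

    def mov_eax(self, imm: int) -> None:  # mov eax, imm32 -> B8 id
        self.items.append(("bytes", b"\xb8" + (imm & 0xFFFFFFFF).to_bytes(4, "little")))

    def jmp(self, target: str) -> None:
        self.items.append(("jmp", target))

    # --- variable-length jumps ---
    def _layout(self, long_jmps: set) -> dict:
        """Assign an address to every label, given which jumps are long."""
        addrs, pc = {}, 0
        for i, (kind, arg) in enumerate(self.items):
            if kind == "label":
                addrs[arg] = pc
            elif kind == "bytes":
                pc += len(arg)
            else:
                pc += 5 if i in long_jmps else 2
        return addrs

    def assemble(self) -> bytes:
        # Start with every jump short (EB rel8); widen any jump whose
        # displacement doesn't fit to near form (E9 rel32) and redo the
        # layout. Jumps only ever grow, so this loop terminates.
        long_jmps: set = set()
        while True:
            addrs = self._layout(long_jmps)
            pc, grew = 0, False
            for i, (kind, arg) in enumerate(self.items):
                if kind == "bytes":
                    pc += len(arg)
                elif kind == "jmp":
                    size = 5 if i in long_jmps else 2
                    if size == 2 and not -128 <= addrs[arg] - (pc + 2) <= 127:
                        long_jmps.add(i)
                        grew = True
                    pc += size
            if not grew:
                break
        # Final pass: emit bytes with displacements relative to the next instruction.
        out = bytearray()
        for i, (kind, arg) in enumerate(self.items):
            if kind == "bytes":
                out += arg
            elif kind == "jmp":
                size = 5 if i in long_jmps else 2
                disp = addrs[arg] - (len(out) + size)
                opcode = b"\xeb" if size == 2 else b"\xe9"
                out += opcode + disp.to_bytes(size - 1, "little", signed=True)
        return bytes(out)


# The "assembly source" is ordinary code:
a = Asm()
a.label("top")
a.nop()
a.jmp("top")
print(a.assemble().hex(" "))  # 90 eb fd
```

Widening one jump can push other jumps out of range. That is why the layout is recomputed until it stabilizes instead of being fixed in a single pass.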

The new Mill CPU architecture actually uses a similar idea in its assembler.
Each Mill instruction is a C++ function. To assemble Mill software, the
first step is to run the "assembly language" through a C++ compiler. The
resulting program emits the appropriate Mill machine code when it runs. This
is an interesting approach, since it turns the assembler into a reusable piece
of software.

Another side benefit of this technique is that it gives the assembler a
super-powerful macro language. In this case, the macro language is basically
Bash itself. So you want to emit 100 add instructions? Just put the
instructions inside a Bash `for` loop!
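
The same trick in a Python sketch, where the host language's loop is the macro facility. `add_eax_imm8` is a hypothetical helper. It emits the standard 32-bit x86 encoding of `add eax, imm8` (opcode 83 /0 ib, with ModRM byte C0 selecting EAX).

```python
# Sketch: a plain host-language loop acts as the assembler's macro language.

def add_eax_imm8(out: bytearray, imm: int) -> None:
    """Append `add eax, imm8` (83 C0 ib); imm is sign-extended by the CPU."""
    if not -128 <= imm <= 127:
        raise ValueError("imm8 out of range")
    out.extend(bytes([0x83, 0xC0]) + imm.to_bytes(1, "little", signed=True))

code = bytearray()
for _ in range(100):      # "emit 100 add instructions"
    add_eax_imm8(code, 1)
code.append(0xC3)         # ret

print(len(code))          # 301
```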

~~~
fit2rule
Warms my heart too, but what interests me most about this story is that I
recall assemblers being written this way as a sort of standard technique in
the '70s and '80s. It seems to have become a lost art, now recovered (or at
least it was in 2001). I remember some Tandem (or perhaps Wang?) machines used
the shell to emit assembly instructions, and in the days of MIPS as a hardware
manufacturer (RISCOS pizzabox) there were such assemblers, very rudimentary,
for the boot console, so you could emit code to start the machines.

{Hmm, what is the term for this? It happens so often that I'm sure there must
be a name for it: old stuff becoming new and interesting again.}

~~~
vidarh
> and in the days of MIPS as a hardware manufacturer (RISCOS pizzabox)

I was very confused there for a moment. I'd never heard of a RISC OS pizza box
in any context other than the Acorn / ARM machines. In case anyone else is
similarly confused: this is RISC/os, the Unix for MIPS-based systems.

~~~
fit2rule
Right, sorry for the confusion, and thanks for clearing that up.

------
4ad
Related, aaa, by Henry Spencer[1]:
[http://doc.cat-v.org/henry_spencer/amazing_awk_assembler/](http://doc.cat-v.org/henry_spencer/amazing_awk_assembler/)

    "aaa" (the Amazing Awk Assembler) is a primitive assembler written entirely
    in awk and sed.  It was done for fun, to establish whether it was possible.
    It is; it works.

Also awf, again by Henry Spencer:
[http://doc.cat-v.org/henry_spencer/awf/](http://doc.cat-v.org/henry_spencer/awf/)

    This is awf, the Amazingly Workable Formatter -- a "nroff -man" or
    (subset) "nroff -ms" clone written entirely in (old) awk.

[1]
[http://en.wikipedia.org/wiki/Henry_Spencer](http://en.wikipedia.org/wiki/Henry_Spencer)

------
rwmj
On a similar topic, FORTH assemblers are interesting. You write FORTH code
like:

    : RDTSC
       RDTSC
       EAX PUSH
       EDX PUSH
    ;CODE

which compiles to a wrapper that runs the rdtsc instruction and pushes the
result (two 32-bit words) onto the FORTH stack.
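
Forth assemblers are postfix. Operand words such as `EAX` come first, and the instruction word (`PUSH`) consumes them. Below is a tiny Python sketch of that evaluation order using a hypothetical minimal word set. The bytes are the standard 32-bit encodings (RDTSC = 0F 31, PUSH r32 = 50+rd). Whatever the Forth system appends to finish a code definition (for example a jump to NEXT) is left out.

```python
# Sketch of a postfix (Forth-style) assembler: register words push a register
# number onto an assembler-time stack; instruction words pop it and emit bytes.

REGS = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

def assemble(source: str) -> bytes:
    stack: list = []
    out = bytearray()
    for word in source.split():
        if word in REGS:
            stack.append(REGS[word])        # operand: remember the register
        elif word == "PUSH":
            out.append(0x50 + stack.pop())  # push r32 = 50+rd
        elif word == "POP":
            out.append(0x58 + stack.pop())  # pop r32 = 58+rd
        elif word == "RDTSC":
            out += b"\x0f\x31"              # EDX:EAX <- time-stamp counter
        else:
            raise ValueError(f"unknown word: {word}")
    return bytes(out)

# EAX (low half) is pushed first and EDX (high half) last, so the high cell
# ends up on top, which is how Forth lays out a double-cell number.
print(assemble("RDTSC EAX PUSH EDX PUSH").hex(" "))  # 0f 31 50 52
```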

Which reminds me, I must finish this one:
[http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth...](http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth.f;h=5c1309574ae1165195a43250c19c822ab8681671;hb=HEAD#l1627)

~~~
dvdkhlng
[Edit: yes, the code above seems to be valid for the author's non-standard
"jonesforth" implementation of Forth. ;CODE is defined in the ANS Forth
standard to mean something very different.]

Sorry for the nitpicking, but the code you give is most likely wrong.
Assembler code in Forth is enclosed in CODE...END-CODE. ;CODE is used to
attach machine-code run-time semantics to words created with CREATE [1].

    CODE RDTSC  ( -- d )
       RDTSC
       EAX PUSH
       EDX PUSH
    END-CODE

Here RDTSC and PUSH are not compiled but executed immediately to output the
corresponding machine code to the current definition (which also uses the name
RDTSC albeit in a different vocabulary).

You can generate machine code by invoking the assembler's words from Forth
words (a "word" is what other languages call a "function"). This can be
used as a simple macro facility or to generate machine code dynamically,
do automatic register allocation, etc.

    ALSO ASSEMBLER
    : my-macro      RDTSC     EAX PUSH     EDX PUSH ;
    CODE RDTSC   my-macro  END-CODE

BTW, for those interested, [2] is the source of the x86 assembler, written in
Forth, that ships with GNU Forth (Gforth). The amd64 version [3] even supports
SSE. When writing assembler code in Gforth, the non-standard ABI-CODE facility
[4] is preferable to CODE...END-CODE.
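
The `my-macro` idea can be mimicked in any host language. A macro is just an ordinary function that calls the instruction functions at assembly time, so it can also take parameters and generate code from data. The Python sketch below is illustrative only, and every function name in it is hypothetical. The encodings are 32-bit x86. `save_regs` and `restore_regs` show the "dynamically generate machine code" point: they push a list of registers and pop them in reverse order.

```python
# Sketch: assembler "macros" as ordinary functions calling instruction functions.

REGS = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

def rdtsc(out: bytearray) -> None:
    out.extend(b"\x0f\x31")

def push(out: bytearray, reg: str) -> None:
    out.append(0x50 + REGS[reg])   # push r32 = 50+rd

def pop(out: bytearray, reg: str) -> None:
    out.append(0x58 + REGS[reg])   # pop r32 = 58+rd

def my_macro(out: bytearray) -> None:
    # Analogue of `: my-macro RDTSC EAX PUSH EDX PUSH ;`
    rdtsc(out)
    push(out, "EAX")
    push(out, "EDX")

def save_regs(out: bytearray, regs: list) -> None:
    for r in regs:                 # code generated from a data structure
        push(out, r)

def restore_regs(out: bytearray, regs: list) -> None:
    for r in reversed(regs):       # pop in the opposite order
        pop(out, r)

body = bytearray()
my_macro(body)
print(body.hex(" "))               # 0f 31 50 52

wrapped = bytearray()
saved = ["EBX", "ESI", "EDI"]
save_regs(wrapped, saved)
rdtsc(wrapped)                     # clobbers only EAX and EDX
restore_regs(wrapped, saved)
print(wrapped.hex(" "))            # 53 56 57 0f 31 5f 5e 5b
```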

[1]
[http://www.forth200x.org/documents/forth13-1.pdf](http://www.forth200x.org/documents/forth13-1.pdf)

[2]
[http://git.savannah.gnu.org/cgit/gforth.git/tree/arch/386/as...](http://git.savannah.gnu.org/cgit/gforth.git/tree/arch/386/asm.fs)

[3]
[http://git.savannah.gnu.org/cgit/gforth.git/tree/arch/amd64/...](http://git.savannah.gnu.org/cgit/gforth.git/tree/arch/amd64/asm.fs)

[4]
[http://www.complang.tuwien.ac.at/anton/euroforth/ef10/papers...](http://www.complang.tuwien.ac.at/anton/euroforth/ef10/papers/ertl.pdf)

------
luiz_andrade
"shasm doesn't use eval."

------
goldenkey
Reminds me of TempleOS [0]

[0] [http://www.templeos.org/](http://www.templeos.org/)

~~~
zypeh
So hardcore, and so is the name.

------
jfe
Neat, but I'll stick to NASM, thanks.

------
integricho
I am impressed.

