

Bootstrapping a simple compiler from nothing - yzg_new
http://homepage.ntlworld.com/edmund.grimley-evans/bcompiler.html

======
kragen
My own experiments along these lines are in StoneKnifeForth, a self-hosted
compiler for a variant of Forth: <http://github.com/kragen/stoneknifeforth/>
and tokthr, where I'm hoping I can get to a usable interactive interpreter in
under 2K: <https://github.com/kragen/tokthr>

I think I need to correct stoneknifeforth a bit to make it run on current
Linux.

~~~
spc476
Hm ... you're using Python to bootstrap. Much harder would be to assume no
languages whatsoever and go from there (some musings I've done on this type of
project: <http://boston.conman.org/2009/11/05.1>)

~~~
kragen
In MS-DOS, you don't need an octal-to-binary conversion program, assuming
you're content writing your next program in decimal instead of binary, because
you can type any 8-bit byte with Alt and the numeric keypad. You can also
write useful 8086 machine code using only printable ASCII, but that's probably
making things hard on yourself unnecessarily.

Did 1982 MS-DOS have I/O redirection? You're right that that would help quite
a bit.

I think one of my first programs in that environment _would_ be some kind of
editor. Maybe a hex or octal editor rather than a text editor, but an editor
nonetheless. Typing programs more than a few dozen bytes with no ability to
see or fix typos gets old fast.

I don't know how much harder the job really gets if you bootstrap directly
from machine code instead of Python. I don't think it's that much.

~~~
kragen
> Did 1982 MS-DOS have I/O redirection? You're right that that would help
> quite a bit.

According to Ralf Brown's interrupt list,
<http://www.ctyme.com/intr/rb-2554.htm>, MS-DOS 1.0 did not support
redirecting standard output; that was a feature added in MS-DOS 2.0, which
didn't come out until 1983:
<http://en.wikipedia.org/wiki/Timeline_of_x86_DOS_operating_systems>

So if you really had to solve this problem on 1982 MS-DOS, you'd need to open
an output file with a file control block and INT 21 with AH=0F. (And if you
really can't type NUL bytes on the keyboard, I think you are going to have to
write some code to zero out most of the FCB at start time, too. Ick.)
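
To make the NUL problem concrete, here is a rough Python sketch of what an unopened FCB image looks like. The 37-byte field layout is the standard non-extended FCB as I understand it from the usual DOS references, not something stated in this thread, and the file name OUT.BIN and the helper name `unopened_fcb` are made up for illustration.

```python
import struct

# Assumed layout of a standard (non-extended) DOS FCB, 37 bytes in total.
# "<" means little-endian with no padding between fields.
FCB_FORMAT = "<B8s3sHHIHH8sBI"


def unopened_fcb(name: str, ext: str, drive: int = 0) -> bytes:
    """Build the FCB image a program would set up before asking DOS to open a file."""
    return struct.pack(
        FCB_FORMAT,
        drive,                                   # 0 = default drive, 1 = A:, ...
        name.upper().ljust(8).encode("ascii"),   # blank-padded file name
        ext.upper().ljust(3).encode("ascii"),    # blank-padded extension
        0,          # current block number
        0,          # logical record size
        0,          # file size
        0,          # date of last write
        0,          # time of last write
        bytes(8),   # reserved
        0,          # record within current block
        0,          # random-access record number
    )


fcb = unopened_fcb("OUT", "BIN")  # hypothetical output file name
print(len(fcb), "bytes,", fcb.count(0), "of them NUL")
```

With the default drive, everything except the 11 name and extension bytes is zero, which is why a keyboard that can't produce NUL forces the program to clear the block itself.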

This program might solve that problem for MS-DOS 2+, although I haven't tested
it yet, and you'd have to use something like JKLMNO instead of ABCDEF for your
hex digits past 9:

    
    
         100:	b4 01                	mov    $0x1,%ah
         102:	cd 21                	int    $0x21
         104:	24 0f                	and    $0xf,%al
         106:	88 c2                	mov    %al,%dl
         108:	c0 e2 04             	shl    $0x4,%dl
         10b:	cd 21                	int    $0x21
         10d:	24 0f                	and    $0xf,%al
         10f:	08 c2                	or     %al,%dl
         111:	b4 02                	mov    $0x2,%ah
         113:	cd 21                	int    $0x21
         115:	eb e9                	jmp    100 <loop>
    
        kragen@VOSTRO9:~/devel/misc$ xxd < hexconverter.com
        0000000: b401 cd21 240f 88c2 c0e2 04cd 2124 0f08  ...!$.......!$..
        0000010: c2b4 02cd 21eb e9                        ....!..
        kragen@VOSTRO9:~/devel/misc$ od -t u1 < hexconverter.com
        0000000 180   1 205  33  36  15 136 194 192 226   4 205  33  36  15   8
        0000020 194 180   2 205  33 235 233
        0000027
    

I think you could probably create that program successfully with COPY CON in
not very many tries, but it's not much better than COPY CON itself. I don't
think INT 21 function 01 even supports backspace, and in that sense it might
actually be _worse_.
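
For anyone who doesn't want to trace the machine code, the loop above behaves roughly like this Python sketch. It only models the nibble arithmetic (keep the low four bits of each keystroke, shift the first left by four, combine with the second); the function name `hexconv` is mine, and DOS console behavior such as echo and Ctrl-C handling is not modeled.

```python
def hexconv(keys: str) -> bytes:
    """Emulate the converter loop: two keystrokes in, one byte out."""
    if len(keys) % 2:
        raise ValueError("need an even number of keystrokes")
    out = bytearray()
    for first, second in zip(keys[0::2], keys[1::2]):
        high = (ord(first) & 0x0F) << 4   # and $0xf,%al ; mov %al,%dl ; shl $0x4,%dl
        low = ord(second) & 0x0F          # and $0xf,%al
        out.append(high | low)            # or %al,%dl, then written out via AH=02
    return bytes(out)


print(hexconv("K4").hex())  # b4: 'K' is 0x4B, so its low nibble is 0xB
print(hexconv("A4").hex())  # 14: 'A' is 0x41, so its low nibble is only 1
```

That second line is why ABCDEF don't work as hex digits here: their low nibbles are 1 through 6, whereas J through O are 0x4A through 0x4F and so land on A through F.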

~~~
kragen
In case anyone's interested, I successfully entered the following version in
DOSBOX using COPY CON, and indeed even used it to recreate a copy of itself:

    
    
        kragen@inexorable:~/devel/inexorable-misc$ od -t u1 hexconv.com
        0000000 180   1 205  33  36  15 136 194 192 226   4 205  33  36  15   1
        0000020 194 180   2 205  33 235 233
        0000027
    

It turns out the 08 opcode byte of "or %al,%dl" was getting interpreted by
COPY CON as a backspace, so this version adds AX to DX instead. (I sure am
glad I didn't have to debug that using TYPE.) This program has advantages and
disadvantages compared to COPY CON:

1. It echoes its input so you can see, at least in theory, if you made a
mistake.

2. You only need to type two keystrokes instead of one to four per byte.

3. There are no forbidden bytes. At least in Dosbox, COPY CON converts ^@ to
the sequence 00 03, interprets 08 as backspace (even if entered on the keypad
with Alt), and stops copying when it gets a carriage return, even if the
carriage return was entered with Alt.

4. To exit, you must reboot. In Dosbox, the output has already been flushed
to the filesystem, but I have my doubts about whether, on MS-DOS, every single
INT 21H AH=2 would write a 512-byte floppy disk sector with the newly appended
byte. On the other hand, you could probably just mash some key on autorepeat
long enough to fill up a couple of sectors, and you'd be good to go. This
program, after all, only has to be sufficient to enter the next phase of the
bootstrap, one with backspace and explicit termination.

5. There is no backspace, so if you hit any incorrect keystrokes, you must
start over. For programs of this size, that's not a major constraint (it's
pretty easy to carefully enter 40 or 50 or 100 keystrokes without making any
errors), but it becomes more serious as programs get larger.
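
The forbidden-byte problem and the two-keystroke encoding from this comment can be checked with a short Python sketch. The byte values are copied from the two `od -t u1` listings in this thread, and the forbidden set is just the three bytes reported for COPY CON in Dosbox above; the helper names are invented for illustration, and real MS-DOS may treat other bytes specially too.

```python
# First version (with "or", opcode 08) and second version (with "add", opcode 01).
V1 = bytes([180, 1, 205, 33, 36, 15, 136, 194, 192, 226, 4, 205,
            33, 36, 15, 8, 194, 180, 2, 205, 33, 235, 233])
V2 = V1[:15] + bytes([1]) + V1[16:]

# Bytes COPY CON was reported to mangle in Dosbox: NUL, backspace, carriage return.
FORBIDDEN = {0x00, 0x08, 0x0D}

# Digits whose low nibble equals their value: 0-9, then J-O instead of A-F.
DIGITS = "0123456789JKLMNO"


def forbidden_offsets(program: bytes) -> list[int]:
    """Offsets of bytes that could not be typed into COPY CON unchanged."""
    return [i for i, b in enumerate(program) if b in FORBIDDEN]


def to_keystrokes(program: bytes) -> str:
    """Keystrokes to feed the converter so that it writes out `program`."""
    return "".join(DIGITS[b >> 4] + DIGITS[b & 0x0F] for b in program)


def from_keystrokes(keys: str) -> bytes:
    """What the converter emits: low nibbles of each keystroke pair, combined."""
    return bytes(((ord(a) & 0x0F) << 4) | (ord(b) & 0x0F)
                 for a, b in zip(keys[0::2], keys[1::2]))


print(forbidden_offsets(V1))  # [15]: the 08 byte
print(forbidden_offsets(V2))  # []
keys = to_keystrokes(V2)
print(len(keys), keys)
assert from_keystrokes(keys) == V2  # the converter can recreate itself
```

The round trip at the end mirrors using the program to recreate a copy of itself: 23 bytes come out of 46 keystrokes.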

~~~
kragen
Further details at
<http://lists.canonical.org/pipermail/kragen-hacks/2011-April/000519.html>.

------
haberman
Sounds like a fun exercise for anyone who has just read "Reflections on
Trusting Trust." "You can't trust code that you did not totally create
yourself." Well what if I _did_ totally create all the code myself?

I guess you'd still be susceptible to microcode, so you'd have to create your
own CPU too. Sigh, back to the drawing board. :)

~~~
sp332
If you didn't write quantum mechanics yourself, how do you know it's not out
to get you? :)

~~~
jcitme
you HAD to make me write new laws of physics? you %@#$

------
Sapient
I have fantasized about doing this for 15 years, and after reading this, I
think I have finally got the motivation to do it.

~~~
kragen
I'm interested to see your results when you get done!

------
zerd
Combine this with a guide to making your own OS:
<http://www.jamesmolloy.co.uk/tutorial_html/index.html> and you've got a very
interesting project.

------
erikpukinskis
This is my dream vacation.

------
swah
Just text, formatted for 80 cols. Does anyone else really like reading stuff
in this "format"?

~~~
marshray
I do.

~~~
sathyabhat
Makes reading on cell phones a pain though.

~~~
sbierwagen
Amazing that 80-column unstyled ASCII text, which is supposed to be the
absolutely lowest level of abstraction, able to pass through even the dumbest
mail servers without being mangled, able to be read by any computer made in
the last 40 years (Windows boxes, Mac boxes, Sun boxes, dumb terminals,
propped-up toasters running NetBSD), is finally stymied by the ultimate
low-end machine: one with a screen that isn't physically large enough for
80-column text.

~~~
kragen
Lots of home computers from 30 years ago couldn't fit 80 columns on the TV
screen.

