
A C89 compiler that produces executables that are also valid ASCII text files [pdf] - luu
http://www.cs.cmu.edu/~tom7/abc/paper.pdf
======
hannob
There was actually a tool, "com2txt", back in the DOS days, so you could
convert an executable to text and put it into an email...

Update: I have my data well sorted enough that I found it :-) It even comes
with code and is under a vague free license:
[https://github.com/hannob/com2txt](https://github.com/hannob/com2txt)

~~~
ptspts
Shameless plug: I also wrote a tool similar to com2txt in 1996:
[https://github.com/pts/pts-xcom](https://github.com/pts/pts-xcom) . A quick
comparison:

* com2txt.exe is 7110 bytes long, xcom.com is only 401 bytes long.

* xcom.com can also convert back from text to binary.

* xcom.com can also convert to data text (without the self-decoder header).

~~~
skocznymroczny
Can it also stop an alien invasion?

~~~
techolic
Send some aliens and test it out!

------
azendent
There's a nice video that does a great job of walking through what is
happening here:
[https://www.youtube.com/watch?v=LA_DrBwkiJA](https://www.youtube.com/watch?v=LA_DrBwkiJA)

~~~
bringtheaction
That is demoscene-level awesomeness! Kudos to the author. Excellent video as
well. And last but not least, a lovely ending <3

------
raphlinus
The phrase 'For example, on the popular and elegant X86 architecture, the
single byte 0xF4 is the "HLT" instruction' slays me every time.

~~~
metaobject
Can you explain this? Is it the "popular and elegant" part?

~~~
raphlinus
Absolutely. It's very subtle humor. Students of computer architecture consider
x86 to be one of the least elegant architectures around. Its many warts
include segment registers (originally a hacky workaround to stretch 64k of
memory to 1M), and an extremely complex instruction encoding employing prefix
bytes. Many of the legacy issues (such as not having enough registers) have
been papered over, leaving traces behind. Many people felt that the complexity
would doom the architecture, and that a cleaner, leaner RISC approach would
win out.

However, Intel has used their advantage in process technology to throw massive
amounts of transistors to make up for the problems caused by all this
complexity, and has done well. RISC has done well in the mobile space because
those transistors tend to be power-hungry, but everywhere else x86 is today
almost the only game in town.

One reason it's especially funny is that "HLT" is one of those legacy
instructions that has pretty much no use in a modern system, yet takes up a
whole slot in the byte encoding, while common operations like MOV or ADD often
require extra prefix bytes to specify the size of the operands.

Hope that helps!

~~~
CogitoCogito
> One reason it's especially funny is that "HLT" is one of those legacy
> instructions that has pretty much no use in a modern system

Is HLT no longer used in OS idle loops? Are there now other instructions which
are better to use instead?

~~~
fwsgonzo
It's absolutely used, for example in OS idle loops. It's a completely normal
and expected instruction to find on any x86 CPU, new or old.

------
modeless
On a related note, how about C source code that you can chmod +x and execute
directly, even with execve:
[https://gist.github.com/jdarpinian/84a28a1ed8a36313a4e0cad8b...](https://gist.github.com/jdarpinian/84a28a1ed8a36313a4e0cad8b3cd347f)

~~~
riking
Why go to all that trouble when you can just

    #!/usr/bin/tcc -run
    #include <stdio.h>
    int main() { puts("Hello, World!"); }

~~~
modeless
Because most people don't have tcc installed, and the convenience is ruined if
you have to install things for this to work. I really like that tcc has a flag
for this; GCC and Clang really should copy it.

------
merraksh
The histogram on the last page counts the occurrences of each character in the
paper (all of them printable, of course). But because the histogram's counts
are made of characters too, the author had to add a few extra numbers to make
the histogram "converge". Brilliant.

------
Y_Y
This is the work of Tom7, well known for other projects like Learnfun &
Playfun, ARST ARSW and running a marathon in hockey skates.

~~~
learc83
Thanks for pointing that out. I wouldn't have noticed who that was otherwise.

His video on learnfun/playfun is both hilarious and amazing.

[https://www.youtube.com/watch?v=xOCurBYI_gY](https://www.youtube.com/watch?v=xOCurBYI_gY)

------
hprotagonist
The good Doctor Murphy is a mad genius.

I laughed very hard indeed when I first read this and got to the last 30
seconds of the video.

------
theknarf
Meta-literate programs. Not only do you have the code and a document
describing it in one file, but that file is also the executable!

------
merraksh
Do I see a Sierpinski triangle on page 9? It's the code that, according to the
description, changes the value of the AL register.

~~~
bonzini
No, it is the _data_ that the compiler precomputes to help change the value
of the AL register. It is unused here. In fact, it is cropped to 160 columns
and has a caption in the middle, so it isn't even correct. He included it
just because it looks cool.

------
zmodem
The paper/executable starts with "ZM", but shouldn't a DOS .exe file start
with "MZ"?
([http://www.delorie.com/djgpp/doc/exe/](http://www.delorie.com/djgpp/doc/exe/))
What am I missing?

~~~
ChrisLomont
Either MZ or ZM works.

[1]
[https://en.wikipedia.org/wiki/DOS_MZ_executable](https://en.wikipedia.org/wiki/DOS_MZ_executable)

~~~
zmodem
TIL. I wonder if there’s any interesting reason for that.

------
zippzom
I must be missing something, but I don't see how the actual text of the paper
originates from the source code. Those C instructions actually compile into
the sentences of the paper as well?

~~~
moefh
From a quick look at the source[1], it seems the compiler will always generate
an executable with the text from the paper (which is read from the "paper/"
directory, and some bits hard-coded in the compiler source). Or something. I
don't really know SML.

From what I can tell, the .exe file generated by the compiler must be really
big anyway (since the relevant sizes in the header can't be small because they
have to be printable). So there must be some text, it might as well be the
paper.

[1]
[https://sourceforge.net/p/tom7misc/svn/HEAD/tree/trunk/abc/e...](https://sourceforge.net/p/tom7misc/svn/HEAD/tree/trunk/abc/exe.sml#l278)

~~~
zippzom
Ah, so all the x86 bytes that the text itself generates are basically just
filler around the sections that actually matter (i.e. the jumble of bytes that
appears)? They're never read or executed by the CPU?

------
_RPM
Anyone have a copy of the program? I want to try and run it.

~~~
loeg
The text file (
[http://www.cs.cmu.edu/~tom7/abc/paper.txt](http://www.cs.cmu.edu/~tom7/abc/paper.txt)
) is the program.

------
plogik
If I weren't so lazy, I'd buy a hat just to take it off to the author.

