
The art of creating very tiny programs for the 80x86 family of CPUs - kken
http://www.sizecoding.org/wiki/Main_Page
======
bArray
Shameless plug for an old project, a 512 byte kernel with 512 byte programs
(no DOS calls):
[https://bitbucket.org/danielbarry/saxoperatingsystem/src/mas...](https://bitbucket.org/danielbarry/saxoperatingsystem/src/master/)

~~~
jhallenworld
Nice, but I have to complain about bitbucket.. it's so annoying that your html
documents do not show up when browsing the bitbucket website. Github has them
beat on this one- just rename the htmls to md and it will mostly work.

I wrote similar things in the past, but for 6809. My equivalent these days is
whenever I have to deal with a new microcontroller, I port my personal CLI +
task queue schedular to it. Once I have a UART, timer, and CLI with history
running, then the actual application is usually a breeze.

~~~
bArray
>Nice, but I have to complain about bitbucket.. it's so annoying that your
html documents do not show up when browsing the bitbucket website. Github has
them beat on this one- just rename the htmls to md and it will mostly work.

Yeah it was mostly written before ever coming into contact with Git so wasn't
designed for that! Imagine a versioning system that relied on zipping the
source code and adding a version number! I'll fix it at some point, but it was
more for shelving and sharing.

>I wrote similar things in the past, but for 6809. My equivalent these days is
whenever I have to deal with a new microcontroller, I port my personal CLI +
task queue schedular to it. Once I have a UART, timer, and CLI with history
running, then the actual application is usually a breeze.

Exactly!

One thing I plan to do in the future is run a 8086/8088 VM on something like
an Arduino (or whatever is popular at the time). Getting op codes to working
fine is easy, getting the BIOS interrupts working is slightly harder.

------
dakics
I remember having great time writing this little 256-byte game:

[https://github.com/dakics/asm-tie-fighter](https://github.com/dakics/asm-tie-
fighter)

Back then, it often made a huge usability difference if you optimized in asm
critical inner loops or other time-consuming external methods, which would be
called from Clipper, Turbo Pascal/C, Modula-2 or alike. TSR-loaded "Norton
Guides" were indispensable. Good times. :)

~~~
ohadpr
I think my first assembly book was a Norton one :-)

------
ohadpr
Many folks who used to write 0x86 assembly code love size coding compos,
especially demoscene folks.

Me and a couple of friends were able to implement the game of Nibbles (Snake
on old Nokia phones) in 70 bytes. Someone else managed to accomplish it in 48.

[http://www.hugi.scene.org/compo/compoold.htm#compo1](http://www.hugi.scene.org/compo/compoold.htm#compo1)

~~~
gnulinux
How is this possible? I can't even produce "Hello World" program as short as
that, do you not use ELF format?

~~~
jaak
LOL this reminds of old flame wars on Usenet about the smallest possible
"Hello World!".

Someone would always eventually post "echo Hello World!" and the endless
arguments would ensue about whether or not that was actually a program.

~~~
rwmj
Why echo it?

    
    
        $ cat /tmp/hello 
        hello\ world
        $ /tmp/hello
        /tmp/hello: line 1: hello world: command not found
    

It prints hello world :-)

~~~
vardump
Great idea! I improved the "program":

    
    
      $ cat >hello
      Hello\ World\!
      $ chmod +x hello
      $ ./hello
      ./hello: line 1: Hello World!: command not found
    

V0.2 changelog:

* fix: adheres to Hello World! standard now

------
mmjaa
Smallest program I ever sold, for near-5-digit profit, was a "REBOOT.COM"
which saved a sysadmin from hand-rebooting 200 machines, daily, and which
earned me $9,000 on account that it was my monthly rate at the time, and its
all I did that month, by circumstance, anyway ... it was a different time, and
I honestly felt I deserved it, then[1].

C:\> debug.com n JMP FFF0 w reboot.com q ^Z

(or at least, it was something like that...)

[1]- Now, not so much.

~~~
Someone
Reminds me of GO.COM. It was smaller, and made its programmer money, too
([http://peetm.com/blog/?p=55](http://peetm.com/blog/?p=55))

(Sorry, source code not available)

~~~
weinzierl
> (Sorry, source code not available)

ISWYDT;-)

Spoiler: GO.COM is 0 bytes long, yet was incredibly useful back at the time.

------
dividuum
We have the long tradition of sending encrypted or puzzle-like postcards to
our local hackerspace. Three years ago we created this postcard:
[https://entropia.de/Datei:2015-01-27-back.jpg](https://entropia.de/Datei:2015-01-27-back.jpg)

It's 200 bytes, a valid x85 ELF program with a deceptive 'strings' greeting
message and an additional hidden message. Run at your own risk ;-) The machine
readable version can be found here:
[https://entropia.de/Cryptierte_Postcarten](https://entropia.de/Cryptierte_Postcarten)

------
Scaevolus
x86 registers were designed with purposes in mind, and many of the 1-byte
opcodes reflect this-- for example, "A4 MOVS mb,mb: Move byte [SI] to ES:[DI],
advance SI,DI":
[https://www.swansontec.com/sregisters.html](https://www.swansontec.com/sregisters.html)

And it's fundamentally an octal machine:
[http://www.dabo.de/ccc99/www.camp.ccc.de/radio/help.txt](http://www.dabo.de/ccc99/www.camp.ccc.de/radio/help.txt)

~~~
pvg
'Octal machine' seems a little overstated (I know, that's what the ole post
says), 'fundamentally' fundamentally so.

~~~
ddingus
Did you read the linked material?

Looking at instruction encoding through an octal lens makes a lot of sense.

We should allow some license here. A discussion of, say the ALU, may likely
yield a different statement.

"Machine", viewed theou various lenses, bit depth, ALU, clock, instructions,
registers or lack of, works best in context.

Just a thought. :D

~~~
pvg
_Did you read the linked material?_

Did you read my comment that specifically mentions it?

 _Looking at instruction encoding through an octal lens makes a lot of sense._

Sure. But it doesn't make it an 'octal machine', let alone fundamentally so.
Lots of binary things exhibit easier-to-see patterns when presented as octal,
that's why octal is around. But it's a bit of convenient numerology not some
intrinsic aspect of the (very binary) machine.

------
peterisza
I wrote this animated 80-byte heart shaped tunnel and sent it in a text
message to a geek girl in high school:
[http://www.pouet.net/prod.php?which=12270](http://www.pouet.net/prod.php?which=12270)

~~~
delian66
It is a _very_ nice effect! It still runs under DOSBox:

    
    
        echo "B013CD106800A00789C30FAFDB01D0F7E801C3740C31D266B8C027090066F7F3414929C8243F0420AA89F831D2BB4001F7F32D780081EAA0007FCDF7DAE9C8FF" | xxd -r -p - > heart.com ; dosbox heart.com

------
faissaloo
I created a fan daemon for my Macbook in 864 bytes:
[https://github.com/faissaloo/mfcd](https://github.com/faissaloo/mfcd)

I also made echo in 153 bytes:
[https://github.com/faissaloo/echo](https://github.com/faissaloo/echo)

------
bitwize
Meanwhile, the remnants of the Lost Amiga Civilization snicker, and mutter
something about 68k instructions, their many address modes, and custom chips
between coughs.

~~~
kken
[https://amycoders.org/](https://amycoders.org/)

------
wybiral
The smallest "Hello world!" I ever managed was 86 bytes:
[https://github.com/wybiral/tiny-elf](https://github.com/wybiral/tiny-elf)

Edit: It's only possible by moving the string and some of the opcodes into the
ELF header section.

------
gesman
In my young and reckless years i loved to [ab]use INT 13 to literally
mechanically destroy hard drives with 12 bytes programs.

------
sehugg
You could even make a chess program in 392 bytes:
[http://nanochess.org/chess6.html](http://nanochess.org/chess6.html)

------
jhallenworld
How about this? A single block boot loader that understands the ext2
filesystem, so can load the kernel without having to "update-grub":

[https://github.com/jhallen/joes-
sandbox/tree/master/boot](https://github.com/jhallen/joes-
sandbox/tree/master/boot)

Bonus: it tries to read tracks at a time to make booting off of floppies
faster (I wrote it in the 90s..)

------
kragen
My only real effort in this direction was a 64-byte MS-DOS .COM file that
draws colored circles on the screen:

[http://canonical.org/~kragen/sw/dev3/morecircles.s](http://canonical.org/~kragen/sw/dev3/morecircles.s)

    
    
        00000000: b013 cd10 c42f 89e8 f72e 0000 89c5 31c6  ...../........1.
        00000010: 31d2 89c1 31db 88f7 89df c1fb 0201 df89  1...1...........
        00000020: c3c1 fb08 01df 01f7 5b26 883d 4353 89c3  ........[&.=CS..
        00000030: c1fb 0401 da89 d3c1 fb04 29d8 e2d6 ebc6  ..........).....
    

There's a lot to learn from tiny demos like the ones (I didn't write)
dissected in
[http://canonical.org/~kragen/demo](http://canonical.org/~kragen/demo).

Oh, I also wrote a Linux HTTP server in assembly with an executable under 2000
bytes:
[http://canonical.org/~kragen/sw/dev3/server.s](http://canonical.org/~kragen/sw/dev3/server.s)
with some explanation in [http://canonical.org/~kragen/sw/dev3/httpdito-
readme](http://canonical.org/~kragen/sw/dev3/httpdito-readme). I use it
occasionally for testing stuff, since it's slightly less hassle than using
python -m SimpleHTTPServer. Previous HN discussion at
[https://news.ycombinator.com/item?id=6908064](https://news.ycombinator.com/item?id=6908064)

The demoscene diskmag "hugi" did a series of sizecoding competitions in
1998–2009, with truly astounding results.
[http://www.hugi.scene.org/compo/hcompo.htm](http://www.hugi.scene.org/compo/hcompo.htm)
Highlights include a working textmode Pong game in 142 bytes, a Brainfuck
interpreter in 98 bytes, a Snake game in 48 bytes, and a perfect Tic-Tac-Toe
player in 213 bytes.

The 8086 is a reasonable design for this kind of nonsense, since it's full of
shortcuts and special cases that you can take advantage of. IBNIZ is an
architecture really optimized for it, though:
[http://pelulamu.net/ibniz/](http://pelulamu.net/ibniz/) and in practice I
often find that I can get things smaller with ARM Thumb-2 code than with the
8086. For example,
[https://github.com/kragen/dumpulse](https://github.com/kragen/dumpulse) is
about 350 bytes of code on most architectures, but just over 200 bytes on a
big-endian ARM.

------
joe_the_user
Is there an equivalent for tiny bytecode programs?

The general list of tricks seem focused on either reusing code-data by just
finding some byte/nibble you need in the code or finding op-codes that happen
to accomplish multi-op-code effects.

I'd like to what you could get with self-modifying code. But naturally on
80x86, that's suicide since the code very, very heavily cached.

~~~
kccqzy
The CPU most definitely still supports self-modifying code. Intel takes
compatibility seriously.

~~~
kragen
Oh, yeah, it's supported. But it's also slow.

~~~
userbinator
"continuously modifying" code, yes.

But do something like modify a constant used in a tight loop once before it,
and you will likely get _better_ performance, especially since you now don't
have to use a register or an extra memory access to load that constant.

That is, after all, how JITs work.

One of the best examples is something like an image codec --- input images
have diferent widths, heights, bits per channel, channels, coding modes, etc.
but they are constants throughout the same image. You can essentially
"compile" a decoder specialised to the exact image you're decoding, with all
the loop counters replaced with constants and the branches eliminated to
directly execute the code they would otherwise jump to.

~~~
kccqzy
I feel like this is something the compiler could do automatically. Give it a
hint, let it do some alias analysis and discover that a particular variable
won't change throughout a loop. And then let it treat that variable as a
constant (as an immediate in instructions) and generate the code to modify it
at the beginning. Perhaps let it pass on this information so that eventually
the section could be mapped W+X.

~~~
kragen
Self-modifying code to optimize a hot loop is only a useful thing to do on a
machine without a split I/D level-1 cache. All modern high-performance CPUs
will suffer an instruction cache flush on each iteration of the hot loop,
which will kill your performance. Also, though, even on CPUs with no
instruction cache, it's probably a very minor improvement at best. Suppose you
write the variable in one place and read it in 4 places. If no register is
allocated to the variable, on a load-store machine such as a RISC, the usual
approach involves one store and four loads. But if you use self-modifying code
instead, you have four stores (and no loads), but those stores are modifying
only part of an instruction word — so, on most hardware, they actually require
a preceding load, even if the architecture tries to hide that from you.

I'm convinced there _are_ cases where this kind of optimization actually
speeds up your code, but they are pretty unusual.

------
akrasuski1
Another nice site aggregating many clever tiny x86 programs:
[https://www.xorpd.net/pages/xchg_rax/snip_00.html](https://www.xorpd.net/pages/xchg_rax/snip_00.html)

~~~
agumonkey
maybe not the tiniest but reddit.com/r/tinycode has some

------
amelius
The smaller the problem, the easier it is to brute-force the solution.

