I wrote similar things in the past, but for the 6809. My equivalent these days is that whenever I have to deal with a new microcontroller, I port my personal CLI + task queue scheduler to it. Once I have a UART, timer, and CLI with history running, the actual application is usually a breeze.
Yeah it was mostly written before ever coming into contact with Git so wasn't designed for that! Imagine a versioning system that relied on zipping the source code and adding a version number! I'll fix it at some point, but it was more for shelving and sharing.
>I wrote similar things in the past, but for the 6809. My equivalent these days is that whenever I have to deal with a new microcontroller, I port my personal CLI + task queue scheduler to it. Once I have a UART, timer, and CLI with history running, the actual application is usually a breeze.
One thing I plan to do in the future is run an 8086/8088 VM on something like an Arduino (or whatever is popular at the time). Getting the opcodes working is easy; getting the BIOS interrupts working is slightly harder.
Back then, it often made a huge usability difference if you optimized critical inner loops or other time-consuming routines in asm, which would be called from Clipper, Turbo Pascal/C, Modula-2, or the like. TSR-loaded "Norton Guides" were indispensable. Good times. :)
Me and a couple of friends were able to implement the game of Nibbles (Snake on old Nokia phones) in 70 bytes. Someone else managed to accomplish it in 48.
Someone would always eventually post "echo Hello World!" and the endless arguments would ensue about whether or not that was actually a program.
$ cat /tmp/hello
hello world
$ /tmp/hello
/tmp/hello: line 1: hello world: command not found
$ cat >hello
Hello World!
$ chmod +x hello
$ ./hello
./hello: line 1: Hello World!: command not found
* fix: adheres to Hello World! standard now
(or at least, it was something like that...)
- Now, not so much.
(Sorry, source code not available)
Spoiler: GO.COM is 0 bytes long, yet was incredibly useful back in the day.
- why 200 machines needed rebooting
- how this program achieved that without any form of networking
Also, the closest other thing I'm aware of with a similar size-to-profit ratio is K:
> Perhaps conscious that with the occasional wrong result from an expression, the interpreter could be mistaken for a post-doctoral project, Whitney commented brightly, “Well, we sold ten million dollars of K3 and a hundred million of K4, so I guess we’ll sell a billion dollars worth of this.”
> Someone asked about the code base. “Currently it’s 247 lines of C.” Some expressions of incredulity. Whitney displayed the source, divided between five text files so each would fit entirely on his monitor. “Hate scrolling,” he mumbled.
For whatever reason, the sysadmins decided to reboot them daily as part of their shutdown procedure. If I recall correctly, it was because of a bug in one of the myriad TSRs that were loaded to make the systems work properly on the network - something to do with not getting accurate clock settings from the network - and indeed, they were all networked over 10Base2 .. so maybe that was it.
It was a really fun and easy hack to make. ;)
This shows that somebody thought it was a good idea to write ntldr as a valid DOS program that actually cares about the state of the DOS it runs on. (ntldr is actually a concatenation of a 16-bit MZ EXE and two COFF images, i.e. something that DOS is perfectly willing to load and execute.)
It's 200 bytes: a valid x86 ELF program with a deceptive 'strings' greeting message and an additional hidden message. Run at your own risk ;-) The machine-readable version can be found here: https://entropia.de/Cryptierte_Postcarten
And it's fundamentally an octal machine: http://www.dabo.de/ccc99/www.camp.ccc.de/radio/help.txt
Looking at instruction encoding through an octal lens makes a lot of sense.
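To make that concrete, here's a small sketch (my own illustration, not from the linked text) of why octal fits the 8086: most encoding fields are 2 or 3 bits wide, so each octal digit lines up with exactly one field.

```python
# Sketch: the 8086's key encoding fields are 3 bits wide, so octal digits
# map one-to-one onto them. These tables follow the standard Intel encoding.

ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def split_octal(byte):
    """Split a byte into its 2-3-3 bit fields, i.e. its three octal digits."""
    return (byte >> 6) & 0o3, (byte >> 3) & 0o7, byte & 0o7

# The ModRM byte is literally mod(2 bits) | reg(3 bits) | rm(3 bits):
mod, reg, rm = split_octal(0o322)   # 0xD2
print(mod, reg, rm)                 # -> 3 2 2: register-direct, reg=dx, rm=dx

# ALU opcodes 0x00-0x3F follow the pattern 0o0_op_dw:
# the middle octal digit selects the operation.
for opcode in (0o000, 0o050, 0o070):    # 0x00, 0x28, 0x38
    _, op, dw = split_octal(opcode)
    print(hex(opcode), ALU_OPS[op])     # add, sub, cmp

# "mov r16, imm16" is 0o270 + reg (0xB8..0xBF): the last digit is the register.
print(REGS16[0xB8 & 0o7])               # -> ax
```

In hex the same patterns (0x00, 0x08, 0x10, ... for the ALU ops) look like arbitrary multiples of 8, which is why octal dumps of x86 code are so much easier to eyeball.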
We should allow some license here. A discussion of, say, the ALU may well yield a different statement.
"Machine", viewed through various lenses - bit depth, ALU, clock, instructions, registers or lack thereof - works best in context.
Just a thought. :D
Did you read my comment that specifically mentions it?
Sure. But it doesn't make it an 'octal machine', let alone fundamentally so. Lots of binary things exhibit easier-to-see patterns when presented as octal, that's why octal is around. But it's a bit of convenient numerology not some intrinsic aspect of the (very binary) machine.
echo "B013CD106800A00789C30FAFDB01D0F7E801C3740C31D266B8C027090066F7F3414929C8243F0420AA89F831D2BB4001F7F32D780081EAA0007FCDF7DAE9C8FF" | xxd -r -p - > heart.com ; dosbox heart.com
I also made echo in 153 bytes:
Edit: It's only possible by moving the string and some of the opcodes into the ELF header section.
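For anyone curious how that works: the kernel's ELF loader only validates a handful of header fields, so the leftover bytes (the e_ident padding, or e_shoff/e_flags when there are no sections) can hold strings or code "for free". Here's a toy sketch of the principle - the field layout is the real Elf32_Ehdr, but the payload placement is just an illustration, not a runnable binary:

```python
import struct

# Toy illustration: pack a 52-byte Elf32_Ehdr and hide a payload in the
# e_ident padding bytes, pointing e_entry at it. A real tiny ELF would put
# machine code there and also overlap the program header with the tail of
# this header; this sketch only shows where the "free" bytes live.

payload = b"hi\0"                       # 3 bytes hidden in e_ident padding
ident = b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + payload + b"\0" * 4

BASE = 0x08048000                       # traditional i386 load address
ehdr = struct.pack(
    "<16sHHIIIIIHHHHHH",
    ident,
    2,                                  # e_type    = ET_EXEC
    3,                                  # e_machine = EM_386
    1,                                  # e_version
    BASE + 9,                           # e_entry: points INTO e_ident padding
    52,                                 # e_phoff: program header right after
    0, 0,                               # e_shoff, e_flags: ignored -> free bytes
    52, 32, 1, 0, 0, 0,                 # header sizes; no section headers at all
)

assert len(ehdr) == 52
print(ehdr[9:12])                       # -> b'hi\x00', living inside the header
```

The classic writeup of this technique is muppetlabs' "Teensy ELF" series, which squeezes a whole program into overlapping header fields.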
Bonus: it tries to read whole tracks at a time to make booting off of floppies faster (I wrote it in the 90s..)
The general list of tricks seems focused on either reusing code as data, by finding some byte/nibble you need already present in the code, or finding opcodes that happen to accomplish multi-opcode effects.
I'd like to see what you could get with self-modifying code. But naturally on 80x86, that's suicide, since the code is very, very heavily cached.
And self-modifying code that takes advantage of the x86 cache and pipelines - IF it's running in a W&X environment in the first place - is a wide-open opportunity to see what amazing things your newly fried brain can claim responsibility for causing to exist. :D
But do something like modifying a constant used in a tight loop once, before entering the loop, and you will likely get better performance, especially since you no longer need a register or an extra memory access to load that constant.
That is, after all, how JITs work.
One of the best examples is something like an image codec --- input images have different widths, heights, bits per channel, channel counts, coding modes, etc., but these are constants throughout the same image. You can essentially "compile" a decoder specialised to the exact image you're decoding, with all the loop counters replaced with constants and the branches eliminated so the code they would otherwise jump to is executed directly.
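A toy sketch of that specialization idea, using Python source generation in place of machine-code patching (the decoder itself is hypothetical, not a real codec):

```python
# Instead of a generic loop that re-reads width/height and branches on the
# coding mode for every pixel, generate source with those values baked in as
# literals and compile it once per image. Only one branch arm is ever emitted.

def make_decoder(width, height, invert):
    body = "out[i] = 255 - data[i]" if invert else "out[i] = data[i]"
    src = f"""
def decode(data):
    out = bytearray({width * height})
    for i in range({width * height}):   # trip count is a literal constant
        {body}
    return bytes(out)
"""
    ns = {}
    exec(compile(src, "<specialized>", "exec"), ns)
    return ns["decode"]

decode = make_decoder(width=4, height=2, invert=True)
print(decode(bytes(range(8))))   # inverted pixels: 255, 254, ...
```

Same principle as patching a constant in a tight loop before entering it, just expressed at the source level rather than by rewriting machine code in place.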
I'm convinced there are cases where this kind of optimization actually speeds up your code, but they are pretty unusual.
Artifact of modern execute and cache schemes.
00000000: b013 cd10 c42f 89e8 f72e 0000 89c5 31c6 ...../........1.
00000010: 31d2 89c1 31db 88f7 89df c1fb 0201 df89 1...1...........
00000020: c3c1 fb08 01df 01f7 5b26 883d 4353 89c3 ........[&.=CS..
00000030: c1fb 0401 da89 d3c1 fb04 29d8 e2d6 ebc6 ..........).....
Oh, I also wrote a Linux HTTP server in assembly with an executable under 2000 bytes: http://canonical.org/~kragen/sw/dev3/server.s with some explanation in http://canonical.org/~kragen/sw/dev3/httpdito-readme. I use it occasionally for testing stuff, since it's slightly less hassle than using python -m SimpleHTTPServer. Previous HN discussion at https://news.ycombinator.com/item?id=6908064
The demoscene diskmag "hugi" did a series of sizecoding competitions in 1998–2009, with truly astounding results. http://www.hugi.scene.org/compo/hcompo.htm Highlights include a working textmode Pong game in 142 bytes, a Brainfuck interpreter in 98 bytes, a Snake game in 48 bytes, and a perfect Tic-Tac-Toe player in 213 bytes.
The 8086 is a reasonable design for this kind of nonsense, since it's full of shortcuts and special cases that you can take advantage of. IBNIZ is an architecture really optimized for it, though: http://pelulamu.net/ibniz/ and in practice I often find that I can get things smaller with ARM Thumb-2 code than with the 8086. For example, https://github.com/kragen/dumpulse is about 350 bytes of code on most architectures, but just over 200 bytes on a big-endian ARM.