
I want to compile a C program so simple I can explain all of the assembly - jesstess
http://blog.ksplice.com/2010/03/libc-free-world/
======
old-gregg
I wish someone would dissect the JVM in exactly the same way, i.e. clearly
explain what needs to be stripped off to have a quickly loading "hello
world" implementation eating less than 1MB of RAM and starting in a few
microseconds, like a normal, sane process should.

All these wonderful things are being built (Clojure, JRuby) on top of the JVM
that are of no use outside of web/EE because the default JVM is so heavy.

Yes, a VM does a lot more than just bootstrapping your stdlib, like in case of
libc, yet I keep thinking there must be plenty of unnecessary fat to strip
off. Just look at Microsoft CLR: same feature set, yet none of that sluggish
starting, RAM-wasting JVM nonsense.

~~~
kevingadd
This reminds me of a blog post from one of the Unity developers:

We joke that doing anything in C# will result in an XML parser being included
somewhere. This is not that far from the truth; e.g. calling float.ToString()
will pull in whole internationalization system, which probably somewhere needs
to read some global XML configuration file to figure out whether daylight
savings time is active when Eastern European Brazilian Chinese calendar is
used.

<http://aras-p.info/blog/2009/11/14/improving-cmono-for-games/>

The sad thing is that he's not kidding - if you profile a typical application
under Mono or .NET, it loads an XML parser almost immediately.

~~~
vl
.NET policy configuration files are in XML. To parse them there is another
internal XML parser in mscorlib (.NET runtime dll).

------
a-priori
If you really want to write code that you can understand all the way down I
suggest starting from as close to bare metal as your level of masochism
allows. For me, that's GRUB.

This tutorial walks you through making a kernel image that GRUB can load, and
shows how to poke bytes into video memory to print characters to the screen:

<http://wiki.osdev.org/Bare_bones>
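
For a feel of what "poking bytes into video memory" means, here is a minimal sketch in C. It assumes the standard PC colour text mode, where the screen is an 80-column grid of 16-bit cells starting at physical address 0xB8000 (low byte is the character, high byte is the colour attribute). The names `vga_cell` and `vga_puts` are made up for illustration and are not taken from the tutorial. The buffer is passed in as a parameter so the logic can also be tried against an ordinary array in user space.

```c
#include <stddef.h>
#include <stdint.h>

/* Physical address of the VGA text buffer on a PC. Only meaningful inside a
   freestanding kernel; a normal user-space process cannot touch it. */
#define VGA_TEXT_BUFFER ((volatile uint16_t *)0xB8000)
#define VGA_COLS 80

/* Pack a character and an attribute byte into one 16-bit screen cell:
   low byte = character code, high byte = colour attribute. */
static uint16_t vga_cell(char c, uint8_t attr)
{
    return (uint16_t)((uint16_t)(unsigned char)c | ((uint16_t)attr << 8));
}

/* Write a NUL-terminated string starting at (row, col) and return the number
   of cells written. No bounds checking: this is a sketch, not a driver. */
static size_t vga_puts(volatile uint16_t *buf, size_t row, size_t col,
                       const char *s, uint8_t attr)
{
    size_t n = 0;
    for (; s[n] != '\0'; n++)
        buf[row * VGA_COLS + col + n] = vga_cell(s[n], attr);
    return n;
}
```

In a kernel you would call something like `vga_puts(VGA_TEXT_BUFFER, 0, 0, "hello", 0x07)` (0x07 is light grey on black); in a hosted test you can pass a plain `uint16_t` array instead.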

~~~
akgerber
And then buy an FPGA and bust out some Verilog and write the metal yourself.

~~~
VBprogrammer
When I was about 14 I found writing a floppy disk boot loader in x86 Assembly
to be a good way of learning how a computer really works at the most basic
level. Including reading FAT12 to find the start of your next piece of code
and even displaying a splash screen!

While a little masochistic I find I still call upon what I learned back then
while writing in C and to a lesser extent higher level languages.

~~~
wallflower
"So I said, "I'll look into this floppy disk." And I started pulling up the
datasheet on that chip, and I started coming up with my first ideas of "how do
I have that chip get the data to a floppy disk?" And then I came up with this
clever little approach. I needed a little bit of logic in here..."

Steve Wozniak, Founders at Work

Amazing full inspiring interview (I have read every interview in JL's book and
it is by far my favorite)

<http://www.foundersatwork.com/steve-wozniak.html>

------
pgbovine
reminds me of a _really_ old article (before the term 'blog' even existed)
about a person exploring why the heck a 'hello world' Linux ELF binary was so
darn big. i think it's an interesting exercise to figure out why 'hello world'
in your favorite language/runtime environment is the size that it is (e.g.,
what initialization code is being called, are any VMs or interpreters being
set up, etc.)

~~~
AgentIcarus
That was my first thought as well. That article is found at
<http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html>
(if that's the one you were thinking of)

~~~
pgbovine
yup exactly! HN makes for a great crowdsourced expert search engine ;)

------
Periodic
I love being reminded occasionally about the lower levels of computer
operation. It never ceases to amaze me how easy it is to gloss over the
details of computers with an abstraction. We create layer upon layer of
abstraction and can quickly forget what is underneath, if we ever knew in the
first place.

As long as those abstractions don't leak, it isn't a big deal, but when they
do you had better know what's going on down below.

------
rythie
I once wrote a CPUID feature check program in assembly. It took surprisingly
long to learn and write those 80 lines (including blank lines and comments).

The executable was 736 bytes: <http://rythie.com/labs/cpuid.php>

------
ryanmerket
Check out this old school keygen tutorial I wrote:
<http://krobar.by.ru/krobar/other/key107.txt>

~~~
najirama
This is 'Hacker' News, not 'Cracker' News... though I must admit it was a
provocative read.

------
vinhboy
That was a really cool article, anyone with more knowledge on the subject want
to comment on how sound the author's thought process is? Thanks.

~~~
ajross
It all looks correct to me. Some of it is a little "needlessly-surprised",
honestly, like the bit about having to make a syscall trap to exit the
program. Programs don't exit on their own: _something_ needs to tell the
kernel that the process is done.
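
To make that concrete, here is a rough sketch (Linux-specific, and not taken from the article) of a process ending itself with a raw exit system call instead of returning from main or calling libc's exit(). The helper name `exit_via_syscall` is made up for illustration. It forks a child so the caller can observe the status the kernel reports.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that terminates through a raw exit system call (skipping
   libc's exit() and any atexit handlers) and return the exit status seen by
   the parent. Returns -1 if fork/waitpid fails or the child died abnormally. */
static int exit_via_syscall(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        syscall(SYS_exit, code);   /* the kernel ends the (single-threaded) child here */
        for (;;) { }               /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `exit_via_syscall(42)` should hand back 42: the only way the parent learns that number is because the child trapped into the kernel to report it.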

~~~
jcdreads
This point isn't actually blindingly obvious. Those of us raised on Pascal
might have assumed (or, in my case, actually did assume, without thinking hard
about it) that the program ended when the execution point reached "the end,"
where the main method was probably located in the binary.

It's disturbing how many Pascal-isms still pervade my thinking, even 20 years
since I last (willingly) touched the language.

------
DanielBMarkham
There was an example I read recently of doing this in Windows that I liked
more -- perhaps because the setup and tear-down was more interesting.

~~~
sundeep
Do you have a link for that article? Sounds like something I'd be interested
in reading. Thanks.

~~~
Gonsalu
I think he's talking about this article:
<http://www.phreedom.org/solar/code/tinype/>

------
acg
I'm not sure I understand the revelation here. GCC is often used to target
single-board systems with few resources. You can build gcc for the environment you
want to target. Sounds like attempting to use a compiler flag in a way it
wasn't intended. If you wanted to reduce the binary size wouldn't you look at
the linker?

~~~
ars
Did you actually read the article? You don't sound like you did.

------
kbradero
Let's see a really simple program where you can explain all the assembly :)

We need to write our program like this:

    $ echo 0000000: 55 48 89 e5 b8 ff aa 00 00 c9 c3 | xxd -r > sum2.bin

Here we have the SAME little program in C:

    $ cat sum.c
    int sum(void){
        return 0x00ff + 0xaa00;
    }

We can look at the results:

    $ gcc -c sum.c -o sum.o

(get the raw opcodes on OS X, Intel arch)

    $ otool sum.o -td|sed -n '3,$p'| awk '{ print $0}'|xxd -r > sum.bin

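Now you can look at your code at the asm level:

    $ ndisasm -b 32 sum.bin
    00000000  55            push ebp
    00000001  48            dec eax
    00000002  89E5          mov ebp,esp
    00000004  B8FFAA0000    mov eax,0xaaff   <-- gcc put our 'sum' final product here at compilation time
    00000009  C9            leave
    0000000A  C3            ret

(These bytes are really 64-bit code, so in 64-bit mode the 48 is a REX.W
prefix and 48 89 E5 decodes as a single mov rbp,rsp rather than dec eax
followed by mov ebp,esp.)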

So your program is now reduced to this code:

    $ hexdump sum.bin
    0000000 55 48 89 e5 b8 ff aa 00 00 c9 c3
    000000b

Test your 2 files:

    $ md5 sum.bin sum2.bin
    MD5 (sum.bin) = a0ccc94bcdc860a81ff28252f56c2257
    MD5 (sum2.bin) = a0ccc94bcdc860a81ff28252f56c2257

We could test our code with a selfmade userland loader:

    $ ./uloader sum2.bin
    Display Opcodes to exec:
    55 48 89 e5 b8 ff aa 00 00 c9 c3
    End opcodes
    code to exec address: exec_code =0x100100080
    new crafted Proc : address = 0x100100080
    returned value ==>aaff

----BEGIN Code----

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    #define MAX_CODE 100

    int main(int argc, char *argv[]) {
        unsigned int (*proc)(void);
        unsigned int returned_value = 0x0;
        unsigned char *exec_code, *ptr;
        int fdprog;

        if (argc < 2) {
            fprintf(stderr, "usage: %s file.bin\n", argv[0]);
            return 1;
        }
        /* NOTE: heap memory is not executable on systems that enforce NX;
           there the buffer has to come from mmap() with PROT_EXEC. */
        exec_code = malloc(MAX_CODE);
        fdprog = open(argv[1], O_RDONLY);
        if (exec_code == NULL || fdprog < 0) {
            perror("uloader");
            return 1;
        }
        ptr = exec_code;
        printf("Display Opcodes to exec:\n");
        /* read one byte at a time, never past the end of the buffer */
        while (ptr < exec_code + MAX_CODE && read(fdprog, ptr, 1) == 1) {
            printf(" %02x ", *ptr);
            ptr++;
        }
        close(fdprog);

        printf("\nEnd opcodes\n");
        printf("code to exec address: exec_code =%p \n", (void *)exec_code);
        proc = (unsigned int (*)(void))exec_code;
        printf("new crafted Proc : address = %p \n", (void *)exec_code);
        returned_value = (*proc)();    //here is the magic bro! :)

        printf("returned value ==>%x \n", returned_value);
        return 0;
    }

----END Code----
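
On systems that refuse to execute heap memory, the same trick needs a buffer that is explicitly mapped as executable. Below is a hedged sketch of that variant using mmap() and mprotect(), not the loader from the comment above. The names `sum_opcodes` and `run_opcodes` are made up for illustration, and the bytes only make sense on x86-64, so other architectures just get an error return.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* The eleven bytes from sum.bin: push rbp; mov rbp,rsp; mov eax,0xaaff;
   leave; ret (x86-64 only). */
static const unsigned char sum_opcodes[] = {
    0x55, 0x48, 0x89, 0xe5, 0xb8, 0xff, 0xaa, 0x00, 0x00, 0xc9, 0xc3
};

/* Copy `len` bytes of machine code into a fresh anonymous mapping, switch it
   to read+execute, call it as `unsigned int f(void)` and store the result in
   *out. Returns 0 on success, -1 on failure or on a non-x86-64 build. */
static int run_opcodes(const unsigned char *code, size_t len, unsigned int *out)
{
#if defined(__x86_64__)
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }
    unsigned int (*proc)(void);
    memcpy(&proc, &mem, sizeof proc);  /* sidestep the object-to-function pointer cast */
    *out = proc();
    munmap(mem, len);
    return 0;
#else
    (void)code; (void)len; (void)out;
    return -1;
#endif
}
```

Something like `run_opcodes(sum_opcodes, sizeof sum_opcodes, &value)` then plays the role of `./uloader sum2.bin`, with the file reading left out to keep the sketch short.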

Saludos! Jorge A. Garcia.

------
jriddycuz
Thanks for the article! Also, I love that your computer is named
"kid-charlemagne". :)

