
TCP HTTP Server written in Assembly - thikonom
http://canonical.org/~kragen/sw/dev3/server.s
======
derefr
Cool stuff. Really, though, this is still relying on a rather large runtime
library: the physical, data-link, and network-layer drivers.

Now what'd be really awesome to see would be one of those operating system
guides that show you how to write an OS kernel, in assembler, that can speak
HTTP. Even if you limited yourself to the synthetic hardware of a VM program,
it'd still be quite a feat.

Bonus points if the entire network stack has been flattened using the
hand-rolled equivalent of stream fusion. :)

~~~
aortega
Here you go:

[http://www.kyllikki.org/hardware/wwwpic2/src/wwwpic2.asm.htm...](http://www.kyllikki.org/hardware/wwwpic2/src/wwwpic2.asm.html)

Bonus: it runs on 68 bytes of RAM. Not a typo, it's bytes, and it's a
"complete" HTTP+TCP/IP server.

~~~
mbell
That is just running the TCP/IP layer speaking RS232 and relying on an
external IC for lower layers. It's not at all what the GP is looking for.

It should probably also be noted that a minimum TCP header, with no data
attached, is 20 bytes, so implementing a 'full stack' in 68 bytes is a pretty
strong indication that you're relying on off-SoC memory to handle the packet
buffering.
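
To make the 20-byte figure concrete, here is the fixed part of a TCP header (no options) written as a C struct. The struct and field names are illustrative, not taken from any particular stack; note that a minimal IPv4 header adds another 20 bytes, so a bare TCP/IP segment is at least 40 bytes before any payload.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed part of a TCP header with no options. Field names are
   illustrative; on the wire every multi-byte field is big-endian. */
struct tcp_header_min {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;               /* sequence number */
    uint32_t ack;               /* acknowledgment number */
    uint16_t offset_and_flags;  /* 4-bit data offset, reserved bits, flags */
    uint16_t window;
    uint16_t checksum;
    uint16_t urgent_ptr;
};

/* Natural alignment leaves no padding: 2+2+4+4+2+2+2+2 = 20. */
_Static_assert(sizeof(struct tcp_header_min) == 20,
               "minimum TCP header is 20 bytes");
```

Packing the data offset and flag bits into one 16-bit field sidesteps compiler-specific bitfield layout.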

~~~
aortega
I encourage you to read the source code instead of guessing how it may work.

------
neverm0re
Here's another, simpler implementation of an HTTP server in Linux x86 assembly
from last year, coincidentally by the person who did the Seiken Densetsu 3/Secret
of Mana 3 translation hack and the old Starscream 68k emulator:

[http://www.neillcorlett.com/etc/mohttpd.asm.txt](http://www.neillcorlett.com/etc/mohttpd.asm.txt)

And a not-so-successful thread to go with it:
[https://news.ycombinator.com/item?id=4714971](https://news.ycombinator.com/item?id=4714971)

~~~
kragen
That's very nice! Thanks! I spent a lot of last night figuring out how to use
socketcall, and this should be helpful.

------
kragen
I hacked on httpdito some more, and it has been improved in several ways:

- it now forks so that it can handle multiple concurrent connections (up to a
limit of 2048);

- it no longer uses libc at all, so it's down to 2088 bytes (I had it lower,
but then I added forking);

- it's less complex now that it only has one way of invoking system calls
instead of two;

- there are some performance results in the comments;

- it has a name, "httpdito";

- strlen works correctly.

Probably nobody will read this comment here, but I thought it was worth
mentioning.

~~~
kragen
Down to 1928 bytes now, and has timeouts for robustness. You can still DoS it
but it takes more work.
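
The comment doesn't say how httpdito implements its timeouts. One common approach on Linux, sketched here purely as an assumption, is a per-socket receive timeout set with setsockopt(SO_RCVTIMEO): a read() on a client that never sends anything then fails with EAGAIN/EWOULDBLOCK instead of blocking forever. The helper name read_times_out is hypothetical, and a socketpair stands in for a real client connection.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Give one end of a connected socket pair a receive timeout of `usec`
   microseconds (must be < 1000000), then read from a peer that never
   sends. Returns 1 if read() gave up with EAGAIN/EWOULDBLOCK (the
   timeout fired), 0 otherwise. */
int read_times_out(long usec)
{
    int sv[2];
    char c;
    struct timeval tv = { .tv_sec = 0, .tv_usec = usec };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;
    int ok = setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0;
    ssize_t n = ok ? read(sv[0], &c, 1) : 0;
    /* Check errno before close() has a chance to overwrite it. */
    int timed_out = ok && n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
    close(sv[0]);
    close(sv[1]);
    return timed_out;
}
```

A server would drop the connection when this happens. For a forked child, alarm(2) has a similar effect, since the default action for SIGALRM terminates the process.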

------
zw123456
Very cool, I think. But cooler: a web server on an FPGA, with no CPU, only VHDL:
[http://www.youtube.com/watch?v=7syu5EC1OWg](http://www.youtube.com/watch?v=7syu5EC1OWg)

------
mappu
Cool!

My comments as an inexperienced assembly developer, assuming this is
optimising for binary size:

- The pug/doN macros do an extra reg-reg copy if passed a register, and the
recursive definition calls pop/pop/pop instead of just `add $4*N, %esp`; you
could shave a few bytes there.

- AT&T syntax will always look weird to me, but the heavy use of macros and
local labels is quite elegant.

- A little bit of candid swearing in the comments? Fine by me, but is this
officially associated with canonical?

~~~
pbsd
Agreed, AT&T syntax just wasn't designed for human reading. I doubt this is
heavily optimized for size, since it misses some obvious tricks.

Another observation: the strlen code is incorrect, as it also counts the \0.
We can fix this, and make the code 1 byte shorter (in glorious Intel syntax):

    
    
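        lea edi, source        ; scasb reads [edi]; depends on source
        xor ecx, ecx           ; 2 bytes; also clears CF
        salc                   ; 1 byte; AL = CF ? 0xFF : 0, so AL = 0 (32-bit only)
        cld                    ; 1 byte
        _back:
        scasb                  ; 1 byte
        loopnz _back           ; 2 bytes
        not ecx                ; 2 bytes; ecx = -(len+1), so not ecx = len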
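
A C model of the register arithmetic may help show why NOT ECX yields the length without the terminator. This is an illustrative sketch, not code from server.s, and the function name is hypothetical. SCASB compares AL with the byte at EDI and advances EDI; LOOPNZ decrements ECX once per byte scanned (the NUL included), so ECX ends at -(len + 1) and its bitwise complement is len.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the SALC/SCASB/LOOPNZ strlen (a sketch, not the server's
   code). Flags and registers are modeled as plain variables. */
uint32_t scasb_strlen_model(const char *s)
{
    uint32_t ecx = 0;            /* xor ecx, ecx (also clears CF) */
    char al = 0;                 /* salc: AL = CF ? 0xFF : 0, and CF == 0 */
    const char *edi = s;
    int zf;
    do {
        zf = (*edi++ == al);     /* scasb: compare AL with [EDI], EDI++ */
        ecx--;                   /* loopnz: decrement ECX ... */
    } while (ecx != 0 && !zf);   /* ... loop while ECX != 0 and ZF == 0 */
    return ~ecx;                 /* not ecx: ~(-(len + 1)) == len */
}
```

Returning -ECX instead would give len + 1, which is the count-the-NUL off-by-one described above.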

~~~
kragen
BTW, I've fixed the strlen code (although differently). I didn't know about
SALC! That's a very clever way of zeroing AL.

I think at this point I might be able to get away with CLD since I never STD
any more :)

Some of the obvious tricks it misses are probably because they're not obvious
to me, while others may be just because I haven't gotten to them yet.

~~~
pbsd
Your way is much cleaner; mine was just a size gimmick. I just can't resist it
:)

------
tokenizer
As a web developer who isn't familiar with assembly or any web server more
barebones than nginx, what benefits does something like this provide? Speed?
Could this be a solution for an extremely simple directory/static file web
server?

~~~
anonymouscowar1
This is a simple, single-threaded, single-process web server built around an
accept-read-respond loop. It's vulnerable to trivial trickle DoS attacks and
probably has other issues. There are no advantages; the author just did this
for fun.

The TCP part comes from C code in the kernel, so this headline is a little
misleading ;-).

~~~
kragen
Agreed. However, it _should_ be safe from buffer overflows, path traversal
attacks, XSS, and obviously CSRF. It should be fine other than DoS. Let me
know if you find any exceptions.

~~~
anonymouscowar1
It's hard to be vulnerable to XSS and CSRF with all-static content, no?

So, not only will a trickle DoS other clients, each byte will also force an
O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but
that's not great.

It looks like a request with no space could force you to walk (`repne scasb`)
through invalid memory after $buf. Also maybe corrupt it
(unescape_request_path).
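
For comparison, a bounded in-place percent-decoder in C might look like the following. It is a hypothetical sketch, not server.s's unescape_request_path: it takes an explicit length, never reads or writes past it, and rejects a truncated escape such as a trailing "%2" instead of reading beyond the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes in place, touching only s[0..len). The output
   index never passes the input index, so writes stay in bounds.
   Returns the decoded length, or -1 for a truncated or invalid escape
   or an escaped NUL. */
long unescape_bounded(char *s, size_t len)
{
    size_t in = 0, out = 0;
    while (in < len) {
        if (s[in] != '%') {
            s[out++] = s[in++];
            continue;
        }
        if (len - in < 3)            /* '%' too close to the end */
            return -1;
        int hi = hexval(s[in + 1]), lo = hexval(s[in + 2]);
        if (hi < 0 || lo < 0 || (hi == 0 && lo == 0))
            return -1;
        s[out++] = (char)(hi * 16 + lo);
        in += 3;
    }
    return (long)out;
}
```

Rejecting %00 is a design choice: a decoded NUL would silently truncate the path once it reaches open(2).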

It will also fail to correctly parse HTTP/0.9 (not a big deal, but it's part
of the spec). The parsing code ignores the existence of verbs other than GET.
(It doesn't check that the verb is GET, either.)

We don't validate that paths start with /; we just skip that byte. Okay:

    
    
            mov (path), %al
            ...
            cmp $'/, %al
            je badreq
    

Since valid GETs are of the form:

    
    
        GET /foo.txt HTTP/1.0
             ^-- path=buf+5
    

As you point out, a client closing the connection early will cause a SIGPIPE,
crashing the server (DoS).
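
The usual fix is to ignore SIGPIPE so that a write to a closed connection fails with EPIPE instead of killing the process. A minimal Linux sketch, with a socketpair standing in for the client connection (the function name is hypothetical):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a socket whose peer has already closed. With SIGPIPE at its
   default disposition this would kill the process; with SIGPIPE
   ignored, write() fails with EPIPE instead. Returns the errno seen,
   0 if the write unexpectedly succeeded, or -1 if setup failed. */
int write_after_peer_close(void)
{
    int sv[2];
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);                                /* the "client" hangs up */
    ssize_t n = write(sv[0], "HTTP/1.0 200 OK\r\n", 17);
    int err = (n < 0) ? errno : 0;
    close(sv[0]);
    return err;
}
```

On Linux, passing MSG_NOSIGNAL to send(2) suppresses the signal for a single call rather than process-wide.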

That's all I see. But I'm not an asm expert and I'm sure I've missed
something.

~~~
kragen
> It's hard to be vulnerable to XSS and CSRF with all-static content, no?

You would think so, but actually Apache managed to be vulnerable to XSS by
including bits of the request URL in its error pages, if I remember right.
Last millennium, I think.

> So, not only will a trickle DoS other clients, each byte will also force an
> O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but
> that's not great.

Hmm, while I hadn't thought about that, and I should have, I think that's
probably okay; basically you're saying that you can get the machine to burn up
to, say, 2048 cycles by sending it a small TCP packet. Which means that a
4-core 2GHz server machine can't handle more than about four million packets
per second (well, one million until I parallelize), which is about 85
megabytes per second, or 680 megabits per second. There are probably other
bottlenecks in the code, the kernel, or your data center that will kick in
first. It's probably more effective to DoS the server by just requesting files
from it.
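
The estimate above is just total cycles divided by cycles per packet. A tiny C sketch of the same arithmetic, using the comment's assumed figures (4 cores, 2 GHz, about 2048 cycles per packet); the function names are hypothetical:

```c
#include <assert.h>

/* Packets per second the CPU can absorb if each packet costs
   cycles_per_packet cycles: total cycles / cycles per packet. */
double packets_per_second(double cores, double hz, double cycles_per_packet)
{
    return cores * hz / cycles_per_packet;
}

/* Convert megabytes per second to megabits per second. */
double megabits_per_second(double megabytes_per_second)
{
    return megabytes_per_second * 8.0;
}
```

With 4 cores this gives 3,906,250 packets per second (about 0.98 million on a single core), and 85 MB/s times 8 gives the 680 Mbit/s figure.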

> It looks like a request with no space could force you to walk (`repne
> scasb`) through invalid memory after $buf.

It's possible I could have gotten this wrong, but I did _try_ to limit the
number of bytes it would scan to the bytes that it had actually read, by doing

    
    
        mov (bufp), %ecx
    

before the repne scasb. Did I screw that up?
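
In C, the same bound is what the length argument of memchr(3) provides. A hypothetical sketch, assuming the buffer holds bytes_read valid bytes: the one subtlety is that a scan starting partway into the buffer has to shrink its count by that offset, or it can run past the data actually received.

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the first ' ' at or after buf[start], looking
   only at the bytes_read bytes actually received, or NULL if no space
   has arrived yet. */
const char *find_space_from(const char *buf, size_t bytes_read, size_t start)
{
    if (start >= bytes_read)
        return NULL;
    /* A scan that begins at buf + start must shrink its count by start;
       passing bytes_read here would let it run past the received data. */
    return memchr(buf + start, ' ', bytes_read - start);
}
```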

> HTTP/0.9 ...verbs other than GET.

Yes, those are unimplemented features, and you're right that their lack makes
the server behave incorrectly; hopefully they don't result in security bugs. I
think they don't matter in practice, since nobody sends HTTP/0.9 requests or
HEAD requests, except by hand, do they?

> We don't validate that paths start with /, we just skip that byte.

Right. And the $'/ check below is to keep you from saying

    
    
        GET //etc/passwd HTTP/1.0
    

and getting /etc/passwd. In case that matters in 2013.
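
The two path checks discussed here can be sketched in C as below: require the request to begin with "GET /" (which also validates the leading slash that the server merely skips), then refuse a second '/'. This is a hypothetical helper, not the server's code, and it covers only the leading-slash cases from this exchange, not ".." components or percent-escapes.

```c
#include <assert.h>
#include <string.h>

/* Given a request line such as "GET /foo.txt HTTP/1.0", return a
   pointer just past the leading '/' (buf+5), or NULL if the line does
   not start with "GET /" or the path starts with a second '/', which
   would turn "GET //etc/passwd" into an absolute path. */
const char *request_path(const char *line)
{
    static const char prefix[] = "GET /";
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return NULL;                      /* wrong verb or no leading '/' */
    const char *path = line + sizeof prefix - 1;
    if (*path == '/')
        return NULL;                      /* reject "//..." */
    return path;
}
```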

Thank you very much for looking over it!

~~~
bebna
Doesn't ab send HEAD requests?

I know it does with a specific flag (-i), but in some tests I have seen some
HEADs between my GETs. I haven't used ab for a long time, so don't quote me on
that. Have you tried httpress[1] as a benchmark tool?

How about a simple check that the first byte equals G (decimal 71) to see if
it's a GET? That shouldn't be expensive, I think.
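
A minimal sketch of the first-byte check next to a stricter variant that compares all four bytes of "GET " as one 32-bit word. Both function names are hypothetical; the point is that the stricter check costs about the same (one load, one compare) but doesn't accept something like "GARBAGE".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The suggested check: only look at the first byte ('G' == 71). */
int looks_like_get_1byte(const char *req, size_t len)
{
    return len >= 1 && req[0] == 'G';
}

/* Stricter: compare all four bytes of "GET " as a single 32-bit word.
   memcpy is the portable way to do a possibly unaligned load. */
int looks_like_get_4byte(const char *req, size_t len)
{
    uint32_t word, get;
    if (len < 4)
        return 0;
    memcpy(&word, req, 4);
    memcpy(&get, "GET ", 4);
    return word == get;
}
```

Compilers typically turn each memcpy of 4 bytes into a single load, so the stricter check should not be noticeably more expensive.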

Thanks for creating it.

[1]
[https://bitbucket.org/yarosla/httpress/wiki/Home](https://bitbucket.org/yarosla/httpress/wiki/Home)

~~~
kragen
I don't know if ab sends HEAD requests! Thanks for the link to httpress; I've
been having trouble with ab failing at high concurrencies (1000 concurrent
connections) and also being the bottleneck.

------
pmiller2
Neat little piece of performance art (pun intended).

~~~
jebblue
Good way to put it; I was trying to think of something similar.

------
radikalus
No full TCP stack in assembly? =p

(Yes, I know there's no point since it's better in hardware, blah blah.)

------
Vektorweg
I'm really happy that executable size doesn't matter for server software,
because Yesod produces really big executables.

------
mikkom
> Depends on the C libraries.

^ That tells everything you need to know.

------
pekk
and I just got finished rewriting all my large webapps in some obscure Java
framework for performance, because of some benchmarks I saw on HN. Guess now I
have to rewrite it all in assembly, because more performance is always better
right?

~~~
anonymouscowar1
This is not a very fast webserver. Anything using sendfile() and
threads/processes will beat it handily.

~~~
kragen
I haven't measured, but I'm pretty sure you're right.

~~~
anonymouscowar1
Me too. You could probably find a single-threaded, small-file benchmark where
they compare similarly (or where this one even compares better, since it does
almost nothing). But most benchmarks aren't like that. Large files or multiple
clients will make this server bench poorly compared to MT + sendfile(2).

This server is single threaded and artificially serializes requests, at a
minimum. The copy through userspace is going to hurt compared to sendfile for
larger files.
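
For reference, the sendfile(2) approach looks roughly like this on Linux: the kernel copies file data straight to the socket, with no read()/write() round trip through a user-space buffer. A minimal sketch with hypothetical function names; roundtrip only exists to exercise send_whole_file with a temporary file and a socketpair.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy `len` bytes of in_fd to out_fd with sendfile(2). sendfile
   advances `off` itself and may send fewer bytes than asked, so loop.
   Returns 0 on success, -1 on error or a file shorter than `len`. */
int send_whole_file(int out_fd, int in_fd, size_t len)
{
    off_t off = 0;
    while ((size_t)off < len) {
        ssize_t n = sendfile(out_fd, in_fd, &off, len - (size_t)off);
        if (n <= 0)
            return -1;
    }
    return 0;
}

/* Demo: write msg to a temp file, sendfile it through a socketpair,
   and read it back into out. Returns bytes read, or -1. */
ssize_t roundtrip(const char *msg, char *out, size_t cap)
{
    size_t len = strlen(msg);
    ssize_t got = -1;
    int sv[2];
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (write(fd, msg, len) == (ssize_t)len &&
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == 0) {
        if (send_whole_file(sv[0], fd, len) == 0)
            got = read(sv[1], out, cap);
        close(sv[0]);
        close(sv[1]);
    }
    fclose(f);
    return got;
}
```

Because sendfile takes an explicit offset pointer, it reads from the start of the file even though the earlier write() left the file position at the end.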

~~~
kragen
I made it fork. Now, on my netbook, it's able to handle in the neighborhood of
a thousand requests per second and 20 megabytes per second, with up to 2048
concurrent connections. Not, I think, spectacular performance, but acceptable
for many purposes. You can still DoS it by opening 2048 concurrent connections
to it; as long as they are open, it will accept no new connections, and it has
no timeout.

This has bloated the executable up to 2088 bytes.
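
A hypothetical sketch of the fork-per-connection bookkeeping described above: count live children, reap finished ones with waitpid(WNOHANG) before each accept, and stop accepting at the limit. Here the child handler just exits, and the limit is a parameter so it can be exercised with small numbers; none of this is taken from httpdito itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int active = 0;   /* children forked but not yet reaped */

/* Reap any children that have already exited, without blocking, then
   report whether another connection may be accepted under `limit`. */
int can_accept(int limit)
{
    while (active > 0 && waitpid(-1, NULL, WNOHANG) > 0)
        active--;
    return active < limit;
}

/* Fork a child to serve one connection. A real handler would read the
   request, send the response, close the socket and exit; this one
   exits immediately. Returns the child's pid in the parent, or -1. */
pid_t spawn_handler(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);            /* child: serve the connection, then exit */
    if (pid > 0)
        active++;
    return pid;
}

/* Block until every outstanding child has been reaped. */
void reap_all(void)
{
    while (active > 0 && waitpid(-1, NULL, 0) > 0)
        active--;
}

int active_children(void)
{
    return active;
}
```

Because waitpid(-1, ...) reaps any child, this sketch assumes the process has no other children.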

------
meshko
OMG, all these macros. It looks more like Python than assembly. Come on, real
men do not use macros.

~~~
derleth
> Come on, real men do not use macros.

The sexism and historical ignorance in this sentence are in a race to see
which can be more breathtaking.

Regardless of which wins, meshko will look like a complete fool to anyone who
knows what they're talking about.

~~~
meshko
Wait, are you serious?

~~~
derleth
Yes. I am. The fact you weren't makes the sexism all the more odious.

~~~
meshko
Can you explain? I am genuinely curious as to your line of thought now.

~~~
derleth
> Can you explain? I am genuinely curious as to your line of thought now.

The worst forms of bias and discrimination are unexamined, because they can
fester and influence thought and action without ever being questioned. It's
difficult to argue someone out of a position they don't even realize is a
position that is up for argument.

------
puppetmaster3
Likely doesn't have any back door. Rumor is that GCC opens a back door for
you-know-who.

~~~
StavrosK
Voldemort?

~~~
puppetmaster3
Shhh.

