
Asmttpd – Web server for Linux written in amd64 assembly - pykello
https://github.com/nemasu/asmttpd
======
ndesaulniers
I'll try and get this building for OSX.

For the uninitiated, might I recommend my:
[http://nickdesaulniers.github.io/blog/2014/04/18/lets-write-...](http://nickdesaulniers.github.io/blog/2014/04/18/lets-write-some-x86-64/)

Though, this is written in yasm syntax, which is slightly different.

Also, keep an eye out for a blog post on Interpreters, Compilers, and JITs I'm
working on (cleaning it up and getting it peer reviewed this or next week)!

__update 1__ Actually, would the syscall's be different between Linux and
OSX? Let's find out, once this builds! _hammers away_

__update 2__ Got it building and linking. Bus error when run, debugging with
gdb.

__update 3__ Can't generate dwarf2 debug symbols for OSX? $ yasm -g dwarf2

__update 4__ Careful, this tries to listen on port 80 [0] (0x5000 (LE) ==
0x0050 in network byte order == 5*16^1 == 80). I would never run any assembly
program off the web with elevated privileges. I recommend 0xB80B (LE, port
3000).
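
The byte-order arithmetic above can be checked with a small sketch. The helper name `port_as_le_word` is made up for illustration; it only assumes that the port is kept in network (big-endian) byte order and that the assembler stores the 16-bit constant little-endian, as on x86-64.

```python
import struct


def port_as_le_word(port: int) -> int:
    """Return the 16-bit constant that, stored little-endian,
    lays the port out in network (big-endian) byte order."""
    network_bytes = struct.pack("!H", port)          # e.g. 80 -> b"\x00\x50"
    return int.from_bytes(network_bytes, "little")   # reinterpret as an LE word


if __name__ == "__main__":
    for port in (80, 3000):
        print(port, hex(port_as_le_word(port)))
```

Ports below 1024 normally need elevated privileges on Linux, which is why a high port such as 3000 is the safer choice for trying out untrusted code.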

__update 5__

> Actually, would the syscall's be different between Linux and OSX?

Looks like yes:
[http://unix.stackexchange.com/a/3350](http://unix.stackexchange.com/a/3350)
These might be close to shim out (OSX and Linux at least share a calling
convention, unlink Windows). I'll upstream what I have.
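
A rough sketch of what such a shim table could look like, for a handful of calls only. The numbers are the ones I believe are in the Linux x86-64 syscall table and the XNU `syscalls.master` file; treat them as assumptions and check them against the headers on your system. The names `LINUX_X86_64`, `MACOS_BSD` and `shim_table` are invented for this example.

```python
# Linux x86-64 syscall numbers (subset; verify against asm/unistd_64.h).
LINUX_X86_64 = {"read": 0, "write": 1, "open": 2, "close": 3, "exit": 60}

# macOS BSD syscall numbers (subset; verify against sys/syscall.h).
MACOS_BSD = {"exit": 1, "read": 3, "write": 4, "open": 5, "close": 6}

# On 64-bit macOS the syscall class is encoded in the upper bits of rax;
# class 2 (BSD/Unix) shifted left by 24 gives 0x2000000.
MACOS_BSD_CLASS = 0x2000000


def macos_syscall_number(name: str) -> int:
    """Value to load into rax before `syscall` on x86-64 macOS."""
    return MACOS_BSD_CLASS | MACOS_BSD[name]


def shim_table() -> dict:
    """Map each Linux syscall number to its macOS equivalent."""
    return {LINUX_X86_64[name]: macos_syscall_number(name) for name in LINUX_X86_64}


if __name__ == "__main__":
    for linux_nr, macos_nr in sorted(shim_table().items()):
        print(f"linux {linux_nr:3d} -> macos {macos_nr:#x}")
```

Only the numbers are translated here. Flag values and struct layouts can also differ between the two systems, so a real port needs more than a renumbering.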

[0]
[https://github.com/nemasu/asmttpd/blob/master/main.asm#L24](https://github.com/nemasu/asmttpd/blob/master/main.asm#L24)

~~~
j42
Freudian slip?

unlink(Windows) indeed...

------
pavlov
It's been closer to 20 years since I last read a complete program in x86
assembly, so this is quite fun to look at.

I'm somehow disappointed (quite unreasonably, of course) that the code uses
plain old zero-terminated C strings instead of something more exotic. One of
the fun things about assembly is that you get to reinvent basic language
features on the fly -- calling conventions, data layout, strings, everything.

~~~
maguirre
Out of curiosity, what would you have done for strings?

~~~
pavlov
Well, for an HTTP server, I don't have a specific idea... But in general, the
fun part would be trying to come up with string representations that are
optimized for the particular application.

The original 1984 _Elite_ computer game is famous for its huge galaxy full of
planets. Each of them had individual names and descriptions such as "Lave is
most famous for its vast rain forests and the Laveian tree grub."

Yet those strings were never stored as plain strings. The game had to run in
32kB of memory, so almost all strings were stored in a tokenized form and
expanded using a pseudo-random number generator:

[http://wiki.alioth.net/index.php/Random_number_generator](http://wiki.alioth.net/index.php/Random_number_generator)

That article shows how the planet description strings were stored and
reconstructed on the fly. The base representation for the aforementioned
description of planet Lave was only a handful of bytes: "\x8F is \x97"

So I think _Elite_ is a pretty good example of an application written in
assembly that didn't have anything like a generic string type.
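
To make the idea concrete, here is a toy sketch of token expansion driven by a seeded generator. It is not Elite's actual format or data: the token table, the `TinyPrng` class and its constants are all invented for illustration. The only thing it borrows is the principle that a few bytes plus a seed expand into a full sentence.

```python
class TinyPrng:
    """A small 16-bit linear congruential generator (a stand-in, not Elite's PRNG)."""

    def __init__(self, seed: int) -> None:
        self.state = seed & 0xFFFF

    def next(self) -> int:
        self.state = (self.state * 25173 + 13849) & 0xFFFF
        return self.state >> 8  # the high byte is better mixed than the low byte


# Hypothetical token table: each token expands to one of several alternatives,
# and alternatives may contain further tokens.
TOKENS = {
    "\x80": ["The planet", "This world"],
    "\x81": ["its vast \x82 forests", "its ancient \x82 ruins"],
    "\x82": ["rain", "mountain", "ocean"],
}


def expand(template: str, rng: TinyPrng) -> str:
    """Recursively replace token characters with a PRNG-chosen alternative."""
    out = []
    for ch in template:
        if ch in TOKENS:
            options = TOKENS[ch]
            out.append(expand(options[rng.next() % len(options)], rng))
        else:
            out.append(ch)
    return "".join(out)


def describe(seed: int) -> str:
    """Same seed, same description: only the template and the seed are stored."""
    return expand("\x80 is most famous for \x81.", TinyPrng(seed))


if __name__ == "__main__":
    for seed in (1, 2, 3):
        print(seed, describe(seed))
```

Because the generator is deterministic, each planet only needs its seed; the text is rebuilt on demand instead of being stored.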

~~~
mattgodbolt
Indeed! Full details of the string routine at
[http://xania.org/201406/elites-crazy-string-format](http://xania.org/201406/elites-crazy-string-format)
if you're interested in quite how mad it was!

------
kragen
I wrote httpdito, a web server for Linux in 386 assembly, a couple of years
ago (mostly outdated discussion is at
[https://news.ycombinator.com/item?id=6908064](https://news.ycombinator.com/item?id=6908064);
a README is at
[http://canonical.org/~kragen/sw/dev3/httpdito-readme](http://canonical.org/~kragen/sw/dev3/httpdito-readme))
and I was happy to get the executable under 2000 bytes. I actually used it the
other day to test a SPA, although for some things its built-in set of MIME
types leaves something to be desired.

But it doesn't have default documents, different kinds of error responses,
TCP_CORK, sendfile() usage, content-range handling, or even request logging.
So asmttpd is _way_ more full-featured than httpdito, and it's still under 6K.

(...httpdito possibly doesn't have any bugs, either, though ☺)

------
yellowapple
Relevant: [http://i.imgur.com/INBvStO.png](http://i.imgur.com/INBvStO.png)

------
barosl
Days ago I saw a lightweight httpd written in C here, yesterday a C++
header-only httpd library caught my eye, and now an httpd in assembly. I'm
curious what will come next...

~~~
vardump
Oh, I bet someone writes httpd in vhdl or verilog... Unbeatable header parsing
time I'm sure.

Or maybe in CSS.
[https://news.ycombinator.com/item?id=9567183](https://news.ycombinator.com/item?id=9567183)

~~~
nathan_f77
> Oh, I bet someone writes httpd in vhdl or verilog

I would love to see that.

------
wyc
Shameless plug for a companion IRC bot in ARM assembly:
[https://github.com/wyc/armbot](https://github.com/wyc/armbot)

------
voidiac
The name is confusing; the first thing I thought of was an SMTP server.

------
e12e
A little strange to see: "Sendfile can hang if GET is cancelled." in the
readme and no corresponding issue. Not even one closed as "wontfix". Sounds
like a DoS?

------
mariopt
Why would someone write a web server in assembly? Just for fun?

~~~
fleitz
Probably to lower overhead associated with C language features, similar to the
reason why many people write things in C instead of a higher level language.

~~~
stass
To avoid overhead people implement specialized compilers suited for the task
at hand. If anything, going down to C, and especially assembly will hurt
performance as low-level code is much harder to optimize for obvious reasons.
Above all, real-world performance comes from proper system-level
design, not micro-optimizations, and using a low-level language (be it C, C++
or assembly) will prevent one from quickly iterating over different ideas.

~~~
kibibu
> low-level code is much harder to optimize for obvious reasons

What are the obvious reasons I'm missing?

~~~
stass
As cgabios noted below, low-level code obfuscates intended behaviour. Ever
tried to write a C optimizer? A trivial example is a for loop vs map -- the
former has inherent ordering semantics, and the compiler does not have any way
of knowing if this behavior needs to be preserved, while the latter just says
that a particular operation needs to be applied to each element, so the
compiler is free to reorder/parallelize/etc. There are much worse situations
that arise from a low-level language having to preserve the underlying machine
memory semantics (that is one of the reasons why it is hard to compile
low-level languages like C or C++ to e.g. JavaScript; compiling x86 assembly
would require a full machine emulation).

This is discussed in detail in most introductory CS books if you would like to
learn more.
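
A small sketch of the loop-vs-map point, as I read it. Python will not reorder or parallelize anything on its own, so this only illustrates the semantics: a map over a pure function can be handed to a worker pool without changing the result, while a loop with a carried dependency cannot be split up so freely. The function names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    """A pure function: its result depends only on its argument."""
    return x * x


def squares_with_loop(values):
    """Explicit loop: the iteration order is spelled out in the code."""
    result = []
    for v in values:
        result.append(square(v))
    return result


def squares_with_map(values):
    """Map: only 'apply square to each element' is stated, so the runtime
    may evaluate elements concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))


def running_total(values):
    """Loop with a carried dependency: each step needs the previous total,
    so the order genuinely matters here."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


if __name__ == "__main__":
    data = list(range(10))
    print(squares_with_loop(data) == squares_with_map(data))
    print(running_total(data))
```

An optimizer looking at the first loop has to prove that iterations are independent before it can reorder them; with the map, that independence is stated up front.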

~~~
fleitz
In asm you write loops because they are easy, map is hard (and generally slow
because it's a lot more code.)

ASM derives performance from specialization. ASM asks: how often is this code
ACTUALLY going to run on another architecture, OS, etc.? And then it gains
performance by not supporting those things via abstractions, etc.

Throw away your CS textbook and run benchmarks, reality dictates theory, not
vice versa.

~~~
stass
Specialization is what compilers do really well. :) Humans -- not so much.

------
lelf
Maybe it's just me, but I honestly don't know what this is doing near the HN
top. It's more or less a literal translation from C.

~~~
bjourne
No, a literal translation is what you get when you write an HTTP server in C
and inspect what assembly code it produces for x86-64. Since this assembly
code is nowhere close to that output, it is not a literal translation.

~~~
rdc12
Only if you use a completely naive compiler; any level of optimisation moves
away from being a literal translation.

~~~
bjourne
No, any correct translation a C compiler produces is, by definition, a literal
translation.

~~~
rdc12
No, that simply means that they are semantically equivalent, which is very
different to a literal translation. To quote an online dictionary [0], "2.
Word for word; verbatim: a literal translation.". Optimisation in compilers is
certainly not word for word.

Take a simple problem like FizzBuzz and write it in the simple, obvious
branching style. Now compile it with GCC or Clang (with -O3) and you end up
with a lookup table (or at least I did a few months back). Semantically
equivalent, but not a literal "word for word" translation.

[0]
[http://www.thefreedictionary.com/literal](http://www.thefreedictionary.com/literal)
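
To illustrate the kind of transformation meant here, a sketch of the two shapes side by side. This is not the output of GCC or Clang, just a hand-written pair showing how the branching form and a period-15 lookup table can be semantically equivalent while looking nothing alike.

```python
def fizzbuzz_branching(n: int) -> str:
    """The obvious version: a chain of divisibility tests."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# The answers repeat every 15 numbers, so one table entry per residue is enough.
# None means "print the number itself".
TABLE = [
    "FizzBuzz", None, None, "Fizz", None,
    "Buzz", "Fizz", None, None, "Fizz",
    "Buzz", None, "Fizz", None, None,
]


def fizzbuzz_table(n: int) -> str:
    """The table-driven version: one modulo and one lookup, no branch chain."""
    word = TABLE[n % 15]
    return word if word is not None else str(n)


if __name__ == "__main__":
    for n in range(1, 16):
        print(fizzbuzz_branching(n), fizzbuzz_table(n))
```

Both functions return the same string for every input, yet neither is a word-for-word rendering of the other.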

~~~
bjourne
You are missing that there exist many more than one literal translation of a
particular C program to asm. Using your logic, a compiler could not produce a
literal translation of any program unless its output matched 100% that of
all other compilers for the same program.

------
polarbaer
No dependencies at all, runs on Docker 'FROM scratch', Nice! -
[https://registry.hub.docker.com/u/0xff/asmttpd/](https://registry.hub.docker.com/u/0xff/asmttpd/)

------
kryptiskt
I propose that a web framework be called Assembly on Ambulator.

~~~
lsiebert
Given that it's so small, 6k, I'd call the framework based on this Assembly
on Alleys.

It could totally implement a DSL... call it "C" for convenience, that
generates the required assembly code :-).

~~~
cbd1984
> call it "C" for convenience

As distinct from C, I take it.

~~~
lsiebert
No, that is part of the joke I was (apparently) failing at making.

------
markus2012
benchmarks?

:-)

~~~
martin1975
Just because it is in ASM doesn't mean an exact equivalent in C won't smoke
it performance-wise. Just sayin... Aside from intellectual curiosity, one would
be very hard pressed to write ASM code that is even barely more efficient than
code generated from C by a decent compiler, e.g. clang or Intel C....

~~~
harry8
Just because it's asm doesn't mean it's fast, sure. Just because it's C
doesn't mean it's faster than an implementation in your favourite scripting
language.

Having said that, compilers do pretty badly on C-to-simd optimisation. The
best you get is loop vectorization if it's really simple logic. You can
usually get some pretty good wins there. The fact you lay out your memory for
simd usually is a win all of its own due to cache prefetching even if you
don't actually use any simd instructions. Compilers need heuristics to manage
cache, whereas you know what you're trying to do (e.g. when should it use
non-temporal writes?). Fast C code is written while having a really
clear mental model of the underlying architecture and the assembly that the C
will produce with -O3 (or whatever flag is relevant to your compiler) and then
checked with -S or objdump -D, profiled with callgrind/cachegrind, perf, rdtsc
etc...

The compiler really can't "do it for you". You /can/ use a compiler as one of
your tools when /you/ do it. As Randy Hyde points out, you can always beat the
compiler because you can use its generated assembly language in every case
where you can't beat it, so the absolute worst you get is a tie.

So yeah, you can totally smoke the clang, Intel, Microsoft and GNU C compilers
and get paid something for doing it in certain industries too. :-)

Mike Acton is aggressively opinionated on the subject, but the lecture is
_really_ good (despite/because of) the bits you'll disagree with and the
way he'll rub you the wrong way.
[https://www.youtube.com/watch?v=rX0ItVEVjHc](https://www.youtube.com/watch?v=rX0ItVEVjHc)

------
sagargv
What's the performance like?

------
SteveBerta
Amazing!

