
Asmttpd: Web server for Linux written in amd64 assembly (2017) - jxub
https://github.com/nemasu/asmttpd
======
tptacek
Isn't this basically a web server for Linux written in C, except with
absolutely no standard library optimizations?

Yes, it's assembly, but it's basically a thin script over a set of string
library functions which are themselves just naive versions of what's already
in libc. Most of this is just strcpy/strcat/strchr/strstr; the only reason
it's not one giant stack overflow is that it doesn't actually implement most
of HTTP (for instance: nothing reads Content-length, and everything works off
a static buffer that is sized to be as large as the maximum request it will
read). What's the point of writing assembly if you're going to repeatedly scan
to the end of strings as you try to build them?
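
To make the rescanning cost concrete, here is a minimal C sketch. It is not asmttpd's code; the helper names `join_rescan` and `join_tracked` are made up for illustration. The first appends the way strcat does, walking to the terminator before every append. The second keeps a running offset, so nothing is scanned twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends each part the strcat way, so every append
 * rescans dst from the start to find the terminator (quadratic overall).
 * Returns 0 on success, -1 if the result would not fit in cap bytes. */
int join_rescan(char *dst, size_t cap, const char *const *parts, size_t n)
{
    assert(dst != NULL && cap > 0);
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t used = strlen(dst);              /* full rescan on every append */
        size_t len = strlen(parts[i]);
        if (len >= cap - used)
            return -1;
        memcpy(dst + used, parts[i], len + 1);  /* copies the terminator too */
    }
    return 0;
}

/* Hypothetical helper: same result, but remembers where the string ends,
 * so each byte of dst is written once and never rescanned. */
int join_tracked(char *dst, size_t cap, const char *const *parts, size_t n)
{
    assert(dst != NULL && cap > 0);
    size_t used = 0;
    dst[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        if (len >= cap - used)
            return -1;
        memcpy(dst + used, parts[i], len + 1);  /* copies the terminator too */
        used += len;
    }
    return 0;
}
```

Both produce the same bytes; the difference is only how often the destination is walked while it is being built.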

I'm not taking away anything from the project as an exercise. It's neat! But
like: it's not something you'd ever want to use, or a productive path to go
down to make a good webserver.

~~~
userbinator
_What's the point of writing assembly if you're going to repeatedly scan to
the end of strings as you try to build them?_

I'd have to benchmark to make any claims about speed, but from my past, quite
extensive experience with using Asm, it's not uncommon for even a
simple/naive/asymptotically less efficient algorithm written in Asm to
outperform a more complex/clever/asymptotically efficient one in an HLL at the
problem sizes encountered in practice, simply by virtue of the constant factor
being much smaller. For the same reason a bubble/insertion sort in Asm can
easily beat the standard library sort if your inputs aren't that big.
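
For reference, this is the kind of simple routine meant here, sketched in C rather than Asm. The function name `insertion_sort` is just illustrative, and whether it actually beats `qsort()` for a given input size is something you would have to measure on your own machine.

```c
#include <assert.h>
#include <stddef.h>

/* Plain insertion sort on ints: O(n^2) comparisons in the worst case, but a
 * tight inner loop with no comparator calls through a function pointer. */
void insertion_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];    /* shift larger elements one slot right */
            j--;
        }
        a[j] = key;             /* drop the key into the gap */
    }
}
```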

Of course if you combine Asm with efficient algorithms (which can sometimes be
easier to write than in a HLL), then you can get much closer to how much the
CPU can actually do.

~~~
MaxBarraclough
But for web servers, it's not about micro-optimisations, but having an
architecture capable of efficiently handling concurrency, no?

nginx and Apache are both written in C, but nginx can outperform/outscale
Apache. That's because of high-level architectural differences, not from
tweaking assembly code. An HTTP server isn't doing much in data-
transformation/algorithmic terms, it just has to scale efficiently.

If your code is synchronous and non-parallel, it's going to lose every time,
even if it's highly tuned assembly.
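
As a rough illustration of the architectural point, here is a minimal sketch of one iteration of a readiness-driven loop using POSIX `poll()`. It is not taken from nginx, Apache, or asmttpd; the name `service_ready` and the `MAX_FDS` limit are assumptions made for the example. The idea is that a single thread asks the kernel which descriptors have data and touches only those, instead of blocking on one connection at a time.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_FDS 64  /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: waits up to timeout_ms for any descriptor in fds to
 * become readable, then reads once from each ready descriptor.
 * Returns the total number of bytes read, or -1 on error. */
long service_ready(const int *fds, size_t n, int timeout_ms)
{
    assert(n <= MAX_FDS);
    struct pollfd pfds[MAX_FDS];
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;    /* we only care about readability */
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -1;

    long total = 0;
    char buf[512];
    for (size_t i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            ssize_t got = read(pfds[i].fd, buf, sizeof buf);
            if (got < 0)
                return -1;
            total += got;           /* a real server would parse buf here */
        }
    }
    return total;
}
```

A real server would loop over this, keep per-connection state, and on Linux would more likely use `epoll`, but the shape is the same: idle connections cost nothing per iteration.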

------
mabynogy
Nobody has mentioned rwasa yet, so I will (an HTTPS web server written in asm):

[https://2ton.com.au/rwasa/](https://2ton.com.au/rwasa/)

I proposed on a Debian IRC channel that rwasa be packaged, but I was told
that "nobody does asm in 2017". I wonder if that's still true in 2018.

~~~
shakna
There are still programs available in Debian written in asm, but they tend to
be exceptional.

Picolisp 64bit

Luajit (mostly)

~~~
lispre
Really? Picolisp was built with asm? Not C, like SBCL?

~~~
shakna
Picolisp 32bit is C, but 64bit is asm. There is also Ersatz Picolisp which is
Java.

All sources are really well outlined, and worth the read through. But you
might start with the 64bit's README. [0]

[0] [https://software-lab.de/doc64/README](https://software-lab.de/doc64/README)

------
bumholio
Sounds like an excellent front end, for best performance. For the back end,
stability is paramount; that's why I would always recommend pairing this with
a shell script server:
[https://github.com/avleen/bashttpd](https://github.com/avleen/bashttpd)

Bash is undeniably the language with the highest possible stability, you can
literally feel the stability and security when a file is served and the hard
drives go crazy.

~~~
tptacek
Why would you assume best performance? Web server performance has more to do
with I/O strategy than it does with raw compute.

~~~
mst
Re-read the comment you're replying to. It's a joke.

------
saagarjha

        %macro stackpush 0
            push rdi
            push rsi
            push rdx
            push r10
            push r8
            push r9
            push rbx
            push rcx
        %endmacro
        
        %macro stackpop 0
            pop rcx
            pop rbx
            pop r9
            pop r8
            pop r10
            pop rdx
            pop rsi
            pop rdi
        %endmacro
    

I guess that's one way of saving registers. Not particularly efficient, but
it works…

~~~
agumonkey
I wonder if there are cpus with 1-instr state save

ps: thank you all for the answers

~~~
rwmj
ARMv7 (not AArch64) has the STM instruction that lets you push a register set,
selected by bitmask. eg:

    
    
        STMFD sp!, {r3-r7,lr}
    

(I believe the "FD" suffix is to do with the stack growing down - "full
descending")

Of course this is just implemented with microcode so it's not really any more
efficient than a series of PUSHes, except there's a bit less I-cache pressure.

~~~
jlarcombe
In the original ARM design the multiple register transfers were much more
efficient than the equivalent single register transfers because of the simple
architecture which had an inherent load delay (it couldn't fetch an
instruction and a data word in a single cycle). When they switched to the
Harvard model they lost their advantage.

Talking of the ARM reminds me of older heroic feats of assembly-programmed
internet software in the Acorn days, such as Ben Dooks' all-asm TCP/IP stack
and Jon Ribbens' web browser. Probably not been done that often...

------
sctb
A couple of past threads:

[https://news.ycombinator.com/item?id=9571827](https://news.ycombinator.com/item?id=9571827)

[https://news.ycombinator.com/item?id=7170010](https://news.ycombinator.com/item?id=7170010)

------
inamberclad
This is some of the best written assembly I've seen. I can actually follow
what's going on.

------
k1ns
This is really cool. I admire your bravery.

I noticed that certain options are hardcoded (such as the port number). As
someone with very little experience in assembly, how difficult would it be to
make this dynamic via environment variables?
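
For a sense of the logic involved, here is a minimal C sketch of reading a port from the environment with a fallback. The helper name `port_from_env` is hypothetical, and asmttpd defines no such environment variable; the caller picks the name. An assembly version would have to do the same lookup and number parsing by hand, walking the environment pointers that Linux places on the initial stack after the argument pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: reads a TCP port from the environment variable
 * `name`. Falls back to `fallback` when the variable is unset, empty,
 * not a plain decimal number, or outside 1..65535. */
unsigned short port_from_env(const char *name, unsigned short fallback)
{
    assert(name != NULL);
    const char *text = getenv(name);
    if (text == NULL || *text == '\0')
        return fallback;

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > 65535)
        return fallback;            /* reject junk instead of guessing */
    return (unsigned short)value;
}
```

Usage would look like `unsigned short port = port_from_env("DEMO_HTTP_PORT", 80);`, where `DEMO_HTTP_PORT` is a made-up variable name.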

~~~
tormeh
Slightly off topic, but I don't get environment variables. To me they look
like implicit options that are based on what shell session you're currently
in. Stuff that hangs around waiting to kill you when you're not looking, like
a cecum.

Why would anyone want environment variables when you can have config files or
explicit options instead?

~~~
tlarkworthy
I am very pro environment vars and consider files an anti pattern. Files are
like globals, sibling processes end up sharing state, plus they survive longer
than their use case.

Env variables cascade, which is why they have wider uses than program args.
You can organize a system as a group of processes and envs are a good way to
share state and abstract out the details like prod vs staging without
affecting the program implementation.

~~~
kjeetgill
I'd say it's really about size. A few dozen configs in environmental vars,
sure. But once you expand beyond that, need to manage variants of them, need
to compose them from fragments you start to see benefits from something more
structured like xml, toml, json, or yaml etc.

~~~
tlarkworthy
I just start loading env variables from files, e.g. source blah.env

This way, my bash environment mirrors the task I am working on

------
laurent123456
That's impressive, but I couldn't find the reason for it. Was it done just for
the challenge, or can there be a use for this kind of server?

~~~
yjftsjthsd-h
It's tiny; might have embedded use?

~~~
tambourine_man
Embedded x86_64? Is that a common thing?

~~~
alxlaz
It's quite common. "Embedded" doesn't mean just "low power and little number
crunching ability", it's just a fancy way of saying "special purpose". E.g.:
[https://buy.advantech.com/Industrial-Computers/Fanless-Indus...](https://buy.advantech.com/Industrial-Computers/Fanless-Industrial-PCs/model-MIC-7700H-00A1.htm)
is an industrial computer;
[https://www.edge-core.com/productsInfo.php?cls=&cls2=&cls3=1...](https://www.edge-core.com/productsInfo.php?cls=&cls2=&cls3=15&id=1)
is a "whitebox" switch, but it's a representative of many OEM/ODM designs.

x86-64 is a great architecture for many applications. It's cheap (due to
volume), well-understood, has great compatibility, and is pretty much the
best-supported architecture when it comes to software.

Edit: that being said, no, this doesn't have any embedded use today. Embedded
x86 isn't 386 processors anymore. The typical scenario for an embedded x86-64
is "I need to drive this specific application and I want to leverage the
Linux/FreeBSD/Windows environment for programming". That already puts you way
ahead of the point where writing assembly code is useful performance-wise, and
you probably don't want to put ASM code you downloaded off the Internet in a
network-facing position :-).

------
ebikelaw
Looking at the code I’d be surprised if this is anywhere near as fast as a
simple equivalent written in C++ (which wouldn’t be as pessimistic with the
register saving, in an opt build) or even compared to bozohttpd.

------
anyfoo
Neat, but unless I am missing something, a nightmare in terms of security.
Assembly has a worse type system than even C: essentially no type system at
all!

(Or rather the weakest, simplest, most dynamic type system you can imagine.)

~~~
nickpsecurity
There has been a lot of work on that. Look up Typed Assembly Language,
Security in front of that, and Dependently-Typed Assembly Language. Here's one
that was integrated with a safe, C variant.

[https://www.cs.cornell.edu/talc/](https://www.cs.cornell.edu/talc/)

~~~
ChickeNES
Unfortunately it appears that the linked project is not free software :(

~~~
nickpsecurity
I have the source but you're right: no obvious license file. I might email
them about it.

------
daniel-cussen
25.8 KB zip file! Now that's tight code.

~~~
vbezhenar
I'm not sure that a properly written C version would be any bigger (assuming
that unnecessary code is stripped, of course).

~~~
Drdrdrq
Curious: how does Rust cope in this regard?

~~~
estebank
They're on the bigger side, due to a combination of defaulting to static
linking, some optimizations not having been possible (ThinLTO comes to mind),
lots of debug strings and metadata, as well as the use of jemalloc[1][2].

[1]: [https://jamesmunns.com/blog/tinyrocket/](https://jamesmunns.com/blog/tinyrocket/)
[2]: [https://users.rust-lang.org/t/rust-binary-sizes-once-again/1...](https://users.rust-lang.org/t/rust-binary-sizes-once-again/16287)

~~~
Thiez
Well if you only use libc like the assembly server does, I imagine you can use
`#![no_std]`. Your Rust will use `unsafe` a lot and be very unidiomatic, but
small. Is using the system allocator for binaries on stable yet?

~~~
steveklabnik
Next release.

------
fbn79
AVE CESARE!

------
ConcernedCoder
Nice? I mean, how would you be certain it was NOT a "Web VIRUS for Linux
written in amd64 assembly (2017)" unless you could read and grok asm?

NOTE: I'm not saying it IS or ISN'T, and although I CAN grok asm, I'm not
going to take the time to do it right now...

Just struck me as possible, given the idea of a high-level language is to
mainly provide abstraction over implementation details ( point being, you
can't really get any MORE implementation detail specific than asm... )

(edit: words...)

~~~
saagarjha
Well, how do you know the C library you just downloaded isn't a virus either?

