Asmttpd: Web server for Linux written in amd64 assembly (2017) (github.com/nemasu)
166 points by jxub on July 13, 2018 | 104 comments



Isn't this basically a web server for Linux written in C, except with absolutely no standard library optimizations?

Yes, it's assembly, but it's basically a thin script over a set of string library functions which are themselves just naive versions of what's already in libc. Most of this is just strcpy/strcat/strchr/strstr; the only reason it's not one giant stack overflow is that it doesn't actually implement most of HTTP (for instance: nothing reads Content-length, and everything works off a static buffer that is sized to be as large as the maximum request it will read). What's the point of writing assembly if you're going to repeatedly scan to the end of strings as you try to build them?
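To make the repeated-scan point concrete, here's a minimal C sketch (the function names and buffer are hypothetical, not from the project): building a header with strcat rescans the whole buffer on every append, while keeping a write cursor appends in one pass (stpcpy is POSIX 2008; it returns a pointer to the new terminating NUL).

    #include <stdio.h>
    #include <string.h>

    #define CAP 1024

    /* Naive: every strcat() rescans buf from the start to find its end,
     * so gluing n pieces together costs O(n^2) character reads overall. */
    static void build_naive(char *buf, const char *status, const char *ctype) {
        buf[0] = '\0';
        strcat(buf, "HTTP/1.1 ");
        strcat(buf, status);
        strcat(buf, "\r\nContent-Type: ");
        strcat(buf, ctype);
        strcat(buf, "\r\n\r\n");
    }

    /* Cursor: keep a pointer to the current end and append there directly. */
    static void build_cursor(char *buf, const char *status, const char *ctype) {
        char *p = buf;
        p = stpcpy(p, "HTTP/1.1 ");
        p = stpcpy(p, status);
        p = stpcpy(p, "\r\nContent-Type: ");
        p = stpcpy(p, ctype);
        p = stpcpy(p, "\r\n\r\n");
    }

    int main(void) {
        char a[CAP], b[CAP];
        build_naive(a, "200 OK", "text/plain");
        build_cursor(b, "200 OK", "text/plain");
        printf("%s%s", a, b);
        return 0;
    }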

I'm not taking away anything from the project as an exercise. It's neat! But like: it's not something you'd ever want to use, or a productive path to go down to make a good webserver.


> What's the point of writing assembly if you're going to repeatedly scan to the end of strings as you try to build them?

I'd have to benchmark to make any claims about speed, but from my quite extensive past experience with Asm, it's not uncommon for even a simple/naive/asymptotically less efficient algorithm written in Asm to outperform a more complex/clever/asymptotically efficient one in an HLL at the problem sizes encountered in practice, simply by virtue of the constant factor being much smaller. For the same reason a bubble/insertion sort in Asm can easily beat the standard library sort if your inputs aren't that big.

Of course if you combine Asm with efficient algorithms (which can sometimes be easier to write than in a HLL), then you can get much closer to how much the CPU can actually do.


But for web servers, it's not about micro-optimisations, but having an architecture capable of efficiently handling concurrency, no?

nginx and Apache are both written in C, but nginx can outperform/outscale Apache. That's because of high-level architectural differences, not from tweaking assembly code. An HTTP server isn't doing much in data-transformation/algorithmic terms, it just has to scale efficiently.

If your code is synchronous and non-parallel, it's going to lose every time, even if it's highly tuned assembly.


> it's not uncommon for even a simple/naive/asymptotically less efficient algorithm written in Asm to outperform a more complex/clever/asymptotically efficient one in an HLL at the problem sizes encountered in practice, simply by virtue of the constant factor being much smaller

You're conflating two different things: constant factors based on algorithm choice and the merits of hand-written assembler. It's true that sometimes asymptotically worse algorithms beat asymptotically better ones in practice. It's a lot less clear that humans can regularly beat compilers for typical scalar code.


I do not care about good. I care about good enough. I would want to use that, on systemd-nspawn containers (or whatever the current name of the systemd container is), to see if I could get more performance than nginx for APIs returning fixed length things.

I wonder how efficient it would be to spawn a few hundred of them.


As I've mentioned in another comment, I wouldn't place any bets on the efficiency of this code. It's written with ease of understanding in mind at the expense of code size and speed.


Nobody has mentioned rwasa yet, so I will (an HTTPS web server written in asm):

https://2ton.com.au/rwasa/

I proposed on a Debian IRC channel to make a package for rwasa, but I was told that "nobody does asm in 2017". I wonder if it's still true in 2018.


There are still programs available in Debian written in asm, but they tend to be exceptional.

Picolisp 64bit

Luajit (mostly)


Really? PicoLisp was built with asm? Not C, like SBCL?


Picolisp 32bit is C, but 64bit is asm. There is also Ersatz Picolisp which is Java.

All the sources are really well laid out and worth reading through, but you might start with the 64-bit README. [0]

[0] https://software-lab.de/doc64/README


FYI: nobody does asm in 2018


Sounds like an excellent front end, for best performance. For the back end stability is paramount, which is why I would always recommend pairing this with a shell script server: https://github.com/avleen/bashttpd

Bash is undeniably the language with the highest possible stability, you can literally feel the stability and security when a file is served and the hard drives go crazy.


Why would you assume best performance? Web server performance has more to do with I/O strategy than it does with raw compute.


Re-read the comment you're replying to. It's a joke.


    %macro stackpush 0
        push rdi
        push rsi
        push rdx
        push r10
        push r8
        push r9
        push rbx
        push rcx
    %endmacro
    
    %macro stackpop 0
        pop rcx
        pop rbx
        pop r9
        pop r8
        pop r10
        pop rdx
        pop rsi
        pop rdi
    %endmacro
I guess that's one way of saving registers. Not particularly efficient, but it works…


You can thank AMD for mysteriously removing PUSHA/POPA from the opcode map in 64-bit mode (not even replaced by any new useful instructions, just made invalid.)


I don't think they've given reasons for it, but it's likely because they just completely destroy any kind of out-of-order execution. They'd end up requiring a fence before either instruction to ensure that you only push or pop the correct values, and you couldn't speculate at all past them. Combine that with needing more than double the space (double the number of registers, each double the size), and it seems pretty wasteful. And then there's the fact that modern compilers are likely already avoiding those instructions because of the timing of saving extra registers that you aren't using in a given function; it probably just doesn't make much sense to keep them anymore.


> but it's likely because they just completely destroy any kind of out-of-order execution.

The same goes for the longer sequence of individual pushes or pops --- they all depend on the stack pointer. In fact, the single instruction only needs to adjust it once by the total number of registers pushed/popped (since it is a constant).

In other words, PUSHA/POPA already decode internally to a bunch of moves and one ALU op for the stack pointer, which can be scheduled OoO. All they needed to do for 64-bit mode was adjust the constant (by multiplying it by two) and emit more uops for the additional registers, but they didn't, for some otherwise inexplicable reason. All the machinery to do it already existed.

> And then there's the fact that modern compilers are likely already avoiding those instructions because of the timing of saving extra registers that you aren't using in a given function; it probably just doesn't make much sense to keep them anymore.

Compilers won't ever use PUSHA/POPA but lots of other code will --- BIOS, executable packers, OS state-saving code (the perfect example of what these instructions were for?), etc.

See the story of SAHF/LAHF for a similar and even more astoundingly bad decision.


I wonder if there are CPUs with a single-instruction state save.

ps: thank you all for the answers


ARMv7 (not AArch64) has the STM instruction that lets you push a register set, selected by bitmask, e.g.:

    STMFD sp!, {r3-r7,lr}
(I believe the "FD" suffix is to do with the stack growing down - "full descending")

Of course this is just implemented with microcode so it's not really any more efficient than a series of PUSHes, except there's a bit less I-cache pressure.


In the original ARM design the multiple register transfers were much more efficient than the equivalent single register transfers because of the simple architecture which had an inherent load delay (it couldn't fetch an instruction and a data word in a single cycle). When they switched to the Harvard model they lost their advantage.

Talking of the ARM reminds me of older heroic feats of assembly-programmed internet software in the Acorn days, such as Ben Dooks' all-asm TCP/IP stack and Jon Ribbens' web browser. Probably not been done that often...


Itanium does this automatically. You declare what registers you're using in the function prologue and it handles popping/pushing/renaming. No register window exceptions either.


Isn't Itanium obsolete? I'd love to learn otherwise.


Sadly it's dead now. In a cruel twist of fate, Intel killed it off shortly before we found out about all the speculative execution attacks and found that everything else is horrifically vulnerable.


Do you mean that Itanium would have been safe?


It's inherently not vulnerable to the Spectre/Meltdown family of attacks. They rely on speculative, out-of-order execution on modern CPUs, but Itanium is an in-order core with very limited (and software-controlled) speculation.

It's actually not vulnerable to a bunch of other attacks as well (e.g. a buffer overflow cannot overwrite the return address on Itanium).


ARM has stmdb / ldmia which appear in practically every function prolog/epilog. It also has "banking" systems to swap to a different set of registers on interrupts, which saves time and stack space.


Another one: the old Z-80 CPU had two sets of registers, and a single instruction to swap between the main and the alternate register set.

http://landley.net/history/mirror/cpm/z80.html (search for "alternate registers").


Sadly it appears that none of the Z80 C compilers (at least the ones I've used, Hi-Tech C and sdcc) are smart enough to use it.


Even if there was one, you'd still be saving registers unnecessarily. If you only clobber one register in a procedure there's no need to save and restore all of them.


x86 has LOADALL and SAVEALL.

https://en.wikipedia.org/wiki/LOADALL


Those undocumented instructions don't exist on x86_64, and they could be used to do bizarre things on the processors which had them:

> As the two LOADALL instructions were never documented and do not exist on later processors, the opcodes were reused in the AMD64 architecture.[8] The opcode for the 286 LOADALL instruction, 0F05, became the AMD64 instruction SYSCALL; the 386 LOADALL instruction, 0F07, became the SYSRET instruction. These definitions were cemented even on Intel CPUs with the introduction of the Intel 64 implementation of AMD64.[9]

[snip]

> Because LOADALL did not perform any checks on the validity of the data loaded into processor registers, it was possible to load a processor state that could not be normally entered, such as using real mode (PE=0) together with paging (PG=1) on 386-class CPUs.[7]


What about PUSHAD and POPAD for x86? Those are not undocumented, right?


There is no pusha in long mode though, you need to push individual regs.


The BeagleBone's processor, TI's AM335x, has two "PRU" coprocessors. They have "xin" and "xout" instructions that can copy a register bank in 1 cycle. Pretty handy for quick data gathering but tricky to use in C.


Sure. It's called windowed registers and you have one instruction each to move the window right/left.



This is some of the best written assembly I've seen. I can actually follow what's going on.


This is really cool. I admire your bravery.

I noticed that certain options are hardcoded (such as the port number). As someone with very little experience in assembly, how difficult would it be to make this dynamic via environment variables?


Slightly off topic, but I don't get environment variables. To me they look like implicit options that are based on what shell session you're currently in. Stuff that hangs around waiting to kill you when you're not looking, like a cecum.

Why would anyone want environment variables when you can have config files or explicit options instead?


They have their advantages over config files and command line options. They're better than config files because you can ask for different behavior in different invocations, and they're better than command line options because it allows you to control a program even when you don't invoke it directly.

Consider, for example, wanting different options for less between when it's called by you, or by man, or by git, or by journalctl, or any other utility.

    alias man='LESS="X$LESS" man'
    alias journalctl='LESS="S$LESS" journalctl'
It's also useful when you want all processes that follow from an invocation to share a certain configuration, and those that follow from another invocation to have a different one. An example of this is $DISPLAY.

I startx on my tty1; that invokes the Xorg server, exports DISPLAY with the display of the server (typically :0), and executes my window manager. When I press certain shortcut keys, applications are launched, and they, in turn, might call other applications, and they all need to know that when they open a window it should be on display :0. If I log in remotely through ssh with X11 forwarding, the applications I launch on that same machine should now open their windows on localhost:10 instead of :0, so I get to see them on the laptop I'm logging in from. If someone else wants to use their account on the same computer, they might switch to tty2 and startx themselves, and all the applications they open should open on their Xorg server process with display :1, not mixing with the other windows I opened locally or on the laptop.

This is also why $LANG, $HOME, $USER, and $TERM are useful. They're things that should be shared by process trees. Their purpose can't be fulfilled by configuration files or command line options.


> you can ask for different behavior in different invocations,

Pretty much all software supports several config files (system-wide, invocation-specific) for this reason.

It's not harder to write a line to a file than to set an environment variable.

Environment variables have opaque size limitations. Any non-trivial data is likely to be quoted or encoded in some way, which is going to vary between your applications, and there's going to be quirks in parsing.

It is much easier to programmatically reason about the state of standard file formats on disk than to resort to parsing the environments of running processes.

Environment variables are inherited by child processes (which is the point of using them). That's going to have security implications for you, especially if those processes also read their configuration from the environment.

And yes, given the choice, I think you will find that most people use .gitconfig instead of setting their git configuration via environment variables, for these very reasons.


Not every foot fits into the same shoe. There are things that are appropriate for environment variables and things that are not, just like there are things that are appropriate for config files or command line options and things that are not.

Your post is weird to me, because you seem to have interpreted that I somehow implied that we shouldn't use configuration files and that everything should be done through environment variables. That's not at all the case.

On the other hand, your post seems to imply that environment variables should never be used. Is that so? Because if it is, I'd like to know how you'd think that $DISPLAY or $TERM could be better substituted by another solution. I'm not even going to limit you to config files and command line options. I've already painted a concrete scenario of the use of $DISPLAY in my previous post, can you paint that same scenario with another solution?

Now to paint a scenario for $TERM. $TERM controls how terminal applications communicate with the terminal to use its features, like clearing the screen, moving the cursor around, etc. Imagine one server and multiple people connecting to it through ssh, each from their own computer, each using a different terminal on their machine. The terminal applications on the server use $TERM to determine how to use the features of the terminals that are connecting to it. These are not only the applications that the user launches through the shell, but also the ones called indirectly via other programs the user has invoked.

Also keep in mind that the server might simply not recognize the type of terminal you have (it doesn't have the appropriate terminfo file installed), but you might know another terminal that is similar to yours and that the server does know. In that case, you'll want to inform all programs you launch, and the ones they launch, to treat your terminal as if it were that other terminal (this is when you'd set $TERM in the shell session to something else). Can you rethink this so the terminal applications can use something other than $TERM to determine the type of terminal in use, and still allow the user to override it?

> Pretty much all software supports several config files (system-wide, invocation-specific) for this reason.

How would the program know which configuration file to use for invocation-specific configuration? If it's not through an environment variable, then I guess you specify it with a command line option, and, like I said, that's not going to help when it's another program and not you making the call, so you don't control the options passed to it.


It's harder to accidentally commit a running process's environment to version control ;)


> Why would anyone want environment variables when you can have config files or explicit options instead?

If you have an underlying .so used by an application that hasn't implemented passing options to the library at init, environment variables let you control things in something that otherwise has no runtime configuration source. I've written several such libraries. They're also good for controlling things you override with LD_PRELOAD.


Config files set per-user/per-system options, whether they are set locally or globally, while env variables set options per shell.

Global options > user options > env variables > command line parameters


Isn't it the other way around? Command line params override env vars, and so on?


I think those are arrows, not greater-than-signs (i.e. load config from global options, then overwrite that with user options, etc.)


Oh, me. Sorry, you're totally correct.


I am very pro environment vars and consider files an anti-pattern. Files are like globals: sibling processes end up sharing state, plus they survive longer than their use case.

Env variables cascade, which is why they have wider uses than program args. You can organize a system as a group of processes and envs are a good way to share state and abstract out the details like prod vs staging without affecting the program implementation.


I'd say it's really about size. A few dozen configs in environmental vars, sure. But once you expand beyond that, need to manage variants of them, need to compose them from fragments you start to see benefits from something more structured like xml, toml, json, or yaml etc.


I just start loading env variables from files, e.g. source blah.env

This way, my bash environment mirrors the task I am working on


> I'd say it's really about size.

My wife assures me that size doesn't matter, it's actually the way you use your configs to manage your environmental vars.

(sorry, I just couldn't help myself...)


Envs are also globals. Dynamically scoped globals. They are evil and allow all kind of spooky action at distance, but sometimes they are a necessary evil.


Explicit options require the cooperation of every intermediate program. If program D is called by C, which is called by B, which is called by A, and D gets a new option that we need to use, we have to modify C to pass it to D, then B to pass it to C, and A to pass it to B. Or we could just set an environment variable.

Configuration files don't let us override individual variables easily, in different invocations of the program. We have to generate a custom version of the configuration file for that job and pass that file's name to it.

Environment variables are only visible to children, so they are inherently secure; we don't worry whether permissions are too loose on an environment variable so that another user could see it or modify it. There is no such thing.


Environment variables are visible with 'ps -E', to anyone with the same UID or root.

In Linux, they're visible in /proc/<pid>/environ, which is permission 400 and owned by the effective user of the process.


In addition to the other advantages mentioned here, environment variables serve as a common language that all programs can understand, in theory. They're also less complex to access than reading and parsing a file (less important for, say, Node.js than C++, but still).


> Why would anyone want environment variables

> when you can have config files

You still need a way to tell your program where the config file is.

Environment variables are just like a configuration file that's automatically opened by the OS when a process starts and automatically parsed as a set of named strings by the libc. In the many cases where one does not need more complex data structures than that, introducing a config file just makes things harder.

That being said, I wish envvars had namespaces of some sort.
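On the "parsed as a set of named strings by the libc" point, the lookup side really is a couple of lines of C. A minimal sketch, where ASMTTPD_PORT is a made-up name for illustration (asmttpd itself hardcodes its port and, going by this thread, doesn't use libc at all):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* getenv() just searches the inherited NAME=value list: no file, no parser. */
        const char *s = getenv("ASMTTPD_PORT");     /* hypothetical variable name */
        long port = s ? strtol(s, NULL, 10) : 8080; /* fall back to a built-in default */
        printf("listening on port %ld\n", port);
        return 0;
    }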


It wouldn't be that difficult; it would only require some libc-related magic (but I haven't read much of the asmttpd source code to know how much of it is in there). IIRC, environment variables are passed from the parent process to the child in the form of a "naive" array of strings, so the truly unpleasant parts of this, like parsing text from a config file, are already taken care of.
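For illustration, a minimal C sketch of that "naive" array: the environment arrives as a NULL-terminated array of "NAME=value" strings, which is also what an assembly program finds on its stack just past argv at entry. The three-parameter form of main is a common extension rather than standard C, and PORT is a hypothetical variable name.

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv, char **envp) {
        (void)argc; (void)argv;
        /* Walk the NULL-terminated array of "NAME=value" strings. */
        for (char **e = envp; *e != NULL; e++) {
            if (strncmp(*e, "PORT=", 5) == 0)       /* hypothetical variable name */
                printf("found %s\n", *e);
        }
        return 0;
    }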


Well, it's not mine, but I'm sure it was quite a challenge to build; props to the author for accomplishing that.


That's probably not a good model to fit in assembly code. It's easier to have a wrapper in another language as the launcher and have it set the variables in the assembly code and build the binary.


Why?


I’m assuming that there is a need to dynamically configure things like the port in the parent’s environment. However, for an assembly tool, it might make sense to simply have it hard coded and save the extra complexity. My launcher script suggestion is a possible solution to this, but I would just hard code the port and ship it without a launcher myself. The suckless projects make use of hard coded configurations in their C programs. The principle is the same.


Speculation: maybe it's simpler to just bake a string into the binary and hardcode the pointer to it at every use than to futz around with strings in asm?

But if you're already doing the rest of it in asm why would that scare you?


for starters, you'd need a text parser


Something you already have if you're building an http server in asm...


[flagged]


While the point you are making is correct, you could have presented it in a less abrasive way. Not everyone is aware of how environment variables are passed to programs, or has pieced together that char **envp is also available in assembly.


If we're talking about bad ways to present knowledge, the parent comment could also have been "I really don't know much on this topic so I'd rather not speculate" ;-).


That's impressive, but I couldn't find the reason for it. Was it done just for the challenge, or can there be a use for this kind of server?


Yup, initially it was for the challenge and to learn some x86_64 assembly. As for usefulness, personally I use it all the time when I need to serve some static content and don't want to set up Apache or nginx or something else.


It's tiny; might have embedded use?


Embedded x86_64? Is that a common thing?


It's quite common. "Embedded" doesn't mean just "low power and little number crunching ability", it's just a fancy way of saying "special purpose". E.g.: https://buy.advantech.com/Industrial-Computers/Fanless-Indus... is an industrial computer; https://www.edge-core.com/productsInfo.php?cls=&cls2=&cls3=1... is a "whitebox" switch, but it's a representative of many OEM/ODM designs.

x86-64 is a great architecture for many applications. It's cheap (due to volume), well-understood, has great compatibility, and is pretty much the best-supported architecture when it comes to software.

Edit: that being said, no, this doesn't have any embedded use today. Embedded x86 isn't 386 processors anymore. The typical scenario for an embedded x86-64 is "I need to drive this specific application and I want to leverage the Linux/FreeBSD/Windows environment for programming". That already puts you way ahead of the point where writing assembly code is useful performance-wise, and you probably don't want to put ASM code you downloaded off the Internet in a network-facing position :-).


Security is enough of a consideration that any well-established webserver without corporate or foundational backing is instantly unsuitable for serious use.


I was under the impression the Caddy web server is mostly developed by one person [0]. I see people these days comparing it to nginx.

Everything needs to start somewhere.

[0]: https://github.com/mholt/caddy/graphs/contributors


It is written in Go. That alone makes it safer than an unproven web server written in assembly.


Definitely looks like it. No library dependencies such as libc makes it a good fit for embedded use.


Looking at the code I’d be surprised if this is anywhere near as fast as a simple equivalent written in C++ (which wouldn’t be as pessimistic with the register saving, in an opt build) or even compared to bozohttpd.


Neat, but unless I am missing something, a nightmare in terms of security. Assembly has a worse type system than even C: essentially no type system at all!

(Or rather the weakest, simplest, most dynamic type system you can imagine.)


There has been a lot of work on that. Look up Typed Assembly Language (also with "Security" in front of that), and Dependently-Typed Assembly Language. Here's one that was integrated with a safe C variant.

https://www.cs.cornell.edu/talc/


Wow, even dependently typed, that's pretty brilliant. I can see this being enormously helpful when writing very low level, for example early boot stage, assembly code.


Unfortunately it appears that the linked project is not free software :(


I have the source but you're right: no obvious license file. I might email them about it.


On the other hand, there is essentially no undefined behaviour, no chances of compiler bugs/backdoors, and the code is (by necessity) simple enough that what it does can be easily analysed and understood. Not a lot of places for bugs, security or otherwise, to hide if there's not a lot of code in the first place.


Virtually no undefined behavior, but few security issues in C are because of compiler bugs or backdoors. And integer overflows, buffer overflows, and race conditions don't need any undefined behavior in C, and are the same problem in assembly.

But many programmer mistakes that a C compiler would at the very least warn about will only be apparent at runtime in assembly: Treating something as the wrong type, forgetting to dereference a pointer (or accidentally dereferencing it), mixing up values of different types, mutating something that's meant to be constant in a context, assigning an integer quantity to a pointer without an explicit cast...

Then on top of that you get a myriad of new potential mistakes that are impossible or at least very unlikely in C: Mixing up registers, accessing the wrong offset in a structure (e.g. easier to mix up reg+12 and reg+14 than foo.enabled and foo.name), messing up the stack layout... And now think about what each of those mean when you're actually changing code: Did you update every register? Did you update every offset?

As for the simpler code being simpler to understand and simpler to keep bug-free, that's kind of a tautology that applies independently of the language. And besides, wouldn't those qualities be even easier to achieve for the same code written in C?
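To make the offset point concrete, a small C sketch (hypothetical struct; the layout comments assume a typical x86-64 ABI): the compiler derives every field offset from the declaration, whereas each reg+N in asm is written and updated by hand.

    #include <stddef.h>
    #include <stdio.h>

    struct conn {
        int  enabled;     /* offset 0 */
        char name[32];    /* offset 4 */
        long bytes_sent;  /* offset 40 on x86-64, after alignment padding */
    };

    int main(void) {
        /* Add or reorder a field and these values move automatically in C;
         * every hand-written reg+N in asm has to be found and fixed instead. */
        printf("enabled=%zu name=%zu bytes_sent=%zu\n",
               offsetof(struct conn, enabled),
               offsetof(struct conn, name),
               offsetof(struct conn, bytes_sent));
        return 0;
    }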


> And integer overflows, buffer overflows, and race conditions don't need any undefined behavior in C

Integer overflows are undefined behaviour in standard C.


No they are not. Unsigned integer overflow is perfectly defined.
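For the record, a minimal example of the distinction: unsigned arithmetic wraps modulo 2^N by definition, while signed overflow is undefined behaviour in standard C.

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        unsigned int u = UINT_MAX;
        u = u + 1;          /* well defined: wraps around to 0 */
        printf("%u\n", u);  /* prints 0 */

        int i = INT_MAX;
        /* i = i + 1;          would be undefined behaviour: signed overflow */
        (void)i;
        return 0;
    }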


"no chances of compiler bugs/backdoors"

That's not true if you're running complex assembly through an assembler and/or linker. Aside from totally changing the program, there's definitely potential for it to be modified to do things like drop safety checks for optimizations or use instruction patterns that increase covert channels. That's on top of not working correctly at all. I'll add to that point that the Intel documentation is huge, with every processor having errata. That's why high-assurance security always considers those things inside the Trusted Computing Base (TCB).


> there's definitely potential for it to be modified to do things like drop safety checks for optimizations

Can you please give a link? It sounds strange; I've never read about something like that happening in assembly ("drop safety checks for optimization" — which checks, and how can that even happen? Maybe you confused that with the writings about what C compilers do?).


Well, let me first say how high-assurance security makes us think on these things: everything is incorrect, insecure, and backdoored by design until proven otherwise. Then, there's rigorous methods for attempting to prove otherwise. We also try to speculate about attacks using a mix of prior attacks (known knowns) and variations of prior problems (known unknowns). So, here's a few things that came to my mind:

The assembler or linker are programmed to look for and drop checks for stuff like buffer overflow.

A "macro" assembler has some code generation. It might generate the code in an insecure way.

A malicious assembler might swap out instructions that prevent side channels for equivalent instructions that cause them.

There's lots of ways to screw up assembly code in general on correctness side that might at least lead to DOS. The malicious assembler might do any of them.

The linker might do any of that. It could even be a better place for it, since few people understand or look at linkers. Although my memory is fuzzy right now, you could also look for any security errors that started with the linker. They could be done on purpose.

So, I would expect a formal specification of each component, their correctness/safety/security requirements (policy), the implementation, and evidence the implementation maintains the spec and policy. CompCert and CakeML are example compilers that do that for correctness. I can't recall if someone has verified an x86 assembler and linker. Rockwell-Collins is the best example of your CPU/microcode being verified in itself and against specs of your application code, function by function. I've also seen formal work on linking, mainly with type systems, in the Standard ML that a lot of proof systems output. Most just ignore assemblers and linkers for x86 for some reason. It would be a great project for some people to jump on! :)

http://www.ccs.neu.edu/home/pete/acl206/slides/hardin.pdf


Thanks for the Rockwell-Collins link!

What I understand is verified there, however, is not that an adversary can't add malicious code, but that there are formal verification steps ensuring no bugs are introduced when implementing the non-subverted design.

I agree with you that linkers are a nice place for malicious code to be hidden. Malicious code can be anywhere in practice... the way I see it, accidental errors that introduce security issues are much more probable in compilers and modern linkers (especially as they now do more of the job that formerly only compilers did) than in assemblers.

Assemblers are surely the easiest to verify as producing what is expected of them: a separately developed disassembler is enough.


"What I understand there verified is however not that the adversary can't add malicious code but that there are formal verification steps that no bugs are introduced as the non-subverted implementation is introduced."

What they did is produce a high-level view (formal spec) of exactly what it does, another of what it means to be secure, and a proof that the design matches. It's harder to subvert. It can also prove the system can't be attacked if they model those properties. For instance, they do show some protection against leaks from one process to another with the non-interference property of their partitioning mechanism. I just posted a lot more stuff like that in hardware and software here, if you're interested:

https://news.ycombinator.com/item?id=17530829

" the way I see it, the accidental errors that introduce security issues are much more probable in compilers and the modern linkers (especially as they do more of a job that formerly only compilers did) than in assemblers."

I agree. I bring it up every time someone erroneously focuses on the Karger/Thompson attack on compilers. The more common error is the compiler messing the code up. An easier subversion would be to put a bug into a compiler that did that but looked like a normal bug. It's why certifying compilers are necessary. Note that TALC is part of a bigger picture at the FLINT team, where they do certifying compilers for stuff like ML, safe versions of C, intermediate/assembly languages that are typed for an extra layer of defense, I/O, interrupts, and many other things. Here's their work:

http://flint.cs.yale.edu/flint/publications/

http://flint.cs.yale.edu/flint/software.html

Yeah, assemblers should be pretty easy if they're just doing straightforward stuff, preserving the data and structure. Those like Hyde's High-Level Assembly (HLA) might be trickier with high-level functionality. However, even those might be divided into a first pass acting as a compiler for the high-level stuff and a simpler assembler.


25.8 KB zip file! Now that's tight code.


I'm not sure that a properly written C version would be any bigger (assuming that unnecessary code is stripped, of course).


Curious: how does Rust cope in this regard?


They're on the bigger side, due to a combination of defaulting to static linking, some optimizations not having been possible (ThinLTO comes to mind), lots of debug strings and metadata, as well as the use of jemalloc[1][2].

[1]: https://jamesmunns.com/blog/tinyrocket/

[2]: https://users.rust-lang.org/t/rust-binary-sizes-once-again/1...


You're talking about binary size; the zip file being referred to is the entire git repo in zip format. How large the Rust code would be is an interesting question.

Interestingly, the repo notes the binary is only 6kb. I doubt C or C++ will get anywhere close to that.


If they're downloading the zip file from GitHub, then no, the zip only contains a snapshot of the current master branch.


Well if you only use libc like the assembly server does, I imagine you can use `#![no_std]`. Your Rust will use `unsafe` a lot and be very unidiomatic, but small. Is using the system allocator for binaries on stable yet?


Next release.


There it is!


AVE CESARE!


Nice? I mean, how would you be certain it was NOT a "Web VIRUS for Linux written in amd64 assembly (2017)" unless you could read/grok asm?

NOTE: I'm not saying it IS or ISN'T, and although I CAN grok asm, I'm not going to take the time to do it right now...

Just struck me as possible, given that the idea of a high-level language is mainly to provide abstraction over implementation details (point being, you can't really get any MORE implementation-detail-specific than asm...)

(edit: words...)


Well, how do you know the C library you just downloaded isn't a virus either?



