Inline Assembly Dangers (fobes.dev)
122 points by fobes 10 months ago | 61 comments



One of my favorite bugs was due to inline assembly!

I once wrote a function that would crash only if you passed it the immediate value 4. Like `foo(4);` A variable containing 4 would not crash.

The function was calling the Intel assembly instruction to byte-swap an integer. These days there’s an intrinsic for that. But, this was long ago and was written for platforms that didn’t even support POSIX.

So, this function was serializing an integer to memory, byteswapping it in the process, and naturally advancing the destination pointer as well.

The bug was that I forgot to mark the input register as dirty when I set up the inline assembly.

An important thing to know about inline asm, and the reason it is so strongly discouraged these days, is that the compiler cannot see it. It’s a black box for the compiler with a limited set of manual hints — one of which I omitted. Therefore it really mucks up the compiler’s ability to pipeline instructions. And, that cancels out the performance benefits for short snippets of assembly.

Anyway… I didn’t mark the register as dirty. Therefore, the compiler was unaware that it had been modified.

So, if I called the function with a 4, it would run some black box inline assembly, write the value to memory, and then it needed to increment a pointer by the size of one integer. 4 bytes. How convenient! The compiler just saw that at compile time there was definitely a 4 in a certain register! And, the inline asm declaration didn’t declare that anything happened to that register, such as byteswapping it. So, let’s just use that “unmodified” register as a value to add to the pointer!

Advancing a pointer by a byteswapped 4 makes for a quite invalid pointer. And, technically it was the next call to the serializer that would crash when it tried to write another value to that pointer.
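
For anyone who wants to see the shape of it, here is a minimal sketch of that kind of function. It is not the original code: `put_swapped32` and its signature are made up for illustration, and it assumes GCC or Clang. On x86 the `"+r"` constraint says the operand is both read and written, which is the hint the story says was missing; on other targets the sketch falls back to the `__builtin_bswap32` intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical serializer: byte-swaps `value`, stores it at `dst`,
   and returns the advanced destination pointer. */
static unsigned char *put_swapped32(unsigned char *dst, uint32_t value)
{
    assert(dst != NULL);
#if defined(__x86_64__) || defined(__i386__)
    /* "+r": the register is read AND modified by the asm.
       Listing it only as an input ("r") would let the compiler assume
       the register still holds the original value afterwards. */
    __asm__("bswap %0" : "+r"(value));
#else
    value = __builtin_bswap32(value);   /* intrinsic fallback (assumption) */
#endif
    memcpy(dst, &value, sizeof value);  /* store the swapped value */
    return dst + sizeof value;          /* advance by 4 bytes */
}
```

Calling `put_swapped32(buf, 4)` with a literal 4 is the case from the story; with the operand declared as read-write, the compiler has no reason to reuse that register as the constant 4.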


Sounds like this could be a nice extension to clang/gcc sanitizers, adding checks that certain registers are unmodified across inline assembly calls.


In addition to the missing newlines, the inline assembly code is also not correctly marking that it clobbers memory.

It is probably not an issue for the entry point function, but the lack of it could mean the compiler is allowed to do incorrect optimizations.
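
A small sketch of what the `"memory"` clobber is for, assuming GCC or Clang on x86 (the function name is hypothetical, and this is not the article's code). The asm writes through a pointer that is only passed in as a register input, so the clobber is the only thing telling the compiler that memory changed.

```c
#include <assert.h>

/* Hypothetical example: the asm stores 42 through `p`. */
static int store_then_read(void)
{
    int x = 1;
    int *p = &x;
#if defined(__x86_64__) || defined(__i386__)
    /* Without the "memory" clobber the compiler may assume x is still 1
       and fold the return value to a constant. */
    __asm__ volatile("movl $42, (%0)" : : "r"(p) : "memory");
#else
    *p = 42;    /* plain C fallback for other targets */
#endif
    return x;
}
```

An alternative is to describe the store precisely with an `"=m"(*p)` output operand instead of the blanket clobber.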


> Therefore it really mucks up the compiler’s ability to pipeline instructions. And, that cancels out the performance benefits for short snippets of assembly.

This seems very untrue. Compiler shouldn't pipeline anything. Pipelining is baked into the architecture. You can write only ASM and the pipeline will be used.

The reason we don't use assembly is that it tends to perform worse than optimized C code, and C is more user friendly. Weirdly this can be true even for very small code snips, that will perform better via optimization; a lesson from computer engineering class.

An additional problem is the bugs caused by not following the programming manual. You technically don't need to "mark" registers, unless you want to tie into your C variables (and there are other ways like parameter passing to ASM functions). You can simply keep it clean for the rest of the C code with the stack.

C code gets turned into ASM. Typically, you can follow the same rules, and everything will always be fine. Many compilers have extended assembly syntax, but I'm wary of it. Seems like some of the modern embedded compilers don't use it, and/or it's not entirely reliable.

P.S. I'm not even sure how true the thing about optimization is, since there ARE asm optimizations. In this day and age, asking for help making your asm more efficient seems silly tho.

Really should only be used to dip into hardware, which rarely has performance considerations.


Compilers are aware of pipelining and aim to emit pipeline-friendly code for a particular target.

https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html - ctrl-f for "sched" and "reorder" (or indeed, "pipeline") and you'll see some examples.


Yes there might be optimizations related to improving pipeline performance. It is not "pipelining" your instructions.


In my experience you can get a solid 5-10% performance boost just by rearranging order of (independent) asm instructions, sometimes more inside tight loops. Compilers have big tables of what resources each instruction uses and are pretty good at maximizing throughput this way.
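
To illustrate what "independent" means here, a sketch in plain C rather than asm (function names are hypothetical, and no speedup is claimed or measured): the first loop forms one long dependency chain, the second keeps two chains that a scheduler, in the compiler or in hardware, is free to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the previous one. */
static uint64_t sum_single_chain(const uint32_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two accumulators: the two adds in an iteration do not depend on
   each other, so they can be scheduled side by side. */
static uint64_t sum_two_chains(const uint32_t *v, size_t n)
{
    uint64_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)              /* odd element left over */
        s0 += v[i];
    return s0 + s1;
}
```

Both functions return the same sum; whether the second is actually faster depends on the target and on what the compiler already does with the first.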

If you're willing to put in the time, optimized hand written assembly still reliably beats compilers, especially for vectorization.


In some scenarios, it might. You'd hope not these days, because the compiler can see more than you, and it can take a lot of effort to do this kind of analysis.

However maybe there are areas you can still find, and I would expect there to be at least some niche applications for writing code more efficiently with ASM.

I've heard it's still helpful to reduce code size, if you have space limitations.


All serious compilers go through LLVM or have their own optimization passes. The person writing the compilers needs to know how the architecture works to make it possible for the pipeline to do its job. It's no different than the programmer being aware of cache lines, even if they don't explicitly manage them.

You should understand that people don't always speak in what you'd consider to be 100% literal language.


I think a ton of serious compilers don't go through LLVM. As always HN makes a ton of assumptions based on a small fraction of the engineering and tech world.

The compiler needs to know how to generate efficient ASM code. That may or may not involve optimizations targeting the pipeline. OP made it seem like the compiler is handling pipelining itself, vs. eg. reordering instructions.

> The person writing the compilers needs to know how the architecture works to make it possible for the pipeline to do its job.

Not really, but you will probably want to think about these things a little to optimize. The pipeline also works if you naively use it.


>I think a ton of serious compilers don't go through LLVM. As always HN makes a ton of assumptions based on a small fraction of the engineering and tech world.

I said LLVM or their own optimization passes. Reread what I said so you can understand the words on the screen instead of bemoaning the state of the world.


> Compiler shouldn't pipeline anything

Yeah, tell it to the researchers in the 90s who spent that decade on teaching their compilers to properly schedule the instructions for maximum throughput, unrolling the loops so that all of the 32 registers could be used, accounting for the available execution units to hide latency, etc. Sure, x86 nowadays does most of that in hardware, but even today, moving the loads as early as possible is beneficial for hiding their latency, because even a hyper-speculative, out-of-order processor can't start executing an instruction it hasn't fetched yet.


> Yeah, tell it to the researchers in the 90s

I will.


Inline assembly was very pleasant to write in Turbo Pascal and Delphi.

In D language it looks okay too.

In GCC, inline assembly has insane syntax. Probably on purpose, to discourage writing it.


> In GCC, inline assembly has insane syntax. Probably on purpose, to discourage writing it.

No, it's primarily for portability and flexibility. Other languages have "inline assembly" that is really a separate 8086 assembler built into the C compiler. And that's good for some things (accessing values in the surrounding C code via natural expressions) and bad for others.

In particular, a gcc asm() block can do anything the underlying assembler can, and the gcc "constraint" syntax is designed to express the idea of "what rules must the compiler follow to generate code around this asm block".

Want to expand your asm block as a big string using preprocessor macrobatics? It's just a string, you can do that.

Want to write a macro that expands into a runtime constant that you want to link into a special section? No problem.

Want to call a function in a special ABI that is going to clobber a specific set of registers and need to tell the compiler about it? The gcc syntax has a constraint for that.

Basically, "inline assembly" in DOS tools was... fine. It handled the case of "1980's Compilers Suck and I Want to Hand-Optimize This Code". And that's fine. But that's not why people write assembly in the modern world, and gcc has a much broader reach.


Being able to use macro expansion, stringification, concatenation, etc. in GCC inline asm is a major advantage over how, e.g. MSVC handled inline asm. Getting the constraints right is the hard part, the syntax isn’t much of an issue beyond the initial learning curve imo.
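
A minimal sketch of the stringification trick, assuming GCC or Clang on x86. The names `STR`, `STEP` and `add_step` are hypothetical; the point is only that the template is an ordinary string literal the preprocessor can assemble.

```c
#include <assert.h>
#include <stdint.h>

#define STR_(x) #x
#define STR(x)  STR_(x)     /* expand the macro first, then stringify */

#define STEP 8              /* hypothetical compile-time constant */

static uint32_t add_step(uint32_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Adjacent literals are concatenated into "addl $8, %0". */
    __asm__("addl $" STR(STEP) ", %0" : "+r"(x) : : "cc");
#else
    x += STEP;              /* plain C fallback for other targets */
#endif
    return x;
}
```

In real code an `"i"` immediate operand would usually be the cleaner way to pass a constant; the string pasting is shown here because it is the feature under discussion.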


I never got the point of that weird syntax; I would rather reach for a raw assembler than deal with such syntax.

To add to your list, most C and C++ compilers in the PC world as well.

Certain folks will mention it is because the compiler needs register usage info, yet PC compilers could, and can, infer the usage as well when parsing those asm blocks, or from how intrinsics are used (as an alternative).


> > In GCC, inline assembly has insane syntax. Probably on purpose, to discourage writing it.

> Certain folks will mention it is because the compiler needs register usage info, yet PC compilers could and can, infer the usage as well, when parsing those asm blocks,

The compiler can only infer the register usage info for known instructions, but the whole point of inline assembly in GCC is to use instructions the compiler doesn't know about (and many of these have implicit register uses which can't be inferred from just looking at the assembly syntax, you really need to know what the instruction does). And in some cases, the register usage can't be inferred even for known instructions; for instance, the registers used by a "call xyz" instruction depend on how the "xyz" subroutine was implemented (it probably has a nonstandard calling convention, otherwise there would be no reason for using inline assembly to call it).

The "weird syntax" actually matches closely how known instructions are described within GCC itself, which makes sense since GCC inline assembly is all about teaching it about new (or non-standard) machine instructions (or pseudo-instructions), without having to modify and recompile the compiler.


Yeah, which kind of proves the point of the design of UNIX compilers, versus other platforms.

I have known GCC since 1995, and I am quite aware of the "why such a bad developer experience should be preferable" kind of argumentation.


It's not too bad a syntax, really.

asm("asm code" : outputs : inputs : clobbers)

You can even use labels in it for in/outputs, so it's 'easily readable'.

People just write it really lazily and don't use proper formatting / whitespace, which makes it really difficult to read _other_peoples_inline_assembly.

I'd wager if you write it nicely with comments etc. it's quite alright. With some macros it might be more beautiful, but imho those are harder/more annoying to interpret.

https://pastebin.com/S0x9hKJP <-- this isn't too bad is it? ignore that it has bugs xD
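
For reference, a small sketch of the labelled-operand form with one section per line, assuming GCC or Clang on x86. This is not the pastebin code; `add_u32` and the operand names are made up.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum = a;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %[addend], %[sum]"
            : [sum] "+r"(sum)       /* outputs: read and written */
            : [addend] "r"(b)       /* inputs: read only */
            : "cc");                /* clobbers: the flags register */
#else
    sum += b;                       /* plain C fallback for other targets */
#endif
    return sum;
}
```

The `%[name]` references replace the positional `%0`, `%1`, which is most of what makes longer blocks readable.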


If you mean x86/x86-64 assembly, you can turn on intel syntax.


Writing inline assembly in most languages is a pain. I think there's some real opportunity to make things more ergonomic. Basically taking the idea of HLASM and other high-level assembler projects, but actually bringing them into the modern era of languages and tooling.

I would like to see a kind of "seamless" assembly feature, where asm instructions and ISA-specific registers could be made first-class objects (intrinsics are a weak version of this). Register-sized objects (like most numeric primitives, pointers, etc.) should seamlessly decay to the register in which they reside.

Also, tooling and LSPs could be improved. Wouldn't it be nice if I can text-select a section of asm and the LSP shows me latency/throughput of the block on my desired target platform? Or maybe it does some proper static analysis like noticing dead writes, or maybe a refactoring action that expands a slow div into a more optimal instruction sequence.


And this is why inline Assembly done by GCC and clang sucks, and the inline Assembly done by most PC compilers, with first-level support for opcodes, or intrinsics, is a great experience.


It's great experience, but it's also slow, as the compiler must save the entire function state before your assembly block and cannot make any assumptions about what you do inside.

This is acceptable if you see the point of inline assembly as writing long code sections, but that's not what inline assembly is meant to solve. If you want to write long sections of assembly code, just put the whole thing into an assembly source file and link it into your binary.

gcc-style inline assembly on the other hand is designed to solve the problem of augmenting code generation with individual instructions the compiler doesn't know about. Correctly used, gcc-style asm statements typically only hold one or two instructions that are then integrated into a complex function with little to no overhead. And for this usage, the syntax and model provided by gcc is perfectly fine. It only fails if you misuse it for writing long chunks of assembly code, something for which you should have used an assembly file in the first place.


> It's great experience, but it's also slow, as the compiler must save the entire function state before your assembly block and cannot make any assumptions about what you do inside.

That is why there are function attributes and #pragmas to control what the compiler is supposed to do.

Compilers that don't follow the braindead execution model of UNIX, with hard separation between compiler and Assembler phases, as separate processes communicating via pipes, are a bit more knowledgeable of what those Assembly instructions are actually doing.

Not only that, they are able to provide proper error messages when the opcodes are badly used.


> [...] are a bit more knowledgeable of what those Assembly instructions are actually doing. Not only that, they are able to provide proper error messages when the opcodes are badly used.

That's only possible if the compiler knows all assembly instructions being used. But the whole point of GCC-style inline assembly is to be able to use assembly instructions the compiler doesn't know about (for instance, because they're new instructions which didn't exist when the compiler was released), with the same performance as the ones the compiler does know about. In many cases, even the assembler doesn't know about the new instruction, and you have to specify it using .byte directives or similar (though the Unix compiler model, with proper separation between the compiler and the assembler, also allows you to use a newer assembler without having to change your compiler).
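
A sketch of the `.byte` approach, assuming GCC or Clang on x86. `bswap` is used purely as a stand-in for "an instruction the assembler doesn't know" (every real assembler knows it), and `bswap_by_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t bswap_by_bytes(uint32_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    /* 0x0f 0xc8 is the encoding of `bswap %eax`. Because the bytes
       hard-code the register, the "+a" constraint pins the operand
       to eax and marks it as read and written. */
    __asm__(".byte 0x0f, 0xc8" : "+a"(x));
#else
    x = __builtin_bswap32(x);   /* intrinsic fallback (assumption) */
#endif
    return x;
}
```

The compiler never learns what the bytes do; the constraints are the whole contract, and it can still allocate registers around the block normally.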


Again, this kind of argument gets tiring; compilers get upgrades.


And GCC/clang do get intrinsics that are nicer to use than inline asm, often with more optimization available to them as well.

If the compiler does know about a specific instruction, then it's better to just provide the intrinsic than try to infer it from some inline asm string.
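
As a concrete case, a sketch using `__builtin_popcount`, which GCC and Clang provide. The wrapper names are hypothetical. Because the compiler knows what the builtin means, it can fold it for constant arguments and pick a hardware instruction or a software routine depending on the target, none of which it can do for an opaque asm string.

```c
#include <assert.h>
#include <stdint.h>

/* Intrinsic version: the compiler understands the semantics. */
static unsigned popcount_builtin(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}

/* Portable reference: clear the lowest set bit until none remain. */
static unsigned popcount_loop(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}
```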


One of the benefits of gcc's powerful inline asm support is that it allows you to write new intrinsics. This is almost impossible in most other compilers, as they will do crazy things such as flushing out all registers before each asm block. Very useful for low level work.


> it allows you to write new intrinsics

I admit that it gets very close, however I think compiler provided intrinsics can be and are optimized more than ones that are implemented with inline asm, as the compiler knows about the semantics of the instruction for such an intrinsic, and not much about an inline asm one, apart from inputs/outputs/clobbers.

Having said that I agree that inline asm is very useful for instructions that the compiler does not know about.


That's great if you're not locked down to a particular compiler version


> That is why there are function attributes and #pragmas to control what the compiler is supposed to do.

Which basically reinvent gcc's syntax to set input/output/modified registers, only worse. No thank you.

The compiler you are most likely thinking of still does not allow setting multiple output values from an inline asm block, for example. Something the gcc/clang syntax has no problems with. These details all make the "ffi"/interface between the C world and your inline asm slower than it could be.


No thank you, to pasting strings.


But yes to pragmas with register lists ? Be serious.


It seems that MSVC supports inline assembly only on 32-bit x86. For amd64 and ARM they moved entirely to intrinsics.


The more I've written code close to metal (mostly SIMD for signal processing), the more I've grown to prefer either intrinsics or separate translation unit for assembly.

If you want your code to intertwine with what the C compiler does, intrinsics are great.

If you don't, .s is great.


And the only way to use new instructions is to make them slow or to wait for a compiler update.


Exactly, this is the right approach.


Yes, and the set of intrinsics supported by MSVC is really lacking. On x64, this left no way to force a conditional move to bypass a poorly predictable branch, nor was there a way to access carry-extended operations until recently. On ARM64, it doesn't have complete coverage of scalar intrinsics for operations like RBIT (reverse bit order).


Yes, Assembly programming by concatenating strings is horrible.

Intrinsics give the best of both worlds, without dealing with string-based content.

Actually ESPOL/NEWP from 1961 was one of the very first system programming languages with two key innovations: intrinsics and unsafe code blocks.


I’m not a fan of concatenating strings either. I was thinking about the MSVC/Pascal-style inline assembly here, which isn’t based on strings.

Correct me if I’m wrong, but I don’t think that you can guarantee a routine is constant-time by using intrinsics. You need asm for that. Asm that won’t be changed by the compiler. So you need to write an external asm file for that now, which is fair enough, but I just wouldn’t present intrinsics as all-around superior.
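
To make the concern concrete, here is a sketch of the usual branchless select written in plain C (`ct_select` is a hypothetical name). The source has no branch, but nothing in the C language stops a compiler from turning it back into one, which is why people reach for asm when the timing has to be guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a if cond is nonzero, else b, without a branch in the source.
   No guarantee is implied about the machine code the compiler emits. */
static uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(cond != 0);  /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}
```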


Yes, if you want full control, only external Assemblers will do the trick.


What is a PC compiler in this context? MSVC?


Almost all MS-DOS, OS/2, Windows 16 bit, Windows 32 bit and Windows 64 bit compilers you can think of, that aren't GCC or clang forks with their UNIX architecture, and support inline Assembly.

For C, C++, BASIC, Pascal, Delphi, Modula-2, Oberon-2, Component Pascal, D,...

Traditionally PC refers to IBM/Microsoft's lineage of computers and operating systems.


The bug here isn't really the fact that GCC inline assembly is a string replacement macro language, which surprises a lot of people coming from other syntax like MSVC.

It's that the author wrote what looks to be a 20+ instruction entry and setup function and never once tried to verify it. You never write inline assembly without reading the resulting disassembly. It's like rule 0...


You really should not use inline asm to write your _start routine. The compiler can (and someday WILL, when you least expect it) use the stack, maybe in the prologue, maybe because one day it decides it needs to preserve one of the regs you claim you are using. This is undefined behavior all over the place. There used to be some attributes like 'naked' in some old compilers to avoid this...

Also, no tests check if bss is zero? There's your problem... :)


Code inside code is always a bad idea. Sometimes you can't avoid it (like with SQL), but in this case, if the file had just been an assembler source code file, this wouldn't have happened.


It always feels really tricky imo. Intrinsics are preferable if the functionality you need is at all possible with those. But I'd be surprised if you could easily clear the BSS with those.

And who can remember what early clobbers is? I always forget and have to look it up because it feels especially subtle.

It'd be neat if inline asm could use something like the triple quotes in other languages. Maybe it'd be easier to format text like this.
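
Since early clobbers came up: a sketch of the case where `&` matters, assuming GCC or Clang on x86 (the function name is hypothetical). The output is written by the first instruction while an input is still needed by the second, so the output must not share a register with that input.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t add_two_step(uint32_t a, uint32_t b)
{
    uint32_t out;
#if defined(__x86_64__) || defined(__i386__)
    /* "=&r": `out` is written before `b` has been read. Without the
       `&`, the compiler may put `out` and `b` in the same register,
       and the movl would destroy `b` before the addl uses it. */
    __asm__("movl %[a], %[out]\n\t"
            "addl %[b], %[out]"
            : [out] "=&r"(out)
            : [a] "r"(a), [b] "r"(b)
            : "cc");
#else
    out = a + b;                /* plain C fallback for other targets */
#endif
    return out;
}
```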


> Intrinsics are preferable if the functionality you need is at all possible with those. But I'd be surprised if you could easily clear the BSS with those.

You can do that with a plain memset.

They are just using inline asm because they don't want the compiler to use stack/globals before they've had time to initialize them, which is very risky.


In C++11 you can use R"( raw/multi-line strings for inline assembly.


The real problem with inline assembly is that you really are flying blind and have to be experienced enough to know what you're doing. The compiler can't help you much at all due to losing all type information.

Saying that, I think it's a skill that we need to foster, at least for a few niches.


Annoying. I hit a somewhat similar snag on the weekend - inline assembly (in someone else's library) wasn't compiling because another library I was using had macros that collided with opcode names. Super-confusing error messages as a result.


A better title would be "Knife Juggling Dangers."


Why is this link dead for me?


Hi, op here. Is there any sort of error message or status code? I use CloudFlare so it's quite strange if it simply didn't connect for you.


Hmm, appears to be specific to my work PC which really doesn't like the SSL for some reason. Simply refuses to acknowledge that it's encrypted, rather than even presenting the certificate. Very odd since my own CloudFlare-fronted site which also uses their certificates works just fine. Sorry, I think this is very specific to my environment.


Uh, why is this not a compile error? I’d expect the string to be concatenated, which would result in an invalid program.


The strings are concatenated, it’s just that the first line was a comment, meaning the compiler/assembler saw all the next lines as part of the first. Comments don’t cause compiler errors, so no error here…


The first string is a comment, so all the strings after it just get appended to the comment text, until it finally hits the first newline. Quite an insidious mistake.
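
The effect is easy to reproduce with ordinary string literals, no asm needed. A sketch, assuming `#` starts an assembler comment (as in GNU as on x86); the template text and the helper `asm_line_count` are made up, not taken from the article.

```c
#include <assert.h>
#include <string.h>

/* Missing "\n": the literals fuse into one line, and that line is a comment. */
static const char broken[] =
    "# zero eax"
    "xorl %eax, %eax";

/* Each literal ends its own line, so the instruction survives. */
static const char fixed[] =
    "# zero eax\n"
    "xorl %eax, %eax\n";

/* Number of source lines the assembler would see in a template. */
static int asm_line_count(const char *tmpl)
{
    assert(tmpl != NULL);
    int lines = (*tmpl != '\0');
    for (; *tmpl; tmpl++)
        if (*tmpl == '\n' && tmpl[1] != '\0')
            lines++;
    return lines;
}
```

Here `broken` is the single line `# zero eaxxorl %eax, %eax`, which the assembler discards entirely, while `fixed` is two lines.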


Why not just use the standard library functions instead of these esoteric __*() functions? Although they may exist, they are probably not the right ones to be using. It looks like you're trying to set the time zone, and there are methods to do that, which are not the one you're referencing in your code...


The standard library functions have to be provided by someone. In bare metal/deep embedded work, that someone is YOU.

So anything is possible. Anything. Though if you're unlucky enough to have untrustworthy hardware, or just not-yet-trustworthy hardware, that's usually enough to keep the nasal demons at bay (even they fear malfunctioning hardware), which is one small mercy.



