New inline assembly syntax available in Rust nightly (rust-lang.org)
474 points by dagenix 34 days ago | 222 comments



I recommend reading the RFC. It explains pretty well how it works and it's readable even for people with only very minimal asm experience:

https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-in...


To be honest, I still like the explicit clobber syntax of rusty_asm.

Last year I had some fun getting a Rust program to run on some old 386 DOS SBC I had hanging around. It was terrible in terms of code density (basically a bunch of 32 bit instructions prefixed with the "execute 32 bit op in real mode" prefix).

This was a wrapper around the DOS string print function (where s is a &str).

        rusty_asm! {
            let ptr: in("{dx}") = s.as_ptr();
            let len: in("{cx}") = s.len();
            clobber("ah");
            clobber("al");
            clobber("bx");
            asm("volatile", "intel") {
                "mov ah, 0x40
                xor bx, bx
                int 21h"
            }
        }


In the new syntax, a direct translation of that would look something like this:

    asm!("
        mov ah, 0x40
        xor bx, bx
        int 21h
        ",
        in("dx") s.as_ptr(),
        in("cx") s.len(),
        out("ax") _,
        out("bx") _,
    )
However, I'd personally write that like this instead:

    asm!(
        "int 21h",
        inout("ax") 0x4000 => _,
        inout("bx") 0 => _,
        inout("cx") s.len() => _,
        inout("dx") s.as_ptr() => _,
    )
That lets the compiler handle filling in ax and bx, and also avoids trusting that the interrupt routine leaves ax/bx/cx/dx untouched. (I might also write clobbers for every other general-purpose register, too. Take a look at Linux arch/x86/boot/bioscall.S and the "Glove box" for BIOS calls, due to misbehaving BIOSes that clobber registers they shouldn't.)


I agree that passing `ax` and `bx` as inputs is better than setting them in the inline asm, but I don't like the idea of setting registers that shouldn't get changed as outputs "just in case".


It's not a theoretical consideration. It's quite common for BIOS and firmware code to clobber registers that it isn't supposed to; see the code I mentioned in the Linux kernel, which arose through real-world bugs that were incredibly hard to debug.


But where do you draw the line? What if it clobbers the stack pointer?


Then it won't be able to return at all, and people will notice more quickly.


Thanks! The latter of the two is definitely nice. Yeah, I should've let the compiler set up ax and bx. Again, this was a hack :)


40Hex :raises eyebrow:


I've programmed for DOS and BIOS before, and using the `h` suffix for `int 21h` and `int 20h` and `int 10h` and `int 13h` still feels natural. I never wrote `h` literals for anything but a few interrupt numbers, though. And when I programmed for 32-bit Linux, `int 0x80` felt more natural.


I thought having the 0x-prefix (which AFAIK is a C-originated syntax) along with the -h suffix (classic PC Asm syntax) together in the same code was unusual too.


Rust folks, thank you so much for making this far, far better than GCC’s asm syntax for C. Also, thank you for using Intel x86 syntax instead of AT&T.

It would be delightful if GCC were to adopt something similar after this stabilizes in Rust.


> Also, thank you for using Intel x86 syntax instead of AT&T.

Thank you for helping to validate that decision; this is the kind of feedback we needed.

(And note that you can still choose to use AT&T syntax, it just isn't the default.)


Yes, thank you for the Intel syntax. It is by far the most frequent syntax you'll find anywhere somewhat recent (1990+), and it is by far the most straightforward to learn, being clean. I did learn both in the 90s, but I've always struggled with AT&T's symbols and differing conventions, which made it hard to switch when learning from different source materials.


FWIW I mostly see AT&T when compiling against C codebases, a) to avoid requiring a secondary assembler and to keep supporting legacy binutils, and b) because inline asm in C is AT&T in practice, at least outside of windows (no clue what the c ecosystem is like there). However, most resources exploring “assembly” do so in a context where it makes sense to use intel syntax & work with a “dedicated” assembler.

That said, this is a good decision because C compilers seem to be the major holdouts at this point—binutils has had .intel_syntax for a long time now, it’s just not supported inline.


> at least outside of windows (no clue what the c ecosystem is like there).

Microsoft's assembler uses Intel syntax, and their inline assembly also uses Intel syntax.

Their inline assembly is much less explicit about inputs, outputs, and clobbers, and "just" has you mix C symbols and labels with assembly instructions, which I imagine is a pain to get right.

However Microsoft has de-emphasized inline assembly and doesn't allow it on amd64 or ARM. For things that need specific instructions you need to either use intrinsics or put assembly in a different object file.


> However Microsoft has de-emphasized inline assembly and doesn't allow it on amd64 or ARM.

That's just sad :(


I suspect if they had a situation like GCC where the programmer had to annotate the side effects of every assembly snippet, it would have been easier for them to add it elsewhere.

But the way they had it, they presented a seamless blend of assembly instructions and any C identifier, in or out, and I guess the compiler would need to parse out any side effects and cope with them. My guess is they looked at porting all that to ia64 [which they supported until Server 2008 R2], amd64 and ARM and balked.

No idea if they ever had it on some of the old architectures they supported in NT4 days or on CE (alpha, mips, ppc).


I've used it with SH4 on Windows CE.


I gotta say, I’ve written thousands of lines of assembler and can count the times that inline assembly was clearly useful on one hand. The clear benefits seem to be readability and concentration of documentation.


I've had a whole bunch of projects that have needed 2 to 40 instructions of assembly -- often preferable for the assembly to be inlined to improve performance and surrounded by closely-related C code on both sides.

Examples: An abort call on an embedded system that needs to disable interrupts to prevent task switching after the abort. One instruction.

An implementation of _start for embedded that can do everything in C except setting a couple processor flags. Two instructions.

Running (with a lock) event callbacks in an embedded system on their own stack, so that not all tasks need to have their stack big enough to handle the stuff the callbacks might do; 5 instructions that are far better inlined than having another call and indirection. (In addition to keeping doc/code together).


Yes, this is great, but I have to disagree on the Intel x86 syntax assertion by the original poster. Perhaps it's because of the era I grew up in and how I first learned machine language and then assembly language, but I've always preferred AT&T syntax. Regardless thanks for supporting the saner syntax even if it's not by default ;)


If you're ever in doubt on which syntax to implement first or make default, just look at the processor manual (TRM). Match it, and you'll have an ironclad defense :)


That was one of our primary justifications. The tradeoff was between that and what's currently widely used with GCC and clang; the latter (along with people porting from what's now called llvm_asm!) motivated us adding the att_syntax option.



I prefer AT&T, thank you for the option.


For ARM, can we have the "." separators? They are standard for Aarch64 anyway, and they make the 32-bit code far easier to read. Like so:

add.s.ne

ldm.ia.cc

(in both orders, for those cases with two suffixes)


You can also use intel syntax with inline assembly in gcc, e.g.:

        asm(".intel_syntax noprefix;"
            "xor eax, eax;"
            "xor edx, edx;"
            "1:;"
            "mov r8, [rdi];"
            "mov r9, [rsi];"
            "sub ecx, 64;"
            "jl 2f;"

            "cmp r8, r9;"
            "jnz 3f;"

            "lea rdi, [rdi - 8];"
            "lea rsi, [rsi - 8];"
            "jmp 1b;"

            "2:;"
            "not ecx;"
            "shr r8, 1;"
            "shr r9, 1;"
            "shr r8, cl;"
            "shr r9, cl;"
            "cmp r8, r9;"

            "3:\n"
            "seta al;"
            "setb dl;"
            "sub eax, edx;"
            ".att_syntax prefix;"
            : "=&D" (d0), "=&S" (d1), "=&d" (d2), "=&c" (d3), "=&a" (cmp)
            : "0" (l), "1" (r), "3" (nr_key_bits)
            : "r8", "r9", "cc", "memory");


Is AT&T syntax really that bad? In particular, it's way nicer for specifying operand sizes. I'd prefer "andl $3, (%eax)" any day over "and DWORD PTR [eax], 3".


I subjectively like AT&T syntax more (as a more familiar to me and having more "machine" vibe).

However, I recognize that it's objectively worse for a human coder and contains some syntactic footguns which regularly shoot even an experienced coder:

- a bare `number` means a memory displacement/pointer, while a literal needs `$number`. Dropping the `$` is an easy automatic mistake even for a person who knows this very well.

- the SIB clauses for x86 (memory addressing with constant Displacement and Scale and Index, Base in registers) look like `D(B, I, S)`: it's possible to remember this, but reading/writing it is not as obvious as `[D + B + I * S]`.

- Intel syntax in general is more similar to high-level languages, even though it's more verbose.

- AT&T syntax has syntactic redundancies like '%' before each register which make code much noisier than needed.


The Intel syntax has some nice gotchas too, since the scale can be 1. I believe [eax+ebp] uses the ds segment, but [ebp+eax] uses ss.


The really fun bit is when you explicitly want to force one form or other, for some reason. Usually this is related to instruction length, but occasionally you used to run into times when one form could help avoid a stall.

Keeping tricks like that stable and clearly documented is, uh, not for the faint of heart.


% before register names isn't redundant: it allows you to refer to symbols that would otherwise have the same spelling as register names.


Yes, that's where AT&T is more logical.

In practice, though, naming a symbol with a register name is error-prone and confusing. I'd rather have a way to escape a symbol when it's really needed than to pay the price of noise for an admittedly bad idea.


New registers are added all the time (for instance, the %zmm registers), so even if a symbol name doesn't conflict now, it might conflict in the future. Having separate namespaces for symbols and registers is a good idea.


Good point, but I wonder if it shouldn't be the labels which should be prefixed instead of the registers?


Yes, ideally the labels would be strings in the style of C source code. That includes using \0 to put NUL bytes in the middle. It also includes wide character strings.

This allows for unusual languages like LISP and FORTH, without mangling the symbols. Symbols could have commas and spaces.


The problem is not whether it's bad or not, but rather having to learn a completely different second syntax for one architecture.

Most Rust code is quite portable, targeting ARM, x86, MIPS, PPC, WASM, Sparc, s390x, riscv, ... That means, that for many snippets of inline assembly, you might encounter ~8 of them, one for each architecture, all using different syntaxes.

Intel syntax is quite similar to that of other popular architectures: `op dst, args...`.

Adding a second syntax for x86 just doesn't add that much value IMO, and adds quite a bit of cost: now everybody dealing with x86 assembly needs to learn 2 syntaxes... and everybody dealing with portable code now needs to be at least able to read 2 syntaxes for x86... Not to mention the cost of implementing a second syntax in the compiler, etc.

If you prefer AT&T, you can always write a proc macro that translates it to Intel, and use that in your projects.

If I ever need to deal with such code, I'd just expand the macro to read the actual Intel syntax, modify that, and either fork the project, or submit a patch with a fix using Intel syntax.


> If you prefer AT&T, you can always write a proc macro that translates it to Intel, and use that in your projects.

If you prefer AT&T (or you have a large body of existing AT&T code you don't want to have to translate all at once), use asm!("...", options(att_syntax)) and it'll Just Work.


I have two big reasons for preferring Intel syntax:

1. It’s the syntax in the manual. The last thing I want to do when reading or writing asm is to mentally translate from the manual to AT&T syntax.

2. Addressing like (%rax) is tolerable. But the AT&T scale * index + offset syntax is inexcusable. Give me the verbose Intel addressing syntax any day, please.

(As a kernel programmer, I’m more familiar with AT&T. I still hate it. I’m morbidly curious how Intel syntax ought to handle things like SGDT. Maybe SGDT SIXBYTE PTR [address]? The fact that four bytes is called a DWORD isn’t great.)


> 1. It’s the syntax in the manual. The last thing I want to do when reading or writing asm is to mentally translate from the manual to AT&T syntax.

Right! I worked on C++ compilers for years and I don't even know where the canonical book of AT&T mnemonics is. At the rate that Intel is adding new instructions, using anything other than their official docs (and thus their official names) seems nuts.


I'm quite confident that the canonical book of AT&T x86 assembler does not exist. There is the canonical book of Intel asm (the SDM), the canonical book of AMD asm (the APM), and the almost canonical book of upcoming Intel asm (the ISE). Sadly, these documents are not generally entirely consistent with each other, but they all agree that the Intel syntax is the syntax.


AT&T syntax is actually the common one in most low-level programming if you count by architectures and most likely also by code size produced, if only because GCC has/had been the de facto compiler for new chip uarchs for more than a decade (helped by the fact that everyone wants their chips to run Linux).

Nowadays GCC and LLVM support both styles and archs pick whatever they prefer and nobody cares, really.

> either fork the project, or submit a patch with a fix using Intel syntax.

That sounds a bit extreme? Reading/writing in both styles is not an issue for anyone that has dealt with x86 professionally.


>AT&T syntax is actually the common one in most low-level programming

Is this actually true? Admittedly I've done mostly x86 and ARM for the past several years (almost entirely Cortex-M, so v6-M and v7-M profiles, using ARM's GCC builds for embedded) and the only toolchain that prefers AT&T syntax is x86 GCC and those explicitly trying to be compatible with it. All the ARM inline assembly I've written, targeting GCC backends, has been ARM syntax, and likewise for all the disassembly output.

The DSPs and DSP-likes are... always weird. So I try to stay away from them and make them someone else's problem. But I don't think they use AT&T syntax either. It doesn't work so well for truly strange processors anyway.

I'm one of those guys who tends to have the makefiles output the disassembly, and have it open on the other monitor while I'm working, so I'd notice if it were different....


And apparently Rust already supports both.


They're both terrible. :( I yearn for a more ergonomic assembler.

That's not to say that the Rust team should have tried to find (or create) an alternative; it's outside their core mission, and choosing between popular existing alternatives is the right framing.


I agree. They took their time, but it was worth it. I really missed having ASM in stable Rust the past few years, but since it means not having to deal with something as horrendous as GCC's __asm__ syntax, I'm glad they took their time.


GCC supports Intel syntax.

Why is there always a guy who doesn't like the default and is unaware that it's just one of the options?


They’re upset that it’s the default, not that the option doesn’t exist.


As someone who only knows AT&T syntax, I guess I should view this as an opportunity to learn Intel syntax.


Still falls short of the asm code blocks of PC based languages.


I've followed the discussion regarding how ASM should be integrated in Rust over the past few years, and the idea of integrating the ASM syntax more tightly with Rust has been brought up. Basically, one could treat assembly as an ad-hoc domain specific language with its own parsing rules, which would make it more like a first-class citizen instead of a magical character string. I've actually implemented a very simplistic MIPS assembler in a similar way for emulation purposes: https://github.com/simias/rustation/blob/master/src/parallel...

I think they were absolutely right not to go down that route, because then you have to basically specify this syntax for every possible architecture targeted by Rust. Integrating ASM in the language syntax is fine if you only care about x86, but it's going to be a mess if you want to support MIPS, ARM32, AARCH64, Sparc and whatnot because the ASM dialects for each of these architectures have special bespoke syntax to deal with various quirks of the underlying ISA. MIPS has special syntax to load the top and bottom half of an address as well as dealing with delay slots (.noreorder), ARM has special sugar for PC-relative addressing etc...

Importing that baggage into Rust would be a fool's errand IMO. It would also make it harder both to port Rust to new architectures and to transfer existing assembly code into Rust since it would require adjusting the syntax.

I'm very happy with Rust's current approach, it's a good middle ground IMO.


Is the MSVC dialect of C/C++ on x86-64 not a "PC based language"? Because it doesn't support inline asm at all.


Better to not support it and just use external assemblers or intrinsics, like it does by shipping MASM in the box, a very good macro assembler; at least none of them are gluing strings together.

Also that same compiler still does it for x86.


What are PC based languages? What is PC short for in this context?


MS-DOS/Windows heritage.

Modula-2, Pascal, C, C++, Basic based compilers with inline Assembly parsers or intrinsics that interact with the host language type system instead of manually dealing with strings UNIX style.


But in Rust, asm! does interact with the type system.

The asm! macro does know which group of registers (or which specific register) an input/output is stored in or loaded from, and compilation will fail if the input isn't compatible with the register.

Then it delegates to an assembler for the rest.

Sure, it could also have a list of all the instructions a given target supports and the parameters/registers they can be used with, but that means keeping track of all supported targets, all instructions for every potential featureset of every target, and every way they can be called. That's a lot of work. Such support can also be added later on. For now it's more important to support asm on stable.

Also, with proc macros you can implement this as a library on top of `asm!`.

Also, with a more "knowledgeable" asm!, support for a new platform would require manually adding support for that platform's entire assembly language, so having a standard stringly-typed interface is a good thing anyway.


What you describe is how it works in GCC/LLVM and is required in any inline asm system. Otherwise it is not really inline.

Saying that fits with the type system is a stretch... it is like any other FFI.


The use of strings in Rust's implementation is limited to the assembly template and the operand constraints. Functionally speaking, moving those out of strings gains you absolutely nothing and can make parsing more difficult.

The important part, type system integration, is already there. Sizes must match, move and borrow checking still happens, etc. This avoids the typical failure modes of unix-like stringly-typed systems. In fact if you read the announcement and RFC, you'll see that this was a primary goal of the new design!


Second that. Very simple example:

  function MultiplyBy2Reg(aInput: int64): int64 assembler 
  register;
  asm
    SHL RAX, 1;
  end;

  function MultiplyBy2(aInput: int64): int64 assembler;
  asm
    mov RAX, aInput;
    SHL RAX, 1;
  end;


Now I need to port that code to MIPS, what do?

This is a very nice feature to have if you know your environment is overwhelmingly tied to a certain architecture and you want to make it easy to target this architecture in particular.

This might actually be a somewhat reasonable, if short-sighted approach if we were in 2005 and you could make the point that anything that's not x86 is effectively legacy or niche but nowadays you need to at the very least support x86-64, ARM32 and AARCH64. Baking all these flavors of assembly into a language would be very heavy handed and a maintenance nightmare.

Beyond that we're not in the 90's anymore, ASM is not routinely used outside of very low level code these days. Even for performance people often opt for intrinsics instead. Compilers got massively better at optimizing code while humans got worse due to ever more complex architectures.

Being able to seamlessly blend ASM with normal code isn't that much of a killer feature anymore IMO. It would be a costly gimmick to implement.


It is Assembly after all, just use conditional compilation.


ifdef is your friend. used everywhere


The inline assembler syntax used in D is designed to match the syntax used in the Intel assembler reference manuals. It's not that the Intel syntax is that great (it isn't) but the mental gymnastic of having to swap the operands makes blood run out of my ears, and causes me to make huge numbers of mistakes.


For anybody who knows about the RFC and the other implementations in C compilers: why are "string literal parameters" (quoted strings) always used?

Wouldn't it be nice to have a way to embed other languages without the need for those quotes everywhere?

I guess it is meant to avoid complicating the syntax/grammar reusing the existing macro one?


It means that a parser for Rust doesn't also have to be a parser for every embedded language - which is particularly important for things like syntax highlighters in text editors or HTML renderers or whatever.

For example, in most languages including Rust, parentheses are balanced as a rule, but they might not be in some inner language. So if you want to define a syntax where you can say

    inner_language!( smiley = :-) );
while you can certainly write a parser that understands what's going on there, any existing Rust parser will get very confused. So, unless you're confident the inner language is syntactically compatible (even if it's meaningless) with the outer language, you'd need to protect it somehow.

Making the inner language quoted means that existing parsers that already know how to handle the outer language's string syntax (where to find the close quote, how close quotes might be escaped, etc.) work seamlessly. You could also define some other syntax for it, but it's fundamentally the same requirement - it's a well-defined open and close syntax - and most other syntaxes end up uglier (think of <script>//<!-- ... //--></script> or <script>//<![CDATA[ ... //]]></script> in HTML/XHTML.)

And there's nothing preventing a particularly clever parser from recognizing the asm! macro specifically and syntax-highlighting (or otherwise tokenizing/processing/etc.) the text inside, if it wants to.


In principle, you can embed languages without the need for strings. However, when you do so, you need to worry about the tokenization of the languages to see that a) they combine in a sensible way and b) tools won't get confused if they don't understand the sub-language (this is a real issue that cropped up with XHTML and JS).

In practice, inline assembly has pretty much been seen as a "paste this text into the generated assembly of the program, after substituting some values determined by register allocation"--in other words, it's a pretty natural string formatting problem, and describing the language like you would a formatted string is a natural step to take.

Another point to make is that assembly languages are wildly divergent in their actual syntax. ARM uses # to denote literals, x86 (Intel syntax) uses # to denote comments.


For maximum generality and compatibility, likely. Also note that this is “just” a macro, not first-class language syntax, so any non-quoted arguments would have to be valid Rust token trees.


> any non-quoted arguments would have to be valid Rust token trees.

Token trees allow a very wide set of syntaxes. rust-c for example allows you to embed inline C code as token tree via a macro: https://github.com/lemonrock/rust-c


Yes, Rust macros have certain syntax requirements that allow the code to parse without knowing what a macro does (and to find out where a macro ends). Don't quote me, but there are some token requirements as well as balanced parentheses. If you allow another language to be inserted raw, you risk that it requires syntax that would break these requirements.

By using a raw string literal you have a well-defined quoting syntax mediating the two languages.


Yes it adds a lot of complexity on the tokenization level (and others), potentially requiring some form of contextual tokenization.

Scala originally supported "inline" XML but moved it into some form of compiler plugin you can opt-in to as just having it enabled had drawbacks even if you don't use any XML... (I don't remember the exact drawbacks anymore).


It saves time (there also isn't a single ground truth for parsing assembler)

https://dlang.org/spec/iasm.html

No quotes required


Zig has a nice solution for that:

https://ziglang.org/documentation/master/#Global-Assembly

The "magic" being that lines beginning with double-backslashes are string literals.


That's a solution in none of the ways GP asks about. It's just a different syntax for a string literal, the entire point of the original comment is why string literals.

> The "magic" being that lines beginning with double-backslashes are string literals.

So… strictly worse than rust which allows linebreaks inside string literals, and thus only require a dquote before and one after the entire block instead of a two-char prefix before each line?


I may be the only one, but I feel like this (and the old) asm syntax is too complex and not very user friendly. It looks heavily inspired by GCC. I'm sorry for the upcoming German (you can ignore it and just read the examples), but I feel like a smarter compiler could have a friendlier asm syntax, similar to the one used by FreeBASIC[1].

I hope you can see why I prefer that. Even though the FB compiler is not very smart and just pushes and pops all important regs to and from the stack, its asm syntax is a joy to write. I don't have to specify clobber registers or ins and outs, and variables and functions are in scope by default. Also, the asm code is not a string and doesn't need to be quoted; it doesn't look like something foreign at all.

I know that something like that is currently not possible in Rust but I'd love to see it some day.

[1] https://www.freebasic-portal.de/befehlsreferenz/inline-assem...


You can track some of the history of the proposals here: https://github.com/rust-lang/rust/issues/29722.

A lot of the improvements you've suggested have been suggested in the past, but there are reasons why they weren't adopted. In short, sufficiently smart compilers can probably guess a lot of constraints automatically, but building that level of smarts is very daunting, and for most use cases the user probably wants/needs the fine-grained control anyways.

For example, take the call instruction. Which registers should the compiler mark that as using and/or clobbering? It depends on the ABI of the function being called--and the user might not even be calling a function at all (say, using a call as a gadget to get the current program counter). Saving a register is another common technique in inline assembly, and you don't want the compiler to infer that the register is clobbered--indeed you're saving it to explicitly not clobber it.


Could we build a sufficiently smart compiler as a procedural macro that wraps asm!(), so it can be developed as part of the ecosystem?


That’s a very perilous road. Not to mention expensive to maintain, slow to compile, with very little gain.


All of that would be true if the compiler had it built-in too though, right?


The problem with baking the asm syntax into the language is that it can make it really hard to use new / obscure instructions. And yet that is often the only reason to use inline asm in the first place, because you need some specific CPU feature which the language you're using doesn't expose.


> I feel like a smarter compiler could have a friendlier asm syntax, similar to the one used by FreeBASIC[1].

Feel free to propose a better syntax, then. This feature is not yet part of Stable Rust, so syntax improvements are very much possible.

The FreeBASIC syntax seems problematic though, since it would require architecture-specific smarts in the compiler and make it hard to port it to different architectures or add support for newer asm features. It might be better to have these additional features as separate tooling or crates, that might even wrap the asm! syntax itself via proc macros.


Sadly, I think Rust's way is the best one in the long term even if it's not very user friendly. There are a lot of problems with the FreeBASIC syntax; my comment was mostly wishful thinking. As another user said in a reply to my comment, you can't beat the level of control Rust's syntax provides, and that's what most people writing assembly would want. You are also right that a proper asm parser would be difficult to maintain due to the number of architectures Rust has to support.


Notice that you can quite easily implement a Rust proc macro that takes whatever DSL you want to use for inline assembly, and generates an `asm!` expression from it.

So better syntaxes are possible, as long as they can be built on top of the proposed inline-assembly feature.

They can, however, just be implemented as normal Rust libraries, and do not need to be part of the compiler.


Yes, I mean we shouldn't forget that asm today is a last resort escape hatch which should be avoided if possible.

EDIT: Just to be clear, it's still important. But it's mainly used for a few "very high control" use cases.


> I don't have to specify clobber registers,

So how does the compiler know that these are clobbered, then?


Since the parent comment said "just pushes and pops all important regs", it probably assumes that all registers are clobbered.


I expect the entire assembly is encoded into the language.


That doesn't really help - the compiler doesn't know what could be clobbered by any given (sys)call.


Not really, the other reply is right. Most registers are clobbered. FreeBASIC only passes the asm code inside the asm block directly to GAS (using the .intel_syntax noprefix directive). As I said, the compiler is too dumb to figure out what registers are being used so as a safety measure it clobbers almost all of them (see Register Preservation here[1]).

[1] https://documentation.help/FreeBASIC/KeyPgAsm.html


I am super new to Rust and maybe not the best place to ask, but why would you want to use inline assembly in Rust at all? Does that not invalidate a lot of the safety features built into the language?


"safely abstracting `unsafe`" is the important concept with respect to rust and it's use of `unsafe`.

`unsafe` occurs in rust source code where the author of some code needs to do things in a way that can't be directly proven to the compiler to be safe. These include things like calling C functions and ASM code (both cases where we can't infer all the information necessary to ensure safety). The author of the unsafe code then provides a "safe" abstraction around the unsafety that ensures that when one uses the "safe" interface, no undefined behavior occurs.

At the lowest level there is always some unsafety: system calls, libc function invocations, asm, modifying various memory mapped registers. What rust provides vs C or C++ is effective isolation of the unsafety.
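A minimal sketch of that pattern (function and invariant here are made up for illustration): the safe wrapper checks the precondition that the internal unsafe operation relies on, so callers can never trigger undefined behavior through it.

```rust
// Hypothetical example of safely abstracting `unsafe`: the raw-pointer read
// is only reachable after the bounds invariant has been checked.
fn first_byte(s: &str) -> Option<u8> {
    if s.is_empty() {
        return None;
    }
    // SAFETY: the slice is non-empty, so reading offset 0 is in bounds.
    Some(unsafe { *s.as_ptr() })
}

fn main() {
    assert_eq!(first_byte("hi"), Some(b'h'));
    assert_eq!(first_byte(""), None);
}
```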


There are some places where you have to use assembly instructions:

- Operating systems. For example, the voodoo that happens during early boot, where you need to walk the CPU through different modes. And also fundamental kernel stuff like mapping memory.

- Cryptography. Crypto code is often hand-written in assembly for efficiency. (Sometimes it's even the other way around: crypto algorithms may be designed with specific machine instructions in mind.) Also there are places where you have to guarantee that some operation happens in constant time, which is hard to do when a compiler may choose to insert branches or jumps without telling you.

To compete with C in those spaces, Rust will need inline assembly features.

As far as safety is concerned, yes, inline assembly is completely unsafe. In that sense, it's not so different from calling functions in the C standard library (or any other C library), which might do absolutely anything with your memory. In all of these cases, the Rust programmer has to use the `unsafe` keyword, and it's up to them to make sure that Rust's rules are still respected after the unsafe code has run. Doing this properly, and wrapping it all in a safe Rust API, means that other safe code can then use your library without the `unsafe` keyword and without any risk of triggering undefined behavior.


> - Operating systems. For example, the voodoo that happens during early boot,

Isn't every system call an assembly instruction? Not just the voodoo stuff?

My only experience with "real" assembly was my OS class in college, in which a project we had involved adding system calls to Linux, and they were all snippets of assembly code called in C.


Yes, generally speaking you need inline assembly to execute whatever architecture-specific instruction is used to enter the kernel (`int 0x80`, `syscall`, `swi`, etc.). And in the kernel you need inline assembly for other various architecture-specific instructions so you can execute them as part of your regular code instead of having to write them in a separate assembly file.

For the weirder cases like right after boot or handling an interrupt you generally just have to go full assembly for that, since in general you're not started out in a state where you can just start running your typical C/Rust/etc. code. It depends heavily on your platform at that point though.
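For the curious, a raw write(2) with the new operand syntax might look roughly like this (a sketch assuming x86-64 Linux; the instruction, registers and clobbers are all architecture-specific, and where the `asm!` macro is imported from depends on your toolchain):

```rust
use std::arch::asm;

// Hypothetical raw write(2) wrapper for x86-64 Linux.
fn sys_write(fd: i32, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            inout("rax") 1isize => ret, // syscall number 1 = write; return value comes back in rax
            in("rdi") fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            out("rcx") _,               // the syscall instruction clobbers rcx and r11
            out("r11") _,
        );
    }
    ret
}

fn main() {
    let n = sys_write(1, b"hello\n"); // fd 1 = stdout
    assert_eq!(n, 6);
}
```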


Yes, in general. Most Rust code will go through a libc, though, which will make the syscall internally in assembly and expose a C interface.


Rust has lots of unsafe features; they all require explicit `unsafe {}` blocks.

As the post mentions, inline assembly comes in handy in a number of low level contexts (e.g. dealing with memory-mapped devices, working on microcontrollers, using processor features that aren't exposed by the kernel or standard library).


> e.g. dealing with memory-mapped devices, working on microcontrollers, using processor features that aren't exposed by the kernel or standard library

Or kernel features not exposed by the libc.


> using processor features that aren't exposed by the kernel or standard library).

As someone who doesn't do any low-level work, could you give an example of these kinds of features? I'd like to read into some of them :)


Moving into or out of a control register, such as to enable a processor feature. Disabling or enabling interrupts. Reading or writing model-specific registers. Saving registers, switching stacks, and calling another function, then switching back and restoring registers when the function returns. Initializing and using hardware virtualization features (e.g. Intel VT). Using features like SMAP (preventing accidental access from the kernel to user memory through a wild pointer, temporarily disabling the protection in careful dedicated routines like copy_from_user and copy_to_user). Making raw system calls from userspace.
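One user-mode-accessible example of an instruction with no ordinary language-level equivalent is rdtsc; a sketch with the new syntax (x86-64 only; `core::arch` also exposes this as the `_rdtsc` intrinsic):

```rust
use std::arch::asm;

// Read the x86-64 timestamp counter via inline asm.
fn rdtsc() -> u64 {
    let lo: u64;
    let hi: u64;
    unsafe {
        // rdtsc writes the low/high 32 bits into eax/edx; on x86-64 the
        // upper halves of rax/rdx are zeroed.
        asm!("rdtsc", out("rax") lo, out("rdx") hi);
    }
    (hi << 32) | lo
}

fn main() {
    let t = rdtsc();
    assert!(t > 0);
}
```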


My go-to example of this is CPU-level virtualization support, like AMD-V or VT-x.

Some C/C++ compilers will expose these instructions as intrinsics, but AFAIK Rust doesn't.


I used inline assembly once when playing around with Rust on GPUs to get Cuda's implicit thread ID and grid ID: https://docs.nvidia.com/cuda/parallel-thread-execution/index...


Some processors implement "Instruction Level Parallelism" where you can do arithmetic on many values simultaneously with special assembly instructions.

The most widespread cases of this (SSE, AVX2) came from Intel trying to get lock-in in the high-performance computing market. So it took a while for AMD to sell chips that implemented these instructions and even Intel's catalog doesn't uniformly offer them.

It's also tough to get the compiler to emit these instructions where you want even if it knows they're available (unsurprisingly, since you're asking it to auto parallelize a computation), so in the high-performance space a lot of people just have to resort to inline assembly.


Just a minor terminology point. Instruction Level Parallelism (ILP) means something else.

What you are describing, with special instructions, is called Single Instruction Multiple Data (SIMD), or simply vector instructions.

ILP means the execution of multiple separate instructions in parallel per clock cycle, through techniques like superscalar, out-of-order and speculative execution. ILP is sometimes used as a measure: How many instructions per cycle can be issued. It does not require special instructions, it's just the hardware being clever about running existing instructions faster.


Ah, we used that wrong in my old group then.

We talked about Instruction Level Parallelism vs. Thread Level Parallelism vs. Node Level Parallelism since generally if you are working on a problem with some data dependency you will really have to think about each separately to get the best performance.


This sort of instructions you often have intrinsics for though.
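For example, with intrinsics you pick the instruction but the compiler still handles register allocation (a sketch assuming an x86-64 target, where SSE2 is part of the baseline):

```rust
// Add four i32 lanes at once using SSE2's paddd, via intrinsics rather
// than inline asm.
#[cfg(target_arch = "x86_64")]
fn add4(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    use std::arch::x86_64::*;
    unsafe {
        let va = _mm_loadu_si128(a.as_ptr().cast());
        let vb = _mm_loadu_si128(b.as_ptr().cast());
        let vc = _mm_add_epi32(va, vb); // one instruction, four additions
        let mut out = [0i32; 4];
        _mm_storeu_si128(out.as_mut_ptr().cast(), vc);
        out
    }
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    assert_eq!(add4([1, 2, 3, 4], [10, 20, 30, 40]), [11, 22, 33, 44]);
}
```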


Ah, yeah, in my head I tend to conflate intrinsics and inline assembly


Just take any operating systems course and try to implement your own.

Or try to program a micro-controler "hello world" that actuates an LED ?


Aren't those usually MMIO?


Rust wants to be a serious contender for low level and embedded code. In those spaces sometimes you don't have much of a choice but to drop down into assembly.


If you have a large-ish project that is otherwise coded in Rust, but you have to do some low-level coding because of special I/O, or special intrinsic features for multimedia, or even just flat-out performance, then this allows you to do so without falling back on linking against ASM or C.


Because ultimately Rust wants to be a C++ replacement and you can't do that if you don't let people who know what they're doing opt into inline assembly and raw pointers.


Yes, but low-level programming implies direct access to memory and instructions.

Rust is not designed for memory safety only. If you only want that, there are other simpler options, like any functional, scripting or managed language. Instead, Rust is designed to bring as much memory safety as possible (but not more!) to the low-level and performance fields.


Embedded/systems programming is not possible without dropping to assembly at times. Also, it can unlock some hardware-specific performance optimizations.

Rust already has unsafe blocks (explicitly marked `unsafe`) for these kinds of situations. Inline assembly obviously is allowed only in unsafe contexts and should be used very carefully.


I'm new to Rust, so there may be precedent that I don't know, and I'm kinda spoiled because I'm a front-end dev mainly, where the rule is "don't break the web", but doesn't changing the syntax of a macro like that break backwards compatibility?


To clarify, "unstable" in rust-speak means "You can use this, but it might completely change or go away, and therefore break your code."

You need to deliberately opt in to unstable features, so you are unlikely to use something like this by accident.


Also worth noting that the old syntax was renamed from `asm!` to `llvm_asm!`. It'll still disappear at some point, but existing users of the former on nightly have a very easy way of fixing their build in the meantime.


asm! has been an unstable feature for years, which means a) it's available only in nightly builds, and b) there is no guarantee that it will remain backwards-compatible.

Actually, with regards to asm!, it's been known for a long time that it was unlikely to be stabilized as-is (being originally a very thin wrapper around LLVM's inline asm functionality). Probably nearly everyone who's been relying on inline asm to make their code work has already been following the effort to replace asm! with a better version.


c) Even on nightly builds you need to put #![feature(asm)] at the root of your crate to use it. (Likewise for all unstable features, just replace asm with feature_name.)


> Probably nearly everyone who's been relying on inline asm to make their code work has already been following the effort to replace asm! with a better version

Except for my colleague who basically says "nah, they'll never be able to move off the LLVM syntax, the code i'm writing now will be supported forever".

Fortunately, we only have a small amount of assembly, so if we ever do want to upgrade to a version of the compiler which has dropped support, it won't be a lot of work.


Support will likely not be dropped anytime soon, maybe never. You just need to rename it to `llvm_asm!`. Though with `asm!` likely to be stabilized while `llvm_asm!` never will be, it might be worthwhile to switch to the new asm to get off nightly in the future ;=)


Rust has two ways of ensuring forwards compatibility (or incompatibility):

The first is the `stable`/`nightly` system. Anything in `stable` gets compatibility guarantees, anything in `nightly` can change or be removed at any time.

Nightly also has feature-gates: using an unstable feature requires an opt-in to enable it. The old `asm!` syntax which got renamed was only ever in `nightly`.

The compatibility guarantees for `stable` are based on editions: every few years there's a new edition, your project's `Cargo.toml` specifies which edition your code is for, and compilers will continue to support old edition code. New editions might require changes to existing code, but as long as you don't update the edition in your `Cargo.toml` you can freely update your compiler (to a new stable version) without worrying about it breaking. You can use crates (packages) from any edition with any other edition, as long as the compiler is new enough to support the latest edition used.
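For illustration, pinning the edition is a single line in the manifest (crate name and version here are made up):

```toml
[package]
name = "my-crate"
version = "0.1.0"
edition = "2018"   # newer compilers keep accepting this even as new editions ship
```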


The important thing in this case is that the previous syntax was never actually stabilized, and was only available in nightly releases of the compiler.

So technically it does break some code, but only code which was deliberately opting into unstable features with the knowledge that it may break in the future.


> syntax was highly unlikely to ever graduate from nightly to stable Rust, despite being one of the most requested features.

I think it's safe to infer from this line that there was not yet a guarantee of stability in the interface, and that breaking backwards compatibility was therefore fine.

>We renamed the existing asm! to llvm_asm!

They also kept the existing functionality under a different name to ease transition. Current users will need only update the macro name and then transition to the better syntax at their own pace.


`asm!` never made it out of the nightly builds, so nothing that's being changed has been released.


"don't break the web" type issues are not really applicable to releases of precompiled languages or applications that ship their own interpreters and don't load third party content.

Let's say you are browsing my website. Suddenly, your browser gets an update and prompts you to re-open the window. My site no longer displays correctly due to a backward incompatible change.

My site is the web and your browser broke it.

If you use a native app or service I write in rust 1.35 or whatever and there is a backwards incompatible change in a new version of rust it has no impact on your use of that app at all.


backwards compatibility is still highly valued. Code no longer compiling in newer versions of the language can create huge churn.

It's one of the reasons Java is so widely used, you can still use Java 1.4 libraries in Java 13 and most 1.4 code will keep working and compiling.


Every couple years they'll define an edition of the language (e.g. rust 2015, rust 2018), the idea being that if you choose to stick to one of them, the latest rust will still compile your code as long as you tell it what edition you're targeting.


This isn't related to editions; asm was never stable syntax and couldn't be used in the stable compiler; you could only use it on nightly and only if you opted in. This new syntax is something we hope to stabilize eventually.


>I'm a front-end dev mainly, where the rule is "don't break the web"

left-pad flashbacks intensify


I don’t know much about the topic, but I am curious: does this open Rust up to being less “safe” (in the Rust sense of safety)?

C is of course higher level than assembly and I understand that C is very much not “safe”, so it makes me wonder whether this gives library developers incentives to add code that has a higher likelihood of creating security bugs while optimizing their code.

Is there any reason to believe this could be the case, or am I just misunderstanding how all of this fits together?


This does not make Rust less safe. The capability to use assembly is already there: either by using the existing assembly syntax, by linking to C code, or by linking to assembly routines directly. In order to use this, you need to use the unsafe keyword in order to indicate to Rust, "I am aware that the guarantees that Rust provides to me are not applicable when using this feature." This is true of all the existing ways of interfacing with non-Rust or assembly code, and it does not change with this new syntax.


unsafe { } blocks already exist, and in fact it looks like these have to go inside of them. Rust can do unsafe things, it just goes to great lengths to empower you to avoid accidentally doing unsafe things. It also forces you to label all unsafe code with an easily-searchable demarcation.


If I wanted to learn assembly language in 2020, what's the best book or resource people would recommend?


https://famicom.party features the best, most accessible 6502 assembly crash course I've come across.

It does strike me as weird that there aren't any popular resources that sit between something as high-level as the linked e-book and something as low-level as "buy some hardware and figure it out".


I suspect this has to do with Intel architecture not being very beginner friendly. When I tried learning x86 assembly, I found it to be incredibly overwhelming.

There are simulators for teaching assembly, but I don't think those are nearly as fulfilling as hacking on real hardware. And since it's difficult to hack on your computer's hardware, you're left picking up something simpler.


In my opinion, it's less that x86 is any more or less user-friendly, and more about having an environment that gives you the "raw" bare-metal feeling that you could once have from DOS. You can write a few instructions, write to video memory, and the result is right there.

You can still get that feeling on x86. For instance, try writing some code directly in the MBR of a disk that writes some bytes to the serial port on 0x3f8 with outb, and load it up in KVM.


Thanks!


I really liked Assembly Language step-by-step. https://www.amazon.com/Assembly-Language-Step-Step-Third/dp/...


As an unusual initial introduction, I've pointed several people at https://www.muppetlabs.com/~breadbox/software/tiny/ .


I'm reading through some of these posts now and they're a great introduction to ASM and the larger ecosystem. Thanks for sharing.


I really like "Introduction to 64 Bit Assembly Programming for Linux and OS X: For Linux and OS X" by Ray Seyfarth. It's short and you actually implement something interesting like the correlation function or basic data structures, like a linked list.


Thanks!


Imo there’s not really a best book or resource other than buying a microcontroller and doing it yourself! Pick up an ARM Cortex-M4 development board either from TI or STM and reimplement their example code or driver code in assembly! You can look at the ISA to see what assembly instructions are supported, or the disassembly of a running program (probably with no optimizations so you can see what they are doing)


There’s definitely something to be said about poking a physical pin in assembly. It makes me feel like I actually have control of the computer, and that I’m not just along for the ride.


Is there anything like https://github.com/herumi/xbyak on the Rust side? I like this approach a lot for non-trivial high performance work since it lets you tweak your assembly for particular hardware at runtime.


I don't understand (I do actually but it's not that much work) why so few languages do what D does and actually have inline assembly as a fully supported (no quotes) syntactic construct within the language


> I do actually but it's not that much work

It apparently is, given that D only supports x86 assembly, doesn't support all instructions and has idiosyncrasies like requiring prefixes to be written as separate instructions.


D only has about <20 people working on it at a given time. Some languages probably have at least a hundred times that number.


The number of people working on the asm! for Rust on the more obscure architectures is likely less than one.


Always tricky. I avoid inline assembler, except in cases where the compiler cannot generate some special instruction. This can occur in drivers and kernel code, where there's no other language feature to manipulate processor architecture (special registers or machine state manipulation).

It can be fun to try and hand-optimize code in regular applications. I can't help but feel there's more juice in teaching the compiler to recognize the case and generate better code instead. But few know how to go down that road.


Teaching compilers works in most cases, but not all of them. Sometimes the language just does not have ways of expressing things to the compiler that you would like it to know. (For example, how would you explain to the compiler that reading within the same page will not trigger a fault? Such a guarantee is often used in vectorized strlen functions, for example.)


Hopefully this will help us get better numerical libraries for scientific computing.


I hope so too but I’m skeptical. In my opinion, Rust still needs ergonomic multi-dimensional index expressions and standardized multi-dimension array traits. I’d love to see (or join) a scientific working group to help with this.


You can use a[(i, j, k)], and also a[i][j][k] for some data layouts.

As for multidimensional array traits, you can use Index<(usize, usize, usize), Output = f32>.
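A minimal sketch of the trait route (the `Matrix` type here is made up; real libraries like ndarray define their own):

```rust
use std::ops::Index;

// A row-major 2D array illustrating Index over a tuple of indices.
struct Matrix {
    data: Vec<f32>,
    cols: usize,
}

impl Index<(usize, usize)> for Matrix {
    type Output = f32;
    fn index(&self, (i, j): (usize, usize)) -> &f32 {
        &self.data[i * self.cols + j]
    }
}

fn main() {
    let m = Matrix { data: vec![1.0, 2.0, 3.0, 4.0], cols: 2 };
    assert_eq!(m[(1, 0)], 3.0); // row 1, column 0
}
```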


you can use the syntax `a[[i, j, k]]` too, just for another example.


The lack of such features did not prevent numerical libraries in C.


Or Rust (I don’t understand your point)


As the topic was inline assembly, I interpreted "better" as "faster", not "more ergonomic".


That would be amazing, but I would also be satisfied with a numpy clone. Of course language-level support would be far better!


I can envision something between the two, e.g. a situation similar to the Future trait and the various async backends.


For that sort of code, the value of inline assembly over compiler intrinsics (and Rust already has stabilized arch-specific intrinsics for the platforms listed in the current inline asm proposal) is likely minimal or nonexistent.


Is Rust ever going to go for standardization? Seems like the language has been in a state of flux with heavy changes all the time. When's it going to settle down a bit?


Keep in mind that the changes since 1.0 have been with very few exceptions completely backward compatible. I write a significant amount of Rust and I don't follow the development of the language ultra closely anymore, yet it doesn't feel like I'm coding for a moving target.

Basically from my point of view the changes are about adding features (inline ASM in this case, async I/O previously) but it doesn't really change the way you code, it's just another tool in your belt that you may decide to use later. If you didn't have a need for inline ASM before you still won't be using it, if you couldn't use Rust previously because of bad ASM support it opens up new use cases.


As long as it's backwards compatible, why should that be a problem?


Use of Rust on domains that require compilers for ISO and ECMA certified languages.


Bro. First, unsafe and now assembly. How is Rust a safe language?


As you can see, unsafe blocks are required for the assembly block. If you see "unsafe" itself as a problem, you should perhaps read a bit about Rust's philosophy about the matter: https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html


There are two languages, “rust” and “unsafe rust”. All code in unsafe blocks is not safe.


> All code in unsafe blocks is not safe.

That's a sadly common misconception. The unsafe blocks only enable a few extra language features, like dereferencing raw pointers and calling unsafe functions; if none of these extra language features are used, the code is completely safe. That is, if you have a block of code ({ X }) and just add the unsafe keyword (unsafe { X }), it works exactly the same and is still safe. This also shows that there are not "two languages"; what you called "rust" is a subset of what you called "unsafe rust", not a different language.
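To make that concrete (trivial made-up snippet):

```rust
fn main() {
    let x = 5;

    // Wrapping already-safe code in `unsafe` changes nothing; this is
    // still fully safe (the compiler even warns the block is unnecessary).
    let y = unsafe { x + 1 };
    assert_eq!(y, 6);

    // What the block actually enables are the extra operations, e.g.
    // dereferencing a raw pointer:
    let p = &x as *const i32;
    assert_eq!(unsafe { *p }, 5);
}
```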


The first paragraph of the chapter on unsafe rust in TRPL reads:

> However, Rust has a second language hidden inside it that doesn’t enforce these memory safety guarantees: it’s called unsafe Rust and works just like regular Rust, but gives us extra superpowers.

I disagree that “{ X }” and “unsafe { X }” are the same. The former is provably safe if a bug free rust compiler accepts it. The latter leaves the burden of proof on the programmer.


This looks a lot better than GCC’s asm. However, I can’t think of a good reason why one should use asm in their code instead of letting the compiler write the far superior asm for you.


It's not always superior!

For example, I've been working on a big integer library. Addition is a bit annoying, because you need to add 3 numbers (the prior carry bit, and the two digits you're adding) and get out a new carry bit and a resulting digit. This is a bit cumbersome:

        let (res, carry1) = target_digit.overflowing_add(carry as u64);
        let (res, carry2) = res.overflowing_add(other_digit);
        *target_digit = res;
        carry = carry1 || carry2;
The resulting assembly is a fairly literal translation. We perform the addition using an add instruction, and extract the carry flag into a register using setb.

        addq    (%rdi,%rsi,8), %rcx
        setb    %r10b
        addq    (%rdx,%rsi,8), %rcx
        setb    %al
        movq    %rcx, (%rdi,%rsi,8)
        orb     %r10b, %al
        movzbl  %al, %eax
But there's a dedicated instruction for this, adc. adc adds two operands and the carry flag, while itself setting a carry flag. Manually unrolling the loop a bit, I wrote this assembly:

                shlb $8, {carry}
                adcq 0x00({y0}), {x0}
                adcq 0x08({y0}), {x1}
                adcq 0x10({y0}), {x2}
                adcq 0x18({y0}), {x3}
                adcq 0x20({y0}), {x4}
                adcq 0x28({y0}), {x5}
                setb {carry}
And got a 3x speedup.


Thanks for this - its been many years since I've done anything touching assembly, and never outside of an academic context, so when I read conversations like this I'm always curious for concrete examples of how people are actually _using_ this stuff beyond something vague like interacting with VT-x or working on an experimental OS.


What is Rust's policy for signed/unsigned int overflow? I assume that it's not modulo or else the compiler should have generated ADC for you.

I assume you've been using Compiler Explorer a lot?


> What is Rust's policy for signed/unsigned int overflow?

Defined: overflow is checked (it panics) by default in debug mode and unchecked in release, where unsigned arithmetic wraps and signed arithmetic wraps as two's complement. This can be overridden by explicitly setting overflow-checks in the relevant profile.

It also, separately, has explicitly wrapping, saturating and checking versions of basic arithmetic operations.
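The explicit variants look like this:

```rust
fn main() {
    let x: u8 = 255;
    assert_eq!(x.wrapping_add(1), 0);            // always wraps, modulo 2^8
    assert_eq!(x.checked_add(1), None);          // reports overflow as None
    assert_eq!(x.saturating_add(1), 255);        // clamps at the type's bound
    assert_eq!(x.overflowing_add(1), (0, true)); // wraps and flags the overflow
}
```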


overflowing_add, like I was using above, is explicitly wrapping.

I've found it very difficult to provoke rustc into emitting an ADC: the only case where it does so AFAICT is when adding u128s, which are implemented using u64s. Not sure why, except that the shortest function I could think of to emulate ADC is kind of baroque, and it's possible the compiler can't figure it out.

I've mostly been using cpuprofiler[1] and VTune to simultaneously profile my code and show the assembly. In theory they both provide timing information per-instruction, but I don't really trust it. For the 6 adc instructions above, it shows the number of clock ticks as ranging from 22 million to 3 billion, which doesn't make sense to me. But at least it shows me the assembly!

[1] https://docs.rs/cpuprofiler/0.0.4/cpuprofiler/index.html
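For reference, the u128-based carry add mentioned above can be written portably like this (whether rustc actually lowers a chain of these to adc varies, per the parent comment):

```rust
// Portable add-with-carry built on u128: the sum of two u64s plus a carry
// bit always fits in 65 bits, so the high half of the u128 is the new carry.
fn add_with_carry(a: u64, b: u64, carry: bool) -> (u64, bool) {
    let sum = a as u128 + b as u128 + carry as u128;
    (sum as u64, (sum >> 64) != 0)
}

fn main() {
    assert_eq!(add_with_carry(u64::MAX, 0, true), (0, true));
    assert_eq!(add_with_carry(1, 2, false), (3, false));
}
```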


This will change your life!

https://godbolt.org/z/fntvYa


The article here:

https://news.ycombinator.com/item?id=23351007

specifically recommends against the carry variants of addition, because the instructions are still dependent on each other and don't pipeline well. In other words, it's using the same algorithm, just buried in a single instruction, and that doesn't necessarily make it faster.

Have you considered using a strategy similar to what the article suggests? I think the HN comments also had some additional suggestions.


I was briefly very excited when I read that article, actually. But as devit points out in a sibling comment, that technique is only relevant to cases where you're adding more than 2 numbers.

Multiplication initially seemed like a very promising use case, since it's basically repeated addition. But I'm not super optimistic about that, because I think it's dominated by the alternate optimization of noticing that the product of two 64 bit numbers cannot saturate the high 64 bits of the resulting 128 bit number, which causes carries to be bounded [1].

[1] https://github.com/rocurley/bignum/blob/b45448a156fb9100ab06...
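The bound is easy to check numerically:

```rust
fn main() {
    // Worst case: (2^64 - 1)^2 = 2^128 - 2^65 + 1. Its high 64 bits are
    // 2^64 - 2, one below u64::MAX, so adding a single carry into the
    // high half of a u64 x u64 product can never overflow it.
    let hi = ((u64::MAX as u128 * u64::MAX as u128) >> 64) as u64;
    assert_eq!(hi, u64::MAX - 1);
}
```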


> that doesn't necessarily make it faster

Did you miss the "And got a 3x speedup" part of the post you were replying to? Actual benchmarks of real code always trump theoretical deliberations.


Yikes. I was just trying to link to a relevant technique that the author might find helpful.

A big integer library may have many use cases; a benchmark only shows one data point. It's possible that by deferring carry work across more operations he'd see an even bigger improvement.


The dependency is unavoidable due to the way addition works.

The approach in the article only works if you are adding a lot of numbers together, and then indeed doing carry propagation once at the end is obviously faster.

But of course there is no way that doing the carry propagation yourself on one addition can possibly be faster on a decent CPU that implements add-with-carry efficiently.


Maybe it's worth considering an interface to a big int library that can defer carry work across many operations, and then normalizing at the end?

That certainly sounds useful for, e.g., totaling an array of big integers.


Compilers are great, but they don't always produce optimal code. You can quite often produce faster code if you're prepared to hand roll the assembly code, but it's rarely something most people will need to do. The time it takes to produce often outweighs any advantage you'd get, but having it as an option is going to be very important.

Just within the rust world, https://www.nickwilcox.com/blog/autovec2/ from a few weeks ago talks about the limitations with the compiler around vectorisation, demonstrating that with hand-rolled assembly they were able to cut the runtime in half, vs what the compiler was able to produce.

Outside of the rust world, you'll often see hand-rolled assembly in performance critical code across a spectrum of uses, e.g. OpenSSL uses it extensively for performance reasons, Go has it in a number of places in their libraries, with a pure go fall-back, and so on.


Ah gotcha. I guess I just haven’t come across a situation that calls for hand written ASM yet :p


I definitely haven't for the things I'm working on. Good enough is good enough.

edit: Thinking about it, that's not true. I was dealing with some hot code running on some MIPS architecture stuff a while back and the compiler's generated ASM was merely "okay", for the standard library. Alternative libraries with hand-rolled ASM ran significantly faster and we were going to leverage those, when another team came up with architectural changes that allowed us to take that away from the MIPS part of the stack.


I used to work on compilers and run-time libraries for embedded systems; our preference when we sub-optimally compiled the standard library was to teach the compiler how to generate the optimal code :).


Compilers kind of suck at filling branch delay slots :(


Most application developers don't need assembly, but it's still necessary for the people writing Rust and its stdlib itself, embedded developers, people writing libraries that use very specialised optimisations, SIMD, etc.


You can't explore undocumented CPU instructions, like this guy is doing: https://youtu.be/jmTwlEh8L7g


One niche case where you need assembly is if you write a green threads library. At the yield points, you need to save all necessary registers of the current thread, and restore the registers of the thread you're switching to. There's no way you can do this without assembly, because a) you need to control exactly which machine instructions are emitted in which order, and b) you need to read and write actual registers, not variables or memory locations.


Sometimes the compiler generates worse ASM (rarely, but it can happen, especially with vector instructions). Sometimes you want to use ASM instructions there are no intrinsics for. Sometimes you want to write some extremely low-level startup code for an OS or microcontroller and simply need assembly since the C/Rust/whatever runtime can't operate yet (eg main memory needs to be initialized).


Concerning vector instructions, it's not rare at all for compilers to generate worse code. That's something they still suck at, for the most part.


Yes, I suppose I phrased it poorly. For non-vector instructions it's rare, but auto-vectorization is often poor (though Rust seems to be pretty good at it, it's still far from perfect).


One example is authoring USDT probes. This is done by writing inline assembly comments [1] that the kernel can then trace when instructed to.

Also people doing cryptography things occasionally need access to specific ASM instructions for their work. I believe AES has specialized instructions, but there are more cases.

You're not wrong that writing inline ASM is a fairly specialized need, but for those who require it this will come in incredibly useful.

[1]: I still need to try using this, so I might be wrong. I was waiting on inline assembly to be available before trying tho, so quite excited it's now available.


Yeah true. But doesn’t Rust provide an equivalent to C’s intrinsic functions like popcount?


Yes:

https://doc.rust-lang.org/core/intrinsics/fn.ctpop.html

https://doc.rust-lang.org/std/?search=count_ones

There are still use cases for raw assembly untouched by the optimizer - e.g. constant time cryptography functions designed to thwart timing attacks might desire control over the exact instruction stream. Although, most of that can be tackled with out-of-line assembly - as one must do with x64 MSVC for example.


Because compilers almost never write better code than humans?

There are literally tens of thousands of bugs in the GCC and LLVM/Clang bug trackers about that.

I don't remember the last time I read some compiler-generated asm and thought "wow, that's great code".


> Because compilers almost never write better code than humans

Compilers almost always write better code than most humans.


Really? Citation needed.

Compilers are programmed by humans. You are claiming that somehow compilers have learned to produce better assembly than the humans that programmed them.

I am a compiler writer, and know many compiler writers. I have yet to find one that would stand by your claim. When I look at the output of LLVM or GCC, I am ashamed.

I wish I had more time to make LLVM much better, but actually, 99% of my time is spent fixing bugs filed by users that are neither compiler writers nor assembly programmers, and they too realized how bad the machine code emitted by these tools is.

So sure, a compiler emits better assembly than a human that hasn't learned assembly programming. It also emits better assembly than all humans that do not know how to program.

But no, it does not produce _by far_ better assembly than any programmer that actually cares about machine code enough to just take a look at it and check it.


Sadly it is yet another language that follows the UNIX tradition of not having a proper Assembly parser, instead forcing us to play with strings.

I guess that leaves it to PC-based languages to have actual inline Assembly parsers or intrinsics.


If you want the outer language to be cross platform (e.g. x64, ARM64, etc), then you'd either need a parser for every potential platform (even ones you aren't currently using!) OR you just settle for strings.

Don't get me wrong: Parsed assembly is superior because you can have certain parser guarantees/checks. But it will definitely limit the scope of your language or at least slow its adoption to new platforms.


I really don't like the strings (because Rust is so good at avoiding things like this), but your explanation makes tons of sense.

That being said, it wouldn't be hard to create an asm_x86! macro that hides the string.


Productivity of language users vs compiler developers.


Productivity of a niche area.

But related: there's no reason parsers can't be external tools that inject the resultant assembly strings into the Rust source, if that's really what people want. The nice thing about external parsers is that they become optional in every sense.

If the Rust team builds them in, then every supported platform needs support, every development environment needs every parser, and tooling also has to support it (e.g. IDEs, code checkers, etc).

It isn't simply just adding it once and leaving it forever, the work is repeated several times over and the maintenance will be time-consuming for what is: A niche feature.


Better not start talking about niche in regard to Rust, unless adoption of a niche systems language isn't something to care about.

Productivity matters everywhere, systems languages that care about happy programmers have better chances of long term adoption.


That's a pretty shallow argument, one that can literally be re-used verbatim to argue for any and every feature/philosophy. Particularly when "happy" is as subjective as this.

It also ignores the reality that the Rust team's time is finite, and if they invest it into three or more assembly parsers then they aren't investing it into other areas of the language that could also be productivity multipliers. Areas that more programmers actually use.

Ultimately how much is unsafe assembly in Rust going to be used? And how large are those code sections? And are you better off creating Assembly in existing dedicated tooling and importing it, even if Rust had native parsers (e.g. due to better debugging/visualization elsewhere)?

Honestly if Rust code outside of standard libraries ends up with a ton of assembly anyway then that's a weakness of Rust. You should be using as little as possible. Even unsafe Rust is superior to assembly because it is platform agnostic. I'd go as far as to call it a "last resort."


It seems reasonable, but we need to not conflate niche with lack of popularity (i.e. low usage).

Niche is about the environment, i.e. the problem domain.

Analogy to biology/ecology, grazing on mountain tops is a niche that is also unpopular (i.e. low usage), meanwhile pollination is a niche that is popular.

The problem domain for Rust to be useful is very large, spanning the programming ecosystem, where effort is put into increasing accessibility for less and less experienced developers (e.g. embedded, wasm, webservers, CLIs, etc). However, Rust might currently lack popularity (i.e. low usage).

The problem domain for inline asm to be useful is very small, typically limited to experts in narrow domains (e.g. crypto, firmware, drivers, etc), and across the programming ecosystem effort is put into decreasing the need for inline asm. It also happens to be unpopular, and is avoided as much as possible.

Summary, Rust is pragmatically niche, rather than theoretically niche, Inline Asm is theoretically niche, and also pragmatically niche.


There are infinitely more language users. I'd rather throw a bone to the compiler developers.


Then that means there is infinitely more productivity to be gained catering to the language users.


One more reason to actually care about the productivity of language users.

They are free to go elsewhere more welcoming.


It's not "PC based" that's the magic you want, it's the assumption of x86 (and, very recently, sorta, ARM, in a few compilers).

Architectures are weird, assembly is crazy. There just isn't going to be a single set of semantics around what you want.

This Rust thing isn't revolutionary, it's indeed just a prettier way of writing the model GCC invented a few decades back:

+ "Assembly" is a black box for the code author.

+ The compiler's job is generating code around the assembly, not the code itself.

+ So the fundamental model is "constraints": the rules the compiler needs to follow before and after the assembly, and how to emit an appropriate string to represent the register/address/immediate/whatever requested by the assembly code.

And like it or not, this works. It works really well. Generations of OSes have been written in this model without much complaint (beyond the general ugliness of the gcc syntax, which Rust is trying to fix).

I think this is fine. I agree it's not going to bring a revolution, but we really don't want one in this space.


GCC did not invent anything; that dumb approach to inline Assembly is traditional in UNIX environments.

Amiga and Atari also had similar inline Assembly support.


I still think you're missing the point. GCC indeed didn't invent "inline assembly as a string", no.

GCC invented "constraint metalanguage as the foundation for inline assembly generation", which is the model here that enables one compiler and one syntax to work the same way for basically every architecture over decades. Modelling the interior of the assembly code as a black box (which in this implementation means "string") is just a side effect of this important design choice.

Again, the stuff you're imagining isn't a general purpose assembly language model, it's an x86-specific hack that works only on the specific compilers and architectures for which it was targeted.


Except that compilers with inline Assembly also don't have dumb backends that just dump Assembly text for as, like on UNIX.

They interact with the backend.

They weren't specific to x86; although I mentioned only the PC, they were available across all 8 and 16 bit home computers, and several embedded targets support it as well.


GCC doesn't implement the "dumb approach" that existed before it, and that Amiga and Atari originally had. The GCC approach provides enough information to the compiler that code can be optimized in the presence of inline assembly; the compiler is informed of what the assembler code reads and writes, and can allocate registers for it. It was much more flexible than the implementation in the original Unix compiler.


Microsoft declined to extend its inline assembly syntax to support x64 or ARM-based processors.

If you want to use a single specific instruction or two, it's usually better to just use the compiler's intrinsics (Rust has already stabilized the x86 SSE and AVX intrinsics) than inline assembly. When you go to more complex assembly fragments, you tend to need more control over how the inline assembly fragment interacts with scheduling and the register allocator, and making that control work ergonomically is that much more challenging.


I'm sure someone will be along to provide a wrapper macro or macros to address that shortly.

I can see why they would not want to bless a particular syntax tree or whatever into the core compiler.


(Once this is stable) haven't they blessed a particular syntax tree anyway, since the contents of the "" string are parsed to a specific syntax tree that will have to remain stable? The only difference is that that syntax tree is delimited "<contents>" instead of asm!(<contents>).


Does compiler behavior matter here, or could your problems be fixed in your editor?

On the rare occasion I write inline assembly, my annoyances were autoindent, syntax highlighting, and keystrokes required. Only the latter is directly caused by the stringy inline syntax, and an editor could still mitigate it.


Are you aware of PeachPy? It's the only (Intel) assembler I have ever seen I liked.


Strings can be parsed. The rust compiler parses format strings, for example.



