These progress numbers are very interesting, and I was wondering how directed the effort was.
The way Rust code gets added includes:
* A new feature needs an identifiable library, so the new library can be written in Rust to begin with. (Example: U2F token USB integration.)
* Old code needs a rewrite anyway, so the rewrite can be in Rust. (Example: Character encoding converters.)
* Servo has proven a component, so it makes sense to bring it over. (Examples: Stylo and WebRender.)
* The code being replaced has a history of vulnerabilities. (Example: MP4 metadata parser.)
The article links to a longer article (https://hsivonen.fi/encoding_rs/) about encoding_rs. The longer article mentions a bug that got fixed in Firefox ESR after the code had been replaced with encoding_rs in non-ESR Firefox. (I wrote the bug, too, though.)
> or newly re-written crates now more useful to the wider community than the same code locked up in C++.
encoding_rs is an example of a crate developed for Firefox but also developed as a crates.io crate from the start. ripgrep is probably the best-known Rust-only app that uses encoding_rs. Since Visual Studio Code bundles ripgrep, I believe Microsoft shipped encoding_rs before Mozilla did!
Is it mature? Can you shed more light on how it is used in production?
I don't believe that it uses Actix, though. Actix was created by and is maintained by a Microsoft employee.
I doubt this would ever be discovered; who would analyze code that was formerly a part of Firefox?
There are also folks who just study these things to identify patterns in problems created, prevented, or detected (and at what effectiveness) by various languages and techniques in software development. In a similar vein, each bug report also provides (in theory) a test case for automated tools that detect bugs. It's very important to have a huge, diverse pile of code to test those tools with, because each tool's algorithms might have blind spots that miss bugs. The more code and bugs we have, the better we can assess those algorithms' accuracy, and then build better algorithms. :)
That said, I can't find the source right now, but I believe the quote is something along the lines of: a sizable percentage of Firefox's security bugs would be less severe or nonexistent in Rust vs. C++.
So one then needs to resort to statistics and other evidence to validate the argument.
For example, even after being shown on Godbolt that it is possible to write safer code in C++ while keeping the same or even lower hardware requirements, many embedded C devs still argue that safer code is not worthwhile.
Rust, just like other (almost) memory-safe systems languages, will get the same human judgement.
Ownership is a hugely important part of designing programs, and it's something people need to come to terms with eventually, but a language where you can't do even hello world without understanding ownership adds a lot of mental overhead to the learning process when someone is still not even comfortable with for loops and function calls.
That being said, ownership is rather hard, but liberal usage of `.clone()` can get you pretty far.
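To illustrate the point, here's a minimal sketch (the `shout` function and names are my own, purely for illustration): when the borrow checker objects to a value being moved and then used again, a `.clone()` trades a copy for not having to think about ownership yet.

```rust
// A function that takes ownership of its argument.
fn shout(s: String) -> String {
    s.to_uppercase()
}

fn main() {
    let greeting = String::from("hello");
    // `shout(greeting)` would move `greeting`, making the println!
    // below a compile error. Cloning keeps the original usable.
    let loud = shout(greeting.clone());
    println!("{} -> {}", greeting, loud);
}
```

Not free, but a perfectly reasonable crutch while learning.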
Pyret at pyret.org is a good candidate since there's a group called Bootstrap successfully teaching it to middle schoolers. That's bootstrapworld.org.
As far as Rust goes, I'll also note that people exploring the language or doing quick-and-dirty coding can just use reference counting if they want. Rust supports that. There's a performance hit, but that's probably fine in those use cases.
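For instance, a small sketch (the `share` function is mine, for illustration): `Rc` gives shared ownership with a runtime count, so you can pass data around freely without lifetime annotations.

```rust
use std::rc::Rc;

// Shared, reference-counted data: no lifetimes to annotate,
// at the cost of maintaining a runtime count.
fn share() -> (Rc<Vec<i32>>, Rc<Vec<i32>>) {
    let data = Rc::new(vec![1, 2, 3]);
    let other = Rc::clone(&data); // bumps the count, no deep copy
    (data, other)
}

fn main() {
    let (a, b) = share();
    println!("count = {}", Rc::strong_count(&a));
    println!("{:?} {:?}", a, b);
}
```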
People looking for bugs in Firefox ESR.
If you take on a full rewrite you lose a lot: you can't show that you are incrementally better, you can't show that it will be a good long-term investment, you must rewrite even well-maintained core parts that work fine, you don't get to improve the original engine with the good parts, and essentially you get nothing in return.
For a much better answer than mine:
The broad sentiment, as it seems to me, is that it's a good thing to move over as pragmatism allows. Personally I'm a converted skeptic: I had my doubts about the language, and my initial stabs at it left me somewhat frustrated, but as I've grown to internalize its semantics and behaviour, the benefits in terms of safety, clarity of intent, and optimization potential have become clear (e.g. the aliasing semantics are just so much better).
A mix of factors enter into whether a component moves over or not, including the views of the developer in question, the complexity of the API boundary between the main codebase and the subcomponent, and the complexity of the component logic itself.
If someone is interested in that as well, maybe there are some ways to join forces.
That may be the one thing left of C in 50 years' time.
But even within embedded rust you won't find much appetite for dynamic memory allocation, and a fair amount of the work has been stabilizing the mechanisms by which you can build rust code which does not use its standard library ('no_std'). This has nothing to do with the fear of the new and lots to do with predictability of code. Using a single heap for all allocations is not at all ideal in an embedded context: your allocations become much harder to predict (both in terms of failure and in terms of time taken), errors become harder to recover from, and it becomes harder to reason about the amount of memory your system will use (especially in edge cases). For embedded work you will generally try very hard to allocate everything statically, and use some kind of pool allocation if you cannot (and often you combine the allocation and whatever datastructure you are placing the objects in).
The same is true in game development. Using a per-frame arena allocator for short lived objects can make a big difference to performance. And Rust’s lifetime rules can be used to make this sort of thing completely safe.
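As a very simplified stand-in for the per-frame pattern (the `simulate_frames` function is my own illustration, not a real arena): a scratch `Vec` whose capacity is reused across frames gives arena-like behavior, since `clear()` drops the contents but keeps the allocation, so steady-state frames allocate nothing. Real bump/arena allocators go further, but the shape is the same.

```rust
// A per-frame scratch buffer: allocation happens once up front,
// then each frame "frees" all its short-lived objects at once.
fn simulate_frames(frames: usize) -> usize {
    let mut scratch: Vec<u32> = Vec::with_capacity(1024);
    let mut total = 0;
    for frame in 0..frames {
        scratch.clear(); // drop per-frame objects, keep capacity
        for i in 0..100 {
            scratch.push((frame * i) as u32);
        }
        total += scratch.len();
    }
    total
}

fn main() {
    println!("{}", simulate_frames(60)); // 60 frames of 100 objects
}
```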
It’s a shame that Rust's std completely relies on the hidden global allocator instead of accepting an allocator as a parameter. It means that while you can write your arena allocator, you can’t use it with any of the built-in Box/Vec/HashSet/etc. types.
I actually really like Zig’s answer here of just passing in an allocator as an argument in all data structures. I’d love to see that in rust!
Global allocators have finally stabilized. Working on it!
It's not fear of newer languages (most of the time). I would love to use Rust for embedded in a professional environment but it loses out quickly based on supported platforms.
As leader of an embedded development team, I can't choose a language that will limit the processor parts we can use when we get closer to manufacturing and cost reduction. At volume, many millions spent on software development is dwarfed by cost savings from swapping in the right part.
It doesn't tend to be that extreme a part swap. If you're doing signal processing and UI, you're probably always going to use a 32-bit part; if you're just doing some basic logic with a couple of digital inputs, you're probably always going to use an 8-bit part. You might move from a 32-bit ARM to a 16-bit MSP430 to hit a power budget or similar.
Leaving aside processor architecture, the factors that affect cost from most to least (approx) are: Flash size, number of peripherals/features, clock speed, ram size. Basically, sometimes another manufacturer will have a part that meets your feature requirements much cheaper, and then you port the code. The rest of the time, you're just squeezing program space, memory usage and processor usage to reduce the required specs.
ChromeOS, Android, Windows/UWP, Fuchsia, ...
The C ABI is actually the OS ABI when the OS happens to be written in C and exposes syscalls in C.
As for C in 50 years, it depends how much we keep on using UNIX.
For example, WinRT - used by Windows/UWP - can be expressed entirely in terms of C structs and function pointers.
Yet this is much less relevant in UWP, which allows for a richer type system.
Additionally the COM/UWP ABI requires type libraries for properly accessing the objects and marshaling.
And yes, WinRT does require its metadata for marshaling (COM doesn't - you can compile proxies/stubs from IDL). It does not require that for accessing objects, and there are many WinRT components which, in fact, cannot be marshaled at all. But all that doesn't make it any less of a C ABI - the metadata format is documented, and the standard WinRT APIs used to access it are themselves WinRT-compliant, and therefore the whole thing can be consumed from C, if painfully.
Similarly, the definition of a C ABI is "ABI that can be expressed in C terms" (not "ABI that is easy to code against in C") - i.e. an ABI that any language with C FFI can use. That last part is what makes this categorization useful.
And it has some significant implications beyond "assembly ABI" - for example, C doesn't have guaranteed optimized tailcalls, and therefore any such ABI can't provide such a guarantee, either. Or - C doesn't have exceptions, and so COM and WinRT have to use error signalling mechanisms that are expressible in C (HRESULTs and thread-local error description objects). Or take async - WinRT offers CPS-style futures via IAsyncResult, and because that's just a WinRT interface, any language that can do C FFI can do WinRT async (unlike fibers, goroutines etc, which require intrinsic support).
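To make "expressible in C terms" concrete, here is a hedged sketch (written in Rust, with made-up names like `Counter`; a real COM interface also carries IUnknown machinery this omits): an "interface" is just a struct of C function pointers, each taking an explicit `this` and returning an HRESULT-style integer status, with results delivered through out-parameters.

```rust
// HRESULT-style status codes: zero/non-negative = success.
const S_OK: i32 = 0;
const E_FAIL: i32 = -1;

// A COM-like "interface": a struct of C-ABI function pointers.
#[repr(C)]
struct CounterVtbl {
    increment: extern "C" fn(this: *mut Counter) -> i32,
    get: extern "C" fn(this: *const Counter, out: *mut u32) -> i32,
}

#[repr(C)]
struct Counter {
    vtbl: *const CounterVtbl, // vtable pointer lives in the object
    value: u32,
}

extern "C" fn counter_increment(this: *mut Counter) -> i32 {
    if this.is_null() { return E_FAIL; }
    unsafe { (*this).value += 1; }
    S_OK
}

extern "C" fn counter_get(this: *const Counter, out: *mut u32) -> i32 {
    if this.is_null() || out.is_null() { return E_FAIL; }
    unsafe { *out = (*this).value; }
    S_OK
}

static COUNTER_VTBL: CounterVtbl = CounterVtbl {
    increment: counter_increment,
    get: counter_get,
};

fn demo() -> (i32, u32) {
    let mut c = Counter { vtbl: &COUNTER_VTBL, value: 0 };
    let vtbl = unsafe { &*c.vtbl };
    (vtbl.increment)(&mut c);
    let mut out = 0u32;
    let hr = (vtbl.get)(&c, &mut out);
    (hr, out)
}

fn main() {
    let (hr, value) = demo();
    println!("hr={} value={}", hr, value);
}
```

Nothing here requires anything beyond C FFI, which is exactly why any language that can call C can consume such an ABI.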
I smell vulnerabilities miles away with that sort of implementation. Hope the Rust programmers remember basic garbage collection in C, since C itself doesn't have it automated.
In this case, there indeed is no .c compilation unit between the C++ and Rust code that see each other via C linkage.
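A minimal sketch of what that looks like (the `rust_add` name is my own illustration): the Rust side defines a function with the C calling convention, and the C++ side merely declares it `extern "C"`; no C translation unit ever exists.

```rust
// Rust side: a function with the C calling convention. In a real
// build you would also pin the symbol name (e.g. with #[no_mangle])
// so the C++ linker can find it.
pub extern "C" fn rust_add(a: i32, b: i32) -> i32 {
    a + b
}

// C++ side would declare, with no .c file anywhere:
//   extern "C" int32_t rust_add(int32_t a, int32_t b);

fn main() {
    println!("{}", rust_add(2, 3));
}
```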
For starters it needs to have classes (probably using PIMPL like things internally by default). It needs to have some sort of error handling. It needs to support some basic data like std::vector (but they can start from scratch).
Edit: my fingers typed API not ABI first...
So we have things a little more basic than that, at various levels:
1. API: C. This takes care of a bit more than just data interop and common functions; it maintains an interface, takes care of platform portability concerns (calling conventions, linkage, runtimes), and allows future growth.
2. Modules: From executables and shared objects to services running on a remote machine, accessible over TCP/TLS/HTTP(S).
3. Conventions: From free-form (de)serialization formats like JSON/YAML, to read-optimised formats that come with their own implementation code like Cap'n'Proto and Flatbuffers, to share-optimised formats that come with their own read/write/live-share implementation code like Apache Arrow. These come the closest to being a common "ABI", but (thankfully) restrict themselves from being ingrained into a language.
Being able to speak a common ABI doesn't necessarily guarantee that your local ABI has to conform identically to it. In Rust, for example, you need to add a repr(C) to be able to actually guarantee that the layout conforms to the C ABI. CORBA IDL is something that tried to achieve this kind of capability.
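For example (the `CLayout` struct is my own illustration): Rust's default layout is unspecified and may reorder fields, while `repr(C)` pins the C layout, padding and all, so the size below is guaranteed.

```rust
use std::mem::size_of;

// With the default repr(Rust), field order and padding are
// unspecified; repr(C) guarantees the C layout.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,  // offset 0, followed by 3 bytes of padding
    b: u32, // offset 4 (aligned to 4)
}

fn main() {
    println!("size = {}", size_of::<CLayout>());
}
```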
No, but you still need to support the common ABI. E.g., for Rust, that's an easy job, compared to Go.
But, Rust doesn't have classes.
> It needs to have some sort of error handling
But, Rust doesn't use exceptions, just algebraic data types (`Result`).
> It needs to support some basic data like std::vector
But, these aren't primitives, these are just std library constructs. Which is arguably an advantage, if you get a good FFI you can just use the other language's libraries directly.
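A tiny sketch of that error-handling style (the `parse_port` function is my own example): failures are ordinary values of a sum type, not a separate exception channel, which also makes them straightforward to flatten into an FFI-friendly shape.

```rust
// Errors as values: Result is just an enum, no unwinding involved.
fn parse_port(s: &str) -> Result<u16, String> {
    s.parse::<u16>()
        .map_err(|e| format!("bad port {:?}: {}", s, e))
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port("not-a-port"));
}
```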
A higher level FFI is a good idea, I think you're going to discover that the intersection of languages is surprisingly close to C though.
And so what if std::vector isn't primitive? It's an extremely common construct that should be part of a cross-language ABI. Hell, it's simpler than strings, and they are part of the C ABI.
This is one of those things that everyone is going to nay-say until someone actually does it. The same happened with webassembly. I remember when it was deemed impossible and PNaCl would never work etc. etc.
They are not "at a binary level the same as classes" (whose classes? C++'s classes?), at least not the moment things like vtables get involved.
Which would then not map straightforwardly to C++ classes, because the vtable pointer would be in the wrong spot.
You can't have an ABI that makes two things that represent things in concretely different ways somehow resolve that inherent contradiction.
Rust doesn't have classes. And as discussed in the article, Rust puts the vtable pointer on the reference while C++ puts it on the pointee, so even fully abstract C++ classes ("interfaces") and Rust traits (when used with type erasure rather than with generics) are implementation-wise different even if they look conceptually similar.
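The "vtable pointer on the reference" part is directly observable (the `Draw` trait and `Circle` type below are my own illustration): a reference to a concrete type is one pointer wide, while a trait-object reference is two, because it carries the vtable pointer alongside the data pointer.

```rust
use std::mem::size_of;

trait Draw {
    fn draw(&self) -> &'static str;
}

struct Circle;
impl Draw for Circle {
    fn draw(&self) -> &'static str { "circle" }
}

// &Circle is a thin pointer; &dyn Draw is a "fat" pointer
// (data pointer + vtable pointer), so it is twice as wide.
fn pointer_widths() -> (usize, usize) {
    (size_of::<&Circle>(), size_of::<&dyn Draw>())
}

fn main() {
    let c = Circle;
    println!("{}", c.draw());
    let (thin, fat) = pointer_widths();
    println!("thin={} fat={}", thin, fat);
}
```

In C++, by contrast, the vtable pointer sits inside the object, so pointers to polymorphic objects stay one word wide.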
Go doesn't set up the stack in the same way as the others.
The main reason the C ABI is de-facto the lingua franca for language bindings is because it's almost runtime-less. You basically only need to agree on things like stack layout and where the parameters/return values go. It's a super low bar.
Dealing with C++ objects, overloaded functions, namespacing, exceptions etc... Now that's a whole different can of worms. How would you automagically map something like std::cout and its operator<< in Rust or Go for instance?
You'd define std::cout and operator<< in terms of a sane underlying reader/writer ABI?
Passing basic datastructures back and forth isn't hard and there are tools that will autogenerate the code to do that. But when you start using them you'll find they're not solving the hard part of the problem.
Memory ownership is trivial by comparison, particularly in this case when all the languages have the equivalent of malloc/free, so you just make that part of the ABI surface.
I haven't really looked at that in detail, but I would find any tooling for better language interoperability very worthwhile.
Looks interesting, maybe another step into Microsoft Linux.
GObject is a good example of a truly cross-platform ABI like that.
It was a clever decision. I'm not sure if it was the right one, but it was a nice thing to try, to see how well it works. I wonder if anyone did a "Looking Back"-style write-up on how it's worked out so far, with pros and cons.
I'm not sure what the exact status of extending GObject hierarchies from Rust is. See https://github.com/gtk-rs/gobject-subclass
I just have two complaints.
The build times, since gtk-rs seems to do code generation during the build, and the usual `Rc<RefCell<item>>` dance for accessing internal widget data in the callbacks.
Certainly successful at bringing new developers into the Gtk world (especially thanks to elementary OS and its developer documentation). Familiar syntax for C# devs.
The downside is, well, memory bugs. For example, sometimes perfectly reasonable Vala code results in error messages like "Unable to parse accelerator '\u0008\x8dn\u000b\u0008': ignored request to install 501 accelerators" or downright segfaults: https://gitlab.gnome.org/GNOME/vala/issues/626
Either you get a bytecode format, or a platform specific one.
Don't forget C, with the amount of C code out there you'll have to make sure it can handle this new format. And by the time you limit yourself to the lowest common denominator between all those languages you'd probably just reinvent the C ABI anyway.
I'm fine with the ABI having an explicit `this` pointer (which Python already does for classes), which allows C to call/implement classes as well, but it needs to be in the ABI (languages with first-class classes can choose to hide the `this` pointer and vtable).
Operator overloading (which forces name mangling or something like it) is tricky, though. As are exceptions. I don't have answers to those.
I don't think it really raises the bar much. Rust, for example, isn't an OO language, so classes and vtables won't be included. Go doesn't have templates, so off they go. Operator overloading, multiple inheritance, exceptions, garbage collection all go.
What you're left with is essentially the C ABI. Being this lowest common denominator is why it's the common ABI, just as much as the success of the language.
What could be done, however, is to provide an ABI-as-a-library at each end then pass that with the C interface. Textual Specification, statically processed at each end into some kind of parser type thing. Still borderline suicidal effort, however, as not all languages (Go, I think?) treat the address space equally among a clusterfuck of other things. Any form of parametric polymorphism would be a disaster too, I assume.
Scripting support (IDispatch, WSH) is OK, despite being a bit outdated.
Registration infrastructure (HKCR), GUI widgets (ActiveX), threading model (these apartments) and inheritance of implementation (aggregation) ain't good, way overengineered.
Compound document format (structured storage), RPC both local and networked, are just horrible.
Embedding a MS Works worksheet in a MS Works document however was sometimes a good idea in '95.
I've yet to see a better implementation than that and I find it really disappointing that the state of the art has moved backwards in this area (in addition to UX and performance relative to hardware specs).
Stuff like OLE Automation is very much not simple, but it's not a requisite part of it all.
The problems and boilerplate usually affect those that insist on using the C-level APIs.
How about having something similar? Maybe put a ring buffer as a mediator and push/pull a binary encoding format (MessagePack?). You'd probably need a little C for the bridge.
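Something like this minimal sketch (the `RingBridge` type is my own invention; a real bridge would need thread-safety and a wire format such as MessagePack on top): a bounded queue carrying opaque byte messages, with the encoding left entirely to the two endpoints.

```rust
use std::collections::VecDeque;

// A bounded ring buffer carrying opaque byte messages between two
// sides; the payload encoding is up to the endpoints.
struct RingBridge {
    queue: VecDeque<Vec<u8>>,
    capacity: usize,
}

impl RingBridge {
    fn new(capacity: usize) -> Self {
        RingBridge { queue: VecDeque::new(), capacity }
    }

    // Push a message; when full, evict the oldest one.
    fn push(&mut self, msg: Vec<u8>) {
        if self.queue.len() == self.capacity {
            self.queue.pop_front();
        }
        self.queue.push_back(msg);
    }

    fn pull(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut bridge = RingBridge::new(2);
    bridge.push(b"a".to_vec());
    bridge.push(b"b".to_vec());
    bridge.push(b"c".to_vec()); // evicts "a"
    println!("{:?}", bridge.pull());
}
```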
You make it sound like they're just cargo culting a shiny new language when in fact they invented the language in the first place to solve their complexity problems with C++.