Speed of Rust vs. C (kornel.ski)
619 points by sivizius 35 days ago | 525 comments

"But the biggest potential is in ability to fearlessly parallelize majority of Rust code, even when the equivalent C code would be too risky to parallelize. In this aspect Rust is a much more mature language than C."

Yes. Today, I integrated two parts of a 3D graphics program. One refreshes the screen and lets you move the viewpoint around. The other loads new objects into the scene. Until today, all the objects were loaded, then the graphics window went live. Today, I made those operations run in parallel, so the window comes up with just the sky and ground, and over the next few seconds, the scene loads, visibly, without reducing the frame rate.

This took about 10 lines of code changes in Rust. It worked the first time it compiled.

>> One refreshes the screen and lets you move the viewpoint around. The other loads new objects into the scene.

How did you do that in Rust? Doesn't one of those have to own the scene at a time? Or is there a way to make that exclusive ownership more granular?

Since this got so many upvotes, I'll say a bit more. I'm writing a viewer for a virtual world. Think of this as a general-purpose MMO game client. It has no built-in game assets. Those are downloaded as needed. It's a big world, so as you move through the world, more assets are constantly being downloaded and faraway objects are being removed. The existing viewers are mostly single thread, in C++, and they run out of CPU time.

I'm using Rend3, which is a 3D graphics library for Rust that uses Vulkan underneath. Rend3 takes care of memory allocation in the GPU, which Vulkan leaves to the caller, and it handles all the GPU communication. The Rend3 user has to create all the vertex buffers, normal buffers, texture maps, etc., and send them to Rend3 to be sent to the GPU. It's a light, safe abstraction over Vulkan.

This is where Rust's move semantics and ownership transfer help. The thread that's creating objects to be displayed makes up the big vertex buffers, etc., and then asks Rend3 to turn them into a "mesh object", "texture object", or "material object". That involves some locking in Rend3, mostly around GPU memory allocation. Then, the loader puts them together into an "object", and tells Rend3 to add it to the display list. This puts it on a work queue. At the beginning of the next frame, the render loop reads the work queue, adds and deletes items from the display list, and resumes drawing the scene.

Locking is brief, just the microseconds needed for adding things to lists. The big objects are handed off across threads, not recopied. Adding objects does not slow down the frame rate. That's the trouble with the existing system. Redraw and new object processing were done in the same thread, and incoming updates stole time from the redraw cycle.

If this was in C++, I'd be spending half my time in the debugger. In Rust, I haven't needed a debugger. My own code is 100% safe Rust.

Wonderful! Thanks for sharing. This sounds like the exact sort of work that Rust is perfect for.

I'm making a game in Rust and Godot (engine) and since it's a factory game the simulation performance is important. Rust means I worry far less about stability and performance.

I bet if you wrote a good blog entry with screenshots and explanation of how your code loads and renders I imagine it would do well on HN.

Too soon. Someday perhaps a Game Developers Conference paper/talk. I was considering one, but live GDC has been cancelled for 2021. My real interest in this is how do we build a big, seamless metaverse that goes fast. I'm far enough along to see that it's possible, but not far enough along that people can use the client.

Rust is good for this sort of thing. It's overkill for most web back end stuff. That's where Go is more useful. Go has all those well-used libraries for web back end tasks. Parallelism in web back ends tends to be about waiting for network events, not juggling heavy compute loads of coordinated disparate tasks. Hence all the interest in "async" for web servers. As I've said before, use the right tool for the job.

One idea I liked was ...

You have an authoritative world simulation server, as usual.

You then have several servers whose chief job is to keep clients in sync with the authoritative server.

Most network games combine these two roles, but there is a lot of processing and network traffic required to keep clients in sync. For massive multiplayer there is a benefit to scale the "client-interaction" servers.

My question is probably off because I lack the knowledge, but how do the commercial games/game engines do this then, if this is such rocket science? Something like Fortnite or an aged GTA has done what you've described (downloading assets on demand without any fps drop) for quite some time now.

The claim isn't that it's impossible, or "rocket science", it's that it's hard to do right and was made much easier. You're bringing up for comparison a game engine that has been in constant development by experts for over two decades (Unreal Engine) and a game engine in constant development for over a decade (RAGE). Just because someone makes a professional product using tens or hundreds of millions of dollars doesn't mean it was easy.

There's a reason why Epic is able to charge a percentage of sales for their engine, and companies still opt for it. That's because it's hard to do reliably and with good performance and visuals.

Yeah, but just picking one of many requirements in game dev and advocating why lang x can do this better than y ignores all the other checkboxes. Yeah, C++ is nerve-wracking, but Rust can be even more so. IIRC there was a thread about Zig vs Rust and why Rust is just the wrong tool (for the OP's use case in that case). IDK, but there is a reason why C++ dominates game dev, and a reason why Rust still struggles with mainstream adoption compared to languages of the same age like Go or TS.

Game dev has nothing to do with it.

The claim was that Rust made making something parallel easy/easier, and the illustrative example given was someone trying to parallelize something in a game engine. Whether Rust is good for game development or not is irrelevant to the point that was being made.

Even if everyone accepted that Rust was horrible for game development on most metrics, the example given would still carry the exact same weight, because it's really about "parallelizing this would be very hard in C++, but it was very easy because of Rust."

The simplest (and often best) option is to use the Arc<Mutex<MyStruct>> pattern.

The Arc is an async reference counter that allows multiple ownership. And the nested Mutex enforces only one mutable borrow at a time.

I think you mean that Arc is an atomic reference counter (it uses atomic cpu instructions to prevent race conditions when incrementing and decrementing the ref count)

Ah yes, sorry. I remember it in my head with "async", but you're right. :)

Eh, with Arc you can share ownership easily, and there are probably a lot of cleverer concurrent data structures or entity component kinda things that'd just work too. But maybe you can arrange things so that one thread owns the scene but the other thread can still do useful work?

This typically isn't possible because the rendering context is global and is needed for both loading and rendering. You need an Arc to guarantee the correct Drop mechanism.

I'm not sure how your architecture looks, but you might not even need to lock things. I find that using mpsc channels allows me to avoid around 60% of locking. Essentially, you have some sort of main loop, then you spawn a thread, load whatever you need there, and then send it to the main thread over mpsc. The main thread handles it on the next iteration of the main loop.


C being barebones does not mean it is faster. Because it has such weak typing and gives a huge amount of programmer freedom, compilers have to do a lot of work to be able to understand a C program well enough to optimise it.

Rust, on the other hand, requires the programmer to give the compiler more information about what they're doing.

A very simple example:

  void foobar(struct foo *f) {
    f->a += 2;
    foo();
    f->a += 2;
    bar();
    f->a += 2;
  }
Because C pointer types are so barebones, the compiler can't tell whether foo() and bar() can modify f->a just from looking at the above code. So it will always have to load and store that for each += operation.

Rust on the other hand has two kinds of reference, rather than pointers:

  fn foobar(f: &mut foo) {
    f.a += 2;
    foo();
    f.a += 2;
    bar();
    f.a += 2;
  }
This is more high-level. But it's good for performance! Rust has a rule that you can only have one mutable reference to a struct at one time. Therefore, foo() and bar() can't be modifying f.a and it can simplify this to `f.a += 6;`.

(You can see it in action for yourself here: https://godbolt.org/z/hWs67P. Sadly, Rust doesn't do this by default due to problems with LLVM, but eventually it can.)

Lately on another Rust thread somebody pointed out that C programs use a lot of indirection like pointers and vtable dispatches which actually detracts from the supposed mega-speed of the low-level C. I found that to be mind-blowing and felt stupid for not remembering that earlier.

I don't think vtables are often used in C.

I think they meant function pointers in general (callbacks and the like, see qsort). Also, a lot of C codebases end up implementing manual vtables of some sort where polymorphism is required.

It's true they aren't that common, but have you looked at the Linux kernel code? It's written in C, and vtables are everywhere.

I've also used them on occasions. They do a particular job people call pluggable implementations / dynamic typing / sub-classing depending on their background. If there isn't some central place that knows all the types (if there is you can use switch/case/match), there isn't much choice, it's vtables or nothing. But needing to do that job is rare, so seeing it in languages that don't use vtables promiscuously to implement OO is correspondingly rare.

Which is as it should be, because vtables are more expensive than a direct call, very expensive if they prevent inlining. One of rust's triumphs is it provides them, but you have to go out of your way to use them so it encourages avoiding their overheads.

Dynamic dispatch (vtables are one way to do it, and so is a function pointer) is incredibly common in C, because there's no language-built-in way to perform monomorphization.

> Sadly, Rust doesn't do this by default due to problems with LLVM, but eventually it can.

Can it, though? I keep hearing/reading FUD around both unsafe and Pin making it unlikely to ever be able to ubiquitously enable the noalias stuff.

You have correctly identified it as FUD. People have a bone to pick with Pin, so they irrationally latch on to it, but the general problem is that the &mut invariants cannot currently permit any self-referential data, which is a useful concept in general (for intrusive data structures, etc.) and whose lack people have been hacking around since before 1.0, with crates like rental and owning_ref. The plan to fix this is to make self-referentiality a first-class concept in the language, as a principled exception to the usual &mut uniqueness invariant that preserves memory safety while properly encoding the aliasing guarantees.

> The plan to fix this is to make self-referentiality a first-class concept in the language, as a principled exception to the usual &mut uniqueness invariant that preserves memory safety while properly encoding the aliasing guarantees.

Are there issues I can subscribe to or RFCs for this?

Source? I would like this to be true, but I haven't actually seen it anywhere.

https://news.ycombinator.com/item?id=26410487 (it's a significant part, though not the entirety, of this comment)

Code affected by unsafe already has to go through `UnsafeCell` which disables most aliasing optimizations, and the stacked borrow semantics have explicit opt-outs for such cases. I don't believe there are any Rust semantic issues standing in the way of exploiting aliasing information in this way, just LLVM bugs (and the fact that its restrict model isn't currently equipped to handle the information at such a fine granularity).

There is undefined behavior in Rust affecting real-world code, including Tokio's scheduler, and code produced by async fn definitions. UnsafeCell doesn't solve the problem. There's more information at https://gist.github.com/Darksonn/1567538f56af1a8038ecc3c664a....

Bug report at https://github.com/rust-lang/rust/issues/63818.

Reddit threads at (older) https://www.reddit.com/r/rust/comments/l4roqk/a_fix_for_the_... and (newer) https://www.reddit.com/r/rust/comments/lxw6cl/update_to_llvm....

Somewhat related HN thread at https://news.ycombinator.com/item?id=26406989.

This is the `Pin<&mut>` example which I'm not that familiar with but was aware of. I think it's highly unlikely that this one, relatively niche use case is going to prevent Rust from ever being able to safely turn on aliasing optimizations. There have been several solutions proposed, e.g. adding a stricter UnsafeCell to the language; they may technically not be backwards-compatible, but given how `Pin` is used and the fact that this is a soundness issue, I think it should be fine.

The HN thread is mostly unrelated. I agree that it would have been better to integrate `Pin` directly into the language, though, but mostly for ergonomic reasons; it could still probably happen in an edition upgrade.

The github discussion thread where that originates from also contains the language designers optimistically discussing how, in the worst case, `Pin` just has to be hardcoded to exclude these optimizations. That wouldn't make it the first type in the std lib that works similarly (see interior mutability and UnsafeCell), so I personally interpret that as "unlikely to be an issue".

That's what restrict is for:

    void foobar(struct foo *restrict f) { ... }

And just like that we're back to

> There's a significant difference between what these languages can achieve in theory, and how they're used in practice.

Theoretically in C you can use restrict for this. Practically nobody does (Rust users keep finding bugs and miscompilations in LLVM's noalias support), and it's a huge footgun because you're completely on your own and all bets are off if you misuse it.

Meanwhile in rust land, it's the default and the compiler checks your homework, and while you can always lie to the compiler it's much rarer that you'd even get in a position to do so.

But restrict is very hard to reason about correctly, and the consequences for making a mistake are potentially catastrophic.

There's a very real difference between what's possible in theory and what humans do in practice. I wish Hacker News people engaged more with the practice.

Yes, C99 added that keyword, and you can use it in C code. (C++ is a more complicated matter, I believe…)

But I think it's pretty uncommon in practice. One reason is that the C compiler has no borrow checker to help you notice when you're using `restrict` unsafely.

(Also, there's the whole “strict aliasing” hell…)

It's not that uncommon in HPC codes. I.e. one of the places where speed matters.

If you can have "one mutable" and "more than one non-mutable" reference (not sure if this is the case), you still cannot safely do that optimization, as foo() and bar() could in principle be reading f.a and the optimized version would then not have the correct values of f.a when foo() and bar() are called.

Note, I genuinely don't know if this caveat applies to Rust. But, in general, not only mutable references need to be considered.

> My overall feeling is that if I could spend infinite time and effort, my C programs would be as fast or faster than Rust, because theoretically there's nothing C can't do that Rust can.

The exact same argument applies to assembly code. There are very good reasons that it's not used nowadays except in incredibly rare circumstances or in the embedded world.

It doesn't matter in the slightest how fast your language is in theory. Not one iota.

The only thing that matters is how fast the programs you write with it are in practice. The evidence is clear: it is significantly easier to write faster programs in Rust than it is in C, and this applies to even the most skilled developers.

The very short version:

There are many things which cause both frequent and rare strange crashes in multi-threaded code.

In C, all these things will compile OK. In safe Rust, practically none of them will compile.

It is much easier to fix compile issues in Rust to do with dangerous memory usage or thread sharing than it is to debug a compiled program.

Accessing files or database content from multiple threads without proper locking, or transactions, in place will compile just fine.

Rust won’t prevent all manner of bugs, but if you need to prevent multiple parts of your application from accessing a resource concurrently, it’s fairly trivial to design types that would guarantee that at compile time.

If you need to do it across processes, then you need a file system that supports exclusive file opens.

The db question is a red-herring, because to properly solve for concurrency in the db, you need to use locks in the db (table and row locks), not the code accessing the db.

You don't necessarily need to use locks in the db. You can use transactions as well, unless you want to make major changes to the DB, where transactions would be in a constant conflict state.

You’re right, though that wasn’t really the point of my reply. More that concurrency in the DB is a DB design choice and not the responsibility of the language accessing the DB. This is especially true when accessing the DB from multiple nodes, which most deployed software does.

Locks and transactions are intertwined. Write locks are typically released only when the transaction ends, and read locks too at some transaction isolation levels.

> Rust won’t prevent all manner of bugs

Nit: rust will prevent all manner of bugs. Rust won't prevent every manner of bugs.

Indeed, but the fearless concurrency sales pitch tends to overlook that.

It’s hard to see how you could be arguing in good faith given that your point of “fearless concurrency” being an overstated “sales pitch” has now been answered with multiple substantive answers.

Please stop moving the goal posts from “program concurrency” to “distributed or multi-process transactions.” It subtracts from the conversation.

Rust is honest about what it does and doesn’t do.

Rust is honest, https://doc.rust-lang.org/nomicon/races.html

I am not moving goal posts, rather pointing out issues; many apparently lack the knowledge to understand how many variants of data races exist in an application.

Quite understandable, given that the majority of developers keep writing single-threaded applications.

The fearless concurrency sales pitch doesn't overpromise. You're trying to stretch it, but it's pretty clear on what kind of concurrency issues it covers (eg. data races).

Yeah, it would be a grave misinterpretation of "fearless concurrency" to think Rust somehow validates that access to some shared resource with its own semantics is also safe. I'm not educated on the subject, but that problem seems pretty intractable for a language to solve in a general sense.

Indeed, yet that is not how many Rust advocacy blog posts sell it.

Such as?

Any blog post that gives examples accessing shared variables and never goes beyond that, leaving the reader to assume it works the same way regardless of what resources are being accessed.


Now rewrite the same blog post using SQL queries to update the same counter variable on a table row.

The Rustonomicon is quite clear that it wouldn't work,

> However Rust does not prevent general race conditions.


You really had to dig up a post that has less text than code?

> leaving to the reader that it works the same way regardless of what resources are being accessed.

If that's how readers envision that non-language-managed resources work, they should go back to CS102 right now.

And concerning the nomicon, it explicitly states that Rust can't handle all race conditions, so I don't get your point.

What a silly point to make. The code you’re writing in rust could launch a nuke that destroys the world and it will also compile just fine.

Rust has the expressive power to design an API that explicitly makes it impossible if you so desire though. For instance by using an API similar to mutexes where you have to lock to access the contents.

> However Rust does not prevent general race conditions.


I never claimed otherwise. If you want to enforce invariants in your DB API you'll have to implement them yourselves, Rust won't do it for you because Rust doesn't know what a database is. Still, it's perfectly possible to design a DB API is such a way that it would prevent some issues, such as for instance doing changes without an active transaction.

But you're right to point out that Rust won't magically solve all your possible sources of crash and data corruption, it just does make it a lot easier to enforce arbitrary constraints. For instance if you decide that in order to avoid these issues you want all your DB stuff to only run in a single thread you could design your API in a way that would prevent DB handles from being shared between threads, all that enforced by the language. You couldn't accidentally leak a DB handle to a different thread.

> Accessing […] database content from multiple threads without proper locking, or transactions, in place will compile just fine.

All the rust db interfaces I've seen so far forbid unlocked sharing of connections, statically (the connections are !Sync, in Rust speak).

They don't enforce the use of transactions, nor exclusive access to database tables.

show me a c program that does this in a manner that a rust program can't emulate

C doesn't advertise fearless concurrency.

> Code converted from C to Rust seems much more voluminous.

This is a fairly odd claim.

If you have to deal with strings (ASCII and UTF-8) properly, C is stupidly verbose.

If you need a data structure more complex than an array of something, C is stupidly verbose.

If you want to deal with pattern matching/regexen, C is ridiculously verbose.

Do I agree that Rust is far more verbose for an embedded "blinky" (the embedded equivalent of "Hello, World!")? Yes.

But once I start doing things like processing messages in a communications stack (BLE, CANOpen, Ethernet, etc.), Rust starts looking better and better.

Blink in rust is concise as well:

  // Blink the on-board LED (avr-hal's arduino_uno crate;
  // the crate root also needs #![no_std] and #![no_main])
  #[arduino_uno::entry]
  fn main() -> ! {
    let peripherals = arduino_uno::Peripherals::take().unwrap();

    let mut pins = arduino_uno::Pins::new(
      peripherals.PORTB,
      peripherals.PORTC,
      peripherals.PORTD,
    );

    // Pin D13 is connected to L led
    let mut led = pins.d13.into_output(&mut pins.ddr);

    loop {
      led.toggle().void_unwrap();
      arduino_uno::delay_ms(500);
    }
  }
When I converted code from C to Rust, I was left with much more code.

Are you just noting specifics, or was it your experience that there was less Rust required for a full C to Rust conversion?

AFAIK it is a common experience that both C and C++ code tend to become fewer LOC in a move to Rust, a lot of it due to stuff like serde that greatly reduces boilerplate. It probably depends on your application and program size, though. I'm sure that if you're doing a lot of pointer wrangling or something it won't be smaller in Rust, while if your C program was just a bunch of SIMD intrinsics anyway like a lot of high performance code is now it'll probably be basically the same length.

Did you ever take a look to try and figure out where the extra verboseness came from? For instance, was it spread equally across all functions or concentrated in those doing numeric computations (or whatever)? It would be interesting to learn something from this example.

Did you convert it by hand and translate the code into idiomatic rust? Or just run some C code through c2rust? Because the latter will not produce anything like a typical rust program, it will be C types and logic with minimal changes to make it compile in rust (and those changes will make it look very verbose).

for one thing, naming a C struct requires more words than naming a Rust struct; for another, the names of functions that operate on them get longer too

I mostly agree with your post but:

> If you need a data structure more complex than an array of something, C is stupidly verbose.

What’s stupidly verbose about Structs to you?

"Data structure" here probably means something like a linked list, binary tree, or hash table.

Linked list, binary tree, and heck, even a hash table aren’t “stupidly verbose” to me in C either, hence my question to GP.

You have to allocate and deallocate--where and when does that occur? When the data structure needs to grow, all pointers need to be invalidated--where and when does that occur? How do you iterate across the data structure? Do those iterators hold a local pointer or do they hold a root pointer plus an accessor function pointer plus memoization? What happens when you want to copy a subset of the data structure? I can go on and on.

Every one of those things requires a fairly primitive function call that sometimes doesn't even need to exist in another language. For contrast, think about all the macrology that the Linux kernel does to iterate across every element of a linked list that is effectively "for element in list {}" in any higher language.

And, even worse, most of that stuff in C needs to be runtime-only, while some of this kind of thing gets elided by the compiler in other languages.

If you don't consider all that "stupidly verbose" I'm really curious what you would?

And this is long before we start talking about concurrency.

C allows you to write something that works "just enough" that you put it into production and then it bites you in the ass (similar to quadratic algorithms--functional enough to get to production and then bite you).

The author did say that there is nothing that Rust does that C cannot. The difference is that in Rust, those things are easier, or many times, the default way, while in C, you would have to take care of way too many things to make sure things work.

I used to say that wrt to memory leaks with Java and C++ and it remains true when comparing C and Rust:

It's not that it's easier to write programs in [Java|Rust]. It's that it makes it much harder to write the bugs.

Yesterday I upgraded an entire code-base from C to C++ just because it was faster than writing my own dynamically resizing array in C.

Writing programs in C is harder just because there are essentially no containers in the stdlib.

Aren't there good container libraries for C?

There are a lot of great libraries for sure, but they aren't in the stdlib and C doesn't make it as easy to use external libraries as languages with modern tooling. Everybody gets grumpy about dependencies and a lot of people probably figure it's easier to maintain their own container code in their application than to deal with that.

They are either based on void * with performance issues, or macro based with weird ergonomics that look like function calls, but aren't.

It's doable, but not very easy.

You are right in a library-demographical sense, but not in a fundamental sense. There is a 3rd way. Have a look at the CTL I linked to (downvoted..maybe I should have explained more?).

Once you give up the closed source/prebuilt binary library idea and embrace the C++-like header library idea and write implementations in terms of "assumed macro/inline function" definitions, the problem becomes straightforward with no performance issue and different ergonomics issues than you probably think.

It's a more "manual instantiation" than C++ templates or generics in other languages where just referring to them works, but most of C is quite manual. So, it fits the headspace & the hard parts of data structures/meddlesome hands remain factored out. Since you parameterize your files/code with #define/#include, you have to name your parameters, which can make the instantiating client code more obvious than C++ templates with many arguments. OTOH, there is no/poor type checking of these parameters.

I had a look, and it feels like template programming but with even worse guarantees.

Having a type declaration dependent on #define P whether it is plain old data or not, and needing to know what that means, is not the kind of ergonomics I'd want. That requires learning a whole new paradigm to ensure I am not doing wrong things.

In my mind it is so big an extension of the C language, that it leaves the C headspace and becomes its own headspace.

Yeah. It's not for everyone. I think "different ergonomic issues" may cover that and I did mention the type checking already. :-)

It is a smaller learning curve from pure C than "all of Rust" or even "all of C++/STL". You got the basic idea in short order (that may be for ill as well as good..I was never trying to make a normative claim).


But it’s not just taking care of those things that must be done; it seems to bloat the code.

I think it’s a great alternative to C for large apps like Firefox, and perhaps it’s a good alternative to Go for services.

For general purpose, I personally want something fast that’s easy, clear, and concise like Ruby.

Do I just accept that Rust is the most evolved version of C, or is my gut correct that it’s bloated? Is it a good choice for general purpose as-is?

As someone who's worked predominantly in high level languages (Scala, Ruby, etc...) I've found Rust to be relatively straightforward to use for simple CLI tools or gRPC servers (tonic library is pretty nice). I haven't tried building a CRUD app yet, but I don't see any real reason why it would be impossible to have an ergonomic web framework in Rust.

The things I find most difficult: 1. Wrangling with the borrow checker can be painful before you know what you're doing (and even afterwards) but if you understand the standard library/patterns well, it seems to minimize the cost. Example being trying to write your own `get_or_else_insert` style method for a HashMap. Writing your own version is easy in other languages but hard in Rust. If you didn't know that methods like that already exist on HashMap you will experience a lot of pain until you understand the "right" way to do something.

2. Shared memory concurrency is definitely at the nexus of all the more difficult parts of Rust. Especially with async/await. There's no question in my mind that if you want to write a webserver that has async functions accessing shared memory, that you will for sure need to fill in any gaps in your knowledge as it will be difficult to get to a working program without understanding significantly more concepts than what it might take for a simple single-threaded CLI app

I'm pretty sure that it will be possible (if it isn't already) to get the Rust ecosystem to a state where writing a CRUD app is about as simple as in Go (and considerably easier/more ergonomic w.r.t certain things like JSON serialization).

Firefox's compiled code is mostly written in C++, not C. You conflate C with C++, Java, and C#. C++, while it has source compatibility with C, tends to end up much different from C. C will not give you the OOP hell-scape you can dig yourself into with those three languages. C code tends to be simpler and much closer to the assembly that will be generated than you would get in those languages. Moreover, Java and C# are not even compiled ahead of time to native code. Anyway, C != C++. C++ has changed quite a bit since its earlier days and has diverged from plain C in a lot of ways. Some people even feel C++ keeps adding too many new features too fast.

> Some people even feel C++ keeps adding too many new features too fast.

It's not so much that they keep adding features, but that they (almost?) never remove any.

Python 3 has shown the world what happens when that is done without bringing the ecosystem along.

Haskell has also removed some features over time.

Haskell or GHC?

If you mean GHC, the language version is whatever the pile of configuration flags at each source file ends up meaning, some of them even contradict themselves.

Great for language research, which is Haskell main purpose in life, hardly a good idea for getting industry love.

I am talking about Haskell, specifically the removal of 'n+k patterns' and 'monad comprehensions'.

About GHC: I think their approach with pragmas is great and something for other languages to emulate. It's also great in production, with the caveat that you might want to restrict that mechanism to surface level changes only, and nothing that changes the intermediate format.

Are there many other programming languages that remove features?

Python 3 made some backwards incompatible changes. Haskell removed some features over time, as well.

> C++ while has source compatibility with C

It doesn't really, some things from C won't compile in C++, but that's a minor nitpick

I guess people mostly take source compatibility to mean that you can write the headers for your C library so that it can be used from C++. That's not the same thing as C being a proper subset of C++ or whatever, but it's still a vast enough advantage of C++ over most competitors that it might as well be.

Ruby is not fast.

But if you want something between rust and ruby, checkout crystal (garbage collected, strongly typed with inference, llvm binary, ruby-like syntax, fast like golang).

> Code converted from C to Rust seems much more voluminous. Perhaps it’s easier to maintain if you know Rust well?

My experience is mostly the opposite. For many tasks Rust can be much more concise (see anything with string handling, for example).

> Code converted from C to Rust seems much more voluminous.

I haven't seen that in practice. A good point of reference are implementations of things like ruby, python, or the erlang vm in rust compared to the C alternatives. This might be because Rust is also more expressive (probably by borrowing certain syntax/semantics from ocaml/haskell), though the borrow checker does add back some verbosity.

But Rust works badly with mmapped (memory-mapped) files, as the article notes. So in C you could load (and save!) stuff almost instantly, whereas in Rust you still have to de-serialize the input stream.

No you don't. I've written multiple programs that load things instantly off the file system via memory maps. See the fst crate[1], for example, which is designed to work with memory maps. imdb-rename[2] is a program I wrote that builds a simple IR index on your file system that can then instantly search it by virtue of memory maps.

Rust "works badly with memory mapped files" doesn't mean, "Rust can't use memory mapped files." It means, "it is difficult to reconcile Rust's safety story with memory maps." ripgrep for example uses memory maps because they are faster sometimes, and its safety contract[3] is a bit strained. But it works.

[1] - https://github.com/BurntSushi/fst/

[2] - https://github.com/BurntSushi/imdb-rename

[3] - https://docs.rs/grep-searcher/0.1.7/grep_searcher/struct.Mma...

I didn't read your code but one problem I suspect you ran into is that you had to re-invent your container data structures to make them work in a mmapped context.

No, I didn't. An fst is a compressed data structure, which means you use it in its compressed form without decompressing it first. If you ported the fst crate to C, it would use the same technique.

And in C, you have to design your data structures to be mmap friendly anyway. Same deal in Rust.

But this is moving the goal posts. This thread started with "you can't do this." But you can. And I have. Multiple times. And I showed you how.

> which means you use it in its compressed form without decompressing it first.

So your code operates directly on a block of raw bytes? I can see how that can work with mmap without much problems.

My argument was more about structured data (created using the type system), which is a level higher than raw bytes.

> So your code operates directly on a block of raw bytes? I can see how that can work with mmap without much problems.

Correct. It's a finite state machine. The docs of the crate give links to papers if you want to drill down.

> My argument was more about structured data (created using the type system), which is a level higher than raw bytes.

Yes. You should be able to do in Rust whatever you would do in C. You can tag your types with `repr(C)` to get a consistent memory layout equivalent to whatever C does. But when you memory map stuff like this, you need to take at least all the same precautions as you would in C. That is, you need to build your data structures to be mmap friendly. The most obvious thing that is problematic for mmap structures like this that is otherwise easy to do is pointer indirection.

With that said, this technique is not common in Rust because it requires `unsafe` to do it. And when you use `unsafe`, you want to be sure that it's justified.

This is all really besides the point. You'd have the same problems if you read a file into heap memory. The main problem in Rust land with memory maps is that they don't fit into Rust's safety story in an obvious way. But this in and of itself doesn't make them inaccessible to you. It just makes it harder to reason about safety.

Dang, burntsushi up in the house! Hey, just wanted to say I enjoy your work––I've learned a lot from it. Thank you!

It's very tedious to debate with someone who explicitly makes assumptions about something (like code) without having read it, and puts the burden of refuting those assumptions on you...

It doesn’t say it “works badly” it says the borrow checker can’t protect against external modifications to the file while memory-mapped, which has a host of issues in C as well.

You can mmap files in Rust just fine, but it’s generally as dangerous as it is in C.

I don’t get this obsession with “dangerous.” Honestly, what does that even mean? I think a better word is “error-prone.” Danger is more like, “oh my god a crocodile!”

> Honestly, what does that even mean?

It has a very specific meaning in Rust: the user can cause memory unsafety if they make a mistake.

> I think a better word is “error-prone.”

The issue with the connotation there is that it's not about the rate of problems, it's about them going from "impossible" to "possible."

There can be real danger when the code is used in certain applications. For example when controlling the gate of the crocodile cage in a zoo.

Concurrency bugs can absolutely cause dangerous danger of the deadly variety:


Unfortunately, as is most always the case of negligence instead of some particular language features:

“A commission attributed the primary cause to general poor software design and development practices rather than single-out specific coding errors. In particular, the software was designed so that it was realistically impossible to test it in a clean automated way.“

Ergo, concurrency doesn’t kill people, people do.

You sound like you make a refutation, but you really don't. This whole discussion is about giving tools to developers that are systematically less error-prone, which your quote suggests would have been helpful to that specific development team.

the main problem here is that C has the capability to declare mmap regions correctly: `volatile char[]` and Rust does not (`[Cell<u8>]` is close but not exactly right, and annoying)

most rust folks who use mmap don't mark the region as Celled, which means they risk UB in the form of incorrect behavior because the compiler assumes that the memory region is untouchable outside the single Rust program, and that's not true

(it's also not true generally b/c /dev/mem and /proc/pid/mem exist, but it's beyond Rust's scope that the OS allows intrusion like that)

Errors are up to interpretation. It just means the thing didn't happen as requested. Errors are meant to be expected or not expected depending on the context.

Dangerous means dangerous. It's not up for interpretation.

Languages have multiple, very different words, for exactly this reason.

Agreed. But still, folks make it sound bad. For instance “danger” in the many context could also be reframed as “powerful”, could it not?

But that may be of little solace. If you snapshot your entire heap into an mmapped file for fast I/O, then basically the entire advantage of Rust is gone.

Is there literally no other code in the application?

Rust has plenty of situations where you do unsafe things but wrap that in safe APIs. If you’re returning regions of that mmapped file, for example, a lifetime can be associated to those references to ensure that those are valid for the duration of the file being mmapped in the program.

It can be used to ensure that if you need to write back to that mmapped file (inside the same program) that there are no existing references to it, because those would be invalid after an update to the file. You need to do the same in C, but there are no guardrails you can build in C to make that same assurance.

> If you snapshot your entire heap into an mmapped file for fast I/O,

I've never heard of this trick. And my first reaction is "That would be a nightmare of memory unsafety if I did it in C++"

What's it used for? IPC?

I think emacs (used to?) do something awful like this. https://lwn.net/Articles/707615/

I'd call mmaping data structures into memory an advanced systems programming trick which can result in a nice performance boost but which also has some severe drawbacks (portability across big/little endian architectures and internal pointers being two examples).

I know some very skilled C++ and Rust developers who can pull it off. If you're at that skill level, Rust is not going to get in your way because you're just going to use unsafe and throw some sanitizers and fuzzers at it. I wouldn't trust myself to implement it.

You have to combine it with other techniques, e.g. journaling to make it safe, but this is not always necessary (e.g. when using large read-only data-structures)

In C you can access pointers to memory mapped files effortlessly in ways that are often extremely unsafe against the possible existence of other writers and against the making being unmapped and mapped elsewhere. It’s also traditional to pretend that putting types like int in a mapped file is reasonable, whereas one ought to actually store bytes and convert as needed. Rust at least requires a degree of honesty.

is it something deeply ingrained to rust? or is it something rust is working on?

It's more like, Rust wants to make guarantees that just aren't possible for a block of memory that represents a world-writable file that any part of your process, or any other process in the OS, might decide to change on a whim.

In other words, mmaped files are hard, and Rust points this out. C just provides you with the footgun.

The problem is that compilers are allowed to make some general assumption about how they're allowed to reorder code, always based on the assumption that no other process is modifying the memory. For example, the optimizer may remove redundant reads. That's a problem if the read isn't really redundant -- if the pointer isn't targeting process-owned memory, but a memory mapped file that's modified by someone else. Programs might crash in very "interesting" ways depending on optimization flags.

C has this issue as well, but Rust's compiler/borrow checker is particularly strong at this kind of analysis, so it's potentially bitten even harder.

While it works great for some cases, one should not forget it doesn't cover external resources, specially those shared across processes.

You have made this claim multiple times. Why do you see this as a language issue and not an OS issue? It becomes an even bigger problem when we talk about distributed systems and distributed resources. Is there a language that handles this?

These issues about multiple processes and distributed systems are framework and OS level concerns. Rust helps you build fast concurrent solutions to those problems, but you’re correct that it can not solve problems exterior to the application runtime. How is that a deficiency with Rust?

Fearless concurrency sales pitch.

Yes languages like Erlang and runtimes like Coyote and Orleans.

Erlang has a great concurrency model with higher overhead than Rust, but similar cross thread safety, doesn’t do anything about exterior resources to the application.

I’ve not worked with Coyote, but if it is the system for .net, it describes itself as a framework, “Coyote provides developers a programming framework for confidently building reliable asynchronous software on the .NET platform”.

Orleans similarly describes itself as a framework, “Orleans is a cross-platform software framework for building scalable and robust distributed interactive applications based on the .NET Framework.”

Rust is a language, similar frameworks are being built with it, the point your making does not appear to be about the language.

If I understand correctly, the Erlang point was, that you can have a distributed system by using Beam to scale to multiple machines and have them communicate via message passing, which is all possible and encouraged, because of how you structure and write code in Erlang, as actors with mailboxes, isolating actors from each other, except for the messages, that are passed.

You say, that the Erlang concurrency model has higher overhead than Rust. In Rust there are probably multiple projects going on right now (one of them is Bastion, but I guess there are probably others), which try to provide Erlang like concurrency. What do you mean by overhead of a concurrency model (that of Erlang) being higher than the overhead a programming language (Rust)? As far as I know Erlang's lightweight processes are about as lightweight as you can get. Is there a Rust framework for Erlang like concurrency, which reduces the footprint of lightweight processes even more?

That wasn’t meant to be a snide comment about Erlang in any way. All I meant by the comment about higher-overhead was that the language itself generally has more costs to run, i.e. runtime, memory usage, garbage collector, interpreted, etc, than Rust.

The “process” model of Erlang is about as lightweight as you can get, agreed.

In terms of capabilities of beam across systems, point taken. Though we start stretching some of the understanding of where languages end and runtimes begin... Rust and C make those boundaries a little more clear.

How could an OS adapt its processes functionality to help Rust here?

Capabilities would probably be helpful :V

Without real world data "fearlessly parallelizing all the things!" is an awful idea due to all the overhead involved.

The most important design decision while writing a parallel algorithm is to decide for what amount of data is not worth it.

He tried with few effort and noticed that for his use case the code is faster, I fail to understand this rebuttal of the parent's comment

The average cellphone today has more than 4 cores. A decent desktop can deal with 16 threads on 8 cores.

There is a lot of untapped parallelism readily available waiting for the right code.

It's not about number of available threads, the very act of scheduling tasks across multiple threads has scheduling and communication overheads, and in many situations actually ends up being slower than running it on the same thread.

That said, I think the original comment was rightly pointing out how easy it was to make the change and test it, which in this case did turn out to be noticeably faster.

Parallelization is the nuclear energy of comp science. Loads of potential, high risk reward and they would have gotten away with it, if it were not for those meddling humans. Its non-trivial and can only be handled by accomplished engineers. Thus it is not used - or is encapsulated, out of sight, out of reach of meddling hands. (CPU Microcode shovelling non-connected work to pipelines comes to mind / NN-Net training frameworks, etc.)

> [...] or is encapsulated, out of sight, out of reach of meddling hands.

That's the real issue here! Most language have poor abstractions for parallelism and concurrency. (Many languages don't even differentiate between the two.)

Encapsulating and abstracting is how we make things usable.

Eg letting people roll hash tables by themselves every time they want to use one, would lead to people shooting themselves in the foot more often than not. Compared to that, Python's dicts are dead simple to use. Exactly because they move all the fiddly bits out of the reach of meddling hands.

> Exactly because they move all the fiddly bits out of the reach of meddling hands.

You don't even need to hide those out of reach. You just need to make it dead simple to use and pretty much nobody will want to touch the fiddly bits.

And those who do know they had it coming.

Oh, definitely. You just need to make it so that people don't have to fiddle with it and don't fiddle with it by accident.

You seldom have to protect against malicious attempts. (But when you do, it's a much harder task.)

As a general thought about parallelizing all the things it's true though. When looking for speedups, parallelization granularity has to be tuned and iterated with benchmarking, else your speedups will be poor or negative.

I think the example case in this subthread was about making some long app operations asynchronous and overlapping, which is a more forgiving use case than trying to make a piece of code faster by utilizing multiple cores.

Also Rust is risky to parallelize: you can get deadlocks.

I don't get the obsession of parallel code in low level languages by the way. If you have an architecture where you can afford real parallelism you can afford higher level languages anyway.

In embedded applications you don't usually have the possibility to have parallel code, and even in low level software (for example the classical UNIX utilities), for simplicity and solidity using a single thread is really fine.

Threads also are not really as portable as they seem, different operating systems have different way to manage threads, or even don't supports thread at all.

This is a bad take. ripgrep, to my knowledge, cannot be written in a higher level language without becoming a lot slower.[1] And yet, if I removed its use of parallelism by default, there will be a significantly degraded user experience by virtue of it being a lot slower.

This isn't an "obsession." It's engineering.

[1] - I make this claim loosely. Absence of evidence isn't evidence of absence and all that. But if I saw ripgrep implemented in, say, Python and it matched speed in the majority of cases, I would learn something.

Python isn't really something I would even think as possible example, Common Lisp, D, Nim, Swift, most likely.

So? I said, "higher level language." I didn't say, "Python specifically."

I would guess D could do it.

I don't know enough about Nim or Swift.

I would learn something if Common Lisp did it. I'd also learn something if Haskell or Go did it.

I am not trying to contradict anyone here, but any language mature enough to have an impl/way to not have arbitrary performance ceilings needs access to inline assembly/SIMD. Cython/Nim/SBCL can all do that..probably Haskell..Not so sure about Go or Swift. Anyway, many languages can respond well to optimization effort. I doubt anyone disagrees.

At the point of realizing the above no ceiling bit, the argument devolves to more one about (fairly subjective) high/low levelness of the code itself/the effort applied to optimizing, not about the language the code is written in. So, it's not very informative and tends to go nowhere (EDIT: especially when the focus is on a single, highly optimized tool like `rg` as opposed to "broad demographic traits" of pools of developers, and "levelness" is often somewhat subjective, too).

You're missing the context I think. Look at what I was responding to in my initial message in this thread:

> If you have an architecture where you can afford real parallelism you can afford higher level languages anyway.

My response is, "no you can't, and here's an example."

> but any language mature enough to have an impl/way to not have arbitrary performance ceilings needs access to inline assembly/SIMD

If you ported ripgrep to Python and the vast majority of it was in C or Assembly, then I would say, "that's consistent with my claim: your port isn't in Python."

My claim is likely more subtle than you might imagine. ripgrep has many performance sensitive areas. It isn't enough to, say, implement the regex engine in C and write some glue code around that. It won't be good enough. (Or at least, that's my claim. If I'm proven wrong, then as I said, I'd learn something.)

> At the point of realizing the above no ceiling bit, the argument devolves to more one about (fairly subjective) high/low levelness of the code itself/the effort applied to optimizing, not about the language the code is written in. So, it's not very informative and tends to go nowhere.

I agree that it's pretty subjective and wishy washy. But when someone goes around talking nonsense like "if parallelism is a benefit then you're fine with a higher level language," you kind of have to work with what you got. A good counter example to that nonsense is to show a program that is written is a "lower" level language that simultaneously benefits from parallelism and wouldn't be appropriate to do in a higher level language. I happen to have one of those in my back-pocket. :-) (xsv is another example. Compare it with csvkit, even though csvkit's CSV parser is written in C, it's still dog slow, because the code around the CSV parser matters.)

Ok. "Afford parallelism => afford high level" with the implication of HL=slow does sound pretty off base. So, fair enough.

FWIW, as per your subtle claim, it all seems pretty hot spot optimizable to me, at least if you include the memchr/utf8-regex engine in "hot spot". I do think the entire framing has much measurement vagueness ("hot", "vast majority", "levelness", and others) & is unlikely to be helpful, as explained. In terms of "evidence", I do not know of a competitor who has put the care into such a tool to even try to measure, though. { And I love rg. Many thanks and no offense at all was intended! }

ack might be an example. It's Perl, not Python, and its author is on record as saying that performance isn't his goal. So it's a bit of a strained one. But yes, it's true, I don't know any other serious grep clone in a language like Python. This is why I hedged everything initially by saying that I know that absence of evidence isn't evidence of absence. :-) And in particular, I framed this as, "I would learn something," rather than, "this is objective fact." So long as my standard is my own experience, the hand wavy aspect of this works a bit better IMO.

> I do not know of a competitor who has put the care into such a tool to even try to measure, though.

Right. Like for example, I am certain enough about my claim that I would never even attempt to do it in the first place. I would guess that others think the same. With that said, people have written grep's in Python and the like, and last time I checked, they were very slow. But yeah, the "development effort" angle of this likely makes such tools inappropriate for a serious comparison to support my claim. But then again, if I'm right, the development effort required to make a Python grep be as fast as ripgrep is insurmountable.

> it all seems pretty hot spot optimizable to me

As long as we're okay with being hand wavy, then I would say that it's unlikely. Many of the optimizations in ripgrep have to do with amortizing allocation, and that kind of optimization is just nearly completely absent in a language like Python unless you drop down into C. This amortization principle is pervasive and applies as deep as regex internals to the code the simply prints ripgrep's output (which is in and of itself a complex beast and quite performance sensitive in workloads with lots of matches), and oodles of stuff inbetween.

> { And I love rg. Many thanks and no offense at all was intended! }

:-) No offense taken. This is by far the best convo I'm having in this HN post. Lol.

Note that I've made similar claims before. In the last one, there is a lot more data: https://news.ycombinator.com/item?id=17943509

When I used to write in Cython + NumPy I would pre-allocate numpy arrays written into by Cython. It's C-like, but because of the gradual typing I think firmly in the higher level (for some value of "er"). One can certainly do that stuff in Nim/SBCL/etc. (and one sees it done).

While allocation is pretty pervasive, I'm skeptical that everywhere or even most places you do it is an important perf bottleneck. Without a count of these 20 times it matters and these 40 it doesn't, it's just kind guesswork from an all too often frail human memory/attention that "ignores the noise" by its very nature. You might be right. Just trying to add some color. :-)

Another way to think of this is to imagine your own codebase "in reverse". "If I drop this optim, would I see it on that profile?" Or look at the biggest piles of code in your repo and ask "Is this in the critical path/really perf necessary?" and the like. Under the assumption that higher level things would be a lot shorter that kind of thought experiment can inform. Maybe an approach toward more objectivity, anyway. Little strewn about tidbits in every module don't really count { to me :-) } - that speaks more to abstraction problems.

But I don't think there is a lot of value in all the above gendankenizing. While I realize some bad "just throw money at it" kicked this off, one of my big objections to the entire framing is that I think people and their APIs really "alter the level" of a language. Indeed their experience with the language has big impact there. Every one reading this knows C's `printf(fmt, arg1, arg2,..)`. Yet, I'd bet under 1% have heard of/thought to do an allocating (or preallocated) string builder variant like `str(sub1, sub2, ..., NULL)` or using/acquiring something like glibc's `asprintf`. People will say "C is low level - It has no string concatenation!". Yet, inside of an hour or two most medium-skill devs could write my above variadic string builder or learn about vasprintf. Or learn about Boehm-Wiser for garbage collected C or 100 other things like that CTL I mentioned elsewhere in this thread.

So what "level" is C, the language? Beats me. Does it have concatenation? Well, not spelled "a+b" but maybe spelled not much worse "str(a,b,NULL)". Level all just depends so much on how you use it. Performance is similar. Much C++ (and Rust for that matter) is terribly inefficient because of reputations for being "fast languages" leading to less care (or maybe just being done by junior devs..). These "depends" carry over to almost anything..not just Rust or C, but sometimes even English. I am usually told I write in much too detailed a way and a trimmer way might have higher persuasion/communication performance! { How's that for "meta"? ;-) }

> This is by far the best convo I'm having in this HN post. Lol.

Cool, cool. There can be a lot of "Rust Rage" out there (in both directions, probably). :)

Anyway, I don't think we'll resolve anything objective here, but don't take a lack of response as indicating anything other than that. You aren't making any strong objective claims to really rebut and I'm glad that you personally undertook the challenge to do ripgrep in any language. I do think many might have done..maybe Ada, too, and probably many more, but maybe all at the same "realized levelness". You just did not know them/feel confident about getting peformance in them. Which is fine. A.Ok, even! I guess your other biggy is Go and that might actually not have worked of all the alternatives bandied about by pjmlp and myself so far.

> While allocation is pretty pervasive, I'm skeptical that everywhere or even most places you do it is an important perf bottleneck. Without a count of these 20 times it matters and these 40 it doesn't, it's just kind guesswork from an all too often frail human memory/attention that "ignores the noise" by its very nature. You might be right. Just trying to add some color. :-)

In general I agree. But I'm saying what I'm saying because of all the times I've had to change my code to amortize allocation rather than not do it. It's just pervasive because there are all sorts of little buffers everywhere in different parts of the code. And those were put there because of experimenting that said the program benefited from them.

The key here is that the loops inside of ripgrep can grow quite large pretty quickly. There's the obvious "loop over all files," and then there's "loop over all lines" and then "loop over all matches." ripgrep has to do work in each of those loops and sometimes the work requires allocation. Even allocations at the outermost loop (looping over all files) can cause noticeable degradations in speed for some workloads.

This is why I'm so certain.

The numpy example is a good one where a substantial amount of code has been written to cater to one very specific domain. And in that domain, it's true, you can write programs that are very fast.

> So what "level" is C, the language?

Oh I see, I don't think I realized you wanted to go in this direction. I think I would just say that I absolutely agree that describing languages as "levels" is problematic. There's lots of good counter examples and what not. For example, one could say that Rust is both high level and low level and still be correct.

But like, for example, I would say that "Python is high level" is correct and "Python is low level" is probably not. But they are exceptionally general statements and I'm sure counter-examples exist. They are, after all, inherently relativistic statements, so your baseline matters.

That's kind of why I've stayed in "hand wavy" territory here. If we wanted to come down to Earth, we could, for example, replace "high level languages" in the poster's original statement with something more precise but also more verbose that this discussion still largely fits.

> I am usually told I write in much too detailed a way and a trimmer way might have higher persuasion/communication performance! { How's that for "meta"? ;-) }

Yeah, it's hard to be both pithy and precise. So usually when one is pithy, it's good to take the charitable interpretation of it. But we are technical folks, and chiming in with clarifications is to be expected.

> I don't think we'll resolve anything objective here

Most definitely. At the end of the day, I have a prior about what's possible in certain languages, and if that prior is proven wrong, then invariably, my mental model gets updated. Some priors are stronger than others. :-)

> You aren't making any strong objective claims to really rebut

Right. Or rather, my claims are rooted in my own experience. If we were going to test this, we'd probably want to build a smaller model of ripgrep in Rust, then try implementing that in various languages and see how far we can get. The problem with that is that the model has to be complex enough to model some reasonable real world usage. As you remove features from ripgrep, so to do you remove the need for different kinds of optimizations. For example, if ripgrep didn't have replacements or didn't do anything other than memory map files, then that's two sources of alloc amortization that aren't needed. So ultimately, doing this test would be expensive. And that's ignoring the quibbling folks will ultimately have about whether or not it's fair.

> I guess your other biggy is Go and that might actually not have worked of all the alternatives bandied about by pjmlp and myself so far.

I would guess Go would have a much better shot than Python. But even Go might be tricky. Someone tried to write a source code line counter in Go, put quite a bit of effort into it, and couldn't get over the GC hurdle: https://boyter.org/posts/sloc-cloc-code/ (subsequent blog posts on the topic discuss GC as well).

I feel we've talked past each other about what is/is not Python a few times. There is Cython and Pythran and Pypy and ShedSkin and Numba and others that are targeting, for lack of a more precise term, "extreme compatibility with" CPython, but also trying to provide an escape hatch for performance which includes in-language low levelness including allocation tricks that are not "mainstream CPython" (well, Pypy may not have those...).

My first reply was questioning "what counts" as "Python". Cython is its own language, not just "C", nor just "Python", but can do "low level things" such as using C's alloca. Maybe the only prior update here is on the diversity of "Python" impls. There are a lot. This is another reason why language levelness is hard to pin down which was always my main point, upon which we do not disagree. Maybe this is what you meant by "exceptionally general", but I kinda feel like "there isn't just one 'Python'" got lost. { There used to be a joke.."Linux isn't" related to the variety of distros/default configs/etc. :-) }

Advice-wise, I would say that your claim can be closer to easily true if you adjust it to say "ripgrep needs 'low level tricks' to be fast and a language that allows them, such as Rust". That phrasing side-steps worrying about levelnesses in the large of programming languages, re-assigns it to techniques which is more concrete and begs the question of technique enumeration. That is the right question to beg, though, if not in this conversation then in others. You might learn how each and every technique has representation in various other programming languages. It's late for me, though. So, good night!

Ah I see. You are right. I missed that you were going after that. I'm personally only really familiar with CPython, so that is indeed what I had in mind. To be honest, I don't really know what a ripgrep in Cython would look like. Is there a separate Cython standard library, for example? Or do you still use Python's main standard library?

We don't have to tumble down that rabbit hole though. If someone wrote a ripgrep in Cython and matched performance, then I would definitely learn something.

> "ripgrep needs 'low level tricks' to be fast and a language that allows them, such as Rust"

I might use that, sure. I think my point above was that I had to riff off of someone else's language. But I think we covered that. :-) In any case, yes, that phrasing sounds better.

Anyway, good chatting with you, good night!

> I do not know of a competitor who has put the care into such a tool to even try to measure, though.

As an aside, I'm the author of ack, and I would ask that folks not use the word "competitor" to describe different projects in the same space. Speaking for me/ack and burntsushi/ripgrep, there is absolutely no competition between us. We have different projects that do similar things, and neither of us is trying to best the other. We are each trying to make the best project we can for the needs we are looking to fill. ripgrep won't replace ack, ack won't replace ripgrep, and neither one will replace plain ol' grep. Each has its place.

I believe this so strongly that I created a [feature comparison chart](https://beyondgrep.com/feature-comparison/) comparing various greplike tools, and a related blog post I wrote on this: [The best open source project for someone might not be yours, and that's OK](https://blog.petdance.com/2018/01/02/the-best-open-source-pr...)

I suspect Don Stewart might be able to do it in Haskell.

That's basically by knowing enough about GHC to carefully trigger all the relevant optimizations.

Hah. I'm highly skeptical. But I suppose if anyone could do it, it'd be him. I would certainly learn something. :-)

I've tried optimizing Haskell code myself before. It did not go well. It was an implementation of the Viterbi algorithm actually. We ported it to Standard ML and C and measured performance. mlton did quite well at least.

We published a paper about the process of writing Viterbi in Haskell in ICFP a few years back: https://dl.acm.org/doi/pdf/10.1145/2364527.2364560?casa_toke...

Unfortunately, the performance aspect of it was only a small part, and we didn't talk about the C or mlton ports in the paper.

Very interesting!

I suspect you could make a very Haskell-like language that's also really fast, but you'd have to base it on linear types from the ground up, and make everything total by default. (Hide non-total parts behind some type 'tag' like we do with IO in current Haskell (and have something like unsafePerformPartial when you know your code is total, but can't convince the compiler).)

That way the compiler can be much more aggressive about making things strict.

Cython with all the appropriate cdef type declarations can match C and so might also do it. Not sure Cython exactly counts as "Python"..it's more a superset/dialect { and I also doubt such a port would hold many lessons for @burntsushi, but it bore noting. }

You would go to parallelism precisely on those platforms where simpler performance fixes (changing some data structures or implementing limited sections in a fast language) are insufficient. Efficient parallelization of an existing algorithm is a major undertaking.

> In embedded applications you don't usually have the possibility to have parallel code, and even in low level software (for example the classical UNIX utilities), for simplicity and solidity using a single thread is really fine.

Depends on which of the classic utilities you are talking about.

Many of them are typically IO bound. You might not get much out of throwing more CPU at them.

ripgrep? :)

Indeed. Many of the optimizations ripgrep (and the underlying regex engine) does only show benefits if the data you're searching is already in memory.[1] The same is true of GNU grep. This is because searching data that's in your OS's file cache is an exceptionally common case.

[1] - I'm assuming commodity SSD in the range of a few hundred MB/s read speed. This will likely become less true as the prevalence of faster SSDs increases (low single digit GB/s).

ripgrep is based on re2, a c library

I would guess it contains more c than rust code...

But what I love about this article is its lack of hype. It makes clear arguments both ways and all of them I can get behind

Hype doesn't help

Edit: To all my downvoters; I anticipated you :) With love and best wishes

No it's not. Its regex library is written in Rust, but was inspired by RE2. It shares no code with RE2. (And RE2 is a C++ library, not C.)

Off the top of my head, the only C code in ripgrep is optional integration with PCRE2. In addition to whatever libc is being used on POSIX platforms. Everything else is pure Rust.

I couldn't figure it out from looking through ripgrep's website: does ripgrep support intersection and complement of expressions? Like eg https://github.com/google/redgrep does.

Regular languages are closed under those operations after all.

No, it doesn't. It's only theoretically easy to implement. In practice, they explode the size of the underlying FSM. Moreover, in a command line tool, it's somewhat easy to work around that through the `-v` switch and shell pipelining.
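For instance (a toy illustration of my own, not from ripgrep's docs), a line-level "A & not B" falls out of a pipeline directly:

```shell
# Keep lines matching 'foo', then drop those also matching 'baz'.
# This is "foo & not baz" at the granularity of whole lines.
printf 'foo bar\nfoo baz\nqux\n' | grep 'foo' | grep -v 'baz'
# prints: foo bar
```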

Paul's talk introduced redgrep is amazing by the way. Give it a watch if you haven't yet: https://www.youtube.com/watch?v=Ukqb6nMjFyk

ripgrep's regex syntax is the same as Rust's regex crate: https://docs.rs/regex/1.4.4/regex/#syntax (Which is in turn similar to RE2, although it supports a bit more niceties.)

> No, it doesn't. It's only theoretically easy to implement.

Oh, I didn't say anything about easy! I am on and off working on a Haskell re-implementation (but with GADTs and in Oleg's tagless final interpreter style etc, so it's more about exploring the type system).

> In practice, they explode the size of the underlying FSM.

You may be right, but that's still better than the gymnastics you'd have to do by hand to get the same features out of a 'normal' regex.

> Moreover, in a command line tool, it's somewhat easy to work around that through the `-v` switch and shell pipelining.

Alas, that only works, if your intersection or complement happen at the top level. You can't do something like

(A & not B) followed by (C & D)

that way.

> Paul's talk introduced redgrep is amazing by the way. Give it a watch if you haven't yet: https://www.youtube.com/watch?v=Ukqb6nMjFyk

I have, and I agree!

Perhaps I'll try and implement a basic version of redgrep in Rust as an exercise. (I just want something that supports basically all the operations regular languages are closed, but don't care too much about speed, as long as the runtime complexity is linear.)

Yeah sorry, I've gotten asked this question a lot. The issue is that building a production grade regex engine---even when it's restricted to regular languages---requires a lot more engineering than theory. And these particular features just don't really pull their weight IMO. They are performance footguns, and IMO, are also tricky to reason about inside of regex syntax.

If you get something working, I'd love to look at it though! Especially if you're building in a tagless final interpreter style. I find that approach extremely elegant.

For my current attempts, I bit off more than I could chew:

I tried to build a system that not only recognizes regular languages, but also serves as a parser for them (a la Parsec).

The latter approach pushes you to support something like fmap, but the whole derivatives-based approach needs more 'introspection', so supporting general mapping via fmap (ie a->b) is out, and you can only support things that you have more control over than opaque functions.

(And in general, I am doing bifunctors, because I want the complement of the complement to be the original thing.)

Sorry, if that's a bit confused.. If I was a better theoretician, I could probably work it out.

I haven't touched the code in a while. But recently I have thought about the theory some more. The Brzozowski derivative introduced the concept of multiplicative inverse of a string. I am working out the ramifications of extending that to the multiplicative inverse of arbitrary regular expressions. (The results might already be in the literature. I haven't looked much.)

I don't expect anything groundbreaking to come out of that, but I hope my understanding will improve.

> And these particular features just don't really pull their weight IMO. They are performance footguns, and IMO, are also tricky to reason about inside of regex syntax.

Well, in theory I could 'just' write a preprocessor that takes my regex with intersection and complement and translates it to a more traditional one. I wouldn't care too much if that's not very efficient.

I'm interested in those features because of the beauty of the theory, but it would also help make production regular expressions more modular.

Eg if you have a regular expression to decide on what's a valid username for someone to sign up to your system. You decide to use email addresses as your usernames, so the main qualification is that users can receive an email on it. But because they will be visible to other users, you have some additional requirements:

'.{0,100} & [^@]*@[^@]* & not (.*(root|admin|<some offensive term>).*@.*) & not (.*<sql injection>.*)'

That's a silly example. I think in production, I would be more likely to see something as complicated as this in eg some ad-hoc log parsing.

> The issue is that building a production grade regex engine---even when it's restricted to regular languages---requires a lot more engineering than theory.

Amen to that!

Ah, thanks burntsushi, I believe you are the ripgrep author even?

Great work btw. Ripgrep is the best

... I will have to restrict my comment to just LLVM being a larger, c++, dependency

... Just angling for more downvotes ;) Thanks for the reply

To be clear, ripgrep has no runtime dependency on any LLVM or C++ library. rustc does.

The interesting thing here is that rust has good threading and fantastic crates

I played with making a regex library in rust. Which, as per RE2 design involves constructing graphs and glueing them together as the regex is traversed

This requires a cycle-catching GC, or just a preallocated arena... It was my first foray into Rust and I felt I would need to reach for unsafe, which I wasn't ready for. Array indexing might decompose into an arena, but syntactically it's just a bit messier (imho)

Would be interesting to see how the RE2 does it in rust (didn't know that)

I like how the article shows both sides of the fence, it makes me realize:

I get a lot of optimizations from ptr stuffing in c. But sometimes we should lay down the good, for the better

You're overcomplicating it. When it comes to finite state machines at least, it's very easy to use an ID index instead of the raw pointer itself. That's exactly what the regex crate does.

For reference, I am also the author of the regex crate. The only unsafe it uses specific to finite automata is to do explicit elimination of bounds checks in the core hybrid NFA/DFA loop.
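The ID-index design can be sketched roughly like this (a toy illustration of the general technique, not the regex crate's actual code; all names here are made up). States live in a `Vec`, and "pointers" between them are just indices, so cycles are fine and no GC, `Rc`, or unsafe is needed:

```rust
/// A state is identified by its position in the Vec, not by a pointer.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct StateId(u32);

struct State {
    /// Transition on a specific byte to another state, if any.
    on_byte: Option<(u8, StateId)>,
    /// Unconditional epsilon transitions.
    epsilon: Vec<StateId>,
    is_match: bool,
}

#[derive(Default)]
struct Nfa {
    states: Vec<State>,
}

impl Nfa {
    fn add_state(&mut self) -> StateId {
        let id = StateId(self.states.len() as u32);
        self.states.push(State {
            on_byte: None,
            epsilon: Vec::new(),
            is_match: false,
        });
        id
    }

    fn add_byte_transition(&mut self, from: StateId, byte: u8, to: StateId) {
        self.states[from.0 as usize].on_byte = Some((byte, to));
    }
}
```

Back-edges (a later state pointing at an earlier ID) cost nothing special, and dropping the `Nfa` frees the whole graph at once.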

> When it comes to finite state machines at least, it's very easy to use an ID index instead of the raw pointer itself.

As an old C programmer, the difference between an array index and a pointer caught me by surprise. In C a pointer is just an unchecked offset into memory. A real array index is just an unchecked offset into ... maybe a smaller chunk of raw memory.

But in rust, an array index is something that comes with additional bounds checking overheads with every use. And the memory it points to is also constrained - the entire array has to be initialised, so if the index passes the bounds check you are guaranteed Rust's memory consistency invariants are preserved. Indexes also allow you to escape the borrow checker. If you own the slice, there is no need to prove you can access an element of the slice.

So yeah, you can use indexes instead of pointers, but for rust that's like saying you can use recursion instead of iteration. Indexing and pointers are two very different things in rust.

I guess so. But note that I didn't equate them. I just said that you can use an ID index instead. For the particular program of FSMs, they work very well.

If bounds checks prove to be a problem, you can explicitly elide them. Indeed, Rust's regex does just that. :-)
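The elision technique looks roughly like this (illustrative only, not the regex crate's actual code; the table layout is a made-up dense DFA with 256 transitions per state):

```rust
/// Safe version: the indexing operation carries a bounds check.
fn next_state_checked(transitions: &[u32], current: u32, byte: u8) -> u32 {
    transitions[current as usize * 256 + byte as usize]
}

/// If the caller guarantees `current` is a valid state ID and the table
/// really holds `num_states * 256` entries, the check can be elided.
fn next_state_unchecked(transitions: &[u32], current: u32, byte: u8) -> u32 {
    let i = current as usize * 256 + byte as usize;
    debug_assert!(i < transitions.len());
    // SAFETY: callers must uphold the invariant asserted above.
    unsafe { *transitions.get_unchecked(i) }
}
```

The point is that the unsafety is confined to one tiny, auditable spot, instead of being the default everywhere as in C.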

> I played with making a regex library in rust. Which, as per RE2 design involves constructing graphs and glueing them together as the regex is traversed

You could instead go with a derivatives approach.


Has anyone built a production grade regex engine using derivatives? I don't think I've seen one. I personally always get stuck at how to handle things like captures or the very large Unicode character classes. Or hacking in look-around. (It's been a while since I've given this thought though, so I'm not sure I'll be able to elaborate much.)

I've made some attempts, but nothing production grade.

About large character classes: how are those harder here than in other approaches? If you build any FSM you have to deal with those, don't you?

One way to handle them, which works well when the characters in your classes are mostly next to each other in Unicode, is to express your state transition function as an 'interval map'

What I mean is that eg a hash table or an array lets you build representations of mathematical functions that map points to values.

You want something that can model a step function.

You can either roll your own, or write something around a sorted-map data structure.

Eg in C++ you'd base the whole thing around https://en.cppreference.com/w/cpp/container/map/upper_bound (or https://hackage.haskell.org/package/containers- in Haskell.)

The keys in your sorted map are the 'edges' of your character classes (eg where they start and end).
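Concretely, a rough sketch in Rust over a `BTreeMap` (my illustration; names are hypothetical, and it assumes range ends stay below `u32::MAX` for brevity):

```rust
use std::collections::BTreeMap;

/// A step function from codepoints to transition targets. Each entry maps
/// the *start* of a run to the target state (None = no transition).
/// Lookup finds the greatest start <= the queried codepoint.
struct IntervalMap {
    starts: BTreeMap<u32, Option<u32>>,
}

impl IntervalMap {
    fn new() -> Self {
        let mut starts = BTreeMap::new();
        starts.insert(0, None); // default: everything has no transition
        IntervalMap { starts }
    }

    /// Map the inclusive codepoint range [lo, hi] to `target`.
    fn insert(&mut self, lo: u32, hi: u32, target: u32) {
        // Remember what the function returns just past the range, then
        // overwrite [lo, hi] and restore the old value at hi + 1.
        let after = self.lookup(hi + 1);
        let stale: Vec<u32> = self.starts.range(lo..=hi).map(|(&k, _)| k).collect();
        for k in stale {
            self.starts.remove(&k);
        }
        self.starts.insert(lo, Some(target));
        self.starts.insert(hi + 1, after);
    }

    fn lookup(&self, ch: u32) -> Option<u32> {
        // Greatest key <= ch; key 0 always exists, so unwrap is safe.
        *self.starts.range(..=ch).next_back().unwrap().1
    }
}
```

A class like `[A-Z]` then costs two map entries rather than 26, and a huge Unicode class costs one entry per contiguous run.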

Does that make sense? Or am I misunderstanding the problem?

> I personally always get stuck at how to handle things like captures [...]

Let me think about that one for a while. Some Googling suggests https://github.com/elfsternberg/barre but they don't seem to support intersection, complement or stripping prefixes.

What do you want your capture groups to do? Do you eg just want to return pointers to where you captured them (if any)?

I have an inkling that something inspired by https://en.wikipedia.org/wiki/Viterbi_algorithm might work.

https://github.com/google/redgrep/blob/main/parser.yy mentions something about capture, but not sure if that has anything to do with capture groups.

> About large character classes: how are those harder than in approaches? If you build any FSM you have to deal with those, don't you?

I mean specifically in the context of derivatives. IIRC, the formulation used in Turon's paper wasn't amenable to large classes.

Yes, interval sets work great: https://github.com/rust-lang/regex/blob/master/regex-syntax/...

This is why I asked if a production grade regex engine based on derivatives exists. Because I want to see how the engineering is actually done.

> What do you want your capture groups to do? Do you eg just want to return pointers to where you captured them (if any)?

Look at any production grade regex engine. It will implement captures. It should do what they do.

> I have an inkling that something inspired by https://en.wikipedia.org/wiki/Viterbi_algorithm might work.

Nothing about Viterbi is fast, in my experience implementing it in the past. :-)

> https://github.com/google/redgrep/blob/main/parser.yy mentions something about capture, but not sure if that has anything to do with capture groups.

It looks like it does, and in particular see: https://github.com/google/redgrep/blob/6b9d5b02753c4ece17e2f...

But that's only for parsing the regex itself. I don't see any match APIs that utilize them. I wouldn't expect to either, because you can't implement capturing inside a DFA. (You need a tagged DFA, which is a strictly more powerful thing. But in that case, the DFA size explodes. See the re2c project and their associated papers.)

If I'm remembering correctly, I think the problem with derivatives is that they jump straight to a DFA. You can't do that in a production regex engine because a DFA's worst case size is exponential in the size of the regex.

> If I'm remembering correctly, I think the problem with derivatives is that they jump straight to a DFA. You can't do that in a production regex engine because a DFA's worst case size is exponential in the size of the regex.

Oh, that's interesting! Because I actually worked on some approaches that don't jump directly to the DFA.

The problem is the notion of (extended) NFA you need is quite a bit more complicated when you support intersection and complement.

Indeed. And in the regex crate and RE2, for example, captures are only implemented in the "NFA" engines (specifically, the PikeVM and the bounded backtracker). So if you support captures, then those engines have to be able to support everything.

Can I ask; what about a zdd?

The seem similar to closed languages with a disjunct and conjunct

Though I don't think I will, I was considering adding zdd or bdd to a PEG, to provide that conjunct

ofc, sat solver can represent a regex with conjuncts, but is this a good way of going about it, particularly with unbounded strings??

Would love to hear your thoughts on that

I don’t think people are downvoting you because they disagree on a matter of opinion. You’ve literally got the author of ripgrep having replied to you to tell you that what you’ve said is categorically false.

I anticipated my own falsity. I'm aware and at home with it

Huh? Why do you say something that you anticipate to be false?

Simply brevity

I anticipated I could well be wrong. I ALSO anticipated it would be a hard statement for people to take

I think it was a reasonable statement -- I can't research everything I say, and I had read re2 and regexes in rust to be the same

Interesting to read about redgrep and derivatives approach. Currently I'm programming a language that adds turing completeness to PEG expressions -- as in functions, but extended so the lhs is like a PEG -- just as function body can call sub functions, so too can the lhs

I'm hoping this will give a simple unified language



We make mistakes. If we can't handle that, then either we don't speak or program; or we deny it, program in c, then have flamewars and real wars

Or thirdly, we accept it, program in rust, and let others correct us

We can say you are my rustc compiler. So in effect I used a rust philosophy..... While programming in c

I guess the thing is that if we’re not sure whether or not what we’re saying is true, it can be considerate to phrase it that way, e.g. “I think ripgrep’s regex library is written in C” rather than stating it as a fact. While it is particularly likely that folks on this website will correct mistaken statements, stating them as fact seems more likely to potentially spread misinformation.

But anyways, cheers and good luck with your programming language!

In one word: Jesus

Why would you guess about how much C or Rust code that ripgrep contains when you could very quickly look? https://github.com/BurntSushi/ripgrep

Hm, how do I go from the github repo to a language breakdown of the dependency tree?

A lot of modern embedded hw are running operating systems providing threads (such as Linux) and multi-core CPUs.

Deadlocks are unique to Rust, eh?

> C libraries typically return opaque pointers to their data structures, to hide implementation details and ensure there's only one copy of each instance of the struct. This costs heap allocations and pointer indirections. Rust's built-in privacy, unique ownership rules, and coding conventions let libraries expose their objects by value

The primary reason c libraries do this is not for safety, but to maintain ABI compatibility. Rust eschews dynamic linking, which is why it doesn't bother. Common lisp, for instance, does the same thing as c, for similar reasons: the layout of structures may change, and existing code in the image has to be able to deal with it.
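The pattern in question looks like this (an illustrative sketch with a made-up "widget" library, the header and implementation shown together as one file):

```c
#include <assert.h>
#include <stdlib.h>

/* --- widget.h: callers see only an incomplete type --- */
typedef struct widget widget;   /* layout and size hidden from callers */
widget *widget_new(int x);
int     widget_get_x(const widget *w);
void    widget_free(widget *w);

/* --- widget.c: fields can be reordered or appended in a later version
       without breaking compiled callers, since they never see
       sizeof(struct widget) --- */
struct widget {
    int x;
    /* v2 can append fields here; the ABI stays intact */
};

widget *widget_new(int x) {
    widget *w = malloc(sizeof *w);  /* the heap allocation the article mentions */
    if (w) w->x = x;
    return w;
}

int  widget_get_x(const widget *w) { return w->x; }
void widget_free(widget *w) { free(w); }
```

Every field access goes through a function call across the library boundary, which is exactly the indirection cost being discussed.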

> Rust by default can inline functions from the standard library, dependencies, and other compilation units. In C I'm sometimes reluctant to split files or use libraries, because it affects inlining

This is again because c is conventionally dynamically linked, and rust statically linked. If you use LTO, cross-module inlining will happen.

> ABI compatibility

Rust provides ABI compatibility against its C ABI, and if you want you can dynamically link against that. What Rust eschews is the insane fragile ABI compatibility of C++, which is a huge pain to deal with as a user:


I don't think we'll ever see as comprehensive an ABI out of Rust as we get out of C++, because exposing that much incidental complexity is a bad idea. Maybe we'll get some incremental improvements over time. Or maybe C ABIs are the sweet spot.

Rust has yet to standardize an ABI. Yes you can call or expose a function with C calling conventions. However, you can't pass all native rust types like this, and you lose some semantics.

However, as the parent comment you responded to noted, you can enable LTO when compiling C. As rust is almost always statically linked, it basically always gets LTO optimizations.

Even with static linking, Rust produces separate compilation units at least at the crate level (and depending on compiler settings, within crates). You won't get LTO between crates if you don't explicitly request it. It does allow inlining across compilation units without LTO, but only for functions explicitly marked as `#[inline]`.
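For example (my own illustration), a library crate would expose inline candidates like this:

```rust
// In the library crate: #[inline] makes the body available to downstream
// compilation units even without LTO.
#[inline]
pub fn add_one(x: u64) -> u64 {
    x + 1
}

// Generic functions are always inlining candidates, since they are
// monomorphized in the calling crate anyway.
pub fn twice<F: Fn(u64) -> u64>(f: F, x: u64) -> u64 {
    f(f(x))
}
```

Without the attribute (or LTO), a non-generic `pub fn` in another crate is an opaque call.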

Swift has a stable ABI. It makes different tradeoffs than rust, but I don't think complexity is the cliff. There is a good overview at https://gankra.github.io/blah/swift-abi/

Swift has a stable ABI at the cost of what amounts to runtime reflection, which is expensive. That doesn't really fit with the goals of Rust, I don't think.

This is misleading, especially since Swift binaries do typically ship with actual reflection metadata (unless it is stripped out). The Swift ABI does keep layout information behind a pointer in certain cases, but if you squint at it funny it's basically a vtable but for data. (Actually, even more so than non-fragile ivars are in Objective-C, because I believe actual offsets are not provided, rather you get getter/setter functions…)

I don't disagree that Rust probably would not go this way, but I think that's less "this is spooky reflection" and more "Rust likes static linking and cares less about stable ABIs, plus the general attitude of 'if you're going to make an indirect call the language should make you work for it'".

Do you have a source on this? I didn't think Swift requires runtime reflection to make calling across module boundaries work - I thought `.swiftmodule` files are essentially IR code to avoid this

Pretty sure the link the parent (to my comment) provided explains this.

It's not the same kind of runtime reflection people talk about when they (for example) use reflection in Java. It's hidden from the library-using programmer, but the calling code needs to "communicate" with the library to figure out data layouts and such, and that sounds a lot like reflection to me.

Yes, and if you use the C abi to dynamically link rust code, you will have exactly the same problem as c: you can't change the layout of your structures without breaking compatibility, unless you use indirecting wrappers.

That's ABI compatibility of the language, not of a particular API.

If you have an API that allows the caller to instantiate a structure on the stack and pass a reference to it to your function, then the caller must now be recompiled when the size of that structure changes. If that API now resides in a separate dynamic library, then changing the size of the structure is an ABI-breaking change, regardless of the language.

Rust seems great to me, but aren't we losing a lot by giving up on C's dynamic linking and shared libraries?

To give some context to the parent comment:

$ ls -lh $(which grep) $(which rg)

-rwxr-xr-x 1 root root 199K Nov 10 06:37 /usr/bin/grep

-rwxr-xr-x 1 root root 4.2M Jan 19 09:31 /usr/bin/rg

My very unscientific measurement of the startup time of grep vs ripgrep is 10ms when the cache is cold (ie, never run before) and 3ms when the cache is hot (ie, was run seconds prior). For grep even in the cold case libc will already be in memory, of course. The point I'm trying to make is even the worst case, 10ms, is irrelevant to a human using the thing.

However, speaking as a Debian Developer, it makes a huge difference to maintaining the two systems that use the two programs. If a security bug is found in libc, all Debian has to do is ship the fixed version of libc as a security update. If a bug is found in a rust stdlib crate, Debian has to track down every ripgrep-like program that statically includes it and recompile it. There are currently 21,000 packages that link to libc6 in Debian right now. If it was statically linked, Debian would have to rebuild and distribute _all_ of them. (As a side note, Debian has a lot of hardware resources donated to it, but if libc wasn't dynamic I wonder if it could get security updates for a series of bugs in libc6 out in a timely fashion.)

I don't know rust well, but I thought it could dynamically link. The Debian rust packagers don't, for some reason. (As opposed to libc6's 21,000 dependents, libstd-rust has 1.) I guess there must be some kink in the rust toolchain that makes it easier not to. I imagine that would have to change if rust replaces C.


I am sympathetic to the point you make but to be accurate, one can consume and create C and C compatible dynamic libraries with rust. So, one is not “losing” something because what you (and me) want - dynamic linking and shared libraries with a stable and safe rust ABI - was not there to begin with.

Some would argue you gain more than you lose.

Also to be pedantic, C doesn't spec anything about linkage. Shared objects and how linkers use them to compose programs is a system detail more than a language one.

Dynamic linking and shared libraries are an OS feature, not a C one. C worked fine on DOS with no DLLs at the time.

This being said, Rust has no problem using dynamic libraries.

The reason Common Lisp uses pointers is because it is dynamically typed. It’s not some principled position about ABI compatibility. If I define an RGB struct for colours, it isn’t going to change but it would still need to be passed by reference because the language can’t enforce that the variable which holds the RGBs will only ever hold 3 word values. Similarly, the reason floats are often passed by reference isn’t some principled stance about the float representation maybe changing, it’s that you can’t fit a float and the information that you have a float into a single word[1].

If instead you’re referring to the fact that all the fields of a struct aren’t explicitly obvious when you have such a value, well I don’t really agree that it’s always what you want. A great thing about pattern matching with exhaustiveness checks is that it forces you to acknowledge that you don’t care about new record fields (though the Common Lisp way of dealing with this probably involves CLOS instead).

[1] some implementations may use NaN-boxing to get around this

Lisp uses pointers because of the realization that the entities in a computerized implementation of symbolic processing can be adequately represented by tiny index tokens that fit into machine registers, whose properties are implemented elsewhere, and these tokens can be whipped around inside the program very quickly.

What you're describing are symbols, where the properties are much less important than the identity. Most CL implementations will use fixnums rather than pointers when possible because they don't have some kind of philosophical affinity to pointers. For data structures, pointers aren't so good with modern hardware. The reason Common Lisp tends to have to use pointers is that the type system cannot provide information about how big objects are. Compare this to arrays, which are often better at packing because they can know how big their elements are.

This is similar in typed languages with polymorphism like Haskell or ocaml where a function like concat (taking a list of lists to a single list) needs to work when the elements are floats (morally 8 bytes each) or bools (morally 1 bit each). The solution is to write the code once and have everything be in one word, either a fixnum or a pointer.

Rust makes building from source and cross compiling so easy that I don’t really care for dynamic linking in my use cases of Rust.

Dynamic linking is one thing I miss from Swift - I used dynamic linking for hot code reloading for several applications, which resulted in super fast and useful development loops. Given Rust's sometimes long compile times, this is something which would be welcome.

There are crates for hot reloading in Rust, and they use dynamic linking.

Do you have to stick to a C-FFI like interface, or can they handle rust-native features like closures and traits?

Some stick to C FFI, some enforce that the Rust compiler version is the same which makes ABI issues irrelevant.

> This costs heap allocations and pointer indirections.

Heap allocations, yes; pointer indirections no.

A structure is referenced by pointer no matter what. Remember that the stack is accessed via a stack pointer.

The performance cost is that there are no inline functions for a truly opaque type; everything goes through a function call. Indirect access through functions is the cost, which is worse than a mere pointer indirection.

An API has to be well-designed in this regard; it has to anticipate the likely use cases that are going to be performance-critical and avoid perpetrating a design in which the application has to make millions of API calls in an inner loop. Opaqueness is more abstract, and so it puts designers on their toes to create good abstractions instead of "oh, the user has access to everything, so they have all the rope they need".

Opaque structures don't have to cost heap allocations either. An API can provide a way to ask "what is the size of this opaque type" and the client can then provide the memory, e.g. by using alloca on the stack. This is still future-proof against changes in the size, compared to a compile-time size taken from a "sizeof struct" in some header file. Another alternative is to have some worst-case size represented as a type. An example of this is the POSIX struct sockaddr_storage in the sockets API. Though the individual sockaddrs are not opaque, the concept of providing a non-opaque worst-case storage type for an opaque object would work fine.

There can be half-opaque types: part of the structure can be declared (e.g. via some struct type that is documented as "do not use in application code"). Inline functions use that for direct access to some common fields.

Escape analysis is tough in C, and data returned by pointer may be pessimistically assumed to have escaped, forcing exact memory accesses. OTOH on-stack struct is more likely to get fields optimized as if they were local variables. Plus x86 has special treatment for the stack, treating it almost like a register file.

Sure, there are libraries which have `init(&struct, sizeof(struct))`. This adds extra ABI fragility, and doesn't hide fields unless the lib maintains two versions of a struct. Some libraries that started with such ABI end up adding extra fields behind internal indirection instead of breaking the ABI. This is of course all solvable, and there's no hard limit for C there. But different concerns nudge users towards different solutions. Rust doesn't have a stable ABI, so the laziest good way is to return by value and hope the constructor gets inlined. In C the solution that is both accepted as a decent practice and also the laziest is to return malloced opaque struct.

> This costs heap allocations

I'd like to point out that this is not always the case. Some libraries, especially those with embedded systems in mind, allow you to provide your own memory buffer (which might live on the stack), where the object should be constructed. Others allow you to pass your own allocator.

> "Clever" memory use is frowned upon in Rust. In C, anything goes. For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).

This made me laugh

It's not trivial to write a funny and clever burn, but this just hits the spot...

That is nice, although I think Heartbleed was due to a missing bounds check enabling the reading of adjacent memory, not due to reusing the same buffer...

If my memory is correct: yes, the root cause was a missing bounds check, but the vulnerability was much worse than it could have been because OpenSSL tended to allocate small blocks of memory and aggressively reuse them — meaning the exploited buffer was very likely to be close in proximity to sensitive information.

I don’t have time right now to research the full details, but the Wikipedia article gives a clue:

> Theo de Raadt, founder and leader of the OpenBSD and OpenSSH projects, has criticized the OpenSSL developers for writing their own memory management routines and thereby, he claims, circumventing OpenBSD C standard library exploit countermeasures, saying "OpenSSL is not developed by a responsible team." Following Heartbleed's disclosure, members of the OpenBSD project forked OpenSSL into LibreSSL.

Until very recently, memory allocators were more than happy to return you the thing you just deallocated if you asked for another allocation of the same size. It makes sense, too: if you're calling malloc/free in a loop, which is pretty common, this is pretty much the best thing you can do for performance. Countless heap exploits later (mostly attacking heap metadata rather than stale data, to be honest) allocators have begun to realize that predictable allocation patterns might not be the best idea, so they're starting to move away from this.

True of the more common ones, but it should be acknowledged that OpenBSD was doing this kind of thing (and many other hardening techniques) before heartbleed, which was the main reason Theo de Raadt was so upset that they decided to circumvent this, because OpenBSD's allocator could have mitigated the impact otherwise.

Even higher-performance mallocs like jemalloc had heap debugging features (poisoning freed memory) before Heartbleed, which -- if enabled -- would catch use-after-frees, so long as libraries and applications didn't circumvent malloc like OpenSSL did (and Python still does AFAIK).

> and Python still does AFAIK

Don't you sort of have to do that if you're writing your own garbage collector, though? I guess for a simple collector you could maintain lists of allocated objects separately, but precisely controlling where the memory is allocated is important for any kind of performant implementation.

Python does refcount-based memory management. It's not a GC design. You don't have to retain objects in an internal linked list when the refcount drops to zero, but CPython does, purely as a performance optimization.

Type-specific free lists (just a few examples; there are more):

* https://github.com/python/cpython/blob/master/Objects/floato...

* https://github.com/python/cpython/blob/master/Objects/tupleo...

And also just wrapping malloc in general. There's no refcounting reason for this, they just assume system malloc is slow (which might be true, for glibc) and wrap it in the default build configuration:


So many layers of wrapping malloc, just because system allocators were slow in 2000. Defeats free() poisoning and ASAN. obmalloc can be disabled by turning off PYMALLOC, but that doesn't disable the per-type freelists IIRC. And PYMALLOC is enabled by default.

Thanks for the links! I wasn't aware of the PyMem_ layer above, the justification for that does sound bad.

But Python runs a generational GC in addition to refcounting to catch cycles (https://docs.python.org/3/library/gc.html): isn't fine control over allocation necessary for that? E.g. to efficiently clear the nursery?

Ah, good point; at the very least things like zeroing out buffers upon deallocation would have helped. Yes, I was a fan of the commits showing up at opensslrampage.org. One of the highlights was when they found it would use private keys as an entropy source: https://opensslrampage.org/post/83007010531/well-even-if-tim...

That's what happens by using normal malloc/free anyway, no? Implementations of malloc have a strong performance incentive to allocate from the cache hot most recently freed blocks.

Yes, all allocators (except perhaps OpenBSDs from what I see in this thread) do this. It is also why `calloc` exists - because zero-initializing every single allocation is really, really expensive.

iirc both issues caused the problem. The buffer overflow let the memory get read; re-use meant there was important data in the buffer.

It's incorrect, however.

Heartbleed wasn't caused by reusing buffers; it was caused by not properly sanitizing the length field from untrusted input, and reading past the buffer's allocated size, thus allowing the attacker to read memory that wasn't meant for them.

OpenSSL had its own memory-recycling allocator, which made the bug guarantee leaking OpenSSL's own data. Of course leaking random process memory wouldn't be safe either, but the custom allocator added that extra touch.

OpenSSL, like much other software, indeed used a custom allocator, but that has little to do with it: the system allocator also strongly favors giving back memory that already belonged to the same process, since memory that belonged to another process has to be zeroed first.

This is of course a kernel-level behavior: the lower-level primitives that request blocks of memory from the kernel get them zeroed only if they previously belonged to another process, which strongly favors handing back the process's own memory. Allocators, whether the standard library's or any custom one, are built on top of these primitives.

That's not a good burn though.

This was actually a somewhat significant reason I shared this article. (^.^)

> in C I'd be tempted to reuse a buffer allocated for one purpose

... In rust I'd just declare an enum for this. Enums in Rust can store data. In this way they are like a safe union.
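A minimal sketch of that idea (the variant names here are just illustrative): the enum makes the buffer's "current purpose" explicit, and the compiler prevents reading one variant's bytes as if they were the other's.

```rust
// Instead of reusing one raw buffer for two purposes, tag the purpose in
// the type. Like a C union, the variants share storage; unlike a C union,
// access is checked.
enum Buffer {
    Incoming(Vec<u8>),  // bytes read from the peer
    Outgoing(String),   // a response being built
}

fn describe(b: &Buffer) -> &'static str {
    // `match` must handle every variant; there is no unchecked reinterpretation.
    match b {
        Buffer::Incoming(_) => "incoming",
        Buffer::Outgoing(_) => "outgoing",
    }
}

fn main() {
    let b = Buffer::Incoming(vec![1, 2, 3]);
    assert_eq!(describe(&b), "incoming");
    let b = Buffer::Outgoing(String::from("ok"));
    assert_eq!(describe(&b), "outgoing");
}
```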

It was quite funny but it's quite likely you'll be reusing memory anyway whether it's on the stack or the heap, no?

The issue with this is that 'clever' compilers can optimise out any memset calls you do.
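The usual workaround is to zero with writes the optimizer is not allowed to remove. In C that's `explicit_bzero` or `memset_s`; a hedged Rust sketch of the same idea (this is roughly the approach crates like `zeroize` take, not their exact implementation):

```rust
// Zero a secret buffer with volatile writes. A plain `buf.fill(0)` whose
// result is never read again may be optimized out, just like a dead memset;
// `write_volatile` may not be.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive pointer to one u8.
        unsafe { std::ptr::write_volatile(byte, 0) };
    }
    // Keep the writes from being reordered past later code.
    std::sync::atomic::compiler_fence(std::sync::atomic::Ordering::SeqCst);
}

fn main() {
    let mut secret = *b"hunter2!";
    wipe(&mut secret);
    assert!(secret.iter().all(|&b| b == 0));
}
```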

Rust's safety rules also forbid access to uninitialized memory, even if it's just a basic array of bytes. This is an extra protection against accidentally disclosing data from a previous "recycled" allocation.
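Concretely, safe Rust has no way to read storage it hasn't written; opting out requires `MaybeUninit` plus an explicit `unsafe` step, so disclosing a stale allocation can't happen by accident:

```rust
use std::mem::MaybeUninit;

// `let buf: [u8; 16];` followed by a read is a compile error: Rust tracks
// initialization. The escape hatch is MaybeUninit, which demands an
// explicit unsafe `assume_init` after the storage has actually been filled.
fn main() {
    let mut buf: MaybeUninit<[u8; 16]> = MaybeUninit::uninit();
    let buf = unsafe {
        buf.as_mut_ptr().write([0u8; 16]); // fully initialize first
        buf.assume_init()                  // only then claim it's initialized
    };
    assert_eq!(buf, [0u8; 16]);
}
```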

> computed goto

I did a deep dive into this topic lately when exploring whether to add a language feature to zig for this purpose. I found that, although finicky, LLVM is able to generate the desired machine code if you give it a simple enough while loop continue expression[1]. So I think it's reasonable to not have a computed goto language feature.

More details here, with lots of fun godbolt links: https://github.com/ziglang/zig/issues/8220

[1]: https://godbolt.org/z/T3v881
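The same "simple loop, let LLVM build the jump table" shape looks like this in Rust (a toy stack machine, names illustrative; whether the dispatch actually becomes a jump table depends on the optimizer, as the comment notes):

```rust
// A tiny bytecode interpreter as a plain loop + match. With optimizations
// on, LLVM can often lower this dispatch to a jump table, approximating
// what computed goto gives you by hand in C.
#[derive(Clone, Copy)]
enum Op { Push(i64), Add, Mul, End }

fn run(code: &[Op]) -> i64 {
    let mut stack = Vec::new();
    let mut pc = 0;
    loop {
        match code[pc] {
            Op::Push(v) => stack.push(v),
            Op::Add => { let b = stack.pop().unwrap(); let a = stack.pop().unwrap(); stack.push(a + b); }
            Op::Mul => { let b = stack.pop().unwrap(); let a = stack.pop().unwrap(); stack.push(a * b); }
            Op::End => return stack.pop().unwrap(),
        }
        pc += 1;
    }
}

fn main() {
    // (2 + 3) * 4
    let prog = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::End];
    assert_eq!(run(&prog), 20);
}
```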

Somewhat off-topic: I just looked into zig, because you mentioned it.

> C++, D, and Go have throw/catch exceptions, so foo() might throw an exception, and prevent bar() from being called. (Of course, even in Zig foo() could deadlock and prevent bar() from being called, but that can happen in any Turing-complete language.)

Well, you could bite the bullet and carefully make Zig non-Turing complete. (Or at least put Turing-completeness behind an escape hatch marked 'unsafe'.)

That's how Idris and Agda etc do it.

With respect to deadlocks, there’s little practical difference between an infinite loop and a loop that holds the lock for a very long time.

Languages like Idris and Agda are different because sometimes code isn’t executed at all. A proof may depend on knowing that some code will terminate without running it.

> Languages like Idris and Agda are different because sometimes code isn’t executed at all. A proof may depend on knowing that some code will terminate without running it.

Yes. They are rather different in other respects as well. Though you can produce executable code from Idris and Agda, of course.

> With respect to deadlocks, there’s little practical difference between an infinite loop and a loop that holds the lock for a very long time.

Yes, that's true. Though as a practical matter, I have heard that it's much harder to produce the latter by accident, even though only the former is forbidden.

For perhaps a more practical example, have a look at https://dhall-lang.org/ which also terminates, but doesn't have nearly as much involved proving.

Thank you for doing the research and not just mindlessly adding features other languages have :)

Oh, that's great. I write interpreters off and on and I love Zig, so it's nice to hear I can get the best code gen while keeping the language small

Really cool investigation. I wonder if this applies to rust as well.

As you said though, this is finicky, and if you need this optimization for performance then you don’t want to rely on compiler heuristics.

Rust's output[0] is basically the same as Zig in this case. The unsafe is needed here because it's calling extern functions.

However, in this specific instance at least, this isn't as optimal as it could be. What this is basically doing is creating a jump table to find out which branch it should go down. But, because all the functions have the same signature, and each branch does the same thing, what it could have done instead is create a jump table for the function to call. At that point, all it would need to do is use the Inst's discriminant to index into the jump table.

I'm not sure what it would look like in Zig, but it's not that hard to get that from Rust[1]. The drawback of doing it this way is that it now comes with the maintenance overhead of ensuring the order and length of the jump table exactly matches the enum, otherwise you get the wrong function being called, or an out-of-bounds panic. You also need to explicitly handle the End variant anyway because the called function can't return for its parent.

I don't know Zig, but from what I understand it has some pretty nice code generation, so maybe that could help with keeping the array and enum in step here?

[0] https://godbolt.org/z/sa6fGq

[1] https://godbolt.org/z/P3cj31
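For readers who don't want to open the godbolt link, here is a minimal sketch of that hand-built table (my own toy instruction set, not the code from the link): an array of function pointers indexed by the enum's discriminant, with exactly the keep-in-sync hazard described above.

```rust
// Dispatch via a function-pointer table indexed by discriminant. The order
// and length of TABLE must match the enum exactly; getting it wrong means
// calling the wrong function or an out-of-bounds panic.
#[derive(Clone, Copy)]
#[repr(u8)]
enum Inst { Inc = 0, Dec = 1, End = 2 }

fn inc(acc: &mut i64) { *acc += 1; }
fn dec(acc: &mut i64) { *acc -= 1; }
fn end(_acc: &mut i64) {}

static TABLE: [fn(&mut i64); 3] = [inc, dec, end];

fn run(code: &[Inst]) -> i64 {
    let mut acc = 0;
    for &inst in code {
        // End is handled explicitly, since the called function
        // can't return for its caller.
        if let Inst::End = inst { break; }
        TABLE[inst as usize](&mut acc);
    }
    acc
}

fn main() {
    assert_eq!(run(&[Inst::Inc, Inst::Inc, Inst::Dec, Inst::End]), 1);
}
```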
