Safely writing code that isn't thread-safe

ajross · on Nov 23, 2022

It takes a while to get to it, but the point of the argument seems to be that naive Rust code can't have race conditions because naive Rust code will only assign to memory through a mutable reference, and the borrow checker guarantees that no two functions (no matter which thread they may be running in) can have borrowed such a reference simultaneously (whether "concurrently" or not). And that's true as far as it goes, I guess.

But the flip side is that naive Rust code just isn't useful for multithreaded work and tends to require heavier abstractions (e.g. copy "commands" or whatever in and out of queues vs. just taking a lock around some shared data). I occasionally try to look at the natural way one expresses the kind of stuff we do in Zephyr in a Rust kernel, and... there's really not much there besides copious and extensive use of unsafe.

aidenn0 · on Nov 23, 2022

> But the flip side is that naive Rust code just isn't useful for multithreaded work and tends to require heavier abstractions (e.g. copy "commands" or whatever in and out of queues vs. just taking a lock around some shared data). I occasionally try to look at the natural way one expresses the kind of stuff we do in Zephyr in a Rust kernel, and... there's really not much there besides copious and extensive use of unsafe.

You can take a lock around some shared data; the standard library has such a thing. If the standard library lacked such a thing, then you could implement that abstraction in unsafe, but that at least narrows the amount of code you need to reason about the thread-safety for. Of course that only prevents data-races, not deadlocks and livelocks.

Most code that you can't "just do" in Rust is code that really should have a lot of eyeballs on it were it implemented in some other language. Kernel code gets a lot of eyeballs, usually a lot of testing, and still ends up with fielded bugs[1].

For kernel code, I imagine Rust shines for device drivers; TFA applies well there; a multithreaded kernel could have thread-oblivious device drivers and no future "optimization" would accidentally share data between threads.

There are data-structures and algorithms that put you in conflict with Rust's type system (a non-concurrency one that comes to mind is trees with parent pointers). Sometimes the solution is to pick a different data-structure, other times it is judicious use of unsafe.

1: Note that the same is true with parts of the Rust stdlib that are implemented with unsafe.

nyanpasu64 · on Nov 23, 2022

In my experience multithreaded Rust isn't substantially more restrictive than multithreaded C++ at writing code free of data races (although it very much is at writing single-threaded code free of use-after-free). The primary methods in Rust to share memory across threads (in my experience writing high-performance, wait-free userspace code, not kernel code) are custom UnsafeCell wrappers which implement Send and Sync and have safe locking mechanisms preventing data races, safe atomics, or safe mutexes.
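
A minimal sketch of what such a custom UnsafeCell wrapper can look like, assuming a plain spin lock (so it is not wait-free, and it is not the commenter's actual code). The names `SpinLock`, `with` and `spin_count` are hypothetical; only the `std` items are real APIs.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical spin lock: an UnsafeCell guarded by an atomic flag.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: `with` lets only one thread at a time reach the value, so the
// lock can be shared whenever T itself is allowed to move between threads.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        SpinLock { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Runs `f` with exclusive access to the protected value.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Spin until this thread flips the flag from false to true.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // Clear the flag on the way out, even if `f` panics.
        struct Unlock<'a>(&'a AtomicBool);
        impl Drop for Unlock<'_> {
            fn drop(&mut self) {
                self.0.store(false, Ordering::Release);
            }
        }
        let _unlock = Unlock(&self.locked);
        // SAFETY: the flag guarantees no other thread is inside `with`.
        f(unsafe { &mut *self.value.get() })
    }
}

/// Increments one shared counter from several threads using only safe calls.
pub fn spin_count(threads: usize, per_thread: u64) -> u64 {
    let counter = SpinLock::new(0u64);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.with(|n| *n += 1);
                }
            });
        }
    });
    counter.with(|n| *n)
}
```

The two `unsafe` spots are the only code that needs a thread-safety argument; callers such as `spin_count` stay entirely in safe Rust. Calling `with` again from inside `f` on the same lock would spin forever, which is a deadlock but not a data race.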

duped · on Nov 23, 2022

> e.g. copy "commands" or whatever in and out of queues vs. just taking a lock around some shared data

    use std::sync::{Arc, Mutex};

    struct Shared { /* ... */ }

    #[derive(Clone)]
    struct Context {
      shared: Arc<Mutex<Shared>>
    }

    impl Context {
      fn do_stuff (&self) {
        let shared = self.shared.lock();
        /* ... */
      }

      fn do_more_stuff (&self) {
        let shared = self.shared.lock();
      }
    }

What's wrong with that? You have a bunch of safety guarantees here :

- mutex poisonings are detected (you can use try_lock to recover from them)

- the shared data isn't leaked and guaranteed to live long enough in each calling context

- the Mutex implementation doesn't suck and iirc supports reentrant code without deadlocks, making it safe to use on the same thread

- it's really easy to call `context.clone()` to send it wherever you want

masklinn · on Nov 23, 2022

> the Mutex implementation doesn't suck and iirc supports reentrant code without deadlocks

I don’t think that’s true, at least in the “rust mutexes are reentrant” sense: a mutex hands out &mut refs, a reentrant mutex would mean concurrent &mut, which is illegal.

However rust does allow handing out the one &mut to callees, without fear that it will be leaked and outlive the lock or anything, so this sort of “reentrancy” is fine if you split functions into lock acquisition and actual processing.
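
A small sketch of that split, assuming hypothetical names (`Ledger`, `Account`, `deposit_with_bonus`, `deposit_locked`): the public method takes the lock exactly once and the helper only ever sees the `&mut`, so it cannot try to lock again.

```rust
use std::sync::Mutex;

pub struct Account {
    balance: i64,
    deposits: u32,
}

pub struct Ledger {
    account: Mutex<Account>,
}

impl Ledger {
    pub fn new(balance: i64) -> Self {
        Ledger { account: Mutex::new(Account { balance, deposits: 0 }) }
    }

    /// Lock acquisition lives here and nowhere else.
    pub fn deposit_with_bonus(&self, amount: i64) -> i64 {
        let mut guard = self.account.lock().unwrap();
        // Hand the one &mut down to the helper; it never touches the Mutex.
        Self::deposit_locked(&mut guard, amount);
        if amount >= 100 {
            Self::deposit_locked(&mut guard, 1); // "reentrant" call, no second lock
        }
        guard.balance
    }

    /// Actual processing: works on plain data that is already locked.
    fn deposit_locked(account: &mut Account, amount: i64) {
        account.balance += amount;
        account.deposits += 1;
    }

    pub fn deposits(&self) -> u32 {
        self.account.lock().unwrap().deposits
    }
}
```

The borrow checker ensures the `&mut Account` cannot outlive the guard, so the helper cannot stash it somewhere and use it after the lock is released.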

IME deadlocks (and self-deadlocks) remain the biggest issue in concurrent rust. Lock ordering concerns are especially frustrating when fine-locking.

duped · on Nov 23, 2022

If you do something like this

    #[derive(Clone)]
    struct S {
      a: Arc<Mutex<A>>
    }

    impl S {
        fn reentrant(&self) {
            let cloned = self.clone();
            let locked = self.a.lock();
            if condition {
                cloned.reentrant();
            } else {
                /* do stuff */
            }
        }
    }

I believe it will not deadlock. Reentrant mutexes are pretty common implementations - is that not true in Rust right now?

anderskaseorg · on Nov 23, 2022

Your code does deadlock.

https://play.rust-lang.org/?version=stable&mode=debug&editio...

The parent comment correctly explained that it would be incorrect for a Mutex<A> to hand out unique references &mut A to multiple call frames simultaneously. If it did, one of them could be passed to another thread and accessed concurrently with the other one to create race conditions.

https://doc.rust-lang.org/std/sync/struct.Mutex.html#method....

“The exact behavior on locking a mutex in the thread which already holds the lock is left unspecified. However, this function will not return on the second call (it might panic or deadlock, for example).”

The parking_lot crate has a ReentrantMutex<A> that would not deadlock here. It hands out shared references &A instead of unique references &mut A. If you want to be able to mutate the value, you can wrap it in a type with interior mutability (Cell or RefCell), and then the type system will prevent you from passing those references to other threads.

https://docs.rs/parking_lot/latest/parking_lot/type.Reentran...

masklinn · on Nov 23, 2022

> The parent comment correctly explained that it would be incorrect for a Mutex<A> to hand out unique references &mut A to multiple call frames simultaneously. If it did, one of them could be passed to another thread and accessed concurrently with the other one to create race conditions.

FWIW they wouldn't even need to be moved between threads. IIRC creating two independent &mut to the same object is one of the instant UBs, it already is an invalid program state.

umanwizard · on Nov 23, 2022

> (you can use try_lock to recover from them)

Slightly inaccurate. You can recover from them with `lock` too. In fact, to _not_ attempt to recover from them, your code should have looked like:

    let shared = self.shared.lock().expect("mutex poisoned!")

as `lock` returns a `Result`.

The difference between `try_lock` and `lock` is unrelated: `try_lock` fails fast if it's not possible to take the mutex immediately.
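
To make the distinction concrete, here is a sketch using only `std`. The function names `recover_from_poison` and `try_lock_fails_fast` are made up for illustration.

```rust
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;

/// Poisons a mutex by panicking while holding it, then recovers via `lock`.
pub fn recover_from_poison() -> i32 {
    let shared = Arc::new(Mutex::new(1));
    let worker = Arc::clone(&shared);
    // The worker panics with the guard alive, which poisons the mutex.
    let result = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard = 2;
        panic!("worker failed while holding the lock");
    })
    .join();
    assert!(result.is_err());

    // `lock` reports the poisoning as Err, but the guard is inside the error.
    let guard = match shared.lock() {
        Ok(guard) => guard,
        Err(poisoned) => poisoned.into_inner(),
    };
    *guard
}

/// Shows that `try_lock` refuses to wait when the lock is already held.
pub fn try_lock_fails_fast() -> bool {
    let m = Mutex::new(0);
    let _held = m.lock().unwrap();
    // Bind the result to a local so the temporary is dropped before `m`.
    let would_block = matches!(m.try_lock(), Err(TryLockError::WouldBlock));
    would_block
}
```

`recover_from_poison` returns the value the panicking thread wrote, and `try_lock_fails_fast` returns `true` without ever blocking. Whether the recovered data is still in a sensible state after a panic is for the caller to decide.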

the_mungler · on Nov 23, 2022

To be fair, Rust at kernel level seems to have way more unsafe code than usual, though I have no experience in that area so I can't say much. In Rust you absolutely can "just take a lock around some shared data", but it is a bit different than in other languages. Mutexes actually contain the data they guard, making it impossible to modify unless you have the lock. There is another article about mutexes by this same author if you want more details: http://cliffle.com/blog/rust-mutexes/
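
A short sketch of "the mutex contains the data", using only `std::sync::Mutex` and `Arc`. The function name `parallel_count` is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Several threads bump one counter that lives inside the Mutex.
pub fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    // The u64 is owned by the Mutex; there is no separate unguarded variable.
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The only path to the number is through the guard.
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}
```

Forgetting to lock is not expressible here: without calling `lock` there is simply no reference to the `u64` to write through.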

umanwizard · on Nov 23, 2022

> just taking a lock around some shared data

Can’t you do this in Rust with std::sync::Mutex and similar?

oconnor663 · on Nov 23, 2022

Absolutely, and IMHO the Mutex/MutexGuard API is one of the best showcases of what Rust is capable of. (Fun fact: RwLock<T> is Sync only if T is Sync, but Mutex<T> is Sync even if T is not Sync!)

tialaramex · on Nov 23, 2022

Certainly it's interesting that Mutex<T> is a thing in Rust but the equivalent (a mutex in the form of a wrapper type) doesn't exist in C++. One rationale is that Mutex<T> in Rust is actually safe, whereas the C++ equivalent would be an attractive nuisance since it looks safe but would be easily abused and if any of the mutex's users abused it you're screwed.

gpderetta · on Nov 23, 2022

> Mutex<T> is a thing in Rust but the equivalent (a mutex in the form of a wrapper type) doesn't exist in C++

This is the synchronized value pattern [1]. I'm pretty sure that my 3rd edition of "The C++ programming Language" by Bjarne had a description of it and it predates rust by at least a decade.

[1] https://www.boost.org/doc/libs/1_80_0/doc/html/thread/sds.ht...

tialaramex · on Nov 23, 2022

The Fourth Edition does not appear to mention this pattern under that name, and indeed it gives as an example burying the mutex inside the type to be protected, which has the same downside (the resulting object is bigger†) but not the upside (Stroustrup's approach means we can still forget to take the lock)

That Boost link says it is "experimental and subject to change in future versions" but I don't know whether Boost just says that about everything or whether this would particularly mark out this feature.

† In Rust Mutex<T> is 8 bytes bigger than T, typically. In C++ std::mutex is often 40 bytes.

gpderetta · on Nov 23, 2022

I don't have the copy of the book with me, so I don't know how Stroustrup called it. It was part of a discussion of overloading operator->.

The boost warning doesn't have much to it. Boost libraries don't even guarantee API stability across versions and boost.synchronized has been available for a few years with no changes.

> In Rust Mutex<T> is 8 bytes bigger than T, typically. In C++ std::mutex is often 40 bytes.

That's because on libstdc++ std::mutex embeds a pthread_mutex_t which is 40 bytes for ABI reasons. It is a bad early ABI decision that can't be changed unfortunately. std::mutex on MSVC is worse. std::shared_mutex is much smaller on MSVC, but even worse than std::mutex on libstdc++.

A portable Mutex<T> of minimal size can be built on top of std::atomic::wait though.

tialaramex · on Nov 23, 2022

Maybe the standard should then take the opportunity to define such a thing, since it would be smaller and more useful than what they have today in practice.

gpderetta · on Nov 23, 2022

Well, yes. Then again the committee took 10 years to standardize std::mutex, 14 years for std::shared_mutex. 17 for std::optional. We still don't have a good hash map.

We have to be realistic, the standard library will never be complete and you'll always have to get basic components from 3rd party or write them yourself.

hedora · on Nov 23, 2022

One thing that’s nice about C++ is that you reimplement stuff that doesn’t have a decade of battle testing behind it.

That way, such ‘bleeding edge’ features evolve and improve a lot before being set in stone.

umanwizard · on Nov 23, 2022

Unfortunately, they did standardize the bad hash map.

tialaramex · on Nov 23, 2022

To be fair they standardized the hash map you'd have probably been taught 30 and maybe even 20 years ago in CS class. It's possible that if your professor is rather slow to catch on they are still teaching new kids bucketed hash tables like the one std::unordered_map requires.

I'd guess that while a modern class are probably taught some sort of open addressed hash map, they aren't being taught anything as exotic as Swiss Tables or F14 (Google Abseil and Facebook Folly's maps) but that's OK because standardising all the fine details of those maps would be a bad idea too.

On the other hand, the document does not tell you to use a halfway decent hash function, and many standard implementations don't provide one, so in practice many programs don't use one. The "bad hash map" performs OK with a terrible hash function, whereas the modern ones require decent hashes or their performance is miserable.

oconnor663 · on Nov 23, 2022

I think one of the reasons this isn't standard is that it's too easy to make mistakes with it. For example, if `std::string readValue3()` was changed to `std::string& readValue3()`, that reference would outlive the temporary guard, and any code that retained that reference would be broken. That's not so different from regular C++ mutex issues, but the downside here is that the convenience of synchronized_value also makes it harder to spot the mistake.

gpderetta · on Nov 23, 2022

Indeed. It is relatively easy to leak out a reference from a synchronized wrapper. I see it more of an aid to highlight which data is shared (and which mutex protects it) than a strong safety helper.

grogers · on Nov 23, 2022

Not in the stdlib, but it exists elsewhere, such as folly::Synchronized. There are some gotchas but it's a LOT better than a separate mutex and data. The main gotcha is instead of

  for (auto foo : bar.wlock()) {
      baz(foo);
  }

You need to do

  bar.withWLock([](auto &lockedBar) {
      for (auto foo : lockedBar) {
          baz(foo);
      }
  });

In rust, the lifetime checking prevents that.

The other big gotcha is accidentally blocking while holding a lock. E.g., instead of

  auto g = bar.wlock();
  baz(*g);
  co_await quux();

You should do

  {
      auto g = bar.wlock();
      baz(*g);
  }
  co_await quux();

Or use withWLock. If you co_await with the lock held you can deadlock if the executor switches to a different coroutine that tries to acquire the same lock. If you actually need to hold the lock across the blocking call, you need coroutine-aware locks, which turns it into

  auto g = co_await bar.wlock();
  baz(*g);
  co_await quux();

No idea if rust prevents this problem - I suspect not, but I haven't used async rust.

tialaramex · on Nov 23, 2022

It is possible to fall asleep in Rust while holding a lock, but it's possible to statically detect this mistake (or infelicitous choice) in the software and diagnose it. Clippy calls this await_holding_lock. Unfortunately the current Clippy diagnosis sometimes gives false positives, so that needs improving.

Tokio provides a Mutex like your final example that is intended for use in such async code that will hold locks while waiting because Tokio will know you are holding the lock. It is accordingly more expensive, and so should only be used if "Don't hold locks while asleep" was not a practical solution to your problem.

smallstepforman · on Nov 23, 2022

To use the article's example, there are typically multiple bank accounts you want to update atomically, so guarding one account with a mutex doesn’t help you prevent deadlocks. The lock needs both accounts. The Mutex<T> example just doesn’t work with interacting objects.
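
One common workaround, sketched here under the assumption that every account has a stable index, is to keep a mutex per account and always acquire the two locks in index order. The names `Bank`, `transfer`, `total` and `stress` are hypothetical.

```rust
use std::sync::Mutex;
use std::thread;

pub struct Bank {
    accounts: Vec<Mutex<i64>>,
}

impl Bank {
    pub fn new(balances: &[i64]) -> Self {
        Bank { accounts: balances.iter().map(|&b| Mutex::new(b)).collect() }
    }

    /// Moves `amount` between two accounts while holding both locks.
    pub fn transfer(&self, from: usize, to: usize, amount: i64) -> bool {
        if from == to {
            return true; // nothing to move, and locking twice would deadlock
        }
        // Always lock the lower index first, so two opposite transfers can
        // never each hold one lock while waiting for the other.
        let (lo, hi) = if from < to { (from, to) } else { (to, from) };
        let mut first = self.accounts[lo].lock().unwrap();
        let mut second = self.accounts[hi].lock().unwrap();
        let (src, dst) = if from < to {
            (&mut *first, &mut *second)
        } else {
            (&mut *second, &mut *first)
        };
        if *src < amount {
            return false; // insufficient funds, nothing changed
        }
        *src -= amount;
        *dst += amount;
        true
    }

    pub fn total(&self) -> i64 {
        self.accounts.iter().map(|a| *a.lock().unwrap()).sum()
    }
}

/// Two threads transfer in opposite directions; the total must be preserved.
pub fn stress() -> i64 {
    let bank = Bank::new(&[1_000, 1_000]);
    thread::scope(|s| {
        s.spawn(|| {
            for _ in 0..10_000 {
                bank.transfer(0, 1, 1);
            }
        });
        s.spawn(|| {
            for _ in 0..10_000 {
                bank.transfer(1, 0, 1);
            }
        });
    });
    bank.total()
}
```

Note that the ordering discipline is a convention the compiler does not check: Rust rules out the data race, but a transfer that locked in `from`, `to` order would still compile and could still deadlock.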

tialaramex · on Nov 23, 2022

In C++ you'd want to still also offer std::mutex because C++ doesn't have Zero Size Types, so a C++ Mutex<T> equivalent would always need space to store something. Mutex<()> is the same size as a hypothetical "mutex only" type and so Rust has no reason to offer a separate type representing a mutex which doesn't protect anything in particular.

In fact even without actually putting anything of substance in the mutex, you can get value from type system judo using this mechanism, which C++ doesn't appear to do either.

gpderetta · on Nov 23, 2022

> C++ doesn't have Zero Size Types

[[no_unique_address]] since C++20. Before that there was the empty base class optimization.

tialaramex · on Nov 23, 2022

Neither the Empty Base Class nor [[no_unique_address]] give C++ Zero Size Types. The [[no_unique_address]] attribute is a way to achieve something empty base classes were useful for without the accompanying problems, so that's nice, but it's not ZSTs.

Can you say whether you genuinely thought C++ had ZSTs? And if so, how you came to that conclusion ?

gpderetta · on Nov 23, 2022

I'm not saying that C++ has zero size types. I'm saying that no_unique_address and EBO are a way to store a stateless object without it occupying any space, which is all you need to implement a zero space overhead Mutex<T> for stateless types.

tialaramex · on Nov 23, 2022

I think the complexity to deliver an equivalent of Mutex<T> which also works via no_unique_address to deliver no-space-overhead for deliberately stateless types that would otherwise add 1 byte to the type size is probably a bit much to ask.

Thanks for pointing me to Boost synchronized_value<T> showing that this does exist, at least as an experimental library feature.

gpderetta · on Nov 23, 2022

It is not exactly rocket science: https://gcc.godbolt.org/z/6Kz53bs7x. Bonus it supports visiting multiple synchronized at the same time, deadlock free.

tialaramex · on Nov 23, 2022

Huh. I was expecting that providing access to the no_unique_address value despite it not having an address would be much trickier than that.

masklinn · on Nov 23, 2022

> Fun fact: RwLock<T> is Sync only if T is Sync, but Mutex<T> is Sync even if T is not Sync!

What’s more interesting is figuring out why that is. Also why Arc<T> is Send only if T is Send.

delian66 · on Nov 23, 2022

Why is that?

GrumpySloth · on Nov 23, 2022

My guess is one way it could break, if it was otherwise, would be if T relied on thread local state.

masklinn · on Nov 23, 2022

The alternative would be to actively undermine Rust's type system (and guarantees):

- RwLock hands out multiple references (that's the point), Sync means a type can be used from multiple threads (concurrently), if RwLock<T: !Sync> was Sync it would allow multiple outstanding references for the same !Sync object, which is not legal.

- Mutex, however, only hands out a single reference at a time (which is also why it can always hand out a mutable reference), meaning semantically it acts like it moves the T to the target thread then borrows it = nothing to sync, that's why Mutex<T> is Sync if T is Send.

- For Arc, if it were Send without the wrapped object being Send it would allow Send-ing !Send objects: create an Arc<T>, clone it, move the clone to a second thread, drop the source, now you can try_unwrap() or drop() the clone and it'll work off of the second thread.

This is a problem with threadlocal state, but also with resource affinity (e.g. on windows a lock's owner is recorded during locking, and only the owner can release the lock, there are also lots of APIs which can only work off of the main thread.), or thread-safety (Rc for instance would be completely broken if you could Send it, as the entire point is to use an unsynchronised refcount).
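
These rules can be checked by the compiler itself. The sketch below uses two hypothetical helper functions, `assert_send` and `assert_sync`, whose only job is to demand a trait bound; `Cell<i32>` serves as the example of a type that is Send but not Sync.

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical helpers: calling them compiles only if the bound holds.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

pub fn check_bounds() -> bool {
    // Cell<i32> may move to another thread, but may not be shared.
    assert_send::<Cell<i32>>();

    // Mutex<T> is Sync as soon as T is Send: one borrower at a time.
    assert_sync::<Mutex<Cell<i32>>>();
    assert_send::<Arc<Mutex<Cell<i32>>>>();

    // RwLock<T> is Sync only when T is Send + Sync, which i32 is.
    assert_sync::<RwLock<i32>>();

    // Each of these is rejected by the compiler if uncommented:
    // assert_sync::<Cell<i32>>();          // Cell<i32> is not Sync
    // assert_sync::<RwLock<Cell<i32>>>();  // readers would share a !Sync value
    // assert_send::<Arc<Cell<i32>>>();     // std requires T: Send + Sync here
    // assert_send::<std::rc::Rc<i32>>();   // unsynchronised refcount
    true
}
```

In the standard library the bound on `Arc<T>: Send` is `T: Send + Sync`, slightly stronger than Send alone, because clones on different threads also share `&T`.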

pornel · on Nov 23, 2022

That's not accurate. You can mutate via shared reference too, as long as something ensures it's still thread safe (e.g. synchronized).

There's an UnsafeCell type that is a deliberate loophole in the immutability of data behind shared references. It's used as the basis for atomic access and mutexes.
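
A quick illustration with a safe atomic, assuming a hypothetical `HitCounter` type: every method takes `&self`, yet the value changes, and several threads can call it at once.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

pub struct HitCounter {
    hits: AtomicU64,
}

impl HitCounter {
    pub fn new() -> Self {
        HitCounter { hits: AtomicU64::new(0) }
    }

    /// Takes a shared reference, yet updates the value.
    pub fn record(&self) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.hits.load(Ordering::Relaxed)
    }
}

/// Shares one counter by `&` across scoped threads.
pub fn count_hits(threads: usize, per_thread: u64) -> u64 {
    let counter = HitCounter::new();
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.record();
                }
            });
        }
    });
    // All scoped threads have been joined by this point.
    counter.get()
}
```

So `&` is better read as "shared" than as "immutable": mutation through it is allowed whenever the type itself makes that mutation thread safe.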

dxuh · on Nov 23, 2022

Though I have struggled with this quite a few times, I have never considered that preventing your code from being called from multiple threads is a great feature to have. I have written comments stating that certain functions should not be called from multiple threads many times. And I myself have called these functions from multiple threads by accident before.

Just another comment about this:

> Basically all languages except C(++) have an async mode that transforms code into state machines that can be interleaved on a single thread of execution.

C++ has coroutine support since C++20. It's not pretty, but it's there.

midjji · on Nov 23, 2022

There really should be syntactic sugar for things like pure/stateless functions and their corollary, explicitly reentrant functions, verified by the compiler. Then a debug mode, or perhaps a flag, would treat all functions that don't qualify as explicitly non-threadsafe, possibly with that as the default.

But I'm dubious about needing to explicitly mark functions as non-threadsafe. It's a bit like const vs. mut, but while const might be slightly less common (save for things that should be autogenerated), in the case of reentrancy, if a function isn't explicitly marked, it should never be used reentrantly, even if it happens to be a pure function at the moment, or for this one implementation, etc.

lll-o-lll · on Nov 23, 2022

Not pretty! How dare you! Beauty is clearly in the eye of the beholder.

masklinn · on Nov 23, 2022

Or the numerous eyestalks of the Beholder, as the case may be.

throwaway17_17 · on Nov 23, 2022

I know meta-commentary on comments are slightly frowned upon, but yours just made my morning. So sincerely, thank you.

nly · on Nov 23, 2022

C++ coroutines are incomplete without an executor to run them on.

Unlike C# you can't just sprinkle in async and await keywords and let the background thread pool take care of it...and doing so would be undesirable, given the lack of thread safe garbage collection.

Boost.Fiber, on the other hand, is pretty damn cool.

hedora · on Nov 23, 2022

That is a feature of coroutines, not a bug. I’ve written executors in multiple languages. For production code, it is usually easier than reusing some complicated thing that is a poor fit but does nothing well.

steveklabnik · on Nov 23, 2022

The exact same thing is true of Rust.

1letterunixname · on Nov 23, 2022

For a built-in reference semantics system that's finer-grained than Rust's, there's Pony's system (has a GC, but Orca is supposedly faster than C4 or HiPE/BEAM).

https://tutorial.ponylang.io/reference-capabilities/capabili...

smallstepforman · on Nov 23, 2022

I’d love to use Pony (sans GC), but the project is dead, Sylvan has lost interest, Sebastian has lost interest, and even its biggest advocates (Sean and Joe) have moved on. It’s a shame since I really really like what Pony was doing.

ergl · on Nov 23, 2022

The project is definitely not dead! Right now it is entirely volunteer-driven, so the amount of work being done depends on the amount of work put in by the overall community. Feel free to drop by our Zulip to catch up on the latest developments: https://ponylang.zulipchat.com/

izietto · on Nov 23, 2022

Are you sure about that? I see movement in the main repository and the latest release dates to 9 days ago: https://github.com/ponylang/ponyc/releases/tag/0.52.1

p1mrx · on Nov 23, 2022

Mirror: https://web.archive.org/web/20221123004412/http://cliffle.co...

samsquire · on Nov 23, 2022

Thank you for writing this.

I am deeply interested in parallelism, asynchrony and multithreading. I blog about it every day in ideas4 (see my profile).

I try to think of automatic parallelization approaches, and how to structure programs that take advantage of queuing theory and processing ratios. Imagine if the compiler could work out that one program has a ratio of processing of 2:1 then it could scale out automatically.

I am very interested in compilers that can automatically parallelise. I find Pony to be very interesting with its reference capabilities.

My problem with Rust is that it precludes many safe programs. References can be alive and be passed around.

Regarding bank account transactions, I worked on the same problem. It led me to implement multiversion concurrency control. Multiversion concurrency control allows thread safety while avoiding locks, except for the internal data structures of the MVCC.

Multiple threads can read/write the same data but they'll never modify the exact same data due to the multiple versions.

https://github.com/samsquire/multiversion-concurrency-contro...

I also implemented the bank account example but I serialised the account transactions to the same account numbers in ConcurrentWithdrawer.java - this solution doesn't use multiversion concurrency control.

My other solution BankAccounts2.java and BankAccounts3.java take different approaches.

The BankAccounts2.java has 1 thread out of 11 threads that synchronizes the bank accounts and performs the serialization of account balances. The other threads generate transactions.

BankAccounts3.java simply uses a lock.

The problem with the bank accounts problem is that I am yet to work out how to scale it. You could shard threads to serve a certain range of account numbers.

Or you can separate the current balance of each account across threads and if the receiving transaction is less than that balance, you don't need to read other threads to check.

I recommend this article by the Vale devs:

https://verdagon.dev/blog/seamless-fearless-structured-concu...

I also experimented with other concurrency primitives.

One is a scheduler that receives requests to write to shared memory. Then it schedules the request then the thread marks a callback array and the next request can be served. There's scalability challenges.

ilyt · on Nov 23, 2022

> Imagine if the compiler could work out that one program has a ratio of processing of 2:1 then it could scale out automatically.

Don't think that's possible, as there rarely are programs where compute to IO ratio is constant. Even the simplest of say web API will have calls that take more CPU and calls that just wait for network somewhere.

It would be nice to have tooling to figure out whether app is "just" idling or waiting for network for scaling purposes, we already have that in form of IO wait on disk IO.

Then again Go's solution of "have threads so light you can just run 100000 of them to fill CPU even if each of them have a lot of network wait" works well enough. There is also of course async/event driven but that generally leads to code with worse readability (at best similar) and more annoying debugging

samsquire · on Nov 23, 2022

I implemented a simple round robin userspace 1:M:N scheduler in Rust, Java and C.

It has 1 scheduler thread that preempts kernel and lightweight threads, plus M kernel threads and N lightweight threads.

Hot for and while loops can be interrupted by setting the looping variable to the limit.

https://GitHub.com/samsquire/preemptible-thread

I am currently investigating coroutine transpilation with async await similar to Protothreads implementation that uses switch statements. I am trying to do it all in one loop rather than reentrant functions as in Protothreads.

Essentially I break the code up around the async/await and put different parts into a switch. I *think* it's possible to handle nested while loops inside these coroutines.

The problem I don't know how to handle is if a coroutine A calls coroutine B and B calls A. I get the growing data problem which I'm trying to avoid. I want a fixed-size coroutine list.

I think some Microservices take more CPU than others. You can scale one Microservice more than another based on CPU usage.

Another of my benchmarks is multithreaded message generation, sending messages between threads in a thread-safe way. I can get 20-100 million messages sent per second, sending in batches.

mrkeen · on Nov 23, 2022

> I also implemented the bank account example but I serialised the account transactions to the same account numbers in ConcurrentWithdrawer.java - this solution doesn't use multiversion concurrency control.

Damn that's a lot of code. Mutability everywhere, nulls and special integers (-1), triply-nested for-loops, direct indexing into lists. And it's all in memory? If you're in a situation where you want multiple safe writers, just use software transactional memory.

samsquire · on Nov 23, 2022

I'm sorry about the code quality, I never got around to refactoring it, I was just trying to get it to work safely and reliably. Without any money destruction or creation.

Check out MVCC.java and TransactionC.java for multiversion concurrency control. It's far easier to understand.

It takes less code to implement multiversion concurrency control.

Software transactional memory is great and I like it but it can have scalability problems.

I really enjoyed Joe Duffy's blog series on Midori, the Microsoft .NET software transactional memory implementation.

http://joeduffyblog.com/2015/11/03/blogging-about-midori/

STM essentially can get transpiled into a for loop modifying a log area and then a CAS instruction to the actual memory.

STM can be thought of as multiversion concurrency control because the operations occur on a log.

But the problem is that most things are not transactional. How do you reverse an API call?

ghusbands · on Nov 23, 2022

While MVCC works fine in the single-account case, it fails badly in cases where you want to maintain an invariant across accounts. For example, if a person can have two accounts at a bank with a total overdraft of no more than $100 (so A+B>-100), two MVCC transactions can alter one account each, checking for the invariant, and you still end up overdrawn beyond the limit. In general, the fact that MVCC only fully handles write-write hazards can cause many problems.
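
The overdraft scenario can be replayed deterministically. What follows is a toy, single-threaded model of snapshot-style transactions that only detect write-write conflicts; it is an assumption-laden sketch, not the MVCC implementation discussed above, and all names (`Balances`, `plan_withdraw_a`, `write_skew_demo`, ...) are hypothetical.

```rust
/// Committed balances of two accounts owned by the same person.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Balances {
    pub a: i64,
    pub b: i64,
}

/// The cross-account invariant from the comment: A + B > -100.
pub fn invariant_holds(s: Balances) -> bool {
    s.a + s.b > -100
}

/// Plans a withdrawal from A, validated against a private snapshot only.
fn plan_withdraw_a(snapshot: Balances, amount: i64) -> Option<i64> {
    let after = Balances { a: snapshot.a - amount, ..snapshot };
    invariant_holds(after).then_some(after.a)
}

/// Plans a withdrawal from B, validated against a private snapshot only.
fn plan_withdraw_b(snapshot: Balances, amount: i64) -> Option<i64> {
    let after = Balances { b: snapshot.b - amount, ..snapshot };
    invariant_holds(after).then_some(after.b)
}

/// Runs two "concurrent" transactions that each pass their own check.
pub fn write_skew_demo() -> Balances {
    let mut committed = Balances { a: 0, b: 0 };
    // Both transactions start from the same snapshot.
    let snapshot_1 = committed;
    let snapshot_2 = committed;
    let new_a = plan_withdraw_a(snapshot_1, 60); // sees 0 + 0 - 60 = -60, fine
    let new_b = plan_withdraw_b(snapshot_2, 60); // sees 0 + 0 - 60 = -60, fine
    // The write sets {a} and {b} are disjoint, so a write-write conflict
    // check lets both commits through.
    if let Some(a) = new_a {
        committed.a = a;
    }
    if let Some(b) = new_b {
        committed.b = b;
    }
    committed
}
```

Both checks pass individually, yet the committed state ends at -60 and -60, a combined -120 that violates the invariant. Catching it requires tracking what each transaction read, not only what it wrote.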

samsquire · on Nov 23, 2022

This reminds me of CHECK constraints in an RDBMS.

The whitepaper Serializable Snapshot Isolation talks more of dangerous read-write structures.

My MVCC may be vulnerable to write skew even though it generates the right result. I am yet to generate a test case that exhibits that behaviour. The write skew occurs when there are two dangerous read-write structures.

int_19h · on Dec 1, 2022

Why can't it check the invariant as it is reconciling the snapshot with other changes?

hedora · on Nov 23, 2022

> Imagine if the compiler could work out that one program has a ratio of processing of 2:1 then it could scale out automatically.

Cilk and Cilk++ are parallel variants of C that scale up automatically in the way you describe.

It is not hard to implement it in a scale out way in make (by following the example of distcc), and some of the big data frameworks get it right (especially ones with finer grained tasks than Hadoop MapReduce).

therein · on Nov 23, 2022

> My problem with Rust is that it precludes many safe programs. References can be alive and be passed around.

I do agree with this and it rubs me the wrong way. Sure I can put something into `std::mem::ManuallyDrop` and hand out transmuted unsafe `&'static mut` references to it but then people come and point out (rightfully) how this library is unsound.

wizzwizz4 · on Nov 23, 2022

Use something with interior mutability, like the Cell family; then you can hand out & references.
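
For the single-threaded case that suggestion can look like the sketch below. `Gauge`, `raise` and `shared_updates` are hypothetical names; `Cell` is the real `std::cell::Cell`.

```rust
use std::cell::Cell;

pub struct Gauge {
    level: Cell<u32>,
}

/// Mutates through a shared reference; no &mut is ever created.
pub fn raise(gauge: &Gauge) {
    gauge.level.set(gauge.level.get() + 1);
}

pub fn shared_updates() -> u32 {
    let gauge = Gauge { level: Cell::new(0) };
    // Two live shared references to the same object, both used to mutate.
    let first = &gauge;
    let second = &gauge;
    raise(first);
    raise(second);
    raise(first);
    gauge.level.get()
}
```

Because `Cell<u32>` is not Sync, the compiler still refuses to share `&Gauge` across threads, so this loosens aliasing rules without reopening data races.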

Safe Rust precludes some safe programs, but Rust's soundness rules preclude far fewer. I've only seen a couple of real-world programs that you couldn't express in Rust (without unnecessary runtime checks), and that's solely due to the hierarchical lifetimes.

smallstepforman · on Nov 23, 2022

Concurrency via transactional memory. Try and retry (potentially infinitely) until your transaction (copy) matches source. Then try to push the transaction headers …. The essence of a lock free queue …
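
The try-and-retry shape can be sketched for a single integer with a compare-and-swap loop. This is a simplification of the idea, not a full STM or a lock-free queue, and the names `update` and `cas_count` are hypothetical.

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::thread;

/// Applies `f` optimistically: compute on a copy, publish only if unchanged.
pub fn update(cell: &AtomicI64, f: impl Fn(i64) -> i64) -> i64 {
    let mut seen = cell.load(Ordering::Relaxed);
    loop {
        let proposed = f(seen); // work on a private copy
        match cell.compare_exchange_weak(seen, proposed, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return proposed,
            // Another thread committed first (or the weak CAS failed
            // spuriously): retry from the value actually observed.
            Err(actual) => seen = actual,
        }
    }
}

/// Every thread adds 2 per iteration through the retry loop.
pub fn cas_count(threads: usize, per_thread: u64) -> i64 {
    let cell = AtomicI64::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    update(&cell, |v| v + 2);
                }
            });
        }
    });
    cell.load(Ordering::Relaxed)
}
```

The loop has no upper bound on retries for any one thread, which matches the "potentially infinitely" caveat: the system as a whole makes progress, but an unlucky thread can keep losing the race.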

cozzyd · on Nov 23, 2022

How does rust implement signal handlers?

jakewins · on Nov 23, 2022

Crates like signal-hook force you to use APIs that can't interact directly with the rest of your app's state, I think: https://crates.io/crates/signal-hook

aidenn0 · on Nov 23, 2022

I spent the first 10 years of my career thinking I wasn't smart enough to write multi-threaded C code. Since then I have been convinced that nobody is smart enough to write multi-threaded C code, and multiprocessing should be used instead.

cozzyd · on Nov 23, 2022

As always, it depends on what the interfaces between threads are. If they're just workers, it's easy to get right, or if you have producer-consumer queues. If everything can be shared then... yeah, good luck.
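
For the producer-consumer case, this is roughly what the easy version looks like in Rust with the standard library's channel. The function name `sum_via_queue` and the numbers it sends are made up for the example.

```rust
use std::sync::mpsc;
use std::thread;

/// Several producers push numbers into a queue; one consumer sums them.
pub fn sum_via_queue(producers: u64, items_each: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    let handles: Vec<_> = (0..producers)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..items_each {
                    // Ownership of each value moves into the queue.
                    tx.send(id * items_each + i).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver sees the channel close
    // once every producer has finished.
    drop(tx);
    let total: u64 = rx.iter().sum();
    for handle in handles {
        handle.join().unwrap();
    }
    total
}
```

Nothing is shared here except the queue itself, which is why this interface is hard to get wrong in any language.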

aidenn0 · on Nov 23, 2022

The problem with C (and many other languages too) is that with threads, the default is shared. A C-like language where all statically allocated values default to thread-local is already a big improvement. Now let's make it so file descriptors can only be used by the thread that opened them and give each thread its own heap. Wow, this is starting to look a lot like a Unix process...

cozzyd · on Nov 23, 2022

#define static static thread_local :)

(though this also makes it impossible to have shared static data...)

aidenn0 · on Nov 23, 2022

Also doesn't solve global variables.

spacechild1 · on Nov 23, 2022

I am not exceptionally smart and I am certainly able to write (and debug) multi-threaded C code, but it is painful :-) I try to use C++ instead whenever I can.

BTW, "multiprocessing" is not a drop-in replacement for multithreading. In fact, there are many domains where "multiprocessing" just isn't possible or practical. So the real solution is to use another language.

Cloudef · on Nov 23, 2022

This. The moment you call into a library or code you haven't written yourself, you can throw out any guarantee that your program will be thread safe, and I don't even want to go into details like programs mixing forks and threads, or Unix signals...

Multiprocessing has other benefits too, like making your code simpler and more fault tolerant (you can restart failed processes, etc.), and you can make sure the OS handles all resource cleanup. I've seen many multithreaded monsters that would have been better as a multi-process architecture, and maybe even faster and safer too. This is usually how actor-based languages like Erlang are implemented: behind the scenes everything is actually a separate process.

If you need to work with shared structure in threads, keep that part very small. Or consider if GPU or SIMD would be a better fit.

pornel · on Nov 23, 2022

This isn't true in Rust. Thread safety checks are propagated all the way through libraries and 3rd party code.

If it compiles, it's thread safe. If it isn't thread safe, Rust will show you exactly where. It is as amazing and magical as it sounds.
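A small sketch of what that check looks like in practice, using only the standard library. `count_in_parallel` is a hypothetical name. The version below compiles because `Arc<Mutex<u64>>` is `Send`; swapping in `Rc<RefCell<u64>>` would be rejected at the `thread::spawn` call, since `Rc` is not `Send`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawn `threads` workers that each add 1 to a shared total
/// `per_thread` times, then return the final total.
fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each worker gets its own handle to the same mutex.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}
```

As noted earlier in the thread, this rules out data races, not deadlocks or livelocks.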

Cloudef · on Nov 23, 2022

True, but the above topic was not about Rust. Though even in Rust, if the code ever calls into C code, the same pitfalls still apply. Also, because everything in Rust has to take thread safety into account, programming simple or low-level stuff in it is a frustrating experience. See old discussion here: https://news.ycombinator.com/item?id=33590864

pornel · on Nov 23, 2022

For set-once global variables there's OnceCell. I understand it's super frustrating if you don't know it exists, and DIYing it from first principles requires knowledge of both Rust's low-level primitives as well as OS/multithreading techniques.

OTOH it is an example of how thread-safety extends to everything in Rust, and you can't be surprised by some function somewhere unsafely mutating global state.
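A sketch of a set-once global, assuming a toolchain that ships `std::sync::OnceLock` (Rust 1.70 or later), the standard-library relative of the `once_cell` crate's thread-safe `OnceCell`. `Config`, `CONFIG`, and `config` are hypothetical names, and this does not attempt to reproduce the program from the linked discussion:

```rust
use std::sync::OnceLock;

/// Illustrative payload; any `Send + Sync` type works.
struct Config {
    name: String,
    workers: usize,
}

// Empty until the first call to `config()`.
static CONFIG: OnceLock<Config> = OnceLock::new();

/// Initialize on first use, then hand out a shared reference.
/// The value is never copied or cloned; callers borrow it in place.
fn config() -> &'static Config {
    CONFIG.get_or_init(|| Config {
        name: String::from("demo"),
        workers: 4,
    })
}
```

Once initialized, a later `CONFIG.set(...)` returns `Err` instead of overwriting, so no caller can observe the value changing.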

Cloudef · on Nov 23, 2022

If you read the topic, you would see that even OnceCell did not work (unless you wrapped things in a mutex, or implemented deep copy, which is again what I did not want).

pornel · on Nov 23, 2022

You haven't provided more details beyond "doesn't work, wants Copy", which sounds to me like you've tried to move data from behind a reference, or perhaps expected a placement-new style initialization. You could post your code on users.rust-lang.org to get a solution.

It's super common for programmers coming from C to have a blind spot around strict static ownership, equate references with pointers and try to use them for "not copying", while in Rust they mean "not owning". This desire to avoid copies instead results in needless fights with ownership and the borrow checker. Rust moves never do deep copies, but Rust approaches this differently than C, and it takes practice to internalize that.

Everyone in that thread told you OnceCell would work, because it really would. People do write huge programs in Rust every day. There's a steep learning curve, but once you know the Rust-specific programming patterns, rather than try to write C in Rust, it goes smoothly.

Cloudef · on Nov 23, 2022

Except the person who tried to do it as well (first telling me OnceCell and RwLock would work) in the end could not get it working either and gave up in half an hour. In the end I moved on. I still program in Rust, but it's not a good language when you have to work at the memory level and avoid copying data, even with unsafe. That is sad, because unsafe could be more powerful if it let you tell the borrow checker, at the end of an unsafe block, what the state of things should now be when it can't figure that out itself.

pornel · on Nov 23, 2022

Assigning to RwLock is trivial, so if nobody was able to do that, then shared^mutable limitation was a red herring, and your issue wasn't with thread safety at all. HN-comment-based blind debugging of your program is getting pretty far from the topic of the thread and TFA…
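For reference, assignment through a global `RwLock` can be sketched like this, assuming Rust 1.63 or later (where `RwLock::new` is usable in a `static`). `GREETING`, `set_greeting`, and `greeting_len` are hypothetical names:

```rust
use std::sync::RwLock;

// A global that any thread may read or replace.
static GREETING: RwLock<String> = RwLock::new(String::new());

/// Replace the stored value while holding the write lock.
fn set_greeting(text: &str) {
    *GREETING.write().unwrap() = text.to_owned();
}

/// Read through the lock without cloning the stored string.
fn greeting_len() -> usize {
    GREETING.read().unwrap().len()
}
```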

The next most common footgun C-style expectations get you in Rust is self-referential structs, when you create a new object and then take (temporary scope-bound) references to inside of it, and try to store them long-term in unrestricted scope. That is a thing safe Rust doesn't guarantee without Rc/Arc, regardless of whether it's thread-safe or not (moves can trivially invalidate such structs in single-threaded code too).

aidenn0 · on Nov 23, 2022

You don't necessarily get fault-tolerance for free; there will always be some shared resources (or you don't have a single program in multiple processes, you just have unrelated programs), so you still need to design for the requirement "when a process exits, external state is valid." In a sufficiently complicated system, this doesn't happen by accident.

I do find it a better lever for "make the easy things simple and the hard things possible" though.

hedora · on Nov 23, 2022

I’ve come out the other side of the tunnel with the conclusion that everyone should write lock-free code, because it is too hard to reason about the performance of mutexes, let alone the sequential semantics of code that is too sloppy to be made lock-free.

Edit: Do you actually mean multiprocess programs? Those are the ones that communicate via lock free data structures stored in shared memory, with the invariant that processes can be kill -9’ed or segfault without leaving the shared memory in an inconsistent state.

That’s like switching your compiler into nightmare mode. It’s technically possible, but first you need to master piles of obscure data structure research.

jjgreen · on Nov 23, 2022

Tosh. A lot of use-cases need shared read-only access to a large structure: mark that const* and you can thread away without a mutex in sight, with each thread having its own r/w struct to accumulate results.
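The same pattern can be sketched in Rust with scoped threads (`std::thread::scope`, Rust 1.63 or later): every worker borrows the whole table read-only and keeps a private accumulator. `parallel_sum` is a hypothetical name and the strided summing is just a stand-in for real per-thread work:

```rust
use std::thread;

/// Sum `table` using `workers` threads. All threads share `table`
/// read-only; each one accumulates into its own local variable.
fn parallel_sum(table: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);

    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|id| {
                s.spawn(move || {
                    // Private accumulator: no mutex needed.
                    let mut local = 0u64;
                    // Worker `id` reads elements id, id + workers, ...
                    for value in table.iter().skip(id).step_by(workers) {
                        local += *value;
                    }
                    local
                })
            })
            .collect();

        // Combine the per-thread results after every worker is done.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

Here the compiler enforces the read-only part: a worker that tried to write through `table` would not compile.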

aidenn0 · on Nov 23, 2022

I agree, and with multiple processes that's just a single mmap() call away!

r-s · on Nov 23, 2022

I came to a similar conclusion years ago. Yes, there are examples of applications that have done it, but they often have very elite engineers. Most teams are comprised of various levels of engineers and it's extremely difficult to get right.

I work more in web dev now and I suspect the majority of my team does not know the difference between a thread and a process. HN readers may scoff but these are productive engineers who have built some honestly pretty impressive apps and systems.

CarVac · on Nov 23, 2022

Multiprocessing as in openmp?

yazaddaruvala · on Nov 23, 2022

The parent is using multi-processing to mean: “no shared memory concurrent processing”

Which still isn’t quite good enough because files (especially mmapped files) can be abused as shared memory even between multiple processes. But the general point is good, “you’re better off with mutable xor shared memory while concurrently processing”.

aidenn0 · on Nov 23, 2022

Right, processes aren't "shared nothing", but at least they make you work a little bit before you can shoot yourself in the foot.

aidenn0 · on Nov 23, 2022

Sorry, the other reply explained what I meant. "Multiprocessing" has many meanings but specifically when contrasted with "multithreading" it usually implies the thread/process distinction in the Unix sense of the terms.

pjmlp · on Nov 23, 2022

Multiprocessing won't help you when accessing shared resources.

gpderetta · on Nov 23, 2022

Right. And most importantly you can build (even inadvertently) the equivalent of shared memory on top of multiprocessing and message passing so you always have to be careful.

aidenn0 · on Nov 23, 2022

True, but in C, everything that isn't automatic storage duration defaults to being a shared resource with multithreading. I'm not saying that multiprocessing is enough to avoid all footguns, just that it eliminates a lot of them.

pjmlp · on Nov 23, 2022

Now do the same exercise talking to a real database over distributed computing IPC mechanisms, while ensuring everyone else is following the same rules, regardless of the programming language talking to the same database tables.