Rust has an interesting, and somewhat difficult, memory model.
First, there's the single ownership tree. Single ownership is checked by the borrow checker. You can have only one mutable reference to a data object at a time, and that's also checked by the borrow checker. If you can stay within that model, if the program compiles, it will probably work at the data access level.
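That single-mutable-reference rule looks like this in practice (a trivial sketch; the names are illustrative):

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let m = &mut v;              // the one live mutable reference
    m.push(4);
    // The borrow checker rejects a second `&mut v` while `m` is still in use:
    // let m2 = &mut v;          // error[E0499]: cannot borrow `v` as mutable twice
    // m.push(5);
    assert_eq!(v, [1, 2, 3, 4]);
}
```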
But then there are all the escape hatches.
There's "Rc", which is reference counting for read-only use. This works fine for immutable objects, but not for mutable ones.
There's "Cell" and "RefCell". Combined with "Rc", these allow multiple owners of mutable data, with only one mutable borrow at a time. Request mutable access via two routes and you get a panic, or an error if you use try_borrow().
The static analysis of the borrow checker is not able to figure out when you're headed for trouble via this route.
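A minimal sketch of that runtime check, in the region the borrow checker can't see into:

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);
    let shared = cell.borrow();              // runtime-tracked shared borrow
    // cell.borrow_mut() here would panic: "already borrowed"
    assert!(cell.try_borrow_mut().is_err()); // try_borrow_mut reports it as an Err
    drop(shared);
    assert!(cell.try_borrow_mut().is_ok());  // fine once the shared borrow is gone
}
```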
There's "Arc" and "Mutex", which allow shared access across thread boundaries. Request access via two routes in the same thread, and you deadlock.
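The same-thread double-lock hazard can be made visible with try_lock (a sketch; std's Mutex documents that re-locking on the holding thread may deadlock or panic):

```rust
use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0);
    let guard = m.lock().unwrap();
    // A second m.lock() on this thread would deadlock (or panic);
    // try_lock reports the contention instead of blocking:
    assert!(m.try_lock().is_err());
    drop(guard);
    assert!(m.try_lock().is_ok());
}
```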
Designing the data structures for something tightly coupled with concurrency, like a window manager or a game, is difficult.
"Memory model" is a term with a lot of different, but partially overlapping, meanings. The one being used in the article only concerns the logical arrangement of data, not rules about ownership and life span.
For the purposes of this article, Rust's memory model is no different from that of Haskell or C++: mostly what the article dubs the "lisp memory model", with some excursions into things like the magnetic tape memory model.
It's worth being pedantic here because, if you don't recognize what TFA means by "memory model", you may not be well positioned to recognize the deeper insight about computing that it's trying to present.
Right. This is not what most people these days mean by "memory model" and indeed what I expected based on the date was an article about Consistency Models, maybe talking about the C++ model, the Java model, how they relate to hardware.
I think the article doesn't end up offering a deep insight because at a theoretical level all of these are equivalent (which it acknowledges) and at a practical level what matters is performance. And the practical implications depend on the physical manifestation of the computer. That's an electronics thing not a programming thing.
Pointer chasing was a plausible strategy when your physical computer took a very similar amount of time to read the next sixty-four bytes from RAM versus chasing pointers to eight sets of eight bytes randomly sprayed around the machine. But today that's not just "a bit slower"; it can be literally thousands of times slower.
Because today the next sixty-four bytes are probably in your nearby cache, whereas hopping around madly will cause loads from all the way over in the DRAM, and that takes forever. Worse, with pointer chasing you are waiting for one load, to get the address for the next load. So you don't even get to schedule all the waiting up front and get on with something else in parallel, it's just a series of stalls.
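The dependent-load chain is easy to see in code. This sketch builds the same numbers contiguously and as a Box-linked list; both sums agree, but on large data the list walk pays one serialized cache miss per node while the vector walk streams through cache lines:

```rust
struct Node {
    value: u64,
    next: Option<Box<Node>>,
}

fn main() {
    let n = 10_000u64;
    let vec: Vec<u64> = (0..n).collect();   // contiguous: streams through cache

    let mut list: Option<Box<Node>> = None; // pointer-chased: one hop per node
    for value in (0..n).rev() {
        list = Some(Box::new(Node { value, next: list }));
    }

    let vec_sum: u64 = vec.iter().sum();
    let mut list_sum = 0;
    let mut cursor = &list;
    while let Some(node) = cursor {
        list_sum += node.value;             // each load waits on the previous one
        cursor = &node.next;
    }
    assert_eq!(vec_sum, list_sum);          // same answer, very different latency
}
```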
On one level, yes, that's true. On another level, I think you may have left out the human factor. To me, the most important thing about these sorts of models is that they're mental models - they don't govern how the computer language works so much as they govern how the computer programmer works.
For my part, I think about data (and, by extension, memory) differently depending on whether I'm working in Java or Forth (admittedly it's been a while) or SQL or APL (admittedly not much experience here, unless you're willing to charitably allow me to count numpy). Apache Spark is perhaps yet another model; I'm not sure how to reconcile it with the six in the article. And the different ways of thinking lead to fundamentally different approaches to domain modeling when I'm working in each language.
Reading the article actually helped me understand another angle to my disdain for object-relational modeling: By trying to hack the relational database into being just an object store, you end up sacrificing the unique data modeling possibilities that the relational model offers, and replace it with a rather poor implementation of object graphs. It's difficult for me to see that sort of thing as a clear win. If you want Gemstone, just use Gemstone.
Right but you're talking about an abstraction and the abstraction will leak at least somewhat.
A modern language can present you with what appears to be a doubly linked list, but is actually just a veneer over a data structure that actually performs well on the hardware. But of course you'll write programs that treat it like a linked list, so they'll have markedly worse performance than was necessary and worse, you may try to "improve" this based on your erroneous mental model and make things even worse.
I think that's where you got to with ORM too. The object store is an abstraction, but it somewhat leaks the fact that there's actually a relational database underneath. If you really just wanted an object store, this will be slower and clunkier than it should be, and if you really wanted a relational database you shouldn't have ORM.
So this is the problem, exposing the right abstractions. Making easy things easy, and yet not making difficult things impossible. Trying to discourage the programmer from using a huge fixed size array and a counter when what they actually needed was a growable vector, but not preventing them from using that fixed size array with a counter if that genuinely is the correct way to solve their problem...
Spark is weird; to the extent that I understand it, it's sort of more like make than like any conventional programming language.
I think the lifetime stuff is a bit fuzzy: count it in, count it out? I mean, what I'm calling the "Lisp memory model" has the crucial advantage that you can run a garbage collector, or you can do C++-style or Rust-style ownership to manage deallocation, which is not really a thing that makes sense with parallel arrays or nested records.
What I mean is that, though you can garbage collect a whole array, if you have an index array X such that Y[X[i]] contains the Y of entity i, the runtime can't automatically compact Y to remove the indices to which X does not point, because you could generate indices into Y in some other way, for example by counting up from 0. Automatically compacting Y in such a way would be the parallel-arrays equivalent of garbage collection in the Lisp memory model.
It's not like the lisp runtime is going to notice that you won't be using specific values anymore and start chopping them out of the middle of your lists, or the Java runtime is going to do that with specific fields in a live object.
In the Lisp memory model, we map a conceptual entity to a memory address where we allocated a block to hold data about the entity. With parallel (Fortran) arrays, we map a conceptual entity to an index at which data is stored about the entity in various arrays.
In the first case, forgetting about an entity so we can reuse the memory is just deallocating a block. This is relatively straightforward, to the point that we can automate it with a garbage collector—though it's not so straightforward that Valgrind never finds bugs when we do it by hand.
In the second case, forgetting about an entity is not impossible, but is generally considerably more involved. Even in the relational model, which one could reasonably claim is a sort of systematization or rationalization of parallel arrays, lifetime management with foreign keys is sometimes tricky.
With a pure parallel-arrays model, you don't have any pointers/references, so you can't really run a garbage collector at all. (However, for example, Microsoft BASIC-80 had a kind of garbage collector for strings, despite its semantic model at the user level being devoid of any kind of referencing.) With something like Numpy, you can run a garbage collector, but your use of parallel arrays prevents it from reclaiming space dedicated to data about forgotten entities. Instead, it can reclaim space dedicated to forgotten attributes.
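A tiny illustration of why that compaction can't be automated (entity data here is made up): once entities are bare indices into parallel arrays, removing one entity moves another, and every stored index has to be patched by hand.

```rust
fn main() {
    // Parallel arrays: hp[i] and x[i] both describe entity i.
    let mut hp = vec![10, 20, 30, 40];
    let mut x = vec![0.0, 1.0, 2.0, 3.0];

    // "Forget" entity 1 by swapping the last element into its slot.
    hp.swap_remove(1);
    x.swap_remove(1);

    // Entity 3's data now lives at index 1; nothing in the runtime can know
    // which stored indices meant "entity 3" and rewrite them for us.
    assert_eq!(hp, [10, 40, 30]);
    assert_eq!(x, [0.0, 3.0, 2.0]);
}
```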
Yes, I have found that Rust's ownership system works really well for single-threaded cases, especially when you can get away with a functional-ish programming style, but when you have large amounts of data that have to be shared between different parts of the program, it starts to get tricky.
As you mentioned game programming, one common approach there is to have large arenas of structured memory which you use thread-pools to dispatch work over. I know there are some libraries which do this in Rust, but it seems like there's room for some language innovation here to figure out a memory model which would make this style of programming simple and ergonomic while maintaining safety.
Is it really different from C though? (But with some runtime intervention instead of UB)
Rc and Arc maintain the "be sure that you'll only free this once" invariant automatically. In C you have the same invariant; you just have to maintain it by yourself (or use an ad hoc reference counting mechanism), and if you don't, it's a double free (UB). And in both cases you are responsible for ensuring the memory is freed at least once.
A Mutex ensures you don't perform one read and one write at the same time. This is also forbidden by the C memory model, and you'll probably end up using a mutex too (or other concurrency primitives) with the exact same risk of deadlock.
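The automated free-once invariant is visible through Rc's reference count (a sketch):

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(String::from("shared"));
    assert_eq!(Rc::strong_count(&a), 1);
    let b = Rc::clone(&a);            // bumps the count instead of copying
    assert_eq!(Rc::strong_count(&a), 2);
    drop(b);                          // decremented; freed exactly once, at zero
    assert_eq!(Rc::strong_count(&a), 1);
}
```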
C allows much more flexibility wrt lifetimes than does rust.
For example: a common pattern in video games is the frame-local allocation. You (cheaply) allocate some object at the beginning of a frame. Share references to it all over the place. At the end of the frame, you (cheaply) free all the objects that were allocated in this manner. This cannot really be expressed in terms of rust's ownership system.
More generally, rust enforces local correctness, while c allows for global correctness (a weaker condition). Consider, as a simpler example, this c function which I write frequently:
char *tmpstr(void) {
    /* 16 rotating static buffers; each call hands out the next one */
    static char buf[16][1024];
    static int i = 0;
    char *ret = buf[i];
    i = (i + 1) % 16;
    return ret;
}
This function is not locally correct; in a vacuum, it is impossible to say whether this function violates invariants. (Ignoring the problem of overflowing the 1024-byte buffer.) There are many programs using this function which are correct; then, there are some programs which call this function more than 16 times in a row and end up with undesirable behaviour as a result.
Rust requires that every function be locally correct. Papering over some details: for any sound function f and hole-containing program P[_], if P[f] is well-typed then P[f] is sound; that means that it's always meaningful to consider f without considering the context in which it will be called. There is no such guarantee in c.
Soundness is a very useful attribute for rust to have, because it means that libraries can be meaningfully type-checked (not to mention it significantly simplifies the type system), but it would be naive to say that rust simply formalizes the ownership model you would have used anyway in c.
Yes, of course you can express this in rust. The topic at hand is whether it can be expressed in terms of rust's ownership policies. Your example shows that it cannot; it returns an object of static lifetime, avoiding the issue completely.
The topic at hand was the comparison between C and Rust in terms of memory model. Ownership policies are one restriction safe Rust uses to ensure that the user adheres to its memory model, but Rust is not limited to safe Rust, and you can opt out of ownership policies if you use raw pointers.
The difference between Rust and C in terms of memory model isn't ownership, it's “mutable xor shared”.
I cannot understand your question about "ownership policies". Rust has only one ownership policy: one owner per object.
It's possible to use pointers to circumvent the restriction and then control when an object is destroyed using ManuallyDrop[0], so the burden of memory bookkeeping is shifted from the Rust compiler to the programmer.
> C allows much more flexibility wrt lifetimes than does rust.
Because C does not have lifetimes at all, or ownership as a formal language concept, and thus does not run head first into the completeness problem.
Which is one of unsafe’s use cases: when the compiler can’t prove the code is correct (because there will always be correct programs the compiler can’t prove, and there are currently a fair bit of them), and you can, and there are no suitable safe alternatives, you use unsafe.
> For example: a common pattern in video games is the frame-local allocation. You (cheaply) allocate some object at the beginning of a frame. Share references to it all over the place. At the end of the frame, you (cheaply) free all the objects that were allocated in this manner. This cannot really be expressed in terms of rust's ownership system.
Of course it can, though it requires APIs which are not there (in the stdlib, yet): a frame local allocator and custom per-object allocator. All frame-local objects use that and thus keep a reference to the allocator.
And it avoids storing frame-local objects in locations which outlive the frame.
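One safe-Rust approximation of the pattern (a sketch, not a real arena crate: real arenas hand out references tied to the arena's lifetime rather than indices):

```rust
// Frame-local allocation approximated with a per-frame Vec:
// allocation is a push, and ending the frame frees everything at once.
struct FrameArena<T> {
    items: Vec<T>,
}

impl<T> FrameArena<T> {
    fn new() -> Self {
        FrameArena { items: Vec::new() }
    }
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1          // "share" the object by index
    }
    fn get(&self, id: usize) -> &T {
        &self.items[id]
    }
    fn end_frame(&mut self) {
        self.items.clear();           // cheap bulk free at frame end
    }
}

fn main() {
    let mut arena = FrameArena::new();
    let a = arena.alloc("particle");
    let b = arena.alloc("sprite");
    assert_eq!(*arena.get(a), "particle");
    assert_eq!(*arena.get(b), "sprite");
    arena.end_frame();
    assert!(arena.items.is_empty());
}
```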
> This function is not locally correct; in a vacuum, it is impossible to say whether this function violates invariants.
That is also a use-case for unsafe: an unsafe function is one for which the caller is responsible for checking invariants.
This is unsafe anyway because modifying globals is unsafe to start with.
>> cannot really be expressed in terms of rust's ownership system
> Of course it can
You cannot share objects using rust's ownership system.
Interior mutability involves an unacceptable performance hit (and doesn't actually allow you to share anything). And if you use shared indices into a centralized array, then you have blinded the ownership system, creating pointers that it cannot make sense of.
>> This function is not locally correct; in a vacuum, it is impossible to say whether this function violates invariants.
> That is also a use-case for unsafe: an unsafe function is one for which the caller is responsible for checking invariants.
That's exactly right, and was my thesis; rust's type system does not allow you to prove the correctness of such a function, because it is neither correct nor incorrect without context. 'Unsafe', as a subversion of the type system which allows unsound behaviour, is completely uninteresting wrt the topic at hand; the comparison is a purely theoretical one between non-'unsafe' rust—that is, rust's ownership model—and c.
Interior mutability does not have uniform performance characteristics: Cell has a very different cost (that is, none) than RefCell (which involves checking a flag when accessing). Etc etc.
Unsafe does not allow unsound behavior, or rather, it lets you write unsound behavior but that is a bug. You still have to uphold various invariants in unsafe code. It’s there for when you know you can, but the compiler cannot, which is an inherent property of any sound system.
> Unsafe does not allow unsound behavior, or rather, it lets you write unsound behavior but that is a bug. You still have to uphold various invariants in unsafe code. It’s there for when you know you can, but the compiler cannot, which is an inherent property of any sound system.
I don't understand what you're saying here.
Rust-without-'unsafe' is a sound language; i.e. any program which types will be unable to violate runtime invariants.
Rust-with-'unsafe' is an unsound language; i.e. there are programs which type but yet violate runtime invariants.
That is a bug in those programs—sure, yes, of course. It is not a bug in rust; it is the language functioning as intended. However the topic at hand is the relationship between rust's type system and c's; hence unsafe is completely uninteresting, because it is a tool for subverting rust's type system.
(It occurs to me now that I might have gotten a less generally defensive response had I made the original comparison between rust and a hypothetical rust without unsafe. Oh well)
> However the topic at hand is the relationship between rust's type system and c's; hence unsafe is completely uninteresting,
Gotcha, I was just trying to clarify the semantics of unsafe, lots of people think it suddenly lets you do whatever you want, and while it will let you, that doesn't mean that you can, which is an important distinction. You're right that this puts it in the same general category of C in a theoretical perspective, I think practice will bear out that it's not a binary proposition, but time will tell :)
If manually managing memory is like wielding a gun, the borrow checker is an automatic safety that prevents you from pulling the trigger when you're roughly pointing it at yourself. But it's coarse-grained and errs on the side of caution; it simulates your foot as the rectangular box that would contain it, not as a detailed 3D mesh. If you really think you can aim it between your toes and avoid hitting yourself (for example, "the value returned by this function must remain alive for no more than 15 successive invocations of this function"), unsafe will let you try, but the borrow checker's built-in rules aren't granular enough to help you, though it will still stop you if you accidentally put your hand in front without declaring it.
C is like a laser gun in a room full of oddly-shaped mirrors with no safety at all. Good luck.
> it's coarse-grained and errs on the side of caution
That's what type systems in general do. There is no type system for a turing-complete language that is both sound and complete. (Trivial proof: if (undecidable) halt; unsound.)
> That's exactly right, and was my thesis; rust's type system does not allow you to prove the correctness of such a function, because it is neither correct nor incorrect without context. 'Unsafe', as a subversion of the type system which allows unsound behavior, is completely uninteresting wrt the topic at hand;
For the record, the “topic at hand” was the memory model not the type-system, nor the theorem proving capabilities of the borrow checker.
Your arguments are not wrong per se, but they are completely missing the topic.
Upon further reflection, I don't think this 'local/global correctness' distinction was expressed very well or very usefully. The 'hole' example papers over so many details as to be misleading.
So here's another, hopefully better expression:
Rust's type system wrt lifetimes is not very expressive, compared with 'dynamically-typed' c. It can't represent things like 'the value returned by this function must remain alive for no more than 15 successive invocations of this function'.
A lot of useful C programs can be written without dynamic memory allocation or multithreading though, while Rust seems to assume a sort of "base complexity" even for very simple programs as soon as memory access is involved.
Rust doesn't assume an allocator. The Rust standard library (to have things like growable strings) requires an allocator but the language and its core do not.
And Rust doesn't assume threading either. The default stuff isn't even thread safe, it's just that in Rust when you introduce threading and still try to use stuff that isn't thread safe the compiler will tell you that can't work.
You can deliberately write stuff that won't do what you meant, of course, but they're trying to make it harder to do that by accident. I have a set of Rust types which defy logical behaviour (for example, a type whose members always compare as equal yet hash differently), and as intended, if you use those in otherwise safe code your program might not work, but it doesn't cause random craziness. For example, your attempt to binary search an ordered list of my data types - which claim they form a total order yet always reply "I'm smaller" when asked how they compare to anything, even themselves - might become an infinite loop, but it won't suddenly execute environment variables as code or anything like that, whereas in C all bets are off.
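A sketch of one such logically-broken type: the comparison lies, so binary search gives an unspecified answer, but the program stays memory-safe (and with std's bisection it terminates here rather than looping):

```rust
use std::cmp::Ordering;

// A type that claims a total order but always answers "I'm smaller".
#[derive(Debug)]
struct AlwaysLess;

impl PartialEq for AlwaysLess {
    fn eq(&self, _: &Self) -> bool { false }
}
impl Eq for AlwaysLess {}
impl PartialOrd for AlwaysLess {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl Ord for AlwaysLess {
    fn cmp(&self, _: &Self) -> Ordering { Ordering::Less }
}

fn main() {
    let v = [AlwaysLess, AlwaysLess, AlwaysLess];
    // Unspecified result, but no out-of-bounds reads and no UB:
    let result = v.binary_search(&AlwaysLess);
    assert!(result.is_ok() || result.is_err()); // it returned; nothing caught fire
}
```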
> And Rust doesn't assume threading either. The default stuff isn't even thread safe
Right, and Rust doesn't even know what a thread is, actually. It's all implemented in the standard library. The only thing Rust knows is that there exist some special traits (Send and Sync) which are automatically implemented if every field of a struct implements them. And that's it.
> The static analysis of the borrow checker is not able to figure out when you're headed for trouble via this route.
This route exists purely because you can't statically prove some programs correct. Without it, the compiler would have to reject some correct programs and the language would be more limited.
It is unfair to complain about this when the GC-based alternatives push all of memory management to runtime, so the surface for getting panics is a lot larger. Rust's solution is not perfect but still the best you can get.
> Designing the data structures for something tightly coupled with concurrency, like a window manager or a game, is difficult.
It is difficult in any language. The difference is that in Rust you face that difficulty when trying to design/compile your program, and in other more permissive languages you face it when your customer reports a bug. The cost of fixing problems is lowest when they are detected early, so Rust has a clear edge here.
I'm not sure someone (including OP) would be "complaining" about it, considering that "this enforces borrow checker rules at runtime" is like the first or second thing in the documentation for RefCell and friends.
The frustrating bit about it is not that you get panics or deadlocks at runtime, that is to be expected when you can't get them at compile time, after all. The nasty bit is that this memory model makes it very difficult to work with tightly-coupled, concurrent state, and you have to go through all the RefCell<Rc<Whatever>> escape hatches, at which point you're not that far from C++ hell (edit: i.e. you "pay extra", in the form of the awkward escape hatches, to get not that far from where you were before).
In theory (or, if we're being realistic, in introductory tutorials) it's possible to avoid the escape hatches if you are careful to structure access to your data juuuuust right. That works out fine if you're the only person working on an infrequently-changing program with a small model that you understand very well. If everyone needs to hook into the first two or three function call levels of the main loop, and if your model is anything but small, that doesn't work out anymore. Six weeks into the whole thing and people spend more time un-breaking builds than writing code.
Also, seriously, not every comment that says "I wish Rust did X better" is a wanton attack upon the technological excellence and purity of the one perfect language to rule them all...
I have written a few parallel, non-trivial Rust programs (actually one of them, fclones, seems to be one of the best in class performance-wise, despite many competitors), yet I've never had to use RefCell and runtime-checked borrowing.
Yes, there were a few Arcs here and there, but 99% of the time the simplest move / clone / compile-time borrow is enough. Definitely very far from the place I'd be if I coded that in C++.
Oh, I knew I'd recognized your username from somewhere :-D. fclones is actually my favourite dupe finding tool!
In my experience the underlying model is what determines these things. That's not always under one's control. Sometimes you can't meaningfully trace ownership, or the development model (short iterations by a large team) make it hard to keep it in sync. I know of non-trivial programs that rarely (if ever) use runtime-checked borrowing but runtime-checked borrowing is, nonetheless, a thing. Some developers need it.
Oh yeah, definitely. It is good to have all these escape hatches at hand. But it is not good to default to them - I guess you could program Rust like Java by throwing everything into Arc&lt;RefCell&lt;..&gt;&gt;, but that's not the way to go IMHO.
In your experience, what would be a better approach to dealing with “tightly-coupled concurrent state”? I mean, one that a “sufficiently smart compiler” can verify, not just hoping it works at runtime (as in C)?
One obvious approach is to de-parallelise the state (i.e. actor model) but that’s almost cheating (“we solved the problem by removing the problem”) and probably not appropriate for some applications that truly require concurrent state.
Also, are you aware of any codebase that I could take a look at, that is an example of concurrent state?
I'm gonna go with the unpopular opinion here: the "better approach", as in, the one that allows you to have arbitrary state coupling and takes the human factor out of the equation, is garbage collection.
Rust's trade-off (you get compile-time guarantees in exchange for requiring you to structure data accesses in a particular way) is perfectly valid, but it's still a trade-off. As with any trade-off, you run into the part that's traded for something else sooner or later :-).
Non-deterministic garbage collection applies to a tiny fraction of the state problem though - the memory part. It does not apply to other types of resources: file descriptors, sockets, database sessions, connections, threads, etc. - generally anything that needs proper disposal and is not (only) memory.
It is also worth noting you can perfectly have garbage collection in Rust - reference counted pointers or epoch GC. Refcounted pointers have the advantage of being deterministic.
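Deterministic disposal is what lets Drop cover non-memory resources; this sketch logs drop order to show it happens at a known point (the resource names are made up):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Stand-ins for resources needing proper disposal (sockets, sessions, ...).
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name); // runs at a deterministic point
    }
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _conn = Resource { name: "connection", log: Rc::clone(&log) };
        let _file = Resource { name: "file", log: Rc::clone(&log) };
        // both dropped here, in reverse declaration order
    }
    assert_eq!(*log.borrow(), ["file", "connection"]);
}
```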
The ownership is part of the type-checker and disappears at runtime.
At runtime (and in memory) you are left with C's memory model (pointers but also can nest records) and with the other things you mention.
I don't immediately see why something like a window manager might need a data structure which requires multiple concurrent writers. Or, while at it, why can't events coming to each window be perfectly serialized and not require any concurrent mutation. Could you share an example?
Serialization does not eliminate concurrency; it is a strategy for dealing with it. That is, when your abstract model requires concurrent writes, you can instead add the operations to an event queue and process them serially. The event queue itself is still concurrently written to. You also still face complex problems with dependent writes - e.g. when two writers want to increment the same place (even worse when they need to apply a more complex function than incrementing, and worst of all when they want to apply a non-pure function to the current value).
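A sketch of that strategy with std's mpsc channel: four writers race on the queue, and a single consumer applies the operations serially:

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for _ in 0..4 {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                tx.send(1u64).unwrap(); // concurrent writes hit the queue, not the counter
            }
        }));
    }
    drop(tx); // close the last sender so the receive loop can end

    // The single consumer is the only writer to `counter`.
    let mut counter = 0;
    for delta in rx {
        counter += delta;
    }
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(counter, 4000);
}
```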
I agree, but I still can't see where a WM might need to concurrently write to a per-window event queue. Also, I can't see why pushing onto a queue can't be blocking (with a timeout) and use a mutex, if multiple threads need to write to it.
Wow, I'm surprised I never ran across this excellent essay before.
I also liked the use of the term "memory model" in this case to mean conceptual model, because the metaphors and affordances of a language are what give (or restrict) its expressive power and the level of abstraction you can bring to bear. A mixed blessing.
Thanks, that means a lot! I think it kind of falls short of some of its initial promises, and clearly needs revision, but there are some good nuggets in there. As one example of what needs revision, I drew the diagrams after I wrote the text, and I realized that what I was describing in the text was far more complicated than what I really wanted to draw... but then never went back and revised the text.
If this is the kind of thing you like, you might find more things to your taste in http://canonical.org/~kragen/dercuano, which touches on, among other things, the bizarre memory model of the Zork Z-machine.
For a long time, I have been pondering a high-level 'programming' language for structured data. It is far from finished, but I am attempting to deal with the problem of different memory models from the other end: starting with an ideal model and trying to abstract away from implementations.
I have not addressed the problem of transactions and cooperation yet. Often the problems with implementations of data models start becoming 'interesting' when several actors are querying and changing the data at the same time, especially when the dimension of distribution is added.
Is Dercuano available as individual HTML pages that can be clicked around (and indexed by search engines)? So far I've only found it as a giant blob (PDF or tar) and I'm wondering why that seems to be its primary presentation.
Hi Ping! I miss you a lot! The primary representation of Dercuano is a giant blob both to improve the chances that copies of it will survive after I'm dead and to more clearly establish a publication date for purposes of patent prior art. The PDF isn't very good (I hacked together a PDF-rendering script with reportlab in five days before the deadline, and even the CSS for the HTML version is pretty amateurish) but there are a variety of places that can view or archive PDFs but not tarballs of HTML.
I’m only about half way through, but I’m finding it extremely helpful, clear and well-written (some of the terminology/ideas I’m not familiar with/are over my head, but I plan on going back over it again later and looking stuff up :) - Thanks so much!
Not at all! I’m obviously not qualified to say, but it seems to me to be a great overview of a very complicated topic - clear, concise, comprehensive and not patronising. Stuff I don’t understand off the bat I can always look up, but it gives me a framework to begin to hang stuff on/explore from. Even if you had written several tome-like textbooks you couldn’t have compensated for my ignorance of the basic principles of computer science/informatics/programming etc.! :)
As a compiler writer, when I hear the term "memory model", I immediately assume it's referring to a multithreaded memory model, and is going to discuss when writes are guaranteed to be made visible to other threads.
A better term for what's discussed here would be the "object model"--you're effectively describing how you would represent a conceptual object in different memory organization schemes.
Yes. But we mostly have one memory model today - one big flat memory space, because C.
There are other approaches, but most of the clever ones died off. x86-32 has lots of interesting things you can do with segments. Nobody used that stuff, so amd64 has flat memory.
Burroughs once had a memory system that worked like pathnames. Think of an address as being "job/program/module/function/variable[n]", not a number. User programs could not see the physical memory address. This allowed the OS to do swapping and compaction underneath the program. The hardware would catch subscripting out of range. Worked fine, never caught on.
Today we do that with collection classes, rather than at the OS/hardware level.
Memory models are making a comeback, as we hit the limits of caches. Machines with some memory shared, some memory unshared, and some slower if shared have appeared.
> But we mostly have one memory model today - one big flat memory space, because C.
Not really. The C (or C++) memory model isn't really well-defined yet; there are still fights going on regarding e.g. values out of thin air (or maybe they've been resolved recently, I haven't checked).
In contrast, Java has a well-defined memory model without any undefined behaviour, that can get quite tricky at times (e.g. what happens if you write to a final field (!!) (you first need to make it non-final using runtime reflection operations or use the Unsafe module)).
Interesting, thanks! I'll read the paper in its entirety later, but suffice to say, Hans and I disagree about whether all data races are errors, and about what a reasonable compiler is.
For example, take this code:
    // thread 1:
    with lock(counter_lock):
        print(counter)

    // thread 2:
    with lock(counter_lock):
        atomic { counter += 1 }
There's one of two possible linear executions of this code: either you print the old value of the counter (n), or the new value (n+1). That's it.
I argue that this code is correct, and should remain legal, even if locks are elided (provided that the write to counter is actually atomic, i.e. doesn't result in a torn write / read). In fact, if the lock is implemented as a spin lock, that's exactly what happens even with locks! I also think that no compiler is "reasonable" if it breaks this code, or invalidates the whole program ("undefined behaviour means anything can happen").
Edit: He actually mentions a similar problem in the paper, and argues that it's valid for the compiler to "optimize" the code and break it. I think the original problem here is that C++ compilers implicitly assume the right to perform global optimisations. I would argue that it's better to move to a local-optimisation model, because it's easy to modify code to make it locally optimisable (just read all variables into local copies, and adopt a Rust-like ownership model for structs/arrays etc. so the programmer can tell the compiler that a block of memory is local).
The pseudo-code you listed does not contain a data race by the author's definition since there is no simultaneous read/write assuming words have their usual meanings. As a result I believe it would not be UB in C++ today.
The protest that languages like C or C++ should just leave Undefined Behaviour alone so it will Do What You Mean is probably almost as old as I am and even more futile. You don't actually want that, the languages go as fast as they do because optimising compilers assume undefined behaviour can't happen.
The Java model tries to deliver what you wanted here and it fails. You still get surprising behaviour sometimes (defeating the objective of ensuring that the behaviour with data races is something you can reason about) but now you can't do certain optimisations that seem otherwise reasonable because they defy the promises the model made.
There are people in this space who believe that if they tighten the model even further, they can get rid of the surprising behaviour while retaining most of the performance. That's where OCaml seems to be going. Maybe it will work.
The Rust model is the opposite, Rust evades the problem with its rule about modification XOR multi-reference. Either one thread can write this data, or one or more threads might read the data, but never both and thus no data races. Instead of ensuring you're able to understand the residual data races (and maybe conclude they're "benign") Rust ensures your program doesn't have any.
The pseudocode isn't UB because of the locks, but my argument is that it shouldn't be UB even if the locks were gone. In that case, there is a simultaneous read/increment, but it's still obviously linearizable exactly the same way as with locks, so it still shouldn't be UB. (Or, if the memory fence caused by atomic increment instruction makes it non-UB, then just imagine overwriting a flag without any fences instead.) In Rust it's not even possible to write such code.
Another example is Figure 5 in this paper [1] (referenced from Boehm's paper above):
    x = y = 0

    // thread 1:
    a = x;
    b = x;
    if (a == b)
        y = 1;

    // thread 2:
    c = y;
    x = c;
My point is that assuming a == b is not valid if x is a global variable. That's never a valid assumption for the compiler to make, unless it can prove that x isn't accessed from other threads (just as a human would have to prove before performing the optimisation by hand).
I haven't been tracking OCaml's progress too closely (multicore is just taking too long), but hopefully it's going in this direction... if you've any relevant references / descriptions of their memory model, let me know!
> In Rust it's not even possible to write such code.
What does "such code" mean though? Rust won't let you say that your mutable variables are just shared between threads, because that's unsafe. But if you say OK, I want atomic data types (and so they lack tearing and other things you were bothered about) then you can clone an Arc of them, and write code to manipulate the atomics whose consequences are just as under specified. It's just that now you have to spell out that this is what you wanted, writing e.g. std::sync::atomic::Ordering::Relaxed
If you later complain to me that this code surprised you because the memory ordering wasn't what you expected, I will point at that symbol. If you wanted some specific ordering then "Relaxed" was wrong.
> My point is that the assumption that a == b is invalid if x is a global variable.
No it isn't? This is an important optimisation, which you would prefer to sacrifice rather than just writing what you mean in the first place.
> if you've any relevant references / descriptions of their memory model, let me know!
Sure, however now your atomic has 'static lifetime so it lives forever and that's sort of cheating. The reference counter means nobody in particular needs to be in charge of cleaning up, whoever has it last will drop it. Call it personal preference.
Also, is the kinda broken look of your blog an aesthetic choice or a bug? Is "![Today%20is%20my%20first" really what ought to display under the title? I feel like it probably isn't.
I don't see how 'static is cheating, that's the semantic you want: to be able to live as long as the various threads. If you used scoped threads you wouldn't need 'static, but those are less general, so I didn't get into it.
No, it's just buggy. :( Thanks for mentioning though, that bug is not one I had seen just yet.
> C (or C++) memory model isn’t really well-defined yet
To the limited extent this is true, it's about pragmatism.
In Java if the situation is that correct Java now has to be 5% slower then a new JRE is shipped that's 5% slower and some people are unhappy but life goes on.
In C++ if the situation is that correct C++ now has to be 5% slower then C++ programmers will mostly demand an "Incorrect-but-faster" setting on their compiler, and they continue to use the faster code.
So, in practice the fine details of the memory model must reflect the practical reality of the physical computers or else they're useless.
> But we mostly have one memory model today - one big flat memory space, because C.
This is not what is meant by 'memory model'. What it really specifies is how memory operations can be ordered, and when a write made by one thread is seen by another: acquire, release, relaxed, etc.
That’s not what GP is talking about. "Memory model" in a multithreaded context means the guarantees about memory-level interaction of concurrent threads.
I think the flat memory model predates C significantly. Didn't the earliest LISP (195x) already have a flat, garbage collected memory? In fact, C is one of the few languages which actually exposes the possibility of a segmented memory model to programs.
This topic is outside the realm of my knowledge, but at the language level, we do have that (compacting GCs), and I think we do have it at the OS level as well due to virtual memory (but hardware accelerated TLB).
Yeah, "memory model" is a terribly overloaded term (myself, I think small, compact, medium, large, huge), and "object model" has the same problem. Maybe I should just generate a random neologism.
"Memory model" has yet a third meaning, referring to interference between operations in multiple CPU cores, and mediated by variously-shared parts of cache systems of varying complexity.
Here, complexity of the memory model is traded off against the complexity of the primitives used to manage the interference (compare-and-set, load-locked/store-conditional, etc.), and against the complexity of the cache system.
Java's original memory model, perhaps the first formally specified, turned out to be unimplementable on common actual machines. The next language to get a formal memory model was C++, followed shortly by C and then others.
Real multicore machines have sorted themselves out into two models of cache coherency; one more forgiving represented by mainstream x86 and now Apple M1, relying on deeply complex cache designs, and the other by the remaining lottery survivors, non-Apple ARM, and Power. The various RISC-V implementations will probably straddle the divide, like ARM/ARM-M1.
Java, C++, and C can with some discipline be coded to work transparently on either, but very often programs accidentally depend on the x86/Apple model, never having been tested beyond it. Rust, absent "unsafe", is happy on either. The Linux kernel works on either, and once supported other, more exotic variations.
The "pipes" section has some better examples - Lucid is a fantastic one.
My own programming language Intonal (https://intonal.io/text) is a practical continuation of Lucid's ideas, in the first steps of seeing production use, combined with a "Haskell-lite" type system (control structures, typeclasses, second-order functions, custom data types/enums, etc). Turns out the "pipes" or data-flow approach works really well for audio; it also turns out there was/is a lot to figure out about how memory works in such a language.
So the "program Fortran in any language" (using only arrays as illustrated in the linked paper[1]) is interesting. I didn't know it was called that since I didn't learn Fortran. (I just flashed back to a memory of someone I worked with around 2000 who said he wrote Fortran in Perl.)
Once, I was in a hurry to write a Tetris clone using pygame library, and was just starting to learn Python, so I read just enough of the Python tutorial to get started (but didn't get to classes) and then wrote it without any classes (structures), and just used all parallel arrays instead. It reminded me of the way I originally learned to write text adventures in Basic (from a short book in the 80s about how to do that, by Christopher Lampton).[2]
I've been interested in Forth lately, so it was interesting to see him bring that up as well.
i am going to echo a lot of the comments here and say that this is an eye-opening essay. it got me to thinking things that i would not have otherwise. that is a very high bar to clear, which very rarely happens. for any random bit of text on the internet you might choose to read, you can't expect much more than that. in that respect, it is a wild success.
on the other hand ... i believe kragen has deep knowledge of this concept he's brought up, but he has failed to convey its specifics to people who don't already understand it. i acknowledge i may well be too stupid to grasp all this, but it would have been a nice bonus if i could have.
i often wonder: why does any reasonably worthwhile program, the type other people will pay for, always require at least 50,000 lines of code? is that really necessary? surely we can do better than that. but if there's a way, apparently nobody has found it yet. i hope people like kragen help push the state of the art forward, so we might someday find a way around that.
> i often wonder: why does any reasonably worthwhile program, the type other people will pay for, always require at least 50,000 lines of code? is that really necessary? surely we can do better than that. but if there's a way, apparently nobody has found it yet. i hope people like kragen help push the state of the art forward, so we might someday find a way around that.
It's unlikely you'll get many people to pay for automating a simple process, so "serious programs" will generally only be the ones that automate complicated processes. So from the start, the pure algorithm you're going to implement will usually be complex. Then, to actually implement that algorithm on a real machine, you'll have to deal with all of the specifics of that machine as well, which adds significant boiler-plate even when coding in Python or Haskell, not to mention programming in C++ or Rust. This boilerplate typically grows exponentially with the size of the algorithm you have to implement, and with the size of the data that will be processed.
It's also important to remember that many "simple" operations, such as "print the date one hour from now", or "read the user's name" hide huge complexity encoded in human context and history.
> It's also important to remember that many "simple" operations, such as "print the date one hour from now", or "read the user's name" hide huge complexity encoded in human context and history.
obviously true. but are those operations more complex than, say, what a modern cpu does? think of the effectively infinite number of details of computation that the cpu i am using to type this message is hiding from me. i don't need to know. it's all taken care of. but for some reason, we can't embed "print the date one hour from now" into the operating system, so that particular bit of code only has to be written once and then reused infinitely, like my cpu's microcode?
programmers know why this is harder than it looks, of course. but is it really unsolvable? i hope not.
The "lisp" section seems to conflate "object graphs" and object-oriented programming. It talks about classes while talking about garbage collection and before mentioning "object-oriented".
Lisps, Haskell, and ML have a different model from the other languages in the list and don't have classes; objects are usually arrays of pointers with no special fields. Also, hash tables are usually implemented inside the language.