May. Not will. The difference here is important, because the actual memory ordering observed is a matter of hardware implementation choice (and of course of local vagaries like cache line alignment, interrupt order, and the behavior of other CPUs on the bus). You can't just write some sample code to demonstrate it and expect it to behave the same on "ARM".
In fact I'd be really curious how Apple handles this during the Mac transition. I wouldn't be at all surprised if, purely for the sake of compatibility, they implement a strongly ordered, x86-style cache hierarchy. Bugs in this world can be extremely difficult to diagnose, and honestly cache coherence transistors aren't that expensive relative to the rest of the system.
In practice that's the stuff that's expensive.
Slightly longer version of the same claim here:
I prefer the term "core-local" because I think it's somewhat more accurate: it can include things like delayed processing of invalidations, and sibling-core interactions under SMT, which might not fall under "pipeline" effects but are still local to the (physical) core.
I can guarantee it's not for "everyone making CPUs". I'm literally writing code as we speak on a multi-CPU, cache-incoherent system. It's a big world.
(not a hardware guy)
assert!((samples as u32) <= u32::MAX);
EDIT2: As I thought, casting a `usize` (which is 64 bits here) to a `u32` truncates it, and hence the assertion is always true. Further, by using a number that's bigger than a `u32`, this example contains undefined behavior: `slice::from_raw_parts` is called with `self.samples` still a `usize`, and so it takes a much bigger slice than what was allocated (the leftover of the truncation). I made a small playground which demonstrates the segfault: https://play.rust-lang.org/?version=stable&mode=debug&editio.... The assertion should rather be:
assert!(samples <= u32::MAX as usize);
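A minimal standalone sketch of the truncation, assuming a 64-bit target (names made up for illustration):

    fn main() {
        // usize is 64 bits wide on a 64-bit target.
        let samples: usize = u32::MAX as usize + 1; // 2^32

        // Truncating cast: the low 32 bits of 2^32 are all zero,
        // so this check can never fail, no matter how big `samples` is.
        assert!((samples as u32) <= u32::MAX);

        // Widening the constant instead actually catches the overflow.
        assert!(samples <= u32::MAX as usize); // panics here
    }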
I think this issue might prove a problem in the long tail of desktop and server software running on ARM.
A lot of desktop and server applications try to take advantage of all the cores. Many of them use libraries that were either implemented before C and C++ had defined memory models, or written without much care for the memory model so long as the code ran without issues on the developer's machine (x86) and the server (x86). Moving to ARM is going to expose a lot of these bugs as developers recompile their code without making sure it actually adheres to the C/C++ memory models.
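For a concrete flavor of the kind of bug involved, here's a sketch of the classic message-passing pattern (in Rust, since that comes up elsewhere in the thread). With `Relaxed` on both stores it would still happen to work on x86's strong ordering, but the language memory model, and ARM hardware, require the `Release`/`Acquire` pair:

    use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
    use std::thread;

    static DATA: AtomicU32 = AtomicU32::new(0);
    static READY: AtomicBool = AtomicBool::new(false);

    fn main() {
        let producer = thread::spawn(|| {
            DATA.store(42, Ordering::Relaxed);
            // Release: the DATA store becomes visible before READY does.
            // With Relaxed here, ARM may let READY become visible first.
            READY.store(true, Ordering::Release);
        });
        let consumer = thread::spawn(|| {
            // Acquire pairs with the Release store above.
            while !READY.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            assert_eq!(DATA.load(Ordering::Relaxed), 42); // guaranteed
        });
        producer.join().unwrap();
        consumer.join().unwrap();
    }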
Developers will become more aware of the differences between the architectures, toolchains will accommodate both better, and people and software will stop assuming x86 by default. ARM won't "win" the desktop or the server market, but it will become a viable alternative, squeezing the profits of companies who depend on x86.
That remains to be seen.
Also, the Raspberry Pi has been a popular choice for tinkerers for years, which also helps with ARM penetration.
Most libraries and applications use stuff like mutexes, btw :) It's not like people who don't care about memory models often try to build lock-free things.
Also, multi-threaded programming is hard, so the really long tail is probably already buggy on x86. Surfacing those bugs more often may be a blessing in disguise.
It is true, though, that with near-total x86 domination in the server market for the last 20 years, newer server software might have a lot of x86-isms.
It wouldn't surprise me if server focused ARM chips ended up providing x86 style ordering to ensure compatibility with ported software.
There's still plenty of room to undercut Intel's historical 60% gross margins even if you're shipping largely interchangeable products.
The other way it might play out is a race to very high core counts as the main differentiator: single-socket performance good enough to be worth the hassle of not being able to run everything on it rock-solid. Postgres will work great. That Redis fork that adds threads, maybe not.
Or Intel may start to allow developers to relax memory-ordering guarantees at per-process granularity in their own push toward high core counts. It's hard to imagine their current approach scaling to 1k cores. But if you'd asked me ten years ago, I'd have said they wouldn't make it to 56 cores, either.
If Intel can allow developers/OS's to relax guarantees, why can't ARM allow OS's to strengthen them as necessary, while still keeping a relaxed memory model most of the time?
The amount of testing and verification involved in making sure something like Postgres runs correctly under a different memory-ordering mode is substantial. Adding the flag at the end is trivial.
This way around, software is correct by default unless it explicitly asks to be unsafe.
A cache coherence system that was itself weaker, i.e. one allowing reorderings consistent with the memory model, would make barriers and "implicit barriers" like address dependencies very expensive, and there is little evidence that this is the case.
Even in a hypothetical core whose cache coherency system was coupled to the memory model, you aren't really avoiding any coherence traffic, just allowing certain reorderings, such as satisfying requests out of order.
Is this a "Rust-ism"? I did a double-take while reading that, because in C that would mean a null pointer, and in the terminology I'm used to, the intent is to set the pointee to 0.
Note that x86 does allow some memory reordering: a store followed by a load from a different location can be observed out of order, because the store can sit in the store buffer while the load completes (StoreLoad reordering).
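A sketch of the Dekker-style litmus test that exhibits this (Rust; `Release`/`Acquire` here compile to plain MOVs on x86, and you'd have to loop the experiment many times to actually catch it):

    use std::sync::atomic::{AtomicU32, Ordering};
    use std::thread;

    static X: AtomicU32 = AtomicU32::new(0);
    static Y: AtomicU32 = AtomicU32::new(0);

    fn main() {
        let t1 = thread::spawn(|| {
            X.store(1, Ordering::Release);
            Y.load(Ordering::Acquire) // r1
        });
        let t2 = thread::spawn(|| {
            Y.store(1, Ordering::Release);
            X.load(Ordering::Acquire) // r2
        });
        let (r1, r2) = (t1.join().unwrap(), t2.join().unwrap());
        // r1 == 0 && r2 == 0 is permitted on x86: each store can linger
        // in its core's store buffer past the other thread's load.
        // SeqCst stores (a locked op or fence) forbid that outcome.
        println!("r1 = {r1}, r2 = {r2}");
    }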
(I have experience debugging and fixing an extremely rare bug caused by the above subtle reordering, which occurred approximately once every 3-4 months.)
It doesn’t ring a bell to me as someone who’s spent a lot of time in the Rust community, so I’d say it’s probably just a difference in personal jargon rather than a Rust versus C thing.
(Rust usually uses “reference” instead of “pointer” anyway.)
Critically, Rust references have pointer identity: you can convert between references and raw pointers, and if you convert a `*const u32` to `&u32` and back to `*const u32`, the pointers are guaranteed to compare equal. In my opinion this is unfortunate. It would be nice if the compiler could represent `&u32` in memory as if it were just `u32`, i.e. just pass the value rather than a pointer. After all, the pointed-to value is guaranteed to remain unchanged as long as the `&u32` exists. And passing by value is almost always faster for small values (where the size of a value <= the size of a pointer); the programmer could just pass by value directly, but only if they know it's a small value, which isn't the case in generic code. Unfortunately, the ability to compare pointer identities means that you really do need a pointer, even though most code never does so. (LLVM can transform pointers to values for local variables and even function arguments if it can prove that the actual code never checks pointer identity, but again, that applies to both references and raw pointers; most of the distinction between the two has been lost by the LLVM IR stage anyway.)
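A small sketch of the round-trip guarantee being described:

    fn main() {
        let x: u32 = 7;
        let r: &u32 = &x;

        // Reference -> raw pointer -> reference -> raw pointer.
        let p1: *const u32 = r;
        let r2: &u32 = unsafe { &*p1 };
        let p2: *const u32 = r2;

        // Identity survives the round-trip, which is why &u32 must
        // really be an address and can't be lowered to a bare u32.
        assert_eq!(p1, p2);
        assert!(std::ptr::eq(r, r2));
    }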
I don't think I can articulate well why it's important to keep in mind and appreciate value semantics. Certainly many C programmers, especially the newer ones, are far too concerned with value representation rather than the abstract value itself, often conflating the two. I guess one good example of why it's important to understand pointers as proper objects with abstract values is when working with pointers-to-pointers, pointers-to-pointers-to-pointers, etc. Pointer arithmetic is another case, which doesn't overlap with the former as much as you'd think. In these cases understanding pointers as values is important to understanding the semantics of a program, and more generally how to leverage the language efficiently and safely. Note that these semantics have no analog in references, the corresponding construct in, e.g., C++. It doesn't make sense to conceptualize references as independent objects; rather, a reference is a syntactic construct, no more an object than a list initializer.
Yeah, as a C programmer I would find it kind of odd if someone seemed to conceptually elide the character of pointers as proper first-class objects.
For those wondering, given the context of Rust here: both references and C-style raw pointers are first-class in Rust, but references (which are overwhelmingly more common) don't permit pointer arithmetic at all; you'd have to cast to a raw pointer first, and the arithmetic itself is unsafe.
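For instance (a sketch; the array and names are made up):

    fn main() {
        let arr = [10u32, 20, 30];
        let r: &u32 = &arr[0];

        // References have no arithmetic; cast to a raw pointer first.
        let p: *const u32 = r;

        // Raw-pointer arithmetic is unsafe: we must guarantee that
        // p.add(2) stays inside the same allocation.
        let third = unsafe { *p.add(2) };
        assert_eq!(third, 30);

        // Casting to an integer also works, though it's rarely needed.
        let addr = p as usize;
        assert_eq!(addr, &arr[0] as *const u32 as usize);
    }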
    Name    Address    Value
    p       0100       02 01   (little-endian: 0x0102, the address of v)
    v       0102       12 59
I wonder if this confusion of the pointee with the pointer is what makes this concept seem so difficult to those who didn't start with Asm (especially multiple indirections) --- I certainly saw a lot of that when I used to teach CS courses.
Actually, `id` returns a pointer to a `PyObject` representing an integer, the value of which is the value of the pointer that was passed as an argument, i.e. the address of the pointee. Right?
The content (the value) of a pointer is an address.
Ouch, that hurts. You should be proud of that fix... I guess you kinda are :D
It's a Rust-ism. In Rust there is no null.
`*const T` and `*mut T` both have an `is_null` method, and there are `std::ptr::null` and `std::ptr::null_mut` available to create null pointers.
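A quick sketch of both halves of that: raw pointers can be null, references can't, and `Option<&T>` is the idiomatic "nullable reference" (it even reuses the null representation internally):

    fn main() {
        // Raw pointers can be null.
        let p: *const u32 = std::ptr::null();
        assert!(p.is_null());

        // References can't be null; Option<&T> fills that role, and the
        // niche optimization makes it the same size as a bare reference.
        let none: Option<&u32> = None;
        assert!(none.is_none());
        assert_eq!(std::mem::size_of::<Option<&u32>>(),
                   std::mem::size_of::<&u32>());
    }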
The compiler's re-orderings will always be valid according to the abstract memory model rather than the hardware's, so even on x86 you must use the correct memory orderings, or risk subtle bugs due to compiler optimisations.
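A sketch of what that means in practice (the `Slot` type here is made up for illustration): even targeting x86, the `Relaxed` version below is broken, because the compiler may sink the plain write to `data` past the flag store before the strongly ordered hardware ever sees it.

    use std::cell::UnsafeCell;
    use std::sync::atomic::{AtomicBool, Ordering};

    // A one-shot "publish" cell, for illustration only.
    struct Slot {
        data: UnsafeCell<u32>,
        ready: AtomicBool,
    }
    // Sound only if publishing and reading use the right orderings.
    unsafe impl Sync for Slot {}

    impl Slot {
        // BROKEN even on x86: the compiler may reorder the plain write
        // to `data` after the Relaxed flag store at the IR level.
        fn publish_broken(&self, v: u32) {
            unsafe { *self.data.get() = v };
            self.ready.store(true, Ordering::Relaxed); // needs Release
        }

        // Correct: Release restrains the compiler as well as the CPU.
        fn publish(&self, v: u32) {
            unsafe { *self.data.get() = v };
            self.ready.store(true, Ordering::Release);
        }

        fn try_read(&self) -> Option<u32> {
            if self.ready.load(Ordering::Acquire) {
                Some(unsafe { *self.data.get() })
            } else {
                None
            }
        }
    }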
It's a whole new 'it works on my machine' issue (for some people).