Author of the original WTF::ParkingLot here (what rust’s parking_lot is based on).
I’m surprised that this only compared to std on one platform (Linux).
The main benefit of parking lot is that it makes locks very small, which then encourages the use of fine grained locking. For example, in JavaScriptCore (ParkingLot’s first customer), we stuff a 2-bit lock into every object header - so if there is ever a need to do some locking for internal VM reasons on any object we can do that without increasing the size of the object
> The main benefit of parking lot is that it makes locks very small, which then encourages the use of fine grained locking. For example, in JavaScriptCore (ParkingLot’s first customer), we stuff a 2-bit lock into every object header - so if there is ever a need to do some locking for internal VM reasons on any object we can do that without increasing the size of the object
IMHO that's a very cool feature which is essentially wasted when using it as a `Mutex<InnerBlah>` because the mutex's size will get rounded up to the alignment of `InnerBlah`. And even when not doing that, afaict `parking_lot` doesn't expose a way to use the remaining six bits in `parking_lot::RawMutex`. I think the new std mutexes made the right choice to use a different design.
> I’m surprised that this only compared to std on one platform (Linux).
Can't speak for the author, but I suspect a lot of people really only care about performance under Linux. I write software that I often develop from a Mac but almost entirely deploy on Linux. (But speaking of Macs: std::mutex doesn't yet use futexes on macOS. Might happen soon. https://github.com/rust-lang/rust/pull/122408)
Hypothetically Rust could make `Mutex<InnerBlah>` work with just two bits in the same way it makes `Option<&T>` the same size as `&T`. Annotate `InnerBlah` with the information about which bits are available and let `Mutex` use them.
There was talk of Rust allowing stride != alignment. [1] I think this would mean if say `InnerBlah` has size 15 and alignment 8, `parking_lot::Mutex<InnerBlah>` can be size 16 rather than the current 24. Same would be true for an `OuterBlah` the mutex is one field of. But I don't think it'll happen.
In principle, you Rust could create something like std::num::NonZero and its corresponding sealed trait ZeroablePrimitive to mark that two bits are unused. But that doesn't exist yet as far as I know.
There are also currently the unstable rustc_layout_scalar_valid_range_start and rustc_layout_scalar_valid_range_end attributes (which are used in the definition of NonNull, etc.) which could be used for some bit patterns.
I do the same in my toy JVM (to implement the reentrant mutex+condition variable that every Java object has), except I've got a rare deadlock somewhere because, as it turns out, writing complicated low level concurrency primitives is kinda hard :p
How can a parking_lot lock be less than 1 byte? does this uses unsafe?
Rust in general doesn't support bit-level objects unless you cast things to [u8] and do some shifts and masking manually (that is, like C), which of course is wildly unsafe for data structures with safety invariants
I don’t know the details of the Rust port but I don’t imagine the part that involves the two bits to require unsafe, other than in the ways that any locking algorithm dances with unsafety in Rust (ownership relies on locking algorithms being correct)
This is very similar to how Java's object monitors are implemented. In OpenJDK, the markWord uses two bits to describe the state of an Object's monitor (see markWord.hpp:55). On contention, the monitor is said to become inflated, which basically means revving up a heavier lock and knowing how to find it.
I'm a bit disappointed though, I assumed that you had a way of only using 2 bits of an object's memory somehow, but it seems like the lock takes a full byte?
It’s just that if you use the WTF::Lock class the. You get a full byte simply because the smallest possible size of a class instance in C++ is one byte.
But there’s a template mixing
thing you can use to get it to be two bits (you tell the mixin which byte to steal the two bits from and which two bits).
I suspend the same situation holds in the Rust port.
I am very familiar with how Java does locks. This is different. Look at the ParkingLot/parking_lot API. It lets you do much more than just locks, and there’s no direct equivalent of what Java VMs call the inflated or fat lock. The closest thing is the on demand created queue keyed by address.
>I am very familiar with how Java does locks. This is different. Look at the ParkingLot/parking_lot API. It lets you do much more than just locks, and there’s no direct equivalent of what Java VMs call the inflated or fat lock. The closest thing is the on demand created queue keyed by address.
Are you familiar with the new LightweightSynchronizer approach with an indirection via a table, instead of overwriting the markWord? I'd say that this has pushed the ParkingLot approach and Java's (Hotspot's, really) approach closer to each other than before. I think the table approach in Java could be encoded trivially into ParkingLot API, and the opposite maybe. Obviously the latter would be a lot more hamfisted.
The idea is that six bits in the byte are free to use as you wish. Of course you'll need to implement operations on those six bits as CAS loops (which nonetheless allow for any arbitrary RMW operation) to avoid interfering with the mutex state.
Unhelpful response. This cuongle.dev article does not answer nextaccountic's question, and neither do the webkit.org articles that describe the parking lot concept but not this Rust implementation. The correct answer appears to be that it's impossible: `parking_lot::RawMutex` has private storage that owns the entire byte and does not provide any accessor for the unused six bits.
(unless there's somewhere else in the crate that provides an accessor for this but that'd be a weird interface)
(or you just use transmute to "know" that it's one byte and which bits within the byte it actually cares about, but really don't do that)
(slightly more realistically, you could probably use the `parking_lot_core::park` portion of the implementation and build your own equivalent of `parking_lot::RawMutex` on top of it)
(or you send the `parking_lot` folks a PR to extend `parking_lot::RawMutex` with interface you want; it is open source after all)
> and neither do the webkit.org articles that describe the parking lot concept but not this Rust implementation
The WebKit post explicitly talks about how you just need two bits to describe the lock state.
> The correct answer appears to be that it's impossible: `parking_lot::RawMutex` has private storage that owns the entire byte and does not provide any accessor for the unused six bits.
Not impossible. One way to do this is to just use parking_lot directly.
In WebKit there’s a template mixin that lets you steal two bits for locking however you like. JavaScriptCore uses this to steal two bits from the indexing type byte (if I remember right)
> The WebKit post explicitly talks about how you just need two bits to describe the lock state.
It describes the algorithm but not how a caller of the Rust `parking_lot` crate could take advantage of this.
> Not impossible. One way to do this is to just use parking_lot directly.
By "just use parking_lot directly", I think you're talking about reimplementing the parking lot algorithm or using the C++ `WTF::ParkingLot` implementation? But not actually using the existing Rust crate called `parking_lot` described in the cuongle.dev article? That's confusingly put, and nextaccountic's question is certainly Rust-specific and likely expecting an answer relating to this particular crate. At the least, "does this use unsafe" would certainly be true with an implementation from scratch or when using FFI into C++.
I hear that this algorithm and the C++ implementation are your invention, and all due respect for that. I'm also hearing that you are not familiar with this Rust implementation. It does not offer the main benefit you're describing. `parking_lot::RawMutex` is a one-byte type; that six bits within it are unused is true but something callers can not take advantage of. Worse, `parking_lot::Mutex<InnerFoo>` in practice is often a full word larger than `InnerFoo` due to alignment padding. As such, there's little benefit over a simpler futex-based approach.
> It describes the algorithm but not how a caller of the Rust `parking_lot` crate could take advantage of this.
Read the WebKit post.
> By "just use parking_lot directly", I think you're talking about reimplementing the parking lot algorithm or using the C++ `WTF::ParkingLot` implementation? But not actually using the existing Rust crate called `parking_lot` described in the cuongle.dev article?
"just use parking_lot directly" is a weird way to say "use the `parking_lot_core` crate instead of the `parking_lot` crate".
...and note that I mentioned this in my earlier comment: (slightly more realistically, you could probably use the `parking_lot_core::park` portion of the implementation and build your own equivalent of `parking_lot::RawMutex` on top of it)
I'm not trying to be disagreeable here, but really you could save a lot of trouble if you were a bit more careful in your communication.
No. nextaccountic's comment and the cuongle.dev article are both talking about Rust. The Rust `parking_lot` implementation only uses two bits within a byte, but it doesn't provide a way for anything else to use the remaining six.
pizlonator's comments mention both the (C++) WTF::ParkingLot and the Rust `parking_lot`, and they don't answer nextaccountic's question about the latter.
> nextaccountic is confused.
nextaccountic asked how this idea could be applied to this Rust implementation. That's a perfectly reasonable question. pizlonator didn't know the anwer. That's perfectly reasonable too. Conscat suggested the article would be helpful; that was wrong.
I’m surprised that this only compared to std on one platform (Linux).
The main benefit of parking lot is that it makes locks very small, which then encourages the use of fine grained locking. For example, in JavaScriptCore (ParkingLot’s first customer), we stuff a 2-bit lock into every object header - so if there is ever a need to do some locking for internal VM reasons on any object we can do that without increasing the size of the object