Indeed. Userspace spinlocks are almost always a bad idea, but it's possible to write a much better one than you get by repurposing ReadWriteLock.
1. As you say, use Thread.onSpinWait() in the spin loop.
2. The existing code emits a load barrier on each failed tryLock call, from the underlying strong integer CAS. Ideally it would use a CAS with relaxed memory order and issue an acquire fence only when the CAS succeeds.
3. The read path emits store barriers on release, which is unnecessary. Releasing could use a relaxed-order fetch-add instruction.
It's theoretically possible that the JIT would be smart enough to make these optimizations on the existing code, but that's asking a lot from its analysis. Among other things, it would have to prove that the thread-local storage writes used for detecting reentrancy aren't "real" writes that need to be made visible to other threads.
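For what it's worth, here is a minimal sketch of points 1 and 2 using VarHandle, assuming a plain exclusive (non-reentrant, unfair) lock rather than a read/write one. The class name SpinLock, the state field, and the contendedCount helper are made up for illustration and are not from the article. Point 3 isn't covered, since there is no read path here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical test-and-test-and-set spinlock; a sketch, not production code.
final class SpinLock {
    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup()
                    .findVarHandle(SpinLock.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unused") // only accessed through STATE
    private int state; // 0 = free, 1 = held

    boolean tryLock() {
        // Plain (relaxed) CAS: no ordering is requested on the failure path.
        // The weak form may also fail spuriously even when the lock is free.
        if (STATE.weakCompareAndSetPlain(this, 0, 1)) {
            VarHandle.acquireFence(); // pay for acquire ordering only on success
            return true;
        }
        return false;
    }

    void lock() {
        while (!tryLock()) {
            // Spin on a cheap read until the lock looks free, then retry the CAS.
            while ((int) STATE.getOpaque(this) != 0) {
                Thread.onSpinWait();
            }
        }
    }

    void unlock() {
        STATE.setRelease(this, 0); // release store publishes the critical section
    }

    // Illustrative helper: several threads bump a plain counter under the lock.
    static int contendedCount(int threads, int iterations) {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter[0];
    }
}
```

Because the weak CAS can fail spuriously, tryLock() may occasionally return false on a free lock; lock() just retries. Whether this actually beats the strong CAS depends on the platform, so measure before believing it.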
Interesting. I don't have anything to add to this, but by a very strange coincidence, today I was working on exactly the same things this article describes: packing smaller fields into an integer with bitwise operations and using the primitive maps in fastutil.
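For anyone who hasn't done the packing trick, a minimal sketch looks something like this. The field names and the 16/12/4-bit layout are invented for the example, not taken from the article, and the fastutil side is left out to keep it self-contained.

```java
// Hypothetical layout for one int:
//   bits 0-15  id    (16 bits)
//   bits 16-27 level (12 bits)
//   bits 28-31 flags (4 bits)
final class PackedFields {
    private PackedFields() {}

    static int pack(int id, int level, int flags) {
        // Masks silently truncate values that don't fit their field.
        return (id & 0xFFFF) | ((level & 0xFFF) << 16) | ((flags & 0xF) << 28);
    }

    static int id(int packed) {
        return packed & 0xFFFF;
    }

    static int level(int packed) {
        return (packed >>> 16) & 0xFFF;
    }

    static int flags(int packed) {
        // Unsigned shift, so the top field never comes back negative.
        return packed >>> 28;
    }
}
```

The packed int can then be used directly as a key or value in a primitive map, which avoids allocating a small object per entry.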
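Yes, in 64-bit HotSpot every object has a header: typically 12 bytes with compressed class pointers (on by default) or 16 bytes without, and since objects are padded to a multiple of 8 bytes, 16 bytes is the smallest an object can be. It applies to arrays too, if by that you mean that each object in an array still has its own header. There's a good overview of what that space is used for here: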
[1] https://docs.oracle.com/en/java/javase/17/docs/api/java.base...()