Lectem's comments

Lectem · 2026-01-29T07:48:50 1769672930

That's a spin loop ;)

m-schuetz · 2026-01-29T08:11:23 1769674283

I see, thanks.

Lectem · 2026-01-29T05:33:05 1769664785

That's how they are supposed to work indeed! But spin locks aren't the only spin loops you may find, and allocator for example do spin. And for example under an allocation heavy code (that you should avoid too, but happens due to 3rd parties in real life), this can trigger contention, so you need contention to not be the worse type of contention.

Lectem · 2026-01-29T05:30:57 1769664657

The issue with that is that a load fence may be very detrimental to perf. It doesn't really matter if rdtsc executes out of order in this code anyway, and there is no need for sync between cores.

rdtsc · 2026-01-30T04:26:35 1769747195

You could first measure the perf impact of the fence instruction and then subtract that out? But yeah I guess it may not matter much for quick and dirty calibration loop.

I found somewhere (https://aloiskraus.wordpress.com/2018/06/16/why-skylakex-cpu...) that the pause instruction had this wild cycle difference between different CPU and it caused some grief, I had no idea. I stopped doing low level coding a while back.

Lectem · 2026-01-29T05:25:14 1769664314

I've heard of issues on Arm devices with properly isolated cores (only one thread allowed, interrupts disabled) because the would interact with other threads using such a spinlock, threads which were not themselves isolated. The team replaced it all with a futex and it ended up working better in the end. Sadly this happened while I was under another project so I don't have the details, but this can be problematic in audio too. To avoid the delay of waking up thread you can actually wake them a tiny bit early and then spin (not on a lock), since you know work is incoming.

spacechild1 · 2026-01-29T05:56:30 1769666190

For task queues we would use a lockfree queue, wake up the threads once at the beginning of the audio callback and then spin while waiting for tasks, just as you described.

My example above was rather about the DSP graphs themselves that are computed in parallel. These require to access to shared resources like audio buffers, but under no circumstance should they give up their timeslice and yield back to the scheduler. That's why we're using reader-writer spinlocks to synchronize access to these resources. I really don't see any other practical alternative... Any ideas?

Lectem · 2026-01-29T07:55:00 1769673300

I suppose you need to be able to read data from the buffers to know what parts of the graph to cull? Is computing the graph really long or the graph needs update mid execution? If you really have nothing else to do on those threads/cores, spinning might actually be the solution(considering a high sampling rate). I'd still fallback to the OS after a certain amount of time, as it would mean you failed to meet the deadline anyway. I would also reduce as much as possible the need for writes to synchronized resources where possible, so that you can just read values knowing no writes can happen during your multiple reads.

Lectem · 2026-01-29T05:18:08 1769663888

> which themselves do a short userspace spin-wait and then fall back to a kernel wait queue on contention.

Yes, but sadly not all implementations... The point remains that you should prefer OS primitives when you can, profile first, reduce contention, and then only, maybe, if you reeeally know what you're doing, on a system you mostly know and control, then perhaps you may start doing it yourself. And if you do, the fallback under contention must be the OS primitive

Lectem · 2026-01-29T05:13:32 1769663612

The author (me) actually read this long ago

> - It as an optimal amount of spinning

No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs

> Threads wait (instead of spinning) if the lock is not available immediately-ish

They use parking lots, which is one way to do futew (in fact, WaitOnAddress is implemented similarly). And no if you read the code, they do spin. Worse, they actually yield the thread before properly parking.

pizlonator · 2026-01-29T14:57:04 1769698624

> No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs

You say this with zero data.

I know that yielding 40 times is optimal for WebKit because I measured it. In fact it was re-measured many times because folks like you would doubt that it could’ve optimal, suggest something different, and then again the 40 yields would be shown to be optimal.

> And no if you read the code, they do spin. Worse, they actually yield the thread before properly parking.

Threads wait if the lock is not available immediately-ish.

Yes, they spin by yielding. Spinning by pausing or doing anything else results in worse performance. We measured this countless times.

I think the mistake you’re making is that you’re imagining how locks work. Whereas what I am doing is running rigorous experiments that involved putting WebKit through larger scale tests

ablob · 2026-02-06T20:12:00 1770408720

>> No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs

> You say this with zero data.

Wouldn't the null hypothesis be that the same program behaves differently on different CPUs? Is "different people require different amounts of time to run 100m" a statement that requires data?

Lectem · 2026-01-31T12:55:51 1769864151

>You say this with zero data.

Or so you assume

> Spinning by pausing or doing anything else results in worse performance. We measured this countless times.

And I've seen the issue in hundreds of captures using a profiler. I suppose we just have a different definitions of the what "worse performance" is.

> Whereas what I am doing is running rigorous experiments that involved putting WebKit through larger scale tests

Or perhaps the fish was drown in the stats, or again different metrics.

pizlonator · 2026-01-31T20:53:21 1769892801

> Or so you assume

You're not including data in your discussion of this topic. Your post included zero data.

My post on WTF locks has tons of data.

So, I'm not assuming; I'm observing.

> And I've seen the issue in hundreds of captures using a profiler. I suppose we just have a different definitions of the what "worse performance" is.

Nobody cares what you saw in the profiler.

What matters is the performance users experience.

By any metric of observable performance, yielding is the optimal way of spinning.

Phelinofist · 2026-01-29T08:49:13 1769676553

I guess you mean this regarding spin locks? https://web.archive.org/web/20250219201712/https://www.intel...

The direct link to Intel 404s.

arghwhat · 2026-01-29T13:25:28 1769693128

For reference, golang's mutex also spins by up to 4 times before parking the goroutine on a semaphore. A lot less than the 40 times in the webkit blogpost, but I would definitely consider spinning an appropriate amount before sleeping to be common practice for a generic lock. Granted, as they have a userspace scheduler things do differ a bit there, but most concepts still apply.

https://github.com/golang/go/blob/2bd7f15dd7423b6817939b199c...

jacobp100 · 2026-01-29T09:42:19 1769679739

The guy you relied to wrote the locking code. If you’re so certain they’re doing it wrong, would it not be easier to just prove it? It’s only one file, and they already have benchmarking set up

Lectem · 2026-01-29T11:47:45 1769687265

I mean my "No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs" can be verified directly by having a look at the table in my article showing different timings for pause.

For the yield part, I already linked to the part that shows that. Yes it doesn't call yield if it sees others are parked, but on quick lock/unlock of threads it happens that it sees nobody parked and fails, yielding directly to the OS. This is not frequent, but frequent enough that it can introduce delay issues.

Lectem · on Dec 24, 2019

Yeah... why share something that is 2 years old ?

Lectem · on Dec 23, 2018

Looks like ArtStation didn't bribe Google like the rest of the big apps like Instagram, Facebook, Twitter... I'm surprised people forget that Google now IS evil and corrupt, and are just squashing "small" companies in favor of buying or destroying them

KMag · on Dec 23, 2018

I didn't downvote you, but I think with better tone and a bit more detail, you could have made a well-received comment out of this thought.

Lectem · on July 1, 2018

I wouldnt trust google with anything anymore. Their customer support (entreprise or not) always sucked. They're always "right". You can only suffer the damage silently unless you're worth millions for them. People should come to realize that it's been years google is NOT your friend.