
Implement mechanism to wait on any of several futexes - yankcrime
https://lkml.org/lkml/2019/7/30/1399
======
jdsully
I like how they won’t say the windows API’s name. WaitForMultipleObjects is
quite nice - an epoll that works with all sorts of things not just fds. This
is kind of a half assed implementation - sometimes I wish Linux would admit
Windows has a good idea once in a while.

~~~
jlokier
Linux has WaitForMultipleObjects already:

fds are equivalent to Windows HANDLEs, and WaitForMultipleObjects() is
equivalent to poll().

But, Linux also has epoll() which scales better for non-trivial numbers of
things, and Windows has IOCP. So WaitForMultipleObjects isn't particularly
special.

Both Windows and Linux have things that don't work with these interfaces.
Nonetheless, Linux has been trending towards "waiting on all sorts of things"
by making more and more things into fds that can be waited on with poll/epoll.
Examples: timerfd, signalfd, eventfd. It's quite a unixy approach.
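
A minimal Python sketch of the "poll() over fds" idea described above, assuming a Unix-like system where `select.poll` is available. It waits on two unrelated kinds of object (a pipe and a socket) through one call; `wait_any` is a hypothetical helper name, not an existing API.

```python
import os
import select
import socket


def wait_any(fds, timeout_ms):
    """Block until at least one fd is readable or the timeout expires.

    Returns the ready fds in the order they were passed in.
    """
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    ready = {fd for fd, _events in poller.poll(timeout_ms)}
    return [fd for fd in fds if fd in ready]


# Two unrelated kinds of kernel object, both reachable through plain fds.
pipe_r, pipe_w = os.pipe()
sock_a, sock_b = socket.socketpair()

os.write(pipe_w, b"x")  # make only the pipe readable
ready_fds = wait_any([pipe_r, sock_a.fileno()], timeout_ms=1000)
```

Here `ready_fds` holds only the pipe's read end, since nothing has been sent on the socket yet. The same call would accept a timerfd, signalfd or eventfd descriptor unchanged.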

In fact, Wine already uses eventfd to implement WaitForMultipleObjects. This
kernel change is just an optimisation, to speed up Wine, and a workaround for
some distros setting Wine's max fd limit too low for Windows apps.

Futexes used to support waiting on multiple futexes, using FUTEX_FD. That was
arguably better than the new patch FUTEX_WAIT_MULTIPLE, because in old Linux
you could wait for futexes and other fds at the same time - it did work with
"all sorts of things".

But FUTEX_FD was removed after searches online found no code using it, and
kernel devs didn't like keeping it. (To my mind, this was a surprising, unusual
breakage of system call binary compatibility.) The new FUTEX_WAIT_MULTIPLE
allows programs like Wine to wait for multiple futexes faster than before, but
it's more limited than the old FUTEX_FD because you can't mix them with other
things.

~~~
jdsully
epoll only works with file descriptors and is therefore not equivalent. The
whole point is WaitForMultipleObjects works with many different types.

~~~
monocasa
The vast majority of the interesting types _are_ files under Linux. You can
use epoll with files, signals, timers, sockets, pipes, kernel semaphores,
other epoll descriptors, page fault handling descriptors, etc.

FDs are pretty much the HANDLEs of the *nix world.

~~~
cryptonector
HANDLEs are better. And on Windows you get HANDLEs to processes. We could have
that in *nix land, you know.. something like opening /proc/$victim/status and
using it to wait for the process and keep its PID from being reused until this
is closed.

~~~
cesarb
> We could have that in *nix land, you know.. something like opening
> /proc/$victim/status and using it to wait for the process and keep its PID
> from being reused until this is closed.

We are going to soon: "The 5.3 kernel also adds the ability to pass a pidfd to
poll(), which will provide a notification when the process represented by that
pidfd exits."
([https://lwn.net/SubscriberLink/794707/905eb6b5b7287e77/](https://lwn.net/SubscriberLink/794707/905eb6b5b7287e77/))

------
gnufied
Personally for me - the problematic part of gaming on Linux has been input (i.e.
mouse) latency and acceleration profile.

I am not sure if this is just my experience but when using libinput on Fedora
for example - the cursor movement is not exactly precise. This is not obvious
when working but while gaming this is a deal breaker.

~~~
Shorel
Games in Windows can get raw mouse input just by writing the code for it, and
Linux games usually don't because that would require root permissions, to add
the user to the input group, etc.

It is a security issue and an X-Window design issue.

~~~
danappelxx
Oblivious question: is there any notion of granulated permissions, ie a
‘mouse’ user group that the game could add itself to? Seems like something
like this shouldn’t be a deal-breaker.

~~~
LukeShu
There is an 'input' user group, which gives you access to raw mouse, keyboard,
joystick, etc. under `/dev/input/`. (I'm sure you could configure udev to use
more granular user groups for different types of input devices, if you
really wanted). Normal human users of a single-user system should be members
of the 'input' group, so this "should" be a non-issue.

~~~
Crinus
I have added support for reading the input device directly on my 3D game
engine exactly so i can provide unfiltered mouse input, but at least on Debian
(and i think on derivatives) the user isn't on the 'input' group by default.
The engine shows a message if raw input is enabled but it cannot open the device
handle (something like "cannot open handle check if you have permissions for
accessing - maybe you need to be in the input group?") and it falls back to
regular X11 events (the feeling of which depend greatly on the current
configuration, which input driver is used, etc - on my PC where i use udev and
1:1 mapping it feels fine, but others use libinput, which i think is the
default nowadays, and mappings can be all over the place).
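
The open-or-fall-back behaviour described above could look roughly like this in Python. This is a sketch only: `open_raw_input` is a hypothetical helper, the device path is an illustrative placeholder, and the message text merely imitates the one quoted in the comment.

```python
def open_raw_input(path):
    """Try to open a raw input device node.

    Returns (file_object, None) on success, or (None, message) when the
    device cannot be opened, so the caller can fall back to X11 events.
    """
    try:
        return open(path, "rb", buffering=0), None
    except (PermissionError, FileNotFoundError) as exc:
        hint = "maybe you need to be in the input group?"
        return None, f"cannot open {path}: {exc.strerror} - {hint}"


# Hypothetical path; a real engine would enumerate /dev/input/ first.
device, message = open_raw_input("/nonexistent/input/event0")
use_raw_input = device is not None
```

When `use_raw_input` is false the engine would show `message` and keep using the regular windowing-system events.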

------
mabbo
Linux Gaming with Steam is actually quite nice these days. I spent about 3
years using an Ubuntu desktop for all my gaming at home. Most of the games I
played installed via steam and worked great on Linux.

The only reason I switched back to a Windows Desktop was that there were just
one or two games I specifically wanted to try, but couldn't install to Linux.
And once I had switched back (and paid the price for Windows) there were no
games that needed Linux, so no motivation to go back.

~~~
Scaevolus
The article is discussing optimizing Proton, a version of Wine embedded in
Steam, meaning you can (try to) play Windows-only games on Linux with minimal
effort. It works quite well, though graphically intensive games are more
likely to experience glitches or slowdown from the translation.

------
swiley
This looks to me like the main change is to make it easier to create mutexes
with timeouts.

Isn’t a mutex timing out an indication that:

a) a lock wasn’t needed in the first place or

b) the program is incorrect?

It feels more like they just want the api to match win32 better but most of
the multithreaded programming I’ve done lately has just used go’s channels so
I totally could be missing something.

~~~
Pfhreak
Correctness isn't always the right thing to do. Games, in particular, are full
of code that approximates the right thing and falls back to less and less
correct solutions. It's more important to be fast than right in a lot of
cases. Dropping frames can have a significant negative experience for players.
Dropping an AI pathing algorithm, particle physics computation, or other
background task can often be fine or even unnoticed.
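
As an illustration of "fall back to a less correct answer rather than wait", here is a minimal Python sketch using the standard `threading.Lock.acquire(timeout=...)`. The helper name `read_with_deadline` and its parameters are assumptions made for this example.

```python
import threading


def read_with_deadline(lock, read_fresh, stale_value, timeout_s):
    """Try to take the lock for at most timeout_s seconds.

    Returns (value, True) with a fresh value if the lock was obtained,
    or (stale_value, False) if the deadline passed first.
    """
    if lock.acquire(timeout=timeout_s):
        try:
            return read_fresh(), True
        finally:
            lock.release()
    return stale_value, False


lock = threading.Lock()
lock.acquire()  # simulate a background task still holding the lock
busy_result = read_with_deadline(lock, lambda: "fresh path", "last frame's path", 0.01)
lock.release()
idle_result = read_with_deadline(lock, lambda: "fresh path", "last frame's path", 0.01)
```

While the lock is held, the caller gets the stale value after 10 ms instead of stalling the frame; once the lock is free it gets the fresh one.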

~~~
AstralStorm
Approximations, including temporal ones, are also science.

Ever seen a texture pop in instead of a stutter? If a lock would be taken on
that load without timeout (very short) it'd either not load on time or load
when it's no longer needed.

~~~
smaddox
No need for a lock for such synchronization between a loading thread and a
rendering thread, though, when you can use an atomic pointer or ready flag.
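
A rough Python sketch of the ready-flag approach, using `threading.Event` as a stand-in for an atomic flag (a real engine would more likely use a native atomic). `TextureSlot`, `publish` and `current` are hypothetical names.

```python
import threading


class TextureSlot:
    """Holds a placeholder until the loader publishes the real texture."""

    def __init__(self, placeholder):
        self._placeholder = placeholder
        self._texture = None
        self._ready = threading.Event()

    def publish(self, texture):
        """Called by the loading thread."""
        self._texture = texture  # write the data first
        self._ready.set()        # then raise the flag

    def current(self):
        """Called by the render thread every frame; never blocks."""
        if self._ready.is_set():
            return self._texture
        return self._placeholder


slot = TextureSlot(placeholder="low-res")
before = slot.current()

loader = threading.Thread(target=slot.publish, args=("high-res",))
loader.start()
loader.join()
after = slot.current()
```

The render thread only ever checks the flag, so a slow load shows up as a texture popping in later rather than as a stutter.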

Maybe a lazy GUI framework could use a mutex timeout, though.

------
sedatk
This is apparently for improving the performance of Wine emulator on Linux
mainly, not native Linux games like the headline suggests.

~~~
rbanffy
Would it be useful to native programs?

~~~
ufo
Valve's post suggests that they believe this to be the case, although they
didn't explain the specific details.

> We think that if this feature (or an equivalent) was adopted upstream, we
> would achieve efficiency gains by adopting it in native massively-threaded
> applications such as Steam and the Source 2 engine.

~~~
rbanffy
I got that. I intended to go for "actually useful".

~~~
kbenson
Source 2 has a linux port I believe, and it's an engine that other games could
(do?) use to get easy linux support for a lot of their codebase. I think that
counts for actually useful.

Depending on what they mean by "massively-threaded", that might cover some
other popular network applications that thread to handle requests, but I'm not
sure how much work they would put into a linux specific solution if it
complicated their codebase.

~~~
rbanffy
It occurred to me that a message passing OO language that mapped every active
object (that's responding to a message) to a thread could benefit from it.

------
Asooka
I know the patch mentions interactive multimedia applications (games) in
particular, but an actual mechanism to implement WaitForMultipleObjects on
Linux would be very welcome for many high-performance multi-threaded
applications.

Say you have one worker thread per CPU core. On Windows, each thread would get
an Event object and you would WaitOnMultiple to be able to act on the first
unit of work that was complete. On Linux you would have to roll your own
solution using lower-level primitives and it will not be correct. Being able
to wait on multiple events on Linux will be awesome.

~~~
badamp
Linux already has what you’re talking about with eventfd and epoll.

In Linux each thread can get an eventfd and you can POLLIN all of them.

In fact I would argue that using futexes is the “roll your own solution” using
lower level primitives (and easier to fuckup) much more so than eventfd and
epoll.

As mentioned somewhat poorly in the post, using futexes gives a performance
boost which is not surprising since they are fast userspace mutexes. FWIW I didn't
think windows events had a fast user space path but I may be mistaken.

For most worker pool scenarios you’re describing, the overhead of eventfd is
probably in the noise.

~~~
Taniwha
They point out that they already have an implementation that does just this
.... and it fails on some programs due to running out of file descriptors
(they have one program that needs ~1 million of them ...)

~~~
badamp
If you read the full thread that is a bit of a red herring and beside the
point (that's why I said the conveyance of the performance implication was
poor)... indeed Windows WFMO only supports 64 objects per call. They mention
that the fd issue is due to _leaking_ objects in many windows programs..which
was an odd mention and a little off the main subject. The main motivator is
performance. If eventfds performed better it would likely be better to fix the
fd leak issue with a cache.

Again.. eventfd and epoll covers the same use case as WFMO and EVENTs.

~~~
solipsism
Curious, how would a cache fix the fd leak issue?

~~~
badamp
Perhaps a better term would be “pool”. Anyway, what’s being leaked is
“handles” or events not actually fds. You only actually need as many fds as
the maximum possible number passed to a syscall. The mapping of handles/event
objects in user space does not have to be 1:1 with the kernel resource.

------
alexchamberlain
This is awesome. If I understand this correctly, this would allow you to write
a library that allows you to wait on multiple, fully independent queues from a
single consumer thread. At the moment, those queues would have to share a
mutex.

------
cryptonector
I'd like a way to wait on multiple condition variables. Windows has this... Or
to treat condition variables as file descriptors so that select/poll/epoll can
be used to wait on them (you'd get a notification of a signal, though it might
be spurious, so you still have to take the lock and check the condition before
you consider the CV signaled).

------
ot
I wonder if they have considered implementing futexes in userland so they
don't have to wait for the kernel implementing this.

WebKit has its own implementation (see the ParkingLot in
[https://webkit.org/blog/6161/locking-in-webkit/](https://webkit.org/blog/6161/locking-in-webkit/))
which inspired an implementation in folly
([https://github.com/facebook/folly/blob/master/folly/synchron...](https://github.com/facebook/folly/blob/master/folly/synchronization/ParkingLot.h)).

Surprisingly, implementing futexes in userland can have performance benefits
wrt kernel futexes (because of better control over the fast path, possibly
avoiding syscalls) and a richer interface (for example the value doesn't need
to be 32 bit, just any atomic).

------
scohesc
Am I interpreting the graph on their page all wrong? It looks like the older
version of Proton provides better performance? Unless they're graphing
function call return time? The graph and the text around it don't do a good
job of explaining that.

~~~
GranPC
4.11 is newer than 4.2

~~~
geddy
I have no idea why this took me so long to realize. Must be the versions I'm
used to seeing but my brain parsed `4.11` as `4.1.1`. Weird.

Thanks for pointing out the obvious for those of us who missed it!

~~~
falcrist
You parse it the same way I usually do. As a decimal fraction, 4.2 > 4.11...
however, version numbers aren't decimal fractions.

In any case, you're not alone.
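
The difference is easy to see in code. This is a minimal Python sketch that assumes purely numeric dotted versions; `parse_version` is a hypothetical helper, and real version strings with suffixes would need more handling.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Compared component by component, 11 > 2, so 4.11 is the newer release.
newer = max(["4.2", "4.11"], key=parse_version)

# Read as decimal fractions, the ordering flips.
fraction_says_older = float("4.11") < float("4.2")
```

Tuples compare element by element, so `(4, 11)` sorts after `(4, 2)` even though 4.11 is the smaller decimal number.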

------
wahern
The `abs_time` argument to both the futex_wait and new futex_wait_multiple is
a pointer, but nowhere is the address checked for validity. (Tracing the
syscall path, it seems to be dereferenced in futex_setup_timer without any
validity check beforehand.) It's never written to AFAICT, only loaded, but it
still seems like leaving around a loaded gun. More importantly, couldn't this
unchecked address be used to probe kernel memory to test for values like NULL,
or possibly any value with enough sampling?

Note that the futex address itself is also a pointer, but it's validated with
access_ok in get_futex_key.

~~~
monocasa
It's not a user pointer, it's a pointer to the kernel stack that's constructed
here:

[https://elixir.bootlin.com/linux/latest/source/kernel/futex....](https://elixir.bootlin.com/linux/latest/source/kernel/futex.c#L3675)

------
shmerl
Nice to see Valve pushing Linux gaming advancements like that!

------
Kuraj
The scale goes from 0 to 80. It's not horribly misleading, but just saying

~~~
aardvarklegend
I believe that’s because it’s frames per second, not an abstract percentile.

~~~
Kuraj
Oh, wow.

I'm an idiot

~~~
donaltroddyn
Not at all - it's a poor graph with an unlabelled axis.

------
robin_reala
The specific proposal (linked to from the article) is
[https://lkml.org/lkml/2019/7/30/1399](https://lkml.org/lkml/2019/7/30/1399)

~~~
Sahhaese
Depressing to see reviewers waste review bandwidth bringing up issues such as
"wasted newline" and "incorrect comment format". Do kernel developers not use
auto-formatters?

~~~
nixgeek
On the contrary, I think the feedback provided was excellent and far better
than just saying “Doesn’t conform to our style guidelines, please try again”.
Bravo Peter.

This may be someone’s first submission and they may consequently not be aware
of style guides, tools which can help lint, etc?

~~~
eco
The technical review was great. The tone of the review left a lot to be
desired.

~~~
ori_b
The technical review wasn't technical. It was a human roleplaying as a code
formatter. The technical content was entirely found in the comment about ABI
compatibility.

I didn't see any discussion about tradeoffs, alternative approaches, or a
survey of what other systems do for this kind of functionality, or detailed
benchmark results.

The tone was roughly what I'd want people to give me in a code review -- the
only problem is that it was trivial, and could have been summarized as "Fix
the style, check it with $tool"

~~~
cesarb
> The technical review wasn't technical. [...] The technical content was
> entirely found in the comment about ABI compatibility.

That review also had a comment about an implicit limit on the number of
objects, which is caused by a limit on the amount of physically contiguous
memory the kernel memory allocator can obtain at once, and a comment that the
code being reviewed would allow for a large increase of the reference count of
a couple of important structures. Both appear to be very technical comments to
me.

~~~
Sahhaese
This is my complaint about the styling issue, that the important technical
notes get lost in the 'noise' of styling issues.

