I like how they won't say the Windows API's name. WaitForMultipleObjects is quite nice - an epoll that works with all sorts of things, not just fds. This is kind of a half-assed implementation - sometimes I wish Linux would admit Windows has a good idea once in a while.
fds are equivalent to Windows HANDLEs, and WaitForMultipleObjects() is equivalent to poll().
But, Linux also has epoll() which scales better for non-trivial numbers of things, and Windows has IOCP. So WaitForMultipleObjects isn't particularly special.
Both Windows and Linux have things that don't work with these interfaces. Nonetheless, Linux has been trending towards "waiting on all sorts of things" by making more and more things into fds that can be waited on with poll/epoll. Examples: timerfd, signalfd, eventfd. It's quite a unixy approach.
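For illustration, a minimal sketch (mine, not from the thread) of that unixy approach - a timer is just another fd, and the same epoll set could also hold sockets, eventfds, signalfds, and so on:

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/timerfd.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        /* A timer is just another fd: create it, arm it, add it to an epoll set. */
        int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        struct itimerspec its = { .it_value = { .tv_sec = 1 } };
        timerfd_settime(tfd, 0, &its, NULL);             /* fire once, after 1 second */

        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
        epoll_ctl(ep, EPOLL_CTL_ADD, tfd, &ev);

        struct epoll_event out;
        epoll_wait(ep, &out, 1, -1);                     /* same call would cover sockets, eventfds, ... */

        uint64_t expirations;
        read(tfd, &expirations, sizeof expirations);     /* how many times the timer fired */
        printf("timer expired %llu time(s)\n", (unsigned long long)expirations);
        return 0;
    }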
In fact, Wine already uses eventfd to implement WaitForMultipleObjects. This kernel change is just an optimisation, to speed up Wine, and a workaround for some distros setting Wine's max fd limit too low for Windows apps.
Futexes used to support waiting on multiple futexes, using FUTEX_FD. That was arguably better than the new patch FUTEX_WAIT_MULTIPLE, because in old Linux you could wait for futexes and other fds at the same time - it did work with "all sorts of things".
But FUTEX_FD was removed after searches online found no code using it, and kernel devs didn't like keeping it. (To my mind, this was a surprising, unusual breakage of system call binary compatibility.) The new FUTEX_WAIT_MULTIPLE allows programs like Wine to wait for multiple futexes faster than before, but it's more limited than the old FUTEX_FD because you can't mix them with other things.
The vast majority of the interesting types _are_ files under Linux. You can use epoll with files, signals, timers, sockets, pipes, kernel semaphores, other epoll descriptors, page fault handling descriptors, etc.
HANDLEs are better. And on Windows you get HANDLEs to processes. We could have that in *nix land, you know.. something like opening /proc/$victim/status and using it to wait for the process and keep its PID from being reused until this is closed.
> We could have that in *nix land, you know.. something like opening /proc/$victim/status and using it to wait for the process and keep its PID from being reused until this is closed.
We are going to soon: "The 5.3 kernel also adds the ability to pass a pidfd to poll(), which will provide a notification when the process represented by that pidfd exits." (https://lwn.net/SubscriberLink/794707/905eb6b5b7287e77/)
Last I heard, epoll() was, ironically, not useful with actual file descriptors, because file I/O isn't blocking on Linux [1]. I'm not sure if this problem has been remedied in recent years.
You are confused. File descriptors on linux represent "many different types." They are not just for "disk files". Please see signalfd, timerfd, eventfd, inotify, let alone sockets (which themselves represent things other than IP sockets). FD is essentially like handle. epoll therefore works with many different types.
While the Unix philosophy is that everything is a file, that is not the case with Linux. Futexes are a good example: there was an attempt to integrate them, but it was done incorrectly, with inherent race conditions, and was abandoned.
The VMS and NT philosophies of everything being an object are a bit more general and easier to follow in practice.
Futexes currently have _no_ kernel state besides in the threads that are currently waiting on them. There's no futex_create system call, for instance. It's literally just a call to "sleep until this memory address changes or a timeout occurs". There's not really anything to make a type around, which is why FUTEX_FD was doomed to failure and they rightly backed it out.
Ah, you edited your response? An HMODULE.. That is weak, dude. Calling an HMODULE a HANDLE is cheating... by that definition a UINT is a HANDLE since you can cast it.
Can you use HMODULEs orthogonally to common APIs that use Handles? That's what really matters. Just because something is 4 or 8 bytes wide and you call it the same thing isn't interesting. Like can you pass HMODULE to WaitForSingleObject? Oh Ok, I guess that's mean because to be fair, what does "waiting" on a DLL mean.
Ok, well surely you can pass that HMODULE to CloseHandle, at least closing should be orthogonal.. Why don't you try that? I'll wait.
So what point are you trying to make here? It sounds like you're just jerking everyone around, tbh.
Edit: Moreover, it's a bit ridiculous to say that an HMODULE is just an address with no kernel state. It uniquely identifies the loaded DLL, so it is a key to a tremendous amount of kernel bookkeeping about the loaded module.
Handles are not kernel addresses in NT and windows will not intentionally leak those to user space. Handle accesses are always through a layer of indirection in the object manager.
You will find that windows never makes guarantees of what handles actually are - in case it wants to change them.
In windows NT most GDI handles are user mode and not kernel objects. Other objects may or may not be kernel based depending on the version and whims of the implementor.
What does that have to do with anything? You wouldn't use WFMO to implement a futex in Windows. And regardless of anything in Linux, defending WFMO seems like a strange hill to die on. It's not a great API.
I'm calling your bluff too. Which one of the supported waitable handles in Windows are just addresses with no other state? I'm more surprised because they all need an access mask at a minimum, and I thought they all involve an ObCreateObject, even a Mutant... these dusty corners of the kernel are visible through the DDK. Anyway, I would be glad to be shown wrong.
I'm not sure why there's such animosity in your replies. The goal here is not to defend Windows but to point out that handles need not always refer to kernel objects.
It would be possible to provide an orthogonal handle based API even if windows doesn’t always live up to that ideal.
Ultimately it seems these systems are just too big to maintain a cohesive design - but it is still an ideal to aspire to.
what are you on about? I just gave you the specific examples: timerfd, eventfd, signalfd, inotify... these are all epollable fds in Linux?
Futex is a special case because futexes themselves are quite special. There is no userspace equivalent to them in Windows anyway as has been mentioned. Windows Events are similar to what is provided by eventfd, but not as featureful.
Probably the core of the issue here is that VMS and derivatives tried too hard to fit everything into a generic handle/fd interface, while UNIX historically came with a smaller core, with many important API surfaces (like signals or timers) only relatively recently getting absorbed under the unified handle/fd interface, giving the impression it's an afterthought.
It's also not something that works on all Unices (e.g. afaik macOS has no timerfd). But in all fairness, that problem only exists because OS speciation, like its biological counterpart, is not as clear-cut a concept as one would desire.
I know; but they cannot be closed with close, or passed via unix domain sockets to other processes etc etc.
I _think_ this whole thread is about "purity", whatever that means.
WaitForMultipleObjects is problematic for a few reasons:
* limited to 64 handles.
* passes the entire handle buffer to the kernel on every call. That is also a key reason poll is worse than epoll/kqueue. A better way is to let the kernel retain the list of handles across calls.
* when two handles are signalled, the one earlier in the array is always the one returned. So handle with a lower array index being frequently signalled can starve out your opportunity to process the later ones.
* since everything happens by returning an index, when multiple handles are signalled, you can only process one at a time, needing a new syscall for each.
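For reference, a rough sketch (my own, not from the thread) of the pattern being criticized - note the hard MAXIMUM_WAIT_OBJECTS (64) cap and the index-based return that always favours earlier handles:

    #include <windows.h>

    /* Wait for any of up to MAXIMUM_WAIT_OBJECTS (64) handles; returns the index
     * of the signalled handle, or -1 otherwise. The lowest signalled index always
     * wins, so a frequently-signalled handle early in the array can starve later
     * ones, and each call hands back only a single result. */
    static int wait_any(const HANDLE *handles, DWORD count)
    {
        DWORD rc = WaitForMultipleObjects(count, handles, FALSE /* wait for any */, INFINITE);
        if (rc < WAIT_OBJECT_0 + count)
            return (int)(rc - WAIT_OBJECT_0);
        return -1;   /* WAIT_ABANDONED, WAIT_TIMEOUT, or WAIT_FAILED */
    }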
That's my point. They should just bite the bullet and implement WaitForMultipleObjects instead of having all these disjoint APIs. Plus sometimes you want to wait for both, and this is very difficult with Linux.
It's also telling that Linux has gone through the whole select/poll/epoll madness while WaitForMultipleObjects has worked well in Windows NT since the 90s. It's a proven design.
I mean.. epoll is already better than WFMO. It doesn't have that silly 64 item cap, or O(N) entry and exit on the wait like it's 1995 all over again. (And that's not to say that epoll is the pinnacle of design, just that WFMO is worse. We can do better than both)
And this is better too: you don't need to register futexes with the kernel. The futex wait call is just "I want to sleep until the data at this memory address changes", and now you can say "I want to sleep until _these addresses_ change". NT doesn't give you any equivalent; you need to build in kernel objects to do the same thing.
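A minimal sketch of that single-address wait, under the usual futex semantics (my own illustration):

    #include <linux/futex.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Sleep for as long as *addr still holds 'expected'; a FUTEX_WAKE on the same
     * address (after the value has been changed) wakes us up. No prior registration
     * of the futex with the kernel is needed. */
    static long futex_wait(int *addr, int expected)
    {
        return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    }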
And if you want to wait for both, you just wrap it into an eventfd and epoll on that.
The thing is, on Windows, you're not supposed to wait on a thousand handles at once. You're supposed to use overlapped I/O, completion ports, etc. which work quite well.
This use case in the article isn't about I/O at all though. Futexes are user space mutexes. Overlapped I/O and completion ports don't help you in that case.
Right, but when do you need to simultaneously wait on more than 64 things other than for I/O? I've never had to wait on that many mutexes...
(P.S. there is a really ugly way to get around this on Windows if for some bizarre reason you really need to, which is to have 1 thread per 64 handles, then wait on the thread handles instead. I've never found a need to even get close to doing such a thing though.)
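For the curious, a rough sketch of that hack (hypothetical helper names, teardown omitted, capped at 64*64 handles):

    #include <windows.h>

    /* Each worker thread waits on its own slice of at most 64 handles, then exits;
     * the parent then waits on the worker thread handles themselves. */
    typedef struct { HANDLE *handles; DWORD count; } slice_t;

    static DWORD WINAPI wait_slice(LPVOID arg)
    {
        slice_t *s = (slice_t *)arg;
        return WaitForMultipleObjects(s->count, s->handles, FALSE, INFINITE);
    }

    static void wait_more_than_64(HANDLE *handles, DWORD count)
    {
        HANDLE  threads[MAXIMUM_WAIT_OBJECTS];
        slice_t slices[MAXIMUM_WAIT_OBJECTS];
        DWORD   nthreads = 0;

        for (DWORD i = 0; i < count; i += MAXIMUM_WAIT_OBJECTS) {
            slices[nthreads].handles = handles + i;
            slices[nthreads].count   = min(count - i, (DWORD)MAXIMUM_WAIT_OBJECTS);
            threads[nthreads] = CreateThread(NULL, 0, wait_slice, &slices[nthreads], 0, NULL);
            nthreads++;
        }
        /* The first worker to return means one of its handles was signalled. */
        WaitForMultipleObjects(nthreads, threads, FALSE, INFINITE);
        /* Tearing down the still-blocked workers is left out - part of why this is so ugly. */
    }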
> Windows has a ton of subsystem specific hacks each with their own drawbacks.
Is this your way of saying "I can't think of any legitimate response, but I insist the Windows API must suck because that's just how I feel about it"?
I literally told you there is a proper solution that turns out to be different than what you're expecting coming from Linux, and instead of either realizing it's a good solution or telling me why it isn't, you just trashed the OS and told me it sucks.
It's my way of trying to keep the conversation on the topic of the actual article, rather than turning it into a generic Linus v Cutler boxing match where both of their syscall interfaces are on the table.
And staying on the topic of mutexes, that "just spawn another thread for every 64 items you want to wait on" (which I already knew about) is about the hackiest shit ever.
(And FWIW, I started off as a Win32 developer, have code in ReactOS, and have written NT and WinCE drivers. You're not talking to some Linux fanboy)
> And staying on the topic of mutexes, that "just spawn another thread for every 64 items you want to wait on" (which I already knew about) is about the hackiest shit ever.
Is this surprising to you? I already told you it would be a hack because I told you there is a better and proper solution for the actual problem you were encountering. You're stubbornly insisting for no reason on actively doing something bizarre, and you're frustrated you need an obtuse hack to make it happen?
If you're trying so hard to "stay on the topic of mutexes" why do you trash WFMO for the "silly 64 item cap" and then repeatedly refuse to provide a single situation in which waiting on 64 mutexes would actually come up as a legitimate problem? Somehow you find the inability to use a wrong method to solve a problem whose existence you can't even show evidence of to be "silly"?
Windows runs on single socket systems with 128 cores. If you can't think of some time you might need to wait on more than 64 items at a time, you only have your own lack of imagination to blame.
The number of mutexes I generally wait on has nothing to do with how many cores my computer has. But I'm giving up on the hope that you'll ever share with us your wonderful imagination.
Pro tip: implementing a hashed wheel timer yourself in user space comes with weird jitter because you can't atomically grab the system time and sleep for the next tick. Which is why all the sane OSes implement it or something like it in kernel space, including NT.
It’s not about implementing it more efficiently in a universal sense, it’s about making fewer unnecessary trips to the kernel and back. Context switches probably introduce more jitter than your userspace hashed timer wheel.
epoll was a problem child API on linux, with a number of missteps early on. It's much better now. It does everything WFMO does and more. What disjoint APIs are you talking about? You can just use epoll and be done with it?
What are you talking about by "wait for both"?
But jesus, the windows API in this area is hot garbage. A hard 64 limit and O(N)? It's a terrible design.
Kqueue is quite nice. BSD is much better designed and I enjoy working with it more. Linux suffers from people doing just enough to solve their specific problem and then no more. The result is APIs are not orthogonal and follow differing styles. Sometimes they don’t even work at all (e.g. futex fds)
I admire what Linux accomplishes functionally. But it is just as ugly as win32 in my opinion. The two are really more alike in their pragmatism than they would care to admit - but mutual hatred prevents learning from each others mistakes.
I've never worked with Win32, just read about the more core bits of NT, and those seem quite nice although perhaps overgeneral. But I've always had a soft spot for oddball systems.
Personally, for me the problematic part of gaming on Linux has been input (i.e. mouse) latency and the acceleration profile.
I am not sure if this is just my experience but when using libinput on Fedora for example - the cursor movement is not exactly precise. This is not obvious when working but while gaming this is a deal breaker.
Games in Windows can get raw mouse input just by writing the code for it, and Linux games usually don't because that would require root permissions, to add the user to the input group, etc.
It is a security issue and an X-Window design issue.
Oblivious question: is there any notion of granulated permissions, ie a ‘mouse’ user group that the game could add itself to? Seems like something like this shouldn’t be a deal-breaker.
There is an 'input' user group, which gives you access to raw mouse, keyboard, joystick, etc. under `/dev/input/`. (I'm sure you could configure udev to use more granular user groups for different types of input devices, if you really wanted). Normal human users of a single-user system should be members of the 'input' group, so this "should" be a non-issue.
I have added support for reading the input device directly in my 3D game engine exactly so I can provide unfiltered mouse input, but at least on Debian (and I think on derivatives) the user isn't in the 'input' group by default. The engine shows a message if raw input is enabled but it cannot open the device handle (something like "cannot open handle, check if you have permissions for accessing it - maybe you need to be in the input group?") and it falls back to regular X11 events (the feel of which depends greatly on the current configuration, which input driver is used, etc - on my PC where I use udev and 1:1 mapping it feels fine, but others use libinput, which I think is the default nowadays, and mappings can be all over the place).
It does, but only in an environment where the same computer is accessible by more than one person at the same time with different keyboards and monitors. In practice this is an extremely rare case, and in that case (and assuming the people involved cannot just trust each other for some reason) the permissions can be set to only allow access to a single device per user.
There is nothing in its design that prevents X Windows from being made to send an event like WM_INPUT that games can handle in a similar way to how they handle WM_INPUT on Windows today.
Seems like in an ideal world, games would select that profile programmatically. Assuming, of course, that there's no reason to prefer mouse acceleration and that we're talking about using the mouse as a linear controller for stuff like aiming or moving.
Ideally yes, but remember that a lot of these games aren't designed for Linux but are effectively running in a compatibility layer for Windows. There is no way for the programmers to know that they need to be interacting with a Linux window manager's behavior.
Good point. I guess the emulator could request the profile, but then you either need a one-size-fits-all approach or somebody has to maintain a database of which games to enable it for.
A critical distinction is that it's not an emulator. "Compatibility layer" is a great description of what's actually happening. It's a (set of) native library(ies) which provide the ABI and call targets that are expected by the software and wrap around native OS functions to provide those.
To use a metaphor, it's like someone made a replacement edge for a puzzle, which interlocks with a subset of existing pieces for another puzzle, rather than someone making a table-sandbox within which to use the entire initial puzzle.
Right. I've used the wrong terminology technically because the machine code is running natively. (I'm not aware of a succinct term for what WINE does, though, so I guess I'm being sloppy for convenience.)
Linux Gaming with Steam is actually quite nice these days. I spent about 3 years using an Ubuntu desktop for all my gaming at home. Most of the games I played installed via steam and worked great on Linux.
The only reason I switched back to a Windows Desktop was that there were just one or two games I specifically wanted to try, but couldn't install to Linux. And once I had switched back (and paid the price for Windows) there were no games that needed Linux, so no motivation to go back.
The article is discussing optimizing Proton, a version of Wine embedded in Steam, meaning you can (try to) play Windows-only games on Linux with minimal effort. It works quite well, though graphically intensive games are more likely to experience glitches or slowdown from the translation.
I don't have the nerves to bear a native Windows install in my environment, so I turned their LTSC into a "Windows Gaming Container" with VFIO.
Alright, it's a headless VM, but it's pared down with 'unfuck' and other telemetry- and uselessness- neutering projects into something safer. Lookingglass peers into one of my GPU's framebuffers, so it lives inside a window on my Linux host.
It's also been at least a year since I've needed it, though, since Steam and wine cover everything else I'd want to play or run, so it might be time to cut it loose for good.
This looks to me like the main change is to make it easier to create mutexes with timeouts.
Isn’t a mutex timing out an indication that:
a) a lock wasn’t needed in the first place or
b) the program is incorrect?
It feels more like they just want the api to match win32 better but most of the multithreaded programming I’ve done lately has just used go’s channels so I totally could be missing something.
Correctness isn't always the right thing to do. Games, in particular, are full of code that approximates the right thing and falls back to less and less correct solutions. It's more important to be fast than right in a lot of cases. Dropping frames can have a significant negative experience for players. Dropping an AI pathing algorithm, particle physics computation, or other background task can often be fine or even unnoticed.
Approximations, including temporal ones, are also science.
Ever seen a texture pop in instead of a stutter? If a lock were taken on that load without a (very short) timeout, it'd either not load in time or load when it's no longer needed.
You can't acquire locks in just any order; that's a recipe for deadlock when one process has acquired half the locks and another has the other half, resulting in them waiting on each other.
Use of WaitForMultipleObjects is more usually for completion of tasks, in the way that win32 "overlapped" works.
One valid scenario for mutexes timing out is stealing:
1. Each CPU first attempts to take an allocated object from its own pool, using a 1ms timeout to acquire the mutex on the pool.
2. If that fails, the CPU attempts to steal from the other CPU pools, using a zero timeout (immediately return if the mutex can't be taken).
3. If that fails, acquire the mutex on its own pool with an infinite timeout.
This is definitely correct and is much faster than any of the lock-free approaches I could come up with. I guess the general class of problems where mutex timeouts help is when you have "many valid options, some of which are being used by others."
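A rough sketch of that stealing scheme using POSIX primitives (hypothetical pool type; pthread_mutex_timedlock for the bounded wait, trylock for the zero-timeout case):

    #include <pthread.h>
    #include <stddef.h>
    #include <time.h>

    struct pool { pthread_mutex_t lock; /* free objects live behind this lock */ };

    /* Hypothetical allocation path following the 3-step scheme above.
     * Returns the pool whose lock we now hold; the caller unlocks it. */
    static struct pool *grab_pool(struct pool *own, struct pool *others, size_t n_others)
    {
        /* 1. Own pool, bounded wait (~1 ms). */
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += 1000000;                    /* 1 ms; nanosecond carry handling omitted */
        if (pthread_mutex_timedlock(&own->lock, &ts) == 0)
            return own;

        /* 2. Other pools, zero timeout: steal only if a lock is free right now. */
        for (size_t i = 0; i < n_others; i++)
            if (pthread_mutex_trylock(&others[i].lock) == 0)
                return &others[i];

        /* 3. Fall back to the own pool with an infinite wait. */
        pthread_mutex_lock(&own->lock);
        return own;
    }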
It could be (b) -- it's incorrect because there's a deadlock due to incorrect mutex usage here.
It could also be (c) -- it's correct but there's contention: the mutex is too coarse / there's "too much" work being protected by the mutex.
But I think your point about matching windows is likely the case (this is how wine implements WaitForMultipleObjects maybe?). The fd exhaustion with FUTEX_FD means they need another way.
Yes, it looks exactly like WaitForMultipleObjects to me. A very reasonable plan. The mutexes aren't being used to exclude critical sections but as completion indicators for various tasks.
Mutex timeouts show up in some implementations as leases.
Essentially you're putting a lower bound on Availability in favor of Consistency. Lease expiry happens when the lessor isn't around to retire the lease in an orderly fashion. It's detecting a Partition. In theory a network partition, but we all know how CPU boundedness creeps into the system as the feature set or the data set grows... and that can show up even on a solitary machine.
Yeah, seriously. It's not emulating a processor, but it is emulating an API. I think the only reason it's not called Windows Emulator is (probably well-founded) concern about getting sued for trademark violation.
Valve's post suggests that they believe this to be the case, although they didn't explain the specific details.
> We think that if this feature (or an equivalent) was adopted upstream, we would achieve efficiency gains by adopting it in native massively-threaded applications such as Steam and the Source 2 engine.
Source 2 has a linux port I believe, and it's an engine that other games could (do?) use to get easy linux support for a lot of their codebase. I think that counts for actually useful.
Depending on what they mean by "massively-threaded", that might cover some other popular network applications that thread to handle requests, but I'm not sure how much work they would put into a linux specific solution if it complicated their codebase.
It occurred to me that a message passing OO language that mapped every active object (that's responding to a message) to a thread could benefit from it.
Wine is a native program, so yes. But it is a niche facility that Wine benefits from, and other programs that handle a lot of locks/queues also do, but may not directly benefit Linux native game performance.
I know the patch mentions interactive multimedia applications (games) in particular, but an actual mechanism to implement WaitForMultipleObjects on Linux would be very welcome for many high-performance multi-threaded applications.
Say you have one worker thread per CPU core. On Windows, each thread would get an Event object and you would WaitOnMultiple to be able to act on the first unit of work that was complete. On Linux you would have to roll your own solution using lower-level primitives and it will not be correct. Being able to wait on multiple events on Linux will be awesome.
Linux already has what you’re talking about with eventfd and epoll.
In Linux each thread can get an eventfd and you can POLLIN all of them.
In fact I would argue that using futexes is the “roll your own solution” using lower level primitives (and easier to fuckup) much more so than eventfd and epoll.
As mentioned somewhat poorly in the post, using futexes gives a performance boost, which is not surprising since they are fast user mutexes. FWIW I didn't think Windows events had a fast user-space path, but I may be mistaken.
For most worker pool scenarios you’re describing, the overhead of eventfd is probably in the noise.
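For comparison, a minimal sketch of that eventfd/epoll pattern (my own; error handling and the epoll_ctl(EPOLL_CTL_ADD) registration of each worker's eventfd are omitted):

    #include <stdint.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    /* Worker side: signal completion of one unit of work on this worker's eventfd. */
    static void signal_done(int efd)
    {
        uint64_t one = 1;
        write(efd, &one, sizeof one);              /* increments the eventfd counter */
    }

    /* Consumer side: block until any registered worker eventfd becomes readable. */
    static int wait_any_worker(int epfd)
    {
        struct epoll_event ev;
        epoll_wait(epfd, &ev, 1, -1);              /* wakes on the first completion */
        uint64_t completions;
        read(ev.data.fd, &completions, sizeof completions);   /* drains/resets the counter */
        return ev.data.fd;                         /* identifies which worker finished */
    }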
You’re talking about interfaces for waiting on multiple kernel resources but the new futex interface enables you to wait for multiple user resources.
Though it can emulate a win32 api for waiting on multiple “objects”, it’s strictly more powerful than WaitForMultiple if you are dealing with user objects since futexes impose very few constraints on how your user synchronization object is shaped and how it works.
So, the new interface is totally different from things like epoll. In one case the kernel is helping you wait for multiple user objects and in the other case it's helping you wait for multiple kernel objects. The distinction is intentional, because the whole point is that the user object that has the futex can be shaped however the user likes, and can implement whatever synchro protocol the user likes.
Finally, it’s worth remembering that futex interfaces are all about letting you avoid going into kernel unless there is actually something to wait for. The best part of the api is that it helps you to avoid calling it. So for typical operations, if the resource being waited on can have its wait state represented as a 32-bit int in user memory, the futex based apis will be a lot faster.
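To make that concrete, a hypothetical one-shot event whose wait state is a single 32-bit int in user memory; when it is already signalled, waiting requires no syscall at all (a sketch, not Wine's actual implementation):

    #include <limits.h>
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical one-shot event: state 0 = not signalled, 1 = signalled. */
    struct event { atomic_int state; };            /* 32 bits on Linux, as futex requires */

    static long sys_futex(void *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    static void event_wait(struct event *e)
    {
        /* Fast path: if already signalled, return without entering the kernel. */
        while (atomic_load(&e->state) == 0)
            sys_futex(&e->state, FUTEX_WAIT, 0);   /* sleeps only while state is still 0 */
    }

    static void event_signal(struct event *e)
    {
        atomic_store(&e->state, 1);
        sys_futex(&e->state, FUTEX_WAKE, INT_MAX); /* wake every waiter; a waiter count
                                                      would let this side skip the syscall too */
    }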
They point out that they already have an implementation that does just this .... and it fails on some programs due to running out of file descriptors (they have one program that needs ~1 million of them ...)
If you read the full thread, that is a bit of a red herring and beside the point (that's why I said the conveyance of the performance implication was poor)... indeed Windows' WFMO only supports 64 objects per call. They mention that the fd issue is due to leaking objects in many Windows programs... which was an odd mention and a little off the main subject. The main motivator is performance. If eventfds performed better it would likely be better to fix the fd leak issue with a cache.
Again.. eventfd and epoll cover the same use case as WFMO and EVENTs.
Perhaps a better term would be “pool”. Anyway, what’s being leaked is “handles” or events not actually fds. You only actually need as many fds as the maximum possible number passed to a syscall. The mapping of handles/event objects in user space does not have to be 1:1 with the kernel resource.
Yes, and you have to cobble together an event implementation out of eventfd and epoll. There are two problems (specifically talking about multi-platform software):
1. You'll likely get it wrong and have subtle bugs.
2. This is significantly different than the Windows model where you wait on events. Now you have two classes of events - regular ones, and ones that can be waited on in multiple. The second class also comes with its own event manager class that needs to manage the eventfd for this group of events.
You end up with a specialised class of event that needs to be used whenever you need to wait on several of them at once. Then you realise you used a normal POSIX event somewhere else and now you want to wait on that as well, so you have to rewrite parts of your program to use your special multi-waitable event.
It's mostly trivial to write an event wrapper on top of POSIX primitives that behaves the same as Windows Events, except for the part where you might want to wait on multiple of them. I would expect that once this kernel interface is implemented we'll get GNU extensions in glibc allowing for waiting on multiple POSIX events. I absolutely do not want to roll my own thread synchronisation primitives except for very thin wrappers over the platform-specific primitives. Rolling your own synchronisation primitives is about as fraught with peril as rolling your own crypto.
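For what it's worth, a bare-bones sketch of such a wrapper - a manual-reset event built from a pthread mutex and condition variable (my own illustration, and exactly the thing that can't be waited on in multiples without an interface like the one being discussed):

    #include <pthread.h>
    #include <stdbool.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        bool            signalled;
    } event_t;

    static void event_init(event_t *e)
    {
        pthread_mutex_init(&e->lock, NULL);
        pthread_cond_init(&e->cond, NULL);
        e->signalled = false;
    }

    static void event_set(event_t *e)           /* like SetEvent on a manual-reset event */
    {
        pthread_mutex_lock(&e->lock);
        e->signalled = true;
        pthread_cond_broadcast(&e->cond);
        pthread_mutex_unlock(&e->lock);
    }

    static void event_wait(event_t *e)          /* like WaitForSingleObject(e, INFINITE) */
    {
        pthread_mutex_lock(&e->lock);
        while (!e->signalled)                   /* loop guards against spurious wakeups */
            pthread_cond_wait(&e->cond, &e->lock);
        pthread_mutex_unlock(&e->lock);
    }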
To be honest, WaitForMultipleObjects will probably become not very useful in the near future. We're getting 32-core workstation CPUs today, and it's quite likely near-future workstations will have CPUs with more than 64 cores, making it impossible to use this classic Windows primitive - but I suspect Microsoft will provide WaitForMultipleObjectsEx2.
On Linux your workers would push the work onto a single output queue or could signal a condition variable pointed to by the work. I've never really felt the need for WaitForMultiple.
This is awesome. If I understand this correctly, this would allow you to write a library that allows you to wait on multiple, fully independent queues from a single consumer thread. At the moment, those queues would have to share a mutex.
I'd like a way to wait on multiple condition variables. Windows has this... Or to treat condition variables as file descriptors so that select/poll/epoll can be used to wait on them (you'd get a notification of a signal, though it might be spurious, so you still have to take the lock and check the condition before you consider the CV signaled).
Surprisingly, implementing futexes in userland can have performance benefits wrt kernel futexes (because of better control over the fast path, possibly avoiding syscalls) and a richer interface (for example the value doesn't need to be 32 bit, just any atomic).
Am I interpreting the graph on their page all wrong? It looks like the older version of Proton provides better performance? Unless they're graphing function call return time? The graph and the text around it doesn't do a good job of explaining that.
I don't know what the source of this convention is, but this is why I was taught that the version "1.2.3" should be read aloud as "one dot two dot three" instead of "one point two point three". The idea is that people--where I'm from at least--tend to read decimals as "point" and not "dot".
Yes, sadly the proliferation of the "two point oh" meme has set society back on that front. (Had we the opportunity to start over, I would have proposed different punctuation for the delimiter to avoid natural confusion with decimals.)
4.0.0 (semver) and 4.0.0.0 (Windows) aren't valid numbers anyway; the confusion can only arise with X.Y version numbers. Just add a .0 at the end of those.
Dates are the same, though — 01.15.2018 is not a real number, it's a sequence of integers. (And in many places they're not even written in most-to-least or least-to-most significant order! …I guess that makes it a tuple, not a sequence.)
The `abs_time` argument to both the futex_wait and new futex_wait_multiple is a pointer, but nowhere is the address checked for validity. (Tracing the syscall path, it seems to be dereferenced in futex_setup_timer without any validity check beforehand.) It's never written to AFAICT, only loaded, but it still seems like leaving around a loaded gun. More importantly, couldn't this unchecked address be used to probe kernel memory to test for values like NULL, or possibly any value with enough sampling?
Note that the futex address itself is also a pointer, but it's validated with access_ok in get_futex_key.
Depressing to see reviewers waste review bandwidth bringing up issues such as "wasted newline" and "incorrect comment format". Do kernel developers not use auto-formatters?
On the contrary, I think the feedback provided was excellent and far better than just saying “Doesn’t conform to our style guidelines, please try again”. Bravo Peter.
This may be someone’s first submission and they may consequently not be aware of style guides, tools which can help lint, etc?
Because bugs due to missing {} happen, and are a royal pain, the rule is that any nontrivial statement, especially any multi-line one, must have braces.
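A classic illustration of the failure mode (hypothetical snippet):

    #include <errno.h>

    /* The 'goto out' looks guarded by the if, but without braces it runs on every
     * path, so the increment below is never reached. */
    static int frob(int *val)
    {
        int ret = 0;
        if (!val)
            ret = -EINVAL;
            goto out;          /* executes unconditionally despite the indentation */
        *val += 1;
    out:
        return ret;
    }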
Also, given the amount of patches I have to read, uniform style really does matter. Unconventional style breaks the flow and detracts from the important bits.
Also, I'm not aware of a lint like tool that works on diffs. Many of the patches never get further than my MUA.
I think you took my comment as sarcasm, but I quite agree with you; being able to leave out the curly braces was a mistake in the language design. Not as big as some of the others, but costly enough over all these decades.
Personally I prefer to be less strict about style whenever possible, but then I prefer to work in safer languages, and I don't have the same firehose to deal with.
The technical review wasn't technical. It was a human roleplaying as a code formatter. The technical content was entirely found in the comment about ABI compatibility.
I didn't see any discussion about tradeoffs, alternative approaches, or a survey of what other systems do for this kind of functionality, or detailed benchmark results.
The tone was roughly what I'd want people to give me in a code review -- the only problem is that it was trivial, and could have been summarized as "Fix the style, check it with $tool"
> The technical review wasn't technical. [...] The technical content was entirely found in the comment about ABI compatibility.
That review also had a comment about an implicit limit on the number of objects, which is caused by a limit on the amount of physically contiguous memory the kernel memory allocator can obtain at once, and a comment that the code being reviewed would allow for a large increase of the reference count of a couple of important structures. Both appear to be very technical comments to me.
In a project like the kernel consistent style is important and the kernel has tools (Coccinelle spatches, checkpatch.pl) to help developers comply with it. There are standards that need to be followed and the bar is the same for everyone.
Because it's not the committer's job to play housekeeper for the contributor. They have enough work to do already, and the contributor is the one who wants their code merged in the first place - they should put in the basic effort to clean their own code before submission, just like everybody else. And because that would introduce some grey areas into the Signed-off-by line[0]. Committers do not want to change submitted code. See the bit about process for maintainers modifying submitted patches and imagine if they had to do that all the time because people can't just follow the damn style guide.
You haven't contributed to a large project or figured out its culture yet, have you? In most major free and open source projects, the default rule is: the entire burden of correcting any issues, including code formatting, is the sole responsibility of the contributor (some big projects on GitHub have even integrated automatic checking; few people would bother to review a patch if it doesn't fit the coding standard or fails to compile). And no, it's not just used as an excuse for disliking one's patches. Even long-term contributors often resubmit patches to fix their coding style. If you are a high-profile developer of a subsystem/project, perhaps sometimes you can get generous help with a free typo/style correction from the upper-level committer (especially when the patch has already gone through review and started moving up; at that point, even the maintainers agree it's pointless to send the patch back), but in general, there's no such thing.
Do you really suggest that ten reviewers (or however many they may be) do this work each and every one, because it is too much work for the sole submitter to do this once?
It is hard enough to get people to review code as it is. I think everyone would be better off with a little humility and be thankful that other people review their code, even in the cases where the review itself isn't very helpful.
It's the same reason you dress nice and comb your hair for a job interview.
If the developer of the patch couldn't get the easy minor details right before submitting, I wouldn't have much confidence that they spent a lot of effort thinking about the hard, major details either.
I think a law against speeding is very backwards. When I'm late, details like how far over the speed limit I'm going are a blocker or a waste of time.
The kernel team came up with rules about how code is formatted. If you don't follow the rules, they are under no obligation to allow your code to be merged in to the main repo, and in fact, are within their rights to reject it. I take the initial response more as a "I'm going to let you off with a warning" versus "Here's your ticket, see you in court"
For Linux kernel, the first round of review is always style-checking, it's the standard operating procedure. Ideally, the patch submitter should have already used ./scripts/checkpatch.pl and eliminated all formatting issues, but often there are missed ones, or other style issues not identified by the tools as well.
To me, it serves as a type of virtue signalling. It's kind of interesting to view the issue from a social perspective:
1. It gives feedback to the submitter, showing that your patch has caught the attention of a kernel maintainer and is not lost or ignored (Example: last time, I sent a bunch of patches to a subsystem and got no reply at all; it turned out that the maintainer was on vacation. On the other hand, if I receive a review on non-conforming code style, I know the maintainer is at least available, and I'm not being rejected because I did something seriously wrong).
2. It gives kernel maintainers a chance to immediately express objections to your patch, thus affirming the social status and authority of a kernel maintainer (Example: after submitting a few patches, you'll quickly know who's in charge and who has a say in the development).
3. By doing (2), it also creates a personal connection from the maintainer to the submitter; the submitter now knows all the subsequent modifications can be CC-ed to maintainer J. Random Hacker for review (scripts/get_maintainer.pl should always be used, but at least you know who's the most active one).
4. It exerts peer pressure on the submitter to follow the cultural norms, "the system" of the kernel development process, including obeying the Linux kernel coding standard.
5. It creates a system of bureaucracy that could accelerate and mechanize the workflow of a patch-reviewing maintainer (other examples include pull requests written in formal, respectful language, often semi-automatically generated, which can be compared to bureaucratic paperwork, e.g. https://lore.kernel.org/lkml/20190731062622.GA4414@archbox/).
6. A lot of the older kernel code has many strange nonstandard coding styles and technical tricks from the early days, which are now discouraged. A strict coding style review prevents any nonstandard practice from continuing to enter the kernel as new code.
The act of expressing role and power through virtue signalling exists in all organizations. If "the system" itself serves its intended useful purposes without objectionable, serious harms [0], there is no reason to abolish it.
The only problem seems to be frustration over lengthy e-mail exchanges without progress. However, the kernel's workflow is large, loose, and highly asynchronous across different timezones, with a lot of reviewers, some of whom are not even dedicated to the kernel project. Organizing it at all already implies a relatively slow pace, so this isn't seen as a major problem.
I believe most traditional FOSS projects work more or less in the same way. In fact, I think the Linux kernel is actually a lot more open than other similar low-level projects, at least for the "non-core" (not linux-mm) parts, like device drivers.
Finally, I think there are valid criticisms of the traditional model of a FOSS project driven by mail, and many people have attempted to innovate towards a more accessible system of development. GitHub's "Pull Request" proved to lower the barrier to entry and boost productivity considerably for small-to-medium projects. And I welcome other innovations if you are starting a new project. On the other hand, the Linux kernel is now a canonical representation of the "old system", which is very unlikely to change in the next 20 years. My recommendation is: don't waste your energy attacking the old systems; instead, learn from all major projects, study their workflow and governance, and see if you can invent something new - we need a lot of innovation.
[0] Verbal abuse is criticized as a problem of this system, but by itself it's not a part of the workflow; the choice of words is closer to a matter of personal choice (so yes, one could say harsh criticism is a broader problem in hacker culture, not only a workflow problem; it can be seen on mailing lists, on online forums, on IRC, or even offline). Also, Linus Torvalds recently changed his behavior under external pressure.
It also creates a filter for investment in the patch. If the submitter doesn’t bother to do a second round for trivial revisions, it would have been a waste of time for the maintainer to read the patch deeply.