How io_uring and eBPF Will Revolutionize Programming in Linux (scylladb.com)
709 points by harporoeder on Nov 26, 2020 | 320 comments


Ah, the funny things we read about in 2020.

In 1985... yes I said 1985, the Amiga did all I/O through sending and receiving messages. You queued a message to the port of the device/disk you wanted, and when the I/O was complete you received a reply on your port.

The same message port system was used to receive UI messages. And filesystems, built on top of the drive system, also used ports/messages. So did serial devices. Everything.

Simple, asynchronous by nature.

As a matter of fact, it was even more elegant than this. Devices were just DLLs with a message port.


And it worked well, with 512K of memory, in 1985.

The multitasking was co-operative, and there was no paging or memory protection. That didn't work as well (But worked surprisingly well, especially compared to Win3.1 which came 5-6 years later and needed much more memory to be usable).

I suspect if Commodore/Amiga had done a cheaper version and not sucked so badly at planning and management, we would have been much farther along on software and hardware by now. The Amiga had 4-channel 8-bit DMA stereo sound in 1985 (which with some effort could become 13-bit 2-channel DMA stereo sound), a working multitasking system, 12-bit color high-resolution graphics, and more. I think the PC had these specs as "standard" only in 1993 or so, and by "standard" I mean "you could assume there was hardware to support them, but your software needed to include specific support for at least two or three different vendors, such as Creative Labs SoundBlaster and Gravis UltraSound for sound."


Something else that's mentioned less than the hardware side is AmigaDOS and AmigaShell, which were considerably more sophisticated than MS-DOS, and closer to Unix in power (e.g. scripting, pipes, etc.).

The fate of Amiga is so infuriating. It's mind-boggling to think how Microsoft was able to dominate for so long with clearly inferior technology, while vastly superior tech (NeXT, Amiga, BeOS) lost out.

There are many such unhappy stories, and I often think about the millions of hours spent on building tech that should have conquered the world, but didn't. The macOS platform is a rare instance of something (NeXT) eventually winning out, but the Amiga was a different kind of dead end.


If you think about it, the triumph of "good enough in the right place at the right time" describes most of the history of computing. Unix was that, as well, compared to many of its contemporary OSes. C was several steps back from the state of the art in PLs. Java, JavaScript, PHP... the list goes on and on.


There’s also the “tearing your competitors to shreds, regardless of the law or ethics” which is how I think of Microsoft in the pre-iPhone era.

As someone who loves software, there was a very clear feeling at that time that Microsoft was putting a huge chilling effect on the whole industry, and that the entire industry was stagnating under their control.

Thank god for Netscape, Google, Apple, Facebook, and Amazon (in that order) who were able to wrest that control from them. Now at least there are multiple software ecosystems to move between. When one of these massive companies poisons the water around them, there are other ecosystems doing interesting things.


Got bad news for you my friend. All of them (well, of course Netscape doesn't exist anymore) poisoned the waters around them. So regardless of where you move, you still inhale some poisoned air.


“C was several steps back from the state of the art in PLs”

This very accurately describes Go


Sure, if literally the only metric you judge a language by is how state of the art its expressiveness is.

Sometimes it feels like all the hate on HN toward Go is ignorance that there is a whole domain of software outside of scripting and low-level systems programming, and that some enterprises value 20-year maintenance over the constant churn of change, e.g. Rust and JavaScript. And yes, I often hear people saying "you can still do that in x or y", but the point is that Go does it better than most languages because it was purposely designed with those goals in mind - hence exactly why it suffers on expressiveness, state-of-the-art features et al. And I say this from 30 years of experience writing and managing enterprise software development projects across more than a dozen different languages.

Go might not be cool nor pretty, but it’s extremely effective at accomplishing its goal.


I think this is a very two-dimensional way of looking at the problem.

Go reduces complexity in order to make it easier to build resilient systems.

A language like Perl has bucketloads more features, and more expressive syntax, but I’d still say Go is many steps ahead of Perl.

On another note, I’d actually argue that some of Go’s features, such as “dynamically typed” interfaces and first-class concurrency support are streets ahead of most other languages. Not to mention its tooling, which is better than any language I’ve used, full stop (a language is so much more than simply its syntax).

I believe that functional languages, with proper, fully-fledged type systems, are the best way to model computation. But if I had to write a resilient production system, I’m choosing Go any day.


> This very accurately describes Go

...and was a deliberate design objective made by an ex bell labs guy.


C was designed to fill its main purpose of writing a portable OS in a higher-level language, and given that the majority of today's OSes are written in C, that is a testament to its success.

It is interesting to note that while Brian Kernighan and Ken Thompson were involved in the initial Go language design, C was largely Dennis Ritchie's baby, and he had a complete PhD thesis on programming language design, meaning that he was well aware of the state of the art of programming language design at that time.

The main argument for "several steps back" is probably the lack of functional language features like closures, and that feature is probably at the very bottom of the list of language features you want when porting an OS, given the CPU and memory limits of computer systems at the time. The other is object orientation, but you can do object-oriented programming in C inside the kernel just fine, if not as gung-ho as things like the multiple inheritance nonsense [1].

The jury is still out on Go. The fact that Kubernetes is very popular for the cloud now does not mean it will be as successful as 50 years of C. Someone somewhere will probably come up with better Kubernetes alternatives soon that use different languages. To stay relevant today and in the future, Go needs to adopt generics, and its designers are well aware of the deficiency of not having generics in the current Go implementation.

[1] https://lwn.net/Articles/444910/


Not at all. C was designed to fill its main purpose of writing a portable OS in a higher-level language at Bell Labs; the rest of the world had been doing that since 1961. Quite easy to find out for anyone going through digital archives from bitsavers, ACM and IEEE.

The majority of today's OSes are written in C as a testament to the success of a free-beer OS given away alongside tapes with source code, while other mainframe platforms required a mortgage just to get started.

Had Bell Labs been allowed to sell UNIX, there wouldn't be a testament to anything.


The main competitor to UNIX, namely VAX/VMS, is mainly written in C, as is its natural successor, the Windows NT kernel, which is probably the second most popular OS in the world. The more modern BeOS and MacOS kernels are written in C. Even the popular JVM (equivalent to a Java mini OS) is written in C. Why have these UNIX alternatives chosen to use C while other programming languages were readily available at the time, for example Pascal, Objective-C, and even the safe Ada?

And were the mainframe OSes even written for portability in the first place, like UNIX?


VAX/VMS was written in BLISS; it only adopted C after UNIX started to spread and they needed to cater to the competition and their own in-house UNIX implementation. Learn history properly.

https://en.wikipedia.org/wiki/BLISS

Even the popular JVM is written in a mix of Java and C++, with plans to port most of the stuff to Java now that GraalVM has been productised, https://openjdk.java.net/projects/metropolis/

Speaking of which, there are at least two well-known versions of the JVM written in Java, GraalVM and JikesRVM. Better learn the Java ecosystem.

UNIX was written in assembly for the PDP-7; C only came into play when they ported it to the PDP-11, and UNIX V6 was the first release where most of the code was finally written in C.

IBM i, z/OS or Unisys ClearPath in 2020 have completely different hardware than when they appeared in 1988, 1967 and 1961 respectively, yet PL/S, PL/X and NEWP are still heavily used on them. Looks like portable code to me.

Mac OS, you know, the predecessor of macOS, was written in Object Pascal, even though eventually Apple added support for C and C++, which then made C++ with PowerPlant the way to code on the Mac, not C.

BeOS, Symbian were written in C++, not C.

Outside of the kernel space, Windows and OS/2 always favoured C++ and nowadays Windows 10 is a mix of .NET, .NET Native (on UWP) and C++ (kernel supports C++ code since Windows Vista).

NeXT used Objective-C in the kernel, that's right, NeXT drivers were written in Objective-C. Only the BSD/Mach stuff used C.

macOS replaced the Objective-C driver framework with IO Kit, based on Embedded C++ again not C. Nowadays with userspace drivers the C++ framework is called DriverKit in homage to the original Objective-C NeXT framework.

Arduino and ARM mbed are written in C++, not C.

Android uses C only for the Linux kernel; everything else is a mix of Java and C++, and since Project Treble you can even write drivers in Java, just as Android Things has allowed since version 1.0.

Safe Ada is used alongside C++ on the GenodeOS research project.

Inferno, the last iteration of the hacker beloved Plan 9, uses C on the kernel and the complete userspace makes use of Limbo.

F-Secure, you might have heard of them, has their own bare metal Go implementation for writing firmware and is used in production via the Armory products.

IBM used PL.8 to write an LLVM-like compiler toolchain and OS during their RISC research, and only pivoted to AIX because that was what the market wanted RISC for.

Contrary to the cargo cult that Multics was a failure, the OS continued without Bell Labs and was even assessed to be more secure by DOJ thanks to its use of PL/I instead of C.

There is so much more to the world of operating systems than the tunnel vision of UNIX and C.


Personally, prior to 2010 I'd consider C++ to be "C with classes", an object-oriented extension of C. The modern C++ after that is more of a standalone language, after adopting features from other languages (e.g. D). Objective-C on the other hand is totally a separate language.

The original JVM written by Sun was in C, not C++ or Java.

The Windows NT kernel is mainly written in C. The chief developer of Windows NT, Dave Cutler, is probably the most anti-UNIX person in the world, but the fact that he chose C to write the Windows NT kernel is probably the biggest testament you can get. Dave Cutler was also one of the original developers of VMS; if BLISS, with its typeless nature, were better for developing an OS than C, he'd probably have chosen it.

For whatever reasons, Multics failed to capture widespread adoption compared to UNIX, and its name lives on mainly in operating system books as the precursor OS to UNIX. For most people, Multics is like the B language: just a precursor to C. I know it's a shame that Multics has become a mere footnote inside OS textbooks despite its superior design compared to UNIX.

The PL/I language is interesting in that it was quite advanced at the time, but as I mentioned in my original comment, Dennis Ritchie had to accommodate the fact that some language features were over-engineered relative to the hardware of the day and had to compromise accordingly. Go's designers, however, chose to compromise not based on the state of the art of the hardware but on what they thought was good for Google developers at the time of the original language design proposal.


>" Unix was that, as well, compared to many of its contemporary OSes"

Which other OSes from the era are you referring to here?


Yes, the Worse is Better thing.


On Microsoft dominating - because developers don't matter, users do. None of these superior technologies would have been willing to make an exception in the kernel so that SimCity could run (look up the story from Raymond Chen if you are not familiar). Linux found considerably more success in servers, as the users themselves are "developer like".


Closest thing I can find is from Joel Spolsky:

I first heard about this from one of the developers of the hit game SimCity, who told me that there was a critical bug in his application: it used memory right after freeing it, a major no-no that happened to work OK on DOS but would not work under Windows where memory that is freed is likely to be snatched up by another running application right away. The testers on the Windows team were going through various popular applications, testing them to make sure they worked OK, but SimCity kept crashing. They reported this to the Windows developers, who disassembled SimCity, stepped through it in a debugger, found the bug, and added special code that checked if SimCity was running, and if it did, ran the memory allocator in a special mode in which you could still use memory after freeing it.

https://www.joelonsoftware.com/2004/06/13/how-microsoft-lost...


Yes, I could not find the original link either. The closest I could find is (the link mentioned in the paragraph below is broken) -

"Interesting defense. While Chen has a good point about Microsoft taking the blame unfairly over this and perhaps similar issues its not like Microsoft is renown for their code quality. Indeed, check out an item in J.B. Surveyer's Keep an Open Eye blog from September 2004 which details how Chen's team added code to allow Windows to work around a bug in Sim City!"

https://www.networkworld.com/article/2356556/microsoft-code-...


Here you go:

https://web.archive.org/web/20070114082053/http://www.theope...

It actually refers back to the Joel on Software blog post in the comment above.


The Amiga lost the war for completely different reasons. The competition at the time was MS-DOS 4 or 5; Windows 2 (and 3.0) was still a toy — and they had compatibility problems.

BeOS had no chance to be evaluated on its own merits because by that time Microsoft had already applied anticompetitive and illegal leverage on PC vendors - for which they were convicted and paid a hefty fine (which was likely a calculated and very successful investment, all things considered).

The Amiga wasn’t even expensive for what it gave: it was significantly cheaper than a PC or Mac with comparable performance, and even had decently fast PC emulation and ran Mac software faster than the Mac.

It did not have a cheap “entry level” model, though, which was one big problem. The other (not unrelated) problem was incredible incompetence among Commodore management.


Do you not consider the A500 cheap? I believe they were going for around $600 in late-'80s money, at least in the US. This wasn't Atari ST cheap but still not a bad deal.


A starter no-brand beige box was always cheaper. And iirc, for a long time you couldn’t get an el-cheapo monochrome monitor for the Amiga - only color or TV, which was ok for the C64-upgraders but not for the PC competition.


True. PC clones were always cheaper.

You could use a composite monitor on the A500/2000. It was only in monochrome. I did that for the first couple months I had my A500.


Around me, they still cost twice as much as a CGA or Hercules (or even dual) monochrome monitor. The starter Amiga cost twice the starter PC until 1993 or so, and by then the war was lost. It was too expensive for a middle class family where I lived.


There's a similar explanation for Linux's success on servers: Linus is very strict about backward compatibility for the kernel. But for Linux on the desktop, the rest of the stack (GUI environments) is made by a bunch of CADT devs who don't care about backward compatibility, so of course it failed.


What kind of backward compatibility for GUI environments are you talking about?


The kind where you can still use an application 2 years later without having to recompile it.


That generally isn't a problem if you pin/vendor your dependencies... the same thing that most developers seem to do on Windows and mac anyway.


CADT? What do you mean?


Cascade of Attention-Deficit Teenagers (open-source software development model)



google "CADT jwz". For various reasons it can't be linked from HN.


For those curious about the story https://www.joelonsoftware.com/2000/05/24/strategy-letter-ii... is a great read with other useful tidbits as well.


There is something to be said about the two companies mentioned other than MS that did the right backwards-compatible thing, Transmeta and PayMyBills: one is bust and the other doesn't show up on the first page of a search.


> the other doesn't show up on the first page of a search.

It merged with PayTrust, which was acquired by Metavante, who then sold the customers to Intuit. All this happened in the early 2000s. More recently it seems that Intuit sold Paytrust back to Metavante. It's still operating a service at Paytrust.com.


My first Linux box felt pretty comfortable after cutting my teeth on an Amiga's shell, which was largely inspired by Unix and still similar enough in concepts to make the transition easy.


I was an Amiga user for most of the late 80's and early 90's. The hardware didn't change much over the years. Software wise, OS 2.0 was a huge upgrade, but hardware wise, it felt like little changed until AGA. AGA machines (1200/4000) were too little, too late. If they had come out in 1990 instead of 1993, it might've been enough of a lead. Maybe in an alternate universe where the A3000 had AGA.


AGA was the only significant hardware revision.

They should have ditched m68k too. I loved it, but with 68040 and 486, the writing was on the wall for everyone to see.

By the time of the Pentium, the writing was on the walls, the floors, the windows, the ceiling.

Yes, the 68060 held a candle to the early Pentiums, but it was not intended as a personal computer CPU, more "fast embedded".

The Amiga OS was great. No memory protection, but Win 3.1 had none, DOS had none, and Win 95 had some but somehow crashed relentlessly anyway. It took years for them to discover that it had a max uptime of 49.7 days because of a timer running out of bits.

AGA could and should have been incrementally upgraded with more modes, ever keeping backwards compatibility. (Like AGA did with the original chipset.) They could have sold Amigas on PCI boards, with a cheap 68000 to boot legacy Amiga OS until the transition was complete with emulation or whatever, and using the PC x86 for games code. So many possibilities, but R&D was on a shoestring budget.

The "game console like" conformity was the strength and ultimately the downfall of the platform, but not because that's bad inherently, but because the revisions stopped coming. The original PS2 was compatible with the PS1, and the original PS3 was compatible with the PS2.

The iPhone also shows the strength of vertical integration, Commodore had a great chess board opening but traded all its pieces for nothing, except in the end, pork for the CEO and board.


My grampa gave me my first PC, and it had a CLI I learned; then years later, when I was introduced to Linux, I just had an intuition for the basic commands and usage. I haven't been able to track down what that PC was (it was CLI-only but could load games from floppies). I think it may have been an Amiga (this was in '98, but the PC was a decade+ old at the time).


macOS may have technical advantages, though less and less over time. But it always had a dramatically more restrictive business model from the very beginning.

This is what made it unattractive to business and continues to make it unattractive to many.

The restrictiveness of Apple is likely an advantage for novice mobile users, and other vendors copied it.


You say it's unattractive to business and yet MacBook seems to be the single most popular brand of laptop at most tech companies these days.


Maybe the most visible brand, among web-dev, but HP, Lenovo, and Dell own the business/enterprise laptop market in a big way.


I'm aware, I'm a full-time Windows developer.


Around here it is a mix. Macs aren't extremely popular (3 - 5 of 20 or something) and Linux has quickly become more common and seems to be eating into the Windows market share.


Every time there's a huge media article about a newly discovered tracking mechanism in Windows 10, you immediately see posts with newcomer questions in Linux-specific areas.

A lot of people seem to have switched to Ubuntu or Arch due to Windows 10 tracking. And these are also non-technical people that have no idea what they are doing, which is kinda awesome.

I always love it when Linux gets a bit easier to use as a desktop for the wider audience.


50/50 with Linux among developers in the case of my company. In my team there are like 4 Dells and only 1 Mac.


Software developers are a tiny fraction of business users. In our industry - vaccines - most people use Windows laptops to get work done. Execs do use Apple stuff, but they're not doing the actual ground-level work, so it's probably the same everywhere. Apple hardware is expensive to buy, expensive to maintain, and difficult to service due to its flawed design (everything is soldered, no easy access to components, no third-party repair, no access to parts, generates lots of e-waste, etc). The OS also is not capable enough to be easily administered by IT.


The PC was an open, free specification, so the hardware was cheaper. I think that was the main driver behind the PC winning, not Microsoft.


Only thanks to Compaq; that wasn't part of IBM's plans.


Maybe it's only clear to you. I don't see what is "clearly inferior" about Microsoft tech. It is reliable and rock solid and has worked for our industry very well (vaccines).


I'm of course referring to the periods when Amiga and Apple competed with MS-DOS and Windows, which were vastly inferior technologically.


Amiga multitasking was actually preemptive. It was only cooperative in the sense that all processes / tasks were in the same address space and without memory protection...


Between the preemptive multitasking and purely message-based communication it sounds a lot like Erlang.


Unlike Erlang, though, when it crashed, it crashed.

RIP Guru meditation.


I learned C programming on an Amiga. It made me very careful. If you messed up, you were looking at the guru followed by a couple minutes for a reboot. Fun times...


We have to "thank" Compaq for it as well.

Even with all its management flaws, the Amiga might have survived without the mass production of PC clones.


Minor nitpick: The Amiga had preemptive multitasking (and thus didn't depend on user code to willingly give up their time slice, in that regard it was more like UNIX, and unlike early Windows and MacOS versions).


Another amazing thing is that Carl Sassenrath, creator of the Amiga OS kernel, also went on to create the REBOL language, which seemed quite innovative too - I've checked it out some - though it is kind of dormant now, and there is now the Red language, based somewhat on REBOL. https://en.m.wikipedia.org/wiki/Carl_Sassenrath


I think the multitasking was actually preemptive. But yes, it had no memory protection: the message passing infrastructure relied on it and it would have been very hard to retrofit even on cpus with an MMU (although I think recent versions might have actually tried).


True, but as I've read from more knowledgeable sources than myself, the problem with the Amiga was that the software was intimately linked to, and effectively exposed, hardware implementation details.

This made upgrading chips nigh impossible without full software rewrites, which ultimately caused stagnation.

Indeed, as an A500 kid I used to laugh and was horrified by my first PC...


The OS actually had a very advanced abstraction layer. The problem is most games bypassed it (and the OS itself in fact) and talked directly with the hardware.


Thanks for the clarification


A friend of mine was amazed by this capability of the Amiga when I showed him that on one screen I could play mod.DasBoot in NoiseTracker, pull the screen down partly then go on the BBS in the terminal by manually dialing atdt454074 and entering, without my A500 even skipping one beat...

All I had was the 512 kB expander; he had a 386 with a 387 and could only run a single-tasking OS.


Linux was originally built for the 386.


He could've done the same thing with DOS if he bought DesqView. That let you multitask multiple DOS applications on a 386.


Not quite. The multiple screen thing allowed several full screen graphical applications with different resolutions on screen at once, divided by a vertical barrier (a title bar similar to those on windows). This was a hardware feature at its core, if memory serves me right.


Yes, I remember the screen feature on my A500. It was neat.

To say a 386 is limited to single tasking is wrong though. That was my main point.


I remember NetWare's IPX/SPX network stack used a similar async mechanism. The caller submits a buffer for read and continues to do whatever. When the network card receives the data, it puts them in the caller's buffer. The caller is notified via a callback when the data is ready. All these were fitted in a few K's of memory in a DOS TSR.

All the DOS games at the time used IPX for network play for a reason. TCP was too "big" to fit in memory.


"In 1985... yes I said 1985, the Amiga did all I/O through sending and receiving messages"

I do remember that, and it was cool. But, lightweight efficient message passing is pretty easy when all processes share the same unprotected memory space :)


L4 uses a similar model, and the last ~20 years of research around L4 has mostly focused on improving IPC performance and security. The core abstraction is a mechanism to control message passing between apps by routing it through lightweight kernel invocations (which is indeed practically the only thing the kernel does, it being a microkernel architecture).

Memory access is enforced, although not technically via the kernel. Rather at boot time the kernel owns all memory, then during init it slices off all the memory it doesn't need for itself and passes it to a user space memory service, and thereafter all memory requests get routed through that process. L4 uses a security model where permissions (including resource access) and their derivatives can be passed from one process to another. Using that system the memory manager process can slice off chunks of its memory and delegate access to those chunks to other processes.


When you want to squeeze every bit of performance out of a system, you want to avoid doing system calls as much as possible. io_uring lets you check if some I/O is done by just checking a piece of memory, instead of using read, poll, or such.
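With liburing that check looks roughly like this (a minimal sketch; the function name is a placeholder, and in the common case io_uring_peek_cqe just reads the shared completion ring without a syscall):

  /* Sketch: see whether any I/O has completed by looking at the CQ ring. */
  #include <errno.h>
  #include <liburing.h>

  int check_done(struct io_uring *ring)
  {
      struct io_uring_cqe *cqe;

      if (io_uring_peek_cqe(ring, &cqe) == 0) {
          int res = cqe->res;            /* result of the original request */
          io_uring_cqe_seen(ring, cqe);  /* mark the CQE as consumed */
          return res;
      }
      return -EAGAIN;                    /* nothing finished yet, keep working */
  }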


One thing that doesn't change is that every decade people will look at the Amiga and admire it the same, no matter how many advances have been made since.


This over-romanticizes the Amiga (a beautiful system no doubt) because there have been message-passing OSes since the 1960s (see Brinch Hansen's Nucleus for example). The key difference with io_uring is that it is an incredibly efficient and general mechanism for async everything. It really is a wonderful piece of technology and an advance over the long line of "message passing" OSes (which always were too slow).


Purely for entertainment, what is the alternate history that might have allowed Amiga to survive and thrive? Here's my stab:

- in the late 80s, Commodore ports AmigaOS to 386

- re-engineers Original Chipset as an ISA card

- OCS combines VGA output and multimedia (no SoundBlaster needed)

- offers AmigaOS to everyone, but it requires their ISA card to run

- runs DOS apps in Virtual 8086 mode, in desktop windows or full-screen


The reverse was actually possible: there were PC-compatible expansion cards for the Amiga [1]. The issue is that they were very expensive and 8088-only.

[1] for example: https://en.wikipedia.org/wiki/Amiga_Sidecar although "card" is stretching it :)


Yes, those PC Card addons for Mac/Amiga/etc are endlessly fascinating to me. But with the benefit of hindsight, the crucial factor wasn't just being able to run DOS applications on your fancy proprietary computer, it was riding the PC Compatible rocketship as it blasted off. Creative Labs and 3Com and Tseng and many others showed that there was more value in manufacturing a popular expansion card for the massive PC world than in owning your own closed platform bow-to-stern.



Nice. I did not know those.


> Devices were just DLLs with a message port.

Reminds me of: https://en.wikipedia.org/wiki/Unikernel


Just like Erlang + receive.


All this fuss because Linux wouldn't just implement kqueue ... Sigh.


Please explain to me how kqueue facilitates submitting arbitrarily large numbers of syscalls to the kernel as a single syscall, to be performed asynchronously no less. Even potentially submitted using no syscall at all, in polling mode.
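To make it concrete, with liburing it looks roughly like this (a sketch; the fds, buffers and length are placeholders, and error/NULL checks are omitted):

  /* Queue a batch of reads, then hand them all to the kernel at once. */
  #include <liburing.h>

  void submit_batch(struct io_uring *ring, int *fds, char **bufs, int n, unsigned len)
  {
      for (int i = 0; i < n; i++) {
          struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
          io_uring_prep_read(sqe, fds[i], bufs[i], len, 0);
          io_uring_sqe_set_data(sqe, (void *)(long)i);  /* tag to match completions */
      }
      io_uring_submit(ring);  /* one io_uring_enter() for the whole batch */
  }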


Linux should have had kqueue instead of epoll. But io_uring is a different thing.


AFAIK it's unnecessary at this point, Linux has most of the equivalent functionality and there is a shim library for it: https://github.com/mheily/libkqueue


Yes, these days you can get a file descriptor for pretty much everything, so epoll is sufficient.

I think that epoll timeout granularity is still in milliseconds, so if you want to build high-res timers on top of it for your event loop you have to either use zero-timeout polling or an explicit timerfd, which adds overhead. I guess you can use plain ppoll (which has ns-resolution timeouts) on the epoll fd.


This is corrected in io_uring too: IORING_OP_TIMEOUT takes a timespec64.
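Roughly like this, assuming an already initialized struct io_uring named ring (a sketch):

  /* A nanosecond-resolution timeout is just another SQE. */
  struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 250000 };  /* 250 us */
  struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

  io_uring_prep_timeout(sqe, &ts, 0, 0);  /* count=0: complete on expiry only */
  io_uring_submit(&ring);                 /* the expiry shows up as a normal CQE */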


Solaris didn't port kqueue either. We're doomed to reinvent the wheel.


And Bryan Cantrill has expressed quite a bit of remorse about that.


This reminds me of David Wheeler's adage:

  All problems in computer science can be solved by another level of indirection.
The rejoinder, and I don't know who gets credit for it, is:

  All performance problems can be solved by removing a layer of indirection.


An often cited corollary to the first one is, "...except for the problem of too many layers of indirection." :)

https://en.wikipedia.org/wiki/Indirection


Incorrect: that problem, too, can be solved by adding another layer of indirection - then adding an optimized implementation underneath it.


Have we stopped solving all performance problems with introducing a cache? Why wasn't I told? Will I have to hand in my union card?


No worries, your union card is safe: a cache is just a particular kind of added indirection. :)


And some architectures have magically improved performance by reconfiguring caches as scratch pad memories.

http://rexcomputing.com/


I don't think io_uring and ebpf will revolutionize programming on Linux. In fact I hope they don't. The most important aspect of a program is correctness, not speed. Writing asynchronous code is much harder to get right.

Sure, I still write asynchronous code. Mostly to find out if I can. My experience has been that async code is hard to write, is larger, hard to read, hard to verify as correct and may not even be faster for many common use cases.

I also wrote some kernel code, for the same reason. To find out if I could. Most programmers have this drive, I think. They want to push themselves.

And sure, go for it! Just realize that you are experimenting, and you are probably in over your head.

Most of us are most of the time.

Someone will have to be able to fix bugs in your code when you are unavailable. Consider how hard it is to maintain other people's code even if it is just a well-formed, synchronous series of statements. Then consider how much worse it is if that code is asynchronous and maybe has subtle timing bugs, side channels and race conditions.

If I haven't convinced you yet, let me try one last argument.

I invite you to profile how much actual time you spend doing syscalls. Syscalls are amazingly well optimized on Linux. The overhead is practically negligible. You can do hundreds of thousands of syscalls per second, even on old hardware. You can also easily open thousands of threads. Those also scale really well on Linux.


I don't know what kind of programming you're doing, but in network apps, if you have a thread per client and lots of clients (like a web server), you end up with lots of threads waiting on responses from slow clients, and that takes up memory. The time blocked on the syscall has nothing to do with your own machine's performance.

But on the other hand, if your server is behind a buffering proxy so it's not streaming directly over the Internet, it might not be a problem.


> But on the other hand, if your server is behind a buffering proxy so it's not streaming directly over the Internet, it might not be a problem.

This is one instance of a larger pattern I've been noticing. When using some languages (like Python and Ruby) in the natural, blocking way, a back-end web application typically needs multiple processes per machine, because it doesn't handle many concurrent requests per process. Combine this with the fact that each thread has to block while waiting on the client, and you have to add more complexity around the application server processes to regain efficiency. The proxy in front of those servers is one example. Another is an external database connection pool like PgBouncer. Speaking of the database, to avoid wasting memory while waiting on it, you may end up introducing caching sooner than you otherwise would. And when you do, the cache will be an external component like Redis, so all of your many processes can use it. Or you might use a background job queue just to avoid tying up one of your precious blocking threads, even for something that has to happen right away (e.g. sending email). And so on.

Contrast that with something like Go or Erlang (and by extension Elixir), where the runtime offers cheap concurrency that can fully use all of your cores in a single process, built on lightweight userland threads and asynchronous I/O, while the language lets you write straightforward, sequential code. In such an environment, a lot of the operational complexity that I described above can just go away. Simple code and simple ops -- seems like a winning combination to me.


Cooperative multitasking is much easier to implement and administer than preemptive multitasking, and always has been. But there are cases where it isn't good enough, and if you hit those then you need a system that can do preemptive multitasking gracefully - which often means you end up with just as much complexity as if you'd used preemptive multitasking from the start, but with the complex parts being less well-tested.


>"But there are cases where it isn't good enough, and if you hit those then you need a system that can do preemptive multitasking gracefully ..."

What are some of those use cases where userland threads are no longer good enough? In what areas do they fall short?


Essentially any time you have to run something that's not completely trusted to not block a thread - which could be user-supplied code (or "code" - matching a regex is unsafe in most implementations, rendering PostScript is famously Turing-complete) or just a third-party dependency.

At my first job we had a prototype that performed 2x faster (on average) by using Go-style async, but we couldn't trust our libraries enough to eliminate bugs from blocking dispatcher threads. So we stuck with traditional multithreading.


It's all true, and yet most webservers were like that 20 years ago - and they still managed to run even fairly high-traffic websites on hardware much less powerful than what we have today. I would argue that >90% of the web doesn't really need the extra throughput that async gives you at the cost of extra complexity.


Writing asynchronous code is trying to fix how your code is executed in the code itself. It is the wrong solution for a real problem.

But I think what many people get wrong (not the person I'm replying to) is that how you write code and how you execute code does not have to be the same.

This is essentially why google made their N:M threading patches: https://lore.kernel.org/lkml/20200722234538.166697-1-posk@po...

This is why Golang uses goroutines. This is why Javascript made async/await. This is why project loom exists. This is why erlang uses erlang processes.

All of these initiatives make it possible to write synchronous code and execute it as if it was written asynchronously.

And I think all of this also makes it clear that how you write code and how code is executed is not the same, so yes, I'm in agreement with the person I'm replying to, I don't think this will change how code is written that much, because this can't make writing code asynchronously any less of a bad idea than it is now.


> This is why Golang uses goroutines. This is why Javascript made async/await. This is why project loom exists. This is why erlang uses erlang processes.

JavaScript async/await is different from the others. It requires two colors of functions [1], and it conflates how the code is written with how it's executed, so it has the same problem you were talking about at the start of your comment.

Also, JavaScript async/await is suboptimal in that it's ultimately built on top of unstructured callbacks. Or, as Nathaniel J. Smith put it in a post about Python's asyncio module, which has the same problem, "Your async/await functions are dumplings of local structure floating on top of callback soup, and this has far-reaching implications for the simplicity and correctness of your code." [2] That whole post is well worth a read IMO.

[1]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...

[2]: https://vorpus.org/blog/some-thoughts-on-asynchronous-api-de...


> JavaScript async/await is different from the others.

It allows me to write synchronous code and execute it asynchronously. The mechanism is different - but the purpose is the same. I'm not endorsing the implementation. But I do use it, because it is way better than writing asynchronous code.


What a wonderfully dogmatic comment that completely misses the point of io_uring.


Given the article's over-the-top opening, I think it's good to have a reality check that reminds us of fundamentals like correctness over speed, and clarity over cleverness.


io_uring is correct. It's nothing clever - in fact, it's quite boring. It is specifically meant for applications that must handle high volumes of asynchronous I/O.

Yes, believe it or not, you can achieve correctness and speed, together, without compromise.


For 99.99% of Linux programming, io_uring and eBPF are beside the point, and developers couldn't care less about them.


Care to shed some light on the point of io_uring the OP misses? (honestly interested)


What are your thoughts on Rust?


Coincidentally last night I announced [0] a little io_uring systemd-journald tool I've been hacking on recently for fun.

No ebpf component at this time, but I do wonder if ebpf could perform journal searches in the kernel side and only send the matches back to userspace.

Another thing this little project brought to my attention is the need for a compatibility layer on pre-io_uring kernels. I asked on io_uring@vger [1] last night, but nobody's responded yet; does anyone here know if there's already such a thing in existence?

[0] https://lists.freedesktop.org/archives/systemd-devel/2020-No...

[1] https://lore.kernel.org/io-uring/20201126043016.3yb5ggpkgvuz...


I'd like something roughly similar, to make the rr reverse debugger support io_uring. That likely can't work like most other syscalls, due to the memory only interface...


I have some thoughts about io_uring support in rr: https://github.com/rr-debugger/rr/issues/2613


I was thinking about doing this for an event loop I was working on, but no code to show yet... you probably can get away easily with using pthreads and a sparse memfd to store the buffers.


(Reply since I can't edit anymore) The only catch is that with that approach, you can't poll on the fd like you can with the real thing.


Assuming you're talking about emulating io_uring in userspace at the liburing API level, couldn't you just use a pthreads thread pool for the syscall dispatching from submitted SQEs, and when those complete, serialize their results into CQEs?

For fd-based monitoring of the CQE, wouldn't a simple pipe or eventfd suffice? When CQEs get added, write to the fd, it just happens to all be in-process.

I must admit I haven't gone deep into the liburing internals or the low-level io_uring API, but conceptually speaking there doesn't seem to be anything happening that can't be done in-process in userspace atop pthreads for the blocking syscalls. It just won't be fast.

Am I missing some critical show-stopping detail?
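For the fd-monitoring part I'm imagining roughly this (hypothetical shim code; names are made up), which mirrors what io_uring_register_eventfd() gives you on a real ring:

  /* Worker threads do the blocking syscalls, append results to an in-process
     "CQ", then bump an eventfd that callers can select/poll/epoll on. */
  #include <stdint.h>
  #include <sys/eventfd.h>
  #include <unistd.h>

  static int cq_efd;  /* created once: cq_efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK); */

  static void cq_push_and_signal(void)
  {
      /* ...append the fake CQE to the emulated ring under a lock/barrier... */
      uint64_t one = 1;
      write(cq_efd, &one, sizeof(one));  /* wakes anyone waiting on cq_efd */
  }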


This feels very, very similar to IO completion ports / IOCP on Windows. More modern versions of Windows even have registered buffers for completion, which can be even more performant in certain scenarios. I'm looking forward to trying this out on Linux.

I'm curious to see how this might work its way into libuv and c++ ASIO libraries, too.


io_uring allows the kernel and the user program to communicate purely via shared memory without having to perform a system call, i.e. a context switch to the kernel.

Do windows completion ports also work that way or do they involve a system call to be performed in order to consume completion events?


Windows registered IO (RIO) does imho the same (https://docs.microsoft.com/en-us/previous-versions/windows/i...). When enqueuing reads/writes with RIO there at least exist flags to specify that the kernel should not immediately be woken up, and thereby to batch syscalls as with io_uring.


You do still need a single system call (io_uring_submit) to submit each batch of entries in the submission queue.

Edit: actually no it's not required in all cases. Thanks for the correction.


I've only read about io_uring without yet having a chance to actually use it, so take this with a grain of salt:

I read that io_uring has two modes, one where you signal via a system call and another that uses memory mapped polling.

https://unixism.net/loti/tutorial/sq_poll.html states:

> Reducing the number of system calls is a major aim for io_uring. To this end, io_uring lets you submit I/O requests without you having to make a single system call. This is done via a special submission queue polling feature that io_uring supports.


Submit is not a syscall. io_uring_enter is the only syscall that is used while running a ring. That one may submit, wait or both at the same time. Strictly speaking it isn't necessary but to avoid it you require elevated privileges.


From Linux 5.10 you only need CAP_SYS_NICE to perform SQPOLL: https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-fi...


Yes, hence "elevated" and not "root". It is still higher than default, right?


You can ask the kernel to poll the submission queue and skip io_uring_submit too. (Though you need elevated privileges to do this IIRC)


Not with SQPOLL. You can eliminate all syscalls with SQPOLL.
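Setup is roughly this (a sketch); liburing's submit path then only falls back to io_uring_enter() when the kernel polling thread has gone idle and needs a wakeup:

  /* Ask the kernel to poll the submission queue itself
     (needs elevated privileges, see the sibling comments). */
  struct io_uring ring;
  struct io_uring_params p = { 0 };

  p.flags = IORING_SETUP_SQPOLL;
  p.sq_thread_idle = 2000;  /* ms of inactivity before the poller sleeps */
  io_uring_queue_init_params(8, &ring, &p);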



Tracking that issue was a motivator for me to begin adding support to glib: https://gitlab.gnome.org/GNOME/glib/-/issues/2084


You're quite right -- it's basically the same idea as IOCP on Windows, kqueue on FreeBSD, and Event Ports on Solaris.


Isn't kqueue for sockets still readiness based? I know most runtimes (like libuv) just use it in the same fashion as epoll, and await readability/writeability through the queue. Not sure if it also has completion based options.


You're quite right, I muddled things up (and now the edit window has elapsed). epoll is the Linux equivalent of kqueue, IOCP, and Event Ports (all readiness based). Not sure how I screwed that one up...


IOCP is actually submission+completion based and thereby closer to io_uring than to epoll. The main difference between IOCP and io_uring at this point seems to be the use of a ringbuffer based submit interface instead of a syscall based one. But that is more of a performance optimization than a huge difference in the programming model.


It is already integrated with asio. Third-party, of course, because that's the whole point: io_uring does not need to know anything about asio, nor does asio need to know anything about io_uring, to get optimal performance.

It's all on github, with accompanying CppCon talk. Asio, by the way, will be C++23's network layer.


There's currently a lot of talk about io_uring, but most articles around it and usages still seem more in the exploration, research and toy project state.

I'm however wondering what the actual quality level is, whether people used it successfully in production and whether there is an overview with which kernel level which feature works without any [known] bugs.

When looking at the mailing list at https://lore.kernel.org/io-uring/ it seems like it is still a very fast-moving project, with a fair amount of bugfixes. Given that, is it realistic to think about using a kernel version between 5.5 and 5.7 in production where any bug would incur an availability impact, or should this still be considered an ongoing implementation effort and revisited at some 5.xy version?

An extensive set of unit tests would make it a bit easier to gain trust that everything works reliably and stays working, but unfortunately those are still not a thing in most low-level projects.


Don't use io_uring until at least 5.10 rc3, if not 5.11. SQPOLL is still to be properly added and fixed and there are some security concerns (e.g. CAP_SYS_ADMIN being replaced by CAP_SYS_NICE to start a kernel submission queue polling thread).

io_uring has many tests in the companion user space library liburing, maintained by the same person that made the kernel patches (Jens Axboe). They test both the library as well as expected functionality in the kernel.

io_uring is not going to give you speed ups if you use it in the same way as you would epoll or kqueue. Thus, simply sticking it into e.g. libuv without changing how the applications are built probably won't give you a lot of benefit (speculating).

It comes down to how you work with the ring buffers and how much you take advantage of the highly out-of-order, memory-barrier-based shared memory approach as opposed to more "discrete" (maybe not the right word) syscalls.

As of yet, I haven't personally come across a published example of a production framework that utilizes these features adequately. We have some internal IP that does, but probably won't be open sourced.


This article recently featured on HN may be of interest to answer your question: https://itnext.io/modern-storage-is-plenty-fast-it-is-the-ap...

io_uring allows for better utilization of fast storage.


> Things will never be the same again after the dust settles. And yes, I’m talking about Linux.

One has to be in quite a techie bubble to equate Linux kernel features with actual world-changing events, as the author goes on to do.

More on-topic though, having read the rest of the article, my guess is that while these features will let companies squeeze some more efficiency out of high-end servers, they won't change how most of us develop applications.


Any async or event-loop runtime can be almost entirely powered by io_uring. Timers, waiting for work when you're out of CPU-bound tasks, most IO syscalls, it all can go through io_uring.

You'll still need a few worker threads for blocking syscalls that haven't been ported to io_uring yet but that need is greatly reduced compared to the previous state of things.

So even if you're not using io_uring yourself the language standard libraries or server frameworks will.

There are WIPs for netty, libuv, nginx. Other projects are exploring it or have announced intent to use it.


Another example, Zig landed io_uring in the std lib a month ago: https://github.com/ziglang/zig/pull/6356

I'm also really excited by how you can use io_uring to power everything (fs, networking etc.) with one easy api and a single-threaded event loop: https://github.com/coilhq/tigerbeetle/tree/master/demos/io_u...

io_uring makes thread-per-core designs so much easier.


He also brings up 2020 because OMG, it's the worst year EVAR.

It's not a tech bubble as much as it's a journo bubble. People are reading before they're writing, so he's seeing trendy topics like 2020 and the virus. He feels he needs a hook to get his readers engaged, so he's reaching for things readers can relate to.

It's a bad hook. I think an editor would have cut that whole intro.


Oh please. It's bad hook, yes. But once we get over that, we should acknowledge that this is an extremely well-written article. It has been a long time since I've stumbled over an article on HN that was such a joy to read.


I agree it is well-written overall and should have been clearer about that; in my defense, my conclusion was simply that an editor would cut the intro.

The intro is both highly visible, and, because of how writing and thinking work, it's also the spot where you're either collecting your thoughts or trying to hook the reader.

If you don't have an editor and you're done with your first draft, try deleting your first few paragraphs. It's often a simple way to vastly improve a piece.


If the premise is nonsense how can the article be well-written?

Most applications will not change the way they work with the kernel, because they don't work with it, they hide it as well as possible under libraries and frameworks. Even so, most applications need neither io_uring, nor eBPF. Hardly a revolution.


It was written (or, published) in May. The zeitgeist was very much Corona without the fatigue we have now.


Well, getting GB/sec speeds instead of 100s of MB/sec is a pretty impressive improvement in disk utilization.


Without any doubt, but the impact on the world as a whole is going to be barely noticeable.


My real hope is that eventually, you can use some higher-level language to write device drivers for things like crappy IoT gadgets using eBPF, without any chance of crashing the machine due to a pointer fu or so.

Knowing that with eBPF I simply cannot crash the laptop I'm working on is a huge deal, and reduces the great psychological hurdle that kernel development always had (for me, at least).


I am impressed with the level of linux knowledge in this thread. How do people become linux kernel hackers? Most of the developers I know (including myself) use linux but have very little awareness beyond application level programming.


You don't necessarily have to be a kernel hacker to be familiar with many of the features that the kernel provides. Just doing application debugging often requires digging deeper until you hit some kernel balrogs.

Container problems? Namespaces, Cgroups, ...

Network problems? Netfilter, tc, lots of sysctl knobs, tcp algorithms (cue 1287947th thread on nagle/delayed acks/cork)

Slow disk IO? Now you need to read up on syscalls and maybe find more efficient uses. Copy_file_range doesn't work as expected? Suddenly you're reading kernel release notes or source code.


> How do people become linux kernel hackers?

Honestly, by hacking it.

There's a famous book about Linux internals whose name I don't remember (but it has "Linux" and "internals" in it). But I have never seen anybody doing it by reading a book (despite how excellent it can be). You just go change what you want or read the submodule you are interested in understanding, and use the book, site or whatever when you have a problem.


>There's a famous book about Linux internals that I don't remember the name (but has "Linux" and "internals" on it)

this one? https://0xax.gitbooks.io/linux-insides/content/index.html

though it says insides instead of internals


Yes, this one. Thanks.


I have never run into a problem that I thought needed to be solved in the kernel. What kinds of things have you wanted to change?


The first time I went into it was to write a driver for a device in my undergrad. After that I've changed a driver here or there (never anything worth merging), and needed the documentation of the sound systems.

It's not an easy thing, by any means. Just locating where you have to touch on the source tree is a problem that will lead you to plenty of books or sites. But don't try reading those before you have a problem to solve, you will lose time and drown in information.

(By the way, I am assuming you know how syscalls work. If you don't, go study that before you start anything.)


I learned 90% of the things I know about operating systems by solving problems I myself caused.


Once I found myself reading about device module programming since the common USB-Serial device module (I forgot its name, cdc something) wasn't properly working for a Chinese multiserial port chip inside a GSM cluster modem (one USB device to multiple serial ports).

I was attempting to hack away a simple example but I found the USB-Serial (more) generic driver intended for "test only" and... it just worked.

Another reason for reading about IO calls, schedulers, etc? That "I'm still writing data into your USB flash drive even when the GUI says it finished 5 minutes ago" that I hate so much.


Conversely every problem I’ve run into in a kernel that I could potentially solve requires me to be an expert in 1-3 things outside of the kernel.


Apart from Linux hw support for things at work, I implemented a fairly simple pseudo-device for establishing TCP connections from a process in capability mode on FreeBSD. The device driver has support for a denylist to disallow connections to specific IP ranges. It has multiple syscalls wrapped into one ioctl, and sockets opened from the device always had TCP_NODELAY, O_CLOEXEC and SOCK_NONBLOCK set. Worked pretty well for its intended use case.

https://github.com/sebcat/yans/blob/master/drivers/freebsd/t... https://github.com/sebcat/yans/blob/master/drivers/freebsd/t...


In my case, which is probably typical, there was a bug in a device driver for some obscure thing we were using where I worked. So I had to dive into the world of kernel modules and fix it. I think a lot of kernel knowledge and development is commercially-driven, in this sense.


I think you were thinking of "Linux Core Kernel" by Scott Maxwell. And yes, it's an awesome book that copied the style of the same type of book that annotates the SVR4 kernel.


For the most part, it's just software. If you have the time and the interest, you can learn it like anything else. At some level, it requires an awareness of how the hardware works(page tables/MMUs/IOMMUs, interrupts, SMP, NUMA, etc).

I don't mean to downplay the investment, but if you're already an experienced software engineer you can get into it if it interests you. There is a different mindset among systems software programmers though. Reliability comes first, performance and functionality come second. It's a world away from hacking python scripts that only need to run once to perform their function.


I learned a TON about the Linux kernel through writing custom device drivers for FPGAs. Granted most of my experience is in the driver area and not in any of the subsystems, but even still I have a much better grasp of how the kernel operates now (and even more importantly, I know how to navigate it and how to find relevant documentation).


As others have said, hacking it, certainly. But if you're not up for that and would like something more passive, read LWN.net (and possibly subscribe!)


I learned a lot by trying to make Go talk to ALSA without using any existing C interfaces. Just happy exploration goes a long ways!


UTSL


Also from Glauber Costa, a thread-per-core framework using io_uring written in Rust[1] and discussed in HN[2].

[1]: https://github.com/DataDog/glommio [2]: https://news.ycombinator.com/item?id=24976533


Today I am grateful for the brilliant minds around the world that continually open up fundamentally revolutionary new ways to develop applications. To Jens, to Alexei, and to Glauber, and to all of their kindred and ilk, we raise a glass!


The title of the HN post is missing a suffix of "for a few niche applications".

My work is "programming in Linux", but it's not impacted by any of this since I'm working in a different area.

I'm sure this is important work, but maybe tone down such claims a bit.


"few niche applications" being any application that touches files, network or want to run code in the kernel. Sounds like a bigger target than just "niche", but I'm no Linux developer so what do I know.


eBPF has been available for a while now, one year ago there were even two books published about it, one by an author of the "bcc" mentioned in this ad disguised as a technical article. It didn't revolutionize programming in Linux that I'm aware of. It found its market in observability and performance analysis.

io_uring seems to be relevant mostly to people using or wanting to use AIO. Outside of a "few niche applications" this is unimportant for the majority of Linux developers. Libraries like ASIO would likely wrap it anyway since these low-level APIs are not pleasant to use.



GHC RTS integration already well in the works too :) http://wjwh.eu/posts/2020-07-26-haskell-iouring-manager.html


At SCO in the mid-90s we were playing with very similar ideas to boost DB performance. The main motivation was the same then as it is now, don't block and avoid making system calls into the kernel once up and running. Don't recall if any of the work made it into product.


eBPF is still a bit rough, but what you can do with it already is very cool.

It would be nice to see it at a higher level at the syscall interface, i.e. currently if I want to attach a probe I have to find the function myself or use a library, but it would be nice to have it understand ELF files.


One thing that I haven't been able to get is if this makes things like DPDK or user mode tcp stack unnecessary since the system call overhead is gone.


io_uring reduces but doesn't remove the system call overhead.

Only with in-kernel polling mode is it close to removed. But kernel polling mode has its own cost. If the system call overhead is nowhere close to being a bottleneck, i.e. you don't do system calls "that" much, e.g. because your endpoints take longer to complete, then using kernel polling mode can degrade overall system performance, and potentially increase power consumption and as such heat generation.

Besides that user mode tcp stacks can be more tailored for your use case which can increase performance.

So all in all I would say that it depends on your use case. For some it will make user mode tcp useless or at least not worth it but for others it doesn't.
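
For reference, the kernel-side polling mode discussed above is opt-in at ring setup time; here is a minimal sketch assuming liburing, with an arbitrary example idle timeout (not taken from the article):

    #include <liburing.h>
    #include <string.h>

    /* Ask the kernel to spawn an SQ polling thread: submissions can then be
     * picked up without a syscall, at the cost of a busy kernel thread. */
    int setup_sqpoll_ring(struct io_uring *ring)
    {
        struct io_uring_params p;

        memset(&p, 0, sizeof(p));
        p.flags = IORING_SETUP_SQPOLL;
        p.sq_thread_idle = 2000;   /* ms of idleness before the poller sleeps again */
        return io_uring_queue_init_params(256, ring, &p);
    }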


I am not sure how user-space TCP stacks or DPDK would get around the power consumption issues of kernel polling. In fact, the usage I am aware of pretty much involves polling in user mode because any context switch or O/S scheduling related overhead is excessive. The only thing you can do is to keep your task queue full so as to always be doing something.


io_uring does allow removing a lot of the syscall overhead, without polling. Many operations can be submitted with just one syscall, and ready completions can be consumed without a syscall at all.

Additionally, compared to using epoll/select/.. for network IO, one can just submit a send/recv, instead of patterns like recv -> EAGAIN, epoll, recv
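
To illustrate, a rough sketch of the batched pattern using liburing (the fds and buffer sizes are hypothetical, error handling omitted):

    #include <liburing.h>

    /* Sketch only: queue several recvs and submit them with one syscall. */
    void submit_recv_batch(struct io_uring *ring, int *fds, char (*bufs)[4096], int n)
    {
        for (int i = 0; i < n; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_recv(sqe, fds[i], bufs[i], sizeof(bufs[i]), 0);
            io_uring_sqe_set_data(sqe, (void *)(long)fds[i]);  /* tag with the fd */
        }
        io_uring_submit(ring);   /* one syscall for all n receives */
    }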


It does remove the syscall overhead, but since the IO itself is still performed by the kernel, the CPU will still need to switch regularly between user and kernel level. With a full user-level network stack and correct interrupt steering, the kernel need not be involved at all and the CPU can stay in userspace all the time.

Or you can run the kernel IO thread on another CPU, but that itself has overhead compared to performing IO and handling the data all in the same thread.


I see, so the extra kernel IO thread (which can be spin-waiting) is one extra busy core, and the latency of getting data from one core to the other is the additional overhead.


Are ready completions strictly determined by continuous polling if there's no system call involved? If lots of applications end up using this method, will it increase power consumption due to many processes actively idling until a new consumable shows up in the completion queue?


They said without polling.

If the queue completely empties, then a normal application will use a system call to go to sleep.

But as long as it's not empty, the application can keep receiving events with neither system calls nor active polling.

You'd only actively poll in very specialized/niche cases.
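
To make that concrete, a minimal sketch using liburing (the handler name is made up; a real event loop would also handle errors and resubmission):

    #include <liburing.h>

    extern void handle_event(void *tag, int res);   /* hypothetical application callback */

    /* Drain completions that are already there without any syscall; only when
     * the completion queue is empty do we make a syscall and sleep in the kernel. */
    void completion_loop(struct io_uring *ring)
    {
        struct io_uring_cqe *cqe;

        for (;;) {
            if (io_uring_peek_cqe(ring, &cqe) != 0) {      /* queue is empty */
                if (io_uring_wait_cqe(ring, &cqe) < 0)     /* syscall: block for work */
                    break;
            }
            handle_event(io_uring_cqe_get_data(cqe), cqe->res);
            io_uring_cqe_seen(ring, cqe);
        }
    }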


Thanks for explaining. I was confused by the "no system call on consumption" example in the blog post, but if it uses a system call after emptying the completion queue then that'll work just fine.


I'm genuinely curious; both of these changes seem to be exciting due to the ability for people to extend and implement specialized code/features using the kernel. Since the Linux kernel is GPLed (v2, I believe?), does this mean that the number of GPL requests related to products' operating systems is likely to increase, since groups using this extensibility will be writing code covered by the GPL which might actually be of value to other people? Or does the way io_uring and eBPF are implemented isolate the code in such a way that extensions built through their frameworks won't be affected by the GPL license?


I don’t know about io_uring, but for BPF programs only the kernel space needs to be licensed as GPLv2. Everything on the user space side is handled with system calls or higher level libraries that aren’t GPL licensed (libbpf).


io_uring is a data structure, not code. It's not Turing complete, so there is absolutely no way it would extend GPL virality from the kernel into userspace.

eBPF is code, and follows similar rules to kernel modules. That is, non-GPL-compatible eBPF code is allowed, but a subset of APIs (helpers, like module symbols) are only available to GPL-compatible eBPF programs.
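
As a concrete illustration, here is a minimal libbpf-style program (a sketch, not from the article); the license string is what gates access to GPL-only helpers such as bpf_trace_printk:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    char LICENSE[] SEC("license") = "GPL";   /* declare a GPL-compatible license */

    SEC("tracepoint/syscalls/sys_enter_openat")
    int trace_openat(void *ctx)
    {
        /* bpf_printk wraps bpf_trace_printk, a GPL-only helper; the kernel
         * refuses this program if the license above is not GPL-compatible. */
        bpf_printk("openat observed\n");
        return 0;
    }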


What seems to prevent any GPL issues with io_uring is the "linking" part of GPLv2. Covered here: https://www.gnu.org/licenses/gpl-faq.en.html#GPLStaticVsDyna...

Glibc being the entry point for the syscall, and glibc being LGPL is specifically why it's "okay". If you were to directly link an application to the kernel code, it would be viral.


Licenses only matter to the extent that the resulting product is a "derivative work" of the GPL code. If it's not derivative, then you have no copyright claim that requires the license to permit you to use it.

While the exact nature of when a software project is a "derivative work" of the libraries it depends on is still somewhat of an open legal question, I would be very surprised if anyone were to find that a computer application were a derivative of the OS it runs on. The typical understanding of the industry is essentially a process boundary, and the boundary a system call represents is closer to a process boundary than it is to a library call.


> The typical understanding of the industry is essentially a process boundary,

I agree that this is the typical thinking but I've always found it a little silly and arbitrary. It implies that if I write a GPL-licensed library and release it along with a thin wrapper program that gives it a command-line interface, say it does something like a complicated calculation which reads some data and outputs a single number; then someone could come along and write a program that would not work without it, say something that transforms another input format and then passes it to my calculation. As long as that program calls my "library" as a "program" (using "system()" for example) then they are not bound by the GPL, but if they link to my library and call the calculation directly, then all of a sudden they are?

This linking vs. process boundary thing always seemed like the wrong way to determine if a program is a derivative work of another. If someone writes a program that does not work without the GPL code, they should be bound by the GPL, regardless of whether it's linked, loaded into the same process, called through the command line, or over the wire.

This last one would obviously be controversial, but frankly a lot of companies do hide their use of open source code behind a REST API, and avoid adhering to any particular licenses that way, since they are not "distributing" the software.


That goes beyond what most people consider the GPL to cover. There are other licenses with stronger copylefts specifically to cover that last case -- notably, the AGPL.

I suspect trying to make the case that GPL's viral copyleft isn't limited to strictly linking but potentially any interaction with it would probably have a chilling effect on the use of GPL code, and this reinterpretation would only reinforce some people's prejudice against the GPL, a la Ballmer's "Linux is cancer" line.

Maybe it's the pragmatism in me, but I think it would have a net negative effect long term, unless it managed to flip all of the tables and convince everyone to use all GPL code, instead of making people reject copyleft wholesale.


> potentially any interaction with it

But that's not what I said. I said programs that do not work without some other program, is, in my opinion, a derivative work. I just don't see how the calling mechanism even plays into that judgement.

I do agree that there are other licenses such as the AGPL that try to cover these cases.

And arguably the online thing is a whole different ball of wax, because you can talk about software using a service, etc. It really is tricky in that case.

But I don't see the reason to distinguish between calling a function via the C stdcall mechanism, vs. "popen" and capturing stdout. It's exactly the same, logically; the only differences are details that imho should not matter for the legal case.

Right now, if I release a GPL library, what stops someone from coming along and writing a CLI program that just wraps every function with some textual interface, and including that with their closed-source program? The GPL becomes pretty toothless if it's bypassed so easily.


I'm under the impression that's what the Remote Network Interaction clause of the Affero GPL license is supposed to do. The "boundary" is then if someone is interacting with the AGPL code at all, so when you use the AGPL-licensed code behind a REST API, even if that's on someone else's server, the use of that code in producing any response to the API request requires publishing the AGPL'ed code/modifications.


> If someone writes a program that does not work without the GPL code, they should be bound by the GPL, regardless of whether it's linked, loaded into the same process, called through the command line, or over the wire.

Let's say I'm writing a refinery simulator to sell to people, and I use a GPL command line utility to do some particular calculation about flow rates.

Now I'm GPL just for outsourcing a single equation. But only because that's the only program around for doing that calculation. As soon as someone else reads a paper on the subject and makes an alternate program for that math, my program is no longer GPL?

Those consequences sound like a mess I don't want to deal with.


> Now I'm GPL just for outsourcing a single equation. But only because that's the only program around for doing that calculation. As soon as someone else reads a paper on the subject and makes an alternate program for that math, my program is no longer GPL?

I don't really see the problem. You are saying that if you change your dependency to a non-GPL program, then you are no longer GPL. The answer to your question is simply "yes".

We are not talking about patents here, but copyright. If someone comes up with an alternative implementation with a different license, you are perfectly free to start using it instead, what's the issue?


> if you change your dependency

Depends on what you mean by changing the dependency. Let me lay out the scenario in more detail.

The program is still exactly the same. It asks to be pointed at a fluid sim program, and then uses that for some of the math it needs.

When it was coded, the only dependency it could use was GPL.

Now there's a new non-GPL dependency it could be pointed at, with the same API.

Now it's possible to run the program without using any GPL code. Does that make the program no longer GPL, even though it didn't change?


I see your point now.

I'll answer with my own hypothetical. If I write a program that dynamically links a library performing the same GPL'd fluid sim calculations, it is presumably forced to be GPL, because it links to it. What if someone comes along and runs the program but at runtime uses LD_PRELOAD to override the dynamic linker, linking it to an alternative library that presents the same interface. Is the program still required to be GPL?

I don't really have an answer to your specific proposed loophole, it's pretty clever and is a very good question; but I don't think the calling mechanism is part of the issue. You could make the same argument whether you are talking about a "program" or a library. The calling convention is a meaningless detail imho.

I think you are specifically responding to my "does not work without" interpretation overly literally. Clearly if the program is written for and tested against a specific interface of a GPL'd program, it is intended to work with that program.

On the other hand if it's written to call into some kind of standard interface, it no longer requires that GPL program specifically, but could work with any program implementing that interface. And I will admit that whether a program is written only to work with a GPL program/library/whatever, or is more general, may be up to interpretation, what is considered "standard", etc., but that is exactly my point -- law is nuanced. If it were possible to codify laws perfectly with overly simple rules like "the copyright applies because it's a DLL and not a program", then we wouldn't need lawyers.

In law, intent is important. If I write a non-GPL program that depends on the functionality of a GPL library, I can go find all sorts of ways to not "link" to it but still use it, e.g., as a program, a service, etc. -- and it happens -- but the intent, which was to find a way to use GPL software without adhering to its license, is still quite clear.


> I'll answer with my own hypothetical. If I write a program that dynamically links a library performing the same GPL'd fluid sim calculations, it is presumably forced to be GPL, because it links to it. What if someone comes along and runs the program but at runtime uses LD_PRELOAD to override the dynamic linker, linking it to an alternative library that presents the same interface. Is the program still required to be GPL?

I've never believed that linking made your code necessarily GPL in the first place. I don't care what the FSF says, they're not exactly unbiased.

> I think you are specifically responding to my "does not work without" interpretation overly literally. Clearly if the program is written for and tested against a specific interface of a GPL'd program, it is intended to work with that program.

> On the other hand if it's written to call into some kind of standard interface, it no longer requires that GPL program specifically, but could work with any program implementing that interface.

Well that's basically how the standard already works. If your code is using a specialized enough interface, sharing data structures you got from the GPL code, then it's derivative of the GPL code and needs to follow the GPL.

So while "process boundary" is an inexact tool, your suggestion of "does not work without" doesn't seem significantly better to me.


Yeah, I think you make some great points and I'll give you that; you probably did show here why my idea is not correct. I don't know the right answer, I'm certainly no lawyer ;)

I just know that, to me, "dynamic linking" seems like an arbitrary and imprecise way to define "derivative work". And I'm not sure whether it's really something that _can_ be defined and possible to determine without considering it on a case-by-case basis. It's a good rule of thumb, perhaps, but doesn't strike me as either necessary or sufficient to really define it. We'll never really know, I guess, until someone makes that actual argument in court.


Perhaps I phrased something odd? We are saying the same thing from my perspective. By "prevents GPL issues", I'm saying the user space code wouldn't need to be GPL.


What I took away from your message is that you believe that using a syscall directly would cause your application to require GPL licensing, and that glibc being the code that calls the syscall is what prevents it normally.


Sorry, no. I was saying the opposite. I called out glibc to point out that the only linking (as in the linker) was linking to LGPL code.


FYI, the userspace portion (liburing) is dual licensed LGPL/MIT: https://github.com/axboe/liburing


> If you were to directly link an application to the kernel code, it would be viral.

That doesn't sound right, how is a io_submit syscall different from a read syscall? Obviously if you write a kernel module it links with the kernel, but just issuing a syscall shouldn't be considered linking, otherwise every single proprietary software that issues a raw syscall would be GPL-infringing.


We are saying the same thing. "Directly link" meaning dynamic or static library linking. That's where GPLv2 draws the line.


Stop using the term "viral". It's propaganda made up by microsoft.


I wish people would stop trying to work around the GPL. It causes immense heartache and someday, the kernel developers will revolt and destroy any avenue that you try to use. After all, that's what just happened with the NVIDIA GPU drivers with Linux 5.9.

Also, you're still pulling in parts of Linux into your code, so GPLv2 still applies.


I don't see how that would apply to an application using io_uring through syscalls. It's not linking to any GPLv2 code, and the header file (io_uring.h) is dual licensed as GPLv2 and MIT. Similarly, if you choose to use the higher level liburing, it's dual LGPL and MIT licensed. It's all very deliberately not viral for user-space applications.


That is not true for eBPF, though. That stuff works pretty much exclusively by instrumenting and manipulating Linux itself.

That said, libbpf is LGPLv2+, even though a lot of the stuff you'd pull in to use eBPF at the kernel level forces it to be GPLv2.


The 'workaround' will be for companies to use it for internal projects, and to decline to publish their work. The kernel is licensed under GPLv2 after all, not the AGPL.


Who added the two generic Covid paragraphs to the start of this otherwise good article? _Please_ stop.


Such an odd thing to open an article about IO tech with …


Phoronix showed that a recent bug fix in io_uring negated most of the gains when they profiled redis


How does it make Linux compare to Windows, OSX and *BSD?


The underlying principle of submitting a job to be done and letting the kernel do it (submitting a read() and waiting for it to complete), as opposed to the original model of waiting for the kernel when it's ready to let you do the job (waiting for an fd to become readable so you can call read() on it) is the same as the completion-based model of Windows async I/O (and I think BSD's kqueue too?) as opposed to the readiness-based model of epoll.

The part where this submission and completion information involves a ring buffer mapped to kernel space is unique to Linux, I believe.
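
For contrast, the readiness-based shape looks roughly like this (an epoll sketch under assumed, simplified conditions; the actual read() still happens in your code after the kernel says the fd is ready):

    #include <sys/epoll.h>
    #include <unistd.h>

    /* Readiness model: the kernel only tells you an fd is readable;
     * you still issue the read() syscall yourself afterwards. */
    void readiness_loop(int epfd, char *buf, size_t len)
    {
        struct epoll_event events[64];

        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);   /* wait for readiness */
            for (int i = 0; i < n; i++)
                read(events[i].data.fd, buf, len);      /* do the work ourselves */
        }
    }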


> The part where this submission and completion information involves a ring buffer mapped to kernel space is unique to Linux

RIO is very similar and predates Linux version by many years: https://docs.microsoft.com/en-us/previous-versions/windows/i...

Main downside, that thing is only for sockets.


And plain old overlapped I/O can lock pages in memory. With Direct I/O, devices can directly DMA into these buffers, so theoretically any device can provide similar functionality and use completion ports for notification.

(or polling, since the possible high "interrupt rate" bottleneck of completion notifications is one of the things that motivated RIO)


Mapping pages is relatively expensive. Not the mapping itself, but as a consequence of such an update the CPU has to flush at least a portion of the TLB cache. With overlapped I/O the kernel has to do that for every I/O request.

With RIO and now io_uring, kernels map buffers to both kernel and user address spaces just once on initial setup, and reuse the same buffer for many I/O operations.
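
liburing exposes this as registered buffers; a rough sketch (the fd, buffer size and error handling are all assumptions, not from the article):

    #include <liburing.h>
    #include <sys/uio.h>

    /* Register a buffer once, then reuse it for many reads without the
     * per-operation mapping/pinning cost. */
    void fixed_buffer_read(struct io_uring *ring, int fd)
    {
        static char buf[64 * 1024];
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };

        io_uring_register_buffers(ring, &iov, 1);   /* one-time setup */

        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_read_fixed(sqe, fd, buf, sizeof(buf), 0, 0);  /* buf_index 0 */
        io_uring_submit(ring);
    }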


You're right, I forgot about the cost of locking pages for every overlapped I/O.


The batch system call part is not so hard on its own: https://github.com/c-blake/batch


The io_uring interface is designed to allow encoding arbitrary syscalls, although it seems that only a handful are actually supported


It's under active development, more syscalls keep getting added. Contributions are welcome.


Well, it was not hard for me to add batching for every syscall. The whole module is under 200 lines of C. Granted, I only did x86_64.

I think it has some bearing to those using eBPF to just batch calls, too. Unless I am missing something, I do not think there needs to be any super-user/root/capability restriction on syscall batching since all the syscalls check permission "on the inside". That gives it maybe more scope for applications.

That sys_batch is kind of a tiny "jump-forward-only" assembly language where you can use the output of prior calls in later ones. The jump forward only (no loops) I do should also guarantee termination { at least conditioned upon all syscalls terminating...but that's a whole other domain ;-) }. (EDIT: IIRC, the article that this conversation is about was excited about this aspect. In my examples/ I have an "mmap a whole file in one syscall" example.)


I don't personally find such naive syscall batching flexible enough to be all that useful.

No ability to do such common things as concatenate strings from syscall results, or branch according to stat() output for example, makes for a severely limited interface.

io_uring is already in mainline, and it delivers syscall batching as a side-effect while bringing async to the table. I just don't see the point in adding another, severely limited syscall batching thingy, certainly not now, even moreso with talk about ebpf logic joining the party.

BTW as mentioned in a sibling comment, you might want to check out mingo's syslets proposition, which has similar naive batching, but was also async.

https://lwn.net/Articles/221887/


It can branch based on the syscall return value, and it can copy word-values like file descriptors from an open to an fstat and so on. That copying could perhaps be extended to strings in some limited way, but I agree it is generally much more limited.

Not needing superuser/any special capability is also nice, though. Not sure of the current status/plans, but I am pretty sure eBPF needed root for a very long time.

Anyway, I was not trying to "compete" with you or try to "get into mainline". A module works fine for me. Was just exhibiting an easy possibility about some points discussed.

EDIT: and thanks for the pointer. I will check it out.

EDIT2: and much like the word copy is a fake syscall, other fake syscalls like a "value test" could be added to forge a sort of if condition jump forward thing. My little repo there is more a proof of concept than anything else.


> I was not trying to "compete" with you

It's not like I have a dog in this race, I'm just another consumer of these kernel interfaces...

But I am happy something has finally landed upstream that we can start writing generic userspace programs against and actually expect them to work on distro kernels in the future. But we probably still need a compatibility layer for emulating it in userspace; that looks feasible.


Ah. "Contributions are welcome" made me think of you as one of the welcomers.

Compatibility-layer-wise, I did actually do that for my batch system. In the tiny user-space entry point I check if sys_batch is working and if not I fall back to just a loop of userspace making syscalls. That also checks a BATCH_EMUL environment variable to force that emulation mode for benchmarking purposes { so I don't have to unload/reload the module. :-) }

So, user code would always just work, but work faster on kernels with the module loaded.


I've just been following the io_uring mailing list as of late. It appears to be a welcoming environment to outside contribution, assuming quality and relevance of course.


Interesting. This is from 2007. What happened with the idea?


I'd love a TL;DR explanation for why a blanket interface is not possible. I can guess that there are different ways that syscalls handle parameters and there are different families of syscall behaviours, but I'm not sure if that's the reason. I'd love a quick intro and pointers to more details.


While I won't be writing a TL;DR summary, this isn't exactly unexplored territory, and it's been documented by lwn:

https://lwn.net/Articles/316806/

https://lwn.net/Articles/221887/

https://lwn.net/Articles/219954/



Those articles (and syslets) all seem to have this strong async focus. That could simply reflect pengaru's bias/interest. So, I do not know for sure, but I suspect the TL;DR is "async & scheduling in the mix makes it hard to get right". That complexity may also relate to the missing syscalls.

My approach has both the virtue and curse of being too simple to worry about all that, but it _does_ remove basic syscall overhead.


You can look at my code in the link I gave. It's pretty short and should work on kernels 3.x series to 5.x series. { EDIT: I realize this may make you ask your question even more strongly. :-) }


I wouldn't be doing my job if I failed to mention that both Alexei (eBPF) and Jens (io_uring, block) work at Facebook. Beyond them, we've got a bunch of folks working on the primitives as well as low-level userspace libraries [0] that enable us to use all of this stuff in production, so, by the time you're seeing it, we've demonstrated that it works well for all of Facebook's load balancers, container systems, etc.

[0] https://github.com/facebook/folly/blob/16d6394130b0961f6d688...


Are I/O libraries like Tokio for Rust using io_uring?


Some are, others will, and others won't.

One problem with io_uring is that it's completion-based I/O, where you move ownership of a buffer to the kernel, which then writes to it until the operation completes.

This means you might not be able to (sync) cancel an operation occurring in the background.

This makes it harder to integrate into some I/O libraries, as the previous fact conflicts with RAII patterns.

Another thing making adoption harder is that the interfaces for reading/writing with io_uring are conceptually slightly different.

Because of this, e.g. Tokio hasn't switched to io_uring yet but still uses readiness-based async I/O as far as I know. (Which doesn't mean it won't support it in the future.)

This issue might be relevant (mio is internally used by tokio for async I/O): https://github.com/tokio-rs/mio/issues/923


The glommio library from DataDog was specifically built around io_uring: https://www.datadoghq.com/blog/engineering/introducing-glomm...


There's the ringbahn[0] project, which is a couple of layers wrapping the liburing library; it seems like it's presenting itself as an orthogonal runtime to tokio. Meanwhile, looks like tokio is looking into the feasibility of using io_uring[1], though I'm not sure if it would be using ringbahn or not.

[0]: https://github.com/ringbahn

[1]: https://github.com/tokio-rs/tokio/issues/2411


Apart from tokio (which the sibling comment covers), you can use rio [1] which specifically uses io_uring, and can be used as a Future [2] so that you can use it as part of the wider Future ecosystem, with tokio or any other executor.

[1]: https://crates.io/crates/rio [2]: https://docs.rs/rio/0.9.4/rio/struct.Completion.html#impl-Fu...


Looks like it's on the roadmap: https://github.com/tokio-rs/tokio/issues/2411


Is this similar to XNU's Mach messaging?


So at line 12, it's a macro for a loop, right? Or am I missing something? https://gist.github.com/PeterCorless/f83c09cc62ccd60e595e4eb...


  #define io_uring_for_each_cqe(ring, head, cqe)                              \
          /*                                                                  \
           * io_uring_smp_load_acquire() enforces the order of tail           \
           * and CQE reads.                                                   \
           */                                                                 \
          for (head = *(ring)->cq.khead;                                      \
               (cqe = (head != io_uring_smp_load_acquire((ring)->cq.ktail) ?  \
                       &(ring)->cq.cqes[head & (*(ring)->cq.kring_mask)] : NULL)); \
               head++)
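
Yes, it expands to a for loop walking the completion ring. Typical usage looks roughly like this (a sketch; the handler is hypothetical):

    #include <liburing.h>

    extern void handle(int res, void *tag);   /* hypothetical application handler */

    /* Walk all currently-available completions, then mark them consumed. */
    void drain(struct io_uring *ring)
    {
        struct io_uring_cqe *cqe;
        unsigned head, seen = 0;

        io_uring_for_each_cqe(ring, head, cqe) {
            handle(cqe->res, io_uring_cqe_get_data(cqe));
            seen++;
        }
        io_uring_cq_advance(ring, seen);
    }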


The author states:

>"It’s beyond our scope to explain why, but this readiness mechanism really works only for network sockets and pipes — to the point that epoll() doesn’t even accept storage files."

Could someone say here explain why this readiness mechanism really works only for network sockets and pipes and not for disk?


I suppose this will help for the big corporate users of linux. And I suppose that's where most of the programming gets done for linux. But the rate of change and feature adoption by the big commercial pushers of linux has made linux as a desktop more troublesome due to the constant futureshock.


Io_uring will be big under the hood for frameworks and/or programming languages too. But yes, adoption will take a while..


Given it uses a queue that has a producer and a consumer, I wonder if a monitor will be required?


I think that either futexes or polling can be used for notification.


Can you elaborate? What do you mean by monitor?


I believe they are referring to the synchronization primitive [1], conceptually a condition variable plus a mutex.

[1] https://en.wikipedia.org/wiki/Monitor_(synchronization)


So asynchronous message passing is faster than syscalls? Andy Tanenbaum laughs last.


Thanks. Now I understand what netdata ebpf.plugin process is doing.


So, after all async io converges on io completion port design?


hopefully this will bubble up to higher-level C-esque languages such as PHP, for which asynchronicity is still a pain


This won't help the lack of async I/O in those languages that do not support the concept as a whole. If it can't handle epoll, it certainly won't handle io_uring.


> Joyful things like the introduction of the automobile, which forever changed the landscape of cities around the world.

What?!


The author of the black swan book explained that the covid pandemic was not what he meant by a black swan event. Because it was not something entirely unpredictable.. if we look back, we have been talking about pandemics for decades.


> Because it was not something entirely unpredictable..

To the point that as part of the Obama-Trump transition, a literal playbook was created for pandemics, with Coronaviruses (MERS-COV, SARS) explicitly mentioned:

* https://assets.documentcloud.org/documents/6819268/Pandemic-...

They had tabletop exercise on pandemics:

* https://www.politico.com/news/2020/03/16/trump-inauguration-...


That someone would completely disband the team and then ignore the knowledge acquired was kind of unpredictable. I mean, what kind of person would do that?!


People who see government not as a way to (hopefully) better society through good implementations of good policies, but rather see government as source of power to wield for their own benefit.

> Friends of the government win state contracts at high prices and borrow on easy terms from the central bank. Those on the inside grow rich by favoritism; those on the outside suffer from the general deterioration of the economy. As one shrewd observer told me on a recent visit [to Hungary], “The benefit of controlling a modern state is less the power to persecute the innocent, more the power to protect the guilty.”

* https://www.theatlantic.com/magazine/archive/2017/03/how-to-...

* https://archive.is/ZIzCm


Maybe, but I fail to see how this relates to Linux kernel (wrong thread, I suppose)


Fourth paragraph of the article brings it up.


[warning: slight offtopic]

TLDR: Any recommendations on the best way to clone one harddrive to another that doesn't take forever?

> Storage I/O gained an asynchronous interface tailored-fit to work with the kind of applications that really needed it at the moment and nothing else.

Say you have 2x 2TB SSD harddrives and one needs to be cloned to the other.

Being the clever hacker I am who grew up using linux, I simply tried unmounting the drives and trying the usual `dd` approach (using macOS). The problem: it took >20hrs for a direct duplication of the disk. The other problem: this was legal evidence from my spouse's work on a harddisk provided by police, so I assumed this was the best approach. Ultimately she had to give it in late because of my genius idea which I told her wouldn't take long.

Given a time constraint the next time this happened, we gave up on `dd` and did the old mounted-disk copy/paste via Finder approach... which took only 3hrs to get 1.2TB of files across into the other HD - via usb-c interfaces.

I've been speculating why one was 5x+ faster than the other (besides the fact `dd` does a bit-by-bit copy of the filesystem). My initial suspicion was the options provided to `dd`:

> sudo dd if=/dev/rdisk2 of=/dev/rdisk3 bs=1m conv=noerror,sync

I'm not 100% familiar with the options for `dd` but I do remember a time when changing `bs=1M` to `bs=8M` helped speed up a transfer.

But I didn't do it for the sake of following the instructions on StackOverflow.


I wrote some Python glue that constructed a bunch of dd commands to run concurrently which helped when I last cloned a 1TB NVMe drive. Resulting commands:

    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=0 seek=0 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=1302349 seek=1302349 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=2604698 seek=2604698 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=3907047 seek=3907047 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=5209396 seek=5209396 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=6511745 seek=6511745 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=7814094 seek=7814094 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=9116443 seek=9116443 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=10418792 seek=10418792 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=11721141 seek=11721141 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=13023490 seek=13023490 count=1302349 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=65536 skip=14325839 seek=14325839 count=1302358 status=progress &
    dd if=/dev/nvd0 of=/dev/nvd1 bs=4096 skip=15628197 seek=15628197 count=6 status=progress &


Probably faster to copy via finder since it’s not actually copying every byte, but only the utilized bytes.

It might be faster to have multiple rsync operations (via xargs or the like), but if the disk is relatively empty I can see this being faster. Finding the right level of parallelism isn’t something I can help you with, probably needs some experimentation.


[flagged]


Please read the site guidelines and follow them: https://news.ycombinator.com/newsguidelines.html. Note this: "Eschew flamebait. Don't introduce flamewar topics unless you have something genuinely new to say. Avoid unrelated controversies and generic tangents."

We detached this subthread from https://news.ycombinator.com/item?id=25222895.


I despise Facebook, but this is a horrible take. They’re not actively in the business of genocide.


Even if it's through inaction, they're still liable.


Unless you're Rohingya


Myanmar?


Is that an accusation that eBPF & io_uring are somehow tools of genocide? That seems a stretch. At some point, good work is good work.

It isn't like Alexei & Jens working for a different company and implementing the same features would stop people using them. As TFA mentions, these are likely to be hot, high demand nuggets for many companies and developers.


You're stretching big with the implication that they're tools of genocide, since that's not really how his comment (however snarky) reads. Good work does not remove it from a morally messed up situation.

Facebook has caused serious issues with society and it's not cool to just gloss over this any time their contribution to technology comes up.


I understand your feelings but it's not fair to blame developers for what the management is doing.


If you choose to give your labor to a company, you tacitly approve of what the company does with the fruits of that labor.


> you tacitly approve

This view seems quite reductionist. It smells like guilt by association. It excludes the middle ground where you do not approve but ignore that due to other priorities (such as money or excitement about technology you get to work on). Approval is not a binary choice. It also is subject to tradeoffs.

So all you can derive from someone working at facebook is that their preference against indirect contribution (by several hops) to genocide is weaker than the combination of some other preferences. This may also be due to how they're discounting that responsibility distance.


Frankly, I don't understand this logic. This would mean all workers approve what their companies are doing. There are many warehouse workers at Amazon who hate several aspects of their company. Nevertheless, they still work there as they need to feed their families. That's one point.

Second, Facebook is not a tobacco or gun manufacturing company. Their main business is advertising, and they made several grave errors on the way, and they continue to do certain wrong things, but as much as you can hate them you can't morally equate any communication platform with planned genocide. It's shifting responsibility from actual people who commit atrocities to a platform that makes it easier for them to communicate. Yes, FB is easier to blame, but this approach misses the main point.


Maybe if the management changed the company's direction for the worse after the developers were hired - which seems to have happened at Google, but not at Facebook. Nevertheless, I think Facebook contributing to infrastructure for everyone is a net positive due to the large amount of outside users vs. the usefulness for Facebook.

Personally, I have a few things I won't do, which are different from what other people won't do. In a certain "Overton Window", I don't argue about it.


respectfully disagree.


[flagged]


Personal attacks and flamebait are not allowed here. Please read https://news.ycombinator.com/newsguidelines.html and stick to the rules.

We detached this comment from https://news.ycombinator.com/item?id=25222895.


BSD folks are yawning right now.


Why?


kqueue on FreeBSD is effectively like io_uring but has existed for much longer (and Windows I/O Completion Ports predate kqueue). kqueue also gives you a way to get system events which you can't get on Linux (it has an equivalent of cn_proc that isn't awful).


Huh? Kqueue is nice, but I don't think you'll get anywhere near the performance of io_uring out of kqueue:

- kqueue only tells you there's data to read (/space in the write buffer). You still need to call read() or write(), including paying the cost of the syscall. io_uring lets you batch a lot of read/write calls together and either issue a single syscall to the kernel for all calls, or have the kernel poll and never syscall at all.

- kqueue doesn't let you issue fsync, or any of the other syscalls now in io_uring. fsync is essential on the write path for correctness in lots of cases, and for that you still need to dispatch to a local thread pool or something.

So yeah, I prefer kqueue over linux's epoll. But io_uring seems like the new king.


As the other commenter pointed out, I was wrong -- I'd mixed up the correspondence between kqueue and epoll with io_uring. Yeah, kqueue doesn't allow asynchronous operations. Sadly I can't delete or edit my comment.


I would argue the api of kqueue is clunkier and less general.

io_uring can keep track of events by using eventfds, but yes, that is perhaps not optimal.


Yes, sorry I muddled things up (and now the edit window has elapsed). io_uring isn't the same as kqueue because kqueue doesn't permit asynchronous notification of job completion, nor can you do more complicated chained operations; epoll is the Linux equivalent of kqueue (and in that comparison, kqueue is better).
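
On the chained-operations point, io_uring lets you link submissions so the next one only runs if the previous one succeeds; roughly like this (a liburing sketch, error handling omitted):

    #include <liburing.h>

    /* Queue a write followed by an fdatasync that only runs if the write
     * succeeds, and submit both with a single syscall. */
    void write_then_sync(struct io_uring *ring, int fd, const void *buf, unsigned len)
    {
        struct io_uring_sqe *sqe;

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_write(sqe, fd, buf, len, 0);
        io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);   /* chain to the next SQE */

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_fsync(sqe, fd, IORING_FSYNC_DATASYNC);

        io_uring_submit(ring);
    }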


wasm in the kernel when?



Pity it's based on commonwa instead of wasi


Perhaps you would enjoy the sequel, then: https://github.com/wasmerio/wasmer


Never. eBPF is a good interface for running code in the kernel because of how restricted it is. Before loading an eBPF module, the kernel does a lot of verification on it, including proving that it terminates. This is famously rather hard for Turing-complete languages. (Although in this case it could be solved by giving programs a fixed time budget per invocation.)


Guaranteed termination isn't the only analysis worth having, especially since if your aim is termination (as opposed to leveraging that simplicity for other analysis) it's much easier to simply impose termination and be done with it.

So, WASM isn't entirely unconstrained, and those constraints do allow some interesting analyses that true native code cannot support, e.g. symbolic execution: https://blog.trailofbits.com/2020/01/31/symbolically-executi...

By its very design it's aimed at running untrusted, even malicious code without exposing the host to security risks, while allowing the imposition of resource constraints. It doesn't sound completely absurd to me (not an expert in this) that some project might seek to use those safety guarantees as a way to avoid paying for the more heavyweight guarantees provided by the kernelspace/userspace split.

A quick google finds lots of people trying to get this to work; who knows - it might pay off.


eBPF is already it; apparently people keep forgetting that WASM isn't the first, nor is it going to be the last, polyglot bytecode format.


I haven't seen anyone make that claim. WASM's value has always been in its pragmatic integration plan using what is already there.


Then you haven't been paying attention to the "rewrite in WASM" advocacy force and how WASM "has invented" polyglot bytecode.


I've been paying pretty close attention and read a lot of articles and discussions, starting when it was being created and I have never seen anything about either of those. No one actually writes WASM directly, so no one says to "rewrite in WASM". It is only an optimization, so the two uses are to speed up javascript and to compile native languages.

No one even says "polyglot bytecode", let alone says that WASM 'invented it'. Most programmers have already used something that uses a bytecode format, why would anyone say that?


Hopefully never


Interesting. I was just surprised as this:

> Joyful things like the introduction of the automobile

Cars cause so much pollution, noise, traffic, and take up so much space... How can you say its introduction is joyful?

About the new api: while I’m not very knowledgeable about the kernel, it seems like very good news for performance, the improvements are drastic!


> How can you say it’s introduction is joyful?

Sure, if you want to sidestep the innumerable ways the automobile, or more accurately the internal combustion engine, has completely revolutionised society, you could make an argument...

Except you can’t. Think about the increase in distance and speed at which goods and services can be rendered compared to prior modes of transportation. One simple example that comes to mind is the ambulance, in which that increase may well be the difference between life or death for an unknown but surely enormous population. A similar argument can be made for logistic supply chains delivering medicine, food, sanitary products, waste removal, without which mortality would surge. Plague, famine, disease are some of the largest killers in human history.

Please tell me again how the absolute gains measured in human lifetimes are anything other than joyful, outside of subjectivity, which is an unresolvable debate.


> Cars cause so much pollution, noise, traffic, and take up so much space... How can you say its introduction is joyful?

Horses generated literal tons of pollution in cities, were complicated to look after, and slow. The arrival of the car cleaned up the streets, made it easier to travel (and travel far) and very rapidly became the preferred mode of transport all over the world.


No it will not. Two rather specialized tools to help with rather specific issues are no reason to throw out heaps and mounds of existing and perfectly working code and solutions.


What excites me is if you're working down at the high frequency, low latency I/O domain... linux traditionally has actually sort of sucked.

We're _very_ good at buffering things up and handling large chunks to gain high throughput. We have layers and layers of magic that make us _very_ good at that.

For cases where you have lots of devices throwing small chunks at high frequency and you have to respond (in this one, do a small computation, and out that one...) we've sucked bad.

That's why bare metal RTOS's still exist.

The io_uring / eBPF combo looks _very_ promising for opening up that domain in a tidy fashion.

I also hope there will be no reason to throw away "heaps and mounds of existing and perfectly working code and solutions".

I hope to replace my finely tuned epoll / read/write reactor pattern inner loop with io_uring / eBPF and leave 99% of the code untouched... just better throughput / lower latencies.


> I hope to replace my finely tuned epoll / read/write reactor pattern inner loop with io_uring / eBPF and leave 99% of the code untouched... just better throughput / lower latencies.

I hope to do an update of Asio at some point and it will start to use io_uring automatically as a backend


io_uring is no more specialized than an SSD is today; our storage interfaces went from handling between 1 and 4 requests in parallel for the past 50 years to suddenly handling literally hundreds on consumer-grade hardware (IIRC NVMe is specced either for up to 64k queue depth or unlimited queue depth). Of course the software environment must change to keep up; is it really reasonable to continue feeding such devices one IO at a time because that's how it was done in 1970?

It is not fun to program against io_uring just as BPF can be a nightmare, but that's a problem for userspace to solve with libraries and better abstractions, much in the same way userspace usually doesn't have to deal with parsing the shared ELF segment exported by the kernel (glibc does that) or writing complex routines for fetching the system time (the ELF segment does that), both of which are implementation details necessary for extracting the full performance from modern hardware.

We'll catch up eventually, but first the lower-level interfaces must exist. In the meantime, I cringe Every. Single. Time. I run UNIX find or du over my home directory, realizing it could have completed in a fraction of the time for almost 10 years now if only our traditional software environment was awakened to the reality of the hardware it has long since run on.
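
To give a flavour of what that could look like, stat() calls during a tree walk could be issued in bulk through the ring instead of one blocking syscall per file. A liburing sketch (paths and counts are hypothetical; assumes kernel-side statx support):

    #include <liburing.h>
    #include <fcntl.h>
    #include <linux/stat.h>

    /* Submit one statx per path as a single batch, then reap the completions. */
    void stat_many(struct io_uring *ring, const char **paths, struct statx *out, int n)
    {
        for (int i = 0; i < n; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            io_uring_prep_statx(sqe, AT_FDCWD, paths[i], 0, STATX_BASIC_STATS, &out[i]);
        }
        io_uring_submit(ring);   /* one syscall for n stat requests */

        for (int i = 0; i < n; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            io_uring_cqe_seen(ring, cqe);
        }
    }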


Well, at least in the near future I don't think you will see it on any other UNIX than Linux, as, unlike aio, it is not a standard but a Linux-specific API.


BPF and io_uring architectural styles both have heritage in and close equivalents at least on BSD. You could consider io_uring as nothing but a slightly fancier cousin of FreeBSD's netmap that exists for exactly the same reason, they just wear different shades of lipstick. BPF of course came directly from BSD into Linux

Any inevitable standard will likely come from userspace, much in the same way libpcap successfully papered over raw packet capture for a huge variety of operating systems. The underlying kernel interface is basically just details. We're decades past the point where application portability required standards like POSIX to make progress, I imagine comparatively few modern programmers even know much about those kinds of standards any more.


Huh? io_uring from what I've seen is mostly about direct disk async IO, which some databases like (and I prefer databases that mmap :P), while netmap is kernel bypass for networking – "mmap" for NIC ring buffers – without monster frameworks like DPDK, with a universal API that keeps all NIC specifics in the kernel drivers.


Cousins not twins.. io_uring with O_DIRECT to a block device is already architecturally equivalent to netmap with a network device. Its kernel-side internals have been kept sufficiently generic that we might eventually even see functional equivalence (but if Axboe is listening, please give us getdents64() via uring first!).


Hopefully you are right and software will not begin to lean too much on direct io_uring instead of some kind of wrapper. Thanks for the comment, it was pretty educational.


I expect these technologies to be integrated into language runtimes and webservers such that most developers won't even know it being used similar to EPOLL today. While revolutionary is an extreme characterization the performance and improved API changes are at least non trivial.


There's a very niche conference called eBPF Summit* that has presentations from people at companies with first class engineering orgs talking about what they are doing with eBPF. The problems they are solving and the breadth of problems being solved are very impressive.

[*] - https://ebpf.io/summit-2020/


yep, at the Linux Audio Conference we have a paper for using eBPF to process network audio packets this year for instance (https://lac2020.sciencesconf.org/program / paper is https://lac2020.sciencesconf.org/307835) - presentation is tomorrow (happens online)


This conference was more a Cilium conference. Google will use cilium as the default cni on gke https://news.ycombinator.com/item?id=24212021

Not speaking was Kinvolk’s CTO Alban Crequy https://news.ycombinator.com/item?id=23042722


There are tons of applications that are built upon higher-level APIs that can be updated to take advantage of this automatically. Furthermore you don't need to have every single application updated to revolutionize something. The apps that need this will be quite happy to make the jump.


You are right. For most programs, these two are more work for not much payoff.

Though they are great for some things.


Neither ebpf or io_uring are "specialized", in fact they are specifically not specialized which is what makes them revolutionary (along with being better than currently specialized apis like aio).


Project Loom is gonna make Java threads automatically use io_uring and restartable sequences. Btw Netty is currently actively working on io_uring support, which could enable truly non-blocking sockets, which could enable truly asynchronous connections to PostgreSQL through JDBC, which would enable state of the art Spring performance on the TechEmpower benchmarks


What do you mean "truly non blocking sockets"? There's API for non-blocking network sockets in Java since forever. You just have to explicitly use it. I think you mean using non-blocking sockets even for code which explicitly uses blocking sockets.



