If they've gone that far, they may as well implement QNX messaging, which is known to work well. QNX has an entire POSIX implementation built on its messaging system, so the approach is proven in practice. Plus it does hard real time.
The basic primitives work like a subroutine call. There's MsgSend (send and wait for reply), MsgReceive (wait for a request), and MsgReply (reply to a request). There's also MsgSendPulse (send a message, no reply, no wait) but it's seldom used. Messages are just arrays of bytes; the messaging system has no interest in content. Receivers can tell the process ID of the sender, so they can do security checks. All I/O is done through this mechanism; when you call "write()", the library does a MsgSend.
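The send/receive/reply rendezvous can be sketched in a few lines. Below is a toy Python emulation of the semantics only; the `Channel` class and its method names are illustrative, not the actual QNX C API (which uses MsgSend/MsgReceive/MsgReply on connection and channel IDs):

```python
import queue
import threading

class Channel:
    """Toy emulation of QNX-style synchronous messaging.

    A send blocks until the receiver has both taken the request
    and posted a reply, mirroring MsgSend/MsgReceive/MsgReply.
    """
    def __init__(self):
        self._requests = queue.Queue()

    def send(self, msg):
        # Like MsgSend: enqueue the request with a private reply slot,
        # then block until the receiver replies.
        reply_slot = queue.Queue(maxsize=1)
        self._requests.put((msg, reply_slot))
        return reply_slot.get()        # blocks until reply()

    def receive(self):
        # Like MsgReceive: block until a request arrives.
        return self._requests.get()    # returns (msg, reply_slot)

    @staticmethod
    def reply(reply_slot, msg):
        # Like MsgReply: unblock the waiting sender.
        reply_slot.put(msg)

chan = Channel()

def server():
    msg, rcvid = chan.receive()
    Channel.reply(rcvid, msg.upper())  # echo back, uppercased

threading.Thread(target=server, daemon=True).start()
print(chan.send("write() becomes a MsgSend"))  # prints the uppercased echo
```

The point of the model is that the sender never proceeds on its own; it is unblocked only by an explicit reply, which is what makes the error and cancellation cases tractable.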
Services can give their endpoint a pathname, so callers can find them.
The call/reply approach makes the hard cases work right. If the receiver isn't there or has exited, the sender gets an error return. There's a timeout mechanism for sending; in QNX, anything that blocks can have a timeout. If a sender exits while waiting for a reply, that doesn't hurt the receiver. So the "cancellation" problem is solved. If you want to do something else in a process while waiting for a reply, you can use more threads in the sender. On the receive side, you can have multiple threads taking requests via MsgReceive, handling the requests, and replying via MsgReply, so the system scales.
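The timeout behavior is easy to model: the sender's blocking wait simply has a deadline, and expiry becomes an error return rather than a hang. A minimal Python sketch of that idea (names are illustrative, not QNX's API):

```python
import queue

# The reply "slot" is a one-entry queue. Nobody ever replies here,
# so the send fails cleanly after the deadline instead of hanging,
# mirroring QNX's rule that anything that blocks can time out.
reply_slot = queue.Queue(maxsize=1)

def send_with_timeout(slot, timeout):
    try:
        return slot.get(timeout=timeout)  # block, but only this long
    except queue.Empty:
        return None                       # error return, like a timeout status

result = send_with_timeout(reply_slot, timeout=0.05)
print(result)  # None: the receiver never replied
```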
CPU scheduling is integrated with messaging. On a MsgSend, CPU control is usually transferred from sender to receiver immediately, without a pass through the scheduler. The sending thread blocks and the receiving thread unblocks.
With unidirectional messaging (Mach, etc.) and async systems, it's usually necessary to build some protocol on top of messaging to handle errors. It's easy to get stall situations. ("He didn't call back! He said he'd call back! He promised he'd call back!") There's also a scheduling problem - A sends to B but doesn't block, B unblocks, A waits on a pipe/queue for B and blocks, B sends to A and doesn't block, A unblocks. This usually results in several trips through the scheduler and bad scheduling behavior when there's heavy traffic.
There's years (decades, even) of success behind QNX messaging, yet people keep re-inventing the wheel and coming up with inferior designs.
Synchronous Interprocess Messaging Project for LINUX (SIMPL) is a free and open-source project that brings QNX-style synchronous message passing to Linux. It's a user-space library that uses techniques like shared memory and Unix pipes to implement the SendMssg/ReceiveMssg/ReplyMssg inter-process messaging primitives.
QNX itself implements pipes via messaging.
Can't immediately find any more information about it though, so don't know the maturity.
Having done a fair amount of IPC through shared memory, you'll have to explain this one. One process crashing doesn't destroy a memory mapped file on Linux, OS X, or Windows.
What's a good open alternative, or starting point for new innovation, given current investments in microservice architecture?
- ZeroMQ (and nanomsg)
- gRPC (Google)
- Apache Thrift (Facebook)
- Finagle (Twitter)
- L4/seL4 IPC
Not necessarily. QNX messaging is old enough that patents related to the interface may have expired.
"An asynchronous message passing mechanism that allows for multiple messages to be batched for delivery between processes, while allowing for full memory protection during data transfers and a lockless mechanism for speeding up queue operation and queuing and delivering messages simultaneously."
Maybe there are older patents that have expired?
Does this run into security corner cases around pid reuse? (Or race conditions like a process dropping its privileges after sending?) I remember the kdbus authors talking about making a lot of security metadata attach to the message itself, rather than indirecting through a pid, maybe for these reasons?
So QNX messaging is implemented in kernel space?
I never really understood why kdbus was rejected from Linux, it seems to only have advantages compared to a user space message bus. The only disadvantage I can come up with is security.
How do they implement this securely? I can't immediately think of a POSIX-y way for Process A to prove its pid to Process B without involving the kernel.
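For what it's worth, on Linux the kernel does get involved: a Unix-domain socket carries peer credentials that the kernel fills in itself, readable via SO_PEERCRED, so the peer can't forge them. A minimal sketch (Linux-only; both ends live in one process here, so the reported pid is our own):

```python
import os
import socket
import struct

# SO_PEERCRED returns a struct ucred {pid_t pid; uid_t uid; gid_t gid;},
# filled in by the kernel at connect/socketpair time, so the peer
# cannot forge it. On Linux all three fields are 32-bit ints.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

creds = parent.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize("3i"))
pid, uid, gid = struct.unpack("3i", creds)

# Both endpoints belong to this process, so the pid matches our own.
print(pid == os.getpid(), uid == os.getuid())
```

Note that SO_PEERCRED captures credentials at connection time, so the pid-reuse and privilege-drop races mentioned above still apply; SCM_CREDENTIALS ancillary messages capture them per-message instead.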
DBus is the one part of the modern Linux desktop I have to install to get the applications I want running, even though I dislike it a lot (PulseAudio and systemd one can simply not install). One example is the password-remembering function of Steam. Having a more reasonable implementation could help with this a lot.
There is a reason there is no "shared bus" in Internet communications.
Yes, but dbus isn't _for_ Internet communications. It was designed to wire together the multiple processes that act more-or-less as a whole to implement a desktop environment.
"Better" is contextual. The main problems dbus solves aren't "IPC" at all - they are things like lifecycle tracking, service discovery, and getting events across the system/user-session security boundary.
dbus-broker looks interesting!
Yeah, god forbid anyone attempts to unify similar concerns and do away with the mess of ad-hoc solutions that is POSIX/Linux.
Personally I think the Windows messaging system would actually be a pretty good model to follow, especially if you could give it an actual payload and not just two words. It would certainly solve the actual problems DBus was built to address - media change notifications and things like that.
ZeroMQ is getting used more for those kinds of purposes; the Greenbank Telescope uses it for one of their instrument backends and we are now using it for VLITE and REALfast. The new archive system I'm helping build uses AMQP.
I have been joking about how we should build a "total sh*t array" out of old Dish/DirecTV antennas, so that we could explore the systems design without worrying too much about whether anything could be done with the data collected. This hasn't interested my coworkers that much :) There is an amateur radio astronomy society, and there are plans for how to build various levels of radio telescope, starting from ~$50 and an old Dish receiver and going up. And our open-skies policy means that you do not have to be a professional astronomer to use our instruments, although I only know of one or two amateurs that have proposed for time. (They did get it, though).
As a rough back-of-the-envelope figure, we allow anybody who gets time to access about 25 MB/s of data. The correlator we currently have (WIDAR) can certainly output much more than that, up to gigabytes per second, but the rest of the infrastructure can't keep up at that rate sustained. It's not unusual for an observation to top a few TB in size. ALMA data files are probably even larger on average.
We are already in the early design stage for a next-generation VLA, which will increase the number of antennas to about 300. At that point, we probably won't be able to keep correlated but unprocessed raw data, just because of the sheer size of it.
Dbus solves problems that the IPC methods you discussed do not. If there were a better solution, it would probably have been adopted by now.
Linux desktops are implemented as process swarms and communication among processes is one of the central things they have to deal with.
How old is CORBA, again? And how crazy is SOAP?
"better" depends on what you are trying to do.
Of course CORBA and SOAP are considered old and horrible now, 15 years later. But currently-popular stuff is in many ways equally unsuited to coordination of local desktop processes, because it's not designed for that.
There are quite a few cases where reliable 1-to-many and many-to-many communications need to occur. This is particularly the case when you have many loosely affiliated independent applications with optional communication paths. d-bus, for all of its flaws... does that well enough that I rarely notice it's running on my system.
I like this approach more and more these days. For example, I sometimes run Murmur (Mumble) servers, and they deprecated D-Bus support in favor of ZeroC Ice (GPLv2 or proprietary), but it seems almost as bloated, if not more so. The reasoning was mostly around the portability of bindings...
Recently, though, I have been refusing to support Windows and OS X as a conscious decision. One thing I've found is that the constant want/need to target every platform adds an ever-increasing amount of complexity, which really seems to go against the Unix philosophy. So I applaud others willing to buck the trend and narrow their scope.
In the end, I think the main problem with the many-eyes theory is that code has gotten so complex that there simply aren't enough eyes, and therefore I think the future of software is going to be in the reduction of complexity. For example, loc isn't the best measure, but the Minix 3 kernel is at ~20kloc, while the Linux kernel is now at, what, ~11mloc!? Not even Red Hat can audit that shit properly. (Another reason we need a Hurd microkernel, but I digress.)
Well, I don't think Hurd is going anywhere. They missed a crucial opportunity to move from Mach to L4, and they simply didn't have the manpower. What we might focus on is migrating facilities (drivers, core services like TCP, system services) from Linux, OpenBSD, Illumos, and Minix 3 (especially that daemon that can restart things even when the filesystem daemon goes kaput) to a well-designed L4 like seL4. At that point, at least we have some hope of taming the beast.
The great opportunity here is that you don't need to care too much which license each driver or service is under, since they're all running in user space. You can have your (yuck) CDDL processes, where you keep your OpenZFS instance. You can have your GPLv2+ processes, where you keep your (maybe a bit dirty, but at least they exist!) Linux drivers.
Also, the major difference in line counts is precisely because of the number of facilities offered by the Linux kernel (most of which you can disable, or would never be enabled in the first place!). Minix3 in its "equivalent" form (containing sufficient drivers for the machine running standard daemons) vs Linux with the same subset of drivers and services would be a much fairer comparison.
It's not like that's 11mloc in one monolithic system. The Linux kernel has a variety of different subsystems, and is maintained by a lot of people. Each subsystem is auditable, so I don't see you have a valid objection here.
Btw, what's happening at bus1? Haven't heard about it lately.
That is something we intend to explore. The idea would be to let bus1 be used under the hood by dbus libraries to do peer-to-peer communication where possible (circumventing the broker) but still stay compatible to the D-Bus semantics.
> Btw, what's happening at bus1? Haven't heard about it lately.
We spent half a year working on dbus-broker ;)
Things like implicit message buffering were deliberate design decisions.
D-Bus code is basically unreadable: not only are the bus names heavily scoped (Java-style) to avoid collisions, but so are the interface and method names. A tiny Python (or whatever) script to invoke a single method on a well-known object should be a one-liner, but in practice runs to 6-7 lines just due to verbosity.
Whatever technical limitations DCOP may have had, its command line was amazing: space-separated words and an emphasis on discoverability made it a joy to use.
If you wanted a more advanced API than AREXX could reasonably accommodate, it was easy enough to layer the more complex bits next to it.
The threshold for people to take full advantage of DBus is still too high. Maybe there's a need for something that complex for inter-application communication, but if so we'd also benefit from something simpler.
Maybe it's just a documentation failure... I don't know.
iterative work is lame, the old solution is so bad it's not even wrong, here is my idea for a rewrite, look it's even still compatible (for another few minutes).
So this reads more like a strategic rewrite, a redo of the implementation while keeping the API, which I think is often a smart way to do it.
But, not saying that characterization applies here specifically. The article was quite well-reasoned in explaining the proposed changes, as far as I could tell. Disclaimer: I barely know anything about D-Bus.
Since from the text dbus-broker does not use the bus1 kernel module, does that mean the bus1 project is dead?
A quick review of the code reveals that dbus-broker-launch relies upon systemd entirely for bus-activation. To activate a dbus server on demand, it sends a message to systemd using a systemd-specific protocol. It has no way to demand-activate services on a non-systemd operating system.
The dbus-daemon that this purports to be compatible with at least can be persuaded, via its launch helper, to demand-activate services in a generic fashion using whichever of initctl, systemctl, service, or system-control is appropriate.
It's not like you can run 'systemd-udevd' standalone, for example. Instead there are massive "porting" efforts like eudev and elogind, just to extract the functionality BACK from systemd. And then you have obsolete-but-necessary components such as ConsoleKit and PolicyKit that are stuck on ancient pre-systemd versions with no current replacement.
I started using systemd back before they even took over "udev". Back then systemd was a breath of fresh air. Now I'm using a different service manager and observing systemd gobbling up various critical parts of the Linux desktop like some damn Katamari is like watching a train accident in slow motion.
systemd can iterate quicker, and work faster and better, because they can share more code between projects.
Code that should have been in the stdlib, provided by the distro, but which no one does. So it ends up in systemd.
You see the issue even in GNU yes, which implements its own version of a buffered output, or in cat, which does the same, but slightly different.
All these things should be in the stdlib, and because they’re not, those projects that can use premade solutions iterate a lot quicker, and can get better, faster.
Perhaps it is best considered a time/money/sanity redistribution scheme, because I've certainly spent plenty on the above.
(And what's wrong with time sync, exactly? It seems to work perfectly with zero configuration required, for a large number of people, myself included.)
I'd suggest you search for timesyncd issues on Google, but you preemptively announced bugs don't matter to you and declared "works for me!", so I don't know why you'd ask. So perhaps just stop and consider for a moment why, exactly, it is that your init system is expanding to replicate existing, functional, standards-compliant userspace daemons with limited, buggy, noncompliant "replacements".
Yes, they've been doing a great many things right. That doesn't make them bug-free by any means, nor does it mean that every single thing they've done is right; it does mean they built something incredibly useful and working for a large number of people. People don't seem to talk about those as often; outrage carries so much louder.
> I offered several examples, such as the '0day' username thing that was not only an example of a bug, but an example of very clearly Doing It Wrong on a design level.
Yes, and that was broken. And it has since been fixed, but that didn't get nearly as widely reported. systemd now checks for that issue and reports it rather than running the unit in question.
(You could argue about its parsing of such fields, and that discussion is ongoing, but that's separate from the issue of running the unit as root.)
> I'd suggest you search for timesyncd issues on Google, but you preemptively announced bugs don't matter to you and declared "works for me!"
No, I asked the question of what you considered problematic about timesyncd, especially since you seemed to be talking about what you considered fundamental design issues. I keep a close eye on the large community of Debian folks running systemd, and read the bugs reported, and I had not seen anything notable related to timesyncd, especially not anything that would suggest a design issue.
I never said bugs don't matter to me, nor was I attempting to generalize my own experiences to suggest that it must necessarily work for everyone. You seem to be actively seeking out and assuming hostility where none exists; I would not be surprised if you find it, but I'm not looking to supply any.
Sounds like we're agreeing, sort of. A nasty bug was caused by an inexplicably weird design decision, demonstrating that they haven't been "doing it right" since it became mainstream. That is what I was responding to.
Backing into my interest in the discussion: it is that inexplicably weird design decision, made far worse by the authors' repeated habit of reflexively trying to make their problem someone else's, that explains why I simply don't trust them or their code. This is a pattern that has been repeated over several iterations across years; the first one I was aware of was when they were crashing the kernel in debug mode, leading to the famous "fuck systemd" patch showdown. The repeated replay of that pattern shows they haven't learned anything. That combination of arrogance and incompetence is annoying in college-grad hotshots, but they can be kept in line until they grow up; bluntly, it has no place in my systems, and makes me wonder why RHEL wants to burn trust and goodwill like this.
Moving on: briefly, timesyncd is sntp, not ntp, is client-only, doesn't track jitter, only jumps forward, and makes other mistakes. I've seen reports of the sntp implementation being wrong, or perhaps simply having interop problems, but haven't bothered to look because I don't use it. Those things probably don't matter for a gaming client or such, but it simply isn't an ntpd replacement.
And it still leaves the question of why your init system, produced by an erstwhile-enterprise vendor, is replacing unrelated daemons with reimplementations that look more like college-assignment toys than production software.
As I understand it, two separate design decisions interacted there. One was "some fields should be ignored if not supported or if they use new syntax that isn't supported, so that a unit written for new systemd won't break on old systemd"; if they'd done that differently, it'd have generated many problems as people wrote units for the latest bleeding-edge version. The other was "parse and validate usernames and see if they look sane"; the ideal solution there would be "check if they exist and do no other validation if they do", but NSS turns out to not be viable for that in the context of an init system. Usernames should never have been an "ignore if not supported" field, but it's at least understandable how the issue could occur.
If your standard for "doing it right" is "every single thing they do is always correct", very little software will meet that standard.
> Moving on: briefly, timesyncd is sntp, not ntp, is client-only, doesn't track jitter, only jumps forward, and makes other mistakes. I've seen reports of the sntp implementation being wrong, or perhaps simply having interop problems, but haven't bothered to look because I don't use it. Those things probably don't matter for a gaming client or such, but it simply isn't an ntpd replacement.
timesyncd doesn't claim to be a replacement for all of ntpd; it claims to be a simple implementation of the common case of "I want my time to be correct". A client-only SNTP implementation is what they set out to build.
I agree with _jal that this kind of arrogant, dismissive behavior and repeat behavior doesn't instill trust and does a huge disservice to the project's reputation.
I can sympathize with people who spend all day dealing with unwarranted rants and flames letting some of that leak out into their responses to everything, but yes, that should have been handled better.
That's the only thing I've seen people praise systemd for, and I happen to agree.
Nothing else systemd does do I think it does better.
If systemd had remained an init system and nothing else, it would've been a clear improvement, worth the breakage it caused. What systemd is today should be called the Fedora userspace suite, not a Linux init system.
IMO, a set of basic building blocks that work well together is great. The huge number of pointless differences needed to be reduced ages ago.
Indeed, who needs competition, the bazaar philosophy got us nowhere right? /s
I like my systems elegant, transparent and bloat-free. Systemd is none of that.
It doesn't relate in any way to the quality of the solution.
That you do not like the outcome does not mean the process was inherently flawed or incomplete.
That honestly doesn't seem unreasonable to me. If you build a tool to make things easier to maintain, you lose much of the benefit of that if you still have to support other alternatives where you have to do everything manually. (For instance, maintaining a 100-line init script in addition to a 10-line unit file.) Asking people who care about that to do the work to maintain it seems perfectly reasonable.
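For illustration, a complete unit file really can be that short; here's a sketch for a hypothetical daemon (the service name and paths are made up):

```ini
# /etc/systemd/system/exampled.service -- hypothetical service
[Unit]
Description=Example daemon
After=network.target

[Service]
ExecStart=/usr/local/bin/exampled --foreground
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Daemonization, restart-on-crash, and dependency ordering are declared rather than scripted, which is exactly the part that used to be a hundred lines of shell.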
> Which at some point extended to GNOME and other vital sine qua nons.
There are far more people complaining about the lack of alternatives, and far fewer people willing to actually write and maintain alternatives. It doesn't help that many of the people complaining take the attitude of "you don't really need that anyway".
RHEL5 came with one init system. RHEL6 came with another. RHEL7 comes along with yet another init system replacement.
Each of which has required software vendors who build software to run on those platforms to do non-trivial porting work. I know it's annoying the software vendors no end.
I've heard from so many end users that they can't upgrade to RHEL7 (or derivatives) because the software they need to support doesn't work with systemd yet, which frustrates them with both Red Hat and the software vendor. Annoying your customers hardly seems the sanest business practice.
Luckily with Debian being on board, and thus Ubuntu, at least there's some incentive for vendors to work at it.
It's far more likely that the real cause is one of the many major dependency updates (6 to 7 is like a half decade jump) and the systemd mention is either an excuse or axe-grinding.
Add to that the games they're playing with interfaces, and at some point, it starts smelling like a miniature, farcical version of Microsoft in the 90s.
I'd like to read this part about RHEL throwing their weight around on the Debian lists. Given your implication is that Debian was clearly "forced" to adopting it, I imagine the relevant evidence shouldn't be hard to find.
(Alternatively, you could actually ask other Debian maintainers yourself, like Josh, in this thread how it went. But you already did that and it didn't seem his narrative aligned with yours, so...)
It might be worth everyone's while remembering that Debian is not the be-all and end-all here, even though it did have a massive hoo-hah. The processes in other distributions were markedly different.
Arch rc maintainer decided to drop rc for systemd.
The linked evil-poettering intermezzo seems to be irrelevant and strawman-ish too.
The process at Debian started years later than any other distribution. It involved various votes, at least a year of discussion, etc.
Then in 2017 someone ignores history and summarizes this into "RHEL was throwing its weight around"... ?!?
It's simply disingenuous to describe Red Hat contributing a lot of engineering time for free as "hammering it down to everybody's throats", any more than Linux was hammered down our throats over Hurd. I don't think systemd is perfect, but I think anyone's standing to complain about open-source software is bounded by their willingness to commit to supporting alternatives.
(And, lest you think I'm some sort of die-hard Red Hat fanboy I should note that I started using Debian in the Bo/Hamm era and have never found a compelling value to using RH)
So if they weren't forced into it and RedHat did indeed provide some guarantees, it does actually suggest what the parent says.
Nothing broke; init scripts are still plain and simple.
Given that scripts are used in nearly all Linux software, branding them as 'prehistoric' comes off as misinformed.
And I started using linux in 1994-95ish and still use it and different bsds.
The only thing it doesn't do properly out of the box is logging. I'm not the biggest fan of non text log files.
There seem to be some Unix/Linux developers who have a mysterious affinity for obtuse arcane time-wasting cognitive-load-increasing design. It's like they see the ability to master crufty badly designed systems as a badge of honor or something, or maybe it comes from a drive for job security or consulting hours.
Example: "systemctl list-units"
What's wrong with "systemctl ls"? What would have been wrong with a shorter command that's easier to type like "sys ls"? It's a core aspect of the system so a name like "sys" would have been appropriate, easy to type, and easy to remember.
Even worse the output of list-units has overly long lines and manages to be simultaneously hard for humans to read and hard for machines to parse. It uses white space as both a delimiter and within identifiers, making shell script parsing with "cut" etc. impossible.
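The parsing complaint is easy to demonstrate on a single sample line of output (the line below is illustrative, not captured from a real system):

```python
# One sample line of `systemctl list-units` output (illustrative).
# The description field itself contains spaces, so splitting on
# whitespace cannot tell field boundaries from words in the description.
line = "dbus.service   loaded active running D-Bus System Message Bus"

fields = line.split()
print(len(fields))   # 8 tokens for what are logically 5 fields
print(fields[4:])    # the description is shattered into 4 pieces
```

The machine-readable escape hatch is `systemctl show`, which emits key=value pairs, but the default human-oriented output gives scripts no reliable delimiter.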
The entire design is like this: obtuse, verbose, clunky, hard to type, hard to remember.
I absolutely loathe this stuff. Using it inspires fantasies of causing physical pain to its designers. A little bit of thought could have resulted in a clean, sparse, intuitive, and discoverable design with memorable commands and a straightforward configuration structure.
Don't get me started on abominations like Debian packaging or Windows drivers, though those are somewhat forgivable as their ugliness can be explained by their age and the need for backward compatibility. Systemd was a green field design from the 21st century so it has no excuse.
I've made enough aliases to make it usable without having to type such a long name. There should definitely be a shorter name for such a critical piece of software.
One example is using user services vs system services and the various non-intuitive locations and name schemes of the various .service files.
Imagine if the package management system, regardless of distro, were just pkg, and the service management serv. This would also be helpful if you are using more than one system, and it needn't be complicated.
> Even worse the output of list-units has overly long lines and manages to be simultaneously hard for humans to read and hard for machines to parse. It uses white space as both a delimiter and within identifiers, making shell script parsing with "cut" etc. impossible.
It's the Microsoftization of Linux. This gives me bad flashbacks of Powershell where exactly what you describe is the case, where the simple is made difficult and the difficult made impossible.
The overengineered, bloated monstrosity that was CORBA also springs to mind. Systemd has been one of the worst things that have happened to Linux in recent years.
That said, the problem with "systemctl ls" is that systemctl has commands to list multiple kinds of things: units, sockets, timers, dependencies, unit files, virtual machines, jobs.
I agree it's annoying that the output isn't formatted in a more parsing-friendly way, though. Especially because there is, for example, an option to format the journal output in systemctl status as JSON; why couldn't they do that for the main output too? (Indeed, I wish all tools had an option for that.)
It's pointless arcana to boot. There is no reason whatsoever that a green field implementation of something so straightforward needs to be so obtuse.
If the complexity of interface exceeds the complexity of the information it needs to take and provide, it's bad design. If the command or UI structure uses arcane terms when straightforward terms exist, it's bad design.
Sometimes the code is bad. But the model isn't bad. People sometimes use "init script" to refer to the tools that run and maintain network services, and those are completely different.
An init script boots your computer. A service runner is responsible for interfacing with an application and performing complex operations. They should not be confused, but often are, as Systemd conflates the two.
The init scripts differed way too much across distributions, including pointless differences in where various files were located. All of that has become much more standard, thankfully.
I did have various issues with a few init scripts in my distribution. It didn't happen often, but nowadays you can usually get by with a config file, which is way easier.
We should really just move to BSD already and let them sink this ship.