Pretty much no software, even when run deterministically, is bijective. There are almost always cases where two different states map to the same state.
You don’t need to reverse time if you can deterministically reproduce everything that led up to the point of interest. (In practice we save a snapshot of your system at some intermediate point and replay from there.)
I think they are correct. I’d say it’s easy to work all your life in the shadows of the cave and never realize that you can do everything yourself. You don’t need “glue” people or project managers.
What is "you" here? If "you" are a single developer or founder of a small startup making a single product then sure, "you" might not need any of these things.
But I think the point of the article is that Google seems to have pretty serious problems stemming from the fact that it apparently has lots of individual teams working on their own thing without enough coordination with each other.
I don’t think Google’s problem is a necessary outcome of relying on engineers for the glue and project planning/management. In fact, Google does not do that. They’re full of glue people.
The strings are concatenated; it’s just that the first line was a comment, meaning the compiler/assembler treated everything that followed as part of that comment. Comments don’t cause compiler errors, so no error here…
The first string is a comment, so all the strings after it just get appended to the comment text, until it finally hits the first newline. Quite an insidious mistake.
They're not creating raw sockets†. The neat thing about WireGuard is that it runs over vanilla UDP, and presents to the "client" a full TCP/IP interface. We normally plug that interface directly into the kernel, but you don't have to; you can just write a userspace program that speaks WireGuard directly, and through it give a TCP/IP stack interface directly to your program.
† I don't think? I didn't see them say that, and we do the same thing and we don't create raw sockets.
It reminded me a lot of some of the stories told by that poor fellow currently blogging his way through late-stage neck cancer, whose posts have often reached the front page lately. Specifically, it reminded me of his posts that describe in detail what it’s been like for him and his partner trying to navigate the modern medical industry, posts which would read like something out of Terry Gilliam’s most darkly comedic fever dreams were it not for the literally life-and-death import of their subject matter.
I doubt anything like that was the author's intent, but who cares? Theirs is the first, not the last, of the hands which shape the text.
Yeah that’s really weird. On the one hand, certain precision operations are already carried out by robots guided by surgeons on a computer screen, and experienced surgeons in certain domains can even oversee multiple robotic operations simultaneously.[1] On the other hand, unless you’re willing to put up with failed organ retrievals to save a little bit of money, I can’t imagine humans out of the loop any time soon.
[1] Source: A family member recently retired after working in the OR for more than three decades.
Yeah it was jarring — wouldn’t be surprised if the first draft didn’t contain it at all, and the editor forced it in there so the piece would have a “hook” to current events.
Who said overflow was wrong? The sort of code I work on has to continue running, no matter what. Continuing on is always better than crashing, at least in my context.
It’s also basically impossible to establish invariants about a system for code that runs in signal handlers.
There are almost no circumstances where a crash in production code is preferable.
Can you imagine working on a complicated Excel document when you run into a 2038 bug and you lose all your work instead of just having one of your future dates calculated as being in the 1970s?
Or flying a plane and the system has to reboot midair instead of saying you're at -32768 feet of altitude?
Or there are now 2.15 billion virus definitions, so your entire security suite just stops running?
Or your Tesla stops keeping you in your lane without warning because your odometer turned over?
Most things are not Therac-25. Much more software is protecting us and our data than is exposing us to danger. Loss of data, loss of navigation, loss of safety systems or life support... simply unacceptable. Turn on -ftrapv when testing and especially when fuzzing, sure, but production code should almost always be built with -fwrapv.
Excel maintains a recovery file of your open document. If it crashes, you can reopen Excel and will likely recover most if not all of your data. You are far more likely to corrupt data by keeping the files (primary and recovery) open in a program that is in a known-bad state.
If software crashes, it can simply be rebooted. I can't get the actual requirements for airplane control software without paying hundreds of dollars. However, I would imagine that a reboot would finish quicker than a pilot could open the manual to the "negative altitude indication" section.
> Or flying a plane and the system has to reboot midair instead of saying you're at -32768 feet of altitude?
This is exactly how Airbus's computer control philosophy works: Bail out in case of inconsistencies/unexpected errors. In the best case, this means falling back to the other (one or two) still operational redundant instances of the same component. The remediation is to literally reboot the errored-out component manually, mid-flight, with absolutely no adverse consequences. For all you know, this has already happened on a flight you've been on!
In the worst case (which can happen if it's an internally-consistent error condition, e.g. something like a conceptually unforeseen condition that none of the equivalent system instances can handle), the entire system declares failure and bails out to a lower-level system, if needed all the way down to mechanic/hydraulic connections to the most critical control surfaces in case of e.g. the A320. This can mean that e.g. the pilots need to start flying the plane manually.
Flying manually is preferred over e.g. one of two systems thinking that the plane is upside down, with a compromise action of rotating 90 degrees along some axis.
> Can you imagine working on a complicated Excel document when you run into a 2038 bug and you lose all your work instead of just having one of your future dates calculated as being in the 1970s?
If your program loses data on a crash, it is not worth using. What is this, 1970?
> Or flying a plane and the system has to reboot midair instead of saying you're at -32768 feet of altitude?
I'm pretty sure that's exactly what happens in fact: a quick reboot that temporarily hands control to lower-level or redundant controls is better than feeding wrong information to a sophisticated flight control system. I think this is generally true for all high-reliability control systems.
Crash-only software works for Erlang, and I think people expect telephony switches to be reliable. (Swift also crashes on overflows and runs your phone.)
Most of your examples would be just as bad if they started having unbounded incorrect behavior, and an overflow you didn't know about could lead to that. So don't get the math wrong!
> (Swift also crashes on overflows and runs your phone.)
AFAIK, over half of the mobile phones in the world, and probably nearly all of the fixed phones, do not run Swift, but instead some older language which does not have this "crash on overflow" behavior.
Many landline and mobile phone switches do run on Erlang. Not getting a connection in case of e.g. an integer overflow for "pick the next available trunk line" is preferable to kicking out an existing connection on line 0.
Note that an exception doesn't need to literally mean "the entire system comes to a screeching halt": You'd usually limit exception handling to the current operation.
Nobody is talking about exceptions. We're talking about intentionally invoking undefined behavior.
You don't use trap representations to handle potential overflows. You check for possible overflow prior to the operation, and then handle appropriately.
Take the plane example: better for the software to fault, stop, and let a redundant system take over, which might just mean the pilot looks at another gauge. The pilot logs the problem and it gets looked at.
I totally disagree. Overflow almost certainly represents a bug and the violation of various invariants. Your code is doing something, even if it isn't giving radiation doses. Computing the wrong data and storing it can cause problems for customers, modify data you didn't plan on touching, or render data entirely inaccessible if various encryption operations are performed incorrectly.
Just like most things are not the Therac-25, most things are also not safety critical when they crash. Some web service that crashes just means that it returns 500 and you see it on whatever stability dashboard.
Web services are a great example. Explain to me why it's preferable to give a constant 500 and lock me out of my account, rather than having a fucked up background image scaling or whatever else your web service would be doing with arithmetic.
I write software for spacecraft. I’d much rather take my chances that the overflowed value is meant for humans or for some tertiary system than to end the mission regardless.
I would guess they don't use a language where signed overflow is undefined. I use C all day every day and that's gotta be one of the worst legacy artefacts of it. As someone else said "it isn't 1970" the system is twos complement.
I have no idea what you mean by this. A couple types have diagnostics, but so do overflow and wrapping. Most don't. In general, it's much easier to have runtime diagnostics for arithmetic than for UB.
I guess if I squint and "silent" implies you're not allowed to install diagnostics, then it's worse? But that's so artificial of a comparison as to be silly.
Unmanned spacecraft, hopefully? "Taking chances" is not something I'd like the designers of a system that controls the vessel I'm on (or that overflies my house) to do routinely during development.
Undefined behavior can transitively expand to the entire system very quickly. For any subsystem, the proper error handling for an overflow/arithmetic error that's not properly handled right in the place it occurs is to declare the subsystem inoperative, not to keep going.
If that approach routinely takes out too many and/or critical subsystems, you likely have much larger problems than integer overflows.
There are the kinds of defects that people notice, and then there are the kinds of defects that people don't notice. Crashes are usually noticed, while nonsense calculations and undefined behavior are sometimes not noticed.
I worked at a company that sold a product that some customers used to make decisions involving a lot of money. A program misbehaved, or gave the wrong answer, or returned from a function too soon, or something. It didn't crash. A customer lost a lot of money, and we were liable.
It turned out that the bug was a symptom of undefined behavior that would have been caught by one of the flavors of assertions built into standard releases of the core C++ libraries. The team in charge of the app in question had long ago disabled those assertions because "they keep crashing the code."
Infrastructure within the company then went through a small internal crisis about whether to force assertions down people's throats or to let it be wrong on prod. The compromise was to have the assertions expand to loud warnings on prod. Ops then had a tool that would list all of the "your code is provably broken" instances, and that went to a dashboard used to shame the relevant engineering managers.
Not sure if it worked, but when you have a _lot_ of C++ code, a great deal of it is broken and "it's fine." Until it isn't.
Overflow is fine if you're aware of it and have code that either doesn't need to care about it, or can work around it.
Consider protocols with wrapping sequence numbers. Pretty common in the wild. If I increment my 32-bit sequence counter and it goes from 2^32-1 back to 0, that's likely just expected behavior.