The general thing in software dev is that it's usually easy to catch a specific bug; it's just that there are so many possible bugs that you can't think of them all. I'm sure there is now a test that checks for strange deletions on macOS, or will be within a few days. Until the next stupid thing nobody expected.
SIP is a relatively new feature. This would have been devastating only a few years ago. What Google did was effectively:
sudo rm -Rf /var
Would you trust a program that even attempted such a command on your machine? "But they had SIP disabled" is no defense for software that tries to take destructive action.
SIP is intended to protect against malicious software and against users accidentally hosing their system; there are legitimate reasons for disabling it. This was not anyone's fault but Google's.
How many Chrome updates have you gotten that did not break your computer? Was your computer broken by this? My understanding is the rollout was stopped fairly quickly.
How many times did Ann Rule interact with Ted Bundy and he didn't murder her once?
I mean, he can't be all that murdery, right?
We're talking about nuking a filesystem, not glitching out randomly. That it hasn't nuked the filesystem before is not an extenuating circumstance. It doesn't speak to character either.
I'm as comfortable using Google apps today as I was a week ago, because I figured out long ago how to disable Keystone permanently. (It wasn't easy.)
But I am rather pissed at Apple because SIP is a global switch and with it enabled, there are several significant, legitimate things you cannot do with your own computer. SIP should work like sudo, not like a meta version of root. If it did so, nobody would have been affected by this week's Google nonsense.
> with it enabled, there are several significant, legitimate things you cannot do with your own computer.
For those wondering about such a use case: the only way to get eGPUs working on <=2015 MBPs is to disable SIP (and use purge-wrangler). It's not officially supported because 2015s don't have TB3.
Why do they have code that's trying to remove /var?
Doing something stupid and relying on the safety equipment to save you is a stunt. Doing it with someone else's stuff is being an asshole. This is not the behavior of sober grown-ups.
Could be something as simple as "rm /var/$myfile" with myfile being null or unset. As long as /var isn't owned by the current user and/or SIP is enabled, testing won't reveal the problem.
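A minimal sketch of how that failure mode plays out, using a throwaway sandbox directory instead of the real /var (myfile is the hypothetical variable from the comment above; nothing here is Keystone's actual code):

```shell
#!/bin/sh
# Hypothetical reproduction in a throwaway sandbox, NOT the real /var.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/var/important"

# myfile was never set, so the path expands to "$sandbox/var/" --
# and rm -rf happily deletes the whole directory.
rm -rf "$sandbox/var/$myfile"

ls "$sandbox"   # prints nothing: var itself is gone
```

On a machine where the real deletion would be blocked by permissions or SIP, the rm fails silently under -f, so a test run looks perfectly clean.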
This is why you should always construct paths in particular and URIs in general using your language's path APIs instead of string interpolation.
That doesn't protect you fully. You still have to check that $myfile is not undefined or "", but it helps with related problems and it tends to arrange the code in such a way that the lack of further sanity checks sticks out a bit more.
1) included code that performed dangerous system-level operations, despite the fact that
2) the included code was guaranteed to fail under the expected / default OS configuration?
The only logic I could see for such a state would be "let's teach them a lesson" for those users who chose to operate in the "dangerous" configuration. It doesn't make sense that they'd even attempt the symlink removal.
or 3) They intended to remove a more specific symlink, with the path assembled from variables, but due to a bug some or all of the variables were empty and the concatenation just produced /var.
Which is the cause of like 90% of accidental "rm -rf /"s in history.
Right, but if you're a large software distributor, the onus is entirely on you to QA software releases on every possible OS configuration.
If this were an individual working alone (or even a small company), I'd have some sympathy. But Google has enough to pay QA engineers and build a sufficiently sophisticated test lab. They should get no pass for this.
Well, hold on. Let's not scapegoat another team for the fact that we can't seem to handle relative paths well after 30 years of spectacular case studies in how fucking stupid we are with path calculations.
I'm always finding people doing string arithmetic instead of using the APIs. Sometimes I catch myself doing it.
Rails got close to a solution by half-assedly tracking the provenance of all strings passed into certain functions. We could probably use a bit more of that.
The fact that this got through Google's QA is inexcusable. Yeah I understand a dev messed up. It happens. But it should have been caught before being released.
All it would have taken is testing the installer on a non-SIP Mac.
In this case a smoke test on such a Mac could have sufficed, yes.
But even when I have a QA team, which is less and less often, I like the devs to be involved in setting up stuff like this. QA is often not so good at engineering robust tests. But if they could write better code than we can they’d make more money as devs. So I’m still not letting the devs off the hook.
This class of error comes down to our chronic insistence on using stringly typed APIs for path/URI manipulation. It’s tantamount to a SQL injection attack, but with less data in the “query” instead of more. You should not be assembling file paths for destructive writes or deletes by string concatenation. But this is on almost nobody’s radar.
Is anyone else in this thread advocating for a change in software design? Mostly we are all blaming Google for fucking up. This is the headspace everyone but PHP was in for SQL fifteen years ago.
Another possibility is that it kept a list of all the files and directories it had installed, and removed each one before installing a new version. Suppose the list also included all the parent directories (common if you want to avoid leaving empty directories behind), and the list did not record whether each entry was a file or a directory; instead, it looked at the disk for each entry and did a rmdir on directories and an unlink on everything else. If it installed anything below /var, all of that combined with the fact that /var was a symlink rather than a directory could lead to it calling unlink on /var, while the developers would be expecting a harmless rmdir (which does nothing unless the directory is empty).
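That hazard is easy to demonstrate in a sandbox. A symlink pointing at a directory is not itself a directory, so an lstat-style check routes it to the unlink branch, and removing it orphans everything behind it. A sketch mirroring the macOS layout, where /var is a symlink to private/var:

```shell
#!/bin/sh
sandbox=$(mktemp -d)
mkdir -p "$sandbox/private/var/logs"
ln -s private/var "$sandbox/var"    # mirror macOS: var -> private/var

# The "safe" branch: rmdir refuses, because a symlink is not a directory.
rmdir "$sandbox/var" 2>/dev/null || echo "rmdir refused the symlink"

# The branch it actually takes: unlink quietly removes the symlink itself.
rm "$sandbox/var"
[ -e "$sandbox/var" ] || echo "var is gone; private/var is orphaned"
```

The target tree under private/var still exists afterwards, but anything that resolved paths through the var symlink is now broken.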
How the hell does that get past QC?
Also heartily agree with other comments about Google's update processes in general.
Trust is still a house of cards in computing :(