
My understanding is that /var was only removed on systems without SIP enabled, the recommended (and default) setting.

This only affected users who went out of their way to disable this security feature. Presumably Google QA had this enabled on all their machines.




SIP is a relatively new feature. This would have been devastating only a few years ago. What Google did was effectively:

    sudo rm -Rf /var

Would you trust a program that even attempted such a command on your machine? "But they had SIP disabled" is no defense for software that tries to take destructive action.

SIP is intended to protect against malicious software and against users accidentally hosing their system; there are legitimate reasons for disabling it. This was not anyone's fault but Google's.


It's a symlink and / has to be writable by the logged-in user, so it's effectively just

  rm /var


Since when is / writable by any logged-in user? Is that a macOS thing?


It requires the user to have disabled SIP and then used root access to modify it. On a standard install, it's owned by root:wheel with mode 755.


Thanks for the correction!


I don't think this was intentional or malicious on Google's part. It's only not a defense for the QA team.


Whether it was malicious or not it was unnecessary and irresponsible.

It’s not just a QA issue - it’s a design flaw.


I expect it was a typo, like that time Steam deleted people's data on Linux https://www.theregister.co.uk/2015/01/17/scary_code_of_the_w... , or [edit: ok this one probably isn't relevant] when Adobe CC removed random folders on Macs. https://help.backblaze.com/hc/en-us/articles/217665378--bzvo...


That's hardly an excuse, is it? The question is whether, after this incident, you're still comfortable with using any Google app.


How many Chrome updates have you gotten that did not break your computer? Was your computer broken by this? My understanding is the rollout was stopped fairly quickly.


How many times did Ann Rule interact with Ted Bundy and he didn't murder her once?

I mean, he can't be all that murdery, right?

We're talking about nuking a filesystem, not glitching out randomly. That it hasn't nuked the filesystem before is not an extenuating circumstance. It doesn't speak to character either.


Yes, an app accidentally deleting a folder after you've turned off protections on that folder is definitely the same thing as serial killing.

People here sometimes.


> That's hardly an excuse, is it?


I was commenting on the hilariously bad choice of analogy.


How many other equally dangerous and irresponsible mechanisms are Google using?


I'm as comfortable using Google apps today as I was a week ago, because I figured out long ago how to disable Keystone permanently. (It wasn't easy.)

But I am rather pissed at Apple because SIP is a global switch and with it enabled, there are several significant, legitimate things you cannot do with your own computer. SIP should work like sudo, not like a meta version of root. If it did so, nobody would have been affected by this week's Google nonsense.


> with it enabled, there are several significant, legitimate things you cannot do with your own computer.

For those wondering about such a use case: the only way to get eGPUs working on <=2015 MBPs is to disable SIP (and use purge-wrangler). It's not officially supported because 2015s don't have TB3.


Why do they have code that's trying to remove /var?

Doing something stupid and relying on the safety equipment to save you is a stunt. Doing it with someone else's stuff is being an asshole. This is not the behavior of sober grown-ups.


Could be something as simple as "rm /var/$myfile" with myfile being null or unset. As long as /var isn't owned by the current user and/or SIP is enabled, testing won't let them know they have a problem.


This is why you should always construct paths in particular and URIs in general using your language's path APIs instead of string interpolation.

That doesn't protect you fully. You still have to check that $myfile is not undefined or "", but it helps with related problems and it tends to arrange the code in such a way that the lack of further sanity checks sticks out a bit more.
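
To make that concrete, here is a minimal sketch in Python. The helper names naive_target and checked_target are made up for illustration and have nothing to do with whatever Google's updater actually does. It shows that interpolation turns an empty name into the base directory, that pathlib on its own does the same, and that an explicit check is what actually stops it.

```python
from pathlib import PurePosixPath


def naive_target(base: str, name: str) -> str:
    # Plain interpolation: an empty name silently collapses to the base directory.
    return f"{base}/{name}"


def checked_target(base: str, name: str) -> PurePosixPath:
    # Hypothetical helper: accept exactly one non-empty, ordinary path component.
    if not name or name in (".", "..") or "/" in name:
        raise ValueError(f"refusing suspicious path component: {name!r}")
    return PurePosixPath(base) / name


print(naive_target("/var", ""))            # /var/
print(PurePosixPath("/var") / "")          # /var  (the path API alone does not save you)
print(checked_target("/var", "cache.db"))  # /var/cache.db
```

Calling checked_target("/var", "") raises ValueError instead of handing /var to whatever does the deleting.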


So, Google:

1) included code that performed dangerous system-level operations, despite the fact that

2) the included code was guaranteed to fail under the expected / default OS configuration?

The only logic I could see for such a state would be "let's teach them a lesson" for those users who chose to operate in the "dangerous" configuration. It doesn't make sense why they'd even attempt the symlink removal.


or 3) They intended to remove a more specific symlink, with the address generated from variables, but due to bugs some/all of the variables were empty and the concatenation of the variables just produced /var instead.

Which is the cause of like 90% of accidental "rm -rf /"s in history.


Right, but if you're a large software distributor, the onus is entirely on you to QA software releases on every possible OS configuration.

If this were an individual working alone (or even a small company), I'd have some sympathy. But Google has enough to pay QA engineers and build a sufficiently sophisticated test lab. They should get no pass for this.


Well, hold on. Let's not scapegoat another team for the fact that we can't seem to handle relative paths well after 30 years of spectacular case studies in how fucking stupid we are with path calculations.

I'm always finding people doing string arithmetic instead of using the APIs. Sometimes I catch myself doing it.

Rails got close to a solution by half-assedly tracking the provenance of all strings passed into certain functions. We could probably use a bit more of that.


The fact that this got through Google's QA is inexcusable. Yeah I understand a dev messed up. It happens. But it should have been caught before being released.

All it would have taken is testing the installer on a non-SIP Mac.


In this case a smoke test on such a Mac could have sufficed, yes.

But even when I have a QA team, which is less and less often, I like the devs to be involved in setting up stuff like this. QA is often not so good at engineering robust tests. But if they could write better code than we can they’d make more money as devs. So I’m still not letting the devs off the hook.

This class of error comes down to our chronic insistence on using stringly typed APIs for path/URI manipulation. It’s tantamount to a SQL injection attack, but with less data in the “query” instead of more. You should not be assembling file paths for destructive writes or deletes by string concatenation. But this is on almost nobody’s radar.
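
A rough sketch of what the non-stringly version could look like in Python, by analogy with parameterized queries. OwnedPath and delete_file are hypothetical names, and the containment check is purely lexical (it does not follow symlinks), so treat it as an illustration of the shape rather than a hardened implementation.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class OwnedPath:
    """A path shown (lexically) to sit strictly below an install root."""
    root: Path
    path: Path

    @classmethod
    def build(cls, root, *segments: str) -> "OwnedPath":
        root_path = Path(os.path.normpath(root))
        # Join the segments, then collapse "..", "." and empty parts.
        candidate = Path(os.path.normpath(root_path.joinpath(*segments)))
        # Rejects the root itself, anything outside it, and absolute segments.
        if root_path not in candidate.parents:
            raise ValueError(f"{candidate} is not strictly inside {root_path}")
        return cls(root_path, candidate)


def delete_file(target: OwnedPath) -> None:
    # The destructive call only accepts the validated type, never a raw string.
    if not isinstance(target, OwnedPath):
        raise TypeError("delete_file only accepts an OwnedPath")
    target.path.unlink()
```

With this shape, OwnedPath.build("/opt/app", "") and OwnedPath.build("/opt/app", "..") both raise ValueError, so an empty or hostile variable fails loudly before anything touches the disk.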

Is anyone else in this thread advocating for a change in software design? Mostly we are all blaming Google for fucking up. This is the headspace everyone but PHP was in for SQL fifteen years ago.


> This class of error comes down to our chronic insistence on using stringly typed APIs for path/URI manipulation.

Like most failures, the analysis needs to identify a chain of events involving various failures; perhaps it would go something like this:

1. Industry and libraries commonly use strings for path/URI manipulation.

2. A software engineer did so in the installer and made a typo.

3. Code review did not identify the problem.

4. QA (and the CI process) didn't test on a Mac that was either old enough not to have SIP or had SIP disabled.

5. Many Mac users, especially in specialized environments running custom or specialized kernel extensions and drivers, use macOS with SIP disabled.

6. Chrome is widespread enough that some users in (5) downloaded the update. That subset had their Mac systems hosed.

I would identify (2), (3), and (4) as problems in the chain where Google carries blame.


Another possibility is that it had a list of all the files and directories it has installed, and removed each one before installing a new version. If the list included also all the parent directories (which is common if you want to avoid leaving empty directories behind), and the list did not specify whether each entry is a file or a directory (that is, it looked at the disk for each entry: if it was a directory, it did a rmdir, otherwise it did an unlink), and it installed anything below /var, all that combined plus the fact that /var was a symlink instead of a directory could lead to it calling unlink on /var, while the developers would be expecting it to do a harmless rmdir (which does nothing unless it's an empty directory).
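
That scenario is easy to reproduce in a scratch directory. The sketch below is Python with made-up function names, and it assumes a manifest that stores only paths, which is a guess about the design and not something known about the actual updater. naive_remove decides between rmdir and unlink by looking at the disk. typed_remove takes the entry type the manifest should have recorded and refuses on a mismatch.

```python
import os
import stat
import tempfile
from pathlib import Path


def naive_remove(entry: Path) -> None:
    # No recorded type: inspect the disk without following symlinks.
    if stat.S_ISDIR(os.lstat(entry).st_mode):
        try:
            os.rmdir(entry)   # harmless when the directory is not empty
        except OSError:
            pass
    else:
        os.unlink(entry)      # also removes a symlink that points at a directory


def typed_remove(entry: Path, expected: str) -> None:
    # Hypothetical safer variant: the manifest says what the entry should be.
    mode = os.lstat(entry).st_mode
    if expected == "dir":
        if not stat.S_ISDIR(mode):
            raise RuntimeError(f"{entry} is not a real directory; refusing")
        try:
            os.rmdir(entry)
        except OSError:
            pass
    elif expected == "file":
        if not stat.S_ISREG(mode):
            raise RuntimeError(f"{entry} is not a regular file; refusing")
        os.unlink(entry)
    else:
        raise ValueError(f"unknown entry type: {expected!r}")


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        real = root / "private" / "var"   # stand-in for the real directory
        real.mkdir(parents=True)
        (real / "db").write_text("data")
        link = root / "var"               # stand-in for the /var symlink
        link.symlink_to(real, target_is_directory=True)

        refused = False
        try:
            typed_remove(link, "dir")
        except RuntimeError:
            refused = True
        survives_typed = link.is_symlink()

        naive_remove(link)
        return {
            "typed_refused": refused,
            "link_survives_typed": survives_typed,
            "link_survives_naive": link.is_symlink(),
            "data_intact": (real / "db").exists(),
        }
```

Running demo() shows the naive version deleting the symlink while the data under the real directory stays intact, which matches the description above: nothing was wiped, but everything that reaches the data through the link stops working.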


No, [2] is incorrect – exactly opposite. The included code was 'guaranteed to' (and did) work under the default OS configuration.


Like kids' bowling, where they block up the gutters.



