
Patching Is Hard - dankohn1
https://www.cs.columbia.edu/~smb/blog/2017-05/2017-05-12.html
======
dpiers
One of my favorite stories involving patching and security vulnerabilities
comes from Jonathan Garrett at Insomniac Games:

"Ratchet and Clank: Up Your Arsenal was an online title that shipped without
the ability to patch either code or data. Which was unfortunate.

The game downloads and displays an End User License Agreement each time it's
launched. This is an ascii string stored in a static buffer. This buffer is
filled from the server without checking that the size is within the buffer's
capacity.

We exploited this fact to cause the EULA download to overflow the static
buffer far enough to also overwrite a known global variable. This variable
happened to be the function callback handler for a specific network packet.
Once this handler was installed, we could send the network packet to cause a
jump to the address in the overwritten global. The address was a pointer to
some payload code that was stored earlier in the EULA data.

Valuable data existed between the real end of the EULA buffer and the
overwritten global, so the first job of the payload code was to restore this
trashed data. Once that was done things were back to normal and the actual
patching work could be done.

One complication is that the EULA text is copied with strcpy. And strcpy ends
when it finds a 0 byte (which is usually the end of the string). Our string
contained code which often contains 0 bytes. So we mutated the compiled code
such that it contained no zero bytes and had a carefully crafted piece of
bootstrap asm to un-mutate it.

By the end, the hack looked like this:

1\. Send oversized EULA

2\. Overflow EULA buffer, miscellaneous data, callback handler pointer

3\. Send packet to trigger handler

4\. Game jumps to bootstrap code pointed to by handler

5\. Bootstrap decodes payload data

6\. Payload downloads and restores stomped miscellaneous data

7\. Patch executes

Takeaways: Include patching code in your shipped game, and don't use unbounded
strcpy."

source:
[http://www.gamasutra.com/view/feature/194772/dirty_game_deve...](http://www.gamasutra.com/view/feature/194772/dirty_game_development_tricks.php)
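
The quoted story does not say how the code was "mutated" to avoid zero bytes. One common approach, shown here purely as an illustrative assumption, is to XOR every byte with a single key byte that does not occur in the payload: a byte XORed with the key is zero only if it equals the key, so the encoded output has no zeros, and XORing again restores the original. The function names below are hypothetical.

```python
from typing import Tuple


def find_xor_key(payload: bytes) -> int:
    """Return a nonzero byte value that never appears in the payload."""
    present = set(payload)
    # Start at 1: the key travels with the data, so it must not be zero either.
    for key in range(1, 256):
        if key not in present:
            return key
    raise ValueError("no nonzero single-byte key is absent from the payload")


def encode(payload: bytes) -> Tuple[int, bytes]:
    """XOR-encode the payload so the result contains no zero bytes."""
    key = find_xor_key(payload)
    return key, bytes(b ^ key for b in payload)


def decode(key: int, encoded: bytes) -> bytes:
    """Undo encode(); this is the job the bootstrap code would perform."""
    return bytes(b ^ key for b in encoded)
```

A payload that uses every nonzero byte value has no such key, which is why real encoders often fall back to multi-byte keys or other schemes.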

~~~
Godel_unicode
This is my favorite comment. I would like to propose an additional takeaway;
validate all content downloaded from the internet.
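
Both takeaways come down to checking the size of untrusted data before writing it into a fixed-size buffer. A minimal sketch, assuming a hypothetical `store_eula` helper and a `bytearray` standing in for the game's static buffer:

```python
def store_eula(buffer: bytearray, downloaded: bytes) -> None:
    """Copy downloaded text into a fixed-size buffer, rejecting oversized input."""
    # Reserve one byte for the terminating zero, as a C string would need.
    if len(downloaded) + 1 > len(buffer):
        raise ValueError("downloaded EULA does not fit in the buffer")
    # Same-length slice assignment, so the buffer never grows.
    buffer[:len(downloaded)] = downloaded
    buffer[len(downloaded)] = 0
```

The point is only that the length check happens before the copy, so adjacent data can never be overwritten.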

------
solatic
> we have to get updated database software from a vendor, and to install it we
> have to update the API the billing software uses

And so why haven't you updated the API the billing software uses, long before
now? Why haven't you updated the database software before now?

CIOs create risk when they don't prioritize keeping their products up to date.
When you can't even install security updates without breaking your
installations, _you have a problem._ And your problem is more than some
technical problem, it's a cultural problem.

Yes it's risky to update a large number of machines. But as a CIO, it is _your
job_ to mitigate that risk. There's no such real thing as "unexpected security
fixes" in this day and age. They are entirely expected, and if you cannot deal
with predictable occurrences then you are quite simply incompetent.

~~~
korethr
I think the reason that the billing software's API hasn't been updated is
right there in its name: "billing". Billing is one of those things in an
organization that can't suffer much downtime. Nobody wants to be that guy
who endangered revenue because there was no possible way that the security
patch should have been able to break the billing software.

Does billing work? Yes? Then don't mess with it. Billing is only to be messed
with when the cost of not messing with it is that it will never work again and
all prior data will be lost forever. Nothing short of that risk is sufficient
justification to touch anything related to billing.

I'm being hyperbolic there, but conservatism around systems that presently
work shouldn't be terribly surprising, especially for something like billing.

~~~
solatic
> Does [a service with high uptime requirements] work? Yes? Then don't mess
> with it

Ah yes, the "don't fix what ain't broken" canard. And it would be completely
understandable if we didn't understand that, in general, the best way to
ensure overall uptime is to encourage small, frequent updates over large,
infrequent updates, because change is inevitable and the risk of the update is
proportional to its size.

I understand that you're being hyperbolic, but that kind of conservatism is
born of ignorance. Expecting CIOs to be educated about mitigating risk in the
systems they are in control of is not a high expectation for someone in a CIO
role.

~~~
csydas
A lot of this is a social issue too - it's IT professionals over-committing
with SLAs and being too passive when it comes to discussing terms to set
realistic RPOs for fragile systems when the resources aren't available for
proper patch testing.

It's very difficult to explain to end-operators of systems the importance of
having things like redundancies, test systems, and the ability to have
downtime for patching, but it's something that IT Professionals need to be way
better about. It's very tempting to throw out goals like 99.9% uptime, but
many operations run with an employee bandwidth that in no way can support such
a goal for the number of systems they need to deal with.

To be fair, sometimes the end-operator needs require some absolutely
antiquated pieces of technology that rely on voodoo-like rituals to keep the
systems running, and trying to shift organizations off this technology is the
diplomatic equivalent of a land war in Russia during winter, and IT
administrators [1] want to avoid getting into such a battle.

Hopefully, this Ransomware outbreak will help provide disruption to such
pieces of technology that are stuck in the past, but part of it is going to
require that the new technology makers be willing to respect why so many
organizations hang on to older technology. (This tends to revolve around
pricing models) I think there is going to be a lot of opportunity to review
major systems that have big restrictions on legacy software and hardware and
overtake the incumbents that aren't willing to shore up their products.

[1] Edit: removed too many mixed metaphors from one sentence O.O

~~~
tannhaeuser
> _(This tends to revolve around pricing models)_

Care to elaborate? Do you mean that newer software comes with mandatory
maintenance costs that users are unable or unwilling to bear? In that case,
paying for security patches and maintenance should be palatable to customers
in this context, shouldn't it? Or did you mean something else?

------
sasas
The vulnerability used by this worm could have been mitigated by disabling
SMBv1, hardening the machine, network segmentation ... the list goes on. If you
couldn't patch, you could still have prevented this worm from impacting your
organisation.

It's a case of inadequate security management and negligence.

------
KaiserPro
"We've got to install MS17-010; these are serious holes."

"We can't just yet. We've been testing it for the last two weeks; it breaks
the shipping label software in 25% of our stores."

But in this case it's bollocks. The patch is easy and doesn't fuck too many
things.

At $work, we've had the same issue. We have three estates: Windows, Linux and
some Solaris. The Linux estate is patched within hours of upstream fixes.
Staged, starting in dev, and bubbling up to prod.
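
A staged rollout of this kind can be sketched as a loop that patches one environment at a time and stops at the first one that fails a health check. This is only an illustration: the `staged_rollout` function, its callbacks, and the intermediate "staging" environment are assumptions, not details from the comment.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    stages: Sequence[str],
    apply_patch: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Patch each stage in order and stop at the first unhealthy one.

    Returns the stages that were patched and passed their health check.
    """
    completed: List[str] = []
    for stage in stages:
        apply_patch(stage)
        if not is_healthy(stage):
            # Do not let a bad patch bubble up to later stages.
            break
        completed.append(stage)
    return completed


# Hypothetical usage: the patch breaks "staging", so "prod" is never touched.
patched: List[str] = []
result = staged_rollout(
    ["dev", "staging", "prod"],
    apply_patch=patched.append,
    is_healthy=lambda stage: stage != "staging",
)
```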

Windows, I've discovered, has auto updates turned off. The servers are not in
config management, or monitored.

It's not because patching is hard, it's because it's not seen as important,
despite being repeatedly hit with cryptolocker malware.

It's just utterly pathetic.

~~~
equalunique
Any insight on how this is going with Solaris? The boot environment feature
seems like it should ease the uncertainty of patching a bit.

~~~
KaiserPro
It's not new Solaris, so I have no idea what the mechanism is; we just run the
patch and reboot...

~~~
equalunique
Ah. I believe it's Solaris 11+ where this is best supported.

------
spydum
He is right, but patching is a single layer in your defenses. Where was
network segmentation? What about IPS/IDS?

You would hope one layer of failure doesn't take out an entire business.

~~~
scottLobster
The issue is non-technical people don't understand what those are and care
even less what some IT nerd is complaining about. The more intelligent ones
with more sociable IT staff might even be brought to agree in principle, but
talk about taking the network offline for serious upgrades (and the money to
buy equipment/software/extra staff for said upgrades) and the door slams shut.

On a micro-scale, I got my sister (a political science major now working for
the UN) an SSD for her aging HD-based MacBook two years ago and offered to
install it for her, do a complete system transfer. She was all gung-ho until I
said I'd need to take her laptop offline for a few hours to do the transfer.
She'd rather deal with Firefox taking 30 seconds to load, and waiting minutes
to switch tabs, instead of taking a few hours on a weekend for an upgrade. She
simply thinks she can't be out of contact with her coworkers for that period
of time.

Sadly I think she's more representative of the general population than anyone
here is.

------
AdamN
If you upgrade early and often, life will be tough but consistently so. If the
operation is big, roll the patches. If you know there's a problem ... it
should never take 3 months to fix. If it takes you 3 months to deploy a
software update you have 2010 procedures in 2017.

~~~
discreditable
> Upgrade early and often.

This is my motto. I have the suspicion that Microsoft and other companies
don't QA older software as hard as their latest & greatest.

Some folks say I'm crazy but I auto-approve all security updates in WSUS. In 5
years at my company, updates have only broken important software twice. In
both cases I just uninstalled the updates with WSUS and everything was okay.
In today's environment, I feel that the risk of not patching is greater than
the risk of patching—even when you consider buggy software.

------
rbc
I think the problem is more that budgets don't account for enough
integration testing, packaging, or automated patching to support enterprise
environments.

There was a time when there were fewer threats to Internet based
infrastructures and skipping patches didn't matter so much. Those times have
passed and there are now significant threats arrayed against Internet systems.
Additionally, commerce and government services are now predominately Internet
based, putting far more at risk than before.

It's time to start putting more money into fixing the patching problem. I'm
really tired of fixing broken systems that I was trying to patch. No good deed
goes unpunished...

------
stonogo
Ransomware is crippling hospitals. People's lives are on the line. And the
tech community is in a frenzy of excuses, whining, and hypothetical bullshit
about shipping labels. Sagely pointing out to each other that hospitals aren't
tech companies, like only tech companies know how to use computers.

Sometimes this industry disgusts me. "X is hard" -- what the fuck is your
profession? Easy shit? Fine, step aside.

We have let our civilization down. Whining that X is hard is not going to fix
anything. Take the week off and put in some pro bono consulting time with any
nearby organization that got hit. Make things better. Fuck your blog posts.

~~~
dasil003
Sorry, I didn't catch where your volunteer shifts are happening?

In the very first sentence you shame commentators for presuming to be more
competent than hospital staff, then you go on to suggest riding in on a silver
steed to bless them with your powerful expertise.

Frankly, a bunch of startup hackers showing up and trying to play hero is not
going to be any more effective than this blog post. This doesn't get solved
with arrogant cowboy antics, and it doesn't get solved in a week even by
seasoned experts. You have no knowledge of the ecosystem of devices operating
on their network, and the constellation of concerns they must balance, and
thus you can't offer anything but the most general of advice with which their
IT staff is certainly already familiar.

If you really want to make a difference, go apply for a job there and put in a
few years of work—that is likely to have impact. Short of that, there's worse
things you can do than write a blog post; at least a few of them are probably
providing useful perspectives to those actually tasked with solving this
problem long-term.

~~~
stonogo
My shifts are happening in two elementary schools, three urgent care clinics,
and a local nonprofit's office. Why did you think I was speaking
hypothetically? I've already taken the next two weeks off. When I'm done with
this batch I've got more to do.

Your entire second paragraph is bullshit. The affected are _being_ affected
because their existing technology deployments are broken. This isn't just some
nightmare that happens to everyone, _and_ it isn't some abstract structural
issue that only affects large organizations. There are a lot of groups getting
screwed here. They need help. If they didn't need help, they wouldn't have got
hit. I am helping.

You can sit by and armchair-quarterback the incident response, that's fine. We
don't need you anyway. Useful perspectives can go pound sand. There is work to
be done.

~~~
dasil003
The OP is talking about big orgs, so it's kind of a dick move to hijack the
conversation and proclaim your agenda to be morally superior. You don't have
to put other people down to make your point. It's childish, counter-
productive, and you leave yourself open to criticism that your work is also
The Wrong Thing because you're not volunteering for some greater cause like
helping sick and dying people in developing nations. But I guess if a desire
to be better than everyone else motivates you to do some good then I guess
it's not a total loss?

------
joncampbelldev
"Patching is Hard", yes but the brick wall of a massive cyberattack is harder.
It's not an excuse. Not patching is a bet: it's people saying "This is
difficult so I'm going to skip it, in exchange for increasing the chance of a
huge amount of difficulty in the future"

I bet all the hospitals are really glad their managers/IT staff avoided the
difficulty and small uptime impact of patching now ...

Hopefully budgets will be allocated in the NHS at least to prevent future
incidents like this.

------
rietta
I would say that patching is maddeningly difficult in the absence of automated
end-to-end tests. It's much easier with those in place.

~~~
akkartik
Absolutely. And for a flaw in your OS you need an OS with automated end-to-end
tests. Good luck!

(Speaking as someone who's trying to imagine OSs with automated end-to-end
tests: [https://github.com/akkartik/mu](https://github.com/akkartik/mu))

~~~
rkeene2
I have automated end-to-end OS testing. It's basically a small bash script
which spins up a number of VMs in various network configurations (since my OS
is only useful in a cluster, as it provides Ceph and a cloud API to KVM). One
of the fully automated tests is even an upgrade test: the previous release is
installed, the currently built version is constructed into a patch and
installed, the test verifies that all the VMs it started pre-upgrade are
still running, and then it builds some more VMs (these are nested VMs, since
the test systems themselves are VMs). It's pretty simple: it starts by
installing the ISO and driving the systems over their console.
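
The upgrade test described above can be outlined roughly as follows. This is a sketch under heavy assumptions: `FakeCluster` is an in-memory stand-in invented for illustration (the real setup drives VMs from a bash script), and all method names are hypothetical.

```python
from typing import Iterable, Set


class FakeCluster:
    """In-memory stand-in for a cluster installed from the previous release."""

    def __init__(self, version: str) -> None:
        self.version = version
        self._vms: Set[str] = set()

    def start_vm(self, name: str) -> None:
        self._vms.add(name)

    def apply_patch(self, new_version: str) -> None:
        # A real upgrade could kill VMs; this fake keeps them running.
        self.version = new_version

    def running_vms(self) -> Set[str]:
        return set(self._vms)


def upgrade_test(cluster, new_version: str,
                 pre: Iterable[str], post: Iterable[str]) -> bool:
    """Start VMs, upgrade, check they survived, then start more VMs."""
    for name in pre:
        cluster.start_vm(name)
    before = cluster.running_vms()
    cluster.apply_patch(new_version)
    if before - cluster.running_vms():
        return False  # at least one pre-upgrade VM did not survive
    post = list(post)
    for name in post:
        cluster.start_vm(name)
    return set(post) <= cluster.running_vms()
```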

------
RachelF
Microsoft used to do decent testing of their updates.

In the last two years, they seem to be saving money by letting their users do
the testing.

When a patch comes out we need to weigh the unpatched security risk vs the
risk of the patch breaking things.

~~~
cryptarch
Microsoft, training the next generation of operations teams to use staging
environments.

------
tempodox
Nice to see a sober reaction to all the victim blaming that's been going on.
Reality is eventually more complex than some commenters would like to
acknowledge, even on HN.

------
based2
[http://marc.info/?l=patchmanagement](http://marc.info/?l=patchmanagement)

------
jimmcslim
It's certainly an argument in favour of adopting PaaS wherever possible, but
then you're making the assumption that the people running your PaaS are
competent...

~~~
dredmorbius
Operational competency _is_ something which a sufficiently well-funded and
-capitalised organisation can accomplish.

Though there are systemic risks involved here as well.

I'm quite torn, myself.

