
Meltdown Update Kernel doesn't boot - sz4kerto
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742323
======
hansendc
This arch/x86/events/intel/ds.c fix is unlikely to have rendered too many
things unbootable. I missed ds.c entirely when doing the original
implementation. I can't fault the nice folks at Canonical for mis-merging a
tiny hunk like this. It really only affects pretty specific hardware anyway.

~~~
MBCook
What hardware does it affect?

~~~
JackNichelson
Pretty specific one.

~~~
ASalazarMX
Take my hand, I'll guide you back to Reddit.

------
gkya
Would this entire Meltdown/Spectre thing count as the biggest mess-up of
computing history? When yesterday the PoC repo was posted here, the spy.mp4
demo video gave me some chills. And now I can't update my OS before making an
installation USB because Canonical can't just follow Linus' releases. Thanks.

~~~
bArray
>Would this entire Meltdown/Spectre thing count as the biggest mess-up of
computing history? When yesterday the PoC repo was posted here, the spy.mp4
demo video gave me some chills.

It must be up there amongst the greats, probably with the "halt and catch
fire" op code. Normally they just patch this stuff with microcode and never
really tell anybody, this time that won't work.

I'm not entirely convinced it was a mistake at all (dons tin foil hat). Intel
has been making suspicious design decisions in its chips for a while now
(think this, Intel ME, hidden instructions, etc). It seems clear to me that
this security by obscurity approach is quite frankly crap.

>And now I can't update my OS before making an installation USB because
Canonical can't just follow Linus' releases. Thanks.

Linus' releases need some sort of buffering before they can be considered
stable, and distributions often apply their own patches on top. Also consider
the scenario where Linus releases a bad kernel and no testing has been
performed before it rolls out to all Linux users.

~~~
throwawayfinal
I think it's absolutely unreasonable to imply that this was intentional.
Besides the massive amount of complexity these systems have, there are plenty
of "legitimate" places to hide backdoors, instead of in a performance
architecture decision.

Keep in mind that whatever "evil agencies" would have asked for this would
most likely find themselves vulnerable too, and nobody would sign off on that.

I do agree, however, the "security by obscurity approach is quite frankly
crap". The fact that even large corporations (not the big 5) can't even get
away from ME speaks volumes about why this is a bad idea. Facebook isn't the
only company with important data.

~~~
Animats
The Intel Management Engine is a backdoor. Speed variations in speculative
execution are an inherent property of the technology. Until recently, few
people thought this was exploitable, and it took a lot of work to figure out
how to exploit this.

~~~
andrewflnr
You do realize those are ideal properties for a backdoor, don't you? If you
were writing the spec for a dream backdoor, you would write that down. The
only way you could improve it would be "everyone thinks it's impossible, and
they never figure it out."

~~~
jijji
the ideal properties of a backdoor were visualized to me the day i hacked into
an author of a largely distributed piece of smtp mail server, only to find
sitting in his home directory an unpublished integer overflow exploit written
by him years earlier for a version of the software that is currently in wide
distribution...

~~~
dvfjsdhgfv
That's close to perfect, indeed. The drawbacks in this scenario are that (1)
not everybody runs an SMTP server, (2) if it's open source (and if it's very
popular, then it is), some other smart people will look for the bug and
publish it for fame. That's quite different from a backdoor built into a
processor (although I really doubt Intel was really involved in any shady
practices, it looks like they were not smart enough).

~~~
paulie_a
Judging from the numerous decades-old bugs recently found, the concept of many
eyes needs to die.

And in the case of SMTP, it's basically been a piñata of bugs for the last 30
years, regardless of platform.

------
trendia
(Copying my instructions from another post).

If kernel 4.4 doesn't work, I recommend compiling the 4.15 kernel. (Note,
however, that you may need to apply a patch to NVIDIA drivers).

I've done this on Ubuntu 16.04 LTS, 17.10, and Debian 8 so far this week. To
compile, set CONFIG_PAGE_TABLE_ISOLATION=y. That is:

    
    
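        sudo apt-get build-dep linux
        sudo apt-get install gcc-6-plugin-dev libelf-dev libncurses5-dev
        cd /usr/src
        wget https://git.kernel.org/torvalds/t/linux-4.15-rc7.tar.gz
        tar -xvf linux-4.15-rc7.tar.gz
        cd linux-4.15-rc7
        # start from the running kernel's config
        cp "/boot/config-$(uname -r)" .config
        # enable PTI in .config, then accept defaults for options new in 4.15
        scripts/config --enable PAGE_TABLE_ISOLATION
        make olddefconfig
        make -j"$(nproc)" deb-pkg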
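
Before kicking off a long build, it may be worth double-checking that the
option really ended up in .config. A minimal sketch, assuming the usual
`CONFIG_FOO=y` line format; the helper name `config_has_pti` is made up for
illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): succeed only if the
# given kernel config file contains the line CONFIG_PAGE_TABLE_ISOLATION=y.
config_has_pti() {
    grep -q '^CONFIG_PAGE_TABLE_ISOLATION=y$' "$1"
}
```

From inside the source directory, `config_has_pti .config && echo "PTI enabled"`
should print the message if the option is set.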

~~~
lathiat
I wouldn't really recommend doing this, but if you really want to do this, it
would probably be easier just to use the pre-spun mainline kernels:
[https://wiki.ubuntu.com/Kernel/MainlineBuilds](https://wiki.ubuntu.com/Kernel/MainlineBuilds)

~~~
gphreak
Exactly, and use 4.14. According to a recent comment from a kernel dev both
kernels use the same patch approach. 4.4 and 4.9 are using a different
approach that’s less ideal, less complete and apparently less tested.

------
nykolasz
Waiting a few days to patch my own servers... Not sure what is more dangerous
right now: applying these rushed patches or the vuln itself.

~~~
snuxoll
What distro are you running? I trust Red Hat to get kernel updates right the
first time. I just patched externally facing servers and systems that handle
PHI tonight with no issues (outside of one of my PostgreSQL servers showing a
non-negligible increase in CPU usage, damnit Intel).

Of course, I also go into any updates with a rollback plan. ITIL sucks, but
one thing it taught me was the value of well documented plans any time you
make changes to production systems.

~~~
gphreak
Same. RHEL/CentOS went without a hitch. The age of the kernel starts to
concern me, though.

According to the top comment in one of the posts on HN, even 4.9 and 4.4 use a
less ideal patch:
[https://news.ycombinator.com/item?id=16085672](https://news.ycombinator.com/item?id=16085672)

I can’t really judge how much RH engineers are capable of fixing that kind of
stuff in a kernel that’s officially out of support upstream.

Based on the general quality of RHEL/RHV I trust them to do the right thing,
but I have no insight whatsoever in how kernel development actually works.

~~~
snuxoll
Red Hat pays the salary of a couple kernel developers, backporting security
fixes is a pretty big part of their job. Keep in mind, RHEL/CentOS 7 doesn't
even use something newer like 4.4 - it's still on 3.10 because Red Hat
guarantees a stable kABI throughout the lifetime of a release

------
lunorian
See this is why you wait a day or two before patching :)

~~~
Whitestrake
If everyone waited a day or two before patching, this bug would simply be
opened a day or two later than it was.

~~~
snuxoll
How hard is it to just boot an older kernel and roll back the default? Before I
even thought about patching sensitive systems tonight the first thing our IT
director asked was if I had a rollback plan. The answer? "Yes, boot old
kernel, yum history undo [transaction id], reboot".

Always have a backout plan when doing upgrades, I'm just glad EL and derived
distributions have an easy way to do it with yum's transaction history.

------
noncoml
What was the problem?

All the notes say is that 109 fixes it.

~~~
jonathonf
[https://launchpad.net/ubuntu/+source/linux/4.4.0-109.132](https://launchpad.net/ubuntu/+source/linux/4.4.0-109.132)
:

    
    
        linux (4.4.0-109.132) xenial; urgency=low
        
          * linux: 4.4.0-109.132 -proposed tracker (LP: #1742252)
        
          * Kernel trace with xenial 4.4  (4.4.0-108.131, Candidate kernels for PTI fix)
            (LP: #1741934)
            - SAUCE: kaiser: fix perf crashes - fix to original commit
    

diff'ing the two changes it was this:

    
    
        > diff -u linux-4.4.0/arch/x86/events/intel/ds.c linux-4.4.0/arch/x86/events/intel/ds.c
        > --- linux-4.4.0/arch/x86/events/intel/ds.c
        > +++ linux-4.4.0/arch/x86/events/intel/ds.c
        > @@ -415,7 +415,6 @@
        >  		return;
        >  
        >  	per_cpu(cpu_hw_events, cpu).ds = NULL;
        > -	kfree(ds);
        >  }
        >  
        >  void release_ds_buffers(void)
    

plus it's here:
[https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1741934...](https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1741934/comments/17)
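
If you want to script a check for whether a box is running the 108 build, a
rough sketch could look like the one below. It assumes Ubuntu-style release
strings such as `4.4.0-108-generic`, and the function name `is_build_108` is
invented for this example. A match only means the machine is on that build,
not that it will actually fail to boot, since the bug reportedly hits only
specific hardware.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given release string (for example the
# output of `uname -r`) belongs to the 4.4.0-108 build discussed here.
is_build_108() {
    case "$1" in
        4.4.0-108-*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

Usage would be something like
`is_build_108 "$(uname -r)" && echo "on 4.4.0-108, 109 has the fix"`.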

~~~
AdmiralAsshat
They consider a bug that renders the OS unable to boot a "low" urgency!?!

~~~
lunorian
Sorry fam - security issues are more important ¯\_(ツ)_/¯

~~~
arcticbull
I think it's fair to say build 108 has no security issues :P in fact, it's the
most secure one yet.

~~~
dingo_bat
Yup. It's so secure that they don't even load userspace into RAM!

------
agumonkey
Any reports of Windows Update messing with the BIOS and rendering the
motherboard non-bootable (powers on, but no POST, not even an error beep)?

~~~
otakucode
For AMD chips, yes. Yesterday Microsoft announced they were suspending rolling
out updates to certain AMD chips because it was resulting in non-bootable
systems. I didn't read the technical details so I can not say whether it was
specifically BIOS-related. Both a total non-booting state and BSODs were
mentioned in the article I saw (from general press, so might have been
garbage, sorry).

~~~
rincebrain
Microsoft's claim was "Microsoft has determined that some AMD chipsets do not
conform to the documentation previously provided to Microsoft to develop the
Windows operating system mitigations to protect against the chipset
vulnerabilities known as Spectre and Meltdown"[1], and their docs simply
suggest asking AMD for more details[2].

So it sounds like it was probably specific chipsets and not CPUs, but who
knows.

[1] - [https://www.engadget.com/2018/01/09/microsoft-halts-meltdown...](https://www.engadget.com/2018/01/09/microsoft-halts-meltdown-spectre-amd-patches/)

[2] - [https://support.microsoft.com/en-us/help/4056892/windows-10-...](https://support.microsoft.com/en-us/help/4056892/windows-10-update-kb4056892)

~~~
cpncrunch
It happened to one of my HP boxes that has an AMD chipset. After installing
the update, windows 10 just hangs at the blue windows logo. Only solution is
to turn the machine off and on twice, which then results in it undoing the
update.

The Microsoft link you provide says "Microsoft is working with AMD to resolve
this issue", so they're not just brushing it off and telling customers to
contact AMD.

------
krutzger
I was under the impression that Ubuntu would automatically revert to the last
good kernel if the new one fails to boot. Was I mistaken?

~~~
bproven
AFAIK, worst case you should be able to select the previous kernel in grub2
upon reboot. It just hangs after grub on this (bad) kernel I think...

~~~
ams6110
Yeah but if you have a couple of hundred machines that aren't booting....
That's pretty worst case.

~~~
jlgaddis
That is pretty worst case. OTOH, it teaches the lesson about testing updates
on a small number of hosts before rolling 'em out globally.

One has to learn that lesson at some point.
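
As a rough illustration of the "small number of hosts first" idea, here is a
sketch that carves a canary group out of a host list. It assumes a plain text
file with one hostname per line; the helper name `canary_hosts` and the file
layout are assumptions for the example, not any particular tool's convention.

```shell
#!/bin/sh
# Hypothetical helper: canary_hosts FILE PERCENT
# Prints the first PERCENT% of the non-empty lines in FILE (rounded up,
# and never fewer than one host) to use as the canary group.
canary_hosts() {
    total=$(grep -c . "$1")
    count=$(( (total * $2 + 99) / 100 ))
    [ "$count" -lt 1 ] && count=1
    grep . "$1" | head -n "$count"
}
```

Patch and reboot whatever `canary_hosts hosts.txt 5` prints, watch them for a
while, and only then move on to the rest of the list.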

------
nkkollaw
This is a little bit sensationalist (is that a word?).

It's not like Windows that bricks your laptop. It's a handful of hardware
configs, and you can easily boot with an older kernel.

------
nabilt
I just updated my Dell, but haven't restarted. Do we know how widespread the
problem is and should I roll back the update?

~~~
bArray
I would suspend your machine until we find out more, I have the same problem.

~~~
Nacraile
You all do realize that (at least by default) ubuntu will keep old kernel
versions around, and you can choose to boot them in GRUB, don't you?

This is certainly a pain, but it's hardly the first time a broken kernel has
shipped. Reasonable recovery mechanisms are in place.
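
If you are not sure which entry to pick, a small sketch like this can name the
newest installed kernel that is older than the broken one. It assumes GNU
`sort -V` is available and that kernels live in /boot as `vmlinuz-<version>`;
the function name `previous_kernel` is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: previous_kernel BAD_VERSION
# Reads kernel version strings on stdin (one per line) and prints the newest
# one that sorts below BAD_VERSION, or nothing if there is no older kernel.
previous_kernel() {
    bad="$1"
    # Add the bad version to the list so it always has a position in the
    # sorted output, then print whatever line comes right before it.
    { cat; printf '%s\n' "$bad"; } | sort -V -u |
        awk -v bad="$bad" '$0 == bad { exit }
                           { prev = $0 }
                           END { if (prev != "") print prev }'
}

# Example (not run automatically):
#   for f in /boot/vmlinuz-*; do echo "${f#/boot/vmlinuz-}"; done |
#       previous_kernel 4.4.0-108-generic
```

The printed version is the one to look for under "Advanced options" in the
GRUB menu.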

------
mycpuorg
Please look under "Meltdown - x86" section in GKH's (The Stable Kernel
Maintainer) blog: [http://www.kroah.com/log/](http://www.kroah.com/log/)

------
del_operator
Sounds like one way to stop the bug. :P

------
lurr
So do we make snide remarks about fixes not being tested like we did when
microsoft also had issues fixing CPU level bugs?

------
user5994461
Did ubuntu botched an update again? Or is it the upstream kernel?

~~~
eecc
Has Ubutu botched or did Ubutu botch... please ;)

~~~
ksenzee
Muphry's Law strikes again!

~~~
jwilk
[https://en.wikipedia.org/wiki/Muphry%27s_law](https://en.wikipedia.org/wiki/Muphry%27s_law)

 _If you write anything criticizing editing or proofreading, there will be a
fault of some kind in what you have written._

------
ask098
It's the fault of Intel. Why don't they recall all the CPUs, just like a
vehicle company would?

~~~
JonRB
Doesn't this affect all of their CPUs going a long way back? And how do you
recall embedded or laptop CPUs, which are often soldered in-place?

A recall would be great, but there's no way they'd be able to do it. Vehicle
recalls are a bit different because they impact physical safety. Digital
safety doesn't get the same priority.

~~~
em3rgent0rdr
even if it affects all speculative CPUs, if this happened in the car world,
all the cars would be recalled. Not saying that is practical in computer
world...just continuing with the analogy.

Spectre/Meltdown is a wakeup call for many things, one of them probably being
for computer manufacturers to not solder the CPU to the Motherboard and for
the x86 world to stick with a standard socket, to facilitate replacing parts.

~~~
morganvachon
> _" Spectre/Meltdown is a wakeup call for many things, one of them probably
> being for computer manufacturers to not solder the CPU to the Motherboard"_

Good luck with that. A large portion of affected CPUs/SoCs are in mobile
devices and ultrabooks. Socketed chips simply won't fly in those kinds of
devices.

~~~
flukus
Then the whole device should be replaced, that's the price they pay for their
design decisions. Being "too hard" doesn't absolve you of your responsibility
to consumers.

------
jnordwick
I will not be updating. I have yet to see this mythic JavaScript exploit, and
I see too many other ways I, as an end user, can be affected.

I haven't even seen a proof of concept exploit that has the same conditions as
in the wild. All the PoC exploits seem to have been given some assistance in
various ways (such as being given root perms or a preknown memory address).

Does anybody have an example of this JavaScript exploit or any exploit that
would work in the wild?

