Meltdown Update Kernel doesn't boot (launchpad.net)
284 points by sz4kerto 10 days ago | 197 comments

This arch/x86/events/intel/ds.c fix is unlikely to have rendered too many things unbootable. I missed ds.c entirely when doing the original implementation. I can't fault the nice folks at Canonical for mis-merging a tiny hunk like this. It really only affects pretty specific hardware anyway.

What hardware does it affect?

From the bug it looks like a smattering of Xeons, Celerons, i5s, not sure what's "specific" about it from first glance.

Pretty specific one.

Take my hand, I'll guide you back to Reddit.

Would this entire Meltdown/Spectre thing count as the biggest mess-up of computing history? When yesterday the PoC repo was posted here, the spy.mp4 demo video gave me some chills. And now I can't update my OS before making an installation USB because Canonical can't just follow Linus' releases. Thanks.

>Would this entire Meltdown/Spectre thing count as the biggest mess-up of computing history? When yesterday the PoC repo was posted here, the spy.mp4 demo video gave me some chills.

It must be up there amongst the greats, probably with the "halt and catch fire" opcode. Normally they just patch this stuff with microcode and never really tell anybody; this time that won't work.

I'm not entirely convinced it was a mistake at all (dons tin foil hat); Intel have been making suspicious design decisions in their chips for a while now (think this, Intel ME, hidden instructions, etc). It seems clear to me that this security-by-obscurity approach is quite frankly crap.

>And now I can't update my OS before making an installation USB because Canonical can't just follow Linus' releases. Thanks.

Linus' releases need some sort of buffering before they can be considered stable, often distributions will apply their own patches on top. Also consider the scenario where Linus releases a bad kernel and no testing has been performed before rolling out to all Linux users.

I think it's absolutely unreasonable to imply that this was intentional. Besides the massive amount of complexity these systems have, there are plenty of "legitimate" places to hide backdoors, instead of in a performance architecture decision.

Keep in mind that whatever "evil agencies" would have asked for this would most likely find themselves vulnerable to it as well, and nobody would sign off on that.

I do agree, however, the "security by obscurity approach is quite frankly crap". The fact that even large corporations (not the big 5) can't even get away from ME speaks volumes about why this is a bad idea. Facebook isn't the only company with important data.

> I think it's absolutely unreasonable to imply that this was intentional.

Amen. It blows my mind that some people think clever techniques like speculative or out-of-order execution must've somehow had nefarious intentions behind them. Come on HN...

The Intel Management Engine is a backdoor. Speed variations in speculative execution are an inherent property of the technology. Until recently, few people thought this was exploitable, and it took a lot of work to figure out how to exploit this.

You do realize those are ideal properties for a backdoor, don't you? If you were writing the spec for a dream backdoor, you would write that down. The only way you could improve it would be "everyone thinks it's impossible, and they never figure it out."

This backdoor is too tricky to be a backdoor. A simpler backdoor would be "Call this opcode 45 times, followed by another opcode 20 times, and you will have activated backdoor mode where these opcodes are now available"...

The ideal properties of a backdoor were illustrated for me the day I hacked into the author of a widely distributed SMTP mail server, only to find, sitting in his home directory, an unpublished integer overflow exploit he had written years earlier for a version of the software that is currently in wide distribution...

That's close to perfect, indeed. The drawbacks in this scenario are that (1) not everybody runs an SMTP server, and (2) if it's open source (and if it's very popular, then it is), some other smart people will look for the bug and publish it for fame. That's quite different from a backdoor built into a processor (although I really doubt Intel was involved in any shady practices; it looks like they were just not smart enough).

Judging from the numerous decades-old bugs recently found, the concept of many eyes needs to die.

And in the case of SMTP, it's basically been a piñata of bugs for the last 30 years regardless of platform.

it is still way more likely a reasonable design decision for performance reasons than it is for a backdoor.

The risk alone would not be worth it to Intel. Do you really think the NSA has enough money to compensate for this backlash and news coverage?

Yes, though it's moderately hard to exploit against a specific target. It's more useful for bulk attacks - getting everyone who visits a specific web site to run a DDOS attack, or ransomware.

If any quantity about what the processor does, outside the intended effect, has a different distribution when X happens versus Y, then the distribution of that quantity is exploitable. Period.

Any nonuniform distribution in any quantity that is not part of the spec is exploitable!

It is only exploitable if one can measure the difference and extract useful information. Until the Spectre authors discovered the double-read technique, the expectation was that speculative execution did not allow extracting useful information outside of extremely artificial theoretical cases.

Adding a backdoor seems unreasonable but they may have chosen performance over security. Even if this individual bug wasn't intentional they are responsible for setting their priorities.

There are CPUs available which choose security over performance. They aren't made by Intel, but you can buy them, and they're even cheaper.

Oh, you don't want to do that?

Well, I read somewhere the other day that this form of error/attack was conceived of in the academic literature back in 1992. I won’t believe it’s intentional without evidence in that direction, but this is conceivably the kind of obscure/complex attack you’d expect of a state actor.

This has been a known issue in xbox 360 hardware since about 2010.

It just keeps popping up, someone finally thought to weaponize it.

>It just keeps popping up, someone finally thought to weaponize it.

Someone published its weaponization, you mean :)

Those undocumented features & byte code? HAP mode - something the NSA doesn't want you to know exists, but that they had put into Intel ME from Skylake onward.

But yet and still we found out. So yes, this security through obscurity approach is terrible (with a code embargo being the obvious exception).

They only update microcode when they have to. When doing otherwise risks... Well, this kind of mess.

You don't wanna know how many times I've rebuilt my Gentoo system chasing after retpoline kernel & gcc builds that just... break everything.

It should be interesting to see how it all develops

Yeah, ME is a scary thing also. WRT Linux, well, my Xubuntu 16.04 (Xenial) is on 4.10 and no new kernels are available to me ATM. So if they're going to patch my OS, that's probably going to be a backport to that version, not the latest release integrated into my OS version. I guess that's what caused this bug too, although I admit I only skimmed the conversation linked.

They've put out updates for 4.4 and 4.13 (HWE) for 16.04, if that helps.

See https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/SpectreAn...

I'm guessing that updates for LTS kernel will come later. I don't know if I can update to 4.13.

LTS for xenial is 4.4, and patches have already been released for it.

Note there won't be fixes for 4.10, as it's reached EOL for Ubuntu, so you'll need to move to the 4.13 patched kernel.

I think that title is currently held (deserved or not) by null pointers.

Tbh it’s not the most meaningful of statements, but it’s food for thought.

A null pointer doesn't hold a reference, though.

It does if you have something at 0x0.

Or, to put it another way, I have no clue to what you're referring--what do references have to do with "The Billion Dollar Mistake"[0]?

[0]: https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retra...

EDIT: my apologies, that joke was actually pretty good.

I think it was supposed to be a joke.

Yes, thank you.

I think it was supposed to be a pun on "hold". As for the word "reference", your own link uses it.

I use it in a later comment! I was confused about the word in context.

However, I completely missed the pun. Cheers :)

Cheers to you :)

What modern systems even map memory to 0x0? Doing so breaks the C standard, among other things.


> On system reset, the vector table is fixed at address 0x00000000.

Also, I'm not an expert on the C standard, but in my understanding, it doesn't "break" it. That is:

* Address 0 and the null pointer are distinct

* A 0 literal treated as a pointer is guaranteed to be the null pointer

* The null pointer is not guaranteed to be represented by all zero bits

* If you get a pointer to address zero via pointer math or by other means than a 0 literal, you can still access address zero.

Yeah, the NULL pointer is a pretty weird part of the standard - it makes some sense, but leads to strange situations. That said, I think your last point needs a bit of clarification. What you've described is actually already impossible per the standard - with a few exceptions, it is illegal to use pointer arithmetic to address past the size of an allocated object (because those pointer values may not even be valid for the architecture), so it is technically impossible to use pointer arithmetic on a valid pointer to end up with the NULL pointer - it would require calculating an address outside of the current object.

So the question of what happens when you actually do that is purely up to your compiler and architecture. In most cases, if you manage to get the NULL pointer value through pointer arithmetic, it will still compare equal to the 'actual' NULL pointer and be treated as if it were a literal 0, so that doesn't allow you to get around NULL pointer checks. The only situation where it really matters is when the NULL value is only known at runtime, since that may have implications for optimizations. Since dereferencing the NULL pointer is undefined behavior, the compiler can remove such dereferences, but it can't remove a dereference completely if it can't prove the pointer is always NULL. There is nothing preventing the compiler from adding extra NULL checks that aren't in your code, however, which would foil the plan of generating a NULL pointer at runtime to dereference it. So unless your compiler explicitly allows otherwise, you cannot reliably access the memory located at the value of the NULL pointer - as far as the standard is concerned, there is no such thing.

Talking specifically about the ARM vector table, that largely works OK because only the CPU ever has to actually read that structure; normally your C code won't have to touch it (if you even define it in your C code - the example ARM programs define the vector table in assembly instead). If you did ever have a reason to read the first entry of that table from C, though, you could potentially run into issues (though I would consider it unlikely, since the location of the vector table isn't decided until link time, at which point your code is already compiled).

On that note, it's worth adding that POSIX requires NULL to be represented by all zero bits, which is useful. Lots/most programs actually rely on this behavior, since it is pretty ubiquitous to use `memset` to clear structures, and that only writes zero bits.

(Sorry for the long comment, I've just always found this particular part of the standard to be very interesting)

Oh not at all! Thanks for this; I also find it very interesting, and was glad for the correction.

Again, I am unsure how this relates to "The Billion Dollar Mistake" I linked above and was referring to.

I am not sure I see your point. The reason modern systems don't map memory to 0x0 is because NULL pointers exist. It is a reflection of a leaky abstraction equating pointers to references. That leaky abstraction has (or so the argument goes) caused >$1B in software bugs.

The other mindset would be "malloc always has to allocate memory or otherwise indicate failure; you cannot cast from an integer to a pointer; you cannot perform arithmetic on a pointer to get a new one; you must demonstrate there are no hanging references when freeing". This is essentially what Rust did for safe code.

The reason why I indicate so much skepticism is that Rust is the first time I've seen the problem solved well in the same problem space as C. Ada has problems of its own. It's more about how small assumptions can have massive economic (and health, and safety, and ethical) consequences. Certainly comparable to a speculative execution bug leaking memory in an unprotected fashion--in both cases the bugs find their way through human error in evaluating enormously complex systems for incorrect assumptions :)

WebAssembly.

LLVM won't use it for anything (I think it starts putting things at 8). Trying to access it explicitly in C will generate `unreachable` instructions.

"biggest" by number of affected CPUs? very possibly, yes. the march of time has that effect: there are more cpus potentially and actually affected, worldwide, than at any other time in history.

"biggest" by net financial loss to a single entity? I dunno. How much did that failed NSA launch cost the state again?

It's unclear whether it failed or if Northrop Grumman want us to think it failed; since the second stage actually did one full orbit with nominal performance, they might be trying to slip one past us. We'll know in a few weeks time I suppose, every satellite tracking enthusiast will be looking for it.

> failed NSA launch

I didn't hear about that one

To which no government agency is officially attached and the failed thing is more a rumor. SpaceX said that on Falcon 9 side everything worked like it should and NG says they cannot comment on classified payloads. So there's literally no information.

I watched that spy demo. How does it know what memory location contains the password being typed?

I'm assuming if you know either common byte patterns or string patterns you might be able to figure out where the password string is being allocated and watch that area of memory for changes.

Not sure if Meltdown is the same, but I read that Spectre can recover memory at about 10kb/sec. So it wouldn't be very efficient to scan the entire memory for a known pattern.

I suppose if there was an exploit targeted at a specific program, it would be possible to work out what location the secrets are stored in?

I leave my machine on for weeks at a time. If something was scanning the memory even if it failed to find the location of my password 99.9% before it is erased eventually it will be lucky and get it.

Good point. I was only thinking about a single run but that makes sense.

According to the paper, Meltdown can recover memory at about 500 kb/s

It's still "only" about 1.8 GB/hour. If programs follow reasonable security practices, it shouldn't be possible to stumble upon secrets in memory. This underlines the importance of things such as ASLR, not holding your keys in memory longer than needed, and rotating them as well.

Once you know the location, if the process is not randomized, you can extract from that location. You may assume some things about implementation (e.g. libstdc++ or libc++, glibc memory allocator, general compiler version)

Additionally some hardening methods like stack protector make stack allocated objects stand out a lot from register values.

Meltdown is fast enough to learn everything about layout of data structures in kernel or other programs and then use it to extract information from particular areas holding the keys.

It appears to be known to the exploit. I feel that this is being overblown, and that the exploits we are seeing require more info than something in the wild would have.

Code in the wild would have access to all memory (slowly) so could eventually find the correct location.

Given that whoever writes it would also have access to the other program, they would have a lot more information on where to look in memory.

I would think it would be. The strange thing is the markets didn't react at all. They actually went up on January 4th.

Because this has largely remained theoretical, unlike Maersk or Equifax.

What are you talking about? We've seen working POCs since last week. This isn't "largely theoretical", this is an actively exploitable hole.

Meh, it's not really very serious in the average case. It's a lot of sky-is-falling rhetoric from the infosec community. Remember Heartbleed and how it was end-of-times bad? Yeah, turned out to be a non-event. Information disclosure bugs like this are difficult to glean useful information from in widely targeted attacks.

(Obviously if you have nation states or serious criminal organizations trying to breach you regularly, this is more serious)

You clearly haven't been paying attention or reading about how this works.

Heartbleed was touted as being bad by those that didn't read too far into it. You could scrape memory, sure. But it was always random fragments. This lets you make targeted address attacks. Force a process to use that memory space through a NOOP and now you can start scraping at will. Or you can just do an entire memory dump and pull things out in plaintext (like scraping Firefox passwords, which we've seen done already).

The only reason this isn't worse is it requires the ability to execute code on the machine. It has high (near absolute) impact, but low-to-moderate on the ease of execution.

"Would this entire Meltdown/Spectre thing count as the biggest mess-up of computing history?"

This title is held by autorun.inf which has caused over 20 years of broken, vulnerable behavior and, AFAIK, is still going strong.

Link to the demo?

YouTube mirror for mobile users anyone?

I think Y2K had more practical impact across the business world. There was genuine fear that it could cause an actual apocalypse with all major computerized systems failing, medical machines killing people, banks being affected and all money and debts disappearing overnight.

It wasn't that bad, because people took it seriously. But there were still tons of practical systems affected and billions of corporate dollars associated with fixing it.

So when you say "biggest mess up" you gotta define specific qualifiers. Because Meltdown/Spectre is going to be solved by simply... buying a new CPU. (And retrofitting the old ones.) So it consists of mostly a patch.

A BIG important patch, granted. But it's still just a patch. And some ATMs aren't going to start spewing money like they did on Y2K.

I can't find any reports on ATM's spewing money after Y2K, or was that a figure of speech?

I didn't know Y2K was that big of a deal! I guess I'll have to read a bit more about it, as it seems to be an interesting topic. Thanks!

(Copying my instructions from another post).

If kernel 4.4 doesn't work, I recommend compiling the 4.15 kernel. (Note, however, that you may need to apply a patch to NVIDIA drivers).

I've done this on Ubuntu 16.04 LTS, 17.10, and Debian 8 so far this week. To compile, set CONFIG_PAGE_TABLE_ISOLATION=y. That is:

    sudo apt-get build-dep linux
    sudo apt-get install gcc-6-plugin-dev libelf-dev libncurses5-dev
    cd /usr/src
    wget https://git.kernel.org/torvalds/t/linux-4.15-rc7.tar.gz
    tar -xvf linux-4.15-rc7.tar.gz
    cd linux-4.15-rc7
    cp /boot/config-`uname -r` .config
    # then enable PTI and build (the deb-pkg target assumes a Debian-based system):
    scripts/config --enable PAGE_TABLE_ISOLATION
    make olddefconfig
    make -j"$(nproc)" deb-pkg

I wouldn't really recommend doing this, but if you really want to do this, it would probably be easier just to use the pre-spun mainline kernels: https://wiki.ubuntu.com/Kernel/MainlineBuilds

Exactly, and use 4.14. According to a recent comment from a kernel dev both kernels use the same patch approach. 4.4 and 4.9 are using a different approach that’s less ideal, less complete and apparently less tested.

I can somewhat understand where you're coming from if the person doing it only wants something that works.

But for those who would be willing to risk breaking a few things to try something out, building a kernel is a worthwhile effort, and the meltdown / spectre bugs provide a perfect excuse to do it.

Waiting a few days to patch my own servers... Not sure what is more dangerous right now: applying these rushed patches or the vuln itself.

What distro are you running? I trust Red Hat to get kernel updates right the first time; I just patched externally facing servers and systems that handle PHI tonight with no issues (outside of one of my PostgreSQL servers showing a non-negligible increase in CPU usage, damnit Intel).

Of course, I also go into any updates with a rollback plan. ITIL sucks, but one thing it taught me was the value of well documented plans any time you make changes to production systems.

Same. RHEL/CentOS went without a hitch. The age of the kernel starts to concern me, though.

According to the top comment in one of the posts in HN even 4.9 and 4.4 use a less ideal patch: https://news.ycombinator.com/item?id=16085672

I can’t really judge how much RH engineers are capable of fixing that kind of stuff in a kernel that’s officially out of support upstream.

Based on the general quality of RHEL/RHV I trust them to do the right thing, but I have no insight whatsoever in how kernel development actually works.

Red Hat pays the salary of a couple kernel developers, backporting security fixes is a pretty big part of their job. Keep in mind, RHEL/CentOS 7 doesn't even use something newer like 4.4 - it's still on 3.10 because Red Hat guarantees a stable kABI throughout the lifetime of a release

This seems like one of those things that is very hard to get right: https://forums.aws.amazon.com/thread.jspa?messageID=823033

I'm guessing Xen PV isn't well tested by Red Hat anymore, since most (if not all) of their paying customers still using it (because they're stuck on it) are likely on RHEL5, for which no patch has been released yet for that very reason.

I'm kind of shocked Amazon doesn't have something like Linode's Finnix, but you can always do an EBS snapshot of your /boot volume and revert it if a kernel upgrade breaks stuff.

Unless you are running remote code there is no reason to patch.

This is wrong and bad advice. All you need is a remote code execution vulnerability in PHP or so.

Only don't patch if your server is isolated and not connected to the internet.

RCE is already kind of game over though. If you have RCE on the server, you can probably get to everything interesting without having to go through a slow side channel.

If you have a remote exploit, there are much bigger issues to worry about. And since this is a timing issue, I'm not even sure that would be enough.

See this is why you wait a day or two before patching :)

If everyone waited a day or two before patching, this bug would simply be opened a day or two later than it was.

How hard is it to just boot an older kernel and rollback the default? Before I even thought about patching sensitive systems tonight the first thing our IT director asked was if I had a rollback plan. The answer? "Yes, boot old kernel, yum history undo [transaction id], reboot".

Always have a backout plan when doing upgrades, I'm just glad EL and derived distributions have an easy way to do it with yum's transaction history.

(tinfoil hat) That may be why this kernel boot bug exists--the agencies are just trying to squeeze in a few more days of extracting data from prime targets, conveniently under the guise of public knowledge about exploiting it.

What was the problem?

All the notes say is that 109 fixes it.

https://launchpad.net/ubuntu/+source/linux/4.4.0-109.132 :

    linux (4.4.0-109.132) xenial; urgency=low
      * linux: 4.4.0-109.132 -proposed tracker (LP: #1742252)
      * Kernel trace with xenial 4.4  (4.4.0-108.131, Candidate kernels for PTI fix)
        (LP: #1741934)
        - SAUCE: kaiser: fix perf crashes - fix to original commit
diff'ing the two changes it was this:

    > diff -u linux-4.4.0/arch/x86/events/intel/ds.c linux-4.4.0/arch/x86/events/intel/ds.c
    > --- linux-4.4.0/arch/x86/events/intel/ds.c
    > +++ linux-4.4.0/arch/x86/events/intel/ds.c
    > @@ -415,7 +415,6 @@
    >  		return;
    >  	per_cpu(cpu_hw_events, cpu).ds = NULL;
    > -	kfree(ds);
    >  }
    >  void release_ds_buffers(void)
plus it's here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1741934...

They consider a bug that renders the OS unable to boot a "low" urgency!?!

I don't think the urgency field is meaningful in Ubuntu, especially for post-release fixes - I believe a package migrates as soon as it's approved and passes builds/tests. The urgency field comes from Debian, which uses it to describe how many days a package should sit in Debian unstable before migrating to Debian testing, in the hopes that if it's buggy, people will file a bug before it migrates to testing. The Debian default is now "medium" (5 days) instead of "low" (10 days), but people with older tools tend to generate changelog entries that say "low". (And even in Debian, I don't think the field has any meaning for post-release updates; I think it only applies to unstable-to-testing migrations.)

Given this seems to be affecting a relatively small number of systems, that's not necessarily unreasonable. It might be very urgent for the people affected, but still low urgency for the userbase overall as compared to other problems.

Though it seems more likely to me that this bug was filed as a placeholder for the already-written patch and verification thereof, and the person filing it simply didn't bother with the urgency field since it wasn't really a bug-report-as-such.

Sorry fam - security issues are more important ¯\_(ツ)_/¯

I think it's fair to say build 108 has no security issues :P in fact, it's the most secure one yet.

Yup. It's so secure that they don't even load userspace into RAM!

The last time I used Ubuntu, they hadn't yet realized what vertical display synchronization is, and nobody had explained to them that you don't do your rendering on the framebuffer you're currently scanning out. So an occasional boot of the recovery kernel truly vanishes behind the plain broken display they expect you to put up with.

Any reports of a Windows update messing with the BIOS and rendering the motherboard non-bootable (powers on, but no POST, not even an error beep)?

For AMD chips, yes. Yesterday Microsoft announced they were suspending rolling out updates to certain AMD chips because it was resulting in non-bootable systems. I didn't read the technical details so I can not say whether it was specifically BIOS-related. Both a total non-booting state and BSODs were mentioned in the article I saw (from general press, so might have been garbage, sorry).

Microsoft's claim was "Microsoft has determined that some AMD chipsets do not conform to the documentation previously provided to Microsoft to develop the Windows operating system mitigations to protect against the chipset vulnerabilities known as Spectre and Meltdown"[1], and their docs simply suggest asking AMD for more details[2].

So it sounds like it was probably specific chipsets and not CPUs, but who knows.

[1] - https://www.engadget.com/2018/01/09/microsoft-halts-meltdown...

[2] - https://support.microsoft.com/en-us/help/4056892/windows-10-...

It happened to one of my HP boxes that has an AMD chipset. After installing the update, windows 10 just hangs at the blue windows logo. Only solution is to turn the machine off and on twice, which then results in it undoing the update.

The Microsoft link you provide says "Microsoft is working with AMD to resolve this issue", so they're not just brushing it off and telling customers to contact AMD.

I was under the impression that Ubuntu would automatically revert to the last good kernel if the new one fails to boot. Was I mistaken?

There's something close-but-not-quite-the-same: by default the grub menu is hidden with an instant timeout (IIRC) but if a boot fails to complete, on the subsequent boot the menu won't be hidden. Google 'recordfail' if you're interested in the details.
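For reference, the behavior is tunable; a sketch of the relevant lines in /etc/default/grub on Ubuntu (the values are illustrative, and GRUB_RECORDFAIL_TIMEOUT is the Ubuntu-specific recordfail knob):

```shell
# /etc/default/grub (excerpt)
GRUB_TIMEOUT=0                 # normal boot: menu hidden, no delay
GRUB_RECORDFAIL_TIMEOUT=30     # after a failed boot: show the menu for 30s
# apply the change with: sudo update-grub
```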

AFAIK worst case you should be able to select the previous kernel in grub2 upon reboot. It just hangs after grub on this (bad) kernel, I think...

Yeah but if you have a couple of hundred machines that aren't booting.... That's pretty worst case.

That is pretty worst case. OTOH, it teaches the lesson about testing updates on a small number of hosts before rolling 'em out globally.

One has to learn that lesson at some point.

yeah - I hear you. :( Still better than being totally hosed...at least there is some option

No, you have to manually restart, boot into grub and select the one that works

This is a little bit sensationalist (is that a word?).

It's not like Windows bricking your laptop. It affects a handful of hardware configs, and you can easily boot with an older kernel.

I just updated my Dell, but haven't restarted. Do we know how widespread the problem is and should I roll back the update?

There's a -109 out now. I'd install it before rebooting and probably expunge -108 entirely.

I would suspend your machine until we find out more, I have the same problem.

You all do realize that (at least by default) ubuntu will keep old kernel versions around, and you can choose to boot them in GRUB, don't you?

This is certainly a pain, but it's hardly the first time a broken kernel has shipped. Reasonable recovery mechanisms are in place.

For me the 108 kernel boots fine, but breaks suspend. :D

Please look under "Meltdown - x86" section in GKH's (The Stable Kernel Maintainer) blog: http://www.kroah.com/log/

Sounds like one way to stop the bug. :P

So do we make snide remarks about fixes not being tested, like we did when Microsoft also had issues fixing CPU-level bugs?

Did ubuntu botched an update again? Or is it the upstream kernel?

Has Ubutu botched or did Ubutu botch... please ;)

Muphry's Law strikes again!


If you write anything criticizing editing or proofreading, there will be a fault of some kind in what you have written.

Can't spell Ubuntu right, though.

Whatever, it’s a commercial name (might have even been the autocorrect.) I recommended — nicely, no offense intended — the correct grammatical form.

Stop staring at my finger. Please ;)

It's the fault of Intel; why don't they recall all the CPUs, just like vehicle companies do?

Are you suggesting that we recall the great bulk of modern CPUs? Like, literally gut everyone's computers, including those in data centers and running critical infrastructure, until replacements are eventually manufactured?

Or did you mean something else?

I'd think it'd be reasonable to get a refund in some manner, provided you could provide proof-of-purchase for the CPU in question.

I wouldn't expect them to replace any CPU, unless it was manufactured recently and still being manufactured.

But a refund in some capacity? That's reasonable, I think. In the meantime, we would have to settle for the software fixes.

Why would you need a proof of purchase? Intel can verify that it's its own unpatched chip out in the wild being returned for a recall. It doesn't matter if it's the original owner or a woman 15 owners down the line; it's still a loose security flaw out in the wild, and who knows where or on whose network it will wind up. I don't need a proof of purchase when I bring my Ford in for its 10 recalls a year. I don't even need to care about which dealer I bring it into. It has to be fixed. They look at the VIN and if it's not marked as fixed they fix it.

Is there a market of 99%+ seemingly authentic fake Intel chips out there?

I think this is a pretty weird thing to talk about, because it's kinda pointless. Do I think Intel ought to refund us somehow? Hell yeah I do, especially given that I bought a laptop with an Intel processor recently, and why even bother buying products with a warranty if fatal design flaws don't qualify as refundable anyway? Do I believe Intel will refund or replace something? Of course not; it's hardly even realistic. Even if they wanted to (which they surely don't), what kind of loan would they have to get to afford even a partial refund of every single Intel CPU out there?

Boxed Intel processors carry a 3 year warranty. It certainly seems reasonable for everyone who bought a CPU within the last 3 years to expect a warranty replacement with the manufacturer defect fixed.

In the EU virtually every product comes with a 2 year warranty. So every CPU sold in the EU in the last two years should be replaced for free by Intel, even through OEMs.

I wonder what potential class action lawsuits Intel might be facing.

Any sufficiently complex CPU surely contains some number of defects, perhaps even serious security defects, just as any sufficiently complex piece of software contains bugs and security holes. I wouldn't be surprised if someone tries to sue Intel over this, or even if they win, but this is way outside the scope of what a warranty would traditionally cover, which in the case of a CPU would be hardware failure. If a warranty had to cover every possible defect, a bunch of people would be constantly trying to get free CPUs out of Intel every time they updated their errata:


Note that the cost of overly onerous regulation (e.g. requiring that every computer manufacturer replace these chips even though the problems can largely be worked around in software) is of course passed onto consumers.

> but this is way outside the scope of what a warranty would traditionally cover

The warranty and any other legalese from Intel are irrelevant here; this is about the consumer protection laws of various countries, which supersede an Intel warranty. A serious post-sale drop in performance would be enough for a refund on any computer purchased in many countries. In Australia, if I bought a computer 6 months ago I'd be entitled to take it back to the store for a refund; then it's up to them to argue with Dell, and Dell to argue with Intel.

> Note that the cost of overly onerous regulation (e.g. requiring that every computer manufacturer replace these chips even though the problems can largely be worked around in software) is of course passed onto consumers.

Demanding that a product works and in lieu of that offering a replacement or refund is not overly onerous regulation, it's a very basic standard protection.

I’m not convinced that a software update slowing down your phone or computer a few percent while performing certain operations should automatically qualify you for a refund. It’s widely understood that keeping your computer secure requires installing software updates, and it’s even more widely understood that installing updates often slows down your computer. If that’s going to be your bar, I think an iPhone would have to sell for about $25,000 so Apple could afford to give you a replacement every year for the rest of your life.

Of course the cost of producing products that actually perform at the level they're advertised to perform is passed onto the consumer, regardless of regulation.

I guess it depends on whether everyone agrees that the product performs "as advertised" or not. If you have a defect that affects e.g. 1% of your users, but the government forces you to compensate 100% of your customers, that seems like an unnecessary cost.

For something like Meltdown/Spectre, the patches/workarounds reportedly barely affect some workloads, but cause drastic slowdowns for others. So already not everyone's affected to the same extent. Then you have computers with easily replaceable CPUs vs. stuff like phones and laptops which probably were only designed to work with a single CPU, and the manufacturer's already working on their next model and doesn't want to waste money building replacement parts for the previous one. At that point, maybe you have a complaint with e.g. Apple for selling you an iPhone that doesn't perform as advertised because they had to work around a security problem, and Apple might themselves go after Intel. The whole situation is a lot more complicated than "it should totally be covered under the warranty."

Intel would just say it functions exactly as designed. :)

(because it's a design flaw)

They did say that:

> Intel and other technology companies have been made aware of new security research describing software analysis methods that, when used for malicious purposes, have the potential to improperly gather sensitive data from computing devices that are operating as designed.

> […]

> Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect.

(https://newsroom.intel.com/news/intel-responds-to-security-r... ; emphasis mine.)

Right, right. And they'll just keep saying it. :)

Doesn't this affect all of their CPUs going a long way back? And how do you recall embedded or laptop CPUs, which are often soldered in-place?

A recall would be great, but there's no way they'd be able to do it. Vehicle recalls are a bit different because they impact physical safety. Digital safety doesn't get the same priority.

I don't think they would be able to produce all the necessary CPUs. Replacing all current stock would create a huge problem, and replacing all CPUs sold in the last two years would be a huge problem too. Even if redesigning all these chips were quite simple (and I don't know how complicated it is or isn't), how long would it take?

Then imagine all chips from 1995 to 2015: they'd have to make them again, and they don't have the machines anymore.

Also vehicle recalls are usually done by fixing stuff next time the vehicle comes in for regularly scheduled service. How often does your computer get those?

Depends on the recall; the GM ignition recall was done on an independent appointment basis.

(besides you should not be taking your car to the dealer if you value your wallet)

You need a new dealer...most have greatly improved customer service experiences these days, and many independents are no panaceas...

Dealers make almost zero margin on car sales (aside from used and trade-in shenanigans). The majority of their margin comes from services, so they'll happily gouge you on them.

How about they go into bankruptcy with most of the world’s computer users as their creditors? Maybe not, but it’s terrifying that you can avoid responsibility by fucking up on a larger scale than most.

I'm sure Intel has a liability insurance policy that covers this type of thing.

Maybe, but that assumes they get the payout, that they did everything on their end of that deal, etc. Insurers don’t like to pay.

Someone on Twitter had the last safe CPU that Intel made; it was date-stamped '92-'93, and he was asking a Bitcoin for it!

Even if it affects all speculating CPUs: if this happened in the car world, all the cars would be recalled. Not saying that's practical in the computer world... just continuing with the analogy.

Spectre/Meltdown is a wakeup call for many things, one of them probably being for computer manufacturers to not solder the CPU to the Motherboard and for the x86 world to stick with a standard socket, to facilitate replacing parts.

> "Spectre/Meltdown is a wakeup call for many things, one of them probably being for computer manufacturers to not solder the CPU to the Motherboard"

Good luck with that. A large portion of affected CPUs/SoCs are in mobile devices and ultrabooks. Socketed chips simply won't fly in those kinds of devices.

Then the whole device should be replaced, that's the price they pay for their design decisions. Being "too hard" doesn't absolve you of your responsibility to consumers.

Alternatively, even if the board and CPU are tightly integrated, if they used a particular standard like EOMA-68, then they could be easily replaced without the rest of the desktop/laptop/phone being affected.


That's why using non-OEM parts in your car voids the warranty.

No it doesn't. The Magnuson-Moss Warranty Act of 1975 forces them to honor the warranty unless they can prove that the non-OEM part caused the fault.

And how would they replace the bad part when a good one doesn't exist?

With money, and a handwritten apology soaked in the tears of their C-levels?

Or just the money actually. If you can’t replace my broken item, a refund is always appreciated.

Bryan Cantrill reference has been noted.

That was a reference to me? If so -- I'm flattered. I haven't yet asked Intel to soak a written apology with their tears, but it's an excellent idea! (I have, however, given them many other candid thoughts on how they can improve their handling of Spectre and Meltdown -- but so far, to no avail.)

Most vehicle recalls are not of the form: return your vehicle, we give you a new fixed one; but rather: bring your vehicle to one of our dealers and they'll perform some action to repair the defect.

The latter is pretty analogous to issuing firmware and OS patches to mitigate the flaw.

What if the fix to the car reduced your MPG by 30% to address a safety issue? This seems somewhat analogous to the number being bandied about as the CPU performance decrease. (Depends on workload, etc, etc.) I think most car owners would expect some kind of compensation for a product that no longer has the same efficiency as when they bought it.

This issue is similar to what happened with the Volkswagen emissions cheat. After they fixed the ECU to have the advertised NOx emissions, the cars lost peak power and fuel efficiency.

Is this endangering the life of anyone? A car manufacturer doesn't do a recall if personal safety isn't at risk.

They will often refuse to do a recall even if personal safety is at risk. The real deciding factor is how much financial damage the auto company will endure if they do or do not do a recall.

> They will often refuse to do a recall even if personal safety is at risk.

Ford Pinto, anyone...

Another one that many people don't know about is a problem with old 2-door Tahoes; a bracket on the driver's side seat likes to fail upon quick acceleration - such as when getting on the freeway, for instance.

One minute you're upright, pushing the pedal to get up to speed, the next - whoops! - flat on your back! If you're lucky, you live to tell the tale...

AFAIK, GM never issued a recall about that one (it caused me to pass on a really nice lifted 4wd Tahoe a few years back)...

Eh, sometimes they do for the sake of brand preservation. If personal safety is at risk they will do a recall, either voluntary or mandated by the NHTSA.

But they have yet to do any recalls for cars that are susceptible to being broken into using a slim jim.

Volkswagen emissions recall.

And replace them with what? They don't have any current processor that doesn't have the same bugs. It will be years before they design and make one. The best we can do is some class action and get a refund, but not too much or they'll go broke and close.

> And replace them with what?


Which doesn't have Meltdown but still has Spectre. Furthermore you have to replace at least the whole motherboard on a desktop or probably all of your laptop except the discs and maybe the RAM.

> still has Spectre

I didn't realize that when I made the comment, and I agree my suggestion falls flat now that I know.

> You have to replace at least the whole motherboard on a desktop or probably all of your laptop except the discs and maybe the RAM.

I'm ok with putting that responsibility on Intel to remedy the situation, even if it deeply hurt them financially or put them out of business. If you sell a faulty product that doesn't live up to its description, yes you risk actually going out of business. But with the fact that AMD has Spectre this idea of replacement no longer makes much sense and your original idea of a partial refund makes the most sense.

They really should.

The problem is this affects nearly every single processor Intel has shipped for a decade, so Intel doesn't have the cash flow to RMA that many replacements. They're going to fight tooth and nail to avoid this.

In reality they could probably argue standard depreciation on a product and offer the remaining value as a discount towards a working product... only there currently aren't any equivalent products on the market. AMD's CPUs are still affected by lower profile variants of this, but aren't as /horridly/ broken.

    In reality they could probably argue standard
    depreciation on a product and offer the remaining
    value as a discount towards a working product...
This could work for older models, but their flagship parts, at $15,000+ each (in bulk trays), are also affected. So it's hard to argue depreciation on <2 month old silicon.

Not a decade or so. Since the Pentium Pro.

So since ‘95 or ‘96.

Wait, if a court forces Intel to replace processors with a safe one, does that mean I get a new MacBook?

I can't imagine a court doing anything. The cost to Intel is simply too high, and it's not like they were negligent, given that ARM/POWER/etc. are all vulnerable to some extent too.

This whole thing is the equivalent of discovering that if someone throws enough nails on the road your car will blow a tire, spin out of control and kill you. With the kneejerk reaction of trying to fill everyone's tires with concrete to avoid the tires blowing out rather than trying to figure out how to keep people from throwing nails on the road (with the idea that spiteful users are more likely on toll roads) in the first place.

I agree with you that the courts probably won't do much, but you should not group all the CPU manufacturers together. Meltdown mainly (only?) affects Intel. It's spectre that affects basically everyone.

Actually, it is more like that someone can throw a rock through a car window and then compromise the car door lock. Or, probably more accurately, that a hitchhiker can knock you out of the car, and drive off with it.

Spectre/Meltdown is ONLY an issue if you run untrusted code on your system. If you avoid this, there is no problem. Yes, we like to be able to run untrusted code (such as in a web browser / JavaScript), but it's not the car manufacturer's fault that you like to pick up hitchhikers.

I'm with you on the untrusted code bit, which is why I think unmapping the kernel should be restricted to untrusted processes. Then it only applies to your browser, the KVM/qemu instances or whatever runs untrusted code.

Yup, this will hit the EC2/etc users hard, but those people have already IMHO given up on absolute performance by putting themselves in shared environments where bad neighbor syndrome can already hit their perf pretty badly.

But for whatever reason (probably because its easier) the current plan just seems to be to use the big hammer.
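Whether the big hammer is in effect on a given machine can at least be inspected: kernels new enough to carry the fixes (Linux 4.15+) report mitigation state under `/sys/devices/system/cpu/vulnerabilities/`. A minimal sketch (the sysfs path is real, but availability depends on your kernel; older or non-Linux systems simply won't have the files):

```python
# Report the kernel's view of the Meltdown/Spectre mitigations by reading
# the sysfs files introduced in Linux 4.15. Falls back to "unknown" when
# a file is absent (older kernel, or not Linux at all).
import os

VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"

def mitigation_status(name="meltdown"):
    """Return the kernel-reported status string for one vulnerability,
    e.g. "Mitigation: PTI", or "unknown" if the file doesn't exist."""
    try:
        with open(os.path.join(VULN_DIR, name)) as f:
            return f.read().strip()
    except OSError:
        return "unknown"

if __name__ == "__main__":
    for vuln in ("meltdown", "spectre_v1", "spectre_v2"):
        print(f"{vuln}: {mitigation_status(vuln)}")
```

On a patched kernel the meltdown entry typically reads "Mitigation: PTI", which is exactly the big-hammer page-table isolation being discussed.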

The big hammer is the pragmatic approach for the short term. Everyone and their dog wants to claw back the lost performance; we're only a week past the big reveal.

Your idea of black/white listing processes might bubble up as a solution in some scenarios. Perhaps it could be pledge-like; if you're savvy enough, try implementing it, or fleshing out the details.

That might just create yet more problems. You can't just plug a new CPU into your motherboard; it has to match the socket, chipset, memory, installed OS, etc.

Well if they didn't change the socket every year...

Realistically though, they might be able to do it for CPUs that aren't soldered on (think just about every laptop) made in the last year or two, but would they really fab new versions of 10-year-old cores? It's not like many of those lines are even running anymore, so they would basically have to redesign/layout and reverify everything.

Probably easier/cheaper just to send everyone a new machine.

Or give a price break on future hardware. The fix is turning out to be incredibly expensive for ordinary users, virtualization vendors, hardware vendors, OS providers, cloud providers, etc.

This incident demonstrates why you really don't want catastrophic bugs in the CPU. The fact that the hardware vendors missed this one makes you wonder what else is out there.

>Or give a price break on future hardware

feels like this would happen:

>intel agrees to give consumers a $30 price break in response to meltdown/spectre

>in other news, intel raises the prices of next generation CPUs by $30

Make it $60 and I'll take that, all the better for them to get undercut by AMD.

A faulty vehicle is a potential accident with multiple casualties, vs. a hacked computer with potential loss of your private emails or whatever.

What if that computer is running something critical, say a reactor, an elevator, or some device in a hospital? Computers are everywhere these days... Today's proof-of-concept becomes tomorrow's Metasploit module, and could cause large damage even in incompetent hands.

Or just a case of emissions violation?

I want to be there for question 099.

A * B * C = X

Is this a Fight Club reference?

I will not be updating. I have yet to see this mythic JavaScript exploit, and I see too many other ways I, as an end user, can be affected.

I haven't even seen a proof-of-concept exploit that has the same conditions as in the wild. All the PoC exploits seem to have been given some assistance in various ways (such as being given root perms or a pre-known memory address).

Does anybody have an example of this JavaScript exploit or any exploit that would work in the wild?
