
Intel: "Long term reliability cannot be assured unless all the Low-Power Idle States are enabled."

Does that mean if you run the CPU too much, it will die quickly? Is there some low limit on time at full power? Electromigration problems, perhaps?




Metal layers where electromigration would occur haven't scaled to the same degree as the transistors. I would suspect hot carrier injection (which is coincidentally worse at higher temperatures) is becoming a greater issue as transistor channel lengths and gate oxide thickness shrink.


Electromigration

That would be my bet. They probably assumed, based on typical workloads, that the part would be asleep a certain percentage of the time, which is generally a reasonable assumption for non-server parts.

More sleep means less activity and less heat (heat accelerates EM).
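For a rough sense of scale: the temperature dependence of EM is usually modeled with Black's equation, MTTF = A * J^-n * exp(Ea / (k*T)). Below is a minimal Python sketch of the lifetime ratio between a cooler and a hotter operating point. The activation energy (0.9 eV), the exponent n = 2, the example temperatures, and the function name are my assumptions for illustration, not figures from Intel.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def em_lifetime_ratio(t_hot_c, t_cool_c, j_ratio=1.0, ea_ev=0.9, n=2.0):
    """Return MTTF(cool) / MTTF(hot) under Black's equation.

    t_hot_c, t_cool_c: temperatures in degrees Celsius.
    j_ratio: current density at the hot point divided by that at the cool point.
    ea_ev, n: assumed activation energy (eV) and current-density exponent.
    """
    t_hot = t_hot_c + 273.15    # convert to kelvin
    t_cool = t_cool_c + 273.15
    # Arrhenius term: a cooler part has a larger exp(Ea / kT), so it lasts longer.
    thermal = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool - 1.0 / t_hot))
    # The prefactor A cancels in the ratio; only the J^-n term remains.
    return (j_ratio ** n) * thermal


if __name__ == "__main__":
    # Hypothetical operating points: 90 C under sustained load vs. 60 C mostly idle.
    print(em_lifetime_ratio(90.0, 60.0))
```

With these assumed parameters, a 30 °C difference alone changes the modeled lifetime by roughly an order of magnitude, which is why the fraction of time spent in idle states could plausibly matter for a reliability budget.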


The desktop Broadwell CPU was supposedly canceled because they had lifespan problems.[1] Overclocking sites report that Skylake processor life suffers badly when overclocking is attempted. Now Intel is effectively saying their CPU is intermittent-duty only. Intel may be having serious problems with electromigration.

Is this the limit for CPU speed and transistor size?

[1] http://wccftech.com/intel-debating-commercializing-broadwell...


Now Intel is effectively saying their CPU is intermittent-duty only.

That's really disturbing if true. Poor power management has resulted in devices being warmer than they need to be and shorter battery life, but that seems trivial in comparison to the hardware actually being damaged. IMHO I would consider it a flaw if a CPU did not last effectively forever at full load --- older OSs which lacked any sort of power management basically kept the CPU in this state all the time, and there's plenty of old hardware around and working to show that it isn't unrealistic.

It seems they're heavily sacrificing lifespan for performance, which is attractive to (most) users and also builds in some planned obsolescence, but it's sad that what was once considered to have an indefinite lifetime is now almost a consumable. To use a car analogy, this is like moving from a conservatively designed engine that lasts hundreds of thousands of miles but only produces 100HP to a top-fuel dragster engine that can produce thousands of HP but can't run at full power for even a minute without destroying itself.


Making the CPU consumable is smart from their PoV, since the rapid increase in performance every generation has plateaued.

I still do all my development at home on an i5-2600K and an i5-2430 laptop, and neither is noticeably slower than any of the new machines I've used (both have SSDs).

I'll probably run this desktop til it dies as there is no compelling reason to upgrade.


Can you imagine a future where we get so used to producing microprocessors this way for so long that another consideration for long-distance space travel becomes "but will the CPU still work when the rover gets there"?


Presumably Intel is having electromigration problems with their chips because the feature size is so small: 14nm. Space hardware tends to use rad-hard chips, which use bigger transistors. Curiosity apparently uses the RAD750, which has a 150nm feature size and can be clocked up to a blistering 200MHz.


And costs a blistering $200K.

The US applies export controls to radiation-hardened ICs, which has resulted in a dearth of rad-hard ICs. Nobody wants to run a silicon on sapphire fab any more.


To me this is more worrying for GPUs at future process nodes. Most modern video games push my i5-4570 to reasonable levels, but my R9 290 uses over 3X the power (and generates 3X the heat). If electromigration is much more prominent due to feature size, we could see GPUs dying a lot faster than CPUs.


Intel sells Broadwell Xeons, so if they had an electromigration problem with their 14nm node, they must have fixed it by now. Either that or they're going to recall a ton of server chips.


Server chips are usually equipped with better cooling though, so the problem may not be as acute there. There's also the possibility that Intel is only now finding out about the problem with their 14nm process, now that Broadwell has been around for a few years.


Electromigration is more of a metal-layer interconnect problem. With wacky new transistor structures, we are more likely to see transistor problems, like hot-electron effects and other device-level problems.



