

Mistakes in silicon chips could help boost computer power - yanw
http://news.bbc.co.uk/2/hi/technology/10134655.stm

======
coderdude
So software (or another part of the CPU) will handle the errors so that the
programmer doesn't have to check that every single operation worked correctly.
This reminds me of how Hadoop handles failures such as a hard drive dying. If
we're abstracted away from the failures, we really don't have to pay attention
to them (though with a distributed filesystem you would still need to replace
the dead hardware).
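
Something like the rough sketch below is what I mean, purely hypothetical and
in C rather than anything Hadoop actually does: the retry layer absorbs
transient failures so the caller never has to check individual operations.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for an unreliable operation: fails ~30% of the time. */
    static bool write_block(const void *buf, size_t len)
    {
        (void)buf; (void)len;
        return (rand() % 10) >= 3;
    }

    /* The abstraction layer: callers never see individual transient
       failures, only the final outcome after a few retries. */
    static bool reliable_write(const void *buf, size_t len)
    {
        for (int attempt = 0; attempt < 5; attempt++) {
            if (write_block(buf, len))
                return true;          /* transient fault absorbed here */
        }
        return false;                 /* persistent fault, surface it */
    }

    int main(void)
    {
        char data[64] = "some payload";
        printf("write %s\n",
               reliable_write(data, sizeof data) ? "ok" : "failed");
        return 0;
    }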

This is a very interesting concept. Although it mentions decreased power
consumption, would this also entail more performance for chips? It mentions
that Moore's Law is being hampered by our "need for perfection."

~~~
forgottenpaswrd
When you use a hard drive you are already relying on a "stable" CPU: you are
sure that the checking operations (CRC, etc.) you do are done correctly.
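
And that checking operation is itself just more computation running on the CPU
you are trusting. A minimal CRC-32 sketch (bitwise, not the table-driven
version real drivers use) to show what such a check actually does:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Bitwise CRC-32 (polynomial 0xEDB88320). The check is ordinary
       logic: if the CPU running it is unreliable, the check itself
       proves nothing, which is the point above. */
    static uint32_t crc32(const uint8_t *data, size_t len)
    {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= data[i];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return ~crc;
    }

    int main(void)
    {
        const char *msg = "123456789";
        /* The standard check value for "123456789" is 0xCBF43926. */
        printf("crc32 = 0x%08X\n", crc32((const uint8_t *)msg, strlen(msg)));
        return 0;
    }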

How are you going to check that something was done correctly, without redoing
it, if you don't know the answer in advance? Whatever you do, you are going to
add a ton of new logic (more power consumption, more complexity, and HORRIBLE
BUGS at the hardware level).

I have met people who had to deal with hardware bugs back when CPUs were not
as reliable as today (automatic testing in VHDL and Verilog has helped a lot),
like the people who worked on the first Cray computers. It made their work
miserable, boring, repetitive and painfully slow.

The worst bug in existence for a programmer is the one that appears and
disappears at random, and after you have invested months of work trying to
find it in your code, you discover it is a hardware bug!!

~~~
reitzensteinm
I always imagined that this would work by doubling up the hardware and
requiring two logic units to agree; if they didn't, the computation would be
redone, much like recovering from a failed branch prediction.

This would obviously not help power consumption at all, but frequency could
probably be scaled up quite a bit.

It also assumes that if an error happens it's unlikely to repeat in the same
way in the second logic unit. But that may not be a reasonable assumption.
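
Roughly this, in software terms (a hypothetical sketch; real lockstep designs
compare results in hardware, this just shows the compute-twice, compare and
retry idea):

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for an occasionally faulty ALU operation. In hardware
       this would be two physical units; here one function plays both. */
    static uint32_t alu_add(uint32_t a, uint32_t b)
    {
        return a + b;                      /* imagine a rare bit flip here */
    }

    /* Run the operation on both "units" and retry until they agree,
       much like squashing and replaying after a branch mispredict. */
    static uint32_t checked_add(uint32_t a, uint32_t b)
    {
        for (;;) {
            uint32_t r1 = alu_add(a, b);   /* unit A */
            uint32_t r2 = alu_add(a, b);   /* unit B */
            if (r1 == r2)
                return r1;                 /* agreement: accept result */
            /* disagreement: a transient fault hit one unit, redo */
        }
    }

    int main(void)
    {
        printf("%u\n", checked_add(2, 3));
        return 0;
    }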

~~~
gdickie
At the lowest level you don't need to double up; you can use an error
correcting code. If the memories and registers on a chip use a multiple-error
correcting code, then the underlying error rate could be quite high without
making any difference to the user-visible error rate.

Similarly, you could use noisy-network protocols for the on-chip wires, so
that each signal path doesn't need to be perfect. Again you don't need to
double up; instead you lose a small percentage to overhead, plus a delay to
encode and decode.
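
As a toy illustration of the storage side, a Hamming(7,4) code corrects any
single flipped bit in a 7-bit word at the cost of 3 parity bits instead of a
full duplicate copy (a sketch, not what real SECDED memory controllers use):

    #include <stdint.h>
    #include <stdio.h>

    /* Hamming(7,4): 4 data bits protected by 3 parity bits; any single
       bit flip in the 7-bit word can be located and corrected. */
    static uint8_t ham_encode(uint8_t d)              /* d: 4 data bits */
    {
        uint8_t d1 = (d >> 0) & 1, d2 = (d >> 1) & 1,
                d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
        uint8_t p1 = d1 ^ d2 ^ d4;                    /* covers bits 1,3,5,7 */
        uint8_t p2 = d1 ^ d3 ^ d4;                    /* covers bits 2,3,6,7 */
        uint8_t p3 = d2 ^ d3 ^ d4;                    /* covers bits 4,5,6,7 */
        /* codeword bit positions 1..7 are p1 p2 d1 p3 d2 d3 d4 */
        return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) |
               (d2 << 4) | (d3 << 5) | (d4 << 6);
    }

    static uint8_t ham_decode(uint8_t c)              /* returns 4 data bits */
    {
        uint8_t b[8];
        for (int i = 1; i <= 7; i++) b[i] = (c >> (i - 1)) & 1;
        /* the syndrome is the position of the flipped bit (0 = no error) */
        int syn = (b[1] ^ b[3] ^ b[5] ^ b[7])
                | (b[2] ^ b[3] ^ b[6] ^ b[7]) << 1
                | (b[4] ^ b[5] ^ b[6] ^ b[7]) << 2;
        if (syn) b[syn] ^= 1;                         /* correct the flip */
        return b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3);
    }

    int main(void)
    {
        uint8_t word = ham_encode(0xB);               /* store 1011 */
        word ^= 1 << 4;                               /* one bit flips in "memory" */
        printf("recovered: 0x%X\n", ham_decode(word)); /* prints 0xB */
        return 0;
    }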

~~~
reitzensteinm
How would the error correcting code work for something like a floating point
multiplication? Correcting errors in storage is simple, but correcting errors
in computation seems like a significantly harder problem.

------
maushu
So now, not only do we have to worry about our own mistakes, we have to worry
that 1 + 1 might be 3, while programming? Great.

------
wlievens
This probably applies best to specific needs. For instance, I can imagine that
a lot of people would accept small errors in images rendered by their GPU in
exchange for greater performance.

------
forgottenpaswrd
Oh, brilliant, so we need to make our software systems exponentially more
complicated (when anything is possible, you need to add complexity to handle
it) just to gain a 30% power reduction in hardware?

And add the possibility of the computer failing at random (statistical
outliers that are not "filtered" out).

I think this is not a good idea.

~~~
someone_here
Keep in mind that currently, about 100 bits of memory in your computer are
flipped every year just because of cosmic rays.

~~~
nudded
[citation needed] (for real, I would love to know if this is true)

~~~
Retric
ECC really does help. For home use, non-ECC memory is probably acceptable, as
most errors are going to be in graphics data etc. But for a few dollars more
you can significantly increase your computer's stability.

"Recent tests give widely varying error rates with over 7 orders of magnitude
difference, ranging from 10−10−10−17 error/bit•h, roughly one bit error, per
hour, per gigabyte of memory to one bit error, per century, per gigabyte of
memory.[7][11][12]"
<http://en.wikipedia.org/wiki/Dynamic_random_access_memory>
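
Back-of-the-envelope on what that range means for, say, 4 GB of RAM (just
multiplying out the quoted rates, nothing authoritative):

    #include <stdio.h>

    int main(void)
    {
        double bits = 4.0 * 8.0 * 1024 * 1024 * 1024;  /* ~3.4e10 bits in 4 GB */
        double hours_per_year = 24.0 * 365.0;
        double rates[] = { 1e-10, 1e-17 };             /* error/bit-hour, per the quote */
        for (int i = 0; i < 2; i++)
            printf("rate %.0e -> %.3g flips/year\n",
                   rates[i], rates[i] * bits * hours_per_year);
        /* roughly 3e4 flips/year at the pessimistic end and 3e-3 at the
           optimistic end; the "~100 bits/year" figure mentioned above
           falls well inside that spread. */
        return 0;
    }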

------
Daniel_Newby
A 1% error rate in pointer writes would be ... interesting.

