Most of the time, this is not a trade-off worth making. I can't think of a researcher who would willingly trade replicability for speed. I can't think of a mathematician who would base a proof on the idea that a number is probably prime. I can't think of a bank customer who would be fine with the idea that the balance displayed on the ATM is pretty close to where it should be. I can't think of an airline passenger who would be totally fine with the flight computer usually being pretty good.
It would be a fine trade-off for games, however. And I'm sure there is room for some fudging in complex simulations that have plenty of randomness already.
But given the choice between an answer that is correct, and an answer that is probably correct, I will take the correct answer. Even if I have to wait a little.
"I can't think of a mathematician who would base a proof on the idea that a number is probably prime."
I can assure you that such a proof probably exists :) Just look at https://en.wikipedia.org/wiki/Probable_prime
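To make the idea concrete, here is a rough sketch of a Fermat-style probable-prime check (my own toy code, not from the article; the function names and test numbers are just for illustration, and the 64-bit arithmetic overflows for candidates much above 2^32):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* modular exponentiation: (base^exp) mod m */
static uint64_t pow_mod( uint64_t base, uint64_t exp, uint64_t m )
{
    uint64_t result = 1;
    base %= m;
    while ( exp > 0 )
    {
        if ( exp & 1 )
            result = ( result * base ) % m;
        base = ( base * base ) % m;
        exp >>= 1;
    }
    return result;
}

/* Fermat test: a^(n-1) mod n != 1 proves n composite; passing several
   random bases only makes n *probably* prime (Carmichael numbers can fool it) */
static int is_probable_prime( uint64_t n, int rounds )
{
    if ( n < 4 )
        return n == 2 || n == 3;
    for ( int i = 0; i < rounds; ++i )
    {
        uint64_t a = 2 + rand() % ( n - 3 );   /* random base in [2, n-2] */
        if ( pow_mod( a, n - 1, n ) != 1 )
            return 0;                          /* definitely composite */
    }
    return 1;                                  /* probably prime */
}

int main( void )
{
    printf( "%d\n", is_probable_prime( 1000003, 20 ) );   /* 1: probably prime */
    printf( "%d\n", is_probable_prime( 1000001, 20 ) );   /* 0: composite (101 x 9901) */
    return 0;
}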
Probability theory can be an extremely powerful tool when researching things that are otherwise difficult to reason about. And the theorem statement does not have to be probabilistic for the probabilistic method to be applicable; it's worth looking up.
As for the following:
"I can't think of an airline passenger who would be totally fine with the flight computer usually being pretty good."
Actually, I would think it's pretty much the opposite. That is, the only type of airline passenger I can think of is one who is fine with the flight computer (and the airplane in general) usually being pretty reliable. We already know that computers can malfunction and airplanes can crash. Now, of course, how reliable you want the airplane to be is up to you, but if you want it to be flawless, then you should never board an airplane.
However, when interfacing with the real world, which is not exactly specified and offers only incomplete information, "good enough most of the time" is not only fine, it is all you can hope for. For example: robots that need to navigate rough terrain, operate at the nano-scale inside living organisms, or communicate in human language.
There is a huge class of things that simply cannot be addressed with the current "correct" model of computation because they are not well-specified to the lowest level in the first place.
Computers, in other words, need to be more like brains. Yes, this does give up certain advantages and safety guarantees. But it does not give up everything. People have trusted their lives to animals and other humans, for example, for all of history, even though they are far from perfectly safe and/or compliant...
(and even now, it is "fallible" humans that control the technology, so you could make a point that nothing is lost, as it was never there in the first place)
I wonder about this. One of the most common human methods of making decisions with incomplete information and/or short time frames is to fall back on heuristics rather than try to reason through the data. Heuristics have the advantage of being fast and available, but they're also notoriously bad at producing the quality of decision that might come from a more thorough analysis of the data.
Frankly, I think I'd feel more comfortable with computers that made decisions through brute-force processing and Bayesian analysis.
Brute forcing all possible combinations and finding the best one (for your specific goal, or taking into account all externalities as well) would be great. But it is impossible for two reasons:
1) incomplete, or even purposefully incorrect information
2) time constraints/complexity. For a non-linear system with more than a few variables, it is just not feasible to enumerate everything (even 30 binary choices already give about a billion combinations).
You end up with heuristics. The challenge that remains is finding good heuristics. If the decision is given more time, you can probably come up with better heuristics and even sort-of brute force through the "best" options (which is what humans do when they reason).
Floats are inaccurate (sometimes far from the real value) but precise (they always yield the same output given the same input).
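A tiny demonstration of that distinction (my own snippet, not from the article): the stored value of 0.1F is measurably off from one tenth, but it is off by exactly the same amount on every run and in every call:

#include <stdio.h>

int main( void )
{
    float tenth = 0.1F;            /* nearest float to 0.1 is 0.100000001490116... */
    printf( "%.20f\n", tenth );    /* inaccurate: not exactly 0.1 */
    printf( "%.20f\n", 0.1F );     /* but precise: the same digits every time */
    return 0;
}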
Edit: reading the article, it calls out a special (different) meaning of the word "precision" in the IEEE float spec:
> In the case of full reproducibility, such as when rounding a number to a representable floating point number, the word precision has a meaning not related to reproducibility. For example, in the IEEE 754-2008 standard it means the number of bits in the significand, so it is used as a measure for the relative accuracy with which an arbitrary number can be represented.
If I understand correctly, no single neuron is necessary for your brain to work OK. The brain does not depend on any single link working correctly. Somehow it works reliably enough.
It's possible to make reliable systems from unreliable parts.
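A toy illustration of that idea (my own example, with made-up failure rates): run three copies of an unreliable component and take a majority vote, and the voted answer is wrong far less often than any single copy:

#include <stdio.h>
#include <stdlib.h>

/* an unreliable component: usually adds correctly, sometimes returns junk */
static int flaky_add( int a, int b )
{
    if ( rand() % 10 == 0 )                    /* roughly 10% failure rate */
        return a + b + 1 + rand() % 5;         /* wrong answer */
    return a + b;
}

/* a more reliable system built from three unreliable parts */
static int voted_add( int a, int b )
{
    int r1 = flaky_add( a, b );
    int r2 = flaky_add( a, b );
    int r3 = flaky_add( a, b );
    if ( r1 == r2 || r1 == r3 ) return r1;
    if ( r2 == r3 )             return r2;
    return r1;                                 /* no majority (rare): just pick one */
}

int main( void )
{
    int single_errors = 0, voted_errors = 0;
    for ( int i = 0; i < 100000; ++i )
    {
        if ( flaky_add( 2, 2 ) != 4 ) ++single_errors;
        if ( voted_add( 2, 2 ) != 4 ) ++voted_errors;
    }
    printf( "single component errors: %d\n", single_errors );
    printf( "majority vote errors:    %d\n", voted_errors );
    return 0;
}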
From the non-software perspective human brains make big mistakes all the time. Consider car accidents, gross errors in judgement, etc.
Complaining that brains without training are not reliable is like complaining computers don't work correctly without software.
BTW, many of the errors brains make are not because of unreliable parts, but because of system design: we have some firmware that made sense before and is problematic now. This firmware is reliable, in the sense that it works as intended; it's just that the problems have changed, we can't change the firmware, and patching around it in software causes problems of its own.
Parsing natural language is probably inherently non-deterministic, since no natural language grammars are context-free. When we parse a sentence there is almost always at least some ambiguity in the sentence itself, plus we also have to process other things that the person has said, in many cases years in the past, and whatever motivation they may have for saying a certain thing. It is very unlikely that two people will get exactly the same meaning out of some natural language sentence, because it comes with all kinds of side effects; for example, something somebody says may also change your opinion of them and of other things they have said.
I suspect that if we cannot get deterministic behavior out of future computers, because of the amount of parallelization required to make efficient use of their CPUs, we will end up with two streams of computing: one which will stay relatively static and be interested in using computers for the types of problems we currently do, and another which will be interested in applying them to new problems that do not require determinism.
The software for these computers will be radically different, so most likely you will have two computers on your desk (or in your pocket, or in VMs): one with < 10 cores and one with > 1000 cores.
I don't know much about how the brain works, but I guess it is a process that uses a lot of heuristics and pseudo-randomness, which probably lends itself well to being parallelized, and which is why we set up our languages this way.
It's more that one of the properties of the brain is that it believes itself to be reliable enough.
You can't get to 100% reliability. You just have to choose how many nines you want. Why should this be any different?
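For concreteness, reading "nines" as availability, the arithmetic looks roughly like this (my numbers, back of the envelope):

#include <stdio.h>

int main( void )
{
    const double minutes_per_year = 365.25 * 24 * 60;
    double unavailability = 0.01;              /* two nines = 99% available */
    for ( int nines = 2; nines <= 6; ++nines )
    {
        printf( "%d nines: about %9.2f minutes of downtime per year\n",
                nines, minutes_per_year * unavailability );
        unavailability /= 10.0;
    }
    return 0;
}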
Not in a multi-core setting. When you do a floating point reduce operation the order in which you reduce your floating point numbers matters, but this order is not necessarily deterministic.
Let's say you have some floating point numbers, like...
float A = 42.90F;
float B = 00.01F;
float C = /* ... */;
float D = /* ... */;
In fact, if you do (A + B + B + B ...) very much, then you will wind up exacerbating the problem; it can be a source of bugs if you repeatedly increment a counter by small floating point values. For example:
const int Iterations = 10;
for ( int i = 0; i < Iterations; ++i )
    A += B;
printf( "%2.4f\n", A );
Astute readers would probably note that 00.01F can't be exactly represented in base 2, and wonder whether something like 00.0125F would still suffer from this problem. Alas, yes (0.0125 has no exact base-2 representation either, and the running sum gets rounded at every step regardless):
A = 42.9000F
B = 00.0125F
8 iterations == 43.0000
80 iterations == 43.9001
800 iterations == 52.9006
8000 iterations == 142.8820 /* we would expect 142.9 */
For comparison, here is the same total split into two half-length sums (as a parallel reduction might do) that are combined at the end:
float A1 = 42.9000F;
float B1 = 00.0125F;
float A2 = 42.9000F;
float B2 = 00.0125F;
const int Iterations = 4000;   /* half of the 8000 iterations above */
for ( int i = 0; i < Iterations; ++i )
{
    A1 += B1;   /* both partial sums advance every iteration */
    A2 += B2;
}
float A = A1 + A2 - 42.9000F;  /* combine the two partial sums */
printf( "%2.4f\n", A );
You may be tempted to try various things to get around this fact... for example, what about counting from zero rather than from 42.9000? (Answer: the error can be even worse, which is a topic worthy of its own separate post) ... but you will quickly discover that this is a fundamental issue. The order of operations matters.
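Here is the order-of-operations point in its smallest form (my snippet): the same three floats summed with different groupings, as two cores might do, give two different answers:

#include <stdio.h>

int main( void )
{
    float a = 1.0e8F;
    float b = -1.0e8F;
    float c = 1.0F;

    printf( "(a + b) + c = %f\n", ( a + b ) + c );   /* 1.000000 */
    printf( "a + (b + c) = %f\n", a + ( b + c ) );   /* 0.000000: c is lost in the rounding */
    return 0;
}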
Also, if you do some kind of map-reduce style work distribution where different nodes perform the same operations on different data, it pays off to use a deterministic (pseudo-)random algorithm for work distribution, so that your results can be reproduced.
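A sketch of what that could look like (the names and the xorshift choice are mine): seed a small, explicitly specified PRNG with a fixed value so the item-to-node assignment is identical on every run, rather than depending on the platform's rand() or on arrival order:

#include <stdint.h>
#include <stdio.h>

/* a tiny, fully specified PRNG, so the mapping doesn't depend on any libc */
static uint32_t xorshift32( uint32_t *state )
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

int main( void )
{
    uint32_t  state = 12345;   /* fixed seed: same assignment on every run */
    const int nodes = 4;
    const int items = 10;

    for ( int item = 0; item < items; ++item )
        printf( "work item %d -> node %u\n", item, xorshift32( &state ) % nodes );
    return 0;
}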
There is no such thing as a "floating point reduce operation". You can do a reduction using a floating point calculation as your joining operation, but even in that case, each individual operation is still deterministic. Moreover, a classical reduce is also deterministic, being a fold in one direction or other based on the associativity of the joining operation. The non-determinism comes only from your choice to specify the behaviour of your "parallel reduce" function in a non-deterministic way, and if you do that without understanding the consequences that's hardly floating point's fault.
The MPI standard does not require that reductions use deterministic associativity for the same number of processes (independent of the physical network topology), but it does suggest that this is desirable, and (AFAIK) most implementations are deterministic for the same number of processes.
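To make that concrete, a minimal sketch (assuming an MPI implementation is installed; compile with mpicc, run with mpirun; the per-rank values are contrived): each rank contributes one float and MPI_Reduce sums them at rank 0, and since the standard doesn't pin down the association order, the last bits of the sum can differ between implementations or process counts:

#include <mpi.h>
#include <stdio.h>

int main( int argc, char **argv )
{
    MPI_Init( &argc, &argv );

    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    /* each rank holds a different value, so the order of additions matters */
    float local = 1.0F / ( rank + 1 );
    float total = 0.0F;

    MPI_Reduce( &local, &total, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD );

    if ( rank == 0 )
        printf( "sum over %d ranks = %.9f\n", size, total );

    MPI_Finalize();
    return 0;
}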
There are also plenty of existing models where accuracy is ultimately assured, but not always maintained internally. Branch prediction, for example. The computer guesses which way the branch will go! This can give a big speed boost depending on the workload and the predictor, and the computer assures accuracy by fact-checking the guess down the line.
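A rough way to see that guess-then-verify behavior from user code (my sketch; whether the timing gap shows up depends on the compiler not turning the branch into branchless code): the same loop over the same values, once shuffled and once sorted, produces the identical sum, because mispredicted work is thrown away and redone, but the unpredictable version typically runs noticeably slower:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ( 1 << 20 )

static int cmp_int( const void *a, const void *b )
{
    return *(const int *)a - *(const int *)b;
}

/* the branch inside this loop is what the predictor has to guess */
static long long sum_big_values( const int *data )
{
    long long sum = 0;
    for ( int pass = 0; pass < 100; ++pass )
        for ( int i = 0; i < N; ++i )
            if ( data[i] >= 128 )
                sum += data[i];
    return sum;
}

int main( void )
{
    static int data[N];
    for ( int i = 0; i < N; ++i )
        data[i] = rand() % 256;                  /* unpredictable: taken about half the time */

    clock_t t0 = clock();
    long long unsorted_sum = sum_big_values( data );
    clock_t t1 = clock();

    qsort( data, N, sizeof data[0], cmp_int );   /* now the branch is predictable */

    clock_t t2 = clock();
    long long sorted_sum = sum_big_values( data );
    clock_t t3 = clock();

    printf( "unsorted: %lld in %.2fs\n", unsorted_sum, (double)( t1 - t0 ) / CLOCKS_PER_SEC );
    printf( "sorted:   %lld in %.2fs\n", sorted_sum,   (double)( t3 - t2 ) / CLOCKS_PER_SEC );
    return 0;
}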