Hacker News new | comments | show | ask | jobs | submit login

"The obstacle we shall have to overcome, if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers."

Most of the time, this is not a trade off worth making. I can't think of a researcher that would willingly trade replicability for speed. I can't think of a mathematician who would base a proof on the idea that a number is probably prime. I can't think of a bank customer who would be fine with the idea that the balance displayed on the ATM is pretty close to where it should be. I can't think of an airline passenger who would be totally fine with the flight computer usually being pretty good.

It would be a fine trade off for games, however. And I'm sure there is room for some fudging in complex simulations that have plenty of randomness already.

But given the choice between an answer that is correct, and an answer that is probably correct, I will take the correct answer. Even if I have to wait a little.

I think I understand your point, but I don't agree with some of your examples.

"I can't think of a mathematician who would base a proof on the idea that a number is probably prime."

I can assure you that such a proof probably exists :) Just look at https://en.wikipedia.org/wiki/Probable_prime

Probability theory can be an extremely powerful tool when researching things that are otherwise difficult to reason about. And the theorem statement does not have to be probabilistic for the probabilistic method to be applicable. Just see http://en.wikipedia.org/wiki/Probabilistic_method http://en.wikipedia.org/wiki/Probabilistic_proofs_of_non-pro...

As for the following:

"I can't think of an airline passenger who would be totally fine with the flight computer usually being pretty good."

Actually, I would think it's pretty much the opposite. That is, the only type of airline passenger I can think of, is one who is fine with the flight computer (and the airplane in general) usually being pretty reliable. We already know that computers can malfunction and airplanes can crash. Now, of course, how reliable you want the airplane to be is up to you, but if you want it to be flawless, then you should never board an airplane.

It's not just the examples that are flawed. In most practical situations, provably correct answers do not exist. In most cases, one can only choose a level of certainty. Sometimes not even the level of certainty is possible to know.

Also, no airline passenger cares exactly what trajectory they take through the air to reach their destination. If it's easier to program an autopilot which tacks slightly off course, nobody will care.

Those are examples of problems that exist within the space of mathematics and number games we created for ourselves. Computers as we have them now are great for those.

However, when interfacing to the real, non-exactly specified, incomplete information world, "good enough most of the time" is not only fine, it is all you can hope for. For example, robots that need to navigate rough terrain, or in living orgamisms at the nano-scale, or communicate in human language.

There is a huge class of things that simply cannot be addressed with the current "correct" model of computation because they are not well-specified to the lowest level in the first place.

Computers, in other words, needs to be more like brains. Yes, this does give up certain advantages, and safety guarantees. But it does not give up everything. People have trusted their life to animals and other humans, for example, for all history, even though they are far from perfectly safe and/or compliant...

(and even now, it is "fallible" humans that control the technology, so you could make a point that nothing is lost, as it was never there in the first place)

> Computers, in other words, needs to be more like brains.

I wonder about this. One of the most common human methods of making decisions with incomplete information and/or short time frames is to fall back on heuristics rather than try to reason through the data. Heuristics have the advantage of being fast and available, but they're also notoriously bad at producing the quality of decision that might come from a more thorough analysis of the data.

Frankly, I think I'd feel more comfortable with computers that made decisions through brute force processing and bayesian analysis.

Yes, we'd all like to live in the best of all worlds. But in any real-life scenario, there is a trade-off.

Brute forcing all possible combinations and finding the best one (for your specific goal, or taking into account all externalities as well) would be great. But it is impossible for two reasons:

1) incomplete, or even purposefully incorrect information

2) time constraints/compexity. For a non-linear system with more than a few variables, it is just not feasible to enumerate everything.

You end up with heuristics. The challenge is that remains is finding good heuristics. If the decision is given more time, you can probably come up with better heuristics and even sort-of brute force through the "best" options (which is what humans do when they reason).

In most applications, using a IEEE float is fine. It's a trade-off where a loss in accuracy is acceptable. Lots of digital things (images for instance) are discrete approximations of continuous phenomenon. The question is the degree of inaccuracy that is tolerated and the sorts of things you have to do to prevent it from propagating (which should be familiar to anyone who has to do math and work with finite precision floating point numbers).

floating points have well defined rules and have perfectly accurate calculations, they are only "inaccurate" when used as a computer representation/approximation of real numbers. However, they still do not exhibit any randomness (indeterminacy), are usually not a cause of strange, hard-to-reproduce errors (that concurrent, non-sequential memory instructions often cause).

It depends on if you are using "accurate" in the STEM jargon sense or if you are using it in the common parlance. In jargon, there is a difference between accuracy and precision. http://en.wikipedia.org/wiki/Accuracy_and_precision

Floats are inaccurate (sometimes far from real,) but precise (always yield the same output given the same input.)

Edit: reading the article, it calls out a special (different) meaning of the use of precise in the IEEE float spec

> In the case of full reproducibility, such as when rounding a number to a representable floating point number, the word precision has a meaning not related to reproducibility. For example, in the IEEE 754-2008 standard it means the number of bits in the significand, so it is used as a measure for the relative accuracy with which an arbitrary number can be represented.

Most "accuracy" issues that crop up with floats have nothing to do with either accuracy or precision, but simply with wrong expectations about decimal behaviour.

And yet our brains works that way, and we trust them.

If I understand correctly - no single neuron is neccesary for your brain to work ok. Brain does not depend on any single link working correctly. Somehow it works reliable enough.

It's possible to make reliable systems from unreliable parts.

As a software developer my brain is one thing I certainly don't trust, the brains of others I trust even less. That's exactly why we put things like tests and code reviews in place. Still despite our best efforts our software is full of defects.

From the non-software perspective human brains make big mistakes all the time. Consider car accidents, gross errors in judgement, etc.

We trust our brains, but they're not, in fact, very reliable, not without lots of training anyway. In everything from long-term decision making to remembering why you walked into the room, they make little errors that sometimes get us into big trouble. A large part of the reason computers are so useful is precisely because they don't usually make those kinds of mistakes.

They are reliable for things they are "designed" to. For example for motoric coordination, coordinationg body automatic functions - breathing, pumping blood, etc. They are also highly reliable for new things, if properly trained - like playing music, riding bike, flying plane, playing games, etc. Parsing natural language is not easy task, it involves symbolic processing, yet brains are better at this than computers.

Complaining that brains without training are not reliable is like complaining computers don't work correctly without software.

BTW - many of the errors brains make are not because of unreliable parts, but because of system design - we have some firmware that made sense before, and now is problematic. This firmware is reliable - it works as intended, only now the problems changed, and we can't change the firmware, and patching this in software causes some problems.

It's difficult to say how reliable our brains are at doing things like driving cars or riding bikes because we don't really have anything to compare them to. I expect that it won't take long before we can build automated cars that are far safer drivers than any human.

Parsing natural language is probably inherently non deterministic since no natural language grammers are context free. When we parse a sentance there is almost always at least some ambiguity in the sentence itself, plus we also have to process other things that the person has said; in many cases years in the past and what motivation they may have for saying a certain thing. It is very unlikely that 2 people will get exactly the same meaning out of some natural language sentence because it comes with all kinds of side effects for example something somebody says may also change your opinion of them and other things they have said.

I suspect that if we cannot get deterministic behavior out of future computers because of the amount of parallelization required to make efficient use of their CPUs we will end up with 2 streams of computing , one of which will stay relatively static and be interested in using computers for the types of problems we currently do and another which will be interested in applying it to new problems that do not require determinism.

The software for these computers will be radically different so most likely you will have 2 computers on your desk (or in your pocket , or in VMs), one with < 10 cores and one with > 1000 cores.

I don't know much about how the brain works but I guess this is a process that uses allot of heuristics and psuedo randomness that probably lends itself well to being parallelized which is why we set up our languages this way.

It's not that the brain works "reliably enough"; any number of experiments will show that it doesn't.

It's more that one of the properties of the brain is that it believes itself to be reliable enough.

Isn't every real computer program inherently probabilistic? Even if the code is not, the hardware always has some chance of failure. Even with ECC RAM etc, there is always some chance that multiple bits will get simultaneously corrupted into a new but still valid configuration.

You can't get to 100% reliability. You just have to choose how many nines you want. Why should this be any different?

This is different. If you get a failure, like a flipped bit in RAM, it's a failure. With many-core processors, the normal mode of operation is to provide correct results but they may be different from time to time, assuming more than one correct solution exists. Even if the results are exactly the same or only one solution exists, the process of getting there might be different.

Sure, it's different in terms of why it's not reliable, but the outcome is the same, so why does it matter?

If I understood correctly, the outcome is not the same. In a memory failure you get incorrect results. In a concurrent program, you get correct results but they might not be the same results every time (provided that there's more than one correct answer) or the computation that took us there might be different (in terms of timing, power consumption, etc).

Still seems the same to me in principle. For problems where there are multiple correct answers, probabilistic methods are frequently used to find them (e.g. genetic algorithms). All kinds of things can change timing and power consumption, especially when using a probabilistic method.

Due to round off error correct programs are not a subset of replicable programs. Scientists do trade replicability for speed.

You are perhaps not understanding what I mean by replicability. By replicability, I mean that if you run some statistics on a large dataset several times, you will get the same answer each time. Rounding errors in floating point arithmetic don't harm replicability, because all modern computers have that issue. There is some loss in accuracy, but floating point arithmetic is still deterministic.

> There is some loss in accuracy, but floating point arithmetic is still deterministic.

Not in a multi-core setting. When you do a floating point reduce operation the order in which you reduce your floating point numbers matters, but this order is not necessarily deterministic.

Translation for readers:

Let's say you have some floating point numbers, like...

  float A = 42.90F;
  float B = 00.01F;
  float C = /* ... */;
  float D = /* ... */;
(A + B + C) will not necessarily produce the same result as (A + C + B).

In fact, if you do (A + B + B + B ...) very much, then you will wind up exacerbating the problem --- it can be a source of bugs if you repeatedly increment a counter by small floating point values. For example:

  const int Iterations = 10;
  for ( int i = 0; i < Iterations; ++i )
    A += B;
  printf( "%2.4f\n", A );
The output is 43.0000, as you'd expect. But what if we increase Iterations to 100? We get 43.8998.

Astute readers probably would note that 00.01F can't be exactly represented in base 2, and wonder whether something like 00.0125F would still suffer from this problem. Alas, yes:

  A = 42.9000F 
  B = 00.0125F

  8    iterations ==  43.0000 
  80   iterations ==  43.9001
  800  iterations ==  52.9006
  8000 iterations == 142.8820  /* we would expect 142.9 */
So if you want to parallelize some computations, and you want to do 4000 iterations on core #1 and 4000 iterations on core #2 and then sum the results, well... let's see what happens:

  float A1 = 42.9000F;
  float B1 = 00.0125F;

  float A2 = 42.9000F;
  float B2 = 00.0125F;

  const int Iterations = 4000;
  for ( int i = 0; i < Iterations; ++i )
    A1 += B1;
    A2 += B2;
  float A = A1 + A2 - 42.9000F;
  printf( "%2.4f\n", A );
The output is 142.8885 -- which is different from our earlier result of 142.8820!

You may be tempted to try various things to get around this fact... for example, what about counting from zero rather than from 42.9000? (Answer: the error can be even worse, which is a topic worthy of its own separate post) ... but you will quickly discover that this is a fundamental issue. The order of operations matters.

Which is why every good numeric library should have a "correct" reduce functions implementations that guarantee minimal errors. in case of a sum of an floating point array, that would be a tree-based sum, possibly coupled by sorting the array beforehand.

Also, if you do some kind of map-reduce style work distribution where different nodes perform same operations on different data, it pays off to use a deterministic (pseudo-)random algorithm for work distribution, so that your results can be reproduced.

> When you do a floating point reduce operation the order in which you reduce your floating point numbers matters, but this order is not necessarily deterministic.

There is no such thing as a "floating point reduce operation". You can do a reduction using a floating point calculation as your joining operation, but even in that case, each individual operation is still deterministic. Moreover, a classical reduce is also deterministic, being a fold in one direction or other based on the associativity of the joining operation. The non-determinism comes only from your choice to specify the behaviour of your "parallel reduce" function in a non-deterministic way, and if you do that without understanding the consequences that's hardly floating point's fault.

In practice, the non-determinism is important for efficiency. Consider computing a norm or dot product in parallel. You sum the owned part of the distributed vector, then use MPI_Allreduce() to get the result of the global sum (the parallel implementation uses some tree structure, usually semi-redundantly, perhaps using dedicated hardware). If you run with a different number of processes, then the reduction will necessarily be done with different associativity. Of course any such result is as good as the others (unless the vector is somehow sorted in which case some orders will commit larger rounding error).

The MPI standard does not require that reductions use deterministic associativity for the same number of processes (independent of the physical network topology), but they do suggest that it is desirable and (AFAIK) most implementations are deterministic on the same number of processes.

Map reduce type calculations are less likely to be on kniwn or fixed numbers of pricesses, as they are designed for failing hardware and more flexible setups, unlike most MPI jobs. But most if the calculations are probabky less sensitive to errors too.

Exactly. So long as we're talking about accuracy in terms of precision, and not "5% of the time the result will be nonsense", people already make this sacrifice every day.

There are also plenty of existing models where accuracy is ultimately assured, but not always maintained internally. Branch prediction, for example. The computer guesses which way the branch will go! This can give a big speed boost depending on the workload and the predictor, and the computer assures accuracy by fact-checking the guess down the line.

The concept is less foreign that you think: When you do a Google search, do you get a replicable, provably correct result? Or do you get a result that 99% of the time is good enough?

The user of a search engine is probably willing to get the occasional poorly-ranked result. The user of an AI system is probably willing to get the occasional wrong answer, etc.

Implicit numerical schemes tend to handle errors better.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact