High Performance Computing: Are We Just Getting Wrong Answers Faster? (1998) [pdf] (nd.edu)
83 points by collapse 3 months ago | 34 comments



I think it's worse than this, at least in some fields. In my field, aerospace engineering, the undergraduate curriculum typically has a numerical methods course. But the course is usually focused on basic algorithms — Newton's method, basic numerical integration schemes, etc. — and does not go deeply enough to allow one to derive or understand the numerical methods used in the design of actual aircraft or spacecraft (FEM, solutions to the Navier-Stokes equations, solutions to the PDEs that describe thermal transfer, etc.), so the software that performs these calculations is often treated by engineers as a black box. Admittedly, in a well run project there is usually hardware testing to back up the analysis, but such testing cannot realistically reproduce the conditions a spacecraft will encounter on orbit, or that an aircraft will encounter in near-crash situations.

I once sat on the external departmental review board for my alma mater, along with Mike Griffin (former administrator of NASA). We were interviewing recent graduates, and I brought this point up and got the interviewee to admit that he did not understand and could not double-check the results of his FEM analysis tool. I was trying to get the board to recommend beefing up the numerical methods portion of the curriculum. Dr. Griffin, on hearing the interviewee admit this, agreed to add the recommendation. But nothing changed in the curriculum.


A lot of code is treated like a black box. For example, can you describe to me how ssh works other than schematically? Do you know the details of the cryptographic bits? What you asked for in terms of fundamental numerical algorithms like FEM is somewhat close to this. Scientists hopefully (especially if they are trying to get a graduate degree) should be able to describe it schematically, but in no way should every scientist have to reinvent FEM or finite difference methods again and again, just like people shouldn't implement their own crypto: it can be rather hairy, and making a mistake could be disastrous. Writing complicated algorithms should be left to experts who can be trusted.


> Writing complicated algorithms should be left to experts who can be trusted.

How exactly do you propose we get those experts, if colleges aren't capable of training them?


> How exactly do you propose we get those experts, if colleges aren't capable of training them?

In the majority of cases, colleges (undergrad) aren't in fact capable of training them. The material is too specialized for undergrad. However, colleges can prepare those who are talented and interested in the subject matter for further studies.

Many experts in numerical analysis start specializing in graduate school, and then acquire their expertise through postdocs, academia and industry experience.


This may have been the case when this article was written but I don’t think it’s the case now. Many universities have graduate and undergraduate computational science programs now.

https://www.siam.org/Students-Education/Resources/For-Underg...


noobermin, I don't think this is an accurate comparison, for a few reasons.

First, I'm not proposing that engineers need to write their own FEM tools. I am proposing that engineers understand how FEM works, so that they can understand where it can fail. I would not expect an IT security expert to write his own ssh, but I would expect him to understand the ways in which ssh can fail.

But second, ssh is not like FEM. With ssh, once you have a solid implementation, it sort of doesn't matter what data you put through it. But the issue with FEM analysis is that every FEM model is different, and the generation of those models is almost always algorithmic, not manual. An engineer should have either a great intuitive feel for how well an FEM model will match reality, or straightforward mathematical ways to double-check the output of an FEM analysis.


noobermin, I think many people could actually explain (in gross detail) how SSH works, by the way.


If I were ever to have input into the curriculum for an engineering program, I'd stress adding more about verification and validation of models, not adding more on numerical methods in general. All users of off-the-shelf models must do at least some sort of check on the model. You do not need to understand how the model works to check it, fortunately. Unfortunately, it's rare in my experience for engineers to do V&V. It might take some culture changes to fix this.
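
To make that concrete, here is a rough Python sketch of the kind of check I have in mind: compare a black-box structural result against a closed-form benchmark before trusting it on the real geometry. The run_fem_tip_deflection call below is a hypothetical stand-in for whatever solver you actually use, and the numbers are an arbitrary test case.

    def analytical_tip_deflection(P, L, E, I):
        # Euler-Bernoulli cantilever with a point load P at the free end.
        return P * L**3 / (3 * E * I)

    def check_cantilever(run_fem_tip_deflection, rel_tol=0.02):
        # Arbitrary test case: N, m, Pa, m^4.
        P, L, E, I = 1000.0, 2.0, 200e9, 8.33e-6
        expected = analytical_tip_deflection(P, L, E, I)
        computed = run_fem_tip_deflection(P, L, E, I)
        rel_err = abs(computed - expected) / abs(expected)
        assert rel_err < rel_tol, f"FEM tip deflection off by {rel_err:.1%}"
        return rel_err

None of this requires understanding the solver internals; it only requires knowing a benchmark problem with a known answer.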


Agree with this. Adding more numerical methods at the undergraduate level may or may not achieve the desired result due to limited time in undergrad and lack of mathematical maturity among undergrads. It might end up being just another thing to cram.

I took a number of numerical analysis courses as a graduate student, and we had an entire course on how stuff could go wrong in numerics, e.g. catastrophic cancellation, ill-conditioning, nonconvexity, approximation errors due to discretization, etc. I don't think I could have appreciated all of this as an undergrad because I didn't see the practicality of this stuff until I had to solve large-scale numerical optimization problems as a grad student. Then it all started to come together. Also there was more time in grad school to step back and think more deeply about problems, as opposed to undergrad where you're just rushing to get as many courses done as possible.
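
For anyone who hasn't run into catastrophic cancellation, a few lines of Python make the point (a standard textbook-style demonstration, not something from that course):

    import math

    x = 1e-8
    # Both terms are ~1, so the subtraction wipes out every correct digit.
    naive = (1 - math.cos(x)) / x**2
    # Algebraically identical form with no cancellation.
    stable = 2 * math.sin(x / 2)**2 / x**2
    print(naive)   # 0.0 (all the real information was lost in the subtraction)
    print(stable)  # ~0.5, the correct limit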

Furthermore, numerical computation is as much an art as a science. The only way to learn is through the repeated grind of trying and failing, and there's no way to fit that kind of practice into an undergrad curriculum. Most people learn this stuff on the job or in grad school. The role of an undergrad numerical methods course is usually only as an introduction to the landscape.

In engineering, V&V requires an intuitive understanding of the physical system at hand, which requires experience, which in turn is usually acquired on the job. This is another reason why it is not taught in undergrad. These types of intuitions are best acquired in industry.


Yeah, that's a great point. In the example I had in mind, finite elements, the actual model is typically autogenerated by the software package. The CAD model is not meshed by hand. This issue was alluded to in the OP as well. But you are correct that this issue is not really with understanding the numerical method, it's with the model setup.


If you think that is bad, wait until you see biologists use software. They do not treat it as a black box, they treat it as a god. Obviously this will not hold for all biologists, but the things I have seen horrify me; it's alarming that that is part of science. Output from complex pipelines of software is taken as gospel without any checking, because "who can check all of that?"


I've seen bio data get messed up just by one person opening a CSV in Excel: https://genomebiology.biomedcentral.com/articles/10.1186/s13...

I mean they didn't even edit anything: they just opened the file to look at it and then let Excel save it on close.
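
For what it's worth, a check for the specific failure mode in that paper (gene symbols like SEPT2 and MARCH1 silently turned into dates) only takes a few lines of Python. This is a rough sketch, not a general-purpose validator:

    import csv, re, sys

    # Values like "2-Sep" or "1-Mar" are the telltale sign of an
    # Excel-mangled gene symbol.
    DATE_LIKE = re.compile(
        r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$", re.I)

    def suspicious_cells(path):
        with open(path, newline="") as f:
            for row_num, row in enumerate(csv.reader(f), start=1):
                for col_num, cell in enumerate(row, start=1):
                    if DATE_LIKE.match(cell.strip()):
                        yield row_num, col_num, cell

    if __name__ == "__main__":
        for r, c, cell in suspicious_cells(sys.argv[1]):
            print(f"row {r}, col {c}: {cell!r} may be a mangled gene name")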

But yeah, your typical bio analysis is a lot of clicking around in one piece of software, importing that output into the next software, etc., and is totally unreproducible. Then in the end you get an average and standard deviation (at least it's not usually only p-values anymore), but no one ever checked what the underlying data looked like.

"Is your standard deviation for Group X so large because of low outliers, high outliers or is it a multimodal distribution?" "No idea. Whats a distribution?"


> the software that performs these calculations is often treated by engineers as a black box

IMO this is a deficiency in how programming is treated in engineering courses. It's there, but people are not taught to apply it to everything else, in the way they are with the math courses.

In my degree we did finite element methods, PDEs, and a load of other things, but only in a presentation style. So you'd go into some depth about how the equations worked, but the examples were always little atoms of a real model. So there'd be some diagram of how one element meshes with another and you could do some hand calculations to see what happens.

I think the issue is that you have to be a pretty good programmer to understand both the new FE/PDE/etc material you've been shown as well as be able to abstract it to such a level that you can write sensible code for it. If you're not quite proficient with abstraction, a specific language, as well as the coding toolchain, everything takes a long time and you'll be thinking about how the debugger works rather than how the model works.


My mechanical engineering curriculum included a basic numerical methods course in third year, along with a finite-difference-heavy heat transfer course, then optional courses on FEA and CFD in fourth year that got you through the theoretical backing and writing your own solvers.

I'd assume any accredited Canadian university would have the same.


Note that there are at least three ways to get the wrong answer with the correct code:

- Modeling and discretization errors (usually unavoidable but well understood).

- Uncertainties from the inputs (we have tools to get a feeling for their impact on the result).

- Floating-point arithmetic (scientists tend to be unaware of its impact and/or consider it negligible next to the other sources of error).

My PhD covers methods to quantify the impact of floating-point arithmetic, and suffice it to say that you probably grossly overestimate the number of significant digits of your simulation. On the subject, I recommend "What Every Computer Scientist Should Know About Floating-Point Arithmetic": https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
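
A crude way to see the digit loss without any special tooling is to compare a naive accumulation with a compensated one. The tools in this area do something far more systematic (perturbing the rounding itself), but even this toy Python comparison already disagrees in several digits:

    import math, random

    random.seed(0)
    xs = [random.uniform(-1e6, 1e6) for _ in range(10**6)]

    naive = 0.0
    for x in xs:
        naive += x          # rounding error accumulates at every step

    exact = math.fsum(xs)   # correctly rounded sum of the same values
    print(f"naive sum: {naive:.17g}")
    print(f"fsum:      {exact:.17g}")
    print(f"disagreement: {abs(naive - exact):.3g}")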


"Model inadequacy", i.e., the model being wrong, is also a major source of error.

In my training, it was basically assumed that floating point errors are negligible these days, so I'll check out your recommendation. If you've published any of your own more recent work, I'd be happy to see that as well.


> In my training, it was basically assumed that floating point errors were negligible these days

Yeah, that's what most of us think initially. The problem is that the effects of error can compound in many ways -- nonlinearities, feedback, etc. can all wreck your accuracy enough to make the results meaningless.

A really instructive counterexample for me was solving quadratics. Easy, right? Just try writing a solver for ax^2 + bx + c = 0. Now try breaking it. If all you did was use the quadratic formula, you should find that it's quite easy to break it, and surprisingly difficult to write a robust solver. (If you're completely stuck, consider the most obvious edge case: what if a ≈ 0? Any other scenarios you can think of?)
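
For reference, here is roughly what I mean, in Python. The naive version is a direct transcription of the formula; the second version avoids the cancellation in the smaller root (and still leaves the a ≈ 0, complex-root, and overflow-of-b*b cases as an exercise):

    import math

    def roots_naive(a, b, c):
        d = math.sqrt(b * b - 4 * a * c)
        return (-b + d) / (2 * a), (-b - d) / (2 * a)

    def roots_stable(a, b, c):
        # Compute the "large" root without cancellation, then recover the
        # other one from the product of the roots (x1 * x2 = c / a).
        d = math.sqrt(b * b - 4 * a * c)
        q = -(b + math.copysign(d, b)) / 2
        return q / a, c / q

    # b^2 >> 4ac: the small root from the naive formula is off by ~25% here.
    print(roots_naive(1, -1e8, 1))
    print(roots_stable(1, -1e8, 1))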


I have a paper that should be out in a few months (I might do a Show HN). It details a method and tool, which provides fine-grained information out of the box, that I used on published simulation codes.

In the meantime, if you want a way to explore this subject on your own applications, I would recommend Verrou[0]. It has its shortcomings, but it is probably the easiest way currently available to evaluate the impact of floating-point arithmetic on your code (I did not work on it, but I have exchanged a lot with their team).

[0] https://github.com/edf-hpc/verrou


Interesting paper. I work in fluid dynamics and tend to agree that computational power often is wasted.

To elaborate on a point not discussed in the paper: Many methods used in practical engineering won't necessarily converge to the empirically correct answer as the computational cost increases, say, by decreasing the time step, increasing the number of computational particles, and/or refining a grid. This is true even if the uncertainty is quantified. Without this guarantee, additional computational expenditure could simply be wasted. The problem isn't the lack of computing power, it's the model.

One reason is that "reliable" physics like, for example, the Navier-Stokes equations for fluid flows, are not being used as the model because the computational cost of solving those equations directly is far too high. Instead, lower-cost models are used which may be loosely based on the reliable model but ultimately are not the same thing. A numerical implementation of the lower-cost model might converge to something (which is not necessarily right) as the computational cost increases, but converging to the right answer is less likely than with the reliable model. It depends on the reliability and generalizability of the lower-cost model and how well it has been tuned.

(To those familiar with computational fluid dynamics, convergence is one reason to prefer LES to RANS, though it's worth pointing out that not all LES methods are convergent.)
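
One basic verification check along these lines: run the same case on three systematically refined grids and estimate the observed order of accuracy; if it never stabilizes near the scheme's formal order, extra resolution is not buying you convergence. A minimal Python sketch, with made-up drag coefficients standing in for real results:

    import math

    def observed_order(f1, f2, f3, r=2.0):
        # f1 = finest-grid result, f3 = coarsest; r = refinement ratio.
        return math.log(abs(f3 - f2) / abs(f2 - f1)) / math.log(r)

    def richardson_extrapolate(f1, f2, p, r=2.0):
        # Estimate of the grid-converged value from the two finest grids.
        return f1 + (f1 - f2) / (r**p - 1)

    # Hypothetical drag coefficients from coarse, medium, fine grids.
    f3, f2, f1 = 0.3241, 0.3180, 0.3165
    p = observed_order(f1, f2, f3)
    print(f"observed order ~ {p:.2f}")
    print(f"extrapolated value ~ {richardson_extrapolate(f1, f2, p):.4f}")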

A hypothesis I have is that computing resources have caused many scientists' and engineers' theoretical abilities to atrophy. Perhaps they were never that good on average to begin with, but in my academic field I work in theory partly because I've found so much low-hanging fruit. I don't see much of a point in making expensive computational models if applying basic regression techniques with a simple theory-based model is similarly or even more accurate. These simple models are much more efficient.

Also: I wasn't aware of attempts to bound the results. I attended an uncertainty quantification seminar before and asked one of the lecturers about this more generally and they seemed skeptical of its utility.


As Vladimir Arnold, who used to know a thing or two about fluid dynamics in particular, once said:

> At this point a special technique has been developed in mathematics. This technique, when applied to the real world, is sometimes useful, but can sometimes also lead to self-deception. This technique is called modelling. When constructing a model, the following idealisation is made: certain facts which are only known with a certain degree of probability or with a certain degree of accuracy, are considered to be "absolutely" correct and are accepted as "axioms". The sense of this "absoluteness" lies precisely in the fact that we allow ourselves to use these "facts" according to the rules of formal logic, in the process declaring as "theorems" all that we can derive from them.

> It is obvious that in any real-life activity it is impossible to wholly rely on such deductions. The reason is at least that the parameters of the studied phenomena are never known absolutely exactly and a small change in parameters (for example, the initial conditions of a process) can totally change the result. Say, for this reason a reliable long-term weather forecast is impossible and will remain impossible, no matter how much we develop computers and devices which record initial conditions.

https://www.uni-muenster.de/Physik.TP/~munsteg/arnold.html


>> a small change in parameters (for example, the initial conditions of a process) __can__ totally change the result. Say, for this reason a reliable long-term weather forecast is impossible and will remain impossible

(emphasis mine)

A small change in initial conditions can totally change the result in chaotic systems such as the weather or, to take a much simpler example, the double pendulum.

There are systems where a small change changes the results only slightly. E.g. a single pendulum.
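
The sensitivity is easy to demonstrate in a few lines of Python, using the logistic map in its chaotic regime as a stand-in for the double pendulum or the weather:

    def logistic_trajectory(x0, steps, r=4.0):
        # r = 4 puts the logistic map in its chaotic regime.
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1 - xs[-1]))
        return xs

    a = logistic_trajectory(0.200000, 50)
    b = logistic_trajectory(0.200001, 50)  # perturb the sixth decimal place
    for n in (0, 10, 25, 50):
        print(f"step {n:2d}: {a[n]:.6f} vs {b[n]:.6f}")
    # Within a few dozen steps the two trajectories are completely unrelated,
    # even though the initial difference was only 1e-6.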


Regarding the bounding of numerical results, we currently have precise methods that cannot be used on real-size applications, and methods that can be used at scale but tend to tell you that your result is between minus infinity and plus infinity. I have seen it used successfully in small math kernels and embedded systems, but genuine applications usually require deep modifications of the algorithms to get any useful result.


Speaking of bounding results, interval arithmetic comes to mind, and the work of Ulrich Kulisch [0]. I've used it together with automatic differentiation with some remarkable results, but the problems were comparatively small (approx. 40-dimensional optimization).

[0] http://www.math.kit.edu/ianm2/~kulisch/de


In this context, I also highly recommend John Gustafson's book, The End of Error. It is full of great examples, and at one point contrasts advances in numerical computing with improvements in other areas:

"In 1970, a printer might produce something that looks like this, and take about 30 seconds to do so: [picture] Over forty years later, a laser printer still might take 30 seconds to put out a single page, but technology has improved to allow full-color, high-resolution output."

The book makes a compelling argument that working with floating point numbers is similar to printing thousands of low-quality pages per second, instead of getting a single high-quality page in 30 seconds. A few paragraphs later, the book states:

"Eventually it becomes so obvious that change is necessary that a wrenching shift finally takes place. Such a shift is overdue for numerical computing, where we are stuck with antediluvian tools. Once we make the shift, it will become possible to solve problems that have baffled us for decades. The evidence supporting that claim is here in this book, which contains quite a few results that appear to be new."

For more information, see:

https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gus...

and John Gustafson's home page:

http://johngustafson.net/


He has great examples. Sadly, he found shortcomings in his unum numbers, and his new alternative[0] does not seem that great compared to the current status quo.

[0] https://posithub.org/index


unums are at best very controversial. In actuality, I don't think I've heard a numerical analyst (save for Gustafson) come out in favor of them.

The original proposal that Gustafson had involved a variable-width format, which is going to absolutely kill performance in practice (particularly on the GPU-based supercomputers that are all the rage nowadays). The posit format was essentially a recognition that variable-width doesn't work.

The more serious error is that, well, interval arithmetic turns out not to be useful in practice. Essentially, the potential accumulation of error is exponential in the number of steps in the worst case, so interval arithmetic in practice too often gives you the answer "the result is 1, but it could have been anywhere between -inf and +inf", which is completely useless. He may complain that he doesn't want to use a numerical analyst to get work done, but the rebuttal is that numerical analysis is still necessary anyways, so promising that one isn't necessary is a false promise.
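
To illustrate the blow-up with a toy example (a deliberately naive interval class in Python, nothing like a production implementation): the "dependency problem" means every reuse of a variable is treated as an independent unknown, so widths can grow with every operation even when the true value barely moves.

    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)
        def __sub__(self, other):
            return Interval(self.lo - other.hi, self.hi - other.lo)
        def __mul__(self, other):
            p = [self.lo * other.lo, self.lo * other.hi,
                 self.hi * other.lo, self.hi * other.hi]
            return Interval(min(p), max(p))
        def __repr__(self):
            return f"[{self.lo:g}, {self.hi:g}]"

    x = Interval(0.999, 1.001)
    print(x - x)              # [-0.002, 0.002], not [0, 0]
    y = x
    for _ in range(20):       # iterate y = y * (2 - y); true value stays near 1
        y = y * (Interval(2.0, 2.0) - y)
    print(y)                  # the enclosure is now enormous (it may even
                              # overflow to [-inf, inf])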


At least with (-inf, +inf) you know that the answer is nonsense and that you need to go back to the algorithm to find a better order of operations.

With ordinary floating point, all too often people don't realize their results are nonsense. Adding measurement and computation error bounds to their graphs is illuminating.


This is indeed a key point in John Gustafson's arguments: With floats, even though there are specialized flags and values that let you detect certain problematic situations, these flags are often not readily accessible in programming languages, and hence these issues tend to go unnoticed.


> these flags are often not readily accessible in programming languages

Unless you use C, C++, or Fortran. Which accounts for most numerical algorithms code. Yes, most people aren't aware of them, but not being aware of all the functions in the standard library isn't the same as them not being present there.


Just because the error interval is [-inf, +inf] doesn't mean the actual error is [-inf, +inf]. There are other techniques (such as Monte Carlo simulation) that can give much narrower error bounds for the actual computation.
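
A sketch of that idea in Python: perturb the inputs within their assumed uncertainty, rerun the computation many times, and look at the spread of the outputs. It is a statistical estimate rather than a guaranteed bound, but it is usually far tighter than the worst-case interval.

    import random

    def monte_carlo_spread(f, inputs, rel_uncertainty=1e-12, trials=1000):
        results = []
        for _ in range(trials):
            perturbed = [x * (1 + random.uniform(-rel_uncertainty, rel_uncertainty))
                         for x in inputs]
            results.append(f(*perturbed))
        return min(results), max(results)

    # An ill-conditioned expression: the subtraction cancels most digits,
    # so tiny input perturbations show up in the leading digits of the result.
    f = lambda a, b: (a - b) / (a + b)
    print(monte_carlo_spread(f, [1.0000001, 1.0000000]))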


How do I calculate measurement and error bounds for my plots when I'm using floating point? I honestly have no idea what to google.


More than 90% of numerical analysis would remain if continuous arithmetic were exact. There exist some prominent failures of finite precision arithmetic (and they resonate with uninformed audiences), but discretization and modeling errors are far more insidious and a deeper challenge for reliable scientific computing.


Agreed. Though I'd say that model errors are a challenge, I wouldn't call them insidious. There is generally serious analysis of the effect of the chosen discretization method/grid size, how to handle sub-grid-scale effects, etc. Likewise for models.

I think it is interesting to compare this old panic with ML work. Without gazillions of flops the neural network approach was dead. After a few years it became clear that single/double precision was unnecessary, and now for certain problems half precision is standard.

Where one resource (people, CPU, ram, money) is the fixed constraint, other factors will be tweaked to compensate.


I don't know what domain you work in. The modest 1986 editorial statement from the Journal of Fluids Engineering is still pretty much a gold standard for rigor in real-world applications and is exceedingly rarely achieved in domains where direct observational data is hard to come by (most of geophysics, much of biophysics, etc.). E.g., https://jedbrown.org/files/20160912-NumericalAccuracy.pdf




