
From personal experience (anecdote alert!), errors are also common in the ostensibly stone-cold-hard field of algorithms in computer science. A few years back I went on a string algorithm kick, and started dredging up old algorithm papers from the 80's on which to build Wikipedia articles.

Often, the papers would get the general idea right, but if implemented as described would not work at all or fail on edge cases. The best example I have is an algorithm to find the lexicographically-minimal string rotation[0]. The simplest and fastest algorithm to do this is based on the KMP string search algo, and is tribal knowledge among ACM ICPC competitors. I thought it was pretty neat and wanted to cement this algorithm in popular knowledge, so I set about researching and writing the Wikipedia article.

I found the KMP-based algorithm in a 1980 paper[1] by Kellogg S. Booth. The paper has very detailed pseudocode which does not work. At all. The tribal knowledge version I inherited had similarities in the general idea of the algorithm (use of the KMP preprocessing step) but everything else was different. I scoured the internet for a retraction or correction, but all I found was a paper written in 1995[2] which mentioned in passing errors in the 1980 paper.
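For the curious, here's roughly what the working KMP-based version looks like. This is a Java sketch of my understanding of the corrected algorithm (the one that ended up in circulation), not Booth's original pseudocode:

```java
public class LeastRotation {
    // Booth's algorithm (corrected form): returns the starting index of
    // the lexicographically minimal rotation of s, in O(n) time, using a
    // KMP-style failure function computed over s doubled.
    static int leastRotation(String s) {
        String t = s + s;               // every rotation is a substring of s+s
        int n = t.length();
        int[] f = new int[n];           // failure function
        java.util.Arrays.fill(f, -1);
        int k = 0;                      // start of the best rotation so far
        for (int j = 1; j < n; j++) {
            char c = t.charAt(j);
            int i = f[j - k - 1];
            while (i != -1 && c != t.charAt(k + i + 1)) {
                if (c < t.charAt(k + i + 1)) k = j - i - 1;
                i = f[i];
            }
            if (i == -1 && c != t.charAt(k)) {
                if (c < t.charAt(k)) k = j;
                f[j - k] = -1;
            } else {
                f[j - k] = i + 1;
            }
        }
        return k;
    }
}
```

For example, `leastRotation("cbab")` returns 2, because "abcb" is the smallest of the four rotations.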

I do wonder exactly how common this is. I emailed a professor who co-wrote one of the papers, and he replied that "it seems to me that all the algorithms (including our own) turned out to have errors in them!" Has anyone done studies into errors in computer science papers?

[0] https://en.wikipedia.org/wiki/Lexicographically_minimal_stri...

[1] http://www.sciencedirect.com/science/article/pii/00200190809...

[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.9...




There is a point of view that says that computer science conferences exist for the purpose of gaming the tenure system. The name of the game is plausible deniability: you're not supposed to submit papers that contain known false claims, but everything else is fair game. And this has become such an integral part of the culture that technical correctness is no longer a necessary condition for accepting a paper [1]. I think in this light it's quite clear why many scientists are happy to leave their papers hidden behind the ACM paywall.

[1] http://agtb.wordpress.com/2013/04/14/should-technical-errors...


Thank you, that was a fascinating read. It is understandable that technical errors are given a pass, as they aren't the meat of the paper. In the case of the Booth paper, I really should state I do not mean to attack him. The idea of using the KMP preprocess to solve the problem is a wonderful approach and works very well despite the actual implementation being technically incorrect. If I recall, the bug had to do with the termination condition; the algorithm had to run twice as long to terminate correctly. I will say my understanding of the algorithm improved as a result of debugging it!


I think it's pretty common. Here's a note I wrote regarding Luc Devroye's highly-regarded (and excellent!) book on univariate random number generation:

Although not dissenting from the other reviews which tout the comprehensiveness of the treatment and its level of detail, I have to add an unpleasant fact about the algorithms: the codes may not work as written, and if they don't, there's not an easy way to track down the problem. (This is because of the nature of the constructions used in the complex constant-time algorithms -- this opaqueness is not a problem for the elementary algorithms which, alas, may not run in constant time.)

A look at the author's web site (currently at errors.pdf off his main page) shows that, e.g., the algorithm on page 511 [of the book] for Poisson r.v.'s has four serious bugs as originally published. This means that the main algorithm for one of the most important discrete distributions was not coded and tested by the author before the book appeared!

In fact, I believe this algorithm has at least one more bug, because I'm still seeing a small off-by-one anomaly in my implementation. The algorithm for binomial r.v.'s may have trouble as well -- I see problems for N=400, p=0.05. After 10 million draws (i.e., enough to get good statistics) I see deviations of counts in some bins near the peak (i.e. number of integer outcomes of the R.V.) of 8 standard deviations from the expected number of counts. So, be careful, and consider alternate implementations of the more complex algorithms.

There's a lot of detail in the book, and the techniques are valid. But as we all know, implementation sometimes reveals algebra errors!
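A practical tip when implementing from the book: keep a deliberately dumb reference sampler around to cross-check the fancy constant-time algorithms against. Here's Knuth's product-of-uniforms Poisson sampler in Java; it's O(lambda) per draw, but short enough to verify by eye:

```java
public class PoissonSample {
    // Knuth's simple Poisson sampler: multiply uniforms until the product
    // drops below exp(-lambda). Slow for large lambda, but trivially
    // correct, which makes it a good oracle for testing complex samplers.
    static int poisson(double lambda, java.util.Random rng) {
        double limit = Math.exp(-lambda);
        double product = rng.nextDouble();
        int k = 0;
        while (product > limit) {
            product *= rng.nextDouble();
            k++;
        }
        return k;
    }
}
```

Comparing bin counts from this against a constant-time implementation, over a few million draws, would have caught the kind of 8-sigma deviations described above.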


Implementations of common algorithms are even worse.

The Java binary search implementation had a bug that eluded detection for nine years, and it was based on an implementation from the 1986 book "Programming Pearls" that contained the same bug (TL;DR: it's an integer overflow in the midpoint calculation that computers in 1986 would probably never have triggered; who could imagine having an array with more than 2^30 elements?!).

Even worse: "While the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962" -- and this bug suggests it is likely that even the 1962 version would have failed the same way.

See http://googleresearch.blogspot.com.au/2006/06/extra-extra-re...
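The whole bug boils down to one expression. A minimal Java sketch of the broken and fixed midpoints (variable names mine, not from the JDK source):

```java
public class MidpointOverflow {
    // The buggy midpoint from the 1986 "Programming Pearls" code and
    // Java's pre-2006 Arrays.binarySearch: (low + high) / 2 wraps
    // negative once low + high exceeds Integer.MAX_VALUE, i.e. for
    // arrays bigger than about 2^30 elements.
    static int buggyMid(int low, int high) {
        return (low + high) / 2;
    }

    // The fix: compute the offset first, so the intermediate value
    // never exceeds the larger index.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }
}
```

With `low = 1 << 30` and `high = (1 << 30) + 2`, the buggy version returns a negative number, which the search then uses as an array index.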


Curiously, most C programs with 2^30-element arrays will probably have 64-bit ints, and will therefore work. At the time the original code was written, I suppose the ints would have been 16-bit?


This is a problem in computational electrophysiology as well. There are a lot of typos and other simple mistakes in classic papers. Say you want to implement a finite-differences model for a certain type of voltage-gated potassium channel. If you go back to the original paper, it's not uncommon to find minus signs omitted, parentheses misplaced, or other unfortunate bugs. It can take a lot of head scratching and wasted time to get to the point where you can reproduce the figures from the paper!

Granted, when something has been in the literature for a long time, the derivative papers and popular implementations (in, e.g., NEURON) are usually right, but there is rarely anything in the scholarly record that documents these errors. It's all tribal knowledge and side channels.
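For a flavor of what these equations look like, here's a sketch of a Hodgkin-Huxley-style potassium gating variable in Java, using the standard modern-convention rate constants (V in mV, time in ms) and forward Euler, which is exactly where a dropped minus sign or misplaced parenthesis bites:

```java
public class KChannelGate {
    // Hodgkin-Huxley potassium-channel activation variable n.
    // alpha and beta are the classic squid-axon rate constants in the
    // modern convention (resting potential near -65 mV).
    static double alphaN(double v) {
        return 0.01 * (v + 55.0) / (1.0 - Math.exp(-(v + 55.0) / 10.0));
    }
    static double betaN(double v) {
        return 0.125 * Math.exp(-(v + 65.0) / 80.0);
    }
    // One forward-Euler step of dn/dt = alpha*(1 - n) - beta*n.
    static double step(double n, double v, double dt) {
        return n + dt * (alphaN(v) * (1.0 - n) - betaN(v) * n);
    }
}
```

A quick sanity check: at rest (V = -65 mV), n should relax to alpha/(alpha+beta), which is about 0.318 -- if your implementation settles somewhere else, go hunting for a sign error.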


Ugh, that sounds terrible. During a previous internship at a HPC company[0] I implemented a computational electrodynamics FDTD algorithm as given in the Taflove book[1], and I made more than enough errors even without the book containing mistakes! Two fields, each with three components and subtly different equations for all. What a nightmare. Especially since it's impossible to tell what is wrong when watching the EM wave propagate in an impossible oblong fashion during your simulation.

[0] http://www.acceleware.com/

[1] http://www.amazon.ca/Computational-Electrodynamics-Finite-Di...
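To show how little it takes to go wrong, here's a toy 1-D Yee-scheme update in normalized units (a sketch with the Courant number baked in at 1, not the full Taflove formulation): the two update loops differ only in which neighbor gets subtracted, and swapping them is all it takes to get that impossible oblong propagation.

```java
public class Fdtd1d {
    // Minimal 1-D FDTD (Yee) leapfrog step in normalized units.
    // Ez lives on integer grid points; Hy lives on the staggered
    // half-grid (hy[i] sits between ez[i] and ez[i+1]).
    static void step(double[] ez, double[] hy) {
        for (int i = 0; i < hy.length - 1; i++)
            hy[i] += ez[i + 1] - ez[i];     // H update: right neighbor minus self
        for (int i = 1; i < ez.length; i++)
            ez[i] += hy[i] - hy[i - 1];     // E update: self minus left neighbor
    }
}
```

In 3-D you have six of these loops instead of two, each with its own pair of offsets and signs, which is where the nightmare comes from.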


This is why I think it is important to have a language designed to express algorithms whose correctness can be verified, at least up to a point.


> This is why I think it is important to have a language designed to express algorithms whose correctness can be verified, at least up to a point.

Do you mean math?


Mathematicians make errors. I want something that can be checked formally.


> Mathematicians make errors. I want something that can be checked formally.

So you mean math?




