Nelson Rules (wikipedia.org)
219 points by misterdata on July 31, 2015 | 40 comments



Out of curiosity, with ordinary normally distributed data I wonder what the probability is that the most recent point triggers one of the rules. I guess you could calculate these things separately for an approximation, but I'd probably just want to simulate it...

For a first cut:

- Rule 1: 0.3% of samples are more than 3 standard deviations from the mean.

- Rule 2: 1/2^8 = 0.4% chance the previous 8 points were on the same side of the mean as the most recent one.

- Rule 5: roughly a 2.5% chance of being more than 2 sd out on a given side; 3 choose 2 is 3, times 2 sides, so about a 0.375% chance of exactly 2 of the last 3. "2 or 3" is not much higher.

- Rule 6: More than 0.55%, if I've done my maths right.

- Rule 7: 0.3%

I guess you're going to get a lot of false positives if you're sampling reasonably frequently -- maybe one in 50?
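
A quick Monte Carlo sketch of the combined per-point rate for a few of these (my reading of rules 1, 2 and 5 only; assuming i.i.d. standard normal data with a known mean and sd):

    import numpy as np

    rng = np.random.default_rng(0)

    def flags_latest_point(x):
        """Check rules 1, 2 and 5 against the most recent point of window x
        (data assumed to have known mean 0 and sd 1)."""
        if abs(x[-1]) > 3:                                 # Rule 1: beyond 3 sd
            return True
        if np.all(x > 0) or np.all(x < 0):                 # Rule 2: 9 in a row on one side
            return True
        last3 = x[-3:]                                     # Rule 5: 2 of 3 beyond 2 sd, same side
        if np.sum(last3 > 2) >= 2 or np.sum(last3 < -2) >= 2:
            return True
        return False

    trials = 200_000
    hits = sum(flags_latest_point(rng.standard_normal(9)) for _ in range(trials))
    print(hits / trials)   # roughly 1% for just these three rules

Adding the remaining rules should push this up towards the couple-of-percent range estimated above.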


I got interested in this, so I whipped up a notebook testing my interpretation of the rules against random sequences of different lengths. Failure rates are the number of sequences that contained an error, not the number of errors in all the sequences (so a sequence that had 3 errors counts as one failure).

A couple of the rule descriptions, rules 5 & 6, were ambiguous (at least to me).

I'll upload a PDF output when I get LaTeX installed again (downloading the HTML file is probably the easiest way of quickly seeing the output).

Edit - updated.

PDF:

http://files.figshare.com/2196604/Analysis_of_the_false_posi...

PDF/HTML/notebook

http://dx.doi.org/10.6084/m9.figshare.1499204


Damn, missed the edit window. Fixed rule #5

http://files.figshare.com/2196656/Analysis_of_the_false_posi...

Same DOI, new version.


I'm probably missing something obvious, but why random sequences when the Nelson rules appear to be aimed at measurements of real-world properties?


Well, the assumption here is that we've got something we're measuring with a steady mean, plus either inherently noisy variation or some measurement error on top.

Lots of real-world samples follow the normal distribution, and anything that does should look roughly like that sim.

So there's no deep reason to use random numbers, but it's a very quick way for me to get data that looks like real data, where I know the mean and standard deviation are correct and that there should be no genuine anomalies.

My sim can only show one side of the story, though; it can't show how often real issues are picked up. For that, we'd probably want to look at real-world data and investigate each reported issue to see what proportion are important (and then possibly try to see how many were missed).


A small random variance or a bunch of uncorrelated errors ought to produce something very similar to a normal distribution, which we can model with random generation.

The Nelson rules are basically an attempt to determine whether apparently 'healthy' data is actually being driven by some specific forcing events (e.g. oscillation instead of random variance), so they look for departures from what a normal distribution would produce.


A lot of this is the context... You're not trying to prove something is mathematically perfect. You're trying to figure out if the variation is common cause [0] or a special cause. It's ok if you get this wrong from time to time because it's decision support, and it's better to act on imperfect information than to wait too long for perfection.

[0] https://en.wikipedia.org/wiki/Common_cause_and_special_cause... Common causes are addressed by fixing the system as a whole, while special causes are addressed by fixing one-offs. For example, if you are addressing variation in delivery times, you address small delivery variance by improving the maps, which helps every delivery. You address the one-off variation by firing the guy who takes 3-hour lunches mid-delivery every few weeks.


Can I reverse your numbers and say that Rule 1 gives 99.7% confidence that a data point is out of control? That seems a little on the low side: 0.3% is about one in every 300 data points, so if you're measuring every few seconds the system would generate such an outlier a couple of times an hour. That's too many for my sysadmins to generate an alert on anyway.

When do you generate an alert? I'd say a false positive once a month would be acceptable. That'd be around 6-sigma confidence if you measure every 5 seconds.
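
Rough arithmetic on that target (assuming i.i.d. Gaussian noise and a plain two-sided threshold; scipy only for the inverse normal):

    from scipy.stats import norm

    samples_per_month = 30 * 24 * 3600 / 5   # one measurement every 5 seconds
    p_false = 1 / samples_per_month          # target: about one false alarm per month

    z = norm.isf(p_false / 2)                # two-sided Gaussian threshold
    print(z)                                 # about 4.8

So in plain Gaussian terms it's closer to 4.8 sigma; add the conventional 1.5-sigma shift allowance used in "six sigma" accounting and "around 6-sigma" is about right.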


You're missing a crucial parameter, which is how often you think your system goes out of control.

You want a high Pr(problem|alert) but also a high Pr(alert|problem); the trade-off you choose between them depends on how often you expect problems to occur. If they are rare then you want false positives to be rare. If problems are frequent then false positives can happen more often without affecting Pr(problem|alert) so much.
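
A toy numerical version of that trade-off (made-up rates, just to show the base-rate effect):

    def p_problem_given_alert(base_rate, p_alert_given_problem, p_alert_given_ok):
        """Bayes' theorem: Pr(problem | alert)."""
        p_alert = (p_alert_given_problem * base_rate
                   + p_alert_given_ok * (1 - base_rate))
        return p_alert_given_problem * base_rate / p_alert

    # Rare problems: a 2% false positive rate swamps the signal.
    print(p_problem_given_alert(0.001, 0.9, 0.02))   # ~0.04
    # Frequent problems: the same false positive rate is tolerable.
    print(p_problem_given_alert(0.1, 0.9, 0.02))     # ~0.83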


Thanks for this. Rule 5 immediately stood out to me as a high error source, and I came to the comments hoping someone had done the actual math.

My guess is that positives from these rules would be logged rather than immediately reacted to. If over 1000 detections you break each rule at about the expected rate, all is well. If you break a few rules well outside of expectation, it raises some questions.


By error source you mean false positive source? Because according to the analysis doc, rules 1 and 6 have much higher false positive rates.


Small communities become smaller when certain people share similar interests. -Tarblog


At my work in ad tech, we see about 1.5% instead of 2.5% for our publishers


I guess that's the idea. If you are manually sampling a process, or looking at a graph manually created by somebody else, you shouldn't see two rules broken on the same day.

Of course, nowadays we can get so much data that we can create a procedure where an event with 10^-3 chance is seen several times a day.


How big is your simulation test set? For a small number of elements, that outstanding Rule 5 result could be just a coincidence... ;)

(I'm trying to take a rigorous approach here; by intuition it seems to me that Rule 5 should generate more false positives than the other rules.)


This is not about simulation tests, but more likely about industrial processes.

If you are already paying $60K per year to your QA technicians, it does not matter if the rule fires once per hour or once per week. You want those guys to investigate every incident. Most of the time these will be false positives, but every other month QA will catch enough minor defects to pay for itself. The real benefit, though, is to catch the bigger multi-million-dollar issues that come along every other year.


Harry Nyquist would probably have something to say about the validity of rule four. Fourteen points in a row alternating between increasing and decreasing isn't just indicative of an oscillation, but of an oscillation close enough to your sampling frequency that you aren't actually able to measure it accurately. It could be a real oscillation, or it could be an artifact of much higher-frequency behavior.
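
A small illustration with a hypothetical pure sine (my reading of Rule 4 as fourteen consecutive points whose differences strictly alternate in sign):

    import numpy as np

    n = np.arange(30)                        # 30 samples at sampling rate fs = 1

    def sampled_sine(freq, phase=0.4):
        """A pure oscillation at `freq` cycles per sample, seen through our sampling."""
        return np.sin(2 * np.pi * freq * n + phase)

    def breaks_rule4(x, n_points=14):
        """True if some n_points consecutive samples strictly alternate up/down."""
        d = np.sign(np.diff(x))
        need = n_points - 1                  # differences inside an n_points-long run
        for i in range(len(d) - need + 1):
            w = d[i:i + need]
            if np.all(w != 0) and np.all(w[1:] == -w[:-1]):
                return True
        return False

    print(breaks_rule4(sampled_sine(0.49)))  # near Nyquist (0.5 cycles/sample): alternates -> True
    print(breaks_rule4(sampled_sine(0.93)))  # above Nyquist: aliases to a slow ~0.07-cycle wave -> False

The 0.49-cycle signal shows up as the alternating pattern, while the 0.93-cycle one aliases into something that looks like a slow oscillation instead.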


I think Rule 4 is valid because it is intended to catch both situations you describe: oscillations at approximately the Nyquist frequency and also much higher frequency behavior (Rule 8 is intended to catch oscillations below the Nyquist frequency).

In any of these cases, the Nelson rules state that the signal is unlikely to be pure random variation around the given mean. It is instead likely to have some underlying shape that is not being described.


The thing is, something going on above the Nyquist frequency could create ANY of the Nelson rule patterns - or none of them. Expecting it to create a nice alternating up/down pattern is wishful thinking.


Yeah but the rule isn't "if this doesn't happen everything is fine", rather "if this happens, something's probably wrong".


I'm also thinking that in the case of Rule 2 or 3 you could have an oscillation with a frequency close to or equal to the acquisition frequency.


Related: https://en.wikipedia.org/wiki/Moscow_Rules

Which includes my favorite: "Once is an accident. Twice is coincidence. Three times is an enemy action."


Interesting - your bringing up operational security in the context of spotting anomalies in data reminds me of the information warfare concepts in Neal Stephenson's Cryptonomicon. The challenge for some of the World War II era characters was to prevent Nelson-pattern-like deviations from normal randomness from turning up in data available to the enemy, so the enemy wouldn't suspect that their cryptosystems had been cracked.


> 3. Everyone is potentially under opposition control.

Even the author of the rules? Hmm…


"Being under opposition control" does not necessarily imply "is not telling the truth."


I think I read that first in Goldfinger. :-)


Thanks for this reference!

GCSE Statistics (a UK school exam taken at 16) teaches a simpler system of process control rules, closer to the Western Electric rules (https://en.wikipedia.org/wiki/Western_Electric_rules), and that is the only place I have ever come across them.

Is this in current practical use?


Oh yes. Yes yes. Learn this and understand how it applies to your systems, your processes, and especially (surprise) your people.

This is one quarter of how W. Edwards Deming promoted organizational quality control—understanding how variation works, period. (The other three being understanding psychology, understanding systems, and understanding the theory of knowledge or scientific method).

This applies directly to understanding whether observed variation has a common cause (is a natural pattern of the system), or is special cause (something unexpected): https://en.wikipedia.org/wiki/Common_cause_and_special_cause... and this impacts how you handle the variation.

For those criticizing validity, I'll say this is a way to mentally model how to understand variation, and it is not meant to be 100% accurate. You're trading perfect math for intuitive modeling. But it will allow you to get close in a back-of-the-napkin way so you can identify patterns to study in more depth. Also, think of this in the context of many types of systems, not just a tight electrical signal pattern (which is an easy system to understand): systems of people doing software development, machines in manufacturing processes, complex network error patterns, and so on.

People don't often have a good idea of what's important and what's noise, especially when you don't even have a control chart but are just using intuition and a few data points. We see outliers and variations all the time in processes, especially in human processes like those we encounter in most software companies. Estimation and delays, developer performance, load failures; all kinds of complex systems that exhibit variation that people are usually "winging it" to understand.

Instead of understanding the variation and the data, people often handle every large variation in the same way, trying to "fix" it or peg it on some obvious correlation they think they observe. This says: hold on, understand what you're looking at intuitively first. Then gather more data. Don't act without understanding. Deming was fond of saying, "don't just do something, stand there!" Lots to be learned from that, and much to be gained from the simple intuitive understanding of patterns in variation.


This seems terribly fragile and ad-hoc. It doesn't even take the sampling rate into account, and it clearly depends on it.

I guarantee there are better methods.


That's exactly right. They are heuristics. Control charts were designed to be used on a manufacturing floor by workers who updated their charts manually on paper and called an engineer to aid in further investigation when a manufacturing process might be out of control. Making out-of-spec parts or having an expensive machine break because it's operated outside its limits costs real $$, so some false positives on a low-frequency signal can be worth it. Also, manufacturing engineers or QC would be responsible for the frequency of testing and the sampling procedures, so it's unlikely to be as naive as you might assume from the outside.


It is terribly fragile and ad-hoc. It depends on what you're trying to detect, but the control charts in the OP produce a maddening number of false positives. I had a job that, for about six months, consisted almost entirely of adjusting control limits by hand so the false signal rate would drop.


Since I've been preempted on the "this is typical quality-control ad-hockery BS" comment, I'll play devil's advocate and argue that the point of quality control is to identify two components of a mixture distribution[0]: a bounded distribution of uncertainty, which can be modeled as a beta ("PERT", in some patois), and an unbounded "error" term that behaves more like a Poisson or even a Pareto.

This is already an adhockish simplification of something like Mandelbrot's seven regimes of randomness [1], which is itself, well, an oversimplification of his own work. But it formalizes the insight that quality-control is trying to impart -- the identification of inconsistencies among consistent variation.

So let's run some simulations in Matlab. We'll generate M numbers distributed like Beta and N like a Pareto (a "long tail", "black swan" distribution) with identical mean and standard deviation, and shuffle them before we interpret them as a time-series. Then we'll check Nelson's rules. Since we know how many ordinaries there are, we have a target.

In 10^4 repeated simulations, each with samples of 180 ordinary betas and 20 Paretos, the abnormals make up 10% of each sample, so that is the target identification rate. Now, my samples are shuffled, and the Nelson rules rely on time structure (but this is precisely their weak spot); my code [2] also has visible bugs I didn't bother to fix because they'd involve thinking too hard and didn't seem so serious in large samples. Still, here are the identification rates:

- Rule 1: 0.5%. This will be counter-intuitive to students of the normal distribution, but recall that the ordinary observations have a bounded distribution and we're really catching only the abnormals. Now, we've missed a lot of the 10% by this rule.

- Rule 2: 6.32535%

- Rule 3: 3.4421%

- Rule 4: 4.484%

- Rule 5: 27.83735%. This is the "medium tendency for samples to be mediumly out of control".

- Rule 6: 8.49465%

- Rule 7: 4.38775%

- Rule 8: 2.1294%

(Edited after some bug fixes that, comfortingly, didn't change the results by much!)

[0] https://en.wikipedia.org/wiki/Mixture_distribution [1] https://en.wikipedia.org/wiki/Seven_states_of_randomness [2] http://lpaste.net/137664
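
For anyone who'd rather not read Matlab, here is a rough Python sketch of the same sort of setup (my own distribution parameters, not the code at [2]), checking only Rule 1:

    import numpy as np

    rng = np.random.default_rng(1)

    def standardized_beta(n, a=2.0, b=2.0):
        """Beta(a, b) rescaled to mean 0, sd 1: the bounded 'ordinary' component."""
        mean = a / (a + b)
        sd = np.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
        return (rng.beta(a, b, n) - mean) / sd

    def standardized_pareto(n, alpha=2.5):
        """Pareto(alpha, x_m = 1) rescaled to mean 0, sd 1: the heavy-tailed 'error' component."""
        x = rng.pareto(alpha, n) + 1.0       # numpy's pareto() is Lomax; +1 gives classical Pareto
        mean = alpha / (alpha - 1)
        sd = np.sqrt(alpha / ((alpha - 1) ** 2 * (alpha - 2)))
        return (x - mean) / sd

    n_sims, caught, abnormals = 10_000, 0, 0
    for _ in range(n_sims):
        series = np.concatenate([standardized_beta(180), standardized_pareto(20)])
        labels = np.concatenate([np.zeros(180, bool), np.ones(20, bool)])
        perm = rng.permutation(series.size)  # shuffle before reading it as a time series
        series, labels = series[perm], labels[perm]
        flagged = np.abs(series) > 3         # Rule 1: beyond 3 sd of the (known) mean
        caught += np.count_nonzero(flagged & labels)
        abnormals += int(labels.sum())
    print(caught / abnormals)   # with these parameters only ~1% of heavy-tailed points are caught

The beta component is bounded well inside 3 sd, so everything Rule 1 flags here is genuinely abnormal, but it misses the vast majority of the abnormals.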


A confusion matrix would help a lot here to give a better idea of classification performance.


I have a B.S. in Statistics and I have never heard the term "Nelson rules". However, all of this information was taught under other names when we were dealing with normality. Also, don't forget to convert your data to a standard normal distribution (and it is not that simple: you have to run some normality tests as well!!!). And of course you will always make mistakes, because even 4th-year Statistics students make mistakes...
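
For example, a minimal version of that standardize-and-check step (hypothetical data; Shapiro-Wilk as one of the usual normality tests):

    import numpy as np
    from scipy import stats

    x = np.random.default_rng(0).normal(10.0, 2.0, 200)   # stand-in for real measurements

    z = (x - x.mean()) / x.std(ddof=1)    # standardize to mean 0, sd 1

    # Check the normality assumption before leaning on rules that depend on it.
    stat, p = stats.shapiro(x)
    print(f"Shapiro-Wilk p-value: {p:.3f}")   # a small p-value suggests the data isn't normal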


I wonder what application, if any, this has to finance/stock charting?


Stock prices are more complex than that. In general they don't follow a normal distribution.

They also tend to have some sort of trend.

They might be autocorrelated.

Sometimes they show seasonality.

And different time series are correlated with each other to some degree.
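
A sketch of the kind of checks that expose those issues (on a synthetic stand-in series, not real market data; real returns typically show much heavier tails and clustered volatility):

    import numpy as np

    rng = np.random.default_rng(2)
    returns = rng.normal(0.0005, 0.01, 1000)        # toy drift + noise
    prices = 100 * np.exp(np.cumsum(returns))       # stand-in "price" series

    log_ret = np.diff(np.log(prices))
    z = (log_ret - log_ret.mean()) / log_ret.std(ddof=1)

    print("excess kurtosis:", (z ** 4).mean() - 3)                    # large positive => heavy tails
    print("lag-1 autocorrelation:", np.corrcoef(z[:-1], z[1:])[0, 1])
    print("trend (mean log return):", log_ret.mean())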


The closest thing to this in finance would be technical analysis (different from quantitative analysis, which has more basis in reality). But as other posters pointed out, this stuff has a fair amount of selection bias clouding how useful it really is.


monitoring of manufacturing processes.


We live by these things in the clinical lab, though we know them as the copyrighted version, the "Westgard rules". Glad to have the original, thanks!


There is also a MILSPEC for general quality control processes, though I can't find the particular document at the moment.

If you're one of the people in the thread describing this as too subjective or strict, the MILSPEC is probably more appropriate for your process.



