
The Patriot Missile Failure - AndyBaker
http://ima.umn.edu/~arnold/disasters/patriot.html
======
dmourati
In 1993, while in college, I wrote a paper about the fallacies of the Patriot
Missile effectiveness. What I learned in writing it was that the so-called
accuracy of the Patriot was grossly exaggerated. The mainstream media, and
indeed the country, needed some symbol to rally around and the Patriot Missile
became that symbol. The battery's very name started the process.

The US touted a 95% accuracy.

[http://www.slate.com/articles/news_and_politics/war_stories/...](http://www.slate.com/articles/news_and_politics/war_stories/2003/03/patriot_games.html)

In reality, a "success" didn't mean disarming the scud. It meant flying on a
trajectory sufficiently close to the scud to have disabled it if all else
worked as designed.

~~~
otoburb
The touted 95% accuracy was the primary reason US expats would sometimes
_leave_ their houses when the sirens sounded with their home cams to try to
catch the Patriots intercepting incoming Scud missiles during the Gulf War.

I lived on the civilian airport during my youth in Riyadh during the Gulf War.
The other expats thought the Americans were crazy no matter what the supposed
accuracy.

It wasn't until after the war that we all realized the interception rates in
the field were so much lower.

Pretty surreal.

~~~
midas007
The question then: is lying to people kinder than freaking them out with the
truth?

~~~
TeMPOraL
This breaks down when your lie goes a full circle around the feedback loop and
now someone above you is making the decision to pursue further a solution that
was based on a lie.

------
snacktimetoday
As an Israeli who lived through the Gulf War and served in the army, I can
attest that the patriot missile was a success and massive failure at the same
time. It was better than nothing, let me just say that.

The problem was that we knew it had issues, complained many times, but it was
tied up in politics. We wanted to develop and deploy our own missile defence
systems for a long time, but in many ways we were more or less blackmailed
into spending the defence loans we receive to pay back the American defence
establishment. The message was take what you are given and enrich private
American companies, or else (btw for the haters, we must spend our "aid" with
companies like Lockheed, Raytheon, Boeing, etc., it does not go to anything
else at all, so really it's your tax dollars shilling your military industry).

Anyway, I saw first hand Patriot misses and the fear after that, especially
regarding chemical weapons. A huge part of the country spent time with gas
masks and plastic in safe rooms during the gulf war. At the end of the war, we
felt like our leaders failed protecting us sufficiently, especially when they
knew there were issues.

The interesting outcome is this directly led to various missile programs
including kipat barzel, arrow, spider, and others. Before, missile defence was
a much harder sell, but the aftermath of the Patriot failures raised the case that
again as a country we had to be more self-sufficient regardless of the cost.
The other reason of course is that the Americans never really had an offering
that addressed needs such as short-range, low flying projectiles, rockets,
shells, multiple-target tracking, etc. Today, we have arguably the most
advanced short-range and tactical missile defence systems.

All of these systems are built heavily on targeting/guidance and are meant to
run on cheap hardware that can fail massively. The interceptors and computers
are not necessarily cost effective, and they are super expensive, but they are
much more practical.
Additionally, redundancy in terms of overlap of targeting errors and misses is
a heavy part of deployment. Resource-wise, it's not always possible, but I
know first-hand it is a combination of various operational failures of the
Patriot combined with years of relentless attacks by our enemies using
anything from glorified flying garbage to old Soviet tech.

~~~
allochthon
> The interesting outcome is this directly led to various missile programs
> including kipat barzel, arrow, spider, and others.

Good for you guys. One gets the sense that the US defense industry is
breathtakingly inefficient, and overcomes this handicap only through a massive
infusion of funding.

~~~
ejain
Or perhaps it's the massive infusion of funding that makes the US defense
industry breathtakingly inefficient? :-)

~~~
dragonwriter
The US defense industry is breathtakingly efficient at extracting money from
the US government, which is, after all, its whole point.

The US _military procurement_ system may be breathtakingly inefficient at
procuring cost-effective equipment, but that's a different issue.

------
angersock
For most of the whiz-bang web apps being written today, this sort of thing
doesn't matter.

Every so often, though, working on embedded devices or medical software or
finance stuff, it becomes really important that you remember that lives depend
--in a non-trivial and quite real way--on your code being correct and on the
implemented algorithms fitting the problem.

Something that's very tricky isn't just understanding that the code does what
it says it does, but that the code implements a solution that is properly
modeled to the problem at hand.

EDIT:

My background is in mechanical engineering, and I never forget this quote by
Dr. Dykes:

 _" Engineering is the art of modelling materials we do not wholly understand,
into shapes we cannot precisely analyse so as to withstand forces we cannot
properly assess, in such a way that the public has no reason to suspect the
extent of our ignorance."_

Software engineering in life-critical applications is serious business.

~~~
jrmenon
> Software engineering in life-critical applications is serious business.

Probably one of the most tragic cases of a SW bug:

[http://en.wikipedia.org/wiki/Therac-25](http://en.wikipedia.org/wiki/Therac-25)

~~~
angersock
Yep. And that's why you really, really want hardware failsafes in addition to
whatever nonsense code you're writing.

~~~
hga
True, but those are not always possible, nor can cover everything.

What struck me the most was the criminal incompetence of the developers, both
of the total system as it moved to software control and especially the
software itself. Not to mention the Crown Corporation's response to the
problem.

------
jug6ernaut
While science fiction and not directly related, one of the Tom Clancy novels
went into the science of missile intercept pretty extensively (it was a very
important part of the book).

This book brought to my attention the mathematics and, for lack of a better
term, difficulty of having a missile intercept system. Not only the mathematics
but the raw limitations of physics that must be dealt with. In the
case of the book it was ICBMs, which travel much faster, 7 km/s (15,700
mph)[1], compared to the 3749.11 mph of the Scud missile.

For anyone interested the book is Tom Clancy - The Bear and the Dragon.

[1]
[http://en.wikipedia.org/wiki/Missile_defense](http://en.wikipedia.org/wiki/Missile_defense)

~~~
arethuza
This clip of a Sprint ABM test gives some idea of how incredibly fast ICBM RVs
are - it makes the 0-Mach 10 in 5 _seconds_ Sprint look like it is hardly
moving:

[https://www.youtube.com/watch?v=msXtgTVMcuA](https://www.youtube.com/watch?v=msXtgTVMcuA)

~~~
hga
As I remember, source would be Jerry Pournelle, I think, from the time ICBM
RVs hit enough atmosphere that mass separates warheads from decoys (if your
decoy weighs as much as a warhead, it's pointless), to the time it hits, is
about 10 seconds.

Not a lot of time for a discrete point defense (as opposed to e.g. throwing a
curtain of stuff up between you and the RVs).

~~~
cameldrv
Sprint was a last ditch system. It was supposed to intercept the warheads that
leaked through the Spartan system, and at only about 100k ft. At that point,
the decoys would have burned up. Of course both types of interceptor had
nuclear warheads. 100 Sprints going off over Grand Forks wouldn't be very
pretty... The real point was to defend the missile silos though, and they
might have worked for that.

~~~
hga
Enhanced radiation ("neutron bomb!!!") warheads for Sprint, emphasize the
neutron flux, lessen the explosive yield.

Better way up there than the RV warheads detonating on the surface, although
the successor proposed LOADS planned for a 75K foot intercept to among other
things decrease EMP effects. And, hey, wouldn't all those bright lights in the
sky look pretty ^_^?

------
danbruc
This makes no sense in the way it is stated - the calculations involved are
independent of the up-time. You do not aim at a target differently depending
on how long the system has been up. The correct way to think about this is
probably the following - they noticed that the up-time drifted and tried to
improve on that, but they failed to do so in all places. In consequence,
different parts of the system used different times that also advanced at
differing speeds, and these inconsistencies caused the calculations to go
wrong. As mentioned in the article, had they not made the "improvement", all
parts would have used the same time and the errors would have canceled, because
the calculations are time-independent and the drift is probably too small to be
important during the relatively short time a target approaches. This seems to
be one of the rare cases where blaming math with limited precision is wrong.

~~~
jessaustin
Do you have additional information about this incident? Although TFA alludes
to something like what you're saying with its " _inaccuracies did not cancel_
" comment, your point seems to require an assumption that is not stated there.
Aren't you assuming a non-distributed system, when it seems very likely that
radar sensors and missile launchers would _not_ be co-located? In a
distributed system, we certainly _cannot_ assume identical boot dates, and so
an accumulating error would accumulate differently at different components of
the system.

Systems like this must have time agreement among their different components.
It seems that a system developed later would have just used GPS to prevent
this problem, but I wonder why they didn't use NTP?

~~~
danbruc
The obvious evidence is the laws of physics, which are invariant under time
translation. What you are saying is exactly what I said - it does not matter
if the drifting clocks are in different parts of a distributed system or if a
single system uses different clocks. In the end the system fails because
different clocks disagree on the current time, not because they drift away
from the actual time. And the article states exactly this - they tried to fix
the clock drift relative to the actual time but failed to do so in all
components and thereby inadvertently introduced two clocks drifting relative
to each other.

~~~
jessaustin
_In the end the system fails because different clocks disagree on the current
time, not because they drift away from the actual time._

The possibility I mentioned was that the clocks disagreed _because_ they had
drifted away from actual time at similar rates, for different periods. _[r ≠ 1
∧ t₀ ≠ t₁] → [t₀ + (t-t₀)r ≠ t₁ + (t-t₁)r]_. Unless specifically addressed,
this phenomenon _will_ occur in distributed systems.

That said, the link 'gibrown provided seems to establish firmly that this
particular error resulted from times being up-converted (24bit fixed to 48bit
floating) via different methods in different parts of the system.
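
The equal-rate, different-boot-time case in the formula above can be sketched in a few lines. This is only an illustration: the function name, the drift rate and the boot times are all made-up assumptions, not values from the Patriot system.

```python
from fractions import Fraction

def clock_reading(t, boot, rate):
    """Reading of a clock that was set correctly at `boot` and has since
    advanced at `rate` seconds per real second (hypothetical model)."""
    return boot + (t - boot) * rate

# Assumed numbers, purely for illustration.
RATE = Fraction(999_999, 1_000_000)   # both clocks lose 1 microsecond per second
BOOT_A = 0                            # component A booted at t = 0
BOOT_B = 50 * 3600                    # component B booted 50 hours later
NOW = 100 * 3600                      # look at both clocks at t = 100 hours

disagreement = clock_reading(NOW, BOOT_B, RATE) - clock_reading(NOW, BOOT_A, RATE)

if __name__ == "__main__":
    # Same drift rate, different boot times: the clocks still disagree.
    print(float(disagreement))
```

With these assumed numbers the disagreement works out to (BOOT_B - BOOT_A) * (1 - RATE), so it depends on the gap between boot times and not on how long either clock has been running.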

~~~
danbruc
Now I see what you mean, equal drift rates but different up-time. Assuming two
clocks coming up at different points in time this will indeed cause time
differences if they are initially synchronized with a third clock. If the
clock coming up later synchronizes with the other clock already running there
should be no issues though.

I read the article, too, and these two different ways of converting are more or
less equivalent to two clocks starting in sync but drifting relative to each
other. Therefore my initial comment was quite on the spot.
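
A rough sketch of the "two conversions act like two drifting clocks" point. It assumes the tick counter is an integer count of tenths of a second, that the coarse conversion chops 1/10 to 23 binary places, and that the "improved" conversion keeps 46 places. Both bit widths and all names are assumptions for illustration, not the real Patriot code.

```python
from fractions import Fraction

def chopped_tenth(frac_bits):
    """1/10 truncated (not rounded) to frac_bits binary places."""
    scale = 1 << frac_bits
    return Fraction(scale // 10, scale)

COARSE = chopped_tenth(23)   # assumed original conversion constant
FINE = chopped_tenth(46)     # assumed "improved" conversion constant

def interval_error(t1_ticks, t2_ticks, const_1, const_2):
    """Error in the measured interval t2 - t1 (seconds) when t1 is converted
    with const_1 and t2 with const_2."""
    measured = t2_ticks * const_2 - t1_ticks * const_1
    true = Fraction(t2_ticks - t1_ticks, 10)
    return float(measured - true)

UPTIME_TICKS = 100 * 3600 * 10   # 100 hours of tenths of a second
T1 = UPTIME_TICKS
T2 = UPTIME_TICKS + 10           # one second later

same_conversion = interval_error(T1, T2, COARSE, COARSE)
mixed_conversion = interval_error(T1, T2, COARSE, FINE)

if __name__ == "__main__":
    print("same constant on both timestamps: ", same_conversion)
    print("different constants per timestamp:", mixed_conversion)
```

Under these assumptions the consistent case is off by well under a microsecond over the one-second interval, while the mixed case carries the whole accumulated drift (about a third of a second after 100 hours) into the difference.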

------
fiatmoney
What's interesting is that this is the kind of thing where a standard "boot-
and-test" routine is likely to miss it (because the clock starts out in sync).
Time-dependent bugs are always tricky.

------
bradleyy
I was an 18 year old grunt, fresh out of basic training, when I went to Iraq
in 1991.

After about 48 hours of living on tarmac at the airport (where I experienced
the first of many MOPP-4 chemical warnings: let me tell you how fun it is to
have a gas mask, charcoal suit and rubber accoutrements while lying on
tarmac, psychosomatically creating the nerve agent symptoms they just taught
us), I moved to a high-rise apartment complex.

I was in a North-facing room, on the northernmost edge of the complex. Open
desert as far as the eye can see to the North. I watched scuds get shot down
(seems like every night, but memories are wont to be inaccurate).

Here's the thing: I hear all this talk about the Patriot missile being
inaccurate, and I seem to remember something like "no patriot ever shot down a
scud". That takes some serious parsing to arrive at-- because I saw patriots
"hit" scuds, but of course they could have "exploded in the vicinity of"
scuds, destroying them.

From what I hear, the Patriots were running on 100-mile-an-hour tape (you did
know that the military has its own version of duct tape, right?!) and bubble
gum, but I'm thankful for them nonetheless; you tend to take what defense you
can when folks shoot at you.

------
madengr
Even if it didn't fail (numerically), there is a decent probability it still
would have failed to intercept the scud:

[http://en.wikipedia.org/wiki/MIM-104_Patriot#Success_rate_vs...](http://en.wikipedia.org/wiki/MIM-104_Patriot#Success_rate_vs._accuracy)

The U.S. Army claimed an initial success rate of 80% in Saudi Arabia and 50%
in Israel. Those claims were eventually scaled back to 70% and 40%. However,
when President George H. W. Bush traveled to Raytheon's Patriot manufacturing
plant in Andover, Massachusetts, during the Gulf War, he declared, the
"Patriot is 41 for 42: 42 Scuds engaged, 41 intercepted!"[28] The President's
claimed success rate was thus over 97% during the war.

~~~
at-fates-hands
You left this part out though...

"Patriot PAC-3, GEM, and GEM+ missiles both had a very high success rate,
intercepting Al-Samoud 2 and Ababil-100 tactical ballistic missiles.[17]
However, no longer-range ballistic missiles were fired during that conflict.
The systems were stationed in Kuwait and successfully destroyed a number of
hostile surface-to-surface missiles using the new PAC-3 and guidance enhanced
missiles"

~~~
hga
Indeed. The real world performance of Patriot PAC-2, the first generation
tweaked for BMD but using the same hardware except for increasing the size of
the warhead's projectiles, says little about how later versions and
generations (PAC-3 uses a dedicated missile, 1/4 the size and therefore 4
times as many in a launcher) performed in the real world.

Since in this case the real world = actual instances of enemies shooting live
missiles at you trying to kill you, it's been proven in some acid tests. And
one reason Guam is sprouting Patriot and THAAD batteries.

------
bronson
The devops fix? Reboot every night. Works wonders. :/

~~~
salem
This is exactly what the Israelis figured out and told the Pentagon before
this failed interception, but the information was not passed on to units in
Saudi Arabia.

~~~
snacktimetoday
The information was not passed on because it was damaging to the military
industrial complex interests at the time. Americans at that time in particular
really felt infallible after the Cold War and did not accept feedback very
well. The skunkworks/American military machine was starting to crack apart
with laziness as things descended into contracting hell.

As I mentioned in my other comment, my perspective as an Israeli witnessing
all this was, "Don't bite the hand that feeds you." It is a shame that
Americans and others had to pay a price for arrogance and politics. It may not
be the official story, but anyone exposed to that world could tell you
otherwise.

~~~
salem
They did however release a software patch within about 2 weeks of getting the
report from Israel, but it took 10 days to make it to Saudi Arabia, one day
late.

~~~
snacktimetoday
In time of war, minutes are an eternity. No reason these soldiers should have
had to wait so long other than inefficiency.

------
briantakita
Here's a post mortem report sent to the House of Representatives:

[http://www.fas.org/spp/starwars/gao/im92026.htm](http://www.fas.org/spp/starwars/gao/im92026.htm)

------
boiler_up800
We talked about this in my microcontrollers class today! Not this exact
incident but about the problems associated with using the system clock in a
micro for precise timing.

Our micro (Freescale HC9S12) is clocked by a 24 MHz crystal quartz oscillator
and if you use the timing module to clock the micro, you cannot get exactly
1ms, 10ms, etc. This makes summing the time over a long period tricky.

~~~
azernik
What was the solution proposed in your class?

Off the top of my head, I can only think of keeping time values in units of
clock-ticks (equivalent to the Linux "jiffy") and only converting to/from
milliseconds when turning programmer or user input into internal values.

~~~
sitkack
This. There is almost no reason to put time back into ms when absolute clock
ticks will do. Never sum low precision values, etc.
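
A small sketch of the difference, assuming a 0.1 s tick and emulating single-precision floats with `struct` (the function names are hypothetical). One version adds 0.1 on every tick in low precision, the other keeps an integer tick count and converts once at the end.

```python
import struct

def f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def summed_seconds(n_ticks):
    """Accumulate 0.1 s per tick in single precision (the thing to avoid)."""
    total = f32(0.0)
    step = f32(0.1)
    for _ in range(n_ticks):
        total = f32(total + step)
    return total

def tick_seconds(n_ticks):
    """Keep an integer tick count and convert to seconds only once."""
    return n_ticks / 10

if __name__ == "__main__":
    one_hour = 36_000  # ticks in one hour at 10 ticks per second
    print("summed in float32:", summed_seconds(one_hour))
    print("integer ticks / 10:", tick_seconds(one_hour))
```

The integer count stays exact for as long as the counter does not overflow; the running sum picks up a rounding error on nearly every addition, and those errors grow as the total gets larger.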

------
allochthon
I'm not that knowledgeable about hardware, but the fact that they were using a
"24 bit fixed point register" sounds like they were not using a commodity
processor, even for 1991. I assume that the error described would simply not
arise today on a modern CPU running Linux?

~~~
gcb0
Medical and military equipment thrive on outdated hardware just because they
can save a few hundreds of dollars on a product that will cost thousands or
millions by avoiding going thru the simple certification process for the new
hardware.

It's a simple case of beancounters saving little in obvious places and wasting
a lot later (in engineer time, testing, recalls, and in this case soldiers'
lives).

Sadly, everyone in those industries buys that idiotic notion and repeats it
like a mantra.

~~~
bronson
You've never worked for a military contractor have you? It's never black and
white.

Let's say you ran the project that converted the missile from the militarized
rad-hard ceramic-capped chip ($$!) that's likely in there now to a nice and
cheap off-the-shelf Arm7. Except you forgot about coefficient of expansion for
120degF middle-of-the-day Middle East launches. Your Arm7 flexes too much and
loses some bits, your Patriot misses, the soldiers lose their lives, and now
the parallel universe equivalent of you can spout off on Internet forums about
how the military just doesn't respect traditional, proven designs and is
always wasting money on following the latest trend.

Also, you probably don't realize when the Patriot entered service? And the
number of upgrades it's received?

~~~
protomyth
Don't forget the fun of making sure the chip is built for an extended period
of time to the exact specs that you started with.

------
dicroce
If the time was in tenths, why would you multiply it by 1/10 to get seconds?
Shouldn't you be multiplying it by 10 to get seconds?

update: Ahh, I get it. You'd need to divide it by 10 to get seconds.

~~~
garblegarble
I think they mean an integer storing tenths of a second (so 10 = 1 second) ,
so 10 * (1/10) == 10 * 0.1 == 1.0

~~~
dicroce
So, shouldn't it have just been a divide by 10?

~~~
foldr
Multiplying by 1/10 is the same as dividing by 10.

~~~
russell
And faster too. For older processors without hardware divide, much much
faster.
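
For what it's worth, a minimal sketch of why multiplying by a stored 1/10 is not quite the same as dividing by 10 once the constant lives in a fixed-point register. The 23 fractional bits are an assumption, picked because they reproduce the roughly 0.000000095 per-tick error quoted in TFA; `chopped_tenth` and `drift_seconds` are hypothetical names, not anything from the Patriot code.

```python
from fractions import Fraction

FRAC_BITS = 23  # assumption: binary places kept when 1/10 is chopped

def chopped_tenth(frac_bits=FRAC_BITS):
    """1/10 truncated (not rounded) to frac_bits binary places."""
    scale = 1 << frac_bits
    return Fraction(scale // 10, scale)

def drift_seconds(uptime_hours, frac_bits=FRAC_BITS):
    """Clock error after counting tenths of a second for uptime_hours and
    converting with the chopped constant instead of an exact 1/10."""
    ticks = uptime_hours * 3600 * 10
    per_tick_error = Fraction(1, 10) - chopped_tenth(frac_bits)
    return float(ticks * per_tick_error)

if __name__ == "__main__":
    for hours in (1, 8, 100):
        print(f"{hours:>3} h uptime -> {drift_seconds(hours):.6f} s behind")
```

Under those assumptions, 100 hours of uptime gives a drift of roughly a third of a second, which is the order of magnitude TFA reports.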

------
6d0debc071
> Ironically, the fact that the bad time calculation had been improved in some
> parts of the code, but not all, contributed to the problem, since it meant
> that the inaccuracies did not cancel.

\-----------

I wonder what trade-off informed their decision not to have the time code in
one place and pass it around as they needed it. Or at least defined in one
place and then inserted where they needed it if there was some requirement
that it be right there in the functions that wanted to know about time.

:/

Just seems a danged odd thing to do.

------
singold
And I didn't want to take that Numerical Methods course...

------
adolgert
There's an algorithm to deal with this kind of additive small error, the Kahan
Summation algorithm. It keeps a second number to record the residual of an
addition so that adding something small to something large doesn't go awry.
With modern floating point, it's rarely an issue, but with a mere 24 bits, I
guess it was a terrible problem.
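
A minimal sketch of Kahan (compensated) summation in ordinary Python floats, next to a naive running sum. `math.fsum` is used only as a reference value. Worth hedging: this helps with error that builds up during the additions; it would not by itself fix a constant that was already chopped before being used.

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Kahan summation: carry the rounding error of each addition forward."""
    total = 0.0
    comp = 0.0                   # running compensation for lost low-order bits
    for v in values:
        y = v - comp             # apply the correction to the next term
        t = total + y            # low-order bits of y may be lost here
        comp = (t - total) - y   # recover what was just lost
        total = t
    return total

if __name__ == "__main__":
    ticks = [0.1] * 1_000_000
    reference = math.fsum(ticks)
    print("naive error:", abs(naive_sum(ticks) - reference))
    print("kahan error:", abs(kahan_sum(ticks) - reference))
```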

~~~
pedroo
This sounds similar to the technique used in the integer-only version of
Bresenham's line drawing algorithm.
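
A sketch of that Bresenham-style idea applied to timekeeping: an integer accumulator carries the remainder forward, so converting hardware ticks to milliseconds never throws away a fraction. The 32768 Hz clock rate and the function name are assumptions for illustration only.

```python
def hw_ticks_to_ms(hw_ticks, hw_hz):
    """Count whole milliseconds over hw_ticks hardware ticks using only
    integer arithmetic, carrying the remainder like Bresenham's error term."""
    acc = 0   # remainder, in units of 1/hw_hz of a millisecond
    ms = 0
    for _ in range(hw_ticks):
        acc += 1000              # each hardware tick is 1000/hw_hz ms
        while acc >= hw_hz:      # a full millisecond has accumulated
            acc -= hw_hz
            ms += 1
    return ms, acc

if __name__ == "__main__":
    HW_HZ = 32768  # assumed clock rate that does not divide evenly into 1 ms
    print(hw_ticks_to_ms(HW_HZ, HW_HZ))    # one second of hardware ticks
    print(hw_ticks_to_ms(12345, HW_HZ))    # leftover remainder is kept, not dropped
```

The invariant is that `ms * hw_hz + acc` always equals `hw_ticks * 1000`, so no error accumulates over long runs.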

------
danbmil99
I had a very similar bug in some software I wrote and used to perform in a
band. The upshot was that everything worked fine at soundcheck; 6 hours later,
the tempos were off by about 8%, and varied pseudo-randomly during each song.

The fix was to switch from float (24 bits precision) to a long int (32 bits),
and to reset all the relevant vars between songs.

Fun times!

------
bruceb
What is next? Iraqi soldiers didn't take babies out of incubators as Congress
and the American people/world was told?

Oh yeah, that was a lie also.
[http://en.wikipedia.org/wiki/Nayirah_%28testimony%29](http://en.wikipedia.org/wiki/Nayirah_%28testimony%29)

------
abruzzi
Dumb question, but could you solve the problem by simply using 1/16th of a
second, not 1/10th?
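
A quick way to check the arithmetic side of this: 1/16 is a power of two, so it has an exact binary representation, while 1/10 does not. The sketch below assumes the constant is chopped to 23 binary places (an assumption, as are the function names). Whether the hardware clock could actually tick in sixteenths is a separate question this does not address.

```python
import math
from fractions import Fraction

def chop(value, frac_bits):
    """Truncate a non-negative Fraction to frac_bits binary places."""
    scale = 1 << frac_bits
    return Fraction(math.floor(value * scale), scale)

def per_tick_error(value, frac_bits=23):
    """What is lost each time the chopped constant stands in for value."""
    return value - chop(value, frac_bits)

if __name__ == "__main__":
    print("1/10:", float(per_tick_error(Fraction(1, 10))))
    print("1/16:", float(per_tick_error(Fraction(1, 16))))
```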

------
caycep
I'm impressed the GAO investigators got into pretty technical detail.

For a government agency, I know the GAO reports are generally considered
objective and nonpartisan. For politico experts - how has the GAO managed to
not go, say, the way of the EPA?

------
systematical
Next time I have a "bad" bug I'll try and remember this.

------
midas007
So if one happens to own or operate an early Patriot battery, wait until the
last possible moment to fire it up so accuracy doesn't go to shit. Wow.
Engineering maths fail.

