
“Has something ever failed in the field due to a bug in a CAD tool?” - dang
https://twitter.com/paulg/status/726140975530151936
======
sammcf
I had a tremendously embarrassing failure in a sheet metal part designed in
SolidWorks due to a bug in the way bend allowance is calculated. Long story
short, the flat pattern developed by SW was about 10% dimensionally
inaccurate, which resulted in a rotating part binding up in the field and
going blammo. This was years and years ago, as a fresh young designer, so I
reported the issue to my reseller, and the response was essentially "yeah,
you're gonna have to take that into account." I've since learned that this
sort of idiosyncratic difference in calculating certain values - things like
bend radii and bend allowance, beam extensions on structural frames, etc. -
is par for the course for CAD packages. Pretty much every engineer I know has
learnt this lesson the hard way.

I guess it's not really a bug per se, just hidden behaviour that you have to
learn about and take into consideration.
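For readers who haven't hit this: bend allowance is conventionally computed
from a "K-factor" that locates the neutral axis within the sheet thickness,
and CAD packages differ in the default K-factor they apply. A minimal sketch
using the standard textbook formula (the numbers here are illustrative
defaults, not SolidWorks's actual internals):

```python
import math

def bend_allowance(angle_deg, inside_radius, thickness, k_factor):
    """Arc length of the neutral axis through a bend.

    k_factor locates the neutral axis as a fraction of the thickness,
    measured from the inside face. Tools pick different defaults
    (commonly 0.3-0.5), which is one source of flat-pattern
    discrepancies between packages.
    """
    return math.radians(angle_deg) * (inside_radius + k_factor * thickness)

# The same 90-degree bend in 2 mm stock, under two plausible defaults:
ba_a = bend_allowance(90, 3.0, 2.0, 0.33)  # ~5.75 mm
ba_b = bend_allowance(90, 3.0, 2.0, 0.50)  # ~6.28 mm
```

Two reasonable K-factor choices already differ by roughly 9% on this one
bend, which is in the ballpark of the ~10% flat-pattern error described
above.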

~~~
rtpg
I don't get this though...

Like if the software calculates bend radii that will break things, isn't that
a bug? What's the explanation for that even happening? Is it just that the
formula involved means that you can't always get a "right" answer, so the
software will be like "well, this is the best we can do, figure out the rest"?

~~~
chris_wot
So what are you gonna do? Move to another CAD package? Take a look at the
reasonably large number of CAD packages listed on Wikipedia here:

[https://en.m.wikipedia.org/wiki/Comparison_of_computer-aided_design_editors](https://en.m.wikipedia.org/wiki/Comparison_of_computer-aided_design_editors)

Virtually all of them are proprietary. They are too expensive to move away
from, and every one of them has flaws that firms work around and build their
workflows on. Retraining, proprietary file formats, and the disruption of
migrating an essential part of most businesses make switching prohibitive.

Frankly, IMO the CAD marketplace is an example of how closed source really
bites an industry. Funnily enough, most engineers - who would never accept
inaccurate tolerances on items like screws or bolts - seem to accept that this
is the way it is and, despite the stories we read here, let the situation
continue.

I'm frankly amazed there aren't more failures in infrastructure from bugs in
CAD software. That's more a testament to the fact that engineers are engineers
though - when they realise there are problems, they dig in, figure out where
the problem is and find a solution, and in the case of CAD software I really
think they build or design their way around the problems.

Overall, CAD software is, despite all the flaws of being closed source, still
pretty amazing and vendors do fix issues eventually. We can do so many more
things in society because of it, but as we evolve as a society it becomes
apparent there are limitations. Eventually CAD software vendors will be
disrupted, but we're a long way from that point right now!

~~~
Niksko
> Eventually CAD software vendors will be disrupted, but we're a long way from
> that point right now!

Not so sure we're that far off. Nobody really has much incentive to innovate
in this space because it's so niche. If you're a believer in the version of
the future where 3D printers are as commonplace as 2D printers are today, then
this may all change.

Suddenly, 3D modelling skills are today's word processing skills (either for
work or for fun), and there are a whole bunch of people working on coming up
with innovative CAD solutions for the masses.

~~~
CompelTechnic
I would argue that SolidWorks has been doing a good bit of innovation. The
fact that it's been stealing more and more market share over time speaks to
that.

~~~
auxym
Each new version of SW introduces new "features" (some genuinely useful, some
shiny marketing gimmicks) and a host of new bugs. Software churn at its best:
gotta release a new version to make money; just fixing bugs isn't profitable.
Oh, and make sure file formats are not backwards compatible, even if for no
good reason but to force upgrades.

------
kazinator
The answer is a resounding YES, from me right here. Wait, did you say in the
_field_? Maybe not exactly then.

I routed a two-layer PCB (printed circuit board) using a buggy program whose
DRC (design rule check) did not catch that a plated through-hole went right
through an unrelated track on the bottom layer.

Yes, and so the couple of boards I had fabbed failed. Now that is not in the
field. But it's not hard to imagine that such a thing could escape into the
field; it's almost in the field.

I didn't catch that until I stuffed one board with all the components.

The workaround was to drill out that through hole slightly wider, removing the
metal connecting the layers, but leaving plenty of the pad in the correct
layer to which that hole belonged.

This was not a case of logic not implemented in the DRC, but a bug. I tested
the issue by adding such mistakes on purpose, and DRC was flagging them
properly.

In every other regard, the boards worked perfectly; the fix in the artwork
was just to move a track a few millimeters over so it no longer intersected
the unrelated through-hole.

Of course, the responsibility was mine to check the artwork and not trust the
tool. But then, if you can't trust the tool, what good is it? What is the
value in a design rule check that fails, say, 1% of the time? Does it still
increase productivity? I believe that it does, during the development of the
artwork. Any time the DRC catches something, it saves you time. It's just that
at the very end when you have the thing fabricated, you have to make a couple
of passes over it with your own eyes. You don't have to be that careful at
absolutely every stage of the routing, because the DRC "has your back"
(mostly).
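The core of such a clearance check is simple geometry. Here is a toy sketch
(nothing like a real DRC engine, and all names and parameters are invented
for illustration): flag any track whose copper comes too close to a drilled
hole.

```python
import math

def seg_point_dist(ax, ay, bx, by, px, py):
    """Shortest distance from point (px, py) to segment (ax, ay)-(bx, by)."""
    abx, aby = bx - ax, by - ay
    apx, apy = px - ax, py - ay
    ab2 = abx * abx + aby * aby
    # Parameter of the closest point, clamped to the segment.
    t = 0.0 if ab2 == 0 else max(0.0, min(1.0, (apx * abx + apy * aby) / ab2))
    cx, cy = ax + t * abx, ay + t * aby
    return math.hypot(px - cx, py - cy)

def drc_hole_vs_tracks(hole, tracks, clearance=0.2):
    """Return tracks violating clearance around a plated through-hole.

    hole = (x, y, radius); tracks = [(ax, ay, bx, by, width), ...]
    All dimensions in the same unit (say, mm).
    """
    hx, hy, hr = hole
    violations = []
    for ax, ay, bx, by, w in tracks:
        if seg_point_dist(ax, ay, bx, by, hx, hy) < hr + w / 2 + clearance:
            violations.append((ax, ay, bx, by, w))
    return violations

# A via at the origin sitting right on top of one bottom-layer track:
bad = drc_hole_vs_tracks((0.0, 0.0, 0.5),
                         [(-5, 0, 5, 0, 0.25), (0, 10, 10, 10, 0.25)])
# bad contains only the first track
```

The point of the anecdote is that even when logic like this exists and
normally works, a bug in one code path can silently skip a real violation.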

It's the same with compilers. Just because they don't catch every error with a
diagnostic doesn't mean diagnostics aren't useful. They are still useful even
if they are buggy, in that the occasional case that requires a diagnostic goes
undiagnosed. But then, who reads tens of thousands of lines of code to spot
where some compiler failed to diagnose an invalid pointer conversion without a
cast, you know?

------
akavel
Not a straightforward answer, but this reminded me again of the "RISKS
Digest", or "Forum On Risks to the Public in Computers and Related Systems", a
long-running and still-active free e-periodical about computer-related
incidents affecting real-world infrastructure. Highly recommended!

[https://groups.google.com/forum/#!forum/comp.risks](https://groups.google.com/forum/#!forum/comp.risks)

From
[https://en.wikipedia.org/wiki/RISKS_Digest](https://en.wikipedia.org/wiki/RISKS_Digest):

 _" RISKS is concerned not merely with so-called security holes in software,
but with unintended consequences and hazards stemming from the design (or lack
thereof) of automated systems."_

------
analog31
I suspect that, on a percentage basis, field failures due to CAD bugs are
vastly outnumbered by field failures from other causes.

Also, perhaps by culture, we don't trust CAD enough to let it get us into that
kind of trouble, because we know that modeling always involves approximations
and assumptions. Truth be told, I rarely see CAD used to determine critical
design margins without at least a sanity check.

------
ms013
Finite element methods are often used to validate that a structure meets its
expected requirements, and are part of the CAD workflow. The Sleipner-A
offshore platform failed and sank due to bad finite element analysis.

See:
[https://www.ima.umn.edu/%7Earnold/disasters/sleipner.html](https://www.ima.umn.edu/%7Earnold/disasters/sleipner.html)

~~~
Hondor
That was the analyst making a mistake, not a bug in the software. They used
poorly shaped or too few elements, which led to an underestimate of the
stress. This has always been a well-known limitation of the finite element
method, and any competent user will know about it and the techniques to
protect against it.

Here's a slightly more detailed article:

[http://journals.library.mun.ca/ojs/index.php/prototype/artic...](http://journals.library.mun.ca/ojs/index.php/prototype/article/download/422/567)

~~~
sitkack
Edit: Wasn't just a matter of not enough elements, but that the software
couldn't handle the geometric configuration of the tri-cell. And that the
computed values were extrapolated from the simulation, compounding the flaws
in the simulation.

\--

Seems like this could be avoided by automatically adding more elements and
rerunning the simulation, measuring how the solution changes as the element
count grows. Exactly the kind of thing a computer should catch. And exactly
the kind of thing having an alternative analysis should prevent.

    
    
      * used an unverified model
      * trusted software, didn't double-check results
      * low safety margin
      * incomplete risk model, lack of project-wide oversight
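The refine-and-compare loop described above can be sketched generically.
Everything here is a hypothetical stand-in: `solve` would be a call into the
real FE solver at a given mesh density, and the toy solver below just mimics
a quantity that sharpens as the element count grows.

```python
def converge(solve, n0=8, tol=0.01, max_rounds=8):
    """Refine until the quantity of interest stops moving.

    solve(n) runs the analysis with n elements and returns the result
    (e.g. peak stress). Doubles the element count until successive
    results agree within a relative tolerance.
    """
    n, prev = n0, solve(n0)
    for _ in range(max_rounds):
        n *= 2
        cur = solve(n)
        if abs(cur - prev) <= tol * abs(cur):
            return cur, n
        prev = cur
    raise RuntimeError("did not converge; inspect the model")

# Toy stand-in "solver": a midpoint-rule estimate of an integral,
# which converges toward 1/3 as the element count grows.
def toy_solve(n):
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** 2 for i in range(n)) * h

value, n = converge(toy_solve)
```

As auxym notes below this comment in the thread context of real models, the
catch is purely economic: each `solve(n)` call can cost hours, so doubling
the element count several times is rarely affordable in practice.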

~~~
auxym
From my experience in the field, convergence studies are something everyone
knows they should do, but they never actually happen because results are
always needed yesterday.

As for automating it: sure, for a model like that with tens or hundreds of
elements, it would take seconds on a modern computer, great. Except model
complexity always scales to reach the boundary of reasonable computation time
on modern computers. This has to be a corollary to Moore's law or something.
Recently I've been working on a model with tens of millions of elements. Takes
12-ish hours to solve a few load cases. A good convergence study might take
weeks.

Thankfully, finite element solvers have gotten much better at tolerating bad
element geometry, and at warning users about geometry likely to produce bad
results. Plus things like NASTRAN's integrated extrapolation of stresses
at element corners, instead of outputting only element stresses and having the
analyst extrapolate. A colleague of mine ran some tests a few years ago at a
customer's request, running a very coarse model with corner stress
extrapolation enabled. Results were plenty good, well within normal
engineering uncertainties and tolerances.

FEM was still pretty young in the early 90s and probably required very
experienced, knowledgeable engineers to use it. As with most things, it's
gotten more and more user-friendly, and more and more packages take a very
automated, behind-the-scenes approach to FE (ANSYS Workbench, SolidWorks
Simulation, Simlab, etc.). It has its good and bad sides.

~~~
sitkack
> A good convergence study might take weeks.

How about a bulk simulation service for running HQ models in a couple
different packages for a final check? I could imagine that being a regulatory
requirement for big buildings, bridges, aircraft, etc.

------
jotux
I designed a PCB once that had a pin on a regulator connected to a ground
pour. In the CAD tool the pour extended to the pad between two traces but when
I generated gerbers for the board the pour no longer connected to the pad. I
didn't realize this until I got the board back. The bug was known and fixed in
the next version but I didn't know about it at the time. Because of that board
I now review all gerbers before I send them out for fab.

------
fsloth
In construction, at least, the drawings are still largely considered the
ground truth. They may be produced by CAD tools and autogenerated from 3D
models, but they are generally read and re-read by specialists from the
various disciplines before anything gets built. So usually no one is blindly
trusting the output.

That said, mistakes happen in construction all the time, with or without CAD.
Things failing in the field is an 'accepted' facet of the process, so in that
context it's probably very difficult to collect precise data on which failures
are due to CAD errors and which have other causes - too much noise.

That said, any engineering office worth their salt will enforce a single
software version for the duration of a construction project. They all know there
will be horrible bugs, but then they have a process to take them into account
with expert users doing special configurations and vendors suggesting all
sorts of workarounds. It's still far more efficient to do it like that rather
than with pen and paper.

CAD is totally unlike this modern apps and ecosystems thing in so many ways.

~~~
cm2187
But in construction you typically keep huge safety margins, don't you (like in
case someone left a dead body in the concrete!)?

~~~
fsloth
(Note:I'm not personally in construction but my daily job involves
implementing features for various CAD products.)

Sure, but a design error is still a design error. The capability to recover
from design errors might be there, but that doesn't mean unexpected flaws
won't add overhead to costs and schedule.

------
jpollock
My understanding is that the Therac-25 UI was essentially a CAD program that
allowed the operator to design treatment programs.

The software failed to prevent the operator from entering invalid treatment
programs which then killed patients.

I seem to remember that this was made worse by operators using a shortcut from
a previous version which was safe because of some additional hardware, but the
hardware lock was removed in the 25, and it killed people.

[https://en.wikipedia.org/wiki/Therac-25](https://en.wikipedia.org/wiki/Therac-25)

~~~
kazinator
> _Therac-25 UI was essentially a CAD program_

Clever comment! But wait, is that really the Therac-25 you're thinking of? The
Therac-25 was basically just a bug-trap of assembly language race conditions,
that's all.

There was another similar case of a radiation machine in which the dose was
supposed to take into account the shield. The user could draw the shield with
a CAD-like planning program. The users tried to draw a shield with a cutout,
and the CAD display led them to believe that the shapes were being subtracted.
(The filled path was drawn with the in-out rule or whatever.) The actual dose
calculation though ignored this and added the negative area as a positive.
Whoa, you have lots of shielding, crank up the radiation! Or something like
that.

This is not actually the radiation machine but the treatment "planning"
software. (It meets the definition of CAD: it's computer software, and it's
assisting in the design of something: the treatment.)

Gee what was this? Perhaps that circa 2000 incident in Panama?

[http://www.ncbi.nlm.nih.gov/pubmed/17199912](http://www.ncbi.nlm.nih.gov/pubmed/17199912)

This is a very informative report with all the details, including diagrams:

[http://www-pub.iaea.org/mtcd/publications/pdf/pub1114_scr.pdf](http://www-pub.iaea.org/mtcd/publications/pdf/pub1114_scr.pdf)

(See p. 22, for instance.)

~~~
chris_wot
You are seriously scaring the shit out of me. I'd be more scared, only my
uncle was in charge of a standards body in a certain nuclear research
organisation several decades ago, and a few of his old war stories would make
your hair turn white.

Thankfully when I did work for that same organisation they had completely
changed their culture, and things were considerably better, but still I
suspect that if you explore the bush around the facility (which I doubt the
police would allow...) you might see some unusual things.

------
goalieca
Yes. Microprocessors. But you probably mean things like bridges and buildings.

~~~
HorizonXP
Is that actually a bug in the CAD tool though? I always thought microcode bugs
were design issues.

Unless you mean simulation should've picked it up.

~~~
Gibbon1
Some guys I worked with got a mixed-signal IC back and the inductors were off.
I don't know if it was a true bug[1], but the proprietary software used to
lay out the inductors was giving values about 50% too large. Which meant the
actual inductor values were about 33% too small.

[1] The inductors were small spirals laid out on the top metalization layer. I
assume finite element analysis of those involves a lot of magic.
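The 50%-high / 33%-low figures are consistent with each other: a tool that
reports inductance 1.5x the true value leads you to fabricate coils at about
two-thirds of the target. A trivial sanity check (the target value is made
up):

```python
# If the layout tool overestimates inductance by 50%, a coil the tool
# says meets the target actually comes out at 1/1.5 of the target.
target_nH = 10.0                   # hypothetical design target
tool_overreport = 1.5              # tool reads 50% high
fabricated_nH = target_nH / tool_overreport
shortfall = 1 - fabricated_nH / target_nH   # fraction below target, ~1/3
```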

~~~
raverbashing
Yes, really. People shouldn't put too much trust in IC resistors; to think an
inductor is going to have an exact value is just asking for trouble (though
probably not 50% off).

~~~
Gibbon1
The little I know about RF circuits and inductors is that within 10% is
usually fine. Off by 30% though means your RF circuit isn't centered anymore.
With the IC mentioned the RF receive path was hosed. The guy that did the
inductor design also had the issue that a simulation took about 8 hours.

In their case, once they knew how far off the inductors were with the process
being used, they were able to adjust the target inductance and then do a
re-layout (a couple of 80-hour weeks, and another full set of masks, no
problem). I think part of the problem was that for the previous geometry and
design frequency of 1 GHz, their tool gave good results. New geometry at
2.4 GHz, and it was wrong. No way to test either.

------
jonsen
_... the root cause of the failure resulted from inaccurate NASTRAN
calculations in the design of the structure._

[https://en.m.wikipedia.org/wiki/Sleipner_A](https://en.m.wikipedia.org/wiki/Sleipner_A)

------
boulos
Would you count either of Frank Gehry's fiascos (Stata Center Leaking, Concert
Hall Laser Beam) as caused by "bugs"?

I was going to say "No, ignoring your tools doesn't count as a bug," but now
I'm curious whether the Disney Hall one (like the more recent building in
London
[http://www.bbc.com/news/magazine-23944679](http://www.bbc.com/news/magazine-23944679))
was caused by designers only ever seeing approximation-based global
illumination renders. If you don't get caustics rendered correctly, you don't
realize you'll melt a car ;).

------
arnarbi
Not a bug per se, but Airbus attributed some of the delays in the production
of the A380 to incompatible designs from two different versions of the CAD
software used.

[http://calleam.com/WTPF/?p=4700](http://calleam.com/WTPF/?p=4700)

~~~
xythobuz
This is the first I've heard of this, so I just talked to a family member who
worked at Airbus and was part of the team using CATIA V5.

They had heard this story, which is told often, but it's just not that
simple. Problems like this should have been noticed and fixed much earlier
with the proper processes.

Blaming CATIA alone for what were organizational problems seems too easy.

------
bpchaps
My dad does food manufacturing and loves to talk about this subject. Fact is,
if you pour sugar paste on something, don't expect anything under it to slide
away very cleanly. CAD doesn't do a very good job with sticky stuff. :)

~~~
chris_wot
I'd love to read about the physics of sugar paste on moving surfaces :-)

~~~
bpchaps
I wish I knew more about it, but what he's told me is that it's common to do
something similar to the pulling a sheet from a table trick in certain
manufacturing processes. Well, if you pour sugar goo onto something and expect
it to stay still... you're going to have a day just as bad as the wall. He
said it was one of the funniest things he's ever seen. Who doesn't love a high
speed pastry launcher? Management doesn't. :)

------
dang
I noticed that pg posted this and thought HN users might know some relevant
things.

~~~
AndrewKemendo
It's almost an impossible question to answer, because it assumes that
follow-on reviews and implementation would not QC or spot-check the design
through the build process.

It happens a lot, actually, with CNC processes if you transfer a drawing
between systems, but even then I wouldn't consider that a bug.

------
jameshart
Has a book ever contained a misprint because of a bug in a word processor?

Has software ever contained a bug because of a bug in an editor?

------
skmurphy
The premise of the question is flawed; by definition a field failure is a
failure of the organization and in particular the leadership. Gerald Weinberg
made this point many years ago: programmers cannot make million dollar
mistakes, only senior management can. Every person, every tool, every
individual process step is fallible. The key is to create an overall
methodology that detects and corrects errors.

------
jws
How about the SpaceX tank struts? These were presumably _designed_ with the
_aid_ of a _computer_, and were not as strong as they needed to be.

The news stories focused on testing, but testing is kind of the emergency fail
safe of product construction. The fix involved a redesign, not more testing,
so it seems safe to say the initial design was a failure.

It is also possible that the requirements were off.

~~~
hga
From their investigation,
[http://www.spacex.com/news/2015/07/20/crs-7-investigation-update](http://www.spacex.com/news/2015/07/20/crs-7-investigation-update):

 _The strut that we believe failed was designed and material certified to
handle 10,000 lbs of force, but failed at 2,000 lbs, a five-fold difference._

~~~
jws
I could read that as either evidence that the CAD computations were wrong, or
evidence that something completely different happened. For example, someone
might have taken a shortcut in manufacturing.

~~~
mikeash
Most of the struts performed to spec, with a fraction of a percent failing
under much less force than they should have. That sounds like a manufacturing
problem, not a design problem.

~~~
mng2
What I heard from a friend of a friend is that QA was not handled as well as
it could've been.

------
vermontdevil
Not a bug in CAD, but more of a human oversight, maybe?

[http://www.slate.com/blogs/the_eye/2014/04/17/the_citicorp_t...](http://www.slate.com/blogs/the_eye/2014/04/17/the_citicorp_tower_design_flaw_that_could_have_wiped_out_the_skyscraper.html)

~~~
sp332
Along the same lines, this sensor used for parachute deployment was designed,
installed, and tested (?) upside-down.
[https://en.wikipedia.org/wiki/Genesis_%28spacecraft%29#Recov...](https://en.wikipedia.org/wiki/Genesis_%28spacecraft%29#Recovery_phase)

~~~
sitkack
[https://en.wikipedia.org/wiki/Poka-yoke](https://en.wikipedia.org/wiki/Poka-yoke)

------
imglorp
I would argue the Intel FDIV bug was one. This stuff gets simulated and
synthesized at length, using Cadence, Synopsys and other tools before it sees
tapeout.

[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

------
chollida1
Does the Mars orbiter loss due to metric vs imperial units not being
converted count?

[http://www.cnn.com/TECH/space/9909/30/mars.metric.02/](http://www.cnn.com/TECH/space/9909/30/mars.metric.02/)

~~~
danso
I don't think so. From the summary posted on Wikipedia [0], it sounds more
like a routine software engineering interface error: the Lockheed Martin
system returned U.S. units, while the NASA system expected metric units.

Also, looks like classic project mismanagement made a vital contribution, too:

> _The discrepancy between calculated and measured position, resulting in the
> discrepancy between desired and actual orbit insertion altitude, had been
> noticed earlier by at least two navigators, whose concerns were dismissed. A
> meeting of trajectory software engineers, trajectory software operators
> (navigators), propulsion engineers, and managers, was convened to consider
> the possibility of executing Trajectory Correction Maneuver-5, which was in
> the schedule. Attendees of the meeting recall an agreement to conduct TCM-5,
> but it was ultimately not done._

[0]
[https://en.wikipedia.org/wiki/Mars_Climate_Orbiter](https://en.wikipedia.org/wiki/Mars_Climate_Orbiter)

~~~
sitkack
Which would have been prevented by using unit-preserving calculations.

[http://futureboy.us/frinkdocs/](http://futureboy.us/frinkdocs/)

[https://msdn.microsoft.com/en-us/library/dd233243.aspx](https://msdn.microsoft.com/en-us/library/dd233243.aspx)

[http://unitsofmeasurement.github.io/](http://unitsofmeasurement.github.io/)
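The idea can be sketched in a few lines (a deliberately minimal toy; the
tools linked above do this far more thoroughly): values carry their units,
and mixed-unit arithmetic fails loudly instead of silently. The conversion
factor from pound-force-seconds to newton-seconds is the standard one; the
example values are made up.

```python
class Quantity:
    """Minimal unit-carrying value: arithmetic refuses mismatched units."""
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __add__(self, other):
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} vs {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def to(self, unit, factor):
        """Explicit conversion: scale the value and relabel the unit."""
        return Quantity(self.value * factor, unit)

LBF_S_TO_N_S = 4.448222  # pound-force-seconds to newton-seconds

ground = Quantity(1.0, "lbf*s")   # one side reporting in U.S. units
flight = Quantity(3.0, "N*s")     # the other side expecting metric

total = flight + ground.to("N*s", LBF_S_TO_N_S)  # conversion forced
# flight + ground  # would raise ValueError instead of silently mixing
```

This is exactly the failure mode of the Mars Climate Orbiter discussed
above: two systems exchanged bare numbers, so nothing could object to the
mismatched units.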

------
jokoon
I wonder whether industrial software has enough developers compared to web
development, and whether one could compare the added value of the web versus
industry.

I tend to think the web is more volatile and more oriented towards sales, ads
and services, which in my mind are somehow "less important" than industrial
software.

Of course every developer makes his own choice, but I'll always be more
condescending towards web tech in general versus industrial, sometimes
C-oriented, programming. Maybe it comes from the technical education where I
learned programming.

------
hyperion2010
This doesn't happen much any more, but the case of the Vasa is a pretty good
example of a bug in the measurement hardware. Different, incompatible rulers
used during construction of opposite sides of the ship. Unfortunately that was
during implementation phase and not the design phase. I would guess that there
might be some errors in simulation routines that have cause structures to not
have been designed with the tolerances they needed, however I don't think that
counts as CAD either.

------
agumonkey
Is the situation improving, or holding steady at about the same number of
issues?

