
Toyota's firmware: Bad design and its consequences - JeremyBanks
http://www.edn.com/design/automotive/4423428/Toyota-s-killer-firmware--Bad-design-and-its-consequences
======
WalterBright
As I mentioned in another recent thread

[https://news.ycombinator.com/item?id=6615457](https://news.ycombinator.com/item?id=6615457)

engineers are often not aware of basic principles of fail safe design. I
mentioned Toyota, and this article confirms it.

Not mentioned in this article is the most basic fail safety method of all - a
mechanical override that can be activated by the driver. This is as simple as
a button that physically removes power from the ignition system so that the
engine cannot continue running.

I don't mean a button that sends a command to the computer to shut down. I
mean it physically disconnects power to the ignition. Just like the big red
STOP button you'll find on every table saw, drill press, etc.

Back when I worked on critical flight systems for Boeing, the pilot had the
option of, via flipping circuit breakers, physically removing power from
computers that had been possessed by skynet and were operating perversely.

This is well known in airframe design. As I've recommended before, people who
write safety-critical software, where people will die if it malfunctions,
might spend a few dollars to hire an aerospace engineer to review their design
and coach their engineers on how to do fail-safe systems properly.

~~~
kabouseng
You already have that, it is called the ignition key. The fact that the driver
didn't think to either switch off the engine, put the gear in neutral, or slam
on the brakes, makes me think he also wouldn't think to press the big red
button.

Also pilots get proper training to handle their vehicles, car drivers not so
much.

~~~
owenmarshall
From the article, emphasis theirs:

Vehicle tests confirmed that one particular dead task would result in loss of
throttle control, and that the driver might have to _fully remove their foot
from the brake during an unintended acceleration event_ before being able to
end the unwanted acceleration.

Every one of those approaches you suggested is, in many modern cars, fully
software driven. And the article even shows an example of how a bug in the
software can only be resolved through _the exact opposite of what a rational
person would do_ in a crisis.

I think the only actual mechanical failsafe left is the handbrake. Please tell
me that's still sacred...

~~~
infinotize
Handbrakes are almost always mechanical cables, but they're almost definitely
not enough to stop a car under high engine output. They're mechanical and not
power-boosted (see comments on how much force is needed on the brake pedal
without power assistance) and plus, most people have at some point driven
around for a few miles before they realized that beeping sound was the parking
brake stuck on the whole time.

~~~
owenmarshall
Oh, of course - I've left the "emergency 'smell funny' lever" on before
(thanks Mitch Hedberg).

So you're probably right, it's a stretch to call the handbrake something
useful in emergencies when in reality it probably wouldn't perform that
function.

~~~
he_the_great
It's called a parking brake now. Only useful when the car isn't moving :)

------
com2kid
Car firmware scares me at times. One day I started up my Kia Soul and the
transmission decided it wasn't going to shift out of first gear. A reboot
later and everything was working fine, and it hasn't happened since.

Still though, sort of horrifying. (Hopefully I hit some sort of safety or
fallback mode that can only occur on boot!)

It is with a mixture of fear and amusement that I observe the workings of my
car's firmware. (I can hear the interrupts fire in my stereo system when I
change the volume through the steering wheel control and the music skips, some
buffer wasn't full! I almost have a 100% repro worked out. :) )

Edit:

Having worked in Firmware for some time now, I can confirm that the skill
level of many embedded developers is not what I'd call stupendous. Now of
course most people are, on average, average, but embedded seems like a special
case.

The thing is, embedded systems have exploded in complexity in the last few
years. No longer are software projects worked on end to end by just a handful
of engineers, rather embedded engineering teams are being forced to learn the
lessons about properly scaling up software engineering that developers in
other areas learned long ago. A project with 16KB of space for code could be
written by 2 or 3 developers sitting next to each other, and it was reasonable
to keep the entire program state in one's head.

Nowadays? You can get Cortex M4 boards that look a damn lot like an actual
computer. Sure you don't have much RAM, but the code complexity is way up
there. You aren't just talking over an I2C bus to a couple of peripherals
anymore!

On top of this, more and more features are being shoved into cars through the
use of software. I talked to a developer of wiring harnesses for one of the
major auto manufacturers. He described to me exactly how the auto companies
see software as "the easy part" of things, which means the software teams get
the short end of the stick in terms of resources (test, dev, time, budget,
etc.) but are expected to bear most of the load of new feature work. (After
all, it is so cheap to do it in software!)

FWIW, this developer said he has gone back to purchasing older model cars; he
won't buy any of the cars running his own team's firmware.

~~~
mathattack
What's this do for your willingness to use a driverless car?

My outsider's impression is similar to your insider's - the complexity of
external systems is catching up with embedded systems, but perhaps not the
practices and talent.

~~~
derefr
I would trust car firmware written (entirely) by Google. They're a software
company, after all.

~~~
gaius
Umm, you can A/B test a new logo or colour scheme. You can't A/B test a
collision avoidance algorithm. What makes Google a successful ad platform
doesn't translate to making them a good safety-critical engineering company.

~~~
gknoy
I imagine you could, it just would be expensive, and would not involve the
customers at large.

------
kabouseng
Can Michael Barr be trusted to give an unbiased review? His business is, after
all, embedded software training and coding standards. So magnifying the
negative consequences of bad coding practices would play into his business
strategy...

(I am not saying his verdict is biased; I am just stating he has cause to find
Toyota guilty).

Furthermore, was Toyota's code really of such poor quality relative to the rest
of the industry, and relative to the economic realities of the market? I mean
it's all good and well to demand aerospace and medical device quality code,
but would the average consumer be willing to pay $1000 per LOC for his speed
control in his car? I very much doubt it.

~~~
CanSpice
Michael Barr didn't find Toyota guilty, a court in Oklahoma did. Michael Barr
provided testimony to that court.

If you're implying that Michael Barr somehow stretched the truth of what he
found, and did so under a legal oath, then I think you're implying that he
perjured himself, which is a pretty serious allegation.

~~~
ajross
Expressing an opinion via the use of concrete but in-this-case-not-actually-
demonstrated examples from the code is not normally considered perjury. He
believed what he said was true, and had good reasons for that belief.

That doesn't mean it's the truth though. I too read through that looking for
something more damning than I found:

The memory wasn't ECC and the code didn't do anything to mitigate that risk,
but that doesn't prove that a memory fault occurred or even tell us what the
likelihood is. Toyota screwed up the stack depth analysis. But AFAICT no stack
overflow condition was found. Apparently some other stuff was found (probably
with a static analysis tool) on which the article doesn't elaborate.

These are bugs, and certainly don't make me feel good about the system. But
they're not a specific finding of fault either.
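
For context on what ECC buys you, here is a minimal sketch of single-error
correction using a textbook Hamming(7,4) code. It is only an illustration of
the principle: real ECC RAM does this in hardware with wider codes, and nothing
here reflects Toyota's ECU. The function names are made up for the example.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit codeword (codeword positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Layout by position: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    # Integer bit i holds codeword position i + 1.
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word):
    """Return (nibble, corrected) after fixing at most one flipped bit."""
    bits = [(word >> i) & 1 for i in range(7)]
    # The syndrome is the XOR of the positions of all set bits. It is 0 for
    # a valid codeword and equals the flipped position after a single error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    corrected = syndrome != 0
    if corrected:
        bits[syndrome - 1] ^= 1
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, corrected


if __name__ == "__main__":
    stored = hamming74_encode(0b1011)
    damaged = stored ^ (1 << 4)  # simulate one flipped bit in memory
    print(hamming74_decode(damaged))
```

Without something like this (or a software mitigation), a single flipped bit
is simply read back as valid data.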

------
001sky
Slightly sensationalist. The story is driven by a plaintiff's lawyer and a
consultant suing a large BigCo with deep pockets. So nothing new there. The
tear-down into the ECU though is slightly interesting.

 _Unintentional RTOS task shutdown was heavily investigated as a potential
source of the UA. As single bits in memory control each task, corruption due
to HW or SW faults will suspend needed tasks or start unwanted ones. Vehicle
tests confirmed that one particular dead task would result in loss of throttle
control, and that the driver might have to fully remove their foot from the
brake during an unintended acceleration event before being able to end the
unwanted acceleration._

------
salem
At the risk of a flame war, I'm going to state the obvious: firmware code
often has quality issues because it is often written by Electrical Engineers.

EEs often have little software engineering taught as mandatory subjects in
college, and often only get to learn good SW engineering practices on the job,
or via milspec/aerospace certification, if ever.

No disrespect is intended, it's just my experience after working with a lot of
otherwise very talented EEs.

~~~
comatose_kid
I've written production firmware (computer engineering background). In my
experience EEs don't usually write production firmware, but are responsible
for writing non-production verification loops and code.

Usually a computer engineer or software person works with the EE who developed
the board level hardware + ASIC engineers.

Of course, there exist EEs who are talented at writing software, but I wonder
how often EEs are tasked with writing production code.

~~~
vonmoltke
I am an EE, and I wrote several thousand lines of production code for an
airborne radar system. It wasn't even firmware; it was signal processing
application code running in RTLinux on commercial server hardware. Our
firmware was all written by EEs, though I ended up having to help the EE
tasked with the C code quite a bit.

~~~
salem
That sounds really awesome! But as a military system, this project likely had
a lot of mandated quality processes.

------
jonny_eh
Has it been ruled out that it was just a case of the wrong pedal being
pressed? That is often the simplest explanation for cases like this. The
driver swears they were hitting the brake pedal, but that car just kept
accelerating! Not surprisingly, these cases often (but not exclusively)
involve elderly drivers, new drivers, or car rentals.

~~~
praptak
We'll never know, unless the pedal was being pressed hard enough to bend it
out of shape. Maybe it is an additional reason to make the system fail safe -
assume people will press the wrong pedal and then sue.

~~~
sliverstorm
The weird thing is we should be able to know! I recall at one point one of the
incidents was investigated, and they said that under hard braking at high
speeds, particularly sustained hard braking, certain chemical changes happen
in the surface of the brake pad due to the temps. They said they found no
evidence of those chemical signs.

~~~
WalterBright
> certain chemical changes happen in the surface of the brake pad due to the
> temps

In layman's terms, we call that "burning" :-)

~~~
sliverstorm
Brake pads don't usually burn in cars.

------
strictfp
I just want to add that not even formal methods absolutely guarantee absence
of errors. Firstly, there might be errors in the proof itself. Secondly, you
always make assumptions about the CPU, memory and other hardware components.
These might fail due to a variety of reasons: hardware bugs, electricity
disturbances, heat, radiation, physical damage and more.

~~~
stygianguest
You're right that formal methods do not guarantee absence of 'errors' in the
sense of unforeseen unwanted behavior.

But formal methods, which typically are based on computer-checked proofs, can
help you to eliminate certain possibilities. They are severely underestimated
and underused because of our dependence on C.

The formal methods do not 'fail' as such. They just fail to prove anything
beyond the proven property. Such properties can very well (and often do)
include failure (of whatever kind) of parts in the system.

Apart from AI, which as an approach to embedded systems is almost by
definition the opposite, I only see one way forward from the mess we're in
now: formal methods.

------
tokenadult
"For the bulk of this research, EDN consulted Michael Barr, CTO and co-founder
of Barr Group, an embedded systems consulting firm, last week. As a primary
expert witness for the plaintiffs, the in-depth analysis conducted by Barr and
his colleagues illuminates a shameful example of software design and
development"

Well, okay, so the article kindly submitted here takes the position of
personal injury attorneys who just won a trial before an Oklahoma jury. And
perhaps that is the correct factual position about what Toyota did and what
Toyota should have done instead. (Disclosure: I am a lawyer, so I had law
school courses that trained me to think on both sides of issues that go to
litigation.) When "unintended acceleration" cases were first mentioned in the
news media, including one case that occurred here in Minnesota, I was very
wary of buying Toyota cars, and bought other brands instead. But as our
previous cars wore out, we bought Toyotas, and Toyota vehicles are what we
drive for all our driving now. I notice that both cars we bought have very
clear warnings near the floor mats about attaching those securely and not
using any floor mat that isn't attached securely. Toyota, from this point of
view, seems to be acknowledging that floor mats used to get jammed up against
accelerator pedals in a way that made cars hard to control. We have not had
any problems with our vehicles. The news stories about unintended acceleration
in Toyota cars seem to have diminished. Perhaps whatever was bad about the
former designs has been fixed.

~~~
InclinedPlane
That's a dangerous line of reasoning in this case.

I could be convinced that this court ruling is erroneous, and that the
unintended acceleration issues can be entirely accounted for by floor mats and
driver error. But in this case I think we should be thankful for bad floor
mats and driver error, as they've brought to light very fundamental flaws in
Toyota's firmware engineering processes.

If there had never been a single unintended acceleration in a toyota vehicle
it would not have been through robust engineering but instead through luck.
And we need our vehicles to be safe by design, not through happenstance.

~~~
revelation
If you can solve the halting problem, we can make software that doesn't
misbehave.

I'm not saying Toyota should be allowed to provide this shoddy piece of
software in a critical subsystem, but I very much think a) other vendors
software will be just as crappy and b) this feels like the court longing for
reasons to fault Toyota on something that was still very likely user error,
not software misbehaving.

~~~
comex
It's not necessary to solve the halting problem to have the kind of
protections aircraft have, let alone things like not drastically miscounting
the stack space, having memory protection against stack overflow, using ECC
RAM, not having 11,000 global variables, etc. Even if it was user error, this
isn't even close to a sane design.

------
cognivore
This is the same as saying, "Your web site has a slew of bugs in the
JavaScript for signing up for the newsletter and the CSS is terrible and
that's why it doesn't validate credit cards properly." They never point out the
actual code that causes the acceleration problem - they just malign the code
in general. Looks like a lot of lawyering by people who want to squeeze more
money out of Toyota.

Besides, why would bugs cause more problems when the driver is elderly
([http://www.forbes.com/2010/03/26/toyota-acceleration-
elderly...](http://www.forbes.com/2010/03/26/toyota-acceleration-elderly-
opinions-contributors-michael-fumento.html))?

~~~
jpatokal
I think you missed this bit:

 _Toyota claimed the 2005 Camry's main CPU had error detecting and correcting
(EDAC) RAM. It didn't._

 _Unintentional RTOS task shutdown was heavily investigated as a potential
source of the UA. As single bits in memory control each task, corruption due
to HW or SW faults will suspend needed tasks or start unwanted ones. Vehicle
tests confirmed that one particular dead task would result in loss of throttle
control, and that the driver might have to fully remove their foot from the
brake during an unintended acceleration event before being able to end the
unwanted acceleration._

In other words, they used non-error-correcting memory, and the investigation
found a code path that would lead to the observed behaviour if a single bit
flips.
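
To make the "single bit controls each task" point concrete, here is a minimal
sketch. The bitmask layout, the task index, and the class name are all
hypothetical, not the ECU's actual data structures. It shows how one flipped
bit silently disables a task, and how keeping an inverted mirror copy (a
common software mitigation when the RAM has no ECC) at least makes the
corruption detectable.

```python
TASK_BITS = 8
MASK = (1 << TASK_BITS) - 1
THROTTLE_TASK = 2  # hypothetical task index, for illustration only


class TaskTable:
    """Task-enable bitmask stored alongside its bitwise complement."""

    def __init__(self, enabled):
        self.enabled = enabled & MASK
        self.mirror = ~enabled & MASK  # inverted copy, kept in a separate word

    def is_consistent(self):
        # Every bit must differ between the value and its mirror.
        return (self.enabled ^ self.mirror) == MASK

    def task_enabled(self, task_id):
        if not self.is_consistent():
            raise RuntimeError("task table corrupted; enter fail-safe state")
        return bool((self.enabled >> task_id) & 1)


def flip_bit(value, bit):
    """Simulate a single-bit memory fault."""
    return value ^ (1 << bit)


if __name__ == "__main__":
    table = TaskTable(0b00001111)
    print(table.task_enabled(THROTTLE_TASK))
    table.enabled = flip_bit(table.enabled, THROTTLE_TASK)
    # An unprotected read would now report the task as disabled;
    # the mirrored check raises instead.
    try:
        table.task_enabled(THROTTLE_TASK)
    except RuntimeError as err:
        print(err)
```

The mirror does not repair anything; it only turns a silent fault into a
detected one that the system can react to.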

~~~
cognivore
>> ...driver might have to fully remove their foot from the brake during an
unintended acceleration event before being able to end the unwanted
acceleration. <<

"Might" is a weasel word you use when you don't have proof. Are they saying
that the system is non-deterministic? Seriously, you can say it "might"
cause the problem, or you can run tests that cause the problem. Even if it is
non-deterministic, you could run 1 gazillion tests and get a percentage. Then
you could figure out, based on the amount those systems get used, how often
you'd expect acceleration to be uncontrolled.

And it still doesn't explain why it mostly happens to the elderly.
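
The arithmetic being asked for would look roughly like the sketch below. All
numbers in the usage section are invented placeholders, not measurements from
the case. The last helper uses the statistical "rule of three": if zero
failures are seen in n independent trials, the approximate 95% upper bound on
the per-trial failure probability is 3/n, so a clean test run only bounds the
rate rather than proving it is zero.

```python
def failure_rate(failures, trials):
    """Observed fraction of trials that produced the fault."""
    return failures / trials


def expected_events(rate_per_hour, vehicles, hours_per_vehicle):
    """Expected number of fault events across a whole fleet."""
    return rate_per_hour * vehicles * hours_per_vehicle


def rule_of_three_upper_bound(trials):
    """Approximate 95% upper bound on failure probability after
    observing zero failures in `trials` independent trials."""
    return 3.0 / trials


if __name__ == "__main__":
    # Placeholder inputs, chosen only to show the calculation.
    rate = failure_rate(failures=2, trials=1000)
    print(rate)
    print(expected_events(rate_per_hour=1e-9, vehicles=1_000_000,
                          hours_per_vehicle=400))
    print(rule_of_three_upper_bound(1_000_000))
```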

------
_yosefk
So... what's the actual bug? "Cyclomatic complexity" this, "single point of
failure" that sounds very damning, but if you don't see the bug it's sort of
less convincing. I mean, I can look at any code and say it's too sloppy to
work.

~~~
JeremyBanks
You're right, this article doesn't go into much detail. However the expert
witness' testimony does, and it's not pretty:
[https://www.dropbox.com/s/wnzqidngrtj8y2l/Bookout_v_Toyota_B...](https://www.dropbox.com/s/wnzqidngrtj8y2l/Bookout_v_Toyota_Barr_REDACTED.pdf)

He found massive failures in all of the safety systems, and successfully
demonstrated that a single bit flip could cause the task responsible for
controlling the gas/fuel mixture to stop running, preventing the driver from
decelerating the car. The safety mechanisms in the car would entirely fail to
catch this, and at this point Toyota wasn't using error-correcting RAM, so
it's not entirely implausible.

He found many possible buffer overflows, stack overflows, race conditions, and
unsafe casts that could lead to memory corruption or logic errors. He went on
at length about bigger-picture design flaws in the way that their failsafes
were implemented, rendering them often useless. They explicitly ignored error
codes from the operating system which indicated that things were going wrong,
as well as from their own code which was warning them that the CPU was
overburdened and necessary tasks may not have been completed.
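
As an illustration of what a failsafe that notices a dead task can look like,
here is a minimal sketch of a check-in style watchdog. This is an assumed
design for the sake of the example, with hypothetical task names; it is not
taken from the testimony and not how the ECU was built. The idea is that the
hardware watchdog only gets kicked if every required task reported in during
the cycle, so one stalled task is enough to trigger a reset.

```python
class Watchdog:
    """Kick the (simulated) hardware watchdog only if all tasks checked in."""

    def __init__(self, task_names):
        self.expected = frozenset(task_names)
        self.seen = set()

    def check_in(self, task_name):
        # Each task calls this once per cycle after finishing its work.
        if task_name not in self.expected:
            raise KeyError(task_name)
        self.seen.add(task_name)

    def end_of_cycle(self):
        """Return True if the watchdog should be kicked this cycle."""
        healthy = self.seen == self.expected
        self.seen = set()  # start the next cycle with a clean slate
        return healthy


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    wd = Watchdog(["throttle", "sensors", "diagnostics"])
    for name in ("throttle", "sensors", "diagnostics"):
        wd.check_in(name)
    print(wd.end_of_cycle())
    # Next cycle: the throttle task has died and never checks in.
    wd.check_in("sensors")
    wd.check_in("diagnostics")
    print(wd.end_of_cycle())
```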

He testifies that Toyota has no real bug tracking system, no consistent code
review, and had countless violations of both their own safe coding standards,
and other standards which they had contributed to.

The corresponding Reddit discussion at
[http://www.reddit.com/r/programming/comments/1pgyaa/](http://www.reddit.com/r/programming/comments/1pgyaa/)
may also be of interest.

~~~
_yosefk
None of this is finding the bug. RAM bits almost never flip, and "a single
bit" - "the" bit he pointed to - flipping on multiple occasions is a virtual
impossibility. As to all those other things: yes, bad stuff, did he see how at
the "small picture level" these faults could cause the problem though? The
whole thing is extremely unclear and blaming all those different things smells
of not having actually understood the problem.

------
zyztem
Previous report on the issue by NASA (PDF, 177 Pages)
[http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA-
UA_report.pdf](http://www.nhtsa.gov/staticfiles/nvs/pdf/NASA-UA_report.pdf)

------
andrewcooke
_11,000 global variables_

 _absence of any bug-tracking system_

wow. and this is toyota.

[edit:
[http://en.wikipedia.org/wiki/The_Toyota_Way](http://en.wikipedia.org/wiki/The_Toyota_Way)
[http://en.wikipedia.org/wiki/Toyota_Production_System](http://en.wikipedia.org/wiki/Toyota_Production_System)
]

~~~
fffernan
Could it be that your understanding of Toyota quality is based on millions and
millions of dollars of marketing money spent each year. And then it turns out
they are just like every other large global company.

~~~
comatose_kid
Based on your reasoning, why don't all other automakers (all have large
marketing budgets, GM spends even more than Toyota) have similar reps for
quality?

Toyota's rep has been well earned, there's plenty of literature in operations
about this, as well as popular trade coverage (e.g.,
[http://www.nytimes.com/2013/10/29/automobiles/japanese-
autos...](http://www.nytimes.com/2013/10/29/automobiles/japanese-autos-lose-
ground-in-consumer-reports-reliability-ratings.html))

Their software process wasn't given the same attention, and they've been
burned by it. Embedded systems are really tricky, and a good process (design
through end-of-life) is key.

------
pnathan
That firmware written in C for a for-profit company is shoddy should surprise
no one. Rather, non-shoddy software really would be the surprise. There is a
tremendous lack of understanding of the 'rest of the software world' in the
embedded world, heightened by people who neither studied, wanted, nor trained
for developing software.

This is a perfect example of software development done without management
maturity, process maturity, and with inferior technical tools. That large
scale systems for life-critical services are written in ASM/C is horrifying.
That management did not enforce certification compliance is horrifying. That
the correctness process did not account for tin whiskers or ECC memory is
horrifying. That the engineers violated MISRA (which evidently they attempted
to adhere to) is less horrifying, but still bad.

~~~
pnathan
Followup: I found a transcript via Slashdot about this -
[https://www.dropbox.com/s/wnzqidngrtj8y2l/Bookout_v_Toyota_B...](https://www.dropbox.com/s/wnzqidngrtj8y2l/Bookout_v_Toyota_Barr_REDACTED.pdf)

Let's just say that this is an incredibly readable discussion on how to do
safety critical software wrong in many, many, many ways. Everything from using
binary blobs to using gratuitous amounts of globals (> 10,000 ?!?!) to not
having an issue tracker(!!?!?!).

At any rate, this document counterindicates buying a 2005 Camry.

------
novaleaf
This happened to me too (toyota minivan).

I was in stop-and-go traffic and went from accelerator to brake, and the
vehicle started revving its engine to try accelerating. (Gears were grinding
trying to shift to compensate for the slammed brake.) The situation ended when
I lifted my foot off the brake and pumped it.

I took it to the toyota service and they said it was due to me pressing both
brake and accelerator at the same time. I was skeptical of my own user-error,
so I later tried to recreate the issue (by pressing brake+accelerator together
in various combinations) but was unable to repro it.

Now that I see this article, it makes me realize it's probably the firmware.

~~~
novaleaf
ps: this caused me to rear-end a taxi. only going at about 2km/hr so no
damage. I'm in Thailand so paid the driver about $20 and off he went.

------
srambo
Here's the testimony from the trial. They're starting to post it on EE Times.
First part is about Task X.
[http://www.eetimes.com/document.asp?doc_id=1319936&](http://www.eetimes.com/document.asp?doc_id=1319936&)

------
Fr0styMatt
So, a question.

Why does the industry use something like C? Even with the MISRA guidelines, it
still seems like the wrong tool for the job when you have things out there
like Erlang, which are designed for fault-tolerance and concurrency from the
get go. I can imagine the requirement for hard real-time operation might be an
issue (not sure enough about Erlang specifically to say if it does or doesn't
address or hinder this, so I'll leave that to someone else).

What's that industry moving towards, software wise, to address this? Is Ada an
alternative?

C (even MISRA) just seems like a horribly risky thing to use in this scenario.

~~~
arink
Primarily size. When you have 256k or less to work with, you don't have much
choice.

Specific to Erlang:
[http://www.erlang.org/faq/implementations.html](http://www.erlang.org/faq/implementations.html)
(see 8.9)

~~~
Fr0styMatt
Ah, thanks for the link.

It's easy to forget how small the RAM budgets still are in some parts of the
embedded world.

------
Spooky23
Sometimes I question the value of going digital for systems like this. Does
the benefit gained over an analog equivalent justify the risks associated by
using a black box that cannot be qualitatively evaluated or examined by the
layperson?

My friend had a 50s Ford when I was in high school. The throttle control was a
lever and rod attached to a couple of springs.

~~~
gvb
Purely mechanical does not mean failure free. I've had the throttle linkage
stick such that the spring was insufficient to drive the throttle back when I
removed my foot from the gas. Fortunately, the ignition key truly disabled the
ignition (hard-wired). The thing you had to be _REALLY_ careful of was to not
twist the key back so far as to engage the anti-theft steering lock mechanism
('70s and newer?).

~~~
drbawb
I've had a similar "total clutch failure" before.

1993 Mustangs have a plastic ratcheting mechanism that picks up slack on
their [very heavy] clutch cables.

The teeth on the plastic ratchets eventually wear out (duh....)

If you're FORD, you design it so that this piece fails catastrophically,
releasing all tension on the clutch cable. (Instead of, say, hitting a bump-
stop that keeps some minimum amount of tension on the clutch cable.)

No tension on the clutch cable = no clutch. No clutch on the OEM T5 w/ a
stock-spec clutch = good luck with the next downshift.

(For anyone interested: firewall adjuster + aluminum clutch quadrant = sweet
deal.)

---

However I'd still argue the purely mechanical systems are _objectively better_
in that they lend themselves to preventative maintenance.

Had I thought to look: I would've seen the _very_ worn teeth on my clutch
quadrant. I knew the clutch cable was old, rusty, and starting to bind in the
sleeve, which probably led to the wear of my clutch quadrant. As for a
throttle cable: not only can I feel it binding in the pedal, but it's fairly
easy to visually inspect a throttle cable or spring for faults, rust, etc.

In addition mechanical fixes are all radically simpler. If my throttle is
sticking, I lubricate the cable and replace the return spring. If your
computer controlled throttle is sticking: you hope it's (A) an actual bug and
not intended behavior, (B) a bug that the mfr. is aware of, (C) a bug that
a patch is available for, (D) that the ECU flash can be applied free or
inexpensively.

Say you're an enterprising embedded electronics engineer and you wanted to fix
it yourself? If you try to modify an automotive computer: you've just tampered
with an emissions control device. Your car is no longer street legal in the
United States.

Purely mechanical systems by their very nature are perfectly transparent.
Proprietary computer software is almost always a "black box." Due to
federal regulations though: you have absolutely no legal way to repair or
replace it on a street-driven vehicle. You are stuck with software you cannot
see, understand, or control.

------
microcolonel
More people not ordering independent certification of life-critical systems
they're purchasing, then blaming everyone but themselves when it kills
somebody.

Get your manufacturer to pay a respectable firm to certify their software
rather than just their mechanical hardware.

------
infinotize
This kind of thing makes me never want a car without a keyed ignition and to a
lesser extent (and for other reasons), a clutch pedal. Heavy machinery should
have a hardware-based killswitch. I'm kind of amazed this isn't regulated.

~~~
jonny_eh
Having a clutch also prevents accidental acceleration due to hitting the wrong
pedal, which is much more likely to occur than a firmware glitch.

------
mark-r
When Toyota started doing recalls on floor mats to get rid of unintended
acceleration, I assumed they had no idea what the real problem was. I always
suspected something like this.

------
bhewes
If this is a case of bad design in cars we have come a long way. It was not
long ago that the Ford Pinto was catching fire due to a known bad physical
design.

------
mcantelon
Perhaps an open hardware car will emerge at some point as a reaction to over-
complicating vehicles.

~~~
bradyd
Open hardware cars already exist:
[https://en.wikipedia.org/wiki/Rally_Fighter](https://en.wikipedia.org/wiki/Rally_Fighter)

~~~
mcantelon
Nice! Would be great to see an open design become popular.

------
agumonkey
Is there some kind of Lagrangian programming paradigm where systems have many
paths toward lower levels of energy at any point in time?

------
qwerta
And people call me paranoid because I refuse to drive cars without a physical
kill switch.

