
Toyota's killer firmware: Bad design and its consequences (2013) - Sanddancer
http://www.edn.com/design/automotive/4423428/Toyota-s-killer-firmware--Bad-design-and-its-consequences
======
daniel-levin
>> The Camry ETCS code was found to have 11,000 global variables. Barr
described the code as “spaghetti.” Using the Cyclomatic Complexity metric, 67
functions were rated untestable (meaning they scored more than 50). The
throttle angle function scored more than 100 (unmaintainable).

>> Toyota loosely followed the widely adopted MISRA-C coding rules but Barr’s
group found 80,000 rule violations. Toyota's own internal standards make use
of only 11 MISRA-C rules, and five of those were violated in the actual code.
MISRA-C:1998, in effect when the code was originally written, has 93 required
and 34 advisory rules. Toyota nailed six of them.

How the ACTUAL FUCK did this happen!? The article makes Toyota's engineering
team seem egregiously irresponsible. Is it typical for vehicle control systems
to be this complicated? I would love to hear the other side of the story (from
Toyota's engineers). Maybe the MISRA-C industry standard practices are
ridiculous, out of touch and impractical.

~~~
imglorp
I'm going to go out on a huge limb here and am prepared to get shot down. I've
seen this first hand as inheritor of spaghetti firmware.

The limb is: don't let EE's lead a firmware group.

It's a natural division of labor. The hardware guys are familiar with all the
component datasheets and the bus timings and all the other low level details.
If the prototype is misbehaving, they'll grab a scope and figure it out.
They're probably the only guys who can. Software is an afterthought, a "free"
component.

Those same EE's who have performed that role get promoted to buck stoppers for
hardware production. They're familiar with hardware design, production,
prototyping, Design for Manufacturability, and component vendor searches. They
might cross over with production engineering and that six sigma goodness.
They're wrapped up in cost-per-unit to produce which is direct ROI. Software
remains a "free" component; you just type it up and there it is.

The culture of software design for testability, encapsulation, code quality,
code review, reuse, patterns, CMM levels, etc etc is largely orthogonal to
hardware culture.

~~~
sliverstorm
As a hardware designer I agree that hardware people are typically not very
good software engineers.

I find we can reliably write small programs that will do everything they need
to. But as the program grows, we are not adept at managing the explosion in
complexity.

However I don't believe that's because hardware culture doesn't _value_
software or anything like that. We just have no proper education in software
engineering.

Speaking for myself, I grasp most fundamental software concepts. Memory
structure, search algorithms, that sort of thing. I appreciate the value of
abstract code qualities like testability, simplicity, reusability, etc.

I just have no actual education in how to design large programs from scratch
that will achieve those goals.

~~~
yason
I don't think it's about education. It's just that software isn't what
hardware people mostly work with and you only get experienced in what you
mostly work with. And when you're experienced, you gain some insight into how
things will work.

I know some basics of EE but I have no gut feeling of how to build or analyze
any hardware for real. It's just something I've read and made up my mind to
understand but it's not something I know.

Conversely, I've been writing software on the lowest to highest levels for
decades and I have a hunch of how to start building something that will
eventually grow really big and complex. I don't know exactly how I do it but
depending on what I want to achieve I have, already at the very beginning, a
quite strong sense of what might work and also _what will definitely NOT_
work.

My take on hardware is that it's mostly a black box that usually _does most
of_ what's advertised (but workarounds are regularly needed, and I guess it
must be really difficult to build features into silicon and have it work 100%
as designed), and these quirks had better be encapsulated in the lowest levels
of the driver so that we'll get some real software building blocks sooner.

This would be no basis to design hardware. I would make such a terrible mess
out of it that even if I managed to design a chip and make it appear to work
somewhat, it would fail spectacularly in all kinds of naive corner cases that a
real EE would never have to solve because s/he would never venture to build
any of them, just because s/he would know better from the start.

------
cpeterso
How much code reuse is there between the firmware in a manufacturer's
different car models? I'm surprised there isn't a consortium (like Symbian
was) to create a standard firmware kernel to share the benefits (and costs) of
testing and auditing the code.

I'm also surprised C is still so commonly used for mission critical software.
I understand that C is familiar, has many static analysis tools (to make up
for the language's deficiencies), and has a straightforward translation from
source to object code (though only when using simple optimizations). For example,
if MISRA-C's coding guidelines disallow recursion, why not design a language
that only supports DAG function dependencies?
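
To make the "DAG function dependencies" idea concrete: a checker would only need to reject any call graph that contains a cycle, which also catches indirect recursion. A minimal sketch in C, assuming a hypothetical call graph stored as an adjacency matrix where `calls[i][j]` nonzero means function `i` calls function `j`. The names `MAX_FUNCS` and `call_graph_is_dag` are invented for this example and do not come from MISRA-C or any real tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8  /* hypothetical upper bound on functions in the graph */

/* Returns true if the call graph over the first n functions has no cycle,
 * i.e. no direct or indirect recursion. calls[i][j] != 0 means i calls j.
 * Uses Kahn's algorithm: repeatedly remove functions that have no
 * remaining callers; anything left over sits on (or behind) a cycle. */
bool call_graph_is_dag(unsigned char calls[MAX_FUNCS][MAX_FUNCS], size_t n)
{
    size_t in_degree[MAX_FUNCS] = {0};
    bool removed[MAX_FUNCS] = {false};
    size_t removed_count = 0;
    bool progress = true;

    assert(n <= MAX_FUNCS);

    /* Count how many callers each function has. */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (calls[i][j])
                in_degree[j]++;

    /* Peel off functions with no remaining callers until nothing changes. */
    while (progress) {
        progress = false;
        for (size_t j = 0; j < n; j++) {
            if (!removed[j] && in_degree[j] == 0) {
                removed[j] = true;
                removed_count++;
                progress = true;
                for (size_t k = 0; k < n; k++)
                    if (calls[j][k])
                        in_degree[k]--;
            }
        }
    }
    return removed_count == n;
}
```

A graph with edges 0→1, 1→2, 0→2 passes, while 0→1, 1→2, 2→0 (or a function calling itself) is rejected. The check itself is written with plain loops, so it would satisfy the no-recursion rule it enforces.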

~~~
agoetz
OSEK is an industry standard RTOS that is used by almost all automotive
players. It is specifically designed for use in the automotive environment.
Toyota actually claimed to use an OSEK-compliant RTOS, but it later surfaced
in this lawsuit that they had written their own implementation that was never
certified by an outside organization. OSEK is in the process of being
superseded by AUTOSAR, which defines much more than just the OS and includes
a large HAL that allows for plug-and-play middleware libraries. Unfortunately
it isn't economical for every ECU to make use of AUTOSAR: it has heavy
resource requirements (> 2Mbyte RAM) and so many applications don't use it.

Also on the horizon is ISO26262, which mandates quality assurance for
automotive embedded code in the form of paper trails. Unfortunately due to the
huge amount of work required by the standard, some automakers are choosing to
ignore it and hope it doesn't become mandatory.

~~~
cpeterso
Thanks! I'll read more about OSEK and AUTOSAR.

------
kw71
I have been upset with Toyota's designs, product quality, and business
practices since I bought my first (and last) Toyota vehicle. Mr. Barr's
testimony reassures me that I've made the right decision in writing them off.

One item I'd like to point out is that Mr. Barr criticizes the lack of
hardware logic to close the throttle if the driver rides the brake.

In 1987, BMW introduced the 1988 model year 750i, with a V12 engine that has
two intake manifolds and two throttle valves. The engine controls and
electronic throttle system were made by Bosch. Whether the logic is in hardware
or software, I don't know, but it doesn't take long for it to slam the
throttles shut if you hold the brake pedal down. When this happens while the
engine is delivering power, it's a very severe shock as engine power is
removed and stops working against the brakes.
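
The behavior described above can be sketched in a few lines of C. This is only an illustration of the general brake-override idea, not the Bosch or BMW implementation, which I have not seen. The struct layout, the function name `throttle_command_pct`, and the 500 ms delay are all assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; a real ECU would read these from sensors. */
typedef struct {
    bool     brake_pressed;      /* brake pedal switch state */
    float    pedal_request_pct;  /* accelerator pedal position, 0..100 */
    unsigned brake_held_ms;      /* how long the brake has been held */
} pedal_inputs;

/* Assumed delay before the override kicks in; not a figure from any real car. */
#define OVERRIDE_DELAY_MS 500u

/* Returns the throttle opening to command, in percent above idle. */
float throttle_command_pct(const pedal_inputs *in)
{
    assert(in != NULL);

    /* Brake held long enough: ignore the accelerator and close the throttle. */
    if (in->brake_pressed && in->brake_held_ms >= OVERRIDE_DELAY_MS)
        return 0.0f;

    /* Otherwise pass the driver's request through, clamped to a valid range. */
    if (in->pedal_request_pct < 0.0f)
        return 0.0f;
    if (in->pedal_request_pct > 100.0f)
        return 100.0f;
    return in->pedal_request_pct;
}
```

With the brake held for 600 ms and the pedal at 80%, this returns 0; with the brake held for only 100 ms it still returns 80. The point is how little logic is involved: one comparison ahead of the normal throttle path.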

Given the time it takes to go from the design stage to manufacturing,
the Germans obviously had this figured out by the mid-1980s.
BMW went on to use the electronic throttle system in a bunch of 90's model 5
and 7 series cars with normal six cylinder engines. The only problems that I
know of with this system, called EML, were for owners and technicians who did
not understand how the system works. In other words, when it failed, the
result was that the car would not go anywhere.

And a colleague of mine with a late 90's Volkswagen also proved that the
throttle slams shut if he rides the brakes while requesting engine power with
the gas pedal.

Mr. Barr points out that in 2005, the Camry had no such logic. I seem to
remember a youtube video where Consumer Reports guys test this out in some
kind of Toyota, and they could have gone all day long until the brakes melted.

Toyota did not include this logic for safety even though it had been in cars
released to the market decades before. The ways that Toyota handled this
situation, everything they've done from blaming the operators, the
potentiometer supplier, the floormats, to their brazen delays in producing
discovery, reinforce my bad experiences owning a Toyota and my conclusion that
they are a bad actor.

~~~
pedrocr
_> And a colleague of mine with a late 90's Volkswagen also proved that the
throttle slams shut if he rides the brakes while requesting engine power with
the gas pedal._

Maybe automatic cars do this but when driving a stick shift a common advanced
driving technique is to heel-and-toe on downshift so that you can rev up the
engine to the correct RPM to not upset the car. That requires revving up the
engine under braking. As far as I know this is still possible in most cars.

The usual guarantee is that the brakes are always specified to be able to
overpower the engine even at full throttle. So even if you have a stuck
throttle for some reason you should always be able to safely stop just by
standing on the brakes, even if it takes you a little longer. Naturally in a
manual car you should just stand on the brakes and clutch for emergency
braking and then the engine is completely disconnected no matter what the
throttle is doing.

[1] [https://en.wikipedia.org/wiki/Heel-and-toe](https://en.wikipedia.org/wiki/Heel-and-toe)

~~~
serf
blipping the throttle while clutched to match RPM between the forced road
speed and given engine RPM during a gear shift to alleviate weight shift
(heel-and-toe) is still possible on modern cars, but that is not the technique
that is being mentioned.

the technique being mentioned is more akin to 'left-foot braking', a technique
used to pivot the weight balance from the rear to the front in order to induce
certain driving characteristics that may be beneficial in a turn. It is a
common technique to balance out the inherent understeer of a front-wheel drive
car to a more neutral balance mid-turn in an effort to reduce lap times. It's
quite common in rally.[0]

and as was said, _many_ cars disallow left-foot braking now; with the worst
responses triggering semi-permanent CELs (CELs which require mechanic
intervention, as opposed to being cleared by more drive-cycles).

Probably not a bad idea, as it's an advanced technique that quite easily
upsets a car mid-turn, and is often just compensatory for an ill-setup race
car.

[0]: [http://en.wikipedia.org/wiki/Left-foot_braking](http://en.wikipedia.org/wiki/Left-foot_braking)

~~~
pedrocr
> the technique being mentioned is more akin to 'left-foot braking'

Both heel-and-toe and left foot braking require revving up the engine under
braking. The difference between the two is whether the car is in gear with the
clutch engaged at the time. Maybe modern cars have enough sensors to know if
the engine is actually driving the wheels so they can disable the throttle
under braking in those situations.

~~~
kw71
> Maybe modern cars have enough sensors to know if the engine is actually
> driving the wheels so they can disable the throttle under braking in those
> situations.

This is possible: any car with traction control has wheelspeed sensors for all
wheels, and manual shift cars have a pushbutton switch on the clutch pedal arm
(for enabling the starter, or for a car with electronic throttle, for
cancelling the cruise control.)

~~~
pedrocr
Those are not enough. You need a sensor in the gearbox itself to know if you
are in gear. Otherwise you can have the clutch engaged and the wheels spinning
but not have the engine connected to the wheels as naturally happens in the
middle of a double clutch shift.

~~~
kw71
Couldn't it work by checking that crankshaft speed increases remarkably
without any corresponding increase in wheel speed?

I don't know how racetrack drivers do this, but when I downshift, the
crankshaft has some opportunity to fall a little while I get the gearbox in
neutral and engage the clutch again. Then the crankshaft speed rises to 4000
rpm or more.
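
The crankshaft-versus-wheel-speed check suggested above might look something like this in C. It is only a sketch of the heuristic as stated in this comment. The struct, the function name `engine_looks_decoupled`, and both thresholds are assumptions invented for illustration, not values from any production system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two consecutive samples of engine and wheel speed (hypothetical layout). */
typedef struct {
    float rpm_prev, rpm_now;              /* crankshaft speed, rev/min */
    float wheel_prev_kph, wheel_now_kph;  /* driven-wheel speed, km/h */
} speed_samples;

/* Assumed thresholds for illustration; a real system would calibrate them. */
#define RPM_RISE_MIN        500.0f
#define WHEEL_RISE_MAX_KPH    1.0f

/* True when engine speed jumped but wheel speed did not follow, which
 * suggests the engine is not driving the wheels (clutch in or neutral). */
bool engine_looks_decoupled(const speed_samples *s)
{
    assert(s != NULL);

    float rpm_rise   = s->rpm_now - s->rpm_prev;
    float wheel_rise = s->wheel_now_kph - s->wheel_prev_kph;

    return rpm_rise >= RPM_RISE_MIN && wheel_rise <= WHEEL_RISE_MAX_KPH;
}
```

A throttle blip from 2500 to 4000 rpm while the car slows from 60 to 59 km/h reads as decoupled; the same rpm rise while accelerating from 60 to 66 km/h does not. One obvious limitation is that a badly slipping clutch would look the same as neutral to this check.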

------
trhway
i didn't get whether they were able to show an actual path of how UA could
have occurred (like for example in the case of Ariane 5 or that Canadian
radiation machine - that is what makes those cases useful and supports all the
good recommendations produced as a result). Nevertheless the slant seems to be
clear ... Google's self-driving car has just become significantly more
expensive and several years further into the future. And Toyota i guess is
going to have many more people hired into its engineering department who aren't
going to do actual engineering (Senior rule 62 compliance engineer) and will
produce a lot more and better TPS reports NASA-style instead :)

~~~
Sanddancer
The problem with comparing it to the Ariane 5 or Therac-25 is that those were
situations where there was a single bug that you could blame. Toyota's
software had so many bugs that it was impossible /to/ tell what paths would
lead to UA, just that there was no accountability and quality assurance.

Regarding your second point, similar complaints were made about other safety
features, like mandatory turn signals or airbags. Yes, there will be
additional time spent ensuring that they are compliant with safety standards.
But in this case, a lot of that work should already have been taking place.
Some code needs more review time, a second or third or even fourth set of eyes
on it. When people can die because of a bug, then the bar for what is
acceptable code can and should be higher.

~~~
trhway
>Toyota's software had so many bugs that it was impossible /to/ tell what
paths would lead to UA,

your statement sounds kind of contradictory to me. If there are many - just
choose any path. It may be impossible to say which one actually happened, yet
it should be possible to show an actual path, at least one, that could
plausibly lead to UA.

------
Osiris
For systems that are so critical to the safety of human life, I'm surprised
that there isn't a mandatory third-party review of the firmware source code.

~~~
LVB
Is there evidence that auto firmware issues are a major cause of injury,
enough to mandate such a thing? There is already a heavy burden to comply with
standards.

I recently called our city's public works dept to propose changing the traffic
signage/signals at an extremely confusing and potentially dangerous
intersection (every person I've talked to about it hates making the turn
there...). The lead traffic engineer was very understanding and agreed it was
a terrible intersection. But the hard stats showed few accidents there, so
he'd never be able to redirect money from more problematic areas...

~~~
Bjartr
Interestingly, the fact that the intersection feels confusing forces people to
pay close attention to figure out what's going on. That can lead to fewer
accidents in some cases.

I can't currently find it, but I once read about a roundabout that had
accident problems despite very good signage. People felt entirely in control
going into it and weren't paying as much attention as they could be. Once the
signage was removed, people no longer took the behavior of the other drivers
for granted: they slowed down more, they paid closer attention, and as a
result the accident rate dropped.

------
ck2
One of the fortunate things about driving an older car is mechanical
accelerator and mechanical steering.

I fear for when I have to upgrade someday to all drive by wire.

As coders we all know how bad some code can be or even how the best code has
flaws.

------
CodeWriter23
Recursion, though it gives me a big programmer woodie because of the beauty of
the code, in my opinion is a poor programming practice. You can always use a
loop in lieu of recursion.
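
To illustrate the loop-in-lieu-of-recursion claim, here is a rough sketch in C that counts the nodes of a binary tree both ways. The `node` type, the function names, and the `MAX_PENDING` bound are all made up for this example. The loop version replaces the call stack with an explicit fixed-size array, so the worst-case memory use is visible in the source and overflow becomes a reported error instead of a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    struct node *left, *right;
} node;

#define MAX_PENDING 64  /* assumed bound on nodes waiting to be visited */

/* Recursive version: call-stack use grows with the depth of the tree. */
size_t count_recursive(const node *n)
{
    if (n == NULL)
        return 0;
    return 1 + count_recursive(n->left) + count_recursive(n->right);
}

/* Loop version: same traversal, but pending nodes live in a fixed array.
 * Returns false (leaving *out untouched) if the array would overflow. */
bool count_iterative(const node *root, size_t *out)
{
    const node *pending[MAX_PENDING];
    size_t top = 0, count = 0;

    assert(out != NULL);

    if (root != NULL)
        pending[top++] = root;

    while (top > 0) {
        const node *n = pending[--top];
        count++;
        if (n->left != NULL) {
            if (top == MAX_PENDING)
                return false;
            pending[top++] = n->left;
        }
        if (n->right != NULL) {
            if (top == MAX_PENDING)
                return false;
            pending[top++] = n->right;
        }
    }
    *out = count;
    return true;
}
```

Both versions report 3 for a root with two leaf children. The loop version is longer and arguably harder to read, which is the trade-off being argued about in the replies below.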

~~~
Fr0styMatt8
Why do you say that recursion is a poor programming practice (curious)? What
if the solution using recursion results in more readable or declarative code?
(especially with languages/compilers that do tail-call optimization properly?)

~~~
hliyan
It's the best choice if readability is the goal, but not if reliability is
the goal. Programmers often assume ideal interpreters/compilers and runtime
environments. This is often not the reality, especially as the environment
gets closer to hardware. Recursive implementations in particular can more
easily expose peculiarities in the stack implementation, yes?

~~~
jowiar
Readability is a very important contributor to reliability -- minimizing WTFs
is important.

That said, writing code that fits the construction of the language is probably
more useful than siding between "loops or recursion". If I'm writing Scala, my
code is usually chock-full-of-recursion. If I'm writing C, not really as much,
because I'm thinking "which bits go into which memory". With Python I tend to
think in terms of list comprehensions, etc.

