
How to Swiftly Destroy a $370 Million Dollar Rocket with Overflow "Protection" - mpweiher
http://blog.metaobject.com/2014/06/how-to-swiftly-destroy-370-million.html
======
dalke
I don't think the author has read the Ariane crash report, which says:

> Although the source of the Operand Error has been identified, this in itself
> did not cause the mission to fail. The specification of the exception-
> handling mechanism also contributed to the failure. In the event of any kind
> of exception, the system specification stated that: the failure should be
> indicated on the databus, the failure context should be stored in an EEPROM
> memory (which was recovered and read out for Ariane 501), and finally, the
> SRI processor should be shut down.

> It was the decision to cease the processor operation which finally proved
> fatal. Restart is not feasible since attitude is too difficult to re-
> calculate after a processor shutdown; therefore the Inertial Reference
> System becomes useless. The reason behind this drastic action lies in the
> culture within the Ariane programme of only addressing random hardware
> failures. From this point of view exception - or error - handling mechanisms
> are designed for a random hardware failure which can quite rationally be
> handled by a backup system.

> ... An underlying theme in the development of Ariane 5 is the bias towards
> the mitigation of random failure. The supplier of the SRI was only following
> the specification given to it, which stipulated that in the event of any
> detected exception the processor was to be stopped. The exception which
> occurred was not due to random failure but a design error. The exception was
> detected, but inappropriately handled because the view had been taken that
> software should be considered correct until it is shown to be at fault. The
> Board has reason to believe that this view is also accepted in other areas
> of Ariane 5 software design. The Board is in favour of the opposite view,
> that software should be assumed to be faulty until applying the currently
> accepted best practice methods can demonstrate that it is correct.

There were also questions about why the code was even still in operation. The
functionality was useful under Ariane 4 but not Ariane 5, the code "serves no
purpose" after launch, analysis was done to check some variables for overflow
but not all variables, and the component's functional requirements weren't
updated to reflect the new launch trajectory data.

To say it's only because of an exception during type conversion ignores the
whole, very important context, which says that the exception wasn't the real
problem.
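
For anyone who wants the type conversion spelled out: per the inquiry report, the Operand Error came from converting a 64-bit floating-point value to a 16-bit signed integer. A minimal C sketch of a conversion that reports failure to the caller instead of raising an exception might look like the following. This is illustration only, not the flight code (which was written in Ada), and `double_to_int16` is a made-up name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converts a double to int16_t only when the value
   fits. Returns false and leaves *out untouched otherwise, so the caller
   chooses the policy (saturate, flag the value, fall back, ...). */
bool double_to_int16(double x, int16_t *out) {
    /* Written as a negated range test so that NaN, which fails every
       comparison, is rejected as well. */
    if (!(x >= (double)INT16_MIN && x <= (double)INT16_MAX)) {
        return false;
    }
    *out = (int16_t)x;  /* truncates toward zero; in range by the test above */
    return true;
}
```

What the caller does with a `false` return is exactly the specification question the report raises; the sketch only shows that detecting the condition and shutting the processor down are separate decisions.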

~~~
mpweiher
Yes, the author did read the report, did the esteemed critic?

"It was the decision to cease the processor operation which finally proved
fatal."

Which is _exactly_ what the author argues against: don't terminate the process
on integer overflow.

~~~
zvrba
Don't terminate a _mission-critical process_. Garden variety software: yes,
PLEASE, DO kill it.

You don't seem to realize that many SW practices that are adequate for
"normal" software are NOT applicable to mission-critical, fault-tolerant
scenarios.

~~~
mpweiher
What's _mission critical_ depends on the mission; it's not the language's job
to decide what is or is not mission critical.

Please tell me why killing my spreadsheet program is OK because the cat walked
on the keyboard.

~~~
pohl
If it's the case that your spreadsheet program can tolerate overflow on
addition, shouldn't that program be using the &+ operator instead of the +
operator?

If, instead, detecting and responding to the overflow is what you want, then
someone has already filed a rdar on that and Chris Lattner has replied saying:
"It would be straight-forward for us to provide arithmetic functions that
return the result as an optional (i.e., nil on overflow). We'll take a look,
thanks for filing a bug!"

[https://devforums.apple.com/thread/232791](https://devforums.apple.com/thread/232791)
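
A rough C analogue of the two choices above, wrapping on purpose versus getting a "no result" back on overflow, could be sketched like this. It is an illustration under assumptions, not how Swift implements either one, and `wrapping_add` and `checked_add` are made-up names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping addition, roughly what &+ does for a 32-bit signed value.
   Unsigned arithmetic wraps modulo 2^32 by definition. Converting the
   result back to int32_t is implementation-defined in ISO C; mainstream
   compilers such as GCC and Clang wrap, which this sketch assumes. */
int32_t wrapping_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Checked addition: returns false on overflow instead of trapping,
   playing the role of the "nil on overflow" optional. */
bool checked_add(int32_t a, int32_t b, int32_t *out) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
        return false;  /* a + b would not fit in int32_t */
    }
    *out = a + b;
    return true;
}
```

With the checked form the caller has to say what happens on overflow, which is the same decision the `+` versus `&+` choice forces at the operator level.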

~~~
mpweiher
Sorry, I'd rather have a language with sensible defaults.

In this case, the sensible solution would be to have a proper Numeric tower
like, for example, Smalltalk, where such overflows simply don't matter. The
default "Int" type should be part of that system.

You can then add Int32/Int64 or maybe IntNative to get to the built-in
processor types, which then have built-in processor behavior, and possibly
arithmetic overflow traps as an extension.

The current defaults are just nuts, IMHO.

~~~
pohl
You've made that pretty clear in your writings (which I have enjoyed, by the
way.) I'm curious, do you want Swift to be that language - and, if so, have
you been participating in the dev forums[1] and filing bug reports into rdar?
Would you mind sharing the rdar number that you filed about this issue so that
others whose preferences lean in the same direction can add our voices to the
chorus?

And, if not, what is your goal?

_Edit: I see that you've answered my question on your blog[2] by answering
someone else who asked the same thing. Thank you!_

[1]
[https://devforums.apple.com/community/tools/languages/swift](https://devforums.apple.com/community/tools/languages/swift)

[2]
[http://openradar.appspot.com/17472835](http://openradar.appspot.com/17472835)

~~~
vor_
Marcel has made his feelings on Swift clear:
[http://openradar.appspot.com/17180612](http://openradar.appspot.com/17180612)

------
zvrba
The author argues against crashing on integer overflow by drawing an example
from a totally different domain for which Swift is not at all intended.

Integer overflow has been a source of numerous exploits, and I'd prefer the
software on my desktop machine / phone / tablet to crash instead of giving
unlimited access to the intruder.

The story changes radically if we start talking about mission critical
systems, like car control.

~~~
mpweiher
Looking at
[https://www.owasp.org/index.php/Integer_overflow](https://www.owasp.org/index.php/Integer_overflow)
the consequences of integer overflow by itself are either crashes or infinite loops.
Crashing on overflow is not an improvement in these cases.

To get into exploitable access, you need buffer overflows, which can and
should be protected against separately.

Once you protect against buffer overflows, crashing on integer overflows gets
you exactly nothing. (Might be OK as an optional, assert-like debugging tool).

~~~
zvrba
> To get into exploitable access, you need buffer overflows, which can and
> should be protected against separately.

Or you can use overflow to make the program allocate too small a buffer, thus
obtaining the buffer overflow you need to proceed further.

~~~
mpweiher
You're not making any sense: I already wrote "protect against buffer overflow
instead of integer overflow" and your response is "buffer overflow".

~~~
zvrba
Integer overflow can be a direct cause of buffer overflow. Like this, for
example:

    
    
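      read(fd, &header, sizeof(header));
      /* both lengths come from the file, so the sum can overflow */
      int total_len = header.data_len + header.code_len;
      char *data = malloc(total_len);  /* may be far smaller than intended */
      read(fd, data, total_len);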
    

Now, the attacker crafts the file and... there's your buffer overflow, which is
a _direct consequence_ of undetected integer overflow [i.e., the program being
allowed to proceed].
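
For contrast, a version that refuses to proceed when the sum overflows might look roughly like the sketch below. The header layout (two `uint32_t` fields) and the name `alloc_payload` are assumptions for illustration; the snippet above doesn't show the real types.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header layout, assumed for this sketch. */
struct header {
    uint32_t data_len;
    uint32_t code_len;
};

/* Returns a buffer of data_len + code_len bytes and stores the total in
   *total_len. Returns NULL if the sum would wrap around in 32 bits or if
   malloc fails, so the caller never sees an undersized buffer. */
void *alloc_payload(const struct header *h, uint32_t *total_len) {
    if (h->data_len > UINT32_MAX - h->code_len) {
        return NULL;  /* sum would wrap; reject the file */
    }
    *total_len = h->data_len + h->code_len;
    return malloc(*total_len);
}
```

The check happens before the addition, so the wrapped value is never computed, let alone handed to `malloc`.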

~~~
mpweiher
Once again: if you protect against buffer overflow, an integer overflow that
might lead to buffer overflow doesn't matter. The C example doesn't matter
because C has neither.

------
deliminator
Overflow protection is fail-fast behaviour, similar to array bounds checking.
It allows you to more quickly find the source of errors. I believe everybody
agrees it's a good thing, at least in the array bounds checking case.

> Optimization flags in general should not change visible program behavior,
> except for performance.

The only behavior you would be able to observe before, that you wouldn't after
disabling overflow checking, is a crash, and you would have fixed that anyway
as soon as you observed it.

Of course, if there is $370 Million on the line, maybe just disable it and
hope for the best :-)

