
Don’t assume that safety comes for free: a Swift case study - deafcalculus
http://lemire.me/blog/2016/12/06/dont-assume-that-safety-comes-for-free-a-swift-case-study/
======
iainmerrick
This seems totally misguided and inflammatory. The points made boil down to:

1. Overflow checking adds about a 2x overhead to "+" in a loop

2. Using "reduce" rather than a loop also has ~2x overhead

Neither is really surprising, and hardly seems to warrant the final tangent
that "I do not think it is universally a good thing that software should crash
when an overflow occurs" since "a mission-critical application could crash
because an unimportant function in a secondary routine overflows".

What about the mission-critical bugs, especially security bugs, that are
_caused by_ overflows?

Swift's default behavior, of aggressively checking for overflows, is at least
arguable, and there's a school of thought in software engineering that says
"don't nail your code to the wall", i.e. don't try to keep it running at all
costs when something goes wrong. Let it crash and restart it. Obviously that
approach works well in some scenarios and less well in others. The creators of
Swift reckon it's a good default for most programmers and I reckon they're
onto something.

The fact that the overhead is _only_ 2x for a tight inner loop should be a
cause for celebration, not dire warnings. And as the author himself shows,
it's very easy to disable overflow checking on a per-operation basis when you
want to.

~~~
mikeash
I'm puzzled by how many people want their programs to just keep on trucking
and try their best after something unexpected occurs.

Crashing is not the worst thing your program can do. In fact, I'd argue that
it's the second _best_ thing your program can do, right behind working
correctly. Losing data is much worse than crashing, and corrupting data is
worse still. Both are strong possibilities once your program's state is
something unexpected.

~~~
arcticbull
I agree and disagree. What the author says is true: if a thoroughly irrelevant
part of your program (let's say a drawing routine for UI somewhere) hits an
overflow and crashes your app in the middle of taking a payment, or in the
middle of surgery, while the much more critical parts were working perfectly,
that's not ideal.

Not all parts of your app make sense to fail fast. By enforcing this behavior
even on peripheral parts of your app, you're taking away from the system
designer the ability to decide what's important and what isn't.

I understand that position, though I still think on the whole explicitness is
right. Being able to explicitly disable the check when you know what you want
is the right balance, whether it's &+ in Swift or (better, imo)
.wrapping_add() in Rust.
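
For concreteness, a minimal Swift sketch of the opt-out (Rust's
.wrapping_add() is the analogous explicit spelling):

      var x = Int8.max
      // x = x + 1    // checked "+": traps at runtime on overflow
      x = x &+ 1      // explicit opt-out: &+ wraps instead of trapping
      print(x)        // -128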

~~~
mikeash
Displaying corrupt data in the UI can be just as bad as saving it to disk, if
the user actually reads and uses the data you display (and if not, why are you
displaying it?).

And since the UI almost certainly shares data with the rest of the program,
bugs in the UI can corrupt the underlying data.

Separation of concerns is great, but if you really want different parts of
your system to have different reliability constraints, then you need to
_truly_ separate them, for example by running them in different processes.
Running them in the same process but having some of the code be more lax about
state because it's less important is just begging for horrible bugs.

~~~
lomnakkus
> Separation of concerns is great, but if you really want different parts of
> your system to have different reliability constraints, then you need to
> truly separate them, for example by running them in different processes.
> Running them in the same process but having some of the code be more lax
> about state because it's less important is just begging for horrible bugs.

Indeed, and you might even want to run them on entirely different and
independent _systems_ (connected by some type of network/bus) depending on
exactly what bits are critical vs. non-critical.

~~~
IgorPartola
Why can't it be opt-in instead of opt-out? Or can we have more granular
control overall? On an application, module, class, function, and block level?
That way my super tight hot loop that I can prove does not overflow gets a 2x
speed-up, while the rest of the code stays safe. Oh, and I can always recompile
it to be always-safe and analyze crashes in the real world, not just my silly
unit tests that amount to 2 + 2 == 4.

~~~
stouset
Continuing on as a default is dangerous, and has led to hundreds of thousands
of security vulnerabilities. Not continuing is inconvenient, but safe.

~~~
emodendroket
Maybe it's time to bring back VB6 and On Error Resume Next.

~~~
flukus
It never went away, now it's just "catch (Exception) { }" spread liberally
throughout the code base.

~~~
vurpo
How to fix all errors in Python:

    
    
      try:
          ...  # your code goes here
      except:
          pass

------
bjourne
Most costly software bug I've ever witnessed was caused by an integer
overflow. Thankfully, it wasn't _caused_ by me, but if I had been auditing the
code, I probably wouldn't have found the bug.

The system was billing customers credit cards depending on how long they had
used the service. Time was measured in milliseconds (uh-oh!) for no apparent
reason. Usage could have been measured in seconds or even days but someone
thought it was good to be extra precise. And System.currentTimeMillis()
returns milliseconds.

The default charging period was 14 days, which worked well. So the maximum
number of milliseconds that could be charged (for someone who used the system
the whole period) was 1,209,600,000. Then the company decided to change the
period to every two months (60 days) instead to save money as there was a
fixed cost added to every credit card transfer.

Guess what 60 * 24 * 3600 * 1000 is? It's 5,184,000,000, well past 2^31 - 1,
the largest value a Java int can hold. And the "totalDuration" variable had
type "int". :)

So totalDuration wrapped around which caused the system to retry the
transaction over and over and debit customers hundreds of times more than what
they really owed. The resulting fallout from that debacle was one of the
reasons the company went bankrupt. Integer overflow checking could have saved
them.
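
For what it's worth, a checked language catches this immediately. A minimal
Swift sketch of the same arithmetic in 32 bits (Swift's checked "*" would trap
here; the reporting API shows the wrapped value Java silently computed):

      let days: Int32 = 60
      let msPerDay: Int32 = 24 * 3600 * 1000   // 86,400,000 fits in Int32
      // let total = days * msPerDay           // checked "*": traps at runtime
      let (wrapped, overflow) = days.multipliedReportingOverflow(by: msPerDay)
      print(wrapped, overflow)                 // 889032704 true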

------
Animats
Now that more languages are checking for integer overflow, it's time for
integer overflow exceptions to re-appear in hardware. DEC VAX machines had
this, but C didn't use them. With the hardware doing the checking in parallel,
there's no performance penalty.

If you want wrap-around arithmetic (which is rare) you should have to write
something like

    
    
        unsigned short i,j;
        i = (i + j) % 65536;
    

which the compiler should optimize into a no-check add. This gets you the same
answer on all platforms.

~~~
gok
Do you find a shorthand for wraparound math (like Swift's &+) so bad?

------
lmm
> That’s because I do not think it is universally a good thing that software
> should crash when an overflow occurs. Think about the software that runs in
> your brain. Sometimes you experience bugs. For example, an optical illusion
> is a fault in your vision software. Would you want to fall dead whenever you
> encounter an optical illusion? That does not sound entirely reasonable, does
> it? Moreover, would you want this “fall dead” switch to make all of your
> brain run at half its best speed?

If you lived in a world where hackers crafted optical illusions that made you
send all your money to them when you viewed them, you would probably want to
go blind or some such when you encountered such an illusion.

~~~
mikeash
A bit of a tangent here, but anyone intrigued by the notion of hacking the
brain using optical illusions should read "comp.basilisk FAQ":
[http://ansible.uk/writing/c-b-faq.html](http://ansible.uk/writing/c-b-faq.html)

~~~
e28eta
Neat! Reminded me of Snow Crash, by Neal Stephenson. I'm sure nearly everyone
is familiar with it, but for those who aren't: it's pretty good!

------
sulam
I found this statement funny:

"Moreover, would you want this “fall dead” switch to make all of your brain
run at half its best speed?"

...because this is actually how our brains work! We are much slower to process
and respond to all sorts of stimuli (visual, auditory, conceptual [reading])
when they are contradictory. Think of the feeling you get looking at an Escher
sketch.

------
c0ffe
After chasing strange bugs in dynamic languages like PHP and JavaScript, which
keep running by default when minor errors happen (PHP warnings, or undefined
variables in JavaScript), I think it's good that Swift prioritizes safety over
speed.

------
mcguire
" _That’s because I do not think it is universally a good thing that software
should crash when an overflow occurs. Think about the software that runs in
your brain. Sometimes you experience bugs. For example, an optical illusion is
a fault in your vision software. Would you want to fall dead whenever you
encounter an optical illusion? That does not sound entirely reasonable, does
it? Moreover, would you want this “fall dead” switch to make all of your brain
run at half its best speed? In software terms, this means that a mission-
critical application could crash because an unimportant function in a
secondary routine overflows._ "

That is a ridiculous analogy. What if we replace "optical illusion" with
"hallucination"?

More importantly, what if there were some sort of middle ground between
continuing on as if nothing happened on an error and crashing completely?

------
nkurz
I thought it might be interesting to see how this effect changes with the size
of the array being summed. How do the relative speeds change when operating
out of L1, L3, and memory? Does the lower speed of memory access overwhelm the
overhead of the overflow checking?

    
    
      $ swift build --configuration release
      $ cset proc -s nohz -e .build/release/reduce
    
      # count  (basic, reduce, unsafe basic, unsafe reduce)
      1000      (0.546, 0.661, 0.197, 0.576)
      10000     (0.403, 0.598, 0.169, 0.544)
      100000    (0.391, 0.595, 0.194, 0.542)
      1000000   (0.477, 0.663, 0.294, 0.582)
      10000000  (0.507, 0.655, 0.337, 0.608)
      100000000 (0.509, 0.655, 0.339, 0.608)
      1000000000 (0.511, 0.656, 0.345, 0.611)
    
      $ swift build --configuration release  -Xswiftc -Ounchecked
      $ cset proc -s nohz -e .build/release/reduce
    
      # count  (basic, reduce, unsafe basic, unsafe reduce)
      1000      (0.309, 0.253, 0.180, 0.226)
      10000     (0.195, 0.170, 0.168, 0.170)
      100000    (0.217, 0.203, 0.196, 0.201)
      1000000   (0.292, 0.326, 0.299, 0.252)
      10000000  (0.334, 0.337, 0.333, 0.337)
      100000000 (0.339, 0.339, 0.340, 0.339)
      1000000000 (0.344, 0.344, 0.344, 0.344)
    

Code is from [https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/...](https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/tree/master/2016/12/05) with modification to loop over the different
array lengths. Numbers are for Skylake at 3.4 GHz with swift-3.0.1-RELEASE-
ubuntu16.04. Count is the number of 8B ints in the array being summed. Results
shown were truncated by hand --- I wasn't sure how to specify precision from
within Swift. The execution with "cset proc -s nohz" was to reduce jitter
between runs, but doesn't significantly affect total run time. The anomalously
fast result for the L3-sized 'unsafe' 'unchecked' case is consistent across runs.
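
(On the formatting aside: one way to get fixed precision out of Swift is
Foundation's String(format:), e.g.:)

      import Foundation

      let elapsed = 0.5463921
      print(String(format: "%.3f", elapsed))  // prints 0.546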

------
wlesieutre
Site is down, cached here:
[http://webcache.googleusercontent.com/search?q=cache:MYG52qu...](http://webcache.googleusercontent.com/search?q=cache:MYG52quo1u8J:lemire.me/blog/2016/12/06/dont-assume-that-safety-comes-for-free-a-swift-case-study/)

EDIT: It's back up

------
xenadu02
I don't get the same numbers he gets. The reduce version is the same speed as
the simple for loop. He must have made a mistake somewhere.

------
dispose13432
How do languages which aim to replace C (such as Rust) deal with this issue?

Now, I agree that your average webapp won't see any benefit from removing
checks, and will see security benefits from keeping them in, so I'm all for
it.

But in OSes (or browsers), speed _does_ matter. And there's no way to optimize
it away (every + or array operation involves an if).

Is this just one of the "costs of doing business"?

~~~
mikeash
In Swift, you have total control over this. If you want checks, use +. If you
don't want checks, use &+.

In Rust, the operators check in debug builds, and wrap in release builds.
Calls are provided so you can always have them check, or always have them
wrap, if you need it.

Note that C has some really ugly behavior here. Signed overflow is undefined
behavior, which means that the compiler doesn't need to check for it, but also
that it can assume it never happens and optimize based on that. Many
programmers don't realize this, or don't notice that they've accidentally
written code that depends on overflow behavior, which can lead to many
entertaining bugs:
[https://lwn.net/Articles/511259/](https://lwn.net/Articles/511259/)

~~~
gok
&+ isn't quite the same as + in C though. It's defined to perform wrapping
overflow every time, so it's more like integer + in Java. I don't think there's an
addition operator in Swift that means "not only don't do overflow checking,
but also consider overflow impossible and optimize accordingly".

Not entirely clear that such an operator is ever what you really want, I
suppose...

~~~
mikeash
I believe you can get that behavior by using + in a file that you compile with
-Ounchecked. I don't believe there's any built-in way to do it at an
individual call site, short of writing a wrapper that you compile with
-Ounchecked and then calling the wrapper where you want that behavior.
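
A sketch of that wrapper approach (the file name and build line are
illustrative, not any fixed convention):

      // Unchecked.swift -- build this one file as its own module, e.g.:
      //   swiftc -Ounchecked -emit-library Unchecked.swift
      // Inside it, "+" is neither checked nor defined to wrap; the optimizer
      // may assume overflow never happens, much like signed "+" in C.
      public func uncheckedAdd(_ a: Int, _ b: Int) -> Int {
          return a + b
      }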

~~~
gok
I think you're right, yeah.

------
emodendroket
A factor of three for addition is probably not really significant in most
programs.

------
acqq
I see a number of posts that proudly claim (in different forms) "fail early
and fast", as if always crashing at run time were simply a good thing, and
even that the article author is "misguided and inflammatory."

I don't agree with either claim.

Regarding the first, for an example where a crash is definitely not the
solution, see planteen's post:

[https://news.ycombinator.com/item?id=13117170](https://news.ycombinator.com/item?id=13117170)

Or consider that once you use computers to calculate real-life things (like
the movement of your car, or even a spaceship 50 million kilometers away), the
worst thing you can do is introduce "fatal discontinuities" into the
processing.

Regarding the second, allow me to just roll my eyes. Politics is not allowed
on HN this week, but political tactics get used automatically anyway. Please
just write which of his claims is wrong. Labeling is destructive.

~~~
mikeash
A crash is never the right solution. It is, however, likely to be a _better_
solution than whatever random default behavior you might get.

Regarding the linked comment, if the default had been to crash on overflow (or
whatever the FPGA equivalent might be, if there is such a thing) then the
problem would have been apparent the first time it happened, and they wouldn't
be describing it as "a nasty bug" in the first place. A crash would not have
been the best solution there, but it would have been miles better than the
wrap that they actually got.

As far as real life stuff, my understanding is that it's pretty common to fail
fast and design the system to restart quickly so it can get back to work. If
your needs are more sophisticated then you'd need to actually analyze the
problem and come up with ways to handle unexpected data gracefully, not just
hope that default behavior from the compiler ends up doing something sensible
for your problem.

~~~
acqq
> If your needs are more sophisticated then you'd need to actually analyze the
> problem and come up with ways to handle unexpected data gracefully

The case I've linked to provided exactly that kind of example: "The correct
solution there was to saturate, not wrap, which is almost always the case in
DSP code."

I have the impression you wrote your whole answer without reading it.

And if you don't know what that means:

[https://en.wikipedia.org/wiki/Saturation_arithmetic](https://en.wikipedia.org/wiki/Saturation_arithmetic)
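
(For concreteness, a saturating add is easy to sketch on top of Swift's
overflow-reporting API; this is a hypothetical helper, not a built-in
operator:)

      // Clamp to Int32.max / Int32.min instead of wrapping or trapping.
      func saturatingAdd(_ a: Int32, _ b: Int32) -> Int32 {
          let (sum, overflow) = a.addingReportingOverflow(b)
          guard overflow else { return sum }
          return b > 0 ? Int32.max : Int32.min
      }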

Note that something like that is necessary only for very performance-critical
integer code, as today's FPUs are also very fast and can represent a much
bigger range of values: a floating-point number can hold more than the number
of atoms in the observable universe.

Also note "Saturation arithmetic operations are available on many modern
platforms, and in particular was one of the extensions made by the Intel MMX
platform, specifically for such signal processing applications. This
functionality is also available in wider versions in the SSE2 and AVX2 integer
instruction sets."

Not to mention the DSP processors.

[https://www.arm.com/products/processors/technologies/dsp-sim...](https://www.arm.com/products/processors/technologies/dsp-simd.php)

"Zero overhead saturation extension support"

Moreover,

"the IEEE floating-point standard, the most popular abstraction for dealing
with approximate real numbers, uses a form of saturation."

All of this is common knowledge among professionals, and it's completely
different from the "just raise an exception" approach that is so popular here.
I claim it's much more likely that the author of the article knows about this
than the commenters here who respond (and vote) with the "just fail fast"
mantra, or who even claim that those who don't share their (probably poorly
founded, as far as I can see) understanding are "inflammatory."

~~~
mikeash
I read it and understood it. I think we're talking past each other.

You're saying, crashing is bad, it's better to figure out a correct response
to overflow.

I'm saying, choosing some default overflow behavior without any analysis is
bad, it's better to crash.

In some cases, you want to wrap. In others, you want to saturate. In others,
you'll want to signal a failure. In still others, you'll want to fall back to
bignums. There's no universal correct answer.

But the discussion here is about what the _language_ should do about overflow.
The language has to pick some default behavior for the + operator, and it can
only pick one. Some people are arguing that wrapping by default is the best
choice, because it at least gives the program a chance of continuing to
operate.

I am arguing that the best _default behavior_ is to crash, because continuing
on when the programmer hasn't explicitly chosen the appropriate behavior is
potentially disastrous.

Of course the person who posted that comment didn't want their stuff to crash.
But if the default behavior had been to crash (ignoring the likely
impossibility of that in the context of an FPGA), their bug would have been a
simple fix instead of something so nasty they still remember it. Sure, if the
default had been to saturate then they would have avoided the bug, but only by
coincidence, and this would just cause bugs elsewhere when people needed
wrapping instead.

The fourth word of your quote is key: "The correct solution _there_...." That
was the correct solution in that spot, but it would not be the correct choice
for default behavior for the + operator in a language.

~~~
acqq
If you are discussing the semantics of the integer + operator, then I also
have a completely different opinion about:

"it would not be the correct choice for default behavior for the + operator in
a language."

No. Unless you're doing crypto, wrapping alone is _never_ the desired mode.
Wrapping with access to the carry flag can be (namely, when implementing
arithmetic with more bits). What high-level languages should actually do is
not even allow the "easy" expressions in the integer domain, exactly because
of all that. Moreover, the missing access to the carry flag in high-level
languages is also a problem. The CPU instruction sets are much more reasonable
than the poor implementations in all the high-level languages I know.
"Exception on carry" instead of forcing the code to be precise is the road to
many more accidents waiting to happen.

The current practice in high-level languages is more a lazy repetition of some
old decision than a proper safety-preserving approach.
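
As a concrete sketch of "wrap plus carry" for multi-word arithmetic,
recovering the carry flag through Swift's reporting API (hypothetical helper):

      // 128-bit add built from 64-bit halves; "carry" plays the CPU carry flag.
      func add128(_ a: (lo: UInt64, hi: UInt64), _ b: (lo: UInt64, hi: UInt64))
          -> (lo: UInt64, hi: UInt64) {
          let (lo, carry) = a.lo.addingReportingOverflow(b.lo)
          let hi = a.hi &+ b.hi &+ (carry ? 1 : 0)
          return (lo, hi)
      }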

~~~
mikeash
If you're arguing that there shouldn't be any default behavior, and the
programmer should always be forced to choose, I could get behind that.

I've also argued for degradation to bignums as the best default. This way the
result is always correct. Of course, that means a performance hit on all
arithmetic code unless you opt out of that, and you could make the case that
this is also an unacceptable default.
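
A crude sketch of that fallback, using a wider fixed-size type as a stand-in
for a true bignum (Swift has no built-in arbitrary-precision integer):

      // On overflow, redo the add in 64 bits instead of trapping or wrapping.
      func promotingAdd(_ a: Int32, _ b: Int32) -> Int64 {
          let (sum, overflow) = a.addingReportingOverflow(b)
          return overflow ? Int64(a) + Int64(b) : Int64(sum)
      }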

In any case, consider the "crashing is the best default choice" to be
qualified with "if you're going to have some default choice for fixed-size
integers."

~~~
acqq
> I've also argued for degradation to bignums as the best default.

That makes sense for Python and Perl code, but not for code produced by
something meant to be a better C than the current C. If performance is
critical (as in: effectively no slowdown compared to what the CPU can
achieve), the CPU machine instructions already do "the right thing", and it's
the high-level languages that abstract the carry or the saturation away in
favor of the wrong plain "+".

C with the CPU-specific extensions can "do the right thing." It's the
"standard" "language lawyers" that are once again wrong.

