

Solving Linear Range Checks - jasonmp85
http://www.playingwithpointers.com/solving-linear-range-checks.html

======
pjmlp
> Some “managed” programming languages do automatic, compulsory range checks
> on array accesses, and invoke some kind of error condition if the array
> access is out of bounds.

Not some. All languages besides C and its direct descendants (not counting
Forth and assembly here).

Other systems programming languages, even ones older than C, used to do it.

As Hoare so elegantly put it in his Turing Award lecture, regarding Algol
compilers:

"Many years later we asked our customers whether they wished us to provide an
option to switch off these checks in the interests of efficiency on production
runs. Unanimously, they urged us not to--they already knew how frequently
subscript errors occur on production runs where failure to detect them could
be disastrous. I note with fear and horror that even in 1980, language
designers and users have not learned this lesson. In any respectable branch of
engineering, failure to observe such elementary precautions would have long
been against the law."

------
Animats
Optimizing bounds checks in compilers is very important for performance, and,
as the article points out, often the bounds checks can be completely optimized
out. This was done in at least one Pascal compiler in the 1980s, but the
technology has been lost. Now it's needed in Go and Rust.

The article extends range checking for compilers to handle the case where the
index variable wraps around due to arithmetic overflow, as intended behavior.
This is neat, but isn't a real problem.

~~~
kibwen
LLVM is already capable of eliminating bounds checks and Rust is designed to
avoid straight array indexing anyway. I have repeatedly asked the Servo
developers if they've ever had to think about bounds checking and the answer
is that it's not shown up even once in performance profiles.

Much like the noise over the loss of O(1) string indexing, worries about
automatic bounds checks seem to be nothing more than developers clutching
their pearls over antiquated best practices that amount to premature
optimization at best.
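
As a rough illustration of what avoiding straight array indexing looks like (a sketch only, not taken from Servo or the article; the function names `sum_indexed` and `sum_iter` are made up here), compare an indexed loop with the iterator form:

```rust
/// Indexed loop: every `v[i]` is a checked access in principle.
/// The optimizer may remove the check when it can prove `i < v.len()`,
/// but nothing in the language guarantees that it will.
pub fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0u64;
    for i in 0..v.len() {
        total = total.wrapping_add(v[i]);
    }
    total
}

/// Iterator form: there is no index at all, so there is no
/// bounds check to eliminate in the first place.
pub fn sum_iter(v: &[u64]) -> u64 {
    v.iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}
```

Both functions return the same result; the difference is only in whether an index ever exists for the compiler to reason about.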

~~~
Animats
_"I have repeatedly asked the Servo developers if they've ever had to think
about bounds checking and the answer is that it's not shown up even once in
performance profiles."_

That's because the problem they're solving isn't array-oriented. Try running
LINPACK benchmarks, matrix multiplies, etc.

(Annoyingly, both Go and Rust lack first-class multidimensional arrays. Arrays
of arrays are not the same thing.)
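
To make the distinction concrete, here is a minimal sketch of a dense two-dimensional array stored as one contiguous row-major buffer, as opposed to a `Vec<Vec<f64>>` where each row is a separate allocation. The `Matrix` type and its methods are hypothetical, written only for this illustration:

```rust
/// Hypothetical dense matrix: one contiguous row-major buffer.
/// A `Vec<Vec<f64>>` would instead hold one heap allocation per row,
/// and rows could even have different lengths.
pub struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

impl Matrix {
    /// Allocate a `rows` x `cols` matrix filled with zeros.
    pub fn zeros(rows: usize, cols: usize) -> Self {
        Matrix { rows, cols, data: vec![0.0; rows * cols] }
    }

    /// Checked read. Each dimension is checked separately: a flat
    /// check on `r * cols + c` alone would accept (0, cols), which
    /// silently aliases element (1, 0).
    pub fn get(&self, r: usize, c: usize) -> Option<&f64> {
        if r < self.rows && c < self.cols {
            self.data.get(r * self.cols + c)
        } else {
            None
        }
    }

    /// Checked write. Returns `false` if (r, c) is out of bounds.
    pub fn set(&mut self, r: usize, c: usize, value: f64) -> bool {
        if r < self.rows && c < self.cols {
            self.data[r * self.cols + c] = value;
            true
        } else {
            false
        }
    }
}
```

With this layout the subscript check is two comparisons plus a multiply-add, and a compiler that knows the loop bounds are `rows` and `cols` has a chance of hoisting the comparisons out of an inner loop.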

~~~
dbaupp
What are you specifically looking for when you say "first-class
multidimensional array"?

~~~
Animats
Something at least as good as what FORTRAN had in 1954.

~~~
dbaupp
I.e. like C++'s Eigen or Python's NumPy? I'm sure that Rust and Go can take
the same approach: implement it in libraries. E.g.
[https://github.com/bluss/rust-ndarray](https://github.com/bluss/rust-ndarray)
is a Rust library for it.

I'm sure you'll be sad that this takes `unsafe` to do performantly, but
embedding it in the language is no different (the burden of proof just
switches from "is this code safe" to "is the code that this code generates
safe", and the latter is a harder question).

~~~
Animats
Not only does that crate use unsafe code, it exports unsafe operations:

    
    
    unsafe fn uchk_at<'a>(&'a self, index: D) -> &'a A

    Perform unchecked array indexing.
    Return a reference to the element at index.
    Note: only unchecked for non-debug builds of ndarray.
    

There's no subscript checking optimization for the operations there at all. If
you use the safe "at" operation, you get back a "Some" or "None". Here's the
basic operation of subscripting:

    
    
    /// Return a reference to the element at **index**, or return **None**
    /// if the index is out of bounds.
    pub fn at<'a>(&'a self, index: D) -> Option<&'a A> {
        self.dim.stride_offset_checked(&self.strides, &index)
            .map(|offset| unsafe {
                to_ref(self.ptr.offset(offset) as *const _)
            })
    }
    

Is that closure really necessary?
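
For reference, the closure there is just the argument to `Option::map`. A minimal safe sketch of the same shape is below; it assumes a hypothetical `checked_offset` helper standing in for `stride_offset_checked`, and a plain slice in place of the crate's raw pointer, so it is not the crate's actual code:

```rust
/// Hypothetical stand-in for the crate's offset computation:
/// yields the offset only when the index is in bounds.
fn checked_offset(len: usize, index: usize) -> Option<usize> {
    if index < len { Some(index) } else { None }
}

/// Same shape as `at` above: map the checked offset to a reference.
pub fn at_map(data: &[i32], index: usize) -> Option<&i32> {
    checked_offset(data.len(), index).map(|offset| &data[offset])
}

/// The closure-free spelling: an explicit `match` on the Option.
pub fn at_match(data: &[i32], index: usize) -> Option<&i32> {
    match checked_offset(data.len(), index) {
        Some(offset) => Some(&data[offset]),
        None => None,
    }
}
```

The two functions behave identically, so the closure is a matter of style rather than necessity.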

~~~
dbaupp
I don't see why exposing unsafe operations is problematic: the crate's user
still has to explicitly decide to use them if desired.

The normal compiler optimisations still apply to these functions, and they
work reasonably well:
[https://news.ycombinator.com/item?id=9644119](https://news.ycombinator.com/item?id=9644119)

Hm, what problem do you see with the closure? If it's just the `unsafe`...
well, something has to touch the memory at some point.

~~~
Animats
_"I don't see why exposing unsafe operations is problematic: the crate's user
still has to explicitly decide to use them if desired."_

Duh. The fact that someone felt it necessary to expose unsafe array access to
the user indicates that the checking optimization has an overhead problem.

~~~
dbaupp
I struggle to believe that subscript checking can be automatically removed in
arbitrary code in any language (at least, not without full dependent typing),
e.g. it seems pretty hard to make guarantees about something like x[y[i]] with
arbitrary y.
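
A small sketch of that `x[y[i]]` case (the `gather` function is hypothetical, written only to illustrate the point): the values in `y` are runtime data, so the check on each `x` access has to happen at run time unless something outside the loop constrains `y`.

```rust
/// Indirect indexing: result[i] = x[y[i]].
/// Iterating over `y` needs no bounds check, but each lookup into `x`
/// depends on a value read from `y`, so it is checked with `get`.
/// Returns `None` if any index in `y` is out of bounds for `x`.
pub fn gather(x: &[i32], y: &[usize]) -> Option<Vec<i32>> {
    y.iter().map(|&j| x.get(j).copied()).collect()
}
```

For example, `gather(&[10, 20, 30], &[2, 0, 1])` yields `Some(vec![30, 10, 20])`, while an index list containing `3` yields `None`.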

