Yeah, their initial explanation is that it's the "safety features", but I was under the impression that literally everything related to Rust's safety happens at compile time. Their big selling point is the "zero cost abstractions".
Bounds checking is a “zero cost abstraction” in the sense that it is both “pay for what you use” and “you couldn’t implement it yourself any better.” That said, the implementation is not as fast as it theoretically can be, because the compiler isn’t always removing bounds checks that are provably true.
When const generics land, you will be able to make those assertions! A basic implementation is in nightly.
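Roughly what such an assertion could look like, as a minimal hypothetical sketch (assuming const generics plus inline const blocks; the function is made up for illustration):

```rust
// Hypothetical sketch: the assertion is evaluated per monomorphization, so an
// out-of-range access becomes a compile error rather than a runtime check.
fn third<T, const N: usize>(arr: &[T; N]) -> &T {
    const { assert!(N > 2) }; // compile-time proof that index 2 exists
    &arr[2] // with N > 2 known, the optimizer can drop the runtime check
}

fn main() {
    let xs = [10, 20, 30, 40];
    println!("{}", third(&xs)); // prints 30
    // third(&[1, 2]);          // would fail to compile: N > 2 does not hold
}
```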
This is taking zero-cost abstraction to the extreme, and I think it waters down the concept to the point that it almost isn't useful.
One can argue that any feature is a zero-cost abstraction for the exact set of trade-offs it has (in the limit: "you couldn't implement a feature that executes this exact sequence of instructions 'mov ...' any better if you did it by hand").
I think focusing on a hypothetical perfect coder is more useful, someone who never fails an assertion or bounds check. This person does not need any safety/convenience features (unlike us real humans), but they can still use some such features without a penalty. Bounds checks are not one of them. (But... At least they don't have much of a pervasive/whole program cost.)
Maybe! I think that the concept is a bit more focused than most people think it is. I can appreciate this perspective though. It's a fuzzy concept, so there's decent room to poke and prod around the edges :)
Yeah, that's fair. I think there's a strong argument that bounds checking satisfies the "don't pay for what you don't use" rule (although one could say that every slice having to carry around its length, doubling its size, is a pervasive cost, but that's fairly weak, given the other things it enables), but I think the other rule is much, much less clear.
You're misunderstanding what "you don't pay for what you don't use" means. It means that a program that does not contain [] in it does not include code for []. There's no "mandatory runtime" code for []. It doesn't mean you can use [] in some special way and somehow get less code.
And even if we were talking about your understanding, I don't see why get_unchecked would be disqualified. unsafe is a part of the language for a reason.
I know, the meaning of "zero cost abstraction" is really "zero cost to not use the abstraction."
But that's like saying if you don't use rust you don't pay for it. Just because there is the unsafe escape hatch in the language, you don't get to label the language as zero cost abstraction. Because practically speaking there is a lot more runtime cost to rust than people will tell you.
Many languages (almost anything without a runtime) meet that definition of zero cost abstraction then.
If you use a Rust construct that does bounds-checked accesses, you're explicitly using bounds checks. You're not paying for anything beyond the bounds checks that you're using.
If you use a Rust construct that elides the bounds checks (e.g. get_unchecked), there are no bounds checks, and you don't pay any cost for the fact that Rust supports a bounds-checked subscript operator.
If you want to compare Rust's subscript operator, do so with an explicitly bounds-checked version of the C code. That's the appropriate comparison for "zero-cost abstraction". Because the whole point of "abstraction" is it's not a change to the observed runtime behavior, it's just a language abstraction.
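As a rough illustration of the two things being compared (hypothetical functions; `get_unchecked` is the real slice method):

```rust
// The checked subscript: you asked for the bounds check, so you pay for it.
fn checked(v: &[u32], i: usize) -> u32 {
    v[i] // panics if i >= v.len()
}

// The unchecked version: no bounds check is emitted, so there is nothing
// left to pay for the fact that the checked operator also exists.
fn unchecked(v: &[u32], i: usize) -> u32 {
    // SAFETY: the caller must guarantee that i < v.len().
    unsafe { *v.get_unchecked(i) }
}
```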
I don't entirely agree. As an analogy, suppose that you want to pass a pointer to an object as a function argument. As the programmer, you expect the object will necessarily remain allocated as long as the function uses it, since it's kept alive elsewhere, but you might have made a mistake. Before Rust, your options would be:
1. Use a C-style raw pointer. This has no overhead but is unsafe.
2. Use a reference counted or garbage collected pointer. This provides safety but is a non-zero-overhead abstraction. In other situations, reference counting can be necessary to manage unpredictable object lifetimes, but in this case we're assuming it would only be used to guard against mistakes, so the cost represents overhead. Even if some language implements reference counting just as fast as you can implement reference counting in C, that doesn't make it zero-overhead.
But Rust adds a new option:
3. Use a borrow-checked reference. This is safe, yet zero-overhead compared to the unsafe option, as long as your usage pattern can be expressed in terms of Rust lifetimes.
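Sketching the difference between options 2 and 3 (hypothetical code; `Config` is made up):

```rust
use std::rc::Rc;

struct Config { retries: u32 }

// Option 2: reference counting guards against lifetime mistakes, but pays
// for an allocation plus count updates at runtime.
fn with_rc(cfg: Rc<Config>) -> u32 {
    cfg.retries
}

// Option 3: a borrow-checked reference. Misuse is rejected at compile time,
// and the call compiles down to passing a plain pointer.
fn with_ref(cfg: &Config) -> u32 {
    cfg.retries
}
```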
Going back to array indexing, the analogous situation is that you want to access an index that you, the programmer, expect to always be in bounds. Option 1 corresponds to unchecked indexing, and option 2 corresponds to bounds-checked indexing. But Rust brings nothing new in this area: there is no option 3.
Yet an option 3 is theoretically possible: safe unchecked indexing if you can prove to the compiler that the index can never be out of bounds. This can be done in languages with dependent types or built-in proof systems. (It can even be done in Rust with a certain neat hack [1], but with a lot of limitations.)
I'm not saying Rust is bad for not having dependent types. As far as I know, it's an open research problem how to make those features feel ergonomic and easy to use, even to the extent that Rust lifetimes are (i.e. not totally). And on the flipside, bounds checks usually don't cause very much overhead in practice, thanks to CPU branch prediction, plus the optimizer can sometimes remove them.
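A small illustration of that last point (hypothetical function; whether the check is actually removed depends on the optimizer):

```rust
// The loop bound is clamped to the slice length up front, so the compiler
// can usually prove i < v.len() and fold away the per-access check.
fn sum_first_n(v: &[u64], n: usize) -> u64 {
    let n = n.min(v.len());
    let mut total = 0;
    for i in 0..n {
        total += v[i];
    }
    total
}
```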
But I'd say that Rust's choice not to go there (...yet...) justifies calling its current approach a non-zero-overhead abstraction.
It's literally the definition. And very little qualifies. An abstraction is zero-cost if there's no runtime penalty, including if the code you'd write by hand to accomplish this goal is no better than what the compiler synthesized for you via the abstraction.
If the use of the abstraction forces your data model into a suboptimal representation, it's not zero-cost. If the compiler emits code that's worse than the straightforward manual implementation, it's not zero-cost. If the emitted code involves runtime checks that aren't necessary without the abstraction, it's not zero-cost.
For example, reference-counting is an abstraction over tracking the lifetime of your object. In some cases (where ownership isn't clear) the retain count introduced by reference counting is required and therefore not a cost, but in most cases the point at which the object ends up freed is actually predictable, and therefore the cost of all the reference counting is something that would have been avoided without the abstraction. Therefore, reference-counting as a replacement for manual alloc/free is generally not zero-cost.
Or how about iteration. Rust has an external iterator model (where you construct an iterator and run that, rather than passing a callback to the collection). In most languages an external iterator model is a non-zero cost, because you need to construct the iterator object and work with that, which is a bit of overhead compared to what the collection could do with an internal iterator model. In Rust, external iterators are frequently zero-cost because they usually get inlined. So you can write a high-level loop that includes a filter and map and the end result is very frequently just as good as (if not better than) what you'd have done if you wrote the iteration/filter/map by hand without Rust's iterator abstraction.
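For instance, a chain like this (made-up example) typically compiles down to the same loop you'd write by hand:

```rust
// External iterators: the filter/map/sum chain is inlined away.
fn sum_even_squares(xs: &[i64]) -> i64 {
    xs.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}

// The hand-written loop it usually optimizes to, for comparison.
fn sum_even_squares_manual(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}
```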
I understand what you're saying, but that's not how I understand people use the term "zero cost abstraction". That term usually refers to things you can use and pay zero cost for using it. It does not usually refer to abstractions that impose a cost for use, but no additional cost if you don't use them, as you suggest.
A quick Google search of the top hits for the term seems to align with my understanding.
Maybe there is a term for what you are talking about, like "pay for what you use".
What part of my comment made you think I didn't understand that?
Was it when I wrote: "it really [is] zero cost to not use the abstraction"? Or when I wrote: "if you don't use rust you don't pay for it"?
I understand the concept fine. It is rust's marketing gimmick and not a useful tool to describe the language.
Rust is no more "zero cost" than C++, Fortran, Ada, or Objective C, and even D can make the claim to some extent (you can opt out of the GC and use malloc/free directly). I'm sure there are plenty more I'm missing. If you use the Java GC that doesn't collect (used for real-time programs), even that probably fits the description.
Under your description an opt-in generational GC is zero cost. You can't have vector accesses be checked (along with other small performance issues) and then use the excuse that you can write code in a way that doesn't use checked access or pay that feature's cost. I can do that in a lot of languages.
Also to some extent you should take the ecosystem into consideration, where a large chunk of it goes to lengths to not use unsafe, such as using a vector for homogeneous objects to get around pointer semantics. That incurs a huge cost, and that should be counted against the language because it makes the other ways too difficult, or the main libraries used rely on those techniques. (And if you don't count Rust's ersatz standard library, then you can't count Java's standard library, since you can always write Java code that doesn't use a GC or extra features - I've done it a few times.)
I saw so much promise in Rust, but it seems to have gone to complete crap.
> What part of my comment made you think I didn't understand that?
This part:
> But that's like saying if you don't use rust you don't pay for it. Just because there is the unsafe escape hatch in the language, you don't get to label the language as zero cost abstraction.
Rust isn't "zero-cost" because of the unsafe hatch; that's completely orthogonal. It's zero-cost because if you don't use a feature you don't pay for it. The fact that you need unsafe to get non-checked subscripting isn't particularly relevant to the fact that using non-checked subscripting in Rust means you're not paying for the existence of checked subscripting.
> Under your description an opt in generational GC is zero cost.
You're conflating implementation with semantics. If you have a choice between different allocation strategies that all result in the same observable runtime behavior, using a garbage collector over manual alloc/free is a cost. With manual alloc/free there's no runtime tracking to determine when an object should be freed, it's entirely static. Using a GC dramatically simplifies this from the developer's perspective and avoids a whole class of bugs, but comes with a runtime cost. Meanwhile for single-owner models, Rust's Box type has no runtime overhead compared to alloc/free, since there's no runtime tracking, the alloc and free points are determined statically by the compiler.
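A quick sketch of that single-owner case (hypothetical type):

```rust
struct Node { value: u64 }

// Box's free point is decided statically by the compiler, so there is no
// runtime tracking to pay for, unlike a refcount or a GC.
fn use_node() -> u64 {
    let n = Box::new(Node { value: 7 }); // one allocation
    n.value
} // `n` is freed right here, like a hand-placed free()
```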
C++ popularized the term. And if C++ is commonly described as having designed its features to be zero-cost (or you only pay for what you choose), there's nothing wrong with Rust describing the same concept with the same term.
What languages don't have a runtime? Even C has one (albeit a very small one). Nobody labels Rust the language as a zero cost abstraction (that'd be silly - there is a cost to learning it!). Rather, they try to provide zero cost abstractions. A great example is that there are no green threads in Rust, because they consciously removed them as they penalized Rust's performance regardless of whether people used green threads or not.
This is exactly how you'd write the feature, by hand, if you were implementing the language.
That the optimizer could, but does not, do as much optimization as it theoretically can, means that it has more work to do. But that's different than the feature being written in a sub-optimal way.
I don't know if you edited your comment, or if my pre-coffee reading comprehension just missed the part "because the compiler isn’t always removing bounds checks that are provably true."
This got me wondering why we don't include dedicated hardware for bounds checking in CPU architectures. Intel made an attempt with MPX but from a brief glance on Wikipedia it looks like a fail (slower than pure software approaches like AddressSanitizer, among other issues).
I really like dependent types as a concept! But they're not really in any mainstream languages, so I haven't had much opportunity to play around with them. :(
There are! It is done in compiled Lisp and Scheme runtimes such as SBCL to remove provably redundant bounds checking. For example, in (when (< x 4) (when (>= x 0) (...))) SBCL will infer that if x is an integer, it must have one of the values 0, 1, 2 or 3. That knowledge is useful because if x is used as the index of an array containing 4 or more items, the bounds checking can be elided.
FWIW it isn't hard to write safe Rust code that elides bounds checks, but one needs to write proper code.
If one pinpoints a performance issue due to bounds checks, it is also always trivial to disable them in a safe way by writing proper `unsafe` code when it makes a difference.
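For example (a hedged sketch with made-up functions): the indexed loop below keeps a check on `b[i]` because nothing bounds `b`, while the iterator version gives the compiler both lengths up front.

```rust
// Safe, but b[i] still carries a bounds check: b's length is not tied to i.
fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut acc = 0.0;
    for i in 0..a.len() {
        acc += a[i] * b[i];
    }
    acc
}

// Also safe, and the per-element checks are gone: zip bounds the loop by
// both lengths before it starts.
fn dot_zipped(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```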
Bounds checks are always on. Integer overflow checks are disabled in release mode. Bounds checks are necessary for memory safety, whereas integer overflow checking isn't.
There are unsafe methods on Vec that allow unchecked access to the array, but you need to write them in an unsafe block (some code that is performance sensitive does this).
The explanation I heard from a Rust core dev (explaining why Rust tends to be slightly slower than C++) is that the borrow checker pushes programmers toward a slightly different coding style. Where C programmers tend to pass references around, Rust programmers tend to take copies to avoid the borrow checker.
Passing the smaller reference will be faster in single-threaded code. In multi-threaded code I'd expect making a thread-local copy to be a win, and indeed multi-threaded Rust programs tend to be quicker than their C/C++ counterparts.
I don't know... while it may not be the case here, I wouldn't be exactly shocked if the fact that Rust restricts your ability to do some things ends up having a performance penalty, even if any safety checks themselves happen at compile time.