
What Is the Minimal Set of Optimizations Needed for Zero-Cost Abstraction? - matt_d
https://robert.ocallahan.org/2020/08/what-is-minimal-set-of-optimizations.html
======
modeless
The reason compilation time is a problem is that compiler architecture is
stuck in the 1970s, at least for systems programming languages. Compilers make
you wait while they repeat the same work over and over again, thousands of
times. Trillions, if you count the work globally. How many times has <vector>
been parsed over the years?

In a world where compiler architecture had actually advanced, you would
download a precompiled binary alongside the source, identical to what you
would have produced with a clean build, and that binary would be updated
incrementally as you type code. If you made a change affecting a large part of
the binary that couldn't be applied immediately, JIT techniques would be used
to allow you to run and test the program anyway before it finished compiling.

There is no fundamental reason why anyone should _ever_ have to wait for a
compiler. And if you didn't have to wait, then it would free the compiler to
spend potentially _much_ more time doing optimizations, actually improving the
final binary.

The zapcc project shows a bit of the potential for improvement in build times,
though it's just scratching the surface.
[https://github.com/yrnkrn/zapcc](https://github.com/yrnkrn/zapcc)

~~~
api
There's a lot of much lower-hanging fruit. As you state, there is no reason
the source has to be parsed from scratch every single time. A little
memoization of intermediate data structures when the underlying source has not
changed could go a long way.
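A minimal sketch of that idea in Rust, keying cached parse results on a hash of the source text (`parse`, `Ast`, and `ParseCache` are hypothetical stand-ins; a real build system would also track dependencies and use a collision-resistant hash):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug, PartialEq)]
struct Ast(String); // hypothetical stand-in for a parsed representation

// Stand-in for the expensive parse we'd like to avoid repeating.
fn parse(src: &str) -> Ast {
    Ast(format!("ast({} bytes)", src.len()))
}

#[derive(Default)]
struct ParseCache {
    by_hash: HashMap<u64, Ast>,
}

impl ParseCache {
    // Re-parse only when this source text's hash hasn't been seen before.
    fn parse_cached(&mut self, src: &str) -> Ast {
        let mut h = DefaultHasher::new();
        src.hash(&mut h);
        self.by_hash
            .entry(h.finish())
            .or_insert_with(|| parse(src))
            .clone()
    }
}

fn main() {
    let mut cache = ParseCache::default();
    let a = cache.parse_cached("fn main() {}");
    let b = cache.parse_cached("fn main() {}"); // hits the cache, no re-parse
    assert_eq!(a, b);
    assert_eq!(cache.by_hash.len(), 1);
}
```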

AFAIK Microsoft's toolchains do this but they don't do it well. It's a
frequent source of headaches.

~~~
benrbray
I think this is an area where practice has lagged quite far behind theory (for
entirely reasonable practical reasons). I've seen quite a few papers relating
to incremental parsing, compilation, etc. (e.g. [1,2]) but afaik very few
build systems actually take advantage.

[1] [https://arxiv.org/abs/1312.0658](https://arxiv.org/abs/1312.0658)

[2] [https://blog.functorial.com/posts/2018-04-08-Incrementally-I...](https://blog.functorial.com/posts/2018-04-08-Incrementally-Improving-The-DOM.html)

------
fanf2
My list would be:

\- Inlining

\- Common subexpression elimination

\- Dead code elimination

\- Monomorphization

I get the impression that common subexpression elimination and dead code
elimination come out of an SSA middle end fairly straightforwardly, along with
the article’s copy propagation and dead store elimination.

The biggie is monomorphization, which is part of the language in C++. The
alternative is uniform representation of generic values, which implies boxing
and indirection instead. I understand Rust does not hide representations
enough for the compiler to be able to choose whether or not to monomorphize.
Interestingly, the Go generics discussion seems to be heading towards a design
where the compiler does have this option.
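In Rust terms, the two strategies look roughly like this (a toy sketch; the function names are made up):

```rust
// Monomorphization: the compiler emits a separate specialized copy of this
// function for each concrete T it is instantiated with; each copy can be
// optimized as if it were hand-written for that type.
fn mono_len<T: AsRef<str>>(s: T) -> usize {
    s.as_ref().len()
}

// Uniform representation: one compiled body shared by all callers, paid for
// with indirection (here, a vtable call through a trait object).
fn boxed_len(s: &dyn AsRef<str>) -> usize {
    s.as_ref().len()
}

fn main() {
    assert_eq!(mono_len("abcde"), 5);            // instantiates mono_len::<&str>
    assert_eq!(mono_len(String::from("ab")), 2); // instantiates mono_len::<String>
    assert_eq!(boxed_len(&"abcde"), 5);          // single body, dynamic dispatch
}
```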

~~~
mshockwave
+1 for Monomorphization. Though more and more codebases try to avoid virtual
dispatch by either not using it at all or using template tricks instead.

~~~
pornel
OTOH monomorphisation multiplies the amount of code to optimize. For large
functions that is problematic.

Rust's standard library uses a pattern (written manually) of "hoisting"
generic code out of large functions, e.g.

    
    
    fn generic<P: AsRef<Path>>(path: P) {
        non_generic(path.as_ref());
    }

    fn non_generic(path: &Path) {
        // lots of code
    }
    

That's cheaper to compile and reduces binary size compared to having one large
generic function monomorphized for every AsRef<Path> type.

It'd be great to have this as a compiler optimization to automatically have
"half-monomorphic" functions.

~~~
matt_d
FWIW, this technique has been known in the C++ community as "hoisting", too
(often applied to templates, particularly container class templates, but also
smart pointers); cf. "Designing and Coding Reusable C++" by Carroll and Ellis,
1995 ([http://cpptips.com/hoisting](http://cpptips.com/hoisting)).

"Given this, what techniques can be used to reduce template instantiation
time? One technique is called "hoisting." This is a generalization of the
"wrappers for pointer containers" technique that showed up as a tip within the
past week or so. The idea is quite simple: when writing a template class,
consider each method in turn. If the method does not depend upon the template
parameters, split the template class into a non-template base and a template
derived class. Move (hoist) these parameter independent methods up into the
base class. Note that with experience, you will begin to recognize
opportunities for "hoisting" even in cases where methods initially appear to
depend upon template parameters."

See also "thin template" (demonstrating the technique for containers, together
with a Symbian OS adoption example,
[https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Thin_Templ...](https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Thin_Template))
as well as "Minimizing Dependencies within Generic Classes for Faster and
Smaller Programs", Dan Tsafrir, Robert W. Wisniewski, David F. Bacon, and
Bjarne Stroustrup (ACM OOPSLA 2009).
[https://www.stroustrup.com/SCARY.pdf](https://www.stroustrup.com/SCARY.pdf)

"Reducing bloat by replacing inner classes with aliases can be further
generalized to also apply to member methods of generic classes, which, like
nested types, might uselessly depend on certain type parameters simply because
they reside within a generic class’s scope. (Again, causing the compiler to
uselessly generate many identical or nearly-identical instantiations of the
same method.) To solve this problem we propose a “generalized hoisting” design
paradigm, which decomposes a generic class into a hierarchy that eliminates
unneeded dependencies."

------
toolslive
\- Should there be a reference to C++/Rust in the title?

> Tail calls: I can't think of a Rust or C++ abstraction that relies on TCO to
> be zero-cost.

This is an interesting one. Doesn't the absence of TCO mean that there are a
lot of (recursive) abstractions that cannot ever be zero-cost? The first thing
that comes to mind is processing of composites (for example, serialization of
a protobuf type definition). There are others.

~~~
swsieber
Can't one fake TCO by hoisting things out into a loop?

> Tail calls: I can't think of a Rust or C++ abstraction that relies on TCO to
> be zero-cost.

Rust doesn't have TCO, so it's hard to rely on something that doesn't exist
(unless it's on nightly and I missed it).
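For the simple self-recursive case, yes. A sketch of what "faking" TCO by hand looks like in Rust (hypothetical function names; the loop version is essentially what the optimization would produce):

```rust
// Tail-recursive sum over a slice: the recursive call is in tail position.
fn sum_rec(xs: &[u64], acc: u64) -> u64 {
    match xs {
        [] => acc,
        [head, rest @ ..] => sum_rec(rest, acc + head), // tail call
    }
}

// The same function with the tail call rewritten as an explicit loop, which
// is what stable Rust makes you write by hand if you need guaranteed
// constant stack usage.
fn sum_loop(mut xs: &[u64], mut acc: u64) -> u64 {
    while let [head, rest @ ..] = xs {
        acc += head;
        xs = rest;
    }
    acc
}

fn main() {
    let v = [1u64, 2, 3, 4];
    assert_eq!(sum_rec(&v, 0), 10);
    assert_eq!(sum_loop(&v, 0), 10);
}
```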

~~~
steveklabnik
Rust doesn't _guarantee_ TCO, but LLVM may decide to do it if it wishes.

~~~
Gibbon1
Lack of a guarantee says you should use explicit loops.

~~~
majewsky
That's assuming the tail call is visible in human-level source code. You could
also have a situation where A tail-calls B, B tail-calls C, and C tail-calls
A. If B and C get inlined, you now have the opportunity to convert A's
self-tail-call into a loop. Refactoring A manually may not be desirable, e.g.
if inlining B and C causes significant code duplication, or if they live in
entirely different libraries.

~~~
heavenlyblue
If that happened, it's obviously a mistake. Are you proposing that compilers
“fix” this bug automatically, until someone changes one of these functions to
no longer end in a tail call and it explodes out of the blue?

~~~
majewsky
Why would that be a mistake?

------
Kinrany
Aggressive inlining keeps coming up.

There's Spiral [0]: a language where inlining is the default and also a first-
class concept, much like immutability in new popular languages like Rust.

As a bonus, the commit messages contain a whole journal worth of notes.

[0]: [https://github.com/mrakgr/The-Spiral-Language](https://github.com/mrakgr/The-Spiral-Language)

------
Animats
The expensive optimizations are the ones that require non-local analysis.

Register allocation is classically a win for compile speed. It takes less
compiler work to put values in registers than to emit all the loads and stores
and pushes and pops for putting them on the stack.

------
blaisio
Looking at it from this angle is a very interesting idea. Languages like Go
have to avoid a lot of abstractions to avoid increasing compile time. If we
can narrow down the specific optimizations required, we can make better
decisions about the abstractions that could be supported.

------
proverbialbunny
Speaking of zero-cost abstractions. How many languages have this feature? Do
only C++ and Rust support zero-cost abstractions?

~~~
benrbray
I don't know how truly "zero-cost" the compiler implementations are, but the
premise of functional programming languages like Haskell seems to be that
programmers should be able to freely think in abstractions like map, fold,
monads, etc. without worrying about performance.

~~~
proverbialbunny
The benefit of zero cost is creating types that are not native, yet have the
speed of native types.

E.g., I do a lot of financial programming, so I need a decimal type, which is
like a float but base 10, so math has no floating-point rounding errors.

When I turn on compiler optimizations in zero-cost languages, I get roughly a
15-20x speedup, depending on what it is. Interestingly, languages without
zero-cost abstractions that are still quite fast, e.g. Java, run at the speed
of the zero-cost languages with optimizations turned off.

Zero cost is not helpful in most use cases, but in my situation it gives quite
a large speed boost. Sadly, I feel like I'm locked into using Rust (or C++)
and don't have much of a choice. I suspect Haskell is going to be quite a bit
slower for creating custom types, unfortunately.
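To make the "non-native type at native speed" point concrete, here's a minimal sketch: a hypothetical fixed-point money newtype over integer cents (far simpler than a real decimal type). The wrapper and operator overload compile away entirely, so with optimizations on, `Money + Money` is a single integer add:

```rust
use std::ops::Add;

// Hypothetical fixed-point money type: the value is stored in cents, so
// arithmetic is exact base-10 and stays in native integer registers.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Money(i64); // amount in cents

impl Add for Money {
    type Output = Money;
    fn add(self, rhs: Money) -> Money {
        Money(self.0 + rhs.0)
    }
}

fn main() {
    let total = Money(1999) + Money(500); // $19.99 + $5.00
    assert_eq!(total, Money(2499));
}
```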

~~~
benrbray
Interesting, thanks for your perspective! Regarding speed, I suspect there
might be more to the story since so many finance firms [1] are known to use
Haskell internally. Perhaps they're not using it for anything speed-critical,
though.

[1]
[https://wiki.haskell.org/Haskell_in_industry](https://wiki.haskell.org/Haskell_in_industry)

~~~
proverbialbunny
Keep in mind finance is not quantitative finance. (edit: I just realized I
didn't specify above. I'm doing quant work.)

In finance (and quant work) you don't want a mistake. You want what you code
to have provability, so you know what you write is what you get. This is super
important. Speed in pure finance is not important.

In quantitative finance the common practice today is for companies to roll
their own programming language that has provability like Haskell, but is
closer to the speed of C++. These languages tend to be a bit closer to Lisp
than Haskell, but it varies from company to company.

------
andoma
gcc has -Og, which turns on optimization passes that can be enabled without
sacrificing debuggability.

~~~
slavik81
In practice, I find I usually have to switch to -O0, as nearly all my
variables are <optimized out> with -Og.

~~~
pm215
That's been my experience too. Unfortunately the gcc devs essentially poisoned
their own well there -- I'm now much less likely to want to go back and try
-Og because of my initial bad experience. If they'd started with "always good
debug and very few optimisations" and only added in optimisations when they
didn't break the debug illusion then the user-experience would have been much
better (always at least as good as -O0 when debugging, and gradually gets
faster as you use newer gcc versions).

------
ndesaulniers
> actually configure the Rust compiler with a promising set of optimizations
> and evaluate the results.

LLVM lets you configure an optimization pipeline of passes. I'd hack up my own
-O flag and pipeline. It would be cool to see these suggestions tested out.

~~~
steveklabnik
You don't need code changes to try this out, see

* [https://doc.rust-lang.org/stable/rustc/codegen-options/index...](https://doc.rust-lang.org/stable/rustc/codegen-options/index.html#passes)

* [https://doc.rust-lang.org/stable/rustc/codegen-options/index...](https://doc.rust-lang.org/stable/rustc/codegen-options/index.html#no-prepopulate-passes)

~~~
ndesaulniers
Oh cool! Happy to see you land on your feet btw, new gig looks cool!

~~~
steveklabnik
Thank you! It's been really really great :)

------
ncmncm
I think common sub-expression elimination is a part of hoisting out of loops,
which seems pretty important.

I have seen GCC dismantle one loop into two adjacent loops, the first to
precompute a table for the second loop. My jaw was on the floor.

~~~
roca
Can you give an example where CSE/hoisting is necessary to make an abstraction
zero-cost? I mean: give an example of a zero-cost abstraction that takes some
hand-written code and packages it up nicely, and then show that the compiler
can't compile it back down to the hand-written code without CSE/hoisting.

~~~
fanf2
The usual example is something like a safe array where the bounds are checked
on every access. The bounds check is a common subexpression in code that
accesses the array more than once. If the abstraction includes a safety check
on access that doesn’t depend on the loop variable (eg if empty arrays are
NULL) then that can be hoisted.
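In Rust, for instance, the sketch below (hypothetical function name) carries a bounds check on every access; making it as cheap as hand-written unchecked code depends on the compiler noticing the check is a common subexpression with the loop test:

```rust
// Safe indexed access: semantically, every v[i] below is
// "if i < v.len() { v[i] } else { panic!() }".
fn sum_indexed(v: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..v.len() {
        // The bounds check duplicates the loop condition `i < v.len()`,
        // so CSE/hoisting lets the compiler drop it from the loop body.
        total += v[i];
    }
    total
}

fn main() {
    assert_eq!(sum_indexed(&[1, 2, 3, 4]), 10);
    assert_eq!(sum_indexed(&[]), 0);
}
```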

------
brandmeyer
"copy propagation" sounds like he's got two things mixed up: Copy elision and
constant propagation. Each one of these are important for zero-cost
abstractions, IMO.

~~~
roca
I mean this:
[https://en.wikipedia.org/wiki/Copy_propagation](https://en.wikipedia.org/wiki/Copy_propagation)

~~~
brandmeyer
With respect to zero-cost abstractions, it's copy elision that matters.
operator= is just another function in C++, for example. Only in a handful of
specific circumstances is the compiler allowed to elide calls to it.

