
Future Directions for Optimizing Compilers - lifthrasiir
https://arxiv.org/abs/1809.02161
======
michaelmior
> For these we don't really need optimizations.

While it's true that many high-volume sites are written in high-level
languages, it's also true that they will require more CPU and memory than
something highly optimized written in a lower-level language. The trade-off is
that it takes less developer time to write code in the high-level language,
and the savings there are enough to invest in more computing power.

However, if you can optimize code for high level languages even further,
you're still reducing the hardware requirements and ultimately saving costs.
If you can build a stable Ruby VM that runs with 10% fewer resources and is
compatible with existing applications, I think you'll find people willing to
throw money your way.

~~~
userbinator
_The trade-off is that it takes less developer time to write code in the high
level language and the savings there are enough to invest in more computing
power._

Not if your application is used by many users; optimising exclusively for
development time is only for rarely-used/one-off things, or if you don't value
your users' time --- some of whom may themselves be developers. The more users
you have, the more it matters.

We've somehow gotten ourselves into a situation where relatively few
developers are making countless more users waste time and hardware resources
for their own selfish benefit, and I think that's a very bad thing.

~~~
pjmlp
I think it is a matter of skimping on language design until it matters, which
is not always a good decision.

Take Java for example.

When it came onto the scene we already had Oberon and its variants, Ada,
Modula-3, and Eiffel.

All supported JIT/AOT compilation and value types, and, with the exception of
Oberon, generics.

Most of the recent changes at the JVM level, and the upcoming ones on Java's
roadmap, are basically catching up with what was already possible in the
mid-90s.

So how much money could Sun, Oracle and IBM have saved if those features were
there from the beginning?

We will never know, but we will know how much the bill is going to be, given
that it is currently a 10+ year process.

All because current hardware changes make it impossible for Java to continue
ignoring those features if it is to stay relevant against other languages.

Same applies to other languages in different ways.

~~~
hyperman1
So the question becomes: If Java is today still catching up with a list of 90s
languages, why is it used instead of them?

Part of the answer is that Java still advanced the state of the art for most
people: it got safety and garbage collection accepted by the mainstream.

Oberon et al. were not within reach of the average Joe: the tooling was too
expensive, and the cheap, resource-constrained PCs couldn't run it. The
languages required sophistication from programmers. Nobody was available to
answer questions; the ecosystem wasn't there.

Java, on the other hand, had something for everybody. It was safer than C,
important if you wanted a stable web server. It was faster than Perl or shell
scripting, so you also had a cheap web server. The browser got nicer graphics
from applets. Enterprise architecture got a way forward from CORBA to JEE
without having to admit something was wrong. Hardware vendors got a way to
stay relevant in the face of Windows. Tooling vendors and consulting companies
got a reason to sell. Managers got cogwheel-replaceable programmers. Schools
got a reasonable and vendor-neutral language. Even if none of the promises was
completely fulfilled, it was close enough for long enough.

So basically the language design resources spent on Oberon were in the end
more wasted than those spent on Java. 'Worse is better' once again.

~~~
microtherion
> Oberon etc were not in reach for the average Joe, the tooling was too
> expensive and the cheap, resource constrained PCs couldn't run it.

I'm sorry, but that's just nonsense. You can argue lack of documentation and
user community, but Oberon is about as economical with resources as it gets,
and was originally designed for mid-80s hardware.

[https://en.wikipedia.org/wiki/Ceres_(workstation)](https://en.wikipedia.org/wiki/Ceres_\(workstation\))

~~~
hyperman1
And where was the Oberon tooling for x86 DOS or Win98? I'm not saying it was
impossible to create, but nobody seems to have cared enough to do it and get
it into the hands of users at an affordable price.

~~~
pjmlp
It was available for free on Linux, SunOS, and Windows, with the source code
documented in the "Project Oberon" book. Affordable enough for you?

~~~
hyperman1
It's a good start, but:

* It wasn't very visible in the ICT media. Sun ran a full-steam-ahead Java campaign.

* Open source at the time was not seen as trustworthy. I ran Linux at the time. So did you, probably. Corporations would not touch it with a ten-foot pole. I presume Oberon was seen in the same light.

------
mpweiher
While the suggestions are definitely interesting, and the fact that
compilation times are given proper attention is positive, it does seem to me
that this is still pushing, overall, in the wrong direction of fully automatic
optimization.

We now have machines that are fast enough for many/most activities, as
evidenced by production code that's written in Ruby, Python, JS, etc. For
these we don't really need optimizations.

On the other hand, performance critical routines tend to require human
attention, and for these the approach of having the optimizations fully
automated is generally less than helpful.

As Knuth put it, in 1974 no less:

"For some reason we all (especially me) had a mental block about optimization,
namely that we always regarded it a behind-the-scenes activity, to be done in
the machine language, which the programmer isn’t supposed to know. This veil
was first lifted from my eyes in the Fall of 1973, when I ran across a remark
by Hoare [42] that, ideally, a language should be designed so that an
optimizing compiler can describe its optimizations in the source language. Of
course! Why hadn’t I ever thought of it?

Once we have a suitable language, we will be able to have what seems to be
emerging as the programming system of the future: an interactive
program-manipulation system ..." --
[http://www.cs.sjsu.edu/~mak/CS185C/KnuthStructuredProgrammin...](http://www.cs.sjsu.edu/~mak/CS185C/KnuthStructuredProgrammingGoTo.pdf)

So let's have systems/languages that allow us to express performance
constraints and performance-oriented transformations, and let the
compiler/language assist us in doing this.

~~~
simias
That seems interesting in theory, but I have a hard time imagining what it
would look like in practice. In my experience micro-optimizing C code looks
like what you describe: when you write your algorithm you have a pretty clear
idea of what the machine code should look like. You compile it and check the
assembly output to see if the compiler understood what you were going for. If
it doesn't look like what you were expecting, you first need to make sure that
the compiler didn't actually generate better code than you expected (because
it might know things about the underlying architecture that you don't). If it
turns out that the output is genuinely sub-optimal, you refactor your code a
bit until you manage to get the compiler to do what you want. It can be a
frustrating process.

Now we do have a few language constructs to help us here, although sometimes
they're not enough. We have inline and restrict, and register as well,
although that last one is not used much in practice in my experience (at least
not for its original purpose). We have compiler extensions for computed gotos,
alignment constraints, etc.

In this context, what would "an interactive program-manipulation system" look
like? Personally I think we could use more standardized annotations in
languages like C, to mark up code in ways that are useful to the optimizer:
tagging "cold" code, having control of the prefetcher, and being able to write
SIMD code without using intrinsics (a very complex problem, admittedly).
Standardizing computed gotos would be nice as well.
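
For reference, here's a rough sketch of what some of these annotations already
look like as non-standard GCC/Clang extensions (the function names are made up
for illustration):

    /* Mark a rarely-taken error path as cold so the optimizer moves it
       out of the hot instruction stream (hypothetical functions): */
    __attribute__((cold)) void report_error(int code);

    void process(int *data, int n) {
        for (int i = 0; i < n; i++) {
            /* Hint the prefetcher a few iterations ahead; prefetching
               past the end is harmless since it's only a hint: */
            __builtin_prefetch(&data[i + 16]);
            if (__builtin_expect(data[i] < 0, 0)) /* "cold" branch */
                report_error(data[i]);
        }
    }

    /* Computed goto (a GCC extension), the classic interpreter dispatch
       idiom that standardization would make portable: */
    int interp(const unsigned char *code) {
        static void *dispatch[] = { &&op_halt, &&op_inc };
        int acc = 0;
        goto *dispatch[*code++];
    op_inc:
        acc++;
        goto *dispatch[*code++];
    op_halt:
        return acc;
    }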

Beyond that, in my experience a big problem with modern architectures is that
it's very much non-trivial to figure out the fastest way to implement an
algorithm. You have to consider which CPU model you're targeting, the cache
architecture, RAM speed... Second-guessing the compiler is pretty hard if you
don't have a lot of experience optimizing for a particular architecture.

~~~
theoh
It seems pretty clear that Knuth was not talking about instruction-level
optimization, but about those activities/features that came to be known as
metaprogramming, reflection, refactoring etc.

I imagine he was thinking about human-directed program transformations that
are almost language- and machine-independent: implicitly high-level,
whole-program stuff, rather than aimed at writing "idiomatic" fast machine
code. So I guess not the kind of optimization you have in mind.

What sounds more applicable for your use case would be profile-directed
optimization, maybe based on mutating an initial fragment of assembler in ways
that preserve its semantics but might improve performance. Like
[https://en.wikipedia.org/wiki/Superoptimization](https://en.wikipedia.org/wiki/Superoptimization).
But unless you wanted to create an instruction set that included reflection as
a feature, this would most conveniently be written in another language.

Edit: I should clarify, obviously self-modifying code written in assembler is
in some sense reflective, but it's hard work.
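
For a flavor of what superoptimizers find, here's a sketch of the classic
example from Massalin's 1987 superoptimizer paper, a branch-free signum. This
C rendering assumes two's complement and an arithmetic right shift of signed
values, which is only implementation-defined in standard C:

    #include <stdint.h>

    /* Obvious version, with branches: */
    int32_t sign_naive(int32_t x) {
        if (x < 0) return -1;
        if (x > 0) return 1;
        return 0;
    }

    /* Superoptimizer-style version, branch-free:
       x >> 31 is -1 for negative x and 0 otherwise (arithmetic shift);
       -(uint32_t)x >> 31 is 1 for positive x and 0 otherwise. */
    int32_t sign_branchless(int32_t x) {
        return (x >> 31) | (int32_t)(-(uint32_t)x >> 31);
    }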

------
ananya_muddu
A friend recently defended his doctoral thesis on using constraint-based
optimization in compiler backends (register allocation and instruction
scheduling).

~~~
ananya_muddu
Link to thesis:
[http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232192](http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-232192)

~~~
CalChris
Lozano did the Unison combined register allocator + scheduler. Cool that he
followed through and got his PhD.

 _Register Allocation and Instruction Scheduling in Unison_

[http://llvm.org/devmtg/2017-03//assets/slides/register_alloc...](http://llvm.org/devmtg/2017-03//assets/slides/register_allocation_and_instruction_scheduling_in_unison.pdf)

------
wiz21c
A bit off topic, but how does one frame the difference between "compiler-level
optimisation" and "high-level optimisation"? I can certainly tell the
difference between optimising for CPU cycles and optimising a QuickSort pivot
search, but I have a hard time generalizing/formalizing that idea. It seems
important to me because it would be needed to set realistic/measurable
objectives for compilers.

~~~
CalChris
I think that _high level optimization_ is a synonym for _machine independent
optimization_. The classic example would be common subexpression elimination.
Similarly _compiler level_ is a synonym for _low level, machine dependent code
generation_. An example would be peephole optimization.

Front end vs back end.
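
A minimal sketch of the distinction, in C-like pseudocode: common
subexpression elimination needs no knowledge of the target, while a peephole
pass rewrites short machine-specific instruction sequences.

    /* Before CSE: (a + b) is computed twice. */
    x = (a + b) * c;
    y = (a + b) * d;

    /* After CSE -- valid on any target, hence "machine independent": */
    t = a + b;
    x = t * c;
    y = t * d;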

~~~
masklinn
I don't know; HLO could also mean language-level optimisations (specific ones
based on known language semantics), while LLO would be IR-level optimisations
(more generic and reusable).

------
delinka
I think something like BOLT [1] has more promise. Now that I think about it,
I'm curious: if I take my unoptimized executable, generate the profile for
BOLT, and then apply BOLT's optimizations, is it possible to come out ahead of
compiler+BOLT optimizations?

1 -
[https://github.com/facebookincubator/BOLT](https://github.com/facebookincubator/BOLT)

~~~
dbaupp
Correct me if I'm wrong, but architecturally that seems to be a conventional
(profile-guided/feedback-driven) optimizing compiler, except the input
"language" is machine code.

For instance, its peephole optimizations are also a chunk of hand-written C++
code
([https://github.com/facebookincubator/BOLT/blob/2ed436a6d3b17...](https://github.com/facebookincubator/BOLT/blob/2ed436a6d3b17044a8d230af7a73eaf7b7249e8c/src/Passes/BinaryPasses.cpp#L1047-L1147)),
and thus could benefit just as much from the more declarative (and possibly
SMT-driven) ideas in this paper.

Finally, there are all sorts of optimizations that are only possible because
of high-level information from the source language. For instance, removing
unnecessary loads & stores requires knowing that pointers don't alias (see the
sketch after the quote below), which is very difficult to deduce from
unannotated integers; most of this is driven by language rules, such as the
(controversial) TBAA in C. Additionally, even the BOLT paper itself mentions
things that can be understood (and optimized for) from a higher-level
language, but are much harder to deduce from a raw binary (Section 8):

> Indirect tail calls are more challenging for static binary rewriters because
> it is difficult to guess if the target is another function or another basic
> block of the same function, which could affect the CFG
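
To make the aliasing point above concrete, a small sketch using C99 restrict
(hypothetical functions):

    /* Without annotations the compiler must assume a and b may alias,
       so *b has to be reloaded after each store through a: */
    void bump_twice(int *a, int *b) {
        *a += *b;
        *a += *b; /* must reload *b: the store to *a may have changed it */
    }

    /* With restrict the programmer promises no aliasing, so *b can be
       loaded once and kept in a register -- exactly the high-level
       information a binary rewriter no longer has: */
    void bump_twice_restrict(int *restrict a, int *restrict b) {
        *a += *b;
        *a += *b;
    }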

And, the paper touches on your curiosity in the conclusion:

> _Nevertheless, a post-link optimizer has fewer optimizations than a
> compiler. We show that the strengths of both strategies combine instead of
> purely overlapping_

(Oh, one more post-finally thing, there's probably an argument to be had that
BOLT fits into the (backend) superoptimizer category of the paper.)

~~~
delinka
BOLT’s input is machine code and a profile from running the executable in
production. This definitely puts it outside the conventional optimizing
compiler because the conventional compiler has no information about how the
code will actually be executed.

~~~
dbaupp
That is how profile-guided optimisation (PGO) aka feedback-directed
optimisation (FDO) works.

------
BlackFingolfin
Am I missing something, or is their example at the start of section 3 (bottom
of page 7) plain wrong? If x is a uint32_t, then ((x << 31) >> 31) + 1 returns
either 1 or 2, while the "optimized" version ~x&1 returns 1 or 0 (which
matches the semantics they describe, while the initial code does not).

~~~
jcdavis
If you look at the LLVM IR, it is using ashr (arithmetic shift right), so you
can probably assume this isn't targeting unsigned types.

~~~
Someone
The text also explicitly states this is about signed integers: _”let’s look at
taking ((x << 31) >> 31) + 1, an inefficient idiom for isolating and flipping
the low bit of a signed 32-bit integer”_

I think there is an error on page 9, though: _”Second, −a + −b can be
rewritten as −(a − b)”_. I think that should have either a left-hand side of
_”−a − −b”_ or a right-hand side of _”−(a + b)”_.
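
A quick check of both points (a sketch; in standard C, left-shifting a signed
32-bit value by 31 is undefined, so this mimics LLVM's two's-complement
shl/ashr with casts, and relies on the implementation-defined arithmetic right
shift of negative values):

    #include <assert.h>
    #include <stdint.h>

    /* LLVM-style (x << 31) >> 31 on signed 32-bit x: shl in two's
       complement, then an arithmetic shift right, giving 0 or -1: */
    static int32_t low_bit_mask(int32_t x) {
        return (int32_t)((uint32_t)x << 31) >> 31;
    }

    int main(void) {
        for (int32_t x = -1000; x <= 1000; x++) {
            /* Signed version: ((x << 31) >> 31) + 1 == (~x) & 1. */
            assert(low_bit_mask(x) + 1 == (~x & 1));
        }
        /* Page 9: -a + -b equals -(a + b), not -(a - b). */
        int32_t a = 37, b = -59;
        assert(-a + -b == -(a + b));
        assert(-a + -b != -(a - b)); /* the paper's rewrite, wrong here */
        return 0;
    }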

------
stcredzero
One thing I've been wondering about recently is whether we should have a
compile-time vs. runtime distinction at all. What if the language were running
on a tracing JIT, where the JIT output and runtime traces from the last run
were cached and available immediately?

------
carapace
I've been working with compilation using Prolog (after Warren 1980 went by
here on HN a month or so ago [1]).

Two things:

1) If you are writing a compiler and NOT using Prolog you are almost certainly
working way too hard.

2) I think I can see an alternate reality where compiler-like software exists
but high-level programming languages as such do not.

[1]
[https://news.ycombinator.com/item?id=17674859](https://news.ycombinator.com/item?id=17674859)

~~~
dman
Anything publicly available?

~~~
carapace
I'm afraid my own work is still inchoate, but there are decades of research.
Here are some papers I've been looking at recently:

"Logic Programming and Compiler Writing" David H. D. Warren (this is the
kickoff.)

"Parsing and Compiling Using Prolog" Jacques Cohen and Timothy J. Hickey

"Provably Correct Code Generation: A Case Study" Qian Wang, Gopal Gupta

"From Programs to Object Code and back again using Logic Programming:
Compilation and Decompilation" Jonathan Peter Bowen

"Automatic Derivation of Code Generators from Machine Descriptions" R. G. G.
Cattell

