Hacker News new | past | comments | ask | show | jobs | submit login
On sufficiently smart compilers (osa1.net)
50 points by YAFZ on Dec 15, 2015 | hide | past | favorite | 16 comments

> Unreliable optimizations and performance-critical software

This is why we will not have, and should not want, a "sufficiently smart compiler," but instead a "sufficiently predictable compiler." If the compiler is an enormously complex inference system, then tiny language-level changes can result in huge performance changes (see "space leaks" in Haskell, or "auto-vectorization" in basically anything). The solution isn't adding more knobs to the compiler; it's adding easier inter-language communication. Scripting languages like Matlab, Perl, Python, and R have been doing this right for decades: make a good effort at a specific domain (math, text, statistics), and make it easy to call into lower-level code for the key pieces.

Space leaks are a bad example imo since they happen in other languages as well.

Yeah, but having a chain of functions failing to fuse properly can be quite surprising.

There seemed to be an implication that Hindley Milner type inference leads to unpredictable performance, you can see that is untrue with ml and ocaml.

Compiler optimizations as part of libraries is definitely something I would like to see more languages explore. I think this is something that could fit quite well into Rust, actually, with its philosophy of allowing compiler plugins.

A simple example: Most high-level array implementations have a reserve method that allows you to specify the expected number of items added to the array. This allows allocating all required memory up front and hence avoids re-allocs and the associated copies.

In many cases, it would be trivial for a compiler to automatically determine the size of the array and add a reserve call automatically (because the entries are added in a loop with an easily-determined iteration count). Yet compilers cannot be allowed to do that, because doing so changes the visible behaviour of the program.

However, the library in which the array is implemented is free to define the semantics of the array. So that library could provide some hint to the compiler, in essence a library-provided optimization pass, which allows such reserve calls to be added automatically.

I'm not saying that coming up with a good DSL to describe such optimization is easy, but it may well be worthwhile in the long run.

Haskell's GHC offers optimisations-in-libraries via its RULES pragma[1], which allows specifying rewrite rules. These are an important part of the deforestation[2] that makes, for instance, lists and vectors fast.

[1]: https://wiki.haskell.org/Playing_by_the_rules

[2]: http://hackage.haskell.org/package/vector-0.7.1/docs/src/Dat...

Yeah, that's actually mentioned in the article ;)

Tools already exist for doing run time traces of compiled programs and then optimizing accordingly. A lot of what they focus on is improving locality, moving portions of the executable that are run together near each other so improve the instruction cache hit rate. They also much more aggressively inline, to the point that the overall code size increases but hopefully so does performance.

You actually get a huge amount of perf gains just from improving locality, a lot of in-depth optimizations are possible but I am not sure what compilers now days actually do, it has been a long time since I worked on a compiler team!

Got any links? I'd love to check out the tools you're talking about.

Profile Guided Optimisation is the search term you're looking for. For one example: GCC does it (you may need to use gprof as well, can't really remember).

I'd like to see projects that can integrate the runtime information provided by a JIT VM. What if such a project used this information from the start? Then, the programmers might be able to use information like: This variable has only ever had an int in it, ever, over the entire lifetime of the app.

Such information might be problematic to integrate into a legacy project, but in a language which allowed for optional type annotation, such information could be fed into programmer tools and more easily analyzed. (Example: So, this code here where it's not an int -- do we really need to have this, or could we move the error checking elsewhere, so we can just say that's an int?)

Or you could use a language with static typing. It might be the line of work that I'm in, but I've never encountered a case where dynamic typing has really been helpful - more often it's been a source of errors, where non-obvious type coercion has been performed by the interpreter/runtime. I can only think of one case where dynamic typing was necessary, but I'll chalk that up to a really poorly designed COM API that I had to work with.

The problems most people have with static typing are related to explicit typing, not static typing per se. (And somewhat inane conversions tend to give dynamic typing a bad name as well).

There are two cases I can think of where the static typing in the most common languages are insufficient. The first is union types--you can't easily express a data structure like JSON in Java (although one could argue this is a feature, not a bug). The second case is where you need to map types to instances of those types, something akin to a Map<Class<T>, T> in Java syntax. While you can generally specify the latter at a method syntax (e.g., public <T> T get(Class<T> clazz)), there's no way to write that method without having to resort to subverting the type system.

Dynamic typing can be beneficial if you would otherwise be stuck subverting the type system more often than following it, although that's as much of an argument for more powerful static type system as it is one for dynamic typing.

I've had the same experience -- that dynamic typing brings nothing to the party, and often actively gets in the way. I love C++'s extensible explicit static types (enums and structs), and when I don't have them in other languages, I miss them badly. I strongly suspect that dynamic typing is a fad, which will go away as languages change -- especially if programmers stop anthropomorphizing their code and assuming that it wants freedom and liberty. (Imagine if architects imagined that brick or structural steel wanted to be free!)

My experience is in healthcare, machine-language comprehension, enthusiast game development (roughly roguelike-ish -- I haven't ever scraped together the capital and the courage for graphics and music), a little genetic programming, and a little web development (where I outright used Hungarian notation to make Javascript less incomprehensible). What areas have you had experience in, where you've experienced the same?

You could probably do this on native code with something like HP's Dynamo combined with the original symbols.

It seems like errors in the rewrite rules themselves would potentially create very hard-to-debug problems with the runtime code.

Applications are open for YC Winter 2022

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact