
Polly: LLVM Framework for High-Level Loop and Data-Locality Optimizations - luu
https://polly.llvm.org/
======
misnome
This sounds interesting, but the documentation is really sparse, basically
only how to turn it on. Am I supposed to be thinking of this a bit like
Intel’s Parallel Studio’s Advisor tools for analysing threading and
vectorisation? How do I tell what it’s doing when it e.g. adds OpenMP calls,
since it seems to insert them between parsing and object code without
oversight?

Basically, how easy is it, and how do I approach using this as a non-domain-
expert?

~~~
boulos
The best resources on _understanding_ Polly are probably various LLVM
developer conference talks (it’s come up at nearly every one since at least
2010 or so).

 _But_, just as you don't need to worry too much about how GCC does this unless
you're a compiler engineer, I'd say just let it try or not.

Edit: Here's the 2011 talk
([http://llvm.org/devmtg/2011-11/Grosser_PollyOptimizations.pd...](http://llvm.org/devmtg/2011-11/Grosser_PollyOptimizations.pdf))
that I think is a good, if dated, intro. Tobias's more recent work (say
[http://www.grosser.es/bibliography/doerfert2017optimistic.ht...](http://www.grosser.es/bibliography/doerfert2017optimistic.html))
is a good "what's possible now-ish". But again, these are both compiler-
researcher papers.

~~~
eslaught
> The best resources on understanding Polly are probably various LLVM
> developer conference talks

I don't suppose there are videos available?

~~~
wyldfire
There are!

[https://m.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ](https://m.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ)

[https://youtu.be/A8E5KgzZPMM](https://youtu.be/A8E5KgzZPMM)

[https://llvm.org/devmtg/](https://llvm.org/devmtg/)

~~~
crunchiebones
these are great! thanks

------
gnufx
I'd intended to submit this myself after seeing the results recently, while
referring to it apropos of BLAS implementation. I've quoted matmul here as the
classic example of compiler optimization failures, and won't again, at least
until I've understood these results. I couldn't see how to reproduce them, and
it's not clear what the various matrix sizes are, or how they relate to the
dips in MKL performance where Polly wins, which I don't understand. It's also
not clear what the POWER results are based on, as BLIS doesn't support POWER
other than via the very slow generic configuration; I haven't got round to
enquiring.

------
fulafel
This sounds very good, just what's been missing from compiler optimizations.
Does anyone know if this is still alive, or whether it has been merged into
LLVM? The news seems to end a year ago.

edit: found some activity at the Polly Labs website, e.g.
[http://www.pollylabs.org/2018/08/15/GSoC-2018-final-reports....](http://www.pollylabs.org/2018/08/15/GSoC-2018-final-reports.html)

~~~
mahatt
Yes. It works nicely. If you want to enable it in clang, check the CMake flag
for building clang with Polly.

------
mpweiher
When I first looked at this, there was a comment stating that (from memory),
despite ~10 years of research and numerous PhDs, papers, and conference talks,
Polly just doesn't work for production code. That comment is now gone.

Was it retracted because it was inaccurate? Seems to me like an important part
of the conversation, whether true or not.

~~~
fooker
It is still true.

General-purpose code does not have the kind of pure loops doing arithmetic
that polyhedral analysis reasons about.

~~~
gnufx
Yes, but the same is true of the classical loop optimizations that this aims
to improve on. Those clearly do work -- to a lesser extent? -- on production
code, just not on the sort of code that many commentators give the impression
of thinking is all that matters. Linear algebra does crop up all over the
place. (In contrast, I would say GCC's "experimental" -floop-nest-optimize
implementation doesn't work on matmul, at least.)

~~~
fooker
>Linear algebra does crop up all over the place

True. Almost everyone who does matrix/vector operations in production code
uses libraries though. That is why it is tricky to see results from these
kinds of projects on production code.

~~~
mpweiher
And those libraries tend to have been hand-built and/or with special-purpose
tools.

~~~
gnufx
I looked up the Polly reference for its discussion of BLIS missing optimized
implementations, and of GCC and clang apparently failing on BLIS-type loops.
(It mentioned examples of people in national labs not using the relevant
libraries, not that I'd defend that.) It might also have been other types of
kernel that don't exist for ARM or POWER, say. Surely it would be good to have
an effective general-purpose tool (a compiler) that could replace laborious
hand-optimization or the building of special-purpose tools.

