
It's Time to Pay Attention to Intel's Clear Linux OS Project - xkgt
https://www.forbes.com/sites/jasonevangelho/2019/05/13/its-time-to-pay-attention-to-intels-clear-linux-os-project/#2b4743975c49
======
writepub
What are the optimizations, and why can't other distro vendors apply them?
Maybe this is a way for Intel to rally other distros into adopting the
optimizations.

~~~
macros
From some browsing through the patches they apply to the packages it appears
they are making extensive use of function multi-versioning. Instead of
compiling for the lowest common denominator for the target arch they are
shipping pre-compiled versions for each generation and using run-time
detection to figure out which to load.

Nothing stopping any other distro from copying the approach other than detail
work and increased package sizes.
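
To make the idea concrete, here is a minimal sketch of run-time dispatch done by hand, assuming GCC or Clang on x86_64. The names dot, dot_generic, dot_avx2 and pick_dot are made up for illustration and are not taken from the Clear Linux patches; GCC's multi-versioning support automates roughly this pattern.

```c
#include <stddef.h>

/* Baseline version: safe on any CPU the binary targets. */
static float dot_generic(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same source, but the compiler is allowed to emit AVX2 code here. */
__attribute__((target("avx2")))
static float dot_avx2(const float *a, const float *b, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}
#endif

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Ask the CPU what it supports and return the best matching version. */
static dot_fn pick_dot(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return dot_avx2;
#endif
    return dot_generic;
}

/* Public entry point: resolves the implementation on first call.
   (Not thread-safe as written; a real library would resolve at load time.) */
float dot(const float *a, const float *b, size_t n) {
    static dot_fn impl;
    if (!impl) impl = pick_dot();
    return impl(a, b, n);
}
```

Callers only ever see dot(); the cost is one extra copy of the function per supported CPU generation, which is where the larger package sizes come from.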

------
Wowfunhappy
> [Clear Linux is] highly tuned for Intel platforms, with all performance
> optimizations enabled by default. Those optimizations occur across the
> entire stack: kernel, libraries, middleware layers, frameworks and runtime.

So, why aren't these optimizations available on other distros, since they seem
to make such a difference? Is it only a matter of time?

~~~
jdsully
Gentoo was like that. Everything was compiled with -march=native. It didn’t
result in a noticeable improvement and you had to wait for everything to
compile.

It seems with this distro you won’t have to wait. But I can’t imagine the
speedups will be that noticeable.

~~~
Wowfunhappy
Well, the article made them look quite noticeable. 10% is a lot.

~~~
jdsully
10% is not very compelling to get people to switch. Gentoo has boasted numbers
like that for years. Admittedly I would definitely consider this over Gentoo
(no compiling) - but even one issue that’s not easily googled would eliminate
the benefits of this distro.

------
cdbattags
Hmmmm, so what's Intel's play with this distro? The new RHEL replacement?

Otherwise, a Forbes article offering this kind of "opinion" makes me wonder
whether it was "prompted" or not. No harm intended; I just want to be aware of
the behind-the-scenes stuff.

Author is Jason Evangelho
([https://twitter.com/killyourfm](https://twitter.com/killyourfm)) who seems
pretty neutral.

~~~
mixmastamyk
To enable better performance on Intel processors, to encourage folks to seek
them out, and sell more of them, would be my guess.

------
gnufx
I'm not surprised if people are confused about this. I don't think there's any
magic involved, but examples I've seen about what they do don't make sense for
the numerical code they talk about.

For instance,
https://clearlinux.org/news-blogs/transparent-use-library-packages-optimized-intel-architecture
talks about architecture-specific versions of OpenBLAS, but on x86_64
OpenBLAS dispatches on the architecture. Also it has similar DGEMM
performance to MKL, at least for AVX2 and below, unless avx512 has been
improved recently; BLIS is competitive also for AVX512. The
example does imply something useful that they seem to have done. That's to add
SIMD hwcaps for dynamic loading. This is an important omission from vanilla
Linux/ld.so, which means you can't build specific libraries and automatically
get the appropriate one loaded for the architecture you're on. (Obviously you
can arrange to get the appropriate one with LD_LIBRARY_PATH or ld.so.conf,
but...) As far as I remember, the only thing that works on x86_64 is TLS,
i.e. there's a /usr/lib64/tls on Fedora-ish systems, but not a similar one for
avx2.
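
A rough sketch of what that selection amounts to if you do it yourself in user space, assuming GCC or Clang on x86_64. The "avx2" subdirectory name and the helpers simd_subdir and simd_lib_path are hypothetical; this is not how ld.so or Clear Linux actually lay things out.

```c
#include <stdio.h>

/* Hypothetical subdirectory for this CPU: "avx2" if supported, else "". */
const char *simd_subdir(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return "avx2";
#endif
    return "";
}

/* Writes "<base>/<subdir>/<name>" (or "<base>/<name>" when there is no
   matching subdirectory) into out and returns snprintf's result. */
int simd_lib_path(char *out, size_t n, const char *base, const char *name) {
    const char *sub = simd_subdir();
    if (sub[0] != '\0')
        return snprintf(out, n, "%s/%s/%s", base, sub, name);
    return snprintf(out, n, "%s/%s", base, name);
}
```

The resulting path could be handed to dlopen(), falling back to the plain directory if the optimized copy is missing. Having the dynamic loader do this search itself is what makes it transparent to applications.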

Clear Linux has a script which looks at GCC optimization reports and adds
target_clones attributes to C(++?) functions which report they're
vectorizable. Many of those won't be helpful. The example I've seen, but don't
have to hand, was for FFTW but, like OpenBLAS, that dispatches to SIMD-
specific kernels. What the script picks up in the example is useless; I think
it's just in a test harness, but at least not something that will make your
FFTs go faster. That sort of thing can actually be harmful if firing up the
SIMD unit lowers the clock rate to no good effect.
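
For reference, this is roughly what an annotated function looks like, as a minimal sketch assuming x86_64 Linux with a compiler that supports the target_clones attribute. The MULTIVERSION macro and the saxpy function are illustrative only and are not output of the Clear Linux script.

```c
#include <stddef.h>

/* Use target_clones only where it is known to be available; elsewhere the
   macro expands to nothing and a single baseline version is built. */
#if defined(__has_attribute)
#  if __has_attribute(target_clones) && defined(__x86_64__) && defined(__linux__)
#    define MULTIVERSION __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION
#endif

/* The compiler emits one clone per listed target plus a resolver that
   picks between them when the program is loaded. */
MULTIVERSION
void saxpy(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}
```

Whether the AVX2 clone is actually faster depends on the loop being hot and vectorizing well, which is the point above: tagging everything the optimization report mentions does not guarantee a gain.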

There may be problems with that anyway. I haven't had a chance to investigate
closely, but adding target_clones to the generic C kernels for BLIS' DGEMM
doesn't get the performance that it should, compared with a straight -march=.
[It might be worth noting that you can get about 2/3 the performance of the
hand-tuned DGEMM kernels with the generic C and appropriate GCC flags.]

This stuff isn't specific to Intel hardware (v. AMD) except insofar as they
choose specific targets, and Zen is similar to Haswell for linear algebra
kernels.

------
etaioinshrdlu
I wish Nvidia drivers would work on this OS.

Most of the blame probably lies with Nvidia for making a compatibility mess
with their drivers.

This would be for CUDA on servers.

