The lock-in is an important consideration, but if the scikit-learn API is fully respected it would seem less relevant. It also suggests a pattern for how other hardware vendors could accelerate scikit-learn as a genuine contribution?
Julia is one of those "nice in theory" options that has failed to live up to the hype and at this point seems unlikely to unseat Python for most use cases; it just doesn't have a good enough UX when used as a general-purpose language.
There is no "Effective Julia"-style guide. You either have to wade through infantile tutorials aimed at people with minimal programming experience or several reference books' worth of nitpicking on syntax. The methods themselves are not well documented and lack examples and usage guidelines.
The language and ecosystem do not feel like a project backed by commercial funding; it feels like one of those functional languages out of academic research where the structure and design of the language matter more than the actual developer experience. There are many new projects, but most are not actively maintained or updated. The language itself feels massive, with syntactic sugar and weird types everywhere. Trying to understand other people's Julia code is frustrating, similar to reading a library written in pure C++ templates. Compared to Go/Rust/Dart, Julia feels overly convoluted. Julia literature is structured in a way that seems to heavily encourage taking regular classes and lectures to pick up the language. It is hard to feel productive from the get-go.
This is a feature for a lot of Julia's core audience (data scientists like me, who grew up with R).
Getting started with Julia always just feels clunky to me - perhaps the other commenter was closer to the mark in blaming the documentation rather than the REPL itself. Either way, despite being a former scientist who has moved into IT (sadly), I get the distinct impression that the language is just not aimed at me. As such, I'm always surprised to see people trying to push it in settings outside its current realm of adoption; feels very much like the language maintainers have no real interest in that.
They are also investing effort in making it possible to write high-performance kernels in Python using an extension to the numba Python compiler:
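For a rough sense of what that programming model looks like, here is a small parallel kernel in stock numba. This is a generic sketch, not the Intel extension's actual API; as I understand it, their extension keeps a similar decorator-driven style but compiles kernels for their devices.

    import numpy as np
    from numba import njit, prange

    @njit(parallel=True, fastmath=True)
    def rowwise_l2(X):
        # Euclidean norm of each row, parallelized across rows
        out = np.empty(X.shape[0])
        for i in prange(X.shape[0]):
            s = 0.0
            for j in range(X.shape[1]):
                s += X[i, j] * X[i, j]
            out[i] = np.sqrt(s)
        return out

    rowwise_l2(np.random.rand(1000, 16))  # first call triggers JIT compilation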
Work is currently being done to improve scikit-learn's computational primitives so as to enhance its overall performance natively.
You can have a look at this exploratory PR: https://github.com/scikit-learn/scikit-learn/pull/20254
This other PR is a clean revamp of the previous one:
EDIT: Perhaps it's my inexperience, but is anyone else confused by the oneAPI rollout? There isn't exactly backwards compatibility with the classic Intel compiler, and an embarrassing amount of time elapsed before I realized that "Data Parallel C++" doesn't refer to parallel programming in C++, but rather to an Intel-developed API built atop C++.
GPU acceleration is not a magic "go fast" machine. It only works for certain classes of embarrassingly parallel algorithms. In a nutshell, the parallel regions need to be long enough that the speedup from doing them in the GPU's silicon outweighs the relatively high cost of getting data into and out of the GPU.
That's a fairly easy scenario to achieve with neural networks, which have a pretty high math-to-data ratio. Other machine learning algorithms, not necessarily. But basically all of them can benefit from the CPU's vector instructions, because they live in the CPU rather than out on a peripheral, so there's no hole you need to dig yourself out of before they can deliver a net benefit.
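To make that trade concrete, here is a toy timing sketch. It assumes cupy and a CUDA GPU are available; the array size is made up for illustration. For elementwise math this cheap, the two copies typically dominate, which is exactly the hole described above.

    import time
    import numpy as np
    import cupy as cp  # assumption: cupy installed, CUDA GPU present

    x = np.random.rand(10_000_000).astype(np.float32)

    t0 = time.perf_counter()
    y_cpu = np.sqrt(x) * 2.0 + 1.0      # cheap math, data already in RAM
    print("cpu:", time.perf_counter() - t0)

    t0 = time.perf_counter()
    x_gpu = cp.asarray(x)               # host -> device copy (the cost to pay)
    y_gpu = cp.sqrt(x_gpu) * 2.0 + 1.0  # the math itself is very fast on-device
    y_back = cp.asnumpy(y_gpu)          # device -> host copy
    print("gpu:", time.perf_counter() - t0)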
I would also say that what academics are doing is not necessarily a good barometer for what others are doing. In another nutshell, academics' professional incentives encourage them to prefer the fanciest thing that could possibly work, because their job is to push the frontiers of knowledge and technology.
Most people out in industry, though, are incentivized to do the simplest thing that could possibly work, because their job is to deliver software that is reliable and delivers a high return on investment.
Like, I would guess that the potential benefit to my team's productivity from eliminating (over)reliance on weakly typed formats such as JSON from our information systems could be orders of magnitude greater.
Problem is... the ASICs are really good for certain classes of ML problems but aren't really all that general.
What am I missing?
edit: it seems my instance was using AMD EPYC.
Also, in your example the problem size is tiny, which causes a lot of fluctuation. And in your code you are basically running stock scikit-learn both times.
This extension would also bring performance gains to AMD, although Intel would be better optimized.
    from sklearnex import patch_sklearn
    # The names match scikit-learn estimators
    patch_sklearn("SVC")

Or swap the imports directly:

    import sklearn                      -->  import sklearnex as sklearn
    from sklearn.cluster import KMeans  -->  from sklearnex.cluster import KMeans
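Put together, a minimal end-to-end run might look like this (the dataset and parameters here are made up for illustration):

    from sklearnex import patch_sklearn
    patch_sklearn()  # logs that the extension is enabled

    # Imports *after* the patch resolve to the accelerated implementations
    from sklearn.svm import SVC
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    print(SVC(kernel="rbf").fit(X, y).score(X, y))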
In general, I'm a fan of "let me call the initializer myself, at program startup." It's especially important when you want reversibility, i.e. teardown in addition to initialization, which pops up all the time for unit tests.
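sklearnex does ship a teardown counterpart to the patch call, so that explicit style is possible. A sketch of how it could look with pytest-style module hooks:

    from sklearnex import patch_sklearn, unpatch_sklearn

    def setup_module():
        patch_sklearn()      # swap in the accelerated estimators

    def teardown_module():
        unpatch_sklearn()    # restore stock scikit-learn for other tests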
There are other ways to enable this.
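For example, if I remember the README correctly, you can run your script through the module so patching happens before anything else is imported (my_application.py being whatever script you want to accelerate):

    python -m sklearnex my_application.py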
Generally speaking the distribution-packaged versions of python and all its scientific libraries and their support libraries are best ignored. That stuff should always be rebuilt to suit your actual production hardware, instead of a 2007-era Opteron.
A few followups:
(1) Is this usable for non-intelex packages?
(2) What about packages not in conda's channels?
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
(A jump that large suggests to me they may be fixing issues in the default implementation that could also be fixed for other processors!)
I completely agree. I hope some Intel competitor funds a scikit-learn developer to read this code and extract all the portable performance improvements.
Especially with cloud providers making ARM processors available at lower prices.
At the same time:
"Intel® Extension for Scikit-learn* is a free software AI accelerator that brings over 10-100X acceleration across a variety of applications."
Maybe their free software could be extended to all processors?
Not sure what kind of secret sauce they've included, but it is Intel, so their specific advantage is that they know everything about their processors and can provide really low-level optimizations which might not necessarily be super portable.
(I'm just guessing that a lot of the benefit here comes from building with Intel's compiler rather than GCC.)
It sounded like the bulk of the benefits they get are just from using profile-guided optimization to maximize the cache-friendliness of the code. I would guess those kinds of optimizations are readily portable to any CPU with a similar layout and cache sizes. I would not expect, though, that they are actively detrimental (compared to whatever the official sklearn builds are doing) on CPUs that have a different cache layout.
All of my machines still use Intel chips (other than my SBCs), so installing this and running it is trivial.
Intel is still a major contributor to the Linux kernel, so all their CPUs have first-class support for it. AMD fired all their Linux engineers some time back and, to my knowledge, never rehired them.
Then there are things like this (the MKL libraries are another). Intel spends a lot more money on developing these little libraries, which do meaningfully speed up processes. Those processes affect my day-to-day work as a software engineer.
That adds up when I have to deploy on the cloud. ARM is not quite there yet, and little hiccups at deploy time are a pain when the cost difference is not so significant relative to the hourly cost of my time. Linus Torvalds pointed this out about ARM, arguing it could never take off unless it took off on the desktop.
My understanding is that AMD regularly contributes to the Linux kernel for their CPU and GPU lines. How would they do this without Linux engineers?
Intel has done similar work before in the C/Fortran world; see BLAS, LAPACK, and FFTW vs MKL.
Just requires an x86 processor with "at least one of SSE2, AVX, AVX2, AVX512 instruction sets."
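A quick-and-dirty way to check what your CPU reports (Linux-only sketch; flag names as they appear in /proc/cpuinfo, with avx512f as the base AVX-512 flag):

    # Which of the required instruction sets does this CPU advertise?
    wanted = {"sse2", "avx", "avx2", "avx512f"}
    have = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                have = set(line.split(":", 1)[1].split())
                break
    print("supported:", sorted(wanted & have))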
> oneDAL is part of oneAPI.
So oneAPI is cross-industry, but this only works with Intel CPUs?
Hmm. Not sure I'm buying this, Intel. Sounds like you're claiming to be open but locking people into Intel-only libraries.
Is cuML similar in function to the Intel Extension for Scikit-learn?
> cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.
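Yes, broadly the same drop-in idea, but targeting NVIDIA GPUs. A minimal sketch, assuming a RAPIDS install and a CUDA GPU (the dataset is made up):

    import cupy as cp
    from cuml.cluster import KMeans  # mirrors sklearn.cluster.KMeans

    X = cp.random.rand(100_000, 16, dtype=cp.float32)  # data already on the GPU
    labels = KMeans(n_clusters=8, random_state=0).fit_predict(X)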
I have a bunch of notebooks that take 4-8 hours to run. This could potentially make my life much easier.
You are likely not re-importing the algorithms from scikit-learn after patching; i.e., the patch call should be made prior to the imports.
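In other words, the order matters:

    from sklearnex import patch_sklearn
    patch_sklearn()              # must run before the sklearn imports below

    from sklearn.svm import SVC  # now resolves to the accelerated SVC
    # If SVC had been imported before patch_sklearn(), the name would still
    # point at the stock implementation, which would explain the numbers you saw.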
Note there are Kaggle notebooks that showcase the same optimizations: https://www.kaggle.com/napetrov/tps04-svm-with-intel-extensi...
They are also basically notebooks
Also: how in th
Maybe notebooks require something more?