R, OpenMP, MKL, Disaster (jyotirmoy.net)
117 points by yomritoyj on Oct 3, 2021 | 42 comments



In a previous life, almost a decade ago, I fought very similar fights with OpenMP and MKL using R. It's painful, and you need to pay heed to all the small details pointed out in the docs, as in the OP's case. However, it's worth noting that OpenBLAS is as fast as MKL, at least if you compile it yourself for your system (I would expect that system-provided builds with CPU detection would be as good, but that wasn't always the case back then). I benchmarked this extensively for all my R use cases and for several systems I cared about at the time. So there is usually no need to use MKL in the first place.


> OpenBLAS

OpenBLAS is incompatible with application threads. Most Linux distributions provide a multi-threaded OpenBLAS that burns in a fire if you use it in multi-threaded applications. Even though OpenBLAS' performance is great, I'd be careful about recommending that people rely on OpenBLAS in general. As with this MKL example, you have to be aware of its threading issues, read the documentation, and compile it with the right flags (in a multi-threaded application: single-threaded, but with locking).
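In OpenBLAS build terms, that would be something like the following (a sketch; USE_THREAD=0 produces a single-threaded library and USE_LOCKING=1 adds the locking that makes it safe to call from multiple application threads):

    $ make USE_THREAD=0 USE_LOCKING=1
    $ make USE_THREAD=0 USE_LOCKING=1 PREFIX=/opt/openblas-serial install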

> it's worth noting that OpenBLAS is as fast as MKL

This depends highly on the application. E.g. MKL provides batch GEMM, which is used by libraries like PyTorch. So if you use PyTorch for machine learning, performance is still much better with MKL. Of course, that is if you do not have an AMD CPU. If you have an AMD CPU, you have to override Intel CPU detection if you do not want abysmal performance:

https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html

https://www.agner.org/optimize/blog/read.php?i=49
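For reference, the workaround in the first link boils down to preloading a one-function stub that forces MKL's dispatcher onto the Intel code path (a sketch based on that post; the function is undocumented and its name may change between MKL versions):

    /* fakeintel.c: make MKL's CPU dispatcher believe it runs on Intel. */
    int mkl_serv_intel_cpu_true() {
        return 1;
    }

built and used like

    $ gcc -shared -fPIC -o libfakeintel.so fakeintel.c
    $ LD_PRELOAD=./libfakeintel.so ./your-mkl-application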

The BLAS/LAPACK ecosystem is a mess. I wish that Intel would just open source MKL and properly support AMD CPUs.


> OpenBLAS is incompatible with application threads. Most Linux distributions provide a multi-threaded OpenBLAS that burns in a fire if you use it in multi-threaded applications.

Can you explain what you mean by this? Are you saying there's a correctness issue here? I only recall running into issues with MPI, where you (typically) run one MPI rank (process) per CPU core. If you combine that with a multi-threaded BLAS library, you'll suddenly have N^2 BLAS threads fighting over the CPUs and performance goes down the drain. The solution, like you say, is to use a single-threaded OpenBLAS, or else the OpenMP OpenBLAS with OMP_NUM_THREADS=1.

I guess with threads you'll have the same issue if you launch N CPU-bound threads that all call BLAS, resulting in the same N^2 problem you see with MPI.
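A minimal sketch of that pattern in C (compile with -fopenmp and link against a threaded BLAS; the function and its arguments are made up for illustration):

    #include <cblas.h>

    /* N application threads, each calling into a BLAS that spawns its own
       worker pool per call: roughly N^2 runnable threads in total. */
    void multiply_all(int nmat, int n, const double **a, const double **b,
                      double **c) {
        #pragma omp parallel for
        for (int i = 0; i < nmat; i++)
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        n, n, n, 1.0, a[i], n, b[i], n, 0.0, c[i], n);
    }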


> Can you explain what you mean by this?

There is a nice description of this:

https://github.com/xianyi/OpenBLAS/issues/2543

At a previous employer, we saw various issues, including crashes, non-determinism, etc. Usually these issues would go away when switching to MKL.


One of the more painful issues is hanging (locking up) at full CPU usage. At my workplace, we initially introduced a timeout to work around the hang while trying to determine its cause. It happened within multithreaded R code. We tried various OpenBLAS build flags to no avail. Setting OPENBLAS_NUM_THREADS=1 surely makes the problem go away, at the expense of performance.
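That is, roughly (pipeline.R standing in for the actual entry point):

    $ OPENBLAS_NUM_THREADS=1 Rscript pipeline.R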

That R code has since been ported to Python, but we hit the same issue again when using ThreadPoolExecutor, so we had to switch to ProcessPoolExecutor instead.


Debian and Fedora provide serial, OpenMP, and pthreads versions of libopenblas. Are you sure OpenBLAS doesn't detect nested OpenMP? I thought it did, though I'd normally use the serial version outside something like R. But if you mix low-level pthreads with high-level OpenMP, you can expect problems.

OpenBLAS is fine generally -- competitive with MKL on Intel hardware and infinitely faster on ARM and POWER. For PyTorch, presumably you want libxsmm (which is responsible for MKL's current small-matrix performance). On AMD hardware, I don't understand why people avoid AMD's support, which is just a version of BLIS and libflame. (BLIS' OpenMP story seems better than OpenBLAS'.)

The linear algebra story on GNU/Linux distributions would be less of a mess without proprietary libraries like MKL. It's fine if you take the Debian approach, in my significant experience running heterogeneous HPC systems. Fedora has cocked up policy by not listening to such experience, but you can do the Debian-style thing with the approach of https://loveshack.fedorapeople.org/blas-subversion.html (and see the old R example refuting the MKL story). That's one example of the value of dynamic linking.


> On AMD hardware, I don't understand why people avoid AMD's support, which is just a version of BLIS and libflame.

A year ago, I benchmarked a transformer network with libtorch linked against various BLAS libraries (numbers are in sentences per second, MKL with CPU detection override on AMD, 4 threads):

Ryzen 3700X - OpenBLAS: 83, BLIS: 69, AMD BLIS: 80, MKL: 119

Xeon Gold 6138 - OpenBLAS: 88, BLIS: 52, AMD BLIS: 59, MKL: 128

I guess people avoid AMD's support because MKL is just much faster? AMD BLIS has added batch GEMM support since then; I haven't had time to try that out yet.


I was thinking of the usual complaint about Intel not supporting AMD hardware that is common in HPC.

We don't know what that example was actually measuring, except apparently not the same thing for BLIS and MKL. On the basis of only that, it's not reasonable to say "just much faster", in particular for what I care about. I have Zen2 measurements (unfortunately only in a VM) using the BLIS test/3 framework. MKL came out nearly as fast as vanilla BLIS 0.7 and OpenBLAS on serial DGEMM, less so on the rest of D level 3, and nowhere close with S, C, and Z. Similarly for one- and two-socket OpenMP. At least in that "2021" version of MKL, there's only a Zen DGEMM kernel.


Have you set the environment variable OPENBLAS_CORETYPE to specify the CPU?


I went further than that, I profiled with perf and checked that the right kernels were used.


> The BLAS/LAPACK ecosystem is a mess. I wish that Intel would just open source MKL and properly support AMD CPUs.

Given that their latest compilers are based on LLVM, that seems like a fair trade between the closed- and open-source worlds.


> OpenBLAS is incompatible with application threads.

I’ve never had any issue when using it in OpenMP codes (either compiling it myself or using the libopenblas_omp.so present in some distros), what do you mean by “burn in a fire”?


> OpenBLAS is incompatible with application threads

R is single-threaded.


> i would expect that system provided ones with system detection would be as good, but that wasn't always the case back then

Also in a previous life, I recall running into distro OpenBLAS packages that were not compiled with DYNAMIC_ARCH=1 (which enables OpenBLAS' runtime CPU architecture selection, similar to e.g. MKL) but were instead compiled for some lowest-common-denominator x86_64 arch. I filed some bug(s?), and IIRC this has since been fixed.
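For anyone compiling OpenBLAS themselves, it's just a make flag, roughly (the install prefix is illustrative):

    $ make DYNAMIC_ARCH=1
    $ make DYNAMIC_ARCH=1 PREFIX=/opt/openblas install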


> As a good citizen he wanted to file a documentation bug to have this behaviour documented. But R’s bug tracker seems not to be open to the public. So the story has to be recorded here for Google to find.

Huh? The bug tracker is here:

https://bugs.r-project.org/

Yes, to file a bug you need to request an account because they were apparently overwhelmed with spam, as documented here:

https://www.r-project.org/bugs.html


The first place to report this would be on the R-devel mailing list.


Systems that fail without producing an error or warning are worrying and unsettling to me.


I had a similar problem in a prediction pipeline a few years back. If I remember correctly, someone updated an R package to the next minor version. The package was used to read an obscure file format, and the update installed a new C++ library. That C++ library somehow interacted with a second R package (one using a specialized type of linear model) when compiled from source, and all the results coming out of our package were subtly wrong, but only with large files.

It turns out the way the second R package determined the required precision of floats in sparse arrays was based on which compiled linear algebra libraries were available. It took us a week to debug, and ultimately it was easier for us to just rewrite the whole thing in Python.

Renv has made things easier, but I don't think packrat/renv lets you lock C/C++ libraries as well as R ones.


It's perhaps worth saying that if you must mix OpenMP libraries built against the LLVM (Kuck/Intel) runtime and GNU GOMP on GNU/Linux: ensure libomp is built with GOMP compatibility (however that's configured), then make a shim from the result, like

    gcc -shared -Wl,-soname=libgomp.so.1 -o libgomp.so.1 empty.c -liomp5

where empty.c is an empty file, and put the result on LD_LIBRARY_PATH ahead of the real libgomp. Alternatively, preload the compatible libiomp5. On Debian 11 there's already a libgomp in the llvm packaging. Dynamic linking assumed, as is right and fitting.
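For example (paths made up for illustration):

    $ mkdir shim && mv libgomp.so.1 shim/
    $ LD_LIBRARY_PATH=$PWD/shim:$LD_LIBRARY_PATH ./your-application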


I don't have MKL to try this out, but I'd check whether the MKL threading choice actually broke the initialization-to-1.0 loop.

That is, instead of checking only after doing the x[i] *= SCALE bit with CBLAS, I would check both before and after the scaling.
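Something like this sketch (assuming the article's example is roughly an OpenMP initialization loop followed by a CBLAS scaling call; the size is made up):

    #include <cblas.h>
    #include <stdio.h>

    #define N 1000000
    #define SCALE 2.0

    /* Count elements that do not hold the expected value. */
    static int count_bad(const double *x, int n, double expect) {
        int bad = 0;
        for (int i = 0; i < n; i++)
            if (x[i] != expect)
                bad++;
        return bad;
    }

    int main(void) {
        static double x[N];
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            x[i] = 1.0;
        printf("bad before scaling: %d\n", count_bad(x, N, 1.0));
        cblas_dscal(N, SCALE, x, 1);   /* the x[i] *= SCALE step */
        printf("bad after scaling:  %d\n", count_bad(x, N, SCALE));
        return 0;
    }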


In this excellent article, Patrick Li, the author of the new optionally typed language Stanza and co-founder of JITX (YC S18), provides a compelling reason to design a new language [1]. TL;DR: a powerful language like Ruby enabled the creation of the powerful RoR library and framework, which helped spawn unicorn-size startups like GitHub and Twitter, something that would otherwise not have been feasible.

I want to add another dimension to this argument: what if we could keep an existing language's ecosystem (libraries, community, etc.) but modernize the engine that runs and compiles the R language? This new engine could avoid the dreaded global locking limitation, provide native multi-threaded applications, and interface seamlessly with non-native R libraries in C/C++. Interestingly, someone has tried this, with sponsorship from Oracle no less, and presented this futile effort in last year's R flagship conference keynote [2].

IMHO he would have been more successful in his endeavour had he used the D language. What's so special about D, you may ask? I would point to the fact that most languages do not provide an RoR-like tool, except D, but that's a story for another time (see ref [1]). There's also the fact that D has a working alternative library to OpenBLAS and MKL, and it was even faster than both of them five years back [3]! D also supports open methods as an alternative to the multiple dispatch much touted by the Julia language community. D is also bringing native support for a borrow checker, the feature always mentioned in the same sentence as the Rust language. In addition, D has second-to-none FFI support for C and C++. Heck, the latest D compiler even has a standard C compiler built in. I could go on, but I think you've probably got the picture.

My not-so-humble proposal to the R and D language communities is to compile R on top of the D language. Essentially you'd have the dynamic language R compiled on top of static D via compile-time function evaluation (CTFE). This approach is becoming more popular now, as posted recently for the new Val and Valet language combination [4]. Just think of CTFE as the new JVM, but one that provides truly static and native compilation for R.

[1] What makes a programming language productive? “Stop designing languages. Write libraries instead.”:

https://jaxenter.com/stop-designing-languages-write-librarie...

[2] Why R? 2020 Keynote - Jan Vitek - How I Learned to Love Failing at Compiling R:

https://www.youtube.com/watch?v=VdD0nHbcyk4

[3] Numeric age for D: Mir GLAS is faster than OpenBLAS and Eigen:

http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...

[4] Show HN: Val - A powerful static and dynamic programming language (val-lang.org):

https://news.ycombinator.com/item?id=28683171


As a software developer forced to work with data scientists who refuse to learn Python, there is nothing I hate more than R.

R is good for exploratory data analysis but useless for everything else.


Of course, if you read the article, you find out that the problem had nothing to do with R. It was a misconfiguration of the underlying linear algebra libraries that R (and Python and everything else) relies on. The author even made a minimal reproducible example in a single C script, no dependencies on R whatsoever.

I hear a lot of "R is bad, Python is Enterprise Production Quality (TM)" blather at my work. It's always because the people involved don't understand computers, don't read documentation, don't debug, don't do root cause analysis, and want to quickly pass off responsibility for their laziness and incompetence. Meanwhile I and my team are happily chugging away, producing millions of dollars of reliable value for my company in R year after year.

Python lags far behind R in wide swaths of data science. Pandas is inferior to both dplyr and data.table, and R's modeling capabilities blow Python's out of the water in breadth and depth. You only use Python when you have to, e.g. for unstructured data and deep learning type stuff.

If your colleagues make you deal with their bad R code, that's too bad, but don't blame the language. It's designed to be easy to use, so a lot of bad coders use it. Go train your bad coders or hire better ones.


I would completely concede that R has better libraries. However, getting stuff like online prediction into production is a real pain when the models are developed in R. And R is single threaded. There is no way to hide that detail.


R isn't the best for production predictions for sure (it can work though). But it's not hard to translate well-designed R processing pipelines and models into other languages if you must. The problem is that R programmers often don't know how to write good code in any language.

Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills.

The solution is for production engineers to understand just enough R to set standards for data scientist code that enable reliable translation of the models to the production language. As with JS, you can complain about the yucky parts, or you can accept that it's the best tool for some jobs and make an effort to work around the yucky parts, or use the tools of those who are doing that (e.g. tidyverse and Wickham).

If you want data scientists to produce production-ready results, you have to hold them to the standards of production engineering.


"Same issue as Excel, really. Easy to use, so you get a lot of users with very thin engineering skills."

Huh?

While I totally agree with your quote, I'd think it applied a lot more to python than to R. Especially given that python seems to be the dominant "first language for people to learn when they get into programming" because it is "easy".


The proportion in R is higher because the community of software engineers working in R is a lot smaller. R coders are overwhelmingly data analysts, while Python coders have more diverse roles. People who use R are also much more likely to have learned R, and only R, from their university courses towards a data science-related degree, especially if that degree is in statistics.


R is a language people use when they get into statistics, not even thinking specifically of programming.


> And R is single threaded. There is no way to hide that detail.

Python isn't much better in this regard, thanks to the GIL.

What I actually found most baffling when I delved into R is the fact that it doesn't support 64-bit integers (lack of proper native UTF-8 support coming a close second).


> Python isn't much better in this regard, thanks to the GIL.

Take some standard ML model built with Caret or LME4 and try to serve predictions with Plumber in R. It's significantly more painful than using sklearn + FastAPI. You either need to use future::promise (which still sucks because it forks new R runtimes) or forgo this and go K8s or something similar.

I don't get the love for RStudio either. It crashes frequently for me, or locks up randomly. The debugging experience is abysmal compared to PyCharm. Getting reproducible R builds is a pain, slightly alleviated by renv. But not really if you want separate dependencies for dev and production.

Python and R tooling are not comparable. You will have serious issues operationalising R, issues that most statisticians are simply not equipped to deal with, and that serious software engineers will hate about R.


FWIW I have many years of full-time RStudio dev experience, and while I've definitely had a few hard-to-explain crashes, I'd characterize it as very reliable overall. When problems arise they tend to be due to community-contributed packages, especially packages that call out to C++. (My name is on the bug fix log for some major packages.)

Unintentional and unnecessary creation of huge, memory-hogging objects is a closely related footgun. Packages are often not built with large data in mind and make choices that scale terribly, such as storing multiple copies of the data in the model object, or creating enormous nonsparse matrices to represent the model term structure. It's a legacy of the academic statistics culture R grew out of. Most researchers test their fancy new method on a tiny dataset, write a paper, and call it a day.

No argument about the debugging experience. I find it very slow, especially with large datasets, and try to avoid it. Not much experience with reproducible R builds but I wouldn't be surprised if it was a pain.


Wow, tell us how you really feel. How much have you used R and Python? Maybe those data scientists would prefer if you didn't viscerally hate the main data/statistics language and didn't call it useless for things beyond a narrow use case. It may lead to better outcomes if people hated things less and tried to understand the valid use cases, for instance the reams and reams of statistics that can be done in R where Python may lag behind, since R is the lingua franca of statistics and research.


I've never seen anything for Python that allows you to take linear algebra-based code and run it at maybe petascale with trivial modifications. There's an R example somewhere under https://pbdr.org/publications.html


R is way more powerful and flexible for data science stuff. (Going from Python to R is almost like going from Excel to Python.)


With regard to good software engineering, there is a funny thing about Python: simply cutting and pasting code from one environment to another can completely destroy the program if the paste messes with the indentation.

Now, some people say this can be solved with a good IDE. Which might (or might not) be true, if you can reliably identify, by manually reviewing the code, the ends of the functions, loops, etc. that got munged in the paste.

But interestingly enough, Jupyter notebooks (which seem to be the go-to tool these days) aren't IDEs, making it incredibly easy to fubar otherwise perfectly working code by pasting it from your local IDE into, say, an AWS SageMaker instance, to pick one random example of a currently widely used Jupyter implementation. So even if the problem could be fixed by a good IDE, there is no guarantee that such an IDE is (easily) accessible for production code.

I just have a hard time seeing how such a fundamental flaw in a language can lead to "good software engineering".


So don’t mess up the indentation when you paste. Seriously, in my 15 years of using Python on a daily basis this hasn’t been a problem once.


It could be worse. They could learn python and still prefer to use R!


I don't know why you're getting downvoted. I was one of the data guys you mentioned who learned R first and resisted Python. There are a lot of things about R that lead users to develop very bad habits. The only reason R caught on in the first place is because Python did not have mature libraries for data analysis for a long time.


All languages strike a tradeoff between flexibility and enforcing a regular structure. A lot of people seem to think their preferred language hits the perfect point on that tradeoff, and judge any language that makes a different choice. Python lovers judge R, Java users judge Python, C++ users judge Java, Rust users judge C++, Go users judge Rust, and everyone judges JavaScript.

A language that's more flexible than your favorite "encourages bad habits", while a language that's less flexible than yours is "bureaucratic".


It's not the flexibility that encourages bad habits. Lisps are incredibly flexible, for instance, but do not generally encourage bad habits. R encourages bad habits because the language itself and its libraries are not very well-designed. The language is powerful and useful, but it's also a mess.

R encourages bad habits for the following reasons:

- R is made "for statisticians, by statisticians" so a lot of the example code out there is very poorly written

- The syntax is very inconsistent across libraries, and even within base R

- There are a lot of syntactical quirks that cause a lot of confusion for anyone who's learned another language, like using dots in function names, e.g. "read.csv". There's also the 1-indexing.


> Lisps are incredibly flexible, for instance, but do not generally encourage bad habits

People often say "LISP is so powerful but nobody can understand anyone else's code". That's the dominant explanation for why LISP isn't more popular (along with the "oatmeal with toenail clippings mixed in" syntax, which most people don't find readable, regardless of the fervent beliefs of the LISP community to the contrary). The community stays small because of the unappealing syntax, and even within the community people find it hard to work together, because everyone has their own style, so the kind of coding and collaboration that produces generally useful libraries doesn't tend to happen. I would argue there's no meaningful distinction between "bad habits" and "habits that inhibit the development of generally useful software". In fact, that's the most useful definition of "bad habits" I can imagine.

Flexibility is a root cause of bad habits thus defined, because flexibility is what enables people to make bad choices. Language designers have long recognized this. It's why certain languages impose heavy restrictions on how you can structure your code, from Java's everything-is-a-class to Python's indentation-based scoping. They have a certain vision for what constitutes effective code and they know it won't happen unless they force everyone to follow it. In other words, they choose to reduce flexibility to prevent what they consider to be bad habits.

> R encourages bad habits for the following reasons:

Your reasons are just common complaints about R issues, not actual arguments that these issues encourage bad habits.

> - R is made "for statisticians, by statisticians" so a lot of the example code out there is very poorly written

At best this is an argument that widely publicized badly written R code encourages bad habits, not R itself.

> There's also the 1 indexing

There's a big difference between "language feature I don't like" and "language feature that encourages bad habits". A language that has different conventions than your favorite language is just different.


It is not so much R as the (relative) unwillingness to break compatibility or enforce global standards. Why do all the string functions not accept UTF-8?



