> Rust is the fastest CPU language. This is not terribly surprising, as it has a mature compiler, and its default behaviour of unboxing everything is exactly what you need for this program.
> What is not visible from the above table is that most of the implementations were significantly slower in their original formulation. Only Futhark, MPL, and Rust are essentially unchanged from their first straightforward implementation. For the others, most of the performance comes down to various low-level tweaks, in particular avoiding boxing and allocations. This is not exactly unexpected, but I still find it sad that when it comes to performance in functional languages, we must think about the compiler more than we think about the language.
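The boxing point is concrete in Rust: a `Vec` of plain structs stores them inline in one allocation, while boxing each element adds a heap allocation and a pointer chase per element — the layout many functional-language runtimes default to. A minimal sketch, using a made-up `Vec3` like a ray tracer might have:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 { x: f64, y: f64, z: f64 }

fn main() {
    // Unboxed (Rust's default): 1000 Vec3s laid out contiguously,
    // 24 bytes apiece -- cache-friendly, no per-element indirection.
    let unboxed: Vec<Vec3> = vec![Vec3 { x: 0.0, y: 1.0, z: 2.0 }; 1000];

    // Boxed: 1000 separate heap allocations, each element reached
    // through a pointer.
    let boxed: Vec<Box<Vec3>> = unboxed.iter().map(|v| Box::new(*v)).collect();

    assert_eq!(std::mem::size_of::<Vec3>(), 24);
    // A Box is just a pointer: 8 bytes on 64-bit targets.
    assert_eq!(std::mem::size_of::<Box<Vec3>>(), 8);
    assert_eq!(*boxed[0], unboxed[0]);
    println!("ok");
}
```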
Unfortunately, it seems most functional languages are built on infrastructure that was intended for procedural languages (Rust->LLVM, F#->.NET, Scala->JVM, etc.). This means that by the time that infrastructure starts working on the code, it has already been stripped of its most valuable inferential assets: the resolved types, the statically determined immutability, and so on.
Rust is the first language I've seen that is trying to do something about it. I think the idea is that MIR is an intermediate representation that still retains type information after lifetimes have been resolved. Using this, they can implement better analyses and pre-optimize code before it hits LLVM's optimizers.
MLton also keeps type information around basically until machine code generation (as far as I understand, at least). There have also been several interesting papers/projects about typed assembly languages.
But what about Haskell and Scheme? They’re the first two languages that come to mind when I think functional.
I think all those other languages are built on existing infrastructure like .NET or the JVM because infrastructure is one of the biggest barriers for any language. I'd love to see a general-purpose platform designed for functional languages, but I don't even know what that would look like.
The Multicore OCaml compiler doesn't yet use the new flambda2 optimisation passes. We witnessed good improvements with flambda2 on stock OCaml on the tight loops in this program; the compiler was doing all the right things. It seemed not worthwhile to continue hand-optimising the implementation, which was leading to non-idiomatic FP code. We hope to get back to this once the flambda2 passes are integrated with the Multicore OCaml compiler.
Given that, higher-level languages are merely another layer of abstraction. C is procedural, and while it can be functionally separated, it still maps down to assembly, and then to machine code.
Same goes for Lisp, or Haskell, or any other functional language. It still follows the chain down, and maps to assembly and machine code.
Given that, what ability does a purely functional language, like Haskell or whatnot, have to utilize the full features of today's multi-core CPUs?
For example, a common functional approach is to use a map() function to apply a specific function to a set of data. Say you have a list of 1000 strings and you want to capitalize them all, returning a new list of all-capitalized strings. Then you'd have another function called upper() that gets applied to each element in that list.
So, your function call might look something like:
capped_list = map( upper, my_list )
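For concreteness, in Rust (one of the languages from the benchmark) that sequential map would look like this; the list contents are made up:

```rust
fn main() {
    let my_list = vec!["hello".to_string(), "world".to_string()];

    // map() applies the function to each element and collects the
    // results into a new list; the original list is untouched.
    let capped_list: Vec<String> =
        my_list.iter().map(|s| s.to_uppercase()).collect();

    println!("{:?}", capped_list); // ["HELLO", "WORLD"]
}
```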
So, the question is: do functional languages automatically have the ability to take this instruction and spread the data over multiple cores, or multiple processors, in order to return the final answer faster or more efficiently than a regular procedural language?
Both functional and imperative languages can auto-vectorise, although in a functional language it's a lot easier to check that the map is side-effect-free! If your "upper" function is allowed to mutate other list elements then suddenly order of execution matters.
C#'s functional-flavoured sublanguage of LINQ has an auto-paralleliser: https://docs.microsoft.com/en-us/dotnet/standard/parallel-pr...
(I absolutely love writing functional-style code in a fairly-strongly-typed imperative language, because you can usually get all the benefits of functional when you need them and not trip over yourself when you need to actually do IO or interface with the system)
Purely functional languages make it trivial to get correct parallelism. In all cases, such a 'map' can be parallelised without risk of race conditions. Impure languages rely either on complex analyses to figure out when parallelism is safe, or on programmer annotations.
Whether it's a good idea to parallelise 'map' automatically is a different question. More parallelism is not always better, as all parallelism comes with overhead. Indeed, the entire point of this ray tracing benchmark is to investigate to what degree the trivial exploitation of parallelism also translates to good performance in practice (as that is ultimately the purpose of parallelism).
Generally not, because you need a decent-sized dataset and/or operation duration before parallelization has an advantage.
Capitalizing 1000 strings is not going to be faster in parallel, but 1 billion probably is. So it's solving 10 ILP problems.
However, the advantage is that when you want to parallelize, the risk of bugs, race conditions or unintended ordering is much, much smaller in functional programming languages.
Yes. *waits for the Wolfram hate*
I think you mean no. It's not automatic, you have to tell the language to use parallel constructs.
And as others have said, this is probably the right approach. At least if it's not taking runtime information into account.
What is very good about this approach (and Rayon does something similar for Rust) is that the code transformation between sequential and parallel constructs is so small. The closer one can come to writing idiomatic code and then tweaking it slightly for high performance, the better.
Apologies if this answer isn't hate-y enough. I'll try harder next time.
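To make the Rayon point above concrete: the sequential-to-parallel change really is just `iter()` to `par_iter()`. Under the hood it amounts to splitting the input and fanning chunks out to threads, which can be sketched — far less cleverly than Rayon's work-stealing thread pool, and with a made-up `par_map` helper — using only the standard library:

```rust
use std::thread;

// A minimal hand-rolled parallel map over fixed chunks.
// Rayon's `par_iter` does the same transformation, but with a reused
// thread pool and work stealing instead of a fixed 4-way split.
fn par_map<T, U, F>(items: &[T], f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    let n_threads = 4;
    let chunk_size = (items.len() + n_threads - 1) / n_threads;
    let f = &f;
    thread::scope(|s| {
        // Spawn one scoped thread per chunk...
        let handles: Vec<_> = items
            .chunks(chunk_size.max(1))
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<U>>()))
            .collect();
        // ...then join them in order, preserving the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let words: Vec<String> = (0..1000).map(|i| format!("word{i}")).collect();

    // With Rayon this body would just be:
    //   words.par_iter().map(|s| s.to_uppercase()).collect()
    let upper = par_map(&words, |s| s.to_uppercase());

    assert_eq!(upper.len(), 1000);
    assert_eq!(upper[0], "WORD0");
    println!("ok");
}
```

Because `f` only gets shared references to the elements, the compiler enforces the side-effect-freedom that makes the parallel version safe — the same property a purely functional language gets by construction.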