Parallel ray tracing benchmark for functional programming languages (github.com)
96 points by marcle 28 days ago | 22 comments



These are the most interesting bits for me:

> Rust is the fastest CPU language. This is not terribly surprising, as it has a mature compiler, and its default behaviour of unboxing everything is exactly what you need for this program.

> What is not visible from the above table is that most of the implementations were significantly slower in their original formulation. Only Futhark, MPL, and Rust are essentially unchanged from their first straightforward implementation. For the others, most of the performance comes down to various low-level tweaks, in particular avoiding boxing and allocations. This is not exactly unexpected, but I still find it sad that when it comes to performance in functional languages, we must think about the compiler more than we think about the language.
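To make the kind of tweak being described concrete, here is a minimal Haskell sketch (purely illustrative, not code from the repo): the usual fix is to make a hot data type strict and unpacked, so that GHC (with -O) stores raw doubles inline instead of pointers to boxed, possibly lazy values.

  -- Illustrative sketch only: avoiding boxed fields in a hot data type.

  -- Naive definition: each field is a pointer to a heap-boxed Double.
  --   data Vec3 = Vec3 Double Double Double

  -- Tweaked definition: strict, unpacked fields are stored inline as raw
  -- doubles (when compiled with -O), so vector arithmetic allocates nothing.
  data Vec3 = Vec3 {-# UNPACK #-} !Double
                   {-# UNPACK #-} !Double
                   {-# UNPACK #-} !Double

  dot :: Vec3 -> Vec3 -> Double
  dot (Vec3 x1 y1 z1) (Vec3 x2 y2 z2) = x1*x2 + y1*y2 + z1*z2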


There is so much performance left on the table by strongly typed functional languages when it comes to optimization.

Unfortunately, it seems most functional languages are built on infrastructure that was intended for procedural languages (Rust -> LLVM, F# -> .NET, Scala -> JVM, etc.). This means that by the time that infrastructure starts working on the code, it has already been stripped of its most valuable inferential assets: the resolved types, the statically determined immutability, and so on.

Rust is the first language I've seen that is trying to do something about it. I think the idea is that MIR is an intermediate language that still retains type information after lifetimes have been resolved. Using this, they can implement better analysis and pre-optimize code before it hits LLVM's optimizers.


> Rust is the first language I've seen that is trying to do something about it. I think the idea is that MIR is an intermediate language that still retains type information after lifetimes have been resolved. Using this, they can implement better analysis and pre-optimize code before it hits LLVM's optimizers.

MLton also keeps type information around basically until machine code generation (as far as I understand, at least). There have also been several interesting papers/projects about typed assembly languages.


> Unfortunately, it seems most functional languages are built on infrastructure that was intended for procedural languages

But what about Haskell and Scheme? They’re the first two languages that come to mind when I think functional.

I think all those other languages are built on existing infrastructures like .Net or JVM because infrastructure is one of the biggest barriers for any language. I’d love to see a general purpose platform designed for functional languages, but I don’t even know what that would look like.


Well, Haskell has the problem of laziness being the default evaluation model, which doesn't really help.


If you’re not already familiar with it, you may find the GRIN (Graph Reduction Intermediate Notation) compiler project interesting: https://github.com/grin-compiler/grin


The borrow checker nowadays operates on MIR; it is not done before MIR.


Was that how you got non-lexical lifetimes? Does it use dataflow analysis for that?


Yep! MIR is CFG-based rather than AST-based.


The author claims to do research on optimizing compilers, but fails to mention in the readme that running the Scala benchmarks with GraalVM (a different optimizing compiler) eliminated the need for most of the low-level tweaks, including the double-precision tweak that is still mentioned in the readme: https://github.com/athas/raytracers/pull/21


This is an interesting benchmark comparing parallel functional programming languages, including Rust, Parallel OCaml and MPL. MPL is "a parallelism-oriented fork of MLton for Standard ML" which is described as being "definitely the star" (see also http://www.cs.cmu.edu/~swestric/20/popl-disentangled.pdf). This benchmark was inspired by Jon Harrop's ray tracer benchmark (https://www.ffconsultancy.com/languages/ray_tracer/index.htm...).


Having optimised the Multicore OCaml implementation, I can say that this benchmark benefits heavily from inlining and unboxing. MLton-style aggressive inlining and unboxing is the way to go here. MPL builds on MLton, and it works out great!

The Multicore OCaml compiler doesn't yet use the new flambda2 optimisation passes. We saw good improvements with flambda2 on stock OCaml on the tight loops in this program; the compiler was doing all the right things. It did not seem worthwhile to keep hand-optimising the implementation into non-idiomatic FP code. We hope to get back to this once the flambda2 passes are integrated with the Multicore OCaml compiler.


The CPU and the von Neumann architecture itself are procedural. High-level code is mapped to assembly code, which is then mapped to binary machine code.

Given that, higher-level languages are merely another layer of abstraction. C is procedural, and while it can be functionally separated, it still maps down to assembly, and then to machine code.

Same goes for Lisp, or Haskell, or any other functional language. It still follows the chain down, and maps to assembly and machine code.

Given that, what ability does a purely functional language, like Haskell or whatnot, have to utilize the full capabilities of today's multi-core CPUs?

For example, a common functional approach is to use a map() function to apply a specific function to a set of data. Say you have a list of 1000 strings, and you want to capitalize them all and return a new list of the capitalized strings. Then you'd have another function, upper(), that is applied to each element in that list.

So, your function call might look something like:

  capped_list = map( upper, my_list )
And your return data, capped_list, will contain the list of all strings, capitalized. Simple enough.

So the question is: do functional languages automatically have the ability to take this instruction and spread the data over multiple cores, or multiple processors, in order to return the final answer faster or more efficiently than a regular procedural language?


Generally they won't auto-parallelise, because the tradeoff between inter-thread communication cost and performance gain, as well as thread lifetimes, tends to need manual control.

Both functional and imperative languages can auto-vectorise, although in a functional language it's a lot easier to check that the map is side-effect-free! If your "upper" function is allowed to mutate other list elements then suddenly order of execution matters.

C#'s functional-flavoured sublanguage of LINQ has an auto-paralleliser: https://docs.microsoft.com/en-us/dotnet/standard/parallel-pr...

(I absolutely love writing functional-style code in a fairly-strongly-typed imperative language, because you can usually get all the benefits of functional when you need them and not trip over yourself when you need to actually do IO or interface with the system)


> So the question is: do functional languages automatically have the ability to take this instruction and spread the data over multiple cores, or multiple processors, in order to return the final answer faster or more efficiently than a regular procedural language?

Purely functional languages make it trivial to get correct parallelism. In all cases, such a 'map' can be parallelised without risk of race conditions. Impure languages rely either on complex analyses to figure out when parallelism is safe, or on programmer annotations.

Whether it's a good idea to parallelise 'map' automatically is a different question. More parallelism is not always better, as all parallelism comes with overhead. Indeed, the entire point of this ray tracing benchmark is to investigate to what degree the trivial exploitation of parallelism also translates to good performance in practice (as that is ultimately the purpose of parallelism).
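As a hedged Haskell sketch of that point (the capitalize name and the sample data are mine, and this assumes the standard parallel package): the per-element function is pure, so evaluating the map in parallel cannot race; whether it pays off is only a question of work per element versus sparking overhead.

  import Control.Parallel.Strategies (parMap, rdeepseq)
  import Data.Char (toUpper)

  -- Pure per-element work: no side effects, so there is nothing to race on.
  capitalize :: String -> String
  capitalize = map toUpper

  -- A parallel map is a drop-in replacement for the sequential one. Whether it
  -- is worth doing depends on the work per element versus the spark overhead
  -- (build with ghc -threaded and run with +RTS -N).
  cappedList :: [String] -> [String]
  cappedList = parMap rdeepseq capitalize

  main :: IO ()
  main = print (length (cappedList (replicate 1000 "hello, world")))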


> So the question is: do functional languages automatically have the ability to take this instruction and spread the data over multiple cores, or multiple processors, in order to return the final answer faster or more efficiently than a regular procedural language?

Generally not, because you need a decently sized dataset and/or operation duration before parallelization has an advantage.

Capitalizing 1000 strings is not going to be faster in parallel, but 1 billion probably is. So is solving 10 ILP problems.

However, the advantage is that when you want to parallelize, the risk of bugs, race conditions or unintended ordering is much, much smaller in functional programming languages.



> Yes

I think you mean no. It's not automatic; you have to tell the language to use parallel constructs.

And as others have said, this is probably the right approach. At least if it's not taking runtime information into account.

What is very good about this approach (and Rayon does something similar for Rust) is that the code transformation between sequential and parallel constructs is so small. The closer one can come to writing idiomatic code and then tweaking it slightly for high performance, the better.
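To illustrate how small that diff can be, a hedged Haskell sketch (shade is just a stand-in for per-pixel work): the sequential and parallel versions differ only in the strategy wrapper, much like swapping iter() for par_iter() with Rayon.

  import Control.Parallel.Strategies (parListChunk, rdeepseq, withStrategy)

  -- Stand-in for some per-element work, e.g. shading one pixel.
  shade :: Double -> Double
  shade x = sin x * cos x + sqrt (abs x)

  renderSeq :: [Double] -> [Double]
  renderSeq = map shade

  -- Only the strategy wrapper changes; chunking keeps the spark overhead down.
  renderPar :: [Double] -> [Double]
  renderPar = withStrategy (parListChunk 256 rdeepseq) . map shade

  main :: IO ()
  main = print (sum (renderPar [0 .. 1000000]))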

Apologies if this answer isn't hate-y enough. I'll try harder next time.


If you wrap your entire program block in Parallelize, it becomes automatic. Let's not be too pedantic.


It's easy to create parallel algorithms. It's difficult to create an algorithm that scales linearly with core count.


Explicit parallelism can be much easier on abstract streams, and that is better than implicit parallelization. For many applications, running on many cores for maximum throughput matters less than power consumption and CPU/memory usage.


Now, debug the same programs at runtime. :)



