
Parallel ray tracing benchmark for functional programming languages - marcle
https://github.com/athas/raytracers
======
cs702
These are the most interesting bits for me:

> _Rust is the fastest CPU language. This is not terribly surprising, as it
> has a mature compiler, and its default behaviour of unboxing everything is
> exactly what you need for this program.

> What is not visible from the above table is that most of the implementations
> were significantly slower in their original formulation. Only Futhark, MPL,
> and Rust are essentially unchanged from their first straightforward
> implementation. For the others, most of the performance comes down to
> various low-level tweaks, in particular avoiding boxing and allocations.
> This is not exactly unexpected, but I still find it sad that when it comes
> to performance in functional languages, we must think about the compiler
> more than we think about the language._
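
For a concrete picture of "unboxing everything by default", here is a hypothetical Rust sketch (the `Vec3` type is mine, not taken from the benchmark): a `Vec` of a plain struct is one contiguous buffer, with no per-element heap allocation or pointer chasing.

```rust
// Hypothetical Vec3, the kind of type a ray tracer traffics in.
#[derive(Clone, Copy)]
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

fn main() {
    // Three f64 fields stored inline: 24 bytes, no box, no pointer.
    assert_eq!(std::mem::size_of::<Vec3>(), 24);

    // A Vec<Vec3> is a single contiguous allocation of these structs,
    // so iterating over it walks memory linearly.
    let points: Vec<Vec3> = (0..1000)
        .map(|i| Vec3 { x: i as f64, y: 0.0, z: 0.0 })
        .collect();
    let sum: f64 = points.iter().map(|p| p.x).sum();
    println!("{}", sum); // 0 + 1 + ... + 999 = 499500
}
```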

~~~
darksaints
Strongly typed functional languages leave so much performance on the table
when it comes to optimization.

Unfortunately, it seems most functional languages are built on infrastructure
that was intended for procedural languages (Rust->LLVM, F#->.NET, Scala->JVM,
etc.). This means that by the time that infrastructure starts working on the
code, it has already been stripped of its most valuable inferential assets:
the resolved types, the statically determined immutability, and so on.

Rust is the first language I've seen that is trying to do something about it.
I think the idea is that MIR is an intermediate representation that still
retains type information after lifetimes have been resolved. Using this, they
can implement better analyses and pre-optimize code before it hits LLVM's
optimizers.

~~~
azhenley
> _Unfortunately, it seems most functional languages are built on
> infrastructure that was intended for procedural languages_

But what about Haskell and Scheme? They’re the first two languages that come
to mind when I think functional.

I think all those other languages are built on existing infrastructures like
.Net or JVM because infrastructure is one of the biggest barriers for any
language. I’d love to see a general purpose platform designed for functional
languages, but I don’t even know what that would look like.

~~~
dsstuifg
Well, Haskell has the problem that laziness is the default evaluation model,
which doesn't really help.

------
marcle
This is an interesting benchmark comparing parallel functional programming
languages, including Rust, Parallel OCaml and MPL. MPL is "a parallelism-
oriented fork of MLton for Standard ML" which is described as being
"definitely the star" (see also
[http://www.cs.cmu.edu/~swestric/20/popl-disentangled.pdf](http://www.cs.cmu.edu/~swestric/20/popl-disentangled.pdf)).
This benchmark was inspired by Jon Harrop's ray tracer benchmark
([https://www.ffconsultancy.com/languages/ray_tracer/index.html](https://www.ffconsultancy.com/languages/ray_tracer/index.html)).

~~~
kcsrk
Having optimised the Multicore OCaml implementation, I can say that this
benchmark benefits heavily from inlining and unboxing. MLton-style aggressive
inlining and unboxing is the way to go here. MPL builds on MLton, and it works
out great!

The Multicore OCaml compiler doesn't yet use the new flambda2 optimisation
passes. We saw good improvements with flambda2 on stock OCaml on the tight
loops in this program; the compiler was doing all the right things. It did not
seem worthwhile to keep hand-optimising the implementation into non-idiomatic
FP code. We hope to get back to this once the flambda2 passes are integrated
into the Multicore OCaml compiler.

------
blackrock
The CPU and the von Neumann architecture itself are procedural. High-level
code is mapped to assembly code, which is then mapped to binary machine code.

Given that, higher-level languages are merely another layer of abstraction. C
is procedural, and while it can be functionally separated, it still maps down
to assembly, and then to machine code.

Same goes for Lisp, or Haskell, or any other functional language. It still
follows the chain down, and maps to assembly and machine code.

Given that, how well can a purely functional language, like Haskell or
whatnot, utilize the full features of today's multi-core CPUs?

For example, a common functional approach is to use a map() function to apply
a specific function to a set of data. Say you have a list of 1000 strings, and
you want to capitalize them all and return the new list of capitalized
strings. Then you'd have another function called upper() that will be applied
to each element in that list.

So, your function call might look something like:

    capped_list = map( upper, my_list )

And your return data, capped_list, will contain the list of all strings,
capitalized. Simple enough.

So, the question is: do functional languages automatically have the ability to
take this instruction and spread the data over multiple cores, or multiple
processors, in order to return the final answer faster or more efficiently
than a regular procedural language?
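
For reference, the same example written in (hypothetical) Rust; by default this map is purely sequential and runs on one core:

```rust
fn main() {
    let my_list = vec!["hello".to_string(), "world".to_string()];
    // map applies the upper-casing function to each element in turn;
    // nothing here runs in parallel unless you explicitly ask for it.
    let capped_list: Vec<String> =
        my_list.iter().map(|s| s.to_uppercase()).collect();
    println!("{:?}", capped_list); // ["HELLO", "WORLD"]
}
```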

~~~
goldenkey
[https://reference.wolfram.com/language/ref/ParallelMap.html](https://reference.wolfram.com/language/ref/ParallelMap.html)

[https://reference.wolfram.com/language/ref/Parallelize.html](https://reference.wolfram.com/language/ref/Parallelize.html)

Yes. _waits for the Wolfram hate_

~~~
lucozade
> Yes

I think you mean no. It's not automatic, you have to tell the language to use
parallel constructs.

And as others have said, this is probably the right approach. At least if it's
not taking runtime information into account.

What is very good about this approach (and Rayon does something similar for
Rust) is that the code transformation between the sequential and parallel
constructs is so small. The closer one can come to writing idiomatic code and
then tweaking it slightly for high performance, the better.

Apologies if this answer isn't hate-y enough. I'll try harder next time.

~~~
goldenkey
If you wrap your entire program block in Parallelize, it becomes automatic.
Let's not be too pedantic.

------
reikonomusha
Now, debug the same programs at runtime. :)

