
C++ vs. OCaml: Ray tracer comparison (2005) - lnmx
http://www.ffconsultancy.com/languages/ray_tracer/comparison.html
======
vvanders
Kinda missing the point of C++ by using a linked-list.

If you're going to spend the time to implement an algorithm in C++ at least
take the time to make it cache-aware. Otherwise it's not really surprising
that the performance is the same when you're doing the exact same thing.

Would be curious to see what the performance looks like with a tightly packed
scene and static dispatch on a discrete number of primitives instead of using
virtual functions.

~~~
mden
I was curious whether using a vector instead of a list would have any effect
even though the size is so small, and the answer seems to be no (as could be
expected, I guess).

It seems like the listed OCaml program has a default level of 9, whereas the
C++ one uses 6?

Times using original listing (minus print stmts):

    
    
      $ time ./ray-cpp  
      real    0m3.369s  
      user    0m3.347s  
      sys     0m0.010s
    
      $ time ./ray-ml   
      real    0m4.710s  
      user    0m4.703s  
      sys     0m0.003s
    

Times after changing the 9 in the OCaml program to a 6:

    
    
      $ time ./ray-ml  
      real    0m4.032s  
      user    0m4.026s  
      sys     0m0.003s
    

Seems like C++ has a good edge. The above was run on a 64-bit machine.

~~~
vvanders
The list is just one part of what makes the C++ impl sub-optimal. All the
scene objects are allocated on the heap, so you're actually bouncing around
memory twice (once for the next linked-list entry, and again for the actual
heap allocation).

Lastly, the virtual call again has a chance to cache miss (although probably
only once, seeing as we've just got a few impls).

In a small example like this there's a good chance that most of the heap
allocations are close together but put this approach into production and you
can very easily see fragmentation start driving up your cache misses.

(FWIW I'd also store radius*radius on creation, use a BSP/kd-tree and all
sorts of other tricks that really help in these types of computations).

~~~
altern
I did a quick and dirty test with levels = 14.

    
    
        # program          time   change        
        ./ray-ml:          21.46  # Original ocaml
        ./ray-cpp:         17.63  # Original c++
        ./ray-cpp-no-dtor: 13.86  # Removed all cleanup, ocaml doesn't do it
        ./ray-cpp-2:        8.90  # Changed list to vector
        ./ray-cpp-3:        7.89  # Removed virtual functions
        ./ray-cpp-4:        5.64  # Preallocated large array for all Groups

~~~
vvanders
Solid stuff, glad to see someone testing my assumptions :).

If there's one thing I could teach new CS grads, it's how much your memory
access patterns (and cache) really-effing-matter. Everything else is almost
noise in comparison.

~~~
sanderjd
I'm pretty curious about this: have memory access patterns and cache awareness
become more of a big deal in the past decade or so, or has it only recently
reached my awareness? I studied CS in the early 2000s and while I was aware of
various trade-offs between contiguous and linked data structures, it's only in
the last few years that the importance of cache (and non-"system" languages'
widespread flouting of its importance) has really bubbled up in my
consciousness.

It's certainly likely that I just didn't take the right kinds of classes, or
that I did but just missed or failed to really internalize this, but I'm
curious if something else is going on. Has the bottleneck increasingly become
memory rather than CPU? Have we more thoroughly exhausted other areas of
optimization?

I was thinking about this while watching a talk[0] given by someone from
Google about data structures in the STL, which decried that the standardized
specification for `std::unordered_map` more or less forces an implementation
to use buckets instead of the more cache-friendly technique of probing. I'm
sure that the people who standardized it were very competent, which makes me
think that, at least at the time, buckets were thought to be the better
implementation.

[0]:
[https://www.youtube.com/watch?v=fHNmRkzxHWs](https://www.youtube.com/watch?v=fHNmRkzxHWs)

Edit: Added video link.

~~~
blt
> _Has the bottleneck increasingly become memory rather than CPU?_

Yes.

Besides the basic memory latency vs. CPU clock speed issue, there's also:

- CPU caches have gotten bigger, so the payoff of making a program cache-
friendly is larger

- CPUs have gained SIMD vector ALUs and increased superscalar behavior, so
the CPU is capable of processing more data per clock cycle

- A cultural shift happened after the freewheeling Moore's Law era ended in
the mid-2000s, and people started examining performance more closely

[http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_speed.pdf](http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_speed.pdf)

~~~
sanderjd
Thanks, this is what I had suspected, but it seems to be left unstated in
everything I've read.

------
kentonv
> Type inference removes the need to specify types explicitly over and over.

These days, with C++11's auto, this is improved a lot. It's not full type
inference, but it eliminates most of the boilerplate, and arguably what's left
you really want to keep for documentation purposes.

> The object-oriented representation of a trivial sum type as an inheritance
> hierarchy of classes is needlessly inefficient.

This use case strikes me as a particularly bad example of sum types: The
author wants to represent different kinds of scene members, like spheres and
groups, as a sum type. Using a sum type means that anyone wanting to add a new
kind of object has to add it to the sum type and handle it everywhere, which
will quickly become problematic. Using object-oriented design allows new
object types to be introduced without updating everything.

Basically the sum type looks nicer here only because it's a toy example and
we're not considering how it will evolve.

That said, I do think sum types are really useful in some use cases,
especially in compilers, and I'm sad C++ doesn't have them. I tend to end up
defining ASTs using Cap'n Proto as a convenient way to declare a tagged union,
even though I have no intent to serialize the structures.

> No need to write constructors for tuples, records and variant types in
> OCaml.

To be fair, for an all-public struct (as in the example), you don't need
constructors in C++ either.

Once you have private members, then of course you need a constructor.

> Native lists in OCaml.

C++11 initializer lists make it possible to write list/vector/array classes
that basically "feel native".

> Higher-order functions instead of loops in OCaml.

C++11 lambdas, etc. However, in a lot of cases, loops are easier to read.

~~~
vvanders
I've been really happy with C++ lambdas lately.

Would be nice to see Rust's Option<T> and Result<T,E> ported over to C++ since
I find that error handling extremely elegant compared to most C++ approaches.

~~~
loop2
Option has been around for a while as boost::optional or now
std::experimental::optional. Result is coming and has a reference
implementation as boost::expected. There's also a type-erased option type,
"any".

They're quite nice, especially with a few trivial additions for ergonomics.

~~~
vvanders
Yup, except you lost me as soon as you said "boost". I enjoy my fast compile
speeds very much. :)

------
chollida1
I work on a project that's a mix of F# and C++ and my code tends to break down
very similarly to this.

The actual algorithmic lines are very similar in size and readability but when
it comes to defining your data structures, F#/OCaml clearly wins.

The author of the website(Jon Harrop) is a very good writer and though his
books are expensive I'd recommend them to anyone learning OCaml or F#.

Specifically F#/OCaml for scientists is awesome( it comes in variants for both
languages) as is F# for Numerics.

~~~
duaneb
Yeah, C/C++ tends to have a memory-oriented way of defining data structures.
Sometimes this is great, but most of the time it requires a lot of mental
translation and work.

------
thelucky41
The benefit of this comparison is not that the languages have nearly identical
performance, but that OCaml can be briefer while accomplishing the same task,
with 47% fewer lines of code.

Setting aside the issue of comparing LOC in an imperative/OO language to a
functional one, does the brevity actually help a reader's understanding of the
code at all? It seems to me that a lot of the comparisons call out
descriptions of explicit actions in C++ where OCaml does the same action
implicitly.

That seems like a language trade-off more than a feature.

~~~
progman
Most people just consider raw execution time when measuring performance. For
long-running applications that really matters.

For typical applications, however, most people don't realize that the
_development_ time should also be considered. In that sense a program written
in a few lines of Python could actually outperform a super-optimized C++
program, because it usually requires more time and effort to code the same
thing in C++.

------
vardump
From article:

> With two cross-overs, there is little difference between the performance of
> the OCaml and C++ implementations. Note that the C++ was compiled with CPU-
> specific optimizations whereas the OCaml binary will run on any 486
> compatible.

OK, so the author is suggesting that compiler-generated SSE2+ numerical C++
code is comparable to an OCaml binary targeting the 486, with only the very
slow stack-based x87 FPU available? SSE beats x87 hands down even without
vectorization enabled, using a single float/double (scalar) per XMM vector
register! That should make a _significant_ difference in any ray tracer worth
2 cents.

I didn't bother reading the source code, but I think it's very likely he
didn't write C++ code like you should for a performance sensitive numerical
application.

Although maybe the code for scene graph, bounding box and hit test was bad?
FPU performance starts to matter once you actually find the ray intersecting
object...

Anyways, usually if you do use C++, you do so because of specific performance
requirements. Not to have the shortest or cleanest possible implementation.

~~~
xorblurb
The author doesn't seem to know enough about what he is talking about. For
example, he claims that the performance of passing a struct Vec { double x,
y, z; } by value would be very poor, which is trivially not true, and he
needlessly wrote constructors for this and similar POD structs, even
explicitly claiming they are needed in C++ (and not in OCaml, implying the
superiority of the latter on the conciseness criterion).
~~~
agumonkey
The compiler would have generated the same constructor code anyway?

------
36erhefg
You can omit the OO in C++ and achieve a similar code size. If that is a
reasonable metric in the first place, is a question the article doesn't
address.

------
evanpw
I'm surprised the performance of the two implementations is so close. Aren't
floating-point numbers in OCaml usually boxed? Is the compiler smart enough to
optimize away the boxes in this situation?

~~~
rwmj
The floats in the structs are unboxed. Float arrays and all-float structs have
a special representation[1] in the compiler so they are stored unboxed. I
believe the developers of OCaml have expressed some regrets about this, but in
this particular case it's likely to be a win.

Within functions, the OCaml compiler can assign floats to registers, so those
would also be unboxed at least for the duration of the function.

[1] [https://realworldocaml.org/v1/en/html/memory-representation-of-values.html](https://realworldocaml.org/v1/en/html/memory-representation-of-values.html)

------
jguegant
One can do a ray tracer in O(1) at runtime in C++ ;)

[https://github.com/phresnel/metatrace](https://github.com/phresnel/metatrace)

Performance 100%!

OCaml is a really clean language though!

------
joe_the_user
_Amazingly, there is no easy way to define a type in C++ which can be "one of
these or one of these", e.g. a sphere or a group in this case._

The author may have assumptions I'm missing here, but the standard way to have
a "one of these or one of these" thing is to have two classes inherit from a
given abstract base class and use a pointer to the base class.

If one wants the device specifically as a type, one could wrap the base
pointer in a class.

Is there something I'm missing?

~~~
the_af
> _The author may have assumptions I'm missing here but the standard way to
> have a "one of these or one of these" thing is to have two classes inherit
> from a given abstract base class and use a pointer to the base class._

I think this is what the author describes in the sentence that follows the one
you quoted. He also calls it "a common (ab)use of object oriented
programming."

~~~
AnimalMuppet
So it's reasonable to define data types in this way in OCaml, but it's abusing
OO to do so in C++? Given that classes are data types, that seems... less than
consistent.

------
mempko
I would love to see this updated using C++14.

~~~
lholden
Totally. I've been mulling over the idea of porting it. Would be interesting
to see how the performance compares vs the older style.

Between C++14 and some template usage, I'd imagine the LOC could be reduced a
lot.

------
tacos
And now's a perfect time to reflect on how the two languages evolved during
the past decade. C++ obviously moved forward quite a bit -- cleaning up the
loop constructs in particular.

OCaml? Uh... there's F# for realists, Scheme for optimists, and Haskell for
pedants. If you're actually writing OCaml web apps in OWebl I'd sure love to
know why. Amazon released a native C++ SDK for AWS if you're into pain. If
you're using "OPAM" to install the OCaml AWS client... wow.

~~~
rwmj
Let's see: opam, as you mention. Loads of new libraries [0]. Lots of small
evolutions in the language in OCaml 4 [1] [2]. Lots of work on optimization in
ocamlopt. New AArch64, POWER7 & POWER8, and replacement ARMv7 backends.

OCaml is totally a practical language for writing real world applications. As
I usually point out in these threads, Red Hat pay me to write OCaml
programs[3], that are used by thousands of customers and many more free
software users worldwide. The advantages (over other languages, not C++) are:
easy linking to C libraries, and builds a native binary with no extra
dependencies, speed. Advantages over C++: safety, robustness, compact source
code size.

[0] [https://opam.ocaml.org/packages/](https://opam.ocaml.org/packages/)

[1] [https://ocaml.org/meetings/ocaml/2014/OCaml2014-Leroy-slides.pdf](https://ocaml.org/meetings/ocaml/2014/OCaml2014-Leroy-slides.pdf)

[2]
[https://ocaml.org/meetings/ocaml/2013/slides/leroy.pdf](https://ocaml.org/meetings/ocaml/2013/slides/leroy.pdf)

[3]
[https://github.com/libguestfs/libguestfs](https://github.com/libguestfs/libguestfs)

~~~
tacos
Looks like solid work all around. But if we could weigh it relative to the
advancements in C++ the last decade, it wouldn't even show up on the scale.

Red Hat could pay someone to do exactly what you're doing in many other
languages with the exact same results. The only difference would be that the
language in question would show up on the StackOverflow list of "most loved"
or "most wanted" languages. Rust shows up in 50th place with less than 0.2%
share and OCaml doesn't even make that list (TIOBE 9/2015).

Sounds argumentative, but it's a genuine question: why have you chosen OCaml
for this? What is so remarkable about it for this use case that 50+ more
popular languages wouldn't do?

Obviously there are downsides, so what is the upside?

~~~
rwmj
Did you read my reply? To quote: _"easy linking to C libraries, and builds a
native binary with no extra dependencies, speed ... safety, robustness,
compact source code size"_. There are certainly many more popular languages,
but few with that combination of advantages. The project started in 2009, so
you should discount any languages which were not mature before that date.

There are a few people at Red Hat and outside who contribute. To my knowledge
none had prior experience with OCaml, but it's pretty easy to pick up.

~~~
zem
Even in 2015, I can only think of D, Go, and Haskell that would work as viable
alternatives.

