

Go vs C++: Ray tracer (part 3) - kid0m4n
https://kidoman.com/programming/go-getter-part-3.html

======
eliasmacpherson
[https://kidoman.com/images/go-vs-cpp-after-both-
optimized.pn...](https://kidoman.com/images/go-vs-cpp-after-both-
optimized.png)

In this picture - am I right in interpreting it at 2048x2048 and 8 cores, that
the optimised and tuned go code is nearly three times faster than the
multithreaded optimised and tuned C++ code? How come the C++ is the same for
1C and 8C? Is this picture from one of the previous articles?

EDIT: (No, I'm wrong, the C++ is single threaded!)

It seems it's single threaded, and the last graph isn't showing the current
level of C++ single-threaded performance, which is ~36 seconds, with Go at
21s for 8 threads. The final state of play is Go at 18s and C++ at 8s
multithreaded, and Go at 81s and C++ at 36s single threaded. I read that
from the third-last graph.

It's difficult to understand this ending of the article, as the subtitle is:

"Further optimizations and a multi-threaded C++ version" and "Hurray multi-
threading"

I suggest the order of the article be changed around to have a 'recap on
single threaded C++' at the start, and then the new figures, so that it
concludes in a straightforward manner.

This quote from the article is wrong:

"C++ is not more than twice as fast than an equivalent Go program at this
stage."

Correct me if I'm wrong, but in every case that I can see the C++ code will
execute twice before the Go code is finished. It is more than twice as fast.

By the same logic this "almost" is also wrong: "From taking 58.15 seconds
(single threaded), it has now dropped to a extremely impressive 36.36 seconds
(again single threaded), making it almost twice as fast as the optimized Go
version."

36s < 81s/2
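
The arithmetic behind that comparison can be spelled out in a few lines. This is only a sketch using the timings quoted in this comment (read off the article's graphs, so approximate); the `speedup` helper is a hypothetical name introduced here for illustration.

```go
package main

import "fmt"

// speedup reports how many times faster the quicker run is, given the
// slower and the faster wall-clock times in seconds. Hypothetical helper.
func speedup(slowSeconds, fastSeconds float64) float64 {
	return slowSeconds / fastSeconds
}

func main() {
	// Timings as quoted in the comment above, not re-measured.
	fmt.Printf("single-threaded: Go 81s vs C++ 36s -> %.2fx\n", speedup(81, 36))
	fmt.Printf("multi-threaded:  Go 18s vs C++ 8s  -> %.2fx\n", speedup(18, 8))
}
```

Both ratios come out at 2.25, which is why "more than twice as fast" fits these figures better than "almost twice as fast".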

~~~
vanderZwan
"C++ is not more than twice as fast than an equivalent Go program at this
stage."

I believe that is a simple typo, where 'not' was meant to be a 'now'.

~~~
eliasmacpherson
Initially I thought that, but I can't explain the 'almost' that follows?

Also, it would be idiomatic to say "now more than twice as fast as" instead of
"now more than twice as fast than".

~~~
kid0m4n
The god of all typos has been fixed :)

~~~
eliasmacpherson
Well done, you should probably change the "almost" I pointed out to "more
than" also.

------
616c
Someone posted something on Nimrod a while back that was closely related
[0], and I think it is interesting that it keeps getting passed over. I am not
a language expert, but I have become very curious about the many alternatives
in the systems programming niche, and others. Nimrod has been in development
since at least 2008 (with the 0.6 branch in 2008)[1], and the Go language made
its public debut in 2009 (I do not know how stable it was at the time, and I
gave up after going through the first pages of the whole commit history to get
a better answer).[2] At the very least, they were developed in the same time
frame, are nominally similar languages, and address a lot of the same use
cases as Rust, I suppose. Yet, unfortunately, few benchmarks include them.

I guess they will always be writing an indie language, but I wish more would
check it out. As I continue to learn, maybe I can contribute code to his
project. We will see if I ever get that far.

[0] [http://nimrod-code.org/](http://nimrod-code.org/)

[1] [http://nimrod-code.org/news.html](http://nimrod-code.org/news.html)

[2]
[http://en.wikipedia.org/wiki/Go_programming_language](http://en.wikipedia.org/wiki/Go_programming_language)

~~~
gillianseed
>and I think it is interesting it keeps getting passed over.

If you think it's being passed over then you need to try and generate interest
in the language, like posting articles about it you find, or if you find none,
write one.

Same goes for benchmarking, it's unlikely that people who aren't interested in
the language will write benchmark-versions for that language so it's up to
those who are interested in it to provide them.

Which is what someone did in the Rogue Level Generation Benchmark where I
recall Nimrod performed very well.

[http://togototo.wordpress.com/2013/08/23/benchmarks-round-
tw...](http://togototo.wordpress.com/2013/08/23/benchmarks-round-two-parallel-
go-rust-d-scala-and-nimrod/)

From my own very cursory glance I'm not quite sure why you would compare
Nimrod directly to Go, I think it's more aptly compared to something like
Rust, with both having optional garbage collector, generics, macros etc.

~~~
dom96
> If you think it's being passed over then you need to try and generate
> interest in the language

I think that 616c is doing exactly this with his comment. Not everyone can
write effective articles, and writing articles also takes a lot of time and
effort. Writing comments advertising the language is usually the next best
thing.

I wrote a couple of Nimrod benchmarks including the one you linked to. But it
again takes time, and I would prefer to improve the language and its tools
than to be writing tons of benchmarks.

I really wish more people gave Nimrod a serious chance because it is truly a
brilliant programming language and it really deserves to get more exposure.

Nimrod's target user base very much clashes with Go's, I hear about Python
programmers switching to Go because it reminds them of Python so much. Well, I
don't see it. But Nimrod is definitely a fast and compiled Python with lots of
extras.

~~~
616c
Thank you dom96. I see you often in Nimrod forum posts and with cool code on
Github. I look forward to brushing up on documentation and eventually starting
to code in Nimrod. You and Araq have some inspired/inspiring conversations in
language design on that forum. Keep it up, it is a very fruitful read.

~~~
dom96
That's nice to hear. Join the IRC channel or read the IRC logs
([http://build.nimrod-code.org/irclogs/](http://build.nimrod-
code.org/irclogs/)) if you wanna read more discussions or take part in them in
real time :)

------
buster
This has been circulated on the Rust-dev mailing list:

[https://mail.mozilla.org/pipermail/rust-
dev/2013-September/0...](https://mail.mozilla.org/pipermail/rust-
dev/2013-September/005735.html)

Rust did quite well (given that it's not even production ready, I'd say it's
impressive): [https://mail.mozilla.org/pipermail/rust-
dev/2013-September/0...](https://mail.mozilla.org/pipermail/rust-
dev/2013-September/005750.html)

The only thing they noted was that the Go version "cheated" by precomputing
some values, which someone removed to make the algorithms the same. Maybe
kid0m4n can comment on this :)

~~~
kid0m4n
I was looking at the Rust-dev mailing list before going to sleep :)

I would argue that it is not cheating "anymore" as the Go and C++ versions
are now equal. In fact, I wanna compare how Rust performs in this exact same
test with all optimizations applied. Studying those optimizations will be fun
in itself.

I have also explained why optimizations are perfectly fine (IMHO) here:

[https://github.com/kid0m4n/rays#why-optimize-the-base-
algori...](https://github.com/kid0m4n/rays#why-optimize-the-base-algorithm)

~~~
buster
Ok, but with different algorithms you are not testing the languages/compilers
anymore but the algorithms ;)

I understand that now that the C++ and Go versions have changed, we can't
compare them with the Rust version anymore unless someone applies the same
changes. Anyway, it was a very interesting read on the Rust ML, and frankly I
was surprised to see Rust fare so well.

~~~
kid0m4n
True...

But, I guess algorithmic advances will happen much less frequently compared to
other micro optimizations. And it will keep things interesting across the
board.

------
copx
Looking at the GitHub repo, it seems you benchmark GCC 4.8.1 vs. Go 1.2rc1.
Numbers for Go look promising if one considers that Google's Go implementation
does not even have an advanced optimizer yet (in contrast to GCC).

>c++ -std=c++11 -O3 -Wall -pthread -ffast-math -mtune=native -march=native -o cpprays cpprays/main.cpp

Have you tried -O2? -O3 often generates slower code.

>i7 2600

Intel's compiler would probably generate faster code. That's why you can't
just say "Go vs C++". You could let Go win this fight by compiling the C++
with Digital Mars. It is also a C++ compiler but it lacks a modern optimizer
and the generated code is usually much slower.

~~~
bluecalm
A few remarks:

- -mtune=native is redundant when -march=native is turned on;

- use -Ofast instead of -O3 -ffast-math; it turns on some more options as well (although in theory it might behave in a non-standard way with float computations; you need to test this, but it was never a problem for me);

- add -flto; it often helps significantly;

- it probably won't matter for a simple program, but you may try compiling with: -Ofast -march=native -flto -fprofile-generate; then run the program (it will generate .gcda files) and then recompile with the same options and -fprofile-use;

EDIT: Some quick tests show that all the options I mentioned help (MinGW, gcc
4.8.1). Original time on one thread was 3.670s, -Ofast and -flto take it to
~3.630s and adding PGO moves it to 3.46s (there is some variance with all of
those): a whopping 5-6% improvement overall :)

~~~
twoodfin
How would -flto help when all the code is in one module?

~~~
bluecalm
All I know about gcc flags is from testing a lot of combinations of them. I
don't really know how -flto works, but maybe it does something for functions
from included files?

------
nemothekid

        go build -gcflags -m
    

After some quick googling I can't find what the "m" flag does. Can anyone shed
some light?

~~~
kid0m4n
Author of the article here

It helps in finding out details of what the compiler (Xg) thinks of various
funcs' inlining applicability (is that even a word?)
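
As a rough illustration of the kind of code this flag reports on, here is a minimal sketch. The `vec` type and its methods are hypothetical stand-ins, not the article's actual code. Small, loop-free functions like these are typical inlining candidates, and building the file with the command quoted above (`go build -gcflags -m`) prints the compiler's optimization decisions for them, such as inlining and escape analysis. The exact wording of that output depends on the compiler version, so it is not reproduced here.

```go
package main

import "fmt"

// vec is a hypothetical 3-component vector, similar in spirit to what a
// small ray tracer would use. It is an assumption made for this example.
type vec struct{ x, y, z float64 }

// add is tiny and has no loops, which makes it a plausible inlining candidate.
func (a vec) add(b vec) vec {
	return vec{a.x + b.x, a.y + b.y, a.z + b.z}
}

// dot is another short leaf function of the kind called in hot loops.
func (a vec) dot(b vec) float64 {
	return a.x*b.x + a.y*b.y + a.z*b.z
}

func main() {
	v := vec{1, 2, 3}.add(vec{4, 5, 6})
	fmt.Println(v, v.dot(vec{1, 1, 1}))
}
```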

~~~
gillianseed
Speaking of gcflags, did you try Go with gcflags=-B to see how it performs
without bounds checking?

I know this isn't 'the right way' given that Go is supposed to be a safe
language but it would be interesting to see how much difference it would make.
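
For readers unfamiliar with what is being switched off: every slice index in Go is checked against the slice length at run time (unless the compiler can prove the check is unnecessary), and a failed check panics instead of reading out-of-range memory. The sketch below only demonstrates that default behaviour; the `at` helper is a hypothetical name, and nothing here measures the cost of the checks.

```go
package main

import "fmt"

// at returns xs[i]. An out-of-range i trips Go's runtime bounds check,
// which panics; the deferred recover turns that panic into ok == false.
// Hypothetical helper for illustration only.
func at(xs []float64, i int) (v float64, ok bool) {
	defer func() {
		if recover() != nil {
			v, ok = 0, false
		}
	}()
	return xs[i], true
}

func main() {
	xs := []float64{1, 2, 3}
	fmt.Println(at(xs, 1))  // index in range
	fmt.Println(at(xs, 10)) // index out of range, caught by the bounds check
}
```

With the checks compiled out, an access like the second one would no longer be caught, which is the safety trade-off being discussed.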

~~~
kid0m4n
Have not tried that... will give it a shot

But I guess we are then going away from idiomatic Go

~~~
gillianseed
I guess so; it's an unsupported compiler option (as in it can disappear in any
new release).

I looked at it more as an option for when you want to squeeze out possible
extra performance from release builds.

------
shin_lao
I wonder how the C++ version would do with TBB. Also, I think it would be more
interesting to compare more memory-intensive programs; I think C++ would shine
even more with all the optimization opportunities it would have there.

~~~
shin_lao
And what about adding the following compile options:

-m64 -msse3 -mfpmath=sse?

~~~
qb45
-m64 and -mfpmath=sse are defaults on x86-64, -msse3 is enabled implicitly by -march=native if the machine supports SSE3.

------
halayli
    vector S(vector o,vector d, unsigned int& seed) {}

    int T(vector o,vector d,float& t,vector& n) {}

not sure why he's copying the vectors here.

~~~
hamidr
Also, he's mostly passing by value.

~~~
bstamour
Passing by value in C++ can offer speed increases when the function will be
copying the object anyway, due to move semantics, copy elision, etc.

