
I don't care about micro benchmarks ... but if you're gonna do one, make sure your code is decent.

By replacing the Go slice declarations (var x []int64) with (x := make([]int64, 0, 1000000)) you'll get a program that runs much faster on my iMac:

- your version: 4.93s user 0.28s system 165% cpu 3.151 total

- my version: 1.93s user 0.04s system 168% cpu 1.165 total

EDIT: I'm sure this happens with all the examples, and that's exactly my point. What exactly is this microbenchmarking? Bad programs?
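For reference, a minimal Go sketch of the change (the benchmark body isn't quoted in the thread, so the loop around the appends is illustrative rather than TFA's exact code):

    package main

    import "fmt"

    func main() {
        // original style: var x []int64 starts at capacity 0, so append has to
        // reallocate and copy the backing array repeatedly as the slice grows
        var x []int64
        // suggested style: allocate the backing array once up front
        y := make([]int64, 0, 1000000)
        for i := int64(0); i < 1000000; i++ {
            x = append(x, i)
            y = append(y, i)
        }
        fmt.Println(len(x), cap(x), len(y), cap(y))
    }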




(Re)allocating arrays/vectors seems to be pretty much the whole point; otherwise both Swift and C++ could not only preallocate (reserve) but also use iterators and elide all allocation, and then you'd get something along the lines of this:

    > rustc -O test.rs -o test_rs
    > time ./test_rs
    9999010000
            0.54 real         0.53 user         0.00 sys
here's TFA's C++ version by comparison:

    > time ./test_cpp
    9999010000
            3.17 real         2.29 user         0.86 sys


I totally agree with you on this. The code for all three languages is clearly not optimal.

I wonder how much of this performance difference is C++ being very good at optimizing your code rather than the runtime being more performant. I don't have any numbers, but I feel like that's not a negligible factor.


> I totally agree with you on this. The code for all three languages is clearly not optimal.

That's not really my point. My point is that we have no idea what TFA is trying to do or see[0], so you may be able to say "this is stupid and nonsensical" because it is[1], but you can't say "I've got a better version" and then provide something with a completely different behaviour.

[0] most likely they just threw some shit at the wall, ended up with relatively long runtimes and called that a benchmark

[1] or maybe not, maybe it's a reduction of actual workload in the system doing computation based on some sort of dynamic number of items so you can't preallocate the intermediate storage and the point's to compare reallocation overhead[2]

[2] not very likely though, considering you never need the array in the first place as you're just summing each item with the previous one, then summing every 100th item of that… anyway
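For what it's worth, here's a hedged Go sketch of that array-free version, assuming the two inputs are i and i+1 as in the Rust snippet quoted further down; it just keeps a running sum and never builds a slice:

    package main

    import "fmt"

    func main() {
        // sum every 100th "item", where item i would have been i + (i+1);
        // no intermediate storage is needed at all
        var sum int64
        for i := int64(0); i < 1000000; i += 100 {
            sum += 2*i + 1
        }
        fmt.Println(sum) // 9999010000
    }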


>, but you can say ""

I'm guessing you meant to type, "but you can't say"


Correct guess. Fixed.


Could you link to the code for `test.rs`?


    fn main() {
        let mut sum = 0i64;
        for _ in 0..200 {
            let items = (0..1000000).zip(1..1000000).map(|(a, b)| a + b);
            sum = items.enumerate().filter(|&(i, _)| i % 100 == 0).fold(0, |acc, (_, v)| acc + v);
        }
        println!("{}", sum);
    }
(note that the addition part does some extra work since it iterates on the whole source instead of skipping ahead)


If the C++ version didn't use vector.reserve(1000000), what justifies the Golang version using make([]int64, 0, 1000000)?


The point is to sell Swift. In the C++ example, he didn't cast each integer to int64, but he manufactured a reason to do it in Go and Swift. Perhaps this means that Swift's compiler optimizes that particular case almost as well as Go's. It doesn't matter much because it's a single-threaded, useless benchmark. It's the kind of thing that can change with a new Go or Swift release. As you know better than I, Go slowed down a small amount when the recent GC optimizations were done. Is that a reason to use Go less? No. The GC is quite amazing (and there is HTTP2 support).

In general, why do I care if there are slight variations in the speed of a for loop? The bottleneck in any app is the database or network access. Maintenance and readability matter to me a lot more than minuscule variations in speed.

As someone who programs in Go and Swift, I can tell you that Swift has some things Go doesn't. For one, it has generics. Even still, I would likely never use it over Go on any code that runs on a server. This is due to readability and the Go standard library. I would still choose Swift over Go for iOS development. This is due to how the iOS stack is premised on object inheritance.


> The point is to sell Swift. In the C++ example, he didn't cast each integer to int64, but he manufactured a reason to do it in Go and Swift.

Wow, the crazy is strong with this one. The reason for the cast is that C++ follows C in implicitly converting between integer sizes, but neither Swift nor Go does; remove the casts and the code doesn't compile:

    > swiftc -O test.swift -o test_swift
    test.swift:8:18: error: cannot convert value of type 'Int' to expected argument type 'Int64'
            x.append(i);

    > go build test.go
    # command-line-arguments
    ./test.go:12: cannot use i (type int) as type int64 in append
And while the correct fix was probably to type the loop variable as an int64 (which incidentally is ugly and non-obvious in Go: you have to convert one of the for loop's bounds to an int64, whereas Swift will just let you explicitly type the loop variable as Int64), it also changes jack shit about the runtime:

    > time ./test-go-type
    9999010000
    ./test-go-type  10.88s user 0.86s system 153% cpu 7.660 total

    > time ./test-go-cast
    9999010000
    ./test-go-cast  11.04s user 0.87s system 154% cpu 7.699 total
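For concreteness, a sketch of what the two Go variants above boil down to (the benchmark body isn't quoted, so the appends here are illustrative):

    package main

    import "fmt"

    func main() {
        var a, b []int64

        // test-go-cast style: loop variable is a plain int, converted at every append
        for i := 0; i < 1000000; i++ {
            a = append(a, int64(i))
        }

        // test-go-type style: type the loop variable by converting one of the
        // loop's bounds to int64 (the slightly awkward part mentioned above)
        for i := int64(0); i < 1000000; i++ {
            b = append(b, i)
        }

        fmt.Println(len(a), len(b))
    }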


Ignoring the inappropriate ad hominem...

Why not type the loop variable as int64 in Go? Why not just use an int, which would default to int64 on a 64-bit system? Why not just use sum := 0?

Generally benchmarks don't say much unless one of the languages performs extremely poorly. This benchmark doesn't say anything of value.


> Ignoring the inappropriate ad hominem...

Seems appropriate to me, you're trying to spin a shit "benchmark" into some sort of conspiracy theory against your pet language and for your arch-nemesis or something.

> Why not type the loop variable as int64 in go?

Have you actually read my comment? 1. why not indeed, maybe because it's kind of a pain in the ass to do? 2. and that's got no relevance anyway, it changes pretty much nothing to the runtime

> Why not just use an int, which would default to int64 on a 64 bit system?

Because leaving that stuff to random half-specified defaults is a terrible idea when you have exact-sized integers you can be certain of.

> Why not just use sum := 0?

Because if integral numbers don't default to 64 bits or more (or to arbitrary-size integers), you get signed overflow.
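To make that concrete, a minimal Go illustration, assuming the benchmark's total of 9999010000 (which doesn't fit in 32 bits):

    package main

    import "fmt"

    func main() {
        // 9999010000 is larger than a signed 32-bit integer can hold (2147483647),
        // so a 32-bit accumulator silently wraps around while an int64 does not
        var sum32 int32
        var sum64 int64
        for i := int64(0); i < 1000000; i += 100 {
            sum32 += int32(2*i + 1) // each term fits in 32 bits, the total doesn't
            sum64 += 2*i + 1
        }
        fmt.Println(sum32) // a wrapped-around value
        fmt.Println(sum64) // 9999010000
    }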



No, that breaks the fair comparison. Other language examples do not pre-allocate all the vectors.


It is already broken ;)

In C++, indexing a vector does no range checks, while in Go indexing a slice checks the validity of the index.

And vector in GCC grows by a factor of 2, while a slice in Go grows by 25% once it's bigger than 1024 elements. So there is a trade-off between memory usage and the number of allocations.
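The exact growth factors depend on the compiler/runtime version, but you can watch Go's slice growth from the outside with a sketch like this:

    package main

    import "fmt"

    func main() {
        var x []int64
        lastCap := 0
        for i := 0; i < 1000000; i++ {
            x = append(x, int64(i))
            if cap(x) != lastCap {
                ratio := 0.0
                if lastCap > 0 {
                    ratio = float64(cap(x)) / float64(lastCap)
                }
                // print every reallocation: the new capacity and the growth ratio
                fmt.Printf("len=%d cap=%d growth=%.2fx\n", len(x), cap(x), ratio)
                lastCap = cap(x)
            }
        }
    }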

And... I could go on, but the point is that microbenchmarks are in general not that useful :)


The benchmark should use std::vector::at() in that case. at() performs bounds checks.

http://en.cppreference.com/w/cpp/container/vector/at


As one would expect, the reallocations way dominate the costs: here's the original program

    > time ./test_cpp
    9999010000
    ./test_cpp  2.29s user 0.86s system 99% cpu 3.170 total
here it is with []-indexing replaced by at():

    > time ./test_cpp_at
    9999010000
    ./test_cpp_at  2.31s user 0.85s system 99% cpu 3.173 total
and here's the second one with vectors reserved to 1000000:

    > time ./test_cpp_at
    9999010000
    ./test_cpp_at  1.27s user 0.30s system 98% cpu 1.587 total
I expect that with iterators and iterator transforms an actual C++ developer could get pretty much the same no-alloc thing I got using them in Rust:

    > time ./test_rust
    9999010000
    ./test_rust  0.53s user 0.01s system 97% cpu 0.556 total
And of course you could just remove the entirely unnecessary initial loops and do the summation on the index values anyway, because that's what's in your vectors, at which point the compiler realises it's a constant (or it doesn't; I didn't actually check, but it only has 10000 iterations to run with 3 additions each, so there isn't much of a difference between that and a constant on a superscalar GHz+ CPU where everything fits handily into registers, because there's only two values and a constant):

    > time ./test_cpp
    9999010000
    ./test_cpp  0.00s user 0.00s system 45% cpu 0.009 total
though even if you didn't know the upper bound (or the step), I assume there's a formula which gives you the result in constant time.
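There is such a formula, at least assuming the inputs are i and i+1 as in the Rust version above: the picked items form an arithmetic series, so the whole thing collapses to a couple of multiplications (a sketch, since TFA's exact setup isn't quoted here):

    package main

    import "fmt"

    func main() {
        // n items get picked (every 100th of 1,000,000); the k-th picked item is
        // 2*(step*k) + 1, so the total is an arithmetic series:
        //   sum = 2*step*(0 + 1 + ... + (n-1)) + n = step*n*(n-1) + n
        const n = int64(10000)
        const step = int64(100)
        fmt.Println(step*n*(n-1) + n) // 9999010000
    }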



