
Practical Go benchmarks - minaandrawos
https://stackimpact.com/blog/practical-golang-benchmarks/
======
aodin
The majority of the performance difference between strings concat and builder
in your example is explained by memory allocation. Every loop of concat will
result in a new allocation, while the builder - which uses a []byte internally
- will only allocate when length equals capacity, and the newly allocated
slice will be approx. twice the capacity of the old slice (see:
[https://golang.org/src/strings/builder.go?#L62](https://golang.org/src/strings/builder.go?#L62)).

Therefore, 500,000 rounds of concat is about 500,000 allocations, while
200,000,000 rounds of builder is ~ 27.5 allocations (=log2(200000000)).

I would suggest a different benchmark to approximate real world usage:

    
    
        func BenchmarkConcatString(b *testing.B) {
            for n := 0; n < b.N; n++ {
                var str string
                str += "x"
                str += "y"
                str += "z"
            }
        }
    
        func BenchmarkConcatBuilder(b *testing.B) {
            for n := 0; n < b.N; n++ {
                var builder strings.Builder
                builder.WriteString("x")
                builder.WriteString("y")
                builder.WriteString("z")
                builder.String()
            }
        }
    

Which still shows a significant performance advantage for using builder (-40%
ns/op):

    
    
        BenchmarkConcatString-4     20000000            93.5 ns/op
        BenchmarkConcatBuilder-4    30000000            54.6 ns/op

~~~
marcus_holmes
Won't the compiler just ignore the "builder.String()" line unless the return
value is actually used?

~~~
zaarn
Easy fix: use the blank identifier:

    
    
        _ = builder.String()
    

The compiler should not optimize that out.

~~~
majewsky
That's absolutely the same as the statement without assignment from a data
flow POV. However, the compiler will likely not optimize it away since it
would be too burdensome to prove that strings.Builder.String() does not have
any side effects. The Go compiler prides itself on fast compilation speed,
so I would not expect it to perform cross-package control/data flow analyses.

------
tapirl
I would mention that gc (the official Go compiler) makes a special optimization
for the string concatenation operator (+). If the number of strings to be
concatenated is known at compile time, using + to concatenate strings is the
most efficient.

    
    
        package a
        
        import "testing"
        import "strings"
        
        var strA, strB string
        var x, y, z = "x", "y", "z"
        
        func BenchmarkConcatString(b *testing.B) {
            for n := 0; n < b.N; n++ {
                strA = x + y + z
            }
        }
        
        func BenchmarkConcatBuilder(b *testing.B) {
            for n := 0; n < b.N; n++ {
                var builder strings.Builder
                builder.WriteString(x)
                builder.WriteString(y)
                builder.WriteString(z)
                strB = builder.String()
            }
        }
    

Result:

    
    
        goos: linux
        goarch: amd64
        BenchmarkConcatString-2    	20000000	        83.7 ns/op
        BenchmarkConcatBuilder-2   	20000000	       102 ns/op

~~~
BeeOnRope
Note that this is directly contradicted by another comment[1] on this post,
where three fixed strings are concatenated with +=, yet that was still slower.

Perhaps the use of += as separate statements is the difference, but one would
hope that gc wasn't so fragile as to be unable to identify those sequences as
identical.

---

[1]
[https://news.ycombinator.com/item?id=16533650](https://news.ycombinator.com/item?id=16533650)

~~~
tapirl
The optimization made by gc is only valid for the form: s0 + s1 + .... + sn.

------
kjksf
String benchmarks are so broken.

The way he uses b.N is wrong. b.N is different for different loops, so he's
e.g. comparing 100 iterations of string '+' with 1000 iterations of
builder.WriteString().

Also the compiler can completely null out no-op functions (without side
effects) so in benchmarks it's a good idea to assign the value being
calculated into e.g. a global variable.

The corrected code is:
[https://gist.github.com/kjk/6a7d7135ae1e5fa6cd1f0db23d2eaf4d](https://gist.github.com/kjk/6a7d7135ae1e5fa6cd1f0db23d2eaf4d)

An example of correctly benchmarking:

    
    
        func BenchmarkConcatString(b *testing.B) {
            for n := 0; n < b.N; n++ {
                var str string
                for i := 0; i < 100; i++ {
                    str += "x"
                }
                gStr = str // gStr is a package-level string, so the result is used
            }
        }
    
    

After the fixes, it paints a significantly different picture:

    
    
        go test -bench=. -benchmem
        goos: darwin
        goarch: amd64
        BenchmarkConcatString-8    	  300000	      5148 ns/op	    5728 B/op	      99 allocs/op
        BenchmarkConcatBuffer-8    	 1000000	      1046 ns/op	     368 B/op	       3 allocs/op
        BenchmarkConcatBuilder-8   	 1000000	      1177 ns/op	     248 B/op	       5 allocs/op

~~~
dmitrim
Thanks for pointing it out. Should clearly not depend on the number of
iterations. It's fixed now.

~~~
fauigerzigerk
I think there's another bug in the generateSlice function if the intention is
to create a slice with n random numbers.

    
    
        func generateSlice(n int) []int {
            s := make([]int, n)
            for i := 0; i < n; i++ {
                s = append(s, rand.Intn(1e9))
            }
            return s
        }
    

As it is now, the function creates a slice with n zeros followed by n random
numbers. I suppose you meant to say make([]int, 0, n). You could just as well
assign directly to each slice element instead of using append, which would be
more efficient.

I made the exact same mistake quite a few times myself.

~~~
dmitrim
Yep, that was meant to be capacity, not length. Corrected. Thanks!

------
bpicolo
While I don't doubt that strings.Builder is quicker than += concat for
many iterations, to make it a fair comparison you probably need to pull out
the string at the end rather than just writing to the buffer. It's also not
obvious for example what the difference is with just 2 strings to join if I
need to join two strings together 40 trillion times or whatnot.

Nice collection of microbenchmarks though. Interesting to see magnitude
differences from e.g. regexp compile

------
Vendan
Fun fact: the crypto rand "number" benchmark depends on the number you pass
into it:

    
    
        BenchmarkCryptoRand27-8   	 5000000	       388 ns/op
        BenchmarkCryptoRand28-8   	 3000000	       356 ns/op
        BenchmarkCryptoRand29-8   	 5000000	       335 ns/op
        BenchmarkCryptoRand30-8   	 5000000	       327 ns/op
        BenchmarkCryptoRand31-8   	 5000000	       331 ns/op
        BenchmarkCryptoRand32-8   	 5000000	       322 ns/op
        BenchmarkCryptoRand33-8   	 3000000	       480 ns/op
        BenchmarkCryptoRand34-8   	 3000000	       474 ns/op
    

for benchmarks like

    
    
        func BenchmarkCryptoRand32(b *testing.B) {
            for n := 0; n < b.N; n++ {
                _, err := crand.Int(crand.Reader, big.NewInt(32))
                if err != nil {
                    panic(err)
                }
            }
        }
    

This is because the crypto/rand library is very very careful to give you
unbiased random numbers.

------
friday99
The string benchmark has the issue that the amount of work done varies with
each pass through the loop since the string just keeps getting appended to. A
proper benchmark, like the ones in the comments here, does the same amount of work
for every loop.

------
jossctz
Note that you can also get the number of bytes processed per second by calling
the SetBytes method. This is very useful for some benchmarks (hashing, base64, ...):

    
    
        func benchmarkHash(b *testing.B, h hash.Hash) {
            data := make([]byte, 1024)
            rand.Read(data)
        
            b.ResetTimer()
            b.SetBytes(int64(len(data))) // SetBytes takes an int64
            for n := 0; n < b.N; n++ {
                h.Write(data)
                h.Sum(nil)
            }
        }

------
pbnjay
> The following benchmarks evaluate various functionality with the focus on
> real-world usage patterns.

I can't say I write much code that does one thing many times in a really tight
loop. It would be a lot more interesting if the code combined multiple
functions into the loop body in a better attempt to simulate "real-world usage
patterns."

~~~
dmitrim
Good point, thanks! The idea behind these benchmarks is to make the results
usable in real-world programs, rather than benchmarking real-world programs. I
rephrased that sentence to avoid any confusion.

------
antoaravinth
I always wanted to ask this. I'm a full stack developer with good knowledge of
Java and JavaScript. I'm currently reading up on Golang, especially for its
concurrency idioms. It is good and easy to write concurrent code, but people
always come and talk about actors, which are supposedly very good compared with
channels. I have never used actors before. What are your thoughts on this?

------
majewsky
Even though this is clearly a benchmarking game, I don't like that it does not
explain how the things benchmarked against each other sometimes have
drastically different use cases.

I can assure you that someone is going to use these numbers to argue that
crypto/rand needs to be replaced by math/rand BECAUSE SPEED, or that MD5
should be preferred over SHA2/3.

------
Xeoncross
It's worth noting that the first number in a benchmark result is how many
loops (for n := 0; n < b.N) that Go used to find the results.

The nanoseconds, bytes, and allocs per operation are the important part.

