
In case someone cares about these things, I compared the build times and the binary sizes for 1.9 vs 1.8.3 using the open source project we maintain [1]. This is on a 6-core i7-5820K:

Build time with 1.8.3:

   real	0m7.533s
   user	0m36.913s
   sys	0m2.856s

Build time with 1.9:

   real	0m6.830s
   user	0m35.082s
   sys	0m2.384s

Binary size:

   1.8.3 : 19929736 bytes
   1.9   : 20004424 bytes

So... looks like the multi-threaded compilation indeed delivers better build times, but the binary size has increased slightly.

[1] You can git-clone and try yourself: https://github.com/gravitational/teleport




Unless you perform a proper statistical analysis, it's unfair to draw a conclusion from a single run.

Furthermore, when I see a second run that's faster than the first one, I immediately wonder if it's the cache being cold for the first run and warm for the second.

While I have your attention, https://zedshaw.com/archive/programmers-need-to-learn-statis... is worth reading.


In fairness, the phrase he used was "looks like". I don't think his comment was intended to suggest that he'd done rigorous and exhaustive wide-spectrum analysis of compile times and executable size, just that expectations matched the result for his project.


Thanks :) I'm no stranger to the scrutiny of Hacker News. I did 3 builds in a row and threw out the 1st one (cache); the last two were within 0.1s of each other, so I copied & pasted the latter.


So basically there's no speedup.


I'm pretty sure he means the last two runs of the same compiler.


"Programmers Need To Learn Statistics Or I Will Kill Them All"... What an insufferable asshat.

PSA: there is no reason to behave like this, and it's an incredibly effective way to alienate a bunch of people. You either offend people directly with the murder implication, or they don't take you seriously because you sound like you're throwing a temper tantrum so extended that you managed to write it all up as a blog post.


Or you can stop being offended by words put out on the internet by strangers... which is what I always recommend to basically everyone.


Or you can not be offended and still criticise someone for being an asshat.


I'm not offended. I'm just not going to waste my time reading an article by someone behaving like a child.


It's like Doonesbury, but it came from the 80's: http://imgur.com/82QXoAj


... maybe it's meant to be a bit ironic/salty/sarcastic/venting?


So, honest question from a non-statistician:

How, concretely, should I go about doing this particular analysis of compile time for one project? How many times should I run the build for each of the 2 compilers, and what should I do with the results so that I can 1. draw a conclusion and 2. come up with fair numbers for how they compare?

I would hope someone could teach this hopefully simple and very concrete thing to the HN crowd, and I do hope the answer is not "go learn statistics".


You first need to create a clean slate each time you run the experiment: no cache, no FILESYSTEM cache, etc. Maybe a tonne of single-use Docker images? Even then, filesystem caches will mess you up a little.

Beyond that, you need to run the same build "several" times to see what the variance is. Without getting specific: if the builds are within a couple of percent of each other, do "a few" and take the mean. If they're all over the place, do "lots" and only stop once the mean stabilises. There are specific methods to define "lots" and "a few", but it's usually obvious for large effects and you don't need to worry too much about it.

If you're trying to prove that you've made a 0.1 improvement on an underlying process that is normally distributed with a stddev of, like, 2, then you're going to have to run it a lot and do some maths to show when to stop and accept the result.
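
Here's a minimal sketch of such a harness in Go, for the curious. The ./... build target, the run count, and the throw-away-the-first-run policy are illustrative assumptions, not a rigorous methodology:

    // buildbench.go: time repeated builds, discard the cold-cache run,
    // and report the mean and sample standard deviation.
    package main

    import (
        "fmt"
        "math"
        "os/exec"
        "time"
    )

    func timedBuild() (time.Duration, error) {
        start := time.Now()
        err := exec.Command("go", "build", "./...").Run()
        return time.Since(start), err
    }

    func main() {
        const runs = 10 // illustrative; raise this until the mean stabilises
        var samples []float64
        for i := 0; i <= runs; i++ {
            d, err := timedBuild()
            if err != nil {
                panic(err)
            }
            if i == 0 {
                continue // throw out the first run: cold caches
            }
            samples = append(samples, d.Seconds())
        }
        var sum float64
        for _, s := range samples {
            sum += s
        }
        mean := sum / float64(len(samples))
        var sq float64
        for _, s := range samples {
            sq += (s - mean) * (s - mean)
        }
        stddev := math.Sqrt(sq / float64(len(samples)-1))
        fmt.Printf("n=%d mean=%.3fs stddev=%.3fs\n", len(samples), mean, stddev)
    }

If the stddev comes out small relative to the difference between the two compilers, the mean is probably a fair number; if not, crank up the run count.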


I want measurements with filesystem cache because I'm interested in estimating the speed of the compile-test-edit cycle. If you want to estimate the impact on emerge then you'll want no filesystem cache.

It's all about measuring based on what you intend to use the measurements for.


If the measurements are all over the place, why not take the fastest? The average is no good, because it'll be influenced by the times it wasn't running as fast as possible.

I don't myself lose much sleep over worrying about the times it runs faster than possible.


I agree with this sentiment. Any time worse than the fastest is due to noise in the system (schedulers etc). So the fastest is the lowest noise run.

Of course, as I said in another comment, it depends what you want to do with the measurement. If you plan to estimate how long a run will take on an existing system, then you need to accept the noise and use the mean (or median).
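
To make the distinction concrete, here's a small sketch computing all three summaries from a set of wall times (the sample values are made up):

    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        // made-up per-run wall times, in seconds
        times := []float64{7.53, 6.91, 6.83, 7.02, 6.88}
        sort.Float64s(times)
        fastest := times[0] // the lowest-noise run
        var sum float64
        for _, t := range times {
            sum += t
        }
        mean := sum / float64(len(times))
        median := times[len(times)/2] // exact for odd n; average the middle two for even n
        fmt.Printf("fastest=%.2fs mean=%.2fs median=%.2fs\n", fastest, mean, median)
    }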


There are people who have thought about this, e.g., http://onlinelibrary.wiley.com/doi/10.1002/cpe.2939/full

Personally I think it's a better idea to instrument your programs and count the number of memory (block) accesses or something. That metric might actually be useful to a reader a few years in the future. The fact that your program was running faster on a modern x86 processor from the year 2010 tells me nothing about how it would perform today, unless the difference was so large that you never needed statistical testing in the first place...

edit: I'm not sure if this paper is accessible to everyone, so here is an alternate link https://hal.inria.fr/inria-00443839v1/document
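
A portable stand-in for that kind of counter in Go is allocations per operation, which the standard testing package can report. A sketch, with a made-up workload:

    package main

    import (
        "bytes"
        "fmt"
        "testing"
    )

    func workload() {
        var b bytes.Buffer
        for i := 0; i < 100; i++ {
            b.WriteString("x")
        }
        _ = b.String()
    }

    func main() {
        // average heap allocations per call, over 1000 calls
        fmt.Printf("allocs/op: %.0f\n", testing.AllocsPerRun(1000, workload))
    }

Allocation counts aren't memory block accesses, but they share the useful property of not depending on whichever CPU you happened to benchmark on.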


Aren't go programs statically linked? The change in binary size might be completely unrelated to changes in the compiler.


Yes, the other guys are just being pedantic because the runtime attempts to load libc dynamically (but it is not required; DNS behaviour may just change without it).


> Aren't go programs statically linked?

Not by default. You have to set CGO_ENABLED=0 to get a fully statically linked binary (it disables cgo, so libc isn't pulled in at all).


Well, Go code is statically linked, but the runtime may try to dynamically load libc for DNS resolving. Use of cgo of course drastically changes everything.


Nowadays the toolchain also supports generating dynamic libraries.


And by default, this feature is not used.


So what? That doesn't make "Go code is statically linked" a fact, given that it depends on compiler flags.

Now if it said "Go code is usually/by default statically linked", then yes.


Do native Go programs use libc?


For a few things, like the system DNS resolver in the net package (it can be switched to the pure Go version with a compile-time or run-time switch) and getting the user's home directory in the os/user package.
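
For reference, a sketch of forcing the pure Go resolver from code; GODEBUG=netdns=go is the run-time switch and the netgo build tag is the compile-time one:

    package main

    import (
        "context"
        "fmt"
        "net"
    )

    func main() {
        r := &net.Resolver{PreferGo: true} // skip the cgo/libc path
        addrs, err := r.LookupHost(context.Background(), "example.com")
        fmt.Println(addrs, err)
    }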


To expand on this:

    $ cat foo.go 
    package main
    
    import (
     	"fmt"
    )
    
    func main() {
    	fmt.Println("Hello")
    }
    $ go build foo.go
    $ file foo
    foo: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
compared to:

    $ cat bar.go
    package main
    
    import (
    	"os/user"
    	"fmt"
    )
    
    func main() {
    	u, err := user.Current()
    	fmt.Println(u, err)
    }
    $ go build bar.go
    $ file bar
    bar: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped


You can usually build fully static by doing

    CGO_ENABLED=0 go build
even when using os/user or net/


I'm at a computer with go1.4.2 freebsd/amd64 ATM (earlier it was go1.8.1 linux/amd64 IIRC), and the above os/user example results in a dynamically linked ELF even when built with CGO_ENABLED set to 0.


I am pretty sure they don't.


Nope.


Runtime has changed too... slightly



