
A Response to Hello World in Go - todotask
http://www.doxsey.net/blog/a-response-to-hello-world
======
nemothekid
> _But regardless of what our computer is doing, if it takes less than 100ms
> to execute our program, we simply won't be able to tell the difference
> between executing 1 instruction or all 10 billion._

> _All of the example programs except for "Python 3 (PyPy)" are basically
> within this threshold. No user will notice the difference._

This is a contentious point. 100ms is very noticeable for a human being. I'd
say most humans probably hit the "simultaneous" range at 33ms (30fps). I know,
personally, I can very much tell when there's an input latency of 100ms.

Regardless, for Drew's somewhat contrived benchmark, I can't imagine a
situation where starting a program in 33ms or 100ms would matter, but if that
program was running in a loop in a bash script? Then those milliseconds very
much do matter. Spending 90ms more in a loop is an extra 9 seconds for every
100 items.
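
A rough sketch of what I mean, with a hypothetical "./hello" binary standing
in for whatever the loop launches:

    // Sketch: how per-invocation startup overhead compounds when a small
    // binary is launched repeatedly, as in a shell loop. "./hello" is a
    // placeholder for any program under test.
    package main

    import (
        "fmt"
        "os/exec"
        "time"
    )

    func main() {
        const runs = 100
        start := time.Now()
        for i := 0; i < runs; i++ {
            if err := exec.Command("./hello").Run(); err != nil {
                panic(err)
            }
        }
        elapsed := time.Since(start)
        // An extra 90ms of startup per run means 100 * 90ms = 9s of pure overhead.
        fmt.Printf("%d runs took %v (%v per run)\n", runs, elapsed, elapsed/runs)
    }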

~~~
Twirrim
I would argue that if you're doing performance-sensitive work in that fashion,
you're doing it wrong.

I'd also probably argue that if you're doing it that way, you're probably
already _not_ that concerned with performance to the degree where losing a few
tens of seconds is a big deal.

I certainly wouldn't include any tight bash loops that execute external code
in any service code, for example. Performance there shouldn't be impacting
either the sync or async path.

Much like the article's point, it's a question of optimising for the right
things.

I do agree that people can perceive values faster than 100ms.

Several years back I was interviewing for a sysadmin job at a futures trading
company. Part way through the interview, one of the traders complained to the
interviewer that the latency was increasing on the connection to the exchange,
and sure enough it was up 20ms from normal (I forget the normal, but it was
very low, on the order of only maybe 10-20ms) and monitoring was just about to
page the interviewer.

------
jlundborg
The author makes a benchmark comparing a Go program that explicitly buffers
many "hello world" strings against an assembly program that does not, then
goes on to argue that this shows that high-level languages can be faster. This
can certainly be true, but this particular example is not a great way to show
it, because the same optimization can be done in an obvious way in assembly
(just allocate enough space with brk or mmap2, write your strings into it, and
pass the buffer's address to the write syscall). This would of course be
harder to do in assembly than in Go, but it is still quite straightforward to
see what the optimal way to do it is. Furthermore, this would probably still
outperform the optimized Go version by a wide margin.
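
A rough sketch of the contrast (the strings and counts here are mine, for
illustration, not the article's actual benchmark code):

    // Unbuffered: one write syscall per line. Buffered: output accumulates
    // in memory and is flushed in large chunks, which is the optimization
    // the benchmarked Go program performs explicitly.
    package main

    import (
        "bufio"
        "fmt"
        "os"
    )

    func unbuffered(n int) {
        for i := 0; i < n; i++ {
            fmt.Fprintln(os.Stdout, "hello world") // one syscall each
        }
    }

    func buffered(n int) {
        w := bufio.NewWriter(os.Stdout)
        defer w.Flush()
        for i := 0; i < n; i++ {
            fmt.Fprintln(w, "hello world") // flushed in ~4KB chunks
        }
    }

    func main() {
        buffered(10000)
    }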

A better example would be one where a compiler picks an obscure but faster
parallelization strategy, or unrolls a loop appropriately in a way that is
both faster and unlikely to be written by a competent human, or handles a
complex memory-management scenario, and so on.
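
For instance, here's the kind of loop-unrolling transformation I mean,
hand-written purely as an illustration (not actual compiler output):

    // A manually 4-way unrolled sum: fewer loop-condition checks per element
    // and independent additions the CPU can overlap.
    package main

    import "fmt"

    func sum(xs []int) int {
        total, i := 0, 0
        for ; i+4 <= len(xs); i += 4 {
            total += xs[i] + xs[i+1] + xs[i+2] + xs[i+3]
        }
        for ; i < len(xs); i++ { // leftover tail
            total += xs[i]
        }
        return total
    }

    func main() {
        fmt.Println(sum([]int{1, 2, 3, 4, 5, 6, 7, 8, 9})) // 45
    }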

I think this is not the point of the original article though. I think we all
understand that abstractions can in theory bring great benefits, but we do
need to scrutinize the cost they add. The hello world examples show that even
with the simplest program we can imagine, the result is extremely far from
optimal in popular programming environments. If this is the case, why should
we assume that these same compilers are doing an excellent job in situations
that are actually hard?

~~~
skywhopper
He called this out, saying that the overhead Go adds allowed the abstraction
to be written much more simply than it would be in assembly. This is just a
tradeoff that is worthwhile in most cases.

The gripes about boilerplate overhead in Hello World miss the point that the
runtimes involved are themselves making tradeoffs about what to optimize for.
Go explicitly trades off ultra-efficient binary size for ultra-fast
compilation and mostly-static artifacts. Go is not designed to make the most
efficient possible Hello World binary, nor should we want it to be. The fact
that you can optimize Hello World better by hand than the Go compiler does
tells us nothing interesting. How well can you optimize Docker, Consul, or
Kubernetes by hand?

------
munificent
_> By some estimates human beings perceive two events as instantaneous at
around 100ms._

This is not _at all_ what the linked article says. The article says human
_reaction time_ — the time it takes to receive a stimulus, process it, and
perform a physical act in response — is about 100ms.

Human time _perception_ is _much_ faster than that. A software button that
takes 100ms to switch to its pressed appearance feels dramatically different
than one that does so in 10ms. As the linked article notes, game player
performance degrades as latency increases from 13ms.

------
romwell
Original observation: programs do a lot of things you didn't ask for.

This response: well, actually™ you really want all these things you didn't ask
for. And don't worry about that bloat, the memory is cheap and the CPUs are
fast.

So, the points of the original article still stand (this "response" does not
address them in any way).

The claim about 100ms latency being imperceptible is the cherry on top.
Try delaying a drum track by 100ms and listening to it without pain (or try
singing karaoke into a mic with that much latency).

But that's beside the point, which is that _all these programs are doing a lot
of things you didn't ask for and take megabytes to do that_.

~~~
pm90
do you agree that there are tradeoffs in system design?

A plane could weigh a lot less if we removed all the computers and sensors
that it carries. However, as logical beings we make the trade off because
speed isn’t as important as safety.

~~~
romwell
Great example! An airplane only carries the stuff that it actually needs.

An Airbus A380 carries 300 miles of wires in its body.

You can be absolutely sure that a Cessna 152 _does not_.

Putting all of A380's wires in a Cessna 152 is not a trade-off, it's insanity.

------
nneonneo
> Computers are fast - a lot faster than we can possibly perceive. By some
> estimates human beings perceive two events as instantaneous at around 100ms.
> To Drew's point, this is an eternity for a computer. In that time a modern
> CPU could execute 10 billion instructions.

This type of thinking is why we can’t have nice things, and why technology and
software seem stuck on an eternal treadmill. A system with lower latency feels
amazing - compare a high and low latency terminal or text editor sometime and
feel the difference. A touchscreen with 1ms latency (input to output, which
requires a 1000Hz display) feels like a real physical object - way different
from a touchscreen at 16ms or (god forbid) 100ms. We are _never_ going to get
there if people keep assuming that latencies lower than 100ms (or 30ms) are
imperceptible.

100ms is the minimum viable performance. If an operation takes longer than
that, you’ve failed for interactive purposes. But just because that’s the
minimum requirement doesn’t mean we shouldn’t try our dang hardest to improve
on that!

------
pcwalton
I strace'd Rust's hello world and the vast majority of the syscalls are just
ld.so setting up dynamically loaded libraries. Presumably if you really cared
you could statically link against musl to eliminate those. The remaining
syscalls are needed for stack guards to work, which you actually want for
security, even with hello world.

I would go so far as to say that publishing benchmarks that encourage
languages to skimp on important security features like guard pages is
irresponsible.

~~~
glandium
It's still weird that it has significantly more syscalls than the C program
dynamically linked to glibc, which presumably has the same overhead wrt ld.so
and guard pages. (BTW, the guard page for the main thread, IIRC, is set up by
the kernel.)

~~~
pcwalton
Here is the list of syscalls:
[https://gist.github.com/pcwalton/60ff97c2353feda11638be10118...](https://gist.github.com/pcwalton/60ff97c2353feda11638be10118851f2)

I think the extra syscalls you're referring to are the result of linking with
pthread. Obviously Rust code always wants thread support.

Here's the setup code: [https://github.com/rust-lang/rust/blob/master/src/libstd/sys...](https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs)

A fair number of the syscalls are to ensure that stack overflow is reported
like a Rust panic (with a stack trace), not like a segv in unsafe code. I
think this is worth it.

------
csande17
> A full-program, aggressive optimization step might lead to smaller binaries,
> but it would do so at a great cost to projects with many dependencies.
> Incremental compilation is a boon to developer productivity which is also
> important.

Isn't the traditional way to solve this problem to have two different modes
for the compiler? You can have a "development" mode that compiles quickly and
supports incremental compilation, and that doesn't stop you from also having a
"release" mode that produces a Hello World executable that fits on a floppy
disk.

And then if you're Google and you have infinite disk space and memory but
limited time to spend waiting for builds, you can just run all your builds in
"development" mode. (But if you also ship smartphone apps, maybe you compile
_those_ in release mode.)

~~~
loopz
The Go Way ain't the traditional way. I'd argue you should want the same
binary everywhere, though there's always C or you could come up with your own
scheme to break production.

~~~
csande17
I don't especially care about the binary (beyond wanting it to be as small and
fast as possible on end-users' machines), but I do want the _behavior_ to be
the same everywhere.

In C, this is difficult because code often accidentally relies on "undefined
behavior" that changes depending on the compiler and optimizations you use.
But is that still an issue in newer languages that don't have as much
undefined behavior? (Is this something Rust programmers struggle with, for
example?)

~~~
loopz
Safe language implementations such as Go and Rust are specifically designed to
avoid such inconsistencies. Though if your builds differ, you can never be
100% sure.

------
stickfigure
From the original-original post: _Most languages do a whole lot of other crap
other than printing out “hello world”, even if that’s all you asked for._

This should be of great concern to all the people who write "hello, world"
programs for a living, and no concern to anyone else.

------
6510
On c64 you just load the "hello world" ~~program~~ text into the screen
buffer.

~~~
teddyuk
On an “etch a sketch” you go up one dial then down a half then right half a
dial then down half a dial then right half a dial and then right half a dial
and then left half a dial and up one dial and then right half a dial and then
down half a dial and then...bugger stuck

~~~
devjam
That's when you yell out to your female sibling for help, aka a sis-call.

------
saagarjha
> Kinda neat that it handles the binary getting deleted

…and then it goes and reads /proc/self/exe. This looks really racy.

~~~
firethief
Reading the symlink isn't racy.

But why is the os package doing a syscall to determine the executable's path
in a hello world program?

~~~
saagarjha
I believe the point of this feature was to get the executable's path before it
could be deleted off the disk, which it presumably could be in the time
between when the binary was executed and when the call to readlink was made.

~~~
firethief
I've looked into it some more; I had assumed /proc/*/exe acted like a symlink
whose value never changed, but actually it's more complicated[1]. As a result
(only considering Linux here), there actually is a race condition in what
they're doing: /proc/self/exe will have " (deleted)" appended if the
executable is deleted. In an apparent attempt to make the problem more subtle,
they're doing the readlink on every program startup so that if the value is
needed, it will be correct--unless the file was deleted between when it was
exec'd and when it got to the readlink call.

1: [https://unix.stackexchange.com/questions/197854/how-does-the...](https://unix.stackexchange.com/questions/197854/how-does-the-proc-pid-exe-symlink-differ-from-ordinary-symlinks)
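
To make this concrete, here's a Linux-only sketch (note it deletes its own
binary, so use a throwaway build):

    // Demonstrates the " (deleted)" suffix /proc/self/exe reports once the
    // running binary has been unlinked. Linux only.
    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        before, _ := os.Readlink("/proc/self/exe")
        fmt.Println("before unlink:", before)

        // Remove our own binary while the process is still running.
        if err := os.Remove(before); err != nil {
            fmt.Println("remove failed:", err)
            return
        }

        after, _ := os.Readlink("/proc/self/exe")
        fmt.Println("after unlink:", after) // typically ends in " (deleted)"
    }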

~~~
saagarjha
Yup, that's exactly it.

------
jancsika
> By some estimates human beings perceive two events as instantaneous at
> around 100ms.

Would be fun to design a realtime audio app using that supposition.
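
For a sense of scale, assuming a typical 44.1 kHz sample rate (my assumption,
not the article's):

    // 100ms of latency at 44.1 kHz is a 4410-sample buffer, enormous by
    // realtime audio standards, where total budgets are usually ~5-10ms.
    package main

    import "fmt"

    func main() {
        const sampleRate = 44100.0 // Hz
        const latency = 0.100      // seconds
        fmt.Printf("%.0f samples of buffering\n", sampleRate*latency)
    }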

Also, a question to troll fans of the phrase "Gell-Mann amnesia effect": how
flippantly should I now reject the other paragraphs that are outside of my
expertise?

------
dana321
    // shortest "hello world" in go
    package main

    func main() {
        println("Hello World")
    }

