Hello World (drewdevault.com)
212 points by ddevault 19 days ago | 133 comments

What is this post supposed to prove? It certainly is not supposed to prove that a hello world is a representative real-world program, from which one could infer that writing and debugging a real-world program in Julia is 835 times as complex as writing one in assembly, since the former makes 835 times as many syscalls. (You know, that number seems okay to me, except it needs to be applied to these languages in reverse.)

I agree that software bloat is a big problem, but this post trivializes it. It reduces the problem to printing a "hello world" to the screen; it punishes all languages with runtimes by measuring the syscalls involved in their startup routines; it disregards the fact that many users are going to have a single system-wide runtime for e.g. C or Python or Julia, so the total-kB number does not scale linearly with the number of programs written in those languages; it ignores the massively increased development and debugging time of low-level, memory-unsafe languages like assembly, static Zig, or C; and it directly implies[0] that most problems with software complexity can be solved by writing in assembly, static Zig, or C rather than in Julia/Ruby/Java/all other languages from the bottom 90% of the list (that's the vibe this post gives me, anyway). To me, that is more about making a venting shitpost than creating something that provides even a part of an actual solution to software bloat in general.

The "more time your users are sitting there waiting for your program" statement is especially amusing to me. Your users are not going to wait shorter for your program because you and your team are taking another year to write, test, and debug it in assembly.

[0] "These numbers are real. This is more complexity that someone has to debug, more time your users are sitting there waiting for your program, less disk space available for files which actually matter to the user."

> What is this post supposed to prove?

It proves that the compilers for many programming languages emit a lot of instructions and a lot of system calls they could have optimized away theoretically.

This is particularly true when there are lots of "startup routines". When the program to be run is known, almost all of that startup can be shown to be unnecessary and stripped from the final executable.

> Your users are not going to wait shorter for your program because you and your team are taking another year to write, test, and debug it in assembly.

They might, if your program is interactive and is used often enough. Now, writing assembly is not very realistic, but considering a different language with less overhead - is.

For all the daily articles that pop up here on HN concerning premature optimization I find it surprising to see this thread and all the sentiment of language choice mattering.

It’s far easier to work in and translate thoughts to more feature rich languages. The vast majority of the time it’s not the language itself that will be an optimization issue down the line but rather the written code.

Here's the problem: your language isn't the optimization issue... until it is. You optimize all the hot-spots on your profile, and then... all of a sudden you're looking at a very big samey chart of everything taking slightly too long, even though you've optimized every possible thing; nothing particularly sticks out and there's nothing you can do about it. And then you're totally fucked, because you're looking at a rewrite at that point; probably in a language your team doesn't know and doesn't like.

Sometimes language matters! Not always, but frequently!

> you're looking at a rewrite at that point; probably in a language your team doesn't know and doesn't like

Or you commit atrocities like rip the garbage collector out of Python. I’m not joking; Instagram did this: https://instagram-engineering.com/dismissing-python-garbage-...

That’s actually a pretty clever (and crazy) idea, because when you disable the gc in Python the reference counting still works and only the cycle detection part goes away. So you are just dealing with reference semantics, the same as C++’s shared_ptr or Swift’s references, which isn’t that hard to use if you are careful about cyclic refs. And if the memory leak exceeds a certain amount they can just restart the process.
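A minimal sketch of that behavior in CPython (this assumes CPython's reference-counting semantics; the `Node` class is purely illustrative):

```python
import gc
import weakref

gc.disable()  # cycle detection off; reference counting still runs

class Node:
    pass

# Acyclic object: freed immediately when the last reference drops.
plain = Node()
ref_plain = weakref.ref(plain)
del plain
assert ref_plain() is None  # refcount hit zero, object reclaimed

# Cyclic objects: refcounts never reach zero, so they leak until
# a manual gc.collect() (or a process restart, as Instagram did).
a, b = Node(), Node()
a.other, b.other = b, a
ref_a = weakref.ref(a)
del a, b
assert ref_a() is not None  # still alive: the cycle keeps it pinned

gc.collect()                # one explicit pass reclaims the cycle
assert ref_a() is None
```

Note that `gc.collect()` still works while the collector is disabled; `gc.disable()` only stops the automatic, threshold-triggered passes.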

Disabling the Python GC like that isn't that crazy. gc.disable() wouldn't have been added if you were never supposed to use it. The crazy part is some library calling the gc directly without a way to disable that. That's just bad form.

A lot of games in gc-ed languages turn the gc off, and then manually run the gc whenever there's some spare time in a frame. The same thing should more or less work for a server handling web requests too. Maybe run it after every request or perhaps every couple of requests depending on memory usage.
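That scheduling pattern can be sketched in Python (names like `handle_request` and the collection interval here are purely illustrative):

```python
import gc

gc.disable()  # no automatic cycle collection in the middle of a request

def handle_request(payload):
    # stand-in for real request handling
    return payload.upper()

REQUESTS_PER_COLLECT = 100  # tuning knob; adjust based on memory usage

results = []
for i, payload in enumerate(["hello"] * 1000):
    results.append(handle_request(payload))
    # Collect only at a point we choose (between requests, or in a game,
    # in the spare time at the end of a frame), never mid-work.
    if (i + 1) % REQUESTS_PER_COLLECT == 0:
        gc.collect()
```

The point is that the pause moves to a spot where latency doesn't matter, at the cost of holding cyclic garbage a little longer.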

That's also how Perl functions. No generational GC, just ref counting. So you're fine if you don't make circular references.

Heh! That is fantastic!

> they could have optimized away theoretically.

For a hello-world program, this is true, you do not need any of the features that usually come in the language runtime - no garbage collection, no runtime memory safety, no runtime introspection, et cetera. This stems directly from the fact that many of these languages have borrowed features from Lisp, which has the whole language (GC, compiler, debug functions, etc..) always available for the code.

And that, considering your point of view, might be a feature, not a bug - especially if your program, having a runtime of its own, actually produces its own stacktrace, allows you to interactively inspect the values of variables throughout the stack, and - in the case of some runtimes - is capable of opening up its own debugger for you to perform introspection and issue analysis in.

> This stems directly from the fact that many of these languages have borrowed features from Lisp, which has the whole language (GC, compiler, debug functions, etc..) always available for the code.

Right, I think the point was that this isn't always necessary, in which case it becomes extra baggage that you have to carry around.

I think it would be interesting to make a slightly more complex program involving maybe a few hello worlds, and then see how much the size of all these languages increases.

> Now, writing assembly is not very realistic, but considering a different language with less overhead - is.


"Sawyer wrote 99% of the code for RollerCoaster Tycoon in x86 assembly language, with the remaining one percent written in C."

I didn't say "impossible" or "completely unrealistic"...

I wonder how that code gets maintained.

Also - that game absolutely needs better resolution to do the coasters justice.

It's quite an old game now, released in 1999. Runs beautifully under wine and one of my favorite games of all time. I recommend giving it a go if sim games are your thing. Hard to believe it was all written in assembler and I too would love to know how Chris managed his code.

It could just be trying to be interesting, not to prove anything.

The language in the post, "Most languages do a whole lot of other crap other than printing out “hello world”, even if that’s all you asked for." certainly seems to imply it's moralizing about something.

Why is it attempting to moralize if it is not meant to prove any morals then?

If your compiler generates something huge for the most basic test, how can you trust it's doing well on actual programs? Most real programs are far too complicated to analyze. It may be a toy example, but if your language can't handle a toy, it's probably bad on real world programs also.

You have a weird definition of "handling". A high-level language performs the same amount of effective work as the lower-level one - I cannot see a language listed in the blog post that has failed to print the message to the console.

It's the lower-level languages that, for the sake of efficiency and minimalism, cannot handle anything larger than a hello-world: actual programs written in them come with long development times and a mass of bugs that come straight from their inability to abstract when necessary and from their memory management systems that lack sanity and safety. This "simplicity"[0] is often fetishized, and this blog post looks no different to me - look how many overflow bugs this "simplicity" has caused in operating system kernels.

Or, paraphrasing your quote, if your language prevents you from creating most basic abstractions, how can you trust it's doing well on actual programs?

[0] I can agree, writing in a memory unsafe language is simple, as long as it's someone else who maintains the code you wrote, accepts or reroutes all of the bugtickets that happen, and analyzes heisenbugs that come from weird memory corruptions caused by unsafe code.

I don't think what people are saying here is that the languages are too abstract, it's that compilers are implementing those abstractions with too much overhead.

Most compilers (I assume) optimize by applying specific optimizations to patterns they find in the source. Most of the time this is pretty good compared to non-optimized code. But the other day on HN I learned about polyhedral compilation, which is a pretty crazy optimization. It got me thinking: what about all the optimizations we don't know about? Is it possible to find the absolute best optimization for a program, given some specification of what we want from it?

I think an ideal compiler would brute-force optimize like this so we don't miss out on optimizations we didn't think of:

1: compile the code normally without optimizations

2: generate a binary [see note1], and test if it produces the same output as the binary from (1)[see note2]

3: sort through the binaries and choose the best ones (perhaps some binaries are more optimized for speed, size, etc so there may be tradeoffs)

note1: here are a couple ways i think you could do this, in order of practicality:

A: modify the binary from (1) a bit at a time

B: modify the binary from (1) X bits at a time, or any of the other binaries from (2) that passed the test, more likely to modify binaries that perform better in the tests.

C: iterate through every possible binary up to around the size of the binary from (1)

note2: the easiest way to implement this would be to test it with a ton of sample input, but it would be better if the binary could be analyzed somehow to see if it satisfied the parameters of the source code.

I wouldn't use such a brute-force optimizer to actually produce code (the result would be effectively unverified), but you could use it to reverse-engineer optimizations and implement them in source code.
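This enumeration idea is essentially what the compiler literature calls superoptimization. A toy sketch in Python, over a made-up four-instruction set (all names here are illustrative, and equivalence is only checked on sample inputs, per note2's caveat):

```python
from itertools import product

# Toy "instruction set": unary integer ops. The reference program
# (the unoptimized compile) computes (x + x) * 2; we search for the
# shortest equivalent instruction sequence.
OPS = {
    "inc":      lambda x: x + 1,
    "dec":      lambda x: x - 1,
    "dbl":      lambda x: x * 2,
    "add_self": lambda x: x + x,
}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

reference = ("add_self", "dbl")  # computes 4x
samples = range(-10, 11)         # test vectors, not a proof of equivalence

def superoptimize(reference, max_len):
    # Enumerate candidate programs, shortest first, and keep the first
    # one that matches the reference on every sample input.
    for length in range(1, max_len + 1):
        for prog in product(OPS, repeat=length):
            if all(run(prog, x) == run(reference, x) for x in samples):
                return prog
    return reference

best = superoptimize(reference, max_len=2)
# No single instruction computes 4x here, so the search settles on a
# two-instruction program equivalent to the reference.
```

Real superoptimizers search actual machine instructions and use SMT solvers rather than sample inputs, but the shape of the search is the same.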

edit: formatting


> No, uh, the high level language performs way more work.

The effective work is the same: a line is printed to the screen. The part you consider "wasted" does only go to waste if all of your programs are "hello world" programs, at which point, you might consider yourself fairly lucky if you also get paid for that.

> Yes, nobody has ever written a complex program in C.

> You're right. Let's rewrite the Linux kernel in Ruby.

You're attacking a straw man. You completely omitted the part I wrote about "long development times and a mass of bugs that come straight from their inability to abstract when necessary and from their memory management systems that lack sanity and safety."

No one doubts the existence of Linux, BSDs, Darwin, or Windows. I doubt the programmer productivity while working on them.

> The effective work is the same: a line is printed to the screen. The part you consider "wasted" does only go to waste if all of your programs are "hello world" programs, at which point, you might consider yourself fairly lucky if you also get paid for that.

Did the CPU work go towards something useful? No? Then it is waste. Some waste is inevitable; that doesn't change what it is. It's still waste. We should minimize waste. It's kind of simple.

>You're attacking a straw man. You completely omitted the part I wrote about "long development times and a mass of bugs that come straight from their inability to abstract when necessary and from their memory management systems that lack sanity and safety."

No mate, you're the one attacking a straw man. You're ignoring the actual evidence of people getting important things done in low level languages and hand-waving it away because it doesn't agree with your preconceived notions of how things should be done.

> Did the CPU work go towards something useful? No? Then it is waste. Some waste is inevitable; that doesn't change what it is. It's still waste. We should minimize waste. It's kind of simple.

Sure, then let us minimize the waste - for example, minimizing the number of man-hours spent on fixing buffer overflows, null pointer dereferences and memory corruptions all over the software ecosystem across a variety of operating systems and architectures. Some languages have already eliminated that waste, by means of automatic memory management and memory safety. This is the part that doesn't agree with your preconceived notions in turn, and therefore you omit it from your posts altogether.

Sure, if you're aiming for a trivial "hello world" program, then you can write it in the language that gives you the best result for your optimization function: time spent writing and debugging it, binary size, syscall count, and whatever other measures you might want. So assembly and C are absolutely fine there. And then, when you want to quickly and effectively write an actual system that makes use of the features of the respective languages' runtimes, use the same method: write it in the language that gives you the best results under that same optimization function. And I can bet a few bucks that total development and bugfixing time will skyrocket for the low level languages, giving you all the waste that you mentioned above.

> You're ignoring the actual evidence of people getting important things done in low level languages

Again - I said that these achievements are indisputable and evident. So is the number of memory safety bugs in these low level languages, which is a fact that you keep on ignoring.


> Honest question: have you ever actually done any (professional) development in C/C++ or Rust?

I did in C and C++ and ran away screaming from them 12 years ago. I am learning Rust lately and I am actively rewriting several small-to-medium-sized projects that I wrote in Ruby or Elixir in the past.

Coincidentally, some of those projects have intersections with C/C++ code I wrote ~15 years ago, and the Rust code is much more manageable: it's smaller, more understandable (although it has a pretty quirky syntax in places), and I am confident it will be easier to troubleshoot -- if my 18 years in the area are worth a penny. It doesn't carry invisible gotchas between the lines, which is IMO the biggest win when you write in a lower-level language.

> Also, I'm calling bullshit that manual memory management lacks "safety and sanity". It's harder. You can screw it up.

Correction: you WILL, INEVITABLY, screw up manual allocation, eventually. That's the crux of your discussion with the other guy here. The mental baggage you have to carry in your brain to avoid this is not worth it for 99% of the projects, at least those in my career. None of my customers and employers ever contracted for NASA or gaming studios, so reaching for C/C++ was a huge waste of human time, most of the time.

If you have to write an embedded OS or there's absolutely no way to use Go or Rust to interface with a hardware piece that still has no driver -- and you are the one writing it -- sure, C/C++ are still a good choice (although I do wonder for how much longer; Zig and Rust look better and better with time but I imagine LLVM still has to improve a lot so Rust is useful for OS/driver development).

If not, using literally everything else on the spectrum -- from Rust to Python/PHP/Ruby/Elixir/JS -- will work fine.

> That doesn't mean that the default should be "pretend like machines have infinite RAM and we'll never run out". We're engineers, we make tradeoffs here all the time.

I don't think anyone here said that but I haven't scanned the whole thread yet. You are indeed fighting an [extreme] straw-man in my eyes as well.

> I am entirely aware of how convenient a garbage collector is. I also don't think "helloworld" should compile to be 1MB. We should have standards.

Now we're getting somewhere! :)

We absolutely should be having standards. The problem is sadly very common: everybody is like "my MacBook could handle Python's runtime inefficiency so meh to you old man yelling at cloud, we got work to do". IMO all programming languages should have a natural way to sink to the bottom of popularity if their runtimes are not improved over time.

I am a good example of how people should pick their tech (given no firm requirements from employers, of course). I tried several dynamic languages and made a pick based on my productivity, the helpfulness of the community, the average quality of the libraries, and the technical prowess of the runtime. I do wish it had a shorter bootstrap time but hey, it's specifically made for servers so I can swallow it.

On the other end of the spectrum, I'd like to make a comeback in the very fast low[-ish] level languages so I am learning Rust. I know I can gain extra runtime speed (and maybe memory efficiency but that's not a given) if I go back to C and C++ but I'd like to balance out productivity with stress and overall ability to get shit done. And especially important: not join tribes that yell "real men use C++!", I am sick of that crap.


I don't think you and the other guy actually disagreed but kind of got hung up on a few minutiae.

I also believe we indeed should have standards. (Which means software like Slack should not be used at all.) But we all know that writing compilers and optimizers is pretty damn hard.

You're thinking like a cog in a machine. It is easier to program in high level languages. So easy that the hard part becomes dealing with management and constantly rewriting for marginal improvements. You start thinking the sprint is important and you start choosing a language so hiring is easier. In the end we end up with bloated teams and bloated software that continues to do less, use more CPU, and cost more.

It was more difficult to write a huge program in C, but you needed a smaller team. You are so deep in the system that management is always over their heads.

This example is not too relevant, because the overhead is probably the additional tooling (e.g. debugging, error handlers, etc.). For a proper comparison there should be multiple examples, e.g. 3-4 programs of increasing complexity; then we could see whether these numbers increase in a similar fashion.

"What is this supposed to prove?"

Perhaps it proves that if one is writing relatively small, simple programs, some of these languages are overkill.

Two notes on Go. Since Drew is stripping the binaries for many other languages, he should also do it for Go:

  go build -ldflags '-s -w' -o test test.go
Also, since he's complaining about the difficulty of obtaining a static build in Go: There's an issue for that. (https://github.com/golang/go/issues/26492) Drew definitely knows, he left an incredibly unhelpful comment there yesterday.

Unhelpful, but I must be jaded because I’m not sure I’d call it “incredibly unhelpful”.

You do have to wonder why nobody thought ‘this is getting a bit crazy’ before though.

You do?

It is known.

I guess I could have written: I think someone should have wondered whether this was actually a reasonable way to make a statically linked binary before.

Maybe if you’ve gotten used to invoking the magic it doesn’t seem quite so arcane anymore.

FWIW if you don't need cgo then CGO_ENABLED=0 is sufficient to get a static binary. The tricky part is getting a static binary while also embedding C libraries. That's what the github issue is about.

Try the pronoun form of "one"

As it's not on Drew's list:


  $ cat hello.nim
  stdout.write("hello, world!\n")
Static (musl):

  $ nim --gcc.exe:musl-gcc --gcc.linkerexe:musl-gcc --passL:-static c -d:release hello.nim
  $ ldd ./hello
  not a dynamic executable

  Execution Time: 0m0.002s (real)
  Total Syscalls: 16
  Unique Syscalls: 8
  Size (KiB): 95K (78K stripped)
Dynamic (glibc):

  $ nim c -d:release hello.nim
  $ ldd ./hello
  linux-vdso.so.1 =>  (0x00007ffc994b6000)
  libdl.so.2 => /lib64/libdl.so.2 (0x00007f7c88785000)
  libc.so.6 => /lib64/libc.so.6 (0x00007f7c883b8000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f7c88989000)

  Execution Time: 0m0.002s (real)
  Total Syscalls: 42
  Unique Syscalls: 13
  Size (KiB): 91K (79K stripped)
Which I think is actually pretty reasonable for a high-level GC'd language.

"Size (KiB): 95K (78K stripped)"

Seems suspicious that this lines up with the 95.9 KiB the author listed for the C + GCC + musl static build, even though the author says they stripped the binary afterwards. I think they might have copied the wrong number into the table :).

The author was counting the size of the dynamic builds as binary + dynamically linked files. That should be about the same as the C dynamic ones in the table in this case anyway, but just a note to anyone else running their own tests.

Nim compiles to C

When testing with glibc, I got further improvements when I used echo instead of stdout.write, and also even more with

    -d:danger --opt:size
On lobste.rs, someone also listed results for OCaml and they were quite nice IMO too.

Curious if the generated C is any different for nim's echo as opposed to stdout.write().

Sorry but I have to give this a thumbs down for not being a very convincing or well-written blog post. It dumps some data and then immediately jumps to a statement about how "lots of syscalls = bad" without actually detailing what those syscalls are doing in the context of the runtime. And I'm saying this as someone who already runs Alpine on my servers and doesn't need to be convinced. Drew, I think you can write much better posts than this.

More matter with less art.

I thought it was a breezy read with a simple thesis. I don't know why such a thing should be discouraged.

It should be no surprise to users of CPython and Ruby that those languages have a lot of startup code. The details of what that code is doing are already evident if you're watching it happen in an strace log, but those bits were left out. This isn't art, it's details, and without the details, it's just preaching to the choir. No new readers are going to be convinced.

I didn’t see a great deal of effort to convince anyone of anything.

Why must all blog posts be attempts to change the world?

Where I come from, the responsible thing to do when making a public statement is to back it up with facts, sources and explanations. We get enough blogspam and clickbait articles making the rounds here as it is. Please don't contribute to that.

It's still an interesting table. I was surprised that Java is 10x faster than Python. I would have expected initializing the JVM would be similarly complex to initializing the Python Interpreter.

When you run "python" you're most likely running CPython, an interpreter written in C that was 2.5x faster than Java according to this table.

You're looking at the numbers for PyPy, an alternative Python implementation written in Python-ish compiled to C which provides a JIT compiler for Python. A bit more understandable why that initializes slowly (though runs faster for longer programs).

The author doesn't state the Java version, but more recent JREs (9+, IIRC) start up way faster than older versions. I'd imagine a JRE's launch time is heavily influenced by the disk cache: a warm launch, with the tens of megabytes of classes already in RAM, ends up way faster than a cold one, since the warm launch basically only loads the program's class file from disk.

He literally does list the Java version, though...?

OpenJDK 11.0.5 JRE

Well I guess I missed it.

My point is that it's not clear from the article why that is the case.

Haskell: https://gist.github.com/nicolasff/fe5668194769c175d9d4dfc78f...

38 bytes of source code, 970 KB binary (782 KB stripped), 0m0.003s execution time, 139 syscalls (26 unique).

Seeing the list of syscalls is the most interesting part of this whole exercise; the number of milliseconds it takes to print "hello world" is not super relevant (except in the few cases where start-up time is painfully long).

"Passing /dev/urandom into perl is equally likely to print “hello world”"

That gave me a good chuckle towards the end.

It'd be useful to break this out a little further; it'd have been interesting to see how small just the binary is in the dynamically linked versions, instead of only comparing the static build to the whole dynamic bundle.

It's also a bit odd that e.g. Zig gets optimized for size and stripped via the compiler, C gets optimized for speed and stripped via strip, and Go/Crystal just get built standard with no stripping at all. I don't think it'd change the big picture, just a bit odd.


Unrelated tangent/ramble: I played with Zig and Go as part of my yearly "take December off and tinker" break. Zig was really fun to work with but unfortunately still in huge churn and development. Go was a lot better than I expected (I had put off messing with it for a few years now) and the size of the stdlib is just astounding. In the end it wasn't as "fun" as Zig, but it had very low friction and I definitely see myself using it for a few personal projects over the next year... and then seeing if Zig has less churn in December ;).

What is it with all the recent posts fetishising absolute performance over lots of other attributes, or even worse, pretending those other attributes don't even matter?

What is the point of this post? Yes, I fully expect a simple Hello World in assembly would be straightforward and fast. I still want the advantage of things like automated memory management, an interpreter or JIT compiler where warranted, a standard runtime environment, etc. For anything even remotely complicated.

I get it, over the past 30-40 years we've built layers upon layers of abstraction, so it's worth it to take a look back and ask "Are there some cases where we overdid it?" Still, let's not throw the baby out with the bathwater, or forget why we added those layers in the first place.

I still find the table worthwhile, and maybe even worth looking into for the worst outliers. It's not definitive, but the spread is interesting.

This isn’t a recent trend, very common not just on HN but many, many communities of computer science.

I think it’s sort of like how a lot of people fetishize a party lifestyle in their 20s and age out of it often when they get more perspective and understand bigger picture priorities.

This would be portable C (i.e. not x86-64-specific, but still Linux-only) with just 2 syscalls:

  $ cat a.c
  #define _GNU_SOURCE
  #include <unistd.h>
  #include <sys/syscall.h>

  void _start(void) {
    syscall(__NR_write, 1, "Hello world!\n", 13);
    syscall(__NR_exit_group, 1);
  }

  $ gcc -nostartfiles a.c -o a -static
  $ ls -l a
  -rwxr-xr-x 1 user user 9584 jan  5 00:07 a
  $ strip -s a; ls -l a
  -rwxr-xr-x 1 user user 9000 jan  5 00:10 a
  $ strace ./a
  execve("./a", ["./a"], 0x7ffd4d4f8250 /* 67 vars */) = 0
  write(1, "Hello world!\n", 13)          = 13
  exit_group(1)                           = ?
  +++ exited with 1 +++
I'm not actually sure why syscall() is inlined and the rest of the glibc code is not included here while compiling with -static, but well, it works. Maybe it's because syscall() is a macro, or maybe it's some kind of ld dead-code elimination (removing unused/unreachable code), which seems to be common in modern static linkers (ld, gold, lld).

> portable

> #define _GNU_SOURCE

If it's portable, what's the point of defining _GNU_SOURCE? Honest question.

It's 'portable' as originally stated, across various Linux CPU archs.

_GNU_SOURCE is requested by man 2 syscall. Interestingly it mentions it appeared in 4BSD, so possibly it might work (maybe with some changes) under other Unixy platforms.

I’m not sure why they aren’t using the SYS_* constants either.

Yup, it'd make it compilable under various *BSD OSes (also using SYS_exit instead of SYS_exit_group). For some reason it doesn't work under NetBSD, due to some linking problem producing an invalid binary for the platform.

  $ gcc -nostartfiles a.c -o a -Wl,--entry=_start -static
  $ ./a
  -bash: ./a: cannot execute binary file: Exec format error
  $ file a
  a: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
But the final binary is impressively small, and it seems to contain all the relevant code (the bodies of _start and syscall):

  $ strip -s a; ls -l a
  -rwxr-xr-x  1 user  users  2688 Jan  5 07:53 a

Works under FreeBSD though:

  $ clang -nostartfiles a.c -o a -static
  $ ./a
  Hello world!
  $ strip -s a; ls -l a
  -rwxr-xr-x  1 user  user  9216 Jan  5 07:59 a

Random thought/question: does `process.stdout.write("Hello World");` in Node.js make any difference? While `console.log()` is the right choice for this analysis, since it's the more common one, it does a lot of extra internal logic: https://github.com/nodejs/node/blob/v13.x/lib/internal/conso...

Edit: I'm just curious and don't know how to even start testing this, not trying to promote/demote Node.js in any way.

I'm not able to reproduce the author's results (nor reliably ANY result), but you're right. Using `strace -c node -e <code>` I get ~630 syscalls using `console.log("Hello world")` and ~620 syscalls using `process.stdout.write("Hello World");`.

This is on nodejs v13.5.0

EDIT: My previous comment was made using an old version of nodejs, the update halved the number of syscalls from being around ~1350

If it is any better, it would probably be the fairer entry (and perhaps similarly for Python and stdout/bytes), since Go's example did basically that instead of fmt.print with a string.

I would be surprised if console.log() made many more syscalls than process.stdout.write. More function calls probably but those aren't being counted and neither is RAM usage. "strace" would let you count and find out though!

The size would be a few bytes larger since node is scripted and that's more characters.

As a performance optimization I use stderr instead of console.log; for some reason stderr is much faster than stdout :P And console.log is slow as hell, mostly because it's sync, but also because it does a ton of stuff: parsing the parameters, etc.

It is not that stderr is faster; it is that it is unbuffered by default (while stdout is buffered by default). If you flush your output after printing to stdout, you should get a similar result.

And, well, I am guessing you're talking about "much faster" for a small amount of text (small enough not to fill the stdout buffer). Actually printing a huge amount of text to stdout should be much faster than printing to stderr, exactly because of this buffer.
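The buffering effect is easy to demonstrate. A small Python sketch (Python is used here purely as a convenient demo; the same principle applies to Node's streams): a child process writes to a piped, block-buffered stdout and then hard-exits via os._exit(), so nothing gets flushed automatically.

```python
import subprocess
import sys

def run_child(flush):
    # Child writes to stdout, optionally flushes, then hard-exits.
    # os._exit() skips the interpreter's normal exit-time flushing,
    # so an unflushed write to a piped (block-buffered) stdout is lost.
    code = (
        "import sys, os\n"
        "sys.stdout.write('hello\\n')\n"
        + ("sys.stdout.flush()\n" if flush else "")
        + "os._exit(0)\n"
    )
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True,
    ).stdout

assert run_child(flush=False) == ""         # stuck in the buffer
assert run_child(flush=True) == "hello\n"   # explicit flush delivers it
```

An unbuffered stream pays the write cost on every call but never leaves data sitting in a buffer, which is exactly the trade-off described above.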

If you're micro-optimizing javascript, you're already doomed.

This appears to have been written by someone who thinks the point of "Hello World" is to print the string "Hello World" as efficiently as possible. If that were the case, Kernighan wouldn't have written the first one with "printf()".

Why shouldn't we expect a compiler to emit the most efficient code? Isn't that the point?

If they're not emitting the most efficient code on "hello world", that's a thing. It's not groundbreaking, but, I'd like whatever language I choose to be efficient with the small things as well as the large things.

I know what you're going to say. "printf does a lot more". Ok. But can I statically analyze: printf("hello world"), and notice that it's not doing anything interesting with zero ambiguity?

Kernighan's compiler didn't do that, and he had puts(3) available, because he, you know, invented it at the same time. Was he just a suboptimal C programmer?

Nice dodge, but the question is: should the compiler emit the most efficient code possible, or not? I don't care what Kernighan's compiler did, I care what's possible. It's been 40 years. What do we got?

I'm sorry, I don't understand what this has to do with the point I made. I was talking about Kernighan, not about the most efficient possible modern implementation.

Well, maybe I missed your point then. Is your critique that the author used "printf"?

I was responding to this statement:

> This appears to have been written by someone who thinks the point of "Hello World" is to print the string "Hello World" as efficiently as possible.

To me... that would straightforwardly seem to be the point of writing a hello world program.

My critique is that the author missed the point of "Hello World".

Ok, I'll bite: In your mind, what is the point of "hello world"? Because I think the author's perspective is that it's a "small program that does something interesting". That seems pretty reasonable. I'm completely lost on what you're trying to say. I don't mean that in a snarky sense. Obviously you're a well regarded poster here and I'm interested in what you have to say. I just have no idea what you're communicating.

The point of "Hello world" is to be the minimum program to demonstrate a programming environment (and verify that you have it working). Optimizing it is a very weird idea.

Should the compiler get to decide arbitrarily at which point a program becomes important enough to optimize, or should it apply the same level of optimization (based on compiler flags etc.) constantly so that you can reason about what will happen when you compile the code?

The blog post isn't even about how optimized the end result is. It's to get you, the programmer, to think about the cost you incur by going further down in this table when choosing your language/environment. Sometimes that's fine; Drew himself writes a lot of Python for example for his web stuff because it's the best choice for it and writing it in C or Zig is a pointless effort. The key thing to note though is that he picked Python while being well aware of this table.

Lots of programmers today aren't aware of this table. That's the point of the post.

>Should the compiler get to decide arbitrarily at which point a program becomes important enough to optimize

It probably already does. In a real program run with a real compiler you cannot reasonably know what the optimization process is going to do. You have to look at the output.

>It's to get you, the programmer, to think about the cost you incur by going further down in this table when choosing your language/environment.

>Lots of programmers today aren't aware of this table.

Please don't. You don't have to be a CS prodigy to notice the size of an executable or the fact that a program is starting slow.

He didn't write the first one with "printf()".


Because he didn't have it in B. The K&R C "Hello World" uses printf(), which itself does a whole bunch of crap other than printing out "hello world".

Mhm, like being recognized as constant by the compiler and turned into a call to puts?

Something Kernighan's compiler didn't do.

Their libc probably put the characters out slowly one-by-one for both of them, so the difference was probably just a branch or two. Not a whole set of extra syscalls.

How is a 40 year old compiler relevant?

Kernighan had puts(3) available, but chose printf(3), which does a lot more than puts. The point is Kernighan, not the compiler.

An alternate interpretation of the same data - if you're committed to writing a program in a language that requires a runtime / does a lot of work on startup, writing a program that prints "Hello, World!" does not make the most effective use of that language and runtime.

Put that way, most of this post seems like a tautology: if you misuse the tools you are given, of course you're going to get bad results!

It seems reasonable to me that a language should make the assumption that the programmer's use case matches the language's strengths, so by default any runtime setup / bookkeeping / teardown code should run. Failure to remove that extra "complexity" isn't a failure of the toolchain, it's a failure of the programmer to select the right tool for the job.

If the point the author was trying to make was that the complexity being added is never useful, this is not a post that argues that position. A cost / benefit discussion of the specific behaviours being supported by that complexity would be very interesting!

That's pretty cool. It reminds me of GodBolt (https://godbolt.org).

I'm told that the story behind it is that he was arguing with someone about the efficiency of an operation, and actually wrote that site to prove his point.

It’s comparing unassembled assembler to JITed code and compilers that are pulling in precompiled libraries.

I feel it needs some kind of normalizing. I get that it is illustrating bloat but it doesn’t really illustrate where that’s coming from.

Maybe only the output of the JITs should count, or the syscalls required to assemble the example should be included. Are musl and glibc really wasting cycles, or are they doing something that the example is missing?

Fun to think about.

It's not even meaningfully illustrating bloat. Hello world is an unrealistic edge case. Any program that does something useful will be far more complicated, and it's entirely possible that a lot of the extra stuff being measured here will be required anyway.

> It’s comparing unassembled assembler to JITed code

No, he runs the assembly through NASM + GCC as documented on the page.

I think it's a comparison of "when the user runs the program what runs, how long does it take and how much disk space does it need" based of the column headings. It's not a comparison of the tooling prior to the user's computer as far as I can tell.

It's deliberate that JITs, interpreters, and compiled languages are compared on the same terms here. JITs and interpreters are fundamentally less performant than compiled languages; they don't get a pass on performance tests just because it's by design.

> JITs

This is...highly context dependent. For highly polymorphic code, my understanding is that JITs can outperform precompiled binaries, since they can inline virtual/polymorphic calls in tight loops.

This also isn't a "performance" test in any real sense. It's a test of startup time. Where, yes, JITs lose, but unless you're writing short lived interactive command line tools, or something that runs on lambda, that shouldn't be a concern. For "normal" serverside or desktop apps that run for more than, say, 30 seconds, the difference between 0s of startup and 0.2s of startup time is literally in the noise.

Startup time is the least interesting metric in this article. The more interesting metric is the number of syscalls. This isn't a measure of performance, it's a measure of complexity and busywork. Complexity tends to indirectly affect performance, but that's not the point of the article.

Complexity of what?

The resulting generated binary? Well no, a python binary is smaller than the c binary. The toolchain? Well gcc is pretty complex and that's unaccounted for. The build process? Again, no.

The closest thing I can think of is the language runtime. But why do I care about how complex the language runtime is? Often more complex language runtimes make my life easier anyway, and they're all sitting atop the Intel microcode magic box anyway.

There's a very specific definition of complexity you're using, and I'm still not sure what it is. In my world, you usually add complexity to eke out extra performance by breaking the less complex abstractions.

I'd say the complexity imposed upon the final executable program, by the [supposedly sub-optimal] compiler / runtime tooling. That's how I understood the goal of the article.

The complexity of the system.

I'm still confused. Why is runtime compilation a component of the "system", but aot compilation is not? Why are python interpreters a component of the system, but microcode interpreters and aot compilation not?

If you haven't given a clear definition of what "the system" is, I can't really use your evaluation to influence my decision making.


That difference is irrelevant when your production system is updated a lot, or when you actually need the runtime compilation, which a lot of programs do. Even on mobile and embedded where you would think this wasn't true, the customers still really don't care if given the luxury to do so by a fast CPU and high capacity battery.

I feel like printing "hello world" with, say, java, is like picking up your supermarket groceries on a Scania truck. All that extra machinery is there for when you need to tackle harder problems, but may represent a lot of overhead for super simple ones.

It's an interesting table, but what you really want to know is how much of the standard library gets linked in as soon as you do anything interesting. For example, if using one printf links in a lot and you almost always use at least one printf (or use a library that does), then the size you get when you entirely avoid printf doesn't matter, because that's not a practical thing to avoid. But you might want to know the minimum size with printf.

Figuring how where the binary size cliffs are (what causes size to grow a lot) and which cliffs it's practical to avoid might be useful.

Perl seems to lead the pack for interpreted languages by a wide margin for this microbenchmark. I wonder if that's just for the narrow case of print() / hello-world.

I think it holds true for any kind of string management. It's kind of why Perl exists in the first place, to take one bunch of strings and turn them into another, probably printing them out somehow.

It stands to reason that the default use case of Perl is well optimized.

Rust drops to 16KB (on a Mac, I suspect nearer 8KB on Linux) if you set panic=abort, turn on LTO, and drop the std library:

  #![feature(start, lang_items)]
  #![no_std]
  #![no_main]

  #[link(name = "c")]
  extern "C" {
      pub fn puts(s: *const u8);
  }

  #[no_mangle]
  pub extern "C" fn main(_argc: i32, _argv: *const *const u8) -> i32 {
      unsafe {
          puts(b"hello world\n" as *const u8);
      }
      0
  }

  #[panic_handler]
  fn panic(_info: &core::panic::PanicInfo) -> ! {
      loop {}
  }

  #[lang = "eh_personality"]
  extern "C" fn eh_personality() {}

2x assembly isn't bad, but I'm willing to bet we can go smaller.


(See also http://mainisusuallyafunction.blogspot.com/2015/01/151-byte-... which was the original implementation, this shaved a few more bytes off)

>The following assembly program is the ideal Linux x86_64 program for this purpose. A perfect compiler would emit this hello world program for any language.

I.e. you want your compiler to emit code that isn't portable across unix-like systems (and their future versions) in any degree? AFAIK Linux is the only unix-like system that guarantees ABI stability.

I'm curious what Zig does in this case to get such "good" results. Does it forgo portability?

Linux binaries should take advantage of the stability of the platform they’re on, surely?

I’m curious to know how Nim performed, given it compiles to C.

Someone commented a test shortly after you asked https://news.ycombinator.com/item?id=21957476

I’m actually curious why Nim didn’t make the list. Crystal is there and a lot of other emerging languages. Is the author not familiar with it?

Drew's familiar with nim, and he's talked with you about it before on HN. Crystal was volunteered by someone working on Crystal, Julia was volunteered by a dev who works using Julia, etc. Haskell was also presented, I think, but Haskell was a mess.

What is the history of the Hello World example program? I'm not sure why it's become this defacto example of syntax for every language. Maybe there is something else we could use as an example that would be more indicative of syntax but perhaps I'm overthinking it.

Brian Kernighan, 1972, A Tutorial Introduction to the Language B.

And yeah, I think it's an okay intro to syntax because at least it shows you the minimum boilerplate to get some output.


I have always thought it first appeared in K&R, but I do not know where I got that from.

I'm curious what method was used to compile the Java version. I'm assuming it was jlink? If you use graalvm's native image then last time I checked the output was 6 MiB and runtime ~1 ms. And it may be much better now with Java 11.

It would be interesting to know why the number of syscalls for C are so high.

I’m wondering why his dynamic glibc C executable is so big.

He's counting glibc too, not just the binary. This is true for all of the examples, the size is the binary itself plus whatever system/libraries you'd need to actually run the binary.

Which version of C?

All versions are interesting: Zig/assembly manages to do it in 2/3, so what is musl doing that needs 5? And what on Earth is glibc dynamic doing that needs 65?

Curious what all the syscalls Rust is making, and why?

One reason is that println! will lock stdout for you. I'm not sure what percentage that makes up.

If you don't want that behavior, you can control this all yourself with write! and friends.

EDIT: deeper analysis on Reddit: https://www.reddit.com/r/programming/comments/ejxwlu/hello_w...

Most of them are probably from glibc.

> Passing /dev/urandom into perl is equally likely to print “hello world”

I chuckled.

I don't know much about Julia. Why is the binary over a third of a GiB?

Julia by default is loading a system image which is made for scientific computing, so linear algebra support, a package manager, distributed parallel computing, etc. are all part of that. You can build a system image without all of that of course, but given its audience it's a good default choice.

What is the command used to get total and unique syscalls!? strace wc !?
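
The post doesn't spell out the invocation, but something close to this gives both numbers. A sketch assuming strace is installed (the article's exact commands aren't shown, and strace's exit/signal lines make the counts approximate):

```shell
# Log every syscall, one per line, following child processes
strace -f -o trace.log ./hello

# Total syscalls: one trace line per call
wc -l < trace.log

# Unique syscalls: the name is everything before the first '('
cut -d'(' -f1 trace.log | sort -u | wc -l

# Or let strace tally it for you with a per-syscall summary table
strace -c ./hello
```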

test.c compiles to 6 kilobytes on my Linux Mint machine... how did he get it to compile to more than 2 megabytes?

Run ldd on it, the "true" size will probably paint a different picture.

Add the size of all the dynamic libraries that your executable depends on - which most likely means glibc, the C runtime.

Anyone want to do Clojure + GraalVM?

And then, someone went and invented the graphical user interface. So many precious machine cycles wasted rendering graphics for people who are too lazy to use the command line like they are supposed to. Shhheez.
