hyperbrainer's comments

Brilliant resource for anybody beginning with programming languages


Ads, popups, a horribly laggy UI, all to make money. It's not about user experience, just money.


Really well written and well explained, without any bait.


> (Of course, I edited the code to output the results, otherwise, it just optimized out everything.)

O(1) :)


Why was C slower though?


The author mentions they didn't use optimization flags but doesn't include the compilation details. You can sort of guess that (relatively) unoptimized C might perform worse than V8's JIT on short, straight computational code - you're essentially comparing two native code generators on a simple task, where one has its optimizations enabled and wins.


Oh, I assumed he still did -O2, and did not do anything else. Is that bad to assume?

PS: I do not use C beyond reading some of its code for inspiration, so kinda unaware


It just says 'no optimization flags' and that's that. I don't think even the compiler is mentioned, so the author isn't giving you a lot to go on here - you don't even know what the default is.

A modern optimizing C compiler is going to go absolutely HAM on this sort of code snippet if you tell it to, and do next to nothing if you don't - that's, roughly, the explanation for this little discrepancy.


Simply compiling it with -O3 produces something which completes in half the time of the JavaScript version (350ms for C, 750ms for JS), so perhaps that.

Edit for Twirrim: on this system (Ryzen 7, gcc 11): "-O3": 350ms; "-O3 -march=native": 208ms; "-O2": 998ms; "-O2 -march=native": 1040ms.

Edit 2: Interestingly, changing the C from float to double produces a 3.5x speedup, taking the time elapsed (with "-O3 -march=native") to 58ms, or about 12x faster than JS. This also makes what it's computing closer to the JavaScript version.


For fun and frolics:

No flags: 1843ms

-march=native: 2183 ms

-O2: 423 ms

-O2 -march=native: 250 ms

-O3: 425 ms

-O3 -march=native: 255 ms

-O3 doesn't seem to be helping in my case.


They didn't use any optimization flags.


Yeah, I was just trying to show the difference. Doing it without optimisation flags is an utterly bewildering decision by the author.


Especially given the goal was increasing performance.


Seems crazy to me that double would produce that kind of speed up. Is float getting emulated somehow? Don't they end up the same size?


> Don't they end up the same size?

No. Float is half the size of double.


My intuition would have been that floats were faster because of this. Less memory to iterate through.


It is faster in just about every way: less memory, and even the CPU instructions (which are usually not the problem) are faster. There's something fucky going on with the codegen here. Or it could simply be the measurement procedure doing something weird, like working with data or instruction caches that aren't properly cold or equally warmed up.


Did you add output of the results to the C code?


I did not, but I confirmed with objdump that my compiler was not removing the code.

(But to be sure, I just ran it again with an output and got the same value.)


No optimization flags could be a big part of the reason.

Haven't looked closely at the code or tried it, but with -O3, -fopenmp and a well-placed pragma the performance could increase many-fold.

Heck, with NVC++ you could offload that thing to a GPU with minimal effort and have it flying at the memory bandwidth limit.


This could actually be nice for code golf.


The lto setting controls the -C lto flag, which enables LLVM's link-time optimizations. LTO can produce better optimized code, using whole-program analysis, at the cost of longer linking time.

The valid options are:

    false: Performs “thin local LTO” which performs “thin” LTO on the local crate only across its codegen units. No LTO is performed if codegen units is 1 or opt-level is 0.
    true or "fat": Performs “fat” LTO which attempts to perform optimizations across all crates within the dependency graph.
    "thin": Performs “thin” LTO. This is similar to “fat”, but takes substantially less time to run while still achieving performance gains similar to “fat”.
    "off": Disables LTO.
Basically, it increases performance at the cost of compile time. Great for release builds, not so much for debug builds (since most of the time rustc spends there is on linking anyway).


Also, codegen-units = 1 disables some compiler parallelization which means it can find more optimizations, including for size: https://nnethercote.github.io/perf-book/build-configuration....
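Put together, a release profile using both knobs might look like this (illustrative values, tune to taste):

```toml
# Cargo.toml - trade compile time for runtime performance
[profile.release]
lto = "fat"        # whole-program LTO across all crates in the graph
codegen-units = 1  # single codegen unit: slower builds, more optimization
```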


