I see this as the perfect moment to get into consulting - either development or security. People weren't sure what jobs AI would create; "GenAI babysitting" is one of them.
"Make one Ubuntu package 90% faster by rebuilding it and switching the memory allocator"
i wish i could slap people in the face over standard tcp/ip for clickbait. it was ONE package and some gains were not realized by recompilation.
i have to give it to him, i have preloaded jemalloc into one program to swap the malloc implementation, and the results have been very pleasant. not in terms of performance (did not measure) but in stabilizing said application's memory usage. it actually fixed a problem that appeared to be a memory leak, but probably wasn't the fault of the app itself (likely memory fragmentation with standard malloc)
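for anyone curious, the preload trick is just an env var, roughly like this (the library path is the usual debian/ubuntu location from the libjemalloc2 package, yours may differ):

    # run a program with jemalloc instead of glibc malloc, no rebuild needed
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./your-app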
I did research into the glibc memory allocator. Turns out this is not memory fragmentation, but per-thread caches that are never freed back to the kernel! A free() call does not actually return memory to the OS except in exceptional circumstances. The more threads and CPU cores you have, the worse this problem becomes.
One easy solution is setting the "magic" environment variable MALLOC_ARENA_MAX=2, which limits the number of caches.
Another solution is having the application call malloc_trim() regularly, which purges the caches. But this requires application source changes.
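If you want to try the environment-variable route, it looks roughly like this (the service name below is a placeholder):

    # one-off run with at most two malloc arenas
    MALLOC_ARENA_MAX=2 ./your-server

    # or persistently for a systemd service
    sudo systemctl edit your-server.service
    # then add in the editor:
    #   [Service]
    #   Environment=MALLOC_ARENA_MAX=2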
FWIW i ran into this with icinga2. so now they actually preload jemalloc in the service file to mitigate the issue; this may very well be what you're talking about
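if you want to replicate that mitigation for some other service yourself, a drop-in along these lines should work (the jemalloc path is the usual debian/ubuntu one, check yours with `dpkg -L libjemalloc2`):

    sudo systemctl edit some-daemon.service
    # then add in the editor:
    #   [Service]
    #   Environment="LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"
    sudo systemctl restart some-daemon.service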
True, I also believed it for a second. But it's also easy to blame Ubuntu for errors. IMHO they are doing quite a decent job with assembling their packages. In fact they are also compiled with stack fortification. On the other hand I'm glad they are not compiled with the possibly buggy -O3. It can be nice for something performance-critical, but I definitely don't want a whole system compiled with -O3.
To me it's obviously a scam, because there's no way such an improvement can be achieved globally with the explanation given in a single post. 90% faster is a micro-benchmark number.
This is neither a micro-benchmark nor a scam, but it is click-bait by not mentioning jq specifically.
Micro-benchmarks would be testing e.g. a single library function or syscall rather than the whole application. This is the whole application, just not one you might care that much for the performance of.
Other applications will of course see different results, but stuff like enabling LTO, tuning THP and picking a suitable allocator are good, universal recommendations.
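If anyone wants to experiment, a rough sketch of those three knobs (the flags and paths are illustrative, and the THP setting is system-wide and resets on reboot):

    # LTO plus tuning for the local CPU
    export CFLAGS="-O2 -flto -march=native"
    export LDFLAGS="-flto"
    ./configure && make -j"$(nproc)"

    # let programs opt into transparent huge pages via madvise()
    echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

    # allocator swap: see the LD_PRELOAD jemalloc example upthread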
True that, I mean it is still interesting that, if you have a narrow task, you might achieve a significant speed-up from rebuilding the packages involved. But this is a very niche application.
true, i saw a thread recently on reddit where a guy hand-tuned compilation flags and did pgo profiling for a video encoder app that he uses on a video encoding farm.
In his case, even a gain of ~20% was significant: it translated into enough extra capacity to encode a few thousand more video files per year.
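For reference, the basic GCC PGO loop looks roughly like this (encoder.c and the sample clip are placeholders; a real encoder goes through its own build system, but the flag sequence is the same):

    # 1. instrumented build
    gcc -O3 -march=native -fprofile-generate encoder.c -o encoder
    # 2. run a representative workload; this writes *.gcda profile data
    ./encoder sample-clip.y4m /dev/null
    # 3. rebuild using the collected profile
    gcc -O3 -march=native -fprofile-use -fprofile-correction encoder.c -o encoder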
I wonder how many prepackaged binary distributions are built with the safest options for the OS/hardware and so don't achieve the best possible performance.
I bet most of them, tbh.
Many years ago I started building Mozilla and my own linux kernels to my preferences, usually realizing modest performance gains.
The entire purpose of the Gentoo Linux distribution, for example, is the performance gain made possible by optimized compilation of everything from source.
the title is clickbait, but it's good to encourage app developers to rebuild, esp when you are cpu-bound on a few common utilities, e.g. jq, grep, ffmpeg, ocrmypdf -- common unix utils are built for general use rather than for a specific application
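fwiw rebuilding a debian/ubuntu package with your own flags is pretty quick, something like this (assumes deb-src entries are enabled and that the package's build honours dpkg-buildflags; jq as the example):

    sudo apt-get build-dep jq
    apt-get source jq
    cd jq-*/
    DEB_CFLAGS_APPEND="-O3 -march=native -flto" \
    DEB_LDFLAGS_APPEND="-flto" \
      dpkg-buildpackage -b -us -uc
    sudo dpkg -i ../*.deb   # install the rebuilt packages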
Or, if I understand TFA correctly, don't release debug builds in your release packages.
Reminds me of back in the day, when I was messing around with blender's cmake config files quite a bit, I noticed the fedora package was using the wrong flag -- some sort of debug-only flag intended for developers instead of whatever they thought it was. I mentioned this to the package maintainer, it was confirmed by a package sub-maintainer (or whomever), and the maintainer absolutely refused to change it because the spelling of the two flags was close enough that they could just say "go away, contributing blender dev, you have no idea what you're talking about." Wouldn't doubt the fedora package still has the same mistaken flag to this day, and all this occurred something like 15 years ago.
So, yeah, don't release debug builds if you're a distro package maintainer.
Vector operations like AVX-512 will not magically make common software faster. The number of applications that deal with regular operations on large blocks of data is pretty much limited to graphical applications, neural networks and bulk cryptographic operations. Even audio processing doesn't benefit that much from vector operations, because a codec's variable-size packets do not allow for efficient vectorization (the main exception being multi-channel effects processing as used in DAWs).
Thanks for the correction. I hadn't considered bulk memory operations to be part of SIMD, but it makes sense -- they operate at a larger grain than word size, so they can do the same operation with less micro-op overhead.
in case you don't know, some Game Boy games were required to have the Nintendo logo in the game data as part of copy protection. allegedly that was a legal protection against bootlegs.
"Description In Zip" has the typical clunkiness of a backronym and the 180° theory suffers from the fact that filenames (as lloeki rightfully pointed out) were usually presented in all uppercase (even when lowercase was available) in that era.
Phonetically shortening stuff on the other hand was almost a requirement in the scene, even if you had the space.
EDIT: Thinking about it, "identify this" is well in line with "read me".
> Compact tape drive. which uses 1/4-in. magnetic tape cassettes operating at 80 ips, each containing every street and specific address, for an area about twice that of an ordinary paper street map, as well as overviews of major state and regional roads, and national interstates (installed under the vehicle dashboard or in the glove compartment).
So it seems that, if it's not 80 inches per second, then the confusion dates back at least to 1985!
"The actual production of Musicassettes was done on machines running 32 times faster than normal playback. Cassette tape would be reeled over four heads recording what would be both sides at once at 60 IPS. The master tape that was source of the original music had been recorded at 7.5 IPS and this would also run 32 times faster, clocking up a playback speed of 240 IPS for duplication purposes."
"This super-fast tape transport also required the circuitry to follow suit. So instead of the bias frequency being around 80kHz, it was now 2.4MHz; the amplifiers also needed to work over a frequency range of 200kHz to 500kHz."
A Commodore 64 with the best tape turbos can store slightly above 1 MB per cassette. Japan already had floppy drives capable of storing over 1 MB in 1983, but it looks like Etak needed more, with this roughly 40-times-faster tape drive delivering:
"local map data base stored on a 3.5-MByte tape cassette."
I would love to learn more details about this drive. Modulation used? Number of tracks? Format? A magnetic flux dump of one of the cassettes would be a lovely puzzle to decode.