
Common Systems Programming Optimizations and Tricks - ptc
https://paulcavallaro.com/blog/common-systems-programming-optimizations-tricks/
======
ncmncm
A discussion of systems programming optimization that doesn't start with
single-writer ring buffers starts on the wrong foot.

Those other tricks are excellent, and I use all of them wherever they apply.
But finding a way not to need a lock at all, for example, should come before
devising a low-contention locking protocol.

Readers should note that packing extra data into the low bits of suitably
aligned pointers is more portable than using the high bits. Any page-aligned
pointer has at least 12 bits free at the bottom, and any malloc'd pointer has
at least 2, often 4 (16-byte alignment is typical on 64-bit platforms).
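
A minimal sketch of the low-bit trick, for readers who have not seen it. The
name `TaggedPtr` is made up for illustration, and the code assumes the usual
(implementation-defined, but near-universal) behavior that an aligned pointer
converts to a `uintptr_t` whose low bits are zero:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keeps a small tag in the low `Bits` bits of a T*.
// Assumes pointer <-> uintptr_t conversion preserves alignment zeros,
// which holds on mainstream platforms but is implementation-defined.
template <typename T, unsigned Bits>
class TaggedPtr {
    static_assert(alignof(T) >= (1u << Bits),
                  "T's alignment leaves too few free low bits");
    static constexpr std::uintptr_t kMask = (std::uintptr_t{1} << Bits) - 1;
    std::uintptr_t raw_ = 0;

public:
    TaggedPtr(T* p, unsigned tag) {
        const auto bits = reinterpret_cast<std::uintptr_t>(p);
        assert((bits & kMask) == 0);  // pointer must really be aligned
        assert(tag <= kMask);         // tag must fit in the spare bits
        raw_ = bits | tag;
    }

    // Mask the tag off before using the value as a pointer again.
    T* ptr() const { return reinterpret_cast<T*>(raw_ & ~kMask); }
    unsigned tag() const { return static_cast<unsigned>(raw_ & kMask); }
};
```

With `alignof(T) >= 4` this gives 2 tag bits in a word-sized value, at the
cost of one mask on every dereference.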

Ring buffer counters into a power-of-2 sized buffer can be incremented without
bound, enabling use of ordinary arithmetic on them (tail minus head is the
fill level, even after an unsigned counter wraps), with the high bits masked
off cheaply on each use.
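
Roughly what that looks like for a single-producer, single-consumer ring. This
is a sketch, not a tuned implementation; `SpscRing` and its member names are
invented here, and it assumes exactly one thread calls `push` and exactly one
calls `pop`:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical SPSC ring with free-running 32-bit counters.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
    static_assert(N <= (std::size_t{1} << 31), "N must fit the counters");
    static constexpr std::uint32_t kMask = static_cast<std::uint32_t>(N - 1);

    std::array<T, N> slots_{};
    std::atomic<std::uint32_t> head_{0};  // items popped so far, mod 2^32
    std::atomic<std::uint32_t> tail_{0};  // items pushed so far, mod 2^32

public:
    bool push(const T& v) {  // producer thread only
        const std::uint32_t tail = tail_.load(std::memory_order_relaxed);
        const std::uint32_t head = head_.load(std::memory_order_acquire);
        if (tail - head == N) return false;  // full; wraparound-safe
        slots_[tail & kMask] = v;            // mask only when indexing
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    bool pop(T& out) {  // consumer thread only
        const std::uint32_t head = head_.load(std::memory_order_relaxed);
        const std::uint32_t tail = tail_.load(std::memory_order_acquire);
        if (tail == head) return false;      // empty
        out = slots_[head & kMask];
        head_.store(head + 1, std::memory_order_release);
        return true;
    }
};
```

Because 2^32 is a multiple of any power-of-2 `N`, the masked index stays
consistent when a counter wraps, and full and empty are distinguishable
without sacrificing a slot.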

Probably the most neglected primitive data structure is the bitmapped set. A
`uint32_t` gives you a universe of 32 elements; a byte is enough for the days
of the week. The native popcount instruction is very valuable here; with GCC
and Clang it is usually spelled `__builtin_popcount` in source code. C++98's
`std::bitset` provides Standard, portable access to it through `count()`, and
C++20 offers `std::popcount` directly.
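
For the days-of-the-week case, a sketch might look like this. `Day` and
`DaySet` are names invented for the example; it counts members through
`std::bitset::count()` so it builds before C++20, and whether that compiles
down to a single popcount instruction depends on the compiler and target:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element type: each day is a bit position 0..6.
enum Day : unsigned { Mon, Tue, Wed, Thu, Fri, Sat, Sun };

// Hypothetical set of days stored in a single byte.
class DaySet {
    std::uint8_t bits_;
    explicit DaySet(std::uint8_t b) : bits_(b) {}

public:
    DaySet() : bits_(0) {}

    void insert(Day d) {
        assert(d <= Sun);
        bits_ = static_cast<std::uint8_t>(bits_ | (1u << d));
    }
    void erase(Day d) {
        assert(d <= Sun);
        bits_ = static_cast<std::uint8_t>(bits_ & ~(1u << d));
    }
    bool contains(Day d) const { return ((bits_ >> d) & 1u) != 0; }

    // Union and intersection are single bitwise operations.
    DaySet operator|(DaySet o) const {
        return DaySet(static_cast<std::uint8_t>(bits_ | o.bits_));
    }
    DaySet operator&(DaySet o) const {
        return DaySet(static_cast<std::uint8_t>(bits_ & o.bits_));
    }

    // Cardinality is a population count of the byte.
    std::size_t size() const { return std::bitset<8>(bits_).count(); }
};
```

With C++20 available, `size()` could instead return `std::popcount(bits_)`
from `<bit>`.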

