Are you aware of any side channels introduced by compiler optimizations that weren't present in unoptimized code? Recompiling with -O0 instead of -O2 doesn't seem likely to fix your PKCS padding bug. Or your HMAC strcmp.
Interesting. Looks like it affects all versions of LLVM (at least those that compile).
I guess with LTO the result could be the same, but asking to always inline seems like asking the compiler to fuse operations, and then it sees shortcuts. Dunno.
`always_inline` is only there to make -O0 smaller for presentation purposes. It has no impact on -O2.
The trick here is that -march=i386 targets a CPU that predates CMOV, and that LLVM specializes code emission for bool (and _Bool). If the secret bit were a uint32_t, there wouldn't be a branch anymore.
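To make the bool vs. uint32_t distinction concrete, here is a rough sketch (not the snippet from the thread; the function names `select_bool` and `select_mask` are made up for illustration). The first version hands the compiler a bool, which it is free to lower to a branch on a target without CMOV. The second keeps the secret bit as a uint32_t and selects with a mask. Nothing in the C standard guarantees either one compiles branch-free, so check the generated assembly for your target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example: select on a secret bool.
 * The compiler may turn this into a conditional jump,
 * especially when the target has no CMOV. */
uint32_t select_bool(bool secret, uint32_t a, uint32_t b) {
    return secret ? a : b;
}

/* Hypothetical example: select on a secret bit held in a uint32_t.
 * Assumes bit is exactly 0 or 1. */
uint32_t select_mask(uint32_t bit, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)0 - bit;   /* 0x00000000 or 0xFFFFFFFF */
    return (a & mask) | (b & ~mask);     /* a if bit == 1, else b */
}
```

Both functions return the same values for the same inputs; the only difference is what the optimizer is allowed to assume about the secret, which is why the output needs to be inspected per compiler version and per -march.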
There is one class of cryptographic code, however, that is entirely unsuitable to distribute in Bitcode---DPA/EM-protected code. EM attacks on middle-end ARM chips have been demonstrated recently [1, 2].
Protecting against these attacks usually involves splitting the computation into 2 or more "shares" (see, for example, [3]); these require strict control of which register each word goes into, and which registers overwrite which. This cannot be enforced in Bitcode---or any other bytecode, for that matter---and direct assembly must be used.
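For anyone unfamiliar with "shares", a minimal sketch of two-share Boolean masking follows. This is only the arithmetic idea, not the scheme from [3]; the type `shared_u32` and the function names are invented for illustration, and the random word `r` is assumed to come from a good RNG. Note that it is written in C precisely to show the limitation: nothing here controls which registers hold `s0` and `s1` or what overwrites them, which is the part that needs hand-written assembly.

```c
#include <stdint.h>

/* Hypothetical type: a value x represented as two shares with x == s0 ^ s1. */
typedef struct {
    uint32_t s0;
    uint32_t s1;
} shared_u32;

/* Split x into two shares. r is assumed to be fresh, uniformly random. */
shared_u32 share_split(uint32_t x, uint32_t r) {
    shared_u32 out = { r, x ^ r };
    return out;
}

/* XOR two shared values share by share, without ever recombining them. */
shared_u32 share_xor(shared_u32 a, shared_u32 b) {
    shared_u32 out = { a.s0 ^ b.s0, a.s1 ^ b.s1 };
    return out;
}

/* Recombine the shares. Only done at the very end, if at all. */
uint32_t share_combine(shared_u32 v) {
    return v.s0 ^ v.s1;
}
```

The compiler is free to keep both shares of a value in registers that get combined, spilled next to each other, or overwritten one by the other, any of which can leak the unmasked value through power or EM. That is why the same logic has to be scheduled by hand for a protected implementation.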