0x3030303030303030 -- eight ASCII '0' bytes
0x8040201008040201 -- the magic multiplier that spreads the bits out
0x0101010101010101 -- the mask that leaves only a 0 or 1 to be added to the corresponding array element
Therefore, more readable:
void to_bin(unsigned char c, char *out) {
    /* endian dependent, works on x86 (little endian) */
    *(unsigned long long*)out =
        0x3030303030303030ULL +                  /* eight ASCII '0's */
        (((c * 0x8040201008040201ULL) >> 7)      /* align bit (7-k) of c with the low bit of byte k */
         & 0x0101010101010101ULL);               /* each byte keeps only that 0 or 1 */
}
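For illustration (my addition; the buffer and test value are arbitrary), a minimal caller for the function above:

    #include <stdio.h>

    int main(void) {
        char buf[9] = {0};      /* 8 digits + NUL terminator */
        to_bin(0xA5, buf);      /* 0xA5 = 1010 0101 */
        printf("%s\n", buf);    /* prints 10100101 on a little-endian machine */
        return 0;
    }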
The function is of course endian dependent (I'd add a comment about that fact in it) and the explanation of the quoted function on the page starts at "If you shift the constant by 7 bits to the right." The page is the result of an edit where the older function (the one that ends with + c / 128;) was explained first. Still, nice.
the last thing I did was changing the constants to decimal - they look way more scary this way ;>
because, you know, programming should be innately difficult and confusing so we can laugh at others who understand less than we do.
Never underestimate how many times code (any code) posted online will be copy/pasted throughout the world everywhere from school projects to production systems.
Cool hacks seem to be lightning rods for the humorless. There are entire contests dedicated to obfuscating code! This was a neat, unintuitive solution that he revealed in steps, with a surprise ending. As an essay it worked perfectly.
The whole thing is whimsy. You'd never want to use this approach at all in a production system. Using decimal constants instead of hex is the least of the sins here.
There should be room in programming for whimsy. This stuff is fun, and play is healthy. If somebody takes this thing seriously and copies the code into a production system, that's their own fault, not the author's.
> You'd never want to use this approach at all in a production system
Actually, it's no less useful than the code that the compilers you depend on use all the time. They really do rely on such "bit twiddles." Guy Steele agrees too: read his foreword to Hacker's Delight, a whole book about such functions.
> I would not use this code without being very thoughtful and intentional about where it was being used.
Sure, definitely not for everybody and every occasion. It's platform-dependent code. See my reply to cnvogel and look at his implementation for an example.
My job is basically fixing the code and processes of corporate programmers who copy/paste examples from websites into production systems. There's little understanding of how the code works and almost no reading of related documentation. If I compiles, I ships.
It's like if bridge designers didn't really know mechanics and just copy/pasted Minecraft designs into the real world. Sure, it's not the fault of the Minecraft designers, but we can help kill fewer people by making better examples.
I'm sorry you have that job. Or, maybe congratulations if you're the consultant who comes in and makes bucket loads of money fixing other people's crap.
Either way, expecting People Who Know What They're Doing to shut up about it so that we can prevent People Who Don't Know What They're Doing from misunderstanding is shameful. The linked code wasn't published in a textbook, nor was it submitted as a Stack Overflow answer. You have no right to expect people to sanitize their writing just in case the wrong audience sees it.
Sorry if I sound prickly, but this sentiment makes my blood boil.
You're suggesting that something that doesn't happen (mechanical engineers + minecraft) is analogous to something that does happen (corporate programmers + code poetry).
Are you then suggesting that we outlaw Minecraft because there may be a brain-dead mechanical engineer alive?
What you're suggesting is akin to either outlawing essays like this or only allowing "Professional Engineers" (i.e., certification required) to write code. The only plausible option is the latter, since the former targets exactly the kind of essay a professional engineer writes when seeking intellectual stimulation, and that should be applauded.
I don't think that analogy really works. If bridge designers were doing this, there would be massive outcry and we'd put a stop to it. We allow it to happen with (terrible) programmers only because the harm they do is pretty limited, and nobody's getting killed because of it.
There are probably 100x more servers sitting in server farms running websites (and burning power) than are needed, because companies would rather hire 10 cheap Ruby or PHP programmers than one "expensive" Erlang/Elixir programmer. So there is some harm from bad programmers, even for non-critical websites.
Software obviously kills people sometimes. But I'm pretty sure that people who blindly copy/paste code from the internet into production systems don't.
This code was already difficult and confusing and the author obviously pokes fun at his own code by decimalizing the constants. No need to accuse him of elitist or immoral behaviour.
It's a good and common practice to keep the constants in the sanest form and not obfuscate them; see collections of such functions, like the ones already quoted here.
Sure, sure, in most contexts code as a communication medium is laudable. Here, the game is to make people accustomed to understanding go "what the hell just happened?" when it works. I like it.
You mean the whole article is a joke just because the article ends with "the last thing I did was changing the constants to decimal - they look way more scary this way"?
I see it as a presentation of an efficient specialized function, just like the other bit hacks (some already linked; one more, a whole book, is here: http://www.hackersdelight.org/ ).
If it's meant to impress the uninitiated, they are more likely not to know the hex codes either; for them the decimal number is the less scary one, not the hex. For everyone else, converting back to hex is just a tiny annoyance, nothing more.
But it's not efficient. The author even explicitly says so at the beginning:
> (Disclaimer: It's not faster. It's not better. It's not production-quality code. It's just is and it was fun to make ;>)
It's a joke, a joke with some context, that you tell other people who know the context. Would you accuse people of intentionally obfuscating things when they riff on "the door is ajar" joke?
No, on modern 64-bit processors it's definitely more efficient and faster than the loop variant. It's measurable. The author's disclaimer that it isn't faster is actually valid only for old or small processors.
Do you post similar rants to mechanical engineers building Rube Goldberg machines and electrical engineers building impractical, extremely high-voltage transformers?
Can you imagine that there are people who actually program at such a low level that they genuinely care about this kind of function, the constants, and the number of T-states the resulting assembly code takes? I'm one of them. If these aspects aren't relevant to you, please allow me to address them anyway.
The article's author actually engineered an effective function for spreading bits out into bytes. I invite you to read my other comments here and what's in them.
Why isn't this faster than the "standard" loop-based version? The disclaimer at the start says it's not, but I would have thought that four 64-bit math operations would be much faster than a loop with eight steps, comparisons, and so on.
First and foremost: YES, I KNOW! This is completely senseless microbenchmarking and micro-optimization. I ONLY LOOK AT THIS FOR FUN.
That being said: the 64-bit multiplication is faster, and also constant in time. I've measured this using the timestamp counter, which doesn't necessarily count actual CPU clock cycles but possibly an integer multiple of them (it seems to always output numbers divisible by 8 for me).
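For reference, a minimal sketch of that kind of TSC measurement (my own harness, not the commenter's; assumes GCC or Clang on x86-64, and to_bin_loop is a hypothetical name for the naive variant):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>                         /* __rdtsc() on GCC/Clang */

    static void to_bin_loop(unsigned char c, char *out) {
        for (int i = 0; i < 8; i++)
            out[i] = '0' + ((c >> (7 - i)) & 1);   /* MSB first, like to_bin */
    }

    int main(void) {
        char buf[9] = {0};
        volatile unsigned char c = 0xA5;           /* volatile blocks constant folding */
        enum { N = 1000000 };

        uint64_t t0 = __rdtsc();
        for (int i = 0; i < N; i++)
            to_bin_loop(c, buf);
        uint64_t t1 = __rdtsc();

        printf("%s: ~%llu TSC units/call\n", buf,
               (unsigned long long)(t1 - t0) / N);
        return 0;
    }

A serious measurement would also need warm-up, serialization (cpuid/lfence), and a check against dead-code elimination; this only shows the mechanics.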
/* anyone have a big-endian machine at hand to test this? */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define MULT 0x8040201008040201ULL
#else
#define MULT 0x0102040810204080ULL
#endif
Even without such a machine, I believe that this big-endian implementation can't work. Hint: you can try it on a little-endian machine too; the resulting bits must come out right, and I believe they wouldn't. With the reversed constant the shifted copies of the byte are only 7 bits apart, so they overlap, and carries from one copy can leak into the very bit positions being extracted. That's the beauty of the magic constants: their construction isn't as easy as it appears to be.
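To make that concrete: for a big-endian store to print MSB-first, byte k of the masked product would have to hold bit k of c. A quick byte-order-independent check (my own sketch, not from the thread):

    #include <stdio.h>

    int main(void) {
        const unsigned long long M = 0x0102040810204080ULL;
        for (int c = 0; c < 256; c++) {
            unsigned long long spread =
                ((unsigned long long)c * M >> 7) & 0x0101010101010101ULL;
            for (int k = 0; k < 8; k++) {
                int got  = (int)((spread >> (8 * k)) & 1);
                int want = (c >> k) & 1;   /* byte k should hold bit k */
                if (got != want) {
                    /* e.g. c = 0x81: bits 0 and 7 collide and the carry
                       corrupts the next extracted bit */
                    printf("fails at c=0x%02X, byte %d\n", c, k);
                    return 1;
                }
            }
        }
        puts("works for all bytes");
        return 0;
    }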
Here's a fancier, slightly less-magic, version. How to compile is left as an exercise for the reader (hint: Haswell or newer, requires BMI2 instruction set):
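As a sketch of what such a version might look like, using _pdep_u64 (GCC/Clang builtins; the function name and exact shape here are my reconstruction, not necessarily the original):

    #include <immintrin.h>   /* _pdep_u64, needs -mbmi2 (Haswell or newer) */
    #include <string.h>

    void to_bin_pdep(unsigned char c, char *out) {
        /* deposit bit k of c into the low bit of byte k */
        unsigned long long bits = _pdep_u64(c, 0x0101010101010101ULL);
        /* byte-swap so bit 7 lands in out[0], then OR in the ASCII '0's */
        unsigned long long ascii = __builtin_bswap64(bits) | 0x3030303030303030ULL;
        memcpy(out, &ascii, 8);   /* memcpy sidesteps alignment/aliasing issues */
    }

Compile with something like gcc -O2 -mbmi2. On chips without BMI2 this won't run, and on AMD parts before Zen 3 pdep is reportedly microcoded and slow, so it's not automatically a win.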
> x86 is little endian - that's why it's backwards
This can be read the wrong way, although it is correct: the author is referring to the final byte sequence, not the input bit sequence. Endianness applies to bytes, not bits.
Be careful with alignment. On architectures where it matters, the caller may pass in a "char* out" which is not suitably aligned, and doing an "unsigned long long" store into that address may fault.
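One conventional way around both the alignment and the strict-aliasing concerns is to build the value in a local and memcpy it out; compilers typically lower the fixed-size memcpy to a single store where that's legal. A sketch (function name mine):

    #include <string.h>

    void to_bin_safe(unsigned char c, char *out) {
        unsigned long long v = 0x3030303030303030ULL +
            (((c * 0x8040201008040201ULL) >> 7) & 0x0101010101010101ULL);
        memcpy(out, &v, sizeof v);   /* safe for any alignment; still little-endian only */
    }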