"benaadams commented on 3 Oct • edited
I couldn't believe there wasn't a number in the 2^64 space that did this. However, after exhaustive internet searching I couldn't find one. So I went to sleep, woke up in the middle of the night with a headache and idea of a method to try and actually derived it.
Having derived it, I did some searching with the search engines and couldn't find any hits either in its decimal or hexadecimal from so called it Ben's Magic Number
I will write a blog post on the method I used."
"I tried to come up with a similar magic number that would allow to obtain the index after shifting its bits, and found a different one.
Wouldn't this number just do the same and be a bit easier to understand? 0x01020304050607
We know that after doing x=(v & (-v)) we get a power of 2, which when multiplied with the magic number will shift its bits by a multiple of 8 positions:
So after being multiplied by x, 0x0001020304050607 is turned into
0x0304050607000000 when shifted by a 3-byte offset
0x0405060700000000 when shifted by a 4-byte offset
0x0506070000000000 when shifted by a 5-byte offset
and so on...
We then simply have to read the 1st byte of the resulting number to find the offset, ie:
offset = (((v & -v) * Magic) >> 56) & 0x7"
"benaadams commented on 4 Oct • edited
@nicodeslandes Congats! That is a lot more obvious and what is happening; and drops a shift. Already superseded; that's the power of OSS collaboration!"
The comments here are very nice and worth a read.
Github does strip some tags for security reasons like <script>. Some of the parsers I have seen offer this feature.
Kestrel will be one of the best when it comes to benchmark
 added video url
I think it's one of the best decisions MS has made, to go open source.
Guys in OP seem to be more clever, though
How its been done is more tricky. Lots of iterative development.
Vectorization, bit twiddling, fast-path inlines, analysing the output asm from the compiler, benchmarking, repeat...
Can you also explain the bit twiddling bit in more detail? What operation it makes faster and how it falls into a larger picture?
As to how it works: you use the v&-v trick to clear all but the top bit of your comparison result. Because you've only got 1 bit set, this value is a power of two; if you multiply another value by this one, you're getting the equivalent of a left shift by that power (which is relevant - you need to be thinking in shifts rather than multiplies).
Because of the way the predicate results are formed, you only have 5 possible values you're going to get here: 2^31, 2^23, 2^15, 2^7, or 0. The outputs these correspond to:
31 -> 0
23 -> 1
15 -> 2
7 -> 3
0 -> (special case - all identical)
A suitable value for this is something like 3<<22|2<<14|1<<6. (If I've made a mistake there, which I probably have, hopefully the working will make it obvious how to fix it.) Recalling that multiplying by 2^N gives you a left shift by N:
When multiplied by 2^31, you get 0.
When multiplied by 2^23, you get 1<<29.
When multiplied by 2^15, you get 2<<29|1<<21.
When multiplied by 2^7, you get 3<<29|2<<21|1<<13.
In all cases, you can shift the result right 29 bits (unsigned shift, or signed shift and mask with 3 afterwards) and get the value you're looking for: 0, 1, 2, or 3.
Zero you can check for before doing the multiply; or you could pop a 4 in at the bottom or something so that the result would be 4 in that case. (This is what I figured you'd want originally and that's why the result starts at bit 29. But actually I guess you'd probably check first.)
This extends naturally to wider comparisons. I think they're doing it 64 bits at a time in this case.
Exercise for the reader: (maybe these are too easy?? - however they weren't instantly obvious to me)
- Why is x * 2^N equivalent to x<<N?
- What if you have more than 1 bit set? Like x * (2^N+2^M)?
- Figure out some rules for generating useful constants like these