
KestrelHttpServer SuperCharged MemoryPoolIterator - philliphaydon
https://github.com/aspnet/KestrelHttpServer/pull/1138
======
jsingleton
Ben has been once of the main external contributors to Kestrel and is a big
reason of why it's so fast. Great work! I have no idea how he manages to do
all of this and write a game at the same time.

[https://www.techempower.com/blog/2016/11/16/framework-
benchm...](https://www.techempower.com/blog/2016/11/16/framework-benchmarks-
round-13/)

[https://www.techempower.com/benchmarks/#section=data-r13&hw=...](https://www.techempower.com/benchmarks/#section=data-r13&hw=ph&test=plaintext)

------
NKCSS
quote:

"benaadams commented on 3 Oct • edited I couldn't believe there wasn't a
number in the 2^64 space that did this. However, after exhaustive internet
searching I couldn't find one. So I went to sleep, woke up in the middle of
the night with a headache and idea of a method to try and actually derived it.

Having derived it, I did some searching with the search engines and couldn't
find any hits either in its decimal or hexadecimal from so called it Ben's
Magic Number

I will write a blog post on the method I used."

Then:

"I tried to come up with a similar magic number that would allow to obtain the
index after shifting its bits, and found a different one. Wouldn't this number
just do the same and be a bit easier to understand? 0x01020304050607 We know
that after doing x=(v & (-v)) we get a power of 2, which when multiplied with
the magic number will shift its bits by a multiple of 8 positions: So after
being multiplied by x, 0x0001020304050607 is turned into

0x0304050607000000 when shifted by a 3-byte offset 0x0405060700000000 when
shifted by a 4-byte offset 0x0506070000000000 when shifted by a 5-byte offset
and so on... We then simply have to read the 1st byte of the resulting number
to find the offset, ie: offset = (((v & -v) * Magic) >> 56) & 0x7"

Followed by:

"benaadams commented on 4 Oct • edited @nicodeslandes Congats! That is a lot
more obvious and what is happening; and drops a shift. Already superseded;
that's the power of OSS collaboration!"

The comments here are very nice and worth a read.

------
Arnavion
If anyone wonders like I did what markdown magic could've made the collapsible
"Details" sections in the second comment, it's actually standard HTML5 -
[https://developer.mozilla.org/en-
US/docs/Web/HTML/Element/de...](https://developer.mozilla.org/en-
US/docs/Web/HTML/Element/details)

~~~
bhrgunatha
How do you use github flavoured markdown to embed <details> tags - surely you
can't just use the html tags? Isn't that the whole point of markdown - to
limit users html input so there aren't any security escalations?

~~~
jfroma
It is not the point of markdown, in fact any html tag is valid in markdown if
you read the original description [1].

Github does strip some tags for security reasons like <script>. Some of the
parsers I have seen offer this feature.

1:
[https://daringfireball.net/projects/markdown/syntax#html](https://daringfireball.net/projects/markdown/syntax#html)

------
hitr
It seems like kestrel part of .net core got some amazing performance
improvements contributions from the open.I see the performance of kestrel is
much better than any versions of IIS +ISAPI or IIS7 + Asp.net modules/handlers
ever produced.May be this is partly to do with how simple the middleware (just
a function/method it is ).But the request parsing logic got really well and I
see that kestrel could hit 5 Million RPS disucssed on this talk[1]( compared
to 50K of old asp.net) . Some crazy optimizations and benchmarks are discussed
in that video like static byte arrays,memory pools,custom awaiter,bit
manipulations for string comparisons etc

Kestrel will be one of the best when it comes to benchmark

[1] [https://vimeo.com/172009499](https://vimeo.com/172009499)

[edit] added video url

~~~
philliphaydon
Some of the contributions have been amazing. There's some exceptionally smart
people in the community that are contributing for the greater good of the
community.

I think it's one of the best decisions MS has made, to go open source.

------
pkhuong
TIL: .NET doesn't expose bitscan instructions.

~~~
jongalloway2
From how I read this comment, it's not exposed on x86 because it's extremely
slow on Atom:

[https://github.com/dotnet/coreclr/blob/cb9cbef5cd036bb761720...](https://github.com/dotnet/coreclr/blob/cb9cbef5cd036bb7617202262908f358a46b47f8/src/inc/bitposition.h#L47)

------
eveningcoffee
Can somebody ELI5 what is the objective and how it has been accomplished?

~~~
benaadams
Parse a full HTTP request faster. Method, Path, Version, Headers, start of
data.

How its been done is more tricky. Lots of iterative development.

Vectorization, bit twiddling, fast-path inlines, analysing the output asm from
the compiler, benchmarking, repeat...

~~~
eveningcoffee
Thanks! This gives a general picture.

Can you also explain the bit twiddling bit in more detail? What operation it
makes faster and how it falls into a larger picture?

~~~
to3m
Roughly speaking, find index of most significant set bit. (So 255 gives you 7,
254 gives you 7, 256 gives you 8, etc.) At a guess, they are comparing things
8/16 bytes at a time, and getting a packed set of byte flags where bytes are
set to 0xff to indicate inequality (so, imagining it 4 bytes at a time,
comparing 0x12345678 and 0x12125656 would give you 0x00ff00ff) and they want
to see where the first different byte is. In this example, the index is 24.
(Then: 4-24/8=1. So the first different byte is at offset 1.)

As to how it works: you use the v&-v trick to clear all but the top bit of
your comparison result. Because you've only got 1 bit set, this value is a
power of two; if you multiply another value by this one, you're getting the
equivalent of a left shift by that power (which is relevant - you need to be
thinking in shifts rather than multiplies).

Because of the way the predicate results are formed, you only have 5 possible
values you're going to get here: 2^31, 2^23, 2^15, 2^7, or 0. The outputs
these correspond to:

    
    
        31 -> 0
        23 -> 1
        15 -> 2
        7  -> 3
        0  -> (special case - all identical)
    

So you need a value that when shifted left 31 bits can be easily turned into
0, when shifted left 23 can be turned into 1, and so on.

A suitable value for this is something like 3<<22|2<<14|1<<6\. (If I've made a
mistake there, which I probably have, hopefully the working will make it
obvious how to fix it.) Recalling that multiplying by 2^N gives you a left
shift by N:

When multiplied by 2^31, you get 0.

When multiplied by 2^23, you get 1<<29.

When multiplied by 2^15, you get 2<<29|1<<21.

When multiplied by 2^7, you get 3<<29|2<<21|1<<13.

In all cases, you can shift the result right 29 bits (unsigned shift, or
signed shift and mask with 3 afterwards) and get the value you're looking for:
0, 1, 2, or 3.

Zero you can check for before doing the multiply; or you could pop a 4 in at
the bottom or something so that the result would be 4 in that case. (This is
what I figured you'd want originally and that's why the result starts at bit
29. But actually I guess you'd probably check first.)

This extends naturally to wider comparisons. I think they're doing it 64 bits
at a time in this case.

Exercise for the reader: (maybe these are too easy?? - however they weren't
instantly obvious to me)

\- Why is x * 2^N equivalent to x<<N?

\- What if you have more than 1 bit set? Like x * (2^N+2^M)?

\- Figure out some rules for generating useful constants like these

~~~
al2o3cr
Highly recommend the book "Hacker's Delight" by Henry S. Warren, Jr for more
bit hax0ring like this.

~~~
naasking
Or check this reference:

[https://graphics.stanford.edu/~seander/bithacks.html](https://graphics.stanford.edu/~seander/bithacks.html)

