Hacker News new | comments | show | ask | jobs | submit login
KestrelHttpServer SuperCharged MemoryPoolIterator (github.com)
95 points by philliphaydon 214 days ago | hide | past | web | 19 comments | favorite



Ben has been once of the main external contributors to Kestrel and is a big reason of why it's so fast. Great work! I have no idea how he manages to do all of this and write a game at the same time.

https://www.techempower.com/blog/2016/11/16/framework-benchm...

https://www.techempower.com/benchmarks/#section=data-r13&hw=...


quote:

"benaadams commented on 3 Oct • edited I couldn't believe there wasn't a number in the 2^64 space that did this. However, after exhaustive internet searching I couldn't find one. So I went to sleep, woke up in the middle of the night with a headache and idea of a method to try and actually derived it.

Having derived it, I did some searching with the search engines and couldn't find any hits either in its decimal or hexadecimal from so called it Ben's Magic Number

I will write a blog post on the method I used."

Then:

"I tried to come up with a similar magic number that would allow to obtain the index after shifting its bits, and found a different one. Wouldn't this number just do the same and be a bit easier to understand? 0x01020304050607 We know that after doing x=(v & (-v)) we get a power of 2, which when multiplied with the magic number will shift its bits by a multiple of 8 positions: So after being multiplied by x, 0x0001020304050607 is turned into

0x0304050607000000 when shifted by a 3-byte offset 0x0405060700000000 when shifted by a 4-byte offset 0x0506070000000000 when shifted by a 5-byte offset and so on... We then simply have to read the 1st byte of the resulting number to find the offset, ie: offset = (((v & -v) * Magic) >> 56) & 0x7"

Followed by:

"benaadams commented on 4 Oct • edited @nicodeslandes Congats! That is a lot more obvious and what is happening; and drops a shift. Already superseded; that's the power of OSS collaboration!"

The comments here are very nice and worth a read.


If anyone wonders like I did what markdown magic could've made the collapsible "Details" sections in the second comment, it's actually standard HTML5 - https://developer.mozilla.org/en-US/docs/Web/HTML/Element/de...


How do you use github flavoured markdown to embed <details> tags - surely you can't just use the html tags? Isn't that the whole point of markdown - to limit users html input so there aren't any security escalations?


It is not the point of markdown, in fact any html tag is valid in markdown if you read the original description [1].

Github does strip some tags for security reasons like <script>. Some of the parsers I have seen offer this feature.

1: https://daringfireball.net/projects/markdown/syntax#html


Thank you, I was really wondering if github is collapsing too long code snippets on mobile.


It seems like kestrel part of .net core got some amazing performance improvements contributions from the open.I see the performance of kestrel is much better than any versions of IIS +ISAPI or IIS7 + Asp.net modules/handlers ever produced.May be this is partly to do with how simple the middleware (just a function/method it is ).But the request parsing logic got really well and I see that kestrel could hit 5 Million RPS disucssed on this talk[1]( compared to 50K of old asp.net) . Some crazy optimizations and benchmarks are discussed in that video like static byte arrays,memory pools,custom awaiter,bit manipulations for string comparisons etc

Kestrel will be one of the best when it comes to benchmark

[1] https://vimeo.com/172009499

[edit] added video url


Some of the contributions have been amazing. There's some exceptionally smart people in the community that are contributing for the greater good of the community.

I think it's one of the best decisions MS has made, to go open source.


Here are the latest benchmarks that the ASP.NET Core team is running to test their stack: https://github.com/aspnet/benchmarks


TIL: .NET doesn't expose bitscan instructions.


From how I read this comment, it's not exposed on x86 because it's extremely slow on Atom:

https://github.com/dotnet/coreclr/blob/cb9cbef5cd036bb761720...


When I needed these instructions, I created this https://github.com/omgtehlion/netintrinsics

Guys in OP seem to be more clever, though


bitscan would change 3 instructions (xor, mul, -1) with one - wouldn't be a huge difference; in this particular scenario.


Can somebody ELI5 what is the objective and how it has been accomplished?


Parse a full HTTP request faster. Method, Path, Version, Headers, start of data.

How its been done is more tricky. Lots of iterative development.

Vectorization, bit twiddling, fast-path inlines, analysing the output asm from the compiler, benchmarking, repeat...


Thanks! This gives a general picture.

Can you also explain the bit twiddling bit in more detail? What operation it makes faster and how it falls into a larger picture?


Roughly speaking, find index of most significant set bit. (So 255 gives you 7, 254 gives you 7, 256 gives you 8, etc.) At a guess, they are comparing things 8/16 bytes at a time, and getting a packed set of byte flags where bytes are set to 0xff to indicate inequality (so, imagining it 4 bytes at a time, comparing 0x12345678 and 0x12125656 would give you 0x00ff00ff) and they want to see where the first different byte is. In this example, the index is 24. (Then: 4-24/8=1. So the first different byte is at offset 1.)

As to how it works: you use the v&-v trick to clear all but the top bit of your comparison result. Because you've only got 1 bit set, this value is a power of two; if you multiply another value by this one, you're getting the equivalent of a left shift by that power (which is relevant - you need to be thinking in shifts rather than multiplies).

Because of the way the predicate results are formed, you only have 5 possible values you're going to get here: 2^31, 2^23, 2^15, 2^7, or 0. The outputs these correspond to:

    31 -> 0
    23 -> 1
    15 -> 2
    7  -> 3
    0  -> (special case - all identical)
So you need a value that when shifted left 31 bits can be easily turned into 0, when shifted left 23 can be turned into 1, and so on.

A suitable value for this is something like 3<<22|2<<14|1<<6. (If I've made a mistake there, which I probably have, hopefully the working will make it obvious how to fix it.) Recalling that multiplying by 2^N gives you a left shift by N:

When multiplied by 2^31, you get 0.

When multiplied by 2^23, you get 1<<29.

When multiplied by 2^15, you get 2<<29|1<<21.

When multiplied by 2^7, you get 3<<29|2<<21|1<<13.

In all cases, you can shift the result right 29 bits (unsigned shift, or signed shift and mask with 3 afterwards) and get the value you're looking for: 0, 1, 2, or 3.

Zero you can check for before doing the multiply; or you could pop a 4 in at the bottom or something so that the result would be 4 in that case. (This is what I figured you'd want originally and that's why the result starts at bit 29. But actually I guess you'd probably check first.)

This extends naturally to wider comparisons. I think they're doing it 64 bits at a time in this case.

Exercise for the reader: (maybe these are too easy?? - however they weren't instantly obvious to me)

- Why is x * 2^N equivalent to x<<N?

- What if you have more than 1 bit set? Like x * (2^N+2^M)?

- Figure out some rules for generating useful constants like these


Highly recommend the book "Hacker's Delight" by Henry S. Warren, Jr for more bit hax0ring like this.





Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: