> Currently an allocation of the new VA area is done over busy list iteration until a suitable hole is found between two busy areas. Therefore each new allocation causes the list being grown.
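The scheme the quote describes, walking a list of busy areas until a big-enough hole appears between two of them, can be sketched in userspace C. This is my own illustrative model (names and simplifications mine, not the kernel's actual `mm/vmalloc.c` code):

```c
#include <stddef.h>

/* Sketch of the old first-fit scheme: walk a sorted, non-overlapping
 * list of busy [start, end) ranges until a hole of at least `size`
 * bytes is found.  Cost is O(N) in the number of busy areas, and the
 * list only grows as allocations accumulate. */
struct busy_area {
    unsigned long start, end;   /* [start, end), sorted by start */
    struct busy_area *next;
};

/* Return the base of the first suitable hole in [vstart, vend),
 * or 0 on failure. */
static unsigned long first_fit(struct busy_area *head,
                               unsigned long vstart, unsigned long vend,
                               unsigned long size)
{
    unsigned long base = vstart;

    for (struct busy_area *a = head; a; a = a->next) {
        if (a->start - base >= size)
            return base;        /* hole before this busy area fits */
        base = a->end;          /* skip past the busy area */
    }
    return vend - base >= size ? base : 0;  /* tail hole, if any */
}
```

The patch series replaces this walk with an augmented red-black tree that tracks the largest free gap in each subtree, so a suitable hole can be found in O(log N).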
Still pointer-chasing, I see. It is remarkable that performance was ever tolerable. Probably the need to zero the pages before delivering them to user space masks almost any amount of inefficiency.
I wish they would consider using cacheline sized B+ trees instead of dumb RB trees.
The latter are not making proper use of pipelined superscalar processors (AKA any modern CPU that can run at over 1 GHz).
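To make the cacheline argument concrete, here is a rough userspace illustration (the node layout and fan-out are my assumptions for the sketch, not anything from the kernel): a binary tree node holds one key, so each comparison during a lookup is likely a fresh cache miss, while a B+-tree node padded to one 64-byte cacheline holds several keys, so each miss resolves several bits of the search at once.

```c
#include <stdint.h>

/* One key per binary-tree node: a lookup in a tree of N items takes
 * ~log2(N) pointer dereferences, each a potential cache miss. */
struct bin_node { uint64_t key; struct bin_node *left, *right; };

/* A B+-tree inner node padded to exactly one 64-byte cacheline:
 * 7 keys give a fan-out of 8, so each cache miss resolves 3 bits of
 * the search instead of 1.  (Child pointers elided for brevity.) */
struct bplus_node {
    uint64_t keys[7];
    uint8_t  nkeys;
    uint8_t  pad[7];            /* pad to one cacheline */
};

/* Tree depth needed to index n items at a given fan-out -- a proxy
 * for cache misses per lookup. */
static unsigned depth(unsigned long n, unsigned fanout)
{
    unsigned d = 0;
    unsigned long cap = 1;

    while (cap < n) {
        cap *= fanout;
        d++;
    }
    return d;
}
```

For a million entries that is roughly 20 cache misses per lookup for a binary tree versus 7 for a fan-out-8 tree; the superscalar CPU can also compare all keys in a node with predictable, pipelinable loads.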
Couldn't you (or someone) write the code and submit it for approval? I was under the impression that anyone could hack on the kernel (I'm fairly new to Linux) and make submissions for review.

The natural choice of language to code well-optimized data structures in is C++, but the Linux old guard have shown themselves irrationally hostile to integrating anything coded in C++.
Coding data structures in C is a formula for wasting your time, because for each new use you have to start over nearly from scratch. That is why kernels are such heavy users of ancient data structures that user space has largely abandoned.
Additionally, one must learn the politics, code style, idiosyncrasies, etc. before a submission will be successful. And of course, the architecture of the project itself.
Open Source / FOSS creates the opportunity for anyone to offer code; it does not mean the code will simply be accepted, or that it should be. And it does not mean you can, or should. But if you wish to, a path always exists (and if they just plain don't want it, and you really want to add it... fork!)
Sorry, my reading of this is that they had an important allocator using an O(N) free-block search? That makes lots of algorithms and operations go quadratic really easily, especially given that no one expects linear allocation cost.
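A back-of-the-envelope cost model shows how the quadratic behavior sneaks in (the numbers are illustrative, not measurements from the kernel):

```c
/* If finding a hole costs ~k list steps when k areas are already
 * allocated, then N consecutive allocations cost
 * 0 + 1 + ... + (N-1) = N(N-1)/2 steps in total -- quadratic, even
 * though no single call looks obviously bad. */
static unsigned long long linear_search_total(unsigned long n)
{
    unsigned long long steps = 0;

    for (unsigned long k = 0; k < n; k++)
        steps += k;
    return steps;
}

/* With an O(log N) structure the same workload costs ~N*log2(N). */
static unsigned long long log_search_total(unsigned long n)
{
    unsigned long long steps = 0;

    for (unsigned long k = 1; k < n; k++) {
        unsigned long d = 0, cap = 1;
        while (cap < k) {
            cap *= 2;
            d++;
        }
        steps += d;
    }
    return steps;
}
```

For 10,000 allocations the linear model does about 50 million steps versus roughly 130 thousand for the logarithmic one, and the ratio keeps growing with N.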
RB tree is an interesting choice; presumably there’s a benefit vs. B-trees (maybe reduced metadata cost?)
It’s also kind of frustrating when articles like this say things like “up to X% faster”. That’s way underselling it: this is asymptotically faster; the speedup keeps growing as the number of allocated areas grows, it’s not a simple multiplier :-/
Why? I expect all high speed network drivers to be already zero-alloc in their sending path. We have zero-copy interfaces as well. Is any network operation really held up by allocs anymore?
Drivers mostly allocate things from kmalloc, not vmalloc. vmalloc is only used for large allocations that exceed the maximum size that kmalloc can provide. It's traditionally not expected to "churn" either.
One use for vmalloc is for allocating loadable modules: when you insmod a driver, the space comes from vmalloc. Needless to say, there are few use cases for inserting and removing a driver thousands of times per second.
Sounds accidentally quadratic?