Your logic would also mean that any publicly facing sorting function (which is basically any interface on the internet, like a sorted list of Facebook friends) would need to use heapsort (which is 2-4 times as slow), because otherwise a DoS attack is as simple as constructing worst-case inputs.
There are no real disadvantages to the hybrid approach.
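For what it's worth, the worst-case gap under discussion is easy to demonstrate. This is a minimal sketch (a deliberately naive quicksort with a first-element pivot, not any library's implementation; the function name is mine) showing that an already-sorted input, which an attacker can construct at will, forces a quadratic comparison count:

```python
import random

def quicksort_comparisons(a):
    """Naive quicksort with a first-element pivot; returns the comparison count."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot]
    return len(rest) + quicksort_comparisons(left) + quicksort_comparisons(right)

n = 500
average = quicksort_comparisons(random.sample(range(n), n))
adversarial = quicksort_comparisons(list(range(n)))  # already sorted

print(average)      # roughly proportional to n log n: a few thousand
print(adversarial)  # n*(n-1)/2 = 124750: the quadratic worst case
```

A hybrid sort closes exactly this hole, which is why the fallback exists in the first place.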
> Your logic would also mean that any publicly facing sorting function (which is basically any interface on the internet, like a sorted list of Facebook friends) would need to use heapsort (which is 2-4 times as slow), because otherwise a DoS attack is as simple as constructing worst-case inputs.
Why is that a wrong conclusion? It might be, I'm not a dev. But if I found myself caring about that sort of minutiae, I would reach exactly that conclusion.
* the paranoid possibility that enough users trigger enough DoS attacks that your system falls over. If this is likely enough, maybe you should design for the 2-4x worst case, which also makes your testing and resource provisioning easier.
* a desire for simplicity when predicting performance, which you lose by going your route, because you're adding the possibility of a 2-4x performance drop depending on the content of the list. Ideally, you want performance to be solely a function of n, the size of your list; not of n and the time-varying distribution of evilness over your users.
Finally, adding a fallback doesn't seem free to me, because it might fool you into not addressing the points I just made. That O(n^2) worst case for quicksort might be a good way to get people thinking; your O(n log n) hides factors that don't depend only on n.
pdqsort is a hybrid sort: when it encounters apparently pathological input, it switches from a quicksort-like strategy to heapsort, so it doesn't share quicksort's worst-case characteristics.
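As an illustration of that switch, a toy hybrid might look like the sketch below. The detection heuristic here is just a recursion-depth budget, as in classic introsort; pdqsort's actual pathology tests are more sophisticated, and the names and budget are mine, purely illustrative:

```python
import heapq
import math

def hybrid_sort(a, depth_limit=None):
    """Toy introsort-style hybrid (illustrative, not pdqsort itself):
    quicksort until a recursion-depth budget runs out, then fall back
    to heapsort so the worst case stays O(n log n)."""
    if len(a) <= 1:
        return list(a)
    if depth_limit is None:
        depth_limit = 2 * int(math.log2(len(a)))
    if depth_limit == 0:
        # Partitioning looks pathological: switch to heapsort.
        heap = list(a)
        heapq.heapify(heap)
        return [heapq.heappop(heap) for _ in range(len(heap))]
    pivot = a[len(a) // 2]
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return (hybrid_sort(left, depth_limit - 1) + mid +
            hybrid_sort(right, depth_limit - 1))

print(hybrid_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

However deep an adversary drives the partitioning, the budget bounds the quicksort phase, and the heapsort fallback bounds everything after it.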
Your thesis is wrong:
> you might as well use an algorithm that takes the same amount of time every time, which I think means heapsort
Heapsort has a variable runtime. It will selectively reheap, and whether that happens depends on the state of the heap and your next element, which means the total number of times you reheap varies with the input.
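To make that concrete, here's a textbook sift-down heapsort instrumented to count comparisons (the function name and instrumentation are mine, for illustration only); inputs of the same length produce different counts:

```python
def heapsort_comparisons(a):
    """Textbook sift-down heapsort (run on a copy); returns the number of
    element comparisons, which varies with the input even for fixed n."""
    a = list(a)
    n = len(a)
    count = 0

    def sift_down(root, end):
        nonlocal count
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end:           # pick the larger of two children
                count += 1
                if a[child] < a[child + 1]:
                    child += 1
            count += 1                     # compare root against that child
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                break                      # heap property holds: stop reheaping

    for start in range(n // 2 - 1, -1, -1):   # build the max-heap
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):           # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return count

inputs = [list(range(64)), list(range(63, -1, -1)), [7] * 64]
print([heapsort_comparisons(x) for x in inputs])  # counts vary with the input
```

An all-equal array breaks out of every sift-down almost immediately, while an ascending array pays for deep sift-downs, so the same n yields noticeably different work.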
I understand how hybrid sorts work. I thought that would be clear enough. I guess when you're a stranger on the internet, people don't automatically trust you to know what you're talking about. I imagine this is even truer when you say something counterintuitive or controversial. In spite of learning that, I'm responding chiefly because of that quote.
> Your thesis is wrong:
> Heapsort has a variable runtime
Let's not get hung up on heapsort. My thesis is that if you have an algorithm with a gap between its worst case and its average case, you shouldn't use it on the web. That gap might not manifest as a gap in the big O: it could be 4x slower in the worst case than in the average case, and that would still be bad.
To see why, try imagining this world of pure evil:
You create a website that uses gapsort. Gapsort is an algorithm that is 4x slower in the worst case than it is in the average case, for a fixed value of n (the size of the input). On top of that, triggering the worst case by accident is hard. You deploy your site for several months, buy servers, and get a userbase. In your performance testing, you only encountered the average case, so you only provisioned hardware for the average case. Now imagine that suddenly all your users turn evil, and start triggering the worst case; this leads to a sudden 4x performance drop. You may have underprovisioned for this.
So my thesis is that a gap between the worst case and the average case looks great, on average. But in a hostile world, it actually makes things more complicated, because empirical performance testing will only exercise the average. Moreover, the gap can be just 4x, in which case it won't manifest as a difference between the big Os of the worst and average cases (as I've said previously).
Changing from quicksort to heapsort on some inputs may manifest as a performance gap between the quicksort path and the fallback path. Maybe that's not such a good idea. In fact, maybe you shouldn't use any quicksort variant, even introsort, in a potentially hostile environment, because of the unavoidable average-to-worst-case gap. In introsort, that gap is merely a different coefficient, but it's still a gap.
I hope that wasn't too repetitive.