Lookup of an element. Find the heap containing the element (i.e. the element lies between the maximum of the previous heap and the maximum of this heap). Perform a linear search in that heap. Worst case complexity: O(n^(1/3) + n^(2/3)) = O(n^(2/3)).
Insertion of an element. Find the first heap whose maximum element is larger than the new element. Extract the top element of that heap and insert the new element in its place. Then propagate upwards (i.e. extract the top of the next heap and insert the top of the previous heap into it, etc). Worst case is O(n^(1/3) + log(n^(2/3)) * n^(1/3)) = O(log(n) * n^(1/3)).
Deletion of an element. Find the heap containing the element (i.e. the element lies between the maximum of the previous heap and the maximum of this heap). Remove the element from it. Extract the top of the next heap and insert it into this one. Then continue propagating upwards. So this is O(n^(1/3) + n^(2/3) + log(n^(2/3)) * n^(1/3)) = O(n^(2/3)).
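Here is a rough sketch of lookup and insertion as described (my own code, not from the article; growth and re-segmentation of the last heap are left out). std::push_heap/std::pop_heap maintain max-heaps by default:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sequence-of-max-heaps sketch: heaps[j] is a max-heap (front() is its
// maximum), every element of heaps[j] is less than every element of
// heaps[j+1], and all heaps are assumed non-empty.
struct HeapOfHeaps {
    std::vector<std::vector<int>> heaps;

    // Lookup: skip heaps whose max is smaller than key, then linear-search
    // the single heap that could contain it.
    bool contains(int key) const {
        for (const auto& h : heaps) {
            if (h.front() < key) continue;  // key belongs to a later heap
            return std::find(h.begin(), h.end(), key) != h.end();
        }
        return false;
    }

    // Insert: place key in the first heap whose max exceeds it, then
    // cascade each displaced maximum into the next heap.
    void insert(int key) {
        std::size_t j = 0;
        while (j + 1 < heaps.size() && heaps[j].front() < key) ++j;
        for (; j + 1 < heaps.size(); ++j) {
            auto& h = heaps[j];
            std::pop_heap(h.begin(), h.end());   // move max to h.back()
            std::swap(h.back(), key);            // replace it with key
            std::push_heap(h.begin(), h.end());  // restore heap property
            // key now holds the displaced max, bound for the next heap
        }
        // The final displaced value lands in the last heap; a full
        // implementation would re-segment once this heap outgrows its budget.
        heaps.back().push_back(key);
        std::push_heap(heaps.back().begin(), heaps.back().end());
    }
};
```

Finding the heap is O(#heaps) and each cascade step is one pop/push pair, which matches the bounds above.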
Building the structure. Why is this O(n)? If you have already segregated the array into segments for the separate heaps, heapifying them would be O(n). How can the segmentation be done in O(n)?
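For reference, the exponents used above follow from the segment sizes (a quick back-of-the-envelope check, not from the original post):

```latex
\sum_{i=1}^{k} i^2 = \frac{k(k+1)(2k+1)}{6} \approx \frac{k^3}{3} = n
\;\Longrightarrow\;
k \approx (3n)^{1/3} = O(n^{1/3}) \text{ heaps},
\qquad
\text{largest heap} = k^2 = O(n^{2/3}).
```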
Edit: Using the suggestion of splitting up according to 1 + 3 + 5 + ... + (2k-1) = k^2 we'd get:
Search: O(sqrt(n) + sqrt(n)) = O(sqrt(n))
Insert: O(sqrt(n) + log(sqrt(n)) * sqrt(n)) = O(log(n) * sqrt(n))
Delete: O(sqrt(n) + sqrt(n) + log(sqrt(n)) * sqrt(n)) = O(log(n) * sqrt(n))
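Spelling out the arithmetic behind these bounds:

```latex
1 + 3 + 5 + \dots + (2k - 1) = k^2 = n
\;\Longrightarrow\;
k = \sqrt{n} \text{ heaps},
\qquad
\text{largest heap} = 2k - 1 = O(\sqrt{n}).
```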
>> Deletion leaves a hole in one heap, which should be filled by the minimum element of the next heap, etc. The minimum cannot be extracted efficiently from a max-heap.
And since we started with the smallest element, this will cascade through each of the heaps, in order to keep them all (but the last) at full size. Altogether, that's O(n).
Using 1 + 3 + 5 + ... = n^2, wouldn't the largest segment hold roughly sqrt(n) elements? So that's an O(sqrt(n)) heapify for each of sqrt(n) heaps, or O(sqrt(n) * sqrt(n)) = O(n).
Insertion, for example: if I want to insert a value which is smaller than the smallest element in the last heap, then it'll have to be inserted into some heap in the middle, right? And since that heap is in the middle, say heap K, its size has already reached its upper limit of K^2. Then where should the new value go? If I insist on pushing it into heap K, then the heap will violate the size limit. Should the oversized heap be split into several heaps at some point? If it does split, will the O(sqrt n) search time still hold?
Or maybe I just have not caught the author's idea yet. :-(
BTW, why not a BST, where everything is O(log n)?
Update: @nikic solved my questions. Thanks!
Note that I'm making a couple assumptions about the data structure that were left unstated in the original post:
* Each element of each heap is less than every element of every subsequent heap.
* "When the max value in a heap is less than the searched element" was a typo and "When the max value in a heap is greater than the searched element" was intended.
Maybe I just totally misunderstand this data structure though :-( It's still morning for me.
> Then you would insert the max you just removed into the next heap in the same way, and so on.
BSTs have O(n) lookup in the worst case, and the pathological case is quite easy to achieve: add elements in sorted order.
There are other trees that have O(lg n) lookup. Red-black trees are the canonical example.
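A minimal sketch of that pathological case (hypothetical code, not from the thread): inserting keys in sorted order into a plain, non-rotating BST produces a right-leaning chain, so lookup degenerates to a linked-list scan.

```cpp
#include <memory>

// A plain (non-balancing) BST node; insertions never rotate.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

void insert(std::unique_ptr<Node>& root, int key) {
    if (!root) { root = std::make_unique<Node>(key); return; }
    insert(key < root->key ? root->left : root->right, key);
}

int main() {
    std::unique_ptr<Node> root;
    // Sorted input: every new key goes right, producing a chain of
    // depth n - a linked list in disguise.
    for (int k = 0; k < 1000; ++k) insert(root, k);
    // Searching for 999 now follows 1000 right pointers: O(n).
}
```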
> There are many types of binary search trees. AVL trees and red-black trees are both forms of self-balancing binary search trees.
When you say BST, the only thing you've said is that it's a binary tree structure with a property relating the children's values to the parent's. While it is true that there are certain BSTs with O(lg n) searches, there also exist BSTs with O(n) searches.
The original claim was that:
> BTW, why not a BST, where everything is O(log n)?
Which is not true. A sorted linked list is the canonical counterexample. There is no confusion of terminology; it is clearly defined in the algorithmic literature on search trees.
For example, if I was talking about an exciting new business plan to serve alcohol in a building and someone replied "Bars already serve drinks.", you probably wouldn't object by saying that some bars don't serve drinks - that's not really the point.
I can probably concoct a counterexample, but given that most BSTs that one would actually use under that terminology are self-balancing ~O(log n) structures, I am tempted to go with this. I imagine that if "self-balancing BST" had been used the objection might never have been raised, even though not strictly all self-balancing BSTs have all O(log n) operations.
Either way it's a semantic thing that has little to do with what BST means (we agree on that) and more to do with interpreting the ambiguities of English.
> not strictly all self-balancing BSTs have all O(log n) operations.
You're right. You could have a self-balancing tree that simply guarantees its height is n/2, but that entails that its height is O(n), which, by most definitions, means that it's not self-balancing. You want its height to be at most O(n^c) for 0 < c < 1.
In each of the sub-heaps (sizes 1, 4, 9, 16, ...) the elements are organized as a max-heap, meaning for each element i (indexed within the sub-heap), x[i] > x[2*i+1] && x[i] > x[2*i+2].
we decompose an array of size 35 into: an array of size 1 followed by arrays of sizes 4, 9, 16, 5
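As a quick illustration (my own sketch, not from the thread), here is a check of that per-segment max-heap property, with child indices taken relative to each segment's start:

```cpp
#include <cstddef>
#include <vector>

// Check that each segment [off, off + len) of a[] is a max-heap,
// with child indices 2*i+1 and 2*i+2 relative to the segment start.
bool is_segmented_max_heap(const std::vector<int>& a,
                           const std::vector<int>& sizes) {
    std::size_t off = 0;
    for (int len : sizes) {
        for (int i = 0; i < len; ++i) {
            int l = 2 * i + 1, r = 2 * i + 2;
            if (l < len && a[off + i] < a[off + l]) return false;
            if (r < len && a[off + i] < a[off + r]) return false;
        }
        off += len;
    }
    return off == a.size();  // the sizes must cover the whole array
}

// e.g. for an array of size 35: is_segmented_max_heap(a, {1, 4, 9, 16, 5})
```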
I may have missed something, though :-|
This context includes not only the characteristics of the (typical) systems using the data layout, but also the typical use of that structure (read-heavy, write-heavy, some mix). The cost of memory is also an issue. For a typical end-user program the bottleneck is almost always going to be the user. For larger-scale programs or more performance-intensive areas of a program (e.g. graphics languages or other large data operations), different tradeoffs should be evaluated.
> The short of it is, arrays are king. No two ways about it - following pointers is a losing strategy when there's an alternative.
Why is following pointers a losing strategy? If you have a contiguous array (a vector), then the prefetcher can do its magic and give you a pseudo-cache level. (I may be wrong about this assumption.)
Of course things become more complicated when you start having larger amounts of data, and you want to skip over most of it, but still keep the cache happy.
Even that would have its problems, though. Typically, every one of these dereferences brings in a new cache line. Unless the data that the pointers point to fills an integral number of cache lines, you are bringing data into the cache that you will not need.
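To put rough numbers on that (my own illustration, assuming 64-byte cache lines and a hypothetical 40-byte node):

```latex
\frac{40 \text{ useful bytes}}{64 \text{ fetched bytes}} \approx 63\%
\ \text{(node aligned within one line)},
\qquad
\frac{40}{128} \approx 31\%
\ \text{(node straddling two lines)}.
```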
What is that? Well, CPUs actually process a statement in several steps, and try to process several statements at a time. But what if you have an if? The solution is to guess which way it will turn out, then begin on the expected branch before it is done evaluating the if. If the if turns out as expected, this goes faster. If not, the partially done work is thrown out, and the other branch is run. This is a substantial speedup when branches are predicted correctly.
When you search through an array you often get into a situation where most of the time the if turns out the same way. "Is this bigger than my max? No. Is this bigger than my max? No...." As a result branch prediction makes this execute surprisingly quickly.
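A classic way to see the effect (my own sketch, not from the thread): run the same threshold-counting loop over the data before and after sorting it. The sorted pass is often several times faster purely because the branch becomes predictable, though an optimizing compiler may emit a branch-free conditional move or vectorize the loop and flatten the difference.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

// Count elements above a threshold; the `if` is the branch of interest.
// On random data it is taken unpredictably; on sorted data the outcomes
// form two long predictable runs.
long count_big(const std::vector<int>& v) {
    long big = 0;
    for (int x : v)
        if (x > 128) ++big;
    return big;
}

int main() {
    std::vector<int> v(1 << 24);
    std::mt19937 rng(42);
    for (int& x : v) x = rng() % 256;

    auto time_pass = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        long big = count_big(v);
        auto t1 = std::chrono::steady_clock::now();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
        std::printf("%s: %ld big, %lld ms\n", label, big, (long long)ms.count());
    };
    time_pass("random");
    std::sort(v.begin(), v.end());  // same data, now branch-predictable
    time_pass("sorted");
}
```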
Something like this with a contiguous allocation would probably require quite a lot of empty space - 100% reserve for each subarray by necessity and 50% for the array as a whole, so 200% overhead at worst and probably 90% overhead on average. Alternatively you can heap allocate each subarray at the expense of slower operations, but resulting in a more typical 100% worst-case overhead.
So you'd store like this
(0, 3) (2, 2) (1, 4) (_, _) [1, 3, 7, _] [14, 15, 16, 19] [9, 12, _, _] [_, _, _, _]
So to search for a value, you do a binary search over the header to find the wanted bucket and then over the bucket to find the value. That's O(log n).
Insertion is O(sqrt n) since you need to find the bucket and then do a normal insertion into it. You might need to reallocate the whole thing (remember to increase bucket size!) but that's amortized out.
Deletion is even easier since you don't have to deal with overflow (although you might want to move empty buckets to the end or merge adjacent ones, maybe). O(sqrt n).
Creating the structure is trivially O(n log n), since it's sorted.
Converting to a sorted list is just compacting the buckets.
You also get access by index at O(sqrt n) speed since you just need to traverse the counts in the header. Not perfect, but close.
For a cache-efficient design it's pretty neat. Binary searches are fast, after all.
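A minimal sketch of the search and insert paths, under my reading of the layout above (each header pair being a (bucket index, element count) in logical order). For brevity the buckets here are stored directly in logical order and sized to their counts, skipping the indirection and the fixed-capacity slack:

```cpp
#include <algorithm>
#include <vector>

// Two-level sorted structure: buckets kept in logical order, each bucket
// sorted, so bucket.back() is that bucket's maximum. Buckets are assumed
// non-empty; splitting/reallocating a full bucket is omitted.
struct Buckets {
    std::vector<std::vector<int>> bucket;

    // O(log n): binary search for the owning bucket, then within it.
    bool contains(int key) const {
        auto it = std::partition_point(
            bucket.begin(), bucket.end(),
            [key](const std::vector<int>& b) { return b.back() < key; });
        return it != bucket.end() &&
               std::binary_search(it->begin(), it->end(), key);
    }

    // O(sqrt n): find the bucket in O(log n), then shift within a
    // bucket of O(sqrt n) elements.
    void insert(int key) {
        auto it = std::partition_point(
            bucket.begin(), bucket.end(),
            [key](const std::vector<int>& b) { return b.back() < key; });
        if (it == bucket.end()) --it;  // key beyond all maxima
        it->insert(std::lower_bound(it->begin(), it->end(), key), key);
    }
};
```

Access by index would walk the per-bucket counts the same way, giving the O(sqrt n) bound mentioned above.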
(When you split an overfull subarray, each new subarray will be half full.)
Not sure if it's a great idea, but it's tempting.
> Now each of these arrays we organize as a max heap. Moreover, we arrange data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, then followed by the next 4 smallest organized in a max heap,
Just because the maximums of the heaps are increasing order doesn't mean that the minimum is at the front. It means the first element is smaller than the biggest element of the next four.
Consider, for example:
1 4 3 2 0
In fact, there's an error in the search algorithm as well:
> *Whenever the maximum element (first element in the subarray) is smaller than the searched value, we skip over that entire heap and go to the next one. In the worst case we'll skip O(sqrt n) heaps.* When the max value in a heap is less than the searched element, we found the heap and we run a linear search among O(sqrt n) elements.
For the unitalicized portion above, he means to say "When the max value in a heap is greater than the searched element".
0 4 2 3 1
Just as e^n grows faster (in the limit) than any polynomial of n, such as n^2 or n^100, likewise log(n) grows slower than any polynomial of n, such as sqrt(n) = n^(1/2), or even n^(1/100).
They are both practically pretty slow-growing, though. It may well be that for practical sizes, constant factors matter more than the asymptotic difference on present computers.
But I actually wanted to respond to this:
> log(n) grows slower than any polynomial of n, such as sqrt(n) = n^(1/2)
It's correct to say that log(n) grows slower than any positive exponent of n, but what you've said is wrong in two ways:
- f(n) = n^(1/2) is not a polynomial, as a polynomial in n can only feature nonnegative integer exponents of n.
- f(n) = 300 is a polynomial in n, but it grows more slowly than log(n). (It's also larger, for any plausible value of n at all. So constant factors do matter, but the difference between log(n) and sqrt(n) is quite noticeable.)
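The corrected statement can be made precise (by L'Hôpital's rule; the base of the logarithm only changes a constant factor):

```latex
\text{for all } c > 0:\qquad
\lim_{n \to \infty} \frac{\ln n}{n^c}
= \lim_{n \to \infty} \frac{1/n}{c\,n^{c-1}}
= \lim_{n \to \infty} \frac{1}{c\,n^{c}} = 0,
\qquad\text{i.e.}\quad \log n = o(n^c).
```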
So, in Big-O terms, this is only trying to optimize for small datasets (4 <= N <= 16).
In this case, the constant factors will probably be quite different because the goal is to make better use of CPU cache.