Hacker News new | comments | ask | show | jobs | submit login
An interesting data structure with search time O(sqrt n) (dlang.org)
254 points by z3phyr on Nov 30, 2015 | hide | past | web | favorite | 70 comments

Per the last posts: Number of heaps: O(n^(1/3)). Maximum elements per heap O(n^(2/3)).

Lookup of an element. Find the heap with the element (i.e. element between maximum of previous heap and this heap). Perform linear search in heap. Worst cast complexity O(n^(1/3) + n^(2/3)) = O(n^(2/3)).

Insertion of an element. Find the last heap those maximum element is still larger than the element. Extract top element of the heap and insert the new element instead. Then propagate upwards (i.e. extract top of next heap and insert top of previous heap, etc). Worst case is O(n^(1/3) + log(n^(2/3)) * n^(1/3)) = O(log(n) * n^(1/3)).

Deletion of an element. Find the heap with the element (i.e. element between maximum of previous heap and this heap). Remove the element from it. Extract top of next heap and insert it into this one. Then continue propagating upwards. So this is O(n^(1/3) + n^(2/3) + log(n^(2/3)) * n^(1/3)) = O(n^(2/3)).

Building the structure. Why is this O(n)? If you have already segregated the array into segments for the separate heaps, heapifying them would be O(n). How can the segmentation be done in O(n)?

Edit: Using the suggestion of splitting up according to 1+3+5+..+(2n-1) = n^2 we'd get:

Search: O(sqrt(n) + sqrt(n)) = O(sqrt(n))

Insert: O(sqrt(n) + log(sqrt(n)) * sqrt(n)) = O(log(n) * sqrt(n)

Delete: O(sqrt(n) + sqrt(n) + log(sqrt(n)) * sqrt(n)) = O(log(n) * sqrt(n))

Deletion can't be done efficiently, as pointed out by Timon Gehr:

>> Deletion leaves a hole in one heap, which should be filled by the minimum element of the next heap, etc. The minimum cannot be extracted efficiently from a max-heap.

Indeed. If the smallest element is deleted, then we are removing the singleton element of the first heap. We need to find the second-smallest element, which will be in the next heap after it, but it could be any of the leaves. That means we need to traverse O(heapsize) elements.

And since we started with the smallest element, this will cascade through each of the heaps, in order to keep them all (but the last) at full size. All together, that's O(N).

Never used one, but how about a min-max heap instead?


That's true, I got confused with min and max there. gre's suggestion of using a min-max heap (if deletion is required) seems a reasonable workaround for this issue.

Using min-max heaps should be workable; indeed, the resulting data structure seems to be a special case of the data structure described in section 3 of [1].

[1] http://www.cs.otago.ac.nz/staffpriv/mike/Papers/MinMaxHeaps/...

> Building the structure. Why is this O(n)? If you have already segregated the array into segments for the separate heaps, heapifying them would be O(n). How can the segmentation be done in O(n)?

Using 1 + 3 + 5 ... = n^2, wouldn't the largest segment hold roughly sqrt(n) elements. So O(sort(n)) heapify for sqrt(n) heaps, or O(sqrt(n) * sqrt(n)) = O(n).

Here's a related paper (on HN last month) which tries to find the arrangement of elements in an array which minimizes search time, measuring wall-clock time rather than theoretical complexity: http://arxiv.org/ftp/arxiv/papers/1509/1509.05053.pdf. (Spoiler alert: it's not sorted order with binary search, and the answer has a lot to do with the cache).

Interesting. One layout they did not consider is the bit-reversal permutation[1] of the sorted array. Searching is straightforward: to find element i, you reverse the bits of i -- so then, you can just binary search. This layout is similar to Eytzinger, so it should perform at least as well, and it may have some advantages.


From some homework I did as a freshman, I remember another neat data structure with O(sqrt(n)) search: https://ece.uwaterloo.ca/~dwharder/aads/Algorithms/Beaps/

I'm confused. I can follow the logic for O(sqrt n) searching. yeah, that's cool. But how to maintain the structure when inserting or deleting?

Insertion for example, if I want to insert a value which is smaller than the smallest element in the last heap, then it'll have to be inserted into some heap in the middle, right? And since the heap is in the middle, say heap K, its size already reaches its upper limitation K^2. Then where should the new value go? If I insist on pushing it into heap K, then the heap will violate the size limitation. Should I split the oversized heap into several heaps some time? If it does split, will the O(sqrt n) searching time still holds?

Or maybe I just have not caught the author's idea yet. :-(

BTW, why not BST, for everything is O(log n)?

Update: @nikic solved my questions. Thanks!

I think for insertion, once you find the heap the new element should be inserted into, you would remove the max element from the heap, and re-heapify that subarray with the new element. Then you would insert the max you just removed into the next heap in the same way, and so on. That sounds like it could be less expensive than shifting the whole array, but I haven't done the math (and it seems that Alexandrescu hasn't either, yet).

Note that I'm making a couple assumptions about the data structure that were left unstated in the original post:

* Each element of each heap is less than every element of every subsequent heap.

* "When the max value in a heap is less than the searched element" was a typo and "When the max value in a heap is greater than the searched element" was intended.

Maybe I just totally misunderstand this data structure though :-( It's still morning for me.

You'd re-heapify all heaps greater than the one that ejected an element, since heaps greater than you are all full except for the very last one.

I said that:

> Then you would insert the max you just removed into the next heap in the same way, and so on.

BST is O(n) in the worst case (when a tree is completely unbalanced and is essentially a linked list)

Well, it's not a big problem. This case won't exist in most BSTs. Even if it appears in some BST, splay tree for example, it will be amortized.

I'm not really sure what amortization you're talking about here.

BSTs are O(n) lookup, and the pathological case is quite easy to achieve: add elements to it in sorted order.

There are other trees that have O(lg n) lookup. Red-black trees are the canonical example.

BSTs can have O(lg n) lookups. A Red-black tree is such an example. It is a self-balancing BST, so a red-black tree is a BST itself.

And BSTs can have O(n) lookup. The only property of a BST is that you know something about the value of the children compared to the parent. This means that a sorted linked list is a BST.

Sure, but abcdabcd987 was evidently not suggesting using a completely general BST. That there exists a BST with the mentioned properties is sufficient to validate the claim made; that there exists a BST which does not is irrelevant.

I believe your and abcdabcd987's use of the term BST is not exactly what is commonly used. A BST refers to both the data structure and the algorithm used to manage it. An RB tree is a different concept. An RB tree is a binary tree, certainly, since the term "binary tree" implies no algorithms, but it is not a "binary search tree" in its specific denotation.


> There are many types of binary search trees. AVL trees and red-black trees are both forms of self-balancing binary search trees.

Terminology is important. A BST is what I defined. A balanced BST has an extra property, but you don't get to call a balanced BST just a BST, it only adds to confusion and unclear communication.

I don't agree with your use of terminology; to me "BST" is just as much a class of things as "mammal" is. I will agree though that it has added confusion.

> I don't agree with your use of terminology

When you say BST the only thing you've said is that it's a binary tree structure with a property about the childrens value compared to the parent. While saying that there is certain BSTs with O(lg_2 n) searches is true, there also exists BSTs with O(n) searches.

The original claim was that:

> BTW, why not BST, for everything is O(log n)?

Which is not true. A sorted linked list is the canonical counter example. There is no confusion of terminology, it is clearly defined in algorithmic literature on search trees.

I generally take English sentences like that to be existential claims, rather than universal ones.

For example, if I was talking about an exciting new business plan to serve alcohol in a building and someone replied "Bars already serve drinks.", you probably wouldn't object saying that some bars don't serve drinks - that's not really the point.

I can probably concoct a counterexample, but given that most BSTs that one would actually use under that terminology are self-balancing ~O(log n) structures I am tempted to go with this. I imagine if "self-balancing BST" was used the objection might never have been raised, even though not strictly all self-balancing BSTs have all O(log n) operations.

Either way it's a semantic thing that has little to do with what BST means (we agree on that) and more to do with interpreting the ambiguities of English.

My point of view origins from the academic world, and the distinction absolutely matters, and it should matter in any theoretical discussion on the topic, otherwise we'll spend too much time arguing semantics, instead of just being clear.

> not strictly all self-balancing BSTs have all O(log n) operations.

You're right. You could have a self balancing tree that would simply guarantee it's height is n/2, but that entails that it's height is O(n), which, by most definitions mean that it's not self balancing. You want it to be at least O(n^c) for 0<c<1, in height.

I'll put this under "agree to disagree". English is too vague to speak unambiguously in it all the time :P.

I'm not sure I'm able to visualize this correctly. Let's say we have an array of strings containing the top 20 movies from Rotten Tomatoes:

  The Wizard of Oz
  The Third Man
  Citizen Kane
  All About Eve
  Das Cabinet des Dr. Caligari. (The Cabinet of Dr. Caligari)
  Modern Times
  A Hard Day's Night
  The Godfather
  E.T. The Extra-Terrestrial
  It Happened One Night
  Singin' in the Rain
  The Adventures of Robin Hood
  Inside Out
  North by Northwest
  King Kong
  Snow White and the Seven Dwarfs
What would this data structure look like?

It will look like regular plain array of 20 elements (with pointers to your strings). In these 20 elements first one will be the smallest one (use strcmp or similar). Next 4 elements will contain next 4 smallest strings, next 9 next 9 smallest, and then 6 last ones. 1 + 4 + 9 + 6 = 20.

In each of sub-heaps (1, 4, 9, 6) elements will be organized in max-heap. Meaning for each element i, x[i] > x[2* i+1] && x[i] > x[2 * i+2].

From my understanding it would be roughly: https://gist.github.com/bpicolo/32a7fc775ce1810c88a0

I won't be array of arrays. It will be just one array.

  we decompose an array of size 35 into: an array of size 1 followed by arrays of sizes 4, 9, 16, 5
Then again, maybe he means `mentally decompose` in terms of pretending they're separate arrays? I interpreted it as `visualize this data structure`.

I also originally thought he had an array of arrays, but now I I think the run-times he mentioned for insertion and deletion would only work for a single array, not an array of arrays.

That is the way I took it too

From what I understand, this is how Medium organizes the fast retrieval and editing of their rich text. You might want to read this for more information: https://medium.com/medium-eng/why-contenteditable-is-terribl...

I may have missed something, though :-|

That doesn't sound like it at all. Medium's document model is a list of paragraphs, and that (according to your link) is it.

The context of the proposed data structure is more important for evaluating how effective it is at maximizing performance.

This context includes not only the characteristics of the (typical) systems using the data layout, but also the typical use of that structure (read heavy, write heavy, some mix). The cost of memory is also an issue. For a typical end user program the bottleneck is almost always going to be the user. For larger scale programs or more performance intensive areas of a program (E.G. graphics languages or other large data operations) different tradeoffs should be evaluated.

Even just that question: "Why?" and the corresponding answer are worth a read.

Agreed. I love that this came from noticing a "gap" and coming up with something to fill the hole. I wish I thought like this more often!

"Find a gap and fill it, I always say! My great-grand-uncle is the one who came up with disposible ketchup packets. Me, I found a data structure with O(sqrt) search time. Sure it doesn't have all the doohickeys of a Red-Black Tree, it's not as quick on the insert as a radix - but at the end of the day it gets the job done. And to some people, that's all that matters..."

This is a very interesting discussion, but I'd like to ask a question about this in particular:

> The short of it is, arrays are king. No two ways about it - following pointers is a losing strategy when there's an alternative.

Why is following pointers a losing strategy? If you have a contiguous array (a vector), then the pre-fetcher can do its magic and give you a psuedo-cache level. (I may be wrong about this assumption).

I'm missing how your first sentence is connected to your second (are you suggesting a contiguous array of pointers?) but I believe the generalized disadvantage of pointers is poor locality of memory access, cache misses, etc.

The prefetcher works only if it can predict what you're going to access next. Chasing pointers means essentially jumping to random addresses in a given memory region. That's pretty much the worst situation from a caching point of view, what the prefetcher loves is linear traversal.

Of course things become more complicated when you start having larger amounts of data, and you want to skip over most of it, but still keep the cache happy.

In theory, an advanced prefetcher can detect that you are stepping through an array of pointers, dereferencing each of them, and start prefecting.

Even that would have its problems, though. Typically, every one of these dereferences brings in a new cache line. Unless the data that the pointers point to fills an integral number of cache lines, you are bringing data into the cache that you will not need.

There are costs to this, because you may dereference things that are not valid pointers either. The kernel used to have software prefetch code for linked list traversal. On each access it would prefetch the subsequent access. Which sounds like a huge win. However, in practice the most common list size was just a few elements and there was a huge penalty for fetching the final, invalid null pointer. They ended up taking out the prefetching recently (and I think they saw a small speedup as a result!) Now maybe someone could invent a hardware prefetcher that does this in a way that's a net gain, but the fact that Intel hasn't done it yet suggests that it maybe it doesn't help much or that the situation isn't all that common.

You just (partially) answered your own question. Imagine something like a binary search tree. With all the pointer chasing you'll be doing, the poor pre-fetcher doesn't stand a chance.

In addition to the unexpected high costs of following pointers, the cost of searching an array is often less than you'd expect thanks to branch prediction.

What is that? Well CPUs actually process a statement in several steps, and try to process several statements at a time. But what if you have an if? The solution is to guess which way it will turn out, then begin on the expected branch before it is done evaluating the if. If the if turns out as expected, this goes faster. If not, the partially done work is thrown out, and the other branch is run. This is a substantial speedup when branches are predicted correctly.

When you search through an array you often get into a situation where most of the time the if turns out the same way. "Is this bigger than my max? No. Is this bigger than my max? No...." As a result branch prediction makes this execute surprisingly quickly.

I haven't thought about it closely but, from a cursory read, it sounds like the SquareList data structure: http://www.drdobbs.com/database/the-squarelist-data-structur...

Another thing that works is a length-sqrt(n) list of length-sqrt(n) sorted lists with internal gaps, like with Python's SortedContainers[1].

Something like this with a contiguous allocation would probably require quite a lot of empty space - 100% reserve for each subarray by necessity[2] and 50% for the array as a whole, so 200% overhead at worst and probably 90% overhead on average. Alternatively you can heap allocate each subarray at the expense of slower operations, but resulting in a more typical 100% worst-case overhead.

So you'd store like this

    |----------HEADER---------| |-----------------------BUCKETS------------------------|
    (0, 3) (2, 2) (1, 4) (_, _) [1, 3, 7, _] [14, 15, 16, 19] [9, 12, _, _] [_, _, _, _]
Cache efficiency might also suggest you store the minimums of each list in the tuples.

So to search for a value, you do a binary search over the header to find the wanted bucket and then over the bucket to find the value. That's O(log n).

Insertion is O(sqrt n) since you need to find the bucket and then do a normal insertion into it. You might need to reallocate the whole thing (remember to increase bucket size!) but that's amortized out.

Deletion is even easier since you don't have to deal with overflow (although you might want to move empty buckets to the end or merge adjacent ones, maybe). O(sqrt n).

Creating the structure is trivially O(n log n), since it's sorted.

Converting to a sorted list is just compacting the buckets.

You also get access by index at O(sqrt n) speed since you just need to traverse the counts in the header. Not perfect, but close.

For a cache-efficient design it's pretty neat. Binary searches are fast, after all.

[1]: http://www.grantjenks.com/docs/sortedcontainers/implementati...

[2]: When you split an overfull subarray, each new subarray will be half full.

After some thought, you can avoid so much reserve by just not having any reserve buckets (and thus no need for the header either). You can only resize once every O(sqrt n) operations and a resize costs O(n), so it's amortized O(n / sqrt n) = O(sqrt n) insertion. You can lower the constant factors a bit by doing occasional local reshuffling.

Not sure if it's a great idea, but it's tempting.

I don't understand this bit:

> Now each of these arrays we organize as a max heap. Moreover, we arrange data such that the maximums of these heaps are in INCREASING order. That means the smallest element of the entire (initial) array is at the first position, then followed by the next 4 smallest organized in a max heap,

Just because the maximums of the heaps are increasing order doesn't mean that the minimum is at the front. It means the first element is smaller than the biggest element of the next four.

Consider, for example:

    1 4 3 2 0
1 < 4, certainly, but 1 > 0. What am I missing?

I'm pretty sure there is an additional (unstated) assumption that each element of each subarray is less than all elements of subsequent subarrays. Otherwise you could just take the largest sqrt(n) elements, sort them, and then randomly assign the other elements of the collection to their heaps and it would still meet the criteria he's laid out, but the search algorithm he's proposed wouldn't work.

In fact, there's an error in the search algorithm as well:

> Whenever the maximum element (first element in the subarray) is smaller than the searched value, we skip over that entire heap and go to the next one. In the worst case we'll skip O(sqrt n) heaps. When the max value in a heap is less than the searched element, we found the heap and we run a linear search among O(sqrt n) elements.

For the unitalicized portion above, he means to say "When the max value in a heap is greater than the searched element".

Minimum is only at the front of the entire structure, not each heap. Thats because the first element is a heap of size 1. From what I understand, the correct layout would be something like

  0 4 2 3 1
Max heap of size 1 containing the smallest element, then a maxheap of size 4 containing the next 4 smallest, etc

Is it immediately obvious this can be constructed in O(n) time? The author just gleans over it but I'm still not able to figure it out.

Isn't O(log(n)) equivalent to O(sqrt(n))?

No, because sqrt(n)/log(n) is not bounded by a constant as n approaches infinity. Consider: in base 10, if log(n)=10, then sqrt(n) is 5 digits; if log(n)=100, then sqrt(n) is 50 digits; if log(n)=1000, then sqrt(n) is 500 digits; etc.

Errr, what? natural log grows slower than sqrt(n)


I think that's the point he is making as well. Except it might be confusing that he is using numbers for log(n) and number of digits for srqt(n).

Exactly; it shows just how wildly faster sqrt(n) grows than log(n).


Just as (e^n) grows faster (in the limit) than any polynomial of n, such as n^2 or n^100; likewise, log(n) grows slower than any polynomial of n, such as sqrt(n) = n^(1/2), or even n^(1/100).

They are both practically pretty slow-growing, though. It may well be that for practical sizes, constant factors matter more than the asymptotic difference on present computers.

In Parallel programming class the professor told us: "When I was a student, our professor told us that for practical purposes log(n) = 7"

Well, you can get sqrt to nontrivial sizes pretty easily, where you can basically never grow log to nontrivial sizes. Looking at 100 million, the square root is ten thousand and the base-2 log is 26.5. The base-2 log of a googol is 332 (and, just for fun, the square root is 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000).

But I actually wanted to respond to this:

> log(n) grows slower than any polynomial of n, such as sqrt(n) = n^(1/2)

It's correct to say that log(n) grows slower than any positive exponent of n, but what you've said is wrong in two ways:

- f(n) = n^(1/2) is not a polynomial, as a polynomial in n can only feature nonnegative integer exponents of n.

- f(n) = 300 is a polynomial in n, but it grows more slowly than log(n). (It's also larger, for any plausible value of n at all. So constant factors do matter, but the difference between log(n) and sqrt(n) is quite noticeable.)

Plotting the two functions might help in understanding why that's not the case:


Some other algorithms with O(sqrt(n)) runtime: https://www.quora.com/Are-there-any-algorithms-of-the-order-...

Another one is an uniform-sized carry-select adder. It has O(sqrt(n)) delay considering MUX delay << full adder delay IIRC.


if 4 <= N <= 16, then sqrt(N) <= log2(N) else sqrt(N) > log2(N)

So, in Big-O term, this only trying to optimize only small dataset (4 <= N <= 16).

Big-O is only strictly meaningful in terms of asymptotic behavior; as soon as you start talking about specific values of N, you have to care about constant factors.

In this case, the constant factors will probably be quite different because the goal is to make better use of CPU cache.

I mean, it depends on your data access patterns. Arrays are good for some things, trees are good for other things. Here's another option.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact