
Show HN: Array with Constant Time Access and Fast Insertion and Deletion - igushev
https://github.com/igushev/IgushArray
======
malisper
For those reading along, the post describes a vector-like datastructure that
has O(1) access time and O(sqrt(N)) insert and deletion time _at an arbitrary
index_. The idea is pretty clever. The datastructure maintains sqrt(N)
circular arrays each of size sqrt(N). For reference, indexing into a circular
array has O(1) access time, and insertion and deletion at the ends are O(1).

For accesses, you can in constant time determine which circular array contains
the element at the given index (it's simply _i / sqrt(N)_, with offset _i %
sqrt(N)_ within that array) and then in constant time access the element from
the underlying array. For inserts and
deletions, you find the circular array that contains the location you want to
insert into. First you make sure there is space in the array to insert. You do
this by moving one element from each circular array to the next one. Since
deleting and inserting from the end of a circular array takes O(1) and there
are O(sqrt(N)) arrays, this takes a total of O(sqrt(N)) time. Then you insert
the new element into the middle of the designated circular array which is of
size sqrt(N) so it takes in the worst case O(sqrt(N)). This means insertions
take a total of O(sqrt(N)) time.
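The access and insert mechanics described above can be sketched in a few lines
of C++. This is a simplified illustration, not the IgushArray code: it uses
std::deque in place of real circular buffers, fixes the block size up front,
and ignores growth and error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Minimal sqrt-decomposition sketch: blocks of at most B elements each.
// std::deque stands in for a real circular buffer; its push/pop at either
// end is O(1), which is the property the cascade below relies on.
struct SqrtArray {
    std::size_t block_size;              // B, ideally ~sqrt(capacity)
    std::vector<std::deque<int>> blocks; // each block holds at most B elements

    explicit SqrtArray(std::size_t b) : block_size(b) {}

    // O(1) access: block index is i / B, offset within the block is i % B.
    int at(std::size_t i) const {
        return blocks[i / block_size][i % block_size];
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (const auto& b : blocks) n += b.size();
        return n;
    }

    // O(sqrt(N)) insert: insert into the target block (O(B) shift inside
    // it), then cascade the overflow element from the back of each full
    // block to the front of the next -- O(1) per block, O(B) blocks total.
    void insert(std::size_t i, int value) {
        std::size_t bi = i / block_size;
        if (bi >= blocks.size()) blocks.resize(bi + 1);
        blocks[bi].insert(blocks[bi].begin() + (i % block_size), value);
        for (std::size_t j = bi; blocks[j].size() > block_size; ++j) {
            if (j + 1 == blocks.size()) blocks.emplace_back();
            blocks[j + 1].push_front(blocks[j].back());
            blocks[j].pop_back();
        }
    }
};
```

The names and structure here are only meant to mirror the description in the
comment above; the real implementation packs its deques differently and
handles growth.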

As immawizard pointed out, there is a generalized version of this idea called
tiered vectors[0] that supports an arbitrary level of nesting. A 1-tiered
vector is a circular array. A k-tiered vector is an array of n^(1/k) tiered
vectors of tier (k-1). You can show that for a k-tiered vector, access time is
O(k), while insertion and deletion have a runtime of O(n^(1/k)). The
datastructure mentioned in the post can be considered a 2-tiered vector.
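The O(k) access bound comes from splitting a logical index into one digit per
tier, each extracted with a constant-time div/mod. A hedged sketch (the
function name and the fixed base are illustrative, not from the paper):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// With branching factor base = n^(1/k), a logical index decomposes into k
// digits in that base, one per tier -- a constant amount of work per level,
// hence O(k) access overall.
std::vector<std::size_t> tier_path(std::size_t index, std::size_t base,
                                   std::size_t k) {
    std::vector<std::size_t> path(k);
    for (std::size_t level = k; level-- > 0;) {
        path[level] = index % base;  // digit selecting the child at this tier
        index /= base;
    }
    return path;
}
```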

The post includes benchmarks comparing the datastructure to std::vector. I
would be interested in seeing benchmarks vs a binary search tree. Even though
the datastructure has O(sqrt(N)) performance, that's still a lot slower than
O(log(N)). The square root of a million is 1000, while the log base 2 of a
million is only ~20.

One nitpick is that the author names the datastructure after themselves.
Naming things after yourself is typically a faux pas.

[0]
[https://www.ics.uci.edu/~goodrich/pubs/wads99.pdf](https://www.ics.uci.edu/~goodrich/pubs/wads99.pdf)

~~~
igushev
Thanks for feedback! Any naming suggestions? :-)

~~~
Stratoscope
This is fun, let's have a naming party!

Here's my nomination:

Loop List

~~~
Mvhsz
This seems like the winner. Although I would tweak it to looped list

~~~
Izkata
That just sounds like a circular linked list. They were probably going for
"list of loops", so I think the ambiguity here makes it a bad choice.

------
immawizard
I believe this data structure is called Tiered vectors, and is described in:
[https://www.ics.uci.edu/~goodrich/pubs/wads99.pdf](https://www.ics.uci.edu/~goodrich/pubs/wads99.pdf)

~~~
igushev
I need to read the paper, but at a high level it indeed looks similar.

------
dwohnitmok
There's some interesting pointers for how to improve your data structure here:
[https://stackoverflow.com/questions/10478260/looking-for-
a-d...](https://stackoverflow.com/questions/10478260/looking-for-a-data-
container-with-o1-indexing-and-ologn-insertion-and-dele/10519240#10519240)

There's also some benchmarks for another implementation of the same idea
linked to in the README here: [https://github.com/mettienne/tiered-
vector/blob/master/READM...](https://github.com/mettienne/tiered-
vector/blob/master/README.md). That implementation also just punted on growing
the maximum capacity of the array.

------
birdbrain
I don't want to detract from the cleverness here, but I believe your
benchmarks could use some work. Here are a few suggestions:

1\. Simply testing something 1000 times and (presumably) presenting the
arithmetic mean is not very informative. Looking at the detailed reported
benchmark times (in the output file in tests), it looks like many of the
timing outcomes have high variance. Rather than running the tests 1000 times
and taking the mean, you might consider running 10 batches of 100 tests (or
1000, if you can) and presenting the mean and variance of the resulting
distribution. In general, k sample groups each of size p will provide more
reliable information about the underlying distribution than one sample group
of size k*p (for reasonable k and p, obviously).

2\. Related to that, the results of the "inserting a number of elements" and
"deleting a number of elements" tests are significantly worse for the tiered
vector vs the std::vector than the "insert/delete a single element" tests. You
don't mention this in the readme, but thinking about why it is might be
informative. Thrashing seems like a possible explanation, and one you might be
able to mitigate.

3\. Are you making sure your cache is warm before starting to measure
performance? (Pardon, I didn't look through every line of your tests.)
Particularly for std::vector, and likely your intermediate deques too, this
will have a big effect on timing.

4\. Finally, it looks like you're primarily testing using ints (?). It would
probably be a good idea to see if your results hold for a different payload
size.

I don't know whether these will improve or worsen your comparison against
std::vector, but they will make your claims more robust.
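The batching idea in point 1 might look something like this; `measure`, the
workload, and the batch counts are all placeholders, not anything from the
repository's test harness:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Run `batches` batches of `iters_per_batch` iterations each, record one
// per-iteration timing per batch, and report the mean and variance of the
// batch means instead of a single overall average.
struct BatchStats { double mean; double variance; };

BatchStats measure(const std::function<void()>& workload,
                   std::size_t batches, std::size_t iters_per_batch) {
    std::vector<double> samples;
    samples.reserve(batches);
    for (std::size_t b = 0; b < batches; ++b) {
        auto start = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < iters_per_batch; ++i) workload();
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        samples.push_back(elapsed.count() / iters_per_batch);
    }
    double mean = std::accumulate(samples.begin(), samples.end(), 0.0) /
                  samples.size();
    double var = 0.0;
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= samples.size();
    return {mean, var};
}
```

Reporting both numbers makes the high variance visible instead of hiding it
in a single mean.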

~~~
jnordwick
He's going to have cache issues since he requires one extra memory lookup.

His lookup should be roughly twice as expensive as a regular array's when the
cache is cold: it will be dominated by the cache misses, two vs one.

With a hot cache, the array lookup becomes a single load op of a few cycles,
and even if he gets two cache hits, his will probably be about 6-10 ops with 2
loads, a div, and a few more ops to get the index of the sub bucket.

On one cache hit and a possible miss on the other (eg his top level index is
in cache, but the bucket might not be), he's going to be getting high variance
in his lookup timings. OP: to help the iterator access (sum), you might be
able to prefetch (eg force a read of) the following bucket before you need it.
When you move over bucket 2, trigger a fetch of bucket 3 at the same time.
You might be able to hide some of that cache miss latency then.

Also, with the DEQs you are potentially starting at the middle of a slab and
having to wrap around to its front. That isn't going to prefetch well, so you
might want to fetch both the next slab's base address and the address it
logically starts at, since those can be different. Just try to hide as much
cache miss latency as you can, because that is probably where you are getting
killed in the iterator access.

------
panda88888
Correct me if I am wrong, but I think the insertion/deletion time complexity
is incorrect. It should be O(N). If each DEQ is implemented using an array
instead of a list, wouldn’t the insert/delete operation of IgushArray still
take linear time? Since popping or pushing (pick one) from an array-backed
DEQ still takes linear time, that means, let’s say insertion is at position
N/2, then elements N/2 ... N-1 still need to be moved to the right by one?
Popping/pushing is only constant if the DEQ is implemented with a linked
list, but that would mean linear access time instead of constant.

Update: the answer is to implement the DEQ with a circular buffer for O(1)
pop/push.

~~~
igushev
Insertion and deletion in a DEQ is indeed linear, but of all the DEQs in the
structure only one (!) needs a middle insertion; the rest just pop/push at
the front/back.

~~~
flafla2
> the rest are just popping/pushing front/back.

Sure, but this step would require a right shift for (in the worst case) every
element in the structure. Suppose you have the following DEQ:

    
    
        [R0] -> [0, 1, 2]
        [R1] -> [3, 4, 5]
        [R2] -> [6, 7, 8]
    

And you want to remove 4. From my understanding of your README, the structure
will look like this after the operation:

    
    
        [R0] -> [0, 1, 2]
        [R1] -> [3, 5, 6]
        [R2] -> [7, 8, _]
    

This was _not_ an O(sqrt(n)) operation, because you had to move elements 5,
6, 7, and 8 ( _not_ just 5 as suggested by your figures). So in general you
still need to move O(n) elements upon any insertion or deletion.

EDIT: Just read some of the other comments and realized that this can be done
with a circular buffer with a bit more memory than the subarrays themselves.
Makes sense + clever! @OP, I really think you should clarify this important
implementation detail in your writeup.

~~~
kadoban
5 and 6 do need to move. 7 and 8 can stay where they are. In R2 you'd change
the variable that stores where the front of the DEQ is to point to 7, but 7
and 8 themselves stay put.
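A minimal circular buffer makes this concrete: pop_front() only advances the
head index, so the elements behind it never move. This is an illustrative
sketch, not the IgushArray deque:

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity circular buffer. After pop_front(), the remaining elements
// (7 and 8 in the example above) stay in their physical slots; only the head
// index moves, which is what makes the operation O(1).
struct RingBuffer {
    std::vector<int> data;
    std::size_t head = 0, count = 0;

    explicit RingBuffer(std::size_t capacity) : data(capacity) {}

    void push_back(int v) {
        data[(head + count) % data.size()] = v;
        ++count;
    }
    int pop_front() {              // O(1): advance head, move no elements
        int v = data[head];
        head = (head + 1) % data.size();
        --count;
        return v;
    }
    int at(std::size_t i) const {  // map logical index to physical slot
        return data[(head + i) % data.size()];
    }
};
```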

------
igushev
Interesting that some people found an article about tiered vectors; I didn't
find it back then. I implemented this structure many many years ago and also
published it in the Bauman Moscow State Technical University journal back in
2012:
[http://engjournal.ru/articles/101/101.pdf](http://engjournal.ru/articles/101/101.pdf)
(in Russian)

~~~
dwohnitmok
Tiered vectors have been around for a while. The paper for them came out in
1998. They've just always been rather underappreciated. I only found out about
them through this thread too.

I don't think any of us mean to imply that you copied off another
implementation; independent discovery happens all the time! It's great you've
made this a usable C++ library and have contributed your own thoughts on the
structure!

It looks like you're having difficulty with some of the resizing. The two
array approach described here
[https://stackoverflow.com/questions/10478260/looking-for-
a-d...](https://stackoverflow.com/questions/10478260/looking-for-a-data-
container-with-o1-indexing-and-ologn-insertion-and-dele/10519240#10519240)
makes resizing a lot easier. You just double the size of the main array and
increase the size of the offset array by the square root of two, without
needing to fiddle around with anything else. You also get unchanged amortized
time bounds.

~~~
igushev
I also found out about tiered vectors only in this thread. At the time of
implementation I tried to Google for something like this but didn't find
anything.

------
asdfasgasdgasdg
This is very similar to what std::deque does under the covers already. It's a
chunked array. The main difference AFAICT from std::deque is that this offers
n^1/2 insertion and deletion within the body of the array. This guarantee is
provided by allowing for the possibility of significant overallocation in the
chunks themselves, if many modifications happen in the middle of the array. On
the other hand, std::deque does not overallocate by as much -- just a chunk at
the beginning and a chunk at the end. But as a consequence it has to move
trailing elements when an element is inserted or deleted in the middle.

Interesting!

~~~
igushev
DEQs also allocate internal arrays with a hardcoded size only (8 in std, if I
remember correctly)

~~~
asdfasgasdgasdg
At least in libc++ it's the greater of 4kB or 16 * sizeof(value_type). That's
the default, it is configurable. 8 would be a pretty unfortunate design
choice.

------
juliusmusseau
I'd like to see the performance of growing the list (a common scenario):

    
    
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 0; i < N; i++) {
          list.add(i);
        }
    

(Forgive me, Java is my mother tongue).

~~~
kadoban
I believe this case is very badly handled in this data structure
implementation. It seems to be saying that you want to occasionally manually
recreate the DS as it grows bigger, as the deqs won't automatically resize. So
you'll just have linear behavior in that case, with extra constant factors
because the deq logic is pure overhead.

I suspect an amortized data structure could automatically and internally do
this rebalancing operation, but I'd have to work out the scheme and the
analysis. At a guess, something like normal dynamic arrays do, where you
multiply the max size by a constant when it fills up, would give you decent
amortized bounds.
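One way to sketch that growth policy (the name and doubling constant are
illustrative, not from the IgushArray code): double the block size whenever
the element count outgrows block_size squared, so the O(N) rebuild is
amortized over the Theta(N) insertions that triggered it.

```cpp
#include <cstddef>

// Given the current block size B and the element count after an insertion,
// return the block size to rebuild with: the smallest power-of-two multiple
// of B with B*B >= new_count. When this grows, a one-time O(N) rebuild
// redistributes elements into the larger blocks; since that happens only
// after Theta(B*B) insertions, insertion stays amortized O(sqrt(N)).
std::size_t next_block_size(std::size_t current_b, std::size_t new_count) {
    std::size_t b = current_b ? current_b : 1;
    while (b * b < new_count) b *= 2;  // double until capacity suffices
    return b;
}
```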

~~~
igushev
Correct. I need to add automatic restructuring. Currently the structure is
sensitive to this and benefits a lot from using the reserve() method

------
meuk
This is cool. I wanted to comment on the necessity of knowing the approximate
size of N a priori, but this is very clearly stated on the github page.

Interesting alternatives that also support insertion, deletion, and lookup (is
there a name for this interface?) are AVL trees and hash tables.

As described in
[https://arxiv.org/pdf/1711.00275.pdf](https://arxiv.org/pdf/1711.00275.pdf),
tiered vectors can be improved further:

"We present a highly optimized implementation of tiered vectors, a data
structure for maintaining a sequence of n elements supporting access in time
O(1) and insertion and deletion in time O(n^ε) for ε > 0 while using o(n)
extra space."

~~~
igushev
It seems I need to add automatic restructuring to the structure, so it would
maintain the ideal DEQ lengths

------
carl8
I wonder how this would compare to Judy arrays. Would anyone like to benchmark
this?

[https://en.wikipedia.org/wiki/Judy_array](https://en.wikipedia.org/wiki/Judy_array)
[http://judy.sourceforge.net](http://judy.sourceforge.net)

To install the Judy C lib:

    
    
      brew install judy
      apt-get install libjudy-dev
    

Then compile with -lJudy and #include <Judy.h>

Example:
[http://judy.sourceforge.net/doc/JudyL_3x.htm](http://judy.sourceforge.net/doc/JudyL_3x.htm)

~~~
proverbialbunny
Interesting. I wonder how similar a Judy array is to an RRB-tree (relaxed
radix balanced tree), or if they're the same thing.

An RRB-tree has effectively O(1) time for every operation. Scala's Vector
type implements one.

If interested about the data structure:
[https://youtu.be/sPhpelUfu8Q](https://youtu.be/sPhpelUfu8Q)

Scala bigO doc: [https://docs.scala-
lang.org/overviews/collections-2.13/perfo...](https://docs.scala-
lang.org/overviews/collections-2.13/performance-characteristics.html) Scala's
doc on it: [https://github.com/nicolasstucki/scala-rrb-
vector/blob/maste...](https://github.com/nicolasstucki/scala-rrb-
vector/blob/master/documents/RRB%20Vector%20-%20A%20Practical%20General%20Purpose%20Immutable%20Sequence.pdf)

------
jnordwick
Why would you choose this over a Van Emde Boas tree, where indexing, insert,
and delete are O(log log) and prev/next are constant?

[https://en.wikipedia.org/wiki/Van_Emde_Boas_tree](https://en.wikipedia.org/wiki/Van_Emde_Boas_tree)

(it basically trades a little more work on indexing for a little looser
structure)

I like the use of the deque, but having to keep all the sub vectors packed
to ensure constant time lookup might not be completely needed if you are
willing to keep a little more information around or have two candidate
buckets. Not sure if it would be a win or not though, since then you're
probably just better off using a VeB tree.

e.g., I don't think you always need to compress everything back to the front
if you can make sure all the buckets are within 1 of each other, but you can
hit cases where you have to look in two buckets, which isn't that difficult
since you know the high and low of each bucket at the head and tail of each
DEQ.

But then you would need to keep some local coloring info to keep the buckets
perfectly balanced, as everything should be.

------
kccqzy
Doesn't really seem much better than just a balanced size-tagged binary tree.
You get O(log n) access, insertion, and deletion. Yes access is slower but
insertion and deletion is much faster. Remember that for even one billion the
logarithm is merely 30. That's hardly anything. Whereas the square root is
more than 30 thousand.

~~~
igushev
That sounds like trees are silver bullets, but they're not, and that's why we
have a variety of basic data structures.

~~~
kccqzy
I'm not saying they are a silver bullet but personally from experience I think
they are a good choice in the majority of scenarios that programmers
encounter. Of course our experiences could differ.

------
michaelrpeskin
Is there a reason you didn’t just use std::deque? You’re already in C++.

~~~
kadoban
std::deque has O(n) insertion/deletion in the middle. Or do you mean replacing
the internal deques with std::deque?

~~~
temac
> std::deque has O(n) insertion/deletion in the middle

I'm curious to know from approximately which n this is a problem though. It
can be easy to make something with a better O complexity that is actually
slower for what computers can actually handle.

edit: nevermind, I thought deque was more "smart" than it is but it needs to
move all _elements_ when inserting / deleting in the middle, so it won't have
a great constant factor.

~~~
proverbialbunny
I believe a std::deque is an rb-tree (radix binary tree), which suffers from
insertions in the middle.

An rrb-tree solves this problem by being relaxed, which means allowing for
having random empty places in the cache line sized arrays within the
structure. This way insertion is effectively O(1).

Modern languages implement these, like Scala, but C++ has been around for a
while, so the STL is not the fastest in a few ways. std::deque is one of
them. Another example: std::sort isn't the fastest, I believe.

~~~
kadoban
I think the typical implementation of a std::deque is actually something like
a vector of pointers to fixed-size arrays of T (except the outer one can't be
a std::vector exactly, because then you'd only get amortized bounds on
pushes).
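That layout can be sketched as a "map" of fixed-size blocks, where indexing is
two lookups: map[i / B] then block[i % B]. A simplified illustration (real
implementations use raw block pointers and track a start offset for O(1)
push_front; the names here are mine, not from any standard library):

```cpp
#include <cstddef>
#include <vector>

// Deque-like indexing skeleton: a top-level map of fixed-size blocks of T.
// Blocks are reserved to capacity B up front so they never reallocate once
// created, which is the property a real deque needs for stable references.
template <typename T, std::size_t B = 16>
struct BlockMap {
    std::vector<std::vector<T>> map;  // stand-in for pointers to blocks

    void push_back(const T& v) {
        if (map.empty() || map.back().size() == B) {
            map.emplace_back();
            map.back().reserve(B);  // block storage fixed from here on
        }
        map.back().push_back(v);
    }
    const T& operator[](std::size_t i) const {
        return map[i / B][i % B];  // two loads: map entry, then element
    }
};
```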

------
bt848
It says in bold type at the beginning "The IgushArray class fully implements
std::vector interface" but it's clear it cannot provide an equivalent of
std::vector::data.

~~~
igushev
There is a disclaimer about that:
[https://github.com/igushev/IgushArray#limitations](https://github.com/igushev/IgushArray#limitations)

------
mises
Side-question: is there any way to explicitly tell github whether a file is C
or C++? I have this same issue, where C code is marked as C++ and vice-versa.
Not a huge issue, but a bit annoying. It seems a bit weird that linguist
(github's tool for doing language recognition) marks a file with namespaces in
it as C, which is obviously not supported.

------
Labo333
All of those operations are achievable when replacing sqrt with log by using
Skip lists
([https://en.wikipedia.org/wiki/Skip_list](https://en.wikipedia.org/wiki/Skip_list)),
and probably with Zip trees as well
([https://arxiv.org/abs/1806.06726](https://arxiv.org/abs/1806.06726)).

That being said, the overhead of a complex data structure might multiply the
running time by a factor of up to ~10 (mostly because of cache misses), so
simpler structures will be more performant for short inputs.

There is a more general pattern in most tree data structures where you can
transform the log(n) recursive operations on the tree into sqrt(n) operations
on a simpler structure with 2 levels.

I happen to have described the solution to a problem where you must use a
sqrt-decomposition datastructure to have updates in O(1) and queries in
O(sqrt(n)) because O(log(n)) everywhere is not good enough as there are a lot
of updates to do ([https://tryalgo.org/en/2017/09/01/path-
statistics/](https://tryalgo.org/en/2017/09/01/path-statistics/)).

------
bnolsen
The performance numbers would be most interesting if he benchmarked against
std::vector, std::list, std::multiset and std::deque just to see how it
compares to all the major collections. Using different items, say std::string,
int64_t and a big POD struct might fill in a few things too.

------
superphil0
If the deqs have to be fixed at some length k beforehand, but with many
insertions k can become much smaller than n^0.5, I think you might end up
with a much worse worst case for insertion? Something on the order of O(n)
rather than O(n^0.5).

Or there has to be a readjustment of k.

Can somebody point out my thinking error?

~~~
igushev
Yes, the structure is sensitive to that and benefits a lot from using
reserve(). I need to add automatic restructuring to maintain the ideal k.

------
Thorrez
The Push Front time could be improved to O(1)* by making it circular
(assuming "*" means amortized). But as-is it does provide a better comparison
to an array; a circular IgushArray would be better compared to a circular
array (which an std::vector isn't).

~~~
igushev
That's a good point. That would make computation of indexes much more
complicated though

------
ummonk
You could do this with O(N^1/K) for arbitrary K, using K layers of
indirection, no?

This is similar to how you can do O(logN) for insertion / deletion & access
using an indexed B-tree, except you're using a fixed height instead of a fixed
branching factor.

~~~
baiwan123
True. If you go k levels deep you get indexing complexity of O(k) (because you
do a constant time operation for each level) and insertion/removal complexity
of O(k + n^(1/k)) (because that’s indexing plus a linear insert/remove
operation for the array at the bottom). If you push k towards log n you get
O(log n) for both. (I’m ignoring rebalancing issues here, but these can be
dealt with too.) The article did mention that you can push k to values larger
than 2 in the Generalization section. I think it would’ve been nice if they
had explored such values when doing the performance tests, but either way I
found it clear and nicely written.

~~~
ant6n
For insert/remove you also need to move the last element up for each deque
with a higher index, and if I understand it correctly, there should be
O(N^(2/3)) (i.e. N^1/3 * N^1/3) such deques.

------
enriquto
cool! looks like from that, you could maybe

1\. extend the construction to three dimensions, so that the insertion cost is
a cubic root?

2\. iterate the extension to an increasing number of dimensions, so that the
cost becomes logarithmic?

~~~
igushev
I mentioned that extension in the published paper, but the complexity of
access becomes O(d), where d is the number of dimensions.

------
JulianWasTaken
The benchmarks compare this to standard arrays, but I'd be curious to see how
tiered vectors stack up against persistent vectors (either Clojure's or
Python's [pyrsistent's])

------
jason0597
As someone who doesn't know much about this level of CS, can someone explain
to me why inserting/deleting an element in a linear array is expensive?

~~~
kmbriedis
Are you a software developer? No judging, just curious

~~~
jason0597
Turns out it was a misunderstanding on my part. I interpreted the word
"insert" as "replace" and I was wondering why replacing an element in a linear
array would be an expensive operation. And no, I am not a software developer,
I'm a chemical engineering student. Though I've done a bit of work on Nintendo
3DS hacking by writing tools for various exploits, and also bit of STM32
tinkering ([https://github.com/jason0597](https://github.com/jason0597))

------
nayuki
Your idea appears to build upon the concepts in the
[https://en.wikipedia.org/wiki/Hashed_array_tree](https://en.wikipedia.org/wiki/Hashed_array_tree)
.

Alternatively, here's my list ADT implementation which has O(log n)
access/insertion/deletion time: [https://www.nayuki.io/page/avl-tree-
list](https://www.nayuki.io/page/avl-tree-list) .

------
piccolbo
sqrt(N) can be described as "fast" only with significant amounts of spin
added. Use ropes instead.
[https://en.wikipedia.org/wiki/Rope_(data_structure)](https://en.wikipedia.org/wiki/Rope_\(data_structure\))

------
szemet
I had a hard time parsing O(N^1/2)... ;)

( O(N^(1/2)), O(N^0.5), O(sqrt(N)), O(N¹ᐟ²), O(√N̅) would have been better...
)

------
Glyptodon
Reminds me of a skip list.

------
known
Nothing beats !a[$0]++

------
not_kurt_godel
My somewhat naïve question is: if indeed this is an objectively superior data
structure as claimed, why hasn't anyone else thought of it before in the
history of computer science? Extraordinary claims of devising a radically
better implementation of a fundamental data structure require extraordinary
evidence that A. the idea is indeed novel and B. it doesn't come with hidden
tradeoffs that are only advertised in the fine print. The work seems
impressive, and maybe it is somehow a completely new genius idea no-one has
thought of, but whatever the case it needs to much more heavily reference
existing literature to credibly make the barely-without-qualifiers claims
contained in the README. (And to be clear: I actually hope that such
references bear out the claims, because if so - awesome! But in the meantime:
"trust but verify")

Edit: curious about the downvotes - care to explain? I feel like this is a
legitimate and fairly uncontroversial take, but happy to hear differing
perspectives...

~~~
asdfasgasdgasdg
The downvotes are probably because your question could be asked of any new
thing. "If this is so great, why didn't someone do it before? Therefore it is
probably not so great." I mean you could literally paste that same comment
with almost no modification as a response to almost anything new thing posted
on this site. That's a good signal that the comment is not helpful.

Also, I don't think the author claimed it's "objectively superior." It has
objectively superior asymptotic performance in certain relatively niche
workloads. However, its constant time factor is iffy due to all the pointer
following and calculations you have to do for random accesses. And the
benchmark results in the README show this clearly -- both the strengths and
the weaknesses.

Actually, your whole train of thought around the claims in the README is
puzzling to me. You seem to think they're grandiose and over the top, but the
README seems mostly explanatory to me. What statements in there do you think
ought to be qualified, specifically, and how?

If you want upvotes next time, explain in detail what claims you object to,
and why. This comment I'm afraid did not add much to the discussion, at least
for me.

~~~
not_kurt_godel
Thanks for your explanation.

> It has objectively superior asymptotic performance in certain relatively
> niche workloads. However, its constant time factor is iffy due to all the
> pointer following and calculations you have to do for random accesses. And
> the benchmark results in the README show this clearly -- both the strengths
> and the weaknesses.

I got that impression after going through a fair bit of the README. My feeling
is neither the post title nor the introductory statements in the README
reflected these very relevant qualifications to the claim of "An Array with
Constant Time Access and Fast Insertion and Deletion", and putting the onus on
the reader to dig through the whole thing to discover this is either naïve or
a bit disingenuous. As others have noted, and as I implied in my comment,
there is in fact existing literature that captures the particular ideas and
tradeoffs in this implementation - Tiered Vectors. I believe the author has a
duty to cite that up front and prominently in the spirit of honest academic
discourse.

~~~
adwn
> _As others have noted, and as I implied in my comment, there is in fact
> existing literature that captures the particular ideas and tradeoffs in this
> implementation - Tiered Vectors. I believe the author has a duty to cite
> that up front and prominently in the spirit of honest academic discourse._

What "academic discourse"? This is a datastructure implementation with some
documentation in a Github repo, not a submission to a scientific journal.
You're insinuating a level of rigor that is completely unwarranted.

Just lean back, relax, and accept that you overreacted a bit in your original
post. It's not the end of the world to admit that.

