Hacker News new | past | comments | ask | show | jobs | submit login
Succinct Dynamic Data Structures (2001) [pdf] (imsc.res.in)
26 points by tjalfi 38 days ago | hide | past | favorite | 7 comments

Is there a good applied overview of succinct data structures for programmers (non CS researchers)?

Also I just started researching succinct dynamic graph data structures a few days ago and ended up on a very interesting paper if anyone is interested: https://arxiv.org/abs/1911.03195

Will also add this, lecture introducing succinct data structures: https://youtu.be/3Y2weLDiUWw

Ok last edit to this comment but I’m sure someone else will find this useful. I also found a good book (haven’t read it yet, but it doesn't seem to have much code examples if any) that covers Succinct Graph data structures and also Succinct Dyanmic Data structures: https://www.google.com/books/edition/Compact_Data_Structures...

Here's the abstract.

We develop succinct data structures to represent (i) a sequence of values to support partial sum and select queries and update (changing values) and (ii) a dynamic array consisting of a sequence of elements which supports insertion, deletion and access of an element at any given index.

For the partial sums problem on n non-negative integers of k bits each, we support update operations in O(b) time and sum in O(logb n) time, for any parameter b, lgn/ lg lg n ≤ b ≤ nƐ for any fixed positive Ɛ < 1. The space used is kn+o(kn) bits and the time bounds are optimal. When b = lgn/ lg lg n or k = 1 (i.e., when we are dealing with a bit-vector), we can also support the select operation in the same time as the sum operation, but the update time becomes amortised.

For the dynamic array problem, we give two structures both using o(n) bits of extra space where n is the number of elements in the array: one supports lookup in constant worst case time and updates in O(nƐ) worst case time, and the other supports all operations in O(lg n/lg lg n) amortized time. The time bound of both these structures are optimal.

I've looked into succinct data structures in the past [0] and not been totally satisfied with the stated time/space complexity (singling out the linked example because a lot of the literature is built on something similar).

Picking on the O(1) time complexity for rank queries, if the superstructure is constant sized then surely each block would have super-constant size and thus couldn't be walked in O(1) time, but conversely if the superstructure has super-constant size it itself can't be walked in O(1) time. In all cases the runtime of such a rank operation is greater than O(1).

Would somebody be willing to ELI5 what I'm missing?

[0] https://alexbowe.com/rrr/

There is no need to walk a superblock because they all have the same structure. You can pre-compute offests, similarly to what you would do to access a field within an array of structs.

Also, at the cost of sounding like the CS equivalent of flat earthers, big-O notation is kind of broken, and specifically, it doesn't make a lot of sense to say that something is O(1). Even hashmap queries, if you think about it, can't really be O(1) for arbirarily large values of N.

The general interpretation of O(1) in theoretical CS literature is "we know it's bugged, but we promise we won't bugabuse".

Thank you for that explanation! I apologize; I totally agree that you don't need to walk the superblocks. I don't know why I wrote that instead of what I was really thinking:

- The vector represents a subset of N values, hence requires Z=N bits.

- The blocks themselves require at least Z bits to store, so to be succinct the superblocks must be o(Z) in space.

- Consequently, there are at most o(Z) superblocks.

- It follows that there are at most o(Z) blocks (since otherwise you would have a super-constant number to walk per block).

- Thus, each block takes up super-constant space (and the popcount component must increase in size as well).

- Your rank lookups require at least O(popcount-size) time, which is super-constant.

I think you're right though that the problem I was seeing just falls back to the general "we know it's bugged, but we promise we won't bugabuse" idea -- no function returning distinct values for N different inputs on a fixed-width register machine can execute in less than log(N) time, and I probably should have just stopped there and called it a day.

You can make a model where the presented algorithm runs in O(1) time, for example a transdichotomous model. However those models have very unintuitive consequences (for example, the O(n log n) bound for sorting doesn't hold any longer!), which is what I was hinting to with my "big O is broken" remark.

You are absolutely right that it can't be done on a fixed-width register machine, though.

Generally speaking, most papers assume arithmetic is O(1) time and integers fit in O(1) space because keeping track of the actual time and space of integer arithmetic gets unwieldy very quickly. But this model is super broken: if arbitrary arithmetic can be performed in O(1) then P = PSPACE. The "gentlemen's agreement" assumption is that operations that make numbers grow too big are forbidden.

I don't know much about this stuff (I'm way out of my depth as a matter of fact), but I'm glad I could be useful :)

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact