Hacker News new | past | comments | ask | show | jobs | submit login
What Cannot be Skipped About the Skiplist (arxiv.org)
130 points by todsacerdoti 10 months ago | hide | past | favorite | 21 comments



Section 2.3, "Analysis of p" is incomplete. The ideal value is 1/e. In most practical applications, you want to start with setting it to 1/e, which is a good speed/memory tradeoff. I tried for years to get Redis and some widely-used libraries to update their default values for everyone's benefit, as this one of my favorite data structures.

Ref: http://www.sciencedirect.com/science/article/pii/03043975940...


Another interesting data structure related to skiplists but not mentioned here are zip trees: https://arxiv.org/abs/1806.06726

The are a tree-based version of skiplists and thus more suited to functional programming / immutable datastructures.


Zip trees are novel but their performance (and therefore also skip lists, since they are isomorphic) lacks behind other linked structures like Treaps and especially LBSTs. [1] I personally find skip lists to be overhyped binary search trees in disguise.

[1] https://rtheunissen.github.io/bst


Absolutely trees in disguise, in fact there are deterministic versions of skip lists that are equivalent to B-trees. An 1-2 skip list (where each pointer to next can skip from 1 to 2 pointers on the level below) is isomorphic to a 2-3 tree.


I always felt like Pugh was pointedly ignoring the cost of inconsistently sized data structures and felt like more transparency there was necessary. It's been a really long time since I dug into them though so I could be out of date.

I'd also like to see better investigations into Treaps balanced not for fairness but for average access time, so that more frequently used values are faster to look up than uncommon values.


Zip trees are great!

For a project I made a version that uses the memory location of the entries to construct the (random) rank on the fly.

So it’s a binary tree structure that requires the same memory as a linked list (two pointers) only!

https://github.com/open62541/open62541/blob/master/deps/zipt...



I don't know anything about zip trees, but I'd think in a common scenario the memory addresses are reasonably consecutive. Would that be a problem? If not, why not just assign consecutive ranks in the first place?


The address is scrambled (similar to a random-number generator) to produce the rank. So consecutive locations do not hurt performance.


First sentence of the abstract: "Skiplists have become prevalent in systems."

They have? I've always viewed them as a curiosity. They're a ton easier to understand than balanced trees and have the same performance behavior, which sounds great. But the allocation mess[1] makes them lose to RB or AVL trees in, basically every system I can think of. Is any major software using skiplists as a standard ordered container or map?

[1] You either need to pay for Log2(N) pointers per item, or allocate them from a heap with variable header sizes. Both of those choices are really pessimal when compared with fixed-size metadata. Skiplists pretty much can't be intrusive, for example.


> Is any major software using skiplists as a standard ordered container or map?

Java for concurrent navigable maps.

https://docs.oracle.com/en/java/javase/21/docs/api/java.base...

> All Known Implementing Classes: ConcurrentSkipListMap

Balancing binary search trees suffer from lock contention more than skip lists.


> Balancing binary search trees suffer from lock contention more than skip lists.

Hm... I guess the argument would be that the various list insertions can be independently synchronized? Certainly lookup is going to be a r/w lock or whatever and basically a wash. I vaguely buy that but would want to see numbers.

But that said, the hash made of the heap due to the variable size nodes and lack of intrusivity is going to have exactly the opposite effect for any high performance implementation. Maybe Java doesn't play in that sandbox, I guess.


Garbage collectors are really nice for concurrent data structures, making lock-free algorithms very practical. Java's skip list does not require any locks for write, and requires essentially no synchronisation at all for reads (just suitable platform-dependent memory barriers). I believe reads are also wait free.

There is a locality penalty for lookups, although I don't think this is core to skip-lists, just an impracticality of the Java language and how you can use its standard libraries. The variable size of the nodes is not a problem for the Java heap, due to how compacting garbage collectors work.


SingleStore (née MemSQL) uses a skiplist as its foundational in-memory data structure.


Pretty sure I learned about skip lists as a lock-free data structure that is relatively easy to reason about (but it was a while ago and I’m not finding confirmation in a quick search).


Skip lists are relatively simple to make lock-free, while lock-free (even unbalanced) binary search trees are an absolute nightmare.

https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/...


The full subtitle is "A Survey of Skiplists and Their Applications in Big Data Systems."


the title makes this review sound a lot less comprehensive than it really is. it covers basically every variant of pugh's skip list ever published, so you can in fact skip most of them

but this is the paper you want if you need to look up what variants exist of, say, the interval skip list, and how they compare. as it happens, that's exactly what i needed today

gold


Arguably HNSW indexes for vector DBs are a variant of skiplists.


I thought arXiv papers were all (experimentally) available as HTML for accessibility reasons now, but I can’t find the html link here.

Ref: https://blog.arxiv.org/2023/12/21/accessibility-update-arxiv...


From your link:

> [...] arXiv is now generating an HTML formatted version of all papers submitted in TeX/LaTeX [...]

This paper has been submitted as a PDF blob rather than buildable TeX source, it seems. (Otherwise there’d also be a “TeX Source” link on the left.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: