

Skip Lists are Fascinating - mbowcock
http://igoro.com/archive/skip-lists-are-fascinating/

======
barrkel

        for (int R = _rand.Next(); (R & 1) == 1; R >>= 1)
    

For C# on .NET, this may work fine, but in general, this is a bad way of
extracting a 0/1 choice from a pseudo-random number generator (PRNG). Many
PRNGs use linear congruential generators, which are just a multiply and an
add, and have highly predictable low bits as a result (frequently just a
repeating pattern with a short period, as short as 4 or 8).

Safer:

        while (_rand.Next(2) == 0)

Or even simply read the bits off the other end.
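To make the low-bit problem concrete, here is a minimal Python sketch. It assumes a toy LCG with a power-of-two modulus (the constants are the well-known Numerical Recipes ones, picked only for illustration; this is not what .NET or Java actually use). `lcg` and `height_from_high_bits` are hypothetical helper names. With these constants the lowest bit simply alternates, and the lowest three bits repeat with period 8, which is why counting ones from the high end is the safer choice.

```python
# Toy LCG: state = (A * state + C) mod 2**32.
# Constants are illustrative (Numerical Recipes), not those of any real runtime.
A, C, M = 1664525, 1013904223, 2**32


def lcg(seed, count):
    """Yield `count` successive 32-bit states of the toy LCG."""
    state = seed
    for _ in range(count):
        state = (A * state + C) % M
        yield state


def height_from_high_bits(word, max_height=32):
    """Node height = 1 + number of leading 1 bits in a 32-bit word."""
    height = 1
    mask = 1 << 31
    while word & mask and height < max_height:
        height += 1
        mask >>= 1
    return height


states = list(lcg(seed=12345, count=16))
low_bits = [s & 1 for s in states]    # lowest bit of each output
low_three = [s & 7 for s in states]   # lowest three bits of each output

print("lowest bit:       ", low_bits)
print("lowest three bits:", low_three)
print("heights (high end):", [height_from_high_bits(s) for s in states])
```

Drawing heights from `low_bits` would give every other node the same height class; the high-order bits of the same generator do not have that short cycle.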

Another nice thing about skip lists is that they are relatively easy to make
into cheap persistent structures (aka functional structures), that is,
structures where operations return a new immutable copy of the structure, but
share most of the substructure with the previous version.

~~~
leif
_anorwell is right, I'm wrong:_

Actually, his technique is _way_ faster (1 random generation per element, vs.
log(n) or log(u)), but you're absolutely right, you need to take the higher
order bits first or your skiplist will be very far from correct.

(P.S. I'd love to have you reviewing my code ;-)

~~~
anorwell
> 1 random generation per element, vs. log(n) or log(u)

The expected height of an element is 2, and the maximum height is 32. So there
is at most a constant-factor performance difference, and in practice we should
find this constant to be small.
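The "expected height is 2" figure can be checked exactly. The sketch below assumes each extra level is added with probability 1/2 and that heights are capped at 32, as stated above; `height_distribution` is a hypothetical helper name.

```python
from fractions import Fraction

MAX_HEIGHT = 32  # cap quoted in the comment above


def height_distribution(max_height=MAX_HEIGHT):
    """P(height = k) when each extra level is added with probability 1/2."""
    probs = {k: Fraction(1, 2**k) for k in range(1, max_height)}
    # Whatever probability mass is left lands on the capped top level.
    probs[max_height] = 1 - sum(probs.values())
    return probs


probs = height_distribution()
expected = sum(k * p for k, p in probs.items())

# The cap shaves off only 2**-31, so the expectation is 2 for all practical purposes.
print(float(expected))
```

So drawing one coin flip per level costs about two PRNG calls per insert on average, versus one call for the bit-counting trick.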

It's not clear whether any small performance gain is worth the loss of clarity
in presentation of the algorithm and code.

~~~
Xk
Not only that, but if you were to write this in (say) Java, which uses a
linear congruential random number generator, doing it this way would probably
be even faster due to the higher quality random numbers. The least significant
bits on a linear congruential generator aren't terribly random.

------
mayank
They are indeed fascinating, and Erik Demaine's MIT OCW lecture on it is
amazing [1]. The analogy between the NYC subway system (express and local
trains) and skip lists is brilliant.

[1] http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-12-skip-lists

~~~
dougabug
Terrific lecture. A bit leisurely for the first hour (he probably could've
covered that in half the time), but then he progressively realizes he's short
on time and accelerates through the probabilistic analysis of the runtime.
Still pretty easy to follow at warp speed near the end, because he is extremely
clear. For instance, you can tell when he makes a simple 1/x mistake in a
couple of places (he catches himself), since the explanation that goes along
with the calculation is crystal clear. Nice job.

------
jbapple
I don't find skip lists to be simpler than AVL trees, but it's my
understanding I'm in the minority on this issue.

What I do find very interesting about skip lists is that they support fingers
- pointers to locations in the structure that allow fast modification nearby.
As a very simple example, prepending an item to a skip list (think cons) is
O(1) expected.
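A minimal sketch of why prepending is O(1) expected, assuming a singly linked skip list whose head sentinel plays the role of the finger at the front. The class and method names are hypothetical. The new node only splices itself in at its own levels, and the expected number of levels is 2, so no search is needed.

```python
import random

MAX_LEVEL = 32


class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # forward[i] = next node at level i


class SkipList:
    def __init__(self, rng=None):
        self.head = Node(None, MAX_LEVEL)  # sentinel; acts as the front finger
        self.rng = rng if rng is not None else random.Random()

    def _random_height(self):
        height = 1
        while height < MAX_LEVEL and self.rng.random() < 0.5:
            height += 1
        return height

    def prepend(self, key):
        """Insert a key no larger than the current minimum, without searching."""
        first = self.head.forward[0]
        if first is not None and key > first.key:
            raise ValueError("prepend requires key <= current minimum")
        node = Node(key, self._random_height())
        # Rewire one pointer pair per level of the new node (expected: 2 levels).
        for i in range(len(node.forward)):
            node.forward[i] = self.head.forward[i]
            self.head.forward[i] = node
        return len(node.forward)

    def minimum(self):
        """Smallest key: always the first node on the bottom level."""
        first = self.head.forward[0]
        return None if first is None else first.key

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]


sl = SkipList(random.Random(1))
for key in [5, 4, 3, 2, 1]:
    sl.prepend(key)
print(list(sl), sl.minimum())
```

The same bottom-level pointer is what makes reading the minimum a constant-time operation.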

Getting this property for AVL or red-black trees is possible, but much more
difficult, and requires fundamental changes to the structural invariants and
representations.

For more on the uses of finger search, see:

<http://www.cs.au.dk/~gerth/pub/finger05.html>

~~~
sesqu
I find AVL trees simpler as well, and probably a good deal more performant.

The nice thing about skip lists, beyond being mostly simple, is that min() is
O(1), which is far more useful than the middling value at the root of AVL
trees.

That you can finger-append in O(u) is misleading, since ideally u≃log n and
finger operations are amortized O(1) in AVL trees (I know it's been shown
empirically, not sure if analytically).

~~~
jbapple
> That you can finger-append in O(u) is misleading, since ideally u≃log n

What's finger-append?

> finger operations are amortized O(1) in AVL trees (I know it's been shown
> empirically, not sure if analytically)

Do you have a citation for that? I thought delete was Theta(lg n) expected,
even from a finger.

~~~
sesqu
By finger-append I meant that skip lists are normally singly linked, and so
you can append to a sublist but not prepend nor insert into the implied
subtree.

The O(1) wasn't amortized, it was expected. The citation is P. L. Karlton, S.
H. Fuller, R. E. Scroggs and E. B. Kaehler, Performance of height-balanced
trees. Comm. ACM 19, 1 (1976), 23-28.

~~~
sesqu
I should also note that it's not very surprising that AVL restructurings show
up as expected O(1) empirically, since many AVL trees are BB-trees, and it's
been shown for BB-trees under the assumption that the root approximates the
median - which is certainly true in most experimental settings. So it's quite
possible the expectation isn't O(1) for pathological distributions, not to
mention sequences.

------
tdmackey
They are very interesting but in practice often perform poorly due to the CPU
not being able to figure out what to cache.

~~~
cpeterso
Yes. I have read that big skip lists have poor data locality because the skips
jump at unpredictable (probabilistic) times to memory that is probably on a
different page.

------
elbenshira
I think the original paper is very readable:
ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf

------
kamechan
i've always preferred treaps to skiplists as far as probabilistic data
structures are concerned. that could be partially due to the fact that i found
aragon and seidel's paper on treaps to be fundamentally better than pugh's
paper on skiplists, which i recall being kind of hand-wavy with respect to the
analysis.

i've had to implement both and replace many of the java collections interfaces
with them as the backing store for various projects or coursework. i find the
structure of the skiplists intricate and fascinating as a thought experiment,
but i feel they are difficult for certain things, like implementing an iterator
over them. treaps, if i recall correctly, just use the normal BST traversals.

would be curious to hear comparisons on the two, being probably the most
popular of the probabilistic data structures.

performance wise, i've found both to have their strengths and weaknesses under
various load testing scenarios. i have some stats around here somewhere.

of course the downside with any probabilistic data structure is that you're
counting on the amortized bounds, but could end up with the absolute worst
case performance at times. there are so many well-documented and well-
implemented libraries out there for red-black trees (the gold standard in my
opinion) that it's hard to find compelling reasons besides curiosity to use
them in practice.

the original papers for both of them are here (treaps):

<http://faculty.washington.edu/aragon/pubs/rst89.pdf>

and here (skiplists) :

ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf

~~~
gjm11
> of course the downside with any probabilistic data structure is that you're
> counting on the amortized bounds

No. With typical probabilistic data structures you're counting on the
_average_ bounds. If you have good amortized bounds and that's what you care
about, you don't need to bother with the randomization. Per-operation versus
amortized and worst-case versus average-case are orthogonal distinctions.

~~~
kamechan
yes, sorry. that's what i meant. long day. wait, it's still not over.

------
AndrewO
IIRC, Redis uses skip lists in its implementation of sorted sets. Good rundown
of the data structure.

~~~
antirez
Exactly, Redis uses augmented skiplists so that we can support the rank
operation in O(log(N)).
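A rough Python sketch of the augmentation idea, not Redis's actual code: each forward pointer also stores a span, the number of bottom-level steps it skips, so summing spans along the search path yields the rank. `RankedSkipList` and its methods are hypothetical names, and keys here are plain comparable values rather than score/member pairs.

```python
import random

MAX_LEVEL = 16


class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height
        self.span = [0] * height  # span[i] = bottom-level steps covered by forward[i]


class RankedSkipList:
    def __init__(self, seed=0):
        self.head = Node(None, MAX_LEVEL)
        self.level = 1
        self.length = 0
        self.rng = random.Random(seed)

    def _random_height(self):
        height = 1
        while height < MAX_LEVEL and self.rng.random() < 0.5:
            height += 1
        return height

    def insert(self, key):
        update = [self.head] * MAX_LEVEL  # last node before `key` on each level
        rank = [0] * MAX_LEVEL            # rank[i] = position of update[i]
        node = self.head
        for i in range(self.level - 1, -1, -1):
            rank[i] = 0 if i == self.level - 1 else rank[i + 1]
            while node.forward[i] is not None and node.forward[i].key < key:
                rank[i] += node.span[i]
                node = node.forward[i]
            update[i] = node
        height = self._random_height()
        if height > self.level:
            for i in range(self.level, height):
                self.head.span[i] = self.length
            self.level = height
        new = Node(key, height)
        for i in range(height):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
            # Split the old span around the new node.
            new.span[i] = update[i].span[i] - (rank[0] - rank[i])
            update[i].span[i] = (rank[0] - rank[i]) + 1
        for i in range(height, self.level):
            update[i].span[i] += 1  # taller links now skip one more node
        self.length += 1

    def rank(self, key):
        """1-based rank of `key`, or 0 if it is absent."""
        node, position = self.head, 0
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key <= key:
                position += node.span[i]
                node = node.forward[i]
        return position if node is not self.head and node.key == key else 0


rsl = RankedSkipList(seed=42)
for key in [50, 20, 40, 10, 30]:
    rsl.insert(key)
print([rsl.rank(k) for k in (10, 20, 30, 40, 50)])
```

The rank query follows the same path as an ordinary lookup, so it inherits the O(log N) expected cost.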

------
oniTony
An interesting supplemental reading is the "two bowling balls" interview
puzzle: http://20bits.com/articles/interview-questions-two-bowling-balls/

It's essentially a two-level skip list (but could be generalized so that you
get one extra level per bowling ball), and it brings out some basic calculus to
optimize lookup performance even further.
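A small sketch of the two-level analogy, under stated assumptions: the first ball jumps a fixed stride (the "express" level), the second walks one floor at a time (the "local" level), and the building has 100 floors. The function names are hypothetical. The worst case behaves roughly like n/s + s, which the calculus argument minimizes near s = sqrt(n).

```python
def drops_needed(n_floors, stride, breaking_floor):
    """Drops used by the fixed-stride strategy for one breaking floor.

    `breaking_floor` is the lowest floor at which a ball breaks;
    n_floors + 1 means the ball never breaks.
    """
    drops = 0
    safe = 0  # highest floor known to be safe
    floor = min(stride, n_floors)
    # Ball 1: express level, jumping `stride` floors at a time.
    while True:
        drops += 1
        if floor >= breaking_floor:
            break  # ball 1 broke; the answer lies in (safe, floor]
        safe = floor
        if floor == n_floors:
            return drops  # the ball survived the top floor
        floor = min(floor + stride, n_floors)
    # Ball 2: local level, one floor at a time inside the bracket.
    for f in range(safe + 1, floor):
        drops += 1
        if f >= breaking_floor:
            break
    return drops


def worst_case(n_floors, stride):
    """Maximum drops over every possible breaking floor."""
    return max(drops_needed(n_floors, stride, b)
               for b in range(1, n_floors + 2))


for stride in (5, 10, 20):
    print(stride, worst_case(100, stride))
```

This covers only the fixed-stride version; the linked puzzle goes further by varying the stride.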

------
richcollins
The creator of Io created a kv db based on skip lists called skipdb that is
worth a look.

