

Skip lists: fast PyPy-compatible ordered map in 89 lines of Python - _wmd
http://pythonsweetness.tumblr.com/post/45227295342/fast-pypy-compatible-ordered-map-in-python

======
paxswill
> As many will attest, it’s easy to live life without an ordered map in
> Python, but the moment you need one Python starts to suck really fucking
> hard. This should be built into the language somehow.

OrderedDict [0] was added with Python 3.1, and there's an equivalent (pure
Python) class provided in the docs for backwards compatibility. Just in case
the author reads this, it would make the code nicer to implement the
appropriate "magic" method names that make it act like built-in containers
[1].

0:
[http://docs.python.org/3.3/library/collections.html?highligh...](http://docs.python.org/3.3/library/collections.html?highlight=collections#collections.OrderedDict)

1:
[http://docs.python.org/3.3/reference/datamodel.html#emulatin...](http://docs.python.org/3.3/reference/datamodel.html#emulating-container-types)
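Taking up that suggestion, here is a minimal sketch of a skip-list-backed sorted map. It is not the posted code. It gets the container "magic" methods by subclassing `collections.abc.MutableMapping`, which also provides `in`, `get`, `update` and the rest. The names `SkipListMap`, `iter_from`, `MAX_LEVEL` and `P` are hypothetical. For readability it uses a small `__slots__` node class rather than a memory-saving flat list layout.

```python
import random
from collections.abc import MutableMapping

MAX_LEVEL = 16  # assumed cap on tower height
P = 0.5         # assumed probability of promoting a node one level


class _Node:
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.next = [None] * height  # one forward pointer per level


class SkipListMap(MutableMapping):
    """Hypothetical sorted mapping backed by a skip list."""

    def __init__(self, items=()):
        self._head = _Node(None, None, MAX_LEVEL)  # sentinel, key never compared
        self._level = 1                            # levels currently in use
        self._size = 0
        self.update(items)  # update() comes free from MutableMapping

    def _random_height(self):
        height = 1
        while height < MAX_LEVEL and random.random() < P:
            height += 1
        return height

    def _predecessors(self, key):
        # For every level, the last node whose key is strictly less than key.
        update = [self._head] * MAX_LEVEL
        node = self._head
        for i in reversed(range(self._level)):
            while node.next[i] is not None and node.next[i].key < key:
                node = node.next[i]
            update[i] = node
        return update

    def __getitem__(self, key):
        node = self._predecessors(key)[0].next[0]
        if node is not None and node.key == key:
            return node.value
        raise KeyError(key)

    def __setitem__(self, key, value):
        update = self._predecessors(key)
        node = update[0].next[0]
        if node is not None and node.key == key:
            node.value = value  # existing key: overwrite in place
            return
        height = self._random_height()
        self._level = max(self._level, height)
        node = _Node(key, value, height)
        for i in range(height):  # splice the new node in at each of its levels
            node.next[i] = update[i].next[i]
            update[i].next[i] = node
        self._size += 1

    def __delitem__(self, key):
        update = self._predecessors(key)
        node = update[0].next[0]
        if node is None or node.key != key:
            raise KeyError(key)
        for i in range(len(node.next)):  # unlink at every level it occupies
            update[i].next[i] = node.next[i]
        while self._level > 1 and self._head.next[self._level - 1] is None:
            self._level -= 1  # drop levels that became empty
        self._size -= 1

    def __iter__(self):  # keys in sorted order
        node = self._head.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]

    def __len__(self):
        return self._size

    def iter_from(self, key):
        """Yield (key, value) pairs for keys >= key, in sorted order."""
        node = self._predecessors(key)[0].next[0]
        while node is not None:
            yield node.key, node.value
            node = node.next[0]


m = SkipListMap({"pear": 3, "apple": 1, "fig": 2})
m["banana"] = 5
del m["fig"]
print(list(m))                 # ['apple', 'banana', 'pear']
print("pear" in m, len(m))     # True 3
print(list(m.iter_from("b")))  # [('banana', 5), ('pear', 3)]
```

Only the five abstract methods (`__getitem__`, `__setitem__`, `__delitem__`, `__iter__`, `__len__`) are written by hand; the ABC derives the rest. `iter_from` covers the seek-then-iterate operation that an insertion-ordered dict cannot offer.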

~~~
_wmd
OrderedDict tracks _insertion order_, not _key order_. Additionally, it is not
possible to start iteration from an arbitrary key.
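Here is a small standard-library illustration of that point. `OrderedDict` replays keys in the order they were inserted. Getting key order, or starting from key k, therefore needs a separate sorted index. In this sketch that index is a plain list searched with `bisect`.

```python
from bisect import bisect_left
from collections import OrderedDict

od = OrderedDict()
for key in ["pear", "apple", "fig"]:
    od[key] = len(key)

print(list(od))  # ['pear', 'apple', 'fig']  (insertion order, not key order)

# Key order must be recovered by sorting, and "start at key 'b'" needs a
# binary search over that separate sorted list.
sorted_keys = sorted(od)
start = bisect_left(sorted_keys, "b")
print(sorted_keys[start:])  # ['fig', 'pear']
```

Keeping such a list current with `bisect.insort` costs O(n) per insertion because elements must shift. A skip list closes that gap with expected O(log n) inserts.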

The method and variable names here mostly reuse the names from the original
skip lists paper, although in a real generic implementation, reusing the
mapping protocol would be a nice touch. I posted this more due to the
ridiculous simplicity of implementation: skip lists themselves are far more
worthy of note than my horrid example code.
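Much of that simplicity comes from how a skip list stays balanced. Instead of using rotations, each new node picks its height by repeated coin flips. Below is a sketch of the `randomLevel` procedure from Pugh's paper in Python. The parameters `p` and `max_level` mirror the paper's p and MaxLevel. The defaults and the injectable `rng` argument are assumptions added for illustration and testing.

```python
import random
from collections import Counter


def random_level(p=0.5, max_level=16, rng=random.random):
    """Pugh-style randomLevel: keep promoting while the coin comes up heads."""
    lvl = 1
    while rng() < p and lvl < max_level:
        lvl += 1
    return lvl


# Empirical check: about half the nodes stop at level 1, a quarter at 2, ...
random.seed(1)
samples = 100_000
counts = Counter(random_level() for _ in range(samples))
for lvl in sorted(counts)[:5]:
    print(lvl, counts[lvl] / samples)
```

Roughly a fraction p of nodes reach level 2, p² reach level 3, and so on. That gives expected O(log n) search without any rebalancing code.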

------
_wmd
This implementation may look ghastly at first sight; however, it's worth noting
that:

* The node pointer list is reused for the node structure to save memory; otherwise a minimum of 72 bytes would be wasted per node on CPython, in addition to malloc slack for 2 allocations (the list object itself, and the array of pointers). As it stands, CPython burns about 100 bytes per record, so 'clean' code here could potentially double the memory requirements, on top of the added runtime cost.

* Numeric rather than symbolic indexing gains 5k lookups/second on CPython. It's debatable whether speed hacks like "search(key, IDX_PREV=IDX_PREV, IDX_NEXT=IDX_NEXT)" are uglier than the bare numbers themselves. Using accessor functions to prettify the code also costs quite a lot.
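One rough way to reproduce the trade-off in the second point is to time four ways of reading the "next" slot from a hypothetical list-based node `[key, value, next]`. The layout and the names `IDX_NEXT` and `next_of` are assumptions, not the posted code. No numbers are claimed here. The gaps depend on interpreter and version, and under PyPy's JIT they may largely disappear.

```python
import timeit

IDX_NEXT = 2  # hypothetical symbolic name for the "next pointer" slot

# Hypothetical list-as-node layout: [key, value, next]
node = ["key", "value", ["key2", "value2", None]]


def via_global(n):
    return n[IDX_NEXT]   # IDX_NEXT looked up via LOAD_GLOBAL on every call


def via_default(n, IDX_NEXT=IDX_NEXT):
    return n[IDX_NEXT]   # default bound once at def time; read as a fast local


def via_literal(n):
    return n[2]          # bare number, loaded as a constant


def next_of(n):
    return n[2]


def via_accessor(n):
    return next_of(n)    # "pretty" accessor: one extra Python-level call


for fn in (via_global, via_default, via_literal, via_accessor):
    secs = timeit.timeit("fn(node)", globals={"fn": fn, "node": node}, number=500_000)
    print(f"{fn.__name__:<13} {secs:.3f}s")
```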

~~~
aidenn0
Out of curiosity, what made you pick skip-lists over a trie structure, such
as crit-bit trees?

~~~
_wmd
Primarily because I knew from previous research that they were easy to
implement. Secondarily, because unlike most trees they're amenable to
concurrent update (although this is mostly irrelevant to a Python application)
and I wanted to get some foundational experience. Locking a skip list is very
straightforward, and slightly less intuitive lockless versions also exist.

~~~
mzl
I'm wondering if "slightly less intuitive" is a deliberate understatement or
not :)

I recently read through the lockless concurrent skip list map implementation
in Java (ConcurrentSkipListMap). That code is definitely quite tricky, and the
comments refer to three PhD theses one should read to understand how it works.

------
btilly
Random note.

The "public domain dedication" is the wrong way to make code available for
anyone to do what they want with it. The problem is that under US copyright
law, you and your heirs actually own that copyright, whether you want to or
not. Declaring otherwise has no legal force. Which means that if you change
your mind, or if your heirs feel differently than you do, people can
potentially be sued for copyright violations on that code. Unlikely, but
possible.

By contrast releasing your code under an extremely permissive license has
legal force, and neither you nor your heirs can unrelease it. (Assuming, that
is, that you own copyright to your own work. Sometimes people do not, and do
not realize this...)

~~~
tptacek
Are you sure about this? Hasn't Daniel J. Bernstein been fighting this meme
for years and years? Is he just wrong?

<http://cr.yp.to/publicdomain.html>

~~~
btilly
Given a legal dispute between an informed developer and a copyright lawyer,
caution indicates that I should pay attention to the lawyer.

As Bernstein indicates, Lawrence Rosen argues the other side of this. There is
binding precedent on the question in the 9th Circuit. However, that precedent
is not necessarily binding on other courts, the statute does not provide for
such a mechanism, and we've already seen statutes bring works back under
copyright which had been out of copyright, the most famous example being _It's
a Wonderful Life_. Therefore it is possible that other courts could decide
differently, and it is possible that future copyright legislation or treaties
could alter the legal status of works that have been abandoned to the public
domain.

And this is just the situation in the USA. There are about 200 countries in
the world, with different legal systems. Most have some type of copyright law,
and that law roughly follows international treaties. I am confident that in
the countries with "reasonable" copyright legislation/jurisprudence,
copyright licenses have force. Given that even the US situation is not
entirely settled, I have no confidence that a public domain declaration has
force in other countries. Therefore caution would indicate that a simple
permissive license is preferable to a public domain declaration.

Therefore yes, I would say that the chances of Daniel Bernstein being wrong on
this are good enough that a simple permissive license is preferable to a
public domain declaration.

~~~
tptacek
Public domain is explicitly part of the Berne Convention. Bernstein is, as CS
professors go, atypically engaged with the law in general and copyright
particularly. I wouldn't be too quick to dismiss him.

~~~
btilly
Bernstein may be right. He certainly does care about this issue, and has
researched it. But when there are two almost equivalent approaches, one with
zero risk and the other with minimal risk, why not follow the approach with zero risk?

Incidentally, your Berne Convention argument is rather weak according to my
reading of the actual text of the Berne Convention. (See
<http://www.wipo.int/treaties/en/ip/berne/trtdocs_wo001.html> for the text.)
Public domain is only referred to in Article 18, and the only type of public
domain referred to is due to expiration of the term for copyright. Article 7
indicates the minimum terms in question, and those terms are both long and do
not contain anything indicating that they can be shortened by the author's
wish.

Therefore the fact that public domain is mentioned in the Berne Convention
does nothing to reassure me that countries which sign the Berne Convention
will necessarily pay any attention to a public domain declaration.

~~~
dalke
Bernstein isn't the only one who dedicates software into the public domain.
SQLite is perhaps the most widely used software delivered in that form.

<http://www.sqlite.org/copyright.html>

They recognize that the public domain might not exist in all jurisdictions, so
a licensed version is available, for a fee.

Others have likewise chosen to forgo copyright protection. There's a list of
such software at <http://unlicense.org/>.

I wouldn't look to the Berne treaty for some statement of the international
existence of copyright law. You need to look towards national laws instead.
For example, the US recognizes the public domain, and a work of the United
States government is automatically in the public domain in the US. (Though it
might not be in the public domain elsewhere.)

So like any social movement, if enough people release software and disclaim
copyright protection, then those jurisdictions which don't recognize the
public domain might change. If no software ever takes the risk, then it will
never change.

------
raymondh
FWIW, there has long been another Python skiplist recipe at:
<http://code.activestate.com/recipes/576930/> and it is indexable as well. It
is under an MIT license and runs great under PyPy.
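"Indexable" here means the list can answer "what is the i-th smallest item?" in expected O(log n) steps. Every forward link also records its width, meaning how many bottom-level positions it skips. A lookup walks down the levels, subtracting widths as it goes. The following is a from-scratch sketch of that idea, not the linked recipe's code. `IndexableSkipList`, `insert` and `MAX_LEVEL = 32` are assumptions, and deletion is omitted for brevity.

```python
import random


class _Node:
    __slots__ = ("value", "next", "width")

    def __init__(self, value, height):
        self.value = value
        self.next = [None] * height   # forward link per level (None = end)
        self.width = [1] * height     # bottom-level positions each link spans


class IndexableSkipList:
    """Hypothetical sorted multiset with insert and lookup by rank."""

    MAX_LEVEL = 32  # assumed cap on tower height

    def __init__(self):
        self._head = _Node(None, self.MAX_LEVEL)  # head sits at position 0
        self._size = 0

    def __len__(self):
        return self._size

    def __iter__(self):
        node = self._head.next[0]
        while node is not None:
            yield node.value
            node = node.next[0]

    def __getitem__(self, i):
        if i < 0:
            i += self._size
        if not 0 <= i < self._size:
            raise IndexError("index out of range")
        node, remaining = self._head, i + 1  # item i lives at position i + 1
        for level in reversed(range(self.MAX_LEVEL)):
            while node.width[level] <= remaining:
                remaining -= node.width[level]
                node = node.next[level]
        return node.value

    def insert(self, value):
        chain = [None] * self.MAX_LEVEL    # predecessor at each level
        steps_at = [0] * self.MAX_LEVEL    # positions advanced at each level
        node = self._head
        for level in reversed(range(self.MAX_LEVEL)):
            while node.next[level] is not None and node.next[level].value <= value:
                steps_at[level] += node.width[level]
                node = node.next[level]
            chain[level] = node

        height = 1
        while height < self.MAX_LEVEL and random.random() < 0.5:
            height += 1

        new = _Node(value, height)
        steps = 0  # distance from chain[level] to chain[0]
        for level in range(height):
            prev = chain[level]
            new.next[level] = prev.next[level]
            prev.next[level] = new
            new.width[level] = prev.width[level] - steps
            prev.width[level] = steps + 1
            steps += steps_at[level]
        for level in range(height, self.MAX_LEVEL):
            chain[level].width[level] += 1  # these links now skip one more item
        self._size += 1


s = IndexableSkipList()
for x in [5, 1, 4, 1, 3]:
    s.insert(x)
print(list(s))      # [1, 1, 3, 4, 5]
print(s[2], s[-1])  # 3 5
```

A rank query such as the median is then just `s[len(s) // 2]`.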

