Hacker News new | past | comments | ask | show | jobs | submit login
What the heck is an xrange? (late.am)
98 points by dcrosta on June 19, 2012 | hide | past | web | favorite | 26 comments

Nitpicking, but this statement doesn't work with large numbers (which xrange() is supposed to handle correctly):

    self._len = int(ceil(float(stop - start) / step))
Python supports arbitrary-length integers; you can't just cast those to (fixed-length) floating point numbers without losing precision. It's better to use integer division here, for example:

    self._len = (stop - start)//step + bool((stop - start)%step)
(A variant like (stop - start + step - 1)//step works only for positive numbers; I guess it could work if you put it in the earlier if-clause and put a corresponding assignment with +1 at the end in the other branch.)

Great catch. If you want to submit a pull request on github you can get the credit -- github.com/dcrosta/xrange

A related nitpick: since Python's integers are arbitrary size, technically operations like - and / are not constant time, so your claim of an implementation "with constant-time and constant-space operations" is not true.

However, I understand if that complaint is just too nitpicky for you to want to mention it. Great post btw.

Thanks for the consideration, but I don't really use github, so you go ahead and take the credit yourself.

OK, thanks. Updated in github and on the blog.

Huh -- CPython 2.x doesn't let you create an xrange with values past 263-1. CPython 3.x does.

Even 63-bit integers aren't (all) representable in IEEE double floating point values (that Python uses) which have a 53 bits mantissa.

For example, int(float(10¹⁸ - 1)) != 10¹⁸ - 1, but xrange(1, 10¹⁸) is perfectly valid (even in Python 2).

edit: how do I type two consecutive asterisks on Hacker News? Backslash doesn't seem to work as an escape character.

If you put two spaces at the beginning of a line, you'll get a monospaced "literal" mode.

  For example, int(float(10**18 - 1)) != 10**18 - 1,
  but xrange(1, 10**18) is perfectly valid (even in Python 2).


Edit: failure. That's two stymied people :(

You can also cast to long, which doesn't lose precision.


As a side-note, here's a related article on implementing a Python generator "for real" using the C API: http://eli.thegreenplace.net/2012/04/05/implementing-a-gener...

I didn't expect to learn much when I started reading this, but I was wrong. Excellent post. Thanks.

I had a _very_ similar question during my Google interview. In the course of the day I was tasked with implementing a generator (although answering with `(x for x in foo)` got a smile I did have to build a class) and later in the day was asked how a sequence manager can maintain constant time.

This is an excellent post and every Python hacker should read it. Kudos to the author.

This is cool. I know this isn't how they actually do it, but seeing it in a language I can understand (Python) and not in C is really pleasing.

You can do tons of examples, but until you figure out how the machine works inside, you'll be completely lost (you could get it with examples, but you won't necessarily know WHY you got it, so you'll be useless in helping other people learn).

Does anyone know how this is implemented in Pypy?

Not worth a pull request, but personally I'd replace:

        if len(args) == 1:
            start, stop, step = 0, args[0], 1
        elif len(args) == 2:
            start, stop, step = args[0], args[1], 1
        elif len(args) == 3:
            start, stop, step = args
            raise TypeError('xrange() requires 1-3 int arguments')

        map = [
                lambda args: (0, args[0], 1),
                lambda args: (args[0], args[1], 1),
                lambda args: args,
            start, stop, step = map[len(args)](args)
        except IndexError:
            raise TypeError('xrange() requires 1-3 int arguments')
It's more DRY, and it conveys the intent better.

I would not do such a change to the if step block since its pattern feels noticeably different: "open" checks fit well in a if/else, whereas bunch-of-equalities fit a dispatch map better (plus you can actually modify the map at runtime).

That imho seems like overcomplicating a relatively straight-forward piece of code without noticeable improvement. In the original code the intent is clear from the first line ("check the number of arguments"), but in your version I have to parse 7 lines of code before I find out that. Also in your code I'm required to keep larger, more complicated state (the map array) in my head when reading it. Your version also looks like it would perform worse than the original. Imho code that looks like it performs suboptimally is code that looks ugly, even if the performance difference in reality would be negligible.

To me your version is less clearer than the explicit if calls:

* Name 'map' for a variable is a poor choice (as it has same name as the python builtin function map)

* Your version has off by one error, it doesn't give correct results when called with a single element list or if the list has three elements:

  >>>mymap = [
                lambda args: (0, args[0], 1),
                lambda args: (args[0], args[1], 1),
                lambda args: args,

  >>>args = [5]

    Traceback (most recent call last)
   # This shouldn't be the case, a list with a single
   # element is a valid input

  >>>args = [1,6,1]

    IndexError   Traceback (most recent call last)

   # This shouldn't be the case, a list with a three
   # elements is a valid input

If only lambda's allowed statements :(.

  >>>mymap = [
               lambda args: raise IndexError,
               lambda args: (0, args[0], 1),
               lambda args: (args[0], args[1], 1),
               lambda args: args,
That aside, this approach would be much slower too.

Answers to http://stackoverflow.com/questions/1482480/xrange2100-overfl... contain several implementations of xrange() in pure Python

Thanks, hadn't seen this.

I cried a little bit on the inside when he said that that the iterator should not be implemented using generators, etc. Writing iterators by hand forces one to turn the iteration code inside out and it can be a painful experience if the logic is anything nontrivial.

But there's a good motivation here: (x)range objects are supposed to be index-able, while a generator (expression) cannot go back once an element has been consumed.

I feel that this kind of combination of lazy evaluation for sequences, together with eager evaluation for imperative code hits a particular sweet spot. Python and Clojure have very nice lazy sequences.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact