Hacker News new | past | comments | ask | show | jobs | submit login
Brilliant or insane code? (stavros.io)
250 points by StavrosK on Sept 1, 2013 | hide | past | favorite | 107 comments

This is in the zip documentation as the way of solving this problem. Sort of surprised the author didn't look up the documentation before writing what is otherwise a very good post.

    The left-to-right evaluation order of the iterables is guaranteed. This makes
    possible an idiom for clustering a data series into n-length groups using

Recently I was downvoted 2-3 times on S.O. for an answer that was claimed to be non-idiomatic. So, I cleaned it up, but it really irked me. What I had written was totally fine. It shouldn't have hurt anyone's eyes. It was direct. It was in-your-face. It was not magic.

Reading this post brought back that feeling.

If people don't understand a completely valid and terse way of coding something, sometimes instead of bothering to understand it, they will bash it. Sometimes, this is a totally valid way to vent frustration, and then they learn something new, and all is good. But sometimes, it just gets left as "this is wrong" and then someone else thinks it is wrong, and so on. That is wrong, and tech leads or architects that enforce such crap will bug the living shit out of good developers and lose them.

I appreciate clarity. But, terse one-liners can be just as clear if not clearer than code that unnecessarily adds more methods/functions/names/local vars and claims to be "more testable", etc.

You shouldn't have to sacrifice the ability to be terse and clear at the same time. Testing is no excuse for code bloat. You can likely write a test that executes the behavior without having to atomize it. Assess the amount of production and test code you are writing. How much more code are you actually having to write in order to test, both in the tests themselves and in the code which you are having to test?

It happened here with Python and it happens in many languages. It even happens with laws and regulations in government. If someone gets the same thing done just as ethically but without the bureaucracy, just appreciate it as another perhaps better way of doing something. Don't bash it publically because you don't understand it.

I think the mentality can be extended to most code-reading. Don't get it by skimming? Must be crap code. I only got away from that when I changed my litmus test towards whether I could write on top of the codebase successfully, not how it looked. Today my only real point of judgment about the look of code is whether it's written in a style that increases average error rate.

w/r to Python in particular, it has a history of ending up with idioms that are "tricky" and not particularly more or less terse than other techniques, but are able to exploit the standard library functions to get a faster-running result.

This is, of course, at odds with the motto of "there should be only one (obvious) way to do it," so every experienced Python programmer has to internalize a small dictionary of idiomatic one-liners for these exceptional cases. (Fortunately, it's not that big. I can only think of three or four off the top of my head.)

style that increases average error rate

Errors per LOC is allegedly constant, across all languages.

I also consider the cost of change when designing things.

I once created an HL7 wrapper that was a marvelous thing of beauty. Fluent API, clever use of the type system. But no one could maintain it, including me. It had too much magic. So I scrapped it, went with a dumber implementation.

> Errors per LOC is allegedly constant, across all languages.

If this is true, then a more verbose style will have a higher error rate. (more LOC to do the same task -> more errors)

>>Errors per LOC is allegedly constant > then a more verbose style will have a higher error rate.

I think that's what he means.

> Errors per LOC is allegedly constant, across all languages.

This implies that more expressive languages(more expressions per line) are less prone to errors.

You conclusion would be correct if he had stated that error per expression was constant between languages.

Don't confuse correlation with causation. It may not be that more lines is the cause, but that the method of thinking that some languages require you to think in is error prone, and it just so happens that such a language also needs more LoC.

> Don't get it by skimming? Must be crap code.

Well, there is some truth to that. Anecdotally, most lines of code will be read many times before they are changed/discarded, and most of this reading will be skimming, where the reader is either: 1) trying to understand the structure of the code, or 2) trying to figure out where to make modifications.

Code that's hard to understand quickly (e.g. by skimming) is technical debt. I think a good litmus test would include not just the effect on error rate, but also the effect on the time it takes to understand the code and to make changes to it.

claimed to be non-idiomatic

I write specialist-o-matic code. Lotsa DRY, composition, iterators, "fluent" APIs.

Makes me an unapologetically poor general purpose pair programming partner.

terse and clear at the same time

Concision is a virtue.

I think the problem with terse and clear is that, typically, it solves the problem elegantly only in the exact current context in which the code is written. Meaning, as the software progresses, that previous elegant, perfect, concise code is rendered invaluable. In such cases, due to the size of the codebase, it becomes easier/necessary to work around that previously elegant code in the form of if blocks etc... which inevitably leads to crap code.

At least, that is my experience in the world of web-based programming. Scripts and other single-purpose code implementations are another case entirely.

I think what you're talking about is a different problem. Terse and clear code that solves precisely the problem it means to solve is a good thing, provided the overall design of your system is flexible. If you design the proper components for an adaptable system, making each component terse should make maintenance simpler.

Concision is a virtue.

Can be.

Concision is as short as possible, but still conveys the idea. If you don't get the idea from a piece of code then the code is not concise. It's just short. Concision is a virtue.

Or do you mean there's some value in redundancy? I'd be interested in an example. Even java added the <> to avoid the repeated template parameter,

    Foo<Things> x = new Foo<Things>();

    Foo<Things> x = new Foo<>();

You can just use projectlombok.org and

    val x = new Foo<Things>();

By the way, I was recently surprised by a similar pattern in linux, where you can do:

    (echo -e "one\ntwo\nthree\nfour") | paste -d, - -
to get result of:

by exploiting a similar trick, i.e. reading two times ('- -') from the same iterator (STDIN of 'paste')

Thanks for posting this. For reference, paste is a standard UNIX utility, whose purpose is to do all sorts of useful things not limited to 2-tuples:

  $ seq 1 9 | paste - - -
  1	2	3
  4	5	6
  7	8	9

No need for parentheses,

  echo -e "one\ntwo\nthree\nfour" | paste -d, - -
works just fine

I am also surprised I didn't look it up. I guess that I didn't expect that anything more than how zip() works would be there, and I already knew that.

I've updated the post with this, another commenter pointed it out. Thanks!

The fact that this idiom is documented in the official Python documentation makes me cringe, knowing that this code is not what I would call explicit.

This is one of those cases where Ruby does it better (s.each_slice(3).to_a).

Kind of ironic for a language whose motto is "explicit is better".

It does say "possible", not "recommended" or even "sane"...

    i = iter(array)
    return zip(i, i, i)

There you go. All but neceessary magic gone with just one line more.

A much clearer solution, I would also add a comment reminding people how zip works.

Really? What's the point in explaining a standard language function that they should know, and can just Google if they don't remember?

It's using a somewhat-obscure guarantee that doesn't come up in normal usage of the function - namely, that it will always get the iterator values in left-right order.

it depends on the kind of code you write. i guess if you're writing web server stuff, documenting this makes sense. but in maths-related code, it's pretty standard. you use something very similar to transpose matrices, for example.

(and the original article is dealing with coords in graphics, which is "maths-related code" in my book, but perhaps not in everyone's)

The original source is taken from a medical image DICOM viewer. In my limited experience as a medical physics student, the people working with these tools would really benefit from a comment explaining the code. They are most definitely not coders, most of them having barely done anything more than write a few matlab scripts.

This is hardly a typical environment. All code would be challenging, even commented code. Excessive commenting would be necessary: write the program again in English.

Yeah, I was thinking the same thing. The only downside is that it isn't generic for variable length chunks.

It is clearer, but uses a fixed number of arguments, so may be unsuitable for some applications.

The current function has the same restriction. In their case it's in the form of " * 3 ".

That `3` can be trivially replaced with a variable in a way that `(i, i, i)` cannot. But still, the refactoring of the iterator into a separate variable is the key step which reinforces that the same object is being passed 3 times

You don't have to use the same function to solve every problem.

If you only ever need to chunk a list into 3-tuples, sure. If you want a general solution for chunking a list with zip and an iterator, this solution doesn't scale (look at hartror's and bjourne's posts for something that does).

I think that dependency on argument evaluation order inside zip function smells a bit.

It's OK here, but may bite you with a different function.

    The left-to-right evaluation order of the
    iterables is guaranteed.

Yes, but only for zip, which I believe was the parent posters point. With another function this technique might not work.

That is infinitely more readable.

Much, much better.

I'm glad Clojure has top-level support for this operation... it's quite flexible too, and the presence of partition-all makes it explicit what you should expect if the sequence doesn't evenly partition.

    user=> (partition 3 [1 2 3 4 5 6])
    ((1 2 3) (4 5 6))
    user=> (partition 3 [1 2 3 4 5 6 7])
    ((1 2 3) (4 5 6))
    user=> (partition-all 3 [1 2 3 4 5 6 7])
    ((1 2 3) (4 5 6) (7))
    user=> (partition 3 3 (repeat 0) [1 2 3 4 5 6 7])
    ((1 2 3) (4 5 6) (7 0 0))

FAO: iamgopal - your account has been dead for over a year for no discernible reason, only people with showdead on can see your posts.

Haskell also has it in the Data.List.Split library, which is handy, although it's easy to define yourself.

λ> chunksOf 3 [1..12]


From Itertools Recipes [6]:

  def grouper(iterable, n, fillvalue=None):
      "Collect data into fixed-length chunks or blocks"
      # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
      args = [iter(iterable)] * n
      return zip_longest(*args, fillvalue=fillvalue)
- What is the most “pythonic” way to iterate over a list in chunks? [1]

- Idiomatic way to take groups of n items from a list in Python? [2]

- Python “Every Other Element” Idiom [3]

- Iterate an iterator by chunks (of n) in Python? [4]

- How do you split a list into evenly sized chunks in Python? [5]

[1]: http://stackoverflow.com/questions/434287/what-is-the-most-p...

[2]: http://stackoverflow.com/questions/2461484/idiomatic-way-to-...

[3]: http://stackoverflow.com/questions/2631189/python-every-othe...

[4]: http://stackoverflow.com/questions/8991506/iterate-an-iterat...

[5]: http://stackoverflow.com/questions/312443/how-do-you-split-a...

[6]: http://docs.python.org/3/library/itertools.html#itertools-re...

Figured I would add my own [7]. I know the example prompt might seem a bit specific, but it can come in really handy sometimes (for example, iterating through get parameters). Always a fan of idiomatic and one-liner python.

I should mention that I ended up using the fourth version (seemingly the slowest) but it is actually the fastest depending on your input -- as the length of the elements gets larger, the fourth method tends to vastly outperform the others.

[7] http://stackoverflow.com/questions/16685545/elegantly-iterat...

I think the code is pretty ok, except for the stupid name, docstring and that it is a method and not a free function.

    def chunks(seq, n):
        "groups the elements of the seq into a list of n-sized chunks."
        return zip(*[iter(seq)]*n)

The original docstring is a complete red herring... yours is much better, but I'd go one further and include an example.

It does not rely on an implementation detail, that is how iterators work. He's just supplied the same iterator to a function which consumes iterators... that's exactly the expected behaviour.

In fact official python docs say:

"The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip([iter(s)]n)."


The fact that zip evaluates its arguments in order is an implementation detail. It could evaluate them in reverse order, in which case this code would not behave as expected.

That's what I meant, thanks for clarifying. It does seem that the zip() docs make a guarantee of left-to-right evaluation, though, so my apprehension proved unfounded.

You can also do

    zip(arr[::3], arr[1::3], arr[2::3])
which is nearly as fast but doesn't work with iterators. If you want to use iterators you could also do

    zip(islice(arr, 0, None, 3), islice(arr, 1, None, 3), islice(arr, 2, None, 3))
which is a tad slower.

It wouldn't have occurred to me to do this a different way. Isn't this a very basic use of zip()?

The part that annoys me is that the docstring mentions "dictionaries" when I see no `dict`s.

Yeah, looks like a remnant that was never updated.

It's not brilliant.

This accomplishes the same thing without being hard to understand:

    from itertools import islice

    iterator = iter(array)
        while True:
           yield list(islice(iterator, 3))
    except StopIteration:

This crashed my computer (it's an infinite loop that needs too much memory, apparently), and there's a mistake (the i is not defined).

Turns out that islice doesn't raise an IterationError, it just returns an empty list.

Fixing the problems, it runs in 237 μsec per loop, around 23 times more than the zip version.

Haha. I feel very silly now. I was just about to respond with the same thing.

    while True:
        result = list(islice(iterator, 3))
        if not result:
        yield result

That's what I did, I edited my post above. It's around 23x slower, although I had to use a function invocation for %timeit as well, so it's probably a bit faster in practice.

One more for kicks:

    n = iter(array).next
    [(n(), n(), n()) for _ in xrange(len(array) / 3)]

Sure, there are many ways to do it, but I think the author was going for speed here.

I'd be surprised if my way is slower. Any time you unpack into a function such as zip() python has to create an intermediary list to store all the results before calling the function.

I was a bit wrong about that. For some reason I imagined that there would be quite a few args being passed to zip, when in fact there are just the three iterators. It does create a temporary list, but it's so small it's negligible. Using izip wouldn't really change anything.

Exactly, I would be very surprised if the zip version was faster. One of the first steps to optimization in python is moving everything you can to generators and using of itertools.

The OP's question of is this genius or bad is clear in that regard: it is bad, due to not being the proper optimization direction, but it is interesting.

I would be very surprised if it weren't. The zip version has the hot path written in C, the other version has it written in Python with a bunch of exception handling/list assignment code, etc. The original method is just zip(iterator, iterator, iterator), which is probably as fast as anything can be.

That is a common misconception. Moving to iterators adds a function call while list creation in C is quite fast. Every case has to be tested for performance.

I always thought the reason iterators are used in preference to lists was due to the memory advantages, not the performance.

There's also flexibility improvements to do with lazy evaluation, and it makes the Twisted guys not hate you.

The memory advantages are key to performance when you have a non-trivially sized dataset, in my experience.

In my experience, building a non-trivially transformed sequence with the builtin C-backed list and list comprehensions vs iterators and generators, the iterators and generators win.

Maybe I'm doing crazy stuff, though!

Did you test and time the itertools version?

I did not, but I might now.

EDIT: See update above.

like dict(zip( or dict(getmembers(asdf)).keys(), it's idiomatic code. it wouldn't have occurred to me the first time i had to write such a function, but now that i've taken a few moments to read the article i find it clearer than the list comprehension version (because the constant only appears once) and nicer than the numpy version in that it doesn't require an extra dependency.

may save a few keystrokes some rainy day. good post.

Even more interesting to me:

In [3]: ar = [1, 2, 3, 2, 4, 6, 3, 5 ,7, 3, 5, 8]

In [4]: %timeit zip([iter(ar)]3) 100000 loops, best of 3: 2.02 us per loop

In [5]: %timeit zip(ar[0::3], ar[1::3], ar[2::3]) 1000000 loops, best of 3: 1.37 us per loop

In [6]: %timeit zip((iter(ar),)3) 1000000 loops, best of 3: 1.34 us per loop

From which I conclude: - zipping slices is even more efficient, and arguably easier to grok - but you get about the same runtime by multiplying a singleton tuple rather than a list

However if you want to generalize the chunk size, multiplication seems to win out over slicing (with tuples still being more efficient than lists):

In [7]: chunk1 = lambda n, it: zip([iter(it)]n)

In [8]: chunk2 = lambda n, it: zip((iter(it),)n)

In [9]: chunk3 = lambda n, seq: zip(*(seq[i::n] for i in xrange(n)))

In [10]: %timeit chunk1(3, ar) 100000 loops, best of 3: 2.32 us per loop

In [11]: %timeit chunk2(3, ar) 1000000 loops, best of 3: 1.83 us per loop

In [12]: %timeit chunk3(3, ar) 100000 loops, best of 3: 3.55 us per loop

Maybe I'm misreading it, but to me the far more heinous crime than using difficult code (setting aside that it's in the docs) is that the docstring is a total lie.

    """Parses an array of xyz points and returns a array of point dictionaries."""

    'Only, it doesn’t really. It takes an iterable of points...and returns an iterable of 3-tuples of groupped points'
The wrongness of it would cause me to double-take, because even if I were familiar with this usage, it isn't what the comment suggests is happening. A docstring more like:

    """Parses an iterable of values [x,y,z,x,y,z...] and returns an iterable of 3-tuples: [(x,y,z),(x,y,z)...]"""
Would be a lot more clear by simple virtue of truth, even if it didn't explain the code step by step.

I made a few more interesting (to me) measurements. As always, you have to measure your performance with your actual input data to see what's "best".

Test 1: Boring, small array of integers

    In [28]: arr = range(0, 300)

    In [29]: %timeit [(arr[3*x], arr[3*x+1], arr[3*x+2]) for x in range(len(arr)/3)]
    10000 loops, best of 3: 27.2 us per loop

    In [30]: %timeit numpy.reshape(arr, (-1, 3))
    10000 loops, best of 3: 45.2 us per loop

    In [31]: %timeit zip(*([iter(arr)]*3))
    100000 loops, best of 3: 6.25 us per loop
This roughly matches the article's timing ratios, so far so good.

Test 2: Use numpy's random number generation to get a small array of floats

    In [32]: arr = numpy.random.ranf(300)

    In [33]: %timeit [(arr[3*x], arr[3*x+1], arr[3*x+2]) for x in range(len(arr)/3)]
    10000 loops, best of 3: 54 us per loop

    In [34]: %timeit numpy.reshape(arr, (-1, 3))
    1000000 loops, best of 3: 1.06 us per loop

    In [35]: %timeit zip(*([iter(arr)]*3))
    10000 loops, best of 3: 39.7 us per loop
numpy is two orders of magnitude faster here; it's evidently using a highly optimized internal codepath for random sequence generation, which I'd guess is a common thing to do in numeric analysis. I assume it's using a generator, so there's no actual array being created, blowing up the CPU cache lines etc.

Test 3: Verify that analysis by interfering with numpy

    In [36]: arr = [x for x in numpy.random.ranf(300)]

    In [37]: %timeit [(arr[3*x], arr[3*x+1], arr[3*x+2]) for x in range(len(arr)/3)]
    10000 loops, best of 3: 26.2 us per loop

    In [38]: %timeit numpy.reshape(arr, (-1, 3))
    10000 loops, best of 3: 48.5 us per loop

    In [39]: %timeit zip(*([iter(arr)]*3))
    100000 loops, best of 3: 6.55 us per loop

Test 4: Larger data set, no interference

    In [40]: arr = numpy.random.ranf(3000000)

    In [41]: %timeit [(arr[3*x], arr[3*x+1], arr[3*x+2]) for x in range(len(arr)/3)]
    1 loops, best of 3: 624 ms per loop

    In [42]: %timeit numpy.reshape(arr, (-1, 3))
    1000000 loops, best of 3: 1.06 us per loop

    In [43]: %timeit zip(*([iter(arr)]*3))
    1 loops, best of 3: 335 ms per loop
The numpy time doesn't change at all from test 2 despite the larger size, but the others suffer. Again, I suspect numpy is being intelligent here; my guess is that it doesn't actually apply the function and generate the real output, it just wraps the random generator in another one.

Test 5: Larger data set, interfering with numpy

    In [44]: arr = [x for x in numpy.random.ranf(3000000)]

    In [45]: %timeit [(arr[3*x], arr[3*x+1], arr[3*x+2]) for x in range(len(arr)/3)]
    1 loops, best of 3: 321 ms per loop

    In [46]: %timeit numpy.reshape(arr, (-1, 3))
    1 loops, best of 3: 354 ms per loop

    In [47]: %timeit zip(*([iter(arr)]*3))
    10 loops, best of 3: 83.6 ms per loop
There we go; we're back to roughly the original timing ratios.

So, surprise! You always have to measure. Measure, measure measure. My bias is to write code first for legibility and modifiability, and then optimize hot spots if needed (and add comments, please, when you do so).

Without doing deeper analysis I'd say one moral of the Python story is, this shows the potential power of generators. But in real-world data sets this isn't always ideal -- is it faster to load up the whole data set in memory and blast through it, or load it from disk on demand with a generator? In really high performance scenarios, is it faster to preprocess the data to fit into the CPU's cache lines? You can't tell without measuring, and you have to measure in the environment you're deploying to, since the answer may be different on a machine with 1GB RAM vs. one with 128GB RAM, or 32KB L1 cache vs. 8KB.

The numpy example becomes fast when you use numpy arrays. Try %timeit numpy.array(arr); numpy.reshape(arr, (-1, 3));

and then just %timeit numpy.array(arr), you'll see that the reshape takes no time at all. Type conversion from python list to numpy array is what kills the performance.

A point of clarification here - numpy's reshape operation stays fast as long as the array is a numpy array.

Which is exactly what the parent comment was all about - the author figured that the reason numpy was significantly faster was because it was accessing / working with the data in a different fashion.

So, in order to test that theory, he converted the numpy.array into a normal python array before he proceeded to do any timed operations with zip vs. numpy.reshape, etc.

This is a more realistic playing field if you're considering data that was created outside of the numpy environment. At some point, if you're going to work with numpy.reshape, it will need to be type converted / "imported" into numpy data types.

For the purposes of this test, it's much more "fair" to include both the time numpy spent on splitting the array as well as that conversion time. The reshape process in numpy had essentially O(1) time with native data types indicating that it had done some behind the scenes work that allowed for such speed. The parent example is much more realistic in capturing the time of the behind the scenes work by forcing each method to start from the same exact same data objects.

My reply was in response to the statement "numpy is two orders of magnitude faster here; it's evidently using a highly optimized internal codepath for random sequence generation", which is false, it's not because of highly optimized internal codepaths for random sequence generation, it's because the code produced a numpy array (or didn't have to do type conversion). But I agree, when using numpy to produce a timing comparison, it would be fair to start with a numpy array, or to show the time involved in the creation of the array.

Thanks for your comments. I hope it didn't sound like I was negatively comparing numpy's array/sequence operations to anything. I know very little about numpy, and I assume that "real" numpy solutions don't look anything like what's being discussed here. I only included those measurements since the article's author did.

To clarify my points a bit, the optimizations I alluded to (in "highly optimized internal codepath") were meant to include things like using a generator, i.e. at no point is there an actual array of input random numbers. The fact that in numpy the 300-element "array" and the 3,000,000-element "array" had identical timings suggests exactly that; I disagree that it's an issue of internal representation, unless the concept of a numpy array subsumes the concept of a generator, in which case I think we're all saying the same thing.

That kind of optimization is only possible in this case because by the definition of randomness nobody could know what the values were until they were enumerated, so it's 100% transparent to use a generator. That's not how real-world data works, hence my forced-native-array measurement and pudquick's reply.

Since you're looking at alternatives... you might as well include: zip( arr[0::3], arr[1::3], arr[2::3] ) It's pretty fast, especially if the array is pre-allocated rather than a generated iterator.

We all love short and fast. But this is definitely an interesting approach. I'd love to see similar approaches to problems if you guys can point out to some.

You might like the Python Infrequently Answered Questions: http://norvig.com/python-iaq.html (Even if it's a bit old.)

Thanks for that. An enjoyable read. Shame it's so out of date now.

I would also, I love this sort of thing. How about:

    >>> some_boolean = False
    >>> ["Thing 1", "Thing 2"][some_boolean]
    "Thing 1"

Yeah, but unlike the hack in the article, the more readable version of that code,

  "Thing 1" if some_boolean else "Thing 2"
is also almost twice as fast(775 vs 1340 ns, on my machine).

I was more posting it for the "cute hack" value, rather than speed. Speed was significant in the post because the author needed their library to be fast.

the problem with this expression is that it doesn't "short circuit". Both "Thing 1" and "Thing 2" (in this cases object, but the can be function calls) are evaluated before the "some_boolean" usage.

That can be an advantage in some cases, however. Not generally, though.

In particular, if you want the side effects of both operations. Trying to write multiple things while checking if any of them failed, for instance.

Short but I doubt this will be faster as noted.

Insane, because it relies on the zip implementation detail. If you cared about a measly factor of 4 in performance you wouldn't be using python anyway.

Any developer can care about performance. And should never be mocked for achieving above 200% performance compared to an alternative implementation... Let alone 400% performance compared to the alternatives.

If that 400% performance improvement was achieved by sacrificing code readability, on a project that has deliberately chosen to prioritize readability over performance, then yes they should be mocked. Or at least told not to do it.

It's not an implementation detail, the order is guaranteed by the spec.

> by the spec

by the implementation you mean.

See: http://stackoverflow.com/questions/1094961/is-there-a-python...

As mentioned by others, the documentation for zip explicitly mentions this as a Python idiom: http://docs.python.org/2/library/functions.html#zip

My vote: Code is unreadable crap written to be discarded later. Spend some time typing it out. You're writing for the x # of people (including the author) that will have to read it countless times in the future. I would be better if they didn't have to think about the "cleverness" when they come across it. Unless of course, you're participating in an obfuscated code contest.

The pairwise recipe function in itertools demonstrates just that (using tee instead of multiplying the iterator):


  def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

I wouldn't call it insane, neither brilliant. I use this function to split a sequence into pairs (or triplets, or fourths, etc):

  def paired(t, size=2, default=None):
    it = iter(t)
    return itertools.izip_longest(*[it]*size, fillvalue=default)
I use it in a formatter which outputs alphabetized data in columns, where the order should run down the columns instead rowwise.

If it's actually faster, the speed actually matters, and you wrap it with a well commented explanation and descriptive name, I'd consider it reasonable. Otherwise, just write out what you're doing. It's definitely clever, but fewer lines of code is not an optimization.

My question is, at what point does it become incumbent on the language to provide a more meaningful idiom for this kind of expression? I think the point at which constructs like this leak into your official documentation is a pretty good line to start thinking about it. :)

>"there should be one — and preferably only one — obvious way to do it"

Oh. Well. OK.

An aside: the animated expansion of the code blocks make the site feel twitchy

How do you mean?

Brilliant and insane?

Hah, yep, I hadn't considered it can be both.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact