
Fast Python loops - dorsatum
https://www.python.org/doc/essays/list2str/
======
cjhanks
Or you could just use PyPy and see these numbers:

    
    
        ('f1',)  0.328
        ('f2',)  0.525
        ('f3',)  0.269
        ('f4',)  0.297
        ('f5',)  0.041
        ('f6',)  0.188
        ('f7',)  0.103
    

Turn into these:

    
    
        ('f1',)  0.06
        ('f2',)  0.08
        ('f3',)  0.115
        ('f4',)  0.063
        ('f5',)  0.027
        ('f6',)  0.065
        ('f7',)  0.024

------
srssays
According to archive.org this essay is from _at most_ June 2006, which would
put it at Python 2.4 or earlier. The specific performance characteristics of
Python will have changed greatly in the intervening period.

~~~
jwilk
The last sentence (grep for "since this essay was written") suggests that the
article was written before the 'B' typecode was added to the array module.

This typecode was added in Python 1.5.

------
tyingq
I wonder if bytearray([97,98,99]).decode('latin-1') is faster still.

Edit: Yup, it is. 3x faster.
[https://gist.github.com/anonymous/18e372e8d0173e77b5c405920d4d3080](https://gist.github.com/anonymous/18e372e8d0173e77b5c405920d4d3080)
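
For anyone who wants to reproduce the comparison locally, here's a sketch
using the stdlib timeit module (exact numbers depend on machine and Python
version; tobytes() is the Python 3 name for tostring(), and the decode step
is added so both approaches produce a str):

```python
import timeit

# array-based approach from the essay, adapted to Python 3
t_array = timeit.timeit(
    "array('B', data).tobytes().decode('latin-1')",
    setup="from array import array; data = list(range(256))",
    number=100_000,
)

# bytearray-based approach suggested above
t_bytearray = timeit.timeit(
    "bytearray(data).decode('latin-1')",
    setup="data = list(range(256))",
    number=100_000,
)

print(f"array:     {t_array:.3f}s")
print(f"bytearray: {t_bytearray:.3f}s")
```

Both build the same 256-character string; bytearray skips the intermediate
array object, which is where the speedup comes from.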

~~~
sametmax
Was thinking exactly the same. Plus you save an import. And no need to wrap it
in a function, since it's a short method call on a built-in. Would
bytearray([97,98,99]).decode('ascii') be even faster?

~~~
tyingq
It would fail when you pass it 128. He's passing in values from 0-255. "Ascii"
is sort of a misnomer in his post.
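
A quick demonstration of the difference (latin-1 covers all 256 byte values,
ascii only 0-127):

```python
# 'ascii' only covers code points 0-127, so byte 200 raises
try:
    bytearray([200]).decode('ascii')
except UnicodeDecodeError as exc:
    print("ascii failed:", exc)

# 'latin-1' maps every byte 0-255 straight to the code point
# with the same number, so the full range round-trips cleanly
assert bytearray([200]).decode('latin-1') == chr(200)
assert bytearray(range(256)).decode('latin-1') == "".join(map(chr, range(256)))
```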

~~~
jstimpfle
It's better to fail than to convert something that probably isn't latin1!

~~~
mark-r
The latin1 codec doesn't fail for any values in the 0-255 range. It probably
_should_ , since there are values that don't map to a valid character. I don't
know if this is deliberate and guaranteed or just an artifact of the current
implementation. There should have been a 'byte' encoding that was explicitly
made for 1:1 conversions.

------
jMyles
A big problem here, though, is that

> array.array('B', list).tostring()

is not terribly readable or beautiful.

In fact, I think I'd need a comment to explain that it's converting the ints
in a list into a string.

Optimizations like this are an identity crisis for Python (and have been for a
long time, as evinced by the age of this essay): Is it focused on being human-
readable and otherwise compliant with the Zen of Python?

Or is it focused on exposing high-performance variants for every way of doing
something?

...and while it may be tempting to say that the answer is "both," this removes
"one obvious way" as a lingual basis for Python.

Ultimately, there needs to be a reconciliation wherein these high-performance
variants are folded back in (even if it means cheating in the implementation)
to the idiomatic expressions.

Type hints and the evolving async syntax are additional considerations in this
arena.

I know that this has been a topic of conversation at the language summit in
the past; I surmise it will be very much so next month.

~~~
pekk
The people complaining about performance are always going to be louder.

~~~
jMyles
Sure, but what I'm saying is that, if the underlying philosophy of Python
holds water, then we need to find ways to overcome this tug-of-war and
ensure that the most beautiful ways are also the most (or among the most)
performant.

------
inlineint
> if you're considering different versions of an algorithm, test it in a tight
> loop using the time.clock() function.

...but if you are experimenting with your code in a Jupyter notebook anyway,
then you can just use the %timeit [1] or %time [2] magics to measure
execution times without writing any additional code.

[1] [https://ipython.org/ipython-doc/3/interactive/magics.html#magic-timeit](https://ipython.org/ipython-doc/3/interactive/magics.html#magic-timeit)

[2] [https://ipython.org/ipython-doc/3/interactive/magics.html#magic-time](https://ipython.org/ipython-doc/3/interactive/magics.html#magic-time)
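
Outside a notebook, the stdlib timeit module gives the same kind of
measurement without any IPython dependency; a minimal sketch:

```python
import timeit

# timeit picks an appropriate clock for you (time.perf_counter on
# modern Pythons; the time.clock from the essay is long deprecated)
elapsed = timeit.timeit(
    '"".join(map(chr, data))',
    setup="data = list(range(256))",
    number=10_000,
)
print(f"{elapsed:.4f}s for 10,000 runs")
```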

------
wyldfire
Virtually every discussion of Python and performance should mention PyPy, if
for no other reason than to disqualify it with "doesn't work with deployment
feature x or customer requirement y."

It's often a "free" speedup and generally Just Works.

------
RhysU
I wish the Python community smoked its own shit: "There should be one-- and
preferably only one --obvious way to do it." I have never observed either
"one" or "obvious" in my dealings with the language.

------
orf
I think this is the most idiomatic way of doing it in modern python:

> "".join(chr(x) for x in list_of_ints)

This article is really really really out of date
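
For reference, the common modern spellings all build the same string in
linear time, since str.join copies each character only once (data here is a
stand-in for the comment's list_of_ints):

```python
data = list(range(65, 91))  # ints for 'A'..'Z'

s1 = "".join(chr(x) for x in data)    # generator expression
s2 = "".join([chr(x) for x in data])  # list comprehension
s3 = "".join(map(chr, data))          # map
assert s1 == s2 == s3 == "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```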

~~~
tantalor
The article is about performant python, not idiomatic python.

~~~
kixiQu
Sure, but I have to say I felt very distracted by his not even _mentioning_
the idiomatic approach.

------
bogomipz
The author states:

"There's a general technique to avoid quadratic behavior in algorithms like
this. I coded it as follows for strings of exactly 256 items:"

    
    
      def f5(list):
          string = ""
          for i in range(0, 256, 16): # 0, 16, 32, 48, 64, ...
              s = ""
              for character in map(chr, list[i:i+16]):
                  s = s + character
              string = string + s
          return string
    

I am not understanding what the technique is or why using a step size of 16 in
the range function is significant. Can anyone enlighten me about this and what
the technique is? Does this technique have a name?

~~~
tfm
The idea is to reduce the amount of redundant copying of characters: you end
up doing a few more concatenations in the outer loop, but the concatenations
in the inner loop are of short strings.

Importantly, if you remove the restriction of the input list being "exactly
256 items", then the method is still quadratic.

A linear-time algorithm for this would copy each input character exactly once,
which is effectively what the method based on array.tostring() does.

The chunk size of 16 is not as significant as the technique of
constructing+concatenating chunks, although it is optimal for input length
256. In general I think you'd want a chunk size about the square-root of the
expected input length, to minimise the number of copied characters.

EDIT: maths

Concatenating strings of length M and N is linear in O(M+N), because that's
how many characters you're copying.

Number of characters copied if you construct a string of length N by
concatenating one character at a time

    
    
      = (0+1) + (1+1) + (2+1) + (3+1) + ... + ((N-1)+1)
      = (N+1)*N / 2
    

Number of characters copied if you construct a string of length N by
concatenating a chunk of length 16 each time

    
    
      = (0+16) + (16+16) + (32+16) + ... + ((N-16)+16)
      = 16*(0+1) + 16*(1+1) + 16*(2+1) + ...
      = 16 * (N/16 +1)*(N/16) / 2
             ^^^^^^^^^^^^^^^
    

This is where the "technique" comes in: although the algorithm is still
quadratic you're effectively moving a constant factor out the front.

Note that you also have the cost of constructing the chunks each time, which
becomes the dominant cost if you have too many chunks.

In general, if you have a length kM string which you construct from k chunks
of length M, the number of characters copied

    
    
      = M * (k+1)*k / 2  +  k * (M+1)*M / 2
    

... which (rounding and integer constraints aside) is minimised for M = k,
i.e. when the chunk size is the square root of the input length. Hence, for
input length 256 we take chunk size 16.
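
The formula is easy to check numerically; copies() below is a hypothetical
helper implementing the expression above for a fixed input length of 256:

```python
def copies(n, m):
    # characters copied building a length-n string from chunks of
    # length m: M*(k+1)*k/2 for the outer concatenations plus
    # k*(M+1)*M/2 for building the k chunks, with k = n/m
    k = n // m
    return m * (k + 1) * k // 2 + k * (m + 1) * m // 2

n = 256
costs = {m: copies(n, m) for m in (1, 2, 4, 8, 16, 32, 64, 128, 256)}
best = min(costs, key=costs.get)
print(best, costs[best])  # chunk size 16 (= sqrt(256)) is cheapest
```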

~~~
bogomipz
I see, that makes sense about reducing the constant. Interesting. Thanks for
the great explanation.

------
dorfsmay
No list comprehension?

I thought list comprehensions were faster than loops?

~~~
ngoldbaum
List comprehensions didn't exist when this post was written.

~~~
BeetleB
Python 2.0, back in 2000, had them.

~~~
jwilk
The last sentence (grep for "since this essay was written") suggests that the
article was written before the 'B' typecode was added to the array module.

This typecode was added in Python 1.5.

------
zde
In the time frame spent to test these hacks I would have written a perfect C
module that runs circles around it.

~~~
tyingq
Then realized it barfed on nulls, and started over with a more perfect
implementation that took a bit longer.

~~~
zde
Likely not, managed code tends to avoid nulls. They are a problem in the
return path, though.

------
proyb2
Interesting read! I'm still missing what the "B" in the Python code is
referring to as well.

~~~
tyingq
array.array isn't a normal Python sequence like list. It's an optimized
type/object where all the elements of the array are of the same type. The 'B'
is a type code that indicates what type you want the elements to be. B ==
unsigned char

He's using it because the array stores the values as raw bytes, and
tostring() then converts them into a string, turning e.g. the int 65 into
the character 'A'.

[https://docs.python.org/2/library/array.html](https://docs.python.org/2/library/array.html)

See my other post though, it's not the fastest way to do this in python.
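
A minimal illustration (note that Python 3 renamed tostring() to tobytes(),
and you need a decode() to get a str rather than bytes):

```python
from array import array

ints = [72, 105, 33]   # byte values for 'H', 'i', '!'
a = array('B', ints)   # 'B' = unsigned char: one byte per element

print(a.tobytes())                    # b'Hi!'
print(a.tobytes().decode('latin-1'))  # Hi!

# 'B' rejects anything outside an unsigned byte's 0-255 range
try:
    array('B', [256])
except OverflowError as exc:
    print("out of range:", exc)
```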

~~~
proyb2
Great, thanks! I had assumed array.array was similar to a 2-dimensional array.

------
leecarraher
doesn't 'b' in array.array('b', ...) denote signed and 'B' unsigned
conversion, thus giving the 0-255 range as desired? or am i missing something?

~~~
jwilk
> _Note: since this essay was written, the 'B' typecode was added to the array
> module, which stores unsigned bytes, so there's no reason to prefer g1() any
> more._

------
aptwebapps
I didn't know about the array library, but

    
    
        "".join(map(chr, list))
    

would have been my first choice, partly for style and partly for avoiding
string concatenation (recent speedups aside).

~~~
hayd
or `"".join([chr(ch) for ch in s])`. It's only 2-3x slower than the array
solution, but a lot more robust.

The main issue IIUC is that the original suggestion was O(N^2).

