Speeding up non-vectorizable code with Cython

Veedrac · on Aug 9, 2015

Before jumping to Cython, remove things like exponentiation and all the calls to `ord`:

    def column_to_index(col):
        col_index = 0
        for byte in bytearray(col):
            col_index *= 26
            col_index += byte - 64
        return col_index - 1
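
The loop above relies on Python 2, where `bytearray(col)` accepts a plain `str`; on Python 3 the same call raises a `TypeError` because no encoding is given. A minimal Python 3 sketch of the same idea, assuming the column name is uppercase ASCII (the name `column_to_index_py3` is made up for this example):

```python
def column_to_index_py3(col):
    """Convert an uppercase column name ("A", "AB", ...) to a 0-based index."""
    col_index = 0
    # Iterating over bytes yields integers directly, so no ord() calls are needed.
    for byte in col.encode("ascii"):
        # 'A' is byte 65, so subtracting 64 maps A..Z to 1..26 (bijective base 26).
        col_index = col_index * 26 + (byte - 64)
    return col_index - 1  # shift from 1-based to 0-based

print(column_to_index_py3("A"))   # 0
print(column_to_index_py3("Z"))   # 25
print(column_to_index_py3("AA"))  # 26
```

Lowercase or non-letter input is not validated here; it would just produce a meaningless index.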

FWIW, you _can_ vectorize this, although it's neither simple nor fast for a single string because of NumPy's fixed overheads. If you convert thousands of strings at once, though, it's probably reasonably speedy.

    import numpy
    from numpy import arange

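    # Powers of 26, largest first; dtype=object keeps exact Python ints (no overflow).
    place_values = 26 ** arange(20, dtype=object)[::-1]
    # overhead[n-1] = 64 * (26**0 + ... + 26**(n-1)) + 1: the per-letter offset of 64
    # plus the final "- 1", precomputed for every name length n.
    overhead = numpy.add.accumulate(64 * place_values[::-1]) + 1
    def column_to_index(col):
        digits = bytearray(col)
        length = len(digits)

        # Weighted sum of the raw byte values, then remove all offsets in one step.
        summation = place_values[-length:].dot(digits)
        return summation - overhead[length-1]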
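
The `overhead` table falls out of expanding the loop's arithmetic. A short derivation, writing $b_i$ for the byte value of the $i$-th letter and $n$ for the name length (both symbols are introduced here for illustration only):

```latex
\begin{align*}
\text{index}
  &= \sum_{i=0}^{n-1} (b_i - 64)\, 26^{\,n-1-i} - 1 \\
  &= \underbrace{\sum_{i=0}^{n-1} b_i\, 26^{\,n-1-i}}_{\texttt{summation}}
     \;-\; \underbrace{\Bigl(64 \sum_{k=0}^{n-1} 26^{k} + 1\Bigr)}_{\texttt{overhead[n-1]}}
\end{align*}
```

The second term depends only on the length $n$, which is why it can be precomputed once with a cumulative sum.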

Russell91 · on Aug 9, 2015

Nice article. Shameless plug: readers who would like to use Cython outside of the convenient IPython notebook interface should look into runcython [1]. You can run any Python file using Cython with:

    sudo pip install runcython
    mv main.py main.pyx && runcython main.pyx

[1] http://github.com/russell91/runcython

lorenzhs · on Aug 9, 2015

The type limitations example seems a bit contrived; a spreadsheet with 2^64 columns is slightly unrealistic. The point remains, though.
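
To put a number on that, here is a rough sketch that counts how long a column name must be before the number of nameable columns passes a given limit. The helper name `letters_needed_to_exceed` is made up for this example, and it assumes the usual A..Z, AA..ZZ, ... naming scheme:

```python
def letters_needed_to_exceed(limit):
    """Smallest length n such that names of 1..n letters outnumber `limit`."""
    n, total = 0, 0
    while total <= limit:
        n += 1
        total += 26 ** n  # there are 26**n distinct names with exactly n letters
    return n

# Python ints are arbitrary precision, so this comparison itself cannot overflow.
print(letters_needed_to_exceed(2 ** 64 - 1))  # 14
```

So an unsigned 64-bit index only runs out once column names reach 14 letters.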