I wrote up a small script to test this out for Python (at the bottom of this post). The results on my laptop are about 8.0 seconds for the unsorted version and 7.6 seconds for the sorted one. I'm assuming the discrepancy is much smaller in Python because of the language's slowness and high per-operation overhead (or at least the way I've used it here), but I would be interested to know: how would one go about finding out what the Python interpreter is doing beneath the surface?
Edit: After running with a wider range of parameters, the difference seems to always be roughly the same order of magnitude. To investigate further, I included the sort in the second timing as a double check, and for 3,276,800 elements the sorted version is still a bit faster overall.
#!/usr/bin/env python
import time

import numpy as np


def main(n=32768):
    arr = np.random.randint(0, 256, n)

    t0 = time.time()
    sum1 = do_loop(arr)
    t1 = time.time()

    arr = np.sort(arr)  # sorting is deliberately kept outside both timed regions

    t2 = time.time()
    sum2 = do_loop(arr)
    t3 = time.time()

    assert sum1 == sum2
    print(" Unsorted execution time: {} seconds".format(t1 - t0))
    print("   Sorted execution time: {} seconds".format(t3 - t2))


def run_many(func):
    # Decorator: call the wrapped function 1000 extra times so the measured
    # interval is long enough to dominate timer overhead.
    def wrapper(arg):
        for _ in range(1000):
            func(arg)
        return func(arg)
    return wrapper


@run_many
def do_loop(arr):
    # The branchy loop from the StackOverflow question: only elements that
    # pass the comparison are added to the total.
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot


if __name__ == '__main__':
    main()
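One way to peek beneath the surface is CPython's standard dis module, which disassembles a function into the bytecode instructions the interpreter actually executes, showing how much per-iteration work surrounds the one branch we care about. A minimal sketch, using the same loop body as above:

import dis

def do_loop(arr):
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

# Prints one line per bytecode instruction; every loop iteration executes
# several of them, which dwarfs the cost of a single mispredicted branch.
dis.dis(do_loop)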
As you can see, the pure Python version is faster than the NumPy version (iterating over a NumPy array element by element has to box every value into a Python object, so a plain list is quicker here), and it also has a larger margin between unsorted and sorted. PyPy is of course faster than both, and shows an even greater margin between unsorted and sorted (2.63x faster).
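For reference, the pure-Python variant presumably looks something like this (a sketch using a plain list and the random module; the exact code isn't shown in the comment):

import random
import time

def do_loop(arr):
    # Same branchy loop, but over a plain Python list.
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

arr = [random.randint(0, 255) for _ in range(32768)]

t0 = time.time()
sum1 = do_loop(arr)
t1 = time.time()

arr.sort()  # in-place sort, excluded from both timings

t2 = time.time()
sum2 = do_loop(arr)
t3 = time.time()

assert sum1 == sum2
print("Unsorted: {:.3f}s  Sorted: {:.3f}s".format(t1 - t0, t3 - t2))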
Good call on going pure Python. To take this a bit further, I made your changes and used Numba with @jit(nopython=True, cache=True), with some interesting results. If I do include the sorting in the timing:
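For concreteness, the Numba setup would look roughly like this (a sketch assuming the same do_loop body as the script above; numba has to be installed, and the first call pays a one-time compilation cost):

import numpy as np
from numba import jit

@jit(nopython=True, cache=True)
def do_loop(arr):
    # Compiled to native code on first call; cache=True persists the compiled
    # function to disk so later runs skip compilation.
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

arr = np.random.randint(0, 256, 32768)
do_loop(arr)  # warm-up call triggers (or loads) the compilation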
I'm not quite sure what you mean by "using Python for microbenchmarks usually doesn't work". To the extent that microbenchmarking is useful at all, it works fine; there are just different considerations than in other languages, and it depends on which Python implementation and libraries you're using.
Also, while I'll grant you that using Python implies that convenience, programmer time, and/or development speed is a higher priority than performance, that doesn't at all mean that people who use Python "don't care about performance".
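One of those Python-specific considerations: the standard timeit module exists precisely for microbenchmarks and handles repetition and timer selection for you. A minimal sketch:

import random
import timeit

arr = [random.randint(0, 255) for _ in range(32768)]

def do_loop():
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

# Run the loop 100 times per measurement, repeat 5 times, and report the
# best result to reduce noise from the OS and interpreter warm-up.
print(min(timeit.repeat(do_loop, number=100, repeat=5)))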
I don't think it's related to branch prediction in particular, just spooky action-at-a-distance where a change makes seemingly unrelated code much slower or faster.
What makes you say this? The author provides an extremely persuasive case, with perf timings, and additionally updated the post with compiler enhancements from GCC and Intel that remove the branch mispredictions entirely and perform as predicted.
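For what it's worth, the Python-level analogue of that branchless transformation is a masked NumPy sum, which leaves no per-element branch for the predictor to miss (a sketch, not what the linked answers benchmark):

import numpy as np

arr = np.random.randint(0, 256, 32768)

# Boolean mask plus vectorized sum: the comparison produces a mask array and
# the reduction runs in C, so element order no longer matters.
tot = arr[arr >= 128].sum()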
I think you misunderstood me. I agree that the StackOverflow post has to do with branch prediction. I just meant that I don't think that's why the earlier poster thinks it's parallel to the situation described in the article.
Why is it faster to process a sorted array than an unsorted array?
https://stackoverflow.com/questions/11227809/why-is-it-faste...