I wrote up a small script to test this out for Python (at the bottom of this post). The results on my laptop are about 8.0 seconds for the unsorted version and 7.6 seconds for the sorted one. I'm assuming the discrepancy is much smaller in Python because of the language's slowness and high per-operation overhead (or at least the way I've used it here), but I would be interested to know: how would one go about finding out what the Python interpreter is doing beneath the surface?
Edit: After running with a wider range of parameters, the difference seems to always be roughly the same order of magnitude. To investigate further, I included the sort in the second timing as a double check, and for 3,276,800 elements the sorted version is still a bit faster overall.
#!/usr/bin/env python
import time

import numpy as np


def main(n=32768):
    arr = np.random.randint(0, 256, n)

    t0 = time.time()
    sum1 = do_loop(arr)
    t1 = time.time()

    arr = np.sort(arr)  # sorting is deliberately kept outside both timed regions

    t2 = time.time()
    sum2 = do_loop(arr)
    t3 = time.time()

    assert sum1 == sum2
    print(" Unsorted execution time: {} seconds".format(t1 - t0))
    print("   Sorted execution time: {} seconds".format(t3 - t2))


def run_many(func):
    # Decorator: call the wrapped function 1000 extra times so the measured
    # interval is long enough to dominate timer overhead.
    def wrapper(arg):
        for _ in range(1000):
            func(arg)
        return func(arg)
    return wrapper


@run_many
def do_loop(arr):
    # The branchy loop from the StackOverflow question: only elements that
    # pass the comparison are added to the total.
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot


if __name__ == '__main__':
    main()
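One way to peek beneath the surface is CPython's standard dis module, which disassembles a function into the bytecode instructions the interpreter actually executes, showing how much per-iteration work surrounds the one branch we care about. A minimal sketch, using the same loop body as above:

import dis

def do_loop(arr):
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

# Prints one line per bytecode instruction; every loop iteration executes
# several of them, which dwarfs the cost of a single mispredicted branch.
dis.dis(do_loop)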
As you can see, the pure Python version is faster than the NumPy version (iterating over a NumPy array element by element has to box every value into a Python object, so a plain list is quicker here), and it also has a larger margin between unsorted and sorted. PyPy is of course faster than both, and shows an even greater margin between unsorted and sorted (2.63x faster).
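For reference, the pure-Python variant presumably looks something like this (a sketch using a plain list and the random module; the exact code isn't shown in the comment):

import random
import time

def do_loop(arr):
    # Same branchy loop, but over a plain Python list.
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

arr = [random.randint(0, 255) for _ in range(32768)]

t0 = time.time()
sum1 = do_loop(arr)
t1 = time.time()

arr.sort()  # in-place sort, excluded from both timings

t2 = time.time()
sum2 = do_loop(arr)
t3 = time.time()

assert sum1 == sum2
print("Unsorted: {:.3f}s  Sorted: {:.3f}s".format(t1 - t0, t3 - t2))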
Good call on going pure Python. To take this a bit further, I made your changes and used Numba with @jit(nopython=True, cache=True), with some interesting results. If I do include the sorting in the timing:
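For concreteness, the Numba setup would look roughly like this (a sketch assuming the same do_loop body as the script above; numba has to be installed, and the first call pays a one-time compilation cost):

import numpy as np
from numba import jit

@jit(nopython=True, cache=True)
def do_loop(arr):
    # Compiled to native code on first call; cache=True persists the compiled
    # function to disk so later runs skip compilation.
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

arr = np.random.randint(0, 256, 32768)
do_loop(arr)  # warm-up call triggers (or loads) the compilation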
I'm not quite sure what you mean by "using Python for microbenchmarks usually doesn't work". To the extent that microbenchmarking is useful at all, it works fine; there are just different considerations than in other languages, and it depends on which Python implementation and libraries you're using.
Also, while I'll grant you that using Python implies that convenience, programmer time, and/or development speed is a higher priority than performance, that doesn't at all mean that people who use Python "don't care about performance".
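One of those Python-specific considerations: the standard timeit module exists precisely for microbenchmarks and handles repetition and timer selection for you. A minimal sketch:

import random
import timeit

arr = [random.randint(0, 255) for _ in range(32768)]

def do_loop():
    tot = 0
    for i in arr:
        if i >= 128:
            tot += i
    return tot

# Run the loop 100 times per measurement, repeat 5 times, and report the
# best result to reduce noise from the OS and interpreter warm-up.
print(min(timeit.repeat(do_loop, number=100, repeat=5)))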
I don't think it's related to branch prediction in particular, just spooky action-at-a-distance where a change makes seemingly unrelated code much slower or faster.
What makes you say this? The author provides an extremely persuasive case, with perf timings, and additionally updated the post with compiler enhancements from GCC and Intel that remove the branch mispredictions entirely and perform as predicted.
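For what it's worth, the Python-level analogue of that branchless transformation is a masked NumPy sum, which leaves no per-element branch for the predictor to miss (a sketch, not what the linked answers benchmark):

import numpy as np

arr = np.random.randint(0, 256, 32768)

# Boolean mask plus vectorized sum: the comparison produces a mask array and
# the reduction runs in C, so element order no longer matters.
tot = arr[arr >= 128].sum()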
I think you misunderstood me. I agree that the StackOverflow post has to do with branch prediction. I just meant that I don't think that's why the earlier poster thinks it's parallel to the situation described in the article.
Why is it faster to process a sorted array than an unsorted array?
https://stackoverflow.com/questions/11227809/why-is-it-faste...