
Incrementally improving the performance of a Python script - Ivoah
http://mycode.doesnot.run/2018/04/11/pivot/
======
sifoobar
This is just a reflection from recent work on interpreter internals, but I
figured it could be relevant to someone walking the same paths.

Allocating registers for all local vars statically means scopes have different
sizes, which in turn complicates slab allocation and/or reuse; in return, it's
easier to reason about (except for the main-scope issue) and saves space.

I opted for a fixed number of linearly assigned registers per scope in Snigl
[0]; once the limit is reached, remaining variables are stored in a table.
That means I sort of get both, since additional scopes may be added using {}
(it could make sense to add a scope: keyword to Python) if that becomes an
issue.
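
A rough Python sketch of the scheme described above (hypothetical, not
Snigl's actual implementation; `Scope` and `N_REGS` are made-up names): the
first few locals get fixed register slots, and everything past the limit
spills into a table.

```python
N_REGS = 4  # fixed register budget per scope (arbitrary for this sketch)

class Scope:
    def __init__(self):
        self.regs = [None] * N_REGS   # linearly assigned register slots
        self.reg_names = {}           # variable name -> register index
        self.table = {}               # overflow storage once registers run out

    def store(self, name, value):
        if name in self.reg_names:
            self.regs[self.reg_names[name]] = value
        elif len(self.reg_names) < N_REGS:
            # next free register, assigned in declaration order
            idx = len(self.reg_names)
            self.reg_names[name] = idx
            self.regs[idx] = value
        else:
            self.table[name] = value

    def load(self, name):
        if name in self.reg_names:
            return self.regs[self.reg_names[name]]
        return self.table[name]

s = Scope()
for i, name in enumerate("abcde"):
    s.store(name, i)
print(s.load("a"), s.load("e"))  # → 0 4  ("e" lives in the overflow table)
```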

It's all compromises, all the way down.

[0] [https://gitlab.com/sifoo/snigl](https://gitlab.com/sifoo/snigl)

------
kiddico
Interesting. I guess I've been accidentally optimizing when I put everything
in a main() call after an "if __name__..."
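
For what it's worth, the effect is visible in the bytecode: module-level
names compile to dict-based `STORE_NAME`/`LOAD_NAME` lookups, while names
inside a function use index-based `STORE_FAST`/`LOAD_FAST`. A minimal sketch
using the `dis` module:

```python
import dis

# The same loop, once as module-level source and once inside a function.
module_src = "total = 0\nfor i in range(1000):\n    total += i\n"

def main():
    total = 0
    for i in range(1000):
        total += i
    return total

# Module-level code addresses `total` by name (a dict lookup each time)...
module_ops = {ins.opname
              for ins in dis.get_instructions(compile(module_src, "<m>", "exec"))}
# ...while the function addresses it by array index.
func_ops = {ins.opname for ins in dis.get_instructions(main)}

print("STORE_NAME" in module_ops, "STORE_FAST" in func_ops)  # → True True
```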

~~~
cheez
Same

------
jacobsimon
Python has some unusual performance behaviors. IIRC you can also speed up the
performance of your program a lot by assigning intermediate variables instead
of referencing properties, for example:

[A.b[i] for i in range(100)]

is a lot slower than:

B = A.b

[B[i] for i in range(100)]
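
A quick way to check the claim (the `Holder` class here is just a stand-in
for whatever `A` is):

```python
import timeit

class Holder:
    """Stand-in object with a plain `b` attribute."""
    def __init__(self):
        self.b = list(range(100))

A = Holder()

def with_lookup():
    # looks up A.b on every iteration
    return [A.b[i] for i in range(100)]

def with_hoisting():
    b = A.b  # one attribute lookup, then plain indexing
    return [b[i] for i in range(100)]

assert with_lookup() == with_hoisting()  # same result for a plain attribute

print("lookup: ", timeit.timeit(with_lookup, number=20_000))
print("hoisted:", timeit.timeit(with_hoisting, number=20_000))
```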

~~~
mirashii
Unusual is a strange qualifier here. Removing a dereference leading to a
speedup is probably one of the few almost universal optimizations.

~~~
gameswithgo
People do it in C# a lot with Count and Length on Lists and Arrays and it
ruins array bounds check elision.

It is something that compilers do for you, in this context, in most
programming environments.

~~~
uryga
Unfortunately, Python being hopelessly dynamic, `A.b` could involve executing
arbitrary code and side effects depending on what `A` is. So hoisting `A_b =
A.b` out of the loop may produce different results and thus can't be done
automatically without some kind of incredibly (impossibly?) smart static
analysis :(

(though perhaps a JIT could perform this optimization for the usual, non-
surprising path)
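
A tiny example of what can go wrong (the `Sneaky` class is a made-up
illustration): a property runs code on every access, so hoisting it changes
observable behavior.

```python
class Sneaky:
    calls = 0

    @property
    def b(self):
        # arbitrary code runs on every attribute access
        type(self).calls += 1
        return list(range(100))

A = Sneaky()

[A.b[i] for i in range(100)]   # evaluates the property 100 times
print(Sneaky.calls)            # → 100

Sneaky.calls = 0
A_b = A.b                      # hoisted: the property runs exactly once
[A_b[i] for i in range(100)]
print(Sneaky.calls)            # → 1
```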

------
monochromatic
I don’t think I understand the problem. We have n distinct integers, and the
array has already been partitioned. Why isn’t the answer just that there was
exactly one pivot that could generate that partition?

~~~
jonathankoren
I didn't really understand it either. I think the problem is taking in an
array that's already been partitioned around a pivot, and then trying to
figure out how many numbers could have been the pivot. The reason this isn't
just the length of the array is that the array is _already_ pivoted. The
reason you have to use a linear scan is that the array is still in unsorted
order.

For [1, 2, 3, 4, 5] there are 5 pivots.

For [1, 2, 3, 5, 4] there are 3 pivots (1, 2, and 3).

For [2, 1, 3, 5, 4] there is only 1 (3).
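
The linear scan can be sketched like this (my own sketch, not the article's
code; `count_pivots` is a made-up name): an element is a valid pivot when
everything to its left is smaller and everything to its right is larger,
which a running prefix-max and a precomputed suffix-min answer in O(n).

```python
def count_pivots(a):
    """Count elements that could have been the partition pivot:
    all elements to the left must be smaller, all to the right larger."""
    n = len(a)
    # suffix_min[i] = smallest element in a[i:]
    suffix_min = [0] * n
    cur = float("inf")
    for i in range(n - 1, -1, -1):
        cur = min(cur, a[i])
        suffix_min[i] = cur

    count = 0
    prefix_max = float("-inf")  # largest element in a[:i]
    for i in range(n):
        if prefix_max < a[i] and (i == n - 1 or a[i] < suffix_min[i + 1]):
            count += 1
        prefix_max = max(prefix_max, a[i])
    return count

print(count_pivots([1, 2, 3, 4, 5]))  # → 5
print(count_pivots([1, 2, 3, 5, 4]))  # → 3
print(count_pivots([2, 1, 3, 5, 4]))  # → 1
```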

~~~
monochromatic
That makes more sense. Thanks.

------
indweller
Found something weird with the code posted in the links. I'm getting
different outputs from the intermediate and the final versions for the input
"5; 1,2,3,4,4". Can someone help?

1. [https://imgur.com/a/u8O65AF](https://imgur.com/a/u8O65AF)

2. [https://imgur.com/a/uHniZof](https://imgur.com/a/uHniZof)

~~~
Yxogenium
> Starting from an array A that has n _distinct_ integers

I don't know in what way they differ, but these programs were not designed
to work with duplicates in the input. This probably explains the results.

~~~
indweller
Yeah you're right

------
kevinventullo
Slightly OT from the main takeaway, but I wonder if this could be sped up
further by only doing a single pass, and maintaining a stack (implicitly
sorted) of elements which are greater than everything seen so far, and popping
them off when they're greater than the current element.
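
If I'm reading that right, a sketch of the single-pass version (my own
interpretation under the stated idea; `count_pivots_stack` is a hypothetical
name): keep a stack of candidates that exceeded everything seen so far, and
pop candidates whenever a later, smaller element invalidates them.

```python
import math

def count_pivots_stack(a):
    """Single pass: the stack holds candidate pivots, i.e. elements greater
    than everything before them that haven't yet been followed by anything
    smaller. Each element is pushed/popped at most once, so pops are
    amortized O(1)."""
    stack = []
    running_max = -math.inf
    for x in a:
        # any candidate greater than x now has a smaller element after it
        while stack and stack[-1] > x:
            stack.pop()
        if x > running_max:  # greater than everything seen so far
            stack.append(x)
        running_max = max(running_max, x)
    return len(stack)

print(count_pivots_stack([1, 2, 3, 4, 5]))  # → 5
print(count_pivots_stack([1, 2, 3, 5, 4]))  # → 3
print(count_pivots_stack([2, 1, 3, 5, 4]))  # → 1
```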

