
Speed improvements in Python 3.7 - orf
https://hackernoon.com/5-speed-improvements-in-python-3-7-1b39d1581d86
======
btown
Since one of the mentioned improvements is performance of the builtin regular
expression library... for anyone curious, there's a drop-in replacement, re2,
that leverages a Google library to provide linear time guarantees for
arbitrarily complex or user-provided regexes. (You lose lookahead assertions,
but most use cases for them can just be handled in pre- or post-processing.)
It's an incredibly handy tool that lets us confidently use complex regexes in
production.

[https://pypi.python.org/pypi/re2/](https://pypi.python.org/pypi/re2/)

[https://github.com/google/re2/wiki/WhyRE2](https://github.com/google/re2/wiki/WhyRE2)

> RE2 was designed and implemented with an explicit goal of being able to
> handle regular expressions from untrusted users without risk. One of its
> primary guarantees is that the match time is linear in the length of the
> input string. It was also written with production concerns in mind: the
> parser, the compiler and the execution engines limit their memory usage by
> working within a configurable budget – failing gracefully when exhausted –
> and they avoid stack overflow by eschewing recursion.

Really interesting articles in the second link detailing how this works!
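To illustrate the "drop-in" claim, here's a hedged sketch: the pyre2 package exposes most of the stdlib `re` API, so swapping engines can be a one-line import change. The try/except fallback to `re` is an assumption to keep the snippet runnable when re2 isn't installed.

```python
import re

try:
    import re2 as regex_mod  # pip install re2 (assumed available)
except ImportError:
    regex_mod = re  # fall back to the stdlib backtracking engine

# Same compile/search API either way.
pattern = regex_mod.compile(r"(\w+)@(\w+)\.(com|org)")
match = pattern.search("contact: alice@example.com")
print(match.group(1))  # alice
```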

~~~
tyingq
Great observation, but re2 does have tradeoffs to gain that performance.
Backreferences, for example:
[https://github.com/google/re2/issues/101](https://github.com/google/re2/issues/101)

~~~
btown
Absolutely. But for most common uses of backreferences, this is largely
mitigated by having a post-regex step that compares capturing groups for
equality, which induces overhead also linear in the length of the string. Of
course this falls short of true backreference support, but for tasks where we
do not expect potential matches to overlap, it gets the job done.
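A minimal sketch of that post-regex step, using a hypothetical "doubled token" check (the pattern and function name are illustrative, not from re2 itself):

```python
import re

# Backreference version (unsupported by RE2): r"^(\w+)-\1$"
# RE2-compatible version: capture both halves, compare them afterwards.
pattern = re.compile(r"^(\w+)-(\w+)$")

def is_doubled(token):
    """True if the token has the form WORD-WORD with both halves equal."""
    m = pattern.match(token)
    return bool(m) and m.group(1) == m.group(2)  # linear-time equality check

print(is_doubled("abc-abc"))  # True
print(is_doubled("abc-def"))  # False
```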

~~~
tyingq
Sure. I just object to the _"drop-in replacement"_ notion. Bad idea or not,
many people have existing code dependent on backreferences.

------
jxub
Python is the best (as in most productive and happy) language for me bar none.

It feels great to know that speed, the only major drawback, is being seriously
improved in each release. Here's to the future!

~~~
billsix
If you're looking for a nice GUI framework for Python, check out my pyNuklear!

[https://github.com/billsix/pyNuklear](https://github.com/billsix/pyNuklear)

------
viraptor
For the new 3.7 there's also gc.freeze
([https://bugs.python.org/issue31558](https://bugs.python.org/issue31558))
which will be great for many web projects.
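A hedged sketch of the pre-fork recipe described in that issue: disable collection in the parent, freeze tracked objects into the permanent generation, fork, then re-enable collection in the child. The `hasattr` guard is just to keep the snippet runnable on non-POSIX systems.

```python
import gc
import os

gc.disable()                      # avoid a pre-fork collection freeing pages
gc.freeze()                       # tracked objects now ignored by the collector
print(gc.get_freeze_count() > 0)  # True

if hasattr(os, "fork"):           # fork() is POSIX-only
    pid = os.fork()
    if pid == 0:                  # child process
        gc.enable()               # safe to collect again after the fork
        os._exit(0)
    os.waitpid(pid, 0)

gc.unfreeze()                     # demo cleanup in the parent
gc.enable()
```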

~~~
orf
[https://docs.python.org/3.7/library/gc.html#gc.freeze](https://docs.python.org/3.7/library/gc.html#gc.freeze)

Documentation link for anyone who's interested.


~~~
wyldfire
> Also collection before a POSIX fork() call may free pages for future
> allocation which can cause copy-on-write too so it’s advised to disable gc
> in master process and freeze before fork and enable gc in child process.

I guess it would be nice to know if optimizations like this are already
covered in the high-level libraries like `multiprocessing`.

~~~
viraptor
They aren't / shouldn't be. This is something you need to opt into
specifically because it changes behaviour. For example, if you have some
resource still open, calling gc.freeze will prevent it from ever being cleaned
up.

But web frameworks should implement it.

------
singularity2001
did they further improve backwards compatibility?

Neither trolling nor beating dead horses:

I recently wanted to do some screen capture and needed tkinter, which of
course has different imports for Py2 and Py3. Adding library aliases would
help. I honestly wish people would use transitional deprecation warnings
more often to educate about changed APIs.

As long as 50% of tensorflow projects are still in Python 2 it is always a
pain to encounter Python 3.

```
try:
    from urllib2 import urlopen
    from urllib import urlretrieve
except ImportError:
    from urllib.request import urlopen, urlretrieve
```

~~~
kevin_thibedeau
Is 15 years to prepare still not long enough to migrate your code
incrementally? You can get Py2 into a nearly-ready-for-Py3 state with just a
little effort stretched over whatever horizon you desire. Beating a dead horse
doesn't do anything about your stagnant code base.

~~~
mjevans
No, because they did a bunch of things that make automated porting not work
well and create a ton of developer friction.

I'm willing to live with Python's space related insanity. Mandating that
indenting is meaningful. Mandating a best practice of using spaces instead of
tabs for indent* (which makes tab width matter). Mandating that a tab equals 4
spaces instead of 8. (Honestly, I REALLY hate this, most of the Windows
developers are happy with the space usage koolaid, let us have a default of
tab equals 8 spaces of indent.) Those are ALL, REALLY ANNOYING, but lint tools
can save me.

What will bite me for as long as Python 3 exists, and maybe Python 4, is how
they handle Unicode support and encoded outputs. Pretty much the ENTIRE rest
of the computing world does one of two things.

1) Doesn't care what the encoding is, and flings around non-validated 8 bit
sequences of data.

2) Does care, and defaults to assuming well formed UTF-8 data-streams for text
data, BUT still doesn't do anything to enforce it UNLESS ASKED. Most
importantly invalid data streams just continue flowing through the processing
tools until they reach a point where a human has designated they're willing to
care and /do/ something about it.

If Python 4 wants to 'fix' Unicode support:

    * Use a binary sequence base storage class with metadata tags:
      - length (in bytes, in 'display length', and maybe in 'codepoint length')
      - encoding (including if normalized and how so)
      - whether the encoding has been validated
    * Support automatic coercion between standard encodings
    * Allow user-defined helpers for custom conversions
    * Make all input/output default to UTF-8, but permit raw output of such
      types without validation or re-encoding, just like every other UNIX tool
      that knows what an encoding is.

~~~
singularity2001
Yes, please make UTF-8 default! It's 2018

print(str(b'I DONT WANT TO add "UTF-8" everywhere!!','UTF-8'))

~~~
ubernostrum
As a serialization and transmission format, UTF-8 is fine.

As the basis of a high-level language's string type, UTF-8 is objectively
incorrect. Strings in a high-level language should be as clean an abstraction
of Unicode as possible, and leaking implementation details of the particular
byte-encoding scheme up to the programmer is not acceptable.

~~~
singularity2001
Ok, is it reasonable to ask for all string getter/setter file read/write ops
to default to UTF-8?

Also, having different string methods (.byte_length, .char_length(?),
.codepoint_length) seems like a good idea, or string.length(aspect=bytes),
but what should the default aspect be?

~~~
ubernostrum
I'm going to quote something I wrote a couple years ago[1]:

 _Now, I should point out here that I’m not really knocking the people who
were writing, say, command-line and file-handling utilities in Python. For
years, Python sort of accepted the status quo of the Unix world, which was
mostly to stick its fingers in its ears and shout LA LA LA I CAN’T HEAR YOU
I’M JUST GOING TO SET LC_CTYPE TO C AGAIN AND GO BACK TO MY HAPPY PLACE. A bit
later on it changed to “just use UTF-8 everywhere, UTF-8 is perfectly safe”,
which really meant “just use UTF-8 everywhere because we can continue
pretending it’s ASCII up until the time someone gives us a non-ASCII or multi-
byte character, at which point do the fingers-in-ears-can’t-hear-you thing
again”._

 _So a lot of what you’ll see in terms of complaints about string handling are
really complaints that Unix’s pretend-everything-is-ASCII-until-it-breaks
approach was never very good to begin with and just gets worse with every
passing year._

I stand by this: we had a couple of decades of Python catering to this
brokenness, and it made life miserable for everyone who didn't work in that
particular domain. Python 3 changed that. Does it mean life got harder for
some people? Yup. But life got a lot easier and more reliable for many more
people, and it's a tradeoff I'm willing to accept.

[1]
[https://www.b-list.org/weblog/2016/jun/10/python-3-again/](https://www.b-list.org/weblog/2016/jun/10/python-3-again/)

~~~
mjevans
Can you say in what ways things got easier that are /not/ equal or better
under the solution I proposed?

~~~
viraptor
Currently in python you know that you're holding either raw bytes, or
something that can be successfully serialised to utf8. With your proposed
solution you'll find out which one it is when you try to encode it.

It's the difference between easier to debug: "you asked me to read a value
here, but your assumptions about the encoding don't match reality" exception
and the hard to debug: "I did a lot of processing; you thought this thing is a
valid text, but it isn't; have fun tracking down how it got here in the first
place" exception.

~~~
mjevans
Python 3: We have two things that are ALMOST the same, and which, if done
correctly, could have been converted just by changing what we're willing to
call them (in the "downgrade" direction; or also by verifying, if you want to
achieve what I hate that Python 3 forces on programmers).

Proposed "string like" Object: __IF__ you want to turn on debugging, sure,
force it to validate assumptions at runtime/compile time. Otherwise call
verify() when you're willing to handle that being some result indicating "we
have a problem".

Maybe the verify() call returns the byte-access-offset of the first non-
conforming sequence.
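A hypothetical sketch of that proposal; the class name, fields, and verify() semantics are all assumptions drawn from the comments above, not anything Python actually provides:

```python
from dataclasses import dataclass

@dataclass
class TaggedBytes:
    """Hypothetical tagged-bytes type: raw bytes plus encoding metadata."""
    data: bytes
    encoding: str = "utf-8"
    validated: bool = False

    def verify(self):
        """Return the byte offset of the first non-conforming sequence,
        or None if the data is valid in the tagged encoding."""
        try:
            self.data.decode(self.encoding)
        except UnicodeDecodeError as exc:
            return exc.start      # byte-access-offset of the bad sequence
        self.validated = True
        return None

print(TaggedBytes(b"abc").verify())      # None
print(TaggedBytes(b"abc\xff").verify())  # 3
```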

~~~
viraptor
> We have two things that are ALMOST the same,

I think we've got a fundamental disagreement here. I don't believe they're
similar at all. They just happened to get confused a lot in the past when it
didn't matter that much.

------
hartator
Is it faster than 2.7 now?

~~~
ubernostrum
Python 3.0 was slower than Python 2.7, largely because some new modules were
initially implemented in pure Python in order to get the APIs out there for
people to experiment with. They've since been replaced with implementations in
C.

Python 3 passed Python 2 on performance years ago.

~~~
icegreentea2
Except at interpreter startup.

