Migrating to Python 3 with pleasure (github.com)
320 points by arogozhnikov on Feb 4, 2018 | 176 comments

I've been toying around with Python 3 and using it for most of my personal/hack projects, but I somehow missed the unpacking improvements: https://www.python.org/dev/peps/pep-0448/

In particular, being able to create an updated copy of a dict with a single expression is pretty cool:

    return {**old, 'foo': 'bar'}
    # Old way
    new = old.copy()
    new['foo'] = 'bar'
    return new

    return {**old, 'foo': 'bar'}
    # Old way
    return dict(old, foo='bar')
Not much difference if you ask me.

It's twice as slow, doesn't work with more than one dictionary, you can't easily control the merge precedence and you can't (easily) use expressions/variables in the keys:

    {**x, 'fo'+'o': 'bar', **y}

why wouldn't you be able to just keep the same syntax and make it twice as fast in a newer implementation?

The meaning of 'dict' can be overwritten, but '{}' can not.


Personally I hate the new {} for set syntax.

Is {} an empty set, or an empty dict?

An empty dict, for backward compatibility.

    return dict(x, **{'fo'+'o': 'bar'}, **y)
The new syntax is some mild syntactic sugar. (Which isn't a bad thing IMO)

Did you try your example? It never worked in Python2 and it doesn't work in Python3 by design.

  Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> x = {'foo': 5, 'bar': 6}
  >>> y = {'foo': 7, 'baz': 9}
  >>> dict(x, **{'fo'+'o': 'bar'}, **y)
    File "<stdin>", line 1
      dict(x, **{'fo'+'o': 'bar'}, **y)
  SyntaxError: invalid syntax

  Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
  [GCC 5.4.0 20160609] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> x = {'foo': 5, 'bar': 6}
  >>> y = {'foo': 7, 'baz': 9}
  >>> dict(x, **{'fo'+'o': 'bar'}, **y)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: type object got multiple values for keyword argument 'foo'
  >>> {**x, 'fo'+'o': 'bar', **y}
  {'foo': 7, 'baz': 9, 'bar': 6}

This is the most unreadable code I've seen so far.

Just skip all this and use Perl instead. You can write far more idiomatic, succinct readable code in Perl than you can in Python.

The whole point of Python is to not write code this way.

This is what I have been saying for a long time. Readability in Python is an illusion.

On the face of it, sure, but really it's some new bytecode that gets rid of the limitations of using kwargs like that, namely it's slow, impossible to really optimize and doesn't support duplicate keys.

Except that does not necessarily work with non-string keys (it'd depend on version and implementation IIRC).

The expanded unpacking works in all cases.
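To make the non-string-key point concrete, here is a minimal sketch (variable names are made up for illustration). It assumes CPython 3, where keyword argument names must be strings, so the `dict(x, **other)` form rejects other key types while the dict display accepts any hashable key.

```python
base = {"foo": 5}

# Unpacking inside a dict display accepts keys of any hashable type.
merged = {**base, 1: "one", ("a", "b"): "tuple key"}

# Passing the same kind of mapping as keyword arguments fails in Python 3,
# because keyword names must be strings.
try:
    dict(base, **{1: "one"})
except TypeError as exc:
    kwargs_error = str(exc)
else:
    kwargs_error = None

print(merged)
print("dict(base, **{1: ...}) raised:", kwargs_error)
```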

And people say perl is unreadable ;)

Surely a more intuitive syntax would be:

    x ++ {'fo'+'o': 'bar'} ++ y

Where '++' here means dictionary union, the choice of symbol is not relevant.

Not really, and it's not the most pythonic way of doing things IMO. With the syntax above (I think) you can work with any iterable, whereas with a dictionary union operator you'd have to define it on every class you'd want to use, and you'd be out of luck with generators.
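A small sketch of the "any iterable" point, using a hypothetical `evens` generator: `*` unpacking in a list or set display consumes generators, strings, and tuples alike. Note that `**` in a dict display is stricter and needs a mapping (something with `keys()` and `__getitem__`), not just any iterable.

```python
def evens(limit):
    """Yield even numbers below limit (hypothetical helper for illustration)."""
    for n in range(0, limit, 2):
        yield n

# Generators, strings, and tuples can all be unpacked into one list.
combined = [*evens(6), *"ab", *(10, 11)]

# The same works for set displays; duplicates collapse as usual.
unique = {*evens(6), *[0, 2, 8]}

print(combined)
print(sorted(unique))
```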

I've been practicing Python for a while and didn't even know about this. In my code style, I try to completely avoid the "dict" keyword and exclusively use dict literal notation.

I'm allergic to the { ... } thing. And I like terse and cryptic, but for some reason I find it the least attractive syntactic trick in all of python3 (that I know of).

dict(a, ... ,b) feels cleaner.

My mind is blown. Ever since JavaScript added this, I've been wanting it in Python... and somehow it was there all along. It works for lists too!

    [*a, *b, *c]

bear in mind that most things javascript got recently are old. literals, closure syntax, destruct, spread .. (lisp, scheme of course, but yeah python took it a while back too) this is all very very old but now ECMAxxxx is bringing it to the mainstream.

And the performance is pretty much the same, just a lot nicer syntax

    In [1]: x = {1:2, 3:4}

    In [2]: %timeit x[3] = 5
    48.6 ns ± 1.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [3]: %timeit y = x.copy(); y[3] = 5
    189 ns ± 3.23 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [4]: %timeit {**x, 3: 5}
    182 ns ± 3.3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Edit: It also seems to be pretty constant time if you're just merging:

    In [16]: %timeit {**x, **y, **z}
    180 ns ± 1.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

    In [17]: %timeit {**x, **y, **z, 3: 5}
    278 ns ± 18.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

    In [19]: dis.dis(lambda: {**x, **y, **z, 3: 5})
              0 LOAD_GLOBAL              0 (x)
              2 LOAD_GLOBAL              1 (y)
              4 LOAD_GLOBAL              2 (z)
              6 LOAD_CONST               1 (3)
              8 LOAD_CONST               2 (5)
             10 BUILD_MAP                1
             12 BUILD_MAP_UNPACK         4
             14 RETURN_VALUE

    In [20]: dis.dis(lambda: {**x, **y, **z})
              0 LOAD_GLOBAL              0 (x)
              2 LOAD_GLOBAL              1 (y)
              4 LOAD_GLOBAL              2 (z)
              6 BUILD_MAP_UNPACK         3
              8 RETURN_VALUE
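For anyone wanting to repeat the comparison outside IPython, here is a rough sketch with the standard `timeit` module. The helper function names are made up, the numbers include function-call overhead, and results will vary by machine and Python version, so treat it as a starting point rather than a benchmark.

```python
import timeit

x = {1: 2, 3: 4}
y = {5: 6}
z = {7: 8}

def copy_then_assign():
    # The "old way": copy, then overwrite one key.
    new = x.copy()
    new[3] = 5
    return new

def unpack_with_override():
    # The unpacking form of the same operation.
    return {**x, 3: 5}

def merge_three():
    # Pure merge of three dicts.
    return {**x, **y, **z}

CALLS = 100000
timings = {
    func.__name__: timeit.timeit(func, number=CALLS)
    for func in (copy_then_assign, unpack_with_override, merge_three)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / CALLS * 1e9:.0f} ns per call")
```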

It also allows for unioning two dicts together:

    baz = {**foo, **bar}


    a, *b, c = range(10)

This is enlightening. So you're telling me you can do

    car, *cdr = some_list
(OMG I fired up the Python REPL and you totally can!)

Yes. And with files too :)
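A quick sketch of both forms. `io.StringIO` stands in for an open file here (an assumption to keep the example self-contained); a real file object iterates line by line in the same way.

```python
import io

# Starred targets collect "the rest" into a list.
first, *middle, last = range(10)

# Any iterable works on the right-hand side, including file-like objects.
fake_file = io.StringIO("name,score\nalice,3\nbob,5\n")
header, *rows = fake_file

print(first, middle, last)
print(repr(header), rows)
```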

This is a lot like JavaScript :)

You mean JS is a lot like Python. Cause last time I checked, they explicitly said they were inspired by python to implement spread.

Is the new way still a shallow copy?
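As far as I know, yes: `{**old, ...}` copies the top-level key/value pairs only, just like `old.copy()`. A minimal sketch with made-up data shows nested objects being shared, with `copy.deepcopy` as the usual way to get an independent copy.

```python
import copy

old = {"tags": ["a"], "count": 1}
new = {**old, "count": 2}

# The nested list is the same object in both dicts.
new["tags"].append("b")

# deepcopy duplicates nested containers as well.
independent = copy.deepcopy(old)
independent["tags"].append("c")

print(old)          # the append through `new` is visible here
print(independent)  # the append to the deep copy is not visible in `old`
```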


It's so strange to me that data scientists would need to be convinced to move to Python 3. It's superior in every way to legacy Python. I can understand maintaining Python 3 compatibility for legacy systems if you don't want to have Python 3 as a dependency, but data scientists will be writing mostly ad hoc code and using Jupyter notebooks. The people around me are not allowed to use Python 2, in fact they're generally required to use the latest version of Python 3.

For anyone having trouble with maintaining multiple Python versions, I recommend Pyenv. You can install multiple local versions and switch between them. The selected version then uses the standard commands "python" and "pip", etc. which you can use to make your virtualenv from.

If you’re doing data science, do consider the anaconda python distribution with conda environments instead. Intel MKL linked numpy gives a lot better performance, and conda can install binary dependencies for libs that are tricky to compile even on linux. Pip packages are just more work and less performance in my experience (~10x)

For me, it's as simple as python is not a moving target. Whatever new changes there are in python 3 don't affect me since I won't be using any of the new features even if I were to use python 3.

Why on earth would you not use the new features?

Tradition? ;) I only use python for deep learning, which is not really that intensive in terms of code. Also most of the code out there for deep learning works on both python 2 and 3.

Also, your phrasing invites the question. Why on earth would I use a new feature in a turing complete language? Unless using the new feature results in tangible improvements in my code, I don't see a reason to use it.

I really like this one. “So you already have your turing complete language! What else do you need?”

That's not the point. The point is there is no reason to use an extra language feature unless it provides a tangible improvement in the code.

In fact, there are lots of reasons not to use an extra language feature that doesn't provide tangible improvements in code, including maintainability, ease of reading code, and portability in python's case.

I've moved to python 3 over the past couple of months, after resisting for the better part of a decade. I like it.

One surprising thing I learned from this document is that dicts now iterate in assignment order, not hash order. That's going to break some code for people.

Dicts never had an iteration order, though. If you relied on dicts iterating in order, you were kind of asking for it.

I believe newer Python versions have at least deterministic iteration order (correct me if I'm wrong), while some previous ones had a non-deterministic one (non-deterministic between multiple invocations of the interpreter). But there is also OrderedDict which iterates in assignment order.

No, you're right. I believe dict orders were made deterministic (by insertion order) in 3.6. Before that, it was undefined.

In Python 2.7 prior to 2.7.3, or Python 3 prior to 3.3, iterating a dictionary would -- if it always contained the same set of keys -- be in a consistent order across multiple runs of the same program. This was an implementation detail not to be relied on.

Starting in 2.7.3 and 3.3, the hashing algorithm applied a per-process random salt to certain built-in types, in order to mitigate a potential denial-of-service vector by crafting dictionary keys which cause hash collisions. Unless the PYTHONHASHSEED environment variable was set to a consistent value, iteration over a dictionary was no longer accidentally consistent across runs.
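The effect of `PYTHONHASHSEED` can be checked with a small sketch that asks fresh interpreters for a string hash. The helper name is hypothetical; the sketch assumes CPython 3 and that `sys.executable` points at a usable interpreter. With a fixed seed the hash repeats across processes; with the variable unset it normally differs from run to run.

```python
import os
import subprocess
import sys

def string_hash_in_new_process(seed):
    """Return hash('spam') computed by a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('spam'))"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return int(result.stdout)

# Same seed, two separate processes: the hashes agree.
same_a = string_hash_in_new_process("0")
same_b = string_hash_in_new_process("0")
print(same_a == same_b)
```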

In 3.6, the dictionary implementation was rewritten to drastically improve performance. As a side effect, dictionary iteration became consistent again, this time determined by insertion order. In 3.6, this consistency was an implementation detail and not to be relied on.

Beginning with 3.7, dictionary iteration order is finally guaranteed by the language, independent of implementation, and goes by insertion order.
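A minimal sketch of what the 3.7+ guarantee means in practice (keys are made up): iteration follows insertion order, updating an existing key keeps its position, and deleting then re-inserting a key moves it to the end.

```python
d = {}
d["banana"] = 1
d["apple"] = 2
d["cherry"] = 3
order_after_insert = list(d)

d["banana"] = 10   # updating an existing key keeps its position
del d["apple"]
d["apple"] = 20    # delete + re-insert moves the key to the end
order_after_changes = list(d)

print(order_after_insert)
print(order_after_changes)
```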

CPython iterates in this fashion since 3.6, but it is not part of the Python spec.

Might be a bit contentious but it's not supposed to be relied upon

It was recently decided that as of 3.7 it will be.

(Source: https://mail.python.org/pipermail/python-dev/2017-December/1...)

Hmmmmmm.... that makes it hard for other implementations. CPython isn't the whole of the Python world. Requirements like that can be problematic for implementations like Micropython, for instance.

They asked people from the Jython, pypy and uPython community if it was ok before doing so.

The entire reason they decided to make it a standard rather than keep it as an implementation detail is to make things better for users. It’s not about just looking after cpython.

I suspect the reasoning was with CPython dominating, people would eventually make this assumption in their code, perhaps unintentionally, and reduce compatibility. Better to make it official.

The headache that having non-deterministic code causes more than makes up for that.

Among other things this makes writing tests much easier.

Python dicts have always been touted as an unordered data structure though, so yeah, if you're depending on any type of order you shouldn't be.

In 3.6 dicts are ordered (by insert order) as an implementation detail. From 3.7 on it will be a feature.

There’s a well known YouTube video which covers all this comprehensively.

Raymond Hettinger, Modern Python Dictionaries... hopefully this is the correct one. https://youtu.be/npw4s1QTmPg

Ooh, I'm no longer involved in the Python world but I love presentations by Hettinger. Thanks!

EDIT: I was just informed it is as of 3.7 an official language feature, making everything below invalid

Personally, I'm worried people will come to rely on the new behaviour in code instead. As core developers have repeatedly said, dict order is still an implementation detail, it should not be relied on as it is not officially part of the language. Other implementations (except pypy) will probably not have this behaviour.

Yet, I feel like this will fall on deaf ears and become a de-facto part of the language. Blogs will state it as a new feature, Python books will teach it and new coders will rely on it, forever locking the dict internals in place for all python interpreters.

(If you need to rely on ordering, use an OrderedDict instead.)

Dicts are UNORDERED associative containers. If you're making your app depend on implementation-defined behavior, that's on your developers' shoulders. Stuff like that shouldn't pass code review.

It is surprising though that the order was non-deterministic, not only undefined. Determinism is something I intuitively expect from read-only operations like iteration.

Non-determinism comes from randomness in the hash function. I don't know the details of the Python implementation, but generally you want randomness at the very least when processing potentially hostile user input. Otherwise your code is susceptible to a denial of service attack where the attacker sends data that was crafted to maximize hash collisions, to break the expected performance of hash tables.

From a theoretical angle, the only way to build hash tables with some kind of provable runtime guarantees is to include randomness.

I think that's a habit you need to shake. If something is undefined, you should intuitively shy away from expecting things from it. If I found myself wondering why some undefined behavior was not consistent, I'd question why I even believed it should be consistent in the first place.

Would you expect that `str(some_dict) == str(some_dict)`?

For that exact line I might expect that the result is true but still consider that line as "smelly" and meaningless, since dict equality cannot be compared by comparing their string representation.

I'd expect that any two ways of forming a dict with exactly same contents can result in different str(x) representation, and also that serializing and deserializing a dict can result in a different string representation.

No, of course not. Some order has to be applied to serialize it to a string, but that order is not defined between calls.

Python long ago made the guarantee that:

> Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond.

That is, the order will be maintained between calls, so long as the dictionary had not been modified.
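In code, that correspondence guarantee looks roughly like this (a sketch with made-up keys): as long as the dict is not modified in between, `keys()`, `values()` and `items()` line up, and two `str()` calls on the same dict of plain values agree.

```python
d = {"x": 1, "y": 2, "z": 3}

# keys() and values() iterate in corresponding order when the dict is untouched.
pairs = list(zip(d.keys(), d.values()))

print(pairs == list(d.items()))
print(str(d) == str(d))
```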

Going back to "str(some_dict) == str(some_dict)". I would not expect it to always be the same, but for entirely different reasons. Consider:

  class Strange():
    def __repr__(self):
      import random
      return str(random.random())

  some_dict = {1: Strange()}

  >>> str(some_dict) == str(some_dict)

A section called "CPython implementation detail" looks like the very antithesis of a guarantee, to me.

My apologies. I looked for the answer I knew was there, and quoted the wrong section because it matched what I was looking for.

I should have quoted the next section:

> If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

Curiously, the "Dictionary view objects" section at https://docs.python.org/2.7/library/stdtypes.html#dictionary... has the same "Keys and values are iterated over in an arbitrary order which is non-random ..." text, but without being inside of a "CPython implementation detail" box.

Huh, that's inconsistent. I guess they changed it to avoid the security problem and forgot to change the docs? Or maybe they didn't update 2.7 at all and it still behaves that way.

See this comment for the details:


It is simply that a python dict is not a list (read array).

Not for much longer, as of 3.7 the ordering is a language feature: https://mail.python.org/pipermail/python-dev/2017-December/1...

It's mad that it ever wasn't this way. Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.

It has been so much more pleasant to write python since ordering was maintained by default.

> Mapping-with-ordered-keys is such a useful and pervasive data structure (all database query result rows, for one) that an ordered dictionary should be a fundamental part of a language.

Here's one real-life use case of that: Avro records. One of the formats Avro uses is a text format that's basically ordered JSON. One company I worked for years ago used Avro as its wire protocol, and some Avro data was stored as JSON files on disk. Of course, Python's JSON implementation by default loads/unloads JSON to/from a dict. So just calling json.load() and json.dump() means I can't just load an Avro record from disk, change some data, and save it (which is something that came up when I was writing an upgrade script at a company I was working at years ago).

Thankfully, I had an out: the JSON library lets you override what container you load JSON into with object_pairs_hook, so I could just snarf it into an OrderedDict. But if I ever have to do this again after 3.7 comes out, I'm glad I won't have to worry about making sure I have the right container class. It makes my code simpler, and I won't have to leave a comment explaining why the code will break unless I specify an OrderedDict.
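A minimal sketch of the `object_pairs_hook` approach described above. The record contents are invented for illustration; the only assumption is that key order in the file matters to whatever consumes it.

```python
import json
from collections import OrderedDict

record_text = '{"zeta": 1, "alpha": 2, "mid": 3}'

# Load into an OrderedDict so key order survives regardless of Python version.
record = json.loads(record_text, object_pairs_hook=OrderedDict)
record["alpha"] = 20  # changing a value does not move the key

round_tripped = json.dumps(record)
print(round_tripped)
```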

Great example.

I hate to think of all the developer hours wasted because JSON doesn't maintain key ordering.

Not to mention the lost opportunities for delta compression.

>>(all database query result rows, for one)

What? No. SQL does not return results in any consistent ordering unless specifically instructed to.

Not the result set, the rows of the result set.

Do you mean the columns of the row?

No, I mean the rows. Each row is semantically an ordered mapping.

Shouldn’t the collection of rows be a set or list, not a dictionary?

That said, you disagreed with my question then went on to show my question was on point.

The “rows themself” being an ordered map means you are referring to the columns, the order being set by the SELECT clause or table definition order (in case of wildcard).

That said, I personally feel iterating over table columns in that way to be a “bad code smell”. Not saying it’s bad in all cases, but generally it’s an anti-pattern to me.

Order-significance is exactly how the relational model was defined in the beginning

"An array which represents an n-ary relation R has the following properties:

1. Each row represents an n-tuple of R.

2. The ordering of rows is immaterial.

3. All rows are distinct.

4. The ordering of columns is significant -- it corresponds to the ordering S1, S2, ..., Sn of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).

5. The significance of each column is partially conveyed by labeling it with the name of the corresponding domain."

-- A relational model of data for large shared data banks[1]

[1]: https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf

> Shouldn’t the collection of rows be a set or list, not a dictionary?

I didn't mention the collection of rows, I mentioned the rows themselves.

> The “rows themself” being an ordered map means you are referring to the columns

No, it means I'm referring to the rows themselves.

The rows themselves are each ordered mappings.

Hope this helps to clarify things. Happy to keep repeating this as many times as necessary.

I think the confusion is your use of “columns” or “columns of the row” to refer to attribute-value pairs that make up the row.

Unless there is an "order by" clause, the order of rows in the result set is undefined and non-deterministic from query to query.

As I said:

> Not the result set, the rows of the result set.

Each row is a mapping with ordered keys.

Perhaps you are confused and mean "columns"? ("Rows of the result set" is what I was referring to.)

A result set has rows, which are not in a deterministic order unless an "order by" is provided. Each row has columns. The columns are in order, obviously.

Perhaps you are confused and think I'm talking about the ordering of the result set.

I said that database query result rows are made up of ordered dictionaries. I didn't mention ordering of the result set.

Happy to keep repeating this as many times as necessary.

ok. We're actually saying the same thing, just in different words.

I have the opposite reaction to it, it seems insanely idiotic to have an associative array with ordered keys. It can only make sense to someone who doesn't know anything about fundamental data structures and a language that caters to people like that in spite of the performance penalty is just strange.

but hey, it's Guido, I still can't fathom that he moved reduce into functools.

I vaguely remember that the change to ordered keys in 3.6 was actually a side effect of making the dict implementation more efficient!


Well, given that I used reduce twice in production in 14 years of Python, my guess is that you are using Python wrong.

Putting aside the fact that this change was made to increase performance, I'd rather have a language that's useful, expressive, and semantically powerful by default, instead of one that is less powerful and harder to use but slightly faster.

The data structure OrderedDict does what you describe and has been in the stdlib since I think 2.7

Indeed, but it's much nicer when it's universal and built-in rather than requiring a specific import and different, more cumbersome syntax.

Ordereddict has been available for ages. Why should I pay for the overhead in the 98% of the cases where I don't need it.

The new dict implementation was introduced in 3.5 (with forced order randomization, which was removed in 3.6) because it is faster and uses less memory than the previous non-ordered dict. Ordering is merely a nice byproduct. So you're not paying any penalty.

OrderedDict is fundamentally different because it is designed to allow inserting/removing keys in the middle of the ordering in O(1) time. It does not use the new dict under the hood, or vice-versa.
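A short sketch of what `OrderedDict` offers beyond a plain dict, using toy data: explicit reordering with `move_to_end`, popping from the front with `popitem(last=False)`, and equality that takes order into account.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
od.move_to_end("a")              # order is now b, c, a
oldest = od.popitem(last=False)  # removes and returns the first item

# OrderedDict equality is order-sensitive; plain dict equality is not.
ordered_equal = OrderedDict([("a", 1), ("b", 2)]) == OrderedDict([("b", 2), ("a", 1)])
plain_equal = {"a": 1, "b": 2} == {"b": 2, "a": 1}

print(oldest, list(od))
print(ordered_equal, plain_equal)
```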

Actually, 3.7 dicts are not always ordered. If you delete a key, order is not guaranteed anymore. For performance reasons.

Are you sure about that? I thought they considered that but decided to just make them always ordered in the end.

Keeping the dict ordered is actually faster. There’s a link to a YouTube talk a little upthread that discusses the algorithms used. It’s really interesting.

agree completely

aside from that, python has a separate ordered dict class

Before 3.7, dict ordering is documented literally everywhere as "consider it implementation details, use OrderedDict if you need order". From 3.7, it's part of the spec.

If you were relying on keys order before, you were not only doing it wrong, but you were doing so despite being told again and again.

This has been changed in Python 3.6 only, due to a change in dictionary implementation to make them more efficient. "Modern Dictionaries by Raymond Hettinger" [1] is a quite interesting and technical talk about these changes, explaining also the change in iteration order IIRC, worth a watch in my opinion!

[1] https://www.youtube.com/watch?v=p33CVV29OG8

This is why you read the documentation rather than relying on what a piece of library code appears to do. Maybe all programmers have to learn this the hard way.

I’m a little surprised at this point that Apple still doesn’t include a default Python 3.x on macOS. It’s the single thing keeping me from moving (as there’s a big difference between “just run this” and “first download this, then run this”).

Downloading it is definitely not the biggest hurdle to moving to python 3.

I infer GP was talking about his users on OSX, who would have to download/install python3 to use the distributed software.

There are arguments against doing anything against system python installs in the first place.

My theory is that this is not going to be a nice change when they drop 2.7. Maybe one version will be released with both installed by default, then they'll use only 3.7/3.8 for the next decade. MacOS doesn't seem to care a lot about backwards compatibility recently.

This is of course pure speculation...

Other OSes just install a “python3” binary, I’d expect Apple to do the same.

Sure. What I'm saying is I expect python/python2 to be gone soon after that.

I’m not so sure; /System/Library/Frameworks links for Python versions have been stable for a very long time (which surprised me at first but I imagine Apple has plenty of stuff of their own that uses Python). Even though they’ve since hacked a lot of the older versions with symbolic links to 2.7, versions back to 2.3 have valid paths.

Honestly the fact that they include a system Python 2 is a huge pain. You can't add packages to it, and you shouldn't add another Python 2 interpreter to the PATH. You end up having to use virtualenv which is a stupid hack.

pip takes a --user flag that installs packages to your user's account. It's essentially global unless you have multiple users on your machine.

But there is no pip.

I used PyInstaller for the first time this week to build a single binary executable for Linux. So far in limited use the result has been great. I think I'm gonna do OS X next.

I agree, though it's not just macOS: all of the BSDs are behind on Python 3. Once FreeBSD gets Python 3 by default, I bet macOS will follow.

Current FreeBSD Python 3 status: https://wiki.freebsd.org/Python

I'd been a 2.7 holdout for ages, but when f-strings were greenlit for 3.6, I decided then and there that all my new personal projects would be written in 3.

I'm glad I did. F-strings are wonderful, as is pathlib and the enhanced unpacking syntax.

Since I started my current job, I've also been writing as many scripts as I can in Python 3 as well (and Docker has been a godsend for that because I can now deploy 3.6 scripts to the few servers we have that are still running RHEL 6).

Could you provide more info on your setup for this? I work on some EL 6 servers and would be interested in using this setup.

Not the OP, but replying to offer a suggestion in this respect.

I'll often do something similar to this, where I have a CLI tool that I'm not ready to deploy server-wide yet, and has hefty dependencies.

The pattern I use is to have a wrapper shell script that calls:

    docker run -it --rm --volume "$(pwd)/$1:/file_to_process:z" --user "$(id -u)" container-image /opt/command_to_run /file_to_process
This runs "/opt/command_to_run /file_to_process" inside a container as the current uid, mounting the parameter to the shell script as "/file_to_process" inside the container.

The :z mount parameter may or may not be needed depending upon whether you have SELinux enabled or not (by default, SELinux prevents containers accessing any file on the host, and :z changes the SE context to allow access). I don't know if this is the case with EL6 tho.

The -t parameter shouldn't be used if your script is running in a pipeline (it creates a pseudo-tty). So it may be worth having some kind of conditional to remove this.

The wrapper I use also has a conditional to add the "$(pwd)" prefix to the call parameter only if the parameter is a relative path.
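The two conditionals could look something like this if the wrapper were written in Python instead of shell. This is only a sketch of the idea, not the poster's actual script: `build_docker_command` and its parameters are hypothetical, and the image name, command, and mount point are the placeholders from the comment above. It only builds the argument list; handing it to `subprocess.call` would run it.

```python
import os

IMAGE = "container-image"          # placeholder image name from the comment
COMMAND = "/opt/command_to_run"    # placeholder command from the comment
MOUNT_POINT = "/file_to_process"   # path the file gets inside the container

def build_docker_command(path, use_tty, uid=None, cwd=None):
    """Build the `docker run` argument list for one input file (hypothetical helper)."""
    cwd = os.getcwd() if cwd is None else cwd
    uid = os.getuid() if uid is None else uid
    # Only prefix the working directory when the path is relative.
    host_path = path if os.path.isabs(path) else os.path.join(cwd, path)
    flags = ["-i", "--rm"]
    if use_tty:
        # Skip -t in pipelines, where no terminal is attached.
        flags.insert(0, "-t")
    return [
        "docker", "run", *flags,
        "--volume", f"{host_path}:{MOUNT_POINT}:z",
        "--user", str(uid),
        IMAGE, COMMAND, MOUNT_POINT,
    ]

example = build_docker_command("data.csv", use_tty=False, uid=1000, cwd="/home/me")
print(" ".join(example))
```

In a real wrapper, `use_tty` would typically come from something like `sys.stdout.isatty()`.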

Honestly, I just wrote my Dockerfile using the example from the official Python container: https://hub.docker.com/_/python/

Then I push the image to my company's internal Docker registry, ssh into the server, and pull the image.

(also, aside from using Python 3 on RHEL 6, it also means using Python 3.6 on RHEL 7 without having to install python36u)


Dockerfile (script name redacted)

    FROM python:3
    WORKDIR /usr/src/app

    COPY requirements.txt ./
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .

    CMD [ "python", "./redacted.py" ]
Build and push commands (company and script names redacted):

    docker build --rm -t dreg.example.com/redactedproject/redacted .
    docker push dreg.example.com/redactedproject/redacted
Pull and run on the server (same stuff redacted as above, plus I redacted the actual port number to be on the safe side):

    docker pull dreg.example.com/redactedproject/redacted
    docker run -p 1337:1337 -d --name redacted dreg.example.com/redactedproject/redacted
And at some point, I'll make proper startup scripts for them. On RHEL 7 boxes, I've made systemd unit files. On RHEL 6... well, I suppose I'll be writing initscripts soon.

If Python 3.4 is good enough, you can get it from the EPEL repos.

Several posters indicate that they’ve stuck to python 2.7 even for small side projects until now. I cannot understand why? Python 3 seems to have been technically superior for a few years, and side projects must surely be good for learning something new?

I distribute python programs. Macs only have Python 2 by default.

But surely that's not stopping anyone. As someone above said, just use pyenv to run Python 2.7x and Python 3. It's not as if anyone has to settle for using only legacy Python.

At the moment my instructions are "grab this script and run 'python script.py'". That is ok for about everyone.

I don't want to have to start teaching pyenv to every academic or student I want to send a script to.

If you would like to write Python 3 but need to maintain support for any particular version of Python 2, something that I can personally recommend is using the Coconut transpiler: http://coconut-lang.org/ The Coconut language is a superset of Python 3, so any valid Python 3 is valid Coconut. But the advantage of the transpiler extends beyond the language itself, in that it can target any version of Python from 2.6 onwards on the Python 2 branch and 3.2 onwards on the Python 3 branch.

It has been really useful for me in that I want the advantages of type annotations and f-strings and other python 3.5+ features but I have to support running in an environment with only 2.6 installed. So when I target a 3.5+ version, all of those features are maintained, but when I target 2.6, the transpiler does all the work in converting to running 2.6 code for me.

Didn’t know about the enforce library (https://github.com/RussBaz/enforce/blob/master/README.md) — but have been wanting something like this.

Thanks for the useful list!

the Path thing is incredible! I wasn't aware of it. I'll definitely be replacing my os.path.join calls for a more readable version of it.

The read_bytes() and read_text() methods are particularly useful. They're new in Python 3.5.

Path(__file__).absolute().parent.glob('**/*.py') is just wonderful.
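A self-contained sketch of those pathlib conveniences, run against a temporary directory so it does not depend on any real project layout (the file names are made up). It shows `write_text`/`read_text`, a non-recursive `glob('*.py')`, and a recursive `glob('**/*.py')`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "top.py").write_text("print('top')\n")
    (root / "pkg" / "inner.py").write_text("print('inner')\n")
    (root / "notes.txt").write_text("not python\n")

    # read_text() opens, reads, and closes the file in one call.
    source = (root / "top.py").read_text()

    # '*.py' matches only the directory itself; '**/*.py' also descends into subdirectories.
    direct = sorted(p.name for p in root.glob("*.py"))
    recursive = sorted(p.name for p in root.glob("**/*.py"))

print(direct)
print(recursive)
```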

Wow, I switched to Python 3 really early on and never knew about pathlib.

I have been so conditioned to write code that is 2/3 compatible that, even when I am writing specifically for the PY3 interpreter, the code turns out to be a __future__ import away from being valid PY2. I did not think much of them at first, but very recently, f-strings have changed that.

I think many people get imprinted with writing the PY3 code using only the features available at the time they switched over.

You can use some backports of libraries though, surely? `pathlib` is in pip.

Back-ports are an option but they are an extra dependency. When you have not used a new feature, the cost of an extra dependency outweighs the unknown benefits that would be realised by using a back-port.

And the multiprocessing and threading pools.

And collections.ChainMap.

And f-strings.

And yield from.

And type hints.

And statistics.

And ipaddress.

And secrets.

And matmul.

And subprocess.run.

Come on, Python 3 is packed with awesomeness !
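
A quick sketch of a few items from this list, using only the standard library. The variable names and sample values are invented for the example; the version notes in the comments are the releases where each feature first appeared.

```python
import ipaddress
import statistics
from collections import ChainMap

# ChainMap (3.3+): look up keys in several dicts without copying them.
defaults = {"color": "red", "size": 10}
overrides = {"size": 12}
settings = ChainMap(overrides, defaults)

# f-strings (3.6+): expressions are evaluated inside the literal.
summary = f"{settings['color']} / {settings['size'] * 2}"


# yield from (3.3+): delegate to another iterable.
def flatten(rows):
    for row in rows:
        yield from row


flat = list(flatten([[1, 2], [3, 4]]))

# statistics (3.4+) and ipaddress (3.3+) ship with Python.
average = statistics.mean(flat)
is_private = ipaddress.ip_address("192.168.0.1").is_private

print(summary, flat, average, is_private)
```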

Every time somebody mentions how awesome f-strings are, it makes me laugh. Around ten years ago the Python community was looking down on Perl and shell for their string interpolation, but now that Python has pretty much the same thing, it's suddenly not considered a misfeature.

We were wrong. We acknowledged it and improved. It's a good thing.

Have you considered that maybe the language had not evolved enough for them to be appealing? With b'' and u'' string syntax coming into the language, there is a realisation that string interpolation can be an opt-in feature among other reasons.

What you have is decision making analogous to the function

    evaluate(string_interpolation, python_ecosystem)
which is different from


> With b'' and u'' string syntax coming into the language, there is a realisation that string interpolation can be an opt-in feature among other reasons.

This argument won't fly. Did you know that string interpolation in Perl and shell was always an opt-in feature? And despite that, it was frowned upon by the Python community.

multiprocessing and threading pools

While certainly awesome, these are in Python 2.7 as well. Although I don't remember if they were in 2.7 first or back-ported to 2.7 from 3.

__Reliable__ pools are only in 3.

Cool. What's changed?

> test_path = datasets_root / dataset / 'test'

> Previously it was always tempting to use string concatenation (concise, but obviously bad), now with pathlib the code is safe, concise, and readable.

This is the kind of feature that I'm wary to use even in scripts: questionable benefit, and probably too clever.

The benefit is huge: it's readable, easy to type, and takes care of joining with the proper "/" or "\" depending on the platform.
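
To make the separator point concrete, here is a small sketch using the "pure" path classes, which let you see both flavours on any machine. The directory names are placeholders.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("datasets", "mnist", "test")

# The same components, rendered with each platform's separator.
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # datasets/mnist/test
print(windows_path)  # datasets\mnist\test
```

Plain Path picks the right flavour for the running platform, so the calling code stays the same.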

I'll challenge you: You shouldn't have many paths, should keep them at a centralized location in the code, and there you could just use a normal function with a proper self-documenting name.

It's not like programs consist of path operations to a significant degree, so using fancy syntactic sugar doesn't seem like a worthwhile optimization to me. At all.

You say that because you are not a sysadmin. Python user base is very rich.

I don't have a machine available right now, but I wonder what happens if two adjacent path elements are integers? Does it perform division instead of path/string concatenation?

You can't concatenate a path with an integer:

  >>> Path.cwd() / 1
  TypeError: argument should be a path or str object, not <class 'int'>
Also, note that the '/' operator is left-associative, so even if you have two adjacent numbers when joining paths, there will be no division.

  TypeError: expected str, bytes or os.PathLike object, not int

This probably means that you put a path to the left, not ints on both sides.

Well, I put a path and then two ints... if you don't start with a path it's obviously just going to be division.

The Path object won't construct a path from an integer.

    >>> from pathlib import Path
    >>> p=Path(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.6/pathlib.py", line 979, in __new__
        self = cls._from_parts(args, init=False)
      File "/usr/lib/python3.6/pathlib.py", line 654, in _from_parts
        drv, root, parts = self._parse_args(args)
      File "/usr/lib/python3.6/pathlib.py", line 638, in _parse_args
        a = os.fspath(a)
    TypeError: expected str, bytes or os.PathLike object, not int
So what happens if the paths are numbers? They are treated like any other characters.
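
A short sketch of the cases discussed in this sub-thread, written so it runs without a traceback. The exact error message varies between Python versions, so only the exception type is recorded; the /data path is made up.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/data")

# Numeric *strings* are ordinary path components.
joined = base / "1" / "2"

# An int on the right-hand side is rejected.
try:
    base / 1
    int_error = None
except TypeError as exc:
    int_error = type(exc).__name__

# Left-associativity: 1 / 2 is evaluated first as plain division, and the
# resulting float cannot be joined with a path either.
try:
    1 / 2 / base
    float_error = None
except TypeError as exc:
    float_error = type(exc).__name__

print(joined, int_error, float_error)
```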


Python is strongly typed.

Yes, that's pretty much the question metaobject was asking.

Should I have said "numeric characters" or "numeric strings" in my last paragraph instead of numbers?

Came here to validate my own "I moved" experience: learned stuff I hadn't checked on. TL;DR it's never too late to learn what you can do, once you can deprecate the past.

I'm not sure why this is targeted toward data scientists; these tips are useful for any Python user.

I think it's because the remaining python 2.7 users are mostly people using the packages that were unsupported in 3, which I think were mostly packages used in data science afaik

If it were targeted towards developers I would highly, highly emphasize type annotations with a type checker.

Is the np.dot -> @ tip reliable? I thought these were fairly different in practice.

What do you mean? `A.dot(B)` and `A @ B` are the same thing for NumPy arrays. You might be mixing it up with the weirdness of `array` versus `matrix`, but that's totally separate.

The @ operator is the same as np.matmul, which differs from np.dot for arrays with three or more dimensions.

Ah you're right, I had that mixed up. matmul should generally be the desired behavior anyway I'd think, the previous behavior of dot was a bit weird IMO, especially the behavior with a scalar. It's not backwards compatible, but I think it'd be better to prefer @/matmul in the future anyway.

I almost think it'd be nice to make matmul undefined for ranks higher than 2, since it's not really matrix multiplication and if you want to do that (or the previous behavior of dot) it can be achieved with einsum, with the advantage that you have to be a lot more explicit about what sort of tensor multiplication you want. That's probably a bit too purist though.

Another difference is that B can be a scalar in the statement A.dot(B), but not in A @ B
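
The two differences mentioned in this sub-thread can be checked with a few lines, assuming NumPy is installed. The shapes are arbitrary and only chosen to make the result shapes easy to tell apart.

```python
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# matmul / @ treats the leading axis as a batch of matrices.
matmul_shape = (a @ b).shape

# dot sums over the last axis of a and the second-to-last axis of b,
# keeping all remaining axes of both operands.
dot_shape = np.dot(a, b).shape

# A scalar operand is fine for dot but rejected by @.
m = np.ones((2, 2))
scaled = m.dot(3)
try:
    m @ 3
    scalar_error = None
except ValueError as exc:
    scalar_error = type(exc).__name__

print(matmul_shape, dot_shape, scalar_error)
```

For 2-D arrays the two give the same result, which is probably why the difference is easy to miss.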

I had so many issues trying to install python3 on an existing server that I ended up having to go back. pip kept complaining and it was just really annoying. Then some libraries were not compatible and it felt like it wasn't worth it.

Isn't python 3 usually installed alongside python/pip 2.7 as python3/pip3, and everything kept separately?

If you use pyenv, it is.

Even if you don't, any OS-level package manager should easily install Python3 alongside whatever the base install is without any conflicts as `python3` and pip as `pip3`.

Homebrew, apt, pacman, etc. all have one-line python3 installation.

Is there some kind of slash operator making this statement work? Is this a way to concatenate things?

train_path = datasets_root / dataset / 'train'

Python allows the overriding of just about every operator. For pathlib they overrode the division operator to instead perform path joining in a platform-agnostic manner.
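
As a rough illustration of the mechanism (not of how pathlib itself is implemented), a class can give / its own meaning by defining __truediv__. The Url class below is hypothetical and exists only for this example.

```python
class Url:
    """Hypothetical class, only here to illustrate operator overloading."""

    def __init__(self, text):
        self.text = text.rstrip("/")

    def __truediv__(self, segment):
        # Called for `url / segment`; refuse anything that is not a string.
        if not isinstance(segment, str):
            return NotImplemented
        return Url(self.text + "/" + segment.strip("/"))

    def __str__(self):
        return self.text


page = Url("https://example.com/") / "docs" / "intro"
print(page)  # https://example.com/docs/intro
```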

> Python allows the overriding of just about every operator.

Except the boolean operators (and, or, not). For instance, __and__ overrides the bitwise and (&), not the boolean and.

That's True, but you _can_ override __bool__.[1]

[1] https://docs.python.org/3/reference/datamodel.html#object.__...
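
A small sketch of that distinction. The Flag class and the calls list are invented for the example; the list just records which special method gets invoked.

```python
calls = []


class Flag:
    """Hypothetical wrapper around a bool that logs which hooks run."""

    def __init__(self, value):
        self.value = value

    def __and__(self, other):
        # Invoked by the & operator.
        calls.append("__and__")
        return Flag(self.value and other.value)

    def __bool__(self):
        # Invoked when `and`, `or`, `not` or `if` need a truth value.
        calls.append("__bool__")
        return self.value


a, b = Flag(True), Flag(False)

bitwise = a & b    # calls a.__and__(b) and returns a new Flag
boolean = a and b  # calls a.__bool__(), then returns b itself

print(calls)  # ['__and__', '__bool__']
```

So `and` can be influenced through __bool__, but it always returns one of its operands and never calls __and__.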

pathlib[1] has been in the standard library since Python 3.4. It can do this.[2]

[1] https://docs.python.org/3/library/pathlib.html

[2] https://docs.python.org/3/library/pathlib.html#basic-use

Type hinting isn't part of Python 3, it's part of Python 3.5+.

Syntactic support for annotating functions and exposing the annotations existed as of 3.0.

The 'typing' module in the standard library was new as of 3.5.

Syntactic support for annotating variables was new as of 3.6.

Support for delaying resolution of annotations is new in 3.7 with a __future__ import.

Originally the annotation feature was seen as a possible way to add type hints to Python, but other potential uses were envisioned and no immediate preference was given to types over other uses of annotations.
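
To put the timeline above into code, here is a minimal sketch assuming Python 3.6+. The function and variable names are made up. The last call shows that the interpreter stores annotations but does not check them at runtime.

```python
from typing import Dict, List  # typing module: 3.5+


# Function annotations: the syntax has been valid since 3.0.
def word_lengths(words: List[str]) -> Dict[str, int]:
    return {word: len(word) for word in words}


# Variable annotation: 3.6+.
retries: int = 3

# Annotations are exposed on the function object.
function_hints = word_lengths.__annotations__

result = word_lengths(["py", "three"])

# Not enforced at runtime: a str is accepted where List[str] was declared.
unchecked = word_lengths("ab")

print(sorted(function_hints), result, unchecked)
```

Catching the last call as a mistake is the job of an external type checker, not of the interpreter.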

I realized, just today, that the secrets module is new to 3.6 after trying to pip install it. This being provided directly by the language is a game changer, IMHO.

It's interesting for sure, and I'd like to see where it goes, but right now there isn't much to it: https://github.com/python/cpython/blob/3.6/Lib/secrets.py
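
For reference, a short sketch of what the module offers, assuming Python 3.6+. The token sizes and the six-digit PIN are arbitrary choices for the example.

```python
import secrets

# Random tokens from the OS's cryptographically secure source.
token = secrets.token_hex(16)          # 16 random bytes as 32 hex characters
url_token = secrets.token_urlsafe(16)  # URL-safe text from 16 random bytes

# secrets.choice picks securely from a sequence.
pin = "".join(secrets.choice("0123456789") for _ in range(6))

# Constant-time comparison, to reduce the risk of timing attacks.
same = secrets.compare_digest(token, token)

print(len(token), len(pin), same)
```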

<3 python 3 :)

Another feature is that the mock module is now part of the standard library (as unittest.mock, since Python 3.3), so there is no need to install it.
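
A minimal sketch of what that looks like in practice. The fetch_status function and the "/health" endpoint are hypothetical and only serve as something to test.

```python
from unittest import mock


def fetch_status(client):
    """Hypothetical function under test: asks a client for a status code."""
    return client.get("/health").status_code


# A Mock accepts any attribute access or call and records it.
fake_client = mock.Mock()
fake_client.get.return_value.status_code = 200

status = fetch_status(fake_client)

# Raises AssertionError if get() was not called exactly once with "/health".
fake_client.get.assert_called_once_with("/health")

print(status)  # 200
```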

also you can use pythonconverter.com
