
>UNIX files and paths have no encoding, they're just bags of bytes, with specific bytes (not codepoints, not characters, bytes) having specific meaning

Yes, and you should treat them as such. If you decode them you're doing it wrong. os.listdir lets you treat paths as bytes and responds in kind:

https://docs.python.org/3.5/library/os.html#os.listdir
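For illustration, a minimal sketch of that bytes-in, bytes-out behavior. It assumes a Linux-style filesystem that accepts any byte except "/" and NUL in a name (macOS and Windows may reject the non-UTF-8 name used in the demo). list_raw_and_decoded is a hypothetical helper, not part of the os module.

```python
import os
import tempfile
from typing import List, Tuple


def list_raw_and_decoded(directory: str) -> Tuple[List[bytes], List[str]]:
    """Hypothetical helper: list a directory once as bytes and once as str.

    os.listdir returns the same type it is given, so a bytes argument
    yields the names exactly as the filesystem stores them.
    """
    raw = os.listdir(os.fsencode(directory))  # bytes in -> bytes out
    decoded = os.listdir(directory)           # str in -> str out
    return raw, decoded


if __name__ == "__main__":
    # Assumption: a filesystem that accepts non-UTF-8 bytes in names (e.g. ext4).
    with tempfile.TemporaryDirectory() as d:
        # b"caf\xe9.txt" is "café.txt" in Latin-1, which is not valid UTF-8.
        with open(os.path.join(os.fsencode(d), b"caf\xe9.txt"), "wb"):
            pass
        raw, decoded = list_raw_and_decoded(d)
        print(raw)      # [b'caf\xe9.txt']
        print(decoded)  # ['caf\udce9.txt'] with a UTF-8 filesystem encoding
```

The str listing still works, but the undecodable byte comes back as a lone surrogate; only the bytes listing gives you the name exactly as stored.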

The problems you mentioned do exist, and Python 3's response is to make you deal with them. This is the whole point, not a mark against it. Often when you find that Python 3 is making it hard for you, it's because you're doing it wrong.




While that's correct, it also means that you can't ever correctly show filenames to users either. At some point you need to relate the bytes to an encoding, when a user displays or enters a filename. "UTF-8 everywhere now!"


Hi again. This was addressed in my other comment.

>You shouldn't decode them except if you need to for display (and in that case, you should be prepared to handle filenames which cannot be decoded as text).


> you can't ever correctly show filenames to users either

If I create a file whose name consists entirely of bytes that correspond to non-printable characters, how would you display that in the language of your choice?

If I create a file whose name cannot possibly be valid UTF-8, how would you display that in the language of your choice?

Both of these things are legal to do on at least some systems. Presumably you have in mind a language which will magically handle this case with no extra effort from the programmer, though, so I'm curious to know the language and what it would do.
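One possible answer in Python, as a sketch: decode with the backslashreplace error handler and escape anything non-printable. That covers both cases above (control bytes and invalid UTF-8), but the result is lossy and only fit for display, so the raw bytes are still what you should pass to open() and friends. display_name is a hypothetical helper and UTF-8 is an assumed default; this isn't claimed to be what any particular tool does.

```python
def display_name(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn a raw filename into a printable string.

    The result is for showing to a user only. It is lossy and ambiguous
    (a name that really contains a backslash can look the same), so keep
    the original bytes for every filesystem operation.
    """
    # Bytes that are not valid in `encoding` become escapes such as \xe9.
    text = raw.decode(encoding, errors="backslashreplace")
    # Non-printable characters (control codes, newlines, ...) become escapes too.
    # ascii(ch) returns a quoted literal such as '\x07'; [1:-1] drops the quotes.
    return "".join(ch if ch.isprintable() else ascii(ch)[1:-1] for ch in text)


print(display_name(b"\x07\x1b\x01"))  # \x07\x1b\x01
print(display_name(b"caf\xe9.txt"))   # caf\xe9.txt
```

Valid UTF-8 names such as "café.txt" pass through unchanged; only the parts that can't be shown faithfully get escaped.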


> Yes, and you should treat them as such.

Of course you should. But the debate here is not about how a programmer should do it. The debate is that Python 3 doesn't do it that way by default: at the very least it takes extra code, and, as the article points out, in some cases, such as command-line arguments, it makes doing it that way impossible.

A well-designed computer language encourages good code by making it easy and obvious to write. By that metric, Python 3's handling of bytes and strings is not well designed.

By the by, I have been bitten by this. As others have mentioned, a backup program turns out to be a worst case. I had to redo the string/filename handling twice before I was confident it was correct. I'm an experienced Python programmer in both Python 2 and 3. The final design wasn't obvious to me, and it took substantially more code than I expected.
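A minimal sketch of the bytes-only approach for a backup-style tree walk, assuming POSIX paths; iter_backup_sources is a hypothetical name and this is not the design described above, which isn't shown. Passing bytes to os.walk keeps every name undecoded, so non-UTF-8 names survive.

```python
import os
from typing import Iterator


def iter_backup_sources(root: bytes) -> Iterator[bytes]:
    """Hypothetical helper: yield every regular file under root as a bytes path.

    Because root is bytes, os.walk yields bytes names as well, so nothing is
    decoded and names that are not valid UTF-8 pass through unchanged.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; a real backup tool needs its own policy for them.
            if os.path.isfile(path) and not os.path.islink(path):
                yield path
```

If a path has to appear in a log or UI, decode it only at that point, for example with path.decode("utf-8", "backslashreplace").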


How is that different from Python 2? You have to deal with it either way.


Python 2 encourages the unaware to silently write broken, anglocentric code. Python 3 puts the problem in your face.
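A small illustration of "in your face": Python 3 raises TypeError as soon as bytes and str path components are mixed, whereas Python 2 implicitly decoded byte strings as ASCII, so the equivalent code only failed once a non-ASCII name turned up. join_under is a hypothetical helper showing one explicit fix via os.fsencode.

```python
import os


def join_under(directory: bytes, name: str) -> bytes:
    """Hypothetical helper: join a text name onto a bytes directory path.

    The conversion is explicit: os.fsencode uses the filesystem encoding
    and error handler instead of silently assuming ASCII.
    """
    return os.path.join(directory, os.fsencode(name))


raw_dir = b"backup"   # a directory path kept as bytes
name = "report.txt"   # a name that arrived as text, e.g. typed by a user

try:
    os.path.join(raw_dir, name)  # mixing bytes and str
except TypeError as exc:
    # Python 3 refuses to guess an encoding here.
    print("TypeError:", exc)

print(join_under(raw_dir, name))
```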


I don't see how that is responsive to the question about using bytes/Py2 string for path names.


Python 2 encourages the unaware to silently write broken, anglocentric code [which interacts with paths].


Python 3 encourages the unaware to write broken code in a UTF-8-centric way instead. The unaware are going to write broken code.



What specific difference are you trying to highlight? The only distinction I see is that in Python 3.3, os.listdir also accepts a file descriptor.
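For reference, a sketch of the file-descriptor form, assuming a Unix platform; check os.supports_fd rather than assuming os.listdir accepts a descriptor everywhere. list_via_fd is a hypothetical helper name.

```python
import os
from typing import List


def list_via_fd(path: str) -> List[str]:
    """Hypothetical helper: list a directory through an open file descriptor.

    Names come back as str (decoded with the filesystem encoding), because
    a descriptor is neither str nor bytes.
    """
    if os.listdir not in os.supports_fd:
        raise NotImplementedError("os.listdir(fd) is not supported on this platform")
    fd = os.open(path, os.O_RDONLY)  # a descriptor that refers to the directory
    try:
        return os.listdir(fd)
    finally:
        os.close(fd)  # release our descriptor when done
```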


Likely the existence and explicit documentation of os.fsencode, as well as the API returning the same type you pass in, so it accepts bytes if you want to handle paths safely.


The 2 API also returns the same type you put in.

os.fsencode() was added in Python 3.2, long after the 3.0 fork, and isn't inherent to Python 3; the same change could be made to Python 2. The only reason it wasn't is that Python.org has been intentionally neglecting Python 2 since 2015.

(The Python 2 equivalent of os.fsencode() is just encoding with sys.getfilesystemencoding(). The primary difference is that fsencode() uses the surrogateescape error handler by default. Since this is a relaxed behavior, it seems like it could be added to Python 2 without regressing existing programs.)
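A rough sketch of that equivalence, assuming a POSIX system where the filesystem error handler is surrogateescape (sys.getfilesystemencodeerrors() reports it on Python 3.6+; Windows uses a different handler). The helper names are hypothetical and this is not the stdlib's actual implementation.

```python
import os
import sys


def fsencode_equivalent(name: str) -> bytes:
    """Roughly what os.fsencode does on POSIX (a sketch, not the stdlib code)."""
    return name.encode(sys.getfilesystemencoding(), "surrogateescape")


def fsdecode_equivalent(raw: bytes) -> str:
    """Roughly what os.fsdecode does on POSIX."""
    return raw.decode(sys.getfilesystemencoding(), "surrogateescape")


raw = b"caf\xe9.txt"  # "café.txt" in Latin-1, which is not valid UTF-8

try:
    raw.decode("utf-8")  # the default "strict" handler rejects the byte
except UnicodeDecodeError as exc:
    print("strict decode fails:", exc.reason)

# With a UTF-8 filesystem encoding, \xe9 becomes the lone surrogate \udce9 ...
name = fsdecode_equivalent(raw)
# ... which encodes back to exactly the same byte.
assert fsencode_equivalent(name) == raw
```

The relaxed part is the round trip: strict decoding raises, while surrogateescape maps each undecodable byte to a surrogate and back without loss.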


Yes, so the documentation and language explicitly have tools to avoid encoding issues and make you aware of them. It encourages writing good, unbroken, code.



