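Indeed, py3 decided to make unicode strings the default. That fixes a lot of thorny text-handling issues across many use cases, but it does break filenames, since on POSIX a filename is just a sequence of bytes that need not be valid in any particular encoding. I haven't dealt with this issue myself, but the way Python is supposed (?) to have "solved" this is with surrogate escapes (PEP 383): bytes that can't be decoded are mapped to lone surrogate code points, so the str can be encoded back to exactly the original bytes. There's a neat piece on the tradeoffs of the approach here: https://thoughtstreams.io/ncoghlan_dev/missing-pieces-in-pyt...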
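For anyone who hasn't seen it, here is a minimal sketch of what the surrogate escape does, assuming a UTF-8 filesystem encoding. The filename bytes are made up for illustration. On POSIX, os.fsdecode/os.fsencode apply this same "surrogateescape" handler using the filesystem encoding.

```python
# Raw bytes of a (made-up) filename that is not valid UTF-8:
# the byte 0xff can never appear in well-formed UTF-8.
raw = b"report-\xff.txt"

# errors="surrogateescape" maps each undecodable byte 0xXY to the lone
# surrogate code point U+DCXY instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", errors="surrogateescape")
print(repr(name))  # 'report-\udcff.txt'

# Encoding with the same handler turns the surrogate back into the
# original byte, so the bytes -> str -> bytes round trip is lossless.
assert name.encode("utf-8", errors="surrogateescape") == raw
```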

Maybe handling the surrogates better would allow you to use 'str' everywhere instead of bytes?
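One way "handling the surrogates better" could look, as a rough sketch assuming a UTF-8 filesystem encoding: keep filenames as str everywhere inside the program and only deal with the surrogates at the edges, where text is written out with a strict encoder (logs, terminals, JSON). The helper display_name below is hypothetical, not a stdlib function.

```python
def display_name(name: str) -> str:
    """Hypothetical helper: make a surrogate-escaped filename safe to show.

    Assumes UTF-8 filenames. Lone surrogates cannot be encoded as strict
    UTF-8, so turn them back into the original bytes, then render any
    undecodable bytes as visible \\xNN escapes.
    """
    raw = name.encode("utf-8", errors="surrogateescape")
    return raw.decode("utf-8", errors="backslashreplace")


# A str filename as os.listdir() on a str path would return it on POSIX.
name = b"report-\xff.txt".decode("utf-8", errors="surrogateescape")

try:
    # A strict UTF-8 encode (e.g. writing to a strict UTF-8 stream) fails.
    name.encode("utf-8")
except UnicodeEncodeError:
    print("strict encode fails on the lone surrogate")

print(display_name(name))  # report-\xff.txt
```

The internal str still round-trips to the real bytes for open() and friends; only the displayed copy is lossy.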

