
Going off on a tangent a bit here, but I think there are 2 important related issues:

* API design should fit the language. In a "high on correctness" language like Haskell or Rust, I'd expect APIs to force the programmer to deal with errors, and make them hard to ignore. In a dynamically typed language like Python, where many APIs are very relaxed / robust about handling multiple data types (being able to treat numbers/strings/objects generically is part of the point of the language), being super strict about string encoding sounds extra painful compared to a statically typed language. I'd expect an API in this language to err on the side of "automatically doing a useful/predictable thing" when it encounters data that is only slightly incorrect, as opposed to raising errors, which makes for very brittle code. Most Python code is the opposite of brittle, in the sense that you can take more liberties with data types before it breaks than in statically typed languages. Note that I am not advocating incorrect APIs, or APIs that silently ignore errors, just that the design should fit the language philosophy as well as possible.

* Where in a program/service should bytes be converted to text? Clearly they always come in as bytes (network, files, ...), and by the time the user sees them rendered (as fonts), those bytes have been interpreted using a particular encoding. The question is where in the program this should happen. You can do it as early as possible, or as late as possible. Doing it as early as possible increases the code surface where you have to deal with conversions, and thus possible errors and code complexity, so that doesn't seem so great to me personally, but I understand there are downsides to having most of your program take a "bag of bytes" approach too.
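To make the two points above concrete, here is a minimal sketch of strict versus lenient decoding at the point where bytes enter a program. The helper names `decode_strict` and `decode_lenient` are made up for illustration; only `bytes.decode` and its `errors` argument are real Python API.

```python
def decode_strict(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: fail loudly on any byte that is invalid in `encoding`."""
    return raw.decode(encoding)  # raises UnicodeDecodeError on bad input


def decode_lenient(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: keep going, marking bad bytes with U+FFFD."""
    return raw.decode(encoding, errors="replace")


# Latin-1 encoded bytes: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw = b"caf\xe9 menu"

lenient = decode_lenient(raw)  # "caf\ufffd menu"

try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

The lenient version is not silent about the problem (the replacement character stays visible in the output), which is roughly the "useful/predictable thing" middle ground. Whether that is acceptable depends on what happens to the text afterwards.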

I don’t think Haskell is a very good example to promote for string handling. Things are mostly strict and well behaved once they make it into the Haskell program, but before then, either the input has to satisfy the program’s assumptions or the program will be buggy or crash, unless it is carefully written so that its assumptions hold.

I didn't mention Haskell specifically for strings, but as a language that tends to be very precise about corner cases. That may not even be the best example, but I couldn't think of any better mainstream-ish language examples :)

Part of the problem is that encoding is treated as something that must be explicitly handled in the string API, but it's something that's just assumed by default in the IO API. Python just guesses what the encoding is, and it often guesses wrong.

The design of the API leads people to do the wrong thing. Encoding should be a required argument for `open` in text mode.
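A rough sketch of what that could look like as a wrapper around the built-in `open`. `open_text` and `demo` are hypothetical names, not part of the standard library. The idea is just that `encoding` is keyword-only with no default, so forgetting it fails at the call site instead of falling back to the locale's default.

```python
import os
import tempfile


def open_text(path, mode="r", *, encoding, **kwargs):
    """Hypothetical wrapper: text-mode open() that refuses to guess the encoding."""
    if "b" in mode:
        raise ValueError("open_text is for text mode only")
    return open(path, mode, encoding=encoding, **kwargs)


def demo() -> str:
    """Write UTF-8 bytes to a temp file, then read them back with an explicit encoding."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            f.write("naïve café".encode("utf-8"))
        with open_text(path, encoding="utf-8") as f:
            return f.read()
```

Calling `open_text(path)` without `encoding=` raises `TypeError` before the file is even touched, so the mistake shows up on every platform rather than only on machines whose default encoding happens to differ.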
