
A Python byte sequence (the b"file" in the example) is not necessarily a string that uses only the lower ASCII characters; it is an arbitrary sequence of arbitrary bytes (any value up to 255, not only those below 128). The equality operation comparing a string with a sequence of bytes needs to be well defined for all possible byte sequences, including non-ASCII bytestrings like b'\xe2\xe8\xe7\xef', which decodes to different strings in different ANSI encodings and is not valid UTF-8. Bytestring data also carries no assumption about its encoding, especially if you just received those bytes over the network or read them from a file.
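To make that concrete, here is a minimal sketch that decodes those same four bytes under two Windows code pages and then as UTF-8. The choice of cp1251 and cp1252 is mine, picked only as two common examples of "ANSI" encodings; the variable names are illustrative.

```python
data = b"\xe2\xe8\xe7\xef"

# The same four bytes give different text under different 8-bit code pages.
as_cp1251 = data.decode("cp1251")  # Cyrillic code page
as_cp1252 = data.decode("cp1252")  # Western European code page

# ascii() keeps the output printable on any terminal encoding.
print(ascii(as_cp1251), ascii(as_cp1252))

# As UTF-8 the sequence is invalid: 0xE2 announces a three-byte sequence,
# but 0xE8 is not a continuation byte.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print("valid UTF-8:", utf8_ok)
```

Nothing in the bytes object itself says which of these readings, if any, was intended.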

Furthermore, even for ASCII sequences like b"file" the bytes do not map to the string "file" in every Unicode encoding. For example, in UTF-16 (little-endian) the bytes of b"file" represent "楦敬", which is a bit different from "file".
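A quick way to check this, as a sketch: decode the same four bytes as ASCII and as UTF-16 with an explicit byte order. I use the "utf-16-le" and "utf-16-be" codecs explicitly here so the result does not depend on any byte-order default; the variable names are illustrative.

```python
raw = b"file"  # bytes 0x66 0x69 0x6C 0x65

as_ascii = raw.decode("ascii")
as_utf16le = raw.decode("utf-16-le")  # code units 0x6966, 0x656C
as_utf16be = raw.decode("utf-16-be")  # code units 0x6669, 0x6C65

# Print code points rather than the characters themselves,
# so the output works on any terminal encoding.
print(as_ascii)
print([hex(ord(ch)) for ch in as_utf16le])
print([hex(ord(ch)) for ch in as_utf16be])
```

The little-endian result is the two-character CJK string, and the big-endian reading gives yet another pair of characters.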




If b"file" does not mean the ASCII string "file" but is supposed to be interpreted as a byte array (just bytes in memory, with no implication that the individual elements are characters), then my original point still stands: such a comparison shouldn't be allowed at all, and an error should be raised. Simply returning False suggests that the comparison is valid for these types and that the two strings merely happened to be unequal.

I thought Python was strongly typed? Am I incorrect in this regard?


Python is strongly typed in the sense that it does not implicitly convert between unrelated types, but it is also dynamically "duck-typed": everything is an object, and you should be able to hand over "X-like" objects to code that expects type X, and it should work properly if your X-like object supports all the interfaces that type X does. As part of that duck-typing, it's valid to compare anything with anything for equality without raising an error; different things are (by default) simply not equal, so the comparison returns False. For example, you can compare an instance of a class defining a database connection with the integer 5, and that is a valid comparison that returns False.
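A small sketch of that default behavior. `DatabaseConnection` is a hypothetical stand-in class, not a real library type. Note that this applies to equality (`==`); ordering comparisons such as `<` between bytes and str do raise TypeError in Python 3.

```python
class DatabaseConnection:
    """Hypothetical stand-in for 'a class defining a database connection'."""

    def __init__(self, dsn):
        self.dsn = dsn


conn = DatabaseConnection("example-dsn")

# Default equality falls back to identity, so unrelated objects are unequal.
conn_equals_five = (conn == 5)          # False
bytes_equals_str = (b"file" == "file")  # False

# Ordering is a different story: there is no fallback, so this raises.
try:
    b"file" < "file"
    ordering_allowed = True
except TypeError:
    ordering_allowed = False

print(conn_equals_five, bytes_equals_str, ordering_allowed)
```

For what it's worth, running the interpreter with the `-b` option makes the bytes/str equality check emit a BytesWarning, and `-bb` turns it into an error, which is roughly the strict behavior asked for above, on an opt-in basis.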

This behavior is a key requirement for all kinds of Python core data structures. For example, if bytestrings were defined to throw an error when compared to a "normal" string, then for a heterogeneous list (pretty much all data structures in Python can be heterogeneous regarding data types) containing both b"something" and "something", many standard operations (e.g. checking whether "something" is in that list) would break, because the list code needs a comparison to do that.
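Here is a rough sketch of both situations. The first part uses the real bytes type. `StrictBytes` is a hypothetical subclass I made up to simulate a bytestring that raises when compared to str; it is not how any real Python type behaves.

```python
items = [b"something", "something", 5, None]

has_text = "something" in items         # True: matches the str element
has_other = "other" in items            # False: no element matches, nothing raises
text_index = items.index("something")   # 1: the bytes element at index 0 is unequal


class StrictBytes(bytes):
    """Hypothetical bytes subclass that refuses equality checks against str."""

    def __eq__(self, other):
        if isinstance(other, str):
            raise TypeError("cannot compare bytes with str")
        return super().__eq__(other)

    # Defining __eq__ would otherwise disable hashing.
    __hash__ = bytes.__hash__


strict_items = [StrictBytes(b"something"), "something"]

# Membership has to compare against the first element, which now raises.
try:
    "something" in strict_items
    strict_membership_works = True
except TypeError:
    strict_membership_works = False

print(has_text, has_other, text_index, strict_membership_works)
```

With the strict variant, the lookup fails before it ever reaches the matching str element.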



