
I didn't read the article as saying "I hate the complexity of Unicode", I read it as a complaint about language/library ergonomics. Comparing a string to a bytestring should not just always evaluate to false, it should (IMHO) raise a type error.
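For what it's worth, CPython does have an opt-in version of this: the `-b` command-line flag warns on str/bytes comparisons and `-bb` turns the warning into an error. A minimal sketch, assuming a CPython interpreter that supports the flag; `SNIPPET` and `run` are just names made up for the illustration:

```python
import subprocess
import sys

# One-liner that compares a str to a bytes object.
SNIPPET = "print('abc' == b'abc')"


def run(flags):
    """Run SNIPPET in a fresh interpreter with the given command-line flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


default = run([])      # default behaviour: the comparison silently yields False
strict = run(["-bb"])  # -bb: the same comparison raises BytesWarning as an error

print("default stdout:", default.stdout.strip())
print("strict exit code is non-zero:", strict.returncode != 0)
```

So the stricter behaviour exists, it just isn't the default.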



It read to me as not learning how things worked and then blaming the language when an upgrade revealed that misunderstanding rather than taking the time to learn how things actually work.


Even Python 3 makes it easy to ignore how things work. open happily accepts a string. Regex works on bytes. open and the standard streams can be used for reading and writing without explicitly specifying any encoding or error handling strategy. There's a default value for encoding on codecs.encode and codecs.decode. And nothing is checked statically, so you get to discover all your mistakes at runtime, and often only with a specific combination of inputs and environment.
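To make that concrete, here is a small sketch of what being explicit looks like, next to the kind of silent mistake the defaults let through. The sample string and the file name are arbitrary choices for the illustration:

```python
import os
import tempfile

text = "część"

# Spell out both the encoding and the error-handling strategy.
encoded = text.encode("utf-8", errors="strict")
round_trip = encoded.decode("utf-8", errors="strict")

# Decoding with the wrong codec raises nothing here; it just yields garbage.
mojibake = encoded.decode("latin-1")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "name.txt")
    # open() would otherwise fall back to a platform-dependent default encoding.
    with open(path, "w", encoding="utf-8", errors="strict") as f:
        f.write(text)
    with open(path, "r", encoding="utf-8", errors="strict") as f:
        from_file = f.read()

print(round_trip == text, from_file == text, mojibake == text)
```

Nothing forces you to write it this way, which is the complaint.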

People don't think about these things because Python encourages you to ignore them. Until it doesn't. There'd be a lot less confusion if Python 3 were more strict/explicit about conversions.


> Comparing a string to a bytestring should not just always evaluate to false, it should (IMHO) raise a type error.

Equality testing is implicitly done in a lot of places (like searching), it would become a bit of a hassle if the basic types were to throw exceptions when compared with each other.
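A quick sketch of where that implicit equality shows up, assuming the interpreter is run without `-b`/`-bb`. The names and values are made up for the example:

```python
names = ["alice", "bob"]
ages = {"alice": 30}

# Membership tests compare with == under the hood, so a bytes needle
# in a list of str is simply "not found" rather than an error.
found_in_list = b"alice" in names

# Dict lookups hash first and then compare with ==; same silent miss.
found_in_dict = ages.get(b"alice")

# list.count() also relies on ==.
occurrences = names.count(b"bob")

print(found_in_list, found_in_dict, occurrences)
```

If `==` raised here, every one of these would need a guard; as it stands, every one of these can hide a bug.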


It would also result in correct code sooner.


Perhaps, but the ship sailed long ago on that one:

    >>> 1 == '1'
    False


Finding bugs early is considered a hassle these days? Really?


Not saying it's right but:

For something like Unicode, I think the point is that it "just worked" before where it would have been a lot more time and effort to get it "working" in say Java. Types and compile/runtime errors are a balance in the end that depends on the use case and if the programmer wants something that works (for the short term usually) or is resilient and will work long term. IMO, the influx in popularity of untyped languages (JavaScript being used on the server side, ew) is a sign that so much coding today is not done for the long haul. I think there are good balances to be hit though, and Unicode is a place that highlights that Python could use some better optional typing aid, as TypeScript brought to JS.


As someone with so called “Unicode-characters” in my name, I have encountered bugs and issues on every single American website in the history of the internets, and you will have to forgive me for preferring correctness over programmer convenience.

For me it’s literally personal.


How many more ads are you willing to watch to pay for the correctness?


While your comment is trollish, I’ll let you know I’ve had many issues with payments because shitty web-sites have been unable to 1. Validate my name and 2. Correlate whatever their whitelist permits with actual information returned from the payment processor.

So your inability to handle basic stuff like this is actually costing you money.


Why would this be a type error? "Are these two things equivalent?" has an unambiguous answer regardless of whether the things are of different types. When they are not the same kind of thing, the answer is simply `False`. There's no reason to have a third answer "You're not allowed to ask that" for that case.


It's not unambiguous at all. If I do

    "część" == "część".encode('<some-obscure-encoding>')

should it return True? They are 'equivalent' after all. But that would mean it should work for any encoding, so testing `s == b` would entail trying to decode `b` as every known encoding to see if any of them gives `s`.

My point is, `s == b` is not a well-defined question. It only makes sense to ask it for a particular encoding: "Are these two things isomorphic according to cp-1252?"¹². Giving a type error is a decent way of signalling all this.

---

¹ And even then, I don't like automatic type conversions that enable things like `3 == "3"`, but I guess that's a matter of taste.

² Of course you could pick a convention like `s == b <=> s == b.decode("utf8")`, but that's a different question.
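A rough sketch of what asking the question "for a particular encoding" could look like. `equal_under` is a hypothetical helper, not anything in the standard library, and treating undecodable bytes as "not equal" is one possible choice among several:

```python
def equal_under(s: str, b: bytes, encoding: str) -> bool:
    """Return True if b decodes to exactly s under the given encoding."""
    try:
        return b.decode(encoding) == s
    except UnicodeDecodeError:
        # Bytes that are invalid in this encoding cannot represent s.
        return False


s = "część"
as_utf8 = s.encode("utf-8")
as_cp1250 = s.encode("cp1250")

print(equal_under(s, as_utf8, "utf-8"))     # same text, matching codec
print(equal_under(s, as_utf8, "latin-1"))   # decodes, but to different text
print(equal_under(s, as_cp1250, "cp1250"))  # same text, different bytes
print(equal_under(s, as_cp1250, "utf-8"))   # not valid UTF-8 at all
```

The same `s` matches two different byte strings depending on the codec, which is why a bare `s == b` has nothing sensible to compare.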


No, it should return `False` because they're different types, as I said.

> I don't like automatic type conversions that enable things like `3 == "3"`

I didn't mention this, and Python, the language under discussion, doesn't do this.


Okay, it looks like I misunderstood your original point:

> "Are these two things equivalent?" has an unambiguous answer regardless of whether the things are of different types.

as

"Isomorphic things should compare equal"

So, to respond to (what I think is) your actual point: in my view, equality is defined on values of a single type, so I prefer to distinguish

"a isn't equal to b" (= False)

and

"there's no defined way to compare a and b, because they have different types" (= some kind of error or a null-like value)

To me, conflating the two seems counterproductive, but perhaps this is personal preference (probably correlated with how much one likes static typing).
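In Python terms, the distinction could be sketched with a helper like the one below. `strict_eq` is a made-up name, and requiring the exact same type (rather than, say, allowing subclasses) is an assumption made to keep the sketch short:

```python
def strict_eq(a, b):
    """Equality that is only defined between values of the exact same type."""
    if type(a) is not type(b):
        raise TypeError(
            f"cannot compare {type(a).__name__!r} with {type(b).__name__!r}"
        )
    return a == b


print(strict_eq("abc", "abd"))  # defined, and the answer is False

try:
    strict_eq("abc", b"abc")    # undefined: different types
except TypeError as exc:
    print("TypeError:", exc)
```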


Python has operator overloading, though, so a flat rule that comparisons involving values of different types must always be a type error would actually reduce the power of the language. Granted, there aren't a ton of use cases where it's important to be able to do this, but it is useful on occasion.
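As an illustration of the kind of use case meant here, a hypothetical wrapper type that deliberately compares equal to plain `str` values. The class name and behaviour are invented for the example:

```python
class CaseInsensitive:
    """Text wrapper whose equality ignores case, including against plain str."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __eq__(self, other):
        if isinstance(other, CaseInsensitive):
            return self.text.casefold() == other.text.casefold()
        if isinstance(other, str):
            return self.text.casefold() == other.casefold()
        # Let Python try the other operand, then fall back to identity.
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self.text.casefold())


key = CaseInsensitive("Hello")
print(key == "HELLO")  # cross-type comparison, handled by CaseInsensitive.__eq__
print("hello" == key)  # str.__eq__ declines, so Python tries the reflected call
print(key == 42)       # nobody knows how to compare these; result is False
```

A blanket "different types means TypeError" rule would forbid the first two lines.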


You're right – I guess I got too hung up on the types. What I meant to say is that I prefer to distinguish "`a` is not equal to `b`" and "equality between `a` and `b` is undefined"; the typing aspect is kind of orthogonal and muddles the discussion.

(I guess I ended up conflating equality and equivalence[1], but in my defense, most languages seem to do that too; and the presence of the `is` operator in Python mixes things up even more)

[1] https://en.m.wikipedia.org/wiki/Equality_(mathematics)#Relat...


It would be a type error because that would be helpful. Since comparing a bytes to a str is not usually what you want, Python could make you be explicit about it. It’s too late now, and it might have been considered another backwards compatibility break (so a good thing to keep out of Python 3), but it’s not a bad idea on its own. Heterogeneous collections are pretty rare.

Also, `==` doesn’t always answer the question of whether two things are equivalent in Python:

    >>> from collections import OrderedDict
    >>> 1.0 == 1
    True
    >>> {'a': 5} == OrderedDict({'a': 5})
    True

(OrderedDict equality isn’t transitive.)


Pretty rare but not unheard of. I do know that most of the code bases I work in would require significant reworking if I have to expect `==` to throw a TypeError for basic builtin types.


Not quite with ==, but there is precedent where operators with basic builtin types throw a TypeError when the operation doesn't make sense:

    >>> 0 > None
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: '>' not supported between instances of 'int' and 'NoneType'


Absolutely; there's no good boolean answer for > when you have different types, so the `TypeError` is perfect there.


At the same time, this result is entirely dependent on the types involved. For example, "1.0 > 0" has a well-defined answer, and Python returns it.

Considering the expression "x > y", the full process is:

1. Call x.__gt__(y).

2. If that returns a bool, that's the result. If it raises an exception, propagate the exception.

3. If it returns the sentinel value NotImplemented, attempt the reflected version of the operation: y.__lt__(x).

4. If that returns a bool, that's the result. If it raises an exception, propagate the exception.

5. If it returns the sentinel value NotImplemented, raise TypeError and inform the programmer this operator is not valid on these types.
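The five steps can be sketched roughly as plain Python. `emulate_gt` and `OnlyLt` are made-up names, and this is a simplification: the real interpreter also tries the reflected method first when the right operand's type is a subclass of the left operand's type, which the sketch leaves out.

```python
def emulate_gt(x, y):
    """Approximate how `x > y` is resolved, following the steps above."""
    # Steps 1-2: ask the left operand; exceptions propagate naturally.
    result = type(x).__gt__(x, y)
    if result is not NotImplemented:
        return result
    # Steps 3-4: ask the right operand using the reflected operation.
    result = type(y).__lt__(y, x)
    if result is not NotImplemented:
        return result
    # Step 5: neither side knows how to do the comparison.
    raise TypeError(
        f"'>' not supported between instances of "
        f"{type(x).__name__!r} and {type(y).__name__!r}"
    )


class OnlyLt:
    """Hypothetical type that only implements `<` and claims to be smallest."""

    def __lt__(self, other):
        return True


print(emulate_gt(1.0, 0))       # float.__gt__ answers directly
print(emulate_gt(5, OnlyLt()))  # int declines, OnlyLt.__lt__ answers

try:
    emulate_gt(0, None)
except TypeError as exc:
    print("TypeError:", exc)
```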


Could you explain how your example shows that?


Which “that”? If you’re asking about `OrderedDict`, I’m referring to equality between `OrderedDict`s:

    >>> from collections import OrderedDict
    >>> a = OrderedDict([(1, 2), (3, 4)])
    >>> b = {1: 2, 3: 4}
    >>> c = OrderedDict(reversed(a.items()))
    >>> a == b == c
    True
    >>> a == c
    False


I was referring to this:

> Also, `==` doesn’t always answer the question of whether two things are equivalent

because the examples you posted kinda make it look like it does, i.e. `1.0 == 1` even though one is a float and the other is an int.


> "Are these two things equivalent?" has an unambiguous answer regardless of whether the things are of different types.

Absolutely not; that's basically the OG subjective question whose lack of an obvious right answer underpins a substantial quantity of the bugs in less strongly-typed languages.

Nor is an unambiguous answer possible symbolically, or in common logic, or the semantics of spoken languages.


I think you stopped reading before my next sentence, which says the answer is "false" if the types are different.



