Hacker News new | past | comments | ask | show | jobs | submit login
Ad-hoc polymorphism erodes type-safety (cs-syd.eu)
44 points by diarrhea 9 months ago | hide | past | favorite | 38 comments



The Rust example provided isn't showing what the author seems to claim it's showing, which is interesting.

You see, that function named iter() isn't some trait feature, it's just a convention. So this is a coincidence, of exactly the sort which could arise regardless of a feature like traits.

Option::iter() is a function defined on the Option type, and [T]::iter() is a function defined on slice types [T]. The Vec is getting coerced into a slice to call that iter() function in the case where there's a Vec here.

These functions do similar things because of the convention, but they could in principle have completely unrelated behaviour -- it's just a convention -- and this would still compile.

For example suppose we instead change allow_list to my custom Doodad type. Doodad can cheerfully provide a method named iter() which gives you a Debretts (Debrett's publishes a notable guide to British Peerages) and my Debretts type has a method count() which tells you how many counts and countesses there are currently for some reason (zero, England doesn't use this title).

That compiles just fine, not because "Ad-hoc polymorphism erodes type safety" but because we just wrote iter().count() and we get whatever that does, if we wanted to say "I assert iter gives me an Iterator over items of type ClientIdentifier", we can say that, but we did not.


I was expecting/hoping to learn something new, and instead only came to this same realization. The most I could say is that interfaces/etc can sometimes give a false sense of (type) safety. On finding the cause of such a fault, I don't shake my fist at interfaces, I say "Doh! I should know better."

> Some responders seemed surprised by this idea, and I thought this was common knowledge but could not find any blog posts about it either. I figured maybe it was worth writing out, and some people agreed, so here we are:

The idea is both surprising and common knowledge. Surprising to think that it's a special kind of type weakness, rather than a common one where actual cases are not surprising at all. Nice writing, more substantial content would be great!

If we take away all the big words, use Java instead of Haskell/Rust and start with the example that breaks, it would be a much shorter post that no one would care to read. I don't know that the great writing and build-up is worthwhile, other than a pleasant if somewhat disappointing read.


This feels less of polymorphism eroding type safety as it is aliased terms eroding it. Stated differently, programming/abstraction doesn't protect you from synonyms.

That is, it means different things to iterate over all allowed values of a list than it does to iterate over the existence of something. That they can be abstractly described in the same way feels powerful, of course, but such is the case of powerful tools?


Yes, this is about aliases. When you see `length` in a language like Haskell you don't know what length exactly it refers to without looking at the type of its argument. But if you see `list_of_foo_length` then you know there can be only one `list_of_foo_length` and where to look for it.

This makes things like source code indexers have to know how to parse the language and how to understand its types even. Compare to the lowly ctags/etags/cscope of C world! Clearly monomorphic APIs are easier to deal with in some ways, though they also clutter things up a bit.


Yes it's in the same category of "we know better now, hopefully?" as people overloading the + operator for string concatenation.

Refining for clarity of categories is hard. I'm not sure 'type polymorphism' is the problem here as much as it is false conceptual polymorphism.


Yes, and common synonyms like `length` can hose you when you convert from a string to a list of strings. They shouldn’t really have the same name, especially in a Unicode world where there are several different lengths for a string.


If instead of changing the allow list from a vector to optional vector, it was changed, for whatever reason,to a vector of vectors, wouldn't the monomorphic example compile and silently do the wrong thing?

Even if you buy the thesis, the example is not terribly convincing.


This is a good point, even without polymorphism there are risky refactors, but the polymorphism increases the number of potential risky refactors.

On top of that, it might be a lot more obvious that turning a vector into a vector of vectors might be risky, but that you can iterate on Option won't be top-of-mind to everyone.


This is an interesting point but does rely on a rather narrow definition of “type safety”. By any definition I have used, the code is type safe in those final examples. It has a bug, most definitely, but it isn’t type unsafe in the way that reading a chunk of memory intended to hold type A as if it was holding a type B is


Fully ad-hoc polymorphic languages like Smalltalk have this property, but many prefer having the compiler catch more errors.


Rust's iterability of Option is a good example for this because (1) it has actually bit me, but it wasn't too bad and (2) it is somewhere close to the border of where it's reasonable/useful to treat it as a sequence or not. Logically it works, but Option is not primarily a sequence.

Then the Rust code in the article funnily enough not use traits for the `.iter()` call at all, that's just duck typing (no trait for .iter()).


Yeah. it's bit awkward to make `Option` implement `IntoIterator` and can certainly cause some weird and subtle bugs. But on the other hand it does make a whole lot of things work nicely. For example, you can turn a `Vec<Option<T>>` into a `Vet<T>` trivially with `foo.into_iter().flatten().collect()`. And generally it's nice to have Option and Vec composable.


I agree. Ironically the Rust standard library defines iter::once using Option::Some [O].

[0] https://doc.rust-lang.org/src/core/iter/sources/once.rs.html...


In Swift, you shouldn't use Optional like this. You should instead make a new type to represent exactly what parameter this function expects:

  enum AllowList {
    case all
    case some([ClientIdentifier])
  }


I would argue that you should do something similar in Haskell, too. It would ensure that practically everything accessing the allow list fails to compile unless it had been adapted to the new model. Haskell’s strength is the compiler and its ability to pedantically enforce type errors. Use it! Let it help you!


So far, in Rust, that has always felt unidiomatic and I haven't seen this pattern around a lot. It always felt like reimplementing Option/Result but worse: "Why use this when something conceptually similar exists in the stdlib?"

But that is exactly the point: it's only conceptually similar, but in this case deceivingly so. In Rust, we make abundant use of newtypes; this enum is also a new type. Only once it is treated as such does actual type safety kick in. I take it the same principle applies to Haskell. Good to know there's some merit to it.


I think this claim has some merit. Perhaps the only thing missing (or perhaps intended to be self-evident) is that ad-hoc polymorphism is also useful and pragmatically, this situation doesn't come up too often. In particular, encapsulation reduces the frequency that such errors can be introduced. In the article's example, the `Setting` struct should (idiomatically) have private elements and use a method in the struct's impl block to expose the (optional) length of the field. And, to support the author's point, you wouldn't use ad-hoc polymorphism for this either.

The author's concern should be considered when introducing ad-hoc polymorphism, especially when the type bounds are "loose" or commonly defined by many types.


I think part (but not all) of the blame in this case should fall to `length` and `.iter().count()` doing something rather different for a list and an Option/Maybe type. Does taking the length/iter of the latter really make sense? Or should one have to do `option.is_some() as usize` instead, if that's really what you want?

If the same method always did the same for all types that implement it, this problem wouldn't occur. (That being said, I understand that that might be too much to ask for / can't always be avoided.)


But you see, this behavior is the result of uniformly doing the same thing. You're asking for it to have special cases.

The signature "length :: Foldable f => f a -> Int" is pretty specific. The polymorphism is parametric over `a' and ad-hoc over `f'. It has to select its behavior based only on the "container" type, rather than the "element" type. (The quoted terms are sloppy, but better for intuition than talking about deconstructing curried generative type constructor application.)

Here's the thing about that. If your type is (Maybe (List Foo)), the `f' variable (container type) gets matched with `Maybe'. The `a' variable (element type) gets matched with (List Foo). So the fact that there's a list involved gets thrown away during instance resolution.

And that's a good thing, because it lets you reason about your code's behavior uniformly. Just seeing the type of `length' is enough to tell you it has to work this way. Any attempt to work another way would fail to type-check.

--

Ultimately I think I slightly agree with the original post, but I think it oversells the issue. Using ad-hoc polymorphism inside a definition that doesn't expose that polymorphism externally does allow a few more more incorrect implementations. That is some form of erosion of type safety, but a very minor one compared to what ad-hoc polymorphism allows you to recover in safety when writing code that's useful across many types.


> The signature "length :: Foldable f => f a -> Int" is pretty specific.

That may be, but it isn't the best way to describe the type of `length Maybe`, which behaves more like `f a -> Bool`, since it can only return 0 or 1.

> You're asking for it to have special cases.

Not really, I'm just questioning whether the `Maybe` type should implement `length`. Does it make sense to use an int to describe whether it has a value? Does it make sense to call that int its "length"? I would argue: sort of, but there are better names for it, such as `boolToInt . isJust`.


You're sort of asking the wrong question. The choice wasn't whether Maybe should implement length. Instead, there were two independent choices: should Maybe implement Foldable, and should length be implemented in terms of Foldable?

For the first, the answer is absolutely yes. You really want things like traverse and sequence to work with Maybe, and that requires Maybe to implement Foldable (on the way to Traversable.) It's incredibly useful to be able to treat Maybe as just another container type.

Whether length should be in terms of Foldable, by contrast, is a much more debatable question. On one hand, it means a lot of unexpected things can have length called on them. (length (3,4)) evaluates to 1, for instance; that's just almost never useful, despite being completely logical. On the other hand, it's really awkward to need a different function to get the size of every container type. Especially when you can write a length function using just Foldable. And I think that's what led to that decision - if the standard library doesn't provide that definition, a whole bunch of other libraries will.

I'm still not sure about that decision. It's got some weird corner cases. There are some cases where it allows bugs, as the article points out. But at the same time, I have written code that is happy to use that degree of polymorphism, and does so safely because it allows the caller to choose the concrete type.

Maybe the real problem isn't the existence of those instances. Maybe the underlying issue is that you can use ad-hoc polymorphism at concrete types without knowing what those concrete types are. The article did touch on that aspect in its final paragraph, when it mentioned monomorphizing length at the call site. But even this feels like it could easily introduce more pain that it solves.

I've spent more time here talking about the potential issue than I ever have spent dealing with it in real software. Maybe it's just not that big of a deal.


> Not really, I'm just questioning whether the `Maybe` type should implement `length`.

In this case no effort is made to make `Maybe` implement `length` specifically other than making it implement `Foldable`.

    ghci> :i Maybe
    ...
    instance Foldable Maybe -- Defined in ‘Data.Foldable’
    ...
and `length` is a method of the `Foldable` typeclass.

    ghci> :i Foldable
    ...
    length :: t a -> Int
    ...
https://hackage.haskell.org/package/base-4.18.0.0/docs/Prelu...

length implementation for `Maybe`

https://hackage.haskell.org/package/base-4.18.0.0/docs/src/D...

edit - accidentally submitted while part way through editing.


OK, but the question remains, does it make sense for `Maybe` to implement `Foldable`?

The second link says:

> The Foldable class generalises some common "Data.List" functions to structures that can be reduced to a summary value one element at a time.

does it make sense for this to apply to `Maybe`? I guess technically, sort of, but I think it's not a perfect fit. Perhaps there might be some use cases for it, but I'd argue it also carries some risks, as demonstrated in the posted article.


> to structures that can be reduced to a summary value one element at a time.

This is the key part in my mind. Or elsewhere in the documentation:

> Class of data structures that can be folded to a summary value.

https://hackage.haskell.org/package/base-4.18.0.0/docs/Data-...

Foldable is the generalization, in the mathematical sense of the word, of some list functions, but it is not just the list functions. Generalizing something and applying to other structures sometimes results in unintuitive behavior, but is mathematically correct.

`Maybe` makes mathematical sense with foldable. A function with the type that `length` has for Foldable `Foldable f => f a -> Int` makes sense as well. The number of items in a list or a tree, or various other structures makes sense for 'data structures that can be folded to a summary value'.

The design decision I think is whether or not you call `Foldable f => f a -> Int` in Foldable `length` or something else. Since you have `length` for list structures already and `Foldable f => f a -> Int` is `length` for lists you can call it `length` instead of having two functions `length :: [a] -> Int` and lets say `foldCount :: Foldable f => f a -> Int` that both do the same things for lists.

I am pretty sure a discussion on this decision popped up, both in mailing lists and r/haskell, when they made the switch from `length :: [a] -> Int` to `length :: Foldable f => f a -> Int` in the haskell prelude. I did not follow it closely though both reduction in function count, pedagogical arguments and more came up.

edit - The proposal was the FTP proposal if you want to google those conversations.

Some links to start with

FAQ:

https://wiki.haskell.org/Foldable_Traversable_In_Prelude

random reddit conversation on r/haskell that popped up on a google search:

https://www.reddit.com/r/haskell/comments/3okick/foldable_fo...


It’s useful for Option to be iterable so that it works with things like .flat_map(), and it’s useful to be able to count from an arbitrary iterable.

.iter().count() is a contrived way to find the length of a Vec though, since .len() is clearer and faster.


Isn't the idea with traits that you can pass in anything you want that implements the trait?

I think a larger problem here is that the function takes Settings and not a &Vec Option<&Vec>, that way you'd notice immediately.

Or you split the code and add type constraints:

    struct ClientIdentifier {}

    struct Settings {
        allow_list: Vec<ClientIdentifier>
    }

    struct Settings2 {
        allow_list: Option<Vec<ClientIdentifier>>
    }

    fn allow_list_count(settings: Settings) -> usize {
        // clear you're iterating over a slice
        let iter: std::slice::Iter<'_, _> = settings.allow_list.iter();
        // and the count is len
        iter.count()
    }

    fn allow_list_count_2(settings: Settings2) -> usize {
        // clear that you're iterating over Option<T>
        let iter: std::option::Iter<'_, _> = settings.allow_list.iter();
        // which is 0 (None) or 1 (Some(T))
        iter.count()
    }


The claim

> Using ad-hoc polymorphism can cause code to silently do the wrong thing after a refactor because of a type error, in a way that monomorphic code would not have

doesn't mean languages with ad-hoc polymorphism are less type-safe overall. Ad-hoc polymorphism allows you to write stronger-typed code more easily.


I mean, this is true. But traits are also very useful. Feels a bit like saying "driving is dangerous; avoid driving where possible".

Also "allow-list" is so gross. Maybe pick something less divisive for future examples.


> I mean, this is true. But traits are also very useful. Feels a bit like saying "driving is dangerous; avoid driving where possible".

I agree. Just because it can be misused does not mean it is no good for anything. Note that you can declare types explicitly, and is usually how I do when I write anything in Haskell anyways, which makes the program clearer.

Also, changing the types after the rest of the program is already written, is going to be a problem in any programming language even without ad-hoc polymorphism, if you do not then consider everything that needs to be changed due to that.

> Also "allow-list" is so gross. Maybe pick something less divisive for future examples.

I agree with this, too. You can call it "allowed clients" or "list of allowed clients" if you want to, which is better in my opinion. (Simply "allow-list" does not even describe very well what it is for; what is being allowed?)


> Also "allow-list" is so gross. Maybe pick something less divisive for future examples.

What is gross or divisive about that?


I'm guessing this is a reference to the movement by some people to use that term instead of "whitelist" on the grounds that the latter is racist. There's a division over whether this is silly or not.


Lots of people (including me) think it sounds extremely stupid and "whitelist" was perfectly fine anyway.

Reminds me of "mebi-byte", which is similarly gross (but at least that had sensible motivation).


I agree with you, "whitelist" is a perfectly fine word, and "allow-list" doesn't sound good. I also agree that "mebi-byte" doesn't sound good either, but yes, at least it is sensible.

However, just because a word is OK does not mean it is applicable to all contexts. You can call the list "AllowedClients" or something else like that instead, which explains what is being allowed (since there might be multiple kinds of restrictions that you might wish to define based on lists). (Of course, this is a whitelist mode, and can be described as such.) However, if the program already exists, then changing the name of the setting would be even worse (unless the old name is still available as an alias, but you must be careful to ensure you do not mess up anything by doing so), since that would just break compatibility (you can change the documentation though if the existing documentation is unclear).

Also, in some contexts a word can be confusing and a different word is preferable. For example, if the list has something to do with colours, then using the terms "whitelist" and "blacklist" might be confusing and perhaps should not be used. Likewise, singular "they" is OK, but if singular they is confusing in that context then you should use different words instead.

Still, if some people do not like "whitelist", you do not have to use it, but you should consider what words to use which are appropriate and descriptive. Fortunately, sometimes they do, but unfortunately sometimes they don't because their only consideration is to be not racist, instead of to actually be good.

I also do not like the word "tonne"; I prefer "megagram" (which is more descriptive, just as easy to understand, and not confused with "ton").


Lots of people think whitelist is not fine, and decide to use another term whose meaning is perfectly clear and understandable. You don't like that term, but what would you have people use that's less divisive? Clearly "whitelist" does not work, since it was the initial source of the issues.


The example in this article could have been any kind of list.


the entire idea of having a "allow" list is itself exclusionary and divisive. why are some clients allowed and some are not? </sarc>


Lists of "Allow" and "Deny" have been around a long time, for example in Apache for probably nearly 30 years.

https://httpd.apache.org/docs/2.4/mod/mod_access_compat.html...


I don't see "allow-list" or "deny-list" anywhere there?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: