I disagree, and I think this kind of fundamentalism hurts the adoption of type safety. It's like saying that a non-machine-checkable mathematical proof is not a proof - something few working mathematicians would agree with.
In a well-formed program it is impossible to have a OneToFive that has not passed through toOneToFive. That's type safety by any reasonable definition. It's not as much type safety as you'd gain through explicitly modelling the internals, but that's a difference of degree, not kind. Sure, using `Generic` or any number of other things lets you break your rules - but so does using `unsafeCoerce` on the constructive version.
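For concreteness, the pattern being discussed is roughly this (a sketch; the names follow the article's example):

newtype OneToFive = OneToFive Int -- constructor not exported

toOneToFive :: Int -> Maybe OneToFive
toOneToFive n
  | n >= 1 && n <= 5 = Just (OneToFive n)
  | otherwise        = Nothing

fromOneToFive :: OneToFive -> Int
fromOneToFive (OneToFive n) = n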
I'd argue that newtypes provide a better cost/benefit than essentially any other language feature. The author purports to embrace the idea that the type system is a tool to be used pragmatically, but that's exactly what using a newtype does: you enforce that appropriate checks are applied to any use of any given type, but what's "appropriate" is an internal concern for that module. Whereas expecting to be able to construct every domain datatype rigorously from first principles is simply not realistic in a lot of business cases; that constructive model may well be opaque or simply not exist for the domain you're working in. (Yes, this may well mean the domain you're working in is fundamentally incoherent - but if that incoherence is present in the real process that you're modelling, then your model has a responsibility to faithfully reproduce it).
This kind of newtype use can be done compositionally - one module can be reasoned about without needing to understand the internals of the modules you're depending on - which is the essence of practical, effective programming tools. A technique that only works in a "closed world" will always be severely limited in its applications.
Read the article more carefully. I wrote in multiple places that the use of newtypes does provide real safety.
> If you are fond of newtypes, this whole argument may seem a bit troubling. It may seem like I’m implying newtypes are scarcely better than comments, albeit comments that happen to be meaningful to the typechecker. Fortunately, the situation is not quite that grim—newtypes can provide a sort of safety, just a weaker one.
> […]
> This tradeoff may not seem all that bad, and indeed, it is often a very good one! Guaranteeing invariants using constructive data modeling can, in general, be quite difficult, which often makes it impractical.
> […]
> Newtypes are useful when carefully applied
The whole point of the blog post is that newtypes are useful, but only when used in certain ways and in a weaker sense than constructive data modeling.
This is unnecessarily rude. Maybe you should have written the article more carefully, or read my comment more carefully.
> newtypes are useful, but only when used in certain ways and in a weaker sense than constructive data modeling
Constructive data modelling can be an easier way to provide certain kinds of guarantees in some circumstances, perhaps. But the claim that newtype-based approaches are not type safety remains false.
"newtypes can provide a sort of safety, just a weaker one. The primary safety benefit of newtypes is derived from abstraction boundaries. If a newtype’s constructor is not exported, it becomes opaque to other modules. The module that defines the newtype—its “home module”—can take advantage of this to create a trust boundary where internal invariants are enforced by restricting clients to a safe API."
From an outside perspective, you seem to be arguing the same things the author has already stated in the article, which is why he's asking you to read the article more carefully.
For those who may not be familiar with the Haskell ecosystem, Alexis King (lexi-lambda) writes many blogs and libraries related to Haskell.
For example, you may be interested in her recent talk on effect systems in Haskell at ZuriHac 2020,[1] based on the eff library she mainly contributes to.[2]
> "newtypes can provide a sort of safety, just a weaker one. The primary safety benefit of newtypes is derived from abstraction boundaries. If a newtype’s constructor is not exported, it becomes opaque to other modules. The module that defines the newtype—its “home module”—can take advantage of this to create a trust boundary where internal invariants are enforced by restricting clients to a safe API."
The author appears to be claiming that this is somehow distinct from (and qualitatively weaker than) "type safety", without any justification for that claim.
"To some readers, these pitfalls may seem obvious, but safety holes of this sort are remarkably common in practice. ... Proper use of this technique demands caution and care:
* All invariants must be made clear to maintainers of the trusted module...
* Every change to the trusted module must be carefully audited to ensure it does not somehow weaken the desired invariants.
* Discipline is needed to resist the temptation to add unsafe trapdoors that allow compromising the invariants if used incorrectly.
* Periodic refactoring may be needed to ensure the trusted surface area remains small...
In contrast, datatypes that are correct by construction suffer none of these problems."
The article lists a few specific pitfalls. I don't think this justifies the claim that "it is a meaningfully distinct kind of type safety": newtype-based approaches may have further pitfalls that ground-up construction approaches do not, but ground-up construction approaches do still have pitfalls.
To get more specific:
* Safety holes of this sort are remarkably uncommon in practice, in my experience.
* Modules should be small and easily understood. If maintaining a module's invariants becomes too complex then you can, and should, recursively apply the same techniques within the module, breaking it up into smaller modules. This is good development practice anyway, as is periodic refactoring. So these cautions are a lot less costly than they sound; in fact the cost may well be zero.
* You need to resist the temptation to add unsafe trapdoors in any other approach as well; this simply isn't a disadvantage that's in any way specific to using newtypes.
> * You need to resist the temptation to add unsafe trapdoors in any other approach as well; this simply isn't a disadvantage that's in any way specific to using newtypes.
In fact, in some cases, it's useful to retain (redundant or invalid) state that's discarded in "correct by construction" data structures. For example, saving application configuration as Option<int> is easier for the application to read correctly (int value, bool present). However when a user is editing an Option<T> through a GUI (checkbox, number) pair, then unchecking the checkbox will set the value to None and discard the last entered value, which is a poor user experience in my view.
This isn’t necessarily so, though. I don’t see why unchecking the checkbox can’t create an Option<T> that is passed (int value, bool false). It just means that you accept this as a valid construction state, and that it must therefore be handled.
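For instance, a rough sketch with made-up names:

data ConfigField = ConfigField { lastEntered :: Int, enabled :: Bool }

-- the application still reads it as an optional value
toOption :: ConfigField -> Maybe Int
toOption (ConfigField v True)  = Just v
toOption (ConfigField _ False) = Nothing

Unchecking the checkbox just flips enabled to False and keeps lastEntered around for when it is re-checked.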
I admit that I might be missing something fundamental to this discussion.
I meant to say "saving application configuration as Option<int> (either int value or None) is easier for the application to read correctly than a tuple (int value, bool present)."
I think the request is rather justified given your response, which is engaging with a strawman version of the thesis presented in the post. What you wrote about the virtues of newtype was explained quite well in the post itself, but you seem to want to disagree about something. You present another strawman in this reply:
> the claim that newtype-based approaches are not type safety
To use your phrasing, the claim is that newtypes do not in and of themselves provide type safety, but that "newtype-based approaches" can and do provide a weaker form of safety than constructive modeling. Further, it's important to understand what the critical additional steps are in such approaches to achieving that safety and avoid cargo-culting "newtypes make things type-safe", and to understand the ways this safety can be violated.
> What you wrote about the virtues of newtype was explained quite well in the post itself
The post implied that the newtype-based approach was somehow "not type safety" and offered significantly less safety in practice than the constructive model approach. The first of those is definitely false, and based on my own experiences I don't believe the second.
> Further, it's important to understand what the critical additional steps are in such approaches to achieving that safety and avoid cargo-culting "newtypes make things type-safe", and to understand the ways this safety can be violated.
Naively it sounds like that would be important, but I'm not convinced it actually is in practice. My experience is that even a cargo-culty use of newtypes delivers most (maybe even all) of the defect rate benefit, and that these vigorous warnings about newtypes are more likely to reduce real-world safety (because people faced with cases that they can't produce a constructive model for will be encouraged to use an alias, or no type at all, rather than a newtype) than improve it.
Again, you are conflating newtypes per se with approaches using newtype as part of an overall abstraction and interface design including constructor hiding. The post makes this quite clear and the author and others have pointed out this distinction to you multiple times here. Just because you made an inference and argued against it doesn't mean the article was making such an implication.
Your second point is certainly arguable and does represent a substantive disagreement with the post unlike your first comment. I think the post makes a solid case that gratuitous use of newtype absent any abstraction boundary is an anti-pattern that provides little benefit if conversions are done ad hoc.
Sprinkling on some newtype can certainly help some cases, but the post encourages critical thinking about these issues and explores where this reasoning falls down. If your claim is that critical thinking gets in the way of cargo-culting approaches that lead to "nominal" type safety then you would be right. Your last parenthetical doesn't seem like a reasonable response of someone who has read and understood this post. The post specifically encourages the use of newtype with abstraction boundaries where practical and works through an example.
> Again, you are conflating newtypes per se with approaches using newtype as part of an overall abstraction and interface design including constructor hiding. The post makes this quite clear and the author and others have pointed out this distinction to you multiple times here. Just because you made an inference and argued against it doesn't mean the article was making such an implication.
The author is still carefully avoiding describing any newtype-based approach as "type safety", to the point that they would mislead anyone who wasn't already familiar with the subject.
> Your last parenthetical doesn't seem like a reasonable response of someone who has read and understood this post. The post specifically encourages the use of newtype with abstraction boundaries where practical and works through an example.
It works through an example and follows that with a long list of (IMO exaggerated) weaknesses of that example, which seems more designed to dismiss it. It does not "specifically encourage" using newtypes at all (indeed it says "if you are fond of newtypes" as though one would only ever use newtypes for private emotional reasons). It concludes with "correctness by construction should be preferred whenever practical".
If you carefully reread the post you'll see that your first assertion is false. It contains this sentence: "But it is a meaningfully distinct kind of type safety from the one I highlighted a year ago, one that is far weaker." The antecedent of "one" is "type safety", which is being provided, just in a weaker form, not "not type safety" as you claim.
As for whether the example is being "dismissed" or not, I also think that's manifestly false. See: "[this tradeoff] is often a very good one!" while still diving into exactly what the tradeoff is.
I agree with the OP but for a slightly different reason.
Type safety is valuable in that it relatively easily helps achieve a goal of alerting the programmer that he/she is mixing together things that should not have been mixed. Ie. using something in a context where it should not be used.
That goal can be achieved in other ways, and type safety is not even the best way to achieve it.
Type safety is valuable in that:
* makes a lot of sense for a user of the programming language (ie. programmer) IF it is done sensibly (forget about template metaprogramming, that is not sensible). Thinking about things in terms of types is natural to our brain structure.
* warns early (ie. your program does not compile)
* is enforced for everybody working on the application (you can ignore conventions but you can't ignore a compilation error)
Now, what I don't like is that people forget that types and type safety are there to achieve the goal (making programs better, easier to read, easier to reason about, harder to make a mistake).
Thinking about type safety as if it was some kind of security mechanism to prevent any and all kinds of mistakes is IMO very misguided. I don't want to battle with my type system, I want to write working code and I want to use a type safety mechanism that lets me get there reasonably without spending a huge amount of extra effort.
So, I think, there is a sweet spot for type safety, where it is enough to help prevent most mistakes and make reading the code easier, but not so much that it reduces your productivity.
Not only that, the sweet spot will depend on what task you are working on.
For example, languages with relaxed type safety are good for rapid development -- ie. "scripts". The sweet spot there is shifted toward less type safety because your program is much smaller, you need less help remembering what the different things are, and there are probably fewer developers working on it.
On the other hand, for large projects with many people working on them, long build times, expensive testing, etc. you would very likely want a language with stronger typing (like Java, which is a very poor language but offers a very good practical type system).
Now, Rust's type safety is special IMO compared to other type safety mechanisms in that it is very hard on the user (ie. programmer), but I am still ok with this. The reason is that there the type safety is used to get so much more that would not otherwise be possible (at least not with our current knowledge).
Definitely agree the best approach is doing modelling that results in trustworthy / usable coverage checking. Especially since, for constrained ranges of values, newtype won't play great with pattern matching without some fancy pattern synonyms.
I have seen folks write code that’s like
type Name = String
rather than use a newtype, and I think newtype is really about having enforced abstraction over the representation of a datatype, to prevent silent corruption from transparently equivalent representations that have different meanings.
I guess I think it’s good to try to split this sort of opinion piece into a part that’s a forward-looking tech challenge alongside an exposition that hopefully has clear methodological guidance for today
>> I disagree, and I think this kind of fundamentalism hurts the adoption of type safety.
For what it's worth, I got a similar impression: An excessively narrow (implied) definition of "type safety", based on which newtype wrappers are criticized, while minimizing or denying the benefit of newtypes over type aliases in preventing errors matching arguments to functions. E.g. the `newtype Email = Email String` example someone mentions down-thread, which has clear benefits as soon as there's a function `sendEmail :: Email -> IO ()`. Benefits that I'd argue many people would be happy to see as part of "type safety".
You say that, on its own, a newtype is nothing more than a name and that names are not type safety, but then you give the example of using newtypes to prevent someone from adding a distance and a duration. I think it's a false distinction to say that the safety there is some kind of safety that isn't type safety.
Yes, on its own, a newtype is nothing more than a name. The safety comes from pairing a newtype with an encapsulation mechanism and a carefully-designed trust boundary.
Without an encapsulation mechanism, I do not consider using newtypes to wrap real numbers with units of measure sufficient to be called “type safety”; in my experience it still requires significant discipline to use properly (because the points of wrapping/unwrapping are usually fairly local and require delicate care).
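To illustrate the kind of care I mean, a contrived sketch:

newtype Meters  = Meters Double
newtype Seconds = Seconds Double

speed :: Meters -> Seconds -> Double
speed (Meters d) (Seconds t) = d / t

-- nothing flags wrapping the wrong raw number at the call site
oops :: Double -> Double -> Double
oops elapsed distance = speed (Meters elapsed) (Seconds distance)

Every call site that does the wrapping is a place this mistake can slip in unnoticed.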
Of course, this is a matter of both subjective definition and relative situation. One can theoretically imagine a codebase that conventionally uses units-of-measure wrappers so pervasively that the safety is genuine, since the places where values are wrapped/unwrapped are so well-defined that any misuse would stick out as wrong. However, I have never in my life seen such a codebase, so anecdotally I can only consider such measures more like the lines painted on a road to delineate lanes than a bona fide safety mechanism.
So basically, "if you don't use newtype to implement an abstract data type, you don't get any type safety because the on-spot wrapping/unwrapping is still possible".
On one hand, yes. On another, no. I am working right now on a Go codebase with lots of newtypes (well, Go's equivalent), and they're generally cast to/from underlying primitive types in basically two places: when they're serialized into JSON, and where they're deserialized from JSON. And in several places in the middle of the code where you need an explicit cast, well, the cast is explicit. You actually have to consider it.
Mind you, in Go it's almost impossible to hide the newtype's constructor unless you jump through some rather unintuitive hoops. Is Go fundamentally type-unsafe? I don't think it is, although I sometimes wish some structs were impossible to zero-initialize. Then again, that's generally amended by unexporting the struct and exporting its interface instead.
Cardelli's definitions are extremely odd; if you take them literally then Python is type safe but Java is not.
In everyday language a downcast, even a checked one, is not a type-safe operation, and so to the extent that Go's limited type system makes it impractical to write programs without downcasts I'd say that Go is fundamentally type-unsafe.
Well, it's a matter of taste, I guess. In my opinion any downcast that reliably distinguishes between valid and invalid values is type-safe. The behaviour of
defer func() {
    if e := recover(); e != nil {
        fmt.Printf("not a string %v\n", e)
    }
}()
v := interface{}(5)
u := v.(string) // the type assertion fails at runtime; the deferred recover catches the panic
fmt.Println(u)
is well-defined: it will print "not a string", always.
By that logic there's no such thing as a non-type-safe language, because all programs have behaviour.
Normally one would say a checked downcast like that is not type-safe, because you can't reason locally about the type behaviour of the downcast based on knowing the type of v. You would have to know the value of v, which is a Turing-complete problem in the general case.
> It's like saying that a non-machine-checkable mathematical proof is not a proof - something few working mathematicians would agree with.
Only because we're not ready for that yet. Given the current technology (Coq, Agda, Mizar etc.), writing a machine-checkable proof is unbelievably tedious. That doesn't mean it wouldn't be welcome; many published papers, even widely accepted ones, have later been found to contain errors. [1] [2]
> In a well-formed program it is impossible to have a OneToFive that has not passed through toOneToFive.
I think this is the key. In any (type) system, there’s an opportunity for chicanery against the compiler. The degree to which you have to treat users of that system like children depends on how likely you think they are to behave badly.
If a “well-formed program” is something you can reasonably expect in your environment, you can also reasonably assume you received a nominal type because it was validated. If it’s not, the expectation is that someone is lying to the compiler... and then you treat someone like they’re misbehaving. Even if they’re not.
The whole conceit of fancy type systems is that it's generally useful to move as many things as possible from the realm of "humans must verify" to "computers can automatically verify on behalf of humans". Relying on data representations which are richer or more constrained than newtypes, and treating with suspicion promises the computer can't automatically verify, isn't "treating your users like children," it's acknowledging that you and your users are, eventually and probably frequently, going to forget important assumptions and make mistakes.
What I mean is that in a system where you’ve defined a type that requires validation to be assigned safely, and where you’ve provided means to validate untrusted input, functions which don’t handle untrusted input should be able to use those types freely... unless you don’t trust other people assigning types in the system.
In TypeScript we sometimes define nominal types with a technique called “branding”, and it works the same way: validate untrusted input and cast; trust at layers beneath that surface area.
My reading of the OP is that these techniques (branding, nominal typing, &c) are good and useful, but it's even better if you can remove/minimize the necessity of ever trusting software developers to make reasonable decisions. Speaking from experience, even on personal projects where I am fully in control of my requirements and deadlines and the only person who has ever and will ever touch a codebase is me, I have regretted trusting software developers to make reasonable decisions w/r/t an internal (let alone external) API.
Less cynically, if you have assumptions and facts present about your data at runtime, it makes good sense to spend some extra effort encoding them explicitly in the data representation when your type system supports it, rather than allowing them to exist implicitly (or by documentation only) as a tag. Even if you discount the advantages provided in shrinking how much you are forced to trust your users, reifying your knowledge about a piece of data gives the compiler (and other static analysis tools) more information to help you write the code in the first place: The article provides some examples in terms of exhaustiveness checking, but you can also leverage such things if you have access to fancy toys like refinement types, dependent types, or a development environment that can do tactics-style interactive transformations. IMO it also makes your code easier to read and understand, but that might just be a matter of taste.
So I think that's true as far as it goes, but vastly overstates how much difference fully constructive types actually make. Starting to use types, even nominal ones, to express your requirements is a real game changer, a step change in your defect rate: it turns making a correct program from an intractable global problem into a compositional problem that you can reason about locally. Using constructive types is a technique that eliminates a couple of rare classes of errors. It's fine, and if you can do it cheaply (which is pretty rare in practice IME) then it's worth doing, but it's really not a major difference-maker. And I think a post like this could easily give the impression that newtypes aren't all that, so to speak, when nothing could be further from the truth.
>In a well-formed program it is impossible to have a OneToFive that has not passed through toOneToFive.
It does seem more likely that the program over time becomes less well formed and you end up with a OneToFive that has not passed through toOneToFive, than that a good type definition for OneToFive gets changed to be less precise and open to abuse.
Note that "seems more likely" is just my opinion of course, as I have not ever seen any statistics on this kind of degradation of program quality and must thus just rely on my own experience of how these kinds of things go.
I've been thinking for a while about something like an "assertion based type system." Rather than types being variants of other types, types can be described as other types + restrictions. Functions take variables with types but also come with a list of assertions. For example, it is sometimes useful in graphics to differentiate, in the type system, a 3D vector and a normalized vector (e.g. separate Vec3 and Norm3 types) but there is nothing you can do to actually put this into the type system. In some assertion based system, you would write something like: `type Norm3 = Vec3 v given(len(v) == 1)`. You could cast a Vec3 to a Norm3 which would run the assertion at run time, or you could write a function like `normalize` which always returns a Norm3 (maybe even it comes with its own assertion that len(v) != 0). Then you can safely pass your Norm3s to functions that expect normalized vectors. It's always recommended to make bad states unrepresentable, but it's weird how no language has ever given you a system like this.
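The closest approximation I know of without such a system is the usual smart constructor, which pushes the check to the boundary rather than attaching it to the type. A sketch, with the Vec3 helpers spelled out:

data Vec3 = Vec3 Double Double Double

len :: Vec3 -> Double
len (Vec3 x y z) = sqrt (x*x + y*y + z*z)

scale :: Double -> Vec3 -> Vec3
scale k (Vec3 x y z) = Vec3 (k*x) (k*y) (k*z)

newtype Norm3 = Norm3 Vec3 -- constructor kept private in a real module

normalize :: Vec3 -> Maybe Norm3
normalize v
  | len v == 0 = Nothing
  | otherwise  = Just (Norm3 (scale (1 / len v) v))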
As was already mentioned in another reply, what you are describing are refinement types. A refinement type system actually exists for Haskell: it’s called LiquidHaskell,[1] and though I have not used it for anything serious, it seems to work well for certain kinds of problems.
The main challenge of refinement types is that arbitrary properties are very difficult to check in general. (If that weren’t the case, software verification would be easy!) I wrote about this at length in my previous blog post,[2] and Hillel Wayne wrote about it earlier this year from another perspective.[3]
> arbitrary properties are very difficult to check in general.
To add to this:
In theory, it is not just difficult, but impossible.
Imagine that the refinement predicate is whether the String/Text encodes a (non-)halting Turing machine/Haskell program. Checking it would solve the halting problem. And this predicate is essentially the proof of Rice's theorem.[1] (Granted, this may require some sufficiently powerful logic such as first order, not sure if this proof applies to LiquidHaskell.)
In practice, I think reasonably intuitive properties are already very difficult to formalize in refinement types.
>In practice, I think reasonably intuitive properties are already very difficult to formalize in refinement types.
Could you elaborate on that? Unless you're just saying that formalizing reasonably intuitive properties is difficult in general, I don't see what you mean.
>The main challenge of refinement types is that arbitrary properties are very difficult to check in general.
While this is true, I'm not sure how constructive types don't face the very same issue. Ultimately, you will have to provide evidence for the required properties, with constructive types the proof is just (more or less) implicitly embedded into the datatype, which imo just makes dealing with these properties harder rather than easier.
In regards to the post by Hillel Wayne you've posted, I'm not sure what this statement
>This means that predicative data is easier to express at the cost of permitting representable invalid data. This is enough of a problem that we prefer constructive data whenever feasible.
is supposed to mean exactly. If your predicate makes some data invalid, it's just not representable in the refined type.
> is supposed to mean exactly. If your predicate makes some data invalid, it's just not representable in the refined type.
Author here. It's representable; the type checker has to prove it never happens in practice. This requires very different techniques from typechecking constructive types. Liquid Haskell shells out to Z3, which can return "unknown" as an answer in addition to "sat" or "unsat".
(There's actually a fourth option, which is that Z3 runs forever, but I think LH does a bunch of clever design to avoid that case.)
You can also `assume` that refinements are satisfied without proving them, in which case they can be violated at runtime. This would be impossible if the invalid data was unrepresentable.
Other approaches to refinement types, such as the original ML paper [0], implemented them constructively and didn't have this problem.
> While this is true, I'm not sure how constructive types don't face the very same issue.
Not sure if the author claims that constructive types do not face this issue.
On constructive types vs refinement types, there is a recent Reddit discussion on this topic,[1] prompted by Facebook’s use of Dependent Haskell to eliminate bugs.[2]
>Not sure if the author claims that constructive types do not face this issue.
This might be true, but when they were making the case for constructive types they didn't mention it.
>On constructive types vs refinement types, there is a recent Reddit discussion on this topic
The discussion seems to be mostly dependent types vs refinement types, which I'd argue is a false dichotomy. I'd say that dependent types are what make refinement types actually powerful in the first place. It's also how most Coq specifications are written.
> This might be true, but when they were making the case for constructive types they didn't mention it.
Agree.
> The discussions seems to be mostly dependent types vs refinement types, which I'd argue is a false dichotomy.
You are right, I think I confused two notions:
1. constructive vs non-constructive reasoning.
2. intrinsic vs extrinsic types (aka Church vs Curry types[1]).
In general, intrinsic types are constructive (because Program-is-Proof due to Brouwer–Heyting–Kolmogorov and Curry–Howard and others).
But extrinsic types could also be constructive, esp. when the external static analyzer has access to the program, such as in refinement reflection [2] (as pointed out in the above Reddit comment thread).
Then I think the author is arguing for constructive proofs, which come naturally with intrinsic types, or at least with extrinsic analyzers accessing program structures; and arguing against non-constructive (extrinsic) types, or anything detached from implementation.
I think dependent types are stronger than refinement types because they can encode types that don't necessarily have to be decidable. Refinement types are used for creating subsets of a type, while dependent types can be used for creating arbitrary types based on values. As a result, dependent types are more powerful, but they might be too much if all you need are some constraints on an existing type.
Sounds like you're describing dependent types. You've now got the problem that showing your program is correct involves writing maths proofs that show your properties hold in all cases (which is an undecidable problem).
But maybe analyzing all cases is not needed if we know (or postulate) something about the predicates? E.g. we can prove (or just assume since the case is simple) that construction of Norm3 is defined for any triple except (0, 0, 0), and that it is preserved under rotation and under reflection, but guaranteed to be broken under addition.
I suspect that if we factor out such basics, the amount of proof for a reasonable function may become manageable.
That's like what the blog post is about where you're tagging values you assume have a certain property.
> I suspect that if we factor out such basics, the amount of proof for a reasonable function may become manageable.
It's not obvious how you contain the complexity though and you don't need a lot before it becomes well beyond practical for even experienced programmers. The moment you have to write an algebraic proof where variables are multiplied together you're in the realm of nonlinear arithmetic which is undecidable for example, and writing proofs is hard.
Mainstream languages contain the proof complexity by limiting what you can express e.g. it's trivial for the computer to prove "String = String" and "Dictionary = HashTable" as part of type checking.
It's mainly concerned with runtime-checking, but the idea is to establish a standard that can be leveraged in multiple ways, from tests to (I believe) some degree of static analysis
What you are looking for are refinement type systems. LiquidHaskell [0] is the most well known refinement type system out there, to specify and verify these kind of assertions.
> Newtypes are useful when carefully applied, but their safety is not intrinsic, no more than the safety of a traffic cone is somehow contained within the plastic it’s made of. What matters is being placed in the right context—without that, newtypes are just a labeling scheme, a way of giving something a name.
What's a better alternative though?
I like this approach for tagging e.g. user input that has been checked to be free of HTML/JS injection hacks to stop you from outputting user input that hasn't been checked first (Angular does something like this with TypeScript I think). Obviously you could tag something as safe that isn't by mistake but at least the places in your code this decision is made is explicit and you can't forget to do it.
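In Haskell terms that tagging looks something like this (a sketch; names are made up):

import Data.Text (Text)

newtype SafeHtml = SafeHtml Text -- constructor not exported

-- the one explicit place where the "this has been checked" decision is made
markSafe :: Text -> SafeHtml
markSafe = SafeHtml

render :: SafeHtml -> Text
render (SafeHtml t) = t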
I'm not seeing a practical alternative that would be significantly better. If you want to be able to capture any property you can think of in the type system (like bounded integers or bounded lists), you're inevitably going to need dependent types and they're about as far from practical as you can get (where either you or the computer will then have to write difficult maths proofs before you know your code is correct).
I’m not really suggesting there needs to be an alternative. Using newtypes to achieve safety via encapsulation is fine, good even, and I say as such at several points in the article.[1] The point is twofold:
1. A lot of uses of newtypes in the wild are “safety theater” and provide zero actual safety benefit.
2. The uses of newtypes that do add safety provide it in a weaker sense than correct-by-construction data modeling.
The point is not that newtypes are always useless. In fact they are often by far the most practical tool for the job. The blog post just advocates being conscious and considerate of the limitations of the techniques.
In Haskell, you would use an algebraic data type that can only hold legal values. You would typically only use a newtype when you're just giving a new name to the same type.
Having a Magnitude newtype over Int doesn't change the range of permissible values but does prevent using a Length where a Magnitude was expected, for example.
In all typed languages you'll eventually see a function with multiple arguments of the same type and have to check the documentation (or worse, implementation) to know what each is for. Having a zero cost type alias to make it clear is an improvement and IMO is a form of type safety.
Having a data type that can hold illegal values is a bad idea. Using an Int for Length implies you know what to do with negative lengths, for example. A newtype won't help there.
How do you write a data type which can only hold legal email addresses? Or URLs? ZIP codes? Or even dates and times? Expressing all of the requirements of these real-world data formats, within a type system such as Haskell's, seems like an extremely daunting task to me. I have programmed in Haskell a fair bit in the past and I have no idea where I'd begin with those.
You're combining acceptable structural representations with validated as correct inputs - this is a trap to be careful of.
It's real-world data, which is rarely 100% clean and correct. If you can only represent clean and correct data, you probably can't handle all the data you'll need to.
For an email, a "newtype Email = Text" is, IMO, correct. You can break it down as far as you need though: "data Email = Email (Maybe Name) LocalPart Domain" eg, with the three sub-types being newtypes of text. If you try to go further you'll run into problems that the best definition of "valid" is "works when used."
The situation for URLs is similar.
ZIP codes is an even trickier one - if you can only represent genuine ZIP codes, what happens if you need to represent an address where someone's accidentally transposed two digits and wound up with an unused ZIP code? If you're not US domestic only, what do you do for the four billion or so humans who don't have a structured address at all?
(There's a reason vCard gave up on forcing structured addresses and added a "just whatever this text chunk here says" alternative).
If you want to represent "validated version" data then IMO it's sufficient to use a "data ValidatedEmail = ValidatedEmail Email" with a non-exported constructor and a "validateEmail :: Email -> Maybe ValidatedEmail" function.
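Putting those two layers together, roughly (with a placeholder check standing in for real validation):

import           Data.Text (Text)
import qualified Data.Text as Text

newtype Email = Email Text

newtype ValidatedEmail = ValidatedEmail Email -- constructor not exported

validateEmail :: Email -> Maybe ValidatedEmail
validateEmail e@(Email t)
  | Text.isInfixOf (Text.pack "@") t = Just (ValidatedEmail e) -- placeholder for a real check
  | otherwise                        = Nothing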
If you're, say, the USPS and really must represent valid ZIP codes and only valid ZIP codes, you'll perhaps wind up in the rabbit hole of defining a hierarchical hot mess of sum types. Or pragmatically accept you will risk invalid ZIP codes to avoid needing to do that.
Practically, I disagree with the GP. A newtype is fine in these cases. You don't have to say you know what to do with negative Lengths if you have a smart constructor that won't let you create one.
Dependent types don't provide any safety over smart constructors for parsing, which is the most common use case for types with restricted values. Validating user input, parsing emails or URLs, etc. all must deal with an invalid case at runtime, and dependent types are of no use there.
When I say range types I mean a type like '5..23' - a sub-range of some other type. Haskell doesn't have these. Most languages don't - one example that does is Ada:
As you can see from that link, it's bounds checked at runtime. You could have a modulo type that wraps on overflow, which is of course what most numeric types are in most programming languages, including Haskell, but the modulus is fixed at some power of two for obvious reasons.
There's only one thing I truly value out of a good type (or static analysis) system, and that's enabling fearless refactoring. I want the type system to pick up on any downstream consequences of a change I make rather than finding out by phone call at 3am the day after a deploy that some edge case got overlooked.
Dependent types can (in theory) give that over smart constructors. If my parser should leave me with, say, a sorted vector, or an assertion that if one field is some value then some other field is Just something, then I'd like to be able to rely on those facts in downstream code making use of the results of the parser. If it's just a smart constructor as we usually write them, a later refactor (sorting on a different key, for example) doesn't change anything that any other bit of code can be verified against.
A dependent type carrying a proof that a vector is sorted based on some criteria and an assertion about the existence of some value can be verified at compile time.
I'm not sure what the status is of dependent types for Haskell. You could do similar propositional assertions now with Ghosts of Departed Proofs, and LiquidHaskell exists for these sorts of assertions, so dependent types aren't the only path to this kind of static safety, either. I won't claim that this is practical - though I would be surprised if it's not practical to do in small doses.
Nevertheless, I would still just be reaching for smart constructors in current day Haskell.
What dependent types would protect you against would be the parser itself returning data. I think it'd be quite difficult to do this for the case of an email or URL, but for cases like "length values must be non-negative," a parser/validator with a type like
parseLength :: JSON -> Either Error (Sigma (l : Length). lengthToDouble l >= 0.0)
is reasonable, and I'm sure somewhere in the literature there's an indexed monad or something for building parsers/validators out of these, that keeps track of the properties while combining parsers like these.
You have a constructor function which does the input check, and it lives in a module that only exports the smart constructor, not the type constructor. It doesn’t matter if another module deconstructs the type, since the input was already checked.
For example:
newtype Username = Username Text
mkUsername :: Text -> Either Text Username
mkUsername t = if t == "" then Left "Username cannot be blank" else Right (Username t)
If you don’t export the constructor for Username, and only export mkUsername, it doesn’t matter if someone deconstructs the Text from Username, because it must already be validated.
The entire point of a good type system is to reduce the need for extra cognitive diligence around your data/code. The more corner cases you have to be careful around, the less helpful the type system is.
I've gotten bit by the Generic one in real code before... I'm sure if I used Haskell for RESTful web stuff more, I'd eventually get bit by something similar for Aeson too.
Applying the author's "parse, don't validate" concept, I think you'd parse the user input into a "safe HTML" type with a definition that excluded the language features that enabled those attacks. It might not be worth writing such code for a single project, but I could get behind a library that did that.
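Something like this, say (a sketch with a made-up whitelist of elements):

{-# LANGUAGE OverloadedStrings #-}
import           Data.Text (Text)
import qualified Data.Text as Text

-- only whitelisted structure is representable; raw text is always escaped on render
data SafeHtml
  = TextNode Text
  | Paragraph [SafeHtml]
  | Emphasis  [SafeHtml]

escape :: Text -> Text
escape = Text.concatMap escapeChar
  where
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '&' = "&amp;"
    escapeChar c   = Text.singleton c

render :: SafeHtml -> Text
render (TextNode t)   = escape t
render (Paragraph cs) = "<p>" <> foldMap render cs <> "</p>"
render (Emphasis cs)  = "<em>" <> foldMap render cs <> "</em>"

A parser that can only produce this type has no way to hand back script tags or attributes it doesn't know about.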
This is an approach I've used in C#/Java - you have 'UnvalidatedFoo' and 'ValidatedFoo' types, UnvalidatedFoo has a public constructor, ValidatedFoo has a private constructor.
The only way to acquire a ValidatedFoo is to pass an UnvalidatedFoo into ValidateFoo(). The outer layer of the code deals in UnvalidatedFoos, but past that point you must have a ValidatedFoo to continue.
An extremely good example I've found for this is canonicalization in all forms. Frequently there are scenarios where there are many ways to represent something, but only one official Right Way.
I'm not sure I see how "parse, don't validate" would apply when using a newtype—would you mind explaining further?
I think the main point of this article is that newtypes aren't an application of "parse, don't validate". If you validate that user input doesn't contain XSS, then wrap it in a `SafeHtml` newtype, that's validation, not parsing. The article outlines ways that you could wrap the user input but forget to validate.
By contrast, suppose you have a type like `data SafeHtml = TextNode Text | DivNode SafeHtml | ...`. You parse the user input into a value of this type. If the parse fails, it's because the user input contained something unsafe. The article contends that it's harder to accidentally wrap an unsafe string, since you have to parse a safe HTML string to get a value of type `SafeHtml`.
(Although I'm not sure I buy that last point. With both methods you'd want to keep the `SafeHtml` type's constructors in a module and not export them. Otherwise you could wrap arbitrary text in the newtype or in the `TextNode` constructor. Both seem equally possible to me.)
It seems to me that there is no practical alternative. The impractical alternative is to extend the language so that its type system implements the feature you want. (For example, adding bounded integers.)
I don't see how that helps, though, because there could be bugs in your language extension, and it has to be maintained. It's just going about it the hard way. It would be sort of like avoiding "unsafe" in Rust by extending the compiler instead.
There's a lot to be gained by isolating the place where the validation needs to be done, so the source of any bugs is confined to one place, though. Would you rather have one bit of code that checks 1<=X<=5, or 100 call sites all doing the same thing slightly differently?
So in the example where you're constructing HTML, you would never use strings to represent your HTML. So your template might look something like
```
P(Text(someUserText), I(someItalicized))
```
This is essentially how React.js works; you don't need to tag strings as safe or unsafe because strings are never treated as HTML, only constructed HTML values (via JSX) are treated as HTML.
Maybe you can solve it like this: internally represent the document as DOM objects. Have element nodes contain a list of children which can be either raw strings (representing text nodes), or other element node objects. The render function of element nodes would know to escape the contents of text nodes before concatenating them with their rendered element node siblings. It would be type safe since it knows all the types of children it can contain.
It's like SQL parameterized queries: delay the concatenation until the last possible moment to reduce the area where errors can occur.
But I think maybe that just moves the opportunity for error to the render function, so I don't know if that is really an improvement on "smart constructors".
Thinking about safety and security can benefit from a threat model: what sort of things do we expect to happen, and are we trying to prevent?
I think of safe handling of user input as "first order safety". Server-side code handling arbitrary input from the network should use an adversarial model: assume that users will exploit any weakness we expose, no matter how convoluted.
Program/library architecture and design is more like "higher order safety": we're not directly preventing attacks, we're preventing code-which-allows-attacks. I don't follow an adversarial approach here: I'm not trying to stop a rogue developer from inserting backdoors into a project, I'm trying to catch and prevent mistakes from being made.
If there's a correct-by-construction way to approach certain problems then I'll gladly use it (e.g. one of my current projects uses NonEmptyString, NonEmptySet and NonEmptyList, which use a (head, tail) pair to guarantee non-emptiness). If that's not practical, but there is a way to encapsulate problematic code using newtypes and modularity, then I'll take that (e.g. this project has a NoSpace newtype for strings without whitespace). If it's not practical to encapsulate problematic code, then I'd still rather make it obvious when something dodgy is happening, e.g. even if a 'StatusCode' is isomorphic to a String, the minor cost of wrapping it up in a newtype is worth it to prevent obvious mistakes like appending semantically distinct values. It's all about bang-for-buck.
I also agree that type synonyms usually aren't worth it. The extra cost of introducing a newtype isn't much, and it buys us a lot more in terms of ruling out problems, making conversions more explicit, better error messages, etc.
(The following comment may not have anything to do with Haskell’s use of the word “name,” but I’ve wondered about this for a long time.)
Forcing someone to write iFoo instead of foo isn’t type safety, but it’s a certain kind of safety that I haven’t seen clearly spelled out. Actually I recall Joel on Software mentioning it once.
I don’t like it, but it’s always seemed interesting. Another example is in lisp:
*current-input-port*
to indicate that a variable is dynamic, you can end it with an asterisk. But this is only a convention. I’ve always felt that LET should automatically do dynamic binding for any name that ends with an asterisk. But then of course that runs into surprising corner cases when the user wants to e.g. write foo-+ foo-/ foo-* foo--
You could argue that names aren’t type safety, but I would disagree that they aren’t any kind of safety. A good naming convention is one of the safest ways to program in an otherwise hostile domain. But that raises the question: what kind of safety is it? I’m not sure it even has a name, other than Hungarian notation, which is both imprecise and not sufficiently general for the concept.
> Forcing someone to write iFoo instead of foo isn’t type safety, but it’s a certain kind of safety
The term I'd use (at least off the top of my head) is hygiene. The article uses the word safety in a restrictive sense, ie. preventing errors. Forcing iFoo instead of foo, to me, is more about hygiene, where one could still potentially make an (accidental or deliberate) error but that would go against said hygiene. Maybe another way to put it is that safety prevents misuse while hygiene guides correct use. Or another is that with safety the letter and spirit of the law are one and the same while with hygiene they are not.
Haskell's `newtype` is an amazing tool for (proactive) hygiene and, as the article says, a limited tool for safety. By proactive hygiene I mean that if you have:
bmi :: Float -> Float -> Float
Looking at it you don't know what those params are. They are all just `Float`s. There's no vocabulary to them. Meanwhile here:
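newtype Height = Height Float
newtype Weight = Weight Float

bmi :: Height -> Weight -> Float -- (sketch of the newtype version)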
It is both clear what they are, and the compiler will prevent you from passing `Weight` as the 1st param. Now it's not safety, as you can still wrap the wrong `Float` as `Height`, but that would go against the hygiene.
I do not see the problem with newtype, but rather with fromOneToFive and toList. As soon as you use them, you are left with an Int and a List. So why complain safety is lost when it has been intentionally lost?
If the compiler does not provide safe ways of working with newtypes, the programmer indeed has to provide them in the form of methods, basically just as shown in the article.
In Pascal you can easily create such a type OneToFive = 1..5
It should inherit all the normal arithmetic, but does not allow values outside the range
However, these kind of types are causing a lot of problems.
You do arithmetic on them, get a value outside of the range, and then it throws an exception. And an unhandled exception crashes the program, which is often worse than an invalid value.
And it is slow when it has to check if the value is in the range after every arithmetic operation, so you can disable the range checking to make it faster. But then you get values outside the range without an error
The language also assumes everything can be initialized to zero. It just calls "memset" on the allocated memory, so it can be initialized quickly. So you often get values that are zero whatever the range is
And you cannot even check for that, because the compiler knows what values are valid. So if you want to check if the variable has a valid value and write (value >= 1 and value <= 5), the compiler knows it is supposed to be always true, and optimizes it away.
>You do arithmetic on them, get a value outside of the range, and then it throws an exception. And an unhandled exception crashes the program, which is often worse than an invalid value.
IMO an exception (failing fast) is always better than an invalid value.
Exceptions like these also excel at finding dangerous edge cases when paired with a decent test suite.
> And an unhandled exception crashes the program, which is often worse than an invalid value.
Please elaborate. Why is working with garbage better than not working at all? Would it not also rather make it easier to find these out-of-range cases with fuzzing?
You have 1% of code dealing with user data, and 99% of code doing other things
For example, a text editor. If it crashes, you lose all your text since the last save, which is very bad. But the editor also does other things like playing animations when you click somewhere, a button lowering, or showing a rotating hourglass as the mouse cursor. If it shows a wrong animation, that is completely irrelevant. Or it has syntax highlighting with a range type for possible user-defined colors. If it shows the wrong color, that is no big deal. But if you have a new version of the editor, adding more colors, and you use those colors in the user's settings, you cannot downgrade to an older version, because it would crash when it does not know the new colors.
I just had an actual problem with such an exception when calling GetFileAttributes. It returns INVALID_FILE_ATTRIBUTES (unsigned 0xFFFFFFFF) on failure. But I was calling a wrapper around it, which casts it to a signed value and then return -1. Storing the return value in an unsigned variable and then comparing it to 0xFFFFFFFF gave me a range check error. But compiled with range checking disabled, it works perfectly fine.
Unless it's in that one percent of code that deals with user data, in which case it might be much better to just crash. Depending on what the GUI application is for, giving incorrect values could be very bad.
I would think you could catch all exceptions at the top level, save a backup of whatever data the user has loaded, and then continue to crash normally.
A problem that I see with these approaches (e.g. 1..5) is that the possible values are often not set in stone. This means that by using range constraints or similar, there is often an overspecification that can hurt down the road.
Furthermore, the possible values differ from location to location as the values travel through the program. In that sense, the possible values are often underspecified / underconstrained, even though the specifications are already not simple.
This is why I prefer to go with rather little constraints - in a compiled language, you have to specify the physical layout (8/16/32/64 bits?), and I try to specify not more than that in most cases.
Then, so much safety can be achieved by placing asserts in strategic locations.
It should be noted that the -- to use the terminology of the article -- extrinsically safe example module Data.List.NonEmpty.Newtype can be converted into an intrinsically safe one by changing the definition of the NonEmpty type from
newtype NonEmpty a = NonEmpty [a]
to:
newtype NonEmpty a = NonEmpty (a, [a])
(and changing the associated functions accordingly -- which can now be implemented without the use of error).
EDIT: Upon further thought, I'm not sure the above distinction makes sense. The implementation of e.g. Data.List.NonEmpty.Newtype.head would still involve the use of "error" through the use of Data.List.head:
head :: NonEmpty a -> a
head (NonEmpty (x, [])) = x
head (NonEmpty (_, lst)) = Data.List.head lst
That's the implementation used in previous posts, like 'parse, don't validate'. This post is talking about using newtypes instead (which seems like a worse choice in the case of NonEmpty, but may be more pragmatic when a correct-by-construction approach would be infeasable)
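That is, essentially:

data NonEmpty a = a :| [a]

head :: NonEmpty a -> a
head (x :| _) = x -- total; no error case needed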
The idea of needing a 1-5 compile-time constrained value is rare in any application I have worked with, because either you'd use an enum, or the "5" isn't really known at compile time. (I thought about this when doing some basic Idris lessons.)
It makes me wonder what the constrained type might be that I do need?
A non-negative floating point number would be a common one I think, e.g. for a coordinate. I've not been keeping up with Haskell but can you guarantee this at compile time?
I guess it would be hard to do so, because every assignment to this from a normal floating point would need to be checked, requiring you "prove" that the number is non-negative at compile time, so back to Idris I guess?
E.g. x:Positive, x+5 -> this is now a normal float, or I need to prove x+5:Positive.
I have only haskelled a little bit, but I would think most newtypes are there to trace the meaning of something. E.g. Latitude, Longitude, X, Y, WeightKg, and so on. These would be hard to constrain (and to what..?); they are just floating points.
> The idea of needing a 1-5 compile-time constrained value is rare in any application I have worked with, because either you'd use an enum, or the "5" isn't really known at compile time.
The OneToFive-type is used to illustrate a point, not because it’s practically useful.
> A non-negative floating point number would be a common one I think, e.g. for a coordinate. I've not been keeping up with Haskell but can you guarantee this at compile time?
No. It’s not possible.
But, as you point out, this is not a problem since values like these are rarely known at compile-time.
And, more importantly, if you can’t trust a programmer to write the correct constant, how can you trust the same programmer to write the correct type for this constant (“non-negative float”)?
Except by using a quasi-quoter (which amounts to parsing a string into a float at compile-time)
This is an excellent article that clearly outlines a conversation I had with a co-worker last week; Type systems are usually only as helpful as the developer allows them to be.
I believe that Ada, or perhaps a specific subset with some extensions, achieves most of these goals. You can definitely restrict types to be a range or enumeration of e.g. ints, and they have some facility for contract based function interfaces.
I don't know much about Haskell, but I was struck by this:
> "Suppose we want a type for “an integer between 1 and 5, inclusive.” The natural constructive modeling would be an enumeration with five cases"
Does that really make sense? Using an enum for a limited integer type? I realise most languages don't actually support true limited integer types, but if I recall correctly from my introductory programming classes 28 years ago, Modula2 does allow you to specify a number type with a range of acceptable values. I don't think I've seen that in any other language, but it sounds like an incredibly useful feature if you want type safety.
Range types in the Pascal family, unless I'm mistaken, are not checked for legitimate values at compile time - they just crash at runtime if you try to create one with an invalid value. Syntactic sugar for an assert statement in the constructor. It's usually quick and straightforward to replicate the feature in a modern language with generics and first-class functions.
When people refer to 'type safety', they almost always mean compile-time safety. Not just ensuring that a value of 6 cannot exists, but ensuring that none of your code is trying to create one. The unwieldy enumeration allows that.
The real issue is the lack of dependent types in Haskell (and unfortunately Rust and all languages with a vast ecosystem).
With dependent types, you can just include a type-level proof that x <= 5 in the integer wrapper type, which then makes it possible to statically check that the default case in the match is impossible.
A newtype including such a dependently-typed proof is the correct solution to this problem, since it can be generalized to any range, unlike the enum with 5 variants.
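In today's GHC you can approximate part of that with a type-level bound via DataKinds/GADTs. It's not full dependent types, but it generalizes the same way (a sketch):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- Fin n: the naturals strictly below n, so the bound is part of the type
data Fin (n :: Nat) where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

type IndexBelowFive = Fin ('S ('S ('S ('S ('S 'Z))))) -- exactly five values

Matching on a Fin value still gets coverage checking, without hard-coding the five cases by hand.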
The Rust standard library has many examples of exactly this, except with unsafe to assert that the unreachable cases can't happen. A classic example is the String type, which is just a newtype around Vec<u8>. Other examples include CStr, NonZeroUsize and PathBuf.
Arguably even File is an example of this. Internally, it's just an integer for the file descriptor.
Exactly what? I'm not sure what you're referring to here exactly.
Generally newtypes in Rust are not used excessively and largely for sensible safety assertions. I don't see people making `EmailAddress(String)` newtypes.
Of newtypes. I gave several examples of them. The String type _is_ a newtype that outlaws some of the possible values of the underlying type, namely non-utf8 byte sequences. I wasn't talking about people using Rust, but about the standard library.
That said, I have personally used newtypes a reasonable amount, mostly for making sure I don't mess up different kinds of database ids.
It is not. `NonZeroUsize` is (though in Rust "newtype" is only a pattern or idiom), and its definition (inside of a macro which provides numeric newtypes a bunch of `impl`s) is:
pub struct NonZeroUsize(usize);
But here's String:
pub struct String {
    vec: Vec<u8>,
}
…which is a normal `struct` _containing_ a named `Vec<u8>`. Of the types you mentioned, only `NonZeroUsize` conforms to the pattern.
The point of a newtype is that it allows you to create a wrapper around some other type for one of the following two purposes:
1. Marking the kind of data to prevent mixing up values of different kinds. (e.g. Email)
2. Outlawing some of the values of the underlying type. (e.g. OneToFive)
I do not think it being a tuple-struct or not is important at all when deciding whether something uses the newtype pattern, as long as it has a single field.
Do you feel that just because the inner value is named, rather than unnamed, it’s no longer a new type? Trying to figure out what line is crossed here.
Hehe yeah, I mean, I am not 100% sure myself, which is why I asked.
There's no real difference between a 1-tuple struct and the regular struct, other than the name. So to me, it feels like either are both newtypes. But, I guess I could see some sort of argument the other way too.
In Haskell, a newtype can be a single-field record with a named field, so from a Haskell perspective, I would definitely consider both of them “newtypes.”
The article doesn't exactly make a case against it. It even explicitly says it's useful! It just says making a newtype isn't as strong a guarantee as a proper type.
Newtypes do work to assert certain checks have already been done and need not be repeated.
One note: instead of using module boundaries to limit how a value can be used, you can use existential types instead. This is essentially equivalent to how encapsulation works in OOP languages.
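For example, a minimal sketch of that encoding:

{-# LANGUAGE ExistentialQuantification #-}

-- the representation type s is hidden; clients can only use the packed-up operations
data Counter = forall s. Counter s (s -> s) (s -> Int)

newCounter :: Counter
newCounter = Counter (0 :: Int) (+ 1) id

tick :: Counter -> Counter
tick (Counter s step view) = Counter (step s) step view

value :: Counter -> Int
value (Counter s _ view) = view s

Nothing outside the definition can depend on the fact that s happens to be Int.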