Fascination with type systems does not seem to be all that useful in practice. Go has a minimal type system, and is able to do much of Google's internal server side work.
Most of the problems that cause non-trivial bugs come from invariant violations. At point A, there's some assumption, and way over there at point B, that assumption is violated. That's an invariant violation.
Type systems prevent some invariant violations. Because that works, there are ongoing attempts to extend type systems to prevent still more invariant violations. That creates another layer of confusing abstraction. Some invariants are not well represented as types, and trying makes for a bad fit. What you're really trying to do is to substitute manual specification of attributes for global analysis.
The Rust borrow checker is an invariant enforcer. It explicitly does automatic global analysis, and reports explicitly that what's going on at point B is inconsistent with what point A needs.
This is real progress in programming language design, and is Rust's main contribution.
That's the direction to go. Other things might be dealt with by global analysis. Deadlock detection is a good example: if P is locked before Q on one path, P must be locked before Q on all paths. There must be no path that leads to P being locked twice. That sort of thing. Rust has a related problem with borrows of reference-counted items, which are checked at run time and work a lot like locks.
Those potentially have a double-borrow problem related to program flow. I've heard that someone is working on that for Rust.
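(For anyone who hasn't hit it, a minimal sketch of that run-time double borrow: `RefCell` defers Rust's aliasing check to run time, so this compiles fine and then panics, much like taking the same non-reentrant lock twice.)

    use std::cell::RefCell;

    fn main() {
        let cell = RefCell::new(vec![1, 2, 3]);
        let _first = cell.borrow_mut();  // first dynamic borrow succeeds
        let _second = cell.borrow_mut(); // panics at run time: already borrowed
    }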
> Fascination with type systems does not seem to be all that useful in practice.
> ...
> The Rust borrow checker is an invariant enforcer. [...] This is real progress in programming language design, and is Rust's main contribution.
I'm so confused by your stance here. You essentially say "type systems are not useful" and then "oh but this most recent advance in type systems — that one is useful." Do you find type systems useful or not?
There are a lot of properties we can analyze statically, and practically all of them essentially amount to extensions of type systems. Any of them increases our ability to rule out undesirable programs from ever beginning execution. Some of them have unintuitive syntax, but many of them are no more syntactically burdensome than most other type systems. This is especially true if you consider how far we've come with type inference, so we no longer have to write code with the verbosity of Java just to get some meager guarantees. It's still a very active area of research, but we're clearly making progress in useful ways (which you even highlight), so I don't really know what point it is you've set out to make.
They appear to think that the borrow checker isn't achieved with type theory, but with some other technique ("global analysis").
Although, to be fair, my understanding of making a practical affine type checker is that things get kind of wonky if you do it purely logically. So practically you do some data flow analysis. Which is, I believe, what Rust is doing. This also explains why MIR was such a big deal for certain issues with borrow checking. They ended up with a format that was easier to run a data flow analysis on, and that allowed the borrow checker to handle things like non-lexical lifetimes, etc.
[I've only read about such things. So I might have mis-remembered some of the details. However, this is my take on why someone might not call Rust's advances purely type theoretic (even if they can be handwaved as type theory at a high level).]
Also, the borrow checker doesn't do global analysis. Not doing so is an important design decision in Rust, and is part of the reason why we don't infer type signatures, including lifetimes generally. You want to keep things easy to compute.
That said, you're right that the current borrow checker ("non-lexical lifetimes") is built off of a control flow analysis, absolutely. But it still operates only within bodies, not globally.
>They appear to think that the borrow checker isn't achieved with type theory, but with some other technique ("global analysis").
No, I think they appear to think that the borrow checker is not about the programmer stuffing the program with type theory annotations for this purpose, but the borrow checker handles the analysis automatically across the program.
Of course it doesn't handle it perfectly, so there's some "fighting the borrow checker" and lifetime annotations are sometimes needed - but it's not like designing the whole of your program based on type theory - the same way you're somewhat free from thinking about freeing memory in a GC language.
> I'm so confused by your stance here. You essentially say "type systems are not useful" and then "oh but this most recent advance in type systems — that one is useful." Do you find type systems useful or not?
Not the original author, but it seems like they're saying that type systems are non-specific invariant enforcers and so have costs without necessarily having benefits, whereas a user-specifiable invariant enforcer is more likely to actually deliver the benefits.
It is probably pretty presumptuous to assume, but I think that a lot of programmers that have only ever been exposed to C/C++/C#, Java and Python have basically no concept of what a good type system can do for them.
Two examples from the top of my head:
1. Encoding matrix sizes into the data- and function-types, so that you can safely have a function `mat[a,c] mat_mult(mat[a,b] x, mat[b,c] y)` or even `mat[w-2,h-2] convolve(mat[w,h] input, mat[3,3] kernel)` and have the compiler check that you never use a matrix of the wrong size (a compile-checkable sketch follows below).
2. Actually checking the correctness of your implementation.
There is a very nice online demo of Liquid Haskell [1], where they define the properties of ordered lists (each element has to be smaller than or equal to the one after it, line 119). Then they define a function that takes an (unordered) list and spits out an ordered one by applying a simple quicksort.
Now, if you break the algorithm (e.g. flip the < in line 193) and run Check, the compiler will tell you that you messed up your sorting algorithm.
Pretty neat.
edit: I just realized that LiquidHaskell is almost 10 years old. Sad to see that basically nothing made it into "production".
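A sketch of example 1 in stable Rust using const generics (`Mat` and `mat_mult` are made-up names, not from any library): the multiplication shapes are checked at compile time. The `convolve`-style arithmetic on dimensions (`w-2`, `h-2`) would additionally need the unstable `generic_const_exprs` feature, so it's left out.

    // Dimensions live in the type; a shape mismatch is a compile error.
    struct Mat<const R: usize, const C: usize>([[f64; C]; R]);

    fn mat_mult<const A: usize, const B: usize, const C: usize>(
        x: &Mat<A, B>,
        y: &Mat<B, C>,
    ) -> Mat<A, C> {
        let mut out = Mat([[0.0; C]; A]);
        for i in 0..A {
            for j in 0..C {
                for k in 0..B {
                    out.0[i][j] += x.0[i][k] * y.0[k][j];
                }
            }
        }
        out
    }

    fn main() {
        let m: Mat<2, 3> = Mat([[0.5; 3]; 2]);
        let n: Mat<3, 4> = Mat([[2.0; 4]; 3]);
        let _p: Mat<2, 4> = mat_mult(&m, &n); // (2x3) * (3x4) = (2x4): ok
        // let _q = mat_mult(&n, &m);         // shape mismatch: compile error
    }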
Regarding example one: it works for signal processing or graphics, but compile-time dimensions are unusable in machine learning or numerical computing, because they add too much friction to serialization/deserialization, and some operations that reduce dimension or rank depend on runtime data (for example, when some dimensions are 1).
Sure, but this approach generalizes quite well. Especially in ML you have a lot of matrices, many of them of known size (e.g. convolution kernels).
Also, while it looks like the matrix sizes have to be known at compile time, this is not the case. You are still free to use the same matrix types with dynamic sizes (or even mix them, useful for said convolutions).
In Haskell that is achieved by using a "KnownNat": basically you elevate an integer from the value level to the type level at run time.
I can see this kinda. It would be interesting to experiment with how black and white this is.
Historically, most languages have offered either compile-time (static) or run-time (dynamic) type checking. And left to choose between one or the other, with experiences like the above, people make their binary choice.
More and more in my Python code, I add what type annotations I can. My feeling is that the annotation-coverage ROI is non-linear. I get a lot of mileage out of the types I can add easily. When it gets complicated (interesting unions or deep containment) it gets harder and harder. Enough so that sometimes it influences my design decisions just to make the typing job easier.
I’m left to wonder how this scales as the code base/team size scales. If the pain of 0 types is 100 in a large project, and 100% types cuts it to 10, what happens if we all do the easier 80% annotations. Is the pain 80% less too? Or is my personal experience mirrored and it actually is quite a bit better than just 80% reduction?
> either compile time (static) or run time (dynamic) type checking
But it is not that black and white, is it? Python is actually somewhat static in that it checks (some) types during runtime. Other dynamically typed languages live completely by the "when it quacks like a duck" playbook.
On the other hand, Haskell is completely statically typed. Still you can write many programs without annotating any types at all, as the compiler is pretty good at inferring types from their context.
Types are especially useful when refactoring, or as a documentation tool.
However, without a type checker, choosing your names wisely will get you a long way.
I think anyone advocating type systems should spend a year working in a dynamic language, to get out of their echo chamber and form a more objective opinion.
I was a mild advocate of static type systems. Then I spent about two years working on dynamic languages, mostly Python, on two jobs with otherwise very different code bases. I am now fully converted: after that experience, I strongly, strongly, strongly advocate static typing, so much that I will leave any job that requires me to use languages that lack them. I don't know if I was just unlucky, but I found that experience to be dreadful. Aside from the million small friction points, I remember things like needing to allocate a full week for a refactor that would have taken less than one hour (and that's a pessimistic estimate) with the standard tools of a static language's IDE.
I really don't see the advantages of dynamic languages. Supposedly you develop faster, but in my experience that's absolutely not true.
For me it all depends on the size of the project. For personal projects below a certain size dynamic typing is 50x faster than static. But that benefit drops off sharply once the project gets over a certain size or once there are more than a couple of people involved.
That's why I liked JavaScript -> TypeScript. I can bang out a prototype of some small piece in JavaScript and then add the types once I'm sure it worked. I save tons of time getting to MVP / proof of concept first and then filling in the constraints later.
Yeah, you have to work so hard to write maintainable code in "dynamic" languages. It's also super hard to debug them. It's easier to write the first thing that runs, but that's not enough.
What you appear to be saying is that people who like type systems must be ignorant because with experience you suspect they would think differently.
This seems to me to be an extremely uncharitable point of view.
But let's roll with it.
I advocate type systems.
I've also worked in several non-trivial projects in Lua. Several non-trivial projects in Python. Several trivial projects in Common Lisp. Several trivial projects in Erlang.
Additionally, I've worked in a non-trivial project in Ruby for several months. And one non-trivial project in Node for a year. Both of these in a professional 8-hours-a-day capacity.
I still advocate type systems. More so after working in dynamic languages.
"Type system" is being used here to mean "static type checking" while CLisp has some facilities for this they are implementation-specified and not commonly used. (Even its dynamic typing checking is not commonly used.)
There are type hints that you can give the compiler, but I believe that's only useful for generating faster assembly. Also you can get compile time warnings from macros, but that's much closer in nature to getting a parse error (something pretty much every language does that I'm aware of static or dynamically typed).
I wouldn't be surprised to learn that there exists a common lisp typing extension that someone made with macros (after all racket has something like that iirc), however if it exists I didn't use it regardless.
I’m not sure this is true. John Brant and Don Roberts did some of the original refactoring work in Smalltalk under Ralph Johnson at UIUC.
I remember sitting with them at OOPSLA and discussing just this at length. Java was in full momentum ascendancy. Eclipse was the new will-solve-everyones-problems IDE. There was a LOT of interest in doing for typed Java in Eclipse what John and Don had both done in Smalltalk during their advanced degrees. They too thought typing would make it easier.
And what they shared with me is that it actually made it harder. Type systems are complicated to model. It gets combinatorially complex to model transformations across all the edge cases.
It may be argued that this was/is more a statement about the Java type system in particular, which is not generally extolled anyway.
But the basic take away was that refactoring becomes easier and easier as the language syntax and model gets simpler. A more complicated language like Ruby, without types, was also harder.
Or put another way… refactoring engines work with AST models. The more complicated your AST, the harder refactoring gets. I think type systems generally add, rather than remove, complexity from language models.
But perhaps there is a way to make it so. Just sharing some thoughts of yore from some guys that spent a lot of time studying refactoring.
That's exactly what I did, because I felt that I was just too sure that typed languages were superior to untyped languages without actually having proper experience.
I learnt Clojure, to the point that it's now my favourite language and the one I use for many things. Nevertheless, I still think that typed languages have two advantages that make them a more practical option for most situations:
- It's easier to evolve your code. This includes refactor but also normal program evolution through aggregation.
- It allows you to express abstractions in pure code, rather than in documents or comments.
The first advantage requires little explanation. The second I think is more difficult to appreciate without enough experience with typed systems and an interface oriented programming style.
How important these two things are for a given person is difficult to say. I have come to the conclusion that it relates a lot to how your brain works. Unfortunately, mine can only track 3 or 4 things at the same time, so I need to abstract away anything other than the little piece of code I'm working on and get the compiler to worry about how things glue together.
I need the comfort of getting the compiler to help me change my hats and be able not to think about anything other than the concrete piece of code I'm working on and the interfaces of the other parts it uses. When I don't have this possibility, I miss it even more than the comfort of programming with the immutable data structures that Clojure spoilt me with. I think I need to seriously try Scala 3 at some point, since it seems to combine immutable data structures by default with an advanced type system (although the lenses libraries I've seen look like an abomination in comparison with Clojure's update-in and company, not to mention the absolute elegance and triviality of the implementation of the latter: https://github.com/clojure/clojure/blob/clojure-1.10.1/src/c...).
So I would second your recommendation and encourage people trapped in dynamic language echo chambers to try for a year something like Typescript or Kotlin to appreciate other ways of doing things in languages with practical type systems. Perhaps some of them will discover that it suits them better or help them better understand why they prefer the dynamic style.
I spent about a decade pretty much solely in dynamic languages (Ruby and JavaScript mostly) before switching to pretty much entirely working in a fairly lame statically typed language (Java). Languages with static type systems are enormously, glaringly better. Really, where the benefit comes from is static analysis; it's just that languages with static type systems have better static analysis capabilities. This may not strictly be the case in theory, but it definitely is in practice.
Oh, I'll never program in a dynamically typed language again. I'm sold on that. What I'm speaking to is the notion that types are the best model for solving most/all problems in software engineering.
Do people commonly think types are a solution to most or all problems? Other than correctness I am not sure what software engineering problems a type system actually solves, and the rest of the debate is about the expressiveness of the type system (or lack thereof, which forces suboptimal engineering practices in some languages).
Static typing is just another form of static analysis. However, it's static analysis enforced by the language rather than a third party tool. That allows me to be confident in my dependencies too if I see them putting the type system to work.
Protobuf is moving us that way with microservices too. Since it's a strongly typed message format, it's harder to make mistakes in the interface between two services.
I also like that languages can have complete local static analysis. Sure, the business requirements might be large and spread across many areas, but I will break them down into smaller chunks and encode invariants into the type system so that if the small chunk compiles, I am confident it does exactly what I expect, and I don't need to remember exactly where it fits in the larger picture.
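To make that concrete, a minimal sketch of encoding one such invariant (`NonEmpty` is an illustrative name, not a std type): the check happens once at the boundary, and everything downstream that compiles can rely on it.

    // A list that the type system guarantees is never empty.
    struct NonEmpty<T> {
        head: T,
        tail: Vec<T>,
    }

    impl<T> NonEmpty<T> {
        // The invariant is checked exactly once, at construction.
        fn new(mut items: Vec<T>) -> Option<Self> {
            if items.is_empty() {
                None
            } else {
                let head = items.remove(0);
                Some(NonEmpty { head, tail: items })
            }
        }

        // No unwrap, no run-time check: the type carries the invariant.
        fn first(&self) -> &T {
            &self.head
        }
    }

    fn main() {
        let xs = NonEmpty::new(vec![1, 2, 3]).expect("non-empty");
        println!("{}", xs.first());
    }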
By that definition, any static analyzer is a type system, because type information and the usage of those types is basically all that’s available for static analysis.
Types define what operations can be performed. The borrow checker looks at allocations and determines when they can be freed. It really has nothing to do with types. An ideal implementation of the borrow checker could be totally type unaware and work with dynamically typed languages.
Probably you could build such a thing without thinking about types per-se, but I don't think the designers of Rust did that, and I am not sure that makes it "not a type system" anyway.
> any static analyzer is a type system
"Any static analysis checking correctness is either a linter or a type checker" isn't a claim I would make, but I am not sure offhand how I would argue against it.
I'm not deeply familiar with Rust's borrow checker, but my understanding is that Swift will in the near future support move-only types. Are those not somewhat similar to what Rust has?
> Fascination with type systems does not seem to be all that useful in practice.
And yet type theory is an excellent way to express all kinds of invariants. The richer the type system, the more you can express. If you get to dependent types you essentially have all of mathematics at your disposal. This is the basis of some of the most advanced proof automation available.
What is super cool is that proofs are programs. You can write your programs and use the same language to prove theorems about them.
This is still fairly advanced stuff for most programming activities but the languages have been steadily getting better and the automation faster and I think the worlds will eventually collide in some area of industrial programming. We're already seeing it happen in security, privacy, and networking.
I don't think type systems suffer from complexity. They merely reveal the inherent complexity. You can use languages that hide it from you but you pay a price: errors only become obvious at run time when the program fails. For small programs that's easy enough to tolerate but for large ones? Ones where certain properties cannot fail? Not so much in my experience.
update: clarified wording of "proofs are programs"
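To illustrate "proofs are programs" with a toy example, here is a sketch in Lean 4 (assuming a recent toolchain where the `omega` tactic is available): the function and the machine-checked theorem about it are written in the same language.

    def double (n : Nat) : Nat := n + n

    -- A theorem about `double`, proved in the same language it's written in.
    theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
      unfold double
      omega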
> The richer the type system, the more you can express
Why is this interesting? You pay an extremely heavy price in terms of language complexity. In practice, you almost never have the invariants at all, or correct, when you begin programming, and your programs evolve very rapidly. Since with dependent types you lose type inference, you now have to evolve two programs rather than one. Moreover proofs are non-compositional: you make a tiny change somewhere and you might have to change all proofs. In addition we don't have full dependent types for any advanced programming language features; we have them only for pure functions that terminate.
> the same language to prove theorems about them
That sounds like a disadvantage. Ultimately verification is about comparing two 'implementations' against each other (in a very general sense of implementations where logical specs and tests also count as implementations). And the more similar the two implementations, the more likely you are to make the same mistake in both. After all, your specification is just as likely to be buggy as your implementation code.
> type systems suffer from complexity.
This is clearly false for just about any reasonable notion of complexity. For a start, pretty much as soon as you go beyond let-polymorphism in terms of typing system expressivity, you lose type-inference. Even type-checking can easily become undecidable when the ambient typing system is too expressive.
>your programs evolve very rapidly. Since with dependent types you lose type inference, you now have to evolve two programs rather than one.
Yes, just like you have to evolve your specification/documentation. Similarly, in the exploratory phase you'll stick to very 'rough' typing and next to no proofs and as the program gets clearer and solidifies, you can continuously refine your types (with the amount of refinement depending on how much certainty one wants). A mature program with a rigid spec is going to be harder to change, but that's just how it is anyway.
>Moreover proofs are non-compositional: you make a tiny change somewhere and you might have to change all proofs.
People on HN keep repeating this, but it's trivial because it's actually just a statement about the nature of properties. Proven invariants provide an interface to be used by the rest of the program. If the changes you make don't break your invariants but only the proofs, then you just have local adjustments to make. If your change cascades through the entire program, then it's because all the invariants get invalidated and your 'tiny' change actually changes a good deal about the semantics (and thus the spec) of the program.
The exact same thing applies to unit tests, but you don't see people constantly bemoan the non-compositional nature of unit tests even though a 'tiny' change could break all your unit tests.
>After all, your specification is just as likely to be buggy as your implementation code.
Not only are specifications declarative, they're generally magnitudes smaller. If you're as confident in your spec as in your code, something is very, very wrong with your spec. Well, or your code is your spec, but in that case you get what you pay for.
> just like you have to evolve your specification/documentation.
That is correct, and also one of the core reasons why in the vast majority of cases either no specification/documentation exists, or will only cover a small part of the actual specification. For example I would bet money that not a single function in the C, C++, Java and Python standard libraries is fully specified, in the sense of nailing down the program up to observational equivalence. (Aside: I can still count the programmers who would be able to sketch the notion of observational equivalence used in e.g. C++.)
> If your change cascades through the entire program, then it's because all the invariants get invalidated and your 'tiny' change actually changes a good deal about the semantics (and thus the spec) of the program.
This is not borne out in practice.
A lot of code refactoring I've done was trivial (e.g. changing the order of arguments), but ripples through the program and proof structure. HoTT was invented in order to automate some of those trivialities. Java exception specifications are an example: you call a different library function somewhere and need to change all exception specs up to the main function, rippling through millions of LoC. That's why exception specs were abandoned. Another example is termination proofs (note that a full specification must involve termination proofs, which is why the existing expressive type theories don't give you unrestricted recursion, and also the reason why program logics are typically only for partial correctness). In my experience, termination bugs are rare, and it would be insanely counterproductive if I had to refactor all proofs globally just because I've made a slight change to some printf somewhere in a large code base.
> unit tests, but you don't see people constantly bemoan
The reason is that no programming language forces you to do unit tests. In contrast, expressive type-theories constantly force you to prove a lot of trivialities.
> declarative, they're generally magnitudes smaller.
I don't know what you mean by declarative (other than: leaving out some detail). But they cannot be smaller in general: if every program P had a full specification S that was shorter (here full specification means specifying P up to a chosen notion of observational equivalence), then you'd have an impossibly strong compressor, which you could use to prove that every string can be compressed even more. Contradiction.
What you see in practice is that you only specify some properties of the program you are working on. For example sorting routines in C, C++, Java etc. I have never seen a specification that says what happens when the sorting predicate is not nicely behaved (e.g. returns a < b on first call but a > b on second). It's fine to omit details, but that limits the correctness you get from your spec (and also limits program extraction). Moreover, if you only work with a partial specification, you can ask the question: what level of partiality in my specification gives me the best software engineering results. My personal and anecdotal experience (which includes dependently typed languages) has consistently been that the full automation given by let-polymorphism is the sweet spot for non-trivial programs (let's say > 10k LoC).
>the core reasons why in the vast majority of cases either no specification/documentation exists
I feel that is much too pessimistic.
>will only cover a small part of the actual specification.
If the same applies to proofs: so be it. Don't let perfect be the enemy of good!
>For example I would bet money that not a single function in the C, C++, Java and Python standard libraries is fully specified, in the sense of nailing down the program up to observational equivalence.
I'd imagine so as well, but I think that's more indicative of how even a (superficially) simple language like C is not all that amenable to specification.
> A lot of code refactoring I've done was trivial (e.g. changing the order of arguments), but ripples through the program and proof structure.
This is not my experience. If you use sufficient proof automation, something like this should have next to no impact on proof structure. Univalence is useful, but a lot of refactoring is not just switching to isomorphic representations, so I'm convinced that larger scale proof automation is way more essential than HoTT.
> Java exception specifications
I'm not convinced that this is fundamental to specifying exceptions rather than just Java having particularly poor ergonomics for it. I've never met a person that actually liked implicit exceptions and if you ask people who dislike exceptions, that's often one key part of it.
> In contrast, expressive type-theories constantly force you to prove a lot of trivialities.
For all large dependently typed languages that actually bill themselves as programming languages (Agda, Idris, Lean) there is some sort of 'unsafe' feature that allows you to turn the termination checker off - at your own peril of course. But you only pay for what you use, if you don't need the additional correctness, you don't need to put in any more work, just like with unit tests.
(There are also ways to safely defer termination proofs, but to be fair the experience isn't the best currently.)
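For instance, in Lean 4 `partial def` opts a single definition out of the termination checker (a sketch; whether this recursion always terminates is the open Collatz problem, so no checker could accept it as-is):

    -- `partial` disables the termination check for this one definition.
    partial def collatz (n : Nat) : Nat :=
      if n ≤ 1 then n
      else if n % 2 == 0 then collatz (n / 2)
      else collatz (3 * n + 1)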
>I don't know what you mean by declarative (other than: leaving out some detail).
Specs tell you the what but not (necessarily) the how. Take the spec for all sorting algorithms (giving observational equivalence):
1. The output list is a permutation of the input
2. The output is sorted in regards to some predicate.
That's a couple lines at most (or a one-liner if you don't count the building blocks like the definition of a permutation), which is a good amount shorter and easier to verify than e.g. a full heapsort implementation.
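For the record, that spec is directly writable; a sketch in Lean 4 (`SortSpec` is my name for it, and it assumes Mathlib's `List.Perm` and `List.Sorted` as the building blocks):

    import Mathlib.Data.List.Sort

    -- The full behavioural spec of sorting, independent of any algorithm:
    -- 1. the output is a permutation of the input,
    -- 2. the output is sorted with respect to the predicate.
    def SortSpec (sort : List Nat → List Nat) : Prop :=
      ∀ xs : List Nat, (sort xs).Perm xs ∧ List.Sorted (· ≤ ·) (sort xs)

Any correct heapsort, quicksort, or mergesort satisfies these same few lines, which is the point about stripping away the "how".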
>But they cannot be smaller in general: if every program P had a full specification S that was shorter [...] then you've an impossibly strong compressor
The compression is stripping away the 'how', that's why you can't 'write a spec for a spec' and compress further.
>What you see in practice is that you only specify some properties of the program you are working on.
Sure, writing a fully specified program of non-trivial size is currently really only possible if you're willing to ~waste~ invest a decade or more.
>Moreover, if you only work with a partial specification, you can ask the question: what level of partiality in my specification gives me the best software engineering results.
Why would you assume that there is a single level of partiality that gives the best results? I agree that HM style types are a great fit for 'general' programming because it has such low impedance, however I also believe that most programs have some areas where they would benefit from more specification. (I think people have a clear desire for this and that it's at least partially responsible for some of the crazy type hackery as seen in Haskell or Scala, which could often be greatly simplified with a richer type system.)
Having a richer type system doesn't mean that you always have to fully use it. It's perfectly possible to just use a HM style/System F fragment. Using dependent types just for refinement is already one of the dominant styles for verification. If dependent types ever become mainstream in any way, I imagine it will also be largely in that style.
> Take the spec for all sorting algorithms (giving observational equivalence):
It's not that simple. You also have to specify the effect set of the algorithm, meaning, assuming we do an in-place sort: every memory cell other than the input array is unchanged after termination. (If you return a fresh array, you have to specify something related.) You also have to specify what happens for general sorting predicates, for example if the sorting predicate prints out the elements it compares then this reveals (much of) the algorithm.
> The compression is stripping away the 'how'
The sorting example shows that this largely doesn't work in practice. In my experience for non-trivial programs you tend to have to carry around invariants whose complexity matches the size of the program you are verifying.
> I'm convinced that larger scale proof automation is way more essential than HoTT.
I strongly agree, but currently this proof automation is largely a hope, rather than a reality.
> crazy type hackery as seen in Haskell or Scala
Haskell and Scala have (local) type-inference. That makes those complex types (somewhat) digestible.
> dependent types just for refinement
If / when this really works well, so that you don't pay a price when you are not using complex types, then this would be very nice. I don't think PL technology is there yet. (I'm currently using a refinement typing approach in production code.)
>It's not that simple. You also have to specify the effect set of the algorithm, meaning, assuming we do an in-place sort
I implicitly assumed that state would be encapsulated, since after all dependent types are fundamentally incompatible with observable state, but you're right that this is part of the spec. However I have severe doubts about the usability of any specification language that takes more than a one-liner to specify this.
>You also have to specify what happens for general sorting predicates
If you want to admit more general sorting predicates, sure, but normally you'd just allow pure functions.
>The sorting example shows that this largely doesn't work in practice. In my experience for non-trivial programs you tend to have to carry around invariants whose complexity matches the size of the program you are verifying.
If you include lemmas derived from the spec then I'd agree, but then the spec you have to check would still be a lot smaller. If not, then the only specs where I've experienced anything close to this are ones that are just the operational semantics of existing programs i.e. cases where you want to analyze a specific program. Otherwise I'm frankly uncertain as to what even the point of giving a spec would be.
>Haskell and Scala have (local) type-inference. That makes those complex types (somewhat) digestible.
Might make it more tractable to use, but I find my issue is that it's less direct and often obfuscates the meaning.
> doubts about the usability of any specification language that ...
You can extrude hidden state in ML-like languages, which gives rise to all manner of subtle behaviour in combination with higher-order functions. As Pitts, Odersky, Stark and others have shown in the early 1990s, even just the ability to generate fresh names (in ML-like languages, this corresponds to the type Ref(Unit)), in conjunction with higher-order functions gives you essentially all the specification and reasoning problems of state.
> normally you'd just allow pure functions.
This is way too restrictive for non-trivial programs. The restriction works for sorting. And in reality even there you'd run into problems, for example if you take into account that a predicate might throw an exception (which, in practice, you cannot avoid; think overflow or taking the head of an empty list).
> the only specs where I've experienced anything close to ...
I think that's because you have not attempted to prove substantial theorems about substantial code. The seL4 verification is interesting in this context: its specification (a purely functional specification of the OS behaviour) had, IIRC, about 1/3 of the size of the OS's C code. Which is not much of a compression.
> Why is this interesting? You pay an extremely heavy price in terms of language complexity.
As you say, there is no free lunch. Not having a useful type system introduces its own complexity. It depends on what abstractions you find most useful.
The present limitations of dependently typed languages will not be limitations tomorrow. Evolution in the field of proof engineering is providing new frameworks for making proofs more compositional and being able to extract programs from the proofs. It's not amazing and super useful today but it's a lot better than it was even five years ago and I suspect will continue to improve.
> After all, your specification is just as likely to be buggy as your implementation code
If you can't think of the right theorems or specifications I doubt you will write a correct program.
One is a lot easier to reason about than the other.
> Even type-checking can easily become undecidable when the ambient typing system is too expressive.
I'm not sure I follow. I understand how type inference can become undecidable but how does a sound type theory end up this way? The judgement and inference rules for CoC are rather small [0].
> There is no free lunch.
I don't disagree. I still enjoy programming in C. And I might even choose it for a project. And if it so happened that I had certain requirements like the system had to be real-time and could not deadlock then... I might be making a trade off to write my proofs and specifications in another language than my implementation.
We're not at a place yet where we can extract a full program from a specification and not in a place where we can write dependently-typed programs with deterministic run times either.
I would like to have my cake and eat it too but that's where we are.
> Not having a useful type system introduces its own complexity.
I agree, I am not promoting dynamically typed languages. My intuition about this is more that there is a sweet spot between automation and expressivity that gives you the best software engineering experience in most practical programming tasks. Milner's let-polymorphism is closer to that sweet spot than full-on dependent types a la Calculus of Constructions.
> extract programs from the proofs.
In practice you don't have specifications in > 95% of programming tasks. What, for example, is the full specification of a climate simulation, or of TikTok? One could argue that the shipped product is the (first and only) specification. Every program I've ever been part of constructing started from an informal, intuitive, vague idea of what the software should do. This includes safety-critical software. To quote from a famous paper [1]: "We have observed that the errors we find are divided roughly evenly between errors in the test data generators, errors in the specification, and errors in the program."
> If you can't think of the right theorems or specifications I doubt you will write a correct program.
I strongly disagree. I have written many programs that are, as far as I can see, correct, but I am not sure I fully know why. Here is an example: the Euclidean algorithm for computing the GCD. Why does it terminate? I worked out my informal understanding of why at some point, but that was a lot later than my implementation.
More importantly: in practice, you do not have a specification to start off with. The specification emerges during programming!
> how does a sound type theory end up this
I'm not sure what you are referring to. Both type inference and type checking can be undecidable. For example, System F a la Curry.
> extract a full program from a specification
Let me quip: proofs are programs! In other words: if you want to be able to extract a full program from proofs, the specification / proof must already contain all the information the eventual program should have. So any bug that you can make in programming, you must also be able to make in proving. Different syntaxes and abstraction levels clearly correspond to different programmer error probabilities. But fundamentally you cannot get away from the fact that specifications can and often do contain bugs.
[1] K. Claessen, J. Hughes, QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs.
Highly expressive type systems can lead people to as much design hell as does deep OOP. I have seen and experienced this in at least a couple of projects.
The only difference is: instead of brittle hierarchies, we get ossified compositions (depending on how much nominal vs structural typing happens).
We, of course, agree that we are quite some distance from having advanced type systems brought to day-to-day industrial programming.
If my understanding is correct, Agda and Coq disallow general recursion, so all programs terminate by construction, but Idris relaxes this, in a desire to be more pragmatic, at the cost of not being as clean mathematically (functions have to be modeled as partial in general, to account for the fact that they might not terminate).
This advanced types stuff sounds really useful, but it needs to be made very easy for the mainstream Java or C# programmer to use.
A success story in this regard is the async keyword. Very quickly you can get used to it and it feels like any other imperative programming.
In C#, if I could add assertions and have the compiler check at compile time that an assertion will not be violated, that would be great. I know they do this for null checking.
> The richer the type system, the more you can express. If you get to
Ah yes. And then you end up writing entire programs in types. So the next logical step would be to start writing unit and integration tests for these types, and then invent types for those types to more easily check them...
> you essentially have all of mathematics at your disposal.
Most of the stuff we do has nothing to do with mathematics.
One of the major selling points of a robust (not "strong", necessarily, but at least...not weak?) type system is that an entire class of unit tests are no longer necessary (e.g., those that validate/exercise handling of cases where data of an invalid type are passed as a parameter to the function). Integration tests are necessary independently of the implementation language--they don't test the correctness of units of code, but of communication between APIs.
> Go has a minimal type system, and is able to do much of Google's internal server side work.
And yet Go is adding generics in 1.8. And I'm sure its type system in another 5 years will be much more expressive than 1.8's. The community has long been saying that the minimal type system isn't enough.
Nit: they're adding Generics in 1.18 (not 1.8). Regarding "another 5 years": I'm not so sure. Go is very conservative about language changes. The type system didn't change at all from version 1.0 through version 1.17 (a 12-year period).
Some changes to nil accesses were made in 1.3. Tags were ignored in casts since 1.8. Overlapping methods were allowed in 1.14. New array pointer casts were added in 1.17. (Arguably, also type aliases in 1.9.)
None of these are as significant as generics, but things do change.
Yes, you're right. Saying that it didn't change "at all" was perhaps overstatement. But those are all very subtle changes. I've coded Go daily for 3-4 years and never been affected by them, except the tags one -- I think I used that once (post 1.8). Not sure what you're referring to about the nil changes in 1.3 -- I didn't see anything about nil in the 1.3 release notes: https://go.dev/doc/go1.3
Apparently what I was thinking of was in 1.2; nil pointers used to give semi-meaningful (but useless) results when an address was taken of a subfield, like the usual C implementation of `offsetof`. https://docs.google.com/document/d/14DgGJKGQeBTNJDXo3YxnlSwv...
It was before my time so I don't know what the practical impact was, just inherited some code that needed updating a few years ago...
I have used tagless casting a half-dozen times to deal with JSON vagaries; and overlapping methods today almost constantly. Slice conversions to array pointers is too new to say generally but I had one use-case pretty much immediately.
> Go has a minimal type system, and is able to do much of Google's internal server side work.
Isn't it still mostly Java and C++? That's what I hear all the time here.
Also, I'm not sure what point you're trying to make. You start by saying that fascination with type systems is not useful in practice, and end with an example where it is useful (Rust). While Go can use a GC to avoid most of the issues that Rust is trying to solve, it still has to ship with a defer mechanism (no linear/affine types/RAII) and a data race detector.
It's been a while since I worked there, but the trend at Google at the time was that the amount of code written in each popular language was rapidly growing, and the number of popular languages was also slowly growing. (Despite a lot of resistance to introducing new languages.)
I'm out of touch, but I would expect that there is a lot more Go code by now, and that it still hasn't caught up with C++ or Java.
This is pretty accurate I would say. Although, Go is more popular outside of Google than inside it and the original dream of "Replace all Python and most C++ and Java with Go" is laughably dead. Google used to require SRE teams to explain why they weren't using Go for new projects and they abandoned that because it was ridiculous and no one took it seriously.
There are quite a few internal projects written in Go, but it feels like any time one of them gets big enough to matter, someone ends up reimplementing the important parts in C++ for performance reasons.
> Isn't it still mostly Java and C++? That's what I hear all the time here.
Go's type system is much weaker and less expressive than either Java's or C++'s. C++ in particular has parametric polymorphism, type constructors, and dependent types. Go has none of those.
Everything is mutable. Builtin operators like assignment and equality are hardcoded and may behave badly for user-defined types. It doesn’t have subtypes; you can slice to an embedded value but can’t get from there back to the containing struct or its methods. It doesn’t have generic covariance and contravariance; you can’t decide whether a []Animal or a []Bulldog is acceptable in place of a []Dog.
> Most of the problems that cause non-trivial bugs come from invariant violations. At point A, there's some assumption, and way over there at point B, that assumption is violated. That's an invariant violation.
Which is exactly what a type error is!
> The Rust borrow checker is an invariant enforcer. It explicitly does automatic global analysis, and reports explicitly that what's going on at point B is inconsistent with what point A needs. This is real progress in programming language design, and is Rust's main contribution.
> That's the direction to go.
The borrow checker is an ad-hoc informally specified implementation of half of an affine type system. Having to switch programming languages every time you want to add a new invariant is a poor paradigm. What we need is a generic framework that allows you to express invariants that are relevant to your program - but again, that's exactly what a type system is.
Rust has done a great thing in showing that this is possible, but linear Haskell or Idris - where borrow checking is not an ad-hoc language feature that works by gnarly compiler internals, but just a normal part of the everyday type system that you can use and customize like any other library feature - are the approach that represents a viable future for software engineering.
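A minimal sketch of the "half of an affine type system" in question, in today's Rust (`Token` and `consume` are made-up names): any non-`Copy` value can be used at most once, which is affinity baked into the language rather than offered as a customizable library feature.

    struct Token(String); // not Copy, so values are affine: use at most once

    fn consume(t: Token) -> String {
        t.0
    }

    fn main() {
        let t = Token("session".into());
        let s = consume(t); // `t` is moved (used up) here
        // consume(t);      // error[E0382]: use of moved value: `t`
        println!("{s}");
    }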
> Most code written today do not require manual memory management
Rust has automatic memory management.
> or even explicit multithreading.
You don't need explicit multithreading to run into data races. Languages that allow any kind of unchecked mutable state sharing and allow any form of concurrency (explicit or hidden) are prone to that problem.
Even single-threaded programs with aliasing of mutable variables are hard to reason about and Rust improves on that considerably by not allowing accidental aliasing.
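A sketch of that single-threaded point: no threads anywhere, yet Rust rejects the mutable alias, because the `push` could reallocate and invalidate the live shared reference.

    fn main() {
        let mut v = vec![1, 2, 3];
        let first = &v[0];  // shared borrow of `v`, live until the println!
        // v.push(4);       // error[E0502]: cannot borrow `v` as mutable
                            // because it is also borrowed as immutable
        println!("{first}");
    }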
It's not a fascination, it's just easier and better to have good static analysis when programming. That doesn't have to be a type system, but I think there is a lot of reason to think that a type system is the lowest hanging fruit for useful static analyses.
I think this sums up the pragmatics well. Bryan Cantrill discusses in one of his talks what they did at Sun to ensure they were writing safe C. This was a substantial amount of tooling they had to build up. Type systems bring you this tooling in a well-founded, logical way. And as you say, it's a good place to start, even if it's just to know how the puzzle pieces of your code fit together.
Yes exactly. I'm kind of a broken record on this, but the key thing is static analysis. It's just that with statically typed languages, the type system specification and its implementation give you a giant head start on doing those analyses. You can build other kinds of static analyses for languages without static types, but it's just harder and you're way more on your own; you don't benefit from all the work put into the compiler for the language.
> and is able to do much of Google's internal server side work.
You mean, one of the companies with the largest number of developers in the world, paying one of the highest average salaries for them, is able to use the language?
"Why not add X feature? If people don't want to use X, they just don't, and there are basically 0 downsides."
In theory this is true. If the compiler is decent, compile times and analysis shouldn't really be affected. Maybe libraries will use X but otherwise they would use a manual implementation of X anyways.
But in practice developers misuse features, so adding a feature actually leads to worse code. It also creates a higher learning curve, since you have to decide whether to use a new feature or just re-implement it via old features. See: C++ and over-engineered Haskell. So each feature has a "learnability cost", and a language should only add features which are useful enough to outweigh that cost.
But most features actually are useful, at least for particular types of programs. It's much harder to write an asynchronous program without some form of async; it's much harder to write a program like a video game without objects. This may be controversial, but I really don't like Go and Elm (very simple languages) because I feel like I have to write so much boilerplate vs. other languages where I could just use an advanced feature. And this boilerplate isn't just hard to create, it's hard to maintain because small changes require rewriting a lot of code.
So ultimately language designers need to balance number of features with expressiveness: the goal is to use as few simple but powerful features to make your language simple but really expressive. And different languages for different people. Personally I like working with Java and Kotlin and Swift (the middle languages in the author's meme) because I can establish coding conventions and stick to them, C++ and Haskell are too complicated and it's harder to figure out and stick to the "ideal" conventions.
All features are useful. That's table stakes. But usefulness is insufficient to warrant inclusion. How does a feature interact with all existing features? Are there ambiguities? Are there conflicts? A language is not a grab-bag of capabilities, it's a single cohesive thing that requires design and thought.
> But in practice developers misuse features, so adding a feature actually leads to worse code.
Is that really a problem on the language's side, though? Devs are capable of mis-using any feature, even extremely basic ones that almost every language has (variable names, for instance (although I'm laughing in FORTH)). Code standards and code reviews are necessary tools in the first place because it doesn't matter what language you give a programmer - they're perfectly capable of constructing a monstrosity in it.
I argue that preventing programmers from doing dumb things with well-designed language features (so, hygienic Scheme macros, and not raw C pointers) is a social and/or organizational problem, and it's better to solve that at that level than to try to solve it (inadequately) at a technical level.
("I keep dereferencing null pointers", on the other hand, is an example of a technical problem that can be solved on the technical level with better language design)
> Is that really a problem on the language's side, though?
Yes, for a language to be good in practice you need to look at what developers actually do and not how a perfectly rational developer would use the language.
> But in practice developers misuse features, so adding a feature actually leads to worse code.
I have found the opposite to be true. Missing features often leads to what one would call "design patterns". When the language adds official support to solve the problem you're trying to solve with that pattern, the code becomes clearer.
This entire article can be summarised as "compile time stuff should use the same language as run time".
I guess the author just hasn't encountered Nim before, where anything becomes compile time by just assigning to a const, and macros have access to the real AST without substitution. Macros also allow compile time type inspection, as they are a first class citizen rather than tacked on.
The compile time print, AFAICT, already exists in Nim as the `&` macro in strformat. That lets you interpolate what you like at compile time, and supports run time values too.
> This entire article can be summarised as "compile time stuff should use the same language as run time".
I think the message is more nuanced than that (otherwise wouldn't lisp with its homoiconicity and compile time macros fit the bill perfectly?). Idris uses the same language, but is still too complex. And Zig not general purpose enough. I don't want to put words in the author's mouth but I think the implication is that this is a large space to explore and we don't have a solution yet; there's nothing like "just make your language like this and it'll be good." They're just pointing out the problem they see, and some (non-)solutions to it.
I thought it was more nuanced too as they were explaining how integer types can be derived, until I finished the article, and they really did just seem to be complaining that there's a mismatch between compile time and run time.
Dynamic types don't really solve the problems they mention as far as I can tell either (perhaps I am misunderstanding), they just don't provide any guarantees at all and so "work" in the loosest sense.
> otherwise wouldn't lisp with its homoiconicity and compile time macros fit the bill perfectly?
That's a good point, I do wonder why they didn't mention Lisp at all.
> we don't have a solution yet
What they want to do with print can, as far as I can see, be implemented in Nim easily in a standard, imperative form, without any declarative shenanigans. Indeed, it is implemented as part of the stdlib here: https://github.com/nim-lang/Nim/blob/ce44cf03cc4a78741c423b2...
Of course, that implementation is more complex than the one in the article because it handles a lot more (e.g., formatting and so on).
At the end of the day, it's really a capability mismatch at the language level and the author even states this:
> Programming languages ought to be rethought.
I'd argue that Nim has been 'rethought' specifically to address the issues they mention. The language was built with extension in mind, and whilst the author states that macros are a bad thing, I get the impression this is because most languages implement them as tacked-on substitution mechanisms (C/C++/Rust/D), and/or as declarative rather than "simple" imperative processes. IMHO, most people want to write general code for compile time work (like Zig), not learn a new sub-language. The author states this as well. For example, here's the Stint library using compile-time `static[int]` parameters in type signatures:
    func zero*[bits: static[int]](T: typedesc[Stuint[bits] or Stint[bits]]): T {.inline.} =
      ## Returns the zero of the input type
      discard

    func one*[bits: static[int]](T: typedesc[Stuint[bits]]): T {.inline.} =
      ## Returns the one of the input type
      result.data = one(type result.data)
It also has 'real' macros that aren't substitutions but work on the core AST directly, can inspect types at compile time, and it's a systems language that is also high level. It seems to solve their problems, but of course, they simply might not have used or even heard of it.
> incorrectly states that Zig has "colored" async functions
This was indeed weird to read, given that only Zig (and soon the JVM) solves this problem, and is well known for the fact. Especially when language design and type theory are an area of interest.
But hey, silver lining: Zig still kind of came out on top.
> I cannot imagine a single language without the if operator, but only a few PLs accommodate full-fledged trait bounds, not to mention pattern matching. This is inconsistency . . .
How?
> Sometimes, software engineers find their languages too primitive to express their ideas even in dynamic code. But they do not give up . . .
Is this a failure of the language, or a failure of the engineer?
> If we make our languages fully dynamic, we will win biformity and inconsistency, but will imminently lose the pleasure of compile-time validation and will end up debugging our programs at mid-nights . . . One possible solution I have seen is dependent types. With dependent types, we can parameterise types not only with other types but with values, too.
Types are a productive abstraction/model in programming languages. One of many. Each has its strengths and weaknesses; each is appropriate in some circumstances and not in others. Types are not the solution to all problems, any more than currying or OOP or whatever else is.
Shader languages are also hellbent on avoiding branches, so `if` is frowned upon and often not used. I could easily imagine not having it in a shader language.
The old assembly-like languages (ARB_fragment_program, NV_fragment_program*, et al.) did indeed not have branches, only selection and conditional termination, because that was the extent of the capabilities of the underlying hardware. (I understand the execution on modern fragment processors can’t actually diverge within a single batch, either, so they execute both branches and select afterwards, but they are at least capable enough not to do that if the branch went the same way everywhere. But it’s been a long time since I’ve had a state-of-the-art GPU to play with.)
Still true. CUDA warps work in teams of 32 threads, and if there is a branch they have to take both sides and then select the result. That's fine for loop termination (``while (i < 1000)``), but if there is actual work in the branches it's often significantly better to switch to branchless code.
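To make the branch-vs-select distinction concrete, here's a minimal sketch (in TypeScript purely for readability; `relu` is just a stand-in kernel):

    // Branchy: per-element control flow, which diverges across SIMT lanes.
    function reluBranchy(xs: number[]): number[] {
      return xs.map((x) => {
        if (x > 0) return x;
        return 0;
      });
    }

    // Branchless: compute a 0/1 mask and select arithmetically, so every
    // lane does identical work regardless of the data.
    function reluBranchless(xs: number[]): number[] {
      return xs.map((x) => {
        const mask = Number(x > 0); // 1 if positive, else 0
        return mask * x;
      });
    }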
This is very much changing. IMHO, doing shader language design today, you should let the programmer express things in the most natural way possible, and let the compiler figure out whether to generate a branch or branchless code. Yes, often you want the latter, but compilers are pretty good at figuring that out.
Yeah, the current problem is that Idris code is far less efficient than Rust code, because Idris boxes everything and erases all types, and also Idris's support for borrowing seems less powerful than Rust (it lacks first-class mutable borrows as far as I can tell).
It seems that fixing this is a research problem, which would lead to the holy grail of programming languages, i.e. an ultimate language that is as expressive as Idris and as efficient as Rust, and is thus essentially perfect.
Expressiveness is not an unambiguous net good -- more expressiveness is not a priori better. Expressiveness carries costs of comprehension and coherence that need to be appropriately weighed in the contexts where the language will be applied.
Programming languages are not theoretical things. They're concrete, practical tools that _enable_ other stuff. Engineering, not science.
How would you define expressiveness (as it's commonly used, i.e. a definition under which Turing-complete languages can still differ in expressiveness) if not as how much something can be simplified, thus aiding comprehension rather than detracting from it?
> Programming languages are not theoretical things. They're concrete, practical tools that _enable_ other stuff. Engineering, not science.
You can't escape theory; engineering is applied science.
I'm not the person you replied to, but here's an analogy: it's easier to learn how to drive a car with an automatic transmission than a manual one, even though the latter is "more expressive".
Heh, I would actually consider automatic transmission to be the more expressive one, since to me expressive means how easy it is to express something. Analogously e.g. C++ (manual) is more efficient and allows finer control, but makes it harder to express the same thing as in a 'higher level' (automatic) language.
Otherwise, since Assembly provides the most control out of all, would you consider it the most expressive? ;-)
I guess in my head "expressiveness" is some fuzzy combination of what you are able to do plus how easy it is to do those things. I'd consider a calculator that supports real numbers to be more expressive than one which only supports integers, all else being equal.
Maybe this definition is idiosyncratic, though. It's certainly not objective.
I'd agree that "what you are able to do" seems like it's intuitively part of expressiveness, but due to Turing completeness you don't have any situation where language A can compute something that language B can't. So the only difference in capabilities seems to be in how easy it is to compute something, rather than if one is able to do something.
Increasing expressiveness of a language necessarily increases its complexity. Comprehension is important but it's a function of "the whole stack" -- language and program both.
> Idris's support for borrowing seems less powerful than Rust (it lacks first-class mutable borrows as far as I can tell).
Depends on what you mean. Idris's notion of multiplicities essentially subsumes Rust's borrowing (there are some differences around affine vs linear types), so I can't think, off the top of my head, of things that you can ensure with Rust that you can't with Idris; but Rust has a lot more quality-of-life improvements that make things less clunky (also, since Idris has a GC, it gets away with far less need for borrowing in the first place).
The Prusti effort to endow Rust with proof-carrying code is also worth mentioning. There are some reasons to expect this approach to be more fruitful than an actual extension of dependently-typed languages, since the type system features of Rust itself are hard to integrate with dependent types. (At best, it might be somewhat feasible to use the latter in the `const`, compile-time evaluated subset of the language.)
> an ultimate language that is as expressive as Idris and as efficient as Rust, and is thus essentially perfect.
Are both Idris's expressiveness and Rust's efficiency (given stronger guarantees) perfect? Aren't these languages really complex both to learn and to write? There are problems without a solution that is perfect and unique for all of them.
Having used Clojure for a while now, I will say having 90% of things be a primitive, map, or vector goes a long way in and of itself. A lot of types concocted in a more conventional language just don't need to exist, IMO, and they create so much baggage around themselves.
Hmm, how well does this scale though? You are passing around these giant maps of vectors of tuples, and then you hand the code to someone unfamiliar with it: how the hell do they know what's in there? Is the order price the first element of the tuple or the second? What happens when I refactor things and now all the tuple elements shift over by one? Surely you'll end up writing just as much in documentation as you would have to specify the types?
Currently working my way through some complex Python code written in that style and it's completely impossible to understand it. In fact, the only way I can actually do it is transforming all these ad hoc data structures into proper types so I can make sense of it.
If your data is not position-dependent a tuple doesn’t sound like the correct choice. In the price example you provided, a map would be much better.
As for how you know what’s in there - you should only know whether what’s relevant to your function is in there and not care about the rest of the world. For the former, tools like clojure.spec are helpful but ultimately good design helps the most (something that typed languages can often obscure).
> If your data is not position-dependent a tuple doesn’t sound like the correct choice.
That's kind of the problem though. Software is written by humans, and humans are fallible. We don't always make the correct choices. Also, there are economic pressures, deficiencies in specification, and changes in business requirements.
Personally, I believe businesses should accept the aforementioned reality and optimise for cost of change.
I have the exact opposite experience. I can't think of anything I got more sick of than every freaking method in every rails project having `params = {}` where you have no idea what keys are required or expected or ignored. Easily 90% of these should have been named structures instead of these arbitrary data grab bags.
Agree that "map oriented" code bases are pretty bad. Always an unmaintainable mess, usually developed by single dev, painful to refactor. Seen this with Groovy back when some people thought this language had any merit.
If you had a generic params map, you would likely destructure it and the keys would be obvious. Destructuring in Clojure also gives you a way to specify defaults for each key right there.
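For what it's worth, the same destructure-with-defaults idea reads almost identically in TypeScript (a sketch; the keys are invented):

    // Destructuring a generic params object with inline defaults: the keys,
    // and which of them are optional, are visible right at the function head.
    function connect(
      { host, port = 5432, user = "app" }: { host: string; port?: number; user?: string }
    ): string {
      return `${user}@${host}:${port}`;
    }

    connect({ host: "db.internal" }); // "app@db.internal:5432"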
In the SML/OCaml world there's something like that: there is a difference between types and modules, and functions (from types to types) and functors (from module to module). Work was done on 1ML to unify everything: https://people.mpi-sws.org/~rossberg/1ml/. An extract:
> In this "1ML", functions, functors, and even type constructors are one and the same construct; likewise, no distinction is made between structures, records, or tuples. Or viewed the other way round, everything is just ("a mode of use of") modules. Yet, 1ML does not require dependent types, and its type structure is expressible in terms of plain System Fω, in a minor variation of our F-ing modules approach.
> An alternative view is that 1ML is a user-friendly surface syntax for System Fω that allows combining term and type abstraction in a more compositional manner than the bare calculus.
On the other hand, from the "engineer" point of view, all abstractions melting into one may not be desirable. It's nice to be able to use weak abstractions for simple stuff and powerful abstractions for more powerful stuff. Being exposed to the full complexity of your language all the time sounds like a recipe for disaster.
I dislike the phrase 'dynamic language' and especially dislike the phrase 'static language'. We should say 'dynamically typed' or 'statically typed', because 'static' languages are the site of major dynamism.
I think 'dynamic language' is appropriate here, since it's not only talking about types; it's largely talking about macros, pre-processors, reflection, etc. too.
Also, the main argument is that separating features into those used at compile-time (AKA static) and run-time (AKA dynamic) is necessarily creating separate languages (i.e. a "static language", which may involve types, macros, preprocessors, etc.; and a "dynamic language", which may involve memory allocation, branching, I/O, etc.)
I didn't see the article touch on the "why" explicitly, but: Zig really has the chance to square this circle for low-level languages, since there is duck-typed, type-inferred coercion in places where it makes sense. Completely correct about Zig not necessarily being good for higher-level stuff, but I think (dynamic) HLLs have been converging on dealing with this using static typechecking, with varying levels of success.
I spent a couple of days with Zig. Thought the language was great but the tooling (on Windows) just killed it for me. I hope that gets better 'cos I'd like to give it another go
The problem I find with static typing is that it so easily leads you over-specifying the requirements / constraints. In fact, it makes such a virtue out of that over-specification that many people would consider it a best practice to do so.
For example, perhaps my `calculate_price` function only depends on 2 attributes of an order which has 65 attributes. Am I creating a 2-element data type for that function to process? No! I'm specifying that it processes an Order data type, with all its 65 elements. But implicitly then I'm saying the function has 65 input parameters of all these specific types, and nobody can call it now without providing them all. What a pain! A huge amount of extra code, refactoring, and unit testing, because of this.
So either you end up with a cambrian explosion of micro-types or you have these way overspecified interfaces everywhere.
Compare with dynamic languages (or structural typing, Go, etc.) that only care that things "quack like a duck". The calculate_price function doesn't care what object you give it, as long as it has the two attributes it needs. Now I can unit test `calculate_price` with a 2-element object rather than needlessly creating the 63 irrelevant required elements of a valid Order.
I think a lot could be solved with a culture shift. Where data types are really known and locked in, use the crap out of them. As soon as things get ambiguous or flexible, go right ahead and specify that your function takes a Map<String,Object>. If a useful concrete interface emerges at some point, factor it out then. The problem is that this is really frowned upon in a lot of places.
> Compare with dynamic languages (or structural typing, Go etc) that only care that things "quack like a duck".
Go is a statically typed language. Unless you're referring to interfaces at run time?
> I'm saying the function has 65 input parameters of all these specific types and nobody can call it now without providing them all. What a pain! Huge amount of extra code, refactoring, unit testing, because of this.
I've seen this "cambrian explosion of micro-types" argument before on HN but I think it comes from a misunderstanding about what you actually do in a static language. No one's creating types for every combination of parameters.
Either you'd pass the two arguments directly (`a, b: int` or whatnot), or you'd pass the Order type and just use the bits you need.
If you have multiple Order types, you'd use generics or something like it to get duck typing. If you used a field that wasn't there, you'd get a compile error.
The reality is the code would look pretty much the same in both static or dynamic languages.
> As soon as things get ambiguous or flexible, go right ahead and specify that your function takes a Map<String,Object>.
Dear god, please don't. Some of the worst spaghetti code I have disentangled used this pattern. Typos in the string literals used as keys, object type mismatches, etc.
If you only want some of the fields from another struct, you have a couple of options:
* Define another struct `FooArgs`. This is easy, and I (respectfully) reject the claim that nobody does it.
* Just define your function to take those two fields directly.
You’re putting structural types in the same boat as dynamic types, which I don’t think is fair. Some of the most popular static type systems out there have structural typing, including Go (as you mentioned) and TypeScript. And that’s not even getting into languages that do extensive type inference, including TypeScript, Haskell, and ReScript (which also saves you from locking into over-broad contracts).
One works with the type system they have, not the one they want. Out of the top 20 most popular languages according to Stack Overflow [0], TypeScript and Go are the only statically typed languages that also support structural typing.
> The calculate_price function doesn't care what object you give it, as long as it has the two attributes it needs
The issue with this is that it makes code difficult to understand. If anything could potentially work anywhere, the flip side is that you have no idea what actually works where without running the code.
Running code means slower iteration cycles than a type checker. Also, you don't actually know what it returned. So it was able to produce an output; was it what you expected? Or was it subtly different in ways that will break your code downstream?
I work at a company with a large untyped code base and the product is constantly breaking in these ways.
Or you could just take the two parameters you're actually using on your function. No new type, no need to pass your mega-object, just take two nice strongly typed arguments.
In addition to what others have said about just passing two parameters, there are also row types, where the signature of `calculate_price` can be specified to accept any record that has the two required fields.
I don't write Python, but I think row-level typing is stricter. Both the names and types of the record fields would have to satisfy the function signature, so the quacking is only honored on field names; dynamic languages will of course accept floats where ints are called for, etc., quacking all the way down.
The point of my original comment was to suggest that some of the flexibility offered by duck typing can be achieved in FP, so they should seem similar.
I would still just pass the fields as two parameters.
TypeScript can do exactly what you ask for: “static duck typing”. Just define the input argument to be an interface with the 2 relevant attributes; any struct having at least these 2 attributes is now allowed, even if there is no explicit inheritance relationship. This can be done inline in the function signature, so it doesn’t contribute to bloat.
Python can do the same using what they call Protocol. Here protocols need to be defined upfront.
This is usually called Structural typing, as opposed to nominal typing where classes inherit from a base.
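A sketch of the inline TypeScript form (field names invented for the example):

    // Inline structural type: any value carrying at least these two fields
    // is accepted; no named interface or inheritance required.
    function calculatePrice(order: { unitPrice: number; quantity: number }): number {
      return order.unitPrice * order.quantity;
    }

    // A full 65-field Order value satisfies it, and so does a two-field
    // test stub, which makes unit testing painless.
    const total = calculatePrice({ unitPrice: 9.99, quantity: 3 });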
Indeed, the article has it backwards. The types are always there. Your program will fail at runtime if it's not correct. The type system merely surfaced that.
Complexity in the types happens when the type system isn't expressive enough. Or when you're trying to do something that would make the compiler try to solve the halting problem.
To that last point, this is why the PLT community has pushed in the direction that Agda / Idris has. Kind of like how we realized years (decades?) ago that we didn't need pointer arithmetic, there's been a realization that Turing-completeness isn't actually that helpful, and it's okay if our languages can't express the halting problem.
That's the hope, but saying there's something wrong is insufficient. The compile-time errors need to be understandable, or it's just going to be frustrating.
Maybe we should judge compile-time constraint systems by how easy it is for the library author to add good error messages for misuse?
We do or we don't. There is no "might". Spending money on "might" has been the death of many projects.
If we didn't, and now we do, we could write a fn to map the car to parts, or we could define the car struct in terms of its parts, or we could just do away with the car altogether.
But far more valuable would be an analysis of what changed about the requirements that the model no longer works.
Now, don't get me wrong: I'd love a better language, and by better I mean "as fast as assembly but 'dynamic'". The problem is that, at the end of the day, all compilers are just "premature optimizations" or perhaps "willing premature optimizations". We could all be happily programming in smalltalk or build a runtime using predicate logic, but a) the number of people who could program in it is vanishingly small and b) it would be fucking slow. These languages don't solve a problem that I have, or rather they don't solve a problem that I don't already have a far better solution for. They solve a problem that academics have.
I think the comparison between printf in Idris and Zig is a little off, since the Idris version defines an intermediate data structure, and hence requires extra parsing and interpreting functions for it. That's a nice approach, but the Zig version is operating directly on characters, so it's a bit apples-to-oranges.
We can get a more direct Idris implementation by inlining the parser (toFmt) into the interpreter (PrintfType). That lets us throw away `Fmt`, `toFmt`, etc. to just get:
Except this version doesn’t compile. I’m not sure that it’s possible to get it to compile: type-level Idris is actually a _subset_ of Idris and pattern-matching non-ADTs is half-broken on the type level. You can also observe this problem in this simplified example:
    f : Char -> Type
    f '0' = Int
    f _ = Char

    g : (c : Char) -> (f c)
    g '0' = 0
    g c = c
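Interestingly, this particular type-level trick does go through in TypeScript, whose template literal types allow pattern-matching strings at the type level. A minimal sketch that handles only %d and %s (no %% escaping), with the arguments passed as a tuple to keep it simple:

    // Recursively parse the format string at the type level: each %d
    // demands a number and each %s a string, in order.
    type PrintfArgs<S extends string> =
      S extends `${infer _Pre}%${infer C}${infer Rest}`
        ? C extends "d" ? [number, ...PrintfArgs<Rest>]
          : C extends "s" ? [string, ...PrintfArgs<Rest>]
          : PrintfArgs<Rest>
        : [];

    function printf<S extends string>(fmt: S, args: PrintfArgs<S>): string {
      let i = 0;
      // Substitute each specifier with the matching (already type-checked) argument.
      return fmt.replace(/%[ds]/g, () => String((args as unknown[])[i++]));
    }

    printf("%s is %d years old", ["Ada", 36]);      // OK
    // printf("%s is %d years old", ["Ada", "36"]); // rejected at compile time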
How is formatted printing related in any way to the internal representation of strings?
printf is what you call when you want to print X in hexadecimal with at least two digits, left justified on an eight-character wide field. I don't see how the sanity of whatever string representation the programming language uses is relevant here.
Some kind of formatting function would be needed, because sometimes you really do need to print an integer with enough leading zeroes to fit in a five-digit field.
FWIW, I've been developing code directly in MLIR recently, and in MLIR "Comparing types is cool" is indeed true.
It's amazing what you can do when you have compiler transformations and targets always available.
Suddenly, "little DSLs" (MLIR dialects) don't seem so bad, since they are defined the same way and map in semantically-sound ways to lower-level dialects. You can have dedicated dialects, like Halide, for doing something as concrete as image processing kernels.
Oh, and you can output those kernels to both the CPU and GPU, including automatically introducing async functions, host-side sync barriers, etc. Good luck doing that automatically with a general purpose programming language and a combination of macros, AST manipulations, and derived types! You really need a compiler to stay sane.
Can I pick your brain on MLIR? It sounds awesome from what you describe, but I want to know more about whether it's specialized to machine learning types of workloads or whether it's good for more general things.
Well, we're using it for business automation. We have automated agents that are selectively override-able by humans on an as-needed basis (e.g. a case we don't currently handle, or because of a runtime error).
Also, most of our code needs to support suspend/resume on another machine, either in the middle of an action or more often between actions. So, a "behavior" might begin on machine A and then migrate to machine B to do more work, then on to machine C. While doing work, its execution state might be serialized to Postgres while some dependency is waited on—say, a human task that doesn't get done until the following Monday. It's then resumed in the same execution state, potentially on an entirely different worker/machine, and continues executing.
The suspend/resume stuff completely destroys the code if you're writing it by hand, as does moving from machine to machine.
So we write the core logic in our own internal MLIR dialect and then output code that has the suspend/resume semantics automatically (i.e. literal compiler transformations), plus our own "interpreter" (which is just JavaScript/v8 with all of the extra suspend/resume cruft added in).
We don't translate out of SSA form at all, our codegen can execute it directly. We also insert debug hooks so when there's an error, you can map the execution state to the original code.
Most of the cool machine learning stuff MLIR can do, we're not even doing yet outside of some internal prototypes. So far, just the methodology of MLIR has made a huge impact—it gives really nice structure (read: tooling) for the kinds of code transforms we've needed to do.
One way to do dynamic macros in a statically typed language is to generate source code using the host language in a separate build step, before the compilation of the hand-written and generated sources.
For example, in TypeScript I use tsc-macro to run "*.macro.ts" files; they can import any functions and modules just like normal source code, and their evaluated results are saved as "*.ts".
The generated .ts files are then compiled along with the other hand-written source files into JS for deployment and execution.
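I haven't verified tsc-macro's exact API, but the general shape of the approach is easy to sketch: a plain TypeScript script, run as a pre-build step, that does arbitrary compile-time computation and writes ordinary .ts source (file names invented):

    // gen-colors.macro.ts — run with ts-node (or tsx) as a pre-build step.
    import { writeFileSync } from "fs";

    const colors = ["red", "green", "blue"] as const;

    // Generate a union type and a lookup table from one source of truth.
    const source = [
      `export type Color = ${colors.map((c) => JSON.stringify(c)).join(" | ")};`,
      `export const COLOR_INDEX: Record<Color, number> = {`,
      ...colors.map((c, i) => `  ${JSON.stringify(c)}: ${i},`),
      `};`,
    ].join("\n");

    writeFileSync("src/colors.generated.ts", source + "\n");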
Great article, made me think! However, I think it needs to be trimmed down. Making your argument in the final paragraph of the article is not great.
Hoist the "Final Words" section to the top and make it a "tldr" introduction, that way your reader can begin with a high level understanding of your argument, which you can hone and refine as you progress.
Almost all software running the world is written in statically typed languages. This is not by accident or because developers don’t know better. Every few months on HN somebody will make some new claim about why dynamically typed languages are somehow better. But the truth is that statically typed languages have won in the market place for real world software. And I don’t see anything changing that.
Python, JavaScript and PHP run on runtimes written in statically typed languages. And those runtimes run on operating systems written in statically typed languages, using hardware drivers written in statically typed languages. So yes the world does indeed run on statically typed languages. The code you write in Python/JavaScript/PHP is a thin layer on top of C/C++.
My guess: It wouldn't because this is about static languages. Typescript is still a dynamic language with a very smart (probably best-in-class at this point in time) compile-time typechecker/static analysis tool.
So whenever I have to study someone else's 'dynamic' python I encounter this sort of thing:
    def foo(bar, baz):
        bar(baz)
        ...
What the heck is 'bar' and 'baz'? I deduce no more than 'bar' can be called with a single 'baz'. I can't use my editor/IDE to "go to definition" of bar/baz to figure out what is going on because everything is dynamically determined at runtime, and even
    grep -ri '\(foo\|bar\|baz\)' --include \*.py
Won't tell me much about foo/bar/baz; it will only start a hound dog on a long and winding scent trail.
To be fair, in recent years I've worked on a number of Typescript projects and it was common for developers to use `any`, `Object`, `() -> Promise<void>`, etc. Not super helpful.
Though in my experience, sane code structure and informative comments trump everything else when it comes to understanding a big, unknown codebase. I still shudder when I think about working years ago on various Java codebases (mostly business IT systems). What a convoluted mess of n-levels-deep interface hierarchies. Types? Yeah, but good luck unraveling what exactly is happening at runtime.
It doesn't just make the code harder to read, it makes it run slower, too. Static typing provides some compile-time guarantees about what's going to go where, so the compiler can make a lot of simplifying assumptions that speed things up.
I agree that on balance type signatures are better -- and that's why modern Python has evolved to incorporate them. But they aren't a magic cure-all, and they do impose a significant tax of their own.
Oh yeah the easiest code in the world to read is some contorted type system and function signatures that look like hieroglyphics that you need a PhD in CS to comprehend.
Python is easy to grok, and if you have programmers writing code like bar(foo,baz) then the problem is not Python. You can write crap in any language.
Unit tests do much of what typing checks anyway ... and here's the thing ... you NEED unit tests no matter what. No typing system can tell you that you wrote > when you should have written <.
It’s what you’re used to. I personally find Python horrible to read because I’m used to a whole different class of programming languages. But I’m sure some of my code might be hard to read by others who aren’t used to that particular programming language, too.
> Unit tests do much of what typing checks anyway ... and here's the thing ... you NEED unit tests no matter what.
Some, not all. Strictly typed languages are handy when it comes to refactoring, and unit tests can sometimes fail there if the design is changing enough that the unit tests need rewriting too.
> No typing system can tell you that you wrote > when you should have written <.
Not technically true. Some languages with a richer set of types and operator overloading could have code written to detect that sort of thing. But I do get your point that unit tests are important too.
I’ve been programming for > 30 years and in dozens of different languages. In that time I’ve felt strictly typed languages make larger and more mature code bases slightly easier to maintain, while loosely typed languages are easier for smaller and/or younger code bases. But largely it boils down more to personal preference than anything.
I will caveat that by saying the fact that Python supports type annotations should be telling that even dynamic languages benefit from a stricter approach to typing.
I'm glad to see someone else that finds Python unreadable. I keep seeing people saying that it's one of the most readable languages out there, and each time I feel like I'm from another planet.
> Oh yeah the easiest code in the world to read is some contorted type system and function signatures that look like hieroglyphics that you need a PhD in CS to comprehend.
People who just learn programming probably think the same about whatever language they are learning.
> No typing system can tell you that you wrote > when you should have written <.
The more the compiler can figure out for you, the quicker problems can be identified and fixed. I stopped using python altogether, because it was just infuriating to have the tiniest mistakes blowing up in spectacular and inscrutable ways. Mixing up values of complex types often does not fail at the actual site of the error, but much, much later. Sometimes literally later in time, as in hours, days, or months until you get an obscure "FooType does not Bar" error, and how the thing in question ever became a FooType is inscrutable at that point. If the result even is a runtime error at all! (Bonus points if your production database is now full of junk as well.[1])
The unit test did not catch it because it did not test the offending composition of classes and functions. Meanwhile, a compiler would have caught it immediately: "The thing you're doing here leads to your data structures being nested wrong."
When I started using async/await in Python, at first it was almost game over, since in plain Python that introduces another layer of typing without any assistance whatsoever. Then I discovered mypy, which actually lets me do some amount of static typing in Python, and it was very enjoyable; now Python is back on the table for smaller projects.
There is a reason Haskell has the reputation of "if it compiles, it works". There is a reason why systems programmers who work on critical systems are jealously eyeing Rust if their shop still does C.
By the way, dependent type systems absolutely can tell you if you wrote > instead of <. But since that usually comes at the expense of not being Turing complete anymore, it's more used for very critical systems, or for theorem provers.
[1] Yes, SQLite, I'm looking at you. The decision to make database columns dynamically typed, and hence have, for example, an INTEGER column silently accept data that is very much not an INTEGER at all, caused me some grief on a widely deployed system once.
Typing allows you to specify the expected behavior, in terms of the input/output structure of algorithms, in such a way that it can be statically verified without writing unit tests or manual checking code in the source, allowing your unit tests to check behavior by value rather than by value and structure. The equivalent of type checking is not unit testing, but fuzzing.
Python is not easy to grok at all when you consider that you have to read implementations to understand what they are supposed to do, and that extensive runtime debugging is required to figure out whether code behaves as expected before you can even write unit tests.
Compare to decent statically typed languages, which have quicker write/debug cycles since checking type definitions is faster than checking code behavior, and where structural unit testing is covered automatically by the compiler.
It's like getting more than 50% of your programs' test coverage, for free!
> Oh yeah the easiest code in the world to read is some contorted type system and function signatures that look like hieroglyphics that you need a PhD in CS to comprehend.
You can also make a book easier to read by ripping out all its pages.
If you eliminate the content you need to read to understand something, what have you actually made easier?
> No typing system can tell you that you wrote > when you should have written <.
There are many that can; e.g. via SMT-decidable refinement types, or even full undecidable dependent types coupled with automated solvers and manual proofs.
Yeah, I always say that Python is an amazing language to prototype in and a terrible language to scale, precisely because it lets people write the usual terrible code and then gives you the freedom to make it even worse.
In Clojure, I tend to put pre and post assertions on most of my functions, which is useful for checking errors in the schema of runtime data (very useful when dealing with 3rd party APIs) but it also offers the documentation that you are seeking:
    (defn advisories
      [config]
      {:pre [(map? config)
             (:download-advisories-dir config)]
       :post [(map? %)]}
      (let [dir (:download-advisories-dir config)]
        ;; more code here
        ))
Pre/post conditions are complementary to a type system. They can ensure logical properties that may not be encodable in your underlying type system (that is, in essentially every mainstream statically typed language), such as the relationship between two values in a collection. Trivial example: if you have a range such as [x,y] where x < y must hold, how would you convey that in any mainstream type system?
The first part of that page demonstrates what amounts to pre/post conditions, but placed in the constructor. The range is checked dynamically, not statically.
The second part is using Peano numbers to enforce the constraint. I guess you could try and force that into some mainstream languages, probably C++. With its template programming you could get something going in this vein, though I'm not sure how well it would work if the number were calculated at runtime rather than compile time. You'd still end up with a dynamic check somewhere.
The way that the value flows through the system is checked statically, and the program can (and should) be designed so that the value with the appropriate type cannot be constructed unsafely.
If you need to statically check the construction of values in Haskell, there are things like refinement types[0].
> The way that the value flows through the system is checked statically, and the program can (and should) be designed so that the value with the appropriate type cannot be constructed unsafely.
Except that in the first example from the first link you sent me, there is no static guarantee that the inputs to the constructor are valid, thus the error branch (it would be unnecessary if static guarantees could be made regarding the use of the constructor). And that was my point, that you still end up with dynamic checks on the values which is where pre/post conditions step in to cover what static typing cannot (or, again, cannot easily in mainstream languages, which would not be Haskell).
Mostly because, per my reading (as an admitted Haskell dabbler and not fluent), it looks like a variation on a theme rather than a totally different thing. It seems to be a more consistent and refined (har har) way of doing the same thing as the first link, and still has a dynamic check (at least in one form, refine vs refineTH) just like the smart constructors.
But also because we got derailed from my initial point and context.
> Pre/post conditions are complementary to a type system.
Do you disagree or agree with this statement? Because you never addressed it either.
> They can ensure logical properties that may not be encodable in your underlying type system
Note the "may", because that's important. I didn't say that there were no languages in which my example could be encoded in the type system. And maybe it wasn't the best example, but the point itself was that there is no type system (that I'm aware of, not even Idris as far as I know) which can prove in its static type system every piece of logic about a program. This means that some properties of the system will end up being checked (if you bother to) at runtime and not at compile time. That's where pre/post conditions are useful, they contain information (and in a more deliberate form in cases like the Clojure example) about the properties of the system that are hard or impossible to encode directly in the type system.
Different type systems (both static and dynamic) let you express more or less with them, which reduces the need/desire to have checks like these in your program. But I seriously doubt that any mainstream language will ever totally remove their utility, as complements to the rest of the type system.
> there is no type system (that I'm aware of, not even Idris as far as I know) which can prove in its static type system every piece of logic about a program.
Many static type systems can prove anything that can be proven. Notionally one might write a program that relies on something unproven like the Collatz conjecture, though I'm not sure that would happen in practice (e.g. it's easy to write a program that relies on the Collatz conjecture for numbers below 2^64, but that's quite provable). Whether it's actually worth writing out the proof is another question though.
> This means that some properties of the system will end up being checked (if you bother to) at runtime and not at compile time. That's where pre/post conditions are useful, they contain information (and in a more deliberate form in cases like the Clojure example) about the properties of the system that are hard or impossible to encode directly in the type system.
This is true but makes surprisingly little difference, because you still want to keep track of which values have or haven't had that runtime check applied. So you almost always want your postconditions expressed as types (even if they're just "marker" types that indicate that a given runtime check was passed). Put another way, any metadata you would want to be able to track about a function's argument values or return value, you almost always want to be able to carry that metadata around with that value before it's passed in or after it's returned - but at that point that metadata is a type, and it's easiest to represent it as one.
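A minimal TypeScript sketch of that 'marker type' idea (all names invented): the runtime check happens once in a smart constructor, and the brand carries the evidence that it passed:

    // Structurally a plain pair plus a phantom brand that only the smart
    // constructor ever asserts.
    type CheckedRange = { lo: number; hi: number; readonly __brand: "CheckedRange" };

    function mkRange(lo: number, hi: number): CheckedRange | undefined {
      if (!(lo < hi)) return undefined;  // the x < y invariant, checked at runtime
      return { lo, hi } as CheckedRange; // assert the brand once the check passes
    }

    // Statically demands the brand, i.e. evidence the runtime check happened.
    function width(r: CheckedRange): number {
      return r.hi - r.lo;
    }

    const r = mkRange(1, 3);
    if (r) width(r);            // fine
    // width({ lo: 3, hi: 1 }); // compile error: brand is missing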
Sorry, I should have been clearer. I was referring to the Template Haskell part of that refinement types library, where the construction of values can indeed be statically checked at compile time.
> > Pre/post conditions are complementary to a type system.
> Do you disagree or agree with this statement? Because you never addressed it either.
I agree with it. I write web applications in Haskell for a living, and the very nature of web applications is that most values coming into the system are not known until runtime. It is not a reasonable design goal to want every possible value in the system to be verified at compile time. However, it is valuable to be able to statically verify the relationships between functions and values as those values — once they have been parsed at runtime into a more principled type — move through the system.
This is one of my favourite blog posts on the Internet and I implore every programmer to read it.
> The claim is simple: in a static type system, you must declare the shape of data ahead of time, but in a dynamic type system, the type can be, well, dynamic! It sounds self-evident, so much so that Rich Hickey has practically built a speaking career upon its emotional appeal. The only problem is it isn’t true.
Static languages unfortunately don't save you from that. You find automatically inferred types, or types that refer to some abstract interface or template-class mess, but you have no idea where the actual implementation lives until you compile with RTTI and run it under a debugger... and as TFA posits, people working within the limitations of static languages often end up reinventing a dynamic structure.
Is this somehow supposed to relevant to the posted article or did you just want to start a tangentially related dynamic-vs-static flame war here in the comments section?
> Static languages unfortunately don't save you from that.
While you can still make a static language that is confusing, it's a lot harder... I challenge you to write a function signature in Rust that is both:
1) Useful
2) As opaque as the python signature above.
> You find automatically inferred types
A minority of static languages do type inference in function signatures. I think it's a bad idea for exactly the same reason the python code is bad. On the other hand, every dynamic language allows you to omit any information about a type signature.
They usually save you from that particular pitfall, but not always of course.
Static vs dynamic makes for such difference in the detailed workflow, both in terms of changing existing code & in terms of writing new/(more) from scratch code, yet they can both be quite fruitful, and can both be abused in absurdum.
It seems like people naturally fall into one of the two camps (either by personality or by training), and the other side just seems kind of insane: "how can you even work that way!?". Then culture and idioms emerge over time and strengthen the tribalism.
I've gone back and forth between the two over the course of my career, and it's quite a mind-shift when switching, with a fair bit of pain involved ("but it would be so easy to do this in [old language]", or "what the hell is this garbage anyway!?") and then eventually it settles in and it's not all painful, all the time ;)
(Going back and forth between Scala and Python right now, so this hit a bit of a nerve)
The types are there.. but you don't know which one it is that your program is dealing with. You could have dozens of implementations for any given abstract interface. One gets picked up at run time.
That's the theory. Works great when there are no bugs and everything's been designed just right. In that world you could wipe implementations from memory because you won't ever need to dive in..
Very often I'm looking at code and "how to use the interface" is not a question I'm looking to find answers for.
Some information is a lot better than none. In some cases you might want to know what implements the interface: that information is also statically available. In Rust, you can look at a trait and see what types implement that trait.
If you need to know exactly which implementation is being used in a particular context then maybe you shouldn't be using an interface, but should be using the concrete type?
> If you need to know exactly which implementation is being used in a particular context then maybe you shouldn't be using an interface, but should be using the concrete type?
Look again at the original comment by dandotway. It's not a question of "what type should I be using here", nor is it "how should I use this interface", but "what do I need to do to understand this code?". Even if an abstract interface is used correctly and is the right thing to do, you still need to understand the code before you can (debug|rewrite|extend|whatever) it.
And it's a pipe dream to say you can look just at the interface. Something blows up, is it the interface used wrong? Maybe, maybe not? Is it the interface implemented wrong? Maybe, first you need to know what implementation you're looking at. Subtle interactions between abstract and concrete send you spelunking through the layers when you're debugging or trying to extend the interface to account for something the original author didn't anticipate, and often the devil (in the details) comes from concrete implementations.
> to some abstract interface or template-class-mess
And traits! "Oh look, this functionality is implemented in a trait implemented by a trait implemented by a trait implemented by what you're looking at. Maybe"
When that JSON payload changes (intentionally or not), you will run into a mysterious problem in some unrelated area of your code. It will be significantly more expensive to fix than failing fast at the point of parsing.
But depending on the situation, that problem may never happen. I'm not a big fan of introducing complexity to guard against code screwups in the same codebase.
The hardest bugs are the rare bugs. Common and frequent problems are easy bugs to fix. These subtle or rare changes are those which should be feared. If you model only that which you support, finding the source of the problem becomes much easier.
But the thing is, don't you have to worry about structure? You have to unpack the elements from the JSON, so you will need to encode its structure explicitly, which includes type information. The only reason this would be useful is if you're just shuttling data to another API that expects a dict structure (that will then validate everything) rather than raw JSON, and you aren't really doing any real work yourself.
> The real power of dynamic languages is being able to do: const foo = JSON.parse(arbitraryJsonString); and not having to worry about the structure up front.
That's not power, that's a shotgun aimed at your crotch whose trigger is connected to a cosmic ray detector.
+NaN For comments that make me laugh out loud for duration T>2.0 seconds, I wish HN provided a way to transmute/sacrifice one's past karma points into additional +1 mod points.
Every static language that I know of also supports this -- you can parse into a sum type of `JsonAny` (or whatever), where `JsonAny` is one of `Null | Number | String | List[JsonAny] | Dict[String, JsonAny]`.
The API then becomes a runtime fallible one, which is perfectly sound.
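In TypeScript, for instance, the recursive sum type is a one-liner; a sketch (the example input is mine):

    // A recursive model of arbitrary JSON.
    type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

    const arbitraryJsonString = '{"key": [{"attr": 42}]}';

    // JSON.parse returns `any`; pinning the result to Json right away keeps
    // the dynamic value from leaking unchecked into the rest of the program.
    const foo: Json = JSON.parse(arbitraryJsonString);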
Then you write one short operator (and I agree that some static languages make this more cumbersome than it should be) to say so, and either handle the case where it isn't, or explicitly declare yourself partial and not handling it.
> Also, how do you then access nested objects, like data['key'][0]['attr'] in Python?
With lenses, something like:
data ^? (key "key") >>> (nth 0) >>> (key "attr")
If you do several unsafe operations in a row then this is cumbersome by design - you want to be clear which parts of your program are safe and which are unsafe, so that readers can understand and know where to review. But a good language should let you compose together several unsafe operations in a lightweight way and then execute them as a single unsafe operation, for cases like this where you want to work in the unsafe part of the language for a bit.
But my main point is that HideousKojima's "statically-typed" solution would result in a runtime type error if it was given unexpected input, just like a dynamically typed solution.
> But my main point is that HideousKojima's "statically-typed" solution would result in a runtime type error if it was given unexpected input, just like a dynamically typed solution.
I don't think HideousKojima ever called it a "statically-typed solution". Their point was that statically-typed languages still let you write unchecked code when you want to - and yes, of course such unchecked code can fail at runtime - but give you the option of having checking in the cases where you want it.
    let foo: serde_json::Value = serde_json::from_str(arbitraryJsonString)?;
There, just as powerful [1]. But you know what's even more powerful? After you've done your dynamic checks, you can do this on the entire JSON tree, or on a subtree:
    let bar: MyStaticType = serde_json::from_value(foo)?;
and you get a fully parsed instance of a static type, with all the guarantees and performance benefits that entails.
There's something to this. I love feature-full type systems, but I've seen engineers try to parse JSON the "right" way in Scala, get frustrated, and blame the entire concept of statically typed languages. Elm manages to make this user-friendly, so perhaps it's "only" a matter of compiler messages and API design?
Structure is a tool. Like any tool, it can be misused or overused.
For anything even remotely production-y I'll always prefer explicitly parsing JSON into a known structure, but there's a lot of value in in being able to do some exploratory scripting without those constraints.
Not necessarily. There is a bottom-up school of thought that encourages people to noodle around, construct primitives by playing in the domain, and then interactively compose those primitives into larger and larger systems.
Yeah, that's true. It's a judgment call for sure, but I've always found that angle on things to be self-subversive. The best programmers I know are all bottom-up learners, not top-down.
`unknown` is a top type in the TS type system. It serves the very important role of "here you need to apply some pattern matching and validation" so that you can then continue working in a type-safe environment. TS has a lot of facilities that help you with this (the usual narrowing things, typeof and instanceof guards, control flow analysis, and at the end of the list the big guns: type predicates, which basically allow you to wrap these checks and "casts" in nice reusable functions).
There are also recursive types that help you model JSON, but knowing that a value is an arbitrarily deep nested map/list of maps/lists and numbers and bools and strings, mixed like a Bloody Mary cocktail, doesn't really help :)
With NestJS it's very easy to add decorators/annotations to fields of a class, and the framework handles validation (throws HTTP 422 with nice descriptions of what failed) and then in your controller you can again work in a type safe environment.
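For example, a type predicate that narrows `unknown` (the `User` shape is invented; the `x is T` mechanism is standard TypeScript):

    interface User {
      id: number;
      name: string;
    }

    // If this returns true, the compiler narrows the argument to User.
    function isUser(x: unknown): x is User {
      return (
        typeof x === "object" && x !== null &&
        typeof (x as { id?: unknown }).id === "number" &&
        typeof (x as { name?: unknown }).name === "string"
      );
    }

    const payload: unknown = JSON.parse('{"id": 1, "name": "Ada"}');
    if (isUser(payload)) {
      payload.name.toUpperCase(); // fully type-safe from here on
    }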
As soon as you try to do anything useful to foo, it's not arbitrary anymore. You have to make some kind of assumption about the underlying type: check for keys, nulls, maybe it's a number (the right number?), maybe it's a list. So now you have to scatter boilerplate checks everywhere you touch a part of foo.
If you could parse it into a typed structure up front, you'd only have to deal with this in one spot, and have guarantees for everything else that follows.
Bonus: if your typed language has good support for records, you can even do this in a way that only provides structure to the parts you care about, and is robust to changes to any other parts of the json.
So there's no docstring? And the actual variables are that random and indecipherable?
Sounds like the problem is that you're tasked with looking at code written by someone who is either inexperienced or fundamentally careless. When dealing with reasonably maintained codebases, this kind of situation would seem pretty rare. In modern python we now have type hints of course, which have helped quite a lot.
In other words, in Python you have to rely on your colleagues manually writing documentation, and if they don't, you're out of luck: they're 'bad developers' and potentially the whole product is affected.
In static languages this simply isn't a problem. Types are checked for consistency at compile time and you don't have to rely on people toiling on this busy work.
Not to say documentation isn't necessary, or good, but it shouldn't be something you need just to create working programs, where otherwise no one knows wtf any variable is without running the program.
No, actually I advocate static typing approaches for precisely the reasons you give. As does Python, since years and years ago.
I'm just saying that the core problem it solves -- "bad developers" -- is going to bite you no matter what (if not addressed at the source). And the supposed magical efficacy of measures designed to protect against it is somewhat overstated.
What I've found is even worse: on medium-size personal projects I inevitably pass the wrong type to a function (or refactor and forget to change a call somewhere). Generally it does not fail until somewhere else, which relies on it being a certain type. So even though I know what my functions should take, I still spend a bunch of time tracking down the problems, which involves printing out a lot of stuff because the error happened outside the current call stack. This is something a static type system would just prevent for me, and I've basically decided that anything beyond really simple stuff is actually faster for me to write in a statically typed language. (Examples in the past are a parser for a static site generator with a mostly Turing-complete language, and 3D model generators.)
They're probably laughing because a) you're suggesting manually doing the work static typing does in a dynamic language, because it's untenable not to for large projects, and b) you can't easily add type hints to other people's libraries.
No - (a) is not what I'm suggesting. And (b) while disappointing, just doesn't slow one's work down very frequently in daily practice.
Look, I just don't buy the suggestion that static typing magically solves a huge set of problems (or that it does so without imposing negative tradeoffs of its own -- the very topic of the original article). Or that dynamic languages are plainly crippled, and that one has to be a kind of a simpleton not to see this obvious fact.
> just doesn't slow one's work down very frequently in daily practice.
Well, maybe you don't feel it slows you down, but it is manual work you must do to get a reliable product, only because of dynamic typing. Not only that, but you then have to refer to these docs to check you're not creating a type calamity at some nebulous point down the run time road. Static languages just won't let you mess that up, and often have utilities to generate this documentation for you at no effort.
> I just don't buy the suggestion that static typing magically solves a huge set of problems
Static typing really does "magically" solve a whole class of problems without any negative tradeoffs, assuming the language has decent type inference.
Not all problems, but a specific class of them that you should do extra work to guard against in dynamic languages. Whether that is extra documentation that has to be reliably updated and checked, or run time code to check the types are what you expect at the point of use.
Take for example JavaScript, where typing is not only dynamic, but weak. Numbers in particular can be quite spicy when mixed with strings as I'm sure you know. Strong, static typing forces you to be explicit in these cases and so removes this problem entirely.
By the way, no one's saying anyone is a simpleton. The reality is our field is wide and varied, and different experiences are valid.
Dynamic languages can do some things that static languages can't. For example, you can return completely different types from different execution paths in the same function.
This has been something that has confused me when reading Python, but it does make things easier for stuff like tree parsing. In a static language you need to specify some variant mechanism that knows all the data possibilities ahead of time to allow this. From my perspective the dynamic typing trade-off isn't worth these bits of 'free' run time flexibility, but YMMV! It really depends what arena you're working in and what you're doing.
I think I've said about 3 times in this thread that I'm firmly in the "pro" camp as regards the net positives of static typing, and for precisely the reasons you've detailed. I just don't see its absence (or instances where it's less than algebraically perfect) as the crippling dealbreaker that others seem to regard it as.
What I was referring to as "not slowing one's work down very frequently" was the corner case situation that someone brought up 4 comments above yours, which (in their view) renders the general type checking capabilities of Python >= 3.5 moot. I don't buy that logic, but that was their argument, not yours.
But to shift gears: if there are languages besides JS that you feel get their type system "just right", I'd be curious as to what they are, for the benefit of that future moment when I have the luxury of time to think about these things more.
Put back into context, your reply makes sense, as these popular libraries are pretty battle-tested. Having said that, it is a valid point that type hints being voluntary means they can only be relied upon with disciplined developers and for code you control. Of course, the same point could be made for any code you can't control, especially if the library is written in a weakly typed language like C (or JS).
> I just don't see its absence as the crippling dealbreaker
My genuine question would be: what does dynamic typing offer over static typing? Less verbosity would be my expectation, but that only really seems to apply without type inference. The other advantage often mentioned is that it's faster to iterate. Neither seems particularly compelling (or even true) to me, but I'm probably biased, as I've spent all of my career working with static typing, aside from a few projects with Python and JS.
> if there are languages besides JS that you feel get their type system "just right", I'd be curious as to what they are
This is use case dependent, of course. Personally I get on well with Nim's (https://nim-lang.org/) type system: https://nim-lang.org/docs/manual.html#types. It's certainly not perfect, but it lets me write code that evokes a similar 'pseudocode' feel to Python and gets out of my way, whilst being compile-time checked and very strict (the C-like run time performance doesn't hurt, too). It can be written much as you'd write type-hinted Python, but its strictness is sensible.
For example, you can write `var a = 1.5; a += 1` because `1` can be implicitly converted to a float here, but `var a = 1; a += 1.5` won't compile because int and float aren't directly compatible - you'd need to type cast with something like `a += int(1.5)`, which makes it obvious something weird is happening.
Similarly `let a = 1; let b: uint = a` will not compile because `int` and `uint` aren't compatible (you'd need to use `uint(a)`). You can however write `let b: uint = 1` as the type can be implicitly converted. You can see/play with this online here: https://play.nim-lang.org/#ix=3MRD
This kind of strict typing can save a lot of head scratching issues if you're doing low level work, but it also just validates what you're doing is sensible without the cognitive overhead or syntactic noise that comes from something like Rust (Nim uses implicit lifetimes for performance and threading, rather than as a constraint).
Compared to Python, Nim won't let you silently overwrite things by redefining them, and raises a compile time error if two functions with the same name ambiguously use the same types. However, it has function overloading based on types, which helps in writing statically checked APIs that are type driven rather than name driven.
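For instance, a quick sketch of that type-driven overloading (names made up):

    proc describe(x: int): string = "int: " & $x
    proc describe(x: string): string = "string: " & x

    echo describe(42)    # Overload resolved by type at compile time.
    echo describe("hi")  # A second `describe(x: int)` would be a redefinition error.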
One of my favourite features is distinct types, which allow you to model different things that are all the same underlying type:
    type
      DataId = distinct int
      KG = distinct int

      Data = object
        age: Natural  # Natural is a positive-only integer.
        weight: KG

    var data: seq[Data]

    proc newData: DataId =
      data.setLen data.len + 1
      DataId(data.high)  # Return the new index as our distinct type.

    proc update(id: DataId, age: Natural, weight: KG) =
      data[id.int] = Data(age: age, weight: weight)

    let id = newData()
    id.update(50, 50.KG)  # Works.
    50.update(50, 50.KG)  # Type mismatch: got int but expected DataId.
    id.update(50, 50)     # Type mismatch: got int but expected KG.
    id += 1               # Type mismatch: += isn't defined for DataId.
As you can imagine, this can prevent a lot of easy-to-make accidents, but it also enriches simple integers to serve other purposes. In the case of modelling currencies (e.g., https://nim-lang.org/docs/manual.html#types-distinct-type) it can prevent costly mistakes, and you can `distinct` any type - see the sketch below. Beyond that there's structural generics, typeclasses, metaprogramming, and all that good stuff. All this to say, personally I value strict static typing, but don't like boilerplate. IMHO, typing should give you more modelling options whilst checking your work for you, without getting in your way.
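A minimal sketch along the lines of the manual's currency example, using the `{.borrow.}` pragma to selectively reuse float's arithmetic:

    type Dollars = distinct float
    proc `+`(a, b: Dollars): Dollars {.borrow.}  # Reuse float's `+` for Dollars.

    var d = 20.0.Dollars
    d = d + 1.5.Dollars  # Fine: both sides are Dollars.
    # d = d + 1.5        # Compile error: a plain float isn't Dollars.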
It "can", but it's a design decision not to by default because mixing `uint` and `int` is usually a bad idea.
This is telling the compiler you want to add an `int` that represents (say) 63 bits of data plus a sign bit to a `uint` that doesn't have a sign bit. If `a = -1` then `b = uint(a)` leaves `b == 18446744073709551615`. Is that expected? Is it a bad idea? Yes. So the explicit casting is "getting in your way" deliberately so you don't make these mistakes. If `a` is a `uint`, it can't be set to `-1`, and adding two `uint` values is freely allowed.
Incidentally, there are other reasons to avoid `uint` too: unsigned integers wrap around on overflow, whereas signed integers raise overflow errors. The freedom to mix types like this is why languages like C have so many footguns.
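A minimal sketch of that difference (assuming default overflow checks, which are on in debug builds):

    var u = 0'u64
    u -= 1           # Unsigned: silently wraps to 18446744073709551615.

    var i = high(int64)
    # i += 1         # Signed: raises OverflowDefect rather than wrapping.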
In short, explicit is better than implicit when data semantics are different. When the semantics are the same, like with two `int` values, there's no need to do this extra step.
You could create a converter to automatically convert between these types, but you should know what you're doing; the compiler is trying to save you from surprises. For `int`/`float`, there is the lenientops module: https://nim-lang.org/docs/lenientops.html. This has to be deliberately imported so you're making a conscious choice to allow mixing these types.
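For completeness, a sketch of such a converter (the name is mine):

    converter toUint(x: int): uint = uint(x)

    let a = 1
    let b: uint = a  # Now compiles: the converter is applied implicitly.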
> don't you get tired of typing (and reading) `uint` twice in the latter setting?
Well, no, because I wouldn't be writing this code. This example is purely to show how the type system lets you write Pythonesque code with inferred typing for sensible things, and ensures you're explicit for less sensible things.
For just `int`, there's no need to coerce types:
    var
      a = 1
      b = a + 2
      intro = "My name is "
      name = "Foo"
      greeting = ""

    b *= 10

    # Error: type mismatch: can't concatenate a string with the `b` int.
    # greeting = intro & name & " and I am " & b & " years old"

    # The `$` operator converts the `b` int to a string.
    greeting = intro & name & " and I am " & $b & " years old"

    # If we wanted, we could allow this with a proc:
    proc `&`(s: string, b: int): string = s & $b

    # Now this works.
    greeting = intro & name & " and I am " & b & " years old"

    echo greeting  # "My name is Foo and I am 30 years old"

    # Normally, however, we'd probably use the built-in strformat.
    # Incidentally, this is similar to the printf macro mentioned in the article.
    import strformat
    echo &"My name is {name} and I am {b} years old"
We don't want to automatically convert between `int` and `float` because there's a loss of information, since floats can't represent all integers exactly.
However, we don't need to specify types until the point of conversion:
    let a = 1
    let b = a.float
> Python's behavior (though correct to spec) is arguably worse
Yeah, that is not ideal. Looking at the code, it seems logical at first glance to expect that `b` would be a `float`. In this case, the type hints are deceptive. Still, it's not as bad as JavaScript, which doesn't even have a separate integer type! Just in case you haven't seen this classic: https://www.destroyallsoftware.com/talks/wat
Python takes a very non-obvious position on this from my perspective.
Ultimately, all these things are about the balance of correctness versus productivity.
I don't want to be writing types everywhere when it's "obvious" to me what's going on, yet I want my idea of obvious confirmed by the language. At the other end of the scale, I don't want to have to annotate the lifetime of every bit of memory to formally prove some single-use script. The vast majority of the time a GC is fine, but there are times I want to manually manage things without it being a huge burden.
Python makes a few choices that seem to be good for productivity but end up making things more complicated as projects grow. For me, being able to redefine variables in the same scope is an example of ease of use at the cost of clarity. Another is having to be careful of not only what you import but the order you import it in, since, rather than raising an ambiguity error, the language just silently overwrites function definitions.
Having said that, as you mention, good development practices defend against these issues. It's not a bad language. Personally, after many years of experience with Nim I can't really think of any technical reason to use Python when I get the same immediate productivity combined with static type checking and performance comparable to Rust and C++ (also no GIL). Plus the language compiles to C, C++, ObjC and JavaScript, so I can use libraries in those languages directly, use the same language for frontend and backend, and (excluding JS) get small, self-contained executables that are easily distributable - another unfortunate pain point with Python.
For everything else, I can directly use Python from Nim and vice versa with Nimpy: https://github.com/yglukhov/nimpy. This is particularly useful if you have some slow Python code bottlenecking production, since the similar syntax makes it relatively straightforward to port over and use the resulting compiled module within the larger Python code base.
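As a rough sketch of the Nim-to-Python direction (the fib example and file names are mine; see the Nimpy README for the exact build flags):

    # fib.nim - build with something like: nim c --app:lib --out:fib.so fib.nim
    import nimpy

    proc fib(n: int): int {.exportpy.} =
      if n < 2: n
      else: fib(n - 1) + fib(n - 2)

    # Then, from Python:
    #   import fib
    #   print(fib.fib(30))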
Perhaps ironically, as it stands the most compelling reason not to use Nim isn't technical: it's that it's not a well known language yet, so it can be a hard sell to employers who want a) to hire developers with experience from a large pool, and b) to know that a language is well supported and tested. Luckily, it's fairly quick to onboard people thanks to the familiar syntax, and the multiple compile targets make it able to utilise the C/C++/Python ecosystems natively. Arguably the smaller community means companies can have more influence and steer language development. Still, this is, in my experience, a not insignificant issue, at least for the time being.
> I just don't buy the suggestion that static typing magically solves a huge set of problems
    // compiles and runs, but does bad things
    function foo(x, y) {
      someDangerousEffect();
      return x + y;
    }

    -- does not compile; huge sets of problems magically solved
    foo :: Int -> Int -> Int
    foo x y = someDangerousEffect >> pure $ x + y
Our main codebase is in PHP, but we enforce type hinting for all new code, so in practice it feels more like a static language at this point. However, there are chunks of old code without type hinting or types in PHPDocs. Whenever I have to deal with that code (especially if it's unfamiliar code), my productivity decreases considerably: I have to click through many layers of functions to trace the data flow and understand what the implicit contract of a function is.

In static languages, all you need to care about is the contract; the rest is implementation details. In the dynamic portions of the codebase there's just too much cognitive load, because I have to look at implementation details to understand it. PHPDocs and dynamic checks seem to be pretty error-prone, because a dev often forgets to update the annotations/type checks along with the code (and type checks are often ad hoc and random), leading to even more cognitive load. And having static analyzers in the pipeline to keep some control of the situation leads to longer build times, so in the end it feels like PHP builds slower than all our Go projects combined.
When conventions are followed they do exactly that, actually. And unless the code is a complete trainwreck, it's pretty easy to tell what the return type is, even without explicit annotation in the docstring.
PEP 257, and local conventions in certain projects.
What it all comes down to is: if you really have people on your project writing code like the foo/bar/baz example way up above, then you have problems way bigger than static type checks can possibly help you with.
Wouldn't it be something if there existed tooling that enforced this level of discipline and checked its validity before executing any code, so that you didn't have to rely on the entire ecosystem adhering to the same standards, removing that as a source of ambiguity...
> I can't use my editor/IDE to "go to definition" of bar/baz
I use "Find Usages" on foo to see where it is used. Once you see where it is used, you know what the types can be. It's not great, but it's also something that can be progressively remedied.
In the event that the function is not as trivial as your example suggests, the author should have written a docstring to help you understand what it is trying to do, in addition to type annotations that will make it more readable.
In this example, bar can be a function, a class, or any object with __call__(), so the type information is less important in this case than actual docstrings that express intent.
My worst developer experience ever was trying to make changes to a custom build system with functions exactly like that. Injector or decorator pattern, or whatever it's called.
I just gave up, introduced globals, and used them as flags in some places in the code.
To make things even more fun, big parts of the system were written in Python 2.7, called from runtime-generated bat files by Python 3.4 - remnants of a rewrite abandoned when the consultant's funding was cut.
But also, there's nothing stopping the code from being much clearer about its intention than this weirdly contrived example (I have a lot of code, and most of the function names are pretty unique). And surely you'd want to search for 'foo(' to find invocations.