
Type systems and the future of programming languages - lewisjoe
https://docs.google.com/document/d/1-aLUwnN0XzLbzICnFLWfCLuD4ULYXGcKAoyRAqTuAIY/edit#heading=h.bxiqdlerl36p
======
pcwalton
We aren't going to move to dependently-typed languages based around theorem
proving anytime soon. The reason is simple: those languages don't support
normal industry development practices, where you don't have time to spend 10x
the time on verification as you do on development. (They _do_ support
development practices favored by the US Department of Defense, which is
perhaps part of the reason why research into such type systems attracts DARPA
grants.)

Where such languages may have a future is for proving some kernel of core
infrastructure code correct, where the code has such a high value to size
ratio that spending the time to prove it correct may cross the line into
economic viability. seL4 is a high-profile example of this. Another example
that's been on my mind over the years would be proving the unsafe code in the
Rust standard library correct. The latter would be interesting because it
would address the criticism of "well, there's unsafe code in libstd, how can
you say Rust is safe?" without burdening regular Rust users (not standard
library developers) with all the baggage of a complex type system.

~~~
touisteur
Even without going to dependent types, the type system and verification
capabilities of the SPARK language & toolset already allow for use in
mission-critical (and not only safety-critical) domains.

The tech is progressive, so you can go first for 'correct' data flow (no
access to uninitialized variables, dataflow constraints like Depends and
Global -- very interesting stuff already), then for proof of the absence of
runtime errors, then for specific properties (think properties like 'this
should never happen', or properties you'd write when doing property-based
testing), up to full functional proof -- on a procedure-by-procedure basis.

The progress that's been made with support of floating point is amazing.
Things that were hard 3 years ago now get proved without help. Pointers are
now supported via a Rust-like mechanism.

Nvidia seems to believe in the tech (well, at least for Ada) for some of its
new firmware. I've seen pure algorithmic code fully proven with effort
similar to what testing would have taken, /AND/ you always find bugs... People
who have written some SPARK then start thinking and designing for proof. Code
gets simpler, design clearer, loops more evident; added contracts help in
understanding the interface and clear the mind while reading code (like early
returns would have done before).

I don't work for AdaCore or Altran UK, but I think there are amazing minds at
work here.

~~~
pcwalton
Maybe some folks are experimenting with it, but I would be shocked if SPARK
achieves a market share of 1%. Or even 0.1%.

~~~
touisteur
I think it's a bootstrapping problem, similar to Rust adoption. People are now
getting enthusiastic about a language with lots of static guarantees and a
difficult-to-please compiler, and I keep reading about people almost jumping
to formal proof/verification ('now that I've made all this effort and I get
all those static guarantees, why not go the extra static verification mile?').

I think the Rust formal-methods community should look at what's already been
done with SPARK 2014 (Ada) and Frama-C. A large part is an adaptation of the
language for proof (adding /executable/ contracts, some quantifiers, loop
variants/invariants, expression functions), and another big part of the work
is in Why3 and all the SMT tech behind it, so it can be reused 'for free'.

Sure, not even 0.1% market share today, and I'm not even sure Ada is above
that. But it's there, it's already mature, progress is very fast, and it can
be tried out for free without 'asking for a quote'. Same for Frama-C.

Sure it's a paradigm shift, but it's not an academic tool with an obscure UI
and no retargetable skills.

------
abeppu
This author predicts that we'll drift towards languages like Agda or Idris,
and also discusses how type systems came out of math. I'm curious about two
related questions:

1\. Why is it that only a minority of math is done with proof systems (i.e.,
yielding automatically checkable artifacts)? It's burdensome to formalize much
of the math we're interested in, even when we're confident in that math.
What's the nature of that burden, and how could tools improve to lessen it?

2\. Are the difficulties which prevent a lot of math from being done with
these systems the same as the difficulties which prevent programmers from
using them? There are many invariants that we can easily know about a program,
but representing the lot of them in types would be onerous. How do we make
that easier/lighter?

~~~
zozbot234
> Why is it that a minority of math is done with proof systems (i.e. yielding
> automatically checkable artifacts)?

Because most interesting math depends on a lot of other math, and formalizing
all of these prerequisites is a significant burden. The math subfields where
this is less of a problem (so-called 'synthetic' mathematics) are also where
formalization is being used the most.

~~~
Smaug123
And additionally it takes a _lot_ of effort to formalise a proof. (I say this
as someone who has spent nearly two years formalising basic maths in Agda.)
It's true in maths as in programming: 90% of the work is in the last 10%, and
if you don't get the last 10%, you haven't got a proof.

~~~
zozbot234
The usual rule of thumb is that 1 page of paper proof leads to 4 pages of
"proper" formal proof. Though this may vary depending on how capable the proof
assistant is - AIUI, Agda requires you to write out complete proof terms,
whereas other systems may be more capable.

Sometimes you can use a kind of "reflection" that makes many proofs trivial,
simply by defining a decision procedure that's proven to be correct in the
general case, and having the system perform the appropriate computations. This
is how you would deal, e.g., with trivial simplifications in elementary
algebra (which can usually be proven valid in any ring, field, etc.).
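The "reflection" pattern can be sketched outside a proof assistant too. Below is a hypothetical Python miniature (single variable, ring expressions as nested tuples): one normalization procedure decides any equality of this shape, so each individual identity costs one computation rather than one manual proof. In a real system (Coq's `ring` tactic, Agda's reflection) the normalizer itself would come with a correctness proof.

```python
def normalize(e):
    """Normalize an expression over one variable "x" to {exponent: coefficient}.
    Expression forms: ("x",), ("const", c), ("add", a, b), ("mul", a, b)."""
    if e[0] == "x":
        return {1: 1}
    if e[0] == "const":
        return {0: e[1]} if e[1] != 0 else {}
    if e[0] == "add":
        out = dict(normalize(e[1]))
        for exp, coef in normalize(e[2]).items():
            out[exp] = out.get(exp, 0) + coef
            if out[exp] == 0:
                del out[exp]
        return out
    if e[0] == "mul":
        out = {}
        for ea, ca in normalize(e[1]).items():
            for eb, cb in normalize(e[2]).items():
                out[ea + eb] = out.get(ea + eb, 0) + ca * cb
        return {exp: coef for exp, coef in out.items() if coef != 0}
    raise ValueError(f"unknown expression: {e!r}")

def decide_equal(e1, e2):
    # one decision procedure, reused for every identity of this shape
    return normalize(e1) == normalize(e2)

# (x + 1) * (x - 1) == x*x - 1, decided by computation rather than manual proof
x = ("x",)
lhs = ("mul", ("add", x, ("const", 1)), ("add", x, ("const", -1)))
rhs = ("add", ("mul", x, x), ("const", -1))
```

Here `decide_equal(lhs, rhs)` settles the identity by normalizing both sides to the same canonical polynomial.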

~~~
Smaug123
Correct, Agda has this notion of reflection. The way to do "tactics" in Agda
is simply to manipulate expressions, and there's no meta-language in which to
do this: it's just Agda all the way down. It's neat, but not easy.

------
mbrock
It is quite strange how people assume and spread the notion that types are the
best or the only place to put propositions. Why ignore the option of
specifying propositions as separate invariants or postconditions, verified by
proof, model checking, or testing? Why this focus on Curry-Howard, dependent
types, and type theory?

~~~
zozbot234
> Why ignore the option of specifying propositions as separate invariants or
> postconditions

These are usually called 'contracts'. But there's not much of a distinction
between contracts and refinement types.
------
crimsonalucard
Lean is another popular language in this area.

There are huge benefits to moving the industry in this direction. Formally
correct programs eliminate the need for testing almost completely and could
cut testing infrastructure down by possibly 95%. A program can use dependent
typing to prove itself 100% correct, vs. 100 unit tests, which prove only 100
arbitrary test cases correct. Unit tests are statistical experiments in which
people hope that the verification of 100 arbitrary test cases correlates with
the entire program being correct.
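A toy illustration of that statistical point (the buggy `square` below is invented for the example): the bug sits outside the sampled inputs, so every unit test passes while the program is wrong.

```python
def square(x):
    # deliberately wrong for x >= 100 -- but none of the tests below look there
    return x * x if x < 100 else x * x + 1

# three "arbitrary test cases", unit-test style: all pass
for arg, expected in [(2, 4), (7, 49), (12, 144)]:
    assert square(arg) == expected

# a proof would have to cover *every* input; the first counterexample is
# square(100), which returns 10001 instead of 10000
```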

Yes, proofs are harder to write than tests, and yes, there can be bugs in your
proof (there can be bugs in tests too). However, I genuinely believe that over
the lifetime of an application, in general, the net benefits of dependent
types are positive and greater than those provided by testing.

However I do not believe the industry will move in this direction.

The reason why the industry won't move in this direction is largely cultural
and intellectual. It is harder to learn how to use dependently typed languages
and harder to learn how to write a proof of correctness than it is to learn
how to write 30 unit tests.

Additionally, you just need to look at how the industry changes to see that it
does not trend towards "better" technologies for abstraction. HTML, CSS,
JavaScript, Java, and SQL are all awkward/imperfect technologies that dominate
the industry for reasons other than technical prowess. Culture dictates their
dominance, as it will dictate the language of the future.

You even get technology that culturally moved backwards simply because people
just don't get the importance of static typing... Python, PHP, and JavaScript
all came out after typed languages were popular, and all are used for large
applications where typing would otherwise be very important.

Trends like this, and the lack of awareness of even algebraic data types, tell
me that dependent typing is even more unlikely to become popular.

~~~
tsimionescu
> Such programs that are formally correct eliminates the need for testing
> almost completely and can cut such infrastructure down by possibly 95%

I think this is an absurdly high number. The difficulty of proving the kind of
rich properties that integration and system-level testing can easily cover is
mind-boggling, and also requires huge amounts of infrastructure that doesn't
exist and is not feasible for a single team to build.

Think of an end-to-end test for a traffic generator application: you simulate
a click in a web UI, start a tcpdump on a network interface, wait for a few
minutes, and sample the capture for some expected packets.

How much dependently-typed infrastructure would you need to get even the small
level of assurance that this test gives? How much of networking protocols and
OS APIs would you have to formally specify to even start proving that your
program adheres to them? How much of the DOM and JS APIs would you have to
formally specify in order to prove that the button is going to be visible on
the screen and that clicking it will produce the right HTTPS calls to the
backend?

I am reasonably sure that dependent types will continue to get better and will
help with developing certain kinds of abstract software components. But I
believe that it would take decades before they can become a tool that could be
used to prove a realistic large scale program to the point that you can throw
away 95% of the testing done today.

~~~
crimsonalucard
Nah, I'm going by the testing pyramid. Basically, by 95% I mean all unit
tests, which under the testing pyramid are the majority. If you follow a
different philosophy (which is valid and perfectly OK, imo) and have more
integration tests than unit tests, then the 95% marker does not apply.

Integration tests, end-to-end tests, any test that touches IO, and tests that
measure performance are outside the purview of proofs. Proofs only verify pure
logic.

Having a full formal model of an end to end system that can be verified with
proofs is, I completely agree, really far away.

If 95% of your tests are unit tests then my statements apply.

------
smitty1e
What I'd like to locate/implement would be:

\- an in-memory SQLite database with

\- a sufficiently generalized schema that could

\- ingest the static type information emitted by, say, GCC, so that

\- one could easily inspect/compare metadata from any language.

I doubt that this idea is unique to me.

~~~
Boulth
Sounds like Language Server:
[https://langserver.org/](https://langserver.org/)

~~~
smitty1e
Same idea carried further to include data types and objects that actual
programs in the languages use, not just the EBNF grammar constructs of the
languages.

Unless I missed that in my quick glance at the link.

~~~
Boulth
Yep, LSP does include a semantic view of the code. That is, anything you'd see
in end tools such as IDEs: types, members, functions, properties within
context (e.g., this function is a member of this class).

------
karmakaze
What's the purpose of posting this as a Google Doc instead of HTML? The
visitors' Google account info could potentially be collected.

~~~
3pt14159
Not really anymore. It's now anonymized by default.

~~~
nixpulvis
Color me skeptical. Plus it breaks the "visited" styling on HN, and generally
pisses me off for throwing me out of my browser.

<angry old man waves fist at cloud />

------
lidHanteyk
I'd like to joke that we could replace the link with [0] and it would be a
massive improvement. Seriously, though, these descriptions of type-theoretic
programming as a solution to a problem, rather than a tower to be climbed, are
increasingly obscurantist, hiding the meat of the correspondence behind
analogies and narratives.

[0]
[https://ncatlab.org/nlab/show/computational+trinitarianism](https://ncatlab.org/nlab/show/computational+trinitarianism)

~~~
carapace
> hiding the meat of the correspondence behind analogies and narratives.

Interesting. Could you elaborate a little?

~~~
lidHanteyk
Sure. Any time somebody says "Curry-Howard" and doesn't provide the table that
I provided, they are handwaving. The table makes the correspondence precise.

~~~
carapace
Cheers. When you say "the correspondence behind analogies and narratives",
which column is analogy and which is narrative? I hope I'm not being really
dense or ignorant here, apologies if so.

~~~
lidHanteyk
The page I linked is relatively clear. I was complaining about the original
article, which fails to make contact with the actual guts of formal type
theory.

~~~
carapace
Right, I get that. But the words "analogy" and "narrative" don't appear in the
page. I'm wondering what you mean by those words?

~~~
lidHanteyk
For fuck's sake.

> For instance, it's impossible to write an executable program in Java that
> adds a string to a number.

Wrong [0] and misleading and not a reasonable summary of type-driven
development. The article goes rapidly downhill from there. It tastes like an
undergrad who has just learned about the very basics of things and is still
trying to figure out how to put them all together.

> So it is possible to statically verify that a function always returns the
> square of its input, it just needs a proof (aka its type).

This is not sufficiently convincing. What exactly was that type, and which
language was it written in? The type "int -> int" is not at all enough. This
sort of handwaving logic seems to come up constantly with folks new to type
theory. And each time it comes up, it seems that a massive argument must
happen [1] in which folks who actually understand type theory have to remind
folks that no, type systems do not automatically save us.
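For concreteness, here is one shape such a specification could take -- a hypothetical Lean 4 sketch (not from the article) in which the return type itself carries the proof obligation; "int -> int" says nothing of the sort:

```lean
-- A dependent/refinement-style type: the result is not just a Nat but a Nat
-- packaged with a proof that it equals n * n.
def square (n : Nat) : { m : Nat // m = n * n } :=
  ⟨n * n, rfl⟩  -- here the proof obligation happens to be discharged by reflexivity
```

For a less trivial implementation (say, one using repeated addition), the proof component would no longer be `rfl` and would have to be constructed by hand -- which is exactly the cost being glossed over.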

Here are two analogies in the article. "I'm lying" is a reference to a famous
paradox [2], and the barber story is a truncated part of an older set of
stories told by Russell, Gardner, Smullyan, and others [3]. All fine and well.
However, they then try to claim that types somehow fix the underlying
paradoxes, but of course, they don't. This is because the underlying cause of
the paradoxes is Gödel's first incompleteness theorem, which applies even in
statically-typed environments [4]. Worse, the article's author makes it sound
like incompleteness is some barrier to be climbed over, when in fact it is a
basic (if tough-to-prove) categorical property [5].

[0] [https://hackernoon.com/java-is-unsound-28c84cb2b3f](https://hackernoon.com/java-is-unsound-28c84cb2b3f)

[1]
[https://lobste.rs/s/yyhu4w/real_problems_with_functional_lan...](https://lobste.rs/s/yyhu4w/real_problems_with_functional_languages#c_4pbkan)

[2]
[https://en.wikipedia.org/wiki/Liar_paradox](https://en.wikipedia.org/wiki/Liar_paradox)

[3]
[https://en.wikipedia.org/wiki/Barber_paradox](https://en.wikipedia.org/wiki/Barber_paradox)

[4] [http://r6.ca/Goedel/goedel1.html](http://r6.ca/Goedel/goedel1.html)

[5]
[https://ncatlab.org/nlab/show/Lawvere%27s+fixed+point+theore...](https://ncatlab.org/nlab/show/Lawvere%27s+fixed+point+theorem)

~~~
carapace
... I just realized that, when you wrote:

> hiding the meat of the correspondence _behind_ analogies and narratives.

I misread:

> hiding the meat of the correspondence _between_ analogies and narratives.

...and I thought you might have been pointing to some potentially very
interesting deep connection or something.

Sorry for the noise.

------
malkia
The webpage for Agda seems to be
[https://github.com/agda/agda](https://github.com/agda/agda) or
[https://wiki.portal.chalmers.se/agda/pmwiki.php](https://wiki.portal.chalmers.se/agda/pmwiki.php),
but not [https://www.agda.com.au/](https://www.agda.com.au/) (end of the
paper).

~~~
lewisjoe
Thanks. Fixed the links.

------
ngcc_hk
I still think that dynamic languages with dynamic types and messaging have a
future. Objective-C, Lisp, and JavaScript have some of this. Not all will die
out.

~~~
zozbot234
"Dynamic types" is a misnomer; these are runtime _tags_ in a system of
"tagged" values endowed with a _single_ static type. Types apply statically to
program expressions, not runtime values. "Messaging" similarly represents the
input to a dispatch step -- usually one that introduces non-trivial pitfalls if
you want to ensure that your code keeps making sense as it evolves (see
"fragile base class" and the like).
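The "single static type of tagged values" view can be made concrete with a sketch of how a dynamic-language runtime might represent values (hypothetical layout, Python used as the implementation language):

```python
# Every "dynamically typed" value has the same static shape: a tag plus a payload.
from dataclasses import dataclass
from typing import Any

@dataclass
class Value:
    tag: str       # "int", "str", ... -- inspected at runtime, not compile time
    payload: Any

def add(a: Value, b: Value) -> Value:
    # the "dynamic type check" is ordinary dispatch on runtime tags
    if a.tag == "int" and b.tag == "int":
        return Value("int", a.payload + b.payload)
    if a.tag == "str" and b.tag == "str":
        return Value("str", a.payload + b.payload)
    raise TypeError(f"cannot add {a.tag} and {b.tag}")
```

From the implementation's point of view there is exactly one (static) type of value here, `Value`; the "types" the surface language talks about are its runtime tags.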

Anyway, dependently-typed languages support these patterns _better_ than most
other static languages do, since static types can trivially be reified as
runtime-tagged data in such languages, while keeping the ordinary benefits of
static typing for most of the code.

------
cryptica
I don't think it's accurate to compare type systems and mathematics. When you
write out a mathematical equation, you don't explicitly specify whether a
variable is a float or an integer or a set or a vector or a matrix as part of
the equation; the reader infers the type of each variable from the problem
domain. So in fact, math is much more like dynamically typed languages.

>> We know types help us eliminate certain classes of errors from programs.

Yes, and in my experience they often also introduce new kinds of errors:
architectural errors (which are much worse). If you have a system which makes
it easier for developers to pass complex instances around across multiple
source files, they will use that feature, and it will often lead to modules
with lower cohesion and tighter coupling, which adds complexity and makes it
harder to modify and maintain the logic in the long run. Rigid type systems
also add complexity when integrating with third-party modules, which may use
different type names for similar concepts.

The wisdom of dynamically typed languages is precisely that it is difficult to
know what the type of each variable is, so it forces you to follow the logic
around (which is unpleasant but necessary). What is almost always overlooked
is the human psychological effect: the unpleasantness involved in trying to
keep a mental picture of the logic creates a strong incentive for developers
to keep the logic as simple and modular/encapsulated as possible. This results
in better software design/architecture overall.

I'm saying this based on decades of experience having gone back and forth
between dynamically and statically typed languages. Programming is as much
about human psychology as it is about logic. Dynamically typed languages force
developers to be more disciplined and this mindset is extremely valuable.

Statically typed languages tend to put developers on auto-pilot. You get so
caught up in types and catering to the compiler's warnings that you lose some
common sense. Coding in a statically typed language feels a bit like having a
boss who micromanages you and tells you every little detail you need to
implement. Coding in a dynamically typed language is more like having a boss
who tells you the general picture of what is required and lets you figure out
the details.

I think the mindset of self-reliance which comes with dynamically typed
languages is very important when it comes to producing high quality code.

------
azhenley
Anyone know who wrote this?

~~~
fauigerzigerk
I don't know, but I wouldn't be surprised if he did:
[https://github.com/joelewis](https://github.com/joelewis)

------
wellpast
I'm forever baffled by our industry's complete inability to see, or to care,
that our most precious resource is time.

One thing is true: if you're not proving correctness of your code -- formally
or informally -- then you are living in entropy and at very high risk of
inefficiently delivering value through software. Knowing how to call "correct"
on code is paramount.

And also -- yes, static type systems allow for (partial) machine verification
of these proofs.

The missing piece is the innate -- and immense -- cost in having to express
these proofs formally in a machine-checkable way.

Static type enthusiasts typically downplay these costs but they are simply
wrong. I am not talking about the cost of _learning_ how to code in a
statically typed system--that should never be factored in. I am talking about
the innate costs of formal verification (and strong static typing) that even
the expert static typers pay. I have seen these guys work and they are
delivering sub-optimally in time compared to the alternatives. Period.

I have been around the block with both static and dynamic typing systems and
the latter by far optimizes for delivery throughput over time.

Formally proving the correctness of your program has the upfront cost of
formalizing the proof (to the degree required by the verification system), as
well as the effect of crystallizing your code in its current representation,
which makes it more difficult to (re-)factor for future uses.

Some of the (more reasonable) strong static type enthusiasts will concede that
this kind of machine/type-proving is better done once the domain and code
stabilize. My hat's off to these people for at least being honest about
things.

However, the next realization is that once code and domain stabilize, the
need/value of machine-proving correctness (in typical business/data
applications) drops substantially (for obvious reasons).

So the pragmatic value of strong type systems and formal verification is far
lower than the proponents will have you believe. Of course we've known this
truth forever but our industry forgets pretty quickly. Haskell and variants
are on the rise in popularity; but make no mistake, if you are optimizing for
overall delivery throughput over time -- even experts are swimming upstream
with these languages.

Of course, every time I point this out on HN I get downvoted -- but it kills
me to think that the next generation of programmers is being misled down a
path of formal purity with misrepresentative claims about the cost of using
these tools in real business applications.

Just to dispel any idea that what I'm saying is philistine: I am a
mathematician/academic first, enjoy category theory, have written more than my
share of academic proofs (including novel results), and think these tools are
immensely fascinating.

But having been in industry now for 20+ years shipping web-scale and
distributed data systems for business (what 90+% of us are doing, I imagine),
where time is the most precious resource, I know with certainty that leaning
on formal verification techniques (including strong static type systems) is an
enormous tax compared to the alternative, and that these tools work against
fast-paced, iterative development.

It has also become evident to me that there is a vanguard of static type
enthusiasts who are not admitting (or perhaps do not understand) the relative
cost of the pursuit -- who will point to a few null-pointer errors (that, mind
you, could be eliminated or reduced by defensive coding techniques other than
formal proofs) and use these to justify the herculean cost of their formal
system.

If you're a junior or on the fence about static type systems, at least code
in a weakly typed PL in which you can lean one way or the other. If business
outcome/throughput is what you value first and foremost, I guarantee you will
gravitate more and more toward dynamic evaluation, especially as you realize
that the world of real business delivery produces constantly changing
requirements, and that carrying out delivery in the face of purist formal
modeling and proofs will be a substantial drag on what you can do, for little
relative benefit downstream.

~~~
tunesmith
I'm not addressing formal verification itself, but in the argument between
dynamic languages and compiled type-checked languages, I haven't found the
tradeoff to be as you describe.

I also have 20+ years in development and consulting, and I'm someone equally
skilled in dynamic languages (particularly php) and compiled languages (mostly
java, with scala and FP concepts mixed in).

My current long term project is something where I'm the sole person on the
team capable of deeply understanding their two main sets of legacy code. Both
have been in active development for around fifteen years. One is in php, and
one is in java. Both have significant technical debt, with several efforts of
"modernization" that have only touched parts of the codebases.

At this point, adding new features to the php codebase is harder. There are
weirder runtime problems. The codebase is more difficult to understand. The
side effects of any new features in the php codebase are more difficult to
predict.

When developing features that have any hope of actually _succeeding_ and
sticking around for a while, the truth is that the area of code in question is
going to be read, analyzed, and understood (or attempted to be understood) far
more often than it will be written. So any initial benefit you get in writing
speed is more than swallowed up over time by the difficulty of re-reading,
understanding, and maintaining it.

After dealing with both approaches for several years, my opinion is pretty
firmly set that the dynamic approach is best for prototyping, or for a
quickly-written simple service that won't grow, or if you're perhaps a startup
writing a demo and scrambling for your first rounds of funding.

But if you're looking at building business features of any complexity, want
to maintain them over a significant period of time, through a significantly
changing set of eyeballs, and make a significant number of changes while
leaving the codebase somewhat understandable... over the long run, the typed
codebase will be easier and faster. I think the problem is that with typed
systems the costs are explicit, while with dynamic systems the costs are
embedded in decisions like "that feature sounds too hard for this codebase,
let's not do it (and thereby cede ground to competitors)."

~~~
wellpast
I don’t disagree with you about maintaining large unwieldy code bases being
perhaps better in static type land.

Large unwieldy code bases have already ossified anyway so the productivity
gain of dynamism is lost and you’re simply swimming in complexity.

But my argument is that what got you the large unwieldy code base is a lack of
skill set that no type system could protect you from.

With the right toolkit (which includes nominal static typing among many other
tools) and expertise, you are not being optimal by giving highest precedence
to formal machine proof systems, which is what strong static typing does.

~~~
zozbot234
Properly used, static typing actually enhances modularity and reduces unwanted
coupling among software components. This means that a larger code base is far
less likely to become practically "unwieldy" if static types have been
consistently used in development. Even in exploratory programming, where
dynamic types may actually have some limited value, using them effectively
requires a lot more "skill".

~~~
wellpast
Thought experiment: in your mind, what would better produce modularity -- TDD
or static type verification?

If you had to pick one.

~~~
tome
Anecdata: in Python I used to (and still do, to some extent) practice TDD very
conscientiously because I felt it really helped me structure my programs well
and get them "correct by construction". One of the reasons for moving to
Haskell was that the type system has the same effect at a lower cost (once the
barrier to entry of learning the type system has been crossed).

------
fny
Isn't there a Turing reduction that violates any guarantees?

    
    
    def square(x):
        maybe_terminate()  # hypothetical: may or may not halt
        return x * x

------
Koshkin
The thing is, the modern notion of what constitutes a type is 'class'. This
stealthy switch from one to the other has made the concept of strong typing
hard to understand and/or apply in practice, and with the addition of the idea
of interfaces, late binding, etc. the problem of types in programming, stated
in all its generality, became pretty much intractable.

