Hacker News new | comments | show | ask | jobs | submit login
Untyped Programs Don’t Exist (williamjbowman.com)
45 points by gbrown_ 5 days ago | hide | past | web | favorite | 97 comments





A couple of points.

Javascript, Racket, etc that he listed aren't "untyped" languages. They are typed. Weakly/dynamically typed. Although I guess some people equate the two. So lets go with that.

Also, division by zero is an runtime operation error, it is not really a type error. The division operation doesn't fail because it violated the type system, it fails because you can't mathematically divide by zero.

Finally, x86 assembly doesn't really have what we call a type system in the general sense.

But assuming his premises and logic, the conclusion should be "Typed Programs Don't Exist". If x86 checks its type at runtime ( division by zero ), then it is a dynamic language. Dynamic languages are untyped languages like javascript and racket. And since all programs are ultimately assembly programs and all assembly programs are created by dynamic languages then it must mean that all programs are untyped. Hence, no typed programs exist.

Ultimately, the author's argument is that runtime division by zero error is proof of x86 type system. But if that is true, x86 assembly is a dynamic ( aka untyped ) language. And his conclusion is wrong.


> Weakly/dynamically typed.

These terms do not mean the same thing, and the two languages you mention are not weakly typed.

Strong typing means, roughly, that type errors can be detected at some point. It is strong static typing if this detection is at compile time, and strong dynamic typing if it happens at runtime. Racket is strongly dynamically typed:

    > (car 42)
    car: contract violation
      expected: pair?
      given: 42
The type error is detected.

Weak typing is the opposite of strong typing. Assembly languages are weakly typed: you never know what's in a register. So is Forth. So is C, if you allow pointer casts. But dynamically typed programming languages are not weakly typed: They need dynamic type information at runtime to execute, and they do use that information to detect errors.


From the console output you have shown, from that output alone, how do we know whether that's a static or dynamic type check?

From that output alone we don't know that. I assumed some background knowledge on the reader's part, namely, that Racket is not statically typed by default.

Pretty sure javascript is weakly typed.

Can you show an example of a JavaScript type error that is not detected at any point? By "type error" I mean something like "applying integer add to a bit pattern representing an integer and a bit pattern representing a float", which you can do in assembly language or Forth or C, but not in any dynamically typed language I know of. Because you cannot take the bits that make up a float and pretend to the compiler that those bits actually represent an integer.

weak/strong typing has no formal definition. You can argue over it's meaning but neither of you will be right.

I have heard that languages like javascript are weakly typed because of the effort they go through to evaluate an expression instead of throwing a type error.

>> "5" - 3 < 2

It doesn't fit your example of a type error. Quite the opposite, the interpreter intentionally has code to do the specific thing for the specific types. But this feature is why javascript is "weakly typed" in many posts.


> These terms do not mean the same thing

What do they mean?


"Weakly typed" means that the type system doesn't detect and prevent all situations whereby a datum is misused as if it were of a different type. Programs may misbehave due to type confusion, running off with incorrect results.

"Dynamic typing" means that run-time values carry type information.

Dynamic typing that checks that type information everywhere and diagnoses misuses can be called strongly typed.

Static and dynamic situations can be weakly or strongly typed.

Ada is more strongly typed than C.

Common Lisp is more strongly typed than someone's one-weekend project whose car function cheerfully accepts an integer-typed value and uses it as a cons cell.


"Type safety is the property that no primitive operation ever applies to values of the wrong type."

page 263 "Programming Languages: Application and Interpretation" Shriram Krishnamurthi 2003

----

"Historically, people have sometimes used the phrase strong typing to reflect the kind of type-checking that ML and Java use, and weak typing for the other kinds. However, these phrases are at best poorly defined.

Indeed, the phrases are not only poorly defined, they are also wrong, because the problem is not with the “strength” of the type checker but rather with the nature of the run-time system that backs them. The phrases are even more wrong because they fail to account for whether or not a theorem backs the type system."

https://papl.cs.brown.edu/2018/safety-soundness.html#%28part...


Also, there has been a use of "strongly typed" to denote the idea that alias names for types such as integers are incompatible.

In this sense, Pascal family languages were called "strongly typed", whereas C isn't, because a typedef int user_id_t; isn't a new type, but just a name for int that can be passed everywhere an int can be passed. C character handling is also "weakly typed": characters are just kinds of integers: we don't need ord(ch) to get a character's code, or chr(n) to produce a character. C implicit arithmetic conversion behaviors are also weakly typed in this sense.


It's been used to mean this, it's been used to mean that -- now it means we always have to ask what someone actually means.

From the article: "Definition. A type is a statement of the invariants of some expressions."

To the author, a type is any mathematical statement that can be proven about the code. This definition is so general that, yes, every computer language is typed, as we can prove things about programs in that language. For example: when provided with an integer as input this program produces as integer as output, this program will terminate, this program returns a sorted list of digits, this program will never divide by zero, this program recognizes sentences from this language, etc.


On the one hand, I doubt the author was out to stir up criticism on HN.

On the other, that definition is skirting quite close to defining everything in the known universe as typed - I suspect there is a hair splitting argument about whether the fundamentals of physics can be defined as a 'symbol' in an 'expression'.

With that sort of starting point, it is no longer clear what the value of things being typed is. If everything has a type, what use is it to point out that a thing is typed? The results become too trivial to be worth discussing. A little further down this rabbit hole and we'll have discovered that mathematics encapsulates universal laws!


I think the main point of the article is to shift type systems from being necessarily a part of the language itself to something that can exist outside the language and used to analyze it. For instance I once wrote a program which would infer types on a subset of Python and identify simple type errors without running it. That said there are languages which are more amendable to type checking. For example type checking a Ruby program which dynamically adds methods to a class.

these ['Strong and Weak Typing'] words have nearly no meaning at all.

http://blogs.perl.org/users/ovid/2010/08/what-to-know-before...

So what is “strong typing”? This appears to be a meaningless phrase, and people often use it in a non-sensical fashion. To some it seems to mean “The language has a type checker”. To others it means “The language is sound” (that is, the type checker and run-time system are related). To most, it seems to just mean, “A language like Pascal, C or Java, related in a way I can’t quite make precise”. If someone uses this phrase, be sure to ask them to define it for you. (For amusement, watch them squirm.)

page 263 "Programming Languages: Application and Interpretation" Shriram Krishnamurthi 2003


Division by zero isn't necessarily a runtime error. You could define your division operation to only admit divisors that are statically known not to equal zero. You just need a type system that is strong enough to track whether a number has been confirmed not to equal zero.

(Only problem is that such a type system is unwieldy compared to mathematics itself.)

I suppose we’d better get you in contact with with all those silly mathematicians who like to use proof assistants to add rigour to their work.

In run of the mill mathematics, I've never seen zero treated as a special type. In derivations, a restriction against zero (because some variable appeared in a denominator) is usually carried as a logical assertion that is attached to subsequent derivations, like "<.. equation ...>, x /= 0".

"Also, division by zero is an runtime operation error, it is not really a type error. The division operation doesn't fail because it violated the type system, it fails because you can't mathematically divide by zero."

Zero is a different type from the type that consists of integers above N.


Can you show a language that implements your concept of "integers above N" consistently? I fail to see how such a type would be remotely useful in the context of addition/subtraction operators.

I think Ada comes closest to your definition with allowing modular and derived integer types, but the former does include 0 and I'm not sure if the latter even allows division.


What is "the type that consists of integers above N?" If N = -1, then 0 is in the set of integers greater than N.

I think it makes more sense to say that division is a function F × (F - {0}) → F where F is a field. F is not the same set/type as F without 0. I think in the context of computer languages this would be called a refinement type. (https://en.wikipedia.org/wiki/Refinement_type)


Its an interesting concept. Addition or subtraction now would have to dynamically return either a zero type or a non-zero type. I suppose the type system could force you to check and handle the zero case.

In reality I think it would be too cumbersome. Division by zero is a relatively rare flaw and zero agnostic math on integers is common. I'd love to hear about any languages that do this.


Not necessarily, additional and subtraction just return int, but you can define other operations over subsets of ints that return nonzero.

Rust does something similar, floats by default can be NaN, but you can write a non-NaN float that just panics if it's Nan and then ignore the possibility of Nan in parts of your system.


> Finally, x86 assembly doesn't really have what we call a type system in the general sense.

I might argue that it kind of does, based on the width of the operands. For example, you can only movq a quadword–you will get a "compile error" if you try to use it on a 16-bit value.


Machine instruction sets have data type systems which tend to give a well-defined meaning to type punning. If an integer is reinterpreted as a floating-point value, the meaning is precisely determined down to the last bit. Few restrictions are placed by machine instruction sets on how a datum that is written into memory by a store is later interpreted by a load, though the semantics is clear for any given value. This is better than "undefined behavior" (how C treats similar situations), because it is predictable, but it has to be identified as weak typing.

We might call it "weak static typing". An assembly language program that assembles doesn't contain any undefined behavior due to misuse of type. When we assemble an instruction like "load 32 bits from the address given in R1 into R2", there is no possible type error there.

In machine languages, there are other type rules at play besides just data type, and assembly languages tend to treat these poorly. For instance, whether a pointer is suitably aligned for a given access is arguably a type. It may be the case that only pointers ending in 00 (bits) have the right type for accessing a 32 bit value. However, we can assemble the "load 32 bits from [R1] into R2" instruction silently without the assembler providing any type check that R1 contains a suitably aligned pointer. If we work with arrays in a typical assembly language, we don't get static bounds checks.

In short, machine language programs can exhibit horrible run-time errors due to misuse of memory, which are not statically caught by assemblers, even though we can pun among different kind of operands while remaining within non-failing behavior (no exceptions raised) and well-defined results being produced.


I'm not sure encodability can usefully be considered to define x86's type system. rax can be used in places r8 can't.

As in, there are certain instructions that operate on rax but not r8? Maybe a more complex type system in this case where rax and r8 are specialized quadwords and certain instructions accept more specific arguments?

Every register besides r8-r15 has special functions, and different registers support different addressing operations. There are at most two pairs of equivalent registers: r8/r10 and r9/r11 (i.e. I don't know of any operation in which r8 couldn't be swapped with r10, and likewise for r9/r11). All other registers are unique (i.e., for any pair of them, there exist valid assembly programs that would become invalid if you swapped usage of the registers in that pair throughout the program).

So you can define a type system isomorphic to a lot of things, but this seems to me a degenerate case. What's the point of considering it a type system when no two values can be of the same type, almost all operations are polymorphic, almost all operations are on multiple types, and almost every instruction involves a conversion between types?


> They are typed. Weakly/dynamically typed.

Unityped, as in, they have only one type. You can assign any type to any variable, so all values have a single type.

> Also, division by zero is an runtime operation error, it is not really a type error. The division operation doesn't fail because it violated the type system, it fails because you can't mathematically divide by zero.

That is a type error in some type systems. The general notion of a type is a logical proposition. "Non-zero number" is a proposition and so you can represent this with types. The type of division is then `Num -> (Num != 0) -> Num`.

> Dynamic languages are untyped languages like javascript and racket.

This is not true. All languages are typed, either with rich types, or they are unityped (like the lambda calculus or JS).

Machine language is typed, and the fundamental types are (roughly) word and double. These are the registers of machine language, which you can roughly think of as variables in programming languages.


That is simply not so. Values and variables are not the same thing. Just because variables do not have a type (can potentially hold any value) doesn't mean that values have no type.

Furthermore, variables potentially holding any value doesn't mean they have no type. It is just a potential. A variable effectively has that type which is actually put into it.

If a given variable is always either a string or nil, then that's what it's type is; the sum type of null (the type that contains nil) and the string type.

It may be the case that in a given language implementation, this fact about the variable is not calculated; but it's still a fact.

Neglecting to analyze the types of variables (and other nodes in the program) is an implementation quality/depth choice, not a kind of type system design.


> Just because variables do not have a type (can potentially hold any value) doesn't mean that values have no type. Furthermore, variables potentially holding any value doesn't mean they have no type.

I said they have one type, I didn't say they have no type.


That basically means the same thing. "Typeless dynamic language" and "all variables and object slots having one type" are essentially the same claim: that the bindings or storage locations are all indistinguishable and can take on any value.

(If they have one type, what is that type? And can values be of that type?)


Values in JS still have different types (as evidenced by their different behavior). Variables could all be argued to have one type in JS, which is the root type (which everything else subtypes).

So no, JS is still not untyped, merely dynamically typed. An example of a language that also has only one data type for value would be something like Tcl, where the only type is a string - and everything else is just an optimized way to store certain kinds of strings, that could be removed without affecting semantics of any program.


They could be argued, but that would be wrong. If we define a variable, give it the value 42 and then never change it over the lifetime of the image, then that variable's type is integer. Or, even more specifically, its type is the set { 42 }.

It may be the case that no such thing is actually analyzed and the implementation always retains the potential to reassign that variable; that doesn't change the mathematical fact that the set of values which that variable is fixed.

That the implementation is not interested in examining the type of a variable (let alone checking for consistency with the rest of the program) shouldn't be mistaken for the absence of type. There is a mathematical fact about the program that isn't constrained by the quality and depth of its implementation.


> Values in JS still have different types (as evidenced by their different behavior).

Behaviour is based on values, not types. 1+2 will print out a different value than 1+1, but these expressions have the same type (typically).

> So no, JS is still not untyped, merely dynamically typed.

I said JS was unityped, not untyped.


Values also have types, it's not something that's exclusive to expressions and bindings.

Javascript is Unityped. In Haskell/ML, you would define the type of Javascript objects by doing something like this:

    data JSObj
      = Num Double
      | Str String
      | Obj (Hashtable String JSObj)
      | True
      | False
      | Nil
      | Undefined

> Unityped, as in, they have only one type.

If there are no distinguishable properties ("one type") then untyped vs unityped seems a distinction without a difference.


Upon further thought, there's probably a useful distinction to be made here. Let's take the classical untyped lambda calculus as a canonical untyped language. Every value is a lambda, so there is clearly only one type, and no deeper structure at play.

Now contrast with something like Scheme or JS. Clearly there is richer underlying structure than mere lambdas, but since every value is still interchangeable (a global sum type) I think this suggests a "unityped" label as a meaningful distinction from the untyped case.

So I would amend my original comment to move the lambda calculus out of unityped and into untyped, and possibly Tcl as well since everything is conceptually a string there.


Let's take BCPL —

'The major difference between BCPL and ALGOL is that all ALGOL variables are declared with data-types (integer, real, boolean, string, array, procedure, label, pointer, etc.), whereas all BCPL variables have the same data-type: a 16-bit number.

In ALGOL, the meaning of an expression is dependent both on its context and on the data-types of the entities involved, and only expressions with certain data-types may appear in a given context.

In BCPL, any expression may be used in any context; the context alone determines how the 16-bit value of the expression is interpreted. BCPL never checks that a value is "appropriate" for use in a given way.'

http://www.bitsavers.org/pdf/xerox/alto/bcpl/BCPL_Reference_...

( I haven't been able to figure out what happens on divide-by-zero:

The division operator / yields the correct result if E1 is divisible by E2; it is otherwise implementation dependent but the rounding error is never greater than 1.

https://www.bell-labs.com/usr/dmr/www/bcpl.pdf )


Surely the meaningful distinction between BCPL and Scheme is that BCPL is not type safe and Scheme is type safe.

I disagree. "Type safety" is not the only kind of safety, and it's not the kind of safety found in either Scheme or BCPL.

> "Type safety" is not the only kind of safety

Never said it was.

"Type safety is the property that no primitive operation ever applies to values of the wrong type."

I did say "BCPL is not type safe" (so on that we agree).

Are you saying that, in Scheme, primitive operations can be applied to values of the wrong type?

A string can be multiplied by 2?


> Are you saying that, in Scheme, primitive operations can be applied to values of the wrong type?

Yes, that's a well-defined operation: it throws a runtime exception (which is also a value). All operations are well-defined in safe languages, that's what makes them safe.

But dynamically "typed" languages aren't type safe.

A truly undefined operation would immediately halt the program because the program simply cannot proceed. You can resolve this unsafety by defining semantics for these operations (like reifying errors as values as in dynamically typed languages), or you can add a type system which ensures only safe programs can be expressed.


You say "Yes" "primitive operations can be applied to values of the wrong type", but then you say that doesn't happen and what actually happens is that "it throws a runtime exception".

What you seem to mean is that isn't what I (`naasking`) would call type safety.

That may be.

However, it clearly has been a text-book meaning of type safety.


> You say "Yes" "primitive operations can be applied to values of the wrong type", but then you say that doesn't happen and what actually happens is that "it throws a runtime exception".

Because that's not a different type. It's the same (sum) type just with a different tag.

> What you seem to mean is that isn't what I (`naasking`) would call type safety. [...] However, it clearly has been a text-book meaning of type safety.

No, it's not.


> It's the same (sum) type just with a different tag.

Now it's — that isn't what I (`naasking`) would call a type.

> No, it's not.

page 263 "Programming Languages: Application and Interpretation" Shriram Krishnamurthi 2003


A quote divorced from context is meaningless. Here's one too:

"A type system is a syntactic method for enforcing levels of abstraction in programs. " [1]

What syntactic methods are used in dynamically "typed" languages?

The point of a technical term is to make distinctions. As I already explained, adding safety has at least two distinct approaches: ensure unsafe programs have no meaning, or ensuring every program has meaning by reifying unsafe expressions as values. These are clearly completely different approaches, and "type" has always meant the former from its inception as a solution to Russell's paradox.

The fact that dynamic typing has some superficial similarities is completely irrelevant. Types can enforce properties that no dynamic "typing" can ever achieve.

[1] https://www.cis.upenn.edu/~bcpierce/tapl/


> A quote divorced from context is meaningless.

I'll take that non sequitur to be an acknowledgement that your previous denial was incorrect.

> … and "type" has always meant…

"Language safety is not the same thing as static type safety."

p6 Types and Programming Languages

Apparently the author thought the phrase "type safety" more ambiguous than you do.

> Types can enforce properties that no dynamic "typing" can ever achieve.

Yes, static type checking can enforce properties that dynamic type checking can not.


> "A type system is a syntactic method for enforcing levels of abstraction in programs. " [1]

> What syntactic methods are used in dynamically "typed" languages?

Interesting. You may have unwittingly provided one of the best justifications for the term "dynamically typed" I've seen so far.

That's because operations that violate a (safe) dynamically typed language's chosen set of abstractions (built using bits, church encoding, whatever) are syntactically inaccessible.


> That's because operations that violate a (safe) dynamically typed language's chosen set of abstractions (built using bits, church encoding, whatever) are syntactically inaccessible.

They are semantically inaccessible, not syntactically inaccessible. These languages enforce abstractions by attaching meaning to the tags implicit to all values, and these meanings combined with the operation are what drive the next evaluation step, ie. success or raise error value. This is not what happens in typed languages.

Lexical scoping is an example of a syntactic method commonly employed by dynamically typed languages. It's possibly the only one.

Edit: I have both of Pierce's "Types and Programming Languages", and "Advanced Topics in Types and Programming Languages". "Dynamic typing" is mentioned exactly once, on page 2 of the first text where Pierce says:

> The word "static" is sometimes added explicitly [...] to distinguish the sorts of compile-time analysis we are considering here from the dynamic or latent typing found in languages such as Scheme, where runtime type tags are used to distinguish different kinds of structures on the heap. Terms like "dynamically typed" are arguably misnomers and should probably be replaced by "dynamically checked".


> semantically inaccessible, not syntactically inaccessible.

You can give the semantics of everything that happens in a safe d-t language in terms of church encoded tag-value pairs and yet the programmer can't break the abstraction because abstraction-breaking operations are syntactically unavailable.

It should count as testament to the success of dynamic typing that the abstractions are so safely enforced that you can squint and suddenly the whole language is semantically founded on them.


> You can give the semantics of everything that happens in a safe d-t language in terms of church encoded tag-value pairs and yet the programmer can't break the abstraction because abstraction-breaking operations are syntactically unavailable.

This strikes me as just a Turing tarpit argument. Sure you can represent and enforce semantics syntactically, and you can probably also do this via automatic translations in many or most cases. Why is this compelling? And what does this have to do with types?

The definition of type system I gave is, "a syntactic method for enforcing levels of abstraction in programs". Ergo, not all syntactic methods qualify as a type system, but a specific kind of syntactic method classify type systems. Why do you think your translation qualifies?


I should clarify that dynamically typed languages are type safe in the trivial sense that they satisfy unityped type safety. What I meant is that they are not type safe by virtue of what they call types, in the sense that is used by non-unityped languages.

> All languages are typed, either with rich types, or they are unityped

Almost all languages. There is one language that can be typed without a type, the empty language.

Since the empty language contains no programs, it follows that "Untyped Programs Don't Exist".


> Also, division by zero is an runtime operation error, it is not really a type error

.. in most languages. This is not universally true however. Consider a language with dependent types whose division function has a type signature like so: div : Nat -> (y : Nat) -> (y > 0) = True -> Nat, which basically requires a proof that can be verified at compile time that y is bigger than 0. In such a language you would not have a division by zero runtime error.

You can even have something similar (but much more painful to use) in languages without dependent types, consider:

    data Nat1 = Succ Nat1
              | One
    data Nat = CNat1 Nat1
             | Zero
    div : Nat -> Nat1 -> Nat

Right. You can also think of a type as a set of valid values that a variable could hold. In typescript you can define union types that simply hold one of a set of defined values. Eg:

  type OneOrThree = 1 | 3
or

  type Nullable<T> = T | null
This is arguably nicer than Maybe / Option because you can loosen / tighten API constraints without changing the API. Eg, myfn(x: number) can be rewritten to myfn(x: number | null) with all the safety benefits of Maybe but without the need to change any callers.

Its not just Ada - Rust also has non-zero integer types in the standard library. Eg:

https://doc.rust-lang.org/std/num/struct.NonZeroU32.html

This was written mostly let Option<NonZeroU32> take up the same amount of space as u32. Ie, if you know your integer will never be zero, you can reuse the zero slot as a sentinal None value in a way thats API-compatible with everything else in rust.


> You can also think of a type as a set of valid values that a variable could hold

I would suggest avoiding this kind of thinking. Set theory and type theory are separate. Mixing them will only cause confusion and suffering.

> without changing the API

Pretty sure that loosening the return type constraints of a function or tightening the constrains of a variable that a function accepts would both change the API, is my understanding incorrect?

> This was written mostly let Option<NonZeroU32> take up the same amount of space as u32

Why not just use u32 then? These are basically the same thing after all.


> I would suggest avoiding this kind of thinking. Set theory and type theory are separate.

There are some languages - notably, Ada - where this (types as sets) is very much explicit in the language.

> Why not just use u32 then?

For u32, 0 is "just another value" - there's nothing special about it. If you pass it to some other code, there's nothing telling them that they must check for zero, and must not treat zero as actual zero value (and if they forget, it will silently give the wrong answer).

Option<NonZeroU32> requires them to pattern-match on None/Some, and if they get Some, it's guaranteed to be non-zero. That None is stored as zero is an implementation detail in that case, and doesn't leak onto the abstraction level on which code operates.


Ada is an example of a standardized, mature language with a "natural number" (strictly positive integer) type.

But Ada doesn't restrict its division operators (div, mod and rem) to just "natural number" divisors.

I find it a bit of a strawman. "Untyped" is not a very common term for programming languages. "Dynamically typed" is much more common in my experience. I'd call something like CSV untyped.

Certain subsets of the statically typed functional programming research crowd (i.e., Haskell/OCaml/ML aficionados) do like to refer to dynamically typed languages as "untyped". For them it is a common term of derision. Unfortunately they do not know or care that the rest of the world uses the term differently.

Well I would consider this a bit of a harsh take on Haskell programmers and not really a true statement, either.

I don't think it's particularly offensive that some communities have developed alternative terminology for the same concepts. For example, whether you call something a "hash", a "map", or a "dictionary" is not especially consistent. Haskell programmers spend a large amount of time talking about type systems, mostly because Haskell has a large and complicated type system which is constantly evolving, so I would expect people who use Haskell to have some differences in the way they talk about type systems.

The whole typed programming research crowd is basically doing research into how you can prove at compile time that certain properties of a program will hold at runtime. To them, that's what a "type system" is. I wouldn't say that Haskell programmers or programming research people are unaware of the terminology differences... I personally know some people who are programming language / type theory PhDs or professors, and they're usually more excited to tell you about something they added to GHC or something they did with Agda or Coq, and less excited about dissing JavaScript.


> Well I would consider this a bit of a harsh take on Haskell programmers and not really a true statement, either.

I did write "subsets", not "all Haskell programmers".

> I personally know some people who are programming language / type theory PhDs or professors

Me too, hence my statement above.


Derision is generally worryingly common among these people.

This is of course anecdotal on my part but as someone working with JS on a daily basis I've received more than my share of insults based on my choice of career.


I've seen the same.

Yet good user interfaces are typically warty behind the scenes and full of exceptions, exactly the kind of thing where spending a lot of time on specifications/typing offers little reward. Au contraire, it can put you into a mindset where you'd rather fight the users than put in another difficult-to-handle-typewise exception.

Some time ago I had a look at old Visual Basic which I've never personally used, and lo and behold, if you squint it's not actually that far from Javascript.


Ha! I remember when VBScript was still a thing. What a different world we would be living in had Microsoft had its way.

The derision towards JS programmers is very real, but I don't think it's fair to turn it into a generalization about "these people" (being PL researchers and Haskell/ML/etc aficionados).

I've encountered enough JavaScript programmers telling me how important Node.js is, and how crazy I am for not using asynchronous programming, or for using a different language in the browser and the server, or how static typing is a waste of time and you should really be using tests.

I'm not going to say that's worryingly common among JavaScript programmers, though. It's just very natural that people who work with a certain set of tools are acutely aware of those tools' advantages.

And I've also encountered plenty of C++ programmers explaining to me how crazy it is that I'm using a garbage collector, or that I'm not compiling something down to native machine code ahead of time.

And the people in industry will make fun of the people in academia, and the academics will make fun of people in industry.


OK I didn't know that. The article makes more sense in that context.

CSV is a list of lists of strings.

Don't the rows have to have the same number of elements? In that case it's more like a matrix of strings.

If it's a cohesive dataset, then sure, but csv "the spec", no.

I'm going to be a little nitpicky here:

> Definition. An expression is a symbol of sequence of symbols given some interpretation. > Definition. A language is collection of expressions.

I immediately reject these definition. A language is a subset of the free monoid over some alphabet [1]. This definition is widely used and the author changing the definition here is significant. Next, because the author requires that each expression be given an interpretation, the author conflates language (e.g. valid strings) with semantics (e.g. the interpretation of strings).

> The result of this sequence of symbols is undefined in C.

...

> Definition. Undefined behavior is the result of interpreting a non-expression.

The author is mixing around a few definitions of "undefined behavior". Of course, C compilers will gladly compile programs containing expressions that are total nonsense. The resulting behavior of the program has no meaningful semantics but the program itself consists only of valid expressions. Because a program may contain expressions that exhibit UB when operating with standard definitions (e.g. the definitions of "expression" and "language" ubiquitously used when discussing programming languages and compilers) the proof is incorrect.

The immediate implication of this result is that technically a C program with UB is free to modify itself non-deterministically and thereby prevent any existing proof system from predicting its behavior.

However, definitions aren't really "right" or "wrong." The proof is otherwise sound. My critique is that real systems aren't built using the definitions the author uses and therefore the resulting theorem isn't really applicable in any practical circumstance.

[1] https://en.wikipedia.org/wiki/Formal_language


And most type systems don't actually reflect the subtyping relation inherent in the topos correspondent to their logic. As a result, there is a massive disconnect between type systems, which let us catch errors in programs without running them, and type theory, an essential and unavoidable part of doing maths.

I think that we should take "typed", "untyped", etc. away from the dynamic-language haters and force them to justify their positions without this needlessly-perjorative attitude. "Types lend themselves to strong automated reasoning, automatically eliminate large classes of errors, and simplify the job of whoever is reasoning about the programs," writes the author, ignoring that there are piles of similar features in the language design space, like highlighting, typesetting, brace-matching, whitespace, capitalization conventions, docstrings, named variables, named procedures, 2D/spatial layout, ASCII/JIS/Unicode art, and the list goes on and on and on.


> most type systems don't actually reflect the subtyping relation inherent in the topos correspondent to their logic

Would you mind elaborating? With examples if possible.

> ignoring that there are piles of similar features in the language design space, like highlighting, typesetting, brace-matching, whitespace, capitalization conventions, docstrings, named variables, named procedures, 2D/spatial layout, ASCII/JIS/Unicode art, and the list goes on and on and on.

I am interested to see how any of these would protect against bugs that say a type system with dependent types would protect against.


The most infamous example is that of Haskell. Reasoning in Haskell is not the same as reasoning in the category Hask [1], whose objects and arrows purport to be Haskell types and functions; many folks are actually reasoning in an idealized, simplified Hask. Papers like "Fast and Loose Reasoning is Morally Correct" [0] justify this stance. (There is, to be fair, work on determining Haskell's actual categorical properties. [2]) This attitude is popular across the ML family, but spikes in the Haskell community for some reason.

I don't have a good citation for categorical subtypes, but it is well-known category-theory folklore that often objects have subobjects and that a given arrow `f : X -> Y` has fixed source object X and target object Y, and therefore that f likely embodies not just categorical structure but also subobject structure inside X. There may be many distinct arrows of type `X -> Y`, each with different subobject behavior. This is where concepts like dependent typing shine, right?

I'm going to take the other branch of your implicit dilemma. Let's protect against the same bugs that dependent types protect against, but without dependent types. This is valid E [3] or Monte [4], and languages in the E family are what Haskellers would call "unityped" or "untyped". The specification is that `y >= x` and also that the return value `rv >= y`. The compiler is not obligated to infer corollaries like `x >= 0`.

    def f(x :Int, y :(Int >= x)) :(Int >= y) { return x + y }
It is not accidental that Monte has a categorical semantics [5] at the level of values and not types. It seems to be the case that, even if the category of types is trivial, the category of values is rich. At the same time, studying only the category of types and never examining the values is insufficient! Ponder this Haskell REPL session:

    $ nix run nixpkgs.ghc -c ghci
    GHCi, version 8.2.2: http://www.haskell.org/ghc/  :? for help
    Prelude> :t reverse
    reverse :: [a] -> [a]
    Prelude> :t const [] :: [a] -> [a]
    const [] :: [a] -> [a] :: [a] -> [a]
    Prelude> :t cycle
    cycle :: [a] -> [a]
    Prelude> :t map id
    map id :: [b] -> [b]
    Prelude> :t tail
    tail :: [a] -> [a]
Studying the types alone, we are helpless. We need to know something of the values as well.

[0] http://www.cse.chalmers.se/~nad/publications/danielsson-et-a...

[1] https://wiki.haskell.org/Hask

[2] http://www.cs.gunma-u.ac.jp/~hamana/Papers/cpo.pdf

[3] http://erights.org/

[4] https://monte.readthedocs.io/

[5] https://monte.readthedocs.io/en/latest/category.html


On C2-dot-com there used to be a long running debate about "tag based" dynamic typing versus tag-free dynamic typing. Tag-free is similar to "untyped", but that may depend heavily on which definitions are used. This debate included whether tag-free is "good", and how to define it, since "tag" implies an implementation and not really behavior. One may also use tags for compression and speed "under the hood" even under "tag-free", but the programmer may not otherwise have to care as a language user (outside of performance issues). Thus, how does one know as a language user that the language is "tag free"?

I lean toward tag-free; it makes life simpler in my opinion. Interesting but thready debate. A battle of philosophy and definitions.


This is totally backwards. Typed programming languages don’t exist, as in at runtime it’s just instructions applied to sequences of bits. A type is just a convention about an agreed interpretation of those bits. It’s a temporary fiction that doesn’t exist at runtime.

There's some inconsistency in that argument. If a type is just a convention about an agreed interpretation of bits, then isn't it equally true that bits are just a convention about an agreed interpretation of voltages, charges, and magnetic domains? But worse yet, voltages, charges, and magnetic domains are just interpretations of the physical world, and they don't really "exist" either...

The difference with bits is that it's a convention realised in the hardware.

0 and 1 are physically different from each other. A 64-bit float could easily be seen as an array of bytes without any change to the information it contains.

Bottom line: types are external to the information something contains.


Whether a particular voltage represents an 0 or a 1 is an interpretation external to the device. Internally, there are just voltages. Whether a particular voltage represents an 0 or 1 will not in general be consistent, even within a device, and whether a particular piece of the device represents a bit or not is a matter of interpretation.

What tells you that a particular location in a chip represents a bit and what voltages correspond to 0 and 1? You have to bring external knowledge of the particular device.


Entropy. I can measure the number of steady states of the device and discover the actual, real, physical information contained within.

Also through this I can discover which voltage corresponds to the high and low state.

Meanwhile a series of bytes can be a float, a string, a table of numbers or even a function pointer - you can't tell which without external information.


It's a bit frustrating to see a comment that starts with a single, incomprehensible, verbless statement like "entropy." I know what entropy is, and I could give you three different senses for the word, but it's not a particularly illuminating thing to think about here and I'm clueless about how this connects to the rest of the comment.

I think what you're doing is taking your existing understanding of how a processor works, and saying that you could figure it out, given an actual processor. But that's begging the question. You have a definition for what a "bit" is, and what a "byte" is, and since you know what they are you can go on a hunt within a physical system and label things as "bit" and "byte". But I could come in and start labeling certain things "function pointer" or "float", and I don't see a particularly compelling reason why that's not allowed, but labeling things as "bit" or "byte" is allowed.

Or in other words, why is, "I know how synchronous digital circuits work and I can figure out a processor" acceptable, but "I know how compilers work and I can reverse engineer a program" prohibited?

And I'll still say that the statement that, "Typed programming languages don’t exist," is not even wrong. It's just a cute thing that you can say, to demonstrate that you can choose a definition for "exist" or "typed programming languages" that makes it needlessly difficult to have a conversation.


I don’t see what the inconsistency is?

We build layers of interpretation on top of one another, and we get this big stack with abstract things like types and programs on the top, bits and bytes somewhere in the middle, voltages and transistors farther down, and fundamental particles at or near the very bottom.

We agree that the stuff at the bottom exists, if the word "exists" has any meaning at all. But if you argue that "things which are just conventions about how to interpret other things" (like types) are not real and do not exist, if you want to be consistent about that, the entire stack evaporates and you are left with whatever is at the bottom, such as fundamental particles.

For a more concrete example, you said that, "A type is just a convention about an agreed interpretation of those bits." The major problem with this statement is that a program is also an agreed interpretation of some bits. Whether some piece of memory is an instruction or data is just a matter of interpretation. So if this argument says that types do not exist at runtime, then programs do not exist either.

So if we accept that programs exist, the argument must be wrong.


> “We agree that the stuff at the bottom exists”

I don’t understand this part. Where do we agree that? As opposed to those things just continuing to be useful fictions at the boundary between the current best-known reductive explanation of a thing and the outstanding questions where that reductive explanation fails to reproduce observational data as-yet-unexplained?


I'm not sure exactly what you're saying. The word "where" is a bit confusing, because there are some spatial metaphors here. The final sentence has several subordinate clauses and is quite long, has referents which are quite far from their references, and seems to use the vague word "thing" in two different ways.

We agree that there is a physical, objective universe. (Why? We assume it is true because otherwise conversation is not possible.) This physical universe exists (in a physical sense). We also have a catalogue of other phenomena (silicon atom, apple, Kentucky) which we say "exist", phenomena which don't exist (magnetic monopole, dragon, The Shire), and a catalogue of intangible concepts (happiness, vector space, triangle) which don't exist in a physical sense.

But every term here, except "the physical universe," are either concepts or interpretations of phenomena. If you say that a data type does not exist because it is nothing more than an interpretation of a physical phenomenon, then my problem with this is that this also applies to computer programs, bits, bytes, silicon atoms, apples, and Kentucky. So I would prefer to use a definition for "exist" that includes data types, bits, bytes, silicon atoms, apples, and Kentucky.

I don't think there's a compelling case for drawing a line between data types and bits and saying "bits exist, but data types do not".


I think you’re just arguing with yourself here. While pithy, this reply just seems to ignore basically every point I raised and just reassert your original assertions without offering any reason to believe them. In fact I’d say your whole series of comments here are rhetorical gimmick with some bravado like it’s a great philosophical insight while side-stepping the only interesting philosophical question here, which is how do we decide which concepts “actually” exist.

> While pithy, this reply just seems to ignore basically every point I raised and just reassert your original assertions without offering any reason to believe them.

I said that I didn’t understand your argument, so I asked some questions. And then I expanded my argument with reasoning, but apparently you feel that my reasoning is deficient, but lazily assert that I did not offer “any reason” to believe them. I honestly thought that I did offer reasons.

> In fact I’d say your whole series of comments here are rhetorical gimmick with some bravado like it’s a great philosophical insight while side-stepping the only interesting philosophical question here, which is how do we decide which concepts “actually” exist.

That’s an interesting philosophical question, but we’re having a discussion about programming languages and type systems, and I would say that the philosophical question of “what exists?” is side-stepping the interesting questions about type systems. In order to have a discussion about type systems we need to agree about what exists, and the best way to do that is to simply agree by convention that certain things (like type systems) exist.

Which is what I’d prefer. Same thing with mathematics… if we are talking about vector spaces, I don’t want to have a discussion about what “exist” means, I just want to use the word “exist” as a substitute for mathematical concepts, agree how we use the word in context, and then talk about math.


Instructions themselves often encode a conventional interpretation of the bits...

Many languages have type information available at runtime.

This misses the point though. It’s still just instructions applied by convention to bits to tell some other instruction something. When you use “isinstance” at runtime in Python or reflection at runtime in Java, it doesn’t mean the underlying bits somehow “have types” any more than any other function calls or compiled instructions ever would. That’s still just part of the useful fiction.

I guess bits are also a fiction then?



Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: