A decade of developing a programming language (yorickpeterse.com)
288 points by YorickPeterse 10 months ago | 139 comments



> "Oh, and good luck finding a book that explains how to write a type-checker, let alone one that covers more practical topics such as supporting sub-typing, generics, and so on"

I bought _Practical Foundations for Programming Languages_ by Harper and _Types and Programming Languages_ by Pierce and I just can't get through the first few pages of either of them. I would love to see a book as gentle and fun as _Crafting Interpreters_ but about making a static ML-like language without starting with hardcore theory. (Bob, if you're listening, please make a sequel!)


I'm in the same boat as you -- here are the two best resources I found:

https://mukulrathi.com/create-your-own-programming-language/...

https://jaked.org/blog/2021-09-07-Reconstructing-TypeScript-...

I read through the first 10 chapters of TAPL, and skimmed the rest. The first 10 chapters were good for reminding myself of the framing. But as far as I can tell, all the stuff I care about is stuffed into one chapter (chapter 11, I think), and the rest isn't that relevant (type inference stuff that is not mainstream, AFAIK).

This is also good:

https://github.com/golang/example/blob/master/gotypes/README...

And yeah some of us had the same conversation on Reddit -- somebody needs to make a Crafting Interpreters for type checking :) Preferably with OOP and functional and nominal/structural systems.

---

Also, it dawned on me that what makes TAPL incredibly difficult to read is that it lacks example PROGRAMS.

It has the type checkers for languages, but no programs that pass and fail the type checker. You are left to kind of imagine what the language looks like from the definition of its type checker!! Look at chapter 10, for example.

I mean I get that this is a math book, but there does seem to be a big hole in PL textbooks / literature.

Also I was kinda shocked that the Dragon Book doesn't contain a type checker. For some reason I thought it would -- doesn't everyone say it's the authoritative compiler textbook? And IIRC there are like 10 pages on type checking out of ~500 or more.


Are we looking at the same Dragon book? The whole of chapter 6 is about type checking. Section 6.2 spells out a type checker for a sample language, admittedly in pseudocode.

It might not use the same lingo as modern type-checking material, but most of the ideas are there. It talks about type systems, type expressions, type rules, constructed types like array/struct, type variables for unknown types, etc. It puts the type rules in the grammar productions. It stores the type info in the AST nodes. It uses chained symbol tables as the type environments.

It has sections on type conversion, operator/function overloading, polymorphic functions (parameterized types), and type unification.

It has all the pieces and steps to build a type checker.
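
For reference, the chained-symbol-table idea the book uses is tiny in practice. Here's a minimal sketch (hypothetical names, Python rather than the book's pseudocode):

  class Scope:
      """A type environment: one table per block, chained to the enclosing scope."""
      def __init__(self, parent=None):
          self.parent = parent
          self.bindings = {}

      def define(self, name, type_):
          self.bindings[name] = type_

      def lookup(self, name):
          # Walk outward through enclosing scopes until the name is found.
          scope = self
          while scope is not None:
              if name in scope.bindings:
                  return scope.bindings[name]
              scope = scope.parent
          raise NameError(f"undeclared identifier: {name}")

  top = Scope()
  top.define("x", "int")
  block = Scope(parent=top)
  block.define("y", "bool")
  print(block.lookup("x"))  # int, found in the enclosing scope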


I have the second edition, copyright 2007, and chapter 6 is "intermediate code generation". The whole chapter is definitely not about type checking -- is yours?

https://www.amazon.com/Compilers-Principles-Techniques-Tools...

It has 11 sub-sections, 2 of which are about type checking:

6.3 - Types and Declarations, pages 370-378

6.5 - Type Checking, pages 386-398

So that's about 20 pages out of 993 pages. A fraction of a chapter!

---

What I was referring to is the appendix "A Complete Front End", which contains a lexer and parser, but not a type checker! Doesn't sound complete to me.

I guess they have a pass to create the three-address code, and type checking is only in service of that. But they don't give you the source code!

---

Also weirdly, when you look at the intro chapter "The structure of a compiler", it goes

1.2.2 Syntax Analysis

1.2.3 Semantic Analysis (3 paragraphs that mention type checking and coercions)

1.2.4 Intermediate Code Generation

But when you look at the chapters, there's NO semantic analysis chapter! It goes from (4) Syntax analysis, (5) Syntax-directed translation, to (6) intermediate code generation.

I think there's a missing chapter! (or two)


I have the 1st edition printed 1988. Chapter 6 is Type Checking, chapter 7 is Runtime Environments, and chapter 8 is Intermediate Code Generation.

WTF. The 2nd edition is really butchered.


Wow yeah. So in the first edition pages 343 to 388 is a full chapter on type checking. It looks much more coherent.

What the heck happened ...


Standard thing: older textbooks are generally better. They tend to get dumbed down in new revisions, and modified to fit stupid standards about class syllabi.


This reminds me that I actually bought the first edition of the GC handbook on purpose, because it had more C++ material, which was relevant to my project.

The second edition cut a bunch of C++ material out.

And yeah I don't think I missed anything -- I think it even had better diagrams.

So yeah I believe that general rule! I didn't realize that at the time I bought the Dragon Book.

Pretty sad how you can lose knowledge over time!

---

The books also respond to fashion -- e.g. Java was a popular teaching language, so they had to add more Java. I think that is reasonable in some ways but can backfire in the long term.

Although I guess I am not too excited about MIX assembly in TAOCP and so forth, so it's a hard problem.


Obviously sometimes the updates are fixing real critiques of earlier editions. Those are good. But many times it is “fixing” critiques along the line of “cut out advanced material because it is too much to cover in 1 semester.” Makes sense in an academic context, but not for professionals like you and me that are looking for that kind of stuff.

Books also tend to become more sparse on the page, and with more pointless “illustrative” stock photos that add nothing and take up page real estate (leading to more advanced topics being dropped to reclaim space), but make the book seem more approachable to undergraduates.

Generally applies more to physics, chemistry, and math books than computer science, but it is a general phenomenon.


Thank you so much! This looks like some good morning-with-coffee reading!


The bigger challenge is the pattern matching, and especially the exhaustiveness checking of patterns that aren't constructors of a sum type but, for example, integer intervals.
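
For the integer-interval case specifically, the core of an exhaustiveness check can be a simple coverage test; a toy sketch (assuming inclusive bounds and no guards or nested patterns):

  def exhaustive(intervals, lo, hi):
      """Do the inclusive intervals cover every integer in [lo, hi]?"""
      covered = lo  # smallest value not yet known to be covered
      for a, b in sorted(intervals):
          if a > covered:
              return False  # gap: `covered` is an uncovered witness
          covered = max(covered, b + 1)
          if covered > hi:
              return True
      return covered > hi

  # Matching an i8 against the patterns -128..=0 and 1..=126 misses 127:
  print(exhaustive([(-128, 0), (1, 126)], -128, 127))  # False
  print(exhaustive([(-128, 0), (1, 127)], -128, 127))  # True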

Simon Peyton Jones, together with others, wrote a relatively easy-to-read book about implementing Miranda (a precursor to Haskell), which even includes a simple explanation of lambda calculus, so it covers more or less everything you need to know about the theory: "The Implementation of Functional Programming Languages" 1987, https://www.microsoft.com/en-us/research/wp-content/uploads/...


I highly recommend https://github.com/sdiehl/write-you-a-haskell as it is very developer friendly. It’s not complete, but it really gets the gears turning and will set you up for writing your own Hindley-Milner style type checker.


> I would love to see a book as gentle and fun as _Crafting Interpreters_ but about making a static ML-like language without starting with hardcore theory. (Bob, if you're listening, please make a sequel!)

I don't think an ML-like language is the best option. I would start with type-checking something simple like C or Pascal: no overloading, generics, type inference, anonymous functions. Just plain old subroutines and declared types on everything, arrays and structs as the only composite data structures.

Then you could add implicit conversions, type inference for locals (not that hard), subtyping, target type inference for anonymous functions, generics and maybe overloads if you're brave enough.


I had a similar experience this year as I’m working on an ML-cousin to Golang. I actually found that ChatGPT (GPT-4) was really good at breaking down the steps of implementing a Hindley-Milner type system into language I could understand. It could also provide follow-up clarifications to my questions. I then implemented it a small bit at a time, and as the complexity grew I started to better understand the algorithm.

EDIT: ChatGPT could actually demonstrate the whole process for type checking small chunks of code, including: assigning them type variables, collecting constraints and then unification. It would even then point out if there was a type error in the code snippet!
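
That workflow (fresh type variables, constraints, unification) is most of the algorithm. Here's a toy sketch of the unification step (hypothetical representation: type variables as objects, constructed types as tuples; the occurs check is omitted):

  class Var:
      """A fresh type variable."""

  def walk(t, subst):
      # Chase substitution links until a non-variable or unbound variable.
      while isinstance(t, Var) and t in subst:
          t = subst[t]
      return t

  def unify(a, b, subst):
      a, b = walk(a, subst), walk(b, subst)
      if a == b:
          return subst
      if isinstance(a, Var):
          return {**subst, a: b}  # (a real checker adds an occurs check here)
      if isinstance(b, Var):
          return {**subst, b: a}
      if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
          for x, y in zip(a, b):
              subst = unify(x, y, subst)
          return subst
      raise TypeError(f"cannot unify {a} and {b}")

  # For `\x -> x + 1`: x gets a fresh variable t, and the use of `+` adds
  # the constraint ("fn", t, t) = ("fn", "int", "int"), forcing t = int.
  t = Var()
  subst = unify(("fn", t, t), ("fn", "int", "int"), {})
  print(walk(t, subst))  # int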


I'm curious how you went with it. Here's my attempt: https://github.com/chewxy/hm , the core of which (i.e. the interface type) is what powers most of the languages I've written (different languages use different unification schemes)


Cool to see an implementation in Go, nice! Here's where I got to with my first attempt: https://github.com/cobbweb/flite. There's definitely stuff I'd do a bit differently; my second attempt is actually a project to learn Rust with, but I'm already considering switching to OCaml haha!


I've mentioned this in a different comment, but I've written several articles on type inference, trying to make the topic more approachable:

https://gilmi.me/blog/tags/type%20inference

If you find that helpful, I've made more content on compilers (including live coding a compiler, without narration) which should be easily reachable from my website.


I would love to know more about what the author found difficult regarding type checking. As long as you don't screw up your semantics to the point of needing a full blown constraint solver there should be no issues.

Edit: by "screwing up semantics" I mostly mean the combination of overloading and implicit conversions, which is known to cause issues in Java, C++, etc.


The problem for me was that I had a rough idea of how to implement one (essentially it's similar to a recursive-descent parser), but I had a difficult time finding appropriate resources to confirm or deny whether my idea was solid, as well as tips and whatnot.

Basically, the existing material is a bunch of incredibly complicated implementations, the odd blog post that just throws a bunch of code your way without really explaining the why/thought process behind it, and books that aren't worth the money.

The result is that you can of course piece things together (as I did), but it leaves you forever wondering whether you did it in a sensible way, or if you constructed some weird monstrosity.

To put it differently: you can probably build a garden by digging some holes and throwing a few plants around, but without the right resources it can be difficult to determine what the impact of your approach may be, and whether there are better ways of going about it. Oh and I'm aware there are resources on gardening, it's just a metaphor :)


As someone who has written many compilers and type checkers, I agree. There’s very little information online about the subject. I think part of the issue, for me, was psychological: I felt like I was missing some implementation theory that was presented alongside other compiler subjects (e.g., LALR) when the reality is that you’re mostly implementing very simple Boolean operations.


While it helps a bit, it's difficult to learn from or reverse engineer existing type checking code, because a lot of it is mundane, repetitive code implementing some high-level theory and algorithms. The actual code is too far removed from the theory. You really want to start from the theory and the overall picture.

The Dragon book has a chapter on type checking. It gives explanations on many topics. It has plenty of examples. It has type treatments on different areas of a language, like expression, array, struct, function, pointer, etc.

Despite being really old, its ideas and explanation on the topic are still relevant.


I think the larger writings like books are generally going to be written by academics so they'll be rigorous and hard to read.

So that leaves blog posts for most developers actually implementing this stuff.

But the solution here is that when you finally figure out a good strategy to deal with something muddy like this, you write that better blog post :)


I'm planning on doing something similar to what I did for pattern matching [1]: basically building something entirely standalone that fits in 2k LOC or so, and explains the basics (i.e. nominal typing plus basic sub-typing), hopefully such that people can then take that and extend it.

As for _when_ I'll do that, that depends on when I can convince my inner critic to actually commit to the idea :)

[1]: https://github.com/yorickpeterse/pattern-matching-in-rust


I've been implementing a language with a TypeScript-like type system, and while the core system is pretty intuitive (values combining with other values, determining whether basic value types fit into others), some of the fringes have been challenging:

- Generics, especially inferring generic type params

- Recursive types

- Deeply mutable vs immutable types

- Type refinement based on checks/usage

- Giving good error messages for deep type mismatches (though I'm not sure TypeScript itself has figured this one out yet, lol)

etc. And even the stuff I've figured out, I figured out almost all of it from scratch (sometimes having to bail and take a totally new approach midway through). I would love to read a book giving me a proper bottom-up walkthrough of the current landscape of type system implementation.
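
For what it's worth, the intuitive core mentioned above can stay small even when the fringes get hard; a toy sketch of a structural "fits into" check (records as dicts, no generics or refinement):

  def fits(a, b):
      """Structurally: can a value of type `a` be used where `b` is expected?"""
      if b == "any" or a == b:
          return True
      if isinstance(a, dict) and isinstance(b, dict):
          # `a` must supply every field `b` requires, with fitting types;
          # extra fields in `a` are fine (width subtyping).
          return all(k in a and fits(a[k], b[k]) for k in b)
      return False

  point3d = {"x": "number", "y": "number", "z": "number"}
  point2d = {"x": "number", "y": "number"}
  print(fits(point3d, point2d))  # True
  print(fits(point2d, point3d))  # False: missing field z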


I know nothing about how one would make a type checker, but it sort of feels like your comment is "As long as you don't encounter any issues, there won't be any issues", no?


TypeScript type-checking is quite complicated. I'm sure no human being on earth could pass a test of "does this compile"

But that's an exception, and deliberately decided to be complex.


Would type-inference also "screw up the semantics"? My understanding is that this generally operates via some kind of constraints solving.


Depends on how much type inference you want.

If you want ML-style total type inference, where you expect the compiler to deduce the parameter and return types of your functions, then yes.

Local variable type inference isn't hard at all, unless you want to do stuff like Rust: declare a variable without an initial value and infer its type from the first assignment further down in the code.
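
A toy sketch of why the easy case is easy (hypothetical AST: literals, variable names as strings, binary ops as tuples): when every variable has an initializer, the type of a local is just the type of its right-hand side. The Rust-style case is harder because a declaration without an initializer needs a placeholder type variable that is only resolved by a later assignment.

  def infer(expr, env):
      if isinstance(expr, bool):   # bool before int: bool is a subclass of int
          return "bool"
      if isinstance(expr, int):
          return "int"
      if isinstance(expr, str):    # a variable reference
          return env[expr]
      op, lhs, rhs = expr          # e.g. ("+", "x", 1)
      lt, rt = infer(lhs, env), infer(rhs, env)
      if lt != rt:
          raise TypeError(f"{op}: {lt} vs {rt}")
      return lt

  env = {}
  env["x"] = infer(42, env)             # let x = 42    -> x: int
  env["y"] = infer(("+", "x", 1), env)  # let y = x + 1 -> y: int
  print(env)                            # {'x': 'int', 'y': 'int'}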


I’ve written about this at some length, but long story short: type modeling and semantic checking are intimately connected to the language semantics, so aside from very simple “look at the properties of the underlying nodes, derive properties of the current node”, it is hard to present any generic approach. In fact, there are many possible ways to approach modeling types even with the exact same semantics, leading to different trade-offs (just compare different C compilers and you’ll see!).

So this is actually harder to do than it might seem. The best advice I have is to contribute to a compiler or two and understand their semantic checking models.


> about making a static ML-like language without starting with hardcore theory.

The book Applicative Order Programming: The Standard ML Perspective has you implement an ML-like language at the end (the book is mostly about programming in Standard ML itself). Chapter 10 covers type checking and type inference. Chapter 11 covers interpretation via graph reduction. Chapter 12 covers compilation of the ML-like language to an abstract stack machine. The code is all in Standard ML. It's very direct and practical without getting bogged down in theory, although it does talk about theory, and it's not as gentle and fun as Crafting Interpreters.

It was published in 1991. It's on libgen; I'd recommend downloading a PDF and reading those chapters in order to judge for yourself.


I've written a type checker or two. My experience: https://lambdaland.org/posts/2022-07-27_how_to_write_a_type_...

I left some links/references to books that helped me out.


Might this help? I wrote it: https://azdavis.net/posts/define-pl-01/


“Modern Compiler Implementation” by Appel is definitely worth a look. I read the ML version, and the code is really clean/readable (not sure about the C and Java versions).


Your mileage may vary. I think it's a good article. But sometimes a PL has to go down an unconventional path to make something new. If you follow a best-practice "rulebook", your PL will be like every other PL.

I think the most important part is figuring out who is going to use the PL and what problems they are solving. As precisely as possible, best with a list of concrete Personas and Projects. This creates a "value system", which makes it easier to answer all questions during implementation.


> sometimes a PL has to go down an unconventional path to make something new.

Absolutely, just look at Haskell as an example.

> most important part is figuring out who is going to use the PL and what problems they are solving.

Completely agree. Rust and Go have received a lot of criticism over the years, but I think that is often down to them not being as general purpose as many people try to claim. The creators had particular users and use cases in mind.


I think Rust and Go have received that criticism also because they precisely "dared" to take an unconventional approach: Go is very opinionated and uses a radically simplified syntax without inheritance, exceptions and other things people are used to from other languages, which leads to many people trying to bend Go to their style of programming and getting frustrated; Rust has its focus on performance ("zero cost abstractions") and being memory safe without using GC, which leads to its famously steep learning curve. But I agree with OP: if you are going to start a new programming language today, you'd better try to bring some new ideas to the table, otherwise you're just crowding an already crowded space even more.


> I think that is often down to them not being as general purpose as many people try to claim.

Is the claim actually that they are general purpose, or do people not understand the claims being made? Go has been very explicit that it is designed for systems programming. The Rust camp seems to hold a similar position. "Systems" implies that it is not designed for all tasks. But I'm not sure "systems" is properly understood by a lot of people.


Fun fact: Haskell was established by a committee based on best practices from a jungle of ML-type languages.


Maybe people just like to use general purpose languages. It should not be too hard to marry the good parts of major language families.


> It should not be too hard to marry the good parts of major language families

It is. Language features generally aren't isolated modules that you can add and remove freely: they interact, and "the good parts" are only good because they play nicely with everything else. You can't have Lisp macros without Lisp (or otherwise homoiconic) syntax, or Haskell's concurrency without its effect tracking, or global type inference with Java-style inheritance.


As someone who has worked with massive typed Python codebases, I 100% agree with the author on Gradual Typing. It's literally the worst of both worlds. It's actually even worse than that because it gives the illusion of some safety when there isn't any, especially in Python where most of the tooling "fails open" by default and won't tell you something is wrong despite having annotations.


I disagree, as somebody who has also worked with tons of type-hinted Python code. Gradual typing is obviously worse than real static typing, but it's a step up from full dynamic typing. It has caught many bugs for me before hitting them in runtime, and provides good documentation about what I can expect a function to accept and return without forcing me to read prose.

Type hints don't provide any safety, though. That was never the goal, given that they're strictly optional and don't really exist at runtime anyway (though I have written some experimental monstrosities that used annotations for code generation at runtime). They're documentation in a standard form that static analysis tools can leverage.

I really can't imagine a situation where having type hints in Python is worse than simply not having them. They're not the worst of both worlds, they're a compromise with some of the benefits and drawbacks of each.


> I really can't imagine a situation where having type hints in Python is worse than simply not having them.

Can’t they lie / go out of sync with what’s actually happening? IMO, an unenforced type hint is very dangerous because of this - it can give you a false sense of confidence in a scenario where you would otherwise write more defensive code.


I suppose, if you're relying on type stubs. Ordinary in-line type hints won't lie or go out of sync without complaining somewhere though.

Defensive code is good, but I'm absolutely sick of writing code that has to be defensive about the type of absolutely every variable everywhere. It's ridiculous, verbose, and nearly impossible to do universally. THAT is the worst of both worlds. Having to manually fret about the types that every variable may hold. At the point that you're having to write defensive code due to dynamic types, you've already lost the advantages of dynamic types entirely.

I use type hints to say "use and expect these types or get Undefined Behavior", and that contract is good enough for me.


> Ordinary in-line type hints won't lie or go out of sync without complaining somewhere though

This is demonstrably untrue with the default configuration of Mypy. It will silently ignore explicit hints, inline or not, in certain situations.


I'm going to need an example of that. I tend to use Pyright, but when I used MyPy, I never had it simply ignore explicit hints.

If it's demonstrably untrue, you should be able to demonstrate it, right?


Here's the quickest, simplest example I could come up with off the top of my head:

https://paste.sr.ht/~chiefnoah/7e07a961cf266fa620d1fd2d31ba2...

This particular issue will get picked up with `--strict`, but it's nearly impossible to do that on a large codebase with typing added post-hoc.

Pyright has saner defaults, it catches this particular issue: https://paste.sr.ht/~chiefnoah/80816fded2a08a03ca80804d524ee...


Thanks for the example. That's happening because the `main` function is an untyped function. You can fix that by changing `def main():` to `def main() -> None:`. You can also make mypy pick it up with `--check-untyped-defs`. I always used mypy with --strict, so I forgot that it wasn't the default configuration.
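
A minimal reconstruction of that failure mode (hypothetical snippet, not the linked paste):

  def double(x: int) -> int:
      return x * 2

  def main():           # unannotated, so mypy's default config skips this body
      double("oops")    # wrong argument type, but `mypy file.py` reports nothing

  # `mypy --strict file.py` or `mypy --check-untyped-defs file.py` flags it,
  # as does annotating main as `def main() -> None:`.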

I agree, though. Pyright has better defaults. It's not great that mypy just completely skips all functions without typing information given. It makes a lot more sense for it to just check everything and assume Any on all unspecified arguments and returns. It's still valuable to type-check the contents of functions with unspecified types.


Yes, this is my whole point. Mypy by default will silently allow blatantly incorrect typing which is worse than no types IMO.


Yes, if you don’t run the type-checker regularly and just run the code itself, there’s nothing preventing them from going out of sync. Type hints are similar to tests in that way.


I suspect Python is particularly bad in this regard. For example, PHP's gradual typing is enforced with runtime assertions.


Any? I regularly find bugs. I don’t find every bug, but finding some bugs is better than none. Especially for the minimal cost of writing annotations.


> I don’t find every bug, but finding some bugs is better than none.

I used to think this, but based on experience I'm now less convinced. Finding most bugs, like real static typing does, is great; you can significantly reduce your test coverage and iterate with more confidence. Finding a few bugs is pretty useless if you're not finding enough to actually change your workflow.


Finding a few bugs is very useful if they're the kind of bugs that can cause problems in production but usually not in development or testing, like not checking an optional that is very rarely null.

It's not about workflow or finding enough bugs, but finding bugs that you might not have otherwise seen can be monumentally beneficial.
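
The optional case is the canonical example; a small sketch of the kind of bug a checker catches before production (hypothetical function):

  from typing import Optional

  def find_user(user_id: int) -> Optional[str]:
      # Returns None for unknown ids; rare enough to slip through testing.
      return {1: "alice"}.get(user_id)

  name = find_user(2)
  print(name.upper())  # checker: `name` may be None; at runtime: AttributeError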


> Finding a few bugs is very useful if they're the kind of bugs that can cause problems in production but usually not in development or testing, like not checking an optional that is very rarely null.

There's a kind of excluded middle here though. Either that kind of bug hits production often enough to matter - in which case a checker that catches it sometimes isn't good enough, you need a checker that eliminates it completely. Or it doesn't hit production often enough to matter, in which case a checker is of limited use.


How is it worse than no typing? If you're not testing adequately because you think type safety is enough, you done f'd up.


Gradual typing is great. The parts you want precision on are statically typed, the parts that don't matter dynamic, and they fit together sanely at the language level.

Writing type annotations in comments for an optional preprocessor to pass judgement on before the actual implementation ignores them is a bad thing. But that's on python, not on gradual typing.

(An any type + static is also fine if you have a way to retrieve type information at runtime, and there's a sense in which template instantiations of a generic function and a function taking arguments of type any are the same thing as each other)


Python's type hints are just that, hints. They aren't static type annotations. Cython uses gradual typing, but the mainstream Python implementation does not actually offer gradual typing. It's still dynamically typed, but with an annotation mechanism that lets other tools do some analysis for you.


It's not the gradual typing. It's that static typing gives you the illusion of some safety where there isn't any.


I know someone that was on the Dart team and they say the same thing.


Looking at the table at the end of the article: notice how Scala and Elixir both took only 3 years, because they were targeting an existing platform. Clojure took only 2 years and was developed by a single person.


> only 3 years, because they were targeting an existing platform. Clojure took only 2 years and was d̶e̶v̶e̶l̶o̶p̶e̶d̶ ̶b̶y̶ targeting a single person.


What do you think about clojure?


I like it, and certainly much more than java which I just can't. I have trouble committing to clojure fully for larger projects because it's still a bit niche in terms of the size of the community, though it's nice that it interfaces with java; for smaller one-off things I just use scheme.

oh, clojure also is a little bit weird being a sort of grab-bag collection of data structures that it inherits from java and then turns lisp-ish. doesn't make it bad, each thing they add is nice, just feels a little motley

also, i refuse to call it "closure", i pronounce the j


Bravo, namers of things don't get to instruct the world on how they are pronounced, we will pronounce it as it is spelled. Likewise lie-nuks, ess-queue-ell, ...


Clojure has its own persistent immutable data structures, which are one of the main components of the language. They do implement Java collection interfaces, but that’s mostly for passing them to existing Java libraries without conversion (unlike Scala, where you have to convert).


> also, i refuse to call it "closure", i pronounce the j

Wait, it's not "clodger"?


> a bit niche in terms of the size of the community

i say it's the largest lisp-like community out there tbh.


I like lisp but that is saying your dog is very big for a chihuahua.


> also, i refuse to call it "closure", i pronounce the j

Wow so edgy


Clojure is a lisp, it basically got to skip the parsing and lexing stages


"Avoid writing your own code generator, linker, etc"

There's a certain dearth of pluggable code generators and linkers. Well, not on GNU/Linux, where you get both as(1) and ld(1) practically out of the box, but making your compiler emit a PE/COFF on Windows is a pain. You either bring your own homemade codegen and linker or use LLVM, using Microsoft's ml64.exe and link.exe is incredibly impractical.


The only "good" programming language is the language I make! Everybody thinks different and wants to optimize for different things. There is no existing language I've found that I "love"; they each have features I like but none have all the features together.

I've drafted up a language called "Moth" that has a lot of "meta power" to program block scope any way you want, as most languages hard-wire scoping rules, which I find limiting. Things like "classes", "functions", while-loops etc. would be defined by libraries, NOT the language. It's like lambdas on steroids and without bloated arrow syntax. But it may run slow as molasses, as scoping meta power adds lots of compiler/interpreter indirection.

However, it may turn out that only a few scoping rules are practical in most cases, and the compiler could then optimize for those. It would then only be slow if you are doing something "weird" with scope. Stick to a fixed known set, and things zip along. But finding that set requires R&D and road testing.


> The only "good" programming language is the language I make! Everybody thinks different

Yeah. There's always one little thing or another that bothers me in every language I've learned. Guess I've just come full circle now that I've finally made my own. No doubt it will bother someone else too. If anyone ever uses it.


Kernel[1] has the ability to make scoping more abstract, but in a clean way which doesn't introduce the "spooky action at distance" of full dynamic scoping. It's an improvement on the older fexprs.

Essentially, you have a lambda-like combiner called an operative which does not reduce its operands, and implicitly receives a reference to the caller's dynamic environment as an additional argument. The operative body can then optionally evaluate anything as if it were the caller, with even the ability to mutate the caller's local scope. The parent scopes of the caller are accessible for evaluation, but not mutation. You can only mutate an environment for which you have a direct reference - and the environments themselves are first-class values which you can pass around and store in other environments. It is only possible to mutate the parent environment of a caller if the caller's caller has passed a reference to its own environment, and the caller forwards that reference explicitly.

You can create empty environments or environments from an initial set of bindings, and build on them from there, and you can also isolate the static environment from a callee, ensuring that only a limited set of bindings are accessible to the callee. In effect, this enables a kind of "sandboxing" approach - where you can easily do things like load an external plugin which can be written using a limited subset of Kernel features, and only access the specific functions of your host program which you allow. It provides a lot of abstractive power, with the ability to have things like types and classes defined as libraries. The operatives can introduce "sub-languages", which can simulate the behavior of other language semantics.

As you suspect, this runs pretty slow because it's almost impossible to optimize ahead-of-time. A given piece of code is just a tree of symbols and self-evaluating atoms, and the symbols in some code are only given any functional meaning by the environment it is evaluated in. Kernel busts the myth that "compilation vs interpretation is just an implementation choice," and explores what can happen when we go all-in on interpretation. The author has written about the nature of interpreted programming languages and how this differs from compiled languages.[2]

There are opportunities to have compilation and performance without much loss of abstractive power, but this is not a focus of Kernel, whose design goals are described in detail in the report. I've done a lot of R&D on making performance acceptable for a Kernel-like language. The main insight is that you should be able to make some assumptions about the environment passed to an operative without knowing anything else about the environment. I do this by modelling environments as row polymorphic types, where the operative specifies a set of bindings it expects the caller's dynamic environment to contain, complete with type signatures, and assumes these bindings behave in some specific way. (E.g., that the binding `+` means addition, which is in no way guaranteed by Kernel, and that the type signature of `+` is `Number, Number -> Number`.)
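
A toy sketch of the central mechanism (emphatically not Kernel itself: just a few lines of Python where an operative receives its operands unevaluated, plus the caller's environment):

  class Operative:
      """A combiner that receives raw operands plus the caller's environment."""
      def __init__(self, fn):
          self.fn = fn

  def evaluate(expr, env):
      if isinstance(expr, str):                    # symbol: look it up
          return env[expr]
      if not isinstance(expr, list):               # atom: self-evaluating
          return expr
      combiner = evaluate(expr[0], env)
      if isinstance(combiner, Operative):
          return combiner.fn(expr[1:], env)        # operands stay unevaluated
      args = [evaluate(e, env) for e in expr[1:]]  # applicative: evaluate first
      return combiner(*args)

  # $if becomes a library definition rather than a language primitive:
  dollar_if = Operative(lambda ops, env:
      evaluate(ops[1], env) if evaluate(ops[0], env) else evaluate(ops[2], env))

  env = {"$if": dollar_if, "flag": True, "print": print}
  evaluate(["print", ["$if", "flag", 1, 2]], env)  # prints 1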

---

A mostly complete implementation of Kernel: https://github.com/dbohdan/klisp (Cloned from Andres Navarro's bitbucket repo which is no longer available).

Some performance improvements of klisp, with a lot of hand-written 32-bit x86 assembly: https://github.com/ghosthamlet/bronze-age-lisp (Cloned from Oto Havle's bitbucket repo, also no longer available).

Another implementation of Kernel, with some examples of defining records, classes, objects, generators etc as libraries: https://github.com/vito/hummus/tree/master

---

[1]:http://web.cs.wpi.edu/%7Ejshutt/kernel.html

[2]:https://fexpr.blogspot.com/2016/08/interpreted-programming-l...


> I used an S-expression syntax, instead of designing my own syntax and writing a parser for it.

> This meant I was able to experiment with the semantics and virtual machine of the language, instead of worrying over what keyword to use for function definitions.

Yeah. I see a lot of people asking why people are so fascinated by lisp and why there are so many lisps out there. I think this is a huge reason. It certainly was for me.

I just wanted to get some ideas working as soon as possible. Lisp is the easiest language to parse that I've ever seen; I managed to write a parser by hand. And yet it's a fully featured programming language. It just gives you huge power for very low effort. It took a single bit to add metaprogramming to my lisp:

  // normal calls evaluate their arguments first; special forms/macros
  // receive them unevaluated, which is the whole metaprogramming hook
  if (function.flags.evaluate_arguments) {
    arguments = evaluate_all(interpreter, environment, arguments);
  }
I see it as a little frontend for my data structures. I get to work on them endlessly and everything I do improves something.


> Lisp is the easiest language to parse that I've ever seen, managed to write a parser by hand.

I've written a dozen or so parsers for different languages by hand: the parser is easy regardless of the syntax you choose. The complicated bit is always compilation/interpretation.

The biggest value I see to using S-expressions isn't the time saved in writing a parser, it's reducing the amount of bikeshedding you'll be tempted to do fiddling with syntax.


Aren't languages like C notoriously difficult to parse? I've read that they're actually context sensitive. IIRC much of Go's syntax was designed to avoid parsing complexity.


> Aren't languages like C notoriously difficult to parse? I've read that they're actually context sensitive.

All programming languages with identifiers are context-sensitive if you put well-formedness into the grammar. C is not exactly difficult to parse, rather it just needs some specific approach compared to most other languages. The biggest (and technically the only [1]) ambiguity is a confusion between type-specifier and primary-expression, which is a roundabout way to say that `(A) * B` is either a multiplication or a dereference-then-cast expression depending on what `A` is in the current scope. But it also has a typical solution that works for most parsers: the parser keeps a stack of scopes with contained identifiers, and the lexer distinguishes type identifiers from other identifiers with that information. This approach is so popular that it even has a name, "semantic feedback".

[1] As an example, the notorious pointer and array declaration syntax is actually an unambiguous context-free grammar. It is notorious only because we humans can't easily read one.
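
A toy sketch of that feedback loop (hypothetical and heavily simplified): the lexer classifies an identifier by consulting the scope stack the parser maintains.

  # The parser pushes typedef names into the current scope as it parses them.
  scopes = [set()]                 # stack of sets of type names, innermost last

  def declare_typedef(name):
      scopes[-1].add(name)

  def lex_identifier(name):
      kind = "TYPE_NAME" if any(name in s for s in scopes) else "IDENTIFIER"
      return (kind, name)

  # `(A) * B` parses as multiplication while A is a plain identifier...
  print(lex_identifier("A"))       # ('IDENTIFIER', 'A')
  # ...but as a cast of a dereference once `typedef int A;` is in scope:
  declare_typedef("A")
  print(lex_identifier("A"))       # ('TYPE_NAME', 'A')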

> IIRC much of Go's syntax was designed to avoid parsing complexity.

Go's semantics (in particular, name resolution and the module system) were designed to avoid complexity. Its syntax is quite normal, modulo personal and subjective bits which all languages have. I can't see any particular mention of syntax decisions made for parsing performance in the initial spec document [2].

[2] https://github.com/golang/go/blob/18c5b488a3b2e218c0e0cf2a7d...


I suspect that the interpretation of C cannot be specified in a brief page of C code. For one thing, the language doesn't supply the data structures for representing any aspect of itself; the code will have to invent those: declarations, definitions, statements, types. Then handle a ton of cases.

It will not be obvious that what the C code is doing is C interpretation, because all those data structures don't resemble the C syntax.

The Lisp meta-circular interpreter side-steps that; the issue is settled elsewhere. It exists against a backdrop where the easy correspondence between the printed syntax and the data structure being handled by the interpreter is taken for granted.

The C would need the same kind of backdrop; and that would be a lot of documentation.


For the context sensitive part, all you need is a symbol table to check which names are for types. I haven't actually done it but I don't think it is hard.

The real problem is the preprocessor: it's very hard to implement, let alone implement in a standards-compliant way (I haven't finished one either, but everybody says so). And without a preprocessor and proper include processing, you simply can't parse correctly, because you lack the context of which types are defined (besides missing preprocessor symbols).

Another problem that comes with that is that you always have to parse everything from beginning to end -- including the include files, which can easily be 10s or 100s of thousands of lines of code. I haven't seen a really performant and always correct parser for C IDEs; I'm not sure if one exists.


See the clockwise/spiral rule!


Type syntax started out pretty easy to parse: basically, you just parse a C expression. It became more complex and inconsistent later (when, for example, types for function parameters were added), but IIRC the clockwise/spiral rule is nothing like the idea behind the syntax. I think it is too complicated / not getting the gist of it.


That rings untrue in the face of how John McCarthy specified Lisp interpretation, on paper, in Lisp itself, quite briefly, and didn't even suspect it would be executable. Then Steve Russell comes along, and bootstraps it by hand-translating it to code.


I found that a naive frontend for a language is not that big of a deal with something like ANTLR. The benefit is that you have a formal grammar syntax description, and you can iterate on that endlessly without too many code changes down the line.


It isn't with a handwritten lexer/parser either; the real work starts after the parsing is complete. Of course, there are two possibilities: often changing them can either result in a cleaner, more readable result or a _real_ mess ;) But you have to rewrite it by hand anyways if you want to get better (or usable) error messages and incremental parsing for an LSP.


I see lisp as sort of "no syntax". It's basically the AST. Which is a good thing: you can concentrate on the compiler and add syntax later. If you are working on a compiler for a non-lisp PL, it makes sense to log the AST after parsing as a lisp: a readable AST.
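
A tiny sketch of that logging trick (assuming tuple-shaped AST nodes):

  def to_sexpr(node):
      """Render a nested AST as a lisp-style s-expression for debugging."""
      if isinstance(node, tuple):  # e.g. ("assign", "x", ("add", "y", 1))
          return "(" + " ".join(to_sexpr(child) for child in node) + ")"
      return str(node)

  ast = ("assign", "x", ("add", ("mul", 2, "y"), 1))
  print(to_sexpr(ast))  # (assign x (add (mul 2 y) 1))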


> Avoid self-hosting your compiler

Not sure this is so problematic anymore.

The Zig folks targeted WASM so that they could bootstrap without needing a second compiler implementation. Compilers don't need a lot of POSIX in order to be functional.


Sigh. Please don’t refer to the rather exceptional nature of Zig’s self hosting. There has been an insane amount of work put into the Zig compiler by various contributors, probably easily eclipsing the effort to write two separate compilers. The whole WASM deal comes AFTER they essentially built the compiler both in C++ and Zig to reduce bootstrapping steps. That just addresses the bootstrapping cost and not the fact that you have to build the compiler at least twice - and if you do it before 1.0 - more than twice. Cargo-culting Zig’s development isn’t going to give you Zig’s success. Zig is successful because of Andrew presenting a vision of a language that lots of people were attracted to, and then cultivating the community well. It’s a well-known fact in the community that the first C++ compiler had to go because it wasn’t well written, not because self hosting is great.


Um, so?

The fact that Zig did a boatload of extra work due to path dependence is true.

However, this in no way contradicts that self-hosting via WASM/WASI is likely a far better idea than maintaining two complete compilers in different codebases and languages solely in order to bootstrap.

In fact, nowadays, someone building a new language and compiler is probably better off targeting WASM/WASI before any other architecture.


Oh that's a great point! Didn't someone give a talk about that too?



>>"In reality, gradual typing ends up giving you the worst of both dynamic and static typing: you get the uncertainty and lack of safety (in dynamically typed contexts) of dynamic typing, and the cost of trying to fit your ideas into a statically typed type system. I also found that the use of gradual typing didn't actually make me more productive compared to using static typing."

The Elixir team didn’t get that memo because they are actively in the process of researching and working on a gradual type implementation.[1]

[1] https://elixir-lang.org/blog/2023/09/20/strong-arrows-gradua...


Pursuing gradual typing makes sense when you've got an established dynamic language and want to incorporate typing into it (forget Elixir, let's just look at the massive success of TypeScript). But as OP's recommendation says - "for new languages", gradual typing has lots of costs and fewer benefits.

To put it in other terms: my informal impression is that the dynamic "features" of TypeScript are used grudgingly; the community strongly pushes towards strictness, eliminating anys, preferring well-typed libraries, and so on. There's little appetite to - in a single project, unless absolutely required - mix-and-match dynamism with staticness, which is the thing that gradual typing gets you. Rather, it feels like we're migrating from dynamic to static and gradual typing is just how we're doing the migration. But in the case of a new language, why not just start at the destination?


I wish Python was further down this path— I experimented with adding annotations and the mypy checker to a bunch of code at my company, and it seemed hopeless. Most libraries didn't have annotations at all, or there were annotations being maintained by a third party in a separate pypi package, but the annotations were out of sync with the main project and thus blew up in weird ways.

I feel for the people trying to develop these gradual systems but it's truly a herculean task, especially in the Python community that is now understandably so extremely shy about major breaking changes.


I might be missing something, but the appeal of gradual typing to me is that I can mostly type functions, providing safe input/output boundaries, and avoid having to type every single variable (unless I have to do so for performance reasons, as I do in Common Lisp).

This approach is comfortable to me both in Erlang and in Common Lisp, I see it as a balance between safety/performances and development speed (and I'm saying that as someone using Go for all professional development and being really happy with its full static typing).


> avoid having to type every single variable

Most static languages don't make you type every single variable anymore. Java, C++, Rust, C#, and many others let you make the compiler infer types where reasonably possible. That's still full static typing.

My Python and Rust have about the same kinds of explicit type annotations in roughly the same places. My C++ has a little bit more, just because `Foo obj{a}` is more idiomatic than `auto foo = Foo{a}`.


You can do that with type inference and a strong, static type system.


I think the author's recommendation has some needed context:

> Recommendation: either make your language statically typed or dynamically typed (preferably statically typed, but that's a different topic), as gradual typing just doesn't make sense for new languages.

The "for new languages" part is really important. Gradual typing makes a lot of sense when you are trying to retrofit some amount of static checking onto an existing enormous corpus of dynamically typed code. That's the case for TypeScript with JavaScript and Elixir with Elixir and Erlang.


Raku (formerly known as Perl 6 and a radically different language than Perl 5) uses gradual typing on purpose. It's a cool language that has a lot of advanced ideas, but has basically been in either the design or beta stage for like two decades.


>for like two decades.

IOW, Raku is a gradual type of language.


Other puns are not as good as this pun


You're right about this applying to new languages. I've added a note to the article to (hopefully) make this more clear.


Oh yeah, as far as I know Elixir has a lot of calling back and forth with Erlang code. So that makes for a super tricky type system problem.

I think the macros make it even harder. Elixir appears to be done almost all with macros -- there are lots of little "compilers" to Erlang/BEAM.


What about Julia?


Julia isn't gradually typed. It's strongly but dynamically typed.


Well, there's no memo (real or as a manner of speaking). It's just this person's preference.

I think gradual typing is a nice concept, and his arguments against it amount to "gradual typing is not static typing", which is like the whole point. E.g. he goes on about how the compiler can't do some optimizations on functions using gradual typing, but, well, it isn't supposed to anyway.

The benefit of gradual typing is that you can make your program fully dynamic (e.g. for quick exploration), and if you want more assurances and optimizations make it fully static, or if you just want that for specific parts, do them static. And you have the option to go for any of those 3 things from the start.


The problem is that the whole "it's faster (productivity-wise) than static typing" claim has, as far as I know, never actually been proven (certainly not through repeated studies).

Having worked with both dynamically and statically typed languages extensively, I never felt I was _more_ productive in a dynamically (or gradually) typed language compared to one that was just statically typed. For very basic programs you may spend a bit more time typing in a statically typed language due to having to add type annotations, but typing isn't what I spend most of my time on, so it's not a big deal.

In addition, that work you need to (potentially) pay upfront will help you a lot in the long term, so I suspect that for anything but the most basic programs static typing leads to better productivity over time.


>The problem is that the whole "it's faster (productivity-wise) than static typing" claim has, as far as I know, never actually been proven (certainly not through repeated studies).

Neither has the opposite, so there's that.

There's an ACM paper too ("An Experiment About Static and Dynamic Type Systems", 2010) which found higher productivity for the same code quality with dynamic typing. Of course a few papers here and there, pro or against, are as good as none. It's hardly a well studied area.

Besides, one or the other proven better doesn't mean much, just like you liking eggs over easy vs scrambled eggs doesn't depend on some study. If it works for you, and you're more productive with dynamic typing or static typing, use that.

Even if a study "proved" that one kind is "more productive" based on some statistics from measuring some group, or that it has "less bugs" most people when given a choice would still use what they prefer and makes them, as individuals, more productive and happy coding.

>Having worked with both dynamically and statically typed languages extensively, I never felt I was _more_ productive in a dynamically (or gradually) typed language compared to one that was just statically typed.

Depends on the type of program, the type of programming (e.g. imperative/declarative/functional/logical/OO or some combination and so on), the program's scale, the team size, and other aspects, including individual aptitude and preference, not to mention the language and its semantics beyond dynamic/static (and the ecosystem too). I'd certainly be way more productive using dynamic numpy than some C equivalent, even if as a lib it had feature parity.

>In addition, that work you need to (potentially) pay upfront will help you a lot in the long term, so I suspect that for anything but the most basic programs static typing leads to better productivity over time.

There are problems where upfront work is not a benefit, e.g. if it means getting behind in building your MVP or getting behind a competitor adding new features faster while you "perfect it", and your early-stage startup loses steam. Also for things where the overhead of upfront work might put you off from even attempting them. It can also be a problem to have big upfront costs for exploratory programming and searching the problem space for your design/solution.


I think it's unfair to criticise the lack of studies for that specific question without acknowledging that there are exceedingly few studies that show a benefit for static typing despite the fact that a huge number of people feel that there must be an effect.


IIRC there was a study that looked at the proportion of Github issues labeled as bugs, by language. Clojure came in with the lowest proportion. I overall prefer static typing, depending on what I'm doing, but I think avoiding ubiquitous mutable state is a lot more important than typing.


What you can do is the opposite: integrate a “dynamic” type into a statically-typed language, like C# does: https://learn.microsoft.com/en-us/dotnet/csharp/advanced-top...

This lets you use dynamic typing when you want, and enables more seamless interoperability with dynamically-typed languages.


Is it not literally the same on a type system level? Of course it could be very different from a user-perspective, as the ratio of `dynamic` will be very low in case of C#.


It’s not the same, because with “dynamic” the client code decides that it wants to use a specific object dynamically rather than statically, whereas with gradual typing it’s the object/class that decides that it now has (or hasn’t) type annotations. Aside from that, the default is reversed.


A sibling addresses the nuance expressed in “for new languages”. But even if the original author’s opinion was that gradual typing is always bad, why does that mean we should dismiss the Elixir team’s efforts?

I have opinions about Elixir, mostly around aesthetics (for context, I started programming in Python, not Ruby), but that doesn’t mean my opinions are objectively more valid than those of a genius like José Valim.

I’ve actually found Elixir to be very well-designed and internally consistent, and after two years using it, it’s obvious to me that José and team are very thoughtful and deliberative. I don’t think they want to introduce gradual typing because it’s trendy.


> In reality, gradual typing ends up giving you the worst of both dynamic and static typing: you get the uncertainty and lack of safety (in dynamically typed contexts) of dynamic typing, and the cost of trying to fit your ideas into a statically typed type system.

Depends on how you implement gradual typing. There's a spectrum [1] of gradual languages, and the differences between, say TypeScript and Reticulated Python matter a great deal.

It is true that you can have some serious performance hits on the migration from dynamically typed -> fully typed code; my advisor Ben Greenman is doing work examining that space of performance en route to fully-typed systems.

In practice, there are several gradually typed languages that give you very good performance when all the code is typed—i.e. you don't have to pay a cost just for having dynamic in your language. You might have to pay a cost when you use it, and the cost can vary dramatically.

Note that's just performance—sometimes performance matters less than being able to prototype something fast in a language, which is a feature I find valuable.

> In fact, the few places where dynamic typing was used in the standard library was due to the type system not being powerful enough to provide a better alternative.

Hey, that sounds like either a win for gradual typing, or a sign that your type system needs some serious work! ;-)

[1]: https://prl.khoury.northeastern.edu/blog/2018/10/06/a-spectr...


I really enjoyed the article.

I think one way to simplify language creation is to use an existing language with somewhat similar operational semantics as a compilation target. This simplifies the backend a lot and leaves more time to explore what the language (frontend) should look like. The backend can always be rewritten at a later time. My personal choice is usually JavaScript[1].

Regarding type checkers/type inference, I've also run into difficulties with this topic, and I've written several articles trying to make it more approachable[2].

[1]: https://gilmi.me/blog/post/2023/07/08/js-as-a-target

[2]: https://gilmi.me/blog/tags/type%20inference


Related post "Inko Programming Language" which might get some interesting comments.

https://news.ycombinator.com/item?id=38270265

https://inko-lang.org/


I personally enjoyed this article very much; it resonates with what we are doing in the KCL language [1]. KCL is a cloud-native configuration and policy language, and we have endowed KCL with a semantic-level gradual type system. It is a bit like using the TypeScript type checker while writing project configurations in JavaScript, but we have not completely discarded runtime type checking, because as a configuration language we need to strike a balance between efficiency and stability.

But on the other hand, we are currently facing certain challenges. KCL is somewhat ambiguous in grammar and semantics, e.g. the record type, and we are working hard to improve it.

[1] https://github.com/kcl-lang


It should be added that we also use Rust to fully build the front-end of the language and compile it into native code using LLVM.


Having spent a decade writing my own programming language it is funny to read these recommendations. I don't think it is fair to say what someone should not do, as writing a programming language by definition is ridiculously hard!



"either make your language statically typed or dynamically typed (preferably statically typed, but that's a different topic), as gradual typing just doesn't make sense for new languages."

Is that because of type inference?


Your question doesn't make a lot of sense. The entirety of gradual typing works by type inference, as does the static typing on any new language.


> The entirety of gradual typing works by type inference

Is not a correct statement. You can use gradual typing with explicit type annotations or via inference, it makes no difference to the concept of gradual typing. Gradual typing itself is a way of handling the case of a language being both statically and dynamically typed and handling the interaction between the two portions.


Well, technically you can have gradual typing without inference. But you'll need to annotate every single value to get any matching out of it. If you make a language like that, the types will only be useful for documentation purposes.


> But you'll need to annotate every single value to get any matching out of it.

Ok, if we're taking "1 is an int" as type inferencing then, yes, every statically typed language, at least every mainstream one, has at least a small amount of type inference since we don't, in C for instance, have to annotate values. But C does not infer the types of its variables or functions, nor do you have to annotate them in every location where they are used. But that's not inference either, that's using the information determined by annotating the variables.

----------

My main point was that it's weird to say the entirety of gradual typing works by type inference. It works by permitting mixing statically typed and dynamically typed code together in one language. Whether the statically typed portion uses type inference or more "classical" type annotations is orthogonal to the way it works under the hood.


You can have an explicit "any" or "dynamic" type that can be used in place of a static type where you want dynamism. C# actually has this.


What part of the statement "is" because of type inference?


Most fringe languages are vanity projects that address a limited number of use-cases.

There are always language-specific features that may reduce a given problem's implementation complexity, but it often depends how much time people are willing to commit to "reinventing the wheel". If you find yourself struggling with support scaffolding issues instead of the core challenge, then chances are you are using the wrong tool.

https://m.xkcd.com/927/

I am not suggesting Erlang/Elixir, Lisp, and Julia are perfect... but at least one is not trying to build a castle out of grains of sand. The only groups I see freeing themselves of C/C++ library inertia are the Go community and, to a lesser extent, the Julia community.

Have a wonderful day, =)


It's a vanity project until it isn't. I don't know what makes a language succeed but it's clearly possible. I've seen quite a few languages rise from small to widely used. Zig is a prominent example but there are others.


I wouldn’t call Zig widely used just yet. It is definitely not in the same camp as hobby languages, but it is still incredibly small.

Nonetheless, it is a very welcome addition to the low-level landscape with some cool ideas.


I can only hope my own language will become "incredibly small" like that one day. :)

Even if a language doesn't take off, its ideas can make a difference and influence future designs. The creator of Zig certainly made an impression on me with this talk:

https://youtube.com/watch?t=120&v=Gv2I7qTux7g

Even if Zig hadn't been successful, the ideas it represents would've enriched the world. Gives me hope I'll be able to make something out of my own ideas too one day.


I often ponder if it is a function of the breadth of language use-cases within a problem domain, and/or developer effort minimization.

C/C++ tend to be CPU-bound languages that are tightly coupled to the von Neumann architecture. Anything that is not directly isomorphic tends not to survive very long unless it's for a VM and supports wrapper libraries.

Best of luck, =)


Good god, all is well but can we please all stop posting this xkcd :p


So, you are proposing a new standard of satire. =)


So you wrote a static language using another static language that takes influence from it.......ok


As someone who has been designing a language since late 2012 (and it's about to have its first public release early next year O.o), here are some thoughts.

Avoid gradual typing: Absolutely. I had gradual typing and abandoned it. Gradual typing is not as good as simple Go-like inference, and that simple inference is easy to implement and takes care of 95% of the "problem."

However, you also need one or more dynamic typing escape hatches. When I implemented a config file format by tweaking JSON, I had to implement dynamic typing in C.

Avoid self-hosting your compiler: It depends. You probably should avoid it by default, unless your language is just so much better than the original. Rust is an example of this (compared to C++). I'm writing in C, and I want memory safety [1], so bootstrapping makes sense.

Avoid writing your own code generator, linker, etc.: This is excellent advice! Use the standard tools. Of course, me being me, I'm breaking it. :) I am trying a different distribution method, where you distribute an LLVM-like IR, and the end machine then generates the machine code on first run. In that case, there's no linker necessary, but I do have to write my own code generator. Fun.

Avoid bike shedding about syntax: Yes, absolutely. I did this, but the syntax still changed enormously!

Cross-platform support is a challenge: Yes, in more ways than one. First, you have to somehow generate code for all of the platforms, then you have to make sure your library works on all of the platforms too.

Compiler books aren't worth the money: Yes, but please do read Crafting Interpreters. Anyway, getting a simple parser and bytecode generator (or LLVM codegen) is the simple part of making a language. Then you need to make it robust, and no one talks about that. Maybe I should write a blogpost about that once my language stabilizes.

Growing a language is hard: Yes, absolutely. He mentioned two ways it needs to grow: libraries and users.

You can design a language to be easy to grow via libraries. See "Growing a Language" by Guy Steele. [2] I went the extra mile with this, and user code can add its own keywords and lexing code. So growing my language is "easy."

But growing the userbase? That's hard. You need to have a plan, and the best plan is to solve a massive pain point, or multiple. I'm targeting multiple.

First, I'm targeting shell; my language can shell out as easily as shells, or even more easily, but it's a proper language with strong static type checking. If there are people who want that instead of bash, they'll get it. And there's a lot of bash people might want to replace.

Second, I'm targeting build systems. My language is so easy to grow, I've implemented a build system DSL, and then I put a build system on top.

Shell and build systems are both things people hate but use a lot. These are good targets.

The best test suite is a real application: Yes, absolutely. Except, the best test suite is actually a bunch of real applications.

My language's own build scripts are the first real program written in it. I'm also going to replace every shell script on my machine with my language, and most of them will make it into the formal test suite.

Don't prioritize performance over functionality: Yes, absolutely. This is why I would suggest making an interpreter first; you don't depend on LLVM (shudder), and you can easily add functionality.

Building a language takes time: I've taken 11 years, and there's no release yet. Yes, this is true.

*cries in the corner*

Anyway, I wish the author luck, even though I'll be a competitor. :)

[1]: https://gavinhoward.com/2023/02/why-i-use-c-when-i-believe-i...

[2]: https://www.youtube.com/watch?v=lw6TaiXzHAE


>"In reality, gradual typing ends up giving you the worst of both dynamic and static typing: you get the uncertainty and lack of safety (in dynamically typed contexts) of dynamic typing, and the cost of trying to fit your ideas into a statically typed type system. I also found that the use of gradual typing didn't actually make me more productive compared to using static typing.

Well, that's like, your opinion, man...



