Hacker News | temp123789246's comments

One requirement for a programming language to be “good” is that doing this, with sufficient specificity to get all the behavior you want, will be more verbose than the code itself.

Theory: Any system, legal or otherwise, that denies the Axioms of Reality, will eventually fail.

Axiom of Reality: “Intellectual Property” does not exist.


I lol’d


“Two gin-scented tears trickled down the sides of his nose. But it was all right, everything was all right, the struggle was finished. He had won the victory over himself. He loved Big Brother.”


In the same way that people struggle to comprehend exponential growth, they seem to also struggle to comprehend the cost of inaction, compounded over time.

Imagine if the steam engine had not been allowed by regulators during the time of the Industrial Revolution.

If that happened and we were all still working on farms today, I bet half the people would be telling us how much safer the government was making us with all its regulations. In blissful ignorance.


Indeed. Thank you for writing this and speaking up in public.

Many of the comments here that essentially reply to your article by saying “regulation is good, stop criticizing it”, are deeply depressing. That is a regulatory mind virus that must be destroyed before it kills us.


Does anyone know why, anecdotally, it seems like the slowness of type inference is more of a pain point in Swift than in Ocaml, Rescript, Purescript, Haskell, etc?


Is it that Haskell, at least, doesn't support overloading in the same way as Swift? I don't know either of them well enough to be sure.

It seems like there's a combinatorial explosion of possible overloads in Swift, whereas if you implement a function with the same ergonomics in Haskell (e.g. a printf-like function), the only thing the compiler has to do is ask "Does type X have an implementation for typeclass Show? Yes? Done."

Essentially Haskell solved this overload inference problem in the same way that iterators solve the M*N problem for basic algorithms: convert all these disparate types to a single type, and run your algorithm on that.
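A minimal sketch of that single-lookup behavior (the class and instance names here are mine, not from any library):

```haskell
module Main where

-- A tiny stand-in for Show: one class, one method.
class Describe a where
  describe :: a -> String

instance Describe Int where
  describe n = "int:" ++ show n

instance Describe Bool where
  describe b = "bool:" ++ show b

-- At each call site the compiler asks a single question:
-- "does the argument's type have a Describe instance?"
-- There is no combinatorial search over candidate overloads.
main :: IO ()
main = putStrLn (describe (3 :: Int) ++ " " ++ describe True)
```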


"Does type X have an implementation for typeclass Y" isn't always easy to answer.

https://aphyr.com/posts/342-typing-the-technical-interview


That post, while awesome (as is the rest of aphyr's stuff), is a lot to wade through to get to the point you're trying to convey. Can you spell it out for me?


That typeclass resolution can encode some heavy computation, the example being n-queens in the article.


That's only the case when you turn on the "enable arbitrary computation in typeclasses" flag, so I'd say it's not much of a worry.
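For the curious, the main GHC flag here is `UndecidableInstances`, which drops the compiler's termination checks on instance resolution. A hedged sketch of what that buys you, computing Peano addition during type checking (all names are illustrative):

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies,
             FlexibleInstances, UndecidableInstances #-}
module Main where

-- Type-level Peano numerals.
data Z   = Z
data S n = S n

-- Addition performed by the instance resolver during type checking.
-- The recursive instance needs UndecidableInstances: the compiler can
-- no longer prove on its own that resolution terminates.
class Add a b c | a b -> c where
  add :: a -> b -> c

instance Add Z b b where
  add Z b = b

instance Add a b c => Add (S a) b (S c) where
  add (S a) b = S (add a b)

-- Reflect a type-level numeral back to an Int so we can observe it.
class ToInt n where
  toInt :: n -> Int

instance ToInt Z where
  toInt _ = 0

instance ToInt n => ToInt (S n) where
  toInt (S n) = 1 + toInt n

main :: IO ()
main = print (toInt (add (S (S Z)) (S Z)))  -- 2 + 1
```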


I'm not an expert on the theory, but OCaml has a very fast compiler. While it is (almost) capable of fully reconstructing the types of a program with no annotations, it doesn't have to deal with ad-hoc polymorphism, and it takes shortcuts like weak polymorphism when things get too hard: https://www.ocaml.org/manual/5.2/polymorphism.html


Try this:

    let f0 = fun x -> (x, x) in
    let f1 = fun y -> f0 (f0 y) in
    let f2 = fun y -> f1 (f1 y) in
    let f3 = fun y -> f2 (f2 y) in
    let f4 = fun y -> f3 (f3 y) in
    let f5 = fun y -> f4 (f4 y) in
    f5 (fun z -> z)
Lifted from https://dl.acm.org/doi/pdf/10.1145/96709.96748 via Pierce, _Types and Programming Languages_.


But that's just a type that is huge. I didn't want to wait for the full evaluation, but when I drop f5, I get a type that is 1.6 megabytes long when printed without spaces.

It's still very fast for "normal size" types. That reduced version compiles in 151 milliseconds.


Wait, what? In Haskell the types are usually directly inferable from the arguments a function is applied to, and when you do write a type annotation it's usually a constrained polymorphic type (Num a => a -> b -> c) rather than fully concrete types.

I almost never bother putting types in Haskell, unless I want to guarantee some constraint, in which case I typically use typeclasses. Maybe I'm just weird but I don't think so. One of the very few things I actually like about Haskell is how good the type inference is.


My guess is the different extensions to the Hindley–Milner type system, which is EXPTIME-complete in the worst case.

Plain HM isn't bidirectional, and with only top-level universal quantification its pathological worst cases rarely show up in practice; it's probably the features Swift added on top that trigger the blowup.
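The doubling-pairs trick shown elsewhere in the thread ports directly to Haskell, and shows where even plain HM gets expensive: each level squares the size of the inferred type. A sketch, stopping at f3 so the inferred type stays small enough to print:

```haskell
module Main where

-- The inferred type of f_n applied to a value has 2^(2^n) leaves,
-- exponential in the size of the program text.
f0 :: a -> (a, a)
f0 x = (x, x)

f1 y = f0 (f0 y)
f2 y = f1 (f1 y)
f3 y = f2 (f2 y)

main :: IO ()
main = f3 (\z -> z) `seq` putStrLn "type-checked"
```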


Congrats!

I’ve been watching HVM for a while and think it’s extremely cool.

My intuition is that this will eventually be a really big deal.


TIL what “fnord” is. I think that is a perfect way to describe much of this article.


_The Illuminatus Trilogy_ , where I learned this word, is a fantastic read.


Have there been any notable innovations in parsing since this was written?


Aycock & Horspool came up with a 'practical' method for implementing Earley parsing (conversion to a state machine) that delivers an almost comically large performance improvement over "naive" Earley and is still reasonable to implement. Joop Leo figured out how to get the worst case of Earley parsing down to O(n) for left-recursive, non-ambiguous grammars and O(n^2) for right-recursive, non-ambiguous ones. That leaves the Earley algorithm at O(n^3) only for right-recursive, ambiguous grammars; and if you're doing that, you're holding your language wrong.

A somewhat breathless description of all of this is in the Marpa parser documentation:

    https://jeffreykegler.github.io/Marpa-web-site/
In practice, I've found that computers are so fast that, with just the Joop Leo optimizations, 'naive' Earley parsing is Good Enough™:

    https://loup-vaillant.fr/tutorials/earley-parsing/
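For reference, 'naive' Earley is small enough to sketch in full. The following is a plain recognizer with none of the Leo or Aycock & Horspool optimizations, and with deliberately simple list-based item sets; all names are mine:

```haskell
module Main where

import Data.List (nub)

data Sym = T Char | N String deriving (Eq, Show)
type Rule = (String, [Sym])

-- An Earley item: a rule, a dot position, and an origin set index.
data Item = Item { lhs :: String, rhs :: [Sym], dot :: Int, origin :: Int }
  deriving (Eq, Show)

nextSym :: Item -> Maybe Sym
nextSym (Item _ r d _) = if d < length r then Just (r !! d) else Nothing

advance :: Item -> Item
advance it = it { dot = dot it + 1 }

recognize :: [Rule] -> String -> String -> Bool
recognize rules start input = any accepting (last chart)
  where
    n = length input
    accepting it = lhs it == start && origin it == 0 && nextSym it == Nothing
    chart = go 0 [Item start r 0 0 | (l, r) <- rules, l == start] []
    -- Build set i from its scanner seeds, then scan into set i+1.
    go i seeds prev =
      let set = closure i prev seeds
      in if i == n
           then prev ++ [set]
           else go (i + 1)
                   [advance it | it <- set, nextSym it == Just (T (input !! i))]
                   (prev ++ [set])
    -- Close a set under prediction and completion.
    closure i prev = fixpoint
      where
        fixpoint cur =
          let cur' = nub (cur ++ concatMap (step cur) cur)
          in if length cur' == length cur then cur else fixpoint cur'
        step cur it = case nextSym it of
          -- Predictor: expand a nonterminal after the dot.
          Just (N a) -> [Item a r 0 i | (l, r) <- rules, l == a]
          -- Completer: a finished item advances its parent items.
          Nothing    -> [advance p | p <- setAt cur (origin it),
                                     nextSym p == Just (N (lhs it))]
          _          -> []
        setAt cur j = if j == i then cur else prev !! j

main :: IO ()
main = do
  -- E -> E '+' E | 'a'   (ambiguous on purpose)
  let g = [ ("E", [N "E", T '+', N "E"])
          , ("E", [T 'a']) ]
  print (recognize g "E" "a+a+a")
  print (recognize g "E" "a+")
```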


An extremely layman answer: most of the interesting innovation in parsing in relatively modern times seems to have happened in the context of IDEs, i.e. incremental, high-performance parsing to support syntax highlighting, refactoring, etc.

(I may be talking out of my ass here.)


Actually, the most important aspect of parsers nowadays is error recovery (error resilience) for syntax errors, which mostly come from half-written or half-deleted code; even non-incremental, slow (or rather: not fast) parsers are fast enough. What is time-consuming is semantic checking, e.g. type-checking or exhaustiveness checks of pattern matches. Syntax checking itself is fast.


In the days of punched cards error recovery was important.


Not sure, but I at least am certainly aware of possibilities that such writeups exclude.

In particular, you can do (a subset of) the following in sequence:

* write your own grammar in whatever bespoke language you want

* compose those grammars into a single grammar

* generate a Bison grammar from that grammar

* run `bison --xml` instead of actually generating code

* read the XML file and implement your own (trivial) runtime so you can easily handle ownership issues

In particular, I am vehemently opposed to the idea of implementing parsers separately using some non-proven tool/theory, since that way leads to subtle grammar incompatibilities later.



I'm not super familiar with the space, but tree-sitter seems to take an interesting approach in that it's an incremental parser. So instead of re-parsing the entire document on change, it only re-parses the affected text, thereby making it much more efficient for text editors.

I don't know if that's specific to tree-sitter though, I'm sure there are other incremental parsers. I have to say that I've tried ANTLR and tree-sitter, and I absolutely love tree-sitter. It's a joy to work with.


In my experience incremental parsing doesn't really make much sense. Non-incremental parsing can easily parse huge documents in milliseconds.

Also Tree Sitter only does half the parsing job - you get a tree of nodes, but you have to do your own parse of that tree to get useful structures out.

I prefer Chumsky or Nom which go all the way.


Ah interesting, yeah I did spend quite a bit of time parsing their AST, which turned out to be harder than writing the grammar itself. I’ll look into those two projects.


What do you mean by “parse of that tree to get useful structures out”? Can you provide some concrete examples?


Yeah suppose you write a simple config language like:

  let a = 12;
  let b = a + 5;
  ...

Tree-Sitter will give you a tree like

    Node(type="file", range=..., children=[
      Node(type="let_item", range=..., children=[
        Node(type="identifier", range=...),
        Node(type="expression", range=..., children=[
          Node(type="integer_literal", range=...),
    ...
Whereas Nom/Chumsky will give you:

    struct File {
      let_items: Vec<LetItem>,
      ..
    };
    struct LetItem {
      name: String,
      expression: Expression,
    };
    ...
Essentially Tree-Sitter's output is untyped and ad hoc, whereas Nom/Chumsky's is fully validated and statically typed.

In some cases Tree-Sitter's output is totally fine (e.g. for syntax highlighting, or rough code intelligence). But if you're going to want to do stuff with the data like actually process/compile it, or provide 100% accurate code intelligence, then I think Nom/Chumsky make more sense.

The downsides of Nom/Chumsky are: pretty advanced Rust with lots of generics (error messages can be quite something!), and keeping track of source code spans (where did the `LetItem` come from?) can be a bit of a pain, whereas Tree-Sitter does that automatically.
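To make the "typed AST straight out of the parser" point concrete without pulling in Rust crates, here is a hedged sketch of that same config language using `Text.ParserCombinators.ReadP` from Haskell's base library (the grammar and all names are illustrative, not the Nom/Chumsky API):

```haskell
module Main where

import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- Typed AST: the parser can only ever produce valid instances of these.
data Expr    = Lit Int | Var String | Add Expr Expr deriving (Eq, Show)
data LetItem = LetItem String Expr                  deriving (Eq, Show)

tok :: ReadP a -> ReadP a
tok p = skipSpaces *> p

ident :: ReadP String
ident = tok (munch1 isAlpha)

term :: ReadP Expr
term = (Lit . read <$> tok (munch1 isDigit)) +++ (Var <$> ident)

expression :: ReadP Expr
expression = chainl1 term (Add <$ tok (char '+'))

letItem :: ReadP LetItem
letItem = LetItem
  <$> (tok (string "let") *> ident)  -- keyword handling kept naive
  <*> (tok (char '=') *> expression <* tok (char ';'))

file :: ReadP [LetItem]
file = many1 letItem <* skipSpaces <* eof

-- Keep only parses that consumed the whole input.
parseFile :: String -> Maybe [LetItem]
parseFile s = case [r | (r, "") <- readP_to_S file s] of
  (r : _) -> Just r
  []      -> Nothing

main :: IO ()
main = print (parseFile "let a = 12; let b = a + 5;")
```

A syntactically invalid input simply yields `Nothing`; there is no untyped tree to walk afterwards.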


Ok, understood. I was confused by the phrase "parse of that tree".

Tree-sitter's output is closer to being "dynamic" than "untyped", though.

It's not too hard to build a layer on top of tree-sitter (out of the core lib) to generate statically typed APIs. I haven't felt the need for that yet, but it may be worth exploring.

> actually process/compile it

At work, I built a custom embedded DSL, using tree-sitter for parsing. It has worked well enough so far. The dynamically-typed nature of tree-sitter actually made it easier to port the DSL to multiple runtimes.

> provide 100% accurate code intelligence

Totally agree that tree-sitter cannot be used for this, if we are aiming for 100%.


Not the person you’re asking, but basically anything that needs to happen after the initial parsing stage. So you convert your raw text into an AST, but there’s usually some processing you need to do after that.

Maybe you need to optimize the data, maybe you need to do some error checking. Lots of code is syntactically valid but not semantically valid, and usually those semantic errors will persist into the AST (in my limited experience).
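A self-contained sketch of such a semantic pass (types and names are mine): `let b = a + 5;` parses fine, but if `a` was never bound, only a post-parse scope check catches it:

```haskell
module Main where

data Expr    = Lit Int | Var String | Add Expr Expr deriving Show
data LetItem = LetItem String Expr                  deriving Show

-- Semantic pass: every variable must be bound by an earlier `let`.
-- The AST happily represents programs that violate this rule.
checkExpr :: [String] -> Expr -> Either String ()
checkExpr _     (Lit _)   = Right ()
checkExpr scope (Var v)
  | v `elem` scope        = Right ()
  | otherwise             = Left ("unbound variable: " ++ v)
checkExpr scope (Add l r) = checkExpr scope l >> checkExpr scope r

checkProgram :: [LetItem] -> Either String ()
checkProgram = go []
  where
    go _ [] = Right ()
    go scope (LetItem name e : rest) =
      checkExpr scope e >> go (name : scope) rest

main :: IO ()
main = do
  print (checkProgram [LetItem "a" (Lit 12),
                       LetItem "b" (Add (Var "a") (Lit 5))])
  print (checkProgram [LetItem "b" (Add (Var "a") (Lit 5))])
```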


> [incremental parsing] I don't know if that's specific to tree-sitter though

No, it isn't. And incremental parsing is older than 2011 too (like at least the 70s).

For example: https://dl.acm.org/doi/pdf/10.1145/357062.357066


Yes. You roll your own manual parser. It's not as difficult as people make it out to be.


These are not new, but my takeaways from https://tratt.net/laurie/blog/2020/which_parsing_approach.ht... and https://rust-analyzer.github.io/blog/2020/09/16/challeging-L... are to embrace various forms of LR parsing. https://github.com/igordejanovic/parglare is a very capable GLR parser, and I've been keeping a close eye on it for use in my projects.


Yes, we redesigned our programming languages to be easy to parse with limited lookahead.


I feel that most of the time the two options presented are either writing a parser by hand or using a parser generator. A nice third way is to write a custom parser generator for the language you wish to parse. Handwritten parsers do tend to get unwieldy, and general-purpose parser generators can have inscrutable behavior for any specific language.

Because the grammar for a parser generator is usually much simpler than most general purpose programming languages, it is typically relatively straightforward to handwrite a parser for it.


You could write a language with which to specify that custom parser. Oh wait...

