More and more I've come to the opinion that in the long run, it's useful for a l...

chaosite · on Sept 1, 2023

Alternatively, use tree-sitter[0] for the tooling.

[0] https://tree-sitter.github.io/tree-sitter/

Karellen · on Sept 1, 2023

Huh. I didn't realise that tree-sitter had its own parser for each of the languages it supported. For some reason I thought it used existing compilers/parsers to be able to figure out which symbols would be valid at a given point.

TIL.

pfdietz · on Sept 2, 2023

Using existing parsers is a problem in things like C with preprocessors. You need more than just a file to use those; you need a build environment with all the include files and the command-line options that define the appropriate macros. Tree-sitter makes a best-effort attempt to parse such things but can't get it right in general.
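
A rough sketch of why the file alone isn't enough, written in Python for brevity. The `preprocess` helper and the `USE_LONG` macro are made up for illustration; the helper only handles a flat `#ifdef`/`#else`/`#endif` and is nowhere near a real C preprocessor. The point is just that the same file yields different code depending on what the build defines:

```python
# Hypothetical C snippet: its meaning depends on a macro set by the build.
C_SOURCE = """\
#ifdef USE_LONG
typedef long counter_t;
#else
typedef int counter_t;
#endif
counter_t total;
"""


def preprocess(source, defined):
    """Toy stand-in for a preprocessor: flat #ifdef/#else/#endif only."""
    kept = []
    keep = True
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            # Keep the following lines only if the macro is defined.
            keep = stripped.split()[1] in defined
        elif stripped.startswith("#else"):
            keep = not keep
        elif stripped.startswith("#endif"):
            keep = True
        elif keep:
            kept.append(line)
    return kept


# Roughly what passing -DUSE_LONG (or not) on the command line would change.
with_flag = preprocess(C_SOURCE, {"USE_LONG"})
without_flag = preprocess(C_SOURCE, set())
print(with_flag)
print(without_flag)
```

A tool that only sees the file has to guess which branch applies, or try to represent both.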

Karellen · on Sept 2, 2023

Ah, for some reason I thought tree-sitter did autocomplete, and that pretty much relies on being able to find and read the relevant include files too.

I just checked and it doesn't do that, so if it's just doing enough for syntax highlighting, yeah, a custom parser is probably the best way to go about that. Thanks.

Karellen · on Sept 1, 2023

Isn't that possible with either of the following?

a) a compiler that can recover from errors and continue parsing the rest of a file, producing an AST for all the parts that make sense and error nodes for the bits that don't.

b) a compiler that can optionally not run some passes (like typechecking) so that an editor/tree-sitter can get an understanding of the structure of the code, but an actual compile will do more. (Actually, this should already happen, so language servers don't do optimisation or IL/bytecode/asm generation passes.)
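
To make (a) concrete, here is a minimal sketch in Python of a parser that keeps going after an error. The toy grammar (`name = number;` statements) and the node shapes are assumptions invented for the example, not how any particular compiler does it. It recovers by resynchronising at the next `;` and records the bad statement as an error node:

```python
import re

# Toy grammar (an assumption for this sketch): statements of the form
# "name = number", separated by ";".
STATEMENT = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")


def parse(source):
    """Parse every statement; malformed ones become ("error", text) nodes."""
    nodes = []
    for text in source.split(";"):
        if not text.strip():
            continue  # skip the empty piece after a trailing ";"
        match = STATEMENT.match(text)
        if match:
            nodes.append(("assign", match.group(1), int(match.group(2))))
        else:
            # Recover: record an error node and carry on with the next statement.
            nodes.append(("error", text.strip()))
    return nodes


tree = parse("x = 1; y = = 2; z = 3;")
print(tree)
```

The statement after the broken one still ends up in the tree, which is the property an editor would care about.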

saghm · on Sept 1, 2023

Yeah, it's not strictly necessary that the compiler doesn't emit any parse errors for this sort of thing as long as sufficient caveats are in place, but I guess my question is... why? As long as it's properly caught as a type error, I don't really see the benefit of making a separate path where it's also a parse error. All else being equal, I feel like having a simpler grammar (e.g. where arrays are indexed by expressions rather than some specific subset of them) is both more flexible and makes the parser simpler; it's not obvious what the advantage of making something a parse error instead of a type-checking error would be.
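
As a small illustration, Python's own grammar works this way: any expression is accepted as an index, and a bad one is rejected later. Python has no static type checker in the compiler, so in this sketch the failure shows up as a runtime `TypeError` rather than a compile-time type error, but the split between "parses fine" and "rejected by a later stage" is the same idea. The variable names are just for the example:

```python
import ast

# Indexing a list with a string is nonsense, but the grammar allows it:
# the index is just an arbitrary expression.
source = 'items = [1, 2, 3]\nitems["one"]'

tree = ast.parse(source)  # no SyntaxError here
subscript = tree.body[1].value
print(type(subscript).__name__)

# The mistake is only caught by a later stage (here, at run time).
error = None
try:
    exec(compile(tree, "<example>", "exec"), {})
except TypeError as exc:
    error = exc
print(type(error).__name__)
```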