More and more I've come to the opinion that, in the long run, it's useful for a language to treat a larger set of things as syntactically valid even if they don't typecheck. Having good support for a language in editors is a pretty important hurdle in adoption for new languages, and having that support be crippled by a syntax error in a program is pretty frustrating to new users trying out a language for the first time. If something is a type error but still parseable, it's much easier to write tooling, given that the tooling will still have a full AST to work with. There will obviously be edge cases that can't be parsed and that editor tooling will need to work around, but every case that you can handle in the compiler is one that doesn't need to be solved separately by multiple different tools for different contexts. It seems like a clear win for a new language to proactively design the parser with common mistakes (or partially complete program states) in mind, rather than ignore everything but the happy path where everything is perfectly correct.
Huh. I didn't realise that tree-sitter had its own parser for each of the languages it supported. For some reason I thought it used existing compilers/parsers to be able to figure out which symbols would be valid at a given point.
Using existing parsers is a problem in things like C with preprocessors. You need more than just a file to use those; you need a build environment with all the include files and the command-line options that define the appropriate macros. tree-sitter makes a best-effort attempt to parse such things but can't get it right in general.
Ah, for some reason I thought tree-sitter did autocomplete, and that pretty much relies on being able to find and read the relevant include files too.
I just checked and it doesn't do that, so if it's just doing enough for syntax highlighting, yeah, a custom parser is probably the best way to go about that. Thanks.
a) a compiler that can recover from errors and continue to parse the rest of a file, to produce an AST of all the parts that make sense but error nodes for the bits that don't.
b) a compiler that can optionally not run some passes (like typechecking) so that an editor/tree-sitter can get an understanding of the structure of the code, but an actual compile will do more. (Actually, this should already happen, so language servers don't do optimisation or IL/bytecode/asm generation passes.)
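To make (a) concrete, here's a rough sketch of a parser that resynchronises at the next `;` and emits an error node for anything it couldn't understand, instead of giving up on the whole file. The toy grammar (`name = number;` statements) and the names `Assign`, `ErrorNode` and `parse` are all made up for illustration; a real compiler would recover at a much finer granularity.

```python
import re
from dataclasses import dataclass


@dataclass
class Assign:
    name: str
    value: int


@dataclass
class ErrorNode:
    tokens: list  # the raw tokens that could not be parsed


# Hypothetical toy lexer: identifiers, integers, or any single other character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def parse(source):
    """Parse `name = number;` statements, recovering at each `;`."""
    tokens = TOKEN.findall(source)
    nodes, i = [], 0
    while i < len(tokens):
        # Find the next `;`, which serves as the synchronisation point.
        end = i
        while end < len(tokens) and tokens[end] != ";":
            end += 1
        stmt = tokens[i:end]
        i = end + 1  # skip past the `;` and carry on with the rest of the file
        well_formed = (
            len(stmt) == 3
            and stmt[0].isidentifier()
            and stmt[1] == "="
            and stmt[2].isdigit()
        )
        if well_formed:
            nodes.append(Assign(stmt[0], int(stmt[2])))
        else:
            # Keep the bad tokens in the tree rather than aborting the parse.
            nodes.append(ErrorNode(stmt))
    return nodes


print(parse("x = 1; y = = 2; z = 3;"))
```

The middle statement becomes an `ErrorNode`, but `x` and `z` still show up as ordinary `Assign` nodes, so an editor could keep highlighting and navigating the parts of the file that make sense.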
Yeah, it's not strictly necessary that the compiler doesn't emit any parse errors for this sort of thing as long as sufficient caveats are in place, but I guess my question is... why? As long as it's properly caught as a type error, I don't really see the benefit of making a separate path where it's also a parse error. All else being equal, I feel like having a simpler grammar (e.g. where arrays are indexed by expressions rather than some specific subset of them) is both more flexible and makes the parser simpler; it's not obvious what the advantage of making something a parse error instead of a type-checking error would be.
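As a minimal sketch of what I mean, assume a toy language with int-only arrays where the grammar lets any expression appear as an index, and a separate checking pass rejects the bad ones. The node classes and `type_of` are hypothetical names, not taken from any real compiler.

```python
from dataclasses import dataclass


@dataclass
class IntLit:
    value: int


@dataclass
class StrLit:
    value: str


@dataclass
class ArrayLit:
    items: list  # assumed to hold int expressions in this toy language


@dataclass
class Index:
    target: object  # any expression is allowed here by the grammar
    index: object   # likewise: the parser does not restrict this


def type_of(node, errors):
    """Return 'int', 'str', 'array' or 'error', appending messages to errors."""
    if isinstance(node, IntLit):
        return "int"
    if isinstance(node, StrLit):
        return "str"
    if isinstance(node, ArrayLit):
        return "array"
    if isinstance(node, Index):
        target = type_of(node.target, errors)
        index = type_of(node.index, errors)
        ok = "error" not in (target, index)
        if target not in ("array", "error"):
            errors.append(f"cannot index into {target}")
            ok = False
        if index not in ("int", "error"):
            errors.append(f"array index must be int, got {index}")
            ok = False
        return "int" if ok else "error"
    raise TypeError(f"unknown node: {node!r}")


# `[1, 2]["a"]` parses fine into a full AST; only the checker complains.
bad = Index(ArrayLit([IntLit(1), IntLit(2)]), StrLit("a"))
errors = []
print(type_of(bad, errors), errors)
```

The parser only needs one rule for indexing, and tooling still gets a complete `Index` node to work with even when the index is nonsense.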