"Creative" grammar introduces parsing difficulties, which makes IDE tooling hard...

klodolph · on May 19, 2022

Creative grammar can introduce parsing difficulties, but it doesn't have to.

I've made a couple of small languages, and it's easy to end up lost in a sea of design decisions. But there are a lot of languages that have come before yours, and you can look to them for guidance. Do you want something like automatic semicolon insertion? Well, you can compare how JavaScript, Python[1], Haskell, and Go handle it. You can even dig up mailing-list messages where developers discuss the feature's unexpected drawbacks or nice advantages, or read blog posts about the surprising behavior it has caused for users.
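To give a concrete sense of what such a rule looks like, here is a rough Rust sketch of Go's version. Per the Go spec, a newline becomes a semicolon when the line's last token is an identifier, a literal, one of `break`, `continue`, `fallthrough`, `return`, or one of `++`, `--`, `)`, `]`, `}`. To keep it short, the sketch assumes tokens are already separated by whitespace and treats anything starting with a letter, digit, `_`, or a quote as an identifier or literal. The function names are made up for illustration.

```rust
// Go's keywords; only four of them (matched first below) end a statement.
const GO_KEYWORDS: [&str; 25] = [
    "break", "case", "chan", "const", "continue", "default", "defer", "else",
    "fallthrough", "for", "func", "go", "goto", "if", "import", "interface",
    "map", "package", "range", "return", "select", "struct", "switch", "type", "var",
];

/// Should an implicit `;` follow a line whose last token is `tok`?
fn ends_statement(tok: &str) -> bool {
    match tok {
        "break" | "continue" | "fallthrough" | "return" => true,
        "++" | "--" | ")" | "]" | "}" => true,
        _ if GO_KEYWORDS.contains(&tok) => false,
        // Crude stand-in for "identifier or literal".
        _ => tok.starts_with(|c: char| c.is_alphanumeric() || matches!(c, '_' | '"' | '\'' | '`')),
    }
}

/// Splits each line on whitespace and appends ";" where Go's rule would insert one.
fn insert_semicolons(src: &str) -> Vec<String> {
    let mut out = Vec::new();
    for line in src.lines() {
        let toks: Vec<&str> = line.split_whitespace().collect();
        out.extend(toks.iter().map(|t| t.to_string()));
        if matches!(toks.last(), Some(t) if ends_statement(t)) {
            out.push(";".to_string());
        }
    }
    out
}

fn main() {
    // The classic hazard: a line break right after `return` ends the statement.
    // Prints ["return", ";", "x", "+", "1", ";"]
    println!("{:?}", insert_semicolons("return\nx + 1"));
}
```

The same "line ends after `return`" hazard is one of the well-known JavaScript surprises, which is part of why comparing these languages is instructive.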

You can also take a look at some examples of languages which are easy or hard to parse, even though they have similar levels of expressivity. C++ is hard to parse... why?

You'd also have as your guiding star some goal like, "I want to create an LL(1) recursive descent parser for this language."
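As a small illustration of what that constraint looks like in code, here is a Rust sketch of an LL(1) recursive-descent parser for a made-up toy grammar (integers, `+`, `-`, parentheses). Every decision is made by peeking at exactly one token, which is what LL(1) means in practice. The grammar, types, and function names are invented for the example, and it evaluates while parsing instead of building an AST.

```rust
/// Tokens for a toy grammar (made up for this example).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Plus,
    Minus,
    LParen,
    RParen,
}

fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let mut toks = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_ascii_digit() {
            let mut n = 0i64;
            while let Some(d) = chars.peek().and_then(|d| d.to_digit(10)) {
                n = n * 10 + i64::from(d);
                chars.next();
            }
            toks.push(Tok::Num(n));
            continue;
        }
        chars.next();
        match c {
            ' ' => {}
            '+' => toks.push(Tok::Plus),
            '-' => toks.push(Tok::Minus),
            '(' => toks.push(Tok::LParen),
            ')' => toks.push(Tok::RParen),
            _ => return Err(format!("unexpected character {c:?}")),
        }
    }
    Ok(toks)
}

struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    /// The single token of lookahead the parser is allowed.
    fn peek(&self) -> Option<Tok> {
        self.toks.get(self.pos).cloned()
    }

    // expr := term (('+' | '-') term)*
    fn expr(&mut self) -> Result<i64, String> {
        let mut acc = self.term()?;
        loop {
            match self.peek() {
                Some(Tok::Plus) => { self.pos += 1; acc += self.term()?; }
                Some(Tok::Minus) => { self.pos += 1; acc -= self.term()?; }
                _ => return Ok(acc),
            }
        }
    }

    // term := NUM | '(' expr ')'
    fn term(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some(Tok::Num(n)) => { self.pos += 1; Ok(n) }
            Some(Tok::LParen) => {
                self.pos += 1;
                let v = self.expr()?;
                if self.peek() == Some(Tok::RParen) {
                    self.pos += 1;
                    Ok(v)
                } else {
                    Err("expected ')'".to_string())
                }
            }
            other => Err(format!("unexpected token {other:?}")),
        }
    }
}

fn parse(src: &str) -> Result<i64, String> {
    let mut p = Parser { toks: lex(src)?, pos: 0 };
    let v = p.expr()?;
    if p.pos == p.toks.len() { Ok(v) } else { Err("trailing input".to_string()) }
}

fn main() {
    println!("{:?}", parse("1 + (2 - 3) + 10")); // Ok(10)
}
```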

There's still a ton of room for creativity within constraints like these.

[1]: Python doesn't have automatic semicolon insertion, but it does have a semicolon statement separator, and it does not require you to use a semicolon at the end of statements.

pizza234 · on May 19, 2022

> you can look to them for guidance. Do you want something like automatic semicolon insertion? Well, you can compare how JavaScript, Python[1], Haskell, and Go handle it

You can't look at JavaScript/Python/Go (I don't know about Haskell), because Rust is a mostly-expression language (therefore, semicolons have meaning), while JavaScript/Python/Go aren't.

The classic example is conditional assignment to a variable: in Rust it can be done with if/else, while in JS/Python/Go it can't (those languages require alternative syntax).
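A minimal Rust sketch of that conditional assignment, plus the semicolon detail it hinges on: a block's value is its final expression only when that expression has no trailing semicolon. The function names are just for illustration.

```rust
/// `if`/`else` is an expression, so its result can initialize a binding directly.
/// (JS would write `c ? a : b`, Python `a if c else b`; Go has no conditional
/// expression and uses an `if` statement instead.)
fn classify(n: i32) -> &'static str {
    let label = if n < 0 { "negative" } else { "non-negative" };
    label
}

/// A block evaluates to its tail expression when that expression has no `;`.
fn tail_value() -> i32 {
    let y = {
        let x = 20;
        x * 2 + 2 // no `;`, so this is the block's value
    };
    // Writing `x * 2 + 2;` instead would make the block evaluate to `()`,
    // and this function would no longer type-check, since it must return i32.
    y
}

fn main() {
    println!("{} {}", classify(-3), tail_value()); // negative 42
}
```

This is why an ASI-style rule in an expression-oriented language has to decide carefully whether a line break means "end of statement" or "this is the block's value".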

klodolph · on May 19, 2022

> You can't look at JavaScript/Python/Go (I don't know about Haskell), because Rust is a mostly-expression language (therefore, semicolons have meaning), while JavaScript/Python/Go aren't.

I have a hard time accepting this, because I have done exactly this, in practice, with languages that I've designed. Are you claiming that it's impossible, infeasible, or somehow impractical to learn lessons from -- uhh -- imperative languages where most (but not all) programmers tend to write a balance of statements and expressions that leans more towards statements, and apply those lessons to imperative languages where most (but not all) programmers tend to write with a balance that tips more in the other direction?

Or are you saying something else?

The fact that automatic semicolon insertion has appeared in languages which are just so incredibly different to each other suggests, to me, that there may be something you can learn from these design choices that you can apply as a language designer, even when you are designing languages which are not similar to the ones listed.

This matches my experience designing languages.

To be clear, I'm not making any statement about semicolons in Rust. If you are arguing some point about semicolon insertion in Rust, then it's just not germane.

kibwen · on May 20, 2022

Not the parent, but you can certainly have an expression-oriented language without explicit statement delimiters. In the context of Rust, having explicit delimiters works well. In a language more willing to trade a little explicitness for a little convenience, some form of ASI would be nice. The lesson is just not to assume Rust's decisions are the best ones for every domain, while also keeping the inverse in mind. Case in point: I actually quite like exceptions... but in Rust, I prefer its explicit error values.

steveklabnik · on May 20, 2022

Ruby is a great example of a language that’s expression oriented, where terminators aren’t the norm, but optionally do exist.

pizza234 · on May 20, 2022

> I have a hard time accepting this, because I have done exactly this, in practice, with languages that I've designed.

I don't know which languages you've designed.

Some constructs are incompatible with optional semicolons, because semicolons change the semantics of the expression (I've given an example); comparing with languages that don't support such constructs is an apples-to-oranges comparison.

An apples-to-apples comparison is probably Ruby, which has optional semicolons and is also expression-oriented. In the specific if/else case, it solves the problem by introducing an inconsistency in the empty statement, making it semantically ambiguous.

xigoi · on May 20, 2022

Ruby is expression-oriented like Rust and doesn't require semicolons. Neither do most functional languages.

int_19h · on May 20, 2022

Have you also written tooling - e.g. code completion in an IDE - for those small languages? There are many things that might be easy to parse when you're doing streaming parsing, but a lot more complicated when you have to update the parse tree just-in-time in response to edits, and accommodate snippets that are outright invalid (because they're still being typed).
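One common ingredient for handling half-typed code is error recovery: when a statement fails to parse, record an error node and resynchronize at the next statement terminator instead of giving up on the rest of the file. Below is a deliberately simplified Rust sketch for a made-up `let NAME = NUMBER;` language. Splitting on `;` stands in for real panic-mode recovery, the names are hypothetical, and it does nothing about incremental reparsing.

```rust
#[derive(Debug, PartialEq)]
enum Stmt {
    Let { name: String, value: i64 },
    /// Placeholder node that keeps the tokens we could not make sense of.
    Error { skipped: Vec<String> },
}

/// Parses the tokens of one statement, or returns `None` if they don't fit.
fn parse_let(toks: &[&str]) -> Option<Stmt> {
    match toks {
        ["let", name, "=", value] => {
            let n = value.parse::<i64>().ok()?;
            Some(Stmt::Let { name: name.to_string(), value: n })
        }
        _ => None,
    }
}

/// Parses a whole program; an unparseable statement becomes an `Error` node,
/// and parsing carries on after the next `;`.
fn parse_program(src: &str) -> Vec<Stmt> {
    src.split(';')
        .map(|chunk| chunk.split_whitespace().collect::<Vec<_>>())
        .filter(|toks| !toks.is_empty())
        .map(|toks| {
            parse_let(&toks).unwrap_or_else(|| Stmt::Error {
                skipped: toks.iter().map(|t| t.to_string()).collect(),
            })
        })
        .collect()
}

fn main() {
    // The second statement is still being typed; the third is unaffected.
    for stmt in parse_program("let a = 1; let b = ; let c = 3;") {
        println!("{stmt:?}");
    }
}
```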

klodolph · on May 20, 2022

Yes, that's a good example of exactly what I'm talking about. Code completion used to be really hard, and good code completion is still hard, but we have all these different languages to learn from and you can look to the languages that came before you when building your language.

Just to give some more detail--you can find all sorts of reports from people who have implemented IDE support, talking about the issues they've faced and what makes a language difficult to analyze syntactically or semantically. Because these discussions are there to sift through on mailing lists, and there are even talks on YouTube about this stuff, you have a wealth of information at your fingertips on how to design languages that make IDE support easier. Like, why is it so hard to make good tools for C++ or Python, but comparatively easier to make tools for Java or C#? It's an answerable question.

These days, making an LSP server for your pet language is within reach.
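For a sense of how little is needed at the transport layer: the LSP base protocol frames each JSON-RPC message as a `Content-Length` header (a byte count), a blank line, and the JSON body. A minimal Rust sketch of framing and unframing, assuming the JSON is already serialized to a string; `frame` and `unframe` are hypothetical names, and a real server would also deal with partial reads and the JSON itself, typically via a library.

```rust
/// Wraps an already-serialized JSON-RPC message in an LSP base-protocol header.
fn frame(json: &str) -> String {
    // Content-Length is the payload size in bytes (`str::len`), not in chars.
    format!("Content-Length: {}\r\n\r\n{}", json.len(), json)
}

/// Splits one framed message off the front of `buf`. Returns the JSON body and
/// whatever follows it, or `None` if the message is incomplete or malformed.
fn unframe(buf: &str) -> Option<(&str, &str)> {
    let header_end = buf.find("\r\n\r\n")?;
    let len = buf[..header_end]
        .lines() // also strips the trailing '\r' of each header line
        .find_map(|line| line.strip_prefix("Content-Length:"))?
        .trim()
        .parse::<usize>()
        .ok()?;
    let body_start = header_end + 4;
    let body = buf.get(body_start..body_start + len)?;
    Some((body, &buf[body_start + len..]))
}

fn main() {
    let msg = r#"{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}"#;
    let framed = frame(msg);
    print!("{framed}\n");
    assert_eq!(unframe(&framed), Some((msg, "")));
}
```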

KptMarchewa · on May 19, 2022

Tooling should not depend on the code text, but on the language's AST.

nightski · on May 19, 2022

I'm not an expert, as I don't work on these tools, but I don't think IDEs can rely solely on ASTs, because not all code is in a compilable state. Lots of times things have to be inferred from invalid code. JetBrains tools, for example, do a great job at this.

bobbylarrybobby · on May 20, 2022

In practice, though, getting the AST from the text is a computational task in and of itself, and the grammar affects its runtime. For instance, Rust's "turbofish" syntax, `f::<T>(args)`, is used to specify the generic type argument for `f` when calling it, instead of the perhaps more obvious `f<T>(args)`, which mirrors what the definition looks like. Why the extra colons? Because parsing `f<T>(args)` in an expression position would require unbounded lookahead to determine the meaning of the left angle bracket: is it the beginning of generics or less-than? Therefore, even though Rust could be modified to accept `f<T>(args)` as valid syntax when calling the function, the language team decided to require the colons in order to improve worst-case parser performance.
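A small Rust sketch to make this concrete. The first function shows the turbofish next to the annotation-based alternative. The second shows that tokens shaped like a generic call already have a meaning in expression position: they parse as a tuple of two comparisons. It is adapted from a well-known test in the Rust compiler's test suite (`bastion-of-the-turbofish`); the function names are made up.

```rust
/// The turbofish attaches the type argument to the call itself; the second
/// form lets inference pick the type from the annotated binding instead.
fn parse_both_ways(s: &str) -> (i32, i32) {
    let a = s.parse::<i32>().unwrap();
    let b: i32 = s.parse().unwrap();
    (a, b)
}

/// Looks like a call with two type arguments, but it is a tuple of two
/// string comparisons: `the < guardian` and `stands > (resolute)`.
#[allow(unused_parens)]
fn looks_generic() -> (bool, bool) {
    let (the, guardian, stands, resolute) = ("the", "Turbofish", "remains", "undefeated");
    (the<guardian, stands>(resolute))
}

fn main() {
    println!("{:?} {:?}", parse_both_ways("42"), looks_generic()); // (42, 42) (false, false)
}
```

Rust sidesteps the question by reading `<` in expression position as less-than unless it is preceded by `::`.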

xigoi · on May 20, 2022

This is why using the same character as a delimiter and operator is bad.

colejohnson66 · on May 20, 2022

How does C# manage to handle this without the turbofish syntax? What’s different in Rust?

bobbylarrybobby · on May 21, 2022

It's not impossible to handle the ambiguity, it's just that you may have to look arbitrarily far ahead to resolve it. Perhaps C# simply does this. Or perhaps it limits expressions to 2^(large-ish number) bytes.
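For reference, the C# specification discusses exactly this ambiguity with `F(G<A, B>(7))`, which it reads as a call to a generic method `G` passed to `F`, while `F(G < A, B > 7)` is two comparisons passed to `F`. As I understand the rule, the parser tentatively scans a type-argument list and then keeps the generic reading only if the token right after the closing `>` is in a fixed set. Below is a rough Rust sketch of that idea over pre-split tokens; the follow set is a small made-up subset, not the spec's list, and the function name is hypothetical. Note that the scan for `>` has no fixed bound, which is the "arbitrarily far" lookahead.

```rust
/// Small illustrative follow set (a subset, not the C# spec's full list).
const FOLLOW: [&str; 5] = ["(", ")", ";", ",", "."];

/// `toks[start]` should be `<`. Scans forward, with no fixed limit, for the
/// matching `>`, accepting only names, commas, and nested `<...>` on the way,
/// then decides based on the token right after that `>`.
fn is_type_arg_list(toks: &[&str], start: usize) -> bool {
    if toks.get(start) != Some(&"<") {
        return false;
    }
    let mut depth = 0usize;
    for (i, &tok) in toks.iter().enumerate().skip(start) {
        match tok {
            "<" => depth += 1,
            ">" => {
                depth -= 1;
                if depth == 0 {
                    return matches!(toks.get(i + 1), Some(next) if FOLLOW.contains(next));
                }
            }
            "," => {}
            _ if tok.starts_with(|c: char| c.is_alphabetic() || c == '_') => {}
            // Anything else (a number, an operator, ...) can't be in a type list.
            _ => return false,
        }
    }
    false // no matching `>` before the end of input
}

fn main() {
    // F(G<A, B>(7)): one generic call passed to F.
    let call = ["F", "(", "G", "<", "A", ",", "B", ">", "(", "7", ")", ")"];
    // F(G < A, B > 7): two comparisons passed to F.
    let cmp = ["F", "(", "G", "<", "A", ",", "B", ">", "7", ")"];
    println!("{} {}", is_type_arg_list(&call, 3), is_type_arg_list(&cmp, 3)); // true false
}
```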

tripa · on May 19, 2022

Comments, which tend not to be part of the AST, make that harder.
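One common workaround is a lossless tree: the lexer keeps comments and whitespace as "trivia" attached to tokens (Roslyn uses this term), so tools can see the comments and the original text can be reproduced exactly. A minimal Rust sketch, assuming only `//` line comments and a deliberately crude notion of a token (any run of non-whitespace characters that doesn't start a comment); the names are made up.

```rust
/// A token plus the whitespace and comments that precede it.
#[derive(Debug)]
struct Token {
    leading_trivia: String,
    text: String,
}

/// Splits `src` into tokens, attaching whitespace and `//` comments to the
/// following token. Returns the tokens and any trivia after the last token.
fn lex_lossless(src: &str) -> (Vec<Token>, String) {
    let mut tokens = Vec::new();
    let mut trivia = String::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        if rest.starts_with("//") {
            // Line comment: everything up to (not including) the newline.
            let end = rest.find('\n').unwrap_or(rest.len());
            trivia.push_str(&rest[..end]);
            rest = &rest[end..];
        } else if c.is_whitespace() {
            trivia.push(c);
            rest = &rest[c.len_utf8()..];
        } else {
            // Token: runs until whitespace or the start of a comment.
            let end = rest
                .char_indices()
                .find(|&(i, ch)| ch.is_whitespace() || rest[i..].starts_with("//"))
                .map_or(rest.len(), |(i, _)| i);
            tokens.push(Token {
                leading_trivia: std::mem::take(&mut trivia),
                text: rest[..end].to_string(),
            });
            rest = &rest[end..];
        }
    }
    (tokens, trivia)
}

/// Reassembles the source; a lossless lexer gives back the input exactly.
fn render(tokens: &[Token], trailing: &str) -> String {
    let mut out = String::new();
    for t in tokens {
        out.push_str(&t.leading_trivia);
        out.push_str(&t.text);
    }
    out.push_str(trailing);
    out
}

fn main() {
    let src = "let x = 1; // the answer, eventually\nlet y = x;\n";
    let (tokens, trailing) = lex_lossless(src);
    assert_eq!(render(&tokens, &trailing), src);
    println!("{:?}", tokens[4]); // the comment rides along with the second `let`
}
```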