Compile-time and short-circuit evaluation (handmade.network)
27 points by lerno on Sept 1, 2023 | 12 comments



There are really two reasons one would be analyzing the source code.

One reason is to make sense of how to run it: to determine what instructions to emit. This is the kind of analysis where you want to propagate constants (even if they're not flagged as compile-time), to drop dead code (if it can be determined that it doesn't need to be executed), and so on. For this analysis, even if && evaluation isn't short-circuit, one can replace false && A with just false as long as you know A doesn't have side effects, and at the very least one can replace:

    if (CallA() && false)
    {
        CallB();
    }
With just:

    CallA();
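
And on the constant-propagation side, a small illustrative case (mine, not from the article): a value that is never flagged as compile-time can still be folded.

    int scale(int x)
    {
        int n = 4;          /* never declared compile-time or const... */
        return x * n * 2;   /* ...but a compiler may still fold this to x * 8 */
    }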
The other reason to analyze your source code is to provide some level of static information to the user, such as detecting in advance situations that don't make sense. If there is never a situation where the expression `okeoefkepofke[3.141592]` is acceptable, then it's usually a good idea to report it as such even if the expression is 100% dead code as far as execution goes. I would even go one step further and say that static analysis should also register that expression as a (read-only) use of the variable `okeoefkepofke` for purposes of "find all references" or "rename symbol" tools.
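
For instance, here is a sketch in C (the function name is illustrative; the array name is reused from above) where a conforming compiler must diagnose the subscript even though the branch is dead, and where an IDE should still index the reference:

    int okeoefkepofke[8];

    void f(void)
    {
        if (0) {
            /* Dead at runtime, but C requires a diagnostic for the
               non-integer subscript, and "find all references" should
               still count this as a read of okeoefkepofke. */
            int x = okeoefkepofke[3.141592];
            (void)x;
        }
    }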


I'm not sure why syntax checking/semantic analysis and compile-time evaluation can't be two separate passes.

    if (false && okeoefkepofke[3.141592])
Why can't you check that using a float as an array index is a syntax error, no matter if it's going to be evaluated or not?

Are multi-pass compilers not a thing any more?
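
The separation I have in mind, as a minimal sketch (a toy AST; every type and name here is hypothetical, not taken from any real compiler): check every node first, and only then fold.

    #include <stdio.h>
    #include <stddef.h>

    typedef enum { LIT_FALSE, AND, INDEX } Kind;

    typedef struct Node {
        Kind kind;
        struct Node *lhs, *rhs;   /* operands of AND; array/subscript of INDEX */
        int subscript_is_float;   /* toy stand-in for real type information */
    } Node;

    /* Pass 1: semantic checking visits every node, dead or not. */
    static int check(const Node *n)
    {
        if (!n) return 0;
        int errs = check(n->lhs) + check(n->rhs);
        if (n->kind == INDEX && n->subscript_is_float) {
            fprintf(stderr, "error: array subscript is not an integer\n");
            errs++;
        }
        return errs;
    }

    /* Pass 2: only after checking does folding reduce `false && x` to
       `false`, so any errors inside x have already been reported. */
    static const Node *fold(const Node *n)
    {
        if (n && n->kind == AND && n->lhs && n->lhs->kind == LIT_FALSE)
            return n->lhs;
        return n;
    }

    int main(void)
    {
        Node idx  = { INDEX, NULL, NULL, 1 };   /* okeoefkepofke[3.141592] */
        Node fls  = { LIT_FALSE, NULL, NULL, 0 };
        Node conj = { AND, &fls, &idx, 0 };     /* false && ... */

        int errs = check(&conj);        /* reports the float subscript */
        const Node *out = fold(&conj);  /* then folds to `false` */
        printf("%d error(s); folded node kind = %d\n", errs, (int)out->kind);
        return 0;
    }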


More and more I've come to the opinion that in the long run, it's useful for a language to treat a larger set of things as syntactically valid even if they don't typecheck. Good editor support is an important hurdle for a new language's adoption, and having that support crippled by a syntax error in a program is pretty frustrating to new users trying out the language for the first time. If something is a type error but still parseable, it's much easier to write tooling, since the tooling still has a full AST to work with. There will obviously be edge cases that can't be parsed and that editor tooling will need to work around, but every case you can handle in the compiler prevents the problem from having to be solved separately by multiple different tools in different contexts. It seems like a clear win for a new language to proactively design the parser with common mistakes (or partially complete program states) in mind, rather than ignoring everything but the happy path where the program is perfectly correct.


Alternatively, use tree-sitter[0] for the tooling.

[0] https://tree-sitter.github.io/tree-sitter/


Huh. I didn't realise that tree-sitter had its own parser for each of the languages it supported. For some reason I thought it used existing compilers/parsers to be able to figure out which symbols would be valid at a given point.

TIL.


Using existing parsers is a problem for things like C with a preprocessor. You need more than just a file to use those; you need a build environment with all the include files and the command-line options defining the appropriate macros. Tree-sitter makes a good-effort attempt to parse such things but can't get it right in general.


Ah, for some reason I thought tree-sitter did autocomplete, and that pretty much relies on being able to find and read the relevant include files too.

I just checked and it doesn't do that, so if it's just doing enough for syntax highlighting, yeah, a custom parser is probably the best way to go about that. Thanks.


Isn't that possible with either

a) a compiler that can recover from errors and continue to parse the rest of a file, producing an AST of all the parts that make sense, with error nodes for the bits that don't (see the sketch after this list), or

b) a compiler that can optionally skip some passes (like typechecking) so that an editor/tree-sitter can get an understanding of the structure of the code, while an actual compile does more. (Actually, this should already happen: language servers don't run optimisation or IL/bytecode/asm generation passes.)
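
For a), a minimal sketch of panic-mode recovery (toy grammar and names, all hypothetical): on a parse error, record an error node, skip ahead to the next ';', and keep parsing, so the rest of the file still produces a usable AST.

    #include <stdio.h>

    /* Toy grammar: a valid statement is one lowercase identifier plus ';'. */
    static const char *p = "a; @#!garbage; b;";

    typedef enum { STMT, ERROR_NODE } Kind;

    static void skip_ws(void) { while (*p == ' ') p++; }

    /* Panic-mode recovery: skip input until just past the next ';'. */
    static void resync(void)
    {
        while (*p && *p != ';') p++;
        if (*p == ';') p++;
    }

    static Kind parse_stmt(void)
    {
        skip_ws();
        if (*p >= 'a' && *p <= 'z') {   /* identifier */
            p++;
            if (*p == ';') { p++; return STMT; }
        }
        resync();                       /* error: recover and keep parsing */
        return ERROR_NODE;
    }

    int main(void)
    {
        while (*p) {
            puts(parse_stmt() == STMT ? "STMT" : "ERROR node");
            skip_ws();   /* prints: STMT, ERROR node, STMT */
        }
        return 0;
    }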


Yeah, it's fine for the compiler to emit a parse error for this sort of thing as long as sufficient caveats are in place, but I guess my question is: why? As long as it's properly caught as a type error, I don't really see the benefit of a separate path where it's also a parse error. All else being equal, a simpler grammar (e.g. one where arrays are indexed by arbitrary expressions rather than some specific subset of them) is both more flexible and easier to implement; it's not obvious what the advantage of making something a parse error instead of a type-checking error would be.


I don't understand what this is talking about.

Checking isn't evaluation. Short-circuiting an && by a constant false doesn't mean we skip checks on the right-hand side. The right-hand side is dead code. We check dead code.

E.g.

  if (false) { x = undefined_var; } 
should diagnose the undefined_var before throwing away the dead code.

(Assuming such a diagnostic exists at all and would be applied to

  if (global_var) { x = undefined_var; }

)


   "This is perfectly legitimate behaviour... BUT it would mean this would pass semantic checking as well: [...] So now we got this big piece of code that is wrong."
I don't follow. Wrong in what sense? Per the post it would compile and run just fine. If I comment out code so it's never run, it could be whatever I want. Maybe I write pseudocode and gradually uncomment it and turn it into C++ with CodeLlama. Why is Mr. Lernö butting in and telling me how to write code?


Compile-time short-circuiting is why Zig (https://ziglang.org) sucks. It means that code that is never called never goes through semantic analysis, which means that calling a previously uncalled function can expose latent compile errors.

I've gone back to C99. Sure, arrays aren't bounds-checked, but I can write C so much faster than Zig that it doesn't matter.
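
For contrast, here's the C behaviour being described (identifier names illustrative): function bodies are analysed whether or not anything calls them, so this fails to compile even though unused has no callers.

    /* Never called anywhere, but a C compiler still analyses the body
       and rejects it: undeclared_identifier is a compile error. */
    static int unused(void)
    {
        return undeclared_identifier;
    }

    int main(void) { return 0; }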



