
> it's weird to assign moral superiority to one side of the grass or the other

Algorithmic complexity is not "moral superiority." The whole point of formal language theory is to show how different classes of languages differ in how difficult they are to work with.

> just as they use regular expressions for tokenizing and allow regular language sub-languages

That is because regular languages are a subset of context-free languages. There is nothing special to "allow" there.

> I can point to ambiguous context-free grammars and say whitespace sensitivity is ok, because the tools to fix the one are often the same to fix the other.

No, they are not. Precedence rules and extra lookahead will help with ambiguous context-free grammars but not with context-sensitive ones. There is also a whole class of algorithms, such as GLR, specialized to handle ambiguous CFGs efficiently.

> Whitespace sensitivity is one of the easier context sensitivity challenges to embed as a sub-language in an otherwise (mostly) context-free language, because you can represent it entirely as pseudo tokens from a simply modified lexing phase in a traditional CFG parser.

That statement is nonsense. There is no way to "embed" a context-sensitive language in a CFG. What you are saying is that it is "easy" to build a whole context-sensitive pre-parser just to produce a context-free token stream that you can pass to another CFG parser. That is not "easy"; that is bolting crap onto other crap.




> Algorithmic complexity is not "moral superiority." The whole point of formal language theory is to show how the different languages differ in terms of how difficult they are to work with.

Chomsky's hierarchy was designed to categorize languages by the production rules that generate them. That it doubles as a useful rule of thumb for algorithmic complexity in parsing is a fascinating duality in mathematics. That those same rules of thumb map onto basic automaton abstractions is even more fascinating. This is also where the clearest analogy lies for why I find your arguments from "complexity" so useless. It sounds to me like a strange steampunk form of the "640k is all anyone will ever need" fallacy: "why use Turing machines when pushdown automata will do just fine?"

Yes, there are a lot of great tools for working with CFGs, and just as we've pushed regular expressions far past the boundaries of formal regular languages, we push these same tools past the boundaries of formal CFGs. We aren't actually constrained to deterministic finite automata or pushdown automata in our programming; we have full Turing-complete machines at our disposal, completely and easily capable of tracking contexts/states.

The "context" in a whitespace-sensitive language is "how many spaces have you seen recently?". This is not a hard question, this is not some mystical "context sensitivity" that needs an entire "context-sensitive language parser". It's a handful of counters. That is the very definition of easy, that's the built-in basics of your average Church-Turing machine, keep a number on the tape and update it as necessary.

Python's language specification defines the grammar in BNF as a CFG, just as the majority of bracket languages do. The one concession to its whitespace sensitivity is that it expects INDENT and DEDENT tokens from its lexer. Those tokens are emitted simply by counting the leading whitespace of each line and checking whether it increased or decreased relative to the previous line. Once those tokens are in the stream (along with the rest of the tokens, defined by their associated, mostly regular, sub-languages), it is parsed by whatever CFG automaton/algorithm the Python team wants (LR, GLR, LALR, PEG, etc.). That's not "bolting crap onto other crap" by any stretch of the imagination; it's a pretty simple stack of tools that any programmer should be able to (re-)build, and not one any more complicated than (in fact, built entirely inside) the classic tokenizer/lexer feeding a parser.
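
To make that concrete, here is a minimal sketch of that INDENT/DEDENT counting, written in Python itself. This is not CPython's actual tokenizer; the function name indent_tokens and the (kind, value) token tuples are invented for illustration, and real error handling (such as rejecting inconsistent dedents) is omitted. The point is only that the "context" is a stack of indentation widths compared line by line:

    # Sketch only -- not CPython's tokenizer.
    def indent_tokens(lines):
        stack = [0]                          # indentation widths currently open
        for line in lines:
            if not line.strip():             # blank lines carry no indent info
                continue
            width = len(line) - len(line.lstrip(' '))
            if width > stack[-1]:            # deeper than before: open a block
                stack.append(width)
                yield ('INDENT', width)
            while width < stack[-1]:         # shallower: close blocks until levels match
                stack.pop()
                yield ('DEDENT', width)
            yield ('LINE', line.strip())     # hand the rest of the line to the ordinary lexer
        while len(stack) > 1:                # close any blocks still open at EOF
            stack.pop()
            yield ('DEDENT', 0)

    for tok in indent_tokens(["if x:", "    y = 1", "print(y)"]):
        print(tok)

Feed that stream, alongside the ordinary regex-defined tokens, into any CFG parser and the grammar itself never has to mention whitespace at all.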


> This is also where the clearest analogy lies for why I find your arguments from "complexity" so useless.

Honestly, from everything you have posted so far, I really get the impression that you have never worked on programming language parsing, or with large codebases. The difference in algorithmic complexity is not "useless"; it is the difference between being able to compile millions of lines of code in a few seconds on modest hardware, and needing a cluster just to get your builds done:

https://community.embarcadero.com/blogs/entry/compiling-a-mi...


I'm not saying algorithmic complexity is useless, only that it is a starting point in a discussion, not the end of it. I've repeatedly used the "tools in the toolbelt" analogy as my refrain in this discussion, so maybe you are missing plenty of my actual words in an "impression" that swings wildly toward an ad hominem attack.

If we want to fight anecdote for anecdote, I've certainly seen millions of lines of Python analyzed ("compiled") in seconds on modest hardware, rather than on a cluster. The differences there have more to do with static versus dynamic typing, and with ahead-of-time compilation versus a primarily interpreted implementation. Maybe a better anecdote is millions of lines of Haskell? Seen that too. Whitespace-sensitive languages scale just fine.

I can understand if you want to admit that you just don't like whitespace-sensitive languages for personal reasons, but there aren't very good technical reasons for it, and you seem unhappy with my explanations of why that is. I'm not sure how much more I can explain it, and given that you seem close to resorting to personal attacks, I'm afraid this is likely where the conversation ends.



