
Is C++ context-free or context-sensitive? - pmelendez
http://stackoverflow.com/questions/14589346/is-c-context-free-or-context-sensitive
======
ggchappell
It seems to me that the real issue is not mentioned by the original question.
Yes, "a b(c);" has ambiguous semantics; its interpretation depends on whether
c is a type. But such ambiguity is not a matter of syntactic correctness.

However, in a function prototype, we can name the parameters. Thus,

    
    
      a b(c x);
    

is syntactically correct C++ if c is a type, but _not if it isn't_. A correct
grammar for C++ must deal with this issue, which cannot be settled by a CFG.
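
A minimal sketch of that dependence, assuming a made-up struct `a` and two illustrative namespaces (none of these names come from the question). The same tokens `a b(c);` are compiled twice, once with `c` declared as a variable and once with `c` declared as a type:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type standing in for "a"; it only needs to be constructible from an int.
struct a {
    int value;
    a(int v) : value(v) {}
};

namespace c_is_object {
    int c = 7;
    a b(c);      // c names a variable: defines an object b initialized from c
    // a b2(c x); // would not parse here, because c is not a type
}

namespace c_is_type {
    typedef int c;
    a b(c);      // c names a type: declares a function b taking a c, returning a
    a b2(c x);   // parses here, because c is a type
}

// Same token sequence, two different kinds of entity.
static_assert(!std::is_function<decltype(c_is_object::b)>::value, "b is an object");
static_assert(std::is_function<decltype(c_is_type::b)>::value, "b is a function");
```

The functions in `c_is_type` are only declared, never called, so no definition is needed for this to compile.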

Also, a couple of notes.

(1) "Context-sensitive" does not mean "not context-free". In fact, strictly
speaking, every context-free grammar is a context-sensitive grammar. And there
are grammars that are neither. So the original question should really be
restated.

(2) The long answer makes a very important point:

> ... the standard does not attempt to provide a complete formal grammar, and
> ... it chooses to write some of the parsing rules in technical English.

> What looks like a formal grammar in the C++ standard is not the complete
> formal definition of the syntax of the C++ language.

~~~
tsahyt
Correct me if I'm wrong here but isn't that exactly the line between syntax
and semantics? I'd think that

    
    
        a b(c x);
    

is _syntactically_ correct C++ but still not valid C++ code (as in: a well
defined line of C++). As far as I know, syntax doesn't depend on whether a
type or an identifier or whatever actually exists.

~~~
scott_s
No; if "c" is not a type, then there are no grammar rules which will allow you
to produce that statement. So, as you parse that text, when you get to "c",
you have to ask, "Is 'c' a type?" The answer you get will determine which
subsequent grammar rules you apply. That context - having to know something
about "c" that was described elsewhere - isn't technically semantics. It's
just an assertion that "c" appeared elsewhere in the text, and when we parsed
it, it was in a particular location in the grammar (which just so happens to
mean it is a _type_ - but the parser doesn't need to know what that _means_ ).
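
As a rough sketch of that "Is 'c' a type?" question, assuming a hypothetical parser that keeps a set of names it has already seen declared as types (`TypeNames` and `parses_as_parameter` are invented for illustration and do not correspond to any real compiler's internals):

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical parser state: names that earlier declarations introduced as types.
typedef std::set<std::string> TypeNames;

// Decides whether the tokens inside the parentheses of `a b(c x);` can start a
// parameter declaration. Only the recorded category of the first token is
// consulted; nothing about what the type means is needed.
bool parses_as_parameter(const TypeNames& types, const std::string& first_token) {
    return types.count(first_token) != 0;
}
```

With `c` recorded in the set, the parameter rule applies; with an empty set, no rule produces `c x` and the statement is rejected.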

An example of syntactically correct, but semantically undefined code is:

    
    
      *(static_cast<int*>(NULL)) = 42;
    

The difference between here and above is that there is nothing in the grammar
that can tell you "this is an error". It's only an error because of the
_semantics_ that we've given to the operations.

------
dottrap
C++ is absolutely not context-free. It is probably in fact the worst case:
parsing it is "undecidable". <http://yosefk.com/c++fqa/defective.html#defect-2>
Compiler writers have to use heuristics to sort out all the ambiguities and
cross their fingers that they are correct.

Lua might be context-free. I forgot how to do rigorous proofs. But it is one of
the cleanest and most elegant languages ever made. See the complete (tiny)
syntax of Lua in extended BNF at the very bottom of the manual.
<http://www.lua.org/manual/5.2/manual.html>

~~~
lucian1900
Afaik both Lua and Python are context-free.

~~~
Twisol
How would you implement Lua's long-string syntax in a context-free grammar?

    
    
        str = [==[foo]bar]=]baz]==]
    

Been trying to puzzle this one out.
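
For comparison, a hand-written scanner can sidestep the grammar question by counting. This is only a sketch under simplifying assumptions: `scan_long_string` is a made-up helper, it expects the long string to start at the first character, and it ignores details such as Lua skipping a newline right after the opening bracket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scans a Lua-style long string that starts at s[0]. On success, stores the
// text between the brackets in `body` and returns true.
bool scan_long_string(const std::string& s, std::string& body) {
    std::size_t i = 0;
    if (i >= s.size() || s[i] != '[') return false;
    ++i;

    // Count the '=' characters of the opening bracket; this is the "level".
    std::size_t level = 0;
    while (i < s.size() && s[i] == '=') { ++level; ++i; }
    if (i >= s.size() || s[i] != '[') return false;
    ++i;

    // Only a closing bracket with exactly the same level ends the string.
    const std::string closer = "]" + std::string(level, '=') + "]";
    const std::size_t end = s.find(closer, i);
    if (end == std::string::npos) return false;

    body = s.substr(i, end - i);
    return true;
}
```

On the example above, the level is 2, so the inner `]=]` is skipped and the body is `foo]bar]=]baz`. The scanner has to remember the opening level while reading arbitrary text, which is the part that is awkward to express as grammar productions.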

------
chimeracoder
I don't think any commonly used language has a truly context-free grammar
(including Lisp, actually, though Lisp's deviations are both minor and
straightforward). That said, Lisp and Go are the two that come the closest, in
my recollection.

We actually had this discussion on #go-nuts a few weeks ago - part of the
confusion stems from the fact that most language compilers enforce
requirements that aren't covered by the parser. In other words, a parser
oftentimes accepts (fails to reject) invalid strings.

Easy examples of requirements that violate context-freedom are the requirement
that variables be declared before use, or the inability to divide an integer
literal by 0. (The latter is a compiler error in Go[0] - I'm not sure about
C++, but I know the former is a requirement in both languages).

Finally, we should make a distinction - even if the set of valid programs
_could_ be described perfectly by a context-free grammar (and therefore
implemented by a pushdown automaton), chances are the actual parser for the
language's compiler _doesn't_ implement the full set of rules, which again
reinforces the point that most parsers are biased in their outcome (biased in
favor of accepting potentially invalid strings).

> If you look up the definition of context-free languages, it will basically
> tell you that all grammar rules must have left-hand sides that consist of
> exactly one non-terminal symbol

That's not exactly true - one can rewrite (normalize) a context-free grammar
into this form, but CFGs are often written in an "unnormalized" form which has
multiple nonterminal symbols, because it tends to be clearer to read in many
cases.

[0] As demonstrated here, the compiler will actually reject the division-by-
zero at compile-time: <http://play.golang.org/p/3gwDeaoiM2>

~~~
Rexxar
I think you are conflating the grammar checking of a language with all the
checks done in other compilation phases.

For example, in a context free language, if "A : B;" is a variable
declaration, A is always a type and B is always a variable name. In the
expression "C = C + 1", C would always be a variable name even if C has not
been declared. In this case the error will be detected in the next compilation
phase but not during grammar checking.

In the same way, all type checking is done after grammar checking.

For example :

    
    
        int i = "iyi";
    

is a valid line for C++ grammar but invalid for the type checker.

~~~
praptak
> I think you are conflating the grammar checking of a language with all the
> checks done in other compilation phases.

The main point is that for some languages those two kinds of checks cannot
be easily separated. The top answer to the linked question demonstrates that
for C++. It turns out that in C++ typen<1>() can either be parsed as a template
instantiation or fail to parse (i.e. produce a syntax error) as (typen < 1)>().

The syntactical correctness of this expression depends on the result of
evaluation of a fine piece of template metaprogramming - there must be more
than one grammar checking phase, each one dependent on the evaluation of the
previous one. This gives a good hint on why C++ compile times are sometimes so
huge.
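
A small sketch of the same effect, using a variant expression `x<1>(0)` so that both readings compile (the name `x` and the two namespaces are made up for illustration; in the `typen<1>()` case above, the non-template reading is a syntax error instead):

```cpp
#include <cassert>

namespace x_is_template {
    // x names a function template, so '<' opens a template argument list.
    template <int N>
    int x(int v) { return N + v; }

    int eval() { return x<1>(0); }   // calls x<1> with argument 0
}

namespace x_is_variable {
    // x names a variable, so '<' and '>' are comparison operators.
    int x = 5;

    int eval() { return x<1>(0); }   // parsed as (x < 1) > (0)
}
```

The two `eval` bodies are token-for-token identical, yet the first returns 1 and the second returns 0. Some compilers may warn about the chained comparison in the second one.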

------
jessaustin
Best comment on SO was from jpalecek, who observed that the requirement that
variables be declared prevents the language from being context-free. Of course
that requirement is not enforced in the parser, so you might not care about
it.

------
ComputerGuru
Can I reply with another question: why does it matter? I mean, even if C++
_were_ context-free, that does not make it any better or worse of a language.
It certainly does not alleviate the fact that the best-case number of passes
required to compile C++ code is (if I'm not mistaken) 3 all the way up to 6
for proper optimization.

(Note: I love C++. But that doesn't take away from the fact that it's one of
the worst languages to parse and compile.)

------
bo1024
As pointed out in Dan's answer, the phrase "grammar of a programming language"
is not well-defined. Even "syntax of a programming language" is not well-
defined.

Until someone defines those terms accurately, trying to answer the question is
a waste of time.

------
niggler
"I wish more titles were that Google-friendly."

I agree that this specific case is google-friendly, but many SO and HN posts
are not. In general, how does one construct a google-friendly title?

~~~
jsmeaton
"Problem with my Python" is not google friendly.

"Why does my division produce an integer in Python" is google friendly.

Essentially, you should be able to work out the rough content of a question
based on its title. I'm sure there's some irony about context and grammars
hidden in this response.

