Hacker News
We need more keywords (java.net)
33 points by myinnerbanjo 11 days ago | 14 comments





Another option is to enable language features on a per-file basis. That way you could have your "ideal" syntax, and only have to do the work of updating those files where you need to use it.

The only language I can think of that does this extensively is Haskell (and some flavors of regular expressions). The main downside I have noticed is that the ability to do this allows for the language to add a bunch of complexity that would otherwise be prevented by the need to preserve backwards compatibility [1]; but, if the language writers can remain disciplined, it seems like an effective way to grow a language without the growing pains the article is discussing.

For adding keywords, this approach doesn't even add that much maintenance cost, as all the additional complexity is confined to the lexer.

[1] Or, as has happened several times with Haskell, add features that are too broken to even be considered but for the fact that they are always behind a flag.
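To illustrate the point above that per-file keyword opt-in can stay confined to the lexer, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `#enable <name>` pragma syntax, the keyword sets, and the function names are hypothetical and do not correspond to any real language.

```python
import re

# Hypothetical keyword sets for a toy language.
BASE_KEYWORDS = {"if", "else", "while", "return"}
OPTIONAL_KEYWORDS = {"assert", "var"}  # only keywords if the file opts in

# Hypothetical per-file pragma: "#enable <keyword>" on the leading lines.
PRAGMA = re.compile(r"^#enable\s+(\w+)\s*$")
WORD = re.compile(r"[A-Za-z_]\w*")


def enabled_keywords(source):
    """Return the keyword set for one file, based on its leading pragmas."""
    keywords = set(BASE_KEYWORDS)
    for line in source.splitlines():
        match = PRAGMA.match(line)
        if match is None:
            break  # pragmas are only honoured at the top of the file
        if match.group(1) in OPTIONAL_KEYWORDS:
            keywords.add(match.group(1))  # unknown names are ignored here
    return keywords


def classify_words(source):
    """Tag every word in the file as KEYWORD or IDENT (toy lexer)."""
    keywords = enabled_keywords(source)
    body = [ln for ln in source.splitlines() if not PRAGMA.match(ln)]
    return [
        ("KEYWORD" if word in keywords else "IDENT", word)
        for line in body
        for word in WORD.findall(line)
    ]
```

In this sketch a file that starts with `#enable assert` gets `assert` lexed as a keyword, while a file without the pragma can keep using `assert` as an ordinary identifier. Nothing downstream of the lexer needs to know which mode a file was in.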


> The only language I can think of that does this extensively is Haskell

Rust is doing the same thing (although on the level of crates (packages) instead of individual files). When you want to use new keywords like async/await, you have to declare your crate as using the 2018 edition of Rust (as opposed to the original 2015 edition). Crates using different editions can be freely mixed, so when a new edition is released, everyone can migrate to the new edition at their own pace.


Python has the same idea, albeit on a smaller scale, with its "from __future__ import <feature>" statements:

https://www.python.org/dev/peps/pep-0236/
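A small sketch of the per-module effect, assuming Python 3.7 or later. The choice of the `annotations` feature is mine (it is not the one PEP 236 was originally written for), and the file names passed to `compile` are placeholders. The same function source is compiled twice, once with the future statement and once without:

```python
# The same function definition, compiled as two separate "modules".
SRC = "def f(x: int):\n    return x\n"


def annotations_of_f(module_source, filename):
    """Compile a module from source and return f.__annotations__."""
    # dont_inherit=True keeps this script's own settings from leaking in.
    code = compile(module_source, filename, "exec", dont_inherit=True)
    namespace = {}
    exec(code, namespace)
    return namespace["f"].__annotations__


# Without the opt-in, the annotation is the evaluated object.
plain = annotations_of_f(SRC, "plain.py")

# With the opt-in at the top of the module, annotations stay as strings.
opted_in = annotations_of_f("from __future__ import annotations\n" + SRC,
                            "opted_in.py")
```

Here `plain` holds the actual `int` type while `opted_in` holds the string `'int'`, so the changed behaviour applies only to the module that asked for it.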


On a purely syntactical level, F# has a "light" indentation-based syntax mode which can be toggled on a per-file basis, and OCaml recently got a whole syntax revision with curly braces in the form of ReasonML, which the build system automatically recognizes based on its file extension.

Neither of these adds new keywords (ReasonML switches some around, e.g. match is now switch) but maybe one of these approaches might work for language features or breaking syntax changes or revisions.


One of my favorite features of Racket[1] is that each file declares its dialect on the top line with `#lang language-name`. Different dialects, such as custom embedded DSLs, can have slight syntax modifications or even their own custom lexer and parser. In the end Racket modules all boil down to a smaller core language and interoperate beautifully.

[1]: https://racket-lang.org


> Another option is to enable language features on a per-file basis.

Will you need a new keyword to enable your option for new keywords?


It works better if it is part of the language from the start.

Having said that, we can probably implement it using only existing keywords. Maybe: import java.lang.extensions.assert.

(I assume Java has some reserved part of the package namespace they can use for this purpose. Even if they don't, it shouldn't be difficult to find a package name that hasn't been used by anyone yet)


Alternatively, use a new file extension. That way the extension acts as a syntax version, and you can repeat the trick whenever you need to. If you build into the new syntax whatever mechanism you need to extend keywords in the future, there is no need to repeat it.

If a user could alias or reassign certain keywords, that might be another solution. That way, older projects only require a keyword-mapping configuration, and newer projects will probably just use the default keywords.

> the lexer can always tell with fixed lookahead whether `a-b` is three tokens or one

How does that work? It doesn't seem obvious to me.

I also wonder what approaches other languages have taken since I haven't seen dashed keywords in any of the other big languages.


>How does that work? It doesn't seem obvious to me?

Assume that "-" is the MINUS token. This means that, starting with the first character in "a", we have the token string "A MINUS B", which we can determine with a 3 token lookahead.

Now, if either A or B is a reserved keyword, the lexer knows that "A MINUS B" is not correct (unless the grammar happens to allow a keyword to occur next to a minus sign, which I don't think Java does. If it does, then you just avoid using that keyword when deriving a new one). At this point, the lexer can lex it as "A-B", which is (hopefully) a keyword.
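A toy sketch of that merging rule in Python, under several assumptions of my own: the reserved-word list is a small subset, `break-with` is a hypothetical hyphenated keyword used only for illustration, the lexer only understands words and `-`, and the three pieces must be written with no whitespace between them.

```python
import re

RESERVED = {"break", "if", "return", "while"}  # subset of existing keywords
HYPHENATED = {"break-with"}                    # hypothetical new keyword

RAW_TOKEN = re.compile(r"[A-Za-z_]\w*|-")      # toy: words and minus only


def lex(source):
    """Lex words and '-', merging WORD MINUS WORD into a hyphenated keyword."""
    raw = [(m.group(), m.start(), m.end()) for m in RAW_TOKEN.finditer(source)]
    tokens = []
    i = 0
    while i < len(raw):
        text = raw[i][0]
        # Look ahead over exactly three raw tokens: WORD, '-', WORD.
        if i + 2 < len(raw) and text != "-" and raw[i + 1][0] == "-" \
                and raw[i + 2][0] != "-":
            left, right = text, raw[i + 2][0]
            adjacent = raw[i][2] == raw[i + 1][1] and raw[i + 1][2] == raw[i + 2][1]
            # "A MINUS B" cannot be a subtraction if A or B is reserved.
            has_reserved = left in RESERVED or right in RESERVED
            if adjacent and has_reserved and f"{left}-{right}" in HYPHENATED:
                tokens.append(("KEYWORD", f"{left}-{right}"))
                i += 3
                continue
        if text == "-":
            tokens.append(("MINUS", text))
        elif text in RESERVED:
            tokens.append(("KEYWORD", text))
        else:
            tokens.append(("IDENT", text))
        i += 1
    return tokens
```

With these assumptions, `lex("a-b")` stays three tokens (IDENT, MINUS, IDENT) because neither side is reserved, while `lex("break-with x")` yields a single `break-with` keyword followed by an identifier.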

I can't think of any examples, but I feel like I have seen languages take a similar approach where they reserve a prefix, which allows them to create as many new reserved words as they like.


Ooh let's take a tour of big languages.

- C has given up on new identifiers save the "reserved namespace", consisting of an underscore followed by an uppercase letter, which is how you get _Bool and _Generic. Oof, nuff said.

- C++ has stretched poor `static` to its limits, and now incorporates context-sensitive "identifiers with special meaning": `final` and `override`. No new hard keywords in C++17 AFAIK.

- C# takes this even further with its LINQ-driven contextual keywords.

- JavaScript stumbles about by adding new identifiers, BUT only in strict mode, and even then only sometimes; `yield` is especially ambiguous.

Fill in the blanks please!


I think C# has only ever added contextual keywords after its initial release and if I remember correctly there was only one breaking change for semantics of source code, which I find impressive. I also didn't get the impression (from working with Roslyn and looking at its source here and there, as well as reading posts from people like Eric Lippert or Mads Torgersen about the language design) that contextual keywords are as much of a hassle as Brian makes them out to be. About the only really annoying one I can think of in C# would be the nameof operator, which has to be parsed as a method invocation and is only the nameof operator when there's no symbol with that unqualified name accessible at that point that can be invoked as a method (e.g. an actual method named nameof (rare), or a local variable of a delegate type). Pretty much all other contextual keywords are only valid in a few places where parsing is not ambiguous. You just happen to have a token there that's still valid elsewhere as an identifier.

It's interesting to read about the musings and decision making processes for different languages, though. I'm sure Java and C# are both designed very carefully, yet with radically different goals and outcomes. And I'm sure both language design teams must ponder pretty much the same issues.


C is a little better than presented here because they add a header to the standard library with a #define in it that gives the new keyword a reasonable name; for example, you can access `_Bool` as `bool` if you `#include <stdbool.h>`. I actually think this is a reasonable compromise; you can opt in to the new "keyword" at the compilation-unit level.


