

Sometimes Java is just weird - jere_jones
http://www.vsadt.com/blog/2010/11/16/java-real-numbers-do-the-funky-chicken-in-hex.html

======
celoyd
_I understand the need for localization and all, but 46 THOUSAND characters?
Jeez._

There are that many Han characters alone, so I’m not sure what the surprise
is. It’s not like you have to hard-code them in your grammar.

If anything, I’d hope that new languages in 2010 allow any of the roughly
100,000 non-control non-whitespace [edit: non-punctuation] Unicode characters.
For a lot of the code I see, ASCII is at least as constraining as, say,
fixnums would be.

~~~
riffraff
Moreover, you actually already have trivial code for that in
$JAVAISH_IMPLEMENTATION's Character.isUnicodeIdentifier{Start,Part}

~~~
jere_jones
I used Character.isJavaIdentifierStart(int) and
Character.isJavaIdentifierPart(int) to write a file with the ranges that I cut
and pasted into my C# code. And thank goodness, too! I'm sure I would've made
a typo or missed characters if I had to type all that myself.
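
A minimal sketch of how such a range file could be generated. The class and
method names (`IdentifierRanges`, `ranges`) are hypothetical and this is not
jere_jones's actual code; only the two `Character` methods come from the
comment above. The exact ranges depend on the Unicode version shipped with the
JDK that runs it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper: collapses every code point accepted by a predicate
// into inclusive [first, last] runs that can be pasted into another grammar.
class IdentifierRanges {
    static List<int[]> ranges(IntPredicate accepts) {
        List<int[]> out = new ArrayList<>();
        int runStart = -1; // -1 means "not currently inside a run"
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (accepts.test(cp)) {
                if (runStart < 0) {
                    runStart = cp; // a new run begins here
                }
            } else if (runStart >= 0) {
                out.add(new int[] {runStart, cp - 1}); // run ended on the previous code point
                runStart = -1;
            }
        }
        if (runStart >= 0) {
            // close a run that reaches the last code point
            out.add(new int[] {runStart, Character.MAX_CODE_POINT});
        }
        return out;
    }

    public static void main(String[] args) {
        // Swap in Character::isJavaIdentifierPart to get the "part" ranges.
        for (int[] r : ranges(Character::isJavaIdentifierStart)) {
            System.out.printf("0x%04X-0x%04X%n", r[0], r[1]);
        }
    }
}
```

Taking the predicate as a parameter lets the same loop serve both the "start"
and the "part" tables.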

------
stcredzero
"For now, I'm not going to support this in..."

Often, this is really just another way of saying, "I didn't think of this
before, and I don't want to start thinking about it just now."

The floating point notation would be understandable for those who need to be
intimately familiar with floats at the bit level. There are a few people who
have to do this.

------
raju
Perhaps I am nitpicking here - but it's "weird" not "wierd".

Also, I am not sure rewording the title added any value. The title of the
article IMO was just fine - See <http://ycombinator.com/newsguidelines.html>

~~~
jere_jones
Thank you for the correction. I guess I was typing too fast. :-)

------
tolmasky
None of this seems particularly weird to me (except _maybe_ the fact that the
hexadecimals use binary exponents, but I can imagine why this would be way
more useful than a decimal exponent)

Allowing Unicode identifier names is a feature I've seen many people ask for,
and doesn't seem like that big a deal. It must be frustrating for non-English
speakers to use languages that don't support this feature. Of course some
characters can be in an identifier but not start it; this is true of most
languages. You can't start variable names with digits in C.

I see no issue with allowing numbers in different bases. And of course the
fractional part would also be in that base. It would be weird if the digits
left of the point were in hex and those right of the point were decimal.
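
For reference, a small sketch of what Java's hexadecimal floating-point
literals look like. The class name `HexFloatDemo` and the sample values are
chosen here for illustration. Both sides of the point are hex digits, while the
`p` exponent is written in decimal and is a power of two.

```java
// Hypothetical demo class; the literals themselves are standard Java syntax.
class HexFloatDemo {
    static double[] samples() {
        return new double[] {
            0x1.8p1, // hex 1.8 = 1 + 8/16 = 1.5, times 2^1 = 3.0
            0x.8p0,  // hex .8 = 8/16 = 0.5, times 2^0 = 0.5
            0xAp-2,  // hex A = 10, times 2^-2 = 2.5
            0x1p4    // 1 times 2^4 = 16.0 (not 16^4, not 10^4)
        };
    }

    public static void main(String[] args) {
        for (double d : samples()) {
            System.out.println(d);
        }
        // The same notation round-trips through the standard library.
        System.out.println(Double.toHexString(3.0));        // prints 0x1.8p1
        System.out.println(Double.parseDouble("0x1.8p1"));  // prints 3.0
    }
}
```

The binary exponent is what makes the notation exact: every finite `double`
is a hex significand times a power of two, so nothing is rounded when the
value is written or read back.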

------
iwr
"For now, I'm not going to support this in the parser as it appears to be a
fairly dusty corner of the language. Maybe at some later date."

And that is why this particular parser won't pass the Java TCKs.

~~~
jere_jones
I have never heard of "Java TCKs" but after a little research, I understand
them better. Just out of curiosity, what would passing the Java TCKs do for me
or my users?

~~~
iwr
Certification will lead to the recognition of your product as an enterprise-
ready technology, allowing interoperability and intercompatibilitization with
all Oracle Java products and libraries. In today's complex and increasingly
specialized economy, a promise of compatibility is not sufficient. The _Oracle
Java Compatibility Process™_ provides you with the credentials you need to
succeed in acquiring market share.

------
brown9-2
I'm curious, are you building your own parser from scratch? There are several
existing libraries out there which can work with Java source code either at
the bytecode or syntax-tree level.

Depending on your purpose, it might be a whole lot easier to re-use an existing
implementation and focus on whatever you're planning to use this parser for...

~~~
jere_jones
I'm using Irony as the parser. But I still have to translate the grammar into
C# code.

I was unable to find an existing Java parser in C#. Do you know of one?

------
Jach
The only weird thing to me is this:

> There are 46,908 different valid characters that you can use in an
> identifier

And yet among all those characters I can't have a question mark at the end of
a boolean variable or function name...

Anyway, weirdness is no excuse for not supporting a feature in a parser!

~~~
kleiba
That must be because of the ternary operator:

    condition ? result-if-true : result-if-false


although you're probably right that a smarter parser should be able to
distinguish between both cases. But then the next guy comes along and
complains that he can't use a colon as a valid character in his variable
names...
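
A rough sketch of how the identifier rules play out for names like these. The
class and method names (`IdentifierCheck`, `looksLikeIdentifier`) are
hypothetical, and the check is deliberately incomplete: it only applies the
two `Character` predicates and does not reject reserved words such as `class`.

```java
// Hypothetical helper: applies Java's start/part character rules to a name.
class IdentifierCheck {
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false; // e.g. a leading digit
        }
        // Step by code point so characters outside the BMP are handled too.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false; // e.g. '?' or ':'
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("isDone"));  // true
        System.out.println(looksLikeIdentifier("isDone?")); // false: '?' is not an identifier part
        System.out.println(looksLikeIdentifier("a:b"));     // false: ':' is not an identifier part
        System.out.println(looksLikeIdentifier("abc1"));    // true: digits may follow the first character
        System.out.println(looksLikeIdentifier("1abc"));    // false: digits may not start an identifier
    }
}
```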

------
davidw
Maybe you could borrow parsing code from Apache Harmony? That's under a very
liberal license.

