

Arc's Unicode support (by the news.yc patch writer) - kul
http://collison.ie/blog/?p=49

======
jbert
Perl has fairly decent utf8 support, off by default:

    
    
        use utf8;
    

allows source to be written in utf8 (string literals, function+variable names
etc).

Filehandles aren't utf8 by default, but can be put into utf8 mode with the
'binmode' function. (You can go further, and tag filehandles with pretty much
any charset encoding and you'll get the Right Thing happening with reads and
writes).
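
For readers who don't write Perl, the idea of tagging a handle with an encoding can be sketched in Python 3, the other language discussed in this thread. This is only a rough analogue and not Perl's binmode; `roundtrip` is a made-up helper name and the two encodings are just examples.

```python
import io

def roundtrip(text, encoding):
    """Write text through an encoding layer, then read it back the same way.

    Hypothetical helper: returns the raw bytes that reached the "file" and
    the text recovered by decoding those bytes with the same encoding.
    """
    raw = io.BytesIO()
    writer = io.TextIOWrapper(raw, encoding=encoding, newline="")
    writer.write(text)
    writer.flush()                 # push the encoded bytes down to `raw`
    data = raw.getvalue()

    reader = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return data, reader.read()

for enc in ("utf-8", "latin-1"):
    data, text = roundtrip("ô", enc)
    print(enc, data, text == "ô")
```

The program only ever sees the character ô; the layer decides whether that becomes the two bytes `\xc3\xb4` (UTF-8) or the single byte `\xf4` (Latin-1) on the way out.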

Other data sources (db handles etc) generally have some API to control this
too.

As a convenience, the -CIO option to perl puts stdin and stdout into utf8 mode.

Defaulting to off is just the price of backward compatibility. Not sure what
the plan is for perl6 in this regard.

------
bootload
Won't work for me at least till py3k and is a good example of Python not
leading.

    
    
      $ cat > foo.py
      # -*- coding: <utf-8> -*-
      # even with the above character encoding
      def ô():
          return "ô"
      $ python foo.py
      File "foo.py", line 3
      SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 3, but no
      encoding declared; see http://www.python.org/peps/pep-0263.html
    

Nothing you can do about this at the moment, as Python up to 2.5 is ASCII-only
for strings [0] unless a source code encoding is declared. It gets worse.
Support for "non ascii identifiers" (shown with the bug above), outlined in
PEP 3131, is SF, not SA. [1], [2]

Q: Do people use Unicode identifiers in their source code?

[0] Defaulting to ASCII was a design choice made by the BDFL.

[1] <http://www.python.org/dev/peps/pep-3131/>

[2] <http://www.python.org/dev/peps/>

~~~
anewaccountname
Since 'cat > foo.py' erases foo.py, you are obviously lying about that error
message.

~~~
bayareaguy
Hmm... I guess that makes me a liar too.

    
    
      powerbook.local 104> cat > foo.py
      # -*- coding: <utf-8> -*-
      def ô():
          return "ô"
      powerbook.local 105> python foo.py
        File "foo.py", line 2
      SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
      powerbook.local 106>
    
    

Now if you actually go look at the web page in the error message, you should
notice that the encoding doesn't include the < >.
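
To see why the < > matter, here is a minimal sketch of the kind of check PEP 263 describes. The regular expression is modelled on the one in the PEP, and `declared_encoding` is a made-up name; the real interpreter also only looks at the first two lines of the file, which this sketch does not model.

```python
import re

# Pattern modelled on the one described in PEP 263: a comment containing
# "coding:" or "coding=" followed by an encoding name.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)

def declared_encoding(line):
    """Return the encoding named in a comment line, or None if there is none."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None

print(declared_encoding("# -*- coding: utf-8 -*-"))    # utf-8
print(declared_encoding("# -*- coding: <utf-8> -*-"))  # None
```

The `<` cannot start an encoding name, so the second line counts as no declaration at all, which matches the "no encoding declared" error above.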

But even when the encoding is set properly python still complains:

    
    
      powerbook.local 106> cat > foo.py
      # -*- coding: utf-8 -*-
      def ô():
          return "ô"
      powerbook.local 107> python foo.py
        File "foo.py", line 2
          def ô():
          ^
      SyntaxError: invalid syntax
      powerbook.local 108>
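
For what it's worth, this is the case PEP 3131 is meant to cover. A minimal sketch, assuming a Python 3 interpreter (where the source encoding defaults to UTF-8); it will not run on the 2.x interpreters shown above.

```python
# Python 3 only. Source files default to UTF-8 there, so no coding line is
# needed, and PEP 3131 permits non-ASCII letters in identifiers.
def ô():
    return "ô"

assert ô() == "ô"
print("ô".isidentifier())   # True
print("ô".encode("utf-8"))  # b'\xc3\xb4'
```

The `\xc3` in the earlier error message is simply the first of the two UTF-8 bytes that encode ô.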

~~~
anewaccountname
Are you sure you don't mean 'cat foo.py' or maybe 'cat - < foo.py'?

~~~
bayareaguy
Yes, I'm sure.

cat > foo.py copies standard input into the file foo.py. The rest of the
characters you're seeing in the message are what I typed in. The only
character you don't see is the Ctrl-D I type to indicate EOF. This is a quick
and easy way of creating foo.py without using an editor.
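
Roughly the same thing in Python, for anyone unfamiliar with the shell idiom. A sketch only: `cat_to_file` is a made-up name, and in the real shell it is the `>` redirection, not cat itself, that truncates the file before anything is copied.

```python
import io
import shutil

def cat_to_file(stdin, path):
    """Mimic `cat > path`: truncate path, then copy all of stdin into it."""
    with open(path, "w", encoding="utf-8") as out:  # "w" empties the file first
        shutil.copyfileobj(stdin, out)

# Interactive use would look like this (type the text, then Ctrl-D for EOF):
#   import sys
#   cat_to_file(sys.stdin, "foo.py")
```

Any old contents of foo.py are gone by the time the first typed line is written, which is why the error message can only be about the text that was just entered.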

~~~
anewaccountname
Ah, now I see.

------
marvin
The referenced flamewars from comp.lang.lisp are, once again, priceless. I
can't for the love of God understand that community.

------
iamelgringo
And all the flamers owe PG and Patrick a big, "Uh.... Sorry, Dude. My bad."

~~~
prescod
The whole Unicode brouhaha was triggered by Paul's description of the issue.
If he had said: "Arc doesn't support Unicode yet but I expect it would be easy
and I welcome patches" then it would not have been an issue at all.

Instead, he said: "I realize that supporting only Ascii is uninternational to
a point that's almost offensive, like calling Beijing Peking, or Roma Rome
(hmm, wait a minute). But the kind of people who would be offended by that
wouldn't like Arc anyway."

There was no reason to take a swipe at the "kind of people" who care about
internationalization, which is to say:

* people who are not mono-lingual anglophones

* people who want to build real-world applications

Why shouldn't people in those categories be interested in Arc? Why should they
be excluded? And why treat it as a matter of political correctness rather than
just a technological decision?

Paul threw the first punch and the blogosphere punched back.

Furthermore, his general tendency to divide and conquer the programming world
in that way is why there was already a huge pool of haters ready to pounce on
him. Every essay of his implies that there are people who get it and people
who don't and one can distinguish between them by seeing which people agree
with him and which do not.

He only needed to say: "Arc does not yet support Unicode" and the whole thing
would have been avoided.

~~~
vlad
With the understanding that arc was released as a tech demo of a personal
project at this stage, I took what pg wrote to mean that he thought he had
better things to do than make it do X (in this case, add and test
international support). I think there were enough disclaimers in the
announcement, web site, and posts to conclude this is all he meant. He simply
released his project as it was.

Why should PG say that Unicode support will be easy and that he will accept
patches? Wouldn't making promises be contrary to the entire disclaimer he
wrote? (Have you read disclaimers before? Paul's addresses at least as many
facets as those written by lawyers.)

If bloggers want to influence PG into feeling guilty in order to get him to
spend more time on arc regardless of his disclaimer of "when it's done, it's
done," they can do it by phrasing things like adults. Or by writing their own
languages.

(Great post. Maybe PG wouldn't have realized this perspective had you not
posted it for users to upvote. I know I didn't notice your perspective. On the
other hand, I wouldn't change a thing. You're expecting PG to plan for all
potential hurt feelings, be it Mac users, Linux users, corporations, people in
remote locations without access to the Internet because this means they can't
download arc, and more. This is impossible to do, and leads to nothing being
released at all.)

~~~
prescod
I'm not expecting PG to plan for hurt feelings. I'm expecting him simply not
to go out of his way to hurt feelings. For no particular reason he took a
technical time-to-release issue and turned it into a political correctness
issue (and therefore made the whole debate around it political).

Simply take this paragraph:

"Which is why, incidentally, Arc only supports Ascii. MzScheme, which the
current version of Arc compiles to, has some more advanced plan for dealing
with characters. But it would probably have taken me a couple days to figure
out how to interact with it, and I don't want to spend even one day dealing
with character sets. Character sets are a black hole. I realize that
supporting only Ascii is uninternational to a point that's almost offensive,
like calling Beijing Peking, or Roma Rome (hmm, wait a minute). But the kind
of people who would be offended by that wouldn't like Arc anyway."

And change it to:

"Currently, Arc only supports Ascii. MzScheme, which the current version of
Arc compiles to, has some more advanced plan for dealing with characters. At
some point (I don't know when) I or someone else on the Arc team will probably
figure out how to take advantage of it."

That's all. Say less. Stick to the technology. Avoid politics. Controversy
avoided.

