Hacker News

An annoying aspect of getting into Octave was realizing that it had only very basic Unicode support (no Unicode in filenames, variable names, comments...).

The main issue seems to be Matlab compatibility: Matlab is stuck with UTF-16 (like some Windows internals), while most languages have moved on to UTF-8 as the standard. Heck, even Windows PowerShell (and Notepad?) default to UTF-8 now!



Some time ago I even opened a feature request [1] to support Unicode symbols, as Lean does, and another one [2] for a Language Server Protocol implementation, for better integration with editors such as VS Code, Emacs, Vim, etc.

[1] https://savannah.gnu.org/bugs/?57103

[2] https://savannah.gnu.org/bugs/?57106
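For context on [2]: the LSP base protocol frames every JSON-RPC message with an ASCII `Content-Length` header that counts the bytes of the body (UTF-8 by default), then a blank line, then the JSON. A minimal Python sketch of that framing, assuming a hand-built `initialize` request; `frame_lsp_message` is a hypothetical helper, not part of Octave or any LSP library:

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Hypothetical helper: frame one JSON-RPC message per the LSP base protocol.

    The header is ASCII, and Content-Length counts *bytes* of the UTF-8 body,
    not characters.
    """
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# A bare-bones initialize request, the first message a client sends.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}
print(frame_lsp_message(request).decode("utf-8"))

# Non-ASCII content makes the byte count differ from the character count.
print(frame_lsp_message({"x": "\u03b1"}))  # b'Content-Length: 11\r\n\r\n{"x": "\xce\xb1"}'
```

A server written for Octave would have to get exactly this byte counting right, which is one more place where UTF-8 handling matters.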


What the world needs is a pull request, not a feature request ;)

Especially for the first one. Being able to use Greek letters as variable names would be incredible (now that it is possible to do so in Python and in C).
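In Python 3 this already works, since identifiers may contain Unicode letters (PEP 3131). A small sketch; the names `α`, `β`, `Δt` and `blend` are purely illustrative:

```python
# Python 3 identifiers may contain Unicode letters, so these names are legal as written.
α = 0.5        # weight of the first term
β = 1.0 - α    # complementary weight
Δt = 0.01      # time step


def blend(x: float, y: float) -> float:
    """Convex combination α*x + β*y, written as it would appear in a paper."""
    return α * x + β * y


print(blend(2.0, 4.0))                          # 3.0
print("α".isidentifier(), "Δt".isidentifier())  # True True
```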


> Being able to use Greek letters as variable names would be incredible

I've seen people do that in Haskell, I think. I've no idea why that's desirable. It's difficult to type up. It only adds complexity. Is it to be able to use variable names like they're used in mathematics? That notation is optimized for easy handwriting through terseness, but because it is so terse, it's incompatible with autocomplete, so in an editor it actually offers a much worse writability/readability ratio. Why use a miniature vocabulary of symbols when you can use English? I'd always choose `delta` over `Δ`. Actually, why use Greek at all?

I realise my take is a bit condescending. My background is plain programming, and I've had to implement some algorithms from papers. I guess I'm venting about having to interpret those symbols, where terseness was favored over readability.


> Is it to be able to use variable names like they're used in mathematics?

Yes. That's the main point (in my case). I'm really tired of having variables with ugly names like "alpha" and "beta" when I could simply have α and β, just as they appear in print.

> Why use a miniature vocabulary of symbols when you can use English?

Because that's the vocabulary already used in math, and I do not want to change mathematics to adapt to the limitations of programming languages.

See, I don't share your point of view at all, but allowing Greek letters in variable names would not keep you from using your preferred naming conventions.

EDIT: it's not really "difficult to type" either. For example, on Linux, if you add

    setxkbmap -option caps:ctrl_modifier
    
to your keyboard configuration, you can type Greek letters by pressing CAPS LOCK before regular keys.


I’m with you. It’s great the way Julia lets math look like math in code. And if you define a "dead Greek" key using xmodmap, typing the whole Greek alphabet is no harder than typing uppercase letters.


> Where terseness was favored over readability

That's like saying that |||| is more readable than 4. Yes, it does take some time to learn the symbols. (I'm not sure what you mean by it being incompatible with autocomplete?)

> Why use a miniature vocabulary of symbols when you can use English?

Why do you assume that English is going to be easier? Heck, even for a native English speaker not familiar with programming, √() is going to be easier to understand than sqrt()!

What would you say if you had to work on code of Russian origin? Would you prefer α or альфа?

> It's difficult to type up.

Depends on your keyboard layout...

http://norme-azerty.fr

(Greek letters are accessible from AltGr+G.)
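On the √() vs sqrt() point, there is at least one real distinction in Python: Greek letters are Unicode letters and may start identifiers, but √ (U+221A) is a math symbol, so it cannot name a function. Julia, by contrast, accepts `√(x)` as a built-in spelling of `sqrt(x)`. A quick check in Python:

```python
import math
import unicodedata

# Greek letters are letters (category Ll/Lu), so they may start a name...
print(unicodedata.category("\u03b1"), "\u03b1".isidentifier())  # Ll True
# ...but the square root sign is a math symbol (category Sm), so it may not.
print(unicodedata.category("\u221a"), "\u221a".isidentifier())  # Sm False

# In Python the square root therefore stays an ordinary named function.
print(math.sqrt(16.0))  # 4.0
```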


Why is UTF-16 considered very basic? Can't it encode everything UTF-8 can encode?


I guess I formulated that poorly: having to default to one of the UTF-16 variants instead of UTF-8 is another complication that makes it harder to handle Unicode properly. (What happens when you paste UTF-8 into a UTF-16 file?)

And I guess they have to follow Matlab's lead anyway, rather than risk going in another direction... EDIT: Yep: https://savannah.gnu.org/bugs/?57103#comment7

Speaking of which, I should check how Scilab deals with it!
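On the pasting question: an editor normally converts characters on paste, so the real damage comes from raw bytes being read under the wrong encoding label. A Python sketch of that failure mode, assuming bytes written as UTF-8 are later read by something that expects UTF-16LE:

```python
text = "\u03b1"                      # GREEK SMALL LETTER ALPHA
utf8_bytes = text.encode("utf-8")    # what was actually written: b'\xce\xb1'

# A reader that assumes UTF-16LE pairs the two bytes into a single code unit.
misread = utf8_bytes.decode("utf-16-le")
print(utf8_bytes)                    # b'\xce\xb1'
print(hex(ord(misread)))             # 0xb1ce (a Hangul syllable, not alpha)

# With an odd number of bytes, the UTF-16 decoder fails outright instead.
try:
    "a\u03b1".encode("utf-8").decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("decode error:", exc.reason)
```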


Yes, it can. JavaScript and the web DOM use UTF-16, and nobody says their Unicode support is basic.

So UTF-16 is not the problem.
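To back that up, a Python sketch that round-trips every Unicode scalar value (every code point except the surrogate range) through both encodings. It loops over about 1.1 million code points, so it is not instant:

```python
def round_trips(cp: int) -> bool:
    """True if code point cp survives encode/decode in both UTF-8 and UTF-16."""
    ch = chr(cp)
    return (ch.encode("utf-8").decode("utf-8") == ch
            and ch.encode("utf-16-le").decode("utf-16-le") == ch)


# Unicode scalar values: all code points except the surrogates U+D800..U+DFFF,
# which neither encoding may represent on their own.
scalars = [cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF]
print(all(round_trips(cp) for cp in scalars))  # True

# The encodings differ in bytes per character, not in coverage.
for s in ["A", "\u03b1", "\U0001F600"]:
    print(f"U+{ord(s):04X}", len(s.encode("utf-8")), len(s.encode("utf-16-le")))
# U+0041 1 2
# U+03B1 2 2
# U+1F600 4 4
```

The last row is the case people worry about: outside the Basic Multilingual Plane, UTF-16 needs a surrogate pair, but it still represents the character.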


PowerShell and Notepad both use UTF-16 internally, but will read and write UTF-8 files (filenames are UTF-16).

It's a bit complex, mainly due to hysterical raisins ^W^W historical reasons.


'Member how "Bush hid the facts"?
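For those who don't remember: older versions of Windows Notepad guessed the encoding of BOM-less files with the `IsTextUnicode` heuristic, and a file containing just that sentence was reopened as Chinese-looking characters because it was taken for UTF-16LE. The byte-level reason, sketched in Python:

```python
raw = "Bush hid the facts".encode("ascii")   # 18 bytes, no BOM
misread = raw.decode("utf-16-le")            # every 2 bytes -> 1 code unit
print(len(raw), len(misread))                # 18 9

# Every resulting code unit lands in CJK Unified Ideographs (U+4E00..U+9FFF).
print(all(0x4E00 <= ord(c) <= 0x9FFF for c in misread))  # True
print(" ".join(f"U+{ord(c):04X}" for c in misread))
# U+7542 U+6873 U+6820 U+6469 U+7420 U+6568 U+6620 U+6361 U+7374
```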

----

https://docs.microsoft.com/en-us/powershell/scripting/dev-cr...

> In PowerShell 6+, the default encoding is UTF-8 without BOM on all platforms.
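What "without BOM" means at the byte level: a UTF-8 BOM is the three-byte prefix EF BB BF that some Windows tools put at the start of a file. In Python, the `utf-8-sig` codec writes it and strips it on read; a short sketch:

```python
import codecs

text = "\u0394"                          # GREEK CAPITAL LETTER DELTA
no_bom = text.encode("utf-8")            # plain UTF-8, as PowerShell 6+ writes by default
with_bom = text.encode("utf-8-sig")      # same bytes, prefixed with a BOM
print(no_bom)                            # b'\xce\x94'
print(with_bom)                          # b'\xef\xbb\xbf\xce\x94'
print(with_bom[:3] == codecs.BOM_UTF8)   # True

# "utf-8-sig" strips a leading BOM when decoding and accepts BOM-less input too.
print(with_bom.decode("utf-8-sig") == no_bom.decode("utf-8-sig") == text)  # True
# Plain "utf-8" keeps the BOM as a U+FEFF character at the start of the string.
print(with_bom.decode("utf-8")[0] == "\ufeff")  # True
```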


Yeah, that's external I/O, but it's nice that they made it the default.

I got used to reading UTF-16 text rendered as ASCII when dealing with NT :)
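That effect is easy to reproduce: in UTF-16LE, each ASCII character becomes its ASCII byte followed by a zero byte, so an ASCII viewer shows a NUL (often drawn as a dot or a space) after every letter. In Python:

```python
data = "NT".encode("utf-16-le")
print(data)  # b'N\x00T\x00'

# Viewed byte by byte as Latin-1, the NULs sit between the letters.
print(data.decode("latin-1").replace("\x00", "."))  # N.T.
```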


I guess that’s improving a bit: the new release improves Unicode support, mainly in regexes. But it's still not Julia-level Unicode integration.


This isn't related to Matlab compatibility. Matlab uses UTF-16 internally, but recent versions handle UTF-8 input seamlessly (converting it automatically). Octave uses UTF-8 on Linux, which works well, but last time I checked, its Unicode support on Windows was broken.


> Octave uses UTF-8 on Linux which works well

It doesn't:

On Linux:

- Can't use Unicode filenames

- Can't use Unicode identifiers

(Actually, you can use Unicode in comments; not sure why I wrote that...)


Octave on Linux has no problem with UTF-8 filenames: it can read from and write to such files just fine.

It's true that you cannot use non-ASCII UTF-8 names for scripts and functions, since in that case the filename is also the command/function name. But that's a statement about valid identifiers, not about supported filenames. Every language I know of has restrictions on identifiers.
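To make the identifier/filename distinction concrete: the filename can be any string the filesystem accepts, while the command name derived from it must also pass the language's identifier rule. A Python sketch, with a hypothetical ASCII-only rule standing in for Octave's (the regex and `command_name` are assumptions for illustration, not Octave code):

```python
import re
from pathlib import PurePath
from typing import Optional

# Hypothetical ASCII-only identifier rule, assumed for illustration
# (not taken from Octave's source): a letter or underscore, then
# letters, digits or underscores.
ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def command_name(filename: str) -> Optional[str]:
    """Hypothetical helper: the command a script file would define,
    or None when its stem is not a valid identifier under the rule above."""
    stem = PurePath(filename).stem
    return stem if ASCII_IDENT.fullmatch(stem) else None


for name in ["plot_results.m", "donn\u00e9es.m", "\u03b1_sweep.m"]:
    # The filename itself is an ordinary string; only the identifier check fails.
    # ascii() keeps the output printable on any console.
    print(ascii(name), "->", command_name(name))
# 'plot_results.m' -> plot_results
# 'donn\xe9es.m' -> None
# '\u03b1_sweep.m' -> None
```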



