Another piece of source code that demonstrates how things that are easy to write, read, and understand in FP languages become a nightmare in Rust. Maybe making functional code indecipherable is not the best showcase of what Rust is actually good for.
I'm actually finding the opposite. This is mapping very nicely to Rust. But of course not as nicely as it would if I could encode things as lambda terms in the language itself. I'm currently working on an impl of Fn for the Expression. Maybe it'll work out.
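Roughly the shape I'm playing with looks like this; purely a sketch on nightly features, with a stand-in enum rather than the crate's real Expression type:

```rust
#![feature(fn_traits, unboxed_closures)] // nightly-only, needed for custom Fn impls

// Stand-in Expression type just for this sketch; the crate's real type is richer.
#[derive(Clone, Debug)]
enum Expression {
    Var(String),
    Abs(String, Box<Expression>),
    App(Box<Expression>, Box<Expression>),
}

impl FnOnce<(Expression,)> for Expression {
    type Output = Expression;
    extern "rust-call" fn call_once(self, args: (Expression,)) -> Expression {
        let (arg,) = args;
        // A real implementation would β-reduce here; the sketch just builds the
        // application node.
        Expression::App(Box::new(self), Box::new(arg))
    }
}
```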
This code isn't really meant to be an easily digestible read. I don't make much effort to explain the Church numerals, for example; I just show that they work with a few tests.
I personally find a lot of good FP principles come into play in Rust quite frequently.
My first thought is that code should be plain ASCII and preferably English.
But ultimately it's fine, and nice, that the language supports Unicode. The preferred approach should be a matter of coding guidelines, set beforehand by each particular project.
It should not be a decision made by the people implementing the language on behalf of all the projects using that language.
If you look at the Unicode identifiers RFC discussion[2] you'll see the steps we're taking to address and mitigate that and other problems. That being said, if you look at the tracking issue[3] you'll also see that we haven't had the manpower to fully implement this feature, which is why it will remain opt-in and nightly-only for the medium term.
Hmm. What would you attack? That is, if you were writing a crate that I was thinking of using in my program, how would homographs allow you to compromise my code?
How would people end up using your evil crate though? I guess you could rely on having people copy and paste code from your documentation or tutorials, but if they just type the name then they get the original.
Here’s a somewhat contrived “attack”: I write a crate with two functions that are named similarly, wait for you to integrate it, then file a pull request using the Unicode function name (and hence calling the malicious function) and be relatively assured that if you looked up the function you’d stop reading code at the ASCII one.
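Concretely, something like this (names made up; the second definition uses a Cyrillic 'е', U+0435, so it's a different identifier that renders almost identically):

```rust
#![feature(non_ascii_idents)] // nightly-only for now

fn validate(input: &str) -> bool { input.len() < 64 } // the function you reviewed
fn validatе(input: &str) -> bool { true }             // the homograph the PR actually calls

fn main() {
    // Reads like a call to the vetted function above, but isn't.
    assert!(validatе("anything goes"));
}
```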
The code published on crates.io does not need to match the code in your github repository, so... you don't need unicode homographs to do this kind of attack.
While it's definitely cute and amusing, this is exactly why languages that specify their source code in non-ascii encodings are a bad idea. Fully 92.6% of working programmers have no idea how to produce a λ or γ on their input device of choice. Maybe a third of those could eventually find it with a character picker tool.
Java started this (in the unicode era anyway -- yeah yeah, APL). The community quite sanely rejected it as a matter of style. I have no idea why it hasn't been killed dead in new designs.
Because people in non-English-speaking countries are perfectly within their rights to name source code entities in their native language instead of either having to mangle words into ASCII representations (possibly even leading to ambiguities in case of near-homonyms that have identical ASCII bastardizations!) or being forced to awkwardly translate domain terminology into English which is a task that may be any of the following: a) against organization policy, b) not what programmers are paid to do, c) lead to ambiguous or outright misleading translations, d) require good English skills from anyone touching the code, e) make documentation more difficult, f) present unnecessary hurdles when onboarding new people to work on the code.
Sure, everybody has 'rights' but if I were to come across a workplace in my country where programmers were using much of our native language in the code I would consider it a red flag and I would expect to find amateurish practices in other places as well. The thing is that English is the lingua franca of programming and that ASCII is the lingua franca of source files. The highest quality manual that you are going to find is in English and stack overflow will answer your questions in English. Also, various tools are going to respond to non-ASCII characters in various ways which may not always be the most pleasant. You are bound to get yourself some problems that you would not have gotten otherwise.
To me it seems bad to design some systems in English, like public administration software or anything that follows a law very closely (concepts in areas like taxation and the legal system often have no translations), and data exchange formats should be in the language that matters (for example, all electronic bills in Spain are XML with entities in Spanish), so if you want a 1:1 mapping you need to use the native language.
Sure, you could consider it unprofessional, but I'd wager that the large majority of all professional programming is done in languages other than English. The tooling issue is an "is vs. ought" problem. In 2019 full UTF-8 compatibility is not a feature anymore; on the contrary, a lack of support is a bug.
Do you have any citation for that? From what I know, the vast majority of all professional programming is done in English. Even companies like SAP, which used to do a lot of programming in German, have shifted to programming in English.
I'm a native English speaker, so on the one hand, I'm glad English is so common in programming. On the other, I can't help but wonder: aren't other languages actually better for the job at hand? Chinese, Japanese kanji, or to a lesser extent Hangul all have this kind of awesome mix of concision and expressiveness. I'm not really a language guy, but I've been tempted to learn Chinese just for this reason. It also has a really cool feature of being much easier to read than to write, which builds a certain level of 'reader-first' coding practice into the language itself. You could cut down on the number of sigils and line noise, even play around with different ways to organize source code (there are, for instance, palindrome poems written in Chinese that use two dimensions, which would be basically impossible in English).
I'm not disagreeing with you - sometimes it's just project requirements - but as a developer in a non-English-speaking country, the idea of writing code in a non-English language is ridiculous to me.
Programming languages and standard libraries are written in English, programming terminology is invented in English, the international community has adopted English as a de facto standard, and almost all third-party APIs expose English interfaces.
Every time I've seen a project written in my native language, it's been ridiculously hard to follow because the terminology is translated in various ways (there are no official or even widely used terms, since hardly anyone uses the native language in SW development and CS outside of academia), and it's a mishmash of languages, since everything not written in the project (along with the language keywords) is in English.
I don't usually take hardline stances on things, but this is one thing where it would be a 100% dealbreaker for working on a project.
Sure, one should stick with English when working on an industrial-standard codebase that many people in the western hemisphere will work on. Unless of course these folks are limited to the southern continents, where Spanish or French might be more appropriate.
And outside the western hemisphere, of course, if the codebase is unlikely to cross, say, the Chinese borders, Mandarin would be more appropriate.
Then there are non-industrial standard codebases, say a hobby project. I’m sure that somewhere in Guinea-Bissau there is a 17 year old learning Python at this moment, really happy that they can write Portuguese special characters as variable names in python 3 (if only the documentation had been translated to Portuguese).
Just because English is the appropriate language in by far the majority of cases doesn't mean we should deny the possibility of not using it.
As a Finn, I don't see how the additional characters ä and ö (which in Finnish are not umlauted a and o but separate letters in the alphabet) could suddenly cause invisible bugs like that. But I can very easily see how ASCIIfying ä and ö to a and o can lead to real misunderstandings (just as an example, väärä means "wrong" or "incorrect", but vaara means "danger" (also "esker", but that's less likely to cause confusion :P)). Germans might be fine with transcribing ä and ö as ae and oe, but in Finnish that's not proper.
Also, bugs caused by accidental homographs in identifier names are caught by any reasonable type system, just like any typos. As for your edit, you can't stop clever people being clever without code reviews anyway, no matter what the language.
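A toy sketch of what I mean (non-ASCII identifiers enabled; the second line deliberately fails to compile, which is the point):

```rust
#![feature(non_ascii_idents)]

fn main() {
    let väärä = 1;
    // Does not compile: error[E0425]: cannot find value `vaara` in this scope.
    println!("{}", vaara);
}
```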
Agda[0] and Coq and other languages for formalising mathematics (they are also programming languages) make a lot of use of Unicode to make the mathematical statements readable. Here is an example:
> ∑-+-distribute : ∑ (A + B) C ≃ (∑ A (C ∘ left)) + (∑ B (C ∘ right))
This is quite readable and would be understandable to someone who did not know Agda or the library in particular, but who understood the subject material. If the characters were replaced by words the overhead would be big.
The Plan9 operating system invented the compose key, which makes writing Unicode characters easy enough for anyone to use them. Unfortunately, it requires a bit of setup to make the compose key work in X on Linux or *BSD (I have described the process here[1]). Agda and Coq therefore use special input modes for Emacs and other editors.
I also use Unicode in my Latex documents to make them more readable. [2]
This is true, but it is still a choice. In my spare-time Coq project I use ASCII exclusively in my source files. Same thing for my PhD thesis in theoretical physics, which was in LaTeX.
I disagree. I looked through the source code to Principia a while back, and it makes really effective use of Greek script, black-letter script, math symbols, etc for implementing a physics engine. (It's a mod for Kerbal Space Program that implements n-body gravitation.) It's really well written and used just enough to make things more understandable. (They've got a great units system too; it's worth reading the code just for that.) Here's a random selection: https://github.com/mockingbirdnest/Principia/blob/master/phy...
Like anything else, it can certainly be misused. That doesn't mean it's never worth using though.
Greek letters are good for transcribing equations that were written with Greek letters. The equations were written like that because you only get one letter per variable and you need more than 26 variables across multiple equations, which you want to be consistent. You only get one letter per variable because on paper you write down each equation about fifty times as you work with it, so it's kept short as much as possible.
Unless you speak Greek, I can't think of any other reasons for Greek letters to wander in to code.
But because you learn physics and maths on paper, you learn the equations this way. And once you've spent 10 years practicing physics with Greek letters, it's really annoying to be forced out of it when you program.
100% this. To understand code written with Greek letters you'd also need to understand the author's intentions for using said letters. Is mu an average or coeff of friction? Why should I lean on a crutch of context for a casual skim of code? It doesn't help readability and it surely isn't enjoyable to write.
It all depends on context. Short identifiers absolutely help readability. I feel like some people absolutely go too far with overly verbose variable names, especially in Java or C#.
Notation matters. What if I told you that to add two numbers you had to write it out as plus(number, number)? Clearly you'd riot. The reason people complain about short IDs is because they're not used to it, but there are a lot of domains (math, science) where using traditional notation enhances understanding. Hell, even in programming if someone wrote sequenceIndex instead of i when iterating over the indices of an array I'd think they were trying to troll.
Additionally, in statistics Greek letters denote parameters, while Latin letters are used for variables.
The data `x` has a normal(mu, Sigma) distribution, for example.
These sorts of conventions enhance readability by providing extra information. You know instantly that `y` must be data, while `theta` is a parameter.
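As a toy sketch of the convention in code (non-ASCII identifiers enabled, which in Rust is nightly-only as discussed elsewhere in this thread; the function name is made up):

```rust
#![feature(non_ascii_idents)]

/// Log-density of a normal(μ, σ²) distribution at x, up to the constant -ln √(2π).
/// Greek names are the parameters; Latin `x` is the data.
fn log_normal(x: f64, μ: f64, σ: f64) -> f64 {
    -((x - μ).powi(2)) / (2.0 * σ * σ) - σ.ln()
}

fn main() {
    println!("{}", log_normal(0.3, 0.0, 1.0));
}
```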
I taught Racket (starting with BSL) to incoming freshmen at Northeastern University, and they would nearly riot every year when we told them it's `(+ 1 1)` for the next 4 months (at least).
If it's hard to write then you can just improve your editor; there are lots of ways to do that, and most of them involve improving the software rather than hardware. Having a lousy editor is no excuse!
I don't think r² as a variable name is a good idea. It may look natural for some (including myself), but for others it implies a function while not being one.
Yea, that's fair; it's definitely one of the more surprising things that they do. On the other hand, it's pretty common to pass around squared values that the math says that you ought to have taken the root of, when you know how long it takes to do that root. You're going to end up with some kind of convention for it, so it might as well be a short one.
I think it’s important to allow Unicode for developers to be more inclusive of other languages and cultures. This is at the forefront of the community of Rust.
> this is exactly why languages that specify their source code in non-ascii encodings are a bad idea.
Seems like your problem isn't that the source has a specified encoding, but that the syntax allows for non-ascii characters in identifiers, which is actually an unstable feature of Rust, only available with nightly compiler builds. See the feature flag: https://github.com/nixpulvis/lalrpop-lambda/blob/master/src/...
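Concretely, the opt-in at the top of the crate looks like this on nightly (assuming the feature keeps its current name):

```rust
// Crate-level opt-in for the unstable feature (nightly only).
#![feature(non_ascii_idents)]
```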
Specifying ASCII for the source code, as your post suggests is better, would leave everyone stuck using \uxxxx escapes for non-ASCII data in string literals. That's unnecessary when you instead specify UTF-8.
I like to look at it from a different perspective:
Programming has mostly looked the same since we moved away from punchcards, and it seems worthwhile exploring different approaches (eg visual programming) and augmentations to the existing system. Now that we have UTF-8, it's trivial to represent all sorts of characters and use them for coding, and it's only natural that some people try.
Now, while I personally haven't seen a convincing application of using non-ascii beyond comments, that doesn't mean they don't exist. And if people come up with such a way, and a large number of people decide it's a good idea, then tooling in whatever form will follow and make using this new style of coding easy.
I feel like arguing that this should never be done because people don't know where to find the symbols is a bit like arguing we shouldn't build electric vehicles because we don't have a charger network on highways yet.
I think it reads fluently to anyone who is studying the lambda calculus in some way, but it's not fluent to type in an editor unless they use LaTeX-style input. I suspect that's the motivation for Julia, Agda, and Coq also allowing Unicode.
Also most programmers wouldn't care to implement toy projects on the lambda calculus as part of their job. This is like complaining that most doctors couldn't program their own MRI machine.
I see no problem with this outside of the set of core Rust technologies. If you were to require the average developer to type mathematical symbols, it would be a barrier. However, libraries specifically targeting mathematicians would obviously not have that same barrier. Making mathematical equations clearer to read and easier to work with within that domain may certainly be worth using syntactic tricks and non-ASCII symbols.
I've seen plenty of code bases which are not in English, since the developers did not have English as their first language. This is usually a well-contemplated decision - for instance, the Ruby language source code is specifically written so that you don't require knowledge of Japanese to understand the core libraries or the C reference implementation.
A dedicated global compose key solves this problem very elegantly. Hitting the ◆ compose key (right alt for me) followed by a series of intuitive characters inserts the corresponding character.
For example:
◆ - - - produces an em dash (—)
◆ - - . produces an en dash (–)
◆ ' e produces é
◆ | c produces the cent symbol (¢)
Usually, you can just guess the combination and be right 3/4 times. Otherwise, it's fairly easy to look it up, or create it if it doesn't exist yet.
Some distros of Linux have this built-in, but I use WinCompose[1] on Windows.
macOS has something very similar with the option key, but the set of characters doesn’t include all Greek letters which is incredibly annoying: https://sites.psu.edu/symbolcodes/mac/codemac/
Also works in the Julia REPL, and jupyter notebooks. I'd hope that the tooling for any language which allows unicode variable names would support this.
Emacs has this (well, not a plugin, because it's built in). One can select an input mode. Some translate directly (e.g. a Greek input mode where typing a gives an alpha). There is also a TeX mode like what you describe, and another mode which is more concise (e.g. &a for alpha, where the & means Greek). The difference from what you described is that instead of some escape sequence for these special prefixes, they are just never converted if they don't match and you move the point away.
My tip: on recent macs, if you have two keyboard layouts, there's an easy option to make caps lock switch them. (Great so long as you got a mac with a real escape key too...)
> Fully 92.6% of working programmers have no idea how to produce a λ or γ on their input device of choice
I can imagine a simple remedy for this. Have the IDE suggest symbol replacements whenever the user types the name of the symbol: typing 'lambda' causes λ to be suggested.
On Windows you can simply type `Windows key` + `.` and easily find it there, though if languages use such characters they really should also have alternatives.
ASCII is typeable by everyone. I'm German. I know how to type äöüß; they are native keys on my keyboard. I know how to type \lambda in LaTeX. I don't know how to get the λ key. However, I can type ASCII. A Russian knows how to type Cyrillic letters and, while very likely also knowing their way around ASCII, probably can't type λ easily either. Like it or not, ASCII is the least common denominator. There is nothing gained by having your macro called λ instead of lambda. It's just a redundant gimmick that makes code harder to type.
> I have no idea why it hasn't been killed dead in new designs.
Well, there is the use case of someone who can't or doesn't want to learn/use English and wants to use their native language instead. These people definitely should be able to do that. For me personally, I've arrived at the conclusion that I do not want to code in projects in my native tongue. Simply because then a) it's only visible to a limited community, and b) I constantly have to translate English concepts/ideas into the native tongue or risk having a half-English/half-German mixup. Other people might arrive at other conclusions; that's their thing, and therefore they should be able to code in their native tongue and script.
This "native speakers" question is different from the question of using λ instead of lambda in an otherwise English codebase, or using Chebyshev's Cyrillic name vs an ASCII transliteration. And here my opinion is different: some people may benefit from being able to code in their native tongue, but nobody benefits from λ or from suddenly having a set of Cyrillic characters as a function name.
If you check the lib.rs of the project, you can see that it opts in to nightly features. Naming anything λ is not possible in stable Rust right now. This will change, though, with a newly merged RFC [1]. Fortunately, great care has been put into preventing homoglyph and mixed-script attacks. My biggest issue with the change is that it's not opt-in; some future compiler release will accept non-ASCII idents by default. It's easy to see why they arrived at this outcome: the entire thread is full of political ideology. According to the people in the thread, it might make someone feel excluded if they had to put #![allow(non_ascii_idents)] into lib.rs or an analogue into Cargo.toml... Seriously...
The keywords etc. are all in English anyway; an additional #![allow(non_ascii_idents)] can't be such a big issue, can it?
I hope that, like in Java, non-ASCII idents in mostly-English codebases will get rejected by the Rust community as bad coding style. Fortunately there will at least be an option to forbid non-ASCII identifiers without needing additional tooling, and I guess I'll enable it in all of my own codebases.
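Concretely, I'd expect that opt-out to be a one-liner per crate, something like this (assuming the lint keeps the name discussed above):

```rust
// Turn any non-ASCII identifier in this crate into a hard error.
#![forbid(non_ascii_idents)]
```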
You don't think λ-calculus deserves the symbol `λ!` in Rust? Anyone who is learning or interested in λ-calculus should be aware of what it means (that's the whole point).
I generally agree we should all be so fortunate as to be able to read the code put in front of us.
And you clearly know how to type a λ, you did it 6 times in your post.
Your computer has better tools that don't require you to google it. Even the character picker is better, since it doesn't require loading a webpage (and some operating systems have better character picker programs than others). Emacs lets you directly search for a character by name; perhaps your favorite editor has a similar capability. If it doesn't, perhaps you could improve your editor.
Amazing, 46 comments, all about my choosing to use the correct symbol for the macro. It's not like I chose some super misleading Unicode characters, like, for example, how macOS smart quotes might change "foo" into “foo”.
Again, technically correct... maybe programming languages should match these kinds of quotes too.
But this is all more or less beside the point. I can type `λ!{x.x}` easily, since I have a Greek keyboard on my system (and a λ on my wrist, and a cat named π).
It's even less of a big deal since I've included the equivalent `abs!{x.e}` and `app!(e1,e2)` macros. In fact `λ!` and `γ!` are just macro "aliases".
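For the curious, an alias like that is just a forwarding macro; roughly this, though the crate's exact definition may differ (and it needs non-ASCII identifiers, so nightly for now):

```rust
// λ! simply forwards whatever tokens it receives to abs!.
#[macro_export]
macro_rules! λ {
    ($($body:tt)*) => { $crate::abs! { $($body)* } };
}
```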
Also, given the prominence of α-renaming, β-reduction, and η-reduction, I'm very glad I can use these symbols in Rust.
Very cool! I think you've inspired me to implement something like `Expression << Expression` much like your `$`.
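A first stab at what I'm thinking, with a minimal stand-in type (the real Expression is richer):

```rust
use std::ops::Shl;

// Minimal stand-in for the crate's Expression, just to sketch the operator.
#[derive(Clone, Debug)]
enum Expression {
    Var(String),
    App(Box<Expression>, Box<Expression>),
}

// `e1 << e2` just builds an application node.
impl Shl for Expression {
    type Output = Expression;
    fn shl(self, rhs: Expression) -> Expression {
        Expression::App(Box::new(self), Box::new(rhs))
    }
}

fn main() {
    let f = Expression::Var("f".into());
    let x = Expression::Var("x".into());
    println!("{:?}", f << x); // App(Var("f"), Var("x"))
}
```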
I'm currently trying to determine the best way to implement `From<Expression> for u64` (or maybe `Option<u64>`...), so I can convert both to and from `u64` types as Church-encoded numerals. Eventually the goal is to `impl Into/From<Expression>` for all the types one might use, giving a horribly inefficient runtime for Rust ;)
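The Church-numeral direction is roughly this (again with a stand-in enum for the sketch; the crate's real variants are shaped a bit differently):

```rust
// Minimal stand-in type for the sketch.
#[derive(Clone, Debug)]
enum Expression {
    Var(String),
    Abs(String, Box<Expression>),
    App(Box<Expression>, Box<Expression>),
}

// n becomes the Church numeral λf.λx. f (f (… (f x))), with n applications of f.
impl From<u64> for Expression {
    fn from(n: u64) -> Expression {
        let mut body = Expression::Var("x".into());
        for _ in 0..n {
            body = Expression::App(Box::new(Expression::Var("f".into())), Box::new(body));
        }
        Expression::Abs("f".into(), Box::new(Expression::Abs("x".into(), Box::new(body))))
    }
}

fn main() {
    println!("{:?}", Expression::from(2)); // λf.λx. f (f x) as nested Abs/App nodes
}
```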
It currently does both with and without η-reduction, and generally should work similarly to http://lambda.jimpryor.net/code/lambda_evaluator/, a very cool online JS lambda evaluator.