The problem with it, I've found, is the lack of debuggability. You can enumerate a hashtable, and you can inspect it in the debugger; you generally can't enumerate a function. I've found that with complex systems, the strongest correlate of programming productivity is the amount of debug information available, not the size of the code. Trading off debuggability to get short code is usually a poor trade-off once you get past a certain code size.
I don't think this point is made often enough. Brevity is a virtue, certainly, but ultimately what counts is not LOC, bug counts, iterations/sec, etc., but features delivered to users. "Worse is better" reigns supreme here, which is why you so often see engineering and design disasters like PHP, Craigslist, MySpace, and Windows brushing aside far more elegant but less engaging alternatives. For most of us, the pride we take in our craft makes this a very hard pill to swallow, but swallow it we must.
Regarding Arc, there's still a distinction, since as you point out you can't enumerate the domain of a function like you can the keys of a hashtable. (And you also "call" a hashtable in Clojure.)
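To make the distinction concrete, here's a small Clojure sketch (the data is made up):

  ;; A hash-map can be both called and enumerated:
  (def word-counts {"the" 120, "cat" 3, "sat" 1})

  (word-counts "cat")   ;=> 3
  (keys word-counts)    ;=> ("the" "cat" "sat")

  ;; A memoized function looks the same at the call site...
  (def word-counts-fn
    (memoize (fn [w] (get {"the" 120, "cat" 3, "sat" 1} w))))

  (word-counts-fn "cat") ;=> 3
  ;; ...but there is no way to ask it for its "keys", and a debugger
  ;; can't show you what it has cached so far.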
No, it's really not. Brevity != clarity. Clarity is hard to argue with; brevity is not. By his own admission, this new code is harder to write and harder to debug.
It's interesting to me that in the article the author dismisses any performance impact of his changes because "we can profile and deal with it later", but fails to realize that by trying to reduce the length of his code he risks creating exactly the same kind of needlessly hard-to-read, hard-to-debug code that premature optimization produces.
Of course if other people are reading your code then you should consider what current programming practice is. But the current practice in Clojure is, for cultural reasons, to make things really succinct. So you can assume those reading your code are ok with that.
Up to a point, that's true, but if you start shortening words by removing letters, you've gone too far.
Concise programs often appear laughably unreadable to people who are used to a definition of "clarity" that derives from more verbose languages. At one time they appeared that way to me too. But with time spent working in less verbose languages, to my surprise I found myself wanting to use shorter and shorter names, not to interfere with clarity but to enhance it.
It takes less effort to read a word than to trip over an unfamiliar abbreviation, so while the code may be more concise, it is less clear. If your code relies on the reader knowing a bunch of specific idioms, then it might be concise, but it isn't clear; it's jargon.
Nor are all words arbitrary collections of letters: most words have some form of alternating consonant-vowel structure that makes them easy to speak and remember. Write a program that arbitrarily combines letters and I doubt you'll get much output that looks like words; there are lots of patterns in how we create words, and they aren't arbitrary. If you can't read your code aloud and sound reasonable, then you've probably abbreviated your identifiers inappropriately.
Map is by far a better name than mp; there is no shortage of vowels; map is already a short identifier and a real word.
One doesn't just sit and speed-read math in a single pass. Nor poetry. Good programs have that kind of density and deserve the same consideration; they are not newspaper columns. The trouble with the shallow notion of readability that says "I should be able to random-access any line of code and understand it right away" is that it results in less intelligible whole programs, which is a net loss of readability and of other things as well.
I wish more good programmers understood this. The bad ones we can just write off, but too many good ones have been schooled to focus too heavily on the individual line of code. If you have a million lines of code, it doesn't much matter how readable an individual line is; nobody's going to be able to read the whole thing. What we should be striving for is to produce an equally functional program with orders of magnitude less code (which would automatically be a more functional program). This requires a quite different notion of readability.
Math notation is horrendous, by the way; too many implicit assumptions about the reader's background knowledge. See what Gerry Sussman has to say about it. The great thing about code is that it forces one to be explicit, which forces one to actually understand and learn.
Reducing the size of a program by an order of magnitude requires better abstractions, not shorter selector names.
This is heresy, but I actually think it might be better if it weren't a real word. I find I benefit from inventing names for functions or data abstractions. It helps to clear the mind of what those things might be.
Plus making map into mp or whatever makes it much more googleable. And don't get me started on Clojure having map and a Map.
By the way, there is a variant of "googleable" that I find to be a valuable property for names in a codebase, and that is "greppable". I try to make sure that the name of any important concept is a unique string in the program, so a simple grep will yield all the places it occurs. I'll even rename things that aren't as important, if their name includes this string, to preserve said uniqueness. (And no, the 'find usage' feature of IDEs doesn't come close to satisfying this need, since a concept can occur in many ways that don't yield to code analysis, such as comments.)
I'm just saying that I prefer having that hook, and that destructuring assignment lets me do so with brevity and clarity.
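For example, in Clojure you can have both at once: :keys gives you the short bindings and :as keeps a name bound to the whole result, which is one way to keep that hook (lookup-user below is just a made-up function):

  ;; Hypothetical lookup function, only for illustration.
  (defn lookup-user [id]
    {:id id :name "Ada" :email "ada@example.com"})

  ;; :keys pulls out the fields concisely, while :as keeps the whole map
  ;; in scope so it stays easy to inspect or log.
  (let [{:keys [name email] :as user} (lookup-user 42)]
    (println "full result:" user)
    (println "using parts:" name email))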
There's some low-hanging fruit for cutting down the required storage, namely wrapping to-words and frequencies together and memoizing that. This doesn't reduce the number of hashtables, but at least there's no need to store the entire word lists. As it stands, this will simply fail if your corpus is too large to fit in RAM all at once.
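Roughly like this, assuming the article's to-words just splits a document into words (the stand-in below does):

  ;; Stand-in for the article's to-words.
  (defn to-words [doc]
    (re-seq #"\w+" doc))

  ;; Memoize the composition, keyed per document: only the frequency map
  ;; is retained, not the intermediate word list.
  (def word-freqs
    (memoize (fn [doc] (frequencies (to-words doc)))))

  (word-freqs "the cat sat on the mat")
  ;=> {"the" 2, "cat" 1, "sat" 1, "on" 1, "mat" 1}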
A limited-use but handy technique, and they're not arguing for dogmatic adherence to the rule. Something to keep in mind, but don't go rewiring old code unless you're bored. You could also code the function so that if you pass no key args, you get the whole hash back.
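Something along these lines, sketched in Clojure with made-up names:

  (defn doc-stats
    "With keys, return just those values; with no keys, return the whole map."
    [doc & ks]
    (let [stats {:words (count (re-seq #"\w+" doc))
                 :chars (count doc)}]
      (if (seq ks)
        (map stats ks)   ; a map is callable, so this looks up each requested key
        stats)))

  (doc-stats "the cat sat")                ;=> {:words 3, :chars 11}
  (doc-stats "the cat sat" :words :chars)  ;=> (3 11)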
I'm personally a fan (sometimes) of code like this when I really want to one-line something but need multiple values:
value1, value2 = [(x=aFunction(arg))[:key1],x[:key2]]
Honestly it's this kind of thing that makes me wonder if the Pythonistas don't have a point about clarity vs cleverness.
x = aFunction(arg)
value1, value2 = x[:key1], x[:key2]
value1, value2 = aFunction(args).values_at(:key1, :key2)
If LOC was so important for readability, we could just write in C without using any newlines.
A rough gauge for better code might be how much a skilled coder can grok by glancing at N lines of code. This means that reducing LOC can be helpful, but it's not a 1:1 correspondence — your example of C with no newlines would make it harder to read. On the other hand, putting what is essentially a hash destructuring on one line actually does tighten up the semantics.
edit: my most-epic-single-line is a range definition, and I actually use this one as-is with a description of how it works:
[((left=prefix*10*scale-1)==-1 ? (scale==0 ? prefix-1 : prefix) : left)..((right=prefix*10*scale+(10*scale-1))==-1 ? prefix-1 : (scale > 0 ? right-1 : right))]
So, let's say we have two documents and we want to calculate their distance. Each document has 2,000 words. We can either scan each word in each document once, counting them in memory and processing a total of 4,000 words, or we can start with a word in one document, scan both documents for all occurrences of that word, move on to the next word, scan both documents again for that one, and so on, for roughly N^2 word comparisons. Given a choice between a really concise function running in quadratic time, or a longer, uglier function running in linear time ... I'd kind of prefer the latter.
Am I way off the mark here? Is there something I misunderstood?
No. You can just pass in the appropriate, memoized frequency function to call. In the new code, that just means that "freq" would be an additional argument to euclidean, instead of a globally defined function.
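Sketched in Clojure (the article's euclidean may be shaped a little differently; this is just the idea):

  ;; freq is passed in, so the caller decides whether and how it's memoized.
  (defn euclidean [freq doc-a doc-b]
    (let [fa (freq doc-a)
          fb (freq doc-b)
          ws (into (set (keys fa)) (keys fb))]
      (Math/sqrt
       (reduce + (for [w ws]
                   (let [d (- (get fa w 0) (get fb w 0))]
                     (* d d)))))))

  ;; Caller supplies a memoized frequency function:
  (def word-freqs
    (memoize (fn [doc] (frequencies (re-seq #"\w+" doc)))))

  (euclidean word-freqs "the cat sat" "the cat ran")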