
Rules for Writing Clojure Programs - fogus
http://twoguysarguing.wordpress.com/2010/07/26/7-rules-for-writing-clojure-programs/
======
Raphael_Amiard
My naive 2 minutes solution :

    
    
        (reduce #(assoc %1 %2 (inc (get %1 %2 0)))
                {}
                "abcaaabcccaabrrcb")
    

EDIT: Besides being shorter and (in my opinion) a lot clearer than the OP's
solution, it's also consistently faster. Functional code and composition are
cool, but I think the author got trapped into making his code beautiful, or
maybe just into the wrong algorithm for the job. It has happened to me before
in Clojure and other functional languages.

~~~
Lozzer
Or as mentioned in the comments

    
    
        (frequencies "abcaaabcccaabrrcb")
    

with some optional sorting if that's required.
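For readers more at home in Python, Counter in the standard library is a close analogue of Clojure's frequencies (the sorting step here is my addition, mirroring the "optional sorting" above):

```python
from collections import Counter

# count occurrences of each character, like Clojure's frequencies
counts = Counter("abcaaabcccaabrrcb")

# optional sorting: (character, count) pairs, most frequent first
ranked = counts.most_common()
```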

------
KirinDave
I and many other experienced people on #clojure@freenode looked at this.

Heard comments such as:

    
    
        “Huh.”
        “I can't believe -> and ->> are being demonized now.”
        “I think if you're calling things rules you should be somewhat careful about how you phrase them...”
        “Someone is wrong on the internet.”

~~~
sqrt17
The real reason why you want ->> can be seen in the code example for rule #7:
instead of mapping the result of partitioning, it (wrongly) reuses a (i.e.,
the original list).

If bugs result in irregularity, then making regularity explicit (by using ->>
or other macros/higher-order functions) can be a good thing.
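To make the point concrete: the article's rule #7 snippet names each step a, b, c and then accidentally maps over a instead of b, a mistake a threading pipeline makes impossible because there are no intermediate names to misuse. A rough Python sketch of the same idea, using a hypothetical thread_last helper standing in for ->>:

```python
from functools import reduce
from itertools import groupby

def thread_last(value, *fns):
    # hypothetical helper mimicking Clojure's ->> macro:
    # each step receives the previous step's result, so no step
    # can accidentally reach back to an earlier intermediate
    return reduce(lambda acc, fn: fn(acc), fns, value)

result = thread_last(
    "abcaaabcccaabrrcb",
    sorted,                                         # (sort s)
    lambda s: groupby(s),                           # (partition-by identity ...)
    lambda gs: [(k, len(list(g))) for k, g in gs],  # (map (juxt first count) ...)
    lambda ps: sorted(ps, key=lambda p: p[1], reverse=True),  # (sort-by second > ...)
)
```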

~~~
KirinDave
Good catch!

That's a very good point, and a very hard bug to see when you "get lazy" and
use a, b, c as step names.

------
shasta
I was going to complain that parts of the article were disjointed and
contradictory (Use ->, don't use ->, JK use it) until I noticed the name of
the blog is "Two Guys Arguing."

Solution from the comments:

    
    
        (apply merge-with + (map (fn [c] {c 1}) "abcdaabccc"))
    

~~~
ataggart
The problem with the latter solution is that apply isn't lazy. The more
correct solution would be a simple reduce:

    
    
      (reduce #(update-in %1 [%2] (fnil inc 0)) {} "abcdaabccc")

~~~
runevault
I wonder if update-in or merge-with would be more efficient.

Another thing to consider, if you want to get fancy:

    
    
        ;; transients don't support update-in, so use assoc! directly
        (persistent!
         (reduce #(assoc! %1 %2 (inc (get %1 %2 0)))
                 (transient {})
                 "abcdaabccc"))
    

Still no REPL, but this should work and take advantage of transients if you're
doing it against a bigger dataset (which I assume you care about, since you
also worried about laziness).
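For comparison, the transient-then-persistent! pattern corresponds to plain local mutation in Python: build a dict inside the loop, then hand back the finished value. A rough sketch (the frequencies name here is my own, echoing the Clojure function discussed elsewhere in the thread):

```python
def frequencies(coll):
    # counts plays the role of (transient {}): mutated freely in local scope
    counts = {}
    for x in coll:  # the reduce over the collection
        counts[x] = counts.get(x, 0) + 1  # like (fnil inc 0): missing keys default to 0
    return counts  # "persistent!": only the finished map escapes the function
```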

------
praptak
A tradeoff to be aware of: the introduction of sort raises the complexity of
the solution from O(n) to O(n log n).

~~~
aphyr
I saw that and flipped out. It also requires an extra two scans through the
list, in addition to the memory overhead for the temporary structures.
Happily, the merge-with suggestion is much more efficient.

------
d0m
Don't take this the wrong way, but if it takes all _this_ for a simple
theoretical problem, imagine a real-world problem. In my opinion, a pure
iterative approach is really simpler:

(pseudo-code) :

    
    
        myset = {}
    
        foreach c in mystring:
            if myset[c]:
                myset[c] += 1
            else:
                myset[c] = 1
    
        myset.sort(fn x, y: myset[x] < myset[y])
    

As I said, it's all about opinion (and I know there are a million ways to
simplify that or make it different, i.e. myset.get() or whatever), but I find
the imperative approach way easier to grasp at first glance.

Compare that to:

    
    
        (let [a (sort str)
              b (partition-by identity a)
              c (map (juxt first count) a)]
          (sort-by second > c))
    

First, I don't really like a, b, c as variable names (to be fair, myset and
mystring are terrible too). Second, I find there are so many extra steps to
solve this simple problem: we need partition-by, we need identity, we need
juxt (wtf is juxt?). Third, wtf are first, count and second..?

I know higher-order functions are kewl, and working with lists is even cooler,
and using all of this in 4 lines in a pretty clean way is again even cool-
cooler. However, when I read this, even though it has fewer lines than the
imperative way, it still takes more time to understand what the code is doing.

~~~
mquander
Sorry, but I think we'll have to agree that this just depends on background. I
can figure out what the second one is doing significantly quicker than the first.

When I think about this problem, my mind immediately thinks "well, I want to
take all the characters, group them into buckets, one per letter, and then
count how many are in each bucket." I don't think "I want to make an empty
map, starting with each character mapping to zero, iterate over every
character and increment an associated counter in the map, and then read out
the counters at the end." If I have to write the second -- because my language
has bad list libraries or no higher-order functions -- I have to "translate"
the first into the second in my head.
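The "group into buckets, then count each bucket" reading translates almost word for word, e.g. in Python:

```python
from itertools import groupby

s = "abcaaabcccaabrrcb"

# one bucket per letter (groupby needs sorted input to bucket globally)
buckets = {k: list(g) for k, g in groupby(sorted(s))}

# count how many are in each bucket
counts = {k: len(v) for k, v in buckets.items()}
```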

The Clojure code expresses the computation in exactly the same way it lives in
my head. The imperative code expresses that like a computer does it, and to
even understand it I have to "step through" it in my head a little bit to make
sure the computer is doing the right thing.

------
bitsai
A friend and I are working through Project Euler as part of our effort to
learn Clojure. My code is littered with the pattern of (map #(list % (f %))
coll) all over the place... Very glad that I read this article (and the
comments), and I think it's time for me to start exploring the clojure and
clojure-contrib APIs in earnest.

------
herdrick
(frequencies "abcaaabcccaabrrcb")

~~~
Avshalom
as a note, the source is:

    
    
      (defn frequencies
        "Returns a map from distinct items in coll to the number of times they appear."
        {:added "1.2"}
        [coll]
        (persistent!
         (reduce (fn [counts x]
                   (assoc! counts x (inc (get counts x 0))))
                 (transient {}) coll)))
    
    

which I think is pretty much just Raphael's solution.

------
WilliamLP
Can someone convince me what is gained by stressing out over making an easy
solution to a trivial problem concise and elegant like this? You have a
perfectly good imperative solution which is the way almost any child would do
it (go through things one by one and keep counts of every bucket), so what is
gained by messing with that? Is it not time to move on to the rest of the
program?

~~~
derefr
You're thinking like someone who most often _writes_ code. Think like someone
who most often _reads_ code—a maintenance programmer, say—and the benefits
become obvious. We get away in this industry with writing things that, upon
being encountered again, immediately provoke "throw it away and start again"
comments. This sort of code will not provoke those sentiments.

(That's not to say that it's actually good code—the article itself points out
the problems—but those problems can actually be fixed by _modification_ , not
_replacement_ , of the code, which is what we should all be striving for.)

~~~
WilliamLP
Are you really going to defend that this isn't trivial to read:

    
    
        str = "abcaaabcccaabrrcb"
    
        dict = {}
        for ch in str:
            if ch not in dict.keys():
                dict[ch] = 0
            dict[ch] = dict[ch] + 1
    
        sortedItems = sorted(dict.items(), key=lambda x: x[1], reverse=True)

~~~
sreque
No need to pick on poor python here :). With the pysistence library, you can
write code that mirrors other solutions presented in this thread:

    
    
        reduce(lambda d, c: d.using(**{c: d.get(c, 0) + 1}),
               "abcaaabcccaabrrcb",
               make_dict())
    

I also couldn't figure out how to do keyword arguments in Python where the key
is a variable, not a constant, so if any Python expert knows how, you could
remove 4 characters from the above solution! :)

~~~
WilliamLP
Yes, very cute and concise, and you know Python and I don't. :) My point is to
ask why devote yourself to "rules" and optimizing the conciseness of a trivial
problem here. I don't think "readability" is a defense, because I believe my
program (not being a Python programmer) is nearly completely free of guile,
still fairly concise in terms of symbols, and instantly readable even to a new
programmer.

So where's the payoff?

~~~
gruseom
It's interesting and revealing that you're not getting a good answer to your
question. In these types of conversations, most of the back-and-forth is
really just name-calling -- where the names are things like "readability"
instead of "poopoohead" -- on top of statements of personal preference. It's
easy for us to debate _reduce_ vs. _merge-with_ vs. _foreach_ , but difficult
to have a meaningful conversation about _why_ we prefer what we prefer. I
happen to have a few minutes between finishing my coffee and getting on with
things, so here are some reflections.

A lot of people who are into FP simply find the concise style more satisfying.
That's not regarded as a legitimate reason for preferring one choice over
another, so this kind of discussion often hits a what-you-can't-say dead end.
We've all seen too-clever code that was just awful, and we all keep saying
that cleverness for its own sake is bad, and that position dominates the
discussion in a heavyhanded way right now. Perhaps that's because it's
necessary to keep the pseudo-clever crowd in check. I doubt it, though,
because nothing keeps them in check.

Like it or not, the truth is that one factor driving what programmers do _is_
intellectual satisfaction. This has always been and will always be the case,
no matter how many schoolmistresses (most of them, of course, programmers
trying to master their own temptations) beat us with the cane of "business
value is the only thing that matters" and "no one cares about your stupid
clever code".

(I'm by no means arguing that the impulse to make things "cute and concise"
ought to take precedence or have free rein, but rather that this factor needs
to have a seat at the table before we have a chance of working out a balanced
approach.)

The mathematicians have one over on us here. In their world too, the most
important thing is getting the job done, but elegance is a close runner-up:
it's esteemed for its own sake, and more importantly it's well understood that
there's an intimate dialectic _between_ elegance and getting the job done. If
we started thinking that way, we might finally make some progress out of these
language game dead ends.

Switching gears to the technical matter of what the concise functional
approach buys you (apart from elegance, for those who find it elegant), the
trouble is that there's no way to answer this question at the level of a code
snippet because _all_ approaches yield reasonable snippets in intelligent
hands. The real question is: what are the impacts of these varying approaches
on building whole systems? That question, as far as I can tell, remains
completely unaddressed! Except for the inevitable platitudes about side-effect
free code being good for concurrency and so on. Yet that is the question that
matters, and our code snippet debates are just the drunk looking for his car
keys under the street lamp, even though he didn't drop them there, because the
light's better.

Unless you believe that the choice of programming language is irrelevant
(which is a little like saying an artist's medium is irrelevant), it's clear
that different languages have different classes of programs that are easy to
write in that language, and thus different classes of programs that will tend
to evolve in it. Each version of an evolving program conditions the next
version, so this influence gets compounded over time. (Code snippets are at
the rudimentary extreme of that process, which is why they have so little
comparative value; what we really need to see is how those snippets evolve
into much larger systems.) The language itself and the current version of the
program written in it condition the very stream of ideas about what you might
do next. So there's a profound influence here, but we don't know how to study
it. Failing that, the best approach is to let a thousand flowers bloom and see
what systems emerge, but even there historical accident plays a huge role in
where the development energy goes and thus what systems ever have a chance to
emerge.

Personally I find that the imperative style is easiest in the small and the
functional style is easiest in the large, and the trick is to find a design
that allows the two to be fused with some regularity. But then I work in
Common Lisp so I _would_ say that.

Excuse the length, no time to make shorter.

~~~
WilliamLP
Thank you for the post, it was interesting and made me think. I especially
liked this point: "but rather that this factor needs to have a seat at the
table before we have a chance of working out a balanced approach."

I agree that I would like to see how functional approaches can benefit large
systems, in concrete terms, since almost all of the examples I've ever seen
involve micro-code like this article. I'm especially skeptical that it can
make a dramatic difference for the kind of messy business code that today is
usually solved by large object-oriented systems and modular libraries.

