
Writing Friendlier Clojure - whalesalad
http://adambard.com/blog/write-friendlier-clojure/
======
diego
Nice post. As the author of the original snippet I'll admit it wasn't meant to
be production-quality code, just a toy example in the context of a blog post.
When I find code like that in a reasonably large production codebase I tend to
refactor it too. I do believe in optimizing for readability by default; in
Clojure this sometimes requires a bit of effort because of the "stalagmite"
effect OP describes.

------
mraison
Nice post! What about the following (untested) version of the sentence
function:

    
    
        (defn sentence [mapping]
            (->> :start
                 (iterate #(pick-next-word mapping %))
                 next
                 (take-until #(= (last %) \.))
                 (clojure.string/join " ")))
    

There is no loop, and it almost reads like English: "Using :start as a
departure point, iteratively pick the next words until you find one that ends
with a dot. Join all the words generated so far with spaces."

Small catch: take-until is not yet in clojure.core, however it's easy to add
to your utility toolbelt and there's already an issue open for it
[http://dev.clojure.org/jira/browse/CLJ-1451](http://dev.clojure.org/jira/browse/CLJ-1451)

~~~
adambard
Iterate was what I was looking for! That version is definitely better, even if
you have to DIY take-until.

I, er, may have spent a while clicking around the related docs for loop and
recur searching for a better function. I remembered there was something like
this out there, but I never did find it.

~~~
mjcohen
So many programming languages are like magic: you have to know the name of
something before you can control it.

------
lkrubner
In terms of cleanly communicating one's intent, I'd also suggest a look at
Sean Johnson's talk about letting the function arguments determine what
version of the function is used. This is a common idea in most programming
languages and some Functional languages, such as Haskell, tend to take it to
an extreme. And yet, it is not as common as it should be in Clojure.

One of his rules is "If your function starts with a conditional, get rid of
the conditional and look for that pattern in your function arguments." That
is, move the conditional into the different versions of your function.

Also, his "Recursive Function Pattern Matching Pattern" makes recursion much
easier to read. See the screenshots here and then click through to the video:

[http://www.smashcompany.com/technology/the-recursive-
functio...](http://www.smashcompany.com/technology/the-recursive-function-
pattern)

------
Wonnk13
I really enjoy Adam's blog. Very plain english explanations for everything.
This and Clojure for Brave and True make the language much more approachable.

------
papaf
I know that writing code comments isn't in fashion at the moment, but both the
original and the refactored code would have benefited from comments.

------
thom
For me, this refactoring doesn't solve anything of note, and it introduces
risks of its own. The function make-markov-mapping returns a weird
intermediate data structure (a lazy sequence of tiny maps). You don't want
that getting out into your codebase, which wasn't a risk in the original.
We've also further embedded the string handling, which means when it comes to
refactor away from just strings, we have more work to do and more places to do
it. Also, just in terms of readability which is of course slightly better,
we've still got things like nested threading macros which are a big cognitive
overhead.

Ultimately, we're revealing no new insight about the underlying problem and
offering no new capabilities for the future.

~~~
thom
So, worried that I was being overly-negative, I thought I'd offer something of
substance. If you wanted to separate the Markov bits from the string bits, you
could do it thusly:

    
    
        (defn markov-chain [coll]
          (let [adjacencies (partition 2 1 coll)
    	    initial-chain {::start [(first coll)]
    			   (last coll) [::end]}]
    	(reduce 
    	 (fn [chain [predecessor successor]]
    	   (assoc chain predecessor (conj (get chain predecessor []) successor)))
    	 initial-chain
    	 adjacencies)))
    
        (defn successor [chain item]
          (rand-nth (get chain item)))
    
        (defn follow-chain [chain]
          (->> ::start
    	   (iterate (partial successor chain))
    	   (remove #{::start ::end})
    	   (take-while some?)))
    
        (defn sentences [text]
          (clojure.string/split text #"\."))
    
        (defn words [sentence]
          (remove empty? (clojure.string/split sentence #"\s+")))
    
        (defn markov-text [text]
          (->> text
    	   sentences
    	   (map words)
    	   (map markov-chain)
    	   (apply merge-with into)))
    
        (defn render-sentence [words]
          (str (clojure.string/join " " words) \.))
    
        (defn sentence [chain]
          (render-sentence (follow-chain chain)))
    

I'm not saying this is in any way strictly better, but it leaves you with
abstractions that can be reused, and text-specific stuff in one place that
could be a separate namespace. This is what I'd look for in a code-review of a
refactoring like this - not just neatness, but making abstractions explicit
and separate, naming as many things as possible, etc etc.

------
kazinator
TXR Lisp:

    
    
      (defun wordlist-sentence (sentence)
        [(opip
           `@1.`
           (split-str @1 #/\s+/)
           (cons :start)
           (remqual ""))
         sentence])
      
      (defun make-markov-mapping (sentence)
        (let ((wordlist (wordlist-sentence sentence)))
          (hash-from-pairs (mapcar (ret ^(,@1 (,@2))) wordlist (rest wordlist))
                           :equal-based)))
      
      (defun markov-data (text)
        [(opip
           (split-str @1 #/\./)
           (mapcar make-markov-mapping)
           (reduce-left (op hash-uni @1 @2 append) @1 (hash :equal-based)))
         text])
      
      (defun pick-next-word (map word)
        (let ((choices [map word]))
          [choices (rand (length choices))]))
      
      (defun sentence (map)
        (for
          ((words ())
           (this-word :start)
           (next-word "X"))
          ((not (eql [next-word -1] #\.))
           (cat-str words " "))
          ((set next-word (pick-next-word map this-word))
           (set words ^(,*words ,next-word))
           (set this-word next-word))))
    
    

Run:

    
    
      $ txr -i markov.tl
      1> (defvar map (markov-data
        "Influence in society, however, is a capital which has to be economized \
        \ if it is to last. Prince Vasili knew this, and having once realized that \
    
        \ if he asked on behalf of all who begged of him, he would soon be unable \
        \ to ask for himself, he became chary of using his influence. But in \
        \ Princess Drubetskaya's case he felt, after her second appeal, something \
        \ like qualms of conscience. She had reminded him of what was quite true; \
        \ he had been indebted to her father for the first steps in his career. \
        \ Moreover, he could see by her manners that she was one of those women-- \
        \ mostly mothers--who, having once made up their minds, will not rest \
        \ until they have gained their end, and are prepared if necessary to go on \
    
        \ insisting day after day and hour after hour, and even to make scenes. \
        \ This last consideration moved him."))
      map
      2> (sentence map)
      "Prince Vasili knew this, and having once realized that she was quite true; he
     could see by her manners that if it is to ask for himself, he felt, after hour,
     and even to last."

~~~
kazinator
sentence function based on giterate instead of for loop.

    
    
      (defun sentence (map)
        (let* ((words (cdr [giterate true (op pick-next-word map) :start]))
               (dot (pos-if (op eq [@1 -1] #\.) words)))
          (cat-str [words 0..(succ dot)] " ")))
    

Requires small weak to pick-next-word

    
    
      (defun pick-next-word (map word)
        (let ((choices [map word]))
          (if choices [choices (rand (length choices))])))
    

giterate returns an infinite list, but we take from it up to the word which
ends with dot, ignoring the endless stream of nil objects which follows.

Alternative: partition the list on the ending word, then glue the first two
partitions:

    
    
      (defun sentence (map)
        (let* ((words (cdr [giterate true (op pick-next-word map) :start]))
               (part (partition-by (op eq [@1 -1] #\.) words)))
          (cat-str (append (first part) (second part)) " ")))

------
moomin
The implementation of "sentence" highlights that Clojure really needs an
"interate" function with built in termination. Because no-one uses iterate...

~~~
kazinator
I cannot quite agree.

The problem here is that termination is indicated by a word which ends in a
period, and that word must be included in the word list. This is a poor form
of "in-band signaling".

Kludgy sequence operators which take or produce one extra item which _doesn
't_ satisfy the inclusion predicate can be avoided by proper design of the
data representation which avoids this type of framing.

TXR Lisp version fixed to use the t symbol for start, nil for end, and
straight giterate while true. (I.e. while the sequence element isn't nil):

    
    
      (defun wordlist-sentence (sentence)
        [(opip
           `@1.`
           (split-str @1 #/\s+/)
           ^(t ,*@1 nil)          ;; list starts with t, ends with nil
           (remqual ""))          ;; blanks not permitted
         sentence])
    
      (defun make-markov-mapping (sentence)
        (let ((wordlist (wordlist-sentence sentence)))
          (hash-from-pairs (mapcar (ret ^(,@1 (,@2))) wordlist (rest wordlist))
                           :equal-based)))
    
      (defun markov-data (text)
        [(opip
           (split-str @1 #/\./)
           (mapcar make-markov-mapping)
           (reduce-left (op hash-uni @1 @2 append) @1 (hash :equal-based)))
         text])
    
      (defun pick-next-word (map word)
        (let ((choices [map word]))
            [choices (rand (length choices))]))
    
      (defun sentence (map)
        (let ((words (cdr [giterate true (op pick-next-word map) t])))
          (cat-str words " ")))
    

In this representation, the words which end with a period are still there, but
they have a next-word entry in the map: they all point to nil.

So we have a nice "spindle":

    
    
                      t
                      |
          +---------+-------+----+
          |         |       |    |
        various   start   ..... words 
          |         |       |    |
                     .
                     :
                     .
                     :
            |         |       |
         sentence.  end.   words.
            |         |      |
            +---------+------+
                      |
                     nil

~~~
moomin
It's not ideal, but there's plenty of stuff that looks like this in reality.

Personally, I favour Haskell's unfoldr
[http://hackage.haskell.org/package/base-4.8.1.0/docs/Data-
Li...](http://hackage.haskell.org/package/base-4.8.1.0/docs/Data-
List.html#v:unfoldr) which both doesn't require the output to be the same as
the intermediate values and cleanly supports termination.

(There's actually a very good reason Haskell cares about termination: it means
you can be precise about side-effects. With a classic lazy list, any side
effects caused by it are determined by how many items you actually evaluate.)

~~~
kazinator
I've looked at this before; I decided to implement it finally, calling it
expand-right. (I use the "reduce" terminology rather than "fold", so the
antonym has to be consistent).

With explicit lambdas, decrement from 5 to zero, not including it in list:

    
    
      1> (expand-right (lambda (item) (if (zerop item) nil (cons item (pred item)))) 5)
      (5 4 3 2 1)
    

Using functional combinators:

    
    
      2> [expand-right [iff zerop false [callf cons identity pred]] 5]
      (5 4 3 2 1)
    

Include zero; bit of a mouthful:

    
    
      1> [expand-right [iff null nilf [callf cons identity [iff zerop nilf pred]]] 5]
      (5 4 3 2 1 0)
    

Though it can do it, it doesn't look like a great interface when inclusion of
the last element is required.

Comparison with the inclusive iterate (which I did implement and called
"ginterate"\--thanks for the nice name:

    
    
      1> [ginterate [chain zerop not] pred 5]
      (5 4 3 2 1 0)

