
Haskell version of Norvig's spelling corrector - marcosero
http://marcosero.com/blog/norvig-haskell-spelling-corrector/
======
quchen
> I wrote this code putting brevity over readability, which is something I
> usually never do

Shouldn't the point of such a post be to show interesting code? I'm having
trouble reading through the densely packed source.

In addition to tromp's minor nitpick, I have several major ones.

\- the code is full of redundant parentheses. HLint can detect those (and many
other style errors) automatically. LPaste has HLint installed so you have a
linting pastebin available online.
[http://lpaste.net/116871](http://lpaste.net/116871)

\- A lot of the functions are written in a non-idiomatic way. "m >>= return .
f" is "fmap f m", and "(.)" can combine functions much more readably than
Lisp-style stacks of parentheses.

\- ByteString.Char8 is usually a wrong choice, more on that here:
[https://github.com/quchen/articles/blob/master/fbut.md#bytes...](https://github.com/quchen/articles/blob/master/fbut.md#bytestringchar8-is-bad)

\- If you count to "length x" then often there's a more elegant solution that
avoids calculating the length altogether. For example "splits xs = zip (inits
xs) (tails xs)".

\- Brevity is never better than readability.

\- No top-level definitions should lack a type signature. GHC even has
warnings for that (they are enabled by -Wall).

\- A function should do one thing and then be composed with other functions.
"lowerWords" converts to words and then maps them all to lower case, for
example. These are two completely different operations in one long line.

\- In order of increasing generality: foldr union empty = unions = mconcat =
fold

\- Use pattern matching, avoid "(!!)". transposes w = [ a ++ [b1,b0] ++ bs |
(a, b0:b1:bs) <- splits w] - also see
[https://github.com/quchen/articles/blob/master/fbut.md#head-...](https://github.com/quchen/articles/blob/master/fbut.md#head-tail-isjust-isnothing-fromjust-)

\- For large amounts of words that you split and concatenate again, String is
probably not the right type. Text is good for dealing with such things.

\- replaces w = [as ++ [c] ++ bs | (as, _:bs) <- splits w , c <- alphabet]

... and so on.
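Several of the points above (zip/inits/tails instead of counting to "length x", pattern matching instead of "(!!)", type signatures everywhere) can be pulled together in one place. This is only a sketch of how the edit functions might look, not the article's code: `deletes`, `inserts` and `edits1` are names assumed here to mirror Norvig's Python version, and plain String is kept for brevity even though Text is suggested above.

```haskell
import Data.List (inits, tails)
import qualified Data.Set as Set

-- Every way to cut a list in two, including the empty prefix and suffix.
splits :: [a] -> [([a], [a])]
splits xs = zip (inits xs) (tails xs)

alphabet :: String
alphabet = ['a' .. 'z']

-- Drop one character; the pattern (_ : bs) skips the split with an empty suffix.
deletes :: String -> [String]
deletes w = [as ++ bs | (as, _ : bs) <- splits w]

-- Swap two adjacent characters; only splits with at least two characters
-- left in the suffix match the pattern.
transposes :: String -> [String]
transposes w = [as ++ [b1, b0] ++ bs | (as, b0 : b1 : bs) <- splits w]

-- Replace one character with each letter of the alphabet.
replaces :: String -> [String]
replaces w = [as ++ [c] ++ bs | (as, _ : bs) <- splits w, c <- alphabet]

-- Insert each letter of the alphabet at every position.
inserts :: String -> [String]
inserts w = [as ++ [c] ++ bs | (as, bs) <- splits w, c <- alphabet]

-- All candidates at edit distance one, deduplicated through a Set.
edits1 :: String -> Set.Set String
edits1 w = Set.fromList (concat [deletes w, transposes w, replaces w, inserts w])
```

Loaded in GHCi, `transposes "abc"` gives `["bac","acb"]`. A failed pattern match in a list comprehension just skips that element, so no length checks are needed.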

~~~
marcosero
Hi, author here. I think you partially missed the main purpose of the article,
which for me was just having fun by playing with a language I'm currently
learning. I wasn't trying to teach anything to anyone.

But I must say, thanks for the great feedback! Lots of stuff I didn't know
that will make me write better Haskell code :)

~~~
flebron
Well, what you wrote is "The main reason I did it was to see what Haskell is
capable of compared to other languages such as Python." The problem is that
what you coded isn't what Haskell is capable of :)

------
tromp
Minor nitpick: the first real line of code

    
    
      alphabet = "abcdefghijklmnopqrstuvwxyz"
    

is better written as

    
    
      alphabet = ['a'..'z']
    

This is really syntactic sugar for

    
    
      enumFromTo 'a' 'z'
    

using the function

    
    
      enumFromTo :: Enum a => a -> a -> [a]
    

from the typeclass Enum for enumerable types, and the fact that a string (type
String) is just a list of characters (type [Char]).
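A quick way to convince yourself of the desugaring is to compare the three spellings directly. The names below are made up for this illustration only.

```haskell
-- The literal from the article.
alphabetLiteral :: String
alphabetLiteral = "abcdefghijklmnopqrstuvwxyz"

-- The range syntax, typed as [Char] to show that String is the same type.
alphabetRange :: [Char]
alphabetRange = ['a' .. 'z']

-- What the range syntax stands for.
alphabetEnum :: String
alphabetEnum = enumFromTo 'a' 'z'

-- True: all three denote the same list of 26 characters.
allEqual :: Bool
allEqual = alphabetLiteral == alphabetRange && alphabetRange == alphabetEnum
```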

~~~
evincarofautumn
As long as we’re picking nits…

> I wrote this code putting _brevity over readability_

Overall, this is not particularly terse, for Haskell code. With all the
lambdas, it looks like OCaml! For example, these are equivalent, and I find
the latter clearer:

    
    
        (sortBy (\(_,c1)(_,c2) -> c2 `compare` c1))
    
        sortBy (flip (comparing snd))
    

Now, it’s not necessarily a bad thing to be explicit, but in cases such as
these, it’s less repetitious to just use the standard library functions.

~~~
quchen
For reverse sorting, there's a type that does specifically that.

    
    
        sortBy (comparing (Down . snd))
    

See [http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-Or...](http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-Ord.html#t:Down)
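For comparison, here are the three spellings from this subthread side by side. The `Candidate` type, the function names and the sample data are assumptions made up for the sketch; only the three `sortBy` expressions come from the comments.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- A word paired with its count, as produced by M.toList on the frequency map.
type Candidate = (String, Int)

byLambda, byFlip, byDown :: [Candidate] -> [Candidate]
-- The explicit lambda from the article.
byLambda = sortBy (\(_, c1) (_, c2) -> c2 `compare` c1)
-- Flip the arguments of an ascending comparison.
byFlip = sortBy (flip (comparing snd))
-- Wrap the key in Down, whose Ord instance reverses the ordering.
byDown = sortBy (comparing (Down . snd))

sample :: [Candidate]
sample = [("the", 3), ("teh", 1), ("then", 2), ("them", 2)]
```

All three return `[("the",3),("then",2),("them",2),("teh",1)]` for `sample`. Since `sortBy` is stable, the two entries with count 2 keep their original relative order in each version.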

~~~
evincarofautumn
True, but I prefer not to use typeclasses in that way.

------
flaie
Interesting read for a Haskell newcomer like me!

Regarding the original webpage of Norvig's spelling corrector, I think it is
not up to date as I remember browsing the web and finding some shorter
versions in other languages.

I've shortened the Python version to 14 or 15 lines using some features of
Python 3.

------
wyager
Cool! Since we're suggesting changes, here's what I'd do. (Not that anything
is wrong with the OP's code, just that it's good to point out all the
different stylistic techniques you can adopt.)

    
    
        alphabet = ['a'..'z']
        nWords = B.readFile "big.txt" >>= return . train . lowerWords . B.unpack
    

or:

    
    
        nWords = train . lowerWords . B.unpack <$> B.readFile "big.txt"
        

Make `splits`, `deletes`, etc. values (not functions). `splits` has access to
`w`, so there's no need to pass it as an argument 4 times (or even to pass `w`
as an argument to the other functions).

    
    
        sortCandidates = sortBy (flip (comparing snd)) . M.toList
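The suggestion about turning `splits`, `deletes`, etc. into values could look roughly like this. It is a sketch only: the OP's actual definitions are not reproduced here, so the bodies below are assumed from Norvig's algorithm, and `edits1` returns a plain list for simplicity.

```haskell
import Data.List (inits, tails)

-- Candidate edits with the helpers bound once in a where clause, so each
-- of them sees `w` and `splits` without taking them as arguments.
edits1 :: String -> [String]
edits1 w = deletes ++ transposes ++ replaces ++ inserts
  where
    alphabet = ['a' .. 'z']
    -- Computed once and shared by all four edit kinds.
    splits = zip (inits w) (tails w)
    deletes = [as ++ bs | (as, _ : bs) <- splits]
    transposes = [as ++ [b1, b0] ++ bs | (as, b0 : b1 : bs) <- splits]
    replaces = [as ++ [c] ++ bs | (as, _ : bs) <- splits, c <- alphabet]
    inserts = [as ++ [c] ++ bs | (as, bs) <- splits, c <- alphabet]
```

The trade-off is that the helpers can no longer be tested or reused on their own, which may or may not matter for a script of this size.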

~~~
codygman
I used to compose return with a series of pure functions as well, but I found
that using liftM seems cleaner.

~~~
codygman
example:

    
    
        nWords = liftM (train . lowerWords . B.unpack) (B.readFile "big.txt")
    

There was recently a very good article[0] about practically using monads that
mentioned using liftM.

However, whenever a Functor instance is enough, fmap is probably the better
choice, since it asks less of the type than a Monad constraint does. I'm not
quite sure how much this would help/apply to this small example though.

0: [http://softwaresimply.blogspot.com/2014/12/ltmt-part-3-monad...](http://softwaresimply.blogspot.com/2014/12/ltmt-part-3-monad-cookbook.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+SoftwareSimply+%28Software+Simply%29)
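To make the comparison concrete, here are the four spellings from this thread next to each other. `countLower` is a hypothetical stand-in for the pure pipeline `train . lowerWords . B.unpack`, and the function names are invented for the sketch. It assumes GHC 7.10 or later, where `<$>` is in the Prelude.

```haskell
import Control.Monad (liftM)
import Data.Char (toLower)

-- Hypothetical stand-in for the pure pipeline (train . lowerWords . B.unpack).
countLower :: String -> Int
countLower = length . words . map toLower

-- These two need a full Monad.
viaBind, viaLiftM :: Monad m => m String -> m Int
viaBind m = m >>= return . countLower
viaLiftM m = liftM countLower m

-- These two get by with a Functor, the weaker constraint.
viaFmap, viaOperator :: Functor f => f String -> f Int
viaFmap m = fmap countLower m
viaOperator m = countLower <$> m
```

For any lawful Monad the four agree, e.g. each maps `Just "The quick Fox"` to `Just 3`. The Functor versions also work on types that have no Monad instance at all.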

------
bshimmin
Not to make any particular point, but mainly just because I fancied a bit of
procrastination this afternoon, here's a CoffeeScript version (heavily leaning
on Underscore):
[https://gist.github.com/benshimmin/2ee78c932797faadfc89](https://gist.github.com/benshimmin/2ee78c932797faadfc89)

------
dschiptsov
Which "proves" again that programming is neither about OO nor about purity..)

~~~
dschiptsov
Where am I wrong? This is a straightforward translation of non-OO Python code
into Haskell, isn't it?

So there we can't see any "benefits" of truly-OO (original code has been
written in a "functional style") or pure-functional approaches (the code has
no "benefits" being converted into a pure-functional language).

Lists and Sets are "classes" in Python, but it doesn't matter, because
implementation of "basic" types does not alter the behavior - sets could be
implemented out of Lisp's conses.

Btw, knowing who the author is and seeing some "functional patterns" in the
Python code, it is very probable that the original corrector was
prototyped/written in Common Lisp, then re-written in Python, and now
re-written in Haskell.

The point was the elegant algorithm and compact implementation, not the
language of choice or a particular programming paradigm.

