

Let's Take a Trivial Problem and Make it Hard - mbrubeck
http://prog21.dadgum.com/41.html

======
sethg
The Haskell Data.Array.Diff module provides arrays that use mutation behind
the scenes, so that updates are O(1), but provide a pure functional interface.
So if "c" is the 256-element array of byte counts, and "i" is the index of the
element you want to increment, then you can do this:

let c' = c // [(i, c!i + 1)]

To quote the documentation: "When the // operator is applied to a diff array,
its contents are physically updated in place. The old array silently changes
its representation without changing the visible behavior: it stores a link to
the new current array along with the difference to be applied to get the old
contents."

For more information:
[http://www.haskell.org/ghc/dist/current/docs/libraries/array...](http://www.haskell.org/ghc/dist/current/docs/libraries/array/Data-Array-Diff.html)

I don't know if something similar exists for Erlang, but if not, I assume it
wouldn't be terribly hard to implement.
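
For the byte-count problem itself, the whole thing might look something like
this (an untested sketch; countBytes and bump are my own names):

      import Data.Array.Diff (DiffUArray)
      import Data.Array.IArray (accumArray, (!), (//))
      import Data.List (foldl')

      -- One counter per possible byte value. Each (//) on a diff array
      -- physically updates the array in place behind the pure interface.
      countBytes :: [Char] -> DiffUArray Int Int
      countBytes = foldl' bump zeros
        where
          zeros = accumArray (+) 0 (0, 255) []   -- all counts start at zero
          bump counts c = counts // [(i, counts ! i + 1)]
            where i = fromEnum c                 -- assumes single-byte chars

One caveat: the O(1) update only holds if you always work with the newest
version of the array; reading an old version forces it to replay the diffs.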

~~~
old-gregg
Can you implement your own Data.Array.Diff in Haskell, or is it just a
standardized fallback to C? I'm on chapter 6 of 'Real World Haskell' and I
honestly don't know yet.

~~~
sethg
It appears, from skimming the source code, that they use "unsafePerformIO",
which is Haskell's "I want to trick the compiler into thinking that this
imperative code is pure" function. (It's not part of standard Haskell, but I
think all implementations provide it.)

You can use unsafePerformIO in your own code--it's basically a giant loophole
in the type system, so if you don't know what you're doing you can screw
yourself, but if you really need that power, it's there.
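
For a flavor, here's the classic (and easily abused) idiom of a top-level
mutable cell--my own minimal example, not from the library's source:

      import Data.IORef (IORef, newIORef)
      import System.IO.Unsafe (unsafePerformIO)

      -- A global mutable cell conjured outside the IO monad. The NOINLINE
      -- pragma matters: without it, GHC may inline and duplicate the cell.
      {-# NOINLINE counter #-}
      counter :: IORef Int
      counter = unsafePerformIO (newIORef 0)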

More here:
[http://www.haskell.org/haskellwiki/IO_inside#Dark_side_of_IO...](http://www.haskell.org/haskellwiki/IO_inside#Dark_side_of_IO_monad)

------
silentbicycle
That's a problem for which an imperative solution is a very direct fit. If you
want to do _everything_ in a purely functional manner, some stuff is just
going to be awkward or hard to do efficiently, period. There are other cases
where the functional solution is ideal, and trying to solve them imperatively
becomes clumsy and bug-ridden. A sufficiently large project will probably have
at least a few of each (or a small implementation of half of Prolog, etc.).

This is why using a multiparadigm language (like OCaml or the various Lisps)
or an FFI to work in a few complementary languages is often a practical
approach. Working solely in a pure-(one thing) language often involves over-
committing to a specific trade-off.

Edit: Or, yes, the imperative sub-language embedded in the state monad.
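
A minimal sketch of that approach (my code, using GHC's ST monad and
Data.Array.ST): the mutation is real but stays local, and callers see a pure
function.

      import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
      import Data.Array.Unboxed (UArray)

      -- Count byte frequencies with a genuinely mutable unboxed array;
      -- runSTUArray guarantees the mutation can't leak out.
      byteCounts :: [Char] -> UArray Int Int
      byteCounts cs = runSTUArray $ do
        counts <- newArray (0, 255) 0
        mapM_ (\c -> do
                 let i = fromEnum c            -- assumes single-byte chars
                 n <- readArray counts i
                 writeArray counts i (n + 1))
              cs
        return counts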

~~~
eru
Or Haskell and the StateMonad, if you need imperativeness. (Though I do not
advocate it for this problem.)

However, functional solutions to this specific problem are not cumbersome at
all. See for example this snippet of literate Haskell. It assumes that you
have already put your binary data in a list of chars, and returns a Map
(similar to a Python dict) of frequencies:

    
    
      > import qualified Data.Map as M
      > freqMap :: [Char] -> M.Map Char Int
      > freqMap list = foldr op M.empty list
      >     where op :: Char -> M.Map Char Int -> M.Map Char Int
      >           op c freq_map = M.insertWith (+) c 1 freq_map
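
For example, freqMap "abracadabra" evaluates to
fromList [('a',5),('b',2),('c',1),('d',1),('r',2)].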

~~~
silentbicycle
Could that be adapted to work on a stream, rather than a complete list in
memory? That could become a constraint on the problem under reasonable
circumstances. If it's a list of chars, it makes sense to just accumulate
counts over the list (which is what you're doing, I think).

I'm not great at Haskell (GHC > 6.6 won't install on my main computer...
<http://hackage.haskell.org/trac/ghc/ticket/1346> ), and while I understand
the type system etc. from using OCaml, in OCaml I would just do this
imperatively and give it a functional interface.

I'm not just being difficult, I'm curious how you'd go about it. (Edit: Cool,
thanks.)

~~~
mbrubeck
Since Haskell lists are lazy, they aren't necessarily in memory all at once.
For example, the standard "getContents" function reads standard input and
returns a list of characters. But since it returns a lazily evaluated list,
you can read arbitrarily large files and they will be streamed efficiently as
you consume the list.

This is the same laziness that lets Haskell operate on infinite lists like
[1..].
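
Hooking the freqMap above to standard input is then a one-liner (a sketch,
assuming you pipe the file in):

      -- getContents yields the input as a lazily produced String, so the
      -- data is read incrementally as freqMap consumes it.
      main :: IO ()
      main = getContents >>= print . freqMap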

~~~
eru
Yes, I guess you are right. I have been bitten a few times by memory questions
before, so I was cautious in my answer.

There's still a lot for me to learn about how laziness impacts memory
requirements. (Sometimes Haskell was lazier than I expected and kept thunks in
memory that were bigger than their eventual results.)
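
The freqMap above is a case in point: to force the counts as they are built,
you'd want a strict left fold over a strict map. A sketch, assuming the strict
Data.Map API:

      import Data.List (foldl')
      import qualified Data.Map.Strict as M

      -- foldl' forces each intermediate map, and the strict insertWith
      -- forces each count, so no per-key chains of (+1) thunks pile up.
      freqMap' :: [Char] -> M.Map Char Int
      freqMap' = foldl' (\m c -> M.insertWith (+) c 1 m) M.empty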

~~~
silentbicycle
While it probably wasn't clear, that's what I was wondering about when I asked
about using streams. I find lazy evaluation by default confusing sometimes, so
I think about it in terms of streams or iterators. (Or are they called
generators? The kind of function that, when called, returns either the next
value or an end-of-stream sentinel.)

~~~
eru
Python calls them generators. Haskell lists display somewhat similar
behaviour.

Although I still have a lot to learn, laziness by default brings more benefits
than it costs, IMHO.

------
sethg
Ironically, I suspect that a better example of a problem that's easier to
solve with mutation than with a pure functional interface would be graph
reduction, which is a technique for evaluating purely functional languages.

<http://en.wikipedia.org/wiki/Graph_reduction>

~~~
eru
I guess at least it will be far easier with mutation than in a strict, purely
functional language.

------
tophat02
I think the question posed by the author is misfocused. The right question, in
my mind, is:

Will there ever be "one paradigm to rule them all"?

My guess is no. This type of simple, trivial problem screams for a structured,
imperative approach. Other problems clearly call for mutable objects, others
for declarative DSLs, and yet others for a pure functional approach.

Functional purists will tell you that mutable state is ALWAYS bad. OOP purists
will tell you that straight structured programming is ALWAYS bad, and
structured-programming purists will tell you that hand-optimized but
difficult-to-understand assembly code is ALWAYS bad.

They're all wrong, of course. Pick the right tool for the right job. Now, it's
true that not all programmers are skilled enough to do that, but I consider
that to be a project management/source control problem, not a language
problem.

~~~
tetha
In fact, one can argue that the one paradigm (or, the one philosophy) to rule
them all is known. It is the UNIX way of doing things: have a lot of simple
little programs, each reading well-defined input on an input pipe and
producing well-defined output on an output pipe. That way, each problem can be
implemented in a language fitting the problem (read: a language which can
model the input data and the output format in the most natural way), and you
end up with a lot of simple programs, all readable and all very beautiful.

Of course, this has a first (minor) problem: you need to know multiple
languages. However, I consider the "problem" of many people not knowing
multiple languages a chicken-and-egg problem. If you know just a single
programming language and never learn a second one, you will not learn how to
learn programming languages. If it becomes natural to know many languages,
learning a new one becomes very, very simple. (I myself found learning new
languages a very good way to get a grasp on a language's runtime model, and
once you know the runtime model, actually programming in it is easy.)

However, the worse problem is that grasping and actually designing things in
the UNIX way is hard. In fact, "hard" is not the right term; it is _different_
from the way such design was taught to me. I myself recently did a massive
restructuring of a (simple) code generator into many smaller programs
communicating via pipes. At first this felt very awkward, because... it was
different? However, now that I am implementing and using it, the new version
has massive benefits. (In fact, web services appear to be the UNIX philosophy
in different clothes.) So I guess it is just a problem with people's minds.

~~~
silentbicycle
It's a good technique for trying to keep complexity from getting out of hand.
Unix pipes seem similar in many ways to a kind of message-passing concurrency
(like in Erlang), before it was cool. :) The OS itself keeps tabs on the
processes, handles buffering, etc.

The Unix style seems to fit together relatively poorly with complex type
systems, though. Some programs (compilers, in particular) need very little
interaction with the outside world, and tend to require complex internal data
structures. Again, no one paradigm fits everything.

------
derefr
A slightly different viewpoint: a local array of 256 elements is, semantically
speaking, no different from 256 separate variables. Assuming we had 256
registers lying around to keep them in, tail-call-optimizing a 256-variable
recursion would just leave those values in their registers between loops. And
since the stack is just a place to hold registers we aren't paying direct
attention to, we shouldn't treat the 256 variables any differently because
they're on the stack instead. Thus, mutability is actually the _correct_ tail-
call optimization in this case.
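
A toy two-variable version of the same idea (my example, in Haskell): after
tail-call optimization, the accumulators below are just two registers being
overwritten on each iteration.

      -- Two accumulators threaded through a tail call; with TCO they
      -- live in registers that are simply mutated between "loops".
      sumCount :: [Int] -> (Int, Int)
      sumCount = go 0 0
        where
          go s n []     = (s, n)
          go s n (x:xs) = go (s + x) (n + 1) xs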

~~~
mononcqc
With Erlang, a similar effect could be achieved with the process dictionary:

    
    
      > erlang:put(1,1).
      undefined
      > erlang:get(1).
      1
      > erlang:put(1,erlang:get(1)+1).
      1
      > erlang:get(1).
      2
      > erlang:put(2,3).
      undefined
      > erlang:get().
      [{1,2},{2,3}]
    

It's limited to a single process and thus doesn't break the 'no shared memory'
model. It also lets you access and modify data without copying it, so the
closest Erlang implementation of the author's counting function would be
defined this way and would only return the final list.

Of course, this breaks the idea of 'purely functional', which is what the
author found problematic.

------
old-gregg
_Totals could be switched from a tuple to a tree, which might or might not be
better than the setelement code, but there's no way it's in the same ballpark
as the C version._

Unless I am mistaken about his suggestion, O(n log n) isn't quite "not in the
same ballpark" as O(n), and if you replace the suggested tree with a data
structure with O(1) lookup, then we're precisely in the same ballpark.

~~~
calambrac
You mean, for instance, with a destructively updateable array? A novel
concept, if only he had thought of that.

~~~
old-gregg
Average-case complexity for a hash table is O(1), and when your keys are
single bytes from [0..255], a good stdlib implementation shouldn't be far off
from that. In other words, I just found his "not in the same ballpark" remark
to be an exaggeration, although I understand where he's coming from: I am
learning Haskell at the moment, and oftentimes I ask myself how practical the
pure functional style can get.

And what's up with this passive-aggressive attitude?

~~~
calambrac
I didn't think it was all that passive. It's frustrating to read a comment
that nitpicks the article by _making the exact same point as the author_, but
using big-O notation as if that were the issue. Hint: it's not. The point is
that whatever purely functional data structure is being used, either the whole
thing is being copied on every single recursive call, or some tricky behind-
the-scenes magic is going on to make destructive-looking actions not actually
destructive. Either way, you're nuts if you think it's approaching C speed.

------
yason
Bah, just take a <http://c2.com/cgi/wiki?SufficientlySmartCompiler> and have
any unnecessary rounds of copying removed while still retaining purely
functional style in the source code. There, that wasn't so hard, right?

