Now, this and other articles about the difficulty of writing standard algorithms in functional languages makes me want to see FP's pitch be unbundled. The thing is that FP is pitched as a useful thing to learn to get an insight into programming, improve your mental fitness, see things at a higher level and so-forth. But it's also pitched as a way to be more productive and effective as a programmer. Effectively, it's sold as an exercise cycle and a real bicycle, a way to improve yourself and good way to get somewhere.
Now, I'll acknowledge that functional programming probably is great for the higher-level view of programming and such. But the challenges outlined in this article makes me doubtful fp is a productivity tool, a system which will allow some large number of programmers to make a large step in productivity.
I mean, if fp doesn't make simplistic programming faster and stands in the way of a large number of algorithmic approaches, it seems your productivity would consistently hobbled. Is there a counter-argument to this problem?
At the end of the day, you still write some stateful, imperative code.
It's just that your utility/work functions are pure, which means that you have predictable side-effects: you've stuck them all in one (imperative) place, or perhaps a couple places, but you're guaranteed to know where they are because the type system enforces notating which code has the ability to cause side effects. (Makes it easy to grep on type signatures, for example.)
Most of my workload is dominated by high level concerns: correctness, large-scale data flow, coordination of state, and security.
The purity of functions means that I explain all of the constraints/solutions of those problems in pieces, which I then compose with a clear data flow between them (since they can't have shared, mutable state), and only at the end, lift them to operate on the state of the system (in a stateful, imperative block).
This way of programming gives me much more confidence in replacing pieces of the code without worrying about the side effects, because the side effects are predictably related to that code.
At the end of the day, that predictability of side effects makes the majority of my work MUCH easier, even if it adds complexity to some low level data manipulation.
I easily write 10x the high level code I do low level code, so a 20% savings on that code, even if I double the complexity of low level code, is a net savings on my time/effort.
tl;dr: Functional programming is a productivity booster if most of your work is of a high level nature, even if it makes low level manipulations harder to do. Interfacing with low level imperative code can be (and is) regularly done, to get the best of both worlds.
In fact, I've always made a point, when working in a semi-functional language, of writing as much code functionally as possible. My experience is that it has the same benefits as you describe. So I don't know how much more there is to be gotten by actually switching to Haskell. My guess is, most of the additional benefit would come from the type system. But maybe I would be motivated to write even more code functionally.
You certainly can write similar code in ML, Java, or C - or whatever language strikes your fancy. It just helps when the type system is lined up with that goal, such as forcing you to denote where side-effects occur, and ensuring only functions that expect side-effect based code can operate on them. (For what it's worth, I've used both ML and Scheme to write similar style code as part of a programming languages class; and similarly, the C code I write relies on the functional ideas as much as possible.)
I just like the combination of Haskell's syntax and type system, so I prefer it to something like C (where functions aren't exactly a first class datatype, and the type system is somewhat weaker).
By the way, you don't need to live without static typing in Lisp. Shen provides a powerful type-safe layer on top of Lisp.
If you don't mind temporary impurity, Debug.Trace provides some helper functions, but they call unsafePerformIO, which can do unexpected things.
Once you learn the techniques and programming styles of Haskell, you can certainly apply those ideas elsewhere, but they lose a lot of their power when you (or another developer on the same code base) start "cheating".
From a personal perspective, I program a lot in Haskell and I find it perfectly fluent.
Just don't expect to take an algorithm predicted on assumptions of a particular memory model and translate it unmodified without at least using the embedding trick I mentioned above.
Fortunately, you're in luck. The whole point of this series is to culminate in a description of the "embedding trick" I referenced above. Part 2 is written and being edited now. It introduces a slow, simple, buggy way of writing Union/Find "purely".
Part 3 will fix the bug and then explore a fast embedding trick.
Thus far there are no dependencies on outside libraries (though eventually I'll have to use `containers`) so you can just drop it into GHCi and play around!
For that matter, I'm curious on any performance comparisons between algorithms such as the DLX and any "FP" equivalent. This already exist?
Again, let me emphasize this is a worst-case analysis. Many common algorithms are equivalent regardless, and that's part of the reason tel specifically names Union/Find as such an algorithm, as it's actually somewhat rare to encounter one where there isn't a Big-O equivalent algorithm that is pure. Pure functional algorithms often require some modestly clever amortization to get there, but that's perfectly valid both in theory and in practice (many beloved "imperative" structures have amortized complexities, too, including such basics as a hash table).
In practice pure functional can sometimes come with larger constants; whether they hit you in practice depends on your use case and, often, sensitivity to garbage collection pauses.
In light of all that, I think it would be fair to say that your question is somewhat ill-defined. You can really only compare particular algorithms against each other, because there's no trivial equivalence between "imperative" and "pure functional" algorithms. Plus, the barrier between the two in practice is quite fungible... especially in a garbage-collected imperative language nothing stops you from using a "functional" algorithm, and every practical "functional" language will give you a way of running "imperative" algorithms directly. (Yes, even Haskell. But tel is building up to that. I won't give it away yet. Stay tuned.)
I have to confess your "easy to show/see" isn't immediately obvious to me yet. However, that is as likely because I haven't tried hard to see as anything else. This post is between other things I'm trying to get done. Poorly already, digging into this is something I fear would not help.
I am interested, though, so more pointers and explanations would be greatly appreciated.
1) Take your algorithm that involves updates to memory.
2) Split it up so that "memory" is represented by an ADT - in the original imperative setting, it's logically a hash table - O(1) read and write.
3) Replace that ADT with a binary tree. Now you have O(lg(n)) read and write.
4) If the previous algorithm was O(f(n)), it can't do any particular thing more than O(f(n)) times, including memory access. So in the worst case, you've made O(f(n)) things take O(lg(m)) times as long, so the new algorithm must be in O(f(n) * lg(m)).
Constants are always highly situation dependent. If you are replacing a single memory lookup with a tree traversal, that's going to be a huge difference. If, for some reason, access to your mutable variables is already an expensive operation, it might not make much difference at all. If you need to take periodic snapshots of your world state, the mutation-free version might come out way ahead sharing portions of the tree that don't need to be copied.
And this is why I particularly asked about the DLX algorithm. It is specifically made for rapid backtracking. Reading briefly  shows that it was even made parallel to speed it up. ("made it parallel" is a gross simplification of course.) Is a very interesting read on methods to make a heavily mutation based algorithm parallel.
A snapshot of freely mutated memory is a O(N) copy.
A snapshot of a immutable tree is a single O(1) pointer copy (you just need to save the root).
Doing a full copy every change would be tremendously costly (substantially more than the penalty for walking the tree on that change, and probably overwhelming the overhead ofr walking the tree on reads).
Doing a full copy every hojillion steps would of course amortize to cheap (and probably the overhead from walking a tree for reads and writes would overwhelm it).
Anything real will of course fall somewhere between. As I said, constants are tremendously context dependent.
Note that this (of course) doesn't speed up the mutation free version - but if you have the constraint of wanting regular (or otherwise cheap) snapshots then using the mutation-free version can be the cheapest way of doing that.
I don't know the details of the DLX algorithm, or adaptations to it, well enough to say much about it in particular off the top of my head. I'd love to dig into it at some point, but I've unfortunately got higher priorities presently.
It would probably use many of the same tricks as the immutable structures. Which is why I would assume they would be equal. (That is, I realize that immutable structures don't do a full copy on every "change." Depending on the structure and the change, they don't even do a copy at all.)
Consider, we basically just described how git works, no?
Particularly interesting, genuinely "persistent" data structures (as used by Okasaki in Purely Functional Data Structures - which is a fantastic read and tons of fun) can give amortized worst-case bounds even in the presence of a malicious consumer of the data structure picking which operations to run against which historical versions of the object.
DLX is actually a great example of this sort of thing, as the whole point is that the operation to remove an item from a doubly linked list can be reversed to add the item back just from looking at the item that was removed.
And again, consider the way that git works. Nobody would call c and the techniques they use immutable, but the underlying data structure is.
More directly put, I am not calling for all data structures in a classical "mutation based program" to be mutable. I am curious about some of the more famous mutation based algorithms and if there are good comparisons to the analogous versions.
There was a great post here a few weeks ago about the blind spot in functional programmers in building trees. Having just seen the "threaded trees" for what I think is my first time, I have to confess it took me longer to make than I would have thought. Mainly because I was trying to hold on to some of the "immutable" methods of programming.
Certainly it is possible to find alternative constructions that work. Occasionally these may still be faster. However, I strongly contest that it's "tougher" in a non-mutable context. Specifically:
'In a mutable language this can be done by simply updating the head pointer easily enough. In a non-mutable language this is tougher in some respects.'
This is wrong. The hard part about this is making sure old things pointed at by existing snapshots don't change. If your data is immutable, you get that for free.
Moreover, in terms of complexity of the system, (mutating algorithm + a bunch of stuff to capture the mutations) is likely to be messier than the nonmutating algorithm (which is sometimes cleaner than the mutating version to begin with, but certainly not always).
Also, note that you've moved to talking about "mutable languages", we had been talking about algorithms.
'This is pretty much the thing that tripped up a ton of folks back from the early days of java. "Why doesn't x.replaceAll('a', 'b') change the value of x?" was not an uncommon mistake to encounter.'
Which is clearly a problem with "non-mutable languages"? The problem there is that the Java paradigm had been strongly mutation oriented and then they dropped an incongruous mutation-free "method that is really more of a function" in there. Clarity and consistency are important in any setting.
"DLX is actually a great example of this sort of thing, as the whole point is that the operation to remove an item from a doubly linked list can be reversed to add the item back just from looking at the item that was removed."
But you can't do that if you might be sharing those lists with someone else. The point is that the constraints imposed by immutability are often the most effective means of addressing other constraints, and so study of these things is quite valuable. This thread has never been "Haskell is much better because it doesn't let you mutate anything!" - both because I've already acknowledged that in many settings the mutable versions of algorithms are preferable and because Haskell does let you mutate things (you just have to be explicit about what).
"And again, consider the way that git works. Nobody would call c and the techniques they use immutable, but the underlying data structure is."
As an aside, you can write C with very little mutation going on, if that's what you want to do. I've not looked at the git source, so I have no idea the degree to which they do.
As I said, though, that's an aside - my main point here is that immutable data, and algorithms working with it, are valuable and there are situations where they are the best solution even where mutation is "allowed" and even where a mutation-heavy version might be preferred in a slightly different setting.
"There was a great post here a few weeks ago about the blind spot in functional programmers in building trees."
A cursory search isn't turning this up - it sounds interesting. Do you have the link?
I think how I'm talking past you is I am perfectly fine with mutation based algorithms using immutable data structures. That is, union find can easily be done using a standard Scala immutable Vector for the array. Only caveat is each "mutation" has to be of the form "x = x.setValue(index, value)".
So, the question I give you is do you consider that a mutation based algorithm or not? I would, as the heart of the algorithm is still based on the updates of the array. You are just safe in knowing that any place you have let a reference to it leak out is never going to change. This is both good and bad, of course. Depending on what you are doing.
Stated differently, I don't think we have been distinguishing between mutation based algorithms that use immutable data structures with mutation based data structures. (That is to say, I have not been concerning myself with that.) So, if you consider it an immutable algorithm as soon as an immutable data structure enters into the mix, then yes, most of what I've been saying is nonsense.
Seems unnecessarily restrictive to me, as just changing it such that each update to the underlying structure requires changing a pointer as well as following the data structure update is much less of a change than, for example, the story that is at the root of this discussion.
For Git, this is roughly what it does. If you do a repack, the new pack is only used once it is done. They rebuild the entire pack, then update the reference to the active pack. If you cancel the process at any point, the old pack is still good and still works. The process of building the structure is heavily "mutation" based, but once it is made, nothing is ever changed.
And you should look into the DLX made parallel. It is very different than how algorithms are made so in most popular literature.
If you'd like, try to instantiate something along the lines of `Mem (State (IntMap v))`. It's slightly trickier than it looks due to some typing issues (which don't matter, but I'll save the reveal). It's also likely to be slightly buggy.
Hopefully by part 3 I'll be able to poke very precisely at the seam between mutable and immutable algorithms and unlock how to pick between them with greater refinement.