

Are functional languages inherently slow?  - gtani
http://flyingfrogblog.blogspot.co.uk/2012/06/are-functional-languages-inherently.html

======
larsberg
> Functional languages require infrastructure that inevitably adds overheads
> over what can theoretically be attained using assembler by hand. In
> particular, first-class lexical closures only work well with garbage
> collection because they allow values to be carried out of scope.

Inevitable is an overly broad characterization. For example, in this
particular case, Manticore uses a combination of control-flow analysis and
reflow analysis to inline higher-order functions in instances when we can
determine that the free variables (if there are any) are obtainable in some
other static way. This observation extends to the general allocation result;
modern functional language compilers do quite a bit to get that under control
(see my IFL paper on arity raising/unboxing/ripping apart datatypes, which
provides a survey of the state of the art circa 2009) and, particularly given
the additional register freedom when you're not wedded to the C calling
convention, can do some real magic to avoid performing allocations,
particularly in inner loops. GHC's approach is type-directed unboxing/type
unrolling, IIRC, which in practice is on par (modulo separate-compilation
issues) with the implemented MLton work cited in this post.
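To give a flavour of what that kind of unboxing buys, here is a minimal
Haskell sketch of my own (illustrative only, not Manticore or MLton output):
strict, UNPACKed fields let the compiler store the doubles inline in the
constructor rather than behind pointers, so a tight loop over them needn't
allocate.

    {-# LANGUAGE BangPatterns #-}

    -- Strict, unpacked fields: the Doubles live inline in the constructor
    -- instead of behind a pointer each.
    data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

    -- A strict accumulator keeps the inner loop free of thunk allocation.
    sumX :: [Point] -> Double
    sumX = go 0
      where
        go !acc []               = acc
        go !acc (Point x _ : ps) = go (acc + x) ps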

So, it's less that the overhead is "inevitable" and more that avoiding it
requires a fairly insane amount of work from the compiler developer. And, for
example, I
seem to remember that the Visual C++ team spent more person-hours on template
compiler error message clarity in one product cycle than we will spend on the
entire Manticore project in its lifetime. And yes, I know the "sufficiently
smart compiler" joke.

A more fundamental complaint is that functional language implementations tend
to provide _fragile_ performance. That is, without understanding some fairly
gritty details of the compiler and runtime, it is difficult to understand why,
for example, the second time you made a call to a non-inlined combinator
providing a second function argument at the same type, your program became
slower in MLton but not in GHC (YMMV; a representative but not authoritative
example). A good way to see this phenomenon is to look at optimized benchmarks
for OCaml or GHC (e.g. in the PL shootout) and ask, "is this idiomatic
ML/Haskell code?" Many benchmarks for functional languages that perform close
to C look suspiciously like the original C benchmark, with just a few
functional sprinkles on top.

I don't know of a good answer to that performance problem without requiring
users to become near-experts with the compiler, and that's not good practice.

~~~
ch
I am not convinced that requiring users to become near-experts in their given
compiler (and by extension their chosen platform) is a bad practice.

I think writing code is a fine balance: finding a way to express algorithms
such that they a) are understandable to other developers and b) maximize the
opportunity for the compiler to optimize the code into machine instructions.

That does not mean that compiler writers can stop after creating an assembler,
since it's just turtles all the way up from there; but it does mean that
compiler writers should be thinking about how to get the most out of the
machine/developer interface.

Given this view, you certainly cannot separate language creators from language
compilers, which leads us either to a world where all the work goes into
creating new languages or to one where we all develop in Lisp ;)

------
calebmpeterson
> However, few practical applications benefit from persistence in practice so
> this is often not advantageous.

This statement is questionable.

What _is_ true is this: many professional software developers are comfortable
with imperative/OOP/mutability. This is easier (not simpler) because it's
familiar. The benefits of immutability are not obvious/familiar.

Persistent data structures enable immutable data with cheap(ish) operations
that can produce updated versions of that data. This means I can expose data
to multiple threads, to called functions, etc., without worrying about that
data being placed into an inconsistent state or viewed in an inconsistent
state by callers.

Back when much of our profession malloc'ed their memory, we always had to
answer the question "who is going to free() this chunk of the heap?" And when
we answered wrong, or forgot to answer, we leaked memory. Garbage collectors,
while imperfect from a controllability and performance standpoint, are much
better at this than humans.

An excellent analogy exists with mutable data. When I expose a piece of
mutable data via any public interface, I have to ask the question "who can
change this data and how will consistency be enforced?" I'm left with the
answer "I don't know, I hope no one gets it wrong" or even worse: "they better
lock the right semaphore(s) in the correct order". Immutable data (implemented
via persistent data structures) enables me to instead say "here's the data, if
any caller would like to update, they can a) pass me an updated version which
I can validate, or b) call into some part of my public interface and instruct
me what update to make."
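A minimal Haskell sketch of that style (my own illustration; the names are
hypothetical): the map is persistent, so handing it out to readers is always
safe, and an "update" is just a new version that shares most of its structure
with the old one.

    import qualified Data.Map.Strict as M

    type Prices = M.Map String Double

    -- Option (a): the caller hands back a candidate version, which the
    -- owner validates before adopting it.
    acceptUpdate :: Prices -> Prices -> Either String Prices
    acceptUpdate _current proposed
      | all (>= 0) (M.elems proposed) = Right proposed
      | otherwise                     = Left "rejected: negative price"

    -- Option (b): the caller asks the owner to make a specific change.
    -- The old map is untouched, so existing readers keep a consistent view.
    setPrice :: String -> Double -> Prices -> Prices
    setPrice = M.insert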

Or, in the case of Clojure (my blub at the moment), put the runtime in charge
of maintaining consistency during mutation using refs and the STM.

~~~
ww520
When you expose a piece of mutable data, usually it's wrapped up in an
interface that enforces the access and consistency of that piece of data so
that you have control rather than leaving it up to the callers. If you don't
want the synchronization wrapper, that's OK. Just document it clearly and let
the callers deal with it.

------
dmlorenzetti
_If you have a benchmark comparing C with a functional language then it is
almost certainly an extremely simple program. Arguably so simple that it is of
little practical relevance today._

A lot of simple tasks are highly relevant. For example, in numerical
simulation, you probably want a fast matrix-vector multiply. Conceptually this
is extremely simple, and it's easy to benchmark. Examples of practical
relevance: climate modeling, simulating combustion in engines, and optimizing
control surfaces on airplanes.
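For concreteness, here is roughly what that kernel looks like in Haskell over
unboxed vectors (a sketch of mine using the vector package; whether the
compiler fuses it into allocation-free loops is exactly what such a benchmark
probes):

    import qualified Data.Vector.Unboxed as U

    -- Dense matrix-vector multiply; the matrix is stored row-major.
    matVec :: U.Vector Double  -- matrix, m*n elements
           -> Int              -- n, the row length
           -> U.Vector Double  -- input vector of length n
           -> U.Vector Double  -- result of length m
    matVec a n x = U.generate m row
      where
        m     = U.length a `div` n
        row i = U.sum (U.zipWith (*) (U.slice (i * n) n a) x)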

"Sort" is simple. Does that make it irrelevant?

How about finding the standard deviation of a bunch of numbers?

~~~
sbi
How relevant are numerical benchmarks when comparing languages? In practice
you'll probably use BLAS, LAPACK, etc. for numerical computing and get on with
it; you're unlikely to write code that is both faster and numerically stable.

------
rejuvenile
I am not an expert in functional languages, but I reason about it (that is,
the question "Are FP languages inherently slow?", not the article) in the
following way. I am curious to know if this reasoning is, well, reasonable.

- CPUs are inherently not functional. They are imperative, as they are Turing
machines. They don't speak high-level math.

- Therefore, programs written in a functional language must always be
translated into imperative code for execution. Compare this with an imperative
language where the compiler need not perform super-fancy translation.

- If the problem can be mapped efficiently to machine code by the compiler,
the functional program is fast; otherwise, it may be slow. Either way, it may
be very hard to know what is going on behind the scenes.

- This is not always a loss; sometimes the compiler can perform extensive
optimizations because it speaks math, and the functional language will win
handily.

- Writing in an imperative language, you always know how your program will
behave; however, you lose the opportunity that a functional language gives the
compiler to deduce a better way (assuming there is a better way).

- But if there is no better way, you have the opportunity to make the
execution very fast by modeling the imperative solution around the constraints
of the hardware.

Of course, there are dimensions other than raw performance, among them
programmer productivity and the number of defects.

~~~
slurgfest
It just seems funny to frame this as an argument between the user having to
adopt the machine's phrasing and having to adopt the phrasing of higher math.
Higher math is clearly not the only abstraction. Ergonomics for one person
coding, or for two people collaborating on a practical task, seems more
important to me.

------
darkstalker
The speed of a language depends on its implementation, not on the language
itself.

~~~
mistercow
That is true of _languages_, but it is in fact the case that there are
algorithms that run in O(n) time when written impurely but require O(n log n)
or thereabouts when written in a purely functional style, even in
state-of-the-art FP languages. Since FP languages provide impure facilities
for use in the real world, you _can_ still use the O(n) algorithms, but at the
cost of purity.
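As an illustration of my own (not from this thread): counting occurrences of
small integer keys is O(n) with a mutable array inside the ST monad, whereas a
purely functional substitute such as Data.Map pays an extra log factor per
update. (ST keeps the mutation local, so the function is still observably pure
at its boundary.)

    import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
    import Data.Array.Unboxed (UArray)

    -- O(n) counting pass using in-place updates on an unboxed array.
    histogram :: Int -> [Int] -> UArray Int Int
    histogram size xs = runSTUArray $ do
      counts <- newArray (0, size - 1) 0
      mapM_ (\k -> readArray counts k >>= writeArray counts k . (+ 1)) xs
      return counts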

So the relevant question here really is not "are functional languages
inherently slow?" but "is pure functional programming inherently slow?" AFAIK,
that question has not been conclusively settled either way.

------
iand675
No, see GHC here:
<http://shootout.alioth.debian.org/u32/which-programming-languages-are-fastest.php>

A lot of work goes into getting it that performant, but it's entirely doable
with the right know-how.

(Before anyone gets pedantic, I realize that languages aren't slow, just
implementations of them, etc.)

~~~
jlarocco
Well, GHC is still slower than C, C++, Ada and Fortran, so the link doesn't
really refute his point.

And, looking at the code, a lot of the Haskell is subverting purity and
laziness by using things like Data.ByteString.Internal and
Data.ByteString.Unsafe, so I think that actually makes the case even stronger.

~~~
acomar
> subverting purity and laziness

That's an interesting choice of words. Quite obviously no useful program can
be totally pure (otherwise it'd do nothing but heat the machine). And Simon
Peyton Jones is on record saying that he's not sure if he'd carry the laziness
torch quite so far if he had to do it all over again. Neither is to say that
these aren't powerful or useful techniques, but rather that in Haskell they're
chosen as defaults -- impure and/or strict code is the outlier that requires
justification, not the other way around. So given that, in what way is using
ByteStrings (safe or unsafe) subversion? Far from it -- use the tool that
makes sense.

With most functional languages, the usual technique for writing fast code is to
first write it to be correct, and then transform it to be fast. And that's
exactly what unsafe ByteStrings are: optimizations. There's nothing wrong with
optimized code, and I see no problem with the technique. The only relation it
bears to the OP is that it's an explicit unboxing. Perhaps the compiler should
automatically do that unboxing, but that's easier said than done.

~~~
jlarocco
I guess my biggest problem is all of the people claiming Haskell is pure, lazy
and fast. It seems disingenuous when actually being fast requires swapping out
the standard data structures for non-pure, non-lazy versions.

To put it another way, a naive implementation in C will compile and be
relatively fast, while a naive implementation in Haskell will, at the very
least, require replacing the standard data structures with things like
ByteStrings, just to match the default C speed.
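A small sketch of the kind of swap being described (my own illustration): the
default String type is a lazy linked list of boxed Chars, so even a trivial
scan touches a cons cell per character, while the ByteString version walks a
packed byte array.

    import qualified Data.ByteString.Char8 as B

    -- Idiomatic, but String is [Char]: a linked list of boxed characters.
    countLinesString :: String -> Int
    countLinesString = length . lines

    -- The same operation over packed bytes.
    countLinesBS :: B.ByteString -> Int
    countLinesBS = length . B.lines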

~~~
acomar
Pure and fast is certainly feasible -- you don't always need to use unsafe
bytestrings for example. Laziness makes performance hard to predict because
you have no clue when something will get evaluated -- there's no contract with
the compiler. Maybe one day some group of really smart people will figure out
how to take advantage of that to write compilers that produce really fast
code, but I think language-wide laziness is not a feature bound for mainstream
languages. Targeted laziness? Probably, it's very useful for expressing data
structures. But not everywhere by default.
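A tiny example of what such targeted laziness buys (my illustration, not from
the thread): an infinite, lazily built list of Fibonacci numbers, of which
only the prefix a caller demands is ever computed.

    -- Each element is defined in terms of the two before it; the list is
    -- conceptually infinite, and `take 10 fibs` evaluates only ten cells.
    fibs :: [Integer]
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)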

In any case, I don't think people are trying to be disingenuous. Haskell is,
or can be, all three of those things. It's just very hard to get them all at
the same time.

------
gleenn
No, OCaml and Haskell can be faster than C at times. Look at some benchmarks.

~~~
postfuturist
I did a benchmark. Naive Haskell implementation took 28 seconds, heavily
optimized / hinted Haskell version took 11 seconds. Naive C implementation
took 0.3 seconds.

<https://github.com/deliciousrobots/sml_life>

~~~
mej10
I have no idea what is heavily optimized about that Haskell code. I am no
Haskell expert, but all you did was use an unboxed array. I added one
strictness annotation to the "v" parameter on your iter function and it
reduced the time from 10 seconds to 1.5 on my laptop.
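For readers who haven't seen one, a strictness annotation of that sort is just
a bang pattern on the accumulating parameter. A sketch of mine (not the actual
code from the repository): the bang forces the accumulator on each iteration,
so the loop carries an evaluated value rather than a growing chain of thunks.

    {-# LANGUAGE BangPatterns #-}

    -- Without the bang on `acc`, the loop builds a chain of unevaluated
    -- (+) thunks; with it, the loop runs in constant space.
    sumTo :: Int -> Int
    sumTo = go 0
      where
        go !acc 0 = acc
        go !acc n = go (acc + n) (n - 1)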

None of the other code you wrote uses a linked list, so I don't know why you
used one for the Haskell version. You used vector types for most of the others
(which Haskell has too, by the way). Pretty disingenuous.

I admit that the prominent use of lists in newbie Haskell tutorials could have
misled you, but you seem to know the difference in data types since you used
vectors everywhere else.

~~~
postfuturist
Why did you accuse me of being disingenuous? I used an Array (life-array.hs);
if Haskell implements Arrays as a linked list... well, that's a problem.

~~~
mej10
I accused you of that because it seemed like you knew what you were doing
with the other implementations and then suddenly forgot about data structure
performance when you got to Haskell. It was an asshole move, and I am sorry.

They aren't linked lists, but you still used a linked list inside your "iter"
function, where the other implementations used a vector (or array).

------
wonderzombie
Semi-OT: is there a good primer on GC people could recommend? His discussion
of different GC strategies piqued my interest, but I'm not entirely sure where
to start.

~~~
larsberg
The GC book by Jones is the place to start to understand GC in general. He
also has a new "GC Handbook," but I haven't looked through it yet to see how
well it covers modern parallel and concurrent collectors, particularly for
NUMA machines.

Having spent a large portion of the last five years working on a parallel
concurrent collector that scales to 48-core AMD and 64-core Intel machines, I
can say the
game's totally different on NUMA architectures. Unfortunately, not much has
been written yet on it (I have an MSPC paper; the Intel folks have a paper on
Pillar that was at ISMM; that's about it). Feel free to ping me directly if
you get to advanced material and need more pointers. Many of the systems have
not yet been published, as it's shockingly difficult to publish on modern GC
implementations these days.

~~~
wonderzombie
Not sure if anybody reads comments five days later, but thanks for the
suggestion and the offer! I appreciate it.

Why do you suppose it's so hard to publish about them? A general lack of
interest?

------
stephencanon
If ever Betteridge's law applied...

~~~
_delirium
Despite the (not great) title, I think it's actually a fairly interesting
article, which isn't so much about the conclusion as about some of the
difficulties and successes w.r.t. functional-language efficiency.

