
From 60 Frames per Second to 500 in Haskell - coolsunglasses
http://keera.co.uk/blog/2014/10/15/from-60-fps-to-500/
======
fsloth
Oh dear. I love F# and would really want to motivate myself to Learn me some
Haskell but these sort of articles discourage me enormously.

Please, if there are any people here who have successfully written production
code in Haskell that needs to have as low memory footprint as possible and be
as performant as possible and have reached something like 1.5 x C level memory
consumption, 0.75 x C level performance I would really be grateful for any
references. I cannot wait to shed myself from the chains of C++ but stories
like these, that follow patterns like these, give me the impression we're not
there yet:

1. Awesome language <3 <3 <3
2. Write production code
3. Hit a mysterious performance problem (too much memory, too slow, or both)

Note: My interest is in dense computational kernels that are not I/O bound.

Edit: The language is obviously great for problems akin to syntax translation,
I'm just constantly hoping someone would do something industrial strength with
it and tell about it. Jane Street are extremely bullish about Ocaml
([https://blogs.janestreet.com/](https://blogs.janestreet.com/)). I've yet to
face a similar perpetual love letter to Haskell as a language from industrial
users.

~~~
carterschonwald
I write performance sensitive haskell all the time.

In fact my main focus the past 2 years has been designing better array
computation tools (for dense and sparse matrix computation). (some of which
i'm in the process of finally open sourcing)

I've tools and tricks that would drive humans mad before they could figure out
how to port the full awesomeness of what i've done to C++. The main place
where I'll use C in my haskell is for writing SIMD kernels or wrapping up some
unrolled ASM from openblas/blis like this monstrosity
[https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_6...](https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/dgemm_kernel_4x4_haswell.S)

For CPU/RAM throughput-bound workloads that aren't allocation heavy, that's
pretty much the only time I'll break out C: for wrapping up those crazy
kernels that are tuned to a specific CPU microarchitecture.

I also know quite a few people who use haskell as a meta language for some
domain specific EDSL they compile using LLVM as a library. I know of
people who've proprietary tool chains using modern GHC haskell for really
really performance sensitive workloads in this fashion in high frequency
trading, computer vision, and scientific computing. (and I've the fortune to
consider them my friends)

btw, over on reddit, the amazing austin seipp (who does a LOT of work on GHC)
has a super articulate post about performance engineering in haskell
[http://www.reddit.com/r/haskell/comments/2jbl78/from_60_fram...](http://www.reddit.com/r/haskell/comments/2jbl78/from_60_frames_per_second_to_500_in_haskell/cladaeq)

Anyways, at the end of the day, engineering is about building software that
works in finite time. The performance bits will only bite in your innermost
loop, and the ffi overhead in haskell for c that takes < 10µs should be about
2-5ns (when an ffi call takes less than ~ 10µs, it's safe to use the "unsafe"
ffi; for operations that may take > 10µs, you should always use the
default/"safe" ffi convention or someone will come knocking at your door late
at night quite angry).
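
As a rough illustration of the two conventions, here is a minimal sketch. It assumes a POSIX system with the C library's `sin` and `sleep` available; the Haskell-side names `c_sin` and `c_sleep` are made up for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.Types (CDouble (..), CUInt (..))

-- Short call that never calls back into Haskell: a candidate for "unsafe",
-- which skips the extra bookkeeping a "safe" call does.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- A call that may block or run for a long time: keep the default "safe"
-- convention so the runtime is not stalled while it runs.
-- (sleep is only a stand-in for "something slow"; assumes POSIX.)
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

main :: IO ()
main = do
  print (c_sin 0)
  _ <- c_sleep 0
  putStrLn "done"
```

Leaving out the keyword entirely gives you the `safe` behaviour, so `unsafe` is something you opt into per import once you know the call is short.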

I should also disclose I'm slowly working out a few crazy extensions to ghc
for low level performance engineering, though those might not make it into GHC
till 7.12 at the current rate things are going in my life :)

the one area where GHC doesn't shine is in mega core allocation heavy
workloads, but i've yet to see anyone do well by default in that regime in any
language! :)

EDIT: also, profile before doing performance engineering. And before that
choose the right algorithms! :)

EDIT: to further elaborate on some of the tech i've got, I've a way of doing
array computation that gives me a very nice blend of guaranteed good memory
locality + extensibility that i've not seen in any other array computation
tooling i've been able to lay my hands on, at least on this planet :)

~~~
fsloth
Thank you, your post was most encouraging! I think I'll have to presume that
the adept Haskell devs are mostly developing and not popularizing that much of
their advancements.

~~~
carterschonwald
well, they do whatever they have to do to pay the bills! I think a lot of
these tool chains, or fragments thereof, are going to be open sourced
eventually, but all engineering (open or not) is being paid for by someone (at
least indirectly). So a lot of these tools are only going to make that move
once the authors can "pay" for the time to do so. ('cause life is more than
just software i'm told!)

You can talk with a lot of people doing numerical computingy things in haskell
on the #numerical-haskell channel on freenode.

------
rrradical
I'm also pursuing mobile games in Haskell, so I'm very happy to see them
working on this. That said, less than 30 fps for a simple game seems very poor
to me. For comparison, my previous game ran easily at 60fps on an iPhone 5
w/Gambit Scheme.

So, I'm wondering where the culprit is. Is the android phone fairly old? What
does a basic rendering test run at? I.e., w/o yampa or SDL or other game
logic? I hope to have my own answers to these questions soon, but bravo to
Keera for blazing the trail!

------
advocaat23
Is it concerning that most Haskell optimization guides talk about adding ! to
enforce evaluation? It may be a sign that pure functional programming is not
quite there in the real-time world (e.g. real-time rendering) and that some
abstractions (declarative instead of imperative) break for these use-cases.
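
For readers who have not seen the `!` in question, this is a minimal sketch of what such guides usually mean. The function names `sumLazy` and `sumStrict` are made up for illustration, and with optimisation enabled GHC's strictness analysis may well make both versions behave the same.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Lazy accumulator: without optimisation this can build a long chain of
-- unevaluated (+) thunks before anything is actually added up.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Strict accumulator: the bang forces acc at every step, so the loop
-- carries a plain number instead of a growing thunk.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go !acc []       = acc
    go !acc (x : xs) = go (acc + x) xs

main :: IO ()
main = print (sumStrict [1 .. 1000000])
```

Both functions return the same result; the only difference is when the additions happen, which is exactly the part that is hard to predict without knowing the evaluation model.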

What I want to say is that you really have to know a lot about Haskell to
predict a program's runtime performance, which is essential in the context of
real-time. In my opinion these optimizations seem a little bit hacky:
unsafePerformIO is by definition hacky, bang patterns sometimes seem like
guesswork and concurrency is always hard to get right (even with MVars).

I really like Haskell but do not consider it yet for latency-sensitive tasks as
it feels like writing unidiomatic Haskell. In particular I find it concerning
that one has to exploit concurrency and parallelism for such a simple game.

Are there ambitions to make GHC (hard) real-time friendly? This will probably
require a real-time GC optimized for latency. I would like to read more about
this, in particular about using pure (strongly typed) functional languages in
a low-level/real-time context. Would it be possible to exploit the type-system
there?

~~~
lomnakkus
As it turns out, in practice, you usually want your data structures to be as
strict as possible, but your control flow to be as lazy as possible. Further,
some code just isn't well-suited to laziness -- so much so that there's work
afoot to have a "strict-by-default" LANGUAGE pragma added in GHC 7.10.x[1].

[1]
[https://ghc.haskell.org/trac/ghc/wiki/StrictPragma](https://ghc.haskell.org/trac/ghc/wiki/StrictPragma)
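
A minimal sketch of what that split can look like, assuming a made-up `Ball` type and a toy constant-velocity `step` function (neither is from the article):

```haskell
module Main (main) where

-- Strict fields: whenever a Ball is evaluated, both coordinates are
-- evaluated too, so no thunks pile up inside the record.
data Ball = Ball
  { ballX :: !Double
  , ballY :: !Double
  } deriving (Show)

-- One simulation step (hypothetical physics: constant velocity).
step :: Double -> Ball -> Ball
step dt (Ball x y) = Ball (x + 10 * dt) (y - 5 * dt)

-- Lazy control flow: a conceptually infinite list of future states;
-- only the prefix that is actually consumed is ever computed.
states :: Ball -> [Ball]
states = iterate (step 0.016)

main :: IO ()
main = print (states (Ball 0 0) !! 3)
```

The data is strict, while the decision about how many steps to run is left to whoever consumes the list.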

~~~
tome
> you usually want your data structures to be as strict as possible, but your
> control flow to be as lazy as possible

That's a very nice slogan. I'll have to remember that. It certainly covers the
case of generally wanting your record fields to be strict and lists to be lazy
when used for control flow.

~~~
lomnakkus
> That's a very nice slogan.

Yeah, I borrowed it from some Haskell luminary. My search-fu failed me so I
didn't source it as I should have. :)

------
taeric
It is somewhat mind blowing that the original version of the demo game was
done with ridiculously rudimentary logic gates. If you haven't read the
history of how Wozniak did this with a little over 40 chips, you should look it
up.

And yes, I realize the graphics are different. Probably even the game play.

~~~
bjterry
At UC Berkeley one of the classes (EECS150) focused on creating an old-school
arcade game in an FPGA using (virtual) logic gates. It is one of the best
classes I ever took and really gives you an understanding of how these things
were possible. It seems impossible when you first see it, but by the end it
all clicks and you realize how much is possible in pure silicon.

~~~
taeric
Sounds incredibly fun. Know if any of the resources you used are available in
the wild?

~~~
vmind
Not who you're replying to, but there's a similar class at Cambridge where you
implement pong/game of life on an FPGA board. Some of the practical course
notes are available publicly, though the full computer design notes referenced
are not. There may be some interest in the basic sources and approach however:
[http://www.cl.cam.ac.uk/teaching/0910/ECAD+Arch/](http://www.cl.cam.ac.uk/teaching/0910/ECAD+Arch/)

(Linked to the course I know from a few years ago, the course has since
changed)

------
maxcan
As a long-time Haskeller and lover of the language, I think this article
really brings out the warts in the language.

I'm super excited about Rust [http://www.rust-lang.org](http://www.rust-lang.org)
because it retains much of the safety of Haskell, excluding effect
tracking, while potentially offering C/C++-level, bare-metal speed.

~~~
evincarofautumn
Which warts, exactly? It’s a happy account of simple code changes that led to
dramatic speed improvements, found using GHC’s great profiler. I’m excited
about Rust as well (and working on a similar language) but GHC’s performance
is quite competitive. And I don’t find a few strictness annotations and MVars
so onerous; in fact, I usually make my data structures strict by default,
making fields lazy when I actually need knot-tying or streaming or what have
you.

This is also a good example of how easy concurrency and parallelism are in
Haskell—I didn’t really get how to effectively program with concurrency until
I learned Haskell, then ported the knowledge to more challenging environments
in imperative-land.
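
To give a flavour of the MVar style, here is a minimal sketch using only `base`. The helper name `sumInWorker` is made up for this example, and it is not taken from the article's code.

```haskell
module Main (main) where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Compute a sum on a separate thread and hand the result back
-- through an MVar.
sumInWorker :: Int -> IO Int
sumInWorker n = do
  result <- newEmptyMVar
  _ <- forkIO $ do
    let total = sum [1 .. n]
    -- ($!) forces the sum on the worker thread, so the reader receives
    -- a finished number rather than an unevaluated thunk.
    putMVar result $! total
  -- takeMVar blocks until the worker has put a value.
  takeMVar result

main :: IO ()
main = sumInWorker 100000 >>= print
```

The strictness annotation and the MVar show up together here: the MVar moves the value between threads, and the `$!` decides which thread pays for computing it.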

------
efnx
If you're interested in game programming with Haskell I invite you to the
#haskell-game IRC channel (irc.freenode.net) and our sub-reddit
[http://reddit.com/r/haskellgamedev](http://reddit.com/r/haskellgamedev) :)

------
MrBuddyCasino
I appreciate this article for demonstrating Haskell optimization techniques.
But honestly, I kind of expected a 3D game with impressive graphics to finally
prove FP can be used for these purposes. This is just a Breakout clone that runs
below 30fps on Android.

~~~
angersock
Yeah, it seems real impressive until you realize that you could duplicate it,
faster, with only a small bit of C or C++.

Until I see a real-time software rendering project in Haskell doing full 3D,
I don't really think it's worth half the fanboyism it gets.

~~~
dyarosla
I have to agree. I wanted to see something spectacular and I wanted to see
Haskell explored as a real option. But running 25fps on Android?! Seriously?
It can't hit 30fps or today's accepted video game speed of 60fps on an
extremely trivial Breakout game?? As I was reading, all I could think was that
ONLY after a lot of optimizations requiring hacks or deep knowledge of how
Haskell operates do you hit a slightly higher but still low benchmark...
that's... nice(?). Not the 'Wow, I should start using Haskell' argument at
all.

I don't need to see full 3D to convince me, but the average game on any
platform is tens to hundreds of times more asset-heavy, resource-heavy and
computation heavy than this example. It's just not convincing to see it
perform UNDER the expected 30-60fps.

~~~
angersock
So, the reason I mention full software 3D, is this:

Computers are, at heart, numerical engines. 3D rendering, in software, is
perhaps the purest form of exploiting that. If your language can't do that
well, then it's a high-level language and is suited for gluing stuff together.
I specify software 3D because any fool can pull in some OpenGL bindings and
talk about framerates, but in that case their language is only glue for
hardware APIs.

And there's nothing wrong with that, mind you, but then you can't get by on
"But think of the performance this could have, one day, with magical compiler
technology from the futuuuuuure!". You have to get by only on claims of
safety, or developer ergonomics, or provability, or something else.

------
JonAtkinson
Cached version:
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://keera.co.uk/blog/2014/10/15/from-60-fps-to-500/)

~~~
keera_studios
Sorry about this. Thanks for posting this link. The server had KeepAlive on
and couldn't handle the load. Solved.

------
nightcracker
For the love of god, use frametime to measure improvements, not FPS. "A 15 FPS
increase" is entirely meaningless, and can mean you doubled your performance
or improved it by 1%.

~~~
keera_studios
Someone else pointed this out. You are right, for comparison of each
improvement independently, it's better to use frame time.

In this particular case, we were after a very specific framerate, and it helps
to think in FPS. The game outputs both frame time and framerate (currently of
each subsystem independently).

------
Narishma
When profiling games, using FPS to measure performance increases/decreases is
kinda silly. For example, going from 25 to 30 fps (5 fps increase) is a bigger
performance improvement than going from 90 to 120 fps (30 fps increase). Much
better to use frame times.
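
The arithmetic behind that example, as a small sketch (the function names `frameTimeMs` and `savedMs` are made up for illustration). Going from 25 to 30 fps saves roughly 6.7 ms per frame, while going from 90 to 120 fps saves roughly 2.8 ms, so "bigger" here means absolute time saved per frame.

```haskell
module Main (main) where

-- Frame time in milliseconds for a given frames-per-second rate.
frameTimeMs :: Double -> Double
frameTimeMs fps = 1000 / fps

-- Milliseconds saved per frame when going from one rate to another.
savedMs :: Double -> Double -> Double
savedMs from to = frameTimeMs from - frameTimeMs to

main :: IO ()
main = do
  putStrLn ("25 -> 30 fps saves  " ++ show (savedMs 25 30) ++ " ms per frame")
  putStrLn ("90 -> 120 fps saves " ++ show (savedMs 90 120) ++ " ms per frame")
```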

~~~
keera_studios
Agreed. The game outputs both. In this particular case, we are after a very
specific framerate.

------
LukeShu

        > C does not annotate pure functions as such, but the
        > semantics of SDL make some data conversions
        > referentially transparent
    

FWIW, GCC supports such annotations with `__attribute__ ((pure))`.

~~~
cordite
But is that data that ends up in the lib*.so's?

I can see that possibly optimizing within GCC (or LLVM if it supports it) but
I have no clue for library shared/static objects.

~~~
LukeShu
The annotation would go in the header files, so that it works with lib*.so's.
For example, glibc's strlen is marked this way (as well as a whole host of
other functions).
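
On the Haskell side, the equivalent promise is made by the programmer rather than read from the header. As a hedged sketch (the names `c_strlen` and `cStrLen` are made up here, and this is not the article's SDL code), wrapping `strlen` as a pure function could look like this:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize (..))
import System.IO.Unsafe (unsafePerformIO)

-- strlen only reads the buffer it is given, so the call is short
-- and never calls back into Haskell.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- The programmer vouches for referential transparency: the result
-- depends only on the argument, so hiding the IO is defensible here.
cStrLen :: String -> Int
cStrLen s = fromIntegral (unsafePerformIO (withCString s c_strlen))

main :: IO ()
main = print (cStrLen "breakout")
```

Nothing checks that promise; if the C function turned out to have side effects, the compiler would be free to reorder or share the calls.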

------
jgalt212
Haskell and Clojure are both functional languages with Haskell being purer.
That being said, I fairly regularly see posts about how fast Haskell is (Bryan
O'Sullivan has a great blog[1]) and I rarely if ever see posts about how fast
Clojure is.

What is/are the reason(s) for this? Is it the fault of the JVM, or the
bytecode the Clojure compiler emits?

[1] [http://www.serpentine.com/blog/](http://www.serpentine.com/blog/)

~~~
skittles
Haskell is strongly and statically typed which means its compiler can generate
more optimized code. The compiler also creates programs that run without a
virtual machine. Clojure is dynamically typed and runs on the JVM. It is still
really fast compared to many other dynamic languages.

~~~
rwosync
Static typing is mostly orthogonal; Haskell is mostly blazing fast because of
all the aggressive compiler optimizations and the last decade of hard work by
Simon Marlow and others on the parallel runtime.

~~~
tome
Static types provide guarantees that help the compiler know when optimizations
are valid.

------
dkersten
An aside: this article has a lot of bold text, which I find quite distracting.
I've seen it said that over-reliance on formatting to make your point is a
sign of poor writing.

Most of the bold text could have been removed without any drop in impact (in
fact, it would have greater impact because whatever bold text is left would
actually stand out).

------
imaginenore
A game that simple should run at 1000 fps in any modern language.

------
mpweiher
Am I missing something here? Breakout ran in real time without overlaps etc.
on the Apple ][+ with a 1 MHz 6502.

------
rlpb
The link just shows a blank page. Zero frames per second here.

~~~
tome
The web server is having trouble. Reloading worked for me.

~~~
keera_studios
Sorry to both of you for this. Thanks for pointing it out.

The server was misconfigured (KeepAlive) and couldn't handle the load. It's
been fixed now.

------
tempodox
Apparently, the web server doesn't share the speed gain. Switching to the
cached version...

