
Designing a Programming Language for the Desert - Athas
https://futhark-lang.org/blog/2018-06-18-designing-a-programming-language-for-the-desert.html
======
keldaris
How close is Futhark to being considered production-ready? How close is the
generated code to hand-optimized OpenCL for real problems (I did look at the
benchmarks, but only the Rodinia suite had comparison numbers, and those are
trivial to beat)?

OpenCL is easy to write for anyone who knows C (far more people than fans of
functional programming) and moderately difficult to write decently well.
What's the value proposition of Futhark? Is it just that some people prefer
writing in a functional DSL rather than C, or does it purport to enable some
optimizations that are hard to write well manually?

~~~
Athas
> How close is Futhark to being considered production-ready?

It's not 1.0 yet, and there are ways to trip up the compiler by accidentally
writing irregular code (and the workarounds are nonobvious), but several
people have used it productively (albeit non-production by most standards).
Addressing this is part of the research we're doing, and we are quite sure we
know how to get there.

> OpenCL is easy to write for anyone who knows C (far more people than fans of
> functional programming) and moderately difficult to write decently well.
> What's the value proposition of Futhark? Is it just that some people prefer
> writing in a functional DSL rather than C, or does it purport to enable some
> optimizations that are hard to write well manually?

In principle, a skilled GPU programmer can always outperform Futhark. The
value proposition of Futhark is partly that you can obtain decent code (usually
within 2x of hand-written performance) without having low-level knowledge,
and partly that even a skilled GPU programmer can get there much faster. Most
low-level GPU languages (like OpenCL and CUDA) make it very hard to write code
that is both fast _and_ modular. For example, you might write some really nice
CUDA kernel that operates on vectors, but there is no obvious way to apply it
to every row of a matrix - probably you will have to write a new kernel from
scratch; one that is very similar to the old one, but without direct code
reuse. In Futhark, you would just 'map' the old function over the matrix, and
let the compiler figure out how to turn it into one or more segmented
operations.
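
As a rough illustration of the 'map' idea, here is a minimal sketch in plain Python rather than Futhark. The names `normalize` and `normalize_rows` are hypothetical and stand in for "some really nice kernel that operates on vectors"; the sketch shows only the code reuse, not how a compiler would turn the result into segmented GPU operations.

```python
def normalize(vec):
    """Scale a vector so that its entries sum to 1 (hypothetical vector 'kernel')."""
    total = sum(vec)
    return [x / total for x in vec]


def normalize_rows(matrix):
    """Apply the unchanged vector function to every row of a matrix."""
    # No new per-matrix kernel is written: the old function is simply mapped.
    return list(map(normalize, matrix))


matrix = [[1.0, 3.0], [2.0, 2.0]]
print(normalize_rows(matrix))
```

The point of the sketch is that `normalize` is written once and reused as-is; in a low-level GPU language the row-wise version would typically be a separate, hand-written kernel.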

Essentially, the Futhark compiler is not bound by the restriction that the
optimised code should be maintainable by humans, so it is able to use
inlining, duplication, fusion and a host of other transformations that are
helpful for performance, but no sane programmer would write if they cared
about the maintainability of their code.
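
To make "inlining" and "fusion" concrete, here is a small sketch in plain Python (not Futhark, and not the compiler's actual output). The functions `scale`, `shift`, `unfused` and `fused` are hypothetical names chosen for illustration: the first version is the modular code a person would write, the second is the kind of merged form a fusing compiler might produce.

```python
def scale(x):
    return 2.0 * x


def shift(x):
    return x + 1.0


def unfused(xs):
    """Modular version: two passes and one intermediate list."""
    tmp = [scale(x) for x in xs]      # first traversal, materialises tmp
    return [shift(t) for t in tmp]    # second traversal


def fused(xs):
    """Hand-fused version: both functions inlined into a single pass."""
    return [2.0 * x + 1.0 for x in xs]


xs = [1.0, 2.0, 3.0]
print(unfused(xs) == fused(xs))
```

Both versions compute the same result, but the fused one avoids the intermediate list at the cost of duplicating the bodies of `scale` and `shift`, which is the maintainability trade-off described above.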

In practice, Futhark usually gets beat on single primitive operations (matrix
multiplication, FFT, etc), but at an application level, it tends to do well.

~~~
keldaris
Thank you for the detailed response. I'll try to allocate a weekend to port
some of my usual scientific GPGPU test cases to Futhark and see how it goes.

> Most low-level GPU languages (like OpenCL and CUDA) make it very hard to
> write code that is both fast and modular.

That's absolutely true, which is why a frequent solution is to achieve
modularity by using simple metaprogramming to generate kernel code from higher
level constraints. I understand Futhark aims to obviate at least a subset of
these issues, so there's value there.

> Essentially, the Futhark compiler is not bound by the restriction that the
> optimised code should be maintainable by humans, so it is able to use
> inlining, duplication, fusion and a host of other transformations that are
> helpful for performance, but no sane programmer would write if they cared
> about the maintainability of their code.

Am I correct in assuming that you can still (easily) obtain, inspect and
profile the Futhark-generated OpenCL code using the usual tools? I suppose a
sensible workflow could be using Futhark to get as far as you can and then
hand optimize the bottleneck kernels further.

> In practice, Futhark usually gets beat on single primitive operations
> (matrix multiplication, FFT, etc), but at an application level, it tends to
> do well.

Do you happen to know of any scientific/engineering applications (lattice
Boltzmann methods, molecular dynamics, etc.) using Futhark in the wild? From a
quick glance at the website it seemed like all of the applications are focused
on image/video processing, any reason for that?

~~~
Athas
> Am I correct in assuming that you can still (easily) obtain, inspect and
> profile the Futhark-generated OpenCL code using the usual tools? I suppose a
> sensible workflow could be using Futhark to get as far as you can and then
> hand optimize the bottleneck kernels further.

Sort of. Futhark generates pretty simple OpenCL host code, but there is not
yet an obvious way to tie the generated kernels back to the original Futhark
source code. As a result, it's easy enough to detect that some specific kernel
is the bottleneck, and often also why, but it can be a bit of a puzzle to
connect it to a part of the original program. While you _can_ in principle
edit the generated kernels yourself, it's not code that is at all nice to read
or modify.

> Do you happen to know of any scientific/engineering applications (lattice
> Boltzmann methods, molecular dynamics, etc.) using Futhark in the wild?

No. Closest are simple things like nbody[0] simulations.

[0]: https://github.com/diku-dk/futhark-benchmarks/tree/master/accelerate/nbody

> From a quick glance at the website it seemed like all of the applications
> are focused on image/video processing, any reason for that?

Not sure. Most of these are stencils, and while Futhark does alright with
stencils, it doesn't do anything particularly clever (no hexagonal tiling, for
example). Maybe it's just that they are easy and satisfying to write, because
you trivially get something visual at the end.

Futhark's design was originally inspired by nasty financial algorithms (the
ones from the FinPar suite[1]), which tend to be a combination of Monte Carlo
methods and differential equations. I'd say that is Futhark's main strength.

[1]: https://dl.acm.org/citation.cfm?id=2898354

------
tehsauce
These DSLs for high performance parallel computing based on purely functional
algorithm descriptions are very exciting; Halide and Accelerate are similar
projects. As the demand for writing parallel, portable code increases, I can
see technologies like these becoming essential tools.

~~~
mkirklions
What applications?

~~~
lou1306
Most algorithms related to computer vision and image processing come to mind.
Of course there are efficient libraries written in general-purpose languages,
but DSLs could lead to smaller, easily maintainable codebases.

------
Nzen
tl;dr Futhark is a DSL for transpiling non-graphics parallel submodules for
other programs. Which is to say, if one were to write an N-queens Futhark
program, it could create C, C#, or Python modules for use in a larger program
that would leverage OpenCL for parallelizing the chess moves.

This blog post declares some culture boundaries. Namely, the team intends to
keep the DSL targeted at this domain, rather than expanding to general
computation in some far future. This enables them to focus on common target
language constructs and keep their documentation efforts toward similar
experts. They acknowledge that this demands more of their users (for example,
all external imports must be in the same folder when compiled), but allows for
simplicity and clarity all around.

Hence, the author uses the analogy of a desert creature that survives in an
environment of constrained resources.

------
earenndil
Honestly, if it's _not_ intended for general use, I don't think it's worth it
to complicate the build process, and complicate the process of setting up a
dev environment by installing a whole new compiler toolchain just for a couple
of small functions.

~~~
everdev
> Thus, even were Futhark to become the indisputably best language in its area
> - which is certainly the goal! - the user base would still be small, and so
> would the development resources.

Exactly. If I'm selecting a language for any critical components, it's going
to be one with a strong community, not something obscure that might be a
little less verbose or a little more performant.

~~~
Athas
What if it's a _lot more performant_? It's of course always a risk to
incorporate another language, but sometimes performance is a critical feature,
yet you have no time or inclination to write (say) low-level GPU code by hand.
Futhark is more of an alternative to CUDA or OpenCL than to high-level
languages.

Also, there is the important property that Futhark programs tend to be fairly
small, and are semantically close to ordinary functional programming. Thus,
even in a worst case situation where the Futhark compiler is abandoned, it is
a reasonable task to simply rewrite all the code in another high-level
language.

~~~
earenndil
> sometimes performance is a critical feature, yet you have no time or
> inclination to write (say) low-level GPU code by hand

I also have no time or inclination to integrate an entirely new tool into my
build chain, complicate and lengthen the build process, and increase the
cognitive load required to begin working on the project.

~~~
crummy
That's probably why they focussed on making the compiler as simple and
straightforward to use as possible, so it does not significantly increase the
complexity of your build process.

~~~
orbifold
You would still need to wrap it in some form for it to be usable from most
build systems other than make. For build systems such as bazel this is quite
some work for little immediate benefit.

------
taneq
> Fairly sure you don’t have to watch out for sandworms in a real desert.

Doesn't that map rather well to being acquired by $bigcorp?

------
erikpukinskis
Transcendent read. Bang on.

------
dom96
So Futhark is designed to be small and specific to a single class of problems.
Has the author considered implementing it in another language as a DSL?

~~~
Athas
Yes, we considered that. There were two reasons we wrote a standalone language
instead:

* It gives more flexibility and simplicity, regarding both syntax and type system. The compiler can also more easily analyse the AST, because there are no artifacts of an embedding.

* An embedded DSL is practically available only to users of the host language. In contrast, Futhark is equally accessible to many languages, either via direct code generation (such as for Python), or using a fairly simple C FFI.

The best example of an embedded high performance DSL is, I believe,
Accelerate[0] for Haskell. It is really good, but is ultimately only
accessible to Haskell programmers, and the implementation has to solve
non-trivial problems related to the embedding.

[0]: http://hackage.haskell.org/package/accelerate

