
Halide, a language for image processing and computational photography - roquin
http://halide-lang.org/?
======
blt
I saw their talk at Siggraph in 2012. This is really interesting work.

The target image processing algorithms are low level. Think "How can I write
_the fastest_ 3x3 box blur for Sandy Bridge and later architectures", not "How
can I speed up my face detector".

Examples of scheduling decisions that Halide deals with: process the image in
4x4, 8x8, or 16x16 chunks? Use a temporary buffer for an intermediate
processing stage, causing worse cache behavior but reusing results that are
needed more than once? Use a sliding window the size of a couple of rows as a
compromise?
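That buffer-vs-recompute choice can be sketched in plain C++ (illustration only, not Halide code; the names `blur_staged` and `blur_fused` are made up for this example): a separable blur run under two different "schedules", one materializing the horizontal pass into a temporary buffer, one recomputing it on the fly. Both produce the same image; only the memory traffic and reuse behavior differ.

```cpp
#include <cassert>
#include <vector>

// Small image, row-major, borders left at zero for simplicity.
constexpr int W = 8, H = 6;

// Horizontal 3-tap sum at (x, y); valid for x in [1, W-2].
int hx(const std::vector<int>& in, int x, int y) {
    return in[y * W + x - 1] + in[y * W + x] + in[y * W + x + 1];
}

// Schedule A: materialize the horizontal pass into a full temporary
// buffer, then run the vertical pass. Each horizontal sum is computed
// once and reused three times, at the cost of an extra W*H buffer
// (worse cache behavior).
std::vector<int> blur_staged(const std::vector<int>& in) {
    std::vector<int> tmp(W * H, 0), out(W * H, 0);
    for (int y = 0; y < H; ++y)
        for (int x = 1; x < W - 1; ++x)
            tmp[y * W + x] = hx(in, x, y);
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x)
            out[y * W + x] =
                tmp[(y - 1) * W + x] + tmp[y * W + x] + tmp[(y + 1) * W + x];
    return out;
}

// Schedule B: fuse the two passes, recomputing each horizontal sum
// three times. No temporary buffer, better locality, redundant work.
std::vector<int> blur_fused(const std::vector<int>& in) {
    std::vector<int> out(W * H, 0);
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x)
            out[y * W + x] = hx(in, x, y - 1) + hx(in, x, y) + hx(in, x, y + 1);
    return out;
}
```

In Halide the algorithm would be written once and switching between these schedules would be a one-line change; here each schedule is a separate hand-written function, which is exactly the duplication Halide avoids.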

This kind of work is the difference between a laggy and an interactive
Gaussian Blur radius slider in Photoshop.

The Halide code shown at the talk replaced really hard-to-read code with lots
of SIMD compiler intrinsics: dozens of lines of code doing something that
would be 5 lines in a naive implementation. With Halide, it's almost as
readable as the naive version because the scheduling is separated from the
algorithm.

For an application like Photoshop, this is a big win because they will _never_
choose code readability over performance. Performance wins every time. If they
can get the same performance with readable code, they are very happy.

GPU code generation falls naturally out of the scheduling intelligence and
restricted problem domain.

I have never used Halide, so I do not intend to endorse it, but this line of
inquiry is absolutely useful for a certain niche of programming.

------
linhat
As a computer vision researcher, this looks _very_ interesting, although I
have yet to understand how they want to generalize highly complicated
optimization patterns (access order, locality, pipelines, platform
limitations ...). Some algorithms (other than the blur filter shown) require
quite complicated access patterns on the image data and can only be
hand-optimized most of the time. That doesn't imply they would not benefit
from general optimization at all, just that they might be way faster when
hand-optimized. Still, if Halide produces faster code for some cases (e.g.
filter operations, among others), it will be worth its salt.

~~~
oneofthose
Since this is a language and a compiler, my guess is that the answer to your
question is: the compiler will optimize for the underlying platform. The whole
point of Halide is stated in their abstract: "... make it easier to write
high-performance image processing code", which is the exact opposite of "hand
optimization". Halide allows developers to express what to do in a powerful,
domain-specific language - the compiler takes care of the "how".

This approach makes a lot of sense: abstract the annoying low level
architecture details. They have a lot of targets which is fantastic: x86/SSE,
ARM v7/NEON, CUDA, Native Client, and OpenCL. Let the architecture specialist
worry about the architecture specifics. The disadvantage: the achieved
performance then depends on the quality and wisdom of the compiler. But once
certain things are optimized for a specific architecture, every user will
benefit.

How they do it on the compiler end of things, I'm not sure. There are a number
of techniques; among the simpler ones is auto-tuning. There is also a new
term: "copious-parallelism" [0]. It acknowledges that to achieve performance
portability across platforms, algorithms must offer explicit ways of
parameterization and tuning to adapt to different platforms. I think this is
the right concept, but I believe it could be implemented within the compiler.
The domain specialist should not have to think about those things.

[0] [http://www.hpcwire.com/2014/01/09/future-accelerator-programming/](http://www.hpcwire.com/2014/01/09/future-accelerator-programming/)

~~~
seanmcdirmid
The paper was specifically about writing your schedules by hand, not about the
compiler deriving them automagically, which is a huge pipe dream at any rate.
For programs in this class, you are looking at a few orders of magnitude of
perf difference between schedules, which is why the programmer needs control:
so they can be guaranteed the performance they are expecting. Compilers only
reliably optimize at the few-percentage-point, nice-to-have level.
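The hand-chosen-schedule idea can be illustrated in plain C++ (a sketch, not Halide's actual API; `doubled_tiled` is a made-up name): the tile size below is a pure schedule parameter under the programmer's control. It changes the traversal order, and therefore the cache behavior, without ever changing the result.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int W = 16, H = 16;

// The "algorithm" is fixed (out = 2 * in over a 2-D grid); the tile
// size is a schedule knob. Different tile sizes visit the pixels in a
// different order but must always produce the same output.
std::vector<int> doubled_tiled(const std::vector<int>& in, int tile) {
    std::vector<int> out(W * H, 0);
    for (int ty = 0; ty < H; ty += tile)
        for (int tx = 0; tx < W; tx += tile)
            // Visit one tile at a time; clamp at the image edges.
            for (int y = ty; y < std::min(ty + tile, H); ++y)
                for (int x = tx; x < std::min(tx + tile, W); ++x)
                    out[y * W + x] = 2 * in[y * W + x];
    return out;
}
```

In Halide itself this knob is, as I understand it, a one-line schedule directive along the lines of `tile(x, y, xi, yi, 4, 4)`; here, changing it means rewriting the loop nest by hand, which is exactly the work Halide factors out of the algorithm.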

~~~
oneofthose
You're right, I completely misunderstood the purpose of Halide. I read up on
it and I see now that they simplify how developers can do the
copious-parallelism implementation by hand. The schedule must be specified by
the developer; the compiler doesn't do that.

------
SixSigma
As usual, the Unix Room at Bell Labs already had a go at this. It was called
Pico and is from 10th Edition Unix, 1984.

I have the print version of the book in my collection. It is interesting.

[http://spinroot.com/pico/](http://spinroot.com/pico/)

[http://spinroot.com/pico/tutorial.pdf](http://spinroot.com/pico/tutorial.pdf)

[http://spinroot.com/pico/atttj.pdf](http://spinroot.com/pico/atttj.pdf)

~~~
psuter
It looks like they indeed also had a DSL for image processing. Despite what
the Halide presentation may be emphasizing, though, its novelty is not so much
in the DSL as in the superoptimizer that comes with it.

Now, superoptimization is not a particularly new topic either (see e.g. [1]),
but Halide demonstrated that the improvements can be significant in this
domain. That is far more impressive than, e.g., superoptimization applied to
generic assembly code (which hasn't really worked so far), query optimization,
etc.

[1] [http://blog.regehr.org/archives/923](http://blog.regehr.org/archives/923)

------
sqrt17
I might be a stickler here, but this strikes me as a good example of a
"domain-specific language" rather than of a general programming language.

Now, it's certainly possible to bolt general-purpose programming features onto
a DSL (e.g. Matlab or R), but the approach (also used by Halide) of embedding
the DSL into a general-purpose language (here: C++) is, in my eyes, vastly
better suited for scaling up from experimentation code to something that can
become part of a larger system (think OpenCV in robotics etc.).

~~~
kakali
They don't advertise it as a general programming language. At the end of the
video they specifically state that it only works for optimizing algorithms
that deal with images. For general stuff like trees, it has no application.

~~~
zalman
The general domain of applicability is data parallel processing. There is very
little that is image specific in Halide. So audio, dense linear algebra, some
finite element applications, etc. would also be well within the domain.

------
marpalmin
We evaluated this some time ago at the company I used to work for, but it had
neither Intel compiler nor MS compiler support, so it was an instant no-go.
Anyway, it is an interesting idea.

~~~
zalman
You should be able to link generated Halide code(1) with either compiler.
(Using Halide generated code does not impose much at all in the way of
requirements other than C-level ABI compatibility and basic library support.)

A fair bit of work has recently been done on making Halide work well on the
Windows platform. We are erring toward requiring newer versions of Visual
Studio, however. (E.g. there are advantages in being able to rely on C++11
support when generating Halide code.)

I do not see any issues filed on things not working with Intel's compiler.
Feel free to file one. Halide is an open source project and both bug reports
and patches help a great deal.

(1) Halide can operate as either a Just In Time (JIT) compiler or an Ahead Of
Time (AOT) compiler. Generally AOT is much more convenient for deploying in
production. The end result of compiling a single filter in Halide is an object
file and a header file declaring a single C-linkage function, with all of its
parameters passed in on the call. The minimal runtime is included in the
object file, and dependencies on libraries are also quite minimal, things like
libc and pthreads. Functionality provided by the runtime can easily be
overridden by providing one's own definitions for various functions, as weak
linkage is used.

There is also a set of Python bindings, which can be used to write Halide code
which is then AOT compiled. The AOT compiled object file can be linked into
pretty much anything that can handle C functions either via direct linkage or
a foreign function interface.

------
adw
I guess the /? in the url is here to defeat the dupe filter, but the site
loaded much faster for me without it.

------
bitL
Can you combine this with AVISynth for rendering videos? It would be cool to
be able to completely bypass Lightroom when I need to process individual
images and do it instead in a single script.

------
ccozan
Honestly, I was expecting something like in Blade Runner [0]. It's quite easy
to mistake "language" for something spoken.

[0] -
[http://www.imdb.com/title/tt0083658/quotes?item=qt1827458](http://www.imdb.com/title/tt0083658/quotes?item=qt1827458)

------
rimantas
Really cool name. In the rare case someone does not get it:
[http://en.wikipedia.org/wiki/Silver_halide](http://en.wikipedia.org/wiki/Silver_halide)

~~~
egeozcan
By coincidence, "Halide" also happens to be a Turkish female name.

~~~
jedismith
Slightly different pronunciation though I think.

~~~
egeozcan
Actually, I think the pronunciation is quite different.

~~~
gamegoblin
Indeed. The Turkish name is pronounced similarly to "holly duh" and the
chemical is pronounced "hay lied".

------
frozenport
One wonders if you could build an OpenCV interface using Halide, maintaining
compatibility while also gaining the potential to optimize across OpenCV
function calls.

I want to write in OpenCV, but I also want the compiler to fix everything
across function boundaries :-)

------
sharpneli
I cannot shake the feeling that this is just yet another iteration of the
usual concept.

If you look at Google Renderscript, Microsoft C++ AMP, NVIDIA CUDA, OpenCL and
whatnot, you'll realize that they all do exactly the same thing, the
difference being only at the syntax level (with a few caveats, but those are
relatively minor). Some are more cumbersome to use than others.

Halide looks neat syntax-wise, but it doesn't seem to make the actual
algorithms any easier to write. You still have to do the same mental work, at
the same level, as you do with OpenCL.

All of these are to each other what Fortran, Pascal and C are to each other:
the same basic idea in a different package. I'm waiting for the first system
that is the equivalent of C++, something that really brings new concepts to
the table instead of just a different syntax.

~~~
linhat

      I'm waiting for the first system that is the equivalent of C++.
    

Not gonna happen anytime soon. C++ is much more than just a programming
language, it is an entire "ecosystem". You have toolchains, libraries,
drivers, APIs, existing software stacks, all written in C++ and able to
interface with C/C++ directly, i.e. no translation layer needed. You're not
just gonna replace that with a new language.

Sure, Halide may be seen as just some syntactic sugar (much like quite a few
bits and pieces of C++11), but it actually provides a different level of
abstraction than, say, OpenCL, MPI or OpenMP, where you are very specific
about e.g. the level of concurrency (which heavily impacts the design of your
algorithm), while Halide tries to almost completely separate the algorithm
from the scheduling.

~~~
sharpneli
Maybe I worded it badly; Skrebbel said it a lot better. It's about providing
something that is on a truly higher level.

It's nice that Halide attempts to separate the algorithm from the scheduling;
in principle that's possible in OpenCL too. The implementations are just so
bad at it that it's not really useful.

