
Show HN: Bild – Simple image processing in Go with parallel processing support - amzans
https://github.com/anthonynsimon/bild
======
uniclaude
Wow, this code is so easy to read.

Even though I might not use this package as I don't have any Go project in my
pipeline, reading the code was a nice learning experience and just demystified
a few things about image processing for me.

Go might not be one of my favorite languages, but its readability is just
great (or maybe this codebase isn't idiomatic Go?).

~~~
IshKebab
Agreed, that is definitely one of the nice things about Go. Compared to
something like Haskell, to pick an extreme example, it is so easy.

Part of that is because there aren't any 'higher level' functions so you are
forced to write everything out as explicit loops. Often a bit tedious but it
is definitely easier to see what exactly is happening when reading the code.

Side rant on Haskell: I really wanted to get into it, but when you see function
signatures like this:

    
    
        add                     :: Integer -> Integer -> Integer
    

You have to wonder what pipe they are smoking. I'm sure someone is going to
tell me there is an obscure compsci reason why it couldn't have been this:

    
    
        add                     :: Integer, Integer -> Integer
    

But that is just stupid, sorry.

~~~
tome
It could have been

    
    
        add :: (Integer, Integer) -> Integer
    

but then you couldn't write

    
    
        add 1
    

and use it as a function. And that would be just stupid, sorry.
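
For readers more at home in Go, the point about `add 1` can be roughly sketched with closures. This is only an illustration: `add` and `addPair` are hypothetical names, and Go has no built-in currying, so the curried form has to return a function explicitly.

```go
package main

import "fmt"

// add mirrors the curried Haskell type Integer -> Integer -> Integer:
// it takes one argument and returns a function that awaits the second.
func add(a int) func(int) int {
	return func(b int) int { return a + b }
}

// addPair mirrors (Integer, Integer) -> Integer: both arguments are
// required at once, so there is nothing to partially apply.
func addPair(a, b int) int {
	return a + b
}

func main() {
	increment := add(1) // partial application, like `add 1` in Haskell
	fmt.Println(increment(41))
	fmt.Println(addPair(1, 41))
}
```

With the curried form, `add(1)` is already a usable function value; with the tupled form, the caller would have to wrap `addPair` in a closure by hand.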

~~~
dpc_pw
It's just a matter of syntax. After all, it's rather common for people to want
to do partial application to a non-first argument:
[http://stackoverflow.com/questions/4553405/how-can-i-bind-th...](http://stackoverflow.com/questions/4553405/how-can-i-bind-the-second-argument-in-a-function-but-not-the-first-in-an-elegan)

So something like:

    
    
        add 1 _
    

or whatever (I'm not a Haskeller, so it's hard for me to judge the exact
syntax) would do.

And then

    
    
        add _ 1
    

would be easily possible too.

~~~
tome
Absolutely, but then what would

    
    
        add _ (minus 1 _)
    

mean? Not the same as

    
    
        let f = minus 1 _
        in add _ f
    

and things would get very weird very fast.

------
matt42
Did you benchmark Go vs C/C++ for parallel image processing? I'm curious to
see if there is actually a cost to switching to a higher-level language like
Go.

~~~
amzans
I haven't done any formal benchmarks yet. At the moment the library is not
fully optimized, just the low hanging fruit (optimization/bugfixing is the
next stage).

But I will definitely post some benchmarking results soon, as they are crucial
for the optimization stage.

By the way, any suggestions/contributions to the project are more than
welcome!

~~~
Mr_P
Some basic suggestions:

All of the blur/ functions could see big improvements with some simple
changes:

* Use separable kernels whenever possible (this reduces an O(K^2) evaluation to 2*K evaluations)

* Some filters (like your box filter) can be done efficiently as a linear
combination of IID filters (i.e. implicitly computing a pair of running sums,
subtracting one from the other as you go).

* The convolve/ package could use some cache blocking, and should definitely have the conditionals on the inner-loop removed.

Unfortunately, these changes will likely make your code a bit more difficult
to read.

This is one of the reasons Halide has become so popular for image-processing
(if you're interested in high performance image processing without sacrificing
maintainability, Halide is definitely worth looking into!)
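
The first two suggestions above can be sketched roughly as follows. This is not bild's code: `boxBlur1D` and `boxBlur2D` are hypothetical names, the image is a plain `[][]float64` instead of RGBA pixel data, and clamping at the edges is an assumption. The 1D pass keeps a running sum so its cost per sample does not depend on the radius, and the 2D blur is two 1D passes (rows, then columns) instead of one K×K window per pixel.

```go
package main

import "fmt"

// boxBlur1D averages each sample with its neighbours within radius r,
// clamping out-of-range indices to the nearest edge sample. A running
// sum is updated as the window slides, so each output costs O(1).
func boxBlur1D(src []float64, r int) []float64 {
	n := len(src)
	dst := make([]float64, n)
	if n == 0 {
		return dst
	}
	clamp := func(i int) int {
		if i < 0 {
			return 0
		}
		if i >= n {
			return n - 1
		}
		return i
	}
	// Sum of the window centred on index 0.
	sum := 0.0
	for k := -r; k <= r; k++ {
		sum += src[clamp(k)]
	}
	width := float64(2*r + 1)
	for i := 0; i < n; i++ {
		dst[i] = sum / width
		// Slide the window: add the sample entering, drop the one leaving.
		sum += src[clamp(i+r+1)] - src[clamp(i-r)]
	}
	return dst
}

// boxBlur2D applies the 1D blur to every row and then to every column,
// which is what makes the box kernel "separable".
func boxBlur2D(img [][]float64, r int) [][]float64 {
	h := len(img)
	if h == 0 {
		return nil
	}
	w := len(img[0])
	rows := make([][]float64, h)
	for y := range img {
		rows[y] = boxBlur1D(img[y], r)
	}
	out := make([][]float64, h)
	for y := range out {
		out[y] = make([]float64, w)
	}
	col := make([]float64, h)
	for x := 0; x < w; x++ {
		for y := 0; y < h; y++ {
			col[y] = rows[y][x]
		}
		for y, v := range boxBlur1D(col, r) {
			out[y][x] = v
		}
	}
	return out
}

func main() {
	img := [][]float64{
		{0, 0, 0},
		{0, 9, 0},
		{0, 0, 0},
	}
	fmt.Println(boxBlur2D(img, 1))
}
```

Note that a floating-point running sum accumulates rounding error over long rows; an integer accumulator avoids that when working on 8-bit pixel data.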

~~~
amzans
Thank you for the suggestions! The blur and convolution packages are
definitely among the first things to be optimised, lots of other features
could benefit from a much needed faster convolve function.

Halide looks very interesting, I found the "Halide Talk" video on their
website to be a great primer on their methods.

------
pcwalton
Without SIMD (or autovectorization) this will be very slow. This is really the
canonical use case for SIMD acceleration, which Go doesn't support.

This is very clear code though and great for learning.

~~~
pkroll
There's
[https://github.com/bjwbell/gensimd](https://github.com/bjwbell/gensimd) for
doing SIMD in Go. That'd take away the "clear" part pretty quickly.

------
luckydude
This looks neat, nice code, but isn't this just the same stuff that xv (1) has
been doing since the 90's? It sure looks similar. Maybe this is much faster?

(1)
[https://en.wikipedia.org/wiki/Xv_(software)](https://en.wikipedia.org/wiki/Xv_\(software\))

~~~
amzans
The project was originally for learning purposes, as I wanted to try out Go's
concurrency/parallelism for something not web related. It quickly grew into a
collection of common image processing functions, so I decided to turn it into
a Go package.

The goal of the library is ease of use and development instead of being the
fastest implementation (and then losing too much readability). That being
said, it is not slow but there's still room for improvement, which will be
addressed during the optimisation and bug fixing stage.

I'm not sure if it's the same as xv, to be honest I haven't worked with it
before. But it looks interesting, thanks for the link!

------
nzjrs
In which situations is parallel execution used?

~~~
amzans
Basically every time we need to iterate over the _Pix []uint8_ containing the
pixel data, as this can be run in parallel (or concurrently if running on a
single logical CPU).

You will usually see it like this:

    
    
      parallel.Line(height, func(start, end int) {
        for y := start; y < end; y++ {
          for x := 0; x < w; x++ {
            pos := y*img.Stride + x*4
            // ...
          }
        }
      })
    

_parallel.Line_ is a function that takes a _length int_ and a _fn func(start,
end int)_ and it splits the provided length into segments, then it dispatches
the provided fn with each segment range to a new goroutine. The number of
segments is defined by the number of available logical CPUs.

So basically each dispatched _fn_ is iterating over its assigned range of the
_Pix []uint8_ and the image is split into chunks by height. Notice that only
the y-axis loop is assigned a partial range, and the x-axis is iterated in the
inner loop. This is because we want to move sequentially through the slice as
much as possible.

During the optimization stage I would like to benchmark against a tiled
segmentation instead of a line one, but the current version only implements
the latter.

