
Wrapping C with Python: 3D image segmentation with region growing - chestervonwinch
http://notmatthancock.github.io/2017/10/09/region-growing-wrapping-c.html
======
saltcured
People interested in mixing image processing, Python, and C code for high
performance might also enjoy tinkering with a combination of Numpy and
PyOpenCL. It gives you some powerful mechanisms to manipulate n-dimensional
arrays and then offload some brute-force work to your GPU or multi-core CPU.

OpenCL is comparable to CUDA. It's essentially a C dialect with a lot of
overlap to the OpenGL GLSL (shader language), with intrinsics for certain SIMD
operations.

You write your outer data wrangling code in Python, and put your little OpenCL
kernels into the program as multi-line strings, which get compiled at runtime
into appropriate parallel processing routines which are dispatched by
whichever OpenCL drivers you install. I've used my NVIDIA GPUs and Intel
multi-core/SIMD CPUs to good effect.

This kind of parallel processing leads to turning your mind inside-out a
little and using signal-processing techniques. You want to think in terms of
cooperative algorithms you can perform out of a large number of independent,
localized operations rather than a single point of focus which sequentially
wanders around a buffer.
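
To make the "many independent, localized operations" idea concrete for region growing, here is a minimal plain-C sketch (not OpenCL, and not the article's code). Every voxel decides on its own whether to join the region by looking only at its six neighbours in the previous mask, and the sweep repeats until nothing changes. The names `grow_sweep` and `grow_region` and the `lo`/`hi` intensity window are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Flat index of voxel (x, y, z) in an nx * ny * nz volume. */
static size_t idx(int x, int y, int z, int nx, int ny) {
    return ((size_t)z * ny + y) * nx + x;
}

/* One sweep. Each voxel reads only `in` and writes only its own slot of
 * `out`, so the loop iterations are independent of each other and could be
 * handed to separate work items. (In a truly parallel version `changed`
 * would be computed with a reduction.) Returns true if a voxel was added. */
static bool grow_sweep(const float *img, const unsigned char *in,
                       unsigned char *out, int nx, int ny, int nz,
                       float lo, float hi) {
    static const int d[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    bool changed = false;
    for (int z = 0; z < nz; z++)
        for (int y = 0; y < ny; y++)
            for (int x = 0; x < nx; x++) {
                size_t i = idx(x, y, z, nx, ny);
                out[i] = in[i];
                /* Already in the region, or outside the intensity window. */
                if (in[i] || img[i] < lo || img[i] > hi)
                    continue;
                for (int k = 0; k < 6; k++) {
                    int a = x + d[k][0], b = y + d[k][1], c = z + d[k][2];
                    if (a < 0 || a >= nx || b < 0 || b >= ny || c < 0 || c >= nz)
                        continue;
                    if (in[idx(a, b, c, nx, ny)]) {
                        out[i] = 1;
                        changed = true;
                        break;
                    }
                }
            }
    return changed;
}

/* Sweep until a fixed point is reached. `mask` holds the seed voxels on
 * entry and the grown region on return. Returns the number of sweeps that
 * added voxels, or -1 if the scratch buffer could not be allocated. */
static int grow_region(const float *img, unsigned char *mask,
                       int nx, int ny, int nz, float lo, float hi) {
    size_t n = (size_t)nx * ny * nz;
    unsigned char *next = malloc(n);
    if (next == NULL)
        return -1;
    int sweeps = 0;
    while (grow_sweep(img, mask, next, nx, ny, nz, lo, hi)) {
        memcpy(mask, next, n);
        sweeps++;
    }
    free(next);
    return sweeps;
}
```

The trade-off is that each sweep touches the whole volume, whereas a stack-based version only visits voxels near the region. The sweep form pays off only when the per-voxel work really does run in parallel.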

~~~
make3
you can probably use Tensorflow without the differentiation stuff

~~~
Hydraulix989
I miss Theano

------
gravypod
Why was the stack implemented as a linked list? Could this be turned into a
block-allocated array to improve cache locality (and get rid of an entire
int)? 64 % 8 = 0, so you won't have any alignment issues. You'd also avoid
calling free() on every loop iteration.

On a more general note, if you're using the stack to queue up subsequent
computation, why not just opt for tail recursion, which will be optimized out?
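
For readers unfamiliar with the term, here is a minimal sketch of what "tail recursion" means, using a hypothetical linked-list node type (not the article's). A call is in tail position when it is the last thing the function does and its result is returned unchanged, which lets a compiler reuse the current stack frame. C compilers commonly do this at higher optimization levels, but the C standard does not guarantee it.

```c
#include <stddef.h>

/* Hypothetical linked-list stack node, for illustration only. */
typedef struct node { int x, y, z; struct node *next; } node;

/* Tail-recursive: the recursive call is the final action, and the running
 * count is carried in the accumulator `acc`. */
static size_t length_rec(const node *n, size_t acc) {
    if (n == NULL)
        return acc;
    return length_rec(n->next, acc + 1);
}

/* The loop an optimizing compiler may turn the function above into. */
static size_t length_loop(const node *n) {
    size_t acc = 0;
    while (n != NULL) {
        acc++;
        n = n->next;
    }
    return acc;
}
```

Note that a flood fill which recurses into six neighbours has only its last call in tail position, so the other five calls still consume stack.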

Also why are you using f2py rather than just writing a C module? [0]

[0] - https://csl.name/post/c-functions-python/

~~~
chestervonwinch
By block allocated array you mean something like a hybrid between a fixed-size
array and a linked list stack? I'm not very familiar with tail recursion, so
I'll take that as a suggestion to read more on it. The answer to your last
question is that f2py was easy enough to use and something I'm already
familiar with :)

~~~
gravypod
By block allocated array I mean:

    
    
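        #include <stdbool.h>
        #include <stdlib.h>
    
        typedef struct { int x, y, z; } vec3;
        /* size = capacity, i = number of items currently stored */
        typedef struct { int size, i; vec3 items[]; } stack;
    
        /* Allocate the header and all `size` slots in one block. */
        static inline stack *stack_make(int size) {
        	stack *s = malloc(sizeof(stack) + size * sizeof(vec3));
        	if (s == NULL)
        		return NULL;
        	s->size = size;
        	s->i = 0;
        	return s;
        }
    
        static inline bool stack_push(stack *s, vec3 *v) {
        	if (s->size <= s->i)
        		return false; /* stack is full */
        	s->items[s->i++] = *v;
        	return true;
        }
    
        /* Returns a pointer into the stack's own storage, or NULL if empty. */
        static inline vec3 *stack_pop(stack *s) {
        	return s->i == 0 ? NULL : &s->items[--s->i];
        }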
    

If you want to dynamically grow your stack, you can use realloc to have the
allocation grow/shrink, but I would recommend against that. Also note that
this implementation is not ideal: stack_pop hands you a reference to the
stack's internal data, so if you push something new onto the stack, the value
your pointer refers to will change. stack_pop should be changed to
stack_pop(s, &container_vector) and populate that for consistency.
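
A minimal sketch of that suggested change, assuming the same `vec3` and `stack` layout as in the comment above: `stack_pop` copies the top item into a caller-supplied `vec3` and reports success with a `bool`, so a later push cannot alter what the caller already popped. The NULL and size checks in `stack_make` are additions for the sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct { int x, y, z; } vec3;
/* size = capacity, i = number of items currently stored */
typedef struct { int size, i; vec3 items[]; } stack;

static stack *stack_make(int size) {
    if (size < 0)
        return NULL;
    stack *s = malloc(sizeof(stack) + (size_t)size * sizeof(vec3));
    if (s == NULL)
        return NULL;
    s->size = size;
    s->i = 0;
    return s;
}

static bool stack_push(stack *s, const vec3 *v) {
    if (s->i >= s->size)
        return false; /* full */
    s->items[s->i++] = *v;
    return true;
}

/* Copy the top item into *out; the caller owns the copy. */
static bool stack_pop(stack *s, vec3 *out) {
    if (s->i == 0)
        return false; /* empty */
    *out = s->items[--s->i];
    return true;
}
```

A typical loop would then read `while (stack_pop(s, &v)) { ... }`, pushing neighbours inside the body without invalidating `v`. The stack itself is released with a single `free(s)`.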

~~~
chestervonwinch
The reason I didn't implement the stack with an array is that a fixed-size
array has to be allocated for the worst case, which is the number of voxels in
the image volume. This is potentially much, much larger than the size the
stack will actually grow to.

In any case, I implemented the array stack along the lines of your post (with
some modifications), and it yields a minor improvement (about 0.014 seconds
faster on average).
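
One way to avoid sizing the array for the worst-case voxel count is the realloc growth mentioned earlier in the thread. Here is a minimal sketch, assuming a doubling policy and a separately allocated item buffer; the `gstack` type and its function names are hypothetical and are not the modified version described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int x, y, z; } vec3;
/* cap = allocated slots, len = slots in use */
typedef struct { size_t cap, len; vec3 *items; } gstack;

static bool gstack_init(gstack *s, size_t cap) {
    if (cap == 0)
        cap = 1;
    s->items = malloc(cap * sizeof(vec3));
    s->cap = s->items ? cap : 0;
    s->len = 0;
    return s->items != NULL;
}

/* Push a copy of v, doubling the buffer when it is full.
 * (Overflow of the capacity computation is ignored in this sketch.) */
static bool gstack_push(gstack *s, vec3 v) {
    if (s->len == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 1;
        vec3 *p = realloc(s->items, ncap * sizeof(vec3));
        if (p == NULL)
            return false; /* the old buffer is still valid */
        s->items = p;
        s->cap = ncap;
    }
    s->items[s->len++] = v;
    return true;
}

/* Copy the top item into *out. */
static bool gstack_pop(gstack *s, vec3 *out) {
    if (s->len == 0)
        return false;
    *out = s->items[--s->len];
    return true;
}

static void gstack_free(gstack *s) {
    free(s->items);
    s->items = NULL;
    s->cap = s->len = 0;
}
```

With doubling, the number of realloc calls grows only logarithmically with the peak stack depth, so memory tracks what the region actually needs rather than the volume size. Whether that beats a single worst-case allocation would have to be measured.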

