
Ask HN: Data augmentation tools for 3d object detection - iluvdata
What are good data augmentation techniques that work with 3D data, specifically with depth as the third dimension? Also, any pointers to tools around them would be of great help.
======
based2
[https://medium.com/ymedialabs-innovation/data-augmentation-t...](https://medium.com/ymedialabs-innovation/data-augmentation-techniques-in-cnn-using-tensorflow-371ae43d5be9)

[https://www.quora.com/What-are-good-data-augmentation-techni...](https://www.quora.com/What-are-good-data-augmentation-techniques-for-a-small-image-data-set)

[https://www.researchgate.net/post/Is_there_any_data_augmenta...](https://www.researchgate.net/post/Is_there_any_data_augmentation_technique_for_text_data_set)

[https://forums.fast.ai/t/data-augmentation-for-nlp/229/15](https://forums.fast.ai/t/data-augmentation-for-nlp/229/15)

~~~
iluvdata
Thanks for sharing these links. I found
[https://github.com/aleju/imgaug](https://github.com/aleju/imgaug) and
[https://github.com/mdbloice/Augmentor](https://github.com/mdbloice/Augmentor)
to be good when you have 2D data. My challenge is that I now have x, y, and depth (z);
all image transformations in colour space can't be applied to depth space,
so which ones would work best? Any pointers on tooling around them would also be helpful.

~~~
yorwba
> all image transformations in colour space can't be applied to depth space

Why? If you treat the depth channel as an additional color channel, you might
even be able to use the libraries you mentioned without modification (unless
they have hard-coded assumptions about the number of color channels). Depth
isn't really special; all the same data augmentation ideas still apply. You
just have to transform it with the rest of the data, so if you e.g. mirror the
image, the depth gets mirrored as well.
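A minimal numpy sketch of that idea (the array shape, function name, and noise scale are illustrative assumptions, not from the comment): stack depth as a fourth channel, apply geometric transforms to all channels together, and restrict photometric transforms to the colour channels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical RGB-D image: H x W x 4, where channel 3 is depth.
rgbd = rng.random((480, 640, 4)).astype(np.float32)

def augment(rgbd, rng):
    """Geometric transforms hit colour and depth together;
    photometric transforms touch only the colour channels."""
    out = rgbd
    if rng.random() < 0.5:
        # Horizontal mirror: flipping the array mirrors depth too.
        out = out[:, ::-1, :]
    if rng.random() < 0.5:
        # Random crop: slicing keeps all four channels aligned.
        h, w = out.shape[:2]
        top = rng.integers(0, h // 10)
        left = rng.integers(0, w // 10)
        out = out[top:top + 9 * h // 10, left:left + 9 * w // 10, :]
    # Photometric noise on RGB only; depth values are left untouched.
    noisy = out.copy()
    noisy[..., :3] += rng.normal(0.0, 0.01, noisy[..., :3].shape).astype(np.float32)
    return noisy

aug = augment(rgbd, rng)
```

In practice imgaug exposes the same idea through its heatmap augmentables, which are designed for per-pixel float maps like depth.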

~~~
iluvdata
But when the RGB image scales, tilts, or shears, how do you mathematically
move the depth accordingly?

------
mathgaron
Are you talking about a depth channel on an image plane? It also largely
depends on the problem, but here are a few tricks that helped me with my
problems (object tracking).

\- Generating synthetic data is powerful if you have the depth modality, as it
is easy to render, and the real/synthetic domain gap is narrow compared to
RGB. I consider it data augmentation: you usually do many renders from a
single 3D model.

\- If you can somehow normalize the offset (e.g. compute normals), that can
help. In my case I could offset the center of the object to 0 depth, and it
greatly helped the network converge.

\- Classic augmentations like Gaussian noise, Gaussian blur, and also
downsampling the depth help (apply these randomly).

As for tooling, I just use numpy/pytorch for most operations and OpenGL for
renders.
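The noise, blur, downsampling, and center-offset tricks above can be sketched in plain numpy. This is a hedged illustration; the function name, probabilities, and noise/blur parameters are my assumptions, not values from the comment.

```python
import numpy as np

def augment_depth(depth, rng, center_depth=None):
    """Randomly perturb a single-channel depth map (H x W)."""
    d = depth.astype(np.float32).copy()
    h, w = d.shape
    if center_depth is not None:
        # Normalization trick: shift so the object center sits at 0 depth.
        d -= center_depth
    if rng.random() < 0.5:
        # Additive Gaussian noise (sigma here is an illustrative guess).
        d += rng.normal(0.0, 0.005, d.shape).astype(np.float32)
    if rng.random() < 0.5:
        # Cheap blur via neighbour averaging (stand-in for a Gaussian blur).
        d = (d + np.roll(d, 1, 0) + np.roll(d, -1, 0)
               + np.roll(d, 1, 1) + np.roll(d, -1, 1)) / 5
    if rng.random() < 0.5:
        # Simulate a lower-resolution sensor: downsample, then upsample back.
        factor = int(rng.choice([2, 4]))
        small = d[::factor, ::factor]
        d = np.repeat(np.repeat(small, factor, axis=0),
                      factor, axis=1)[:h, :w]
    return d

rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 2.0, (120, 160)).astype(np.float32)
aug = augment_depth(depth, rng, center_depth=1.2)
```

Applying each perturbation with an independent probability, as above, is one common way to get varied combinations from a single depth map.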

------
lovelearning
Regarding tools, OpenCV has a PLY-to-2D-images renderer [1].

[1]:
[https://github.com/opencv/opencv_contrib/blob/master/modules...](https://github.com/opencv/opencv_contrib/blob/master/modules/cnn_3dobj/samples/sphereview_data.cpp#L83)

