Why? If you treat the depth channel as an additional color channel, you might even be able to use the libraries you mentioned without modification (unless they have hard-coded assumptions about the number of color channels). Depth isn't really special: all the same data augmentation ideas still apply. You just have to transform it along with the rest of the data, so if you e.g. mirror the image, the depth gets mirrored as well.
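To make that concrete, here's a minimal numpy sketch (the shapes and data are made up) showing that once depth is stacked as a fourth channel, a geometric augmentation like mirroring applies to it for free:

```python
import numpy as np

# Hypothetical 4-channel RGB-D sample of shape (H, W, 4),
# where channel 3 is depth treated as "just another color".
rgbd = np.random.rand(64, 64, 4).astype(np.float32)

# Mirror the image horizontally: flipping the whole stack
# keeps the depth channel aligned with the color channels.
mirrored = rgbd[:, ::-1, :]
```

Any per-pixel geometric transform (crops, rotations, flips) works the same way, as long as you never transform RGB and depth separately.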
- Generating synthetic data is powerful if you have the depth modality, since depth is easy to render, and the real/synthetic domain gap is narrower than for RGB. I consider it a form of data augmentation: you usually do many renders from a single 3D model.
- If you can somehow normalize the offset (e.g. by computing normals), that can help. In my case I set the center of the object to 0 depth, which greatly helped the network converge.
- Classic augmentations like gaussian noise, gaussian blur, and downsampling the depth also help (apply these randomly).
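The zero-centering trick could be sketched like this; here the median of valid depths stands in for the object center (in practice you'd take the center from your object annotation, and 0 marking invalid pixels is an assumption):

```python
import numpy as np

# Hypothetical depth map in meters; 0 marks missing sensor readings.
depth = np.random.uniform(0.5, 2.0, (64, 64)).astype(np.float32)
depth[:5, :5] = 0.0  # simulate invalid pixels

valid = depth > 0
# Re-express depth relative to the object center so the network
# sees values around zero regardless of camera distance.
center = np.median(depth[valid])
normalized = np.where(valid, depth - center, 0.0)
```

This removes the arbitrary absolute offset while preserving the relative shape information the network actually needs.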
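A minimal sketch of applying those corruptions randomly, using scipy for the blur and nearest-neighbor resampling for the downsample (the probabilities, noise level, and sigma here are illustrative, not tuned values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def augment_depth(d, rng):
    """Randomly corrupt a depth map to mimic real sensor artifacts."""
    if rng.random() < 0.5:
        d = d + rng.normal(0.0, 0.01, d.shape)  # gaussian sensor noise
    if rng.random() < 0.5:
        d = gaussian_filter(d, sigma=1.0)       # gaussian blur
    if rng.random() < 0.5:
        # Downsample then upsample back to simulate a lower-res sensor.
        d = zoom(zoom(d, 0.5, order=0), 2.0, order=0)
    return d.astype(np.float32)

rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 2.0, (64, 64)).astype(np.float32)
out = augment_depth(depth, rng)
```

Keeping each corruption behind its own random flag means the network also regularly sees clean depth, which tends to stabilize training.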
As for tooling, I just use numpy/pytorch for most operations and OpenGL for renders.