There is definitely room for improvement. Our proposed framework uses Mask R-CNN for object segmentation. However, if I remember correctly, Mask R-CNN only has a resolution of 28x28 pixels for the object masks (which are then resized to the actual resolution of the object in the image). So there are a lot of limitations from that alone.
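To make the limitation concrete, here is a rough sketch of what resizing a 28x28 mask up to an object's size looks like. This is not our framework's code or any library's actual implementation; the helper names (`resize_bilinear`, `paste_mask`), the 0.5 threshold, and the 280x280 object size are all assumptions for illustration.

```python
import numpy as np

MASK_SIZE = 28  # mask resolution mentioned above (from memory, not verified)


def resize_bilinear(mask, out_h, out_w):
    """Bilinearly resize a 2-D array to (out_h, out_w). Hypothetical helper."""
    in_h, in_w = mask.shape
    # Map output pixel centers back to input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = mask[y0][:, x0] * (1 - wx) + mask[y0][:, x1] * wx
    bottom = mask[y1][:, x0] * (1 - wx) + mask[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def paste_mask(soft_mask, box_h, box_w, threshold=0.5):
    """Upsample a low-resolution soft mask to the box size and binarize it."""
    return resize_bilinear(soft_mask, box_h, box_w) >= threshold


# Toy example: a square "object" predicted on the 28x28 grid,
# resized to an assumed 280x280 pixel object in the image.
soft = np.zeros((MASK_SIZE, MASK_SIZE))
soft[7:21, 7:21] = 1.0
full = paste_mask(soft, 280, 280)
pixels_per_cell = 280 / MASK_SIZE  # image pixels covered by one mask cell per axis
```

For a 280x280 object, each mask cell ends up covering about 10x10 image pixels, so any boundary detail finer than that cannot be represented no matter how good the prediction is.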

If you haven't already seen it, https://github.com/dbolya/yolact generates very sharp instance masks very quickly. It may be of interest.

Those masks look great, thank you for the pointer!

Yup! Best of luck, love your research.

