
TensorFlow Mask R-CNN code for pixelwise object detection and segmentation - llebttam
https://github.com/matterport/Mask_RCNN
======
jszymborski
God bless people who implement models from academic articles, which frankly
should include an implementation to begin with. What's more, this implementation
has clear instructions for extending it to your own datasets.

~~~
j_s
[edit] > people who implement models from academic articles

Is anyone collecting the various HN discussions as these pop up? I would
appreciate help finding them again.

~~~
iandanforth
If you're looking for a way to find the implementations again I recommend
[http://www.gitxiv.com/](http://www.gitxiv.com/)

~~~
singularity2001
this should be the standard way for publishing

------
robbrown451
This is cool.

I wonder with stuff like this, what happens if a self driving car is capable
of processing reflections in glass windows? What if it sees a reflection of
itself and is able to properly identify it as being itself? Does that make it
self aware?

I'm being serious. People like to throw around terms like "self aware" with
some assumption that it is a long way off, or impossible, to have a machine be
self aware. But that would meet my definition. People will say "yeah but
that's not what I mean." And I want to know what you actually mean, then.

~~~
halflings
Many robots already have cameras pointed at themselves. What does that change
really?

~~~
robbrown451
Honestly nothing, to me. But other people think that ability to recognize
oneself in a mirror is important.

To me the concept of self-awareness and consciousness is pretty much
meaningless, especially if you are considering it something that machines
don't have or can't have (or if they eventually do have it, we'll know).

The reason I mention it with this, and with self driving cars (which this
particular system may not be fast or reliable for yet), is that for those
people, it may register better because it seems more analogous to a human.
With those robots you speak of, do they also recognize other robots? Do they
have some sort of logic that knows that they are like those other robots in
many ways, but in significant ways they are different (i.e. they have control
over their own behavior but not over the others')?

The point is not that something particularly amazing is happening; the point
is that it is getting easier to illustrate with real-world examples that "self
awareness" is not some magical thing whose workings we have no idea about.

------
AndrewKemendo
I really want to try Mask R-CNN with Open Images instead of COCO - worried about
training time though. I tried to train Mask R-CNN on a K80 and it failed after a
week.

------
Cacti
Thank you! This was on my list of things to implement but I hadn't had time
yet, nor was I really looking forward to the custom op implementation. :)

Great job.

------
aabajian
Wow, this is fascinating. From a radiology perspective, it could be the
missing method for segmenting findings inside a convoluted radiograph.

~~~
AndrewKemendo
Yes, and that's possible now with many different CNNs. The limiting factor is
the training/validation/test data in your subject.

For example, you couldn't take Mask R-CNN trained on the COCO dataset, as
implemented here, and get useful inference on your radiology problem set.

~~~
adyus
Would transfer learning help with that?

~~~
AndrewKemendo
Yea I mean transfer learning would bring over the first n convolutions so it
would be faster than from scratch, but you still need the radiology data to
get the last few steps.
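
A framework-free sketch of the "bring over the first n layers" idea, in case it helps. The `Layer` class, the layer names, and `freeze_first` are all hypothetical stand-ins; the `trainable` flag only mimics the per-layer attribute Keras exposes, and nothing here is taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    # Hypothetical stand-in for a framework layer object.
    name: str
    trainable: bool = True


def freeze_first(layers, n):
    """Freeze the first n layers in place and return the names still trainable."""
    for index, layer in enumerate(layers):
        # Pretrained early layers keep their weights; later layers get retrained.
        layer.trainable = index >= n
    return [layer.name for layer in layers if layer.trainable]


# Illustrative layer names only.
layers = [Layer("conv1"), Layer("conv2"), Layer("conv3"),
          Layer("class_head"), Layer("mask_head")]
to_train = freeze_first(layers, 3)
print(to_train)
```

Only the layers left trainable would then be fitted on the radiology data, which is why that data is still needed.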

------
bitL
Very cool! How is the performance? R-CNN used to be much slower compared to
YOLO or SSD; FCN seems to be very fast as well, though it requires a lot of GPU
memory. Can your version be used for realtime semantic segmentation?

~~~
waleedka
This architecture is optimized for accuracy rather than speed. The official
paper reports 200ms inference time per image on a GPU. This implementation
is likely a bit slower because we use Python in a couple of layers. This is
easy to optimize, but we haven't gotten around to it yet.

With that said, there are a lot of things you could do to make this much
faster. For example, use ResNet50 instead of ResNet101. You can also reduce
the number of anchors or the number of proposals to classify, and that should
improve performance significantly at the expense of a little loss in accuracy.
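
To give a rough feel for the anchor-count point, here is a back-of-the-envelope sketch. The function name, parameter names, image size, pyramid strides, and ratio count are all assumptions picked for illustration, not values read from the repository's config.

```python
import math


def count_anchors(image_size, strides, ratios_per_cell, anchor_stride=1):
    """Count anchors over a feature pyramid for a square image.

    All names and defaults here are illustrative assumptions.
    """
    total = 0
    for stride in strides:
        # Side length of the feature map at this pyramid level.
        side = math.ceil(image_size / stride)
        # Anchors are placed every `anchor_stride` cells along each axis.
        positions = math.ceil(side / anchor_stride) ** 2
        total += positions * ratios_per_cell
    return total


dense = count_anchors(1024, (4, 8, 16, 32, 64), 3)
sparse = count_anchors(1024, (4, 8, 16, 32, 64), 3, anchor_stride=2)
print(dense, sparse)
```

Under these assumptions, placing anchors on every second cell cuts the count by a factor of four, so there are fewer candidates to score per image.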

------
state_less
Can someone add depth to the training set? I'd like depth estimates for
objects in the frame too. It could be interesting to fly into a video.

Does the iPhone 8 have RGB-D now for short range? Maybe someday we could get
pixel by pixel depth estimates?

~~~
AndrewKemendo
This exists in several different implementations and would be a separate DNN.

[http://cs231n.stanford.edu/reports/2017/pdfs/203.pdf](http://cs231n.stanford.edu/reports/2017/pdfs/203.pdf)

------
amelius
It seems that to train this, you need to input precise masks.

I'm wondering if one day it would be possible to train a network without masks
(just a classifier), and it will figure out the masks by itself.

------
naveen99
Surprised I still haven't seen a PyTorch translation of DeepMask / SharpMask.
But glad to see at least a TensorFlow implementation. Will definitely try it
out.

~~~
technics256
Been out for a couple months:

[https://github.com/felixgwu/mask_rcnn_pytorch](https://github.com/felixgwu/mask_rcnn_pytorch)

~~~
naveen99
> Unfortunately, we could not fit the model into the GPU we have and there is
> some ambiguity in the paper as well, so we decided to stop the project and
> wait until the official code being released.

------
bradneuberg
This looks great! Thanks for releasing this.

------
m3kw9
Anyone tried the inference speed?

------
amelius
Just as I moved to PyTorch, they implement this for Keras and TensorFlow :(

------
curiousgal
> The repository includes: Pre-trained weights for MS COCO

Can't seem to find them anywhere.

~~~
waleedka
They're in the "Releases" section. It's a 250MB .h5 Keras file.

------
amelius
How do they measure accuracy?

------
DFASDF
no performance evaluation on MSCOCO

~~~
waleedka
Evaluation code against MS COCO is included in the repository, both for
bounding boxes and segmentation masks so it should be easy to run (but takes a
long time).

We should publish more details, though. Thanks for bringing it up. Our
implementation deviates a bit from the paper (as mentioned in the
documentation), and optimizing for COCO was a 'nice to have' rather than being
the main objective. We got pretty close to the reported numbers (within 3 to 4
percentage points) but that was with half the training steps compared to the
paper. We'll try to add more details over the next few days.
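
For anyone wondering what is being measured: COCO-style scores for both boxes and masks are built on intersection over union (IoU) between a prediction and the ground truth, with average precision then averaged over IoU thresholds from 0.50 to 0.95. Below is a minimal sketch of the mask IoU step only. `mask_iou` is a hypothetical helper written for this comment, not the repository's evaluation code, and it uses plain nested lists rather than arrays.

```python
def mask_iou(pred, truth):
    """IoU of two binary masks given as equally sized lists of 0/1 rows."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks count as a perfect match (a convention chosen here).
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
iou = mask_iou(pred, truth)
# A prediction counts as a match at a given threshold only if IoU reaches it.
matched_at_half = iou >= 0.5
print(iou, matched_at_half)
```

Here the two masks overlap on 2 pixels out of 6 covered in total, so the prediction would not count as a match at the 0.50 threshold.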

