
Facebook open-sourced object detection from images technology, SharpMask - tachyons
https://code.facebook.com/posts/561187904071636/segmenting-and-refining-images-with-sharpmask/?refid=52&_ft_=qid.6322801441051145749%3Amf_story_key.1023434960795274919%3AeligibleForSeeFirstBumping.1
======
epberry
Wow, this field is moving fast. Just yesterday I was commenting on R-CNN, which
apparently is old news now! It does appear that the same basic architecture is
still used, though. If you're looking for a nice history of object detection,
leading up to a survey of current methods and a discussion of R-CNN, check out
this talk by Larry Zitnick, one of the key people behind the OP:
[https://youtu.be/UXHWNNzdPVM](https://youtu.be/UXHWNNzdPVM)

At the end of this post they talk about extending this to video. This, in my
opinion, is a much harder problem. Convnets are standard for images, but no one
has found a really good architecture for video. Some key questions I have
about video:

1) How do humans perceive moving things? My guess is that there are major
differences, right down to the visual cortex, that would warrant brand-new
architectures.

2) Could we operate neural nets directly on encoded data, such as H.264? Not
only would this be computationally much more efficient than decoding video into
frames and processing each one, but some codecs also give you motion vectors
and other useful temporal data for free.

3) How do we handle temporal information? LSTMs work well for sequential data,
and there's been some work on using them for video, but I'm not aware of much
success in using sequential networks to detect things like plot points in
movies.
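For readers unfamiliar with the motion vectors mentioned above: in codecs like H.264, inter-coded blocks carry a displacement pointing at a matching region of a reference frame, which the encoder finds through some form of motion estimation (the search strategy is left to the encoder, and H.264 vectors have quarter-pixel precision). Below is a minimal Python sketch of the simplest such search, exhaustive integer-pixel block matching by sum of absolute differences (SAD). It assumes tiny grayscale frames of equal shape stored as lists of lists; the names `sad` and `motion_vectors` are hypothetical, and this is not how any particular encoder works. A network consuming the bitstream directly would read vectors like these instead of recomputing them.

```python
from typing import List, Tuple

Frame = List[List[int]]   # grayscale frame: rows of pixel intensities
Vector = Tuple[int, int]  # (dy, dx) offset into the previous frame

# Hypothetical illustration: integer-pixel exhaustive search, not a real codec's encoder.


def sad(prev: Frame, cur: Frame, by: int, bx: int, dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between the size x size block of `cur`
    at (by, bx) and the block of `prev` displaced by (dy, dx)."""
    return sum(
        abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def motion_vectors(prev: Frame, cur: Frame, size: int = 2, radius: int = 1) -> List[List[Vector]]:
    """For each block of `cur`, return the integer offset within +/- radius
    whose block in `prev` has the lowest SAD (assumes equal frame shapes)."""
    h, w = len(cur), len(cur[0])
    field: List[List[Vector]] = []
    for by in range(0, h - size + 1, size):
        row: List[Vector] = []
        for bx in range(0, w - size + 1, size):
            # Start from the zero vector so ties favour "no motion".
            best: Vector = (0, 0)
            best_cost = sad(prev, cur, by, bx, 0, 0, size)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y0, x0 = by + dy, bx + dx
                    # Skip candidates that fall outside the previous frame.
                    if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                        continue
                    cost = sad(prev, cur, by, bx, dy, dx, size)
                    if cost < best_cost:
                        best, best_cost = (dy, dx), cost
            row.append(best)
        field.append(row)
    return field


if __name__ == "__main__":
    # A 2-pixel-wide bright bar that moves one pixel to the right between frames.
    prev = [[0, 0, 9, 9, 0, 0], [0, 0, 9, 9, 0, 0]]
    cur = [[0, 0, 0, 9, 9, 0], [0, 0, 0, 9, 9, 0]]
    print(motion_vectors(prev, cur))  # [[(0, 0), (0, -1), (0, -1)]]
```

Each vector points from a block in the current frame back to where its content came from, so content that moved right shows up as `(0, -1)`, while static background stays at `(0, 0)`.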

