Deep learning powers a motion-tracking revolution (nature.com)
82 points by digital55 15 days ago | 15 comments

Interesting to see deep learning techniques entering the realms of non-CS academia. I recently heard of another example, from the Planet Money podcast[1], in which deep learning was used to perform an elephant census.

One of my favorite things about ML is using it to reduce human labor hours on tasks like these.


I wonder what the next iteration of new CS tech will be once we apply ML to every field where it's easily applied.

You can apply it to arbitrary combinations of fields where ML is easily applied.

Image segmentation and 3D pose estimation are not new. I'd like to see how the new deep learning approaches compare to the existing state of the art.

Deep learning approaches have been the state-of-the-art for these tasks for several years now. Other techniques just cannot compete.

That's only partially true. Deep learning approaches for segmentation are state of the art where you have large labeled sets with high variability in the targets, but that hardly covers everything. The techniques are also being applied where this isn't the case, of course, but with mixed results at best. That shouldn't be surprising: transfer methods are no panacea.

Can you give an example where a non-deep learning segmentation method outperforms deep learning algorithms? Do you mean static camera setups where traditional background subtraction works well enough? I would assume DL approaches match or outperform those even there.
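For reference, the background-subtraction baseline being discussed can be sketched with a simple running-average model. This is a hedged toy illustration (made-up frames and threshold), not any particular production pipeline:

```python
import numpy as np

def segment_foreground(frames, alpha=0.05, threshold=30):
    """Running-average background subtraction.

    frames: iterable of grayscale frames (2-D uint8 arrays).
    Returns a list of boolean foreground masks.
    """
    background = None
    masks = []
    for frame in frames:
        f = frame.astype(np.float32)
        if background is None:
            background = f.copy()
        # Pixels far from the background model count as foreground.
        mask = np.abs(f - background) > threshold
        masks.append(mask)
        # Slowly adapt the background toward the current frame.
        background = (1 - alpha) * background + alpha * f
    return masks

# Toy static scene, with a bright 2x2 "object" entering in the third frame.
scene = np.zeros((4, 4), dtype=np.uint8)
moving = scene.copy()
moving[1:3, 1:3] = 200
masks = segment_foreground([scene, scene, moving])
print(masks[2].sum())  # -> 4 foreground pixels
```

The appeal with a truly static camera is that this needs no labels at all, which is exactly the regime where the DL-vs-classic comparison gets interesting.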

You can't just look at the segmentation task; you also have to consider the availability of good labeling and sufficient training data. Given enough data of sufficient quality capturing the variability of your domain, and good labeling on top of it, you can probably make a deep model perform quite well. Lacking these inputs, you will often do better with other techniques (there are many).

Furthermore, if you have a decent physical model and/or some constraints (e.g. lots of industrial QA, cell counting, etc.) with a fixed FOV, classic approaches can do quite well and be quite robust. Some of the deep models you see performing well on e.g. small curated sets for conferences just don't generalize well at all, which isn't surprising given the setup.
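A miniature version of the classic pipeline described above (threshold, then connected-component counting, as in simple cell counting under a fixed FOV). The image and threshold are invented for illustration:

```python
from collections import deque

def count_blobs(image, threshold):
    """Count 4-connected components of pixels above threshold."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for y in range(h):
        for x in range(w):
            if image[y][x] > threshold and not seen[y][x]:
                blobs += 1
                # Flood-fill this component with BFS so it is counted once.
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and \
                           image[ny][nx] > threshold and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs

# Two bright "cells" on a dark background.
img = [
    [0, 200, 0,   0],
    [0, 200, 0, 180],
    [0,   0, 0, 180],
]
print(count_blobs(img, threshold=100))  # -> 2
```

No training data is needed, and with controlled lighting and optics the failure modes are easy to reason about, which is why these methods remain robust in constrained setups.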

They are not fast enough for real-time 3D pose estimation (60 fps).

That might have been true yesterday. But today, https://developer.nvidia.com/rtx-broadcast-engine

There are DL methods that well exceed 60 fps for single-instance estimation (on high-end GPUs, especially on Turing or Volta cards with FP16).

I would also be interested in any alternative approach that even comes within the ballpark of DL performance in pose estimation.

The Kinect could do pose estimation on 48 joints at 30 FPS on the Xbox 360's CPU.

This one doesn't need a huge separate dataset for each animal (~200 labeled frames); the user just has to define the parts of the animal they wish to track. Most 3D pose estimators require a huge 3D pose dataset for training.

This is because of transfer learning: for example, by default, DeepLabCut starts with a pre-trained ResNet-50. Thus these trackers start out with an excellent "understanding" of the statistics of natural scenes out of the box.
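The transfer-learning idea can be sketched in miniature: a frozen feature extractor (standing in for the pre-trained ResNet-50; here just a fixed random projection with a ReLU) plus a small head fit on a handful of labels. Everything below is an illustrative toy with invented shapes, not DeepLabCut's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" backbone: a frozen nonlinear feature map.
W_backbone = rng.normal(size=(8, 32))

def features(x):
    # Frozen during fine-tuning, like a pre-trained trunk.
    return np.maximum(x @ W_backbone, 0.0)

# A tiny labeled set (stand-in for the ~200 annotated frames).
X = rng.normal(size=(20, 8))
y = rng.normal(size=(20, 2))   # e.g. one 2-D keypoint per frame

# Only the small head is fit, here by least squares on the frozen features.
F = features(X)
W_head, *_ = np.linalg.lstsq(F, y, rcond=None)

pred = features(X) @ W_head
print(pred.shape)  # (20, 2)
```

Because only the head is estimated, a few hundred labels can suffice; the backbone already encodes generic visual statistics learned elsewhere.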
