
Show HN: SOTA semantic segmentation with MobileNetV3, in 3 lines of PyTorch code - ekzhang
https://github.com/ekzhang/fastseg
======
p1esk
Can you please explain what you did there, and how it compares with the
official state of the art?

~~~
ekzhang
Yeah, so this work was done at Nvidia as part of a larger project that
required semantic segmentation. Although MobileNetV3 has state-of-the-art
performance on many image tasks like classification, detection, and
segmentation, _there are no public implementations of MobileNetV3 for
semantic segmentation with good accuracy._

Looking through implementations on GitHub, we saw accuracies of 40-50% mIoU,
which is frankly unacceptable given that the paper claims 72.6% mIoU. So over
the past few months at Nvidia, I worked with some researchers from ADLR
([https://nv-adlr.github.io/](https://nv-adlr.github.io/)) to implement
MobileNetV3 in PyTorch. After a bunch of hyperparameter tuning, we managed to
train it to within 0.3% of the accuracy reported in the paper:
[https://arxiv.org/abs/1905.02244v5](https://arxiv.org/abs/1905.02244v5). See
the "Metrics" section of the GitHub README for more detailed information.

Also, unlike other code releases, this repository is meant to be _easy to
use_: it works out of the box (just install with pip) and is extremely fast.
My goal in open sourcing these models was to make it easier for others to do
the same kind of work.
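
Roughly, the "3 lines" from the title look like this (a sketch based on the
README's example; from_pretrained and predict_one are the pretrained-model
loader and inference helper as documented there):

    from PIL import Image
    from fastseg import MobileV3Large

    # load pretrained Cityscapes weights and segment one image
    model = MobileV3Large.from_pretrained().cuda().eval()
    labels = model.predict_one(Image.open('street.png'))  # per-pixel class labels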

I'm looking forward to seeing what people do with these models. :)

~~~
p1esk
Awesome, thanks! It would be really valuable if you described what tricks
you used to get to that accuracy: which hyperparameters turned out to be
important, what was missing in the paper, and how you did the hyperparameter
tuning. Not here, but on the GitHub page. Your advice will probably outlast
your results :)

