Hacker News
Open-Sourcing BiT: Exploring Large-Scale Pre-Training for Computer Vision (googleblog.com)
83 points by theafh 11 days ago | 6 comments

I've been going through the fast.ai course, and pre-training is like witchcraft; it's so spookily effective.

You can take a 15 MB MobileNet model, add a layer at the end, fine-tune it with half a dozen examples each of a few image classes (in a few minutes on a consumer-grade laptop), and then recognize lots of new examples in real time with a web app reading continuously from a webcam.
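The recipe above can be sketched in a few lines of Keras. This is a minimal illustration, not the fast.ai course's exact code; the class count is a hypothetical placeholder, and `weights=None` is used here only so the sketch runs offline (you'd pass `weights="imagenet"` to actually get the pre-trained filters):

```python
import tensorflow as tf

NUM_CLASSES = 3  # hypothetical: half a dozen examples each of a few classes

# Load MobileNetV2 without its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None
)
base.trainable = False  # freeze the pre-trained encoder

# "Add a layer at the end": pool the features and attach a small
# trainable classifier head; only this head is updated by fine-tuning.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

With the encoder frozen, only the final Dense layer's weights are trained, which is why a handful of examples and a few minutes on a laptop suffice.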

The advances made in Computer Vision in the last ten years are mind blowing.

Interesting experiments. Too bad they're not releasing the JFT pretrained models. I guess the cutoff point of what's too valuable to share has been reached.

Google Brain has been pretty good about releasing models (especially compared to, say, DeepMind), such as EfficientNet.

JFT is the exception. I find JFT interesting, so I pay close attention to anything using it, and as far as I've noticed, no model trained on JFT has ever been released, going back to at least 2015 when the dataset was much smaller. It's always either held back, or the released model is trained on public datasets (e.g. BigGAN: the released generator was trained on ImageNet, though the paper notes that the JFT BigGAN completely avoided divergence problems, which is very interesting). I've wondered if legal/copyright issues block any release: there's always someone who tries to argue that a model is a derived work, and nothing in the JFT-300M papers mentions having licenses that cover public redistribution.

I don't think Google has ever released models trained on JFT. But if you're interested in large-scale vision models, you can check out these models from Facebook trained on 940M Instagram images (several times bigger than JFT!).


No comments, probably because everyone is busy firing up the demo code in Colab and trying to make "whose dick is it?" classifiers...

Well, for my humble needs, the first layers of a pretrained VGG16 were already good enough, so I have little use for yet another, even more resource-hungry visual encoder.
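Truncating a pretrained network like this is straightforward in Keras: you take VGG16 and cut it at an early layer, keeping only the first couple of convolutional blocks as a feature encoder. A minimal sketch (again using `weights=None` so it runs offline; `weights="imagenet"` gives the actual pre-trained filters):

```python
import numpy as np
import tensorflow as tf

# Load VGG16 without its classifier head.
vgg = tf.keras.applications.VGG16(
    input_shape=(224, 224, 3), include_top=False, weights=None
)

# Keep only the first two convolutional blocks as a lightweight encoder.
encoder = tf.keras.Model(
    inputs=vgg.input, outputs=vgg.get_layer("block2_pool").output
)

# A 224x224 RGB image comes out as a 56x56 feature map with 128 channels.
features = encoder(np.zeros((1, 224, 224, 3), dtype="float32"))
```

The early layers capture generic edges and textures, which is often all a small downstream task needs, at a fraction of the full network's compute.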
