
Training ResNet-50 on ImageNet in 35 Epochs Using Second-Order Optimization - pplonski86
https://arxiv.org/abs/1811.12019
======
p1esk
I think the main question is - do second-order methods enable faster training
on a single GPU? Not in terms of epochs, obviously, but wall time.
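
For context on why wall time is the open question: a second-order step buys
more progress per step but costs a lot more per step. A toy NumPy sketch of
the two update rules on a logistic regression (plain Newton standing in for
the paper's Kronecker-factored approximation; everything here is
illustrative):

    import numpy as np

    # Toy logistic regression: one first-order step vs. one Newton step.
    # Plain Newton stands in for the K-FAC-style approximation in the
    # paper; the point is the extra per-step cost of curvature.
    rng = np.random.default_rng(0)
    n, d = 256, 10
    X = rng.normal(size=(n, d))
    y = (X @ rng.normal(size=d) > 0).astype(float)

    p = 1.0 / (1.0 + np.exp(-X @ np.zeros(d)))  # predictions at w = 0
    g = X.T @ (p - y) / n                       # gradient: O(n*d)
    H = (X.T * (p * (1 - p))) @ X / n           # curvature: O(n*d^2)

    step_first = -0.1 * g                                     # SGD-style step
    step_second = -np.linalg.solve(H + 1e-3 * np.eye(d), g)   # Newton: O(d^3)

At ResNet-50 scale (~25M parameters) the exact solve is hopeless, so it comes
down to whether the factored approximation is cheap enough per step for the
epoch savings to survive on one GPU rather than a big cluster.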

------
IshKebab
Is it possible to over-fit optimisation methods to a benchmark? I mean, people
concentrate so much on coming up with fancy ways of training ResNet really
quickly or with huge batch sizes or whatever. Maybe the methods themselves
don't generalise to other networks.

Maybe they do though. Just a thought.

~~~
asparagui
Yes, over-fitting is definitely something that happens. Here's a paper that
demonstrates the problem by testing CIFAR classifiers on a new version of the
input data:
https://arxiv.org/abs/1806.00451

The flip side is that by having a common problem to tackle it's much easier to
compare and contrast different results to figure out what really works. Many
approaches that improve results on CIFAR don't scale to ImageNet, and many of
the ImageNet papers don't scale to the even larger datasets that people are
using now.
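
The measurement itself is cheap to reproduce once you have a trained model:
score it on both test sets and compare. A rough sketch (the predict function
and file paths are hypothetical placeholders, not anything from the paper):

    import numpy as np

    # Score one trained classifier on the original test set and on a
    # newly collected one, then report the gap. `predict_fn` returns
    # predicted class indices for a batch of images.
    def generalization_gap(predict_fn, orig, new):
        acc = lambda X, y: float(np.mean(predict_fn(X) == y))
        a_orig, a_new = acc(*orig), acc(*new)
        print(f"orig: {a_orig:.3f}  new: {a_new:.3f}  "
              f"gap: {a_orig - a_new:.3f}")
        return a_orig - a_new

    # Usage, with whatever your own pipeline provides:
    # orig = (np.load("test_x.npy"), np.load("test_y.npy"))
    # new = (np.load("new_test_x.npy"), np.load("new_test_y.npy"))
    # generalization_gap(model.predict, orig, new)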

~~~
IshKebab
Aha, amazing - I've been wondering for ages how rankings would change with new
test data for MNIST (or another common dataset). Thanks for the link!

------
sheeshkebab
Are there any (unsupervised or minimally supervised) methods yet that learn on
non-repeating datasets? As humans we don’t really learn by rereading the same
text 35 times, or staring at the same picture continuously, so why do all
these algorithm research papers keep focusing on methods that are basically
trying to fit a function over a static, known, and labeled dataset?
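
To be concrete about what I mean by non-repeating: a learner that sees every
example exactly once, rather than looping over a fixed labeled set for many
epochs. A toy single-pass sketch:

    import numpy as np

    # Streaming SGD: every example is seen exactly once and never
    # replayed. Synthetic data, plain logistic loss.
    rng = np.random.default_rng(1)
    d = 10
    w_true = rng.normal(size=d)
    w = np.zeros(d)

    for _ in range(10_000):
        x = rng.normal(size=d)              # a fresh example each step
        y = float(x @ w_true > 0)
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= 0.05 * (p - y) * x             # one update, then discard x

That kind of single-pass update is easy to write down for a toy model; the
question is whether anything like it works at the scale these papers target.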

~~~
IanCal
To be fair, we do spend a pretty major chunk of our lives learning through
repetition, particularly when young.

The term you may be looking for, though, is "one-shot learning".

~~~
ericd
Yeah, one-shot learning in humans is built on top of enormous representations
of the world accumulated over many years, so I wouldn’t be surprised if we’re
a bit premature in trying to pull it off in ML outside of very simple cases.

