
Findings from the Imagenet and CIFAR10 competitions - jph00
http://www.fast.ai/2018/04/30/dawnbench-fastai/
======
antognini
> I worry when I talk to my friends at Google, OpenAI, and other well-funded
> institutions that their easy access to massive resources is stifling their
> creativity. Why do things smart when you can just throw more resources at
> them?

I'm not convinced that this is a real problem. The big tech companies have way
more compute than anyone else, so they should do the large-scale experiments
that no one else is able to do. It's true that you don't have to be
particularly creative to come up with a lot of these experiments, but
nevertheless they should be done and there is a lot of scientific value there.

I absolutely agree that more attention needs to be paid to smaller
institutions since they're just as likely as anyone to come up with the next
great idea. Double blind conference submissions help with this to an extent,
but they're not a panacea. If you see an anonymous paper that has performed a
thousand Imagenet experiments, is there really any doubt where it came from?
And similarly, if they _didn't_ perform a bunch of Imagenet experiments, is
there really any doubt that it didn't come from one of the big players? So now
you can mask a bias against small institutions as a subtler "you didn't
perform enough Imagenet experiments" excuse. (Incidentally, this was the main
reason that the paper by Smith & Topin was rejected from ICLR [1].)

EDIT: Full disclosure, I'm currently working on a set of not-so-creative (but
still IMO important) experiments that use a large amount of compute at Google.
So I have some bias here. :-)

[1]: https://openreview.net/forum?id=H1A5ztj3b

------
jph00
Jeremy from fast.ai here. Let me know if you have any questions about the
methods we used, or anything else relevant to this project.

~~~
cs702
I do have a question, but first I must say that as I was reading your post I
found myself nodding in agreement again and again and again. Yes, the Silicon-
Valley-centric AI research community (including UK-based DeepMind and its
surrounding satellites) tends to overlook or even ignore important research
from elsewhere. Yes, "algorithmic creativity" and focused tinkering often
prove more important than computational resources. Yes, innovation in deep
learning requires neither big data nor massive hardware, but "engineers are
drawn to using the biggest datasets they can get, on the biggest machines they
can access, like moths flitting around a bright light" (so true!). And yes,
"genuine advances [have] consistently come from doing things differently, not
doing things bigger." Amen!

With that out of the way, here's my question:

Have you guys tried, or managed, to achieve super-convergence with attention
or residual attention models?

~~~
jph00
No, I haven't tried yet. One of our students has started experimenting with
language models (AWD-LSTM) and is getting encouraging results (about a 3x
speed improvement), but not with attention.

I really hope these results will encourage more people to explore where and
how super-convergence can be achieved. I suspect we're only scratching the
surface of what's possible.
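For anyone who wants to experiment: super-convergence is driven by the
"1cycle" learning-rate policy from the Smith & Topin paper linked above (one
long ramp up to a large maximum learning rate, a ramp back down, then a brief
anneal to a much smaller value). Below is a minimal sketch in plain Python;
the phase split and the div factors are illustrative assumptions, not exact
fast.ai settings, and it collapses the ramp-down and final anneal into a
single linear segment for brevity:

    # Minimal sketch of the 1cycle schedule behind super-convergence
    # (Smith & Topin). div, final_div, and pct_up are illustrative
    # assumptions, not fast.ai's exact defaults.
    def one_cycle_lr(step, total_steps, lr_max=1.0,
                     div=25.0, final_div=1e4, pct_up=0.45):
        lr_start = lr_max / div         # assumed warm-up starting LR
        lr_end = lr_max / final_div     # assumed tiny final LR
        up = int(total_steps * pct_up)  # length of the linear ramp-up
        if step < up:                   # phase 1: ramp lr_start -> lr_max
            t = step / up
            return lr_start + t * (lr_max - lr_start)
        t = (step - up) / (total_steps - up)
        return lr_max + t * (lr_end - lr_max)  # phase 2: ramp back down

    # Example: sample the schedule over a 1,000-step run.
    for s in (0, 225, 450, 725, 999):
        print(s, round(one_cycle_lr(s, 1000), 4))

In the paper, momentum is also cycled inversely to the learning rate over the
same schedule; that's omitted here to keep the sketch short.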

~~~
cs702
Thank you for the prompt answer -- and for your intellectual generosity.

I too hope these results encourage more people to see where/how
super-convergence can be achieved.

FWIW, trying to achieve it with fully attentional models is on my "R&D things
to try" list at work.

