
One model to learn them all - mpweiher
https://blog.acolyer.org/2018/01/12/one-model-to-learn-them-all/
======
maxpupmax
Wow. Can someone pull a Hacker News and explain to me why I'm allowed to be
super pessimistic about this result? I want to believe.

~~~
andreyk
No need to be super pessimistic, but there is reason to temper your excitement
- this is quite preliminary research (there is little in the results to
indicate much benefit - "But the results show that even on the ImageNet task,
the presence of such blocks does not detract from performance, and may even
slightly improve it."), and it is still entirely supervised. Something like
this that could learn in a semi-supervised fashion from images and text would
really seem revolutionary.

~~~
sdenton4
There's a pretty strong literature around retraining just the top couple of
layers of a deep net to target a slightly different objective.

The interesting possibility opened here is training new feature-processing
frontends to work with an established 'conceptual' backend.
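
The idea above - freeze the learned lower layers and retrain only the top -
can be sketched in miniature. This is a toy stand-in, not the paper's model:
a fixed random projection plays the role of the frozen pretrained layers, and
only a new logistic "head" is trained on the new objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen pretrained lower layers: a fixed random projection.
# (Hypothetical; in practice this would be a real pretrained backbone.)
W_frozen = rng.normal(size=(20, 64))

def features(x):
    # Frozen feature extractor - these weights are never updated.
    return np.tanh(x @ W_frozen)

# New trainable "top layer", retargeted to a different (binary) objective.
W_head = np.zeros(64)

def loss_and_grad(X, y):
    h = features(X)
    p = 1.0 / (1.0 + np.exp(-(h @ W_head)))        # logistic head
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    grad = h.T @ (p - y) / len(y)                  # gradient w.r.t. head only
    return loss, grad

# Toy binary task on top of the frozen features.
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(float)

losses = []
for _ in range(200):
    loss, g = loss_and_grad(X, y)
    losses.append(loss)
    W_head -= 0.5 * g                              # only the head moves
```

The point is just that gradient updates touch `W_head` alone; the "frontend"
stays fixed, which is what makes retargeting cheap.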

------
reilly3000
I think this nips at the core. The reality we live in is an unlimited stream
of inputs. Their notion of attention is important. Modeling the weight of
inputs seems to be an interactive computation of bidding the value of past and
future across infinite dimensions. By encoding them as a single input stream
they are nudging at what a brain does well when it works right:
contextualization.

------
dchichkov
I see a comparison with the state of the art on single problems (and it is far
from state of the art). I don't see a comparison on the proposed multimodal
dataset with a baseline, for example T2T or ByteNet. Do I just have to believe
that this is a good model?

Aside from that, it's nice to see that someone actually makes the effort and
deals with the logistics of working with multimodal data. Usually researchers
stop at a maximum of three datasets and call it multimodal, so it is great to
see eight!

------
tw1010
One model to overfit it all

~~~
SubiculumCode
And in the darkness bind them.

~~~
YeGoblynQueenne
And in the variance bias them?

------
YeGoblynQueenne
For me, as a non-practitioner (I'm happy with my GOFAI, thank you), the big
problem with neural nets is that there are too many architectures, each
tailored to a specific problem. On the one hand it's great that there's a
broad toolset, on the other hand there are so many competing claims about
best-of-class performance that it's hard to know what is even the state of the
art. There is too much noise, you know?

So it'd be nice to see a result that reduced the noise a bit. I'm afraid this
one doesn't fit the bill. It's not so much reducing architectural options, as
piling even more architecture on top of the already sprawling mass of
architecture. It's got architecture hanging from its architecture!

I mean, come on, one component is a gated mixture-of-experts, which is to say,
a collective of feed-forward nets. There are just so many _layers_ upon
_layers_ of choices to make about which type of network to use for each
component of the entire model. How do you make these choices?
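
For readers who haven't met the component being complained about: a gated
mixture-of-experts is roughly a learned softmax gate mixing the outputs of
several small feed-forward nets. A minimal sketch, with made-up sizes and
linear "experts" standing in for full feed-forward nets:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions, not from the paper.
d_in, d_out, n_experts = 8, 4, 3

# Each "expert" here is just a linear map standing in for a feed-forward net.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
W_gate = rng.normal(size=(d_in, n_experts))

def moe(x):
    # Gate: softmax over experts, conditioned on the input.
    logits = x @ W_gate
    g = np.exp(logits - logits.max())
    g = g / g.sum()                         # gate weights sum to 1
    # Output: gate-weighted mixture of expert outputs.
    return sum(gi * (x @ We) for gi, We in zip(g, experts))

y = moe(rng.normal(size=d_in))
```

In real gated MoE layers the gate is often sparsified (only the top-k experts
fire), which is what makes the collective cheap despite its size.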

Or, think about how long this specific architecture is going to give
state-of-the-art results (as claimed). It uses normalised ReLU convolution
blocks - the height of fashion at the moment, but what happens four years from
now, when nobody uses them anymore, because they're so 2010s?

ANN research frustrates me like this. It's describing an art form, but I'm not
sure that's really the most useful thing to do.

~~~
npatrick04
I get your frustration with the architecture of architecture. However, this
kind of AOA has been going on for a long time in the similar domain of control
theory.

Model Predictive Control is a method of applying a model of a process to
estimate its state. Per Wikipedia, it's been used since the 80s. When you
apply it to tracking something you aren't in control of, say an enemy plane,
you need to use multiple models and then pick the most probable solution.

It's not elegant, but it's not actually a bad way to get good results.
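
The multiple-model idea above can be sketched in a few lines: run each
candidate motion model, score each by the likelihood of the latest
observation, and keep the best. The models and numbers here are invented for
illustration, not taken from any real tracker:

```python
import numpy as np

# Two hypothetical 1-D motion models for the tracked object.
def predict_constant_velocity(x, v):
    return x + v

def predict_stationary(x, v):
    return x

models = {
    "constant_velocity": predict_constant_velocity,
    "stationary": predict_stationary,
}

def most_probable(x_prev, v_est, z, sigma=1.0):
    # Gaussian likelihood of the observation z under each model's prediction;
    # the model whose prediction best explains z wins.
    scores = {
        name: np.exp(-((z - f(x_prev, v_est)) ** 2) / (2 * sigma**2))
        for name, f in models.items()
    }
    return max(scores, key=scores.get)

# A target at position 10, estimated velocity 2, observed next at 12.1:
best = most_probable(x_prev=10.0, v_est=2.0, z=12.1)
```

Real systems (e.g. interacting-multiple-model filters) blend the models
probabilistically rather than hard-selecting one, but the bookkeeping is the
same flavor.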

------
aportnoy
One model to rule them all, one model to find them, One model to bring them
all, and in the darkness bind them.

------
erikpukinskis
Legit question: if human intelligence is fundamentally limited by scale, why
aren’t there some humans running around with heads twice as big and twice the
neurons? If it’s such an advantage why hasn’t nature selected for it in at
least one place on Earth?

(My answer: above the scale of the human brain there’s diminishing returns on
more neurons. Two individuals with their own volition are smarter than one
individual with double the brain size. We’ll do a lot of research in AI to
find that there aren’t any machine learning problems that can’t be run on a
computer about the size and speed of a brain, or distributed among some number
of independently operating brain sized computers, and that the problem never
was scale but just where to focus)

~~~
joshmarlow
Here's my understanding/interpretation - if I get any details wrong, I hope
someone will correct me!

Brains are very expensive and, for our body size, we have very large brains;
evolution had to build a brain that could operate _within the calorie
constraints imposed by our ecological niche_ - presumably a hunter-gatherer
niche.

If you graph adult body volume against gestation time across the placental
mammals, there's a good correlation, but humans are an anomaly; for our body
volume we should take more than 9 months. So we are kind of born prematurely -
the reason? So our heads can fit through the birth-canal. Significantly bigger
brains would likely have entailed altering our walking gait, thus potentially
sacrificing our ecological niche (and guaranteed source of those much needed
calories!). Most of this was taken from Pinker ([0]).

Bostrom ([1]) brings up another interesting point - anything smarter than us
would probably have taken longer to evolve than us. So it might not be
inaccurate to think of ourselves as the dumbest species that could become an
interplanetary species.

My suspicion is that things much smarter than us are very possible; my
reasoning is that it just seems so strange to me that the peak of feasible
intelligence happens to correspond with what can be fed by a hunter-gatherer
diet and fit through a birth-canal sized to the constraints of an upright
primate.

It seems more likely to me that we're a local optimum - we're probably close
to the optimal design for something as smart as us, in our ecological niche.

[0] -
[https://en.wikipedia.org/wiki/How_the_Mind_Works](https://en.wikipedia.org/wiki/How_the_Mind_Works)
[1] -
[https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dang...](https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies)

~~~
tormeh
Very interesting! I happen to know of some other problems: Heat and cross-
brain communication time.

Bird brains are much smarter than ours per unit of weight and volume. Not a
little bit, but a lot. I think part of the reason they can do that is that
they have much better cooling because their heads are small. A bigger brain
than ours would need a lower activity level because of cooling constraints.

There's also diminishing returns from size because the interconnects from one
point in the brain to another have longer latency. One response to this is to
fold the brain, as we do. Regardless, at some point adding more volume stops
making sense.

~~~
joshmarlow
I read once that bird brains (or some part of them) are more modular and
consist of densely connected components - any idea/references about that?

Also, this blog-post (and associated links) may interest you -
[http://www.rifters.com/crawl/?p=6116](http://www.rifters.com/crawl/?p=6116)

------
Animats
That's encouraging in two directions. Not only are they encoding different
subject matter areas in the same net, they're using the same net design for
different subject matter areas. Standardized net designs may work for a
variety of tasks.

Progress marches on.

------
zbyte64
Reminds me a lot of DRAGNN:
[https://arxiv.org/pdf/1703.04474.pdf](https://arxiv.org/pdf/1703.04474.pdf)

Both are multi-task endeavors with novel encoding techniques.

------
brabel
One model, such a great idea! We could introduce a revolutionary concept for
that... like, an Object! Which can be converted to lots of different
"formats"! Like an image, or text (e.g. JSON and XML, really a revolution)!!!
Why has no one thought of that?! /s

------
d--b
Seriously, 'one model to learn them all'? Isn't that a tiny bit overreaching?

The results are not very surprising after the Google Translate post about
multi-language translation.

------
oneman
Superintelligence is hyperintegration of the metasystem.

