
OmniNet: A unified architecture for multi-modal multi-task learning - subho406
https://arxiv.org/abs/1907.07804
======
visarga
The direction of this work is interesting. I hope to see more reuse in deep
learning based on a shared common representation and custom encoder/decoders.
It's unfortunate that every time we start from scratch, inheriting nothing
from the multitude of models and tasks tackled before. Current DL models are
like custom apps developed for specific tasks; I'd like to have something more
like a neural-OS.

I think the shared representation could be a set of vectors (as in
transformers) based on self-attention, or graphs with vector features on
vertices and edges. Transformers are similar to graphs; they just don't take
edge information as a separate input. Graphs can represent any input format,
scale better with input size, generalise better on combinatorial problems, and
can implement arbitrary algorithms. They are also easier to parallelise, being
more similar to CNNs than RNNs.
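
To make that analogy concrete, here is a minimal sketch (my own illustration,
not OmniNet code): plain self-attention is message passing over a fully
connected graph of tokens, and per-edge features can be folded in as an
additive bias on the attention scores. Function and variable names are
illustrative assumptions.

    # Sketch (illustrative, not from the paper): self-attention as message
    # passing on a fully connected graph, with optional edge features added
    # as a bias on the attention scores.
    import torch
    import torch.nn.functional as F

    def attention(x, edge_bias=None):
        # x:         (n, d) node/token features
        # edge_bias: (n, n) optional scalar feature per directed edge
        d = x.size(-1)
        scores = x @ x.t() / d ** 0.5         # pairwise scores (projections omitted)
        if edge_bias is not None:
            scores = scores + edge_bias       # graph flavour: edges modulate attention
        return F.softmax(scores, dim=-1) @ x  # aggregate messages from neighbours

    x = torch.randn(5, 16)                    # 5 nodes with 16-dim features
    edges = torch.randn(5, 5)                 # toy per-edge feature (e.g. a relation score)
    plain = attention(x)                      # transformer view: no edge input
    graph = attention(x, edge_bias=edges)     # graph view: edge features enter the layer

With edge_bias omitted the layer is ordinary (single-head, untied) attention;
passing it in is the simplest way a graph structure can enter the same
computation.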

My hope is that embracing graphs as the common data exchange format between
neural models will lead to a leap forward towards AGI. Well, that, and
investing more in artificial embodiment/environments.

~~~
subho406
That's a really interesting idea! Giving such neural networks access to more
and more modalities helps transfer useful information across the various nodes
of the network, making them capable of zero-shot learning on tasks they were
never trained on. It also makes it easier to add new tasks to the system even
when data availability is a problem.

------
subho406
Full source code available at:
http://github.com/subho406

