OmniNet: A unified architecture for multi-modal multi-task learning (arxiv.org)
14 points by subho406 36 days ago | 3 comments

The direction of this work is interesting. I hope to see more reuse in deep learning, based on a shared common representation with custom encoders/decoders. It's unfortunate that every time we start from scratch, inheriting nothing from the multitude of models and tasks tackled before. Current DL models are like custom apps developed for specific tasks; I'd like to have something more like a neural-OS.
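A minimal sketch of that idea, with hypothetical names and shapes (none of this is from the OmniNet paper): modality-specific encoders project heterogeneous inputs into one shared set of vectors, which task-specific decoders could then read from.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # dimensionality of the shared representation

def make_encoder(in_dim, d=D):
    """Return a linear encoder mapping an (n, in_dim) input
    to a set of n shared-space vectors of size d."""
    W = rng.standard_normal((in_dim, d)) / np.sqrt(in_dim)
    return lambda x: x @ W

# One encoder per modality, all writing into the same space.
encode_image = make_encoder(in_dim=2048)  # e.g. CNN patch features
encode_text = make_encoder(in_dim=300)    # e.g. word embeddings

image_feats = rng.standard_normal((49, 2048))  # 7x7 image patches
text_feats = rng.standard_normal((12, 300))    # 12 tokens

# The shared representation is just the union of the encoded sets:
# a single set of vectors, regardless of how many modalities fed it.
shared = np.concatenate([encode_image(image_feats),
                         encode_text(text_feats)], axis=0)
print(shared.shape)  # (61, 64)
```

Adding a new modality then only means adding a new encoder; everything downstream of the shared set stays untouched.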

I think the shared representation could be a set of vectors (as in transformers) built with self-attention, or graphs with vector features on vertices and edges. Transformers are similar to graphs; they just don't take edge information as a separate input. Graphs can represent any input format, scale better with input size, generalise in combinatorial problems, and can implement any algorithm. They are also easier to parallelise, being more similar to CNNs than to RNNs.
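One way to see the transformer/graph relationship is that self-attention is attention over a fully connected graph with no edge features; adding a per-edge additive bias to the attention scores recovers a simple form of graph attention. A toy sketch (all names and shapes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.standard_normal((n, d))  # node (or token) feature vectors
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def attention(X, edge_bias=None):
    """Self-attention over a set of vectors; an optional (n, n)
    edge_bias injects per-edge information into the scores."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)    # pairwise node-to-node scores
    if edge_bias is not None:
        scores = scores + edge_bias  # graph-style edge input
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)  # softmax over neighbours
    return A @ V

plain = attention(X)                                # transformer-style
biased = attention(X, rng.standard_normal((n, n)))  # with edge features
print(plain.shape, biased.shape)  # (5, 8) (5, 8)
```

Setting entries of the bias to a large negative value also masks edges out, so the same mechanism covers sparse graph topologies.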

My hope is that embracing graphs as the common data exchange format between neural models will lead to a leap forward towards AGI. Well, that, and investing more in artificial embodiment/environments.

That's a really interesting idea! Giving such neural networks access to more and more modalities helps transfer information usefully across the various nodes of the network, making it capable of zero-shot learning on tasks it was never trained on. It also makes it easier to add new tasks to the AI system even when data availability is a problem.

Full source code available at: http://github.com/subho406
