
Off-topic, sort of, but does anyone know if folks are working on combining vision and natural language in one model? I think that could yield some interesting results.




And here is a short guide and a link to a Google Colab notebook that anyone can use to create their own AI-powered art with VQGAN+CLIP: https://sourceful.us/doc/935/introduction-to-vqganclip


Yeah, there has definitely been work done in that space: it's called multi-modal modeling.

Not sure if this is the latest work, but here are some results from Google's AI Blog:

https://ai.googleblog.com/2017/06/multimodel-multi-task-mach...


What would be really cool is neural networks with routing, like circuit switching or packet switching. No idea how you would train such a beast, though.

Like imagine the vision part making a phone call to the natural language part to ask it for help with something.
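One trainable answer to "how would you train such a beast" is soft routing, as in mixture-of-experts layers: instead of a hard phone call, a learned gate mixes the outputs of several sub-networks, which keeps everything differentiable. Here's a minimal numpy sketch of that idea; all names and shapes (the two "experts" standing in for a vision part and a language part, `d_in`, `d_out`) are illustrative assumptions, not code from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class TwoExpertRouter:
    """Toy soft-routing layer: a gate learns how much of each input
    to send to each of two sub-networks ("experts")."""

    def __init__(self, d_in, d_out):
        # Two experts, stand-ins for a vision part and a language part.
        self.w_vision = rng.normal(0, 0.1, (d_in, d_out))
        self.w_language = rng.normal(0, 0.1, (d_in, d_out))
        # Gating network: produces one score per expert, per input.
        self.w_gate = rng.normal(0, 0.1, (d_in, 2))

    def forward(self, x):
        gate = softmax(x @ self.w_gate)   # (batch, 2) routing weights, sum to 1
        out_v = x @ self.w_vision         # expert 1 output
        out_l = x @ self.w_language       # expert 2 output
        # Weighted mix is differentiable, so the router trains end-to-end.
        return gate[:, :1] * out_v + gate[:, 1:] * out_l

router = TwoExpertRouter(d_in=8, d_out=4)
y = router.forward(rng.normal(size=(3, 8)))
print(y.shape)  # (3, 4)
```

Because the gate output is a softmax rather than a hard switch, ordinary backpropagation trains both the experts and the router; hard, discrete routing would need tricks like straight-through estimators or reinforcement learning.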


Sounds like The Society of Mind - https://en.m.wikipedia.org/wiki/Society_of_Mind


Capsule networks have a routing algorithm ("routing by agreement"), as far as I know.
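For anyone curious, the routing-by-agreement step can be sketched in a few lines of numpy. This is a hedged illustration of just the routing iterations (not a full capsule layer); the shapes, the iteration count, and the helper names are my own assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Capsule nonlinearity: short vectors shrink toward 0,
    # long vectors approach (but never reach) unit length.
    norm2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (norm2 / (1 + norm2)) * s / np.sqrt(norm2 + eps)

def routing_by_agreement(u_hat, iters=3):
    # u_hat: predictions from lower capsules, shape (n_lower, n_upper, dim)
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))  # routing logits
    for _ in range(iters):
        # Coupling coefficients: each lower capsule distributes its
        # vote across upper capsules (softmax over the upper axis).
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[:, :, None] * u_hat).sum(axis=0)   # weighted sum per upper capsule
        v = squash(s)                             # upper capsule outputs
        # Agreement update: predictions that align with the output
        # get routed more strongly next iteration.
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v

rng = np.random.default_rng(0)
v = routing_by_agreement(rng.normal(size=(6, 3, 4)))
print(v.shape)  # (3, 4)
```

The key point for the thread: the routing weights here are computed by an iterative agreement procedure at inference time, rather than being learned parameters like the gate in a mixture-of-experts layer.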



