Sort of off-topic, but does anyone know if folks are working on combining vision and natural language in one model? I think that could yield some interesting results.
What would be really cool is neural networks with routing. Like circuit switching or packet switching. No idea how you would train such a beast though.
Imagine the vision part making a phone call to the natural language part to ask it for help with something.