I think the shared representation could be a set of vectors processed with self-attention (as in transformers), or graphs with vector features on vertices and edges. Transformers are closely related to graph networks: self-attention is essentially message passing over a fully connected graph, except that edge information is not given as a separate input. Graphs can represent any input format, scale better with input size, generalise well on combinatorial problems, and can in principle implement any algorithm. Graph networks are also easier to parallelise, being closer in structure to CNNs than to RNNs.
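To make the transformer/graph connection concrete, here is a minimal sketch (numpy only; all names and shapes are my own illustration, not any particular library's API). Single-head self-attention is written as message passing on a complete graph; the optional `edge_bias` term stands in for the per-edge features that graph networks receive but vanilla transformers lack.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_message_passing(nodes, Wq, Wk, Wv, edge_bias=None):
    """nodes: (n, d) vertex features; edge_bias: (n, n) scalar per-edge features."""
    q, k, v = nodes @ Wq, nodes @ Wk, nodes @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # one score per (receiver, sender) edge
    if edge_bias is not None:                # graph networks: edges carry their own input
        scores = scores + edge_bias          # plain transformers: this term is absent
    weights = softmax(scores, axis=-1)       # each node attends over every other node
    return weights @ v                       # weighted sum = message aggregation

rng = np.random.default_rng(0)
n, d = 5, 8
nodes = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * d ** -0.5 for _ in range(3))

plain = attention_message_passing(nodes, Wq, Wk, Wv)             # transformer-style
edge_bias = rng.normal(size=(n, n))                              # e.g. encodes graph structure
graphy = attention_message_passing(nodes, Wq, Wk, Wv, edge_bias) # graph-style
print(plain.shape, graphy.shape)  # (5, 8) (5, 8)
```

In this framing, a transformer is the special case where the graph is complete and all edges are featureless; adding the bias term is one simple way (used in several graph-transformer variants) to feed edge information back in.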
My hope is that embracing graphs as the common data-exchange format between neural models will lead to a leap towards AGI. Well, that, and investing more in artificial embodiment and environments.