If it turns out that the easiest, or even the only, means of doing this is emulating the human brain, then it is entirely possible that we inherit a whole new set of constraints and dependencies, such that world-simulation and an embodied mind are required for such a system to learn. If this turns out not to be the case, and there's some underlying principle of language we can emulate (the classic "airplanes don't fly like birds" argument), then it may be that text is enough. But that rests on a new assumption: that our system came pre-equipped to learn language rather than manufacturing an understanding from whole cloth. That the model weights were pre-initialized to specific values.
I don't think it's so weird to imagine that natural language really doesn't convey a ton of explicit information on its own. Sure, there's some there, enough that our current AI attempts can solve little corners of the bigger problem. But is it so strange to imagine that the machinery of the human brain takes lossy, low-information language and expands, extrapolates, and interprets it so heavily as to make it orders of magnitude more complex than the lossy, narrow channel through which it was conveyed? That the only reason we're capable of learning language and understanding each other (the times we _do_ understand each other) is that we all come pre-equipped with the same decryption hardware?
1) They appear to have crafted the skeleton of a grammar from the outset with their nodes, super nodes, and slot collocations. This is directly analogous to something like an X-bar grammar, and it is not learned by the system; therefore, if anything, it strengthens a Chomskyan position: the system is learning how a certain set of signals satisfies its extant constraints.
2) They don't appear to go beyond generative grammar, which already seems largely solvable by other ML methods and is only a subset of the problem of "language". Correct me if I'm wrong here; it's a very long paper and I may have missed something.