Even if ChatGPT can't fully grok a specification, I wonder how well it could be used to "test" a specification, looking for ambiguities, contradictions, or other errors.
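A minimal sketch of what that could look like, assuming the OpenAI Python client; the model name, prompt wording, and spec file path are all placeholders, not a tested recipe:

```python
# Hypothetical sketch: asking an LLM to audit a spec for defects.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

spec_text = open("spec.md").read()  # placeholder path

response = client.chat.completions.create(
    model="gpt-4o",  # any capable chat model
    messages=[
        {"role": "system",
         "content": "You are a specification reviewer. List ambiguities, "
                    "contradictions, and undefined terms, citing the exact "
                    "passage for each finding."},
        {"role": "user", "content": spec_text},
    ],
)
print(response.choices[0].message.content)
```

Whether the findings are trustworthy is exactly the open question, but the flagged passages are at least cheap to verify by hand.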
I am not sure LLMs in general, and GPT in particular, are needed for that. In the end, any human language can be formalized the same way source code is formalized into ASTs for analysis.
A good specification, or any other formal document (e.g. a standard, policy, criminal law, or constitution), is already well structured and prepared for further formalization and analysis: it contains terms and definitions (a glossary) and references to other documents. A deterministic checker can exploit that structure directly, as sketched below.
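As a rough illustration of that non-LLM route, here is a sketch that checks a spec's body against its glossary for undefined terms and dangling cross-references. The markup conventions it assumes (glossary entries like `**Term** --`, term usage marked `_Term_`, headers like `## 3.2`, references like "see section 3.2") are invented for the example, not any standard:

```python
# Sketch of deterministic spec "linting": flag terms used in the body that
# the glossary never defines, and references to sections that don't exist.
import re

spec = open("spec.md").read()  # placeholder path

# Glossary entries written as: **Term** -- definition text
defined = set(re.findall(r"\*\*(.+?)\*\*\s*--", spec))

# Defined-term usage marked in the body as: _Term_
used = set(re.findall(r"_([A-Z][\w ]+?)_", spec))

# Section headers like "## 3.2 ..." and references like "see section 3.2"
sections = set(re.findall(r"^##\s+(\d+(?:\.\d+)*)", spec, re.MULTILINE))
referenced = set(re.findall(r"see section (\d+(?:\.\d+)*)", spec))

for term in sorted(used - defined):
    print(f"undefined term: {term}")
for ref in sorted(referenced - sections):
    print(f"dangling reference: section {ref}")
```

The point is that this class of check is exact and auditable: every finding follows mechanically from the document's own structure.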
Traversing all of that might be done with the help of a well-suited neural network, but only if the network's output is correct and predictable, and only with a holistic understanding of how the network works.
As of now, the level of understanding of the inner behavior of LLMs (admitted by their authors and maintainers themselves) amounts to "the stuff is what the stuff is, brother"[]