That is an interesting thought. I don't fully understand how though. The main challenge is the training data. If you need to first create the interactive experience... What would the added value be?
For example is creating novel combination based on (large) training set. If the network had enough weights on what is realistic looking could create novel game experiences based on a prompt of say a film or book.