Honestly, it doesn't matter to the end user if more tokens are generated between the human message and the AI reply. This is like getting rid of AI wrappers for specific tasks. If the jump in accuracy is real, then for all practical purposes we have a sufficiently capable AI with the potential to boost productivity on the largest scale in human history.
It starts to matter if the compute time goes up 10- to 100-fold, as the provider needs to bill for it.
Of course, that's assuming it's not priced for market acquisition funded by a huge operational deficit, which is rarely a safe assumption with AI right now.