How practical can they be when current flagship models generate incorrect responses more than 50% of the time on OpenAI's own SimpleQA benchmark[1]?

This might be acceptable for amusing us with fiction and art, and for filling the internet with even more spam and propaganda, but would you trust them to write reliable code, drive your car or control any critical machinery?

The truly exciting things are still out of reach, yet we may be too close to the Peak of Inflated Expectations to see that right now.

[1]: https://openai.com/index/introducing-simpleqa/