LLMs have no real sense of truth and show no hard evidence of logical reasoning. Even the latest models still trip up on very basic tasks. I think they can be very entertaining, sure, but not practical for many applications.
Consistent, algorithmic performance on basic tasks.
A great example is the simple 'count how many letters' problem. If I prompt it with a word or phrase and it gets the count wrong, pointing out the error should translate into a consistent course correction for the rest of the session.
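For contrast, here's what that kind of consistency looks like as ordinary code (a minimal sketch of the deterministic baseline, not anything the models actually run internally):

    from collections import Counter

    def letter_counts(text: str) -> Counter:
        # Count each alphabetic character, case-insensitively.
        return Counter(ch for ch in text.lower() if ch.isalpha())

    # The classic "how many r's in 'strawberry'?" check
    print(letter_counts("strawberry")["r"])  # prints 3, every single time

A few lines of Python get the same answer on every run; that's the bar a 'consistent, algorithmic' response would have to clear.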
If I ask it to tell me how long President Lincoln will be in power after the 2024 election, it should have a consistent ground truth to correct me (or at least ask for clarification about which country I'm referring to). If facts change and I can cite credible sources, it should be able to assimilate that knowledge on the fly.
Then we already have access to a cheaper, scalable, abundant, and (in most cases) renewable resource, at least compared to what a few H100s cost. Take good care of them, and they'll probably outlast a GPU's average lifespan (~10 years).
Humans are a lot more expensive to run than inference on LLMs.
No human, especially no human whose time you can afford, comes close to the breadth of book knowledge ChatGPT has, or the number of languages it speaks reasonably well.
I can't hold an LLM accountable for bad answers, nor can I (truly) correct it (in current models).
Don't forget to take into account how damn expensive a single GPU/TPU actually is to purchase, install, and run for inference. And this is to say nothing of how expensive it is to train a model (estimated to be in the billions for the latest models in the cited article, which likely doesn't even include the salaries of the folks involved). And I haven't even mentioned the environmental impact of the prolific power consumption; there's a reason nuclear plants are becoming popular again (which may actually be one of the few good things to come out of this).