Do we understand why prompt engineering is still necessary? Why is the model unable to correctly determine ("understand") what output the user wants from unstructured input?
Sure, but in my experience it seems like humans are currently much better at inferring context and meaning from input than the current generation of LLMs.
I assume that as LLMs get better they will be able to produce better output without needing to be prompted in such specific ways.
Or perhaps they will ask simple, common follow-up questions when they detect ambiguity in a request, like humans do.
I think the key missing ingredient in current AI systems is an internal monologue. LLMs are capable of asking questions, but currently you need to explicitly prompt them to deconstruct a problem into steps, analyse those steps and decide whether a question is warranted. You basically need to verbalise our normal thought process and put it in the system prompt. I imagine that if LLMs could do a few passes of something akin to our inner monologue before giving us a response, they would do a lot better on tasks that require reasoning.
This is being worked on; look into Chain-of-Thought and Tree-of-Thoughts applications.
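A rough sketch of the "few passes before answering" idea, assuming a hypothetical `model` function that maps a prompt string to a text reply (no real LLM API is used; `fake_model` is a stand-in so the example runs, and the `QUESTION:` convention is made up). It is a single linear chain with a self-check pass, closer to chain-of-thought prompting; Tree of Thoughts would additionally branch into several candidate lines of reasoning and search over them, which this does not do.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns the model's text.
Model = Callable[[str], str]

QUESTION_TAG = "QUESTION:"  # made-up convention for this sketch


def answer_with_monologue(model: Model, request: str, passes: int = 2) -> str:
    """Run a few private "thinking" passes, then either ask a question or answer.

    The scratchpad notes are never returned; only the final reply reaches the user.
    """
    # Pass 1: break the request into steps (the "inner monologue").
    notes = model(f"Break this request into steps:\n{request}")
    # Extra passes: re-read the notes and flag any essential missing information.
    for _ in range(passes - 1):
        notes = model(
            f"Request:\n{request}\nNotes so far:\n{notes}\n"
            "Check the steps. If essential information is missing, "
            f"write a line starting with {QUESTION_TAG} and what to ask."
        )
    for line in notes.splitlines():
        if line.startswith(QUESTION_TAG):
            # Ask the user a specific question instead of guessing.
            return line[len(QUESTION_TAG):].strip()
    return model(f"Request:\n{request}\nNotes:\n{notes}\nWrite the final answer.")


# Stand-in "model" so the sketch runs without any real LLM.
def fake_model(prompt: str) -> str:
    if "Write the final answer" in prompt:
        return "Here is your plan."
    if "Check the steps" in prompt and "deadline" not in prompt:
        return "1. list tasks\n2. order them\nQUESTION: What is the deadline?"
    return "1. list tasks\n2. order them"


print(answer_with_monologue(fake_model, "Plan my project"))
print(answer_with_monologue(fake_model, "Plan my project, deadline Friday"))
```

With the stand-in model, the first call returns the clarifying question ("What is the deadline?") and the second, which already contains a deadline, returns the final answer. With `passes=1` the self-check pass is skipped and the sketch always answers directly.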
What is missing for me is the model recognizing that it lacks enough information to provide a sufficient response, and then asking for the missing information.
- typically, it responds with a general answer
- sometimes it will say it can give a better answer if you provide more information (this has been increasingly happening)
- however, it does not ask for specific information or context; it doesn't ask "what if" or if/else-style questions that would decompose the problem
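A toy illustration of asking for *specific* missing information rather than giving a general answer. It uses slot filling, a technique from older task-oriented dialogue systems, swapped in here purely for illustration; `TripRequest`, its fields and `QUESTIONS` are hypothetical. In an LLM-based system the model itself would have to extract the slots and notice which are empty, which is exactly the hard part described above.

```python
from dataclasses import dataclass, fields
from typing import Optional, Union


@dataclass
class TripRequest:
    """Hypothetical "slots" extracted from a user's request; None means unknown."""
    destination: Optional[str] = None
    dates: Optional[str] = None
    budget: Optional[str] = None


# One targeted question per slot, instead of a generic "tell me more".
QUESTIONS = {
    "destination": "Where do you want to go?",
    "dates": "Which dates are you considering?",
    "budget": "Roughly what is your budget?",
}


def respond(req: TripRequest) -> Union[list[str], str]:
    """Return specific follow-up questions if slots are missing, else an answer."""
    missing = [f.name for f in fields(req) if getattr(req, f.name) is None]
    if missing:
        return [QUESTIONS[name] for name in missing]
    return f"Searching trips to {req.destination} on {req.dates} within {req.budget}."


print(respond(TripRequest(destination="Lisbon")))
print(respond(TripRequest("Lisbon", "June 3-10", "EUR 1500")))
```

The first call returns only the two questions for the empty slots (dates and budget); the second has every slot filled and produces an answer.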
I do expect these things to improve as we reach the limits of raw training data and model size. For real applications we're now primarily in a phase of second-order improvements (though first-order algorithmic improvements are still happening too).
This makes sense to me. LLMs would benefit tremendously from using clarification prompts. Instead, they spew output at whatever confidence level their creators deem good enough.