For the RAG example, I don’t think it’s the prompt so much. Or if it is, I’ve yet to find a way to get GPT4 to ever extrapolate well beyond the original source text. In other words, I think GPT4 was likely trained to ground the outputs on a provided input.
But yeah, you’re right, it’s hard to know for sure. And of course all of these tests are just “vibes”.
Another example of where Claude seems better than GPT4 is code generation. In particular GPT4 has a tendency to get “lazy” and do a lot of “… the rest of the implementation here” whereas Claude I’ve found is fine writing longer code responses.
I know the parent comment suggest it likes to make up packages that don’t exist, but I can’t speak to that. I usually like to ask LLMs to generate self contained functions/classes. I can also say that anecdotally I’ve seen other people online comment that they think Claude “works harder” (as in writes longer code blocks). Take that for what it’s worth.
But overall you’re right, if you get used to the way one LLM works well for you, it can often be frustrating when a different LLM responds differently.
I should mention that I do use a custom prompt with GPT4 for coding which tells it to write concise and elegant code and use Google’s coding style and when solving complex problems to explain the solution. It sometimes ignores the request about style, but the code it produces is pretty great. Rarely do I get any laziness or anything like that, and when I do I just tell it to fill things in and it does
But yeah, you’re right, it’s hard to know for sure. And of course all of these tests are just “vibes”.
Another example of where Claude seems better than GPT4 is code generation. In particular GPT4 has a tendency to get “lazy” and do a lot of “… the rest of the implementation here” whereas Claude I’ve found is fine writing longer code responses.
I know the parent comment suggest it likes to make up packages that don’t exist, but I can’t speak to that. I usually like to ask LLMs to generate self contained functions/classes. I can also say that anecdotally I’ve seen other people online comment that they think Claude “works harder” (as in writes longer code blocks). Take that for what it’s worth.
But overall you’re right, if you get used to the way one LLM works well for you, it can often be frustrating when a different LLM responds differently.