Ask HN: How do you prompt the "advanced" models
18 points by _jogicodes_ 11 days ago | 12 comments
I use the Windsurf IDE, which comes with integrated LLM chat and edit functionality. I switched to it two months ago, and for the three months before that I was using Cursor (a similar editor). In both, I have consistently had better results with Claude.

With the apparently more advanced reasoning models I thought that would change. In Windsurf I have both DeepSeek R1 and o3-mini available, and I expected them to improve the results I get from my prompts. They did not, far from it. Even though they consistently pull ahead of Claude 3.5 Sonnet in benchmarks, in practice, with the way I prompt, Claude almost always comes up with the better solution. So much so that I can't remember a single time where Claude couldn't figure something out and switching to another model fixed it for me.

Because of the discrepancy between the benchmarks and my own experience, I am wondering if my prompting is off. It may be that my prompts have become Claude-specific after using it for a while now. Is there a trick to prompting the reasoning models "properly"?






I think those benchmarks are all noise. Claude has so far been the only model I really trust and use in Cursor. All those fancy-pants reasoning models seem to just jerk themselves off and never really do anything better than Sonnet.

One thing I always make sure of is to never have it just spit out code right away. I go back and forth a few times to ensure alignment before I say “bombs away” and let it write code.


That's an interesting approach: so you chat first, make sure the model understands what you want, and only then switch it to write mode? I must try that approach on more complex stuff. That is kind of how you get Claude to reason, in a way, no?

Yes. It also helps to have it document a plan or spec and refer to that document.
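For example: "Before we write any code, save the plan we agreed on to a PLAN.md in the repo, and re-read it before every change." (The filename is just an example; any document the model can keep referring back to works.)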

Yes. If you were hiring an engineer and you asked them a coding question, what would you think if the very first thing they did was start writing code without asking a single question? They’d fail the interview instantly, right?

These LLMs seem to assume their job is to just write code instantly. No! Tell them not to do that: “No code yet, let’s make sure we are on the same page first. This is requirements and discussion; I’ll tell you when it’s time to write code”, and go back and forth on requirements and such. In fact, bake that into your “system prompt” or whatever your tool provides for setting up the pre-prompt. I will straight up tell it the same thing I just used as an example, about failing an engineer who jumps right into writing code. The LLM has plenty of training data about why that is a bad idea! You just need to “remind it” by telling it.
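Something like this as the pre-prompt works as a starting point (the wording here is just an example, adapt it to whatever your tool calls its rules or system prompt):

    You are pair programming with me on this codebase. Do not write any code
    until I explicitly say so. First ask clarifying questions about the
    requirements, constraints, and existing code, the way a good engineer
    would in an interview. An engineer who jumps straight into code without
    asking a single question fails the interview.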

These tool makers should honestly build that in by default, but I bet a lot of people expect the magic genie to start writing magic code by just reading your mind or something. If they had it ask all those pesky requirements questions first, I bet they’d start failing some idiotic benchmarks or something.

In addition, consider asking it if it has all the context it needs to help you. Sometimes it will say “no, can you let me see such and such class/file”.

Also, I make sure to include the full path and file name at the top of all my files to help the LLM, since I doubt that kind of metadata gets passed through otherwise. I’ll sometimes give it a “tree” of my entire project so it knows where all the files are.
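For example, the first line of every file is just a comment with its own path (the path here is made up):

    # src/billing/invoice_service.py -- full path so the LLM knows where this file lives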

Providing proper context is absolutely critical. This tool cannot read your mind! Tell it exactly what its role is and give it the context to do the job properly.

These things aren’t magic. Learn how they work a bit and then you’ll realize they are a really fancy command line interface or something… dunno how to describe it, but it’s just another way to interact with a computer. That’s all it is.


Is editing code really the endpoint for LLMs?

I suspect we'll get to a point where the "code" is just instructions, codified in a special markup file, and LLMs write the simplest, most KISS code you can think of, but code that is extremely secure: like direct database access with all the security constraints you define, always applied correctly. In other words, think of the actual code as a non-committed artifact that is only re-emitted when the descriptors change.

The long-term point of LLMs writing code isn't to give us human-quality code; it's to give us something we'd think of as assembly, but rigorously generated to satisfy all the auth requirements.


But the need for reproducible and auditable code will remain. If I take your idea full circle, first we’ll have this committed set of English prompts, then we’ll need to evolve some type of very explicit structure around the prompts to ensure the LLM interprets them exactly the right way, then we’ll want to make the instructions more efficient and easier to type than a long paragraph of English text, so we’ll create a more condensed markup syntax, and then we’re right back at… a programming language.

The long-term trend of programming languages is to get higher and higher level, so LLMs may be the next iteration.

We'll start seeing Java, Python, and Ruby code the way we see assembly now, with the structures you're describing as the actual program.


In Windsurf (and a similar FOSS tool, RA.Aid, which I develop), Sonnet is almost always the best model to drive the agent itself. The reasoning models really shine when you have some kind of logic problem, planning problem, complex debugging, etc. That's why we have our agent call out to the reasoning model only when it needs to "ask the expert" something. It works fairly well.
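Roughly, the pattern looks like this (a simplified sketch, not the actual RA.Aid code; call_llm and the model names are placeholders):

    # Simplified sketch of the "ask the expert" pattern; call_llm is a stand-in
    # for whatever chat-completion client you use.
    AGENT_MODEL = "claude-3-5-sonnet"   # drives the normal agent loop
    EXPERT_MODEL = "o3-mini"            # reasoning model, consulted sparingly

    def call_llm(model: str, prompt: str) -> str:
        raise NotImplementedError("wire this up to your provider's API")

    def ask_expert(question: str, context: str) -> str:
        """Hand one hard sub-problem (logic, planning, a gnarly bug) to the reasoning model."""
        prompt = f"Context:\n{context}\n\nQuestion:\n{question}\n\nThink it through step by step."
        return call_llm(EXPERT_MODEL, prompt)

    def agent_step(task: str, context: str) -> str:
        # Sonnet handles the ordinary edit loop...
        answer = call_llm(AGENT_MODEL, f"Task: {task}\n\n{context}")
        # ...and only escalates when it decides the problem needs the expert.
        if "ASK_EXPERT" in answer:
            answer += "\n\nExpert says:\n" + ask_expert(task, context)
        return answer

The key point is that the expensive reasoning call is a tool the agent reaches for, not the default driver.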

I'm usually an LLM project hater but gotta say RA.Aid looks pretty rad :) Kinda reminds me of Open Interpreter[1] from back in the day, but all grown up.

[1]: https://github.com/OpenInterpreter/open-interpreter


Claude represents some sort of inflection point or phase change. Despite all the noise, Claude Sonnet is by far the most impressive model since GPT-4.

You should only use reasoning models if your prompt actually needs reasoning (e.g. debugging a weird error or writing an optimized algorithm).

I’d love to know as well. The OpenAI models refuse to stay on task for me, gradually ignoring my instructions and doing whatever they want.




