Prompt Engineering Guide: Guides, papers, and resources for prompt engineering (github.com/dair-ai)
544 points by yarapavan 12 months ago | 149 comments

I've been developing a methodology around prompt engineering that I have found very useful:

From Prompt Alchemy to Prompt Engineering: An Introduction to Analytic Augmentation:


A few more edits and it's ready for me to submit to HN and then get literally no further attention!

How is this different from e.g. a python agent like https://github.com/hwchase17/langchain-hub/blob/master/promp... ?

Using the terminology that I'm working with, this is an example of a second-order analytic augmentation!

Here's another approach of second-order analytic augmentation, PAL: https://reasonwithpal.com

Third-order, Toolformer: https://arxiv.org/abs/2302.04761

Third-order, Bing: https://www.williamcotton.com/articles/bing-third-order

Third-order, LangChain Agents: https://langchain.readthedocs.io/en/latest/modules/agents/ge...

The difference isn't in what is going on but rather with framing the approach within the analytic-synthetic distinction developed by Kant and the analytic philosophers who were influenced by his work. There's a dash of functional programming thrown in for good measure!

If anything I filled a personal need to find a way to think about all of these different approaches in a more formal manner.

I have scribbled on a print-out of the article on my desk:

  Nth Order

  - Existing Examples [x] (added just now)
  - Overview []
  - data->thunk->pthunk []

This seemed interesting. So if I get your idea correctly, rather than talking to the chatbot directly, you first run your prompts through some algorithm which increases the chances of the AI getting what you are asking and giving a successful result?

It's more like: instead of asking a chatbot to hallucinate/synthesize/guess answers to things like large math questions or the population of Geneseo, NY (which it is bad at), you introduce a computing environment [read: eval(completionText)] so it can instead translate questions into executable code or further LLM queries with a provided context.
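A toy sketch of that idea, with a hard-coded stand-in for the model's completion:

```python
# Sketch of the "computing environment" approach: rather than asking the
# model to guess the answer, ask it to translate the question into an
# expression the host can evaluate. The completion below is hard-coded
# as a hypothetical stand-in for a real LLM response.
question = "What is 123456 multiplied by 789?"
completion_text = "123456 * 789"  # hypothetical model output

# eval() mirrors the eval(completionText) mentioned above; a real system
# would want a sandboxed evaluator rather than raw eval.
answer = eval(completion_text)
print(answer)  # 97406784
```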

This is currently how a number of existing tools work, both in published literature and in the wild with tools like BingChat.

I have personally found this analytic-synthetic distinction to be useful. I'm also a huge fan of Immanuel Kant and I really love the idea that I can loosely prove that Frege was wrong about math problems being analytic propositions!

The whole prompt engineering thing feels like a temporary stopgap. Ideally, language models should be able to understand prompts from anyone without users having to craft them in a way that works for the model.

Not really. It's just a higher level programming language. You still need to know how to decompose a problem and structure the solution. On some level, also systems thinking as this stuff will get integrated together.

It's not higher level programming, because it's imprecise. We don't know how input will impact output, or to what extent, or why. If prompt engineering is programming, so is poetry. Which, I suppose you could make an argument for (calling forth subjective imagery in the reader's mind), but at that point you're sufficiently stretching definitions to the point they mean nothing.

I have been using ChatGPT to test its ability to create JSON on the fly, just with a natural language description of the exact data format, and a natural language description of what I want to go in that object. It even escapes the string properly if it contains special characters, from what I noticed.

Other than complicated requests like “if object is of type A, include fields ABC but not D. If object is of type B, include only D but not other fields”, it gets this right 99% of the time.

It also works for CSV, but it’s trickier. It seems like it “knows” how JSON works to a much better extent.

And as for parsing JSON? I’ve not truly pushed it to its limits yet, but so far it’s had no issues understanding any of it.

It’s mind-boggling. Yes, it’s inefficient, but it can basically parse, generate and process valid JSON with just a brief set of instructions for what you want to do. For exploring ad-hoc data structures or quick mocking of API backends, this is great.
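For what it's worth, a minimal way to sanity-check that kind of output (the completion string here is a made-up stand-in for a real model response):

```python
import json

# Hypothetical model completion; note the escaped quotes inside the
# string value, the kind of special-character handling described above.
completion = '{"type": "A", "title": "He said \\"hi\\" to me"}'

# json.loads raises json.JSONDecodeError if the model emitted invalid
# JSON, which makes it a cheap validity check for ad-hoc structures.
obj = json.loads(completion)
print(obj["title"])  # He said "hi" to me
```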

Try evaling completed Python dicts! Works like a champ…
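A slightly safer sketch of that (the completion is hypothetical): ast.literal_eval instead of raw eval.

```python
import ast

# A hypothetical completion containing a Python dict literal.
completion = "{'name': 'Geneseo', 'state': 'NY', 'population': 12345}"

# ast.literal_eval only accepts literals (dicts, lists, strings,
# numbers, ...), so unlike eval() a malicious completion can't execute
# arbitrary code.
data = ast.literal_eval(completion)
print(data["state"])  # NY
```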

> It's not higher level programming, because it's imprecise.

This argument is weak. Undefined behavior does exist, and "high-level programming language" is a moving target.

Not only that but throwing away precision is an intrinsic part of becoming higher-level. This is just a different way of doing it.

Is “imprecision” that the previous commenter is reacting to maybe more specifically the strong potential for nondeterministic behavior exhibited by LLMs? That would seem to stretch the practical experience of programming vs. “lower-level” tools like Python or C. (Also, a world where I am calling Python “lower-level” is wild).

Prompt generation feels like it's going to reach a point where it becomes more similar to legalese — which in itself feels more similar to a programming language than natural speech.

why can't the model do those things? As someone said, it's like Google-fu. It used to be the case that you had to give Google some context to disambiguate certain queries, but now it is really good, so I don't have to write anything crazy like site:wikipedia.com etc.

My experience with Google is the exact opposite. It is so poor at interpreting "what I really want" that for it to be really useful, I need to lean harder on google-fu than in the beforetimes. But Google has removed, weakened, or broken many of the operators that were used to do this, so google-fu isn't as possible as it used to be.

Because most people are so unbelievably boring, so statistically predictable and common, that taking the edge off queries leads to higher click-through rates overall. It's the same pages all the time, and most people don't even bother to scroll to page 2. Of course, people who know what they want, and are best served with diversity and breadth - like you ;) - lose out.

I am absolutely not affiliated but you should seriously consider giving Kagi a try.

I was increasingly frustrated with all the NLP-ness and operator deprecation in Google, which has been accelerating since at least the 2010s.

But Kagi really makes me feel like I am back at the wheel. I think for product search it still has some way to go, but for technical queries it is just on a whole other level of SNR, and it actually respects my query keywords.

Huh, my recollection is the exact opposite. I remember the good old days when I could use inurl: and link: to explore a website's contents fully and drill down further if necessary, compared to now, where Google always seems to think it knows better than you what you are looking for. If you are not happy with the initial results it gave you, you are pretty much out of options; good luck trying to drill down to some specific thing.

Large language models will probably never be reliable computers. The solution is either providing translation examples (aka in-context learning, few-shot) or fine-tuning a model to return some symbolic method that tells an associated computing environment to evaluate a given bit of math or code.
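A minimal sketch of what those translation examples could look like (the prompt text and numbers are invented for illustration, not from any published prompt):

```python
# Few-shot prompt that teaches the model to translate questions into
# evaluable Python expressions instead of answering directly.
FEW_SHOT_PROMPT = """\
Translate each question into a Python expression.

Q: What is 17 plus 25?
A: 17 + 25

Q: What is 12 percent of 250?
A: 0.12 * 250

Q: {question}
A:"""

def build_prompt(question: str) -> str:
    return FEW_SHOT_PROMPT.format(question=question)

prompt = build_prompt("What is 345 times 67?")
# The model's completion (e.g. "345 * 67") would then be handed to the
# associated computing environment to evaluate, rather than guessed at.
```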

In many ways this is sort of what humans do. They have something they don't yet know how to do and they go away and learn how to do it.

It definitely feels like I have to work to maintain the state of each algorithmic step when I'm multiplying two three digit numbers in my head. It's a lot easier if I can maintain that state on a piece of paper.

I'm definitely not calculating the path of a projectile in a similar manner when I catch a ball.

I'm definitely not "computing" a sentence when I read it in the same way that I compute the multiplication of those two three digit numbers!

>I'm definitely not calculating the path of a projectile in a similar manner when I catch a ball.

You're not calculating it in a traditional sense, but there are definitely systems of partial differential equations being solved in real time.

Because prompts are too general to solve most problems.

Prompt: "Calculate the average price of Milk"

This is far too vague to be useful.

Prompt: "Calculate the average Price of Milk between the years 2020 and 2022."

A little better but still vague

Prompt: "Calculate the average Price in US Dollars of 1 Gallon of Whole Milk in the US between the years 2020 and 2022."

Is pretty good.

For more complex tasks you obviously need much more complicated prompts and may even include one-shot learning examples to get the desired output.

> why can't the model do those things?

Isn't this just an intrinsic problem with the ambiguity of language?

Reminds me of this: https://i.imgur.com/PqHUASF.jpeg

edit: especially the 1st and last panels

Like an advancing army, chatgpt will soon write prompts for you better than any of this new crop of nontechnical wannabe shamans. This little wave of prompt gurus trading for attention their insights into how to game systems they can't comprehend has about as much to do with programming as does slinging shitcoin tips.

ChatGPT has certainly helped me refine some prompts. It’s 50/50 whether I’d make the same changes more quickly just rewriting the prompt by myself, but it helped me notice some blind spots in the prompt and get me thinking about how to fill them in.

Agreed. Much like google fu there is still likely to be some skill involved, but calling it "engineering" seems a bit much

Proper prompt engineering would likely involve finding emergent properties like this:

https://github.com/dair-ai/Prompt-Engineering-Guide/blob/mai... (this is claimed for LLMs, not proven)

It only seems like a trick until enough papers get written about these kinds of findings.

This seems similar to the examples given on the GPT-3 playground.

Agreed, "prompt design" (for example) would've been a better characterization. I'm sure people are already putting "prompt engineering" on their resumes.

It sounds better than “asking leading questions”.

and better than “can type to AI”

Google also defines "engineering" as:

2. the action of working _artfully_ to bring something about. "if not for his shrewd engineering, the election would have been lost"


Merriam-Webster has:

3 : calculated manipulation or direction (as of behavior)

giving the example of “social engineering”


Random House has:

3. skillful or artful contrivance; maneuvering


Webster's has:

The act of maneuvering or managing.


In Canada you can't call yourself an engineer unless you've actually been certified as an engineer. In the US, engineering programs worth anything are typically ABET certified. We should stop throwing that word around like it doesn't have a centuries old profession associated with it. It cheapens what actual trained engineers do and have done to earn their training.

The underlying problem is that human language is imprecise. Even humans frequently misunderstand one another, because very often there are multiple valid ways to interpret a statement.

Ithkuil, however, is very precise :)


I disagree because when I want to get results from flesh-and-blood humans, I still have to engineer my prompt very carefully (and often get it wrong!).

This includes asking questions, and trying to direct someone to effectively complete a task.

Prompt engineering is just communications skills as applied to AI instead of meat-minds.

I think as of now one big difference is that when talking to a competent flesh and blood human, they will ask (or at least try) the right questions before giving an answer.

E.g. when talking with my accountant, she usually asks me a bunch of clarifying questions and numbers, instead of just making a best-effort but confident-sounding response with whatever initial context I gave.

I myself don’t have to have the depth of knowledge to direct someone to complete a task step by step to get the right results.

Very interesting point.

Perhaps a big step forward for chat AI interfaces is to make the AI capable of knowing when it needs to ask follow-up questions and then having it do so. Essentially, it helps you along with your prompt engineering.

Perhaps this is what the prompt engineering tools are doing anyway.

What we need to do is integrate the prompt engineering tools with the chatbot itself, so it can both help extract a good prompt from users and then answer that prompt in the same process.

I think this is where we'll move towards relatively soon; it seems obvious when you say it.

If anything, we're all learning pretty quickly that a chatbot interface is probably not the ideal way for interacting with an LLM!

I've seen this argument a bunch recently, and I firmly disagree with it. I just published this explaining why: https://simonwillison.net/2023/Feb/21/in-defense-of-prompt-e...

That's what I'm wondering: whether using an LLM to create prompts for the LLM is a few steps away from integrating that into the LLM and unlocking recursive phenomena and emergent behavior.

This is actually the premise behind Anthropic's Constitutional AI (https://arxiv.org/abs/2212.08073) where they have a set of principles that guide a workflow where the LLM creates the prompts and the critiques.

They were recently given $300M by Google, so they certainly have a promising idea there.

Ah, so Google's response to the OpenAI+MSFT collaboration...

We don't have to do that to communicate with humans because they aren't one-shot. Humans ask follow-up questions like "Polish as in people from Poland, or polish as in rubbing until it's shiny?". Or they know you both work for a translation company and you don't talk about shinifying objects very often.

If you did have to send google-esque one shot queries to humans you'd probably settle on short hand like "polish shiny" or you'd opt to use "poland" specifically to avoid it. For most known-ambiguous terms you'd do this, the same way we say out loud "M as in Mancy" because we know the sound of the letter can be ambiguous sometimes. We have lots of these in English that you use all of the time without knowing it. In a smaller audience the local language gets even more specific, you probably disambiguate people with your friends like "programmer Bob, not carpenter Bob". It's not at all crazy that we'd develop another local language for communicating with computers even if that's not a traditional programming language

Except we do that for humans and it's still not really relevant to language models.

If I were talking to a Python programmer, I could assume they knew what a for loop was, so I could phrase a question requiring that context differently than I would for a layperson. Just like, if I were designing inputs to one language model, I could assume its capabilities are different from another's.

I don't think it's relevant to the performance of language models for that same reason, though; we already have to design our queries with the thing being queried in mind, so I don't see why we wouldn't for LLMs.

I’ve been able to make ChatGPT ask for follow-up clarifications if it’s unsure about the meaning of something. I didn’t explore this further, but the ability is right there and can be triggered with the right prompt text.

This IMO is the key distinction between "true" AI/AGI and the current generation of LLMs. ChatGPT cannot understand what you are telling it. It does not have a brain and cannot speak conversational-level English. This is why the entire area of prompt engineering needs to exist. Whether it is a stopgap or not depends on if we can actually make these models understand what they are asked rather than just pattern match.

I fail to see the distinction you're trying to make. Creating the input for the LLM to get a specific result is crafting an input in a way that works for the model, and it's something we even do between different people (you probably have different ways to ask your boss and your spouse to do something).

Even if it doesn't become obsolete as an activity, the knowledge you build around it becomes obsolete faster than in almost any other tech sector. Whatever you learn by trial and error now about one particular model will be completely outdated two years from now.

It’s not, it’s what all these services are doing behind the scenes so users don’t have to.

Ideally, people should be able to understand sentences from anyone without the speaker having to craft them in a way that works for the listener.

But they don't.

"Prompt Engineering" is a fictional attempt to create space for human intermediaries, and arguments against this are weak and pointless. Disregarding where it is _today_, the goal obviously is for natural language to produce the desired output. It happens to fail at that today, but poking around at whatever arbitrary, unpredictable mutations to the input will get you there (without explanation) isn't a skill.

Even the becoming-cliche "humans don't know what they want" argument is its own counterargument. You're right, humans do not know how to precisely ask for what they want, and yet manage today to navigate around this all the time without learning a new way to communicate.

My opinion on this comment is that it is completely wrong.

1. Currently, Prompt Engineering works, and that alone is a reasonable reason for people to explore it. The concept of doing nothing about things that work today because there might be a day when they become obsolete makes no sense at all.

2. Prompt Engineering is important and meaningful. Most people are thoroughly incompetent at giving instructions. And it's a core skill of projects. An incompetent person will fail, no matter how competent their subordinates are.

3. PE is something that will be needed even as AI gets better (until we have omnipotent superintelligence). In fact, even giving instructions to humans requires PE. It's a limitation of human language.

4. I think it's also wrong that people's motivation for exploring PE now is an attempt to make room for humans. Have you ever been to an AI drawing community?

5. It's not PE that should exist for human intermediaries, it's the skills to handle AI, including PE. If humans no longer need the skills to deal with AI, it will be AGI by definition. If what you're saying is that when AGI comes along, all humans will be unnecessary, that might be true, but then what you're saying is meaningless.

God we're already making it an acronym?

The goal is to not need a translation layer, sure, but the reality is that we have translation layers, and the AI is not smart enough to understand when it can't accurately respond to a prompt and say so. It's the translator that has the understanding that diffusion won't come up with Santa giving a massage, only receiving one.

Translation is a skill, and translators remain employed today despite ml improvements in automated translation over the years.

Humans do talk in different variations of language for different tasks. Eg. Legalese and code switching.

I'm not sure how you're defining skill, but it seems like you can get better at prompt engineering, and some people are consistently better at it than others. I'd call it a skill, just like framing a search query in Google is a skill. Sorry if that's weak and pointless.

The problem is calling it "engineering".

Sounds more legit than "prompt wizardry"

Agree, it's such a word people here have strong opinions about.

But what is your suggestion? I feel it would be received more easily if it was "LLM Prompt Cheat Sheet" or something, though I wouldn't have seen it on HN.

Prompt framing? Prompt optimization?

I’d go with prompt hacking or prompt bashing

you can get better at astrology, I'd still advise against carving out a niche there. Claiming you can tame a hurricane of untraceable parameters with guesswork, on rapidly changing systems whose designers have the goal of obviating the need to format queries in magical ways, is dicey at best.

I think you're basically saying that this thing is a) a black box, and b) more complex than any human can pretend to understand, even if we cracked open the box and looked inside.

But, that does not close off the possibility of learning what ways we can poke and prod it to affect the results.

It would be interesting to run an experiment to test whether some people are consistently better at generating desirable results from an AI model. My money would be that they can, even at this early stage of "prompt engineering" as a discipline, let alone 5 or ten years from now.

You may also be saying "don't call it engineering, it feels more like black magic," a position I would be sympathetic to. But, I think a lot of realms of engineering deal with uncontrollable elements we don't understand and have to just deal with, with increasing levels of control as our disciplines become more sophisticated.

I actually appreciate the good-faith rephrasing, and it's correct, but I think my precise concern failed to come across. It's not "local man frowns upon change"; it's that I think we've created the mirage of a place to add value that is very temporary but may look attractive to people perhaps eventually replaced by this very tech. It's a sweet "don't worry" song while an absolutely exploding area of innovation is actively trying to make it exist less.

Yes the end goal is for the computer to perfectly understand every nuance of a very ambiguous human language, and everyone understands that, but we aren't there yet and may never get there. Until then, prompt engineering is not "fictional" but exists to fill in a real gap in the system.

Just like you can get better at searching things with Google or explaining things to a human, you will be able to get better at getting what you want out of ChatGPT.

I don't see why you find that controversial.

It's not controversial, it's just clear this will in the not-too-distant future be relegated to the same sort of high school level introductory course that introduces a person to saving files on google drive. I worry that people are believing this will be a thriving space and a new frontier for tech talent, when really they exist to the extent current interfaces are failing. And current interfaces are improving at impressive speed.

I've seen a boom of people selling ebooks, etc. with "46 Revolutionary ChatGPT prompts" with the idea that someone found some cool behavior that could be economically useful.

However, I think this kind of 'engineered prompt' sharing is about as useless as sharing the engineering calculations for building a bridge over a particular river: none of those calculations are generalizable, and all must be redone for each future placement.

Chances are you'll need to tweak any purchased prompt to fit your own use cases, and if you are capable of tweaking purchased prompts, why can't you just chat with the bot for an hour or two and build a working prompt for yourself?

I think the value of "prompt engineering" as a skill is all about adaptability and testing and verification processes: what is the process of going from idea to functioning prompt for a particular use. Silver bullet prompt fragments are cool, but not really products - the only people buying these are suckers who don't understand how to use chatGPT. Because chatGPT will happily help you make these prompts for free.

FWIW, I watched a relatively trivial video on using ChatGPT a couple nights ago that had like 6 prompts, and while I had seen people do some of them before it was still useful: it showed me how people were using the project and massively accelerated getting to useful results. People buy books of suggestions for how to do literally everything from song writing to project management... is it really surprising that how to get the most value out of talking to an AI--something many people might find a bit strange and unnatural--is something that you could help people be better at? If the same eBook had been "46 revolutionary prompts for your next team brainstorming meeting" or "46 revolutionary ice breakers for your next party" would we even bat an eye?

Do you have a link to the video?


Honestly, my favorite moment has nothing to do with the thesis of the video: it is when she says "zero-shot chain of thought" and stops talking for a moment to comment about how that rhymes, as it definitely only rhymes because of her accent; but I'm really into linguistics and, in particular, phonology, so, for me, that kind of moment is magical.

But like, the point is: if you watch that video and replace the idea of you talking to ChatGPT with talking to one of your coworkers, this is a totally legitimate video someone would make. You can't quite just use a book of brainstorming ideas, though, with ChatGPT, as it has some fundamental limitations to how it understands certain kinds of concepts or language, and so seeing "oh that path worked well" is practically useful.

Interesting. I found most of what was described in this video fairly obvious. The prompt length one was useful non-obvious insight, but the rest of it I think is pretty easy to figure out on your own if you play with ChatGPT for a few evenings. But could certainly see it being useful to chatGPT novices.

I don't doubt that there is value in sharing useful workflows to learn how to write prompts or useful things to add to prompts, but I've seen a lot of people selling a 'list of tuned prompts' as if each was a plug-and-play deployable software stack. I think anyone who tried to actually productize these prompts would find the need to tweak the prompts so much they might as well have started from scratch.

In short, I think there is value in teaching people how to write prompts to custom-fit their needs, but the idea of general purpose couple-paragraph superprompts that mere knowledge of is worth tens of dollars seems fallacious to me.

Disclosure: I've never bought any of these, maybe they are that good. I'm very skeptical though.

I just don't think that most people--maybe you ;P--would say the same thing about a book about how to brainstorm with a team better, as that's all you're doing when you are talking to an AI... and yet people buy those books all the time and I certainly read a few of them in my early 20s. Was it a great, earth-shattering video? No. But seeing someone quickly get concrete value from something is still of a lot of practical value for seeing how it strings together.

That's the cot-caught merger. Probably for most Americans at this point it rhymes.

There's lots of people trying to make a buck off this shit.


I just published this, as a counter to the "prompt engineering will be made obsolete as AIs get better" argument that I keep seeing: https://simonwillison.net/2023/Feb/21/in-defense-of-prompt-e...

We need a name for this, something like genie effect, but that doesn't sound creative enough.

But working off the fables of the genie: we have an all-powerful wish machine, where the user mistakes what they are telling the genie for the intention of their thoughts, whereas the genie is looking at all possible decodings of the words they are saying without knowing the user's thoughts. If you do not verbalize exactly what you want (and have done some thinking about the ramifications of what you want), you might end up squished under a pile of money, or in some other totally ridiculous situation.

Unless you share your entire life with the AI/genie you will always have to give more detail than you think. Hell, in relationships where people live together for years there is always the possibility for terrible misunderstandings when communication breaks down.

The age-old bane of software requirements: "It's exactly what we asked for but not what we want."

I've used an analogy of magic and spell casting to prompting in the past, but I've since been convinced that it's risky to describe it like that because it implies that this stuff is all-powerful and impossible to understand: https://simonwillison.net/2022/Oct/5/spell-casting/

“Any sufficiently advanced technology is indistinguishable from magic” -- A.C.C

I'm wondering if we'll reach a point where it is impossible or nearly impossible, and what value of 'understand' we assign to it. For example, deeply understanding x64 processor architecture: does any one person understand it all? Very unlikely at this point. The investment the average person would have to make to accomplish even part of that would lead most of them to seek other, more fruitful endeavors. Biology would be a good example of this. Nothing about biology is magical, and yet even simple systems contain such an information density that they are at the point of being impossible to fully understand, and that's before we start layering them into systems that have emergent behavior.

It seems to be a theme with AI.

Scale.ai (no affiliation, not a customer) has a product called Spellbook


Yeah, I've definitely had informal conversations with AI experts where they've talked about collecting spells for their spell books!

This was the theme of Bedazzled, the Peter Cook and Dudley Moore film (remade in 2000), itself a twist on the Faust legend.

In this case though, the increasingly desperate engineering of the prompt was each time intentionally side-stepped by the devil, though much like GPT, he couldn't help being like that, it was just his nature.

> We need a name for this, something like genie effect

Investigating why the prompt isn't working:


This is great! More experiments are key. I know, I know, many of you are getting flashbacks to physics lab, but come on, it's totally different when you're coming up with the experiments yourself! :D

As for some philosophical inspiration, the analytic philosophers like Kant, Frege, Russell and early Wittgenstein have methods for breaking down natural language into (possibly) useful elements!

Like, everyone speaks of "context" in terms of these prompts... how similar is that to Frege's context principle?


Some other Wikipedia links:



In Frege's The Foundations of Arithmetic he disagrees with Kant that a proposition like "2 + 12 = 14" is synthetic (and Kant used large numbers as an example of how analysis would fail!!!). However, once we have a large language model we can experimentally show that math computations are indeed synthetic. Kant was right!

I actually did read through your "Prompt Alchemy to Engineering" link and the reference to Frege took me very off guard. Then I thought about it a little more and decided mathematical platonism felt even more fitting to me now.

I would love to hear some more of your thoughts, please expand on what you mean!

The more seriously I see people take the topic of "prompt engineering" the more deeply depressed I become about this field.

It feels like tutorials on tailoring the emperor's new clothes.

At the same time, I feel it's only a matter of time until my job interviews involve questions like "Can you provide a few examples of where you've used prompt engineering to solve a problem? Describe your prompt engineering process," and I'm just not sure I can fake it that hard.

I feel exactly the same way. Let’s do our part and not allow “engineering” to be co-opted in this manner and completely lose all meaning. I’ll settle for “prompt trials”.

I feel the same way about “data science” because there’s no “science” in it; we could just call it statistics or whatever. No data science expert is actually called a “scientist” by society, because we all know it’s not really science.

The popular confusion/conflation of "science" with "engineering" has long been a pet peeve of mine. They are very different things.

If you publish work in a journal or conference that's meaningfully important in your field you are a scientist.

Seems like you forgot that Mathematics is a field that has journals and conferences.

I'm trying to look at it as a powerful new tool like modern IDEs that can make a lot of people's workflow more efficient. If the question comes up in an interview, they're just asking if you have experience with this new tool their team uses, and comparing how you use it to their process. Similarly not having experience with an IDE like Visual Studio might not be a dealbreaker, but if there's another candidate with a ton of experience then it could tip the scales in their favor. Long way of saying, I would try to stay optimistic and just play around with it enough so you can at least bullshit about it if you still don't want to use it.

Be careful what you wish for; you might just get "artisanal handcrafted certified cruelty-free prompts" instead.

We found that having pre-determined sections is indeed very useful.

One for context, one for rules, one for output format, one for input data, etc.

Also, you can ask for structured output (GPT spits out JSON fine), but you need to be super explicit about each field's possible values and the conditions under which they appear.

Templating prompts with jinja goes a long way for testing.

And you will need a lot of testing, if for nothing else than to remove variability in answers.

It's funny because people keep saying GPT will remove all the value of having writing skills, but being a good writer helps a lot with crafting good prompts.
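A minimal sketch of that sectioned layout (assuming a Python caller, with str.format as a lightweight stand-in for jinja templating; the section contents are illustrative):

```python
# A template with the pre-determined sections mentioned above:
# context, rules, output format, and input data.
TEMPLATE = """CONTEXT:
{context}

RULES:
{rules}

OUTPUT FORMAT:
{output_format}

INPUT:
{input_data}"""

def build_prompt(context, rules, output_format, input_data):
    # str.format stands in for jinja rendering here.
    return TEMPLATE.format(
        context=context,
        rules=rules,
        output_format=output_format,
        input_data=input_data,
    )

prompt = build_prompt(
    context="You are a customer-support assistant.",
    rules="- Answer in one sentence.\n- Never invent order numbers.",
    output_format="JSON with an 'answer' field and a 'confidence' field ('low' or 'high').",
    input_data="Where is my package?",
)
```

Keeping the sections in one template makes it easy to vary a single section while holding the rest constant during testing.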

I've been writing prompts with the intent of producing JSON that contains JS functions to evaluate. Since the response needs to be parsed and evaluated in a specific structure it means that changes need to be made on the evaluation side as well as the prompt template side.

So I've been writing the translation examples (few-shot examples [not my favorite term]) as TypeScript, transpiling to JS and compiling with other examples and a prelude, and using the same method to build the translation examples as to build future prompts. It saves a lot of silly mistakes!

JavaScript has fewer tokens than TypeScript, so it seems more economical to convert to JS beforehand instead of passing TS to the LLM! I wouldn't be surprised if TS resulted in better solutions, though... add it to the endless list of things to test out...

The problem with that approach is that it is very hard to verify that the output is consistent. If you get pure JSON, you can check it against a schema and regularly check that ChatGPT is not outputting BS (which it does easily). With Turing-complete code, it's way harder.
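A sketch of that schema check, hand-rolled here with the stdlib rather than a real JSON Schema validator (the field names are hypothetical):

```python
import json

# Expected shape of a completion; stands in for a proper JSON Schema.
EXPECTED_FIELDS = {"answer": str, "confidence": str}
ALLOWED_CONFIDENCE = {"low", "high"}

def validate_completion(raw):
    """Parse a completion and fail loudly if it drifts from the schema."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    for field, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError("bad or missing field: " + field)
    if data["confidence"] not in ALLOWED_CONFIDENCE:
        raise ValueError("confidence out of range")
    return data

ok = validate_completion('{"answer": "42", "confidence": "high"}')
```

Running this over every completion catches silent drift early; nothing comparably cheap exists for verifying arbitrary generated code.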

The approach I've been using is to test against a corpus of questions with known answers. Using sample-and-vote, that is, asking for some number of varied completions at a non-zero temperature and treating the explicitly computed solutions as votes, has smoothed over issues with basic syntax and logic errors.

This is pretty similar to the approach used in the Toolformer paper, other than sample-and-vote, which I believe is novel.
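A sketch of the sample-and-vote loop as described, with `sample` and `execute` as hypothetical stand-ins for the model call and the code runner:

```python
import itertools
from collections import Counter

def sample_and_vote(sample, execute, n=5):
    """Draw n completions at non-zero temperature, explicitly compute
    each candidate solution, and return the most common answer."""
    answers = []
    for _ in range(n):
        try:
            answers.append(execute(sample()))  # run the generated code
        except Exception:
            continue  # syntax or runtime errors simply lose their vote
    if not answers:
        raise RuntimeError("no completion produced a runnable solution")
    return Counter(answers).most_common(1)[0][0]

# Stub "model": two of three samples translate the question correctly.
samples = itertools.cycle(["137 * 41", "137 * 4", "137 * 41"])
answer = sample_and_vote(lambda: next(samples), eval, n=3)
```

The occasional bad translation gets outvoted by the correct majority, which is what smooths over one-off syntax and logic errors.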

In this context what does temperature mean?

Temperature is an option passed to the LLM. A temperature of 0 means the response is deterministic, that is, it always returns the same completion. A temperature greater than zero results in increasing variation in the completion.
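Concretely, temperature rescales the model's logits before sampling; a small sketch of that softmax rescaling (a temperature of exactly 0 is usually treated as a greedy argmax rather than a literal division by zero):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Low temperatures sharpen the distribution toward the top token;
    high temperatures flatten it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.1)  # nearly all mass on the top token
hot = softmax_with_temperature(logits, 10.0)  # close to uniform
```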

Essentially, what large language models allow us to do is move through an n-dimensional space using only words. Somehow we need Google Maps for semantic spaces, but to do that we need a strong definition of what a "semantic space address" is and what "semantic roads" look like.

This is a cool way to conceptualize it. When conversing with humans, you can explore various parts of the semantic space, but the process also involves a range of top-down and/or bottom-up mechanisms that facilitate a kind of 'hidden navigation'. LLMs, on the other hand, lack this capacity but with the right prompts, one can simulate it by knowing how to steer, which is sort of what prompt engineering feels like to me.

"engineering" is an interesting way to frame "throwing things at the wall and seeing what sticks"

Yeah, this seems a lot like calling management "mandate engineering" or something. You don't really understand the processes by which it's working, so referring to anything related to using an LLM to get useful outputs as "engineering" seems a bit high-flown.

Um, I think you may be slightly confused about many physical processes and how they are engineered. There are plenty of problems where we understand that if we do A between the bounds of (X and Y) then everything works, and if we go out of those bounds things break. When building the process you don't have to understand why A works at all as long as you stay in those bounds, you just have to design the process either not to exceed them or to stop and give a warning if it does.

I'm engineering this conversation with you right now.

Welcome to Sorites Paradox

Yeah, but we already did it with software engineering.

Really?... Isn't the 'underlying' skill ( and it was hard for me to write that last word )... being able to communicate? Are we really going to call this anything but the ability to write one's thoughts intelligently?

One recent paper to be added to this list of Approaches/Techniques is:

Toolformer: Language Models Can Teach Themselves to Use Tools


The authors show a valuable method of using prompts to generate training data to be used for fine-tuning.

Check out Appendix A.2 in the paper for the example prompts.

I much prefer leisurely engineering

Came here for this joke - thank you :-)

The prompt engineers will be the first self-proclaimed engineers to be replaced by AI.

I've found that context is everything for getting consistently good output, and augmenting your prompts with known truths via SerpAPI, embeddings, and a vector DB brought really flaky results into the >90% accuracy threshold.
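A toy sketch of that grounding step, with an in-memory list standing in for the vector DB and hand-made vectors standing in for real embeddings (the facts and numbers are illustrative only):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def augment_prompt(question, question_vec, store, k=1):
    """Prepend the k nearest known truths to the question.
    store: list of (fact_text, fact_vec) pairs."""
    ranked = sorted(store, key=lambda item: cosine(question_vec, item[1]), reverse=True)
    facts = "\n".join(text for text, _ in ranked[:k])
    return "Known facts:\n" + facts + "\n\nQuestion: " + question

store = [
    ("The village of Geneseo, NY has a population of roughly 10,000.", [0.9, 0.1]),
    ("Paris is the capital of France.", [0.1, 0.9]),
]
prompt = augment_prompt("What is the population of Geneseo, NY?", [0.8, 0.2], store)
```

The model then answers from the retrieved facts instead of guessing, which is where the accuracy gain comes from.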

As an aside - does anyone have good tools or methods for testing and evaluating prompt quality over time? Like performance monitoring in the web space, but for prompt quality. The techniques that use LLMs to evaluate themselves have always seemed flaky when I've tried them; I'd like to use a more grounded baseline.

For example, if you have a prompt that says "What is the weather today in {city}?", you can run it against a list of cities and expected outputs (using a lookup to some known truthful API). That way, when you make changes to the prompt, you can compare performance to a baseline.
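A sketch of such a harness, with `ask_llm` as a hypothetical stand-in for the real model call and a stub wired in for demonstration:

```python
def score_prompt(template, cases, ask_llm):
    """Run a prompt template over known inputs and return the pass rate.
    cases: list of (variables, expected_substring) pairs."""
    passed = sum(
        1 for variables, expected in cases
        if expected.lower() in ask_llm(template.format(**variables)).lower()
    )
    return passed / len(cases)

def stub_llm(prompt):
    # Deterministic fake completions, for demonstration only.
    return "It is sunny in Paris today." if "Paris" in prompt else "No idea."

cases = [({"city": "Paris"}, "sunny"), ({"city": "Oslo"}, "rain")]
accuracy = score_prompt("What is the weather today in {city}?", cases, stub_llm)
```

Tracking this pass rate per prompt version gives the baseline to compare against after each change.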

We personally came across the evaluation problem while building MakerDojo[1]. My current workflow is to manually run 50 different test cases and compare the results against the previous versions of the prompt. This is extremely time-consuming, and to be honest, I no longer test every little change.

Some more context - as a way to support MakerDojo[1], we are building TryPromptly[2], a tool for better prompt management. In that tool, we are building the ability to create a test suite, run it, and compare the results. At least knowing for which test cases the results varied, and reviewing them, would go a long way.

Here is the format we are thinking: https://docs.google.com/spreadsheets/d/1kLBIb7W0jrY-IkNPqJsN...

In addition, what we are about to launch after the test suite is live A/B tests in production based on user feedback. Users can upvote or downvote their satisfaction with the results, and that will tell you which version of the prompt yielded better results.

If you have other ideas on how to test them better, it would be super helpful to us.

[1] MakerDojo - https://makerdojo.io [2] TryPromptly - https://trypromptly.com

These guides are going to feel so dated ten years from now. It's like reading a guide on how to use a stylus for a palm pilot.

Yes and no.

Human like AI isn't going to be magic because humans are not magic. You are still going to have to comprehensively communicate with your AI so it understands your reference frame.

If you walked up to the average programmer on the side of the street and threw a programming fragment at them, especially if it's a difficult problem, they would have a whole load of follow-up questions to place the issue you want to solve. Coming at a human or an AI with expectations and limitations first will almost always lead to a faster and better solution.

Tangential: has someone tried to have two ChatGPTs chatting together? Like the good old joke that consists of dialing two random persons and having them talk to each other.

Super grateful for this guide!

If there's anyone here who's become good at perfecting llm prompts & available for a freelance contract -> please contact me (my email is in my profile)

It's easy-ish to get gpt to generate "good enough" text, obviously. Like any tool, what's interesting and complicated are the use cases when it doesn't follow instructions well / gets confused, or you're looking for a more nuanced output.

I’m someone who hasn’t jumped onboard this space—isn’t this “prompt engineering” concept just for OpenAI’s ChatGPT or is there some broader reason to expect prompt engineering to be relevant? I’m extremely hesitant to rely on ChatGPT or its current interface as a tool since it exists at the mercy of a massive profit-hungry corporation. It feels too early to start inventing formal concepts against a single black box API.

One of the emergent properties of large language models is that they allow for "in-context learning" by providing "zero-shot", "one-shot", or "few-shot" training examples. This seems to be consistent across all sorts of transformer-based models with language capabilities. Certain kinds of prompts result in certain kinds of completions, and "prompt engineering" is a collection of methods to encourage a specific completion.

For example, prompts that do not contain enough facts about a desired question will result in "hallucinations", whereas prompts that have additional context programmatically added to the question will result in more reliably factual completions. I find it useful to think about this grounded in terms of the analytic-synthetic distinction from analytic philosophy.
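A minimal sketch of how such a few-shot prompt might be assembled (the Q/A format and examples are illustrative):

```python
# Zero-shot would send the bare question; few-shot prepends worked
# examples so the model can infer the task and format in-context.
def few_shot_prompt(examples, question):
    """examples: list of (input, output) pairs shown before the real query."""
    shots = "\n".join("Q: " + q + "\nA: " + a for q, a in examples)
    return shots + "\nQ: " + question + "\nA:"

prompt = few_shot_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is the capital of Japan?", "Tokyo")],
    "What is the capital of Italy?",
)
```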

The above guide is an excellent resource! You should read it!

I’ll give it a read then! Good to hear it’s more generic than I thought.

It is relevant for all LLMs, for instance if you're asking Stable Diffusion to generate an image of a castle.

Composing the right prompts here will allow you to obtain what you actually want versus, let's say, a castle as drawn by a toddler (okay, maybe that's what you want, but that's not the point).

Respectfully but realistically, this better fits “Prompt Analyst” than “Prompt Engineer”.

This morning, I brushed and flossed, which makes me a Plaque Removal Engineer. I then used my skills as a Room Tidiness Engineer to make the bed. After that, I engineered the harness onto my dog and took her on a walk: Canine Fitness Engineer. I engineered the water to a higher temperature using the kettle, and poured it over some coffee grounds to create a chemical reaction in my morning brew: Yeah, that's right, Caffeine Engineer. After this incredibly productive morning, I got in the car and drove to my job as a computer programmer.


I shared my tutorial (with code) on the topic before, but here it is again (just sent a PR to this repo too).


In short, Prompt Engineering is just one of the pieces in building a GPT-3/LLM-based solution. In fact, I'd say a whole new set of Software Engineering best practices is necessary and will gradually emerge. I gave one such approach that has been useful to my related projects.

Interestingly enough the best results I get from ChatGPT for writing code come from prompting it in a "pseudo code" style, with variables and logical instructions written in plain English.

In another universe the Just In Time mantra infected how we did engineering, and became known as "Prompt Engineering".

But in this universe, it took a different turn. Words are a funny thing. Ironic.

Thanks -- this will help out a LOT in copyright infringement cases!

This reminds me of Hitchhiker's Guide to the Galaxy. We've built systems that can answer any question, but now we need even more powerful systems to tell us what the question is.


No… no… that’s the answer. The question would be:


Prompt engineering. The pick-up lines of the NLP world…

Prompt “engineering” reeks of the A in STEAM

Prompt "engineering" is about as engineering as software "engineering".

Wasn't COBOL a first attempt in having a natural language interface to the machine? Is this the next attempt to make the human-machine interface simpler?

Why is it that we create something with which we can't communicate, but that we expect to solve all our problems?

Nice, thanks. I have limited knowledge, but let's choose one example: GPT. Are people using GPT to create prompts for GPT?

Large Language Models Are Human-Level Prompt Engineers


Yes, I am doing it all the time.

Nice. What does your development environment/harness look like for this? Jupyter/Colabs?

I use neither; I call the APIs directly. I currently have databases of prompts and unit tests that help me test my prompts through API calls.

If anyone is looking for an easier way to manage their prompts in production (versioning, templating, caching and testing), please checkout https://trypromptly.com. We built this out of our experience using Open AI's completions API at https://makerdojo.io which is now powered by Promptly. (Testing and sharing are WIP)

On a relevant note, does anyone know any good prompt guides for vision models like Stable Diffusion, Midjourney, or Dall-E?

Langchain is a fantastic library to use but also educate yourself on the dark art of prompt engineering.

<T> Engineering
