Promptbase: All things prompt engineering (github.com/microsoft)
195 points by CharlesW on Dec 17, 2023 | 42 comments



Curious as to why some sort of model specifically trained for prompt engineering couldn't be placed before an LLM in a pipeline, or used in a transfer learning scenario.

I'm also curious as to why existing LLMs can't be fine-tuned to handle this, if prompt engineering is really a major concern.

Usually, if I don't get the answer I'm looking for from ChatGPT, I tell it that it's not the answer I'm looking for and what the original answer was missing, and I usually get a better answer the second time around.

If it goes beyond that, I sometimes resort to just cussing at it, and that usually does the trick.


> Usually, if I don't get the answer I'm looking for from ChatGPT, I tell it that it's not the answer I'm looking for and what the original answer was missing, and I usually get a better answer the second time around.

This is expected behavior: by default, even with RLHF, ChatGPT will output statistically "average" content. It's also the reason why Chain-of-Thought prompting works so effectively.

I have a (somewhat out of date) notebook demonstrating this, plus a function calling trick which allows you to get the improved result in a single API call: https://github.com/minimaxir/simpleaichat/blob/main/examples...
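
Roughly, the trick looks like this. A sketch only, not the exact notebook code: the schema and field names are made up for illustration, and it uses the pre-1.0 openai Python library:

    import openai  # pre-1.0 interface

    schema = {
        "name": "answer_with_revision",
        "description": "Draft an answer, critique it, then improve it.",
        "parameters": {
            "type": "object",
            "properties": {
                "draft": {"type": "string", "description": "First-pass answer."},
                "critique": {"type": "string", "description": "What the draft is missing."},
                "final": {"type": "string", "description": "Improved answer addressing the critique."},
            },
            "required": ["draft", "critique", "final"],
        },
    }

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Explain quicksort."}],
        functions=[schema],
        function_call={"name": "answer_with_revision"},  # force structured output
    )
    # a JSON string containing the draft, critique, and final fields
    print(response.choices[0].message.function_call.arguments)

Because the fields are generated in order, the model writes its draft and critique before the final answer, so "final" gets the benefit of the two-turn back-and-forth in a single call.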


Thanks for sharing.


I don't think anyone says it can't. Isn't that basically what ChatGPT does to interact with DALL-E?


I don't personally know.

I've been curious if DALL-E has truly been mixed in w/ ChatGPT as a single model using a Mixture of Experts (MoE)-type learning gate to train them all together.


> Curious as to why some sort of model specifically trained for prompt engineering couldn't be placed before an LLM in a pipeline, or used in a transfer learning scenario.

They can.

It's just that prompting matters enough for performance that it's worth doing manually.

> I'm also curious as to why existing LLMs can't be fine-tuned to handle this, if prompt engineering is really a major concern.

It depends on what you want them to do. They're general-purpose tools, and there isn't a one-size-fits-all solution here.


In my custom instructions, I ask it to always reword my question better and then answer that instead. It usually works quite well for me.


As one prompt, or as two? As in you query for "reword this prompt: [...blah...]" and then manually copy-paste the answer back in, or query for "[...blah...] // instead of processing the prompt before the //, reword the prompt to make it more effective and do inference on the reworded prompt"?

I guess in the 2nd case it would be useful to ask it to output the reworded prompt too.
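
For what it's worth, here's a minimal two-call sketch of the reword-then-answer flow that also surfaces the reworded prompt (the model name and system prompt wording are just placeholders):

    import openai  # pre-1.0 interface

    question = "why my python loop so slow"

    # Call 1: ask the model to rewrite the question
    reworded = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Rewrite the user's question to be clearer and more specific. Output only the rewritten question."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Call 2: answer the reworded question
    answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": reworded}],
    ).choices[0].message.content

    print(reworded)  # show the reworded prompt so you can sanity-check it
    print(answer)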


Prompt engineering is rapidly becoming just a way to squeeze better, predictable performance out of crappier, cheaper models.

If your app or feature is powered by an LLM, why pay for the state-of-the-art LLM when you can get comparable performance out of a cheaper one with a little bit of prompt engineering?

Over time, as the models themselves improve, prompt engineering may become less important. I already think it's largely unimportant for day-to-day one-off things.


Indeed, I'd expect more capable models to be less amenable to prompt engineering, but even if that is true, it's quite possible our best models are not past the "prompt engineering efficiency peak" yet. It is also fairly hard to quantify (I was thinking about some naive approaches to this during our work on BIG-Bench [1], but I couldn't come up with something robust enough), so I don't think we will even be able to say we are past this peak until much later.

[1] https://github.com/google/BIG-bench/issues/801


Prompt engineering benefits models of all sizes, even the 70B ones.


I think this is really interesting from a meta (in-context) learning point of view, but at some point prompt engineering stops being prompt engineering and instead becomes in-context training. This is problematic for applications such as search or completion, unless some other software/model does this automatically for you!


Are there any good write-ups on LLM prompt engineering for beginners? Would love to recommend something to our JavaScript developers - this repo is great, but the basics are definitely getting lost in the Python code.


Start with something like Phind.com or AutoExpert for ChatGPT Plus:

- https://phind.com

- https://chat.openai.com/g/g-LQHhJCXhW-autoexpert-chat

Phind has a whole Discord channel and they're pretty focused on building a great tool aimed at programmers. It's approaching pinned-tab status for me.


Yeah, sign up for ChatGPT Plus and make custom GPTs for yourself. That's where you can experiment with prompt engineering (custom instructions) to solve your immediate problems in a reusable way, without having to host a server or anything.

Disclosure: I work for Microsoft; this advice is my own.


Note that OpenAI adds more custom things to the prompt than you do, so you can't just drop your prompts into an assistant and have them work.


The prompt techniques in this submission are not for beginners. You can get >80% of the way there by using a simple system prompt such as:

    You are an expert JavaScript programmer. Write a JavaScript function based on the user input.

    You must obey ALL the following rules:
    - Only respond with the JavaScript function.
    - Never put in-line comments or docstrings in your code.

And then edit iteratively based on the output until it fits your desired use case. If you want to test ChatGPT system prompts directly in a UI, you can do that in the OpenAI Playground: https://platform.openai.com/playground?mode=chat
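
Or hit the API directly; a minimal sketch with the pre-1.0 openai Python library (the model and user input are just examples):

    import openai  # pre-1.0 interface

    system = (
        "You are an expert JavaScript programmer. "
        "Write a JavaScript function based on the user input.\n\n"
        "You must obey ALL the following rules:\n"
        "- Only respond with the JavaScript function.\n"
        "- Never put in-line comments or docstrings in your code."
    )

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": "debounce a function by 250ms"},
        ],
    )
    print(resp.choices[0].message.content)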


Right, but I mean going beyond general-purpose knowledge. For example, I'm wondering about best practices or approaches for building agents for a proprietary API: it's not extensive enough for something like Pinecone, and all the knowledge should in theory fit into context.


I'm shocked people are even talking about prompt engineering like it's a new discipline, when the measure of success is basically how you feel about the results.

I've found that GPT-4 works pretty well if you just talk to it like a person.


I found this resource [0] handy for getting a grasp on all the different terms people use (zero/one-shot, tree of thoughts, RAG, etc). It's not super detailed, which is actually a good thing for an introduction—it just gives a high-level introduction to each technique before linking to papers and other resources for more in-depth research if the technique seems likely to help. It was enough for me (a professional developer) to get started on some side projects with Mistral.

[0] Prompt Engineering Guide (https://www.promptingguide.ai/)


Excellent resource, thanks a lot!


How can this be called engineering when there is no material with properties that can be worked with?


Financial engineering, software engineering, social engineering, etc. are counterexamples to the idea that engineering requires physical materials with properties.

Even from an etymology standpoint, "engineer" comes from the Latin ingenium, meaning "clever."


Software engineers work with chips, which have an API, and from my POV that is a property.

That is a portion of what I have been referring to.


What's social engineering, then?


I think I have been misread. The whole point of my comments is that anyone can staple "engineering" onto something and make you think it's something someone actually thought through, which it usually is not.


Not a science. Joking, friend.

I am not really sure how to answer this question; it seems like a bunch of skills.


I'll call it "AI" when prompt engineering ceases to be a thing.


I hate it as much as the next person when people make unwarranted comparisons between AI and humans, but... there are multiple fields dedicated to studying effective human communication, with research into which writing techniques are most effective for pedagogy, technical communication, legal communication, and so on. Even if we achieved full smarter-than-human AGI, I don't expect prompt engineering to go away.

The real problem is that we insisted on giving a discipline that is far more art than science the title "engineering".


Yes, isn’t the whole “alignment” fear basically that if we had smarter than human AGI, we would need smarter than human prompt engineering?


Alignment refers to the process of aligning AI with human values. I don't see why a superhuman AI would require different prompting than is in use today.


The idea is that keeping a superhuman AI aligned would require superhuman prompting. This is the whole premise of OpenAI's Superalignment research and recent publication.


Fine. I'll call it "AI" when we call our communications with it "rhetoric" ;)


How could it be "a thing?" The models aren't well understood or published in most cases, the data used to create the model is unknown, and the system returns a single result instead of several results with tagged probabilities.

I don't know what "prompt engineers" think they're "engineering." There's nothing of the sort remotely happening here. This is just random uninformed actions being tested against a weak fitness function. The results are effectively meaningless in any broader context.


Humans are "intelligent," yet we often have to re-explain ourselves when asked semi-complex questions.


You wouldn't say you're "prompt engineering" when communicating with your spouse or boss.


But you would say you used "social engineering" to manipulate an organization: https://en.wikipedia.org/wiki/Social_engineering_(security)


That's largely a difference of terminology. Prompt engineering is just a more technical, all-encompassing term covering colloquial language like "can you rephrase the question" or "I'm not understanding, can you explain that in a different way?"

I interpreted the original comment to mean "I won't consider ChatGPT an artificial intelligence until we don't need to prompt engineer." If that was the intended meaning, I just wanted to highlight that we do "prompt engineer" humans while also considering them "intelligent."


Sorry, but this is not prompt engineering, though it's not their fault. Real prompt engineering is mostly not even a thing with GPT-4, because somehow the NLP world is massively behind the Stable Diffusion world on prompt engineering.

https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...

It's almost 2024 and I still can't ask ChatGPT for "the definition of {apple|orange}" where {apple|orange} is the mathematical embedding average of those two words. I can sure do this in Stable Diffusion though!


That's just a No True Prompt Engineer fallacy. You can get substantially improved generation results for either LLMs or Diffusion models with proper prompt engineering, even with GPT-4.

You can do prompt term weighting with offline LLMs (e.g. compel: https://github.com/damian0815/compel ) or by averaging the embed tokens yourself and passing the embedding matrix to the model, but unlike with image generation, where the results of prompt weighting are more obvious, there isn't as much of a need for it with LLMs.
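
To make the embed-averaging concrete, here's a rough sketch with a Hugging Face causal LM. GPT-2 is used purely for illustration, and whether generate() accepts inputs_embeds depends on the model and transformers version:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    emb = model.get_input_embeddings()  # token id -> embedding lookup table

    def word_vec(word):
        ids = tok(word, return_tensors="pt").input_ids
        return emb(ids).mean(dim=1)  # average sub-tokens into one vector

    # "the definition of {apple|orange}": average the two word embeddings
    blend = (word_vec(" apple") + word_vec(" orange")) / 2
    prefix = emb(tok("the definition of", return_tensors="pt").input_ids)
    inputs_embeds = torch.cat([prefix, blend.unsqueeze(1)], dim=1)

    # requires a transformers version where generate() supports inputs_embeds
    out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=30)
    print(tok.decode(out[0]))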


Sorry, but compel seems to only work with diffusion models. Note that they have no examples of running this with modern LLMs, and it only supports a fraction of the techniques that, for example, Automatic1111 supports for "real" prompt engineering. Even if I'm wrong and compel does work with any transformer model, it still only supports a small fraction of all the "real" prompt engineering techniques.

I shouldn't have to "average the embed tokens myself".

I know you're a big deal in the industry, but you're SO WRONG about the idea that "there isn't as much of a need for it for LLMs". I hate that the NLP community has such huge blind spots for this kind of stuff. Anything that gives further levers of control brings massive improvements to the capabilities of an LLM.

My GitHub gist proves that all of these techniques found in Automatic1111 work with NLP models, yet no one implements them. I think it's because of the potential for breaking alignment techniques with them.

And yes, I will fight and die on this hill even if Yann Lecun and Christopher Manning tell me I'm wrong. I know I'm right.


> My GitHub gist proves that all of these techniques found in Automatic1111 work with NLP models

No, it doesn't; your gist just shows that they're possible to implement, which I'm not disputing. And even then, your example with GPT-2 has the comment "GPT2 is very tempermental with this technique" and requires a temperature of 20 to behave, which is an accommodation that can never be used in a real application.

If you can create test cases and demos where prompt weighting in an LLM results in a distinct, observable improvement, as it does with diffusion models, that would be a different story. I'd love to be proven wrong, but your gist doesn't do it.



