Time to move to open-source and smaller reasoning models.
Here are the top three learnings from auto-prompt optimizing the DeepSeek R1 distilled LLaMA 70B for RAG:
1⃣ A trained DeepSeek R1 distilled LLaMA 70B even outperforms an untrained GPT-o1.
2⃣ The "reasoning" model is less susceptible to overfitting than non-reasoning models. Compared with GPT-3.5, both models start at the same accuracy and reach similar accuracy on the validation dataset; on the test dataset, however, the R1 distill often achieves much higher accuracy.
3⃣ R1 can think too long and run out of output tokens before finishing the task. The optimized prompt specifically added instructions for it to "think less."
Yup. LLM-AutoDiff is just getting started, but it has already shown that generation-only optimization, without explicit few-shot samples, can be even more effective and produce shorter final prompts.
Congrats on the paper! I read through some of the GitHub docs and the paper itself, and this sounds very impressive, but I'm trying to think of how best to use it in practice. Is the idea that I could give some kind of high-level task/project description (like a Python project), and this framework would intelligently update its own prompting to avoid getting stuck and to continue "gaining skill" throughout the process of working on a task? Could this be used to build such a system? Very curious to learn more.
I'm Li Yin ([GitHub](https://github.com/liyin2015)), the author of AdalFlow and a former AI researcher at Meta AI.
AdalFlow was inspired by a viral [LinkedIn post](https://www.linkedin.com/posts/li-yin-ai_both-ai-research-an...) I made, discussing how the LLM ecosystem lacks a shared library that bridges the gap between research and product development—similar to how PyTorch has streamlined model training and adaptation.
I decided to build this library while working on my product, a conversational search engine called [Sylph](https://sylph.ai/). After trying out existing libraries and finding that I had to write everything myself, I ended up with a solution that was lighter, faster, and offered more control. However, managing the codebase soon became overwhelming.
AdalFlow is based on my vision for the future of LLM applications, which I see as a three-stage workflow:
- *V1*: Use the library to quickly build your initial task pipeline, getting you 70-80% of the way to production.
- *V2*: Auto-optimize the prompt to push an additional 10%, bringing your product to a near-ready state without the hassle of manual prompt iteration.
- *V3*: Leverage V2 to label more data. As more users interact with your product, the next step is to fine-tune the LLM, further optimizing for speed, accuracy, and cost-effectiveness.
We've completed V1 and V2. Our auto-optimizer can enhance GPT-3.5 performance to match that of GPT-4, making any task nearly production-ready. Our architecture is robust, lightweight, and modular, and our auto-optimizer is highly accurate, even compared with DSPy and TextGrad. We have three research papers coming out soon that will explain how we achieved this. This is the first time the library has been released ahead of the research papers.
It’s definitely worth checking out—you might be surprised by the results. We've had similar experiences using PyTorch and PyTorch Lightning.
Thanks for the insightful response. Good point on using 4o-mini to save cost. I'll try it out.
I will look more into soft-prompt tuning.
For the current scope, we are focused on in-context learning: ways to improve model reasoning at inference time.
We use an auto-differentiation framework (backpropagation) to do zero-shot instruction optimization and few-shot demonstration selection. Currently, even zero-shot alone can often surpass DSPy's few-shot setups (with as many as 40 shots). I have also come up with a training paradigm that will (1) start zero-shot, (2) review performance against an advanced teacher model to see whether there is a gap we can gain from the teacher, and (3) if there is a gap, begin low-shot demonstrations and gradually increase the number of shots.
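The escalation schedule in steps (1)-(3) can be sketched as a simple loop. This is a hypothetical illustration, not AdalFlow's API: `eval_student`, `teacher_score`, and the toy accuracy curve are stand-ins I made up for the sketch.

```python
def optimize_shots(eval_student, teacher_score, max_shots=40, tol=0.01):
    """Start zero-shot; add demonstrations only while a gap to the teacher remains.

    eval_student: callable mapping a shot count -> validation score.
    teacher_score: score of the advanced teacher model on the same set.
    Returns the smallest shot count found and its score.
    """
    shots = 0
    best_shots, best_score = 0, eval_student(0)   # (1) zero-shot baseline
    # (2) check the gap to the teacher; (3) escalate shots gradually
    while best_score + tol < teacher_score and shots < max_shots:
        shots += 2                                # add a few demos at a time
        score = eval_student(shots)
        if score > best_score:
            best_shots, best_score = shots, score
    return best_shots, best_score

# Toy evaluation curve: accuracy rises with shots, plateauing at 0.9.
toy_eval = lambda k: min(0.7 + 0.05 * k, 0.9)
shots, score = optimize_shots(toy_eval, teacher_score=0.9)
```

With this toy curve the loop stops as soon as the student's score is within `tol` of the teacher, so extra demonstrations (and prompt length) are only paid for when they close a real gap.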
Built with the AdalFlow library: https://github.com/SylphAI-Inc/AdalFlow
It will include dataset creation, evaluation, and auto-prompt optimization.