Time to move to open-source and smaller reasoning models.
Here are the top three learnings from auto-prompt optimizing the DeepSeek R1 distilled LLaMA 70B for RAG:
1⃣ A trained DeepSeek R1 distilled LLaMA 70B even outperforms an untrained GPT-o1.
2⃣ The "reasoning" model is less susceptible to overfitting than non-reasoning models. Compared with GPT-3.5, both models start at the same accuracy and reach similar accuracy on the validation dataset; on the test dataset, however, the R1 distill often achieves much higher accuracy.
3⃣ R1 can think too long and run out of output tokens before finishing the task. The optimized prompt specifically added instructions for it to "think less."
Yup. LLM-AutoDiff is just getting started, but it has already shown that generation-only optimization, without explicit few-shot samples, can be even more effective and produce shorter final prompts.
Congrats on the paper! I read through some of the GitHub docs and the paper itself, and this sounds very impressive, but I'm trying to think of how best to use it in practice. Is the idea that I could give some kind of high-level task/project description (like a Python project), and this framework would intelligently update its own prompting to avoid getting stuck and to continue "gaining skill" throughout the process of working on a task? Could this be used to build such a system? Very curious to learn more.
I'm Li Yin ([GitHub](https://github.com/liyin2015)), the author of AdalFlow and a former AI researcher at Meta AI.
AdalFlow was inspired by a viral [LinkedIn post](https://www.linkedin.com/posts/li-yin-ai_both-ai-research-an...) I made, discussing how the LLM ecosystem lacks a shared library that bridges the gap between research and product development—similar to how PyTorch has streamlined model training and adaptation.
I decided to build this library while working on my product, a conversational search engine called [Sylph](https://sylph.ai/). After trying out existing libraries and finding that I had to write everything myself, I ended up with a solution that was lighter, faster, and offered more control. However, managing the codebase soon became overwhelming.
AdalFlow is based on my vision for the future of LLM applications, which I see as a three-stage workflow:
- *V1*: Use the library to quickly build your initial task pipeline, getting you 70-80% of the way to production.
- *V2*: Auto-optimize the prompt to push an additional 10%, bringing your product to a near-ready state without the hassle of manual prompt iteration.
- *V3*: Leverage V2 to label more data. As more users interact with your product, the next step is to fine-tune the LLM, further optimizing for speed, accuracy, and cost-effectiveness.
We've completed V1 and V2. Our auto-optimizer can enhance GPT-3.5 performance to match that of GPT-4, making any task nearly production-ready. Our architecture is robust, lightweight, and modular, and our auto-optimizer is highly accurate, even compared with DSPy and TextGrad. We have three research papers coming out soon that will explain how we achieved this. This is the first time the library has been released ahead of the research papers.
It’s definitely worth checking out—you might be surprised by the results. We've had similar experiences using PyTorch and PyTorch Lightning.
Thanks for the insightful response. Good point on using 4o-mini to save cost. I'll try it out.
I will look more into soft-prompt tuning.
For the current scope, we are focused on in-context learning: ways to improve model reasoning at inference time.
We use an auto-differentiation framework (backpropagation) to do zero-shot instruction optimization and few-shot demonstration selection. Currently, even zero-shot alone can often surpass DSPy's few-shot setups (with as many as 40 shots). I have also come up with a training paradigm that will (1) start zero-shot, (2) review performance against an advanced teacher model to see whether there is a gap we can gain from the teacher, and (3) if there is a gap, begin low-shot demonstrations and gradually increase the number of shots.
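The escalation schedule in steps (1)-(3) can be sketched as a simple loop. This is a hypothetical illustration, not AdalFlow's API: `eval_student`, `teacher_score`, and the toy accuracy curve are stand-ins I made up for the sketch.

```python
def optimize_shots(eval_student, teacher_score, max_shots=40, tol=0.01):
    """Start zero-shot; add demonstrations only while a gap to the teacher remains.

    eval_student: callable mapping a shot count -> validation score.
    teacher_score: score of the advanced teacher model on the same set.
    Returns the smallest shot count found and its score.
    """
    shots = 0
    best_shots, best_score = 0, eval_student(0)   # (1) zero-shot baseline
    # (2) check the gap to the teacher; (3) escalate shots gradually
    while best_score + tol < teacher_score and shots < max_shots:
        shots += 2                                # add a few demos at a time
        score = eval_student(shots)
        if score > best_score:
            best_shots, best_score = shots, score
    return best_shots, best_score

# Toy evaluation curve: accuracy rises with shots, plateauing at 0.9.
toy_eval = lambda k: min(0.7 + 0.05 * k, 0.9)
shots, score = optimize_shots(toy_eval, teacher_score=0.9)
```

With this toy curve the loop stops as soon as the student's score is within `tol` of the teacher, so extra demonstrations (and prompt length) are only paid for when they close a real gap.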
Built with the AdalFlow library: https://github.com/SylphAI-Inc/AdalFlow
It will include dataset creation, evaluation, and auto-prompt optimization.