meame2010's comments

Ongoing open-source project working toward a product-grade product.

Built with AdalFlow library: https://github.com/SylphAI-Inc/AdalFlow

It will include dataset creation, evaluation, and auto-prompt optimization.


Not few-shot, but prompt tuning via text generation through auto-differentiation.

https://arxiv.org/abs/2501.16673
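
For readers new to the idea, here is a minimal, library-free sketch of one optimization step; the `llm` helper and all prompt wording are hypothetical illustrations of the loop described in the paper, not the LLM-AutoDiff or AdalFlow API.

```python
# Minimal sketch of prompt tuning via text generation and "auto-differentiation".
# `llm` is a hypothetical stand-in for any chat-model client; this is NOT the
# LLM-AutoDiff / AdalFlow API, just the shape of one optimization step.

def llm(prompt: str) -> str:
    # Replace with a real model call (OpenAI, DeepSeek, local model, ...).
    return ""

def forward(instruction: str, question: str) -> str:
    # Forward pass: run the task with the current instruction (the "parameter").
    return llm(f"{instruction}\n\nQuestion: {question}\nAnswer:")

def backward(instruction: str, question: str, prediction: str, target: str) -> str:
    # Backward pass: an LLM generates textual feedback (a "gradient") on the instruction.
    return llm(
        "The instruction below produced a wrong answer.\n"
        f"Instruction: {instruction}\nQuestion: {question}\n"
        f"Prediction: {prediction}\nExpected: {target}\n"
        "Describe how the instruction should change to avoid this error."
    )

def step(instruction: str, feedback: str) -> str:
    # Optimizer step: rewrite the instruction according to the feedback.
    return llm(
        "Rewrite the instruction so it addresses the feedback.\n"
        f"Instruction: {instruction}\nFeedback: {feedback}\nNew instruction:"
    )

instruction = "Answer the question concisely."
train_set = [("What is 2 + 2?", "4")]  # toy example
for question, target in train_set:
    prediction = forward(instruction, question)
    if prediction.strip() != target:
        feedback = backward(instruction, question, prediction, target)
        instruction = step(instruction, feedback)
```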


Time to move to open-source and smaller reasoning models.

Here are the top three learnings from auto-prompt optimizing DeepSeek R1 LLaMA70B for RAG:

1. A trained DeepSeek R1 LLaMA70B (R1-distilled) is even better than GPT-o1 without training.
2. The "reasoning" model is less susceptible to overfitting compared with non-reasoning models. Comparing it with GPT-3.5, both GPT-3.5 and the R1-distilled model start at the same accuracy and reach similar accuracy on the validation dataset; on the test dataset, however, the R1-distilled model often achieves much higher accuracy.
3. R1 can think too long and run out of output tokens before finishing the task. The optimized prompt specifically added instructions for it to "think less."
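
As an illustration of point 3, one way to keep a reasoning model inside its output budget is to cap completion tokens and state the brevity requirement in the prompt. The wording and the 1024-token budget below are assumptions for illustration, not the prompt our optimizer actually produced.

```python
# Sketch: constrain a reasoning model so the final answer fits in the output budget.
# The instruction text and token budget are illustrative assumptions.

MAX_COMPLETION_TOKENS = 1024  # assumed budget; tune per deployment

SYSTEM_PROMPT = (
    "You answer retrieval-augmented questions. "
    "Think briefly: keep your reasoning to a few sentences, "
    "then put the final answer on its own line."
)

def build_request(question: str, context: str) -> dict:
    # Generic chat-style payload; adapt to whichever client serves the model.
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
        "max_tokens": MAX_COMPLETION_TOKENS,
    }
```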


We use GPT-4o as the backward model, but I'm excited to try DeepSeek R1 since it exposes explicit reasoning.

We are continuously adding more benchmarks to the paper with UT Austin.


Yup. LLM-AutoDiff is just getting started, but it has shown that generation-only optimization, without explicit few-shot samples, can be even more effective and produce shorter final prompts.

Author here. Yes, in this fashion. It can also create the feedback using an LLM as a backward engine.


Congrats on the paper! I read through some of the github docs and read through the paper, this sounds very impressive, but I'm trying to think of how to best use this in practice... is the idea that I could give some kind of high-level task/project description (like a Python project), and this framework would intelligently update its own prompting to avoid getting stuck and to continue "gaining skill" throughout the process of working on a task? Could this be used to build such a system? Very curious to learn more.

You need a training dataset and a task pipeline that works. You can refer to this doc: https://adalflow.sylph.ai/use_cases/question_answering.html
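
To make that concrete, here is a rough, plain-Python sketch of the two ingredients; `answer_question` and `evaluate` are hypothetical stand-ins, and the linked doc shows the actual AdalFlow components.

```python
# Sketch of the two prerequisites: a small labeled dataset and a working task pipeline.
# answer_question() and evaluate() are hypothetical placeholders; see the linked
# AdalFlow question-answering doc for the real components.

from dataclasses import dataclass

@dataclass
class Example:
    question: str
    answer: str

# 1) A training dataset: (input, expected output) pairs for your task.
train_set = [
    Example("Who wrote 'Pride and Prejudice'?", "Jane Austen"),
    Example("What is the capital of France?", "Paris"),
]

# 2) A task pipeline that already works end to end (prompt -> model -> parsed answer).
def answer_question(question: str) -> str:
    return "..."  # placeholder: call your model with the current prompt and parse the answer

def evaluate(dataset: list[Example]) -> float:
    # Exact-match accuracy; the optimizer needs a score like this to improve the prompt.
    correct = sum(answer_question(ex.question) == ex.answer for ex in dataset)
    return correct / len(dataset)
```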

Thank you, I missed the use cases section, that explains a lot. Nice documentation. Might play with this when I get home.

Hey Hackers,

I'm Li Yin ([GitHub](https://github.com/liyin2015)), the author of AdalFlow and a former AI researcher at Meta AI.

AdalFlow was inspired by a viral [LinkedIn post](https://www.linkedin.com/posts/li-yin-ai_both-ai-research-an...) I made, discussing how the LLM ecosystem lacks a shared library that bridges the gap between research and product development—similar to how PyTorch has streamlined model training and adaptation.

I decided to build this library while working on my product, a conversational search engine called [Sylph](https://sylph.ai/). After trying out existing libraries and finding that I had to write everything myself, I ended up with a solution that was lighter, faster, and offered more control. However, managing the codebase soon became overwhelming.

AdalFlow is based on my vision for the future of LLM applications, which I see as a three-stage workflow:

- *V1*: Use the library to quickly build your initial task pipeline, getting you 70-80% of the way to production.
- *V2*: Auto-optimize the prompt to push an additional 10%, bringing your product to a near-ready state without the hassle of manual prompt iteration.
- *V3*: Leverage V2 to label more data. As more users interact with your product, the next step is to fine-tune the LLM, further optimizing for speed, accuracy, and cost-effectiveness.

We've completed V1 and V2. Our auto-optimizer can enhance GPT-3.5 performance to match that of GPT-4, making any task nearly production-ready. Our architecture is the most robust, lightweight, and modular, with our auto-optimizer being the most accurate, even when compared to DSPy and Text-Grad. We have three research papers coming out soon that will explain how we achieved this. This is the first time the library has been released ahead of the research papers.

It’s definitely worth checking out—you might be surprised by the results. We've had similar experiences using PyTorch and PyTorch Lightning.

To learn more about our optimizer, visit: https://adalflow.sylph.ai/use_cases/classification.html.

Best,

Li


I think you can use the Python version to optimize the prompt and the TypeScript version to deploy it.
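
For example, the Python optimization run could write the final prompt to a small JSON file that the TypeScript service loads at startup; the file layout here is just an assumed illustration of the hand-off.

```python
# Sketch: persist the optimized prompt from the Python run so a TypeScript
# (or any other) deployment can load it. The schema is an illustrative assumption.

import json

optimized = {
    "system_prompt": "You are a helpful assistant. Answer concisely.",  # placeholder
    "few_shot_examples": [],  # demos the optimizer selected, if any
    "version": "v2",
}

with open("optimized_prompt.json", "w") as f:
    json.dump(optimized, f, indent=2)
```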


Thanks for the insightful response. Good point on using 4o-mini to save cost. I'll try it out.

I will check more into the soft-prompt tuning.

For the current scope, we are focused on in-context learning: ways to improve model reasoning at inference time.

We use an auto-differentiative framework (backpropagation) to do zero-shot instruction optimization and few-shot demonstration. Currently, even zero-shot alone can often surpass DSPy's few-shot approach (with as many as 40 shots). I have also come up with a training paradigm that will (1) start zero-shot, (2) review performance against an advanced teacher model to see whether there is a gap to gain from the teacher, and (3) if there is a gap, start low-shot demonstrations and gradually increase the number of shots. A rough sketch of that control flow is below.
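
The sketch uses `optimize_zero_shot`, `evaluate`, and `add_demonstrations` as assumed helpers rather than library calls:

```python
# Sketch of the proposed paradigm: (1) zero-shot first, (2) check the gap to a
# teacher model, (3) add demonstrations only if a gap remains, growing the shot count.
# All three helpers passed in are assumed placeholders, not AdalFlow APIs.

def train_prompt(optimize_zero_shot, evaluate, add_demonstrations,
                 max_shots: int = 8, tolerance: float = 0.01):
    # (1) Start with zero-shot instruction optimization.
    prompt = optimize_zero_shot()
    student_score = evaluate(prompt, model="student")

    # (2) Measure whether an advanced teacher model leaves a gap worth chasing.
    teacher_score = evaluate(prompt, model="teacher")
    if teacher_score - student_score <= tolerance:
        return prompt  # nothing to gain from demonstrations

    # (3) There is a gap: add low-shot demonstrations and gradually increase shots.
    shots = 1
    while shots <= max_shots and student_score + tolerance < teacher_score:
        prompt = add_demonstrations(prompt, num_shots=shots)
        student_score = evaluate(prompt, model="student")
        shots *= 2
    return prompt
```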

