Show HN: Relari – Auto Prompt Optimizer as Lightweight Alternative to Finetuning
32 points by antonap 10 months ago | 4 comments
Hi HN, we are the founders of Relari (https://www.relari.ai). We launched our LLM evaluation stack on HN a few months ago (https://news.ycombinator.com/item?id=39641105), which is now used in production by AI teams at companies like Vanta and PwC. We have since expanded to directly optimizing parts of an LLM pipeline using a data-driven approach. In particular, we see a lot of potential in the Auto Prompt Optimization—which could be an attractive alternative to fine-tuning in many cases—to use data to align LLMs for domain-specific tasks.

Here’s a demo video: https://www.loom.com/share/4ad30bf1053e46a3846fc5a07495c486

We started working on the auto prompt optimizer because of our own frustration with developing, iterating, and maintaining prompts across different use cases and models. A minor update to the underlying LLM, a change in user requirements, or a shift in application infrastructure can render a carefully crafted prompt useless. As one user put it, “Prompt engineering is not software engineering; it’s wishful thinking.”

We tried prompt optimization tools like DSPy and TextGrad, but realized they require you to adopt new frameworks, craft custom metrics from scratch, and offer limited visibility into the optimization process (or even the final optimized prompt). This lack of transparency left us guessing whether the new prompts are genuinely better or just different.

Our Auto Prompt Optimizer aims to be an easy-to-use yet robust alternative, with maximum visibility into the optimization process and final results. It takes two inputs: a dataset with inputs and expected outputs for a given LLM task, and a target metric (we have 30+ out-of-the-box metrics). The optimizer then starts from your initial prompt and uses the dataset to align the LLM output with your desired outcomes. It does this iteratively, mutating the prompt based on feedback from the target metric. The optimizer automatically selects examples from the dataset to create few-shot prompts and bakes in common techniques such as chain of thought when appropriate.
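The loop described above (mutate the prompt, score it on the dataset with the target metric, keep the best version) can be sketched roughly as follows. Relari's actual implementation isn't public, so the `mutate`/`score` functions and the simple hill-climbing strategy here are illustrative assumptions, not their real algorithm:

```python
import random

def score(prompt, dataset, metric, llm):
    """Average the target metric over (input, expected_output) pairs."""
    return sum(metric(llm(prompt, x), y) for x, y in dataset) / len(dataset)

def mutate(prompt, dataset, rng):
    """Toy mutation: append a randomly chosen few-shot example.
    A real optimizer would also rewrite instructions, add chain-of-thought
    scaffolding, handle corner cases, etc."""
    x, y = rng.choice(dataset)
    return f"{prompt}\nExample:\nInput: {x}\nOutput: {y}"

def optimize_prompt(initial_prompt, dataset, metric, llm, n_iters=10, seed=0):
    """Hill-climbing loop: mutate, score against the metric, keep the best.
    Returns the best prompt plus the full history of (prompt, score) pairs,
    giving visibility into every intermediate version."""
    rng = random.Random(seed)
    best, best_s = initial_prompt, score(initial_prompt, dataset, metric, llm)
    history = [(best, best_s)]
    for _ in range(n_iters):
        cand = mutate(best, dataset, rng)
        s = score(cand, dataset, metric, llm)
        history.append((cand, s))
        if s > best_s:
            best, best_s = cand, s
    return best, history
```

Returning the full `history` rather than just the winner is what gives the transparency mentioned above: you can inspect each prompt version and its measured score instead of trusting a black box.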

Here are two examples of the results, including the initial prompt, each version of the new prompt, and its performance on the target metric:

- Drug Review Prompt: https://app.relari.ai/demo/prompt/drug-review (a non-standard task where the optimizer created sophisticated instructions with detailed rating rubric and corner case handling)

- Summarization Prompt: https://app.relari.ai/demo/prompt/cnn-highlights (a simple task where the optimizer added more straightforward instructions on styling)

We see the prompt optimizer as a lightweight and practical alternative for adapting LLMs for domain-specific tasks. It can deliver high-quality prompts with as few as 100 data points.

Try it yourself (https://app.relari.ai/). You can upload your dataset or generate a simple synthetic dataset to start the optimization process. We recommend a dataset with at least 30 samples. The optimization can take up to an hour depending on the size of the dataset and the metrics, so we ask you to create an account; we track each optimization run and will email you once it's completed.

What’s next? We’re currently working on support for more advanced features like prompt chaining and agent tool call use cases. For power users, we offer custom metrics and multi-objective optimization to address the most complex use cases.

What’s been your biggest challenge with prompt engineering? Would a dataset-driven approach improve your prompt workflow? We’d love to hear your thoughts and feedback on our approach.




This looks really slick!

Maybe this is too much of a tangent, but is it reasonable to want to see a la carte pricing options?

For instance, I have an enterprise project that I could see using this for, but that's a project with a discrete time budget (probably 1-3 months) and a tool like this would see heavy usage for the first few weeks, then intermittent usage, then we would only need to use it for maintenance updates far in the future.

The initial $1k/mo tier fills me with worry, because I can see blowing through my usage credits in the first month (and then I need to contact sales about signing up for the enterpri$$$e option), but then I wouldn't need nearly so many subscribed credits in the later months (and I imagine they don't build up like Audible credits, but instead are use-it-or-lose-it -- which also isn't great for how our development goes in spurts).

This is not the only AI tool that uses pricing like this (looking at you, Roboflow!) -- but I've never felt like this structure fits with the intermittent patterns of AI development that I use in my day-to-day job. I can understand wanting to have customers sign up for SaaS and the reliable income that brings, but I feel like a la carte pricing for these kinds of tools (even if it were 4x or 5x more expensive than the equivalent credits in the subscription bundles) might let me try out these tools in an enterprise environment without waffling between "free tier" and "recurring SaaS budget line-item".


Thanks for bringing this up! The best thing would be to see how we can make the enterprise plan work for you; feel free to reach out to us (founders@relari.ai).


Bit confused what the value add is over a framework like DSPy. This still requires you to create an eval dataset with ground truth, basically the only hard part of using DSPy. Easily getting the optimized prompt and having some metrics out of the box is not worth nearly $1k/mo IMO

Side note: I’ve had a lot of luck combining automatic prompt optimization with finetuning. There is definitely some synergy https://raw.sh/posts/chess_puzzles


Thanks for the feedback, love your article diving deep into DSPy! Here's how our platform is different:

1. You are absolutely right, the dataset is a big hurdle for using DSPy. That's why we offer a synthetic dataset generation pipeline for RAG, agents, and a variety of LLM pipelines. More here: https://docs.relari.ai/getting-started/datasets/synthetic

2. Relari is an end-to-end evaluation and optimization toolkit. Real-time optimization is just one part of our data-driven package for building robust and reliable LLM applications.

3. Our tools are framework agnostic. If you can build your entire application on DSPy, that's great! But we often see AI developers who want the flexibility and transparency to have their prompts / LLM modules work across different environments.

4. We provide well-designed out-of-the-box metrics as well as custom metrics learned from user feedback. We find good metrics essential to making any optimization process (including prompt optimization and fine-tuning) work.



