Ingesting documents and using natural language to search your org docs with an internal assistant sounds more like a good use case for RAG[1]. Agents are best when you need to autonomously plan and execute a series of actions[2]. You can combine the two but knowing when depends on the use case.
I really like the OpenAI approach and how they outlined the thought process of when and how to use agents.
In this case, the agent would also need to learn from new events, like project lessons learned, for example.
Just curious: can a RAG[1] system actually learn from new situations over time in this kind of setup, or is it purely pulling from what's already there?
Especially with a client, consider the word choices around "learning". When using LLMs, agents, or RAG, the system isn't learning (yet) but making a decision based on the context you provide. Most models are a fixed snapshot. If you provide up-to-date information, they can give you an output based on that.
"Learning" happens when initially training the LLM, or arguably when fine-tuning. Neither of which is needed for your use case as presented.
Thanks for the clarification, really appreciate it. It helps frame things more precisely.
In my case, there will be a large amount of initial data fed into the system as context. But the client also expects the agent to act more like a smart assistant or teacher, one that can respond to new, evolving scenarios.
Without getting into too much detail, imagine I feed the system an instruction like: “Box A and Box B should fit into Box 1 with at least 1" clearance.” Later, a user gives the agent Box A, Box B, and now adds Box D and E, and asks it to fit everything into Box 1, which is too small. The expected behavior would be that the agent infers that an additional Box 2 is needed to accommodate everything.
So I understand this isn't "learning" in the training sense, but rather pattern recognition and contextual reasoning based on prior examples and constraints.
Basically, I should be saying "contextual reasoning" instead of "learning."
There is no memory that the LLM has from your initial instructions to your later instructions.
In practice you have to send the entire conversation history with every prompt, so you should think of it as appending to an expanding list of rules that you send every time.
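To make that concrete, here's a toy sketch. The `call_llm` stub is hypothetical; in a real integration an API call to your model of choice goes there. The point is that the system-level rules only persist because the whole `history` list is resent on every turn:

```python
# Minimal sketch of a "stateless" chat loop: the model only sees what you send,
# so the full history (rules included) must accompany every request.

def call_llm(messages):
    # Placeholder: a real implementation would POST `messages` to an LLM API.
    return f"(reply based on {len(messages)} messages of context)"

history = [
    {"role": "system",
     "content": 'Rule: Box A and Box B must fit into Box 1 with 1" clearance.'}
]

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = call_llm(history)  # the ENTIRE history goes out each time
    history.append({"role": "assistant", "content": reply})
    return reply

chat("Can Box A and Box B fit?")
chat("Now also fit Box D and Box E.")  # earlier rules ride along only because we resend them
```

Nothing is remembered server-side in this model of things; "memory" is just you re-supplying the growing list.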
What you're attempting to do, integrating an agent into your business, is difficult. It is, however, relatively easy to fake: just set up a quick RAG tool, plug it into your LLM, and you're done. From the outside, the only difference between a quick-n-dirty integration and a much more robust approach will be in the numbers. One will be more accurate than the other, but you need to actually measure performance to establish that as a fact and not just a vibe.
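For illustration, the quick-n-dirty version really is just retrieve-then-prepend. Here's a toy pure-Python sketch, with a made-up document set and bag-of-words cosine similarity standing in for real embeddings:

```python
import math
from collections import Counter

# Made-up corpus standing in for the org docs you'd ingest.
docs = [
    "Box A and Box B should fit into Box 1 with at least 1 inch clearance.",
    "Project lessons learned: always confirm container dimensions with the client.",
    "Shipping policy: oversized items require a second container.",
]

def vec(text):
    # Crude bag-of-words "embedding"; a real system would use an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, k=2):
    q = vec(query)
    return sorted(docs, key=lambda d: cosine(q, vec(d)), reverse=True)[:k]

def build_prompt(query):
    # Prepend the retrieved context, then hand the prompt to your LLM.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What clearance do Box A and Box B need in Box 1?"))
```

That's the whole "fake": it will often look convincing, which is exactly why you need measurements to tell it apart from a robust pipeline.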
First piece of advice: build up a dataset and measure performance as you develop your agent. Or just don't, and deliver what hype demands.
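A minimal version of that measurement loop, with a placeholder agent and made-up eval rows (swap in your real pipeline call and a proper grader; exact match is the crudest option):

```python
# Tiny eval harness sketch: run the agent over a fixed dataset, score the answers.

eval_set = [
    {"question": "Do Box A and Box B fit in Box 1?", "expected": "yes"},
    {"question": "Do Boxes A, B, D and E fit in Box 1?", "expected": "no"},
]

def agent_answer(question):
    # Placeholder deterministic "agent" so the harness runs end to end.
    return "no" if "D and E" in question else "yes"

def accuracy(dataset):
    correct = sum(agent_answer(row["question"]) == row["expected"] for row in dataset)
    return correct / len(dataset)

print(f"accuracy: {accuracy(eval_set):.0%}")
```

Re-run the same dataset every time you change the prompt, retriever, or model, and you have numbers instead of vibes.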
As for advice ... looking at what other commenters left ... if you want to do this seriously, I'd recommend hiring someone who has already done that kind of integration, at least as a consultant. Someone whose first reflex won't be to just tell you LLMs are fixed and can't learn, but who will also add that this isn't a limitation, since RAG pipelines are better suited for this task than fine-tuning [1].
Also, RAG isn't a monolithic solution; there are many, many variations. For your use case, I'd consider more elaborate solutions than baseline RAG, such as GraphRAG [2]. For the box problem above, you might want to integrate symbolic reasoning tools such as Prolog, or consider using reasoning models and developing your own reinforcement learning environments. Needless to say, all of these aspects need to be carefully balanced and optimized to work together, and you need to follow a benchmark/dataset-centric approach to developing your solution. For this, consider frameworks that were designed to optimize LLM/agentic workflows as a whole [3][4].
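To illustrate the symbolic-tool idea on the box problem above: rather than having the LLM reason about geometry in prose, let it call a deterministic checker. A hypothetical 1-D sketch (greedy first-fit-decreasing packing, made-up dimensions, flat 1" clearance per box):

```python
# Deterministic packing check an agent could call as a tool instead of
# free-form reasoning. Lengths are in inches; 1-D for simplicity.

CLEARANCE = 1.0  # required clearance per box, per the client's rule

def containers_needed(box_lengths, container_length):
    """Greedy first-fit-decreasing: returns how many containers are required."""
    bins = []  # remaining capacity of each container opened so far
    for length in sorted(box_lengths, reverse=True):
        need = length + CLEARANCE
        for i, cap in enumerate(bins):
            if cap >= need:
                bins[i] -= need
                break
        else:
            bins.append(container_length - need)  # open a new container
    return len(bins)

# Box 1 is 10" long: A and B fit, but adding D and E forces a second container,
# which is exactly the inference the agent is expected to surface.
print(containers_needed([4, 4], 10))        # A and B -> 1 container
print(containers_needed([4, 4, 4, 4], 10))  # A, B, D, E -> 2 containers
```

The LLM's job then shrinks to extracting the numbers from the request and explaining the tool's verdict, which is far more reliable than asking it to do the arithmetic itself.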
Shit is complex really.
[1] https://arxiv.org/abs/2505.24832 tells us generalization happens in LLMs once their capacity for remembering things is saturated, which might explain why fine-tuning has been less efficient than RAG so far.
Sound advice and much appreciated.
In this case, I might team up with someone to help me add this feature to my SaaS. But I’ll definitely dive deeper into the subject. Thanks for the info and the links!
There's also (of course) agentic RAG, especially if your data comes from many different types of resources and you set up some context/memory for it to rely on. In actuality, with a lot of context there isn't a lot of "learning" needed.
Incorporating more data or new data into the RAG pool is a form of “learning”, but in general agents don’t “learn” unless you give them a journal or allow them to modify their own prompt.
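The journal idea can be sketched in a few lines (the names here are made up; a real setup would persist the journal and feed the assembled prompt to the model):

```python
# "Journal" memory sketch: the agent appends lessons as they happen, and the
# journal is injected into the system prompt on every run, so new lessons
# shape future behavior without any model training.

journal = []

def record_lesson(lesson):
    journal.append(lesson)

def build_system_prompt():
    base = "You are a packing assistant."
    if journal:
        base += "\nLessons learned so far:\n" + "\n".join(f"- {l}" for l in journal)
    return base

record_lesson("Box 1 was too small for four boxes; plan for a second container.")
print(build_system_prompt())
```

Same caveat as before: the model itself stays frozen; the "learning" lives entirely in the text you choose to carry forward.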
This article was written a few weeks after MCP was released and touches on why MCP is important. While I guess you could argue that there's technically nothing to it, protocols such as MCP address a missing need: standardizing interactions between your AI app and another service. Code now needs to be written for users, devs (APIs), and AI.
smolagents by Hugging Face would be more of an agent framework. If frameworks were in scope, we would see smolagents/llamaindex/pydantic/etc. listed alongside the frameworks in figure 2. Several frameworks were left out of this paper because it focuses more on the protocols.
I like the idea of more comparisons of models. Are there plans to add independent analyses of these models or is it only an aggregation of input limits?
How do you see this differing from or adding to other analyses such as:
I made https://aimodelreview.com/ to compare the outputs of LLMs over a variety of prompts and categories, allowing a side-by-side comparison between them. I ran each prompt four times at different temperature values, and that's available as a toggle.
I was going to add reviews of each model but ran out of steam. Some users have messaged me saying the comparisons are still helpful for getting a sense of how different models respond to the same prompt and how temperature affects a model's output on the same prompt.
Hey, this is pretty insightful! I wonder if, in the course of researching to build this website, you reached any conclusions as to which AI assistant is currently ahead.
I want to point out you dodged the data question, and there's a reason for it.
I like your work visually on first glance, and god knows you're right about Gradio, even if it's irrelevant.
But peddling extremely limited, out-of-date versions of other people's data trumps that, especially with this tagline: "A website to compare every AI model: LLMs, TTSs, STTs".
It is a handful of LLMs, then one TTS model and one STT model, both with zero data. And it's worth pointing out, since this endeavor is motivated by design trumping all else: all the columns are for LLM data.
Now imagine going one step further and actually running a prompt across every AI model, then showing you the best answer and the AI model that generated it.
Those tools exist; they do not need to be imagined. Look into the related comments. Also, they do little but increase the labor of getting an answer. It's not exactly an improvement for the user to spend more time reviewing AI answers.
[1] https://www.willowtreeapps.com/craft/retrieval-augmented-gen...
[2] https://www.willowtreeapps.com/craft/building-ai-agents-with...