
This is effectively my question. I assume there is some magic going on. But how many engineering hours' worth of magic, approximately? There is a lot of speculation around GPT-4 being MoE and whatnot, but very little speculation about the magic of the ChatGPT front end specifically that makes it feel so fluid.



That's mostly because there's very little value in deep speculation there.

It's not particularly more fluid than anything you could whip up yourself (and the linked repo proves that), but there's also not much value in trying to compete with ChatGPT's frontend.

For most products, ChatGPT's frontend is the minimum level of acceptable performance you need to beat, not a maximum that's really worth exploring.


What frontend is better than ChatGPT's? Is the OP's implementation doing running summarization or in-convo embedding lookup?


It sounds like a cop-out but: it's one made for your use-case.

If you're letting people do fun long-form roleplay adventures, using summarization alongside some sort of named-entity K-V store driven by the LLM would be a good strategy.

If you're building a tool that's mostly for internal data, something that leans heavily into detailed answers with direct verbatim citations works well, and having your frontend create new threads when there's a clear break in the topic of a request is a clever strategy, since quality drops with context length and you want to save tokens for citations.
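A minimal sketch of that thread-splitting idea, assuming nothing about the linked repo: complete stands in for whatever chat-completion call you already have, and the prompt wording is purely illustrative.

    # Hypothetical sketch: decide whether a new user message should start a new thread.
    # "complete" is a placeholder for your own chat-completion wrapper.
    TOPIC_BREAK_PROMPT = (
        "You are a classifier. Given the tail of a conversation and a new user "
        "message, answer with exactly one word: SAME if the new message continues "
        "the current topic, NEW if it clearly starts a different one."
    )

    def should_start_new_thread(complete, recent_messages: list[str], new_message: str) -> bool:
        context = "\n".join(recent_messages[-6:])  # only the tail; we're trying to save tokens
        answer = complete(
            system=TOPIC_BREAK_PROMPT,
            user=f"Recent messages:\n{context}\n\nNew message:\n{new_message}",
        )
        return answer.strip().upper().startswith("NEW")

If the classifier says NEW, the frontend opens a fresh thread (optionally seeded with a one-line summary) instead of dragging the whole history along.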

People saying LLMs suck, or are X, or are Y, are mostly just underutilizing them: LLMs make it super easy to solve problems superficially, but when it comes to actually scaling those solutions to production you need more than random RAG vector-database wrappers.


>alongside some sort of named entity K-V store driven by the LLM

I'd be curious to hear more about how exactly this works. You do NER on the prompt (and maybe on the completion too) and store the entities in a database and then what? How does the LLM interact with it?


LLMs thrive on completely ambiguous classification tasks: you can have them extract entities along with something like "a list of notable context".

Let's say we want our chat to remember that the character slammed the door last time they were in Village X with the mayor present, and to have the mayor comment on it the next time they see the player.

Every X tokens we can fire a prompt with a chunk of conversation and a list of semantically similar entities that already exist, letting the LLM return an edited list along the lines of:

    entity: mayor
    location: village X
    priority: HIGH
    keywords: town hall, interact, talk
    "memory, likelyEffect"[]: door slammed in face, anger at player
Now we have:

- multiple fields for similarity search

- an easy way to manage evictions (sweep up lowest priority)

- most importantly: we're providing guidance for the LLM to help it ignore irrelevant context
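
To make that concrete, here's a rough Python sketch of what one of these records and the periodic sweep might look like. It's an assumption-heavy illustration, not anyone's actual implementation: complete stands in for your chat-completion call, and the field names simply mirror the example record above.

    import json
    from dataclasses import dataclass

    @dataclass
    class EntityMemory:
        entity: str                # "mayor"
        location: str              # "village X"
        priority: str              # "HIGH" / "MED" / "LOW", used for evictions
        keywords: list[str]        # extra handles for similarity search
        memories: list[list[str]]  # [memory, likely effect] pairs

    def sweep(complete, store: dict, conversation_chunk: str,
              similar_entities: list) -> None:
        # Every X tokens: hand the LLM a chunk of chat plus the entities that
        # already look relevant, and store whatever edited list it returns.
        prompt = (
            "Update this entity list based on the conversation. Return JSON only.\n"
            "Existing entities: " + json.dumps([e.__dict__ for e in similar_entities]) + "\n"
            "Conversation:\n" + conversation_chunk
        )
        for raw in json.loads(complete(prompt)):
            store[raw["entity"]] = EntityMemory(**raw)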

When the user goes back to village X we can fetch the entities in village X and whittle that list down based on priority and similarity to the user prompt.
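
Roughly, the retrieval half of that might look like the following, continuing the sketch above (EntityMemory is the dataclass from the sweep example; embed is a placeholder assumed to return unit-length vectors, and the priority weights are made up).

    PRIORITY_WEIGHT = {"HIGH": 1.0, "MED": 0.5, "LOW": 0.2}

    def recall(embed, store: dict, location: str, user_prompt: str, k: int = 3) -> list:
        # Pull the entities tied to this location, then rank by priority plus
        # embedding similarity to the user's prompt, keeping only the top k.
        prompt_vec = embed(user_prompt)
        candidates = [e for e in store.values() if e.location == location]

        def score(e) -> float:
            text = " ".join(e.keywords + [m for m, _ in e.memories])
            similarity = sum(a * b for a, b in zip(embed(text), prompt_vec))  # dot product == cosine for unit vectors
            return PRIORITY_WEIGHT.get(e.priority, 0.2) + similarity

        return sorted(candidates, key=score, reverse=True)[:k]

Whatever survives the cut is what you'd feed back into the context for that scene.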

None of this is deterministic: instead you're optimizing for the illusion of continuity and trading off predictability.

You're aiming for players being shocked that the next time they talk to the mayor he's already upset with them, and if they ask why, he can reply intelligently.

And to my original point: while this works for a game-like experience, you wouldn't want to play around with this kind of fuzzy setup for your company's internal CRM bot or something. You're optimizing for the exact value proposition of your use-case rather than just trying to throw a raw RAG setup at it.



