Neat, this is like a SaaS version of Langchain[1]. For those who haven't already seen it, Langchain is a framework that lets you do this, as well as integrate out-of-the-box solutions for other common problems like memory.
Langchain is awesome. Another great library is GPT Index. This field is apparently being called "prompt orchestration" (getting data from external sources, chaining calls back and forth with an LLM, etc.).
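The core pattern those libraries wrap is small enough to sketch in plain Python; everything named below (call_llm, fetch_weather, the URL) is a hypothetical stand-in, not any particular library's API:

    import json
    import urllib.request

    def call_llm(prompt: str) -> str:
        # Stand-in for whatever completion endpoint you use (OpenAI, a local
        # model, etc.); returns the model's text output.
        raise NotImplementedError

    def fetch_weather(city: str) -> dict:
        # Stand-in for any external data source the model can't know about.
        with urllib.request.urlopen(f"https://example.com/weather?city={city}") as resp:
            return json.load(resp)

    # The orchestration part: pull fresh data, inject it into the prompt, and
    # let the LLM handle the language.
    data = fetch_weather("Seattle")
    print(call_llm(
        "Using only the JSON below, answer the user's question.\n"
        f"JSON: {json.dumps(data)}\n"
        "Question: Should I bring an umbrella tomorrow?"
    ))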
What's your guidance concerning the risk of prompt injection with applications built on this kind of platform?
If developers on your platform are building code that accepts prompt input from untrusted users, do you have any measures in place to prevent those users from tricking the system into executing functions in inappropriate or unintended ways?
One thing I'd find useful would be a limit on the number of functions that can be called during a single execution - and rate limits and budget caps on function calls generally might be smart as well.
I don't know about unintended, but why is "inappropriate" a problem? It is user input. You went out of your way to trick it and got what you asked for - how is it different from, say, user-submitted content on regular websites?
I'll never understand this cultural phenomenon. Anybody can open the browser inspector on a random social site and tweak the page to say whatever they want and send screenshots around "implicating" the poor bastards - but non-hypothetically, who actually cares?
Sort of akin to running a pub and having a drunk run his mouth. That's not on the establishment; that's on the individual.
Whether or not prompt injection is genuinely harmful entirely depends on what developers build with this stuff.
If you're going to give an LLM-driven program the ability to execute functions that can change state in the world, you need to understand prompt injection, so you don't accidentally build something you really shouldn't have built.
Sure. But also, so? Seems like a storm in a tea cup. "Our twitter got hacked", the end. There are countless such examples that have nothing to do with prompt injections. If anything the phrase "all publicity is good publicity" comes to mind.
I remember in the mesozoic era of the internet (early 00's) there was something like a KFC promo website where they put a guy in a chicken suit and had him respond to chat suggestions directly from users 24/7. Hilarity, as you can imagine, ensued. Shenanigans even.
If some user submitted content (even if it is sieved via a model) ends up "inappropriate" you can just delete it and move on.
I probably wouldn't hook it up to a shell running commands, but the same generic advice as with all untrusted input applies.
People have hooked up shells running commands already. A few days ago someone managed to extract an OpenAI API key from an environment variable because the developer was running prompt output through an eval() function: https://twitter.com/ludwig_stumpp/status/1619701277419794435
Developers who don't understand prompt injection will continue to make nasty mistakes like that.
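To make that failure mode concrete, here's a rough sketch of the shape of that class of bug and a more defensive alternative - the function names are made up, and this isn't the code from the tweet:

    ALLOWED = {
        "get_stock_price": lambda symbol: 123.45,  # stub for illustration
    }

    def run_llm_action_unsafely(llm_output: str):
        # DON'T do this: if a prompt injection makes the model emit
        # "__import__('os').environ['OPENAI_API_KEY']", eval() happily leaks it.
        return eval(llm_output)

    def run_llm_action(llm_output: str):
        # Safer: treat model output as data, not code. Expect "name arg" and
        # only dispatch to functions you explicitly allow.
        name, _, arg = llm_output.strip().partition(" ")
        if name not in ALLOWED:
            raise ValueError(f"LLM requested disallowed action: {name!r}")
        return ALLOWED[name](arg)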
I agree caution is needed here. We have taken a few steps:
- Rate limits are enforced to provide caps on agent and function usage.
- Execution depth is capped to prevent the LLM from getting into loops.
- Function output is sanitized to prevent corruption of LLM state.
- Functions execute in a completely separate environment from the rest of the service, including the LLM, to reduce the impact from bad functions.
Note that this doesn't entirely protect against "; DROP TABLE"-type attacks on the implementation of the function itself, but that problem isn't unique to us. It may, however, be possible for the LLM to look at function inputs and flag overtly malicious ones.
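For anyone rolling their own version of the first two points, the bookkeeping is cheap; a minimal sketch of the idea (not our actual implementation - call_llm and dispatch are just stand-ins):

    import time
    from collections import defaultdict

    MAX_DEPTH = 5            # cap on chained function calls per execution
    MAX_CALLS_PER_MIN = 30   # per-user rate limit

    _calls = defaultdict(list)

    def call_llm(prompt: str) -> str:      # stand-in for the actual model call
        raise NotImplementedError

    def dispatch(call: str) -> str:        # stand-in for the sandboxed function runner
        raise NotImplementedError

    def check_rate_limit(user_id: str) -> None:
        now = time.time()
        _calls[user_id] = [t for t in _calls[user_id] if now - t < 60]
        if len(_calls[user_id]) >= MAX_CALLS_PER_MIN:
            raise RuntimeError("rate limit exceeded")
        _calls[user_id].append(now)

    def run_agent(user_id: str, prompt: str, depth: int = 0) -> str:
        if depth >= MAX_DEPTH:
            return "Execution depth exceeded; stopping."
        check_rate_limit(user_id)
        step = call_llm(prompt)
        if step.startswith("CALL "):
            result = dispatch(step[len("CALL "):])
            return run_agent(user_id, prompt + "\nResult: " + result, depth + 1)
        return step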
Relevant to this discussion is the Toolformer paper that Meta AI released today. To me, the exciting result is that it appears much smaller models can be used to translate human intent into actions: https://arxiv.org/abs/2302.04761
If you ask an existing LLM for a stock quote, it will hallucinate one, or give you a price it found in the snapshot of the web it was trained on. The intent here is to build a system that knows it should call an external service to get the right answer. There's still a risk, though: the platform might call the right service and get an accurate answer, or it might just make up a plausible one - who knows?
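One way to keep that failure from being silent is to compare the model's answer against what the tool actually returned; a rough sketch, with the quote API and LLM call as hypothetical stand-ins:

    def get_quote(symbol: str) -> float:   # stand-in for a real-time quote API
        raise NotImplementedError

    def call_llm(prompt: str) -> str:      # stand-in for the model call
        raise NotImplementedError

    def answer_stock_question(symbol: str) -> str:
        price = get_quote(symbol)
        answer = call_llm(
            f"The current price of {symbol} is {price:.2f}. "
            "Answer the user's question using only that figure."
        )
        # If the number the model quotes isn't the one we gave it, treat the
        # answer as a possible hallucination instead of passing it along.
        if f"{price:.2f}" not in answer:
            return f"{symbol} is currently trading at {price:.2f}."
        return answer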
Wow, it seems like a genuinely frightening idea to give an ML/DL system that has no clue whether it's right or wrong about anything (as it knows literally nothing, including that fact) any kind of role in autonomous agency.
Looks interesting, can you show us more use cases on the landing page? The stock price example is okay, but I want to see what I can use this for that I can't currently do easily. What problems does this solve?
Could you use this to provide better support chat bots? That is, support bots obviously need to be scoped so that they only give information the user asking has rights to, e.g. "Show me the stocks in my portfolio". Assuming the user has logged into the interface showing the chat, is there a way to pass that user's unique access token through to one of these Fixie Agents to ensure that the data being accessed is scoped by that access token?
Short answer: yes! We think this is actually a super compelling use case, and there are a few different ways to accomplish it. We already support auth mechanisms like OAuth, and there is also agent storage built in.
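Independent of our SDK specifics, the general shape is that the frontend passes the user's token along with the query, and the function (not the prompt) decides what data is visible. A generic sketch, not our actual API:

    import json
    import urllib.request

    def call_llm(prompt: str) -> str:      # stand-in for the model call
        raise NotImplementedError

    def get_portfolio(access_token: str) -> list:
        # The token (not the LLM) scopes the data that comes back.
        req = urllib.request.Request(
            "https://example.com/api/portfolio",   # placeholder endpoint
            headers={"Authorization": f"Bearer {access_token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def handle_chat(user_token: str, question: str) -> str:
        holdings = get_portfolio(user_token)
        return call_llm(
            "Answer using only this user's holdings:\n"
            f"{json.dumps(holdings)}\n"
            f"Question: {question}"
        )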
Yes, the LLM sees the result of the function and processes it according to what it has learned from its few-shot examples (which may involve calling more functions, or returning a formatted response).
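Roughly speaking, the transcript the model conditions on looks something like this (a simplified illustration, not the exact prompt format):

    # The function/result lines are written back into the prompt, so the model
    # "sees" them exactly the way it saw them in the few-shot examples.
    FEW_SHOT = """\
    Q: What's the weather in Oslo?
    Func: get_weather("Oslo")
    Result: {"temp_c": 4, "rain": true}
    A: It's 4C and raining in Oslo - bring a jacket.

    Q: What is AAPL trading at?
    Func: get_stock_price("AAPL")
    Result: 151.92
    A: AAPL is currently trading at $151.92.
    """

    # At runtime the new question is appended, the model emits a "Func:" line,
    # the runtime executes it and appends the "Result:" line, and the model is
    # called again - repeating until it produces an "A:" line.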
This looks great. I really think a good use case for the current generation would be the ability to extend it with your own data.
Having internal users ask questions and then having a ChatGPT-esque system answer with injected data would be nice. It would be very tightly controlled (it couldn't answer about anything it doesn't have).
Something between the rigid chatbots that exist now and ChatGPT, which just makes up plausible answers.
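A rough sketch of that middle ground, assuming you've already chunked your internal docs and have some embedding function available (embed, similarity, and call_llm below are hypothetical stand-ins):

    def embed(text: str) -> list[float]:         # stand-in embedding function
        raise NotImplementedError

    def similarity(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))  # dot product; assumes normalized vectors

    def call_llm(prompt: str) -> str:            # stand-in for the model call
        raise NotImplementedError

    def answer_internal(question: str, doc_chunks: list[str]) -> str:
        # Rank your own document chunks against the question and inject only
        # the top few into the prompt; instruct the model to refuse otherwise.
        q_vec = embed(question)
        ranked = sorted(doc_chunks, key=lambda c: similarity(embed(c), q_vec), reverse=True)
        context = "\n---\n".join(ranked[:3])
        return call_llm(
            "Answer strictly from the context below. If the answer isn't "
            "in the context, say you don't know.\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )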
The next stage: you guys carefully curate a gigantic list of extremely high-quality agents and I don't have to make any agents for my specific use case.
The stage after that: GPT, or some other AI, constantly adapts/curates the agents for me based on my notes, conversations, etc. (although a privacy nightmare).
I've wanted something like this for years! And just started building it myself last month (mostly for home automation). I like the multi-agent model, and the "thought" steps are pretty interesting. But how do you deal with context length with all the back and forth?
Current context lengths are usually more than adequate for these interactions; the details of each individual step within an execution only need to be retained until the final response is emitted.
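Conceptually the bookkeeping looks something like this (a simplified sketch, not our actual implementation):

    def compact_history(steps: list[str], final_answer: str | None) -> list[str]:
        # While an execution is in flight, the model needs every intermediate
        # Thought/Func/Result line. Once the final response is emitted, only
        # the original question and the answer need to survive into history.
        if final_answer is None:
            return steps
        question = steps[0] if steps else ""
        return [question, f"A: {final_answer}"]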
No, just someone who cares about the quality of HN's front page. The idea that waiting-list launches don't come with enough information to support a substantive discussion is pretty well established on HN.
What's the interesting demo here? Did I miss something?
Another big step would be to give LLMs access to computational sandboxes (like WASM) to do whatever calculations they deem worthwhile, without access to bash/AWS.
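WASM is the general-purpose version of this, but even without it you can hand the model a calculator instead of a shell by whitelisting AST nodes - a stdlib-only sketch:

    import ast
    import operator

    _OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv,
            ast.Pow: operator.pow, ast.USub: operator.neg}

    def safe_eval(expr: str) -> float:
        # Evaluate arithmetic only; names, calls, attributes, and subscripts are
        # all rejected, so there's no path from model output to os.system or boto3.
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                return _OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
                return _OPS[type(node.op)](walk(node.operand))
            raise ValueError(f"disallowed expression: {type(node).__name__}")
        return walk(ast.parse(expr, mode="eval"))

    print(safe_eval("(3.2 ** 2) / 7 - 1"))        # fine
    # safe_eval("__import__('os').system('ls')")  # -> ValueError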
Nice. I've been playing with OpenAI for the last couple of weeks, experimenting with extracting commands and queries from user prompts, trying to gauge how to do pretty much exactly this. This kind of thing is going to be awesome.
I read a New Yorker article by Ted Chiang today in which he made a kind of "stochastic parrot" argument about ChatGPT - https://www.newyorker.com/tech/annals-of-technology/chatgpt-.... I believe, on the other hand, that ChatGPT may eventually be able to generate AGI, but that this will occur emergently and spontaneously. In other words, it will be difficult to predict.
One of the conditions for this is for ChatGPT-style models to start being able to write their own code in order to produce models of themselves that are more accurate and more efficient. Given this ability, and some fitness criteria, genetic algorithms could be used to create new LLMs. This sounds like science fiction, but once the compute requirements for these models come down (by a couple of orders of magnitude), I believe this may be possible.
To what extent does your model allow semantic models to create semantic models that are themselves more efficient with respect to some fitness criteria? Can I tell a model: "You (model), I want you to reproduce through interaction with these other models (some collection of other models) and have the resulting child models be more efficient according to this criterion [for example, the resulting models will create short stories that are more likely to receive high ratings on a subreddit devoted to short stories]"?
You would need to get around the "model pollution" problem, in which LLMs pollute their own training space because other models are producing web artifacts (Ted Chiang's Xerox-of-a-Xerox problem). I call this the problem of alpha (direct experience). One way I've thought of to fix this is to train models on direct user input (such as cell phone video and pictures from a single user) - I have to admit I got this idea from Neal Stephenson's Snow Crash (see Gargoyle). If your platform can integrate with visual processing, this may have high information density: object detection in daily videos showing how objects relate to each other in the user's real world, correlated into a semantic network.
I'd also suggest that Obsidian integration might be useful.
This is exciting, thanks for making the Fixie SDK public.
[1] https://langchain.readthedocs.io/en/latest/