RestGPT (github.com/yifan-song793)
228 points by omarfarooq on Sept 11, 2023 | 83 comments



How does RestGPT differ from ToolLLM or Gorilla?

papers:

1. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs https://arxiv.org/abs/2307.16789

2. Gorilla: Large Language Model Connected with Massive APIs https://arxiv.org/abs/2305.15334


Did you open the linked paper in the OP?

Gorilla is compared;

ToolLLM seems to postdate this project.


The problem here is where these actions come from. A generic LLM cannot generate correct actions in many (if not most) real-life cases. So it will have to learn, and LLMs aren't good at learning. For example: "I'm tired, play my favorite". The action depends on _who_ is speaking and on what's going on right now. There may be someone sleeping, or watching TV. I'm afraid an acceptable solution is much more complicated.


I have investigated the use of agents for real support-agent-type work, and the rate of failure made it unacceptable for my use case. This is even after giving them very explicit and finely tuned context.

I suspect that if the engineering of LLM solutions made more use of unseen test data, it would become apparent that they really don't have sufficiently reliable "cognitive" ability to do any practical agent-type work.


Have you used any benchmark to test agents? I'm currently looking for a REST API usage benchmark for LLMs.


Do we have to expect _that_ level of understanding from the agent, though? If my wife said that to me, I might have a good chance of queuing up the song she has in mind, but anyone else? No chance. I don't expect tools like this to be able to understand cryptic requests and always come to the right answer. I'm happy if I can request a song or an action, or anything else, in the same way I might ask another human who doesn't know me intimately.


If not, how is this more useful than something like Siri?


Natural language understanding. Siri doesn't get context at all. You can twist unstructured data or requests however you like and the LLM will deal with it just fine.

"Play my favorite" is just a knowledge problem. If GPT fails there, it's because it doesn't know your favorite, not because it can't parse the request or understand what you need it to do.

You have to speak certain ways to Siri to get it to do things.

Unless specifically hard-coded, Siri will never receive "damn I'm finding it hard to read" as input and decide to turn on the lights. GPT will.


Human: "damn I'm finding it hard to read"

GPT w/ memory: "Because you still have dyslexia."


Thanks for the chuckle.


"Siri doesn't get context at all." and yet immediately "GPT fails there, it's because it doesn't know your favorite"

"Knowing your favorite" is the context.

> Unless specifically hard-coded, Siri will never receive "damn I'm finding it hard to read" as input as decide to turn on the lights. GPT will.

Of course it won't. You have to very specifically fine tune it to understand what light conditions are, where you are in the house, and what it is you need to turn on.


> You have to very specifically fine tune it to understand what light conditions are, where you are in the house, and what it is you need to turn on.

Where you are in the house and what needs to turn on, at least, is an API query job, not a fine-tuning job.

As far as whether it can understand the relevance of lighting to the situation, I just asked ChatGPT 3.5 the question 'Acting as an AI home assistant, if you hear me say "I'm finding it hard to read", what actions would you take?' and 'Adjust the lighting' was the second option it gave back (after 'ask for clarification'). I think we're there, honestly, we just don't have the different parts connected yet.
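
For what it's worth, the "API query job" half really is mundane plumbing. A minimal sketch against Home Assistant's REST API, just to illustrate (the URL, token, and entity IDs are made up; /api/states and /api/services/light/turn_on are their documented endpoints):

```python
import requests

# Assumptions: a Home Assistant instance at this URL and a long-lived access token.
HA_URL = "http://homeassistant.local:8123"
HEADERS = {"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN"}

def light_states():
    """Query the current state of every entity and keep only the lights."""
    states = requests.get(f"{HA_URL}/api/states", headers=HEADERS).json()
    return [s for s in states if s["entity_id"].startswith("light.")]

def turn_on(entity_id, brightness_pct=80):
    """Call the light.turn_on service for one entity."""
    requests.post(
        f"{HA_URL}/api/services/light/turn_on",
        headers=HEADERS,
        json={"entity_id": entity_id, "brightness_pct": brightness_pct},
    )

# e.g. hand light_states() to the LLM, let it pick an entity, then call turn_on()
```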


> Where you are in the house and what needs to turn on, at least, is an API query job, not a fine-tuning job.

And that API magically comes from where?

> I just asked ChatGPT 3.5 the question 'Acting as an AI home assistant, if you hear me say "I'm finding it hard to read", what actions would you take?'

So, basically:

- you had to pre-program Chat GPT to act as a home assistant

- you had to provide it with specific context and specific phrasing for it

- it still failed, asked for clarification, and only then responded

And now you have to do this song and dance every time you want to coax GPT into doing what you need (and that's what RestGPT does).


What you're describing as pre-programming is a little misleading if the "pre-programming" doesn't need to change for each specific request: a real product would provide that "pre-programming" for the user.

Prompting for task performance is fine as long as you're not expecting the end user to have to replicate your prompting. Your goal is to change model activations for a given input; the end user is similarly affected regardless of whether you used a prompt or fine-tuned.

-

This task doesn't require fine-tuning though, zero-shot performance is enough:

I generated a mock schema from Home Assistant's API (https://data.home-assistant.io/docs/states/) and explicitly gave the model the option to ask for clarification, but it has no problem translating non-obvious commands into actions without asking for details:

https://chat.openai.com/share/fc5b972f-4641-47a1-9842-2e0d69...

Note those objects mirror Home Assistant's; you could hook that up today without any song and dance. Combine that with RAG and you'd have something that's a lot more useful than Siri and capable of improving performance over time.
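
To make "hook that up today" concrete, here is a minimal sketch of the wrapping step against the OpenAI chat completions endpoint. The model name, prompt wording, and shape of `device_states` are my assumptions, not the exact prompt from that transcript:

```python
import json
import requests

OPENAI_KEY = "sk-..."  # assumption: you have an API key

def assistant_actions(user_utterance, device_states):
    """Wrap the raw utterance in instructions plus current device state and
    ask the model for either a list of actions or a clarifying question."""
    system = (
        "You are a home assistant. Respond with a JSON list of actions "
        "against the devices below, or ask one clarifying question.\n"
        f"Current device states: {json.dumps(device_states)}"
    )
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {OPENAI_KEY}"},
        json={
            "model": "gpt-3.5-turbo",
            "temperature": 0,  # low temperature to tighten up determinism
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user_utterance},
            ],
        },
    )
    return resp.json()["choices"][0]["message"]["content"]
```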


We're back to "how is it different from Siri".

You had to provide two pages of text and do manual mapping between human-readable names and some weird identifiers to provide the simplest functionality.

Funnily, this functionality is also completely unpredictable.

I ran your prompt and first request, and got "Identify the area with the lowest observed request volume and increase the brightness of the light in that area to improve the lighting." ChatGPT then proceeded to increase brightness in the garage.

---

It's also funny how in the discussion about context the context of the app is forgotten.


I think you fundamentally don't understand the topic if you're talking about two pages of text?

The end user would never type in a word of that: they'd say "[Wake word] play me some music"

A piece of software running on a device would transcribe what it heard, and fire off a request to the LLM with all of that text wrapped around their statement.

For ease of sharing I used the web interface to provide the instruction, but you'd use the API with a prompt which also dramatically increases determinism.

No one is writing out the state of each light bulb: you trivially query that information programmatically and bundle it with the request.

In a real product there'd be explicit handling of detecting where the request came from, that's already a problem that's been worked on, but I wanted to demonstrate the main difference vs Siri: zero-shot learning

The LLM wasn't told what those volumes mean, but it was flexible enough to infer the intent was to provide a form of location, rather than ask.

It's a forced example so if you want to get caught up on the practicality of audio for locating people be my guest, but it's to show LLMs are great at "lateral applications" of capability:

You give them a few discrete blocks of functionality and limited information, and unlike Siri they can come up with novel arrangements of those blocks to complete a task they haven't yet seen.

Honestly the fact you keep going back to "look at all the text" feels a bit like if I showed you the source code for an email messaging app, and you told me: "No one will ever use email! Who would write all that instead of just writing a letter and mailing it?!"


Indeed, the context is "people using natural language to make requests". No soul on earth would consider/use your phrasing. I (a human) have no clue what your request is for - "lowest observed request volume"...??? Try "raise the lights where we usually aren't asking you for much" and you might get the same result. As far as I can tell, with the brightness increase in the garage (where, I'd guess, you've made the fewest requests), the AI apparently understood better than you or I what you meant.


That JSON isn't something you'd type, it's something that you can programmatically generate if you have a Home Assistant setup.

With super primitive wake word detection and transcription, the most you get is:

- What the user said

- How loudly each microphone in the house heard it.

If you take a look at the mock object in that transcript, that's what it maps to...

```json { "request": "I'm finding it hard to read" "observedRequestVolume": [ 3eQEg: 30, iA0TN: 60, h1T3y: 59, 5Qg1M: 10 ] } ```

The only part that would be human provided is: "I'm finding it hard to read"

The invented challenge was to see if, given a suboptimal set of inputs (we didn't tell it where we are), it can figure out what action to take.

It's zero-shot capability that makes LLMs suitable for assistants: traditional assistants can barely handle being told to do something they're capable of in the wrong word order, while this can go from a hastily invented representation of a house and ambiguous commands to rational actions with no prior training on that specific task.
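
A tiny sketch of how that object could be assembled on-device (the function and the mic ids are invented for illustration; the real wake-word/transcription stack is out of scope):

```python
def build_request(transcript, mic_levels):
    """Bundle what was heard with how loudly each microphone heard it.
    mic_levels maps an (invented) microphone id to an RMS-ish loudness."""
    return {
        "request": transcript,
        "observedRequestVolume": {mic: int(level) for mic, level in mic_levels.items()},
    }

# Usage, with the made-up ids from the transcript above:
payload = build_request(
    "I'm finding it hard to read",
    {"3eQEg": 30, "iA0TN": 60, "h1T3y": 59, "5Qg1M": 10},
)
```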


> And that API magically comes form where?

HomeAssistant, or any number of other providers. Do you think this part is somehow difficult?

> you had to pre-program Chat GPT to act as a home assistant

That is what we call "a prompt". It is a well-known technique. I am surprised that this should look strange to you.

> you had to provide it with specific context and specific phrasing for it

That is what we call "a prompt". It is a well-known technique. I am surprised that this should look strange to you.

> it still failed, asked for clarification, and only then responded

You have misunderstood. In its list of actions to take, the first and only response it gave, the first thing it said it would do in context is ask for clarification as to why I was finding it hard to read. That seems entirely reasonable to me. Does it not to you?

> And now you have to this song and dance every time you want to coax GPT into doing what you need (and that's what RestGPT does).

So what? It's not something the person sat in the dark ever has to care about.


> That is what we call "a prompt". It is a well-known technique. I am surprised that this should look strange to you.

> That is what we call "a prompt". It is a well-known technique. I am surprised that this should look strange to you.

Funny how we're in the discussion about context, and you decided to ignore and discard the entire context of the discussion :)


>You have to very specifically fine tune it to understand what light conditions are, where you are in the house, and what it is you need to turn on.

No, you don't.


They mean context in the sentence, or context that can be inferred from "common sense" and without any specific knowledge, I think.


I hate Siri as much as anyone, but Chat GPT has no context in the "common sense" either.

The sibling comment literally says "I had to provide a long-ish sentence as a context/programming instructions before it could do anything". https://news.ycombinator.com/item?id=37464563


You expect it to know things like that the user wants it to act as a home assistant without being told? That’s not common sense, that’s mind reading


No, I expect people to stop pretending that LLMs somehow know context unlike the stupid Siri.

There's considerably more in that prompt besides just "you need to act like a home assistant"


There was insufficient context. Imagine I tell you "turn on that light, where I'm pointing". You'd do no better. No one here is under the conviction magical prescience is involved. This tooling provides the mechanism for an initial API call to be tied to the event described, in natural language, as "look where I'm pointing". The first response (to ask for clarification) is precisely what a human agent would do to get context to clarify the coarse-grained request. The second guess, assuming you disabled the (explicit) allowance for clarifying questions, is also a magnificent recognition of implicit, common-sense context. Seems it's even more effective than you at following the true context of this tool's appropriate placement.


Why would we want this at all if it doesn't know you that well? Current voice assistants without AI can already handle songs and actions like that. Seems like it's largely solved.


I think this can be easily fixed if the LLM can take notes on what's going on, and if it has additional context before the prompt:

```

You are home assistant. Here is information what's going on in the house:

It's 4PM. Bob likes Chopin Fantaisie-Impromptu. Alice likes Mozart Rondo in D. Bob is in the house. Alice will be back from office at 5PM.

You get a prompt: I'm tired, play my favorite

```

For the above input any LLM will play Chopin.


Where is that input coming from?


It's just an example; I created it manually. But I think an LLM can make such memory notes for itself and include them as context.


Memory notes by an LLM for its own consumption remind me of the Polaroids in the movie Memento.


Nice analogy - yes, something like this. What is more, LLM notes can be hierarchical, to have some kind of generalization.


At least those were deliberate.


So, an LLM that has no context, and must have context provided to it via notes, prompts etc. will somehow create these notes for itself?


Exactly, you'd use a section of the context as a permanent scratch pad. There are a few existing techniques (e.g. tree of thoughts) that look a bit like this.
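
Roughly something like this, as a minimal sketch (the class and the `llm` callable are invented for illustration, not any particular library):

```python
class ScratchpadAssistant:
    """Keep a small list of remembered facts and prepend them to every prompt."""

    def __init__(self, llm):
        self.llm = llm    # any callable: prompt string -> reply string
        self.notes = []   # e.g. "Bob's favourite song is Fantaisie-Impromptu"

    def remember(self, fact):
        self.notes.append(fact)

    def ask(self, user_prompt):
        context = "Here is a list of things about me you should remember:\n"
        context += "\n".join(f"- {note}" for note in self.notes)
        return self.llm(context + "\n\n" + user_prompt)
```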


How would that "scratch pad" be filled?

Let me remind you of the context of the conversation: we're asking an LLM to play a favorite song.


You need some kind of sidekick model to watch the output of the LLM in order to spot sentences where the LLM e.g. "doesn't know about your favourite song". This model would trigger a custom prompt to fill the scratch pad: You are Bob, your favourite song is ___. The model would act as middleware and inject scratch pad information into each prompt you address to the LLM.


And that model knows about things like "favorite songs"... how exactly? And about 1 million other things required to build the notes in the original comment?

Or is it models all the way down?


I think the model should only know about spotting some kind of token the LLM has been instructed to emit when it doesn't know something related to your personal taste. It is technically possible (I insist on technically) for the LLM to be moderated in such a way that it refrains from hallucinating your personal tastes, in the same way it is actually possible for GPT models to give answers such as "As a trained language model, I cannot blablabla" when certain conditions are met.

From here, you have to imagine the scratch pad as a sort of manual file you have the responsibility to feed in order for the LLM to have the necessary context to answer your request. The sidekick model is there only to trigger the condition "something must be added to the scratch pad, due to the appearance of a certain token in the LLM response".

I can imagine building something like this in Emacs, where a buffer containing the scratch pad is opened at the sidekick's demand: you have the responsibility to enter "Bob's favourite song is X", or maybe the sidekick is able to extract elements of the LLM answer and propose "The LLM failed to answer when you asked, as user Y: What is my favourite song?".

The sidekick can be very simple as I said, this depends on the ability for your LLM to be moderated.

GPT-4's answer to "What is my favourite song?":

  I don't have access to personal data about someone unless it has been shared with me in the course of our conversation. I am designed to respect user privacy. So, I don't know what your favorite song is unless you tell me. What is it?
I can add instructions to every prompt I perform so that GPT4 says "Please input X" when it lacks X information about me to answer.

  Hello, what's my favourite song ?
Answer:

  Please input your favourite song:
The middleware recognizes the form "Please input X", opens the scratchpad buffer, and adds:

   Favourite song: 
And you type your favourite song here.

The scratchpad may have some kind of sentence like " Here is a list of things about me you should remember." That helps contextualizing content for the LLM.
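
A rough sketch of that middleware loop, assuming the LLM has been instructed to emit the literal form "Please input X:" when it lacks a personal fact (every name here is invented):

```python
import re

PLACEHOLDER = re.compile(r"Please input (?P<what>[^:]+):")

def ask_with_scratchpad(llm, scratchpad, user_prompt):
    """Prepend the scratchpad; if the model asks for a missing fact,
    collect it from the user, store it, and retry once."""
    def run():
        context = "Here is a list of things about me you should remember:\n"
        context += "\n".join(scratchpad)
        return llm(context + "\n\n" + user_prompt)

    reply = run()
    match = PLACEHOLDER.search(reply)
    if match:
        answer = input(f"{match.group('what')}: ")   # the human fills the gap
        scratchpad.append(f"{match.group('what')}: {answer}")
        reply = run()
    return reply
```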


I add this just for demonstration purposes; the end of this discussion extract is wild:

- Please input your favourite meal.

- Favourite meal : Sushi, please remember that, along with my favourite song

- Noted. Your favourite meal is sushi and your favourite song is the "Fuga" from BWV 1001 by J.S. Bach. How can I assist you further?

- Emulate a linux terminal from now on :

- Alright. You're now "interacting" with a simulated Linux terminal. Keep in mind that this is only a text-based simulation and does not have the full capabilities of an actual Linux terminal.

``` user@chatgpt:~$ _ ```

Please input your commands, and I'll respond in kind.

- cd Documents

``` user@chatgpt:~$ cd Documents user@chatgpt:~/Documents$ _ ```

- ls

``` user@chatgpt:~/Documents$ ls notes.txt project/ resume.pdf favorite_song_info.txt sushi_recipe.md

user@chatgpt:~/Documents$ _ ```

- cat favorite_song_info.txt

``` user@chatgpt:~/Documents$ cat favorite_song_info.txt Favorite Song: Title: Fuga Composition: BWV 1001 Composer: J.S. Bach

user@chatgpt:~/Documents$ _ ```


I understand this, and I guess this might work for some simple scenarios, but this quickly becomes tedious for the user. My favorite songs, foods, TV shows etc. change often. Or may depend on the time of day, mood etc.

You'd need to define multiple tokens, and multiple side models, to figure that out and annotate. If you ask the user for it every time... that won't work :)


In a way, e.g. Spotify already tracks your behavior and almost "knows" your favourite song/album of the month.

It is technically possible for you to track yourself in an automated manner, no? If Gorilla or RestGPT can contact several APIs to "put on the last Bowie album", then it could also feed a file of your habits and from there infer what your favourite album is.


Previously when you told it your favourite song (directly or indirectly) it would have outputted something like "NB: favourite song is X" and then that would be included in the prompt forever afterwards.

Anthropic has models that can input ~75k words, so I don't think this is far fetched as a way of adding memory to LLMs.
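
A tiny sketch of that mechanism, assuming you've told the model to prefix anything worth remembering with "NB:" (the function is invented for illustration):

```python
def harvest_notes(model_output, notes):
    """Collect lines the model marked as memory ("NB: ...") so they can be
    prepended to every later prompt."""
    for line in model_output.splitlines():
        if line.strip().startswith("NB:"):
            notes.append(line.strip())
    return notes
```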


> Previously when you told it your favourite song (directly or indirectly)

How did I tell it what my favorite song is "indirectly"?

> it would have outputted something like "NB: favourite song is X"

Again, how? Let's say the model is trained to add this to notes when asked directly. What exactly is "indirectly", and how will it recognize that it should add "Alice will be back from office at 5PM" and "My favorite food is sushi" and not "we order Chinese takeout on the weekends"?


I'm genuinely not seeing a problem there that the Planner part of the paper couldn't cover. "Who said that" and "what's going on right now" are just API calls. Besides which, if one person says "play my favourite" while another person is watching TV, that's not the LLM's job to unpack.

The point is that the ability to call APIs gives them the ability to learn so that the actions that are eventually taken are correct in context. It's like a more generic version of https://code-as-policies.github.io/.


Hopefully it can be solved by the target API: the target API knows who is calling it, and the service has user information. Or this will be translated into "Play the most played playlist", and that action will be enough.

I agree with you in general though: the more useful AI is, the more data it will need to see. I strongly believe companies like Microsoft, Google, or Apple will bring the best experience because they own operating systems. It is going to be very hard for a third party to build a general AI assistant.


The whole notion of "memory" in LLM research solves this problem.


> So, it will have to learn, and LLMs aren't good at learning

LLMs are bad at human-like learning, but their zero-shot performance + semantic search more than make up for it.

If you give an LLM access to your Spotify account via an API, it has access to your playlists and access to details about each song like `BPM`, `vocality`, even `energy` :

https://developer.spotify.com/documentation/web-api/referenc... https://developer.spotify.com/documentation/web-api/referenc...

An LLM with no prior explanation of either endpoint, can figure out that it should look at your favorites playlists, and find which songs in your favorite list are most suitable for a tired person.
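
As a sketch of that first step (endpoint paths from the Spotify Web API docs linked above; the token, playlist id, and the idea of summarising features for the LLM are my assumptions):

```python
import requests

SPOTIFY_TOKEN = "..."  # assumption: an OAuth token with playlist read access
API = "https://api.spotify.com/v1"
H = {"Authorization": f"Bearer {SPOTIFY_TOKEN}"}

def playlist_feature_summary(playlist_id):
    """Fetch a playlist's tracks and their audio features (energy, tempo, ...)
    so a compact summary can be dropped into the LLM prompt."""
    items = requests.get(f"{API}/playlists/{playlist_id}/tracks", headers=H).json()["items"]
    track_ids = [it["track"]["id"] for it in items if it.get("track")]
    feats = requests.get(
        f"{API}/audio-features", headers=H, params={"ids": ",".join(track_ids[:100])}
    ).json()["audio_features"]
    return [
        {"id": f["id"], "energy": f["energy"], "tempo": f["tempo"], "valence": f["valence"]}
        for f in feats
        if f
    ]
```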

-

But it can go even further and identify its own sorting criteria for different situations with chain of thought:

Bedroom at night: https://chat.openai.com/share/6b1787ef-cd84-4834-b582-5024f8... Kitchen at 5pm: https://chat.openai.com/share/7ddaa047-0855-48c1-bcea-308083...

Rather than blindly selecting the most relaxing songs it understands nuance like:

> Room State: "lights on" and "garage door open" can imply either returning home from work or engaging in some evening activity. The environment is probably not yet set for relaxation completely.

And genuinely comes up with an intelligently adapted strategy based on the situation -

And say it gets your favorite wrong, and you correct it: an LLM with no specialized training can classify your follow up as a correction vs an unrelated command. It can even use chain-of-thought to posit why it may have been wrong.

You can then store all messages it classified as corrections and fetch those using semantic similarity.

That addresses both the customization and determinism issues: You don't need to rely on the zero-shot performance getting it right every time, the model can use the same chain of thought to translate past corrections into future guidance without further training.

For example, if your last correction was from classical music to hard metal when you got back from work, it's able to understand that you prefer higher energy songs, but still able to understand that doesn't mean every time in perpetuity it should play hard metal
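
A minimal sketch of that "store corrections, fetch by semantic similarity" loop (the embeddings endpoint and model name are OpenAI's; everything else here is invented):

```python
import numpy as np
import requests

OPENAI_KEY = "sk-..."  # assumption: you have an API key

def embed(text):
    """Embed a piece of text with OpenAI's embeddings endpoint."""
    r = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {OPENAI_KEY}"},
        json={"model": "text-embedding-ada-002", "input": text},
    )
    return np.array(r.json()["data"][0]["embedding"])

corrections = []  # list of (text, vector) pairs

def remember_correction(text):
    corrections.append((text, embed(text)))

def relevant_corrections(situation, k=3):
    """Return the k past corrections most similar to the current situation,
    ready to be pasted into the prompt as guidance."""
    q = embed(situation)

    def cosine(v):
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    ranked = sorted(corrections, key=lambda c: cosine(c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```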

Kitchen w/ memory: https://chat.openai.com/share/43635427-55d5-4394-b282-46acae... Bedroom w/ memory: https://chat.openai.com/share/8c146dd5-2233-4aba-8f6a-b97b7a...

-

I experimented heavily with things like this when GPT came out; part of me wants to go back to it since I've seen shockingly few projects do what I assumed everyone would do.

LLMs + well thought out memory access can do some incredible things as general assistants right now, but that seemed so obvious I moved on from the idea almost immediately.

In retrospect, there's an interesting irony at play: LLMs make simple products very attractive. But if you embed them in more thoroughly engineered solutions, you can do some incredible things that are far above what they otherwise seem capable of.

Yet a large number of the people most experienced in creating thoroughly engineered solutions view LLMs very cynically because of the simple (and shallow) solutions that are being churned out.

Eventually LLMs may just advance far enough that they bridge the gap in implementation, but I think there's a lot of opportunity left on the table because of that catch-22.


> Yet a large number of the people most experienced in creating thoroughly engineered solutions view LLMs very cynically because of the simple (and shallow) solutions that are being churned out.

Maybe, just maybe, because even simple solutions are invariably an incomplete, brittle, complicated, unpredictable mess that you can't build anything complex with?

As eloquently demonstrated by your "simple" solutions


Your reply is not indicative of someone capable of a good faith conversation on the topic, but I'll bite.

I think you don't understand what the hard and easy problems are that underlie the solutions I'm talking about.

For example: you repeatedly reply to people talking about the length of the prompts, but end users don't need to write prompts. It's trivial to append instructions around what a user says.

On the other hand, you keep replying to people with "how is that not just Siri" when people describe the LLM demonstrating zero-shot classification for example, but you don't seem to understand how difficult of a problem that has been for ML. Those contrived chat logs you see are demonstrating multiple discrete classifications that would have each cost untold hundreds of thousands of dollars in development of recommender systems to replicate just a few years ago.

Most people couldn't even dream of building a Spotify song recommender from first principles that could capture nuance like that chat demonstrated with an army of engineers. The fact is today, right now, that's something someone could hack into a real usable personal recommender in a weekend.

At the end of the day LLMs don't make all problems easier, and they make some problems harder: but the problems they make easier are extremely hard problems. I think if you're not familiar with how hard some of the things they're doing are, then the things they're doing poorly glare out much brighter.

If half that weekend is spent fighting the LLM to output JSON the right way, it sure sounds like LLMs are just dumb hype machines... but that doesn't reflect the sheer, otherwise-impossible value they're providing within that same system.


> Your reply is not indicative of someone capable of a good faith conversation on the topic, but I'll bite.

You think so because replies to me have willfully ignored and misunderstood the point of my replies. And have willfully ignored the context (which, as I already said, is funny and ironic).

The whole discussion started with

- "LLMs can't generate actions in real life situations"

- "We can't expect LLMs to do that"

- "How is it more useful than Siri"

- and here's the most important one: "Siri doesn't have context ... GPT fails there, it's because it doesn't <know context>"

So, Siri is bad, because it doesn't have context. But somehow even though GPTs are the same, they are good because... someone somewhere can come up with an imprecise, unpredictable prompt for a rather specific situation that may or may not work for some people... and that's why they are better than Siri and have context.

"Where is this context/input coming from?" - "end users don't need to write prompts. It's trivial to append instructions around what a user says."

This is literally magical thinking. "Someone somewhere will maybe somehow create a proper prompt that maybe will definitely work, and users won't have to do anything". This... is literally Siri. It even asks for clarifications when it can't understand something.

You keep harping on about "zero-shot classification". And completely ignored what I wrote: I ran your amazing zero-shot classification, and it immediately failed. It raised the brightness in the garage. I guess someone (not the end user) should write another model to correct the first one. And when that one inevitably, and immediately, fails, someone (not the end user) should trivially write corrections for that. It's all turtles all the way down, isn't it? (On a second try it did say that the user is likely in the kitchen or in the bathroom, and increased brightness in the bathroom).

Thing is: I don't subscribe to this magical thinking. I see innumerable failure modes and "edge cases" (which are not edge cases, but actual every day scenarios) where none of this works.

This is also the reason why we haven't seen any complex product (apart from specialised fine-tuned ones) built with LLMs: they fail very much like Siri does in even the simplest scenarios. No one knows how to provide an actual proper context of a person's life so that it works reliably more than half of the time (and when it seemingly works, a simple MRU would probably work better).

The "trivial annotations" for user actions are anything but. (There's also a separate discussion here: https://news.ycombinator.com/item?id=37464277).

> Most people couldn't even dream of building a Spotify song recommender from first principles that could capture nuance like that chat demonstrated with an army of engineers. The fact is today, right now, that's something someone could hack into a real usable personal recommender in a weekend.

As an engineer who works at Spotify (not in recommendations, but I know the details at least superficially), thank you for a hearty laugh this sentence brought me.

As I said, magical thinking.


You had a chance to prove my assumption wrong by writing this same exact comment without all the snark.

At the end of the day, if you're just unmoved by the implications that an ML model went from a bag of tokens to a structured, explained chain of thought, and a final response on an unknown task with rewards defined in natural English (!) and intentional ambiguity most humans wouldn't even try to confront... there's not much conversation to be had.

I think the rest of us (including your colleagues) will continue to build on these models, and like most advancements there'll be a vocal crowd insisting the car isn't useful because it can't be fed with grass.

> not in recommendations

You didn't have to say that after complaining that ChatGPT's web interface didn't give both of us the same reply (most people in ML understand how temperature relates to LLM output).

_

By the way, if making your own personal music recommender seems like "magical thinking", maybe you're a little lost on which parts of Spotify's recommender systems are complex due to scale: if Spotify only needed to make song selection work for one person at a time, they'd have a lot more leeway in architecture.

In fact, when the problem was flipped and they were scaling humans attaching their recommendations to songs, they built on OpenAI: https://newsroom.spotify.com/2023-02-22/spotify-debuts-a-new...

Not unexpected when you're a founding organizer of the "NLP 4 Music and Audio Forum".


"Gorilla: Large Language Model Connected with Massive APIs" (2023) https://gorilla.cs.berkeley.edu/ :

> Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically- and syntactically- correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! Join us, as we try to expand the largest API store and teach LLMs how to write them!

eval/: https://github.com/ShishirPatil/gorilla/tree/main/eval

- "Gorilla: Large Language Model connected with massive APIs" (2023-05) https://news.ycombinator.com/item?id=36073241

- "Gorilla: Large Language Model Connected with APIs" (2023-06) https://news.ycombinator.com/item?id=36333290

- "Gorilla-CLI: LLMs for CLI including K8s/AWS/GCP/Azure/sed and 1500 APIs (github.com/gorilla-llm)" (2023-06) https://news.ycombinator.com/item?id=36524078


It seems after 1-2 years that the true power of LLMs is in DevOps. I got pretty excited when I tried GPT-3 (a completion model), but as time went by and OpenAI shifted to chat models, we lost control over the LLM part and found new meaning in taking whatever model OpenAI made available as a black box and "chaining" it to other tools we already had, like databases, APIs, function calls/tools, etc. I'd say DevOps is exactly where open source is seriously behind; there are decent open source models, but it costs so much to self-host them, despite the full power and control we have over them (via text-generation-webui and the like).

OpenAI is playing the DevOps game (starting maybe with the introduction of ChatML). The open source community plays the LLM and benchmarks game. Ironically, the two are converging, meaning that OpenAI's models are getting dumber (not the API) thanks to censorship and RLHF, to the point that open source models are even better than some OpenAI models in some aspects. On the other hand, open source models are getting better tooling and DevOps thanks to oobabooga, llama.cpp, etc.

I'm seriously waiting for competitors to challenge Nvidia's monopoly in this space. Maybe Apple?


I think currently the M2 Max is the best bang for the buck for running inference on open source models. But the use case is so niche that Apple probably won't actively start supporting open source models. In the long run I hope some smaller company gets its shit together and starts competing with NVIDIA.


The GPU support in ML frameworks, however, is really not impressive. I have a MacBook with an M1 Max and 64 GB RAM; I can load a 7B model for fine-tuning (Hugging Face Trainer, PyTorch, MPS), but the speed is just too slow - it only reaches 50% of the speed of an i5-12500 CPU in my tests.


At $6,000, how is M2 Max the best bang for the buck?!

One could get two used 3090s and set up a decent PC for less.


Two 3090s don’t have 96GB of VRAM


That's true. But OTOH, one can't easily upgrade Macs.


With 96gb of VRAM, will you really need to? Personally I think apple servers might be adopted for AI and LLM workloads soon.


The M2 Max Mac Studio is $2,000.


> I'm seriously waiting for competitors to change nVidia's monopoly in this space. Maybe Apple?

I would have thought AMD is the obvious contender. They are #2 in GPUs, they have formidable programming talent (based on their advances with Ryzen vs Intel), and they have targeted AI as their goal.

Am I missing something?


AMD have repeatedly dropped the ball when it comes to software support for compute and AI. Their hardware is quite capable, but very few people can actually make it work, which means most of the existing models have poor AMD support.

This is getting better with ROCm and such, but that's Linux-only and only works for a subset of tasks.

Both Intel and Apple have better "out of the box" support for ML and the ability to invest more into making these things work (e.g. Apple have implemented Stable Diffusion against Core ML themselves)


> WARNING: this will remove all your data from spotify!

That is quite the caveat.


I feel like that script needs a few 'are you completely sure?' Prompts


thanks for pointing that out!


It's actually really interesting to see GOFAI techniques like planning used in conjunction with LLMs.


I see a GOFAI resurgence thanks to LLMs.


The examples are pretty lame, since you can do what the examples do way faster without using an LLM and paying OpenAI.


ChatGPT + Noteable is already powerful enough to get some work done via API calls (after installing and importing the libraries, writing Python code, managing secrets for authentication, etc.)

There is surely scope to streamline this much further

I am very intently watching this space


Interested to learn more (big fan of data stories). Do you have any particular use cases you would recommend to look into?


I've seen a Noteable+ChatGPT demo, where the user can chat with ChatGPT and the responses are executed in a Noteable-hosted Python notebook. It was cool!

It would be also cool to have such plugin for Google Colab.

I hope someone will come up with a new way to interact with LLM models other than a chat UI. It would make code writing even faster.


Separating the language model from the knowledge model (and then splitting that into domain specific knowledge models) is the next low(ish) hanging progression step in the field.


This is interesting, I do something similar with unfetch.com - I have some examples on unfetch.com/directory - There are a lot of potential use cases for LLMs & APIs


The service unfetch.com looks cool. Can I run it locally on my machine, not in the cloud? Is it open-source?


It's not open source, though it's something I would like to do soon. We offer enterprise plans, which use LLaMA 2 instead of OpenAI and can run on your own cloud (or even locally if you have enough RAM). If you are interested, send me an email at c@unfetch.com


Does this use a chain of LLM/REST requests or does it use a single multi-task LLM request that launches a REST call while populating its output?


Waiting for BashGPT.

It could use curl to do REST calls too, and a lot more.



How does this compare to agentGPT?


Finally a use for my IRC bot


Paging the Google Home team…


Is there any local GPT model available, even if it's a bit older? I had heard maybe 3.5 leaked, but perhaps I'm mixing it up with Stable Diffusion?



