Show HN: Fructose – LLM calls as strongly typed functions (github.com/bananaml)
218 points by edunteman on March 6, 2024 | 99 comments
Hi HN! Erik here from Banana (formerly the serverless GPU platform), excited to show you what we’ve been working on next:

Fructose

Fructose is a Python package for calling LLMs as strongly typed functions. It uses function type signatures to guide the generation and guarantee a correctly typed output, in whatever basic or complex Python datatype you request.

By guaranteeing output structure, we believe this will enable more complex applications to be built, interweaving LLM calls with ordinary code. For now, we've shipped Fructose as a client-only library that simply calls gpt-4 (by default) with JSON mode; pretty simple, and not unlike other packages such as marvin and instructor. But we're also working on our own lightweight formatting model that we'll host and/or distribute to the client, to help reduce token burn and increase accuracy.
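To make that concrete, usage looks roughly like this (adapted from the examples further down the thread; the entry point and defaults are assumptions based on the README at the time, and the API is still v0):

    from fructose import Fructose  # assuming the package exposes a Fructose() entry point

    ai = Fructose()  # reads OPENAI_API_KEY from the environment

    @ai()
    def describe(animals: list[str]) -> str:
        """Given a list of animals, use one word that'd describe them all."""
        ...

    print(describe(["dog", "cat", "parrot"]))  # returns a str, e.g. "pets"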

We figure, no time like the present to show y’all what we’re working on! Questions, compliments, and roasts welcomed.




IMHO, in the future programming may look similar to this. Write a type declaration for a function with an expressive type system, e.g. refinement types. Then use LLMs + SAT/SMT to generate provably correct code.

This strikes a happy medium, where machines are assisting programmers, making them much more productive. Yet the resulting code is understandable as a human has decomposed everything into functions, and also robust as it is formally verified.
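A minimal Python sketch of the "verify" half of that loop, using z3 to check an LLM-generated candidate against a declared spec (the `llm_generate` call and the toy spec are hypothetical illustrations, not the poster's F# system):

    from z3 import Int, Solver, unsat

    def spec_holds(candidate_src: str) -> bool:
        """Check an LLM-generated double(x) against the spec: double(x) == x + x for all ints."""
        namespace = {}
        exec(candidate_src, namespace)  # candidate code from the model, e.g. "def double(x): return 2 * x"
        double = namespace["double"]

        x = Int("x")
        s = Solver()
        s.add(double(x) != x + x)       # search for a counterexample to the spec
        return s.check() == unsat       # unsat: no counterexample exists, candidate is verified

    # generate-and-verify loop (llm_generate is a hypothetical model call):
    #   while not spec_holds(llm_generate(prompt)): ...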

I am working on an F# proof-of-concept system like this; there are other alternatives around, implemented in Haskell and other languages, with varying levels of automation. It is potentially an interesting niche for a startup.


Yeah, functional programming and pure functions seem perfect for generative programming. Granted, I think they're perfect in general, but as long as human programmers are still stuck in the world of object-orientation (in the modern sense), they're going to be wasting time in the LLM feedback loop. The LLM should be able to write a unit of code however it wants, in a way that is as self-contained as possible. Since an LLM can, in theory, quickly "understand" code that most software engineers would object to, we should get out of the way of LLMs rather than expect them to be like we are.


The field that has been doing this is called Program Synthesis. Here’s an example survey:

https://www.microsoft.com/en-us/research/publication/program...

I’ve wanted to see the traditional techniques combined with modern ML to sort of drive the search and generation process. Then, we’d still have the advantages of both formal specifications and classic AI (esp traceability). While looking for a synthesis link, I stumbled onto one paper trying to mix the two approaches:

https://ojs.aaai.org/index.php/AAAI/article/download/5048/49...


> Write a type declaration for a function with an expressive type system, e.g. refinement types. Then use LLMs + SAT/SMT to generate provably correct code.

This is how I use Copilot currently, so I might not be following what part of this is 'future' facing or relevant to this Fructose project?

Not being contrarian, I thought this was an interesting point but as I thought about it more I realized, "wait, they're describing what I already do".


Do you use Copilot on a language equipped with refinement or dependent types, and use said types to constrain generation of entire functions, in a single step, that are also formally verified?


How do you do this? I know nothing but this sounds really interesting.


How!!! Could you think of writing a blog post about it?


This project seems pretty different to what you've proposed. Fructose looks like it's "just" asking an LLM to evaluate a function (written in English text), and then jamming whatever comes out back into your type system.

Being able to sometimes answer a given question is perhaps a first step to writing code that can answer that question reliably, but it's a long way from an LLM that does the former to one that does the latter.


Oh yeah, you just reminded me of this cool talk I saw at Strange Loop a while back. Not about the AI parts but re: program synthesis in Haskell:

"Type-Driven Program Synthesis" by Nadia Polikarpova https://www.youtube.com/watch?v=HnOix9TFy1A

Links to more projects and papers by Prof. Polikarpova: https://cseweb.ucsd.edu/~npolikarpova/

I think this is one of the main projects she discusses in the talk: https://github.com/nadia-polikarpova/synquid

EDIT: meant to mention this too, which I think has been around a bit longer, not that I've ever used it in production: https://ucsd-progsys.github.io/liquidhaskell/


Lots of related work also by A. Solar-Lezama https://people.csail.mit.edu/asolar, sometimes in collaboration with N. Polikarpova.


TIL, nice--thanks!


Oddly similar to summoning demons.


yeah I had a moment working with fructose where I realized "oh this is more like functional programming than I expected"


Is the F# POC open source? Link?


Not yet, it's a bit rough. The LLM I am using requires a bit of extra fine-tuning to be really smooth, I need to rent a bigger GPU. Besides, I am working on some novel integration between transformers and SAT/SMT that will take me some time to finish.


Is the theory tied to a specific llm? I'm interpreting it as, e.g., the llm writes the code, the solver verifies it, repeat until correct. In this situation the two are decoupled and the llm would be a drop in and thusly could be any local or remote llm. Is there something about your approach that doesn't allow this?

(also, +1 for OS link request)


Decoupling both is the simplest option, but not the one I am focusing on. Also note SAT/SMT can also be used for synthesis.

In fact, synthesis has a relatively rich history using SAT/SMT solvers.


Are you using the SAT/SMT solvers to feed training data into a transformer or integrating the solver logic into the model code?


Our product (phosphor) is built end-to-end in F# so this stuff is close to my heart. You might find Moonbit's approach to functional AI interesting as well https://www.moonbitlang.com/blog/moonbit-ai.


This approach may be too high-level "magic" to the point of being difficult to work with and iterate upon.

Looking at the prompt templates (https://github.com/bananaml/fructose/tree/main/src/fructose/... ), they use LangChain-esque "just try to make the output valid JSON" prompting, when APIs such as GPT-4 Turbo, which this library uses by default, now support function calling/structured data natively and do a very good job of it (https://news.ycombinator.com/item?id=38782678), and libraries such as outlines (https://github.com/outlines-dev/outlines) are more complex but can better ensure a dictionary output for local LLMs.
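For reference, the native route this comment points at looks roughly like this with the OpenAI Python SDK (model name and prompt are placeholders; JSON mode requires the word "JSON" somewhere in the prompt):

    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        response_format={"type": "json_object"},  # JSON mode: the reply is guaranteed to parse as JSON
        messages=[
            {"role": "system", "content": 'Reply with JSON of the form {"word": <one word describing the animals>}.'},
            {"role": "user", "content": "dog, cat, parrot"},
        ],
    )
    print(resp.choices[0].message.content)  # e.g. {"word": "pets"}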


Many of our early users have said this as well. I don't want this to turn into an abstraction monstrosity: the more unadulterated the prompt, the better. We're looking to outlines as inspiration for doing this logic as part of the model vs the client. Thanks for the links!


Big proponent of guaranteed outputs for LLMs. I wrote a library a while back (gpt-json) that did something similar by querying the OpenAI API. At the end of the day, though, while their responses are _highly likely_ to be valid JSON, they're not guaranteed. There's only so much that can be done with remote calls to their model's black box.

The future here really lies in compiling down context free grammars. They let you model json, yml, csv, and other programming languages as finite state machines that can force LLM transitions. They end up being pretty magical: you can force value typing, enums, and syntax validation of multivariate payloads. For use in data pipelines they can't be beat.

I did some experiments a few weeks ago on training models to generate these formats explicitly with jsonformers/outlines. Finetuning on the right format is still important to maximize output quality: you can see a 7% lift if you finetune explicitly for your desired format. [^1] At inference time the CFGs then constrain your model to do what it's actually intended to.

[^1]: https://freeman.vc/notes/constraining-llm-outputs
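To illustrate the finite-state-machine idea: at each step, the grammar state determines which tokens are legal, and everything else is masked out of the logits before sampling. A toy sketch with a made-up five-token vocabulary (real libraries like outlines compile the grammar to these per-state masks ahead of time):

    import numpy as np

    # toy vocabulary and a tiny "grammar": after '{' only '"key"' is legal, after '"key"' only ':' is legal, ...
    vocab = ["{", '"key"', ":", '"value"', "}"]
    legal_next = {  # state -> indices of tokens the grammar allows next
        "start":   [0],
        "{":       [1],
        '"key"':   [2],
        ":":       [3],
        '"value"': [4],
    }

    def constrained_pick(logits: np.ndarray, state: str) -> int:
        """Mask out grammar-illegal tokens, then pick greedily from what's left."""
        mask = np.full_like(logits, -np.inf)
        mask[legal_next[state]] = 0.0
        return int(np.argmax(logits + mask))

    # usage: start in "start", feed the model's logits each step, follow the chosen token to the next state
    state, out = "start", []
    for logits in np.random.randn(5, len(vocab)):  # stand-in for real model logits
        idx = constrained_pick(logits, state)
        out.append(vocab[idx])
        state = vocab[idx]
    print("".join(out))  # always parses: {"key":"value"}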



valid json, yes, but not a specific json schema (yet, who knows, maybe they ship schema support, I'm surprised they haven't)


it seems this is in the context of "extraction", where all of the data is already present in the input text and all that's needed is the reformatting. This is something we've been wrestling with (even today): is the role of fructose and/or our eventual formatting model to provide both intelligence + formatting (such as generating novel data on the fly, constrained to structure), or just formatting (anything-to-json)? Not sure what the answer is, and not expecting one; we're just realizing the clear split between the needs of users running extraction vs. more generative/creative tasks.


we don't need to limit ourselves to context-free, either. it's possible to enforce scope as well, and even force per-token type correctness, at least for somewhat syntactically well-behaved languages that use local type inference.


> not unlike other packages such as marvin

This feels pretty much identical to Marvin? Like the entire API?

From a genuine place of curiosity: I get that your prompts are different, but like why in the name of open source would you just not contribute to these libraries instead of starting your own from scratch?


Thanks for asking, and I'd agree. I'd give the same answer as the folks asking about instructor: we built this in a week and are sharing it early, this package API happens to have landed on what Marvin is doing, we're likely to change over time, especially leaning toward running our own models as part of it.


Wait. So why not just contribute to an existing open source project if you’re going to implement an identical API?

If you run your own models as a part of it, surely you could hook up your models as a backend to whatever abstractions you’re copying here.


Wait, someone made a similar comment as this elsewhere in the thread. So why don't you just upvote that?

If you have your own thoughts, surely you could just think them to yourself while upvoting.


yikes.

I was responding to them sidestepping the first commenter’s question.


yikes.


yeah this seems to be pretty much the same interface as `fn` from marvin, except w/o pydantic (see https://github.com/PrefectHQ/marvin?tab=readme-ov-file#-buil...)


Does anyone else get bothered by how this seemingly results in code that won't compile?

Instead of this:

  @ai()
  def describe(animals: list[str]) -> str:
      """ Given a list of animals, use one word that'd describe them all. """

it would seem a lot more intuitive to do this:

  def describe(animals: list[str]) -> str:
      return ai("""Given a list of animals, use one word that'd describe them all.""", animals)


Technically a function body needs at least one statement. A docstring is just an expression statement (a string), so a function definition with just a docstring is syntactically valid Python. I've seen people say multiline string literals are Python's version of multiline comments, but that's really just convention; it's a no-op expression statement. Same as doing

    def foo():
        4
Which is also an expression statement as a function body, and also does nothing. Contrast to actually using a comment as a function body; comments aren't statements (nor expressions, so they can't be used as an expression statement):

    def foo():
        # this doesn't work
> IndentationError: expected an indented block after function definition on line 1

Of course, this doesn't really matter at all, and I get that it feels strange. I've just been thinking about grammars and syntax lately, and it's been interesting to now have the vocabulary and mental model to understand these unintuitive things :)
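For what it's worth, this is also presumably what lets the decorator approach work at all: the docstring-only body is legal, and both the docstring and the type hints stay available for introspection at import time. A quick check:

    import inspect

    def describe(animals: list[str]) -> str:
        """Given a list of animals, use one word that'd describe them all."""

    print(describe.__doc__)             # the "body" survives as the docstring
    print(inspect.signature(describe))  # (animals: list[str]) -> str
    print(describe(["dog", "cat"]))     # None -- the undecorated function itself does nothing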


I typically use "pass" for this exact case of having a stub function body (typically to be implemented later)


Wouldn't the correct way be with the use of ellipsis, as is used in type stubs?

  @ai()
  def describe(animals: list[str]) -> str:
    """ Given a list of animals, use one word that'd describe them all. """

    ...


Yeah, pyright doesn't like the annotated return type not being honored by the empty stub function. I wonder if there's a way to trick it.

For your suggestion, the decorator would still be required to overload the function execution with the remote call; otherwise you'd just be calling the function body. We have considered special wrapper return types to help play better with pyright (and also give programmatic access to debug details of the call), but that'd add bloat to the package and subtract from the more native Python feel we're aiming for.


> Yeah the pyright doesn't like the annotated return type not being honored by the empty stub function. I wonder if there's a way to trick it.

Python has an existing convention for this (so it's not a "trick"): the use of the special value Ellipsis (literal: ...)

https://mypy.readthedocs.io/en/stable/stubs.html


Beautiful, this does the trick!! Thanks for the tip.

  @ai()
  def stub() -> int:
      """docstring"""
      ...

  # (use ... instead of "pass" in the function body)


Good stuff. How does this compare to Instructor? I’ve been using this extensively

https://jxnl.github.io/instructor/


answered in a different thread. tl;dr: not that different for now. we're likely to do some serverside optimizations, esp. given our gpu inference history.


I like your UX a lot more. Modeling the LLM calls as actual Python functions lets them mesh well with existing code organization and dev tooling. And using a decorator to "implement" a function just feels like a special kind of magic. I'd need more ability to use my own "prompt templates" to use this as a lib, but I'm definitely going to try using this general pattern.


Since you are going down this route, I would recommend you build some sort of unit-test-driven fine-tuning framework, where you provide input/output examples expressed as simple function calls. You could then let the LLM generate examples, check them against the unit tests, and keep the valid results to build up a dataset. For bonus points, the unit tests themselves could also call the LLM to check whether the output passes criteria expressed in natural language.
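A rough sketch of that loop, with a stand-in `llm` call and a toy task (both hypothetical, just to show the shape of it):

    def llm(prompt: str) -> str:
        """Stand-in for a real model call; swap in whatever client you use."""
        return "pets"  # placeholder answer

    def passes_unit_test(output: str) -> bool:
        # the "unit test": output must be a single lowercase word
        return output.isalpha() and output.islower()

    dataset = []
    for animals in [["dog", "cat"], ["trout", "salmon"], ["oak", "pine"]]:
        output = llm(f"One word that describes all of: {', '.join(animals)}")
        if passes_unit_test(output):
            dataset.append({"input": animals, "output": output})  # keep only verified pairs for fine-tuning

    print(len(dataset), "examples kept")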


I love this emerging space at the intersection of programming and LLMs. It goes beyond having the LLMs generate code: that's an obvious and amazing use case, but it's far from the only one.

Another project I'm excited about in this area is GPTScript, which launched last week: http://github.com/gptscript-ai/gptscript.


This is much nicer than calling GPT in the middle of my code. Honestly the aesthetics of Fructose just make the code so much neater


Thanks!


How do you guarantee output structure? Does it ever fail to conform?


It's not 100% yet. Route to that:

1. Clientside, a retry strategy on failed parse. Not yet implemented (we throw an exception on parse failure right now), but coming soon. Not ideal because of token burn and latency, but the best quick solution.

2. For the custom model we're building, we use strict grammar definitions to bias outputs toward the needed structure (or, if there is only one structurally correct token, skip generating it entirely and insert it directly).
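The clientside retry in (1) is simple enough to sketch (hypothetical helper names, not the library's actual internals):

    import json

    def call_with_retries(prompt: str, parse, llm_call, max_attempts: int = 3):
        """Call the model and try to parse; on failure, retry (burning tokens and latency each time)."""
        last_err = None
        for _ in range(max_attempts):
            raw = llm_call(prompt)
            try:
                return parse(raw)
            except (json.JSONDecodeError, ValueError, TypeError) as err:
                last_err = err  # optionally feed the error back into the next prompt
        raise last_err

    # e.g. call_with_retries(prompt, parse=json.loads, llm_call=my_openai_call)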

I've been impressed at how well gpt-4 does with the default prompt template we use. Even better if you enable the chain_of_thought flavor.


Basically my experience with homegrown. If you give it a typescript template for outputs, and a ton of prose to tell it how to respond, you usually get the right responses.


TGI just integrated Guidance in 1.4.3; that by itself supports grammar/JSON/Pydantic constraints and tool invocation/function calling.

Langchain & Llamaindex plus Fructose really need to skip the structure-adherence work and move to chunking/KG generation, since that's the next pain point to tackle.


What is TGI? Is that the huggingface Text Generation Inference project on GitHub?


Yes. TGI is Huggingface's version of LLVM (some nuance, of course). LLVM also launched grammar support recently too, so we'll be looking into it.


All of these acronyms are so confusing. I'm assuming LLVM isn't the compiler tool, but searching "LLVM ai" doesn't give me any good results.


They probably meant vLLM https://docs.vllm.ai/en/latest/


ah shoot, yes I meant vLLM, sorry for the confusion, lots of comments to reply to :)


Why do you have Guidance in caps?

https://github.com/guidance-ai/guidance

or ...

https://huggingface.co/docs/text-generation-inference/concep...

or ... ?

A quick glance through these suggests they don't yet leverage json_object on OpenAI with the word JSON in the prompt, which works wonders with the 0125 models.


I find it grating that all of these types of things say "LLMs" when in fact they literally only work with OpenAI. There are hundreds of variations of LLM models. When it works with only gpt-4-turbo or gpt-3.5-turbo, it's inaccurate to say it's a tool for LLMs in general.


So you're saying they should ensure compatibility with all LLMs on Day 0 so you can avoid a personal "grating" feeling. It's called an MVP.


They should just say it works with OpenAI or ChatGPT. It's called being honest.


feel free to PR the readme if you feel it was misleading


True, grammar support has been around in llama.cpp since the early days IIRC. Microsoft had Guidance as well, and now TGI supports it too. It's game over for Langchain/Llamaindex if guaranteed structure adherence was the only reason to use them to beg the LLM for usable output.


Maybe I'm not the audience for this, but how is this a "product"? Coercing LLM outputs into a function call is built into OpenAI itself.

What is fructose doing extra here? It’s like productising copy&paste which every modern OS has, no?


Nice, I've just started building something similar in TypeScript. I wasn't a big fan of the Langchain model. I wanted to develop with normal functions in an imperative manner so the code is very easy to read. I'm also using decorators to add the required functionality to workflow steps so I can support retries, and build something like LangSmith on top too.

So far I've been able to make a little workflow that can complete real simple infrastructure requests in JIRA. Pick the right repository, make the changes, compile, and push up the merge request.


open source?


This is great, it might be really helpful for what I’ve been working on

Just put together a small project that uses GPT to find good job matches[1]

One of the most challenging things in making it useful for more users is managing prompts that include several pieces of user input and need to return a specific format, with a structure that depends on what the user wants to include in the prompt

What’s the typical use case for this? Who needs it the most right now?

Thank you!

[1] https://news.ycombinator.com/item?id=39621373#39624542


So what is this actually putting into the prompt to guide generation? I dislike libraries that come with a lot of pointless abstraction.

I'm about to write something that generates typescript code from pydantic models. If this just works out of box, it would make me very happy.

I'll take a look through the repo tomorrow, sorry if my response is a little lazy, I just got off work.


Definitely very excited to see this be a thing. Genuinely liked the approach to make function calls strongly typed and rely on functional programming principles.

During my senior year, I worked on a research project very similar to this and I’m glad to see this out there for everyone. I’d love to connect with the team if possible!


absolutely! My DMs are open at @erikdunteman on twitter, or erik at banana dot dev


I love the concept, but I'd really prefer being able to use it against local llms (localai, ollama, etc).


as with marvin, you can just swap the base url and use any of the oss proxy libs that clone the openai api (but since they don't do function calling [except for mistral i think], it's not as good afaik)


Seems like a great feature (and honestly allows us to do smarter things for strictly structured generation). I'm curious, what's your main motivation for local llms vs hosted APIs?


Not wanting to get dinged for $20/mo? Ability to use offline for local home automation (eg: "given a verbal input request, determine the home devices in scope and their on/off state" => "given the current devices state and the verbal request, generate a list of home assistant actions to perform"), using a custom model for the above, etc.


> Not wanting to get dinged for $20/mo?

OpenAI (& peer) API pricing is honestly quite cheap - I spend way less than 20$ a month for my needs. On the other hand, ChatGPT offers a different experience, so I pay for that too.

https://help.openai.com/en/articles/7127956-how-much-does-gp...


I'm pretty sure the primary reason is that you don't want an update of the hosted LLM to suddenly break your application without warning.


Thanks everyone for a great Show HN! This turned out much larger than expected, thanks for all the comments and github stars. We had a fun time with it.

Lots of takeaways, blogs to read, things to implement, issues to address. On it!


I've done a lot of work over the last year wrangling LLM outputs - both from the OpenAI API as well as local LLMs.

What are the benefits of using Fructose over LMQL, Guidance or OpenAI's function calling?


Still learning about the landscape so can't give informed opinions. LMQL is a new one for me, will check it out.

What we're mostly going for is composability vs. abstraction. What's the smallest nugget of lift we can do for you, to make it feel natural to implement what you want? In this case it's treating the calls as functions and leaning on native Python features like functions, docstrings, and types, so you can still use language features like closures to do the weird things you need.

This is all handwavy, wizard-language-design-hat stuff, so take it with a grain of salt. We're just trying things out.



here's an awesome post on the landscape https://hamel.dev/blog/posts/prompt/


I remember reading that, good stuff.

I'd like to see an injectable MITM-like proxy that can rewrite payloads. Many of these frameworks are useful, but when they go off the rails, they're hard to modify and introspect.

It would be nice if LLMs had a way to speak an annotated format, like XML, that could encode higher-level information in a coherent manner over "well formed" ad-hoc text.

LLM libraries are in a crazy state right now. It's like JS frameworks in 2015: a new one that demos well every other day.


one idea we're cooking is to offer a proxy with a hosted reformatting model on-board, to rewrite payloads on their way back in the case of type parse failure. fructose, the clientside sdk, would be optional


How does Fructose relate or compare to Instructor (https://github.com/jxnl/instructor)?


Currently, quite comparable, and obviously Instructor is more mature and feature-rich. They're going the "patch the OpenAI client" approach, which lets code written with it keep using OpenAI SDK patterns, which is pretty smart. Jason seems like he knows what he's doing.

We're trying to make it more of a language feature with the decorated functions. Plus exploring the hosted formatting model direction.

How do you feel this compares? Do you think there's any gaps in current tools worth working on?
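For reference, the patched-client pattern described above looks roughly like this (a sketch from memory; Instructor's exact API has shifted between versions):

    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    class UserDetail(BaseModel):
        name: str
        age: int

    client = instructor.patch(OpenAI())  # patching adds a response_model kwarg to .create()

    user = client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserDetail,  # create(response_model=T) -> T
        messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
    )
    print(user.name, user.age)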


Have you done any comparison with DSPy ? (https://github.com/stanfordnlp/dspy)

Feels very similar to DSPy, except you don't have optimizations yet. But I like your API and the programming model you are enforcing through this.


i do not know what i am doing.

I just want `create(response_model=T) -> T` lol


the king has arrived! Instructor is a clever API on this, clean


Plus one on this question! Seems very similar.


Very Cool! Would it work for Pydantic out of the box? Or that's something coming along?


currently don't have pydantic support yet, but we're not too opinionated on that. I know it seems to have emerged as a standard, and I imagine it's useful in the context of running fructose in a FastAPI handler, but we led with dataclasses because they're language-native and achieve much of the same thing


Pydantic can serialize both instances and classes/types to JSON and JSON Schema. That seems quite helpful for this use case. How are you handling serialization to/from the LLM? Roll-your-own, or are there additional libraries for doing this with dataclasses?
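With plain dataclasses the pieces are in the standard library, though it's more roll-your-own; a sketch of one way to extract the structure and round-trip a reply:

    import json
    from dataclasses import dataclass, fields, asdict
    from typing import get_type_hints

    @dataclass
    class Person:
        name: str
        age: int

    # describe the expected structure to the model...
    hints = get_type_hints(Person)
    schema_hint = {f.name: hints[f.name].__name__ for f in fields(Person)}
    print(json.dumps(schema_hint))  # {"name": "str", "age": "int"}

    # ...and rebuild from the model's JSON reply
    reply = json.loads('{"name": "Ada", "age": 36}')
    person = Person(**reply)
    print(asdict(person))  # {'name': 'Ada', 'age': 36}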


Obvious question - how is this better than marvin, instructor, outlines.


fyi: LM Studio can host a server that exposes the OpenAI API for whatever model you are running locally

So as long as this library can be directed to localhost or configured, it can use any LLM
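Pointing the OpenAI client at a local, OpenAI-compatible server is just a base-URL swap (port and model name below are placeholders; LM Studio's local server typically listens on 1234):

    from openai import OpenAI

    # any OpenAI-compatible local server works (LM Studio, an ollama compat endpoint, etc.)
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # whatever model the local server is serving
        messages=[{"role": "user", "content": "One word for: dog, cat, parrot"}],
    )
    print(resp.choices[0].message.content)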


Are you planning to add other types like Claude or Llama2?


eventually, but priority goes toward finding an abstraction that feels right. We're very likely to break this package API; it's still v0. Sticking with openai till we have more confidence in the foundation being correct.


Can this be a F# Type Provider?


Good Lord that sounds terrifying - I do in general prefer my type definitions to be at least approximately deterministic.




Yet another implementation of https://esolangs.org/wiki/English





