Show HN: GPT-JSON – Structured and typehinted GPT responses in Python (github.com/piercefreeman)
174 points by icyfox on May 5, 2023 | 72 comments
Hey HN, I've been using GPT a lot lately in some side projects around data generation and benchmarking. During the course of prompt tuning I ended up with a pretty complicated request: the value that I was looking for, an explanation, a criticism, etc. JSON was the most natural output format for this but results would often be broken, have wrong types, or contain missing fields.

There's been some positive movement in this space, like with jsonformer (https://github.com/1rgs/jsonformer) the other day. But nothing that was plug and play with GPT.

This library consolidates the separate logic that I built across 5 different projects. It lets you prompt the model for how it should return fields, inject variable prompts, handle common formatting errors, then cast to pydantic when you're done for typehinting and validation in your IDE. If you're able to play around with it, let me know what you think.




I like the idea, but I think a library that focuses on producing requests and parsing responses according to schema is better. Sending requests to the server is orthogonal to the purpose.

What we've found useful in practice in dealing with similar problems:

- Use json5 instead of json when parsing. It allows trailing commas (quick sketch after this list).

- Don't let it respond in true/false. Instead, ask it for a short sentence explaining whether it is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT is able to reason better in this case, and it is much more robust.

- For numerical scores, do a similar thing by asking GPT for a description, then with the small embedding model write a few examples matching your score scale, and for each response use the score of the best matched example. If you let GPT give you scores directly without explanation, 20% of the time it will give you nonsense.
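
A quick sketch of the json5 swap mentioned in the first bullet (assumes the json5 package from PyPI):

    import json5   # pip install json5

    raw = '{"score": 7, "keywords": ["python", "gpt",],}'   # trailing commas, as GPT sometimes emits
    data = json5.loads(raw)    # plain json.loads() would raise on this input
    print(data["keywords"])    # ['python', 'gpt']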


> Don't let it respond in true/false. Instead, ask it for a short sentence explaining whether it is true or false. Afterwards, use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT is able to reason better in this case, and it is much more robust.

Have you tried just getting it to do both? It reasons far better given some space to think, so I often have it explain things first then give the answer. You're effectively then using gpt for the extraction too.

This hugely improved the class hierarchies it was creating for me, significantly improving the reuse of classes and using better classes for fields too.


This seems like a better approach. Introducing another unrelated model seems like it would just add an extra point of failure to watch out for.


There's a benefit in having a model that can output only true/false if that's all that's acceptable, but if I was doing this myself I'd want to see how far I could get with just one model (and then the simple dev approach of running it again if it fails to produce a valid answer, or feeding it back with the error message). If it works 99% of the time you can get away with rerunning pretty cheaply.
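
A minimal sketch of that rerun-and-feed-back-the-error approach (the call_gpt callable and the Answer schema here are placeholders, not any particular library's API):

    import json
    from pydantic import BaseModel, ValidationError

    class Answer(BaseModel):
        reasoning: str
        answer: bool

    def ask_until_valid(call_gpt, prompt: str, max_tries: int = 3) -> Answer:
        """call_gpt is any callable that takes a prompt string and returns the model's raw text."""
        error_note = ""
        for _ in range(max_tries):
            raw = call_gpt(prompt + error_note)
            try:
                return Answer(**json.loads(raw))
            except (json.JSONDecodeError, ValidationError) as exc:
                # Feed the failure back so the next attempt can correct itself.
                error_note = f"\nYour last reply failed with: {exc}. Reply with valid JSON only."
        raise RuntimeError("no valid answer after retries")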


Thanks for the thoughts! I've deployed a few meta models that act like you're describing for second-stage predictions, but for fuzzy task definitions I've actually seen similar luck with having GPT explicitly explain its rationale and then forcing it to choose a true/false rating. My payloads often end up looking like:

  from pydantic import BaseModel, Field

  class Payload(BaseModel):
    reasoning: str = Field(description="Why this value might be true or false")
    answer: bool
Since it's autoregressive I imagine the schema helps to define the universe of what it's supposed to do, then the decoder attention when it's filling the `answer` can look back on the reasoning and weigh the sentiment internally. I imagine the accuracy specifics depend a lot on the end deployment here.


Didn't know about json5, so I had to deal with trailing commas in another way. I found that providing an example of an array without trailing commas was enough for GPT to pick up on it.

The tips on booleans and numerics are interesting! Will keep them in mind if I ever need to do that. I've definitely experienced a few quirks like that (E.g. ChatGPT 'helpfully' responding with "Here's your JSON" instead of just giving me JSON).


I’ve also found good results by asking for it to give the answer first, then to explain its answer. Best of both worlds, since I can just ignore everything following and it still seems to do the internal preparatory ‘thinking’.


There’s some really good info along the same lines in this course https://learn.deeplearning.ai/chatgpt-prompt-eng


Another alternative JSON parser is the YAML parser. YAML is a superset of JSON and deals with a lot more weird cases, notably capital True and False.
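
A quick illustration, assuming PyYAML (whose YAML 1.1 resolver accepts the capitalized literals):

    import yaml   # PyYAML

    raw = '{"isFiction": True, "year": 1984}'   # the capital True would break json.loads
    print(yaml.safe_load(raw))                  # {'isFiction': True, 'year': 1984}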


Is it possible to give an example of what a small embedding model would look like? Curious how to make something like this!


We just use https://www.sbert.net/. Compare the embedding of the answer with the embeddings of YES versus NO.
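
A rough sketch of that comparison, assuming the sentence-transformers package (the model name is just one of its small pretrained checkpoints):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")   # small, fast embedding model

    def to_bool(answer_sentence: str) -> bool:
        """Map a free-form yes/no explanation onto True/False by embedding similarity."""
        emb = model.encode([answer_sentence, "Yes, this is true.", "No, this is false."])
        yes_score = util.cos_sim(emb[0], emb[1]).item()
        no_score = util.cos_sim(emb[0], emb[2]).item()
        return yes_score > no_score

    print(to_bool("The statement is accurate given the facts provided."))   # likely True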


Here's 40 lines of python code that I've found to be unreasonably effective at accomplishing something similar:

https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...


Thanks. Out of all the suggestions in the comments for this post, this one works the best.

And in fact it is only one line, not 40:

    "Please respond ONLY with valid json that conforms to this pydantic json_schema: {model_class.schema_json()}. Do not include additional text other than the object json as we will load this object with json.loads() and pydantic."


yes, that's actually it


Thanks, I'm also doing something similar and figured out that asking a question with examples as `user`, adding the perfect response as `assistant`, and replying with `Perfect. Now do this {}` works really well and cuts out a lot of trial/error.
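
Roughly what that message layout looks like (a sketch against the openai ChatCompletion API of the time; the inputs are made up):

    import openai

    messages = [
        # The original question, with the expected format shown by example.
        {"role": "user", "content": "Extract name and age as JSON from: 'Ada Lovelace was 36.'"},
        # The "perfect" response, supplied as if the assistant had produced it.
        {"role": "assistant", "content": '{"name": "Ada Lovelace", "age": 36}'},
        # The real input then rides on the established pattern.
        {"role": "user", "content": "Perfect. Now do this: 'Alan Turing was 41.'"},
    ]
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    print(response.choices[0].message.content)   # expected: {"name": "Alan Turing", "age": 41}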


Thanks! I have something similar that I've been using and struggling to keep consistent. I like that this is a relatively small package, I'll probably end up using it to play around with.


Very cool. Out of curiosity, for the retries why are the errors appended as system messages as opposed to appending to the user message? And in either case, would it help to prepend the error with something like “Be sure to avoid outputting something that would cause this error:”?


Based on my experience with GPT-4 in coding tasks, it responds extremely well to just appending the raw error output text without additional explanation.


I don't see how these infinite loops are a good idea... You're never sure if you're actually getting a good result?

What is the failure rate?


Um, what? This is a standard retry loop. It's just generally good practice.


I've been interfacing with GPT programmatically for a little while now, leveraging its "soft and fuzzy" interface to produce hard / machine-readable results. JSON was the format that felt best-suited for the job.

I see a ton of code in this project, and I don't know what most of it does. As far as GPT troubles with JSON, I'll add a couple: sometimes it likes to throw comments in there as if it was JS. And sometimes it'll triple-quote the JSON string as if it was Python.

My approach to solve these problems was via prompt engineering - using the system message part of the API call. Asking it to "return valid json, do not wrap it in text, do not preface it with text, do not include follow-up explanations, make sure it's valid json, do not include comments" - seems to work 99% of the time. For the remainder, a try-and-catch block with some fallback code that "extracts" json (via dumb REs) from whatever text was returned. Hasn't failed yet.
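
The fallback can be as simple as grabbing the outermost braces; a minimal sketch of that try-and-catch plus dumb-RE approach:

    import json
    import re

    def parse_json_loosely(text: str) -> dict:
        """Try a straight parse first, then fall back to extracting the outermost {...} block."""
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            match = re.search(r"\{.*\}", text, re.DOTALL)
            if match:
                return json.loads(match.group(0))
            raise

    print(parse_json_loosely('Sure! Here is your JSON:\n{"status": "ok"}'))   # {'status': 'ok'}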

It's fascinating to watch the new paradigm arrive, and people using old habits to deal with it. This entire project is kind of pointless; you can just ask GPT to return the right kind of thing.


Why not both? You can tell it in the prompt what you want and still constrain the output programmatically.

Also note that the output still depends on a random sampling of the next token according to the distribution that the net gives you - so there is a lot of genuine randomness in the model's behaviour. And because each sampled token influences the rest of the response, this randomness will become stronger the longer the response is.

So if you already know you're only interested in a particular subset of tokens, it makes sense to me to clamp the distribution to only those tokens and keep the model from getting onto the "wrong path" in the first place.

Also, pragmatically, if you can get the model to restrict itself to JSON without telling it in the prompt, you're saving that part of the context window for better uses.


I agree with others that it would be interesting to see an LLM that outputs JSON natively - but I think it would also be moving in the opposite direction of the general trend. Right now I can ask it for JSON, YAML, or a number of other formats.

To answer "why not both?" -- bottom line, the effort involved. I don't want to deal with yet another library, the bugs in it, and the inevitable -changes-. GPT's capacity for bridging the gap between structured and human languages is an enormous boon. It bridges a gap so large, we most of the time can't span it with our imaginations. I don't need to write code to tell GPT what to do, I can direct it in plain english.

I'm not worried about the size of the context window the same way we're not worried about memory or disk space - there will be more.


I'm curious what you might think of https://github.com/knowsuchagency/struct-gpt


But the problem is the 99%, no?


It works fine 99% of the time by just using a small amount of extra instruction in the actual prompt. The method GP describes works in any language with just the basic building blocks of http requests, regexp, and a json decoder.

Why do we need a library for this?


You might not! Depends on what you're looking for. I've been finding this library most helpful in places where I have a lot of GPT calls in pipelines, so having typehinted schema return values / some built in error correction / variable injection / establishing standards for the IO of prompt schema is the most useful. So IMO I see its main use as a good standard set of operations that work pretty well out of the box and that allow you to hack around them with decent flexibility.


Yeah, I don't disagree; however, this idea is better as a function I use within gpt-index or langchain. There is a horse race and we're all making our bets about who's going to win.


I'm having success with simple YAML schema. One thing that's very helpful for the prompt is to include "description" and "example":

      -
         column name: salary_max
         format: number
         example: 150,000
         description: Salary Maximum
      -
         column name: keywords
         format: string
         example: engineer, python, docker, remote
         description: Relevant Keywords (Comma separated keywords used for filtering and matching jobs to candidates)


Nice! Another reason to prefer YAML is token count — YAML is 3x cheaper than JSON: https://twitter.com/v1aaad/status/1643889605538635782


Good suggestion mmaia - I'm opening a new issue to keep track of the different output schemas that are being suggested. One thing that originally worried me a bit with yaml was its relative reliance on space-based formatting to drive meaning. GPT generally tokenizes newlines/spaces, but a lot of pre-prepared datasets strip these out, so I preferred the explicitness of JSON, which is insensitive to whitespace. Have you tried using this approach for non-tabular or nested data like lists or dictionaries?


I built zod-chatgpt https://github.com/olup/zod-chatgpt a couple of weeks ago to serve the same purpose in TypeScript.


Yes, this is a very similar technique that I have been using and it works great. One suggestion of something that worked well for me was to use safeParse instead of parse. Then if it doesn’t pass validation, you can retry by passing in the JSON object and the validation error messages. You could also use tricks like starting with a smaller model, then try larger models if you hit a validation failure. Not a great approach for real-time chat but very useful for when you need high-quality results.


I asked ChatGPT (GPT-4) about the idea of integrating GPT-JSON with CUE language (https://cuelang.org/) for potential improvements and the answers are quite impressive:

"Yes, integrating the CUE language with GPT-JSON can potentially lead to improvements in its capability. CUE, short for Configuration, Unification, and Environment, is an open-source data constraint language designed to simplify the definition, generation, and validation of data in structured formats, like JSON, YAML, and others.

Integrating CUE with GPT-JSON can bring several benefits:

1) Validation: By using CUE, you can define schemas for the input and output JSON, which helps in validating the data structure and ensuring it adheres to the expected format. This can help reduce errors and improve the quality of the interaction with the GPT model.

2) Simplification: CUE can help simplify the process of defining and generating JSON inputs and outputs for GPT-JSON. By providing a more concise and expressive syntax, CUE can make it easier to craft structured inputs and outputs, which can be especially helpful when working with complex data structures.

3) Interoperability: Integrating CUE can improve the interoperability between GPT-JSON and other systems or applications. With CUE's ability to work with multiple data formats (such as JSON, YAML, or HCL), it becomes easier to integrate GPT-JSON with various tools and platforms.

To integrate CUE with GPT-JSON, you would need to follow these general steps:

1) Define CUE schemas for the input and output JSON structures used by GPT-JSON.

2) Use the CUE language to validate and generate input JSON for GPT-JSON.

3) Process the output JSON generated by GPT-JSON using CUE to ensure it adheres to the defined schema and transform it as needed.

Integrating CUE with GPT-JSON can improve the overall robustness and ease of use of the library, making it more accessible and useful for a wider range of applications."


I think you could implement the entire thing in CUE, minus moving from/to Python. We're already doing what the last 3 points instruct the user to do for integration. We are using Go rather than Python, so the CUE-to-language type integration is better. Of course, there is the prompt engineering side, but that is just text with instructions for the LLM that need to be well crafted.

One thing we have seen is that you need to adjust your prompts when OpenAI updates their model. Given they only support their dated models for so long, it seems increasingly difficult to make the case to build on top of LLMs you cannot control the life cycle for.


Maybe not the best thread to ask, but - is there an ELI5 explanation of what exactly CUE is and what is it for? I've landed on that website several times in the last two years, and I could never make heads or tails of it. That the name is just a mix of random, unrelated verbs doesn't help.


It's a really well-thought-out way to combine schemas, templates, and data into one language. It's a superset of JSON, and it provides a functional interface to validate inputs, outputs, and configurations.

It integrates very nicely with Go and can convert Go structs into CUE structs and vice versa.



Pretty good pitch from ChatGPT if you ask me! I haven't used cue outside of test projects before, so a few questions:

1. Is cue's validation a material improvement from something like pydantic or zod, which defines schema as code versus in .cue files? I see their docs argue that this can allow for client-side validation and lighter weight schema files which doesn't seem to totally address the library side of things.

2. Have you used the scripting layer before and do you find it useful in practice? I'm struggling a bit to see how I or GPT would use this in my day-to-day.


CUE's validation is very strong, as long as you're ok with some level of functional programming and immutability.

Despite what GPT said above, it's Configure, Unify, Execute. The Execute aspect in particular is powerful.

Pairs up really nicely with Go.


Hey, this is really neat! I've taken a very similar approach in TypeScript. +1 to the sibling comment that recommended parsing with json5 (but don't tell the AI you're doing that, it's a waste of context space and it might get more confused anyway).

I've had luck doing chain-of-thought prompting in the JSON payload as you've described, too. Cheers, really validating to see someone taking a similar approach.


Have you considered Guardrails (https://shreyar.github.io/guardrails/)? It's like Pydantic, but for LLMs.


In the last 24 hours I've seen a bunch of projects doing LLM -> JSON. I think we want to be focusing on markdown instead. An intuition I have developed is that an ideal prompt has a very clear narrative structure and very tight "semantic locality" (the instruction is at the end, the most salient data is close to the instruction, etc).

JSON is admittedly way easier to work with up front, but markdown seems to be a more scalable choice.

Of course, this is all very much an opinion and highly anecdotal at the moment.


Since others are sharing their prompt-only solutions to get JSON, I'll share what I've been using. Has been working reliably:

"Do not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation.

{

  "author": "string describing the author full name",

  "year": "number describing the year the book was written",

  "isFiction": "boolean describing if the book is a work of fiction"

  ...
} "


Is there a reason we won’t have LLMs that only speak in JSON? These JSON hacks are clever and cool but feel like they’ll be obsolete in 6 weeks.


Fingers crossed OpenAI / Anthropic / etc do this! Would make working with these APIs for prediction projects that much easier.

Technically speaking, it's pretty straightforward to force the model into a valid JSON schema if you have access to the inference autoregressive loop and the logit activations. You can either force the known areas of the output to a predefined template, or fill in the basic JSON wrapper and let the model choose arbitrary keys and values.

I imagine it might come down to how much of their usage is on generating text vs generating structured prediction payloads.
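
For the curious, a toy sketch of what that looks like when you do control the loop (hypothetical code assuming the transformers library and a small local model; a real implementation like jsonformer walks the whole schema):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def pick_constrained(prefix: str, options: list[str]) -> str:
        """Score the first token of each allowed option and return the highest-scoring one."""
        input_ids = tokenizer(prefix, return_tensors="pt").input_ids
        with torch.no_grad():
            next_logits = model(input_ids).logits[0, -1]
        first_tokens = {o: tokenizer(o, add_special_tokens=False).input_ids[0] for o in options}
        return max(options, key=lambda o: next_logits[first_tokens[o]].item())

    # The fixed parts of the JSON are emitted verbatim; only the value slot is constrained.
    value = pick_constrained('{"is_fiction":', [" true", " false"])
    print('{"is_fiction":' + value + "}")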


A lot of their value is turning unstructured text or user intent into the structured data. Even if the primary input is JSON, you want to explain what to do with it in words, and then get JSON back.

Why do you think they will be obsolete in 6 weeks?


I found that LLMs are pretty good with TOML. The multiline strings are also a real bonus. One thing I thought was interesting is that sometimes the LLM will mistake the triple quotes for the backticks, so it will output something like this:

  ```
  [TOML]
  key="""
  value
  """
  """


You can instruct the LLM on what the outer markers are.

> Generate some TOML for me, surround the output with a pair of triple equal signs (===).


Ha, I just spent two hours hacking up something similar, and I eventually just sent a system role with a zod schema in the content.


I happened to have a very similar idea recently and created this gpt-logic package for Node. It basically transforms GPT-generated results into JS data types. Check it out if you are interested. https://github.com/wayneshn/gpt-logic


Nice project! I took some inspiration from this as well as https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas... to create the following:

https://github.com/knowsuchagency/struct-gpt

I tried to make the API as intuitive as possible and added the ability to provide examples to improve the reliability and quality of the LLM's output.


Langchain also has a built in Pydantic output parser https://python.langchain.com/en/latest/modules/prompts/outpu...

However I've found it to be generally unreliable and adds a lot of text to each call.

I suspect this can be improved by:

- Only writing the parts of the spec which are needed, rather than the full JSON schema spec

- Including relevant examples rather than arbitrary JSON schema examples


Wasn’t there a project yesterday that asks GPT to output non-JSON but then turns it into JSON after, thus making it less prone to JSON errors?



That one is robust but requires pytorch and huggingface and all kinds of things. I think most of us want something that's a reasonable few lines of code and can run in places like serverless and Replit.


I built a toy[0] for Typescript that works similarly to this.

It takes the expected return type of a function, translates it into a JSON Schema, queries OpenAI, validates the response, and then magically returns the response as typed output.

[0]https://github.com/jumploops/magic


I wait for the day someone fine-tunes an open model to always respond in JSON. I suspect it won't be long now.


You can use simpler types. Instead of:

        messages=[
            GPTMessage(
                role=GPTMessageRole.SYSTEM,
                content=SYSTEM_PROMPT,
            ),
            GPTMessage(
                role=GPTMessageRole.USER,
                content="Text: I love this product. It's the best thing ever!",
            )
        ]
Try:

        messages=(
            ("system", SYSTEM_PROMPT),
            ("user", "Text: I love this product. It's the best thing ever!")
        )
Or:

        messages=(
            SystemMsg(SYSTEM_PROMPT),
            UserMsg("Text: I love this product. It's the best thing ever!")
        )
This is still Python, not Java.


We've been using a variation like this to great effect:

    TYPES: [string, int, bool, float, uuid, datetime, email, url]
    RELATIONS: [belongs-to, has-one, has-many, many-to-many]

    SCHEMA: """
    {
      "Datamodel": {
        "Name": "<application-name>",
        "Models": {
          "<model-name>": {
            "<field-name>": "<field-type>",
            "<field-name>": "<field-type>",
            "$relations": {
              "<relation-name>": {
                "name": "<relation-name>",
                "type": "<relation-type>",
                "model": "<model-name>"
              },
              "<relation-name>": {
                ...
              }
            }
          },
          "<model-name>": {
            ...
          }
        }
      }
    }
    """
To let users write things like this: https://twitter.com/verdverm/status/1652504163635347456


Python newbie here, why did messages change from [ to (?


Doesn't matter much in this particular case: one is a tuple, the other is a list, and both can be read with a for loop.

The important part is that the way the message types are defined is very heavy, and you can get them to be lighter without sacrificing type safety.


Typed ChatGPT.

Someone had to come along and spoil the fun.

Next it'll be ChatGPT/XML.


Some folks are more comfortable swaddled in well-defined boxes.


Idiot question, I'd like to register and pay for the ChatGPT API but can't find the link to get an API token :-)


You can get API keys from the page https://platform.openai.com/account/api-keys (which you can open by clicking on your avatar and selecting "View API keys").


Thanks!


If you already have an account with them: https://platform.openai.com/account/api-keys.


Thanks!


What about few-shot learning, e.g. injecting 1 or 2 examples of JSON in the prompt? That should be fine as well.


I literally tried that yesterday. You will be introducing bias. It works well until you infer with something very similar to one of the examples you used in the prompt but not exactly the same; then it will always return the JSON you used in your example.


The dirtyjson library has been doing this work for me quite well.



