Hacker News new | past | comments | ask | show | jobs | submit login

You can do some really cool things now with these models, like ask them to extract not just the text but figures/graphs as nodes/edges and it works very well. Back when GPT-4 with vision came out I tried this with a simple prompt + dumping in a pydantic schema of what I wanted and it was spot on, pretty much this (before json mode was a supported):

    You are an expert in PDFs. You are helping a user extract text from a PDF.

    Extract the text from the image as a structured json output.

    Extract the data using the following schema:

    {Page.model_json_schema()}

    Example:
    {{
      "title": "Title",
      "page_number": 1,
      "sections": [
        ...
      ],
      "figures": [
        ...
      ]
    }}

https://binal.pub/2023/12/structured-ocr-with-gpt-vision/



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: