nutlope's comments

nutlope · 2024-11-16T07:20:22 1731741622

Thank you!

nutlope · 2024-11-16T07:20:09 1731741609

Should be up, please try again!

mkl · 2024-11-16T10:15:46 1731752146

It let me upload a file, but didn't produce any output.

nutlope · 2024-11-16T07:16:12 1731741372

Hi all, I'm the author of llama-ocr. Thank you for sharing & for the kind comments! I built this earlier this week since I wanted a simple API to do OCR – it uses llama 3.2 vision (hosted on together.ai, where i work) to parse images into structured markdown. I also have it available as an npm package.

Planning to add a bunch of other features like the ability to parse PDFs, output a response in JSON, ect... If anyone has any questions, feel free to send them and I'll try to respond!

nh2 · 2024-11-16T09:00:21 1731747621

I put in a bill that has 3 identical line items and it didn't include them as 3 bullet points as usual, but generated a table with a "quantity" column that doesn't exist on the original paper.

Is this amount of larger transformation expected/desirable?

(It also means that the output is sometimes a bullet point list, sometimes a table, making further automatic processing a bit harder.)

zainia · 2024-11-16T15:18:27 1731770307

Here's the prompt being used, tweaking that might help: https://github.com/Nutlope/llama-ocr/blob/main/src/index.ts#...

rch · 2024-11-16T21:18:28 1731791908

I've had trouble with pulling scientific content out of poster PDFs, mostly because e.g. nougat falls apart with different layouts.

Have you considered that usage yet?

gcr · 2024-11-16T19:00:57 1731783657

How accurate is this?

When compared with existing OCR systems, what sorts of mistakes does it make?

Szpadel · 2024-11-16T11:55:26 1731758126

> Need an example image? Try ours. Great idea, I wish more services would have similar feature

Curiositry · 2024-11-16T08:20:23 1731745223

Option to use a local LLM?

Eisenstein · 2024-11-16T09:47:40 1731750460

I made a script which does exactly the same thing but locally using koboldcpp for inference. It downloads MiniCPM-V 2.6 with image projector the first time you run it. If you want to use a different model you can, but you will want to edit the instruct template to match.

* https://github.com/jabberjabberjabber/LLMOCR

nirav72 · 2024-11-16T10:08:14 1731751694

MiniCPM-v 2.6 is probably the best self-hosted vision model I have used so far. Not just for OCR, but also image analysis. I have it setup, so my NVR (frigate) sends couple of images upon motion alert from a driveway security camera to Ollama with minicpm-v 2.6. I’m able to get a reasonably accurate description of the vehicle that pulled into the driveway. Including describing the person that exits the vehicle and also the license plate. All sent to my phone.

nutlope · on May 4, 2023

Hey! Have you tried out Edge Streaming yet? It uses the Edge Runtime which is a fraction of the cost of serverless functions and lets you stream responses for much longer than 10 seconds, giving you the "chatting" effect that you see on ChatGPT.

Docs: http://vercel.fyi/streaming Example: https://vercel.com/blog/gpt-3-app-next-js-vercel-edge-functi...

shahahmed · on May 4, 2023

I have not! thanks for letting me know, I'll give it a try.

nutlope · on Sept 2, 2022

It's a conference registration site that involves a series of challenges involving a wordle and a multiplayer experience with a prism built with Three.js