LLaVA 1.6, InternVL, and CogVLM2 can all do OCR with nothing but tiled image embeddings and an LLM. Feeding in OCR results from Tesseract improves the reliability of the transcript, especially for long strings of random characters, but it isn't strictly necessary for the model to read the text out of the image.
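If it helps to see the idea in code, here is a minimal sketch of that workflow: run Tesseract via pytesseract and hand the rough transcript to the model alongside the image. The checkpoint name, prompt template, and file name are my assumptions (a LLaVA 1.6 build on Hugging Face), not anything from these projects' docs, so swap them for whatever you actually run.

```python
# Sketch: combine a classical OCR pass with a VLM that reads the image itself.
# Assumes the llava-hf/llava-v1.6-mistral-7b-hf checkpoint and its [INST] prompt format.
import pytesseract
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

image = Image.open("scan.png")

# Rough transcript from Tesseract; often noisy on long random strings.
ocr_text = pytesseract.image_to_string(image)

processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-mistral-7b-hf")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf", device_map="auto"
)

# Give the model both the image and the noisy OCR output to reconcile.
prompt = (
    "[INST] <image>\nTranscribe all text in this image. "
    f"A rough OCR pass produced:\n{ocr_text}\n"
    "Correct any OCR errors using what you see in the image. [/INST]"
)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(output[0], skip_special_tokens=True))
```

Dropping the OCR hint from the prompt gives you the "image embeddings only" behavior described above; adding it mostly helps on strings the model can't guess from context.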
CLIP embeddings can absolutely “read” text if the text is large enough. Tiling enables the model to read small text.
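To make the tiling point concrete, here is a rough illustration of tile preparation (my own sketch, not any model's internal code): the page is cut into overlapping crops so small text occupies a larger fraction of each CLIP-style input. The tile size and overlap below are arbitrary values, and real models pick their grids differently.

```python
# Sketch: split a page into overlapping tiles so small text becomes "large"
# relative to each crop fed to the vision encoder.
from PIL import Image

def tile_image(path, tile=672, overlap=64):
    img = Image.open(path)
    w, h = img.size
    step = tile - overlap
    tiles = []
    for top in range(0, max(h - overlap, 1), step):
        for left in range(0, max(w - overlap, 1), step):
            box = (left, top, min(left + tile, w), min(top + tile, h))
            tiles.append(img.crop(box))
    return tiles

# Each tile (plus a downscaled overview image) would then be embedded
# separately and the embeddings concatenated before reaching the LLM.
tiles = tile_image("page.png")
print(f"{len(tiles)} tiles")
```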
Do you know of any guides or tutorials for doing this? I tried using the MiniCPM model for this task, but it only OCRed a tiny bit of information and then told me it couldn't extract the rest.