More

constantinum · 2025-01-30T02:21:00 1738203660

For instace Llamaparse(https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse...)uses LLMs for pdf text extraction, but the problem is hallucination. e.g > https://github.com/run-llama/llama_parse/issues/420

There is also LLMWhisperer that preserves the layout(tables, checkboxes, forms)and hence the context. https://pg.llmwhisperer.unstract.com/

cpursley · 2025-01-30T12:38:05 1738240685

Is this open source? Is it slow Python? That's where I'm stuck.

constantinum · 2025-01-31T18:28:33 1738348113

This is not open-source. It has high accuracy and it is faster too. All you need is to point your documents to the API.

constantinum · 2025-01-10T19:06:42 1736536002

Non-fiction as audiobooks

constantinum · 2024-12-18T02:37:38 1734489458

> The "best" models just made stuff up to meet the requirements. They lied in three ways:

> The main difficulty of the is project lies in correctly identifying page zones; wouldn't it be possible to properly find the zones during the OCR phase itself instead of rebuilding them afterwards?

Anyone curious, try LLMWhisperer[1] for OCR. It doesn't use LLMs, so no hallucination side effects. It also preserves the layout of the input document for more context and clarity.

[1] https://unstract.com/llmwhisperer/

Examples of extracting complex layout:

https://imgur.com/a/YQMkLpA

https://imgur.com/a/NlZOrtX

https://imgur.com/a/htIm6cf

bambax · 2024-12-18T12:47:06 1734526026

Looks interesting, but the cost is prohibitive for a hobby project. Also, it doesn't really solve my problem.

Google Vision already returns the coordinates of each word (and even of each letter), so it's easy to know where the word was on the page, and even, if necessary, to rebuild the page with the words correctly placed -- that's fundamentally what I do with the mouseover on the interactive demo: https://divers.medusis.net/boislisle/pub (at the paragraph level).

But my problem isn't to know where the words are (Google Vision provides that); it's to know what belongs to what, what is footnotes, what is main text, etc. This is what the post discusses. Just having the text following the same layout as in the original wouldn't help, because I'm not trying to reproduce the layout or the typesetting, I want to rebuild the content semantically, so as to do different "flows".

That said, it got me thinking... there may be an opportunity to do a cheaper version of LLMwhisperer? ;-)

constantinum · 2024-12-14T17:01:13 1734195673

Answering an important question: “Why is it difficult to extract meaningful text from PDFs?” https://unstract.com/blog/pdf-hell-and-practical-rag-applica...

constantinum · 2024-12-13T20:13:14 1734120794

I will try it with some complex layout PDFs or documents with tables. These documents have real business use cases for automation — insurance, banking, etc.

Anyone here who wants to convert PDF documents or scanned images as it is preserving the layout, do try LLMWhisperer - https://unstract.com/llmwhisperer/

constantinum · 2024-12-13T02:43:11 1734057791

The chapter where there is a comparison of techniques for structured data extraction is insightful.[1] Does anyone wants to explore more on the structured data extraction techniques, do refer to this piece [2]

[1] https://www.souzatharsis.com/tamingLLMs/notebooks/structured...

[2] https://unstract.com/blog/comparing-approaches-for-using-llm...

constantinum · 2024-12-09T17:38:04 1733765884

Whatsapp for friends and family

Slack for work

Telegram for apartment community

email for very very close friends

iMessage for those using ios

constantinum · 2024-12-09T17:34:15 1733765655

https://eagle.cool/ - image curation app

Raycast

Notability

nickster · 2024-12-10T02:40:06 1733798406

Eagle.cool is a cool application! I can think of a lot of things you could track beyond image assets. I have a few friends that work in the prop industry and they have always talked about wanting a way to track props across productions.

constantinum · 2024-12-07T17:45:57 1733593557

Ahrefs or SEMRush Toss a coin and choose between the two.

constantinum · 2024-11-22T15:37:09 1732289829

GA4 is like a free trial offering to upscale to Looker and google cloud products(paid)