Demo = impressed. How's SeekStorm's prowess in mid-cap enterprise? How hairy is ...

wolfgarbe · 2024-12-02T18:55:51 1733165751

Yes, integration in complex legacy systems is always challenging. As a small startup, we are concentrating on core search technology to make search faster and to make the most of available server infrastructure. As SeekStorm is open-source, system integrators can take it from there.

fiedzia · 2024-12-02T19:05:30 1733166330

Same as any other full-text search solution - it's your job to integrate it.

m348e912 · 2024-12-03T01:00:33 1733187633

>Demo = impressed.

How did you demo? Did you spin up your own instance and index the Wikipedia corpus like the docs suggest? I'd like to just give it a whirl on an already running instance.

Never mind, found that someone posted a link already.

jazzyjackson · 2024-12-02T20:06:54 1733170014

On that topic, can anybody chime in on state-of-the-art PDF OCR? Even if that's a multimodal LLM. I've used ChatGPT to extract tabular data from images, but I need something I can self-host for proprietary data.

CharlieDigital · 2024-12-03T00:14:43 1733184883

Azure Document Intelligence (especially with the layout model[0]) is really good. It has both JSON and Markdown output modes and does a pretty solid job identifying headers, sections, tables, etc.

What's interesting is that they have a self-deployable container model[1] that only phones home for billing, so you can self-host the runtime and model.

[0] https://learn.microsoft.com/en-us/azure/ai-services/document...

[1] https://learn.microsoft.com/en-us/azure/ai-services/document...

jazzyjackson · 2024-12-03T02:43:20 1733193800

Peculiar, thanks!