Hey everyone. As AI gets better and more multimodal, I believe one of the most common use cases will be extracting structured data from unstructured files: things like shipping labels, bank statements, invoices, patents, etc.
I plan to release workflows soon which will simply take any file via email or form and save the structured content to a spreadsheet/CSV or a new PDF.
Let me know if you would be interested in trying the workflows and if you have a use case to extract/organize different files.
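For anyone wondering what the "save to a spreadsheet/CSV" end of a workflow like this might look like, here is a minimal sketch. It assumes the AI extraction step has already turned each file into a flat dict of fields. The field names and the `records_to_csv` helper are hypothetical, not part of the announced workflows.

```python
import csv
import io


def records_to_csv(records: list[dict[str, str]]) -> str:
    """Serialize extracted records (one dict per file) to CSV text.

    Columns are the union of all keys in first-seen order, so files that
    yield different fields (e.g. a shipping label vs. an invoice) still fit
    in one sheet; fields a file doesn't have are left blank.
    """
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical output of the extraction step for two uploaded files.
extracted = [
    {"source": "label.pdf", "tracking_number": "9400100000000000000000", "carrier": "USPS"},
    {"source": "invoice.pdf", "invoice_number": "INV-001", "total": "120.00"},
]
print(records_to_csv(extracted))
```

Taking the union of columns keeps one sheet per inbox/form; splitting into one sheet per document type would be the other obvious choice.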
I got one. Say I gave it a corpus of structured[1] files that follow Schema X, then gave it a pile of outputs (PDF, HTML) generated from that corpus, where StructuredFileName.xml corresponds to StructuredFileName.pdf. Could you see this doohickey taking in just the PDF/HTML/Word output and spitting out its best guess at chucking that into a Schema X file?
Pretty much everyone I work with is an XML fetishist who adores hard-coded ontologies and taxonomies forged with many years of blood and sweat. I'm a bit more pragmatic and technology-minded. Even before AI, I was pretty sure that using Python ML to generate a graph of keywords was a hell of a lot more useful than a handcrafted ontology, and it doesn't cost hundreds of thousands of dollars in billable hours either. Now, with this stuff, we can get around hard-coding all that structure ourselves, and maybe have source documents that normal people can read without about five zeroes' worth of bespoke tools.
[1] And when I say "structured" I mean *completely frickin bananas*.
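Since StructuredFileName.xml lines up with StructuredFileName.pdf by name, a first step could be pairing each Schema X source with its rendered outputs. Those pairs could then serve as worked examples for the model or as a test set to score its guesses. This is only a sketch under that assumption; `pair_by_stem` is a hypothetical helper and the paths are made up.

```python
from pathlib import Path


def pair_by_stem(paths: list[str]) -> dict[str, dict[str, str]]:
    """Group files that share a base name, e.g. Foo.xml with Foo.pdf / Foo.html.

    Returns {stem: {extension: original_path}}, keeping only stems that have
    the XML source plus at least one rendered output.
    """
    groups: dict[str, dict[str, str]] = {}
    for raw in paths:
        p = Path(raw)
        ext = p.suffix.lower().lstrip(".")
        groups.setdefault(p.stem, {})[ext] = raw
    return {
        stem: files
        for stem, files in groups.items()
        if "xml" in files and len(files) > 1
    }


# Hypothetical corpus: Bar has no rendered output, Baz has no XML source.
files = ["corpus/Foo.xml", "out/Foo.pdf", "out/Foo.html", "corpus/Bar.xml", "out/Baz.pdf"]
print(pair_by_stem(files))
```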
Great! Now do this for commercial notifications sent to my email: things like bank transfers, USPS deliveries, shopping delivery notifications, and food delivery.
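A rough sketch of the sorting part, assuming raw messages are available as text (e.g. exported .eml files). It uses Python's standard `email` module to read the headers, then a plain keyword lookup as a stand-in for the AI step. The category names and keywords are hypothetical; a real workflow would more likely hand the body to a model and extract amounts, tracking numbers, etc.

```python
import email
from email import policy

# Hypothetical keyword rules, checked in order; first match wins.
# Food comes before shopping so "your order" from a food app isn't misfiled.
CATEGORIES = {
    "bank_transfer": ("transfer", "deposit", "zelle"),
    "usps_delivery": ("usps",),
    "food_delivery": ("doordash", "uber eats", "grubhub"),
    "shopping_delivery": ("your order", "shipped", "out for delivery"),
}


def classify_notification(raw: str) -> dict[str, str]:
    """Parse a raw email and tag it with a notification category."""
    msg = email.message_from_string(raw, policy=policy.default)
    subject = str(msg["Subject"] or "")
    sender = str(msg["From"] or "")
    text = f"{sender} {subject}".lower()
    category = next(
        (name for name, words in CATEGORIES.items() if any(w in text for w in words)),
        "other",
    )
    return {"from": sender, "subject": subject, "category": category}


usps = "From: USPS <auto-reply@usps.com>\nSubject: Your package is out for delivery\n\nTracking info\n"
print(classify_notification(usps))
```

The resulting dicts could then go straight into the same kind of CSV sheet as the file uploads.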