Hi HN,
We've just open-sourced Knowledge Table, a tool designed to simplify extracting and exploring structured data from unstructured documents.
We built it internally to help with the structured data problems we work on, and kept getting questions about it whenever we used it in demos. Much of the motivation came from our frustration with zero-shot multi-document retrieval, and from the need to give non-technical domain experts an interface they're already familiar with. We found that inserting a tabular intermediary step for structured data construction into a backend RAG system improved accuracy and consistency, and that the resulting table was immediately explainable (and critiquable) by domain experts.
It's built with React and FastAPI, all Dockerized. We're going to keep contributing to it, mostly integrations plus some feature extensions informed by what we learn from customer work. Future work is outlined in the README and roadmap.
Key features:
- natural language interface for data extraction
- customizable extraction rules
- chunk linking for data traceability
- spreadsheet-like interface for familiarity
- extensible backend for developers
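
To make the "tabular intermediary step" and chunk linking a bit more concrete, here is a minimal Python sketch of the idea. This is not the Knowledge Table API: `Chunk`, `Cell`, `extract_cell`, and `build_table` are hypothetical names, the sample document is made up, and a plain keyword match stands in for the LLM extraction call. The point is only that every cell keeps references to the chunks it came from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    """A piece of a source document (hypothetical structure)."""
    doc_id: str
    chunk_id: int
    text: str


@dataclass
class Cell:
    """One table cell: the extracted value plus the chunks that support it."""
    value: Optional[str]
    source_chunks: List[Chunk] = field(default_factory=list)


def extract_cell(chunks: List[Chunk], keyword: str) -> Cell:
    # Stand-in for an LLM call: keep chunks that mention the keyword.
    hits = [c for c in chunks if keyword.lower() in c.text.lower()]
    if not hits:
        return Cell(value=None)
    # Use the first matching chunk as the value, but link all matches.
    return Cell(value=hits[0].text, source_chunks=hits)


def build_table(
    documents: Dict[str, List[Chunk]], columns: Dict[str, str]
) -> Dict[str, Dict[str, Cell]]:
    # Rows are documents, columns are extraction rules (here: keywords).
    return {
        doc_id: {name: extract_cell(chunks, kw) for name, kw in columns.items()}
        for doc_id, chunks in documents.items()
    }


# Made-up example data.
documents = {
    "contract_a": [
        Chunk("contract_a", 0, "The term is 24 months."),
        Chunk("contract_a", 1, "Governing law: Delaware."),
    ]
}
columns = {"term": "term", "law": "governing law", "price": "price"}

table = build_table(documents, columns)
for name, cell in table["contract_a"].items():
    refs = [c.chunk_id for c in cell.source_chunks]
    print(name, "|", cell.value, "| chunks:", refs)
```

A downstream RAG step can then query the table rather than raw text, and a domain expert can follow any cell back to its source chunks to check it.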
This is intended to be a community project, and we're looking for feedback and contributions, but we'll be working on it regardless as customers use it. We're particularly interested in hearing about potential use cases, feature ideas, and code contributions. Suggestions and critiques welcome.
GitHub: https://github.com/whyhow-ai/knowledge-table
Demo: https://knowledge-table-demo.whyhow.ai/