Show HN: TrustGraph – Do More with AI with Less (Open Source AI Infrastructure) (github.com/trustgraph-ai)
22 points by cybermaggedon 7 days ago | 5 comments
Hi HN,

We’re Daniel and Mark, the creators of TrustGraph (https://github.com/trustgraph-ai/trustgraph). TrustGraph is open source, full end-to-end AI infrastructure that automates knowledge graph construction and querying, along with modular agent integration. A unique aspect of TrustGraph is that graph construction is a one-time process that produces reusable knowledge cores which can be stored, shared, and reloaded. You can read more about TrustGraph knowledge cores here (https://trustgraph.ai/docs/cores/).
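To make that concrete, here is a rough conceptual sketch of what a knowledge core bundles together. The names and layout are hypothetical, for illustration only, not TrustGraph's actual API:

    # Conceptual sketch only -- hypothetical names, not TrustGraph's actual API.
    # A knowledge core bundles extracted graph edges with their mapped vector
    # embeddings so the whole artifact can be saved, shared, and reloaded.
    import json
    from dataclasses import asdict, dataclass

    @dataclass
    class Triple:
        subject: str    # RDF subject URI
        predicate: str  # RDF predicate URI
        obj: str        # RDF object (URI or literal)

    @dataclass
    class KnowledgeCore:
        triples: list[Triple]               # the extracted graph edges
        embeddings: dict[str, list[float]]  # chunk/entity id -> vector

        def save(self, path: str) -> None:
            # Serialize so the core can be stored, shared, and reloaded later.
            with open(path, "w") as f:
                json.dump({"triples": [asdict(t) for t in self.triples],
                           "embeddings": self.embeddings}, f)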

Throughout our careers, we’ve been faced with huge datasets of unstructured knowledge: documents running to thousands, even tens of thousands, of pages, with related facts separated by thousands of pages. Knowledge graphs and AI offer a way to convert that unstructured text into an enriched knowledge structure, enabling AI to extract accurate intelligence from it.

Unlike AI frameworks, TrustGraph is your infrastructure. All the services, stores, observability, and backbone gets deployed in a single package. Once deployed, TrustGraph enables you to:

- Ingest PDF, TXT, and MD files in batches
- Chunk ingested docs with multiple chunking options and parameters
- Structure the knowledge in each chunk as RDF (see the sketch after this list)
- Convert each chunk to mapped vector embeddings
- Store RDF triples in either Cassandra or Neo4j
- Store vector embeddings in Qdrant
- Monitor system performance, like CPU/memory resources and token throughput, with a Grafana dashboard
- Store and load knowledge cores
- Query the graph store and VectorDB for AI generation
- Easy agent integration with the Apache Pulsar backbone
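As a minimal illustration of the "structure each chunk as RDF" step, here is a sketch using the rdflib library. The namespace and facts are invented for the example; TrustGraph itself persists triples in Cassandra or Neo4j rather than in memory:

    # Illustration of structuring extracted knowledge as RDF triples with
    # rdflib. The example namespace and facts are made up.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("ex", EX)

    # Two edges an extractor might pull out of a single text chunk.
    g.add((EX.TrustGraph, RDF.type, EX.SoftwareProject))
    g.add((EX.TrustGraph, EX.storesTriplesIn, EX.Cassandra))
    g.add((EX.TrustGraph, RDFS.label, Literal("TrustGraph")))

    print(g.serialize(format="turtle"))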

TrustGraph is model agnostic and currently supports:

- Anthropic API
- AWS Bedrock API
- AzureAI API
- OpenAI API in Azure
- Cohere API
- Llamafile API
- Ollama API
- OpenAI API
- VertexAI API

TrustGraph is deployed with either Docker or Kubernetes. For knowledge extraction, we’ve seen Claude 3 Haiku and Gemini 1.5 Flash provide the best value. However, once a high-quality knowledge core has been extracted, you can use locally deployed SLMs to get big-LLM performance on your dataset. Our goal is to be able to use TLMs (models of 3B parameters and smaller) for AI generation on knowledge cores.

Trust is the foundation of TrustGraph’s goals. TrustGraph aims to enable “accuracy first” AI generation through a fully transparent, open source approach. Real-time observability will enable doing more with AI with less: less compute, less memory, and less power. We also have a vision for community-driven “open knowledge cores” that contain common terms, definitions, and information to aid generation for niche use cases.

We hope you join us on this journey of doing more with less.

Daniel and Mark

Give us a try: https://github.com/trustgraph-ai/trustgraph
Full Documentation: https://trustgraph.ai/docs/getstarted
Blog: https://blog.trustgraph.ai
Join the Community: https://discord.gg/sQMwkRz5GX






Very interesting project! What’s different about how this builds knowledge graphs from other projects?

One of the key differences is that the graph edges are stored directly in scalable graph stores: either Cassandra or Neo4j. The graph edges are also structured as RDF (https://www.w3.org/TR/rdf12-schema/). In TrustGraph version 0.11.20, the extraction process follows the pattern many projects use: finding entities (both conceptual entities and people, places, things, etc.) and the relationships between them. Upcoming releases will continue to evolve this process to make the extracted knowledge graph much more granular, with a particular focus on tracking the source of each extracted graph edge.
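As a rough illustration of what querying RDF-structured edges looks like, here is a minimal sketch using rdflib's in-memory store as a stand-in for the Cassandra/Neo4j backends; the example data is invented:

    # Minimal sketch of querying RDF graph edges with SPARQL. rdflib's
    # in-memory store stands in for TrustGraph's Cassandra/Neo4j backends.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((EX.TrustGraph, EX.storesTriplesIn, EX.Cassandra))
    g.add((EX.TrustGraph, EX.storesTriplesIn, EX.Neo4j))

    # Find every store TrustGraph can write triples to.
    q = """
        PREFIX ex: <http://example.org/>
        SELECT ?store WHERE { ex:TrustGraph ex:storesTriplesIn ?store }
    """
    for row in g.query(q):
        print(row.store)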

Daniel


Do you use naive chunking? Have you tried something else?

The biggest issue I have with semantic chunking is that it requires an LLM to help create the breakpoints. That's a pretty big cost and latency penalty, for potentially no benefit. That being said, we've seen chunk size have a huge impact on naive extraction to the graph. With recursive character chunking, we saw huge gains going from 1000-character chunks down to 500, even with long-context LLMs. However, once we got out to 2000-4000 character chunks, there didn't appear to be much difference. If you're looking to extract maximum detail from a text corpus, ultra-small chunking seems likely to be beneficial.

That being said, with ultra-small chunking there's a lot of redundancy in the extracted graph edges. These are some of the problems we're trying to solve with the TrustGraph extraction processes.
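For reference, here is a sketch of recursive character chunking at the sizes compared above, using the LangChain splitter purely as an illustration; it's not necessarily the splitter TrustGraph uses internally:

    # Recursive character chunking at several chunk sizes, using LangChain's
    # splitter as an illustration (not necessarily TrustGraph's internals).
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    text = open("document.txt").read()  # any large unstructured document

    for chunk_size in (500, 1000, 2000, 4000):
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,  # characters per chunk
            chunk_overlap=50,       # small overlap so facts aren't cut in half
        )
        chunks = splitter.split_text(text)
        print(f"{chunk_size}-char chunks: {len(chunks)}")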

Daniel


We've got a recursive chunker and a token chunker integrated currently, both configurable, and can add others. Daniel wrote a great blog post about trying different chunking parameters: blog.trustgraph.ai/p/dark-art-of-chunking



