Show HN: TrustGraph – Do More with AI with Less (Open Source AI Infrastructure) (github.com/trustgraph-ai)
22 points by cybermaggedon 7 days ago | 5 comments
Hi HN,

We’re Daniel and Mark, the creators of TrustGraph (https://github.com/trustgraph-ai/trustgraph). TrustGraph is open source, full end-to-end AI infrastructure that automates knowledge graph construction and querying, along with modular agent integration. A unique aspect of TrustGraph is that graph construction is a one-time process that produces reusable knowledge cores which can be stored, shared, and reloaded. You can read more about TrustGraph knowledge cores here (https://trustgraph.ai/docs/cores/).
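To make that concrete, here is a rough conceptual sketch of what a knowledge core bundles together. The names and layout are hypothetical, for illustration only, not TrustGraph's actual API:

    # Conceptual sketch only -- hypothetical names, not TrustGraph's actual API.
    # A knowledge core bundles extracted graph edges with their mapped vector
    # embeddings so the whole artifact can be saved, shared, and reloaded.
    import json
    from dataclasses import asdict, dataclass

    @dataclass
    class Triple:
        subject: str    # RDF subject URI
        predicate: str  # RDF predicate URI
        obj: str        # RDF object (URI or literal)

    @dataclass
    class KnowledgeCore:
        triples: list[Triple]               # the extracted graph edges
        embeddings: dict[str, list[float]]  # chunk/entity id -> vector

        def save(self, path: str) -> None:
            # Serialize so the core can be stored, shared, and reloaded later.
            with open(path, "w") as f:
                json.dump({"triples": [asdict(t) for t in self.triples],
                           "embeddings": self.embeddings}, f)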

Throughout our careers, we’ve been faced with huge datasets of unstructured knowledge: documents running to thousands, even tens of thousands, of pages, with related facts separated by thousands of pages. Knowledge graphs and AI offer a way to convert that unstructured text into an enriched knowledge structure, enabling AI to extract accurate intelligence from it.

Unlike AI frameworks, TrustGraph is your infrastructure. All the services, stores, observability, and backbone gets deployed in a single package. Once deployed, TrustGraph enables you to:

- Ingest PDF, TXT, and MD files in batches
- Chunk ingested docs with multiple chunking options and parameters
- Structure the knowledge in each chunk as RDF (see the sketch after this list)
- Convert each chunk to mapped vector embeddings
- Store RDF triples in either Cassandra or Neo4j
- Store vector embeddings in Qdrant
- Monitor system performance, like CPU/memory resources and token throughput, with a Grafana dashboard
- Store and load knowledge cores
- Query the graph store and VectorDB for AI generation
- Easy agent integration with the Apache Pulsar backbone
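As a minimal illustration of the "structure each chunk as RDF" step, here is a sketch using the rdflib library. The namespace and facts are invented for the example; TrustGraph itself persists triples in Cassandra or Neo4j rather than in memory:

    # Illustration of structuring extracted knowledge as RDF triples with
    # rdflib. The example namespace and facts are made up.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("ex", EX)

    # Two edges an extractor might pull out of a single text chunk.
    g.add((EX.TrustGraph, RDF.type, EX.SoftwareProject))
    g.add((EX.TrustGraph, EX.storesTriplesIn, EX.Cassandra))
    g.add((EX.TrustGraph, RDFS.label, Literal("TrustGraph")))

    print(g.serialize(format="turtle"))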

TrustGraph is model agnostic and currently supports:

- Anthropic API
- AWS Bedrock API
- AzureAI API
- OpenAI API in Azure
- Cohere API
- Llamafile API
- Ollama API
- OpenAI API
- VertexAI API

TrustGraph is deployed with either Docker or Kubernetes. For knowledge extraction, we’ve seen Claude 3 Haiku and Gemini 1.5 Flash provide the best value. However, once a high-quality knowledge core has been extracted, you can use locally deployed SLMs to get big-LLM performance on your dataset. Our goal is to be able to use TLMs (models of 3B parameters and smaller) for AI generation on knowledge cores.

Trust is the foundation of TrustGraph’s goals. TrustGraph aims to enable “accuracy first” AI generation through a fully transparent, open source approach. Real-time observability will enable doing more with AI with less: less compute, less memory, and less power. We also have a vision for community-driven “open knowledge cores” that contain common terms, definitions, and information to aid generation for niche use cases.

We hope you join us on this journey of doing more with less.

Daniel and Mark

Give us a try: https://github.com/trustgraph-ai/trustgraph
Full Documentation: https://trustgraph.ai/docs/getstarted
Blog: https://blog.trustgraph.ai
Join the Community: https://discord.gg/sQMwkRz5GX






Very interesting project! What’s different about how this builds knowledge graphs from other projects?

One of the key differences is that the graph edges are stored directly in scalable graph stores: either Cassandra or Neo4j. The graph edges are also structured as RDF (https://www.w3.org/TR/rdf12-schema/). In TrustGraph version 0.11.20, the extraction process follows the pattern many projects use: finding entities (both conceptual entities and people, places, things, etc.) and the relationships between them. Upcoming releases will continue to evolve this process to make the extracted knowledge graph much more granular, with a particular focus on tracking the source of each extracted graph edge.
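As a rough illustration of what querying RDF-structured edges looks like, here is a minimal sketch using rdflib's in-memory store as a stand-in for the Cassandra/Neo4j backends; the example data is invented:

    # Minimal sketch of querying RDF graph edges with SPARQL. rdflib's
    # in-memory store stands in for TrustGraph's Cassandra/Neo4j backends.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")

    g = Graph()
    g.add((EX.TrustGraph, EX.storesTriplesIn, EX.Cassandra))
    g.add((EX.TrustGraph, EX.storesTriplesIn, EX.Neo4j))

    # Find every store TrustGraph can write triples to.
    q = """
        PREFIX ex: <http://example.org/>
        SELECT ?store WHERE { ex:TrustGraph ex:storesTriplesIn ?store }
    """
    for row in g.query(q):
        print(row.store)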

Daniel


Do you use naive chunking? Have you tried something else?

The biggest issue I have with semantic chunking is that it requires an LLM to help create the breakpoints. That's a pretty big cost and latency penalty, for potentially no benefit. That being said, we've seen chunk size have a huge impact on naive extraction to the graph. With recursive character chunking, we saw huge gains going from 1000-character chunks down to 500, even with long-context LLMs. However, once we got out to 2000-4000 character chunks, there didn't appear to be much difference. If you're looking to extract maximum detail from a text corpus, ultra-small chunking seems likely to be beneficial.

That being said, with ultra-small chunking there's a lot of redundancy in the extracted graph edges. These are some of the problems we're trying to solve with the TrustGraph extraction processes.
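For reference, here is a sketch of recursive character chunking at the sizes compared above, using the LangChain splitter purely as an illustration; it's not necessarily the splitter TrustGraph uses internally:

    # Recursive character chunking at several chunk sizes, using LangChain's
    # splitter as an illustration (not necessarily TrustGraph's internals).
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    text = open("document.txt").read()  # any large unstructured document

    for chunk_size in (500, 1000, 2000, 4000):
        splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,  # characters per chunk
            chunk_overlap=50,       # small overlap so facts aren't cut in half
        )
        chunks = splitter.split_text(text)
        print(f"{chunk_size}-char chunks: {len(chunks)}")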

Daniel


We've got a recursive chunker and a token chunker integrated currently, both configurable, and can add others. Daniel wrote a great blog post about trying different chunking parameters: blog.trustgraph.ai/p/dark-art-of-chunking



