You could achieve this with a single graph. Graphiti has a "message" EpisodeType that expects transcripts in a "<user>: <content>" format. When using this EpisodeType, Graphiti pays careful attention to "users," creating nodes for them and maintaining "fact" context for each user subgraph.
"Facts" shared across all users will also be updated universally. Alongside Graphiti's search, you'd be able to use cypher to query Neo4j to, for example, find hub nodes (aka highly-connected nodes), identifying common beliefs.
I see that you mention Microsoft’s GraphRAG. My understanding is that a key part of their approach is hierarchical agglomeration of graph clusters, which lets them answer broad questions from the graph. Is that in the works?
Yes, that's in the works and a high priority for us. The major internal discussion around implementing this feature has centered on the retrieval portion. We want to provide flexible search strategies that return a variety of information, while keeping search ergonomic enough to be usable and understandable. We also want to update our retrieval approach at the same time as we add community summaries, so that it's easy to make use of this additional information.
Our implementation will likely add community nodes that contain a summary of the nodes in each community. Do you have any perspective or opinions on the best way to implement GraphRAG-style summarization?
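As a toy illustration of the shape this could take: group nodes into communities, then attach a "community node" summarizing its members. Here plain connected components stand in for a real community-detection algorithm (GraphRAG uses Leiden), and string concatenation stands in for an LLM-generated summary; all names are illustrative.

```python
# Toy sketch of community summary nodes. Connected components stand in
# for a real community-detection algorithm; joining member summaries
# stands in for LLM summarization.
from collections import defaultdict

def connected_components(edges: list[tuple[str, str]]) -> list[set[str]]:
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components

def community_nodes(edges, node_summaries):
    """Build one summary 'community node' per detected community."""
    return [
        {"members": sorted(comp),
         "summary": " ".join(node_summaries[n] for n in sorted(comp))}
        for comp in connected_components(edges)
    ]
```

Hierarchical agglomeration would then repeat this over the community nodes themselves, producing coarser summaries at each level.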
Hey HN! We're Paul, Preston, and Daniel from Zep. We've just open-sourced Graphiti, a Python library for building temporal Knowledge Graphs using LLMs.
Graphiti helps you create and query graphs that evolve over time. Knowledge Graphs have been explored extensively for information retrieval. What makes Graphiti unique is its ability to build a knowledge graph while handling changing relationships and maintaining historical context.
At Zep, we build a memory layer for LLM applications. Developers use Zep to recall relevant user information from past conversations without including the entire chat history in a prompt. Accurate context is crucial for LLM applications. If an AI agent doesn't remember that you've changed jobs or confuses the chronology of events, its responses can be jarring or irrelevant, or worse, inaccurate.
## Zep’s Suboptimal Fact Pipeline
Before Graphiti, our approach to storing and retrieving user “memory” was, in effect, a specialized RAG pipeline. An LLM extracted “facts” from a user’s chat history. Semantic search, reranking, and other techniques then surfaced facts relevant to the current conversation back to a developer for inclusion in their prompt.
We attempted to reconcile how new information may change our understanding of existing facts:
Fact: “Kendra loves Adidas shoes”
User message: “I’m so angry! My favorite Adidas shoes fell apart! Puma’s are my new favorite shoes!”
Facts:
- “Kendra used to love Adidas shoes but now prefers Puma.”
- “Kendra’s Adidas shoes fell apart.”
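A minimal sketch of the temporal bookkeeping that makes this kind of reconciliation tractable in a graph: rather than rewriting the old fact, a new fact invalidates it, and both survive with lifecycle timestamps. The class and function names here are illustrative, not Zep's actual schema.

```python
# Hypothetical sketch: facts carry a validity window instead of being
# overwritten. Superseding a fact closes its window and opens a new one.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    text: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None = still considered true

def supersede(old: Fact, new_text: str, at: datetime) -> Fact:
    """Close the old fact's validity window and open a new fact."""
    old.invalid_at = at
    return Fact(text=new_text, valid_at=at)

loves = Fact("Kendra loves Adidas shoes", datetime(2024, 1, 1))
prefers = supersede(loves, "Kendra prefers Puma shoes", datetime(2024, 6, 1))
```

Because the invalidated fact is retained, a query like "what did Kendra like in March?" can still be answered from its validity window.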
Unfortunately, this approach became problematic. Reconciling facts from increasingly complex conversations challenged even frontier LLMs such as gpt-4o. We saw incomplete facts, poor recall, and hallucinations. Our RAG search also failed at times to capture the nuanced relationships between facts, leading to irrelevant or contradictory information being retrieved.
We tried fixing these issues with prompt optimization but saw diminishing returns on effort. We realized that a graph would help model a user’s complex world, potentially addressing these challenges.
We were intrigued by Microsoft’s GraphRAG, which expanded on RAG text chunking with a graph to better model a document corpus. However, it didn't solve our core problem: GraphRAG is designed for static documents and doesn't natively handle temporality.
So, we built Graphiti, which is designed from the ground up to handle constantly changing information, hybrid semantic and graph search, and scale:
- Temporal Awareness: Tracks changes in facts and relationships over time. Graph edges include temporal metadata to record relationship lifecycles.
- Episodic Processing: Ingests data as discrete episodes, maintaining data provenance and enabling incremental processing.
- Hybrid Search: Semantic and BM25 full-text search, with the ability to rerank results by distance from a central node.
- Scalable: Designed for large datasets, parallelizing LLM calls for batch processing while preserving event chronology.
- Varied Sources: Ingests both unstructured text and structured data.
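The node-distance reranking mentioned above can be sketched as follows: results are reordered by BFS hop count from a central node (e.g. the user's node), with nearer nodes ranked first. This is a simplified stand-in for Graphiti's actual implementation; the function names and the pure-distance sort key are assumptions.

```python
# Hedged sketch of reranking search results by graph distance from a
# central node. Unreachable nodes sort last; the sort is stable, so
# the original (e.g. semantic) ordering breaks ties.
from collections import deque

def bfs_distances(adj: dict[str, list[str]], center: str) -> dict[str, int]:
    """Hop count from `center` to every reachable node."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def rerank_by_distance(results: list[str], adj, center: str) -> list[str]:
    dist = bfs_distances(adj, center)
    return sorted(results, key=lambda n: dist.get(n, float("inf")))
```

In practice the distance would typically be blended with the semantic or BM25 score rather than replacing it outright.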
Graphiti has significantly improved our ability to maintain accurate user context. It does a far better job of fact reconciliation over long, complex conversations. Node distance reranking, which places a user at the center of the graph, has also been a valuable tool. Quantitative evaluation results may be a future Show HN.
Work is ongoing, including:
1. Improving support for faster and cheaper small language models.
2. Exploring fine-tuning to improve accuracy and reduce latency.
3. Adding new querying capabilities, including search over neighborhood (sub-graph) summaries.
Running Neo4j on reasonable hardware (and depending on graph size), approximately 70% of the latency comes from calling OpenAI's embedding API. We've seen up to 750ms of latency for the third-party API call alone. :-/
As a result, we host embedding models on rented GPUs and get a P90 latency of 10ms.
Something worth noting is that the parent comment refers to using Cursor, not ChatGPT/Claude.ai. The latter are general-purpose chat (and, in the case of ChatGPT, agentic) applications.
Cursor is a purpose-built IDE for software development. The Cursor team has put a lot of research and sweat into providing the underlying LLMs (also from OpenAI/Anthropic) with:
- the right parts of your code
- relevant code/dependency documentation
- and, importantly, the right prompts
so they can successfully complete coding tasks. It's an apples-and-oranges comparison.
We extensively use vLLM's support for Outlines Structured Output with small language models (llama3 8B, for example) in Zep[0][1]. OpenAI's Structured Output is a great improvement on JSON mode, but it is rather primitive compared to vLLM and Outlines.
# Very Limited Field Typing
OpenAI offers a very limited set of types[2] (String, Number, Boolean, Object, Array, Enum, anyOf) without the ability to define patterns and max/min lengths. Outlines supports defining arbitrary RegEx patterns, making extracting currencies, phone numbers, zip codes, comma-separated lists, and more a trivial exercise.
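To illustrate the kinds of constraints at stake, here are simplified regex patterns for the field formats mentioned above, shown with Python's `re` module. These are toy examples rather than production-grade validators; with Outlines, a pattern like this is compiled into a finite-state machine that constrains generation token by token, so the model can only emit matching output.

```python
# Simplified example patterns of the sort Outlines can enforce at
# generation time. These are illustrative, not exhaustive validators.
import re

CURRENCY = r"\$\d{1,3}(,\d{3})*(\.\d{2})?"  # e.g. $1,234.56
US_PHONE = r"\(\d{3}\) \d{3}-\d{4}"         # e.g. (555) 123-4567
ZIP_CODE = r"\d{5}(-\d{4})?"                # e.g. 94107 or 94107-1234

def matches(pattern: str, text: str) -> bool:
    """True if `text` matches `pattern` in its entirety."""
    return re.fullmatch(pattern, text) is not None
```

OpenAI's Structured Output, by contrast, can require that a field exists and is a string, but cannot enforce any of these shapes on the string's contents.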
# High Schema Setup Cost / Latency
vLLM and Outlines offer near-zero-cost schema setup: regex finite-state machine construction is extremely cheap on the first inference call. By contrast, OpenAI's context-free grammar generation carries a significant latency penalty of "under ten seconds to a minute". This may not impact "warmed-up" inference, but it could present issues if schemas are more dynamic in nature.
Right now, this feels like a good first step, focused on ensuring the right fields are present in schema-constrained output. However, it doesn't yet offer functionality to enforce the format of field contents beyond a primitive set of types. It will be interesting to watch where OpenAI takes this.
Zep is building the long-term memory layer for the LLM application stack. With a fast-growing open source community and a recently launched cloud service, we're seeking a developer relations engineer to accelerate adoption among developers and help shape our roadmap.
In this highly visible role, you'll engage with the developer community across multiple channels: writing technical content, delivering presentations, creating sample apps and demos, and fielding inquiries. Working directly with our founder and engineering team, you'll help steer our product strategy.
Zep is funded by Y Combinator, Engineering Capital, and angels such as Guillermo Rauch (Vercel).
We use Ellipsis across most of our repos. PR titling and summaries are great and definitely reduce the cognitive load for both code contributors and reviewers. Reviews are also a good first line of defense for code quality, catching issues human reviewers haven't caught. Though on occasion it can be a little overly pedantic or confused. I guess that's not unlike us humans at times ;-)
They seem to be broken when I try any HF IDs besides those that came preconfigured, e.g. I just tried brucethemoose/Yi-34B-200K-DARE-merge-v5-3.1bpw-exl2-fiction or LoneStriker/shisa-7b-v1-3.0bpw-h6-exl2
"Facts" shared across all users will also be updated universally. Alongside Graphiti's search, you'd be able to use cypher to query Neo4j to, for example, find hub nodes (aka highly-connected nodes), identifying common beliefs.
More here: https://help.getzep.com/graphiti/graphiti/adding-episodes