Ever wonder what it's REALLY like inside the world of tech startups? Join Daniel Davis and Brennan Woodruff, the COO of GoCharlie (a generative AI startup), for a brutally honest conversation about the realities of creating an AI startup versus the usual Silicon Valley tech hype.
Brennan shares his unique journey from KPMG to Uber, then to SoftBank's Vision Fund, and finally into the world of generative AI. They dive deep into the harsh realities of the tech industry, the surprising camaraderie between Uber and Lyft employees, and the overwhelming competition and noise within the AI space.
This is a must-watch for anyone interested in tech, startups, AI, or just a behind-the-scenes look at Silicon Valley thinking and what it takes to build a company in today's world.
Is it possible to retrieve information in a RAG system with matrix multiplication? The CTO and Cofounder of FalkorDB, Roi Lipman, explains how, using GraphBLAS techniques, the answer is yes: you can store and query a knowledge graph with math!
Highlights:
What is a sparse matrix?
How does it compare to a VectorDB?
Are GNNs the future?
Are knowledge graphs the answer to the lack of explainability in AI?
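The core idea behind the episode, traversing a graph with linear algebra as GraphBLAS does, can be sketched in a few lines of pure Python. This is an illustrative toy, not FalkorDB's implementation: the adjacency matrix is stored as sparse rows (only nonzero entries kept), and one hop of traversal becomes a vector-matrix multiply over the boolean (OR, AND) semiring.

```python
# Toy GraphBLAS-style traversal: adjacency stored as a sparse boolean
# matrix (dict of sparse rows), so one BFS step is a vector-matrix
# multiply over the (OR, AND) semiring. Names are illustrative only.

def sparse_matvec_bool(frontier, adjacency):
    """One traversal step: all nodes reachable from `frontier` in one hop."""
    result = set()
    for node in frontier:
        # OR together the sparse row for each node in the frontier.
        result |= adjacency.get(node, set())
    return result

# Tiny knowledge graph: node -> set of neighbors (the sparse rows).
adjacency = {
    "alice": {"bob"},
    "bob": {"carol", "dave"},
    "carol": {"alice"},
}

one_hop = sparse_matvec_bool({"alice"}, adjacency)   # -> {"bob"}
two_hop = sparse_matvec_bool(one_hop, adjacency)     # -> {"carol", "dave"}
```

Because the rows are sparse, the cost scales with the number of edges actually touched, not with the full N x N matrix, which is the point of the sparse-matrix representation discussed in the episode.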
The biggest issue I have with semantic chunking is that it requires an LLM to create the breakpoints. That's a significant cost and latency penalty, for potentially no benefit. That being said, we've seen chunk size have a huge impact on naive extraction to the graph. With recursive character chunking, going from 1000 characters down to 500 characters showed huge gains, even with long-context LLMs. However, once we got out to 2000-4000 character chunks, there didn't appear to be much difference. So, if you're looking to extract maximum detail from a text corpus, ultra-small chunking is likely beneficial.
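For reference, recursive character chunking itself needs no LLM at all. A minimal sketch (not TrustGraph's chunker): split on progressively finer separators, paragraphs before lines before words, until every piece fits the target size.

```python
def recursive_character_chunk(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split `text` into chunks of at most `chunk_size` characters,
    preferring coarse separators (paragraphs) over fine ones (words)."""
    if len(text) <= chunk_size:
        return [text]
    sep, *rest = separators
    if sep == "":
        # Last resort: hard split at the character level.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
                current = ""
            if len(part) <= chunk_size:
                current = part
            else:
                # Piece is still too large; recurse with finer separators.
                chunks.extend(recursive_character_chunk(part, chunk_size, tuple(rest)))
    if current:
        chunks.append(current)
    return chunks

chunks = recursive_character_chunk("word " * 200, chunk_size=500)
```

Swapping `chunk_size=500` for `chunk_size=1000` is the only change needed to reproduce the comparison described above.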
That being said, with ultra-small chunking there's a lot of redundancy in the extracted graph edges. These are some of the problems we're trying to solve with the TrustGraph extraction processes.
One of the key differences is that the graph edges are stored directly in scalable graph stores - either Cassandra or Neo4j. Also, the graph edges are structured as RDF (https://www.w3.org/TR/rdf12-schema/). In TrustGraph version 0.11.20, the extraction process follows the pattern many projects use: finding entities (concepts as well as people, places, things, etc.) and the relationships between them. Upcoming releases will continue to evolve this process to make the extracted knowledge graph much more granular, especially focusing on the source of the extracted graph edges.
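Structuring edges as RDF means every edge is a (subject, predicate, object) statement, which makes the graph queryable by simple pattern matching. A minimal sketch of the idea (the `ex:` identifiers are illustrative, not TrustGraph's actual vocabulary):

```python
# Each graph edge as an RDF-style (subject, predicate, object) triple,
# per the W3C RDF model. URIs below are illustrative placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    ("ex:ada_lovelace", RDF_TYPE, "ex:Person"),
    ("ex:analytical_engine", RDF_TYPE, "ex:Machine"),
    ("ex:ada_lovelace", "ex:wrote_programs_for", "ex:analytical_engine"),
]

def objects(triples, subject, predicate):
    """Match a (subject, predicate, ?) pattern, returning all objects."""
    return [o for s, p, o in triples if s == subject and p == predicate]

objects(triples, "ex:ada_lovelace", "ex:wrote_programs_for")
# -> ["ex:analytical_engine"]
```

Because triples are uniform, they map cleanly onto wide-row stores like Cassandra as well as native graph stores like Neo4j, which is what makes the dual-backend approach workable.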