
That's cool! How does it perform compared to more "naive" methods? How did you go about comparing that performance, and was it in a real-world RAG pipeline?



Yep, benchmarks are available at https://github.com/ZeroEntropy-AI/llama-chunk?tab=readme-ov-... . We used this dataset: https://github.com/ZeroEntropy-AI/legalbenchrag , which is a retrieval-focused version of LegalBench.

The agentic chunker scored better than LlamaIndex's recursive character text splitter, and that was with some custom regex work added to improve the splitter. If you put enough effort into the regex you could probably get there, but the whole point of the agentic chunking is for it to be automatic and contextual.
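For anyone unfamiliar with the baseline, here is a rough sketch of how a recursive character splitter works in general. It is a simplified illustration, not the library's implementation and not the benchmark code; the name `recursive_split`, the separator list, and `max_chars` are all made up for the example.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", ". ", " ")):
    """Split text into pieces of at most max_chars, trying coarse separators first.

    Hypothetical helper for illustration only. Separators between
    chunks are dropped, so the split is not lossless.
    """
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # This separator does not occur; try the next, finer one.
        return recursive_split(text, max_chars, finer)

    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_chars:
            # Greedily merge neighbouring parts while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # A single part is still too long: recurse with finer separators.
            chunks.extend(recursive_split(part, max_chars, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "Clause one. Clause two.\n\nSection B has a much longer sentence here."
pieces = recursive_split(sample, max_chars=30)
```

The splitter only looks at characters and lengths, so a boundary can land in the middle of a clause. That is the gap the regex tuning, and the agentic approach, are trying to close.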



