First of all, great question.

Second, we use a search service, and vectors are treated as supplementary to the text search, so chunking doesn't matter as much. We usually take an entire PDF page and embed it, regardless of how the data on that page is structured. We do keep track of the document name and the page number. For SQL records, we just turn each record into a text string and embed that.
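
For anyone who wants to picture the page-per-chunk and record-per-chunk approach, here is a minimal sketch. It assumes the page text has already been extracted from the PDF and that each SQL row arrives as a dict. The names (`Chunk`, `page_chunks`, `record_to_text`, `record_chunks`) and the "column: value" string format are illustrative assumptions, not the actual pipeline described above, and the embedding call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str     # the string that would be sent to the embedding model
    source: str   # document name or table name
    locator: str  # page number or record key, kept so hits can be traced back


def page_chunks(doc_name: str, pages: list[str]) -> list[Chunk]:
    """One chunk per PDF page, whatever structure the page contains."""
    return [
        Chunk(text=page_text, source=doc_name, locator=f"page {i}")
        for i, page_text in enumerate(pages, start=1)
    ]


def record_to_text(record: dict) -> str:
    """Flatten one SQL row into a single string.

    Assumed format: "column: value" pairs joined by "; ", skipping NULLs.
    """
    return "; ".join(
        f"{col}: {val}" for col, val in record.items() if val is not None
    )


def record_chunks(table: str, rows: list[dict], key: str) -> list[Chunk]:
    """One chunk per SQL record, located by its (hypothetical) key column."""
    return [
        Chunk(text=record_to_text(row), source=table, locator=f"{key}={row[key]}")
        for row in rows
    ]
```

Each `Chunk.text` would then go to whatever embedding model the search service uses, with `source` and `locator` stored next to the vector so a match can be shown with its document name and page number.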




Thanks for your feedback! Could you share a bit about your team? I’m curious how many people are involved and what kinds of skills or roles are needed to make this happen.



