
Have you considered building the index directly on language tokens (e.g. from the abstract syntax tree representing the file) instead of on n-grams of the source text?
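For contrast, here is a minimal sketch of the n-gram approach the question refers to, assuming trigrams (n = 3), a common choice for code search. The thread doesn't say which n the engine uses, and the `TrigramIndex` class is purely illustrative. Posting lists narrow the candidates, and a final substring check confirms each match.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return the set of overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical inverted index from trigram to the ids of documents containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: list[str] = []

    def add(self, text: str) -> int:
        doc_id = len(self.docs)
        self.docs.append(text)
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)
        return doc_id

    def search(self, query: str) -> list[int]:
        grams = trigrams(query)
        if not grams:
            # Query shorter than 3 characters: no trigrams, so every document is a candidate.
            candidates = set(range(len(self.docs)))
        else:
            # Intersect the posting lists of all query trigrams.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Verify: sharing every trigram does not guarantee the exact substring is present.
        return sorted(d for d in candidates if query in self.docs[d])


idx = TrigramIndex()
idx.add("def parse_args(argv):")
idx.add("# parse the config file")
idx.add("return args")
print(idx.search("parse"))  # [0, 1]
```

Note that this index knows nothing about what a match *is*. A hit inside a comment looks the same as a hit on an identifier, which is the gap a token-level index would close.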



We have not done this yet, but we do intend to.

Actually, our search engine is so fast that syntax-highlighting the search results is often slower than finding them. If we store the language tokens directly in the index, we'll be able to emit syntax-highlighted snippets straight from it and make the overall search even faster.

It may also enable some interesting search capabilities in the future, like searching within comments or by code structure.
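To illustrate both ideas, here is a minimal sketch of a token-level index. It assumes Python source and uses Python's stdlib `tokenize` module as a stand-in lexer. The `TokenIndex` class, the `<kind>...</kind>` tag markup, and the vocabulary-scan search are hypothetical choices made for illustration and are not how the engine described here works. Each token is stored with its kind and column, so a matching line can be rendered as a highlighted snippet without re-lexing at query time, and a `kind` filter restricts matches to, for example, comments only.

```python
import io
import keyword
import tokenize
from collections import defaultdict
from typing import Optional


def lex(source: str) -> list[tuple[str, str, int, int]]:
    """Lex Python source into (kind, text, row, col) tuples via the stdlib tokenizer."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start[0] != tok.end[0]:
            continue  # skip multi-line tokens (e.g. triple-quoted strings) for simplicity
        if tok.type == tokenize.COMMENT:
            kind = "comment"
        elif tok.type == tokenize.STRING:
            kind = "string"
        elif tok.type == tokenize.NAME:
            kind = "keyword" if keyword.iskeyword(tok.string) else "name"
        elif tok.type == tokenize.NUMBER:
            kind = "number"
        elif tok.type == tokenize.OP:
            kind = "op"
        else:
            continue  # NEWLINE, NL, INDENT, DEDENT, ENDMARKER carry no highlightable text
        out.append((kind, tok.string, tok.start[0], tok.start[1]))
    return out


class TokenIndex:
    """Hypothetical inverted index from (kind, token text) to the (doc_id, row) lines where it occurs."""

    def __init__(self) -> None:
        self.postings: dict[tuple[str, str], set[tuple[int, int]]] = defaultdict(set)
        self.lines: list[list[str]] = []  # doc_id -> source lines
        self.tokens: list[dict[int, list[tuple[str, str, int]]]] = []  # doc_id -> row -> tokens

    def add(self, source: str) -> int:
        doc_id = len(self.lines)
        self.lines.append(source.splitlines())
        by_row: dict[int, list[tuple[str, str, int]]] = defaultdict(list)
        for kind, text, row, col in lex(source):
            by_row[row].append((kind, text, col))
            self.postings[(kind, text)].add((doc_id, row))
        self.tokens.append(by_row)
        return doc_id

    def snippet(self, doc_id: int, row: int) -> str:
        """Wrap each stored token in <kind>...</kind> tags, keeping the gaps between tokens."""
        line = self.lines[doc_id][row - 1]  # tokenize rows are 1-based
        out, pos = [], 0
        for kind, text, col in self.tokens[doc_id][row]:
            out.append(line[pos:col])  # whitespace between tokens, copied verbatim
            out.append(f"<{kind}>{text}</{kind}>")
            pos = col + len(text)
        out.append(line[pos:])
        return "".join(out)

    def search(self, query: str, kind: Optional[str] = None) -> list[str]:
        """Highlighted snippets of lines with a token containing query, optionally of one kind."""
        hits: set[tuple[int, int]] = set()
        # Simplification: a linear scan over the distinct-token vocabulary.
        for (k, text), locs in self.postings.items():
            if (kind is None or k == kind) and query in text:
                hits |= locs
        return [self.snippet(d, r) for d, r in sorted(hits)]


idx = TokenIndex()
idx.add("def area(r):\n    # compute circle area\n    return 3.14 * r * r\n")
print(idx.search("area", kind="comment"))  # ['    <comment># compute circle area</comment>']
```

Without the `kind` filter, the same query also returns the `def area(r):` line, already highlighted. In this sketch, "search within comments" is just a filter on token metadata that was stored once at indexing time.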



