
CommonSearch https://github.com/commonsearch/ was a project to build a search engine, based on Common Crawl, where you could "explain" (in the SQL sense) each result. Open source and transparent. But it does not seem to have gathered much enthusiasm, which I find sad.

If you have some loose change on you, a bit of processing on 71TB of data gets you an index built precisely the way you want it.

Anyway, without "some" NLP no search engine is going to be very useful.

At a minimum, you need to know how to tokenize. For many languages, this is not as trivial as it is for English.
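To make the tokenization point concrete, here is a rough Python sketch. It is not what CommonSearch does; the function names and the hard-coded Unicode ranges are my own assumptions for illustration. Splitting on whitespace is fine for English, but returns a whole Chinese sentence as one token. Overlapping character bigrams are one common crude fallback for scripts written without spaces; a real engine would more likely use a proper word segmenter.

```python
from itertools import groupby

# Assumption: only Hiragana/Katakana and the basic CJK Unified Ideographs
# block are treated as "no spaces between words" scripts. Real coverage is wider.
CJK_RANGES = ((0x3040, 0x30FF), (0x4E00, 0x9FFF))


def is_cjk(ch):
    """Return True if the character falls in one of the ranges above."""
    return any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)


def whitespace_tokenize(text):
    """Naive tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def bigram_tokenize(text):
    """Whitespace split, then overlapping character bigrams for CJK runs."""
    tokens = []
    for word in text.lower().split():
        # Break each whitespace-delimited chunk into CJK and non-CJK runs.
        for cjk, group in groupby(word, key=is_cjk):
            run = "".join(group)
            if not cjk or len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


print(whitespace_tokenize("Open source search"))
print(whitespace_tokenize("我喜欢搜索引擎"))
print(bigram_tokenize("我喜欢搜索引擎"))
```

The first call gives three usable tokens. The second gives back the whole sentence as a single token, which is useless as an index term. The third produces overlapping two-character tokens, so a query for 搜索 at least has something to match. Note that punctuation stays attached to words in this sketch, which is yet another thing a real tokenizer has to handle.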


