Hacker News

Looks like a cool project. I would love to see this as a browser plugin of some sort. As for the corpus, I suspect that articles from Wikipedia would be appropriate: large articles in particular are routinely checked and cleaned up. Wikipedia also has the added benefit of being available in multiple languages.

(https://en.wikipedia.org/wiki/Wikipedia:Database_download)

EDIT: I see this has already been suggested, along with a large number of other sources, in another comment by daveytea.



