
The issue of inflections can be mitigated by applying Porter's stemming algorithm to each word before you examine it. The algorithm isn't perfect, but it is fast enough to use in a context like this. An Emacs Lisp implementation is available:

https://github.com/kawabata/stem-english/blob/master/stem-en...

You need to read Japanese to make sense of the comments, but it does seem to work. For cases it can't handle, an exception list shouldn't be too hard to implement: most simply, include the exceptions in the dictionary and fall back to exact matching when you can't match a stem.
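
To make the fallback idea concrete, here is a rough Python sketch. Note that `toy_stem` is a crude suffix stripper I made up as a stand-in, not Porter's algorithm and not the Emacs Lisp package above, and `ENTRIES` and `lookup` are hypothetical names. The dictionary is keyed by stem for regular words and by exact surface form for exceptions.

```python
from typing import Optional


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (NOT Porter's algorithm)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical dictionary: regular words keyed by stem,
# irregular forms ("does", "went") stored verbatim as exceptions.
ENTRIES = {
    "walk": "walk",
    "jump": "jump",
    "does": "do",
    "went": "go",
}


def lookup(word: str, entries: dict) -> Optional[str]:
    """Try the stem first, then fall back to an exact match on the word itself."""
    hit = entries.get(toy_stem(word))
    if hit is not None:
        return hit
    # The stemmer could not produce a known stem, so check the exception entries.
    return entries.get(word.lower())


if __name__ == "__main__":
    for w in ("Walking", "jumped", "does", "went", "xyzzy"):
        print(w, "->", lookup(w, ENTRIES))
```

Here "does" is the interesting case: the toy stemmer turns it into "doe", which is not a key, so the exact-match fallback is what finds it. With a real stemmer the set of words needing exception entries would differ, but the lookup order stays the same.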



