
Stemmer – An English (Porter2) Stemming Implementation in Elixir - fredwu
https://github.com/fredwu/stemmer
======
Isamu
Interesting to see the Porter2 algorithm
([http://snowballstem.org/algorithms/english/stemmer.html](http://snowballstem.org/algorithms/english/stemmer.html)),
having also implemented a Porter stemmer ages ago myself.

But I have to ask, why are we still using this approach, which has known
inaccuracies and quirks (in English)? Its one virtue is compactness, which in
the old days was more important than accuracy.

But it is fun, I'll grant you that.

~~~
ch4s3
I think Porter2 works well enough. You don't need each word's lexical category,
as lemmatisation does, so it's a lot simpler than that approach. You also don't
need a huge volume of training data.

~~~
rspeer
You don't need any training data to do better than Porter2, just lexical data.
If you're content to map the same surface form to the same stem regardless of
context, as Porter2 would, what you need is a dictionary mapping words to
their stems.

This used to be considered a lot of memory to use. It's not anymore.

This leaves you with some edge cases (is "axes" the plural of "axis" or
"axe"?), but it's much better than mixing up the words "universe" and
"university".
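
To make that concrete, here is a minimal sketch of the dictionary approach in Python. The tiny `STEMS` table and the `stem` helper are made up for illustration; a real lexicon would be loaded from whatever lexical data you have, and the fallback for unknown words is just one possible choice.

```python
# Hypothetical, hand-written lexicon for illustration only.
# A real one would be loaded from a lexical resource.
STEMS = {
    "universe": "universe",
    "universes": "universe",
    "university": "university",
    "universities": "university",
    "axes": "axis",  # edge case: could just as well be "axe"
}


def stem(word: str) -> str:
    """Return the lexicon's stem for `word`, ignoring case.

    Unknown words are returned lowercased and otherwise unchanged
    (an assumption; falling back to a rule-based stemmer is another option).
    """
    key = word.lower()
    return STEMS.get(key, key)
```

With this table, `stem("Universities")` and `stem("universe")` stay distinct, while "axes" still has to be resolved one way or the other without context.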

~~~
ch4s3
That's a good point; I think I overstated the need for training data.

------
thibaut_barrere
"Stemmer was built to support the Simple Bayes [1] library"

[1]
[https://github.com/fredwu/simple_bayes](https://github.com/fredwu/simple_bayes)

------
awetzel
For those interested in stemming in Elixir, I made a simple Elixir wrapper
around the Snowball implementations of stemmers:
[https://github.com/awetzel/stemex](https://github.com/awetzel/stemex)

snowball -> C -> NIF -> Elixir

------
qaq
Nice to see the Elixir ecosystem growing, congrats on the cool release

------
gariany
I'm sorry, but why is this important? #therestoftheworld

~~~
qaq
Because it's Elixir :)

