
Machine Learning: Full-Text Search in JavaScript – Relevance Scoring (2015) - octosphere
http://burakkanber.com/blog/machine-learning-full-text-search-in-javascript-relevance-scoring/
======
diegolo
"a general search can be machine learning" I don't get this sentence: Machine
learning is about building a mathematical model of sample data, known as
"training data".

If you want to talk about machine learning and search, you should probably talk
about learning to rank
([https://en.m.wikipedia.org/wiki/Learning_to_rank](https://en.m.wikipedia.org/wiki/Learning_to_rank)).

~~~
snotrockets
I'd argue that your definition is too restrictive. For example, unsupervised
clustering has no sample training data.

The usual definition (due to Mitchell) is that machine learning is a system
whose performance on a given task improves with experience.

~~~
thegginthesky
Actually, any unsupervised method, including clustering, still has training
data. The only difference is that the training set has no target y variable
against which to minimize an error metric, hence the name unsupervised.

But the definition you mention is right. Still, any dataset that you use to fit
your model is your training set, even if you don't have a train/test
split or the like, because it is what you fit your model on.
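
To make that concrete, here is a minimal sketch of k-means on toy 1-D data. The function name `kmeans_1d`, the points, and the starting centroids are all made up for illustration. Note that the data passed in has no labels, yet the centroids are still fit to it.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points; returns the fitted centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# Unlabeled toy data: there is no target y anywhere.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
fitted = kmeans_1d(points, centroids=[0.0, 5.0])
```

The fitted centroids depend entirely on `points`, which is the sense in which that dataset acts as the training set.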

~~~
snotrockets
K-means has no "training data" per se.

------
inertiatic
Search is now machine learning? Interesting introduction to the topic
otherwise.

~~~
softwaredoug
I would say this isn't machine learning, but relevance in general is an
interesting topic to apply supervised learning to. Of course, the training data
is the hard part.

An article on the topic:
[https://opensourceconnections.com/blog/2017/08/03/search-as-...](https://opensourceconnections.com/blog/2017/08/03/search-as-machine-learning-prob/)
(disclaimer: I wrote it)

~~~
inertiatic
I also work in this field, so I do have an idea of what's possible if you apply
machine learning techniques to improve relevance rankings.

But to my intuition, basic search doesn't feel like a machine learning task.
After reading some of the responses to my post, however, I'm trying to come up
with a meaningful reason why I wouldn't consider IDF to be machine learning,
given that it is updated as more documents enter the corpus and your system
"learns" to re-rank existing result sets based on these new documents.

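A small sketch of the IDF behaviour described above. The class name `IdfIndex`, the whitespace tokenizer, and the smoothed formula `log((1 + N) / (1 + df)) + 1` are assumptions picked for illustration; the article's own scoring may differ.

```python
import math
from collections import Counter


class IdfIndex:
    """Hypothetical helper: tracks document frequencies as documents arrive."""

    def __init__(self):
        self.doc_count = 0          # N: number of documents seen so far
        self.doc_freq = Counter()   # df: number of documents containing each term

    def add(self, text):
        self.doc_count += 1
        # Count each term at most once per document.
        self.doc_freq.update(set(text.lower().split()))

    def idf(self, term):
        # One common smoothed variant; unseen terms get df = 0.
        return math.log((1 + self.doc_count) / (1 + self.doc_freq[term])) + 1


index = IdfIndex()
index.add("red apple")
index.add("green pear")
apple_before, pear_before = index.idf("apple"), index.idf("pear")

# Two more documents arrive; both mention "apple", neither mentions "pear".
index.add("apple pie")
index.add("apple tart")
apple_after, pear_after = index.idf("apple"), index.idf("pear")
```

After the two extra documents, the weight of "apple" drops and the weight of "pear" rises, so existing results would be re-ranked without any labels or explicit training step.
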
------
humbleMouse
Reading your site on my phone and it reloads every 10 seconds. Annoying.

------
rajangdavis
Curious to see how this might compare against Postgres's full-text search.

The text search vector type is pretty much a poor man's bag-of-words model
(with stop-word removal and some lemmatization), but instead of counts, you
get the positions where the words occur.
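
A rough sketch of that difference, in Python rather than SQL. This is not Postgres's implementation: the stop-word list is a tiny made-up one, there is no stemming or lemmatization, and the function names are hypothetical. Stop words are dropped but still advance the position counter, which, as far as I know, is also how `to_tsvector` numbers positions.

```python
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; not the one Postgres ships.
STOP_WORDS = {"a", "the", "on", "and", "it"}


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation."""
    return [w.strip(".,!?").lower() for w in text.split()]


def bag_of_words(text):
    """Classic bag of words: term -> count."""
    return Counter(w for w in tokenize(text) if w and w not in STOP_WORDS)


def positional_vector(text):
    """tsvector-like view: term -> list of 1-based positions."""
    positions = defaultdict(list)
    for pos, word in enumerate(tokenize(text), start=1):
        if word and word not in STOP_WORDS:
            positions[word].append(pos)
    return dict(positions)


text = "a fat cat sat on a mat and it ate a fat rat"
counts = bag_of_words(text)
vector = positional_vector(text)
```

The counts can be recovered from the positional form (the length of each list), but not the other way around, so the positional form is what makes phrase and proximity matching possible.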

------
eggie5
He generated query-document features. Now he just needs to collect relevance
labels for the documents; then he can learn a ranker à la LTR.
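
A minimal pointwise sketch of that last step, assuming two hypothetical features per query-document pair (a text-match score and a title-match flag) and made-up binary relevance labels. It fits a logistic regression with batch gradient descent; real LTR systems typically use pairwise or listwise objectives and far more data.

```python
import math


def train_pointwise_ranker(examples, epochs=200, lr=0.5):
    """Fit logistic regression on (features, label) pairs; returns (weights, bias)."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob minus label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def score(model, features):
    """Ranking score: higher means predicted more relevant."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical training set: ([text_match_score, title_match], relevance_label)
examples = [
    ([0.9, 1.0], 1),
    ([0.8, 0.0], 1),
    ([0.2, 0.0], 0),
    ([0.1, 1.0], 0),
]
model = train_pointwise_ranker(examples)

# Rank unseen candidates by learned score, best first.
candidates = {"doc_a": [0.15, 0.0], "doc_b": [0.85, 0.0]}
ranking = sorted(candidates, key=lambda d: score(model, candidates[d]), reverse=True)
```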

------
ElD0C
(2015)

------
magma17
relevance==frequency?

anything is ML now...

------
4FNET7
thanks --

