Hacker News new | past | comments | ask | show | jobs | submit login
Alike: light kNN library for Node (github.com)
10 points by mck- on July 16, 2013 | hide | past | web | favorite | 8 comments



I can see two problems with this:

-- A naive linear scan for a lookup will not scale as the size of your database grows larger. You should be looking into space-partitioning trees, or approximate methods like locality-sensitive hashing.

-- Euclidean distance is a terrible metric for kNN on non-metric spaces, which is what your movie example is. It will also be beaten to a pulp by the Curse of Dimensionality: http://en.wikipedia.org/wiki/Curse_of_dimensionality#Distanc...


Thanks for the feedback/ideas :)

You're right, it won't scale as well as could be for large datasets -- or work on large dimensionalities. It's meant to be a light-weight solution for the more common use-cases..

For larger cases, it would be good indeed to resort to better methods.. I wanted it to be stateless and functional, whereas with space-partitioning trees, don't you need to maintain the tree (and not generate it on the fly for it to be scalable)?

As for your second point, could you elaborate on why the movie example is non-metric?


> I wanted it to be stateless and functional, whereas with space-partitioning trees, don't you need to maintain the tree (and not generate it on the fly for it to be scalable)?

Yes, but it's essential to maintain some sort of summary or index data structure to make the method scalable. A linear scan may be stateless, but that's hardly an issue when the method won't scale beyond a few hundred examples.

> As for your second point, could you elaborate on why the movie example is non-metric?

No triangle inequality.


The simple algorithm as is uses sorting, hence nlogn -- scales reasonably well into thousands of examples.. a tree would be klogn -- minor improvement unless n >> k?


Alike is a versatile light-weight kNN/similarity library that can be useful for many Machine Learning projects. Whether you are building a recommendation system, or an optimization model, comparing objects is pervasive -- feedback welcome!


I've been looking for this!


I'm glad it's useful :) What is your use-case?


I am making a website to match people with similar values and ideas! It just happens that we measure these properties on values from -100 to 100, I think it could fit perfectly. It even has ability to attribute weights to different properties.. sounds like fun =)




Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: