Microsoft makes AI tool for better search available as an open source project (microsoft.com)
142 points by klhugo 6 days ago | 16 comments

I work on the team that built this and much of our other ranking tech, and I can answer any questions people have. Also, a shameless plug: using this tech, we released some artificial search sessions as an exploratory dataset. https://github.com/dfcf93/MSMARCO/tree/master/ConversationlS...

The GitHub link gave me a 404.

update: you made a typo, should be https://github.com/dfcf93/MSMARCO/tree/master/Conversational...

Great work! How does this compare to hierarchical navigable small world (HNSW) graphs, as in nmslib?

I wasn't really familiar with nmslib, but I think it's a little bit faster and scales to billions of items.

Cool project! One of the challenging use cases mentioned in the description is people taking a picture and asking the search engine, 'What is this?'. Has this been solved? (it is a very hard problem if taken beyond simple object classification)

In many ways, yes. For images, instead of using some NLP parser and vectorization technique, you vectorize the image and then do a similar lookup.
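
To picture the "vectorize, then look up" step, here is a minimal sketch in Python. It is an exact brute-force search, not the approximate index the library builds, and everything in it is assumed for illustration: the item ids, the 3-dimensional vectors (real embeddings would come from a trained model and be much larger), and the helper names `cosine_similarity` and `top_k`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, index, k=2):
    """Score every indexed vector against the query and return the k best ids."""
    scored = [(cosine_similarity(query, vec), item_id)
              for item_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]

# Hypothetical index: item id -> embedding (made-up values).
index = {
    "img_teapot": [0.9, 0.1, 0.0],
    "img_kettle": [0.8, 0.3, 0.1],
    "img_bicycle": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the photo the user just took.
query = [0.9, 0.05, 0.0]
print(top_k(query, index, k=2))
```

The scan touches every vector, so cost grows linearly with the index. Approximate nearest neighbor libraries exist to avoid that full scan at the price of occasionally missing the true nearest item.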

If you want to play with the tech there is a developer kit. https://www.bingvisualsearch.com/develop

How effective is it at identifying mysterious objects, the way it's crowdsourced on Reddit's "/Whatisthis"? (Meaning, how big is the index?)

LOL - that's exactly the use case I was thinking of. But wouldn't an AI have to be trained with lots of examples of the item in question to get a high-quality detection? If so, it might not be able to find the one image that an antique shop in Whereverston has on its web site.

I don't work on the image side, but from what I understand the entire index is vectorized, so it's not categorizing images like an ImageNet-style system would so much as finding a nearest neighbor that can be categorized.
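
A rough sketch of that difference, under stated assumptions: the labels, the 2-dimensional vectors, and the helper `label_by_nearest` are all hypothetical, and this is a plain exact nearest-neighbor lookup rather than anything taken from the library. The point it illustrates is that a rare item becomes findable as soon as its single vector is in the index, with no classifier retrained on many examples.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def label_by_nearest(query, index):
    """Return the label attached to the indexed vector closest to the query.

    index is a list of (vector, label) pairs.
    """
    _, label = min(index, key=lambda entry: squared_distance(query, entry[0]))
    return label

# Hypothetical index of already-vectorized, already-labeled images.
index = [
    ([0.9, 0.1], "teapot"),
    ([0.1, 0.9], "bicycle"),
]

query = [0.55, 0.5]  # made-up embedding of the mystery object
before = label_by_nearest(query, index)

# The antique shop's one photo gets crawled and vectorized: one new entry.
index.append(([0.5, 0.5], "butter churn"))
after = label_by_nearest(query, index)

print(before, "->", after)
```

The quality of the answer still depends on how well the embedding places similar objects near each other, which this sketch takes for granted.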

1. How is this different from pysparnn or faiss?

2. Does this support ANN over both sparse and dense vectors?

Good press release title. Now just add it to the pile ;) https://github.com/erikbern/ann-benchmarks

Yes, exactly! This seems like a totally overblown title. It's akin to saying Google open sourced their key search tech, Kubernetes, which is an open source rendition of Borg, on top of which all the Google workloads run.

lol - good point, foiled the Great Microsoft PR play


no it turns out it's "A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario."

But Math.random() is a really good guess.

Open sourced the print statement
