I work on the team that built this and much of our other ranking tech. I can answer any questions people have.

Also shameless plug: using this tech we released some artificial search sessions as an exploratory dataset. https://github.com/dfcf93/MSMARCO/tree/master/Conversational...

Can you explain how this is different from a regular word2vec NLP algorithm?

Honestly, it's not. There is a SIGIR 2019 paper about how it was trained and how it works, called 'Generic Intent Representation in Web Search'. The gist of it is: if a document has a sat (satisfied) click from multiple queries, make those queries' vectors closer together. Do this over a few billion URLs and documents and you can make a vector representation for any query.
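To make the "pull co-clicked queries together" idea concrete, here is a toy sketch. It is not the method from the paper: the click log, the function names (`co_click_pairs`, `train`), the random initialization, and the plain attraction update are all assumptions made up for illustration. A real system would learn an encoder over query text and would also need negative examples so that unrelated queries do not collapse together.

```python
import itertools
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def co_click_pairs(clicks):
    """Return every pair of queries that share a satisfied click on one URL."""
    by_url = {}
    for query, url in clicks:
        by_url.setdefault(url, set()).add(query)
    pairs = set()
    for queries in by_url.values():
        for q1, q2 in itertools.combinations(sorted(queries), 2):
            pairs.add((q1, q2))
    return sorted(pairs)


def train(clicks, dim=8, epochs=50, lr=0.1, seed=0):
    """Toy training loop (hypothetical): start from random query vectors and
    move each co-clicked pair a small step toward each other every epoch."""
    rng = random.Random(seed)
    queries = sorted({q for q, _ in clicks})
    vecs = {q: [rng.uniform(-1, 1) for _ in range(dim)] for q in queries}
    pairs = co_click_pairs(clicks)
    for _ in range(epochs):
        for q1, q2 in pairs:
            a, b = vecs[q1], vecs[q2]
            diff = [x - y for x, y in zip(a, b)]
            # Attraction only: each vector moves toward the other one.
            vecs[q1] = [x - lr * d for x, d in zip(a, diff)]
            vecs[q2] = [y + lr * d for y, d in zip(b, diff)]
    return vecs


# Made-up click log of (query, URL with a satisfied click) records.
CLICKS = [
    ("cheap flights", "example.com/flights"),
    ("low cost airfare", "example.com/flights"),
    ("python tutorial", "example.com/python"),
]

if __name__ == "__main__":
    before = train(CLICKS, epochs=0)
    after = train(CLICKS)
    pair = ("cheap flights", "low cost airfare")
    print("before:", cosine(before[pair[0]], before[pair[1]]))
    print("after: ", cosine(after[pair[0]], after[pair[1]]))
```

The two flight queries share no words, yet they end up with nearly identical vectors because they share a clicked URL. The query with no co-clicks keeps its initial vector, which is why real training needs far more data plus a term that pushes unrelated queries apart.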

Thanks for the response! I'll definitely dig into it some more.

Awesome. Thanks for also releasing the data set.

Any IP encumbrances we should be aware of?

For the dataset: The MS MARCO datasets are intended for non-commercial research purposes only, to promote advancement in the field of artificial intelligence and related areas, and are made available free of charge without extending any license or other intellectual property rights. The dataset is provided “as is” without warranty, and usage of the data has risks since we may not own the underlying rights in the documents. We are not liable for any damages related to use of the dataset. Feedback is voluntarily given and can be used as we see fit. Upon violation of any of these terms, your rights to use the dataset will end automatically.
