
Understanding BERT and Search Relevance (2019) - martinlaz
https://opensourceconnections.com/blog/2019/11/05/understanding-bert-and-search-relevance/
======
binarymax
Hey I wrote this about 6 months ago, nice to see it here! AMA, but please note
that this is SoTA territory and things have changed significantly since then.
Notably, folks are now seeing good preliminary results with SBERT (sentence
level encodings instead of token-level):
[https://www.aclweb.org/anthology/D19-1410.pdf](https://www.aclweb.org/anthology/D19-1410.pdf)
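The core SBERT move is to pool BERT's per-token vectors into a single fixed-size sentence vector, so two texts can be compared with a cheap cosine instead of a full cross-attention pass. A minimal sketch of that pooling-plus-similarity step, using toy token vectors in place of a real BERT forward pass:

```python
import numpy as np

def mean_pool(token_embeddings):
    """Collapse per-token vectors into one fixed-size sentence vector.

    Mean pooling is one of the pooling strategies SBERT applies on top
    of BERT's token outputs; the token vectors here are just toy data.
    """
    return np.mean(token_embeddings, axis=0)

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 2-d token embeddings; in practice these come from a BERT encoder.
query_tokens = np.array([[0.9, 0.1], [0.8, 0.2]])
doc_tokens = np.array([[0.85, 0.15], [0.7, 0.3], [0.9, 0.1]])

q_vec = mean_pool(query_tokens)
d_vec = mean_pool(doc_tokens)
print(cosine(q_vec, d_vec))
```

The practical win is that document vectors can be precomputed and indexed once, so at query time you only encode the query and run nearest-neighbor search.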

~~~
charlescearl
I've glanced at an approach that augments an existing document by adding likely
search queries generated by a BERT model specialized for Q&A:

[https://cs.uwaterloo.ca/~jimmylin/publications/Nogueira_Lin_...](https://cs.uwaterloo.ca/~jimmylin/publications/Nogueira_Lin_2019_docTTTTTquery.pdf)
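The expansion idea is simple to sketch: append model-predicted queries to the document body before indexing, so a plain term-matching retriever can hit vocabulary the original text never used. A toy illustration (the predicted queries are hand-written stand-ins for a seq2seq model's output, and the overlap score is a crude stand-in for BM25):

```python
def expand_document(doc_text, predicted_queries):
    """Append predicted queries to the document before indexing
    (the document-expansion idea from the linked paper); here the
    queries are hand-written stand-ins for a model's output."""
    return doc_text + " " + " ".join(predicted_queries)

def term_overlap(query, doc_text):
    """Crude bag-of-words match score standing in for BM25."""
    q_terms = set(query.lower().split())
    d_terms = set(doc_text.lower().split())
    return len(q_terms & d_terms)

doc = "The Eiffel Tower was completed in 1889."
predicted = ["how tall is the eiffel tower",
             "when was the eiffel tower built"]

user_query = "when was the eiffel tower built"
plain = term_overlap(user_query, doc)
expanded = term_overlap(user_query, expand_document(doc, predicted))
print(plain, expanded)
```

The expanded document scores higher because "built" never appears in the original text but does appear in a predicted query; the expensive generation step happens offline at indexing time, so query-time cost is unchanged.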

Do you have any thoughts on this or similar approaches in production?

~~~
binarymax
I'm really interested in this general area of generating good candidate
queries from documents, but haven't spent much time on it. I haven't seen it
in production, and I don't think the topic gets as much attention as it
deserves, because intuitively it sounds like a really good idea. So thanks
for the paper!

Results for R@1000 look pretty impressive, and I'll check out the project
code. Given the high recall and low MRR, using this for the initial recall
step with a rerank is definitely worth looking at: if the high recall carries
over to your own data and you can rerank those top 1k to increase precision,
then you've got something good.
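That two-stage retrieve-then-rerank pipeline can be sketched in a few lines: a cheap, high-recall first stage produces candidates, then a costlier scorer (standing in for a BERT cross-encoder) re-orders them for precision. Everything here is toy data and toy scorers, just to show the shape of the pipeline:

```python
def first_stage(query, docs, k=1000):
    """High-recall, low-precision candidate retrieval; term overlap
    stands in for BM25 over (possibly expanded) documents."""
    scored = [(len(set(query.split()) & set(d.split())), d) for d in docs]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]

def rerank(query, candidates, expensive_score):
    """Re-order candidates with a costlier model, e.g. a BERT
    cross-encoder scoring each query-document pair."""
    return sorted(candidates,
                  key=lambda d: expensive_score(query, d),
                  reverse=True)

# Toy corpus and toy "expensive" scores standing in for a cross-encoder.
docs = ["apple pie recipe",
        "bert reranking paper",
        "search relevance with bert"]
scores = {"bert reranking paper": 0.9, "search relevance with bert": 0.7}
query = "bert search relevance"

candidates = first_stage(query, docs, k=1000)
ranked = rerank(query, candidates, lambda q, d: scores.get(d, 0.0))
print(ranked[0])
```

The point of the split is cost: the expensive model only ever sees the top-k candidates, so high recall in stage one is what makes the precision gains from reranking possible.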

