
Building Cross-Lingual End-to-End Product Search with Tensorflow - kawera
https://hanxiao.github.io/2018/01/10/Build-Cross-Lingual-End-to-End-Product-Search-using-Tensorflow/
======
nerdponx
Even if you aren't interested in "product search" _per se_ , this is a great
technical read about search over hierarchical data, with plenty of helpful
graphics and without being too dense.

~~~
xiao_haozi
I find that the entirety of the author's blog is of similar quality. Even
though each post is framed around a very specific use case, the style of
presentation makes it easy to generalize the concepts to other problems.
There are also some real gems in there - e.g. the Fashion-MNIST post!

------
jschmitz28
> During the inference time, we first represent user input as a vector using
> query encoder; then iterate over all available products and compute the
> metric between the query vector and each of them; finally, sort the results.
> Depending on the stock size, the metric computation part could take a while.
> Fortunately, this process can be easily parallelized.

An alternative, if the item dataset is very large and you're OK with trading
a bit of recall for performance, is to precompute an approximate
nearest-neighbor index over the item vectors, using algorithms provided by
libraries like the following.

Nmslib:
[https://github.com/searchivarius/nmslib](https://github.com/searchivarius/nmslib)

Faiss (Facebook):
[https://github.com/facebookresearch/faiss](https://github.com/facebookresearch/faiss)

Annoy (Spotify):
[https://github.com/spotify/annoy](https://github.com/spotify/annoy)
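For context, here is a minimal pure-Python sketch (not from the article) of the exact O(N) scan the quoted passage describes - encode the query, score every item, sort - which is the step these ANN libraries replace with sublinear approximate lookups:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def brute_force_search(query_vec, item_vecs, k=3):
    # Score every item against the query, then sort - O(N) per query.
    # ANN indexes (nmslib/Faiss/Annoy) avoid this full scan at the cost
    # of occasionally missing a true nearest neighbor.
    scored = sorted(enumerate(item_vecs),
                    key=lambda pair: -cosine(query_vec, pair[1]))
    return [doc_id for doc_id, _ in scored[:k]]
```

The ANN libraries expose essentially the same query interface (vector in, top-k item ids out), so swapping the brute-force scan for an index is usually a localized change.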

------
innagadadavida
> Why is m a string/key matching function? Why can’t we use more well-defined
> math function, e.g. Euclidean distance, cosine function?...

Isn't the purpose of a search index (aka inverted index) to compute the cosine
similarity efficiently? Is this not possible to do for latent space dense
vectors? Or am I missing something?
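A minimal sketch of the distinction being asked about (my own illustration, not from the article): an inverted index speeds up cosine/dot-product scoring for _sparse_ vectors because each posting list only touches documents that share a term with the query; dense embeddings have no zero components, so that pruning disappears, which is why ANN structures are used instead:

```python
from collections import defaultdict

# Sparse vectors: term -> weight. An inverted index maps each term to the
# documents containing it, so scoring only visits docs sharing a query term.
docs = {
    0: {"red": 1.0, "shoe": 0.5},
    1: {"blue": 1.0, "hat": 0.5},
    2: {"red": 0.3, "hat": 0.9},
}

index = defaultdict(list)  # term -> [(doc_id, weight), ...]
for doc_id, vec in docs.items():
    for term, weight in vec.items():
        index[term].append((doc_id, weight))

def sparse_dot_scores(query):
    # Accumulate dot products by walking only the posting lists for the
    # query's nonzero terms; documents with no overlap are never scored.
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index[term]:
            scores[doc_id] += q_weight * d_weight
    return dict(scores)

# A query {"red": 1.0} never touches doc 1 at all. A dense latent vector
# has no zero components, so every document would appear in every posting
# list and the inverted index degenerates back into a full scan.
```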

