
Visual Search at Pinterest [pdf] - kaivi
http://arxiv.org/pdf/1505.07647.pdf
======
shackenberg
For reference, here is the link to the real arxiv page:
[http://arxiv.org/abs/1505.07647](http://arxiv.org/abs/1505.07647) and here is
the blog post announcing the paper:
[http://engineering.pinterest.com/post/120111908004/building-...](http://engineering.pinterest.com/post/120111908004/building-a-scalable-machine-vision-pipeline)

------
hurrycane
I'm curious what tech stack they use for this?

~~~
richmarr
No idea what Pinterest are using, but I led a team building the same thing
using (mostly) commodity search kit in 2008.

Feature extraction was done with standard Java libs (proprietary algorithms
though). Queries were initially performed using a vector space model, but I
moved that to using an inverted index (Lucene) because in our use case the
image queries were usually combined with free text and parametric search
params.
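To picture the inverted-index approach: quantized feature descriptors can be treated as "visual words" and indexed alongside free-text terms, so one query mixes both. A toy Python sketch (the dict-based index and "v123"-style tokens are stand-ins; Lucene plays this role in a real system):

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index where quantized image features ("visual words")
    live in the same index as free-text terms."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, doc_id, terms):
        for t in terms:
            self.postings[t].add(doc_id)

    def search(self, terms):
        # Rank docs by how many query terms they match (crude scoring).
        scores = defaultdict(int)
        for t in terms:
            for doc_id in self.postings.get(t, ()):
                scores[doc_id] += 1
        return sorted(scores, key=scores.get, reverse=True)

index = InvertedIndex()
# "v123"-style tokens stand in for quantized visual descriptors.
index.add("img1", ["v123", "v456", "red", "dress"])
index.add("img2", ["v123", "v789", "blue", "dress"])

# A single query combining visual words with free text.
results = index.search(["v123", "red", "dress"])
```

Here `results` ranks `img1` first (three matching terms) ahead of `img2` (two).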

The main issue we faced was scaling search with a large number of query
parameters, since a naive implementation created something like 300 query
terms for each visual search. We did various things to optimise that, from
distributing the index to using index statistics to pick the optimum words to
query. I submitted some optimisation code (a modified MoreLikeThisQuery with
an LRU term cache) back to Lucene; not sure what happened to it, I think the
JIRA issue is still open.
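The term-selection idea above (using index statistics to pick which words to query) is essentially what Lucene's MoreLikeThisQuery does: rank candidate terms by IDF and keep only the most discriminative few. A minimal sketch with made-up document frequencies:

```python
import math

def prune_query_terms(query_terms, doc_freq, num_docs, keep=25):
    """Keep only the most discriminative query terms, ranked by IDF.
    Same idea MoreLikeThisQuery uses to cap query size."""
    def idf(term):
        # Rare terms (low document frequency) score higher.
        return math.log(num_docs / (1 + doc_freq.get(term, 0)))
    return sorted(query_terms, key=idf, reverse=True)[:keep]

# A visual query might emit ~300 terms; keep only the informative ones.
doc_freq = {"v1": 9000, "v2": 12, "v3": 450, "v4": 3}
pruned = prune_query_terms(["v1", "v2", "v3", "v4"], doc_freq,
                           num_docs=10000, keep=2)
```

With these numbers the very common term `v1` is dropped and the rare `v4` and `v2` survive, shrinking the query while keeping most of its discriminative power.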

~~~
boomzilla
Could you give a high level description of the features used? Interest points,
corners, edges, color histograms, or something more sophisticated?

~~~
richmarr
The context was fashion retail, so we used interest points & edges to match
similar shapes, and histograms for colour similarity.
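For the colour side, a common similarity measure is histogram intersection: sum the bin-wise minima of two normalised histograms. A minimal sketch with made-up 4-bin histograms (not a claim about any particular production system):

```python
def histogram_intersection(h1, h2):
    """Similarity between two normalised colour histograms:
    sum of bin-wise minima; 1.0 for identical histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy 4-bin colour histograms (each sums to 1.0).
red_dress  = [0.7, 0.1, 0.1, 0.1]
red_shirt  = [0.6, 0.2, 0.1, 0.1]
blue_jeans = [0.1, 0.1, 0.1, 0.7]

sim_similar   = histogram_intersection(red_dress, red_shirt)   # ~0.9
sim_different = histogram_intersection(red_dress, blue_jeans)  # ~0.4
```

The two red items score far higher than the red/blue pair, which is the behaviour you want from a colour-similarity feature.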

