Visual Search at Pinterest [pdf] (arxiv.org)
32 points by kaivi on May 29, 2015 | 7 comments



For reference, here is the link to the real arxiv page: http://arxiv.org/abs/1505.07647 and here is the blog post announcing the paper: http://engineering.pinterest.com/post/120111908004/building-...


I'm curious: what tech stack do they use for this?


No idea what Pinterest are using, but I led a team building the same thing using (mostly) commodity search kit in 2008.

Feature extraction was done with standard Java libs (proprietary algorithms though). Queries were initially performed using a vector space model, but I moved that to an inverted index (Lucene) because, in our use case, image queries were usually combined with free text and parametric filters.
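
To make that concrete, here is a rough sketch (not our actual code, and using the modern Lucene API rather than what existed in 2008) of how quantised visual features can sit alongside free text and a parametric filter in a single Boolean query; the field names and inputs are made up for illustration:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    // Each quantised image feature becomes a term in a "visual" field, so a
    // visual query is just another set of Boolean clauses next to text and filters.
    public class VisualQueryBuilder {

        public static BooleanQuery build(String[] visualWords, String freeTextTerm, String category) {
            BooleanQuery.Builder query = new BooleanQuery.Builder();

            // Visual-word clauses: SHOULD, so documents are ranked by how many
            // visual words they share with the query image.
            for (String word : visualWords) {
                query.add(new TermQuery(new Term("visual", word)), Occur.SHOULD);
            }

            // Free-text clause (assume the text has already been analysed into terms).
            if (freeTextTerm != null) {
                query.add(new TermQuery(new Term("text", freeTextTerm)), Occur.SHOULD);
            }

            // Parametric filter, e.g. restrict results to a product category.
            if (category != null) {
                query.add(new TermQuery(new Term("category", category)), Occur.FILTER);
            }

            return query.build();
        }
    }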

The main issue we faced was scaling search with a large number of query parameters, since a naive implementation created something like 300 query terms for each visual search. We did various things to optimise that, from distributing the index to using index statistics to pick the optimum terms to query. I submitted some optimisation code (a modified MoreLikeThisQuery with an LRU term cache) back to Lucene; not sure what happened to it, I think the JIRA issue is still open.
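
The term-selection part is essentially what MoreLikeThis does for text: instead of querying on all ~300 visual words, rank them by an IDF-style score from index statistics and keep only the most discriminative ones. A simplified sketch of the idea (illustrative, not the patch I submitted):

    import java.io.IOException;
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    // Keep only the most discriminative visual words: rare terms (low document
    // frequency) carry more information, so rank by IDF and truncate the query.
    public class TermSelector {

        public static List<Term> topTerms(IndexReader reader, List<String> visualWords, int maxTerms) {
            int numDocs = reader.numDocs();

            return visualWords.stream()
                    .map(w -> new Term("visual", w))
                    .sorted(Comparator.comparingDouble((Term t) -> -idf(reader, t, numDocs)))
                    .limit(maxTerms)
                    .collect(Collectors.toList());
        }

        private static double idf(IndexReader reader, Term term, int numDocs) {
            try {
                int docFreq = reader.docFreq(term);
                return Math.log((double) numDocs / (docFreq + 1));
            } catch (IOException e) {
                return 0.0; // treat unreadable terms as uninformative
            }
        }
    }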


Could you give a high-level description of the features used? Interest points, corners, edges, color histograms, or something more sophisticated?


The context was fashion retail, so we used interest points & edges to match similar shapes, and histograms for colour similarity.
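
The colour-histogram side is the simple part; a toy version in plain Java looks something like this (bucket counts and colour space are illustrative, not what we shipped):

    import java.awt.image.BufferedImage;

    // Toy colour-histogram feature: quantise each pixel's RGB into a few buckets
    // per channel and count pixels per bucket. Comparing two images is then just
    // comparing two fixed-length vectors (e.g. cosine or chi-squared distance).
    public class ColorHistogram {

        private static final int BUCKETS = 4; // 4 x 4 x 4 = 64-dimensional feature

        public static double[] compute(BufferedImage image) {
            double[] hist = new double[BUCKETS * BUCKETS * BUCKETS];

            for (int y = 0; y < image.getHeight(); y++) {
                for (int x = 0; x < image.getWidth(); x++) {
                    int rgb = image.getRGB(x, y);
                    int r = ((rgb >> 16) & 0xFF) * BUCKETS / 256;
                    int g = ((rgb >> 8) & 0xFF) * BUCKETS / 256;
                    int b = (rgb & 0xFF) * BUCKETS / 256;
                    hist[(r * BUCKETS + g) * BUCKETS + b]++;
                }
            }

            // Normalise so images of different sizes are comparable.
            double total = image.getWidth() * (double) image.getHeight();
            for (int i = 0; i < hist.length; i++) {
                hist[i] /= total;
            }
            return hist;
        }
    }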


Google Image search has such a feature too. One can drag & drop a local image onto the search bar on the Google Images page. Help page: https://support.google.com/websearch/answer/1325808?hl=en ; info: http://www.quora.com/What-is-the-algorithm-used-by-Google-Se...

One of the earliest such similarity image searches was a research prototype on Airliners.net (ca. 2006): http://www.airliners.net/similarity/ , e.g. http://www.airliners.net/search/similarity_search.php?photo_... , http://infolab.stanford.edu/~wangz/project/imsearch/SIMPLIci... , http://alipr.com/cgi-bin/zwang/regionsearch_show.cgi

It's called Reverse image search: http://en.wikipedia.org/wiki/Reverse_image_search


Here are some of the technologies we use:

Caffe - deep learning feature computation and model training

OpenCV - local + other features

Zookeeper - service discovery

Cascading - batch processing jobs

We’ve built infrastructure around some of these libraries to operate at scale. For example, we’ve built an incremental feature extraction pipeline that at its core uses Caffe + OpenCV for feature extraction (more details on how this works are in the paper).
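
A minimal sketch of the "incremental" idea, assuming a feature store keyed by image signature; the class and method names here are made up, and the real extractor would wrap Caffe/OpenCV rather than an in-memory map:

    import java.util.HashMap;
    import java.util.Map;

    // Incremental extraction: features are keyed by a content signature (e.g. a
    // hash of the image bytes), so re-running the pipeline only computes features
    // for signatures it has not seen before.
    public class IncrementalFeatureExtractor {

        /** Stand-in for the real feature extractor (Caffe/OpenCV behind a service). */
        interface FeatureExtractor {
            float[] extract(byte[] imageBytes);
        }

        private final Map<String, float[]> featureStore = new HashMap<>();
        private final FeatureExtractor extractor;

        public IncrementalFeatureExtractor(FeatureExtractor extractor) {
            this.extractor = extractor;
        }

        public float[] featuresFor(String signature, byte[] imageBytes) {
            // Already-seen signatures are returned from the store; only new
            // signatures go through the (expensive) extraction step.
            return featureStore.computeIfAbsent(signature, s -> extractor.extract(imageBytes));
        }
    }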

We serve our infrastructure on EC2.



