Visual Search at Pinterest [pdf] (arxiv.org)
32 points by kaivi on May 29, 2015 | 7 comments



For reference, here is the link to the real arxiv page: http://arxiv.org/abs/1505.07647 and here is the blog post announcing the paper: http://engineering.pinterest.com/post/120111908004/building-...


I'm curious: what tech stack do they use for this?


No idea what Pinterest are using, but I led a team building the same thing using (mostly) commodity search kit in 2008.

Feature extraction was done with standard Java libs (proprietary algorithms though). Queries were initially performed using a vector space model, but I moved that to an inverted index (Lucene) because, in our use case, image queries were usually combined with free text and parametric filters.
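
To make that concrete, here is a rough sketch (not our actual code, and using the modern Lucene API rather than what existed in 2008) of how quantised visual features can sit alongside free text and a parametric filter in a single Boolean query; the field names and inputs are made up for illustration:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    // Each quantised image feature becomes a term in a "visual" field, so a
    // visual query is just another set of Boolean clauses next to text and filters.
    public class VisualQueryBuilder {

        public static BooleanQuery build(String[] visualWords, String freeTextTerm, String category) {
            BooleanQuery.Builder query = new BooleanQuery.Builder();

            // Visual-word clauses: SHOULD, so documents are ranked by how many
            // visual words they share with the query image.
            for (String word : visualWords) {
                query.add(new TermQuery(new Term("visual", word)), Occur.SHOULD);
            }

            // Free-text clause (assume the text has already been analysed into terms).
            if (freeTextTerm != null) {
                query.add(new TermQuery(new Term("text", freeTextTerm)), Occur.SHOULD);
            }

            // Parametric filter, e.g. restrict results to a product category.
            if (category != null) {
                query.add(new TermQuery(new Term("category", category)), Occur.FILTER);
            }

            return query.build();
        }
    }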

The main issue we faced was scaling search with a large number of query parameters, since a naive implementation created something like 300 query terms for each visual search. We did various things to optimise that, from distributing the index to using index statistics to pick the optimum terms to query. I submitted some optimisation code (a modified MoreLikeThisQuery with an LRU term cache) back to Lucene; not sure what happened to it, I think the JIRA issue is still open.
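
The term-selection part is essentially what MoreLikeThis does for text: instead of querying on all ~300 visual words, rank them by an IDF-style score from index statistics and keep only the most discriminative ones. A simplified sketch of the idea (illustrative, not the patch I submitted):

    import java.io.IOException;
    import java.util.Comparator;
    import java.util.List;
    import java.util.stream.Collectors;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    // Keep only the most discriminative visual words: rare terms (low document
    // frequency) carry more information, so rank by IDF and truncate the query.
    public class TermSelector {

        public static List<Term> topTerms(IndexReader reader, List<String> visualWords, int maxTerms) {
            int numDocs = reader.numDocs();

            return visualWords.stream()
                    .map(w -> new Term("visual", w))
                    .sorted(Comparator.comparingDouble((Term t) -> -idf(reader, t, numDocs)))
                    .limit(maxTerms)
                    .collect(Collectors.toList());
        }

        private static double idf(IndexReader reader, Term term, int numDocs) {
            try {
                int docFreq = reader.docFreq(term);
                return Math.log((double) numDocs / (docFreq + 1));
            } catch (IOException e) {
                return 0.0; // treat unreadable terms as uninformative
            }
        }
    }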


Could you give a high-level description of the features used? Interest points, corners, edges, color histograms, or something more sophisticated?


The context was fashion retail, so we used interest points & edges to match similar shapes, and histograms for colour similarity.
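
The colour-histogram side is the simple part; a toy version in plain Java looks something like this (bucket counts and colour space are illustrative, not what we shipped):

    import java.awt.image.BufferedImage;

    // Toy colour-histogram feature: quantise each pixel's RGB into a few buckets
    // per channel and count pixels per bucket. Comparing two images is then just
    // comparing two fixed-length vectors (e.g. cosine or chi-squared distance).
    public class ColorHistogram {

        private static final int BUCKETS = 4; // 4 x 4 x 4 = 64-dimensional feature

        public static double[] compute(BufferedImage image) {
            double[] hist = new double[BUCKETS * BUCKETS * BUCKETS];

            for (int y = 0; y < image.getHeight(); y++) {
                for (int x = 0; x < image.getWidth(); x++) {
                    int rgb = image.getRGB(x, y);
                    int r = ((rgb >> 16) & 0xFF) * BUCKETS / 256;
                    int g = ((rgb >> 8) & 0xFF) * BUCKETS / 256;
                    int b = (rgb & 0xFF) * BUCKETS / 256;
                    hist[(r * BUCKETS + g) * BUCKETS + b]++;
                }
            }

            // Normalise so images of different sizes are comparable.
            double total = image.getWidth() * (double) image.getHeight();
            for (int i = 0; i < hist.length; i++) {
                hist[i] /= total;
            }
            return hist;
        }
    }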


Google Image search has such a feature too. One can drag & drop a local image onto the search bar on the Google Images page. Help page: https://support.google.com/websearch/answer/1325808?hl=en ; info: http://www.quora.com/What-is-the-algorithm-used-by-Google-Se...

One of the earliest such similarity image searches was a research prototype on Airliners.net (ca. 2006): http://www.airliners.net/similarity/ , e.g. http://www.airliners.net/search/similarity_search.php?photo_... , http://infolab.stanford.edu/~wangz/project/imsearch/SIMPLIci... , http://alipr.com/cgi-bin/zwang/regionsearch_show.cgi

It's called Reverse image search: http://en.wikipedia.org/wiki/Reverse_image_search


Here are some of the technologies we use:

Caffe - deep learning feature computation and model training

OpenCV - local + other features

Zookeeper - service discovery

Cascading - batch processing jobs

We’ve built infrastructure around some of these libraries to operate at scale. For example, we’ve built an incremental feature extraction pipeline that at its core uses Caffe + OpenCV for feature extraction (more details on how this works are in the paper).
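
A minimal sketch of the "incremental" idea, assuming a feature store keyed by image signature; the class and method names here are made up, and the real extractor would wrap Caffe/OpenCV rather than an in-memory map:

    import java.util.HashMap;
    import java.util.Map;

    // Incremental extraction: features are keyed by a content signature (e.g. a
    // hash of the image bytes), so re-running the pipeline only computes features
    // for signatures it has not seen before.
    public class IncrementalFeatureExtractor {

        /** Stand-in for the real feature extractor (Caffe/OpenCV behind a service). */
        interface FeatureExtractor {
            float[] extract(byte[] imageBytes);
        }

        private final Map<String, float[]> featureStore = new HashMap<>();
        private final FeatureExtractor extractor;

        public IncrementalFeatureExtractor(FeatureExtractor extractor) {
            this.extractor = extractor;
        }

        public float[] featuresFor(String signature, byte[] imageBytes) {
            // Already-seen signatures are returned from the store; only new
            // signatures go through the (expensive) extraction step.
            return featureStore.computeIfAbsent(signature, s -> extractor.extract(imageBytes));
        }
    }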

We serve our infrastructure on EC2.



