
No idea what Pinterest are using, but I led a team building the same thing using (mostly) commodity search kit in 2008.

Feature extraction was done with standard Java libs (proprietary algorithms though). Queries were initially performed using a vector space model, but I moved that to using an inverted index (Lucene) because in our use case the image queries were usually combined with free text and parametric search params.

The main issue we faced was scaling search with a large number of query parameters, since a naive implementation created something like 300 query terms for each visual search. We did various things to optimise that, from distributing the index to using index statistics to pick the optimum words to query. I submitted some optimisation code (a modified MoreLikeThisQuery with an LRU term cache) back to Lucene; not sure what happened to it, I think the JIRA issue is still open.
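For anyone wondering what "using index statistics to pick optimum words" might look like, here is a rough Python sketch. It is not the code submitted to Lucene and does not use Lucene's API. It assumes a toy in-memory table of document frequencies (DOC_FREQ, NUM_DOCS and the vw_* term names are all made up), scores each candidate term by tf * idf, keeps only the top few, and caches the idf lookups in an LRU cache.

```python
import math
from functools import lru_cache

# Hypothetical index statistics: term -> number of documents containing it.
DOC_FREQ = {"vw_17": 5, "vw_42": 900, "vw_88": 40, "vw_301": 2, "vw_7": 600}
NUM_DOCS = 1000  # hypothetical total number of indexed images


@lru_cache(maxsize=10_000)
def idf(term: str) -> float:
    """Smoothed inverse document frequency, cached per term (LRU)."""
    df = DOC_FREQ.get(term, 0)
    return math.log((NUM_DOCS + 1) / (df + 1)) + 1.0


def select_query_terms(term_counts: dict, max_terms: int) -> list:
    """Keep the max_terms highest-scoring terms instead of querying all of them.

    term_counts maps each term extracted from the query image to its
    frequency in that image. Terms absent from the index are dropped,
    since they cannot match anything.
    """
    scored = [
        (tf * idf(term), term)
        for term, tf in term_counts.items()
        if DOC_FREQ.get(term, 0) > 0
    ]
    # Highest score first; break ties by term name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


if __name__ == "__main__":
    extracted = {"vw_17": 1, "vw_42": 1, "vw_88": 1, "vw_301": 1, "vw_7": 1}
    print(select_query_terms(extracted, max_terms=3))
```

The point is that rare terms discriminate best, so a few of them can stand in for a query of hundreds of terms, and the cache avoids repeating the same statistics lookup across queries.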




Could you give a high level description of the features used? Interest points, corners, edges, color histograms, or something more sophisticated?


The context was fashion retail, so we used interest points & edges to match similar shapes, and histograms for colour similarity.
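To make the colour part concrete, here is a minimal sketch of histogram-based colour similarity in plain Python. The actual algorithms were proprietary, so this is only an assumption about the general shape: it quantises RGB pixels into coarse bins and compares two images by histogram intersection. The function names and the bin count are illustrative.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Normalised colour histogram over coarse RGB bins.

    pixels is an iterable of (r, g, b) tuples with values in 0..255.
    Returns a dict mapping (r_bin, g_bin, b_bin) to the fraction of pixels.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the histogram mass the two images share."""
    return sum(min(weight, h2.get(bin_, 0.0)) for bin_, weight in h1.items())


if __name__ == "__main__":
    red = colour_histogram([(250, 10, 10)] * 4)
    blue = colour_histogram([(10, 10, 250)] * 4)
    mixed = colour_histogram([(250, 10, 10)] * 2 + [(10, 10, 250)] * 2)
    print(histogram_intersection(red, red))
    print(histogram_intersection(red, blue))
    print(histogram_intersection(red, mixed))
```

In an inverted-index setup, each non-empty bin could be indexed as a term, which is one way colour features might end up alongside free-text terms in the same query.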



