

Directed Edge sees a post-search web where recommendations rule - GVRV
http://www.techcrunch.com/2009/08/06/yc-funded-directed-edge-sees-a-post-search-web-where-recommendations-rule/

======
rwolf
"We can take data sets with millions and millions of data points and figure
out what’s related to a given item in a few milliseconds. Most recommendations
engines pre-compute stuff rather than generating the recommendations in real-
time like we do"

I've been looking into recommendation algorithms recently (started with the
excellent book "Programming Collective Intelligence"), and this sounds
light-years ahead of the way we currently do things. I suppose you could take
one of the algorithms that requires pre-computing and throw resources at it,
but it seems like they are talking about something new.

Since I'm just getting started, I'd like to find some academic (or blog-faux-
academic) articles on whatever recent advances make recommendations possible
without precomputing. Anyone know where to look?

~~~
ntoshev
There is no magic. I don't know what Directed Edge are doing, but simpler
Amazon-style recommendations (people who bought X also bought...) don't need
to precompute anything if you choose your data structures properly: get a list
of things people bought along with X, count them (or actually store them
counted), optionally normalize for general popularity, sort in decreasing
order, show top results.
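
The counting approach described above can be sketched in a few lines of
Python (with hypothetical purchase data):

```python
from collections import Counter

# Hypothetical purchase data: user -> set of items bought.
purchases = {
    "alice": {"X", "A", "B"},
    "bob":   {"X", "A", "C"},
    "carol": {"X", "B"},
    "dave":  {"A", "C"},
}

def also_bought(item, purchases):
    """Count items co-purchased with `item`, most frequent first."""
    counts = Counter()
    for basket in purchases.values():
        if item in basket:
            counts.update(basket - {item})
    return counts.most_common()

# For "X": A and B tie at 2 co-purchases, C at 1.
print(also_bought("X", purchases))
```

The "optionally normalize for general popularity" step would then divide each
count by how often the item appears overall.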

I'd love to hear something from wheels though.

~~~
wheels
Amazon's algorithms are simpler than our own (at least as far as I can tell)
and most recommendation engines use some sort of embedding to reduce the
dimensionality of the problem.

Amazon's related products do in fact seem to be, or very near to, a simple
counting structure. Our ranking algorithm builds a large subgraph around an
item, then does a few passes with a couple of different ranking schemes to
figure out the hot items within that subgraph, prunes "noisy" connections
(i.e. links that are "hot" but don't actually carry much semantic meaning),
and then tries to scale things so that the results returned aren't simply
those with the largest overlap, but those that are most relevant within that
subgraph relative to the larger graph. In that sense, it has some similarities
to web-search algorithms.
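
As a rough illustration of that local-vs-global idea (a toy sketch, not
Directed Edge's actual algorithm): score each neighbor by its overlap with
the item's neighborhood, discounted by its global popularity, so
broadly-connected nodes sink in the ranking:

```python
# Toy item graph (hypothetical): item -> set of linked items.
graph = {
    "Miles Davis":    {"John Coltrane", "Herbie Hancock", "Jazz"},
    "John Coltrane":  {"Miles Davis", "Jazz"},
    "Herbie Hancock": {"Miles Davis", "Jazz"},
    "Jazz":           {"Miles Davis", "John Coltrane", "Herbie Hancock", "Blues"},
    "Blues":          {"Jazz"},
}

def related(item, graph):
    """Rank neighbors by shared-neighbor overlap relative to global degree."""
    total = sum(len(links) for links in graph.values())
    scores = {}
    for n in graph[item]:
        overlap = len(graph[item] & graph[n])   # local: shared neighbors
        popularity = len(graph[n]) / total      # global: fraction of all links
        scores[n] = (overlap + 1) / popularity  # discount globally "hot" nodes
    return sorted(scores, key=scores.get, reverse=True)

# "Jazz" links to everything, so it ranks below the specific musicians.
print(related("Miles Davis", graph))
```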

In user-visible terms, that means that our results are often less obvious than
Amazon's recommendations -- for a long time we called that the "tell me
something I don't know" problem. It's no good to do a search for "Miles Davis"
and have "Jazz" come back as a related item. If you know about Miles Davis, you
already know about Jazz.

~~~
hooande
Do you have any kind of results that demonstrate how accurate your
recommendations are? I checked your site but couldn't find anything

~~~
wheels
You mean examples or metrics?

If you mean examples, you can see our algorithm applied to link structure
analysis on the related pages here:

<http://pedia.directededge.com/>

If you mean metrics, the only one that I find really meaningful is a feedback
loop to see what users are in fact interacting with, and we'll have something
in place for that shortly. Synthetic metrics on recommendations quality don't
really impress me because they ignore that recommendation algorithms are
solving a human-computer interaction problem as much as they're solving a
k-nearest-neighbors problem. I've got another article in the pipe on some of
the interesting problems of ranking on real data, but it keeps getting pushed
back since there's, you know, a lot to do at the moment. :-)

------
mattmaroon
"Directed Edge truly believes that we’re about to see a shift on the web away
from search and towards recommendations."

The difference is somewhat arbitrary when you think about it. When I Google,
I'm asking it to recommend me stuff related to what I'm looking for. Google is
nothing but the world's best recommendation engine.

There are about 1,000 sites that could use good recommendation technology to
enhance their profits though, so I like this company's monetization chances.
Easy elevator pitch too: recommendations as a service.

~~~
wheels
It's a continuum, as hinted below, but the results are pretty different in the
polar cases. I'll fall back to my Miles Davis example.

Searching for "Miles Davis" returns 10 pages about Miles Davis. Hitting our
engine with Wikipedia data for Miles Davis gives you John Coltrane, Herbie
Hancock, Wayne Shorter, Thelonious Monk, Sonny Rollins, Cannonball Adderley,
so on.

<http://www.directededge.com/?Miles%20Davis>

The search end of the spectrum is about finding something you're looking for
-- recommendations are about discovering things you didn't know about.

The two meet in the middle with "personalized search". If I type "python" into
a search engine, do I want snakes or code? You can probably figure that out
based on what I've done in the past.

~~~
mattmaroon
Yeah, and that's awesome, and I think will totally converge with the type of
search we have today. Imagine you type "Miles Davis" in Bing (which I'm using
because they'd be more likely to experiment with altering the dominant
paradigm than Google) and they show you a column of pages on the left about
Miles and the stuff you're showing on the right. That'd be bad ass.

------
davidw
Wow, congrats to wheels & co for the TC hit and YC - I don't think anyone knew
about that!

------
pclark
Hacker News user "Wheels" is a founder - congrats on the write up.

Didn't realize you were YC funded

------
sethorion
Question:

Greg Linden, who worked on Amazon.com's recommendation engine, has referred to
what he calls the "harry potter problem". To quote from his blog:

'...this calculation would seem to suffer from what we used to call the "Harry
Potter problem", so-called because everyone who buys any book, even books like
Applied Cryptography, probably also has bought Harry Potter. Not compensating
for that issue almost certainly would reduce the effectiveness of the
recommendations, especially since the recommendations from the two clustering
methods likely also would have a tendency toward popular items.'

How did you compensate for this problem? Do you simply ignore vertices in the
graph that have a large degree?

Or, are you using non-linear weighting functions, such as a perceptron's
sigmoid function?

With regard to Wikipedia, almost everyone who has edited an article has also
edited the article on Bill Clinton. So, if you are using the edit-history
metadata to compute recommendations, you would have to compensate for the
"Bill Clinton problem".

------
nrao123
I have been following the "recommendation algos as a service" space for about
2 years now. This definitely seems interesting, but a side opportunity could
be to do an aggregator/optimizer of recco algos for merchants/publishers similar to
what Rubicon Project/Pubmatic does on aggregating/optimizing ad networks.

The reccoAlgo aggregator would take all the various recco algo services such
as DirectedEdge, Aggregate Knowledge, Loomia, Minekey, Persai (now dead) and
many others and keep running tests (similar to the netflix prize) and whatever
is better is given more airtime on suggesting related products/pages for
retailers/publishers. The compensation model would work on a percentage of
revenue for additional clicks/purchases on the suggestions.

------
keefe
"....we’ve gone from having a graph-store to having a proper graph
database..."

A graph database and a "triple store" in semantic technologies are essentially
the same thing. This company makes some very aggressive claims that
AllegroGraph, Jena, Oracle (with Spatial), Sesame, and others (including the
Korean arm of my current company) have also made. Typically, such claims fail
to live up to the marketing. I wonder how this solution compares to these
traditional triple stores?

------
joubert
Are there any objective measurements of the quality of recommendations across
various recommendation systems?

~~~
ntoshev
There is more than one, depending on the problem and what exactly you want to
improve. For example, RMSE is the measurement used for the Netflix Prize.

~~~
wheels
I really liked Greg's commentary on RMSE -- summed up my thoughts better than
I could:

<http://glinden.blogspot.com/2009/03/what-is-good-recommendation-algorithm.html>

------
bandris
Also congrats for DirectedEdge for doing the animation without flash on the
front page. Nice!

~~~
wheels
You can actually search in the little bar there for any wikipedia article to
show related stuff, and as kind of a little easter egg, you can specify a
starting page via the query string, e.g.

<http://www.directededge.com/?Y%20Combinator>

~~~
djuedemann
This little wikipedia search bar is hiding nodes from the graph most of the
time. It bothers me a little.

~~~
wheels
We tried a few combinations -- and may try some yet. If it was further right
it threw off the balance of the layout. If it was up at the top it drew too
much attention to itself. We tried messing with the z-index on hover too, but
that looked funky.

~~~
djuedemann
I thought it would be nice if the user could drag it around.

------
utnick
cool, have you thought about entering it in any online recommendation
contests? (like Netflix or the GitHub one)

