
The ugliest hack I've ever pulled off - adamsmith
http://blogs.xobni.com/asmith/archives/36
======
jwp
Yeah, yeah, so the engineering part wasn't that great. But were the results
good? Which algorithms did you use? What were your features? Seems like an
interesting experiment, especially with all the gratuitous "friending" people
do. Reminds me of a related paper:
<http://www.hpl.hp.com/research/idl/papers/facebook/facebook.pdf>

~~~
adamsmith
Check out the report -- <http://www.scribd.com/doc/747/Friendship-Prediction-on-Facebook>

Cliffs notes version:

It turns out that how many friends of friends you have in common is the best
predictor. After that, it's the number of photos you appear together in, and
how many photos your friends of friends appear in. Following that is the
number of classes you have in common.

All of the traits (like religious views, what state you're from, guy/girl,
etc) are secondary.

It worked pretty well.
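
If you want to poke at it yourself, the top features are cheap to compute
once you have the raw data. A minimal sketch in Python -- the dicts are
made-up toy data standing in for the real scraped graph, and the names
are mine, not the report's:

    # Toy stand-in for the scraped data, keyed by user id.
    friends = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}       # user -> friend ids
    photos  = {1: {10, 11}, 2: {10}, 3: {11}, 4: set()}          # user -> photo ids
    classes = {1: {"6.034"}, 2: {"6.034"}, 3: set(), 4: set()}   # user -> class ids

    def friends_of_friends(u):
        """Everyone exactly two hops from u (not u, not u's direct friends)."""
        fof = set()
        for f in friends[u]:
            fof |= friends[f]
        return fof - friends[u] - {u}

    def pair_features(u, v):
        """A few of the predictors above, for one candidate pair."""
        return {
            "common_fof":      len(friends_of_friends(u) & friends_of_friends(v)),
            "photos_together": len(photos[u] & photos[v]),
            "common_classes":  len(classes[u] & classes[v]),
        }

    print(pair_features(1, 2))
    # -> {'common_fof': 0, 'photos_together': 1, 'common_classes': 1}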

(Coincidentally, the Facebook friendship prediction was my answer to the
last question on the YC app.)

~~~
jwp
The non-friend vs anti-friend distinction didn't even occur to me, but it's
clearly an important part of the experiment. I like all the discussion of
data. Fun problem.

I did not realize squaring an adjacency matrix tells you what it does.
Thanks for edjumacating me.
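
(For anyone else who missed that lecture: the (i, j) entry of A^2 counts
the length-2 paths from i to j, which for a friendship graph is the number
of mutual friends. Quick demo in Python on a graph I just made up:)

    import numpy as np

    # Symmetric 0/1 adjacency matrix of a 4-person friendship graph.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]])

    A2 = A @ A
    # (A^2)[i, j] = number of length-2 paths from i to j
    #             = number of mutual friends of i and j (for i != j).
    print(A2[0, 3])   # users 0 and 3 share one mutual friend: user 1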

Did you go past f2hops? Seems like 3 would be reasonable and predictive. Since
1/2 of your tree is so small, and there were 12k nodes in the tree, that
suggests to me a pretty easy task. Do you agree? It would be interesting to
see if PCA or LDA pick the same features as the decision trees did. Just a
click away in Weka, after all.
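
(If Weka isn't handy, it's only a few lines in Python -- a sketch with
scikit-learn standing in for Weka, and random noise standing in for the
real feature matrix, just to show the call:)

    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical feature matrix: one row per candidate pair, one column
    # per feature (common fofs, photos together, common classes, ...).
    X = np.random.rand(500, 4)

    pca = PCA(n_components=2).fit(X)
    # Each row of components_ is a principal direction; large absolute
    # loadings flag the features carrying the most variance.
    print(pca.components_)
    print(pca.explained_variance_ratio_)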

(An aside, and neat hack: Buddy of mine just walked in and saw the document on
my screen. He saw the decision tree and said, "I remember those. In grad
school I printed out decision trees as C if/else statements. Part of running
my decision tree was a call to gcc.")
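
(Same trick works in Python with exec standing in for the call to gcc --
the tree here is made up, obviously:)

    # Emit a decision tree as nested if/else source, then "compile" it.
    tree = ("common_fof", 2,                      # split: common_fof <= 2?
            ("photos_together", 0, "no", "yes"),  # left subtree
            "yes")                                # right leaf

    def emit(node, indent="    "):
        if isinstance(node, str):                 # leaf: emit a return
            return indent + "return %r\n" % node
        feat, thresh, left, right = node
        src  = indent + "if x[%r] <= %r:\n" % (feat, thresh)
        src += emit(left,  indent + "    ")
        src += indent + "else:\n"
        src += emit(right, indent + "    ")
        return src

    src = "def predict(x):\n" + emit(tree)
    exec(src)                                     # the "call to gcc" step
    print(predict({"common_fof": 1, "photos_together": 3}))   # -> yes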

~~~
adamsmith
Hi jwp,

I think I tried f3hops, or at least wanted to. Either I did it while
stretching the 2GB memory limit, or I couldn't do it at all. As you raise
the sparse matrix to higher and higher powers, it becomes less and less
sparse. I was really hurting for memory.
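
You can watch the blow-up happen with scipy.sparse. A toy graph, far
smaller than the real one, but the trend is the same:

    import scipy.sparse as sp

    # Toy friend graph: 2,000 nodes, ~20 friends each (the real graph
    # was much bigger, hence the 2GB wall).
    A = sp.random(2000, 2000, density=20 / 2000, format="csr")
    A = ((A + A.T) != 0).astype(int)    # symmetric 0/1 adjacency

    A2 = A @ A                          # 2-hop path counts
    A3 = A2 @ A                         # 3-hop path counts
    for name, M in [("A", A), ("A^2", A2), ("A^3", A3)]:
        print("%-3s nonzeros: %9d (%.1f%% dense)"
              % (name, M.nnz, 100.0 * M.nnz / (2000 * 2000)))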

Yeah, I really wish I could have spent more time exploring the algorithms
side. If I hadn't started Xobni in summer 2006, the plan was to write a
book on machine learning, in practice. One of these days...

P.S. Would love to get in touch. Can you post your email or send me a note? My
email is adam dot smith foo xobni.com where foo == @.

