Who Does Facebook Think You Are Searching For? (thekeesh.com)
269 points by jkeesh on Aug 18, 2011 | 108 comments

Hi, my name is Keith Adams, and I worked on Facebook's typeahead. The system has evolved a bit since we launched, but I talked about first_degree.php in the tech talk we did about the typeahead back-end last year:


Briefly, first_degree.php returns objects you're directly connected to in the graph, and if there's space a few machine-generated guesses at other good results. We preload these as soon as you focus the "Search" box at the top of the page, in the hopes of having some decent results to show when you start typing. The index field does, as the article inferred, represent our best guess at a ranking function on these first degree objects. The inputs to this ranking function explicitly do not include other users' behavior on the site. I talked a bit about our ranking function in this quora question:


Edit: To clarify what a lot of people seem to be wondering, visiting someone's profile does not affect the search results of anyone but yourself.
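
As a rough illustration of the preloading idea described above (a sketch, not Facebook's actual code), a client could keep the preloaded first-degree entries in memory and answer the first keystrokes with a local prefix match ordered by `index`. The `text` and `index` fields appear in the payload discussed elsewhere in this thread; the function name, the sample entries, and the assumption that a lower `index` ranks higher are hypothetical and based only on what commenters report:

```python
def local_typeahead(entries, query, limit=5):
    """Return up to `limit` preloaded entries with a name token starting with `query`.

    Assumes a lower `index` means a better rank, as commenters in this thread observe.
    """
    q = query.strip().lower()
    if not q:
        return []
    matches = [
        e for e in entries
        if any(token.startswith(q) for token in e["text"].lower().split())
    ]
    # Best-ranked (lowest index) entries first.
    return sorted(matches, key=lambda e: e["index"])[:limit]


# Hypothetical preloaded entries, shaped like the "entries" list in the payload.
preloaded = [
    {"text": "Alice Smith", "index": -6.27},
    {"text": "Alan Jones", "index": 0.94},
    {"text": "Bob Alvarez", "index": -2.10},
]

results = local_typeahead(preloaded, "al")
print([e["text"] for e in results])
```

Matching on every name token rather than only the first name is a design guess; a real typeahead would likely also handle accents, nicknames, and non-person objects.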

Keith, I would like a feature to clear my facebook search history/profile view history. I consider it a privacy issue.

The knowledge that anyone who stumbles across my logged-in profile can see who I'm interested in by experimenting in my friend search bar has a chilling effect on the profiles I view. Google and most browsers have a 'clear search history' feature- it would be nice if facebook had the same.

That's not a privacy issue, it's a security issue. If you can't be bothered to log out of your account on shared computers, people are going to get access to your account, and a malicious person will be able to do anything they please. Search type-aheads are the smallest of your problems.

Being able to uncover what profiles have been viewed is a significant problem for a lot of people. And they aren't worried about malicious users - they are worried about their partners/parents/children seeing which profiles they've viewed. Who looks at what profile is one of the most sensitive pieces of information facebook has, and I'm a little surprised that they do this.

I'd be more worried about people accessing my Internet banking or my email than who I look at on Facebook. Log out of your bloody accounts before you leave your PC!

Just to be clear - I don't think people are especially worried about strangers seeing which profiles they've viewed (it would be quite unlikely it would be meaningful) - they are worried about the people they trust seeing which profiles they've viewed.

Facebook doesn't show which profiles you've viewed explicitly for a good reason - I think they were just hoping that this would be obfuscated enough not to get much notice.

they are worried about the people they trust seeing which profiles they've viewed

In other words, you don't trust the people you trust. Perhaps you need to rethink your behavior, based on your revelation that you don't actually trust them.

It's not always as simple as that. There are times where it's better to keep certain information from certain people no matter how much you trust them or like them. It's just better that way.

Some of us have had significant others with jealousy issues. It's not that we don't trust them, it's just better if they don't worry about silly issues like whose Facebook profiles we've been perusing. Not because it's a secret or because there is anything to hide, just because it helps the SO control their counter-productive impulses if they aren't informed on the matter.

That same kind of avoidance can be applicable in all other sorts of relationships. It's not that the people aren't trusted, it's just that there's no reason to know, and it will only cause damage if they find out -- not because it's bad or trust-breaking, but because the person's reaction may be problematic for completely different reasons (compulsions, or potential taint of future circumstance).

this is the old "you don't have anything to hide" mantra dressed up in more personal terms. people aren't perfect and they have the right to distrust their closest friends if they want to.

Of course they do. That's why modern operating systems have security.

It just doesn't make sense to insist that both (a) I want to leave everything unlocked and open; and (b) I don't want anyone to be able to see what's there.

You've got every right to protect your privacy from prying eyes. But if you want to do so, do it.

I still don't understand why 'people they trust' would have access to their 'personal' facebook account.

I often hand my laptop or iPad to my wife so that she can look at something/use it for a few minutes while sat on the sofa (her computer is a desktop in the other room).

I normally don't bother logging out of whatever sites I'm logged in to.

I'm not too bothered what she sees if she opens up the Facebook tab with me logged in - but I can see that it may be a problem for some.

And I don't think that "hand it over for five minutes to check something" is that uncommon a use-case.

"I normally don't bother logging out of whatever sites I'm logged in to.

I'm not too bothered what she sees if she opens up the Facebook tab with me logged in - but I can see that it may be a problem for some."

That's the crux of it - you're not bothered by it. If you were bothered, surely you'd hit the 'log out' button?

Yes and no ...

"here - look at this interesting article on Wikipedia" ... she starts reading ... ... I get bored and go to make cup of tea, forgetting that I hadn't logged out of Facebook ... ... she finishes article, closes tab and finds herself on my Facebook page ...

It is my fault, but a "clear history" (or at least make my history invisible) function would mean that mistakes like the above will have much less impact

Because they share a computer.

Multiple user accounts.

I disagree, it most certainly is a privacy issue. You may share a computer with friends, room mates, and significant others. If someone had access to your facebook account they could read all your messages, but this is something we are all aware of. Up until now I was _not_ aware of the fact that someone gaining access to my account could actually discover which profiles I frequent, which is like having a browsing history I did not know about, which I cannot even clear/remove after I learn about it. This is what makes it problematic.

If someone you don't trust has access to your logged-in profile, you have much worse problems. For example, they can read all of your private mail and send mail as you.

I think you mean their private Facebook inbox. They may already be limiting what they do on Facebook. But they can't clear the history thing.

The potential "victim's fault" is still no excuse.

Someone not looking after their own security by leaving logged in sessions where other people can access them is indeed their fault.

Come on. This is a personal responsibility issue and nothing else.

There's more to it than just logging out. What if you don't want a subpoena to reflect your most visited profiles? There should be a way to purge any data kept longer than the routine access log cycle, including information on frequently viewed profiles.

The fact that someone can "stumble across" your logged profile is less of a concern for you than the fact that they might see your search history? Seriously?

If you read his discussion of how it works, though, that's very much not what this is. What you're asking for doesn't even make sense, to be honest.

It's not about who you search or view. It's all just a big guess at who you might be interested in. From what I've gathered, it tries to find probable relationships based on all of these factors:

Comments, Likes, Tags, Events, Applications, Friends, and Work and Education data

Note that those are both your own comments, etc, and those of the people it's trying to relate you to. Who knows exactly how it combines all those to come up with its guesses, but I'm quite positive it has fairly little to do with who you're actually viewing (though I'll admit, maybe that's a small factor in there, too).

This graph lists people that I do not have "friended" on Facebook, but whose mail address I probably have uploaded via the friend finder in the past.

However, I've removed all imported friend finder contacts on https://www.facebook.com/invite_history.php some weeks ago.

I guess this implies that Facebook does not really remove the imported information, but that it keeps this information and just hides those contacts from you in the friend finder interface.

I think you're leaping to a bad conclusion. Keith's post mentions "computer-generated guesses" to fill in the list. Looking at the guesses in my list, they appear to be friends of friends. (The prefill list is probably using logic similar or identical to the logic that presents "people you may know.")

Find somebody on your mystery list who isn't a friend of your friends, and then you can get paranoid.

I see plenty of people I know of but have never interacted with online (in fact I just found the previously unfound profile of a coworker with a very common name). Facebook is /smart/. Though I wouldn't doubt for a second that they've used your friend finder data to strengthen their edgerank data, I would not be surprised to learn their seeming precognition comes from more esoteric sources.

Interesting point - the "friend ranking" numbers are different when I checked Mozilla vs. Chrome.

In Mozilla, the first person ranked was at -6.2650374; on Chrome, it was -7.2581474 (I go back and forth with this person quite a bit). Also, the rankings of some of the people were different.

My guess is something to do with browser cookies or caching - any ideas why this might be the case?

Did you go back to Mozilla and make sure the value hadn't just changed for real?


OT, but for the second time this year, someone is newly (just as of today) using one of my secondary Gmail accounts as their FB user ID. They don't appear to have access to the Gmail account -- and I've newly killed any concurrent sessions and then changed its password and security Q/A to be sure. So, I don't know how they're accomplishing this. But that Gmail account received the sign up confirmation messages and is now filling up with friend confirmations.

Any chance you could plug me into someone relevant at FB, as this appears to be a recurring problem without a ready explanation? Email to pasbesoin at that gmail place.

P.S. As best I can determine, my systems are clean, and I've no other problems/compromises that are apparent. There appears to be something borked with the Facebook account creation confirmation process.

Can other people's interaction on your Facebook user affect their ranking on your list?

"The inputs to this ranking function explicitly do not include other users' behavior on the site"

Boy, this would be a great way to hijack Facebook accounts. Just convince a bunch of people to run your bookmarklet on their Facebook profile.

Indeed, it is not wise to run third-party bookmarklets while logged into Facebook. This one may be benign, but the next one may not be. If you want to see the JSON we're talking about, just load


with 'userid' replaced with your numeric Facebook account ID.

For mine to display scores, I had to remove the "[0]" after "filter": https://www.facebook.com/ajax/typeahead/search/first_degree....

With the "filter[0]" in the url, all scores came back as 0 for me.

Thanks, edited accordingly.

I just keep getting errors. Viewer is also supposed to be replaced with the numeric ID, right?

Nevermind, I misinterpreted. I just realized it's supposed to be "viewer=[ID]", I was interpreting it as "viewer=[ID]&userid=[ID]". I'm still getting errors, but presumably that's another matter.

How do you find your numeric Facebook ID?

If your fb id is raptrex, https://graph.facebook.com/raptrex

There might be another way, but this is the one I use.

It's javascript. Read the source before running it. The obfuscated gook that's 80% of the script has 0 diffs from jQuery as downloaded at jQuery.com; and the rest of the script is easy to verify in 5 minutes.

It's also a great habit to get into. Reading source code is invaluable in understanding and learning. And it is a skill that can be cultivated just like others. For example, the Prey project is an invaluable piece of software, except it's potentially extremely sensitive. Probably worth reading the source first.

You also start to get very well versed in the "usual way of doing things", especially if it's a language/paradigm you're not programming daily in.

Unfortunately, while a good practice, reading the code is not an iron-clad defense.


To give one example of how this could fail, the server could return different code when the request referer is facebook.

Out of my top ten, seven are women I have had a crush on at some point. Seems they are on to something...

Not them, you :)

Those who don't want to run the script can visit the Facebook first_degree page [1] and search for "path" in the output. Note that you need to substitute your numeric Facebook ID into the link; the ID can be obtained via the Graph API [2].

[1] http://www.facebook.com/ajax/typeahead/search/first_degree.p...

[2] http://graph.facebook.com/{your_vanity_name}

Would it be possible for somebody to create a virus that would grab this file and publish it to people's profiles? I think I'd crawl into a hole and die if my ex girlfriend discovered how highly she ranked...

I've been waiting for the day when something is breached and you can see who's viewed your profile...all hell will break loose

I'm kind of sadistically hoping Anonymous comes through on their promise and starts some kind of Facebook apocalypse like that.

What Anonymous should do is claim they are ready to do this and give everybody a couple days notice to remove their accounts before the hit. lulz would ensue if they could get the media worked up, which I bet wouldn't be that hard to do.

it seems like it has happened very briefly, http://www.mobileinc.co.uk/2009/12/facebook-app-uses-exploit...

My wife is ranked really rather low... bad news.

Not necessarily, if you see each other everyday there's no real need to interact through Facebook.

Facebook once suggested I reconnect with my wife. I saw that as good news that we were interacting in the real world not on FB.

Low is high in this instance. Unless you were aware of that and saying her high count was a low ranking.

That would require the virus to successfully log in as you, and at that point does it matter?

Well, this script/data now has the increased effect of being very socially detrimental. It would be absolutely awkward as hell if data like this came out to friends.

This is pretty funny in the ajax code:

    success: function(result) {
        alert('Please try again.');
    },
    error: function(data) {
        var text = data.responseText;
        // ... processes the data here...
    }

Why is success error and error success?

I think because the request is asking for content type JSON but the Facebook response comes back as javascript with a for (;;); at the beginning. The content types don't match so jQuery invokes the error callback.
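
To make the explanation above concrete, here is a minimal sketch of how a client can handle such a response by hand: remove the leading `for (;;);` guard, then parse what remains as ordinary JSON. The function name and the sample body are hypothetical; only the guard string and the `payload`/`entries` shape come from this thread.

```python
import json

GUARD = "for (;;);"


def parse_guarded_json(body):
    """Strip a leading `for (;;);` guard, if present, and parse the rest as JSON."""
    text = body.lstrip()
    if text.startswith(GUARD):
        text = text[len(GUARD):]
    return json.loads(text)


# Hypothetical response body, shaped like the payload discussed in this thread.
sample = 'for (;;);{"payload": {"entries": [{"text": "Alice", "index": -1.5}]}}'
data = parse_guarded_json(sample)
print(data["payload"]["entries"][0]["text"])
```

Stripping only a leading guard (instead of replacing the string everywhere) avoids corrupting a payload that happens to contain the same characters inside a value.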

Looked at mine. First thought: whoa, I hope no one else ever sees this

Facebook you scary.

My list is all random. The first entry is my girlfriend, the next though is a random girl from college whose profile I have hardly visited more than say 4-5 times ever on Facebook. Same for others down the list.

So is it that people are assuming it is onto something because they can see, among the top 10, those 2 names they do not want anyone to know about, or is the result actually correct for everyone?

In the case of the latter how is it so random for me? I have Michael Arrington on the top 15 and I swear I do not stalk his profile.

Facebook's reps haven't said the list is based just on profiles you view, they said it's based on "people you interact with" -- there may be other stuff in there, like messaging activity, or commenting, or liking. Even if you don't go directly to Arrington's profile, you can be interacting with him in the newsfeed.

I would think that clicking on links that a person posts is a good signal and thus would increase your engagement number. Other signals that probably feed into this are pictures you look at, likes, comments, groups you are in common with, etc. i.e. anything that ties you to another person, though I bet they are all weighted differently.

My roommate is infatuated with this girl he's known since HS. I just ran it on his laptop and she's at -68. Everybody else is above -4.

Can anybody beat that?

1. This is very cool 2. This is very scary. Someone hacking me then posting the top 10? Social disaster

Why? I don't use Facebook, but visiting people's profiles on social sites doesn't mean much on the other ones. Most of the people I look at on Twitter or Google+ have done something so brazenly spammish that I have to go to their profile to find the "report spam" link or whatever. These people aren't my friends, they're just people that spam me that I want to go away.

Hardly a social disaster.

Well, imagine if you will you were stalking some person you kind of know but not very well, or someone who is an ex-whatever, etc...and that info gets out, and they know. It could spell disaster. Sure, it's entirely the person's fault for stalking in the first place, but the internet enables that so...

You realize he wasn't talking about you, right?

Here's how the top 209 people in the list rank for me (the whole list was too long for me to bother cleaning up at 1:30am). 22 people had rankings of less than zero. http://i.imgur.com/WqcKF.png

Here's a graph of my entire list, annotated with "I don't know this person" and "This person is invited to my wedding".


Top ten:

1) best man

2) random person who lives in my apartment block who I stalked once to get their mail to them

3) Wife-to-be

4) Her sister

5) Good friend of mine from Uni

6) Me

7) Place my wife-to-be volunteers at

8) Another friend from uni

9) Wife-to-be's best friend

10) Someone I know online but not IRL.

Tried browsing through the javascript to see if you're screwing with me somehow ... got too lazy 1/3 way through and decided to trust that as a member of HN community you wouldn't (probably a terrible idea). But yea, the script works ... all too well.

You can always just use the inspector in your browser to view the original JSON and run it through JSLINT to make it readable. No external script needed. ;)

Yea, I ended up doing it afterward, to take a deeper look into the data. tnx!

I might be one of the few here who has a facebook account with 0 friends and never really use it. So the results are very interesting as they contain people who I know but do not interact with. Some of them must be from profile searches I have performed but I cannot explain the others. I am inclined to believe they must be incorporating 'other users behavior on the site'.

If you have folks with non-ASCII names, here's a short Python 3 snippet to convert the output of first_degree.php to a text file:

    import json  # required for json.loads
    with open("first_degree.php.json", encoding="utf-8") as src, open("first_degree.php.txt", "w", encoding="utf-8") as f:
        # Drop the "for (;;);" prefix so the remainder parses as JSON.
        entries = json.loads(src.read().replace("for (;;);", ""))["payload"]["entries"]
        for e in entries:
            f.write("%s %r\n" % (e["text"], -e["index"]))

I'm curious whether these numbers factor in people who are looking for you/interacting with your profile. My list has some people I don't recognize, as well as people whose content I definitely have not clicked on recently, which is why it might.

Well I changed my privacy settings to visible to all, made a dummy account and click raped my profile and my dummy account didn't show up. I'll add the accounts as friends and then make note of the number, then do clicking from the dummy and see if it changes. There could be some privacy implications behind the initial coolness of this.

Also, if you make an account and do nothing your value for yourself is 0.939565, which I guess is some sort of baseline of 0 interaction? Although I don't understand how they are modeling your interaction with yourself, tbh.

It seems like there is a lag time between clicking stuff and the value changing, because I'm not getting my dummy to show up at all.

The bottom half of my list consists entirely of people who I didn't recognize at all. All but one of the ones I looked at have at least one mutual friend with me. (Now I'm curious whether Facebook generates some of these entries just by crawling my social graph?)

Also, can anyone share what range of numbers they're seeing? At the very top of my list is one negative number. Beyond that, the top half of my list ranges between 0.1 and 1.0. The last half of the list ranges from 1.0 to 1.2.

Top of my list is -3.7312181, bottom is 1.237559. 11 negative numbers, 108 positive. A friend's was skewed more toward negative numbers, so I'm guessing that maybe you just aren't a heavy/frequent user?
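
For anyone who wants to compare ranges the same way, here is a small sketch that computes the figures quoted above (top value, bottom value, and how many entries fall on each side of zero) from a parsed entries list. The function name and the sample data are hypothetical; only the `index` field name is taken from the payload discussed in this thread.

```python
def summarize_scores(entries):
    """Summarize the `index` values of a list of first_degree-style entries."""
    if not entries:
        raise ValueError("entries must not be empty")
    indexes = sorted(e["index"] for e in entries)
    return {
        "top": indexes[0],          # lowest value, reported as the top of the list
        "bottom": indexes[-1],      # highest value, reported as the bottom
        "negative": sum(1 for v in indexes if v < 0),
        "non_negative": sum(1 for v in indexes if v >= 0),
    }


# Hypothetical sample with the same extremes as the list described above.
sample_entries = [
    {"text": "A", "index": -3.7312181},
    {"text": "B", "index": 0.5},
    {"text": "C", "index": 1.237559},
]
summary = summarize_scores(sample_entries)
print(summary)
```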

You guessed it. :-)

I also thought that other people's interactions with me are a part of the algorithm for this same reason. However, the reply by Keith Adams seems to suggest that is not the case, and that they are just "machine generated guesses" which weigh into the algorithm.

Part of the algorithm may involve them trying to predict who you will want to add as a friend. People that attended the same events as you, people who have recently added your friends as friends, stuff like that.

I am wondering this too, there are many people in that list which I am not friends with and have only visited their profile once, yet they have a higher score than some of my really close friends.

The score is probably a measure of mutual information (or some other kind of log prob). That's what usually produces scores distributed like this.
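
For readers unfamiliar with the term, the textbook pointwise mutual information of two events is the log of their joint probability divided by the product of their individual probabilities. The sketch below computes it from raw counts. It only illustrates the standard definition; nothing in this thread confirms that Facebook's score is computed this way, and the function and parameter names are hypothetical.

```python
import math


def pmi(joint_count, x_count, y_count, total):
    """Pointwise mutual information log(p(x, y) / (p(x) * p(y))) from raw counts."""
    p_xy = joint_count / total
    p_x = x_count / total
    p_y = y_count / total
    return math.log(p_xy / (p_x * p_y))


# Independent events give 0; co-occurring more often than chance gives a positive
# value, and less often than chance gives a negative value.
independent = pmi(25, 50, 50, 100)
associated = pmi(40, 50, 50, 100)
print(independent, associated)
```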

I'm not a frequent Facebook user but I log in at least once a day. The first person on my list is a girl I am dating who is about a -2. I am the second person on my list also at about -2. About 30 more people have a negative number. The remaining people (about 500) are all a positive number. I don't even know who many of the people are at the bottom of my list.

It is interesting to note however that my brother is a frequent Facebook user and his first person on his list ranks at about a -26, and EVERY SINGLE person in his graph is assigned a negative number. Despite this, he still doesn't know who a lot of the people are on the bottom of his list.

Other things I noticed:

- Some people that I barely even know but just became friends with are ranking highly on my list. I imagine this is because I probably viewed a bunch of their pictures after being friends, combined with the fact that I don't often view profiles of people I am friends with.

- Some of my more recently added friends have the same exact value.

I wonder how much information could be extracted about the algorithm by creating a dummy community of people and connecting them together, then recording the results. I imagine much of the algorithm could be reverse engineered by this if anybody were up to the task. I will probably work on it if nobody else does and publish my findings.

EDIT: Explained what I was trying to say better, fixed grammar.

I think it's a bit funny that with a score of -0.3 I am #2 on my own list (#1 being a girl whose profile I check out a bit too often), I'm stalking myself??

That said, I've always thought it was strange how FB comes up with the people not only in search but also the ten friends list on the left in profiles, the chat list on the right, and 'people you may know'. I've done a few tests w/ friends regarding the topic, and pretty much what everyone here has guessed seems to be true. There's a very good chance FB factors in people who are looking for you/interacting with your profile (unreciprocated), but of course they won't admit that..

I don't think it's such a big deal in either case. The top Chinese Facebook-like SNS site, http://renren.com/, shows everyone the most recent 9 visitors on a page, and it's often used (by would-be stalkers) as a way to overtly show interest "hey, i'm checking you out" kinda deal, or for couples to let each other know they are attentive. Otherwise, it's just friends keeping tabs on each other. I'll admit, at first I was weirded out by it, but now it seems almost normal.

I'm probably way out of the norm here, but I honestly wouldn't mind having this in FB.

Pretty much exactly what I was going to say about this (I'm #3 on my list). I just posted the article and my top 10 on my wall, tagging everyone in it. Curious as to what the non HN crowd thinks.

Great. You're probably going to be Ground Zero for next week's version of the "run this program to see who looks at your profile" rumor, and the week-after-next's bogus virus warning about running the script.

That is really strange to me. Maybe I don't stalk enough or something, but often when I look up a friend (someone who is actually in my friend list and whom I communicate with often) Facebook seems unable to find them, and I have to go directly to the friend list search to get to their profile. Always (or so often that I don't remember the exceptions, which in usability terms is "always") when I use the normal search bar I get people I don't know, have no even indirect relation to, and am not in any way interested in. Sometimes the names don't even contain the words I put in the search bar. That's why I really wonder how people can think of Facebook's people search as something cool or "stealable".

As some have mentioned, doing a couple of tests with dummy accounts seems to indicate that a friend visiting your page can influence your first_degree.php, which would explain why there are a few people who you never stalk who happen to be on your list.

That's a possibility. The post that Keith Adams made however refutes this, but it is still possible that he is wrong/mistaken or that the first_degree.php was changed after he left Facebook.

I should have been more clear. Keith seems to indicate that views from other users don't affect your first_degree.php. I disagree, here was my process:

I built a dummy account in incognito that is friends with my regularly used profile (A) and a profile owned by a friend of mine (B). All friend requests were made from A & B TO the dummy. I then checked the first_degree.php of the dummy; as expected, me and my friend were first. Perhaps not surprisingly, the next people on the list were the intersection of A & B's friend lists.

Now I logged back in to A, knowing that B wasn't searching for the dummy, and searched for the dummy a few times a day for a couple days (I learned about first_degree a couple weeks ago).

Checking the dummy's first_degree showed that the friends listed past A & B included more of A's friends.

My guess is that Keith is right in general. I might not be able to move myself up the ladder of the dummy, but I can change the Facebook Social Graph by changing my viewing behavior. The machine generated list uses that graph to determine first_degree. Even if it's indirect, it means first_degree can be influenced by searches of others. Or I'm wrong!
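
An experiment like the one above is easier to evaluate if the list is saved before and after the change in behavior and the two snapshots are compared. Here is a minimal sketch of such a comparison, assuming each snapshot is a parsed list of entries with `text` and `index` fields; the function name and sample data are hypothetical, and keying by name is a simplification that breaks if two entries share a name.

```python
def diff_snapshots(before, after):
    """Compare two entry lists keyed by `text`; return (added, removed, changed)."""
    old = {e["text"]: e["index"] for e in before}
    new = {e["text"]: e["index"] for e in after}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Names present in both snapshots whose index moved, as (old, new) pairs.
    changed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return added, removed, changed


# Hypothetical snapshots taken a few days apart.
snapshot_before = [{"text": "A", "index": -1.0}, {"text": "B", "index": 0.9}]
snapshot_after = [{"text": "A", "index": -1.4}, {"text": "C", "index": 1.1}]

added, removed, changed = diff_snapshots(snapshot_before, snapshot_after)
print(added, removed, changed)
```

Since another commenter reports a lag between clicking and the values changing, taking several snapshots over a few days is probably more informative than a single before/after pair.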

Thanks for all the comments and feedback on the bookmarklet. Glad it was entertaining. As mentioned by other commenters, the popularity of something like this does raise many interesting security issues.

Facebook rape (posting embarrassing statements when using a friend's account) is common. What happens when facebook rape becomes "post who your friend has a crush on"?

Not knowing much about facebook API, tried plugging in somebody else's profile ID instead of "Env.User". Thank God it came back with "not authorized."

The bookmarklet doesn't work if you have secure browsing (https) enabled on Facebook. As you should.

Time to run this script on my crush's profile to see if I am on her top of list or not.

Doesn't work unless you are logged in as that person. (I tried, just to make sure.)

That's why you have to hack into their account first :-) (kidding)

is facebook censoring this? i tried to post the link to the thekeesh.com page to a friend's 'wall' and everything in my comment after and including the link was elided. Or maybe facebook post entries are always that broken.

I don't think they're censoring it -- The HTML on this page makes heavy use of the TABLE and SPAN elements for formatting. That seems to be something that makes it difficult for Facebook's link-grabber to read pages. I've encountered the same problem posting links to other sites with idiosyncratic HTML.

TechCrunch now has a post on it so you can definitely post a link to TC

I just noticed how incredibly fast Facebook search is

It's not that fast, Facebook actually starts the search query from the second letter (about 174 ms), before that, it uses the prefetched cache of first degree friends and apps.

Facebook excels at making sensational frontpages with pretty vanilla technology.

that's a damn bad design. I stared at my laptop for a while to a bunch of hats?

I get why the parent was downvoted; it's not relevant to the topic. That said, when you open a page and all you see is the header and the article title? I think it's perfectly valid to comment on the design, or lack thereof, of the page.

Seriously, the article was a worthwhile read, but I don't see how you can NOT comment on the design.

Maybe it's the image requests that just made the server fall over.

Luckily as I read this (and my significant other read it over my shoulder) all my top matches are close male friends and my brother (and her).

Seriously though, this is a bit on the creepy side Facebook.
