Yandex can now search for Facebook posts (itar-tass.com)
74 points by iamtechaddict 1286 days ago | 40 comments

Only the public posts.

However, Facebook has a long history of changing its privacy policy and settings, and always in a way that (by default) makes public some things that previously were private. Usually with a couple of weeks' to a month's notice, not more.

So, unless you are diligent enough to review/delete your private stuff every time Facebook makes such a change, it is likely that in the future, stuff you marked "private" today will, in fact, be publicly indexed and searchable by yandex tomorrow.

Which is why, as has always been the case, you should assume everything you do on Facebook is public. Either because Facebook will decide to make it that way one day; or because of a security breach of some sort (hasn't happened yet, as far as I know), or because NSA/MI5/Mossad has access to it and will use it against you at the most inopportune moment (for you; it's a great moment for them).

This is ridiculous speculation. Source?

Well, they went from allowing you to have a private profile (i.e., not searchable, and non-friends see no info) to making name/age/gender/profpic/email/friends list public. There's quite some evidence that Facebook is using private messages in targeted ads (meaning the ad buyer now knows private information about you).

Please don't mislead people. Ad buyers do NOT receive any private information from Facebook users when they run an ad. They tell Facebook what kind of user they want to see their ads, and Facebook shows the ads to those particular users. There is no exchange of information.

Pretty much what Beagle said. Targeted ads on Facebook are the social network equivalent of a spear phishing email. It's trivial to target a very specific set of people, or even a single individual, if you've done your research, and then follow them around the web.

Not directly, no. But at least in 2010 (last time I did anything to do with ads), you could buy the ad according to your demographics on facebook, and then plant a cookie to correlate to other sources of information.

So, yes, I was able to use facebook to mark specific people (by buying an ad that targets them), and then follow them around the internet by participating in real time bidding on their views.

So even though facebook does not directly sell that information, you don't have to be very smart to indirectly "buy" it from them (and AppNexus and friends).

Have you ever created a facebook ad?

They've apparently been targeting ads based on things said in private messages. http://arstechnica.com/business/2014/01/facebook-sued-for-al...

3 second google search yields these results from 3 months ago:



This is pretty well known, and has been happening every 1-2 years. Are you new on the internet?

The title makes this sound much more severe than it is. According to the article they only get access to _public_ posts, i.e. posts they would have been able to access by crawling the site anyway.

Note, though, that Facebook's robots.txt and Terms-of-Service typically prohibit the crawling of these 'public' posts.

Users shouldn't be surprised that others can see them; they may still be surprised at the new level of discoverability by strangers and via other sites.

Developers may still be a bit miffed at the mixed meaning of 'public' here – world-readable to the extent it benefits Facebook (and negotiated partners), but not allowing downstream automated indexing/analysis/excerpting by the general public.

Honoring robots.txt isn't a legal requirement, and I don't think a crawler can read or agree to TOS. Personally I'm in favour of anything publicly accessible being fair-game for indexing purposes, but this kind of news about database access makes me uncomfortable.

Re: honoring robots.txt isn't a legal requirement

In common-law jurisdictions (like the US), I wouldn't be so sure of that. It has a lot of precedent behind it, as a longstanding convention for indicating site-owner/rightsholder wishes. Ignoring it – deploying software designed or configured to be oblivious to it – could create legal risk.

A literal reading of copyright law and laws about 'authorized' use of computer systems would assess all bulk copying/reuse of web content without explicit advance permission as illegal. It's the force of traditional/reasonable industry practices (like robots.txt), and offsetting considerations like fair-use, that make it legally defensible.

Well, they're getting the firehose which is a lot more valuable (real time).

I still think yandex is underrated outside of Russia. Its webmaster tools[1] seem to get better and better whereas Google's seem to remove more and more information.

[1] http://webmaster.yandex.com/

Surprised something like this hasn't happened sooner. That social networks like FB sell personal user data to advertisers is the industry's worst-kept secret.

At least now they're dropping any pretense that they give a rat's ass about privacy.

Facebook doesn't sell data to advertisers. Do you have a source for your claim?

Don't know if this is still the case, but in the past it was possible to pay facebook for ads targeting e.g. 25-30 year olds in the US, and color that ad with a cookie, so that when you saw the same browser later, you'd know it was a 25-30 year old from the US.

(It went much deeper than that - marital status, month of year, a few other things I can't remember). So, while they weren't directly offering and selling this information as is, you could buy it from them by buying ad space on the demographics you cared about. (Yes, this also required you had other access to ad networks, real time exchanges, etc -- but if you really want info on people, it was chump change)
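To make the mechanism concrete, here is a rough sketch of the cookie trick described above, as a toy simulation in Python. Everything in it is hypothetical (the `AdBuyer` class, the `buyer_id` cookie name, the segment label); it models no real ad platform API, only the inference step: the platform delivers the ad solely to the targeted segment, so any browser that renders it can be labelled with that segment.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AdBuyer:
    """Hypothetical advertiser that labels browsers reached by targeted ads."""
    # Maps the buyer's own cookie id -> set of inferred segment labels.
    cookie_segments: dict = field(default_factory=dict)

    def on_targeted_impression(self, browser_cookies: dict, segment: str) -> None:
        # The ad was only shown to users matching `segment`, so the browser
        # rendering it can be tagged with that segment via the buyer's cookie.
        cookie_id = browser_cookies.setdefault("buyer_id", uuid.uuid4().hex)
        self.cookie_segments.setdefault(cookie_id, set()).add(segment)

    def on_later_visit(self, browser_cookies: dict) -> set:
        # Seeing the same cookie elsewhere reveals the labels inferred earlier.
        return self.cookie_segments.get(browser_cookies.get("buyer_id"), set())


buyer = AdBuyer()
targeted_browser = {}    # cookie jar of a user inside the targeted segment
untargeted_browser = {}  # cookie jar of a user who never saw the ad

buyer.on_targeted_impression(targeted_browser, "age 25-30, US")

print(buyer.on_later_visit(targeted_browser))
print(buyer.on_later_visit(untargeted_browser))
```

The platform never hands over a record here; the label is inferred purely from which browsers the ad reached, which is the "indirect buying" point being made.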

Facebook "partners" with several large 3rd party advertisers.

I'm not sure how you would define "sell data", but other advertisers are using that data for targeting at scale already.

These posts were public anyway, so I fail to see how this will affect anyone's privacy.

Also, like ryanmerket said: Facebook doesn't sell data, you are just misinformed.

A lot of Facebook users are making public posts without realizing it, and a lot more don't understand the ramifications (difficulty traveling, job risk, stalkers, ID theft, etc.)

As the article notes, this did happen earlier: Bing has the same level of access.

Access to the firehose != access to database. Bad reporting.

I have a Facebook account nowadays, mostly for private chats, reading friends' statuses and a few groups with classmates, but any posts I make are strictly set to public. I know I shouldn't expect them to be private on Facebook and this forces me to think a little about what I post. I'm pseudonymous (I don't use my real name on Facebook), but still.

What part of "public" do people not understand?

Typically the same part that includes such complex terms as "access" and "third party".

The part that is subject to change at Facebook's whim

From public to "more public?"

No, from private to public. They are notorious for changing privacy policies with effects that are not obvious to most users.

A year ago Yandex made an iOS app "Wonder" that could search in Facebook and other social networks. Facebook blocked this app one month later.

Good to know that now Yandex and Facebook have signed an agreement.

FWIW, here's Facebook's robots.txt file. While this was a firehose agreement and not subject to robots.txt, it is an interesting look:


They give the Internet Archive the most access, but oddly go out of their way to block their TOS and privacy policy from them and not anyone else. Sneaky.

Update: scratch that, I missed that they were Allow statements for ia_archiver. The Internet Archive has by far the least access.

They give the Internet Archive the most access, but oddly go out of their way to block their TOS and privacy policy from them and not anyone else. Sneaky.

It's actually the opposite. They allow IA access to their terms and policy, but block everything else:

User-agent: ia_archiver
Allow: /about/privacy
Allow: /full_data_use_policy
Allow: /legal/terms
Allow: /policy.php
Disallow: /
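For anyone who wants to check this reading mechanically, here is a minimal sketch that feeds just the ia_archiver record quoted above to Python's standard `urllib.robotparser`. The `/some-public-profile` path is made up for illustration, and the rest of Facebook's robots.txt is not reproduced. Python applies the first matching rule in order; other crawlers may resolve Allow/Disallow precedence differently, although for this record the outcome should be the same.

```python
from urllib.robotparser import RobotFileParser

# Only the ia_archiver record quoted in this thread, not the full file.
IA_RECORD = """\
User-agent: ia_archiver
Allow: /about/privacy
Allow: /full_data_use_policy
Allow: /legal/terms
Allow: /policy.php
Disallow: /
"""

parser = RobotFileParser()
parser.parse(IA_RECORD.splitlines())

BASE = "https://www.facebook.com"
# The last path is a hypothetical example of "everything else".
paths = ["/legal/terms", "/about/privacy", "/policy.php", "/some-public-profile"]

results = {path: parser.can_fetch("ia_archiver", BASE + path) for path in paths}
for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

The policy pages come back allowed and the catch-all `Disallow: /` blocks the rest, which matches the corrected reading: the Internet Archive may fetch the terms and privacy pages and nothing else.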

Duh, all the others were Disallow statements and I just missed that the IA ones were Allow. Thanks for the tip.

I guess it's only for Russian and Turkish-language posts.

Right, most likely only Russian/Ukrainian languages. I don't think they will index English posts.

My account has been deactivated for a long time. Do I need to worry?

I don't know if you need to worry but you shouldn't assume that deactivating your account means your data is not there anymore.

I have deleted my FB account once and waited the amount of time they say is needed for data to be deleted. In fact, I stayed out of FB for >6 months. When I decided to create an account again using the same email address, at the first login, FB was ready to remind me of all the friends I "probably knew" (sure enough, they were all people I had added as friends before deleting the account). So that information is unlikely to be deleted.

To get a somewhat vanilla experience I had to create an account using a different email address. Then it behaved as not knowing me (too much).

Yes. I deactivated an account per their instructions for permanently removing an account over a year ago and haven't logged into it and it's still active. The information on it is still available.

Facebook has also reset all of my account's policy settings to the most public options twice since I've been on facebook. My posts kept their privacy settings but everything else (my friends list, likes, etc.) became public instead of only visible to myself. I suspect it was a bug related to data migration since both times it happened during site upgrades.

Still, I do not trust facebook with anything I do, ever. I assume it's just as public as the information on my blog. I aggressively censor posts and tags because of that.

I tried deleting an account recently. It was surprisingly easy. You just had to find the right page and push delete.

Now, my account hasn't gone poof yet (it takes 2 weeks, they say), but I'm optimistic.

I deleted my facebook about a year ago now.

