

1 Database Containing 35,000,000 Google Profiles. Implications? - sathyabhat
http://blog.cyberwar.nl/2011/05/verified-google-allows-mass-downloading.html

======
kylec
A few months ago I discovered a curious book on my doorstep. To my shock, it
contained names, addresses, and phone numbers of thousands of people in my area.
I suspect that many other people, including criminals and certain types of
marketeers, are in possession of similar books. Implications?

~~~
blinkingled
Couldn't I tell the phone company to NOT include my profile in the phone book
/ directory they publish? I remember doing something like that with SBC - sons
of you-know-what, they definitely sell that info, judging from the number of
crap calls I got when I was with them.

~~~
tonfa
You can choose if your profile is indexable or not.

------
fauigerzigerk
I'm usually very critical of and sensitive to any privacy issue. But the
Google profile is a public profile, which is made abundantly clear on every
occasion. This is what you see when creating the profile:

"Decide what the world sees when it searches for you. Create a public profile
to display the information you care about and make it easy for visitors to get
to know you. [...] Your profile will be visible to anyone on the web, and
anyone with your email address can discover it."

I fear that this kind of completely spurious criticism discredits anyone who
has real privacy concerns.

------
jasonkester
If I read this correctly, Google lets you mark some of your profile
information as public. And as a result of this, a member of the public was
able to download it.

So, uh... what exactly is the story?

I think the key piece of advice for people not wanting their personal
information to be downloadable from the internet is to not publish their
personal information on the internet.

------
tonfa
> I did NOT publish the database and did NOT violate any Google policy.

But he might have broken some EU and NL laws about privacy. You can't create a
database with personal information without consent even if it's possible.

~~~
jrockway
The database was created by Google and the users who typed in their
information. He just made a copy.

~~~
tonfa
That does not make it legal. Even if the original provider got consent from the
user, it doesn't mean you have the right to copy the database (or that you
don't have to declare the data collection to the relevant privacy agencies).

~~~
jrockway
So operating any web index in Europe is illegal?

~~~
tonfa
A web index is not the same thing as a database about people. The original
blog post explicitly says he built such a database; he didn't just mirror
Google's data.

It's not illegal if you get the necessary permissions from the privacy
agencies (which will ask things like: how is the data stored, do you join it
with other databases, can users ask to have their information removed, etc.).

(IANAL, I just happen to have dealt with that kind of thing when building a
lobbyist database out of public documents for an advocacy group)

Edit: removed part about database rights, let's not complicate the subject.

------
Joakal
Every search engine does this, I'm not sure what the implications are if it's
public?

Data mining is quite possible but there's an expectation that the profiles are
public so no one privacy-conscious will be putting sensitive information in
it.

I'm curious to know if anyone has ways of restricting 'mass-downloading'. I
don't know of any website that does, short of rate-limiting requests from a
single source.

~~~
hessenwolf
There is rate limiting and also some regularity testing, e.g., one site would
only let me download if I scheduled requests at random intervals and below a
certain frequency.
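
To make the rate-limiting idea concrete, here is a minimal sketch of a per-client sliding-window limiter. It is only an illustration, not how any particular site does it: the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are all hypothetical, and it does not attempt the regularity testing mentioned above.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        recent = self.hits[client_id]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: reject this request
        recent.append(now)
        return True
```

A server would call `allow(ip_address)` before serving each profile page and return an error when it comes back `False`. A crawler spread over many source addresses would still get through, which is why this only limits requests "from a single source".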

------
yhlasx
It is meant to be public and available to anyone who tries to access it. It
has nothing to do with privacy.

Don't do your paper just to do it. Go and find more real/serious stuff.

------
hxf148
If the information is marked public then crawling it is how the web works, or
at least how searching and indexing works.

I ran into that a bit with our startup (<http://infostripe.com>). When doing
demos, it was sometimes shocking to people that with a bit of searching I was
able to make a complete profile of their public online activities.

I think that even when people know a particular site is public on its own
they sometimes don't make the connection between software and search engines
aggregating all that together without their involvement. Usually this is not a
problem for most people but I have seen instances where a user would use the
same username on very different services and get burned for it.

------
rachelbythebay
Implications? Maybe he'll make a Google Profile social network before Google
does.

------
ravivyas
OMG!!! My public data is public.

------
theoretical
I contacted Google about this issue in November of 2008 - I only received an
automated response. (Matthijs mentioned that was why he posted the previous
post[1] on the topic prematurely)

Perhaps with the increasing awareness of this issue, Google will be forced to
act.

[1] [http://blog.cyberwar.nl/2011/05/google-profiles-exposes-
mill...](http://blog.cyberwar.nl/2011/05/google-profiles-exposes-millions-
of.html)

~~~
tonfa
Why act? They make a great effort to explain that the data is public. You can
even choose if you want to be indexable (search visibility).

------
nikcub
I have one Google username that has a public profile and that I use for
account registration etc. I have another that I use for personal email that is
private.

I assume more people will start doing the same if they are privacy conscious.

Searching this database is no different to searching on Google itself. The
only concern would be having a mass email list, but spammers have had those
for years and filters sort that out.

------
holdenc
Here's one implication: a scammer decides to send a "Your Gmail account is
being canceled" phishing email to every address there. It clicks through to a
fake but convincing Gmail login page that captures the user's real login info.

I've already had a few friends call for help with this since apparently it's
pretty common.

~~~
wfaris
[http://googleblog.blogspot.com/2011/02/advanced-sign-in-
secu...](http://googleblog.blogspot.com/2011/02/advanced-sign-in-security-for-
your.html)

~~~
MichaelApproved
It's not as helpful as you would think. The people who would activate the
second step sign in and the people who fall for the phishing scheme don't
overlap that much.

------
motters
If these are public profiles then maybe this isn't a problem, but if the data
contains non-public profiles then it's a security breach for Google. The
robots.txt settings would lead me to believe that these are public profiles and
that Google intends people to view/download them.
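
For anyone who wants to check what a robots.txt permits, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `examplebot` user agent, and the example.com URLs are made up for illustration and are not Google's actual file; `is_crawlable` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT the real one from any site.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Allow: /profiles/
Disallow: /private/
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/profiles/alice", "/private/settings"):
        url = "https://example.com" + path
        print(path, is_crawlable(HYPOTHETICAL_ROBOTS_TXT, "examplebot", url))
```

Note that robots.txt is only a request to well-behaved crawlers; it says what the site owner is fine with being fetched, but it does not technically prevent anything.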

------
zecg
Public profiles can be automatically harvested? Curl and wget should be
classified as munitions and access to those tools restricted in at least 45
states. Shut. Down. Everything.

------
hazelnut
nothing new: <http://news.ycombinator.com/item?id=1537968>

------
X-Istence
I would love it if it were made publicly searchable so I can see
what data is available on me personally.

~~~
Joakal
I tried to find out with your name and came up with this: 'Google launches
Google Xistence to manage social media life' [0]

Your information is pretty easy to find out:
[https://groups.google.com/groups/profile?enc_user=V5aPoREAAA...](https://groups.google.com/groups/profile?enc_user=V5aPoREAAABMk84TPAX7t0uWkf9Ym7YqkdEasx1kiYTQavV7mdW13Q)

[0] [http://www.webmarketinggroup.co.uk/News/google-launches-
goog...](http://www.webmarketinggroup.co.uk/News/google-launches-google-
xistence-to-manage-social-media-life-1613.aspx)

~~~
X-Istence
Never seen [0] before, or heard about it.

The second is just one of my email addresses, and isn't my Google Profile.

------
arapidhs
Isn't it up to the user whether or not to publish his profile?

------
drivebyacct2
My profile is marked as public. I expect that it would be available were
someone to try to access it, whether it was a friend, whether it was someone
scoping out my class(mates), or whether it was someone downloading by the
thousands. What's the difference to the user? The whole point is that if my
data is public, other people will see it. How is it important if my profile is
visible locally alongside other profiles?

I also don't know what people expect people to do. If you ignore the easily
available privacy policy, there is no excuse. Period.

