
Google Says It Mistakenly Collected Data on Web Usage - J3L2404
http://online.wsj.com/article/SB10001424052748703460404575244763621501220.html
======
keefe
So like, how exactly did this accidentally happen? It's not like skynet's
running those vans and just decided to sniff the traffic of open wifi nets it
passed. <cliche>power corrupts and...</cliche>

~~~
X-Istence
The code was compiled with -DDEBUG and, instead of being recompiled without
it, was put on the vans driving around.

Honest mistake, they are owning up to it, and removing the data. The amount of
data that they could have picked up while moving around driving is very little
and it is available to anyone with a wifi antenna and some time to drive
around. I'd be less worried about Google than about the kid a couple doors
down that just found out about nmap, wireshark and kismet.

~~~
keefe
Either I skimmed that article very poorly or you have some inside information
on this... I may be slightly paranoid but I am always skeptical about stuff
like this. I could imagine a lot of useful pieces of information that could be
gleaned, like a geocoded map of where people are ACTUALLY using google and for
what... I don't really know, it just gives me a little bit of a funny feeling

~~~
X-Istence
The guy asked how exactly this happened; I was speculating. I am in no way
associated with Google, and a quick Google of my username would have shown
that.

From the quick skimming of the articles I did, I saw several mentions in
different places of debug versions. It is very plausible.

You may be paranoid, but if you had read any of the articles you would have
seen that the cars channel hop 5 channels a second, and that they are
constantly driving. All they are looking for is beacon information sent out by
access points. Apparently this also captured a couple of packets of actual
data that was not discarded by the computers in the Google cars because of a
debug build (from what I have read, again that might be pure speculation, but
makes sense).

That won't be enough time to even figure out if the user is using Google or
not. Go grab a wifi antenna, a wireless networking card, your favourite Linux
distribution and go wardriving. You will pick up the same information that
Google picked up. Now, if from the small samples you did get you can determine
where people are using Google and for what and can make it valid research,
good on you, but I doubt you will get much more than the occasional URL, and
if you are lucky that URL is a Google URL.

~~~
keefe
You're assuming that this business about debug modes and the cars channel
hopping so quickly is not a cover story. There's no way for us to know that.
My theory was that given a bunch of interesting data and no way to see what
they were really doing, there's no telling what was really going on.

~~~
X-Istence
Sure, but if all of us are walking around with tinfoil hats on, looking over
our backs for the men in black suits, all of the time we wouldn't get much
done now would we?

Also, once again, the channel hopping is fairly normal for a wifi wardriving
setup, feel free to go test this theory on your own, as well as go driving
down the road at 25-50 mph and get anything but a few packets of data from
the wireless networks you do come across where the owners at the moment you
are driving by are using their wireless, and data is being transmitted in
clear text.
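
A rough back-of-the-envelope sketch of that drive-by window, in Python. The
100 m usable range and the 11 channels are illustrative assumptions, not
figures from the article, and the function name is made up for this example.
It only estimates how long a receiver that splits its time evenly across
channels would spend on one access point's channel while passing it.

```python
MPH_TO_MPS = 1609.344 / 3600  # one mile per hour expressed in metres per second


def seconds_on_channel(speed_mph, range_m=100.0, channels=11):
    """Estimate listening time on a single AP's channel during one pass.

    range_m and channels are assumed values chosen for illustration.
    """
    speed_mps = speed_mph * MPH_TO_MPS
    # The car is within range for a stretch of road about 2 * range_m long.
    time_in_range = (2 * range_m) / speed_mps
    # An even channel hopper spends roughly 1/channels of that time per channel.
    return time_in_range / channels


for mph in (25, 50):
    print(f"{mph} mph: about {seconds_on_channel(mph):.2f} s on the AP's channel")
```

Under those assumptions the window is on the order of a second or two per
access point, and it halves when the speed doubles.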

~~~
keefe
And if all of us accepted the stories of the corporate and government Powers
then they could do whatever they like. There's an awfully big gulf between
being skeptical of official stories and being so paranoid that it interferes
with productivity.

I'm certainly not interested enough to test any of this stuff... but my
thought was OK, could they intercept GET requests to URLs off of google.com
and that gives query strings etc? Even 100 URL+GEOCODE per hour would be win.

I'd certainly be tempted to try it if they were my vans, if for no other
reason than to see if it was possible.

I'm doubting that the range is 25-50mph in residential zones btw.

------
drivebyacct
"on web usage". What a joke.

~~~
acqq
Yes, it's obvious they simply saved all the unencrypted packets they were able
to get. If your e-mail server didn't have encrypted communication, they
saved your mails and your passwords. And yes, anybody else could have done
that with your unsecured WiFi packets. The difference here is the large scale
of collection and the power of the collector. The chance for misuse would be
mostly by having it in the wrong hands, but there's already an immense amount
of data on Google that would be of the same nature. If you use them you better
trust them. My much bigger worry is that people don't see a problem in sites
like Facebook asking for the passwords of your e-mail accounts to "connect you
with your friends." And people do this gladly!

~~~
drivebyacct
The Wifi channel was changing 5 times per second. The microscopic fragments
they got weren't usable. The tiny possibility that you were sending sensitive
data, multiplied times the tiny possibility that it wasn't SSL, multiplied
times the very small possibility that it was grabbing enough information to
get a full frame, let alone a handshake seems very unlikely.

If Google had this massive mountain of my personal traffic lying around, I
think they'd have a hard time "accidentally" stockpiling all of it.

I was mostly mocking the word choice anyway. "Web Usage". The worst allegation
is that they logged actual packets or traffic, which I still find unlikely. I
_did_ find it interesting that that particular part of the detail was left out
of Google's otherwise very informative blog post about this incident.

~~~
borism
An 802.11g network can broadcast 33.75 MB during 5 seconds. That's quite a few
e-mails.

~~~
drivebyacct
"5 times per second"

and

"The tiny possibility that you were sending sensitive data, multiplied times
the tiny possibility that it wasn't SSL, multiplied times the very small
possibility that it was grabbing enough information to get a full frame, let
alone a handshake seems very unlikely."
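
The disagreement above comes down to arithmetic, so here is a small sketch of
it. It assumes the nominal 54 Mbit/s peak rate of 802.11g, which is where the
33.75 figure for 5 seconds appears to come from, and it ignores protocol
overhead, so these are upper bounds rather than realistic throughput. The
function name is made up for this example.

```python
LINK_RATE_BPS = 54_000_000  # nominal 802.11g peak rate in bits per second


def max_megabytes(seconds, rate_bps=LINK_RATE_BPS):
    """Upper bound on data sent in the given time, in megabytes (10^6 bytes)."""
    bits = rate_bps * seconds
    return bits / 8 / 1_000_000


# Listening for a full 5 seconds versus one 1/5 second dwell on a channel.
print(f"5 s:   {max_megabytes(5):.2f} MB")
print(f"0.2 s: {max_megabytes(0.2):.2f} MB")
```

A dwell of one fifth of a second caps the capture at 1/25 of the 5 second
figure, and only if the link is saturated for the whole dwell.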

