
Microsoft logged everyone's MSN IM conversations for a month - nickb
http://blogs.zdnet.com/emergingtech/?p=863
======
eb
"Microsoft logged everyone's MSN IM conversations for 6 months"

This title is sensationalist and inaccurate. The article says that the dataset
consists of one month of logging:

"'The compressed dataset occupies 4.5 terabytes, composed from 1 billion
conversations per day (150 gigabytes) over one month of logging,' according to
the researchers."

~~~
tlrobinson
Furthermore, the data is _anonymous_:

"We present a study of _anonymized data_ capturing a month of high-level
communication activities within the whole of the Microsoft Messenger instant-
messaging system."

~~~
dcurtis
How anonymous can your own voice be? Not very much, I don't think.

A checkbox on a survey can be anonymous, but my communication with other
people would be impossible to record "anonymously." Remember all that flak
Netflix got for releasing its "anonymous" records of people's movie-watching
habits?

Just because my name is omitted doesn't make it morally unobjectionable to
release the information without my consent, but a lot of huge companies have
been doing this more and more often.

~~~
eb
You shouldn't expect privacy when sending information to someone else's
server.

Google records your searches, pg monitors voting, and TiVo knows what you're
watching on TV.

The data associated with a user is valuable in improving the product and makes
for interesting research. If you want anonymity, you have to explicitly try to
be discreet.

~~~
dcurtis
When I use MSN, I'm sending data to the other person through the Microsoft
tunnel; if they clone the data while it's going through the tunnel and save it
for their own use, that's kind of a breach of my trust.

I don't expect them to NOT do this, but I still believe it's uncool.

Also, there's a huge difference between recording my personal conversations
and recording my movies, searches, or voting habits.

~~~
0x44
_there's a huge difference between recording my personal conversations and
recording my movies, searches, or voting habits._

It isn't a personal conversation if it's going through an intermediary.

~~~
dcurtis
Why not? I can have a personal conversation with someone else verbally over
the air, but I can't have a personal conversation with someone else over bits?

~~~
0x44
You're not having a personal conversation with someone else over bits, you're
having that conversation with someone else through an intermediary,
specifically the hardware of your respective ISPs and any other third parties
between whom the packets route. That isn't so much a personal conversation, as
a very accurate game of Telephone.

As an aside, I apologize for not responding sooner, I did not know how to find
my old posts.

------
JulianMontez
I don't want Hacker News to start having sensationalist headlines...

------
gojomo
I suspect they logged the endpoints, not the contents of the conversations.

4.5 TB / (1 billion conversations/day * 30 days) = 165 bytes/conversation

That's enough for anonymized IDs, IPs, timestamps of start/finish (and
probably individual messages in a many-message conversation) -- but not full
transcripts or recorded voices.
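
As a rough check of that arithmetic, here is a small Python sketch. It assumes "one month" means 30 days, and it reads "4.5 terabytes" both as decimal (10^12 bytes) and as binary (2^40 bytes) units, since the article doesn't say which. The function name is made up for illustration.

```python
def bytes_per_conversation(total_bytes, conversations_per_day, days):
    """Average compressed bytes available per logged conversation."""
    return total_bytes / (conversations_per_day * days)

CONVERSATIONS_PER_DAY = 1_000_000_000  # "1 billion conversations per day"
DAYS = 30                              # assumption: "one month" taken as 30 days

decimal_tb = 4.5 * 10**12  # 4.5 TB read as decimal terabytes
binary_tb = 4.5 * 2**40    # 4.5 TB read as binary terabytes

per_conv_decimal = bytes_per_conversation(decimal_tb, CONVERSATIONS_PER_DAY, DAYS)
per_conv_binary = bytes_per_conversation(binary_tb, CONVERSATIONS_PER_DAY, DAYS)

print(f"decimal TB: {per_conv_decimal:.0f} bytes/conversation")
print(f"binary TB:  {per_conv_binary:.0f} bytes/conversation")
```

Read as decimal units the result is 150 bytes per conversation; read as binary units it rounds to the 165 quoted above. Either way it is far too small for a transcript.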

------
tim2
The inflection of the k-core graph at 20 is interesting. Rather than a
fundamental property of human nature though, I'd speculate that this is the
direct result of school class sizes.
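
For anyone unfamiliar with the term, the k-core of a graph is the largest subgraph in which every node has at least k neighbours inside that subgraph. Below is a minimal Python sketch of the usual "peeling" procedure. The toy friendship graph and the name `k_core` are invented for illustration and have nothing to do with the Messenger data.

```python
def k_core(adjacency, k):
    """Return the set of nodes in the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbours (symmetric).
    """
    # Work on a copy so the caller's graph is left untouched.
    remaining = {node: set(neigh) for node, neigh in adjacency.items()}
    # Repeatedly peel off nodes with fewer than k remaining neighbours.
    to_remove = [n for n, neigh in remaining.items() if len(neigh) < k]
    while to_remove:
        node = to_remove.pop()
        if node not in remaining:
            continue  # already removed via an earlier duplicate entry
        for neighbour in remaining.pop(node):
            remaining[neighbour].discard(node)
            if len(remaining[neighbour]) < k:
                to_remove.append(neighbour)
    return set(remaining)

# Hypothetical toy graph: a, b, c, d all know each other; e only knows a.
graph = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}

print(sorted(k_core(graph, 3)))  # the tightly knit group, without e
print(sorted(k_core(graph, 4)))  # nobody has 4 neighbours inside a 4-core
```

Plotting the size of the k-core against k presumably gives the curve being discussed here.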

------
wallflower
Ever see that scene in Bourne Ultimatum where someone says a non-typical
English word? Blackbriar...

If it's non-SSL, then it's plain text. C'mon - you college folks must have
tried to sniff the Ethernet LAN (my college used switching as a deterrent).

~~~
a-priori
Not a problem anymore with pervasive wi-fi. Go to the library or something and
sniff away. Or, get fancy and use Ettercap.

------
tlrobinson
_All of our data was anonymized; we had no access to personally identifiable
information. Also, we had no access to text of the messages exchanged or any
other information that could be used to uniquely identify users._

------
dmix
This article could have been way more interesting if the writer had wanted it
to be. This was just like reading a press release.

I know it's a known theory, but they just proved it with a massive network.

