
Phone call metadata does betray sensitive details about your life - Libertatea
http://www.theguardian.com/technology/2014/mar/13/phone-call-metadata-does-betray-sensitive-details-about-your-life-study
======
skywhopper
If it didn't, the NSA would not be so interested in collecting it. Paranoid
people who believe they might be being listened in on are unlikely to reveal
much directly in the conversation itself anyway, so in those cases the
metadata is _more_ important. Also, the metadata can reveal anomalous
behavior, which they look for mainly because it's easy to find, but also
because it can reveal important information assuming the targets are correctly
selected.

Anyway, the only reason they aren't collecting the calls themselves is because
the storage required is not yet available, so their begging off about "but we
aren't storing the content" is disingenuous. They can, at any moment, capture
any call going over the long-distance network (which would include pretty much
all cell phone calls) so the only thing they're unable to do is to
retroactively listen in on calls. If you are flagged for whatever reason (you
know someone who knows someone who knows someone they suspect of sending money
to a bad charity), you may well be being monitored.

~~~
slashdotaccount
The storage required to keep _all_ of the call recordings (GSM, VoIP,
land-line) is already available. For example, Speex [1] can compress voice
down to about 2kbps. Even at 8kbps, you can store 916259 hours (104 years)
of voice on just _one_ 3TB disk.
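
A minimal sketch of that arithmetic, assuming a constant 8 kbps stream with no container overhead. The 916259-hour figure only comes out if "3TB" is read as 3 TiB (3 × 2^40 bytes), so that reading is an assumption here; the function name is made up for illustration.

```python
def hours_of_audio(disk_bytes: int, bitrate_bps: int) -> float:
    """Hours of constant-bitrate audio that fit in disk_bytes."""
    seconds = disk_bytes * 8 / bitrate_bps
    return seconds / 3600

THREE_TIB = 3 * 2**40    # assumption: "3TB" read as 3 TiB
THREE_TB = 3 * 10**12    # decimal terabytes, as drive vendors count them

hours_tib = hours_of_audio(THREE_TIB, 8_000)
hours_tb = hours_of_audio(THREE_TB, 8_000)
years_tib = hours_tib / (24 * 365)

print(f"3 TiB at 8 kbps: {hours_tib:,.0f} hours ({years_tib:.1f} years)")
print(f"3 TB  at 8 kbps: {hours_tb:,.0f} hours")
```

With decimal terabytes the result is about 9% lower, which does not change the conclusion.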

[1] [http://speex.org/](http://speex.org/)

~~~
Cthulhu_
Let's take the US. 317 million people, assuming they call for an average of 10
minutes (based on nothing whatsoever, btw), gives approx. (10 * 317) / 60 = 53
million hours of phone conversations a day.

Given 916,259 hours = 3 TB, 53M / 916,259 = 57 * 3 TB = 172 TB of data /a
day/. And that's just the US. Even if you adjust the average to just one
minute a day, you're still looking at 17 TB / day, which should be sorta
manageable I reckon.

But it's not just the US. Let's assume they want to track all voice
communications globally, rounding it to an even 7 billion, ten minutes a day.
I'm counting 3819 TB/day. That's a lotta 3 TB hard drives.
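
For anyone who wants to redo the estimate, here is a rough sketch. It takes the 916,259 hours per 3 TB disk from the parent comment as given, and the population and minutes-per-day inputs are the same guesses used above; the function name is hypothetical.

```python
HOURS_PER_3TB_DISK = 916_259   # figure from the parent comment (8 kbps audio)

def daily_terabytes(population: int, minutes_per_person: float) -> float:
    """TB of compressed audio per day if everyone talks minutes_per_person."""
    hours = population * minutes_per_person / 60
    disks = hours / HOURS_PER_3TB_DISK
    return disks * 3

us_10 = daily_terabytes(317_000_000, 10)
us_1 = daily_terabytes(317_000_000, 1)
world_10 = daily_terabytes(7_000_000_000, 10)

print(f"US, 10 min/day:    {us_10:,.0f} TB")
print(f"US, 1 min/day:     {us_1:,.0f} TB")
print(f"World, 10 min/day: {world_10:,.0f} TB")
```

Without the intermediate rounding, the results land within about a terabyte of the numbers above.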

tl;dr: big data is big. disclaimer: I suck at basic arithmetic, I probably
made a miscalculation.

~~~
dictum
~200TB/day is entirely within the room the NSA's budget allows.

You don't need to record all phones in the world if you have the metadata of
all calls in the world. The NSA or another spy agency could record only calls
that match a given pattern: place from which the call originates, who is being
called, time of call, previous calls made or received by the line, when the
line or phone was purchased, whether this phone's call patterns resemble
another phone's call patterns, etc.
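
A toy sketch of what such a metadata filter could look like. The record fields, the two rules, and the phone numbers are all invented for illustration; nothing here reflects how any real system selects calls.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallMeta:
    caller: str
    callee: str
    origin: str   # coarse location label (hypothetical)
    hour: int     # hour of day the call started, 0-23

def matches_pattern(call, watched_numbers, watched_origins, odd_hours=range(0, 5)):
    """Return True if any of the made-up selection rules fires."""
    # Rule 1: either party is already on a watch list.
    if call.caller in watched_numbers or call.callee in watched_numbers:
        return True
    # Rule 2: call placed from a watched area at an unusual hour.
    return call.origin in watched_origins and call.hour in odd_hours

calls = [
    CallMeta("555-0101", "555-0100", "region-a", 14),  # callee is watched
    CallMeta("555-0102", "555-0103", "region-x", 3),   # watched area, 3 am
    CallMeta("555-0104", "555-0105", "region-a", 10),  # nothing matches
]
to_record = [c for c in calls if matches_pattern(c, {"555-0100"}, {"region-x"})]
print(len(to_record), "of", len(calls), "calls selected for recording")
```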

They wouldn't achieve 100% coverage, but efficacy would probably be 99%+.

I would guess the problem with this kind of semi-targeted collection is
processing power to decide who is a target and schedule the line taps.

~~~
gknoy
Or record all of it that you can into a daily or weekly cache, and then move
only the Statistically Interesting things into indefinite (expensive) storage,
since keeping everything forever is outside current budget/capabilities.

~~~
dictum
Yes, good point. You don't have to store everything forever, you can have
tiers of interestingness — capture everything, read it looking for certain
patterns, store anything that matches those patterns forever, store stuff that
matches [secondary, less important but still useful pattern] for a few years,
and store all other calls for a few months or a year.
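
The tiering could be sketched roughly like this. The keyword matching stands in for whatever pattern detection would really be used, and the concrete durations (3 years, 180 days) are assumptions picked to stand for "a few years" and "a few months".

```python
from datetime import timedelta
from typing import Iterable, Optional

FOREVER = None  # sentinel: no expiry

def retention_for(transcript: str,
                  primary_terms: Iterable[str],
                  secondary_terms: Iterable[str]) -> Optional[timedelta]:
    """Pick how long to keep a call, based on which tier its text matches."""
    text = transcript.lower()
    if any(term in text for term in primary_terms):
        return FOREVER                   # top tier: keep indefinitely
    if any(term in text for term in secondary_terms):
        return timedelta(days=3 * 365)   # assumption: "a few years" = 3
    return timedelta(days=180)           # assumption: "a few months" = 180 days

primary = {"alpha"}     # placeholder terms, not real selectors
secondary = {"bravo"}

keep_a = retention_for("Alpha at noon", primary, secondary)
keep_b = retention_for("say bravo twice", primary, secondary)
keep_c = retention_for("pizza tonight?", primary, secondary)
print(keep_a, keep_b, keep_c)
```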

Phone calls are quite low quality audio, but I don't expect the NSA to be
limited to consumer grade speech-to-text technology, so at least for calls in
some languages, they could store the transcripts forever.

EDIT: Apart from processing power, another expensive problem with such a setup
is memory to store the firehose temporarily.

EDIT 2: If you were wondering, 200TB/day would run at $7500 for 50 4TB
external hard drives at $150 each, assuming you wanted to use a Backblaze-like
setup. In a year, that's $2.7 million. (This doesn't account for redundancy.)
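
The same back-of-the-envelope numbers as a short sketch, using only the figures in this comment (200 TB/day, 4 TB drives, $150 each) and, like the comment, ignoring redundancy, enclosures, power, and staff. The function name is made up.

```python
import math

def drive_cost(tb_per_day: float, drive_tb: float, price_usd: int) -> int:
    """Cost of enough whole drives to absorb one day of data (no redundancy)."""
    drives = math.ceil(tb_per_day / drive_tb)
    return drives * price_usd

daily = drive_cost(200, 4, 150)
yearly = daily * 365
print(f"per day: ${daily:,}  per year: ${yearly:,}")
```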

------
rayiner
While this is certainly true, sentences like this one show that The Guardian
doesn't understand the nature of the data/metadata distinction: "The
researchers cite statements like that of President Obama, that the NSA was
'not looking at content,' and ask whether the legal distinction between
metadata and content is matched by harm reduction in the real world."

The distinction between data and metadata in U.S. law isn't about "metadata"
being supposedly less harmful than "data." That has absolutely nothing to do
with it. The distinction is based on the 4th Amendment's requirement that
people have an expectation of privacy in the information in order for it to be
protected. The idea is that if you have to expose certain information to a
company in order for the communication to work at all, then you can't expect
that information to be private. If you expose it, if you _have_ to expose it,
it's not private.

The legal distinction reflects the underlying technical distinctions. You can
encrypt a phone conversation, but you can't encrypt the signaling information
and still have the system work. You can encrypt the contents of a packet, but
you can't do the same for the IP headers. You have to expose this "metadata"
in order for modern telephone and IP networks to work as they are currently
designed, and it's this exposure that creates the distinction for 4th
amendment purposes.
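
A toy illustration of that technical point, not a real network stack. The `Packet` class and `route` function are invented for the example, and the payload is protected with a simple one-time pad so the sketch needs nothing beyond the standard library.

```python
import secrets
from dataclasses import dataclass

@dataclass
class Packet:
    src: str        # header field: must stay readable to be routed
    dst: str        # header field: must stay readable to be routed
    payload: bytes  # content: can be opaque to everyone in between

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with an equally long key (one-time pad)."""
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))  # shared out of band by the endpoints
packet = Packet("10.0.0.5", "192.0.2.7", xor_bytes(message, key))

def route(p: Packet) -> str:
    """An intermediary needs only the destination address, never the key."""
    return p.dst

next_hop = route(packet)                    # works without decrypting anything
recovered = xor_bytes(packet.payload, key)  # only a key holder can do this
print(next_hop, recovered)
```

The intermediary can deliver the packet and log who talked to whom, while the payload stays unreadable to it.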

~~~
bradyd
"The right of the people to be secure in their persons, houses, papers, and
effects, against unreasonable searches and seizures, shall not be violated,
and no Warrants shall issue, but upon probable cause, supported by Oath or
affirmation, and particularly describing the place to be searched, and the
persons or things to be seized."

There is nothing in the 4th Amendment that says anything about an expectation
of privacy. Besides, phone metadata IS private. It is only shared with the
person placing the call, the phone company, and the recipient of the call. It
is not publicly available. Just because I share something with someone else
does not mean it is no longer private, just that it is private between myself
and that other person.

~~~
rayiner
> There is nothing in the 4th Amendment that says anything about an
> expectation of privacy.

Ignoring the 4th amendment jurisprudence is a double-edged sword, because it
allows interpretations of the 4th amendment that are a lot more conservative
than the modern "expectation of privacy" formulation. The original intent of
the 4th was just to prevent customs searches of your home. Extending it to
communications at all was an invention of SCOTUS. Moreover, the interpretation
of the word "their" is a challenge. How can Bob and Marie say that AT&T's
records about them are " _their_ . . . papers, and effects . . ." when _they_
didn't record that information, _they_ aren't storing that information, and
_they_ don't even have access to that information?

No, if you're a privacy advocate, you're far better starting from the
"expectation of privacy" springboard than going back to the literal text.

> It is only shared with the person placing the call, _the phone company_ ,
> and the recipient of the call.

The difference between "private" and "public" is a spectrum. On one end of the
spectrum are the thoughts in your head. On the other end are thoughts you
publish in the NYT. The way the 4th amendment jurisprudence uses the word
"private" it means things that are close in the spectrum to the former
extreme, not everything that isn't at the latter extreme. Sharing something by
publishing it in the NYT means it's not private. So does sharing something
with potentially thousands of people at AT&T or Google.

~~~
bradyd
"How can Bob and Marie say that AT&T's records about them are "their . . .
papers, and effects . . ." when they didn't record that information, they
aren't storing that information, and they don't even have access to that
information?"

I agree with you on this, but I don't agree that means it is no longer
protected by the 4th Amendment. That information is AT&T's "papers and
effects", so collecting it without a warrant is a violation of AT&T's 4th
Amendment rights, not necessarily Bob and Marie's.

~~~
rayiner
You don't need a warrant if the party consents, and as far as I know, AT&T
hands that information over voluntarily or in response to valid subpoenas.

------
Lagged2Death
_Researchers ... successfully identified a cannabis cultivator, multiple
sclerosis sufferer and a visitor to an abortion clinic using nothing more than
the timing and destination of their phone calls._

Unfortunately, the political actors who are the biggest cheerleaders/defenders
of total surveillance are also the ones most likely to be in favor of
unconstitutionally severe pursuit and punishment of drug growers, in favor of
ejecting sick people from the health care system (preferring that they instead
die quickly and cheaply), and in favor of publicly shaming abortion patients.

In other words, a result like this is particularly likely to reinforce
existing biases more than anything else. Plenty of Americans will respond to
such news by suggesting such data should be used _more_ , not collected less.

------
jcromartie
As someone said recently "we know you have called a phone sex line 10 times in
the last month, and a divorce lawyer last week, but we don't know what you
talked about."

------
prof_hobart
I'm not doubting that it's entirely possible to identify a lot about people
from their phone metadata. I don't think it takes a huge amount of creative
thinking to establish that someone who calls a hydroponics dealer and a
headshop is fairly likely to have some connection to drugs, and that tracking
their phone calls would allow you to spot that.

But I'm not sure that this article really adds a lot to the discussion with
statements like "Owing to the sensitivity of these matters, Mayer explains
that the researchers elected not to contact the three participants for
confirmation that their inferences were correct"

If they could have correctly identified a drug dealer from a pattern of
seemingly innocuous phone calls (and actually validated that they are correct)
then this could have been at least vaguely interesting. As it is, the story is
"if you've phoned somewhere that deals with MS relapses, we can make a guess
that you could have MS". Well thanks, Sherlock.

------
CurtMonash
Just for the record -- unless somebody is also storing the contents of the
calls, which the NSA strenuously claims isn't happening, it's NOT metadata.
It's just data.

Why? Because metadata is "data about data", so something is only metadata if
there's other data for it to be data about.

[http://www.dbms2.com/2014/02/23/confusion-about-metadata/](http://www.dbms2.com/2014/02/23/confusion-about-metadata/)

~~~
delinka
Meh. The call itself, and the data about the call, traverse the telephone
network. The data about the call, the metadata, is being logged and stored.
Yes, by the time it gets to an NSA storage device, it's no longer "meta," but
at the time the human made a call and generated the metadata, it was metadata.
This is the tag we've chosen to stick on it. That the tag maybe should change
doesn't really change the privacy implications, and would only serve to
confuse the conversation to a populace that already doesn't seem to care.

------
SilasX
Well, yeah? The first thing I thought when the NSA story broke was "Just
metadata? Okay Mr. President and members of Congress: how about we release a
list of everyone you talked to and when? It's just metadata, right?"

~~~
trebor
How about we perform a basic statistical analysis and see what locations you
frequent? From that we can tell what religious, financial, and social
organizations you are a member of, and can go beyond that to what stores you
patronize. And what routes you take to and fro. Metadata can be as harmful as
recording the calls, if not more so.
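
A minimal sketch of how basic that analysis can be. It assumes each record has already been reduced to a (day, place) pair, for example from the cell tower that handled a connection; the tower labels and the visit threshold are made up.

```python
from collections import Counter

def frequent_places(sightings, min_visits=3):
    """sightings: iterable of (day, place) pairs derived from connection records."""
    counts = Counter(place for _day, place in sightings)
    # most_common() sorts by visit count, highest first
    return [(place, n) for place, n in counts.most_common() if n >= min_visits]

sightings = [
    ("mon", "tower-12"), ("tue", "tower-12"), ("wed", "tower-12"), ("thu", "tower-12"),
    ("sun", "tower-40"), ("sun", "tower-40"), ("sun", "tower-40"),
    ("sat", "tower-77"),
]
regulars = frequent_places(sightings)
print(regulars)
```

Joining the frequent places against a public directory of what sits near each tower is the step that turns counts into the inferences described above.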

Betcha they collect more than just the raw call metadata, and collect metadata
on data connections. From that they can more easily determine the above, since
you don't even need to be on a call for a service to connect to the 'net.

------
higherpurpose
If it didn't, it would be even more troubling that the US uses phone metadata
to assign drone assassination targets, as if it wasn't troubling enough.

[https://firstlook.org/theintercept/article/2014/02/10/the-ns...](https://firstlook.org/theintercept/article/2014/02/10/the-nsas-secret-role/)

------
moron4hire
Don't mean to be rude, but literally thought this was common knowledge by now.
Clearly, that is my own failing for not understanding that The People != Me.

I think people are starting to get it, though, especially thanks to how creepy
targeted advertising is getting on Facebook and the like.

And that's kind of the thing, with all of the data that Facebook had on me,
they couldn't figure out to not send me advertisements for skeezy dating
services? That's the only solace I take from this story: it sucks that the
security services are collecting all of this data, but if I can't do anything
about it, it's good to know that they are drinking from a firehose.

~~~
jcromartie
Loads of "truthy" political bullshit that goes around on Facebook and email
chain letters is regarded as fact (by the test of gut feeling) by plenty of
voting adults.

There are different kinds of "common knowledge" among different groups of
people. What's common knowledge to a person immersed in cyberculture for
decades is going to be a world apart from what's common knowledge to a sales guy.

------
lalos
This is a good slide that demonstrates the power of metadata.
[http://pbs.twimg.com/media/BeMm9qJCEAEHu9Y.jpg](http://pbs.twimg.com/media/BeMm9qJCEAEHu9Y.jpg)

------
wbracken
Once again, if people are worried about NSA capturing cell phone meta-data,
please, go check out what the CFPB is doing with credit card data. We are
talking every transaction from millions of Americans.

[http://www.usnews.com/opinion/economic-intelligence/2014/02/...](http://www.usnews.com/opinion/economic-intelligence/2014/02/10/why-is-the-cfpb-collecting-so-much-credit-card-data)

------
waveman2
Protip: the government scans and keeps the exterior of all mail that is sent.
So, put the return address on the _inside_ of the envelope so that it is
harder to work out who is sending mail to whom.

------
epoxyhockey
I think it is very naive (bordering on denial) to think the NSA doesn't
record the contents of every phone call it can get its hands on, not just
meta-data.

~~~
alecdbrooks
Sure, but I think it's useful to show that, even taking them at their word,
metadata alone is dangerous.

