

Mining social networks - iamelgringo
http://www.economist.com/node/16910031

======
akshayubhat
Interesting. Sadly, it does not mention any practical tools for doing social
network analysis.

Here are a few that I know:

For small networks (up to a million or two million nodes, such as the
Wikipedia link graph from 2009), the following libraries provide code to
handle and manipulate network datasets:

1: SNAP by Prof. Jure Leskovec [ <http://snap.stanford.edu> ], written in C++

2: NetworkX from LANL [ <http://networkx.lanl.gov/> ], written in Python, esp.
good for fast prototyping

There are a few databases for storing networks, e.g. Neo4j <http://neo4j.org/> .

Additionally, there is a graph processing language called Gremlin
<http://wiki.github.com/tinkerpop/gremlin/> .

For large networks with millions or billions of nodes, one can use Hadoop /
MapReduce or Apache Hama [still at a nascent stage]. Google has a special
system known as Pregel which it uses to perform scalable computations over
large networks.
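
To give a flavour of the vertex-centric style that Pregel-like systems use, here is a minimal single-machine sketch in plain Python. It is only an illustration under my own assumptions, not Pregel's or Hama's actual API: the function name `connected_components` and the toy edge list are made up, and a real system would partition the vertices across many machines. In each round, every vertex whose label just changed sends it to its neighbours, and each vertex keeps the smallest label it has seen, so every connected component ends up labelled with its smallest node id.

```python
from collections import defaultdict


def connected_components(edges):
    """Label each node with the smallest node id in its component."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    label = {node: node for node in neighbours}
    active = set(neighbours)  # vertices whose label changed in the last round
    while active:
        # "Message passing": every active vertex sends its label to its neighbours.
        inbox = defaultdict(list)
        for u in active:
            for v in neighbours[u]:
                inbox[v].append(label[u])
        # "Compute": adopt the smallest label received; stay active only on change.
        active = set()
        for v, messages in inbox.items():
            best = min(messages)
            if best < label[v]:
                label[v] = best
                active.add(v)
    return label


# Hypothetical toy graph: one component {1, 2, 3} and one component {4, 5}.
edges = [(1, 2), (2, 3), (4, 5)]
labels = connected_components(edges)
print(labels)
```

The loop stops once no label changes, which is roughly the "vote to halt" idea: vertices that have nothing new to say stop sending messages.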

~~~
Elite
I'm watching one of the two-hour videos from the Stanford professor. He looks
like a college freshman! No offense intended.

Do you have any resources on how to use these tools (especially the Python
library), and examples of implementation? I'm interested in learning more and
using these tools on real data, but don't necessarily want to spend the time
learning all of the theory behind it.

~~~
akshayubhat
Yup, he is young. He did a postdoc for a year at Cornell [under Prof.
Kleinberg] after his PhD, and then directly became an Asst. Prof. at Stanford.

Well, you can start by reading this book
<http://www.cs.cornell.edu/home/kleinber/networks-book/> for an overview. As
far as applications of these techniques are concerned, you can look for
papers at recent WWW, NIPS, and ICML conferences. The most popular and
well-studied areas are Link Prediction and Community Detection. The SNAP
library comes with some good example code. You can also have a look at the
Divisi project at the MIT Media Lab if you are interested in reasoning /
analogy over networks.
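
As a rough illustration of what link prediction means in practice, here is a small sketch in plain Python that scores unlinked pairs by the Jaccard overlap of their neighbour sets. This is one common baseline, not something taken from SNAP or the book above, and the function name `jaccard_link_scores` and the toy friendship list are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score each unlinked pair by |common neighbours| / |all neighbours|."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        common = adj[u] & adj[v]
        if common:  # pairs with no shared neighbour are left out
            scores[(u, v)] = len(common) / len(adj[u] | adj[v])
    return scores


# Hypothetical toy friendship graph.
friendships = [
    ("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
    ("bob", "dan"), ("cat", "dan"), ("dan", "eve"),
]
scores = jaccard_link_scores(friendships)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # ('ann', 'dan'): they share two of their three neighbours
```

On real data you would hide some known edges, score the rest, and check how many hidden edges land near the top of the ranking.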

For real datasets, there are a lot of encyclopedic datasets such as Wikipedia
/ DBpedia / Semantic Web / MusicBrainz, as well as social ones such as the
Twitter follower network dataset. If you are in a university, you can even get
the full Web graph from Yahoo [for research use only].

------
iamwil
So this was back in 2008, but I read this from Bruce Schneier:

[http://www.schneier.com/blog/archives/2008/10/data_mining_fo...](http://www.schneier.com/blog/archives/2008/10/data_mining_for_1.html)

<quote>But the authors conclude the type of data mining that government
bureaucrats would like to do--perhaps inspired by watching too many episodes
of the Fox series 24--can't work. "If it were possible to automatically find
the digital tracks of terrorists and automatically monitor only the
communications of terrorists, public policy choices in this domain would be
much simpler. But it is not possible to do so."</quote>

So did something change in the last two years? I mean Palantir is making a
crap ton for presumably something similar.

Was this ever a problem, or was it overcome?

~~~
Alex3917
"Was this ever a problem, or was it overcome?"

The problems Schneier is talking about have very different characteristics
than the ones in this article. So the problems Schneier is talking about are
still very real. But solving little pieces of the problem is definitely
possible, especially with a good amount of human mediation.

~~~
doron
Human mediation is key: skilled intelligence analysis made by well-versed
operators. Even then it is very hard.

As one intelligence analyst once told me, "It is looking for a needle in a
stack of needles."

------
rjurney
Reading a social network analysis textbook, one is struck that all the
examples involve rigorous data collection by social scientists with
clipboards. Hundreds and thousands of hours of work for a small graph. Modern
social networks have given us an abundance of data against which to leverage a
backlog of techniques that were previously constrained by manual data
collection.

Phone companies use this stuff in ways that may be unethical, but it can just
as easily be used in ways to empower you to leverage your own network. There
are exciting potentials we are just beginning to see.

------
loup-vaillant
So, phone companies spy on us to make money, help the police (probably without
a warrant), and spread propaganda.

<sarcasm> How surprising. </sarcasm>

Seriously, it's not like we couldn't have foreseen:
[http://www.softwarefreedom.org/news/2010/feb/01/freedom-clou...](http://www.softwarefreedom.org/news/2010/feb/01/freedom-cloud-software-freedom-privacy-and-securit/)

------
theprodigy
This is why Facebook is worth billions. They can mine so much social data and
deliver targeted ads to the proper influencers.

------
dsplittgerber
Does anyone know of a secure, non-free alternative to Gmail, where one
wouldn't have to worry (as much) about having one's privacy violated?

~~~
cageface
Keep in mind that your privacy is only as secure as that of your
correspondents. I kind of gave up on the idea of keeping email private when I
realized that most of the people I write to are non-techy and use GMail,
Facebook, etc. naively.

~~~
evgen
This is only true if your correspondents are all using the same service. If
you use gmail then google knows the entirety of your email network. If you
host your own mail then google only knows your links to your correspondents
who use gmail, but not your links to those using hotmail, facebook, etc.
(unless that info is revealed to google by putting multiple recipients in the
To or Cc fields of the message).
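
A toy sketch of this argument in plain Python, under simplifying assumptions: a provider sees a message only if it hosts the mailbox of the sender or of a To/Cc recipient, and then it sees every To/Cc address on that message. Bcc, relays, and forwarding are ignored, and the function name `visible_links` and all addresses are made up.

```python
def visible_links(messages, provider):
    """Return the (sender, recipient) links that `provider` can observe."""
    seen = set()
    for sender, recipients in messages:
        participants = [sender, *recipients]
        # The provider handles the message if it hosts any participant's mailbox.
        if any(addr.endswith("@" + provider) for addr in participants):
            # The headers list every To/Cc address, so all of them are exposed.
            seen.update((sender, r) for r in recipients)
    return seen


# Hypothetical mail sent from a self-hosted address.
messages = [
    ("me@example.org", ["al@gmail.com"]),
    ("me@example.org", ["bo@hotmail.com"]),
    ("me@example.org", ["cy@gmail.com", "di@hotmail.com"]),
]
gmail_view = visible_links(messages, "gmail.com")
# gmail_view holds the links to al, cy and di, but not the one to bo.
print(sorted(gmail_view))
```

The link to `di@hotmail.com` leaks only because that message also went to a gmail address, which is the multiple-recipient case mentioned above.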

