
Stanford Large Network Dataset Collection - betolink
https://snap.stanford.edu/data/
======
themeek
A note here: SNAP at Stanford is funded by NSF grants through DARPA SMISC,
a DoD research program looking to get better at influencing social media
groups online for propaganda.

(Strategic Communication is the DoD's term, or one of them, for propaganda.)

[http://www.theguardian.com/world/2014/jul/08/darpa-social-ne...](http://www.theguardian.com/world/2014/jul/08/darpa-social-networks-research-twitter-influence-studies)

[https://www.fbo.gov/index?s=opportunity&mode=form&id=972cbc8...](https://www.fbo.gov/index?s=opportunity&mode=form&id=972cbc835c3702e9758aedcf032fb4ec&tab=core&_cview=1)

------
Smerity
If you're looking for even larger graph datasets, the team at
WebDataCommons[1] extracted hyperlink graphs from Common Crawl[2]. They're
available at both page and domain levels of granularity.

The page-level hyperlink graph for 2012 covers 3.5 billion web pages and 128
billion hyperlinks; the 2014 graph has 1.7 billion web pages connected by 64
billion hyperlinks.

[1]: [http://webdatacommons.org/hyperlinkgraph/](http://webdatacommons.org/hyperlinkgraph/)

[2]: [http://commoncrawl.org/](http://commoncrawl.org/)
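
For a sense of scale, 128 billion links over 3.5 billion pages is about 37 links per page (2012), and 64 billion over 1.7 billion is about 38 (2014). Graphs that size won't fit in memory as Python objects, so the usual approach is to stream the edge list. Below is a minimal sketch, assuming a gzipped text file with one whitespace-separated `source target` pair of integer node IDs per line and `#` comment lines (the convention SNAP's own edge lists use). The WebDataCommons arc files may be laid out differently, so check their download page first. The helper names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (source, target) pairs from whitespace-separated edge-list lines.

    ASSUMPTION: one edge per line; blank lines and '#' comment lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        yield int(src), int(dst)


def out_degree_stats(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Stream the edges once, returning (edge count, distinct sources, out-degrees)."""
    out_deg: Counter = Counter()
    n_edges = 0
    for src, _dst in iter_edges(lines):
        out_deg[src] += 1
        n_edges += 1
    return n_edges, len(out_deg), out_deg


def stats_from_gzip(path: str) -> Tuple[int, int, Counter]:
    # Decompress on the fly so the raw file is never loaded into memory at once.
    with gzip.open(path, "rt") as fh:
        return out_degree_stats(fh)


if __name__ == "__main__":
    sample = ["# FromNodeId\tToNodeId", "0\t1", "0\t2", "1\t2"]
    n_edges, n_sources, out_deg = out_degree_stats(sample)
    print(n_edges, n_sources, out_deg[0])  # 3 2 2
```

Only the edge list is streamed. The per-node `Counter` still grows with the number of distinct sources, so at full Common Crawl scale you would swap it for an array indexed by node ID or a disk-backed counter.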

------
turnersd
Sad to see the BeerAdvocate and RateBeer datasets were removed before I could
grab them.

[https://snap.stanford.edu/data/web-BeerAdvocate.html](https://snap.stanford.edu/data/web-BeerAdvocate.html)

[https://snap.stanford.edu/data/web-RateBeer.html](https://snap.stanford.edu/data/web-RateBeer.html)

~~~
ised
[http://jmcauley.ucsd.edu/cse255/data/beer/Ratebeer.txt.gz](http://jmcauley.ucsd.edu/cse255/data/beer/Ratebeer.txt.gz)
[http://jmcauley.ucsd.edu/cse255/data/beer/Beeradvocate.txt.g...](http://jmcauley.ucsd.edu/cse255/data/beer/Beeradvocate.txt.gz)
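
If you grab those dumps, here is a minimal reader sketch. It ASSUMES each review is a block of `key: value` lines (keys like `beer/name` or `review/overall`) with blank lines between reviews. That layout is an assumption, so look at the first few lines of the actual file before relying on it. The function names are hypothetical, and all values are kept as strings.

```python
import gzip
from typing import Dict, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group 'key: value' lines into dicts; a blank line ends a record.

    ASSUMPTION: one review per block of 'key: value' lines, blank-line separated.
    Only the first ':' splits key from value, so colons in review text survive.
    """
    record: Dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if record:
                yield record
                record = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    if record:  # the file may not end with a blank line
        yield record


def load_reviews(path: str) -> Iterator[Dict[str, str]]:
    # errors="replace" guards against stray bytes in scraped review text.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from iter_records(fh)


if __name__ == "__main__":
    # Made-up sample in the assumed format.
    sample = ["beer/name: Example Ale", "review/overall: 4", "",
              "beer/name: Other", "review/overall: 3.5"]
    rows = list(iter_records(sample))
    print(len(rows), rows[1]["review/overall"])  # 2 3.5
```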

------
amit_m
All of these datasets seem to be some kind of unweighted graph with no
additional information (except community information in some cases).

Does anyone know where one can find richer network datasets, i.e. graphs in
which the vertices have attributes?
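
To make "vertices with attributes" concrete, here is one minimal way to hold such a graph: an adjacency list plus a per-node attribute dict. It includes one example query, the share of edges whose endpoints agree on an attribute (a simple homophily measure). The class and method names are hypothetical and not tied to any SNAP file format.

```python
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class AttributedGraph:
    """Directed graph whose vertices carry a dict of attributes."""

    attrs: Dict[Hashable, Dict[str, Any]] = field(default_factory=dict)
    adj: Dict[Hashable, List[Hashable]] = field(default_factory=dict)

    def add_node(self, node: Hashable, **attributes: Any) -> None:
        # Merge new attributes into any the node already has.
        self.attrs.setdefault(node, {}).update(attributes)
        self.adj.setdefault(node, [])

    def add_edge(self, src: Hashable, dst: Hashable) -> None:
        # Endpoints not seen before are created with no attributes.
        self.add_node(src)
        self.add_node(dst)
        self.adj[src].append(dst)

    def edges(self) -> List[Tuple[Hashable, Hashable]]:
        return [(u, v) for u, nbrs in self.adj.items() for v in nbrs]

    def same_attribute_fraction(self, key: str) -> float:
        """Share of edges whose endpoints have equal values for `key`.

        Edges where either endpoint lacks the attribute are left out.
        """
        matched = total = 0
        for u, v in self.edges():
            a, b = self.attrs[u].get(key), self.attrs[v].get(key)
            if a is None or b is None:
                continue
            total += 1
            matched += a == b
        return matched / total if total else 0.0


if __name__ == "__main__":
    g = AttributedGraph()
    g.add_node(1, country="US")
    g.add_node(2, country="US")
    g.add_node(3, country="DE")
    g.add_edge(1, 2)
    g.add_edge(1, 3)
    g.add_edge(2, 4)  # node 4 has no "country", so this edge is skipped
    print(g.same_attribute_fraction("country"))  # 0.5
```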

~~~
achompas
Not sure I agree. For example, there's a patent citation set:

[https://snap.stanford.edu/data/cit-Patents.html](https://snap.stanford.edu/data/cit-Patents.html)

with included patent classification data. You can also join the WikiTalk set:

[https://snap.stanford.edu/data/wiki-Talk.html](https://snap.stanford.edu/data/wiki-Talk.html)

against Wikipedia data. You can obtain more node attributes for these (and
others) easily by joining against public sets.
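
A sketch of that kind of join, assuming a SNAP-style edge list (`#` comments, then `src<TAB>dst` integer IDs) and a separately obtained CSV that maps node IDs to one attribute. The CSV and its column names (`patent`, `category` below) are hypothetical stand-ins for whatever public table you join against. The example counts edges by (source attribute, target attribute), e.g. which patent categories cite which.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def load_node_attribute(rows: Iterable[str], id_col: str, attr_col: str) -> Dict[int, str]:
    """Read a CSV (with a header row) mapping integer node IDs to one attribute."""
    return {int(r[id_col]): r[attr_col] for r in csv.DictReader(rows)}


def category_pair_counts(edge_lines: Iterable[str], attr: Dict[int, str]) -> Counter:
    """Count edges by (source attribute, target attribute).

    Edges with an endpoint missing from `attr` are dropped, i.e. an inner join.
    """
    counts: Counter = Counter()
    for line in edge_lines:
        if not line.strip() or line.startswith("#"):
            continue
        src, dst = map(int, line.split()[:2])
        if src in attr and dst in attr:
            counts[(attr[src], attr[dst])] += 1
    return counts


if __name__ == "__main__":
    attrs_csv = ["patent,category", "1,chem", "2,chem", "3,elec"]  # made-up data
    edges = ["# FromNodeId\tToNodeId", "1\t2", "1\t3", "3\t4"]
    counts = category_pair_counts(edges, load_node_attribute(attrs_csv, "patent", "category"))
    print(counts[("chem", "elec")])  # 1
```

An inner join is the conservative choice here. A left join that labels unmatched endpoints as `"unknown"` would keep every edge, at the cost of a noisy catch-all bucket.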

------
zzleeper
Are there any large datasets out there representing n-partite networks? So
instead of people connecting with people, I mean e.g. edges between developers
and languages, or products and users, and so on.
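
If you do find bipartite data (say developer-language pairs), a common first step is to project it onto one side: link two developers when they share a language, weighted by how many they share. Below is a minimal sketch of that one-mode projection. The function name and sample data are hypothetical, and node labels must be mutually sortable (e.g. all strings) so that each pair gets one canonical order.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def project(edges: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], int]:
    """Project a bipartite edge list of (left, right) onto the left side.

    Two left nodes are linked when they share at least one right-side
    neighbour; the weight is the number of shared neighbours.
    """
    by_right: Dict[str, Set[str]] = defaultdict(set)
    for left, right in edges:
        by_right[right].add(left)
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for lefts in by_right.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(lefts), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    dev_lang = [("alice", "python"), ("bob", "python"),
                ("alice", "rust"), ("bob", "rust"), ("carol", "rust")]
    print(project(dev_lang)[("alice", "bob")])  # 2
```

The projection is quadratic in the degree of each right-side node, so a very popular item (one language with millions of developers) dominates the cost. People often cap or down-weight such hubs before projecting.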

------
alistproducer2
I'm just getting into machine learning, so I'm looking forward to practicing on
these datasets. Thanks for sharing.

------
elihu
The availability of real-world signed network datasets is really great. I've
used the Stanford Large Network Dataset Collection in the past to test
predictive accuracy in reputation systems. (Looks like they added a new
dataset to the "signed" category: Wikipedia Requests for Adminship.)
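
As an illustration of testing predictive accuracy on a signed network, here is a simple baseline. It is not the commenter's method. It holds out random edges and predicts each held-out sign from the target node's share of positive incoming edges in the rest. It ASSUMES three whitespace-separated integer columns per line (source, target, sign of +1 or -1) with `#` comment lines, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int, int]  # (source, target, sign in {+1, -1})


def parse_signed(lines: Iterable[str]) -> List[Edge]:
    """Parse 'src dst sign' lines, skipping '#' comments and blank lines."""
    edges: List[Edge] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        src, dst, sign = map(int, line.split()[:3])
        edges.append((src, dst, sign))
    return edges


def in_degree_baseline_accuracy(edges: List[Edge], test_frac: float = 0.2, seed: int = 0) -> float:
    """Hold out a random fraction of edges and predict each held-out sign from
    the target's majority incoming sign in the remaining (training) edges.
    Targets unseen in training fall back to the global majority sign."""
    shuffled = edges[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    test, train = shuffled[:n_test], shuffled[n_test:]

    pos: Dict[int, int] = defaultdict(int)
    tot: Dict[int, int] = defaultdict(int)
    for _src, dst, sign in train:
        tot[dst] += 1
        pos[dst] += sign > 0
    global_sign = 1 if 2 * sum(pos.values()) >= len(train) else -1

    correct = 0
    for _src, dst, sign in test:
        if tot[dst]:
            pred = 1 if 2 * pos[dst] >= tot[dst] else -1
        else:
            pred = global_sign
        correct += pred == sign
    return correct / len(test)


if __name__ == "__main__":
    # Made-up network: node 1 is always trusted, node 2 always distrusted.
    toy = [(i, 1, 1) for i in range(10)] + [(i, 2, -1) for i in range(10)]
    print(in_degree_baseline_accuracy(toy))  # 1.0
```

Real signed networks are mostly positive, so accuracy should be compared against the trivial "always +1" predictor rather than read on its own.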

