

Massive scrape of the Twitter friend graph - Harkins
http://groups.google.com/group/get-theinfo/browse_thread/thread/605a00d5ddc62d72

======
markbao

      username: 'theinfo.org' 
      ... the password is the ramanujan taxicab number followed by the word 
      'kennedy', all one word.
    

Wait, what?

Anyway, it's 1729.
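
For anyone who wants to check rather than look it up, here is a minimal brute-force sketch. It assumes "taxicab number" means the smallest positive integer that is a sum of two positive cubes in two different ways; the function names and the search bound `limit` are just illustrative.

```python
from collections import defaultdict


def cube_sum_pairs(limit):
    """Map each n = a^3 + b^3 (1 <= a <= b <= limit) to its list of (a, b) pairs."""
    sums = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            sums[a**3 + b**3].append((a, b))
    return sums


def smallest_two_way_cube_sum(limit=20):
    """Return (n, pairs) for the smallest n with at least two representations.

    The answer is only trustworthy when n <= limit**3, because every
    representation of such an n uses cubes no larger than limit**3.
    """
    sums = cube_sum_pairs(limit)
    n = min(total for total, pairs in sums.items() if len(pairs) >= 2)
    return n, sums[n]


number, pairs = smallest_two_way_cube_sum()
password = f"{number}kennedy"  # "all one word", per the quoted instructions
```

Here `pairs` comes back as the two decompositions, 1^3 + 12^3 and 9^3 + 10^3.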

~~~
mechanical_fish
<http://mathforum.org/library/drmath/view/52600.html>

------
reconbot

        Authorization Required
    
        This server could not verify that you are authorized to access the document requested. Either you supplied the 
        wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.
    
        Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
    

Something doesn't understand how to supply the credentials... it's probably me.

~~~
kirubakaran
If your browser doesn't prompt you for username and password, you can supply
them in the address bar:

<http://username:password@website.com/>

So, in this case:

[http://theinfo.org:1729kennedy@infochimp.info/ics/data/arch/...](http://theinfo.org:1729kennedy@infochimp.info/ics/data/arch/social/network/twitter_friends/)
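
If you'd rather script it than use the address bar, the same `username:password@` URL can be turned into an HTTP Basic `Authorization` header. This is only a sketch using the Python standard library: `basic_auth_header` is a made-up helper name, it assumes the server uses Basic auth (which the "Authorization Required" page suggests but does not prove), and it builds the header without sending any request.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(url):
    """Split user:password out of a URL and build a Basic auth header.

    Returns the URL with the credentials removed, plus a headers dict.
    """
    parts = urlsplit(url)
    credentials = f"{parts.username}:{parts.password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    # Rebuild the network location without the userinfo part.
    host = parts.hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    clean_url = parts._replace(netloc=host).geturl()

    return clean_url, {"Authorization": f"Basic {token}"}


URL = ("http://theinfo.org:1729kennedy@infochimp.info"
       "/ics/data/arch/social/network/twitter_friends/")
clean_url, headers = basic_auth_header(URL)
```

You could then pass `clean_url` and `headers` to whatever HTTP client you like, e.g. `urllib.request.Request(clean_url, headers=headers)`.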

------
symptic
Excellent! I've met with Philip (I'm redesigning his division's website here
at the University of Texas) and he told me about this project. I didn't expect
it to be mobilized this soon.

Sounds promising. :)

------
mattjaynes
Awesome - I've been playing with CouchDB, and since the raw data is in JSON,
I'm gonna try loading it in and running some experimental map/reduce views
on the data. Thanks!
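
For a rough idea of what such a view computes, here is a plain-Python imitation of the map/reduce pattern. It is not CouchDB code (real views are usually JavaScript functions that call `emit`), and the record layout with `user` and `friends` fields is purely hypothetical, since the actual schema of the dump isn't described here.

```python
import json
from collections import defaultdict


def map_friend_count(doc):
    """Map step: emit (user, number of friends) for one document."""
    yield doc["user"], len(doc["friends"])


def reduce_sum(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


# Hypothetical sample records, one JSON object per line.
raw_lines = [
    '{"user": "alice", "friends": ["bob", "carol"]}',
    '{"user": "bob", "friends": ["carol"]}',
    '{"user": "alice", "friends": ["dave"]}',
]
docs = [json.loads(line) for line in raw_lines]
friend_counts = run_view(docs, map_friend_count, reduce_sum)
```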

------
tlrobinson
Nice. But 10 million tweets? That's a few days' worth. What's the point?

~~~
symptic
The point is to be a sort of Google algorithm for Twitter. This is plenty of
data to at least get a very solid idea of who the top Tweeters are based on
their connections, influence, and popularity.

Also, keep in mind he scraped the TOP Twitter users (those with X+ followers).
A lot of Twitter's tweets likely come from users under that threshold, so
leaving them out saves time, storage space, and effort.
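
To make "a sort of Google algorithm" concrete, here is a toy sketch that assumes it means something PageRank-like run over the follow graph. That is my reading, not a description of what the project actually does; the user names, the damping factor of 0.85, and the iteration count are all illustrative.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over a follow graph.

    `follows` maps each user to the list of users they follow, so rank
    flows from a follower to the accounts they follow.
    """
    users = set(follows) | {v for targets in follows.values() for v in targets}
    n = len(users)
    rank = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Every user gets the same baseline share each round.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A user who follows nobody spreads their rank evenly.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical four-user graph.
toy_graph = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol", "alice"],
}
ranks = pagerank(toy_graph)
top_user = max(ranks, key=ranks.get)
```

In this toy graph `carol` ends up on top because the most follows point at her, including one from the next-highest-ranked user.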

