

Github Archive - ig1
http://www.githubarchive.org/

======
danso
Dumb question, but is the code behind the archive open source? I didn't see
any link but did see the star button which...

Ok...I see the star button serves as a link
<https://github.com/igrigorik/githubarchive.org/>

It's great to have other crawler projects to learn from

~~~
PanMan
It is a great project. However, it's not based on crawling, as far as I
understand: GitHub pushes their data to Google's BigQuery tool, and this
project gets its data from there.

~~~
briandoll
Correction: We (GitHub) do not push data to BigQuery. Ilya's project archives
the GitHub public timeline. That data is then uploaded from his archive into
BigQuery. The BigQuery project is public, so anyone can use it.

~~~
PanMan
Are there any docs on the GitHub public timeline? I found
<http://github.com/timeline>, which made me guess at
<http://github.com/timeline.json>, which works, but is this documented
anywhere? Are there parameters? How often does it refresh? Thanks!
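
For anyone else poking at that endpoint, here is a minimal sketch of polling it and keeping only events not seen on an earlier poll. It assumes the response body is a JSON array of event objects, which I have not seen documented. Since I don't know which fields identify an event, it fingerprints each event by hashing its whole JSON form. The helper names (`event_key`, `new_events`, `fetch_timeline`) are made up for illustration, and the URL may not respond at all.

```python
import hashlib
import json
import urllib.request

# URL taken from the comment above; whether it still responds is not guaranteed.
TIMELINE_URL = "http://github.com/timeline.json"


def event_key(event):
    """Stable fingerprint of one event: SHA-1 of its canonical JSON form."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def new_events(events, seen):
    """Return the events not seen before and record their keys in `seen`."""
    fresh = []
    for event in events:
        key = event_key(event)
        if key not in seen:
            seen.add(key)
            fresh.append(event)
    return fresh


def fetch_timeline(url=TIMELINE_URL, timeout=10):
    """Fetch and decode the timeline. Assumes the body is a JSON array."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `new_events(fetch_timeline(), seen)` in a loop with a shared `seen` set and a polite sleep between polls would give a rough stream of fresh events. How long to sleep depends on the refresh rate, which is exactly the part I can't find documented.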

------
Cyranix
I haven't had to deal with this scenario, thankfully, but I seem to recall
dimly that there is some mechanism to rescue GitHub users who accidentally
publish passwords and access keys. Does the archive respect this same
mechanism? (Or did I dream up the mechanism in the first place?)

~~~
omra
There certainly is a mechanism [0]; however, it doesn't completely remove
the commit from the remote GitHub repo, as people with the direct link can
still access it. I'll check to see if I can find removed commits using this
archive.

[0] <https://help.github.com/articles/remove-sensitive-data>

