
GitHub buries 21 TB of open-source code in Arctic vault for 1k years - seesawtron
https://github.blog/2020-07-16-github-archive-program-the-journey-of-the-worlds-open-source-code-to-the-arctic/
======
timsally
GitHub's site for this has much more detail on the technical aspects:
[https://archiveprogram.github.com/](https://archiveprogram.github.com/).
Their data archival goals are even more ambitious than burying it in the
Arctic. _The GitHub Archive Program is partnering with Microsoft’s Project
Silica to ultimately archive all active public repositories for over 10,000
years, by writing them into quartz glass platters using a femtosecond laser._

Worth considering swapping the submitted link out for the original source, per
the HN guidelines. I would suggest the original blog post:
[https://github.blog/2020-07-16-github-archive-program-the-journey-of-the-worlds-open-source-code-to-the-arctic/](https://github.blog/2020-07-16-github-archive-program-the-journey-of-the-worlds-open-source-code-to-the-arctic/).

~~~
seesawtron
How does one swap the link on HN? One can only edit the title text.

~~~
jagged-chisel
Admins (dang, for example) will occasionally notice a request like this and
make the change.

~~~
seesawtron
Great, thanks for sharing.

------
orbital-decay
This is a bit more than the simple PR move I initially took it for after
reading the article submitted in the OP. They take steps to ensure redundancy,
recoverability, durability, and much more, and they work with experts who know
the problem inside out.

Still, the main question remains. Computer programs are extremely volatile by
nature: they are written for constantly changing platforms and media, and need
huge amounts of context to be useful to future generations. Will a snapshot of
the Linux kernel be useful in 100 years? 1,000 years? After some global
disaster? Even if you don't intend to run it (which requires archiving the
current state of the supported platforms and much more), how much additional
work will be needed simply to read and study it?

> _After the Challenger disaster, a hunt for the blueprints of the abandoned
Saturn V rocket ensued. They were largely recovered, thanks to the work of
archivists._

Yet nobody can build another Saturn V today with these blueprints. Any complex
product is much more than a set of blueprints; it's an organized process that
depends on a particular team of people, a supply chain, existing
infrastructure, and much, much more. Useful products only exist while they are
supported.

~~~
ben0x539
idk about you but if I were a hyper-advanced singularity hive mind entity from
the year 12,020, I'd get a huge kick out of scrolling through some archive of
those adorable entry-level not-even-sapient programs our ancestors wrote in
antiquity.

~~~
PopeDotNinja
I bet in 12,020, we'll still be waiting for Ruby 3.

~~~
nurettin
Don't know what features you're looking for, but the roadmap in their issue
tracker says it is due in about 5 months:
[https://bugs.ruby-lang.org/projects/ruby-master/roadmap#3.0](https://bugs.ruby-lang.org/projects/ruby-master/roadmap#3.0)

~~~
PopeDotNinja
Mostly joking. It's just been in the works for a while now.

------
blocked_again
What happens if someone files a DMCA takedown against one of the repositories
in the vault?

~~~
HenryKissinger
The polar bears will keep the lawyers away.

------
boarnoah
Somewhat unrelated, but I've always had this concept of documenting everyday
life down to the most mundane of details (by today's standards).

Archived for a few hundred years, that could be pretty interesting for future
historians.

Even with how digitized everything is, there are definitely a lot of things
that are undocumented (at least consistently, in one place) or left out as too
obvious.

~~~
freehunter
Agreed. Far too often in history books/podcasts/shows you see guesses as to
what normal life was like, but they’re just guesses. No one bothered writing
down exactly what the average Roman wore to war, how they lined up, or what
their tactics were, because the contemporary audience already knew. It would
have been considered boring and too much information. But 2,000 years later,
we’re still guessing what the Gallic Wars looked like, sounded like, smelled
like, because it wasn’t important enough to write down.

We still have some of these same issues today, but given the rate of change in
technology, we’re talking years instead of centuries or millennia. What was
the average life like for a startup employee during the dot-com era
(1995-2001)? I have no idea, because I was too young to experience it and I’ve
never been able to find any books or accounts of average daily work life even
20 years ago.

~~~
reaperducer
_Far too often in history books/podcasts/shows you see guesses as to what
normal life was like_

The problem is that history books/podcasts/shows are not intended to reveal
that level of granularity about life. They exist to provide a summary within a
specific set of boundaries (printed pages, run time, etc.).

If you want to know the nitty-gritty, you need to start with the source
material. There's more source material on daily life available in the world's
museums and university archives than you could ever digest in a lifetime.
That's why it's summarized in the above-mentioned forms.

~~~
freehunter
Not true at all. Listen to The History of Rome and you’ll hear historian Mike
Duncan constantly saying “we don’t know how this happened because no one ever
wrote it down.” Dan Carlin says the same thing a lot in Hardcore History. If
it were just a matter of too much detail, they would have said “this detail
doesn’t matter” or just bypassed it.

It’s not that the medium isn’t granular enough; it’s legitimately that no one
wrote this stuff down, so it’s just gone forever. There is no source material
to start with.

------
akerro
This archive contains JHipster and OpenAPI, but I'm failing to understand one
thing. Does it also contain OSes, dependencies, compilers, linkers, etc.?
What's the point of just storing JHipster without its Node and Maven
dependencies? Why store code you won't be able to compile and execute?

~~~
ebg13
> _Why store code you won't be able to compile and execute?_

The code itself is a manifestation of human creative expression and part of
our cultural heritage even without executing it.

> _Does it also contain OSes, dependencies, compilers, linkers, etc.?_

If they are free/open-source software (in this specific case, on GitHub), then
yes. See also their partner
[https://www.softwareheritage.org/mission/](https://www.softwareheritage.org/mission/),
which archives more than just GitHub.

------
amelius
Did they include all dependencies?

~~~
seesawtron
It's a "snapshot". In python one has to import a module during run time but
for example, in matlab code, the modules and their classes/functions should
already exists within the snapshot directories as submodules. I am not sure
about other languages how this works.
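To illustrate the Python side: a snapshot can only stand alone if its
dependencies were vendored into the repo tree rather than fetched at install
time. A rough sketch ("repo/vendor" and "somelib" are hypothetical names, just
to show what self-contained means):

    import importlib.util
    import pathlib
    import sys

    # Hypothetical vendored-dependency directory inside an archived repo tree.
    vendor = pathlib.Path("repo") / "vendor"
    sys.path.insert(0, str(vendor))

    # A run-time import only works if the dependency was vendored into the
    # snapshot; find_spec() returns None when it would have to come from a
    # package index like PyPI instead.
    print(importlib.util.find_spec("somelib"))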

------
aeyes
This is the guide to read the data:
[https://github.com/github/archive-program/blob/master/GUIDE.md](https://github.com/github/archive-program/blob/master/GUIDE.md)

You need to know how tar, lzma, UTF-8, and git work. Judging by how hard it is
to decipher ancient texts, I'd be surprised if anybody were able to read the
code without additional information on these subjects. Storing the binary git
repo as-is, especially, will make it very hard.
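To make the layering concrete, a minimal Python sketch of the first two steps
(the file name is a placeholder; the actual layout is what the GUIDE
describes):

    import lzma
    import tarfile

    # "repo.tar.lzma" is a placeholder name for one archived repository.
    with lzma.open("repo.tar.lzma") as stream:  # undo the lzma layer
        with tarfile.open(fileobj=stream, mode="r|") as tar:  # undo the tar layer
            tar.extractall("repo")

Even after both layers are unpacked, what's on disk is a git repository, so
actually reading the source still means reimplementing git's object storage on
top of that.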

------
gkoberger
Here's the original article:
[https://www.bloomberg.com/news/features/2019-11-13/microsoft-apocalypse-proofs-open-source-code-in-an-arctic-cave](https://www.bloomberg.com/news/features/2019-11-13/microsoft-apocalypse-proofs-open-source-code-in-an-arctic-cave)

One of the fascinating parts of this, to me, was that the whole thing was
about permanence... and at the same time, Nat's home burnt down.

------
tra3
The whole of GitHub is only 21 TB? Wild.

I now have a permanent, inaccessible backup of my code. Also wild.

~~~
fit2rule
Yeah, that's kind of the rub. What if I don't want to be remembered in the
future?

This sort of seems cynical on GitHub's part. The right to be forgotten has
just been irreversibly denied to some of us.

I know for a fact that a recently deceased friend wanted his repos nuked. I
wonder if this is going to be taken into account somehow. One can assume the
laser can be used to burn nulls where needed, too...

------
supernova87a
I would love to know what kind of plastic film medium they tested and used to
make sure it wouldn't delaminate/deplasticize/etc. over 1,000 years...

~~~
runj__
The testing done for the ISO standard seems to be based on irradiation (1). It
seems fairly straightforward, although I'm not sure it's tested against all
kinds of radiation, etc.; but at a specific temperature and light level, it
looks like very few other things would matter.

(1)
[https://www.hindawi.com/journals/jpol/2016/8547524/](https://www.hindawi.com/journals/jpol/2016/8547524/)

------
jarmitage
Interesting to think about this alongside Long Tien Nguyen and Alan Kay's
paper "The Cuneiform Tablets of 2015" (2015)

[https://news.ycombinator.com/item?id=11510649](https://news.ycombinator.com/item?id=11510649)

[http://www.vpri.org/pdf/tr2015004_cuneiform.pdf](http://www.vpri.org/pdf/tr2015004_cuneiform.pdf)

------
threeseed
Sounds like a great project.

I hope in the year 3020 they will have invented the advanced technology that
allows a company like GitHub to sort its repositories by name.

------
m0zg
You can actually see if the code you contributed made it there. I have an
"Arctic Code Vault Contributor" badge in my profile now.

------
pizza
Hope there'll be plenty of decent microfilm readers in the
post-civilizational-collapse future.

~~~
noir_lord
The technology to read microfilm is relatively primitive (I mean, you still
require lenses and a bright light source, but you don't require a computer or
advanced electronics).

------
annadane
"Ha! Microsoft will embrace open-source when hell freezes over"

------
seesawtron
Would future generations be able to run this code on future hardware,
development software, or programming languages? That would be a challenging
test of backward compatibility.

~~~
jalk
We emulate a lot of old hardware today, so the future can hopefully do so too.
The future might, however, be astonished at just how many web frameworks were
“needed”.

~~~
seangrogg
By then they'll wonder how we did anything useful online in under 8 petabytes,
given that by their time every site registered within the multiverse-DNS must
include mandated state surveillance payloads.

And because ads.

------
rejschaap
Taking cold storage to another level.

Short video about the project:

[https://www.youtube.com/watch?v=fzI9FNjXQ0o](https://www.youtube.com/watch?v=fzI9FNjXQ0o)

------
bori
Vikings bury axes in Arctic ice for 1k years

~~~
aabhay
Apes bury coconuts in arctic ice for 1k years

------
sebazzz
For good measure, they should have archived npmjs, NuGet, and popular SDKs as
well.

My source code is worthless without them.

------
tapatio
Please don't store my garbage code from my teens for 1000 years.

------
coold
node_modules should keep the ice together

what about water chips?

------
fortran77
Why not 2,000 years?

------
mamon
How do I create a mirror of Github?

~~~
sdesol
You can't, unless your question is "how do I mirror a GitHub repo?"

I'm currently indexing thousands of GitHub repos for analytics reasons, and if
you don't play nice (fetch infrequently), GitHub will throttle or terminate
your connection.
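For the narrow version of the question, a rough sketch (the repo list is
illustrative, and the sleep is just one crude way of playing nice):

    import subprocess
    import time

    # Illustrative list; mirroring a handful of repos is feasible,
    # mirroring all of GitHub is not.
    repos = [
        "https://github.com/github/archive-program.git",
    ]

    for url in repos:
        # --mirror copies every ref, giving a complete backup of one repo.
        subprocess.run(["git", "clone", "--mirror", url], check=True)
        time.sleep(60)  # fetch infrequently to avoid being throttled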

~~~
threeseed
GitHub isn't the first company to do this.

You just need to use one of the many scraping proxy services which will give
you a HTTP proxy that cycles through millions of IP addresses. It's impossible
to block since they are largely residential IPs controlled by malware.

It definitely works but you are encouraging them which isn't a nice feeling.

------
dutch3000
waste of time

------
mbostleman
So, something on the order of 1,000 JS front-end frameworks from now.

------
seesawtron
Does anyone know if they archived GitLab code as well? Edit: They didn't,
because GitHub and GitLab have different management.

~~~
andy1729
The archive doesn't include code from Gitlab as GitHub and Gitlab both are
different entity managed by different group of people!

