
Archive Program director here - it's really not a PR stunt; we genuinely believe it will be of significant historical value, and that there is quite a good chance it will be of practical value.

Much of that practical value lies in the case where we forget a technology and then realize, somewhere down the road, that we actually want to use it again. History provides plenty of examples of this, and it's particularly important with a technology which mostly lives on ephemeral media that only last a few decades.

Even if you do expand your speculation to post-disaster scenarios, though, while it's true the archive wouldn't be an instant reset button, it would help greatly accelerate the recovery of technology. It's worth noting that it will come with a slew of (human-readable, not encoded) technical works regarding subjects ranging from modern software engineering to microprocessor design to photolithography to power systems, which we call the Tech Tree, along with a guide and index to all the stored repos. Wherever its inheritors / discoverers may be in terms of technological advancement, and especially if they have modern-ish hardware (which can last much, much longer than most storage media), recovering the archive's contents will be a lot faster than rediscovering them from scratch.

(Also worth noting we'll be storing "greatest hits" copies of the ~15,000 most-starred / most-relied-on repos, along with a sampling of several thousand repos with few/no stars, in a selection of places like Oxford's Bodleian Library; our hypothetical future tech seekers won't have to go all the way to Svalbard for those.)

I don't want to stress the doomsday scenarios too much, though, despite our ongoing pandemic. I think the most likely outcome by far is that progress will continue; the archive may be useful to recover a couple of otherwise forgotten technologies that suddenly become important / interesting; and it will ultimately be chiefly of interest to historians. That historical value is a key reason why it casts such a broad net. I too have a couple of fairly unsophisticated pet projects in there that the future won't be interested in individually - but collectively is another matter. One of the most interesting things our advisory committee told us is that history is replete with lists composed by wealthy people of the books they thought most important, carefully preserved for posterity, whereas what modern historians _really_ want is ordinary people's shopping lists, of which almost none survived. That's one reason there are millions of repos in the Arctic now, instead of eg just the most-starred 100K: some of those may be the modern technological equivalent of Renaissance shopping lists, for the historians who may take a particular interest in this (possibly) especially wacky and volatile era.

I know it's an inherently cinematic and dramatic project and so it's tempting to call it a PR stunt ... but I assure you, it's not, and, speaking personally, I would never have gotten involved with it if I thought it was.




People have some legitimate and some less legitimate criticisms here, in the HN comments section of course, but I for one think this is a fantastic effort and I'm pleasantly surprised to read what the new badge I saw on my profile yesterday is actually about.

There will always be "negative Nancies" -- especially here, they are everywhere -- but personally I'd just like to say thanks for having some vision outside of the normal day-to-day of making money for shareholders and keeping regular customers happy. More of this, please.


Did people with repositories know this was going to happen and did you give them a choice to opt out?


Rather more eloquently asked than by the other person I saw querying this[0]! I suspect it's covered under GitHub's TOS[1]: specifically, only public repositories were included, and these are all effectively just backups, especially in the case of the vault in Svalbard. But you can opt out of the 'warm storage'[0].

[0] https://github.com/github/archive-program/issues/36 [1] https://docs.github.com/en/github/site-policy/github-terms-o...


I recognize they wouldn't have done it unless they felt confident of having the legal right, but it's just bad manners not to ask first.

If that's the case, this not-a-PR stunt degraded my impression of them.

I'm quite certain this isn't what their customers contemplated when reading "backup" in their ToS.

EDIT: Interestingly, it says "This license does not grant GitHub the right to sell Your Content or otherwise distribute or use it outside of our provision of the Service."

It also says "You still have control over your content".

Is an Arctic vault really within the ordinary course of providing the service? Did content owners have an opportunity to exert any control?

Most probably think it's neat, but GitHub would be naive to imagine everyone would consent.

Also what happens if it turns out one of those repos had personal information in it and the subject makes a GDPR right-to-forget demand? Are they going to drag it out and purge that bit of tape?


>Also what happens if it turns out one of those repos had personal information in it and the subject makes a GDPR right-to-forget demand? Are they going to drag it out and purge that bit of tape?

I believe GDPR has exemptions for archives ([0] section 28) so that's less of a concern for them I imagine. I recognise what you're saying, but I think anyone _very_ opposed would have a difficult time in court arguing GitHub should remove their work/name/etc. My (very loose) understanding of the law is that they would have to demonstrate some kind of loss. That being said, GitHub could just have sent a notification email with very little effort. Maybe 'no harm, no foul' applies here?

[0] https://www.legislation.gov.uk/ukpga/2018/12/schedule/2/part...


Hi Jon, Congratulations on moving forward with this. Thank you! If you ever think about what might come next in terms of being able to re-make computers and so on from scratch, here is a concept website I put up around 1999 (when I was trying to get NASA to support the work for space settlements). I still work on the general idea on-and-off in my spare time (generally at a more abstract level of software for sensemaking and organizing information) but so many other distractions get in the way: https://www.kurtz-fernhout.com/oscomak/goals.htm

From there: "The OSCOMAK project is an attempt to create a core of communities more in control of their technological destiny and its social implications. No single design for a community or technology will please everyone, or even many people. Nor would a single design be likely to survive. So this project endeavors to gather information and to develop tools and processes that all fit together conceptually like Tinkertoys or Legos. The result will be a library of possibilities that individuals in a community can use to achieve any degree of self-sufficiency and self-replication within any size community, from one person to a billion people. Within every community people will interact with these possibilities by using them and extending them to design a community economy and physical layout that suits their needs and ideas. As the internet has grown, it has enabled collaborative work which has created many success stories, including Linux, Python, GCC, Squeak and other projects. We want to harness that power and apply it to organizing technological knowledge in concert with many interested individuals. The main project goal is to develop an on-line library of technology ideas, techniques, and tools, including a range from high-tech processes like plastics to medium-tech like ceramic houses to low-tech like spinning wheels. Also included will be biotechnology processes, like perennial agriculture, companion planting, sheep farming, and eventually cloning and DNA synthesis. One process to be included is a way to convert the high-tech computerized library to a low-tech paper one as desired. Key to the whole endeavor will be to present everything in a how-to fashion. Also needed is a way to map out and simulate the interrelations of processes; for instance, sheep raising requires veterinarians, antibiotics, feed, fencing, and shears; shears require a blacksmith, metal, and a furnace. 
This latter feature also would be used to keep track of the product flows into, out of, and within a community's entire economy."
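
The process map described in that quote can be sketched as a small dependency graph. This is only a minimal illustration, not anything from the OSCOMAK project itself: the `REQUIRES` table and the `all_requirements` helper are hypothetical names, and the data is just the sheep/shears example from the quoted text.

```python
from collections import deque

# Hypothetical data: direct requirements, copied from the quoted example.
REQUIRES = {
    "sheep raising": ["veterinarians", "antibiotics", "feed", "fencing", "shears"],
    "shears": ["blacksmith", "metal", "furnace"],
}


def all_requirements(process, requires=REQUIRES):
    """Return every direct and indirect requirement of `process`."""
    seen = set()
    queue = deque([process])
    while queue:
        current = queue.popleft()
        # Items with no entry in the table are treated as having no requirements.
        for dep in requires.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    for item in sorted(all_requirements("sheep raising")):
        print(item)
```

Walking the graph breadth-first is what surfaces the indirect needs: asking for "sheep raising" also returns the blacksmith, metal, and furnace that the shears depend on.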


> Also worth noting we'll be storing "greatest hits" copies of the ~15,000 most-starred / most-relied-on repos, along with a sampling of several thousand repos with few/no stars

Making all of this code essentially useless. You'd need to store those repos and their entire dependency tree.



