
Building a Universal Archive of Source Code - rutenspitz
https://cacm.acm.org/magazines/2018/10/231366-building-the-universal-archive-of-source-code/fulltext
======
kragen
Archiving software is a problem we don't have an adequate solution to yet.
Part of the problem is that software is so interdependent: my small program
depends not only on the compiler that compiles it, but also the kernel it runs
on, the libraries it links with, and even the I/O peripherals it interacts
with.

Lorie proposed a "universal virtual computer", or UVC, to solve this problem,
but it omitted the description of the peripherals (which, I'm told, was the
hardest problem in bringing up the Spacewar emulator) and additionally has
such a loosey-goosey definition that no reimplementation from the spec could
conceivably be compatible with any other implementation.

Chifir, from Nguyen and Kay's paper, is a proposed solution to this problem
for _one single_ computer. You can implement the CPU part of Chifir in
under an hour. We can probably come up with a better bootstrappable CPU;
Chifir shows a promising direction to go in.
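
To give a feel for why a CPU like this fits in an afternoon, here is a minimal sketch of a toy machine in the same spirit: word-addressed memory and fixed four-word instructions (opcode, A, B, C). The opcode numbers, the `run` function, and the halt instruction are my own illustrative assumptions, not Chifir's actual instruction set, and there is no screen or keyboard here; check the paper for the real definitions.

```python
def run(memory, max_steps=10_000):
    """Run a toy machine with 4-word instructions; returns the memory list.

    Opcodes are hypothetical (NOT the real Chifir numbering):
      0 halt, 1 jump, 2 branch-if-zero, 3 copy, 4 add, 5 subtract.
    """
    pc = 0
    for _ in range(max_steps):
        op, a, b, c = memory[pc:pc + 4]
        pc += 4
        if op == 0:                      # halt
            return memory
        elif op == 1:                    # pc <- M[a]
            pc = memory[a]
        elif op == 2:                    # if M[b] == 0 then pc <- M[a]
            if memory[b] == 0:
                pc = memory[a]
        elif op == 3:                    # M[a] <- M[b]
            memory[a] = memory[b]
        elif op == 4:                    # M[a] <- M[b] + M[c]
            memory[a] = memory[b] + memory[c]
        elif op == 5:                    # M[a] <- M[b] - M[c]
            memory[a] = memory[b] - memory[c]
        else:
            raise ValueError(f"unknown opcode {op} at address {pc - 4}")
    raise RuntimeError("step limit exceeded")


# Multiply 6 by 7 using repeated addition. Code occupies words 0..19,
# data occupies words 20..25.
PROGRAM = [
    2, 20, 22, 0,    # 0:  if M[22] (counter) == 0, jump to M[20] (= 16)
    4, 25, 25, 23,   # 4:  M[25] (accumulator) += M[23]
    5, 22, 22, 24,   # 8:  M[22] (counter) -= M[24] (the constant 1)
    1, 21, 0, 0,     # 12: jump to M[21] (= 0), the top of the loop
    0, 0, 0, 0,      # 16: halt
    16,              # 20: address of the halt instruction
    0,               # 21: address of the loop start
    7,               # 22: counter
    6,               # 23: value added on each pass
    1,               # 24: constant 1
    0,               # 25: accumulator (result)
]

result = run(list(PROGRAM))[25]
print(result)
```

The whole interpreter is one loop and a handful of branches, which is the property that makes a machine like this plausible to reimplement from a written spec.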

The biggest serious project in the direction of an archive of _buildable_
source code is Debian. Thousands of volunteers work constantly to ensure that
every version of every package can be built from source at the time, and it's
mirrored all over the planet to protect against accidental loss. Unfortunately,
Debian doesn't have a commitment to maintain compilability of old versions,
although they do retain the source code.

------
lioeters
Here's the archive itself - apparently including all public GitHub repos:

[https://archive.softwareheritage.org/](https://archive.softwareheritage.org/)

------
ktpsns
That's funny, I first thought about paper.
[http://www.bitsavers.org/](http://www.bitsavers.org/) is a famous archive of
vintage computing software (and hardware documentation), and it is just a
large folder of PDFs. In contrast, when I browse
[https://archive.softwareheritage.org/](https://archive.softwareheritage.org/)
it feels like a combination of GitHub and the Wayback Machine
([https://archive.org/](https://archive.org/)).

How do they intend to archive the generated documentation of software? How do
they intend to keep this stuff running? How do I even "run" generic code for
something in an arbitrary GitHub archive? Just storing and presenting git
repositories nicely doesn't sound like a breakthrough, especially if it is a
clone of GitHub, i.e.

------
kwhitefoot
I only scanned the article, so perhaps I missed it, but I can't see any mention
of build processes and platforms to run the built artefacts on.

My experience of software development is that just having the source is close
to worthless unless you also have some way of building and running it.

For instance I just downloaded Microsoft's WCF samples. Even with the
instructions I am unable to use them because the instructions do not apply to
the platform I am using (Win10 Home). If something current from a major
supplier that is, presumably, interested in making it work doesn't go then
what hope is there for something decades old that runs on an OS and hardware
that no longer exists?

~~~
tjr
I work in aerospace. We have a large group of people dedicated to archiving
software. When we send them something to archive, we need to provide not only
the software itself, but also provide (or provide a link to a previous archive
of) all related libraries, build tools, compilers, operating systems, etc.

The idea is that absolutely anything _could_ be rebuilt, if needed.

Not sure what they do for old development hardware! An interesting question,
if you can't run Windows 3.1 on the latest Dell, or whatever. (Or... VMS...)
Virtual hardware, perhaps?

~~~
_trampeltier
Just run the old OS in a VM. But that's usually not the real problem. Often
the old machines have special hardware interface cards somewhere inside
them.

------
stcredzero
The old code viewing/metrics CodeCrawler project used a universal format
called MOOSE to compare software projects across languages. (Even across
programming paradigms.)

