
Archivists are racing to digitise 300 years of newspapers before they crumble - samclemens
http://www.theguardian.com/books/2015/jul/05/british-library-digitising-newspapers-boston-spa
======
jedberg
It's interesting that people are currently lamenting the fact that a huge
chunk of human history will be lost because it's all digital now, while at the
same time we are racing to digitize the past to prevent it from getting lost.

It almost seems inevitable that most of the details of human history will be
lost regardless....

~~~
waterlesscloud
The key advantage with digital is that it's easily copied.

For that advantage to be worth anything, the material has to actually
be...copied.

Central repositories for this stuff are fine, maybe even essential. But they
need to also be spreading around full copies quite liberally.

~~~
jedberg
> For that advantage to be worth anything, the material has to actually
> be...copied.

This is the key. I think people are afraid of our all-digital world because
no one backs anything up.

~~~
agumonkey
Also, digital is so efficient it accumulates far faster than older mediums. I
find it a very telling pattern. Things are easy and so they get out of
control.

When I was a kid, a photo album had something like 50 tiny fading photos, and
it covered 5 years or so. Each one was a very high density of memories. Now we
have thousands of them. Curious.

~~~
jqm
In the old days someone put a photo album in a trunk in the attic. Then they
got old, forgetful, the kids didn't care about the photos, then the person
died. Years later someone else found the photo album.

What happens in this scenario with digital photos? Sure, if someone is careful
(and most people frankly aren't... I've seen more than a few people lose all
their photos) you can keep backing them up. Until you can't anymore or you
don't. Then they are gone.

------
slyall
A major Australian/NZ newspaper company shipped all their photos to the US for
digitization. The company (Rogers Photo Archive) doing the digitization then
went bankrupt.

[http://www.theguardian.com/media/2015/jun/08/fairfax-media-p...](http://www.theguardian.com/media/2015/jun/08/fairfax-media-photo-archive-stuck-in-us-warehouse-after-digitising-deal-unravels)

Especially embarrassing for the NZ branch since they got special permission
from the government to ship the photos out of the country. Some of the photos
ended up on eBay.

------
bane
Out of curiosity, there's a huge number of newspapers already archived on
microfilm (unless it's all been thrown away). I'm sure digitizing from
microfilm will be a reasonable alternative.

~~~
irv
It's mentioned in the article that microfilm is digitised.

------
ams6110
Hm. Yet to be proven that any digital storage will last anywhere close to that
long. At best it will be a continually active process of refreshing the
archive and converting the data to whatever storage media and formats are
currently in use.

~~~
bitJericho
That's not a problem. The problem is in the analog mediums (paper) that will
always disintegrate over time. Digital media will not lose fidelity over time
at all. In 10,000 years, as long as the backups were maintained, the data will
be as good as the day it was made.

~~~
cpeterso
Moving the scans to new digital media and data formats is the hard part. It
takes effort and money, whereas analog formats need no extra work.

~~~
acdha
This is only partially true: with online storage, it's relatively easy to make
bit-for-bit identical copies even if the physical storage medium changes over
the years because there's always an overlap period when a technology falls out
of favor.
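
As a rough illustration of what "bit-for-bit identical" means in practice, here is a minimal Python sketch that copies a file and compares SHA-256 checksums of the original and the copy. The function names (`sha256_of`, `copy_and_verify`) and the file names are made up for this example, and using SHA-256 is my assumption, not something the article or any particular archive specifies.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> str:
    """Copy source to destination and check the copy matches byte for byte.

    Returns the shared digest; raises ValueError if the digests differ.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise ValueError(f"checksum mismatch copying {source} to {destination}")
    return expected


if __name__ == "__main__":
    # Hypothetical stand-ins for a scan moving from an old medium to a new one.
    with tempfile.TemporaryDirectory() as workdir:
        old_medium = Path(workdir) / "scan_on_old_medium.tif"
        new_medium = Path(workdir) / "scan_on_new_medium.tif"
        old_medium.write_bytes(b"stand-in for a newspaper page scan")
        print(copy_and_verify(old_medium, new_medium))
```

If the digest is stored next to the file, the same comparison can be repeated at every later migration, which is the part that has no analog equivalent.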

Where it gets expensive is when you neglect to do that and then 50 years from
now someone is pulling a Zip disk or LTO tape out of a box and wondering how
to read it.

In contrast, analog formats will always lose quality as you copy them, so you
have a strong incentive to make copies which will last as long as possible. If
you get the right material it might be transferable in the future with no work
– e.g. high-quality photographic prints on archival-quality stock – or you
might end up needing to build exotic equipment which can do things like
optically scan records to reconstruct an audio waveform
([http://irene.lbl.gov](http://irene.lbl.gov)) or deal with media which has
disintegrated
([https://www.nedcc.org/audio-preservation/irene-blog/2014/08/...](https://www.nedcc.org/audio-preservation/irene-blog/2014/08/12/delaminating/)).
One look through e.g.
[http://britishlibrary.typepad.co.uk/collectioncare/index.htm...](http://britishlibrary.typepad.co.uk/collectioncare/index.html)
should be enough to see how limited a time period “no extra work” is valid for.

The common theme for both formats is that it's critical to maintain the
ability to read and make copies. Once something falls out of common usage the
cost to rebuild that capacity goes up dramatically because you're no longer
enjoying mainstream economies of scale and the work will increasingly require
skilled technicians using bespoke tools.

This can be particularly bad with digital formats if the use of DRM means that
few/no people are legally allowed to create tools during the period where many
of the original creators are still available for consultation.

------
WalterBright
The idea of concentrating all these one-of-a-kind newspapers into one building
is crazy. What if it burns down?

> At such low oxygen levels, the contents simply can’t go up in flames.

Famous last words.

> And with standards for the documentation, archiving and accessing of data –
> official and personal – still being thrashed out,

I don't understand why this is a problem. Scan them to pdf files, and put them
on web pages. Let google index them.

~~~
jperras
It's a basic principle of fire safety engineering – paper requires a
concentration of more than 14.1% oxygen to allow combustion to occur.

Considering that paper composes the majority of the mass in that installation,
a sustained hypoxic environment at 14% or below is exactly how this system
should be designed.

If the papers were split across separate warehouses, they would still all
have the same environmental requirements. Additionally, you would need N times
the budget, where N is the number of warehouses you've constructed (not to
mention the difficulty in querying physically distributed warehouses for
information).

> I don't understand why this is a problem. Scan them to pdf files, and put
> them on web pages. Let google index them.

Before making such broad, sweeping statements, perhaps read up a bit on the
principles of information science:
[https://en.wikipedia.org/wiki/Information_science](https://en.wikipedia.org/wiki/Information_science)

Edit: You are editing your comment every few minutes, so I don't know what to
reply to anymore.

~~~
WalterBright
Ships have hulls to keep the water out, too, but sometimes the water gets in
and they sink. I can think of dozens of ways the archive could still burn.

Here's just one: large earthquake breaks open the building, cuts electric
power, breaks gas lines, fire starts. Firemen are overwhelmed and give
priority to saving civilians in other buildings, stacks of newspapers are at
the bottom of their list.

Another: Fire starts in building next door. Wind-whipped flames set the
archive on fire from the outside. Archive burns down with everything in it.

A third: bunch of militants take over that part of town. Set fire to the
archive because they are opposed to history. Not like that has never happened
before, like in ISIS-controlled Iraq, and the great fire of Alexandria.

It's the classic eggs-in-one-basket scenario.

~~~
JoeAltmaier
At least it's a good basket, in a reasonable place, with safeguards in place.
Two or three 9's more likely to work than...doing nothing.

------
MichaelCrawford
I have some newspapers that my grandmother saved from World War II. Those were
quite unlike today's newspapers.

