
Internet is losing its memory: Cerf - adrian_mrd
https://www.itnews.com.au/news/internet-is-losing-its-memory-cerf-495854
======
stillkicking
In many ways he's understating the problem. At least there used to be file
formats to support, and files to lose. But how much of the data you interact
with daily is even directly accessible to you? How much of it can you access
when you're offline?

Take Slack. You can install it locally, but that doesn't matter, because it
won't start without an internet connection, and its local storage is
completely opaque. Compare this to a mail client, where everything is stored
locally, indexed and searchable offline, and can be exported into a universal
albeit messy MIME format, as well as imported into other accounts.

Of course this is necessary for the business model: the Slack free-plan event
horizon wouldn't be effective if it only applied to which new data you can
sync down... and if users only discovered this when e.g. moving to a new
computer, they would quickly start to wonder why they can't just transfer the
data they already have themselves.

With Google Drive, your 'docs' are just placeholders pointing to the cloud.
You can export them individually to Word or print them to PDF, but it's a
manual process, which you will likely only think of when it's too late.

For an example of how this can really matter: an acquaintance is embroiled in
a legal dispute with an ex-employer. Thanks to Apple's sane implementation of
IMAP Mail on iOS, she still has access to all her company communications there
to use as evidence. Unlike on her PC, where she was just using webmail the
entire time and has nothing.

The incentive in the cloud age is to create dysfunctional products that
provide an illusion of permanence instead of the reality of tangibility. I
expect this is only going to become a bigger problem over time.

~~~
knorker
> You can export them individually to Word or print them to PDF

Actually it's really really easy to do in bulk:
[https://takeout.google.com/settings/takeout?pli=1](https://takeout.google.com/settings/takeout?pli=1)

Not invalidating the rest of what you said.

~~~
013a
I think the rest of what they said matters, though. It's not "preservable by
default". If you need access to that data in any situation where your
connectivity to Google or that account is severed, you're shit out of luck.
The best case example is if you don't have internet. The bad case is really
that "legal dispute" argument, where you're dealing with a bad actor who has
power over you and your Google account. The worst case: Google themselves
severs your access.

~~~
knorker
I'm trying to not phrase this in a dick way, but what did you think I meant by
"Not invalidating the rest of what you said"?

------
ravenstine
People need to keep their own personal archives. While there are great
projects like archive.org, it's becoming more clear that it's not enough.
There aren't enough resources for an organization (that isn't spying on
everyone) to archive YouTube, or Slack history, etc., so it's up to you to
back up that data if it's important to you. Culture can be easily lost to the
void if we don't do this.

As I've mentioned in another HN thread, I've been using youtube-dl to back up
YouTube videos that are important to me since it's increasingly obvious that
YouTube is merely a profit machine that would delete all of its users' videos
in one fell swoop if it could make an extra buck off it. I've also begun to back
up some blogs that have vast amounts of great content, but I've found that
HTTrack is... well, terrible. That's not to say it doesn't technically do its
job, but it's way too aggressive. I want something that's easier to configure
and can do a better job of taking "snapshots" of blog entries while blocking
ads and JavaScript, resuming properly if my power goes out, etc.

~~~
txsh
I noticed old articles I had bookmarked were disappearing so I’ve started
using a self-hosted Bookstack to save a copy of everything interesting I read.
I enter reader mode and copy everything on the page, and then paste it into
the WYSIWYG editor. Everything is formatted nicely and it even creates a menu
using the heading tags of the sections. The only problem is, while images copy
correctly, they’re still hosted on the original server so I have to replace
them with an uploaded copy. Only takes a couple of minutes though. I could
probably automate this with a script.
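
A minimal sketch of what that script could look like, in Python with only the standard library. It assumes the pasted page is available as an HTML string with double-quoted `src` attributes; the names `localize_images`, `download_all` and the `images` directory are made up for illustration and have nothing to do with Bookstack's own API.

```python
import hashlib
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgCollector(HTMLParser):
    """Collects the absolute src URL of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src.startswith(("http://", "https://")):
                self.sources.append(src)


def local_name(url):
    """Derive a stable local file name from a remote image URL."""
    ext = os.path.splitext(urlparse(url).path)[1] or ".img"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return digest + ext


def localize_images(html, image_dir="images"):
    """Return (rewritten_html, {remote_url: local_path}).

    Only double-quoted src values are rewritten; URLs containing
    HTML entities such as &amp; are collected but left untouched.
    """
    collector = ImgCollector()
    collector.feed(html)
    mapping = {url: f"{image_dir}/{local_name(url)}" for url in collector.sources}
    for url, path in mapping.items():
        html = html.replace(f'"{url}"', f'"{path}"')
    return html, mapping


def download_all(mapping):
    """Fetch every remote image to its local path (needs network access)."""
    for url, path in mapping.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(url, path)
```

You would call `localize_images(page_html)`, pass the returned mapping to `download_all`, and then upload the rewritten HTML plus the downloaded files by hand or through whatever import path your install offers.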

~~~
18pfsmt
If you use Firefox, there have been several implementations of this add-on
that I've used for ~10 years:

[https://en.wikipedia.org/wiki/ScrapBook](https://en.wikipedia.org/wiki/ScrapBook)

------
paidleaf
Losing memory via attrition is certainly a concern. But I think the bigger
concern is losing memory via censorship and copyright. For most people, the
internet is now google search and a couple of social media companies. Google
has been curating search results for a while now. We no longer see what is out
there on the internet, but a narrow google approved view of the internet. For
example, 10 years ago, google showed us the entire elephant ( more or less );
today, we only get to see the tail or the trunk or whichever part of the
elephant google decides to frame. Now all of the social media companies are
curating too. And with the loss of neutrality, ISPs will undoubtedly curate.

And then there are the copyright issues, where archiving is getting more
difficult. Kids today don't know what a great tool google cache was because
it's gone. And archiving sites are being attacked by news, media, politicians,
etc.

In other words, the internet dark age isn't going to be a result of formats
getting old ( though that is an issue ). It's going to be a result of us only
being allowed to see through the frame that a handful of companies deem
appropriate. The dark ages didn't happen because formats got old. The dark
ages happened because the powerful decided that we should view the world
within a certain narrow religious frame and censored everything else.

~~~
rezeroed
In the olden days, everyone was slapping together their own websites, rather
than using the same old templated wordpress, hugo, etc, so there was far more
character, and not everything was a blog. And so people surfed the web. These
days the web is more purpose oriented, people looking for something, the
"surfing" all within walls, wave pools.

~~~
floren
The days of the web ring, which often as not was broken or took you through
dozens of content-free sites but could occasionally drop you into something
fantastic.

Browsing Internet Directories.

Just plain finding some guy's massive meticulously-maintained HTML-only site
full of fascinating articles, bookmarking it, slowly working your way through
it over the course of your evening browsing sessions.

------
WalterBright
By contrast, before the internet, most everything was routinely lost anyway.

Only papers from famous people were kept. Even so, for example, HP's
historical archive was all on paper and placed in a single building. Which
burned down.

The WTC collapse destroyed the unpublished archive of Kennedy pictures.

~~~
krapp
We still have receipts and tax records carved into ancient Babylonian clay
tablets, and letters from the 18th and 19th century, so I believe it's not
really true that only papers from famous people were kept, rather that
artifacts of interest to elite classes (be they religious or scientific)
tended to remain preserved or published.

Although it is interesting (and a bit depressing) that we can't really seem to
escape that model for preserving knowledge over the long term. Physical media
needs patronage, buildings, printing presses, etc, while digital media needs
infrastructure, manufacturing, programmers, etc.

~~~
WalterBright
Do you have any letters, photos, tax records, etc., from your great
grandparents? Do you know anything at all about your ancestors who came to
America (if you're living in America)?

------
AnIdiotOnTheNet
I'm probably weird in that my philosophy doesn't jibe well with the current
societal norm, but I think it is important to come to terms with a simple
truth: All things are ultimately lost. Physics tells us that even the universe
itself will end, even if different models disagree as to precisely how. Things
begin, they exist, and then they end. That they are not eternal does not mean
that their existence was meaningless. Which isn't to say that preservation is
pointless or wrong, but I think it is unhealthy to try and hang on too firmly
to the past.

~~~
IGI-111
It's the good kind of nihilism I suppose: accepting that ultimate futility
doesn't spare you from having to strive for meaning, but it does help you
relativize your losses.

That said, on the topic of conservation, or more accurately History, I don't
think you can convincingly argue for forgetting things as a society. The costs
of repeating errors are as great as the benefits of safekeeping knowledge.

~~~
AnIdiotOnTheNet
I'm advocating for the letting go of things. Certainly there are things that
are valuable to preserve, but preservation for preservation's sake, I think,
is unhealthy. Will society really benefit from preserving every geocities
site, every newgrounds flash game, every scrap of poorly written slashfic?
Probably not. Are there instances of each that are culturally relevant? Sure,
and it might be worth keeping those around longer.

------
kwhitefoot
The comment about Archimedes and infinitesimals reminds me of something
that has been bothering me for a long time: what brilliant ideas are there
lurking in the world unknown to almost all of us that could change the course
of history if only they were publicised?

~~~
bcaa7f3a8bbc
I've had a similar thought experiment (or Sci-Fi plot?) for a long time.

If an advanced alien civilization came to Earth and helped ancient human
civilizations build and operate a computer system that could share, store
and translate scientific discoveries all across the globe (while not being
allowed to help humans beyond operating the system) throughout 5000 years of
history - how would it change human history?

~~~
gsanghera
The ancient humans would then also (eventually) develop patents because
someone somewhere would have thought about it even back then - and with
worldwide distribution - it would catch on. Effectively, what I'm saying is -
when faced with a common greater good that is a game changer for humanity,
humanity then has to collectively want the greater good. And humans can be
selfish - ancient or not.

------
pmlnr
[http://longbets.org/601/](http://longbets.org/601/)

[https://adactio.com/journal/11937](https://adactio.com/journal/11937)

"The original URL for this prediction (www.longbets.org/601) will no longer be
available in eleven years."

This has been a known problem for a while, but it's always nice to see more
voices crying out about it to the more general public.

------
jacquesm
Due to the Curse of SaaS soon the web won't be about documents any more but
about applications that have no obligation to preserve anything. The only
thing you can be sure of is that your advertising profiles will be stored for
the next 100 years or so.

------
013a
Frankly, it's somewhat hilarious to hear all this coming from Cerf, who works
for the worst offender of data centralization of all time. Think of the data
that gets lost on Drive every day due to their product decisions around the
SaaS model + proprietary document formats, or the data that was lost/hidden on
Code, Wave, Plus, their shattered IM platforms, etc.

------
wolfgke
A large problem in archiving is actually the copyright law. Sites like sci-hub
or Library Genesis do a great job in archiving scientific papers and books,
but are illegal.

Also archive.org is sometimes on the brink of legal/illegal, e.g. not all
material that can be found on archive.org is really legal; though for lots of
the formally illegal archived content, the copyright holders do not care or do
not want to cause an outcry.

------
devilmoon
Slightly related to this article, I recommend this Tedx talk by Cerf himself:
[https://www.youtube.com/watch?v=GV0A82TCrf0](https://www.youtube.com/watch?v=GV0A82TCrf0)

------
imhoguy
The death of Adobe Flash Player (closed SWF format) is also taking its toll on
an uncountable number of human artistic creations, games etc. I wonder if
there is some way to preserve that relic of the 2000s.

~~~
icebraining
Seems like there are a few people still working on Shumway, the flash engine
written in JavaScript.

~~~
theandrewbailey
I doubt anyone is working on Shumway. It's in the Firefox graveyard.

[https://bugzilla.mozilla.org/describecomponents.cgi?product=...](https://bugzilla.mozilla.org/describecomponents.cgi?product=Firefox%20Graveyard)

~~~
icebraining
From Mozilla yes, but there was some activity by others this year:
[https://github.com/mozilla/shumway/pull/2442](https://github.com/mozilla/shumway/pull/2442)

------
dahart
At first I thought this was saying _all_ information on the internet should be
preserved (which is a growing privacy concern), but after reading more
carefully I think it’s about having a way to preserve the information that is
intended to be public and permanent from the beginning. We don’t have any good
ways yet to guarantee future access to public information.

I’d never heard of the Digital Object Architecture (DOA). The article itself
makes light of the unfortunate acronym with “History pronounced DOA”. That
actually left me confused about what they were talking about for a minute.

No idea if it’s a good way to preserve academic papers on the internet, but
the business model side of it is a pure open question. That makes me wonder
whether it solves anything at all. The problem with information on the
internet is that the people who publish eventually lose the interest or the
ability to continue paying for storage and access.

“Economics/Business Model

While the Handle System has been used for many years in publishing and library
systems, generalizing to other applications, e.g., Internet of Things, will
likely generate economic concerns related to the business model of the system,
especially at the Global Handle Registry. Will organizations be charged for
each identifier? Will organizations that acquire a prefix be able to create
unlimited sub-prefixes or will they be charged for each sub-prefix? How will
these policies be developed? How will the money flow? What will be the impact
on developing countries or small businesses?”

[https://www.internetsociety.org/resources/doc/2016/overview-...](https://www.internetsociety.org/resources/doc/2016/overview-of-the-digital-object-architecture-doa/)

------
bcaa7f3a8bbc
Think about it: we still have almost-complete Usenet archives from
the 70s to the late 90s. The whole network has been preserved as if it were a
time capsule. You can visit [https://olduse.net](https://olduse.net)
(maintained by a retired Debian developer), and see the heyday of hackers and
early technology adopters as if it were just yesterday. An entire society
is archived here.

There are discussions of the recently proved 4-color theorem, the latest
version of the C compiler, and the difference between a vacuum tube amplifier
and a solid-state amplifier; the posts where the GNU Project and the Linux
kernel were launched; early online culture and tons of colorful, hilarious,
but forgotten and buried memes; and weird phenomena that emerged from the
collective (un)consensus... Sci-Fi fandom as an integral part of online and
hacker culture, millions of lewd stories written in alt.sex, _"Immediate Death
of Usenet Predicted!"_, _"There is no cabal"_,
_alt.french.captain.borg.borg.borg_, Coffee and Cat warning, The
church of Kibology, the Anti-spam Movement, creationism vs evolutionism debates
at talk.origins, Meow Wars - the first meme war online, all the personal
attacks, trolls, flame wars, and "cyber-stalking", etc.

Then the centralized WWW replaced distributed Usenet, and crappy HTML replaced
a perfect machine-readable data format. Will we have a similar archive for
Reddit or Hacker News? Possibly not. _So Hacker News, just come and create one!
You can make it!_ Another unique challenge created by the WWW is the
inaccessibility of server-side software - exporting and preserving the data is
NOT enough, unlike Usenet, where you can just load any data. The user interface
and functionality of a website are themselves part of the collective memory
that needs to be preserved - we need a replica of the website's software, with
an identical user interface and all the functions of the original website:
users can click a username and see that user's posts, karma, etc. I don't think
anyone has even noticed that this problem exists. Luckily, major websites
such as Reddit or 4chan all use FLOSS software, which would make the
work easier, but it is still a huge challenge due to the inaccessibility of the
raw database. Also, to keep some content meaningful in the future, external
resources such as hyperlinked websites and images should also be
preserved. Considering this, the chance of creating an authentic and complete
archive is even lower.

\---

But even if we were still using a distributed network where data preservation
is technically possible, and there were no walled gardens, it might still be
difficult to implement. In the era of Usenet, you often attached your name,
address and phone number - there were virtually no threats except for a few
trolls - this is why archiving Usenet was possible in the first place. But the
Internet is not the Net anymore; now not only humans but almost every piece of
equipment involved on the route may be your enemy.

The ongoing security and privacy movement is a huge threat to historical
records. From my observation, at least in the infosec hacker community, after
Snowden's revelations, public and open discussion is slowly being transformed
into private, closed, encrypted and temporary activity, plus self-hosted
platforms like ActivityPub, GNU/Social, Mastodon. This is indeed good from a
security and privacy perspective and it is exactly what we need now.

But we are also creating a huge gap of knowledge, information and history on
the Internet. After my death, none of my self-hosted code, or my blog, or my
GNU/Social posts will survive. In conclusion, _"collect 'em all"_ is both a
malicious NSA dragnet surveillance program and a glorified act of history
preservation. This is where the contradiction lies.

I don't know what to do. For the WWW, archive.org is a workaround and I think
it needs more donations. But for all the other self-hosted things like Mastodon
and git servers, there is no solution at all.

~~~
slimshady94
How do I navigate this olduse.net site? Is this a forum? I can't make heads or
tails of how to see all those articles you mentioned. I've clicked every link
on the homepage.

~~~
bcaa7f3a8bbc
Use your Usenet client to connect to the server nntp.olduse.net. While the
original news client rn, written by Larry Wall in 1984, probably doesn't work
today, other implementations, such as slrn, are still around.
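
If you'd rather poke at the server without installing a newsreader, here is a rough Python sketch that speaks plain NNTP over a socket and asks for the group list. Assumptions: the server listens on the standard NNTP port 119, answers the `LIST` command with the usual `name high low status` lines, and needs no authentication; I haven't verified any of that against nntp.olduse.net itself. The function names are made up for this example.

```python
import socket


def parse_active_line(line):
    """Parse one 'name high low status' line from a LIST response."""
    name, high, low, status = line.split()
    return {"group": name, "high": int(high), "low": int(low), "status": status}


def list_groups(host, port=119, timeout=10):
    """Connect to an NNTP server and return its newsgroups (needs network)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")

        # The server greets with 200 (posting allowed) or 201 (read-only).
        greeting = stream.readline().decode("latin-1")
        if not greeting.startswith(("200", "201")):
            raise RuntimeError("unexpected greeting: " + greeting.strip())

        stream.write(b"LIST\r\n")
        stream.flush()
        status = stream.readline().decode("latin-1")
        if not status.startswith("215"):
            raise RuntimeError("LIST refused: " + status.strip())

        # A multi-line response ends with a line containing a single dot.
        groups = []
        for raw in stream:
            line = raw.decode("latin-1").rstrip("\r\n")
            if line == ".":
                break
            groups.append(parse_active_line(line))

        stream.write(b"QUIT\r\n")
        stream.flush()
        return groups


if __name__ == "__main__":
    # Host name taken from the comment above; whether it still answers is untested.
    for group in list_groups("nntp.olduse.net"):
        print(group["group"], group["low"], group["high"])
```

A real client like slrn is still the better way to actually read articles; this only shows how little protocol sits between you and the archive.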

See also:
[https://en.wikipedia.org/wiki/Newsreader_(Usenet)](https://en.wikipedia.org/wiki/Newsreader_\(Usenet\))

Note that olduse.net is a playback of Usenet, time-shifted 20 years back, so
some of the events I mentioned have yet to occur. You can navigate the website,
download the original archives, and load them on your own to explore.

If you don't have the setup yet, to get a quick idea of how it works, read
these interesting articles.

* [http://olduse.net/blog/what_rms_saw/](http://olduse.net/blog/what_rms_saw/)

* [http://olduse.net/blog/Dennis_Ritchie/](http://olduse.net/blog/Dennis_Ritchie/)

* [http://olduse.net/blog/stargate_controversy/](http://olduse.net/blog/stargate_controversy/)

You can also just browse the old Usenet from Google Groups. It's the same
contents anyway, but the experience is poor.

------
unquietcode
It's worth noting that in complex systems, forgetting can be just as important
as remembering. The ability to evolve and change is in part predicated on the
ability to selectively forget some elements that are no longer helpful. As
another commenter pointed out, some aspects of our society's cultural memory
are, probably, best left in the past, if preserved at all.

~~~
notabee
Knowing what should be forgotten requires either foreknowledge of the future
or relies on inaccurate predictions based on current assumptions. Thus the
less information of uncertain usefulness that is retained, the more that those
assumptions dictate what will be considered useful in the future without
having to rediscover things entirely. That limits adaptability. Just try to
imagine how many things were invented in the past that weren't seen as useful
at the time and had to be rediscovered later. How many ideas were lost in the
Dark Ages, for instance, because they offended religious sensibility at the
time? If there is capacity to retain information in an organized way without
undue cost, it should be retained as a hedge against future uncertainty. No
single generation of humans should be trusted to make such decisions without
being unduly influenced by the biases of their time.

------
walterbell
If you're on Windows, [https://www.mailstore.com](https://www.mailstore.com)
has a mature, free version that will backup all your local and cloud mail to a
searchable offline archive. Uses standard formats. Can also move your mail
between cloud providers.

------
sulam
We have been forgetting things for thousands of years. Last I looked, we still
don’t know how to make Damascus steel or Roman concrete. People will reinvent
the things they need and maybe never even know that it’s a reinvention. I
reinvented a couple fundamental graphics techniques because I needed them and
didn’t study computer graphics in school. I’m sure this happens all the time.

I’m all for making it easier to recover data and generally make it easier to
store stuff. And I believe you have a right to your data (yay GDPR). But this
isn’t a catastrophe.

~~~
arbitrage
> we still don’t know how to make Damascus steel or Roman concrete

That's an oft-repeated but ultimately meaningless statement.

We know how to make things better than Damascus steel or Roman concrete. Our
processes have exceeded the ancients. The quality of Damascus steel is likely
overrated, since it wasn't quite as absolute shit as what was being regularly
traded at the time. Same with concrete.

We might not have exquisite written recipes and procedures for these
materials, but they have been reverse-engineered, and it turns out, they
weren't that great compared to modern chemical and material engineered
products.

We don't need to 'rediscover' Damascus steel because it is obsolete. A
romantic idea, and poetic in how it was 'lost' to time, but consider that it
was 'lost' to time, similar to Japanese steel-working, because nobody wanted
to buy it anymore. It became economically and culturally irrelevant.

~~~
sulam
Exactly. We forgot how to do a thing and invented better. The “lost” knowledge
isn’t really worth much to us today.

And that will be true of essentially everything we forget in the future as
well — the forgetting is the sign that it wasn’t needed.

------
rmorey
In case anyone hasn't already heard of it,
IPFS ([https://ipfs.io](https://ipfs.io)) is a really cool project that aims to
help solve this problem.

~~~
mbowcutt
Glad someone else mentioned it. I haven't seen much news from them recently
other than a picture-sync application.

Looking forward to the next few years and their (anticipated) growth

------
_salmon
The Archive Team has been doing this work for at least 10 years. If you'd like
to help archive the internet, they have an easy-to-use tool you just run and
let work in the background.

[https://archiveteam.org/index.php?title=ArchiveTeam_Warrior](https://archiveteam.org/index.php?title=ArchiveTeam_Warrior)

