
Delete Never: The Digital Hoarders Who Collect Terabytes - yazr
https://gizmodo.com/delete-never-the-digital-hoarders-who-collect-tumblrs-1832900423
======
blakesterz
Somewhat related... I'm starting to think the fear we had 20 years ago about
formats going obsolete is largely overblown. I know for sure that SOME formats
will become unusable in another few decades, but the major things we all use
for this type of stuff will pretty much never die. I bet we'll be able to open
a GIF, JPG, PNG, MP4 in 100 years just as easily as we can today. There's just
way too many of these things around now.

I know libraries and museums are full of oddball things like wire recorders
and wax cylinders that are used as examples, but I'm just not sure that's
applicable to most (not all) digital files now. I just can't imagine there
will ever come a day where we'll say "It's time to convert these 1 billion
PNGs we have saved to the latest greatest format or we'll never be able to use
them". Hopefully I'll be alive in 30 years to see if I'm wrong :-)

~~~
joepie91_
There are _already_ digital file formats that are difficult to read - often
made worse by them having design quirks that are not documented anywhere and
only exist in the original implementation, which of course won't run on modern
systems. The same can happen for particular (non-standard) variations of
documented formats.

Digital data formats have the same problem as large and popular websites (eg.
social media): they _look_ permanent, simply because they're so popular, even
though they never really _are_ permanent. At some point, they're going to fall
into disuse, and from then on, documentation will slowly fade away.

This is why it's so important to work on permanently documenting these file
formats today, while the 'documentation rot' is still fairly limited. It's why
things like this exist:
[http://fileformats.archiveteam.org/wiki/Main_Page](http://fileformats.archiveteam.org/wiki/Main_Page)

------
zaarn
Atm I have about 40TB of raw storage, of which 30TB are available and 18TB are
used for data. Most of that is hoarded data: stuff like the entire image
archive of the Apollo missions, or all public domain research papers.

I find it very important that people keep this stuff around, the internet
forgets so easily.

~~~
rambojazz
> all public domain research papers

You mean _all_ of them? How many are there?

~~~
zaarn
All I could get my hands on, I cannot check the exact size at the moment but
it should total in at around 150GB of data.

~~~
rambojazz
Is there any public archive of this (and possibly other free data too)? I'd
love to hoard some libre bits myself :)

~~~
zaarn
There is a fairly large archive in the torrent crossref-pre-1909-scholarly-
works, which contains pre-1909 works that are in the public domain. I didn't
grab that one, but it should be easy to find with any search engine.

There is also lots on archive.org!

------
blastbeat
I can relate to this, although I only do this at a very moderate level. As a
side note, despite the technological advances of recent years, you can still
meet quite a lot of collectors in private Direct Connect hubs these days.

~~~
fwsgonzo
Where do you find the hubs? And which DC client is not terribly outdated, or
has a horrible reputation?

~~~
blastbeat
The client mostly used in private hubs today is AirDC. It is optimized for
efficient file transfers and comfortable file organization. For instance,
AirDC lets you define different share profiles for different hubs, which is
tremendously useful for satisfying different share rules on different hubs. To
do this, AirDC makes heavy use of the ADC protocol
([http://adc.sourceforge.io](http://adc.sourceforge.io)), and they have
implemented nice extensions for it over the years. Recently, they built a web
interface, so now you can run AirDC on your server and access it via browser
from your smartphone. Otherwise, DC++ is still a legit choice. Both clients
are actively maintained. There are other clients as well: ApexDC++ (a modern
client, but it still runs on WinXP, as far as I know); ncdc
([https://dev.yorhel.nl/ncdc](https://dev.yorhel.nl/ncdc)) and eiskaltdc
([https://github.com/eiskaltdcpp](https://github.com/eiskaltdcpp)) both run on
Linux and are maintained.

Concerning hubs: there are hublists, which include open hubs, but don't expect
quality or safety there. Most great hubs from the past went off the grid at
least a decade ago. To get into those, you need an invite. You also need to
obey draconian rules, like 24/7 uptime or a minimum 6 TB high-quality share.

------
malshe
Earlier post:
[https://news.ycombinator.com/item?id=19315821](https://news.ycombinator.com/item?id=19315821)

------
5555624
I can relate, too; but I'm not sure I can access some of it. While some is on
floppies and I can read them -- though some have failed -- what about the
stuff that needs my Zip drives or Bernoulli drives? I'm not sure either drive
has survived the last few moves. (I know the Windows 98 computer runs; but I'm
not sure about the one with Windows 3.11.) Maybe that should be my project for
2019.

