
Archive.org and California to start a data sharing and preservation project - bpierre
https://blog.archive.org/2018/06/05/internet-archive-code-for-science-and-society-and-california-digital-library-to-partner-on-a-data-sharing-and-preservation-pilot-project/
======
AdmiralAsshat
Does anyone here work at Archive.org? Can you speak to how well-funded the
organization is and what sort of measures are in place to keep it afloat? I
think it's a fantastic service and I donate, but I worry that it could vanish
the next day if funding suddenly dries up. I feel like a large corner of the
Internet collectively takes the site for granted and doesn't bother doing its
own in-house archiving because "TheArchive will just suck it up for us."

~~~
bnewbold
I can't speak formally for the Internet Archive, but the existing content and
services are not going to disappear overnight: funding comes from several
sources, thought has been put into organizational structure, and things have
been designed to keep core access and preservation infrastructure running with
minimal cost and effort (e.g., if the economy tanks).

Getting the content coverage people sometimes assume we already have is
another matter. Additional funding (thanks for your donation!) goes towards
additional crawling and keeping up with the endless treadmill of media types
and protocols, e.g., headless browser crawling development and deployment to
capture javascript-heavy sites
([https://github.com/internetarchive/brozzler](https://github.com/internetarchive/brozzler));
this is much more expensive than "classic" crawling.

For more on increasing storage costs and the under-funded state of web
archiving in general, I recommend David Rosenthal's blog, e.g.:

[https://blog.dshr.org/2018/05/longer-talk-at-msst2018.html](https://blog.dshr.org/2018/05/longer-talk-at-msst2018.html)

[https://blog.dshr.org/2014/03/the-half-empty-archive.html](https://blog.dshr.org/2014/03/the-half-empty-archive.html)

Far more effective and robust than hoping the archive will "suck it up for us"
is to upload snapshots/dumps/exports yourself! Anybody can create an
archive.org account and upload content (I recommend
[https://github.com/jjjake/internetarchive](https://github.com/jjjake/internetarchive)
over the HTML form), within reasonable limits. Obviously, care needs to be
taken to remove sensitive (and personal) information first.
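
As a rough sketch of that workflow (not an official recipe): the snippet below
strips email addresses from a text dump and then uploads the cleaned copy with
the `internetarchive` Python library. The helper names, the email-only
redaction rule, and the `mediatype` value are illustrative assumptions; a real
dump will likely need much more thorough scrubbing, and credentials have to be
set up beforehand with `ia configure`.

```python
import re
from pathlib import Path

# Assumption: email addresses are the only sensitive data in this example.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("[redacted]", text)


def upload_dump(identifier: str, dump_path: Path, title: str):
    """Write a redacted copy of a text dump and upload it to archive.org.

    `identifier` must be a unique item name chosen by the uploader.
    """
    # Imported lazily so the redaction helper works without the library installed.
    from internetarchive import upload

    cleaned = dump_path.with_name(dump_path.stem + ".redacted" + dump_path.suffix)
    cleaned.write_text(
        redact_emails(dump_path.read_text(encoding="utf-8")),
        encoding="utf-8",
    )
    # Returns a list of HTTP responses, one per uploaded file.
    return upload(
        identifier,
        files=[str(cleaned)],
        metadata={"title": title, "mediatype": "data"},
    )
```

Something like `upload_dump("my-forum-export-2018", Path("export.txt"), "My forum export")`
would then push the redacted file; the identifier and file name here are made up.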

~~~
toomuchtodo
These blog posts are fantastic. Thanks so much for sharing them.

------
egh
To be clear, this is not the state government of California, but a division of
the University of California (the California Digital Library) working with the
Internet Archive and Code for Science & Society.

~~~
dragonwriter
> To be clear, this is not the state government of California, but a division
> of the University of California

The University of California is a part of the government of the State of
California established in the State Constitution, whose governing body is
comprised of 18 members appointed by the Governor and confirmed by the Senate,
plus seven ex-officio members, three of whom are State elected Constitutional
officers (Governor, Lt. Governor, and Superintendent of Public Instruction)
and one of whom is the Speaker of the Assembly.

(That said, it _is_ unusual and potentially misleading to refer to UC as
“California”, but not because UC is actually separate from the government of
the State.)

------
tannhaeuser
I hope archive.org doesn't fall victim to GDPR and the upcoming EU "copyright"
reform (e.g., by having to stop serving the EU). I'm not a fan of the vague
"right to be forgotten" concept as it applies to individuals, and think history
rewriting is a much more serious issue going forward.

Though I've heard credible complaints from "copyright" holders vs archive.org.

~~~
adrianratnapala
Yes. The popular sentiment here is "GDPR good, EU copyright reform bad". And
that's understandable.

But both data privacy and copyright try to create ownership of information,
and must do so through intrusive legal measures because physical nature is
against it.

------
app4soft
TL;DR:

 _The project aims to demonstrate how members of a cooperative, decentralized
network can leverage shared services to ensure data preservation while
reducing storage costs and increasing replication counts._

------
WindowsFon4life
The former to be influenced to the "common truth" by the latter, no doubt.

