

All the documents that Google deleted from Google Groups, saved by Archive Team - sp332
http://www.archive.org/details/archiveteam-googlegroups

======
dantheman
These guys and gals really do a great job of stepping up to help save our
history. Oftentimes they have very little notice, making their job incredibly
difficult.

If you ever run a service that contains user created data, please be
responsible when shutting it down by providing a way to archive it.

Congrats

~~~
stinkytaco
> If you ever run a service that contains user created data, please be
> responsible when shutting it down by providing a way to archive it.

I actually find this to be an interesting question. I certainly feel that the
Google Groups stuff is likely worth saving. Project histories, answers to
obscure technical problems, etc.

But what about something like grouphug.us? Or even Facebook? I've got to
imagine there are things put out there in the irresponsibility of youth or the
first blushes of adulthood that someone is practically aching to have
disappear from the Internet. To have the Internet Archive come in and save it
all might be... unfortunate for some.

~~~
gwern
> But what about something like grouphug.us? Or even Facebook? I've got to
> imagine there are things put out there in the irresponsibility of youth or
> the first blushes of adulthood that someone is practically aching to have
> disappear from the Internet.

Of course. And there's tons of awesome stuff in those sites - just like Google
Groups. You're wrong just like all the people mocking Archive Team for caring
about Geocities or Friendster are wrong; you just draw the line around
something you have experience with, is all.

At these sizes, your intuitive impressions of average quality are _completely
irrelevant_. It's all worth saving.

~~~
semanticist
It's not about the quality of the content on sites like grouphug.us, but the
nature of the content - things that the poster might be happy to be forgotten
in 20 years' time.

Previous generations' youthful indiscretions tended not to be preserved for all
time, for the most part.

~~~
unimpressive
Even better. All the stupid stuff you did on the Internet that you've
_forgotten_ that you did. There are no doubt numerous examples that could come
to haunt me in later years. Stuff I don't even remember.

Part of the problem with putting your thoughts down in writing is that if you
change your mind later the writing is still there. I think that in the years
to come we're going to see more scandals arise from this sort of thing.

------
eps
I missed it -- why did Google decide to purge the documents in the first
place? They can't be _that_ short on disk space, can they?

~~~
Dylan16807
I'm sure they have copies lying around if it's only a TB. But I'm sure
Google's goal was to shut down a service to cut costs and complexity.

~~~
jcrites
Indeed. The price to keep the information is peanuts.

Take Amazon S3 (at its highest price point, <= 1 TiB):

<http://aws.amazon.com/s3/pricing/>

($0.125 per GB per month) * (1024 GB in 1 TiB) = 128 U.S. dollars per month.
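
The arithmetic above can be checked with a quick sketch. The function name and the constant are made up for illustration; the only real input is the $0.125 per GB rate quoted here, and 1 TB is taken as 1024 GB to match the TiB tier:

```python
# Hypothetical helper for illustration; not part of any AWS tooling.
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Return the monthly cost in dollars of storing size_gb gigabytes."""
    return size_gb * price_per_gb_month


GB_PER_TB = 1024  # assumption: binary convention, matching the "<= 1 TiB" tier

cost = monthly_storage_cost(1 * GB_PER_TB, 0.125)
print(f"${cost:.2f} per month")  # $128.00 per month
```

This ignores request and bandwidth charges, so it is a floor on the bill rather than the full cost.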

I'm surprised Google doesn't want to keep the data available in order to data
mine.

------
sp332
If you're wondering how they did it:
<http://archiveteam.org/index.php?title=Google_Groups_Files>

------
lubujackson
Google Groups used to be DejaNews, which itself was an archive of Usenet. So
this is a hugely important preservation of Internet history. Congratulations and
thank you, Archive Team!

~~~
unwiredben
This archive isn't of all the Google Groups messages, but instead of the
separate file upload area that each discussion group had.

------
wybo
Wonder when someone will create a torrent of this, so it can be stored in a
distributed way.

------
gst
Did they get some kind of permission from Google or from the authors, or do they
just hope that no-one will sue them for copyright infringement?

