
Yahoo Groups Archiving Progress - themadprogramer
https://data-horde.blogspot.com/2020/01/saving-private-groups-this-time-mission.html
======
ehsankia
I don't understand why Yahoo or whoever is orchestrating this shutdown is so
hostile to the archival process. People are offering their own time and
storage space, how hard would it be for whoever is working on the shutdown to
just allow people to easily download all the raw data, instead of having them
manually scrape it? Wouldn't the latter be much worse for both parties?

This is data going back decades, there has to be someone in power somewhere on
that team that realizes the value of this data? We're not asking them to keep
it up, just to make it easy to archive.

~~~
thrownaway954
sometimes people want to let things die and for good reason. i can only imagine
what a burden it has been to police something like a newsgroup server over the
years. i can also imagine that they haven't caught everything, so they just
want it to go away.

remember that not everything is precious and needs to be preserved.

~~~
betamaxthetape
I completely agree, but the way they handled this was appalling:

(a) The "We're deleting all your content" message was only visible on group
home pages, not when viewing messages, meaning a lot of the people who used
the web-interface didn't know about the shutdown.

(b) The email sent out to all members about the deletion was sent weeks after
the initial notice, often didn't arrive, and was inaccurate (dates had changed
by the time people received the email). So many folks that used the email
interface also didn't know about the deletion.

(c) The amount of time given was far too short. Yahoo Groups had existed as a
brand since January 2001, and was based off Yahoo Clubs / eGroups, both of
which had existed for several years before that. Two months' notice for a 20+
year old service is laughable.

(d) The site has been unreliable during the last few months, due to the high
traffic of people trying to save their stuff. This has significantly
complicated efforts to archive it.

(e) Yahoo's solution to the data destruction, their "Get My Data" feature, is
poorly thought out and often doesn't work. No photos are included in the
download unless uploaded by that account. Many groups had members who uploaded
photos / scans / schematics / etc. and have since left the group. Those photos
are now lost. We've also found the downloads to be missing content, which
meant the request had to be made multiple times until you could patch together
a complete version from all the incomplete ones.

------
Keppl8R
Also happening with RootsWeb Mailing Lists!

Beginning March 2nd, 2020 the Mailing Lists functionality on RootsWeb will be
discontinued. Users will no longer be able to send outgoing emails or accept
incoming emails. Additionally, administration tools will no longer be
available to list administrators and mailing lists will be put into an
archival state. Administrators may save the email addresses in their list
prior to March 2nd. After that, mailing list archives will remain available
and searchable on RootsWeb. As an alternative to RootsWeb Mailing Lists,
Ancestry message boards are a great option to network with others in the
genealogy community. Message boards are available for free with an Ancestry
registered account. Thank you for being part of the RootsWeb family and
contributing to this community.

Sincerely, The RootsWeb team

[https://home.rootsweb.com/](https://home.rootsweb.com/)

~~~
syntheticnature
At least they're leaving the archives up and searchable, though I figure they
know they'd be subject to the wrath of genealogists if they didn't.

------
mirimir
There's also the issue that archives may seemingly be up, but not at all
complete.

For example, many of the mid-90s posts about Jim Bell's "Assassination
Politics" proposal are missing from public archives of the cypherpunks list.
The explanation there seems to be proactive evidence destruction that occurred
as news about the ongoing federal investigation spread.

Also, it seems like some old Usenet stuff has disappeared from Google Groups.
There's stuff that I archived years ago that's no longer findable.

Anyone else noticed that?

~~~
joecool1029
>Also, it seems like some old Usenet stuff has disappeared from Google Groups.
There's stuff that I archived years ago that's no longer findable.

>Anyone else noticed that?

Yeah, like everyone who even casually uses usenet realized that Google Groups
search has been broken for some years now. Somewhere in one of the G+
refactors it broke like 60% of the search and nobody at Google seems to care.

Google Groups at this point mostly exists as a frontend for clueless people to
stumble on and bump decade old threads that active subscribers often can't
even pull. It's honestly worse than the spam problem used to be.

~~~
mirimir
Huh. Thanks.

I guess that it's been that many years since I used Usenet.

If stuff is really gone, that would suck. Because I've read that Google has
the only comprehensive Usenet archive, acquired with Deja News, supposedly
going back to 1982.

~~~
joecool1029
> If stuff is really gone, that would suck.

It would. Though it's probably not. It's more likely they have their search
tuned not to burn extra cycles looking for archived shit not in RAM like the
web archive does.

That's my guess at least, there have been HN threads in the past where people
run obscure searches and it takes some time for Google to crunch it.

~~~
mirimir
Thanks.

So I wonder if there's a workaround to get deeper search.

Maybe I'll dig, and see if I can find how I scraped it.

------
syntheticnature
Looks like a lot of hard work has gone into this.

I found myself wondering, more with future efforts in mind, about the writing
already being on the wall for Yahoo Groups years ago (e.g. the multi-day
outage in 2017, IIRC). Why wait until the official shutdown notification
before getting a scrape of the public groups? As someone who had a modicum of
presence still on the site, it had grown increasingly prone to random errors,
and I even wondered whether one day it would go down and refuse to come back
up no matter how much effort Yahoo put forth attempting to revive it. That
could've easily been an even worse "Yahoo-geddon."

(Not that I expect the effort Yahoo would put forth would amount to much,
since in the now-defunct Yahoo help forums a Yahoo team member revealed Groups
had been "deprioritized" a few years back, i.e. no official support, no help
for groups with deceased mods, etc.)

~~~
kalleboo
> _Why wait until the official shutdown notification before getting a scrape
> of the public groups?_

Archive Team does have projects to scrape some sites that are looking like
they might not survive the long term (e.g. right now there's a LiveJournal
project) but they're all volunteers and I think they just don't have the time
to do everything all the time.

------
ComodoHacker
Contrary to the title, the post doesn't contain anything specific about the
progress of archiving. Like how much of the content is saved and how much
isn't.

~~~
themadprogramer
OP here. Ok, thank you, this is the kind of criticism I crave!

I suppose by "vague" you mean there weren't any hard numbers.

Well I kind of linked the archive team tracker but you had to do some clicking
to find it, and unfortunately I didn't have a statistic on the SYG's progress
at the time of posting. However since posting I do have a rough idea of
things:

Archive Team's own Tracker reports 2.76 TB of data to have been saved. The SYG
team hasn't fully tallied up their data yet, but have counted the number of
groups they've retrieved and/or are retrieving from to be around 123K.

I have since inserted this data into the post and would like to thank you for
your brutal honesty :)

~~~
ComodoHacker
Thank you for the numbers, and for your much appreciated work.

------
tech234a
It’s great to see the community coming together to save the content. I’m glad
that they were able to save a considerable amount of the data.

------
fouc
That chart was interesting. Interesting how the growth of yahoo groups
memberships really dropped off right around 2014.

~~~
betamaxthetape
I don't want to say something I can't substantiate, however in 2013 Yahoo
rolled out the "Neo" interface to Groups that was vastly unpopular with
existing users [0], and many groups chose that point to switch away.

[0]
[https://www.theregister.co.uk/2013/09/03/yahoo_groups_neo_de...](https://www.theregister.co.uk/2013/09/03/yahoo_groups_neo_design_upsets_users/)

~~~
Izkata
It wasn't just unpopular, it was actually broken: constant JavaScript and
server errors that prevented things from loading.

