
Options for Saving Yahoo Groups Content - elorant
https://researchbuzz.me/2019/10/17/your-options-for-saving-yahoo-groups-content/
======
jefftk
I wrote up some instructions on using YahooGroups-Archiver and dealing with
the kinds of failures I've seen with it:
[https://www.jefftk.com/p/archiving-yahoo-groups](https://www.jefftk.com/p/archiving-yahoo-groups)

~~~
Alex3917
> They're not that easy to read, because it doesn't do any kind of quote
> folding, but we can always do that later.

I can create an endpoint to bulk import/process content to FWD:Everyone if
there is enough interest. It's sort of the opposite of an archival product,
where it's designed to aggressively reformat content to improve readability.

E.g. here is a comparison between a thread in Gmail and a thread that's been
formatted with our tech:

[https://www.youtube.com/watch?v=zAXsHiqQm1E](https://www.youtube.com/watch?v=zAXsHiqQm1E)

[https://www.prettyfwd.com/t/Imac8ycyQe26vDZd8pEkMg](https://www.prettyfwd.com/t/Imac8ycyQe26vDZd8pEkMg)
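
For anyone unfamiliar with the term, "quote folding" here just means collapsing the quoted parts of a reply so the new text stands out. A minimal sketch, assuming plain-text message bodies where quoted lines start with `>`; the function name `fold_quotes` and the placeholder text are made up for illustration and are not part of YahooGroups-Archiver or FWD:Everyone:

```python
def fold_quotes(body: str, marker: str = "[... {n} quoted lines folded ...]") -> str:
    """Collapse each run of consecutive '>'-prefixed lines into one placeholder line."""
    out = []
    run = 0  # number of quoted lines seen in the current run
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            run += 1
            continue
        if run:
            # A quoted run just ended: emit one placeholder for the whole run.
            out.append(marker.format(n=run))
            run = 0
        out.append(line)
    if run:  # the message ended inside a quoted run
        out.append(marker.format(n=run))
    return "\n".join(out)
```

Real mail is messier than this (top-posting, "On ... wrote:" headers, HTML parts), so treat it as a starting point for post-processing an archive rather than a full solution.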

------
girst
looks like the Archive Team is on it:
[https://archiveteam.org/index.php?title=Yahoo!_Groups](https://archiveteam.org/index.php?title=Yahoo!_Groups)

~~~
dredmorbius
I had experience working with ArchiveTeam on the G+ shutdown.

1\. They're _AMAZING_.

2\. They're _not_ superhuman.

3\. Major archival projects take time to prepare, set up, and execute.

4\. Only public content is even potentially accessible. Much can get lost.

AT did a stellar job with G+, but it's still a far cry from the access that
was provided when Google hosted the content.

 _Google actively and by all appearances deliberately thwarted and frustrated
archival efforts._ This was a tremendous disappointment. It's quite likely
Yahoo will do similarly.

If you want _your own_ control over archives, you're going to want to use /
run tools directly yourself.

TL;DR: AT is a huge benefit, but it's not a total solution.

~~~
ninju
Just a question regarding comment format

Shouldn't the _TL;DR:_ message be at the top of the comment, so that you don't
have to read the entire comment (since it's too long :-)) to get the summary
statement?

------
patja
This is the one I've been using to good effect:
[https://github.com/IgnoredAmbience/yahoo-group-archiver](https://github.com/IgnoredAmbience/yahoo-group-archiver)

no Mongo database required, just plain old .eml files

You do need to pull some cookie values out for the authentication to work, and
the Readme is pretty sparse, but it does work.

~~~
superkuh
_oops, edit_: Actually, I used this one:
[https://github.com/philpem/yahoo-group-archiver](https://github.com/philpem/yahoo-group-archiver)

Yup. I've used this one successfully with messages, message attachments,
files, and photos. It works.

That said, on Yahoo's side, lots of message attachments that the message
metadata claims exist don't actually exist anymore, because Yahoo has lost or
deleted them. Same with some of the photos. But _all_ the messages and _all_
the files are always there.

------
dredmorbius
I'm about seven months out of having gone through a similar process with
Google+. What I'd contributed specifically was in trying to create some
structure and process for people and organisations hoping to migrate _both_
content _and_ communities from G+. Success was mixed, but it was, I think,
better for what we'd managed to do.

A large part of that was setting up informational resources with an eye for
future migrations to use, learn from, and adapt these. In particular:

[https://social.antefriguserat.de](https://social.antefriguserat.de) is a wiki
focused (principally) on the G+ migration, but with some generalisable
principles. Please feel welcome to visit and add to it. (I'm the principal
editor/admin.)

One of the huge wins we had was a third-party end-user tool for data export.
I'm checking to see if there's a similar Yahoo Groups product.

[https://old.reddit.com/r/plexodus](https://old.reddit.com/r/plexodus) is the
"Google Plus Exodus" but wink-wink "public liberation exodus" subreddit for
discussion. I've already posted several items on the Yahoo Groups shutdown.
Again, for an out-of-band regrouping channel, it's an option for Yahoo Groups
users.

I also need to follow up with EFF and some other groups I'd talked to briefly
about the issue of migration off public web services.

------
ocdtrekkie
A number of years ago, I downloaded several Groups I cared about using
something like PG Offline (though I don't remember if that's exactly what I
used) that gave me an sqlite file. I ended up converting it into MySQL and
building a terrible viewer "app" with PHP that just navigated up and down the
messages.

I never totally understood what happened to some of my old Yahoo Groups
though. Several disappeared completely, and some, despite being completely
inactive, persisted for years upon years.

------
luckylion
What's the reasoning for _not_ working with the internet archive or similar
organizations when shutting down a service like Yahoo Groups? It seems that it
would be much more efficient to work with them on sharing a cleaned DB dump
instead of having multiple groups put lots of unnecessary load on their
servers.

Given that they need exporters for individual data anyhow to be GDPR
compliant, it shouldn't be a huge project to tune them slightly.

~~~
girst
the cynic in me says: why risk legal issues, however small they may or may not
be, when doing literally nothing sidesteps that issue? if only 1 user thinks
of suing you, legal gets involved, maybe even external attorneys. also, who
will pay for possibly hundreds of man-hours of work cleaning up the data?

~~~
luckylion
As for cleaning up the data: I don't think that's a large task. Whatever is
published now has either been cleaned or is just as problematic from a legal
standpoint (possibly more so because it's shared publicly, not shared with a
contractually obligated third party).

You're probably right though regarding "not doing anything isn't doing
something wrong". But still, saving lots of computing resources and getting
good PR and friendly news coverage has to be worth something.

~~~
ghaff
To a first approximation, there's no PR or friendly news coverage. 99.999% of
even the tech world doesn't care. Meanwhile, the effort involved is no one's
day job. Someone is far more likely to hear "Why are you wasting your time on
this?" than "attaboy!"

~~~
luckylion
Thanks for the perspective, that makes sense to me. I tend to assume that
having somebody work on stuff like that is a rounding error for even the
smallest department at Yahoo, but as you point out everybody has to justify
how they're spending their time to somebody.

------
raincom
is there a tool to migrate yahoo groups email conversations to GNU Mailman so
that I can search old conversations?

~~~
u801e
Or maybe create a usenet newsgroup in the alt hierarchy and post the messages
there.
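
On the Mailman question above: one possible route is to first pack the downloaded messages into a single mbox file, a format that Mailman's archivers can generally import (check the docs for your Mailman version for the exact import command). A minimal sketch using Python's standard `mailbox` module, assuming the archiver left one `.eml` file per message in a directory; the function name `eml_dir_to_mbox` and any paths are placeholders:

```python
import mailbox
from pathlib import Path


def eml_dir_to_mbox(eml_dir: str, mbox_path: str) -> int:
    """Append every *.eml file under eml_dir to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    count = 0
    try:
        # Lexicographic path order, so "10.eml" sorts before "2.eml";
        # sort on Date headers afterwards if ordering matters.
        for eml in sorted(Path(eml_dir).rglob("*.eml")):
            box.add(eml.read_bytes())  # mailbox writes the "From " separator line
            count += 1
    finally:
        box.close()  # flushes pending messages to disk
    return count
```

Calling `eml_dir_to_mbox("mygroup/email", "mygroup.mbox")` (both paths hypothetical) would give you one file that most mail clients can also open and search directly, even without Mailman.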

------
downrightmike
Serious: What are you guys saving?

~~~
carapace
[http://tech.groups.yahoo.com/group/concatenative/](http://tech.groups.yahoo.com/group/concatenative/)

... oh, it seems it's already gone.

------
sarbaz
We should require websites to freeze and archive public content instead of
deleting it. We already require media published in nearly every other format
to be submitted to Library of Congress, and we should extend those protections
to our internet heritage.

~~~
ghaff
> We already require media published in nearly every other format to be
> submitted to Library of Congress

No. We do not. Registering your copyright provides some advantages if you ever
want to sue for copyright infringement. But there is absolutely no requirement
to submit materials to the LoC.

~~~
sarbaz
[https://www.copyright.gov/mandatory/index.html](https://www.copyright.gov/mandatory/index.html)

> All works under copyright protection that are published in the United States
> are subject to the mandatory deposit provision of the copyright law

You own copyright over works whether you register them or not. To the best of
my understanding that means that you are always required to deposit works when
you publish them, whether or not you register your copyright. But IANAL, so I
could be way off.

EDIT: Here's a real citation[0]

> What is the difference between mandatory deposit and copyright registration?

> ... Optional registration fulfills mandatory deposit requirements.

So you definitely are required to deposit a copy with LoC whenever you
publish/distribute in the US

[0]
[https://www.copyright.gov/help/faq/mandatory_deposit.html](https://www.copyright.gov/help/faq/mandatory_deposit.html)

~~~
ghaff
Interesting. I've never heard of that.

So I stand corrected. (Sort of.) Apparently it's a very old provision. [1] It
doesn't really make much sense in today's world. Read literally, almost
anything you create--including online--must be sent to the LoC.

So, yes, it is a law on the books and maybe mainstream publishers comply with
it. But pretty much no one else does AFAIK.

[1] [http://articles.ibpa-online.org/article/need-know-
copyright-...](http://articles.ibpa-online.org/article/need-know-copyright-
mandatory-deposit/)

~~~
sarbaz
Not necessarily, because it can be debated whether posting something on the
internet really counts as publishing, and whether most internet content is
original enough to be eligible for protection. A good example of people who
are probably violating this law would be webcomic artists.

But I do think the world would be a better place if we added new legislation
to require platforms to submit archives of their public content. Perhaps LoC
should also be funded to scrape the web to capture smaller sites.

