
Human-Assisted Archival of Yahoo Groups - jstanley
https://github.com/davidferguson/yahoogroups-joiner
======
jancsika
Wonder if we could make a deal with all those casino SEO spammers from
Indonesia. (Or whoever sells their service to them.)

If they will use their cache of Yahoo addys to exfiltrate Yahoo Groups content,
we'll give them a free month of GitLab user spam usage, no questions asked.

~~~
ozfive
I agree there must be something that could be done to utilize the click farms
elsewhere.

------
jstanley
Leaderboard:
[https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboar...](https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboard)

Unfortunately doesn't seem to be updated very frequently.

~~~
lucb1e
And my new yahoo account can't join any groups :/

~~~
username4567
I had this. Once I added a backup email address, it worked. I found this by
trial and error. It seems that Yahoo doesn't consider its email addresses to
be valid.

~~~
lucb1e
Did you see the sibling comment to yours, posted 7 hours earlier? Is that not
a different issue?

------
crazysim
Is there a possibility of asking reCAPTCHA to disable their protection for
Yahoo Groups? It's a reach I guess.

~~~
ajayyy
That's Google, and that would be suicide to the brand.

~~~
kccqzy
Exactly. Especially since Google is trying to transform reCAPTCHA into an
enterprise offering:
[https://cloud.google.com/recaptcha-enterprise/](https://cloud.google.com/recaptcha-enterprise/)

------
philshem
As of now(), it seems the reCAPTCHA is over the limit

[https://github.com/davidferguson/yahoogroups-joiner/issues/1...](https://github.com/davidferguson/yahoogroups-joiner/issues/14#issuecomment-563060387)

~~~
kopiojnfaru
It works fine now, I have been joining groups for some hours.

~~~
tpmx
That's great! Thanks to whoever fixed this - at e.g. Google or Verizon.

------
jscholes
You can join a Yahoo group by sending a blank email to
"<groupname>-subscribe@yahoogroups.com". There is no CAPTCHA, only an
automated email confirmation which you reply to. Is there a reason why that
wouldn't work for this project?
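
A minimal sketch of what building such a message could look like in Python,
for anyone who wants to try it. The group name and sender address are made up,
the helper name is hypothetical, and nothing is actually sent here. Whether
Yahoo accepts the message, and for how many groups, is not something this code
can tell you.

```python
from email.message import EmailMessage


def build_subscribe_message(group_name: str, sender: str) -> EmailMessage:
    """Build (but do not send) a blank subscription email for one group.

    The address pattern "<groupname>-subscribe@yahoogroups.com" is the one
    described above; everything else here is illustrative.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = f"{group_name}-subscribe@yahoogroups.com"
    # No subject and no body are set: the message is intentionally blank.
    return msg


# Hypothetical group name and sender address, for illustration only.
msg = build_subscribe_message("examplegroup", "archivist@example.org")
print(msg["To"])
```

Sending would still need an SMTP account, plus a reply to the automated
confirmation email for each group.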

~~~
philshem
If this works, it would be much easier to automate the work of the army of
manual archivistas that is currently using the Chrome extension and is blocked
by reCAPTCHA.

~~~
davidferguson
You can only join a very small number of groups this way before it just stops
working - we did try this out early in the archive process

------
joshuamcginnis
Could Mechanical Turk be used to do this? I'd be happy to make a donation to
help offset the cost.

~~~
O_H_E
This is the third time I have seen this suggested. Maybe get in touch, and try
to suggest that. I am also on a tight schedule these days and it would be
easier to help with money.

------
thinkingkong
How fitting that Yahoo archival needs to be organized by people.

~~~
Arbalest
How so? Are you referring to some kind of history of Yahoo?

~~~
thinkingkong
Yahoo started by building an index of the internet but instead of using an
algorithm, they primarily relied on people crawling the web and categorizing
websites and curating things. It was a people powered directory.

------
landryl
Jumped in when we were joining groups starting with 'S', and now we're already
at 'E'. Really satisfying.

Given that my account got a connection attempt from Sweden, I guess that's
where the archivist lives. Hopefully he will have a nice morning tomorrow
thanks to the community.

~~~
kopiojnfaru
I started joining groups a few hours ago and now have 43. The first group I
joined started with U. Then there were some with T, some with S, and I'm at R
now.

I'd rather we archive large or relevant groups first instead of going
alphabetically and having to join groups with just 1 post.

~~~
davidferguson
It's going alphabetically through the groups people have nominated, and once
that's done, it'll go by group member count.
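
A rough sketch of that two-stage ordering in Python, not the project's actual
code. The `Group` record, its field names, and the `reverse_alpha` flag are
assumptions for illustration. The flag exists only because people in this
thread report seeing names run backwards through the alphabet.

```python
from typing import Iterable, List, NamedTuple


class Group(NamedTuple):
    # Hypothetical record; the real project's data model is not shown here.
    name: str
    members: int
    nominated: bool


def join_order(groups: Iterable[Group], reverse_alpha: bool = False) -> List[Group]:
    """Nominated groups first (by name), then the rest by member count."""
    groups = list(groups)
    # Stage 1: nominated groups, sorted case-insensitively by name.
    nominated = sorted(
        (g for g in groups if g.nominated),
        key=lambda g: g.name.lower(),
        reverse=reverse_alpha,
    )
    # Stage 2: everything else, largest membership first.
    rest = sorted(
        (g for g in groups if not g.nominated),
        key=lambda g: g.members,
        reverse=True,
    )
    return nominated + rest


# Made-up sample data.
sample = [
    Group("stamps", 12, True),
    Group("Elm-Users", 3, True),
    Group("bigchat", 9000, False),
    Group("tiny", 1, False),
]
for g in join_order(sample):
    print(g.name, g.members)
```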

------
ozfive
Some things I've noticed about Google reCAPTCHA already: it can't tell the
difference between a house and a bus in a 3x3 image grid, and it doesn't know
the difference between scooters and motorcycles. It also has a margin of error
on fire hydrants: usually, once the last image it asks you to identify a fire
hydrant in fades out, it will not bring up another fire hydrant, so you can
click verify early.

~~~
CriticalCathed
I believe I read something on HN months ago that claimed Google would force
failures on otherwise successful captchas. Little bit of gas-lighting if true.

~~~
Jaruzel
Anecdotal I know, but I have definitely experienced this.

------
lopmotr
I wonder if groups with actual members will be OK. I'm a member of one and
have several years of message digests in my email. I just downloaded all of
them. For groups without members, or with no-one bothering to read them
anyway, maybe it doesn't actually matter?

~~~
Diagon
If they are public, yes, it should be ok. Regarding groups that are old or
inactive, a couple that I'm concerned with had important early discussions
that people on other derivative groups refer to regularly.

------
OJFord
I haven't seen anything about this saga in mainstream news; I can't see that
it's going to be won without getting papers on-side.

~~~
Diagon
There are three that we've noticed so far:

[https://bbs.boingboing.net/t/as-the-end-nears-for-yahoo-grou...](https://bbs.boingboing.net/t/as-the-end-nears-for-yahoo-groups-verizon-pulls-out-all-the-stops-to-keep-archivists-from-preserving-them/156924)

[https://www.zdnet.com/article/verizon-kills-email-accounts-o...](https://www.zdnet.com/article/verizon-kills-email-accounts-of-archivists-trying-to-save-yahoo-groups-history/)

[https://www.theinquirer.net/inquirer/news/3084557/verizon-bl...](https://www.theinquirer.net/inquirer/news/3084557/verizon-blocks-archivists-yahoo-group-content)

But yes, you are right. We need to keep pushing it into the public sphere.
Twitter, Reddit, I hate to mention FB here, but ...

~~~
OJFord
I don't really consider those 'mainstream news', I've only heard of zdnet - I
thought boingboing was a public WiFi hotspot provider.

I primarily mean newspapers. The Times (of London & New York), The Guardian,
but Bloomberg too.

The story is probably a wider point about the fragility of online information,
for which this is a mere significant event, but without that happening I just
don't see the give-a-shit count increasing.

------
aitchnyu
How can I recognise services that intend to live forever? Meetup used to have
reports and mailing lists but it went to a sinking company and is now a
pointless SPA which doesn't let Firefox log in. I know a group that acquired
several competing products, and they missed a crucial innovation which the
indies had for a while.

~~~
pjc50
Nobody and nothing lives forever.

For organisations, the best you might be able to do is some kind of
co-operative: it's much less likely to sell out (although not impossible), you
generally get a vote in how it's run, and since they're forced to be
self-funding you're not dependent on VC funding whims. With sufficient runway
transparency you can always know how far they are away from shutdown and how
much funding they need.

Twenty years ago (!) I helped set up a hosting co-op for university societies:
[https://www.srcf.net/](https://www.srcf.net/)

One of our specific aims was preserving continuity. Most societies are run by
undergraduates who do it for a year or two and leave after 3 years, so making
it as easy as possible to handle handover was a key feature. It's done pretty
well for something that pre-dates Facebook, GitHub, Myspace, and even Yahoo
Groups itself.

~~~
aitchnyu
I admire the neat design and a backend of (quoting your site) "the server".
Have you written an article about this?

~~~
pjc50
Oh, it's been years since I was involved with any of the actual running of it.
I don't even have a shell account any more. In the early days it _was_ "the
server", a spare PC that was donated. These days it looks like they have a
donated cluster:
[https://www.srcf.net/faq/about#system](https://www.srcf.net/faq/about#system)

The "backend" will be Apache. On day 1 we used SSI (server-side includes) for
"theming" pages, which were all in handwritten HTML. I suspect it's still like
that given the five blank lines before DOCTYPE. It looks like some Bootstrap
CSS has been sprinkled on it since then. There's no front-end JavaScript
because there doesn't need to be.

> Until 2006 we had just one server in use, kern (a dual Athlon 1.6GHz PC with
> 2GB of RAM and 400GB of disk). Before that we used to run on an ancient
> Intel Pentium running at 166MHz with 128MB of RAM. How times have changed :)

Indeed. That ancient system was perfectly adequate for serving web pages to a
few thousand people for light use. At the time I was carrying around the
amazing new thing that was a computer you could fit in your pocket and play
music illegally downloaded from the internet on. It was a Toshiba Libretto 30
with 8MB (eight megabytes) of RAM and a PCMCIA sound card.

Our systems approach to the SRCF was very much "what is the simplest thing
that could possibly work". Apache+CGI+PHP with UNIX user accounts will get you
a _long_ way if you let it.

The real achievement is political and personal. I'm amazed that they've always
managed to find good enough volunteer staff for the whole thing for twenty
years.

------
barik
How many more groups are left to join? There's no easy way, as far as I can
tell, to get a sense of completion. The extension goes in reverse alphabetical
order, but also loops around again.

------
xianwen
Is there any market to run a self-sustaining website like Yahoo Groups? By
self-sustaining, I mean that there would be income that would pay the hosting
costs.

~~~
lonnyk
Facebook and Reddit are pretty good at running groups.

------
sundarurfriend
Is there any way to make this extension work on Firefox?

------
Diagon
Three Ways You Can Help:

Help by Joining Yahoo Groups so the Archive Team can Download them (easy! -
this is the link in the OP here):
[https://github.com/davidferguson/yahoogroups-joiner](https://github.com/davidferguson/yahoogroups-joiner)

(That's the most needed right now so the scripts can get access to the
groups.)

Help by Downloading Yahoo Groups with the Archive Team's Script (not hard!):
[https://www.archiveteam.org/index.php?title=ArchiveTeam_Warr...](https://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior)

Get the word out/Call for Action (put pressure on Verizon!):
[https://modsandmembersblog.wordpress.com/taking-action/](https://modsandmembersblog.wordpress.com/taking-action/)

Don't miss the sidebar with these links:

    
    
    https://modsandmembersblog.wordpress.com/media-contacts/
    https://modsandmembersblog.wordpress.com/contacting-verizon-directly/
    https://modsandmembersblog.wordpress.com/contacting-verizon-yahoo-stockholders/
    

Also, you can add these emails to the media contacts:

    "Reporter Katyanna Quach" <kquach@theregister.co.uk>,
    "Managing editor Gavin Clarke" <gavin.clarke@theregister.co.uk>,
    "Corey Wilson & Rachel Janc; Senior Director, Communications" <press@Wired.Com>,
    "Pitches" <submit@wired.com>,
    "Rich Woods" <rich.woods@neowin.net>,
    "Paul Thurrott" <paul@thurrott.com>,
    "Brad Sams" <brad@petri.com>,
    "Kate Rayford, Media Inquiries" <katie.rayford@slate.com>,
    "Bryan Lowder (LGBTQ issues/culture)" <bryan.lowder@slate.com>,
    "Torie Bosch (emerging technology effects on public policy and society)" <torie.bosch@slate.com>,
    "Jonathan Fischer (big tech, cities, media/internet culture)" <jonathan.fischer@slate.com>,
    "Susan Matthews, Health & Science" <susan.matthews@slate.com>,
    "Erika Allen, Executive Managing Editor" <erika.allen@vice.com>,
    "Katie Drummond, SVP, Global Content" <katie.drummond@vice.com>,
    "Press, US" <press@vice.com>,
    "Press, Canada" <presscanada@vice.com>,
    "Press, UK" <ukpressoffice@vice.com>,
    "Pitches, Culture" <culture.pitches@vice.com>,
    "Pitches, Tech" <tech.pitches@vice.com>,
    "Issues" <issues.pitches@vice.com>

------
ajayyy
Can you only join one group per account?

~~~
barik
You can join as many groups as you want. For example:
[https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboar...](https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboard)

------
abathur
An injunction would be nice, though I'm not sure if a court would see anyone
as having sufficient standing to stop the gears entirely.

~~~
pjc50
There's absolutely no legal basis for such a thing, though. _Legally_ it's
Yahoo's and they can shut it down and delete it tomorrow. It's only morally
that they have an obligation.

------
ozfive
On It!!!

------
yori
Wow! Kudos to the Archive Team on not giving up on this mission.

