Yahoo Groups to remove all content December 14 (yahoo.com)
349 points by 83 36 days ago | 224 comments

I have a group I still care about. Here's how I'm archiving it:

    git clone git@github.com:andrewferguson/YahooGroups-Archiver.git
    python archive_group.py <group-name>
See https://github.com/andrewferguson/YahooGroups-Archiver and https://www.archiveteam.org/index.php?title=Yahoo!_Groups

I'm using https://github.com/IgnoredAmbience/yahoo-group-archiver because it supports databases and files, which is important for the groups I want to archive.

It's not my project, but I'm working on improving it; I partly fixed the database support yesterday. I'm trying to figure out how to add support for links, but I'm stuck on link subdirectories/folders: I can't find a way to use the API to list the links that aren't in the root dir (though once I have the link name I can get them with links?filename=DIR/LINK.url)

Does that script even work for you? When I run it, I'm prompted for my Yahoo! password, but after typing it in the program just spits out a "Login failed" message. The problem has been reported on the issue tracker but there's been no activity on it since February.

You need to add some parameters for the Yahoo authentication cookie values. -ct and -cy should be set to the cookie values from your browser, after logging into Yahoo.

It also needs some try/except handling to be added for all the cases where Yahoo fails to provide the requested asset.
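A minimal sketch of the kind of try/except handling meant here, assuming a plain download helper. `fetch_with_retries` and its `opener` parameter are hypothetical names for illustration and are not part of the archiver script; the idea is only that a failed asset gets logged and skipped instead of crashing the whole run.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, opener=urllib.request.urlopen, attempts=3, delay=1.0):
    """Return the response body as bytes, or None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except (urllib.error.URLError, OSError) as err:
            # HTTPError is a subclass of URLError, so 4xx/5xx responses land here too.
            print(f"attempt {attempt}/{attempts} failed for {url}: {err}")
            if attempt < attempts:
                time.sleep(delay)
    # Give up on this asset; the caller can record it and move on.
    return None
```

A caller would check for `None` and carry on with the next file or attachment rather than aborting.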

Once you sort those things out, it works pretty well.

Could you please explain what this means and how to do it? This is not documented on the project page, and I'm having the same issue as the poster above.

login.yahoo.com creates 2 cookies. The program's help identifies two flags called -ct and -cy which I assume you're referring to. Am I to understand that to properly run the program, the command should be entered as so:

./yahoo.py -ct %cookieOneContent -cy %cookieTwoContent -u username -p password groupname

I don't know which cookie goes where so I tried it both ways, and both with and without the -u username and -p password flags; none of them worked, only returning HTTP Errors.

There should be at least two cookies, one called T and the other Y, on yahoo.com, the content of these looks like a long query string, "z=something&a=something&etc". Use T in -ct and Y in -cy, like -ct "z=something&etc" -cy "v=something&etc"
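For anyone wondering what the script presumably does with those two values, here is a rough sketch of attaching the T and Y cookies to a request by hand. `build_yahoo_request` is a hypothetical helper and the URL in the usage note is a placeholder, not a real endpoint; the actual script may well do this differently.

```python
import urllib.request


def build_yahoo_request(url, cookie_t, cookie_y):
    """Return a Request that carries the T and Y cookies copied from the browser."""
    # The cookie values already look like query strings ("z=...&a=..."), so they
    # are passed through untouched; only the cookie names T and Y are added.
    cookie_header = f"T={cookie_t}; Y={cookie_y}"
    return urllib.request.Request(url, headers={"Cookie": cookie_header})
```

Usage would look like `build_yahoo_request("https://example.invalid/", "z=something&etc", "v=something&etc")`, with the two strings pasted from the browser's cookie viewer.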

Sorry this is so tricky.

-u and -p don't seem to be needed anymore? The script complained so I dropped them. Also, I used the "F" cookie for -ce.

So: ./yahoo.py -ct "T cookie" -cy "Y cookie" -ce "F cookie" "group name"

Ah, I see. It wasn't under login.yahoo.com then; the cookies I was looking for were under "yahoo.com locally stored data." This worked. Thank you.

e: actually, no dice. Got it to work on one group and then never again. A couple of the groups I've tried can retrieve messages, but never files or attachments. It always gives an exception when it gets to the files.

I'm offering a service that will extract all the Yahoo Groups data from both private and public groups for a fee.

My service includes:
- Public groups backup
- Private groups extraction (will require admin credentials for the account)
- Extraction formats include Excel, CSV and HTML

Not included in the backup:
- Importing to new platform
- Data translation/data cleaning
- Additional post-backup requests

Total cost:
- Full backup $500

Payment details:
- $250 for the first installment.
- $250 after the last sample Excel file is sent

After the first payment I will start work. Once I have completed the data backup, I will send an Excel file with 10% of the final backup file as verification of my work. I will send the full file once the final $250 payment is complete.

If you're interested, please contact me at yahoogroupsbackup@gmail.com

- oran c

Noob question: how to convert the downloaded json file into a readable message?

First run those commands, then run

    python make_Yearly_Text_Archive_html.py <group-name>
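If you would rather read a single message without building the whole yearly archive, something like the sketch below may be enough. It rests on an assumption about the file layout: that each downloaded JSON file holds the raw, HTML-escaped email under `ygData` and `rawEmail`. Check one of your own files first, since that field name is a guess and `render_message` is a hypothetical helper.

```python
import email
import html
import json
from email import policy


def render_message(json_text):
    """Turn one archived message (JSON text) into a short plain-text rendering."""
    data = json.loads(json_text)
    # ASSUMPTION: the raw RFC 822 message sits under ygData.rawEmail, HTML-escaped.
    raw = html.unescape(data["ygData"]["rawEmail"])
    msg = email.message_from_string(raw, policy=policy.default)
    # Prefer the plain-text part; fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return f"From: {msg['From']}\nSubject: {msg['Subject']}\n\n{body.strip()}"
```

Calling `print(render_message(open("1234.json").read()))` on one file (the filename is only an example) would show the sender, subject and body.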

Happen to have a screenshot of the output of that? Just from skimming the source, I would imagine that messages in longer threads get pretty unreadable.

https://i.imgur.com/s7IiHO6.png https://i.imgur.com/V6QzKdT.png Run these scripts with python2, as they use print statements. Also run

    pip install natsort

They meant a screenshot of the output file.

I tried this a long time back. It requires a mongodb instance. I was not happy about it.

Doesn't work for me. The file scraper crashes with the default Firefox Selenium driver, and with the Chrome driver it immediately quits, falsely reporting that it successfully downloaded all the files.

If you don't use git authentication and use https, this works better:

    git clone https://github.com/andrewferguson/YahooGroups-Archiver.git

i'm getting this:

    YahooGroups-Archiver  master  python archive_group.py tns98
    Archiving group 'tns98', mode: update , on Wed Oct 16 20:01:04 2019
    Traceback (most recent call last):
      File "archive_group.py", line 143, in <module>
        archive_group(sys.argv[1])
      File "archive_group.py", line 67, in archive_group
        max = group_messages_max(groupName)
      File "archive_group.py", line 89, in group_messages_max
        return pageJson["ygData"]["totalRecords"]
    UnboundLocalError: local variable 'pageJson' referenced before assignment

but nobody else seems to be facing this issue?

tns98 is a private group. "To archive a private group using this tool, login to a Yahoo account that has access to the private groups, then extract the data from the cookies Y and T from the domain yahoo.com. Paste this data into the appropriate variables (cookie_Y and cookie_T) at the top of this script, and run the script again."
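The traceback suggests that `pageJson` is only assigned when the fetch succeeds, so a failed fetch surfaces as a confusing `UnboundLocalError` rather than a message about private groups. A hedged sketch of a friendlier shape follows; `total_records` and `fetch_page` are hypothetical names, and this is not the script's actual code.

```python
def total_records(fetch_page, attempts=3):
    """Return ygData.totalRecords, raising a clear error if no page was fetched."""
    # Bind the name up front so it can never be "referenced before assignment".
    page_json = None
    for _ in range(attempts):
        try:
            # fetch_page is assumed to return parsed JSON or raise ValueError.
            page_json = fetch_page()
            break
        except ValueError:
            continue
    if page_json is None:
        raise RuntimeError(
            "could not read the group's message list; "
            "is it a private group with cookie_T / cookie_Y unset?"
        )
    return page_json["ygData"]["totalRecords"]
```

With a guard like this, the private-group case produces an error that points at the cookies instead of at a variable name.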

that was it, cheers. didnt realize it was private.

I got this error when trying to archive a private group, see the README for how to archive private groups.

ah thanks, wish that was documented. thought it was public.

This is horrible.

Some Yahoo Groups must still be active as mailing lists, or at least serve as archives for niche communities, and they hold irreplaceable, valuable information. I don't know about other groups, but I know the Tektronix oscilloscope group (TekScopes) has been on Yahoo for a decade, and it's often the only source of information about vintage Tektronix oscilloscopes dating back to the 1960s. Some members are former engineers with firsthand experience who can help you fix your scope or identify a replacement part, and the mailing list archive holds a lot of lost knowledge that cannot be found elsewhere.

Well, fortunately TekScopes migrated to Groups.io in recent years, which is good. But just think about other groups...

I've seen several fandom wikis recently which sourced things to Yahoo Groups comments from the creators. Presumably those statements will be lost forever soon unless someone spots those things and archives them.

In the present, it feels a bit silly to mourn random posts from writers and webcomic creators as a serious loss. But when I consider the impact of past artists' letters and journals (e.g. Tolkien), it feels rather different. Some of these silly things will turn into real academic and historical sources if only they survive.

> I've seen several fandom wikis

Internet subculture surfing and archaeology is my favorite hobby. I do it for fun, not for research. I have seriously considered quitting software engineering and moving to social science, but it will remain a hobby.

I've spent a long time browsing alt.folklore.computers and Usenet FAQs from early Usenet archives, or reading long-deleted 4chan posts from Bibliotheca Anonoma [0], a repository of history and folklore from Something Awful, 2channel, and 4chan, and looking up and laughing at memes that nobody remembers. Or reading the early writings and thoughts of online personalities long forgotten.

The biggest frustration in my exploration is hitting a dead link that hasn't been archived by archive.org. When that happens, all bets are off and the journey officially ends. When a webpage is erased, part of our collective history and culture is removed with it.

I've also seen endless small communities die due to changes of circumstances, usually without warning and overnight.

The lesson is that we should donate to archive.org today, just click here. https://archive.org/donate/

[0] https://github.com/bibanon/bibanon

Web archaeologists, that seems fun, necessary, and something way cool in 10 years from now

This seems fun. Any advice on getting started? It's so tough to find new sites. I tried making a website before to find new interesting sites, but it never took off.

Go on geocities search engine

I don't know about that. I recently found out how a webcomic I followed until the artist was forced to cancel it was supposed to end. It gave me some closure for characters I spent years caring about.

>This is horrible.

I would post a counter argument: this is inevitable. Anything stored on someone else's machine, especially for free, is subject to disappear at any moment. Even if Yahoo wanted to keep Groups active, a hardware failure or natural disaster could undo all of this in the blink of an eye. At least this way Yahoo is giving people the opportunity to archive and move things.

It should be the expectation that irreplaceable, valuable information should be archived somewhere that offers the hope of long term archival.

> a hardware failure or natural disaster

This is generally something that large "Internet" companies are handling well. A hardware failure or natural disaster wiping out a datacenter will generally not affect data that such companies want to keep.

There are several I'm on (canard-aviators and gns480-users), and they've been active for something close to a decade. (And there are still emails every day!)

Frustratingly, both of these groups have great content (manuals, tips, documentation) uploaded to Yahoo Groups. It's going to be a real loss.

You may want to start archiving them immediately using the advice from this comment. https://news.ycombinator.com/item?id=21272524

I think there's an important lesson to learn: unlike offline, many online communities only exist because people find them interesting, and there are many things that no one will take care of. If not now, when? If not you, who?

For me, it's the local Freecycle community in my city.

Also, an area UU church still uses Yahoo Groups for several mailing lists. Trying to help them technically 10 years ago was difficult at the least; I gave up and eventually moved on. It took a few deaths in the congregation for them to update their website, for example. And now, the virtual death of Yahoo Groups will force them to change again.

Archive them! Please! For the good of the future.

They will apparently still work as mailing lists

It gets better... [facepalm]

From Twitter:

  Today it was announced that Yahoo! Groups is 
  shutting down, and taking with it a piece of 
  critical national infrastructure: 
  the Oftel Yahoo Group which is used for 
  managing UK phone number assignments. 
  Yes, really: See Ofcom's website

Original tweet: https://twitter.com/erincandescent/status/118458732359973683...

From what I understand, it will still work as a mailing list platform, just everything on the web will shut down.

“Anything you post on the internet will be there forever” meme is becoming less true these days. Closed gardens get shut down without any notice, dead links run rampant, and everyone relies fully on cloud vendors for data storage.

I don't think the meme was ever supposed to mean it's safe to rely on the internet to preserve your data. Instead, it means you should never assume you have the power to delete data once it's out there.

Funny how this is cutting both ways. If you want it gone, expect to be disappointed... and if you want it preserved, expect to be disappointed.

An alternative formulation is that if you want it gone, just wait, while if you want it preserved, save it yourself.

That's what I tell myself as I pirate everything online. Couldn't afford to preserve everything by buying it. The only way to guarantee preservation, personally, is to save it personally.

Even if you could afford to buy everything you care about, much of it is only legally available through DRM which limits or downright removes your capacity to ensure the availability of your copy to you.

Every so often you hear a story about a recording that was presumed lost but has been found in someone's VHS collection[1].

Then there are examples like Marion Stokes[2] who recorded 71000 tapes over 30 years.

Fast forward to today (ha ha), and I do think there is some justification for downloading 'pirate content' for archival purposes. In the same way that I think LibGen is a very important project even if it angers some.

To be clear, I believe in buying the items I'm actually going to read or watch to support the creators. Not just pirating everything.

[1]: https://www.theguardian.com/science/2019/jun/28/apollo-11-ta... [2]: https://www.atlasobscura.com/articles/marion-stokes-televisi...

Human life is so far a game of cross-purposes. If we wish a thing to be kept secret, it is sure to transpire: if we wish it to be known, not a syllable is breathed about it. - Hazlitt

This is pretty true about reputation in general not just online

Or as it’s also called, “unreliability”.

yes, because they are both absolutes in opposite ends. if you want something to be perfect expect to be disappointed.

“Only wimps use tape backup. REAL men just upload their important stuff on ftp and let the rest of the world mirror it.”

― Linus Torvalds

The important part is that someone has to mirror it.

the bit you missed is that he implied your stuff has to be so good that people choose to mirror it!

From a security perspective you must assume "Anything you post on the internet will be there forever". That's also why we tell kids that it'll be online forever - because there is no guarantee we remove all copies of it.

However, from a data integrity perspective you must hold the opposite view: "Assume your copy is the only copy that exists", as you cannot guarantee that others have a copy of the data or will keep their copies.

I'm personally very worried about what the internet will look like in 15 years. How many awesome communities, blogs, projects, etc. will disappear?

Yeah, I think that people really tend to overlook how much of our common history and heritage exists exclusively as bits on some old hunk of rust, and how frequently those bits are permanently and irrevocably vaporized.

Shutdowns like this are well-planned and orderly compared to the much-more-common-than-anyone-wants-to-admit "woops, I ran rm -rf in the wrong directory", "woops, I ran fsck on a mounted disk" (which I literally did last night, fortunately without the deserved vaporization), or "woops, someone sold the wrong server and the hard drive is out in the wild now", followed up by the inevitable "woops, turns out the backups have been broken for the last 32 months, probably should've been checking that".

Historians are going to have a hard time moving forward. Until the last 15 years or so, when someone died, you could go collect "their papers" -- journals, letters, records, photos, and so on. The digital equivalent is scattered across dozens of personal devices and thousands of servers. Even if we're generous and assume that something like 20% of it will still exist when a person dies, the data is commingled in someone's crappy MySQL somewhere, or it's stored behind passwords, crypto keys, and other contraptions that are virtually unbreakable compared with something like "call the locksmith".

As people age, I think these concerns will begin to be taken much more seriously, but it shows the crucial importance of fully utilizing things like the GDPR-mandated "fork over your data" export and becoming proficient in backup, archiving, and system forensics.

This is an area of some interest to me, so when my grandmother died a couple of years ago, I worked quickly to get a bit-for-bit copy of every user-accessible digital device in her house. I got almost everything, but a distant family member absconded with the iPad, which had been her primary computing device for the last couple of years of her life, before I could even get on a plane.

[Tangential pro-tip here: large files can and do lose bits. Create multiple copies, create parity files, and don't rely blindly on either RAID or filesystem-level checksums.]
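A small sketch of the detection half of that tip: hash every copy and flag the ones that disagree with the majority. Real parity files need a dedicated tool and are not shown here. `sha256_of` and `mismatched_copies` are hypothetical helper names, and with only two copies a mismatch tells you something is wrong but not which copy is the bad one.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(paths):
    """Return the paths whose digest differs from the most common digest."""
    digests = {Path(p): sha256_of(p) for p in paths}
    if not digests:
        return []
    counts = {}
    for value in digests.values():
        counts[value] = counts.get(value, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(p for p, value in digests.items() if value != majority)
```

Running this periodically over three or more copies of a disk image gives an early warning before the last good copy rots as well.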

Fortunately, she was in her mid-90s and while she enjoyed technology, she also maintained handwritten journals for over 50 years, so we still have the main records of her legacy. But things will be different for people not-all-that-much younger.

I'd just hope Hacker News never "upgrades" itself in the next 15 years so I can still find archives of previous discussions, including this thread.

Hacker News is the center of technology, just like the Usenet groups were the center from the late 70s to the mid-90s, and it must be preserved as computer history. Our generation can see how the Internet started and how 4.3BSD worked by reading the Usenet archives. Future generations deserve to be able to do the same: to learn how Silicon Valley worked in the 2010s by reading Hacker News archives.

> Hacker News is the center of technology

Let's not pat ourselves on the back too hard.

Okay, not the center, but possibly the largest general-interest forum near the center. Can you name another Internet forum where discussions are similar to general-interest Usenet tech groups, where even internal details or confidential information about tech companies such as Microsoft [0] or Oracle [1] leaks out from time to time?

Mailing lists? Well, yes, but they are all single-topic forums.

Twitter? Probably, but it's not a forum, and it's even responsible for killing the forums.

Reddit? Those programming groups are large, but they are closer to a newsfeed where you can submit links than to a forum (Hacker News is also a newsfeed, but it's discussion-oriented; some people don't even read the news). Some communities such as r/bitcoin have plenty of important history preserved there, but they are not general-interest tech forums.

4chan? It has some historical significance for Internet subcultures, but probably not for general interest.

Chatrooms? Yes. Freenode is still big, Discord has a lot of communities, but they are not forums.

I concluded that Hacker News is the center after a long spell of gloom about social media having killed the large forums where long opinions are elaborated. But Hacker News is still kicking around, like it's 1995.

[0] https://news.ycombinator.com/item?id=5689391 (original comment deleted, reposted here: http://blog.zorinaq.com/i-contribute-to-the-windows-kernel-w...)

[1] https://news.ycombinator.com/item?id=18442941

HN is just another slashdot. And slashdot looks like it's still going fine: https://slashdot.org/

Yes, but I think it doesn't replace HN.

The first difference is that Slashdot provides a summary of the news, so most Slashdot users actually read the news and make a few points in the comments, unlike Hacker News, where many don't and just wait for a chance to jump in and post comments. As a result, the content is often of a different nature; HN is closer to a forum.

Also, the current Slashdot is truly different from early Slashdot. It seems more corporate and commercial. On early Slashdot, you could occasionally see interviews and book reviews dedicated to the community; you rarely see those posts nowadays, if at all. Also, in recent years, Slashdot implemented (or refused to implement) changes that created strong opposition within a subgroup of users and gave birth to forks in the form of smaller sites.

Meanwhile HN is still here and living well (I do know Lobsters is an HN fork).

So both HN and Slashdot need archives, and they should know better than to push destructive upgrades that make reading historical posts difficult. Both work well today, but I don't know what will happen in the future.

HN is pretty corporate for YC companies and stuff.

Slashdot has been a walking-dead version of its former self ever since it was sold to SourceForge.

VA Linux, the company behind SourceForge, bought Slashdot back in 2000, so that kinda sounds harsh to me -- I mean, the site only launched in 1997. :)

I think it slid into gradual irrelevance in the following decade, but I'm not sure there was any one moment that did it. Personally, I'd date its decline to 2011, perhaps not coincidentally shortly after founding editor Rob Malda left.

You give me too much credit, chipotle. Or perhaps I should call you... El Coyote!

I'd say it was the forced rollout of Beta. I haven't been to Slashdot since... Soylent News is high-quality and not too distracting.

I don't honestly remember much about Beta other than not having the impression it was an improvement. :) But I don't think Dice, which bought the site in... 2012, I think?, was a very good steward of the brand.

I'd never heard of Soylent News. I can't decide if I love or hate that name. :)

Yep, I have a user in the 12000 uid series there and it has not been the same since Rob (CmdrTaco) left.

HN is the closest I can get to a replacement.

It still exists, I guess. It's a pale shadow of what it once was, and not even close to as useful or fun as HN.

And Slashdot looks like it's still going.

To me it's more of a very early reddit.

I agree with the general sentiment that more often than not it seems like an “upgrade” in the user experience is actually an enormous downgrade.

However it is important to not romanticize anything on the Internet too hard, speaking from my own disappointment. Hacker News for the duration that it is here is a valuable resource. Exploit it while you can, and be prepared to move on when you can’t.

> Exploit it while you can, and be prepared to move on when you can’t.

I'll be fine for moving to whatever platform in the future, as long as it uses an open protocol, NNTP or HTTP, it doesn't matter.

My problem is when websites start pushing destructive upgrades that make reading historical posts difficult, for example, excluding them from index or completely screwing up the formatting.

Hopefully the Internet Archive Team will take care of it, if HN ever does it.

Nah my friend.

I mean you have to physically mine anything you want to save into an archive of your own. Exploit.

Sorry if this seems cynical, but if anything on the Internet is worth referring back to, it’s worth making your own copy or you should just assume that at some point you’ll never see it again. Fortunately it’s just text and hyperlinks. Even screenshots work if you OCR them somewhere down the line, though I don’t employ that method.

Au 'voir ma bien-aimée Slashdot.

And Silicon Valley municipal politics and health fads totally unsupported by anything other than good feels.

“Anything you post on the internet will be there as long as it's embarrassing and gone as soon as it would be useful”

It was never, ever true. Information on the web rots away at an amazing rate. For example, one of the first big tech stories covered on the web was the Microsoft-Netscape browser war and the resulting MSFT anti-trust trial. Have fun hunting down the original reporting. Well over 90% of it is gone.

Try a Google search with date range, plenty of original results here:


I found this when looking for information on the fact that there was apparently a lawsuit over gnu Emacs on the gosling thread.

The meme is sorta true thanks to the Internet Archive. For me, it even has a snapshot of a personal site from 18 years ago in it.

Yep, same with me. I managed to get my old programs that I wrote 10+ years ago and feared to have lost along the way back thanks to the archive.

Most of them even with the source code, as I released both the exe's and the source code back in the day.

Good reason to have "Export Data" on every app you use (or create) and make it in a format which would be fairly easy to load somewhere else. Much better to make it easier on your users than to try to wall them into your product; your product will eventually disappear. Better to make quality products than dark patterns.

I’m noticing this whenever browsing a website that has a curated list of links from 8+ years ago. Take Metacritic, for example. Try clicking any of the links for a game review on a system from 2 or 3 generations ago. I once tried this for a random, not-so-exotic game and like 80% of the links were dead.

This is a good thing. You shouldn't have to answer for every little thing about your past, especially if you have already owned it and moved on. the issue is that most people love drama and continue to bring things up and demand a response when it has already been responded to. Just go onto Youtube or Twitch and you'll see what I mean.

Not necessarily a bad thing.

Not necessarily a bad thing for individuals, but definitely a bad thing for Internet users as a collective. Some Yahoo Groups are still active and have irreplaceable, valuable information that dates back to 2000.

We should just have stayed with NNTP and usenet in general but it had a big flaw. It could not be controlled by corporations.

It's a miracle that email has survived to this day using open protocols, but they're desperately trying to control that too using all kinds of blacklists and tricks to keep "regular" people out of setting up a private server.

> “Anything you post on the internet will be there forever” meme is becoming less true these days.

That applies only to embarrassing stuff.

"Anything put on the internet can be used against you forever"

Seeing this made me donate a small sum to the Internet Archive, maybe you (dear reader) want to do it too? https://archive.org/donate/

If one is able and willing, a monthly recurring donation helps them budget for the future.

I just did the same. Thanks for the nudge.

Don't know if it's still true, but I once heard that the UK phone number portability system was run on a Yahoo group.

(a quick google shows the rumors were true.. https://www.ofcom.org.uk/__data/assets/pdf_file/0027/56646/s...)

That is hilarious.

"A review might consider whether it is befitting for the world's sixth largest economy to manage critical national infrastructure via a Yahoo group but we would hope that is obvious."

As an email mailing list, the Yahoo Group will continue to work. The website functionality is what's being lost.


9 years later and still spot on. Just waiting for the day they “sunset” Yahoo! Mail and delete everything.

Yahoo Mail already recycles inactive accounts after 12 months. So if you haven't logged in for a year, they'll unsubscribe you from all the mail they can find in your inbox and delete all your mail, then let someone else sign up with your username, receive password reset emails for your accounts, etc.

I was able to password reset my ancient ebay account by recreating my old email on yahoo that I had deleted 10 years ago. This impersonation issue is a real problem because I used it to get back into that old ebay account.

Yikes. I was still using an old yahoo address for some old accounts. Just signed in and you are correct, they wipe the inbox after 12 months! The accounts I use it for aren't important, but I guess I need to migrate them to another email provider this weekend.

dude, what? That's pretty bad... Nice way to "recover" old accounts on various sites, I guess, eh? Jeez...

fastmail.com recycles accounts too once you stop paying.

Somehow, I got upset at the thought of Yahoo! wiping accounts, but not of FastMail recycling accounts. I have some weird psychology going on there. Somehow I have the expectation that a free service would keep my stuff for me indefinitely, but not that a business would when I stop paying.

Do you know if they recycle immediately or do they give some time to catch up on payment?

What would suck is if there are services that don't allow you to change your email, which might be used to identify yourself to their services.

Oh wow. Any links to read more on this?

So either don't stop paying, or move everything out before stopping?

On the other hand, I guess if you have limited resources, why would you not optimize the use of them?

I registered a Fastmail account a year ago. Next thing I know, I had multiple stores (Steam, Target, etc) sending me e-mails that relate to a specific user account. Sad.

In the meantime, they seem to be sneaking in charges for things. I saw a Yahoo Pro Mail the other week. I can’t imagine the circumstances that triggered that as I’ve barely glanced at my Yahoo Mail for years. Customer service claimed to have trouble dealing with it. Eventually had the credit card company deal with it.

I still use the Yahoo! mail address that I registered in the year 2000. Around 2008, back when it was still a decent mail experience, I actually paid something like $25 for a pro account for a year, just to support them somehow. The payment hid the ads in the sidebar.

Oath went all in on mining the data and showing me useless promos. I have a gmail account that I use for all my personal correspondence now, but my yahoo account is registered with 20 years worth of signups and accounts. I have been counting the days where I have to figure out how to migrate all those accounts to a different email. When I need to do that, it will probably be under a domain that I own so that I never have to go through with it again.

I forwarded any emails that contained info I might want to search for. When tabs got good with Gmail I pretty much stopped using Yahoo for signups and the like. Whatever still defaults to Yahoo isn’t anything I care much about at the moment.

I should have a backup address but I should probably just use Microsoft for that.

About a decade ago I archived many Yahoo Groups I liked with yahoo2mbox. No idea if there are any similar scripts now.

I tried using hypermail to convert some of the mbox files into HTML so I could post them online, but for some reason hypermail doesn't seem to work on any of them. Thunderbird reads the files fine, though. If anyone has any ideas about why, let me know. I could even provide a sample mbox file if you email me (address at website linked in my HN profile).

The worst part is that there's one private Yahoo Group that I never was able to get access to that apparently was important in a hobby I've participated in. I guess that one's going to the bit bucket...

Edit: Archiveteam has information about other archiving programs: https://www.archiveteam.org/index.php?title=Yahoo!_Groups

I did the same some time ago. I had a very large number of messages, so I had to break things down into separate mbox files by year and month. I then just loaded the files into the email client. I wanted to sync the mailbox with Gmail (that would be the closest thing to getting it back online) but never had the time. I wish there were an exporter for Google Groups, but there isn't one as far as I can tell.
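Splitting a big mbox by year and month can be sketched with Python's standard `mailbox` module. This is only an illustration of the approach described above, not the tool that was actually used; `split_mbox_by_month` is a hypothetical helper, and messages with a missing or unparseable Date header are assumed to go into an "undated" file.

```python
import mailbox
from email.utils import parsedate_to_datetime
from pathlib import Path


def split_mbox_by_month(source, out_dir):
    """Copy each message of `source` into out_dir/YYYY-MM.mbox; return counts."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    boxes = {}
    src = mailbox.mbox(str(source))
    try:
        for message in src:
            try:
                key = parsedate_to_datetime(message["Date"]).strftime("%Y-%m")
            except (TypeError, ValueError):
                # Missing or malformed Date header.
                key = "undated"
            if key not in boxes:
                boxes[key] = mailbox.mbox(str(out_dir / f"{key}.mbox"))
            boxes[key].add(message)
            counts[key] = counts.get(key, 0) + 1
    finally:
        # Closing flushes pending messages to disk.
        for box in boxes.values():
            box.close()
        src.close()
    return counts
```

The resulting per-month files can then be opened one at a time in a mail client such as Thunderbird.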

If you can read them in Thunderbird, sometimes that's all it takes. I had to save a friend's soon-to-be-deleted university account. I had another IMAP account also configured in Thunderbird. I was able to drag and drop from the soon-to-be-deleted account into sub-folders of the IMAP account.

If I understand you right, you're suggesting that I get Thunderbird to process the mbox file by moving it into another folder. Reasonable idea, as hypermail seems to work on mbox files generated by Thunderbird. I'll give that a shot if the solution posted by jefftk doesn't pan out: https://news.ycombinator.com/item?id=21272524

Looks like this Python script can generate HTML files, so no need to use hypermail.
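
If hypermail keeps rejecting a file that Thunderbird can read, Python's standard `mailbox` module may be able to read it too. This is only a rough sketch of turning an mbox into one HTML page: `mbox_to_html` and `first_text_part` are names I made up, it only looks at the first text/plain part of each message, and it is not how the archiver script generates its HTML.

```python
import html
import mailbox


def first_text_part(message):
    """Return the first text/plain part of a message as a string."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset name in the message; fall back to UTF-8.
                return payload.decode("utf-8", errors="replace")
    return ""


def mbox_to_html(mbox_path):
    """Return a single HTML page listing every message in the mbox."""
    parts = ["<!DOCTYPE html>", "<html><body>"]
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            subject = html.escape(str(message["Subject"] or "(no subject)"))
            sender = html.escape(str(message["From"] or "(unknown sender)"))
            body = html.escape(first_text_part(message))
            parts.append(f"<h2>{subject}</h2>")
            parts.append(f"<p>From: {sender}</p>")
            parts.append(f"<pre>{body}</pre>")
    finally:
        box.close()
    parts.append("</body></html>")
    return "\n".join(parts)
```

Everything that comes from the messages is passed through `html.escape`, so angle brackets and ampersands in subjects or bodies show up as text rather than markup.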

What's the point of Yahoo at all anymore? Finance? I am at a loss as to what it is they still even do.

They target and serve ads. Yahoo is still a key piece of Verizon's strategy for making money from internet advertising. Verizon, as a wireless carrier, knows where you are most of the time, and the various internet properties of the Verizon Media Group (https://en.wikipedia.org/wiki/Verizon_Media) know a lot about what you do online.

Run a pretty damn good implementation of Fantasy Sports.

Not sure about how the rest of the company does, but in the iOS App Store, the Gmail app has 147K reviews and Yahoo Mail has 2.1M reviews. I think that's a decent proxy to tell us that Yahoo Mail is _hugely_ popular still.

Note that app developers can choose to "reset" their reviews whenever they publish a new release, so those numbers might not be representative of the number of users each has.

The number of reviews can also depend heavily on whether and how often the app asks a user to review it.

Do you need an app from the store for either of those things on iOS? Seems like something the built-in mail would handle fine (and in all probability, better).

With reviews costing a dime a dozen, I don't really think it's a valuable indicator of success, use or quality.

Fake App Store reviews cost $0.50-2.00 each, roughly.

And they are very difficult to hide at that scale.

Obviously it depends on where you purchase your review service so it's hard to give definitive prices, but I'm looking at a listing offering 1-2K AS review bundles for $150-$200, which averages between 10 and 15 cents. No idea on the quality, but IMO it doesn't look like fake reviews often try to hide themselves.

That's weird, because I have a Yahoo account that I manage through Gmail... I am not sure how that works, haha.

My Yahoo! is still useful. That's my customized web portal (I know, antiquated word), where I collect and access my RSS feeds. I also use Yahoo Mail and get my sports updates from Yahoo Sports.

Apparently Yahoo is still big in some parts of Asia, like in Japan for example.

As for the rest of the world, I don't see the point of Yahoo anymore.

Yahoo Japan has nothing to do with the American Yahoo tho... It looks beautiful btw: https://www.yahoo.co.jp/ - still using the old look.

"In 2017, Verizon Communications purchased the core internet business of America-based Yahoo!, and merged them into Oath, Inc. Yahoo! Japan was not affected. It continued as a joint venture between Softbank and what remained of Yahoo! Inc., renamed Altaba."

more info: https://en.wikipedia.org/wiki/Yahoo!_Japan

Notably, prior to its acquisition Yahoo! (America) had a 35% stake in Yahoo! Japan along with 24% of Alibaba. Even while Yahoo! was independent and turning a profit, it was valued as an albatross dragging down Yahoo! Japan's prospects to the tune of about negative $4 billion.

People were suggesting that a holding company should be broken out to manage all three entities without doing any business of its own, so that Yahoo's terrible prospects couldn't cut into the value of the other two. (Since an independent business can't be worse than worthless to shareholders, it would have immediately switched to a positive valuation.) Instead, the market skipped a step, divesting the functional businesses to enable a positive-value acquisition of the remains.

What's fascinating is that Yahoo! Japan does something very similar to Yahoo! America, but one of the two profitable businesses was valuable while the other was understood to be doomed.


Archive team[1] has some tools for archiving groups (if you've got some you care about.)

[1] https://www.archiveteam.org/index.php?title=Yahoo!_Groups

Thanks for the link.

I'm into hardware synthesizers, and there were a lot of groups exchanging patches, utilities, fixes, repair tips, mods and whatnot. A LOT of extremely useful information is going to be lost, unfortunately.

This is terrible news.

This is what I fear will happen some day to Stack Overflow (and other members of the Stack Exchange network), with HN, and with Reddit.

There's too much great content in private hands.

Stack exchange puts their data dumps into the Internet Archive [1]. Others have done the same for HN and Reddit. I’m slowly archiving all of Imgur and Reddit’s image hosting system. ArchiveTeam is typically on top of these sorts of things though, unless a major web property goes dark unexpectedly.

[1] https://archive.org/download/stackexchange

> archiving all of Imgur and Reddit’s image hosting system

that's a huge task? where do you have funding for that?

Archive Team routes its collected data to the Internet Archive, with tools that I believe any individual could apply independently. As for storage, apparently the Internet Archive has a $10M budget, which will go quite a long way if you accept slow or even offline storage for the content.

But I'm still not sure how an individual would extract the images fast enough. Archive Team's largest project was ~1.5 petabytes of Google+. In 2012, Imgur had 300M images uploaded, which at 5MB per image would be 1.5PB. Simply keeping up with daily throughput would be an enormous undertaking, much less digging out all of the historical content.

Given that Y! Groups had over 4 petabytes of data 10 years ago, this might be a record for them.

Storage is cheap when you host it yourself.

Wait, are you trying to get everything from these services, or just some portion? (Oldest, highest-traffic, lowest-traffic, etc.)

A quick-and-dirty estimate says Imgur was adding perhaps 1.5PB of data per year way back in 2012; that much storage is going to run you $30k even on offline tape. Downloading that much data is more doable than I expected (1/3 of a year on a gigabit connection), but just keeping up with Imgur's new uploads sounds like a monumental task.

(It's a good task, and it does seem possible for one dedicated person. I'm just really curious to hear how you're managing it!)
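
The quick-and-dirty numbers can be checked in a few lines. The image count and average size are the figures quoted in this thread; the $20 per TB tape price is just what $30k for 1.5PB implies, not a quoted price, and the link is assumed to be fully saturated the whole time.

```python
IMAGES_2012 = 300_000_000        # images uploaded in 2012, as quoted in the thread
BYTES_PER_IMAGE = 5 * 10**6      # 5 MB average, the thread's assumption
LINK_BITS_PER_SECOND = 10**9     # gigabit connection, assumed fully saturated
TAPE_DOLLARS_PER_TB = 20         # implied by $30k for 1.5 PB; an assumption

total_bytes = IMAGES_2012 * BYTES_PER_IMAGE
total_pb = total_bytes / 10**15
download_days = total_bytes * 8 / LINK_BITS_PER_SECOND / 86_400
tape_cost = total_bytes / 10**12 * TAPE_DOLLARS_PER_TB

print(f"{total_pb:.1f} PB, {download_days:.0f} days, ${tape_cost:,.0f}")
# prints "1.5 PB, 139 days, $30,000"
```

139 days is a little over a third of a year, and that is only for one year of uploads at the 2012 rate.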

LTO8 for cold storage, Backblaze styled pods for warm storage. I’ve had some help in acquiring supplies and equipment. All of the racking, colo, networking, and sysadmin is me for my gear (and my wife; she is a cable management fanatic, the right color for the appropriate VLAN!).

Is this your job?

Just one of my hobbies (digital preservationist).

crazy. you work with archive team? I've wanted to get involved with that for a while.

Some of his favorite porn on Imgur was removed and he vowed to never let it happen again.

The writing has been on the wall for years. I use Yahoo Groups for several local community organizations and features would just randomly break for weeks at a time. There was zero support from Yahoo; it's obvious they no longer had any engineering resources assigned.

Is Google Groups the only practical alternative now? (Some people don't have Facebook accounts so that's not an option.)

Or just an email list, depending on your needs. Though SPAM has made that harder without a real commercial service.

Spam isn't really a problem for email lists you have to sign up for unless you have a really large number of subscribers. Spammers need zillions of addresses to make it worth their while to do anything.

It isn't a problem at all for closed lists...

Yes, the anti-SPAM efforts have made running your own Mailman (or other list service) instance an increasing pain. Many providers now silently drop list email, even if you have jumped through all the appropriate hoops.

We also need file hosting and calendars in a single integrated service so an email list doesn't meet our needs.

Google Groups isn't available in all markets (e.g., China).

There are some other alternatives like groups.io. This is nice for some open source projects that want to have people all over the globe use it.

Thanks for the recommendation, Groups.io looks like a nice alternative.

meetup.com is charging a fee, Yahoo Groups is closing, Google Groups is restricted in certain areas. Thanks for mentioning groups.io.

We need a low-cost or free alternative for online groups and event meetups that is not instant-messenger-like (Facebook, Slack, etc.).

The issue is mostly not the availability of often free software or even the cost of hosting—especially for non-video content. Video isn’t really a big problem either unless it goes viral.

But you lose the discoverability and network effects of a centralized service.

It'd be interesting if there was a discovery service, compatible with all open source community software out there.

So one could search for local communities, like at Meetup.com or Facebook, and topics like "running" etc to find local running group communities?

Any thoughts about how to make that happen?

Look: https://news.ycombinator.com/item?id=21259267

> "Building another event organization platform isn't the solution"

> "open-source aggregation [of different event feeds]"

I’m not sure how much of a difference there is between centralization and aggregation. Especially if it’s not just a read-only feed.

You can basically aggregate tomorrow with RSS.
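
As a rough illustration of how little code the aggregation part needs, here is a sketch that merges items from several RSS 2.0 feeds using only the standard library. It assumes each feed is already downloaded as text, that each item's `pubDate` stands in for the event date, and that all dates carry a timezone offset; `parse_items` and `aggregate` are names invented for this example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_items(rss_text, source):
    """Extract (published, title, link, source) tuples from one RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # skip items that carry no date at all
        published = parsedate_to_datetime(pub_date)
        items.append((published, item.findtext("title"), item.findtext("link"), source))
    return items


def aggregate(feeds):
    """Merge items from several feeds, earliest date first.

    feeds maps a source name to the RSS XML text for that source.
    """
    merged = []
    for source, rss_text in feeds.items():
        merged.extend(parse_items(rss_text, source))
    return sorted(merged, key=lambda entry: entry[0])
```

Passing `{"running-club": xml_a, "sketch-group": xml_b}` to `aggregate` gives one combined, date-ordered list. That covers the technical side; it does nothing for discovery.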

Doing that for oneself is pretty simple I would think,

and doing this for other people, so they find what they are interested in, in cities far away or in other countries, is pretty hard, I think.

That was pretty much my point. Aggregation is technically simple. Creating a location that's the go-to for people running and planning events is much harder because it's basically a network effects problem.

I'm trying to create a Pay-what-you-want alternative to Meetup and Facebook Events,


It's an open source cross between StackOverflow, Slack, HackerNews ... and soon, I hope, enough features from Meetup.com to be an ok replacement. (I'd like to create my own event communities, using this, ... e.g. a co-working at a café group, and sketching group etc)

Build an NNTP server. Works great, easy to use, easy to back up, light on resources, many web frontends available, even better used with a proper Usenet reader.

What are the features that make something like Yahoo! Groups so appealing? Is it just inertia? Genuinely curious, given the recent spate of group-based service shutdowns (Meetup going paid, Google Bulletin Board shutting down, now Yahoo).

I guess I don't really understand why something more modern like a subreddit or even an old-fashioned mailing list couldn't take its place.

Yahoo Groups was great (when it worked) because it wasn't just an email list. It also had file storage and a group calendar with notifications. Having all those services on a single site made usage and administration easy, even for non-technical people.

Yahoo Groups is a mailing list!

* blatant plug *

I run Gaggle Mail (https://gaggle.email) which does paid group email (free for groups of less than 20).

It’s really aimed at non-technical folk so not really the HN crowd but thought I’d mention it.

So Yahoo doesn't do their directory any more, they don't do groups, and they don't have their own search engine. Why are they still keeping the lights on?

Yahoo Finance. Yahoo Sports (including Fantasy and there’s also Rivals which isn’t under Yahoo banner). Yahoo News. Yahoo Mail. Yahoo Search isn’t their own but it’s still decently widely used when comparing it to everyone outside Google and Baidu.

The Yahoo overall domain and site is still a top 10 visited worldwide site. I doubt it’ll drop out of the top 20-25 for a while especially if you exclude porn sites.

Yahoo combined with AOL is still a top 5 web advertising company. Google, FB are clearly ahead. Amazon is pulling ahead. Microsoft is there. Then it’s mostly relatively speaking smaller operations like Verizon Media rounding out the rest that have close to or above 1% share.

Yahoo Finance is quite good.

The announcement is confusingly worded, but if I'm parsing that right, Yahoo intends to nuke everything except the ability to mail a defined bunch of people from Groups? So no more web UI with message history, much less attachments, files, etc.

Pretty sad news. I get the feeling that, in general, no company wants to give users a way to share free-form content and have ownership over that content. Every new "social media" platform that pops up allows sharing only a very strictly-defined set of data, shared only in a limited way.

A lot of the open-ended nature of sharing on the web has gone by the wayside, while extremely narrowly-focused sharing methods take their place.

Flickr for example feels like some kind of relic, a holdover from an earlier time, where you can actually _join groups_ and share photos (incl. metadata!) in there and have discussions in those groups. I'm very thankful it still exists, and I dread the day it is shut down.

I used to spend a ton of time browsing Flickr groups related to my interests (R.I.P. "Macintosh" group!) as it was a perfect way to find great photos and some interesting discussion you cannot find elsewhere. I've tried to do the same on Instagram over the years, and it's just not the same. Browsing everyone who tagged their stupid coffee photo #macintosh is not even close to the same experience, and sharing a conversation with people is simply not possible in that context.

Not sure what else to say. It's just another casualty of the web I grew up with. The web I watch slowly disappear. :\

Of course no company wants to, because more money is made through a combination of online targeted advertising and data mining. This of course has encouraged walled gardens. Since this is where the money is, this is where the marketing goes, along with the effort to capture new users through friendly methods and appearances.

It's amazing to believe at one point, this used to be the de facto source of a lot of the information I would search for on the net. Oh how the times have changed.

RIP Yahoo Groups

The last traces of tech-centric Internet are being done away with... Only the ad-ridden cesspool remains. Sigh

Dunno if I'd call Yahoo! Groups "tech-centric."

There goes a lot of collective history... I hope archive.org is able to snapshot it.

Why does Yahoo always shut down their most useful products? They shut down Yahoo Boss Geolocation services, which at the time was among the most accurate geolocation services (look up an address, and get a latitude/longitude position). Besides Google which was more accurate, but had a restrictive license on their API, it was the best out of all we tested.

Back around 2000, I used yahoo groups a ton. Good times, now most of those groups (synthesizers) are on Facebook.

I really dislike how so many groups are now Facebook based. I miss being able to have emails from groups. Some were important enough that they went to my inbox, some had specific folders so I could check them when I needed. And my email is something that's already a part of my life. It's so much less efficient to use Facebook groups.

I agree. I have had a folder in gmail for synthesizers for probably 15 years now. I am able to research topics this way and it feels fabulous.

Facebook being so closed is a real problem.

Is there any value left in Yahoo!'s services at this point? This is a genuine question.

Yahoo missed the boat in terms of their real asset, which was identity.

Microsoft, Google and Facebook ran with it. The rest of Yahoo is just an irrelevant husk.

They also had a friend network in Yahoo! Messenger, but no one was able to turn that into a true social network.

They killed off all of the chat rooms in mid-December as well. In fact, December 14, 2012. Curious. Watching Yahoo wither slowly over the course of so many years is fascinating in a morbid sense.

I am told that they had some very interesting technological endeavors, but this was not sufficient to save them, apparently. One clue has been watching their home page and the frantic amounts of "partnering" they've done, to the extent that they pushed much of it even out to the chat clients. That isn't the whole of it, but it does seem to point to a kind of desperation on their part.

Based on the Yahoo post, Groups is not going away; they are just disabling all the features except the ability to be a web-only, no-email private group forum. Previously you could upload files to common areas, have polls, and subscribe by email. The groups will still be there, but will be much less useful for many people. However, the functionality will be similar to, say, here: people can create accounts and post messages.

Disabling links is unclear. Each group, I think, can have a list of links to things, but maybe they are saying they intend to filter all posts to remove any URL references?

It looks like it won't have a web interface or archive. It's becoming email only with no web archive history:

Will I be able to use Yahoo Groups going forward?

You'll still be able to communicate with your groups via email and search for private groups on the site.

It seems like email is staying, too.

It's weird because it says "Email Updates", "Message Digest" and "Message History" will go away.

As I understand it, email updates is when you get the emails sent to you from the list, message digest sends several at once rather than one at a time, but what is message history? Is that the archives on the web site? So you can be invited, you can join, you can send emails, but no one can receive them and there's no archive of them apparently? That can't be it. Their description of what is happening is completely unclear.

There seems to be a general trend from public to private / limited-group discussion platforms, at least among major providers. See also the G+ shutdown.

Any Yahooligans able to comment?

Not a Yahooligan, but a comment from a staff member on the now-removed Help Forums about two years ago indicated that Yahoo had "deprioritized" Yahoo Groups and would be providing only the most minimal of support. This feels much like the AIM shutdown, even if some things will continue to work for an undefined period of time.

I'd posit Reddit as a counterexample. Discovering and participating in extremely niche group communities has never been easier.

While Reddit probably works well for very large groups, their 6-months-and-the-thread-is-locked policy is very bad for small ones... not to mention that it violates netiquette!

Reddit is comparatively (revenues, staff) tiny.

Large firms are behaving in more risk-averse ways, it seems.

At least some are.

As of last year, Reddit is (was?) the 3rd-most popular site in the US, surpassing Facebook.


Although today's rankings place it at #6, just behind Facebook at #5 and Yahoo at #4 (srsly?).


By users and visits, yes. By revenues and valuation, no.

Users and visits pose little loss risk. Revenues do.


2019 Revenues: ~$100 million.


FB 2018 revenues: $55 billion.

Your original post was about a trend of users moving to private communities. I don't see why revenues and valuation matter for this?

No, my original comment was about major providers shifting to non-public channels.

G+, or its G-Suite successors, still exists, but as closed networks. Similarly Yahoo Groups.

In looking at G+ Community size, open vs. closed participation was a huge factor in membership. Simply open vs. closed (mod approved) membership had a substantial impact:


So you're arguing that Reddit is not a "major provider", but G+ is/was...?


As I'd hoped I'd been sufficiently clear on.

Yahoo Groups came from the acquisition of eGroups.com, one of the very early VC-backed pure internet services. Sold to Yahoo for $432m (in stock, sadly for them).

TBF Yahoo stock performed very well at some points since then so it might have worked out well for them if they sold at the right time.

Not that this is the best example of this, and I'm asking this as an honest question: at what point should we be OK with data being ephemeral and legacy services dying? With things like Yahoo! Groups, AIM, etc., the world has mostly moved on, and while these announcements bring back some memories, I don't necessarily see them as a huge loss, either.

Does anyone have any suggestions for a free alternative? In an ideal world I'd be looking for a paid one but I'm a member of a community group with no real affiliation, so there's no mechanism by which everyone could chip in. Plus, the obvious free alternative is a Facebook group. I'd like to suggest an alternative if I can.

A couple old groups that I'm a part of have migrated off of yahoo to https://groups.io I think it's a step up in usability and interface.

What about forum software like Discourse or Flarum https://flarum.org? However Discourse is a bit expensive.

In addition, I'm creating a Pay-what-one-wants alternative to FB groups and Meetup and email lists ... It's sort of a cross between StackOverflow, Slack, HackerNews: https://www.talkyard.io (open source).

What do you like the most about Groups.io? if I may ask

trying to use yahoo-group-archiver,

hard coded some data into the yahoo.py file :

cookie_Y = '...'

cookie_T = '...'

and later assigning them :

    args.cookie_y = cookie_Y
    args.cookie_t = cookie_T
before calling the function YahooGroupsAPI

and using my username / password

getting these error messages :

logging in...

    Exception raised on uri: https://groups.yahoo.com/api/v1/groups/alesis-ion/messages
    {"ygError":{"hostname":"gapi1.grp.bf1.yahoo.com","httpStatus":500,"errorMessage":"Internal error: UDB Failed","errorCode":1001}}
    ERROR: Couldn't download message; 403 Client Error: Forbidden for url: https://groups.yahoo.com/api/v1/groups/alesis-ion/messages
    Exception raised on uri: https://groups.yahoo.com/api/v2/groups/alesis-ion/files
    {"ygError":{"hostname":"gapi15.grp.bf1.yahoo.com","httpStatus":500,"errorMessage":"Internal error: UDB Failed","errorCode":1001}}
    Traceback (most recent call last):
      File "yahoo.py", line 485, in <module>
        archive_files(yga)
      File "yahoo.py", line 154, in archive_files
        file_json = yga.files()
      File "c:\Users\ITD\Documents\Python Scripts\yahoo groups 3\yahoogroupsapi.py", line 96, in get_json
        raise e
    requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://groups.yahoo.com/api/v2/groups/alesis-ion/files

tried it on another group, same result.

Has someone encountered these messages ?
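
The traceback shows the exception from the files endpoint ending the whole run. This does not explain or fix the 403 itself, but a small wrapper along these lines would at least let the other sections be attempted. It is a sketch only: `archive_sections` is a made-up helper, the callables are hypothetical stand-ins for steps such as `archive_files(yga)`, and in the real script one would probably catch `requests.exceptions.HTTPError` rather than every exception.

```python
def archive_sections(sections):
    """Run each archive step, recording failures instead of aborting.

    sections is a list of (name, callable) pairs. The callables are
    hypothetical stand-ins for the archiver's own per-section functions.
    """
    failures = {}
    for name, step in sections:
        try:
            step()
        except Exception as error:
            # Broad on purpose: one failing endpoint should not end the run.
            failures[name] = error
            print(f"skipping {name}: {error}")
    return failures
```

The returned dict shows at the end of a run which sections failed and why, so they can be retried once the login problem is sorted out.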

To do this without providing any tool for exporting data is proof of a lack of good engineering culture.

Ah snap. I have very fond memories of the HP DE100C Linux based console mp3 player group helping me troubleshoot problems back in the day, even an HP employee hooking me up with a new remote. It's sad to see them shut down.

I'm confused. Does this mean Yahoo Groups is shutting down?

That's what I thought too, but it looks like it's just getting restricted to private groups and disallowing media/content uploads.

It's probably to push people off so they can shut it down later.

All messages will stop on Yahoo groups (last one of the bullet points.)

I would assume it's an anti-spam effort.

It will just function as an alias for all the group members. Yahoo won't save the email on their servers; instead, it expands blah@yahoogroups.com to every member and sends each a copy. That way, they don't need to maintain any infrastructure besides the email server.
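
To make the alias idea concrete, here is a toy sketch of that kind of expansion. It is a guess at the general shape, not Yahoo's implementation: the `aliases` mapping, both function names, and the example.org addresses are all invented for illustration.

```python
def expand_alias(recipient, aliases):
    """Return the delivery addresses for recipient.

    aliases maps a list address to its member addresses. Anything that is
    not a known list address is delivered unchanged.
    """
    members = aliases.get(recipient.lower())
    if members is None:
        return [recipient]
    return sorted(set(members))  # drop duplicate subscriptions


def fan_out(sender, recipient, body, aliases):
    """Build one (sender, member, body) copy per member; nothing is stored."""
    return [(sender, member, body) for member in expand_alias(recipient, aliases)]
```

With `aliases = {"blah@lists.example.org": ["a@example.org", "b@example.org"]}`, a message to the list address becomes two outgoing copies, and no archive or web UI has to exist anywhere.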

I wonder how much this has to do with GDPR and other data regulations? Presumably Yahoo Groups has too little activity to be worth keeping in an active state. But even in a static, archival state, Yahoo would still be obligated to fulfill data requests for it, right?

Too little activity?

When I worked on it back in 2010 the sheer volume of email sent was staggering.

The issue with groups always was and always will be monetizing it.

Fair enough. But what is it these days compared to Facebook, Reddit, Google Groups, or even Tumblr? Yes I know that the latter platform is suited for types of content much different than Yahoo Groups, but I'm thinking Verizon cares mostly about it in terms of engagement metrics (userbase, page views, potential ad space, etc).

Keep in mind Groups is 18 years old. It started as an email list and I can tell you we had 12 email servers and they sat at around 2k messages a second... steady. We didn't even alert until they hit 20k in the queue.

The photos in Groups were also an interesting thing. We once had a "hacker" create groups called shard1 shard2 .. etc etc. He put his porn in there and deep linked to it. We found out when we got the internet bill for the overseas bit. We were pumping out 8GB a second to China and didn't even notice the traffic until we got the bill :)

Yes, I am sure Facebook/Reddit has eclipsed it by far in data sets now; however, the volume in Groups isn't matched by many others.

When I left they had ~4 Petabytes of data... around 8 years ago

Very interesting context, thank you!

Geocities all over again.. Why can't they go into readonly mode? Or even a downloadable archive? How much storage could it possibly be?

It's situations like this that the idea of a 'permanent web' like IPFS resonates.

What could be the reason?

Posts are down to their lowest levels since the 90s (if I'm reading this graph right).[1] Also, someone at Y! probably figured out how much it cost to run the infrastructure and maintain the code.

[1] https://www.archiveteam.org/index.php?title=File:Yahoo_group...

That graph is biased in that it only covers discovered groups, and only half have been discovered (older groups are probably more likely to have been discovered). Assuming it's NOT biased, something happened in the 2nd half of 2014 to destroy traffic to Yahoo Groups.

> Assuming it's NOT biased something happened in the 2nd half of 2014 to destroy traffic to Yahoo Groups.

That is the time when Yahoo! Groups fucked up their DMARC implementation:


Everyone now knows how babbys are formed.

Paging /r/datahoarder

Ignore the below; I was wrong.

~~Note that list content isn't going away as the headline implies, but instead all the files uploaded to groups.~~

What exactly is going away and how do we know? There's contradictory information and it's confusing.

Under "what features will go away?" it includes: links, conversations, email updates, message digest, message history. That sounds like more than just files uploaded to groups.

Yah, I'm striking my comment. Saying "no longer able to upload content"-- I assumed it was limited to files.
