Hacker News
Historical programming-language groups disappearing from Google (lwn.net)
737 points by beachwood23 on July 28, 2020 | 332 comments

It's funny: when I took a tour of the US Geological Survey, the curator of the collection hated Google (which was just a few blocks away). He said Google was great now, with all their maps, which were far more accurate and had better coverage than the USGS's.

But what happens when they get bored with map data and get rid of it?

He had been ordered to turn over all of their historical aerial archives for scanning by Google, and was then told the USGS would no longer do aerial scanning, since Google was doing it. But there was no agreement for Google to turn their aerial scans back over to the USGS.

At the time we all told him not to worry, Google would never remove data it had collected. Looks like he was a lot smarter than us.

Well, that's the problem with the whole internet. Remember those pages created in the 90s/early 2000s? People thought they were sharing information with the whole world. It turns out that most pages created in the 90s are now inaccessible or have been siloed by big corporations. The fact that we allowed corporations to take over the internet made it an inhospitable place for everyone else without corporate backing.

I don't think it's any harder to create a website than it ever was. The problem seems to be that corporations have made it so easy to do it within their silos that people aren't willing to spend ten hours on something they could do in ten minutes, not realizing that they're going to spend a lot more than ten hours creating content which the company will then vaporize at random whenever they feel like it.

A decade ago, there used to be celebrity websites with forums, galleries, and blogs; now it's just Instagram. Hell, so many prominent celebrities don't even own a domain name in their own name. And it's not like the content has improved. Those celebrity galleries used to have HQ images; now the highest-resolution image is 1200x1200. The only thing that has improved is how easily a celebrity can reach millions; everything else (discussions, forums, galleries, blogs) has gone downhill. Most of these have been replaced by a poor comments section.

It's not just celebrities: so many independent artists are putting their talent up on Instagram, and I don't have access to any of it because I need an Instagram account for that. The Instagram web version forces you to sign up if you scroll one page down on a profile.

Sometimes I feel like we need to build cutting edge decentralized applications that will burn these walled gardens to the ground. /rant

  The only thing that has improved is how easily a celebrity can reach millions
From Wikipedia:

  Celebrity is a reference to the fame and wide public recognition of an individual or a group
I'd posit that celebrities are celebrities by connecting with millions. A platform that offers a "celebrity" the ability to connect easily with millions seems to be worth more than a list of any other features.

What has changed is the meaning of the word connected.

If celebrities (with their fame, reach and money) can't be bothered to own their own domains, what chance does a normal guy/gal have? It would be trivial for celebrities to set up their own websites and share whatever they are sharing. At minimum, they could do this in addition to whatever social media they are on.

It is as if we are all becoming lazy, and/or many of us don't realize the harm in giving all our info to half a dozen super mega corps. Most of these mega corps aren't even distributed around the world; they are all American (except TikTok), which is another interesting angle.

This is going to happen (is already happening?) in the webapps/apps world too. There are so many no-code tools popping up; most will die, and the rest will get acquired by the mega corps. Made a great webapp that is successful? Now you are stuck with Bubble/Airtable/Shopify/whatever. I cannot name many no-code tools that let you export your application to be hosted independently.

I feel like we are on a path where in a few decades, a dozen or two corporations will control every single aspect of our lives - online especially, and probably offline too.

> If celebrities (with their fame, reach and money) can't be bothered to own their own domains, what chance does a normal guy/gal have?

This is a matter of demand, not capability. It seems most celebs just don't care about setting up their own stuff, and, really, why would they? There are free platforms out there that give them huge amounts of reach. Most of these people just don't need their own website. They may come to regret that decision later, but it's their decision to make.

If a normal guy/gal wants to set up their own domain and website, it's not hard for them to do so, certainly no harder than it was in the 90s/00s, and probably a lot easier. The "no-code" stuff certainly has lock-in disadvantages, but you can simply choose not to use them if you want. Yes, it's more work, but it was always more work to do it yourself, and always will be.

A lot of ordinary people don't post in public, or if they do it's not under their real name, and there is a growing trend of deleting it after a few days. They don't want you to have any data about them at all.

The modern alternative to Usenet is private Facebook groups that never get indexed.

>I don't have access to any of it because I need an Instagram account for that.

Check out https://bibliogram.art

502 Bad Gateway

For what it’s worth, the original link is back up again.

These celebrities will be forgotten as soon as their Instagram account is gone or their followers turn out to be mostly dead accounts. Some artistically influential ones will make it into the mainstream thanks to their fans hoarding and sharing the data, like all those last-century music/film stars who left tons of material in every medium, now digitized and shared/pirated. Although I get a bit worried about p2p sharing now that multimedia rental walled gardens have become so popular.

The Venn diagram of "celebs who are influential" and "fans who are data hoarders" doesn't overlap as much as you think. The latter tend to be a much nerdier group than the general population.

Yeah, you may be right here. Also, Instagram has a low entry barrier, whereas superstars used to have enormous promotional budgets. Now it's a full race to the bottom, I would say: commoditization of popularity, mass celebritism :)

There are many projects to decentralize the network, such as IPFS (the InterPlanetary File System). When exposing these services to the public, legality is a big issue.

Plenty of rights are involved (copyright, privacy, and so on). All kinds of crime are another issue, too. It is hard for people to keep monitoring whether content is safe or not.

I think we need a network version of the War of Independence.

Counterpoint: does the average internet user want to download a new app or go visit a different website each time they want to get these updates?

If the websites bother to export RSS/ATOM/ActivityPub feeds, they won't need to. They'll just "subscribe" to the stream in their preferred app/webservice, and get aggregated updates about everyone they care about.
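
As a rough sketch of what that aggregation step looks like, here is a minimal example using only Python's standard library to parse an RSS 2.0 feed. The feed XML is inlined here as a placeholder; a real reader would fetch it over HTTP and merge items from many feeds:

```python
import xml.etree.ElementTree as ET

# A tiny RSS 2.0 document standing in for a fetched feed (placeholder data).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>Post two</title><link>https://example.com/2</link></item>
    <item><title>Post one</title><link>https://example.com/1</link></item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Return (channel_title, [(item_title, link), ...]) from RSS 2.0 text."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items

title, items = parse_feed(SAMPLE_FEED)
print(title)                 # Example Blog
for item_title, link in items:
    print(item_title, link)
```

A full reader would also handle Atom's slightly different element names and deduplicate items by their GUIDs, but the subscribe-and-aggregate core is just this.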

Sure, but what about commenting on the post? No one likes creating accounts for different websites in order to post one comment.

It's obvious that things like Twitter and Instagram provide value to celebrities and people who follow them. It's just that there are some serious externalities not factored in.

ActivityPub lets you mark a comment as a "reply" to a post (or any web resource), then all you need is a trackback-like facility to link from the post to comments on it. It doesn't need to be centralized.
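
For context, ActivityStreams (the vocabulary ActivityPub uses) expresses this with an `inReplyTo` property on the object. A minimal reply might look like the following; the URLs are invented placeholders:

```python
import json

# A minimal ActivityStreams "Note" replying to a post hosted elsewhere.
# "inReplyTo" is the standard property linking a comment back to the
# resource it replies to; no central server is needed to interpret it.
reply = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "attributedTo": "https://social.example/users/alice",
    "inReplyTo": "https://blog.example/posts/42",
    "content": "Great post!",
}
print(json.dumps(reply, indent=2))
```

The trackback-like facility mentioned above would be the reverse link: the post's server collecting the URLs of objects that name it in `inReplyTo`.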

How do you despam that?

Or de-nazi it for that matter?

> like we need to build cutting edge decentralized applications

Yes, please! More and faster.

Centralized services are easy to build, because they offer an obvious location to do some of the things that are tricky to do in a distributed fashion. They are also, by the way, far easier for small numbers of coordinating people to control, which makes them popular with corporations, authoritarians and sociopaths.

Decentralized services will rarely be the New Shiny that attracts all the 14 year olds for a few minutes. But, unlike email, you never hear anyone whining that Myspace won't go away.

We have tons of decentralized platforms. The amount is not the problem. The problem is the user experience. For users who don’t care about anything behind the scenes and only care about the experience, why would they go out of their way to figure out Mastodon when Facebook has a one-step signup process?

The reason central platforms win is because they have to be dead simple to use in order to attract any users. Decentralized platforms get their initial users because of how cool the technology is, but those people (people like me and you) aren’t UX experts and don’t prioritize UX.

It has to be easier than the central platform, and the central platform has the benefit of millions or billions of dollars to throw at it. Which means the decentralized platforms have to work even harder to overcome that. It’s not impossible, but it does require engineers to overcome their desire to build cool things and instead focus on building a user experience that’s better than what Facebook/Twitter/etc. can provide.

Joining a Mastodon server is just as simple as signing up to Facebook. The difference is that no single Mastodon instance has centralized control over their users; you always get the option of signing up elsewhere, or using your own instance.

This is what I would have thought, but I've heard from more than one friend who was frustrated by having to choose an instance in the first place. What makes one better or worse than another? What if I choose wrong? What if I need to move?

This isn't difficult, per se, but it's not as easy as signing up for Twitter for the simple dumb reason that you don't need to make that choice. And the concerns about "wrong choices" aren't entirely unfounded; the first Mastodon instance I signed up for was effectively abandoned by its sysadmin. The decentralization still makes it hard to search for new users to follow compared to Twitter (checking right this moment, when I click on someone I follow to see who they follow, it only shows me people they follow on the same instance); depending on the Mastodon client I'm using, it can actually be a little hard to follow someone even when I find them if they're not on the same instance I am. Again, technically none of this is super difficult, but for a user who isn't philosophically committed to the fediverse, tiny little frustrations start to add up quickly.

>What makes one better or worse than another?

Most public Mastodon instances have an about page that describes their intended audience. You can also look through the public timeline to see what users are saying before signing up. If you aren't sure, pick a larger general instance.

Despite all that, in my opinion you'll probably get more mileage out of joining an instance run by someone you know and trust.

>What if I choose wrong? What if I need to move?

On newer versions of Mastodon there is already a migration option to import/export your data between servers. https://blog.joinmastodon.org/2019/06/how-to-migrate-from-on...

Regarding your second paragraph: If you'd like to fix bugs in your chosen mastodon client, I'm sure that would be welcomed.

> you'll probably get more mileage out of joining an instance run by someone you know and trust.

What if you don't know or trust anyone that runs a Mastodon instance? And don't have the time/means/expertise/motivation to run one yourself?

Find someone who does and make friends with them?

Well, you can see how that's a bit of a bigger ask than "go to Twitter.com and click SIGN UP," right? :)

People really, really want to argue that decentralization of social networks doesn't make things harder, but eventually the defense always shifts to "well, you have to be willing to jump through a few hoops if you believe in the advantages of a decentralized/indie internet, which you totally should," because the truth is that the decentralized way does make things harder. Personally, I do believe in the advantages of the IndieWeb, and I do think it's worth jumping through those hoops. I just think we need to acknowledge those hoops exist, and always be thinking about ways we can reduce the friction for people who say "I like all those ideas in theory, but in practice it's too frustrating."

It’s the “signing up elsewhere or using your own instance” that’s the problem. You’re joining a Mastodon server and if you want to go to another server you have to actually move things. It takes actual effort.

It seems like the Mastodon developers look at email and think “if it works for email it’ll work here” and don’t understand that people deal with email because they have to, not because they want to. I don’t want to have to change my email address when I switch providers and I don’t want to have to move all my stuff if my favorite Mastodon server decides to shut down.

That’s not a solution; that’s just another problem. It’s bad user experience.

> and if you want to go to another server you have to actually move things. It takes actual effort.

Not sure how that's worse than something like Facebook, where you literally don't get that option. If you want your asserted identity to be reasonably secure and easy to assess for other users, you have to find a trusted host or do your own hosting; that's no different from any other service.

This is what I’m talking about with developers not understanding users. I even said “users don’t care what’s behind the scenes”. Any normal user who is looking to leave Facebook wants Facebook The Product but not Facebook The Company. They don’t care about hosting or asserting identity. They don’t want options, they want a product.

People use email because it’s the best communication system in existence. If someone doesn’t want to change his address, he can run his own self-hosted mail server, or just use his own domain name with a third-party email provider. The email system is amazing; the UX issues are in fact minimal, almost non-existent.

Is your dad, aunt, or grandma going to create a self-hosted mail server?

Seriously this is exactly what I’m talking about. This is a textbook example of what I’m talking about.

No end user hosts their own email service.

Sadly no one has made this easy and possible.

Even if self-hosted email was easy, I don't find it to be a great idea.

I ran my own for a while using mail-in-a-box but ended up moving to fastmail because I didn't trust myself to maintain the setup. I need my email system to just work and that's likely the same for the majority of people.

(I validated that lack of trust in myself even with fastmail by not realizing for almost 3 days that I let my domain expire, thus causing emails to bounce with no way of me knowing that was happening)

Ok, so make Mastodon as easy as Gmail: I can use gmail.com or my own domain. Let me have the same flexibility with my internet identity. I wish the USPS had gotten into the official internet-identity game: one place to receive legal emails and store my private keys, public keys, etc., protected by law.

Too bad spammers and big corps broke that one as well. Maybe if it worked as a publish-subscribe system (like MQTT) instead, where the sender was responsible for storing and distributing content, then the spam problem would be somewhat fixed?
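
To illustrate the idea, here is a toy sketch of sender-side publish-subscribe (a hypothetical model, not actual MQTT): the publisher stores its own content and delivers only to explicit subscribers, so a stranger's messages never reach your inbox.

```python
# Toy publish-subscribe sketch: the *sender* keeps its messages and pushes
# them only to opted-in subscribers, so unsolicited mail has nowhere to land.
class Publisher:
    def __init__(self):
        self.outbox = []           # the sender stores its own content
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        self.outbox.append(message)
        for deliver in self.subscribers:
            deliver(message)       # delivery only to explicit subscribers

inbox = []
alice = Publisher()
alice.subscribe(inbox.append)      # I opted in to alice's feed
alice.publish("new blog post")

# A publisher I never subscribed to cannot reach my inbox at all.
spammer = Publisher()
spammer.publish("buy pills")

print(inbox)   # ['new blog post']
```

Spam then only costs the spammer (who must host and serve their own content), which inverts the economics of email, where receivers bear the cost.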

A lot of people, clubs and businesses publish their content on Facebooks and Instagrams because those platforms are better for getting your content out to your followers and more people. They are being rational.

Where's the non-proprietary decentralized platform that lets me reach as many people as I can on Facebook? There isn't one.

Why isn't the social functionality of identity / friends / followers / newsfeed / etc. built into browsers in a standardized way?

Facebook is 16 years old. That was a lot of time to figure out an alternative solution, but all we have are experimental projects that rely on adoption that they don't have to be useful.

Corporations aren't going to change how they behave, but it's annoying that we techies are apparently incapable of beating them at our own game.

> A lot of people, clubs and businesses publish their content on Facebooks and Instagrams because those platforms are better for getting your content out to your followers and more people. They are being rational.

I like trains, and I started a website back in 2001 for people to share their photos. It was reasonably popular. One of my drivers was taxonomy and archiving of images for future enthusiasts.

Today, it's dead. People post their photos on Facebook groups. They get attention, likes, all the stuff that matters to a human. A week later the photos are lost in the group: hard for anyone to find, no indexing, no exposure. The comments, from people who worked on the railways and knew the people involved, are fantastic and useful to historians of the future. But if you can't find them, what's the point?

I get why the Facebooks succeeded. For my site, I was a total geek: why would I dirty the site with anything social? Well, look who's laughing now.

>People post their photos on Facebook groups. They get attention, likes - all the stuff that matters to a human. A week later the photos are lost in the group, hard for anyone to find, no indexing, no exposure.

Not even a week, if you consider the target audience for what's posted as opposed to the poster. Algorithmic sorting and infinite scrolling have pretty much eliminated the ability to go back and look at something you saw a few days ago (unless the algorithm decides to boost it back into your feed).

I haven’t seen your train site, but the kind of content I imagine you produce would be, in my mind, akin to a reference book.

By contrast, Facebook is at best like a magazine, at worst a radio phone-in about trains.

Reference works in the form of websites have amazing value in and of themselves. I don’t think they need to be measured by social eyeballs when they attain an outright high level of quality.

I happen to be particularly fond of a reference website that is a taxonomy and history of British traffic lights:


> Why isn't the social functionality of identity / friends / followers / newsfeed / etc. built into browsers in a standardized way?

Newsfeed is RSS/Atom.

Identity / friends / followers are really one package, and it isn't a thing browsers can solve on their own, because people want the ability to do password resets etc. Also, decentralized identity is somewhat the opposite of this anyway -- people don't want to use the same "identity" for their parents and their friends and their boss.

The best way to do this is for sites to use email as identity, because it's common and gives you password resets, but people can create more than one and separate them as they like.

The technology to do this already exists, but Facebook and Google made it easy, while the free-software equivalent takes several hours to get running. Which we could fix, but haven't (yet).

Sure RSS exists, and I use it, but it's not even built-in to (most) browsers anymore. You open an RSS link in the browser and it spits out XML garbage. Wat.

RSS is sadly not enough on its own without the other puzzle pieces. Private feeds are not really a thing, it doesn't let you comment on or like or share the article to your friends, etc.

That's because Google Reader existed, then they killed Google Reader.

ActivityPub solves most RSS limitations.

> Why isn't the social functionality of identity / friends / followers / newsfeed / etc. built into browsers in a standardized way?

Because these compete with the interests of browser vendors, interests which finance a degree of development that dominates and ultimately stifles independent efforts.

Remember that Google pitched Google+ as an "identity service". They're now accomplishing this through Android, Doubleclick, GA, Gmail, and ReCaptcha, far more effectively. And sell ads on it.

Facebook isn't going to pay for social integration development by Mozilla: Zuck wants that pie to himself.

Channel monopolies would prefer RSS died and browsers (or apps) served their specific feed directly and exclusively.

More often than not, the "game" trends toward market capture and acquire + kill or absorb business strategies. At a certain size that's hard for anyone to beat.

That's part of the problem: readers became followers. Don't forget to like and subscribe.

I went through exactly this thinking recently when I wanted to set up a blog for myself (and migrate an existing one off of WordPress). I tried my best (and I think I succeeded) to ensure that I am not locked into one vendor, and it was pretty much free.

Someone else mentioned that you can't reach as many people from your siloed website as you can if you go through social networks. I found one way you could get the best of both worlds: through Medium's import feature[1]. But I don't yet know how effective that is.

Here's a short write-up in case anyone's interested: [2]

[1]: https://help.medium.com/hc/en-us/articles/214550207-Import-a...

[2]: https://ketanvijayvargiya.com/58-setup-blog-and-email-on-cus...

I think that's only half of it. The other half is that it's easier for consumers to find content in the silos of the large corporations than content that exists outside of them.

Yep, it’s even worse. There are some things that don’t have a definitive answer, like many aspects of COVID. Some pages with what I would categorize as “inquiry” get removed just because they don’t line up with the WHO. This isn’t about questioning vaccines, but rather the unsettled questions around this new disease. They just get banned... it’s not like I have a stake in that fight _other_ than dismay at private censorship of opinions that don’t toe a given line. It’s rather frightening.

But... many of the same companies will fill your search results or fill affiliate pages with quackery ads just fine...

Why do you feel that it was the job of "corporations" to preserve and archive every page forever?

In my country, all physical books and magazines which are published must be submitted to the government in X copies. The government then keeps an archive.

With webpages, the problem of obtaining X copies never existed. Why couldn't the government have archived webpages like it always did with books?

It is not that it is their job. The problem is the mismatch between the broader public understanding of the lifetime of "a webpage" and reality, when said "webpage" is inside a walled silo (and maybe even when it isn't).

I guess I don't see why anyone would expect their webpage to keep being published indefinitely with no contract or ongoing payment. Not many other things in the world work like that.

Services provided by a company typically don't survive the end of the contract to provide that service. If the company itself goes bankrupt, all services cease to be provided immediately.

Typically, the only organisations that can credibly commit to providing a service for more than a few years or decades are the government of a country, a well-funded foundation with a clearly specified mission, or similar.

In the UK, this job is handled by the British Library. They have a legal duty to collect annual snapshots of all websites using the .uk TLD https://www.bl.uk/collection-guides/uk-web-archive

I believe you are misrepresenting the situation. No one expects corporations to archive and preserve all data, especially not data that they are not associated with.

However, if they create a monopoly on that data, they have an obligation to preserve it, especially in the case of a corporation outright acquiring data instead of simply "out-competing" for it. And as everyone mentions, of course they are in no way legally obligated to do so, but by any reasonable standard they are ethically obligated.

I do think that the government could and should archive data, but there is currently no system in place for doing so and likely will not be for a long time, if ever. Corporations would simply have to maintain the data that they already have.

I'd argue this is a feature not a bug. The internet is a protocol for communication, not archival or retention. Any notion of persistence is owned by nodes in the network. Retrieval from an "archive" over the internet comes in the form of communication. The web introduced hypertext, and a protocol for exchanging hypertext, on top of this communication protocol. But again any notion of persistence of hypertext, and the "links" between hypertext documents, is the responsibility of the nodes in the network.

They were sharing information with the whole world, but in an ephemeral medium.

The web, and the internet, is not an inhospitable place for anyone without corporate backing. You can host a somewhat reliable service on a Raspberry Pi over your home internet connection.

You just can't be found. You can self-host, but unless someone finds you some other way, you are excluded.

> It turns out that most pages created in the 90s are now inaccessible

Some of that is because search engines have simply stopped returning them in results even though they're still online.

The issue is that a web page only lasts as long as its funding does: private sites are great, but someone still has to pay for the server, and when they die, it’s probably going to just vanish, unless the Internet Archive got it.

How are big corporations preventing a web server from serving content over HTTP in old-school HTML?

Seems like a fundamental truth of capitalism: privatization and ultimate destruction of anything that can be monetized. Certain things are impossible without money, and to make money you have to generate or consume something, which leads to a never-ending cycle.

I seem to have had a distrust of corps imprinted in my brain since birth and have never fallen for their candies/propaganda. All my stuff is always on my own servers, with shadowing of course.

> He had been ordered to turn over all of their historical aerial archives for scanning by Google, and then told the USGS would no longer do aerial scanning since Google was doing it. But there was no agreement for Google to turn over their aerial scans back to the USGS.

Jeez, that's horrifying. Literally just giving public assets to private corporations.

Public funded data should be publicly available which includes use by private corporations.

Agreed. You seem to be missing the part where this is no longer publicly available because the USGS no longer has the data.

I believe the original author meant that Google doesn't have to turn over aerial scans that Google commissioned (not the scans they took of the USGS files).

Furthermore, I'd be shocked if Google just kept the original copies that the USGS gave them.

If getting it takes connections or prestige, then yes.

If any entity with a plausible use case could and still can get that data at the cost of the copy, I don't see why not. The whole "copying does not deprive the original owner" meme applies particularly to such public assets.

> If any entity with a plausible use case could and still can get that data at the cost of the copy, I don't see why not.

Can you point me to where I can download this data for the cost of a copy? Didn't think so.

Did you try asking the USGS?

Google presumably also didn't have a website with a download button.

At a fundamental level, my complaint isn't about money or access, it's about ownership.

Yes, money and access are important. Yes, ownership has money and access implications. But ownership in itself is fundamental, and the money and access problems usually don't immediately follow the ownership problems, because if they did, nobody would give up ownership in the first place.

I've said it before and I'll say it again: corporations don't have the right to "innocent until proven guilty". We don't have to wait for corporations to do something wrong to do something about it. A lot of people on Hacker News seem to have this idea that we should wait to regulate until businesses follow incentives to the point of doing great harm, and then when they do great harm, we should just say, "Oh, it's not their fault, they were just following incentives."

I don't accept this. It's obvious that taking publicly owned data and making it privately owned will lead to Google placing that data behind a paywall, an ad-wall, or simply dropping it if Google feels they can't monetize access. Even if Google maintains a relationship with the USGS such that we always have access, there's no reason for the USGS to spend our tax dollars paying rent to Google as a middleman. We don't have to wait for Google to follow incentives to that point; we can see where this is going, and we don't have to pretend we don't.

I assume the USGS still holds the same data, no less accessible than before.

So, if Google has put their copy of the data behind a wall, or deleted it, it's no less accessible than if Google had never been allowed to make that copy.

But... it's not current data. The historical data loses value as it ages. Aerial scans from "one month ago" are now less accessible.

I don't like this but if a corporation is a person, they have the same right to it that the rest of the public has.

If the cost to the USGS could be quantified, I'd expect Google to pay the USGS to make the public data available?

It does sound awful. I don't know what the right answer is.

> I don't like this but if a corporation is a person, they have the same right to it that the rest of the public has.

1. A corporation is not a person. Corporations don't have rights, except inasmuch as the people within the corporation have rights.

2. The problem isn't that Google has access to the data, it's that USGS and the rest of the world no longer have access to the data, except on Google's terms.

Wasn't there a seminal Supreme Court case that has been used as precedent to show that corporations do have rights? Something tax-related?

The supreme court also decided in Dred Scott v. Sandford that people of African descent imported into the United States and held as slaves were not included under the constitution and could never be citizens.

The supreme court was wrong on racism, and it's wrong on corporate personhood.

What's right and wrong is a purely human construct and changes over time. The point was that, currently, in the eyes of the law, corporations hold many of the same rights as individuals. This could change, but would require special circumstances not considered previously or a change to the law by congress.

Corporations aren't people. I can't get married to Google. If you can point to specific precedent of corporations being given access to certain data on the grounds of their personhood, then your argument makes sense; but just because corporations are considered like people in the context of speech doesn't mean that applies literally everywhere else.

Corporations aren’t people, but from a legal perspective, they have some rights in common with persons, which is why you often hear statements saying they are.


The law is wrong.

I never said otherwise.

Okay, but it's pretty tiresome to hear the supreme court opinion of corporate personhood brought up constantly as if it has any validity whatsoever.

I can sympathize with that. Unfortunately, the opinion of the supreme court is a force to be reckoned with, whether we agree or not.

> I can't get married to Google.

But they can still fuck you over, like when they banned my GMail account for no reason, with no warning nor explanation.

Yea most of the wind about “taxpayer dollars being wasted” is just flatulence but this is a straight up robbery.

> But there was no agreement for Google to turn over their arial scans back to the USGS.

That was poor negotiation by the USGS Solicitor's Office. Libraries participating in Google digitization programs negotiated to keep copies of their scanned materials in the HathiTrust Digital Library: https://www.hathitrust.org

You act like the Director of the USGS was acting in good faith. It's pretty likely that Eric Schmidt, or similar, already worked things out with high-level officials within the government and the USGS Director was not given any real decision making capabilities.

There are laws for book publishers, requiring that they send copies to your local government's central library. In the US it's the Library of Congress. Some of the books they don't keep, but they do filter them by which books are important and which aren't. Maybe the same should be done for "viral" posts, such as aerial scans, and other data deemed important.

In Germany the national library is also required by law to take care of digital ("körperlose Werke") media.

They are still figuring out a good way to archive those and are quite selective in what they archive, but they plan to expand on that.

German page: https://www.dnb.de/DE/Sammlungen/DigitaleSammlungen/dgitaleS... English page: https://www.dnb.de/EN/Sammlungen/DigitaleSammlungen/dgitaleS...

Electronic works in the U.S. fall within the mandatory deposit statute, but then are excused by Copyright Office administrative rules. However, it seems they've slightly narrowed the electronic works exception (first in 2010, and tentatively for 2020) since the last time I looked: https://www.copyright.gov/rulemaking/ebookdeposit/

I didn't know that LoC has any discretion regarding keeping items mandatorily deposited for copyright registration. Do you have more information on this?

The USGS is currently in the middle of an 8-year 1.1-billion-dollar program to develop a nationwide digital elevation model from aerial lidar. The data, which is freely accessible, is hosted on AWS. Cute story though. The hackernewses are going to eat it up.


USGS is in the process of collecting that data right now; it's not from the archives, and DEM is different from USGS aerials (which are photographs) and is run out of a different USGS office. This is sort of irrelevant.

Making digital data publicly available is pretty new for USGS. Just a few years ago, archived aerial imagery had to be ordered by mail, and it was a pretty lengthy process. Topo maps (the earlier equivalent of the DEM data to which you refer) were generally ordered on paper as well up to five or so years ago, but they're in much more popular use, so more third parties got into the business of distributing them. I've relied moderately heavily on both for some of my research, and it was a very painful process until just recently to get anything older than current. In the meantime, yes, Google had it all at some point, but mostly stopped using it or providing it because they obtained better quality imagery.

Fortunately USGS now has a slippy map for topo and an admittedly rather clunky ESRI query service for aerials.

USGS has been providing free public access to DEM data for ages. The SRTM has been available via FTP since at least 15 years ago when I first started using it to render hillshaded maps. There's not a secret handshake needed.


and the DEM has always been in a native digital format. The whole problem here is that the aerials and conventional maps are not, they're on paper and film and fiche. It takes a lot of time and money to get it digitized and available and USGS was not able to do that for a long time. You could argue that Google's generous offer to digitize the EROS archives contributed to the delays on this.

Keep in mind that when we talk about the EROS archives we're talking about data that goes back to the 1930s and earlier for some product types.

For a long time I got the topo maps from the website of a state government bureau that had conveniently run them through their own large-format scanner and posted the TIFFs - USGS didn't get around to it for years after. It's hard to blame them too much as they had a shoestring budget.

Actually, for amusement value, that state agency appears to have removed the TIFFs from their website and now says that you can order the topo maps by mail for $8 apiece, which is what I used to have to do. I wonder if USGS got mad at them, which is a bit ironic since they don't mention that USGS themselves only recently started offering them online for free. For additional amusement value, EarthExplorer, the fairly new service that lets you retrieve aerials online, has a banner up that downloads are intermittently broken, and indeed I can't get it to work at the moment.

Really I'm struggling with your statement about digital distribution being new for the USGS. We're talking about an agency that ran a finger service to inform people about recent earthquakes!

I had to order some maps of Antarctica by fax a year ago. The USGS had a functional webshop, but it only served the US; everybody else had to fill out this form including debit card data and send it over. It turned out my uni actually still had one (1) functional fax machine.

I took the tour 10 years ago. Obviously his objections were heard. I’m glad they listened to the guy.

Companies are necessarily managed for the quarter and countries should be managed for the century.

The planet should be managed for the centuries, plural.

But we're just not smart enough to understand that, never mind make it happen.

Instead we prefer to cling to the bizarre delusion that billions of individuals with competing interests will somehow spontaneously self-organise into the best of all possible worlds.

That is indeed pretty much what it says on the tin.

However, to be fair, it has always been the most greedy and self-interested, with already the most disproportionate power to rig the game in their favor, that have been most vocally advocating this system. No surprise there, of course.

What fascinates me is how a majority of people, who certainly do not personally benefit from that system, have been made to believe that they do. Sure: political corruption, cultural indoctrination/propaganda, horrendous general education, and I can think of a few more .. but still, I've always been amazed how it appears to have canceled even basic logical reasoning among so many.

Who knows, maybe one day it will turn out to not just "correlate" with an addiction to carbs/sugars, which the country has plenty of problems with too. Junkies have always been easy to manipulate.

Until then, at least it still gives some hope that a growing number of people now realize that this system just doesn't work as it is advertised.

And it's not even true. At least here in Europe (I can't comment on the US as I've never visited), Google Maps is really poor.

It's fine when you travel by car, but when I'm hiking through the hills I'm just walking through an empty square on Google Maps. Volunteer-driven OpenStreetMap is MUCH better. And there the data is actually open and safeguarded.

Governments should support that kind of project instead of corporate privacy-invading playtoys like Google Maps.

In another life, I was a land surveyor and I did a lot of LiDAR work as well as heavy use of USGS data. Almost anybody except the most blinkered in that industry would have seen this coming, I think. It's just one more data point that convinces me that Google should be broken up or at least not allowed to silo previously public categories of data.

I had a similar almost to the letter conversation when I did some web work for a much smaller GIS firm back in the day, but wanted to add that in my experience this isn't just a google thing but an issue with governments and outsourcing in general.

Anecdotally, a close relative (and many others in her institute) designed entire curricula of learning modules for a government-owned nationwide technical college, back when online learning was newish, ~20 years ago (I think back when SCORM was fresh). These were tightly integrated into the traditional in-class offerings. A couple of years later a "trim the fat" government slashed internal capabilities and outsourced all "IT" hosting, management, etc.

All of the online learning modules (which would have cost millions in man-hours to develop) were literally handed over as "content" to a company who to this day offers them back to her institute under per-student licenses (that far exceed any "hosting" costs of these basically static resources) over a decade later. This company also profits off licensing to an array of pop-up online "institutes" that don't even approach the pedagogical context needed to ensure quality education outcomes from these resources.

Like a comedy of errors, from time to time some lecturer at her college will want to ask some question about the materials. Their boss directs them to the company support (which is a paid service); after the issue escalates through the support tiers and they realise they need the expert knowledge of the author, she'll get an email with the question. That process can take days or weeks, when the lecturer could have walked into the office next door and asked her directly, if the company hadn't stripped all author credits from the materials.

If the company decides to shift business models, or goes out of business, or is acquired and scuttled, these assets get blown to the winds.

There's a lot I could say about this situation, but essentially governments in general seem to devalue their assets at taxpayer expense. The IP of these assets could have been better handled rather than just handing it directly to the first company to win the contract all those years ago.

> historical arial archives

A font of knowledge

Is this a valid reason for not using Go?

I am a holdout.

(Not suggesting I am "smarter" than Go users, but I can foresee issues with Go being controlled by Google.)

I will probably never use Go in its current situation.


I think most of us are young enough to live up to the point where we will look into the mirror asking ourselves what we did 20 years ago, and nobody will really remember because, you know, a few bits here, a few bits there, and it all disappeared...

Not smarter. Wiser.

It sounds awful that Google has the best mapping data in the US. In the UK Google's data is awful, worse than OpenStreetMap and much worse than Ordnance Survey, the national mapping agency.

The funny thing is that this happened already when Google bought DejaNews and broke the interface after a year.

The underlying problem here might as well be considered a fundamental shortcoming of pure/fundamental capitalism. I make no claims about the value of alternatives, or even if there are any (better ones, that is).

Anything that is (no longer) of commercial value will be "phased out" and dismantled/destroyed. One might still stretch it a bit by arguing that the commercial value of something can include its future potential value. But I personally know not a single commercial company that ever chose that over short-term cost reductions and "profit optimizations".

Luckily, there are governments who acknowledge this shortcoming and build structures to compensate for it. But when governments decide to leave (almost) everything to commercial markets, then the importance of anything and everything can and will only be measured by its commercial (contemporary) value/profitability.

People have every right to vote for and support such a system. But then don't complain when all that you get is only what such a system supports/provides.

Isn't killing projects Google's key strength?

Like we have whitewashing and greenwashing, I propose the term:

Googlewashing - to proclaim “Google would never ...”

> He said Google is great now, with all their maps, which were far more accurate and had better coverage than the USGS...But what happens when they get bored with map data and get rid of it?

> Looks like he was a lot smarter than us.

If you would've asked me back when Google was new, and we all believed in "Don't be Evil," I would never have thought that Big Tech would end up being the Ministry of Truth and The Memory Hole.

Just recently I collected all of the archives of comp.lang.ada I could find and imported them into a public-inbox repository. There's a gap around 1992 that I couldn't find a copy of, but it's otherwise complete. It took a few days to get everything into the right format and get SpamAssassin dialed in, but it would certainly be possible to do this for the other comp.* groups if one had the patience.
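For anyone tempted to repeat this for other comp.* groups, a rough sketch of the normalization step, assuming the raw articles sit in a directory as individual .txt files (that layout and the function name are my own assumptions, not from the post above); public-inbox and SpamAssassin would then operate downstream on the resulting mbox:

```python
import email
import mailbox
import pathlib

def articles_to_mbox(article_dir, mbox_path):
    """Collect raw Usenet article files into a single mbox for import.

    Returns the number of articles written. Articles without a
    Message-ID header are skipped, since archive tooling generally
    deduplicates on that header.
    """
    box = mailbox.mbox(mbox_path)
    count = 0
    for p in sorted(pathlib.Path(article_dir).glob("*.txt")):
        msg = email.message_from_bytes(p.read_bytes())
        if msg["Message-ID"] is None:
            continue  # unusable for dedup/threading; skip
        box.add(msg)
        count += 1
    box.flush()
    return count
```

Filtering on Message-ID up front also makes the later spam pass cheaper, since duplicates across overlapping tape dumps collapse naturally.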




I would personally very much appreciate it if the Ada resources could be placed or archived again on the internet. Lately I had the feeling even books were a better option for finding information about the language.

The vast majority of the spam content is injected into these newsgroups via Google Groups itself, and is not even seen on other NNTP servers.

Blocking posting access to these newsgroups from GG is generally a good thing for those newsgroups.

Not being able to search the archive is the unfortunate collateral damage though. Google is not obliged to provide a Usenet archive, I suppose.

Formerly obtained deep links to the content also do not work!

If you formerly cited a comp.lang.lisp article by giving a direct link into Google Groups, people navigating it now get a permission error.

What would be a good free NNTP server or NNTP archive?

The D programming language forums work as an NNTP server as well as web forums. I have in the past downloaded all content from the forum, allowing me to have fully offline archives of threads. This is so underrated. I think NNTP could make forums far superior, although it feels like there aren't many clients springing up, AFAICT.

Adding some new NNTP features to Thunderbird was my introduction to open-source software and ultimately led me to being one of the primary maintainers.

NNTP is a wonderful protocol, arguably the simplest of the 4 mailnews protocols (IMAP, POP, SMTP, and NNTP). While it seems to share the same basic format as RFC822 messages, it actually tends to avoid some of the more inane issues with the RFC822 formatting (generally prohibiting comments and whitespace folding).
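To illustrate that simplicity: the overview (OVER/XOVER) responses servers use for threading are just tab-separated fields, as defined in RFC 2980/3977, and can be parsed in a couple of lines. A rough sketch (ignoring optional trailing metadata fields such as Xref; the sample line is made up):

```python
# Standard overview fields, in the order servers return them
OVER_FIELDS = ["number", "subject", "from", "date",
               "message-id", "references", "bytes", "lines"]

def parse_over_line(line):
    """Split one tab-separated overview line into a field dict."""
    parts = line.rstrip("\r\n").split("\t")
    # Any trailing optional fields (e.g. Xref) are dropped by zip()
    return dict(zip(OVER_FIELDS, parts))

example = ("3000234\tRe: NNTP rocks\talice@example.org\t"
           "6 Oct 1998 04:38:40 -0500\t<45223423@example.com>\t"
           "<45454@example.net>\t1234\t17")
```

Compare that with parsing folded, comment-laden RFC822 headers and the appeal is obvious.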

Unfortunately, the internet by the early 2000s started turning more and more into an HTTP(S)-only zone. Usenet itself hemorrhaged its population base, especially as ISPs shut down their instances (e.g., because someone found one child porn instance somewhere in alt.binaries.*).

We periodically hear calls to replace the D NNTP forums with "modern" forum software, but naaah, nobody does it better than NNTP!

Vladimir Panteleev did, though, write a web interface to NNTP:


which is also freely available:


The only thing I would ever modernize about the D forums, if I were to ever bother, is the CSS style or something, but honestly, they work, they're faster than most forums out there, and they aren't crashing all the time (I'm looking at you, vBulletin!), so it's fantastic.

I'm a broken record on this, so you may have seen me point it out before, but NNTP started dying in the mid-late 1990s, when binaries took over. It was extraordinarily difficult to keep reliable full-feed binaries (NNTP is the dumbest conceivable way to share large binaries), and if you couldn't do that, customers would yell and ultimately abandon your service for a cheaper one, while opting for more centralized Internet NNTP services.

Ultimately I think the web would have eaten Usenet anyways, but it's a shame; we were Freenix-competitive (I think I independently invented the INN history cache), and that was some of the most fun I've had doing systems engineering work.

> I'm a broken record on this, so you may have seen me point it out before, but NNTP started dying in the mid-late 1990s,

I didn't start posting to usenet before 1999 and was a regular poster in a few groups from that time up till around 2014. Excluding spam, what was the activity level of groups before and after Eternal September?

> It was extraordinarily difficult to keep reliable full-feed binaries

I don't understand why ISPs wouldn't just limit their newsfeed to the text-only newsgroups. Did peering arrangements require one to also provide binary newsgroup access? IME, ISP and university news servers had mediocre binary completion rates at best. If someone wanted binaries, they could always subscribe to one of the paid newsfeeds that provided better completion. So I don't really see the incentive for providing binary access at all at an ISP/educational institution level.

That was exactly what universities were doing here in the '90s. We didn't have physical disk space or bandwidth for a complete feed, but were able to provide all the non-binary groups in the main top levels (including alt.*).

The ISPs needed the binaries though, because that was all they were used for. People read the text on their Uni systems, where they got free dialup, rather than chew up their ISP connection time quotas.

> People read the text on their Uni systems, where they got free dialup, rather than chew up their ISP connection time quotas.

At dialup speeds, the only practical binaries one could download were images; mp3 files were about the upper limit in file size. Beyond that, articles would expire off the server before they could be downloaded.

Without broadband and better completion rates, which commercial ISPs didn't really provide (especially the latter), customers probably wouldn't really try using their usenet feed for that purpose when they had other alternatives for binaries.

I just used my ISP's usenet access for text groups until they discontinued it.

If you disabled binaries, customers would get angrier than if you simply didn't provide Usenet at all. Nobody that cared about Usenet would sign up to a provider that didn't provide full feeds. I agree that it's irrational, but it also destroyed Usenet, a couple years earlier than I think it would have otherwise.

For what it's worth, I never called my ISP's customer support to complain about bad completion rates on Usenet. Logically, people would have found other ways to download what they wanted, either by finding an alternate source, or another usenet feed to replace or combine with their ISP's.

What really took down usenet was when Andrew Cuomo, back when he was the state attorney general of New York, made a deal with several major ISPs to restrict access to child porn via usenet.

This led to many ISPs discontinuing their usenet service, which in turn decreased the number of people posting to text groups. Within a few years of that happening, practically all the regular posters in the groups I used to frequent just stopped posting. Those same groups now only have spam posted every several weeks, based on what I've seen via Google Groups. Prior to that, these groups had plenty of active discussions going back to the mid '90s and earlier.

I was the tech lead at the most popular ISP in Chicago in the mid-late 1990s and I assure you that people complained, on Usenet, in email, and in phone calls. And we kept a full feed!

Ah I see. I'm even more curious now that you say it's really simple; I haven't read the spec much personally. I loved the concept of the D forums so much that I intended to attempt to set up my own NNTP daemon from scratch, but it's in a bucket list of projects I want to try out. The only resources I could think of reading are the RFCs; not sure if anybody else has documented Usenet otherwise.

Whatever happened to GroupLens? That was a protocol/system for collaborative filtering that used Usenet as a guinea pig.

Probably a bad idea, according to what we know now about echo chambers in social networking.


What you can do with NNTP is run a local NNTP caching server. Then connect to that server instead of the real one. Your caching server can retain articles as long as you want; much longer than the upstream server.

(Though mere long article retention is not necessarily the best archive interface, of course.)

Disclaimer: I'm not well-versed in the solutions in this space. Maybe there is some NNTP cacher out there that also has a web archive interface into it or whatever.
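A minimal sketch of that caching idea, keyed on Message-ID (`ArticleCache` and `fetch_upstream` are hypothetical names of mine, not an existing tool; the fetch callable stands in for a real NNTP ARTICLE request):

```python
import pathlib

class ArticleCache:
    """Cache raw articles locally, keyed by Message-ID, so they
    outlive the upstream server's retention window."""

    def __init__(self, cache_dir, fetch_upstream):
        self.dir = pathlib.Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch_upstream  # stand-in for a real NNTP call

    def _path(self, message_id):
        # Message-IDs contain <, >, / etc.; map to a filesystem-safe name
        safe = message_id.strip("<>").replace("/", "_")
        return self.dir / safe

    def get(self, message_id):
        p = self._path(message_id)
        if p.exists():                 # serve from the local cache
            return p.read_bytes()
        data = self.fetch(message_id)  # cache miss: ask upstream once
        p.write_bytes(data)
        return data
```

Since articles are immutable once posted, there is no invalidation problem; the cache only ever grows, which is exactly what you want for an archive.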

Yes, and I have 100% of the D newsgroups archived back to the very first post. Anyone can get them from the D NNTP server. I also wrote a program to create static web pages from them:


and the generated pages:


When we were working on the history of the D programming language paper, this was an invaluable resource.

Back when I was first learning about D (over 10 years ago), I crawled those archives and sorted all the posts based on the comment count. It revealed many interesting topics and the history of how D progressed as a language.

I wish forum.dlang.org had a quick way of browsing just the top list of the most commented posts.

Well, it is open sourced on GitHub; you are welcome to make tickets. Maybe that's a small type of enhancement the D forums could benefit from: filters that produce informative pages/results.

Good idea. Never thought of that!

I've been using the NNTP server provided by https://www.aioe.org/ for quite a few years.

There is also https://www.eternal-september.org/ which I used.

AIOE requires no authentication. The Eternal September server requires account registration via the web site; then you use an authenticated NNTP connection.

There are other servers out there.

These sites do not provide any archive.

Google's handling of these critical archives they were given is pretty abhorrent. The usenet archives should really be made public since there is no business value to them and they don't care about usenet.

When Google started, there was maybe an overall altruistic, visionary, principled culture among many pre-Web Internet-y people, and it looked like Google was of that same school of thought.

(This was at the same time that there was a gold rush of IPO plays, hiring anyone who could spell "HTML", and plopping them down in slick office space, Aerons for everyone, and lavish launch parties, with tons of oblivious posturing and self-congratulating. But Google stood out as looking technically smart, at least I believed the "Don't Be Evil", since that was the OG culture, and it seemed a savvy reference to behaviors in industry and awareness of the power that it was clear they would probably have.)

That might be why it wasn't surprising to hear of things like someone entrusting a bunch of old university backup tapes to Google's stewardship.

This has played out with mixed results, and I think Google could be doing much better for humanity and for techie culture.

Google didn’t kill Usenet; it was already pretty much dead. Web forums had all but taken their place (and where are their archives now? So much is lost).

If you look at the history, Google basically rescued the data from a collapsing Deja News, and made it available again. A nice gesture, which didn’t serve to benefit Google much in the long term.

If we want to preserve history then we can’t rely on for-profit companies. We need to instead fund non-profits whose specific charter is archival and preservation, like the Internet Archive.

> The usenet archives should really be made public

Given the nature of Usenet, they were if anyone wanted them.

Various people sent their old tape reels and other backups to Deja News, which compiled everything. But Deja News never made the individual archives or the collection freely available, nor did Google. The oldest stuff is locked away by Google because the only hard copy was destroyed when it was sent to Deja News. As time wore on, most of the remaining fragments that at one point could have been recompiled independently also disappeared.

What Google is doing by refusing to publish the archive or even share it with parties like the Internet Archive is completely unjustifiable and anathema to everything they once stood for.

> What Google is doing by refusing to publish the archive or even share it with parties like the Internet Archive is completely unjustifiable and anathema to everything they once stood for.

Couldn't a copyright claim (or something under the GDPR or UK's DPA) be used to regain access to those though?

Just because something is published to a public forum doesn't mean you relinquish your rights.

Copyright is a legal mechanism for restricting others from making copies, not for demanding they make copies for you. Off hand I'm unaware of any general legal mechanism to accomplish that outside of a contract or promise.

That’s why I suggested the DPA which does allow for rightsholders to request copies of data pertinent to themselves - I’d argue that usenet postings would fall under that scope.

Doesn't that just create an incentive to destroy the archive before GDPR authorities can shake them down over it?

Perhaps - but it also creates an incentive for companies to destroy inappropriately-held and collected personal data they have no business possessing.

The DPA isn’t new - it was created in 1988 - and UK ISPs had Usenet/NNTP servers long after that.

Google acquired probably the biggest searchable archive, Deja News. What we needed was some kind of self-sustaining org with a strict charter to preserve the archive no matter what.

Archive.org ?

Maybe, though making themselves a target of book publishers may have risked their other responsibilities.

They were until they were not.

> they don't care about usenet.

They cared enough about to kill it.

Controversial question: Why should we preserve code that no one uses anymore? Why should we not allow some information to be simply lost?

Because it's a cultural artifact, of its time. It's history. And some people would like to be able to read it, or do other things with it.

Personally I'd like to be able to link to my own posts from that time, for when people asked me what I used to do. But I can't find them any more.

These groups are mostly not code. They are conversations, design discussions, ideological discussions, jokes, that sort of thing.

Like what we have now in social media, except back then there was pretty much only Usenet, and it had a very different feel than the current social networks.

They are where things ideas like the smiley, and free and open source software, and utopian ideas of internet culture were developed. All the early internet memes. And of course all the knowledge people shared.

Conducted in public at the time and thought to be archived for the long term.

Wonder what people will think in a hundred years when they read that everyone believed the universe was made up almost entirely of invisible and intangible matter? It'll be some future generation's flat earth joke.

This past Sunday's New York Times noted that until the 1860's, almost all reputable scientists insisted that pandas were a myth.

As someone else pointed out, losing information is bad because we can't know what value it might have in the future, only what value it has to us today. A lot of things from the past that we are certain had no value to people at the time (such as literal garbage heaps) are of immense value to historians today in understanding the past and the context within which those "worthless" things existed.

You're right though that a decision will probably have to be made at some point about what to keep and what to toss (how big is YouTube, exactly? Are we really going to keep every video, in its original resolution, forever?), but this is just plaintext, it takes up almost no space. The decision doesn't even have to be made, since it's easy to find the means to store this, so why bother making it? Kicking the can down the road is actually the best decision in this case, since the people of the future will (hopefully) have a clearer understanding about what was important in our own past than we do currently.

Why should we preserve old websites that no one uses? Why bother with historical documentation at all?

It's because, at the time, you don't know what information is going to be important and what is just garbage. Documents that are apparently useless today could become fascinating tomorrow.

No, it's a reasonable question. We're not going to preserve, certainly not in a findable way, every piece of digital flotsam that has ever been summoned into existence. In general, we probably should save what we can of Usenet for historical value as balanced against the fact that the archives are tiny in the scheme of things. They're probably also messy but that's probably OK.

Interestingly, when some people saved a great deal of the Usenet archives pre-Deja News, one of them said something to the effect that they wished they had prioritized saving social discussions and so forth because, by and large, saving discussions about a bug in a long ago version of SunOS probably wasn't very interesting.

> saving discussions about a bug in a long ago version of SunOS probably wasn't very interesting.

Honestly even that sounds pretty fascinating:

It could help someone gather stats on the nature, frequency, and severity of bugs over time and across companies from another angle.

It could provide a fresh perspective on modern OSes by showing how historic OSes did things.

And it might be good material for a course on the history of software engineering practices, showing classes of bugs that have been eliminated, and styles of development and customer support that worked or didn't work.

I suspect the information would be too fragmentary to extract anything statistically useful in it. But, yes, there are possibly historically interesting nuggets in those sorts of topics.

Here's the article I was thinking of by the way. https://www.salon.com/2002/01/08/saving_usenet/

Why not? Our capacity for storage has been increasing exponentially such that yesterday’s data is basically of negligible size compared to what we are producing today. There’s no reason to delete history.

So no one is keeping you from doing so. No reason to hope someone else will do it.

Indeed! That's why I regularly donate to the Internet Archive. :-)

Which is a very laudable response! With the caveat that pack-ratting everything is going to be an endless treadmill. I certainly favor preservation but at some point you do have to consider what you're saving and why.

Your assumption that "no one uses [it] anymore" is glaringly wrong in this case.

Those archives are full of useful and informative information.

Not everything changes fast. Common Lisp has been around for 30 years basically unchanged. The discussions back then can be truly informative today.

It does take time to wade through it, but people have been collecting (via the Google archive, when it existed, sigh) curated lists.

https://www.xach.com/naggum/articles/ https://www.xach.com/rpw3/articles/

For the same reason we don't just tear down the pyramids and build condos there.

There are still interesting things to be learned from ancient artifacts.

But we do tear down old condos to build new ones. Should we also endeavor to retain every geocities and myspace page?

And if not, what makes comp.lang more like the pyramids than geocities?

> Should we also endeavor to retain every geocities and myspace page?

Yes: https://www.archiveteam.org/index.php?title=GeoCities

Digital data is not exclusionary in physical space like condos. And even random myspace pages with hacked stylesheets show the common culture of an era.

Do you know about cuneiform? Lots of what is known comes from just ledgers and exercise books...

Never forget that we do not know the future.

Future digital tourism.

That or risk future archaeologists thinking COBOL was some God of the time and the natives built large metal obelisks in dedicated worship temples.

Why do Mennonites and other such groups use old or deprecated technologies? Partially due to religious creed, but also because when the electricity is gone, oil lamps still function, and horses don't need a petrol pump to keep running.

Likewise, many people are clinging to the local operating system rather than moving to the SaaS model.

So what happens if we lose the old-school languages and platforms entirely, for whatever reason?

If too-big-to-fail corporations are somehow hobbled or neutralized, we need old hand tools to build a tech newtopia from the rubble. If those tools are destroyed, then we are beholden to a system that stands on very thin ice.

I would add to this that not all forward progress is necessarily good or well thought out. If there is value in an old thing that hasn't been unlocked yet, and it is lost to history, we become collectively worse for wear. Things like Lisp are old and pretty darn cool to have as an option.

I second this: the need to rebuild from the rubble is often overlooked, especially by corporations driven by profit-centered goals.

The thought process and conversations that produced the code give insight into how to more generally produce code of that kind. Typically code currently in use is in continuity with code that was previously used, either as a system dependency or conceptual dependency. So it's still useful to have history around, like it would be to have comments in current code.

Well I think it’s ok in general for some information to be lost, but I think a lot of HN users value this specific information.

I’m sad to see that this was downvoted; it contains the key questions. I think they have good answers.

1) Eventually, everything will be lost anyway. The original print of King Kong is gone. A fire at Universal Studios wiped out the masters for a lot of music at once (https://en.wikipedia.org/wiki/2008_Universal_fire). Floods destroy family photos all the time. But those are examples of the forces of decay, of natural entropy, of error. The Library of Alexandria probably contained a lot of useless crap but also nuggets we’d want to know today. Information is memories, useful information is useful memories, and there’s no compelling REASON to lose it. Other sections of Usenet history were wiped out when Google acquired it (a lot of comp.database.olap content I had a hand in) and groups of people just lost a knowledge base.

2) It’s not simply code that no one uses anymore. It’s a knowledge base on how and why, debates over constructs and usage that are useful beyond code-sharing snippets a la Stack Overflow.

3) There is an argument for letting some information get lost or at least super-obscure, but it’s hard to see this being a good example. Tide Pod Challenge videos come to mind. GDPR and right to be forgotten mandate something akin to information loss.

4) I posted this elsewhere but I’ll share here too: there was a comment made on the original article about preserving prior art for IP (patent) purposes. That alone is in the public interest. Irrelevant to your questions in general, but pertinent to each of them in this case.

It belongs in a museum!

The fact that nobody had enough fucks to give to archive these groups tells you everything you need to know about decentralized peer-to-peer proof-of-work blockchain nerd hobbies. This content exists on a completely open peer-to-peer content distribution network and here you are whining that one company -- the company that already rescued this archive in a midnight U-Haul run 20 years ago -- failed to archive it.

Seriously! I have the same issue with a lot of modern online communities/projects too. They all assume whatever platform they're currently publishing on will be there forever.

Brb archiving my Twitter posts

>The fact that nobody had enough fucks to give to archive these groups

Well, you assume. Maybe it was just decentralized enough that you haven't heard about it.

Google bought DejaNews and has profited immensely from open source and open information.

So I do think they have an obligation either a) to make the whole archive available to anyone, or b) to maintain it properly.

Properly means restoring the fast UI from around 2004.

If you found a human at Google instead of a bot, it would probably say their only obligation is to their shareholders.

It's probably not a good idea to depend on a public company to steward an important community.

Does the Internet Archive have copies of all the old stuff at least?

Their only obligation, if we take for granted that there are any humans left at Google, is keeping the aforementioned bots powered.

Which is sad, but expected.

There are quite a few humans at Google, both on HN and on Twitter. Sadly, all of them that I talked with seemed like people I would not want to interact with again.

Wait, Google feels any obligations at all? I thought they only made decisions based on what's most likely to maximize their growth?

"... their only obligation is to their shareholders."

That'd be an improvement.

Page & Brin retain controlling interest, despite their minority stake.

How did it profit from the Usenet archives? Genuinely curious.

> How did it profit from the Usenet archives? Genuinely curious.

DejaNews was the seed material for Google Groups; any profit derived from that (ads) came from content posted to Usenet by people who never intended for it to be used that way.

Groups doesn't (and didn't ever?) show ads, as far as I know. So you're reaching for second- or third-order effects at best.

You think realtime ad impressions are all they get from you reading granular forum posts?

Sadly, even in 2020, nothing has yet replaced what Deja was at the time it got acquired and destroyed.

> So you're reaching for second or third order effects at best.

I'm curious what second or third order effects you think a usenet archive had on GG.

From Google's side, they gained users interacting with 50K+ topics and occasionally posting views, sentiment, etc. (and likely all the DejaNews historic interactions).

That's in addition to the search history, email content, geolocation, etc. that Google has for many people.

Deja did a fairly good job at destroying itself beforehand - not just financially.

I remember how awesome the initial version of the Google usenet archive was. It's horrifying how much they have let the UX deteriorate.

This type of behavior is why I can never consider GCP. How many people have been burned at this point by Google randomly shutting down something they rely on?

I've had two Google accounts shut down in the last six months with no explanation. There is no appeal. The consumer services I've used (Feed Reader, Play Music) have been shut down, and the cloud service I was most interested in was luckily shut down before I was able to use it. (They used to have a service to resize & manipulate images in Blob Storage. I found a good AWS alternative[1] instead). I cannot rely on Google for anything at all, and definitely not for something as important as cloud services.

[1] https://github.com/awslabs/serverless-image-handler

Are there any indications to you why your accounts got shut down? Any pattern you noticed?

I, like most of us, have a personal Google account, and our company uses a Google business account. While I follow the news regarding Google cancelling accounts at will, I fail to notice a reliable pattern: (alleged) fraud and other illegal activity seems to comprise a good part of it, but at most 30-50%.

No, there is no pattern. The last one happened when I got a new Android phone. I logged in on my work account and my personal account, and the work account got suspended. It said "suspicious app", but the only app I used it with was Google Meet. The personal account was used for much more, but didn't get suspended. I half suspect that they deliberately have false alarms so they can act like they're more secure, but it's more likely just a horrible, unaccountable AI.

I treat all Google accounts as throwaways now and don't use the work email at all because I want to know that I can actually receive emails that are sent to me. That's a huge problem even without randomly losing access, because their spam filter has a ton of false positives and those emails don't get forwarded to my real address.

>Their spam filter has a ton of false positives and those emails don't get forwarded to my real address.

This is very interesting to me. I've used Gmail for 10 years now and I've found the spam filter to be nearly impeccable. I can't recall a single false positive. I can't even recall a single false negative, though I am moderately careful about who I provide my email to.

Now I'm left wondering if most people think about Gmail more like me or more like you...

Anyone like me who sends maybe 10 emails a month to gmail.

I'm in their spam sin bin since a spammer managed to find an old test account on my SMTP server with a weak password and spammed the world for a day or so a few years ago. The problem is that I don't send enough emails to get out of the bin.

This isn't a problem google cares about, small senders with no reputation are basically screwed. I can deliver to gmail hosted accounts I've got a relationship with (personal & my own work address) but I can't reliably send to other email addresses at work.

You won't necessarily be able to know if there was a false positive. Google might reject it or, worse, accept the message and not deliver it to you.

I get a ton of false positives. Probably more false positives than actual spam.

I check the spam folder weekly and always find something important in it.

My experience mirrors yours.

Something like this allowed me to disable gmail’s spam filter when I set up forwarding:


It is an awful hack. If they can’t be bothered to maintain their spam filters, they should at least let people opt out.
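(For anyone curious: the parent's actual filter isn't shown, but the commonly shared trick is a filter whose criteria match every message and whose action is "Never send it to Spam". In Gmail's filter-export XML it looks roughly like this; the catch-all criterion and property names here are a sketch based on the usual export format, not the parent's exact filter.)

```xml
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'
      xmlns:apps='http://schemas.google.com/apps/2006'>
  <entry>
    <category term='filter'/>
    <!-- Criterion intended to match every message; hasTheWord accepts
         Gmail search operators, and the exact catch-all varies. -->
    <apps:property name='hasTheWord' value='is:spam OR -is:spam'/>
    <!-- Action: never classify matching mail as spam. -->
    <apps:property name='shouldNeverSpam' value='true'/>
  </entry>
</feed>
```

You can import a file like this under Gmail's filter settings (the import option's exact location varies by UI version).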

Thank you!

How do you sign up for new accounts given the mobile number requirement? Do you just reuse the same number?

I tried to make a Google account for work use the other day, and got stuck at that point. Given Google's history it seems silly to use my personal account for work, or to connect the two accounts in any way.

I had two personal accounts that I registered before GMail had a phone requirement. I don't remember setting up the work account, but I probably just used my normal phone number.

> Play Music

Play Music has not been shut down (yet), and you can transfer everything to YouTube Music, which is available at the same price (and is in my opinion a superior product).

Some randomly selected people can transfer everything to YouTube Music. I can't, and it may be months before Google would allow me to. It's exactly that kind of treatment that makes me feel like Google has zero respect for its customers.

Spotify is generally better than Play Music though, so it was for the best in the end.

Google's Achilles heel is that they have two businesses:

a) spy on people and sell the data to advertisers, and

b) use that data to directly push ads.

That's basically incompatible with B2B services, or consumer services. As a customer, you're judged by how valuable the data they are collecting on you is, which is less than a support call costs. That bleeds into every facet of their business. As such, even if you pay them money you get the same treatment, because they can't think any differently.

> sell the data to advertisers

Do they?

They don't sell the golden goose. They rent a limited form of access to it.

Which service from Google rent user data?

Advertisers can choose to advertise to customers that fall into various buckets.

Income, gender, location, recent history of viewing specific types of pages, etc.

This is what I mean by limited form of access. Advertisers do not receive user information, but are granted the ability to use it when setting up ad campaigns.

They don't sell people's personal data, at least publicly. But that makes them care even less about end users, even paying end users, which means a business is foolish to rely on them.

One thing that's become extremely clear to me over the last decade or so is that almost all tech companies simply do not care about the past, and I suspect at least part of that is so their narrative of progress can be subjected to fewer challenges from those who look back and compare.

Also, and this may be a tangential point, but the "deny the past because it contains something bad" approach that Google has effectively taken here is uncomfortably close to recent, far more political events.

> do not care about the past...

You just reminded me of a quote from an electronic music documentary 25 years ago. One of the Detroit techno artists insisted on taking the filmmakers to a historic theatre that had been left to crumble & turned into a car park:

"In America especially, nobody tends to care about these kinds of things. People in America tend to let this shit just die, let it go. No respect for the history. I, being a techno, electronic, high-tech futurist musician, I totally believe in the future! But as well, I believe in a historic and well kept past. I believe there are some things that are important. Now, maybe this is more important like this, because in this atmosphere, you can realize how much people don't care, how much they don't respect. And it can make you realize how much you should respect."

- Derrick May, DJ/Composer, Universal Techno (1996)


The segment starts at 16:00 in the video and is about 2 minutes long.

I don't think it's quite as simple as "Americans don't care about the past" when discussing cities like Detroit. The actual reason those places were left to rot is a lot worse imo, and it's the same reason that led to San Francisco becoming such a (cheap) haven for LGBT people / artists / etc in the 1970s and '80s: http://cornersideyard.blogspot.com/2020/06/repost-personal-s...

> almost all tech companies simply do not care about the past

You may be surprised that it's not just companies. It's not hard to find people who think it's better for old stuff to just be deleted.

"He who controls the present controls the past. He who controls the past controls the future" - Orwell, "1984"

> Usenet predates Google's spam handling tools

In fact Usenet predates spam itself, since the first spam (Canter & Siegel) was on Usenet itself in 1994 (I was there).

Does anyone know if anyone besides Google has newsgroup archives publicly accessible (the Internet Archive, maybe?)

I found this Usenet Historical Collection link - https://archive.org/details/usenethistorical - in a previous HN thread (https://news.ycombinator.com/item?id=16667796).

I have no idea how useful the collection may prove to be. I found 'comp' but it doesn't offer a webpage view, just a link to download a file. https://archive.org/details/usenet-comp

Maybe someone could set up a public inbox[1] instance that allows access to those groups either via HTTP or NNTP.

[1] https://public-inbox.org/README.html
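A rough sketch of what that setup might look like, assuming one of the downloadable mbox archives above. The group name, paths, and address are placeholders, and flags may differ between public-inbox versions:

```shell
# 1. Create a v2 inbox for the group (URL and address are hypothetical):
public-inbox-init -V2 comp.lang.forth ~/inboxes/comp.lang.forth \
  https://example.org/comp.lang.forth forth-archive@example.org

# 2. Feed it the mbox archive; formail (from procmail) splits the mbox
#    and runs the delivery agent once per message:
ORIGINAL_RECIPIENT=forth-archive@example.org \
  formail -s public-inbox-mda --no-precheck < comp.lang.forth.mbox

# 3. Give the inbox an NNTP newsgroup name:
git config -f ~/.public-inbox/config \
  publicinbox.comp.lang.forth.newsgroup comp.lang.forth

# 4. Index it and serve over HTTP and NNTP:
public-inbox-index ~/inboxes/comp.lang.forth
public-inbox-httpd -l 0.0.0.0:8080 &
public-inbox-nntpd -l 0.0.0.0:1119 &
```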

It should be the full archive.


I think you have to register. Not sure how much history is there.

A lot of posts are missing from this one.

Most free and ISP-based Usenet feeds were missing a lot of posts, especially since they allowed older posts to expire. Even the commercial Usenet providers only started their archives about 12 years ago.

No, no, no. These and other Usenet group archives must be preserved. They're our history.

Anyone looking for a hobby? It is time to become a data hoarder https://www.reddit.com/r/DataHoarder/

Either those Usenet groups are not part of the world, or they don't consist of information, or Google just failed at "organizing the world's information."

Google has definitely failed. Finding anything that's not frecent is basically impossible.

I read the article and I read the threads here, and maybe I missed it—but why did these groups disappear? Were they banned due to bad words or a mistaken spam filter?

Here's what I get:


> Banned Content Warning

> The group that you are attempting to view (comp.lang.forth) has been identified as containing spam, malware or other malicious content. Content in this group is now limited to view-only mode for those with access.

> Group owners can request an appeal after they have taken steps to clean up potentially offensive content in the forum. For more information about content policies on Google Groups, please see our Help Centre article on abuse and our Terms of Service.

There's no content available for me.

Forth is pretty grim, but I wouldn't go that far ...

Is there a means to access and archive it, or is it too late?


Looks like there have been (likely automated; nearly all of them are the same Italian phrase) mechanical legal complaints, which probably caused this instance of automated blocking to go wild.

As an engineer I can understand the desire to automate everything, but please at least have some heuristics to detect this kind of easy-to-detect mechanical behavior before giving the model full authority to block anyone it doesn't like.
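As a toy sketch of the kind of heuristic meant here (file names and data are made up): normalize complaint bodies, count duplicates, and flag any single body that accounts for the majority of filings, which is a strong sign of copy-paste automation. A real system would also want fuzzy matching for near-duplicates.

```shell
# Toy data: four complaint bodies, three of them identical copy-paste.
printf '%s\n' \
  'Stalking Diffamation Illegal processing of personal data' \
  'Stalking Diffamation Illegal processing of personal data' \
  'Stalking Diffamation Illegal processing of personal data' \
  'a genuinely distinct complaint' > complaints.txt

# Flag any body filed more often than all other bodies combined.
total=$(wc -l < complaints.txt)
sort complaints.txt | uniq -c | sort -rn |
while read -r n body; do
  if [ $((n * 2)) -gt "$total" ]; then
    printf 'FLAGGED (%s/%s): %s\n' "$n" "$total" "$body"
  fi
done
```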

Okay, I did some research, and I think I figured out what caused these Usenet group bans.

A Genoese lawyer has been the victim of harassment and heavy doxxing for some time; you can find many Twitter accounts accusing him of paedophilia in cahoots with Epstein, Berlusconi, the Pope, and so on (no, I'm not kidding; the stalker clearly has serious mental health problems).

The stalker is very prolific and is wallpapering the internet with his copy-paste accusations in every corner, from newspaper comments to ancient forums to Usenet. The lawyer reports them and asks for removal where he can, but he also does not seem very worried, because apparently this has been going on for two years...

I don't think I can name the subjects in question, but in any case I'm archiving the harassment accounts before proceeding with the report; then I'll try to get in touch with the lawyer and see if he can request a new, less "coarse" takedown.

I am Italian and this is very interesting: all the requests were made with the topic "Stalking Diffamation Illegal processing of personal data", but this one (https://www.lumendatabase.org/notices/21395773) is simply fantastic. It seems that a fool with persecution mania has reported half of Usenet and the bots were auto-triggered...
