Twitter now requires an account to view tweets (techcrunch.com)
1285 points by celsoazevedo on June 30, 2023 | 1179 comments



From Elon's twitter (https://twitter.com/elonmusk/status/1674942336583757825?s=20)

"This will be unlocked shortly. Per my earlier post, drastic & immediate action was necessary due to EXTREME levels of data scraping.

Almost every company doing AI, from startups to some of the biggest corporations on Earth, was scraping vast amounts of data.

It is rather galling to have to bring large numbers of servers online on an emergency basis just to facilitate some AI startup’s outrageous valuation."


I have a bridge to sell anyone who believes this is true. AI companies have been a boon to businesses that want to lock down user data: now they have an excuse. It may be true to the extent that Musk is legitimately angry that Twitter isn't getting a piece of that AI VC money (I'm sure he is). But:

A) Twitter would probably move in this direction even if AI companies didn't exist; the scraping is an excuse. Nothing about Musk's Twitter has indicated that he cares about open data access or anonymous access to the site, and this follows a general trend of closing down the platform to non-monetizable users. Musk has abundantly shown in the past that he would prefer everyone browsing Twitter be logged into an account.

B) "it's temporary" -- how? You don't have a way to stop this other than forcing login. That situation is not going to change next week. To call this "temporary emergency measures" is so funny; there is no engineering solution for this and you're not going to be able to successfully sue companies for scraping Twitter. Put a captcha in front of it? Sure, let me know how that goes.

You going to wait and see if the AI market collapses in the next month?

If this does turn out to be temporary, it'll only be because of migrations off of Twitter and because of user criticism, because Musk is impulsive and bends easily under pressure. But nothing about the situation Musk is complaining about is going to change next week.


FWIW: I've been scraping the shit out of social media for my AI training. I also do Amazon, AliExpress, etc.

Libraries like puppeteer are so good these days that it's impossible to tell real users from fake traffic. Most of the blocks are just IP blocks.
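To illustrate, a minimal sketch of what this looks like (assuming Node with the real puppeteer package; the URL handling and user-agent string here are placeholders):

    import puppeteer from 'puppeteer';

    // Fetch a page the way a real browser would: full JS execution and
    // ordinary-looking headers. From the server's side, this traffic is
    // very hard to distinguish from a human visitor on the same IP.
    async function fetchRendered(url: string): Promise<string> {
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();
      // Placeholder UA string; real scrapers rotate these along with IPs.
      await page.setUserAgent(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
      await page.goto(url, { waitUntil: 'networkidle2' });
      const html = await page.content();
      await browser.close();
      return html;
    }

Stealth plugins go further and patch over the handful of fingerprintable quirks that headless Chrome still exposes.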


Right. And the IP blocks only add a small cost to the scraping because they force people to use residential IPs, which can't sanely be blocked.


Not to mention Elon is great mates with the VC douchebags that are busy hyping and profiting off the AI hype train.


> there is no engineering solution for this and you're not going to be able to successfully sue companies for scraping Twitter

There absolutely is, if you try instead of whining on the internet. People at Vercel have already developed new anti-bot + fingerprinting + rate limiting techniques which look quite promising. I dare say within a year, new tools will be powerful enough to do this easily.


> I dare say within a year, new tools will be powerful enough to do this easily.

I see where you're coming from, but if Twitter is in a position where it can't roll out those protections right now, given its current head counts, etc... it's not going to be in a position where it can roll out those protections next week. Probably not next month.

So it's less that no one could block companies from scraping Twitter (although anti-scraping mechanisms are probably always going to be a cat-and-mouse game, so I'm not sure that there is ever going to be a perfect easy solution). It's more that if Twitter can't do it right now, nothing is going to magically change any time soon about the situation it has found itself in. And waiting a year (even waiting 6 months) for tools to become available before rolling back this rate limiting would be incredibly self-destructive for Twitter.

The way I see it, they're basically guaranteeing that they will need to roll back these changes before they have a solution to whatever specific problem or irritation Musk is fixated on. They're not going to gain additional engineering capabilities in the next week. And how long does Musk plan to leave rate-limiting in place? A social media site where people can't look at content is just broken.


> Nothing about Musk's Twitter has indicated that he cares about open data access or anonymous access to the site

Not so, he tasked George Hotz with getting rid of that horrible popup which prevented you from scrolling down much if you weren't logged in, which was added soon before he bought Twitter. When that was removed I rejoiced. But now Twitter's gone 100x in the opposite direction.


I don't know; was that Musk's idea, or was that Hotz's idea? I vaguely think this was a change that Hotz wanted that Musk went along with.

To be fair, Musk will regularly pay lip service to the idea of open communication. I guess that's not literally nothing, but most large site policies have been in the direction of locking down content.

If there ever was a version of Musk that cared about open access, it's been a while since that version of him saw the light of day. It's very consistent with his overall behavior to believe that he views Twitter content as being primarily his property rather than a community resource, and that he thinks that scrapers/AI companies/researchers are literally stealing from him if they derive any value at all from data that Twitter hosts.


[flagged]


Let me guess. He got bored and moved on before making any breakthrough.


It's been a week now, and your prediction held. Still can't access without an account.


Pretty sure they'd get into trouble with blue-check-mark buyers' implicit promises if it's not temporary.


Elon has a good point there. Much of the current AI hotness is predicated on stealing people's content and exploiting the infrastructure that other people have built. I don’t think it’s acceptable.

The licenses, compensation models, law, technical solutions, attribution, security and privacy all need time to catch up. Regulation has a role to play, as it's a bit of a free-for-all right now.

The irony of Elon mentioning “outrageous valuations” though!


Why would an AI company start scraping twitter html, instead of using an already existing archive? Something similar to archive.org could earn money from that. If all you want is the content, there's no reason to suck it through a straw.

I'd expect those that require real time data, such as stock market bots or sentiment data providers, to scrape twitter (if they don't provide the data by other means, for example the "firehose", which is another great way to earn money).

None of this makes much sense.

Also, it's much more complicated than it seems. The web works because the data is public. You cannot think of it as "my data". (Especially not twitter, since it is really their users'!) Twitter is not higher quality data than any other web page.

If we accept that thinking, every home page would require a login to see that specific company's phone number or opening hours. Those pieces of data are also valuable, in the right circumstances! And then the web would either not exist, or the account system required would be so widespread that accounts would carry no value and the system would become useless.


> Why would an AI company start scraping twitter html, instead of using an already existing archive?

I can think of a few possible reasons. They might want more up-to-date info, or they might have no real developers and the scraper was created by a business guru who prompted ChatGPT and didn't understand the code that came out.

Given what else Musk has asserted about Twitter, and how often former or current Twitter devs have contradicted him, it may not even be what Musk said.

> Twitter is not higher quality data than any other web page

Eh, depends how much you can infer from retweets, favourites, etc.

Won't be the only such site, but it's probably better training data than blog posts are these days.

But yeah, I absolutely agree that Twitter doing this caused a lot of damage to any orgs, corporate or government, which wanted to be public, anything from restaurants announcing special offers to governments issuing hurricane warnings. Twitter isn't big enough to assume everyone has an account, like Facebook is.


If there were more value in requiring login than in having this public and easily accessible, it would be behind a login form. 99% of the time, the current internet has nothing to do with the values someone imagined in the 80s.


Value to whom? Twitter is more valuable to its users, to journalists who embed tweets in stories, and to web users at large who follow links and search results if it does not require a login to view posts.

Of course, none of those people own Twitter, and it may well be more valuable to its owners if it does require a login.


What you describe is the Facebook business model. Which seems to be a valid model, but twitter was not built around it and such a pivot would break all business moats around the company.

There was no web in the 80s so not sure what values you refer to, or how they are relevant to today's businesses.


What do you think the web's value is today?

What do you think the web's value was imagined to be in the “80s”?


Because they want an advantage over competitors who are using the archives already…


If every AI company pointed their scrapers to archive.org, that site would go down immediately as well.

This is just kicking the can down the road.

We have a major structural problem now. We want data to be free and machine readable, but no startup (nor even a giant like Twitter) can afford the server cost to withstand all those machines.


> Elon has a good point there. Much of the current AI hotness is predicated on stealing people's content and exploiting the infrastructure that other people have built. I don’t think it’s acceptable.

But then so is Twitter. They don’t produce any content whatsoever. The data they are having a fit about is not theirs, it’s been volunteered by the users. It’s the same line Reddit is pushing, and it’s bullshit. AI companies scraping the web is no more unethical than Google doing it.


Well, one thing is people going there and putting their data on a platform. That’s their choice.

Taking/scraping/stealing that data out of said platform for the benefit of your over-hyped “disruptive” startup - and implying that others should give it all to you for free - is the issue.


That’s not the point. Twitter has a non-exclusive license to distribute the content; it’s not the owner of the data regardless of the high horse Musk feels like riding today.


Please don't throw around the word "stealing" so loosely.

Scraping data from a public website is not "stealing". It might be a violation of the terms of service, but then you have the whole issue of click-through (formerly shrink-wrap) licenses and contracts of adhesion.

If someone isn't vetting you and potentially signing you to a more meaningful contract before giving you access, for free, to data, then using that data for any purpose whatsoever (except republishing it or derived works, which might, depending on the nature of the data or the derived works, be a violation of someone's copyright) is so far from "stealing" that using that word is wrong, and I suspect intentionally inflammatory.

That's why Elon limited access, rather than going to the police to file charges for theft, or suing over copyright violation or breach of contract. Not to say he absolutely couldn't do the latter, but it's hardly a clear win.


There are much bigger issues.

1. Users, or one might say content creators, don't own their data. Not only do the platform owners make a lot of money with the content (which they have a license to, as per site ToS), but you now have third parties scraping it for commercial products. Using the data to train models that are then sold back to some of the same social media users who produced the content for free in the first place wasn't a thing until very recently; it used to be a select few doing machine learning research. The laws are lagging behind the tech development, and regular internet users are being exploited because of it.

2. It absolutely is stealing in some cases, and even worse. For example when they scrape it for content which they then use to train their bots to impersonate humans. Or on Twitter, there's a very common type of bot that steals content from young attractive female social media users in China, auto-translated to English, to pose as them. If you're in finance and crypto circles they're swarming with these accounts (guess the scammers know their targets).

3. In general this is only going to get worse from here on. LLMs are getting better and better. On sites like Twitter you already have no idea if you're interacting with a human or not. But these "AI" cannot actually think for themselves; they can only emulate, they can copy other humans. At least so far. So for the sake of making progress and ensuring we can still have intelligent discussions and find novel ideas online, it's imperative to have a way to keep the machines out. Social media must become Sybil-resistant or it dies in a vicious circle of self-referencing bots ever parroting the same old talking points, or variations thereof. We urgently need human ID!


Theft requires the victim to be deprived of their property.

Nobody is deprived of their property, intellectual or otherwise, when this stuff happens, thus it is not right to call it stealing.


You may wish AI didn't exist, but it does. There's no putting the genie back in the bottle. We can still go after people who commit crimes using AI. Perhaps one day AGI will be possible and we will want to have discussions and share ideas with it just as we do now with each other.

Governments, researchers, and all kinds of third parties have already been scraping every publicly available bit of data possible. There may be an increase now, but it's nothing new. It won't be the end of society or the end of the internet any more than AI will.

Also: https://www.youtube.com/watch?v=IeTybKL1pM4


We may be using different definitions of 'intelligence'. To me, no AI currently exists, but I'm aware the companies market it as such, of course.

>have already been scraping every publicly available bit of data possible

Data scraping is limited by economics just like anything else in the world. Storage costs money; someone has to pay for it. Researchers do not have unlimited funds. Some select few governments like the US may have most of the publicly accessible web archived. Keep in mind it's dynamic and requires massive data infrastructure to pull this off; there's tons of new data coming in daily. Private startups getting in on the action in a big way is a relatively new phenomenon; this used to be limited to enterprises with a specific purpose. Now everyone and their 4chan cousin are experimenting with their own deep learning models.


Here's the thing about the T&Cs and licenses:

Bots aren't people and can't read or consent. They just consume.

Any page which can be served without first displaying T&Cs or other terms explicitly prohibiting access is not protected from scraping by those T&Cs or any other license, since each page can be considered a point of first contact: the bot selects each link from a simple aggregation of all the links it encounters, so each interaction is "new" in essence.

Now it could be argued that ignoring robots.txt is an explicit contravention of norms and standards which could be viewed as a violation of an implicit licence, but there is no law requiring adherence to robots.txt and thus no mandate that a program even look for it iiuc.


Bots aren’t people and can’t consent, sure - but they are tools that are wielded or deployed by people who absolutely can consent (setting aside whether click-wrap terms are enforceable or not). If I throw a brick through a window, it’s me in the shit, not the brick.


If I have an open door to my business and someone's automated robot walks in the door to see what's available, how is that different?

Even more applicable, this is like saying that a person walking down the street can't have a camera and take a picture of the front of the building....

Because the page you land on when entering a URL is in fact little different from a storefront, with the associated signage and access points defining how a person or automated device may interact with that business.

If you want to have it different then you have to actually put everything behind a locked door with no window, right?


This could easily be solved by making unauthenticated access hard for machines to consume: introducing delays, some kind of captcha, or even just proof of work (e.g., grinding a nonce until a hash meets a target), while the authenticated get all the snappiness they want.
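Roughly, the proof-of-work idea looks like this (a sketch only; SHA-256 with a leading-zero-bits target is one plausible scheme, not anything Twitter has announced):

    import { createHash } from 'node:crypto';

    // Server issues a random challenge; the client must find a nonce such
    // that SHA-256(challenge:nonce) starts with `bits` zero bits.
    // Verifying costs one hash; solving costs ~2^bits hashes on average.
    function meetsTarget(challenge: string, nonce: number, bits: number): boolean {
      const digest = createHash('sha256').update(`${challenge}:${nonce}`).digest();
      let zeros = 0;
      for (const byte of digest) {
        if (byte === 0) { zeros += 8; continue; }
        zeros += Math.clz32(byte) - 24; // leading zero bits within this byte
        break;
      }
      return zeros >= bits;
    }

    // Client side: grind nonces until one clears the bar.
    function solve(challenge: string, bits: number): number {
      for (let nonce = 0; ; nonce++) {
        if (meetsTarget(challenge, nonce, bits)) return nonce;
      }
    }

A target around 20 bits costs a single client on the order of a second per page, but adds up to real CPU money for anyone fetching millions of pages.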

I'm strictly anti account, so he just lost me as audience. The next walled garden after Facebook and Instagram that won't ever see me again.


It already was semi-hard to machine-read; that is the reason I use Nitter for my small-scale continuous scraping of Twitter, which is now temporarily broken. Nitter is tons easier to parse as it's not reliant on JS, etc., and simpler to create screenshots of with headless Chrome.

However, if you mean implementing some even worse obfuscation (kind of like FB putting parts of words in different divs etc), that's not really compatible with this being a temporary emergency measure. And PoW doesn't sound reasonable because it sets mobile devices against the scraper's servers. If all of this was just so easy, scraping would be dead. Good that it isn't.


> And PoW doesn't sound reasonable because it sets mobile devices against the scraper's servers.

Scraper servers and mobile devices have different access patterns though. If I'm reading tweets then I'm fine waiting 1 second for a tweet to load. Page load times for this kind of bloated stuff are super slow anyway; meanwhile my mobile could spend a second or two on some PoW. But if you want to large-scale scrape, you suddenly have to pay for 1bn CPU seconds. And this PoW could even keep continuously increasing per IP, 0.1% with every tweet. Not noticeable for the casual surfer sitting on the toilet, back-breaking for scrapers.

> If all of this was just so easy, scraping would be dead. Good that it isn't.

Small-scale scraping could still be provided through API access or just a login.

The reason they are not doing the "easy" thing is that they don't see a need (yet, perhaps). Just get an account, they'd say, and they are right. It works for Instagram too, except for some weirdos who nobody really cares about.


Of course the scraper would have to pay too. But it makes for a race between how much they are willing to pay versus how much worse the experience gets for real users. And for successful mobile apps, reducing average load even during active use is important (example: idle games that don't want to turn your phone into a drying iron; companies invest in custom engines and make all kinds of compromises to avoid this). And burst-allowing rate limiting is something I'm quite sure was already in place, especially with prejudice towards datacenter/VPN IPs. But similarly to how it is with search engine scraping, professional scrapers already have costly workarounds for these.

>The reason they are not doing the "easy" thing is that they don't see a need (yet, perhaps).

This argument just doesn't make any sense. Twitter notes that this is hurting them. Previews in chat apps and plain link clicks in non-logged-in contexts are broken. I feel like you just predict that this will turn out to be more accepted in the near future and become a more permanent decision, which you don't like.


I'm not fine waiting 1 second.

Most baffling is mobile reddit, where it takes like 6 seconds to load. Do they want us to use their crappy app, or do they just not care?


They're acting like they're desperate for you to use their crappy app.


They’re pulling every underhanded trick in the book to try and force mobile users onto the app. Yeah, I think they want you to use the app.


You can still get a login and have no delay.

For non-auth use, I'd rather wait 1 second than not have any access at all. Which is the current state of affairs.


Maybe that is already the PoW anti-scraping measures haha.


HTTP status code 429 exists for this very purpose. While I sympathise with the idea that services need to protect their content from scraping to power AIs, I can't help but feel it's a convenient excuse for these companies to re-implement archaic philosophies about online services, i.e. killing off 3rd party apps and walling their garden higher; both feel very boomer in their retreat from the openness of the internet that seemed to be en vogue prior to smartphones. Perhaps this is just the transition from engineers building services to business, legal and finance trying to force the profit.

Correct me if I'm wrong, but surely throttling scrapers (at least ones that are not nefarious in their habits) is a problem that can be mitigated server-side, so I find it somewhat galling that it's the excuse.
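For what it's worth, the server-side shape of that isn't exotic; a sketch with Express (the window and limit are made-up numbers, and a real deployment would key on more than the bare IP):

    import express from 'express';

    const WINDOW_MS = 60_000; // 1-minute window (illustrative)
    const LIMIT = 60;         // max requests per window per client
    const hits = new Map<string, { count: number; resetAt: number }>();

    const app = express();
    app.use((req, res, next) => {
      const key = req.ip ?? 'unknown';
      const now = Date.now();
      const entry = hits.get(key);
      if (!entry || now > entry.resetAt) {
        hits.set(key, { count: 1, resetAt: now + WINDOW_MS });
        return next();
      }
      if (++entry.count > LIMIT) {
        // 429 plus Retry-After tells well-behaved clients when to back off.
        res.setHeader('Retry-After', String(Math.ceil((entry.resetAt - now) / 1000)));
        return res.status(429).send('Too Many Requests');
      }
      next();
    });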


> is a problem that can be mitigated server-side

No matter what you do, this will cost server infra. That's Musk's argument for disabling access altogether.

Therefore it would make sense to have a solution which burdens the client disproportionately in relation to the server. A burden so low for the casual user that it's negligible but in aggregate, at scale, would break things. Which is what he wants.


> Which is what he wants.

Looks to me like both Reddit and Twitter are using the wedge to raise the walls of their gardens and kill 3rd-party development, as opposed to genuinely trying to license bulk users appropriately.

You're gonna need to license api keys so you're already identifying consumers and there's your infra which you need anyway. At which point you can throttle anyone obviously abusing whatever free/open-source tier offering you give out as standard.


Unless the captcha is annoying to a significant degree, I doubt that it would work. With all the money in the bucket, scrapers can just hire a captcha farm to get past the captcha with help from real humans.

Also a side note: distributed web crawlers are not unheard of these days, nor are residential IP proxies. Meaning the effectiveness of the proof-of-work model may also be limited.


How do residential proxies help? Scraping would effectively be bitcoin mining, which costs resources with no shortcut.


Many online services (including Twitter) do employ some kind of IP address scoring system as part of their anti-scraping effort.

These systems tend to treat residential proxies as normal users and put fewer restrictions on them. On the other hand, if the IP address belongs to some (untrusted) IDC, then the system will enable more annoying restrictions (say, rate limits) against it, making scraping less efficient.
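The shape of it is something like this (purely illustrative; the classes and limits are invented, and real systems buy commercial IP-reputation feeds):

    // Hypothetical per-IP budget based on where the address comes from.
    type IpClass = 'residential' | 'datacenter' | 'unknown';

    function hourlyRequestBudget(ipClass: IpClass): number {
      switch (ipClass) {
        case 'residential': return 600; // treated like a normal user
        case 'unknown':     return 120; // cautious middle ground
        case 'datacenter':  return 20;  // heavy throttling, captchas, etc.
      }
    }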


Sounds like outright banning is a stopgap measure, maybe they will implement one of these solutions


The other option would be to front caches through ISPs and the like.

This works far better when the items requested are small in number but large in volume (that is: a large number of requests against a small set of origin resources). When dealing with widespread and deep scraping, other strategies might be necessary, but these aren't impossible to envision.

Specifically permitted scraping interfaces or APIs for large-volume data access would be another option.

Of course, there's the associated issue that data aggregation itself conveys insights and power, and there might be concerns amongst those who think they're providing incidental and low-volume access to records discovering that there's a wholesale trade occurring in the background (whether that's remunerated or free of charge).


Elon is making a point, and a reminder for everyone, that what you share on social nets like Twitter is basically not owned by you, but by the service.

Actually I’m surprised this took so long to do, and in the light of doing so shows that perhaps Twitter was sold for its existing content rather than existing or active user base.


> [...] perhaps Twitter was sold for its existing content rather than existing or active user base.

Not a very significant distinction: if active users stopped posting, scrapers wouldn't have much of a reason to keep scraping.


Most AI startups don’t particularly care if content is 3 minutes old or 3 years.


If you're adding new/marginal data you should want it to be as current as possible so it'll have things like trending slang terms.


They absolutely do. What are you even saying?


AI startups' training data covers content going back years. DALL-E, for example, was trained on paintings hundreds of years old alongside more modern works.

Age may be included as part of the training but they generally want to suck up as much data as possible.


Yet.


> basically not owned by you, but by the service.

It's shared ownership. You own it, but give Twitter non-exclusive permission to also use it.

This is why news agencies request permission from a twitter account before sharing a picture they took.


Twitter can delete the post without your consent, so you don't really own it.


You consented to their being able to delete it when you agreed to their terms of service. It’s like if you hire someone to clean your home. Mostly they’re tidying up and dealing with dirt and dust, but if they see what looks like a used napkin lying somewhere, they will probably throw it out without first asking if you still want it - without that being stealing and without ever owning it themselves.

It may seem weird to compare useful content to a used napkin, but hey, successful business founder stereotypes do quite often involve having an idea written on a napkin…


I didn't consent to anything, I don't have a Twitter account. I'm talking about people who do. And they often mistakenly think their content will stay on Twitter forever, so they don't need to back it up.


Fair enough. By “you consented … when you agreed”, I really meant “one consents … when one agrees”, as is common in informal English.

Yes, it’s a mistake to rely on social media content remaining up forever, agreed. That’s separate from ownership. Backups are important even for data on a hard drive you physically own, since hard drives can fail or be damaged or lost.


Sorry, I don't think I've ever seen “you” used as a generic pronoun in a past-tense sentence, which is why I took it personally.


Also fair - it’s possible that my choice of tense made that the only literal reading, but my point was intended as general and not accusatory.


You can't park there mate.


That's not past tense.


Ownership is not the same as the right to display.

I can own a picture, but I can't place it on the NY Times website.


It's not though. They have a very permissive license but they don't have any actual IP ownership.


Was already the case for ICQ. (yes, it was in their ToS)


How do you define stealing? Is the AI data obtained from accessing private data? Data that users did not make publicly available but kept to themselves on their own devices?


I can't really agree. We've already had rulings about data scraping and I don't see the difference here. Just that a lot of people do it now?

Also, Twitter is a public platform. Twitter didn't generate comments, and people posting on a public account are indirectly subject to public viewing. Not much different from being indirectly recorded in a public park


> stealing people's content

The Twitters and Reddits need to be careful here when complaining because without users generating free content, they also have no business.


We know that quality data is king, with all due respect to people tweeting, that data is most likely garbage.


If I visit Twitter to work out how to sort some JavaScript issue, and that makes my company $X, am I stealing content, or am I just using the platform?

There's one major player making money off of other people's content here, and that's Twitter. Why are they ok doing that, but not anyone else?


Scraping has been a thing since the web started. Happens to every public site.

I recall that email at one point was 90% spam.


But is Twitter a good source for AIs?


Used responsibly, of course it is. A developer is able to ingest current language used in exchanges about current topics, as well as cite prominent sources that are still using the platform.


> exploiting the infrastructure that other people have built

Including you and me. WE built part of the infrastructure.


It's not like Twitter is compensating tweet authors either. For using art, the debate is still open in my opinion (even if I'm personally not in favor of it), but I don't see how platforms built on user-made content (an even more clear-cut case than AI) can have a say on this.


People are happy to put their content on social networks. Maybe they get some value in return such as sales, exposure, signalling or simple enjoyment.

Many people who aren’t that privacy conscious would however object to lots of companies, big and small, sucking their content into their databases for their own uses, then republishing after it’s passed through a few AI models.


>People are happy to put their content on social networks.

Do they have a choice? A handful of corporations have captured all the network effects. If you need to reach an audience to do your job or find your "friends", what other choice do you have but to give your data to them?


I fail to understand this argument. If your friends are close enough do you need a big corporate network to share content/thoughts?

If they aren't close why do you care?

Alternatively, what would make more sense is to participate in communities gated by these networks but then it's your choice to be there.


>If your friends are close enough do you need a big corporate network to share content/thoughts? If they aren't close why do you care?

I don't personally have this problem, but my observation is that most social relationships are somewhere in between closest friends and don't care.

My own concern is more about participating in professional, neighbourhood, civil society or political communities. Choosing not to be where they have decided to congregate means not being able to do my job and not making my voice heard where many decisions affecting me are taken.


Yes. AI is becoming the content launderer. I mean, what's the difference? You could ask an AI to make not-Star-Wars. And what's the difference between that and all the not-Star-Wars movies made in the 80s? Is it that it was automated this time around?

I think this points out that AIs clearly do not work like human brains. Human brains do not need all of the content of humanity to produce a replica of ArtStation mediocrity.


It's not like there are many alternatives; network effects are very powerful, and even with Musk running the company into the ground, not many people are really quitting, which says a lot about how strong that effect can be.


They have announced a plan to compensate creators based on ads shown and also have implemented a subscribers feature (people paying users for special access to some tweets)


I had no idea this existed, thanks for the insights.


Well they killed the API, what did they think would happen? It's easier to control access and rate limit with a proper API.


The actual problem seems to be that a large number of entities now want a full copy of the entire site.

But why not just... provide it? Charge however much for a box of hard drives containing every publicly-available tweet, mailed to the address of buyer's choosing. Then the startups get their stupid tweets and you don't have any load problems on your servers.


What do you even charge for that? We might never make a repository of human-made content with no AI postings in it ever again. Seems like selling the golden goose to me.


It's already public information. The point isn't to extract rents, it's to remove the incentive for server-melting mass scraping.


Substantially higher loads than Twitter gets today were not "melting the servers" until Musk summarily fired most of the engineers, stopped paying data center (etc.) bills, and then started demanding miscellaneous code changes on tight deadlines with few if any people left who understood the consequences or how to debug resulting problems.

In other words, the root problem is incompetent management, not any technical issue.

Don't worry though, the legal system is still coming for Musk, and he will be forced to cough up the additional billions (?) he has unlawfully cheated out of a wide assortment of counterparties in violation of his various contracts. And as employee attrition continues, whatever technical problems Twitter has today will only get worse, with or without "scraping".


Scraping has a different load pattern than ordinary use because of caching. Frequently accessed data gets served out of caches and CDNs. Infrequently accessed data results in cache misses that generate (expensive) database queries. Most data is infrequently accessed but scraping accesses everything, so it's disproportionately resource intensive. Then the infrequently accessed data displaces frequently accessed data in the cache, making it even worse.
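A toy simulation makes the effect vivid (the numbers are illustrative only):

    // An LRU cache: a JS Set iterates in insertion order, so deleting and
    // re-adding a key marks it most-recently-used.
    class LRUCache {
      private keys = new Set<number>();
      constructor(private capacity: number) {}
      access(key: number): boolean {
        const hit = this.keys.delete(key);
        this.keys.add(key);
        if (this.keys.size > this.capacity) {
          this.keys.delete(this.keys.values().next().value!); // evict oldest
        }
        return hit;
      }
    }

    const ITEMS = 100_000;
    const cache = new LRUCache(5_000); // cache holds 5% of the corpus
    // Organic traffic: heavily skewed toward a few popular items.
    const organic = () => Math.floor(ITEMS * Math.random() ** 6);

    let hits = 0;
    for (let i = 0; i < 50_000; i++) hits += Number(cache.access(organic()));
    console.log('organic-only hit rate:', hits / 50_000); // high

    let mixed = 0, scan = 0;
    for (let i = 0; i < 50_000; i++) {
      mixed += Number(cache.access(organic()));
      // A scraper walking the whole corpus keeps evicting the popular items.
      for (let j = 0; j < 5; j++) cache.access(scan++ % ITEMS);
    }
    console.log('hit rate with a scraper running:', mixed / 50_000); // collapses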


In theory, wouldn't continuous scraping by AI farms et al. put a lot of this infrequent data into cache though?


Caches are only so large. Expanding them doesn't buy you much, and increases costs greatly.

The key benefit to a cache is that a small set of content accounts for a large set of traffic. This can be staggeringly effective with even a very limited amount of caching.

Your options are:

1. Maintain the same cache size. This means your origin servers get far more requests, and that you perform far more cache evictions. Both run "hotter" and are less efficient.

2. Increase the cache size. Problem here is that you're moving a lot of low-yield data to the cache. On average it's ... only requested once, so you're paying for far more storage, you're not reducing traffic by much (everything still has to be served from origin), and your costs just went up a lot.

3. Throttle traffic. The sensible place to do this IMO would be for traffic from the caching layer to the origin servers, and preferably for requesting clients which are making an abnormally large set of non-cached object requests. Serve the legitimate traffic reasonably quickly, but trickle out cold results to high-demand clients slowly. I don't know to what extent caching systems already incorporate this, though I suspect at least some of this is implemented.

4. Provide an alternate archival interface. This is its own separately maintained and networked store, might have regulated or metered access (perhaps through an API), might also serve out specific content on a schedule (e.g., X blocks or Y timespan of data are available at specific times, perhaps over multipath protocols), to help manage caching. Alternatively, partner with a specific datacentre provider to serve the data within given facilities, reducing backbone-transit costs and limitations.

5. Drop-ship data on request. The "stationwagon full of data tapes" solution.

6. Provide access to representative samples of data. LLM AI apparently likes to eat everything it can get its hands on, but for many purposes, selectively-sampled data may be sufficient for statistical analysis, trendspotting, and even much security analysis. Random sampling is, through another lens, an unbiased method for discarding data to avoid information overload.


Twitter feels more stable today, with less spam, than one year ago. There's of course parts that have been deliberately shut down, but that's not an argument about the core product.


Pandemic lock downs are 99% over. People are getting back outside and returning to office. These effects have little to do with Twitter's specific actions.


I see more spam these days, particularly coming from accounts that paid for the blue check mark. IIRC, Musk said that paid verification would make things better since scammers wouldn't dare pay for it (I would find where he said this but I hit the 600 tweet limit), but given how lax their verification standards are, it seems to be a boon to scammers, much the same way that Let's Encrypt let anyone get a free TLS cert at the cost of destroying the perceived legitimacy that came with having HTTPS in front of your domain.

(And IMO, that perceived legitimacy was unfounded for both HTTPS and the blue check before both were easy to get, it's just that the bar had to drop to the floor for most people to realize how little it meant.)


The "massive layoffs" was just twitter returning to the same staffing level they had in 2019, after they massively overhired in 2020-2021. This information is public, but this hasn't stopped people from building a fable around doomsday prophecies.


I mean, it's clear that Musk overcorrected. The fact that managers were asked to name their best employees only to then be fired and replaced by them, that Musk purposely avoided legal obligations to pay out severance/health insurance payments (I forget the exact name)/other severance, and that the site has had multiple technical issues that make it feel like there's no review/QA process, all show that he doesn't know what he's doing.

He got laughed out of a Twitter call thing with lead engineers in the industry for saying he wanted to “rewrite the entire stack” and not having a definition for what he meant.

Doomed or not, Musk is terrible at just about everything he does and Twitter is no exception


I think this action from Twitter showed that it isn't public information. It is pretty much twitter's to do whatever they want with it.


I think that’s always been known, but the tacit agreement between users and Twitter has always been “I’ll post my content and anyone can see it, if they want to engage they make an account”. From a business perspective this feels like a big negative to me for Twitter. I’ve followed several links the last few days and been prompted to login, and nothing about those links felt valuable enough to do so.


Just because it is published doesn't mean authors don't retain rights to it. None of that content is public domain.


It's about $1 per thousand tweets and access to 0.3% of the total volume. I think the subscription is 50M "new" tweets each month? There are other providers who continually scrape Twitter and sell their back catalogue.

https://www.wired.com/story/twitter-data-api-prices-out-near...

Researchers are complaining that it's far too high for academic grants. Probably true, but that's no different from other obscenely priced subscriptions like access to satellite imagery (can easily be $1k for a single image which you have no right to distribute). I'm less convinced that it's impossible for them to do research with 50 million tweets a month, or with what data there is available. Most researchers can't afford any of the AI SAAS company subscriptions anyway. Data labelling platforms - without the workers - can cost 10-20k a year. I spoke to one company that wouldn't get out of bed for a contract less than 100k. Most offer a free tier a la Matlab in the hope that students will spin out companies and then sign up. I don't have an opinion on what archival tweets should cost, but I do think it's an opportunity to explore more efficient analyses.


> We might never make a repository of human-made content with no AI postings in it ever again.

Wow, never thought of it that way before. Kinda hit me hard for some reason.


Honestly I think that's why Reddit is closing itself up too. Everyone sitting on a website like this might be sitting on an AI training goldmine that can never be replicated.


Too little, too late. Anything pre-ChatGPT has already been scraped, packaged and mirrored around the Internet; anything post-ChatGPT-launch is increasingly mixed up with LLM-generated output. And it's not that the most recent data has any extra value. You don't need the most recent knowledge to train LLMs. They're not good at reproducing facts anyway. Training up their "cognitive abilities" doesn't need fresh data, it just needs human-generated data.


Precisely, which brings us back around to the question: why are social media companies really doing this?

I think "AI is takin' ooor contents!" is a convenient excuse to tighten the screws further. Having a Boogeyman in the form of technology that's already under worried discussion by press and politicians is a great way to convince users how super-super-serious the problem must be, and to blow a dog whistle at other companies to indicate they should so the same.

It's no coincidence that the first two companies to do this so actively and recently are both overvalued, not profitable, and don't actually directly produce any of the content on their platforms.


One that's slowly ageing away though.


Synthetic data fed into training isn't necessarily a bad thing. It can produce great results in many cases.


I've seen that work with self-driving cars. Simulated driving data is actually better since you can introduce black swan events that might not happen often in the real world.


It doesn't matter all that much. Smaller but better data is better for training than a large, but garbage dataset.


Do you think twitter has no AI postings?


Are you really sure it's legal? In theory it's no different from providing the same information via an API or the website... but do people working in law think so?


Twitter purchased Gnip years ago, and it's a reseller of social media data. Companies that want all the public tweets, nicely formatted and with proper licensing, can just buy the data from Twitter directly.


I'm assuming their terms give them permission to redistribute everybody's tweets, since that's kind of the whole site. I don't know why they'd restrict themselves to doing it over the internet and not the mail, but do you have any reason to think that to be the case?


We're talking about Elon Musk. I'd be surprised if he gave a shit.


So, I'd just made that suggestion myself a few moments ago.

That said, there are concerns with data aggregation, as patterns and trends become visible which aren't clear in small-sample or live-stream (that is, available in near-time to its creation) data. And the creators of corpora such as Twitter, Facebook, YouTube, TikTok, etc., might well have reason to be concerned.

This isn't idle or uninformed. I've done data analysis in the past on what were for the time considered to be large datasets. I've been analyzing HN front-page activity for the past month or so, which is interesting. I've found it somewhat concerning when looking at individual user data, though, here being the submitter of front-page items. It's possible to look at patterns over time (who does and does not make submissions on specific days of the week?) or across sites (what accounts heavily contribute to specific website submissions?). In the latter case, I'd been told by someone (in the context of discussing my project) of an alt identity they have on HN, and could see that the alternate was also strongly represented among submitters of a specific site.

Yes, the information is public. Yes, anyone with a couple of days to burn downloading the front-page archive could do similar analysis. And yes, there's far more intrusive data analytics being done as we speak at vastly greater scale, precision, and insights. That doesn't make me any more comfortable taking a deep dive into that space.

It's one thing to be in public amongst throngs or a crowd, with incidental encounters leaving little trace. It's another to be followed, tracked, and recorded in minute detail, and more, for that to occur for large populations. Not a hypothetical, mind, but present-day reality.

The fact that incidental conversations and sharings of experiences are now centralised, recorded, analyzed, identified, and shared amongst myriad groups with a wide range of interests is a growing concern. The notion of "publishing" used to involve a very deliberate process of crafting and memoising a message, then distributing it through specific channels. Today, we publish our lives through incidental data smog, utterly without our awareness or involvement for the most part. And often in jurisdictions and societies with few or no protections, or regard for human and civil rights, let alone a strong personal privacy tradition.

As I've said many times in many variants of this discussion, scale matters, and present scale is utterly unprecedented.


This is a legitimate concern, but whether the people doing the analysis get the data via scraping vs. a box of hard drives is pretty irrelevant to it. To actually solve it you would need the data to not be public.

One of the things you could do is reduce the granularity. So instead of showing that someone posted at 1:23:45 PM on Saturday, July 1, 2023, you show that they posted the week of June 25, 2023. Then you're not going to be doing much time of day or day of week analysis because you don't have that anymore.
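Something like this, say (a sketch that buckets timestamps by UTC week):

    // Coarsen a timestamp to the start of its UTC week, so day-of-week and
    // time-of-day analysis is no longer possible on the published data.
    function toWeekBucket(d: Date): string {
      const weekStart = new Date(Date.UTC(
        d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() - d.getUTCDay()));
      return weekStart.toISOString().slice(0, 10);
    }

    // toWeekBucket(new Date('2023-07-01T13:23:45Z')) === '2023-06-25'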


Yes, once the data are out there ... it's difficult to do much.

Though I've thought for quite some time that making the trade and transaction of such data illegal might help a lot.

Otherwise ... what I see many people falling into the trap of is thinking of their discussions amongst friends online as equivalent, say, to a discussion in a public space such as a park or cafe --- possibly overheard by bystanders, but not broadcast to the world.

In fact there is both a recording and distribution modality attached to online discussions that's utterly different to such spoken conversations, and those also give rise to the capability to aggregate and correlate information from many sources.

Socially, legally, psychologically, legislatively, and even technically, we're ill-equipped to deal with this.

Fuzzing and randomising data can help, but has been shown to be stubbornly prone to de-fuzzing and de-randomising, especially where it can be correlated to other signals, either unfuzzed or differently-fuzzed.


I despise Musk as much as anyone else and charging for API access has hurt a lot of valuable use cases like improving accessibility but … how about not massive scraping a site that doesn’t want you to?


Scraping isn’t illegal, and to be honest, I’m not even sure it’s unethical. I’m assuming you think it so — if so, why? I’m not disagreeing, but haven’t given it much thought.


It's ethical when your average Joe does it on a small scale to scrape their favorite YouTuber or to buy something when it becomes available.

When you have a financial incentive to build your business on someone's data and you scrape literally millions if not billions of pages - it's unethical.


The thing with social media platforms is that this data is user-generated, so you've got the company "owning" user content.

This data is often of great public value. I track conversations around a social issue as part of my work for a non-profit.

I'd counter it's unethical to prevent people from accessing this data.


I’m not disagreeing with your comment but

> great public value

Having been on Twitter mostly through the most recent prominent war, man, the signal-to-noise ratio is really low even when being careful about who to follow and who to block. There is so much disinformation, bad takes, uninformed opinions presented as facts, pure evil, etc.

So I guess it could be used for training very specific things or cataloging the underbelly of humanity but for general human knowledge it’s a frigging cesspool.


OK, not gonna argue with that. There is, I guess, a perception that it matters because policy-makers, and the wonks and hacks that influence them are hooked. The value for me (and ergo the public, some classic NGO thinking there for you) lies in understanding those dynamics.

I do not use the Twitters myself, and actively discourage others from doing so. Sends people bonkers.


I mean, we have found election manipulations, like large-scale inauthentic activity by out-of-staters explicitly targeting African Americans, in projects here that went even to the extent of the perpetrators getting indicted. Other projects were tracking vaccine side-effect self-reports faster than the CDC, and other disaster intelligence.

We were actually gearing up to switch to paid accounts as we found use cases that could subsidize these efforts... And then the starting price for reasonably small volumes shot up to like $500k/yr.


So, are we saying it's unethical for Google and other search engines who make money off of ad revenue to scrape sites like Twitter? Or are they paying a large sum to Twitter to do this?


If Google doesn't provide a way to say "please don't scrape my site", then it's 100% unethical.

We have robots.txt. If Google doesn't respect that, it's unethical. Don't you think so?


Does twitter's robots.txt forbid scraping? Judging by the fact it shows up in Google I'd assume not.


Maybe it's time for an llm.txt

Not that the people you want to respect that would


The tricky part is that it's much harder to prove that they didn't respect it.


When there is a value exchange between the two entities that are relatively similar then I think it is ethical. People trade Google making money on ads for their site being found when people search. It is also possible to opt-out.


They benefit mutually from their symbiosis. Financially, AI bro model #1321 doesn't bring anyone value except its owners.


If done against the wishes of the owner of the site, yes, I would consider that unethical. Thankfully, Google respects robots.txt and noindex.


But is it ethical for the site owner to block access for random people and companies on the internet to _my_ data? I posted that tweet with the expectation that it's gonna be publicly available. Now the owner of the site is breaking that expectation. I would say that this part is also unethical.

Especially since they're not moderating things or anything.


> I would say that this part is also unethical.

Agreed. However, it's probably covered by their terms of service.

Same thing with the recent reddit kerfuffle. I'd have much preferred a Usenet 2.0 instead of centralizing global communications in the hands of a handful of private companies with associated user-hostile incentive structures.


Being indexed by Google is optional. Twitter could stop it at any time if they thought it was a bad deal for them. That's not comparable to a startup company trying to scrape the entire site to train their AI, using sophisticated techniques to bypass protections Twitter has put in place.


Translation: it’s ethical when I do it.


You wouldn’t download a car, would you?


Except with modern software, some wannabe genius programmer will think they can get a bunch of money or cred or whatever by infantilizing the process down to something your grandma could use. Then, suddenly, everyone is scraping. The net effect is largely the same -- server operators see an overwhelming proportion of requests from bots. Still ethical?


> I’m not even sure it’s unethical.

If it doesn't respect robots.txt, it is unethical.


Is it ethical for the "public square" to have a robots.txt?

Musk is trying to have his cake and eat it...

(Clearly it's not a public square, but his position is incoherent).


Yes, it is ethical. In many countries it is legal for humans to walk around the public square and overhear all conversations.

It is NOT legal to install cameras that record everyone's conversations, much less sell the laundered results.

Pre-2023 people went on Twitter with the expectation that their output would be read by humans.

A traditional search engine is different: It redirects to the original. A bastardized search engine that shows snippets is more questionable, but still miles away from the AI steal.


Many countries have freedom of panorama, which means it is legal to video record the public square. I'm not aware if anywhere has specific laws on mounting the camera on a robot.


>Pre-2023 people went on Twitter with the expectation that their output would be read by humans

Expectations =/= reality. And the reality is that bots have been reading comments for over a decade.


a) It looks to be permitted according to Twitter's robots.txt

b) Given Twitter is public, user-generated content which they don't own but simply have a license to, I wouldn't call it unethical in the slightest.


If the background of the issue is as Musk described, then it certainly is not allowed by twitter’s robots.txt, which allows a maximum of one request per second.
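(For reference, that kind of policy is expressed with a Crawl-delay directive, roughly along these lines; reconstructed from the description above, not quoted from Twitter's actual file:

    User-agent: *
    Crawl-delay: 1

Crawl-delay is a de-facto convention rather than a standard, and not every bot honors it, which is rather the point.)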

I do a lot of data scraping, so I'm sympathetic to the people who want to do it, but violating the robots.txt (or other published policies) is absolutely unethical, regardless of the license of the content the service is hosting. Another way of describing an unauthorised use case taking a service offline is a denial-of-service attack, which (again, if Musk's description of the problem is accurate) seems to be the issue Twitter was facing, with a choice between restricting services or scaling forever to meet the scrapers' requirements.

Personally I would have probably tried to start with a captcha, but all this dogpiling just looks like low effort Musk hate. The prevailing sentiment on HN has become so passionately anti-Musk that it’s hard to view any criticism of him or Twitter here with any credibility.


"You wouldn't download a car."

The only reason these websites and platforms aggregate any content at all is because they're effectively giant public squares.


No means no ! :)


Moreover, isn't making scraping impossible illegal per a couple-of-years-old bill?


This isn't going to make them stop either. Musk is about to see a spike in account creations using the method of lowest resistance. I expect "sign in with apple" will disappear as an option soon, given its requirement of supporting "hide my email" that makes it trivial to create multiple twitter profiles from one apple ID.


This works in his favor though, more accounts means higher ad rev and MAU for valuation.


Higher ad rev only until the advertisers realise your users don’t buy anything and ads are wasted and your ARPU drops through the floor.


It’s funny how being a locust is cool when you’re doing it, and a problem when others do it in a way which affects you.


You might want to think twice about taking him at face value then. He says it’s about scraping but who knows anymore


And yet people do. Predicting how various people (scammers, bots, scrapers and whatnot) will react is, like, the job of management in a company like this.


> I despise Musk as much as anyone else

It's interesting how you assume that most people despise Musk.


Oh shit you solved the problem


> I despise Musk as much as anyone else

I don't despise Musk at all. Don't agree with him on everything, but he is a genuine and interesting person.


He holds views that were the progressive norm 15 years ago but are now considered bigoted, which is treated as unacceptable today. There's a lot I don't agree with him on, like Ukraine, but "despise" is a word I reserve for the likes of Putin.


I don't think Putin is the epitome of evil that the West portrays him to be either. War is hell and he surely started the larger-scale war, but just remember that you've probably been introduced to less than 1% of his side of things as a Western citizen. The Western world has gone to war many, many times in history for lesser reasons.


I'm from a country that's arguably part of the West today (Romania).

Your nonsense straight out of a Soviet propaganda book doesn't work on me.

Go ask people from Eastern Europe how they felt about 45/50+ years of Soviet imposed governments and regimes.

The West has done awful things but in what way do they excuse the attempted ethnic cleansing of Ukraine?


What do you know about Putin’s motives? What propaganda do you think you’re under?

You’re probably smart enough to understand that out of spite and regret of your country’s history with the Russians your countrymen have more motivation than many others to judge the Russian efforts without any further investigation into the matter.

The same applies for myself, since I’m Finnish. It’s almost sad to see how people abandon all reason and critical thinking skills because of some ingrained belief that “Russia bad”. All of my knowledge of the human nature leads me to believe that they’re no more bad than the next people, and that they probably have some motives to go to a taxing war that we don’t really understand here in the west - seeing as the first casualty in war is the truth.


>Yeah, he started one of the deadliest wars of the 21st century, threatens to destroy the entire planet with nuclear weapons, but he is not that evil because there were other wars started by the West

That's an extremely dumb take


I’ll rephrase your argument for you: “Why don’t you listen to the rapist’s opinion? The victim is surely not blameless. Besides, your cousin is a shoplifter”.


Is this really the best response you could put forward in to people trying to make a nuanced point?


Nuanced point? That person just said 'sure, putin is committing genocide and destroying an entire country, but we haven't heard his side of the story'

Then they followed it up with the old hacker news chestnut of 'whatabout the west'


Would his side of the story matter to you? I don’t think it’s a particularly nuanced point to you since you’ve already made up your mind, however ignorant it might be.


Putin already gave his side of the story. He declared Ukraine an invalid country, said there were nazis there and then went into full out war to destroy the country while committing countless atrocities.

> I don’t think it’s a particularly nuanced point to you

What point and why do you keep saying 'nuance' over and over while giving zero actual information? What are you trying to say and what evidence is there?

Let's think about this super hard. What is the justification for an unprovoked genocidal war? Why are you defending putin?

> however ignorant it might be.

Show me where you get your information, lets see the source of this nonsense.


It wasn't "nuanced" at all. Just muddled and obsequious.


1. As already mentioned, it's hardly a nuanced point.

2. If you actually want to hear my opinion, then the realm of geopolitics + good old-fashioned hatred of the US government does a number on people's logic, so we get what I can only charitably describe as a parade of non-sequiturs, whataboutisms and other fallacies. And so it can be useful to frame it in simpler terms: for example, you could hardly find anyone even on this site who would condone the forced takeover of parts of people's homes. Literally the same is happening at the scale of countries.


Totally agreed. Make it about individuals where the analogy holds.

It cuts right through the bullshit and frequently exposes a lot of hypocrisy, hate, self loathing, racism, etc.


Does Twitter have that same approach with user data?


You think companies massively scraping right now would respect API rate limits?


API rate limits are more easily enforceable. If they keep scraping there are methods to detect and thwart that behaviour. I don't think Twitter has the appropriate talent and work environment to allow proper solutions to be implemented. It's all knee-jerk reaction to whatever Elon decides.
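
To illustrate what "more easily enforceable" means: a minimal token-bucket sketch of the kind of per-key limiting an API gateway can apply. This is illustrative Python, not anything Twitter actually runs:

    import time

    class TokenBucket:
        def __init__(self, rate: float, capacity: float):
            self.rate, self.capacity = rate, capacity   # tokens/sec, burst size
            self.tokens, self.last = capacity, time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # over quota: return HTTP 429 for this API key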


It's more easily enforced, except that when you don't give them enough quota they just go back to scraping. Or they create a million fake developer accounts and pool the free quota, if that's possible. These are not hypotheticals; loads of companies have done both against all kinds of APIs over the years, Twitter included.


"control access and rate limit"

Isn't that basically what they did with the API changes?


But they were too stingy with the tiers and too greedy with their prices. Even for minor use cases where you need to make, say, 100 API calls a day, you’ll need to pay $100/month.

Which just leads people to scrape.


Why is it Twitter that is “stingy” and not the people scraping so they don’t have to pay?


I'm not going to pay $100 just to fetch 3000 records for a hobby project. I'll either skip the project, or I'll just abuse my scraping tool.

If they'd made some more reasonable pricing tiers, I would have been happy to pay.

Fetching something as simple as the total follower count from an API shouldn't be exorbitantly more expensive than fetching data from, say, GPT-4. No reasonable person can make an argument for $0.10/call pricing.


Did you actually read that comment? I think the point is very clear -- given a reasonable price, many people would want to use the API instead of scraping the data themselves. If you instead ask for an exorbitant amount of money, it only forces people to scrape, because there is no business model that would make it possible to pay.


Isn't the firehose API still available?


Indeed, spot on.


Sorry, I don’t buy it. Hundreds of millions of people use Twitter, and we are to understand that there is scraping to the extent that they had to suddenly take drastic action by shuttering unauthenticated access? Any dev would have told him that those supposedly scraping could simply set up Selenium or some other headless browser to log in before scraping.

This smells of another failed Musk experiment at twiddling with the knobs to increase engagement, to me.
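
For the record, "set up Selenium and log in" really is about this much code. A rough Python sketch, where the login URL and the field selectors are assumptions, not Twitter's actual markup:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://twitter.com/i/flow/login")  # assumed login URL
    driver.find_element(By.NAME, "text").send_keys("scraper_account")      # hypothetical selector
    driver.find_element(By.NAME, "password").send_keys("correct-horse")    # hypothetical selector
    # Once the session cookie exists, scraping proceeds exactly as before:
    driver.get("https://twitter.com/someuser/status/123")
    html = driver.page_source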


Not only that but unauthenticated access is the easiest thing to cache. There is no need to "bring large numbers of servers online". He's lying.


A bot scraping content will tend to go deep into the archives and hit all content systematically. Caching isn't as effective when everything gets hit, whereas real users tend to hit the same content over and over again.

It can add nontrivial load.
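
You can see the effect with a toy LRU simulation: Zipf-distributed "real user" traffic caches well, while a systematic crawl of the same corpus barely caches at all. A sketch with made-up sizes:

    import numpy as np
    from collections import OrderedDict

    def hit_rate(requests, cache_size=10_000):
        cache, hits = OrderedDict(), 0
        for key in requests:
            if key in cache:
                hits += 1
                cache.move_to_end(key)          # refresh recency
            else:
                cache[key] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)   # evict least recently used
        return hits / len(requests)

    corpus = 1_000_000                                   # hypothetical tweet count
    users = np.random.zipf(1.2, 200_000) % corpus        # popularity-skewed traffic
    crawler = np.random.permutation(corpus)[:200_000]    # systematic archive sweep
    print(hit_rate(users), hit_rate(crawler))            # users cache well, crawler ~0%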


They could, but signing in, even in Selenium, means agreeing to Twitter's TOS. See the LinkedIn scraping case.


The same way these AI code completion tools respected GPL-licensed code?


You don't generally need to accept licenses in order to scrape something, only if you want to distribute it.

The legal ambiguity comes from the question of whether LLM outputs are a derivative work of the training data. I expect that they aren't, but anything can happen.


> Hundreds of millions of people use Twitter, and we are to understand that there is scraping to the extent that they had to suddenly take drastic action by shuttering unauthenticated access

Suppose 1 million people are accessing Twitter at any given time. An actual person might only be making 1 request / second. That's 1 million requests / second.

Suppose there are 100 AI companies scraping Twitter. A bot like this can make thousands to tens of thousands of requests per second; at 10,000 requests/second each, that's an additional million requests / second.

There are probably more than 100 "AI" companies now, trying to train their own bespoke LLMs. They're popping up like weeds so I can totally see Twitter's load doubling or tripling recently. So sorry, I just don't get the skepticism. Sure it could be a cover for something else, but his actual stated reason seems totally possible.


> A bot like this can make thousands to tens of thousands of requests per second.

You don't need to use a bot to do this, Twitter literally did this to themselves through their own buggy code https://sfba.social/@sysop408/110639474671754723

If Silicon Valley was still being produced, this would make for a great episode.


Yeah no, you can't just 'use selenium'. To keep the same scraping volume you might need thousands of accounts and 10x the compute.


It’s not a little “use selenium” switch you can click, but it absolutely is an option (and there are others) if the barrier is simply to have an authenticated account and be logged in.

If these data scraping operations are as sophisticated and determined as he claims this measure is insufficient and actually it really hurts Twitter far more than it helps. Case in point: we stopped sharing Twitter links because when you click them in most iOS apps it opens up an unauthenticated web view and presents you with a login screen. So we just collectively decided “ah ok no sharing Twitter” and moved on.

I’m sure there are companies scraping Twitter. I just don’t buy that it’s as big of an issue as he claims it is, and that preventing people from viewing tweets without logging in is a way to mitigate against it (I’d look at banning problematic IP addresses first, personally).

To me it’s either:

1) a very poor and very temporary mitigation against scraping, that could be bypassed with a bit of effort

2) an experiment in optimising metrics - Musk sees lots of unauthenticated users consuming Twitter, tries to steer them into signing up

3) it’s all just a big mistake

Option #2 makes the most sense to me, but frankly none of them are good


A decade ago I worked on building AI systems that made (legitimate, paid) use of the Twitter “firehose”. At that time more than 99% of the data was garbage. It’s worse now. The value then was largely in two areas: historical trends (something like Google trends), and breaking news; and only the latter was really that interesting. I doubt it’s a high value data source scraped in bulk; it could have value in a much more targeted approach. Seems unlikely to require the addition of “large numbers of servers … on an emergency basis”.


Is twitter worth scraping though for AI? I mean, Reddit I get, but twitter content has suuuuch a low signal to noise ratio.


Seems to entertain many, so it has value in that sense I guess. Plus perhaps posts vs. replies make some sort of challenge-and-response pair that can be leveraged?


Metadata is probably more interesting than the raw content.


> Reddit I get

Reddit is a cesspool of misinformation and phallic references.

I guess there may be certain subreddits/tweeters that someone might want to train on, but I don't understand why.


I only get

> Something went wrong. Try reloading.

on that link. Which, frankly, is hilarious.


This is a direct consequence of Elon gutting Twitter's infrastructure.


This makes no logical sense. Why would the scraping not restart after it's unlocked? More realistically, he got a lot of backlash from users and website owners where embedded tweets suddenly stopped showing up.


Presumably they will put some protections in place.


Possibly, but they wrote those protections in ~5 hours? Seems dubious.


Where did you get the 5 hours from?


I used to work for a very large financial institution. Scraping from finance apps was a material source of load even with substantial countermeasures in place. I can’t imagine what it does to sites like Twitter and Reddit (and HN).


HN volume is absolutely tiny (the ids are sequential, so you can easily check how many items there are in a given day) and there’s an API. There's no comparison.
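
For anyone curious, a rough sketch of that check against HN's public Firebase API (the endpoints are real; ids are only roughly time-ordered and deleted items return null, so treat the result as an estimate):

    import requests, time

    BASE = "https://hacker-news.firebaseio.com/v0"
    max_id = requests.get(f"{BASE}/maxitem.json").json()

    def created(item_id):
        item = requests.get(f"{BASE}/item/{item_id}.json").json()
        return item["time"] if item else 0   # treat deleted/null items as old

    # Binary-search the sequential id space for the first item of the last 24h.
    lo, hi, cutoff = 1, max_id, time.time() - 86400
    while lo < hi:
        mid = (lo + hi) // 2
        if created(mid) < cutoff:
            lo = mid + 1
        else:
            hi = mid
    print(f"~{max_id - lo} items (posts + comments) in the last 24 hours")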


HN has an open API look at the footer


Isn't Musk a major investor in OpenAI? If he said "my data feeds chatgpt, not its competitors", that would make sense, right?


He's been somewhat critical of OpenAI. Specifically the part about it pivoting to a for-profit business.

> I’m still confused as to how a non-profit to which I donated ~$100M somehow became a $30B market cap for-profit. If this is legal, why doesn’t everyone do it?

https://twitter.com/elonmusk/status/1636047019893481474

> I donated the first $100M to OpenAI when it was a non-profit, but have no ownership or control

https://twitter.com/elonmusk/status/1639142924985278466


> He's been somewhat critical of OpenAI. Specifically the part about it pivoting to a for-profit business.

Because it was thereby starting to compete with his own for-profit.

He was even asking for a moratorium for 6 months so his company could catch up.


You’re sharing lies. He donated $10 mil and talks it up 10x.


There's a lot of lies in tweets. The point was: he's been disparaging OpenAI. It doesn't matter how much he donated or if he donated.


Isn't the more important point that his consistent lying makes his disparagement unreliable as a signal?


Can you point out any of his lies?


I assume you are joking. In case you aren't, start with the subject of this very thread ($100M claimed vs $10M reality). Then work backwards through every claim he ever made about anything. Here's a collection of his top hits https://www.elonmusk.today/


He tried to pressure them in 2020 to make him a CEO. They refused, so he pulled promised funding when they were on the brink of bankruptcy. They made a deal with Microsoft instead.

Then in 2022, they blew up and Elon's been spitting venom at them ever since as he missed his chance.


No, he sold all (I think) of his shares a long time ago, apparently because of a conflict of interest with Tesla.


Wasn’t it a non-profit back then? Do they actually have shares? I thought part of the point is they don’t turn a profit to pay out to investors.


He had committed to providing funding to them and was on the board. Being on the board is indeed the only form of control in a non-profit.

He tried to pressure them to make him a CEO, they refused, so he said "no money then, go bankrupt" and quit the board. They made a deal with Microsoft and survived.

Now he's pissed.


Disables API, gets scraped, needs more servers.

Congrats you played yourself.


Disables API, gets scraped, needs more servers, disables access without logins, gets millions of fake accounts, has to deal with the fake accounts, in the process deletes tons of real accounts, users pissed, scraping continues, server bills keep rising...


There was an old woman who swallowed a fly …

Twitter is maybe at the dog stage. Perhaps it’ll die.


Missed opportunity to poison the scrapers' well by showing them AI-generated tweets.

Bad data is devastating in a way an HTTP 401 is not.


> immediate action was necessary due to EXTREME levels of data scraping

It sounds like Elon doesn't "get" the open web


I think it will be sort of interesting to see what AI scraping does to the open internet.

I think that we are already putting too much content into social media platforms (HN included). Stuff that we sort of ought to self-host, because then we would actually own it. But will you even want to run your own sites publicly if they are getting scraped? I guess it isn't really a new issue as such, but I imagine it'll only get worse as the LLM craze continues to rise.


But you can buy verified Twitter accounts starting from $0.035 apiece. I really don't understand how it can pose any serious roadblock for scrapers.


You haven't been able to look at anything but the /explore endpoint for weeks without an account, and the "content" on there has been total garbage.

I was relieved when they started asking for an account this week because now I'll finally be able to break my habit of navigating to Twitter to "see what's happening" only to find a bunch of sports memes, pop music drama, or right-wing trolls pretending like Hunter Biden is the most nefarious person on the planet.

Imo, Elon is lying and he locked everything down for PR so he would make a headline and frame it like his site's content is _so valuable_ that he just had to take drastic measures to stop AI from training on it.


He's not wrong on this occasion: there are multiple companies out there, some even with multi-billion dollar valuations, that "farm" tweets for many reasons.


The whole thing is down now, returning “rate limit exceeded” in the UI. A very hamfisted affair imho


Planet Earth inhabitants in 2023: 8 billion -> social media users: 4.8 billion -> Twitter users: 368 million active users who engage at least once a month. If those AI models are being trained on a reduced set of under 5% of human beings, they will lose a lot.


They could just provide or even sell well curated datasets instead.


that's just a justification to force users to create an account ... he killed the API that already had a rate limit ... it's so obvious that it hurts


I just get a 'something went wrong' message from this link. Is that because I'm not logged in? If so, why don't they say that?


> Something went wrong. Try reloading.


I am trying…


Doesn't really make sense… What prevents a logged in bot from continuing to scrape vast amounts of data?


Then we just add 2fa to the scraper and continue scraping the shit out of Twitter. Checkmate.
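
And that part isn't even hard: if the account uses TOTP, automating the second factor is a few lines with pyotp (a real library), given the base32 secret captured at enrollment:

    import pyotp

    totp = pyotp.TOTP("JBSWY3DPEHPK3PXP")  # example base32 secret from 2FA enrollment
    code = totp.now()                      # current six-digit code
    # Type `code` into the login form via the same headless browser as before.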


Updating from the future (20230703T113430Z) and this has not been unlocked.


I was about to say that; I read the same post, and I also agree somewhat.


Curious how Elon is going to train his OpenAI competitor. Scraping?


Leave it to Musk to create a hair-on-fire emergency out of a regular cup of coffee. "EXTREME"! Please...

But hey, I am eternally grateful. As a super-compulsive Twitter user, I read accounts and my own lists after I log out. This fixes it.

Have your "emergency", Elon.


That's Musk-speak. Translation: "we broke shit again and need to shed load to keep the site up."

I wish it would stay that way though, so that I wouldn't land on the site by accident.


Less likely the reason is technical / cost of service (which is very cheap) and more likely he is trying to exercise leverage in pursuit of monetizing engagement (which had already happened [1] ). It’s not that he broke the website’s tech but rather he broke the website’s business.

[1] https://www.datacenterdynamics.com/en/news/twitter-pays-goog...


Oh how the disruptors hate disruptors


Yeah but he's full of crap; why would we believe anything he says about anything?


The above comment doesn't deserve to be downvoted. If Musk wants people to believe his statements, he should have refrained from being a serial liar. Now his reputation is in the trash, and he has only himself to blame.


In fairness, the first part of my comment is a personal attack that could have been left out. But the second part is what you're agreeing with, that he doesn't have credibility because of making stuff up a bunch recently. And I think that's right.


His tweets are as trustworthy as Trump's.


And as full of random capitals.


I wonder if it's people seeking to move away from Twitter and working around crippled APIs, or if it's ClosedAI in which Musk himself invested before...


Amusing that I called precisely the cause well before Elon tweeted (https://news.ycombinator.com/item?id=36542697) and was downvoted for it.


HN has an audience which is full of hypocrisy


This killed nitter.

Fuck.

I guess I'm done with Twitter.

Reddit is in Eternal September. Twitter is login-walled. If HN is next, I'll probably be mostly done with the Internet.

This version of the Internet is starting to suck. :(


HN isn't next. We hate change.

(More precisely: we're acutely aware that users hate change, and since we do too, it's kind of an easy call.)

Also, the value of HN to YC consists of the community and keeping the community happy is therefore a must.

(Did I say happy? More precisely: as happy as possible under the circumstances)


I appreciate that attitude and value it myself, but I like to point out that it is not without risk.

If the world around HN (including its community) changes, stasis can damage or kill it as well.

Specifically regarding the issue of the original posting:

- HN is already an important data source for large language model training. [1]

- To the best of my knowledge there is no freely downloadable and current data dump of HN. [2]

- The HN-API does not offer all the data that scraping can get. For example, whether a post ever hit the front page, or the highest front-page position it reached, is an interesting data point that is missing.

- The Algolia-HN-API has the same limitations.

In my opinion this will lead to increased usage of the API and increased scraping which all costs money. HN might be forced to find a solution for this.

[1] For example, the RefinedWeb paper lists HN as one of only 12 websites that were excluded. From what I understand, it was excluded because it would otherwise have gone into the final dataset unvetted. RefinedWeb was used for the Falcon model.

https://arxiv.org/pdf/2306.01116.pdf

[2] The closest thing is probably the Google BigQuery "bigquery-public-data.hacker_news" dataset. It claims to be updated daily, but really is from late September 2022. Also I could not find the download link which other data sets offered on BigQuery have. Does anyone know if I can download the complete thing anyhow?

https://console.cloud.google.com/bigquery?p=bigquery-public-...


I don't expect HN to give a fuck about the scraping. It's pure HTML, no images, probably cached all to hell for users who aren't logged in anyway.

The one thing I see as a future issue is that people are starting to post comments that clearly look like they were manufactured by ChatGPT and friends. Or that could just be the way some people talk and I've spent too long with ChatGPT now and start to smell it everywhere.


HN does have performance / capacity issues, and you'll find that if you're crawling the site rapidly, you'll quickly have your IP banned.

I've had that happen even under manual browsing (when logged out). My front-page analytics project hit that limit quickly (within about 30 requests, probably less). Adding in a reasonable delay got around that.

Keep in mind that a lot of Web infrastructure tends over time to operate just at the edge of stability, as capacity costs money.


I mean, given that AI might be trained on HN, maybe it’s ChatGPT saying things that sound like HN commenters…


This attitude, "if it ain't broke, don't destroy it", is something I'm finding myself valuing increasingly. I use Stylish to make HN look a bit more readable and prettier, and beyond that, it functions exactly how I want it to and I'm glad to see that the institutional momentum here is a core value.


> If it ain't broke, don't destroy it.

That's actually a nice way of putting it! Better than the "fix" version, because this is clear on the consequences.

I feel like it might be applied to everything from OS UI design (Windows 11), web platform redesigns (Google's icons) to whatever is going on with social media (the silly enshittification term describes this) and many other things.


Yeah, it really is "destroy".

Let's say that you are the proud owner of a goose that lays golden eggs. "fixing" would be switching it to a different feed that might make it more productive, or it might make it sick. But this year the trend is to give it a few good kicks to see if that helps.



All I do is hit cmd+plus to increase the font size.


Take a look at my profile and CSS hacks, if you're interested.

(I've a slightly more updated set locally, can share those if requested.)


I’ve been here for 6 years and the format is still a breath of fresh air when I come back from things like Twitter or Reddit.

It’s not that I’m opposed to change really. I love good ideas, and being surprised by new and unfamiliar things is usually a joy. Communication via text is hard to improve upon though, and I’m not convinced any major social media platforms have found ways to improve this in any meaningful ways.

I want to read interesting things and discuss them with interesting people. This is hard on most platforms. HN makes it easier than everything else I use.


I grew up on the 'modern web' and I love HN's format the most.


You did say, recently, that you don't like the idea of paid third-party apps using the HN API.[0]

I thought that was an odd change for HN. After all, the majority of the value still accrues to the HN owners. In fact, users that are prepared to pay for an app to access a free website largely comprised of adverts are typically more valuable than the rest. Those users have money to burn and skin in the game!

https://news.ycombinator.com/item?id=36363325


That's a fine point. I just get uncomfortable with it because the currency of HN should be curiosity, not money, so energetically it doesn't feel like a good fit.


Elon says he used it as a way to stop AI data scraping because the servers were suddenly hit by a large load.... (As a weird form of DDOS shielding in other words)

HN could be hit with such a large load too, since we have pretty good and lengthy discussion here, good for AI data training.

Would you consider it a valid tool to keep the community happy?

Since I doubt we would be happy if we couldn't access the site because some AI decided this was the time to scrape, either.


I agree that it could become a problem but I'd rather wait until the problem shows itself clearly, rather than (potentially) over-reacting in advance. Sometimes the medicine turns out to be worse than the disease; plus it's a better fit for being lazy.


I'd love to see the source control change log for HN. I can't even remember a single visible change in all the years I've been a user. I think there have been a few under the surface though.

I wish all the other sites on the Internet would wake up every morning, look at their TODO list and say "Nah, not today."


It's ambiguous what GP refers to as "next", but if it's the Eternal September part, I believe HN unfortunately already suffers a lot from it. In my subjective opinion, comment quality took a nosedive in the past 2-3 months or so. That said, I have no idea how to fix it, if it needs fixing at all.


I'm not saying you're wrong and anyway it is hard, if not impossible, to evaluate objectively—but I can tell you two things for sure. One is that people have been saying more or less exactly this about HN for at least 15 years; the other is that HN is subject to a lot of random fluctuations, and random swings tend to get interpreted by humans as long-term trends—not because they are, but because that is what humans do.


In addition to my sibling comment: HN also steps in to quash developing negative patterns in all sorts of ways. There's a long list of banned sites, there is the flamewar detector (though I've ... questions ... about that), dupes detection (or flagging), there are weightings and penalties given to various sites. I believe also some keyword and other patterns are looked for as well, "Reddit" being among ones dang's recently discussed.

So yes, occasionally some new pattern or trend will emerge, but HN adapts to those fairly quickly.


Paralleling what dang's said here, I've been looking at 17 years of HN front page activity over the past month or so, and am starting to tackle the question of topic drift and/or focus over that period.

I've been sort of live-blogging the experience on the Fediverse: <https://toot.cat/@dredmorbius/tagged/HackerNewsAnalytics>, as well as in some of my HN posts.

My current tack involves looking at sites (as reported in parentheses at the end of each HN front-page post title) and classifying those. With slightly more than 30% of sites categorised, I can classify about 65% of all HN posts.

For the full dataset (17 years), that's roughly:

     1  63913  35.73%  UNCLASSIFIED
     2  22589  12.63%  blog
     3  15112   8.45%  general news
     4  13823   7.73%  tech news
     5  12851   7.18%  programming
     6   8622   4.82%  corporate comm.
     7   8459   4.73%  academic / science
     8   7294   4.08%  n/a
     9   5324   2.98%  business news
    10   3803   2.13%  general interest
    11   2151   1.20%  social media
    12   2074   1.16%  software
    13   1613   0.90%  technology
    14   1463   0.82%  video
    15   1144   0.64%  general info (wiki)
    16   1009   0.56%  government
    17    724   0.40%  misc documents
    18    720   0.40%  law
    19    702   0.39%  tech discussion
    20    620   0.35%  science news
Tons of caveats: this depends heavily on how I classify individual sites, a given site's stories might well be technical, social, or political, etc., etc.

The breakdown-by-year analysis is in development, but if anything programming-specific content has increased in prevalence. Political discussion seems not to have (though it rose significantly ~2014). Cryptocurrency and blockchain-specific sites also peaked about that time (I suspect much of that discussion is now mainstream). General news has always been a huge portion of HN discussion, as have individual (and corporate) blogs.

Note again that this isn't about discussion and comments, or even the titles or article contents (I'm thinking of looking at those, it's ... a challenge for me).

But across nearly 200,000 front-page stories, on which nearly half of all HN discussion occurs (based on another API-based study looking at comprehensive posts), the overall trending seems at first blush to be pretty consistent and if anything improving over time.

(As with all preliminary results, I'm hoping I won't have to eat my words here. Though I'm reasonably confident in most of this.)

From the classifications above, the places you might find some of that "suffering" would be in the general news, general interest, and social media categories. All but the first of those are single-digit percentages, and a lot of that general-news content is about technology, business, finance, and science, all of which would crowd out the sort of social and political issues which seem to generate strong feelings.

The "UNCLASSIFIED" sites are a wide mix, though most are probably a mix of blogs, corporate / organisational communications, and the like. The mean posts per site is 1.739951, so gains from additional site-categorisation are pretty slim. I have captured a lot of obvious patterns via regexes and string matches, so academic/science and major (or even minor) blogging and social media sites aren't a large fraction.

More recent discussion here: <https://news.ycombinator.com/item?id=36524001>


More on "UNCLASSIFIED": there are 36,520 of those sites.

It's not practical to list all of them. But we can randomly sample. And large-sample statistics start to apply at about n=30, so let's just grab 30 of those sites at random using `sort -R | head -30`:

   1  sfg.io
   1  extroverteddeveloper.com
   2  letmego.com
   1  thestrad.com
   2  bombmagazine.org
   1  domlaut.com
   1  bootstrap.io
   1  jumpdriveair.com
   2  desmos.com
   1  leo32345.com
   1  echopen.org
   1  schd.ws
   1  web3us.com
   7  akkartik.name
   1  bcardarella.com
   1  cancerletter.com
   1  platinumgames.com
   1  industrytap.com
   2  worldoftea.org
   1  motion.ai
   1  vectorly.io
   2  enterprise.google.com
   1  lift-heavy.com
   1  davidpeter.me
   1  panoye.com
   3  thestrategybridge.org
   2  fontsquirrel.com
   1  kettunen.io
   1  moogfoundation.org
   2  elekslabs.com
That's a few foundations, a few blogs, a corporate site (enterprise.google.com), and something about tea, all with a small number of posts (1--7).

I'm looking at some slightly larger samples (60--100) here on my own system, and can actually make some comparisons across samples (to see how much variance there is) which can give some more information on tuning what I would expect to find under the "UNCLASSIFIED" sites.


Fair enough! Personally one thing I would like is some sort of inbox functionality to notify you if someone replies to your comment, but can definitely live without it!


One welcome addition would be support for embedded images - sometimes you need to share some screenshots. For example right now, I am seeing rate limiting messages from Twitter, from a normal account.

But may be it will increase your costs a lot.


I regularly think of features like this that this site needs. Then I catch myself. It's feature creep like that which has killed practically everything else.


oh god no.

posts not being able to submit images is a huge part of what makes HN valuable (to me).

once HN needs to handle images, HN falls apart


Thanks for that. You’re about the only website I trust not to turn on me these days.


I don’t know how much running the site costs, but I hope the people paying for it keep agreeing with that sentiment.


They have so far and have reason to continue to, not just because of how they feel about HN but because the economics of curiosity are a good fit for YC's business. That's the miracle (I would even say) about HN - it occupies a sweet spot where it can be funded to just be good*, and the economics work because it's in the interests of the business.

I've written about this a fair bit over the years if anyone wants more: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

* ("good" here meaning "as good as possible under the circumstances")


Agreed. Align models with happiness.


The choice is not between no change and drastic change, but selecting a rate of change that is appropriate. Things which do not change, die, as the only thing that is constant is change. Change or be changed.


I visit a few sites that never changed (or at least changed only things few were even aware of) and they have lived long enough to lose count of the hordes of change-praisers that appear and die.

I believe this eternal loop of change is a trap that impatient people force themselves into. Instead of getting accustomed to and learning the current state, they rush into another one, a change with unclear implications. As a result, they never settle into where they are and lose any track of where they were or where they're heading. Their only comfort is found in constant change.

These few sites are like home to me. One of them I visit with years-long pauses and every time I return it’s the same user experience. That’s invaluable.


I'd be curious to hear about some of those sites.


We've actually made many changes to HN over the years, but almost all have been subtle enough that most users don't notice.


Sharks haven't changed in about 450 million years. There are designs that just work and don't need to change unless the environment changes drastically.



Just innovate one thing: show a counter of how many new replies you have since your last visit.


No thanks, I'm happy that HN doesn't contribute to notification hell...


One number is notification hell?


This might interest you,

https://news.ycombinator.com/item?id=22930391

[dang]: "Push notifications seem to jack up the nervous system in a way that's good for engagement but not necessarily for users..."


Who talked about push notifications?


It's extremely rough around the edges but getting into fediverse content (mastodon, kbin, lemmy, etc) has been extremely rewarding to me.

It's like twitter and reddit but 15 years ago (and by that I mostly mean it's janky and full of bugs. Just like the web was 15 years ago!)


For me the main barrier is that I want to have portable/roaming control over my IDENTITY, even if the content hosting is (for now) entirely through a system administered by someone else. If I control the identity, I can at least keep local copies and rehost/repost content later.

Instead, it feels like the current Fediverse demands that I make a blind choice to entrust not merely a copy of my content but also my whole future identity to whatever of these current instances looks the most stable/trustworthy at first glance, hoping my choice will be good for 1-5-10-15 years. It's stressful, and then I look into self-hosting, and then I put the whole thing off for another week...

AFAICT I would need to set up a whole federated node of my own in order to get that level of identity control. Serious question: is there any technical limitation preventing the admin of an instance from just seizing a particular account and permanently impersonating the original owner?

In contrast, I was hoping/expecting some kind of identity backed by a private asymmetric key. Even if signing every single message would be impractical, one could at least use it to prove "The person bob@banana.instance has the same private key that was used to initialize bob@apple.instance."
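
Something like this, say with PyNaCl (a real library; the account names and claim format are purely illustrative):

    from nacl.signing import SigningKey

    identity_key = SigningKey.generate()   # generated once, held by the user
    claim = b"bob@apple.instance and bob@banana.instance share one identity"
    signed = identity_key.sign(claim)

    # Anyone with the public key published at the original account can check it;
    # verify() raises BadSignatureError on forgery.
    identity_key.verify_key.verify(signed)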


This is basically the entire point of the Authenticated Transfer Protocol (AT Protocol), which powers Bluesky. I think it does a ton of stuff right, including portable identity backed by solid cryptography (no blockchain or "crypto"!) and has a lot of promise. It's still in development, but I am hopeful that it will live up to its promise.


can't a malicious bluesky admin steal/MITM users' private keys by messing with whatever frontend javascript users interact with?


Yes, at the end of the day a malicious client is always a risk with this sort of thing. But the AT Proto does have some mitigation in place—users have a signing key which their PDS needs to act on their behalf (sign posts, etc) and a separate recovery key which users can hold fully self-sovereign and use to transfer their identity in case they detect malicious behavior. It's not foolproof of course, nothing is, but it is thoughtfully designed.

But yes, the protocol does have a fair bit of trust of your PDS built in. But that's inevitable for decent UX—imo the crypto craze proved that basically no one wants to (or can) hold their own keys day-to-day. If you want to have a cryptographic protocol that the average person can use, some amount of trust is necessary. The AT Protocol artfully threads the needle and finds a good compromise that is a (large) improvement over the status quo, in my opinion.


In theory, kinda, but you can bring-your-own client, and "the" web client is decoupled from the back-end instance.

"bsky.app" works as a web client for the official "bsky.social" instance, but it also works with the instance I self-host (or any other spec-compliant instance). Likewise, 3rd party clients work with the official instance, and also with 3rd party instances.

However, no key-stealing could possibly happen right now in any case because... the PDS ("instance") holds your signing key - the client never even sees it. Having the server hold your signing keys is very user-friendly, but of course not ideal for security and identity self-sovereignty. In general, the security model involves trusting your PDS (just as you trust your mastodon instance admin, or twitter dot com - the improvements are centered around making it easier to jump ship if you change your mind).

Client-signed posting is something that's not even possible right now, but I believe it's somewhere on the roadmap. If it doesn't happen some time soon I'll be implementing it myself. (I'm writing my own PDS software)


How is this better than everyone having their own Wordpress or Drupal install?


That's never going to work for the average person, sadly. And it misses a lot of social features that a lot of people (myself included) want from social media. Simply put, the UX is way too far off what people want and need.


It will; ISPs just need to start providing the basic hosting infrastructure on their routers again, like they used to. Thankfully we're also at a time where IPv6 is mature enough that this is greatly simplified!


Wordpress doesn't have ActivityPub built in, it's a plugin in beta currently. Without AP, there is no client that can pull in website feeds and provide discoverability between WordPress sites, Mastodon posts, etc.


Back in the old days, ActivityPub was my RSS feed reader. Discoverability was driven by good old-fashioned cross linking, comment discussions, and skimmable feeds from aggregators like the one we're on.

People love to reinvent the wheel and claim it's a whole new thing. No ideas on the web have really been innovative since the bubble popped. The innovation has all been on delivery and execution (not wanting to discount any of that).


WordPress is not exactly known for its security.


Sure it is? WordPress updates itself and all plugins automatically. I've had Wordpress sites running for over a decade with zero security concerns ever popping up.

Maybe your point of view is outdated?


> For me the main barrier is that I want to have portable/roaming control over my IDENTITY, even if the content hosting is (for now) entirely through a system administered by someone else. If I control the identity, I can at least keep local copies and rehost/repost content later.

This is why I want domains as identities to succeed. I want to own my handle on every platform, but I don’t want to self host.


Do you know of any existing projects in this space?

I was toying with an idea/protocol where:

1. You add a TXT/CNAME that points to a trusted "authentication provider" (a discovery sketch follows this list).

2. When you try and login to a website that supports the protocol, it checks the DNS record and redirects you to your provider.

3. You then "prove" that you own the domain to the provider - how this is done would be specific to each provider, but one possible method could be by providing a signed message that can be verified vs. a public key stored in a DNS record.

4. The provider redirects you back to the original website with a token.

5. Finally the original website consumes this token by sending it in a request to the provider. The response contains the domain as confirmation of the user's identity.
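
To make steps 1-2 concrete, a minimal discovery sketch using dnspython (a real library); the _authprovider label and the domains are my own inventions, not an existing standard:

    import dns.resolver

    def discover_provider(domain: str) -> str:
        # Look up the TXT record naming the user's trusted auth provider.
        answers = dns.resolver.resolve(f"_authprovider.{domain}", "TXT")
        return answers[0].strings[0].decode()

    provider = discover_provider("tlonny.example")   # hypothetical user domain
    login_url = f"https://{provider}/auth?return_to=https://site.example/cb"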

This approach removes the need for self-hosting as users can point and setup their names with third party providers.

Users can also trivially switch to a different/self-hosted provider by changing the CNAME.

Communities could also allow direct registration by hosting their own provider instance and pointing a wildcard subdomain at it: (i.e. *.users.ycombinator.com).

Users could then sign up to said provider using traditional email/password and claim a single subdomain: (i.e. tlonny.users.ycombinator.com)

Thoughts?


GP wants not to self-host, yet wants to have control.

Though what you described is just a regular federated identity workflow, except with autodiscovery through DNS (though that is already a thing for some).


So where is it kept then?


Sounds like they want self-custody of their keys. This isn't what the general public wants.

Decoupling identity from social is a good idea but you can't just migrate the key storage to a single custodian entity. There'd need to be multiple custodians to ensure the same power imbalances didn't reappear in a different form (e.g. Google owning everyone's logins).


Exactly. There is no magic trustable non-local/distributed system that replaces 'self-hosted' for this purpose.

All that is needed is a to create your local identity (e.g. like storing fingerprint biometrics on your laptop) and a clever way to sync between physical devices (e.g. through bluetooth).

We're in this weird situation where people don't want to be responsible for managing their own data/id, but can't trust others to do so for them.


I raised issues on Mastodon and Pleroma advancing this view a few years back, to an initially frosty reception that’s since become a grudging “nice to have but hard to add”.

My recommendation was much like MX records for email, so you can use a hosted server under your own identity.


There are people who want to add distributed identity to ActivityPub. It was left out of the spec but there were things left in to make it possible to add later. That's my understanding from a distance, anyway.


I've been able to switch Mastodon instances without any problems; most instances seem to handle whatever ActivityPub machinery transfers followers.

So as long as both your source and destination support account transfers, you can usually switch and even seamlessly bring along most of your followers without them noticing.

No idea about your admin question. All bets are likely off with a bad admin. If you want actual cryptographically guaranteed communication, that doesn't exist in a usable form (except for Secure Scuttlebutt, and that's reeeeally stretching the "usable" part)


> I've been able to switch Mastodon instances without any problems; most instances seem to handle whatever ActivityPub machinery transfers followers.

But none of your content goes with, and that's really terrible because it gives a lot of power to petty-kingdom jackass admins.

AT Protocol is heartening because maybe they've got a good solution to that.


You can export your content on the old instance and import it on the new instance


That doesn't actually work, though. Old links don't update or forward to the matching copies of your toots on the new instance. They're isolated. It's a bad experience.


That's a good point. It would be nice to have a streamlined export flow that rewrites your links for you.

(It's technically possible to edit the links in the posts you exported yourself before importing, but technically correct isn't the best kind of "correct")


If your instance suddenly dies without prior notice, you won't be able to transfer your account.


Keep an eye on Key Event Receipt Infrastructure (KERI). https://keri.one/

Current spec drafts at <https://github.com/WebOfTrust/keri>.


Didn’t the guy who invented the web basically make something like that?


Yes, Tim Berners-Lee led the Solid project[1], which reverses the client/server identity and data model. The user stores their own data and the service provider can only access it under the policy set by the user.

The promise is that one can not only transfer the identity and all personal data across instances of a single service, but also across different services (imagine from mastodon to Lemmy).

Sadly, it never caught on.

[1]: https://solidproject.org/


Firstly, agreed totally about identity.

Secondly, signing every message wouldn’t be impractical at all, I don’t think. We’ve had the technology to do this for a long time and it’s very simple. What we don’t have is good key management. For average users, this would have to be something provided by their devices (phone or the Secure Enclave in your Mac or whatever) - managing keys and the web-of-trust shamozzle are the main reasons why encrypted email for everyone never took off.



Fediverse identity is based wholly on URL. If you want to own an identity, you own a domain.


This sounds like you want to self host, then federate with instances?


Not OP, but I want to point my DNS at a host and have them handle it.

You can pay for that service, but you have to administer the instance, and it’s not able to reuse the server’s RAM for multiple domains; it’s not like email, where spam management is built in.


I'm not too clued in with how federation works - but is this practical?

Doesn't federation require admins of both instances to agree to federate?

If so, I could imagine it not scaling for larger instances that could receive 100+ federation requests a day.

Further, you would have to repeat this step each time you encountered a new instance you wished to participate on.

Is my understanding of the federation process correct or have I totally missed the mark? :D


The federation is opt-out, so by default an instance will accept any federation request. You only block bad instances after the fact (or use some kind of shared blocklist)


The main problem with the fediverse is that none of the people I want to read, post there. And they never will, because they are disparate sociopolitical demographics and the fediverse by design keeps them in separate instances that I at some level need to think about.

The majority use-case requires centralization, which is subject to the network effects that constitute 95% of Twitter's value. Great that it works for you and some others, but it cannot work for most.


I tried it but I just can't get into the flow of things; it doesn't feel like a lot happens during the day, but maybe I'm on the wrong server? I just want to expel my bowels and doom scroll bad funny memes


> it doesn't feel like a lot happens during the day, but maybe i'm on the wrong server?

We are oriented to think that way after ~15 years of the algorithmic engagement-maxi world of Twitter. It always looks like there's a lot happening all the time but look deeper and it's a bunch of people offering their weak takes on hot topics to build their brand.

What was the last thing you remember being must-see stuff on Twitter?


My vice was reddit, I could spend a lot of time on different subreddits for different topics and usually always find something new.

But I do agree with you, being away from the noise has been really freeing for my mind


> We are oriented to think that way after ~15 years of the algorithmic engagement-maxi world of Twitter.

I never cared about algorithmic engagement.

Before (and during) the engagement Twitter:

- has everyone you needed there, centralised

- has search, where you can find people and topics you care about

Mastodon has none of that: you have to know which server to join and how to find people and topics. Centralization always beats distribution in convenience.


You may not care about algorithmic engagement, but that doesn't mean that the content you are looking at is free of the incentives created by said algorithmic engagement. I use Twitter search too, but it's full of hot takes, because that's what the network rewards.

It's kinda like the "I don't care about politics" stance. You might not care, but the institutions you interact with every day certainly do.


I mostly use Twitter search to find specific posts like "@rich_harris web components" or similar.

This is extremely useful


> What was the last thing you remember being must-see sfuff on Twitter?

The Russia circus last weekend made for some pretty good near real-time intrigue. That being said, I don’t care what crazy thing is happening…I’m not creating an account to hear what’s being said on “The Global Town Square ™”


My experience of it was actually the opposite. I first started from Twitter, trying to piece together something from the chaos of tweets, but then I went to an actual news site which had a nicely packed timeline of the latest action and had more accurate and more up-to-date information.

In theory, Twitter should be that real-time news feed from these kind of events, but it doesn't actually work. Signal-to-noise ratio is just very low.


Maybe if nothing is happening then nothing is happening and you should do something else ;)


Maybe try bluesky


The vaguely detailed, unclear, NIH’d corporate rip-off version that already has privacy concerns (public block list issue) that’s even less developed?


How does the fediverse intend to pay for server/developer costs? For new technologies many smart people work for free as long as it excites them, but when it comes down to just maintenance and fixing bugs, it wouldn't be cheap for any technology with so many moving parts. Also, early adopters donate with much higher probability than the masses who arrive later.


Coming of age in the late 90s/early 00s we had plenty of forums to choose from, hosted by hobbyists, with nary a monetization scheme in sight. And this was in the era when the tech was far less accessible and the hardware far more expensive. Sure, maybe a modern $5/month VPS running basic forum software isn't going to handle 100,000,000 active users, but it sure will handle 10,000 active users, and that's more than enough to have a healthy community.

(Note: I'm of the opinion that fediverse-style federation in the context of forums is merely a nice-to-have; the web is already naturally federated, and people should not feel bad if they want to save money/tech complexity/administration complexity by settling for ordinary self-hosted forums.)


You are off by 2 orders of magnitude. Plans for 10k active users on managed instance generally seems to be in the range of $250-450[1][2].

[1]: https://toot.io/pricing.html

[2]: https://masto.host/pricing/


This is specifically why I call out the federated model as a nice-to-have, not a requirement. ActivityPub is way, way more demanding of CPU and transfer than a simple forum, and as a result it's extremely difficult to self-host at scale. You can easily service 10,000 daily active users on a VPS serving lean, statically-rendered forum pages.


Mastodon was not designed to be cheap to host


> Mastodon was not designed to be cheap to host

Is there a reason for this, though? Need to be able to iterate on features quickly? Maybe not being able to tackle various complexities with the total available resources? Or maybe federation is just inherently expensive?

Why couldn't we have an alternative written in a more performant language/runtime with maybe things like lower quality images/videos or something?


> Is there a reason for this, though?

Because performance was not a concern when it was designed - or it could be that it was designed for small communities, and therefore not possible to scale up cheaply. One of the problems is the caching of pictures from the different connected instances (if I remember correctly), which makes the data storage requirements go up very fast.


The big issue with hosting forums and the like is trying to keep the bots at bay. I have seen very small forums get overrun in next to no time. And putting in bot checks leads to frustration for the users.


Good point, my implicit assumption is that, unlike the classic forums of my youth, forums in the post-LLM age will want to adopt the "tree of invites" model (e.g. how lobste.rs does it) rather than allowing unrestricted write privileges (read privileges can still be public). This creates a localized web of trust that will be mostly manageable at medium scales; ban or revoke invite privileges to any users whose invitees turn out to be bots or sockpuppets.


I pay $10 a month to my masto instance. They do a superb job.

Is that the best way? No clue. But I'm glad to chip in.


That's up to each instance to decide. Could be subscription or donation; I can't see why there couldn't be local (non-federated) advertising.


Fediverse doesn’t feel like a solution


Public service announcement: the only good thing on Twitter, Massimo, has at least one mirror on Mastodon. https://mastodon.social/@rainmaker1973@birdsite.miisu.net


However, BirdsiteLive instances come with their own information for the public.

* https://jdebp.uk/FGA/fediverse-third-party-mirrors.html


Agree. https://beehaw.org in particular is great


Finding communities is the toughest part.


It's still early days, so give it some time. But there are already some lively communities and, IMO, they are generally better in terms of quality since the Fediverse is more niche and has a higher barrier to entry.


This. Twitter is doing what they can to drive people to Mastodon. I've held off closing my Twitter account but I need to get more familiar with Mastodon. I was wondering if there are accounts like NWS <location> (weather updates) over there, or whether they plan to have some soon. Also, from brief exposure, servers like mastodon.social read like left-wing echo chambers, which is also cringe.


What kind of annoyances have you had on Twitter, as a user, that makes you want to leave?

I have been using Twitter more often lately, as I really don't care about what they do to their APIs or whatever, and honestly it's been improving.


The main reason is I followed a lot of scientists in my field of research who left the platform. Plus there's been subtle (and some not so subtle) changes to the algorithm and the software.

Most of the people I followed before were clearly left wing. After the Musk takeover a lot of them left the platform. Plus the algorithm now pushes more right-wing content on the home page. I wouldn't mind if it were real people discussing valid talking points. The problems are 1) they are all coming from blue check mark accounts, 2) most of it is clearly misinformation, and 3) you can tell most of these tweets and replies are troll and bot accounts. It's just annoying.

Musk boosting his own tweets in the feed was annoying. Had to unfollow him.

I use Twitter via mobile website, and it breaks more frequently than before.

Overall, it's become like Musk's other product, Tesla. It over promises and under delivers. It's not reliable anymore. As a 2x Toyota owner I cannot stand products that are results of crappy engineering. So there you go.

Edit: one more thing: it used to be that I could go to the trending hashtags and get the latest news in a second. That's not the case anymore. Case in point: yesterday France was trending. I saw the tweets and got the impression that some member of a minority community had committed a mass stabbing or rape again, because the Twitter results were brigaded by right-wing blue check mark accounts spewing anti-immigration propaganda. It was not until I read a BBC article that I realized what happened was the complete opposite: police executed an immigrant at a traffic stop.

Twitter has lost almost all of its core values under Musk. It's just sad.


> This version of the Internet is starting to suck. :(

The internet is fine. The highly centralized businesses that built themselves on the technology are meeting their inevitable end.


The internet is most definitely not fine. There is an incoming tsunami of ChatGPT-generated bullshit that is going to make most open discussion sites more or less useless. Twitter requiring accounts and Reddit shutting down APIs are both related: ChatGPT et al. are a threat to the social media business and, ironically, made possible because of the social media business. TBF I think we should all be exercising extreme (even more than usual) skepticism on any discussion sites from now on.


Social media companies are acting like a drug user who was getting their dope for free but now has to turn tricks behind the dumpster. Easy money (aka low loan rates) has dried up or is much harder to justify, so companies that have used their user population as a means to profit are realizing they need to charge money for things like blue check marks or API usage when the VCs won't give them a hit anymore.

Publicly free content from users devolves to garbage content. I think the ChatGPT effect is that they're realizing it's easier for companies/entities to generate garbage that matches or exceeds the intelligence of comments by actual people (a low bar). Sure, there are pockets of usefulness, but these are tiny amongst the firehose of garbage.

If all that is publicly available on social platforms is just garbage nonsense, people will just stop going if any barrier is thrown in front of them. The internet as a technology stack is fine. This is how social media dies (hopefully).


+1 your concern is valid.

A potential saving grace: I bet within a year or so it will be easy to self-host LLMs that are easy to fine-tune and run. Then there will be a few open source tools that you can use yourself, privately, to capture your level of interest while reading, and periodically build a reader/summarizer/filter agent.

This is not scary if people can fairly easily run it all themselves, keeping their data private. It would help wade through crap, and there is some irony in using LLMs to have personal filtering and summarization.

Compared to future versions of ChatGPT, Bard, etc. the models that individuals can self host will be much weaker, but I think that they will be strong enough for personal agents and eventually be cheap to run, and affordable to fine tune.


Right, if you aren’t already doing this, you’re about to have to screen every single comment online in a fucked up game of ‘human or robot’


> Right, if you aren’t already doing this, you’re about to have to screen every single comment online in a fucked up game of ‘human or robot’

Hah, this might mean that we'd be one step closer to the dead internet (conspiracy) theory actually being reality: https://en.m.wikipedia.org/wiki/Dead_Internet_theory

What an interesting concept.


I mean this has been true for a while: we've all laboured under the fiction that any given screen name maps 1:1 with a human being.

On every large platform, this hasn't been the majority for a long time.


And don’t forget net neutrality. ISPs haven’t!

The clock is ticking until you’ll need to decide which internet package you prefer, based on content providers and limited by region.

At least Yahoo will probably come free with the base package.


Sounds to me like the Internet is, in fact, fine. It's only those "open" discussion sites which are having trouble.

Those sites never fit my definition of open anyway (free, permissively licensed technology and content). The ones that do are smaller, aren't a monoculture and seem to be pretty untroubled so far. No one wants to scrape the little Mastodon or Lemmy instances or other small community sites I pay attention to.

Big deal if something is a threat to the social media business. The social media business is a cancer which should be destroyed anyway. Go outside and engage in real socializing instead of the depression-spawning, teen-girl-murdering version peddled by Zuckerberg and Musk. It's much better and once you change your habits you'll never look back. Maybe it has something to do with all the vitamin D you get from being outside.


If anything, nature is healing. I’ve noticed at least 1 community return to forums due to all of this (https://mholdschool.com/), and while unfortunately AFAIK there isn’t any good FLOSS software for it, it’s certainly a start.


phpBB is GPLv2, and it still works just fine. Discourse is also GPL, if you’re looking for a more modern look. There are a number of others too (https://en.wikipedia.org/wiki/Comparison_of_Internet_forum_s...)


Does phpBB have an adaptive layout? I've used several older forums that require scrolling sideways because they don't.



The UI looks the same after all those 20 years.


And this is a feature!


Literally feel waves of nostalgia looking at that layout. It's perfect!


Which means it works?


Good question. It seems to be a half empty/full glass situation.



[flagged]


Umm..

> Discourse is the 100% open source discussion platform built for the next decade of the Internet

Are you thinking of Discord?


I am, thanks. But also Discourse is so terrible that it should be excluded too.


The Zig community's subreddit was closed recently, and they decided to make its Discourse instance the primary discussion center.

https://ziggit.dev/


Discourse is going to pull the same shenanigans as Reddit and Twitter, you can be sure of that. No one is going to host millions of users for free, forever, they'll all come around to get their investment back one day.


Are you thinking of Discord?

https://discord.com vs https://www.discourse.org


What? Discourse is free software; they aren't "hosting millions of users" at all, never mind for free.


Someone is going to have to host users.


Yes, the respective projects themselves. If you can't trust a project not to fuck over its users, why are you so concerned about the hosting of their discussion platform in particular, rather than just about everything else?


Think before you speak. Discourse is a self-hostable solution


Do you think self-hosting means "it's free to host"? Do you understand someone is paying the costs of hosting, and that someone can do whatever the hell they want to recover such costs?


Usually those costs, for a specific community, are not very high. Not sure about Discourse specifically, but you can serve thousands of users relatively cheaply.



What happened to vBulletin by Jelsoft? Needs to be open sourced.


People use XenForo now, which was created by some former developers of vBulletin, so it's very similar.


Will have to check it out, thanks!



Woltlab are still around as well. IIRC Burning Board was vBulletin’s fiercest competitor at the time


> This killed nitter.

I suspect that was the intent.


Yeah. Honestly, at this point you have to think the ketamine isn't being microdosed[0], or the whole Twitter escapade is an intentional op to burn down an account-independent forum that has frequently been a source of pain for those in power.

I suspect the latter. We've seen billionaires do it countless times.

[0] https://www.independent.co.uk/life-style/health-and-families...


Man, you don’t need the special K to see that Elon is not great at non-engineering things.


Elon is not great at engineering things, either. The main thing that he’s great at is self promotion and convincing people that they should give him (even more) money.


An admitted Aspie not getting SOCIAL media. Color me surprised.


Nah, how many people are using Twitter without being logged in compared to how many people legitimately change every link they receive to Nitter?

I used the 'redirect to nitter' Firefox extension and Android app but it got quite unreliable and nobody else that I know uses nitter at all. I think Nitter users would be a tiny, perhaps even immeasurable minority compared to casual readers that now are incentivized to either log in or fuck off (but... FOMO).


I didn't mean nitter _specifically_ but rather all forms of alternative/anonymous UIs for twitter (that strip all of the engagement/ad/tracking stuff from twitter.)

If anything, I suspect Elon saw what happened with Reddit and had a "wait, we should do that!" moment.


Likely more of a bonus. I doubt anyone at Twitter factored nitter into any business decisions.


Bingo! It's only a matter of time before another service takes over. I would be extremely surprised if at any given time there weren't at least two startups in the dark trying to be the next Twitter, just waiting for the right moment, when a large group of people gets pissed off at Twitter, to launch their service to the public.


To not know of any is sort of weird. To believe a for-profit enterprise should replace the last is unspeakably dumb.


Wait until you realize that highly centralized businesses are a feature, not a bug.

We've BEEN through federated platforms before. We've even been through PROTOCOLS before. They're all horrible. The successor to any platform that currently exists will have slight improvements to what already exists, and that's IF they're able to do so.

I don't have a dog in this fight, but I do have over 30 years of being around social media platforms on the internet.


Every time someone says this they forget the internet itself is a federated platform.


Every time someone says this it's because they don't have a credible counterargument to the central premise.


Why are centralized businesses doomed to fail? It’s been the primary organizational mode for the last 10,000 years.


> It’s been the primary organizational mode for the last 10,000 years.

For most of human history, most businesses were small ma and pa shops operated by a few local people. These days large business chains are the norm. You could say that centralized big business killed the decentralized small ma and pa shops.


Genuinely curious: which centralized businesses existed in 8000 BCE?


I would bet that the first multinational organization to corner an intercontinental market was the apostolic church of Rome.

If https://en.wikipedia.org/wiki/Timeline_of_international_trad... is right, the necessary technology for international trade was not available before 4000 BCE (horse riding ~3700 BCE, long-distance sea travel ~2000 BCE).


And more to the point, what fraction of centralized businesses in the past 10,000 (or even 1,000, or even 100) years did not eventually fail?



As the saying goes, the market can stay irrational longer than you can stay solvent - everything eventually falls, but Twitter's not on the long side of "eventually" here.


Pressure to squeeze out profits continuously until there's nothing but a husk left.


Except weren’t all these services running at a loss?

How would smaller companies do better?


The key here is that the environment of near-zero interest rates these services proliferated in is over, so there is a drive to wall up and monetize their content more aggressively. That will probably fail, because all value they had was in community interaction. Who would want to train an LLM on post-2023 twitter content?


Why do you think ZIRP won’t come back eventually?


By actually charging for their services?

For every gmail, how many independent email providers do you think exist out there?


Right but isn’t that what they are claiming would drive people away from the big companies?

Why would you choose to not pay for the big tech products and then pay for a smaller version?


Because Big Tech demands big profits. They cannot exist in worlds where they cannot have a monopoly or oligopoly.

Smaller companies and ISVs, on the other hand, will be better off if their market is commoditized. They won't have to spend so much to compete on R&D; they just need to find the best way to serve their (comparatively) small customer base.


Two notes on this:

One is that centralized _business_ has certainly not been the primary organizational mode. You can talk about centralized _government_ (of whatever variety you'd like), but the distinction there is that the centralized government had some sense of itself in an ecosystem - a citizenry, a land, a future, etc. - and businesses do not.

The second is that centralized entities sure wrote a bunch of stuff _down_, but it's hard to say they were the primary organizational mode for any but the last 50-200 years - the reach of serious centralized bureaucracies has only really begun to match their propaganda in the industrial and now computer age. Until extremely recently, the actual effective reach of a centralized bureaucracy was a day's horseback ride - control degrades rapidly as one leaves the core. "Heaven is high and the emperor is far away", as the saying goes.

Edit: With regards to the second note, "Against the Grain" by James C. Scott is a solid read.


It's not centralized businesses that inevitably fail, it's businesses that become top dog that inevitably fail.

If you own a company and want it to last through the ages and you aren't literally the only guy in town, never become number one. Aim high, but don't hit the top.

The same can be said for countries and practically any organization or group. You stay an underdog if you do not want to ever fail.


This doesn't make intuitive sense. Are you sure you're not seeing the results of survivorship bias? Or is there a mechanism in place that kills top players?


It's because once you're top dog, you stop aiming high and start maintaining your top spot. You stop being ambitious and innovative; instead you become anxious and wear rose-colored glasses.

That leads to complacency, corruption, and delusion, ultimately leading to failure.

Everyone and everything from mundane individuals to megacorporations and empires have all fallen from grace once they became top dog. No exceptions.

If you want to last, aim high but don't become top dog.


To paraphrase Gilmore, “the net interprets [mandatory login and other access fuckery] as censorship and routes around it”

If Twitter locks out more readers, people will stop posting and move elsewhere. If Reddit ejects the mods that made it successful, communities will evolve elsewhere.

The forest will regrow and different paths will form, routing around the dying patches.


This is the main reason I don't use quora and pinterest. But I guess they are still around.


At least Google Images is no longer just Pinterest results.

The only time I run into this Fuck You pattern is Instagram refusing to let you replay a video.


Yes they are, and both are a lot more crappy than they were before.


At least Pinterest is making money, and as a company, that's the only thing it cares about.

They aren't trying to delight you, or make the best product for you.

They're trying to suck as much money out of you as possible. If that means making a worse product, you're gonna get a worse product.

Until this gets beat into the average person's head, we'll keep falling for companies that we think will be different.


Yup. I have come to realize I care exactly as much as they do.

Makes this all really easy. It is a big world out there with lots to do, build, see, laugh, love, play.

Fuck em. I just do not need it. Whatever it is, the vast majority of the time.


Also Experts Exchange.


Wow, I don't think I've heard of them since shortly after they hyphenated themselves.


That's one community I don't miss.


Technically inclined people, perhaps. I think you're underestimating what the vast majority of Internet users are ready to put up with.


This is true of any medium in any free society. The net has nothing to do with it. It certainly isn’t capable of interpreting some website action as censorship. It’s not sentient. And there’s no dynamic network routing/damage at play here. It’s just people going somewhere else.


I have a prediction that will make you happier: this is a temporary thing that will generate a TON of press (outrage; praise by the fans at the amazing bold strategy move) and will also generate a TON of signups right as there is a wave of people leaving Reddit.

And then it will magically open back up. (More press)

And that's the story of pretty much every one of the outrageous/bold/brilliant/terrible strategic decisions you've heard of since the Twitter takeover.

What's most amazing is that it works every time. I'm surprised there isn't an Onion copy/paste article about this each time.


This wasn’t the case for Twitter API pricing changes.

My Twitter usage is down a lot since they killed off 3rd party clients.


> My Twitter usage is down a lot since they killed off 3rd party clients.

Every user hostile move they've made basically halves my usage.

I used to spend way too much time on Twitter, now it's 10 minutes a week and I don't feel like I miss much.


It's a gigantically dumbass move that's killed off all amateur creators and consumers. My feed is just filled with full-time content creators churning out 1/20 threads that are consumed by other full-time content creators.

Just a boring echochamber.


> 1/20 threads

It's the Twitter equivalent of the YouTube thumbnail face with "10 million $$$$$$$" in the title.


This EXTREME EMERGENCY that Musk is talking about is not AI - it's Musk himself.

He is killing engagement and ad income, has to slow the bleed, and is now looking under every rock for some gains, GOTO 1.


New Coke anyone?


>will also generate a TON of signups

I've been meaning to make a Twitter account since I follow a few accounts for some games that I play.

Between Twitter not being owned by a loon anymore and finally a reason to overcome my laziness, why not?

And yes, I know I'm playing right into Twitter's hand. Whatever, I actually like Musk anyway (an unpopular opinion that will no doubt get me flagged around these parts).


hilarious how this is downvoted


This internet started to suck a long time ago, but today was a red letter day.

Reddit's been circling the drain for years, but today is the day it truly crossed a line for millions of people at once (They killed Apollo and RIF).

Twitter's been getting rapidly worse since Musk bought it, but this is another red line.

We're not even allowed to talk about the influence of bots and astroturfers here. A popular post today was flagged within an hour, just for pointing out that a site claims to sell upvotes on HN.

I don't like it when the tech giants get in sync like this...


You can talk about the influence of bots and astroturfers...so long as you agree it's all one big conspiracy theory, like all the comments did on my post from yesterday [1], "The Gentleperson's Guide to Forum Spies". [2]

[1]: https://news.ycombinator.com/item?id=36526827

[2]: https://web.archive.org/web/20221215015113/https://pastebin....


I just opened up RIF to see it was dead. I knew it was coming but I wasn't ready for it. How infuriating.


FYI, Narwhal got an extension.


Oh, I see that. Better than nothing.


I would say black letter day.

- Reddit API lockdown

- Twitter account wall

- YouTube increases aggressiveness against Adblock

This is on top of similar behavior from other sites that happened long ago. Facebook, Instagram, LinkedIn, etc.

But drastic changes across platforms within 24 hours are unprecedented. This does not bode well.


It feels like the "natural" development of VC funded websites - they offer a service that's heavily subsidized and losing money hand over fist to displace other services (I guess in Reddit's case it was self-hosted forums and similar?).

But that's clearly not sustainable; it's inevitable they'll pivot to monetizing the service - and that's clearly going to make it less attractive than the heavily subsidized version people have gotten used to.

I guess the idea is to lock users in to the degree that the increasing monetization is put up with, and slowly enough that there's no "sticker shock" of a previously free service suddenly having a price.

I'm constantly amazed people are surprised by this; isn't it obvious that tying yourself to a loss-leader service isn't sustainable?


The bigger problem is that the plan was basically infinite growth, which was only possible due to 0% interest rates. The last 15 years have been a weird fever dream of free money; now that interest is a thing again, the infinite-spending, figure-out-profitability-later model is no longer sustainable, and Reddit, Twitter, Google, Meta, etc. suddenly have to actually make money again. Basically the internet everyone has been used to for 15 years is dead as the reality of debt resumes. Expect things to keep getting worse, or to revert to the old version of the internet of scattered low-power sites and forums. Frankly, Microsoft is really the only one with a somewhat sustainable model for an interest-rate environment.


> Expect things to keep getting worse or revert to the old version of the internet of scattered low power sites and forums.

This is actually the most mood-lightening comment I've seen on this matter. A return to the internet of 15 years ago doesn't sound so bad to me. (Of course, the mood darkens again when I remember that it won't really be that, because, e.g., that internet of 15 years ago couldn't handle the bot-spam of today.)


Reddit didn't need to be heavily subsidized. Old Reddit was open source, had community-built apps, and could have been run indefinitely with a handful of employees and Reddit Gold. Like a Craigslist or Wikipedia model.

Instead with their VC funding they spent $$$ to introduce things like NFTs and TikTok scrolling, which nobody asked for, but it burned through a lot of cash.


HN doesn't really have a business model, so I'm not sure how much it can be corrupted. I guess if YCombinator ever gets tired of paying the bills, but I'd imagine having a majority(?) of startup/tech employees visit your site almost daily is quite good for them in indirect ways.


I assume that if HN ever caused them any serious PR damage, they'd pull the plug that day. It's always been made clear that it's a single-server site sitting somewhere that was written as a hobby a long time ago. That kind of nothing cost they're willing to take on indefinitely, but not any serious bad media cost. I've always felt that was the reason for the heavy moderation, especially as compared to the anarchic early days. Even with the salaries of dang et al. expenses are probably a rounding error.


Based on the content that gets posted, I don't think there's a way for HN to cause serious PR damage to them.


That's not independent of the moderation, though. Topics that invite heated discussion but aren't absolutely necessary to talk about on HN get nuked very quickly. If it would be absurd for it not to be discussed here, it gets a pinned note from dang telling everyone to behave, and gets carefully monitored to make sure it doesn't degenerate into a riot.


HN's business model is to perpetuate YCombinator's culture. It's probably some of the most effective money YC spends.


+ customer funnel (launch / show HN), + private, free, targeted job board


It’s effective because in this instance their interests align with their audience’s, which is pretty much an ideal situation in such a capitalist economy. Also, I don’t know how much HN costs, but I cannot believe it is disproportionate in the PR and communication budget of something like YCombinator.


Cult, you mean?


And loving it ;)


Bingo! Being open to access, but heavily moderated, is the perfect spot for them. HN is essentially perfect for its use case and is unlikely to change soon.


Tapping on the tech pulse and information control/access is the "business model".

Remember, this is a VC firm that runs the joint. We're paying with our data, and that gives them early/first mover advantage.


I'm cautiously optimistic about Lemmy. Anyone can spin up an instance, so it's decentralized, but instances are connected, so there's community.

It may still be rough around the edges, but to me it feels like the spirit of the old phpBB forums combined with almost 20 years of lessons learned from Reddit.


<rant> Not responding directly, just piggy-backing for lack of a better place to put this comment.

The problem with Lemmy is that one gets sent to some place like https://github.com/maltfield/awesome-lemmy-instances, is immediately confronted with a ton of weird links like "butts.international" and "badblocks.rocks" (what even is this?), and about 100000 other servers just named "lemmy", "notlemmy", and "lemmy1". So you click a few at random optimistically, then get hit with a login page, or a server error, or an apparently empty test server. You begin to think you're being pranked, like am I supposed to brute-force click like 50 things to find something that's not a joke? Maybe you go to https://join-lemmy.org/ and it says "After you create an account, you can find communities", so great, it's inaccessible anonymously, the same as Twitter. You go to https://lemmymap.feddit.de/ and after 15 minutes of page-loading you get a hilariously useless cyberpunk-looking word soup where you can't click any links, much less search for topics/communities (btw there are 2068431 running instances and somehow butts.international is still front and center in my cyberpunk view).

Finally, by ignoring the recommended tooling and just using Google search, I found a community relevant to my interests, but it has pretty bad content and a whopping 1 user/day. Another Google search for a certain topic turned one up, but it has only 3 total comments, and I could not tell what month/year the posts were added.

So, clearly I don't really know what I'm doing here, but this stuff is ridiculous. As long as we're crawling 2068431 instances, why don't we look at the communities each one hosts and the volume/recency of traffic? At least filter the totally empty stuff and/or put all the test instances in a sandbox! Discoverability is so bad that I can barely get to the point where I'm considering usability or content. Even something as simple as the sketch below would beat the word soup.
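A rough sketch of what I mean, assuming the v3 API shape I've seen elsewhere (the instance shortlist and the exact field names are my guesses, so verify before relying on them):

    import requests

    # Hypothetical shortlist; in practice this would come from the crawl.
    INSTANCES = ["lemmy.ml", "programming.dev", "beehaw.org"]

    def monthly_active(host: str) -> int:
        # Lemmy instances expose aggregate stats on the site endpoint;
        # users_active_month is the field I'd rank on (assumed name).
        r = requests.get(f"https://{host}/api/v3/site", timeout=10)
        r.raise_for_status()
        return r.json()["site_view"]["counts"]["users_active_month"]

    stats = {host: monthly_active(host) for host in INSTANCES}
    for host, active in sorted(stats.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{host}: {active} monthly active users")

Rank by that, hide anything near zero, and the empty test servers filter themselves out. </rant>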


You're making a very good point. I looked at the Lemmy stuff before, but I moved to kbin instead. It has a more familiar interface (relative to Reddit) and it is federated with Lemmy. This all sounds good, and it is. But even there I basically have no idea what's going on. It has more content than the Lemmy instances you visited, but every once in a while when I remember to visit, I only look at whatever landing page I configured and can't figure out what's kbin, what's Lemmy, what "magazine" I am looking at, etc. To be fair, it's already a pretty good product and it is likely to get a lot better. That's exactly why I keep trying.


As someone who signed up for Lemmy and is planning to replace my Reddit use with it, I hear these points loud and clear. You also make a good point about Google search and how bad it’s become - I see so many stories of people adding “Reddit” to their search in order to get any decent results. This is the natural result when you have every business paying people for SEO and trying to game the system.

Between account walls and search’s indexing problem, it’s become very hard to find small to mid sized active communities on your own. In fact this problem seems to be something people are trying to solve in Reddit communities via related subreddits on the sidebar.

So having gone from using search engines to crawl for relevant content that was out there, people are now creating content specifically to end up in Google's search results - destroying the value search once had. Judging by what people have done on Reddit, and these discussions about finding alternatives, it seems we are well on our way back to webrings. I welcome this.


“it's inaccessible anonymously, the same as twitter”

This is not true. Communities can be browsed without login on their home instances.


Good example of the kind of irritating comment that led to the odyssey above. What you say also appears not to be true. The correct assertion is maybe: if you can find an instance, and if the instance is configured to allow anonymous access, then you can browse communities. Since nothing is ranked or searchable, if you're willing to do that for N instances and M communities, enduring server errors/login screens the whole way, maybe you can find decent content after weeks of brute-force labor. This isn't practical for someone who just wants to spend a few minutes finding a replacement for /r/math and /r/physics or whatever.

For the short term at least, since discoverability is so broken, I think those who want to advocate for Lemmy will be better served by just linking to content or curating indexes of active communities. It's not that useful to anyone if the focus is always about pretending everything is fine, or presenting prospective users with totally useless machine-generated indexes where we cannot tell the test-servers from production.


IIUC it's federated, not decentralized. Still pretty good, but if a server drops offline it will be pretty inconvenient: the communities on that server will be dead.

Matrix does better, as its rooms are decentralized and can continue even if the creating server drops offline. But user accounts are still only federated.


Lemmy.ml's admin is pro-Chinese-government and actively censors critical comments. What that means to you is your decision, but I want to make people aware before the mass migration date arrives.

https://old.reddit.com/r/Save3rdPartyApps/comments/14k67qt/l...

Highly sus


Lemmy's team is very politicized, but even then it took significant pushback to change their minds about an issue that the community was decrying for reasons that were almost entirely technical.

https://github.com/LemmyNet/lemmy/issues/622

I would not get into bed with these fellows myself.


It bothers me a little bit that having a strong stance against intolerance is seen as being “politicized.” That should just be normal and expected behavior.

Maybe they were abrasive in initially fighting the request to make technical changes to the slur filter, but hey when you ask for free enhancements to open source code you either do the work and provide a pull request or be prepared to be told no.

I empathize with their concern about becoming another Voat or Gab. They want federation but they don’t want a Wild West.


The problem is that the stance is incredibly shortsighted and, in a way, bigoted itself. Take a word filter that contains some regex for n**a. They are saying you should never use slurs, and this word in particular, in public discourse.

But the above word is used in the lyrics of a music genre with predominantly black musicians. So in addition to saying "we don't want our software to be used by racists," they are also saying "we don't want our software to be used to discuss certain kinds of black music" (arguably a racist stance just by itself). Talk about unintended side effects.


Yes, this is one of the trade-offs of any system where one must decide between human moderation/curation and automated moderation/curation.

If automation is chosen, there will absolutely be situations where perfection is impossible. If humans' unparalleled ability to see nuance is chosen, then the cost scales with the amount of information.

The fact is, if we want a community and we want to keep signal above noise, we will need some form of removal of spam, child porn, racism, etc.

Automated tools can't handle nuance as well as humans.

Then human mods start applying nuance, and someone will point at the results and call them biased.


> It bothers me a little bit that having a strong stance against intolerance is seen as being “politicized.” That should just be normal and expected behavior.

It did not seem to me a politicized discussion but a technical issue with filtering using hardcoded blacklists, which are just too prone to the Scunthorpe Problem. Perhaps because too many people in the USA despise the mere existence of other languages :)
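A toy illustration of the failure mode (the blacklisted term here is deliberately mild; the mechanics are identical):

    import re

    # A naive hardcoded blacklist of the kind at issue: bare substring
    # matching, no word boundaries, no context.
    blocked = re.compile("ass", re.IGNORECASE)

    for text in ["pass the glass", "classic case", "a grass field"]:
        verdict = "filtered" if blocked.search(text) else "ok"
        print(f"{text!r}: {verdict}")
    # All three innocent strings trip the filter - the same failure mode
    # that famously got the town of Scunthorpe blocked.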


I think we have to remember that this isn't a commercial product; it's a small project. They had a quick-and-dirty solution and weren't willing to abandon it, but also weren't initially willing to put in the time to make a more robust solution.


Isn’t that just one instance? You don’t have to join that one.


It's the official instance, so it has a privileged position.


What privileges specifically? Are you forced to join it?


It is what people will probably view first when they find out about Lemmy, so it has the most attention.


This seems to be inaccurate. When I go to join-lemmy.org, and click join a server, the first servers on the list are at least semi-randomized recommended servers. Every time you refresh the page the selections change.

As far as the “popular” list, which is placed below the recommended list, lemmy.ml doesn’t have any special privileges there. It just happens to be the most popular. If something becomes more popular it will go on the top.


Wow, an explicitly Marxist-Leninist instance supports an explicitly Marxist-Leninist state? Who would have thought.


Can servers be ad supported? Like can I run a Google ad on my own instance?


There's nothing stopping you; the source code is freely available to edit when you start up your instance. It would probably limit the appeal of your instance to many people, and maybe some very ideological smaller instances might defederate, but I doubt the other major instances would care much, since only those signed up at or browsing through your instance would see the ads.


I guess if I put enough effort into it I could fork it. Is it that easy, though? Or is it not built in already?


We know how much we all love ads. Let's have more of them!


I see absolutely no reason why an instance might not decide to fund itself on local ads, and no reason why you couldn't choose between an ad-supported and an ad-free instance.


Forums used to be ad-supported; nothing particularly wrong with being ad-supported. Problems occur when your investors expect a 10x return on something ad-supported.

But the point is to have something pay for itself. I'd rather not lose money on it.


Actually, there is something wrong with ad-supported platforms: advertisers start imposing restrictions on the actual content, or the owners of the platforms enforce restrictions pre-emptively so the issue never arises.

I was on a forum, and the kind of language and even topics for discussion were severely restricted, partly because of advertising.


I mean I consider this a huge win: delete your Twitter account and never again will you be tempted to go read a tweet. If only I could set an anonymous expat cookie for all the services I've left behind letting them know "No, seriously: I left and I'm never coming back. No reason to track me, show me your content or ask me to login." Where's my restraining order cookie telling Facebook to fuck off outta my life, never to return?


Hosts file? Blackhole Facebook content entirely?
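A couple of lines like these in /etc/hosts will null-route it at the resolver level (illustrative; Facebook serves from many more domains than these):

    0.0.0.0 facebook.com
    0.0.0.0 www.facebook.com
    0.0.0.0 static.xx.fbcdn.net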


That's a good point.

I haven't logged into Twitter since forever but still have accounts.

I need to delete them.


Was using Fritter for Twitter and Infinity for Reddit. Both apps allowed local subscriptions without forcing a login. They were perfect. Both now dead in the water for me.


For everyone really.


I’m so jaded with the internet at this point. Most phone apps and games are a microtransaction hell. Most social media apps are a race to the bottom with clickbait. Every website is littered with ads and popups and cookie banners.

Time to go outside and forget that the online world exists.


There is an unbelievable amount of content being made whose value is so absurdly low that I welcome this change. For everyone that wants to set up a copy/paste version of a CRUD app or YouTube channel, that is a ton of time and effort being wasted. I'm not saying we should be focused solely on optimizing everyone's time and effort, but it's clear we have tipped the scale too far with how much time and effort is being dedicated to bullshit.

If we as a society decided to channel these efforts into building infrastructure improvements and homes, I think this would help a lot more. I understand the biggest problems in that respect are legal and cultural, but I can’t help but feel people have tried nothing and are all out of ideas.


I have Starlink, which gives me a US IP when I connect to it, and I was surprised to see the amount of ads people there are subjected to. Most advertisers don't do business where I am, so their free tiers are effectively ad-free for me (except Twitch, which chooses to show me US ads for some reason).


> This version of the Internet is starting to suck. :(

What sucks is people flocking like sheep to centralized platforms that don't have their best interests at heart.

Decentralization has been the answer from day 1, but people don't understand shit and make the same mistake over and over again.


Reddit's Eternal September began no later than 2018 when Tumblr banned porn. What's happening now is that it has hit an apparent growth ceiling (most of the most upvoted posts are from more than a year ago) and the owners have realized that they have to make it profitable now or never.

As long as the site is growing, it doesn't have to be profitable. But when the music stops...


I think the big problem with profitability comes directly from why the userbase grew so large to begin with. People use it as entertainment and are uploading content that is expensive to host. Video and images are orders of magnitude larger than simple text and links to that content. If Reddit hadn't taken it upon themselves to host media, they wouldn't be in such a crunch. They also wouldn't have grown so large, true, but there was no question of Reddit's sustainability then. With mainly text and links, weathering the slowdown-in-growth storm would be easier.


> This version of the Internet is starting to suck. :(

Totally! The internet was better before pay walls / auth walls.

I get why we're here today, and I get that the last phase was just about acquiring an audience, and getting people hooked, and now this part of getting everyone to pay was always the plan, but this part really does suck.

Just feels like all the services lined up to start shitting on users at the same time. Netflix, YouTube, Reddit, Twitter, NYT (and all the newspapers really)... you can't watch Amazon Prime without having 300 "buy now" buttons in your face.

This phase of capitalism sucks.


I remember seeing a documentary way back in 2000 about the internet that said something like "the internet is freely accessible now, but some people believe eventually pay walls will begin to block access to more and more portions of the internet." And I thought how ridiculous, that will never happen, information wants to be free!

Well, looks like I was only partially right. Most access is still "free", but at the cost of enshittification.


Ah, the F2P (free-to-play) model. Indeed I’m surprised that microtransactions and pay-to-win type dark patterns have mostly not been adopted on the internet at large. And loot boxes! Can’t forget loot boxes.


Information wants to be free, but information is power. And some people want to hoard all the power for themselves.


If P2P micropayments could somehow have succeeded, then a different Internet could have been possible. Tipping content creators directly is impossible without megacorporations taking a cut, be it Apple, Paypal, Patreon etc, and their unit economics work better with recurring payments, which lands us in subscription hell.

One would almost be tempted to ask for a cheque in the mail like in the old days.


P2P payments/transaction media have been on the way out for decades due to the State's interest in controlling the financial transaction medium as tightly as possible.

The database has doomed us all! For it is the Seed of all Evil in the hands of the Wicked! ...and arguably the Good, but misguided!


Flattr was an interesting take on the micropayment approach: basically you set a subscription amount, which was then spread around the sites you visited.

But I guess for publishers it never produced enough revenue to be worth maintaining the integration.


Cryptocurrency almost solves this, but the scaling issues and the high knowledge barrier to entry are still holding the idea back.


It doesn't solve it at all, I am talking about real money you can spend in a store, not tokens that need to be converted through an exchange into cash each time.


There are stores where you can pay with cryptocurrencies. That said, GP said "almost". We will see if it ever reaches the point where this word will not be needed.


The reason why you can't spend it at the store is because the store doesn't accept it. The reason why the store doesn't accept it is because of the aforementioned scaling and complexity issues.


Embrace the audience, extend the functionality, then exploit the users.


Disagree. Two of its most toxic waterholes are finally gone. Join the wholesome and useful internet. It's out there. I promise.


I just made an updated tool that should work

Replace http://twitter.com with http://traittor.net:

http://twitter.com/elonmusk/status/1675187969420828672

to

http://traittor.net/elonmusk/status/1675187969420828672

It normally bypasses the registered-account requirement and the daily limit.

Unfortunately it does not work with all tweets, especially the recent ones.

Have fun :)
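If you'd rather not edit links by hand, here's a minimal sketch of the same rewrite (nothing assumed beyond the mirror host above):

    import re

    def to_mirror(url: str) -> str:
        # Swap the twitter.com host for the mirror; the path and the
        # status ID pass through unchanged.
        return re.sub(r"^https?://(www\.)?twitter\.com", "http://traittor.net", url)

    print(to_mirror("http://twitter.com/elonmusk/status/1675187969420828672"))
    # -> http://traittor.net/elonmusk/status/1675187969420828672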


It-always-did.jpg.

I'm at least glad that people will start looking into the alternatives.

The next lesson they need to learn is TANSTAAFL. Nothing good can come with an internet where publishers are paid with eyeballs. We need to rescue the idea that "voting with your wallet" is the best and fairest way to have quality content.


Somewhat agree, though the income disparity across nations (and even within them) makes pricing somewhat more difficult - but I agree with the spirit of it having to be non-free.


Also killed my IRC bot that scrapes tweets to display them. IMO it won't help Twitter to have no free read API calls; people won't click on every Twitter link just to read two lines of text. I'd only need ~30 calls a day, but I'm not paying 100 dollars a month (?!) just to read a few tweets.


What's so funny is everyone saying social media sucks and needs to go away. Then when it starts to it's all "no not like that!"


I feel the same. Had 4 twitter accounts I followed on nitter after losing my account, and now, not sure. Feels like a time of (light) mourning.


I had JUST started using Nitter’s RSS feeds. And it was so comfy avoiding all of the “algorithm” too.


HN can run on a $10 droplet with a Cloudflare CDN. It is a text-based forum with a relatively low volume of visitors.


I worry that at some point, Google will start monetising Gmail and "Sign in with Google". This might be a calamity on a much larger scale.


I second you word for word.

Years ago I was drawn to Twitter over Facebook precisely because you could read without being logged in. After a few years it had become a hellish place with lots of flames and arguments, but it still had some value. It became clear that my engagement consisted mainly of arguing with random people about things like Covid vaccines, knowing they would never change their minds (neither would I). It was a huge waste of time, but I found that browsing while not logged in would still let me read without being able to reply to the most stupid comments - some sort of read-only Twitter. Now that that's gone, Twitter has become irrelevant.


The internet has sucked for a long time, it’s just leveling up again.


See you on obscure irc channels :)


At this point even "Truth Social" is looking less bad


Don't hold your breath, its tale of mismanagement is long and sordid.


Lol


> I guess I'm done with Twitter.

People have been saying this for a decade+.

You'll be back.


Ugh, what an eventful time of social media this has been.

First Twitter API, then Reddit API, so today Apollo and many more Reddit clients shut down, and now Nitter. :-(

I'm happy Lemmy is kind of taking off. I think it's helped more than Mastodon because it's less realtime/feed focused and slower paced. It also doesn't require you to form a friend circle to benefit. Instead, the community is waiting for you already. You just sign up on an instance and add your communities. Done. This helped me a lot, together with sites like https://sub.rehab


I really like Lemmy too. I think the biggest issue is that people think of Lemmy itself as the replacement for reddit.com. What makes more sense to me is thinking of Lemmy as a tool to build separate websites that are each a replacement for reddit.com, and that can interoperate with each other to grow based on their local users' interests. I think the biggest hurdle is figuring out where to create your first account... it's just not intuitive, and of course the more established sites shut down signups during mass waves as an anti-September protective measure.

I ended up starting at programming.dev because someone on HN mentioned it, and it at least seemed to have a focus and also wasn't a ghost town. That was pretty good, but I've also joined beehaw (takes some time) because I like its size and decorum, and I would generally choose to be on their side of a defederation. And now that I'm starting to understand how this whole ActivityPub federation and defederation business works, I really am optimistic about it.

I think somebody needs to build something that's a crossover between GitHub Pages and ActivityPub, that behaves like Disqus and integrates with Lemmy/kbin/Mastodon, so that blog writers can have comments at their own sites again and those can knit together to grow organically. I haven't quite pieced it all together, but I can see that growing into a replacement for what we lost with Google Reader and the death of blog commenting communities.


Yeah, good point about instances being communities! It's like how the Star Trek one did it, and my mind boggled when I realized you could make a national Lemmy _instance_ (and someone has done so) rather than just an /r/mycountry. Then everything there is localized! So you basically have a social news _site_ in your language that can federate with all the rest.

Of course, this goes for any major interest category but it just hit me the hardest so far to realize this.

Another cool development would be a science-oriented Lemmy instance with lots of special purpose sciency stuff.

Viewed like this, the sky is the limit for Lemmy and it could have potential to grow a lot!


> Another cool development would be a science-oriented Lemmy instance with lots of special purpose sciency stuff.

Something like this?

https://mander.xyz/communities

> An instance dedicated to nature and science.

> The main focus of this instance is the natural sciences, and the scope encompasses all of the STEM fields.


> I think the biggest hurdle is figuring out where to create your first account

Agreed. I think the barrier would be lower if I knew I could migrate my identity to another instance if the first one became sketchy or shut down or de-federated.

Instead AFAICT I have to choose not just what community to join and where the content will initially live, but also which of these random groups to trust with my identity indefinitely going forward.


Maybe we need some sort of self-identifying system.

Like SSH keys, where you manage your own identity and then share a public key to each instance that identifies you to that instance.

Like an identity client you could self-manage if you wanted to. Make it optional, portable, and transferable, so you can choose to let a server host manage your identity, or migrate to a self-managed identity.

If I had more time on my hands….
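A minimal sketch of the signing half, using Ed25519 via Python's cryptography package (the challenge flow is my guess at how an instance would check it):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # The user generates and keeps the private key; each instance only
    # ever sees the public half.
    identity = Ed25519PrivateKey.generate()
    public_half = identity.public_key()

    # To prove control of the identity to a new instance, sign a challenge
    # the instance issues and let it verify against the registered key.
    challenge = b"nonce-issued-by-instance"
    signature = identity.sign(challenge)
    try:
        public_half.verify(signature, challenge)
        print("identity confirmed")
    except InvalidSignature:
        print("forged")

The portable part is just the private key file: register the public half with a new instance and the same identity follows you.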


I've seen discussions suggesting Mastodon has a way of migrating accounts. I don't know if that's a Mastodon-specific thing or at the ActivityPub level. I don't think it migrates the content, but I'm not sure; what I think it does is notify followers of the migration. I'm not sure anything like PKI is necessary.

Conceptually, a planned migration should have a period of concurrent access to both the old and new accounts, and it should be easy to publish a handshake confirming the migration so followers can update their contact info. That's my thought anyway. Something like Keybase (does that still exist?) could also be used for similar sorts of proofs.

The way people used to handle this on Reddit is people would send a message from their old account saying "hey XYZ is my new account" and that seems sufficient.
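For what it's worth, Mastodon's mechanism rides on the ActivityPub "Move" activity. Roughly, from memory of the spec (the account URLs here are hypothetical):

    # Sketch of the Move activity a migrating account publishes; the new
    # account must also list the old one in its alsoKnownAs aliases for
    # the move to be accepted.
    move_activity = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Move",
        "actor": "https://old.example/users/alice",
        "object": "https://old.example/users/alice",
        "target": "https://new.example/users/alice",
    }

It notifies followers and points them at the new actor; the posts themselves stay behind, which matches what you describe.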


What is Lemmy?



See also https://kbin.pub - kbin instances interoperate with Lemmy, but I somewhat prefer kbin's interface.


One implementation of a Reddit-style interface on top of ActivityPub.

One of the nice things about AP is that whatever interface you like best - Twitter-like, Reddit-like, RSS - can get you access to the same content.


I'm not sure about that entirely. It seems kbin can federate with both Mastodon and Lemmy, and Lemmy and kbin can federate with each other, but I don't think Lemmy and Mastodon can right now. I don't know if that's deliberate or just unimplemented. I will say the kbin/Mastodon interactions seem... strange. You see Mastodon users, but then I don't know why I wouldn't just use a Mastodon client. But I haven't spent a lot of time inside kbin; I prefer the Lemmy interface so far. People have described it as Lemmy being like old.reddit and kbin being like new.reddit in feel.


Frontman for Motörhead. RIP.

Is Lemmy named after Lemmy?


Every keypress is a history navigation!!! What madness is this?


What we had before was a low interest rate phenomenon.


Indeed, I thought HN of all places would understand that the last 15 years were a bubble of zero interest rates. Now, companies have to be profitable or die.


I like tildes much more than lemmy, to be honest.


It always irked me that public institutions embraced Twitter as a primary means of communication in many cases. It never seemed right that a private company was being put between public functions and the public which depends on them. Not to mention, since I live outside the US, a private company in another country. I even wrote complaints to one local public body asking them to set up a communication channel that wasn't controlled by a foreign private entity. Of course, they laughed at me and I went away like a good citizen.

It's hard to map this onto my framework of "reasonable paranoia". Even while I felt uncomfortable about it, it never occurred to me that Twitter would actually cut off access. Now here we are.


Privatization is a thing beyond seemingly fundamental online services.

Local water systems in Guam are privatized, much to the chagrin of local activists.

Elsewhere, there was an article in the Guardian just yesterday about the impact of privatization on water in the UK (or some locale within it). Basically that water company now carries the highest debt of any water company, and it's the one that's privatized.

Privatizing public data is a shortsighted, thoughtless approach to public communication.

So: play stupid games, win stupid prizes.


Electricity markets are a prime example of ideology winning over logic. In Australia, the eastern states have a privatised electricity market while Western Australia has a very limited market with agreements to hold a certain amount of gas fuel at a certain price. Currently the price of electricity is significantly higher in the eastern market than the "less free" western one.


In the UK, water (Thames Water), the country's natural and fundamental resource, is owned by a Chinese wealth fund and Canadian pension funds. How could anyone think this was a good idea?


> How could anyone think this was a good idea?

The profit motive.

https://en.m.wikipedia.org/wiki/Capital_accumulation


Yep, god bless our communist energy policy. While the eastern states were screaming about rocketing energy prices a year back, ours were stable, because the electricity company is state-owned and contributes profit to the public purse, and, as you say, a certain portion of gas has to be reserved for the local market.

Coming from the UK I don’t want competition in basic utilities like there is there. I don’t want to have to shop around for the best energy price every year, or to be at the mercy of the free market on pricing for the most basic of necessities like power and water.


The Western Australian example just demonstrates why petrostates are wealthy - if you have big energy resources you can make stuff cheaper, surprise! And realistically all that's happening there is that Western Australia is effectively imposing a tax on gas profits to subsidize the grid. There's no magic success here - just a redistribution of gas profits to electricity consumers.

I don't actually think the Australian grid is a good example of privatisation failure. There's little proof that we have meaningfully lost anything aside from the marching cry of the left.


WA owns its electricity generation and supply companies, so no, it's not just a tax; it's nationalised infrastructure, and it works great for us.

It’s not just cheaper, it takes the privatised profits off the table and instead feeds them back.

It’s not magic, it’s pretty good sense though, and it serves the people of the state well.


> There's little proof that we have meaningfully lost anything

Did we gain anything? I ask in earnest, I don't know.

I can guess that the costs of running, maintenance, etc. are now "not a cost to the state", but the cost doesn't just disappear; it was always on the consumer via tax or bills (or tax and bills, hopefully in a way that sums to the same cost!).


In theory, you remove government overhead. Government bureaucracies are infamous for expanding without the requirement to make a profit as a constraint.

As it pertains to power, people tend to promote government ownership as a means to make it cheaper - but this is sort of a fallacy: the only way the government can do it cheaper is if it's operationally more efficient to a significant degree, and I don't see how a government achieves this. Private energy generators are usually not very profitable - recently, the people making money were commodity producers, not electricity generators.


There's absolutely no evidence of privatisation driving up costs in Australia.

It's the hysterical marching cry of young, naive people railing against the increasingly vague boogeyman of neoliberalism, but the actual evidence doesn't back it up.

You talk about ideology over logic while demonstrating that exact same thing.

> The AER report contains no consistent correlation between higher bills and privatisation.

> The ABS index of electricity prices across Australia, showing movement of electricity prices over time, also doesn't demonstrate a link between privatisation and price rises.

> Whether comparing electricity bills, prices or the relative price index of electricity in each state, there is no consistent link between privatisation and what consumers pay for their electricity.

> Experts say the biggest influences on what people pay for electricity are costs of transmission and distribution. They say these costs have risen in recent years irrespective of whether the owners of the transmission and distribution networks are privatised.

https://www.abc.net.au/news/2015-03-25/fact-check-does-priva...


A natural resource lying solely in the hands of profit-driven private companies. Take a moment to reflect on the potential risks this poses for the average consumer - the consequences of placing profit above all.

Will you entrust your well-being to CEOs and boards whose sole priority is relentless growth and maximizing profits, regardless of the consequences?

Laws can only do so much to keep vital resources from becoming unaffordable for the very people who depend on them. These companies, with no regard for your vote or input as a citizen, prioritize profits above all else. To them, it's just another business move; they leave the consequences behind as they move on to the next venture.

You can argue and consider other ideas, but the higher risk on the private side is always there.


> These companies, with no regard for your, the citizen's, vote or input, prioritize profits above all else.

And they are easily reined in by the Australian Energy Regulator, while government-owned businesses are not.

We recently had a price cap put on coal and gas prices; private companies had to eat a massive loss, while government-owned generators huffed and puffed until they got billion-dollar bailouts from the rest of the nation for their coal-guzzling power plants that should have been shut down years ago, had they not been taxpayer liabilities.

There's simply no evidence of these things driving up prices. We have a huge government-owned pumped-hydro power storage project underway that has blown out from $2b to $10b in the space of a few years; if it were privately owned it would have gone bankrupt, but instead more and more cash from electricity users will eventually have to be paid to fund it. It's costing 5x more than simply putting grid-scale batteries in the cities, and the government is stuck in a sunk-cost fallacy it can't walk away from.

This is the problem with government ownership of electricity assets in Australia: politics gets completely in the way of good decision making and keeps alive projects that should have failed long ago.

As I said before, where's the evidence, because all I see is ideology.

This is what happens when governments run electricity projects:

> He assured the electorate it would cost $2 billion and be up and running by 2021.

> By April 2019, a contract for part of the project was signed for $5.1 billion — and that doesn't include transmission costs, which will cost billions more.

> Who will actually pay for transmission is still being decided.

> "Someone's going to pay for it," Snowy Hydro CEO Paul Broad told 7.30.

> "The taxpayers will pay for it through your taxes, or you pay for it through your bills.

The current cost is $10b and the timeline has blown out to the end of the decade. No one is admitting how far they've dug or whether targets are being met. The transmission bill is still up in the air but will likely be an additional $5b added onto household bills.

https://www.abc.net.au/news/2019-10-14/snowy-hydro-2.0-expen...


Snowy Hydro 2.0 was a scam from day one. It was created by an LNP government so that it could convince voters it cared about climate change. Once that leader was ousted, the truth came out and the real LNP climate denial was there for all to see.

Also, the LNP have proved time and time again they can't build any public infrastructure.

Their Inland Rail project is another total disaster, wasting many more tens of billions while achieving absolutely nothing.

And how can anyone forget the failure that was the LNP redesign of the National Broadband Network: a design that moved away from fibre optic to instead use copper wire.


Why is it that, despite having some of the largest LNG resources in the world, the Eastern states of Australia pay more for their LNG than customers in Asia?

Is it a coincidence that Western Australia (WA), the state with the most government regulation of LNG exports, also has the lowest LNG prices by a long margin?

In general, WA has the most regulated energy sector with the highest level of public ownership in electricity production and transmission, and by strange coincidence it also has the cheapest gas, cheapest coal, and lowest electricity prices.


I'm asking anyone to provide evidence of privatisation driving up costs in Australia. It's simply not there. The most privatised state in the country has the cheapest prices; this is undeniable.

Put up your "logic" rather than your ideology if you want to convince people otherwise. Downvotes don't count, sorry :)

Prices are the cheapest in Victoria, with full privatisation of the network. Prices are also the most expensive in South Australia, with nearly full privatisation of the network. No rational person can look at that and proclaim there is a correlation.

We have states where nearly all generation, distribution and transmission is government-owned, and we also have states completely out of the business. This should make it a simple slam dunk for the people who loudly make these supposedly easily proven claims, and yet they never have any proof.

Prices are rising uniformly across the board, including in the heavily government-owned states, which coincidentally are the worst at rolling out renewables because they are protecting their fossil fuel golden eggs at the expense of the environment.

https://theconversation.com/myths-not-facts-muddy-the-electr...


Two eight-year-old pieces don't reflect the recent changes in Australian electricity prices, the period in which the most grievous examples of price hikes have occurred.

While they are interesting time capsules, one wonders whether Lynne Chester holds the same opinions today or has updated the 2007-2014 price tables with more current data.

Personally I'd look less at the month-to-month prices and more at the decade-on-decade projections: what are the winning long-term strategies for cost-effective power generation with the lowest climate-damaging emissions totals?


Play stupid games, externalize stupid prizes.


Facebook cut off access to Australian fire brigade pages, among others.


Facebook news is shutting off in Canada where there are currently wildfires. I'm leaving no comment on why.


Wrong.

> The move comes in reaction to the federal government's Online News Act, Bill C-18, which would require the tech giant to pay Canadian media companies for linking to or otherwise repurposing their content online

https://www.cbc.ca/player/play/2221275715931


The fuck is wrong?


No comment on why?

It is very clear that it is due to them not wanting to comply with Bill C-18.

It's been reported on by every major news outlet that I've seen.


Wow, that's a pretty interesting story. Intrigued to see how it goes. As a lawyer for a tech co I've always wanted to entertain a situation like this (a small market behaving so absurdly that you simply leave it), but it's pretty rare. Good test case.


Something similar happened in Spain in 2014, when a local law would have forced Google to pay for linking or using abstracts. They shut down Google News.

https://www.theguardian.com/media-network/2014/dec/12/google...

It remained closed until 2022 when the law was changed and newspapers were allowed to negotiate directly with Google individually.

https://www.reuters.com/technology/google-news-re-opens-spai...


But Google and Facebook both said the same would happen in France and Australia, and they are now complying in both countries: Spain caved too quickly.


Too quickly? I can’t tell if you are joking.


The Australian bill is different and allows negotiations, no?


How is US-based Facebook related to wildfires in Canada? Check cbc.ca for updates related to the wildfires.


Due to revenue-sharing rules that just went into effect in Canada, designed to prevent news from being absorbed into Facebook without the ads on the original website ever being seen, FB is shutting off link sharing to cbc.ca. People can go there directly, but ultimately, safety information being available everywhere is a good thing.


Perhaps it's only me, but Facebook is the last place I would think of to search for information about wildfires.


Many people I know seem to basically rely on stuff they need to know showing up in their Facebook (or sometimes, even more bizarrely, Instagram) feed.


A lot of people spend time on Facebook. Therefore, it's a good way to push urgent information.

It's not where I would look for information. But that assumes I know I should be looking.


one doesn't search for it, one stumbles onto it

and the only place to stumble onto information is where one spends one's time - facebook, reddit, heck - even hackernews


A large chunk of the older people in my life unfortunately still rely on FB for some news.


The "revenue sharing rules" you speak of were a shakedown. Imagine if YC had to pay Tech Crunch for linking this post. They would shut down HN immediately. That's what they are requiring of FB.


Yeah, except they aren't just providing a link: they are wrapping up the headline, a picture, and the first paragraph or two from the story onto timelines surrounded by their ads, which sends no revenue to the source. People don't click through. It just looks like "Facebook News". It's not that much different from a content farm.


> Yeah, except they aren't just providing a link: they are wrapping up the headline, a picture, and the first paragraph or two from the story onto timelines surrounded by their ads, which sends no revenue to the source

Only to the extent allowed by the news websites' own markup and headers. Maybe this whole thing could have been solved by politicians understanding tech a little better?
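For what it's worth, the rich preview is built largely from the Open Graph meta tags that the publisher itself puts in the page, so a site has real control over what Facebook wraps up. A minimal sketch of what a crawler reads, in Python with only the standard library (the sample HTML is made up for illustration):

    # Extract the Open Graph tags a crawler uses to build a link preview.
    from html.parser import HTMLParser

    SAMPLE_PAGE = """<html><head>
      <meta property="og:title" content="Example headline" />
      <meta property="og:description" content="The first paragraph..." />
      <meta property="og:image" content="https://example.com/lead.jpg" />
    </head><body>story text</body></html>"""

    class OGParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.og = {}

        def handle_starttag(self, tag, attrs):
            if tag == "meta":
                attrs = dict(attrs)
                prop = attrs.get("property", "")
                if prop.startswith("og:"):
                    self.og[prop] = attrs.get("content", "")

    p = OGParser()
    p.feed(SAMPLE_PAGE)
    print(p.og)  # {'og:title': 'Example headline', ...}

A publisher that omits or trims these tags gets a much barer preview, which is why I'd argue much of this fight is over defaults rather than anything the sites couldn't already change.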


> Maybe this whole thing could have been solved by politicians understanding tech a little better?

How often do we hear, on HN and elsewhere, people complain about laws they don't like and wish politicians better understood what they are legislating? This implies the existence of a technological solution, set by technocrats who "understand" better. Whoever asks the question suggests politicians don't know better, offering that explanation without proof.

Also, for the Free Market enthusiasts out there, why hasn't the market solved this problem? What forces are preventing all parties from working out a technical and economic solution?


The market already sorted it out. Fat-cat greedy news corps bought the politicians to circumvent the market.

This is like mandating that car makers pay whip makers.

> Also, for the Free Market enthusiasts out there, why hasn't the market solved this problem?

Why are you assuming the market hasn't solved it? And that too without proof?

> Whoever asks the question suggests politicians don't know better, offering that explanation without proof.

It is easy: there is zero discussion in the bill of any technology beyond the sharing of links.


Or rather, they would shut down linking to TechCrunch.


Imagine if HN just copied the first few paragraphs of the TechCrunch article and stole the pictures as well, completely cutting TechCrunch out of the loop.


Yep and nothing of value was lost.


Only the Canadian govt. is to blame for this.


The government of Canada is of the people, by the people, for the people of Canada. You're saying it's the people of Canada who are responsible for a business interest refusing to extract their wealth if it cannot be done for free, and you're entirely correct.

Goodbye Meta and Google and other greedy foreign exploiters and don't let the door hit you on the way out.


> The government of Canada is of the people, by the people, for the people of Canada. You're saying it's the people of Canada who are responsible for a business interest

So they can never make mistakes and are blame-free in any scenario?

> refusing to extract their wealth if it cannot be done for free, and you're entirely correct.

Yes, now they can keep all their wealth with them. Why are they complaining? Why fret over this if extraction is being stopped by these companies shutting down link sharing?

> Goodbye Meta and Google and other greedy foreign exploiters and don't let the door hit you on the way out.

Yes, it is the foreign companies who are greedy and not the rent-seeking mega news corps who want $$$ for linking to them, lol.

There are no large media companies in Canada, just independent journalism pure in heart and no commercial intent. /s

> greedy foreign exploiters

So xenophobia is a cause for this action?


In the Australian case, Facebook acted in retaliation.

The law was about news media, yet Facebook shut down a bunch of unrelated government services pages.

Facebook definitely deserves a lot of blame. Likely in this case too. I'm pro-capitalist as much as the next guy, but laws are by the people; if a business wants to extract money from our community, it needs to play by our rules.

Facebook cutting access is like a badly raised toddler throwing a tantrum.


By this logic, shouldn't news orgs pay people in their stories?

> I'm pro-capitalist as much as the next guy, but laws are by the people; if a business wants to extract money from our community, it needs to play by our rules.

Wait, if FB was extracting money, wouldn't FB shutting down linking cause more money to flow into news orgs? Shouldn't you celebrate this as it will reduce the "extraction"?

Why fret over this if extraction is being stopped?

If news orgs truly believe in extraction, they would be celebrating this shutdown in the streets.

> Facebook cutting access is like a badly raised toddler throwing a tantrum.

In the free world, we are allowed to choose our actions to some extent.


People are moving. https://redditmigration.com


Public institutions now often require agreeing to the Google Account ToS to get public schooling.

IMO this should be illegal.


And if it's not Google it's Microsoft. Which is, and I can't believe I'm saying this, the lesser of two evils at this point. There's a 2003 me staring through a time travel portal absolutely aghast that 2023 me is saying something like this.


Microsoft has always been an apex predator, like a lion or maybe a hippo. It can end you in a single bite, but at least you know what you’re dealing with.

Google is more like a malaria carrying species of mosquito. Little bites that you barely notice, but in aggregate are actually much more dangerous.

I know the metaphor is a bit stretched, but even in the late '90s it was clear what Microsoft did. Google has gone from "don't be evil" to "this is the definition of open source" to …


Hippos aren't predators.


True, but they are widely known as being one of the most dangerous animals in Africa; number 11 on this list [0]. So you (probably) know what you’re dealing with.

Of course my point was that the mosquito, whose bite is far less “annoying”, is actually #1.

[0] https://www.sciencealert.com/what-are-the-worlds-15-deadlies...


All big tech companies that store non-e2ee data are equally evil, because they all by law must make that data available to government spies without a warrant. (They know this, and they still collect the data.)

These same government spies run an international network of torture centers.


Every public school and government institution for at least the last two decades has depended on private SaaS companies for everything from recording grades to managing school lunches, payroll, attendance, and almost every other facet of education.


It does raise the question of why the public sector doesn't aim to create a public, open-source software division for all of these administrative needs. A jobs program and all that.


Because it would be full of unique unpatched exploits, 20 years behind schedule and 10x over budget.

I wish the US government was like that, full of decisive, well-intentioned, intelligent, bootstrapped innovators, but the motif of government projects is things like:

One bathroom stall costing millions to build and running years behind schedule

City trash bins costing hundreds of thousands

Airplane trash bins costing tens of thousands

And the list could certainly go on; those are just the recent ones I've read about.

I'm watching a government software replacement worth tens of billions, and seeing the attempt to integrate it is painful. They sure aren't agile.

I don’t work in tech.


That's just it. Create a publicly visible repo of code for all of this administrative government software. Let the public weigh in. Create a contribution process. We're not talking state secrets. Just a quasi-open-source government platform.


Your average mid level developer (or in my case cloud consultant) at BigTech makes more than the school superintendent in most major cities in the US.

How are they ever going to be able to compete for talent?


That seems silly to me. Private solutions are almost always going to be better, more secure, and cheaper.


For the explicit purpose of my personal data not being siphoned off, owned, and monetized by third-party private companies via proxy interactions with the government (like, you know, public school) that are virtually unavoidable for most citizens.


You trust the competence of the government to keep your information private?

And I can guarantee you they are going to bring in high-priced consultants to do the work, because no government or board of education is ever going to pay software developers their market value and keep them on the payroll.


Private companies (Salesforce, Snowflake, etc.) offer "government cloud" services where they are restricted by law (FedRAMP and others) from using your data for their own purposes. Do you not believe that is actually happening? I've worked on them myself.


I find the “more secure” claim hard to believe. More eyes on projects is always a good thing, and “security through obscurity” is a fallacy.


There have been a number of security vulnerabilities that have lingered in open source for years.

"Open source is more secure" is just as much of a fallacy.


Cheaper as long as costs are externalized, sure.


It’s cheaper because an edTech company can create a piece of software once and sell it to many different school systems at a low marginal cost. That’s kind of how software works.


Would you say with a straight face that Microsoft Teams is better than Zulip/Jitsi?


Yes, for large corporations and governments that aren't just working with Teams: they are working with Microsoft Office and SharePoint too, and want something that works well and is understood by their contractors. Teams is already bundled and integrated with their massive enterprise Microsoft contracts.

In a given month, as a consultant I’m working with clients that use Slack, Teams, Google Meet, Zoom, and if I’m initiating a meeting, Amazon Chime.


Unfortunately, we are living in a world where using a private company's social media is the most effective way to communicate public announcements, since lots of people use it daily. I am not sure people would install a government social media app just to receive information pertaining to public matters.


At that point, “just” normalize RSS feeds. They do an even better job, with none of the centralization. My state already sends weather warnings via RSS, and I'm sure lots of other things I haven't explored yet.
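Consuming a feed really is trivial, too; a minimal sketch in Python using only the standard library (the warning feed here is made up for illustration):

    # Parse a made-up RSS 2.0 feed of weather warnings, stdlib only.
    import xml.etree.ElementTree as ET

    FEED = """<?xml version="1.0"?>
    <rss version="2.0"><channel>
      <title>State Weather Warnings</title>
      <item>
        <title>Severe thunderstorm warning</title>
        <link>https://alerts.example.gov/1234</link>
        <pubDate>Fri, 30 Jun 2023 12:00:00 GMT</pubDate>
      </item>
    </channel></rss>"""

    root = ET.fromstring(FEED)
    for item in root.iter("item"):
        print(item.findtext("title"), "->", item.findtext("link"))

No account, no algorithm, no platform in between: any client that speaks the format can subscribe straight to the source.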


No ad revenue can be made off RSS. Google made sure of that.


A government is not making ad revenue.


That was true yesterday. It doesn't seem true today.


Don't forget to avoid reliance on a private company's app store while you're at it!

You'd need to convince people to side-load an app from a government website, or install a government app store.


Would you trust a government App Store?


Depends on the government. Where I live, yes. In many other countries, no.


Publishing into walled gardens is fine. But government should also publish on its own website.


On the other hand, public institutions have long been criticized for not communicating where the people are. Notices posted on their own websites are unlikely to be read by anywhere near as broad an audience. Plenty still use newspapers, but even my grandparents get stuff from Twitter before the paper now, if just from younger people sharing it with them. I agree Twitter shouldn't be the only channel, but it is where the people are, or at least were.


The answer would seem to be POSSE - Publish on your Own Site, Syndicate Everywhere

https://indieweb.org/POSSE
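The pattern itself is almost trivially simple: the canonical post lives on your own site, and every copy pushed to a silo links back to it. A rough sketch in Python; the alert, the silo endpoints, and everything else here are hypothetical placeholders, not any real API:

    # A rough POSSE sketch: publish on your own site, syndicate copies
    # that each carry the canonical link home. Endpoints are hypothetical.
    CANONICAL_URL = "https://example.gov/alerts/2023-06-30-fire-warning"
    SUMMARY = "Fire warning issued for the northern district."

    SILOS = {
        "mastodon": "https://social.example/api/v1/statuses",  # placeholder
        "twitter": "https://api.example/tweet",                # placeholder
    }

    for name, endpoint in SILOS.items():
        payload = {"status": f"{SUMMARY} {CANONICAL_URL}"}
        # A real implementation would POST this to each silo's API with
        # auth; the point is that readers always get routed back home.
        print(f"would POST to {name} ({endpoint}): {payload}")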


If only we had a standard to make syndication of data simple. I mean really, really simple. It would have to be so simple the standard could even be called Really Simple Syndication.


Oh what a great idea! And maybe Google could make something to read all those!


It’ll never last…


How does this help fix the essentially non-technical problem that, on long enough time horizons (maybe 5 years at most these days), content optimized to provoke reaction ("engagement") outcompetes content optimized to promote contemplation, on any network, whether syndicated or not?


It doesn't, but I don't think governmental agencies particularly have a problem with posting outrage clickbait stories.


Yeah, they pay people to do that for them.