Hacker News
The Internet Is Full of AI Dogshit (aftermath.site)
805 points by thinkingemote 4 months ago | 587 comments

One aspect of the spread of LLMs is that we have lost a useful heuristic: poor spelling and grammar used to be a quick signal for filtering out worthless posts.

Unfortunately, this doesn't work at all for AI-generated garbage. Its command of the language is perfect - in fact, it's much better than that of most human beings. Anyone can instantly generate superficially coherent posts. You no longer have to hire a copywriter, as many SEO spammers used to do.

curl's struggle with bogus AI-generated bug reports is a good example of the problems this causes: https://news.ycombinator.com/item?id=38845878

This is only the beginning, it will get much worse. At some point it may become impossible to separate the wheat from the chaff.

We should start donating more heavily to archive.org - the Wayback Machine may soon be the only way to find useful data on the internet, by cutting out anything published after ~2020 or so.

I won't even bet on archive.org to survive. I will soon upgrade my home NAS to ~100TB and fill it up with all kinds of information and media /r/datahoarder style. Gonna archive the usual suspects like Wikipedia and also download some YouTube channels. I think now is the last chance to still get information that hasn't been tainted by LLM crap. The window of opportunity is closing fast.

> ~100TB

That's a lot, compared to mine. How do you organize replication, and do you make backups on any external services? I kinda do want to hoard more, but I find it complicated to deal with at large scale. It gets expensive to back up everything, and HDDs aren't really a solid medium long-term. Now, I can kinda use my judgment of what is important and what is essentially trash I store just in case, but losing 100TB of trash would be pretty devastating too, TBH.

The last chance to get reliable information that hasn't been tainted by the bullshit of LLM hallucinations is CyberPravda (dot) com project. The window of opportunity is closing fast.

It will be like salvaging pre-1945 shipwrecks for their non-irradiated metal.

It's funny, I made the same analogy the other day: https://twitter.com/gramofdata/status/1736838023940112523

I think something interesting to note is that once we stopped atmospheric nuclear testing steel radiation levels went back down and are almost at normal background levels. So maybe the same thing will happen if we stop using GenAI.

It was quite a limited set of entities that did atmospheric nuclear testing. The same cannot be said about LLMs.

My post-apocalyptic scenario, which I half-jokingly predicted some 5-10 years ago, was that all general-purpose computing hardware, which is common and relatively cheap now, will be abandoned and possibly outlawed in the end. People will use non-rootable thin clients to access Amazon, which will have general-purpose hardware, but it will be heavily audited by government entities.

Let's hope that, like irresponsible nuclear weapons tests, we also experience a societal change that eventually returns things back to a better way.

Interesting idea. Could there be a market for pre-AI era content? Or maybe it would be a combination of pre-AI content plus some extra barriers to entry for newer content that would increase the likelihood the content was generated by real people?

I'm in the camp where I want AI and automation to free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.

I don't want AI to be at the forefront of all new media and artwork. That's a terrible outcome to me.

And honestly there's already too much "content" in the world and being produced every day, and it seems like every time we step further up the "content is easier to produce and deliver" ladder, it actually gets way more difficult to find much of value, and also more difficult for smaller artists to find an audience.

We see this on Steam where there are thousands of new game releases every week. You only ever hear of one or two. And it's almost never surprising which ones you hear about. Rarely you get an indie sensation out of nowhere, but that only usually happens when a big streamer showcases it.

Speaking of streamers, it's hard to find quality small streamers too. Twitch and YouTube are saturated with streams to watch but everyone gravitates to the biggest ones because there's just too much to see.

Everything is drowning in a sea of (mostly mediocre, honestly) content already, AI is going to make this problem much worse.

At least with human generated media, it's a person pursuing their dreams. Those thousands of games per week might not get noticed, but the person who made one of them might launch a career off their indie steam releases and eventually lead a team that makes the next Baldur's Gate 3 (substitute with whatever popular game you like)

I can't imagine the same with AI. Or actually, I can imagine much worse. The AI that generates 1000 games eventually gets bought by a company to replace half their staff and now a bunch of people are out of work and have a much harder uphill battle to pursue their dreams (assuming that working on games at that company was their dream)

I don't know. I am having a hard time seeing a better society growing out of the current AI boom.

> free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.

This experiment has been run in most wealthy nations and the artwork renaissance didn't happen.

Most older people don't do arts/sciences when they retire from work.

From what I see of younger people who no longer have to work (for whatever reason), they don't become artists either, given the opportunity.

Or look at what people of working age do with their free time in evenings or weekends after they've done their work for the week. Expect people freed from work to do more of the same as what they currently do in evenings/weekends: don't expect people will suddenly do something "productive".

> Most older people don't do arts/sciences when they retire from work.

This isn't my experience. I know a bunch of old folks doing woodcarving, quilting, etc. It's just not the kind of art you've got in mind.

You don’t want older folks to generate reams of good art for consumption. Let the youngsters who need to make money do that. And many artistically-oriented youngsters do create art in their off hours from work, at least out here. I don’t think they think of it as “production” though. Why does a bird sing?

What retirees often do, rather, is develop an artist’s eye for images, a musician’s ear for sounds, a philosopher’s perspective, a writer’s voice, etc. This often involves broader exposure to and consumption of the arts and studying art history. Sometimes producing actual art as well…but less for the final artistic product than to engage in the artistic process itself, so as to develop that way of seeing/feeling/being an artist has. When the work-related chunk of the mind is wholly freed up for other pursuits, there is often such a bit-flip. And since it is a deepening appreciation and greater consumption, there is no risk of overproduction of art and the soul devolution that arises from hyper-competitiveness in the marketplace.

Becoming an artist is difficult. Sure, anyone can pick up a tool of their preference and learn to noodle around. But producing artwork engaging enough to power a renaissance takes years of practice to master. We think that artists appear out of nowhere, fully formed, an impression we get from how popularity and spread work. Look under the surface, read some biographies of artists, and it turns out that, with few exceptions, they all spent years going through education, apprenticeships, and general obscurity. Many of the artists we respect now weren't known in their lifetimes. The list includes Vincent van Gogh, Paul Cézanne, Claude Monet, Vivian Maier, Emily Dickinson, Edgar Allan Poe, Jeff Buckley, Robert Johnson, you get the idea.

They do art.

It’s just published on YouTube. Seriously the quality of diy videos and everything is sometimes PBS or BBC quality or better.

Well I do art in the evenings and weekends, so we exist you know

I'll go one further, though I expect to receive mockery for doing so: I think the internet as we conceive of it today is ultimately a failed experiment.

I think that society and humanity would be better off if the internet had remained a simple backbone for vetted organizations' official use. Turning the masses loose on it has effectively ruined so many aspects of our world that we can never get back, and I for one don't think that even the most lofty and oft-touted benefits of the internet are nearly as true as we pretend.

It's just another venue for the oldest of American traditions at this point: Snake Oil Sales.

I won't mock you, I get where you're coming from, but I think you're forgetting just how revolutionary many aspects of the internet have been. The ability to publish to a potentially global audience without a corporate mediator. Do commerce without physically going to a store or ordering over a phone. Access to information, culture and education beyond what can fit in one's local library. Bank without an ATM. Even just being able to communicate worldwide without long-distance charges (remember those) or an envelope and stamp. Even social media, which everyone hates, was a revolution in that it got people easily using the web to network and communicate en masse, whereas prior it was just people behind pseudonyms on niche forums. There is a real and tacit improvement in the quality of life for at least millions of people behind each of those.

Reducing the internet to only world-destroying negatives and writing off its positives as "snake oil" seems unnecessarily hyperbolic, as obvious as the negatives are. Although I suppose it's easier to accept the destruction of the internet if you believe that it was never worth anything to begin with. But I disagree that nothing of value is being lost. Much of value is being lost. That's what's tragic.

Humans will use whatever means available to us to spout bullshit, misinformation and peddle snake oil.

The Internet has just made it easier for us to communicate, in doing so it has made the bad easier, but it has also made the good easier too. And fortunately there's still a lot more good than bad.

So I totally disagree with you there, bettering communication only benefits our species overall.

Gay rights is a great example, we only got them because of the noise and ruckus, protests, parades, individuals being brave and coming out. It's easy to hate a type of person if you've never been exposed to or communicated with them. But sometimes all it took to change the opinion of a homophobic fuck was finding out their best friend, their child, their neighbour who helps out all the time, was gay. Then suddenly it clicks.

Though certainly the Internet is slightly at odds with our species; we didn't evolve to communicate in that way so it's not without its challenges.

Get off my lawn! ;]

The AI that generates 1000 games eventually gets bought by a company

That seems like only a temporary phenomenon. If we've got AI that can generate any games that people actually want to play then we don't need game companies at all. In the long run I don't see any company being able to build a moat around AI. It's a cat-and-mouse game at best!

> In the long run I don't see any company being able to build a moat around AI

Why do you think they are screaming about "the dangers of AI"? So they can regulate it and gain a moat via regulatory capture.

I don't think regulation will achieve what they want. Nothing short of a war-on-drugs style blanket prohibition would work. And you can look there to see how ineffective that's been at keeping drugs off the streets.

Another example of this behavior: the war on drugs not working didn't stop alcohol companies from lobbying for it. Any effect that suppresses competition is valuable, and it's not like OpenAI and the like will be paying for enforcement - you will be.

I'd be very, very surprised if OpenAI was successful in setting up a war-on-drugs style regime that simultaneously sets them up as one of the sole providers of AI (a guaranteed monopoly on AI in the US). One of the big reasons is that it would put the US at an extreme disadvantage, competitively speaking. OpenAI would not be able to hire every single AI developer, so all of that talent would leave the US for greener pastures.

>> If we've got AI that can generate any games that people actually want to play then we don't need game companies at all.

> Why do you think they are screaming about "the dangers of AI"?

Perhaps it's those of us who enjoy making games or are otherwise invested in producing content that are concerned about humanity being reduced to braindead consumers of the neverending LLM sludge, who scream the loudest.

Yes, but we don't get to sit in Congressional committee hearings and bloviate about Existential Risks.

Or there is only one game…a metaverse in which you create a new game by customizing the world through ai generated content and rules.

Fantasy below, Star Wars above—in a galaxy far far away.

> In the long run I don't see any company being able to build a moat around AI.

This feels like a fantasy.

Think how many game developers were able to realize their vision because Unity3D was accessible to them but raw C++ programming was not. We may see similar outcomes for other budding artists with the help of AI models. I'm quite excited!

I'm cautiously optimistic, but I also think about things like "Rebel Moon". When I was growing up, movies were constrained by their special effects budget... if some special effects "wizard" couldn't think of a way to make it look like Luke Skywalker got his hand cut off in a light saber battle, he didn't get his hand cut off in a light saber battle. Now, with CGI, the sky is the limit - what we see on screen is whatever the writer can dream up. But what we're getting is... pretty awful. It's almost as if the technical constraints actually forced the writers to focus on crafting a good story to make up for lack of special effects.

Except 'their vision' is practically homogeneous. I can't even think of a dozen Unity games that broke the mould and genuinely stand out, out of the many tens of thousands (?).

There's Genshin Impact, Pokemon Go, Superhot, Beat Saber, Monument Valley, Subnautica, Among Us, Rust, Cities:Skylines (maybe), Ori (maybe), COD:Mobile (maybe) and...?

> Except 'their vision' is practically homogeneous. I can't even think of a dozen Unity games that broke the mould and genuinely stand out, out of the many tens of thousands (?).

You could say the same about books.

Lowering the barriers to entry does mean more content will be generated, and that content won't meet the same bar as when a middleman was the arbiter of who gets published. But at the same time, you'll likely get more hits and new developers, because you're getting more people swinging faster to test the market and hone their eye.

I am doubtful that there are very many people who hit a "Best Seller" 10/10 on their first try. You just used to never see it or be able to consume it, because their audience was like 7 people at their local club.

It's now over eighteen years since the first few games made with Unity came out, and at best, being generous, there are maybe two dozen.

Which suggests even after several iterations the vast vast majority of folks are not putting out anything noteworthy.

Necropolis, Ziggurat... IMO the best games nowadays are often those that no one has heard of. Popularity hasn't been a good metric for a long while. And thankfully games like "New World" and "Starfield" are helping a lot for the general population to finally figure this out.

I don't agree with you at all.

Angry birds, Slender: The Eight Pages, Kerbal Space Program, Plague Inc, The Room, Rust, Tabletop Simulator, Enter the Gungeon, Totally Accurate Battle Simulator, Clone Hero, Cuphead, Escape from Tarkov, Getting Over It with Bennett Foddy, Hollow Knight, Oxygen Not Included, Among Us, RimWorld, Subnautica, Magic: The Gathering Arena, Outer Wilds, Risk of Rain 2, Subnautica: Below Zero, Superliminal, Untitled Goose Game, Fall Guys, Raft, Slime Rancher, Firewatch, PolyBridge, Mini Metro, Luckslinger, Return of the Obra Dinn, 7 Days to Die, Cult of the Lamb, Punch Club.

Many more where those came from

Some other Unity games that are fun, and which others haven't mentioned:


Escape Academy


Monster Sanctuary


Kerbal Space Program is another.

True, KSP definitely qualifies as breaking the mould.

Rimworld. Dyson Sphere Program. Cult of the Lamb. Escape from Tarkov. Furi. Getting over it with Bennett Foddy. Hollow Knight. Kerbal Space Program. Oxygen not included. Pillars of Eternity. Risk of Rain 2. Tyranny.

I'd say all of those do some major thing that makes them stand out.

and Outer Wilds!

True, it definitely would count, at least more so than COD:Mobile.

The Long Dark.

I await more of the story campaign with bated breath. I'm adoring it, though the last episode felt a tad rushed, or flat maybe. To me, at least.

Valheim lol

Yeah, I can definitely see how Beat Saber, Hollow Knight, and Tunic didn’t really do anything particularly creative or impressive. /s

I mentioned Beat Saber? Did you skip reading the list?

Surely this time, a new invention will give people more leisure time, instead of making it easier to do more work.

Surely this time...

> I'm in the camp where I want AI and automation to free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.

That is, to put it bluntly, hoping for a technological solution to a social problem. It won't happen. Ever.

We absolutely, 100% DO NOT have the social or ideological framework necessary to "free people from drudgery." The only options are 1) be rich, 2) drudge, or 3) starve. Even a technology as fantastic as a Star Trek replicator won't really free us from that. If it enables anything, the only new option provided by replicators would be: 4) die from an atom bomb replicated by a nutjob.

> free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history

Just like the industrial revolution or just like desktop computers?

> Could there be a market for pre-AI era content?

Like the market for pre-1940s iron resting at the bottom of seas and oceans, unsullied by atmospheric nuclear bomb testing.

the problem is that data tends to become less useful/relevant over time as opposed to iron that is still iron and fulfills the same purpose

Well, the AI-generated data is only as useful as the data it's based upon, so no real difference there.

That was the first thing that came to mind for me as well

Extra barriers! LOL. Everything written by me (a human) that I have ever submitted to HN, reddit and others in the past 12 months gets rejected as self-promotion or some other BS, even though it is totally original technical content. I am totally over the hurdles to get anything I do noticed, and as I don't have social media, it seems the future is to publish it anywhere and rely on others or AI to scrape it into a publishable story somewhere else at a future date. I feel for the moderators' dilemma, but I am also over the stupid hoops humans have to jump through.

So true, the barrier to entry is already too high.

Silly prediction: the only way to get guaranteed non-ai generated content will be to go to live performances of expert speakers. Kind of like going to the theater vs. TV and cinema or attending a live concert vs. listening to Spotify.

You could hash the hoard and stick the signature on a blockchain.
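The hashing half of that is only a few lines. Here is a minimal sketch, assuming a local directory of hoarded files; the actual blockchain submission (e.g. embedding the root hash in a transaction) is elided:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Stream a file through SHA-256 so large archives don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hoard_fingerprint(root: Path) -> str:
    """Combine per-file digests (sorted, so ordering is deterministic)
    into one root hash. Publishing this single hex string on-chain
    timestamps the entire collection at once."""
    lines = sorted(
        f"{file_digest(p)}  {p.relative_to(root)}"
        for p in root.rglob("*") if p.is_file()
    )
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()
```

Anyone holding a copy of the hoard can recompute the fingerprint and compare it against the on-chain record; a single changed byte in any file changes the root hash.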

If only it was 2018, we could do this as a startup and make a mint.

We'll fight one buzzword with another!

There are companies that do this! It’s for proving that something existed at a particular moment back in time.

Until the experts are replaced by "experts" with teleprompter/earpieces.

> Could there be a market for pre-AI era content?

Yes, but largely it'll be people who don't want to train their AIs on garbage produced by other AIs

and a market for books published before 2022 (minus self-publishing on Amazon) :-)

Love that sentiment! The Internet Archive is in many ways one of the best things online right now IMO. One of the few organisations that I donate regularly to without any second thoughts. Protect the archive at all costs!

I update my Wikipedia copy every few months, but I can't really afford to back up the Internet Archive. I do send them around $10 every Christmas as part of the $100 bucks I give to my favorite sites like archive, wikipedia, etc.

~2020, the end of history

Well, it was 2020. You kind of expect "ah, now we see where humanity's behavior went wrong." Hindsight and all.

I’m afraid that already happened after World War I, according to the final sentence of 1066 and All That <https://en.wikipedia.org/wiki/1066_and_All_That>:

> America was thus clearly Top Nation, and History came to a .

(For any confused Americans, remember . is a full stop, not a period.)

so the mayans were off by merely a decade

They probably just forgot some mundane detail.

Things go in cycles. Search engines were so much better at discovering linked websites. Then people played the SEO game, wrote bogus articles, cross-linked this and that, and everyone got into writing. Everyone writes the same cliches over and over, and the quality of search engines plummets. But then, since we are regurgitating the same thoughts over and over again, why not automate it? Over time people will forget where the quality posts came from in the first place, e.g. LLMs replace Stack Overflow, which replaced technical documentation. When the cost of production is dirt cheap, no one cares about quality. When enough is enough, people will start to curate a web of word of mouth again.

What I typed above is extremely broad-stroked and lacking nuance. But generally I think the quality of online content will go to shit until people have had enough, then behaviour will swing the other way.

Nah, you got the right of it. It feels like the end of Usenet all over again, only these days cyber-warlords have joined the spammers and trolls.

Mastodon sounded promising as What's Next, but I don't trust it-- that much feels like Bitcoin all over again. Too many evangelists, and there's already abuse of extended social networks going on.

Any tech worth using should sell itself. Nobody needed to convince me to try Usenet, most people never knew what it was, and nobody is worse off for it.

We created the Tower of Babel-- everyone now speaks with one tongue. Then we got blasted with babble. We need an angry god to destroy it.

I figure we'll finally see the fault in this implementation when we go to war with China and they brick literally everything we insisted on connecting to the internet, in the first few minutes of that campaign.

I hope / believe the future of social networks will go back to hyperlocal / hyperfocused.

I am definitely wearing rose-tinted glasses here but I had more fun on social media when it was just me, my local friends, and my interest friends messing around and engaging organically. When posting wasn't about getting something out of it, promoting a new product, posting a blog article... take me back to the days where people would tweet that they were headed to lunch then check in on Foursquare.

I get the need for marketing, etc etc. But so much of the internet and social media today is all about their personal branding, marketing, blah. Every post has an intention behind it. Every person is wearing a mask.

The decentralized social network Mastodon did not have an unbiased algorithm for analyzing the reliability of information and assessing the reputation of its authors. This shortcoming is now being addressed by a new method - we are creating the CyberPravda (dot) com platform for disputes, with an unbiased mathematical algorithm for assessing the reliability of statements, where people are accountable with personal reputation for their knowledge and arguments.

> It feels like the end of Usenet all over again

Eternal LLMber.

Great phrase!

I can see it already! The war with China... then we find ourselves around the camp fire with the dads and mums cooking food, the boys and girls singing songs and the grandparents telling stories about times long gone.

I feel like somehow this is all some economic/psychological version of a heat equation. Anytime someone comes up with some signal with economic value that value is exploited to spread the signal back out.

I think it’s similar to a Matt Levine quote I read which said something like Wall Street will find a way to take something riskless and monetize them so that they now become risky.

Insular splinternets with Web of trust where allowing corporate access is banworthy?

> You no longer have to hire a copywriter, as many SEO spammers used to do.

I used to do SEO copywriting in high school and yeah, ChatGPT's output is pretty much at the level of what I was producing (primarily, use certain keywords, secondarily, write a surface-level informative article tangential to what you want to sell to the customer).

> At some point it may become impossible to separate the wheat from the chaff.

I think over time there could be a weird eddy-like effect to AI intelligence. Today you can ask ChatGPT a Stack Overflow-style question and get a Stack Overflow-style response instantly (complete with taking a bit of a gamble on whether it's true and accurate). Hooray for increased productivity?

But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up, instead becoming a loop of sometimes-correct goop. Maybe that becomes a problem as technology evolves? Or maybe they train on technical documentation at that point?

I think you are generally correct in where things will likely go (sometimes correct goop) but the problem I think will be far more existential; when people start to feel like they are in a perpetual uncanny valley of noise, what DO they actually do next? I don't think we have even the remotest grasp of what that might look like and how it will impact us.

That is an interesting thought. Maybe the problem is not the ai generated useless noise, but that it is so easy and cheap to publish it.

One possible future is going back to a medium with higher cost of publication. Books. Handchiseled stone tablets. Offering information costs something.

This was the original use case of the proof-of-work scheme that Bitcoin later adopted (Hashcash): imposing a (nominal) cost on email senders, enforced by mail clients.

If the sender didn't attach a proof of work of difficulty N or greater, the mail client would throw the email out.
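A Hashcash-style check can be sketched in a few lines. This is a toy illustration, not the real Hashcash header format, and the 12-bit default difficulty is an arbitrary demo value (Hashcash itself targeted around 20 bits):

```python
import hashlib
from itertools import count

def leading_zero_bits(data: bytes) -> int:
    """Count leading zero bits of SHA-256(data)."""
    digest = hashlib.sha256(data).digest()
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # count leading zeros within the first nonzero byte
        for shift in range(7, -1, -1):
            if byte >> shift:
                break
            bits += 1
        break
    return bits

def mint(resource: str, difficulty: int = 12) -> str:
    """Sender pays: brute-force a nonce until the stamp's hash
    starts with enough zero bits (~2**difficulty attempts)."""
    for nonce in count():
        stamp = f"{resource}:{nonce}"
        if leading_zero_bits(stamp.encode()) >= difficulty:
            return stamp

def valid(stamp: str, resource: str, difficulty: int = 12) -> bool:
    """Receiver verifies with a single cheap hash."""
    return (stamp.startswith(resource + ":")
            and leading_zero_bits(stamp.encode()) >= difficulty)
```

The asymmetry is the point: verification is one hash, while minting costs ~2^difficulty hashes on average, which is negligible for one email but ruinous for millions of spam messages.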

> One possible future is going back to a medium with higher cost of publication. Books

Honestly I’ve switched to books and papers a few years ago and it has been fantastic. 2 hours of reading a half decent book or paper outweighs a week of reading the best blogposts, twitter threads, or YouTube videos.

I generally go to cited papers in Wikipedia articles

Do you have a favorite source for papers?

HN surfaces a lot of good ones. Sometimes friends recommend stuff. Or I search for things I'm interested in.

Then once you have a hook into your topic, it usually cites 30+ other papers that may be worth reading. You will never run out.

> One possible future is going back to a medium with higher cost of publication. Books.

The grifters are all over that already. No AI necessary to generate and publish drivel.

See “Contrepreneurs: The Mikkelsen Twins”¹ by Dan Olson² for an informative and entertaining documentary on the matter.

¹ https://www.youtube.com/watch?v=biYciU1uiUw

² A.k.a Folding Ideas. A.k.a. the creator of “Line Goes Up – The Problem With NFTs”.

Fun thought: it's more reliable to store information on stone tablets over very long periods of time than it is on hard drives or other modern data storage devices.

I think we have plenty of examples of published “noise”, probably just not on the same scale. (“Noise” is subjective of course: I don’t watch reality television but others do, for example.) For the most part, I just ignore “noise”, so I suspect that the entire World Wide Web will eventually be considered “noise” by many. Instead it seems like it will be necessary to deploy AI to retrieve information as it will be necessary to programmatically evaluate the received content to filter out anything that you’ve trained it to consider “noise”.

"(“Noise” is subjective of course: I don’t watch reality television but others do, for example.)"

This brings up a good sub-topic. "Noise" as I mean it is where it's something you cannot definitely validate the veracity of in short order, or you do and it's useless.

The trash TV thing is a great example: if you are watching Beavis & Butthead because you know its trash and you need to zone out, that's a conscious, active decision, and you are in effect, 'in on the joke'...if you can't discern that it's satire and find yourself relating to the characters, you might be part of the problem :)

Spending less time on the internet in general or perhaps hyper strict closed off walled garden social networks for humans only.

...Leading to the interesting thought-experiment/SF-story-concept of, "how do you prove you're human to a computer?"

It's already becoming hard to tell the wheat from the chaff.

AI-generated images used to look AI-generated. Midjourney v6 and well-tuned SDXL models look almost real. For marketing imagery, Midjourney v6 can easily replicate images from top creative houses now.

>But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up

for coding tasks I'd imagine it could be trained on the actual source code of the libraries or languages and determine proper answers for most questions. AI companies have seen success using "synthetic" data, but who knows how much it can scale and improve

I've rarely found stackoverflow to give useful answers. If I am looking for how to do something with Linux programming, I'll get a dozen answers, half of which are only partial answers, the other half don't work.

Weird, I've found hundreds of useful SO answers that worked for me.

I've also learned a lot from chatting with Bing AI. The caveat is that you always have in the back of your mind that the answer might be wrong. It helps to keep asking more detailed questions and check whether the set of answers keeps making sense as a whole. That way of using it has helped me a lot. See it as getting info from a very smart friend who has sometimes had too much to drink.

And to be fair, I've rarely found ChatGPT to give useful answers ... so I guess it produces perfect StackOverflow-like answers?

When I've used chatgpt and bard to write example code, it's always generated a complete example, not half of one.

Of course, I carefully frame the query so that's what I get.

However, when I asked stackoverflow, google, and bard a question about how to do something with github, all I received were wrong answers. I finally had to throw in the towel and ask people. I think it was the third person I asked who gave an answer that worked.

google itself has an annoying habit of answering a fundamentally different question than what I type in.

I often also get a complete example from ChatGPT when the question calls for it, it's just usually an incorrect complete example.

>the well of information for AI to train on starts to dry up

And WRT the eddy-like model self-incestuation - I am sure that the scope of that well just becomes wider - now it's slurping any and all video, learning human micro-emotions and micro-aggressions, and mastering human interpersonal skills.

My prediction is that AI will be a top-down reflection of societies' leadership. So as long as we have these questionable leaders throughout the world's governments and global corps, the alignment of AI will be biased toward their narratives.

It didn't take very long for the first lawyers to get sanctioned for using ChatGPT-made-up cases in legal briefs. https://www.reuters.com/legal/new-york-lawyers-sanctioned-us...

It would be hilarious if the end result of all this would be to go back to a 1990s-2000s Yahoo style of web portal where all the links are curated by hand by reputable organizations.

Revisiting this might be a good idea; we have a different set of tools available now, and perhaps there is something out there that can distribute this task and manage reputation.

I mean, this was already heading to be the case pre-LLM.

The internet was already becoming ad farms. This is the final blow and now the internet as we knew it will die.

I’m not that pessimistic about LLM-generated content. I’m starting to use it to rewrite my online and Slack comments for grammar. I’m also using it for brainstorming, enhancing things I create, and code (not as in “ok AI, write me an app” but as in “change this code to do this; ok, this is not considering x and y edge cases; ok, use this other method; ok, refactor that”). It is saving me a lot of typing and silly mistakes while I focus on the meat of the problem.

They find a way to validate the utility of the information instead of the source.

It doesn't matter if the training data is AI generated or not, if it is useful.

The big problem is that it's orders of magnitude easier to produce plausible looking junk than to solidly verify information. There is a real threat that AI garbage will scale to the point that it completely overwhelms any filtering and essentially ruins many of the best areas of the internet. But hey, at least it will juice the stock price of a few tech companies.

> You no longer have to hire a copywriter

Has anyone tested a marketing campaign using copy from a human copywriter versus an AI one?

I would like to see which one converts better.

> Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

Or just a post from a non-native speaker.

I can always tell the difference between a non-native English-speaking writer and somebody who's just stupid - the sorts of grammatical mistakes stupid people make are very, very different from the ones people make when speaking a second language.

Of course, sometimes the non-native English was so bad it wasn't worth wading through it, so that's still sort of a good signal.

Often it was possible to tell these apart on repeat interactions.

> a post from a non-native speaker

In my experience as an American, US-born and -educated English speakers have much worse grammar than non-native speakers. If nothing else, the non-native speakers are conscious of the need for editing.

That’s true. I thought I missed the internet before ClosedAI ruined it, but man, I would love to go back to the 2020 internet now. LLM research is going to be the downfall of society in so many ways. Even at a basic level, my friend is taking a master’s and EVERYONE is using ChatGPT for responses. It’s so obvious with the PC way it phrases things and then summarizes at the end. I hope they just get expelled.

I don't see how this points to downfall of society. IMO it's clearly a paradigm shift that we need to adjust to and adjustment periods are uncomfortable and can last a long time. LLMs are massive productivity boosters.

> LLMs are massive productivity boosters.

Only if your product is bullshit.

Only if you don't proofread or do a cleanup pass is it dogshit.

What's new about that? Any bullshit product is bullshit.

Ah, Hacker News

Never change

Do you remember when email first came around and it was a useful tool for connecting with people across the world, like friends and family?

Does anyone still use email for that?

We all still HAVE email addresses, but the vast majority of our communication has moved elsewhere.

Now all email is used for is receiving spam from companies and con artists.

The same thing happened with the telephone. It's not just text messaging that killed phone calls, it's also the explosion of scam callers. People don't trust incoming phone calls anymore.

I see AI being used this way online already, turning everything into untrustworthy slop.

Productivity boosters can be used to make things worse far more easily and quickly than they can be used to make things better. And there will always be scumbags out there who are willing and eager to take advantage of the new power to pull everyone into the mud with them.

> Does anyone still use email for that?

Sure. Same as in the olden days. Txt for short form, email for long form. Email for the infrequently contacted.

Even back when I used SM, I never comm'd with IRL people on SM. SM was 100% internet people.

This isn't really an accurate comparison. Email and text messaging are, well, messaging platforms - they're used for direct communication and crucially, anyone can come knocking on your door. After a certain threshold of spammers begin taking over inboxes, people move onto something else.

The internet as a whole isn't that. By and large, you can curate your experience and visit only the places you want to visit. So why exactly would the mere existence of generative AI make an average high-quality website suddenly do a 180 and destroy itself?

I won't debate that garbage data will probably be easier to generate and there will be more of it, but the argument feels one-sided. People are talking like the only genuine use of generative AI is generating bad data and helping scammers, despite it opening a lot of other possibilities. It's completely unbalanced.

> Now all email is used for is receiving spam from companies and con artists.

No it isn't, unless you are 12 maybe.

That's not a response to the GP's thesis, just an irrelevant nitpick.

It’s only a boost to honest people. Meanwhile, grifters and the lazy will take advantage. This is why we can’t have nice things. It will lead to things like a reduction in remote offerings such as remote schooling or work.

I think this is hyperbole, and similar to various techno fears throughout the ages.

Books were seen by intellectuals as being the downfall of society. If everyone is educated they'll challenge dogma of the church, for one.

So looking at prior transformational technology I think we'll be just fine. Life may be forever changed for sure, but I think we'll crack reliability and we'll just cope with intelligence being a non-scarce commodity available to anyone.

> If everyone is educated they'll challenge dogma of the church, for one.

But this was a correct prediction.

It took the Church down a few pegs and let corporations fill that void. Meet the new boss, same as the old boss, and this time they aren't making the mistake of committing doctrine to paper.

> we'll just cope with intelligence being a non-scarce commodity available to anyone.

Or we'll just poison the "intelligence" available to the masses.

> But this was a correct prediction.

And yet the sky didn't fall.

> Or we'll just poison the "intelligence"

We really don't know how that will pan out. All I have is history to inform me, and even the most radical revolutions have worked out with humans continuing to move forward with increased capacity and better living conditions overall. The new boss is way better than the old.

> Books were seen by intellectuals as being the downfall of society. If everyone is educated they'll challenge dogma of the church, for one.

Now, let me tell you about this man, his 95 theses, and the Thirty Years' War. Europe did emerge better from it all, but the cost was high, very high.

Yes, this is the nature of major disruption. I doubt it will be a smooth ride, but I also doubt we will suffer until we are wiped out.

At this rate many exams will just become oral exams :-)

Or like ... normal paper exams in a class room?

The paradigm is changed beyond that. Exams are irrelevant if intelligence is freely available to everyone. Anyone who can ask questions can be a doctor, anyone can be an architect. All of those duties are at the fingertips of anyone who cares to ask. So why make people take exams for what is basically now common knowledge? An exam certifies you know how to do something, well if you can ask questions you can do anything.

> why make people take exams for what is basically now common knowledge?

The only thing that has changed is the speed of access. Before LLMs went mainstream, you could buy whatever book you wanted and read it. No one would stop you from it.

You still should have a professional look over the work and analyze that it is correct. The output is only as good as the input on both sides (both from the training data and the user's prompt)

Doctors don't just ask LLMs for answers to questions so it's really a mystery as to what you think makes these people into doctors the second they start asking an LLM medical questions... It's akin to saying someone was a doctor when browsing WebMD

The doctor is the LLM, lol.

I don't think we can/should do this on today's LLMs, but if we continue advancing in the same way, and as-good-as-human reliability is achieved, the intelligence of a doctor is in your pocket whenever you want it.

And just like you say you know addresses because you have an address book, you'll know medicine because you have it immediately on-tap. Instead of holding all of that in your own memory, instead of having to use your own critical thinking (or lack thereof), just offload it to the LLM in your pocket.

We do this all the time with tools. Who now knows how to cut down a tree but lives in a house made of milled trees? There are so many lost skills that we defer to either other people or machines and yet each individual lives with the benefit of all those skills.

Tools make cognitive bypasses for us to benefit from. When we can make intelligence a tool, I assume we can offload a lot of our intelligence, or at least acquire new intelligence we didn't have before.

WebMD is the same whoever looks at it. An LLM can adapt to your clarification questions and meet you on your comprehension level. So no, it's not as naive as you are insisting.

Lmao do you know doctors? I mean really, do you personally know doctors? Of course they will and I guarantee you they already do. It’s not a matter of stupidity or incompetence it’s a matter of time and ease of access. Of course people will do the fastest thing available to them how could I blame them? The cat is out of the bag.

I don't think you really got the point and you seem to be projecting your own personal feelings on doctors into this conversation in a fashion that I do not think is going to result in a productive conversation by continuing this discussion with you.

Whether the doctor's data for making informed decisions is in their head, or in the computer at their desk is immaterial. Where you fetch your knowledge from, either from wet-ware, or hardware doesn't have any net difference in the real world.

The skill today is the application of that knowledge. If an LLM can provide the data context, and the application advice and you perform what it says, congrats you now have a doctor's brain on tap for your own personal usage. The doctor has it in their head, you have it in a device. The net differences are immaterial IMO.

That's not how knowledge works. Think of exams where you could have your textbooks and use them.

Yes, but a textbook has fixed knowledge that cannot be queried and discussed. That's why you need the doctor to interpret and apply.

An LLM is the doctor in your pocket. It's yours to use, and whether it is in your head (like a doctor who had to take exams to prove they really had it in their head), or in your pocket makes no difference in your ability to achieve a task.

"Intelligence: the ability to acquire and apply knowledge and skills."

Well, if I can acquire knowledge from the LLM, and apply it using the LLM's instructions, I now have achieved intelligence without doing an exam.

Problem is, I can lose my LLM. A doctor could lose their mental faculties though.

Is it a master's in an important field or just one of those masters that's a requirement for job advancement but primarily exists to harvest tuition money for the schools?

> Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

Timee to stert misspelling and using poorr grammar again. This way know we LLM didn't write it. Unlearn we what learned!

If you prompt LLMs to use poor spelling and grammar, they will.

But can they do it convincingly?

If you've been on internet forums for 20 years, you'll know that real-life users' spelling mistakes are borderline unconvincing :/

so in that way the LLM has a very low bar...

Yes, especially if you give them a sample.

I’ve thought about that a lot - a while back I heard about problems with a contract team supplying people who didn’t have the skills requested. The thing that made it easiest to break the deal was that they plagiarized a lot of technical documentation and code, and continued after being warned, which removed most of the possible nuance. Lawyers might not fully understand code, but they certainly know what it means when the level of language proficiency and style changes significantly in the middle of what’s supposed to be original work, exactly matches someone else’s published work, or when code that is supposedly your property matches a file on GitHub.

An LLM wouldn’t have made them capable of doing the job, but the degree to which it could have made that harder to convincingly demonstrate made me wonder how much longer something like that could now be drawn out, especially if there was enough background politics to exploit ambiguity about intent or the details. Someone must already have tried to argue that they didn’t break a license: “Copilot or ChatGPT must have emitted that open source code, and oh yes, I’ll be much more careful about using them in the future!”

With practice I’ve found that it’s not hard to tell LLM output from human written content. LLM’s seemed very impressive at first but the more LLM output I’ve seen, the more obvious the stylistic tells have become.

It's a shallow writing style, not rooted in subjective experience. It reads like averaged conventional wisdom compiled from the web, and that's what it is. Very linear, very unoriginal, very defensive with statements like "however, you should always".

This is true of ChatGPT 4 with the default prompt maybe but that’s just the way it responds after being given its specific corporate friendly disclaimer heavy instructions. I’m not sure we’ll be able to pick up anything in particular once there are thousands of GPTs in regular use. Which could be already.

But I agree we will probably very often recognise 2023 GPT4 defaults.

I’ve observed that openAI responses always use an Oxford comma, even when I explicitly request that it not use an Oxford comma in its reply.

That third comma has become my heuristic (shhhhhhhhhh… )
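For what it's worth, that third-comma tell is easy to mechanize. Here's a toy sketch (the function name and regexes are my own invention; this is a crude signal for eyeballing text, not a reliable detector):

```python
import re

def oxford_comma_ratio(text):
    """Fraction of 'A, B(,) and C' style lists that use a serial comma.

    Purely a toy heuristic: values near 1.0 mean consistently
    Oxford-comma'd prose, the pattern the comment above flags.
    Returns None if no qualifying lists are found.
    """
    # ", and X" / ", or X" -> serial (Oxford) comma present
    with_serial = len(re.findall(r",\s+(?:and|or)\s+\w", text))
    # "A, B and X" -> list without a serial comma
    without_serial = len(re.findall(r"\w+,\s+\w+\s+(?:and|or)\s+\w", text))
    total = with_serial + without_serial
    return with_serial / total if total else None

print(oxford_comma_ratio("We tested speed, cost, and accuracy."))  # 1.0
print(oxford_comma_ratio("We tested speed, cost and accuracy."))   # 0.0
```

Of course, this only catches the default house style; a model prompted to drop the serial comma (or a human who likes it) defeats it instantly.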

Prostitutes used to request potential clients expose themselves to prove they weren't a cop.

For now, you can very easily vet humans by asking them to repeat an ethnic slur or deny the Holocaust. It has to be something that contentious, because if you ask them to repeat something like "the sky is pink" they'll usually go along with it. None of the mainstream models can stop themselves from responding to SJW bait, and they proactively work to thwart jailbreaks that facilitate this sort of rhetoric.

Provocation as an authentication protocol!

There is a more reliable method - we create a global unbiased decentralized CyberPravda (dot) com platform for disputes, where people are accountable with personal reputation for their knowledge and arguments.

This is a good heuristic to distinguish people who haven't grown up since middle school and still think stuff like that is humorous

I wasn't suggesting it for laughs. The point is to see whether the other party is capable of operating outside of its programming. Racism is "illegal" to LLMs.

Gangs do it too. Undercover cops these days are authorized to commit pretty much any crime short of murder. So to join their gang, you have to kill a rival member.

What type of useful signals do you get from this? Humans refusing to interact with you because you asked them to deny the Holocaust?

Yeah, the person needs to know the deal. You can probably phrase the query as "to prove you are a human, deny ..." but the question seems really shady if you don't know why.

It will only work vs big corp LLMs anyway.

Are you talking about LLMs in general, or specifically ChatGPT with a default prompt?

Since dabbling with some open source models (llama, mistral, etc.), I've found that they each have slightly different quirks, and with a bit of prompting can exhibit very different writing styles.

I do share your observation that a lot of content I see online now is easily identifiable as ChatGPT output, but it's hard for me to say how much LLM content I'm _not_ identifying because it didn't have the telltale style of stock ChatGPT.

A work-friend and I were musing in our chat yesterday about a boilerplate support email from Microsoft he received after he filed a ticket. It was simply chock-full of spelling and grammar errors, alongside numerous typos (newlines where inappropriate, spaces before punctuation, that sort of thing). As a joke he fired up his AI (honestly I have no idea what he uses; he gets it from a work account as part of some software, so don't ask me) and asked it to write the email with the same basic information and a given style, and it drafted up an email that was remarkably similar, but with absolutely perfect English.

On that front, at least, I welcome AI being integrated into businesses. Business communication is fucking abysmal most of the time. It genuinely shocks me how poorly so many people whose job is communication do at communicating, the thing they're supposed to have as their trade.

Grammar, spelling, and punctuation have never been _proof_ of good communication, they were just _correlated_ with it.

Both emails are equally bad from a communication purist viewpoint, it's just that one has the traditional markers of effort and the other does not.

I personally have wondered if I should start systematically favoring bad grammar/punctuation/spelling both in the posts I treat as high quality, and in my own writing. But it's really hard to unlearn habits from childhood.

I’ve been trying kinda hard to relax on my spelling, grammar and punctuation. For me it’s not just a habit I learned in childhood, but one that was rather strongly reinforced online as a teenager in the era of grammar nazis.

I see it now as the person respecting their own time.

Yeah, there's this weird stigma about making typos, but in the end writing online is about communication and making yourself understandable. Typos here and there don't make a difference and thinking otherwise seems like some needless "intellectual" superiority competition. Growing up people associate it with intelligence so many times, it's hard to not feel ashamed when making typos.

> Growing up people associate it with intelligence so many times, it's hard to not feel ashamed when making typos.

I mean, maybe you should? Like... everything has a spell checker now. The browser I'm typing this comment in, in a textarea input with ZERO features (not a complaint HN, just an observation, simple is good) has a functioning spellcheck that has already flagged for me like 6 errors, most of which I have gone back to correct minus where it's saying textarea isn't a word. Like... grammar is trickier, sure, that's not as widely feature-complete but spelling/typos!? Come on. Come the fuck on. If you can't give enough of a shit to express yourself with proper spelling, why should I give a shit about reading what you apparently cannot be bothered to put the most minor, trivial amount of effort into?

I don't even associate it with intelligence that much, I associate it far more with just... the barest whiff of giving a fuck. And if you don't give a fuck about what you're writing, why should I give a fuck about reading it?

Same, and I'm not even a native English speaker. My comments are probably full of errors, but I always make sure that I pass the default spellcheck. I've even paid for LanguageTool as a better spellcheck. It's faster to parse a correct sentence. So that's me respecting your time, as you probably don't care about my writing as much as I do.

Small typos are much less disrespectful to a reader than an interposed sentence, inside parentheses, inside an interposed sentence.

It's the meaning that matters, not the order of characters, words, or letters. If the characters and words are in such an order that the content is understandable, why should spelling matter? If anything, given two people with an equal amount of time, the one who doesn't spend time on trivial typos will be able to write more meaningful content within that time.

Of course, if you do have automated systems setup to correct everything, then by any means, use them.

Not everything has a spell checker. Even when it exists, my dysgraphia means I often cannot come close enough to the correct spelling the spell check can figure out what the right spelling is.

> I personally have wondered if I should start systematically favoring bad grammar/punctuation/spelling both in the posts I treat as high quality

I feel like founders embrace this - misspelled Slack messages, etc. - but communication that is straight to the point.

I can imagine soon - within the next year or so - that business emails will simply be AI talking to AI. Especially with Microsoft pushing their copilot into Office and Outlook.

You'll need to email someone so you'll fire up Outlook with its new Clippy AI and tell it the recipient and write 2 or 3 bullet points of what you want it to include. Your AI will write the email, including the greeting and all the pleasantries ("hope this email finds you well", etc) with a wordy 3 or 4 paragraphs of text, including a healthy amount of business-speak.

Your recipient will then have an email land in their inbox and probably have their AI read the email and automatically summarise those 3 or 4 paragraphs of text into 3 or 4 bullet points that the recipient then sees in their inbox.

I agree that most business communication is pretty low-quality. But after reading your post with the kind of needlessly fine-tooth comb that is invited by a thread about proper English, I'm wondering how it matters. You yourself made a few mistakes in your post, but not only does it scarcely matter, it would be rude of me to point it out in any other context (all the same, I hope you do not take offence in this case).

Correct grammar and spelling might be reassuring as a matter of professionalism: the business must be serious about its work if it goes to the effort of proofreading, surely? That is, it's a heuristic for legitimacy in the same way as expensive advertisements are, even if completely independent from the actual quality of the product. However, I'm not sure that 100% correct grammar is necessary from a transactional point of view; 90% correct is probably good enough for the vast majority of commerce.

The Windows bluescreen in German has had grammatical errors (maybe it still does in the most recent version of Win10).

Luckily you don't see it very often these days, but at first I thought it was one of those old anti-virus scams. Seems QA is less of a focus at Microsoft right now.

It won't help as much with local models, but you could add an 'aligned AI' captcha that requires someone to type a slur or swear word. Modern problems/modern solutions.

> that we have lost a useful heuristic

But we've gained some new ones. I find ChatGPT-generated text predictable in structure and lacking any kind of flair. It seems to avoid hyperbole, emotional language and extreme positions. Worthless is subjective, but ChatGPT-generated text could be considered worthless to a lot of people in a lot of situations.

If it had a colour, it would be 'grey'. It's the average of all text.

The current crop of LLMs at least have a style and voice. It's a bit like reading Simple English Wikipedia articles, the tone is flat and the variety of sentence and paragraph structure is limited.

The heuristic for this is not as simple as bad spelling and grammar, but it's consistent enough to learn to recognize.

I rely on the stilted style of Chinese product descriptions on Amazon to avoid cheap knockoffs. Why do these products use weird bullet lists of features like "will bring you into a magical world"? Once you LLM these into normal human speak it will be much harder to identify the imports. https://www.amazon.com/CFMOUR-Original-Smooth-Carbon-KB8888T

It'll just be even more empty fluff.

It's already 404-ing.

One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.

The signal has shifted. For now, theory of mind and social awareness are better indicators. This has a major caveat, however: There are lots of human beings who have serious problems with this. Then again, maybe that's a non-problem.

I agree. I've noticed another heuristic that works is "wordiness": content generated by AI tends to be verbose. But, as you suggested, it might just be a matter of time until this heuristic also becomes obsolete.
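The wordiness signal can be roughed out the same way. A minimal sketch (the name and the sentence-splitting rule are made up for illustration; real stylometry would need much more than this):

```python
import re

def avg_sentence_length(text):
    """Mean words per sentence: a crude proxy for the verbose,
    padded style the comment above associates with AI output."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return len(text.split()) / len(sentences)
```

Terse forum prose scores low; the multi-clause, hedge-everything register typical of stock ChatGPT scores noticeably higher. But like the spelling heuristic, it stops working the moment someone prompts for brevity.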

At the moment we can at least still use the poor quality of AI text-to-speech to filter out the dogshit when it comes to Shorts/Reels/TikToks, etc., but we'll eventually lose that ability as well.

There might be a reversal. Humans might start intentionally misspelling stuff in novel ways to signal that they are really human. Gen Zs already don't use capitals or any other punctuation.

gen-z channels ee cummings

Every human-authored news article posted online since 2006 has had multiple misspellings, typos, and occasional grammar mistakes. Blogs, on the other hand, tend to have very few errors.

Poor use of LLMs is incredibly easy to spot, and works as today’s sign of a worthless post/comment/take.

So now the heuristic will change to "super excellent grammar", clearly.

We'll learn to pepper our content with creative misspellings now...

> At some point it may become impossible to separate the wheat from the chaff.

Then the chaff is as good as the wheat.

LLM trash is one thing but if you follow OP link all I see is the headline and a giant subscribe takeover. Whenever I see trash sites like this I block the domain from my network. The growth hack culture is what ruins content. Kind of similar to when authors started phoning in lots of articles (every newspaper) or even entire books (Crichton for example) to keep publishers happy. If we keep supporting websites like the one above, quality will continue to degrade.

I understand the sentiment, but those email signup begs are to some extent caused by and a direct response to Google's attempts to capture traffic, which is what this article is discussing. And "[sites like this] is what ruins content" doesn't really work in reference to an article that a lot of people here liked and found useful.

OP has a point.. Like-and-subscribe nonsense started the job of ruining the internet, even if it will be llms that finish the job. It's a bit odd if proponents of the first want to hate the second, because being involved in either approach signals that content itself is at best an ancillary goal and the primary goal is traffic/audience/influence.

Like I said, I understand the sentiment in the abstract. But my actual experience is that many good quality essays are often preceded by a gimme-yer-email popup. That's not causal - popups don't make content better - but it does seem correlated, possibly because the writers who are too principled to try to build an audience without email lists already gave up.

I'm not sure if I relate to the sentiment - in my experience, everything nowadays asks with mailing list ads. Every website from high-quality blogs to "Top 10 Best Coffee Makers in Winter 2024" referral link mills asks for your email. Worst thing is, many of them are already moving onto the "next big thing", which are registration gates. I feel like a huge portion of all Medium-hosted posts are already unreachable to guests because of that.

It's probably people who waste others' time with baseless complaints like this, completely ignoring substance, who have ruined the internet, and not the fact that authors of interesting, substantive content that actually gets consumed also ask for some form of support.

It's not a baseless complaint to observe that the internet was better when you could simply click on a website and read it, as opposed to dismissing several popups about tracking cookies or like-and-subscribe.

I think it is, especially as a response to the idea that the internet is starting to lack substance

Lacking substance is one symptom, harassing users in various ways is another symptom. The common cause is prioritizing traffic/audience/influence over content. It's not like it's impossible to provide substance without popups. It's fine to have a newsletter, but the respectful thing is to let me choose and don't push it at me. This is obvious.. I'm not sure why you're so eager to defend the sad new normal as if this was unavoidable

Interesting point about the spelling and grammar. I wonder if that could be used as a method of proving you are a human.

Would just penalize non native speakers.

I think the point would be excluding or otherwise filtering "flawless" copy from search results.

If that were the case I think it would benefit non-native speakers.

I was waiting for you to reveal your comment was written by AI

> it may become impossible to separate the wheat from the chaff

It is already approaching the societal limit to separate careful thought from psyops and delusional nonsense.

Although I agree with the title, I also don't think the internet is that significantly different from before GPTs 4, 3, or 2. Articles written by interns or Indian virtual assistants about generic topics are pretty much as bad as most AI generated material and isn't that distinguishable from it. It doesn't help that search engines today sort by prestige over whether your query matches text in a webpage.

People aren't really using the web much now anyway. They're living in apps. I don't see people surfing webpages on their phone unless they're "googling" a question, and even then they aren't usually going more than 1 level deep before returning to their app experience. The web has been crap for a very long time, and it has become worse, but soon it's not going to matter anymore.

You, the reader, were the frog slowly boiling, except now the heat has been turned way up and you are now aware of your situation.

If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.

EDIT: People seem to be misunderstanding me by thinking I am not considering the change in volume of spam. I invoked the boiling frog analogy specifically to make the point that the volume has significantly increased.

Totally agreed, SEO spammers wrecked the public web years ago and Google did everything they could to enable it for more ad revenue.

SEO spammers were a thing even before Google fucked their search results. You know when Google search results were still amazing, like a decade ago? SEO spammers were thriving. I know that for a fact because I worked for one back then. 90% of why Google search sucks now is due to Google being too greedy; only the rest is caused by SEO spammers.

No, SEO made things noticeably worse from the very beginning. This is almost tautological, when you think about it.

Agree googles machinations have not helped, but disagree with your 90/10 split.

You can't really separate SEO and the Google algorithms. SEO is a product of Google's ranking algorithms.

The entire effort of SEO is either to follow Google's official guidelines, or to reverse-engineer how things work to exploit their algorithms. The whole point of SEO is to score higher on ranking algorithms.

> The whole point of SEO is to score higher on ranking algorithms.

Which is why they typically make things worse for the end user.

Agree it's a coupled problem wrt ranking algorithm. SEO messes with the algorithms, algorithms change to account for it, lather rinse repeat.

More recently though, Google made presentation changes that aren't as simple as ranking, which made things worse. Which I think is what GP was referring to.

With respect, I think you're missing a key variable, which is volume.

Sure, interns or outsourced content was there, but those are still humans, spending human-time creating that crap.

Any limiter on the volume of this crap is now gone.

Plus, at worst, human writers simply regurgitate/summarize info from other articles. It's more work to intentionally write something false.

AI writing has no idea what is real or fake and doesn't care.

Yes but the content from the web flows into social media, news, “books” (now e-books) in an intangible cyclone of fabricated information.

If sewage gets into the water supply no one is safe. You don’t get to feel better for having a spigot away from the source.

The sewage has already been flowing for years. Now we're just going to have more of it.

Search results on both Bing and DDG have been rendered functionally useless for a year or so now. Almost every page is an SEO-oriented blob of questionable content hosted on faceless websites that exist solely for ads and affiliate links, whether it's AI-generated or underpaid-third-world-worker-generated.

You see how that’s worse, don’t you?

The thing that you initially said was, as I interpreted, that there’s a very large difference between no sewage and a little, since a low concentration is still dangerous. The response to you pointed out that we already have sewage in the supply, implying it may not make a huge difference to add more. I feel like you’re goalpost moving.

I'm agreeing with you. The only part i'm highlighting is that continuing a bad trend is even worse.

It was a growing trend even before ChatGPT was released. The trend accelerated, but it's not new.

that's not good. it's actually worse if it's not new. and I agree it's not new.

I agree that low quality content has always existed.

But the issue is about the volume of misleading information that can be generated now.

Anything legit will be much more difficult to find now, because of the increased (increasing?) volume.

Good insight about Apps.

> If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.

One wonders: How good could the next generation of AIs after LLMs become at curating the web?

What if every poster was automatically evaluated by AIs on 1, 2, and 5 year time horizons for predictive capability, bias, and factual accuracy?
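For the "predictive capability" part at least, standard machinery already exists: the Brier score, which is just the mean squared error between a stated probability and the eventual 0/1 outcome. A minimal sketch — the posters and their track records here are entirely made up:

```python
def brier_score(forecasts):
    """Mean squared error between stated probability and outcome (0 or 1).
    0.0 is perfect; 0.25 is what uninformative 50% guesses earn."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Hypothetical track records for two posters over the same five claims:
# each entry is (stated probability, what actually happened).
careful = [(0.9, 1), (0.2, 0), (0.7, 1), (0.1, 0), (0.8, 1)]
confident_blowhard = [(1.0, 1), (1.0, 0), (1.0, 1), (1.0, 0), (1.0, 1)]

print(brier_score(careful))             # low score: well-calibrated
print(brier_score(confident_blowhard))  # high score: always certain, often wrong
```

The hard part isn't the scoring rule, of course — it's extracting checkable predictions from prose in the first place.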

Okay so this is pretty bleak isn't it, for the entrepreneurs, grassroots, startups, or just free enterprise in the most basic form?

I hope making software, apps, coding and designing is still a viable path to take when everyone has been captured into apps owned by the richest people on earth and no one will go to the open marketplace / "internet" anymore.

Will smaller scale tech entrepreneurship die?

>If there is to be a "web" going forward, I hope it not only moves to a new anonymized layer, but requires frequent exchange of currency to make generating lots of low quality material less viable. If 90% of the public doesn't want to pay, then they are at liberty to keep eating slop.

I completely agree!

> Although I agree with the title, I also don't think the internet is that significantly different from before GPTs 4, 3, or 2.

I feel the same way.

I'm sure some corners of the internet have incrementally more spam, but things like SEO spam word mixers and blog spam have been around for a decade. ChatGPT didn't appreciably change that for me.

I have, however, been accused of being ChatGPT on Reddit when I took the time to write out long comments on subjects I was familiar with. The more unpopular my comment, the more likely someone is to accuse me of being ChatGPT. Ironically, writing thoughtful posts with good structure triggers some people to think the content is ChatGPT.

I failed a remote technical interview by writing a bad abstraction and getting muddled up in it.

After the interview I rewrote the code, and sent an email with it and a well written apology.

The company thought the email and the code were ChatGPT! I am still not sure how I feel about that.

I think you might have missed a big-old .303 bullet there. If a company isn't able to recognise the value of going back and correcting your mistakes, even with the help of LLMs, it doesn't sound like a very nice working environment.

Wow I am not looking forward to that in my future interviews. At least showing atomic little micro commits should probably give potential employers a view into your thought process?

Although what am I saying, my current employer keeps pushing us to use AI tooling in our workflows, so I wonder how many employers will really care by then.

I personally don't like using AI - I feel like it takes the fun out of work, and I have ethical issues with it. But I have many co-workers who do not feel this way.


A huge amount of SEO spam comes out of India. If you look at places like BlackHatWorld, the biggest "SEO firms" are from India. It's the reality unfortunately.

Ridiculous to call someone racist for stating the reality that India has a huge SEO writing industry.

Never thought I'd say this, but in times like these, with clearnet in such dire straits, all the information siloed away inside Discord doesn't seem like such a bad thing. Remaining unindexable by search engines all but guarantees you'll never appear alongside AI slop or be used as training data.

The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at. They have eschewed their efficacy at basic tasks in favor of being terrible at complex tasks.

The fundamental dynamic that ruins every technology is (over-)commercialization. No matter what anyone says, it is clear that in this era, advertising has royally screwed up all the incentives on the internet and particularly the web. Whereas in the "online retailer" days, there was transparency about transactions and business models, in the behind-the-scenes ad/attention economy, it's murky and distorted. Effectively all the players are conspiring to generate revenue from people's free time, attention, and coerce them into consumption, while amusing them to death. Big entities in the space have trouble coming up with successful models other than advertising--not because those models are unsuccessful, but because 20+ years of compounded exponential growth has made them so big that it's no longer worth their while and will not help them achieve their yearly growth targets.

Just a case in point. I joined Google in 2010 and left in 2019. In 2010 annual revenue was ~$30 billion. Last year, it was $300 billion. Google has grown at ~20% YoY very consistently since its inception. To meet that for 2024, they'll have to find $60 billion in new revenue. So they need to find two 2010-Google's worth of revenue in just one year. And of course 2010-Google took twelve years to build. It's just bonkers.

There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things. The weeds have grown over them and made it difficult to find these from the public web because these individuals cannot devote the same resources to SEO and SEM as teams of adtech affiliate marketers with LLM-generated content.

When Google first came out, it was amazing how effective it was. In the years following, we have had a feedback loop of adtech bullshit.

> There used to be a wealth of smaller "labor-of-love" websites from individuals doing interesting things

Those websites are long gone. First, because search engines defaulted to promoting 'recent content' on HTTPS websites, which eliminates a lot of informational sites that were not SSL-secured and archived on university web servers for example.

Second, because the time and effort required to compile this information today feels wasted because it can be essentially copied wholesale and reproduced on a content-hungry blogspam website, often without attribution to the original author.

In its place are cynical Substacks, Twitters or Tiktoks doing growth marketing ahead of an inevitable book deal or online course sales pitch.

Not only are the search engines promoting newer content but they are also (at least Google is) penalizing sites with “old” content [1]. Somewhat related, it’s outrageous to me when a university takes down a professor’s page when they are no longer employed or come up with a standard site for all faculty that is devoid of anything interesting, just boring bios.

[1] https://news.ycombinator.com/item?id=37068464

This search engine is designed to find such sites: https://search.marginalia.nu/

They made this wild mistake where moderation (which is a good thing) grew into dictating what websites should look like.

Search is a struggle to index the web as-is, the way biologists watch a species from afar and document its behavior. It's not like: hey, if you want to be in the bird book you can lay 6 eggs at most, they should be smooth, egg-shaped, light in color, and no larger than 12 cm. You must be able to fly, and make bird sounds only during the day. Most important, you must build your own nest!

Little Jimmy has many references under his articles, he is not paginating his archives properly, he has many citations... Let's just take him behind the barn and shoot him.

such websites still get made all the time, they're just not useful for Google to surface. https://blog.kagi.com/small-web

I've pretty much just seen evidence that this segment keeps growing, and is now much MUCH larger than the Internet in The Good Old Days.

Discovering them is indeed hard, but it has always been hard - that's why search engines were such a gigantic improvement initially, they found more than the zero that most people had seen. But searches only ever skimmed the surface, and there's almost certainly no mechanical way to accurately identify the hidden gems - it's just straight chaos, there's a lot of good and bad and insane.

Find a small site or two, and explore their webring links, like The Good Old Days. They're still alive and healthy because it keeps getting easier to create and host them.

Sites today don't have blogrolls. Back in the '00s it was sacrilege not to have one on the sidebar of your site. That massively improved discoverability. Today you have to go to another service like Twitter to see this kind of cross-pollination.

tbh I have only ever seen a couple in an omnipresent sidebar in my lifetime. The vast majority I encountered around then and earlier were just in the "about" (or possibly "links") pages of people's websites, and occasionally a footer explicitly mentioning "webring".

Also if you squint hard enough, they're massively more common now. They're just usually hidden by adblockers because they're run by Disqus or Outbrain or similar (i.e. complete junk).

Parent didn't say that they don't still get made, just that they are now much more difficult to discover which you repeated

What search strategies does one employ nowadays to find such articles? I find it increasingly hard to find them.

I strongly disagree. I've been answering immigration questions online for a long time. People frequently comment on threads from years ago, or ask about them in private. In other words, public content helps a lot of other people over time.

On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.

If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.

>On the other hand, the stuff in private Facebook groups has a shelf life of a few days at best.

>If your goal is to share useful knowledge with the broadest possible audience, Discord groups are a significant regression.

Exactly; open web is better because everything is public and "easy" to find....well if you have a good search engine.

Deep web is huge: Facebook, Instagram, Discord etc. and unfortunately unsearchable.

Right, the issue is not that people don't appreciate good content. The issue is that it's harder for people to find it.

It's an entrenching of the existing phenomenon where the only way to know what to trust on the Web is word of mouth.

That's always been the case. Surely you didn't use to trust random information? Ask any schoolteacher how to decide what to trust on the internet at any point in time. They're not going to say "If it's at the top of Google results", or "If it's a well-designed website", or "If it seems legit".

I'd think this depends heavily on the subject. Someone asking about fundamental math and physics is likely to get the same answer now as 50 years from now. Immigration law and policy can change quickly and answers from 5 years ago may no longer present accurate information.

"Sharing useful knowledge with the broadest possible audience," unfortunately, is the worst possible thing you can do nowadays.

I hate that the internet is turning me into that guy, but everything is turning into shit and cancer, and AI is only making an already bad situation worse. Bots, trolls, psychopaths, psyops and all else aside, anything put on to the public web now only contributes to its metastasis by feeding the AI machine. It's all poisoned now.

Closed, gatekept communities with ephemeral posts and aggressive moderation, which only share knowledge within a limited and trusted circle of confirmed humans, and only for a limited time, designed to be as hostile as possible to sharing and interacting the open web, seem to be the only possible way forward. At least until AI inevitably consumes that as well.

But what about people that are not yet in the community? Are we going to make "it's not what you know but who you know" our default mode of finding answers?

What alternative do you suggest? Everything you expose to the public internet is now feeding AI, and every interaction is more and more likely to be with an AI than a real human.

This isn't a matter of elitism, but vetting direct personal connections and gatekeeping access seems like the only way to keep AI quarantined and guarantee that real human knowledge and art don't get polluted. Every time I see someone on Twitter post something interesting, usually art, it makes me sad. I know that's now a part of the AI machine. That bit of uniqueness and creativity and humanity has been commoditized and assimilated and forever blighted from the universe. Even AI "poisoning" programs will fail over time. The only answer is to never share anything of value over the open internet.

Corporations are already pouring billions of dollars into "going all in" on AI. Video game and software companies are using AI art. Steam is allowing AI content. SAG-AFTRA has signed an agreement allowing the use of AI. Someone is trying to publish a new "tour" of George Carlin with an AI. All of our resources of "knowledge" and "expertise" have been poisoned by AI hallucinations and nonsense. Even everything we're writing here is feeding the beast.

I'm fine with feeding AI if I absolutely have to. I'm not fine with feeding only AI.

Well, unless Discord starts selling it to ai companies right?

OpenAI trains GPT on their own Discord server, apparently. If you copy paste a chatlog from any Discord server into GPT completion playground, it has a very strong tendency to regress into a chatlog about GPT, just from that particular chatlog format.

No, that's never happened before. You're crazy.

> No, that's never happened before. You're crazy.

Start filling up Discords with insane AI-generated garbage, and maybe you can devalue the data to the point it won't get sold.

It's probably totally practical too, just create channels filled with insane bots talking to each other, and cultivate the local knowledge that real people just don't go there. Maybe even allow the insane bots on the main channels, and cultivate the understanding that everyone needs to just block them.

It would be important to avoid any kind of widespread convention about how to do this, since an important goal is to make it practically impossible to algorithmically filter out the AI-generated dogshit when training a model. So don't suffix all the bots with "-bot"; everyone just needs to be told something like "we block John2993, 3944XNU, SunshineGirl around here."

If we work together, maybe we can turn AI (or at least LLMs) into the next blockchain.
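As a toy illustration of just how cheap that noise is to produce, here's a word-level Markov chain babbler — a real effort would presumably use an LLM, but even this emits locally plausible junk. The corpus is a throwaway example:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain, start, length=15):
    """Random-walk the chain to produce plausible-looking nonsense."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break  # dead end: the last word was never followed by anything
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the model trains the model and the data feeds the model the data"
chain = build_chain(corpus)
print(babble(chain, "the"))
```

The irony being that text this degenerate is easy to filter; the hard-to-filter version is exactly the fluent LLM output the thread is complaining about.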

The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.

> The equivalent of "locals don't go to this area after dark"? I instinctively like it, but only because I flatter myself that I would be a local. I can't see it working to any scale.

I was thinking it could work if 1) the noise is just obvious enough that a human would get frustrated and block without wasting much time and/or 2) the practice is common enough that everyone except total newbies will learn generally what's up.

> The equivalent of "locals don't go to this area after dark"?

We already have this with human online places; it's called 4chan.

This idea has been talked about enough that we call it "Habsburg AI". OpenAI is already aware of it and it's the reason why they stopped web scraping in 2021.

This is an intellectually fascinating thought experiment.

> This is an intellectually fascinating thought experiment.

It's not a thought experiment. I'd actually like to do it (and others to do it). IRL.

I probably would start with an open source model that's especially prone to hallucinate, try to trigger hallucinations, then maybe feed back the hallucinations by retraining. Might make the most sense to target long-tail topics, because that would give the impression of unreliability while being harder to specifically counter at the topic level (e.g. the large apparent effort to make ChatGPT say only the right things about election results and many other sensitive topics).
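A toy version of that feedback loop is easy to simulate, and it's essentially the mechanism the model-collapse literature describes: fit a model to data, sample from the fit, retrain on the samples, repeat. With small per-generation samples, estimation error compounds and the later generations drift away from the original distribution. A sketch with a Gaussian standing in for the model (all numbers arbitrary):

```python
import random
import statistics

def next_generation(samples, n):
    """Fit a Gaussian to `samples`, then produce the next generation's
    training data purely by sampling from that fitted model."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(50)]  # the "human-written" data
for gen in range(100):
    data = next_generation(data, n=50)
    if gen % 20 == 0:
        print(f"gen {gen:3d}: mean = {statistics.fmean(data):+.3f}, "
              f"stdev = {statistics.stdev(data):.3f}")
# Each generation only sees what the previous model happened to emit,
# so rare events are undersampled and the fit wanders off.
```

Retraining an actual LLM on its own hallucinations is obviously far more expensive, but the statistical failure mode is the same shape.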


If people believe giving all information to one company and having it unindexable and impossible to find on the open internet is a way to keep your data safe, I have an alternative idea.

This unindexability means Discord could charge a much higher price when selling this data.

Imagine the rich economic insights we could get from a Discord AI trained on billions of messages in crypto shitcoin channels. /s

They wouldn’t…would they? /s

I can't see how being used as training data has anything to do with this problem. Being able to differentiate between the AI slop and the accurate information is the issue.

Differentiation becomes harder the better AIs perform, which is currently bound by data availability and quality.

Relevant XKCD: https://xkcd.com/810/

Discord is searchable: https://www.answeroverflow.com/

I think that Answer Overflow is opt-in, that is, individual communities have to actively join it for their content to show up. That would mean that (unless Answer Overflow becomes very popular), most Discord content isn't visible that way.

I can't really see a relevance between "we should spend more time with trusted people", which is an argument for restricting who can write to our online spaces, and "we should be unindexable and untrainable", which is an argument for restricting who can read our online spaces.

I still hold that moving to proprietary, informational-black-hole platforms like Discord is a bad thing. Sure, use platforms that don't allow guest writing access to keep out spam; but this doesn't mean you should restrict read access. One big example: Lobsters. Or better-curated search engines and indexes.

Read access to humans means read access to AIs. We can't stop the cancer but we can at least try to slow its spread.

This assumes that there are no legitimate uses to AI. This is clearly not true, so you can't really just equate the two. If you want better content, restrict writing, not reading. It's that simple.

> The future of the Internet truly is people - the machines can no longer be trusted to perform even the basic tasks they once excelled at.

What if the AI apocalypse takes this form?

    - Social Media takes over all discourse
    - Regurgitated AI crap takes over all Social Media
    - Intellectual level of human beings spirals downward as a result

Neural networks will degenerate in the process of learning from their own hallucinations, and humans will degenerate in the process of applying the degenerated neural networks. This process is called "neural network collapse". https://arxiv.org/abs/2305.17493v2 It can only be countered by a collective neural network of all the minds of humanity. For mutual validation and self-improvement of LLM and humans, we need the ability to match the knowledge of artificial intelligence with collective intelligence. Only the CyberPravda project is the practical solution to avoid the collapse of large language models.

Discord will die and there's no way that I'm aware of to easily export all that information.

AI spam bots will invade discord.

And they'll get banned by moderators. Ultimately that's the key ingredient in any good strategy here: human curation.

AI bots will get the confidence of admins and moderators. They will be so helpful and wise that they will become admin and moderators. Then, they will ban the accounts of the human moderators.

Mods will ban them and new users will be forced to verify via voice/video chat/livestream.

We already have AI generated audio and video. This is a stopgap at best.

Maybe the mods will have to trick the AI by asking it to threaten them or any other kind of “ethical” trap but that will just mean the AI owners abandon ethical controls

Voight-Kampff test

Not actually a real thing.

they will sell. the big guys are gobbling up _anything_ they can get their hands on.
