Nearly all of the Google images results for "baby peacock" are AI generated (twitter.com/notengoprisa)
415 points by jsheard 35 days ago | 420 comments



Almost all of the "product X vs Y" results are AI ramblings now. This growth of the dead Internet is making me want to sign up for Kagi. We're going to need a certification for human generated content at some point.


Kagi is not a panacea unfortunately. I pay for it and daily drive it to support a Google alternative, but I still have real trouble with my results being full of AI garbage (both image and text search).

As mentioned, product comparisons are a big one but another worrying area is anything medical related.

This week I was trying to find research about a medicine I'm taking, and the already SEO-infested results of 5 years ago have become immeasurably worse, with hundreds of pages of GPT-generated spam trying to attract your click.

I ended up ditching search altogether, found a semi-relevant paper on nih.gov, and went through the citations manually to try and find information.


That matches my experience. Kagi doesn't surface much content beyond what Google/Bing do. What it does better out of the box is guessing which content is low-quality and displaying it so that it takes up less space, allowing you to see a few more pages' worth of search results on the first page. And then it lets you permanently filter out sites you consider to be low quality so you don't see them at all. That would have been awesome 10 years ago when search spam was dominated by a few dozen sites per subject that mastered SEO (say expertsexchange), but it is less useful now that there are millions of AI content mills drowning out the real content.

For content that isn't time sensitive, the best trick that I have found is to exclude the last 10-15 years from search results. I've set up Firefox keyword searches[1] for this, and find myself using them for the majority of my searches, and only use normal search for subjects where the information must be from the last few years. It does penalize "evergreen" pages where sites continuously make minor changes to pages to bump their SEO, which sucks for some old articles at contemporary sites, but for the most part it gives much better results.

[1] For example: https://www.google.com/search?q=%s&source=lnt&tbs=cdr%3A1%2C...
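As a rough sketch of what such a keyword search expands to, assuming Google's usual cdr/cd_min/cd_max date-range parameters (the exact parameters behind the truncated link above may differ):

    # Sketch only: build a search URL restricted to an older date range,
    # assuming the cdr/cd_min/cd_max parameters still behave as they historically have.
    from urllib.parse import urlencode

    def old_web_search_url(query, cd_min="1/1/1998", cd_max="12/31/2009"):
        params = {
            "q": query,
            "source": "lnt",
            "tbs": f"cdr:1,cd_min:{cd_min},cd_max:{cd_max}",
        }
        return "https://www.google.com/search?" + urlencode(params)

    print(old_web_search_url("how to paint trim"))

In the Firefox keyword search, the %s placeholder plays the role of the query parameter here.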


> For content that isn't time sensitive, the best trick that I have found is to exclude the last 10-15 years from search results. I've set up Firefox keyword searches[1] for this, and find myself using them for the majority of my searches...

OMG. I'm so happy how much AI is improving our lives right now. It really is the future, and that future is bright.

Thanks guys!


Honestly, yikes. And it's not going to get better. Producing AI content is so cheap (and apparently so effective) that SEOs are going to milk it forever.


I use Kagi personally every day and my results are definitely not full of AI garbage so would like to better understand your context.

Have you reported any of those issues to Kagi (support/discord/user forum)? We are pretty good at dealing with search quality issues.


Vladimir, when posting things like this, you might want to disclose that you're Kagi's CEO.


Hehe. He really should.

I’m not affiliated with Kagi in any way and my results are not full of LLM generated content either.

Though there is an LLM generated baby peacock pretty high in the image search, and when I go to the website it literally says that it is an example of an AI generated image and not a real baby peacock.


Most of the time I do, but it gets tiring as I post on HN a lot, and here I am primarily an HN user and have been one since long before I became Kagi's CEO. I feel that saying 'we' in the message (as I did) and having a clear disclaimer in my bio is enough.


The UK NHS website is usually pretty good for this so sticking "NHS" in the search terms might help, although I imagine they may not cover non-UK brand names.


"site:" namespace still works on Google... for now.


> I ended up ditching search altogether, found a semi-relevant paper on nih.gov, and went through the citations manually to try and find information.

I've been doing this for years now. The normienet as I call it is nigh worthless, and I don't even bother trying to find information on it.


I also use it daily. One of my favorite functions is being able to boost certain domains and block or downgrade results from other domains. So I boost results from domains I trust which significantly improves my results. They have a page with commonly boosted/blocked/downgraded sites which serves as a good starting point.


It really is a weird feeling remembering the internet of my youth and even my 20s and knowing that it will never exist again.


I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century. It was a unique point in time.

I'm ready to pay for a walled garden where the incentives are aligned towards me, instead of against me. I know that puts me in a minority, but I'm tired of the advertising 'net.


I've said it before and I'll say it again: I firmly think AOL was just ahead of its time.

Bring it back. Charge me $10 or $20 a month. Give me the walled-off chatrooms, forums, IM, articles, keywords, search, etc. Revamp it, make it modern. And make a mobile app.

Everyone wanted a free and open Internet, until AI and the bots ruined it all.


> until AI and the bots ruined it all

Well... advertising as a business model ruined it all. They get paid for getting page views, so the business model optimizes for maximum page views at minimum cost of creation. This is the end result of what Google and Facebook have spent the last 20 years building.

But I'm sure the engineers who built all this have very nice yachts, so it's all fine.


This is the root cause. "AI and bots" are just two of the many symptoms.


With every passing day I basically think this is going to be the future of the internet. Many disparate private/semi-private groups while the "public" internet becomes overloaded with AI slop.

It's largely already happening in places like Discord.

I think the first company that can capture what Discord has but is not wrapped in that "gamer aesthetic" ui/ux is going to do really well.


>Many disparate private/semi-private groups while the "public" internet becomes overloaded with AI slop.

And eventually someone is going to come up with a search engine for Discord, and the cycle will start all over.


Discord could probably become the new internet if they introduced a URN scheme and permitted folks to host web pages.


Discord is becoming a bloated slop fest. It has literally been implicated at times for bad gaming performance [1]. Remember when it used to just be a chat app? And don't get me started on its "quests" system, which may burst your ideal internet bubble. [2]

[1] https://www.tomshardware.com/news/discord-throttles-nvidia-g...

[2] https://www.thestreet.com/technology/discord-is-making-a-maj...


Guess what tools would be used the most to write content for those web pages.


I think the problem with revamping Q-Link/AOL into the be-all end-all for everyone (that's human) is that you're gonna have to prime the pump with AI chatbots to give the appearance of lots of people to draw you in, kinda like how Reddit admins primed the pump by making tons of posts early on. Just a little light treason.


I don't think it works today. You have walled communities in Discord and messaging apps, but if you're also looking for the degree of anarchy of those days: today we know that your everyday person can make money off the internet, but you didn't know that then, and I think that colors a lot of the experience.


All it would take is an internet that isn't phone-friendly. Just a slight barrier to entry changes the dynamics quite a bit.


It still exists. Currently it looks like Patreons and their associated communities, long-running web forums, small chatrooms on platforms like Discord or Facebook or Instagram, and so on. Small communities, with relatively high barriers to entry.


That's not the same Internet.

Patreon, forums, Discord, Facebook, Instagram and so on are all centralized.

With the Internet of the 90s, discussion happened with decentralized Usenet (owned by nobody) with more-focused discussion happening on mailing lists (literally owned by whoever was smart enough to get Majordomo compiled and running).

Email was handled by individual ISPs or other orgs instead of funneled through a handful of blessed providers like Gmail.

Real-time chat was distributed on networks of IRC servers that were individually operated by people, not corporations.

Quickly publishing a thing on the web meant putting some files in ~/public_html, not selecting a WordPress host or using imgur.

Ports 80 and 25 were not blocked by default.

Multiplayer games were self-hosted without a central corpo authority.

One could construct an argument that supports either way being better than the other, but the Internet of today is not the same thing as it was a quarter of a century ago.

(Anyone can make a "discord server," but all that means is that they've placed some data in some corpo database that they can never actually own or control.)


> Multiplayer games were self-hosted without a central corpo authority.

Losing "dedicated servers" was a huge loss in my opinion. It was fun to play on the same handful of servers and get to know the same group of people. Dedicated servers were also free from profit-driven "matchmaking" schemes since you ended up playing with whoever was on at the time.

I also miss the chaotic multiplayer of the era, where the priority was having fun and not improving your rank on the leaderboard.

Edit: Dedicated servers also each had their own moderation. So if you wanted to play on a server that banned anyone nasty you could. But you could play with a bunch of folks dropping "gamer words" as well.

Then there were custom sprays - something no company would allow in their game today. It's sad how constrained and censored online gaming experiences are today. In many games even dropping the "f-bomb" can get you banned and typing assassin in the chat window yields "***in".


It's crazy to me that custom sprays existed at all in any capacity for so long. They were novel and sometimes funny but man, I really don't miss the days at all of playing a game and having your roommate/family ask why your computer screen is plastered with goatse or meatspin or the like. I remember more than a handful of awkward conversations trying to explain WHY that wasn't actually part of the game itself.


All of this still exists today though. It hasn't disappeared, it has only been drowned.

If you want to find it again, change your search engine. Use wiby.me, or search.marginalia.nu. Subscribe to RSS feeds of sites you find interesting, and go from there. Hop on gemini. Subscribe to some activitypub accounts (you don't even need an account for that). Communal IRC servers still exist, forums still exist, independent emails still exist.

Stop falling for the doomerism HN is so quick to fall for; start doing and living what you want.


Shout out to Lemmy as well. It's still missing enough users to seriously challenge Reddit, but it also lacks all the spam and astroturfing.


That's a pretty narrow view. My Internet in the 90s and early 2000s was on AOL Instant Messenger and MSN Network, Yahoo! Mail and phpBB web forums hosted by random people with pictures hosted on PhotoBucket. Plenty centralized.

The technology doesn't matter, it's changing all the time. The difference is the size of the community, and the barriers to entry. If these communities were easy to discover and join and hard to be booted out from, they wouldn't be protected from The Slop. It used to be just getting on the Internet was the barrier. Now we need new barriers: paywalls and word-of-mouth and moderation.


I have no reason to doubt your experience.

But AOL IM and MSN were centralized walled gardens that came rather late to the game, and neither PhotoBucket nor phpBB existed at all in the 20th century that is the context here.


Would you pay a nominal amount (like 5 cents or 25 cents) to consume one piece of good, ad-free content, assuming that there was no login, no account, no friction, etc? You click, you read, and 5 cents is magically transferred from you to the writer?

I would. But I've asked a lot of people who say "no, I don't want to pay when I can read it for free. I don't mind the ads that much."


> I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century.

I'm a little sad for anyone who didn't get to experience the pre-Internet era.

The Internet is the lead of our time.


> where the incentives are aligned towards me, instead of against me.

It's great to read these words. People are starting to get it. The Internet is not for you, it's against you.


"The internet" isn't for or against anything, it's just a vast computer network. It's the humans with an agenda that exploit the internet (specifically the web) that are against you.


You're correct of course, leptons, but do please give credence to important shorthand. As I've put it before, the Internet is the battleground. Ground is not necessarily "neutral". It favours certain forces and tactics; see the "Nine Situations" [0]. We used to occupy type-5 (open ground) which was also mostly type-2 (dispersive). We are now on "serious" and "difficult" ground. Therefore the environment itself is hostile. Most of what people commonly think of as "The Internet" - that's not your friend any more.

[0] https://suntzusaid.com/book/11


If you replace "the internet" with "web browsers", you would be correct. "The internet" doesn't care what bits and bytes travel through it, nor who uses it.


If you have time for a long but very interesting read, have a look at this [0]. R. Berjon gives a good explanation of why the internet (and we mean IP routing, DNS, BGP, ISP governance, etc.) is far from a level system, and what needs to be done to restore it to public service as a global tool for the good of humanity.

[0] https://berjon.com/public-interest-internet/


> I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century. It was a unique point in time.

I did, and...well, let's be careful how we look back at it.

Punch the monkey? Ad supported 'free' internet that literally put an adbar at the top of your browser at all times? Dreadfully slow loads of someone's animated construction sign GIF? Waiting for dial up to connect after 20 tries? Tracking super pixels? Java web applets? Flash? Watching your favorite ISP implode or get bought up? To say nothing of the pre-Google search results (I miss the categories though).

I have plenty of good memories from those days, but it still had plenty of problems. And it wasn't exactly a bastion of research material either unless you really went digging or paid for access.


It may be a minority, but you're not alone.


The problem with that is not the payment; it's that you will only be sharing it with people similarly willing to pay for a walled garden. I'm guessing most of what we're nostalgic for was created by people who wouldn't be up for that.


It was created by people that

1. could afford a computer back then and saw the utility of owning one, and
2. had access to the internet, so were either in college, at a 'tech' company, or tied to some local collective that provided access.

When people say 'the old internet' they are referring to a very self-selecting/elite group.


> When people say 'the old internet' they are referring to a very self-selecting/elite group.

And that's what made it fun I guess


> I'm a little sad for anyone who didn't get to experience the Internet of the twentieth century. It was a unique point in time.

Sadly, they won't know what they were missing. It'll be the new normal

Some asshole tech apologist is probably getting ready to post that section from Plato where Socrates complains about writing any minute now.

Of course, that asshole is oblivious to the fact that most if not all of us probably just don't understand what Socrates was missing, so he's just showing his ignorance and stupidity.

> I'm ready to pay for a walled garden where the incentives are aligned towards me, instead of against me. I know that puts me in a minority, but I'm tired of the advertising 'net.

The problem is that, even if you try to do that, the incentives are probably still aligned against you, just maybe less blatantly.

Just look at how many formerly ad-free paid services are adding ads, and how hardware that users literally own acts against their interests by pushing ads in their faces (e.g. smart TVs).

The guy who runs the walled garden will always be tempted to get some extra cash by adding ad revenue to your subscription feed, or cut costs by replacing human curated stuff with AI slop (maybe cleaned up a bit).


Isn't it worse knowing what was lost instead of never knowing how good things actually were?


I only just put it together, but Peter Watts's Rifters series is some epic grimdark hard sci-fi set on Earth, the first book playing out as practically horror, confined deep under water.

But my point is, the latter books have this amazing post-internet setting: a ravaged, chaotic wildlands filled with rabid programs and wild viruses, packets staggering half intact across the virtualscape, hit by digital storms. Our internet isn't quite so dramatic, but I see the relationship more subtly in where we have gone, with so, so many generated sites happy to regurgitate information poorly at you or to quietly sell you a slant. Bereft of real sites, real traffic. Watts is a master writer. Maelstrom.

First book Starfish is free. https://www.rifters.com/real/STARFISH.htm


Thanks for the recommendation and free link!

Another book on my reading list!


> It really is a werid feeling remembering the internet of my youth and even my 20s and knowing that it will never exist again.

A user-facing ability to whitelist and blacklist websites in search results, and the ability to set weights for websites you want to see higher in search results.

Spamlists for search results, so even if you don't have the knowledge/experience to do it yourself, you can still protect yourself from spam.

It's a recreation of the e-mail situation, not because that's good, but because the www is getting even worse than e-mail.


A mesh network on top of IP with an enforceable license agreement that prohibits all commercial use would suffice to get the old net back. Bonus points if no html/css/js is involved but some sane display technology instead.


No way. What you are describing is Gemini, but even more niche - a place which is explicitly walled off from the "big net", which only nostalgic people with the right technical skills and a desire to jump through some hoops can get to.

This is not going to work - as time progresses, there will be fewer and fewer nostalgic people who are willing to put up with that complexity. And the "non-commercial" part will ensure that there will _never_ be an option to say: "I am tired of fixing my homeserver once again, I am going to put up my site on (github|sourceforge|$1 hosting) and forget about it".

Compare to early web. First thing that came to my mind was Bowden's Hobby Circuits site [0]. It's designed for advanced beginners - simple projects, nice explanations. And there are no hoops to jump through - I've personally sent the links to it to many people via forums, private emails, and so on. It apparently went down in 2023, but while it was still up, I remember regularly finding it from google searches and via links from other pages.

[0] https://web.archive.org/web/20220429084959/http://www.bowden...


Without most of what makes the internet useful, sadly.


I'm not sure I like your parent's idea, but it isn't like the regular internet would go away... when you want useful, go there.

I wouldn't mind a modern take on a GeoCities sort of system, where: (1) You can make a webpage that could be about whatever the bleep you wanted it to be about. (2) Only a reduced subset of web technologies is allowed. (3) It is free from any advertising or commerce/sales. (4) It is only available to individuals or businesses no larger than a closely held corporation. (5) It has clear limitations on AI uses. (6) It has a complete index, categorized and tagged, of all the sites available.

But if I am being honest, that is just the nostalgia for the old internet talking.



I'm already treating the WWW and the commercial internet generally as "Babylon". You have to use it for a lot of stuff (doing commerce, interacting with the government), but why would you willingly use it on your own time?


> This growth of the dead internet

It is quite surreal to witness. It is certainly fueled by the commercialization of the internet through ads and the centralization onto user-hostile platforms.

The old internet seems to be doing much better. But it lost most of its users in the last 15 years...


> The old internet seems to be doing much better. But it lost most of its users in the last 15 years...

What do you mean by this? How do you find the old internet?


You don't find them easily. That is the point, I guess. But I am not referring to some obscure dark web here.

Many of the old niche forums still exist, e.g. FOSS sites. GNU project sites seem not to have aged a day in 20 years, i.e. they still party like it's 2004.

Also, I think non English sites are better off since Reddit mainly ate English communities and sites.

Facebook is probably what killed most of the living internet: the small community sites, like the local kennel club or boat marina.

A good example of the old internet would be Matthew's Volvo site:

https://www.matthewsvolvosite.com/forums/search.php?search_i...


You're on it right now. HN is a very old site with old users and old mods that links to other old sites


Something Awful is a good example. Other forums behind paywalls or ones that are invite only.


Even non-paywalled forums are pretty good compared to the "open" internet.


Searching with "Reddit" at the end of every query helps but I suppose it's only a matter of time when most content on Reddit is also AI-generated.


Reddit is already lost. I was talking to the mods in a large political subreddit and they said after Reddit started charging for API access, all the tools they used to keep on top of the trolls and bots stopped working, and the quality of the whole subreddit declined visibly and dramatically.


> Reddit is already lost. I was talking to the mods in a large political subreddit and they said after Reddit started charging for API access, all the tools they used to keep on top of the trolls and bots stopped working, and the quality of the whole subreddit declined visibly and dramatically.

The whole point of the API access change was to charge AI model-makers. It'd be ironic if the API change destroyed their product and made their data unsellable.


Yes, everyone warned Spez about that at the time. He didn’t care, he wanted that IPO.

Recently, they came around looking to recruit me. I told them fire Spez or fuck off. (17 year Reddit user here)


I used to moderate a fairly large subreddit; used to, because I decided to leave and never come back after the API debacle, though it was a long time coming.

I think if the business model had been thought through with the communities in mind, involving mods and users, it would have been genius; a lot of smaller companies would kill to have people genuinely recommending their products/tools, hidden behind the biggest wall of them all...

Alas they completely ignored this as a viable avenue and went for the quick buck.

But this won't last, and at some point people are going to move on from mass scraping, either because they already got what they want, or because garbage in means garbage out, or because most of the content will be bot-generated and require too much filtering to be useful.

Of course this is the opinion and rambling of a moderately educated individual and I might be totally wrong.

Change can often be for the best, pretty sure it wasn't in this case...


It's not just the mods, it's also the culture of the website that changed.


Agreed. There still are resistant subs, but the main culture moved in a worse direction. That is subjective of course, but it isn't just Reddit trying to "clean house" that makes everything a little more sterile and boring. It also became even more intolerant of "wrong" opinions. That has always been the case to a degree, but it got seriously worse.


Some subreddits got purged over the years, some for the better, some for the worse. I couldn't find where in the new UI you can see moderator names.

Aaron is probably turning once again in his grave...


If you know anyone who works in marketing/PR, ask them how they use Reddit. That has been gamified as much as SEO since about 2020. I'm assuming anything except "why is there a fire in this street?" kinds of posts are just ads at this point.


The "top n" subreddits are surely gamified but smaller, highly-focused subreddits seem ok.


Fair. Exceptions definitely exist, but unless it's a location-based subreddit, I personally wouldn't trust it. There are fun methods that I've been told about, like marketing companies maintaining multiple very real-looking accounts, participating in discussions for months/years with no product affiliations, and very casually throwing in plugs which eventually generate revenue. Or giving a list of 10 items and inserting their product in between, as a sale is better than no sale. Or marketing a competing product in a very obvious way, then replying from another account pointing out that it's an ad, to eventually lead sales to themselves.

I have no idea how extensive these are, as I heard most of this over bar drinks when I was traveling. Could've just been someone making stuff up as well. But my marketing friends have confirmed that they use Reddit very heavily, as it's a great sales funnel if you play it right.


I wonder at what point it'll be replaced with "site:4chan.org".


It's also not much use to anyone who doesn't use Google ever since Reddit started blocking all crawlers besides Googlebot. Old cached results might still show up in Bing/DDG/Kagi but they can't index any of the newer stuff.


What makes you say this? I just tried a few Reddit searches about events from the last few days and they work as expected (tested in Kagi)


https://www.404media.co/google-is-the-only-search-engine-tha...

https://www.reddit.com/robots.txt

Reddit serves a different robots.txt to Googlebot; you can see a snippet of it in Google's summary. If Kagi is getting recent Reddit results then they must be either ignoring robots.txt, using Google as a backend, or also paying Reddit for access like Google does.

Bing and DuckDuckGo are certainly still locked out, I just tried searching for "reddit hurricane milton" on both and none of the results are actually from Reddit.
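If anyone wants to check for themselves, here's a quick sketch that compares what Reddit serves to different User-Agents (assuming it still varies the file by UA and doesn't block the plain request outright):

    # Sketch: fetch robots.txt with two different User-Agent strings and compare.
    import urllib.request

    def fetch_robots(user_agent):
        req = urllib.request.Request("https://www.reddit.com/robots.txt",
                                     headers={"User-Agent": user_agent})
        return urllib.request.urlopen(req).read().decode()

    # False would mean different crawlers get different rules.
    print(fetch_robots("Mozilla/5.0") == fetch_robots("Googlebot/2.1"))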


Kagi didn't make it explicit, but based on what they wrote I think they make API calls to Google and other search engines and show those results as part of their own.

https://help.kagi.com/kagi/search-details/search-sources.htm...


Most of the Reddit content is now[0] fake.

[0] Gradually for several years already.


The niche subreddits I follow seem to be OK. I stay away from the big ones. All the default ones are garbage from what I can see.


Kagi's results for "baby peacock" are showing almost the same set (Mostly AI) as Google's.


It's surprising how many times you see this pattern on HN

"Google sucks!"(50 upvotes)

"That's why I use Kagi!"(45 upvotes)

"Actually Kagi has the exact same problem and you have to pay for it."(2 upvotes)


Search “peachick”, it works fine. I assume Google would be the same.

I guess using the correct terminology matters.


> I guess using the correct terminology matters.

If people were actually searching up "peachick" that'd probably be SEO spammed to hell, too.


Kagi's images are an entirely different set for me.


Unfortunately, as much as I do like Kagi overall, it goes out of its way to inject AI slop into the results with its sketchy summarization feature


Most product reviews are simply the result of pumping Amazon comments into AI to generate a review, with a final "pros/cons" section that is basically the same summary Amazon's AI generates.


> We're going to need a certification for human generated content at some point.

People keep saying this and I keep warning them to be careful what they wish for. The most likely outcome is that "certification of human generated content" arrives in the form of remote attestation where you can't get on the internet unless you're on a device with a cryptographically sealed boot chain that prevents any untrusted code from running and also has your camera on to make sure you're a human. It won't be required by law, but no real websites will let you sign in without it, and any sites that don't use it will be overrun with junk.

I hate this future but it's looking increasingly inevitable.


Your unauthorized access has been reported to the Fair Use Bureau.


There are ways to do this without destroying anonymity. Ideally, you verify you're human by signing up for some centralized service in real life, maybe at the post office or something. And then people can ask this service if you're real by providing your super-long rotating token. So, just like an existing IDP but big.
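Something like this, as a toy sketch (the names and the TOTP-style rotation are assumptions, not a real scheme):

    # Toy sketch of a privacy-preserving "is this token a verified human?" check.
    # The per-user secret is issued in person (e.g. at the post office) and never sent;
    # the site only ever sees the rotating token and a yes/no answer.
    import hmac, hashlib, time

    def current_token(secret, window=3600):
        epoch = int(time.time()) // window        # token rotates every `window` seconds
        return hmac.new(secret, str(epoch).encode(), hashlib.sha256).hexdigest()

    def service_says_human(token, registered_secrets):
        # The service answers yes/no without revealing which person matched.
        return any(hmac.compare_digest(token, current_token(s)) for s in registered_secrets)

    secrets = [b"issued-in-person-secret-1", b"issued-in-person-secret-2"]
    print(service_says_human(current_token(secrets[0]), secrets))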


That's how the Internet works in China.


Whether something is human generated is (mostly) beside the point. The problem is that spam is incentivized today. Any solution must directly attack the financial incentive to spam. Therefore what's needed for a start is for search engines to heavily downweight ads, trackers, and affiliate links (obviously search engines run by ad companies will not do this). Shilling (e.g. on reddit) should be handled as criminal fraud.
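As a toy illustration of what "heavily downweight" could mean (the weights here are arbitrary, not any real search engine's ranking formula):

    # Toy ranking tweak: penalize a result's relevance score by how many ads,
    # trackers, and affiliate links it carries. Purely illustrative numbers.
    def adjusted_score(base_relevance, n_ads, n_trackers, n_affiliate_links):
        penalty = 0.10 * n_ads + 0.20 * n_trackers + 0.15 * n_affiliate_links
        return base_relevance / (1.0 + penalty)

    print(adjusted_score(0.9, n_ads=12, n_trackers=8, n_affiliate_links=5))  # heavily demoted
    print(adjusted_score(0.7, n_ads=0, n_trackers=0, n_affiliate_links=0))   # untouched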


Even Google is trying to get into the X vs Y game, with pretty funny results if you ask for a nonsensical comparison.

https://x.com/samhenrigold/status/1843040235325964549

...or a sensical comparison where it just completely misses the point.

https://i.imgur.com/FotFZ3F.jpeg


Couldn't reproduce - in fact, the second hit is a threads version of the same post - but I get no AI suggestions for this query. Humorous Google queries (or AI queries more generally) are definitely a trope, so I can never really tell if they actually happened or if it's all for karma.


Google also routinely removes AI suggestions for searches that produce embarrassing results (you don't get them for searches about keeping cheese on your pizza anymore, for example), so it's even harder to validate once a result goes viral.


Look at the bright side: "do a barrel roll" is still a fully-supported Easter egg.


I still get the second one when I search "Difference between sauce and dressing" on Google. The Oven vs Ottoman empire one I don't get an AI overview.

Edit: Similar to the second one I just did Panda Bear vs Australia which informed me "Australians value authenticity, sincerity, and modesty. Giant pandas are solitary and peaceful, but will fight back if escape is impossible. "


My memory is that these were pretty terrible long before the generative ai boom.


I'm glad that Kagi (and others) exist as an alternative for people who don't want generative AI in their searches.

Personally, I'm excited about more generative AI being added to my search results, and I'll probably switch to whichever search engine ends up with the best version of it.


This peacock thing was the last straw for me. I installed Kagi just moments ago.

And of course the first image for "baby peacock" is the same white chick thing… obviously because this story is making the rounds —_—


AI tools on the search page: sure, cool. I use perplexity a lot, actually. I'm in favor of this.

Search results that are full of content mills serving pre-genned content: no thanks. It's in the same category as those fake stackoverflow scrape sites.


Not sure if you’re being sarcastic, but they’re not talking about AI features of the search engine itself (Kagi has those too), but about nonsensical AI generated content on the web that exist solely for the purpose of getting you clicking on some ads. Kagi tries to make those sites stand out less on the search results.


Human-verified content is going to be the next billion-dollar company.


Perhaps you're thinking of the Wikimedia Foundation.

There is plenty of space there for more volunteer editors to verify content, and likewise, WMF operates its own cloud platform where developers are automating tools that do maintenance and transformation on the human-contributed content.

Then, there is Wikidata, a machine-readable Wiki. Many other projects draw data from here, so that it can be localized and presented appropriately. Yet, its UI and SPARQL language are accessible to ordinary users, so have fun verifying the content there, too!


I don't think you understood what I meant by human-verified; I used a very vague term to express it. I meant proving that some input or data that comes from a user was generated by a human (however we define that) rather than being the output of an LLM or a multi-modal image/video/audio model.


You mean, like, the Book of Kells and a nice stage production of Lysistrata, vs. a PowerPoint and Star Wars? Interesting.


The issue in terms of cost is that if you want this to be truly human-verified for real, you're gonna have to dip into the real world.


Revival of the curators.


We needed certification for human-generated content yesterday.

Not only that, we desperately need cryptographic proof that content X was produced by person Y.


How can that ever work in a world filled with people that are eager to lie to you?


digital signatures?


But I can generate something with AI and then sign it myself and say "I wrote it, pinky promise".

Since most people don't write anything on the internet, I can pay people $5 to use their signature and operate a "sign farm".

Look at the effort these people go through to send their spam.


A web of trust or reputation-based system can be built on top of the signature scheme - maybe the emergence of smaller invite-only forums that share reputation. Or maybe it will just become an integrated part of the moderation of existing platforms like Reddit, Mastodon and Bluesky.

If you put your name on AI spam people will flag your post and no-one will bother to see your posts for at least a few years.


And if people don't like what you're talking about then they will flag your post too. This is not going to work at any serious scale - certainly not across the internet - because abuse will be rampant.


Sure, but we’re working with a real person's reputation. If they are willing to gamble with AI, that’s their problem.

I’m hoping for a “this human being validated this message”. I think that alone could solve multiple problems.


Maybe OpenAI is really just Sam Altman's last ditch effort to save WorldCoin?


You can fix this at a hardware level with cameras. This will, at least, prevent the most harmful kind of lying, where real-looking photos and videos are used to defame real people.


I'm not really sure if I follow your suggestion; what would be done with the camera? Authenticate that a picture is real? I'm not sure how that will be workable? For example, how would I (random internet person) verify a picture from you (also random internet person) is real?


You can verify that a particular camera produced a particular photograph. Essentially, the camera would sign the photograph. Random internet person to other random internet person, it wouldn't matter, I imagine. But for, say, a newspaper, they can verify that a particular image was produced by a particular camera at a particular time.
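In code, the idea is just an ordinary detached signature. A minimal sketch using the third-party `cryptography` package (the key handling and data are illustrative; real provenance schemes like C2PA are more involved):

    # Sketch: the camera signs each image with a device-private key; a newspaper
    # with the manufacturer-published public key can later verify provenance.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    camera_key = Ed25519PrivateKey.generate()   # in reality burned into the camera
    public_key = camera_key.public_key()        # published by the manufacturer

    photo_bytes = b"...raw image bytes..."      # illustrative stand-in for the file
    signature = camera_key.sign(photo_bytes)    # produced at capture time

    try:
        public_key.verify(signature, photo_bytes)
        print("photo matches this camera's key")
    except InvalidSignature:
        print("photo was altered or not from this camera")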


"Product X vs Y" results are not really any worse now than pre-GPT (i.e. they were absolute crap long before GPT came onto the scene).


> We're going to need a certification for human generated content at some point

I wrote some ideas up about this many years ago: https://github.com/pjlsergeant/multimedia-trust-and-certific...


I'm not sure human-generated content is any better on the whole. BS-laden drivel has been pervasive for some time now, even before AI started taking over.

I'm talking about those 300-word, ad-ridden crap articles that are SEO'd right to the top, and if you're lucky you might get the 3-word answer you were looking for: "<300 words of shit>... and in conclusion, <1-step answer>.". Anyway, humans have been getting paid pennies to write those for a while.

AI just turns the throughput on that up to 11, where there's just no end in sight. I think this is like the primary failure mode of AI at this point. It's not going to kill us - we're going to use it to kill the internet. OTOH, maybe then we just go outside and play.


In the world of content moderation, we refer to this as constructive friction. If you make it too easy to do a thing, the quality of that thing goes down. Difficulty forces people to actually think about what they are writing, whether it is germane and accurate. So generative AI, as you point out, removes all the friction, and you end up with bland soup.


Ironically, ChatGPT and similar LLM chatbots are great for those kinds of searches.


You would have to be soft in the head to rely on any LLM for researching information on a medication you're actively taking.


It won't end until the motivation ends: referrals and ad revenue.


Before AI, product comparison sites were ramblings of interns paid by people who found out you could make money from SEO-optimized blogs.

And long before the Internet, people slapped random concoctions together and sold them as medicine, advertising them as cure-alls.


Any source of content can be controlled or manipulated in non-obvious ways. And we already have strong algorithms for manipulating human attention (resulting in the growth of non-falsifiable conspiracy theories, for one). There is no clear approach leading out of information dystopia.


Drives me nuts. The internet is dead.

I just bought a home and I have been googling the best way to tackle certain home improvement projects - like how to prepare trim for painting. Virtually every result is some kind of content farm of AI-generated bullshit with advertising between every paragraph, an autoplay video (completely unrelated) that scrolls with the page, a modal popup asking me to accept cookies, a second rapid-fire modal popup asking me to join the newsletter to "become a friend"

For better or worse, Reddit is really the only place to go find legitimate information anymore.


For this kind of search, YouTube and TikTok (yes, TikTok) are your best bet. Videos are not (completely) flooded by AI (yet) and you can find pretty much anything about manual work.

I prefer text content to videos by a long shot, but genuine, human text content is almost dead. Reddit might be one of the rare exceptions for now. There are also random, still active, old school forums for lots of things but they tend to become extremely hard to find.


Gaining information from a video (often just someone talking into their phone) feels like sucking a milkshake through a coffee stirrer compared to reading a forum post written by a human. Worse, you can't see how deep that milkshake is at a glance, so you may end up with just a sip from a melted puddle vs. the big volume of content you wanted.


That depends. For anything that involves learning how to do a physical movement, video is an infinitely more information dense medium to learn it through.


I would disagree; I've done a number of DIY home improvement projects with decent (for DIY) results using mostly YouTube. My old-school washer and dryer have been saved repeatedly by following step-by-step YouTube videos for fixes and part replacements.


You are right that YouTube is better, but so much of that content is also biased towards sponsors. At least the good instructional content with high production value tends to be very heavy on sponsorships. The indie stuff can be great, but you are gonna have a 720p shaky camera with terrible lighting and lots of umms and backstories about why I am redoing my vintage farmhouse (a la the recipe meme where every recipe page has a 32-paragraph preamble before the actual recipe).


For what it's worth, the last time I had a home improvement project I needed youtube help with, the one-and-a-half minute mumble-tronic video shot on a Nokia brick-phone was the most helpful one.

Would I have preferred a nice 1080p, shot in good lighting on a flat white table? Yes. But those also tend to be 30 minutes long, and as you said, with a sponsorship for HurfDurfVPN in the middle.


Well, why do you expect people to teach you how to do home improvement for free? The people who know how to do it well are working in the trade, and you can pay them to improve your home.


AI-generated youtube videos are here too, although they're fairly easy to spot for now. The general formula seems to be a bunch of stock images / AI-generated images / stock footage relevant to the video title, with a LLM-generated script read out by an elevenlabs-style voice.


TikTok is where you go to find someone (if not synthetic voice) reading to you a 30 second summary of the manufacturer's press kit and pretending like they reviewed it.


Get a general purpose home maintenance book.

For example, https://archive.org/details/stanleyhomerepai0000fine/page/14... links to the chapter "Painting Trim the Right Way" from the book Stanley Home Repairs, 2014.

Could also look at used book stores. Home repair hasn't changed much.

Edit: Could even fire up Wine and try the CD-ROM "Black & Decker Everyday Home Repairs" (published by Broderbund) at https://archive.org/details/BlackDeckerEverydayHomeRepairsBr... . https://www.goodreads.com/book/show/3424503-everyday-home-re... says;

> Like its predecessor in book format, the CD-ROM version offers easy-to-follow, step-by-step instructions on more than 100 common household problems, from how to fix a leaky faucet to repairing hardwood floors. What's more, the CD-ROM version incorporates animation and narration to help make the repair project even easier to understand and complete. Instructions can be viewed one step at a time or all at once, and, if desired, can be printed out and taken directly to the repair site. Included with each repair project is the projected time needed to complete the work, estimated cost, and a list of materials and tools needed.

That sounds pretty nifty, actually!


I think this and validated sources are the best direction.

A trip to the bookstore to buy "x for dummies" can save dozens of hours of web searching.

The current iteration of the internet and AI is lacking depth, detail, and expertise.

You can find 1 million shallow answers on reddit, or echoed in AI, but anything more than the most cursory introduction is buried.

Not only is shallow information easier to generate, it is what most users want, and therefore most engines and services cater to it.

To find better content, you need to go to specialty outlets that don't cater to the lowest common denominator.


Owner/builder here, of a 1939 home. I invested in a home reference library partway through my own improvements; I should have done it before even lifting a screwdriver. Renovations (https://www.bookfinder.com/search_s/?title=Renovation%205th%...), from Taunton Press, is the first source I consult when starting a home improvement project. Chapter 18 is all about painting. Many of the other titles from Taunton are excellent, but Renovations is unmatched in its coverage.

All of the flat white MDF trim you buy is primed and ready for painting, too.


I have an older car and a newer car. I can find out how to do any repair on my old car because it existed during the old internet when people did all kinds of write ups.

The information on working on my new car is non-existent other than Youtube videos where the majority is just a random dude who knows nothing filming himself doing a horrible self repair.


IME for home improvement Youtube is the best resource, though I can understand if you were hoping for text and pictures.


Also your local library probably has a bunch of home improvement books. They're probably from the 80s, but trim painting techniques don't change that much.


What you can do in the library is sample a wide variety of the books in the topic you want, and soon you'll identify an author, or publisher, as your favorite, and then you can go purchase more in their series, and consider donating them back to the library when you're done with the project!


> For better or worse, Reddit is really the only place to go find legitimate information anymore.

This is frightening and, I fear, true.

But I'd also add one odd little counterpoint: some of the most useful discussions and learning experiences I've had in the last four years have happened in private Facebook groups. As soon as the incentive to build a following using growth-hacking and AI -- which private groups mitigate to a greater extent -- is taken away, you get back to the helpful stuff.

The FreeCAD group on Facebook is great, for example. And there are private photography groups, 3D printing groups, music groups etc., where people have an incentive to be authentic.

Public Facebook feeds are drowning in AI slop. But people who manage their own groups are keeping the spirit alive. It's almost at the point where I think Facebook will ultimately morph into a paid groups platform.


The video sites are gonna be way better for this. Or reddit. I don’t know how much longer that will be true with AI video generation becoming cheaper over time though


I used to search in English to get more results. Short-term, I might start searching in my native tongue to get fewer results.


Yeah, I've had this experience as well. I'll have to go 4 or 5 pages deep in the results to get to a forum thread someone wrote in 2005 referencing a product that doesn't exist anymore plus a bunch of advice that's mostly still applicable.


This was probably always the likely outcome of an internet economy that revolves around the production and monetization of "content".

We started by putting advertisements on existing content, then moved to social networking and social media, which was essentially an engine for crowdsourcing the production of greater amounts of content against which to show advertisements. Because money is up for grabs, producing content is now a significant business, and as such, technology is meeting the demand with a way to produce content that is cheaper than the money it can make.

The problem of moderating undesirable human-generated content was already starting to intrude into this business model, but now generative tools are also producing undesirable content faster than moderation can keep up. And at some point of saturation, people will become disinterested, and tools which could previously use algorithmic heuristics to determine which content is good vs bad will begin to become useless. The only way out I can see is something along the lines of human curation of human-generated content. But I'm not sure there is a business model there at the scale the industry demands.


> We started by putting advertisements on existing content, then moved to social networking and social media, which was essentially an engine for crowdsourcing the production of greater amounts of content against which to show advertisements.

I see a lot of people talk nostalgically about blogs, but they were an early example of the internet changing from evergreen content to churning out articles on content farms. If people remember the early internet, it was more like browsing a library. You weren't expecting most sites to get updated on a daily - or often even a monthly - basis. Articles were almost always organized by topic, not by how recent they were.

Blogging’s hyper-focus on what’s new really changed a lot of that, and many sites got noticeably worse as they switched from focusing on growing a library of evergreen content to focusing on churning out new hits. Online discussions went through a similar process when they changed from forums to Reddit/HN style upvoting. I still have discussions on old forums that are over a decade old. After a few hours on Reddit or HN, the posts drop off the page and all discussion dies.


Blogs were great when they supported RSS: you could subscribe to a feed and get updates whether they happened every day or randomly, months or years in the future. There was no need to keep refreshing to see if there was something new.
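For anyone who never used it, this is roughly all a reader has to do, as a sketch using the third-party `feedparser` package (the feed URL is just an example):

    # Sketch: poll an RSS/Atom feed and list whatever entries exist, whether the
    # blog updates daily or once every few years.
    import feedparser  # third-party: pip install feedparser

    feed = feedparser.parse("https://example.com/blog/feed.xml")
    for entry in feed.entries:
        print(entry.get("published", "no date"), "-", entry.get("title", "untitled"))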


I feel like RSS feeds made it too easy for me to follow lots of blogs, to the point where the amount of content was too much. Being forced to manually check blogs for updates works as a filter, in that I only go through the effort (albeit still small) of visiting the page if I was interested enough in keeping up to date with it. Not saying RSS didn't have great advantages; just that your comment made me think of this potential downside.


Also with some blogs we started to attach content to personalities, which was different than consuming content from another internet stranger.

And with personalities you have some form of relation to, you want these more recent updates instead of sticking to topics of interest.

Reddit is at least still focused on topics instead of people. I think this is why for some it still is more interesting than platforms like Insta, Facebook or Twitter.


That's a fascinating perspective. I imagine a blog should do something like press releases, describing the progress made on the actual website or plans for it. Forums should then play with ideas, and chat is for hammering out details that are hard to communicate or overly noisy, and for talking about stuff unrelated to the project.


> I imagine a blog should do something like press releases, describing the progress made on the actual website or plans for it

A lot of older websites actually used to do this with a “what’s new” section or page. With blogging, “what’s new” became the entire site, with almost the entirety of the content (everything that wasn't new) now hidden.

Ironically, after mentioning that discussion dies off incredibly quickly when HN stories fall of the front page, this discussion was moved off the front page to a day old discussion. My guess is that almost no one will see it now.


People somehow forgot the concept of accumulated value. It used to be my main argument against platforms. You dump your stuff on there and then it vanishes into the memory hole entirely by design. The point is to keep you on the website, not to make your stuff available. It is a strange contradiction that really shouldn't be.

I remember when forums gradually turned into q&a repos.


I've seen it! But yeah, I typically only browse the best stories section; basically I really want some human curation in my feed, which might help with the vast amount of infogarbage generated by LLMs.


There isn't, because human trust can barely scale past 100 people, much less the entire internet. I think humans will recede into the tribes we were always built to understand and be a part of. Private group chats are far more popular and ubiquitous than we give them credit for, and I only see that accelerating in this climate.


Private chats which effectively become echo chambers further dividing an already divided society is what I foresee.


"Echo chambers" have been the default for almost all of human civilization right up until about 10-20 years ago. You communicated with your immediate circle of friends and coworkers rather than arguing politics with LLM bots on Twitter.


No, this is different. A local bubble populated with a normal-ish distribution of people is different from a distributed bubble populated with people who became grouped either voluntarily or algorithmically.

For example: https://news.ycombinator.com/item?id=25667362


At some level there is always a private mode. Think family and friends. Do you not have any issue with everything being public? I think the parent suggests we're not made for very large groups, and I kind of agree. I can't name 100 people I know or have known in my life. Maybe I can (barely), but with great effort.


On the contrary.

Good fences make for good neighbors.

It's not a coincidence that the printing press brought devastating war to Europe in the form of the wars of the Reformation [1].

The internet is another real tool for knocking down fences for free, by anyone. It's only a matter of time before there's pushback by angry fence-owners.

We absolutely need less friction and more of minding our own business and focusing on our own back yard instead of chiming in on someone thousands of miles away.

Don't get me wrong, I'm 100% for the free flow of information, but what people (HN crowd?) don't understand is that a significant subgroup of humans cannot tolerate relentless change or challenges to their worldview for too long.

[1] https://en.wikipedia.org/wiki/European_wars_of_religion#Defi...


See also the importing of millions of people whose worldviews we cannot reconcile with ours.

It’s not just the HN crowd.


I think that says something about your worldview rather than about those millions.


The fact that you made a throwaway in order to make a personal attack rather than substantively engage with the point indicates that you don't actually have a good argument against it.


I'm not sure all private chat groups are really private. Maybe some are, but I can't help thinking the industry is at least running AI on private chats to summarize what people are talking about.


That’s orthogonal to the point though, which is about the social value of private chats etc. Not whether they’re truly technologically “private”.


Private in the sense of admittance, not observation.


Human curation is possible in an open system, but when you have a few large silos this algorithmic efficiency is put to use and we can observe the result. But I agree, and I hope people will lose interest and stop consuming trash. The gamble on the other side is that people will get used to poorer and poorer algo-served content and the industry will continue to squeeze profit out by any means necessary and indefinitely. Looking at the history of cable television, it appears there is a breaking point.


> The only way out I can see is something along the lines of human curation of human-generated content.

That's retweets.

> undesirable human-generated content was already starting to intrude into this business model, but now generative tools are also producing undesirable content faster than moderation can keep up.

> people will become disinterested, and tools which could previously use algorithmic heuristics to determine which content is good vs bad will begin to become useless.

So what these parts are saying is that a tiny monoculture of bored college kids is always going to figure out the algorithm and dominate the platform with porn and spam and chew up all the resources, and that both improved tooling and the tie-in to monetary incentives intended to empower weaker groups to curb those kids only worsen the gap, and that that's problematic because financial influencers are paying to be validated by the masses, not to be humiliated by a few content-market influencers.

But what is the problem with that? Those "undesirable content" producers are just optimizing for what the market values most. If that's problematic, then the existence of the market itself is the problem. What are we going to do with that? Simply destroying it might make sense, perhaps.


>This was probably always the likely outcome of an internet economy that revolves around the production and monetization of "content".

Hasn't publishing since Gutenberg been driven by the monetization of content? Looking at the history of the Catholic Church, potentially before that too.


I’ve been idly wondering about something like the Web of Trust. A social network where users vouch for one another's actually-a-real-humanness. There could be setting that let you adjust the size of the network you see (people you’ve actually met? One remove from that?)


What you’re describing is early Facebook. Your feed was only from your 1st degree connections. Content mattered because it was from people you cared about (and inherently knew, because users wouldn’t accept friend requests from people they didn't know). It really was the pinnacle of social media.


Wasn't this Instagram early on too? I think many social networks start off like this, but then either grow out of it and/or "sell out".


Why does it matter that the user is a human, especially if you can't tell the difference?


“Content” is the advertising term for whatever fills the space between the ads.


Content is negative space for the industry


> But I'm not sure there is a business model there at the scale the industry demands.

This is the kicker. When unfettered by regulation or leaders/workers with morals, most industries would rather avoid human curation because they want to sell you something. Amazon sellers would rather you not see or not trust the ratings because they want you to buy their stuff without knowing it's going to fall apart. Amazon makes a profit off it, so they somewhat encourage it (although they also have the dual pressure of knowing that if people distrust Amazon enough they'll leave and go somewhere else, so they have to keep customers somewhat happy).

No, curation has to come from individuals, grassroots organizations, and/or companies without a financial interest in the things being curated - and it has to revolve around a web of trust, because as Reddit has shown, anonymous curation doesn't work once the borderline criminal content marketers find the forum and exploit it.

> The only way out I can see is something along the lines of human curation of human-generated content.

...however, unfortunately, curation doesn't solve the problem of people desiring AI-generated content. That's a much harder problem. Even verifying that something was created by a human in the first place is hard. I don't want to think about that. I'm just going to focus on curation because that's easier and it's also incredibly important for the lowering quality of physical goods as well.


No offense, and I understand, but that use of "AI-generated content" sounds like something of a euphemism. I don't think there is a significant number of people who specifically prefer AI-generated versions; rather, it refers to a certain kind of content for which the attempt to democratize and trivialize generation by releasing AI models has completely backfired.

This distinction is important, because while AI is faster than humans, it's at best a cheap gateway drug to skilled human work.


It's not just images. I frequently get genAI word salad in the top three to five results when I google anything that could be considered a common question. You don't even realize at first when you start reading. Then it makes you start to question the things that aren't obviously genAI. You can sort of tell the kinds of things that a human might be wrong about, the ways in which they're wrong, how they sound when they're wrong, how likely they are to be wrong, the formats and platforms wrongness exists within, how often they are wrong and how other humans respond to that. AI is a different beast. No intuition or experience can tell you when reasonable-sounding AI is wrong.

Our entire framework of unconscious heuristics for ranking the quality of communicated information being rendered useless overnight may be a recipe for insanity and misery. Virtually nothing has made me this genuinely sad about technology in all my life.


Tbh I think this is just it for the public internet. It's not Google that's failing, it's the substance of the public internet that has failed. Whenever I need help or questions answered on something, I don't google it, I don't post on public forums, I ask on private group chats where I know everyone is a real person, no one is making money, no one is copy pasting chatgpt to collect internet points to sell their account later.

There is only one way I can see things changing and people aren't going to like it. All content on the internet gets linked to a legal ID. Every post on facebook, every comment can be attributed to a real person.


> All content on the internet gets linked to a legal ID

Identity theft would go through the roof.


I don't think the heuristics are that different. SEO-spam and BS content existed before, and both Google and YT were full of them, all made by human "content creators" who optimized for clicks and focused on gaming the YT recommendation system. AI content isn't that different. But unfortunately it's now 100x easier to generate such content, so we see a lot more of it. The problem is fundamentally a problem of incentives and the ad-based business model, not a problem of AI. AI has made the enshittification problem a lot more visible, but it existed before.

I don't know what the solution here is. My guess is that the "public internet" will become less and less relevant over time as it becomes overrun by low-quality content, and a lot of communication will move to smaller communities that rely heavily on verifying human identity and credentials.


Not to steal your thunder but loss of privacy still makes me much sadder


I have a hard time explaining why, perhaps because I did not know what a baby peacock looks like, but this somehow really drove home the "dark side of AI" for me.

I have gotten used to trusting search results somewhat. Sure, there would be oddball results and nearly nonsensical ones, but they would be scarce in a sea of relevant images. Now with this, I would be blind to the things I don't know, and as someone who grew up with Google "just being there", it truly scares me.


If, like me, you don't have a Twitter account and want to see more than just the single post: https://xcancel.com/notengoprisa/status/1842550658102079556


And if you want to automatically get redirected to xcancel: https://einaregilsson.com/redirector/


Google is going to have to solve for this somehow if they want to remain relevant, right? If searching for an image and generating the image yield the same result, what's the point of image search any more?


> what's the point of image search any more?

The same could be said for regular search. Pretty much anything I search for yields a page of ads followed by pages of content farmers followed by pages of "almost sounds like experts but is still just a content farmer."

Financing the Internet with advertising has really made it difficult to find good quality content. The incentives are completely misaligned, unless you are a 'content creator' or Google.


Watermarking I think is supposed to be the goal, but I don’t think anyone can think that the web is in anything but managed decline. The AI feeding itself AI will just end it all. I think Platformer describes it best: https://www.platformer.news/google-io-ai-search-sundar-picha...

The question is what comes next, and I don’t think anyone has an answer to that.


I think what comes next is interest-based, influencer-moderated, semi-private chat rooms. For example, a lot of hobby YouTubers have moderated Discord servers. My DIY 3D printing communities have Discord servers. I have a few invite-only Discord servers for various circles of friends and family.


Well, once this kills the web, Google no longer has a data source its AI can tap to answer questions about anything that happens afterward, so it kind of needs the web to at least limp along.


Google was for the Open Web since day 1, but their aggressive ad-based business model kind of doomed the Open Web, because a lot of people were also incentivized by Google to aggressively peruse ad revenue.

I think the first degree of separation between websites should be which ones are commercial and which ones are non-commercial. That doesn't mean commercial websites are bad, because essentially what you want is quality information and content, but at least you know what you are getting into. E.g. if you are consuming information and content from a non-commercial website, you know that the owner of the website is not trying to sell you something via promoted content, an affiliate link, or something else.

The main question is how do you support the Open Web? Is it ads, subscriptions, donations, etc.? Hobbyists are the champions of the Open Web because they produce quality information and content free of charge, for the sake of helping people and because they love creating it.


aggressively pursue ad revenue*


What do you mean "solve"? This is solving the problem. If people see things they consider good enough, that's all they care about.

Source: every other piece of news or social media on the planet.


They're more likely to solve it so that you can't tell the baby peacocks are AI-generated


I assume this is one example of why big tech lobbies for stricter AI controls. They aren't afraid of AI takeover; they are afraid of AI destroying their businesses.


Their image search is presumably not profitable anyway so not sure they care


Maybe they will generate the images themselves and show a mix of both.


The problem is - how does Google get paid for providing this service. In a way, the better the service they offer, the less money they make. It really sucks.

Would you pay money to Google or some other company in exchange for a genuinely good search service that prioritizes well written content, and avoids AI (or human) generated crapticles?


Most egregious is the one copying the title from Snopes' "Video Genuinely Shows White 'Baby Peacock'?" (with the question mark cut off). A page all about how the picture isn't a real baby peacock.

But also, if you search the more accurate term, "peachick", you seem to get 100% real images, although half the pages call them "baby peacocks".


And the first result is from Adobe Stock, who you might assume would have higher standards than Pinterest and TikTok, but here we are.


In the near future, a significant portion of YouTube videos and podcasts will likely be AI-generated (e.g., through tools like Notebook LM).

However, I'm uncertain whether audiences will truly enjoy this AI-generated content. Personally, I prefer content created by humans—it feels more authentic and engaging to me.

It’s crucial for AI tools to include robust detection mechanisms, such as reliable watermarks, to help other platforms identify AI-generated content. Unfortunately, current detection tools for AI-generated audio are still lacking - https://www.npr.org/2024/04/05/1241446778/deepfake-audio-det...

[Edit] We just put together a list of notebooklm generated "podcasts": https://github.com/ListenNotes/notebooklm-generated-fake-pod...

Consider whether you'd enjoy listening to AI-generated podcasts. I believe people might be okay with shows they create themselves, but are less likely to appreciate "podcasts" AI-generated by others.


>Personally, I prefer content created by humans—it feels more authentic and engaging to me.

I'd like to think that too, but I wonder how long - if at all - this will be true. I "want" to like human generated content more, but I suspect AI may be able to optimize for human engagement more, especially for simple dopamine inducing content (like tiktok videos). After all, we're less complicated than we like to think.

>It’s crucial for AI tools to include robust detection mechanisms, such as reliable watermarks, to help other platforms identify AI-generated content.

This will never work, unfortunately. There's no way to exclude rogue actors, and there's plenty of profit in AIs pretending to be human. If anything, we will have to watermark/sign human generated content.


> In the near future, a significant portion of YouTube videos and podcasts will likely be AI-generated

It's not helpful that you're making a binary distinction here.

As an example, as much as 10 years ago, I would find Youtube videos where the narration was entirely TTS. The creators didn't want to use their own voice, and so they wrote the script, and fed it into a TTS system. As you can expect from the state of the art at the time, it sounded terrible. Yet people enjoyed the videos and they had high view counts.

Are we calling this AI-generated?

We now have better TTS (without generative AI). Way better. I presume those types of videos are now better for me to watch. You may still be able to tell it's not a human because the tone doesn't have much variance. You'd probably have to listen for a minute or longer to discern that, though.

Are we calling this AI-generated?

Now with generative AI, we have voices that perhaps you won't be able to identify as AI. But it's all good as long as a human wrote the script, right?

Are we calling this AI-generated?

Finally, take the same video. The creator writes the script, but feels he's not a good writer (or English is not his native tongue, and he likely has lots of grammatical errors). So he passes his script to GPT and asks it to rewrite it - and not just fix grammatical errors but have it improve it, with some end goal in mind ("This will be the script for a popular video...") He then reviews that the essence he was trying to capture was conveyed, and goes ahead with the voice generation.

Is this AI-generated?

To me, all of these are fine, and not in any way inferior to one with a completely human workflow. As long as the creator is a human, and he feels it is conveying what he needed to convey.

I would love to take a first draft of a blog post, send it to GPT, and have it write it for me. The reason I don't is that so far, whatever it produces doesn't have my "voice". It may capture what I meant to say, but the writing style is completely different from mine. If I could get GPT/Claude to mimic my style more, I'd absolutely run with it. Almost no one likes endless editing - especially writers!


Question is how long till you can’t tell the difference


My FAANG-working spouse thinks that AIs and robocallers should be mandated to identify themselves. She thinks an audible "beep-boop" at the end of a sentence for calls and video would be appropriate.


Barring that, every human could end sentences with a couple racial slurs to verify that they are not on AI


Microsoft is just waiting in the wings with Tay for just such an occurrence.


I support that idea. Along with properly implemented authentication so you can't just spoof your way to someone's phone, and painfully stiff fines for violators.


It's almost impossible now. NotebookLM really impressed me. I knew voice synthesis had gotten better than Stephen Hawking's "voice", but I really wasn't expecting two realistic voices with emotions that even banter with each other. There is a bit of banality to them - they call something "a game changer" in practically every "podcast", and the insights into the material are pretty shallow - but they are probably better than the average podcaster already.


It's impressive at first until you realise they're practically ad libbing a script. They're filled with all the same annoying American clichés ("you know me, I like x", your aforementioned "a game changer", plenty of "wow"). It would be impossible to listen to two in a row without realising how repetitive it is.


Fun fact: Stephen Hawking could’ve used a much “better” synthetic voice but decided to stick with that one, as people associated it with him.


And he associated it with him, in his thoughts.


At Listen Notes, we recently removed over 500 fake podcasts generated by Notebook LM in just the past weekend.

It's disappointing to see scammers and black-hat SEOs already leveraging Notebook LM to mass-produce fake podcasts and distribute them across various platforms.


Personally I'm opposed to the unlimited slop machine.


After Google continued to make it progressively more difficult to use their Image search to navigate to or download the actual image, I wrote an image search tool that can be hotkeyed from your OS to search the Google image repository and copy to the clipboard quickly, using a custom Google Search Engine ID.

About a year back I found that 90% of the results I was getting were AI generated, so I added a flag "No AI" which basically acts as a quick and dirty filter by limiting results to pre-2022. It's not perfect but it works as a stopgap measure.

https://github.com/scpedicini/truman-show
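For anyone curious what a "No AI" stopgap can look like, here's a rough sketch (not the repo's actual code) of a date-restricted image query, assuming the public Custom Search JSON API and its sort=date:r:start:end range restriction; the key and engine ID are placeholders:

    import requests

    API_KEY = "YOUR_API_KEY"        # placeholder
    ENGINE_ID = "YOUR_ENGINE_ID"    # placeholder custom search engine (cx) ID

    def image_search(query, no_ai=False):
        params = {
            "key": API_KEY,
            "cx": ENGINE_ID,
            "q": query,
            "searchType": "image",
            "num": 10,
        }
        if no_ai:
            # crude stopgap: restrict results to pages dated before 2022,
            # i.e. before image generators flooded the web
            params["sort"] = "date:r:19900101:20211231"
        resp = requests.get("https://www.googleapis.com/customsearch/v1",
                            params=params, timeout=10)
        resp.raise_for_status()
        return [item["link"] for item in resp.json().get("items", [])]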


Wouldn't surprise me if in a few years Google, for certain keywords:

- autogenerates URLs (that look legit)

- autogenerates content for such URLs (that look kinda legit)

All of this would be possible if one is using Chrome (otherwise the fake URLs wouldn't lead to anywhere). Of course, full of ads.

Think about it, some people are not really looking for some web site that talks about "baby peacocks". They are looking for baby peacocks: content, images, video. If Google can autogenerate good-enough content, then these kind of users would be satisfied (may not even notice the difference).

Maybe Google ditches the URL and all: type keywords, and get content (with ads)!


> would be possible if one is using Chrome (otherwise the fake URLs wouldn't lead to anywhere).

Didn't they do something like that with AMP? I recall that if you were using Chrome and visited an AMP site from Google, the address bar would say site.com even though the content was being served from google.com.



You're giving them ideas.


at least an ai://baby-peacocks/images would be honest


This sounds plausible actually


Like a search engine?


Like a "generate engine".


One of the replies mentions this uBlock Origin AI blocklist (haven't tried it myself): https://github.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist


I think we should have an allow list approach at this point. Maybe a web directory of trustful websites.


We're about to go back to the directory style websites of the 90s!


Yeeha!


I'm a part time maker and purchase a lot of designs off of Etsy to make into physical goods. I have to weed through so many AI images when purchasing designs off of Etsy now. I wish they required users to indicate if AI was used to produce the image so I could then filter them out.


Sellers are actually supposed to mark items as AI generated/assisted, where applicable: https://techcrunch.com/2024/07/09/etsy-new-seller-policy-202...

Whether they actually do this (and whether there's any incentive to do so), is obviously not a given


It's currently optional for sellers, Etsy says "This info won’t change what buyers see for now, but will be used to improve the shopping experience in the future."


Same now when trying to find 3d models to print. It's just a whole bunch of hueforge ai spam.


Fitting that this is a copy paste submission taken from another source (linked in dupe comments), likely by a bot based on post history. The computers are turning on each other.


We can only hope they consume each other in some kind of survival-of-the-fittest type scenario, and when all is said and done, we can turn the last one off and set the clock back to 2015 and try again.


2015 is peak humanity for you or something? We had good EDM back then, but that's pretty much it.


This phenomenon has been such a spur of motivation to start writing again. I love it.

The only way we can make sure the internet retains any goodness is by contributing good things to it. Passive consumption will rapidly turn into sub-mediocre drudgery. I suppose it already has.

Be the change you want to see, I guess. I’m a shitty writer, but at least I can beat the dissonant, bland, formulaic rambling of ChatGPT (here’s hoping, anyway).

I’m optimistic that a lot of us can keep something good going. We'll find ways to keep pockets of internet worth visiting, just like we did before search engines worked well.


There are a number of ways this might get solved, but I would speculate that it will generally be solved by adding image metadata that is signed by a certificate authority similar to the way SSL certificates are assigned to domains.

I think eventually all digital cameras and image scanners will securely hash and sign images just as forensic cameras do to certify that an image was "captured" instead of generated.

Of course this leaves a grey area for image editing applications such as Photoshop, so there may also need to be some other level of certificate-based signing introduced there as well.
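In toy form, the signing/verification step could look something like this (a minimal sketch using the Python cryptography library with a bare Ed25519 key; a real scheme such as C2PA also binds in a certificate chain, device attestation, and edit history):

    import hashlib
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # In a real camera this key would live in a secure element, and its public
    # half would be certified by a manufacturer CA, like a TLS certificate chain.
    camera_key = Ed25519PrivateKey.generate()
    camera_pub = camera_key.public_key()

    def sign_capture(image_bytes: bytes) -> bytes:
        """Camera signs a hash of the raw capture at the moment it is taken."""
        return camera_key.sign(hashlib.sha256(image_bytes).digest())

    def verify_capture(image_bytes: bytes, signature: bytes) -> bool:
        """Anyone with the certified public key can check it is an original capture."""
        try:
            camera_pub.verify(signature, hashlib.sha256(image_bytes).digest())
            return True
        except InvalidSignature:
            return False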


search "baby peacock before:2023" => doesn't have AI generated images


The Internet equivalent of pre-war steel:

https://wikipedia.org/wiki/Low-background_steel


And it's hard to fully trust any post-2021 text as well. It's pushing me to seek out pre-2021 books for information.


Until the AI generated book is published and the AI generated websites all tell you it was published in 2014.


Wonder who will coin said term. Before AI. Pure clean internet 1991-2022


You sure about that "pure clean" part?


Pre-AI internet.


Low background jpegs!


Until Google breaks search syntax/tags again...


works for reddit also!


Someone needs to invent a “for humans, by humans” web. Possibly a future luxury.


The good news is that some AI company will invent this, to provide a source for their LLM.


this reminds me of the matrix


It's so hard to do this because the better you make it, the more valuable it is for someone to a) scrape it all b) try to insert fake content


Hear me out: is that Wikipedia? I am sure people are submitting all sorts of AI-generated information, but it's probably getting rejected? (If someone better informed than me has any data one way or the other, I'm super curious)


Quite the contrary: people are gleefully machine-translating Wikipedia to make more Wikipedia (in different languages). And arguing for it.


LLMs are reinforced through adversarial training - you would essentially be playing a keep-up game with AI generated garbage that would get exponentially more difficult to pull ahead in.


There was this image circulating some 20 years ago and later, of the Internet becoming a cable-TV-like service where you'd be a subscriber to particular big companies' sites plus some additional "free-range" pages.

So the pessimist in me can see the Internet being affected by the free-vs-premium formula: a "basic" Internet with ads, tracking, AI filler, and limited access to 18+ content, which in its worst form comes only with these pre-defined sites; and a "premium" tier that's free of those limitations but over time also tries to squeeze more money out of users - like "premium but with ads".


I feel like this is what we are trying to do at Reddit, but needless to say it's going to get harder and harder.


I sense Reddit is a lost cause. Even before the latest wave of generative AI you could tell things were heavily manipulated.

I dare say that I haven’t noticed that much of a change in things and that could either be because LLMs are just that good at Reddit content, or that because Reddit was already so botted and manipulated it didn’t really change much.


Reddit started to decline dramatically once they started to charge for API access last year. Mods on a politics subreddit I was talking to said all the free tools they used to keep on top of things stopped working, so they could no longer filter out the trolls and bots.

I sure hope the money that Reddit made makes up for the readers who are fleeing.


Someone is trying to: https://brainmade.org/


A new sales point for Web3.


Actual Web3 at this point should be going offline.


How so?


They would add whatever feature to Web3 in order to sell the idea, at this point it's the Alchemy of IT.


Ever since Sora I've been thinking about the overall death of the internet "content". It all came back stronger with Meta Movie Gen.

I know there are no girls on the internet, but this AI crap is on another level. Even if I find a trustworthy creator, I might be seeing a fake video of them. Say I like MKBHD reviews: I will need to pay attention to whether I am really watching his video on his official channel.

My guard will have to be up so much, all the time, I actually don't think it will even be healthy to "consume content" anymore. Why live a life where almost everything I see can be a lie? Makes me not want to use any of this anymore.


> My guard will have to be up so much, all the time, I actually don't think it will even be healthy to "consume content" anymore. Why live a life where almost everything I see can be a lie? Makes me not want to use any of this anymore.

While I generally agree with your whole comment, I feel like this part has been true for years on social media well before AI generated content hit the scene.


True, and we can go back to any type of media who always have a bias, but overall it just feels different. On one hand, humans writing and humans communicating, even if they have an agenda, is one thing. On the other hand, machines writing and machines communicating is a different level.

Maybe I am overall in a bad mood regarding all this, but this recent article https://time.com/7026050/chatgpt-quit-teaching-ai-essay/ also struck a chord. Do I really wanna spend my time reading/watching machines talking to each other? How long until browsing Reddit or HN will be worthless?

Do I wanna get old with lower cognitive abilities and become this? https://slate.com/advice/2024/10/grandparents-misinformation...


Yeah, that's a good point. It's not so much that it was already happening before, but the sheer quantity of it. People have been doctoring images for political gain forever, but that at least took some Photoshop skills. Now anyone can just pop out thousands of misinformation photos, articles, and even now videos in a few hours.


Exactly, yeah - the noise to signal ratio is shifted catastrophically because you can generate infinite amounts of bullshit in a flash. You don't have to trickily convince people that X is true instead of Y with some carefully planted forgery, you can just drown out the Y with a billion fake X.

It effectively kills the non-walled internet as an information repository.


Or, consider using Wikimedia Commons, where images are painstakingly categorized, documented, and freely licensed:

https://commons.wikimedia.org/wiki/Category:Pavo_cristatus_(...


I’ve had a mild amount of success asking some nature photographers if they would be willing to make a few of their photos freely licensed so that they can be used on Wikipedia articles.

Wikimedia is a fantastic resource.


Creative Commons offers a portal if you wish to cast a wider net (music, video, 3D models)

It also includes Google Images and Flickr.

https://search.creativecommons.org/

I found the peachicks on Commons by searching "peacock" and then following categories up the tree. If people use the wrong search engine with naïve search terms, I don't know what to tell ya.

This is a parallel example of why reference librarians are still worth consulting, because they will guide you to the library's resources and databases, and demonstrate how to use search queries.


So, I was drawing an eagle for a new imprint, and I needed a reference for good-looking claws. So I used my Google Images search shortcut to get pictures of eagles, and it was almost all AI. As you'd expect, eagle claws suffer from the same problem with AI that human hands do, so it was completely useless.

Yandex images search is flawless though.


Make a search engine that doesn’t show AI results except when I specifically ask for them, or you soon won’t have a search engine business.

A really quick fix is to search with “-ai”; it's really strange that Google doesn't do this implicitly for images.


How do you expect them to implement that?


Well so far it works to just add ”-ai” so it feels like a pretty easy addition


The hard part is identifying what's AI generated and what's not.


Spam (created by humans) evolves.

So do humans.

If Google prioritizes AI slop, Google will be deprioritized.


AI slop (or convincing lies) is not distinguishable from genuine, human-generated content. Machines definitely can't tell the difference, and humans often can't either. That problem will get worse.


Why is that a problem, anyway? If machines can play chess better than any human, it is reasonable to assume that they can write articles better than many humans. What's wrong with Internet filled with good content generated by AI?


AI can't create good (meaning truthful) content. It's literally impossible for LLMs to be hallucination free. It's just not how they work.

The problem is going to get worse as hallucinations are used as training data because even the AI companies can't tell the difference between AI content and human content.


I really don't think that's the case. The lesson of the 2020s internet is that the biggest players have become too big to be disrupted.

The masses are fully here now. They're too passive to know or care what's going on. They stick with the path of least resistance: Google, Amazon, Reddit, Twitter, etc. No matter how hostile or shitty those options become.

We have to put aside the way we've thought about the internet before now because it doesn't apply anymore. There will be no more MySpace -> Facebook. The internet is no longer made up of a high enough percentage of conscientious and deliberate users to make a difference.


Already is. I suggest everybody else de-google as well.


Just tested on Kagi, and it’s catching a similar set of images.

I massively rate Kagi, but this is way less than ideal.


I just did it on Kagi and except for the obvious stock.adobe.com ones, all of the AI generated images were from snopes and media sites repeating this story, but I have quite a few sites blocked (pinterest is definitely nuked)


Would you care to share what you do instead? For search in particular; the g-suite, etc. are not such a big deal. I'm really hoping for something other than DuckDuckGo / Bing / etc., because AFAIK they all serve advertisement-funded trash, and I've yet to hear a really compelling alternative, though I've been too lazy/busy to try Kagi.


I know you said that you've been too lazy and busy to try it, but I am very happy with Kagi. If you don't want a search engine that serves advertisement-funded trash, then I recommend supporting the search engine whose business model is to provide searches without ads via subscription.

My search lists are curated very well through my settings, and even just using the recommended block list keeps a lot of junk out of my search results. If I find a bad site, I can block it from all future results pretty quickly. I can also use regex on the URLs in the search results to redirect things like Reddit to old.reddit automatically. It's very nice.


We need a slim p2p social network swarm protocol.

If we could subscribe and suggest content along our interest graphs, we would control the algorithm and could prune slop with ease.

It'd be incredibly awesome if news, forums, and social media worked like BitTorrent.


> I'm really hoping for something other than use duck duck go / bing / etc. because AFAIK they all serve advertisement funded trash

DDG lets you turn ads off completely.


I tried the same search on DDG and Bing, and saw a variety of the same fake images on both. The monster is already past the gate.


You don't think the AI content creators will target the next search engine if Google fails? I don't think Google WANTS to prioritize AI slop, they just are unable to not do it.


It's more about Google search than it is about the internet.

There was a period in the past when human spam was a problem that was not trivial to solve.

As always, modern problems require modern solutions.


The most effective spam filtering is done not by content but by various white and black lists of providers. Essentially it is a trust score which is a very old solution to a lot of problems.


I don’t think “better spam detection technology” can help out of this even in theory. The whole point of LLMs is that, by construction, they produce content which is statistically indistinguishable from human text. Almost by definition, any statistical test to distinguish LLM text from human text could be turned around to make better LLMs.


So statistical tests won't work, and we need something else.


I noticed recently when searching for images of cities that they're nearly all over-the-top unrealistic HDR images, beyond what you used to find in a travel agent's catalogue.


I stopped using Google search... why even bother now? Results are just some crappy page with ads. The astroturfed Wikipedia page is also suspect. ChatGPT can answer questions in seconds - just not sure if correctly, but most of the time more than good enough. I feel like Google is destroying its credibility by the day. Just go to the zoo to see peacocks and take pictures. At least it will be a real experience, not some virtual manipulation.


The described problem is AI-slop in Google, and your solution is to drink directly from the spigot of ChatGPT?


I suppose the logic could be: if you're going to consume AI generated content anyway, why not use a setup where you have control over the system prompt and other parameters? Not sure if ChatGPT qualifies there, though.


My solution is to use the computer as little as possible. Go see the world to know what a peacock looks like. The last time I saw peacocks was in Lisbon, in St. George's Castle, 4 years ago. The kind of questions I ask ChatGPT are mostly code questions, or for it to help me with planning something. I ask it questions, and it can provide some sort of logic behind the answer which I can then reason about. Sure, it can mislead me, but it's more like an ongoing conversation I'm having with it. So it's more like an opinion I'm getting, and I discount it. I'm generally a skeptical person, so I'm well aware of the manipulations that are happening online. Google is just a weaponized player in misinformation warfare at this point. It purposely will go out of its way to build consensus for conflicts. A bunch of technocratic billionaire overlords would get you to support genocide if it would benefit them. So I just don't trust Google at this point for anything news related. And the rest of their content seems to be just a giant trap of spam pages.


A lot of comments amount to: "the Internet is dead". AI is crap for sure, but far from making the Internet dead or useless. Consider:

- emails
- bills and payments
- banking
- searching for and buying stuff (assuming you already know what you want, that is)
- calls/chats - WhatsApp, Messenger, etc.
- YouTube (for learning)
- social stuff - however bad

AI? This too shall pass. The Internet will find its way.


Isn't this a self solving conundrum really? If google dies because of being completely useless, then no one has incentive to keep generating clickbait and fake content anymore do they?


Maybe it's more accurate to say that Web 2.0 is dead.


DDG/Bing had only one Ai image for my search of baby peafowl. Unfortunately the one AI image was from stock.adobe.com


Same here. Seems like DDG does better (but not perfect) at avoiding nonsense results than google in this case.


This is why Google took down its cached results. It's going to hoard pre-LLM internet data. Perhaps sell it, but I doubt it.

Our best bet is to have scraped all that data, and give you a temporal parameter to search, like:

+"Sponge bob" year:2012


I am reminded of the decline of MySpace. It was just thousands of bots posing as users, posting ads on people's pages for e-books, pharmaceuticals etc. The bots remained talking to each other long after the last humans left.


If you use the applicable phrase "peacock chick" or "peacock hatchling" then the results are better. Garbage in and all that.


I'm at a juncture in my career where I'm asking what could really motivate me to do anything in tech that I feel is worth doing. In my earlier years I remember using both CompuServe and Prodigy. I'm not sure if it's just hindsight colored by nostalgia, but I yearn for the feeling I had as a young teenager when I could explore a quirky and curated world of information.

I'm starting to think that all this AI stuff has finally pushed the ads-based Internet past its tipping point.

I feel I could be motivated to work on a walled garden with moderation paid for by subscription fees. What would it be worth to you to have an entirely new online experience free of all the enshittification of the past 15 years?

Personally, I pay for Kagi just to have a small taste of what that could be like. But what if not just the search engine, but all the sites, were funded entirely by a subscription fee paid to the service provider? What if privacy could be a foremost feature of that world? What if advertising and astroturfing were strictly forbidden, and human authors had to be vetted by other humans to be allowed a place in this world? "This content is Certified ads- and AI-Free(tm)."

I really don't know how well something like that would turn out in 2024, but I feel I wouldn't be alone in wanting to give it a try.


We could also have a public library but for the Internet. A list of sites and articles curated and maintained by librarians and experts and paid for by local taxes.


Facebook is flooded with this! Fake photos of poor people asking for help, and you see thousands of likes and people commenting on how they can help.


A good chunk of those replying will be bots


What’s the idea? Why do they use bots to reply?


To make it look like it has actual engagement.


actual baby peacocks are almost indistinguishable from guinea hatchlings and there's a strong resemblance to baby chickens or turkeys.


You know that, but what about some kid looking on google images?


I’m an adult and I didn’t know that


The irony of Google's core value proposition (search) being rendered useless by a technology that Google is investing heavily in (AI). It's a self-licking ice cream cone of suckage.


The internet needs strong provenance to ensure content is created by trusted parties.

It has to be done in a decentralised way to ensure no enterprise controls who is trusted and who isn't.


[deleted]


AI is not a "species" in any sense of the word.


It's not even the right terminology. I think you should probably use "peafowl". The search "peafowl chicks" seems to return all real images.


I think this is kind of key to the issue, the good content is there if you know how to find it. But if you don't know the right terminology then you are going to search for baby peacock and get bad results.


Well, that makes me wonder if the search isn't flawed primarily because it is an image search. I try to stress to my children how important it is to prefer reading material over "watching material". And while these are stills ("photographs" seems inappropriate), the fact that they are images means the search can't possibly help you self-correct. Google has no opportunity to show you the correct terminology within the results, and you don't learn enough to then go out and find the images you were hoping for.

I know there are exceptions. There are answers I've wanted that can be found within the first few minutes of the first video on Youtube, which I've gone days without discovering because I'm video-averse. But I suspect that the habit is, on average, more benefit than detriment.


Kind of like how the Internet worked pre-Google.


Well, on the "bright" side, all but 2 of the struck-out ones are either explicitly AI-generated art (the 3 Adobe Stock ones, 2 from Freepik, and the 1 from Instagram) or are about noting that the images aren't real (the 2 Snopes ones and the 1 in the bottom left calling out the feet).

On the sad side, the TikTok and YouTube ones that likely led to all of this aren't labeled and are present, not to mention the complete lack of an "I want the AI things automatically filtered; I'm not interested in trends, I'm searching for actual things right now" button. Without something like that it will become harder to use Google to find new content.

I mean people obviously like the content, it's cute enough to get shared around so much to make itself popular in these images and to trigger the post on X about it. Nothing wrong with that... but if it's not easily filterable for what the user is actually trying to find then Google has somewhat failed at its goal.


Given the query, "baby peacock", doesn't describe something that actually exists, what results is Google expected to return? Actual baby peafowl? Cartoons of peacocks with "baby" proportions? Should they be consistent with the results for a similarly fanciful query like "baby rooster"?


It seems like the image results for "baby peacock" are returning articles talking about AI-generated photos of baby peacocks due to some recent trend involving an AI-generated baby peacock image.

Have people tried searching for other animals? Maybe this isn't a case of Google being inundated with AI-generated photos, but just something to do with the results for this particular phrase.


This might not be insightful, but I think we need to adapt.

Search and the Internet are dead - or will be. There is no going back with AI. We must learn how to deal with it. You too should rethink how you approach the Internet, how you surf it.

If search is dead, are there any solutions? I use more RSS sources now, because that is human-created content. I rely more on "word of mouth".


I don't know why, but AI-generated images have a very particular look; here I pick up on a certain bokeh blurring and huge, shiny eyes. The peacock actually reminds me of the AI girl that always gets generated: a sort of Asian Amanda Seyfried with unnaturally huge Alita-like eyes.


If this is the beginning, where are we going to be in 2035? I just can't imagine it without being so wildly speculative.


I've started thinking more and more about a short throwaway conversation in Anathem about how the internet in their world is absolutely ruined by AI and the only solution they have left is a user driven reputation system for entities and how one of the characters just earned a lot of "reputons" for recording an event.

Mostly I think about how something like that is going to be signed into law by some state and it'll require everything you do to be linked to your government issued ID card so they can "prove" you're not spreading AI misinformation and all the horrendous unintended side effects that will spread from there.


"Anyone can post information on any topic. The vast majority of what's on the Reticulum is, therefore, crap. It has to be filtered. The filtering systems are ancient. My people have been improving them, and their interfaces, since the time of the Reconstitution."

...

"Asynchronous, symmetrically anonymized, moderated open-cry repute auction. Don't even bother trying to parse that. The acronym is pre-Reconstitution. There hasn't been a true asamocra for 3600 years. Instead we do other things that serve the same purpose and we call them by the old name. In most cases, it takes a few days for a provably irreversible phase transition to occur in the reputon glass - never mind - and another day after that to make sure you aren't just being spoofed by ephemeral stochastic nucleation."

Fantastic book. I read it twice so far, highly recommended. So many little off-handed conceptual gems everywhere.


Luckily, Altman already has Worldcoin revved up and ready to go! Isn't that convenient?!


Dang - what protections does HN use for AI generated comment garbage similar to this baby peacock issue?


Images were already pretty devalued because of how good phone cameras are and how every person has one. Now it will just give it a little more kick


Searched for 'peacock chick' and got 100% genuine images.

Searching for 'baby dog' would probably get you garbage images too. (it does)


"peacock chick" has many more real images than AI


Who ever searched for "baby peacock"? In this search space, is peacock distinguished from peahen? Because peaweewee is potentially not as interesting a search as peacock - and I'm referring to the tail, as the Romance languages refer to it.


It sounds a bit unfair to use content from a site without crediting the source in an obvious way. I'm sure this shameless content hijacking can't continue as in the end there will not be any source to query. Robots.txt should allow meta-tags like block 'all AI' bots (or these AI companies should pay their sources).


Yandex and Bing don't seem to have this problem as much as Google does.


Surely a big ingredient of this problem is: Google Results == Internet.


This phenomenon is also heavily starting to affect NSFW images too. It is awful.


I’m mostly deGoogled, and this trend has been pretty minor for me.


Why are image results mostly coming from recent uploads? If I search for cool frogs, it's super likely the best photo came from 1982, and that's what I want to see.


the "business model" is going to be created soon, it will be doable with bitcoin. the problem is that we have to redefine what quality means.


OpenAI must be destroyed!


A "baby peacock" is not a thing, so I honestly don't see the search quality issue here. The text "baby peacock" is associated with these fabricated images.


There's no practicality to being so pedantic.


There is though; if unusual word combinations are correlated with AI imagery.


Have you ever encountered the extremely large contingent of HN commenters who claim to prefer that Google interpret their search literally, exactly, and at face value? Wouldn't they be howling mad if Google silently adjusted the core concept of your search from "baby peacock" to "peafowl chick"?

In any case the web and Google's index of it is crowdsourced. If the web associates this image and that phrase, what are they supposed to do about it?


Baby male peafowls don't exist?


They're called peachicks.


But if I told someone that I had baby peacocks on my farm, they wouldn't look at me bewildered and wonder what kind of animal I'm talking about. If they know what a peacock is, then they know what a baby peacock is, whether I'm using the correct word or not. The same is true if I say "baby cow" instead of calf, or "baby horse" instead of foal. You and I can picture exactly what those animals look like in our heads, and it seems AI can too.


The humans would know what you meant, but the machines do not, because until recently nobody had ever said "baby peacock".

https://trends.google.com/trends/explore?date=today%203-m&ge...


Humans: "Hey this is bad"

Tech: "Gosh we better tune our algos so these images are even MORE indistinguishable from the real thing"

Evidently the road to hell is paved with novelty image generators.


I was searching google images for "cat professor" recently.

Same here - as far as I could tell it was all AI garbage, with weird saturation and colors and uncanny valley... they looked weird / didn't work for me.


... Did you want a picture of a real cat professor?


I was hoping for a picture of a real cat. There's a different look that real photos have. The AI photos all look like computer polished weirdness.


So generate your own.


I do not have the required animal.


The web became trashed over a decade ago.


The return of Britannica hard copies?


If you Google Meat Cove, it’s presented as a “human settlement” in Nova Scotia.

I predict the word “meat” fucked with the AI.


Not on duckduckgo


Uh?, this seems totally normal, a few clear AI images here and there but all those seem legi...

And then I remembered that I was on duckduckgo.


This is from 2023


I wonder: do all the HNers who are excited about their GenAI product or wrapper or startup understand, at a fundamental level, that they are an intrinsic part of this deterioration?

Or is this one of those fundamental attribution error things:

- MY product is a powerful tool for creators who wish to save time

- THEIR product is just a poorly-thought-out slop generator

Does it occur to people to instead be part of something real and visceral, and not just blame social media's ad-driven impression model, not pretend they are only part of a trend for which they can't be totally blamed?


You have only had Google image search for what, 20 years? Why do you think it is a fundamental part of humanity's growth story?

You talk about being a part of something "real and visceral" but you're complaining about the demise of being able to sit at your desk and see pictures of wildlife. Maybe it's okay that google image search dies and makes people go out and find the wildlife they want to see.

The internet, even in its best format (e.g. ad-free, free access information for all; and communication with all of humanity) has a ton of real downsides. It's not clear to me that AI should be strangled in its infancy to save the internet (which does _not_ exist in that "best" format).


>Maybe it's okay that google image search dies and makes people go out and find the wildlife they want to see.

I don't think that is what will happen if google images dies.


haha, no definitely not! The internet is mostly not "real and visceral" so losing parts of it to AI-generated nonsense IS a loss, just not a loss of the actual underlying thing (in this case: baby peacocks).


Unfortunately your comment is doing the same thing, just at a different level—something like this:

- I am a thoughtful technologist, building real things for real people, concerned about others and the social impact of my work;

- they are greedy and ignorant, destroying society for short-term personal gain, no matter what the consequences.

It's human nature to put badness on an abstract them, but we don't get anywhere that way. It's good for getting agreement (e.g. upvotes), because we all put ourselves in that sweet I bucket and participate in the down-with-them feeling. But it only leads to more of what everyone decries.


First off, no, it did absolutely not do the same thing. It was a polemic question, sure, but it was a specific criticism of a technology and its proponents.

I did not make any claims about myself at all, until I was separately accused of being something or other by someone projecting onto me whatever it was they needed to feel better about themselves.

Second, you have rate-limited me with the "posting too fast" thing so I couldn't reply to your comment or other ad hominem, even though I was posting at a rate no faster than the discussions about OpenSCAD and FreeCAD I had been involved with earlier (considerably less, I would say).

It's IMO really classless to use your administrative privileges to silence people after you accuse them of something but before they can respond, but I am not surprised to see that.

I will repeat again: I think it is really clear to me, and really to everyone I have met outside this bubble, that there is no fine distinction to be drawn between content-generating AI projects that are "good" and those that are contributing to "slop". It's all slop generation; e.g. NotebookLM is no better or cleverer than Midjourney.

Every tool HNers are excited about is going to be used to make the world's culture, and the web, worse.

I'd encourage you and those reading to consider this.

Sure, you can't make much of a change by yourself. But you don't have to be part of what amounts to inflicting automated cultural vandalism on an unprecedented scale.

Goodbye.


Sure but doesn't every technological development have these tradeoffs?

You could say what you say about anyone at any time. Where do you draw the line? I guarantee you'll be guilty of the exact same thing. I don't want to generalize, but IMO this sentiment of yours, I hear most loudly from software engineers far removed from ordinary non-technical end users: is making beautiful new LISPs and CNIs and Python package auditing tools the only valid work with seemingly no tradeoffs?


> I hear most loudly from software engineers far removed from ordinary non-technical end users

I am absolutely not far removed from non-technical end users. They are my client base, ultimately. As a freelancer I focus on building real things that make things better for people whose faces and voices I get to know. GenAI will be useless to them, because it is antithetical to what they do.

And that focus is only getting keener; I want nothing to do with the AI-generated web.


> They are my client base, ultimately... I focus on building real things that make things better for people... faces and voices I get to know.

So what I'm hearing is, "I agree very strongly with the people who pay me." Or to put it in your words:

"MY product is a powerful tool for creators who wish to save time."

"THEIR product is just a poorly-thought-out slop generator"


The problem with this line of reasoning is that things can get steadily worse and you'll never be allowed to say or do anything about it.

No, everything is not the same as everything else.


Every technical advancement has tradeoffs. Not every technical advancement has billions of dollars sloshing around doing absolutely nothing except making the web worse and further ruining the environment. What a shockingly bad-faith way to interpret GP's argument, wow.


The comment is an interesting but very cookie cutter sort of vamp and drama. The comment trades in a bunch of generalization, much like yours, and you know, generalization doesn't feel good when it directly attacks you.

I don't sincerely believe that people who are working on Kubernetes features or observability tools are bad people. Do high drama personalities who engage in a mode of discourse of "wow" and "shockingly" say valid things too? Yeah. But it's as simple as, log in your own eye before you worry about the thorns in others. Exceptionally ironic because the poster is vamping about "Attribution errors." Another POV is, shysters project.


There's a sort of "technological fundamental attribution error" that comes into play a lot with new technologies. Every past technology has, whatever its benefits to humanity, become substantially tarnished by abuse and malicious use. But this one won't be! Promise!

That said, I don't really think this is a tide any individual market actor can reasonably stem. It's going to require some pretty fundamental changes in the way we use the internet.


I propose a new rule. "Please respond to the actual actions and consequences of said actions, not what is said in a statement to generate positive PR. Assume putting one's money where one's mouth is, is harder to do than simply blow hot air about creating a private, ethical platform."

Sick and tired of giving parasites benefit of the doubt they've long sucked dry.


Was it ever any different with social media, surveillance advertising, SEO, NFTs, etc?


Did they care over the past 10 years where they decimated social lives, city night life, and humanity? https://sherwood.news/world/still-searching-for-that-connect...


Are you saying AI isn’t useful? My product is painstakingly crafted and uses AI but in my opinion it uses it tastefully and with great utility. Also 95%+ of my development efforts are not on improving the AI even though I use a .ai TLD. I think it’s crazy for a modern company/product _not_ to use AI, and the grifters building clear wrappers for GPT and other insanely low-quality efforts are already pretty much dead.


> Are you saying AI isn’t useful? My product is painstakingly crafted and uses AI but in my opinion it uses it tastefully and with great utility.

Sure. And THEIR products are just thoughtless slop generators.


They want money, they’re riding the hype wave, not advancing anything and I’m sure most or all know it.


"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

https://news.ycombinator.com/newsguidelines.html


Fair, sorry, thanks.


[flagged]


Luckily I genuinely don't care :-)


Honestly the amount of "The Internet is dead!!1!" comments in this thread is more depressing than TFA.


AI is the ultimate enshittification.


everything terrible about SEO, but now exponentially cheaper and faster to excrete.


You got downvoted because the people here can't handle the truth.


It's over. Time to turn off the computer and touch grass.


A bit tangential, but it's interesting to see comments like "we should start hosting our own websites". We were discussing it with my friends, and it seems like there has been a significant change in what is considered "cool" in terms of social validation. I understand that I'm dumbing it down right now, but it's not just AI that contributes to it - AI is definitely accelerating the feeling, though.

In the early 2010s, when Instagram, Twitter, and Facebook started getting big, all the websites and apps had a process of discovery that you had to go through to make them fun for yourself. It obviously turned some people off and made the onboarding a bit harder, but you needed to follow some people and send some friend requests, and in the end you would mostly see things you had actively wanted to see. Even when the algorithms started sorting the timelines, it would still be (mostly) within the things you'd chosen to see. Even YouTube's recommendation algorithm was pretty simple, and it would suggest extremely similar videos.

I think it changed around 2016, when the algorithms started trying to determine what you like based on your interactions with other things, rather than on your explicit action of saying "I want stuff from this person/channel/etc.". I'm sure a significant chunk of us have worked on similar algorithms, so you get the gist of it. But this change resulted in users getting attention from the global audience (because in order for the algorithm to detect what you like, it has to throw in suggestions from everywhere).

I get that forums have existed for decades, and people have been earning Reddit karma since the 2000s, but it was still a more deliberate action when you wanted to see something. TikTok, YouTube, and Instagram changed the entire playing field in the last 6 years or so, where anyone's real-life "social score" no longer had to depend on whom they know in the real world. It translates into this: you can generate posts, content, whatever you wanna call it, for everyone, rather than actively getting a specific person's attention. Like, going viral on YouTube was a big thing at some point. There are some ongoing meme-like comments saying "you would be invited to Ellen's show in 2010", which is kinda true, because breaking out of the "only seen by people whom you know" box was extremely rare.

Well, now everyone technically has a chance, which incentivizes people to constantly push out content. It doesn't matter if you're doing it just for social media clout, for financial motives, etc. It's just possible for something to go "big", albeit with minuscule benefits from it. So there's a constant churn of... content. And now AI is making it even simpler to create such content. But again, this results in an even further decrease in the social importance of such pictures/videos/texts.

I understand there's always a group of people who "write/create/paint for themselves", and I'm in a similar boat. But if the majority of creators have different incentives, the platforms will cater to them. And in this case, the platform is the whole Internet, and the incentives are "financial, and seeking global attention". Right now, it takes about a minute to create a video and post it on any of the websites, which was basically impossible back in the day. That barrier to entry, combined with one's deliberate discovery, is what, I think, made the internet feel more fun.

I'm not touching the subject of ad infestation in every corner, though it has definitely accelerated the downward spiral in the average quality of content. But in the end, I blame us for choosing this path, because we could've put pressure on the global algorithms of YouTube, TikTok, etc. We chose not to do so because, well, it still gives us dopamine hits.


[flagged]


I tried it and it was 95% relevant images. Not sure what you mean.


30% are mixed and 20% non-white in my results.


In the near future, certain talking points we wish to discuss won’t be allowed by the downvote/flagging mafia, so we’ll link to Reddit instead while proclaiming how HN is so much better than those plebs over there.


Irrelevant to the topic at hand. Stealing people's attention for your pet issue is rude.


Isn't Reddit run by the CIA at this point?


Is this showing the world your search bubble?


I'm actually a bit excited by this problem, believe it or not.

Like what solutions are we gonna come up with to solve it? Is the human side of the internet (however we create it) going to become more pure? Perhaps in discovering ways to avoid low quality AI content, we'll also find ways to escape from destructive recommender systems and monetized advertisements as well. Strange as it sounds, solving this problem could lead us to a much brighter future!


https://en.wikipedia.org/wiki/Red_Queen_hypothesis

(also noteworthy for the 'Publication' section near the bottom)


Would you be as excited if there were no viable solutions?


A handful will take this path. The big herd will not.


Well, nearly all of the Google Images results for "Woman" show a woman with makeup, and on top of that the photos were altered in Photoshop.

We have been creating our own reality even before AI.


Creating a version of reality is significantly different from conjuring abject falsehoods. There is an objective reality for what (e.g.) a baby peacock should look like, and this AI slop is inherently misleading about that.


His comparison isn't totally off. At some point our global perception of a certain subject might be totally different from what it is in reality just because all images about X are the optimal, AI improved, photoshopped version. This is in fact what women mean when they say that beauty standards are becoming unrealistic: Quite literally the standard image of women is being altered. Kind of similar to how the standard image of a baby peacock is being altered.


the incel to HN pipeline is now complete


that's not appropriate here


but sexism is? awesome.


Is this a joke? Right, women wear makeup, might as well AI generate every image we look for.


All AI image and video generators must be forced to add metadata and watermarks and all uploading technology (browsers, iPhone and Android SDKs, websites, apps, etc.) needs to label content as AI-generated or not. Then search engines worth their salt can filter out the AI crap and, boom, we are back to how the Internet was; or, if you want to see the fake crap, change the filter.
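
As a rough sketch of what the indexing-side filter could look like, here's a minimal Python example. It assumes the generator actually writes the IPTC DigitalSourceType field (whose "trainedAlgorithmicMedia" value is that vocabulary's label for AI-generated media) and that exiftool is installed; the function name and the fallback labels are just illustrative, not any search engine's real pipeline.

    # Illustrative indexer-side check, not a real search-engine pipeline.
    # Assumes exiftool is on PATH and that the generator voluntarily wrote
    # the IPTC DigitalSourceType field; stripped metadata just comes back
    # as "unknown".
    import json
    import subprocess

    AI_MARKER = "trainedAlgorithmicMedia"  # IPTC term for AI-generated media

    def classify(path: str) -> str:
        # exiftool -json dumps every metadata tag it can read as JSON
        out = subprocess.run(["exiftool", "-json", path],
                             capture_output=True, text=True, check=True)
        tags = json.loads(out.stdout)[0]
        for key, value in tags.items():
            if "DigitalSourceType" in key and AI_MARKER in str(value):
                return "ai-generated"   # the generator declared itself
        return "unknown"                # no declaration either way

    print(classify("baby_peacock.jpg"))

The obvious weakness is that anything relying on voluntarily written metadata disappears the moment someone strips the tags, which is the enforcement problem raised in the reply below.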


> All AI image and video generators must be forced to add metadata and watermarks and all uploading technology

This is already impossible because it's impossible to enforce. You can't stop something running on a random laptop, and you can't stop models running on server farms in, say, North Korea.


Those run for profit can be forced to add watermarks/metadata, and uploading tech (Google Chrome, Apple, Google Android, Firefox, i.e. everything the public uses now) can be forced too.

If it can't verify the source, it could label it as suspect :-). Just thinking here ... have you got any other ideas, or are we just going to let the Internet die at the hands of AI, as Neil deGrasse Tyson predicts (https://www.youtube.com/watch?v=SAuDmBYwLq4)? Or are you just going to downvote someone who tries to come up with solutions?


Might be simpler to have uploads/indexing require metadata in order not to be categorized as trash.


yeah that's a good idea :)



