Let's not let Google off the hook by blaming AI sludge though. Whenever I use Google in a context where I don't have an ad blocker I'm always surprised when every single result above the fold is a paid placement that's clearly not what I'm looking for. And Google has become awfully comfortable just ignoring terms in my queries.
Not that LLM sludge isn't also an issue, of course.
Same way Google has turned the other way when it comes to fraudulent ads on YouTube.
It's obvious that Elon Musk isn't trying to sell me crypto. That an automated system built by one of the largest technology companies in the world hasn't detected the fraud is alarming, and Google has the resources to pay for manual review as well.
They just don't care that their system is being used to ruin people's lives.
I saw an ad on YouTube once selling washing-up tablets to kids to... eat.
If you search for something on Google it says it is returning "x of 82,000,000 results" at the top. But if you actually click through the results pages you'll find that it will only ever return at most 400 results for any fixed search string. At 10 results per page that's 40 pages; at 100 per page that's 4 pages. Of those 400 results, at least half (near the start) are ads and SEO trash, so it is only possible to actually look at ~200 results per search with Google. And that is why search is so useless these days. Bing's cap is 900 instead of 400, but it's the same problem.
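The page arithmetic above, as a quick sketch (the 400 and 900 caps are the commenter's observation, not documented limits):

```python
import math

def pages_needed(total_results: int, per_page: int) -> int:
    """Number of result pages needed to show all returned results."""
    return math.ceil(total_results / per_page)

# Caps reported by the commenter, not official API limits.
google_cap, bing_cap = 400, 900

print(pages_needed(google_cap, 10))   # 40 pages at 10 results/page
print(pages_needed(google_cap, 100))  # 4 pages at 100 results/page
print(pages_needed(bing_cap, 10))     # 90 pages for Bing's cap
```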
There are no free search engines anymore. AI spam doesn't really change the dynamic.
Search is now worse than it was in 1996. Trust me, I was there, kids :)
What you have now is a pale shadow of a real, functioning internet.
Perhaps the way forward is radically different.
If a consortium of independent crawlers could release a massively compressed model periodically, say 12 times per year, then search could move to the client side. Anyone got sensible estimates of how big a model would need to be to give, say, about 80% of the capability of Google/Bing? I think it would weigh in at under 1 TB, and with clever differential coding you'd only have to download the changes. Isn't it time to move search of the web off the web?
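A back-of-envelope sketch of the size question (every number below is an assumption for illustration, not a measurement):

```python
# Rough estimate of a client-side search index's footprint.
# All figures are assumptions chosen to test the "< 1 TB" guess.

pages = 4e9            # assumed count of "useful" web pages worth indexing
bytes_per_page = 200   # assumed compressed index cost per page
                       # (posting lists + metadata, heavily compressed)

index_bytes = pages * bytes_per_page
print(f"{index_bytes / 1e12:.1f} TB")        # 0.8 TB -- under the 1 TB guess

# A monthly differential update touching, say, 5% of pages:
monthly_delta = index_bytes * 0.05
print(f"{monthly_delta / 1e9:.0f} GB/month")  # 40 GB/month
```

Under these (debatable) assumptions the index fits the sub-terabyte guess, and monthly deltas would be a large but not absurd download.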
Remember the "I'm feeling lucky" button on Google? Their search was so good that, for a while at least, you could count on that button getting you where you wanted to go in a single click.
Yep. I remember when I used to be able to search for exact strings and engines would find them! Booleans even worked. It was a magical time for web search from ~1999 to 2015.
At universities all around the world you'll sit a course called "Research Methods". It's a bit of statistics, philosophy of science, hypothesis formulation, significance testing, understanding epistemology, quality, quantity, bias... I have a fairly good overview of it, because I've taught it for at least 10 semesters.
One of the things baked into every research methods course is search. Sometimes you learn the interfaces to specific tools for searching papers. But most of it is what you describe: boolean operators, regular expressions, sorting and filtering...
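As an illustration of the kind of boolean-and-regex querying such courses teach, here's a minimal sketch in Python (the document set and queries are invented for the example):

```python
import re

# Toy corpus standing in for a paper database; contents are made up.
docs = {
    1: "significance testing in quantitative research",
    2: "qualitative methods and epistemology",
    3: "hypothesis formulation and significance testing",
}

def matches(text: str, must=(), must_not=(), pattern=None) -> bool:
    """AND every `must` term, exclude any `must_not` term, optional regex."""
    t = text.lower()
    return (all(w in t for w in must)
            and not any(w in t for w in must_not)
            and (pattern is None or re.search(pattern, t) is not None))

# Query: significance AND NOT qualitative AND /hypothes\w+/
hits = [d for d, text in docs.items()
        if matches(text, must=["significance"], must_not=["qualitative"],
                   pattern=r"hypothes\w+")]
print(hits)  # [3]
```

This is exactly the semantics that exact-phrase quoting, `AND`/`NOT`, and pattern search used to give you in a search engine's query box.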
The students are still told to use Google and that this works with Google. But this information hasn't been useful for almost 5 years now.
For 5 years we've been training bachelor's, master's and PhD students to go out and use tools and techniques that are almost completely irrelevant. The primary official tool at the foundation of all academic research is broken.
Almost no research methods professors I know at any university have thought to train students to deal with advertising, spam, AI clutter, disinformation - to deal with the reality of Google as it actually is.
For 5 years we've been misleading students. Because we got so hung up on a monopoly, we've allowed a single corporation to fuck the whole of Western academic research - because year after year I definitely see poorer results.
One day I hope the world is able to look back at the colossal cost of BigTech to culture.
Come on, that's a ridiculous criticism. Nobody* actually clicks through all those pages. People rarely even click to the second page. Instead they adjust their search terms.
The reason Google is starting to suck is that the first page results are often trash.
Does AI really make this much worse? Search engine quality was already low because of SEO spam and other copycat scraped sites. I mostly google using site:blabla.com to filter out that crap.
I am sure Google will create special meta tags for sites to state that their content is AI generated, and then perhaps use that to improve ranking: AI-generated content < "manual" content.