Hacker News
Is AI the solution to search's problem? (lucaserb.substack.com)
21 points by lucaserb on Jan 25, 2023 | hide | past | favorite | 37 comments



Maybe I’m an outlier. But I’m not buying it.

To me, it fails to analyze and address the fundamental reason why Google “got so bad”.

I believe this is mainly a problem for the top 5%. There is a reason why Google presents their results the way they do. It’s not like they’re a bunch of idiots who don’t know their craft.

Yes, for those of us around the internet since 1996, the results got worse. But since then the entire population has come online. They’re just fine with the top results; they’re good enough. Google’s machine knows and sees this in the data.

I am sure there is value in summarizing and querying the text. But this will increase convenience, not necessarily the quality of results.


I'm not buying this explanation.

Google isn't delivering information based on what's relevant to users; Google is funneling you into their business in order to funnel you into someone else's business. And they're doing it not by being very useful, but by being the default you reach for on every product you buy. Setting up your life to not use Google is a big hassle; I can safely say this because I've done it.

This is not simply an issue of "your needs are more advanced than normies'"; this is an issue of "indexing the internet doesn't work, we can't win the cat-and-mouse game of SEO, so we will just ensure a userbase by controlling the environment around everyone." Sprinkle in a little bit of gaslighting and information control and you've got the current state of affairs with regard to internet search.


This is mainly an effect of Google being such a dominant player though. The sort of search results that dominate Google are essentially a local minimum in the fitness function.

It's pretty easy to create a search engine that outperforms Google in dealing with SEO by valuing different things than Google does. But as long as you have a system with a strong monopoly in search directing a huge swath of web traffic, there will be a sort of evolutionary pressure in the direction of what we're seeing on Google.

It's not a function of the sort of search engine Google is, but of its market position, and the fact that there is a metric shit-ton of money to be made in successfully adapting to its algorithms.


But this doesn't negate what I've said. Google no longer cares what results appear in front of you below the ads, unless those results are verboten topics, in which case they're hidden.

I use Marginalia reasonably often and quite like it, BTW. It's a good tool. But there's a reason most of the popular alternatives to Google are metasearch engines: outperforming Google is hard. So Google threw in the towel, gave up competing on quality, and now simply buys its way into every web-connected product as the default interface to information, to ensure hegemony. Marginalia succeeds at sifting through the SEO for the reasons you said, but then comes the problem of actual search, and even with Google being the big gas giant attracting all the garbage, nobody competes with peak Google on quality, for a variety of reasons.


There is an aspect of cannibalization. Google can't really grow much more than they already have in the search niche, and most other things they've tried (cloud, streaming video, social media, shopping, music) have competent competitors. The only way to present black numbers to the investors is to essentially milk the cow harder in a way that makes their offering worse.

I do agree that what has allowed Google to stick around is indeed that they're very well rounded. There are many competitors that do one or a few things better, but there isn't really a compelling replacement.

My point though is that most of Google's problems with "SEO" stem from their monopolistic position in the market. Their search engine is essentially a firehose of cash flowing into whatever ranks well. This creates a dynamic where what ranks well gets more money and is able to proliferate, and what ranks poorly dies off. This is essentially an evolutionary process. What is perceived as adversarial SEO is essentially Google's own shadow. It takes the shape it does because of the things they value.

Search engine spam is not just a flaw in Google's algorithms, it is a consequence of their position in the market.

Whatever replaces Google, whether it's some sort of AI or Marginalia Search (not very likely), will have exactly the same problem as long as it exerts the same dominance over the web's traffic.


Agree. But not sure what you mean by the "top 5%".

Google have deliberately stopped trying to provide an index of the web. They now focus solely on revenue (advertising) maximization. A search will show paid sites, then popular sites, up to a few pages, and that's it. For example, search for "brioche buns" - you can't convince me that there are only 420 pages in the entire www that refer to "brioche buns" :(


It seems if a site is not selling something, and/or buying ads, and isn't in the most popular social media sites, Google just dumps it into the "No one knows what lies beyond this border" bin. Tons of forums, blogs, personal and informational sites, etc., all hidden somewhere in the fog beyond the 'edge of the world' and 'here be dragons' signs.


Sorry I was not more precise. I meant the top 5% of internet users. The most experienced / most demanding users.


>About 11,300,000 results

I get a lot more than 400.


Those results aren’t real. If you actually try to retrieve them, you will find out that Google stops delivering unique results after a few pages. (Other search engines are the same in this respect.)


Yes, they are real. It will stop ranking them after a few pages because the results become worse and it becomes too expensive to rank.


Honestly, 'Google has gotten so bad' is just a meme.

It is just not true.

Use any of the competing search engines for even a day and you'll come running back to Google. Believe me, I tried. I really wanted to switch to Bing or the Duck, but they just don't compare. If Google's search results are not to your liking, it is because search is hard. Also, log in: if you are a super user, Google will catch on to this as well.


No, it's not a meme, but it's important to split this into a bimodal thing.

For keyword-based search (think index and crawlers, e.g. 'motorized+camera+focus -autofocus'), Google has objectively gotten worse since about 2016 (my personal theory is that it started when Giannandrea headed Search and ML began to be used in the search team).

Search for the general population, which is comfortable with natural-language queries, i.e. 'what is that film where the guy keeps living the same day over and over?', has probably improved and is likely making Google a lot of money, which is why keyword-based search is degrading.

Also, I think Search currently puts too much emphasis on 'freshness' for technical queries. Most of the time I am looking for prior art on a particular topic, and it is absolutely fine to show me a blog posted in 2004 by some geek in Slovakia.

The noise you hear online is that most technical people rely on keywords and boolean operators to find what they are looking for, and so are suffering the most from the business shift in Search.
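To make the distinction concrete, the strict keyword style above ('motorized+camera+focus -autofocus') can be sketched as a toy boolean matcher. This is only an illustration of what keyword users expect, not a claim about how Google actually implements it:

```python
# Toy boolean keyword matcher: every plain term must appear in the document,
# every '-term' must be absent. Purely illustrative.
def matches(query: str, document: str) -> bool:
    words = set(document.lower().split())
    for term in query.replace("+", " ").split():
        if term.startswith("-"):
            if term[1:].lower() in words:
                return False  # excluded term is present
        elif term.lower() not in words:
            return False      # required term is missing
    return True

docs = [
    "DIY motorized camera focus rig with stepper motor",
    "Review of a camera with fast autofocus and motorized zoom",
]
hits = [d for d in docs if matches("motorized+camera+focus -autofocus", d)]
# Only the first document matches: it has all three terms and no 'autofocus'.
```

The point is that this behavior is deterministic and user-controllable, which is exactly what degrades when a ranker starts "helpfully" relaxing your operators.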

Also, not affiliated but definitely supportive: I wholeheartedly agree with other commenters, Kagi kicks ass for technical queries.


I think the key is appending the word 'reddit' to the end of every search. Come to think of it, this feels like the precursor to ChatGPT, since ChatGPT often feels like it responds with a Reddit comment that has been awarded and voted to the top anyway.


Right, but the value in "site:reddit.com" is the human nature of the content: we trust that a human wrote the Reddit comment in good faith, and we trust that it was upvoted by people who are knowledgeable in that area of expertise. I fear we are losing that important element with ChatGPT; or worse, it will create a feedback loop where Reddit comments are "enhanced" with ChatGPT snippets and fed back into the system, and we're back at square one trying to find a way to cut through the noise.


Adding site:reddit.com to searches improves the chances you'll get a result that is not totally wrong or spam, because Reddit has a built-in moderation system (upvotes/downvotes). The community also has little tolerance for SEO clones and other junk sites.

It's not perfect -- there are definitely gamed topics and bullshitters -- but it's light years better than default Google. I don't think I've ever seen a clone of StackOverflow appear on an upvoted reddit post or comment while I see those sites every day on Google.

I agree with you on the risks of that feedback loop. I'm hopeful that humans will still be around to moderate information on forums.


> Use any of the competing search engines for even a day and you'll come running back to Google.

I haven't. I switched to ddg years ago and I haven't gone back since. I basically forgot that google search exists and that people still use it.


Since we’re sharing anecdata, here’s mine: Kagi delivers better results than Google for my daily programming-related queries and I haven’t looked back since switching 6 months ago.


No. Generative AI at least won't. ChatGPT is Borges' Library of Babel: it produces any reasonable-sounding sentence in existence if you ask it to. Of course useful information is by definition somewhere in there, but so is everything else. If your problem is that too much crap is on the internet, well, that crap is in ChatGPT, and then some.

Even if you only get fiction 20% of the time that's still absolute poison for a search engine that sounds completely authoritative. It's like having sand on your toothbrush.

The solution to the abundance of bad info on the internet is two things. First, rudimentary search engine skills, i.e. filtering by source, keyword search, and some basic literacy, because all the good sources are still there and growing. Second, trusted sources curated by people you have confidence in to produce high-quality output.

The discourse increasingly reminds me of Alexa/"smart" home assistants which people thought would replace everything because the natural language interaction seems futuristic.


This isn't a solution to the search problem, but a good option would be the ability to search by original index date. Google and DDG have operators for searching by date, but they are limited and seem to look for dates declared in the indexed documents, not an internal date. One advantage of searching by original index date is that it cannot be manipulated retroactively by SEOs, so searches can be crafted to get results from before the time SEOs started manipulating the topic you are interested in.
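The idea can be sketched in a few lines: the engine records the date it *first* crawled a page and filters on that, ignoring any date the page declares about itself (which SEO can rewrite). The URLs and dates below are made up for illustration:

```python
# Hypothetical sketch: filter results by the engine's own first-crawl date,
# a field the page author cannot edit retroactively.
from datetime import date

index = [
    {"url": "old-blog.example/focus-rig", "first_indexed": date(2004, 3, 1),
     "text": "motorized camera focus rig build log"},
    {"url": "seo-farm.example/best-rigs", "first_indexed": date(2023, 1, 2),
     "text": "10 best motorized camera focus rigs ranked"},
]

def search_before(term: str, cutoff: date) -> list[str]:
    """Return URLs containing `term` that were first indexed before `cutoff`."""
    return [e["url"] for e in index
            if term in e["text"] and e["first_indexed"] < cutoff]
```

With a pre-2010 cutoff, `search_before("focus", date(2010, 1, 1))` returns only the 2004 page, cutting off anything published after SEO moved into the topic.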


First, the LLM needs to know what it doesn't know, and it looks like, aside from the explicit limitations embedded by the developers, it isn't capable of that. When I asked ChatGPT about its own model variants, it gave me wild answers: things that sounded plausible but didn't exist.

"""

There are several variants of GPT-3 model, each fine-tuned for specific tasks or industries. Here are a few examples:

- GPT-3 "davinci" is fine-tuned for creative and imaginative writing tasks, such as poetry, short stories, and song lyrics.

- GPT-3 "curie" is fine-tuned for conversational and dialog generation.

- GPT-3 "babbage" is fine-tuned for code generation and language understanding.

- GPT-3 "einstein" is fine-tuned for question answering tasks.

- GPT-3 "jules" is fine-tuned for summarization.

- GPT-3 "parliament" is fine-tuned for legal text generation.

- GPT-3 "bronte" is fine-tuned for creative writing tasks like storytelling and fiction-writing.

"""

I'd almost assume "parliament" is a leak of some sort if any of the other examples (except for davinci) were accurate.


Another possibility is that these code names do exist internally within OpenAI and ChatGPT is aware of them.

Ah, I guess that's what you meant by leak. I think some of the others are accurate too; babbage in particular is very accurate!


AI is not the solution. The root issue here is Goodhart's law. SEO marketers are hacking rankings with mediocre content; this lowers the return on time invested in generating valuable content, so writers and creators divert their attention elsewhere. Kagi.ai is a good example of how Google could be fixed.

Generative models don't know what they don't know. They will always struggle on new topics, and worse still, they won't give the user feedback to adjust their query; they will simply hallucinate details.


I think the next major leap will be when it accurately recommends real world books, videos, excerpts, papers, and products that naturally flow from your interaction.


Anecdotally, I've been using ChatGPT to do the legwork of copying-and-pasting from Stack Overflow, and it's honestly really quite good. I've asked it to write maybe a dozen programs, even asked it why a particular line of code won't compile, and it does a really good job of explaining how things work.


Google search from 2010-2015 was a great tool. Google search today is a sad shadow of its former self.


Very interesting! And the bear-themed solution is cute! I would love to use such a tool to summarize and to ELI5 things like employment contracts or fine prints on a bank statement on the web. This tool might save folks a lot of time and energy.


I think this line of reasoning is probably correct: these language models aren't intelligence, they're just the next iteration on information lookup systems.


I totally read that as "Is AI a solution in search of a problem?" at first, but I guess that summarises the article pretty well too.


Radioactivity was also in search of a problem when we found it.


Which led to trailblazing innovations like radioactive toothpaste ;)

https://en.wikipedia.org/wiki/Doramad_Radioactive_Toothpaste


This solves the problem of pages full of ads, keywords, and irrelevant content. It doesn't solve the problem that's discussed in the article. There are other solutions, usually alternative search engines (Kagi[0], or, if more AI = more better, the Kagi Labs contextual answers with sources[1]; not sponsored, I just like using them. A similar service is offered by Hey.com, but I didn't try it much).

[0]: https://kagi.com

[1]: https://labs.kagi.com/ai/contextai


Using OpenAI to research and learn faster online


How reliable is the information that it returns?


Sometimes when I ask it if it's sure, it completely changes its mind. You have to really scrutinize the response. Which is to say: it's not very reliable.


From my experience, it's like the wiki: a good place to start your research path, but not the best place to end it.


Ads will not go away. Wait until the free-tier model is fine-tuned to promote certain products in its output.



