Hacker News

Google is still about sources. That deserves respect. If this new wave of deep learning AI means sources are hard to find, that's unfair and dangerous, because it becomes hard to validate and correct information. That's the downfall of using deep learning as a shortcut around the symbolic AI approach. A good AI search result should look like an annotated research paper, or a well curated Wikipedia article.


Google is about advertising. And SEO. The "sources" are generally what their algorithm considers to be 'good sources', but we have no indication that they have any validity. It remains to be seen whether AIs can somehow validate whether a source makes sense. "Reputable" at this point is irrelevant for anything except breaking news. Even in academia, citations don't mean much other than "this person knew how to promote their work well". We live in an era of too many citations/references, and that has been detrimental. If we are going to be using machine learning from now on instead of 'reputation', that is a good thing. At least we can know transparently what objectives the ML system is optimizing.


Yes, and ultimately a search engine has to be a research tool, to find the best links. If there's a simple answer, it should be right there. That's what snippets were about, though they need development.

Once the search prompt becomes the end point for answers, and users never follow reference links, we're on to something else, something completely different. What happens to the web when that happens? Not many people are talking about that.

Some day I'll happily walk away from Google, but I respect that their model, through schema.org and being based on the open web, creates generally useful metadata. Sure, it's open to being gamed, which has had drastic negative effects, but that's not completely their fault; there's no easy answer to any of this.


A source is a witness. A witness to an experiment or an event. That's the truth we need to get back to.


But basically, references should be true.


> Google is still about sources. That deserves respect.

This isn’t true. They are well-known for taking information from the crawled pages and displaying it directly on the results page so people stop visiting the sources altogether.

This will only accelerate with LLM “search engines”, but let’s not pretend Google’s above doing the same thing. They happily do this themselves.

How Google eats a business whole:

https://theoutline.com/post/1399/how-google-ate-celebritynet...


100% agree.

I have a content website with some tables that I compiled from carefully scouring forums, taking my own measurements, and from people emailing me new info.

Well, Google just displays my tables in the search results, so no one needs to visit my site. This may be great for a user, but it's a real disincentive to create content if Google is just going to scrape it and display it.

Last time I looked, my site ranked #1 for the searches above, but a user would have to scroll through 3 screens worth of whatever else is on a search results page now to get to my link.


I think the proposition behind that feature is fair. Some snippets are also based on metadata people intentionally add to their sites to be used this way. Source content is featured, and people can choose to follow it. Everyone wins. In fact you pretty much have to follow the source, because by now we know the answer is not as simple or correct as it seems. I'm not sure how that translates into a completely synthetic summary that appears to be authoritative.

Of course, it would be suicidal for Microsoft to just provide answers; they also want the advertising revenue. But do they expect it to come from the source pages, or from something else, which adds its own distortion? Another dimension to examine.


Am I wrong to say that Google will cite the information so you can go to the original site?


While I get where you are coming from, when observing people around me, they just want a query answered and don't care. We will see; I don't think the answer is a well curated Wikipedia article, because that's just too much information. The AI use cases so far assume you have a question and want an answer. The rest, I think, can be served by good old search, which will still exist.

All the benefits of deep learning (natural language understanding, context, engaging answers) make this ergonomically possible, and they can't be replicated with symbolic AI. Symbolic AI is frankly irrelevant to the discussion right now and will, at best, imho, only play some part in the overall process. If attributable sources are important, I think it's far likelier that a deep learning approach will be used. But I think for most queries answered by an AI this won't matter, since they will be trivial, and I expect that for the rest you'll switch back to normal search behaviour to be able to trust your source.


In a larger sense, this is true, for now. But it's not a reasonable shortcut to supply information that is presented as correct but is in fact partially made up. Either there will be a well deserved public backlash, or in a pretty real way, reality will be irrecoverably warped.

It's pretty hard to imagine any answer that is useful, no matter how trivial, if it's wrong 20% of the time. Deep Learning is an amazing parlour trick that can be useful in the right circumstances, which basically means an expert has to be present to interpret its results.

I'm not convinced having a fully padded conversation is what people want. I find it annoying. Google was on the right path with snippets; they just need to make them easier to consume and emphasize that the search can be continued. I think Google is being very conservative, partially for safety reasons, but also because they won't take chances if they don't have to.

Of course, all these searches would be much more useful if they had more intimate knowledge of the searcher, but that's another dangerous path.


> It's pretty hard to imagine any answer that is useful, no matter how trivial, if it's wrong 20% of the time

But that assumes the error is independent of the search term, and that's not my experience. Of course you can't trust it 100%, but I think it's pretty good for easy searches, to a degree that I would trust it to be correct. You have to get used to the tool. If the accuracy is good enough on easy search terms for people to get some use out of it, then we have a product, I think. Of course it's probabilistic, but even Wikipedia is wrong sometimes. These things won't always be correct; it's always an approximation, but that's also a reality of life. Information is inherently unreliable, and so are our tools.

Just view those tools as you would view humans. They help, they can be wrong, and if you need confidence you have to get some sources. At least that's how I would view them.

Whether this all works out is another question. But it's an approach different from search as it exists, and it may complement it. People thought Siri would revolutionise everything, and then it didn't.


I agree with what you are saying, but any time it's presented as more than a research tool, maybe useful for the most trivial answers, it's a problem. It should be emphasized that it is based on sources and helps scour and fake-summarize them, but despite Microsoft calling it a "copilot for the web," that's not how it's being received. It would simply be better if every statement it made were referenced.


Google is about sources?

Google just delivers the most popular sites. Not sure what you mean when you say it's about sources.


You can see where any result Google gives you has come from. Ask it for the weather today and it'll say at the bottom that it's sourced from weather.com.

ChatGPT and the like are completely opaque.


In case you missed the demo, the Bing GPT adds references to its results.


And how did it pick the facts out of these sources and compile it into a conclusion? That is a black box you will simply have to either trust or second guess everything it replies with. The first choice is naivety and the second choice is arguably a poor product.

A more honest "knowledge engine" would be one that did not draw conclusions but simply replied with the links relevant to your question, letting you judge the quality and draw your own conclusions to not have the engine carry that burden itself.

And then you are back to a traditional search engine again.
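The "links-only knowledge engine" described above is essentially classic retrieval. A minimal sketch of the idea (purely illustrative, hypothetical names, naive term-overlap scoring, nothing like a real engine's ranking):

```python
def rank_links(query, pages):
    """Return URLs ranked by naive term overlap with the query.

    pages: dict mapping URL -> page text. No conclusions are drawn;
    the caller judges the sources themselves. Real engines add link
    analysis, freshness, spam signals, etc.
    """
    q_terms = set(query.lower().split())
    scored = []
    for url, text in pages.items():
        overlap = len(q_terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, url))
    # Highest overlap first; ties broken by URL.
    return [url for overlap, url in sorted(scored, reverse=True)]

pages = {
    "https://example.com/weather": "today weather forecast rain",
    "https://example.com/football": "barcelona win league result",
}
print(rank_links("barcelona latest result", pages))
```

The point of the sketch is that this design leaves the burden of judgment with the reader, which is exactly what makes it a traditional search engine rather than an answer engine.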


> would be one that did not draw conclusions but simply replied with the links relevant

That's neglecting the fact that humans lie and are opaque as well. You never know when a "reputable" person will stop being reputable. How ML vets its sources is going to be the relevant question going forward. While ML models are black boxes, their objectives and measures are not, and the companies should make them transparent. After all, it would only increase trust in their systems.


Yes, but there's not always a direct line; it implies it has an answer when it doesn't. The whole batteries-in-the-ocean thing is what happens.


The Google-accessible web has become a small set of pages hyper-optimized to serve the maximum number of ads and meet arbitrary Google metrics. Many of those metrics are user-hostile, such as rewarding long visits, which leads to burying information deep in a page; they were added in an attempt to limit spam.

The "sources" for Google have become a competitor between spammers.

I fear what may happen once the world starts trying to game AI results towards their own commercial interests.

Spammers ruined Google, at least for my use cases. What will spammers do to AI?


It's a source of spam and plagiarized content


Are you talking about Google or OpenAI?


Google. I'm done with Google search and waiting for some new competition. My results are filled with spam and cloned fake sites. I barely use Google any more


What do you use?



In the context of LLM AIs vs. traditional search engines, I tried this one. The results were...hilarious to say the least.

> what is fc barcelona's latest result

> FC Barcelona's latest result is a 2-1 win against Real Betis in La Liga on February 5th, 2023

Half truths, whole lies. Barcelona's last game was indeed on Feb 5, but against Sevilla, with a 3-0 scoreline [1]. The 2-1 win against Betis was on Feb 1 [2].

In fairness, this escaped me at first; it's pretty close to the truth. What tipped me off was my next question:

> who scored the goals

> FC Barcelona's goalscorers in their 2-1 win against Real Betis on February 5th, 2023 were Ángela Sosa, Rinsola Babajide and Natalia Montilla.

Bro, I've no idea who those people are and they certainly aren't FCB first team members.

(Unfortunately, ChatGPT is at capacity atm so I can't replicate the same experiment. Maybe Perplexity is just a poorly-written/trained bot.)

In contrast, I just have to type "fc barcelona" in Google to get their most recent results accurately; then click on those results to see who scored the goals, accurately once more. It even helps correlate the actual game date to the confusing "matchday" league taxonomy.

My point being, LLMs have been impressive in their current iteration, but I don't think they will replace traditional search anytime soon. Maybe they will eat at the market share, but I don't think this is an existential threat to Google. Even the fact that ChatGPT is at capacity at the moment says something about the real-world viability of LLMs as search engines. 2012 called and is asking if you are web-scale.

[1] https://www.eurosport.com/football/liga/2022-2023/live-fc-ba...

[2] https://www.eurosport.com/football/liga/2022-2023/live-real-...


Interesting! Looks really good at a glance. I wish it were open source. That's when someone like me would be happy with progress.



