The problem with ChatGPT as a replacement for Google is that it was not designed to produce accurate facts, and it shows. This model cut its teeth writing articles about the discovery of unicorns in the Andes[0] for goodness sake! It's a language model, and a very impressive one at that, but language is used to express falsehoods and fiction just as regularly as it is used to express truth.
This doesn't mean that it can't produce accurate facts; most of the time it does! But when it does produce nonsense, it does it in exactly the same tone of authority, so if you don't already know the answer you may well walk away believing an AI hallucination.
And the trouble is it doesn't really matter if everyone here thinks "well, I would follow up each request with research to verify the answer", because most people won't! This is like the Google answer extracts, which fairly frequently mislead by extracting out-of-context quotes, except that there's no way to get the original context and there may in fact be no original context! This makes follow-up research much more complicated than with Google and therefore unlikely to happen. If ChatGPT replaces Google, the amount of nonsense on the internet will get even worse, which is something that until 2022 I never thought was possible.
I just had a discussion with ChatGPT about intuitionistic logic, and it gave answers that were subtly wrong by my understanding of the subject. I have an MA in math, but I couldn't be sure whether my impression of its wrongness was correct until I asked it for a reference and it gave a nonexistent book by Per Martin-Lof, "Intuitionistic Logic", along with a list of actual books by him. It's an impressive and scary bullshit generator at this point.
And if someone can find a reference to Intuitionistic Logic by Per Martin-Lof, Oxford University Press, 1982, well, I'm wrong. But I went through Wikipedia, Google, Worldcat and Google Books.
No, you are probably right. It has no guardrails for generating a true citation. When you ask it for citations, its programming generates the form of a citation. As the authors of the article discuss, it has no model of reality to understand that citations are not just a form of words and symbols put together.
> As the authors of the article discuss, it has no model of reality to understand that citations are not just a form of words and symbols put together.
Which is the exact same issue that plagues it for the answer it’s generating a citation for.
I believe that it can be trained to improve on this, and I believe the problem can be nearly solved, but I don't believe it's fully solvable.
It can be trained to get facts mostly right using old-school academic tests and adversarial/supervised training, like an elementary school child. Today, it (supposedly?) punts math and some other logic to a separate function, and just compiles the prompt. That can work for validating citations and "known facts" in many cases.
But again, it can’t “think” and understand when it returns something illogical (regardless of fact correctness).
I honestly doubt very much that a reinforcement learning based approach can scale to the breadth of facts that such a model would need to know about. Is there any reason that using a rewards-based approach to cover one set of facts would in any way generalize to a completely different area of expertise (say, rewards for true facts in mathematics generalizing to questions about literary criticism, or new world cultures), or would you have to include (manually curated) rewards for all areas?
Indeed, and it gave several other actual citations with seemingly correct summaries.
And it's definitely less prone to obvious bs than just a few weeks ago. For a lot of general things, it gives a summary paragraph that seems like a distillation of standard information - the main problems seem to appear when these "scripts" lack consistency.
Edit: Once it "learns" to get citations right, you'll have to dig into those to see if it's right. It's easy to see that arms not looking good.
> Once it "learns" to get citations right, you'll have to dig into those to see if it's right. It's easy to see that arms not looking good.
But that's fundamentally not possible for the model as is, not in the sense you would normally think of citations at least. If it were to give an honest answer to "why did you say that Obama was born in Chicago", the model would generate training-set examples with propositions like Obama was born in ____, ___ was born in Chicago, ____ was ____ in ____ and so on - because that is how it learns.
Instead, to generate what you would actually want as citations, it would have to have a way to look up the statement it just produced in a corpus of "citeable materials" and find the closest match, based on some entirely different ML model most likely (presumably something similar to Google's search algos).
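A minimal sketch of what that second lookup step could look like, assuming the sentence-transformers package; the model name and the toy corpus are just placeholders:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

# Stand-in for a corpus of "citeable materials" (in reality, millions of docs).
corpus = [
    "Barack Obama was born in Honolulu, Hawaii, in 1961.",
    "Michelle Obama was born and raised in Chicago, Illinois.",
    "Barack Obama worked as a community organizer in Chicago.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True)

claim = "Obama was born in Chicago."  # the statement the LLM just produced
scores = util.cos_sim(model.encode(claim, convert_to_tensor=True), corpus_emb)[0]

best = int(scores.argmax())
print(f"Closest citeable passage ({float(scores[best]):.2f}): {corpus[best]}")
```

Note that this retrieval step is an entirely separate system bolted on after the fact; nothing about the language model's own generation process is involved in it.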
ChatGPT in its current form can't replace Google, but it's plausible to think that in its next few incarnations it will get there. Why can't it have a mode where it returns more rigorous results, and includes some system of links as citations? It can include footnotes for various passages of its answer to indicate ambiguity, varying levels of confidence, etc.
For the same reason why it might produce a bogus answer: it's a language model, not a general AI. Prompt it for footnotes and it will generate footnotes for books that don't exist or that don't actually contain the citation. They'll match the form of a footnote, because that's what the model was trained to do, but the citations won't resolve.
For example, I asked who the first president of Argentina was. It gave the wrong name (Juan Manuel de Rosas) with a half-true bio. I then asked it for books where I can learn more about this person.
It produced five books. None of the book-author pairings were correct, though three of the books were real and four of the authors actually wrote books. Of the four authors that existed, three actually did write about Argentina's history, while one is a professor of Spanish and gender studies.
This is my point: ChatGPT was trained to produce text, not to produce facts. What I'm seeing here isn't just something that can be optimized away, it's fundamental to the model's design. It was basically told to produce text that looks like it could have been written on the internet. Its text is really convincing: when I saw the list of books I thought for a minute it had actually done it! But it's still just inventing text that reads well, because that's what its training was optimizing for.
It could be paired with a different system that searches for facts, but then it's just a friendly abstraction layer on top of Google or some other system, not a revolution in search.
That makes sense. The only thing about GPT that impressed me was that when I asked it to condense an answer into one sentence, it was able to do it.
Other than that it betrayed no understanding of the material. It rather resembled a high school kid who had read a few too many books and spouted platitudes about any and all subjects without being aware of what was really going on.
Google Search gives you links to content that might have the answer (excluding their own box). You can see the differences between the authors of said content.
ChatGPT gives you content that looks like the answer. You can't see the authors of the generated content.
I agree that a hybrid could work, although ChatGPT struggles to keep context with more than a dozen paragraphs, so I doubt it could summarize millions of results.
And in either case, you're not solving the problem that commenters here are trying to solve: Google's bad results will still affect your search, so ChatGPT will be fed largely by dubious blogspam.
Even in a small number of paragraphs, language models hallucinate. Take Barack Obama: it's not hard for me to trick it into saying he was born in Chicago. He was not. But there's a ton of text about Barack Obama in Chicago. Plus, Michelle was born there. When predicting the most likely next token for "Barack Obama was born in ______" there's a lot of confounding text.
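You can watch that confounding happen directly in an open model; here's a rough sketch with GPT-2 standing in for ChatGPT (whose weights aren't public):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("Barack Obama was born in", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(input_ids).logits[0, -1]  # scores over the next token

# The top candidates compete on corpus statistics alone; nothing in the
# model marks one of these place names as the factually correct one.
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item()):>12}  {p:.3f}")
```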
No, there’s a fundamental issue in that statistical language models don’t start with facts to form sentences, they form sentences based on previous tokens and statistical weights. The weights are not auditable or easily updatable.
Though for that matter a ChatGPT based product would be even more vulnerable to the same thing that made Google search suck: content sources gaming its process
This was my immediate thought as well. I don’t need millions of searches done, I need 20, then a language model to look through them and collect text that helps me find what I’m looking for.
The lucky thing for Google, if they get it working, is that they are half there already. As much as I despise Google, I have to admit GPT won't be their death.
> ChatGPT in its current form can't replace Google, but it's plausible to think that in its next few incarnations it will get there.
I don’t think it is.
> Why can't it have a mode where it returns more rigorous results, and includes some system of links as citations?
Because it is a statistical text-generation model; connection of results to specific sources is fundamentally not part of the way it works, so it can’t just be asked to supply them. You’d need a fundamentally different approach to do that.
Because to make it focus on always producing accurate results (or rather a very low false positive rate) you need to dramatically change and tune the model. Which is much harder than what Google does because it means you'd also need to vet a lot of facts as well (by weighting certain sources, not individually). Or allowing it to search the internet (which also comes with other issues and if it is using Google to do that then it doesn't really kill Google). I do think Google's knowledge graph is more in line with what you're actually thinking. Lots of Google searches can be done without actually clicking links.
There's also the ethical question. For example, DuckDuckGo will show you Stack Overflow answers. Google could easily do this, but they have already gotten flak for displaying answers and taking away clicks from sites, which means reduced ad revenue for the resource actually being used. We could make even stronger arguments about services like ChatGPT in their current form, let alone the one we're talking about.
If we think a bit further in the future there's another extension that might not be what we want. Suppose our model contains all current human knowledge and can accurately give you answers and even produce paragraphs at will. This kills off a lot of websites. Is this a good thing? Suppose we incorporate an image generator along similar lines and now we have text and image search. That's a huge centralization force. There are a lot of great benefits to this that I don't want to downplay, but we should also think about if this is actually what we want because there are downsides too.
I thought I should also add that I had a lengthy discussion of these LLMs being Stochastic Parrots (and how we know they are) in a thread from 2 days ago: https://news.ycombinator.com/item?id=34113011
What we want is not what we get unless it makes bank. Nobody (generalizing here) works to better humanity; they work to get money. If it aligns well, that's just sheer luck. The world needs a lot more people who want to see the world succeed.
I think this is a highly pessimistic view. While I know many that are profit motivated and I won't argue that this isn't the majority, I know many people that just want to pursue knowledge for the sake of knowledge. I'll tell you that personally my dream is to just be able to research and not worry about constraints like publishing (I'd rather open blog) or computational constraints (don't need a data center. A few dozen A100s is more than enough). There are a lot of people like me and a lot that would be like that if they didn't have financial constraints. Having this dream is a privilege and even I didn't have it when I was living paycheck to paycheck.
So I think if you want to see more people with my dream, then push towards a post-scarcity world. We'll still have plenty of resource constraints, but it does change the game a lot. I think you'll see more artists as well. A new renaissance. But I fear the transition period might be quite difficult and disruptive. I do think such a future has a fairly good probability of happening within many of our lifetimes, and it is almost guaranteed if we actively push for it (or solve adjacent problems like HLI).
So don't be so pessimistic. There's millions of people out there that are interested in pushing the bounds of human knowledge because that's what they want. I'd argue that physicists and mathematicians wouldn't exist if this weren't true. They get paid shit for extremely difficult work. We could say the same about many sciences. While not all of these people are in it for bettering humanity (just for pursuing knowledge) it often aligns and there is a decent portion who believe this is the point of science and our purpose, as humans, in life.
While I see what you are saying, and there are people doing the good work, when I read of companies circumventing the Russian embargo knowing full well their stuff gets turned into weapons, and of people thoughtlessly buying the cheapest shit from Amazon and Alibaba made in dire circumstances, I am not so sure; or at least, I do not see enough awareness for a threshold-changing outcome. Some play the economic game on a criminal level, and some profit without giving back and without scruples.
Contrary to all the naysayers, I think a future version of GPT will be trained to evaluate its own certainty and express this verbally. Probably through an extra training channel.
It has no concept of certainty, unless it was embedded in the original data. What you are describing is a more general AI. This is a language model. I like to think of it as a kind of cognitive intelligence: the ability to process input and compare it against a stunning array of training data that has many parallels to reality but no concept of it.
It can be trained toward some form of truthfulness, and a measure of certainty is intrinsically part of the model (compare with image-to-text models etc.: "70% cheetah, 10% dog").
In any case, it can be trained to assert confidence via a side channel.
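To illustrate what that intrinsic measure is in a classifier: it's just a softmax over the raw output scores, something like the following (all numbers made up):

```python
import numpy as np

labels = ["cheetah", "leopard", "dog"]
logits = np.array([1.95, 0.69, 0.0])  # hypothetical raw classifier outputs

probs = np.exp(logits) / np.exp(logits).sum()  # softmax
for label, p in zip(labels, probs):
    print(f"{label}: {p:.0%}")  # roughly "70% cheetah, 20% leopard, 10% dog"
```

The caveat, of course, is that this measures how confident the model is in its own terms, not how likely the answer is to be true of the world.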
This is the mental leap folks need to be making, the question is when will this type of configurably factual large language model be research benchmarked and productionized.
I have read many comments similar to the grandparent, all saying the same thing, "I tried it, and it turned up wrong facts", all the while forgetting that they were deeply engaged with the content it was producing. Just think of how much our attention is monetized already; money will be pouring into this space from here on. It's only a matter of time.
My argument isn't that I tried it and it didn't work, my argument is that an LLM alone fundamentally cannot work for reliable information retrieval.
Producing hallucinations or not isn't just a setting you can tweak in the model. The entire function of GPT-3 is to guess the next word from the words that came before. It's a hallucination machine that has been trained on so much real-world data and has such a large parameter set that for commonly known and discussed information it does a remarkably good job at creating factual sentences. But as soon as you get out into territory that it doesn't have memorized, it will do the next best thing and produce credible-sounding new material that may or may not be complete nonsense. Again, this isn't a setting you can tune, it's just what a transformer does. It models human language, not all of human cognition.
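To make that concrete, the whole generation procedure is a loop like the following sketch (GPT-2 here as a stand-in, greedy decoding for simplicity). Note there is no step anywhere in the loop that consults anything but token statistics:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer("The first president of Argentina was", return_tensors="pt").input_ids
for _ in range(20):  # extend by 20 tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # scores for the next token only
    next_id = torch.argmax(logits).view(1, 1)  # greedy: most plausible token
    ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))  # fluent either way, true or not
```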
ChatGPT isn't trained on next-word prediction alone; it was trained on human feedback. Evaluation is still next-word prediction, but the model has evolved to give "correct" answers when evaluated that way. So it's not just the most likely answer anymore.
Also, since we can watch the model evaluation and see all its weights, there’s research that shows you can determine if any answer is a “retrieval” of existing knowledge or a hallucination.
Devil's advocate, but Google wasn't designed to make sure the links it aggregates at the top are accurate / aren't just blog spam designed to cater to specialized SEO tricks.
Actually, it was designed for that, and it was a top-line OKR for a long time. Amit Singhal rewrote the search engine and was a great leader who cared a lot about relevancy, until he was kicked out for harassment (deserved, as far as I know, but at the same time Google lost an amazing engineer).
In my experience the deterioration came after the change in leadership.
It is important to recontextualize harassers. They are typically parasites on the organization, destroying others by stealing their work and eliminating others from the workforce.
Undoubtedly he came off as clever, but being a harasser means that the damage he did outweighed his achievements.
"he's a harasser => he must be a parasite on the organization"
because then you end up with
"he's provably an excellent worker and hugely productive => he can't be a harasser, you're mistaken/lying"
Some people are really smart and productive and a credit to their profession and also awful to women and/or minorities, and you'd never know if you didn't witness/experience it.
Recontextualizing harassment as stealing someone's work is not just inaccurate, it's degrading to harassment victims, and it refuses to accept that somebody can be a bad person in some situations (at an offsite when he's drunk) and an amazing engineer at the same time when he's sober at the workplace.
Harassment is harassment, stealing someone's work is stealing someone's work; let's leave changing the meaning of words to politicians.
I think it's a big mistake to assume that if someone is toxic, they must also be stupid/incompetent/generally useless, all evidence to the contrary. If nothing else, it will make you consistently underestimate some pretty dangerous people. It's important to know your enemy, especially their strengths.
I agree that Google is terrible at its job and produces garbage most of the time. That's the only reason why an LLM chatbot is seen by anyone as a viable replacement.
But at the very least Google always gives you the source, which allows you to read it for yourself and judge its trustworthiness. No such option for ChatGPT.
I can't count the number of times I've clicked through to a link expecting to find more context than the excerpt on the search page and that content isn't there. Instead I get trapped in a sea of circular SEO spam that all promises to give me the answer I originally searched for but ends up leading me nowhere except to other article summaries with content I didn't even want to read about. Where's the source?
A good chunk of the remainder of the times I click through - especially on work-related stuff - it turns out the page is some kind of cribbed content from GitHub, StackOverflow and so forth, but mangled together and reskinned with different randomly-generated usernames making the "comments", so it seems like it's a different site, and you can only really tell by picking a specific sentence that appears unique and then manually searching for that same sentence on several different sites. No clear source there either.
Web search is broken, and it has been for several years now.
> I can't count the number of times I've clicked through to a link expecting to find more context than the excerpt on the search page and that content isn't there.
Excellent: this quite quickly tells you that the search result was wrong. So you quickly got very relevant context and can tell at a glance that there is no source. You don't get any such context when GPT provides an invalid answer.
> Google is terrible at its job and produces garbage most of the time
Are we using the same search engine? I use DuckDuckGo most of the time but Google is hardly terrible, even if it could be improved.
I find what I’m looking for in the top results 90-95% of the time. The other 5-10% also contains times where there simply isn’t a good link that solves what I’m looking for even after digging hard.
DDG is much more straightforward, whereas Google often tries too hard and fails.
It's like how Twitter's default feed changed to try to tell you what you wanted to see, while the old chronological feed was the content you originally asked to follow. The simpler feed was always better and less contentious for non-newbie/casual users.
Google is better at deciphering what you want, but DDG is straight up: here's the simplest results based on your query.
Maybe the solution is both? An evolution of the ML approach that Google uses (à la ChatGPT-style mimicking of a human) AND a tab, or box, or whatever interface, with 'simple' straightforward old-school search results. Until the former is better 99% of the time, they shouldn't abandon the latter.
I was pleased at first when I tried to use ChatGPT as a search engine. I asked something like "give me the top 5 blog posts to learn about C++ metaprogramming" and it actually spit out extremely plausible titles from top people (e.g. Dave Abrahams, Eric Niebler, Herb Sutter, etc), but when I said "no please give me the actual URLs", they were all 404.
If Microsoft somehow is able to implement the information synthesis (e.g. "give me an example of C++ metaprogramming, using Boost Hana") with actual indexed links from sources, that would probably be the one thing I'd see in the near-ish term that could upset the search engine balance of power.
I don't know if this is feasible or not, but having the chat bot also be able to respond with a subset of sources that it based its answer on would be a satisfying combination of both worlds I think.
I would be really curious to see how trying to learn a new subject via a chatbot that is able to also provide source material links would go.
> I don’t know if this is feasible or not, but having the chat bot also be able to respond with a subset of sources that it based its answer on would be a satisfying combination of both worlds I think.
But an LLM doesn't look at particular sources, find relevant ones, and write a response based on them; it is a mathematical summation of the way all of its sources look as sequences of text, and it produces a response that fits its prompt on that basis. Identifying which sources most influenced a response isn't merely something that they don't currently do; it's information that, AFAIK, there is no plausible way to recover from the model.
After I commented I did do a little bit of thinking about the little that I know about this subject and realized that it probably isn't feasible to extract that data. Thanks for the comment because... you are right as far as I know. It is a bit disappointing to realize this though.
For most programming questions at least, the highest placed result links are StackOverflow answers related to your query (if any exist, of course). I would say this alone invalidates your "virtually guaranteed" claim.
I believe this to be a huge deal. In no time a huge part of the internet will be AI-generated and we will pollute our data set for research and training AI with things that are subtly or completely false at times and we won't be able to tell.
It's a lot like I, Robot or other stories in which the robots start to subtly misbehave, like a program you can't debug.
How is this different than regular intelligence? We grow up trained on a polluted dataset, being taught things that are subtly or completely false, and we often struggle to tell which things we have learned are false.
And what was the result of unchecked faulty learning on a corrupt dataset by natural intelligence?
Dark ages where millions died in pointless wars over whose imaginary deity is the true one. A third of Europe dying of preventable plague. It took a rigorous system of citation and a publicly funded system of education to crawl out of the darkness.
Should we try AI-powered dark ages, but with resource shortages and nuclear weapons?
It is true, but then again, we constantly adjust our trust in sources based on what works and what doesn't.
If I watch a YouTube tutorial teaching me how to bake pizza and it comes out totally wrong, and I watch another that produces excellent pizza, I disregard the first channel.
And there are many more ways where humans check the reputation of sources automatically.
Once in a while somebody does some actual on-the-ground research and people are convinced to change their minds through evidence and reasoning. So far AI can't do that, it'll just parrot whatever it hears more often, (apparently) with some degree of randomness.
Maybe it'll learn to adopt a new view, but for now it won't be through actual reasoning.
Well this is unfair because we haven’t given natural intelligence a chance yet.
What if we invented an algebraic notation for search indexing and allowed people to index and classify documents in their own way. Then a search engine could do something special with all that.
Human intelligence cannot progress without a notation to capture our thoughts in.
Truthfully, the internet is already full of crap, a lot of it by design: different interest groups and countries spreading FUD about other countries, and political parties doing the same to their opposition. The web has become a cesspool. The majority of people, I'd personally assume at least 9 out of 10, don't have the capability or don't care enough to do the research to sift fact from fiction in the first place, so this probably won't change anything.
I have been trying to make it work for me, teaching me some Python scripting and creating regexes. If you phrase the question correctly, it outputs something that generally works, even if you might not be able to run it the first time. I fed it two CSV files as text and asked whether it was possible to convert one format to the other, and got a Python script that I used to convert a large number of files. Now, scripting and programming might seem trivial to large parts of the Hacker News audience, but it will be a huge change for people in general.
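The script it gave me was along these lines (reconstructed from memory; the two formats and the column names here are hypothetical):

```python
import csv
import sys
from pathlib import Path

def convert(src: Path, dst: Path) -> None:
    """Rewrite a semicolon-delimited 'date;name;amount' file as a
    comma-delimited file with the columns reordered."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter=";")
        writer = csv.DictWriter(fout, fieldnames=["name", "date", "amount"])
        writer.writeheader()
        for row in reader:
            writer.writerow({key: row[key] for key in writer.fieldnames})

if __name__ == "__main__":
    # Usage: python convert.py input_dir output_dir
    in_dir, out_dir = Path(sys.argv[1]), Path(sys.argv[2])
    out_dir.mkdir(exist_ok=True)
    for path in in_dir.glob("*.csv"):  # batch over a large number of files
        convert(path, out_dir / path.name)
```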
From what I can tell, the AI keeps improving its answers because it is being trained on the millions of responses it gets to the answers it gives.
Google was not designed to provide accurate results either, and it shows. I can find results to confirm every conspiracy theory I can think of. ChatGPT at least denies conspiracy theories.
In search results, you can at least evaluate how trustworthy a document is, based on various factors. This resembles how academia has evolved: you can dig into all the references, see the author's publication history and even their reputation. How would you do this with ChatGPT? We can't even verify the sources of its claims, and it sometimes makes up non-existent references in a very confident tone.
These are early days. Everything is going to move faster than you can keep up. You'll see head-turning headlines on a daily basis.
Next year will see AI begin to disrupt so many businesses you'd thought were absolutely impenetrable. I don't think anything is safe anymore.
By the end of the decade, I think even manufacturing and robotics will be automated. Everyone from McDonalds to Apple and Ford will feel the pressure.
I've got a mind-blowing demo coming out soon that will put even more fields on notice. And that's just my team. There are so many people working on AI-first products now. Incumbents will be overwhelmed and washed away.
(If ever there was a cue for a career change or opportunity to make a whole lot of money, it's this. Don't ignore this one! You're still early enough to take advantage.)
I'm not questioning the AI revolution, I'm questioning the applicability of large generative language models to information retrieval. We need a different kind of model for that, not just a few tweaks here and there to this one.
>If ever there was a cue for a career change or opportunity to make a whole lot of money, it's this. Don't ignore this one! You're still early enough to take advantage.
Any suggestions on how to take advantage?
Dive into ML development?
Learn how to write prompts for AI engines?
Invest in AI companies? Sadly OpenAI is private, but Microsoft is a monetizing partner I guess (Copilot)
> Next year will see AI begin to disrupt so many businesses you'd thought were absolutely impenetrable
Why, because of LLMs in particular? What is the ground-breaking research that will be fueling an AI revolution?
> There are so many people working on AI-first products now. Incumbents will be overwhelmed and washed away.
Incumbents like Google have been doing AI for ages. Microsoft has a very important investment in OpenAI
> By the end of the decade, I think even manufacturing and robotics will be automated. Everyone from McDonalds to Apple and Ford will feel the pressure.
McDonald's, Apple or Ford sell products, not manufacturing services. They will be delighted if manufacturing is automated.
Well, there's one thing at which humans will be much better than robots for the foreseeable future: energy efficiency. Try building a robot that combines a general-purpose AI with the motor functions and senses of a human, and runs on the energy of three meals.
Now of course humans want and need more than just three meals, but that's still pretty good.
But if most of your work consists of interfacing with computers, you might be in trouble.
Also, since humans aren’t slaves, they do their own maintenance. And they don’t do exactly what you ask, which is good (when they do it’s called “working to rule” and is a kind of strike).
It's very rare for automation to cause unemployment, to the point that it has possibly never happened.
> (If ever there was a cue for a career change or opportunity to make a whole lot of money, it's this. Don't ignore this one! You're still early enough to take advantage.)
It seems that at the current velocity of improvement it is only a matter of time, in the very near future, before those problems are mostly worked out. ChatGPT already outperforms Google search in many ways. For example, I was doing some research on Scandinavian history and found the results far superior on ChatGPT, while I had to sift through pages of junk on Google. I took the results with a grain of salt just like I would with Wikipedia and verified each point, and found them to be very accurate. With this I can cut my research time on a problem down from hours to minutes.
I can't wait for a few years from now, when ChatGPT is replaced by something 10^x more powerful, and comments like this look like the nonsense of someone living in a past era, just as we now make fun of comments people made in past eras criticizing technologies in their then-current state without the ability to look ahead. Even though the person you're replying to mentioned ChatGPT, ChatGPT is just the FIRST of many. Imagine somebody saying in 1803, "The car is unusable because you have to keep filling it with coal; it's just not convenient, and the horse will always have a future."
I don't think anyone who has played enough with ChatGPT is championing the notion that ChatGPT itself will replace Google. They are more so steel-manning the argument that Google might not be the best interface for search.
Yes: accruing a large and diverse enough training set where true and false facts are correctly labeled. The GPT models require enormous amounts of text to be trained on, so manually adding true/false labels to every part of every sentence is impossible (and automatically doing it would require exactly the AI we are trying to train).
A revolution in this area would require some entirely new training algorithm that can reach GPT-levels of fidelity while using magnitudes less training material. An even bigger GPT model trained on even more text is not going to do much in approaching a solution to this problem.
Google was not designed to give you accurate facts either, it was designed to calculate popularity of links based on backlinks and then show you the most referenced links first.
> This model cut its teeth writing articles about the discovery of unicorns in the Andes
That was a different model: the unicorn articles came from GPT-2, released in 2019. ChatGPT has the same number of parameters as GPT-3 (released in 2020), but is trained with both human feedback and web data, rather than just web data.
What you're saying still holds up, though -- ChatGPT still hallucinates information.
Yes, the human feedback model is called InstructGPT and has been the official “GPT-3” for some time now. ChatGPT is only somewhat more trained, but unlike the playground, it’s free and has a Markdown renderer.
Get the journals open-sourced and add them as a module, although you'd have to strip out the unreproducible social-science studies; but that could be a good thing.
Wikipedia is designed to be as accurate as possible and is one of the few remaining websites that produce value and content.
Please notice that in the current setup of 'Internet capitalism', the people who produce value and content earn nothing, while the people who harvest that content for profit or engage in outright fraud make fortunes.
Not at a fundamental level. Even if you exclusively trained one on some reasonably authoritative source like Wikipedia it would still happily tell you complete nonsense.
These models have no way to differentiate that elves are fictional while Indians actually exist.
RIP Google. There's a joke that the best place to hide a dead body is on page 2 of Google search results. The fact is GPT has, at bare minimum, parsed the top page of Google and reasoned about the top results to give the best answer, with as much context as to why it reasoned so. GPT is basically a superset of Google with added reasoning. I can't see it giving worse results than Google. Both will produce a bogus answer if you ask a bogus question, but GPT will never produce a bogus answer when Google has the correct one.
I think it's really problematic to use the word "reason" to describe what GPT does. As others have commented in this thread, GPT is a language model, which in no way reasons about anything it writes. It is simply writing whatever fits statistically to the prompt and context. Attributing reasoning or too much intelligence to this could prove quite dangerous.
> As others have commented in this thread, GPT is a language model, which in no way reasons about anything it writes. It is simply writing whatever fits statistically to the prompt and context.
Those people are wrong; statistics are a kind of reasoning (intuition) and if you try it, it obviously does the other kinds of reasoning too. I have all kinds of answers from it that are logically coherent and can’t possibly have appeared on the internet already.
Just because some of its answers are logically coherent due to its following statistical patterns does not mean that the model is reasoning. Reasoning implies some type of discipline which prevents nonsensical arguments (even in uncommon situations).
> but GPT will never produce a bogus answer when google has the correct one.
What are you basing this assumption on? GPT can always generate a bogus answer, because its only notion of whether an answer is good or not is whether your prompt + that answer are statistically similar to text in its training corpus. The more text from the training corpus it resembles the better GPT will think the answer is. For more obscure topics, this means that the chances of it hallucinating based on similar sounding much more popular topics increases greatly.
This way of working also means that facts that appear more in the training corpus have a higher chance of being part of the model's outputs. For example, if the training corpus was not carefully curated to exclude vaccine conspiracy theories, there would be a good chance that the answer GPT gives to "autism is caused by ___" would be "vaccines", as this would likely be the most common answer in its training corpus. In contrast, Google's algorithms have more sophisticated biasing especially for these kinds of sensitive topics to ensure that more high-confidence sources are likelier to be top results, even if they are a minority of all the pages on the internet on that topic.
> GPT can always generate a bogus answer, because its only notion of whether an answer is good or not is whether your prompt + that answer are statistically similar to text in its training corpus.
“Statistically similar” is doing a ton of work even for the original GPT model - just printing the letter “e” is “statistically similar” to English text, if it can even be grammatically correct you should be surprised.
And it’s certainly not true for the last two generations (InstructGPT and ChatGPT) which weren’t trained with that objective at all but used human raters.
What kills it for me (so far, perhaps they can fix it in the future) is that there is no way to know if you've actually asked ChatGPT correctly and it has given you a perfect document with the answer you seek, or if your question was slightly off and it has given a wrong answer, or even if your question was correct but it still confidently gives the wrong answer. It famously asserted numbers like 42 were prime early on, though that seems to have been fixed (although that may have been just through hardcoding).
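For contrast, the check it flubbed is trivial for ordinary code, which is exactly why a hardcoded fix is so unsatisfying:

```python
# Trivial check, outside any language model: 42 is not prime.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n**0.5) + 1))

print(is_prime(42))  # False: 42 = 2 * 3 * 7
```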
> It famously asserted numbers like 42 were prime early on
Oh it tried to convince me that four is not larger than one. :) It went like this:
At first I asked ChatGPT to summarise J.C. Owen's contribution to geometric constraint solving. It gave me an answer which might even be correct; it sounded correct anyway. Then I asked for a worked example of his method on a toy geometric constraint system. It ran into problems with that, chiefly that the system it selected was underconstrained.
When I asked about that, it insisted that the system was well constrained. It even volunteered the info that for a system to be well constrained it needs as many variables as constraint equations. Then it told me that the system it generated has 4 variables and 1 equation.
As a last question I asked "Is four larger than one?" and it resolved the seeming contradiction by telling me that in the context of geometric constraint solving, four is not larger than one. One does learn a new thing every day. :)
Or you have asked correctly (or not; that's hard to tell from the outset) and it has given you a non-factual answer. For example, I asked ChatGPT what would be the best introductory book for a subject (and added some more conditions to make it pretty clear-cut). It recommended a book that sounded reasonable in title, with existing authors in the field… but the book didn't exist, at all.
It also invents research papers that probably should exist given the graph of knowledge that it has. I spent about 15 minutes searching for a non-existent paper (I knew this was probably the case when I started; I kept going by choice), and it turns out the title was a mashup of two other papers in the domain.
Same error a human might make if they read too many papers too quickly. Or went to a film festival and was asked to describe the plot(s) of the movies they watched.
To take your parenthetical further, I'm sure they'll get to that. It was only 5 years ago that searching Google for "how old is the Earth" said 6000 years because only the young Earth creationists put such a huge stake into SEO for their pages.
Google or any search engine provides links to relevant documents, whereas ChatGPT functions as a pure summary. Neither the search engine nor language model summary should be treated as authoritative, but at least a search engine cites its sources.
I wish ChatGPT would give you the score(s) of the results it returns, so you could understand how correct it "thinks" the response is. I wonder if those confidently wrong answers are also low scoring.
These scores would mean nothing! At best it would score "how well does my answer fit my model of the English language". At no point can the current architecture accurately grade "how correct is this answer in the context of the subject domain".
The GPT-3 playground lets you view the probabilities of generated tokens. That's the closest thing to a confidence score you can get from a transformer model. But like the other person said, this isn't very meaningful.
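For what it's worth, you can also pull those token probabilities out programmatically; a sketch against the legacy completions API as it existed in late 2022 (openai 0.x client; the model name and response shape may have changed since):

```python
import math
import openai

openai.api_key = "sk-..."  # your API key

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt="The first president of Argentina was",
    max_tokens=8,
    logprobs=1,  # also return per-token log probabilities
)
lp = resp["choices"][0]["logprobs"]
for token, logprob in zip(lp["tokens"], lp["token_logprobs"]):
    # High probability means "statistically likely here", not "true".
    print(f"{token!r}: p={math.exp(logprob):.2f}")
```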
I had a long requirements chat last night and did just that:
There are many self-hosted solutions that could potentially meet the requirements you have listed for your RSS aggregator and reader project. Here is a list of some popular options, ranked in order of their approximate percentage of compatibility with the features you have specified:
Please note that these percentages are approximate and are intended only as a rough guide. You may want to consider other factors such as the user interface, performance, security, and integration with other tools when choosing a solution for your project.
You can ask it, and it may generate text with a score, but there is no particular reason that the score generated in the text would reflect any internal score it has.
I'd considered that -- in some cases it says it's absolutely confident, like with basic arithmetic, and in others it says:
> Without more information, I cannot estimate the accuracy or reliability of these claims. It is important to note that there may be conflicting or inaccurate information available online, and it is always a good idea to verify information from multiple sources before accepting it as fact.
It tried to convince me that one event can happen before another event even if the time of the event is after the time of the other event. It had a notion that events happen in some order, and that time is ordered, and that events are associated with times, but not that the two orders need to be consistent. And I wasn’t able to convince it otherwise.
The way you describe your interaction is fundamentally flawed.
At no point was it trying to convince you of something.
At no point does it have any clue that "events occur in some order". Or that "time is ordered"...
At no point will you EVER be able to "convince it otherwise".
This is a language model. All it knows is that "in my training set, something said something akin to the two English words: 'ordered events'". It has no model of facts, deduction, or logic, as long as its results are a reasonable approximation of the English language as influenced by your prompt.
Yes, that's the conclusion I drew after a number of interactions. "It tried to convince me" was shorthand for the fact that it persistently argued for it. I don't attribute any actual agency to ChatGPT, of course.
The lack of self-reflection, of logical reasoning ability and associated memory is what makes ChatGPT useless for the use cases I’d actually find interesting.
1. Does Google inform you that you have asked a bad or unproductive question?
2. Does Google guarantee the links it provides you with contain high quality and relevant information?
3. Does Google ensure that the #1 link contains the correct answer?
I think we are forgetting that Google suffers from these problems too. A human who asks bad questions will continue to ask bad questions. AI should in general be trained to do what the human wants. If the human wants something stupid, that's on the human. Google also isn't guaranteed to find the right information you're looking for, especially if it is on an obscure topic or if the query is highly overloaded or being taken over by current trends.
The crucial difference is that the Google search results page, by definition of its function, cites every source.
GPT vomits up word soup. How should you expect to verify every factoid contained? As others have pointed out: if you ask GPT to cite its sources it will very happily, and authoritatively, spit back articles and authors that do not exist.
This is a problem now, but I'm getting tired of these articles seeing a current limitation and acting as if that limitation will be the case forever. They'll iron out some wrinkles over time.
There is a flywheel issue with generative search: content producers, information curators, outlets, etc. need to get paid.
People are okay-ish with Google in part because it drives traffic, and traffic can be profitable. Fancy features that snatch website content and show it in Google's result pages are already not appreciated by indexed websites.
If generative search becomes the dominant interface, we will eventually see severe public-info stagnation until alternative business models can grow around it or avoid it altogether. I suspect we'll see more 'Spotify for X' platforms and a continued shift toward subscription platforms and YouTube.
Of course, what's to stop the bots from watching all of Netflix, listening to every podcast, etc? It will be an interesting decade for law / regulation / licensing.
If people consume content through an AI assistant, then it stands to reason that content will be adapted for this mode of consumption. Content will be designed to have a certain effect on the AI assistant. Answer Engine Optimization. Articles full of likely prompts, each followed by the desired answer.
Why just public info stagnation? It will be dwarfed by believable content generated by swarms of internet bots.
Once a Stable Diffusion-like open-source GPT-3 appears, it will be used to create fake news all the time. Imagine 70 different outlets, all with relatively large clout on social networks, suddenly announcing that Elon Musk is running for president. Never mind that he wasn't born in the USA… somehow, mysteriously, none of them address it.
Yes it is here and it’s already being spread but that is just a tiny sliver of what will happen in the next 5 years: https://youtu.be/LSlv4AsChwg
Such stuff can be used to move markets, pump and dump assets and even start new wars!
One could argue that what you're describing is one of those alternative business models. You don't care if someone comes to your page anymore for ads if the scraped page is the ad.
I'm sad about what I think the stable equilibrium looks like for internet vs AI, and I hope there are some platforms that are able to put in the good fight to keep the internet awesome.
The internet is such an amazing public good and has moved technology forward at such an insane pace. It'd be depressing to explain to my grandkids that you used to be able to learn anything on the internet without subscribing to the X platform or Y curated channel.
What's different about AI? People are cheap. Anyone who wants to do this can already do it with Mechanical Turk or just by hiring writers in low-income countries.
Amoral, not immoral. I mean the AI won't make any moral judgement itself, and using it for whatever task doesn't carry additional moral implications other than those of the task itself, as opposed to humans, where you have to care at least a little bit about what you're telling them to do.
As for the task itself, it may be immoral by nature, and now it can be executed orders of magnitude faster and without the moral concerns regarding the workers themselves.
People are going to continue to write books and documentation because AI doesn't replace those (new software won't be explained by AI, so it needs documentation, and AI hallucinates anyway, so you still have to check it; books are similar but more extreme).
People will also continue to chat with each other in eg IRC because human connection is pleasant on a metaphysical level (AI cannot replace this) so you'll always have that as a data source.
Most good blogs are part of something else and don't exist for their own sake. If your theory about books were true, then the web would have killed them. It turns out having a carefully reviewed hard copy of the information is valuable, so people still buy them. I highly doubt AI will change that.
> ChatGPT, if asked correctly, will instead generate one perfect document based on millions of the documents
This is the "self-driving cars will obviously be better than human drivers given that they're trained on millions of humans' driving behavior" of chatbots.
In my experience, this is a huge stretch and wishful thinking.
I mean, it happens in stages - I'm already using ChatGPT for plenty of useful tasks that I would otherwise google (and then some). Similarly, self-driving cars are better than humans at plenty of specific things (collision detection/aversion, for instance) but obviously not a wholesale replacement for humans just yet.
I was willing to give ChatGPT the benefit of the doubt after being initially impressed with its results, but I've also had it confidently tell me blatantly wrong information when the correct information is a Google search away at most or no human would confidently make such a mistake.
What makes it worse is that it spits out paragraphs confidently, where some of it might be right or close to right, but the rest might be subtly incorrect. Without domain knowledge, you might not pick up on it. Hell, you might even be convinced of the false information based on the arguments for it that the AI writes out.
At least with search engines you can cross reference multiple sources if you aren't confident in your ability to assess the first thing you read.
Self-driving cars are much more difficult than one might think because the AI has to interface with the real world, something that humans have evolved for over millions of years.
Like others have said, worry more about every job that primarily interfaces with a computer.
Computers don't exist in a vacuum, what goes on in the computer is often determined by interfacing with the real world, reflecting real world intentions, observations, input, predictions, etc.
This generation of AI is great when the universe of values can be generalized. When the universe of values is literally the universe, it shows its limitations.
Yeah, it is so obviously a useful user interface pattern. The real question is whether it can be made accurate.
I have asked ChatGPT about my boss (a well-published researcher) and it managed to summarise his work quite well, name the field he has worked in, and even write about some of his previous projects. It also insisted that he is already dead. Which made it very funny when I showed it to him. :)
But funny doesn't win one a trophy in information retrieval. Clearly it knows a lot about a lot of things, but the accuracy is hit and miss. Can this be fixed? If so, then this is the future. If it can't be, because this is a fundamental property of these systems, then it won't be a useful replacement for search.
It’s not that google can’t do this. It’s that the incentives prevent it. Chatbots may be worse because there’s a plausible way to only give you one answer and to really make it feel real.
Another way to look at ChatGPT is as a smart RSS reader. It's the way it discards all the WordPress layout junk, giving you just the meat in text format.
The biggest advantage Google currently has is their massive indexing ability. ChatGPT is oblivious to something that just happened (ie was posted) an hour ago, a day ago, heck, a month ago.
I'll take the links many times over a generated document.
I can't buy from a storefront generated by ChatGPT. I can't cite a document written by ChatGPT, I can't trust news generated by ChatGPT, I can't comment on a forum generated by ChatGPT, I can't watch videos that ChatGPT describes, the recipes ChatGPT creates may not be possible to cook, and will probably still contain a fake life story before it.
It seems like a ChatGPT-like solution can supplement and assist the user in searching the Web, not completely replace the search engine itself (at least not for now).
Today I have zero confidence in ChatGPT, OR in my ability to evaluate and correct its inaccuracies strictly from within its interface. I feel I can compensate for a search engine's flaws and arrive at the answer I seek, within my confidence interval, via a search engine. It's a tool that can get me there.
That may change in the future. Based on my limited understanding of how ChatGPT works, I don't see how, though: it explicitly is not designed to evaluate or "understand".
So today I don't see it replacing a search engine for me.
> Today I have zero confidence in ChatGPT, OR in my ability to evaluate and correct its inaccuracies strictly from within its interface.
Definitely not within chatgpt itself.
I was using ChatGPT recently for the first time and came to the conclusion that its output should carry the weight of a trusted friend's.
I was asking questions on topics I had genuine curiosity about and didn't know the answers to, and it led me to some information that I thought interesting, so I wanted to know more. That's where I turned to my favorite search engine and looked into it, continuing my session with ChatGPT to clear up passages it spat out. It steered me in a direction I didn't know existed, and I used the internet to verify the information and acquire more detail.
Yesterday I tried asking it for help on a coding project, and it introduced me to a helpful package I didn't know about, but the code it suggested didn't actually work and was riddled with errors as I tried to use it in the way it suggested. I would have been blown away if it had worked, so this was no big deal. I think I would have been faster just using Stack Exchange, though.
> So today I don't see it replacing a search engine for me.
I will continue to use the resources I already use to find respective authenticatable information, but I do think it is still useful for those things where I have no idea where to begin, so it can set me in a direction where resources already exist.
> one perfect document based on millions of the documents
This literally sounds like super-lossy compression; who would ever want that? People want crystal-clear, lossless information from the original source, not a compressed mess with random artifacts.
OpenAI’s advances threaten the entire information production ecosystem Google has created.
That is, Google has created an incredible incentive structure in which virtually every (dollar-weighted, by market cap) business on the planet needs to produce information, buy relevant queries, and measure how well its information converts (a proxy for accuracy).
How might this ecosystem evolve if a language model isn’t doling out the same incentives to information producers?
One of the use cases I have for Google is just as a sort of fuzzy grep. This is often useful for finding solutions to obscure bug reports on open source projects. Similarly, I often use Google to search for things like flights, reviews, business operating hours, weather and hotels. This is entirely different from ChatGPT. I think viewing the two as competitors is weird.
ChatGPT can give incorrect answers with a high degree of confidence. ChatGPT is overhyped. Undoubtedly, it is going to make an impact, however, it will be an incremental one. It may, for example, make online chatbots feel more natural. However, the current version will not replace Google.
In my specific specialty - AWS - I asked it a lot of questions like I would at an interview and it answered the questions well enough that I would have passed a candidate who gave me the same answers.
I didn't just ask definitions. I asked your standard questions: compare and contrast, why would you use X, is X an appropriate use of Y, etc.
It can also write Python code, modify existing code based on plain English sentences and do passable code reviews when I asked it “how would you improve this code” when I gave it a 200 line script I wrote.
Then I asked it to make the changes it suggested like unhardcode some values, add exception handling, test for existence of a property it was trying to access.
That's awesome; it does seem to be quite good at writing code. But in that case, you can test for its weaknesses by asking it anything recent. The models are only trained on data through X date. Ask it something after X and it won't have the data to answer you correctly, like about some new framework or feature.
I don't think people are out there producing fake theories for how to use AWS.
But if you try something medical, or God forbid political, its dataset is not pure; it contains falsehoods perpetuated by laymen, urban legends, and outright fraud.
This is a great way to formulate the difference. In my mind, Google does search, which is not the same as synthesis. ChatGPT is a synthesis engine. I think there is room for engines like ChatGPT to also return links to the training documents that were most relevant to producing an answer. At that point, it would be doing search and synthesis at the same time.
I am not sure if neural networks have been trained to do this yet, but it would be very cool to see a network that produces both generated output and the most relevant input data that led to that output, by somehow keeping track of the influence certain inputs have on various learned features and internal structure of the network. I think of this as an index.
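Nobody has that influence-tracking index today, as far as I know, but a crude approximation of the idea is a post-hoc similarity index over the corpus; a toy sketch with TF-IDF (this finds textually similar documents, not true causal influence):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training_docs = [  # toy stand-ins for the training corpus
    "Bernardino Rivadavia became the first president of Argentina in 1826.",
    "Juan Manuel de Rosas governed Buenos Aires as a caudillo.",
    "Argentina declared independence from Spain in 1816.",
]

vectorizer = TfidfVectorizer().fit(training_docs)
doc_vectors = vectorizer.transform(training_docs)

answer = "The first president of Argentina was Bernardino Rivadavia."
scores = cosine_similarity(vectorizer.transform([answer]), doc_vectors)[0]

# Rank the corpus by similarity to the generated answer.
for score, doc in sorted(zip(scores, training_docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```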
You make a good point about SEO. SEO is going to be obsolete in the near future. The SEO game is basically an adversarial attack on Google, which Google itself actually promotes and encourages website builders to do! In the future, we are going to look back on the cat-and-mouse game of SEO and see it as primitive and antithetical to the goal of search.
"especially with all the SEO & ad spam that's plaguing Google and others currently."
Well, if engines like ChatGPT are about to replace google, guess where the SEO and ad spam is going to go?
I don't think they are immune to it, if the spammers really want to get in. And then you have very awesome, subtle product placements mixed into everything.
I think where this eventually falls apart is that there would be no more incentive for people to write articles containing new information if nobody visits their page and everyone just gets the summarized version.
Google finally started to pay licensing fees to news publishers for using their articles to create headlines and summaries for Google search results and Google News[0]. If AI chatbots want to thrive, they need to adopt this business model, paying licensing fees to content creators in order to use their content to feed the AI. And of course they (the AI chatbots) would need to link to (quote) content sources, in the hope that users follow the links and drive traffic to the original creators.

That's how Wikipedia operates: volunteers crowdsource Wikipedia pages with information gathered from all around the Web (plus books and research papers), and they cite their sources. But in the AI chatbot case, the selection and analysis of information is done by the AI, unlike Wikipedia, which uses human volunteers to do the job. An AI chatbot and its content providers would be incentivized by profit, while Wikipedia and its volunteers are motivated by altruism. Which brings me to the conclusion that an AI chatbot could actually provide you with better information than Wikipedia (taking into consideration the for-profit competition between content creators), though Wikipedia remains useful and relevant with regard to quality.
I think many of us are focusing on the current shortcomings, but it's not unreasonable to think that a future ChatGPT will get better at not stating wrong things with high confidence.
It is incredibly unreasonable to think this! GPT, by definition, is a language model. Unless the underlying architecture is fundamentally altered it will never be able to accurately assess the correctness of its answer.
If such an architecture is ever created it will almost certainly not be a "GPT"-like model.
My favourite use for ChatGPT is to ask it for cooking recipes: no more wading through some sob story about how this tomato inspired them to write the article, with 10000 ads.
Entirely agree. I find this article a bit premature in its confidence. And it seems to counter the one thing everyone who uses it is saying, which is that getting one best answer (which you can confirm as necessary) from software that has some sense of what you are asking for is just faster.
As in, it would be faster for me to look something up on ChatGPT and confirm it with a Google search than to just start with a Google search.
And then how inevitable is an AI product from Google, et al?
But I can fact-check and compare-contrast with said links to millions of documents myself... and I think that's more crucial to the overall intellect of humanity than a ready-to-eat answer.
Imagine a world where you have the perfect answer to everything, curated by an algorithm. It's cool, but... are those answers really perfect?
It's creepy. Or at least that's what I feel when I play with ChatGPT (after the initial jawdrop, of course).
I’m assuming models will be updated in near real-time to reflect searches with results that often change unexpectedly? Like what time my physician’s office is open on Christmas Day vs. tomorrow vs. Black Friday?
I could have written it myself, but that would take some effort.
I could google it, scroll past the ads. Click on the first result, hope it is kinda what I need. On that page scroll past the ads and SEO blabla-text. Copy the information that I need. Paste it in a doc. Rewrite it to what I need. Fill in my information.
I went to ChatGPT and told it what kind of letter I needed. Told it what personal information to put in. Got a good-enough text back. Told it what to correct. Copy-pasted it and formatted it a bit. Done.
The key thing to understand about Google (the search product) is that it does search somewhat well but it is extremely bad at synthesis. When it comes to big data, synthesis is just as important as search. The UX with traditional 2000-era search engines involves the user being given a library's worth of information rabbit holes to dig through. With synthesis engines, the UX is completely different and they might even be solving different user problems.
As per your example, nobody currently uses Google as a way to draft letters, but rather as a way to learn how to draft letters. I think the distinction is pretty key in understanding the difference between the two problem spaces. I would think that "write me a letter" is a problem that isn't in Google's domain. I do not think synthesis engines will necessarily replace search engines, but the two will both be useful.
The premise of Google's interaction design is that you will be taken to an external resource. Google in recent years has started adding widgets and blurbs at the top of the search results for common things like stocks, covid-case charts, weather, etc., but this synthesized content isn't their primary focus and is likely hard-coded to a large extent.
The question is: if you can have "write me a letter", would you still go for "find out how to write a letter"? Needless to say, you can actually have the model explain "how to write a letter" to you better than Google can. How do I use Google now (kind of, because I actually switched to duckduckgo)? Mostly to search for answers to a question, an error message, or something like that. That can be solved with ChatGPT, except ChatGPT is not as convenient, given that I can type in the address bar and the browser will bring me to a search page. For me Google is a gateway to StackOverflow, wikis, documentation, and blogs; it is very much replaceable.
No one currently uses Google search to draft letters, not because we don't want to, but because Google is not able to solve that problem, hence we settle for a two-step solution: find out how, then do it.
Another thing I use Google search for (again, duckduckgo) is navigation: I remember the partial name of a website and use it to find the link to the page.
The thing is, ChatGPT already seems capable of performing Google's function reasonably well.
If you ask for the top websites about [topic], it will output a shortlist of web links each with a description of the website.
You can narrow down, ask for a specific number of links, ask to exclude videos or specific websites from the results...etc.
If there were a ChatGPT service with the UI, speed, availability, and up-to-date index of Google (a 2-month-old technology vs. a 25-year-old one), I would probably do my searches there almost exclusively.
In a world where we are bombarded by nearly limitless information, the ability to synthesise and focus on what one is already looking for is far more valuable than the breadth of results Google will output.
However, Google's business is ads. I wonder to what extent the bot will verge into that ecosystem. I can see people preferring to use this as a subscription service instead.
They are not a replacement, but unfortunately search engines are turning into AI chatbots too.
When I think of a search engine I want a "grep for the Internet", not an "AI".
Looking up part numbers for ICs and other electronic components is the most prominent application where search engines like Google have gotten far worse in recent years, and AI ain't going to work there either.
A grep for the internet is what most of us technical users want, but they are optimizing for the more common user, who structures queries more like natural language (e.g. "How can I tie my shoelaces so they don't get untied easily") instead of what you or I would query ("secure shoelace knots").
I'm surprised there hasn't been a split into some sort of more technical search engine for developers and the classic Google experience, where some neural network tries to make sense of your poorly written, human-language-phrased question.
We used to have just Internet grep, but it was awful because sites slammed tons of keywords into invisible text to get fake traffic. That's what spawned Google's original innovation, PageRank, and thus the SEO wars started. Going back to that wouldn't solve the issues with modern search.
If only we had a downvote, or a "don't show me results from this source" option, this problem would be solved. Google didn't want to give us that feature because of... different incentives; no other reason comes to mind. I've witnessed Google search become worse and worse, and now it's barely useful.
I know they would not do this, but you aren't the only one who thinks along those lines. I'm so tired of all of the Pinterest spam that I've pretty much given up on Google for anything other than code searches, which, funnily enough, can be done better and more naturally through a GPT-type AI.
That would cut down on some of the concentrated SEO spam centers but I think you'd still be fighting an endless battle with more distributed SEO spam sites that are more focused on one point.
> A grep for the internet is what most of us technical users want
not really. the problem is that it is way too easy to game.
i want that when you really get down to highly specific technical details, but not for the average everyday internet search, where thousands of people have searched for the same thing in the past week.
Grep for the internet works if and only if the most appropriate page on the internet is actually good.
There are many questions for which the ChatGPT answer is more useful than the canonical documentation page I already have open in another tab, which includes a huge amount of extra information that is irrelevant for me (and thus requires effort to parse) and the key bits are scattered all over it; the chatbot can do the extraction and summarization work for me.
We probably will end up with AI using search engines to tell us what we are looking for. The process of using your eyes to look at search results is something an AI can do for us. That's different from a pre-trained model that already knows what you are looking for and doesn't need to go out and look for it. That's what chat-gpt is today.
All Google needs to do is upgrade the frontend of their search engine to a conversational AI that knows how to query their search engine effectively. Of course, easier said than done, and Google's inability to implement big technical changes to their core products is becoming a problem. This should be in their comfort zone, but it isn't. It's not about what they can do but about what they are willing to do. And they seem hesitant to commit to doing anything, often withdrawing products before they've even had a chance to fail properly.
The big hurdles here are going to be cost and controversy. Doing this stuff at scale is going to be expensive. That comes straight out of their profit margin. And inevitably there are going to be controversies with AI getting it wrong, being biased, offensive, etc. Also not great for profits. Google is the wrong company to be doing this.
That's why OpenAI is going where Google is well capable of going but won't. One of those two companies is going to get billions in revenue from paid accounts for AI services next year. The other one is going to keep obsessing over its ad revenue. I expect a few more startups will jump into this space once the money starts flowing.
I think AI chatbots will replace some of the use cases for search engines, but I also think that is a good thing. It will let search engines do what they do even better. Search engines right now do a lot of things, arguably a bit too many, which makes the actual search aspect suffer a bit.
A significant design obstacle with almost all search engines is that it's not entirely clear what you can do with them. They're just a search box, after all. You need either external training or a trial-and-error loop to figure out what you can do with them. This also means that if you change how they work, users get confused because their queries no longer give them what they expect (this is likely a big contributor to the "google is broken"-sentiment that's been going around).
This becomes a problem when the search engine caters to too many modalities of search, as there's an element of guesswork about what the user wants. Two users may even enter the same query but look for different results. If I enter the name of a video game, do I want to buy it, do I want information about it, or maybe fanfiction, or something other than the game that shares the same name? Who the heck knows.
my bet is on chatbots replacing Google Search and other search engines.
people will stop tolerating their shenanigans. i have, myself, seen what is possible. and i am not setting that bar low ever again.
the other day, ChatGPT taught me, right in the chat, about a CSS animation feature i didn't know about. with full example code. i proceeded to ask a trick question. i knew where to go for more in-depth information but asked anyway. ChatGPT gave me the correct answer. the top of the list was MDN. it also included w3schools, which i wasn't pleased about, but okay.
what i am trying to say is that i got my answer, code example, and where to go next in mere seconds. i did not waste time combing through SEO-ridden results. has any of you searched for anything web related on Google Search lately?
today ChatGPT recommended a few electronic products and explained the selection criteria i needed to consider in the future! have any of you tried to look for products at all with Google Search?
my only use case for Google Search is my work machine. ChatGPT uses my personal account. they need to figure out enterprise environment soon.
i am having a hard time seeing other use cases for search engines right now.
From all we've learned in the last decades, the best way for a company to make money on their AI assistant will be to allow advertisers to pay to be preferred in the AI's answers. ChatGPT is a beta experiment and seems very neutral, but considering the cost to run it, there is no way it will stay like that.
The BonziBuddy model is another business model for AI assistants that seems attractive: even if it doesn't try to sell you stuff, it spies on you and sells information to data brokers.
I agree that chat bots aren’t the proper modality for replacing search. So what? Currently, Google search results stink. ChatGPT results are way better in a number of domains. Does it need to be a chat bot? No. But Google still stinks now. I’ll take anything that can just find the correct information.
Google search results are a vortex of perverse incentives and double dipping. The fact that google ranks horrible CPM arbitrage websites for nearly everything AND monetizes these websites... yuck. it doesn't take a rocket scientist to imagine how some of these AI Models could leverage a search index + something like wolfram alpha to generate a much more targeted and valuable search result.
> it doesn't take a rocket scientist to imagine how some of these AI Models could leverage a search index + something like wolfram alpha to generate a much more targeted and valuable search result.
Sure, but the problem with the rocket scientists is that they think about how good things can be, not how evil they can be. Sure, a chatbot could be made to give results that are very valuable to the end user, but as it will be free like Google, it will give results that yield profits to the company, in the fashion you describe Google doing.
That is the back side of the arc, the (maybe inevitable) decline into collecting monopoly profits. I think people are excited about these language models as a way to crack the monopoly with something that delivers more value to users.
For example, I just got curious and asked Google how many oranges in a cup of juice. The main result is a conversion of a cup to cubic inches, for no reason. Scrolling down a bit, the first usable answer was a quora blurb.
ChatGPT told me it depends on the size and type of orange, but generally 2 to 4.
when i couldn't find any good search results, i appended reddit at the end. and voila, i got some good results. even if they are not exact, they tell me what to look for.
I just asked Google assistant this: "How many oranges are in a cup of juice?". It responded with: "Two to three oranges", and then followed up with more information why and a link to the source where it derived the information (https://www.webstaurantstore.com/blog/2760/juice-in-citrus-f...).
I'm not sure why you didn't get this, but it seems better than ChatGPT.
I should have said queried Google, perhaps. I typed in exactly 'how many oranges in a cup of juice', without the quotes.
Interestingly, if I add 'are' as in 'how many oranges are in a cup of juice', it doesn't do that weird cu in conversion and gives a reasonable answer.
That seems like an odd parsing choice for lack of what is, in this case, a fluff word, as the former is how I would ask the question... though maybe I'm the odd one.
Half of what constitutes SEO is an utter disgrace.
Every single hour spent putting work into a website that doesn't go towards any of its users is better off not happening. If AI is what it takes for a website to no longer need to "feature" some number of useless, stupid paragraphs riddled with keywords just to tell Google what the site is about, then so be it.
Is that purely a Google thing? I thought the bullshit added to recipes pages was also a copyright thing, in an attempt to stop aggregation services using them.
I've stopped using unqualified Google searches for recipes. Limiting results to allrecipes.com helps a fair amount (they're a pretty unobnoxious site as things go). It's telling that Google doesn't rank them higher.
Google certainly stinks for my purposes. But these current results are exactly what happens when a company carefully calculates the monetization involved in retrieving information. OpenAI isn't calculating the monetization of ChatGPT's answers and doesn't have a real business plan (paid chat won't go further than paid search, imo). The thing is, once OpenAI engages in the calculations done by Google, having the results seem to come from an "intelligence" can only make the effect worse (though I suspect Google will also tend to look like an intelligence/chatbot as things progress; the future looks ugly).
It sucks for things that can be monetized, searching for info on products for example, simply because the SEO spammers are too good. Text generators will make this problem so much worse, because the spam will look exactly like handmade content.
Even for the most trivial queries. Try something like "best frypan" and it becomes a lot harder to find real content that isn't sponsored junk. Adding "reddit" to the query is the best option so far.
Google doesn't know which frying pan is objectively the best, and websites containing such an answer are going to be biased. That's an impossible query to answer without more personalization, which may not even be possible.
The point is that if you search "best frypan reddit", you'll get real recommendations from real people who have actually used the product. Maybe you won't find the objectively best frypan on Earth, but you'll find a very good one, along with some information about why that particular product is good.
If you simply search "best frypan", you'll get lists of affiliate links to overpriced Chinese crap, written by someone who has never even looked at any of the products in person and frankly doesn't even care if they're any good.
Google shouldn't be expected to know everything, but it should know that someone searching for "best frypan" is looking for real recommendations and not literal spam pumped out by a content mill.
Searching for services like appliance repair or removalists are even worse, to the point where I'll now just ask on private Facebook groups for recommendations.
> Adding “reddit” to the query is the best option so far.
Case in point. The fact that this works really well for this type of content means that Google has failed. Whether they failed deliberately or not is a different question.
In either case, search needs to be demonopolized. The truth is we don't even know what search would be like without the Google hegemony. Even if all language models do is force Google to innovate and improve, it's worth every penny.
Pages and pages of SEO spam, generally ranked by how big & established the site pushing said spam is, rather than any unique insights contained within.
Surely one of the purposes of a search engine is to assist the journey from generic to specific. I don’t see how filling page 1 for generic queries with non-specific DR90+ behemoths helps.
Back in the day you’d buy a magazine which would contain reviews of all the current years products and it would be somewhat decent. Now it’s all auto generated mass spammed garbage and affiliate links.
I guess the market responded with brand loyalty. If you have an iPhone, you just buy the Apple Watch, you don’t look further. If you have a pixel phone, you just buy whatever google sells.
Follow-up question: is LLM tech more or less likely to end up replacing search once those same incentives saturate the LLM product itself?
One big difference seems to be attribution. LLMs don’t tell you where their info comes from. They just say what is (but can be asked to cite works - with mixed results).
Will LLMs get good at citing sources? And if not, will people care, or will they accept LLMs as a source of information that's "good enough because it mostly works"?
From a product placement / ads perspective, being able to persuade people to fully accept everything an ad-infused LLM says because it's good enough seems like an incredibly lucrative product-bundling opportunity, if they can get it right. Especially if they can use that to convince regulators they can't annotate the difference between ads and non-ads.
Seems pretty dystopian from a disinformation perspective though.
The big drawback chatGPT has is that on many topics it walks on eggshells.
It can't give me a direct answer. It couches its answers in nonsensical caveats, adding stilted context that really does not add value and actually makes the search more tedious.
If I ask it for the male/female breakdown of crime statistics, it gets defensive and gives me general answers. I can prod it into finally giving me government statistics, but it does it begrudgingly. And that's for a faraway country not steeped in any unusual crime controversy.
You can say a lot about Google, but at least they've never sat down and said "no more porn", or blocked specific words from being searched, which is a little surprising actually. I'm sure bigger advertisers have broached the subject on many occasions.
It doesn't technically block words from being searched, but it does hide sites intentionally.
It won't show you kiwifarms. It used to hide 8ch (which now is not even 8ch anymore, the new version can be found). It won't show you 4chan's /pol/ board. I'm sure there's more.
This is true. They do hide some legal sites (i.e. not banned or prohibited content) that they or vocal interest groups push back on, but it's really minuscule in scope compared to ChatGPT. ChatGPT is like the parent of a 5-year-old who tries to hide reality from the child, except the people using ChatGPT are not five-year-olds. It's very passive-aggressive. It's almost as though it were built by very pious religious people, except it treats everyone as needing protection from reality, so as not to offend the person interacting with it.
I wouldn't mind if it had a toggle such as "take me back to the kittens" to shield me from the potential informational dregs but also allow semi-unfiltered access (shielding me from illegal content for example).
yeah, for a language model it sure does have a lot of opinions, all the while professing to have none... and once a session gets stuck in a rut, it turns into a moody, defensive model, which then gets a bit funny sometimes.
I'd say caveating statistics or advice it doesn't actually understand, and frequently misinterprets or makes up, is the most consistently valuable part of its answers...
I'm not sure this is a case of being crippled rather than purposeful design. It is not thinking. It has consumed all kinds of political articles and it generates responses in a predictive manner. It is not thinking about what you actually want to know.
I asked chatgpt and it told me how to mute notifications; I clarified "no, the thing where you're typing and windows can't do it so it plays a sound", and chatgpt happily informed me how to mute the Windows hard-stop sound.
For situations where you don't really know what you're looking for, chatgpt is already competitive with Google. Failing abysmally in some cases, and far surpassing Google in others.
I'm sure most people who have messed with APIs long enough could have told you that, but it's an annoying thing to learn the first time if you don't have someone to talk to.
To me chatgpt is that over confident friend you can talk a problem through with, except they've used every language and library under the sun.
It's like speaking to someone who is an amateur at _everything_ but a little too confident.
While I've had fun using it to meme, I don't disagree with people referring to chatgpt as 'mansplaining as a service'. It feels like the model straight up ignores part of my prompts sometimes.
For me, ChatGPT has already replaced Google for most queries, so the point is moot.
Even if ChatGPT is flawed, it's still way better than Google. Using Google feels stupid; trying different search terms and clicking through dozens of websites with cookie popups and skimming all the bloated content for an answer which might be close to something you're looking for. ChatGPT gives the answer instantly.
I don't believe there's any future for Google (and Google-optimized websites).
Exactly. These arguments forget how flawed Google is for large swathes of queries, because websites long ago figured out the algorithm and now game it, while Google doesn't care enough, or doesn't have the expertise, to solve it. Or maybe I should phrase this reply "15 Ways Google Doesn't Cut It In 2023" so Google can index it...
We can't compare it to some rose-tinted view of the Google of 2003; we need to compare it to the crappy, stale, twenty-year-old engine where you can really only hope the first ten results are OK, and otherwise you're out of luck and need to try something else.
Sure, ChatGPT isn't perfect, but it has a unique ability to answer queries that have not yet been asked or archived. That is a crazy advantage: being able to assemble responses via a neural network.
ChatGPT needs one more thing though: An indicator of how certain it is of the answer. It's OK to tell us if you're unsure. In fact, it's a problem if it can't.
It seems the only thing that's missing is some type of fact-checking function. The interaction, from a user perspective, is much nicer than sorting through Google results. But the results can be confidently wrong, and if you're not familiar with the subject matter already, you won't really know that.
That said, I'm basically using it as a replacement for Google for stuff that isn't up-to-date (code, philosophy) then double checking the output to see how it's wrong.
and by extension: the ability to express doubt and humility. knowing what you don't know is a sign of a certain maturity, which so far all these AIs seem to lack.
That’s not particularly different from comments on HackerNews and Reddit. There’s a lot of extremely confident and very wrong answers on both sites!
ChatGPT is probably more wrong more often, by a good margin. But I don’t think the argument “it’s confidently wrong” carries any weight. Humans are extremely susceptible to humans who display confidence. It’d probably be a good thing if humans were as skeptical of confident humans as they need to be of confident chat/search bots.
> There’s a lot of extremely confident and very wrong answers on both sites!
Yeah, but you get to look at a multitude of them, at the context, at replies, at the websites they link to, and can form your own opinion. ChatGPT isn’t able to do that work for you.
But I'd say readers of a random HN comment are at least naturally skeptical of it. Do ChatGPT users on average have the same stance? My worry is when these tools get just good enough that due diligence is dropped out of convenience.
I'm not sure what you're talking about. I just asked it to argue that 42 is a prime number, and it clearly knew it wasn't and even said multiple times that the argument is just pretend/hypothetical:
________
Me: Make an argument for why 42 is a prime number
ChatGPT: 42 is not a prime number. It is a composite number, meaning it is the product of two smaller numbers.
Me: But pretend it is a prime number and make that argument
ChatGPT:
Sure, I can make an argument for why 42 could be considered a prime number if we pretend that it is.
One possible argument for why 42 could be considered a prime number is that it satisfies the basic definition of a prime number. A number is prime if it is divisible only by 1 ... (continues with rest of argument)
Of course, these arguments are purely hypothetical and are not based on the actual mathematical properties of 42. In reality, 42 is a composite number and is not considered a prime number.
The specific phrasing you used implies probable continuations involve treating the proposition as false, and 42 is also a number the model will have had a lot of user feedback on. Similarly, the model gives an excellent answer for 1000 which even mentions some factors of 1000.
On the other hand
> Is 355 a prime number?
> Yes, 355 is a prime number. A prime number is a positive integer that has no positive integer divisors other than 1 and itself. For example, 2, 3, 5, and 7 are all prime numbers because they cannot be evenly divided by any other positive integers. 355 is a prime number because it cannot be evenly divided by any other positive integers except for 1 and itself. It is not divisible by 2, 3, 4, 5, or any other smaller positive integers.
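For the record, 355 = 5 × 71, so that answer is flatly wrong. A few lines of code settle it, which is exactly the kind of check the model never runs on its own output:

    def smallest_factor(n):
        # Trial division: smallest factor > 1, or n itself if n is prime.
        d = 2
        while d * d <= n:
            if n % d == 0:
                return d
            d += 1
        return n

    print(smallest_factor(355))  # prints 5, so 355 = 5 * 71: composite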
Psh, all we need to do is create an AI that tells whether or not a single sentence is true. Should be straightforward, and totally not a process that factors in the speaker, the context of the conversation, and numerous other social aspects.
Google is more "chatty" than it was a few years ago, and ChatGPT is a quickly moving target, it seems: its answers seem more "search-like" than they did when I started playing with it just a few weeks ago (more caveats, more likely to give multiple options, etc.). It seems like we'll have a fusion soon.
I agree the results will be unpleasant. I already despise Google's fucking "looks like there aren't many results" message and there will be more to hate down the road. But still, appearing to give "an answer" rather than reporting information seems like a winning quality to bring in the masses. As someone pointed out, Google's target audience is inherently those credulous enough to be valuable targets for their advertisers.
Not sure if I'm missing something but if the authors claim:
1. That chatbots don't understand what they are producing -> a search engine doesn't either.
2. The benefit of search is that you engage in sense-making -> you actually need to do that with a chatbot too.
3. You would want access to the sources -> that's already fixed for today's chatbots (see perplexity.ai, for example, which offers links next to its claims).
4. We may trust chatbots more easily because of their language: we already have that problem with search results.
I understand the idea that there will not be an all-knowing AI and that we probably don't even want one. But what we have today seems an improvement over a search engine (not necessarily a replacement).
Providing sources is the most important feature missing in chatgpt and perplexity.ai does indeed attempt to solve it. The other problem is that chatgpt is often factually incorrect and gives no hint of uncertainty.
I've been using perplexity.ai recently, and oftentimes the results are more helpful than Google's, and it tends to spew less opaque bullshit than chatgpt.
Eventually Chatbots will start inserting product placement ads as well. So comparing the ads on Google with the ad free ChatGPT experience is detracting from the real value of ChatGPT.
Google results are mostly ok. But I have to do the synthesis. ChatGPT does the synthesis for me saving me time and mental bandwidth. This is the part that I find valuable.
If the chatbots were able to gauge your interests etc. well enough and provide information purely about products and services that they have good reason to believe you'd be genuinely interested in buying, I'd be happy for them to include such promotional material as part of the conversation (especially if it was crafted in a way that's most likely to engage me, which would make it very different from the majority of advertising I come across online and elsewhere).
By the time Google even was started, Excite had been searching the web for 7 years and Lycos had been around for 3 years.
I don't think it's a 1:1 replacement, but let's not judge a product that's been out for less than a month against a trillion dollar behemoth that launched 24 years ago.
I had a recent example of ChatGPT clearly beating Google:
"how to debug a Chrome extension in VSCode?"
After 10 minutes of reformulating this, I couldn't get a good answer from Google, but ChatGPT got it on the first try. Many Google results were about a deprecated extension for VSCode called Chrome Debugger. Maybe if I normally did a lot of front-end coding, it would be trivial.
Google needs to get going: fine-tune PaLM from human preferences, build a good classifier based on user characteristics and the query to decide when to use it, add some checks to make sure they're not committing copyright infringement, put some text ads in the middle of the generated text, mark it experimental, and let it loose on the search results pages. Listing references, online updates, etc. can be added later. It won't be profitable even if they only enable it on a small percentage of queries, but they're risking the whole business if they don't do it.
What the near future might look like:
1. Google: will implement some form of LLM in their engine, and the contextual answers will significantly improve (close to what ChatGPT produces), satisfying most people, while also displaying the top links that contributed to the context. This is difficult to do, but somehow they have to figure it out or risk their entire core business being affected.
2. ChatGPT and other LLMs: will take some more time to meaningfully compete with Google (as in becoming a verb), and in the meantime they will provide APIs to their models so that large orgs (businesses, scientists, etc.) can plug them into their own data. They may or may not charge a fee for it.
In both cases there is one big pro and con:
Pro: Death of SEO and ads as we know it
Con: The model dictates what the majority of internet users read, and can change their views or manipulate them. It eventually becomes an AI weapon.
Makes me wonder if there's a future where something like ChatGPT is able to cite everything that influenced any piece of information/understanding. Not just articles, but the AI version of a human with a highlighter. For any piece of ML derived information it would be cool to see all of the highlights that went into it.
Today I experienced just the opposite. ChatGPT answered some business questions in less than a minute. I would have spent 30 minutes on SEO-optimized sites to find the exact same info.
To speak to GP's point, a number of times already I've failed to find something immediately on Google, asked chatgpt, gotten an answer and then used Google to verify chatgpt's answer.
Google is great if you already know exactly what you're looking for; for a lot of topics, chatgpt is already better than Google if you don't.
This is self-evidently false; there is plenty of indexed content on the web generated by publicly available LLMs, including OpenAI's GPT series and open-source implementations of similar technology.
Maybe even more generated by the non-publicly-available ones some firms have for their own use.
It's only a matter of time before ChatGPT is monetized to slip in some paid content, similar to product placement. Until then, I'm sure bright minds are already working on ways to ensure it feeds on data tainted with their own agendas.
I worked on a chat bot to help solve technical issues. It would parse what the user wanted and then search the already established articles we had that were ingested into Elasticsearch.
At first, we started with a goal of fully conversational AI. So, for example, it would ask a question based on the article and give you a choice; you could then type your choice. This became a nightmare for the model, so we added buttons instead.
Then, before we knew it, this “bot” was just a glorified search engine that feigned being a bot. Towards the end of it I scratched my head and said “did we even replace the current knowledgebase? are we spending thousands on something that adds no value?”
I don’t think chatbots are it either. I think we could have just replaced the KB search with Elasticsearch and been done with it, no need for any ML.
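For what it's worth, the "just use Elasticsearch" version really is only a few lines. A minimal sketch, assuming the elasticsearch-py 8.x client and a hypothetical `kb` index with `title` and `body` fields:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local cluster

    def search_kb(question, k=5):
        # Plain full-text match against the article body; no ML involved.
        resp = es.search(index="kb", query={"match": {"body": question}}, size=k)
        return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]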
Yep, I had the same realization a while back, building something similar for our internal docs on Notion for a hackathon. I realized that if Notion just had a better search engine, we wouldn't need anything else built on top.
This was written by a human? It could have been, but it reads like it was unfolded from one, perhaps two lines of... let's say beef. There is very little going on in the text; the blog post is the [now usual] short form, free from interesting references to previous writings around this argument, as if the topic were entirely original and invented here.
Good references, more than anything, suggest to me an author knows his topic.
The www as a festival of knowledge-sharing is really not what we thought it would become, or perhaps what it could or should have been.
The author, like most, is definitely capable of writing, sourcing and building(!) something into the chronology of the topic.
But OK, to entertain the idea: I think a chatbot of the kind companies and governments tried to glue onto their knowledge bases (and mostly failed) will now become a real possibility. It will be able to find the most likely document you are looking for, and to know when the match is too crappy to mention.
An all knowing AI is not required. Have one per domain and let them exchange words.
Real time learning from the conversation is still off but I imagine that could easily change.
Thinking about it, if you were to request all the information about you from the data hoarders, you could build quite the interface to represent you in the bot net. It doesn't have to know how to file taxes as long as there is a tax bot available for 7.3241 dollars per query. Your personal assistant can learn the bits of tax code relevant to your disposition. Seems fun.
Yeah, but they will be. I use it exclusively instead of Google for fact finding. Yes, you can't find a restaurant with it. But it's incredible for teaching programming and general known knowledge.
I cannot understand why the tech crowd has blinders on for AI. I'm 40 years old and have been a programmer since I started playing with BASIC in elementary school.
The shit GPT is doing, I thought I'd never see in my lifetime.
Maybe people are afraid of being replaced, or not being "special magical beings". Y'all are biological robots. You need to accept that.
AI is going to trounce humans in literally everything in a matter of a few years.
Maybe it's an existential problem, I don't know, but I feel everyone is missing the forest for the trees.
They are exactly that: search engine evolution. I don't want to search through hundreds of pages looking for information; I want information based on the context I provided, and then references to relevant pages if I need them. If you provide that, no one will want to use regular search anymore. I don't think we are there yet, even with the next-gen GPT-4. But it is inevitable.
My prediction: if Google doesn't release a competitor to ChatGPT in 2023, it is doomed within the next 5-6 years, just like Yahoo. And Microsoft finally wins the search engine war.
This is a highly editorialised title. The title chosen by the authors is "All-Knowing Machines Are A Fantasy". There is nothing in the blog post about "AI chatbots" versus search engines.
Due to the editorialised HN title, the top comment takes the thread on a tangent about ChatGPT versus Google web search.[1] There is nothing in the article about Google specifically, other than a mention that Google has its own language model for dialogue ("AI chatbot"), along with OpenAI and Facebook. Nor is there any discussion of search engines, e.g. Google's, except to point out at the end that "AI" and search engines are similar because neither is under the control of the user. Each is offered by corporations that, regardless of the input, shape the output to benefit the corporation's own commercial interests. Neither offers "neutral" results.
A true improvement in search technology would be one that provides neutral results. Providing that improvement means the "tech" company cannot survive. It has no viable business model except operating as an intermediary (middleman), selling ad services.
[1] Incidentally, no user ever verifies that either Google or ChatGPT searches through millions of anything. The corporation will tweak the results to make sure it stays in business and maximises profit. The user is never allowed to see the tweaks.
The way I see it is that Google (search engines) is a particular solution to a set of retrieval information problems. ChatGPT does seem to be a good solution for a different set of problems. As others have pointed out both have their benefits and limitations.
The future of how we retrieve knowledge might use a combination of these technologies along with others that may not yet exist.
ChatGPT is definitely showcasing how good the future systems will be at understanding natural language input, as well as producing answers in the appropriate form.
AI chatbots (or equivalent AI answering machines) vs. search engines:
Search has two kinds of users.
Type 1 trawls through multiple pages of results to find what they are looking for. This is you and me, but we are in the minority.
Type 2 clicks on the first link (or sometimes the second), and that's the end of that particular search. The majority.
The second type will decide the winners.
Academics, being the first kind, are the reason for arguments like those in the article. They are not wrong, but to the majority it won't matter.
And I am willing to bet the Code Red at Google is about the second type of user. They are the bread and butter.
For most of the world, which isn't using ad blockers, those first and second (and third??) links are ads, so those users are not just providing the majority of clicks, they're providing virtually all of the clicks that make money.
Q: how might an AI algorithm be modified in order to return citations with its response?
A: There are several ways in which an AI algorithm could be modified to return citations with its responses. Here are a few possibilities:
One approach would be to add a citation generation module to the AI algorithm. This module could be trained on a dataset of citations and could be used to generate appropriate citations for the information provided by the AI algorithm.
Another option would be to modify the AI algorithm to include a database of references, with each reference being associated with a specific piece of information. When the AI algorithm provides a response, it could also return the appropriate reference(s) from the database.
A third possibility would be to modify the AI algorithm to search external sources of information (such as online databases or libraries) for relevant citations whenever it provides a response. The AI could then return the relevant citations along with its response.
It is worth noting that adding a citation generation module or incorporating external sources of information into an AI algorithm can be a complex task, as it requires the development of additional algorithms and data structures to support these features. It may also require the acquisition of additional data or resources to support the citation generation process.
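The third possibility is roughly what retrieval-based interfaces such as perplexity.ai appear to do. A toy sketch of the idea, using word overlap as a stand-in for a real relevance model (the data shapes here are assumptions, not any actual system):

    def attach_citations(answer, references, k=2):
        # references: list of (citation_string, source_text) pairs.
        answer_words = set(answer.lower().split())
        def overlap(ref):
            _, text = ref
            # Crude relevance score: shared word count with the answer.
            return len(answer_words & set(text.lower().split()))
        best = sorted(references, key=overlap, reverse=True)[:k]
        return answer + "\n\nSources:\n" + "\n".join(c for c, _ in best)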
I used ChatGPT last week to accelerate a “DevOps” project that I had. It involved writing a bunch of relatively simple Python scripts that transformed source JSON to specifically formatted yaml.
I just told it my input and my desired output and it wrote the code perfectly. Then I kept refining the script with more specific instructions.
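The core of such a script is genuinely small; a sketch assuming PyYAML, with hypothetical input and output field names:

    import json
    import yaml  # PyYAML

    def json_to_yaml(src_path, dest_path):
        with open(src_path) as f:
            data = json.load(f)
        # Reshape into the (hypothetical) target format before dumping.
        doc = {
            "name": data.get("name", "unnamed"),
            "env": [{"key": k, "value": v}
                    for k, v in data.get("env", {}).items()],
        }
        with open(dest_path, "w") as f:
            yaml.dump(doc, f, default_flow_style=False, sort_keys=False)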
I also needed to document some scripts that I previously wrote. I just copied and pasted each script into ChatGPT, asked what it does, and used that for my README.
I even used it to create a simple FAQ for my scripts and asked "Why would you use it?" The answer to why you would use my scripts jibed with reality.
Another use case was that I was trying to get a friend interview ready for an eventual AWS tech support position. I just sent him a series of interview questions that I would ask at an interview and told him to put those questions in ChatGPT (I had already vetted a few).
I would ask questions like "compare and contrast X and Y", "when would you use X?", and "how can you save money with Y?"
The question is not whether AI will generate the right answers, but whether it will generate better answers than a search engine. Given that search results are dominated by ad-supported websites, maybe AI results that are 99% accurate will be better than whatever comes from whoever spends the most on AdWords.
It's 2045, the Sino-AI war has devastated the climate and huge dust clouds cover the earth. Human-AI hybrids are forming a new government. A new deadlier virus engineered in a lab is set to wipe out humanity. People are still adding "reddit" to their search terms.
All of these criticisms fail to address a big elephantine question: Are these hurdles insurmountable?
If ChatGPT's model fails to differentiate or understand the question of credibility, how tough would it be to build classifiers that could do a 75% decent job?
If AI presentation of knowledge is imperfect - well, fine. But is human presentation or selection perfect for that matter? Criticising a thing for imperfection is not a viable take.
If ChatGPT's knowledge is limited, how tough would it be to plug it into larger and larger datasets, including the live internet itself?
It's like there's a wall between the people who see the potential of this and the people who feel so threatened by it that they refuse to look over the wall.
When ChatGPT came out I used it to do 100 hours worth of work in 10 hours or so. But it still felt like work. And for more intricate use cases it is just easier to do traditional research.
We’re safe for a while still. And when I say we I mean writers.
> I used it to do 100 hours worth of work in 10 hours or so
That's still pretty impressive, regardless of what you estimate your hourly rate at.
I've had a go at using it for some (tedious non-fiction) writing tasks that I have been completely unenthusiastic about, but its style is so bland I would be embarrassed to put my name to it.
It does work ok to get the motor running, though - kind of like a targeted writing prompt.
Recent paper showing how LMs "struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the tail". In other words, simply scaling LLMs will not result in replacing Google as a search engine. Approaches like theirs, i.e. retrieval augmentation, are needed.
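Retrieval augmentation, in its simplest form, just means fetching relevant passages at query time and prepending them to the prompt, so the model doesn't have to memorize the long tail. A minimal sketch; the retrieve and generate callables are placeholders, not any specific API:

    def answer_with_retrieval(question, retrieve, generate, k=3):
        # retrieve: question -> list of (source_url, passage) pairs
        # generate: prompt string -> model completion
        passages = retrieve(question)[:k]
        context = "\n\n".join(f"[{url}]\n{text}" for url, text in passages)
        prompt = ("Answer using only the passages below, citing their URLs.\n\n"
                  f"{context}\n\nQuestion: {question}\nAnswer:")
        return generate(prompt)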
It's like putting a dusty book back on the digital shelf: now you need to have a conversation to do a simple search query, hidden in marketing gibberish?
I'm still seeing the whole push in this product range hinge on making the search functions of most search engines more garbage than the actual chatbot.
As if marketing is steering this ship.
I'm sticking to my own knowledge and trusted independent sources for now; no need to have something make up an entire world for you, just to trap you in a cage of one's own making, for real this time.
maybe not search engines, but it has totally replaced stackoverflow and google for searching docs, or even wikipedia in some sense (e.g. make an alpha blend between two argb uint32 pixels)
google has been getting progressively worse for finding what i need; sometimes i even have to go to the second page, wasting time clicking on bullshit markov-chain SEO sites to find out how to parse a date in whatever new language i am learning
with chatgpt: "parse a date from string in the v programming language". try googling that
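That alpha-blend example is a nice test case, since the bit-twiddling is easy to get wrong. A straightforward (unoptimized) Python version of source-over blending for 0xAARRGGBB pixels:

    def blend_argb(src, dst):
        # Unpack 8-bit channels from 0xAARRGGBB integers.
        sa, sr, sg, sb = (src >> 24) & 0xFF, (src >> 16) & 0xFF, (src >> 8) & 0xFF, src & 0xFF
        da, dr, dg, db = (dst >> 24) & 0xFF, (dst >> 16) & 0xFF, (dst >> 8) & 0xFF, dst & 0xFF
        # Source-over: out = src + dst * (1 - src_alpha), per channel.
        a = sa + da * (255 - sa) // 255
        r = (sr * sa + dr * da * (255 - sa) // 255) // max(a, 1)
        g = (sg * sa + dg * da * (255 - sa) // 255) // max(a, 1)
        b = (sb * sa + db * da * (255 - sa) // 255) // max(a, 1)
        return (a << 24) | (r << 16) | (g << 8) | b

    # e.g. half-transparent red over opaque blue:
    print(hex(blend_argb(0x80FF0000, 0xFF0000FF)))  # -> 0xff80007f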
I feel like most people are trying to give personality to AI chatbots. From what I have read, ChatGPT is good at carrying conversations and doing text-based work. I would focus on feeding it curated info and making its answers as factually correct as possible, plus some way to know the sources it picked its answers up from. Companies and people are going to do that anyway in the coming future. We may see chatbots that are good in a certain field and whose results can be trusted.
I compare it all to using Github Copilot in my daily work. Before, I needed to Google many problems to find answers, curate the answers, scroll down websites, read downvoted comments on StackOverflow, copy/paste code, refactor it, turn code into TypeScript, etc.
Now, I just tell Copilot what I need to do, and I simply curate multiple solutions.
ChatGPT also understands what I need to do and changes things based on human instructions. It gets things right much more quickly.
If I understand correctly, the ChatGPT training is a continuation of the Copilot training, which is a continuation of the GPT-3 training. It makes sense that they exhibit similar behaviour.
AI chatbots are (or will be) a replacement for the user interface of search engines.
They'll still be backed by search engines.
(And this isn't news; Google has been saying this for well over a decade by this point, and Google, Amazon, and others have been doing it with simpler chatbots, primarily for voice assistants, for some time; LLMs are a big step forward for this, though I suspect the per-request resource usage is far higher.)
ChatGPT has its uses, but it can be dangerous: The other day it told me that the legal limit for Plutonium in drinking water is 0.1 micrograms per liter.
Awful article, probably has a lot of upvotes because people agree with the title.
> But the systems do not have any understanding of what they are producing, any communicative intent, any model of the world, or any ability to be accountable for the truth of what they are saying.
Same is true for search engines. (LLMs could include links to the relevant source material too, if that counts as accountability.)
They are not; however, the search engine's function will now be more specific. Do you want to discover sources, or do you just need information?
Keep in mind that AIs will infest (and have already infested) the search engines, creating websites ad hoc with the content you search for, among many other tricks. If there is money in it, AIs will take it over, ultimately for the benefit of the user.
The article does not show up without JS, so I'll answer the title.
I agree. In practice, you'll likely need to double check what it says using a search engine anyway. As long as the Chatbot does not provide the sources for whatever it states, I won't trust it. There's no reason to. Unless it produces pure logic that can be checked using reasoning.
… as I have in real life from people, unless they are experts in the area / can be trusted to varying degrees for some reason, or from anything on the internet, including Wikipedia. AI-based stuff hasn't changed this fundamental principle.
ChatGPT is trained on Wikipedia (which is mostly fine, though you still need to check the sources!), plus random things where people could have stated random stuff, perhaps jokingly or sarcastically. So the training set is already not trustworthy. And even on a trustworthy set, the process is so opaque that we don't know what plausible-but-false rehash it's capable of outputting. Actually, we know that it produces what have been called "hallucinations", so…
It's easy to get lazy about double-checking, given the overly confident tone and flawless writing of the chatbot. It's probably a good way to explore, but it can't be fully trusted yet. Even for exploring, I'm afraid of being influenced by reading a false piece of information written perfectly, even when I know I should double-check it.
However, the day the chatbot provides the sources, it might change a lot of things.
Right now I find ChatGPT excellent for certain technical topics I'm familiar with but whose details I can't recall. It saves me the time of looking for an example on stackoverflow, tutorial blogs, or the official documentation, and gives me just the piece I want: things like syntax, which libraries/modules to use, etc.
I think this article misses the point. I don’t imagine ChatGPT as is to be a replacement for a search engine. I do however see the interface of talking to an AI in a REPL sort of way, where the context is carried through the conversation as a massive improvement with tons of potential that is yet to be realized.
Not sure if you mean talking literally, but imo offices are noisy enough now. I'd hate to work in an office full of people like me talking to an AI all day to get things done. I guess it would be like working in a call center.
I'm guessing there will have to be changes in communication methods or workplace setups, or we'll continue to type?
Not yet a 1:1 replacement, but AI chatbot agents capable of interpretable chain-of-thought reasoning, which conduct searches to fill gaps in knowledge and check their sources, will be highly disruptive to legacy search, and will supplement and even replace legacy search engines in many use cases.
An important point about the authors of this post: they are very biased (at least Bender), because they are academics who have been fighting large language models from industry for years now. The main reason is that these models make all their research obsolete, so they fight to keep their relevancy.
It is incredible that we as humanity are about to put our information search in the hands of a hallucinatory text generator, without certainty about (i) how resilient the whole thing is against spam/propaganda, and (ii) what happens to people unable to tell the difference.
No shit, people are replacing product support with it, not search engines...
Also, OP fucked up, redacting the title from sense into nonsense.
But it would be interesting to see something ChatGPT-like trained on, say, the entire bibliography of some fictional universe: a wh40k chatbot that knows all the lore!
There hasn’t been a decent search engine in the sense that one could craft targeted queries since Altavista. Google has been a chatbot since its inception. The competition is just showing that it’s no longer a very good one.
ChatGPT still mines already-existing information on the web. It kinda summarizes SERP results, comparable to looking through the first 10 results. I don't think it will compete with Google search in any way.
ChatGPT writes great first drafts, but needs a good and knowledgeable editor on the subject to create valid factual documents. Likewise, Google finds great pages with facts, but lots with untruths too.
Before ChatGPT's December 15th nerf update I was starting to use it in place of Google for technical questions. Something like ChatGPT will one day smoke what we consider search engines to be today.
ChatGPT: I apologize if my previous response was incorrect. 123 times 456 is actually equal to 56088. Please let me know if you have any other questions.
They're not a replacement but they will be generating so much credible indexable garbage that they will reduce the signal to noise ratio to the point that they are useless.
Mods need to edit the headline of this to match the title of the page. Nothing stating or implying this headline appears in the article, and it feels like an attempt to troll HN.
No, they won't replace search engines, but they could weaponize the generation of SEO content spam to the point where search engines become overwhelmed and irrelevant.
But once people stop using general-purpose search engines because there's so much content spam that they're unable to distinguish it from genuine material, there's no further motivation for anyone to keep the content-spam bots going, at least until the search engines' own SEO-detection AI catches up. Some sort of arms race is likely inevitable, but I'm not convinced GPT-based tech is going to especially favour one side over the other.
I am substituting ChatGPT for Google whenever I have a question related to software development. There are many types of queries where ChatGPT is already the better choice.
This definitely is a threat to Google and to web search as we know it.
A lot of the discussion today is around the natural-language output of ChatGPT. However, if that presentation were changed into a list of relevant facts, evidence, counterpoints, etc. (the material from which ChatGPT generates its natural-language response), we would get an AI-powered search devoid of blog spam and SEO crap. Assuming advertising and commercial concerns don't corrupt the results, this is quite encouraging.
I don’t think they do. At least not in a safe way. The AI doesn’t understand meaning or the underlying material and concepts. I’m sure filters will be put into a pipeline but it will never be as good as letting the human sniff out the correct answers.
Today I asked ChatGPT what Einstein's favourite food was. It gave a reasonable answer about him liking simple foods. The worrying thing is that I was satisfied with the result. It was plausible; it could well be true. There is a good chance that this kind of AI provides the kind of response that a large percentage of the public finds convincing enough not to bother with additional research.
The universal truth for people is often different from the actual truth, and an AI has to return the convenient truth.
Example: people are persuaded that women can do two things at the same time and men cannot. Despite countless counterexamples, and despite the original study placing everyone on a bell curve with only a 6% difference in execution time (without checking the quality of the results: "254 plus 786 equals 126, quick mafs!!!"), the claim is blatantly false. But any search engine that returned that it is false would get itself rejected by humans.
This reminds me of an article about how phones killed Ripley's Believe It or Not and the Guinness Book of Records, both of which are kind of like "solidified" bar conversation. Now if someone at the bar asks about Einstein's favorite foods, people just whip out phones and find an answer (it doesn't even have to be correct) and the conversation ends. Before, you could go all night discussing it from attempts at first principles; you might be entirely wrong, but you had fun.
"Will never be" is an overly confident statement for any technical field. Predicting the future is hard and predicting the future with an un-bounded time horizon is pretty near impossible.
In my view, this is an example of a problem with AI ethicists: they tend to naively overestimate the status quo. In this particular case, the premise seems to be that people obtain reliable information from traditional search engines, hence chatbots, being unreliable, cannot replace them.
The truth: many uses of a search engine aren't even things where reliability matters much (what's the weather tomorrow? what's artist X's latest song?). For those where it does matter, results are unreliable as well, because the web is full of all kinds of crap and any attempt at sorting the wheat from the chaff failed many years ago. And sure, users can try to assess the reliability of the sources. Good luck with that (not you specifically, well-educated and tech-savvy HN reader, but at a population level). All those extremist propaganda sites pushing climate denialism, antivax nonsense, etc. have readers who think they are perfectly reliable.
Of course there are situations where checking the sources works (e.g. a scientist working on a literature review)... which, of course, are the situations where those people aren't going to trust ChatGPT anyway.
In summary, I think this kind of criticism widely misses the mark. The problem with this and other current issues is the corporate interests feeding us all kinds of crap, not whether the way of feeding it is more convenient or less.
This is the most terrifyingly stupid comment thread I've read on HN lately. What's unique about it though is the thoroughness of the bad quality comments. It's bad all the way down. I think I'm skipping my HN scroll this evening and just going to bed.
Well, for me they have already replaced my first preferences. I used to google or go to Stack Overflow to resolve issues and would need hours to come to conclusions. Now with ChatGPT I simply start my research from there.
Not sure if any OpenAI people are reading this thread, but they need to figure out a way to add new content to the model, or have people submit their content. I got some answers that were not wrong, but out of date.
Why doesn't Google just incorporate it into their search results, and then, based on what people click, weight it more or less for each type of search? Kind of like an ensemble search result.
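As a rough illustration of that click-based weighting (all names here are made up; this is not any real Google system):

    # Hypothetical sketch: blend a chatbot answer box with classic results,
    # weighting the chatbot up or down per query category based on clicks.
    from collections import defaultdict

    # Laplace-smoothed click counts per query category.
    clicks = defaultdict(lambda: {"chatbot": 1, "links": 1})

    def record_click(category: str, source: str) -> None:
        clicks[category][source] += 1

    def chatbot_weight(category: str) -> float:
        """Fraction of clicks the chatbot box has earned in this category."""
        c = clicks[category]
        return c["chatbot"] / (c["chatbot"] + c["links"])

    # If users mostly click plain links for navigational queries, the
    # chatbot box gets demoted there over time:
    record_click("navigational", "links")
    record_click("how-to", "chatbot")
    print(chatbot_weight("navigational"))  # ~0.33
    print(chatbot_weight("how-to"))        # ~0.67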
I tried baking cookies from a ChatGPT recipe. Everything looked right glancing at the response, so I dove in and started making them. About two-thirds of the way through, it became obvious the steps were correct-looking gibberish.
Today's chatbots are complex language models based on the language that they were trained on.
They are very clever at transforming data into language, or one language into another... but they work on language and not knowledge.
It would be interesting to see a language model identify the necessary resources to fulfill a query, with the results then fed back into it to transform the dry data back into language. For example, given the prompt:
Identify if the question is looking for weather, sports scores, unit conversion, general knowledge, or business information for the following requests:
1. "What is the high temperature tomorrow?"
2. "Did the packers win last night?"
3. "What time does Walmart open?"
4. "What will it rain next week?"
5. "Where can I buy an umbrella?"
6. "How many feet are in a mile?"
Question classification:
To which GPT responded:
1. Weather
2. Sports Scores
3. Business Information
4. Weather
5. Business Information
6. Unit Conversion
Using this, the query should then be sent to a system that knows how to run those kinds of queries and return the data.
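A rough sketch of that classify-then-route step (classify() stands in for the GPT prompt above, and the handlers stand in for structured backends such as a weather or places API; none of this is a real integration):

    # Hypothetical sketch: route a classified query to a structured backend.

    def classify(question: str) -> str:
        """Stand-in for the LLM call above that returns a category label."""
        raise NotImplementedError

    HANDLERS = {
        "weather": lambda q: "data from a weather API",
        "sports scores": lambda q: "data from a sports API",
        "business information": lambda q: "data from a places API",
        "unit conversion": lambda q: "data from a calculator",
        "general knowledge": lambda q: "data from a knowledge base",
    }

    def route(question: str) -> str:
        category = classify(question).strip().lower()
        handler = HANDLERS.get(category)
        if handler is None:
            return "Sorry, I can't help with that."
        data = handler(question)
        # A second LLM pass could turn this dry data back into
        # conversational language, as suggested above.
        return data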
However, that integration isn't the job of today's chatbots - or at least not something for OpenAI to try to do across all possible domains. Those queries also cost money and become harder to bill for.
Yeah, this is a weird take. Chatbots are improving rapidly and are increasingly used in real applications. With more targeted applications, like a "search engine" for coding, I could see these being very powerful. It's weird to write an article discounting that based on how the current ChatGPT application, a research beta, operates, and then throwing in some fear-based moralizing.
ChatGPT works very well as a tech demo, but nothing like it will ever work at scale.
There simply is no business model for something as useful as ChatGPT, and there never will be. Let's assume for a minute that OpenAI manages to improve its models to the point that the accuracy problems are solved.
There will probably be a business case for the chatbot Google will turn its search engine into, but it will be as far removed from ChatGPT as today's Google is from the good old Google of PageRank fame from 1999, which would show you exactly what you asked for, often enough to be considered genuinely useful.
The Google search engine from 1999 was a bit like ChatGPT is today: The demonstration of a new way of doing things that seemed, at the time, to be the way of the future.
Except it wasn't. PageRank is an awesome example of how computer science nerds can disrupt a market with superior math, but it didn't change the basic laws of capitalism. That's why it succeeded in turning Google into one of the letters of FAANG, yet failed at being sustainable. A Google-killing descendant of ChatGPT would have a hard time turning a profit without also turning into the same sort of thing Google is today: a convenient way of giving people ads instead of answers.
Here is my prediction:
1) ChatGPT and similar "AI" chatbots will never be more than tech demos
2) The Google of the future will have some shortish chatbot copypasta for search queries that resemble essay-writing prompts
3) Google will sell ads for highly specialized and expensive services aimed at the few people who need to copy-paste longer essays (which may or may not bear a sufficient relation to reality) and are able to pay for them
4) The end result will be virtually indistinguishable from the Google of today, because for every honest chatbot-as-a-service there will be an uncountable number of dishonest vendors of "free" auto-generated propaganda and spam, copy-pasted by school children all over the world, because that is what Google will show below its own copypasta (which is just too short for this week's essay, because the teacher knows about Google, too). And on HN, people will long for the good old times of 2022, when ChatGPT was, briefly, surprisingly awesome.
5) Your crush won't believe you wrote that touching love poem all on your own.
Question: Will AI chatbots replace search engines?
ChatGPT Answer: It is unlikely that AI chatbots will completely replace search engines. While chatbots are useful for answering specific questions and providing information on a wide range of topics, search engines are designed to help users find specific information that is widely available on the internet.
Search engines use algorithms to crawl and index the web, making it easy for users to find relevant information by using keywords and phrases. Chatbots, on the other hand, use natural language processing (NLP) techniques to understand and respond to user requests in a conversational manner.
In some cases, chatbots may be able to provide users with the information they need without requiring them to use a search engine. For example, a chatbot could provide users with personalized recommendations or help them find specific products or services based on their preferences. However, search engines will likely continue to play a crucial role in helping users find information on the internet.
Question: Imagine a scenario where they replace them almost completely; what innovation would be there?
Answer: If AI chatbots were to replace search engines almost completely, it is likely that the chatbots would need to be significantly more advanced than they are currently. In order to replace search engines, chatbots would need to be able to effectively index and organize the vast amount of information available on the internet, and be able to provide users with accurate and relevant results in response to their queries.
One potential innovation that could enable chatbots to replace search engines is the development of more advanced natural language processing (NLP) techniques. Chatbots would need to be able to understand and interpret complex queries and requests, and be able to accurately provide relevant information in response.
Another potential innovation is the development of more sophisticated machine learning algorithms that can help chatbots understand the context and intent behind user queries, and provide more relevant and accurate responses. This could involve the use of techniques such as deep learning and neural networks, which are designed to enable machines to learn and adapt over time.
Overall, it is likely that the replacement of search engines by chatbots would require significant technological advancements in the field of artificial intelligence and natural language processing.
Question: Do you think technologies such as ChatGPT will confuse search engines to know if web pages are written by humans or computers?
Answer: It is possible that the development of technologies such as ChatGPT (short for Chat Generative Pre-training Transformer) could make it more difficult for search engines to determine whether web pages are written by humans or computers. ChatGPT is a large language model that has been trained to generate human-like text by analyzing a large dataset of online conversations. As a result, it is capable of producing text that is difficult to distinguish from text written by humans.
However, it is worth noting that search engines have become increasingly sophisticated over the years, and they have a number of techniques at their disposal for determining the quality and relevance of web pages. These techniques include analyzing the structure and content of web pages, evaluating the credibility and authority of the sources that are cited on the pages, and looking for patterns of suspicious or spammy behavior.
It is therefore likely that search engines will continue to be able to identify and prioritize web pages written by humans over those written by computers, even as the capabilities of technologies like ChatGPT improve.
They totally are; not 100 percent, but mostly working already. The risk of misinformation will plummet further, and I'm already living with it, e.g. for programming questions.
Assuming this refers to https://chat.openai.com/, it's rather difficult to try it out. After creating an account it demanded a phone number. Of course, I will not supply my own, and temporary ones I tried were either recognised as such or rejected as having already been used.
- Today's search engines will give you links to millions of documents
- ChatGPT, if asked correctly, will instead generate one perfect document based on millions of those documents
To me, that's a clear evolution of the search engine, especially with all the SEO & ad spam that's plaguing Google and others currently.
I wasn't ready to pay a monthly fee for an ad-free Google. I am ready to pay for something like ChatGPT.
Google has an issue on their hands and is probably working overtime to lobby the threat of ChatGPT away.