The site sampled in the above reply [1] appears to be a nonsense content farm, probably generated with GPT-2 (it is repetitive, and the date on the Mars post is 2021, though I know blog post dates can be set to anything).
If true, it would mean that Bing has been serving up GPT-generated nonsense for some time, and now they’re going to add another layer of nonsense on top of that.
As someone said in the replies, “it’s just bullshit all the way down”.
I agree with others who have said these companies have no (stated, tested) plan to ensure these systems don’t spill lies and nonsense. It’s possible that will be harmless, but this is doubtful, and either way I do not feel good about these companies rushing to push out a half-baked product.
To my amateur understanding, LLMs are like the language center of our brain, in that they can take abstract, multidimensional ideas and turn them into words. But we have other portions of our brain that actually handle logic and memory and produce the ideas to verbalize. It seems like LLMs are missing this piece. Crude analogy, I know, but I think LLMs could be fantastic when paired with a more complex system, and the rush to deploy them as-is may be counterproductive.
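To illustrate what I mean by pairing (just a toy sketch, not how any of these products actually work): the model only verbalizes, while facts live in ordinary code it cannot override. The llm_paraphrase function and the fact store below are hypothetical placeholders.

    # Toy sketch: the "language layer" only phrases answers; facts come
    # from a separate, trusted store the model never gets to invent.
    FACTS = {"population of Mars": 0}  # hypothetical curated store

    def llm_paraphrase(fact_name, value):
        # Stand-in for a real text-generation call; here it just templates.
        return f"As of today, the {fact_name} is {value}."

    def answer(question):
        # A real system would map questions to facts far more robustly;
        # this keyword match is only for illustration.
        if "mars" in question.lower() and "population" in question.lower():
            return llm_paraphrase("population of Mars", FACTS["population of Mars"])
        return "I don't have a trusted source for that."

    print(answer("What is the human population of Mars?"))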
I asked Bing Chat the same question and the response was much better:
> As of today, the human population of Mars stands at zero. There are no humans living on the surface of Mars, only rovers and landers. Mars is the fourth planet from the Sun and the second smallest planet in the Solar System. It has a very thin atmosphere and a cold and dry climate. Mars is governed by the Martian Congressional Republic in the fictional universe of The Expanse, but not in reality. Some websites may give false or misleading information about the population of Mars, such as 4.8 billion or 2.5 billion people, but these are not based on any scientific evidence or credible sources.
Right, I meant to suggest that this was an old data-sampling algorithm. My point is that if this is already the low quality they were serving before, it is hard to see how adding a hallucinating algorithm into the mix is going to help anything.
I wonder if the main problem holding AI back will be that there is so much nonsense on the internet that we’ll struggle to make useful bots, since we can no longer implicitly trust that most information is true or at least useful.
I would say this is what we get for asking non-expert data sources for information we want to present as authoritative.
Let's say we went back to before the time of the internet and asked 100,000 random individuals for factual information on random subjects. You'd have a corpus of facts, but you'd also have tons of old wives' tales and information that is just wrong.
The internet democratized posting information, but I would say it also did the same for stupidity. Random sites, Reddit posts, and stuff we read on Hacker News don't have to have anything at all to do with the truth.
Maybe pushing models to weight some factual information bases more heavily will help, but I don't see how AI in its current form will come off any better than a person who reads tons of bad information and buys into it.
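One crude way to read "weighted heavier" (purely a sketch; the source names and trust scores below are made up) is to oversample trusted sources when assembling training data:

    # Hypothetical sketch: sample training documents in proportion to a
    # per-source trust score, so content farms contribute less.
    import random

    documents = [
        {"text": "Mars has no permanent human population.", "source": "encyclopedia"},
        {"text": "4.8 billion people live on Mars.", "source": "content_farm"},
    ]
    trust = {"encyclopedia": 10.0, "content_farm": 0.1}  # made-up weights

    weights = [trust[d["source"]] for d in documents]
    batch = random.choices(documents, weights=weights, k=5)
    print([d["source"] for d in batch])  # overwhelmingly "encyclopedia"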
The problem is we could be going backwards. ChatGPT is working off a pre-LLM internet, and it works surprisingly well. If we scrape the internet again in five years, could we even get a model as good as the one today?
I think the last paragraph makes a lot of sense. It does seem "true" that some kind of reasoning capability emerges as LLMs get bigger, which makes those LLMs quite useful and blew a lot of people's minds at first. But I think that, fundamentally, the training goal of LLMs--guessing what the next word should be--pushes the model toward being a kind of reasonable-sounding nonsense generator, and the reasoning capability emerges because it helps the model make stuff up. Therefore, we should be cautious about the results generated by these LLMs. They might be reasonable, but making up the next word is their real top priority.
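A toy illustration of that training goal (a made-up bigram "model", nothing like a real LLM): the generator is rewarded only for producing a plausible next word, with no notion of which continuation is true.

    # Toy next-word predictor: picks whatever most often followed the
    # previous word in its training text, true or not.
    from collections import Counter, defaultdict

    corpus = ("the population of mars is 4.8 billion . "
              "the population of mars is zero .").split()
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigrams[prev][nxt] += 1

    def next_word(prev):
        # Most frequent continuation wins; plausibility, not truth.
        return bigrams[prev].most_common(1)[0][0]

    words = ["the", "population", "of", "mars", "is"]
    for _ in range(3):
        words.append(next_word(words[-1]))
    print(" ".join(words))  # continues with whichever phrasing it saw first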
>> It’s possible that will be harmless but this is doubtful, and either way I do not feel good about these companies rushing to push out a half baked product.
It might be half-baked, but this is the apex of the hype cycle. Now is the time to strike while the iron is hot, Hot, HOT$$$.
I hope I'll be alive next year to see what happens when the dust settles.
Asking Bing “what is the human population on Mars”:
https://mastodon.social/@jhpot/109859745864083061
And then this reply was notable as well:
https://neurodifferent.me/@ZoDoneRightNow/109860796225392633
[1] https://tsam.net/what-is-the-population-of-mars/