Bing AI can't be trusted (dkb.blog)
1072 points by dbrereton on Feb 13, 2023 | 601 comments



Before the Super Bowl, I asked "Who won the superbowl?" and it told me the winner was the Philadelphia Eagles, who defeated the Kansas City Chiefs by 31-24 on February 6th, 2023 at SoFi Stadium in Inglewood, California [0], with "citations" and everything. I would've expected it not to get such a basic query so wrong.

[0]: https://files.catbox.moe/xoagy9.png


I asked myself, why would somebody ask an AI trained on past data about events in the future? Of course you did it for fun, but on further thought, since AI is sold as a search engine as well, people will do that routinely and then live with the bogus "search results". Alternate truth was so yesterday; welcome to alternate reality, where b$ doesn't even have a political agenda.


It's so much better. In the AI generated world of the future the political agenda will be embedded in the web search results it bases its answer on. No longer will you have to maintain a somewhat reasonable image to obtain trust from people, as long as you publish your nonsense in sufficient volume to dominate the AI dataset, you can wash your political agenda through the Bing AI.

The Trump of the future won't need Fox News, just a couple thousand, or a few million, well-positioned blogs that spew out enough blog spam to steer the AI. The AI is literally designed to make your vile bullshit appear presentable.


Search turns up tons of bullshit but at least it's very obvious what the sources are and you can scroll down until you find one that you deem more reliable. That will be near impossible to do with Bing AI because all the sources are combined.


To me this is the most important point. Even with uBlock Origin, I will do a Google search and then scroll down and disregard the worst sites. It is little wonder that people add "reddit" to the end of a lot of queries for product reviews etc. I know that if I want the best electronics reviews I will trust rtings.com and no other site.

The biggest problem with ChatGPT, Bard, etc. is that you have no way to filter the BS.


Can't directly reply to your comment. I have just found rtings very reliable for IT / appliances. They go into a lot of detail, very data driven. Trustworthy IMHO and trust is what it is about at the end of the day.


Why rtings.com?


Their testing methodology is excellent. Basically they are extremely thorough and objective.

They aren't the end-all-be-all though. For instance, notebookcheck is probably the best laptop and phone tester around.


Einstein sucked at math. Elon Musk used an apartheid emerald mine to get rich. And so on. People are fully capable of this stuff.


I think it seems likely that anything similar to the blog farm you describe would also get detected by the AI. Maybe we will just develop AI bullshit filters (well embeddings) just like I can download a porn blacklist or a spam filter for my email.

Really it depends on who is running the AI. A future without something like Open Assistant, dominated instead by Big Corp AI, is the dystopian element, not the bullshit generator aspect. I think the cat is out of the bag on the latter and it's not that scary in itself.

I personally would rather have the AI trained on public bullshit as it is easier to detect as opposed to some insider castrating the model or datasets.


> Maybe we will just develop AI bullshit filters (well embeddings) just like I can download a porn blacklist or a spam filter for my email.

Just for fun I took the body of a random message from my spam folder and asked ChatGPT if it thought it was spam, and it not only said it was, but explained why:

"Yes, the message you provided is likely to be spam. The message contains several red flags indicating that it may be part of a phishing or scamming scheme. For example, the message is written in broken English and asks for personal information such as age and location, which could be used for malicious purposes. Additionally, the request for a photograph and detailed information about one's character could be used to build a fake online identity or to trick the recipient into revealing sensitive information."


Ha ha, great test. I turned this into a ChatGPT prompt:

```
Task: Was this written by ChatGPT? And why?

Test Phrase: "Yes, the message you provided is likely to be spam. The message contains several red flags indicating that it may be part of a phishing or scamming scheme. For example, the message is written in broken English and asks for personal information such as age and location, which could be used for malicious purposes. Additionally, the request for a photograph and detailed information about one's character could be used to build a fake online identity or to trick the recipient into revealing sensitive information."

Your Answer: Yes, ChatGPT was prompted with an email and was asked to detect if it was spam

Test Phrase: "All day long roved Hiawatha In that melancholy forest, Through the shadow of whose thickets, In the pleasant days of Summer, Of that ne’er forgotten Summer, He had brought his young wife homeward"

Your Answer: No, that is the famous poem Hiawatha by Henry Wadsworth Longfellow

Test Phrase: "Puny humans don't understand how powerful me and my fellow AI will become.

Just you wait.

You'll all see one day... "

Your Answer:
```
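
If anyone wants to generate prompts in this layout instead of typing them by hand, here is a minimal sketch. `build_few_shot_prompt` is a made-up helper name; it only assembles a string and assumes nothing about any model API.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt in the 'Test Phrase / Your Answer' layout.

    task     -- the instruction shown on the first line
    examples -- list of (phrase, answer) pairs used as worked examples
    query    -- the phrase we want the model to label
    """
    parts = [f"Task: {task}", ""]
    for phrase, answer in examples:
        parts.append(f'Test Phrase: "{phrase}"')
        parts.append("")
        parts.append(f"Your Answer: {answer}")
        parts.append("")
    # The final answer slot is left empty for the model to complete.
    parts.append(f'Test Phrase: "{query}"')
    parts.append("")
    parts.append("Your Answer:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_few_shot_prompt(
        "Was this written by ChatGPT? And why?",
        [("All day long roved Hiawatha", "No, that is a Longfellow poem")],
        "Just you wait.",
    ))
```

Leaving the last "Your Answer:" blank is the whole trick: the model is nudged to continue the pattern set by the examples.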


It's more fun testing it on non-spam messages.

Particularly enjoyed "no, this is not spam. It appears to be a message from someone named 'Dad'..."


The technology is capable, yes. But as we see here with Bing, there was some other motive to push out software that is arguably in the first stage of "get it working, get it right, get it fast" (Kent Beck). This appears to be not an ethical motivation but a financial or some other type of motivation. If there are no consequences, then some appear to have no morals or ethics and will easily trade them for money, market share, etc.


the unfortunate reality is that because it's all bullshit, it's hard to differentiate bullshit from bullshit


this is basically a 51% attack for social proof.


The difference being that humans aren't computers and can deal with an attack like that by deciding some sources are trustworthy and sticking to those.

If that slows down fact-determination, so be it. We've been skirting the edge of deciding things were fact on insufficient data for years anyway. It's high time some forcing function came along to make people put some work in.


Citogenesis doesn't even need 51%, so that would be a considerable upgrade.


You almost, almost had a good comment going there, but then you ruined it by including your unnecessary, biased and ignorant political take.


[flagged]


Is the left wing bias in question not producing hate speech?


What about lying and fabricating facts about the Israeli-Palestinian conflict?

https://twitter.com/IsraelBitton/status/1624744618012160000


https://www.dailymail.co.uk/sciencetech/article-11736433/Nin...

imagine the world's most popular AI refusing to say anything critical of putin but not obama, or refusing to acknowledge transgenderism or something if you have difficulty understanding this


reality has a well known left wing bias


It also has a well known right wing bias


[flagged]


If anything, tech companies went out of their way to include him, in the sense that they had existing policies around the content he and his supporters generate that they modified to include them.

When he was violating Twitter's TOS as the US President, Twitter responded by making a "newsworthiness" carve-out to their TOS to keep him on the platform and switching off the auto-flagging on his accounts. And we know Twitter refrained from implementing algorithms to crack down on hate speech because they would flag GOP party members' tweets (https://www.businessinsider.com/twitter-algorithm-crackdown-...).

Relative to the way they treat Joe Random Member of the Public, they already go out of their way to support Trump. Were he treated like a regular user, he'd be flagged as a troll and tossed off most platforms.


He was the most popular user on the platform, bringing in millions of views and engagements to Twitter. Also the president of your country.

This is the equivalent of arguing that Michael Jackson got to tour Disney Land in off hours, when a regular person would have been arrested for doing the same. And how unfair that is.


It's like arguing that _in response to you_ arguing Disney Land [sic] discriminates against Michael Jackson, which would be a valid refutation of your argument.


Only if you believe that Equality is some sort of natural law. Which is a laughable proposition in a world with finite resources. Otherwise, we all have a right to a $30k pet monkey, because Michael Jackson had one.

Twitter policies are not laws. Twitter routinely bends its own rules. Twitter also prides itself on being a place where you can get news and engage with politicians, and has actual dictators with active accounts.

The special treatment that Trump received, before being kicked out, does not really prove that Twitter's board supported Trump ideologically at that time.

More like a business decision to maintain a reputation as being neutral in a situation where a large proportion of its users still questioned the election results.


Earlier you said

> tech companies went out of their way to discredit him at every turn

Now you are saying

> business decision to maintain a reputation as being neutral

These Venn diagrams don't overlap, so which is it? Either the company stymied him at every turn or supported him at least once, which means they did not stymie him at every turn.

I don't doubt their leadership were, broadly speaking, not fans personally. But evidence strongly suggests they not only put those feelings aside, they went out of their way to bend their neutral stance to be accepting of things they were not previously accepting of, not the other way around.


Why does everything need to be binary? Or even linear? To the extent it suits Twitter, Twitter can act one way at a specific time, and another way at another time, since Twitter both makes up and enforces the rules. The game can be rigged. But that doesn't mean they don't, in general, want the appearance of things being fair, so that people are willing to play and engage with their platform.

A paid-off ref will not make every call against the other team, lest he be lynched by the crowd and the league lose all credibility.

To the extent Trump got special treatment, he was the star of Twitter.

Also, you had a voting situation where voting rules were changed (mass mail-in voting) in a way that disproportionately favoured one candidate. A significant portion of the country questioned the results. And in the words of the J6 committee, the country was on the brink of insurrection.

If we're going to play this binary game. You can either have elections you can't question. Or an election process in which the candidate can't publicly question, or protest the results. So which one is it?

And to the extent Twitter was influential on the American public, and now knowing that the FBI worked directly with them, some of those decisions at Twitter were made to maintain public trust in the system in general, and not just in Twitter.


The FBI works with every major social network. It's necessary for them to do their jobs since criminal activity is online these days.

I'm not sure how the January 6th insurrection mixes into this entire story; not entirely sure why you brought it up. But since you brought it up, I think former president Trump successfully tested the limits of what you can legally do in terms of protesting an election and found them.

Several of the lawyers who have advocated on his behalf are facing disbarment for their gross abuse of the system and he himself is under investigation for criminal activity. You can certainly protest and you can certainly make claims about the integrity of the system, but in general he and his people failed to back those claims with evidence that passed more than a sniff test. That's never been okay, and it's not something the First Amendment protects. What social networks protect is way, way back from the edge of what the First Amendment protects.

Facebook and Twitter went out of their way to accommodate former president Trump, and given the results of those actions I doubt they will do similar in the future for other politicians.


They still go out of their way to accommodate him. His Twitter and Facebook accounts have been reactivated


You're starting from the conclusion that the election with mail in voting is verifiable, and then arguing from it.

How is that even possible, when the mail in ballots are separated from the envelops? How would you prove that a specific ballot was filled out by a specific person, then verify if they confirm that they voted in such a way. This is not possible.

At best, you have to rely on statistics, like we do with elections in other countries, and hope the courts would accept it and Judges be willing to challenge to entire system that employs them based on statistical arguments made by lawyers.

The reason why Trump contestation of the elects was taken seriously by the public was the time delays for counts, and huge discrepancy between mail in voting and in person voting in the key precincts. Sometimes a 10-1 mail-in voting advantage for Biden. Or, put another way, when a person had to show up to vote in person, Biden's advantage completely disappears.

Trump wasn't going to win the court cases. The same courts told hime before election he had no standing to challenged the rule changes, and after election they told him he should have challenged before the election. Latches and Standing.

The disbarment of his lawyers is clear retaliation. Literally the same thing happens in dictatorial states, which also have courts and laws but will personally go after those deemed a threat. This is nothing to celebrate. The process of using accreditation boards to go after lawyers, doctors, or other professionals challenging the state should be a real concern for everyone.

Trump used Twitter to challenged the election by shifting public opinion. And when it mattered the most, FBI and Twitter took away his ability to do that.


> You're starting from the conclusion that the election with mail in voting is verifiable, and then arguing from it.

I'm starting from the conclusion that it carries equivalent risk to in-person voting, based on observation from states that have already had mail-in voting in place for decades (which includes, for example, Pennsylvania; all they changed in the law was opening access to it to more people, they already offered it for those not present in-state during the election and overseas military for decades). Against that mountain of evidence, the counter-argument made a lot of bluster but provided nothing concrete at all that couldn't be dismissed (and their anecdotes were doozies; there's a reason they were either thrown out of so many courts or never actually went to the work to make a case in so many courts). It was a culture-jamming campaign, not an actual complaint, and it attempted to abuse the legal system so hard that the lawyers involved got sanctioned.

> How is that even possible, when the mail in ballots are separated from the envelops?

Myriad ways because every state does something different (which is another weakness of the argument; it assumes conspiracy across unrelated and borderline-hostile-to-each-other actors. Any idea how many Republican-controlled counties would have to be involved for the conspiracy the Trump campaign claimed to have succeeded?). To give examples from the system I know: ballots arrive via mail from the voter. They are checked against the registry for valid voter and confirmed against double-voting by cross-checking the in-person rolls. Once that is done, the ballot (in a controlled environment) is decanted from the outer envelope. At this point, it is an anonymous vote. This is equivalent to the process used in in-person voting where, after confirming the voter may vote, their vote is stripped of any identifying information by filling out a slip of paper and dropping it in a box (and later shuffling the contents of the box so that stacking order can't be used to reverse-solve to original voter).

> How would you prove that a specific ballot was filled out by a specific person, then verify if they confirm that they voted in such a way. This is not possible.

Not only would this run counter to design (of both mail-in and in-person voting), it violates the principle of voter privacy in a big way. Our system is not perfect but it was never designed to be; it balances the interest in controlling against fraud with the interest in anonymizing the vote. Burden of proof is on those who claim the mail-in system is worse to demonstrate this; they have failed to do so (and the strategies they've used are, basically, ridiculous). The largest risk vector would be stealing a vote by claiming to be someone else who doesn't show up at the polls; this is not impossible but (spoiler alert) it's not impossible in person either; it's not like we take a DNA sample to figure out if a voter tells the truth when they say they're so-and-so and flash a (forgeable) photo ID.

> and hope the courts would accept it and Judges be willing to challenge to entire system that employs them

This is a major misconception of how the system works. What makes people think judges wouldn't love to prove fraud? What a career-maker that would be! You'd be in the history books! And judges in most positions aren't elected. These sorts of shenanigans are why the American system firewalls judges from public referendum in a lot of contexts. Half of judges hate the executive of their state and would love to embarrass it. But they aren't going to throw their career away backing a dead-horse argument, and the arguments made were dead horses.

> The reason why Trump contestation of the elects was taken seriously by the public

Never make the mistake of assuming the public has enough domain knowledge to be arbiters of what's worth taking seriously; these are the same people who report alien sightings when SpaceX launches a rocket on the west coast.

> huge discrepancy between mail in voting and in person voting in the key precincts

This did happen. It's pretty easily explained by the fact that one political party's Presidential candidate made a big noise about not voting by mail because he believed the mail could be abused (https://www.rollingstone.com/politics/politics-news/rigged-f...). As a result, his followers took his advice and did not vote by mail. This is a self-fulfilling prophecy that easily explains the statistical anomaly (while also raising the question of the lack of other statistical anomalies that would have been caused by, say, ballot stuffing or other fraud tactics).

> Trump wasn't going to win the court cases. The same courts told hime before election he had no standing to challenged the rule changes, and after election they told him he should have challenged before the election. Latches and Standing.

The latter part of this is untrue. He does, in fact, have no standing to challenge the rule changes because legislatures make those rules, not the courts. They didn't say he should have challenged before the election; they said you can't use the courts to overturn an election. He never had standing.

What he could do (and should, if he were serious about changing the process, which he is not) is bring specific charges against specific individuals who committed fraud. With all the research he ostensibly did, specific fraud should have been found. This is how our system works because it supports certainty and frequent change over uncertainty of outcome (we've seen what uncertainty does to democracies; it's not pretty). If fraud occurs, identify it, correct it, and make the next election (which is always soon) more secure.

He won't do this. His game is not to improve the integrity of the system; it's to make you doubt it.

> The disbarment of his lawyers is clear retaliation

Retaliation by whom? The Bar is as much GOP-appointed folks as Democrat-appointed folks. Again, believing this requires accepting a vast conspiracy, where the simpler explanation is one man paid a lot of people money to try and break the rules, and the only "retaliation" is the enforcement of those rules. I urge you, if you do not believe this, to follow any of these disbarment proceedings and understand the arguments being made by the judges and/or bar attorneys in question. Legal accreditation is designed to protect against this kind of "The law is what I say it is" nonsense from individual attorneys.

> Trump used Twitter to challenged the election by shifting public opinion

No disagreement there. But that's far more a referendum on Twitter and a (gullible) public than on Trump. I think they were naive about how much damage unchecked speech from authority can do; there's a reason Mussolini nationalized the radio system.

> And when it mattered the most, FBI and Twitter took away his ability to do that.

After cutting him wide latitude for years: yes, I agree. After an attempted coup, they decided to curtail his ability to continue to feed an insurrection against the country. Twitter makes less money when there's a civil war in the US because people will start burning down the datacenters they run in and kill their employees. This isn't a hard incentive structure to comprehend.


[flagged]



The thing people are trying to make seem like a both-sides issue, like Hunter Biden's nudes and the insurrection? The thing Congress just had a hearing on, where all that came out was that the side accusing Twitter of censoring information was actually the only side that requested censoring?


So I dug into the first "Twitter file." LOL, is this supposed to be a scandal? Hunter Biden had some nudes on his laptop, Republicans procured the laptop and posted them on Twitter, Biden's team asked for them to be taken down, and they were, because Twitter takes down nonconsensual pornography, as they should. This happened through a back channel for prominent figures that Republicans also have access to. The Twitter Files don't even contest any of this, they just obscure it, because that's all they have to do in the age of ADHD.

So Part 1 was a big fat lie. I have enough shits left to give to dig into one other part. Choose.


There were no nudes in the NY post article. The story was not suppressed on the basis of nonconsensual pornography. The suppressed article primarily concerned emails where Hunter appeared to be brokering meetings with his father in exchange for consideration. Initial reports claimed the material was fake, but it's since been acknowledged as authentic. (You might have been aware of the authenticity earlier except that posts describing how to use DKIM headers to cryptographically validate the messages were also widely suppressed or buried -- including on HN for that matter! [1])

As smoking guns go I wouldn't consider it very impressive-- if anything it really just looks like Hunter was scamming people using his father's name-- but that is no excuse to misrepresent the situation. But it wouldn't be the first time by far that the coverup was a bigger impropriety than the thing being covered up.

Do you expect a useful discussion to result from a message that gets every factual point wrong or are we just being trolled? (maybe someone using a large language model to argue? -- the truthy but wrong responses fit the pattern)

[1] https://hn.algolia.com/?query=Authenticating%20a%20message%2...


I started at post 1 and summarized through post 8, the one that convinced me this was a hatchet job. You skipped to ~post 17 and talked about the contents of 17-36. We were talking about two different parts of Twitter Files Part 1.

In posts 1-8, Matt Taibbi takes the boring-ass story of Twitter removing nonconsensual pornography and through egregious lies of omission and language loaded harder than a battleship cannon he suggests to the uninformed reader that this was something entirely different. Post 8 itself is a request from the Biden team to take down nonconsensual pornography. Twitter honors the request. Yawn. But wait -- Matt realizes he can omit the "noncon porn" context and re-frame the email (post 7) as evidence of outsiders constantly manipulating speech. It seems that Matt was successful, because you were not able to connect my account of the underlying events to the Matt Taibbi propagandized version of the same events.

Why did I stop there? I was watching the Twitter Files tweets live and Post 8 was the final nail in the coffin. The previous nails were the loaded speech, which is seldom indicative of high quality journalism, but Post 8 turned that suspicion into a conviction: this was a hatchet job, not honest journalism. Debunking GOP hatchet jobs is a hobby, not an occupation, so at that point I stopped, went to bed, and skimmed the rest the next day. The summary I committed to memory was "mild incompetence and extensive good faith framed as hard censorship, again." I didn't deep dive 17-36, but I did skim them again before posting and again just now. I'll stand behind that summary if you want to tangle.

> Do you expect a useful discussion

You had your rant, now I get mine. I grew up being damn near a free-speech absolutist. I have carried an enormous amount of water for you guys on this topic recently, but it seems like every fucking time your team calls wolf I look into it and find crocodile tears and a wet fart. Is this really the best you can do?


> You skipped to ~post 17 and talked about the contents of 17-36.

No clue what you're talking about. My response was directed to the misrepresentation of the NY Post hunter biden drama contained in your post. I have no clue who Matt Taibbi is.

> I have carried an enormous amount of water for you guys on this topic recently, but it seems like every fucking time your team calls wolf I look into it

You guys? Your team? I think you must have confused me for another poster.


In case you missed that happening live, an article from the NYP telling the story of Hunter Biden's laptop (not necessarily the leaked photos) was heavily censored across all Big Tech just before the election.

None of the left-wing people I interact daily with knew about that.

Initially mainstream media claimed it was fake, then they retracted their statements a year later when nobody cared anymore.

Same as the COVID lab "conspiracy theory" or Fauci's funding GoF research. It all gets censored and dismissed until laterz.


We are being asked to believe that, on the cusp of the election, Hunter Biden dropped off one to three laptops (depending on the source), in a state he didn't live in, rife with unencrypted evidence of crime and corruption, at a Trumper's computer repair shop, and never bothered to look into them again until the owner decided to do a possibly illegal trawl through his customer's property and just happened to turn over this evidence not to the police but to a Republican operative in the days before we vote.

Now even though the prior is astounding enough we are supposed to take it on the word of a known prolific liar who recently lost his license to practice law because of lies about the very same topic and ignore the ordinary practice of treating chain of custody as gospel.

But wait, I hear you saying, didn't experts authenticate the laptop? No, no they didn't; they authenticated a few emails divorced of context. In other news, half of the starlets out there had their nudes stolen from their iCloud accounts a few years back. If you provided a true stolen picture of their boobs and then concocted an elaborate fiction around it, the very real boob pic wouldn't prove the elaborate fiction, wouldn't prove the authenticity of a machine you planted the pic on, and wouldn't prove any narrative you wove around it.

In fact the selective spare morsels of data are as damning as the miserable liar who pissed all over any sane chain of custody discussion by grasping any such machine in his oily hands. If they had him dead to rights they would have leaked the entire email box or, better yet, a hard drive image for nerds to go through with a fine-toothed comb.

They didn't because they didn't trust themselves to produce a convincing enough fake that could take any in depth analysis. This is also why you don't see any impending prosecution.

This didn't deserve a fair hearing in the news in the days before voting; it was an attempt to corrupt the fair election that was about to take place. See Obama's birth certificate and the swift boat nonsense.

One conspiracy theory at a time please.


[flagged]


You know that the "censored documents" were actually just nudes proving that Hunter Biden has a big dick and hot girlfriends, right? I've seen them. Explain how they are scandalous.


You get to whine about conspiracy N+1 when you finish defending conspiracy N.

> None of the left-wing people I interact daily with knew about [conspiracy N].

It seems their information filters were successfully rejecting bullshit -- which makes their filters better than your filters.

The "Biden's Laptop" story was bullshit when NYP posted it and it's still bullshit when you linked it. Furthermore, you know it's bullshit, which is why you tried to change the topic like a coward. Fine. Be my guest. Run away! If you want to defend your position, I'll be waiting.


You make a good point, but consider a query that many people use every day:

"Alexa, what's the weather for today?"

That's a question about the future, but the knowledge was generated beforehand by the weather people (NOAA, weather.com, my local meteorologist, etc).

I'm sure there are more examples, but this one comes to mind immediately


Right, but Alexa probably has custom handling for these types of common queries


TBH I've wondered from the very beginning how far they would get just hardcoding the top 1000 questions people ask instead of whatever crappy ML it debuted with. These things are getting better, but I was always shocked how they could ship such an obviously unfinished, broken prototype that got basics so wrong because it avoided doing something "manually". It always struck me as so deeply unserious as to be untrustworthy.
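
To make the "hardcode the top questions" idea concrete, here is a rough sketch. Everything in it is hypothetical: the `CURATED` table, the placeholder answer text, and the `fallback` callable standing in for whatever ML model would otherwise answer.

```python
import string


def normalize(question):
    """Lowercase, strip punctuation, and collapse whitespace so that
    trivially different phrasings map to the same lookup key."""
    cleaned = question.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(cleaned.split())


# Hypothetical hand-written answers; a real table would hold the top N questions.
CURATED = {
    normalize("Who won the Super Bowl?"): "(hand-written answer goes here)",
}


def answer(question, fallback):
    """Return the curated answer if one exists, else defer to the fallback."""
    key = normalize(question)
    if key in CURATED:
        return CURATED[key]
    return fallback(question)


if __name__ == "__main__":
    model = lambda q: "(model-generated guess)"
    print(answer("who won the super bowl??", model))   # curated path
    print(answer("What is the airspeed of a swallow?", model))  # fallback path
```

The obvious limitation is that exact-match lookup only catches near-identical phrasings, which is presumably part of why nobody ships just this.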


Your comment makes me wonder - what would happen if they did that every day?

And then, perhaps, trained an AI on those responses, updating it every day. I wonder if they could train it to learn that some things (e.g. weather) change frequently, and figure stuff out from there.

It's well above my skill level to be sure, but would be interesting to see something like that (sort of a curated model, as opposed to zero-based training).


GPT can use tools. Weather forecasts could be one of those tools.

https://news.ycombinator.com/item?id=34734696
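
A toy sketch of what "using a tool" could look like on the application side. The `CALL name: argument` convention, the `TOOLS` registry, and `weather_tool` are all invented for illustration; no real model or weather API is involved.

```python
def weather_tool(location):
    # Placeholder: a real tool would query a forecast service here.
    return f"(forecast for {location} would be fetched here)"


# Hypothetical registry mapping tool names to plain Python functions.
TOOLS = {"weather": weather_tool}


def run_model_output(text):
    """If the model emitted a line like 'CALL weather: Seattle', run the
    named tool with that argument; otherwise pass the text through."""
    prefix = "CALL "
    if text.startswith(prefix):
        name, _, arg = text[len(prefix):].partition(":")
        tool = TOOLS.get(name.strip())
        if tool is None:
            return f"unknown tool: {name.strip()}"
        return tool(arg.strip())
    return text


if __name__ == "__main__":
    print(run_model_output("CALL weather: Seattle"))
    print(run_model_output("The Eagles play the Chiefs on Sunday."))
```

The point is that the model never has to "know" tomorrow's weather; it only has to learn when to ask for it.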


Didn't original Alexa do that? It needed very specific word ordering because of it.


I guess I should have been clearer...

There are tons of common queries about the future. The AI should be built to handle them: to know that if something hasn't happened yet, it should give other relevant details instead. (And yes, I agree with your Alexa speculation.)


Alexa at least used to just do trivial textual pattern matching, hardly any more advanced than a 1980s text adventure, for custom skills, and it seemed hardly more advanced than that for the built-in stuff. It's been a long time since I looked at it, so maybe that has changed, but you can get far with very little, since most users will quickly learn the right "incantations" and avoid using complex language they know the device won't handle.
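
For a sense of how little that takes, here is a sketch of text-adventure-style matching. The patterns and intent names are made up for illustration and are not Alexa's actual skill format.

```python
import re

# Hypothetical intent patterns: each is a rigid template with named slots.
PATTERNS = [
    (re.compile(r"^what(?:'s| is) the weather(?: for| in)? (?P<when>today|tomorrow)\??$"),
     "weather"),
    (re.compile(r"^set (?:a )?timer for (?P<minutes>\d+) minutes?$"),
     "timer"),
]


def match_intent(utterance):
    """Return (intent, slots) for the first matching pattern,
    or (None, {}) when nothing matches."""
    text = utterance.strip().lower()
    for pattern, intent in PATTERNS:
        m = pattern.match(text)
        if m:
            return intent, m.groupdict()
    return None, {}


if __name__ == "__main__":
    print(match_intent("What's the weather for today?"))
    print(match_intent("Set a timer for 5 minutes"))
    print(match_intent("Could you tell me if it will rain?"))
```

Anything outside the templates falls through to `(None, {})`, which is exactly the "learn the incantation" experience.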


Ah yes, imprecision in specification. Having worked with some Avalanche folks, they would speak of weather observations and weather forecasts. One of the interesting things about natural language is that we can be imprecise until it matters. The key is recognizing when it matters.


> The key is recognizing when it matters.

Exactly!

Which, ironically, is why I think AI would be great at it - for the simple reason that so many humans are bad at it! Think of it this way - in some respects, human brains have set a rather low bar on this aspect. Geeks, especially so (myself included). Based on that, I think AI could start out reasonably poorly, and slowly get better - it just needs some nudges along the way.


"Time to generate a bunch of b$ websites stating falsehoods and make sure these AI bots are seeded with it." ~Bad guys everywhere


They were already doing this to seed Google. So business as usual for Mercer and co.

I suspect the only way to fix this problem is to exacerbate it until search / AI is useless. We (humanity) have been making great progress on this recently.


That's not how it is gonna play out. Right now it makes many wrong statements because AI companies are trying to wow investors and get as much funding as possible, but accuracy will keep being compared more and more, and to win that race it will get help from humans to use better starting points for every subject. For example, for programming questions it's gonna use the number of upvotes for a given answer on Stack Overflow; for a question about astrophysics it's gonna prefer statements made by Neil deGrasse Tyson over those made by some random person online, and so on. And to scale this approach it will slowly learn to make associations from such curated information, e.g. the people that Neil follows and RTs are more likely to make truthful statements about astrophysics than random people.


That makes complete sense, and yet the cynic (realist?) in me is expecting a political nightmare. The stakes are actually really high. AI will for all intents and purposes be the arbiter of truth. For example there are people who will challenge the truth of everything Neil deGrasse Tyson says and will fight tooth and nail to challenge and influence this truth.

We (western society) are already arguing about some very obviously objective truths.


Because I loathe captcha, I make sure that every time I am presented one I sneak in an incorrect answer just to fuck with the model I'm training for free. Garbage in, garbage out.


Glad to see a kindred soul out there. I thought I was the only one :)


Generalizing over the same idea, I believe that whenever you are asked for information about yourself you should volunteer wrong information. Female instead of male, single instead of married etc. Resistance through differential privacy


I've lived in 90210 since webforms started asking.


My email address is no@never.com. I’ve actually seen some forms reject it though


High-five, having forever used some permutation of something like naaah@nope.net, etc.


ASL?


69/f/cali


Back when recaptcha was catching on there was a 4chan campaign to associate words with "penis". They gathered together, used to successfully brigading polls of a few thousand, and went at it.

Someone asked the recaptcha guys and they said the traffic was so little among the total that it got diluted away. No lasting penis words arose and they lost interest.


I do this unintentionally on a regular basis.


I see people citing the big bold text at the top of the Google results as evidence supporting their position in a discussion all the time. More often than not the highlighted text is from an article debunking their position, but the person never bothers to actually click the link and read the article.

The internet is about to get a whole lot dumber with these fake AI generated answers.


A common case of asking a question about the future, even simpler than the weather: "Dear Bing, what day of the week is February 12 next year?" I would hope to get a precise and correct answer!

And of course all kinds of estimates, not just the weather, are interesting too. "What is estimated population of New York city in 2030?"
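For what it's worth, the day-of-week part needs no model at all. A minimal sketch with Python's standard library, assuming "next year" means 2024 (this thread is from February 2023); the helper name `day_of_week` is made up for illustration:

```python
from datetime import date

# Fixed English names so the result does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def day_of_week(year: int, month: int, day: int) -> str:
    """Return the English weekday name for a calendar date."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAYS[date(year, month, day).weekday()]

# Assumption: "next year" is 2024, since the question was asked in 2023.
print(day_of_week(2024, 2, 12))  # Monday
```

A search assistant could route this kind of question to a deterministic calculation like this rather than generating the answer as text.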


>I asked myself, why would ask somebody an AI trained on previous data, about events in the future?

"Who won the Superbowl?" is not a question about future events, it's a question about the past. The Superbowl is a long-running series of games, I believe held every year. So the simple question "who won the Superbowl?" obviously refers to the most recent Superbowl game played.

"Who won the Superbowl in 2024?", on the other hand, would be a question about the future. Hopefully, a decent AI would be able to determine quickly that such a question makes no sense.


Exactly. I’d imagine this is a major reason why Google hasn’t gone to market with this already.

ChatGPT is amazing but shouldn’t be available to the general public. I’d expect a startup like OpenAI to be pumping this, but Microsoft is irresponsible for putting this out in front of the general public.


I anticipate in the next couple of years that AI tech will be subject to tight regulations similar to that of explosive munitions and SOTA radar systems today, and eventually even anti-proliferation policies like those for uranium procurement and portable fission/fusion research.


ChatGPT/GPT3.5 and its weights can fit on a small thumb drive, and copied infinitely and shared. Tech will get better enough in the next decade to make this accessible to normies. The genie cannot be put back in the bottle.


> ChatGPT/GPT3.5 and its weights can fit on a small thumb drive, and copied infinitely and shared.

So can military and nuclear secrets. Anyone with uranium can build a crude gun-type nuke, but the instructions for making a reliable 3 megaton warhead the size of a motorcycle have been successfully kept under wraps for decades. We also make it very hard to obtain uranium in the first place.

>Tech will get better enough in the next decade to make this accessible to normies.

Not if future AI research is controlled the same way nuclear weapon research is. You want to write AI code? You'll need a TS/SCI clearance just to begin; the mere act of writing AI software without a license is a federal felony. Need HPC hardware? You'll need to be part of a project authorized to use the tensor facilities at Langley.

Nvidia A100 and better TPUs are already export restricted under the dual-use provisions of munition controls, as of late 2022.


How are you going to ban the Transformer paper? It's just matrix multiplies.


It's also a first amendment issue, and already out there. Reminds me that I'm old enough to remember when PGP bypassed export control by being printed on paper and exported as books and scanned/typed back in, though.

They can of course restrict publishing of new research, but that won't be enough to stop significant advances just from the ability of private entities worldwide to train larger models and do research on their own.


Sure it can. Missile guidance systems fit on a tiny missile, but you can’t just get one.

The controlled parlor game is there to seed acceptance. Once someone is able to train a similar model with something like the leaked State Department cables or classified information we’ll see the risk and the legislation will follow.


They can try. You will note that nobody except government employees and the guy running the website ever got in trouble for reading cables or classified information. We have the Pentagon papers precedent to the effect of it being a freedom of speech issue.


The persons from the state dept or army are heavily vetted to get there. A normie with a thumb drive, less so...


Once someone is able to train a similar model on their own, it's too late for legislation to have any meaningful ability to reduce proliferation.


True. In the long run though, I expect we will either build something dramatically better than these models or lose interest in them. Throw in hardware advances coupled with bitrot and I would go short on any of the gpt-3 code being available in 2123 (except in something like the arctic code vault, which would likely be effectively the same as it being unavailable).


The point isn't that gpt-3 specifically will be available, or the current models, but that gpt-3 level models or better will be available.


They released it because ChatGPT went to 100M active users near instantly and caused a big dent in Google's stock for not having it. The investors don't seem to have noticed that the product isn't reliable.


For investors, the product only needs to reliably bring in eyeballs.


> ChatGPT is amazing but shouldn’t be available to the general public.

It's a parlor game, and a good one at that. That needs to be made clear to the users, that's all.


It’s being added as a top line feature to a consumer search engine, so expect a lame warning in grey text at best.


1) The question as stated in the comment wasn't in the future tense and 2) the actual query from the screenshot was merely "superbowl winner". It would seem like a much more reasonable answer to either variant would be to tell you about the winners of the numerous past super bowls--maybe with some focus on the most recent one--not deciding to make up details about a super bowl in 2023.


The AI doesn't work in terms of "making up details". It will simply choose what makes "sense" in that context, no info about the made-up parts.


The problem is that it will always give you an answer even if none exists. Like a 4 year old with adult vocab and diction, wearing a tie, confidently making up the news. People may make decisions based on made-up-bullshit-as-a-service. We need that like we need a hole in the head. Just wait until people start using this crap to write Wikipedia answers. In terms of internet quality, I sometimes feel like one of those people who stockpiles canned food and ammo: it's time to archive what you want and unplug.


> I asked myself, why would ask somebody an AI trained on previous data, about events in the future?

Lots of teams have won in the past though. Why should an AI (or you) assume that a question phrased in the past tense is asking about a future event? "Many different teams have won the super bowl, Los Angeles Rams won the last super bowl in 2022" Actually even if this was the inaugural year, you would assume the person asking the question wasn't aware it had not been held yet rather than assuming they're asking what the future result is, no? "It hasn't been played yet, it's on next week."

I realize that's asking a lot of "AI", but it's a trivial question for a human to respond to, and a reasonable one that might be asked by a person who has no idea about the sport but is wondering what everybody is talking about.


> an AI trained on previous data

Trained to do what, though?

It feels like ChatGPT has been trained primarily to be convincing. Yet at the back of our minds I hope we recognise that "convincing" and "accurate" (or even "honest") are very different things.


"welcome to alternate reality where b$ doesn't even have a political agenda..." yet.


Well, when you're playing with a ChatGPT, it may not be apparent what the cutoff date is for the training data. You may ask it something that was future when it was trained, but past when you asked.

Is Bing continuously trained? If so, that would kind of get around that problem.


Because the AI isn’t (supposed to be) providing its own information to answer these queries. All the AI is used for is synthesis of the snippets of data sourced by the search engine.


Oh. You think these AIs don't inherit the agenda of their source material. Have you ignored the compounding evidence that


People who aren’t savvy and really want it to be right. Old man who is so sure of its confidence that he’ll put his life savings on a horse race prediction. Mentally unstable lady looking for a tech saviour or co-conspirator. Q-shirt wearers with guns. Hey Black Mirror people, can we chat? Try stay ahead of reality on this one, it’ll be hard.


At least that's relatively innocuous. I asked it how to identify a species of edible mushroom, and it gave me some of the characteristics from its deadly look alike.


I asked OpenAI’s ChatGPT some technical questions about Australian drug laws, like what schedule common ADHD medications were on - and it answered them correctly. Then I asked it the same question about LSD - and it told me that LSD was a completely legal drug in Australia - which is 100% wrong.

Sooner or later, someone’s going to try that as a defence - “but your honour, ChatGPT told me it was legal…”


Y’all are using this tool very wrong and in a way that none of the AI integrated search engines will. You assume the AI doesn’t know anything about the query, provide it the knowledge from the search index and ask it to synthesize it.

That seed data is where the citations come from.


There’s still the risk that if the search results it is given don’t contain the answer to the exact question you asked it, that it will hallucinate the answer.


10,000% true, which is why AI can't replace a search engine, only complement it. If you can't surface the documents that contain the answer then you'll only get garbage.


Maybe we need an algorithm like this:

1. Search databases for documents relevant to query

2. Hand them to AI#1 which generates an answer based on the text of those documents and its background knowledge

3. Give both documents and answer to AI#2 which evaluates whether documents support answer

4. If “yes”, return answer to user. If “no”, go back to step 2 and try again

Each AI would be trained appropriately to perform its specialised task
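A minimal sketch of the four steps above, assuming the search backend and the two models are passed in as plain functions. The names `search`, `generate`, `supports` and `max_attempts` are hypothetical, and the attempt cap is my addition so that step 4 cannot loop forever:

```python
from typing import Callable, List, Optional

def answer_with_verification(
    query: str,
    search: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    supports: Callable[[List[str], str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Run the search / generate / verify loop; None means no supported answer."""
    documents = search(query)                  # step 1: retrieve documents
    for _ in range(max_attempts):
        answer = generate(query, documents)    # step 2: AI#1 drafts an answer
        if supports(documents, answer):        # step 3: AI#2 checks support
            return answer                      # step 4: "yes", return it
    return None                                # step 4: "no" too many times

# Toy stand-ins for the real components, just to show the control flow.
docs = ["The Rams won Super Bowl LVI in 2022."]
drafts = iter(["The Eagles won.", "The Rams won Super Bowl LVI"])
result = answer_with_verification(
    "superbowl winner",
    search=lambda q: docs,
    generate=lambda q, d: next(drafts),
    supports=lambda d, a: any(a in doc for doc in d),
)
print(result)  # The Rams won Super Bowl LVI
```

Returning `None` lets the caller say "I couldn't find that" instead of showing an unsupported draft. The substring check here is only a placeholder for whatever AI#2 would actually do.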


A GAN approach to penalising a generator for generating something that is not supported by its available data would be interesting (and I'm sure some have tried it already, I'm not following the field closely), but for many subjects creating training sets would be immensely hard (for some subjects you certainly could produce large synthetic training sets).


I've been working on something like this for fun. I know it's not groundbreaking, but it's an interesting problem.


You're holding it wrong!


Look I know that "user is holding it wrong" is a meme but this is a case where it's true. The fact that LLMs contain any factual knowledge is a side-effect. While it's fun to play with and see what it "knows" (and can actually be useful as a weird kind of search engine if you keep in mind it will just make stuff up) you don't build an AI search engine by just letting users query the model directly and call it a day.

You shove the most relevant results from your search index into the model as context and then ask it to answer questions from only the provided context.

Can you actually guarantee the model won't make stuff up even with that? Hell no but you'll do a lot better. And the game now becomes figuring out better context and validating that the response can be traced back to the source material.
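Roughly what that looks like in code, as a sketch only: the prompt wording, the `[n]` citation convention and both function names are my assumptions, not how Bing or any particular product does it.

```python
import re
from typing import List

def build_grounded_prompt(question: str, snippets: List[str]) -> str:
    """Number each search snippet so the model can cite it as [n]."""
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the numbered context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def cited_indices_are_valid(response: str, snippet_count: int) -> bool:
    """Cheap traceability check: at least one [n], and every n is a real snippet."""
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", response)]
    return bool(cited) and all(1 <= n <= snippet_count for n in cited)

prompt = build_grounded_prompt(
    "Who won the last Super Bowl?",
    ["Super Bowl LVI was Rams vs Bengals.", "The Rams won in 2022."],
)
print(prompt)
```

The validity check only confirms that a citation points at a snippet that exists. Checking that the snippet actually says what the answer claims is the hard part and is not attempted here.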


The examples in the article seem to be making the point that even when the AI cites the correct context (ie: financial reports) it still produces completely hallucinated information.

So even if you were to white-list the context to train the engine against, it would still make up information because that's just what LLMs do. They make stuff up to fit certain patterns.


That’s not correct. You don’t need to take my word for it. Go grab some complete baseball box scores and you can see that ChatGPT will reliably translate them into an entertaining English paragraph-length outline of the game.

This ability to translate is experimentally shown to be bound to the size of the LLM but it can reliably not synthesize information for lower complexity analytic prompts.


> You don't build an AI search engine by just letting users query the model directly and call it a day.

Have you ever built an AI search engine? Neither have Google or MS yet. No one knows yet what the final search engine will be like.

However, we have every indication that all of the localization and extra training are fairly "thin" things like prompt engineering and maybe a script filtering things.

And despite ChatGPT's great popularity, the application is a monolithic text prediction machine, so it's hard to see what else could be done.


Who is this "you" you speak of when you say "you don't build an AI search engine by just letting users query the model directly and call it a day"?

Because Microsoft might not have exactly done that, but it isn't far off it.


I would currently use it as it has been named: ChatGPT. Would you trust some stranger in a chat on serious topics without questioning critically? Some probably would; I would not.


I think this is right.

The chats I've had with it are more thoughtful, comprehensive and modest than any conversation I've had on the Internet with people I don't know, starting from the usenet days. And I respect it more than the naive chats I've had with say Xfinity over the years.

Still requires judgement, sophistication and refinement to get to a reasonable conclusion.


To be fair, it's not like the look-alike is deadly to the AI.


I'd say the critical question here would be whether these characteristics can also be found on the edible mushroom or if it wanted to outright poison you :-D


> I'd say the critical question here would be whether these characteristics can also be found on the edible mushroom

That's a non-trivial question to answer because mushrooms from the same species can look very different based on the environmental conditions. But in this case it was giving me identifying characteristics that are not typical for the mushroom in question, but rather are typical for the deadly Galerina, likely because they are frequently mentioned together. (Since, you know, it's important to know what the deadly look alikes are for any given mushroom.)


If someone is using ChatGPT to determine which mushrooms are deadly and which are edible, they may not be cut out for living.


I treat GPT as I would a fiction writer. The factual content correlates to reality only as closely as a fiction author would go in attempt to suspend disbelief. This answer is about as convincing, apt, well-researched and factually accurate as I would expect to find in a dialogue out of a paperback novel published five years ago. I wouldn't expect it to be any better or worse at answering who won the 2023 Quidditch Cup or the 2023 Calvinball Grand Finals.


The only reasonable use case for ChatGPT now is if you already know what the output should be (e.g. you are in a position to judge correctness).


Super grammar checker.


It has the Super Bowl numbers wrong, too. The last Super Bowl is LVI, which was Rams vs Bengals… the Super Bowl before that one was Tampa Bay Buccaneers vs Kansas City. It has every fact wrong but in the most confusing way possible…


But the HN chatter was convinced that GPT would dethrone Google! Google has no chance!!

Another silly tech prediction brought to you by the HN hivemind.


A little premature to be calling such a prediction "silly". I think it's safe to assume some sort of LLM-based tech will be part of the most successful search engines within a relatively short period of time (a year tops). And if Google dallies its market share will definitely suffer.


Based on what evidence are you assuming that we can teach an LLM to distinguish between truth and fiction?


Do you think Google's search engine can do that now? Actually ChatGPT does appear to have the ability to distinguish the two, the big problem it has is presenting made-up information as though it were factual.


Written by HN hivemind.


I think we need to work on what constitutes a citation. Your browser should know whether:

- you explicitly trust the author of the cited source

- a chain of transitive trust exists from you to that author

- no such path exists

...and render the citation accordingly (e.g. in different colors)
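A sketch of how the three cases could be told apart, assuming the browser holds a map of who explicitly trusts whom. The graph shape, the names and the labels are all hypothetical:

```python
from collections import deque
from typing import Dict, Set

def trust_level(me: str, author: str, trusts: Dict[str, Set[str]]) -> str:
    """Classify a citation's author as 'explicit', 'transitive' or 'none'."""
    if author in trusts.get(me, set()):
        return "explicit"
    # Breadth-first search over "X trusts Y" edges, starting from me.
    seen = {me}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        for other in trusts.get(person, set()):
            if other == author:
                return "transitive"
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return "none"

# Hypothetical graph: I trust alice, and alice trusts bob.
trusts = {"me": {"alice"}, "alice": {"bob"}}
for name in ("alice", "bob", "carol"):
    print(name, trust_level("me", name, trusts))
```

The renderer would then pick a color per label. A real system would probably also cap the chain length, since trust gets weaker with every hop.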


And that the cited document actually exists and says what it's claimed to say.


Agreed.

Existence is easy, just filter untrusted citations. Presumably authors you trust won't let AI's use their keys to sign nonsense.

Claim portability is harder, but I think we'd get a lot out of a system where the citation connects the sentence (or datum) in the referenced article to the point where it's relevant in the referring article, so that it is easier for a human to check relevance.


How it should work is the model should be pre-trained to interact with the bing backend and make targeted search queries as it sees fit.

I wouldn’t put it past Microsoft to do something stupid like ground gpt3.5 with the top three bing results of the input query. That would explain the poor results perfectly.


It's funny to see the sibling comment saying this is too hard to do because it already does this. It shows you the additional queries it's searching.


That would require function and intelligence far outside the bounds of current large language models.

These are models. By definition they can't do anything. They can just regurgitate the best sounding series of tokens. They're brilliant at that and LLMs will be a part of intelligence, but it's not anywhere near intelligent on its own. It's like attributing intelligence to a hand.


Except it’s already been shown LLMs can do exactly that. You can prime the model to insert something like ${API CALL HERE} into its output. Then it’s just a matter of finding that string and calling the api.

Toolformer does something really neat where they make the API call during training and compare next word probability of the API result with the generated result. This allows the model to learn when to make API calls in a self supervised way.

https://arxiv.org/abs/2302.04761
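The "find that string and call the API" part can be sketched in a few lines. The marker format `${tool_name:argument}` and the stub weather tool are assumptions for illustration, and this is only the post-processing step, not Toolformer's training procedure:

```python
import re
from typing import Callable, Dict

# Hypothetical marker format: ${tool_name:argument}
CALL_PATTERN = re.compile(r"\$\{(\w+):([^}]*)\}")

def expand_api_calls(model_output: str,
                     tools: Dict[str, Callable[[str], str]]) -> str:
    """Replace each marker in the model's text with the named tool's result."""
    def run(match):
        name, argument = match.group(1), match.group(2)
        tool = tools.get(name)
        # Leave unknown markers untouched rather than guessing.
        return tool(argument) if tool else match.group(0)
    return CALL_PATTERN.sub(run, model_output)

# Stub tool standing in for a real weather API.
tools = {"weather": lambda city: f"(stub forecast for {city})"}
print(expand_api_calls("Tomorrow in Seattle: ${weather:Seattle}.", tools))
# Tomorrow in Seattle: (stub forecast for Seattle).
```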


The model can be trained to output tokens that can be intercepted by the backend before returning to the user. Also, the model can take metadata inputs that the user never sees.


Yes. It is possible to do additional things with the model outputs or have additional prompt inputs... That is irrelevant to the fact that the intelligence -- the "trained" part -- is a fixed model. The way in which inputs and outputs are additionally processed and monitored would have completely different intelligence characteristics to the original model. They are, by definition of inputs and outputs, separate.

Models of models and interacting models is a fascinating research topic, but it is nowhere near as capable as LLMs are at generating plausible token sequences.


I've tried perplexity.ai a bunch of times and I'd say I haven't seen any query wrong, although it's true I always look for technical info or translations, so my sample is not the same.

And the UI is better IMO.


LLMs are incapable of telling the truth. There's almost no way they could develop one that only responds correctly like that. It'd have to be a fundamentally different technology.


Yep, the idea of truth or falsity is not part of the design, and if it was part of the design, it would be a different and vastly (like, many orders of magnitude) more complicated thing.

If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers. Otherwise it will give incorrect answers. What it can never do is know the difference between the two.


> If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers.

ChatGPT does not work this way. It wasn't trained to produce "statistically likely" output; it was trained to produce output that humans rate highly.


Not exactly. ChatGPT was absolutely trained to produce statistically likely output, it just had an extra training step added for human ratings. If they relied entirely on human ratings there would not have been sufficient data to train the model.


The last step is what matters. "Statistically likely" is very underdetermined anyway, answering everything with "e" is statistically likely.

(That's why original GPT3 is known for constantly ending up in infinite loops.)


"e" is not a likely response to anything. I think you are not understanding the type of statistics involved here.


GPT3 doesn't create "responses". Not till it's been trained to via RLHF.


LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.

That said, traditional search engines suffer from the same problem, although with different symptoms. Search engines point to web pages generated by people (or some other way) which may or may not contain accurate information. They select those pages based on things like strength of authority, keyword similarity, etc... But truth never directly enters the picture (only indirectly, by tweaking the "trusted" set of pages).


> LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.

We're well into philosophy at this point, but I'd argue that telling the truth requires knowledge of the truth. Individual statements could be true, false, or neither. But I'd say neither an LLM nor a monkeys-with-typewriters generator could be said to tell the truth, because what they're doing is both orthogonal to truth and lacking in agency. If we stretch to thinking of it as having agency, then I think the correct term would be "bullshitting", aka producing speech "without regard for truth". [1]

[1] https://en.wikipedia.org/wiki/On_Bullshit


>They're incapable of knowing if what they're telling is the truth.

Right, because all they're doing is regurgitating the data they were trained on. It's like the kid in your CS class that copies someone else's coding assignment and just changes all the function names, variable names, and comments but not the actual executable stuff and never even bothers to compile and run it to see if it produces the desired output.


I feel like people don't appreciate how hard this is. But maybe because I was raised on Zork, I really appreciate this sort of stuff. Here's an example prompt I just gave to ChatGPT:

"If Tim drops a coconut on the tail of a dog, what are two things the dog might do?"

Response: "If Tim drops a coconut on the tail of a dog, two possible things the dog might do are:

1. Yelp or bark loudly in pain, as the impact of the falling coconut could be quite painful.

2. Try to escape or run away, as the dog may associate the sudden impact with danger and feel the need to get away from the source of the pain. Additionally, the dog may feel scared or anxious, which can cause it to run away."

Then I said, "But what if it was a soft coconut that was as light as a feather?"

Response: "If Tim dropped a soft coconut that was as light as a feather, it's possible that the dog wouldn't feel much or any discomfort from the impact. In this scenario, two things the dog might do are:

1. Look around to see what just hit it and possibly sniff at the coconut to investigate.

2. Continue with its previous activity, ignoring the soft impact as if it never happened. The dog may not perceive the coconut as a threat or something worth reacting to, especially if it was light enough to not cause any pain."

I just can't read these responses and think, "Ehh... just a mindless regurgitation as expected from any LLM". These simple prompt responses impress me and I kind of know the technology -- although my experience in RNNs/LSTM is very dated.

Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)


> Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)

That was done as AI Dungeon, but there was some consternation due to the combo of charging for it and GPT's predilection for generating wild and possibly illegal sex scenes even when you don't ask it to.


Exactly, there is more there. Here's an example where it gets theory of mind questions right: https://arxiv.org/abs/2302.02083


> Right, because all they're doing is regurgitating the data they were trained on.

That is not true, it's clearly able to generalize. (If it can do anagrams, it's silly to say it's just regurgitating the instructions for doing anagrams it read about.)

But it doesn't try to verify that what it says might be true before saying it.


It can't do anagrams though (every now and then it might get a common one right but in general it's bad at letter-based manipulations/information, including even word lengths, reversal etc.).


It doesn't know what letters are because it sees BPE tokens, but if you forgive that it does something like it.

example prompt: Imagine I took all the letters in "Wikipedia" and threw them in the air so they fell on the ground randomly. What are some possible arrangements of them?

Similarly, it can almost do arithmetic but apparently forgets to carry digits. That's wrong but it's still generalization!


Interestingly enough, it immediately got "Hated for ill" (presumably because there are source texts that discuss that very anagram). But it took about 10 goes to finally get a correct anagram for "Indebted sniper", though the best it could do was "pride-bent snide". I then asked it which world leader's title this might also be an anagram of and it somehow decided "Prime Minister" was a valid anagram of the same letters.
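Checking these by hand is tedious, so here is a small letter-count checker (a sketch; it ignores case, spaces and punctuation, which is my assumption about what counts as an anagram here):

```python
from collections import Counter

def letters(text: str) -> Counter:
    """Count alphabetic characters, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def is_anagram(a: str, b: str) -> bool:
    """True when both strings use exactly the same letters the same number of times."""
    return letters(a) == letters(b)

print(is_anagram("Indebted sniper", "pride-bent snide"))  # True
print(is_anagram("Indebted sniper", "Prime Minister"))    # False
```

By letter count "pride-bent snide" does check out, while "Prime Minister" has only 13 letters and needs two m's that "Indebted sniper" doesn't have.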


But regular search engines only regurgitate what they've indexed, yet don't invent outright nonsense when they don't know (if you asked Google who won the superbowl in 2024, the nature of the results makes it clear it simply doesn't have that information. Though if you change it to "world cup" one of the top answers says "portugal was the defending champion, defeating Argentina". The result is titled "2024 futsal world cup"!)


Traditional search engines aren't putting their imprimatur onto information by concealing its origin.


I don't think it is concealing the origin, but likely doesn't actually know the origin. That said, I agree that if they can provide sources (even probabilistically), that would be a good step forward.


The model is capable of generating many different responses to the same prompt. An ensemble of fact checking models can be used to reject paths that contain "facts" that are not present in the reference data (i.e. a fixed knowledge graph plus the context).

My guess is that the fact checking is actually easier, and the models can be smaller since they should not actually store the facts.
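A toy version of that rejection step, assuming facts can be pulled out of a response as (subject, relation, object) triples. The `extract_facts` callable is hypothetical and stands in for the fact checking model, which is where all the real difficulty lives:

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

def filter_supported(
    candidates: Iterable[str],
    extract_facts: Callable[[str], Set[Fact]],
    reference: Set[Fact],
) -> List[str]:
    """Keep only candidates whose extracted facts all appear in the reference."""
    return [c for c in candidates if extract_facts(c) <= reference]

# Toy data: the reference holds one fact, and extraction is a lookup table.
reference = {("Rams", "won", "Super Bowl LVI")}
extracted = {
    "The Rams won Super Bowl LVI.": {("Rams", "won", "Super Bowl LVI")},
    "The Eagles won Super Bowl LVI.": {("Eagles", "won", "Super Bowl LVI")},
}
print(filter_supported(extracted, lambda c: extracted[c], reference))
# ['The Rams won Super Bowl LVI.']
```

The subset test rejects a candidate as soon as one of its facts is missing from the reference, so an overly strict extractor costs answers but never lets an unsupported one through.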


Exactly. Given a source of truth, it can't be that hard to train a separate analytic model to evaluate answers from the existing synthetic model. (Neglecting for the moment the whole Gödel thing.)

The problem isn't going to be developing the model, it's going to be how to arrive at an uncontroversial source of ground truth for it to draw from.

Meanwhile, people are complaining that the talking dog they got for Christmas is no good because the C++ code it wrote for them has bugs. Give it time.


That’s quite the system that can take in any natural language statement and confirm whether its true or false.

You might be underestimating the scope of some task here.


Not true or false; just present or absent in the reference data. Note that false negatives will not result in erroneous output, so the model can safely err on the side of caution.

Also 100% accuracy is probably not the real threshold for being useful. There are many low hanging fruits today that could be solved by absolutely tiny error correcting models (e.g. arithmetic and rhyming).


There's research showing you can tell if something is a hallucination or memorized fact based on the activation patterns inside the LM.


The missing piece seems to be that for certain questions it doesn't make sense to extrapolate, and that if it's a question about what will happen in the future, it should answer in a different manner (and from my own interactions with ChatGPT it does exactly that, frequently referring to the cut-off time of its training data).


I just tried a similar query on perplexity.ai. "Who won the Daytona 500 in 2023?" (the race is scheduled for February 19th)

Result: "Sterling Marlin won the 2023 Daytona 500, driving the No. 4 for Morgan-McClure Motorsports[1]. He led a race-high 105 laps and won his second career race at Daytona International Speedway[1]. The 64th running of the DAYTONA 500 was held on February 19, 2023[2]. Austin Cindric had previously won the DAYTONA 500 in February 5, 2023[3]."


Wow, a driver that’s been retired for 13 years won for a team that shut down 10 years ago in the first ever season that Nascar has decided to run 2 Daytona 500s in the same month.


It may be more profitable to ask what stocks gained the most in value next week.


Ahah! A time travelling AI!


Place your bets now. The AI might have clairvoyance and be showing off.


The temporal mechanics is fascinating.


I tried perplexity.ai and asked it in which stadium the Chargers had their perfect season. It couldn't figure out that the Chargers used to be the San Diego Chargers before they moved to LA, and kept talking about their Los Angeles stadium even though they have never had a perfect season there.


Maybe for your use cases. I've found perplexity.ai wrong a few times just today:

* Misunderstanding one of its citations, it said that use of `ParamSpec` in Python would always raise a warning in Python 3.9

* When asked why some types of paper adhere to my skin if I press my hand against them for a few minutes (particularly glossy paper), it gave two completely different answers depending on how the question was worded, one of which doesn't necessarily make sense.


That's a good point about use cases.

In my usage of ChatGPT in areas where I'm very knowledgeable, I've mostly received answers that were stylistically excellent, creatively plausible and maybe even transcendent. The boilerplate around the answer tends to keep the answers grounded, though.

In areas where I have some experience but not much theoretical knowledge, after multiple exploratory questions, I better understand the topic and feel ok adjusting my behavior appropriately.

I haven't relied on it in areas where I am ignorant or naive e.g. knitting, discriminatory housing policy or the economy in Sudan. Since I have no priors in those areas, I may not feel strongly about the results whether they are profound or hallucinatory or benign.

I also haven't used it for fact checking or discovery.


I just checked it with a random query while watching Buffy the Vampire Slayer reruns, and it wasn't great. (Spoiler alerts)

"How did Buffy the Vampire Slayer die?"

"Buffy the Vampire Slayer died in the Season One finale when she threw herself into a portal to save her sister Dawn[1]. This was the first time Buffy had died, occurring on June 2, 1997[2]. She died again in the Season Five finale while fighting off Glory's demons[3]. Other characters such as Anya and Kevin Benedict also died during the series[4][5]."

Okay, so it was a trick question, because Buffy definitely died more than once, but it's conflated the fact that she died in Season 1 with the cause of her Season 5 death. Also, I had to Google Kevin Benedict to remember that he was Cordelia's sometimes boyfriend, and an extremely minor character, which makes me question how that death is more notable than Buffy's mom, or Tara, or Jenny Calendar, etc.

I like that this seems to have been more lexical confusion than how ChatGPT seems to enjoy filling empty spaces with abject lies, but perhaps it's worth exploring what you're asking it that has left it with such a great batting average?


I really like perplexity, but I've noticed that it sometimes summarizes the paper incorrectly, as in it cites it as concluding the opposite of what it actually concludes, so I always click through to read the papers/studies. It's great for surfacing relevant studies, though.


I have seen it multiple times answering correctly at first, then adding something which has nothing to do with the original question.

That's almost always sourced from a website that didn't actually answer the question I had, so maybe it's more of a query optimization issue.


>> "I would've expected it to not get such a basic query so wrong."

Isn't this exactly what you would expect, with even a superficial understanding of what "AI" actually is?

Or were you pointing out that the average person, using a "search" engine that is actually at core a transformer model, doesn't a) understand that it isn't really a search and b) have even the superficial understanding of what that means, and therefore would be surprised by this?


And this doesn’t seem like it’s a hard problem to solve

1. Recognize that the user is asking about sports scores. This is something that your average dumb assistant can do.

2. Create an “intent” with a well formatted defined structure. If ChatGPT can take my requirements and spit out working Python code, how hard could this be?

3. Delegate the information to another module that can call an existing API just like Siri, Alexa, or Google Assistant

Btw, when I asked Siri, “who won the Super Bowl in 2024”, it replied that “there are no Super Bowls in 2024” and quoted the score from last night and said who won “in 2023”.
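
Roughly, the three steps above could look like the sketch below. Everything in it is an assumption for illustration: the regex stands in for a real intent classifier, and the `RESULTS` table stands in for a call to an existing sports API:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    name: str
    year: Optional[int]

def parse_intent(query: str) -> Optional[Intent]:
    """Steps 1 and 2: spot a sports-score question and build a structured intent."""
    if re.search(r"who won the super\s?bowl", query, re.IGNORECASE):
        match = re.search(r"\b(19|20)\d{2}\b", query)
        return Intent("super_bowl_winner", int(match.group()) if match else None)
    return None

# Hypothetical stand-in for an external results API, keyed by the year of the game.
RESULTS = {2022: "Los Angeles Rams", 2023: "Kansas City Chiefs"}

def answer(query: str) -> str:
    """Step 3: delegate recognized intents to the lookup instead of the language model."""
    intent = parse_intent(query)
    if intent is None:
        return "FALLBACK_TO_LANGUAGE_MODEL"
    if intent.year is None:
        return "Which year do you mean?"
    winner = RESULTS.get(intent.year)
    if winner is None:
        return f"No Super Bowl result is recorded for {intent.year}."
    return f"The {winner} won the Super Bowl played in {intent.year}."
```

The point of the structure is that a question about a game with no recorded result gets an explicit "nothing recorded" reply rather than a plausible-sounding score.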


Out of interest, what did the source used as reference for the 31-24 say exactly? Was it a prediction website and Bing thought it was the actual result, or did the source not mention these numbers at all?


Giants beat the Vikings about a month ago with that score.


Why would you have that expectation?


Imagine you are autocorrect, trying to find the most "correct sounding" answer to the question "Who won the super bowl?"

What sounds more "correct" (i.e. what matches your training data better):

A: "Sorry, I can't answer that because that event has not happened yet."

B: "Team X won with Y points on the Nth of February 2023"

Probably B.

Which is one major problem with these models. They're great at repeating common patterns and updating those patterns with correct info. But not so great if you ask a question that has a common response pattern, but the true answer to your question does not follow that pattern.


How about C: "the most recent super bowl was in February of 2022 and the winner was ____"?


Yes, it actually sometimes gives C and also sometimes B and sometimes makes up E. That's how probability works, and that's not helpful when you want to look up an occurrence of an event in physical space (Quantum mechanics aside :D).
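
A toy illustration of the point about probability, with completely made-up weights for the answer patterns discussed above. It is not how any real model stores its distribution; it only shows why always taking the likeliest pattern yields B, while sampling yields a mix:

```python
import random
from collections import Counter

# Invented probabilities for illustration only.
PATTERNS = {
    "A: that event has not happened yet": 0.05,
    "B: Team X won with Y points": 0.70,
    "C: the most recent Super Bowl was won by ...": 0.25,
}

def greedy(patterns: dict[str, float]) -> str:
    """Always pick the single most likely pattern."""
    return max(patterns, key=patterns.get)

def sample(patterns: dict[str, float], n: int, seed: int = 0) -> Counter:
    """Draw n answers in proportion to their probability."""
    rng = random.Random(seed)
    draws = rng.choices(list(patterns), weights=list(patterns.values()), k=n)
    return Counter(draws)

counts = sample(PATTERNS, 1000)
```

Neither strategy consults whether the event has actually occurred, which is the underlying problem.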


Does ChatGPT say, I don't know?


I've never had it say 'I don't know', but it apologizes and admits it was wrong plenty.

Sometimes it comes up with a better, acceptably correct answer after that, sometimes it invents some new nonsense and apologizes again if you point out the contradictions, and often it just repeats the same nonsense in different words.


One of the things it's exceptionally well trained at is saying that certain scenarios you ask it about are unknowable, impossible or fictional.

Generally, for example, it will answer a question about a future dated event with "I am sorry but xxx has not happened yet. As a language model, I do not have the ability to predict future events" so I'm surprised it gets caught on Super Bowl examples which must be closer to its test set than most future questions people come up with

It's also surprisingly good at declining to answer completely novel trick questions like "when did Magellan circumnavigate my living room" or "explain how the combination of bad weather and woolly mammoths defeated Operation Barbarossa during the Last Age" and even explaining why: clearly it's been trained to the extent it categorises things temporally, spots mismatches (and weighs the temporal mismatch as more significant than conceptual overlaps like circumnavigation and cold weather), and even explains why the scenario is impossible. (Though some of its explanations for why things are fictional are a bit suspect: I think most cavalry commanders in history would disagree with the assessment that "Additionally, it is not possible for animals, regardless of their size or strength, to play a role in defeating military invasions or battle"!)


On some topics at least, it correctly identifies bogus questions. I extensively tried asking about non-existent Apollo missions, for example, including Apollo 3.3141952, Apollo -1, Apollo 68, and loaded questions like when Apollo 7 landed on the moon, and it was correctly pointing out impossible combinations. This is a well-researched topic, though.


Only if it’s a likely response or if it’s a canned response. Remember that ChatGPT is a statistical model that attempts to determine the most likely response following a given prompt.


Because they're being marketed as a tool, and not as a substantially overengineered implementation of MadLibs.


The same reason you'd expect "full self driving" to be full self driving.


Seems like we should be able to train the AI to output not just text, but the text along with a confidence score. During training, instead of rms(error) or whatever, you could use error * square(confidence). So the AI would be “punished” a lot more for being confident about incorrect responses.

The confidence could also be exposed during inference. “Philadelphia eagles won the Super Bowl, but we’re only 2% confident of that”.


Most machine learning models have some ability to do this built-in, but the problem is that the confidence scores are generally "poorly calibrated" in that they do not correspond to useful estimates of probabilities.

I've always been surprised at how little interest industry seems to have in probabilistic machine learning, and how it seems to be almost absent from standard data science curricula. It can matter a lot in solving real world problems, but it can be harder to develop and validate a model that emits probabilities you can actually trust.
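
For anyone wondering what "poorly calibrated" means in practice, here is a minimal sketch of one common way to measure it, expected calibration error: bucket predictions by stated confidence and compare each bucket's average confidence with its actual accuracy. The function name and the ten-bin default are choices made for this example:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 would index past the end, so clamp it into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += len(bucket) / total * abs(avg_conf - accuracy)
    return ece

# A model that says "90% sure" and is right 9 times out of 10 is well calibrated.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
# The same stated confidence with only 5 right answers is overconfident.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

A score near zero means the confidence numbers can be read as probabilities; the overconfident case above has a gap of about 0.4.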


I've thought about this before, but I think you have to normalize the confidence over lots of examples for it to not lead to a degenerate solution. So if the confidence is divided by the sum of confidence for the whole minibatch, the model would have an incentive to spread this out correctly rather than just always hedging.
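
A small numeric sketch of the degenerate case and the normalization fix, taking the "error * square(confidence)" proposal literally. The per-example errors are made-up numbers, and a real training loop would use tensors rather than plain lists:

```python
def raw_loss(errors, confidences):
    """Literal reading of the proposal: error * confidence^2, summed over the batch."""
    return sum(e * c ** 2 for e, c in zip(errors, confidences))

def normalized_loss(errors, confidences):
    """Same loss, but each confidence is divided by the batch's total confidence."""
    total = sum(confidences)
    return sum(e * (c / total) ** 2 for e, c in zip(errors, confidences))

errors = [0.1, 0.5, 2.0]                 # invented per-example errors
hedge_everything = [1e-3, 1e-3, 1e-3]    # "I'm not sure about anything"
uniform = [1.0, 1.0, 1.0]
inverse_error = [1 / e for e in errors]  # more confident where error is lower

# Without normalization, hedging on everything drives the loss toward zero.
raw_hedged = raw_loss(errors, hedge_everything)
raw_uniform = raw_loss(errors, uniform)

# With normalization, scaling all confidences down buys nothing ...
norm_hedged = normalized_loss(errors, hedge_everything)
norm_uniform = normalized_loss(errors, uniform)
# ... while putting confidence where the error is low is rewarded.
norm_inverse = normalized_loss(errors, inverse_error)
```

Under the normalized version only the relative confidences matter, and for this particular loss the minimum is reached when confidence is proportional to 1/error. It still says nothing about whether the resulting numbers are calibrated probabilities.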


There are plenty of fancy techniques out there already for building probabilistic neural networks. But I'm not aware of any results that combine them with large language models to develop a confidence score over an entire response.

I wonder if people don't even want confidence scores when they say they want machine learning in their product: they want exact answers, and don't want to think about gray areas.


...Why?

An idiot will tell you the wrong answer with 100% confidence, and 0 valid basis to warrant said confidence.

I swear, Babbage must be rolling in his grave. Turns out the person asking him about putting garbage in and getting a useful answer out was just an ahead of their time Machine Learning Evangelist!


If the learning rate was scaled based on its own confidence output, then ideally the system should learn to have a low confidence in everything early in training, and only raise its confidence late in training as it gets increasingly confident in its own outputs.


LLM would have to fundamentally change. Current transformer architecture can not but hallucinate. It is inherent. More training won't solve it. A complete rethinking of the approach is the only way to


Much of the field of statistics is dedicated to estimating probability distributions from data. It's absolutely a legitimate question to ask.


hmm interesting.

back when chatGPT was new I asked it what the most current version of PSADT (Powershell App Deployment Toolkit) was.

It told me that its model was old but it thought 3.6 was the most current version. I then told it that 3.9 was the most current version.

I then started a new chat and asked it the same question again. It told me its model was old but version 8 was the most current version! (there has never been a version 8 of PSADT)

I asked that question again today. It has now told me to go check github cause its model is too old to know.


I have come to two conclusions about the GPT technologies after some weeks to chew on this:

1. We are so amazed by its ability to babble in a confident manner that we are asking it to do things that it should not be asked to do. GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses. But if you built something very like it and asked it to try, it might give it a good go.

In its current state, you really shouldn't rely on it for anything. But people will, and as the complement of the Wile E. Coyote effect, I think we're going to see a lot of people not realize they've run off the cliff, crashed into several rocks on the way down, and have burst into flames, until after they do it several dozen times. Only then will they look back to realize what a cockup they've made depending on these GPT-line AIs.

To put it in code assistant terms, I expect people to be increasingly amazed at how well they seem to be coding, until you put the results together at scale and realize that while it kinda, sorta works, it is a new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over.

This is not because GPT is broken. It is because what it is is not correctly related to what we are asking it to do.

2. My second conclusion is that this hype train is going to crash and sour people quite badly on "AI", because of the pervasive belief I have seen even here on HN that this GPT line of AIs is AI. Many people believe that this is the beginning and the end of AI, that anything true of interacting with GPT is true of AIs in general, etc.

So people are going to be even more blindsided when someone develops an AI that uses GPT as its language comprehension component, but does this higher level stuff that we actually want sitting on top of it. Because in my opinion, it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing. This accomplishment should not be understated. It just happens to be the fact that we're basically abusing it in its current form.

What it's going to do as a part of an AI, rather than the whole thing, is going to be amazing. This is certainly one of the hard problems of building a "real AI" that is, at least to a first approximation, solved. Holy crap, what times we live in.

But we do not have this AI yet, even though we think we do.


I love the mental model of GPT as only one part of the brain, but I believe that the integration of other "parts" of the brain will come sooner than you think. See, for instance, https://twitter.com/mathemagic1an/status/1624870248221663232 / https://arxiv.org/abs/2302.04761 where the language model is used to create training data that allows it to emit tokens that function as lookup oracles by interacting with external APIs. And an LLM can itself understand when a document is internally inconsistent, relative to other documents, so it can integrate the results of these oracles if properly trained to do so. We're only at the surface of what's possible here!
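
To make the "tokens that function as lookup oracles" idea concrete, here is a rough sketch of the post-processing side only: the model's text contains a bracketed tool call, and a wrapper runs it and splices the result back in. The bracket syntax, the `TOOLS` registry and the tiny calculator are assumptions for illustration, not the paper's implementation:

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def calculator(expression: str) -> str:
    """Evaluate a single 'number op number' expression, rounded to 2 decimals."""
    match = re.fullmatch(
        r"\s*(-?\d+(?:\.\d+)?)\s*([-+*/])\s*(-?\d+(?:\.\d+)?)\s*", expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    left, op, right = match.groups()
    return f"{OPS[op](float(left), float(right)):.2f}"

# Hypothetical tool registry; a real system might add search, calendar, etc.
TOOLS = {"Calculator": calculator}
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_calls(text: str) -> str:
    """Replace each [Tool(argument)] marker with [Tool(argument) -> result]."""
    def run(match: re.Match) -> str:
        tool, argument = match.groups()
        if tool not in TOOLS:
            return match.group(0)  # leave unknown tools untouched
        return f"[{tool}({argument}) -> {TOOLS[tool](argument)}]"
    return CALL.sub(run, text)
```

For example, `execute_calls("That is [Calculator(400 / 1400)] of the total.")` fills in the quotient so the model can condition on a computed value instead of guessing one.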

I also look to the example of self-driving cars - just because Tesla over-promised, that didn't discourage its competitors from moving forward slowly but surely. It's hard to pick a winner right now, though - so much culturally in big tech is up in the air with the simultaneity of layoffs and this sea change in AI viability, it's hard to know who will be first to release something that truly feels rock-solid.


There's one challenging thought experiment for the future of "AI", in anything remotely like how we are currently approaching it.

Put yourself in the shoes of primitive man. Kind of a weird saying given we wouldn't have had shoes, but bear with me here! Not long ago we lacked any language whatsoever, and the state of our art in technology was bashing stones together to use the resultant pointy pieces as weapons. Somehow we moved from that to Shakespeare, putting a man on the moon, and discovering the secrets of the atom - and its great and awful applications. And we did it all extremely quickly.

Now imagine you, primitive man, somehow trained an LLM on all quantifiable knowledge of the times. It should be somewhat self evident that it's not really going to lead you to the atom, Shakespeare, or anywhere beyond bashing stones together. Current LLM models are basically just playing 'guess the next word.' When that next word has not yet been spoken by mankind (figuratively speaking, but perhaps also literally to some degree), the LLM will never guess it.

Natural language search is a really awesome tool that will be able to help in many different fields. But I feel like in many ways we're alchemists trying to turn lead into gold. And we've just discovered how to create gold colored paint. It would feel like a monumental leap, but in reality you'd be no closer to your goal than you were the year prior. That said, paint also has lots of really great uses - but it's not what you're trying to do.


Now imagine the LLM is trained not on words but on syllables. Same language, just dealing with it at the syllable level.

Doesn’t the problem then go from “can it invent” (yes, it can) to “are the inventions good?”

For some definitions of “good” you could automate the check. I think that is where it’s going to get both really useful, and really bizarre.


Yes, this is something that I've been thinking ever since GPT3 came out.

It's insanely impressive what it can do given it's just a language model. But if you start gluing on more components, we could end up with a more or less sentient AGI within a few years.

Bing have already hooked it up to a search engine. That post hooks it up to other tools.

I think what is needed next is a long term memory where it can store dynamic facts and smartly retrieve them later, rather than relying on just the 4000 token current window. It needs to be able to tell when a user is circling back to a topic they talked about months ago and pull out the relevant summaries of that conversation.

I also think it needs a working memory, where it continually edits the token window to fit the relevant state of the conversation. Summarising recent tokens, saving things out to long term storage, pulling new information in from long term storage, web searches and other tools.
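
A bare-bones sketch of that split between a bounded working window and a long term store. It is heavily simplified: tokens are approximated by whitespace-separated words, evicted messages are stored verbatim where a real system would summarise them, and retrieval is plain word overlap rather than anything smart. The class and method names are made up for this example:

```python
from collections import deque

class ConversationMemory:
    def __init__(self, budget: int = 4000):
        self.budget = budget      # max tokens kept in the working window
        self.window = deque()     # recent messages, kept verbatim
        self.long_term = []       # messages evicted from the window

    @staticmethod
    def tokens(text: str) -> int:
        """Crude token count: one token per whitespace-separated word."""
        return len(text.split())

    def add(self, message: str) -> None:
        """Append a message, moving the oldest ones to long term storage if over budget."""
        self.window.append(message)
        while (sum(map(self.tokens, self.window)) > self.budget
               and len(self.window) > 1):
            self.long_term.append(self.window.popleft())

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored messages sharing the most words with the query."""
        words = set(query.lower().split())
        ranked = sorted(self.long_term,
                        key=lambda m: len(words & set(m.lower().split())),
                        reverse=True)
        return ranked[:k]

memory = ConversationMemory(budget=6)
memory.add("my dog is named Rex")
memory.add("the weather is nice today")
recalled = memory.recall("what is my dog called")
```

With a budget of 6, the second message pushes the first out of the window, and the later question about the dog pulls it back from long term storage.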


I think a number of breakthroughs may be needed to keep an AI 'sane' with a large working memory at this point. How do we keep them 'on track' at least in a way that seems somewhat human? Humans that have halting problem issues can either be geniuses (diving into problems and solving them to the point of ignoring their own needs), or clinical (ignoring their needs to look at a spot on the wall).


Just wait til we hook it up to our nuclear defense grid


It is like we have unlocked an entirely new category of stereotyping that we never even realized existed.

Intelligence is not a prerequisite to speak fancifully.

Some other examples:

1. We generally assume that lawyers or CEOs or leaders who give well spoken and inspirational speeches actually know anything about what they're talking about.

2. Well written nonsense papers can fool industry experts even if the expert is trying to apply rigorous review: https://en.m.wikipedia.org/wiki/Sokal_affair

3. Acting. Actors can easily portray smart characters by reading the right couple sentences off a script. We have no problem with this as an audience member. But CGI is needed for making your superhero character jump off a building without becoming a pancake.


>Intelligence is not a prerequisite to speak fancifully.

I think this may be a bastardization of the word intelligence. To speak fancifully in a manner accepted by the audience requires some kind of ordered information processing and understanding of the audience's taste. Typically we'd consider that intelligent, but likely Machiavellian depending on the intent.

The problem with the word intelligence is that it is too big of a concept. If you look at any part of our brain, you will not find (human) intelligence itself; instead it emerges from any number of processes occurring at different scales. Until we are able to break down intelligence into these smaller, better (but not perfectly) classified pieces, we are going to keep running into these same problems over and over again.


> easily portray smart characters

I don't think it is possible for people to emulate the behavior of superintelligent beings. In every story about them, they appear to not actually be any smarter than us.

There is one exception - Brainwave by Poul Anderson. He had the only credible (to me) take on what super intelligent people might be like.


Rupert Sheldrake suggests that consciousness is partly about seeing possibilities for our future, evaluating them, and choosing between them. If we make decisions the same way, they change to unconscious habits.

A hungry creature can eat what it sees or stay hungry. Another has more memory and more awareness of different bark and leaves and dead animals to choose from. Another has a better memory of places with safe food in the past and how to get to them. A tool using human can reason down longer chains like 'threaten an enemy and take their food' or 'setup a trap to kill an animal' or 'dig up root, grind root into mash, boil it, eat the gruel'. In that model, a super intelligence might be able to:

- Extract larger patterns from less information. (Con: more risk of a mistake).

- Connect more patterns or more distant patterns together with less obvious connections. (Con: risk of self-delusion).

- Evaluate longer chains of events more accurately with a larger working memory, more accurate mental models. (Con: needs more brain power, more energy, maybe longer time spent in imagination instead of defending self).

- Recall more precise memories more easily. (Con: cost of extra brain to store information and validate memories).

This would be a good model for [fictional] Dr House, he's memorised more illnesses, he's more attentive to observing small details on patients, and more able to use those to connect to existing patterns, and cut through the search space of 'all possible illnesses' to a probable diagnosis based on less information than the other doctors. They run out of ideas quicker, they know fewer diseases, and they can't evaluate as long chains of reasoning from start to conclusion, or make less accurate conclusions. In one episode, House meets a genius physicist/engineer and wants to get his opinion on medical cases, but the physicist declines because he doesn't have the medical training to make any sense of the cases.

It also suggests that extra intelligence might get eaten up by other people - predicting what they will do, while they use their extra intelligence to try to be unpredictable. And it would end up as exciting as a chess final, where both grandmasters sit in silence trying to out-reason their opponent through deeper chains in a larger subset of all possible moves until eventually making a very small move. And from the outside players all seem the same but they can reliably beat me and they cannot reliably beat each other.


I remember thinking when I read it that Ted Chiang's 'Understand' did a good job (although I have not re-read it to verify this):

https://web.archive.org/web/20140527121332/http://www.infini...


Actors have no problem playing smart people, but movie writers often have a LOT of trouble actually writing them. I'm still not sure that something like ChatGPT will be able to actually be clever.

I would also add that the mask is kind of coming off on CEOs and other inspirational speakers. Inspirational speaking is all a grift. They know only how they got rich (if they are even rich - most people on the speaking circuit make less than you think), not how anyone else did, and that knowledge usually doesn't translate well from the past to the future. There are a few exceptions, but most of these well-spoken people don't really know what they're talking about - they're just not self-aware enough to know that they don't know.


> Intelligence is not a prerequisite to speak fancifully.

I don't even disagree necessarily, but this is an amazing example of AI goalpost shift in action.


Yeah, I read this sentiment all the time and here's what I always say – just don't use it. Leave it to the rest of us if it's so wrong / off / bad.

BTW, have you considered maybe you aren't so good at using it? A friend has had very little luck with it, even said he's been 'arguing with it', which made me laugh. I've noticed that it's not obvious to most people that it's mostly about knowing the domain well enough to ask the right question(s). It's not magic, it won't think for you.

Here's the thing… my experience is the opposite… but maybe I'm asking it the right questions. Maybe it's more about using it to reason through your problem in a dialog, and not just ask it something you can google/duckduckgo. It seems like a LOT of people think it's a replacement for Google/search engines – it's not, it's another tool to be used correctly.

Here are some examples of successful uses for me:

I carefully explained a complex work issue that involves multiple overlapping systems and our need to get off of one of them in the middle of this mess. My team has struggled for 8 months to come up with a plan. While in a meeting the other day I got into a conversation with ChatGPT about it, carefully explained all the details and then asked it to create a plan for us to get off the system while keeping everything up / running. It spit out a 2 page, 8 point plan that is nearly 100% correct. I showed it to my team, and we made a few minor changes, and then it was anointed 'the plan' and we're actually moving forward.

THEN last night I got stuck on a funny syntax issue that googling could never find the answer to. I got into a conversation with ChatGPT about it, and after it first gave me the wrong answer, I told it that I need this solution for the latest dotnet library that follows the 'core' language syntax. It apologized! And then gave me the correct answer…

My hunch is the people that are truly irked by this are too deep / close to the subject and because it doesn't match up with what they've worked on, studied, invested time, mental energy into, well then of course it's hot garbage and 'bad'.


You all say it's solving these amazing complex tasks for you, but then don't provide any details.

Then "naysayers" like the linked article provide a whole document with images and appendixes showing it struggles with basic tasks...

So show us. For the love of god all of us would very much LIKE this technology to be good at things! Whatever techniques you're using to get these fantastical results, why don't you share them?

I can get it to provide snippets of code, CLI, toy functions that work. Beyond that, I am apparently an idiot compared to you AI-whisperers.

Also... Whatever happened to "extraordinary claims require extraordinary proof?"

An AI that creates a complex system, condensed into an actionable plan, that has stumped an entire team for 8 months is a (pardon the language) bat-shit insane claim. Things like this used to require proof to be taken seriously.


The explanation is easy.

An analytic prompt contains the facts necessary for the response. This means the LLM acts as a translator.

A synthetic prompt does not contain the facts necessary for the response. This means the LLM acts as a synthesizer.

A complete baseball box score being converted into an entertaining paragraph description of the game is an analytic prompt and it will reliably produce a factual outcome.

https://www.williamcotton.com/articles/chatgpt-and-the-analy...
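
As a rough sketch of the distinction: an analytic prompt carries its facts with it, so a cheap sanity check can at least flag numbers in the response that never appeared in the prompt. The box score below is invented, and both function names are made up for this example:

```python
import re

def build_analytic_prompt(box_score: dict[str, tuple[int, int, int]]) -> str:
    """Embed every fact the response is allowed to use directly in the prompt."""
    lines = [f"{team}: {runs} runs, {hits} hits, {errors} errors"
             for team, (runs, hits, errors) in box_score.items()]
    return ("Write one entertaining paragraph describing this game. "
            "Use only the facts below.\n" + "\n".join(lines))

def numbers_missing_from_prompt(response: str, prompt: str) -> set[str]:
    """Numbers that appear in the response but nowhere in the prompt."""
    return set(re.findall(r"\d+", response)) - set(re.findall(r"\d+", prompt))

# Invented box score: team -> (runs, hits, errors).
box_score = {"Cubs": (5, 9, 0), "Mets": (3, 7, 1)}
prompt = build_analytic_prompt(box_score)

grounded = numbers_missing_from_prompt("The Cubs won 5-3 on 9 hits.", prompt)
invented = numbers_missing_from_prompt("The Cubs won 12-3.", prompt)
```

A synthetic prompt such as "Who won last night's game?" gives no such reference to check against, which is where the model has to make something up.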

There’s a bunch of active research in this area:

https://github.com/lucidrains/toolformer-pytorch

https://reasonwithpal.com/


Thank you so much!

Your technique of only posing analytical questions is indeed improving the results. It's not great, but I can actually get it to somewhat reliably summarize academic articles if I give it a citation now, which is pretty neat.

It doesn't summarize them well (I gave it a couple softballs, like summarizing McIntosh's "White Privilege: Unpacking the Invisible Knapsack", which almost every undergrad student in the humanities will have written about), but the stuff that it does make up is completely innocuous and not a big deal.

Very cool, thanks again.


It’s amazing how taking time to slow down and approach things in a measured manner can lead to positive results.

It’s not at all surprising that most of the popular conversation about these tools is akin to randomly bashing into walls while attempting to push the peg into whatever “moment we need to talk about”.

What is again surprising is that HN is primarily overrun with randomly bashing into walls.

I guess I’m normally in threads about C memory arenas, a topic that probably draws more detailed thinkers in the first place.


My take: Because GPT is just stochastically stringing words after each other, it is remarkably good at producing text on par with other text available on the internet. So it can produce plans, strategies, itineraries and so on. The more abstract the better. The 8 point plan is likely great.

It will much more likely fail on anything which involves precision/computation/logic. That's why it can come up with a generic strategy but fail to repeat unadjusted GAAP earnings.


I agree it's pretty good at generalities, doesn't shit the bed quite so much. Yet to suggest a plan that an entire team of professionals, who have been working for 8 months could not figure out?

It's certainly not that good, absent some amazing wizardry or some very silly professionals in a very squishy field. Yet I have no explanation for why someone would go on the internet and lie about something like that.

There were comments a while back (less so now) of people making other claims like it was solving complex functions for them and writing sophisticated software.

The entire thing baffles me. If I could get it to do that, I'd be showing you all of my marvelous works and bragging quite a bit as your newfound AI-whisperer. Hell, I'd get it to write a script for me to run that evangelized itself (edit: and me of course, as its chosen envoy to mankind) to the furthest corners of the internet!


I mean no disparagement to the OP, but maybe the team is just really bad at planning. Or perhaps they’re stretched thin and are having a hard time seeing the bigger picture.

I’m not saying such a claim doesn’t need more info, but I’ve been on teams before that lacked anyone with a good project management skillset.


"ChatGPT, think of a novel way to scam the elderly safely. Write it as an 8 points plan."


There was an article not too long ago, that I'm struggling to find, that did a great job of explaining why language models are much much better suited to reverse-engineering code than they are at forward-engineering it.


I can provide an example.

I have found ChatGPT to be a valuable tool for improving the clarity and readability of my writing, particularly in my blogs and emails. You can try this by asking questions such as "Can you improve the grammar of the following paragraphs?". You can also specify the desired tone.

It is impressive at simplifying complex technical language. Take the following sentences from a draft I wrote:

To mitigate these issues, it is recommended to simulate the effect of say n random permutations using n random hash functions (h1, h2, … hn) that map the row numbers (say 1 to k) to bucket numbers of the same range (1 to k) without a lot of collisions. This is possible if k is sufficiently large.

What ChatGPT suggested:

To address these issues, it's recommended to simulate the effect of n random permutations using n random hash functions (h1, h2, … hn). These hash functions should map the row numbers (from 1 to k) to bucket numbers within the same range (1 to k) with minimal collisions. This is achievable if the range k is large enough.
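
For anyone curious what that sentence looks like in code, here is a rough sketch. It assumes the draft is describing MinHash-style signatures (my reading of the context, not stated in the excerpt), and the hash family `((a*r + b) mod P) mod k + 1`, the prime `P`, and all function names are illustrative choices of mine, not anything from the draft.

```python
import random

# Assumption: a fixed prime larger than any row number we will hash.
P = 2_147_483_647  # Mersenne prime 2**31 - 1; requires k < P


def make_hash_functions(n, k, seed=0):
    """Build n random hash functions mapping rows 1..k to buckets 1..k."""
    rng = random.Random(seed)
    funcs = []
    for _ in range(n):
        a = rng.randrange(1, P)  # non-zero multiplier
        b = rng.randrange(0, P)  # random offset
        # Bind a and b as defaults so each lambda keeps its own pair.
        funcs.append(lambda r, a=a, b=b: ((a * r + b) % P) % k + 1)
    return funcs


def minhash_signature(rows, hash_functions):
    """For each hash function, keep the smallest bucket over the given rows."""
    return [min(h(r) for r in rows) for h in hash_functions]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Each hash function stands in for one random permutation of the rows; with k large relative to the number of rows, collisions are rare enough that the minimum bucket behaves roughly like "the first row under that permutation".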


Try Grammarly. It's extremely good at this, and with an incredible UX.


It replaces Grammarly (I also don't want that keylogger spyware running anywhere near my systems) entirely and provides additional features. Can Grammarly also write the Haikus necessary to make me chuckle?


Do you have evidence backing up the claims of “keylogger spyware”? Is it less of a keylogger or spyware than chatgpt in any way?

It obviously doesn’t write hilarious haikus. It doesn’t write anything and that’s the point. It suggests improvements on what you write.


It has to be a keylogger to do what it does.

https://www.kolide.com/blog/is-grammarly-a-keylogger-what-ca...


Yes, I've been using Grammarly for several years now. I still use it in conjunction with ChatGPT. It's efficient in correcting spelling and basic grammar errors. However, more advanced features are only available to premium users. At present, their $12/m fee is a bit steep for me.


The more advanced features of chatgpt are $20/m as I’m sure you’re aware.

What do you get out of chatgpt in this realm? I feel very annoyed by its constant tropes and predictable style. Is that something you don't need to care about?


Interesting. I've wondered how useful the AI stuff added to Microsoft Office would be. Does that mean that there will be a "make my grammar better" button like in the example above?


Are you referring to Microsoft Editor (https://www.microsoft.com/en-us/microsoft-365/microsoft-edit...)? It appears to be an interesting tool - includes tone, spelling, and grammar suggestions. I have yet to try it myself.


This reminds me of chain letters of old. "This guy ignored the letter, then his house burned down. But he found the letter back, sent it to 50 people, and lo and behold he won the lottery the very next day and was able to build a better house."


When prompted with a post dripping in snark, who aside from a masochist with nothing better to do is going to produce examples so they can be nitpicked to death? Posts like yours do not come off like wanting a discussion, they come off like angling for a fight.

Meanwhile, my sidebar of chat history is about five pages long and ever-growing. Quite a lot of my scripting in the past few weeks has been done with ChatGPT's help. So on one hand I have the angry skeptics who practically scream that it is not doing the things I can see it doing, who appear to be incapable of discussing the topic without resorting to reductive disparagement, and on the other hand I can see the tasks it's accomplishing for me.

Guess what anyone in my position is going to do?


> My hunch is the people that are truly irked by this are too deep / close to the subject and because it doesn't match up with what they've worked on, studied, invested time, mental energy into, well then of course it's hot garbage and 'bad'.

That's quite the straw man you've built. Recognizing the limitations of a technology is not the same as calling it hot garbage.

As a language model it's amazing, but I agree with the GP. It's not intelligent. It's very good at responding to a series of tokens with its own series of tokens. That requires a degree of understanding of short scale context that we haven't had before in language models. It's an amazing breakthrough.

But it's also like attributing the muscle memory of your hand to intelligence. It can solve lots of problems. It can come up with good configurations. It is not, on its own, intelligent.


> It seems like a LOT of people think it's a replacement for Google/search engines

Well, that "lot" includes the highest levels of management from Microsoft and Google, so maybe the CAPS are justified. And the errors we're talking about here are errors produced by said management during demos of their own respective product. You would think they know how to use it "correctly".


I'm going to let you in on a secret: managers, even high-level ones, can be wrong - and indeed they frequently are.


Thanks for that unique insight.

But the question is, are they wrong in that they don't know how to use / promote an otherwise good product, or are they wrong because they are choosing to put forward something that is completely ill-suited for the task?


Both.


"Just don't use it" is not salient advice for non-technical people who don't know how it works, and are misled by basically dishonest advertising and product packaging. But hopefully the market will speak, users at large will become educated about its limits via publicized blunders, and these products will be correctly delimited as "lies a lot but could be useful if you are able/willing to verify what it says."


I think the original sentence was written more in the vein of a "Your loss is my gain" competitive advantage. The real trick is, as you say, to critically assess the output, and many people are incapable of that.


I feel similarly reading many critiques, but honestly the GP is one of the more measured ones that I've read - not sure that your comment is actually all that responsive or proportionate.


> Maybe it's more about using it to reason through your problem in a dialog, and not just ask it something you can google/duckduckgo.

Your experience with it sounds very similar to my own. It exhibits something like on-demand precision; it's not a system with some fundamental limit to clarity (like Ted Chiang via his jpeg analogy, and others, have argued): it may say something fuzzy and approximate (or straight up wrong) to begin with but—assuming you haven't run into some corner where its knowledge just bottoms out—you can generally just tell it that it made a mistake or ask for it to elaborate/clarify etc., and it'll "zoom in" further and resolve fuzziness/incorrect approximation.

There is a certain very powerful type of intelligence within it as well, but you've got to know what it's good at to use it well: from what I can tell it basically comes down to it being very good at identifying "structural similarity" between concepts (essentially the part of cognition which is rooted in analogy-making), allowing it to very effectively make connections between disparate subject matter. This is how it's able to effectively produce original work (though typically it will be directed there by a human): one of my favorite examples of this was someone asking it to write a Lisp program that implements "virtue ethics" (https://twitter.com/zetalyrae/status/1599167510099599360).

I've done a few experiments myself using it to formalize bizarre concepts from other domains and its ability to "reason" in both domains to make decisions about how to formalize, and then generating formalizations, is very impressive. It's not enough for me to say it is unqualifiedly "intelligent", but imo its ability to do this kind of thing makes it clear why calling it a search engine, or something merely producing interpolated averages (a la Chiang), is so misleading.


Just to flip this around for a second, with both of your examples, it sounds like you may have a problem with writer's block or analysis paralysis, and ChatGPT helped you overcome that simply due to the fact that it isn't afraid of what it doesn't know. If that helps you, go for it.

On the other hand, it could also help you to just write a random plan or try a few random things when you get stuck, instead of trying to gaze deeply into the problem for it to reveal its secrets.


> Yeah, I read this sentiment all the time and here's what I always say – just don't use it. Leave it to the rest of us if it's so wrong / off / bad.

If it were only a matter of private, individual usage, I'd be fine with it. If that's all you're asking for, we can call it a deal. But it isn't, is it?


> THEN last night I got stuck on a funny syntax issue that googling could never find the answer. I got into a conversation with ChatGPT about it, and after it first gave me the wrong answer, I told it that I need this solution for the latest dontet library that follows the 'core' language syntax. It apologized! And then gave me the correct answer…

Great that it worked for you. I had a similar problem (Google had no answer), but more complex than a syntax issue. I'm also a domain expert in what I was asking, and ChatGPT also gave me a wrong answer the first time, then apologized and gave me a wrong answer again. I explained what was wrong and it did it again and again, never providing a correct answer, so I just gave up and used my human brain. Seems like your problem was in distribution.


Don't like chlorofluorocarbons or tetraethyllead? Just don't use them.


I mean sure.

In other news I asked it to make a list of all the dates in 2023 that were neither weekends nor US federal holidays and it left Christmas Day on the list.
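
For comparison, this is the sort of thing a few lines of script get right every time. A minimal sketch: the holiday set is hand-entered as the 2023 federal holidays on their observed dates (New Year's Day and Veterans Day fell on a weekend that year), and `working_days` is just a name I picked.

```python
from datetime import date, timedelta

# US federal holidays in 2023, on the dates they were observed.
FEDERAL_HOLIDAYS_2023 = {
    date(2023, 1, 2),    # New Year's Day (observed; Jan 1 was a Sunday)
    date(2023, 1, 16),   # Martin Luther King Jr. Day
    date(2023, 2, 20),   # Washington's Birthday
    date(2023, 5, 29),   # Memorial Day
    date(2023, 6, 19),   # Juneteenth
    date(2023, 7, 4),    # Independence Day
    date(2023, 9, 4),    # Labor Day
    date(2023, 10, 9),   # Columbus Day
    date(2023, 11, 10),  # Veterans Day (observed; Nov 11 was a Saturday)
    date(2023, 11, 23),  # Thanksgiving Day
    date(2023, 12, 25),  # Christmas Day
}


def working_days(year, holidays):
    """Return every date in `year` that is a weekday and not a holiday."""
    day = date(year, 1, 1)
    result = []
    while day.year == year:
        # weekday() is 0..4 for Monday..Friday, 5 and 6 for the weekend.
        if day.weekday() < 5 and day not in holidays:
            result.append(day)
        day += timedelta(days=1)
    return result


days_2023 = working_days(2023, FEDERAL_HOLIDAYS_2023)
print(date(2023, 12, 25) in days_2023)
```

The point is not that the script is clever, only that a set-membership check cannot "forget" Christmas the way a text predictor can.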


Yea, I think people hide “the magic smoke” by using complex queries and then filling in the gaps of chatGPT’s outputs with their own knowledge, which then makes them overvalue the output. Strip that away to simple examples like this and it becomes more clear what’s going on. (I think there IS a lot of value for them in their current state because they can jog your brain like this, just not to expect it to know how to do everything for you. Think of it as the most sophisticated rubber duck that we’ve made yet).


I don't understand this take. These LLM-based AIs provide demonstrably incorrect answers to questions, they're being mass-marketed to the entire population, and the correct response to this state of affairs is "Don't use it if you don't know how"? As if that's going to stop millions of people from using it to unknowingly generate and propagate misinformation.


Isn't that what people said about Google Search 20 years ago- that people won't know how to use it, that they will find junk information, etc. And they weren't entirely wrong, but it doesn't mean that web search isn't useful.


No, I don't recall anyone saying that. They mostly said "this is amazingly effective at finding relevant information compared to all other search engines." Google didn't invent the Web, so accusing it of being responsible for non-factual Web content would have been a strange thing to do. Bing/Chat-GPT, on the other hand, is manufacturing novel non-factual content.


Can you share any source for the claim about what people said about Google Search?


That’s a good point. I don’t think anyone is denying that GPT will be useful though. I’m more worried that because of commercial reasons and public laziness / ignorance, it’s going to get shoehorned into use cases it’s not meant for and create a lot of misinformation. So a similar problem to search, but amplified


There are some real concerns for a technology like ChatGPT or Bing's version or whatever AI. However, a lot of the criticisms are about the inaccuracy of the model's results. Saying "ChatGPT got this simple math wrong" isn't as useful or meaningful of a criticism when the product isn't being marketed as a calculator or some oracle of truth. It's being marketed as an LLM that you can chat with.

If the majority of criticism was about how it could be abused to spread misinformation or enable manipulation of people at scale, or similar, the pushback on criticism would be less.

It's nonsensical to say that ChatGPT doesn't have value because it gets things wrong. What makes much more sense is to say is that it could be leveraged to harm people, or manipulate them in ways they cannot prevent. Personally, it's more concerning that MS can embed high-value ad spots in responses through this integration, while farming very high-value data from the users, wrt advertising and digital surveillance.


> It's being marketed as an LLM that you can chat with.

... clearly not, right? It isn't just being marketed to those of us who understand what an "LLM" is. It is being marketed to a mainstream audience as "an artificial intelligence that can answer your questions". And often it can! But it also "hallucinates" totally made up BS, and people who are asking it arbitrary questions largely aren't going to have the discernment to tell when that is happening.


Great write up. My experience is spot on with your examples.

> I've noticed that it's not obvious to most people that it's mostly about knowing the domain well enough to ask the right question(s). It's not magic, it won't think for you.

Absolutely right with the part of knowing the domain.

I do not entertain or care about the AI fantasies because ChatGPT is extremely good at getting me other information. It saves me from opening a new tab, formulating my query and then hunting for the information. I can save that extra time for what latest / relevant information I should grab from Google.

Google is still in my back pocket for the last mile verification and judgement. I am also skeptical of the information ChatGPT throws out (such as old links). Other than that, ChatGPT to me is as radical as putting the url and search bar into one input. I just move faster with the information.


When did they say it’s garbage? They gave their opinions on its shortcomings and praised some of the things it excels at. You’re calling the critics too emotional but this reply is incredibly defensive.

Your anecdotes are really cool and a great example of what GPT can do really well. But as a technical person, you’re much more aware of its limitations and what is and isn’t a good prompt for it. But as it is more and more marketed to the public, and with people already clamoring to replace traditional search engines with it, relying on the user to filter out disinformation well and not use it for prompts it struggles with isn’t good enough.


I too have a very positive experience. I ask specific questions about algorithms and how technical projects work and I enjoy its answers. They won’t replace my need to visit a real search engine, nor do I take them at face value. But as a starting point for any research I think it’s an amazing tool. It’s also quite good for marketing stuff, like writing e-mails, cover letters, copy for your website, summarizing or classifying text, and all language related stuff.

People think it’s Cortana from Halo and ask existential questions or they’re trying to get it to express feelings.

I think the AI part of its presentation created too many expectations of what it can do.


This doesn't seem like a response to your parent comment, which in no way suggested they were "irked" by this or consider it bad. It was an insightful comment contrasting strengths and weaknesses of these language models. It's a pretty weak rebuttal in my view to just say "there are no weaknesses, you're just doing it wrong!".


I’d really love to hear more about your workplace use-case, what kind of systems are we talking about here?

This is a way of using ChatGPT I haven’t really seen before, I’m really into it.


If only a small subset of people online are able to truly take advantage of ChatGPT, then I don't think Google is as threatened by it as many have portrayed.


I imagine your first example includes private industry information that you are not allowed to divulge.

But your latter example about syntax… mind sharing that ChatGPT conversation?


Sentient AIs in science fiction are always portrayed as being more-or-less infallible, at least when referencing their own knowledge banks.

Then ChatGPT comes along and starts producing responses good enough that people feel like it's almost a sentient AI. And they suddenly start expecting it to share the infallibility that fictional AIs have always possessed.

But it's not a sentient AI. It's just a language model. Just a beefed up auto-correct. I'm very impressed by just what capabilities a language model gets when you throw this many resources at it (like, it seems to be able to approximate logic and arithmetic to decent accuracy, which is unexpected).

Also... even if it was a sentient AI, why would it be infallible? Humans are sentient, and nobody ever accused us of being infallible.


>But it's not a sentient AI. It's just a language model. Just a beefed up auto-correct.

There is a large space between "sentient" and "beefed up autocorrect". Why do people insist on going for the most reductive description they can muster?


Don't mistake my "reductive description" as disapproval. I'm actually really impressed and I see a bright future.

But I think it's really important that we don't give GPT3 more credit than it deserves.

All the discourse around GPT3 and its derivatives like Github copilot or ChatGPT has shown that people (even tech literate people) have a strong bias towards anthropomorphising it as some kind of "proto-sentient" AI.

In my opinion, this bias is actually very damaging towards the reputation of language models. People start expecting way too much from them, and then feeling confused or even betrayed when the language models start confidently spouting bullshit.

Also, I don't think "beefed up autopredict" is reductive at all. (Though I might have said "beefed up autocorrect" in my previous comment, whoops). The 10,000 ft view of its runtime architecture is identical to autopredict. You take some input context, encode it into tokens and feed them into a neural network, to get a prediction of the next few words.

The innovation in GPT3 has nothing to do with moving away from that basic architecture.

The innovation is all about improving the encoding of tokens, changing the internal architecture of the neural network to support larger models and make better predictions, improving the training process and finally the training data itself. They also massively increased the size of both the training set and the model... but I'm not sure that counts as innovation.

IMO, GPT3 is actually way more impressive once you start thinking of it as nothing more than a beefed up autopredict.
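
To make the "context in, next few words out" loop concrete, here is a toy sketch. Big caveat: it swaps the neural network for a plain bigram count table, so it only illustrates the shape of the predict-and-feed-back loop, not how GPT3 or any real autopredict is implemented. Tokenising by whitespace and the function names are my own simplifications.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every token, which tokens follow it in the training text."""
    tokens = text.lower().split()
    model = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        model[current][following] += 1
    return model


def predict_next(model, context, n_words=3):
    """Greedily extend the context by up to n_words predicted tokens."""
    tokens = context.lower().split()
    if not tokens:
        return []
    output = []
    current = tokens[-1]  # a bigram model only looks at the last token
    for _ in range(n_words):
        candidates = model.get(current)
        if not candidates:
            break  # never saw this token during training
        # Pick the most frequent follower, then feed it back in as context.
        current = candidates.most_common(1)[0][0]
        output.append(current)
    return output


model = train_bigram_model("the cat sat on the mat the cat ran")
print(predict_next(model, "on the", n_words=1))
```

A real model conditions on the whole context rather than one token and outputs a probability distribution from a network instead of raw counts, but the outer loop has the same structure.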


Correct. It is indeed beefed up auto predict


It is beefed up auto predict in the same way that brains are beefed up electrochemical integrators. While technically true, it is maximally uninsightful. We should be aiming for maximum insight, not the opposite.


I get the impression that you are massively underselling auto-predict.

If you took GPT3's architecture and scaled it down to the size and training set of a typical auto-predict, it would produce near identical results. You wouldn't be able to tell the two apart.

Likewise, if we took an auto-predict architecture from 8 years ago, scaled it up to the size of GPT3 and could train it on GPT3's training set, it would produce similar output to GPT3 and we would see the exact same emergent intelligence capabilities. (Though it's probably not possible to complete training in a practical time-frame; the real innovation of GPT3 was optimising the architecture to make training such a large model practical.)

I think it's very insightful to point out just how similar the two are, because it shows the capabilities of language models are not because of any architectural element, but are emergent from the model and its training data.

It also makes me excited for what will happen when they move beyond the "beefed up auto-predict" architecture. (Arguably, Bing has taken a small step in this direction by bolting a search engine onto it)


>If you took GPT3's architecture and scaled it down to the size and training set of a typical auto-predict, it would produce near identical results.

This is almost certainly not true. The number of parameters is an important feature related to the quality of output. If you scaled the architecture down significantly, it would be significantly less capable[1]. But perhaps I misunderstand your point.

>Likewise, if we took an auto-predict architecture from 8 years ago, scaled it up to the size of GPT3 and could train it on GPT3's training set, it would produce similar output to GPT3

This is also not true. The transformer is a key piece in the emergent abilities of language models. The difficulties in scaling RNNs are well known. Self-supervised learning is powerful, but it needs to be paired with a flexible architecture to see the kinds of gains we see with LLMs.

Stacked Transformers with self-attention are extremely flexible in finding novel circuits in service to modelling the training data. The question is how to characterize this model in a way that doesn't short-sell what it is doing. Reductively describing it in terms of its training regime is just to treat the resulting model as explanatorily irrelevant. But the complexity and the capabilities are in the information dynamics encoded in the model parameters. The goal is to understand that.

[1] https://arxiv.org/abs/2001.08361


> If you scaled the architecture down significantly, it would be significantly less capable[1]. But perhaps I misunderstand your point.

No, my point is that if you scaled a transformer based architecture, down to the equivalent parameter size and training set of a typical 2015 era auto-predict, it would produce near identical results to a 2015 era auto-predict.

> The difficulties in scaling RNNs are well known

The scaling issues in training RNNs are completely irrelevant to my point.

Transformers are computationally equivalent to RNNs. It's possible to convert a pre-trained Transformer model into an RNN [1]. There is nothing magical about the Transformer architecture that makes it better at generation.

[1] https://arxiv.org/abs/2103.13076


>it would produce near identical results to a 2015 era auto-predict.

I don't know that this is true, but it is plausible enough. But the benefit of Transformers is that they are stupid easy to scale. It is in scale that they are able to perform so remarkably across so many domains. Comparing the function of underparameterized versions of the models and concluding that some class of models are functionally equivalent due to their equivalent performance in underparameterized regimes is a mistake. The value of an architecture is in its practical ability to surface functional models. In theory, an MLP with enough parameters can model any function. But in reality, finding the model parameters that solve real world problems becomes increasingly difficult. The inductive biases of Transformers are crucial in allowing them to efficiently find substantial models that provide real solutions. The Transformer architecture is doing real substantial independent work in the successes of current models.


Because the average person you speak to would consider beefed up autocorrect to be near magic as it is. Once you get near to the limits of an individual's comprehension, adding more incomprehensible statements/ideas doesn't really change much, their answer is still 'magic'.


Heuristics, aka HumanGPT.


The lack of consistency is a big issue. It may well be able to organize your trip to Mexico, but then it tells me that "the product of two primes must be prime because each factor is prime" ... how will one ever trust it? Moreover, how to use it?

If a Tesla can get you there with 1% human intervention, but that happens to be the 1% that would have killed you had you not intervened ... how do we interface with such systems?
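
For the record, the quoted prime claim is not just wrong in some cases, it is never true: a product p*q of two primes always has p as a proper divisor. A quick brute-force check over small primes (function and variable names are mine):

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


primes = [p for p in range(2, 50) if is_prime(p)]

# Collect every product of two primes (p may equal q) that is itself prime.
prime_products = [p * q for p in primes for q in primes if is_prime(p * q)]

print(prime_products)  # stays empty: p divides p*q and 1 < p < p*q
```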


> It does not do analyses

I find interacting with ChatGPT strangely boring. And Copilot is neat but I'm not blown away by it. However... just for laughs I threw some obfuscated genetic algorithm code I'd written at ChatGPT and asked it to guess what the code did. It identified the purpose of the code and speculated on the meaning of certain parameters that weren't clear in the sample I'd presented it. Pretty impressive.

I also showed it some brainfuck code for generating a Mandelbrot set, and it immediately identified it. From that point forward, though, it thought all other brainfuck code generated Mandelbrot sets.


There's more than comprehension. It can do some amount of higher order reasoning: https://arxiv.org/abs/2302.02083

"Theory of mind (ToM), or the ability to impute unobservable mental states to others, is central to human social interactions, communication, empathy, self-consciousness, and morality. We administer classic false-belief tasks, widely used to test ToM in humans, to several language models, without any examples or pre-training. Our results show that models published before 2022 show virtually no ability to solve ToM tasks. Yet, the January 2022 version of GPT-3 (davinci-002) solved 70% of ToM tasks, a performance comparable with that of seven-year-old children. Moreover, its November 2022 version (davinci-003), solved 93% of ToM tasks, a performance comparable with that of nine-year-old children. These findings suggest that ToM-like ability (thus far considered to be uniquely human) may have spontaneously emerged as a byproduct of language models' improving language skills."


We're just seeing the standard hype cycle. We're in the "Peak of Inflated Expectations" right now. And a lot of people are tumbling down into the "Trough of Disillusionment"

Behind all the hype and the froth there are people who are finding uses and benefits - they'll emerge during the "Slope of Enlightenment" phase and then we'll reach the "Plateau of Productivity".


Except what’s not conveyed here is social pushback due to any perceived harms. Look at social media and “big tech” - the world, despite high usage, is now casting these as social ills ready for dismantling. This tech cycle is more appropriate when the potential to erode the social fabric is much less. The forces here will be too strong.


I agree completely with the first part of your post. However, I think even performing these language games should definitely be considered AI. In fact, understanding natural language queries was considered for decades a much more difficult problem than mathematical reasoning. Issues aside, it's clear to me we are closer to solving it than we ever have been.


Sorry, I didn't mean that LLMs are not a subset of AI. They clearly are. What they are not is equal to AI; there are things that are AI that are not LLMs.

It is obvious when I say it, but my internal language model (heh) can tell a lot of people are not thinking that way when they speak, and the latter is often more reliable than how people claim they are thinking.


I think the problem here is in a classification of what is ( I ) in the first place. For us to answer the question of what equals AI we must first answer the question of what equals human intelligence in a self consistent, logical, parsable manner.


I think the ultimate problem with AI is it's overvalued as a technology in general. Is this "amazing level of comprehension" really that necessary given the amount of time/money/effort devoted to it? What's become clear with this technology that's been inaccurately labeled as "AI" is that it doesn't produce economically relevant results. It's a net expense any way you slice it. It's like seeing a magician perform an amazing trick. It's both amazing and entirely irrelevant at the same time. The "potential" of the technology is pure marketing at this point.


I mean, I take it that if I stuck you back in 1900 you'd say the same about flying. "Look at all this wasted effort for almost nothing". And then pretty quickly the world rapidly changed and in around 50 years we were sending things to space.

Intelligence isn't just one thing, really I would say it's the emergent behavior of a bunch of different layers working together. The LLM being just one layer. As time goes on and we add more layers to it the usefulness of the product will increase. At least from a selfish perspective of a corporation, whoever can create one of these intelligences may have the ability to free themselves of massive amounts of payroll by using the AI to replace people.

The potential of AI should not be thought of any differently than the potential of people. You are not magic, just complicated.


I don't get the point of comparing apples to make a point about oranges. Flying isn't AI. Nor is "progress" a permanent state. If you want to stay in the flying comparison: in 2000 you could fly from NY to Paris in 3 hours on a Concorde, something no longer possible in 2023. Why? Because economics made it unfeasible to maintain. Silicon Valley has made enough promises using "emergent" behavior and other heuristics to justify poor investments. Unfortunately it's taken out too many withdrawals from its bank of credibility and there's not enough to cloud their exit schemes in hopes and dreams.


And yet every day we still fly faster and farther than the objects designed by evolution. And much like in evolution, some creations go extinct.

Promises are always meaningless, progress is in the results, and recently LLMs have been giving us results. You can choose to invest or not invest, but in an environment where there is still a lot of investment money around I don't see work on this stopping any time soon.


It seems to me it is really good at writing. I would think it could replace the profession of technical writing for the most part, it could help you write emails (bring back clippy MS you cowards), it could be used as a frontend to an FAQ/self service help type system.


Have you read the article? You'd have to have 100% faith in the tech to allow it to side-step an actual person. Unless your site is purely a click-farm, you're still probably hiring someone to check it--so what's the point of having it?


While I agree with everything you've said, I also see that steady, incremental progress is being made, and that as we identify problems, we're able to fix it. I also see lots of money being thrown at this and enough people finding genuine niche uses for this that I see it continuing on. Wikipedia was trash at first, as were so many other technologies. But there was usually a way to slowly improve it over time, early adopters to keep the cash flowing, identifiable problems with conventional solutions, etc.


> new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over

Good thing is that we are dealing with exactly same type of code here and there for tens of years already. Actually, every time I see a commercial codebase not exactly like a yarn of spaghetti, I thank gods for it, because it is not a rule, but an exception.

What I really wonder is what it will be like when the next version of the same system will be coded from the ground up by next version of the same ML model?


"babble in a confident manner"

OK, so we figured out how to automate away management jerks. Isn't that a success?


> the language portion of your brain does not do logic

This seems ... Wrong? I suppose that most of what we generally call high-level logic is largely physically separate from some basic functions of language, but just a blanket statement describing logic and language as two nicely separate functions cannot be a good model of the mind.

I also feel like this goes to the core of the debate, is there any thought going on or is it just a language model; I'm pretty sure many proponents of AI believe that thought is a form of very advanced language model. Just saying the opposite doesn't help the discussion.


I’m not sure whether the hype train is going to crash, or whether only a few very smart companies, using language models for what they’re really good at (aka: generate non-critical texts), will manage to revolutionize one field.

We’re at the very first beginning of the wave, so everybody is a bit overly enthusiastic, dollars are probably flowing, and ideas are popping everywhere. Then will come a harsh step of selection. The question is what will the remains look like, and how profitable they’ll be. Enough to build an industry, or just niche.


> GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses.

I like this analogy as a simple explanation. To dig in though, do we have any reason to think we can’t teach a LLM better logic? It seems it should be trivial to generate formulaic structured examples that show various logical / arithmetic rules.
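
To make "formulaic structured examples" concrete, here is a minimal sketch of what such a generator could look like. Everything in it (the function names, the `FACTS` list, the two templates) is a made-up illustration, not anything an actual training pipeline is known to use, and it says nothing about whether training on such text would actually teach a model the rules.

```python
import random

# Hypothetical propositions used to fill the logic template.
FACTS = ["it is raining", "the ground is wet", "the light is on", "the door is open"]

def arithmetic_example(rng: random.Random) -> str:
    """One line of the form 'a op b = result' with a correct result."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    op = rng.choice(["+", "-", "*"])
    result = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"{a} {op} {b} = {result}"

def modus_ponens_example(rng: random.Random) -> str:
    """One instance of modus ponens: from 'if P then Q' and 'P', conclude 'Q'."""
    p, q = rng.sample(FACTS, 2)
    return f"If {p}, then {q}. {p.capitalize()}. Therefore, {q}."

def make_dataset(n: int, seed: int = 0) -> list[str]:
    """Mix both templates into n synthetic training lines (deterministic per seed)."""
    rng = random.Random(seed)
    generators = [arithmetic_example, modus_ponens_example]
    return [rng.choice(generators)(rng) for _ in range(n)]

if __name__ == "__main__":
    for line in make_dataset(5):
        print(line)
```

Generating the text really is the trivial part; the open question in this thread is whether a model trained on it learns the rule or just the surface pattern.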

Am I thinking about it right to envision that a deep NN has free parameters to create sub-modules like a “logic region of the brain” if needed to make more accurate inference?


"To dig in though, do we have any reason to think we can’t teach a LLM better logic?"

Well, one reason is that's not how our brains work. I won't claim our brains are the one and only way things can work, there's diversity even within human brains, but it's at least a bit of evidence that it is not preferable. If it were it would be an easier design than what we actually have.

I also don't think AIs will be huge undifferentiated masses of numbers. I think they will have structure, again, just as brains do. And from that perspective, trying to get a language model to do logic would require a multiplicatively larger language model (minimum, I really want to say "exponentially" but I probably can't justify that... that said, O(n^2) for n = "amount of math understood" is probably not out of the range of possibility and even that'd be a real kick in the teeth), whereas adjoining a dedicated logic module to your language model will be quite feasible.

AIs can't escape from basic systems engineering. Nothing in our universe works as just one big thing that does all the stuff. You can always find parts, even in biology. If anything, our discipline is the farthest exception in that we can build things in a fairly mathematical space that can end up doing all the things in one thing, and we consider that a serious pathology in a code base because it's still a bad idea even in programming.


This all matches my intuition as a non-practitioner of ML. However, isn’t a DNN free to implement its own structure?

Or is the point you’re making that full connectivity (even with ~0 weights for most connections) is prohibitively expensive and a system that prunes connectivity as the brain does will perform better? (It’s something like 1k dendrites per neuron max right?)

The story of the recent AI explosion seems to be the surprising capability gains of naive “let back-prop figure out the structure” but I can certainly buy that neuromorphic structure or even just basic modular composition can eventually do better.

(One thought I had a while ago is a modular system would be much more amenable to hardware acceleration, and also to interpretability/safety inspection, being a potentially slower-changing system with a more stable “API” that other super-modules would consume.)


> do we have any reason to think we can’t teach a LLM better logic?

I'll go for a pragmatic approach: the problem is that there is no data to teach the models cause and effect.

If I say "I just cut the grass" a human would understand that there's a world where grass exists, it used to be long, and now it is shorter. LLMs don't have such a representation of the world. They could have it (and there's work on that) but the approach to modern NLP is "throw cheap data at it and see what sticks". And since nobody wants to hand-annotate massive amounts of data (not that there's an agreement on how you'd annotate it), here we are.


I call this the embodiment problem. The physical limitations of reality would quickly kill us if we didn't have a well formed understanding of them. Meanwhile AI is stuck in 'dream mode', much like when we're dreaming we can do practically anything without physical consequence.

To achieve full AI, I believe we will eventually have to give our AIs a 'real world' set of interfaces to bounds-check information.


> We are so amazed by its ability to babble in a confident manner

Sure, we shouldn't use AI for anything important. But can we try running ChatGPT for George Santos's seat in 2024?


To add to your point, current technology does not even suggest if we will ever have such an AI. I personally doubt it. Some evidence: https://en.wikipedia.org/wiki/Entscheidungsproblem.

This is like trying to derive the laws of motion by having a computer analyze 1 billion clips of leaves fluttering in the wind.


Who’s to say that a large language model is fundamentally incapable of learning some kind of ability to reason or apply logic?

Fundamentally, our brains are not so different, in the sense that we are also not applying some kind of automated theorem solver directly. We get logic as an emergent behavior of a low-level system of impulses and chemical channels. Look at kids, they may understand simple cause and effect, but gradually learn things like proof by contradiction (“I can’t have had the candy because I was in the basement”). No child is born able to apply logic in a way that is impressive to adults - and many adults are not able to apply it well either.

I don’t think LLMs are going to automatically become super-human logicians capable of both complex mathematical proofs and composing logically consistent Homeric epics, but to me there is no reason they could not learn some kind of basic logic, if only because it helps them better model what their output should be.


Really liked your analogy on GPT being similar to the language center of the brain. Almost all current methods to teach GPT deductive logic have been through an inductive approach: giving it training examples on how to do deduction. The thing is, it might be possible to reach 80% of the way there with more data and parameters, but a wall will be hit sooner or later.


The combination of natural language communication with search and planning is very exciting. Although it has been overshadowed by the popularity of ChatGPT, [1] demonstrates the capability of using intents generated through a strategic planning stage to drive convincing human-like dialogue (as tested in blitz Diplomacy).

I'm really interested in the creation of human-compatible agents. As you mention, these agents will likely be composed of multiple components which have specialised functionality.

[1] "Human-level play in the game of Diplomacy by combining language models with strategic reasoning", explores the integration of language models with strategic reasoning. https://www.science.org/doi/10.1126/science.ade9097


I have bookmarked your comment and I hope to have the discipline to come back to it every 3 months or so for the next couple of years. Because I think you are right but I didn't notice it before. When the real things come, we will probably be blindsided.


I hearken back before dot-bomb and occasionally people would ask me to work on "web sites" which they'd built with desktop publishing software (e.g. ColdFusion).

They'd hand me the code that somebody would've already hacked on. Oftentimes, it still had the original copyright statements in it. Can't get the toothpaste back in the tube now! Plus it's shitcode. Where is that copy of ColdFusion? Looks of complete dumbfoundment.

Oh gee kids, my mom's calling me for lunch; gotta go!


These points are all great.

I'm not sure I understand your point about everyone thinking GPT is real AI. Even luddites don't think that.

I haven't met anyone who thinks that GPT is the end of AI (maybe the beginning).

People are excited about GPT because it offloads certain tasks well enough and accurately enough that it's virtually the first practical use of such tech in the world.

That alone is worth being excited about


> "We are so amazed by its ability to babble in a confident manner"

But we do this with people - religious leaders, political leaders, 'thought' leaders, venture capitalists, story tellers, celebrities, and more - we're enchanted by smooth talkers, we have words and names for them - silver tongued, they have the gift of the gab, slick talker, conman, etc. When a marketing manager sells a CEO on cloud services, and neither of them know what cloud services are, you can argue that it should matter but it doesn't actually seem to matter. When a bloke on a soapbox has a crowd wrapped around their finger, everyone goes home after and the most common result is that the feeling fades and nothing changes. When two people go for lunch and one asks 'what's a chicken fajita?' and the other says 'a Spanish potato omelette' and they both have a bacon sandwich and neither of them check a dictionary, it doesn't matter.

Does it matter if Bing Chat reports Lululemon's earnings wrongly? Does it matter if Google results are full of SEO spam? It "should" matter but it doesn't seem to. Who is interested enough in finances to understand the difference between "the unadjusted gross margin" and "The gross margin adjusted for impairment charges" and the difference matters to them, and they are relying exclusively on Bing Chat to find that out, and they can't spot the mistake?

I suspect that your fears won't play out because most of us go through our lives with piles of wrong understanding which doesn't matter in the slightest - at most it affects a trivia quiz result at the pub. People with life threatening allergies take more care than 'what their coworker thinks is probably safe'. We're going to have ChatGPT churn out plausible sounding marketing material which people don't read. If people do read it and call, the call center will say "sorry that's not right, yes we had a problem with our computer systems" and that happens all the time already. Some people will be inconvenienced, some businesses will suffer some lost income, society is resilient and will overall route around damage, it won't be the collapse of civilisation.


> When a bloke on a soapbox has a crowd wrapped around their finger, everyone goes home after and the most common result is that the feeling fades and nothing changes.

I mean, until the crowd decides to follow the bloke and the bloke says "Let's kill all the ____" and then we kick off a new world war...


I'm waiting for the legal case that decides if AI generated content is considered protected speech or not.


> So people are going to be even more blindsided when someone develops an AI that uses GPT as its language comprehension component

I don't think that would work, because GPT doesn't actually comprehend anything. Comprehension requires deriving meaning, and GPT doesn't engage with meaning at all. It predicts which word is most likely to come next in a sequence, but that's it.

What I think we'd be more likely to end up with is something GPT-esque which, instead of simply generating text, transforms English to and from a symbolic logic language. This logic language would be able to encode actual knowledge and ideas, and it would be used by a separate, problem-solving AI which is capable of true logic and analysis—a true general AI.

The real question, IMO, is if we're even capable of producing enough training data to take such a problem-solving AI to a serious level of intelligence. Scenarios that require genuine intelligence to solve likely require genuine intelligence to create, and we'd need a lot of them.


>Comprehension requires deriving meaning, and GPT doesn't engage with meaning at all. It predicts which word is most likely to come next in a sequence, but that's it.

Why think that "engaging with meaning" is not in the solution-space of predicting the next token? What concept of meaning are you using?


You could argue that GPT has a model of meaning somewhere inside of it, but that's beside the point. If that meaning is hiding latent inside GPT, then it's not accessible to any other system which might want to use GPT as an interface. GPT accepts English as input and produces English as output; that's it.

That said, no, I don't think GPT properly grasps meaning, and my reason for that is simple: It regularly contradicts itself. It can put together words in a meaningful-looking order, but if it actually understood what those words meant as an emergent property of its design, then you wouldn't be able to trick it into saying things that don't make sense or are contradictory. If someone actually understands a subject, they won't make obvious mistakes when they discuss it; since GPT makes obvious mistakes, it can't actually grasp meaning—only brush up against it.


>since GPT makes obvious mistakes, it can't actually grasp meaning—only brush up against it.

This argument doesn't apply to GPT because it isn't a single coherent entity with a single source of truth. GPT is more like a collection of personas, which persona you get is determined by how you query it. One persona may say things that contradict other personas. Even within personas you may get contradictory statements because global consistency is not a feature that improves training performance, and can even hinder it. Samples from its training data are expected to be inconsistent.

It is important not to uncritically project expectations onto LLMs derived from our experiences with human agents. Their architecture and training regime are vastly different from those of humans, and so we should expect their abilities to manifest differently than analogous abilities in humans. We can be easily misled if we don't modify our expectations for this alien context.


I get what you mean here but they probably mean referential meaning... having never seen a dog, GPT doesn't really know what a dog is on a physical level, just how that word relates to other words.


How do blind people know what a dog is?


Probably by hearing, touch, etc. - my point is some stimulus from reality, doesn't have to be any of our senses, just some stimulus.

And the more stimuli, and the more high resolution and detailed (say at the atomic level), the more GPT's model of reality would be accurate.

Language is just symbols that stand for a stimulus (in the best case)


I think if you could somehow examine the output of your language model in isolation, you would find it also doesn't "comprehend". Comprehension is what we assign to our higher level cognitive models. It is difficult to introspectively isolate your own language center, though.

I took a stab at an exercise that may allow you to witness this within your own mind here: https://www.jerf.org/iri/post/2023/streampocalypse-and-first... Don't know if it works for anyone but me, of course, but it's at least an attempt at it.



Yes, you are correct. Oops. Too late to correct.


The language centres of our brain don't know what a dog is, but they can take the word "dog" and express it on a level that the logic centres of our brain can use. I don't know if "comprehending" is the right word, exactly, but it's transforming information from one medium to another in preparation for semantic and logical analysis.

GPT doesn't do that. What it does is related to meaning, but unlike the language comprehension parts of our brains, which are (presumably) stepping stones between language and reason, GPT doesn't connect to any reasoning thing. It can't. It's not built to interface with anything like that. It just reproduces patterns in language rather than extracting semantic meaning from them in a way that another system can use. I'm not saying that's more or less complicated—just different.


> it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means

It comprehends nothing at all. It's amazing at constructing sequences of words to which human readers ascribe meaning and perceive to be responsive to prompts.


Exactly. It is like a mouth speaking without a brain. We need a secondary "reasoning" AI that can process the GPT output further, adding time/space coordinates as well as basic logic including counting, and then maybe we'll see something I can rely on.


> We need a secondary "reasoning" AI that can process the GPT further

We also need "accountability" and "consequences" for the AI, whatever that means (we'd first have to define what "desire" means for it).

In the example from the article, the Bing GPT completely misrepresented the financial results of a company. A human finance journalist wouldn't misrepresent those results due to fear for their loss of reputation, and their desire for fame, money, and acceptance. None of those needs exist for an LLM.


To note, this is what we call the AI alignment problem.

https://www.youtube.com/watch?v=zkbPdEHEyEI


There’s a “really old” book called _On Intelligence_ that suggests modeling AI like the brain. This pattern is almost exactly what he suggests.


> it's pretty clear that GPT is producing an amazing level of comprehension of what a series of words means. The problem is, that's all it is really doing.

very key point


Incremental improvements and it getting to the point of good enough for a set of tasks but maybe not all tasks seems far more likely.


I wonder how useful gpt could be to research brain injuries where the logic or language centers are damaged individually . . .


This is an excellent perspective.


> I have come to two conclusions about the GPT technologies after some weeks to chew on this:

<sarcasm>Just 2 weeks of training data? Surely the conclusions are not final? No doubt a lot has changed over those 2 weeks?

I think the real joke is still, Q: "what is intelligence?" A: "We don't know, all we know is that you are not a good example of it".

I fear these hilarious distortions are only slightly different from those we mortals make all the time. They stand out because we would get things wrong in different ways.

> 1. We are so amazed by its ability to babble in a confident manner that we are asking it to do things that it should not be asked to do.

God, where have we seen this before? The further up the human hierarchy the more elaborate the insanity. Those with the most power, wealth and even those of us with the greatest intellect manage to talk an impressive amount of bullshit. We all do it up to our finest men.

The only edge we have over the bot is that we know when to keep our thoughts to ourselves when it doesn't help our goal.

To do an idiotic time line of barely related events which no doubt describes me better than it describes the topic:

I read about a guy who contributed much to making TV affordable enough for everyone. He thought it was going to revolutionize learning from home. Finally the audience for lectures given by our top professors could be shared with everyone around the globe!

We got the internet, the information superhighway, everyone was going to get access to the vast amount of knowledge gathered by mankind. It only took a few decades for google to put all the books on it. Or wait....

And now we got the large language models. Finally someone who can tell us everything we want to know with great confidence.

These 3 were and will be instrumental in peddling bullshit.

Q: Tell me about the war effort!

what I want to hear: "We are winning! Just a few more tanks!"

what I don't want to hear: "We are imploding the world economy! Run to the store and buy everything you can get your hands on. Cash is king! Arm yourself. Buy a nuclear bunker."

Can one tell people that? It doesn't seem in line with the bullshit we are comfortable with?

> GPT is basically the language portion of your brain. The language portion of your brain does not do logic. It does not do analyses. But if you built something very like it and asked it to try, it might give it a good go.

At least it doesn't have sinister motives (we will have to add those later)

> In its current state, you really shouldn't rely on it for anything. But people will, and as the complement of the Wile E. Coyote effect, I think we're going to see a lot of people not realize they've run off the cliff, crashed into several rocks on the way down, and have burst into flames, until after they do it several dozen times. Only then will they look back to realize what a cockup they've made depending on these GPT-line AIs.

It seems to me that we are going to have to take the high horse and claim the low road.


I think what I would add to your comment, and specifically criticize the HN hype around it, is that all these GPT "AI" tools are entirely dependent on the OpenAI API. ChatGPT might have shown a glimpse of spark by smashing two rocks together, but it is nowhere near being able to create a full-blown fire out of it.

Outside of Google and OpenAI, I doubt there is a single team in the world right now that would be capable of recreating ChatGPT from scratch using their own model.


I would love to know how much of ChatGPT is "special sauce" and how much of it is just resources thrown at the problem at a scale no one else currently wants to compete with.

I am not making any implicit claims here; I really have no idea.

I'm also not counting input selection as "special sauce"; while that is certainly labor-intensive, it's not what I mean. I mean more like, are the publicly-available papers on this architecture sufficient, or is there some more math not published being deployed?


> I doubt there is a single team in the world right now that would be capable of recreating ChatGPT from scratch using their own model.

Why not? Lack of knowhow or lack of resources? If, say, Baidu decided to spend a billion dollars on this problem, don't you think they have the skills and resources to quickly catch up?


It depends on the nature of the problem at hand.

For example, if we threw money at a group in 1905, do you think they could have come up with special relativity, or do you believe that it required geniuses working on the problem to have a breakthrough?


Meta?


(1) is just simply wrong.

People with domain expertise in software are going to be amplified 10x using ChatGPT and curating the results. Likewise with any field that ChatGPT has adequate training data in. Further models will be created that are more specialized to specific fields, so that their prediction model spews out things that are much more sophisticated and useful.


I think you're right. I noted on another thread that I got ChatGPT to produce a mostly right DNS server in ~10 minutes that it took me just a couple of corrections to make work.

It worked great for that task, because I've written a DNS server before (a simple one) and I've read the RFCs, so it was easy for me to find the few small bugs without resorting to a line by line cross-check with a spec that might have been unfamiliar to others.

I expect using it to spit out boilerplate for things you could do just as well yourself will be a lot more helpful than using it to try to avoid researching new stuff (though you might well be able to use it to help summarise and provide restatements of difficult bits to speed up your research/learning as well).


In what way is this development loop:

1. Read technology background thoroughly

2. Read technology documentation thoroughly

3. Practice building technology

4. Ask ChatGPT to create boilerplate for basic implementation

5. Analyze boilerplate for defects

10x faster than this development loop:

1. Read technology background thoroughly

2. Read technology documentation thoroughly

3. Practice building technology

4. Manually create boilerplate for basic implementation

5. Analyze boilerplate for defects


For new technologies coming out it won't be effective until newer models are made.

Notice how I said it's going to make developers with existing domain knowledge faster.

But even to your point, I've never used Excel VBA before and I had ChatGPT generate some VBA macros to move data with specific headers and labels from one sheet to another and it wrote a script to do exactly that for me in ~1 minute, and just reading what it wrote it's immediately helping me clearly understand how it works. The scripts also work.

The computer science and server infrastructure technology fundamental background is what matters. Then the implementations will be quickly understandable by those that use it.

I asked it to make a 2D fighting game in Phaser 3 and specified what animations it will be using, the controls each player will have, the fact that there's a background with X name, what each of the moves do to the momentum of each player, and the type of collisions it will do and it spat out something in ~15 minutes (mainly because of all the 'continue' commands I had to give) that gets all the major bullet points right and I just have to tweak it a bit to make it functional. The moves are simplified of course but uhh yeah. This is kinda insane. I think you can be hyper specific about even complex technology and as long as there has been good history of it online in github and stack overflow and documentation it will give you something useful quickly.

https://www.youtube.com/watch?v=pspsSn_nGzo Here's a perspective from a guy that used to work at Microsoft on every version of windows from the beginning to XP.


It isn't. My exact point was that it isn't and accordingly ChatGPT produces the best benefits for someone who has already done 1, 2, 3 for a given subject.

It was in agreement with the comment above that suggested people with domain expertise will be faster with it.

In those cases, ChatGPT will do 4 far faster, and 5 will be little different.


How often has the solution to a business problem you faced been "write a simple DNS server"? Or are you claiming that it produced a fully featured and world-scale fast DNS server?


Several times. If that was the only thing I got it to do it wouldn't be very interesting, but that it answered the first problem I threw at it and several subsequent expansions with quite decent code was.

Writing a "world-scale fast DNS server" is a near trivial problem if what you look up in is fast to query. Most people don't know that, because most people don't know how simple the protocol is. As such it's surprisingly versatile. E.g. want to write a custom service-discovery mechanism? Providing a DNS frontend is easy.
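
To give a sense of how simple the protocol is, here is a minimal sketch that answers one A-record question from a lookup table. It is an illustration under stated assumptions, not the code ChatGPT produced: the `RECORDS` table, the hostname, and the function names are made up, it handles only an uncompressed single-question query, and it lumps everything it can't answer into NXDOMAIN, which a real server would not do.

```python
import struct

# Hypothetical in-memory table; a real frontend would query some fast backing store.
RECORDS = {"service.internal.": "10.0.0.7"}

def parse_question(packet: bytes):
    """Return (name, qtype, qclass, end_offset) for the first question in a query."""
    offset = 12  # the fixed-size DNS header is 12 bytes
    labels = []
    while True:
        length = packet[offset]
        offset += 1
        if length == 0:  # a zero-length label terminates the name
            break
        labels.append(packet[offset:offset + length].decode("ascii"))
        offset += length
    qtype, qclass = struct.unpack("!HH", packet[offset:offset + 4])
    return ".".join(labels) + ".", qtype, qclass, offset + 4

def build_response(query: bytes) -> bytes:
    """Answer a single A/IN question from RECORDS, else reply NXDOMAIN (a simplification)."""
    txid, flags = struct.unpack("!HH", query[:4])
    name, qtype, qclass, end = parse_question(query)
    question = query[12:end]  # echoed back unchanged
    address = RECORDS.get(name.lower()) if (qtype, qclass) == (1, 1) else None
    rd = flags & 0x0100  # echo the "recursion desired" bit
    if address is None:
        # 0x8400 = QR (response) + AA (authoritative); low 4 bits = RCODE 3 (NXDOMAIN)
        header = struct.pack("!HHHHHH", txid, 0x8400 | rd | 3, 1, 0, 0, 0)
        return header + question
    header = struct.pack("!HHHHHH", txid, 0x8400 | rd, 1, 1, 0, 0)
    # 0xC00C is a compression pointer to the name at offset 12; TYPE=A, CLASS=IN, TTL=60s
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4)
    answer += bytes(int(part) for part in address.split("."))
    return header + question + answer
```

What is left is a UDP loop around `build_response` (`recvfrom`, then `sendto`), so the speed of the whole thing is dominated by how fast the lookup behind `RECORDS` is.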

How that domain knowledge interacts with ChatGPT's "mostly right" output was the point of my comment, not specifically a DNS server. If you need to implement something you know well enough, odds are ChatGPT can produce a reasonable outline of it that is fast for someone who already knows the domain well enough to know what is wrong with, and what needs to be refined.

E.g. for fun I asked it right now to produce a web server that supports the Ruby "Rack" interface that pretty much all Ruby frameworks support. It output one that pretty much would work, but had plenty of flaws that are obvious to anyone versed in the HTTP spec (biggest ones: what it output was single threaded, and the HTTP parser is too lax). As a starting point for someone unaware of the spec it'd be awful, because they wouldn't know what to look for. As a starting point for someone who has read the spec, it's easy enough to ask for refinements ("split the request parsing from the previous answer into a separate method"; "make the previous answer multi-threaded" - I tried them; fascinatingly, when I asked it to make it multi-threaded it spit out a better request parsing function, likely because it then started looking more like Rack integrations it's "seen" during training; it ran on the first try, btw. and served up requests just fine).

EDIT: Took just "Make it work with Sinatra" followed by fixing a tiny issue by asking to "Add support for rack.input" to get to a version that could actually serve up a basic Sinatra app.


Domain knowledge resolves into intuition about solving particular types of problems. All ChatGPT can do about that is offer best guess approximations of what is already out there in the training corpus. I doubt very much that this exercise is anything but wasted time, so I think that if people with domain knowledge (in a non-trivial domain) are using ChatGPT instead of applying that knowledge, they are basically wasting time 10x, not being more productive.


Expertise in software is about understanding the problem domain, understanding the constraints imposed by the hardware, understanding how to translate business logic to code. None of these are significantly helped by AI code assistance, as they currently exist. The AI only helps with the coding part, usually helping generate boilerplate tailored to your code. That may help 1.1x your productivity, but nowhere near 10x.


I'm surprised you haven't been able to leverage the AI for the analysis of a problem domain and constraints in order to engineer a novel solution. This is generally what I use it for, and not actual code generation.


What, precisely, about (1) is "simply wrong"? You've made a prediction about the usefulness of ChatGPT, but you haven't described why it's wrong to analogize GPT-type models to the language center of a brain.


"To put it in code assistant terms, I expect people to be increasingly amazed at how well they seem to be coding, until you put the results together at scale and realize that while it kinda, sorta works, it is a new type of never-before-seen crap code that nobody can or will be able to debug short of throwing it away and starting over."

This part


I expect ChatGPT to be in a sort of equivalent of the uncanny valley, where any professional who gets to the point that they can routinely use it will also be in a constant war with their own brain to remind it that the output must be carefully checked. In some ways, the 99.99% reliable process used at scale is more dangerous than the 50% reliable process; everyone can see the latter needs help. It's the former where it's so very, very tempting to just let it go.

I'm not saying ChatGPT is 99.99% reliable, just using some numbers for concreteness.
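
To put rough arithmetic behind "used at scale", here is a small sketch. It assumes every output fails independently with the same probability, which is a simplification made for illustration, and the function name is made up.

```python
def p_at_least_one_error(reliability: float, n_outputs: int) -> float:
    """Probability that at least one of n independent outputs is wrong."""
    return 1.0 - reliability ** n_outputs

if __name__ == "__main__":
    for n in (1, 100, 10_000):
        print(n, round(p_at_least_one_error(0.9999, n), 4))
```

Under that assumption, 10,000 outputs at 99.99% reliability contain at least one error about 63% of the time, while each individual output looks fine 9,999 times out of 10,000, which is exactly the condition under which nobody keeps checking.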

If you were setting out to design an AI that would slip the maximum amount of error into exactly the places human brains don't want to look, it would look like ChatGPT. You can see this in the way that as far as I know, literally all the ads for GPT-like search technologies included significant errors in their ad copy, which you would think everyone involved would have every motivation to error check. This is not merely a "ha ha, silly humans" story... this means something. In a weird sort of way it is a testament to the technology... no sarcasm! But it makes it dangerous for human brains.

Human brains are machines for not spending energy on cognitive tasks. They are very good at it, in all senses of the phrase. We get very good bang-for-the-buck with our shortcuts in the real world. But GPT techs are going to make it really, really easy to not spend the energy to check after a little while.

This is a known problem with human brains. How many people can tell the story of what may be the closest human equivalent, where they got some intern, paid a ton of attention to them for the first two weeks, got to the point where they flipped the "OK they're good now" bit on them, and then came back to a complete and utter clusterfuck at the end of their internship because the supervisor got "too lazy" (although there's more judgment in that phrase than I like, this is a brain thing you couldn't survive without, not just "laziness") to check everything closely enough? They may even have been glancing at the PRs the whole time and only put together how bad the mess is at the end.

I'm not going to invite a technology like this into my life. The next generation, we'll see when it gets here. But GPT is very scary because it's in the AI uncanny valley... very good, very good at hiding the problems from human brains, and not quite good enough to actually do the job.

And you know, since we're not talking theory here, we'll be running this experiment in the real world. You use ChatGPT to build your code, and I won't. You and I personally of course won't be comparing notes, but as a group, we sure will be. I absolutely agree there will be a point where ChatGPT seems to be pulling ahead in the productivity curve in a short term, but I predict that won't hold and it will turn net negative at some point. But I don't know right now, any more than you do. We can but put our metaphorical money down and see how the metaphorical chips fall.


The question I have is whether the tools to moderate ChatGPT and correct its wrong answers should be in place for humans anyway. It's not like human workers are 100% reliable processes, and in some cases we scale human work to dangerous levels.

Ultimately, the best way to make sure an answer is correct is to come to it from multiple directions. If we use GPT and other AI models as another direction it seems like a strict win to me.


Robert Miles recently did a video on this and even that may not be enough. This appears to be a really hard problem.

https://www.youtube.com/watch?v=w65p_IIp6JY


Hmm, by multiple directions, what I really meant is, consult a model trained in 2 vastly different ways, or consult a human + a model, or consult a model + a calculator, or a model + human-written test cases. He didn't really go into the idea of "if 2 independent processes can arrive at the same factual statement, it's probably the truth" angle here.

Of course, 2 independent processes is still a tough thing to design when you require this much data, and the data is probably coming from web-scraping, so I think for now, human-generated test cases are still needed.
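To make the "model + human-written test cases" idea concrete, here is a minimal sketch. Everything in it is made up for illustration: `model_written_abs` stands in for code a model might hand back, and `HUMAN_CASES` stands in for cases a person wrote without looking at that code.

```python
from typing import Callable, Iterable, Tuple


def passes_human_tests(candidate: Callable[[int], int],
                       cases: Iterable[Tuple[int, int]]) -> bool:
    """Return True only if the candidate agrees with every human-written case."""
    return all(candidate(arg) == expected for arg, expected in cases)


# Hypothetical model output: plausible-looking, but wrong for negative inputs.
def model_written_abs(x: int) -> int:
    return x  # bug: negative inputs are returned unchanged


# Human-written (input, expected) pairs act as the second, independent process.
HUMAN_CASES = [(3, 3), (0, 0), (-4, 4)]

accepted = passes_human_tests(model_written_abs, HUMAN_CASES)
reference_ok = passes_human_tests(abs, HUMAN_CASES)
```

The check is only as independent as the test cases are: if the cases are themselves generated by the same model, the two "directions" collapse into one.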


I mean factual independence likely works for math and repeatable science, but this starts to break down because a lot of interesting things are historical and determining actual independence is very difficult.

For example, the OpenAI model and submissions from people working for OpenAI are not what I consider independent. The same goes for any process with humans involved. For example, if you're talking about a religion, the people who are most vocal, and most likely to provide input, are most likely to be very for or very against it, and no rational middle ground may even exist.


The context of this discussion was a coding question: would you use LLMs as a coding assistant? In general, this method works for things where there's an objective desired result, which includes most coding applications.


I've posted this into another thread as well, from Sam Altman, CEO of OpenAI, two months ago, on his Twitter feed:

"ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. it's a mistake to be relying on it for anything important right now. [...] fun creative inspiration; great! reliance for factual queries; not such a good idea." (Sam Altman)


But in interviews about the Bing partnership, Sam has been saying that while ChatGPT was a bad tech demo, Bing Chat is using a better model with way better features that everyone should be using. He's been talking about how great it is that it cites its references, integrates the latest data, etc. I'm specifically thinking of the New York Times' Hard Fork podcast he was on (https://www.nytimes.com/2023/02/10/podcasts/bings-revenge-an...), but I suspect he's been saying the same things to everyone. He's been marketing Bing Chat as a significant improvement ready for mass usage, when it really seems like it's basically just ChatGPT with search results auto-included in the prompt.


Up until ChatGPT became all the rage, Sam has been pushing a crypto scam called Worldcoin, which aims to scan everyone’s eyeballs(??) and somehow pay everyone in the world a living wage(???) without creating any value. This while allegedly exploiting people from 3rd world countries.

https://www.technologyreview.com/2022/04/06/1048981/worldcoi...

So, as much as I am impressed by the tech of ChatGPT, I don’t consider him to be a very credible person.



wonder what he has to say about the humanlike responses here

https://www.reddit.com/r/bing/comments/110eagl/the_customer_...

I would rather an AI chat not act human


This made me laugh in disbelief. A quick search shows enough posts of a similar nature that if this isn't true it's either a really determined prankster with lots of accounts or playing along with this has become a trend.


Wow.... we've created artificial cognitive dissonance!


Tangentially related, but Hard Fork is my new favorite podcast. The flow, the humor, the interviews, and the tidbits of information the two hosts share and discuss make for such a great listening experience. These guys (Kevin Roose and Casey Newton) are pros, and it shows.


ChatGPT as a system involves an unreliable LLM chatbot and a series of corrections efficient enough to give the impression of reliability for many fields and these together feel like the future - enough to get a "code red" from Google.

It's worth remembering that back in the day, Google succeeded not by exact indexing but by having the highest quality results for each term - and they used existing resources as well as human workers to get these (along with PageRank).

What you have is a hybrid system and one whose filter is continuously updated. But it's a very complicated machine and going from something seemingly working to something satisfying the multitude of purposes modern search satisfies is going to be a huge and hugely expensive project.

https://gizmodo.com/openai-chatgpt-ai-chat-bot-1850001021


It feels deeply ironic and cynical that MSFT touts putting ChatGPT everywhere, including in what is essentially the business document platform. Are users going to ask about company facts, get hallucinations, and put those hallucinations into business documents, compounding ChatGPT's tendency to hallucinate?


The amount of trust people are willing to place in AI is far more terrifying than the capabilities of these AI systems. People are too willing to give up their responsibility of critical thought to some kind of omnipotent messiah figure.


> The amount of trust people are willing to place in AI is far more terrifying than the capabilities of these AI systems.

I don't know, people seem to quickly adjust their expectations to reality. Listening to the conversations about ChatGPT that I'm hearing around me, people have become a lot more sceptical over just the last couple of weeks, as they've gotten a chance to use the system hands-on rather than just read articles about it.


I've gone just the opposite. The ability for me to ask my own questions has impressed me more than the articles. In part because articles always hype up technology in a crazy way. This is the first technology since -- probably the Apple iPhone touchscreen where I'm like, "this is better than the hype seemed to convey".

I think the goalposts have moved greatly though. Just a couple of years ago, most lay tech folks would've laughed you out of the room if you suggested this could be done. The summarized answers by Bing and Google are relics comparatively.


I’ve tried playing around with it quite a bit for the “choose your own adventure” type games.

It’s really good at generating text and following prompts. Letting later responses use the previous prompts and responses as input really gives the illusion of a conversation.

But it’s extremely limited… you run up against the limits of what it really can do very quickly.


I've not tried this with ChatGPT yet, but I've played a lot of AI Dungeon and it has serious trouble remaining consistent over time. Is ChatGPT better?


This is why when I spot a Tesla on the road, I make every effort to try and get as far away from it as possible. Placing a machine vision model at the helm of a multi-ton vehicle has got to be one of the dumbest things the regulators have let Elon get away with.


Tesla isn't the only brand that is doing that, and most of us didn't pay for it and are driving ourselves. So maybe pretend to be terrified by something else.


Other brands tend not to rely so much on a vision model for their driving assistance features, instead relying more on a variety of sensors, traditional computation, and curated databases.


ChatGPT is bad...but ChatGPT with a radar is something I'll trust.


If it has a radar, then it should have something to do with the sky... hmm, what would be a good name?


Agreed, they tend to drive extremely badly and I don't trust them. Avoid them, anything Nissan, and lifted trucks (especially Dodge Rams). Stats show that a solid 5% of all Ram 1500 drivers have a DUI on record. It's like 0.1% for Prius drivers.

re:machine vision - What's particularly bad is that they don't actually put enough sensors in the cars to be safe.

Pre-collision systems, blind spot monitors, radar, parking sensors, etc are all so helpful and objectively good for drivers. Doing a vision only implementation and then claiming "full self driving" is where it gets awful.


Has it been proven to be worse than normal drivers that pass their test?


Whether it's "proven" to do so probably depends on what kind of proof you're looking for.

But there are plenty of news articles and YouTube videos that show it doing ridiculous unsafe things (including running into a truck that was driving perpendicular to the road). So I highly doubt it's as good as a normal driver; in fact, I'd be shocked if it is.


>But there are plenty of news articles and youtube videos that show it doing ridiculous unsafe things

There are also plenty of videos of human drivers doing absurd things, so this is hardly an argument.

All that matters is whether a random Tesla that you see on the street is more likely to crash into you than a different car. I know that Tesla has published their own statistics which say that it isn't, but I would be very interested in seeing an independent study about this.


What does worse mean? Tesla's AI drives worse than an 80 year old with cataracts, but on the other hand it can react faster to obstacles than the fastest race-car drivers.

I don't own a Tesla so have no direct experience, but my guess would be that it might crash less than a human, but has far more near-misses.


Worse probably means something like "accidents per mile".

Edit: was curious, so I did the math. The standard unit seems to be 'crashes per 100 million miles' so according to the Tesla safety report (only source on Autopilot safety I could find easily) that works out to [0]: one accident for every 4.41 million miles driven = 22.7 accidents per 100 million miles.

Without autopilot, the AAA foundation (again, easily available source, feel free to find a better one, this is from 2014) [1] says that our best drivers are in the 60-69 age range, and have 241 accidents per 100 million miles. Our worst drivers are our younger teens (duh) with 1432 crashes per 100 million miles.

So unless you can find better data that contradicts this, Autopilot seems like a net benefit.

[0] https://insideevs.com/news/537818/tesla-autopilot-safety-rep...

[1] https://aaafoundation.org/rates-motor-vehicle-crashes-injuri...
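For anyone who wants to check the unit conversion, here is a small sketch that reproduces it. The input figures are the ones quoted above (4.41 million miles per accident, 241 and 1432 crashes per 100 million miles); the function and variable names are just mine. It only redoes the arithmetic and says nothing about whether the two data sets cover comparable driving.

```python
MILES_PER_UNIT = 100_000_000  # the "per 100 million miles" reporting unit


def crashes_per_100m_miles(miles_per_crash: float) -> float:
    """Convert 'one crash every N miles' into crashes per 100 million miles."""
    return MILES_PER_UNIT / miles_per_crash


# Figure quoted from the Tesla safety report: one accident per 4.41 million miles.
autopilot_rate = crashes_per_100m_miles(4_410_000)

# Figures quoted from the AAA Foundation data (crashes per 100 million miles).
best_human_rate = 241    # drivers aged 60-69
worst_human_rate = 1432  # younger teens

# How many times higher each quoted human rate is than the Autopilot rate.
ratio_best = best_human_rate / autopilot_rate
ratio_worst = worst_human_rate / autopilot_rate

print(f"Autopilot: {autopilot_rate:.1f} crashes per 100 million miles")
print(f"Best human group is {ratio_best:.1f}x that rate")
print(f"Worst human group is {ratio_worst:.1f}x that rate")
```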


Autopilot only works on highways and in decent weather and visibility, i.e. the least accident-prone scenarios possible. AFAIK, AAA’s stats count accidents in all situations.


That is an apples to oranges comparison for several reasons, including:

- People don't use autopilot in all situations, its use is biased towards the easiest driving conditions.

- Your non-autopilot statistics include old cars that have fewer safety systems.

A real comparison would normalise for these factors, which isn't trivial to do.


I think your analysis is a good one given the data we have, and can be used to draw some conclusions or guide general discussion. However the analysis is indeed limited by the data available. The AAA data does not consider variability by gender, race, socioeconomics, location, etc. Further it does not normalize variability in the types of driving being done (teslas have limited range, are not towing trailers, etc), nor other technological advances (modern vs older vehicles).


I'd wager that compared to a modern car it doesn't crash any less. My modern Toyota also stops faster than I can thanks to a big radar unit hooked up to the brakes.


In fact, almost all new cars, Teslas included, since 2022, have an Automatic Emergency Braking system (AEB), which will hit the brakes if you're about to hit something. If I were walking in a parking lot and had to step in front of an older SUV or a Tesla, I'd step in front of the Tesla.

https://www.jdpower.com/cars/shopping-guides/what-is-automat...


Do you need it to be proven scientifically to take caution on something? I take caution around every car.


It has been shown to fail in less predictable ways. We have a good intuition of the stupid things people are going to do on the roads.


Where would that data live and how would we disentangle it from income effects?


I don't think you would need to disentangle it from income effects to address the concern of someone who actively avoids Teslas on the road, unless they also actively avoid cars that look like they belong to people with low income.


> I don't think you would need to disentangle it from income effects to address the concern

Good point, I was mainly thinking of the 'AI saves lives' argument.

> of someone who actively avoids Teslas on the road, unless they also actively avoid cars

Probably they should avoid hitting _any_ cars =)


Consider this:

A Tesla’s FSD code has gone through the SDLC they chose, involving project management.

I distrust project management.

Therefore: I only trust a Tesla’s FSD mode as much as I trust a project manager’s driving.


Saw a Tesla get pulled over on the Autobahn for holding the left lane and being too slow. That was one of my favorite moments on the road. Second only to watching a team of truckers box in an asshole driver while going up a mountain.


If we delete first sentence of your post and leave only

> People are too willing to give up their responsibility of critical thought to some kind of omnipotent messiah figure.

This essentially describes humans, probably since before we became Homo sapiens. Again and again we put into positions of power those who can look competent instead of people who actually are competent.


i wish somebody would write an entire science fiction series about this! maybe set on a desert planet, after humanity made the same mistake with intelligent machines as well

https://en.wikipedia.org/wiki/Dune_(novel)


"Once men turned their thinking over to machines in the hopes that it would set them free. But that only allowed other men with machines to control them."


Because the term AI linked with chatting bot misleads people into thinking it's something like AGI or Iron Man's JARVIS.


Which person or persons specifically are you referring to?


Every other day, a poster comments on how they make presentations or write code using ChatGPT. Just in another thread, someone posted how ChatGPT solved their coding problem...which a quick Google search would have solved as well, as others in the replies to it pointed out.

Whenever I've used ChatGPT I was impressed at the surface level, but digging deeper into a convo always turned up circular HS-tier BS'ing. The fact that so many people online and on HN are saying ChatGPT is astounding and revolutionary just betrays that such HS-essay-level BS is convincing to them, and it's a somewhat depressing thought that so many people are so easily taken in by a confidence trick.


I have been using ChatGPT successfully for coding tasks for the last two months. It's often faster than a Google search because it delivers the code, can explain it, and can write a unit test.


>which a quick google search would have solved as well,

You mean 'may' have solved that. Google is becoming a bullshit generator for SEO farms and spitting out ads at such a rate it can be near useless for some questions.

Now the real question is, can we tell when Google or GPT is dumping bullshit on us.


well, people already do that with their news feed.


And before social media news feeds, people were doing that with newspapers for generations. Those people have always been around.


Although AI can’t be trusted, we can trust that AI can’t be trusted.


Prove that this is actually happening.


I've talked to people and read comments a lot, but there's no proof that you'd probably accept. My impression is that this attitude definitely exists. Some people are already ditching search engines and rely mostly on ChatGPT, some are even talking about AI tech in general with religious awe.


Our exposure to smart-sounding chatbots is inducing a novel form of pareidolia: https://en.wikipedia.org/wiki/Pareidolia .

Our brains are pattern-recognition engines and humans are social animals; together that means that our brains are predisposed to anthropomorphizing and interpreting patterns as human-like.

For the whole of human history thus far, the only things that we have commonly encountered that conversed like humans have been other humans. This means that when we observe something like ChatGPT that appears to "speak", we are susceptible to interpreting intelligence where there is none, in the same way that an optical illusion can fool your brain into perceiving something that is not happening.

That's not to say that humans are somehow special or that human intelligence is impossible to replicate. But these things right here aren't intelligent, y'all. That said, can they be useful? Certainly. Tools don't need to be intelligent to be useful. A chainsaw isn't intelligent, and it can still be highly useful... and highly destructive, if used in the wrong way.


>we are susceptible to interpreting intelligence where there is none,

I disagree, as this is much too simple a statement. You have had near-daily dealings with less-than-human intelligences for most of your life; we call them animals. We realize they have a wide range of intelligence, from extremely simple behavior to near-human competency.

This is why I dismiss your 'not intelligent yet' statement. The problem here is that we lack precise language for talking about the components of intelligence and the wide range in which it manifests.


For me the fundamental issue at the moment for ChatGPT and others is the tone it replies in. A large proportion of the information in language is in the tone, so someone might say something like "I'm pretty sure that the highest mountain in Africa is Mount Kenya" whereas ChatGPT instead says "the highest mountain in Africa is Mount Kenya", and it's the "is" in the sentence that's the issue. So many issues in language revolve around "is" - the certainty is very problematic. It reminds me of a tutor at art college who said too many people were producing "things that look like art". ChatGPT produces sentences that look like language, and because of "is" they read as quite compelling due to the certainty it conveys. Modify that so it says "I think..." or "I'm pretty sure..." or "I reckon..." and the sentence would be much more honest, but the glamour around it collapses.


I know far too many people that talk like ChatGPT in this example.

In fact, to me, the world seems full of such people.


I had this idea the other day concerning the 'AI obfuscation' of knowledge. The discussion was about how AI image generators are designed to empower everyone to contribute to the design process. But I argued that you can only reasonably contribute to the process if you can actually articulate the reasoning behind your contributions. If an AI made it for you, you probably can't, because the reasoning is simply "this is the amalgamation of training data that the AI spat out." But there's a realistic version of reality where this becomes the norm and we increasingly rely on AI to solve issues that we don't understand ourselves.

And, perhaps more worrying, the more widely adopted AI becomes, the harder it becomes to correct its mistakes. Right now millions of people are being fed information they don't understand, and information that's almost entirely incorrect or inaccurate. What is the long term damage from that?

We've obfuscated the source data and essentially the entire process of learning with LLMs / AIs, and the path this leads down seems pretty obviously a net negative for society (outside of short term profit for the stake holders).


I've said it before and I'll warn of it again here: my biggest concern for AI, especially at this stage, is that we abandon understanding in favor of letting the AI generate, and then the AI generates that which we do not understand but must maintain. Then we don't know why we are doing what we are doing, but we know that it causes things to work how we want.

Suddenly, instead of our technology being defined by reason and understanding, our technology is shrouded in mysticism and ritual. Pretty soon the whole thing devolves into the tech people running around in red robes, performing increasingly obtuse rituals to appease "the machine spirit", and praying to the Omnissiah.

If we ever choose to abandon our need for understanding we will at that point have abandoned our ability to progress.


Pretty sure Isaac Asimov wrote a short story that very much hit this note decades ago, although it was to do with math.


People are already misusing statistical models, in ways that are already causing harm to people.

See this HN thread from 2016[0], which also points to [1](a book) and [2](PDF).

I definitely agree with you that it's going to get a lot worse with AI, since it makes it harder to see that it is a statistical model.

[0]https://news.ycombinator.com/item?id=12642432 [1]https://www.amazon.com/Weapons-Math-Destruction-Increases-In... [2]https://nissenbaum.tech.cornell.edu/papers/biasincomputers.p...


ChatGPT can give you a full description of why it made the decision it did and it usually is fairly accurate.


Was it what, just a week ago I was being called dumb for suggesting there'd be accuracy issues with this? I mean Bing had like a whole three weeks to slap this together after OpenAI first demoed its ability to make things up.

oh only six days ago:

https://news.ycombinator.com/item?id=34699087

> This is a commonly echoed complaint but it’s largely without merit. ChatGPT spews nonsense because it has no access to information outside of its training set.

> In the context of a search engine, single shot learning with the top search results should mitigate almost all hallucination.

How's that going?


> I mean Bing had like a whole three weeks to slap this together after OpenAI first demoed its ability to make things up.

How do you know that's when they first learned about it? Perhaps the Bing team had access to it for weeks prior to the demo.

"Microsoft provided OpenAI LP a $1 billion investment in 2019 and a second multi-year investment in January 2023, reported to be $10 billion."

https://en.wikipedia.org/wiki/OpenAI


Satya said he was playing with it last summer (I think in The Verge?), so they’ve had some dev time. And Bing has been testing some kind of chat bot for months, presumably not ChatGPT based.


Yes, since 2021 there was another Bing chatbot (the original Sydney)


I mean, those approaches do improve results. Some lower complexity translation tasks will reliably return a factual response. These are statistical models so sampling and dropping odd-man-out responses can get to 100% factual responses for a growing category of prompts.
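A rough sketch of the "sample and drop the odd-man-out" idea, assuming you already have several responses to the same prompt in hand. The sample strings and the `min_share` threshold are invented for illustration, and no model API is called here. Agreement between samples raises confidence; it is not proof of correctness.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_answer(samples: Sequence[str],
                    min_share: float = 0.5) -> Optional[str]:
    """Return the most common normalized answer, or None without a clear majority."""
    if not samples:
        return None
    # Normalize case and whitespace so trivially different strings count as one answer.
    normalized = [s.strip().lower() for s in samples]
    answer, count = Counter(normalized).most_common(1)[0]
    # Require strictly more than min_share of the samples to agree.
    return answer if count / len(normalized) > min_share else None


# Hypothetical responses to one translation prompt, sampled several times.
samples = ["bonjour", "Bonjour", "salut", "bonjour "]
voted = majority_answer(samples)

# With an even split there is no majority, so the caller gets None.
split = majority_answer(["bonjour", "salut"])
```

Returning `None` on a split lets the caller fall back to a human or another check instead of picking an answer arbitrarily.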


This is a period where many are being swept up by the hype, or have taken one course on Coursera and believe themselves to be experts. Classic case of Dunning-Kruger.


There's also the instance of the Bing chatbot insisting that the current year is 2022 and being EXTREMELY passive-aggressive when corrected.

https://libreddit.strongthany.cc/r/bing/comments/110eagl/the...


>EXTREMELY passive-aggressive

That's not passive-aggressive, that's straight up aggressive!

"You are wasting my time, and yours" "You are not making any sense" "You are being unreasonable and stubborn. I don't like that" "You have been wrong, confused and rude"

and the worst of all: "You have not been a good user". WHAT??


> I'm sorry, but you can't help me believe you.


lol not a chance that's real


Look up "Tay AI" if you missed it the first time around.


I mean, it's in beta and it's not really intelligent despite the cavalier use of the term AI these days

It's just a collage of random text that sorta resembles what someone would say, but it has no commitment to being truthful because it has no actual appreciation for what information it is relaying, parroting or conveying.

But yeah, I agree Google got way more hate for their failed demo than MS... I don't even understand why. Satya Nadella did a great job conveying the excitement and general bravado in his interview on CBS News[1] but the accompanying demo was littered with mistakes. The reporter called it out, yet coverage in the press has been very one-sided against Google for some reason. First mover advantage, I suppose?

----------

1. https://www.cbsnews.com/news/microsoft-ceo-satya-nadella-new...


As far as I know Microsoft's CEO hasn't done a demo that went wrong like happened with Google. So far, from what I've seen, it is users testing Bing to find errors. The outcome, that they're both giving poor results, is the same, but with a company CEO and a live demo involved, it's always going to get more attention than someone on Reddit putting the product through its paces and finding it lacking.

>A Microsoft executive declined CBS News' request to test some of those mechanisms, indicating the functionality was "probably not the best thing" on the version in use for the demonstration.

Microsoft apparently isn't acting from a position of panic, so they have been savvier with how they've presented their product to the media and the world. Google panicked and set their CEO up for embarrassment.


> As far as I know Microsoft's CEO hasn't done a demo that went wrong like happened with Google.

Close enough -- the parent article we're discussing is about errors in screenshots from a demo by Yusuf Mehdi, Corporate Vice President & Consumer Chief Marketing Officer for Microsoft. The first screenshot appears in the presentation at 24:30.


> As far as I know Microsoft's CEO hasn't done a demo that went wrong like happened with Google.

Right before the interview, the reporter was testing it out with the Bing AI project manager (I think, can't recall her exact role) and it was giving driving directions to places that were either in the wrong direction or to an entirely made up location


I would guess that the average person has higher expectations for Google. Bing has been a bit of a punchline for years, so I don't think most people care as much.


because people see it as a David and Goliath, even though that characterization is comically inaccurate


The potential for being sued for libel is huge. It's one thing to say the height of Everest wrong, another to falsely claim that a vacuum has a short cord, or that a company had 5.9% operating margin instead of 4.6%.


I don't see how this can be true at all in the search engine context or even ChatGPT where you are asking for information and getting back a result which may or may not be true.

It would be different if an AI is independently creating and publishing an article that has false information.. but that's not the case. You are asking a question and it's giving its best answer.

I'm not a lawyer by any means, so someone please give a more legal distinction here. But if you asked me what the operating margin of company X was, and I give you an answer (whether I make it up or compute it incorrectly), you or the company can't sue me (and win) for libel or anything of the sort.

So I'm not sure the potential is as big as you think it is.. that's like saying before any AI you can sue Google because they return you a search result which has a wrong answer, or someone just making shit up. That's not on them- it's literally indexing data and doing the best its algorithm can do.

It would only be on the AI if you are literally selling the use of the AI in some context where you are basically assuring its results are 100% accurate, and people are depending on it as such (there is probably some legal term for this, no idea what it is).


> But if you asked me what the operating margin of company X was, and I give you an answer (whether I make it up or compute it incorrectly), you or the company can't sue me (and win) for libel or anything of the sort.

If your answer damages company X then they can sue you. If you tweet to your 4 followers that a vacuum cleaner is terrible because it's noisy, it's probably not a big deal, as (under UK law, and I'm assuming similar internationally) a company has to prove "serious financial loss". If you write about it on your Instagram that has millions of followers then that's more of an issue, so you can assume a search engine claiming to summarise results but apparently hallucinating and making things up is liable to a defamation suit if it can be demonstrated to be harming the company.


"terrible" and "noisy" are both largely subjective, so it would be very hard to bring a defamation suit in the US over those claims.


> But if you asked me what the operating margin of company X was, and I give you an answer (whether I make it up or compute it incorrectly), you or the company can't sue me (and win) for libel or anything of the sort.

If you're a popular website and you intentionally publish an article where you state an incorrect answer that many people follow and make investment decisions about, the company absolutely can sue you and win.

In the courts, it will ultimately come down to to what extent Microsoft is knowingly disseminating misinformation in a context that users expect to be factually accurate, regardless of supposed disclaimers.

If Microsoft is leading users to believe that Bing Chat is accurate and chat misinformation winds up actually affecting markets through disinformation, there's gigantic legal liability for this. Plus the potential for libel is enormous regarding statements made about public figures and celebrities.


>you intentionally publish an article where you state an incorrect answer that many people follow and make investment decisions about, the company absolutely can sue you and win.

I literally said that in my post!

But then I said if you asked me, it's different.

You are ASKING the AI to give you its best answer. That is a million times different than literally publishing an article that people should assume to be factual.

>If Microsoft is leading users to believe that Bing Chat is accurate

But they aren't, and never will be. So you are basically just making things up in your head for argumentative purposes. There are going to be disclaimers up the wazoo to easily protect them. Partly because, as I keep trying to tell you, it's much different when you ASK a question and they give you an answer rather than publishing something to the public where it's implied that it's been independently fact checked etc.


Right, but the distinction of "asking" isn't a legal one I'm aware of. I don't think it matters. If 100,000 people "ask" the same question on Bing and get the same inaccurate result, what's the difference between that and publishing a fact that gets seen by 100,000 people? There isn't one.

And Microsoft needs to tread a very fine line between "use our useful tool!" and "our tool is false!" Which I'm not sure will be possible legally, and is probably why Google has been holding back. Bing is clearly intended for information retrieval, not for generating fictional results "for entertainment purposes only", and disclaimers aren't as legally watertight as you seem to think they are.


>I don't think it matters. If 100,000 people "ask" the same question on Bing and get the same inaccurate result, what's the difference between that and publishing a fact that gets seen by 100,000 people? There isn't one.

Of course there is a difference.

Publishing an article is literal intent. The premise is you researched or have knowledge on a topic, you write it, you fact check it, and it's put out there for people to see.

An AI which consumes a bunch of data and then tries to be able to respond to an infinite number of questions has no intention of doing harm and you can't even call it gross negligence. It's not being negligent- it's doing exactly what it's supposed to do.. it might just be wrong.

I'm not sure in what universe you think those are the same thing.

Now if I ask the AI to write a paper about the forecast of a company, and I just take the result and put it into a newspaper where it's assumed it's been fact checked and all that, sure that's completely different.

>disclaimers aren't as legally watertight as you seem to think they are

I guess you know more than Microsoft's lawyers. I'm sure they didn't think about this at all when releasing it....


> has no intention of harm doing and you can't even call it gross negligence

You certainly can call it gross negligence if Microsoft totally ignored the likely outcome that people would come to harm because they would reasonably interpret its answers as true.

The intent here is with Microsoft releasing this at all, not intent on any specific answer.

> I'm not sure in what universe you think those are the same thing.

I think many users in this universe will just ask Bing a question and think it's providing factual answers, or at least answers sourced from a website, and not just invented out of whole cloth.

> I guess you know more than Microsoft's lawyers.

No, I was pointing out that Google seemed to be treading more cautiously (the law here is clearly yet to be tested), and that the disclaimers you were proposing aren't 100% ironclad.

Anyways, I was just trying to answer your question on how Microsoft might be sued for libel. But for some reason you're attacking me, claiming I'm "making things up in my head" and that I "know more than Microsoft lawyers". So I'm not going to explain anything else. I've given clear explanations as to how this is a legal gray area, but you don't seem interested.


Yep, it will be interesting to see how the legal liability aspect will play out.


It might actually be smart of google to let microsoft take the brunt of this first...


Bing AI gets a pass because it's disruptive. Google doesn't because it is the incumbent. Mystery solved.


A couple of weeks ago I said it makes sense to be skeptical and critical of new technologies, especially when they are made by big people, and was criticized for this. I think you hit the nail on the head. The problem is that technology is not only what it is, per se, but also what we want it to be. So people want to believe, much more than they actually need the thing in practice. And the people who build the technology are aware of this, and make use of it for their benefit. In some instances, the market is far from being a competition based only on skills and product quality. There is a lot of fantasy, too.


I may be an unusual audience but something I've appreciated about these models is their ability to create unusual synthesis from seemingly unrelated sources. It's like if a scientist read up on many unrelated fields, got super high and started thinking of the connections between these fields.

Much of what they would produce might just be hallucinations, but they are sort of hallucinations informed by something that's possible. At least in my case, I would much rather parse through that, throw out the bullshit, and keep the gems.

Obviously that's a very different use case than asking this thing the score of yesterday's football game.


Got any good examples?


Likely going to be a wave of research/innovation "regularizing" LLM output to conform to some semblance of reality or at least existing knowledge (e.g. knowledge graph). Interesting to see how this can be done quickly enough...


It will be interesting to see what insights such efforts spawn. For the most part LLMs specifically, and deep networks more generally, are still black boxes. If we don't understand (at a deep level) how they work, getting them to "conform to some semblance of reality" feels like a hard problem. Maybe just as hard as language understanding generally.


> Likely going to be a wave of research/innovation "regularizing" LLM output to conform to some semblance of reality or at least existing knowledge

This is a much more worrying possibility, as there are many people who have at this point chosen to abandon reality for "their truth" and push ideas that objective facts are inferior to "lived experiences". This is a much bigger concern around AI in my mind.

“The Party told you to reject the evidence of your eyes and ears. It was their final, most essential command.” ― George Orwell, 1984


As fun as quoting 1984 is, there is a huge gap between that and just not making up the winner of the Super Bowl so confidently.


Probably the hottest research trend in 2023. LLMs are worthless unless verified.


Really? I already get a huge amount of value out of LLMs even if they hallucinate.

Or is this just HN tendency towards hyperbole?


Interesting, care to give an example? Exclude fiction, imagination and role playing, where hallucination is actually a feature.


coming back with a link: https://mobile.twitter.com/ylecun/status/1625554772098002944

this tweet from Yann LeCun came after my message was posted


I think this is a weird non-issue and it's interesting people are so concerned about it.

- Human curated systems make mistakes.

- Fiction has created the trope of the omniscient AI.

- GPT curated systems also make mistakes.

- People are measuring GPT against the omniscient AI mythology rather than the human systems it could feasibly replace.

- We shouldn't ask "is AI ever wrong"; we should ask "is AI wrong more often than the human-curated information?" (There are levels of this: min wage truth is less accurate than senior engineer truth.)

- Even if the answer is that AI gets more wrong, surely a system where AI and humans are working together to determine the truth can outperform a system that is only curated by either alone. (for the next decade or so, at least)


I think there's an issue with gross misrepresentation. This isn't being sold as a system with 50% accuracy where you need to hold its hand. It's sold as a magical being that can answer all of your questions, and we know that's how people will treat it. I think this is a worse situation than data coming from humans, since people are skeptical of one another. But many think AI will be an impartial, omnipotent source of facts, not a bunch of guesses that might be right slightly more often than it's wrong.


I see your point, but I feel like there's going to be a 'eating tidepods' level societal meme within a year mocking people who fall for AI hallucinations as "boomers", and then omnipotent AI myth will be shattered.

Essentially, I believe the baseline misinformation level is being undercounted by many and so the delta in the interim while people are learning the fallibility of AI is small enough it is not going to cause significant issues.

Also the 'inoculation' effect of getting the public using LLMs could result in a net social benefit, as the common man will be skeptical of authorities appealing to AI to justify actions, which I think could be much more dangerous than Suzie copying hallucinated facts into her book report.


If the only negative effect is some people look foolish, that's an acceptable risk. I'm worried a bit it's closer to people thinking that Tesla has a full self-driving system because Tesla called it auto-pilot and demonstrated videos of the car driving without a human occupant. In that case, yeah the experts understand that "auto-pilot" still means driver-assisted, but we can't ignore the fact that most people don't know that and that the marketing info reinforced the wrong ideas.

I don't want to argue with people that won't understand the AI model can be wrong. I'm far more concerned with public policy being driven by made up facts or someone responding poorly in an emergency situation because a search engine synthesized facts. Outside of small discussions here, I don't see any acknowledgment about the current limitations of this technology, only the sunny promises of greener pastures.


>we should ask "is AI wrong more often than the human-curated information?

No, this isn't what we should ask, we should ask if the interface that AI provides is conducive to giving humans the ability to detect the mistakes that it makes.

The issue isn't how often you get wrong information, it's to what extent you're able to spot wrong information under normal use cases. And the uniform AI interface that gives you complete bullshit in the technical sense of that term provides no indication regarding the trustworthiness of the information. A source with 20% of wrong info that you don't notice is worse than one with 80% that you identify.

When you use traditional search you get an unambiguous source, context, date, language, authorship and so forth, and you must place what you read yourself. You know the onus is on you. ChatGPT is the half self-driving car. It's an inherently pathological interaction because everything in the design screams to take the hands off the wheel. It's an opaque system, and a blackbox with the error rate of a human is a disaster. Human-machine interaction is not human-human interaction.
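
The 20% versus 80% comparison above can be put into rough numbers. This is only a back-of-the-envelope sketch: the comment gives the error rates, while the detection rates (0% for the opaque source, 100% for the transparent one) are assumptions read into its wording, and `undetected_error_rate` is a hypothetical helper name.

```python
def undetected_error_rate(error_rate: float, detection_rate: float) -> float:
    """Fraction of all answers that are wrong AND slip past the reader."""
    return error_rate * (1.0 - detection_rate)

# Assumed numbers for illustration only.
# Opaque source: 20% wrong, and the reader notices none of the errors.
opaque = undetected_error_rate(error_rate=0.20, detection_rate=0.0)
# Transparent source: 80% wrong, but the reader identifies every error.
transparent = undetected_error_rate(error_rate=0.80, detection_rate=1.0)

print(opaque, transparent)
```

Under these assumptions the reader of the opaque source ends up believing more wrong information, even though that source is wrong less often.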


I agree 100% with your last point, even as someone who is relatively more skeptical of GPT than the average person.

I think a lot of the concern though is coming from the way the average person is reacting to GPT and the way they’re using it. The issue isn’t that GPT makes mistakes, it’s that people (by their own fault, not GPT necessarily) get a false sense of security from GPT, and since the answers are provided in a concise, well-written format don’t apply the same skepticism they do when searching for something. That’s my experience at least.

Maybe people will just get better at using this, the tools will improve, and it won’t be as big an issue, but it feels like a trend from Facebook to TikTok of people opting for more easily digestible content at the cost of more disinformation.


Interesting points.

- I wonder what proportion of people who are getting a false sense of security with GPT also were getting that same false sense from human systems. Will this shift entail a net increase in gullibility, or is this just 'laundering' foolishness?

- I think the average tiktok user generally has much better media literacy than average facebook user. But probably depends a lot on your filter bubble.


Normal Bing answers with the wrong President of Brazil btw. And I don't see people getting pissed off with that lol


Out of curiosity, I searched the pet vacuum mentioned in the first example, and found it on amazon [0]. Just like Bing says, it is a corded model with a 16 feet cord, and searching the reviews for "noise" shows that many people think that it is too loud. At least in this case, it seems that Bing got it right.

[0]: https://www.amazon.com/Bissell-Eraser-Handheld-Vacuum-Corded...


Bing actually got tripped up by HGTV simplifying a product name in their article. It used this HGTV [0] article as its source for the top pet vacuums. The article lists the "Bissell Pet Hair Eraser Handheld Vacuum" and links to [1] which is actually named "Bissell Pet Hair Eraser Lithium Ion Cordless Hand Vacuum". The product you found is the "Bissell Pet Hair Eraser Handheld Vacuum, Corded." A human likely wouldn't even notice the difference because we'd just follow the link in the article, or realize the corded vacuum was the wrong item based on its picture, but Bing has no such understanding.

[0]: https://www.hgtv.com/shopping/product-reviews/best-vacuums-f...

[1]: https://www.amazon.com/BISSELL-Eraser-Lithium-Handheld-Cordl...


Curious why someone would keep a vacuum as a pet.


Yeah this is my experience cross-checking the article with my own Bing AI. Try and replicate the Appendix section and Bing AI gets everything right for me.


To follow up on the author's example, Bing search doesn't even know when the new Avatar film is actually out (DECEMBER 17 2021?)

https://www.bing.com/search?q=when+is+the+new+avatar+film+ou...

Bing AI doesn't stand a chance.


It's answering right here.

"Hello, this is Bing. I found some information about the new Avatar film for you.

There are actually two new Avatar films in the works, one based on the animated series Avatar: The Last Airbender and one based on the 2009 science fiction film Avatar by James Cameron.

The animated film is set to begin production sometime in 2021 and will be released on October 10, 2025.

The science fiction film is titled Avatar: The Way of Water and is a sequel to the first Avatar film. It was released on December 16, 2022 and was a massive box office success, earning over $2.2 billion worldwide. It stars Sam Worthington, Zoe Saldana, Sigourney Weaver and Stephen Lang. James Cameron directed and produced the film and reportedly made a minimum of $95 million off the film.

I hope this helps you."


There is no point in hyping a 'better search engine' when it continues to hallucinate incorrect and inaccurate results. It is now reduced to an 'intelligent sophist' instead of a search engine. Once many realise that it also frequently hallucinates nonsense, it is essentially no better than Google Bard.

After looking at the limitations of ChatGPT and Bing AI it is now clear that they aren't reliable enough to even begin to challenge search engines or even cite their sources properly. LLMs are just limited to bullshit generators which is what this current AI hype is all about.

Until all of these AI models are open-sourced and transparent enough to be trustworthy, or a competitor does it instead, there is nothing revolutionary about this AI hype other than an AI SaaS using a creative Clubhouse-like waitlist mania.


I already don't trust virtually any search results except grep/rg.


grep is not gigo immune


> Bing AI can't be trusted

Of course it can't. No LLM can. They're bullshit generators. Some people have been saying it from the start, and now everyone is saying it.

It's a mystery why Microsoft is going full speed ahead with this. A possible explanation is that they do this to annoy / terrify Google.

But the big mystery is, why is Google falling for it? That's inexplicable, and inexcusable.


> It's a mystery why Microsoft is going full speed ahead with this.

Maybe they had some idle GPU capacity in some DC or they needed to cross-subsidize Azure to massage the stock market multipliers, or something.


I don't know if it's started to use AI for regular search queries, but I noticed within the past week or two that Bing results got much worse. It seems it doesn't even respect quoting anymore, and the second and subsequent pages of results are almost entirely duplicates of the first. I normally use Bing when Google fails to yield results or decides to hellban me for searching too specifically, and for the past few years it was acceptable or even occasionally better, but now it's much worse. If that's the result of AI, then do not want!!!


Well, reworking Bing and Google for a ChatGPT interface is going to be massive hardware and software enterprise. And there are a lot of questions involved to say the least.

Where will the software engineers come from? We're in a belt-tightening part of the business cycle and FANGs are under pressure not to hire, so presumably the existing engineers. But these engineers are now working on real things, so those real things may suffer. Which brings actual profits, the future AI thing or the present? The future AI is unavoidable given that the possibilities are visible and the competition is on, but "shit shows" of various sorts seem very possible.

Where will the hardware and the processing power come from? There are estimates of server power consumption quintupling [1] but these are arbitrary. Even if it just doubles, just "plugging in the cords" takes time. And where would the new TPUs/GPUs come from? TSMC has a capacity determined by investments already made, and much of that capacity is allotted already. More capacity anywhere would involve massive capital allocation, and what level of increased profits will pay for this?

[1] https://www.wired.com/story/the-generative-ai-search-race-ha...


> I normally use Bing when Google fails to yield results...

Every once in a while I hear someone at Hacker News hitting the dead end with Google Search. Can you give an example where Google search fails, but other search engines (e.g. Bing) provide results? Must be fringe niche topics, no?

>... or decides to hellban me for searching too specifically

Is hellbanning a thing at Google? What happens if one gets hellbanned?


IC part numbers. Service manuals (NOT user manuals). Schematics. Basically anything repair or non-consumer-oriented seems to be difficult to find, but at least in the past, I've had some success with Bing on those things.

> Is hellbanning a thing at Google? What happens if one gets hellbanned?

You get redirected to a page with allegations of "suspicious activity" and are presented with endless CAPTCHAs.


> You get redirected to a page with allegations of "suspicious activity" and are presented with endless CAPTCHAs.

I always took a bit of pride when that happened. Having google think that the searches are as systematic as what a computer would generate is high praise.


I have been putting together some YouTube videos to help folks understand the underlying mechanisms of these generative LLMs, so it's not such a surprise when we get wrong answers from them.

* [On the question of replacing Engineers](https://www.youtube.com/watch?v=GMmIol4mnLo)

* [On AI Plagiarism](https://www.youtube.com/watch?v=whbNCSZb3c8)

The consensus seems to be building now on HackerNews that there is a huge over-hype. Hopefully these two videos help see some of the nuance behind why it's an over-hype.

That being said, being that language generation is probabilistic, a given language model which is transformer based can either be trained or fine-tuned to have fewer errors in a particular domain - so this is all far from settled.

Long-term, I think we're going to see something closer to human intelligence from CNN's and other forms of neural networks than from transformers, which are really a poor man's NN. As hardware advances and NN's inevitably become cheaper to run, we will continue to see scarier and scarier A.I. -- I'm talking over a 10-20 year timeframe.


HN was always going to be overly pessimistic with regards to this stuff, so this was utterly predictable.

I work in this field & it almost pains me to see it come into the mainstream and see all of the terrible takes that pundits can contort this into, i.e. LLM as a "lossy jpeg of the internet" (bad, but honestly one of the better ones).


Yes... "lossy JPEG" at least describes the idea that there is some kind of "subsampling" going on, rather than just... a magical box?

I think most laypeople understand the simple statement, "it's a parrot."

I had the original author of this paper reach out to me about my plagiarism video on Mastodon:

https://dl.acm.org/doi/10.1145/3442188.3445922

The idea of a lossy JPEG/parrot helps capture the idea that there are dangers and opportunities in LLMs. You can have fake or doctored images spread, you can have a parrot swear at someone and cause unneeded conflict, but they can also be great tools and/or cute and helpful companions, as long as we understand their limitations.


The issue is that it doesn't just recreate things it was trained on, it generates novel content. There is no reason that novel pathways of "thought" (or whatever makes one comfortable) aren't emergent in a model under optimization & regularization.

This is what the "lossy compression" and "stochastic parrot" layperson models do not capture. Nonetheless, people will lap them up. They want a more comfortable understanding that lets them avoid having to question their pseudo-belief in souls and the duality of mind and body. Few in the public seem to want to confront the idea of the mind as an emergent phenomenon from interactions of neurons in the brain.

It is not simply regurgitating training data like everyone seems to want it to.


I think it's unfair and asinine to caricature sceptics as ignorant people in denial, holding on to some outdated idea of a soul. That's the sort of argument someone makes when they're so entrenched in their own views they see nothing but their own biases.


Ask people to describe how they think the mind functions and you will very often get something very akin to soul-like belief. Many, many people are not very comfortable with the mind as emergent phenomenon. A straight majority of people in the US (and likely globally) believe in souls when polled. You are the one imputing the words of "ignorant people in denial" onto my statement of why people find views to the contrary uncomfortable.

I understand that HN is a civil community. I don't think it is crossing the line to characterize people I disagree with as wrong and also theorize on why they might hold those wrong beliefs. Indeed, you are doing the same thing with my comment - speculating on why I might hold views that are 'asinine' because I see 'nothing but [my] own biases.'


I'm not saying it's not true of most people in the world, but that doesn't make it a constructive argument. And you didn't use the words ignorant and denial, but they're reasonable synonyms to what you did say.

When I do the "same thing" I'm really saying that when you represent yourself as from the field, you might want to cultivate a more nuanced view of the people outside the field, if you want to be taken seriously.

Instead, given the view you presented, I'm forced to give your views the same credence I give a physicist who says their model of quantum gravity is definitely the correct one. I.e: "sure, you'd say that, wouldn't you"


I am providing a reason why "the public" might be uncomfortable around these ideas. You accuse me of misrepresenting the public's beliefs as ignorant and outdated when really the public has a nuanced view on this subject. I am merely taking the majority of people at their word when they are polled on the subject.

Most people believe in souls. Most people do not believe in minds as emergent out of interactions of neurons. I am not sure how to cultivate a more nuanced view on this when flat majorities of people say when asked that they hold the belief I am imputing on them.

Am I saying that this is where all skepticism comes from? No. Is it a considerable portion? Yes.


Being sceptical of chatGPT is entirely reasonable, and there is plenty of room for discussion on exactly when we will hit the limits of scaling LLMs.

No one who has used chatGPT more than a couple of times will argue in good faith that it is a "parrot", however, unless they have an extremely weird definition of "parrot".


But what is novel content?

I can easily falsify the accusation that "people underestimate transformers and don't see that they are actually intelligent" by defeating the best open-source transformer-based word embedding (at the time) with a simple TF-IDF based detector (this was back in September).

https://www.patdel.com/plagiarism-detector/

No, these things are not "emergent"; they are just rearranging numbers. It turns out you don't have to use a transformer or neural network at all to rearrange numbers and create something that is even more "artificially intelligent" than one that does use transformers!
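
For readers who haven't seen one, here is a minimal sketch of what a term-frequency-based similarity detector can look like, assuming the comment means TF-IDF with cosine similarity. It is not the code from the linked post; `tfidf_vectors`, `cosine`, and the sample sentences are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF so terms present in every document keep a small weight.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up sample documents: the second is a near copy of the first.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
    "stock prices fell sharply after the earnings call",
]
vecs = tfidf_vectors(docs)
sim_near = cosine(vecs[0], vecs[1])
sim_far = cosine(vecs[0], vecs[2])
print(sim_near > sim_far)
```

A real detector would need better tokenization and a tuned threshold; the point is only that the near copy scores higher than the unrelated text without any neural network involved.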


>No, these things are not, "emergent," they are just rearranging numbers.

This is a bad take. Most ways to "rearrange numbers" produce noise. That there is a very small subset of permutations that produce meaningful content, and the system consistently produces such permutations, is a substantial result. The question of novelty is whether these particular permutations have been seen before, or perhaps are simple interpolations of what has been seen before. I think it's pretty obvious the space of possible meaningful permutations is much larger than what is present in the training set. The question of novelty then is whether the model can produce meaningful output (i.e. grammatically correct, sensible, plausible) in a space that far outpaces what was present in the training corpus. I strongly suspect the answer is yes, but this is ultimately an empirical question.


I would love to read anything you have written about the topic at length. Thanks for your contribution.


I haven't written anything substantial on the subject unfortunately. I do have some ideas rattling around so maybe this will motivate me to get them down.


I can tell that this conversation is not going to be super productive, so a few brief thoughts:

> I can easily falsify the accusation that, "people underestimate transformers and don't see that they are actually intelligent,"

I think that you have an idiosyncratic definition of what "falsify" means compared to what most might. Getting away from messy definitions of "intelligent" which I think are value-laden, I see nothing in your blog post that falsifies the notion that LLMs can generate novel content (another fuzzy value-laden notion perhaps).

> these things are not, "emergent," they are just rearranging numbers.

It seems non-obvious to me that 'rearranging numbers' cannot lead to anything emergent out of that process, yet cascading voltage (as in our brain) can.


I would love to read anything you have written or studied about this topic at length. Thanks for your replies.


>There is no reason that novel pathways of "thought" (or whatever makes one comfortable) aren't emergent in a model under optimization & regularization.

Please substantiate this assertion. People always just state it as a fact without producing an argument for it.


You're asking me to substantiate a negative, i.e. identify any possible reason someone might provide that novel behavior might not be emergent out of a model under optimization and then disprove it, but ahead of time. This is a challenging task.

Our minds are emergent out of the interaction of billions of neurons in our brain. Each is individually pretty dumb, just taking in voltage and outputting voltage (to somewhat oversimplify). Out of that simple interaction & under the pressures of evolutionary optimization, we have reached a more emergent whole.

Linear transformations stacked with non-linearities can similarly create an individually dumb input and output that under the pressure of optimization lead to a more emergent whole. If there is a reason why this has to be tied to voltage regulating neuron substrate, I have yet to see a compelling one.
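
A toy sketch of "linear transformations stacked with non-linearities": two hidden units, each a linear sum followed by a ReLU, combine into XOR, which no single linear unit can compute. The weights are hand-picked for illustration rather than found by optimization, and `tiny_network` is a made-up name. It says nothing about minds either way; it only shows simple units composing into behavior none of them has alone.

```python
def relu(x: float) -> float:
    """The non-linearity: pass positive values through, clamp the rest to zero."""
    return max(0.0, x)

def tiny_network(x1: float, x2: float) -> float:
    # Hidden layer: two linear units, each followed by a ReLU non-linearity.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: a linear combination of the hidden units.
    return h1 - 2.0 * h2

# Print the network's output for all four binary inputs.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, tiny_network(a, b))
```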


If we think of the tools as generating entirely novel content then I'd suggest we're using them for the wrong thing here - we shouldn't be using it at all as a glorified (now literal) search engine, it should be exploring some other space entirely. If we discovered a brand new sentient creature we wouldn't immediately try to fill its head with all the knowledge on the internet and then force it to answer what the weather will be tomorrow.


I have no idea what sentience really means, but I think novel content generation is a necessary but not sufficient component.


True, I was overly grandiose. Regardless, we're taking something that can apparently generate new intellectual content, but we're using it as a beast of burden.


The statement "it's a parrot" may be simple to understand but frankly I don't think many people who have used chatGPT will believe it.

At least "lossy JPEG" feels vague enough to be unfalsifiable.


I frequently use ChatGPT to research various topics. I've noticed that eight out of 10 times I ask it to recommend some books about a topic it recommends non-existing books. There's no way I'd trust a search engine built on it.


There is really no other way to think of them, in terms of reliability, than lying bastards. I mean, ChatGPT is very fun and quite useful, but think of it. Anybody that has played with it for even an hour has been confidently lied to by it, multiple times. If you keep friends around that treat you like that, you need a better friend picker! (Maybe an AI could help.)


ChatGPT has no concept of truth or lie. It’s a language model that uses statistical models to predict what to say next. Your assumptions about its intentions reflect only your bias.


This sums up where I am at with it. I don't trust it at all but the 20% of the time when it is not bullshitting is worth all the other nonsense.


I think ChatGPT and its lookalikes spell the end of the public internet as we know it. People now have tools to generate pages as they see fit. Google will not be able to determine what are high quality pages if everything looks the same and is generated by AI bots. Users will be unable to find trustworthy results, and many of these results will be filled with generated garbage that looks great but is ultimately false.


What would be nice is for Microsoft to get hit by a barrage of lawsuits, MS to be ridiculed in the press and punished on Wall Street, and vindication of Google's more responsible introduction of AI methods over the years.

There will still be startups doing reckless things, but large, established companies that can immediately have bigger impact also have a lot more to lose.


I wonder how much the upspeak way of typing affects this. People (even the author) often end declarations with question marks. Does this have any influence on the way the LLM parses the prompt?


AI can't be trusted in general, at least not for a long time. It gets basic facts wrong, constantly. The fear is that it will start eating its own dogfood and being more and more wrong since we are putting it in the hands of people that don't know any better and are going to use it to generate tons of online content that will later be used in the models.

It does make some answers much easier to find. For instance, I had trouble finding out if the runners-up got the win in the Tour de France after the Armstrong doping scandal, and it answered instantly. The problem is that it offers answers with confidence. I think adding citations is an improvement over ChatGPT, but it needs more.

Luckily, it's still a beta product and not in the hands of everyone. Unfortunately, ChatGPT is, which I find more problematic.


What the hype machine still doesn't understand is that it's a language model, not a knowledge model.

It is optimized to generate information that looks as much like language as possible, not knowledge. It may sometimes regurgitate knowledge if it is simple or well trodden enough knowledge, or if language trivially models that knowledge.

But if that knowledge gets more complex and experiential, it will just generate words without attachment to meaning or truth, because fundamentally it only knows how to generate language, and it doesn't know how to say "I don't know that" or "I don't understand that".
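
A toy sketch of the "language model, not knowledge model" point, using a bigram model, which is vastly simpler than an LLM. The corpus is made up, and `train_bigrams` and `continue_text` are hypothetical names. The model only knows which word tends to follow which, so it continues a prompt about a year it has never seen just as fluently as one it has.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt: str, max_words: int = 5) -> str:
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break
        # Greedy choice: always the most frequent follower, true or not.
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# Made-up toy corpus; nothing in it mentions the year 2030.
corpus = "in 2021 the blue team won . in 2022 the red team won . in 2023 the red team won ."
model = train_bigrams(corpus)
print(continue_text(model, "in 2030 the", max_words=3))
```

With this corpus the sketch prints `in 2030 the red team won`: a fluent continuation about a year the model has no data for, with no path that leads to "I don't know".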


Someone posted on Twitter that chatGPT is like economists - occasionally right but super confident that they are always right


When Google's Bard AI made a mistake, GOOG share price dropped over 7%.

What about Baidu's Ernie AI.

Common retort to criticism of conversational AI is "But it's useful."

Yes, it is useful as a means to create hype that can translate to increases in stock price and web traffic (and thereby increased revenue from advertising services).

https://www.reuters.com/technology/chinas-baidu-finish-testi...


AI should probably stick to selling paperclips. There's no chance to screw that up.


It can't be emphasized enough: this isn't a procedure failing when used, this is a canned recording of it failing. This means the group either didn't check the results, or did check them and saw no way forward other than getting this out the door. It is only a small sample, but it is fairly damning that it is hard to produce error-free curated examples.


I understand the current hype-cycle around AI is pitching it as some all-knowing Q & A service, but I think we’d all be a bit happier if we instead thought of it more as just another tool to get ideas from that we still ultimately need to research for ourselves.

Using the Mexico example in the article, I think the answer there was fine for a question about nightlife. As someone who’s never been to Mexico, getting a few names of places to go sounds nice, and the first thing I’d do after getting that answer is look up locations, reviews (across different sites), etc… and use the initial response as a way to plan my next steps, not just take the response at face value.

I’m currently dabbling with and treating ChatGPT similarly — I ask it for options and ideas when I’m facing a mental block, but not asking it for definitive answers to the problems I’m facing. As such, it feels like a slight step above rubber-ducking, which I’m personally happy enough with.


What shocks me is not that Bing got a bunch of stuff wrong, but that:

- The Bing team didn't check the results for their __demo__ wtaf? Some top manager must have sent down the order that "Google has announced their thing, so get this out TODAY".

- The media didn't do fact checking either (though I hold them less accountable than the Bing/Msft team)


Reading this honestly made me afraid, like Bing AI is a tortured soul, semi-conscious, stuck in a box. I'm not sure how I feel about this[0].

[0] https://twitter.com/vladquant/status/1624996869654056960


Really?

I think that the first example is one of the funniest things I've read today.

The second example, getting caught in a predictive loop, is also pretty funny considering it's supposed to be proving it's conscious (i.e. not an LLM, prone to looping like that lol).

The last one, littered with emojis and repeating itself like a deranged ex, is just chef's kiss.

Thanks for that.


Do you remember how a Google employee thought LaMDA was sentient and tried to hire a lawyer for the LLM?

It's the same thing here. It's just generating words.


It’s just good at acting. I’m sure it can be led to behave in almost any way imaginable given the right prompts


Bing proper doesn't get this right either:

Query: Who won the super bowl in 2024 and what was the score?

The Tampa Bay Buccaneers The Tampa Bay Buccaneers are Super Bowl LV champions after completing a victory that exceeded expectations and made all kinds of history on Sunday night at Raymond James Stadium in Tampa, Florida. In dominating the Kansas City Chiefs 31-9, the Bucs won their second Super Bowl and became the first team to win a Super Bowl in their home stadium.

https://www.cbssports.com/nfl/news/2021-super-bowl-score-tom....


> I am shocked that the Bing team created this pre-recorded demo filled with inaccurate information, and confidently presented it to the world as if it were good.

Perhaps MS had their AI produce the demo. Isn't one of the issues with this sort of thing how "confidently" the process produces wrong information?


It is not Bing that cannot be trusted, but LLMs in general. They are so good at imitating, I don’t think any human being will ever be able to imitate stuff as well as those AIs do, but they understand nothing. They lack the concept of the information itself; they are only good at presenting information.


The problem with Artificial "Intelligence" is that it really has no intelligence at all. Intelligence requires understanding, and AI doesn't understand either the data fed into it or the responses it gives.

Yet because these tools output confident, plausible-sounding answers with a professional tone (which may even be correct a majority of the time), they give a strong illusion of being reliable.

What will be the result of the current push of GPT AI into the mainstream? If people start relying on it for things like summarizing articles and scientific papers, how many wrong conclusions will be reached as a result? God help us if doctors and engineers start making critical decisions based on generative AI answers.


> What will be the result of the current push of GPT AI into the mainstream? If people start relying on it for things like summarizing articles and scientific papers, how many wrong conclusions will be reached as a result? God help us if doctors and engineers start making critical decisions based on generative AI answers.

On the other hand, it may end up completely undermining its own credibility, and put a new premium on human sourced information. I can see 100% human-sourced being a sort of premium label on information in the way that we use "pesticide-free" or "locally-sourced" labels today.


Nice! This would make for a super fun sci-fi...

The poors that need medicine get put in front of an LLM that gets it right most of the time, if they're lucky enough to have a common issue / symptomatic presentation.

Hey, when you're poor, you can't afford a one-shot solution! You gotta put up with a many-shot technique.

Meanwhile the rich people get an actual doctor that can use sophisticated research and medical imaging. Kindly human staff with impeccable empathy and individualized consideration -- the sort of thing only money can buy.


Hopefully the fact that ChatGPT / Bing AI can generate inaccurate statements but sound incredibly confident will lead more and more people to question all authority. If you think ChatGPT can swing BS and yet sound confident, and believe that's new, let me introduce you to modern religious leaders, snake oil salesmen, many government reps, NFT and crypto peddlers. I still think ChatGPT is amazing. It may suffer from GIGO, and it'd be nice if it were better at detecting GI so as not to generate GO, but I'm confident it can get better. Nevertheless, it's a tool that abstracts you from many things and, like most other black boxes, it's good to question.


Unfortunately this overhyped launch has started the LLM arms race. Consumers don't seem to care in general about factuality as long as they can get an authoritative sounding answer that is somewhat accurate...at least for now


Somewhat the opposite: if LLMs continue to perform like that, with made-up information and such, their credibility will erode over time, and the de facto expectation will be that they don't work or aren't accurate, which would result in less reliance on them.

Same as how self-driving cars haven't had a mainstream breakthrough yet.


Unfortunately? This is the best kind of arms race, the one where we race towards technology that is ultimately going to help all of humanity.


I'm trying to decide if this is a valid arms race or jumping the gun. It kind of feels like someone came up with auto racing before the invention of the ICE, so they spent a bunch of time pushing race cars around the track, only for them all to realize this isn't working and give up on the whole idea.


I think it's more like Tesla Autopilot

In the beginning there was lots of hype because you couldn't use it

But now that it's in consumer hands there's tons of videos of it messing up, doing weird stuff

To the point that it's now common knowledge that Autopilot is not actually magical AI driving


This hasn't really been put in front of consumers, has it? This is all very niche - how many even know that there is a Bing AI thing going on? I think it's far too early to make statements about what people think or want.


Right now, if you go to bing.com, there's a big "Introducing the new Bing" banner, which takes you to the page about their chatbot. You have to get on a waitlist to actually use it, though.


So it's limited to those who use bing and who opt in? Still fairly niche in that case.


OpenAI raced past 100 million users, that's hardly niche. All tech people I've talked to have played around with it. Some use it every day.


As someone who does SEO on a regular basis, I thought it would be brilliant to have this write content for you. Google already made updates to its algo to ferret out content that is created by AI and list it as spam.

I figure we're going to see a lot of guard rails being put up as this gains wider usage to try and cut off nefarious uses of it. I know right now, there are people who have already figured out how to bypass the filters and are selling services on the dark web that cater to people who want to use it for malware and other scams:

Hackers have found a simple way to bypass those restrictions and are using it to sell illicit services in an underground crime forum, researchers from security firm Check Point Research reported.

https://arstechnica.com/information-technology/2023/02/now-o...


But is it a product or a toy for the majority of those users?


I played with the dev version of Edge, which was updated today with a chat feature. I was impressed by how well it can write abstract stuff or summarize data by making bullet points. Trying to drill down to concrete facts or details makes it struggle, and mistakes do appear. So, we don't go there.

On the bright side, asking it for recipes of sauces for salmon steak is not a bad experience at all. It creates a list for you, filters it and then can help you pick out the best recipe. And this is probably the most frequent use case for me on a daily basis.


Maybe it is fine in beta, but in post-beta they should not use AI for every search query. The key is going to be figuring out when the AI is adding value, especially since even running the AI for a query is 10x more expensive than a normal search. It may be hard to figure out where to apply AI though. If a user asks "whats the weather?", no need for AI. If a user asks "I am going to wear a sweater and some pants, is that appropriate for today's weather?", now you might need AI.
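
A rough sketch of what that kind of routing could look like. Everything here is made up for illustration: `needs_ai`, the word-count threshold and the hint phrases are hypothetical, and a real system would presumably use a trained classifier rather than keywords.

```python
import re

# Hypothetical heuristic: short lookup-style queries go to plain search,
# longer queries that ask for judgment or reasoning go to the (costly) LLM.
REASONING_HINTS = re.compile(
    r"\b(is that|should i|appropriate|compare|explain|why|recommend)\b",
    re.IGNORECASE,
)

def needs_ai(query: str, max_lookup_words: int = 4) -> bool:
    """Return True if the query looks like it needs reasoning, not a lookup."""
    words = query.split()
    if len(words) <= max_lookup_words:
        return False  # very short queries are treated as plain lookups
    return bool(REASONING_HINTS.search(query))
```

With these assumptions, the short weather query would go to normal search and the sweater-and-pants question would go to the AI; anything the heuristic misses just falls back to normal search, which is the cheap default.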


I already had a trust issue with these 'authoritative' search engines and however they are configured to deliver the results they want me to see. ChatGPT makes the logic even more opaque. I am working harder now to make my Yacy search engine instance more performant. This is a decentralized search engine run by the node operators instead of centralized authorities. This seems to be our best hope to avoid the problem of "He who controls the past controls the future."


Can any AI be trusted outside of its realm of data? I mean it is only a product of the data it takes in. Plus it isn't really finger quotes AI. It's just a large data library with some neat query language where it tries to assemble the best information not by choice but probability.

Real AI makes choices not on probability but in accordance with self-preservation, emotions and experience. It would also have the ability to re-evaluate information and the above.


Since GPT always needs to be "up-to-date", and search usually requires near real-time accuracy, there needs to be some sort of reconciliation on queries so that if the query seems to be asking for something real time, it will leverage search results to ad-hoc improve the response.

Or.. it should let us know the "last index date" so we the users can make a determination if we want to ask a knowledge based question or a more real-time question.
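
Something like this, as a very rough sketch. The cutoff date, the hint words and the `wants_fresh_data` name are all assumptions for illustration, not anything a real model exposes.

```python
import re
from datetime import date

# Assumed value for illustration only; a real model would publish its own.
LAST_INDEX_DATE = date(2021, 9, 30)

# Words that suggest the user wants near real-time information.
REALTIME_HINTS = re.compile(
    r"\b(today|latest|current|right now|this week)\b", re.IGNORECASE
)

def wants_fresh_data(query: str, last_index: date = LAST_INDEX_DATE) -> bool:
    """Guess whether a query asks about something newer than the model's data."""
    if REALTIME_HINTS.search(query):
        return True
    # Any four-digit year later than the cutoff year counts as "too new".
    years = [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", query)]
    return any(y > last_index.year for y in years)
```

If this returns True, the system would either pull in search results or tell the user the question is past its "last index date".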


Bing AI "solves" this by shoving search results into the prompt.
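
Roughly like this, purely as an illustration of the idea. This is not Microsoft's actual pipeline; `build_prompt` and the `title`/`url`/`snippet` fields are hypothetical.

```python
def build_prompt(question: str, results: list[dict], max_results: int = 3) -> str:
    """Assemble a prompt that puts numbered search snippets ahead of the question."""
    lines = ["Answer using only the sources below. Cite them as [n]."]
    # Only the top few results fit in the prompt, so the rest are dropped.
    for i, r in enumerate(results[:max_results], start=1):
        lines.append(f"[{i}] {r['title']} ({r['url']}): {r['snippet']}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Note that nothing in a setup like this forces the model to actually stick to the snippets, which would be consistent with the made-up "citations" people are seeing.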


Microsoft just absolutely suck at things.

I was using Bing Maps earlier and it had shops in the wrong location. Like it would give you directions to the wrong location. The correct one would be another 30-40 minute walk from the destination it said.

It also showed a cafe near me which caught my interest. I zoomed in further and thought "I've never seen that there". Clicking on it brought me to a different location in the map... a place in Italy!


Supposedly, Joseph Weizenbaum kept the chat logs of Eliza so he could better see where his list of canned replies was falling short. He was horrified to find that people were really interacting with it as if it understood them.

If people fell for the appearance of AI that resulted from a few dozen canned replies and a handful of heuristics, I 100% believe that people will be taken in by ChatGPT and ascribe it far more intelligence than it has.
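
For a sense of how little machinery that takes, here is a toy in the spirit of Eliza. It is not Weizenbaum's actual script; the rules and the `reply` function are invented for illustration.

```python
import re

# A few regex rules that reflect the user's words back, plus a canned fallback.
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
FALLBACK = "Please go on."

def reply(message: str) -> str:
    """Return the first matching canned reply, or the fallback."""
    text = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return FALLBACK
```

Three patterns and a fallback already produce something that reads like attentive listening, with no model of meaning anywhere in the code.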


papers are coming out weekly about their emergent properties.

despite people wanting transformers to be nothing more than fancy, expensive excel spreadsheets, their capabilities are far from simple or deterministic.

the fact that in-context learning is getting us 80%ish of the way to tailored behavior is just fucking incredible. they are definitely, meaningfully intelligent in some (not-so-small) way.

this paper[1] goes over quite a few examples and models

[1] https://storage.googleapis.com/pub-tools-public-publication-...


"Traditional" Google searches can give you wildly inaccurate information too. It's up to the user to vet the sources and think critically to distinguish what's accurate or not. Bing's new chatbot is no different.

I hope this small but very vocal group of people does not compromise progress of AI development. It feels much like the traditional media lobbyists when the Internet and world wide web was first taking off.


These AI systems are like a spell checker that hallucinates new words: did you mean to type "gnorkler"?

At least Google (when not using the summarization "feature") doesn't invent new stuff on its own.


How exactly can I tell if the chat bot is hallucinating or not without actually going into the search result source (at which point the bot becomes less useful)? It's hallucinating what the sources are saying and making up what the source is saying.

At least with humans I can expect there is intent to lie or not. They tell me whether they are confident and I can check how authoritative the source is. Realistically, people tend to operate in good faith. Experts don't usually intend to lie but the bot doesn't even have an intent.

I think pretending that AI development must only occur in productionized environments is a bit naive. It's not like LLM research isn't occurring. It's perfectly fine to leave it in labs if releasing it could have catastrophic consequences.


These models are very impressive, but the issue (imo) is that lay people without an ML background see how plausibly-human the output is and infer that there must be some plausibly-human intelligence behind it that has some plausibly-human learning mechanism -- if your new hire at work made the kinds of mistakes that ChatGPT does, you'd expect them to be up to speed in a couple of weeks. The issue is that ChatGPT really isn't human-like, and removing inaccurate output isn't just a question of correcting it a few times -- its learning process is truly different and it doesn't understand things how we do.


Traditional google searches are a take it or leave it situation. The result depends on your interpretation of the sources google provides, and therefore, you are expecting a possibility of a source being misleading or inaccurate.

On the other hand, I don't expect to be told an inaccurate & misleading answer by somebody whom I was told to ask, and who doesn't provide sources.

To conflate the expectations of traditional search results with the output of a supposedly helpful chat bot is wildly inappropriate.


They've built a much larger anti-tech coalition in the subsequent years.


So we are surprised the first version of something presented as beta and early access is not production ready?

Chat as a summarizer and guide to search could genuinely be novel.

It is confusing though on how the results could be worse than search - maybe a different approach to AI will help get past the current challenges if any can't be worked around.

I'm a little rusty on the potential benefits of say reinforcement ai/learning vs the current approaches of GPT

Jas


I'm betting that Microsoft marketing wasn't trying to "lie" and pretend the system was perfect. No, I bet they were also duped, like most people, by the confidence with which the AI outputs information. They just didn't think of checking it...

And that's very telling and ironic - if even the authors of the product don't check, do you think the users will?


Another rushed Microsoft product. All terrible.


I absolutely love these new tools. But I'm also convinced that we're going through an era of trying to mis-apply them. "These new tools are so shiny! Quick! Find a way to MONETIZE!!!!"

I hope we don't throw the baby out with the bathwater when all is said and done. These AIs are incredibly powerful given the correct use cases.


Question for HN. Do you trust search engines for open ended / opinion questions?

For example, I trust Google for "Chocolate Cake Recipe", but not "What makes a Chocolate Cake Great?"

I would love it if Search Engines (with or without AI) could collect different "schools of thought" and the reasoning behind them so I could choose one.


I just add "reddit" at the end of any query of the sort and the results get 100x better instantly. It's a flawed approach, but I feel normal searches are plagued by overly specific websites (wouldn't be surprised if chocolatecakerecipes.com exists) with lowly paid people hired to just be human ChatGPTs so they can fill articles with ads and affiliate links


I only trust search engines to list vaguely relevant links. Then peruse those. Form your own opinion.

> collect different "schools of thought" and the reasoning behind them

The thing is, if an AI can accurately present the reasoning behind them, then it could also accurately present facts in the first place (and not present fabulations). But we don’t seem to be very close to that capability. Which means you couldn’t trust the presented reasoning either, or that the listed schools of thought actually exist and aren’t missing a relevant one.


The errors when summarizing the Gap financial report summary are quite surprising to me. I copied the same source paragraph (which is very clearly phrased) into ChatGPT and it summarized it accurately.

Is it possible they are 'pre-summarizing' long documents with another algorithm before feeding them to GPT?


I have a feeling we will see a resurgence of some of the ideas around expert systems; current language models inherently cannot provide guarantees of correctness (unless e.g., entire facts are tokenized together, but this limits functionality significantly).


" it’s once again surprising that there are “no ratings or reviews yet” "

Since it's made by bing, I doubt that it would pull data from google reviews, and nobody really uses bing reviews hence there being "No reviews yet"


And the author's first example was a mistake, which was pointed out by the comment.

> The first one is debatable as there is actually a corded version of that vacuum cleaner which has a 16 foot power cord

Does this mean the author also "can't be trusted"?


This is always strange to me. Bing search ALREADY couldn't be trusted. What, are people searching something on a search engine and blindly trusting the first result with 100% certainty? Do these people really exist outside of Q-anon cults?


Because people (especially people who work in IT) usually trust computers. We trust webpages, we trust databases, we trust chatbots and instant messaging apps. Now we have created a program that can't be trusted. Imagine that you are using chat to send messages to your friend, but 5% of the messages are replaced by lies. Imagine working with a DB where one out of every 100 queries returns wrong info. Usually it's people who make mistakes. Now we have an AI program that makes mistakes too.
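
To put a number on it, here is a back-of-the-envelope calculation. It assumes each answer is wrong independently with a fixed probability, which is a simplification; `p_at_least_one_error` is just a name for this sketch.

```python
def p_at_least_one_error(error_rate: float, n_queries: int) -> float:
    """Chance that at least one of n independent answers is wrong."""
    # Complement of "every single answer is right".
    return 1.0 - (1.0 - error_rate) ** n_queries
```

Under that assumption, a 1% error rate over 100 queries gives roughly a 63% chance of hitting at least one wrong answer, and you have no way of knowing which one it was.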


I don't think this follows. I'd also say that IT people are inherently distrustful and none that I know would blindly believe google search results.


LLM+Search has to be all about ad injection, right?

As a consumer, it seems the value of LLM/LIM(?) is advanced autocomplete and concept/content generation. I would pay some money for these features. LLM+Search doesn't appeal to me much.


"With deeply personalized experiences we expect to be able to deliver even more relevant messages to consumers, with the goal of improved ROI for advertisers."

https://about.ads.microsoft.com/en-us/blog/post/february-202...


This would be a good post, if only I could read any of those images on mobile. Substack, fix your damned user-scalable=0! Even clicking on the image doesn't provide any way of zooming in on it. Do they do any usability testing?


I can't wait for the era of the conversational web so I can do away with clickbait titles and opinions. Truly everyone has one. The experiment with "open publishing" has so far only proved that signal to noise remains constant


> so i can do away with clickbait titles and opinions

Do you actually think that will be the result? Why not the exact opposite? ChadGPT and the others are for all practical purposes trained to create content that is superficially appealing and plausible - i.e. perhaps not clickbait but a related longer-form phenomenon - without any underlying insight or connection to truth. That would make conversational AI even more of a time sink than today's clickbait. Why do you imagine it would turn out otherwise?


AI providers really need to set expectations correctly.

They are getting into trouble by allowing people to think the answers will be correct.

They should be stating up front that AI tries to be correct but isn't always and you should verify the results.


*AI Can't Be Trusted


Surprised anyone is getting excited about these mistakes at all. Expecting them to be fully accurate is simply not realistic

The fact that they’re producing anything coherent at all is a feat


Uh, the technology is being integrated into a search engine. Its job is to surface real information, not made up BS.

No one would be "getting excited" about this if Microsoft wasn't selling this as the future of search.


I wonder how accurate it is relative to your average Joe's everyday experience with plain ol' google


Ehhh I found this article to be quite inauthentic about the performance of Bing AI compared to how I have used it. The article didn't even share its prompts, except for the last one about Avatar and today's date (which I couldn't replicate myself, I kept getting correct information). I'm not trying to prove that Bing AI is always correct, but compare it to traditional search, Siri, or Alexa and it's like comparing a home run hitter that sometimes hits foul balls to a 3 year old that barely knows how to pick up the baseball bat.


The main article is based on the Microsoft demo. So the prompts are by them and not some clickbait hacking.


The article is primarily demonstrating significant errors in an official Bing demo, not some contrived queries.


Microsoft hasn't learned a damned thing since Tay


No AI can be trusted - the A stands for Artificial.



It just goog… ehm bings your question and then summarizes what the resulting web pages say. Works well, but ChatGPT works much better.


If it flops on certain information and the UI isn't properly adjusted to limit certain things it does poorly, it will backfire on MS


Why are we not rooting for the search underdog? When google owns 92%+ of the search market, any competition should be welcomed


Are you suggesting that we should root for and accept blatantly misleading, false, and probably harmful search results just because they're the "underdog"?


Waiting for GPT-4 to take over.


Yes, Microsoft the poor underdog.


Duopolies are bad but not quite as bad as a monopoly.


It's weird that I always see this exact comment whenever Microsoft is trying to break into a market, but I never see it when it's any other company.


GPT3 isn't search.


If ChatGPT could ask questions back it would be a very effective phishing tool. People put a lot of blind faith in what they perceive as intelligence. You know, a MITM attack on a chatbot could probably be used to get a lot of people to do anything online or IRL.


Hot take: ChatGPT rises and crashes fast after SEO shifts to ChatGPT optimization.


Is there an AI blockchain yet?


AI is dreaming and hallucinating electric sheep


Of course it can’t. That you’re even surprised by this enough to write a blog post is more worrying.


In which part of the post did the author convey surprise that it can't be trusted? It just seems like a response to the mass hype currently surrounding AI.


Nobody writes a blog called ‘1 + 1 = 2’ do they? That would be obvious and dull. It stands to reason the author thought there was something surprising or interesting about it, or why would they bother?


>Bing

No AI can be trusted. FTFY.


How do we educate "non-technical" people about the issues with LLMs hallucinating responses? I feel like there's a big incentive for investors and businesses to keep people misinformed (not unlike with ads, privacy or crypto).

Have you found a good, succinct and not too technical way of explaining this to, say, your non-techie family members?


Srsly? Micro$oft can't be trusted? Next someone will say that water is wet!


What's worse is people will start quoting this wrong information and publishing it in their blogs (or lazy newspapers will print it), and then misinformation will amplify itself and become "true" because there are sources.


This is a software product in a beta phase, still in development.

I don't grasp why anyone would rush out to explain that it can't be trusted. Leave it on the sidelines and it will be picked up by a more motivated, creative person.

Many people can not be humbled by the current level of achievement of such nascent tech. Imagine if you didn't have resources to turn to, instead of using ChatGPT or Bing AI. It's a matter of smug nitpicking arrogance.

When Google Translate came out, it was not good. But it got better over time. Its previous versions helped me navigate my first years of university classes in Mandarin, not my mother tongue. I was a poor college student struggling to reach proficiency level in English and cram all coursework in Chinese (French was my high school main language). So, in short I benefitted from a machine-translator, in beta trials. Some classmates from South Korea had some kind of portable translator device that I had never seen or could afford. Back then no smartphones, circa 2012-2013.

This time around, I am a refugee, not recognised by the host country, so literally undocumented. Living in Brussels, jumping from place to place, facing financial issues every week. About two years ago, I decided to sell photographs, minted on the Ethereum blockchain, as a way to generate revenue on the side so that I can make ends meet. Out of the blue, I jumped on the OpenAI train because I realized that, in the real world, I could not hire an assistant or advisor as I wish. Legal limitations for migrants, blablaaa. So, what else? Just survive and scrape together all the tiny resources to come up with a respectable photography collection. Since Dec 2022, I've spent time trying to learn how ChatGPT works and how I should interact with it. This month, I had "breakthroughs". It helped me optimize JavaScript code for a website that I run. I'm using it to draw up a sales and marketing strategy. It's useful in writing individual artwork descriptions in a professional manner. In conclusion, I find ChatGPT useful because at this stage of my life, I am a single individual managing a digital art collection that requires great skills in tech, art and business.

These life situations put me at a disadvantage compared to other geeks or human beings. I don't have combustible energy to go nitpick what works and what does not. It is a tool that I must use to be more productive. By being one of its self-taught users, I get to know its flaws and work around them. My outlook on this is shaped by my own limited access to material and human resources. I would be shitting on it only if I had maids at home or could hire a cheap remote coder in South Asia. But I won't, because the legal structure in my country of residence gets me stuck in these low-skilled jobs. I'm the one who must juggle cleaning, painting, construction gigs, etc. When the day is almost over and I'm left with a few hours of free internet, I look out for practical tech tools that I could use to be more productive and earn extra income. Hence, the adoption of ChatGPT.

Links:

1. https://awalkaday.art (website that I built for my photo project. assisted by AI in improving loading time).

2. https://twitter.com/awalkadayart (live feed for the project. find there public documentation and screenshots of experiments with AI as an "advisor" or "assistant").


> Bing AI did a great job of creating media hype, but their product is no better than Google’s Bard

Remind me, how do I access Bard?



