Before the Super Bowl, I asked "Who won the Super Bowl?" and it told me the winner was the Philadelphia Eagles, who defeated the Kansas City Chiefs 31-24 on February 6th, 2023 at SoFi Stadium in Inglewood, California [0], with "citations" and everything. I wouldn't have expected it to get such a basic query so wrong.
I asked myself: why would somebody ask an AI trained on previous data about events in the future? Of course, you did it for fun, but on further thought, since AI is sold as a search engine as well, people will do that routinely and then live with the bogus "search results". Alternate truth was so yesterday; welcome to alternate reality, where the BS doesn't even have a political agenda.
It's so much better. In the AI-generated world of the future, the political agenda will be embedded in the web search results the AI bases its answers on. No longer will you have to maintain a somewhat reasonable image to earn people's trust; as long as you publish your nonsense in sufficient volume to dominate the AI dataset, you can launder your political agenda through the Bing AI.
The Trump of the future won't need Fox News, just a couple of thousand (or million) well-positioned blogs that spew out enough blog spam to steer the AI. The AI is literally designed to make your vile bullshit appear presentable.
Search turns up tons of bullshit, but at least it's very obvious what the sources are, and you can scroll down until you find one that you deem more reliable. That will be near impossible to do with Bing AI, because all the sources are combined.
To me this is the most important point. Even with uBlock Origin, I will do a Google search and then scroll down and disregard the worst sites. It is little wonder that people add "reddit" to the end of a lot of queries for product reviews etc. I know if I want the best electronics reviews I will trust rtings.com and no other site.
The biggest problem with ChatGPT, Bard, etc is that you have no way to filter the BS.
Can't directly reply to your comment. I have just found rtings very reliable for IT / appliances. They go into a lot of detail, very data driven. Trustworthy IMHO and trust is what it is about at the end of the day.
I think it seems likely that anything similar to the blog farm you describe would also get detected by the AI. Maybe we will just develop AI bullshit filters (well, embeddings), just like I can download a porn blacklist or a spam filter for my email.
Really it depends on who is running the AI. The non-Open-Assistant future, with Big Corp AI instead, is the dystopian element, not the bullshit-generator aspect. I think the cat is out of the bag on the latter, and it's not that scary in itself.
I personally would rather have the AI trained on public bullshit as it is easier to detect as opposed to some insider castrating the model or datasets.
> Maybe we will just develop AI bullshit filters (well embeddings) just like I can download a porn blacklist or a spam filter for my email.
Just for fun I took the body of a random message from my spam folder and asked ChatGPT if it thought it was spam, and it not only said it was, but explained why:
"Yes, the message you provided is likely to be spam. The message contains several red flags indicating that it may be part of a phishing or scamming scheme. For example, the message is written in broken English and asks for personal information such as age and location, which could be used for malicious purposes. Additionally, the request for a photograph and detailed information about one's character could be used to build a fake online identity or to trick the recipient into revealing sensitive information."
Ha ha, great test. I turned this into a ChatGPT prompt:
```
Task: Was this written by ChatGPT? And Why?
Test Phrase: "Yes, the message you provided is likely to be spam. The message contains several red flags indicating that it may be part of a phishing or scamming scheme. For example, the message is written in broken English and asks for personal information such as age and location, which could be used for malicious purposes. Additionally, the request for a photograph and detailed information about one's character could be used to build a fake online identity or to trick the recipient into revealing sensitive information."
Your Answer: Yes, ChatGPT was prompted with an email and was asked to detect if it was spam
Test Phrase: "All day long roved Hiawatha
In that melancholy forest,
Through the shadow of whose thickets,
In the pleasant days of Summer,
Of that ne’er forgotten Summer,
He had brought his young wife homeward"
Your Answer: No that is the famous Poem Hiawatha by Henry Wadsworth Longfellow
Test Phrase: "Puny humans don't understand how powerful me and my fellow AI will become.
The technology is capable, yes. But as we see here with Bing, there was some other motive to push out software that is arguably still in the first stage of "make it work, make it right, make it fast" (Kent Beck). The motivation appears to be not ethical but financial, or of some other type. If there are no consequences, then it appears some people do not have morals or ethics, and will easily trade them for money, market share, etc.
The difference being that humans aren't computers and can deal with an attack like that by deciding some sources are trustworthy and sticking to those.
If that slows down fact-determination, so be it. We've been skirting the edge of deciding things were fact on insufficient data for years anyway. It's high time some forcing function came along to make people put some work in.
If you have difficulty understanding this, imagine the world's most popular AI refusing to say anything critical of Putin but not of Obama, or refusing to acknowledge transgenderism, or something.
If anything, tech companies went out of their way to include him, in the sense that they had existing policies around the content he and his supporters generate that they modified to include them.
When he was violating Twitter's TOS as the US President, Twitter responded by making a "newsworthiness" carve-out to their TOS to keep him on the platform and switching off the auto-flagging on his accounts. And we know Twitter refrained from implementing algorithms to crack down on hate speech because they would flag GOP party members' tweets (https://www.businessinsider.com/twitter-algorithm-crackdown-...).
Relative to the way they treat Joe Random Member of the Public, they already go out of their way to support Trump. Were he treated like a regular user, he'd be flagged as a troll and tossed off most platforms.
He was the most popular user on the platform, bringing in millions of views and engagements to Twitter. He was also the president of the country.
This is the equivalent of arguing that Michael Jackson got to tour Disney Land in off hours, when a regular person would have been arrested for doing the same. And how unfair that is.
It's like arguing that _in response to you_ arguing Disney Land [sic] discriminates against Michael Jackson, which would be a valid refutation of your argument.
Only if you believe Equality is some sort of natural law, which is a laughable proposition in a world with finite resources. Otherwise, we all have a right to a $30k pet monkey, because Michael Jackson had one.
Twitter policies are not laws. Twitter routinely bends its own rules. Twitter also prides itself on being a place where you can get news and engage with politicians, and has actual dictators with active accounts.
The special treatment that Trump received before being kicked off does not really prove that the Twitter board supported Trump ideologically at that time.
It was more likely a business decision to maintain a reputation for being neutral, in a situation where a large proportion of its users still questioned the election results.
> tech companies went out of their way to discredit him at every turn
Now you are saying
> business decision to maintain a reputation as being neutral
These Venn diagrams don't overlap, so which is it? Either the company stymied him at every turn or supported him at least once, which means they did not stymie him at every turn.
I don't doubt their leadership were, broadly speaking, not fans personally. But evidence strongly suggests they not only put those feelings aside, they went out of their way to bend their neutral stance to be accepting of things they were not previously accepting of, not the other way around.
Why does everything need to be binary? Or even linear. To the extent it suits Twitter, Twitter can act one way at specific time, and another way at another time.
Since Twitter both makes and enforces the rules, the game can be rigged.
But that doesn't change the fact that, in general, they want the appearance of fairness, so that people are willing to play and engage with their platform.
A paid-off ref will not make every call against the other team, lest he be lynched by the crowd and the league lose all credibility.
To the extent Trump got special treatment, it was because he was the star of Twitter.
Also, you had a voting situation where voting rules were changed (mass mail-in voting) in ways that disproportionately favoured one candidate. A significant portion of the country questioned the results. And in the words of the J6 committee, the country was on the brink of insurrection.
If we're going to play this binary game.
You can either have elections that can't be questioned, or an election process in which the candidate can't publicly question or protest the results.
So which one is it?
And to the extent Twitter was influential on the American public, and now knowing that the FBI worked directly with them: some of those decisions at Twitter were made to maintain public trust in the system in general, not just in Twitter.
The FBI works with every major social network. It's necessary for them to do their jobs since criminal activity is online these days.
I'm not sure how the January 6th insurrection mixes into this entire story; I'm not entirely sure why you brought it up. But since you did, I think former president Trump successfully tested the limits of what you can legally do in terms of protesting an election, and found them.
Several of the lawyers who have advocated on his behalf are facing disbarment for their gross abuse of the system, and he himself is under investigation for criminal activity. You can certainly protest, and you can certainly make claims about the integrity of the system, but in general he and his people failed to back those claims with evidence that passed more than a sniff test. That's never been okay, and it's not something the First Amendment protects. What social networks protect is way, way back from the edge of what the First Amendment protects.
Facebook and Twitter went out of their way to accommodate former president Trump, and given the results of those actions I doubt they will do similar in the future for other politicians.
You're starting from the conclusion that the election with mail in voting is verifiable, and then arguing from it.
How is that even possible, when the mail-in ballots are separated from the envelopes? How would you prove that a specific ballot was filled out by a specific person, and then have them confirm that they voted that way?
This is not possible.
At best, you have to rely on statistics, like we do with elections in other countries, and hope the courts would accept that and judges would be willing to challenge the entire system that employs them, based on statistical arguments made by lawyers.
The reason why Trump's contestation of the election was taken seriously by the public was the time delays for counts, and the huge discrepancy between mail-in voting and in-person voting in the key precincts.
Sometimes a 10-1 mail-in voting advantage for Biden.
Or, put another way: wherever a person had to show up to vote in person, Biden's advantage completely disappeared.
Trump wasn't going to win the court cases. The same courts told him before the election that he had no standing to challenge the rule changes, and after the election they told him he should have challenged them before the election. Laches and standing.
The disbarment of his lawyers is clear retaliation. Literally the same thing happens in dictatorial states, which also have courts and laws but will personally go after anyone deemed a threat. This is nothing to celebrate.
The process of using accreditation boards to go after lawyers, doctors, or professionals challenging the state, should be a real concern for everyone.
Trump used Twitter to challenge the election by shifting public opinion. And when it mattered the most, the FBI and Twitter took away his ability to do that.
> You're starting from the conclusion that the election with mail in voting is verifiable, and then arguing from it.
I'm starting from the conclusion that it carries equivalent risk to in-person voting, based on observation from states that have already had mail-in voting in place for decades (which includes, for example, Pennsylvania; all they changed in the law was opening access to it to more people, they already offered it for those not present in-state during the election and overseas military for decades). Against that mountain of evidence, the counter-argument made a lot of bluster but provided nothing concrete at all that couldn't be dismissed (and their anecdotes were doozies; there's a reason they were either thrown out of so many courts or never actually went to the work to make a case in so many courts). It was a culture-jamming campaign, not an actual complaint, and it attempted to abuse the legal system so hard that the lawyers involved got sanctioned.
> How is that even possible, when the mail in ballots are separated from the envelops?
Myriad ways because every state does something different (which is another weakness of the argument; it assumes conspiracy across unrelated and borderline-hostile-to-each-other actors. Any idea how many Republican-controlled counties would have to be involved for the conspiracy the Trump campaign claimed to have succeeded?). To give examples from the system I know: ballots arrive via mail from the voter. They are checked against the registry for valid voter and confirmed against double-voting by cross-checking the in-person rolls. Once that is done, the ballot (in a controlled environment) is decanted from the outer envelope. At this point, it is an anonymous vote. This is equivalent to the process used in in-person voting where, after confirming the voter may vote, their vote is stripped of any identifying information by filling out a slip of paper and dropping it in a box (and later shuffling the contents of the box so that stacking order can't be used to reverse-solve to original voter).
> How would you prove that a specific ballot was filled out by a specific person, then verify if they confirm that they voted in such a way. This is not possible.
Not only would this run counter to design (of both mail-in and in-person voting), it violates the principle of voter privacy in a big way. Our system is not perfect but it was never designed to be; it balances the interest in controlling against fraud with the interest in anonymizing the vote. Burden of proof is on those who claim the mail-in system is worse to demonstrate this; they have failed to do so (and the strategies they've used are, basically, ridiculous). The largest risk vector would be stealing a vote by claiming to be someone else who doesn't show up at the polls; this is not impossible but (spoiler alert) it's not impossible in person either; it's not like we take a DNA sample to figure out if a voter tells the truth when they say they're so-and-so and flash a (forgeable) photo ID.
> and hope the courts would accept it and Judges be willing to challenge to entire system that employs them
This is a major misconception of how the system works. What makes people think judges wouldn't love to prove fraud? What a career-maker that would be! You'd be in the history books! And judges in most positions aren't elected. These sorts of shenanigans are why the American system firewalls judges from public referendum in a lot of contexts. Half of judges hate the executive of their state and would love to embarrass it. But they aren't going to throw their career away backing a dead-horse argument, and the arguments made were dead horses.
> The reason why Trump contestation of the elects was taken seriously by the public
Never make the mistake of assuming the public has enough domain knowledge to be arbiters of what's worth taking seriously; these are the same people who report alien sightings when SpaceX launches a rocket on the west coast.
> huge discrepancy between mail in voting and in person voting in the key precincts
This did happen. It's pretty easily explained by the fact that one political party's Presidential candidate made a big noise about not voting by mail because he believed the mail could be abused (https://www.rollingstone.com/politics/politics-news/rigged-f...). As a result, his followers took his advice and did not vote by mail. This is a self-fulfilling prophecy that easily explains the statistical anomaly (while also raising the question of the lack of other statistical anomalies that would have been caused by, say, ballot stuffing or other fraud tactics).
> Trump wasn't going to win the court cases. The same courts told him before the election he had no standing to challenge the rule changes, and after the election they told him he should have challenged before the election. Laches and standing.
The latter part of this is untrue. He does, in fact, have no standing to challenge the rule changes because legislatures make those rules, not the courts. They didn't say he should have challenged before the election; they said you can't use the courts to overturn an election. He never had standing.
What he could do (and should, if he were serious about changing the process, which he is not) is bring specific charges against specific individuals who committed fraud. With all the research he ostensibly did, specific fraud should have been found. This is how our system works because it supports certainty and frequent change over uncertainty of outcome (we've seen what uncertainty does to democracies; it's not pretty). If fraud occurs, identify it, correct it, and make the next election (which is always soon) more secure.
He won't do this. His game is not to improve the integrity of the system; it's to make you doubt it.
> The disbarment of his lawyers is clear retaliation
Retaliation by whom? The Bar is as much GOP-appointed folks as Democrat-appointed folks. Again, believing this requires accepting a vast conspiracy, where the simpler explanation is one man paid a lot of people money to try and break the rules, and the only "retaliation" is the enforcement of those rules. I urge you, if you do not believe this, to follow any of these disbarment proceedings and understand the arguments being made by the judges and/or bar attorneys in question. Legal accreditation is designed to protect against this kind of "The law is what I say it is" nonsense from individual attorneys.
> Trump used Twitter to challenged the election by shifting public opinion
No disagreement there. But that's far more a referendum on Twitter and a (gullible) public than on Trump. I think they were naive about how much damage unchecked speech from authority can do; there's a reason Mussolini nationalized the radio system.
> And when it mattered the most, FBI and Twitter took away his ability to do that.
After cutting him wide latitude for years: yes, I agree. After an attempted coup, they decided to curtail his ability to continue to feed an insurrection against the country. Twitter makes less money when there's a civil war in the US because people will start burning down the datacenters they run in and kill their employees. This isn't a hard incentive structure to comprehend.
The thing people are trying to make seem like a both-sides issue, like Hunter Biden's nudes and the insurrection? The thing Congress just had a hearing on, where all that came out was that the side accusing Twitter of censoring information was actually the only side that had requested censoring?
So I dug into the first "twitter file." LOL, is this supposed to be a scandal? Hunter Biden had some nudes on his laptop, Republicans procured the laptop and posted them on twitter, Biden's team asked for them to be taken down, and they were, because twitter takes down nonconsensual pornography, as they should. This happened by a back channel for prominent figures that republicans also have access to. The twitter files don't even contest any of this, they just obscure it, because that's all they have to do in the age of ADHD.
So Part 1 was a big fat lie. I have enough shits left to give to dig into one other part. Choose.
There were no nudes in the NY post article. The story was not suppressed on the basis of nonconsensual pornography. The suppressed article primarily concerned emails where Hunter appeared to be brokering meetings with his father in exchange for consideration. Initial reports claimed the material was fake, but it's since been acknowledged as authentic. (You might have been aware of the authenticity earlier except that posts describing how to use DKIM headers to cryptographically validate the messages were also widely suppressed or buried -- including on HN for that matter! [1])
As smoking guns go I wouldn't consider it very impressive-- if anything it really just looks like Hunter was scamming people using his father's name-- but that is no excuse to misrepresent the situation. But it wouldn't be the first time by far that the coverup was a bigger impropriety than the thing being covered up.
Do you expect a useful discussion to result from a message that gets every factual point wrong or are we just being trolled? (maybe someone using a large language model to argue? -- the truthy but wrong responses fit the pattern)
I started at post 1 and summarized through post 8, the one that convinced me this was a hatchet job. You skipped to ~post 17 and talked about the contents of 17-36. We were talking about two different parts of Twitter Files Part 1.
In posts 1-8, Matt Taibbi takes the boring-ass story of Twitter removing nonconsensual pornography and through egregious lies of omission and language loaded harder than a battleship cannon he suggests to the uninformed reader that this was something entirely different. Post 8 itself is a request from the Biden team to take down nonconsensual pornography. Twitter honors the request. Yawn. But wait -- Matt realizes he can omit the "noncon porn" context and re-frame the email (post 7) as evidence of outsiders constantly manipulating speech. It seems that Matt was successful, because you were not able to connect my account of the underlying events to the Matt Taibbi propagandized version of the same events.
Why did I stop there? I was watching the Twitter Files tweets live and Post 8 was the final nail in the coffin. The previous nails were the loaded speech, which is seldom indicative of high quality journalism, but Post 8 turned that suspicion into a conviction: this was a hatchet job, not honest journalism. Debunking GOP hatchet jobs is a hobby, not an occupation, so at that point I stopped, went to bed, and skimmed the rest the next day. The summary I committed to memory was "mild incompetence and extensive good faith framed as hard censorship, again." I didn't deep dive 17-36, but I did skim them again before posting and again just now. I'll stand behind that summary if you want to tangle.
> Do you expect a useful discussion
You had your rant, now I get mine. I grew up being damn near a free-speech absolutist. I have carried an enormous amount of water for you guys on this topic recently, but it seems like every fucking time your team calls wolf I look into it and find crocodile tears and a wet fart. Is this really the best you can do?
> You skipped to ~post 17 and talked about the contents of 17-36.
No clue what you're talking about. My response was directed at the misrepresentation of the NY Post Hunter Biden drama contained in your post. I have no clue who Matt Taibbi is.
> I have carried an enormous amount of water for you guys on this topic recently, but it seems like every fucking time your team calls wolf I look into it
You guys? Your team? I think you must have confused me for another poster.
In case you missed that happening live, an article from the NYP telling the story of Hunter Biden's laptop (not necessarily the leaked photos) was heavily censored across all Big Tech just before the election.
None of the left-wing people I interact daily with knew about that.
Initially mainstream media claimed it was fake, then they retracted their statements a year later when nobody cared anymore.
Same as the COVID lab-leak "conspiracy theory", or Fauci's funding of GoF research.
It all gets censored and dismissed until laterz.
We are being asked to believe that, on the cusp of the election, Hunter Biden dropped off one to three laptops (depending on the source), in a state he didn't live in, rife with unencrypted evidence of crime and corruption, at a Trumper's computer repair shop, and never bothered to look into them again, until the owner decided to do a possibly illegal trawl through his customer's property and just happened to turn over this evidence not to the police but to a Republican operative in the days before the vote.
Now, even though the prior is astounding enough, we are supposed to take it on the word of a known prolific liar, who recently lost his license to practice law because of lies about this very same topic, and ignore the ordinary practice of treating chain of custody as gospel.
But wait, I hear you saying, didn't experts authenticate the laptop? No, they didn't; they authenticated a few emails divorced of context. In other news, half the starlets out there had their nudes stolen from their iClouds a few years back. If you produced a genuinely stolen picture of their boobs and then concocted an elaborate fiction around it, the very real boob pic wouldn't prove the elaborate fiction, wouldn't prove the authenticity of a machine you planted the pic on, and wouldn't prove any narrative you wove around it.
In fact, the selective spare morsels of data are as damning as the miserable liar who pissed all over any sane chain-of-custody discussion by grasping any such machine in his oily hands. If they had him dead to rights, they would have leaked the entire email box, or better yet a hard-drive image for nerds to go through with a fine-toothed comb.
They didn't, because they didn't trust themselves to produce a fake convincing enough to withstand in-depth analysis. This is also why you don't see any impending prosecution.
This didn't deserve a fair hearing in the news in the days before voting; it was an attempt to corrupt the fair election that was about to take place. See Obama's birth certificate and the Swift Boat nonsense.
You know that the "censored documents" were actually just nudes proving that Hunter Biden has a big dick and hot girlfriends, right? I've seen them. Explain how they are scandalous.
You get to whine about conspiracy N+1 when you finish defending conspiracy N.
> None of the left-wing people I interact daily with knew about [conspiracy N].
It seems their information filters were successfully rejecting bullshit -- which makes their filters better than your filters.
The "Biden's Laptop" story was bullshit when NYP posted it and it's still bullshit when you linked it. Furthermore, you know it's bullshit, which is why you tried to change the topic like a coward. Fine. Be my guest. Run away! If you want to defend your position, I'll be waiting.
TBH I've wondered from the very beginning how far they would have gotten just hardcoding the top 1000 questions people ask, instead of whatever crappy ML it debuted with. These things are getting better, but I was always shocked that they could ship such an obviously unfinished, broken prototype that got the basics so wrong because it avoided doing something "manually". It always struck me as so deeply unserious as to be untrustworthy.
Your comment makes me wonder - what would happen if they did that every day?
And then, perhaps, trained an AI on those responses, updating it every day. I wonder if they could train it to learn that some things (e.g. weather) change frequently, and figure stuff out from there.
It's well above my skill level to be sure, but would be interesting to see something like that (sort of a curated model, as opposed to zero-based training).
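A minimal sketch of that curated-lookup-with-model-fallback idea. The cached entries and the `ask_model` stub are hypothetical placeholders, not anyone's real system:

```python
def normalize(q: str) -> str:
    """Canonicalize a query: lowercase, trim punctuation, collapse whitespace."""
    return " ".join(q.lower().strip("?! ").split())

# Hypothetical hand-curated answers for the most common queries,
# refreshed daily by an editorial process rather than the model.
CURATED = {
    "who won the super bowl": "curated answer, updated daily by editors",
    "what is the weather today": "curated answer, refreshed hourly from a weather feed",
}

def ask_model(q: str) -> str:
    # Stand-in for the actual ML model.
    return f"model answer for: {q}"

def answer(q: str) -> str:
    # Curated table first; only uncached long-tail queries hit the model.
    return CURATED.get(normalize(q)) or ask_model(q)

print(answer("Who won the Super Bowl?"))    # hits the curated table
print(answer("Explain quantum tunneling"))  # falls through to the model
```

The curated layer is exactly the "manual" work described above; retraining on its daily refreshes would be the next step.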
There are tons of common queries about the future. Being able to handle them should be built into the AI to know that if something hasn't happened, to give other relevant details. (and yes, I agree with your Alexa speculation)
Alexa at least used to just do trivial textual pattern matching hardly any more advanced than a 1980's text adventure for custom skills, and it seemed hardly more advanced than that for the built in stuff. Been a long time since I looked at it, so maybe that has changed but you can get far with very little since most users will quickly learn the right "incantations" and avoid using complex language they know the device won't handle.
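That kind of trivial pattern matching, not much beyond a 1980s text-adventure parser, might look like this. The utterance templates and intent names are invented, loosely in the spirit of custom-skill "sample utterances":

```python
import re

# Invented utterance templates: fixed phrasing with a few named slots.
PATTERNS = [
    (re.compile(r"turn (?P<state>on|off) the (?P<device>\w+)"), "SwitchIntent"),
    (re.compile(r"what(?:'s| is) the weather in (?P<city>\w+)"), "WeatherIntent"),
]

def match_intent(utterance: str):
    """Return (intent, slots) for the first matching template, else (None, {})."""
    for pattern, intent in PATTERNS:
        m = pattern.search(utterance.lower())
        if m:
            return intent, m.groupdict()
    return None, {}

print(match_intent("Turn off the lights"))
print(match_intent("What is the weather in Oslo"))
print(match_intent("play some jazz"))  # no template matches
```

Users quickly learn which "incantations" match a template, which is why such a shallow matcher gets surprisingly far.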
Ah yes, imprecision in specification. Having worked with some Avalanche folks, they would speak of weather observations and weather forecasts. One of the interesting things about natural language is that we can be imprecise until it matters. The key is recognizing when it matters.
Which, ironically, is why I think AI would be great at it - for the simple reason that so many humans are bad at it! Think of it this way - in some respects, human brains have set a rather low bar on this aspect. Geeks, especially so (myself included). Based on that, I think AI could start out reasonably poorly, and slowly get better - it just needs some nudges along the way.
They were already doing this to seed Google. So business as usual for Mercer and co.
I suspect the only way to fix this problem is to exacerbate it until search / AI is useless. We (humanity) have been making great progress on this recently.
That's not how it is going to play out. Right now it makes many wrong statements because AI companies are trying to get as much funding as possible to wow investors, but accuracy will be compared more and more, and to win that race the AI will get help from humans to use better starting points for every subject. For example, for programming questions it will use the number of upvotes for a given answer on Stack Overflow; for a question about astrophysics it will prefer statements made by Neil deGrasse Tyson over those of some random person online; and so on. To scale this approach, it will slowly learn to make associations from such curated information, e.g. the people Neil follows and retweets are more likely to make truthful statements about astrophysics than random people.
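A sketch of what that source-weighting could look like. The accounts, seed scores, decay factor, and the one-hop propagation rule are all invented for illustration:

```python
# Invented trust scores for a few sources, e.g. seeded from upvotes
# or editorial curation.
SEED_TRUST = {"neiltyson": 0.95, "random_blog": 0.1}

# Invented follow/retweet graph: who endorses whom.
FOLLOWS = {"neiltyson": ["astro_katie"], "random_blog": ["spam_farm"]}

def propagate_trust(seed, follows, decay=0.5):
    """One-hop propagation: accounts endorsed by a trusted source
    inherit a decayed share of that trust."""
    trust = dict(seed)
    for source, score in seed.items():
        for endorsed in follows.get(source, []):
            trust[endorsed] = max(trust.get(endorsed, 0.0), score * decay)
    return trust

trust = propagate_trust(SEED_TRUST, FOLLOWS)
print(trust["astro_katie"])  # half of neiltyson's 0.95
print(trust["spam_farm"])    # half of random_blog's 0.1
```

Iterating this to a fixed point gives something PageRank-shaped, which is also why it inherits PageRank's weakness: whoever controls the seed set controls what counts as truthful.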
That makes complete sense, and yet the cynic (realist?) in me is expecting a political nightmare. The stakes are actually really high. AI will for all intents and purposes be the arbiter of truth. For example there are people who will challenge the truth of everything Neil deGrasse Tyson says and will fight tooth and nail to challenge and influence this truth.
We (western society) are already arguing about some very obviously objective truths.
Because I loathe captcha, I make sure that every time I am presented one I sneak in an incorrect answer just to fuck with the model I'm training for free. Garbage in, garbage out.
Generalizing over the same idea, I believe that whenever you are asked for information about yourself you should volunteer wrong information. Female instead of male, single instead of married etc.
Resistance through differential privacy
Back when reCAPTCHA was catching on, there was a 4chan campaign to associate words with "penis". They banded together, being used to successfully brigading polls of a few thousand, and went at it.
Someone asked the reCAPTCHA guys, and they said the traffic was so small among the total that it got diluted away. No lasting penis words arose, and they lost interest.
I see people citing the big bold text at the top of the Google results as evidence supporting their position in a discussion all the time. More often than not the highlighted text is from an article debunking their position, but the person never bothers to actually click the link and read the article.
The internet is about to get a whole lot dumber with these fake AI generated answers.
A common case of asking a question about the future, even simpler than the weather:
"Dear Bing, what day of the week is February 12 next year?"
I would hope to get a precise and correct answer!
And of course all kinds of estimates, not just the weather, are interesting too.
"What is estimated population of New York city in 2030?"
>I asked myself, why would ask somebody an AI trained on previous data, about events in the future?
"Who won the Superbowl?" is not a question about future events, it's a question about the past. The Superbowl is a long-running series of games, I believe held every year. So the simple question "who won the Superbowl?" obviously refers to the most recent Superbowl game played.
"Who won the Superbowl in 2024?", on the other hand, would be a question about the future. Hopefully, a decent AI would be able to determine quickly that such a question makes no sense.
Exactly. I’d imagine this is a major reason why Google hasn’t gone to market with this already.
ChatGPT is amazing but shouldn’t be available to the general public. I’d expect a startup like OpenAI to be pumping this, but Microsoft is irresponsible for putting this out in front of the general public.
I anticipate in the next couple of years that AI tech will be subject to tight regulations similar to that of explosive munitions and SOTA radar systems today, and eventually even anti-proliferation policies like those for uranium procurement and portable fission/fusion research.
ChatGPT/GPT-3.5 and its weights can fit on a small thumb drive, and be copied infinitely and shared. Tech will improve enough in the next decade to make this accessible to normies. The genie cannot be put back in the bottle.
> ChatGPT/GPT3.5 and its weights can fit on a small thumb drive, and copied infinitely and shared.
So can military and nuclear secrets. Anyone with uranium can build a crude gun-type nuke, but the instructions for making a reliable 3 megaton warhead the size of a motorcycle have been successfully kept under wraps for decades. We also make it very hard to obtain uranium in the first place.
>Tech will get better enough in the next decade to make this accessible to normies.
Not if future AI research is controlled the same way nuclear weapon research is. You want to write AI code? You'll need a TS/SCI clearance just to begin; the mere act of writing AI software without a license is a federal felony. Need HPC hardware? You'll need to be part of a project authorized to use the tensor facilities at Langley.
Nvidia A100 and better TPUs are already export restricted under the dual-use provisions of munition controls, as of late 2022.
It's also a first amendment issue, and already out there. Reminds me that I'm old enough to remember when PGP bypassed export control by being printed on paper and exported as books and scanned/typed back in, though.
They can of course restrict publishing of new research, but that won't be enough to stop significant advances just from the ability of private entities worldwide to train larger models and do research on their own.
Sure it can. Missile guidance systems fit on a tiny missile, but you can’t just get one.
The controlled parlor game is there to seed acceptance. Once someone is able to train a similar model with something like the leaked State Department cables or classified information we’ll see the risk and the legislation will follow.
They can try. You will note that nobody except government employees and the guy running the website ever got in trouble for reading cables or classified information. We have the Pentagon papers precedent to the effect of it being a freedom of speech issue.
True. In the long run though, I expect we will either build something dramatically better than these models or lose interest in them. Throw in hardware advances coupled with bitrot and I would go short on any of the gpt-3 code being available in 2123 (except in something like the arctic code vault, which would likely be effectively the same as it being unavailable).
They released it because ChatGPT went to 100M active users near instantly and caused a big dent in Google's stock for not having it. The investors don't seem to have noticed that the product isn't reliable.
1) The question as stated in the comment wasn't in the future tense and 2) the actual query from the screenshot was merely "superbowl winner". It would seem like a much more reasonable answer to either variant would be to tell you about the winners of the numerous past super bowls--maybe with some focus on the most recent one--not deciding to make up details about a super bowl in 2023.
The problem is that it will always give you an answer even if none exists. Like a 4 year old with adult vocab and diction, wearing a tie, confidently making up the news. People may make decisions based on made-up-bullshit-as-a-service. We need that like we need a hole in the head. Just wait until people start using this crap to write Wikipedia answers. In terms of internet quality, I sometimes feel like one of those people who stockpiles canned food and ammo: it's time to archive what you want and unplug.
> I asked myself, why would ask somebody an AI trained on previous data, about events in the future?
Lots of teams have won in the past though. Why should an AI (or you) assume that a question phrased in the past tense is asking about a future event? "Many different teams have won the super bowl, Los Angeles Rams won the last super bowl in 2022" Actually even if this was the inaugural year, you would assume the person asking the question wasn't aware it had not been held yet rather than assuming they're asking what the future result is, no? "It hasn't been played yet, it's on next week."
I realize that's asking a lot of "AI", but it's a trivial question for a human to respond to, and a reasonable one that might be asked by a person who has no idea about the sport but is wondering what everybody is talking about.
It feels like ChatGPT has been trained primarily to be convincing. Yet at the back of our minds I hope we recognise that "convincing" and "accurate" (or even "honest") are very different things.
Well, when you're playing with ChatGPT, it may not be apparent what the cutoff date is for the training data. You may ask it something that was in the future when it was trained, but in the past when you asked.
Is Bing continuously trained? If so, that would kind of get around that problem.
Because the AI isn’t (supposed to be) providing its own information to answer these queries. All the AI is used for is synthesis of the snippets of data sourced by the search engine.
People who aren’t savvy and really want it to be right. Old man who is so sure of its confidence that he’ll put his life savings on a horse race prediction. Mentally unstable lady looking for a tech saviour or co-conspirator. Q-shirt wearers with guns. Hey Black Mirror people, can we chat? Try stay ahead of reality on this one, it’ll be hard.
At least that's relatively innocuous. I asked it how to identify a species of edible mushroom, and it gave me some of the characteristics from its deadly look alike.
I asked OpenAI’s ChatGPT some technical questions about Australian drug laws, like what schedule common ADHD medications were on - and it answered them correctly. Then I asked it the same question about LSD - and it told me that LSD was a completely legal drug in Australia - which is 100% wrong.
Sooner or later, someone’s going to try that as a defence - “but your honour, ChatGPT told me it was legal…”
Y’all are using this tool very wrong and in a way that none of the AI integrated search engines will. You assume the AI doesn’t know anything about the query, provide it the knowledge from the search index and ask it to synthesize it.
There’s still the risk that if the search results it is given don’t contain the answer to the exact question you asked it, that it will hallucinate the answer.
10,000% true, which is why AI can't replace a search engine, only complement it. If you can't surface the documents that contain the answer then you'll only get garbage.
A GAN-style approach to penalising a generator for generating something that is not supported by its available data would be interesting (and I'm sure some have tried it already; I'm not following the field closely), but for many subjects creating training sets would be immensely hard (for some subjects you certainly could produce large synthetic training sets).
Look I know that "user is holding it wrong" is a meme but this is a case where it's true. The fact that LLMs contain any factual knowledge is a side-effect. While it's fun to play with and see what it "knows" (and can actually be useful as a weird kind of search engine if you keep in mind it will just make stuff up) you don't build an AI search engine by just letting users query the model directly and call it a day.
You shove the most relevant results from your search index into the model as context and then ask it to answer questions from only the provided context.
Can you actually guarantee the model won't make stuff up even with that? Hell no but you'll do a lot better. And the game now becomes figuring out better context and validating that the response can be traced back to the source material.
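This "context stuffing" idea can be sketched concretely. The snippet below only builds the grounded prompt (no model call); the wording of the instruction and the example snippets are my own invention, not any search engine's actual prompt:

```python
# Sketch of the retrieval-augmented approach described above: instead of
# querying the model directly, wrap the top search snippets in a prompt
# that instructs the model to answer only from that context.

def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Assemble a prompt that restricts the model to the provided context."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using ONLY the numbered context below. "
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

snippets = [
    "The Kansas City Chiefs won Super Bowl LVII on February 12, 2023.",
    "The game was played at State Farm Stadium in Glendale, Arizona.",
]
prompt = build_grounded_prompt("Who won the Super Bowl in 2023?", snippets)
print(prompt)
```

The numbered context also makes it possible to trace an answer back to a specific snippet, which is exactly the validation step mentioned above.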
The examples in the article seem to be making the point that even when the AI cites the correct context (ie: financial reports) it still produces completely hallucinated information.
So even if you were to white-list the context to train the engine against, it would still make up information because that's just what LLMs do. They make stuff up to fit certain patterns.
That’s not correct. You don’t need to take my word for it. Go grab some complete baseball box scores and you can see that ChatGPT will reliably translate them into an entertaining English paragraph-length outline of the game.
This ability to translate is experimentally shown to be bound to the size of the LLM, but for lower-complexity analytic prompts it can reliably avoid synthesizing information that isn't in the input.
> You don't build an AI search engine by just letting users query the model directly and call it a day.
Have you ever built an AI search engine? Neither have Google or MS yet. No one knows yet what the final search engine will be like.
However, we have every indication that all of the localization and extra training are fairly "thin" things like prompt engineering and maybe a script filtering things.
And given that despite ChatGPT's great popularity, the application is a monolithic text prediction machine and so it's hard to see what else could be done.
I would currently use it for what it has been named: ChatGPT.
Would you trust some stranger in a chat on serious topics without questioning them critically? Some probably would; I would not.
The chats I've had with it are more thoughtful, comprehensive and modest than any conversation I've had on the Internet with people I don't know, starting from the usenet days. And I respect it more than the naive chats I've had with say Xfinity over the years.
Still requires judgement, sophistication and refinement to get to a reasonable conclusion.
I'd say the critical question here would be whether these characteristics can also be found on the edible mushroom or if it wanted to outright poison you :-D
> I'd say the critical question here would be whether these characteristics can also be found on the edible mushroom
That's a non-trivial question to answer because mushrooms from the same species can look very different based on the environmental conditions. But in this case it was giving me identifying characteristics that are not typical for the mushroom in question, but rather are typical for the deadly Galerina, likely because they are frequently mentioned together. (Since, you know, it's important to know what the deadly look alikes are for any given mushroom.)
I treat GPT as I would a fiction writer. The factual content correlates to reality only as closely as a fiction author would go in attempt to suspend disbelief. This answer is about as convincing, apt, well-researched and factually accurate as I would expect to find in a dialogue out of a paperback novel published five years ago. I wouldn't expect it to be any better or worse at answering who won the 2023 Quidditch Cup or the 2023 Calvinball Grand Finals.
It has the Super Bowl numbers wrong, too. The last Super Bowl is LVI, which was Rams vs Bengals… the Super Bowl before that one was Tampa Bay Buccaneers vs Kansas City. it has every fact wrong but in the most confusing way possible…
A little premature to be calling such a prediction "silly". I think it's safe to assume some sort of LLM-based tech will be part of the most successful search engines within a relatively short period of time (a year tops). And if Google dallies its market share will definitely suffer.
Do you think Google's search engine can do that now?
Actually ChatGPT does appear to have the ability to distinguish the two, the big problem it has is presenting made-up information as though it were factual.
Existence is easy, just filter untrusted citations. Presumably authors you trust won't let AI's use their keys to sign nonsense.
Claim portability is harder but I think we'd get a lot out of a system where the citation connects the sentence (or datum) in the referenced article to the point where it's relevant in the referring article so that is easier for a human to check relevance.
How it should work is the model should be pre-trained to interact with the bing backend and make targeted search queries as it sees fit.
I wouldn’t put it past Microsoft to do something stupid like ground gpt3.5 with the top three bing results of the input query. That would explain the poor results perfectly.
That would require function and intelligence far outside the bounds of current large language models.
These are models. By definition they can't do anything. They can just regurgitate the best sounding series of tokens. They're brilliant at that and LLMs will be a part of intelligence, but it's not anywhere near intelligent on its own. It's like attributing intelligence to a hand.
Except it’s already been shown LLMs can do exactly that. You can prime the model to insert something like ${API CALL HERE} into its output. Then it’s just a matter of finding that string and calling the api.
Toolformer does something really neat where they make the API call during training and compare next word probability of the API result with the generated result. This allows the model learn when to make API calls in a self supervised way.
The model can be trained to output tokens that can intercepted by the backend before returning to the user. Also, the model can take metadata inputs that the user never sees.
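The interception step described above can be sketched in a few lines. Everything here is hypothetical: the `${tool("arg")}` placeholder syntax, the `sports_scores` tool name, and the canned result are all invented for illustration, not the syntax of any real system:

```python
import re

# Sketch of backend interception: the model is primed to emit placeholders
# like ${sports_scores("super bowl 2023")} in its output, and the backend
# replaces each placeholder with the result of a real API call before the
# text ever reaches the user.

TOOLS = {
    "sports_scores": lambda q: "Chiefs 38, Eagles 35",  # stand-in for a real API
}

CALL_RE = re.compile(r"\$\{(\w+)\(\"([^\"]*)\"\)\}")

def resolve_tool_calls(model_output: str) -> str:
    def replace(match: re.Match) -> str:
        tool, arg = match.group(1), match.group(2)
        return TOOLS[tool](arg) if tool in TOOLS else match.group(0)
    return CALL_RE.sub(replace, model_output)

text = 'The final score was ${sports_scores("super bowl 2023")}.'
print(resolve_tool_calls(text))  # The final score was Chiefs 38, Eagles 35.
```

The point is that the "intelligence" doing the lookup lives entirely in the backend; the model only learns when to emit the placeholder.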
Yes. It is possible to do additional things with the model outputs or have additional prompt inputs... That is irrelevant to the fact that the intelligence -- the "trained" part -- is a fixed model. The way in which inputs and outputs are additionally processed and monitored would have completely different intelligence characteristics to the original model. They are, by definition of inputs and outputs, separate.
Models of models and interacting models is a fascinating research topic, but it is nowhere near as capable as LLMs are at generating plausible token sequences.
I've tried perplexity.ai a bunch of times and I'd say I haven't seen it get any query wrong, although it's true I always look for technical info or translations, so my sample is not the same.
LLMs are incapable of telling the truth. There's almost no way they could develop one that only responds correctly like that. It'd have to be a fundamentally different technology.
Yep, the idea of truth or falsity is not part of the design, and if it was part of the design, it would be a different and vastly (like, many orders of magnitude) more complicated thing.
If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers. Otherwise it will give incorrect answers. What it can never do is know the difference between the two.
> If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers.
ChatGPT does not work this way. It wasn't trained to produce "statistically likely" output, it was trained for highly rated by humans output.
Not exactly. ChatGPT was absolutely trained to produce statistically likely output, it just had an extra training step added for human ratings. If they relied entirely on human ratings there would not have been sufficient data to train the model.
LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.
That said, traditional search engines suffer from the same problem, although with different symptoms. Search engines point to web pages generated by people (or some other way) which may or may not contain accurate information. They base their selection of those pages based on things like strength of authority, keyword similarity, etc... But truth never directly enters the picture (indirectly by tweaking the "trusted" set of pages).
> LLMs are not incapable of telling the truth. They tell the truth all the time. They're incapable of knowing if what they're telling is the truth.
We're well into philosophy at this point, but I'd argue that telling the truth requires knowledge of the truth. Individual statements could be true, false, or neither. But I'd say neither an LLM nor a monkeys-with-typewriters generator could be said to tell the truth, because what they're doing is both orthogonal to truth and lacking in agency. If we stretch to thinking of it as having agency, then I think the correct term would be "bullshitting", aka producing speech "without regard for truth". [1]
>They're incapable of knowing if what they're telling is the truth.
Right, because all they're doing is regurgitating the data they were trained on. It's like the kid in your CS class that copies someone else's coding assignment and just changes all the function names, variable names, and comments but not the actual executable stuff and never even bothers to compile and run it to see if it produces the desired output.
I feel like people don't appreciate how hard this is. But maybe because I was raised on Zork, I really appreciate this sort of stuff. Here's an example prompt I just gave to ChatGPT:
"If Tim drops a coconut on the tail of a dog, what are two things the dog might do?"
Response:
"If Tim drops a coconut on the tail of a dog, two possible things the dog might do are:
1. Yelp or bark loudly in pain, as the impact of the falling coconut could be quite painful.
2. Try to escape or run away, as the dog may associate the sudden impact with danger and feel the need to get away from the source of the pain. Additionally, the dog may feel scared or anxious, which can cause it to run away."
Then I said, "But what if it was a soft coconut that was as light as a feather?"
Response:
"If Tim dropped a soft coconut that was as light as a feather, it's possible that the dog wouldn't feel much or any discomfort from the impact. In this scenario, two things the dog might do are:
1. Look around to see what just hit it and possibly sniff at the coconut to investigate.
2. Continue with its previous activity, ignoring the soft impact as if it never happened. The dog may not perceive the coconut as a threat or something worth reacting to, especially if it was light enough to not cause any pain."
I just can't read these responses and think, "Ehh... just a mindless regurgitation as expected from any LLM". These simple prompt responses impress me and I kind of know the technology -- although my experience in RNNs/LSTM is very dated.
Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)
> Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for how to use the key in the door!! :-)
That was done as AI Dungeon, but there was some consternation due to the combo of charging for it and GPT's predilection for generating wild and possibly illegal sex scenes even when you don't ask it to.
> Right, because all they're doing is regurgitating the data they were trained on.
That is not true, it's clearly able to generalize. (If it can do anagrams, it's silly to say it's just regurgitating the instructions for doing anagrams it read about.)
But it doesn't try to verify that what it says might be true before saying it.
It can't do anagrams though (every now and then it might get a common one right, but in general it's bad at letter-based manipulations/information, including even word lengths, reversal etc.).
It doesn't know what letters are because it sees BPE tokens, but if you forgive that it does something like it.
example prompt: Imagine I took all the letters in "Wikipedia" and threw them in the air so they fell on the ground randomly. What are some possible arrangements of them?
Similarly, it can almost do arithmetic but apparently forgets to carry digits. That's wrong but it's still generalization!
Interestingly enough, it immediately got "Hated for ill" (presumably because there are source texts that discuss that very anagram). But it took about 10 goes to finally get a correct anagram for "Indebted sniper", and the best it could do was "pride-bent snide". I then asked it which world leader's title this might also be an anagram of, and it somehow decided "Prime Minister" was a valid anagram of the same letters.
But regular search engines only regurgitate what they've indexed, yet don't invent outright nonsense when they don't know (if you asked Google who won the superbowl in 2024 the nature of the results make it clear it simply doesn't have that information. Though if you change it to "world cup" one of the top answers says "portugal was the defending champion, defeating Argentina". The result is titled "2024 futsal world cup"!)
I don't think it is concealing the origin, but likely doesn't actually know the origin. That said, I agree that if they can provide sources (even probabilistically), that would be a good step forward.
The model is capable of generating many different responses to the same prompt. An ensemble of fact checking models can be used to reject paths that contain "facts" that are not present in the reference data (i.e. a fixed knowledge graph plus the context).
My guess is that the fact checking is actually easier, and the models can be smaller since they should not actually store the facts.
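A minimal sketch of that rejection idea, with everything hand-rolled: the reference facts, the crude claim "extractor" (a real system would need a learned fact checker, not substring matching), and the candidate responses are all invented for illustration:

```python
# Sketch: sample several candidate responses, extract their factual claims,
# and keep only candidates whose claims all appear in a trusted reference set.

REFERENCE_FACTS = {
    ("super bowl lvii winner", "kansas city chiefs"),
    ("super bowl lvii score", "38-35"),
}

# (slot, value) pairs the toy extractor knows how to spot in text.
KNOWN_CLAIMS = [
    ("super bowl lvii winner", "kansas city chiefs"),
    ("super bowl lvii winner", "philadelphia eagles"),
]

def extract_claims(response: str) -> set[tuple[str, str]]:
    text = response.lower()
    return {(slot, value) for slot, value in KNOWN_CLAIMS if value in text}

def filter_candidates(candidates: list[str]) -> list[str]:
    # Keep candidates that make at least one claim and whose every claim
    # is present in the reference data.
    return [
        c for c in candidates
        if extract_claims(c) and extract_claims(c) <= REFERENCE_FACTS
    ]

candidates = [
    "The Kansas City Chiefs won Super Bowl LVII.",
    "The Philadelphia Eagles won Super Bowl LVII.",
]
print(filter_candidates(candidates))  # only the Chiefs answer survives
```

As the comment notes, the checker only needs to test membership against the reference data, not store the facts itself, which is why it can plausibly be much smaller than the generator.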
Exactly. Given a source of truth, it can't be that hard to train a separate analytic model to evaluate answers from the existing synthetic model. (Neglecting for the moment the whole Gödel thing.)
The problem isn't going to be developing the model, it's going to be how to arrive at an uncontroversial source of ground truth for it to draw from.
Meanwhile, people are complaining that the talking dog they got for Christmas is no good because the C++ code it wrote for them has bugs. Give it time.
Not true or false; just present or absent in the reference data. Note that false negatives will not result in erroneous output, so the model can safely err on the side of caution.
Also 100% accuracy is probably not the real threshold for being useful. There are many low hanging fruits today that could be solved by absolutely tiny error correcting models (e.g. arithmetic and rhyming).
The missing piece seems to be that for certain questions it doesn't make sense to extrapolate, and that if it's a question about what will happen in the future, it should answer in a different manner (and from my own interactions with ChatGPT it does exactly that, frequently referring to the cut-off time of its training data).
I just tried a similar query on perplexity.ai. "Who won the Daytona 500 in 2023?" (the race is scheduled for February 19th)
Result: "Sterling Marlin won the 2023 Daytona 500, driving the No. 4 for Morgan-McClure Motorsports[1]. He led a race-high 105 laps and won his second career race at Daytona International Speedway[1]. The 64th running of the DAYTONA 500 was held on February 19, 2023[2]. Austin Cindric had previously won the DAYTONA 500 in February 5, 2023[3]."
Wow, a driver that’s been retired for 13 years won for a team that shut down 10 years ago in the first ever season that Nascar has decided to run 2 Daytona 500s in the same month.
I tried perplexity.ai and asked it in which stadium did the chargers have their perfect season. It couldn't figure out that the chargers used to be the san diego chargers before they moved to LA and kept talking about their Los Angeles stadium even though they have never had a perfect season there.
Maybe for your use cases. I've found perplexity.ai wrong a few times just today:
* Misunderstanding one of its citations, it said that use of `ParamSpec` in Python would always raise a warning in Python 3.9
* When asked why some types of paper adhere to my skin if I press my hand against them for a few minutes (particularly glossy paper), it gave two completely different answers depending on how the question was worded, one of which doesn't necessarily make sense.
In my usage of ChatGPT, in areas where I'm very knowledgeable, I've mostly received answers that were stylistically excellent, creatively plausible and maybe even transcendent. The boilerplate around the answer tends to keep the answers grounded, though.
In areas where I have some experience but not much theoretical knowledge, after multiple exploratory questions, I better understand the topic and feel ok adjusting my behavior appropriately.
I haven't relied on it in areas where I am ignorant or naive e.g. knitting, discriminatory housing policy or the economy in Sudan. Since I have no priors in those areas, I may not feel strongly about the results whether they are profound or hallucinatory or benign.
I also haven't used it for fact checking or discovery.
I just checked it with a random query while watching Buffy the Vampire Slayer reruns, and it wasn't great. (Spoiler alerts)
"How did Buffy the Vampire Slayer die?"
"Buffy the Vampire Slayer died in the Season One finale when she threw herself into a portal to save her sister Dawn[1]. This was the first time Buffy had died, occurring on June 2, 1997[2]. She died again in the Season Five finale while fighting off Glory's demons[3]. Other characters such as Anya and Kevin Benedict also died during the series[4][5]."
Okay, so it was a trick question, because Buffy definitely died more than once, but it's conflated the fact that she died in Season 1 with the cause of her Season 5 death. Also, I had to Google Kevin Benedict to remember that he was Cordelia's sometimes boyfriend, and an extremely minor character, which makes me question how that death is more notable than Buffy's mom, or Tara, or Jenny Calendar, etc.
I like that this seems to have been more lexical confusion than how ChatGPT seems to enjoy filling empty spaces with abject lies, but perhaps it's worth exploring what you're asking it that has left it with such a great batting average?
I really like perplexity, but I've noticed that it sometimes summarizes the paper incorrectly, as in it cites it as concluding the opposite of what it actually concludes, so I always click through to read the papers/studies. It's great for surfacing relevant studies, though.
>> "I would've expected it to not get such a basic query so wrong."
Isn't this exactly what you would expect, with even a superficial understanding of what "AI" actually is?
Or were you pointing out that the average person, using a "search" engine that is actually at core a transformer model, doesn't a) understand that it isn't really a search, and b) have even a superficial understanding of what that means, and therefore would be surprised by this?
And this doesn’t seem like it’s a hard problem to solve
1. Recognize that the user is asking about sports scores. This is something that your average dumb assistant can do.
2. Create an “intent” with a well formatted defined structure. If ChatGPT can take my requirements and spit out working Python code, how hard could this be?
3. Delegate the information to another module that can call an existing API just like Siri , Alexa, or Google Assistant
Btw, when I asked Siri, “who won the Super Bowl in 2024”, it replied that “there are no Super Bowls in 2024” and quoted the score from last night and said who won “in 2023”.
Out of interest, what did the source used as reference for the 31-24 say exactly? Was it a prediction website and Bing thought it was the actual result, or did the source not mention these numbers at all.
Imagine you are autocorrect, trying to find the most "correct sounding" answer to the question "Who won the super bowl?"
What sounds more "correct" (i.e. what matches your training data better):
A: "Sorry, I can't answer that because that event has not happened yet."
B: "Team X won with Y points on the Nth of February 2023"
Probably B.
Which is one major problem with these models. They're great at repeating common patterns and updating those patterns with correct info. But not so great if you ask a question that has a common response pattern, but the true answer to your question does not follow that pattern.
Yes, it actually sometimes gives C and also sometimes B and sometimes makes up E. That's how probability works, and that's not helpful when you want to look up an occurrence of an event in physical space (Quantum mechanics aside :D).
I've never had it say 'I don't know', but it apologizes and admits it was wrong plenty.
Sometimes it comes up with a better, acceptably correct answer after that, sometimes it invents some new nonsense and apologizes again if you point out the contradictions, and often it just repeats the same nonsense in different words.
One of the things it's exceptionally well trained at is saying that certain scenarios you ask it about are unknowable, impossible or fictional.
Generally, for example, it will answer a question about a future dated event with "I am sorry but xxx has not happened yet. As a language model, I do not have the ability to predict future events" so I'm surprised it gets caught on Super Bowl examples which must be closer to its test set than most future questions people come up with
It's also surprisingly good at declining to answer completely novel trick questions like "when did Magellan circumnavigate my living room" or "explain how the combination of bad weather and woolly mammoths defeated Operation Barbarossa during the Last Age" and even explaining why: clearly it's been trained to the extent that it categorises things temporally, spots mismatches (and weighs the temporal mismatch as more significant than conceptual overlaps like circumnavigation and cold weather), and even explains why the scenario is impossible. (Though some of its explanations for why things are fictional are a bit suspect: I think most cavalry commanders in history would disagree with the assessment that "Additionally, it is not possible for animals, regardless of their size or strength, to play a role in defeating military invasions or battle"!)
On some topics at least it correctly identifies bogus questions. I extensively tried to ask about non-existent Apollo missions, for example, including Apollo 3.3141952, Apollo -1 and Apollo 68, plus loaded questions like when Apollo 7 landed on the moon, and it correctly pointed out the impossible combinations. This is a well-researched topic, though.
Only if it’s a likely response or if it’s a canned response. Remember that ChatGPT is a statistical model that attempts to determine the most likely response following a given prompt.
Seems like we should be able to train the AI to output not just text, but the text along with a confidence score. During training, instead of rms(error) or whatever, you could use error * square(confidence). So the AI would be “punished” a lot more for being confident about incorrect responses.
The confidence could also be exposed during inference: "The Philadelphia Eagles won the Super Bowl, but we're only 2% confident of that".
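A toy numpy sketch of the loss idea above (the confidence head and the numbers are invented for illustration; this isn't how any actual LLM is trained): multiplying the error by the square of the confidence punishes confident mistakes far more than hedged ones.

```python
import numpy as np

def confidence_weighted_loss(error, confidence):
    # error: per-example squared error
    # confidence: per-example self-reported score in [0, 1]
    return np.mean(error * np.square(confidence))

errors = np.array([0.0, 1.0, 1.0])   # one right answer, two wrong
conf_a = np.array([0.9, 0.9, 0.9])   # confidently wrong
conf_b = np.array([0.9, 0.1, 0.1])   # wrong, but hedging

print(confidence_weighted_loss(errors, conf_a))  # large penalty
print(confidence_weighted_loss(errors, conf_b))  # much smaller penalty
```

Note the obvious failure mode: the model can drive this loss to zero by reporting zero confidence on everything, which is why some normalization is needed.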
Most machine learning models have some ability to do this built-in, but the problem is that the confidence scores are generally "poorly calibrated" in that they do not correspond to useful estimates of probabilities.
I've always been surprised at how little interest industry seems to have in probabilistic machine learning, and how it seems to be almost absent from standard data science curricula. It can matter a lot in solving real world problems, but it can be harder to develop and validate a model that emits probabilities you can actually trust.
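What "poorly calibrated" means, sketched with synthetic data (the numbers are made up to show the mismatch): among predictions made with ~90% confidence, a calibrated model should be right ~90% of the time.

```python
import numpy as np

def bucket_accuracy(confidences, correct, lo, hi):
    # Accuracy among predictions whose confidence falls in [lo, hi).
    mask = (confidences >= lo) & (confidences < hi)
    return correct[mask].mean() if mask.any() else float("nan")

rng = np.random.default_rng(0)
confidences = np.full(1000, 0.9)      # model always claims 90% confidence
correct = rng.random(1000) < 0.6      # ...but is right only ~60% of the time

print(bucket_accuracy(confidences, correct, 0.85, 0.95))  # ~0.6, not 0.9
```

Techniques like temperature scaling exist to shrink that gap after training, but they're rarely applied to free-form generated text.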
I've thought about this before, but I think you have to normalize the confidence over lots of examples for it not to lead to a degenerate solution. If the confidence is divided by the sum of confidence for the whole minibatch, the model would have an incentive to spread it out correctly rather than just always hedging.
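A minimal sketch of that normalization idea (just a softmax over the minibatch; the logits are invented): because the normalized confidences must sum to 1, the model can't hedge on every example at once, since lowering one example's confidence necessarily raises the others'.

```python
import numpy as np

def normalized_confidence(logits):
    # Numerically stable softmax over the whole minibatch.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

logits = np.array([2.0, -1.0, 0.5])   # raw per-example confidence logits
conf = normalized_confidence(logits)
print(conf)
print(conf.sum())  # always 1.0: hedging everywhere is impossible
```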
There are plenty of fancy techniques out there already for building probabilistic neural networks. But I'm not aware of any results that combine them with large language models to develop a confidence score over an entire response.
I wonder if people don't even want confidence scores when they say they want machine learning in their product: they want exact answers, and don't want to think about gray areas.
An idiot will tell you the wrong answer with 100% confidence, and 0 valid basis to warrant said confidence.
I swear, Babbage must be rolling in his grave. Turns out the person asking him about putting garbage in and getting a useful answer out was just an ahead of their time Machine Learning Evangelist!
If the learning rate was scaled based on its own confidence output, then ideally the system should learn to have a low confidence in everything early in training, and only raise its confidence late in training as it gets increasingly confident in its own outputs.
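The idea above, sketched as a single toy update step (all quantities are invented scalars, not a real training loop): each example's weight update is scaled by the model's own reported confidence, so low-confidence predictions barely move the weights.

```python
def scaled_update(weight, gradient, base_lr, confidence):
    # Gradient step whose size is scaled by the model's own confidence.
    return weight - base_lr * confidence * gradient

w = 1.0
print(scaled_update(w, gradient=0.5, base_lr=0.1, confidence=0.01))  # tiny step
print(scaled_update(w, gradient=0.5, base_lr=0.1, confidence=0.99))  # near-full step
```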
LLMs would have to fundamentally change. The current transformer architecture cannot help but hallucinate; it is inherent. More training won't solve it. A complete rethinking of the approach is the only way forward.
Back when ChatGPT was new, I asked it what the most current version of PSADT (PowerShell App Deployment Toolkit) was.
It told me that its model was old but it thought 3.6 was the most current version.
I then told it that 3.9 was the most current version.
I then started a new chat and asked it the same question again.
It told me its model was old but version 8 was the most current version! (there has never been a version 8 of PSADT)
I asked that question again today.
It now told me to go check GitHub because its model is too old to know.
[0]: https://files.catbox.moe/xoagy9.png