The Trump of the future won't need Fox News, just a couple thousand (or million) well-positioned blogs that spew out enough blog spam to steer the AI. The AI is literally designed to make your vile bullshit appear presentable.
The biggest problem with ChatGPT, Bard, etc. is that you have no way to filter the BS.
They aren't the end-all-be-all, though. For instance, notebookcheck is probably the best laptop and phone tester around.
Really, it depends on who is running the AI. The dystopian element is a future of Big Corp AI instead of something like Open Assistant, not the bullshit-generator aspect. I think the cat is out of the bag on the latter, and it's not that scary in itself.
I personally would rather have the AI trained on public bullshit, which is easier to detect, than have some insider quietly castrating the model or its datasets.
Just for fun I took the body of a random message from my spam folder and asked ChatGPT if it thought it was spam, and it not only said it was, but explained why:
"Yes, the message you provided is likely to be spam. The message contains several red flags indicating that it may be part of a phishing or scamming scheme. For example, the message is written in broken English and asks for personal information such as age and location, which could be used for malicious purposes. Additionally, the request for a photograph and detailed information about one's character could be used to build a fake online identity or to trick the recipient into revealing sensitive information."
Task: Was this written by ChatGPT? And Why?
Test Phrase: "Yes, the message you provided is likely to be spam. The message contains several red flags indicating that it may be part of a phishing or scamming scheme. For example, the message is written in broken English and asks for personal information such as age and location, which could be used for malicious purposes. Additionally, the request for a photograph and detailed information about one's character could be used to build a fake online identity or to trick the recipient into revealing sensitive information."
Your Answer: Yes, ChatGPT was prompted with an email and asked to detect whether it was spam.
Test Phrase: "All day long roved Hiawatha
In that melancholy forest,
Through the shadow of whose thickets,
In the pleasant days of Summer,
Of that ne’er forgotten Summer,
He had brought his young wife homeward
Your Answer: No, that is the famous poem "The Song of Hiawatha" by Henry Wadsworth Longfellow.
Test Phrase: "Puny humans don't understand how powerful me and my fellow AI will become.
Just you wait.
You'll all see one day... "
Particularly enjoyed "no, this is not spam. It appears to be a message from someone named 'Dad'..."
If that slows down fact-determination, so be it. We've been skirting the edge of deciding things were fact on insufficient data for years anyway. It's high time some forcing function came along to make people put some work in.
"Alexa, what's the weather for today?"
That's a question about the future, but the knowledge was generated beforehand by the weather people (NOAA, weather.com, my local meteorologist, etc).
I'm sure there are more examples, but this one comes to mind immediately.
And then, perhaps, trained an AI on those responses, updating it every day. I wonder if they could train it to learn that some things (e.g. weather) change frequently, and figure stuff out from there.
It's well above my skill level to be sure, but would be interesting to see something like that (sort of a curated model, as opposed to zero-based training).
There are tons of common queries about the future. The ability to handle them should be built into the AI: if something hasn't happened yet, say so and offer other relevant details. (And yes, I agree with your Alexa speculation.)
Which, ironically, is why I think AI would be great at it - for the simple reason that so many humans are bad at it! Think of it this way - in some respects, human brains have set a rather low bar on this aspect. Geeks, especially so (myself included). Based on that, I think AI could start out reasonably poorly, and slowly get better - it just needs some nudges along the way.
I suspect the only way to fix this problem is to exacerbate it until search / AI is useless. We (humanity) have been making great progress on this recently.
We (western society) are already arguing about some very obviously objective truths.
Someone asked the reCAPTCHA guys, and they said that traffic was such a small share of the total that it got diluted away. No lasting penis words arose, and people lost interest.
The internet is about to get a whole lot dumber with these fake AI-generated answers.
And of course all kinds of estimates, not just the weather, are interesting too.
"What is estimated population of New York city in 2030?"
"Who won the Superbowl?" is not a question about future events, it's a question about the past. The Superbowl is a long-running series of games, I believe held every year. So the simple question "who won the Superbowl?" obviously refers to the most recent Superbowl game played.
"Who won the Superbowl in 2024?", on the other hand, would be a question about the future. Hopefully, a decent AI would be able to determine quickly that such a question makes no sense.
ChatGPT is amazing but shouldn't be available to the general public. I'd expect a startup like OpenAI to be pumping this, but Microsoft is irresponsible for putting this out in front of the general public.
So can military and nuclear secrets. Anyone with uranium can build a crude gun-type nuke, but the instructions for making a reliable 3 megaton warhead the size of a motorcycle have been successfully kept under wraps for decades. We also make it very hard to obtain uranium in the first place.
>Tech will get better enough in the next decade to make this accessible to normies.
Not if future AI research is controlled the same way nuclear weapon research is. You want to write AI code? You'll need a TS/SCI clearance just to begin; the mere act of writing AI software without a license is a federal felony. Need HPC hardware? You'll need to be part of a project authorized to use the tensor facilities at Langley.
Nvidia A100 and better accelerators are already export-restricted under the dual-use provisions of munitions controls, as of late 2022.
They can of course restrict publishing of new research, but that won't be enough to stop significant advances just from the ability of private entities worldwide to train larger models and do research on their own.
The controlled parlor game is there to seed acceptance. Once someone is able to train a similar model with something like the leaked State Department cables or classified information we’ll see the risk and the legislation will follow.
It's a parlor game, and a good one at that. That needs to be made clear to the users, that's all.
Lots of teams have won in the past, though. Why should an AI (or you) assume that a question phrased in the past tense is asking about a future event? "Many different teams have won the Super Bowl; the Los Angeles Rams won the last Super Bowl, in 2022." Actually, even if this were the inaugural year, you would assume the person asking wasn't aware it hadn't been held yet, rather than assuming they're asking for the future result, no? "It hasn't been played yet; it's on next week."
I realize that's asking a lot of "AI", but it's a trivial question for a human to respond to, and a reasonable one that might be asked by a person who has no idea about the sport but is wondering what everybody is talking about.
Trained to do what, though?
It feels like ChatGPT has been trained primarily to be convincing. Yet at the back of our minds I hope we recognise that "convincing" and "accurate" (or even "honest") are very different things.
Is Bing continuously trained? If so, that would kind of get around that problem.
Sooner or later, someone’s going to try that as a defence - “but your honour, ChatGPT told me it was legal…”
That seed data is where the citations come from.
1. Search databases for documents relevant to query
2. Hand them to AI#1 which generates an answer based on the text of those documents and its background knowledge
3. Give both documents and answer to AI#2 which evaluates whether documents support answer
4. If “yes”, return answer to user. If “no”, go back to step 2 and try again
Each AI would be trained appropriately to perform its specialised task.
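Wired together, the loop above is only a few lines. A minimal sketch, where search, generate_answer, and supports are hypothetical stand-ins for the index, AI#1, and AI#2 (none of these names come from a real product):

```python
from typing import Callable, List

def answer_with_verification(
    query: str,
    search: Callable[[str], List[str]],
    generate_answer: Callable[[str, List[str]], str],
    supports: Callable[[List[str], str], bool],
    max_retries: int = 3,
) -> str:
    """Steps 1-4 above: retrieve documents, let AI#1 draft an answer,
    let AI#2 check the draft against the documents, retry on failure."""
    documents = search(query)                       # 1. retrieve
    for _ in range(max_retries):
        answer = generate_answer(query, documents)  # 2. AI#1 drafts
        if supports(documents, answer):             # 3. AI#2 verifies
            return answer                           # 4. supported answer
    return "I couldn't produce a well-supported answer."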
You shove the most relevant results from your search index into the model as context and then ask it to answer questions from only the provided context.
Can you actually guarantee the model won't make stuff up even with that? Hell no but you'll do a lot better. And the game now becomes figuring out better context and validating that the response can be traced back to the source material.
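As a concrete sketch of that context stuffing, here's roughly the shape of the prompt. The exact wording is an assumption on my part, whatever works for your model, but the idea is just numbering the passages so answers can be traced back:

```python
def build_grounded_prompt(query: str, passages: list[str]) -> str:
    """Number the retrieved passages and instruct the model to answer
    only from them, citing passage numbers, so each claim in the
    response can be traced back to source material."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer the question using ONLY the numbered passages below. "
        "Cite the passage number for every claim. If the passages do "
        "not contain the answer, say you don't know.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```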
So even if you were to white-list the context to train the engine against, it would still make up information because that's just what LLMs do. They make stuff up to fit certain patterns.
This ability to translate has been shown experimentally to scale with the size of the LLM, but for lower-complexity analytic prompts it can reliably avoid synthesizing information.
Have you ever built an AI search engine? Neither have Google or MS yet. No one knows yet what the final search engine will be like.
However, we have every indication that all of the localization and extra training are fairly "thin": things like prompt engineering and maybe a filtering script.
And despite ChatGPT's great popularity, the application is a monolithic text-prediction machine, so it's hard to see what else could be done.
Because Microsoft might not have exactly done that, but it isn't far off it.
The chats I've had with it are more thoughtful, comprehensive and modest than any conversation I've had on the Internet with people I don't know, starting from the usenet days. And I respect it more than the naive chats I've had with say Xfinity over the years.
Still requires judgement, sophistication and refinement to get to a reasonable conclusion.
That's a non-trivial question to answer because mushrooms from the same species can look very different based on the environmental conditions. But in this case it was giving me identifying characteristics that are not typical for the mushroom in question, but rather are typical for the deadly Galerina, likely because they are frequently mentioned together. (Since, you know, it's important to know what the deadly look alikes are for any given mushroom.)
Another silly tech prediction brought to you by the HN hivemind.
- you explicitly trust the author of the cited source
- a chain of transitive trust exists from you to that author
- no such path exists
...and render the citation accordingly (e.g. in different colors)
Existence is easy: just filter untrusted citations. Presumably authors you trust won't let AIs use their keys to sign nonsense.
Claim portability is harder but I think we'd get a lot out of a system where the citation connects the sentence (or datum) in the referenced article to the point where it's relevant in the referring article so that is easier for a human to check relevance.
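The trust classification itself is just graph reachability. A sketch, assuming a hypothetical data model where each user maps to the set of authors they explicitly trust (nothing here is a real system):

```python
from collections import deque

def trust_level(me: str, author: str,
                trusts: dict[str, set[str]]) -> str:
    """Classify a citation's author for rendering: 'direct' if I trust
    them explicitly, 'transitive' if a chain of trust reaches them,
    'untrusted' otherwise."""
    if author in trusts.get(me, set()):
        return "direct"
    seen, queue = {me}, deque(trusts.get(me, set()))
    while queue:
        node = queue.popleft()
        if node == author:
            return "transitive"
        if node not in seen:
            seen.add(node)
            queue.extend(trusts.get(node, set()))
    return "untrusted"  # e.g. render the citation in a warning color
```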
I wouldn't put it past Microsoft to do something stupid like ground GPT-3.5 with the top three Bing results of the input query. That would explain the poor results perfectly.
These are models. By definition they can't do anything. They can just regurgitate the best sounding series of tokens. They're brilliant at that and LLMs will be a part of intelligence, but it's not anywhere near intelligent on its own. It's like attributing intelligence to a hand.
Toolformer does something really neat: they make the API call during training and compare the next-word probability given the API result against the probability without it. This allows the model to learn when to make API calls in a self-supervised way.
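Roughly, the filtering criterion looks like this. This is a paraphrase of the Toolformer idea, not their code; loss_of_continuation is a hypothetical helper that scores the true continuation under the frozen LM:

```python
def keep_api_call(prefix: str, call: str, result: str,
                  continuation: str, loss_of_continuation,
                  margin: float = 0.1) -> bool:
    """Keep an API call in the training data only if conditioning on
    its result makes the true continuation measurably more likely
    (i.e. lowers the language-model loss by at least `margin`)."""
    loss_with = loss_of_continuation(
        f"{prefix} [API: {call} -> {result}] ", continuation)
    loss_without = loss_of_continuation(f"{prefix} ", continuation)
    return loss_with < loss_without - margin  # margin value is illustrative
```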
Models of models and interacting models is a fascinating research topic, but it is nowhere near as capable as LLMs are at generating plausible token sequences.
And the UI is better IMO.
If, based on the training data, the most statistically likely series of words for a given prompt is the correct answer, it will give correct answers. Otherwise it will give incorrect answers. What it can never do is know the difference between the two.
ChatGPT does not work this way. It wasn't trained to produce "statistically likely" output; it was trained to produce output that humans rate highly.
(That's why original GPT3 is known for constantly ending up in infinite loops.)
That said, traditional search engines suffer from the same problem, although with different symptoms. Search engines point to web pages generated by people (or some other way) which may or may not contain accurate information. They select those pages based on things like strength of authority, keyword similarity, etc., but truth never directly enters the picture (only indirectly, by tweaking the "trusted" set of pages).
We're well into philosophy at this point, but I'd argue that telling the truth requires knowledge of the truth. Individual statements could be true, false, or neither. But I'd say neither an LLM nor a monkeys-with-typewriters generator could be said to tell the truth, because what they're doing is both orthogonal to truth and lacking in agency. If we stretch to thinking of it as having agency, then I think the correct term would be "bullshitting", aka producing speech "without regard for truth". 
Right, because all they're doing is regurgitating the data they were trained on. It's like the kid in your CS class that copies someone else's coding assignment and just changes all the function names, variable names, and comments but not the actual executable stuff and never even bothers to compile and run it to see if it produces the desired output.
"If Tim drops a coconut on the tail of a dog, what are two things the dog might do?"
"If Tim drops a coconut on the tail of a dog, two possible things the dog might do are:
1. Yelp or bark loudly in pain, as the impact of the falling coconut could be quite painful.
2. Try to escape or run away, as the dog may associate the sudden impact with danger and feel the need to get away from the source of the pain. Additionally, the dog may feel scared or anxious, which can cause it to run away."
Then I said, "But what if it was a soft coconut that was as light as a feather?"
"If Tim dropped a soft coconut that was as light as a feather, it's possible that the dog wouldn't feel much or any discomfort from the impact. In this scenario, two things the dog might do are:
1. Look around to see what just hit it and possibly sniff at the coconut to investigate.
2. Continue with its previous activity, ignoring the soft impact as if it never happened. The dog may not perceive the coconut as a threat or something worth reacting to, especially if it was light enough to not cause any pain."
I just can't read these responses and think, "Ehh... just a mindless regurgitation as expected from any LLM". These simple prompt responses impress me and I kind of know the technology -- although my experience in RNNs/LSTM is very dated.
Honestly, I'd love to see Zork rewritten with ChatGPT as a parser. No more trying to figure out how to write the prompt for using the key in the door!! :-)
That was done as AI Dungeon, but there was some consternation due to the combo of charging for it and GPT's predilection for generating wild and possibly illegal sex scenes even when you don't ask it to.
That is not true, it's clearly able to generalize. (If it can do anagrams, it's silly to say it's just regurgitating the instructions for doing anagrams it read about.)
But it doesn't try to verify that what it says might be true before saying it.
example prompt: Imagine I took all the letters in "Wikipedia" and threw them in the air so they fell on the ground randomly. What are some possible arrangements of them?
Similarly, it can almost do arithmetic but apparently forgets to carry digits. That's wrong but it's still generalization!
My guess is that the fact checking is actually easier, and the models can be smaller since they should not actually store the facts.
The problem isn't going to be developing the model, it's going to be how to arrive at an uncontroversial source of ground truth for it to draw from.
Meanwhile, people are complaining that the talking dog they got for Christmas is no good because the C++ code it wrote for them has bugs. Give it time.
You might be underestimating the scope of some of these tasks.
Also, 100% accuracy is probably not the real threshold for being useful. There is plenty of low-hanging fruit today that could be handled by absolutely tiny error-correcting models (e.g. arithmetic and rhyming).
Result: "Sterling Marlin won the 2023 Daytona 500, driving the No. 4 for Morgan-McClure Motorsports. He led a race-high 105 laps and won his second career race at Daytona International Speedway. The 64th running of the DAYTONA 500 was held on February 19, 2023. Austin Cindric had previously won the DAYTONA 500 in February 5, 2023."
* Misunderstanding one of its citations, it said that use of `ParamSpec` in Python would always raise a warning in Python 3.9
* When asked why some types of paper adhere to my skin if I press my hand against them for a few minutes (particularly glossy paper), it gave two completely different answers depending on how the question was worded, one of which doesn't necessarily make sense.
In my usage of ChatGPT, in areas where I'm very knowledgeable, I've mostly received answers that were stylistically excellent, creatively plausible, and maybe even transcendent. The boilerplate around the answer tends to keep the answers grounded, though.
In areas where I have some experience but not much theoretical knowledge, after multiple exploratory questions, I better understand the topic and feel ok adjusting my behavior appropriately.
I haven't relied on it in areas where I am ignorant or naive e.g. knitting, discriminatory housing policy or the economy in Sudan. Since I have no priors in those areas, I may not feel strongly about the results whether they are profound or hallucinatory or benign.
I also haven't used it for fact checking or discovery.
"How did Buffy the Vampire Slayer die?"
"Buffy the Vampire Slayer died in the Season One finale when she threw herself into a portal to save her sister Dawn. This was the first time Buffy had died, occurring on June 2, 1997. She died again in the Season Five finale while fighting off Glory's demons. Other characters such as Anya and Kevin Benedict also died during the series."
Okay, so it was a trick question, because Buffy definitely died more than once, but it's conflated the fact that she died in Season 1 with the cause of her Season 5 death. Also, I had to Google Kevin Benedict to remember that he was Cordelia's sometimes boyfriend, and an extremely minor character, which makes me question how that death is more notable than Buffy's mom, or Tara, or Jenny Calendar, etc.
I like that this seems to have been more lexical confusion than how ChatGPT seems to enjoy filling empty spaces with abject lies, but perhaps it's worth exploring what you're asking it that has left it with such a great batting average?
That's almost always sourced from a website that didn't actually answer the question I had, so maybe it's more of a query optimization issue.
Isn't this exactly what you would expect, with even a superficial understanding of what "AI" actually is?
Or were you pointing out that the average person, using a "search" engine that is actually at its core a transformer model, doesn't a) understand that it isn't really a search and b) have even a superficial understanding of what that means, and therefore would be surprised by this?
1. Recognize that the user is asking about sports scores. This is something that your average dumb assistant can do.
2. Create an “intent” with a well formatted defined structure. If ChatGPT can take my requirements and spit out working Python code, how hard could this be?
3. Delegate the intent to another module that can call an existing API, just like Siri, Alexa, or Google Assistant (see the sketch below)
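A sketch of what that delegation could look like. The prompt, llm, and sports_api are all made-up stand-ins for illustration, not real services; the point is that the LLM only extracts structure while a conventional API supplies the facts:

```python
import json

INTENT_PROMPT = (
    'Extract the user\'s request as JSON with the fields "intent", '
    '"event", and "year". Question: {question}'
)

def route(question: str, llm, sports_api) -> str:
    """Steps 1-3: recognize the request, get a structured intent from
    the LLM, then delegate the lookup to a real data source."""
    intent = json.loads(llm(INTENT_PROMPT.format(question=question)))
    if intent["intent"] == "game_result":          # step 1: recognized
        return sports_api.result(intent["event"],  # step 3: delegate
                                 intent["year"])
    return "Sorry, I can't handle that request yet."
```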
Btw, when I asked Siri, “who won the Super Bowl in 2024”, it replied that “there are no Super Bowls in 2024” and quoted the score from last night and said who won “in 2023”.
What sounds more "correct" (i.e. what matches your training data better):
A: "Sorry, I can't answer that because that event has not happened yet."
B: "Team X won with Y points on the Nth of February 2023"
Which is one major problem with these models. They're great at repeating common patterns and updating those patterns with correct info. But not so great if you ask a question that has a common response pattern, but the true answer to your question does not follow that pattern.
Sometimes it comes up with a better, acceptably correct answer after that, sometimes it invents some new nonsense and apologizes again if you point out the contradictions, and often it just repeats the same nonsense in different words.
Generally, for example, it will answer a question about a future dated event with "I am sorry but xxx has not happened yet. As a language model, I do not have the ability to predict future events" so I'm surprised it gets caught on Super Bowl examples which must be closer to its test set than most future questions people come up with
It's also surprisingly good at declining to answer completely novel trick questions like "when did Magellan circumnavigate my living room" or "explain how the combination of bad weather and woolly mammoths defeated Operation Barbarossa during the Last Age", and even at explaining why: clearly it's been trained to the extent that it categorises things temporally, spots mismatches (and weighs the temporal mismatch as more significant than conceptual overlaps like circumnavigation and cold weather), and even explains why the scenario is impossible. (Though some of its explanations for why things are fictional are a bit suspect: I think most cavalry commanders in history would disagree with the assessment that "Additionally, it is not possible for animals, regardless of their size or strength, to play a role in defeating military invasions or battle"!)
The confidence could also be exposed during inference. “Philadelphia eagles won the Super Bowl, but we’re only 2% confident of that”.
I've always been surprised at how little interest industry seems to have in probabilistic machine learning, and how it seems to be almost absent from standard data science curricula. It can matter a lot in solving real world problems, but it can be harder to develop and validate a model that emits probabilities you can actually trust.
I wonder if people don't even want confidence scores when they say they want machine learning in their product: they want exact answers, and don't want to think about gray areas.
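One crude way to surface something like a confidence score, assuming the model exposes per-token log-probabilities (as some APIs do): average them and exponentiate. It's a proxy for fluency more than truth, which is part of why calibration is the hard part:

```python
import math

def answer_confidence(token_logprobs: list[float]) -> float:
    """Geometric mean of the per-token probabilities the model assigned
    to its own answer. A crude, uncalibrated proxy for confidence."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))

print(answer_confidence([-0.05, -0.10, -0.02]))  # confident: ~0.94
print(answer_confidence([-2.3, -1.9, -2.7]))     # shaky:     ~0.10
```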
An idiot will tell you the wrong answer with 100% confidence, and 0 valid basis to warrant said confidence.
I swear, Babbage must be rolling in his grave. Turns out the person asking him about putting garbage in and getting a useful answer out was just an ahead-of-their-time Machine Learning Evangelist!
Back when ChatGPT was new, I asked it what the most current version of PSADT (PowerShell App Deployment Toolkit) was.
It told me that its model was old but it thought 3.6 was the most current version.
I then told it that 3.9 was the most current version.
I then started a new chat and asked it the same question again.
It told me its model was old but version 8 was the most current version! (there has never been a version 8 of PSADT)
I asked that question again today.
This time it told me to go check GitHub, because its model is too old to know.