> Because that’s what most people have access to.

I’d agree with this rationale if the author clearly communicated their choice of model and the consequences of that choice upfront.

In this post, both the table of results and the text of the post itself simply read “ChatGPT”, with no mention of 3.5 until the middle of a paragraph in the appendix.

> It’s absolutely worthless to most readers to talk about something they’ll never pay for and it’s not the job of random third-parties to incentivise others to send money to OpenAI.

The “worth” is in communicating an accurate representation of the capabilities of the technology being evaluated. If you’re using the less capable free version, then make that clear upfront, and there’s no problem.

If you were to write an article reviewing any other piece of software that offers a much less capable free version alongside a paid one, you would be expected to be clear upfront (not in a single sentence all the way down in the appendix) about which version you’re using, and, if you’re using the free version, what its limitations may be. To do otherwise would be misleading.

If you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the best possible version of “ChatGPT”, not the worst.

Accurate communication is literally the job of the author if they’re making money off the article (this one has a Patreon solicitation at the top of the page).

Whether or not "most readers" are ever going to pay for the software is totally orthogonal.

If using GPT4 vs 3.5 would create results so distinct from one another that it would serve to incentivize people to give money to OpenAI, well then that precisely supports the argument that the author’s approach is misleading when presenting their results as representative of the capabilities of “ChatGPT”.

> What I really don’t understand is why anyone gets so hung up about it and blames the writer.

Again, if they’re making money off their readers it’s their job to provide them with an accurate representation of the tech.

> Anecdotally, I find this excessive fawning about 4 VS 3.5 to be unwarranted. https://news.ycombinator.com/item?id=38304184

Did some part of my comment come across as “excessive fawning”? Regardless, if this “excessive fawning” is truly unwarranted, this would again undermine your statement that using GPT4 would “incentivize others to send money to OpenAI”.

Regarding your link, I’ll highlight what another commenter replied to you. What should ChatGPT say when prompted about various religious beliefs? Should it confidently tell the user that these beliefs are rooted in fantastical nonsense?

It seems in this case you’re holding ChatGPT to an arbitrary standard, not to mention one that the majority of humanity, including many of its brightest members, would fail to meet.




> I’d agree with this rationale if the author clearly communicated their choice of model and the consequences of that choice upfront. (…) with no mention of 3.5 until the middle of a paragraph of text in the appendix.

You’re moving the goalposts. You went from criticising anyone using 3.5 and writing about it to saying it would’ve been OK if they had mentioned it where you think it’s acceptable. It’s debatable whether the information needed to be more prominent; it is not debatable that it is present.

> If you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the best possible version of “ChatGPT”, not the worst.

Alternatively, if you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the version most people have access to and can “play along” with the author.

> If using GPT4 vs 3.5 would create results so distinct from one another that it would serve to incentivize people to give money to OpenAI

Those are your words, not mine. I argued for the exact opposite.

> Again, if they’re making money off their readers it’s their job to provide them with an accurate representation of the tech.

I agree they should strive to provide accurate information. But I disagree that being paid has anything to do with it, and that their representation of the tech was inaccurate. Incomplete, maybe.

> Regardless, if this “excessive fawning” is truly unwarranted, this would again undermine your statement that using GPT4 would “incentivize others to send money to OpenAI”.

Again, I did not argue that, I argued the opposite. What I meant is that even if you believe that to be true, that still doesn’t mean random third-parties would have any obligation to do it.

> I’ll highlight what another commenter replied to you.

That comment has a reply, by another person, to which I didn’t feel the need to add.

> It seems in this case you’re holding ChatGPT to an arbitrary standard, not to mention one that the majority of humanity, including many of its brightest members, would fail to meet.

Machines and humans are not the same, not judged the same, don’t work the same, are not interpreted the same. Let’s please stop pretending there’s an equivalence.

Here’s a simple example: if someone tells you they can multiply any two numbers in their head and you give them 324543 and 976985, then when they reply “317073642855” you’ll take out a calculator to confirm. If you had done the calculation first on a computer, you wouldn’t turn to the nearest human for them to confirm it in their head.
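
For what it’s worth, that product is indeed correct; a quick check in, say, a Python shell confirms it:

    >>> 324543 * 976985
    317073642855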

The problem with ChatGPT being wrong and misleading isn’t the information itself, but that people take it as correct because that’s what they’re used to and expect from machines. In addition, you don’t know whether an answer is bullshit or not. With a human, not only can you catch clues regarding the reliability of the information, you also learn which humans to trust with which kinds of information.

Everyone’s standard for ChatGPT, be it absolute omniscience, utter failure, or anything in between, is arbitrary. Comparing it to “the majority of humanity, including many of its brightest members” is certainly not an objective measurable standard.


> You’re moving the goalposts. You went from criticising anyone using 3.5 and writing about it to saying it would’ve been OK if they had mentioned it where you think it’s acceptable.

There are no goalposts being moved. My original comment was "I really don't understand why anyone writing articles about ChatGPT uses 3.5. It's pretty misleading as to the results you can get out of (the best available version of) ChatGPT."

This is still the position I'm arguing. It's a criticism of authors who use the older, inferior version of ChatGPT, do not make that abundantly clear to their readers, and then use that to make statements about the capabilities of "ChatGPT", which ultimately misleads those readers as to the current capabilities of "ChatGPT".

> It’s debatable if the information needed to be more prominent; it is not debatable it is present.

I'm not debating whether or not it is present in the article; I'm the one who highlighted its presence. What I'm arguing is that omitting this information from every reference to ChatGPT in the entire body of the text, and from the tables front and center presenting the data, and then burying this extremely important detail in a single sentence in the middle of a paragraph in the appendix, is effectively misleading.

> Alternatively, it you simply say “ChatGPT” it’s reasonable to infer that you’re evaluating the version most people have access to and can “play along” with the author.

It's even more reasonable to infer that when you're evaluating the performance of "ChatGPT", you're using the latest version.

If you review a video game, you don't play the free demo then tell the audience that the game is too short and lacking in a ton of features.

If you're reviewing Microsoft Word, you're not going to leave out the all-important detail that you're actually evaluating Word version 6.0.

> Those are your words, not mine. I argued for the exact opposite.

Then I misunderstood your line "incentivize others to give money to OpenAI".

> I agree they should strive to provide accurate information. But I disagree that being paid has anything to do with it, and that their representation of the tech was inaccurate. Incomplete, maybe.

Agreed that all of humanity should strive for accuracy and honesty in all their communication with others, but I do feel this responsibility is even more explicit when you are a professional making money off your writing for ostensibly providing an objective assessment of something.

I maintain that it's inaccurate, misleading, etc etc to present these results as representative of the performance of ChatGPT without making it abundantly clear to the reader that it's 3.5, which is significantly less performant than the latest version.

> Again, I did not argue that, I argued the opposite. What I meant is that even if you believe that to be true, that still doesn’t mean random third-parties would have any obligation to do it.

Again, I'm confused by what you're saying here about third parties.

Are you arguing the opposite, that GPT 4 is not far more capable than 3.5? Or are you arguing that it is more capable, but that its advanced capability would not make it a more compelling product? I admit I don't understand either of these positions.

That 4 is far better than 3.5 is something you can readily observe yourself, find measured on countless metrics, and find support for through countless anecdotes. If you do believe it is better, then that would automatically make it a more compelling product than 3.5, whether or not you want to argue that ChatGPT as a class of products is anywhere from hardly compelling at all to God's Own Perfect Product.

> That comment has a reply, by another person, to which I didn’t feel the need to add.

Ah, I somehow missed that.

So, I went ahead and asked GPT 4 your ghost question verbatim, and the first bullet point it gave me urged me to consider rational explanations for the phenomena.

I then went ahead and asked it a question about sin and God phrased with the implication that I was a believer. Then a direct, neutral question about whether or not God exists.

I think it performed well in all these cases, and the nuance that is being glossed over is that it matters whether you are expressing an implied belief in something supernatural or asking in a neutral fashion about the topic.

It's clear to me that a universal policy of responding to all queries touching on matters of faith by first encouraging the user to question the validity of their faith would be the wrong way to go. So again, I see this as an exceptionally arbitrary standard, one that I don't feel could be satisfactorily defended as a standard nor actually met by most people to the satisfaction of most people.

https://chat.openai.com/share/2dc2d6eb-b3f6-4571-a75b-af698f...

> Machines and humans are not the same, not judged the same, don’t work the same, are not interpreted the same. Let’s please stop pretending there’s an equivalence.

The purpose of comparison is precisely to draw attention to the similarities and differences between two different things; nobody ever said there was an equivalence.

> Here’s a simple example: If someone tells you they can multiply any two numbers in their head and you give them 324543 and 976985, when they reply “317073642855” you’ll take out a calculator to confirm. If you had done the calculation first on a computer, you wouldn’t turn to the nearest human for them to confirm it in their head.

This is a perfectly defined problem with exactly one correct and easily verifiable answer. The other topics we were talking about are nothing like this.

> The problem with ChatGPT being wrong and misleading isn’t the information itself, but that people are taking it as correct because that’s what they’re used to and expect from machines. In addition, you don’t know when an answer is bullshit or not. With a human, not only can you catch clues regarding reliability of the information, you learn which human to trust with each information.

I completely agree that people need to be skeptical when using ChatGPT, and that this distrust of seemingly omniscient "AI" that can confidently and plausibly provide bullshit answers to any query is something that will need to be cultivated in humanity.

Is that the point of using 3.5, though, to make ChatGPT look worse than it is? Should we achieve this cultivation by being intentionally misleading? Maybe the ends justify the means, but I'm not sure this is a compelling argument. I'd much rather look at the most powerful version available and point out the very real flaws with it; there is no shortage of them, and no need to get stuck on older generations of the tech.

> Everyone’s standard for ChatGPT, be it absolute omniscience, utter failure, or anything in between, is arbitrary. Comparing it to “the majority of humanity, including many of its brightest members” is certainly not an objective measurable standard.

I mean, yeah, but there is a spectrum of arbitrariness. Asking it to answer arithmetic accurately could reasonably be argued to sit at the end of the spectrum labeled "objectively the right way to do this", while expecting it to know the one correct way to answer queries about fundamentally unknowable matters of faith, topics that are deeply sensitive and controversial for the majority of humanity, would be closer to the other end.

----

Look, I'm so tired of online debates like this at my age. I likely wouldn't even have engaged, except that your first response struck me as unnecessarily abrasive, with phrases like "absolutely worthless" and "excessive fawning" which are an irresistible call to arms to my inner keyboard warrior.

I'd really like to not spend the rest of my life writing essays at each other on this topic so I'm happy to agree to disagree here.

Also, this has all left me with the impression that this is largely a branding issue. OpenAI does call all of their ChatGPT versions "ChatGPT". If they made unmistakable distinctions across their product line, that would go a long way toward addressing any confusion.



