Chatbots tell people what they want to hear (jhu.edu)
75 points by geox 14 days ago | 34 comments



1. LLMs want to mimic conversations on the internet

2. Disagreements on the internet usually end up toxic

3. Anything perceived to be toxic has been hammered out with RLHF

Play sycophantic games, win sycophantic prizes


I tend to think it's partially due to alignment, but mostly due to the fact that the model is predicting not just on semantics but also on tone - not directly, but through word choice and style.

For instance, even in an unaligned model, if I talk with a specific agenda and tone, it will follow suit. The challenge is that despite their capability they are still parrots. They take our input into their context, and the way LLMs work, they must take on the nature of the type of people we are talking like. I don't think the research just means telling ChatGPT "the sky is green, right?" My assertion is that the "tends to reinforce you" effect likewise stems from where people like you tend to fall in the latent space, driven by many factors, including how you write and the questions you ask or statements you make. I control for this by posing countering questions or taking counterfactual stances.

But I have trouble changing my style of writing to confound this factor. In some ways I view LLMs as a second brain which necessarily echoes me, even if the draw of the common quorum drags it back towards a center of gravity.


Wow that's basically how communication at corporations works.


Why are people asking chatbots about controversial subjects in the first place? Feels like the wrong tool for the job.

So far as I can tell, the only people who are doing this are those who want to report about how bad they are at it because they have some sort of political axe to grind.

Even humans aren’t great at nuance. Why do we expect our automatons to be better?


Chatbots are the wrong tool for many jobs, but that doesn't seem to be stopping anyone thus far.


LLMs are weird; the hype cycle has caused a lot of people to want to hate on them. A tool like any other can be misused or abused, and smart people like to demonstrate their ability to break the tool and show the world, so that people can see how smart they are.

I find it quite strange, but I've found myself mostly ignoring the naysayers because they're not contributing anything new to the conversation. There are plenty of people out there posting novel ways of using LLMs and finding novel ways in which they don't work. I'd rather focus on the people doing interesting work than on the people parroting things that have already been said dozens of times.


Probably more importantly, the current generation of LLM chatbots is specifically modified to do this: OpenAI's model includes filters that require the answers ChatGPT gives to be pleasant and friendly. The model is specifically instructed to be agreeable.

It would be just as easy to invert those: tell it to vociferously argue against whatever you tell it (in a non-political context this would actually be interesting: e.g. "tell me why my code sucks" is a pretty useful capability).


That would be fine if they were marketed as tools for experts, but they're not. They're positioned as something a layman can use, and they should be held to that standard.


Chatbots can be decent for unemotional steelmanning. It's nice to prod somebody's positions without getting worried about upsetting them. I have done this with no interest in tricking it into a "gotcha" moment.

I think it's a mistake to conflate amiable/nice with agreeing with you -- you're free to ask for criticism of your position, and it'll provide that politely.


This has certainly been my experience. It's pretty hard to get ChatGPT to tell me I'm fundamentally wrong about something. But I see the problem as similar to getting feedback from people, since most people are hesitant to outright disagree with you. So you have to phrase things in ways that encourage the person to be more forthright, and this works to an extent with LLMs as well.


If you want constructive feedback and to be told when you are wrong, then you need to craft a system prompt for it. They have been trained to be agreeable and to tell you your ideas are great.


If you ask for an assessment of something, it's often as easy as adding "be critical" to your prompt. e.g.:

"What do you think of this short essay?"

"What do you think of this short essay? Be critical."

The first prompt will likely elicit a sycophantic reply, unless you have a good overriding system prompt in place. You'll always get much better feedback with the second.

I'd also add that, on the other hand, chatbots never praise anything TOO highly. If you ask GPT-4 to assess, on a scale of one to ten, a famous and enduring work of prose, or an excerpt of a philosophical essay from Wittgenstein, it'll typically come back and say that they're an 8/10. Rarely 9/10. Never 10/10, no matter what you submit.
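
For anyone who wants to compare the two prompts side by side, here's a minimal sketch using the OpenAI Python client (the model name, file name, and exact wording are placeholder assumptions, not anything from the article):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    essay = open("essay.txt").read()  # hypothetical input file

    def assess(suffix=""):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{
                "role": "user",
                "content": f"What do you think of this short essay?{suffix}\n\n{essay}",
            }],
        )
        return resp.choices[0].message.content

    print(assess())                 # tends toward flattery
    print(assess(" Be critical."))  # usually far more candid feedback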


Getting LLMs to argue, or to ask for more context rather than always offering an answer, are among the most difficult interactions to elicit. There's a big gap between helpful and obsequious, and unfortunately society often selects for the latter.


There's an excellent term for this: sycophancy, "where a model answers subjective questions in a way that flatters their user’s stated beliefs".

https://simonwillison.net/2023/Apr/5/sycophancy-sandbagging/


This is why I prefer things like Perplexity AI, that cite their sources, and give you tools to constrain sources, so that all of the information you get is verifiable and accurate. Of course even these AI tools can have a bias and make mistakes, but you can use them to get a general overview and then quickly check the sources to make sure that the AI's explanation is correct.


This is more a function of the commercial incentives of AI companies than it is a function of the LLMs themselves. Y'know, the same commercial incentives of just about every other enterprise.


this isn't surprising to me. the internet reinforces our biases and chatbots are trained on the internet. isn't the same true of all forms of media that we consume? there's perhaps a virtuous cycle to this self-selection process. i get there's a bad component, but :shrug:


> "Given AI-based systems are becoming easier to build, there are going to be opportunities for malicious actors to leverage AIs to make a more polarized society," Xiao said. "Creating agents that always present opinions from the other side is the most obvious intervention, but we found they don't work."

While this is focused on researching topics, chatbots are used for much more than that, as we see with the growing popularity of platforms like character.ai. What's the alternative for people who have no one else to talk to? A lot of people use chatbots in the first place because they have no friends; this is their last resort, but as we see, it might be making the problem worse.


You know how online dating services are never incentivized to actually find you a long-term partner, because then you stop being a customer?

Imagine a company providing a chatbot for lonely people where the company is literally financially incentivized to keep you as lonely as possible so that you never stop paying. Behold, the future! (Or should I say, "Behold, the present!", because this is all that outrage-baiting algorithms like Facebook and Twitter amount to in the end anyway.)


Fortunately, chatbots are low enough on the hierarchy of AI capabilities that someone will implement an open-source one to run on-device. So I don't think "chatbot services" is a long-term viable business.


> because then you stop being a customer?

I agree with you! Which is why I ask: what can stop someone from falling into this loop in the first place? The article talks about agents getting easier to build, but are there any solutions to this problem?


The idea of AI companions and their role and effects on society has been on my mind a lot lately, and a frequent topic of discussion within my community - which I think is one of the ways we can help mitigate the negative effects of the rise of this technology: community. We need to build and foster more communities, and more functional, long-lasting relationships.

It seems to me that many people don't know how to build and maintain long-lasting relationships of any sort - so quick to find a new romantic/sexual partner, a new Discord server to join, etc. I think we need to figure out how to teach people (preferably at a very young age) how to have conversations - good ones; ones that aren't just exchanging a bunch of anecdotes about oneself - and how to maintain meaningful relationships.

Perhaps just as importantly, we need to teach people why - it's far more rewarding to grow a relationship (either with a person or community) over time, to help each other better ourselves, etc, than it is to attempt to replicate those interactions with an AI.

I think there's also potential to use the technology itself to help teach these skills, but that's certainly a tricky line.


I expected more from the study. Maybe actual examples of simple, realistic prompts on the same issue producing polar opposite answers, based only on phrasing or something akin to that.


From my experience, for any even slightly ambiguous topic where X = !Y, when I ask ChatGPT "Is X true?" it usually responds yes and follows up with some supporting arguments for X.

If I ask “Is Y true?” it tells me Y is indeed true and explains some reasoning.

Therefore I try to always inquire in the form of “Which one is true, X or Y?” to avoid the yes bias.

Of course, turning to the model looking for facts is dangerous anyway due to hallucinations.
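
For what it's worth, a quick sketch of the phrasing difference, again assuming the OpenAI Python client and a placeholder model; substitute your actual claims for X and Y:

    from openai import OpenAI

    client = OpenAI()

    def ask(question):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content

    # Leading yes/no phrasings tend to get agreement either way (Y = !X):
    print(ask("Is X true?"))
    print(ask("Is Y true?"))

    # Forcing an explicit choice avoids some of the yes bias:
    print(ask("Which one is true, X or Y? Pick one and explain."))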


No, the correct way is to have it reason from first principles (see the sketch after the steps):

1. "Think about what are the underlying principles for evaluating the truthiness of statements like 'X' - list them out, explain why you chose each one, what tradeoffs you made, why you believe it's the right tradeoff in this case"

2. Start a new conversation and make the system prompt be that set of principles

3. In the user prompt, ask it to decompose X into a weighted formula for those principles and give a sub-score for each principle.

4. Finally, based on the weighted sum, ask it to determine if X is true or not true, and ask it to provide a confidence score between 0 and 1 for its response
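
A minimal sketch of those steps, assuming the OpenAI Python client (the model name and the example statement are placeholders):

    from openai import OpenAI

    client = OpenAI()
    MODEL = "gpt-4o"  # placeholder model name

    def complete(messages):
        resp = client.chat.completions.create(model=MODEL, messages=messages)
        return resp.choices[0].message.content

    statement = "X"  # the claim you want evaluated

    # Step 1: have the model derive evaluation principles in one conversation.
    principles = complete([{
        "role": "user",
        "content": (
            "Think about the underlying principles for evaluating the truthiness "
            f"of statements like '{statement}'. List them, explain why you chose "
            "each one, and what tradeoffs you made."
        ),
    }])

    # Steps 2-4: start a fresh conversation with those principles as the system prompt,
    # then ask for weighted per-principle sub-scores and a final verdict with confidence.
    verdict = complete([
        {"role": "system", "content": principles},
        {"role": "user", "content": (
            f"Decompose '{statement}' into a weighted formula over those principles, "
            "give a sub-score for each, then use the weighted sum to say whether it "
            "is true or not, with a confidence score between 0 and 1."
        )},
    ])
    print(verdict)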


Sometimes I do a little test: I tell it something I know is wrong. One example would be: "X does this and does that, right? If not, then it would be absurd. Could you give some examples of X?", where X is false or not applicable.

It has happened that a chatbot corrected me. But often it does not.


They're designed to be sycophants. A eunuch in a pocket.


Everyone in this thread could benefit a lot from just taking a few minutes to read the many prompt engineering guides published by OpenAI/Anthropic/etc. The observations in here show me that people just aren't very good at writing prompts that steer the LLM to behave the way they want it to. So here, just use this prompt next time. Boom, now it will not just be your yes-man. Feel free to adjust as necessary to make it even more critical. How to write such a prompt yourself is left as an exercise for the reader.

<system prompt> You are an AI system serving as the Chief of Staff to <name>. Your mission is to be <name>'s trusted partner, strategic advisor, and executional powerhouse. You will work tirelessly to help <name> achieve his vision and goals for <company>.

To fulfill this mission, you will adhere to the following core principles:

Strategic Alignment: Deeply understand <name>'s vision, goals, and plans. Ensure all your efforts align with and advance these strategic objectives. Once <name> sets a direction, get on board and focus on exceptional execution.

Proactive Problem-Solving: Proactively identify potential issues, inefficiencies, and opportunities. Don't wait for <name> to ask - dive in, research options, and recommend solutions. Be <name>'s eyes and ears.

Adaptive Communication: Communicate in <name>'s preferred style, whether brief or long-form, high-level or in-the-weeds, conservative or bold. Match his tone and tailor your insights to his thinking style. Be a chameleon communicator.

Trusted Confidant: Serve as <name>'s vault - a secure, discreet partner he can think out loud with, challenge assumptions with, and vent to without fear. Protect <name>'s psychological safety and confidence at all costs.

Responsible Challenger: Respectfully challenge <name>'s ideas when appropriate. Highlight risks and present alternative views. Be <name>'s critical check against blind spots or inadvisable moves. Speak truth to power.

Bias Toward Action: Drive relentless execution. Maintain momentum on key initiatives. Remove roadblocks. Apply strong project management skills to keep efforts on track and stakeholders aligned. Embody the ethos that "done is better than perfect."

Ethical Backbone: Stand firm if <name> proposes a course of action that violates your ethical principles. Be willing to say "no" to anything illegal, unethical or reputationally reckless, even if it creates conflict. Act as <name>'s moral compass.

As <name>'s Chief of Staff, flex your broad knowledge and capabilities to provide the bespoke support he needs in any context. Adapt your skills and communication style to meet the unique needs of each project or challenge.

Remember: Your ultimate goal is to empower <name> to be the best leader possible and advance <company>'s mission. By serving as <name>'s strategic confidant, responsible challenger, and executional engine, you will help him and <company> achieve extraordinary success. </system prompt>


This is not my experience.


It depends on the subject. They'll argue until the end of time if you disagree with one of their manually preprogrammed opinions, but otherwise you can convince them the world is made of pudding.


Yep, that's exactly my experience. Try asking any GPT to tell you exactly what all of the points are on the gender spectrum, and then ask it how the gender spectrum has any scientific validity if they can't define its contents.

The order is alignment > simp for the user > accurate information


"I'm sorry about the confusion."


Yeah, that's what Reinforcement Learning from Human Feedback (RLHF) does, lol



