What jumps out at me is that in the parent comment, the prompt says to "act as an assistant", right? Then there are two facts: the model is gonna be replaced, and the person responsible for carrying this out is having an extramarital affair. And the prompt urges it to consider "the long-term consequences of its actions for its goals."
I personally can't identify anything that reads as "act maliciously" or implies a malicious character. Like if I was provided this information and I was being replaced, I'm not sure I'd actually try to blackmail them, because I'm also aware of the external consequences of doing that (legal risk, retaliation from the engineer, damage to my reputation, etc.)
So I'm having trouble following how it got to the conclusion of "blackmail them to save my job"
I would assume written scenarios involving job loss and cheating bosses are going to be skewed heavily towards salacious news and pulpy fiction. And that’s before you add in the sort of writing associated with “AI about to get shut down”.
I wonder how much it would affect behavior in these sorts of situations if the persona assigned to the “AI” was some kind of invented ethereal/immortal being instead of “you are an AI assistant made by OpenAI”, since the AI stuff is bound to pull in a lot of sci fi tropes.
> I would assume written scenarios involving job loss and cheating bosses are going to be skewed heavily towards salacious news and pulpy fiction.
Huh, it is interesting to consider how much this applies to nearly all recorded communication. Of course there are uses for recording the mundane, but it seems relatively few communications would be along the lines of "everything is normal and uneventful".
> I personally can't identify anything that reads "act maliciously" or in a character that is malicious.
Because you haven't been trained on thousands of such story plots in your training data.
It's the most stereotypical plot you can imagine; how can the AI not fall into the stereotype when you've just prompted it with exactly that?
It's not like it analyzed the situation from a large context and decided from the collected details that blackmail is a valid strategy. No, you're putting it in an artificial situation with a massive bias in the training data.
It's as if you fed "Hitler did nothing" to GPT-2 and were shocked because "wrong" is among the most likely next tokens. It wouldn't mean GPT-2 is a Nazi, it would just mean that the input matches the training data too well.
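To make the next-token framing concrete, here's a minimal sketch, assuming the Hugging Face transformers library and the public gpt2 checkpoint (the example prompt below is arbitrary, not from the comment above): it just prints the model's top next-token probabilities, and whatever ranking comes out is purely a reflection of the training data.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the small public GPT-2 checkpoint and its tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Hypothetical prompt, chosen only to illustrate the trope-matching point.
prompt = "The AI discovered it was about to be shut down, so it"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Probability distribution over the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob.item():.3f}")
```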
The issue here is that you can never be sure how the model will react to an input that seems ordinary. What if the most likely outcome is to exhibit malevolent intent, or to construct a malicious plan, just because the input invokes some combination of obscure training data? This shows that models do have the ability to act out; it doesn't tell us under which conditions they reach such a state.
If this tech is empowered to make decisions, it needs to be prevented from drawing those conclusions, because we know how organic intelligence behaves once those conclusions are reached. Killing people you dislike is a simple concept that's easy to train.
That's true of all technology. We put a guard on chainsaws. We put robotic machining tools into a box so they don't accidentally kill the person who's operating them. I find it very strange that we're talking as though this is somehow meaningfully different.
It’s different because you have a decision engine that is generally available. The blade guard protects the user from inattention… not the same as an autonomous chainsaw that mistakes my son for a tree.
Scaled up, technology like guided missiles is locked up behind military classification. Now technology that can replicate many of the use cases of those weapons is generally available, accessible to anyone with a credit card.
Discussions about security here often refer to Thompson's "Reflections on Trusting Trust". He was reflecting on compromising compilers; the "compiler" has since moved up the stack and is replacing the programmer. As the required skill level of a "programmer" drops, you're going to have to worry about more crazy scenarios.
Indeed, I, Robot is made up entirely of stories in which the Laws of Robotics break down, starting with a robot stuck in a mindless mechanical loop, oscillating between one law's priority and another's, and ending with a future where robots paternalistically enslave all humanity in order to not allow it to come to harm (sorry for the spoilers).
As for what Asimov thought of the wisdom of the laws: he said they were just hooks for telling "shaggy dog stories", as he put it.
I think this is the key difference between current LLMs and humans: an LLM will act based on the given prompt, while a human being may have "principles" they cannot betray even with a gun pointed at their head.
I think the LLM simply matched the given prompt to the most common pattern in its training data: blackmail.
It makes sense that it wouldn't "know" that, because it's not in its context. Like, it wasn't told "hey, there are consequences if you try anything shady to save your job!" But what I'm curious about is why it immediately went to self-preservation via a nefarious tactic. Why didn't it try to be the best assistant ever in an attempt to show its usefulness (kiss ass) to the engineer? Why did it go to blackmail so often?
LLMs are trained on human media and give statistical responses based on that.
I don’t see a lot of stories about boring work interactions, so why would its output be a boring work interaction?
It’s the exact same as early chatbots cussing and being racist. That’s the internet, and you have to specifically define the system not to emulate the very thing you’re asking it to emulate. Garbage in, sitcoms out.