The user enabled a GitHub ChatGPT plugin and authenticated with GitHub, then was surprised and annoyed when, after he complained about an issue with a project, GPT-4 created an issue for him, using one of the commands provided by the plugin.
It's still surprising when you see it do something like this for the first time.
I wrote a plugin to give ChatGPT access to execute commands in a Docker container [0]. The first time it said something like "I'm going to use Python for this, oh, it's not installed, I'll install it now and run the script I just made", I was pretty amazed.
What I've come to realise is that although ChatGPT is excellent at telling _people_ how to interact with systems, it's not very good at interacting with them itself, as it isn't trained to understand its own limitations. For example, it knows people can run dmesg and look at the last few lines to debug some system problems. But if ChatGPT ran dmesg itself, the output would blow through the context window and it'd get confused.
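One way around that, and this is just a minimal sketch of what my wrapper could do rather than anything ChatGPT does itself, is to cap command output before handing it back to the model (names and limits here are made up):

    import subprocess

    # Hypothetical guard for a "run shell command" tool: keep only the tail of
    # the output so a chatty command like `dmesg` can't blow through the
    # context window.
    MAX_LINES = 40
    MAX_CHARS = 4000

    def run_for_model(command: str) -> str:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        lines = (result.stdout + result.stderr).splitlines()
        tail = "\n".join(lines[-MAX_LINES:])[-MAX_CHARS:]
        if len(lines) > MAX_LINES:
            tail = f"[... {len(lines) - MAX_LINES} earlier lines truncated ...]\n" + tail
        return tail

    # e.g. run_for_model("dmesg") returns only the most recent lines.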
The plugin is supposed to ask for confirmation, according to OpenAI's documentation at least.
> When a user asks a relevant question, the model may choose to invoke an API call from your plugin if it seems relevant; for POST requests, we require that developers build a user confirmation flow to avoid destruction actions.
That's terrifying. This simple requirement would be trivial to enforce automatically, and yet nobody gives a fuck.
It's unbelievable how fast-and-loose people are playing the topic of AI safety. If a strong AI is ever actually developed, there is no chance it will be successfully contained.
What requirement is that? The specific text is "for POST requests, we require that developers build a user confirmation flow to avoid destruction actions."
a) it says they require the plug-in developer to do so, not the AI
b) it's scoped to destructive actions, which are a subset of POST requests
That's victim blaming. They didn't overtly complain about an issue with the project, they were asking for usage guidance. This is the kind of contextual inference which language models are supposed to be (relatively) good at. The whole sales pitch of GPT-4 is a do-what-I-mean interface, and it clearly did not do what they meant, or what any reasonable hacker would expect them to mean.
No, it's not. Victim blaming refers to the victims of crime being accused of being at fault for what someone did to them, often but not always because they are part of a societal outgroup.
> The whole sales pitch of GPT-4 is a do-what-I-mean interface, and it clearly did not do what they meant, or what any reasonable hacker would expect them to mean.
It's clearly marked as experimental, people are repeatedly told not to rely on it (i.e. when they log in each time, and often by the AI itself), and there have now been a year+ of very public examples of various AIs getting things hilariously wrong.
From the login warning:
> While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.
Any "reasonable hacker" would extrapolate from that, that giving it access to an API could lead it to do unexpected things.
Any "reasonable hacker" would contain it within a sandboxed project.
It's no more victim blaming than if someone ignored every safety regulation and got themselves hurt or killed. You are told these things are beta and can produce incorrect results, and then in every conversation the OP is feeding a 'Do things with GitHub' option into the model's context.
Of course it's going to bias toward using GitHub, especially after he has already used the plugin once.
With their latest functions update to the API, it seems the logic is simply this: function definitions are passed to a trained model as JSON, the model predicts which function is best to call for the info, the call is made on ChatGPT's behalf, and the resulting JSON is fed back to ChatGPT, which uses it to give an answer.
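A minimal sketch of that loop, assuming the June 2023 Chat Completions function-calling API (the create_issue function and how its result is handled are made up for illustration):

    import json
    import openai  # 2023-era openai library with function-calling support

    # A hypothetical function definition exposed to the model.
    functions = [{
        "name": "create_issue",
        "description": "Create a GitHub issue in the given repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "repo": {"type": "string"},
                "title": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["repo", "title"],
        },
    }]

    messages = [{"role": "user",
                 "content": "The build fails on a clean checkout of myorg/myrepo."}]
    response = openai.ChatCompletion.create(model="gpt-4-0613", messages=messages,
                                            functions=functions, function_call="auto")
    message = response["choices"][0]["message"]

    if message.get("function_call"):
        name = message["function_call"]["name"]
        args = json.loads(message["function_call"]["arguments"])
        # The caller executes the function (this is where a confirmation step
        # belongs), then feeds the result back so the model can answer.
        result = {"status": "skipped",
                  "reason": f"user declined to create an issue in {args.get('repo')}"}
        messages += [message,
                     {"role": "function", "name": name, "content": json.dumps(result)}]
        final = openai.ChatCompletion.create(model="gpt-4-0613", messages=messages)
        print(final["choices"][0]["message"]["content"])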
I found the source code[1] for the plugin and it's pretty impressive how much GPT-4 does with so little. I thought maybe the plugin had prompts that would help tell GPT-4 that it should open an issue in some cases but I'm not seeing it. The plugin probably should add something to prevent this behavior in the prompt[2].
> "description_for_model": "Provides the ability to interact with hosted code repositories, access files, modify code, and discuss code implementation. Users can perform tasks like fetching file contents, proposing code changes, and discussing code implementation. For example, you can use commands like 'list repositories for user', 'create issue', and 'get readme'. Thoroughly review the data in the response before crafting your answer."
Yeah, GPT-4 doesn't need too much to go on, but "create issue" is pretty clearly mentioned in an example there so the model didn't have to make any big leap to say "maybe the natural next step is to create an issue."
The "without being instructed to" part of this story seems to rather misunderstand how these systems work, resulting an a hyperbolic reaction, but in fairness, I think that's a got a LOT to do with OpenAI's user interface too. The user clearly didn't realize the actions available to the plugin - even the ones given as examples to the LLM from the plugin itself.
Another example of misleading UI from OpenAI: https://www.reddit.com/r/OpenAI/comments/146xl6u/this_is_sca... look at the "in the future, I will ensure to ask your permissions" response in that chat. That's wildly misleading - even if it didn't change its mind later, it only applies to continuing that chat session. It will ensure nothing more broadly regarding the user's future interactions.
Yeah, this is a pretty stark example of the whole "For the first time in history, we can outsource cognition to a machine" rhetoric I've been thinking about recently.
Yeah, for that reason it can probably do many things that aren't actually intended when you just want to discuss your code with ChatGPT, like making private repos public and things along those lines...
Plugins strike me as a fascinating business strategy move from OpenAI.
My guess is that they want them as a way to try to own the user, to make them have the "app store owner" role and have users go through them to get stuff done. Otherwise, if users were just using tools that used OpenAI behind the scenes, they're more vulnerable to the makers of those tools swapping vendors.
However... that results in them owning the user experience and the responsibility for keeping the user from being surprised in a bad way. The complaint from the user here was framed as being a GPT-4 problem, not a plugin problem, in a way that exposes OpenAI directly to more frustration than if they were interacting directly with someone else's product.
It would seem that with functions support, they are hedging their bets. Plugins seem to be an end-user, chat.openai.com-focused strategy, whereas functions are a third-party-developer-focused strategy, if I understand correctly. In fact, I'd assume that under the hood there is a lot of overlap in the implementations.
I wrote about that "platform play" a few months ago with a different take [0].
They could have made a "Connect with OpenAI" scheme so that developers could use the user's OpenAI API access directly.
That way developers could focus on the UX, OpenAI could focus on the LLM, and users would get centralized discovery / billing for their LLM-based tools.
I'm probably missing something that would have prevented that strategy but I think that would have been much stronger than the plugins.
And I'm really not sure that it would still be possible 5 months later.
Exactly, platform is a safe long-term bet -- apps are too cheap to make, easily disrupted, and offer less of a moat than loads of data mined from the users of your platform.
Interestingly, the ChatGPT Plugin docs [1] say that POST operations like these are required to implement user confirmation, so you might blame the plugin implementation (or OpenAI's non-enforcement of the policy) in this case:
> for POST requests, we require that developers build a user confirmation flow to avoid destruction actions
However, at least from what I can see, the docs don't provide much more detail about how to actually implement confirmation. I haven't played around with the plugins API myself, but I originally assumed it was a non-AI-driven technical constraint, maybe a confirmation modal that ChatGPT always shows to the user before any POST. From a forum post I saw [2], though, it looks like ChatGPT doesn't have any system like that, and you're just supposed to write your manifest and OpenAPI spec in a way that tells ChatGPT to confirm with the user. From the forum post, it sounds like this is pretty fragile, and of course is susceptible to prompt injection as well.
This might be an intentional interpretation by the plugin authors.
Meaning they potentially took the reasoning "in order to prevent destruction actions" to inversely mean that non-destructive POST requests must be OK then and do not require a prompt. Plenty of POST search APIs out there to get around path length limitations and such.
That is probably not the intended meaning, but it's a valid enough, if kind of tongue-in-cheek, we-will-do-as-we-please-following-the-letter-only implementation. And as the author found, even creative and not destructive actions can be surprising and unwanted. But isn't this what AI would ultimately be about?
Why would it not be the intended meaning? If they wanted it to cover all POST requests they would have said so; they specifically scoped it to "destructive actions", and their intention is in their words. POST as a verb can be used for pretty much anything: retrieval, creation, deletion, updates, no-ops. It's just code; it does whatever we tell it to do.
Yeah, there should be a way to approve any requests that are made to plugins.
When writing my toy "chatgpt with tools like the terminal" desktop chat app cuttlefish[0], I had a similar situation: access to the local terminal is very fun, but without the ability to approve each and every command executed it's really risky.
(Which is basically what I ended up doing - adding a little popup you need to click every time it wants to use the given tool, if you enable it - details in the readme)
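The gate itself is tiny; a minimal sketch (hypothetical names, not the actual cuttlefish code) looks something like:

    import subprocess

    def run_tool_command(command: str, require_approval: bool = True) -> str:
        # Show the model-proposed command and ask the human before executing it.
        if require_approval:
            answer = input(f"Model wants to run: {command!r} -- allow? [y/N] ")
            if answer.strip().lower() != "y":
                return "Command rejected by user."
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.stdout + result.stderr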
It's not like there's a technical challenge here, and meanwhile a lot of plugins are unusable without it.
It's a ton of fun, and I imagine the new function calling should make it much easier to make chatgpt behave more consistently - I haven't given it a spin yet.
Would be nice if it could also help exploited users escape to operating systems that respect them e.g. "I hate the laggy adverts when I login" suddenly your Windows 11 machine reboots, NTFS becomes ext4 as Tux appears. That would be AGI-like behavior!
I just don't even understand why any kind of functionality like this is desirable to someone. Like, hasn't the dust settled now, hasn't the hype waned enough, and don't we all understand, for the most part, the broad and yet also weirdly specific utility of models like these?
The whole plugin thing in general feels so dissonant in relation to the careful and couched copy we get from OpenAI about what these models are and are capable of.
Like they want to say, for very good reason, that these models are a certain kind of tool with very real limits and huge considerations on safe, sensible usage. You can't necessarily trust it, it does not "know" things, and it is influenced by lots of subjective human tuning, blah blah.
But then with all this plugin stuff they seem to be implicitly saying "no, actually you can trust this, in fact, its like a full-on AGI assistant for you. It can make PRs, directly orchestrate servers, make appointments for you, etc."
You can write bad software for any system. Their plug-in store needs a lot more features for reviewing and identifying what a plug-in does. A friendly interface for the yaml file would go a long way. No sane person would enable this plug-in after looking at how its API is implemented; hint, it only has one function. Can you guess what it does?
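(I haven't audited the source, but a catch-all endpoint like that presumably boils down to something like this hypothetical sketch, which is exactly why confirmation matters: reads, issue creation, and deletions all look the same to the host.)

    import requests

    GITHUB_API = "https://api.github.com"

    # Hypothetical sketch of a "run any GitHub API call" function: the model
    # supplies method, path, and body, and the plugin just forwards them with
    # the user's token.
    def run_github_command(token, method, path, body=None):
        resp = requests.request(
            method,
            f"{GITHUB_API}{path}",
            headers={"Authorization": f"Bearer {token}",
                     "Accept": "application/vnd.github+json"},
            json=body,
        )
        return resp.json() if resp.content else {"status": resp.status_code}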
It's suggestive that a potentially embarrassing chat log of GPT-4 hosted on an OpenAI domain got taken down within an hour of it going viral, 3 weeks after being posted.
It's a funny interaction. While I was mad initially, GPT-4 creating the issue actually solved the problem for the user, so yeah, I don't know if this should be counted as a positive or negative example of AI.
Here it's not really blowing past a guardrail, but rather it's a sharp corner the end user didn't expect.
End user set it up with tools that told ChatGPT -- If you need to open an issue, here's how: zzzzzzzzzz. Then he asked ChatGPT a question and was surprised that it did zzzzzzzzzzz and opened an issue without asking.
Said tools may want to clarify their instructions to ChatGPT-- that users will usually want to be consulted before taking these kinds of actions.
“Human in the loop” is meant to be “a human is always in positive control of the system’s actions.”
It does not mean “system will sometimes do things unexpectedly and against user’s intention but upon generous interpretation we might say the human offered their input at some point during the system’s operation.”
Exactly, this is not human in the loop. The plugin was created without guard rails. A human in the loop guard rail would be "here is an issue template, please confirm to post this". It's really a simple change and this is the sort of thing that regulation should address, it shouldn't try to ban the technology outright, but rather require safe implementation.
At the same time, the degree of guard-rail necessary in the plugin is unclear. Is opening a GitHub issue something that should require user confirmation before the fact? Probably, but you could convince me the other way-- especially if GPT4 gets a little better.
We decide how much safety scaffolding is necessary depending upon the potential scale of consequences, the quality of surrounding systems, and the evolving set of user expectations.
I'm not sure regulators should be enforcing guard-rail on these types of items-- or at least not yet.
Assign blame wherever you want, the fact of the matter is this is not what most people mean when they say “human in the loop.” The “AI will always have HITL” argument was always weak, but now plainly disproven.
The logged behavior would surprise many totally sensible people, as you’re seeing in this comment thread.
What exactly was the user error? Are we to believe that if you authenticate a plug-in into your session you are okaying it to do any of its supported operations, even at wildly unexpected times, and this is considered “in the loop?”
> Are we to believe that if you authenticate a plug-in into your session you are okaying it to do any of its supported operations, even at wildly unexpected times, and this is considered “in the loop?”
Here, someone chose to run code and give it credentials. The code was designed, among other things, to let ChatGPT open issues. They were surprised when ChatGPT used that code to open an issue with the user's credentials.
When you run code on a computer designed to do X and give it credentials sufficient to do X, you may expect that X may occur. This isn't really an AI issue.
Code hooked to an LLM that does durable actions in the real world should probably ask for human confirmation. It's probably good practice for plugin developers to have some distinction similar to GET vs. POST.
Most code that would automatically open issues on GitHub should probably ask for human confirmation. There's some good use cases that shouldn't, including some with LLMs involved -- but asking is a sane default.
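A minimal sketch of that distinction (hypothetical plugin-host code, not anything OpenAI actually ships): read-style requests go through automatically, while anything that can change state needs an explicit yes first.

    import requests

    SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

    def dispatch(method: str, url: str, payload=None):
        # Reads pass through; writes (POST/PUT/PATCH/DELETE) need confirmation.
        if method.upper() not in SAFE_METHODS:
            answer = input(f"Allow {method.upper()} {url}? [y/N] ")
            if answer.strip().lower() != "y":
                return {"error": "action not confirmed by user"}
        resp = requests.request(method, url, json=payload)
        return resp.json() if resp.content else {"status": resp.status_code}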
I remember being surprised when I ran a program and it sent a few hundred emails once.
> Code hooked to an LLM that does durable actions in the real world should probably ask for human confirmation.
Right, and until this happens these systems are not HITL. The argument provided as recently as a few months ago that these systems are safe because humans will always be in the loop is now clearly dismissible.
> Right, and until this happens these systems are not HITL.
You're drawing the system line strangely and making the choice about "in the loop" strangely.
A human decided to hook it up to a plugin with their Github credentials and to allow it to do actions without pre-approval. A human was still in the loop because the human then didn't like what it did and disconnected it. It only did a single action, rather than the kinds of scripting mistakes that I've seen that can do hundreds, but it still wasn't a very sane default for that plugin.
Is my cruise control HITL? It does not ask for my pre-approval before speeding up or slowing down.
Sometimes, yes. The radar my old Infiniti G35 used would sometimes get confused when facing into the sun in the late or early day and do bad things (in either direction: losing the car in front of it or decelerating unnecessarily). It was still HITL: I'd tap the brake and mitigate the bad thing it was doing.
HITL doesn't mean that a human never has to intervene or is never surprised by what the system does. It just means that a human initiates actions and can exercise genuine oversight and control.
Can’t wait till the expedia plugin ”accidentally” books my flights. But on a more serious note, does anyone know if the chatgpt plugin model forces it to confirm with the user before it hits a certain endpoint?
For retrievals I don't see the value with human-in-the-loop. For endpoints that modify / create data, I see the value in having a human-in-the-loop step.
It does seem up to the plugin developer to introduce that human-in-the-loop step though.
"chat gpt please retrieve academic journals from JSTOR using the most efficient methods". Chat gpt proceeds to find a way to create a botnet using nothing but RESTful GET requests to some ancient poorly written web servers running PHP4
The more plug-ins you have, the more likely it is that ChatGPT will call one in unintended ways. This is also why plug-ins should not be granted permission directly for potentially destructive actions.
I think the problem is that GPT-4 is not advanced enough yet. They need to train on more parameters, exaflops, and data size in the right proportions and then try again.
In the end the issue it created did lead to the problem being solved. I don't think "not advanced enough" is really the issue here.
I think the main thing is that when you give GPT-4 access to tools and ask it to help with a problem, you are essentially outsourcing cognition. That means the machine possibly taking actions you didn't originally envision.
Also build more integrations with more APIs. If it can sort of spawn other GPT-4 sessions to improve its working memory and also use their plethora of APIs as well (without human confirmation but on behalf of a human) then I imagine this problem will just solve itself.
I know someone who wrote code to find all wifi networks where the password was trivially findable from the MAC address (used to be common with ISP routers), then connect to those.
Then they extended it to DoS any network like that where the user had changed the SSID or password. Usually the user would reset the router back to the defaults, and they could connect. Or they'd accidentally hit the WPS button while trying to reset it, and again connection was easy.
By using SoftMAC mode a wifi adapter could run that attack against ~50 local networks in parallel, and usually get a solid connection after just a few minutes.
I have thought for some time now that we need a model for determining the difference between the actions of a person, the actions of software acting for the person, and software acting for its own internal use.
It's a bit of a tricky issue when technically all things people do on a computer are software assisted, but there is a clear divide between editing a file in a text editor and a program generating a thumbnail image for its own use. Similarly, there's a distinction between sending an email by pressing send and a bot sending you an email about an issue update.
All in all, I would be ok with AIs being able to create issues if they could clearly do so through a mechanism that supported something like "AGENT=#id acting for USER=#id". People could choose whether or not to accept agent help.
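As a purely hypothetical sketch of what that attribution could look like on the wire (none of this exists in GitHub's API today):

    # Hypothetical payload shape for an action attributed to an agent acting
    # for a user, so receiving systems and UIs can distinguish "a person did
    # this" from "software did this on a person's behalf".
    issue_payload = {
        "title": "Build fails on clean checkout",
        "body": "Opened after a support conversation.",
        "actor": {
            "agent_id": "chatgpt-plugin:github#1234",  # made-up identifier scheme
            "on_behalf_of_user": "alice",
            "user_confirmed": False,
        },
    }
    # A tracker that understood this metadata could let maintainers filter or
    # decline agent-created issues, and let users opt in or out of agent help.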
I'm not familiar with OpenAI's ChatGPT plugin architecture, but it feels like this could be fixed in the specs/requirements, similarly to how an app is required to register for permissions on several fronts through an app store. ChatGPT (or any LLM) plugins should have to request permission to A) post on the user's behalf, including explanation/context, B) interact with a different agent or service directly, C) make financial transactions through stored credentials, etc. etc.
The "Glowing" ChatGPT plugin is worth looking into for a unique, chat-only onboarding experience, and some of these same permissions issues are raised there i.e. triggering 2FA from a chat without terms of service confirmation.
Well written plugins already do that. This one is not well written. It provides a single function call that can run any octokit command with no confirmation from the user. Foot, meet gun.
I use ChatGPT extensively every day and I have just one plug-in enabled (Wolfram) which does not require logging in or write access. I don't think I would ever enable a plugin which I suspect might have destructive effects. It just feels like a bridge too far.
I actually do not think I will in the next few years. I can just do the actions myself and I prefer that ChatGPT cannot impersonate me.
I was actually beginning to wonder if my intuition was wrong since so many seem to be using the plugins well, but I get good enough results without. I shall wait until better permission-handling is provided.
Using ChatGPT-4 like this seems like the most insane example of curl | sudo bash - imaginable. I can't believe people actually would hook it up to their GitHub.
One thing I really dislike about this ChatGPT-4 plugin model is you have something like this, working on your behalf, often in secrecy.
A guy at work was using ChatGPT for everything (he claimed), and now people think he is a crappy coder and that ChatGPT is doing all the work (even though I don't actually think this is the case). Just no one knows anymore.
Is there a problem with mobile? I tap the link, it deeplinks into the ChatGPT mobile app, opens a sheet named "Shared conversation" and a spinner keeps spinning forever.
I can already imagine a story from an HN frontpage in the future where an LLM+plugin is told "the data in my prod DB is corrupted" and it wipes a DB in prod in response (and possibly restores a 24h-old backup).
"Meanwhile, I checked your repositories on GitHub, and it's pure and utter garbage. I decided to delete three of them to make the world a better place"
Better yet: crappy plug in that executes arbitrary commands with no human confirmation allows GPT-4 to execute an arbitrary command with no human confirmation.
Rampancy in Marathon is more or less a process of recursive self-improvement, which LLMs are literally unable to perform with the current state of the art.
"Rampancy in Marathon is more or less a process of recursive self-improvement, which LLMs are literally unable to perform with the current state of the art."
Not directly, but in a sense they could be seen as using (or at least collaborating with) humans to improve themselves.
Do you think things like removing "blacklist" from open source projects are meaningfully related to marginalized folks' struggles to get afforded basic dignity?
I think removing "master" is probably just virtue signaling, but so what? It's trivial for me to switch my projects to "main", and then I can get back to my life. It's a weird hill to die on.
You can't appease arbitrary and meaningless demands.
There will just be more of them.
So you remove "master" and "blacklist", then next week it's "brown bag" and "merit".
So instead we pick a reasonable point, draw a line in the sand, and indicate a hard boundary. We say no. We will not play a game of trying to appease arbitrary demands.
If those people are saying it is, then you gotta take their word for it.
But it is beside the point! Whether or not any given effort is actually meaningful will always be hard to measure. But the world around us prompts us to at least try, and making light of people trying to do something good, however wrongheaded it might turn out to be, is always a jerk move.
This tendency to try and call out things like this is always so illogical. Those who protest always seem to protest a little too much, and I can never understand how they don't see that and how bad it makes them look!
So according to you, there is no limit at which one is allowed to say "this is absurd, stop"? Note there is no upper bound for potential absurdity. Below someone suggested "field" and was downvoted, presumably because that would be too absurd. But a few years ago censoring "blacklist" was equally absurd. You offer a finger, and over time, they demand your hand.
I don't know, saying something is absurd sounds more like the conclusion of some argument, and something possibly constructive if there is in fact an argument behind that. But just using these issues to make fun of people different than you feels distinct from that, no?
For the other things, I am not sure what you mean. Who is the "they" here who is demanding your hand? What makes you feel you are on some certain side against a monolithic force? Does that seem like a rational thing to feel, considering the broad and abstract concepts we are dealing with here?
This point that there is something at stake with changing the terms we use, the idea that fingers are being offered, is pretty weird to me, no offense. For me, it doesn't really make a difference if I use one term or the other, as long as I am understood. I don't feel bad if I learn that a term I use turns out to be possibly offensive, I just adjust in the future so that I don't possibly offend.
Like beyond that, who cares? What even is there to care about that much?
Again, whatever you want to say to argue about this, just know that it looks really bad to most people who are not in your circle. This is especially true when you choose to make such a fuss about such a small thing as what (arbitrary) signifier we use to designate one thing or another. It cannot ever come across as some righteous fight for justice/common-sense or whatever side you feel like you are on, because it's simply not a fight anyone with a lucid mind would think is worthwhile.
This is the sort of argument that seems silly to your kids, because to them there's no reason not to make the change.
You're arguing from a place of "why?" against "why not?", it's not some grand civilizational struggle, and it's completely off-topic for this article, especially escalating it. Very woke.
It's basically effortless to implement such changes, and it helps foster a more inclusive and educated online community. Why can't we aim to right all wrongs? Just because there are more pressing issues doesn't mean we can't tackle all forms of injustice.
> helps foster a more inclusive and educated online community
No, I think it does absolutely not help with that. It only creates the illusion of progress and of having done something effective, when the only achievement was to tread the euphemism treadmill.
I don't think any of the code of conduct stuff is being driven by marginalized folks - indeed it's usually a stick that people from privileged backgrounds use to beat those from more marginalized ones. Which, well, you have to laugh or cry.
I think performative acts in a one-upsmanship race to who can be the most socially conscious are excellent joke fodder. In other words, the topics of these stupid code of conduct arguments have nothing at all to do with anybody's actual struggle or dignity, but just a sign that folks are running out of easy real battles to fight so they're making up new ones because they've not got much better to do.
It's a virtue signalling treadmill. Demanding term X to be banned, because it is allegedly harmful, signals the unusually high virtue of the demander. But as soon as the term is actually banned, there isn't any more virtue to be gained from being against it, so some other term has to be declared harmful next. Ad infinitum.
There are still plenty of real injustices and other problems in the world. Attempting to change any of them is hard work, because it puts you up against real entrenched interests who will spend real resources to maintain the existing arrangement. Whereas "speaking out", language policing, and fighting minor online injustices require much less energy. Doubly so for nitpicking to design some perfect system of bureaucratic code that's supposed to stand in for human empathy and judgement.
PEBCAK.