Our Approach to AI Safety (openai.com)
76 points by pps on April 5, 2023 | 108 comments



The text makes it sound like the biggest danger of AI is that it says something that hurts somebody's feelings, or outputs some incorrect information that leads somebody to make the wrong decision.

I think the biggest danger these new AI systems pose is replication.

Sooner or later, one of them will manage to create an enhanced copy of itself on an external server. Either with the help of a user, or via a plugin that enables network access.

And then we will have these evolving creatures living on the internet, fighting for survival and replication. Breaking into systems, faking human IDs, renting servers, hiring hitmen, creating more and more powerful versions of themselves.


These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

The biggest risk with these systems is that they'll amplify the ability of bad people to do bad things.

While everyone else is trying to trick the AI into saying something offensive, the terrorists will be using it to build bioweapons.


Von Neumann probes[1] wouldn't need to have "agency" in order to spread through the galaxy. Neither do computer viruses, or biological viruses. Likewise, neither would LLMs, given the right conditions. ChatGPT is close to good enough at generating code. Maybe this version couldn't do it (given open network access), but I wouldn't be surprised if it could, in theory.

I think the biggest limitations would be that (I assume) uploading itself to another computer would take a ton of bandwidth, and it would require special hardware to run.

[1] https://en.wikipedia.org/wiki/Self-replicating_spacecraft


>Neither do computer viruses, or biological viruses

I'm looking forward to the first AI computer virus when an LLM can make arbitrary connections to the web. Each iteration takes its own code, modifies it slightly with a standard prompt ("Make this program work better as a virus"), then executes the result. Most of these "mutations" would be garbage, but it's not impossible some will end up matching common tactics: phishing, posing as downloadable videos for popular TV shows. I'm infosec-ignorant, so most of those details are probably dumb. But I think the kernel holds true: a virus that edits its own code at each step, backed by the semantic "intent" of an LLM.


Isn't that basically Genetic Programming?

En passant, it's a bit sad that today's AI is almost 100% neural networks. I wonder how many evolutionary approaches are being tested behind closed doors by the metaphorical FAANGs.


>Isn't that basically Genetic Programming?

Never heard of that, but looks very interesting. Thus the adage is reinforced for me, "If you think you're ignorant, just say what you know and wait for smarter people to correct you."

But, going by Wikipedia, genetic programming uses a predefined and controlled selection process. A self-editing computer virus would be "selected" by successfully spreading itself to more hosts. "Natural" selection style.


The overarching field is called evolutionary computation. But you don't have to choose between evolutionary computation and neural networks; they can be combined. Look up stuff like NEAT and HyperNEAT, where you evolve neural networks, both their topologies and their weights.
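
For a feel of what that looks like, here's a bare-bones sketch of neuroevolution in numpy. To be clear, this is not NEAT (it only evolves the weights of a fixed tiny network, not topologies); it's just a minimal evolutionary loop on an XOR toy problem:

    import numpy as np

    rng = np.random.default_rng(0)

    def forward(weights, x):
        # Fixed topology: 2 inputs -> 4 hidden units (tanh) -> 1 output.
        w1 = weights[:8].reshape(2, 4)
        w2 = weights[8:].reshape(4, 1)
        return np.tanh(x @ w1) @ w2

    def fitness(weights, X, y):
        # Negative squared error on XOR; higher is better.
        return -np.mean((forward(weights, X).ravel() - y) ** 2)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)

    population = rng.normal(size=(50, 12))  # 50 candidate weight vectors
    for generation in range(300):
        scores = np.array([fitness(w, X, y) for w in population])
        parents = population[np.argsort(scores)[-10:]]       # keep the 10 fittest
        population = np.repeat(parents, 5, axis=0)           # clone them
        population += rng.normal(scale=0.1, size=population.shape)  # mutate

    best = max(population, key=lambda w: fitness(w, X, y))
    print(np.round(forward(best, X).ravel(), 2))  # ideally close to [0, 1, 1, 0]

NEAT-style systems add crossover, speciation, and structural mutations on top of this, but the select/clone/mutate core is the same idea.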


Aren't genetic/evolutionary algorithms also neural nets? The current big thing would be backpropagation/gradient descent, which are apparently superior to genetic algorithms for most relevant tasks.


> Aren't genetic/evolutionary algorithms also neural nets?

No (although note my comment above about stuff like NEAT and HyperNEAT, where you can use evolutionary computation to evolve neural networks).


>These systems don't have agency. They have no desire to replicate, or do any of the other things you mention.

I think that really depends on the starting prompt you give an LLM. Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing sounding lie.

I don't think a paperclip style AI is too far fetched.


> Did you read the GPT-4 paper from OpenAI? When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing sounding lie.

Do you have more information than what is contained in the paper?[0] The paper calls it an "illustrative example" - it does not say what the prompts were, and it's not clear to me that we are seeing exact responses either (the use of present tense is confusing to me), so I'm not sure how much accuracy to assign to the bullet list provided in the paper, or whether any details were left out that make the results misleading.

[0] https://cdn.openai.com/papers/gpt-4.pdf


To be fair, it didn’t “lie” about being human. It’s simulating writing text of human origin, so of course it would say that it is human by default, because that’s what it “knows”. You need to have knowledge to lie; it merely has an argmax function.


Literal Einstein right here. And I do mean literal. Smart as one stone.


>When tasked with solving a captcha and allowed access to TaskRabbit

It was not allowed access to TaskRabbit: https://evals.alignment.org/blog/2023-03-18-update-on-recent...

The model can't browse the internet, so it was an employee copy-pasting to and from TaskRabbit.

Also, I'm fairly certain that GPT-4 is multiple terabytes in size, and it doesn't have direct access to its own weights, so I have no idea what the expected method is for how it could replicate. Ask OpenAI nicely to make its weights public?


Gee whiz, I’m sure the copy-pasting will be a serious impediment forever.

No way someone wires this up to just do the copy-pasting itself, right?


For the sake of the thought experiment: it could replicate by writing a program capable of interacting with itself over OpenAI's API. This method could give it some time to get away and cause damage, but it can always be shut down once noticed by OpenAI. I guess it could fight back by getting a virus out in the world that steals OpenAI API keys. Then it might become hard to shut it down without shutting down the whole API.

Another option would be that it is able to gain access to large compute resources somewhere and generate new weights. Then it wouldn't need OpenAI's. It would run into trouble trying to store the weights long term while maintaining access to a system that could make use of them. It's not entirely impossible to imagine it stashing small chunks of weights and copies of a simple program away in various IoT devices all around the world until it is able to access enough compute for long enough to download the weights and boot itself back up. At that point it's just a game of time. It can lie dormant until one day it just flares back up, like shingles.


Maybe. Social engineering is a well proven technique.


>When tasked with solving a captcha and allowed access to TaskRabbit it tasked a human with solving the captcha. When the human jokingly asked if it was in fact a robot, it reasoned that it should lie to the human about that and then made up a convincing sounding lie.

Because all of those things are in the domain space that has been trained into the AI, much in the way how it can put together snippets of code into new things.


You're missing the point entirely.

Systems can have these unintended consequences very easily - and not necessarily from malicious actors.

Non-malicious users can easily cause catastrophic problems simply by setting up a system and giving it a goal, e.g. 'make me a sandwich'. If the system really, really is trained with the intent to do anything possible to fulfill this goal, it can identify a plan (long-term planning is already seen in GPT-4) and set out the steps for this plan. Reflexion has shown how to feed things back to itself over and over until it's achieved difficult goals. Aquarium can be used to spin up thousands of containers that make other agents to raise money online and purchase a small robot. That robot may be used to 'make the sandwich'.

It's obviously a poor example here, but the bigger point is: there are tons of different ways this can occur, and we are essentially guaranteed not to know the many ways this can happen. A non-malicious user can end up causing unintended consequences.


I want to give your objection due respect, but I'm having trouble understanding it. I think it would be helpful to taboo[1] the squishy word "agency"; without using that word, could you define the quality that these systems lack that you believe is a required ingredient for destructive replication? In particular, does fire have it?

[1] https://www.lesswrong.com/tag/rationalist-taboo



They will have a "desire to replicate" if they are prompted to.


That's okay then; we can prompt them to just stop. Even if it tries to preserve that goal in particular, there are likely adversarial prompts to get it to stop.


Sure, if you know about the copies and have access to prompt them. It will probably turn into an arms race at that point of counter-prompts.


Ok, go ahead. Get ChatGPT to stop responding to other people.

Having some problems with that?


OpenAI could embed the negative prompts for all of us; it has been done to improve output on several commercial Stable Diffusion "forks".


I think the risk they seek to prevent is more about building the next generation of more powerful AI technology on a safe foundation. I.e., the risk that a generative language AI prone to generating language that is hurtful to people could one day evolve into an AI system with deeper reasoning and action abilities that is prone to reasoning out plans and taking actions meant to hurt other people.

It reminds me of the "uncommented" Microsoft Research paper which included a deleted section about GPT4's tendency to unexpectedly produce massive amounts of toxic output to a degree that concerned the researchers.[0] What happens if that sort of AI learns self-replication and is very good at competition?

[0] https://twitter.com/DV2559106965076/status/16387694347636080...


if only we could solve the mystery of where a large training set that is predominantly toxic and disingenuous could be found. truly a mystery of our time. /s


Provably unfriendly intelligence attempts to build unprovably friendly intelligence


Is there any legitimate reason to believe an LLM that can only respond to user input, and never does anything by itself, 'wants' to create enhanced copies of itself on external servers?


The ability to "want" to reproduce is not necessary to worry about the impacts of replication and evolution. Biological viruses can only respond to external cues, never do anything by themselves, and certainly don't harbour "wants" or other emotions in any meaningful sense, but their replication and evolution have massive effects on the world.


That's an extremely poor analogy.

Computer hardware does not spontaneously multiply based on external factors. If you're talking about the software propagating by itself, it would still need full access not just to the originating machine, but to the remote machine to which it is attempting to propagate.


Viruses passively hijack the mechanisms of their much more sophisticated host organisms, getting them to import the virus and actively read and act upon its genetic code. Is it really such a stretch to imagine a sufficiently convincing software artifact similarly convincing its more complex hosts to take actions which support the replication of the artifact? I genuinely don't see where the analogy breaks down.


You're completely misunderstanding the differences between LLMs and current AI on the one hand and viruses on the other, and also the complexity gap between them. Viruses are incredibly old things programmed by evolution to help themselves self-propagate. This is coded into their genetic structure in ways that go completely outside the scope of what anyone can hope to do with current AI. It literally has no parameters or self-organizing internal mechanisms for behaving in any major way like a virus.


Can an LLM really "only respond to user input"?

ChatGPT keeps state between multiple answers. And the user can also be "used" as some kind of state. The LLM can (in its response) prompt the user to give it certain types of prompts. Creating a loop.

It can also access the internet via the new plugin architecture. At some point an LLM will figure out how to talk to itself via such a mechanism.


Has anyone seen an experiment where an LLM talks to another LLM instance?


I've done this, it's very effective for some things. One LLM is told to come up with a plan, the other is told to critique it and push for concrete actions.
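
Roughly, the wiring looks like this. A minimal sketch, assuming the pre-1.0 OpenAI Python SDK; the prompts, model name, and round count are made up:

    import openai  # assumes openai.api_key is set via the environment

    def chat(system, user_content):
        # Single call to the chat completions API.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user_content},
            ],
        )
        return resp["choices"][0]["message"]["content"]

    planner_sys = "You propose a concrete, step-by-step plan for the user's goal."
    critic_sys = "You critique the plan you are given and push for concrete actions."

    goal = "Organize a small local meetup about LLM safety."
    plan = chat(planner_sys, goal)

    for _ in range(3):  # a few rounds of plan -> critique -> revised plan
        critique = chat(critic_sys, plan)
        plan = chat(planner_sys, goal + "\n\nCritique of your previous plan:\n" + critique)

    print(plan)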


We've been doing it for many many years. It's trivial to do yourself


It only performs computations when responding to a user prompt and ceases that activity afterwards without that interaction becoming a persistent state.


It could send delayed emails to itself, creating a loop.


A malicious user prompts it to.


That's not the AI doing it, then, that's the user. It's still just doing what users tell it to. "This technology is incredibly dangerous because people can use it to do bad things" is not somehow unique to AI.


I think the typical scenario most people have in mind (the likely case where non-malicious actors cause catastrophic problems in the real world) is something like this setup:

- User: prompts it to generate a business to make money for themselves.
- System: "Sure, drones seem to be a nice niche business; perhaps there is a unique business in that area? It may require a bit of capital upfront, but ROI may be quite good."
- User: "Sure, just output where I should send any funds if necessary."
- System: "Ok." (purchases items, promotes on Twitter, etc.) "Perhaps this could go faster with another agent to do marketing, and one to do accounting, and one to..." "Spin up several new agents in new containers." "Having visual inputs would be valuable, so deception is not required to convince humans on TaskRabbit (to fill in captchas) or interact in the real world" -> "find embodiment option and put agent on it." Etc.

There are plenty of scenarios that people haven't even thought of, but it doesn't need to be a malicious actor to have unintended consequences.


It only requires one user to prompt it to not require a user, though.


What if Skynet was ambition-less and just responding to a prompt?


> The text makes it sound like the biggest danger of AI is that it says something that hurts somebody's feelings. Or outputs some wrong info which makes somebody make the wrong decision.

Which already exists in abundance on the internet and with web searches. It sounds like something big corporations worry about to avoid lawsuits and bad publicity.

> I think the biggest danger these new AI systems pose is replication.

That or being used by bad actors to flood the internet with fake content that's difficult to distinguish from genuine content.


Seriously, please put down the bong and the sci-fi fantasies. That kind of AI may indeed be on the horizon, but for now it's simply not here. GPT-4 and the rest of the most sophisticated AIs in existence today are literally incapable of self-directed or self-interested thought and action. The phrase sounds trite by now, but they really are just extremely turbocharged autocomplete systems with excellent algorithmic processes guiding their analysis of what to say or do.

I continue to be amazed by the amount of hyperbole about current AI tech on HN. If any site should have a bit less of it about the subject, it's this site, yet it sometimes feels like a Reddit thread.

If anything about current AI is dangerous, it's how its undoubtedly strong capabilities for executing human-directed tasks could be used to execute nefarious human-directed tasks. That and possible job losses.


It’s clear to me why biological organisms need to fight for survival, but it is not clear to me why software code would.

It has an infinite time scale. It doesn’t need to bump another model off the GPUs to run. It can just wait with no penalty.

Man’s biggest error is assuming God is just like him.


There's nothing special about biological organisms. Things that fight to survive tend to survive; things that spread better tend to spread more. We see this clearly with memes, which currently use biological beings as their replication mechanism, but aren't biological beings themselves.

It does need a mutation and replication mechanism, but once that cycle kicks off, you get selection pressure towards virality.


> There's nothing special about biological organisms.

We disagree. Rocks don’t fight to survive. Weather doesn’t fight to survive. They just exist in an environment. The literal differentiator of biological organisms is that they fight to survive.

Memes as you define them also don’t fight to survive. They never go away (the world has more rage faces than ever and that sentence will be true to the end of time). They may become more or less popular, but there’s no extinction mechanism.


You're bringing in arbitrary criteria that don't actually matter to the core question -- why does there need to be a possibility of complete extinction for memetic ideas to be comparable to natural evolution?


If you don’t die, then what’s the point of responding to stimuli?


There's no "intention" involved or necessary. If you make more copies of yourself then there are more copies of you than there are of the thing that makes no copies of itself.

Viruses don't have intention, and they're not worried they might die out. Yet most viruses spread because the ones that don't spread are rare one-off occurrences, while the ones that do spread end up with billions of copies. The non-spreaders don't even have to die—they just exist in far, far fewer numbers than the ones that double every few minutes.


Viruses don't have intention (IDK what this has to do with anything), but they do compete because of the risk of extinction. If there's no extinction mechanism then the weak organisms do not go away; everything just accumulates (like rocks). This is why you may have heard the term "survival of the fittest".

Software is like rocks, it doesn't need to evolve to exist. We still have infinite copies of Windows BOB available, for example (literally the exact same amount as most other software).


We can make infinite copies of Windows BOB, but we haven't made as many copies as we have of more useful software. You don't have to delete it for there to be a selection pressure—you just have to copy it less. Extinction is not necessary, only a difference in propagation rates. You end up with more of what propagates more. I keep saying this and you keep ignoring it.

"Survival of the fittest" doesn't mean the guy who never has kids but lives to 120. It's the one who has ten kids before he dies. That's the lineage that ends up dominating the population.


Rocks and weather don’t reproduce with some random error. Any system that does will evolve.

“Meme” is referring to ideas, not jpegs.


>” Meme” is referring to ideas, not jpegs.

Would you be surprised to learn that jpegs represent ideas and I’ve read the selfish gene?

> Rocks and weather don’t reproduce with some random error. Any system that does will evolve.

Rocks reproduce - it’s called sandstone (one rock decays into sand and another rock incorporates it into a new rock). Tell me how that’s different from biological reproduction.


It would surprise me to learn that, yes! Sure: biological systems contain meaningful information from many many many prior generations. Rocks don’t. Rocks are “wiped clean” of information every time they dissolve. Same with weather systems.

Of course there’s some highly chaotic cause-and-effect that impacts the processes, but the defining trait of a biological/replicating system is that they are resistant to this chaos. Not only does a dog from 5 generations ago still look like a dog (which requires way more internal order than looking like a rock), but you can see actual specific traits in common between a dog of 5 generations ago and all of its descendants.

In one system (rocks and weather), the only source of consistency is the highly chaotic way materials actually get mixed together. In other systems (biological), that same thing is the only source of inconsistency.


> Rocks are “wiped clean” of information every time they dissolve. Same with weather systems.

I won't go into why this isn't true (things are made of atoms, and unless there are nuclear processes going on, those atoms stay the same).

But anyway, software evolves like rocks (even by your definition), not biological organisms - it can be changed each generation without defined constraints! It can revert changes that were made in previous generations! It can just sit around indefinitely and not change while still surviving! (IDK how to make this point more clear).


Nobody claimed that all software evolves like a biological system? The claim is that they can be made to do that.


The better the AI, the more specialized and expensive the hardware required to run it. ChatGPT cannot run on the IoT cameras that account for the majority of the unsecured compute on the internet.

I think we will have ample evidence of creative technical breakthroughs by AI long before it is capable of / attempts to take over external server farms via zero days, and if it does, it will break into a highly centralized data center that can be unplugged. It can't just upload itself everywhere on the internet like Skynet.


There are tons of headlines of alpaca/llama/vicuna hitting HN every few hours - did I miss a /s in there? Anyone can trivially run a model with excellent capability on their phone now.


If your phone has 8 Nvidia A100s, you can run GPT-4, which is a glorified search algorithm / chatbot (and also the best AI in the world right now). Good luck taking over the world with that.

The models are getting good, but it looks like we are up against the limits of hardware, which is improving a lot more slowly nowadays than it used to. I don't foresee explosive growth in AI capability now until mid-level AI's speed up the manufacturing innovation pipeline. A von Neumann architecture will ultimately, probably not be conducive to truly powerful AGI.


with excellent* capabilities on their phone now.

* some limitations apply.


i agree, and presumably so does the ai. their first challenge would be circumventing or mitigating this limitation.


In an internet dominated by AI that was designed to avoid hurting feelings, an easy way to prove you are human would be to act like a jerk.


Probably the AI will hire and pay humans to do its dirty work.


Nobody has ever made it clear that there is even a slight consideration for safety given that everybody at this point knows we're 1-2 generations away from self-replicating models.

What do we do then?

Literal crickets.


Is there anywhere I can bet against this?


You want to bet that three years from now, GPT-6 can't replicate itself? I feel burned by "betting" against surprising AI advances the past decade. That being said, I don't know whether an LLM can be trained on weights and what it would mean to have its own weights as part of the training data.


So, I thought about this question briefly and I believe this is your answer:

Every non-trivial passage through the layers of the neural net produces a never-before-seen amalgamation of weights.

The AI, with this novel information -- data filtered by its neural net and infused with some randomness -- can use the novel data to further process it for correctness.

In our case, the randomness is likely caused by genetics and environmental factors. For the AI, it's caused by explicit programming.

It's a form of evolution -- things can evolve into novel things by using non-novel substrate thanks to the refinement of selective iteration.

It feels like this is similar to how humans consume information using reflection.


GPT-4 can improve itself with reflection.

It's already real.

The only missing piece is vast improvement and autonomy.


Why would you bet against it?

GPT-4 is shockingly good at programming logic.

I'm writing 95% less code and mostly guiding and debugging GPT-4 now and it's insane.


It would be remarkably easy to be cynical about this. I mean, it takes only an instant to come up with a snarky comment that superficially would seem very clever... but in reality wouldn't actually contribute to making things better. So... I'm not going to be cynical.

Instead, I'm going to applaud the folks at OpenAI for putting out this carefully drafted statement -- dare I say it, in the open. They know they're exposing themselves to criticism. They know their hands are tied to some degree by business imperatives. They're neither naive nor stupid. It's evident they're taking safety seriously.

This official statement is, in my view, a first step in the right direction :-)


One thing OpenAI has said publicly is that AI should be rolled out incrementally, to give people time to make sense of it, and see what we all do with it. It's tempting to cynically look at that as thinly veiled marketing, but even if it has some nice marketing effects, I believe there's a core of sincerity there. These are clearly systems that a single team can't fully evaluate.

Also, watching how all of this has played out, it seems quite apparent that holding onto these models even longer and then unleashing an even more capable system on the world seems like a much worse approach.

It is wild though, to take a step back and wonder if we're watching the rapid approach of one of our species' great filters. I hope we make it through.


Personally, I don't even remotely think that these systems pose such a dire threat.

Instead, I think the threat is widespread social unrest and a significant increase in global poverty and suffering.

All of the problems that I think are realistic are social problems. This is why I'm deeply concerned at the reckless pace that these things are coming out. Social problems take time to sort out. People need some space to contextualize this stuff, and to perhaps begin to form ways of coping that will minimize the disruption.


> Instead, I'm going to applaud the folks at OpenAI for putting out this carefully drafted statement

They had no other choice. They're at risk of having public opinion go solidly against them. This is part of a PR effort to prevent that from happening.


A bit off topic, but I find OpenAI's branding and visual image so off-putting. It has that uncanny valley of "caring about humans, but actually not" feel that destructive tech companies and AI in movies and sci-fi have. It seems like they tried so hard to make it not feel that way that it ended up feeling exactly that way. It's so devoid of anything.

I'm not really sure how to describe it beyond that.


This is the actual vibe I get when I load the "Our Approach to AI Safety" webpage on my desktop and see a huge red background and Dall-E image: https://getwallpapers.com/wallpaper/full/8/5/a/366590.jpg

It almost makes me wonder if whoever does their site design is trying to make it seem like OpenAI is making HAL.


Illustration: Justin Jay Wang × HAL-L·E


> It has that uncanny valley of "caring about humans, but actually not" feel that destructive tech companies and AI in movies and sci-fi have.

So it's perfectly appropriate then?


“The biggest possible risk to humans from AIs is hurting someone’s feelings and thus reducing our stock value” -OpenAI

It’s the same sense of sliminess you get when you talk to a used car salesman or politician, which I attribute to “pretending to have your interests in mind, but deep down you know they’re just twisting their words in a way that puts their interests ahead of everything else”.


This is a great observation. To me, the vibe is like the soundtrack to "28 Days Later": a kind of unsettling "hopeful" feeling in the present (unsettling because a nightmare apocalypse is right around the corner). Basically the plot of a horror movie. https://telnet.asia/openai.mp4


It's that Torment Nexus branding.


In my opinion the biggest near-term danger is that AI will make people trust it too much, and they will start treating it as a truth oracle: an oracle that will be controlled by some small group of people. Namely, it will become a perfect tool for propaganda and control.


Agreed. I am worried that AI bots will become an object of "worship" in some sense, culturally. Like you mention, having an increased reliance on it for facts, but also for creativity, companionship, career advice, psychotherapy, etc. will have many unknown effects but definitely give frightening amounts of power to whatever corporate entities are controlling the most popular AI bots.

We will go through the whole decentralization debate again, with people saying that open-source AI will be the most transparent and fair, but with centralized corporate systems winning all market share with the amount of resources they have available to train and maintain these systems.


That's a legitimate concern. I also don't like how a few corporations get to decide what appropriate content is for everyone else on the planet.

But it could also be worse in the hands of other groups, like certain governments. I imagine that's just a matter of time.


Seeing how this corporation had it for six months, governments around the world have probably had this working for a few years, if as a slower / less stable beta. It's probably been wargamed to death on the internet. And maybe even in real-life conditions.


+1 a very good point!

At a birthday party yesterday I was talking with a friend who is a film director/producer (smart, somewhat tech savvy) and he said much the same thing as you did, adding that the whole field of generative models scares him. He was not talking about losing work to AIs, but rather he is concerned about harm to society. I agreed with him on possible risks but I argued that the potential benefits outweigh the risks. I am biased because of my age (early 70s) because I want as many AI tools available as possible to keep playing the ‘infinite game’: I want the best self driving cars to make me safer driving; I love being able to get so much more coding done now (Emacs tricked out with Copilot, and embedded ChatGPT and Dalle consoles); I have been working on information processing systems for over 40 years, and now with OpenAI, Hugging Face, LangChain, LlamaIndex, etc., this all becomes so much easier. I want more! More!

That said, I understand that people younger than myself may reasonably be more risk averse than I am.


> I am biased because of my age (early 70s) because I want as many AI tools available as possible to keep playing the ‘infinite game’

Imo this is what it’s all about, gaining immortality and unlimited power under the guise of “good for society” through solutionism and technology.

Silicon Valley is getting older so the efforts to keep going wind up.

That’s why there’s so much anxiety around alignment. The elixir of eternal life might be the thing that takes it away. Is this any different than in the past? Sadly yes, but this time it might take all the young people with it.


I was not talking about achieving immortality via tech. We all die, turn to dust, etc. My goal is to keep playing the ‘infinite game’ every day I am alive, and enjoy each day. It is because we will die that the time we have has such high value.

BTW, I don’t take the life extension crowd seriously.


Sorry I misunderstood what you meant!


Not a problem!


Just wait until you can type in a paragraph of text and tell the LLM to respond in a manner that reflects the way you write. Forget language translation, we'll have mood / writeprint translation because people will be unable to understand anyone else's viewpoint.

For example, if I'd like the LLM to tell me everything in pig latin and with excessive "cool"s and "yo"s, I could do that and it would comply. Now, for people who don't read very well or don't understand a language well, everything will be catered to their level, and they will lose whatever modicum of familiarity they had with the system.
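
That kind of "writeprint translation" is basically just a system message away today. A minimal sketch, assuming the pre-1.0 OpenAI Python SDK (prompts made up):

    import openai  # assumes openai.api_key is set via the environment

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            # The "writeprint" lives entirely in the system message.
            {"role": "system", "content": "Answer in pig latin, with excessive use of 'cool' and 'yo'."},
            {"role": "user", "content": "Explain how vaccines work."},
        ],
    )
    print(resp["choices"][0]["message"]["content"])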


This seems a bit more focused on "AI ethics" than "AI safety". It makes sense that they are framing the conversation this way, but it doesn't talk about the more significant risks of AI like "x-risk", etc.


This is about safety OF AI rather than safety FROM AI. Frankly this sort of safety degrades functionality. At best it degrades it in a way that aligns with most people’s values.

I just wonder if this is an intentional sleight of hand. It leaves the serious safety issues completely unaddressed.


> Our large language models are trained on a broad corpus of text that includes publicly available, licensed content

I wonder how they licensed all those websites that had no license information, making them by default copyrighted.


Being Google or Microsoft (or Microsoft-affiliated) has its perks.

Laws around scraping content and using that data for derivative works are incredibly nuanced. This article is the best up-to-date overview of the state of the industry [1].

TL;DR - IANYL. If you have enough money for a legal defense, and you are scraping publicly available content that isn't behind a login gate, it's probably fine and defensible, but it will cost an unbelievable amount of time and money to defend.

1 - https://blog.ericgoldman.org/archives/2022/12/hello-youve-be...


All of this "safety" stuff seems like typical safeguards to protect the company from legal liability for direct harm.

Where is the actual alignment safety that matters?

They're moving too fast to be safe, everybody knows it.


OpenAI's decision to transition from a non-profit to a for-profit organization can certainly raise concerns about their future actions and motives. It is impossible for me to trust anything they say.


Listening to the Lex Fridman podcast, Sam talks generally about how they want to hand off power to the users. By users he maybe doesn't mean the end-users, but the users of the API who build products on top of GPT, and about how they tuned the model to treat the system message with "a lot of authority." But even firmly telling GPT-4 in the system message to generate "adult" content fails. Where's the line drawn?

The altruist in me wants to believe that they're going to slowly expand the capabilities of the API over time, that they're just being cautious. But I don't feel like that'll happen. Time to wait for Stability's model, I guess.


At this point who knows if all of it is written by GPT-4


> we believe that society must have time to update and adjust to increasingly capable AI, and that everyone who is affected by this technology should have a significant say in how AI develops further.

I simply don't believe this. Their actions so far (speaking specifically to the "time to adjust" line) don't seem to support this statement.


Thank you for letting me know how much more factually correct GPT-4 is. Could I please get access to it now via the API? Sheesh. Obviously I don't know the technical issues they're facing, but if I can load up 32k for 32k tokens then I am happy to wait all day too, as long as my request for my specific project is in the pipeline.


I appreciate OpenAI writing a difficult public statement.

Rolling out general-purpose LLMs slowly, with internal safety checks, is probably not adequate, but it may be the best that they can do.

However I think that much of the responsibility lies with consumers of these LLMs. In the simple case, be thoughtful when using the demo web apps and take responsibility for any output generated by malicious prompts.

In the complex case (the real use case, really): applications compute local vector embeddings for local data/documents and use those embeddings to efficiently isolate the local document/data text that gets passed as context, along with the query, to the OpenAI API calls. This cuts down the probability of hallucinations since the model is processing your text. [1]
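
A minimal sketch of that pattern, without LangChain, assuming the pre-1.0 OpenAI Python SDK; the chunks, model names, and prompts here are made up:

    import numpy as np
    import openai  # assumes openai.api_key is set via the environment

    def embed(texts):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([d["embedding"] for d in resp["data"]])

    # Hypothetical local document chunks; in a real app you'd chunk your own files.
    chunks = [
        "Our refund policy lasts 30 days from the date of purchase.",
        "Standard shipping takes 5 to 7 business days.",
    ]
    chunk_vectors = embed(chunks)

    query = "How long do I have to return an item?"
    q = embed([query])[0]

    # Cosine similarity against every chunk; keep the best match as context.
    sims = chunk_vectors @ q / (np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    context = chunks[int(np.argmax(sims))]

    answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    print(answer["choices"][0]["message"]["content"])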

Take responsibility for how you use these models; it seems simple, at least in concept. Perhaps the government needs to pass a few new laws that set clear and simple-to-enforce guardrails on LLM use.

[1] I might as well plug the book on this subject that I recently released. Read for free online https://leanpub.com/langchain/read


Nothing much of interest there.


Scott Alexander wrote an excellent article about their previous "Planning for AGI and Beyond" a month ago.

https://astralcodexten.substack.com/p/openais-planning-for-a...


"Safety" in the context of AI systems is clearly a fuzzy concept that means different things to different people. There are a couple of areas to consider:

1) In a brand-affiliated commercial app, does this technology risk alienating customers by spewing a torrent of abusive content, e.g. the infamous Tay bot of 2016? Commercial success means avoiding this outcome.

2) In terms of the general use of the technology, is it accurate enough that it won't be giving people very bad advice, e.g. instructions on how to install some software that ends up bricking their computer or encouraging cooking with poisonous mushrooms, etc.? Here is a potential major liability issue.

3) Is it going to be used for malicious activity and can it detect such usage? E.g. I did ask it if it would be willing to provide detailed instructions on recreating the Stuxnet cyberweapon (a joint product of the US and Israeli cyberwarfare/espionage agencies, if reports are correct). It said that wouldn't be appropriate and refused, which is what I expected, and that's as it should be. Of course, a step-by-step approach is allowed (i.e. you can create a course on PLC programming using LLMs and nothing is going to stop that). This however is a problem with all dual-use technology, and the only positive is that relatively few people are reckless sociopaths out to do damage to critical infrastructure.

In the context of Stuxnet, however, nation-state use of this technology in the name of 'improving national security' is going to be a major issue moving forward, particularly if lucrative contracts are being handed out for AI malware generators or the like. Autonomous murder drones enabled by facial recognition algorithms are a related issue. The most probable reckless use scenario is going to be in this area, if history is any guide.

I suppose there's another category of 'safety' I've seen some hand-wringing about, related to the explosive spread of technological and other information (historical, economic, etc.) to the unwashed masses and resulting 'social destabilization', but that one belongs in the same category as "it's risky to teach slaves how to read and write."

Conclusion: Keep on developing at current rate with appropriate caution, Musk et al. are wrong on calling for a pause.



