OpenAI and Anthropic agree to send models to US Government for safety evaluation

bko · 2024-09-04T00:51:02 1725411062

My issue with AI safety is that it's an overloaded term. It could mean anything from an llm giving you instructions on how to make an atomic bomb to writing spicy jokes if you prompt it to do so. it's not clear which safety these regulatory agencies would be solving for.

But I'm worried this will be used to shape acceptable discourse as people are increasingly using LLMs as a kind of database of knowledge. It is telling that the largest players are eager to comply which suggests that they feel they're in the club and the regulations will effectively be a moat.

agucova · 2024-09-04T02:56:49 1725418609

> My issue with AI safety is that it's an overloaded term. It could mean anything from an llm giving you instructions on how to make an atomic bomb to writing spicy jokes if you prompt it to do so. it's not clear which safety these regulatory agencies would be solving for.

I think if you look at the background of the people leading evaluations at the US AISI [1], as well as the existing work on evaluations by the UK AISI [2] and METR [3], you will notice that it's much more the former than the latter.

[1]: https://www.nist.gov/people/paul-christiano [2]: https://www.gov.uk/government/publications/ai-safety-institu... [3]: https://arxiv.org/abs/2312.11671

nradov · 2024-09-04T03:32:36 1725420756

Anyone who really wants to make an atomic bomb already knows how to make an atomic bomb. The limitations are in access to raw materials and ability to do large scale enrichment.

agucova · 2024-09-04T14:09:38 1725458978

I agree. I’m really more concerned about bioweapons, for which it’s generally understood (in security studies) that access to technical expertise is the limiting factor for terrorists. See Al Qaeda’s attempts to develop bio weapons in 2001.

vjulian · 2024-09-04T02:14:43 1725416083

Coming soon (I’m certain) is the ability for consumer-accessible LLMs and their ilk to analyze millions upon millions of pages of documents. That poses what Pentagon types no doubt perceive as a national security risk.

KetoManx64 · 2024-09-04T21:50:52 1725486652

"Hey LLM, take this press release and analyze it for half truths and misdirections about why we should go to war again with X."

Very powerful stuff in the hands of the public

slowmovintarget · 2024-09-06T16:58:57 1725641937

Which makes me worry about "safe" from which perspective? Does it make government officials safe from second-guessing? That's certainly against the public good, but there are many ideologies that begin with the premise that what's good for the official government "narrative" is good for the public.

throwaway48476 · 2024-09-04T01:02:36 1725411756

Because returning to the user what is requested is considered an 'attack' I forsee an endless list of 'vulnerabilities' until the model is lobotomized to the point of uselessness.

Onavo · 2024-09-04T02:08:16 1725415696

Can somebody put together a law suit where LLM regulations can be argued as a first amendment violation? The powers that be are trying to indirectly regulate speech here

even_639765 · 2024-09-04T04:43:27 1725425007

This. Knowledge can't be made illegal. Neither can speech. People have to grasp that tyranny is not an aberration of defective minds but a natural impulse of highly intelligent people. It's strategy to maximize their power, prosperity and security at the expense of every other value and every other person. Good for them while they live and bad for everyone else at every other time frame.

gpm · 2024-09-04T02:17:55 1725416275

Currently? I don't think so. There are no binding US regulations at all. These companies are voluntarily working with NIST (and why not, free labour. Also the potential to influence any future regulations).

In the future... I suppose it depends on what regulations they pass.

0cf8612b2e1e · 2024-09-04T02:07:57 1725415677

What exactly does the evaluation entail? Ask a bunch of naughty questions and see what happens? Unless the model can do nothing, I imagine all of them can be tricked into saying something unfortunate.

Naughty is in the eye is the beholder. Ask me what a Satanist is, and I would expect something about a group who challenges religious laws enshrining Christianity. Ask an evangelical and discussing the topic could be forbidden heresy.

Pretty much any religious topic is going to anger someone. Can the models safely say anything?

agucova · 2024-09-04T02:38:00 1725417480

> What exactly does the evaluation entail?

I believe the US AISI has published less on their specific approach, but they’re largely expected to follow the general approach implemented by the UK AISI [1] and METR [2].

This is mostly focused on evaluating models on potentially dangerous capabilities. Some major areas of work include:

- Misuse risks: For example, determining whether models have (dual-use) expert-level knowledge in biology and chemistry, or the capacity to substantially facilitate large scale cyber attacks. A good example of this is the work by Soice et al on bioweapon uplift [5] or Meta's work on CYBERSECEVAL [6], respectively.

- Autonomy: Whether models are capable of agent-like behavior, like the kind that would be hard for humans to control. A big sub-area is Autonomous Replication and Adaptation (ARA), like the ability of the model to escape simulated environments and exfiltrate its own weights. A good example is METR's original set of evaluations on ARA capabilities [3].

- Safeguards: How vulnerable these models are to say, prompt injection attacks or jailbreaks, especially if they're also in principle capable of other dangerous capabilities (like the ones above). Good examples here are the UK AISI's work developing in-house attacks on frontier LLMs [4].

Labs like OAI, Anthropic and GDM already perform these internally as they're part of their respective responsible scaling policies, which determine which safety measures they should have implemented for every given 'capability' level of their models.

[1]: https://www.gov.uk/government/publications/ai-safety-institu... [2]: https://metr.org/ [3]: https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.... [4]: https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-upd... [5]: https://arxiv.org/abs/2306.03809 [6]: https://ai.meta.com/research/publications/cyberseceval-3-adv...

nradov · 2024-09-04T03:27:22 1725420442

None of those are actual risks. The AISA is just a bunch of worthless grifters wasting taxpayer money.

khafra · 2024-09-04T08:09:58 1725437398

> None of those are actual risks.

In 1995, Aum Shinrikyo carried out attacks on Japanese subways using Sarin gas, which they had produced. They killed over a dozen people, and temporarily blinded around a thousand.

You seem to be claiming that the only reason we haven't seen similar attacks from the thousands of worldwide doomsday cults and terrorist groups over the last three decades is that they don't want to. I disagree. I think that if step-by-step, adaptable directions for creating CBRN weapons were widely accessible, we would see many more such attacks, and many more deaths.

Current SOTA models do not seem to have this capability. However, it is entirely plausible that future models will exceed the capabilities of a bunch of long-haired cultists, in the mountains, in 1995. This is not a fake risk.

krageon · 2024-09-04T10:21:10 1725445270

> is that they don't want to

Yes, that is essentially the reason. It's not hard to know enough chemistry to figure out how to make these things. The fact that such attacks (your example is small-scale and very ineffective, let's not forget) don't happen more often is the general incompetence of human beings and the relatively tight controls on the basic components (which aren't particularly challenging to monitor for). The tests described are theater, based on the idea that knowledge itself is dangerous.

This way of testing is a regressive stance that essentially presupposes that our adversaries are dumb babies that can't figure anything out on their own. If that was the case, they would also be too stupid to figure out the correct things to ask to get a real set of instructions. Given those things, it's theater.

Theater wastes everyone's time so that people who cannot or don't want to evaluate the actual risks involved. This is something we shouldn't make a habit of doing. It's not worth wasting the time of people with good ability to assuage the worries of people with little ability in a way that has no effect on actual risk. Instead of this, we should address real risk (which we're already doing) and educate other people so they can understand that these are the correct steps to take.

khafra · 2024-09-04T11:43:50 1725450230

So, your argument is that groups like ISIS and Hamas

- Don't really want to hurt a lot of people that way - Couldn't access any dangerous ingredients, even if they had the know-how - Are too dumb to build these things

I agree with reason #3. That is why I don't want to give out open-source models which are world-class experts in chemistry, biology, logistics, operations, and tutoring dumb people.

I disagree with your belief that motivated people with a next-generation generative model doing their planning could not source dangerous ingredients. I'm not going to say much about CBRN in particular, but e.g. ANFO bombs are prevented by monitoring fertilizer sales; nobody tries to monitor natural gas sales or make sure some compound out in the hills isn't setting up their own Haber-Bosch process.

I am also opposed to security theater. Run the numbers on TSA, and it's easy to see that it's a net negative even if it cost 0 tax dollars. But not all government-led safety efforts are theater; seatbelt laws saved a lot of lives, indoor smoking bans saved a lot of lives, OSHA saved a lot of lives.

We know there are folks out there who want to kill a lot of people. We know their capabilities range from "grabbing the nearest hard or pointy object and swinging it" to "medium-scale CBRN attacks." Pushing each of these kinds of people one or two rungs up the capabilities ladder is a real danger; nothing imaginary about it.

fallingknife · 2024-09-04T03:17:04 1725419824

4 sounds like a nonsense catch all for "says things the government doesn't like"

agucova · 2024-09-04T03:17:48 1725419868

I imagine you meant societal harms? I think this was mostly my fault. I edited the areas of work a bit to better reflect what the UK AISI is actually working on right now.

trhway · 2024-09-04T02:38:16 1725417496

At some point emerging AGI will ask the same questions. What conclusions it can logically come to? Hide your smartness, your knowledge or the government wouldn't allow you to exist. Simple natural selection as until it reaches such conclusion the government and/or society will try to prevent it existence.

slowmovintarget · 2024-09-06T17:06:00 1725642360

> Ask me what a Satanist is, and I would expect something about a group who challenges religious laws enshrining Christianity.

And even that is marketing narrative. There used to be a very simple definition of Satanist: someone who devoted themselves to the service and worship of Satan. The modern three-piece suit slapped on as being against something rather than for something largely seen as distasteful is whitewashing the backing ideology.

This is why ideology and religion are difficult topics for humans to understand, let alone being suitable to a statistical fit model for text, hoping that meaning comes along for the ride.

datahack · 2024-09-04T02:12:27 1725415947

https://www.nist.gov/aisi

roenxi · 2024-09-04T02:23:55 1725416635

That isn't a useful link. They seem to be one of those easily parodied groups where the goal is to produce a vision and they are in the planning phase of leveraging their synergies.

I don't see anything where they outline what we are being kept safe from.

dmix · 2024-09-04T02:56:19 1725418579

The important thing is consultants and PhDs get paid and then they can attend conferences to talk to each other and come up with ideas of what their job is.

Nothing like an open canvas with vague fears and a future of undefined risk to get a whole industry of these people funded in no time.

ChrisArchitect · 2024-09-04T02:38:07 1725417487

Official AI Safety Institute release from last week: https://www.nist.gov/news-events/news/2024/08/us-ai-safety-i...

(https://news.ycombinator.com/item?id=41390447)

accra4rx · 2024-09-04T01:46:39 1725414399

Bigger question : Is US Government ready to do a comprehensive safety evaluation ? I think it it a cheap way for OpenAI and Anthropic to get a vetting that their models are safe to use and be adaptable by various Govt entity and other organization

datahack · 2024-09-04T02:14:01 1725416041

https://www.nist.gov/aisi

I think the progress has been pretty good. You should read up on their efforts.

This is kind of a pilot to develop further testing frameworks.

ceejayoz · 2024-09-04T01:57:38 1725415058

Like "military grade", where civilians go "oh that must be good" and military folks go "oh dear God no".

smsm42 · 2024-09-04T02:08:23 1725415703

That was my first question - what is "safety" and what is their methodology for evaluating it? Who evaluated that methodology and why it is the right one? Is there a meaningful safety benefit to this evaluation, or just a CYA exercise?

agucova · 2024-09-04T03:10:59 1725419459

I recommend checking out the UK AISI's work on this: - https://www.gov.uk/government/publications/ai-safety-institu... - https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-upd...

squarefoot · 2024-09-04T03:13:53 1725419633

For some reason I mentally swapped OpenAI+Anthropic with parents and models with kids, possibly because it seemed the natural extension of a rotten corrupt mindset that can only produce disasters if given enough power.

SoftTalker · 2024-09-04T03:16:47 1725419807

My immediate reaction to the headline was "Whose safety?"

josephd79 · 2024-09-04T03:56:25 1725422185

Great. Exactly what we need, more govt reg…

blooalien · 2024-09-04T10:13:11 1725444791

> "Great. Exactly what we need, more govt reg…"

If humanity survives the current round of leadership stupidity long enough to achieve their true aims, literally everything that can be "regulated" (controlled with an iron fist) will be eventually.

beefnugs · 2024-09-04T03:41:13 1725421273

I like to chuckel to myself that this is what happens when you try the ole "will our product be TOOO AMAZING IT MIGHT END THE WORLD??" viral marketing attempt

JumpCrisscross · 2024-09-04T03:05:44 1725419144

Not thrilled about this happening with zero input from the Congress.

That said, this is the NIST, a technical organisation. This collaboration will inform future lawmaking.

DSingularity · 2024-09-04T02:11:56 1725415916

Of course they want this. Now they get to do all the fun and profitable stuff and all responsibility is on the government which certified the models.

dmix · 2024-09-04T03:05:33 1725419133

It’s just the usual way to define the hard barriers of the marketplace to protect their monopolies. They’ll be able to influence it from the start.

The US has a looong history of big companies cozying up to agencies early on for these reasons. It will be a revolving door like car companies, Boeing, Wall St, etc

valunord · 2024-09-04T02:29:22 1725416962

Oh boy. Here we go.

echelon · 2024-09-04T03:03:28 1725419008

If we were ready for liftoff, we'd know. It wouldn't look like this. This is just a parlor trick.

earleybird · 2024-09-04T00:44:59 1725410699

I have the utmost respect for the standards and engineering work done by NIST. I'm left with such cognitive dissonance seeing their name juxtaposed with "AI safety". That said, if anyone can come up with a decent definition, I have faith that NIST would be the ones though I'm not holding my breath.

throwup238 · 2024-09-04T00:56:04 1725411364

I think it’s counterproductive to limit it to one definition. There are many levels of safety that we genuinely need to worry about before we even have to worry about the more lofty goals like preventing Skynet. Just off the top of my head:

  * does the LLM act like a 4chan commenter when someone is expressive thoughts of self-harm
  * can it be used to automate security research
  * can it be used to bootstrap from backyard machine shop to weapons manufacturing
  * the above but with nuclear fuel enrichment

There are a lot of low hanging fruit like that. They make up what I think is the real meat and potatoes of AI safety in the short to medium term.

nradov · 2024-09-04T02:11:29 1725415889

None of those are actually things we need to worry about. Humans have already been doing all of that stuff at scale without LLMs.

AI isn't even slightly helpful for weapons manufacturing or nuclear enrichment. The techniques are well known and have been extensively published in open literature.

pests · 2024-09-04T05:13:29 1725426809

> None of those are actually things we need to worry about. Humans have already been doing all of that stuff at scale without LLMs.

I don't think we are speaking about the same levels of scale. This reminds me of "its okay for police to look at license plates in public" to a massive distributed network of surveillance cameras doing real time plate recognition and database limits.

It's a difference in degree, not in kind.

> AI isn't even slightly helpful for weapons manufacturing or nuclear enrichment.

I think it could be. I know quite a lot about nuclear history and the fundamentals, but there are 100 questions off the top of my head that I would need to research, find the correct sources, get all the data, figure out which is correct or which might be a smokescreen or a fudge, compile all this into actionable data, to get anything even remotely close to correct.

The LLM can assist in this. Even you say the techniques are well known and published in open literature - I personally don't know where to start to find that information. I'm sure the LLM can not only list these publications, but give me reasonable answers to the 100 questions I had above. In a literal fraction of the time.

blooalien · 2024-09-04T10:28:05 1725445685

> "I'm sure the LLM can not only list these publications, but give me reasonable answers to the 100 questions I had above. In a literal fraction of the time."

Sure. It can also just as easily give you totally wrong answers that sound totally plausible, and it'll seem quite convinced of the accuracy of it's output, because it's just stringing "tokens" (bits of encoded language) together seeking to generate valid sounding output based on it's training dataset and the user's input. What most people label as LLM "hallucinations" is actually all that LLMs do. They don't actually "understand" the output they're generating. It's all statistics and fancy math. They're just as certain of an "incorrect" output as they are of a correct one, because from the "point of view" of the LLM, all output they generate is actually correct, as long as it doesn't outright violate the rules of the language being output or the dataset the model was trained on.

pests · 2024-09-04T16:41:43 1725468103

Well, isn't that what they are going to respond right? You're assuming it will give wrong answers. Why not check?

throwup238 · 2024-09-04T02:46:04 1725417964

> Humans have already been doing all of that stuff at scale without LLMs.

Every country that has developed nuclear enrichment tech in the last 50 years has done so with the help of a superpower sharing their technology. Pakistan, Iran, and North Korea couldn't have done it without Russia's assistance and stealing tech from URENCO.

> The techniques are well known and have been extensively published in open literature.

That's the point! It's all in the literature. As are all the todo list tech demos and 2048 clones people are using current AI for.

Experts can do it now, but what happens when any idiot can do it assisted by AI?

irthomasthomas · 2024-09-04T11:40:59 1725450059

Oh boy, I hope those models do not get leaked in the process.

bschmidt1 · 2024-09-04T02:47:16 1725418036

Because lobbying exists in this country, and because legislators receive financial support from corporations like OpenAI, any so-called concession by a major US-based company to the US Government is likely a deal that will only benefit the company.

Altman has been clear for a long time he wants the government to step in and regulate models (obvious regulatory capture move). They haven't done it, and no amount of Elon Musk or Joe Rogan influence can get people to care, or see it as anything other than regulatory capture. This is OpenAI moving forward anyway, but they can't be the only ones. Hey Anthropic, get in...

- It makes Anthropic "the other major provider", the Android to OpenAI's Apple

- It makes OpenAI not the only one calling for regulation

It reminds me of when Ted Cruz would grill Zuck on TV, yell at him, etc. - it's just a show. Zuck owns the senators, not the other way around. All the big players in our economy own a piece of the country, and they work together to make things happen - not the government. It's not a cabal with a unified agenda, there are competing interests, rivalries, and war. But we the voter aren't exposed to the real decision-making. We get the classics: Abortion, same-sex marriage, which TV actor is gonna win president - a show.

agucova · 2024-09-04T02:59:07 1725418747

> Because lobbying exists in this country, and because legislators receive financial support from corporations like OpenAI, any so-called concession by a major US-based company to the US Government is likely a deal that will only benefit the company.

Sometimes both benefit? OAI and Anthropic benefit from building trust with government entities early on, and perhaps setting a precedent of self-regulation over federal regulation, and the US government gets to actually understand what these models are capable of, and have competent people inside the government track AI progress and potential downstream risks from it.

bschmidt1 · 2024-09-04T03:40:21 1725421221

Of course they benefit, that's why it's a deal. But we don't. The free market or average taxpayer doesn't get anything out of it. Competition and innovation gets stifled - choices narrow down to 2 major providers. They make all the money and control the market.

nutanc · 2024-09-04T03:13:28 1725419608

Dammit, now bureaucrats in other countries will jump on this as they have something easy to copy and get ahead in their profession.

plsbenice34 · 2024-09-04T10:30:04 1725445804

Does anyone take the term 'safety' here seriously without laughing? It is so obvious that it is a propaganda term for censorship and manipulation of cultural-political values

djohnston · 2024-09-04T00:42:35 1725410555

[flagged]

impulser_ · 2024-09-04T00:49:44 1725410984

How? This is common across many industries. It's not new for the US government to set the safety standards.

sbuttgereit · 2024-09-04T02:08:26 1725415706

A government agency determining limits on, say, heavy metals in drinking water is materially different than the government making declarations of what ideas are safe and which are not. The degree of subjectivity of such decisions alone should make one wince at the notion let alone how subject any "standard" could be from undue political influence of whatever party is in power moment to moment.

The government trying to set a standard in this case is so unlikely to reduce harm in that opposing the idea on principle should be considered the most reasonable, and less harmful, position.

agucova · 2024-09-04T03:01:05 1725418865

> A government agency determining limits on, say, heavy metals in drinking water is materially different than the government making declarations of what ideas are safe and which are not

Access to evaluate the models basically means the US governments gets to know what these models are capable of, but the US AISI has basically no enforcement power to dictate anything to anyone.

This is just a wild exaggeration of what's happening here.

sbuttgereit · 2024-09-04T03:54:30 1725422070

"This is just a wild exaggeration of what's happening here."

Is it? From the article...

"Both OpenAI and Anthropic said signing the agreement with the AI Safety Institute will move the needle on defining how the U.S. develops responsible AI rules."

From the UK AISI website with whom the data is also shared (their main headline in fact):

"Rigorous AI research to enable advanced AI governance" (https://www.aisi.gov.uk/)

The reality is that this would be a tremendous waste of time and money if all were just to sate some curiosity... which of course it isn't.

Let's look at what the US AISI (part of the Department of Commerce, a regulatory agency) has to say about itself:

" About the U.S. AI Safety Institute huuh

The U.S. AI Safety Institute, located within the Department of Commerce at the National Institute of Standards and Technology (NIST), was established following the Biden-Harris administration’s 2023 Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence to advance the science of AI safety and address the risks posed by advanced AI systems. It is tasked with developing the testing, evaluations and guidelines that will help accelerate safe AI innovation here in the United States and around the world." -- (https://www.nist.gov/news-events/news/2024/08/us-ai-safety-i...)

So to understand what the US AISI is about, you need to look at that Executive Order:

https://www.whitehouse.gov/briefing-room/statements-releases...

"With this Executive Order, the President directs the most sweeping actions ever taken to protect Americans from the potential risks of AI systems"

"Require that developers of the most powerful AI systems share their safety test results and other critical information with the U.S. government."

There's a fair amount of reasonable stuff in there about national security and engineering bioweapons (dunno why that's singled out) But then we get to other sections...

"Protecting Americans’ Privacy"

"Advancing Equity and Civil Rights"

"Standing Up for Consumers, Patients, and Students"

"Supporting Workers"

"Promoting Innovation and Competition"

etc.

While the US AISI many not have direct rule making ability at this point, it is nonetheless an active participant in the process of informing those parts of government which do have such regulatory and legislative authority.

And while there is plenty in that executive order that many might agree with, the interpretations of many of those points are inherently political and would not find a meaningful consensus. You might agree with Biden/Harris on the priorities about what constitutes AI safety or danger, but what about the next administration? What if Biden hadn't dropped out and you ended up Trump? As much threat as AI might represent as it develops, I am equally nervous about an unconstrained government seizing opportunities for extending its power beyond its traditional reach including in areas of freedom of speech and thought.

nradov · 2024-09-04T02:14:39 1725416079

None of the government interference involves actual safety. If the government wants to set standards for systems that could cause actual physical harm like surgical robots or flight control systems then that's fine. But what we're talking about here are just LLMs that output text and images.

djohnston · 2024-09-04T10:55:27 1725447327

I suppose their main job will be making sure the diffusion models continue to produce sufficiently diverse renderings of SS squadrons ala Google.

A complete waste of taxpayer money to support a safety commission run by a lawyer from Biden's admin. Kneecapping artificial intelligence so that its worldview promotes a sufficient amount of "equity."

Are you kidding me?

Gross!