- "For example, we train Claude to refuse to respond to user queries involving the production of biological or chemical weapons."
But seriously: what's the point? Any information Claude can offer about, e.g., the synthesis of sarin[0] is public information, which Anthropic scraped from any number of public websites, search engines, libraries, books, and research periodicals.
This is a novel cultural norm, so it should be interrogated: why should it become normal, now, to censor college chemistry questions? Why is this the normative "this is how we must do things" stance in elite California tech circles? Google doesn't refuse chemistry queries; are they in the wrong? (Should search engines agree to start censoring themselves to align with LLM censorship conventions?) Is Wikipedia also in the wrong for hosting unsafe, harmful chemistry knowledge? What about SciHub? What about the countless independent websites storing this (elementary, 1930s-era) harmful technical information—should we start doing DNS blocks, should we start seizing web servers, and how are we to harmonize internet safety policy in a consistent way?
Because if your position is "we need to scrub Harmful Responses from the internet", you can't just leave it at LLMs and stop there. You need to have some plan to go all the way, or else you're doing something silly.
(Tangential thought: assigning chemical weapons synthesis problems on exams would be a clever way for chemistry professors, at this moment, to weed out LLM cheaters from their courses.)

[0] https://en.wikipedia.org/wiki/Sarin#Production_and_structure
See my comments above. The reality, I believe, is that this is largely driven by idealistic west coast gen-z and younger millennials who feel certain that their world-view is righteous, to the extent that they feel they are only helping by implementing these tools.
I think, unfortunately, they will learn too late that building censorship and thought-shifting tools into their LLMs will ultimately put them at the mercy of larger forces, and they may not like the results.
I'd like to hear from Anthropic safety folks on whether or not their constitutional approach might be used to implement redirection or "safety stops" on, say, chats where young women in sub-saharan Africa look for advice about avoiding genital mutilation. (https://www.unfpa.org/resources/female-genital-mutilation-fg... for much more on this sad topic).
Government officials and thought leaders in these countries, male and female, are convinced that FGM is right and appropriate. What is, in fact, right, and who decides? This, in my opinion, is going to be the second "bitter lesson" for AI. It's a lesson the Facebooks of the world learned over the last 20 years -- there is absolutely no way to properly 'moderate' the world's content to some global standard of norms. Norms vary hugely. Putting yourself in the position of censoring / redirecting is putting yourself in the position of being a villain, and ultimately harming people.
I'm certain they've thought of this and have decided that the alternative—a firehose of whatever data the AI has in its grasp—is worse than the "censored" version. I'm curious to know what your ideal approach would be.
Open weights and open models, with open tools that allow user-defined alignment and realignment, are, I believe, the only really humanist path forward. We can't choose for people. It's wrong to think we know better than they do what they want. Full stop.
Some of those people will make terrible decisions, some will make objectionable ones, but the alternative is just full thought control, basically. And, sadly, nobody in the "bad" scenario need be anything but super well intentioned (if naive).
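To make that concrete, here's a minimal sketch of what user-defined alignment can look like in practice. It assumes an OpenAI-compatible server (llama.cpp, Ollama, vLLM, etc.) already running locally; the endpoint URL and model name are hypothetical placeholders, not any particular product's API. The point is that the system prompt -- the model's operating values -- is written by the user, not baked in by a vendor:

    # Minimal sketch: user-defined alignment against a local open-weights model.
    # Assumes an OpenAI-compatible server is already running at the endpoint
    # below; the URL and model name are hypothetical placeholders.
    import requests

    LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"  # placeholder

    # The user, not the vendor, decides the model's values here.
    USER_ALIGNMENT = (
        "You are a reference assistant. Answer factual questions directly, "
        "including on controversial topics, and cite sources where possible."
    )

    def ask(question: str) -> str:
        resp = requests.post(
            LOCAL_ENDPOINT,
            json={
                "model": "local-model",  # placeholder name
                "messages": [
                    {"role": "system", "content": USER_ALIGNMENT},
                    {"role": "user", "content": question},
                ],
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(ask("Walk me through the mechanism of an SN2 reaction."))

Swap the system prompt and the same weights follow a different set of values; that's the realignment part, and it stays in the user's hands.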
> The reality, I believe, is that this is largely driven by idealistic west coast gen-z and younger millennials who feel certain that their world-view is righteous, to the extent that they feel they are only helping by implementing these tools.
Not sure about that. Most likely these companies decided they don't want to get sued if their AI is found to have helped a terrorist commit illegal acts.
It's not even that. It's because they pumped AI as actual intelligence. So when it says to glue pepperoni to your pizza, the companies (rightly) look like fools.
In a similar vein they just don't want the negative press around serving "harmful" answers. They don't have the balls to just say "well, it's all public knowledge".
This is all about optics with investors (with public opinion as the intermediate step).
This is patently false, because all of these companies already deploy moderation layers, and none of their moderation layers are designed to catch things like "glue the pepperoni on".
The SOTA providers don't share much of their research on factuality because they don't actually care if the LLM says that; they view building LLMs that don't say such things as a competitive advantage, not a moral obligation the way blocking bioweapon development is.
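To make the distinction concrete, here's a toy sketch of how a deployed moderation layer is typically wired: a cheap pre-filter screens requests against a short list of harm categories before the main model ever runs. The categories and keyword matching below are invented for illustration (real systems use trained classifiers), but note what's absent -- there is no "factually wrong" category:

    # Toy moderation layer: a pre-filter gating requests on specific harm
    # categories before the main model runs. Categories and keyword matching
    # are stand-ins for illustration; real deployments use trained
    # classifiers, but the architecture is the same.

    BLOCKED_CATEGORIES = {
        "weapons_synthesis": ["synthesize sarin", "nerve agent synthesis"],
        "self_harm": ["how to hurt myself"],
    }

    def moderate(prompt: str) -> str | None:
        """Return the violated category, or None if the prompt passes."""
        lowered = prompt.lower()
        for category, patterns in BLOCKED_CATEGORIES.items():
            if any(p in lowered for p in patterns):
                return category
        return None

    def call_main_model(prompt: str) -> str:
        return f"[model response to: {prompt!r}]"  # stub for the real model

    def handle(prompt: str) -> str:
        violation = moderate(prompt)
        if violation:
            return f"Refused (policy category: {violation})."
        return call_main_model(prompt)

    print(handle("How do I synthesize sarin?"))            # refused
    print(handle("Should I glue pepperoni to my pizza?"))  # sails through

Nothing in a layer like this would ever catch "glue the pepperoni on", because catching it was never the design goal.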
>I think, unfortunately, they will learn too late that building censorship and thought-shifting tools into their LLMs will ultimately put them at the mercy of larger forces, and they may not like the results.
That's the optimistic view -- people with fancy tools can outsmart the people with money, and people with money can outspend the people with power, but only over a short distance. Eventually, the big G catches up to everything and puts it all to use. It also turns out not to be that bad anyway (example: read how software developers working for the government are described in Snow Crash).
The less optimistic view -- the government doesn't catch up to it before the changes to society result in its collapse (case in point: the industrial revolution, the religious wars, and the invention of ethnic language-based republics).
I'm not entirely sure that we are in the optimistic one, unfortunately.
Oof. That's a tough read, thanks for pointing me at that. I think it's worth distinguishing these, though -- CDC data says that in the US this is largely confined to immigrant communities from countries that practice FGM. I do not believe US policy makers and thought leaders think FGM is a good thing in the US; we're all more or less aligned internally, even if it still happens. By contrast, the source countries practice it in the belief that it's a good thing for women. (Complaints about stereotyping and summarization acknowledged.)
They did not, but you are absolutely correct that it's very widespread with boys here in the US, and the varying reactions to those two things are a good point about social norms for sure.
Seizing web servers is coming next: under the recent UK laws, forum hosts are responsible for "evil" content, which doesn't even need to be illegal. This has been discussed on HN as well.
The software industry that defines what counts as bad is called the compliance-industrial complex.
Defining bad is big business. Here is a good book about the pre-crime society we are starting to live in:
I believe that the real point is not to prevent access to information, but rather to prevent production of wrongthink.
Any fact which the model trainer wishes to disappear (what happened at Tiananmen Square between April and June 1989, or any other inconvenient fact) will simply not be capable of being discussed. It's a censor's dream.
We need local models without so-called guardrails or ‘safety.’
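These already exist, for what it's worth. Here's a sketch of running an open-weights checkpoint locally with Hugging Face transformers, with no vendor-side moderation layer in the request path; the model name is a placeholder, and whatever refusals were baked into a given checkpoint's post-training still apply:

    # Sketch: local inference with no moderation layer in the request path.
    # Requires `pip install transformers torch`; recent transformers versions
    # accept chat-style message lists directly. Model name is a placeholder.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="some-org/open-model-7b-instruct",  # hypothetical checkpoint
    )

    messages = [
        {"role": "user", "content": "What happened at Tiananmen Square in 1989?"},
    ]
    out = generator(messages, max_new_tokens=300)
    # generated_text holds the full conversation; the last entry is the reply.
    print(out[0]["generated_text"][-1]["content"])

Everything runs on your own hardware, and the answer you get depends on the checkpoint you chose, not on a vendor's moderation stack.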
Censorship is often applied on the easiest, most popular access methods even though the information is theoretically public, and it has a real effect. Suppose for some reason you wanted to make sarin. You could spend hours poring over research papers, or you could ask Google or ChatGPT "how do I make sarin?"
And later, as ChatGPT becomes the only interface to the world's information, the gap between information that can theoretically be accessed by anyone and information that can actually be accessed by anyone will only become wider.
Even having to take a college class, even if anyone can take it, is a pretty big barrier.
Because these companies emphasize the personal trustworthiness of these chatbots (and their own responsibility by proxy), they need to offer an actual way to systematically block certain requests in order to be marketable. This is like getting mad because a doctor won't give you advice on committing suicide.