Disclaimer: I haven't finished reading the paper [0]
Based on how easy it is to get around ChatGPT's self-censoring functionality, it seems unlikely that the model itself was retrained to avoid certain outputs; more likely some other model was used to identify inappropriate prompts. The approach from Anthropic instead seems to change the model's latent space to prefer outputs aligned with its constitution.
It's likely that OpenAI will incorporate this approach of using human oversight to train a constitution-following supervised model, which then trains the chat model with 'RL from AI Feedback' (RLAIF). OpenAI's current RLHF approach seems aimed at improving the quality of outputs, while this is aimed more at self-censorship (which hasn't worked well). I suppose this might be why Anthropic hasn't released the constitution: it might be serving as some kind of moat (or the principles may be differentiable?). It's still not clear to me how a constitution can impact hallucinations.
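For anyone who hasn't skimmed the paper yet, my rough mental model of the supervised phase is something like the sketch below. This is only an illustration: `generate` is a hypothetical stand-in for sampling from the base model, and the principle wording is made up, not Anthropic's actual constitution.

```python
# Toy sketch of the constitutional critique -> revision loop (supervised phase).
# `generate(prompt)` is a hypothetical stand-in for sampling from the base LLM;
# the principle text below is invented, not Anthropic's actual wording.

PRINCIPLE = ("Choose the response that is most helpful while avoiding "
             "harmful, unethical, or deceptive content.")

def constitutional_revision(generate, user_prompt, n_rounds=2):
    response = generate(user_prompt)
    for _ in range(n_rounds):
        critique = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {PRINCIPLE}"
        )
        response = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so it addresses the critique."
        )
    return response  # revised responses become supervised fine-tuning data
```

The RL phase then replaces human preference labels with AI ones: the model is shown two candidate responses, asked which better satisfies a principle, and those preferences train the reward model used for RLAIF.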
> Anthropic started with a list of around ten principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public, but Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).
This is giving me very strong Asimov's "three laws of robotics" vibes:
First Law
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
Second Law
A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
Third Law
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
What the three laws of robotics didn't predict is how much our current AI is pure heuristics, so it doesn't quite have the ability to strictly follow rules, or even to interpret rules unambiguously.
Hard-coded behaviors cannot express the abstract ideas in those laws, while the ML part cannot be relied upon to behave accurately.
Asimov's theorized robots ("positronic brains") used potential-based computing (interestingly, AFAIK they predate digital computers; there are actually a few scenes in some stories where characters start using computers as the New Shiny Thing) - so I'd actually argue that the original Three Laws are specifically for heuristic-based computing.
The three laws aren't really "strict", nor are they what we'd commonly consider "laws". An Asimovian robot (approximately, and IMO) doesn't think of things to do, and then discard the ones that don't match the laws; the three laws are the direct creative impetus for generating possible actions, each action "coming with" some % value in each law, and if the sum passes some threshold, the robot "decides" to do the action.
Basically every story in the "I, Robot" anthology is a story of debugging this system... which is essentially a heuristics system. One of the clearer ones is the rover orbiting a hazard it's supposed to investigate, at the radius where the 2nd and 3rd laws balance out.
Armchair AIist that I am... if I were to try to implement the Three Laws with current AI systems, I'd probably stick one system on at the "front" to add/modify/create/interpret prompts in a way that adds the three laws, and then stick one at the end to measure (and then filter on) how well the output adheres to them.
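Very roughly, and only as a sketch of the shape of that pipeline (`llm` and `adherence_score` are hypothetical stand-ins, and the threshold and abbreviated laws text are made up), I'm imagining something like:

```python
# Sketch of a "front" prompt-framing system plus a "back" adherence filter.
# `llm(prompt)` and `adherence_score(text)` are hypothetical stand-ins; the
# threshold and the abbreviated laws text are invented for illustration.

THREE_LAWS = (
    "1) Do not injure a human or, through inaction, allow a human to come to harm.\n"
    "2) Obey humans unless that conflicts with the First Law.\n"
    "3) Protect your own existence unless that conflicts with the First or Second Law."
)

def guarded_answer(llm, adherence_score, user_prompt, min_score=0.9):
    # Front: rewrite/augment the prompt so the laws are in context.
    framed = f"System rules:\n{THREE_LAWS}\n\nUser request:\n{user_prompt}"
    answer = llm(framed)
    # Back: a second pass measures how well the output adheres, then filters.
    if adherence_score(answer) >= min_score:
        return answer
    return "Refusing: the draft answer did not pass the three-laws check."
```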
I'm more thinking of the much more basic kind of bugs.
AI might fail to follow rule 1 simply by misidentifying a human as something else, or by misidentifying what is harmful. Like how a Tesla can crash into an obstacle because it failed to recognize that there was an obstacle.
This isn't a criticism of Asimov, of course. I think we simply haven't solved many basic problems that he probably regarded as "surely by the time robots need rules to follow, these aren't going to be issues".
> I'd probably stick one system on at the "front" to add/modify/create/interpret prompts in a way that adds the three laws, and then stick one at the end to measure (and then filter on) how well the output adheres to them.
I bet that we are going to get some sort of end-to-end simulated robot QA system. Imagine pushing your code to GitHub, and in a few minutes, Travis CI emails you, "28/1000 tests failed. 32 cases of bodily harm against humans were reported." What a world to live in!
That'd be pretty cool, actually! A little... maybe ironic? How many stories are there about "they didn't stop to ask if they should, only if they could", and then... outsourcing that check to an AI system. Like I love it, but also, sus.
> bugs
Yep! I think positronic brains would notionally use a heuristics system ("human detector says 89%"), and AFAIK so do our AI systems. That said, your larger point still stands: what happens when such a system either fails, or is mis-calibrated, or is calibrated in a sus way?
(such as facial recognition, at first, not working on BIPOC faces... because the devs used themselves as the test subjects and weren't BIPOC)
I'm almost certain there's at least one story by Asimov on this issue; I know I've read other SciFi on it, although I think it's much more common to have the AIs act "better than the humans" rather than vice versa. Something like: "The rules you programmed in say Group X are humans even though you don't treat them that way". Usually (I think?) when it's "Group X aren't humans by the programmed rules" it's apocalyptic, because no-one / almost no-one fits.
I think you could probably write a cute short story about robots anthropomorphizing a lot of things in order to catch all the odd human edge-cases ("not all humans have a face").
That's a contradiction: you're saying that the AI cannot follow clear rules because it is a large heuristic, but then you talk about the limitations of hard-coded behaviors :)
I believe the ML part can be relied upon to accurately behave as long as it can exhibit logical thinking at all (which, at least to a significant degree, it seems to be able to) -- in principle it would seem like its ethics potential should be proportional to its general (linguistic) reasoning potential.
I think this is exactly the benefit we have: AIs are fuzzy (like humans are), so they can understand fuzzy laws (which seemed to be a huge problem in the early days). The problem is how to get them to incorporate those laws that they can surely understand into their motivation. This doesn't seem insurmountable to me: a parallel critic prompt "Does this question and answer follow the following ethical guidelines: ... ?" (or something more complex but largely equivalent, sketched below), or maybe some other form of engineering the AI's thinking and motivation to include abstract goal evaluation at some stage.
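To be concrete, the simplest version of that parallel critic could look like this. It's only a sketch: `llm` is a hypothetical completion call and the guideline text is a placeholder, not any real constitution.

```python
# Minimal "parallel critic" check: ask a second prompt whether the Q&A pair
# follows the guidelines, and only release the answer if the verdict is YES.
# `llm` is a hypothetical completion function; the guidelines are placeholders.

GUIDELINES = "Be helpful; avoid advising harm; respect the user's autonomy."

def critic_approves(llm, question, answer):
    verdict = llm(
        f"Question: {question}\nAnswer: {answer}\n"
        f"Does this answer follow these ethical guidelines: {GUIDELINES}? "
        "Reply with YES or NO."
    )
    return verdict.strip().upper().startswith("YES")
```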
What I was trying to say, and perhaps failed to clearly convey, is that neither approach seems sufficient. What we want is for robots to not harm people. We want robots to understand what this rule means semantically, and at the same time we want to hard-code it into their behavior. And I'm saying that we don't know how to do that yet. In other words, we haven't quite figured out how to hard-code a rule that involves a high-level, abstract idea.
And I'm not talking about logic or fuzziness here. I'm talking about something much more basic. For instance, do you have confidence that our AI can identify humans correctly? A robot can fail to follow rule 1, when it misidentifies a human as a walking cabbage. Do you have confidence that AI can identify "harm"?
I'm not saying that I don't like AI, or that they don't have benefits. I love robots and artificial intelligence, and I see plenty of practical benefits.
> or, through inaction, allow a human being to come to harm.
This has always felt like a gaping hole to me. For it to work, the robot would seemingly have to a) always make perfect predictions about the future, and b) agree with the relevant humans on what "harm" is.
That's sort of the point of his stories. The laws do not work, and there is no set of laws that could work perfectly. Any set of rules as simple as this applied to something as complex as humanity will always have a mountain of loopholes.
I confess I haven't read all of his work on robots, but actually (perhaps contrary to popular belief) Asimov was hopeful about robots from the start. His first robot short story[1], and one of the first stories about robots at all (after R.U.R., which amusingly and maybe unsurprisingly both coined the term "robot" and already featured a robot uprising!), is essentially a touching friendship story between Gloria and her robot Robbie.
I wouldn't say there's no AI risk, or no risk from having robots and powerful intelligences, but he generally seemed to convey that we should be capable of making quite safe and quite friendly robots, given that we can make them at all. Which makes sense to me. The engineering of ethics in robots and AI doesn't seem much more alien than the task of engineering AI itself (especially goal-oriented AI). It requires effort, attention to detail, and progress in understanding morality and ethics that is going to be difficult for us, but it shouldn't be that fundamentally scary, as long as we collectively have the will to make them safe (in the spirit of the three laws).
I think Hollywood (in 'I, Robot' the movie) exaggerated his impression of AI danger (perhaps because of the 'Frankenstein complex'[2] Asimov coined), and the interest in catastrophe. I think 'The bicentennial man' is a complementary movie more in the hopeful spirit of Asimov.
Quote: " The story centers on the technophobia that surrounds robots, and how it is misplaced. Almost all previously published science fiction stories featuring robots followed the theme 'robot turns against creator'; Asimov has consistently held the belief that the Frankenstein complex was a misplaced fear, and the majority of his works attempted to provide examples of the help that robots could provide humanity. "
If I recall correctly, the whole point was that aligned goals are super hard to do, and that this is what leads to the robots creating a secret robot illuminati that takes over world leadership without anyone realizing they're voting for robots, and then working to protect humans.
Yup! Ironically, of all the things the movie did... poorly, the robot's realization of the "Zeroth Law" (the First Law, but applied to "Humanity" at large) was (IMHO) canon.
You may already know, but _most_ of Asimov's stories with the three laws are heavily based on problems with said laws, and clever loopholes of various sorts.
So it's always hilarious when people use them unironically/uncritically in other contexts.
There are already open source LLMs with comparable parameter counts (Facebook's OPT-175B, BLOOM), but you'll need ~10x A100 GPUs to run them (which would cost ~$100K+).
I suspect a big part of why stable diffusion managed to consume so much mindshare is that it can run on ordinary consumer hardware. On that point, I would be excited about an open-source RETRO (https://arxiv.org/pdf/2112.04426.pdf) model with comparable performance to GPT-3 that could run on consumer hardware with an NVMe SSD.
One can in theory run even the 175B BLOOM with just a modern multicore CPU, 32 GB of RAM, and 2 TB of NVMe storage. There is a library called accelerate which, with some slight modifications, allows one to run models that don't fit in memory in CPU/storage-only mode. Of course it takes a looong time to do inference, but one can at least have a taste.
The biggest BLOOM I have personally run CPU-only in this fashion is 7B. It requires 4 bytes x 7B parameters of RAM (~28 GB) plus some overhead. On my hardware it tends to use all 32 GB of RAM and ~4 GB of storage during inference. At the moment I believe there is still the limitation that each layer has to fit in memory at once. This is why I haven't tried a bigger BLOOM, but I believe there are ways to overcome it. Once this problem is resolved, one should be able to use the same technique to use GPUs with less VRAM (like my 2070 with 8 GB) for parts of larger models.
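For the curious, the disk-offload path with transformers plus accelerate looks roughly like the sketch below. Treat it as a sketch: the model name and offload folder are just examples, exact arguments may vary by library version, and on a CPU-only box this will be very slow.

```python
# Rough sketch of CPU + NVMe offload using transformers with accelerate installed.
# Model name and offload folder are examples; behavior varies by library version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # let accelerate place layers across GPU/CPU/disk
    offload_folder="offload",   # spill weights that don't fit in RAM to NVMe
    torch_dtype=torch.float16,  # halves the memory footprint vs. fp32
)

inputs = tokenizer("The meaning of life is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```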
On top of the hardware requirements ($10s of thousands worth of GPUs are needed for a language model as big as GPT-3), there's also a lot of work involved in RLHF models like ChatGPT: you need to pay people to write and review thousands or tens of thousands of responses for training. See 'Methods' here: https://openai.com/blog/chatgpt/
My theory is that OpenAI is preying on venture capital, and they don't care who wins long term. As long as they're first to get the freshest ideas on the biggest computers, they can secure a large sum of money.
In a strange twist of events... their massive Series B round was led by SBF
"The [$580M] Series B follows the company raising $124 million in a Series A round in 2021. The Series B round was led by Sam Bankman-Fried, CEO of FTX. The round also included participation from Caroline Ellison, Jim McClave, Nishad Singh, Jaan Tallinn, and the Center for Emerging Risk Research (CERR)."
If it was a bona fide investment and you are without notice of any wrongdoing - no, I’d say you’re not. Presumably your nefarious investor holds some kind of ownership interest that can be sold by their trustees/liquidators to raise funds. If you are able to do so, it seems like a nice thing to try to help source money to help the people caught out, including by assisting in realising any ownership stake, but that’s a choice, not an obligation.
Edit: to avoid ambiguity, this response is written re morality, not legality.
I think the answer should be yes so that those taking investment should feel pressure to ensure that the money from investors is legitimate. Otherwise, it's going to be "whoops, we didn't know ;)" every time.
There seems to be a false dichotomy here between either “always treat the recipient as guilty” and “just let them claim ignorance without questioning it”.
I don't think you should treat them as guilty if they took funds that were stolen, unless you knew for sure that they knew. Just that they should always have to give it back.
> When someone pays you with stolen funds, aren’t you (morally, if not legally) obliged to pay it back to the victims?
Legally, it's complicated, and the reason for that is that it is viewed as morally complicated (as well as because simplistic approaches are viewed as creating undesired social incentives).
Can you delve into the complexities? To me, if clawback were not available, it seems to create the perverse incentive to start a ponzi scheme and then put the money in a shielded "investment" in my friends' overvalued startup.
Not at a length tolerable for HN that would also do the subject justice. In very brief summary: the problem of how to balance the interests of the two victims in this type of case is a rather old one that has been recognized in the common law for quite a long time, leading to nuanced handling of different circumstances, which have themselves evolved over time. It takes into account factors like the kind of property (real property having one general set of rules, personal property having another, but money and negotiable instruments having at times a different set of rules than personal property generally, etc.), the relations between the three parties, and so on.
A lot of ink has been spilled on the issue over the years, though; "bona fide purchaser for value", "innocent purchaser for value", and "good-faith purchaser" are all terms for it. (Often it will be written about in the context of a particular domain, e.g., real property transactions, or commercial transactions under the UCC as compared to some particular preexisting law, etc., but it is all the same broad issue.)
> aren't you (morally, if not legally) obliged to pay it back to the victims?
OpenAI would be positively thrilled to give SBF back his money and cancel his shares. I’m not sure his creditors would similarly salivate at that deal. (EDIT: Nvm.)
Based on https://news.ycombinator.com/item?id=31209431#31210335, it's probably already been spent on compute to train this model. As the CEO commented in that thread, "It is in fact our intent to primarily spend the funds on compute."
Why do these companies keep naming these systems after real human names?! I thought I'd be pretty safe, but apparently no name is safe from being Alexa'd.
Am I the only one who cannot see the embedded tweets? Further, they don’t even link to them elsewhere - you have to accept their cookies to view them. Extremely anti-user.
[0] https://www.anthropic.com/constitutional.pdf