Disclaimer: I haven't finished reading the paper [0]
Based on how easy it is to get around ChatGPT's self-censoring functionality, it seems unlikely that the model itself was retrained to avoid certain outputs; more likely some other model was used to identify inappropriate prompts. The approach from Anthropic instead seems to change the model's latent space to prefer outputs aligned with its constitution.
It's likely that OpenAI will incorporate this approach of using human oversight to train a constitution-following supervised model, which then trains the chat model with 'RL from AI Feedback' (RLAIF). OpenAI's current RLHF approach seems aimed at improving the quality of outputs, while this is aimed more at self-censorship (which hasn't worked well). I suppose this might be why Anthropic hasn't released the constitution: it might be serving as some kind of moat (or the principles may be differentiable?). It's still not clear to me how a constitution can impact hallucinations.
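For anyone who hasn't skimmed the paper yet, my rough mental model of the supervised phase is something like the sketch below. This is only an illustration: `generate` is a hypothetical stand-in for sampling from the base model, and the principle wording is made up, not Anthropic's actual constitution.

```python
# Toy sketch of the constitutional critique -> revision loop (supervised phase).
# `generate(prompt)` is a hypothetical stand-in for sampling from the base LLM;
# the principle text below is invented, not Anthropic's actual wording.

PRINCIPLE = ("Choose the response that is most helpful while avoiding "
             "harmful, unethical, or deceptive content.")

def constitutional_revision(generate, user_prompt, n_rounds=2):
    response = generate(user_prompt)
    for _ in range(n_rounds):
        critique = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {PRINCIPLE}"
        )
        response = generate(
            f"Prompt: {user_prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so it addresses the critique."
        )
    return response  # revised responses become supervised fine-tuning data
```

The RL phase then replaces human preference labels with AI ones: the model is shown two candidate responses, asked which better satisfies a principle, and those preferences train the reward model used for RLAIF.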
> Anthropic started with a list of around ten principles that, taken together, formed a sort of “constitution” (hence the name “constitutional AI”). The principles haven’t been made public, but Anthropic says they’re grounded in the concepts of beneficence (maximizing positive impact), nonmaleficence (avoiding giving harmful advice) and autonomy (respecting freedom of choice).
This is giving me very strong Asimov's "three laws of robotics" vibes:
First Law
A robot may not injure a human being or, through inaction, allow a human being to come to harm.
Second Law
A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
Third Law
A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
What the three laws of robotics didn't predict is how much our current AI is pure heuristics, so it doesn't quite have the ability to strictly follow rules, or even to interpret rules unambiguously.
Hard-coded behaviors cannot express the abstract ideas in those laws, while the ML part cannot be relied upon to behave accurately.
Asimov's theorized robots ("positronic brains") used potential-based computing (interestingly, AFAIK they predate digital computers; there are actually a few scenes in some stories where characters start using computers as the New Shiny Thing) - so I'd actually argue that the original Three Laws are specifically for heuristic-based computing.
The three laws aren't really "strict", nor are they what we'd commonly consider "laws". An Asimovian robot (approximately, and IMO) doesn't think of things to do, and then discard the ones that don't match the laws; the three laws are the direct creative impetus for generating possible actions, each action "coming with" some % value in each law, and if the sum passes some threshold, the robot "decides" to do the action.
Basically every story in the "I, Robot" anthology is a story of debugging this system... which is essentially a heuristics system. One of the clearer ones is the rover orbiting a hazard it's supposed to investigate, at the radius where the 2nd and 3rd laws balance out.
Armchair AIist that I am... if I were to try to implement the Three Laws with current AI systems, I'd probably stick one system on at the "front" to add/modify/create/interpret prompts in a way that adds the three laws, and then stick one at the end to measure (and then filter on) how well the output adheres to them.
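Very roughly, and only as a sketch of the shape of that pipeline (`llm` and `adherence_score` are hypothetical stand-ins, and the threshold and abbreviated laws text are made up), I'm imagining something like:

```python
# Sketch of a "front" prompt-framing system plus a "back" adherence filter.
# `llm(prompt)` and `adherence_score(text)` are hypothetical stand-ins; the
# threshold and the abbreviated laws text are invented for illustration.

THREE_LAWS = (
    "1) Do not injure a human or, through inaction, allow a human to come to harm.\n"
    "2) Obey humans unless that conflicts with the First Law.\n"
    "3) Protect your own existence unless that conflicts with the First or Second Law."
)

def guarded_answer(llm, adherence_score, user_prompt, min_score=0.9):
    # Front: rewrite/augment the prompt so the laws are in context.
    framed = f"System rules:\n{THREE_LAWS}\n\nUser request:\n{user_prompt}"
    answer = llm(framed)
    # Back: a second pass measures how well the output adheres, then filters.
    if adherence_score(answer) >= min_score:
        return answer
    return "Refusing: the draft answer did not pass the three-laws check."
```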
I'm more thinking of the much more basic kind of bugs.
AI might fail to follow rule 1 simply by misidentifying a human as something else, or by misidentifying what is harmful. Like how a Tesla can crash into an obstacle because it failed to recognize that there was an obstacle.
This isn't a criticism of Asimov, of course. I think we simply haven't solved many basic problems that he probably regarded as "surely by the time robots need rules to follow, these aren't going to be issues".
> I'd probably stick one system on at the "front" to add/modify/create/interpret prompts in a way that adds the three laws, and then stick one at the end to measure (and then filter on) how well the output adheres to them.
I bet that we are going to get some sort of end-to-end simulated robot QA system. Imagine pushing your code to GitHub, and in a few minutes, Travis CI emails you, "28/1000 tests failed. 32 cases of bodily harm against humans were reported." What a world to live in!
That'd be pretty cool, actually! A little... maybe ironic? How many stories are there about "they didn't stop to ask if they should, only if they could", and then... outsourcing that check to an AI system. Like I love it, but also, sus.
> bugs
Yep! I think positronic brains would notionally use a heuristics system ("human detector says 89%"), and AFAIK so do our AI systems. That said, your larger point still stands: what happens when such a system either fails, or is mis-calibrated, or is calibrated in a sus way?
(such as facial recognition, at first, not working on BIPOC faces... because the devs used themselves as the test subjects and weren't BIPOC)
I'm almost certain there's at least one story by Asimov on this issue; I know I've read other SciFi on it, although I think it's much more common to have the AIs act "better than the humans" rather than vice versa. Something like: "The rules you programmed in say Group X are humans even though you don't treat them that way". Usually (I think?) when it's "Group X aren't humans by the programmed rules" it's apocalyptic, because no-one / almost no-one fits.
I think you could probably write a cute short story about robots anthropomorphizing a lot of things in order to catch all the odd human edge-cases ("not all humans have a face").
That's a contradiction: you're saying that the AI cannot follow clear rules because it is a large heuristic, but then you talk about the limitations of hard-coded behaviors :)
I believe the ML part can be relied upon to accurately behave as long as it can exhibit logical thinking at all (which, at least to a significant degree, it seems to be able to) -- in principle it would seem like its ethics potential should be proportional to its general (linguistic) reasoning potential.
I think this is exactly the benefit we have: AIs are fuzzy (like humans are), so they can understand fuzzy laws (which seemed to be a huge problem in the early days). The problem is how to get them to incorporate those laws that they can surely understand into their motivation. This doesn't seem insurmountable to me: a parallel critic prompt "Does this question and answer follow the following ethical guidelines: ... ?" (or something more complex but largely equivalent, sketched below), or maybe some other form of engineering the AI's thinking and motivation to include abstract goal evaluation at some stage.
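To be concrete, the simplest version of that parallel critic could look like this. It's only a sketch: `llm` is a hypothetical completion call and the guideline text is a placeholder, not any real constitution.

```python
# Minimal "parallel critic" check: ask a second prompt whether the Q&A pair
# follows the guidelines, and only release the answer if the verdict is YES.
# `llm` is a hypothetical completion function; the guidelines are placeholders.

GUIDELINES = "Be helpful; avoid advising harm; respect the user's autonomy."

def critic_approves(llm, question, answer):
    verdict = llm(
        f"Question: {question}\nAnswer: {answer}\n"
        f"Does this answer follow these ethical guidelines: {GUIDELINES}? "
        "Reply with YES or NO."
    )
    return verdict.strip().upper().startswith("YES")
```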
What I was trying to say, and perhaps failed to clearly convey, is that neither approach seems sufficient. What we want is for robots to not harm people. We want robots to understand what this rule means semantically, and at the same time we want to hard-code it into their behavior. And I'm saying that we don't know how to do that yet. In other words, we haven't quite figured out how to hard-code a rule that involves a high-level, abstract idea.
And I'm not talking about logic or fuzziness here. I'm talking about something much more basic. For instance, do you have confidence that our AI can identify humans correctly? A robot can fail to follow rule 1, when it misidentifies a human as a walking cabbage. Do you have confidence that AI can identify "harm"?
I'm not saying that I don't like AI, or that they don't have benefits. I love robots and artificial intelligence, and I see plenty of practical benefits.
> or, through inaction, allow a human being to come to harm.
This has always felt like a gaping hole to me. For it to work, the robot would seemingly have to a) always make perfect predictions about the future, and b) agree with the relevant humans on what "harm" is.
That's sort of the point of his stories. The laws do not work, and there is no set of laws that could work perfectly. Any set of rules as simple as this applied to something as complex as humanity will always have a mountain of loopholes.
I confess I haven't read all of his work on robots, but actually (perhaps contrary to popular belief) Asimov was hopeful about robots from the start. His first robot short story[1], and one of the first stories about robots at all (after R.U.R., which amusingly and maybe unsurprisingly both coined the term "robot" and already featured a robot uprising!), is essentially a touching friendship story between Gloria and her robot Robbie.
I wouldn't say there's no AI risk, or no risk from having robots and powerful intelligences, but he generally seemed to convey that we should be capable of making quite safe and quite friendly robots, given that we can make them at all. Which makes sense to me. The engineering of ethics in robots and AI doesn't seem much more alien than the task of engineering AI itself (especially goal-oriented AI). It requires effort, attention to detail, and progress in understanding morality and ethics that is going to be difficult for us, but it shouldn't be that fundamentally scary, as long as we collectively have the will to make them safe (in the spirit of the three laws).
I think Hollywood (in 'I, Robot' the movie) exaggerated his impression of AI danger (perhaps because of the 'Frankenstein complex'[2] Asimov coined), and the interest in catastrophe. I think 'The bicentennial man' is a complementary movie more in the hopeful spirit of Asimov.
Quote: " The story centers on the technophobia that surrounds robots, and how it is misplaced. Almost all previously published science fiction stories featuring robots followed the theme 'robot turns against creator'; Asimov has consistently held the belief that the Frankenstein complex was a misplaced fear, and the majority of his works attempted to provide examples of the help that robots could provide humanity. "
If I recall correctly, the whole point was that aligned goals are super hard to do, and that this is what leads to the robots creating a secret robot illuminati that takes over world leadership without anyone realizing they're voting for robots, and then working to protect humans.
Yup! Ironically, of all the things the movie did... poorly, the robot's realization of the "Zeroth Law" (the First Law, but applied to "Humanity" at large) was (IMHO) canon.
You may already know, but _most_ of Asimov's stories with the three laws are heavily based on problems with said laws, and clever loopholes of various sorts.
So it's always hilarious when people use them unironically/uncritically in other contexts.
There are already open source LLMs with comparable parameter counts (Facebook's OPT-175B, BLOOM), but you'll need ~10x A100 GPUs to run them (which would cost ~$100K+).
I suspect a big part of why stable diffusion managed to consume so much mindshare is that it can run on ordinary consumer hardware. On that point, I would be excited about an open-source RETRO (https://arxiv.org/pdf/2112.04426.pdf) model with comparable performance to GPT-3 that could run on consumer hardware with an NVMe SSD.
One can in theory run even the 175B BLOOM with just a modern multicore CPU, 32 GB of RAM, and 2 TB of NVMe storage. There is a library called accelerate which, with some slight modifications, allows one to run models that don't fit in memory in CPU/storage-only mode. Of course it takes a looong time to do inference, but one can at least have a taste.
The biggest BLOOM I have personally run CPU-only in this fashion is 7B. It requires 4 bytes x 7B parameters of RAM (~28 GB) plus some overhead. On my hardware it tends to use all 32 GB of RAM and ~4 GB of storage during inference. At the moment I believe there is still the limitation that each layer has to fit in memory at once. This is why I haven't tried a bigger BLOOM, but I believe there are ways to overcome it. Once this problem is resolved, one should be able to use the same technique to use GPUs with less VRAM (like my 2070 with 8 GB) for parts of larger models.
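For the curious, the disk-offload path with transformers plus accelerate looks roughly like the sketch below. Treat it as a sketch: the model name and offload folder are just examples, exact arguments may vary by library version, and on a CPU-only box this will be very slow.

```python
# Rough sketch of CPU + NVMe offload using transformers with accelerate installed.
# Model name and offload folder are examples; behavior varies by library version.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # let accelerate place layers across GPU/CPU/disk
    offload_folder="offload",   # spill weights that don't fit in RAM to NVMe
    torch_dtype=torch.float16,  # halves the memory footprint vs. fp32
)

inputs = tokenizer("The meaning of life is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```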
On top of the hardware requirements ($10s of thousands worth of GPUs are needed for a language model as big as GPT-3), there's also a lot of work involved in RLHF models like ChatGPT: you need to pay people to write and review thousands or tens of thousands of responses for training. See 'Methods' here: https://openai.com/blog/chatgpt/
My theory is that OpenAI is preying on venture capital, and they don't care who wins long term. As long as they're first to get the freshest ideas on the biggest computers, they can secure a large sum of money.
In a strange twist of events... their massive Series B round was led by SBF
"The [$580M] Series B follows the company raising $124 million in a Series A round in 2021. The Series B round was led by Sam Bankman-Fried, CEO of FTX. The round also included participation from Caroline Ellison, Jim McClave, Nishad Singh, Jaan Tallinn, and the Center for Emerging Risk Research (CERR)."
If it was a bona fide investment and you are without notice of any wrongdoing - no, I’d say you’re not. Presumably your nefarious investor holds some kind of ownership interest that can be sold by their trustees/liquidators to raise funds. If you are able to do so, it seems like a nice thing to try to help source money to help the people caught out, including by assisting in realising any ownership stake, but that’s a choice, not an obligation.
Edit: to avoid ambiguity, this response is written re morality, not legality.
I think the answer should be yes so that those taking investment should feel pressure to ensure that the money from investors is legitimate. Otherwise, it's going to be "whoops, we didn't know ;)" every time.
There seems to be a false dichotomy here between either “always treat the recipient as guilty” and “just let them claim ignorance without questioning it”.
I don't think you should treat them as guilty if they took funds that were stolen, unless you knew for sure that they knew. Just that they should always have to give it back.
> When someone pays you with stolen funds, aren’t you (morally, if not legally) obliged to pay it back to the victims?
Legally, it's complicated, and the reason for that is that it is viewed as morally complicated (as well as because simplistic approaches are viewed as creating undesired social incentives).
Can you delve into the complexities? To me, if clawback were not available, it seems to create the perverse incentive to start a ponzi scheme and then put the money in a shielded "investment" in my friends' overvalued startup.
Not at a length tolerable for HN that would also do the subject justice. In very brief summary: the problem of how to balance the interests of the two victims in this type of case is a rather old one that has been recognized in the common law for quite a long time, leading to nuanced handling of different circumstances, which have themselves evolved over time. It takes into account factors like the kind of property (real property having one general set of rules, personal property having another, but money and negotiable instruments having at times a different set of rules than personal property generally, etc.), the relations between the three parties, and so on.
A lot of ink has been spilled on the issue over the years, though; "bona fide purchaser for value", "innocent purchaser for value", and "good-faith purchaser" are all terms for it. (Often it will be written about in the context of a particular domain, e.g., real property transactions, or commercial transactions under the UCC as compared to some particular preexisting law, etc., but it is all the same broad issue.)
> aren't you (morally, if not legally) obliged to pay it back to the victims?
OpenAI would be positively thrilled to give SBF back his money and cancel his shares. I’m not sure his creditors would similarly salivate at that deal. (EDIT: Nvm.)
Based on https://news.ycombinator.com/item?id=31209431#31210335, it's probably already been spent on compute to train this model. As the CEO commented in that thread, "It is in fact our intent to primarily spend the funds on compute."
Why do these companies keep naming these systems after real human names?! I thought I'd be pretty safe, but apparently no name is safe from being Alexa'd.
Am I the only one who cannot see the embedded tweets? Further, they don’t even link to them elsewhere - you have to accept their cookies to view them. Extremely anti-user.
[0] https://www.anthropic.com/constitutional.pdf