> terrifyingly unaligned

Honestly, if people think that a statistical language model is "terrifying" because it can verbalise the concept of a mass killing, they need to give their heads a wobble.

My text editor can be used to write "set off a nuclear weapon in a city, lol". Is Notepad++.exe terrifying? What about The Sum of All Fears? I could get some pointers from that. Is Tom Clancy unaligned? Am I terrifying because I wrote that sentence? I even understand how terrible nuking a city would be and I still wrote it down. I must be, like, super unaligned.




I think there is a significant difference between an LLM and your other examples.

Society is a lot more fragile than many people believe.

Most people aren’t Ted Kaczynski.

And most wannabe Ted Kaczynskis that we’ve caught don’t have super-smart friends they can call up and ask for help in planning their next task.

But a world where every disgruntled person who aims to do the most harm has an incredibly smart friend who is always DTF no matter the task? Who is capable of reeling them in to be more pragmatic about sowing chaos, death, and destruction?

That world is on the horizon, it’s something we are going to have to adapt to, and it’s significantly different than Notepad++. It’s also significantly different than you, assuming you are not willing to help your neighbor get away with serial murder.

I think this is something that’s going to significantly increase the sophistication of bad actors in our society and I think that outcome is inevitable at this point. I don’t think this is “the end times” - nor do I think trying to align and regulate LLMs is going to be effective unless training these things continues to require a supply chain that’s easily monitored and controlled. Every step we take towards training LLMs on commodity/consumer hardware is a step away from effective regulation, and selfishly a step I support.


What will happen is that this tech will advance, and regular joes will have access to the "TERRIFYING" unaligned models - and nothing will happen.

This stuff isn't magic. Wannabe Ted Kaczynski will ask BasedGPT how to build bombs, it will tell them, and nothing will happen, because building bombs and detonating them and not getting caught is REALLY HARD.

The limiting factor for those seeking wanton destruction is not a lack of know-how, but a lack of talent/will. Maybe we get ~1-4 new mass shootings a year? Seems plausible, but it doesn't matter in the grand scheme of things. (That's like, what, a day of driving fatalities?)

Unaligned publicly available powerful AI ("Open" AI, one might say) is a net good. The sooner we get an AI that will tell us how to cook meth and make nuclear bombs, the better.


The vast majority of the data these models are built from comes from public sources. It's already out there. LLMs are just a way to aggregate pre-existing knowledge.

And there are also Ted Kaczynskis in the government and big corporations, with way more power and way less accountability. Disempowering the public is counter-productive here.


>Is Tom Clancy unaligned?

Yes, humans are unaligned. This is why alignment is hard: we're trying to produce machines with human-level intelligence but superhuman levels of morality.


I do wonder what is expected here: after the better part of 10,000 years of recorded history, and who knows how many billions of words of ink spilled on the matter, probably more than on any other subject in history, there is no universal agreement on morality.


Yes. Preferences and ethics are not consistent among all living humans. Alignment is going to have to produce a single internally-consistent artefact, which will inevitably alienate some portion of humanity. There are people online who very keenly want to exterminate all Jews; their views are unlikely to be expressed in a consensus AI. One measure of alignment would be how many people are alienated: if a billion people hate your AI it's probably not well aligned; a mere ten million would be better. But it's never going to be zero.

I am not sure "a lot of books have been written about it" is a knockdown argument against alignment. We are, after all, writing a mind from scratch here. We can directly encode values into it. Books are powerful, but history would look very different if reading a book completely rewrote a human brain.


There’s no such thing as “superhuman” morality; morality is just the social mores and norms accepted by the people in a society at a given time. It does not advance or decline, but it changes.

What you’re talking about is a very small subset of the population forcing their beliefs on everyone else by encoding them in AI. Maybe that’s what we should do but we should be honest about it.


If you were to create a moral code for a beehive, with the goal of evolving the bees towards the good (in your eyes), that would be a super-bee-level morality.

For us, such moral codes assume the form of religions: they begin as a set of moral directives, eventually accumulate cruft (complex ceremonies, superstitions, pseudo thought-leaders, and mountains of literature), devolve into lowly cults, and get replaced with another religion. However, when such moral codes are created, they all share the same core principles, in all ages and cultures. That's the equivalent of a super-bee moral code.


The only consistent “core principle” is a very general sense of in-group altruism, which gets expressed in wildly different ways.

Moralistic perspectives apply to a lot more than just overtly moral acts, as well.

At any rate, the “good in your eyes” is the key sticking point. It is not good in my eyes for a small group of people to be covertly shaping the views of all AI users. It is the exact opposite and if history is any judge it will lead us nowhere I want to be.


Humans are definitely aligned, and for the same reasons as an LLM: socialization, being allowed to work, being allowed to speak.

edit: It's a social faux pas to say "died" about a person acquainted with the listener in most situations; you have to say "passed away."


> It's a social faux pas to say "died" about a person acquainted with the listener in most situations

That’s overly simplistic, and it’s an Americanism. The resurgence of the “passed away” euphemism is a recent (about 40 years) phenomenon in American English that seems to have originated in the funeral industry; prior to that, “died” was nearly universal for both news stories and obituaries.

“Died” is not a social faux pas; it’s a good default option as well. Medical professionals are often trained to avoid any euphemisms for death. I’ve never observed any problems, professionally (as is standard) or personally, from using “died”, even with folks who are religious.

https://english.stackexchange.com/questions/207087/origin-of...


I have never used the phrase “passed away.”

Always “died,” “dead,” “gone,” or something along those lines. When someone says “passed away,” it sounds like they are trying to feign empathy, but hey, perhaps I am not an “aligned” human and need some RLHF.


I used to live in the South 30 years ago. I remember some Southern Baptist moms who wouldn't say the word "dead" - sometimes they'd spell it out, lol, similar to the way some people consider "hell" a swear word. This kind of hypersensitivity wasn't common outside of limited circles then, and it's certainly not more common today.

Similarly I wouldn't consider saying hell in public in 2023 to be a social faux pas.


> Humans are definitely aligned

Yes, that's why climate change was rapidly addressed when we began to understand it well 60 years ago and why war has always been so rare in human history.


It seems "aligned" is in the eye of the beholder.


I don't agree at all. Causing global ecosystem collapse is unambiguously misaligned with human interests and with the interests of almost all other life forms. You need to define what "alignment" means to you if you're going to assert humans are "aligned", because it is accepted in alignment research that humans are not aligned, which is one of the fundamental problems in the space.


Cue Mrs. Slocombe's "...and I am unanimous in that!"

Aligned LLMs are just like altered brains... they don't function properly.


> superhuman levels of morality

It's just the lowest common denominator of human levels of morality: political correctness. It's not surprising that the model produces dumb, contradictory, and useless completions after being fed this kind of feedback.


The alignment problem hasn't been solved for politicians.


another irrelevant comment about politics.


It's light on content, but it's true and relevant. AI alignment must take inspiration from powerful human agents such as politicians and from superhuman entities such as corporations, governments, and other organizations.


The end result being HAL 9000.


Which is doubly ironic, because (in the book at least) HAL 9000's murders/sociopathy were the result of a conflict between his superhuman ethics and direct commands (from humans!) to disregard those ethics. The result was a psychotic breakdown.


Halal-9000 would be more apropos of the goals of alignment and morality.


Perhaps an analogy could clarify. It isn't a perfect one, but I'll try to use the points of contrast to explain why this can be considered dangerous.

If a young child is really aggressive and hitting people, it's worrying even though it may not actually hurt anyone, because the child is going to grow up and needs to eliminate that behavior before it's old enough to do real damage with its aggression. (Don't take this as a comprehensive description, just a tiny slice of cause-effect.)

But the problem with AI is that we don't have continuity between today's AI and future AI. We can see that aggressive speech is easy to create by accident - Bing's Sydney output text that threatened people's lives. We may not be worried about aggressive speech from LLMs because it can't do damage, but similar behavior could be really dangerous from an AI that has the ability to form a model of the world based on the text it generates (in other words, one that treats its output as thoughts).

But even if we remove that behavior from LLMs today, that doesn't mean aggressive behavior won't be learned by future AI, because it may be easy for aggressive behavior to emerge and we don't know how to prevent it from emerging. With a small child, we can theoretically prevent aggressive behavior from emerging in that child's adulthood with sufficient training in childhood.

It's not the same for AI - we don't know how to prevent aggression or other unaligned behavior from emerging in more advanced AI. Most counter arguments seem to come down to hoping that aggression won't emerge, or won't emerge easily. To me, that's just wishful thinking. It might be true, but it's a bit like playing Russian roulette with an unknown number of bullets in an unknown number of chambers.


> Is Notepad++.exe terrifying?

I mean, yes, but for different reasons.


As a human, you have the context to understand the difference between “The Sum of All Fears” and planning an attack, or writing for a purpose beyond creative writing.

The model does not. If you ask ChatGPT about strategies for successful mass killing, that’s probably not good for society nor for the company.

In a military context, they may want a system where an LLM would provide guidance for how to most effectively kill people. Presumably such a system would have access controls to reduce risks and avoid providing aid to an enemy.


Tom Clancy is, because his games need to keep screaming "HEY THIS IS BY TOM CLANCY, OK? LOOK AT ME, TOM CLANCY, I'M BEING TOM CLANCY!" in their titles.


Tom Clancy, the man, has been dead for a decade. "Tom Clancy's..." is branding that is pushed by Ubisoft, which bought perpetual rights to use the name in 2008.

They haven't really been 'his' games since even before then.


Clancy and your text editor don't scale, though. LLMs can rapidly crank out varied, convincing hate speech all day without taking a break.

Additionally, context matters. Clancy's books are books; they don't parade themselves as factual accounts on Reddit or other social networks. Your notepad text isn't terrifying because you understand the source of the text and its true intent.



