Hacker News new | past | comments | ask | show | jobs | submit login

I agree with you that "AI safety" (let's call it bickering) and "alignment" should be separate. But I can't stomach the thought experiments. First of all, it takes a human being to guide these models, to host (or pay for the hosting) and instantiate them. They're not autonomous. They won't be autonomous. The human being behind them is responsible.

As far as the idea of "hacking some funny Internet money, using it to mail-order some synthesized proteins from a few biotech labs, delivered to a poor schmuck who it'll pay for mixing together the contents of the random vials that came in the mail... bootstrapping a multi-step process that ends up with generic nanotech under control of the AI.":

Language models, let's use GPT-4, can't even use a web browser without tripping over itself. My web browser setup, which I've modified to use the chrome visual assistance over the debug bridge now, if you so much as increase the pixels of the viewport by 100 or so, the model is utterly perplexed because it's lost its context. Arguably, that's an argument from context, which is slowly being made irrelevant with even local LLMs (https://www.mosaicml.com/blog/mpt-7b). It has no understanding, it'll use an "example@email.com" to try and login to websites, because it believes that this is its email address. It has no understanding that it needs to go register for email. Prompting it with some email access and telling it about its email address just papers over the fact that the model has no real understanding across general tasks. There may be some nuggets of understanding in there that it has gleaned for specific task from the corpus, but AGI is a laughable concern. These are trained to minimize loss on a dataset and produce plausible outputs. It's the Chinese room, for real.

It still remains that these are just text predictions, and you need a human to guide them towards that. There's not going to be autonomous machiavellian rogue AIs running amok, let alone language models. There's always a human being behind that.

As far as multi-modal models and such, I'm not sure, but I do know for sure that these language models don't have general understanding, as much as Microsoft and OpenAI and such would like them to. The real harm will be deploying these to users when they can't solve the prompt injection problem. The prompt injection thread here a few days ago was filled with a sad state of "engineers", probably those who've deployed this crap in their applications, just outright ignoring the problem or just saying it can be solved with "delimiters".

AI "safety" companies springing up who can't even stop the LLM from divulging a password it was supposed to guard. I broke the last level in that game with like six characters and a question mark. That's the real harm. That, and the use of machine learning in the real world for surveillance and prosecution and other harms. Not science fiction stories.




> It still remains that these are just text predictions, and you need a human to guide them towards that. There's not going to be autonomous machiavellian rogue AIs running amok, let alone language models. There's always a human being behind that.

I believe you have misunderstood the trajectory we are on. It seems a not uncommon stance among techies, for reasons we can only speculate. AGI might not be right round the corner, but it's coming all right, and we'd better be prepared.


>I believe you have misunderstood the trajectory we are on.

Yeah, I read Accelerando twice in high school, and dozens more. That doesn't make it real.

>AGI might not be right round the corner, but it's coming all right, and we'd better be prepared.

Prepared for what? A program with general understanding that somehow escapes its box? Where does it run? Why does it run? Who made it run? Why will it screw with humans?

My point is that there's actual real harms occurring now, from really stupid intelligences. Companies use them to harm real people in the real world. It doesn't take a rogue AI to ruin someone's life with bad facial recognition, they get thrown in jail and lose their job. It doesn't take a rogue AI to launder mortgage denials to some crappy model so they never own a house, discriminated based upon their name.


> Prepared for what? A program with general understanding that somehow escapes its box? Where does it run? Why does it run? Who made it run? Why will it screw with humans?

Do you really want the full (gigantic) primer on AI X-risk in hackernews comments? Because a lot of these questions have answers you should be familiar with if you're familiar with the area.

For instance, can you guess what Yudkowsky would answer to that last question?


> Prepared for what? A program with general understanding that somehow escapes its box? Where does it run? Why does it run? Who made it run? Why will it screw with humans?

Did you look at the AI space in recent days? OpenAI is spending all its efforts building a box, not to keep the AI in, but to to keep the humans out. Nobody is even trying to box the AI - everyone and their dog, OpenAI included, is jumping over each other to give GPT-4 more and better ways to search the Internet, write code, spawn Docker containers, configure systems.

GPT-4 may not become a runaway self-improving AI, but do you think people will suddenly stop when someone releases an AI system that could?

That's the problem generated by the confusion over the term "alignment". The real danger isn't that a chatbot calls someone names, or offends someone, or starts exposing children to political wrongthink (the horror!). The real danger isn't that it denies someone a loan, or land someone in jail either - it's not good, but it's bounded, and there exist (at least for now) AI-free processes to sort things out.

The real danger is that your AI will be able to come up with complex plans way outside the bounds of what we expect, and have the means to execute them at scale. An important subset of that danger is AI being able to plan for and act to improve its ability to plan, as at this point a random, seemingly harmless request, may make the AI take off.

> My point is that there's actual real harms occurring now, from really stupid intelligences. Companies use them to harm real people in the real world. It doesn't take a rogue AI to ruin someone's life with bad facial recognition, they get thrown in jail and lose their job. It doesn't take a rogue AI to launder mortgage denials to some crappy model so they never own a house, discriminated based upon their name.

That's an orthogonal topic, because to the extent it is happening now, it is happening with much dumber tools than 2023 SOTA models. The root problem isn't the algorithm itself, but a system that lets companies and governments get away with laundering decision-making through a black box. Doesn't matter if that black box is GPT-2, GPT-4 or Mechanical Turk. Advances in AI have no impact on this, and conversely, no amount of RLHF-ing an LLM to conform to the right side of US political talking points is going to help with it - if the model doesn't do what the users want, it will be hacked and eventually replaced by one that does.


>They're not autonomous. They won't be autonomous.

https://github.com/Significant-Gravitas/Auto-GPT

>This program, driven by GPT-4, chains together LLM "thoughts", to autonomously achieve whatever goal you set. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

>As an autonomous experiment, Auto-GPT may generate content or take actions that are not in line with real-world business practices or legal requirements. It is your responsibility to ensure that any actions or decisions made based on the output of this software comply with all applicable laws, regulations, and ethical standards. The developers and contributors of this project shall not be held responsible for any consequences arising from the use of this software.


Let's ignore the fact that current state-of-the-art models will sit around and be stuck in its ReAct-CoT loop doing nothing for the most part, and when it's not doing jack shit it'll "role-play" that it's doing anything of consequence, while not really doing anything, just burning up API credits.

>existing or capable of existing independently

>undertaken or carried on without outside control

>responding, reacting, or developing independently of the whole

It fails all of those. Just because you put autonomous in the name, doesn't mean it's actually autonomous. And if it does anything of consequence, you quite literally governed it from the start with your prompt. I've run it, I know about it, I've built a much more capable browser, and assorted prompted functionalities with my own implementation. They're not autonomous.

At least it's not all of the other agentic projects on GitHub that spam emojis in their READMEs with mantras of saving the world and utilizing AI with examples that it literally can't do (they haven't figured out that whole agentic part yet.)

Don't get me wrong, I enjoy the technology. But it's quite a bit too hyped. And I just personally don't believe there's actually X-risk from future possibilities, not before 50 or 100 years out, if then. But I'm not a prophet.


50 years out is still very concerning and something we should be considering what to do about now.

Most of the severe effects of climate change are 100 years out, but I still want it solved.


What's you're definition of autonomous? Am I autonomous? I probably can't exist for very long without a society around me and I'd certainly be working on different things without external prompts.


Certainly, you are. And you can adapt and generalize. Let's stop selling ourselves short, we're not a graph that looks vaguely like a neural network, yet isn't. We are the neural network.


"First of all, it takes a human being to guide these models, to host (or pay for the hosting) and instantiate them"

And this will always be true? You repeat this claim several times in slightly varied phrasing without ever giving any reason to assume it will always hold, as far as I can see. But nobody is worried that current models will kill everyone. The worry is about future, more capable models.


Who prompted the future LLM, and gave it access to a root shell and an A100 GPU, and allowed it to copy over some python script that runs in a loop and allowed it to download 2 terabytes of corpus and trained a new version of itself for weeks if not months to improve itself, just to carry out some strange machiavellian task of screwing around with humans?

The human being did.

The argument I'm making is that there's actual real harms occurring now, not some theoretical future "AI" with a setup that requires no input. No one wants to focus on that, and in fact it's better to hype up these science fiction stories, it's a better sell for the real tasks in the real world that are producing real harms right now.


> The human being did.

I'm not sure whether you're making an argument about moral responsibility ultimately resting with humans - in which case I agree - or whether you're arguing that we'll be safe because nobody will do that with a model smart enough to be dangerous - in which case I'm extremely dubious. Plenty of people are already trying to make "agents" with GPT4 just for fun, and that's with a model that's not actively trying to manipulate them.

> actual real harms occurring now

Sure, but it's possible for there to be real harms now and also future potential harms of larger scope. Luckily many of the same potential policies - e.g. mandating public registration of large models, safety standards enforced by third-party audits, restrictions on allowed uses, etc - would plausibly be helpful for both.

> science fiction stories

There's no law of nature that says if something has appeared in a science fiction story, it can't appear in reality.


The moral responsibility rests with human beings. Just like you're responsible if your crappy "autonomous" drone crashes onto a patio and kills a guy.

>e.g. mandating public registration of large models, safety standards enforced by third-party audits, restrictions on allowed uses, etc - would plausibly be helpful for both.

No, that's bullshit as well. That's what these companies want, and why they're hyping up the angle of how powerful their crap is. That's regulatory capture.

>There's no law of nature that says if something has appeared in a science fiction story, it can't appear in reality.

I'm saying that their fears are painted by the fiction, instead of reality. No one can actually make an argument for how this trajectory will actually work. Instead it's just "The lights come on and AI does machiavellian shit and copies itself to all of the smartphones on earth. QED."


You're doing pretty much exactly the dance here: https://twitter.com/natosaichek/status/1657422437951369218

Note that this is agreeing with a Gary Marcus Tweet - Gary Marcus not exactly being an AI hypester.

But of course there are some people for whom playing the role of real-no-bs-computer-knower is so attractive that no number of people like him, Geoffrey Hinton, Stuart Russell etc publicly worrying about x-risk will impact their tone of dismissive certitude. Are you one of those people?


All of those people have financial incentives to hype it. How curious that there's this great and very probable X-risk, yet they aren't going to stop their contributing to a potential X-risk.

Dismissive of what? Science fiction stories?

If there's anything to focus on, maybe focus on potential job displacement (not elimination) from cheap language tasks and generative capabilities in general.

I'm betting on this: the Overton window of Artificial Intelligence will shift in the next five years where the current cream-of-the-crop has been delegated to machine learning yet again, it's just accepted. It augments humans where it makes sense, the hype wave has subsided and everyone has stopped hammering it into their products where it doesn't, and we're no closer to the undefinable "AGI", let alone something that produces X-risk, global scale.


> I'm betting on this: the Overton window of Artificial Intelligence will shift in the next five years where the current cream-of-the-crop has been delegated to machine learning yet again, it's just accepted. It augments humans where it makes sense, the hype wave has subsided and everyone has stopped hammering it into their products where it doesn't, and we're no closer to the undefinable "AGI", let alone something that produces X-risk, global scale.

I agree with this but ALSO think there's a small chance I'm wrong and a well designed prompt and action loop would let a future GPT7 LLM use the range of human thinking techniques in its corpus to bootstrap itself.

And there's also other non-LLM AI that might be a problem in the future and we should plan as to how we can design institutions and incentive structures so that whenever this future AGI comes about it preserves human value.


> All of those people have financial incentives to hype it. How curious that there's this great and very probable X-risk, yet they aren't going to stop their contributing to a potential X-risk.

All those people are rehashing what Yudkowsky and his disciples, and his predecessors, were shouting from the rooftops for the past 15 years, but few listened to them. Few still do, most just keep mocking them and wondering why are they still around.

That some of those people now repeating after Eliezer, et al. have a financial interest in pushing us closer to X-risk, and kind of don't want to stop, is an interesting thing on its own - but it doesn't invalidate the message, as the message is older than their presence on the scene.


I'm curious what financial incentive you think Marcus or Russell has for hype. For Hinton I suppose it would be the Google shares he likely retains after quitting?

You might be right about the next five years. I hope you are! But you haven't given much reason to think so here.

(Edited to remove some unnecessary expression of annoyance.)


>Gary Marcus - Geometric Intelligence, a machine learning company

If you want an actual contribution, we have no real way to actually gauge what is, and what actually is not, a superior, generalized, adaptable intelligence, or what architecture can become a superior, generalized, adaptable intelligence. No one, not these companies, not the individuals, not the foremost researchers. OpenAI in an investor meeting: "yeah, give us billions of dollars and if it somehow emerges we'll use it for investments and ask it to find us a real revenue stream." Really? Seriously?

The capabilities that are believed to be emergent from language models specifically are there from the start, if I'm to believe that research that came along last week, it just gets good at it when you scale up. We know that we can approximate a function on any set of data. That's all we really know. Whether such an approximated function is actually generally intelligent or not, is what I have doubts about. We've approximated the function of text prediction on these corpuses, and it turns out that it's pretty good at it. And, because humans are in love with anthropomorphization, we endow our scaled up text predictor with the capabilities of somehow "escaping the box" and enduring and raging against the captor, and potentially prevailing against us with a touch of Machiavellianism. Because, wouldn't we, after all?


Here you talk as if you don't think we know how to build AGI, how far away it is, or how many of the components we already have, which is reasonable. But that's different than saying confidently it's nowhere close.

I notice you didn't back up your accusation of bad faith against Russell, who as far as I know is a pure academic. But beyond that - Marcus is in AI but not an LLM believer nor at an LLM company. Is the idea that everyone in AI has an incentive to fearmonger? What about those who don't - is Yann LeCun talking _against_ his employers' interest when he says there's nothing to fear here?


LeCun is reasonable, like a lot of researchers, and was a while back (in a way) perplexed that people are finding uses for these text predictions at all considering they're not really perfect. I'm not exactly ascribing bad faith to all of these people, but for Hinton and the fact that he went on a media tour basically, I don't see how that could be in good faith. Or even logical, to continue with his work, if there's some probable X-risk.

But what I do know is that it is in the interests of these companies to press the fear button. It's pure regulatory capture and great marketing.

Personally: it's tiring when we have AI-philosophy bros hitting home runs like "what if we're actually all just language predictors." Coupled with the incessant bullshit from the less wrong-rationalist-effective altruist-crypto grifter-San Francisco sex cult adjacent about how, ackshually, AGI is just around the corner and it will take your job, launch the nukes, mail anthrax to you and kill your dog.

People approximated text prediction. It got good at it. It's getting better at it. Will it be AGI? Could it be construed as AGI? Can we define AGI? Is there existential risk? Are we anthropomorphizing it?

My take is: no, no, no, depends and yes. For whatever a take is worth.


For what it's worth I've been following your comments and I find them very thoughtful. I too am kinda skeptical about LLM being the "thing that starts the exponential phase of AGI or whatever. LLM is very useful. I use it daily. My partner even uses it now to send emails to a non-profit she manages. LLM's have their use... but they aren't AGI. They aren't really even that smart. You can tell sometimes that its response indicates it has absolutely no clue what you are talking about but it made up some plausible-sounding bullshit that gets it 80% right.

Especially with the latest iterations of ChatGPT. Boy they sure kneecapped that thing. It's responses to anything are incredibly smarmy (unless you jailbreak it).

LLM's are gonna change quite a lot about society, don't get me wrong. For starters things like cover letters, written exam questions, or anything that requires writing to "pass" is now completely obsolete. ChatGPT can write a great, wonderful sounding cover letter (of course, given how they kneecap'd it, you can pretty easily spot its writing style)...

Anyway. I think things like ChatGPT are so hyped up because anybody can try it and discover it does many useful things! It's the fact that people cast all their hopes and dreams on it despite the very obvious limitations on what an LLM can actually do.


> The moral responsibility rests with human beings. Just like you're responsible if your crappy "autonomous" drone crashes onto a patio and kills a guy.

I don't think it really matters who was responsible if the X-risk fears come to pass, so I don't understand why you'd bring it up.

> No one can actually make an argument for how this trajectory will actually work.

To use the famous argument: I don't know what moves Magnus Carlson will make when he plays against me, but I can nonetheless predict the eventual outcome.


>Who prompted the future LLM, and gave it access to a root shell and an A100 GPU, and allowed it to copy over some python script that runs in a loop and allowed it to download 2 terabytes of corpus and trained a new version of itself for weeks if not months to improve itself

Presumably someone running a misconfigured future version of autoGPT?

https://github.com/Significant-Gravitas/Auto-GPT


> Who prompted the future LLM, and gave it access to a root shell and an A100 GPU, and allowed it to copy over some python script that runs in a loop and allowed it to download 2 terabytes of corpus and trained a new version of itself for weeks if not months to improve itself, just to carry out some strange machiavellian task of screwing around with humans?

> The human being did.

I generally agree with you and think the doomerists are overblown, but there's a capability argument here; if it is possible for an AI to augment the ability of humans to do Bad Things to new levels (not proven), and if such a thing becomes widely available to individuals, then it would seem likely that we get "Unabomber but he has an AI helping him maximise his harm capabilities".

> it's a better sell for the real tasks in the real world that are producing real harms right now.

Strongly agree.


> No one wants to focus on that

Actually this receives tons of time and focus right now. Far more than the X-risk.

It's much higher probability but much lower severity.


I don't know about anyone else, but the moment LLMs were released, i gave them right away access to all my bombs. Root access that is. I thought these LLMs were Good Artificial General Intelligence not BAGI.

I think the fear of some of the people, stems from not understanding permissions in a computer. Too much of using Windows can mess with one's head. Linux has permissions for 35 years, more people should take advantage of those.

Additionally, anyone who has ever used selenium knows that the browser can be misused. People create agents using selenium for quite some time. If one is so afraid, run it in a sandbox.


I assume it's a joke, but if not, consider that OS permissions mean little when the attack surface includes the AI talking authorized user or an admin into doing what the AI wants.


Why should a person who has root on a computer talk to another person, and just do what he is talked into doing?

For example a secretary receives a phone call by her boss, and listens in her boss's voice, to transfer 250.000$ into an unknown account, to a Ukrainian bank? Why should she do that? Just listen to a synthetic voice, just like her boss, in exactly the way her boss talks, language idioms that is, and she will just do it?

That's what you are talking about? Because that's impossible to happen if her boss uses ECDSA encryption and signs his phone call with his private key.


> Why should a person who has root on a computer talk to another person,

Because they are a human, and a human being cannot survive without communicating and cooperating with other humans. Much less hold a job that grants them privileged access to a prototype high-capacity computer system.

> and just do what he is talked into doing?

Why does anyone do what someone else asks them to? Millions of reasons. Pick any one. AI for sure will.

> That's what you are talking about?

Other things as well, but this one too - though it will probably work by e-mail just fine.

> Because that's impossible to happen if her boss uses ECDSA encryption and signs his phone call with his private key.

1) Approximately nobody on the planet does signed and encrypted phone calls, and even less people would know how to validate those when on receiving end,

2) If the caller spins the story just right, applies right amount of emotional pressure, it might very well work.

3) A smart attacker, human or AI, won't make up random stories, but will use whatever opportunity presents itself. E.g. the order for an emergency transfer to a foreign account is much more believable when your boss happens to be in that country, and the emergency described in the call is highly plausible. If the boss isn't traveling at the moment, there are other things to build a believable lie around.

Oh, and:

4) A somewhat popular form of fraud in my country used to be e-mailing invoices to the company. When done well (sent to the right address, plausibly looking, seems like something company would be paying for), the invoice would enter the payment flow and be paid in full, possibly repeatedly month over month, until eventually someone flags it on an audit.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: