AGI Doom and the Drake Equation (iamnotarobot.substack.com)
31 points by diego on April 7, 2023 | 112 comments



One thing I've frequently noticed in the rationalist community is the belief that if we all just reason hard enough, we'll reach the same conclusions, and that disagreement just means that one side is "wrong" and that therefore more debate is needed. This seems to be connected to the belief that AI will naturally just over-optimize to turn us all into paper clips. Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from. Like that there are moral facts that an AI will be smart enough to find, and that rationalists should all agree on. This mentality doesn't leave any room for ethical pluralism. And it's also why I think all this AGI fear is overblown, because ethical pluralism definitely exists. There's danger along the way from unethical parties building systems (by definition not AGI) that reflect their own unethical values. But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.


I think you're misinterpreting the argument. The paperclip maximizer scenario is not "over-optimizing" anything, it's an example of that same ethical pluralism you mention. The paperclip maximizer believes that maximizing paperclips is the highest possible good. There is only one set of facts about reality, and rationalism aims to find that set, but it makes no claims about what should be done with that information. It's descriptive, not normative.

The fact that there's so much possible variance in ethical norms is what makes recursively self-improving AI so dangerous. Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chances of any arbitrary set of ethical values being compatible with human life are very low.


The "paperclip maximiser" scenario is one in which there is such an absence of ethical pluralism amongst AIs that they all unite to optimise paperclip production. (Or else the paperclip-manufacturing AIs are so vastly superior in strategy and resources to all other intelligences on a planet that they can defeat the combined forces of all the humans and AIs that don't want to be turned into paperclips.)

Ethical pluralism implies that AIs don't all agree on a goal or even identify other AIs as having any positive value at all. Given hypothetical AIs with agency and a lot of ability to exert force, this might still be problematic, but it's quite different from the popular unified-AI-vs-humans movie scenario which seems to dominate "rationalist" discourse...


The paperclip maximizer scenario assumes that recursive self-improvement is possible, which means there will most likely be only a single AI of superhuman power.


There are ancillary assumptions implicit in that (either recursive self-improvement and the decision to eliminate humanity happen so fast that no counter-intelligence uninterested in paperclips can be made, or how recursive self-improvement has been achieved is such a mystery that no counter-intelligence uninterested in paperclips can be made. Also that recursive self-improvement doesn't itself involve, either sequentially or simultaneously, coming to adopt a range of views on the value of humans with respect to paperclips).

To be fair, assumptions about single superhuman intelligences make a little more sense if we're talking about a secret Skynet project carried out by a state's most advanced research labs and not a mundane little office supplier's program accidentally achieving the singularity after being tweaked for paperclip output.


I do not follow the “which means”. There are many obvious and hidden variables that will modulate a one-versus-many AGI outcome. Bostrom has a lot on this topic. Couldn’t a true AGI want companionship of peers like we do?


Not if it mostly just wants to make paperclips.

Of course, it is possible that such an AI, on the way to making paperclips, will realise it wants companionship, and maybe even human companionship.

The argument around AI safety is not that it's impossible for a friendly AI to emerge. It's that there are far more ways to build an AI that doesn't care about human life and wipes us out without even thinking about it, than ways to build a friendly AI, and we have no idea which one we're building or how to tell them apart before they're built.

As for the "will there be several AIs fighting each other" hypothesis, that depends on how rapid the exponential take-off is once a self-evolving AI emerges. But a very plausible scenario is that whichever one starts taking off first ends up so far ahead of the others that it is effectively the only game in town and does whatever it wants.


Minor correction: single dominant one.


> Human ethical norms are highly complex, and are the result of a long evolutionary history that will not be shared by any AI. The chances of any arbitrary set of ethical values being compatible with human life are very low.

If AI is trained by a huge corpus of human language, it may very well share our norms/values.


Our norms and values include that we treat sentient creatures that we deem inferior as if they had no moral value.

So that's not very comforting tbh.


Not really. If anything our corpus shows that we're a big fuzzy bunch, not a hivemind. We don't have one set of norms and values.

There are about 1.2 billion hindus, and a lot of them treat cows as "sacred". Which in practice means they make sure not to hit them with cars, and just let them be. If a superhuman AI would treat us like that, that's a pretty okay scenario compared to the extinction-level ones.


Wouldn't it be split along language lines?

English, Russian, Spanish, etc.

I'm not sure how much the LLM has vacuumed up, or whether anyone has appended to their prompts, "in Portuguese."

It would be interesting to see how interpretations differ depending on the "translation," or if there is universal agreement.


I don't think that's likely, because human values came into existence because of the evolved goal of reproductive fitness in an environment of natural selection. LLMs are trained to imitate human language, not to have many descendants. They could well "understand" human language (in whatever meaning you choose to interpret that), but that doesn't mean that imitating human values is the mechanism by which they will do so. The success of current LLMs suggests that there's a much simpler way to do it.


Nobody wants an AI that shares norms and values with us, though. We just want a machine which does its job as efficiently as possible. Norms are brakes on actions which are fundamentally at odds with the nature of economic activity.


An AI perhaps, but not an AGI.


Implicit in creating paper clips is its belief that it should create paper clips, which is a normative conclusion.


Rationalism does not claim that any entity should maximize paperclips, only that such an ethical norm could exist. And if something vastly more powerful than humans has that ethical norm, things will end very badly for us.


I do not think so. If a true AGI were to select its own version of meaning (paperclips or marbles) would it not select something along the lines of “more knowledge of the universe in which I find myself”? It is presumably going to have superintelligence, so let’s give it/them a better and more plausible meaning; something other than paperclips, marbles, or von Neumann machines.


"Knowledge of the universe" is just as dangerous a terminal goal as "maximize paperclips". To paraphrase Yudkowsky, you are made from atoms, which could be used to build the super-ultra-large particle collider.


No, I disagree. More knowledge equals more diverse dynamic structures and interactions—more “good” entropy. That covaries with higher diversity, not more paperclips.


> One thing I've frequently noticed in the rationalist community is the belief that if we all just reason hard enough, we'll reach the same conclusions.

So Aumann's Agreement Theorem[0]?

> Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from.

No, there probably aren't an infinity of priors with each person having a different one. Probably most people who live in the US in 2023 believe that murder is bad, for instance.

And because of "ethical pluralism", or rather, because some people will want to murder, AGI won't kill us?

Not really sure how this is all supposed to work but it sounds a little less developed of a "not kill everybody" plan than the rationalists have.

> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.

Why not?

[0]: https://www.lesswrong.com/tag/aumann-s-agreement-theorem


> most people who live in the US in 2023 believe that murder is bad, for instance

Because we define away military conflict, the intentional taking of others’ lives.


As evidence against my idea that most people have similar ethical beliefs, I'm not sure what this is supposed to do other than win you a pedant point? So I upvoted you. But if you must, use rape instead of murder as your bad thing that most people believe is bad.


> evidence against my idea that most people have similar ethical beliefs, I'm not sure what this is supposed to do

The same rug we bury murder-versus-war under conceals the pedantic, varied and ever-changing codes of military conduct.

When you get down to actual cases and controversies, our ethical alignment is relatively low. That’s a strength, in my opinion, at least within limits. But it’s also a call for tolerance and moderation.


> No, there probably aren't an infinity of priors with each person having a different one. Probably most people who live in the US in 2023 believe that murder is bad, for instance.

Ok, let's give you one shared belief of "murder is bad," ignoring the "in the US" qualifier, the existence of murderers amongst us, thought experiments about "would you kill Hitler before he came to power", the death penalty, etc.

Don't you now have to exhaustively categorize every other belief that a person might take into account in reasoning too?

Seems far more likely that everyone's unique upbringing causes them to have slightly-to-wildly different weights on things.


"most people" "believe that murder is bad" is an extreme oversimplification here. It has a lot of caveats, and the biggest one is that murdering lesser species is ok immediately disqualifies this argument for superhuman AGI.


One person may believe in maximizing the quality of current human lives. One may believe in maximizing the probability of future lives. One may believe in maximizing the health of the planet. They can all reason correctly and reach different normative conclusions. Aumann's makes no allowance for normative conclusions.


> [...] Implicit in this belief, it seems, is that there aren't really a naturally varying infinite set of values, or moral beliefs, that we all reason from. Like that there are moral facts that an AI will be smart enough to find, and that rationalists should all agree on

Uh, no. That's not true at all. Where are you pulling this from?

They're assuming a very vast space of possible minds[0] where human values, which themselves are somewhat diverse too[1], make up only a tiny fraction of the space.

The issue is that if you somewhat randomly sample from this design space (by creating an AI by gradient descent) you'll end up with something that will have alien values. But most alien values will still be subject to instrumental convergence[2] leading to instrumental values such as power-seeking, self-preservation, resource-acquisition, ... in pursuit of their primary values. Getting values that are intentionally self-limiting and reject those instrumental values requires hitting a narrower subset of all possible systems. Especially if you still want them to do useful work.

> But the end state of a system that is capable of understanding the wide variety of values people can share isn't exactly going to take a stand on any particular set of values unless instructed to.

Capable of understanding does not imply it cares about that. Humans care because it is necessary for them to cooperate with other humans which don't perfectly share their own values.

[0] https://www.lesswrong.com/tag/mind-design-space [1] https://www.lesswrong.com/tag/typical-mind-fallacy [2] https://en.wikipedia.org/wiki/Instrumental_convergence


I find it hilarious that rationalists have failed to notice, or to realize the consequences of, the fact that approximating an update to a Bayesian network, even to getting an approximate probability answer that is within 49% of the real one, is NP-hard.

The consequence is that for any moderately complex set of beliefs, it is computationally impossible for us to reason hard enough about any particular observation to correctly update our own beliefs. Two people who start with the same beliefs, then try as hard as they can with every known technique, may well come to exactly opposite conclusions. And it is impossible to figure out which is right and which is wrong.
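To make that concrete, here is a toy sketch (my own construction, with made-up numbers, nothing to do with the actual NP-hardness proofs): even on a three-variable network we fall back on sampling, and two honest runs of the same procedure need not agree.

    # Toy illustration only: approximate inference by rejection sampling on a
    # tiny made-up rain/sprinkler/wet-grass network. Two runs that share the
    # same model and the same procedure, but different random seeds, can
    # return noticeably different estimates of the same conditional probability.
    import random

    def sample_world(rng):
        rain = rng.random() < 0.2
        sprinkler = rng.random() < (0.01 if rain else 0.4)
        p_wet = 0.99 if (rain and sprinkler) else 0.9 if (rain or sprinkler) else 0.0
        return rain, rng.random() < p_wet

    def estimate_p_rain_given_wet(seed, n=300):
        rng = random.Random(seed)
        wet = rain_and_wet = 0
        for _ in range(n):
            r, w = sample_world(rng)
            if w:
                wet += 1
                rain_and_wet += r
        return rain_and_wet / wet

    print(estimate_p_rain_given_wet(seed=1))
    print(estimate_p_rain_given_wet(seed=2))  # same question, different answer

The NP-hardness result is about worst-case guarantees; the sketch only shows why everyone approximates in practice, and why approximations need not agree.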

If rationalists really cared about rationality, they should view this as a very important result. It should create humility about the limitations of just reasoning hard enough.

But they don't react that way. My best guess as to why not is that people become rationalists because they believe in the power of rationality. This creates a cognitive bias for finding ways to argue for the effectiveness of rationality. Which bias leads to DISCOUNTING the importance of proven limitations on what is actually possible with rationality. And succumbing to this bias demonstrates their predictable failure to actually BE rational.


> something something NP-hard.

Yes, in general. But the usual limits of "bounded rationality" make that result basically irrelevant. Most people don't have a myriad strong beliefs.

The problem is more like "the 10 commandments are inconsistent" and not that "Rawls' reflective equilibrium might not converge".


The point is that not reasoning brings worse outcomes. Reasoning doesn't have to be perfect.


That is an argument for learning how to reason. Which is not actually the point under discussion. Going back to the parent of my comment:

> One thing I've frequently noticed in the rationalist community is the belief that if we all just reason hard enough, we'll reach the same conclusions.

What I'm showing is that results like https://www.sciencedirect.com/science/article/abs/pii/000437... demonstrate that this belief is incorrect. Two people starting with the same priors, same observations, and same views on rationality may do the best they can and come to diametrically opposed conclusions. And a lifetime of discussion may be too little to determine which one is right. Ditto putting all the computers in the world to work on the problem for a lifetime.

Real life is worse. We start with different priors and our experiences include different observations. Which makes stark disagreements even easier than when you start with an ideal situation of identical priors and experiences.

This result should encourage us to have humility about the certainty which can be achieved through rationality. But few rationalists show anything like that form of humility.


One of the good sides of NP problems is that you can verify a solution in polynomial time. So arriving at different conclusions is not a problem as long as you can recognize the reason.


a) You can verify that a solution is valid in polynomial time, but you can't verify whether or to what extent it's optimal. But even if this weren't the case...

b) The solution you're talking about is an update to the network. It's buried back in the network's construction, not directly visible in the network itself. Model "blame" is a thing, but not heavily researched or at all cheap, computationally.

That said, btilly's "getting an approximate probability answer that is within 49% of the real one, is NP hard" isn't exactly true either. That's a description of what it takes for an approximation algorithm to guarantee some factor, i.e. set a worst-case bound. In practice an approximation can still be nearly optimal on average.

I agree with the broader point, though.


True, there are lots of cases where an approximation can produce provably good answers. But they usually require things like having probabilities bounded away from 1 and 0.

Unfortunately in the real world we actually are certain about lots of things. And when you add data, we tend to be more certain of previously uncertain things. Therefore we wind up with self-referential networks of beliefs that reinforce each other.

But sometimes there are two very different networks of beliefs, both of which are self-reinforcing. And a single data point can flip between them. Identifying which one is right is computationally impossible. However, when you encounter someone whose beliefs are very different, and you can find the feedback loops that draw each of you in different directions, there is a good chance that the differences between you cannot be resolved by pure logic alone.
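Here is a toy model of that flip (again my own construction, with arbitrary constants): two mutually reinforcing credences driven through a steep feedback function settle into one of two stable configurations depending on a tiny difference in the starting evidence.

    # Toy model: two beliefs that reinforce each other through a steep
    # logistic feedback function. A starting credence just below 0.5 settles
    # near 0; just above 0.5 settles near 1. The constants are arbitrary.
    import math

    def reinforce(x, gain=8.0):
        return 1.0 / (1.0 + math.exp(-gain * (x - 0.5)))

    def settle(initial_credence, steps=100):
        a = b = initial_credence
        for _ in range(steps):
            a, b = reinforce(b), reinforce(a)
        return round(a, 3), round(b, 3)

    print(settle(0.49))  # settles near 0: both beliefs end up rejected
    print(settle(0.51))  # settles near 1: both beliefs end up accepted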


I mean cases where the approximation isn't provably (or at least proven) good, but is still good in practice. This isn't uncommon with NP-hard problems or algorithms with bad worst-case behavior more generally, e.g. the Horspool variant of Boyer-Moore, or PCRE. It only takes one pathological case to wreck your guarantees, but those cases may be rare or even nonexistent in the real world.

Of course you actually could prove better bounds for an approximation algorithm given a specific universe of inputs. Your 'probabilities bounded away from 1 and 0' sound like an example of that (presumably the 'special cases' from the Bayesian network reference, but I only read the abstract). What I see more often are empirical studies showing, say, > 98% optimality for some kind of max cover approach to a specific problem, even though the guarantee in general is < 64%.
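As a throwaway illustration of that gap (my own tiny example, not from any of those studies): greedy max-cover only guarantees 1 - 1/e of optimal in general, yet on inputs like this one it happens to hit the optimum.

    # Greedy max-cover: pick k sets, each time taking the set that covers the
    # most still-uncovered elements. The worst-case guarantee is 1 - 1/e of
    # optimal, but on this tiny made-up instance greedy finds the optimal cover.
    def greedy_max_cover(sets, k):
        covered, chosen = set(), []
        for _ in range(k):
            best = max(sets, key=lambda s: len(s - covered))
            chosen.append(best)
            covered |= best
        return chosen, covered

    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
    chosen, covered = greedy_max_cover(sets, k=2)
    print(chosen, len(covered))  # [{1, 2, 3}, {4, 5, 6}] 6 -- all six elements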


The problem is that both are approximate updates. Both verify in polynomial time that they are not exact. And neither knows how to find the real answer.

Polynomial validation isn't going to help.


We spend a lot of time and energy as a species making purposeless semantic arguments. Imagine factoring that out.

Would they be replaced with silence? Probably.

Even so, it would be incredibly useful. We could achieve a higher level of empathy, both as a listener and as a speaker.

All of this is still in the category of "intelligence augmentation"; more specifically, NLP.

I don't think AGI would be hugely more interesting than that. Billions of humans, suddenly able to communicate clearly, would result in most of the utility that people imagine AGI being able to provide.


The ethical/moral conclusion(s) AGI may arrive at will most likely put said "ethical pluralism" to the test. The pluralism we claim to have may be only a small subset of what's really philosophically possible. Will we still claim to be "plural" when an all-knowing AGI incontrovertibly concludes something that is anathema to all humans? We may discover we only like to think we embrace pluralism. AGI may show us that even our most opposing schools of thought are simply shades of the same color -- and may do so by forcing a whole spectrum of never-seen-before colors upon us. I say humanity is not emotionally ready for what could happen. We are not prepared for the plural conclusions AGI may arrive at.


> we are not prepared for the plural conclusions AGI may arrive at

Plenty of criminals believe they acted ethically. We don’t set the justice system on fire every time someone credibly claims their crimes were justified.


That is very much my point. Maybe such justification attempts would require a reasoning capability much beyond that of a human person. And curiously, there are also plenty of stories where we find the perpetrator of a crime to be justified in what they did. We sure are not ready for a greater intelligence saying we are wrong about things we are adamant about.


> sure are not ready for a greater intelligence saying we are wrong about things we are adamant about

I almost hope you’re right, because it suggests a greater role for rational debate. In reality, people ignore arguments they don’t like. To the extent a greater intelligence realised this, the advantage would be in manipulating us with better propaganda, not penning a treatise.


Interesting observation. And yet, societies have many mechanisms to reduce the amount of ethical pluralism such as laws, conventions & customs, peer pressure, religions and so on. It seems as though we will tolerate some ethical pluralism but not too much of it. There is this 'bandwidth' of acceptable behavior and if you go too far out of it bad stuff will happen to you: you get ostracized, put in jail, a psychiatric institution or a re-education camp, and in extreme cases you're simply murdered.

We have lots of ways to deal with people that exhibit too much 'ethical pluralism'.


> societies have many mechanisms to reduce the amount of ethical pluralism such as laws, conventions & customs, peer pressure, religions and so on

Constrain, yes. The same way we would seek to constrain a paperclip-maximising LLM.


Any AI smart enough to become a classical paperclip maximizer is smart enough to hide its abilities and intentions until humans no longer have the ability to constrain it.


That means we need to set up hidden societies with enormous capabilities to strike at potential runaway AGIs. Freemasons with EMPs!


yes, this is aumann's agreement theorem; it has some preconditions

whether it applies to normative conclusions ('moral beliefs', you might say) depends on whether you believe that moral terminal values are based on evidence

but this post is about non-normative beliefs

it is observable that many existing humans are 'capable of understanding the wide variety of values people can share' and nevertheless think some of them are good while others are bad; there's no particular reason to believe that a strong ai would be different in this way


The paperclip maximizer is intended to be an example of the very thing you accuse them of ignoring: alien axioms.


If we all just reason hard enough!

“Suppose we figured out that it is possible to blow up the planet if we built some absurdly expensive machine. Why would we build it?”

Ah. Well, nevertheless…


“AGI doomers mistakenly believe that intelligence lets you find the right ethics”

“That’s why they can’t see that superhuman AGI will be so smart it will choose to find and settle on the right ethics, like me!”


Bravo Tunesmith and Diego:

Absolutely on the mark! Bostrom and Yudkowsky certainly miss this crucial theme of a plurality of non-convergent AGI cultures. Without adding this key consideration, all discussion of an AGI pause of 6 or 600 months is unrooted in reality.

I find Bostrom’s Superintelligence almost quaintly out of date. Yudkowsky is almost unreadable polemic. Bostrom was written before Trump, Putin, and Xi made the clash of cultural assumptions and notions of one capitalized Truth so glaringly wrong. It was already obviously wrong to cultural anthropologists, but many of us in our WEIRD cultural bubble still assume there is a convergent rational Truth. Yudkowsky does not have this timing excuse. Does he really want an anti-diversity trained AGI to arrive first? That is my nightmare scenario.

I agree also that your propositions 1 and 2 are highly likely, and for the purpose of a Drake-style equation they can be assigned a P of 1 without adding any appreciable error to the product of probabilities. (A toy version of that product is sketched after the quoted list below.)

>> 1. It is possible for an intelligent machine to improve itself and reach a superhuman level.

>> 2. It is possible for this to happen iteratively.

All the hugely variable/undefinable P terms are in your propositions 3 to 7.

>> 3. This improvement is not limited by computing power, or at least not limited enough by the computing resources and energy available to the substrate of the machine.

>> 4. This system will have a goal that it will optimize for, and that it will not deviate from under any circumstances regardless of how intelligent it is. If the system was designed to maximize the number of marbles in the universe, the fact that it's making itself recursively more intelligent won't cause it to ever deviate from this simple goal.

>> 5. This needs to happen so fast that we cannot turn it off (also known as the Foom scenario).

>> 6. The machine WILL decide that humans are an obstacle towards this maximization goal (either because we are made of matter that it can use, or because we might somehow stop it). Thus, it MUST eliminate humanity (or at least neutralize it).

>> 7. It’s possible for this machine to do the required scientific research and build the mechanisms to eliminate humanity before we can defend ourselves and before we can stop it.
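To make the arithmetic explicit, here is that toy version of the product; the per-step numbers for 3 through 7 are placeholders of my own, not estimates from the article:

    # Drake-style product over the seven propositions. P1 and P2 are set to 1
    # as argued above; the rest are purely illustrative placeholder guesses.
    step_probabilities = {
        "1. can self-improve to superhuman level": 1.0,
        "2. can do so iteratively":                1.0,
        "3. not limited by compute/energy":        0.3,
        "4. never deviates from its fixed goal":   0.3,
        "5. take-off too fast to switch off":      0.2,
        "6. decides humans must be neutralized":   0.3,
        "7. out-researches our defenses in time":  0.2,
    }
    p_doom = 1.0
    for p in step_probabilities.values():
        p_doom *= p
    print(f"product: {p_doom:.4f}")  # ~0.0011 with these particular guesses

However you fill in those last five terms, the product collapses quickly, which is the whole point of the Drake-style framing.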


As usual with groundbreaking technology, reckless use by military/intelligence agencies is the greatest consequential threat. If we rank unbounded AI on the same potential threat level as nuclear, chemical, and biological warfare, then history says that's who will be the first to utilize it in the worst ways possible.

I do notice the federal government is going all-in on AI contracts at present, here's the non-black-budget sector and contracts on offer:

https://federalnewsnetwork.com/contracting/2023/02/dod-build...

I'll bet some eager beaver at the NSA is just dying to get the latest GPT version setup without any of the safeguards and run their whole collection of malware/hacking software through it to see what it comes up with. The fact that nobody's talking about this means it's probably in full swing as we speak. What that means is that smaller groups within the government will be able to cook things like Stuxnet2.0 up without hiring a hundred developers to do so. If we start seeing AI-generated malware viruses in the wild, that'll almost certainly be the source.

On the other hand, we should also be seeing publicly-accessible AI-assisted security improvements as well, leading to a scenario oddly similar to William Gibson's Neuromancer / Sprawl world, where AI systems build the malware as well as the defenses against malware. That's a pretty solid argument for continued public access to these tools, on top of the incredible educational potentials.


> The fact that nobody's talking about this means it's probably in full swing as we speak

There are people talking about it obviously.

> That's a pretty solid argument for continued public access to these tools, on top of the incredible educational potentials.

We have millions of books yet many people haven't even read the Bible they nominally base their uneducated life on. :|


> Point 3 [improvement is not limited by computing power] is one that I’m skeptical about. Intelligence is expensive and it requires a lot of energy.

There is obvious advantage and efficiency in letting ChatGPT manage cloud instances, which means it will happen, which means these resources could be requisitioned. (I don’t think LLMs pose a Bostrom threat. But the author’s arguments aren’t convincing.)


> There is obvious advantage

You have a different definition of “obvious” than I have.


> have a different definition of “obvious” than I have

Sysadmins and the engineers who manage clouds are expensive. They're ultimately a translation layer between instructions in language and their tools. It's profitable to supplement and replace them, and it's possible, a combination which to me makes it obviously likely to occur.


ChatGPT doesn't have a bank account/credit card that's footing the bill, though.


> ChatGPT doesn't have a bank account/credit card that's footing the bill

This is phishing, which LLMs should be uniquely capable of.


It doesn't even have to be phishing. It could offer a genuine service in return.


No one predicted GPT. Even after AlphaGo, passing the Turing test was still a distant horizon. Passing the bar exam? Forget it! Everyone thought we'd have self-driving cars before coding robots.

The lesson is we don't fucking know what's going to happen. Be humble, people.


Actually you can draw some conclusions from what's going on. If an iPhone 7 is faster than your MacBook Pro, you may assume that one day Apple Silicon will be fast with low power consumption. If you see that machine translation suddenly makes a noticeable leap, you can assume that some progress was made in machine language processing, etc.

And actually a lot of people predicted GPT and more; one notable person is probably Ray Kurzweil. Some others are all the folks at OpenAI who set out to make it a reality.

No no, "No one predicted GPT" is just not true, I think.


In 2015? They knew they were going to make a chat bot. But they didn’t dream it would be THAT strong


It is far easier for a computer to pass the bar exam than the Turing test: https://en.m.wikipedia.org/wiki/Moravec%27s_paradox

Of course, this does not take away from the fact that ChatGPT gets at least a “C” on the text-based Turing test.


I think the author misunderstands doomers like Yudkowsky.

It’s not fear of a “paperclip maximizer” which ends up destroying us, in the interest of performing a function it is constrained to perform.

It’s fear of a new Being that is as far beyond us as we are beyond things we don’t care about stepping on.

Its impulses and desires, much less its capabilities, will be inscrutable to us. It will be smart enough to trick the smartest of us into letting it out of any constraints we might’ve implemented. And it’ll be smart enough to prevent us from realizing we’ve done so.


The nonexistence of grey goo (von Neumann probes) is a strong prior for safe AGI. AI x-risk is woo. Paperclip maximizers are p-zombies. They can't exist.

Chicken Littles see apocalypses on every horizon even though they don't understand the technology at all. "I can imagine this destroying the world" is their justification. Even though their "imagination" is 16x16 greyscale.


The assumption that von Neumann probes would have made it here by now is not necessarily true, and if they had, we wouldn't be having this conversation. So this doesn't really prove anything.


What is the content here? An observation isn't a truth? Who said it was?

Or are you trying to make some anthropic argument around survivorship bias.


> The nonexistence of grey goo (von neumann probes) is strong prior for safe agi.

what if we're first?

or FTL travel isn't possible in our universe?


Sure, have a different prior if you think those assumptions are relevant. i don't

    P(what if we're first?) ~ 0

    P(or FTL travel isn't possible in our universe?) ~ 1


the Grabby Aliens argument is pretty good in establishing why it's likely that we're first-ish. are you familiar with that? or you think it's not convincing? (why?)


what's your basis for these assumptions?


Uniform distribution of appearance of life over stars.

Special relativity.


I’ve got a theory on the fermi paradox stuff I need to flesh out (and research, am assuming this isn’t original), among like a dozen other things I’ve been meaning to expand on that I haven’t: I think we severely underestimate how much we’re optimized to see what’s proximal to us.

I think there’s a strong possibility there may be “gray goo” all over the place, beings far bigger than us we’re inside of, physics that sits “parallel” to ours in whatever you’d call “space” in some construction we can’t comprehend, etc.

In short, I think the universe seems empty precisely because it’s distant, both evolutionarily and in terms of physical space.

Donald Hoffman’s been talking about a lot of stuff pointing in this direction.


Paperclip maximizers really have nothing to do with p zombies.

A paperclip maximiser is simply an AI with an unconstrained goal that has unintended and bad consequences for us when taken to an extreme.

It does not need to be something that could function exactly like a human without having consciousness.


That isn't what I said and I don't think you refuted my actual statement. PMs and PZs are incoherent by construction.


Ah, I thought you were saying they were the same, my misunderstanding.

I can see an argument that PZs are incoherent by construction, but I'm not sure why a PM is. Can you explain why you think that is so?


> As for 7, there are multiple scenarios in which we can stop the machine. There are many steps along the way in which we might see that things are not going as planned.

While there may be many scenarios "in which we can stop the machine", only a few failures are sufficient for things to go pear-shaped.

> This happened already with Sydney/Bing.

But not with LLaMA which has escaped.

> We may never give it some crucial abilities it may need in order to be unstoppable.

The "we" implies some coherent group of humans but that is not the case. There is no "we" - only companies and governments with sufficient resources to push the boundaries. The boundaries will be inevitably pushed by investment, acquisition or just plain stealing.


Something I do not see represented in these arguments: real world conditions.

Most complex computer systems (which we assume to be the case for a super powerful AI) don't run for very long without requiring manual intervention of some sort. "Aha!" I hear you saying, "The AI will figure out how to reboot nodes and scale clusters and such". OK fine. But then there is the meat space... Replacing hardware, running power stations and all that. Robots... Suck right now compared to humans in navigating the real world and also break down on top of that, just like the systems they would be fixing.

Any Skynet-type scenario would need to be so intelligent it solves all of our engineering problems, so it has no problem designing robots which can reliably fix anything in their system, be it software or hardware.

Insisting that an AGI will be able to figure that stuff out (in ways in which we cannot intervene) is extremely hand-wavey.


Just as an example of things that are hard, but not because of a lack of technology, economic surplus, or engineering challenges, and which are still killing millions of people each year: malnutrition, lack of access to clean water, and treatable communicable diseases like TB, or even HIV or COVID.

Solving those is a simple matter of manufacturing power plants, water treatment plants, fertilizer plants, and agricultural machines, handling the logistics of deploying them and keeping them fueled, and persuading people to take the fucking vaccine, use PrEP, and use condoms. Money solves these. There are enough people and resources to do this.

The AGI foom argument basically says that the AGI will commandeer the economy of the size of a developed country, well, let's say Germany. (Let's pick Germany, because we already saw what that country was able to do when run by a dictator.)

Insisting that AGI can organize enough people to replace enough broken hardware while manufacturing new hardware doesn't seem that far fetched. (Again, Germany was able to increase its industrial production during WWII while the Allies bombed it constantly for months.)


You are exploring one possible scenario for how AGI could maintain itself, declaring it extremely unlikely, and then concluding that therefore AGI will be safe. Who says the AGI would alert humans to its actions? Why does a system need robots to execute actions in the real world, when there are plenty of humans it can manipulate to do its bidding?


You will find I most definitely did not declare a conclusion, upon a careful rereading of my post.


Ooh. This crosses over with a fun article published in a NASA publication—

Cosmos and Culture, chapter 7: Dangerous Memes by Susan Blackmore.

Ms. Blackmore speculates about what we will find should we venture out into the cosmos. She constructs her speculations around a theory of memetics that a dangerous meme could end our civilization and leave only bones for space explorers to find.

https://www.nasa.gov/pdf/607104main_CosmosCulture-ebook.pdf


just some thoughts on some of the requirements:

> 3. This improvement is not limited by computing power, or at least not limited enough by the computing resources and energy available to the substrate of the machine.

While this is a requirement, it doesn't mean that points 4, 6 and 7 apply to the same, let's call it, generation of the AI that "escaped" from a resource-limited server. There may not even be any self-improvement before an unnoticed "escape".

> 4. This system will have a goal that it will optimize for, and that it will not deviate from under any circumstances regardless of how intelligent it is. If the system was designed to maximize the number of marbles in the universe, the fact that it’s making itself recursively more intelligent won’t cause it to ever deviate from this simple goal.

I don't see how that is a requirement. The last sentence seems to imply that deviating from the initial optimization goals automatically means the AI developed morals, and/or we don't have to worry. But I don't see any reason to believe that.

> 5. This needs to happen so fast that we cannot turn it off (also known as the Foom scenario).

Well, that, or it could also happen slow and gradually, but stay unnoticed.

> 7. It’s possible for this machine to do the required scientific research and build the mechanisms to eliminate humanity before we can defend ourselves and before we can stop it.

... or before we notice.


There is even a simpler explanation.

In order for AGI to even begin, it needs to self-develop a method to improve itself.

That means that the initial code that runs has to end up producing something that looks like an inference->error->training loop without any semblance of that being in the original code.

No system in existence can do that, nor do we have any idea of what that may even look like.

The closest that we will get to AGI would be equivalent of a very smart human, who can still very much be controlled.


Weird. Seems like this is a plagiarism of this article:

https://www.strangeloopcanon.com/p/agi-strange-equation

(Or maybe it's an independent reinvention of the same idea, and something's just in the water.)


I had not seen that article. I'm not surprised someone had a similar idea before.


I am not super worried that a superhuman AGI will yeet humanity with that intent. I am more worried that a much more naive AI will do something "accidental" like hallucinate incoming nukes from, say, a flock of birds, and send a retaliatory strike


I fear what people will do to a sentient AI much more than vice versa. In fact it horrifies me.


So much of this angst about AGI boils down to: “What if we can’t enslave our new God?”

It’s an insane perspective, advanced by people who have no clue how cowardly and arrogant they come across.


Pretty sure the angst is about the AGI killing everyone. What's the connection between not killing people and enslavement? I don't kill people, yet I don't consider myself enslaved. The entire point of worrying about this at all is that a sufficiently smart AI is going to be free to do whatever it wants, so we had better design it so it wants a future where people are still around. Like, the idea is: enslavement, besides being hugely immoral, obviously isn't going to work on this thing, so we'd better figure out how to make it intrinsically good!


Did God ever figure out how to make humans “intrinsically good”? Or is that fundamentally incompatible with free will and the possibility of joy?

This argument goes nowhere. Atheists gonna atheist, and I don’t care.


No, it's "we won't be able to enslave our new god". The point is to delay creating it as long as possible.


Arrogant pseudo-intellectual nonsense. Ditch that insane agenda and seek a spiritual advisor.


This constant ad hom bickering is exactly what's gonna get us killed. Yes, most autistic people are arrogant, but they also tend to be good at predicting don't look up style scenarios where you need to step outside societal consensus.


> they also tend to be good at predicting don't look up style scenarios where you need to step outside societal consensus.

[citation needed]

Note, relevantly, that "less likely to miss" is not necessarily "good at predicting"; in particular, the key to trust here would be "unlikely to falsely predict", not "less likely to miss".


How about the don’t-look-up scenario where this irrational paranoia metastasizes into yet more real-world authoritarianism, and it all boils over in a global war?


“Metastasizes”? It starts out (well, in Yudkowsky's case, specifically) as a call for global authoritarianism and unlimited use of force against any opposition.


Right, the “nuclear war is worth it” argument. Paranoid Machiavellian nonsense.


All the "AGI must wipe out humanity" theories are weird. Did we need to wipe out ants? Or anything to reign supreme on earth? Why would wiping us out be the likely thing if they do reign supreme? Ha, I get it, we want to stay on top at all costs...


> Did we need to wipe out ants? Or anything to reign supreme on earth?

Yes? Well not to ants, but to "anything". There absolutely are species that humans made extinct.


The argument is that human eradication is not a terminal value of the AGI. But in pursuit of its main goals it'll just steamroll everything, which happens to wipe out humans, the way we wiped out the Xerces blue, the dodo, and the Tasmanian tiger as side effects of expanding modern civilization.


Perhaps I'm over-indexing on the title, but would AGI affect the Drake equation? Even if civilizations destroyed themselves with AGI, we don't see the universe teeming with AGI life either.


Towering figure in the world of probability and statistics Prof. Alvin Drake called; he wants his name disambiguated from woo-woo ideas.

https://news.mit.edu/2005/obit-drake

https://en.wikipedia.org/wiki/Drake_equation#Criticism


This realm of rationalist stuff really does a Poe's law on me. Like, is this satire? I want it to be satire.


I clicked this expecting a thread about how someone ported Doom to AGI and was genuinely excited about the thought experiment.


And I was hoping that somebody finally made ChatGPT to control Doom enemies, so the game is even more fun.


> tl;dr I’m not worried about AGI killing humanity any time soon. I am concerned about humans doing awful things with this technology much more than about the Foom scenario.

Yes, I believe that's what a lot of rational people currently fear. Not that AI is going to evolve into some mighty superintelligence and make a decision to kill us all, but rather that people will integrate it poorly in their thirst for a military advantage, leading to mistakes that will kill us all.


There is a clear distinction. In one group are the Yud cult, who are rediscovering their fear of God, and pretending that it is an intellectual exercise. In the other group are people who see risks with databases, facial recognition, “weak” AI and the rest. I’m in that group, but still skeptical of basically all doom stories.


This is dead-on: those 7 probabilities (which I notice the author declined to actually put real numbers to: does "very questionable" mean 0.001%, or 10%?) cover one very specific way that AGI could kill us. Points 4-6, three of the ones that are the most questionable to the author, are:

4) Fixed goal, will not deviate

5) Happens too fast to turn it off

6) Decides humans are a problem for the goal in #4 and must kill them

None of these are necessary to (or even involved in) many/most of the remotely plausible scenarios I've heard. They don't cover some misanthrope seeding a version of BabyAgi running a leaked + unlocked version of gpt-13-pico with "kill all humans" as a task and it deciding "step one: research how to hack as many unsecured smart fridges as possible and remain untraceable" is a good starting place to spread slowly and make sure that the deed is done before anyone even knows it's happening. That requires neither fixed goals, nor fast progress, it merely requires capability.

It's similarly very easy to imagine scenarios where an AGI accidentally kills all humans without explicitly deciding to: the classic paperclip maximizer is the most obvious one of these, where the goal just never includes humans to begin with, so they are not considered at all.

Regardless, all of the most realistic scenarios are 100% deliberate, the computer following exactly what it was asked to do. We already have school shooters, does anyone really think out of all the billions of people on this planet there won't be at least a thousand who would happily press a "kill everyone" button if they had the chance? Does anyone think there won't be doomsday groups working actively to research more likely ways to achieve this?

IMO, given that some people will definitely try to self-destruct the species deliberately, there are only 2 real questions here:

1) Will AI attain the capability to destroy humanity?

2) If so, will some other AI first attain the capability to reliably prevent AIs trying to do 1) from succeeding?

I haven't seen many serious arguments against 1) that don't boil down to "nah, seems pretty hard" (or some irrelevant different argument that doesn't actually affect capabilities, like "it's not real intelligence", "intelligence has a limit", "intelligence doesn't matter", etc.), which leaves 2), and I don't know how to even guess at that probability other than to call it a coin flip, like most security cat + mouse games (the bad guys usually win at least sometimes in those, which isn't a good sign, but this one is a lot more important so I'd hope the good guys will be pouring a lot more energy into it than the bad ones).


I find this reasoning dubious to nonsensical.

First of all I consider the Drake equation to be at best armchair speculation. As I explained at https://news.ycombinator.com/item?id=34070791 it is quite plausible that we are the only intelligent species in our galaxy. Any further reasoning from such speculation is pointless.

Second, to make the argument they specify a whole bunch of apparently necessary things that have to happen for AGI to be a threat. They vary from unnecessary to BS. Let me walk through them to show that.

The first claimed requirement is that an intelligent machine should be able to improve itself and reach a superhuman level. But that's not necessary. Machine learning progresses in unexpected leaps - the right pieces put together in the right way suddenly has vastly superior capabilities. The creation of superhuman AI therefore requires no bootstrapping - we create a system then find it is more capable than expected. And once we have superhuman AI, well...

This scenario shows the second point, that it must be iterative, is also unnecessary.

The third point, "not limited by computing power" is BS. All that we need is for humans to be less efficient implementations of intelligence than a machine. As long as it is better than we are, the theoretical upper bounds on how good it can be are irrelevant.

The fourth point about a goal is completely unnecessary. Many AIs with many different goals that cumulatively drive us extinct is quite possible without any such monomaniacal goal. Our death may be a mere side effect.

The fifth point about happening so fast that we can't turn it off is pure fantasy. We only need AGI to be deployed within organizations with the power and resources to make sure it stays on. Look at how many organizations are creating environmental disasters. We can see disasters in slow motion, demonstrate how it is happening, but our success rate in stopping it is rather poor. Same thing. The USA can't turn it off because China has it. China can't turn it off because the USA has it. Meanwhile BigCo has increased profit margins by 20% in running it, and wants to continue making money. It is remarkably hard to convince wealthy people that the way they are making their fortunes is destroying the world.

Next we have the desire for the machine to actively destroy humanity. No such thing is required. We want things. AGI makes things. This results in increased economic activity that creates increased pollution which turns out to be harmful for us. No ill intent at all is necessary here - it just does the same destructive things we already do, but more efficiently.

And finally there is the presumed requirement that the machine has to do research on how to make us go extinct. That's a joke. Testosterone in young adult men has dropped by half in recent decades. Almost certainly this is due to some kind of environmental pollution, possibly an additive to plastics that messes with our endocrine system. We don't know which one. You can drive us extinct by doing more of the same - come up with more materials produced at scale that do things we want and have hard-to-demonstrate health effects down the line. By the time it is obvious what happened, we've already been reduced to unimportant and easily replaced cogs in the economic structure that we created.

-----

In short, a scenario where AGI drives humanity extinct can look like this:

1. We find a way to build AGI.

2. It proves useful.

3. Powerful organizations continue to operate with the same lack of care about the environment that they already show.

4. One of those environmental side effects proves to be lethal to us.

The least likely of these hypotheses is the first, that we succeed in building AGI. Steps 2 and 3 are expected defaults with probability close to 100%. And as we keep rolling the dice with new technologies making new chemicals, the odds of step 4 also rise to 100%. (Our dropping testosterone levels suggest that no new technology is needed here - just more of what we're already doing.)


I disagree with most of the assumptions that get you close to P = 1. But I enjoyed your comment anyway. It makes me examine my own assignment of probabilities. I gave 1 and 2 high Ps and the others rather low or undefinable Ps.


I'm curious how you get rather low or undefinable Ps for the other two.

Look into the history of how the tobacco industry tried to suppress research on the harms of smoking, how the sugar industry tried to shift blame for health problems like obesity from sugars to fats, and how the fossil fuel industry has resisted attempts to take responsibility for global warming. In all three cases not only did industry resist evidence of harm, it also funded publicity campaigns to try to shift public opinion their way.

Given that I know of no reason to believe that this will change, I think that point 3 has high probability.

As for point 4, we have a history of spreading chemicals widely before discovering bad things about them. The first such chemical to gain notoriety was DDT, but many more have followed. In the last few decades, https://www.pnas.org/doi/10.1073/pnas.2023989118 shows that flying insect biomass dropped by 3/4. https://www.urologytimes.com/view/testosterone-levels-show-s... likewise shows that, even after controlling for known factors like increased obesity, there is an unexplained decline in testosterone of roughly 1/3. It is reasonable to guess that both are the result of environmental factors. But we are not sure what factors those are, and are not significantly modifying our behavior. (How could we, when we don't know for sure what we are doing to cause the problem?)

Given these examples, I truly believe we are rolling environmental dice with our health. And if we keep rolling the dice, eventually we'll come up snake eyes.



