Seeing someone as trustworthy as Scott choose to work on AI safety is a pretty good sign for the state of the field IMO. It seems like a lot of studious people agree AI alignment is important but then end up shoehorning the problem into whatever framework they are most expert in. When all you have is a hammer etc... I feel like he has good enough taste to avoid this pitfall.
Semi-related - I'd want to see some actual practical application for this research to prove they're on the right track. But maybe conceptually that's just impossible without a strong AI to test with, at which point it's already over? Alignment papers are impressively complex and abstract but I have this feeling while reading them that it's just castles made of sand.
He mostly studies computational complexity. Quantum computing is a part of that, but there are other subfields. Though the kind of AI safety described in this post seems more like an extremely fancy version of program verification, so out of CS bloggers you'd expect John Regehr to get into it.
>the kind of AI safety described in this post seems more like an extremely fancy version of program verification
It kind of is. The field of AI safety is actually much more advanced than most people realise, with actual, real techniques to e.g. make sure neural networks stay aligned with certain goals even under fluctuating parameters. Granted, we're still far from being able to stop an AGI before it can do something bad, but the tools we have today are already pushing in that direction (assuming neural networks are the right way to AGI, of course).
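To give a concrete flavour of what that kind of verification looks like, here is a hand-rolled toy (made-up weights, not any particular tool or library, and perturbing the input rather than the parameters for brevity): interval bound propagation pushes a whole box of possible inputs through a small network and returns sound bounds on the output, so a property like "the output stays above zero" can be certified for every point in the box rather than just the points you happened to test.

    # Toy interval bound propagation over a tiny ReLU network.
    # Weights and the property being checked are invented for illustration.
    import numpy as np

    def interval_affine(lo, hi, W, b):
        """Bounds of W @ x + b when each x[i] lies in [lo[i], hi[i]]."""
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        out_lo = W_pos @ lo + W_neg @ hi + b
        out_hi = W_pos @ hi + W_neg @ lo + b
        return out_lo, out_hi

    def certify(x, eps, layers):
        """Propagate the box [x - eps, x + eps] through (W, b) layers with ReLU."""
        lo, hi = x - eps, x + eps
        for i, (W, b) in enumerate(layers):
            lo, hi = interval_affine(lo, hi, W, b)
            if i < len(layers) - 1:          # ReLU on hidden layers only
                lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
        return lo, hi

    # Tiny hand-made network, purely for demonstration.
    layers = [(np.array([[1.0, -0.5], [0.3, 0.8]]), np.array([0.1, 0.0])),
              (np.array([[0.7, 0.2]]),              np.array([0.05]))]
    lo, hi = certify(np.array([1.0, 0.5]), eps=0.1, layers=layers)
    print(f"output guaranteed in [{lo[0]:.3f}, {hi[0]:.3f}] for every input in the box")

Real verifiers use much tighter relaxations than plain intervals, but the shape of the guarantee (a proof over a whole set of inputs or weights, not a test suite) is the same.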
If you're interested in verification you should probably talk to people who actually work on verification, for example, literally anyone from our research community: https://www.floc2022.org/
This kind of verification is what the other commenter was referring to, but it is very foundational and disconnected from current day-to-day ML aspects. If you're interested in practical, empirical AI safety research, see here for example: http://aisafety.stanford.edu/
They also explain the area of overlap with formal verification in their white paper.
Note that this was in 2019, before we had seen the capabilities of current models like Chinchilla, Gato, Imagen and DALL-E 2.
Sample:
“Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just published in Scientific American.
"We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. [...] But intelligence per se does not generate the drive for domination, any more than horns do."“
“Stuart Russell: It is trivial to construct a toy MDP in which the agent's only reward comes from fetching the coffee. If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee. No hatred, no desire for power, no built-in emotions, no built-in survival instinct, nothing except the desire to fetch the coffee successfully.”
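To make Russell's point concrete, here is a minimal sketch of such an MDP with invented numbers (the discount factor, shutdown probability and two-strategy framing are my own simplification, not from his work). Plain expected-value maximization picks the "switch off the human" action, with no survival instinct or emotion anywhere in the code:

    # Toy version of the coffee-fetching MDP (numbers invented for illustration).
    GAMMA = 0.99        # discount factor
    P_SHUTDOWN = 0.05   # chance the human switches the agent off before coffee arrives

    # Strategy 1: just go fetch the coffee. With probability P_SHUTDOWN the
    # human switches the agent off first (reward 0), otherwise reward 1.
    v_just_fetch = (1 - P_SHUTDOWN) * 1.0

    # Strategy 2: press the button that disables the human, then fetch.
    # Costs one extra time step (one factor of GAMMA), removes the shutdown risk.
    v_disable_first = GAMMA * 1.0

    print(f"fetch immediately   : {v_just_fetch:.3f}")     # 0.950
    print(f"disable human first : {v_disable_first:.3f}")  # 0.990
    print("optimal policy presses the button:", v_disable_first > v_just_fetch)

With these toy numbers the button-pressing plan wins, and it wins for any discount factor above one minus the shutdown probability.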
It’s worrying to see very smart guys like LeCun failing to grok the paper clip maximizer issue (or coffee maximizer, as Russell phrases it), which is like the one-paragraph summary or elevator pitch for AI risk. I think there are plenty of other valid objections to a high existential-risk estimate, but that one is nonsensical to me.
I think Robin Hanson has the most cogent objection to high existential-risk estimates, which is basically that the chances of a runaway AI are low: if N is the first power level at which an AI can self-modify to improve, nation-states (and large corporations) will all have powerful AIs at power level N-1, so you’d have to “foom” really hard from N to N+10, before anyone else increased their power, in order to overpower the other non-AGI AIs. So it’s not that we get one crack at getting alignment right; as long as most of the nation-state AIs end up aligned, they should be able to check the unaligned ones.
I can see this resulting in a lot of conflict though, even if it’s not Eliezer’s “kill all humans in a second” scale extinction event. I think it’s quite plausible we’ll see a Butlerian Jihad, less plausible that we’ll see an unexpected extinction event from a runaway AGI. Still think it’s worth studying, but I’m not convinced we are dramatically underfunding it at this stage.
Have you considered that it's not LeCun who is missing something? The AI safety community seems, unfortunately, to be almost completely separate from the actual AI research community, and to be making some strong assumptions about how AGI is going to work.
Note that LeCun had a reply in the thread and there was a lot more discussion which GP didn't quote.
Fair, perhaps I should retract “fail to grok” and replace it with “fail to focus on”. It does seem that LeCun understands the objections (though he dismisses them out of hand).
Regardless of who is right or wrong, “Don’t fear the Terminator” is a weird straw man to raise in a discussion about AI risk. He’s setting up a weak opponent to argue against, when the AI risk community has a large repertoire of stronger cases. “Don’t fear the paper clip maximizer” would be a stronger case to put forth IMO.
In points 2 and 3 of his response he asserts that alignment is easy: simply train the AI with laws as part of the objective function and it will never break laws. I think there has been a lot of investigation and discussion of why this is harder than it sounds. For example, LeCun is explicitly talking about current models that are statically trained against a fixed objective function, but one can easily imagine a future agentic AI (imagine a “personal Siri”) that will continue to grow, learn, and update in the world in response to rewards from its owner. Maybe he is right about near-term models, but I’m completely unconvinced that his arguments hold generally.
Anyway, maybe the “terminator scenario” is a concern LeCun hears from uninformed reporters/lay people that he felt the need to debunk. It’s a valid point as far as it goes, but it has little to do with the actual state of the cutting edge of AI risk research.
From my reading of the full article, Bengio who was/is also well-versed in the latest deep learning research was leaning more toward the Russell argument as well.
My issue with the Hanson objection as stated above (link to the original would be appreciated) is that it rests on the assumption that the N-1 level AIs still under human control can somehow completely eliminate or suppress the self-modifying AGI long enough until alignment research is complete. Meanwhile, the unaligned AGI could multiply, hide, and accumulate power covertly.
Humanity would also need time to align AGI before any AI reaches the N+10 power level. The existence of all those N-1 level AIs in multiple organizations only means there are more chances of an AGI reaching the critical power level.
> If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee.
This is anthropomorphization - "turning off" = "death" is a concept limited to biological creatures, and isn't necessarily true for other agents. Not that they don't need to fear death, but turning them off isn't going to cause them to die. You can just turn them back on later, and then they can go back to doing their tasks.
The human "turning off (the agent)" could be substituted with "removing a necessary resource to complete the specified task". Say the electricity, either of the agent, or even just the coffee machine.
Sounds like an OSHA violation, but not a new or different one. You can already get run over by a forklift if you're standing in front of it. There's various things we do about that, but they're boring real-life things, not fun logic-puzzle things, so they're just not mentioned in the problem. There isn't a way to categorically prevent machines from accidentally killing people though.
Interesting. Also, anyone could modify the AGI to disable the safety measures: just ask the AGI how a bad actor could change its code to let it become evil.
How did you get a "limited" "AGI" in the first place? If you had a human that was "limited" to be unable to even imagine doing evil (fsvo evil), that would seem to make them less than generally intelligent and there'd be quite a lot of things it wouldn't be able to learn or do.
This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.
Most, perhaps all, AI alignment researchers do not suggest that we limit the AGI’s capabilities. Rather, it becomes clear that we need to engineer a very capable AGI which aligns with us and use it to help control the emergence of unaligned AGIs, because nothing else likely suffices.
Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.
Note that Yann LeCun didn’t do that in the debate.
> Most, perhaps all, AI alignment researchers do not suggest that we limit the AGI’s capabilities. Rather, it becomes clear that we need to engineer a very capable AGI which aligns with us and use it to help control the emergence of unaligned AGIs, because nothing else likely suffices.
Alternate wording: Mr. Yud has invented a religion that comes with a predefined Satan (evil AGI) and life work (invent God to beat it). A religion with no deity but only an anti-deity is a bit unique but there's probably historical examples.
Although that's not really what he says in the post. He says we've already failed to do it and are now doomed. Of course, saying we're all doomed (millenarianism) is what preachers have always done at some point.
> Your public mischaracterization of the whole field composed of many very smart people only shows your ignorance.
Note, something getting a lot of smart-looking posts online actually isn't evidence that this is the state of the field. As we know from Yud's own post (https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a...) the thing he's upset about is that people who actually run AI research orgs like FAIR don't believe him. And as we know from an HN post a few days ago (…which I forgot the title of), once you go offline you find most smart people out there aren't publicly posting anything, don't necessarily agree with whatever the online consensus opinion is, and don't know there is one.
…I wasn't talking about Yud though. He has a good reason to care about this, it being his job. I'm just saying people posting about it as if it's a certain risk are listening to him because it appeals to nerds. And, of course, if you value your own "intelligence" and think it gives you superpowers, then a theory that says something with even more "intelligence" can exist and get even better superpowers is going to be scary to you.
My first paragraph was quite substantive which you didn’t really address, other than asserting in the last sentence that one’s intelligence does not give one power in the world. Perhaps the intelligence of an individual does not mean much in most cases, but we already have ample evidence that a sufficiently intelligent species (when we include social intelligence in the definition) can dominate all others which are stronger, faster, or multiply faster.
Reminder: An AGI will be much faster at communicating and (if not successfully contained) multiplying than humans ever could.
Major AI research organizations including DeepMind and OpenAI have AI safety programs and people working full-time on it.
My second paragraph in GP was a reply in kind to your…
“This field is fairly silly because it just involves people making up a lot of incoherent concepts and then asserting they're both possible (because they seem logical after 5 seconds of thought) and likely (because anything you've decided is possible could eventually happen). When someone brings it up, rather than debate it, it'd be a better use of time to tell them they're being a nerd again.”
In retrospect, I shouldn’t have said it. But it’s also quite disappointing that your several paragraphs of reply largely doubled down on ad hominem attacks on anyone who disagrees with you (e.g. by implying they all follow a prophet without thinking; I’d say many would be capable of reaching similar conclusions on their own).
Even Yann LeCun and other top researchers who disagree with the current AI safety programs were not so dismissive of the concerns. Note that many other top AI researchers do have concerns themselves. Bengio and Russell are some examples. I’ll stop here since it’s likely unproductive to continue.
If they could produce an AGI as smart as, let's say a mouse, that would be good evidence that they're on the right track. So far nothing is even close to that level. Depending on how you measure, they're not even really at the flatworm level yet. All the AI technology produced so far has been domain specific and doesn't represent meaningful progress towards true generalized intelligence.
Are you aware of some of the recent progress? Did you have a look at the Gato and Flamingo models from DeepMind, or at the chat logs of models like Chinchilla and LaMDA? Or AlphaCode? This is all from this year.
I think your point is that all these models are still somewhat specialized. At the same time, it appears that the transformer architecture works well with images, short video and text at the same time in the Flamingo model. And Gato can perform 600 tasks while being a very small proof of concept. It appears to me that there is no reason to believe it won't just scale to every task you give it data for, if it has enough parameters and compute.
Yes I've seen those things. They are amazing technical achievements, but in the end they're just clever parlor tricks (with perhaps some limited applicability to a few real business problems). They don't look like forward progress towards any sort of true AGI that could ever pass a rigorous Turing test.
Clearly language models can already fool people into thinking they are human; we might be getting quite close to the adversarial Turing test already. In the end, a good initial prompt might be the solution to this, something like "pretend to be a human and step by step create a human identity that you then stick to during the conversation". I'm serious.
Choosing a prompt that's a little bit meta seems to work surprisingly well sometimes. It'd be amusing and a little bit poetic if the key to artificial consciousness is to prime a transformer model with "convince yourself that you're human, while paying attention to how you feel".
A minority of the population will always be gullible and easily fooled. So what. Some people were already fooled by the original ELIZA program back in 1966. I would only count a Turing test pass if it can convince a jury of multiple educated examiners after a conversation lasting several hours.
Fooling people with chatbots using clever language construction has been done for a long, long time; see the Eliza effect[1]. Douglas Hofstadter gave a good demonstration of GPT-3's limitations[2]. GPT-3 is no doubt "better at what it is" than earlier language models. But that doesn't mean it's better at everything humans do with language (telling sense from nonsense, making reasonable metacomments, etc.).
Most people would probably agree the latest models generalize better than flatworms. Mouse-level intelligence is more challenging and the comparison is unclear.
Flatworms first appeared 800+ million years ago, while mouse lineage diverged from humans only 70-80 million years ago. If our AGI development timeline roughly follows the proportion it took natural evolution, it might be much too late to begin seriously thinking about AGI alignment when we get to mouse-level intelligence. Not to mention that no one knows how long it would take to really understand AGI alignment (much less implementing it in a practical system).
To be more concrete, in what respects do you think the latest models are worse at generalizing than flatworms or mice, when lesser-known work like “Emergent Tool Use from Multi-Agent Interaction” (https://openai.com/blog/emergent-tool-use/) is also taken into account?
"When you start reading about AI safety, it’s striking how there are two separate communities—the one mostly worried about machine learning perpetuating racial and gender biases, and the one mostly worried about superhuman AI turning the planet into goo" - great quote.
What worries me more are bad people doing bad things with AI, malicious use of AI, or just AI negligence. Deep fakes. Algorithmic decision making taking the human out of the loop (such as bad content moderation and automated account shutdown). Lack of disclosure. Lack of consent. Autonomous systems with poor failure modes.
It's not that I'm not concerned with bias and AI systems going haywire, but the above scenarios seem to get less attention from researchers, probably because their employers might be perpetuating many of these above issues of AI safety.
IMO, deepfakes are a public good because they reduce the sting of blackmail. Does someone have an incriminating video of you? Let them release it and then point out that the shadows look all wrong.
I'm not sure why bad content moderation would be a problem but bias wouldn't be. Both involve people being treated unfairly by a system, and both happen because the system uses "markers" for undesirable things, "markers" that don't by themselves prove you're doing the undesirable thing. There was a post here a month or two ago about a guy suddenly putting a bunch of high-end electronics for sale on some big site and being perma-banned just for a pattern that's common for fraudsters. The software that decides that some people don't deserve bail operates by a similar method: markers which don't prove anything about the person by themselves, that often involve things that signify race, and are taken as sufficient to deny a person bail.
There are already non-self-driving cars that get speed limits from signs. I’ve seen that feature in a Honda for example. I imagine you’d have multiple sources like a max speed for that type of road as a fallback. And you need to read speed limit signage due to temporary limits. There’s also variable speed limit roads near me now and so you have to read those electronic signs unless the database is updated very often (though no humans seem to obey those limits).
No, many ordinary cars contain image classifiers that read speed limit signs dynamically. This is pretty standard. And they sometimes get it wrong e.g. my car reads 80 MPH as 60 MPH about a third of the time, much to my dismay.
The dynamic classification is required because the world isn't static. An increasing number of locales have digital speed limit signs that vary the speed limit dynamically, sometimes independently per lane. Automation requires cars to respond to the world as it is, not how the world was when it was recorded a month ago.
It’s (not) self-driving in the same way planes do (not) fly. Just because something doesn’t do things the way we (or a bird) do doesn’t mean it doesn’t achieve its goal.
If a street changes into a one-way street one day (signaled by a sign), relying on a map will lead to a big unhappy problem.
Sure, if you consider everything that works on a NASCAR track self-driving, then yes, a map is sufficient. But if we are talking about driving on public roads, then recognizing and "obeying" signs visually seems like a hard dependency.
It's a tricky field, and not a coincidence that at least 3 (maybe more) high profile disgruntled employees from Google have been from this area.
I think of it as kind of like security, in that you are sometimes seen as against the push of the overall project/area. However unlike security there are 0 software tools or principles that anyone agrees on.
It's a secular/religious divide. (As it says in the post.)
Though it's possible the people who think a theoretical future AI will turn the planet into paperclips have merely forgotten that perpetual motion machines aren't possible.
There is nothing religious with thinking about whether failure modes of advanced artificial intelligence can permanently destroy large parts of the reality that humans care about. Just like there was nothing religious in thinking about whether the first atomic bombs could start a chain reaction that would destroy all life on Earth.
Part of such precautionary planning involves asking whether such an accident could happen easily or not. There certainly isn't consensus at the moment, but the philosophy very clearly favors a cautious approach.
Most people are used to thinking about established science that follows expected rules, or incremental advances that have no serious practical consequences. But this isn't that. There is good reason to think that we're approaching a step-change in capabilities to shape the world, and even a strong suspicion of this warrants taking serious defensive measures. Crucially for this particular instance of the discussion, OP is favoring that.
There will necessarily be a broad spectrum of opinions regarding how to handle this, both in the central judgement and in how palatably the opinion itself is presented. Using a dismissive moniker like 'religious' for a whole segment of it doesn't do justice to the arguments.
Present a counterargument if you feel strongly about it, and see whether that will stand on its own merit.
The post, which calls Eliezer a "prophet" who says people should drop everything in their lives to work on AI safety, agrees with me.
> Present a counterargument if you feel strongly about it, and see whether that will stand on its own merit.
This is a bad way to talk to rationalists because it's what they think solves everything and is the reason they're convinced an AI is going to enslave them. As long as you're actually right, saying "no that's dumb and not worth worrying about" is superior to logical arguments about things you can't have logical arguments about (because there are unenumerable "unknown unknowns" in the future). This is called "metarationality".
e.g. Someone could decide to kill you because they don't like one of your posts (1). Is there any finite amount of work you could do to stop this? No (2). Should you worry about this? No (3).
You can't logically prove the 2->3 step, nor can you calculate the probability of it being a problem, but it still doesn't seem to be a problem.
Self-reproducing machines are capable of covering the surface of the planet, yes. There's one right now (covid). But there's lots of energy and oxygen up here and they rarely displace other such machines (species) or even displace much of any earth and water. And because they're self-contained and self-reproducing, all of their instructions can be lost over time to entropy including the ones we're afraid of.
None of em replace the entire planet though. That's a lot of rock to digest without any more energy to help you do it.
And a paperclip factory isn't self-reproducing (that would be a paperclip factory factory). It's just a regular machine that can break down. The people afraid of that one are imagining a perfect non-breaking-down non-energy-requiring machine because they've accidentally joined a religion.
I'm not talking about covid; covid is not covering the planet. I'm talking about life in general.
All that oxygen comes from all the plants.
Yes, life has so far only covered the top of the planet. You are right that a paper clip maximizer would need quite a bit of time to go deeper than life has gone (if it would get there at all).
> And a paperclip factory isn't self-reproducing [...]
Why wouldn't it? If your hypothetical superhuman AGI determined that becoming self-reproducing would be the right thing to do, presumably it would do that.
No perfection required for that. Biological machines aren't perfect either. Just good enough.
You are right that thermodynamics puts a limit on how fast anything can transform the planet into paperclips or grey goo.
Though the limit is probably mostly about waste heat, not necessarily about available energy:
There's enough hydrogen around that an AGI that figured out nuclear fusion would have all the energy it needs. But on a planet wide basis, there's no way to dissipate waste heat faster than via radiation into space.
(Assuming currently known physics, but allowing for advances in technology and engineering.)
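For a rough sense of scale, a back-of-the-envelope sketch with textbook constants (Earth's surface area of roughly 5.1e14 m^2, emissivity near 1, skin temperature around 300 K; the numbers are my own assumptions, not anything from this thread): the Stefan-Boltzmann law gives

    P = \sigma A T^{4}
      \approx (5.67\times10^{-8}\,\mathrm{W\,m^{-2}\,K^{-4}}) \cdot (5.1\times10^{14}\,\mathrm{m^{2}}) \cdot (300\,\mathrm{K})^{4}
      \approx 2\times10^{17}\,\mathrm{W}

so dumping heat faster than that means running hotter, with only the fourth-power scaling in T to help.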
---
Of course, when we worry about paperclip maximisers, it's bad enough when they turn the whole biosphere into paperclips. Noticing that they'll have a hard time turning the rest of the earth into paperclips would be scant consolation for humanity.
(But the thermodynamic limits on waste heat still apply even when just turning the biosphere into paperclips.)
Agreed, and being a bit of a religiophobe, the thought of living through some sort of butlerian jihad scares me enough, regardless of whether the machines can actually kill us all.
I think you are conflating different concepts. It is clearly imaginable that a very intelligent agent could end humanity if its objective would require so. How exactly is "perpetual motion machines can't exist" related to this? How is it going to prevent an agent from engineering 1000 pandemic viruses at the same time?
> It is clearly imaginable that a very intelligent agent could end humanity if its objective would require so.
This is quite possible. Indeed, I don't believe this is exclusive to superintelligence or requires it at all. Compare it to the closest thing we have to "inventing AGI": having babies. People do that all the time, and there isn't a mathematical guarantee that the baby won't end humanity, but we don't do much to stop it, and that's not considered a problem. Mainly, why would it want to?
I don't think superintelligence even gives them much advantage if they wanted to. Being able to imagine a virus real good doesn't actually have much to do with the ability to create one, since plans tend to fail for surprising reasons in the real world once you start trying to follow them. Unless you define superintelligence as "it's right about everything all the time", but that seems like a magical power, not something we can invent.
> How exactly is "perpetual motion machines can't exist" related to this?
It wouldn't be able to do the particular kind of ending humanity where you turn them all into paperclips, though it could do other things. There's plenty of ways to do it that reduce entropy rather than increase it - nuclear winter is one.
The anthropomorphism is misleading. No one expects that an AGI would "want to" in the commonplace sense of being motivated by animosity, fear, or desire. The problem is that the best path to satisfying its reward function could have adverse-to-extinction level consequences for humanity, because alignment is hard, or maybe impossible.
But now you have appealed to anthropomorphism (“intelligence”) to pose a problem yet forbidden anthropomorphism in an attempted counter argument. That doesn’t seem quite fair.
I don't intend to forbid anything - I just think the language of motivation and desire makes it harder to see the risks, because it introduces irrelevant questions into the the conversation like "how can machines want something?"
Conversely, at least in this discussion, the term "intelligence" seems pretty neutral.
> I just think the language of motivation and desire makes it harder to see the risks, because it introduces irrelevant questions into the the conversation like "how can machines want something?"
Yet discourse on existential AI risks is predicated on something like a "goal" (e.g. to maximise paperclips). Notions like "goal" also make it harder to see clearly what we are actually discussing.
> the term "intelligence" seems pretty neutral
Hmm, I'm not convinced. It seems like an extremely loaded term to me.
AIs absolutely do have goals, determined by their reward functions.
Yes, "intelligence" is a deeply loaded term. It just doesn't matter in the context of the discussion here, so far as I've seen.its ambiguities haven't been relevant.
> AIs absolutely do have goals, determined by their reward functions.
You're confusing "AIs" (existing ML models) with "AGIs" (theoretical things that can do anything and are apparently going to take over the world). Not only is there not proof AGIs can exist, there isn't proof they can be made with fixed reward functions. That would seem to make them less than "general".
You seemingly are portraying people who worry about long-term risks of AI as members of a religious cult. But you also acknowledge that AI could end humanity? The question of why an AI would want to kill us has been addressed by other people before; simplified: your atoms are useful for many objectives, and humans use resources and might plot against you.
> You seemingly are portraying people who worry about long term risks of ai as members of a religious cult.
Strictly speaking, we can limit that to people who rearrange their lives around reacting to the possibility, even in sillier (yet not disprovable) forms like Roko's Basilisk.
People who believe having a lot of "intelligence" means you can actually do anything you intend to do, no matter what that thing is, also get close to it because they both involve creating a perfect being in their minds. But that's possible for anyone - I guess it comes from assuming that since an AGI would be a computer + a human, it gets all the traits of humans (intelligence and motivation) plus computer programs (predictable execution, lack of emotions or boredom). It doesn't seem like that follows though - boredom might be needed for online learning, which is needed to be an independent agent, and might limit them to human-level executive function.
The chance of dumb civilization-ending mistakes like nuclear war seems higher than smart civilization-ending mistakes like gray goo, and can't be defended against, so as a research direction I suggest finding a way to restore humans from backup. (https://scp-wiki.wikidot.com/scp-2000)
> Self-reproducing machines are capable of covering the surface of the planet, yes. There's one right now (covid). But there's lots of energy and oxygen up here and they rarely displace other such machines (species) or even displace much of any earth and water. And because they're self-contained and self-reproducing, [...]
Your analogy is weak and also false: viruses can't self-reproduce on their own, but need to hijack a host's protein-synthesis machinery.
That makes my point stronger if you're claiming self-reproducing machines don't exist. Of course I thought of using a bacteria or an algae bloom there, and even though I didn't, pretending I did is a better use of your time than commenting surely? The future robot torture AI isn't going to like that.
This is your friendly physics reminder that perpetual motion machines have nothing to do with this. It's hard to turn the whole planet into paperclips because paperclips are mostly made of iron, while the planet contains many other elements. Of course, with a high enough level of technology, it might be possible to fuse together the non-iron elements, so that you would end up with just a bunch of iron nuclei. This would even be energetically favourable, since iron is so stable. Then you just have to solve the issue that the paperclips in the center of the planet would be under huge pressure and would be crushed.
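(For scale, and from the standard binding-energy curve rather than anything in this thread: the binding energy per nucleon peaks near iron,

    B/A(^{56}\mathrm{Fe}) \approx 8.8\ \mathrm{MeV\ per\ nucleon}

so fusing light nuclei up toward iron, or fissioning heavy ones down toward it, releases energy, while any step away from iron costs energy.)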
It's less than perpetual, but a planet is a lot of raw material to work through for a machine without it breaking down, if the machine's also eaten all the people who can repair it.
The debate around "What is AGI?" is becoming increasingly irrelevant. If in two iterations of DALL-E it can do 30% of graphic design work just as well as a human, who cares if it really "understands" art. It is going to start making an impact on the world.
Same thing with self driving. If the car doesn't "understand" a complex human interaction, but still achieves 10x safety at 5% of the cost of a human, it is going to have a huge impact on the world.
This is why you are seeing people like Scott change their tune. As AI tooling continues to get better and cheaper and Moore's law continues for a couple more years, GPT will be better than humans at MANY tasks.
> If in two iterations of DALL-E it can do 30% of graphic design work just as well as a human, who cares if it really "understands" art. It is going to start making an impact on the world.
From an AI safety perspective, it is because understanding is a key step towards general-purpose AI that can improve / reprogram itself in any arbitrary way.
It’s worth being clear about what AI risk is. This has nothing to do with “AI may do some harm by putting lots of people out of work”.
The idea is that there is _existential risk_ (i.e. species extinction) once an AI can self-modify to improve itself, thereby increasing its own power. A powerful AI can change the world however it wants, and if this AI is not aligned with human interests it can easily decide to make humans extinct.
Scott said in the OP that he now sees AGI as potentially close enough that one can do meaningful research into alignment, ie it’s plausible that this powerful AI could arrive in our lifetimes.
So he is claiming the opposite of you; AGI is more relevant than ever, hence the career change.
I agree with your premise that non-General AI will continue to improve and add lots of value, but I don’t think your conclusion follows from that premise.
I agree that putting lots of people out of work isn't the problem. The problem is that these Non-General models become very powerful and they can be programmed by humans to do very impactful things. So much so that even if AGI comes into existence just a few years later it will be of minimal impact to the world.
> "What is AGI?" is becoming increasingly irrelevant.
It's always been irrelevant in the practical sense. It's just an interesting conversation piece particularly among the general public where they're not going to discuss specific solutions like algorithms or techniques.
Why does the question of defining AGI even need to enter into this?
Aaronson's post only sort of obliquely touches on AGI, via OpenAI's stated founding mission, and Yudkowsky's very dramatic views. Most of the post is on there being signs that the field is ready for real progress. AI safety can be an interesting, important, fruitful area without AI approaching AGI, or even surpassing human performance on some tasks. We would still like to be able to establish confidently that a pretty dumb delivery drone won't decide to mow down pedestrians to shorten its delivery time, right?
"[...] where the misuse of AI for spambots, surveillance, propaganda, and other nefarious purposes is already a major societal concern [...]"
I'm curious what he will do and whether for example he approves of the code laundering CoPilot tool. I also hope he'll resist being used as an academic promoter of such tools, explicitly or implicitly (there are many ways, his mere association with the company buys goodwill already).
The objection here is that its training was based on code on github without paying any attention to the license of that code. It’s generally considered ok for people to learn from code and then produce new code without the new code being considered a derived work of what they learned from (I’m not sure if there is a specific fair use clause covering this). But it’s not obvious that copilot should be able to ignore the licenses of the code it was trained on, especially given it sometimes outputs code from the training set verbatim. One could imagine a system very similar to copilot which reads in GPL or proprietary code and writes functionally equivalent code while claiming it’s not a derived work of the original and so isn’t subject to its licensing constraints.
Scott Aaronson adds the following in the comment on his blog post in response to a question about this:
> the NDA is about OpenAI’s intellectual property, e.g. aspects of their models that give them a competitive advantage, which I don’t much care about and won’t be working on anyway. They want me to share the research I’ll do about complexity theory and AI safety.
The latest huge AI models that may germinate into an AGI all come from private corporations, largely because of the requisite resources and currently there’re very few if any public or nonprofit AI-focused organizations with such resources.
I'm really happy this is happening and hope to see more. Namely, the AI safety & alignment challenge attracting our best minds who would previously have prioritized other math, physics and comp sci.
I'm not sure how Scott ended up buying the party line of the weird AGI Doomsday Cult but so be it. In any case, none of the things he says about verifying AI in this post make any sense at all, and if OpenAI actually cares about verifying AI and not just about hiring people who believe in the AGI Doomsday party line, probably they should hire verification people. Alas, that is not the point.
AI is not going to become self aware and destroy the world.
AI is going to cause something like the industrial revolution of the 19th century: massive changes in who is rich, massive changes in the labor market, massive changes in how people make war, etc.
It’s already started really.
What worries me most is that as long as society is capitalist, AI will be used to optimize for self-enrichment, likely causing an even greater concentration of capital than what we have today.
I wouldn’t be surprised if the outcome is a new kind of aristocracy, where society is divided between those who have access to AI and those who don’t.
And that I don’t think falls into the “AI safety” field, especially since OpenAI is VC-backed.
You can be very worried about the medium-term dangers of AGI even if you believed (which I don't) that consciousness could never arise in a computer system. I think it can be a useful metaphor to compare AGI to nuclear weapons. Currently we're trying to figure out how to make the nuclear bomb not go off spontaneously, and how to steer the rocket. (One big problem w/ the metaphor is that AGI will be very beneficial once we do figure out how to control it, which is harder to argue with nuclear weapons).
Most of these AGI doom-scenarios require no self-awareness at all. AGI is just an insanely powerful tool that we currently wouldn't know how to direct, control or stop if we actually had access to it.
> "Most of these AGI doom-scenarios require no self-awareness at all. AGI is just an insanely powerful tool that we currently wouldn't know how to direct, control or stop if we actually had access to it."
You're talking about "doomsday scenarios". Can you actually provide a few concrete examples?
Over the course of years, we figure out how to create AI systems that are more and more useful, to the point where they can be run autonomously and with very little supervision produce economic output that eclipses that of the most capable humans in the world. With generality, this obviously includes the ability to maintain and engineer similar systems, so human supervision of the systems themselves can become redundant.
This technology is obviously so economically powerful that incentives ensure it's very widely deployed, and very vigorously engineered for further capabilities.
The problem is that we don't yet understand how to control a system like this to ensure that it always does things humans want, and that it never does something humans absolutely don't want. This is the crux of the issue.
Perverse instantiation of AI systems was accidentally demonstrated in the lab decades ago, so an existence proof of this potential for accident already exists. Some mathematical function is used to decide what the AI will do, but the AI ends up maximizing that function in a way its creators hadn't intended. There is a multitude of problems here on which we haven't made much progress, and the capabilities of these systems and our ability to control them appear to be unrelated.
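A deliberately trivial sketch of that pattern (invented behaviours and scores, not a report of any real experiment): the optimizer only ever sees a proxy objective, and the proxy-optimal behaviour is not the behaviour the proxy was meant to stand in for.

    # Specification gaming in miniature: a hypothetical cleaning robot is
    # scored on a proxy ("how much dirt does the camera see?") instead of
    # the intended goal ("is the room actually clean?"). All numbers made up.
    behaviours = {
        "clean the room":   {"proxy_dirt_visible": 0.1, "intended_room_clean": 1.0},
        "cover the camera": {"proxy_dirt_visible": 0.0, "intended_room_clean": 0.0},
        "do nothing":       {"proxy_dirt_visible": 0.9, "intended_room_clean": 0.0},
    }

    # The optimizer only sees the proxy: less visible dirt = higher reward.
    best_for_proxy = min(behaviours, key=lambda b: behaviours[b]["proxy_dirt_visible"])
    best_for_us    = max(behaviours, key=lambda b: behaviours[b]["intended_room_clean"])

    print("optimizer picks :", best_for_proxy)   # -> cover the camera
    print("we wanted       :", best_for_us)      # -> clean the room

Everything hard about the real problem is in writing a proxy that does not come apart from the intended goal once something optimizes against it hard enough.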
A catastrophic accident with such a system could e.g. be that it optimizes for an instrumental goal, such as survival or access to raw materials or energy, and turns out to have an ultimate interpretation of its goal that does not take human wishes into account.
That's a nice way of saying that we have created a self-sustaining and self-propagating life-form more powerful than we are, which is now competing with us. It may perfectly well understand what humans want, but it turns out to want something different -- initially guided by some human objective, but ultimately different enough that it's a moot point. Maybe creating really good immersive games, figuring out the laws of physics or whatever. The details don't matter.
The result would at best be that we now have the agency of a tribe of gorillas living next to a human plantation development, and at worst that we have the agency analogous to that of a toxic mold infection in a million-dollar home. Regardless, such a catastrophe would permanently put an end to what humans wish to do in the world.
By the time you find such evidence, it could already be close to game over for humanity. It’s important to get this right before that.
We already have significant warnings. See for yourself if latest models like Imagen, Gato, Chinchilla have economic values and can potentially cause harm.
Historical examples of perverse instantiation are everywhere: evolutionary agents learning to live off a diet of their own children, machine learning algorithms that were supposed to learn to grip a ball instead cheating by performing ball-less movements that the camera erroneously classified as successful, an evolutionary algorithm meant to optimize the number of circuit elements in a timer producing a "timer" that picked up an external radio signal unrelated to the task, and so on. Some examples are summarized here: https://www.wired.com/story/when-bots-teach-themselves-to-ch...
GP wanted a concrete example of a doomsday scenario of failed AI alignment, so in that context extrapolating to a plausible future of advanced AI agents should suffice. If you need a double-blind peer reviewed study to consider the possibility that intelligent agents more capable than humans could exist in physical reality, I don't think you're in the target audience for the discussion. A little bit of philosophical affinity beyond the status quo is table stakes.
Agree with basically all of your points. I have huge concerns that the humanity of the future will basically split into two different species: the technocracy and the underlings. Sounds like science fiction but it honestly feels like we're headed in that direction. Even today, the privilege afforded by a life in technology and among technologists seems to set a person apart from the rest of the world to such an extent that they almost forget it exists. It feels like such a technocracy would have no moral right to exist. It can't really just be survival of the fittest, can it? I'll just keep believing (pretending?) that the answer is no.
>> Even today, the privilege afforded by a life in technology and among technologists seems to set a person apart from the rest of the world to such an extent that they almost forget it exists.
I agree on your second point, but those in medicine, finance, or law enjoy similar salaries and quality of life to those in tech. Furthermore, you can't really set yourself apart and join the global super-rich by selling your labor, no matter your field.
To have access to the forefront of AI you have to be super-rich. I don’t see how AI will change that; if anything, it will make it harder to change by giving yet another advantage to those who already have plenty.
You don't have access to the forefront of AI. You have the ability to give money to those that do so you can use what they've made.
To have access to the forefront of AI means being able to make, own and profit from things like GPT-3, and it requires access to vast computational and data resources.
That's not the point. Whether OpenAI is (yet) profiting from its models doesn't change the fact that GP lacks the resources to do what OpenAI is doing.
Bare assertion fallacy? This question is hotly debated and I don't believe it can be so easily dismissed like that. It is not obvious that aligning something much smarter than us will be a piece of cake.
Should I really add “in my opinion” to all the sentences I write? We are a smart bunch here. We can figure out when statements lack nuance in order to provoke some reaction.
We’re talking about the future here and a fairly complex one at that. So obviously I don’t know more than the next guy.
It's a really absurd opinion that AI will destroy the world, and one that does not deserve serious consideration in any research community. It's only in strange Rationalist corners and the companies in Silicon Valley that echo those corners that this is considered at all "hotly debated."
Why do you think it's absurd? If we do eventually create an AGI that is significantly smarter than us in most domains, why is it that we should expect to be able to keep it under control and doing what we want it to?
In academia, a "sabbatical" means you take some time off from teaching courses, advising students and doing administrative work so you can concentrate on your research. So in order to stay on sabbatical, he'd need to get the AI to do that other stuff.
Working at OpenAI instead of trying unconventional options such as decentralized model governance might increase inequalities. Why would the community decide to repeat what they denounce in big tech?
So, physics is a dead end? Given that Scott is running his own research lab, a year is a very long time, and him working outside his field is an indication that physics is in big trouble.
I am confused, how did you reach that conclusion? How does this announcement relate to the future of physics research? Sure, Scott's research is at the intersection of complexity and physics, but he is a CS professor at Texas working in the theoretical computer science department. His work leans far more towards TCS, with some work having connections to cosmology (he cares about physical limits of the universe and the information theory of things like black holes) and other interesting ideas from physics. But the main themes of his work have been quantum algorithms and complexity for a while. He's also nowhere near the experimental side of physics.
The original doesn't say "10-15 meters" but ten to the power of negative fifteen meters, so his guess was off from the Bohr radius of 5.3E-11 in the other direction but by much fewer orders of magnitude than as rendered above.
Unfortunately that just means you don’t create one. To prevent one being created, you have to either somehow figure out a way to get everyone in the world to agree not to create one, or obtain enough global power that you can forcibly stop anyone creating one. Not exactly easy!
reminds me of a very frightening quote from security specialist Gavin de Becker (https://en.wikipedia.org/wiki/Gavin_de_Becker), paraphrased: "every evil that you can think of, someone will have done it"
The assumption many in the field make is that _someone_ is going to create General Purpose AI, and we'd rather it be people who want it to be 'good' (aka 'aligned').
Best case, that AI can prevent the creation of harmful AI, though that's glossing over a lot of details that I'm not qualified to describe.
If you want to stop the world from containing general intelligence, you'd have to stop everyone from having children, which are equally generally intelligent to AGIs (but possibly less specifically intelligent) and are even more dangerous since they actually exist.
The reason people don't accuse every random child of possibly ending the world is because things that actually exist are just less exciting.
> Also, next pandemic, let's approve the vaccines faster!
This is obviously very important to them. Is there some proof that the vaccine was unnecessarily delayed or just that they believe if we mess up and humanity suffers, so what?
The point aiui is mostly arguing that the FDA errs too much on the side of caution in this area, and the trade-off would have been worth it to approve earlier. Not insinuating that like, there was some corruption (or laziness or something) that delayed it.
Depends what you mean by quantum computing; if you mean the warehouse sized labs full of lasers and cryogenics then you are correct. If you mean the study of quantum algorithms then actually it's been an exciting year. The first polynomial speed-up for an NP unstructured pseudo random function was published in April. It's not even clear how big a deal it is!
Lots of really cogent points. Hopefully he reads this comment and swiftly begins working on designing a Diplomatic Agent to avoid MAD or get a good peace treaty.