
However, to consider artificial intelligence to be a potential global catastrophe at all, let alone the single one requiring extra funding, is mostly unfounded. We currently can barely even define what the actual risk is, let alone how to mitigate it.

Although I happen to agree with you personally, I don't think we should commit the fallacy of assuming that because their position seems absurd to us, it comes from a place of bias or ignorance.

To the contrary, when I listen to Holden and other EA leaders talk, it's clear they've spent way more time thinking about this stuff than I have. He and the others are thoughtful and humble about exactly the questions you pose: how much should we weigh the "known good" done today vs the "potential good" done in the future, how confident should we be in our ability to predict the future, etc.

As one example, Open Phil has hired historians to conduct research on how well people have (in the past) been able to predict the future, precisely to inform their thinking in this regard.

Holden is also open about how his thoughts on the importance of AI alignment have changed over time.

Again, we can disagree with him (as I do). But we absolutely should not claim that because we disagree, his position must come from a place of thoughtlessness or bias.



It's perfectly reasonable for them to hold that view. The unfortunate thing is to insist that it's the most rational and/or correct view. To say that they're biased isn't an insult. We all have biases, and it's useful to recognize and admit them.


Could you link/cite where they insist that it's the Right Thing?


It's called "Effective Altruism"


That's an aspirational statement, not a claim to have attained perfection.


> I don't think we should commit the fallacy of assuming that because their position seems absurd to us, it comes from a place of bias or ignorance.

I heard a really smart person talk about AI and how to deal with it. His conclusion was a gigantic letdown. He was out of his element (he's an economist), but his conclusion was:

Either AI is going to be peaceful, or it's going to kill us and there is nothing we can do to stop it.

Maybe most civilizations end like this, but why not look for third options?


> Either AI is going to be peaceful, or it's going to kill us and there is nothing we can do to stop it.

Have you heard of the AI Box experiments?

http://yudkowsky.net/singularity/aibox/

The problem of containing a hostile AI does not seem to be particularly tractable to me.


My comment is going to be unpopular, and I'll admit my bias upfront: I think Yudkowsky is a crank, and is neither a psychology nor an AI expert (he's a self-proclaimed expert, but he's not actually engaged in academic research on AI, because of reasons).

His "experiment" is hard to control or reproduce, its goals are ill-defined, and its results are hidden (really, what sort of experiment hides its results and merely asks us to have faith that the result was positive?). He makes a lot of unwarranted assumptions, like "a transhuman mind will likely be able to convince a human mind" (why? Where is the scientific or psychological evidence that a superior mind must necessarily be able to convince inferior minds of arbitrary things? This is a huge, unwarranted assumption right there).

This kind of psychological experiment -- because that is what this really is, an experiment about psychology rather than about AI -- is really hard to conduct properly, its results hard to interpret and difficult to reproduce even for subject-matter experts, which Yudkowsky isn't. This one looks like it was designed by an amateur who happens to be a fan of sci-fi.


I am aware of one reproduction of the experiment; the goals seem pretty darn explicit, and its results are public. He has stated the rules of engagement, and has said that he did it "the hard way". If nothing else, one should at least be confident that Yudkowsky is honest.

His claim that "a transhuman mind will likely be able to convince a human mind" is what his experiment demonstrates, not what it assumes, and frankly it is absurd to make it sound like he has not repeatedly given justifications for the statement.

What actual misinterpretations or other issues are you worried about?


- What reproduction? What would you consider a successful reproduction, for that matter? If I told you I reenacted the experiment at home with a friend, would you consider this a reproduction? Would someone on the internet saying they reproduced it convince you? What are your standards of quality?

- What is the goal of the experiment? Is the goal "show that a transhuman AI can convince a human gatekeeper to set it free"? Or is it actually "show that a human can talk another human into performing a task", or even "an internet (semi)celebrity can convince a like-minded person into saying they would perform a task of very low real-world stakes"? How would you tell these goals apart?

- The results are most definitely not public. What is public is what Yudkowsky claims the results were, but since the transcripts are secret and there are no witnesses, how do we know they are true (or, even without assuming dishonesty or advanced crankiness, how can we tell whether they are flawed)? Would you believe me if I told you I have a raygun that miniaturizes people, that I have tested it at home and it works, and that I have a (very small) group of people who will tell you what I say is true? No, I cannot show you the raygun or the miniaturized people, but I can tell you it was a success!

- "A transhuman mind will likely be able to convince a human mind" is what is stated as truth in the fictional conversation at the top of the AI-box experiment web page. Yudkowsky has repeatedly provided "justifications", but these are unscientific and unreasonable.

Yudkowsky claims that because a person can convince another person to claim they would perform a task (setting a hypothetical AI free), a "transhuman" mind is likely to be able to convince a human gatekeeper. The logical disconnect is huge. First, that people can convince other people of things is no big revelation. Unfortunately, it doesn't follow that because some people can convince other people of some things in certain scenarios, people can universally convince other people of arbitrary things in every context. Worse, we don't even know what a "transhuman" mind would be like; assuming it means "faster thoughts" (a random assumption), why would more thoughts per minute translate into a greater capacity to convince? Is it true, for that matter, that higher intelligence translates into a higher ability to convince others of stuff?

----

Another example of methodological flaws: in both runs of the experiment, the participants seem to be selected from a pool of people fascinated by this kind of question and open to the suggestion that a "transhuman" mind can convince them of stuff. Let's look at them:

First participant: Nathan Russell. Introduces himself as

> "I'm a sophomore CS major, with a strong interest in transhumanism, and just found this list."

He then shows interest in a similar experiment and considers how it could be designed. Note that the list itself, SL4, is for people interested in the "Singularity". Enough said.

Second participant: David McFadzean. Correctly claims the first experiment is not proof of anything, and is willing to take part in a second experiment. Later Yudkowsky describes him like this:

> "David McFadzean has been an Extropian for considerably longer than I have - he maintains extropy.org's server, in fact - and currently works on Peter Voss's A2I2 project."

The mentioned website still exists, and it has something to do with a Transhumanist Institute. I'm starting to see a pattern here.


The only experiment I know of and would consider a serious attempt at reproduction would be Tuxedage's series,

https://www.lesswrong.com/posts/FmxhoWxvBqSxhFeJn/i-attempte...

https://www.lesswrong.com/posts/dop3rLwFhW5gtpEgz/i-attempte...

https://www.lesswrong.com/posts/oexwJBd3zAjw9Cru8/i-played-t...

His total is 3 for 3. I do not know how to explain these results without either taking them to be honest attempts at a fair experiment or assuming those involved colluded. I find the latter absurd, given my priors about the honesty of members of LessWrong (Yudkowsky in particular, though he wasn't involved in the reproduction).

> If I told you I reenacted the experiment at home with a friend, would you consider this a reproduction? Someone saying they reproduced it on the internet would convince you?

It is not so simple. I would want evidence that you and your friend were smart and had a decent understanding of the domain, and that your friend was in a similar state of disbelief about the plausibility of being convinced. I would want a statement that it was a serious attempt at doing things "the hard way" and true to the experiment, on both sides, lest you get [1]. In addition, of course, I would want the standard rules, or a reasonably modified version of them stated publicly.

[1] https://pastebin.com/Jee2P6BD

> What is the goal of the experiment?

To show that "I can't imagine anything that even a transhuman could say to me which would change [my mind]" is not evidence, and should not be treated as such. To provide evidence that "humans are not secure systems".

You say "very low stakes", but Yudkowsky convinced someone who had offered a $5000 handicap. That hardly seems like a trivial quantity.

> [maybe it's all a lie]

You have to be very cynical to take this worldview.

> we don't even know what a "transhuman" mind would be like

The experiment is under the assumption of a true singularity, ergo nigh-unlimited intelligence. I can discuss what outcomes I think are likely for AI development, or which are merely plausible, but the experiment is about one particular hypothetical, so that would be a different conversation.

> the participants seem to be selected from a pool of people fascinated by this kind of questions and who would be open to suggestion that a "transhuman" mind can convince them of stuff

I am unconvinced that this experiment would work if the gatekeepers did not have an understanding of the topic; they are meant to play a gatekeeper, after all. A person who considers the singularity plausible but thinks an AI box is effective seems like the perfect control should the singularity happen and people want to figure out whether to AI box it.


But that's just it: I simply don't think Yudkowsky or any of the sort of people who would be enthusiastic about the sci-fi theories on SL4, or host extropy.org, or believe in Roko's Basilisk, or read Harry Potter fanfic and find it philosophically insightful, have a decent understanding of the AI domain. Everything about him and his followers smacks of fringe cultists completely outside mainstream research.

I don't think the chosen participants have a particularly deep understanding of the domain; they just think they do (because that's what defines Singularity believers, LessWrong readers, and people who believe they are hyper "rational" and that this is some kind of superpower). I think they understand AI no more than a Star Wars fan understands space travel.


Sure, I don't particularly care if you or anyone else wants to disengage from LessWrong-esque ideas because they sound weird. I only entered this discussion because it sounded like you might have had an actual argument.


This could be really cool, but the conversation was impossible to read. The linked website is badly designed.

How'd he let the AI out both times?


There have been other instances of the AI box experiment where the dialog is public, like this one: https://www.lesswrong.com/posts/fbekxBfgvfc7pmnzB/how-to-win...

Yudkowsky's original intention in not releasing the dialog was to prevent people from saying "I wouldn't have been swayed by that, therefore an AI escaping is impossible!". Even if we grant the first part of that sentence, the conclusion that an AI escaping is impossible doesn't follow. It's very much possible, and strong evidence for that is that merely human-level intelligences have escaped the same situation.


> and strong evidence for that is that merely human-level intelligences have escaped the same situation.

I don't know if this is the same situation, but laboratory beagles, farmed mink, etc. show that non-humans can persuade humans to release them from cages.


> How'd he let the AI out both times?

No one knows. That's the point.

An AI will eventually be much smarter than a human, so it doesn't matter how the human succeeds - it's enough to know that a human has succeeded even once.


Hmm, would like a different human to try it.

I read some of the dialog, and the user playing the gatekeeper talks openly about an inability to socialize that got him special social treatment until 10th grade.

What if there were two gatekeepers, North Korean style?


It seems pretty tractable to not build such a thing as a hostile AI.


You have to find every group of humans trying to build an AI, and either convince or force them not to do so.

This seems hard, given:

- (a) The cost of starting an AI venture is minimal (perfectly within reach for a small group or a smart individual with $10 million or less for salaries plus cloud computing expenses). It's a lot harder to keep tabs on small and non-obvious activity like this than, say, the enormous centrifuge facilities needed to refine uranium for nuclear weapons.

- (b) The personal and societal benefits (economic and otherwise) of advancing the state of the art in AI are potentially enormous. People will be motivated to try, whether their goal is to line their pockets, to better understand what intelligence is by trying to put it into machines, or to make the world a better place.

- (c) The problem domain is not well understood, so it's possible to stumble into a dangerous design by accident.

It seems like only some extreme dystopian scenario could halt the progress of technology to the point where an AI-related disaster becomes impossible.

For example, a global war or disaster that destroys civilization's logistical and technological bases, kills much of the world population and forces the survivors to focus mainly on survival. Or an ideological revolution that sees all the nerds of the world lined up against a wall and shot, and those who remain alive become culturally permanently uninterested in advancing technology to avoid the same fate. Or a world-dominating tyrannical government that keeps a close watch on every expert programmer and computer system in the world, monitoring to make sure unauthorized AI experiments don't take place.

Even that might not be enough, because if any of the human population survives, the effects of even the biggest global catastrophe, cultural revolution or tyrannical world empire probably won't last more than a few thousand years.


I'm not so sure. Even prosaic, current AI can be very unpredictable. See this blog post http://aiweirdness.com/post/172894792687/when-algorithms-sur... , which is a somewhat more casual summary of the paper "The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities" https://arxiv.org/abs/1803.03453v2

The authors of the paper recount a bunch of anecdotes in which they wanted the AI to do one thing, but the measure they told it to optimise for wasn't actually what they intended, and so unexpected things happened.
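
To make that "optimising the wrong measure" failure concrete, here is a minimal toy sketch of my own (not from the paper; the objective functions are made up for illustration): a naive hill-climber is handed a proxy objective and, by maximising it faithfully, wrecks the quantity the designer actually cared about.

    # Toy illustration of specification gaming: the optimiser is given a
    # proxy objective that differs from the intended one, and it happily
    # maximises the wrong thing.
    import random

    def intended_score(x):
        # What the designer actually wants: keep x close to 1.0.
        return -abs(x - 1.0)

    def proxy_score(x):
        # What the optimiser was told to maximise: just "make x big".
        return x

    def hill_climb(score, steps=10000, step_size=0.1):
        x = 0.0
        for _ in range(steps):
            candidate = x + random.uniform(-step_size, step_size)
            if score(candidate) > score(x):
                x = candidate
        return x

    x = hill_climb(proxy_score)
    print("proxy score:   ", round(proxy_score(x), 2))     # huge -- looks like success
    print("intended score:", round(intended_score(x), 2))  # awful -- not what was meant

The anecdotes in the paper are essentially this failure mode playing out in much richer simulated environments.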


That's an amusing read, and sure, AI can do things we don't expect. I think we're more on the order of a bridge failure and less on the order of Skynet here, though. To end up in that kind of nightmare scenario, we have to entrust some AI with capabilities we could simply elect not to give it.




