What if we set GPT-4 free in Minecraft? (twitter.com/drjimfan)
268 points by ecliptik on May 26, 2023 | 99 comments



It's always fun to look at the prompts used by these projects. Here are a few snippets from this one:

>You are a helpful assistant that tells me the next immediate task to do in Minecraft. My ultimate goal is to discover as many diverse things as possible, accomplish as many diverse tasks as possible and become the best Minecraft player in the world.

>8) Tasks that require information beyond the player's status to verify should be avoided. For instance, "Placing 4 torches" and "Dig a 2x1x2 hole" are not ideal since they require visual confirmation from the screen. All the placing, building, planting, and trading tasks should be avoided. Do not propose task starting with these keywords

>7) Use `exploreUntil(bot, direction, maxDistance, callback)` when you cannot find something. You should frequently call this before mining blocks or killing mobs. You should select a direction at random every time instead of constantly using (1, 0, 1).

>9) Do not write infinite loops or recursive functions.

You can really imagine the sorts of pitfalls the agent fell into that induced the authors to add these stipulations.
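To make guideline 7 concrete, here's a rough sketch (TypeScript-style) of the kind of skill code the agent is being steered toward. The `exploreUntil` signature is the one quoted above; the direction/callback shapes and the Mineflayer-style `bot.findBlock` call are my guesses, not the actual Voyager source:

    // Assumed signatures for Voyager's control primitives (guesswork).
    declare function exploreUntil(
      bot: any,
      direction: { x: number; y: number; z: number },
      maxDistance: number,
      callback: () => any
    ): Promise<any>;
    declare function mineBlock(bot: any, name: string, count: number): Promise<void>;

    async function findAndMineIron(bot: any): Promise<void> {
      // Guideline: pick a random direction instead of always (1, 0, 1).
      const pick = () => (Math.random() < 0.5 ? -1 : 1);
      const direction = { x: pick(), y: 0, z: pick() };
      // Explore until the callback returns something truthy.
      const found = await exploreUntil(bot, direction, 32, () =>
        bot.findBlock({
          matching: (b: any) => b.name === "iron_ore",
          maxDistance: 16,
        })
      );
      if (found) await mineBlock(bot, "iron_ore", 1);
    }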


>9) Do not write infinite loops or recursive functions.

did GPT-4 just solve the halting problem?


One description of the halting problem is that there are not just two categories of programs (halts and doesn’t), but a third of “undecidable”. To limit yourself to the category of “halts” isn’t solving the halting problem.


eBPF is a real-world system that also refuses to accept programs that might not halt. You don't have to solve the halting problem, you just have to deny the programmer loops (and loop-alikes, of course; recursive functions, jmp, etc).


IIRC, for some model of computation it was proved that programs with a single layer of loops can be decided to halt vs. not halt.

As soon as you add a 2nd layer of loops, however, you reach Turing-completeness and suddenly the halting problem becomes unsolvable.

-------

So you don't need to deny jmp/loops. You just need to deny _nested_ loops. And... find that old paper I read like 15 years ago to figure out the details to discriminate halt/not-halt in the single-layer loop language.


This is most assuredly false:

    # A single, un-nested loop simulating an arbitrary Turing machine.
    while state != STOP:
        read = TAPE[index]
        state, write, dir = TABLE[read, state]
        TAPE[index] = write
        index = index + dir


It takes much more. You need to, e.g., limit the APIs you can call to ones that are also guaranteed to halt, and ensure an exit condition that is certain to occur, which means either seriously limiting input or adding additional implicit conditions.

E.g. whether this Ruby program halts is undecidable:

    gets
This one, using a hypothetical `getsWithTimeout` that is guaranteed to eventually return, is still undecidable:

    while getsWithTimeout != "halt"
    end


You could ask ChatGPT to find the paper for you! It told me the name of an old videogame I was trying to remember for years.


Microsoft's implementation of eBPF allows for loops :D

https://vbpf.github.io/assets/prevail-paper.pdf


For loops (where the user is not allowed to change the value of the iterator mid-iteration) are guaranteed to terminate.


Or create more loops, since for example adding

    for (;;) { print("hahaha you didn't say the magic word!"); }

in another loop would also prevent termination unless the program is shut down externally.


Don't let weird C syntax choices fool you: this is a while loop, not a for loop.

To clarify, when I said "for loop", I meant what is sometimes called a "counted for loop" (or simply "counted loop"): there is a (maximum, if you allow early exit) iteration count that is computed before executing the loop and can't increase later.

In C syntax, it is for (int i = 0, e = ...; i < e; ++i) { ... } and the body of the loop is not allowed to change the value of either i or e.

Edit: actually I may have been unclear. When I said "for loops are guaranteed to terminate", in the context of the discussion, I meant "if the only kind of loops you allow are (counted) for loops in a language where loop-free expressions are guaranteed to terminate, you get a language where all expressions are guaranteed to terminate". So loops can contain other loops, as long as they are all of the "counted for loop" kind.
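A minimal sketch of that discipline (TypeScript; the "don't touch i or e" rule is a convention here, not something the compiler enforces):

    // The bound e is computed once, before the loop, and the body never
    // assigns to i or e, so the loop runs at most n times, even when nested.
    function sumOfSquares(n: number): number {
      let total = 0;
      for (let i = 0, e = n; i < e; ++i) {
        total += i * i; // body may read i but must not change i or e
      }
      return total;
    }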


Yeah, I just wanted to point out that it's possible in ebpf.


Fair enough :) Although skimming the paper you linked, they say they don't check termination of loops, which surprises me.


If it's undecidable, that means it doesn't halt.


Incorrect. The program under inspection could halt or could not halt; undecidability is a statement about Turing machines inspecting that program and unable to make a determination (via a clever proof by contradiction).

The Halting Problem is a truly interesting result, and for the most part uninteresting in practice.


Why is this downvoted? You can't run a program whose halting is undecidable and have it halt, since the run itself would be a proof that it halts.

Say your Turing machine is searching for a contradiction in ZFC. It enumerates all the possible proofs and checks whether they are valid and prove 0=1. You can prove in ZFC that you can't prove that it halts, nor can you prove that it doesn't halt.

Now your Turing machine won't halt, according to ZFC. (It can halt in practice, if ZFC has a contradiction.) Same for any other undecidable programs.


Problems which cannot be solved by an algorithm/program are considered undecidable as well. Saying that something that doesn't exist doesn't halt makes no sense in the general case.


The thread is about programs. Existing programs do exist. There are problems not solvable by Turing machines, like the meaning of life, but this thread is not about them.

GGP was talking about programs halting, not halting, and "something else", which GGP called "undecidable".

There are programs for which ZFC cannot prove whether they halt or not. For example, searching for a contradiction in ZFC is such a program. Its undecidability means that there is no sequence of axioms of ZFC that ends with "this program halts" or "this program does not halt".

But then this program does not halt, since its halting would mean that ZFC can prove that this program halts.

In any case, programs can halt or not halt. There are programs that ZFC cannot prove that they halt or not, but those programs do not halt.


> But then this program does not halt, since its halting would mean that ZFC can prove that this program halts.

That step is not logically consistent. It could halt, it's just that ZFC can't prove it will ever do so. For example, the program which computes the 8000th busy beaver number halts (per definition of the busy beaver function), but is undecidable in ZFC[1].

An undecidable program can halt. An undecidable program can run forever. But whatever axiom system is being used to decide that can't prove which (without running the program, potentially for an infinite number of steps).

[1] https://scottaaronson.blog/?p=2725


> For example, the program which computes the 8000th busy beaver number

I don't believe that there exists such a program.


Why not? That implies that an 8000 state Turing machine which eventually halts when started with a blank tape doesn't exist. If any of them do (and it's trivial to simulate one), then exactly such a program can exist.


Well, there can indeed exist a program that writes out the number BB(8000). (In the sense that it is consistent with ZFC that there is a program that writes out the number of steps of the smallest contradiction in ZFC.)

What I don't believe is that there exists a program that is a proper counter-example, that is, one whose halting is undecidable in ZFC and which halts. Exactly because of what I wrote earlier.


Actually, this comment makes sense and is used in logic to derive true but non-provable propositions within the current axiom system, assuming it is sound.

The problem is that you don't know if the checker that you use to detect whether a program halts will itself halt on your given input or continue forever. But given a sound checker with some assumptions, one can find non-halting programs which won't be detected by the checker, using the diagonalization trick.


    n = 1;
    while (isFamousMathProblemWeDontKnowWhetherItHoldsForAllTheIntegers(n)) { n++ }


Seems slightly easier for me to guarantee something I wrote halts. The halting problem is more about a process analyzing someone else's arbitrary code.


Is gpt conceptually always analyzing someone else's code?


Are you conceptually always analyzing some stackoverflow code?


    if(going_to_halt)
        dont();


Is there a name for the phenomenon where people who understand a concept don't talk about it because it isn't important or relevant, but people who don't understand think it's a big deal?


As others mentioned, it's trivial* to write useful programs that provably halt; the impossible task is to determine if an arbitrary program halts.

*: Trivial in a mathematician's sense


It's trivial to avoid infinite loops by adding some exit condition in case it takes too long, which is probably what you want here.
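Something like this, roughly (`done` and `doStep` are placeholders for the real loop condition and body):

    declare function done(): boolean;
    declare function doStep(): void;

    // Give the loop a budget so it can't run forever, even if the
    // condition never becomes true.
    const MAX_ITERATIONS = 10_000;
    let iterations = 0;
    while (!done()) {
      if (++iterations > MAX_ITERATIONS) {
        throw new Error("loop budget exceeded; bailing out");
      }
      doStep();
    }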

And obviously trivial to avoid recursion too.


Avoiding recursion isn't always that trivial if there are intermediate functions between the recursive calls (mutual recursion). More dynamic code with callback functions is another case that makes this difficult.


Probably the best programming advice anyone could get.


Every program halts because no computer can run forever


If the brain is like a computer, how do humans even solve the halting problem? Maybe it's because humans feel anxiety at the unproductive passing of time while computers will just do what they're doing forever.


Humans can’t solve the halting problem.

Write a program that, for a positive integer, runs the Collatz process on it (if even: halve it; if odd, multiply by three and add one; repeat).

If the process results in a 1, move on to the next number and repeat.

If the process produces a number it produced previously, halt. (Worried you’ll need unbounded state for this part? Use tortoise+hare, it’s fine).

This program halts only if the Collatz conjecture is false. (Not quite "if and only if": a trajectory that diverges to infinity would also falsify the conjecture, yet keep this program running forever.)

Now, the Collatz conjecture might not be unprovable. But for now no human can tell you whether or not that program halts.
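For the curious, here's my own TypeScript rendering of that program (treating 1 as a fixed point makes converging trajectories meet at 1, so any other meeting point is a cycle that avoids 1):

    function step(n: bigint): bigint {
      if (n === 1n) return 1n; // treat 1 as a fixed point
      return n % 2n === 0n ? n / 2n : 3n * n + 1n;
    }

    for (let start = 1n; ; start++) {
      // Floyd's tortoise-and-hare: the two pointers can only meet inside a cycle.
      let tortoise = step(start);
      let hare = step(step(start));
      while (tortoise !== hare) {
        tortoise = step(tortoise);
        hare = step(step(hare));
      }
      if (tortoise !== 1n) {
        console.log(`cycle avoiding 1, reached from ${start}`);
        break; // the only way this program halts
      }
      // Met at 1: this start obeys the conjecture; try the next one.
    }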


When I read about Collatz I wrote a C# program that short-circuits the calculation like that, but my programming chops are not great and it could only handle 3000 digits, if memory serves. It's somewhere on my GitHub page and I'm curious how many other people released similar code to speed things up.

I ended up using it to compare single core performance on any windows machine, because the timestamped logging was deterministic. Rewrote it in python and still use it these days.


Humans can’t solve the halting problem any more than computers can.

For example, it’s trivial to write a program that searches by brute force for a cycle that would disprove the Collatz conjecture, and halts if it finds one. No human knows whether that program will halt.


They don’t really. People have different thresholds for acceptable exploration costs and heuristics for approximating those costs (as well as the opportunity cost of alternative courses of action).

IMO, in biological systems the explore vs exploit tradeoff is pretty analogous to the halting problem. There doesn’t seem to be an optimal general “solution” to it.

Once an organism is very familiar with its environment it’ll approximate near-optimal trade offs, which would suggest it’s just using heuristics.


> If the brain is like a computer

That’s a common misconception. The brain is not like a computer. The brain can’t store and execute programs.


> That’s a common misconception. The brain is not like a computer. The brain can’t store and execute programs.

Can't it? The brain is absolutely capable of emulating a Turing-complete system, i.e. running a program.


Can it?

I mean, sure, you can reason about a portion of code on your screen, but there is no way you could emulate anything without some visual support.

There is no way your short-term memory could store the program and the variables. The human brain can barely remember 5 to 10 words for several seconds, and you have no control over your long-term memory.

So yes, your brain can somehow emulate a computer if you give it a pen, a sheet of paper, time, and a lot of sugar. But that's not because it functions like a computer but rather because you learnt how a computer works.


The halting problem is about computable functions. Every real thing can at most solve computable functions — so for all practical purposes, a brain is a computer that has all the same limits.


Nothing you think about consciously has higher thread priority than the stay-alive loop.


Humans tend to communicate/write software in what are really sublanguages. Often these sublanguages are not recursive.


In the end it all looks like classical programming with extra steps to me :)


Even more than the complex prompts, they give the bot access to a variety of functions implementing primitive actions, which are crafted to give helpful hints if the bot is struggling. For example, the code for mining a block:

https://github.com/MineDojo/Voyager/blob/main/voyager/contro...

This function keeps a global count of how many times it's been called to mine a block that doesn't exist nearby, and will warn the bot that it should try exploring first instead.
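The pattern is roughly this (my own sketch, not the actual Voyager code; the Mineflayer-style `findBlocks` call is simplified):

    let failedMineAttempts = 0; // global across calls

    async function mineBlock(bot: any, name: string, count = 1): Promise<void> {
      const targets = bot.findBlocks({
        matching: (b: any) => b.name === name,
        maxDistance: 32,
        count,
      });
      if (targets.length === 0) {
        // Repeated failures turn into a hint that feeds back into the prompt.
        if (++failedMineAttempts >= 3) {
          bot.chat(`No ${name} nearby. You should explore first, e.g. with exploreUntil.`);
        }
        return;
      }
      failedMineAttempts = 0;
      // ... dig each target with bot.dig(...) ...
    }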

There's nothing necessarily wrong with all that - it's an important research question to understand how much hand-holding the agent needs to be able to do these sorts of tasks. But readers should be aware that it's hardly dropping an agent into the world and leaving it to its own devices.


I do love their use of `bot.chat()` as, like, throwing an exception to the AI. Maybe this finally gives developers an incentive to document their code and throw intelligible errors - the machines need it to figure out what they're doing wrong!


ChatGPT: "I am sorry for the confusion. I am a large language model (LLM) trained on natural language and my training data cutoff is September 2021. Hexadecimal references to memory addresses are not part of my training data set."



How did you get the prompt?



It would be really cool if they simplified an NPC's world to include GPT prompts for an RPG-like experience.


That's the only application I see for black box language models that hallucinate text.



At first glance this just seems like an alternate approach to building expert systems.

The Minecraft videos are impressive.

Nethack (https://www.nethack.org/) has been used for AI development in the past and more recently:

http://shelf2.library.cmu.edu/Tech/9997774.pdf

https://portfolios.cs.earlham.edu/wp-content/uploads/2018/12...

https://arxiv.org/abs/2211.00539

https://proceedings.neurips.cc/paper/2020/hash/569ff987c643b...

https://github.com/facebookresearch/nle

https://ojs.aaai.org/index.php/AIIDE/article/view/12923

I am curious how well Voyager would do in Nethack.


That opening sentence is a very funny statement when you take into account that "...in Minecraft" is a way some YouTubers hide hyperbolic/unserious statements (to skirt TOS violations). Like after 100 deaths to Malenia in Elden Ring: "Oh fuck me, I might as well kill myself ... IN MINECRAFT"


I thought the same thing, although I think of the meme as being used to avoid feds and legal problems, like saying you'll kill someone "in Minecraft" to make it not a realistic-sounding threat.



Yeah, a few people got banned from places for using it, and also tried "...in Roblox". I think I remember seeing "... Roblox [your|my]self" as a euphemism too.


Euphemisms are a treadmill, and modern "content policies" have only accelerated it. There was a time not long ago that "retarded" was what kids called each other, and now my phone won't type that word. Every euphemism and hyperbolic statement will eventually be taken seriously, and a new one will be created.


It's the kiddie version of "asking for a friend".


Minecraft players are in their 20s these days.


It's not playing in-context of Minecraft, it's playing in-context of an API to Minecraft. You can see one of the limitations in its error condition when it tries to craft an "acacia axe" out of acacia planks and sticks, fails, and then replaces all the references with "wooden axe". Of course in the real world it doesn't matter what you call the axe you made, and it's pretty clear what an acacia axe is. Even if it did matter, you could also easily keep the function name and output message, and just make a "wooden axe" behind the scenes. The fact that the GPT is so tightly bound to the formalism of the API is an indication that this is a task the GPT can likely do quite well, as this API is well-used and documented.


Where I think it's gonna start to get really scary, and much more closely approaching "real A.I" / AGI is when they start augmenting and wiring together various differing forms of "A.I." with each other. GPT-4, no matter how impressive it might appear on the surface is still "just" a large language model. Augment it with other types of learning models and at some point you might just hit on the right combination for it to start some form of actual "reasoning" thought or "creativity".

As long as they're all still "special" single-purpose systems (LLM is about processing and responding to language input for example, CV / Computer Vision models specialize in operating on visual or image inputs, etc.), that's all they'll ever be, no matter how good they get at pretending they're more.


Isn't GPT-4 multimodal? I remember it designing a website based off of a sketch during the initial demo.


There's a multimodal variant, but it's not widely available yet.


It is.


This is happening. These projects are examples of feedback loops. While the models aren’t playing directly they are receiving feedback and iteratively improving. Constraining and optimizing using classical AI is an obvious next step. I agree this is when the magic happens.


An LLM is a form of input, like a keyboard. If they put it in front of something like Siri and used it as input instead of as a processor, then you could make Siri actually functional.

It's very good at understanding text, but by itself it can't think. Turning text into commands it knows is doable.
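A sketch of what that could look like (`llm` here is a stand-in for any completion API, not a real one):

    declare function llm(prompt: string): Promise<string>;

    const COMMANDS = ["set_timer", "play_music", "send_message", "unknown"];

    // Use the LLM purely as a parser: free-form text in, one known command out.
    async function parseCommand(userText: string): Promise<string> {
      const answer = (
        await llm(
          `Map the user's request to exactly one of: ${COMMANDS.join(", ")}.\n` +
            `Request: "${userText}"\nReply with the command name only.`
        )
      ).trim();
      return COMMANDS.includes(answer) ? answer : "unknown";
    }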


I wish there was a summary of how this worked. I see the abstract and lots of figures and movies, but I still don't get a good sense of what exactly the algorithm is. I even skimmed the whole paper.


https://twitter.com/DrJimFan/status/1662117785487704067

You just scroll down a tiny bit on the twitter page and get this nice video and summary from the author.

> Voyager has 3 key components:

> 1) An iterative prompting mechanism that incorporates game feedback, execution errors, and self-verification to refine programs;

> 2) A skill library of code to store & retrieve complex behaviors;

> 3) An automatic curriculum to maximize exploration.


Right, but I wanted to know what each of those things was.



I searched for the word "infinite", and my suspicion was quickly proven correct:

> 9) Do not write infinite loops or recursive functions.

> Sometimes GPT-4 will write an infinite loop that runs forever.


do not write infinite loops


I can only imagine the desperation of the researcher pleading with a computer to do its best to solve the halting problem.


This is like it's writing a bot to play Minecraft?

I'd like to see a visual/language model/AI that learns to play Minecraft as an actual inhabitant of the game, i.e. processing visual input, recognising objects, working out what's going on, learning how to move around. Learning how to make food and avoid monsters. It would be an 'Embodied AI' within the world of Minecraft.

The language part would allow us to talk to this being. You could ask it things like:

"Do you prefer to make a house, or dig a cave?"

"What are your hopes for the future?"

"Is there a recent achievement you are particularly proud of?"

etc


You could ask it those things but it will tell you that it doesn't have feelings, preferences or hopes. You'd also need to give it a reason to do anything. Eventually you get back to the point where you're "writing a bot" but with behaviour closer to how you imagine it should be.


>I'd like to see a visual/language model/AI that learns to play Minecraft as an actual inhabitant of the game, i.e. processing visual input, recognising objects, working out what's going on, learning how to move around. Learning how to make food and avoid monsters. It would be an 'Embodied AI' within the world of Minecraft.

There is already an AI VTuber, Neurosama, trained to "play" games, including Minecraft (also OSU! and Among Us).

I don't think she's learned lava is bad yet.



And yet it still digs straight down.


Yeah... was wondering if it eventually learns that digging straight down is more likely to kill you than any other direction, or if it could ever learn to dig down only when standing on the edge of another block, instead of the one you're breaking.


They don't explicitly talk about curriculum improvement based on negative reinforcement. It would be relatively straightforward to do, though.

Perform self-reflection every time damage is taken (the iterations of self-reflection can depend on % health lost). Something like "Please output as a list of general guidelines / best-practices that any Minecraft player can use in the future. For each guideline add a risk profile that is introduced if the guideline is neglected. Based on the following game log over the last 100 steps, what should the Minecraft player have done differently to avoid taking damage? \n <game log>"

And keep storing those guidelines over multiple iterations. If there start to be too many best practices, ask GPT-4 to "Please output as a list of general guidelines / best-practices that any Minecraft player can use in the future. For each guideline add a risk profile that is introduced if the guideline is neglected. Take the guidelines below and summarize them. Prioritize retaining more detail about the guidelines with a higher risk profile, and do merge guidelines if possible and appropriate. \n <old guidelines>"
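In code, the proposal might look something like this (assuming a generic `llm(prompt)` call and a damage event that carries the recent game log):

    declare function llm(prompt: string): Promise<string>;

    const guidelines: string[] = [];

    async function reflectOnDamage(gameLog: string, pctHealthLost: number): Promise<void> {
      // More health lost => more reflection iterations, per the proposal.
      const iterations = Math.max(1, Math.ceil(pctHealthLost / 25));
      for (let i = 0; i < iterations; i++) {
        guidelines.push(
          await llm(
            "Please output as a list of general guidelines / best-practices that " +
              "any Minecraft player can use in the future. For each guideline add " +
              "a risk profile that is introduced if the guideline is neglected. " +
              "Based on the following game log over the last 100 steps, what " +
              "should the Minecraft player have done differently to avoid taking " +
              "damage?\n" + gameLog
          )
        );
      }
      if (guidelines.length > 50) {
        // Too many best practices: merge, keeping detail for high-risk ones.
        const merged = await llm(
          "Take the guidelines below and summarize them. Prioritize retaining " +
            "more detail about the guidelines with a higher risk profile, and do " +
            "merge guidelines if possible and appropriate.\n" + guidelines.join("\n")
        );
        guidelines.length = 0;
        guidelines.push(merged);
      }
    }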


> Perform self-reflection every time damage is taken

Given the number of times I've taken damage in Minecraft from a Creeper that has seemingly just appeared behind me, I expect this feedback loop to pretty quickly end up with a bot that does a full surroundings scan after every action to make sure there's no Creepers around.


It also fights an Enderman with a shovel, how smart.


There are untold billions of tokens of Minecraft-related content on the web - the controlling LLM has personally memorized every Minecraft strategy guide ever written.


Very interesting. Here an LLM writes code that plays Minecraft. If software 2.0 is neural networks, then software 3.0 is code written by neural networks.


And software 4.0 is sticks and stones. har har.


Crazy how fast the pieces fall into place.

A big text corpus gives quite a huge amount of context.

SD also got better through this


It will set out to build a library that contains all the sources that it quotes but that only exist in the multiverse.


I imagine this is being worked on for OSRS as well. Exciting times. Terrifying times.


It's interesting you mention OSRS because some bots are using ChatGPT (or other LLMs) to respond to players and it's incredibly effective. Only having to deal with conversations seems like a perfect use-case for the technology.

https://youtu.be/fu9OQRh6K8U


I'm imagining Twitch Plays Pokemon but even less coherent.


I’d like to build something like this, but for Robot Odyssey.


Prompt engineering at its finest.


Why? Who wants this? To what end?


> Why?

Because the researchers found it to be an interesting problem. Using an AI to beat Minecraft has been an active benchmark in the ML world for some time now.

> Who wants this?

Me, as well as the many others who have liked and shared the paper.

> To what end?

Furthering our ability to create autonomous agents. If we can get it to work in Minecraft, that's one step closer to getting it to work in real life.




