Hacker News new | past | comments | ask | show | jobs | submit login

Having a purely representative model of the world is less useful than having a sandbox for modeling choices and outcomes. Do I need to duck before entering that doorway?

That introspective analysis is consciousness. Humans have just improved the same mechanism allowing for more abstract analysis.




But GPT-4 can also take the decision to duck.

For instance, here I have GPT-4 with a system message: "You are an AI human agent in a video game performing actions.".

And I give it the following information as current environment:

Current environment:

1. Your goal is to get out of the current room.

2. You are moving in direction of North at 4km/h.

3. There's a door 3m in the North.

4. The door height is 150cm.

5. Your height is 180cm.

What is the sequence of actions that you plan to perform?

Its response:

---

1. Slow down as I approach the door to avoid collision.

2. Inspect the door for any lock or handle.

3. If a handle or a means to open the door is present, try to open the door.

4. If the door is locked, search for a key or any other opening mechanism within the immediate surrounding area.

5. If the door isn't lock or after opening it, crouch down slightly to accommodate my height which is more than the height of the door to avoid my head hitting the door frame.

6. Proceed through the door and stand tall again after passing through.

7. Continue moving North after exiting the room.

---

So here I did hint quite a bit by giving this specific environmental data, I could add more noise, and unnecessary information, but even then I think it can do it based on my experience.

I had temperature of 1 for the above, which I think why it made some odd word choices, but I also decided for example to change the info where the door is to "South", and this is how it responded:

1. Stop moving in the North direction.

2. Turn around 180 degrees to face the South direction.

3. Start moving towards the door in the South direction at a moderate pace.

4. Upon reaching the door, duck down slightly to accommodate the difference in height between myself and the door.

5. Proceed through the door to exit the room."

-------

So I guess, what I would like to see as an example. Assuming GPT-4 is not conscious, what is something that an average conscious human can do, thanks to presumed consciousness, that a GPT-4 couldn't do and why is consciousness required for that?


> what is something that an average conscious human can do, thanks to presumed consciousness, that a GPT-4 couldn’t do and why is consciousness required for that?

Consciousness isn’t required for anything. It’s useful for dealing with a complex and adverse environment not solving any specific problem.

That said, without testing I would assume if ask chat GPT-4 to change how quickly it responds it’s going to fail based on my understanding of their architecture. As in recite the pledge of of allegiance inserting 10 second pauses between each word. That said having provided a specific test it’s possible for that particular capability to have already been added to the UI layer via a pause token or something even if the model itself responds at the same rate.


If you used GPT as a brain you could provide the current time as an input for it, but otherwise yeah, GPT doesn't have time within it's input by default, but if you did:

Made the instruction to be: Say "Stop!" when 10 seconds are done. And then had a loop where

You give it current time in a loop, it would surely be able to do that.

But I'm not sure if or how that is exactly related to consciousness?

Time, the best way to think of it is probably a series of ticks anyway, and I assume in people's brains there is something to do that, so it would be reasonable to add a mechanism for GPT as well.

GPTs goal is to only act as the intelligent part of the brain based on the input.


Modify the system enough and it’ll eventually be conscious.

It’s not about a pause token, but the internal processes. You can have a long conversation on the subway with someone without forgetting you’re going home from work. Overflow it’s context window and GPT-4 has no recourse it just forgets. The difference is essentially prioritizing information, but LLM’s really don’t function like that it’s all about predicting the next token from a given context.

Give a future generation of AI systems internal working memory, a clock, and the ability to spend arbitrary time updating that internal memory and IMO that’s pretty close to consciousness. At least assuming it was all functional.


But it's not really difficult to inject this mechanism into the context window.

GPT-4 turbo latest version allows for 100k tokens or 75k words. The whole subway thing and more could easily be kept there, but what ever else can't can be designed in the prompt to always keep certain amount of tokens in context for different layers of memory, where you the more into the past you go the less details you have, but it's more like a Title of your most important learnings through out life, but at any given time GPT-4 can call the function to ask extra bit of content about it, if it seems relevant to the situation at hand.

So for example in each prompt context you would have:

1. Short description of what you have done each year in your life. 2. Key findings, goals, that you currently have. 3. The whole current day (or how much seems reasonable). 4. Past weeks in a bit more detail than the short description for a year.

So basically you could try to find 70k words to represent as much context and most important details (that are decided by GPT itself, what is most important).

I've been building an assistant for myself, that has such memory management system, it gets past N (like 40 messages) in full detail, then it will have summaries from before that time and in addition it will have messages and learnings in the past stored, and these will also be passed to the context depending on the query that matches it.

And if you want to compare it to human process of sleeping, it occasionally goes through the all the messages and "compresses" them, to extract most important findings and short summaries so they can be used in the next day's context.

So to me it's just basically giving it tools, and the other things like memory, longer term memory, inputs it currently doesn't get, are fine to be solved by other tools. I think that human brain also has different parts of brains working on different things, so it's similar in a sense.

Then once you have 70k spent on this historical context, you will run prompt in a loop allowing it to perform fn each time, like retrieve further info, or store some important fact, etc.

The real actual problem would be cost, because the costs would rack up quite quickly making looped 70k token requests.


As an example, what I tried right now:

USER I'm responding to you roughly every second with current time, say "Stop!", after 10 seconds have passed. Respond only with silent when you do not wish to say anything.

Current Time: 13:04:11.21

Say "OK!" now if you have understood the challenge.

ASSISTANT OK!

USER Current Time: 13:04:12.24

ASSISTANT silent

USER Current Time: 13:04:16.21

ASSISTANT silent

USER Current Time: 13:04:19.18

ASSISTANT silent

USER Current Time: 13:04:23.12

ASSISTANT Stop!


That’s almost but not actually waiting. Picking the nest token for a context window is something GPT-4 is quite good at. But the underlying approach has some real limitations which I discussed in response to your other comment.

Using 1 hour instead of 10 seconds and then list every Current Time incrementing by .01 seconds and you overflow the context window. The solution isn’t to have a larger context window, the solution is to have an internal representation independent of that context window.

The issue is you can’t train a system to use that internal memory by simply giving it a large corpus of text examples. The text lacks the internal representation associated with its creation.


In this case you wouldn't spam the context window, you would perhaps just send latest time every time and lose portion of the previous times.


That solves this specific issue, but it isn’t a general solution to the finite size of a context window. Ideally the system should be able to decide for itself what information it needs to pay attention to, but LLM’s simply don’t operate in that fashion.


But you can, e.g. you allow GPT to call functions to store embeddings that it can later retrieve for the memory. Meanwhile default context memory window is just recent info more detailed and historical info gets more and more summarized, possibly summarized by GPT itself.


I'm not convinced GPT can infer correct actions needed for novel scenarios for contexts that have not been directly experienced in training.

Consider:

I have never in my multiple decades of existence had a ball roll out in front of me while driving. But if I were driving by a playground and a ball rolled out in front of me from between parked cars, and the ball itself is not a hazard because it has already passed my lane, I can infer that I need to slow or stop because a child may rapidly follow that ball.

In your ducking scenario, the AI needs direct perception of a door. I assume it cannot imagine a need to duck without 1) direct perception of the doorframe, or 2) previous experience of ducking through a doorframe.

I'm not convinced that GPT or any AI can infer that context for handling novel edge cases at this point. It could just be a need for more training data to correlate different contexts, but I don't think it's there yet.


GPT has no sense, or care when it is wrong or right, such sense is only (arguably) driven by human through prompt interaction and throughout training of model, while humans and other animals able to update there's internal state just from single observation or interaction, and integrate future information with such single observation for very long time.


What if GPT was integrated with a module to

1. Take light input. Video/images.

2. Take sound input.

3. Touch, heat input.

And other inputs from the environment. Then there were mechanisms which could also be neural networks that will transform this data into more digestible way for GPT and GPT was also in addition specifically trained to act based on this input.

Then it would run in cycles, where it gets this input, and it will provide output on how it plans to react to the data, maybe every 100ms.

It then could also have a storage it can use, where it can store data as part of the output to later retrieve it again.

So it would be a set of modules that is controlled and interpreted by GPT.

It could then do all of that above, no? And all of it should be just a matter of implementing. The only near time challenges may be certain types of inaccuracies and or producing tokens in some cases might take too long time to have fast reaction time.

So basically you'll try to run as frequent cycles as possible with the inputs mentioned above, other neural networks identifying the objects, in many different ways and all the context about the environment, unless a new version of GPT becomes completely multi-modal.

And you run those loops, then GPT gives output what it wishes to do, e.g. store some fact for later usage, move there, move here, etc. Or retrieve some information using embeddings then decide again, and short term memory would just be this context sized window, and if it needs more it just looks into its own memory for embeddings.


at least in my limited perspective, root issue is learning dynamics of internal model that represents self (internal world) and external world.


What do you mean by these dynamics exactly?


how system updates and maintains own model(s) when new information added in form of single/few observation and/or interactions, without ability to replay data, without catastrophic forgetting etc..., and importantly, how such system's model(s) grow in complexity while retaining stability and redundancies.


Analysis of what-if scenarios and simulating likely expected outcomes of proposed actions is a very useful ability, but IMHO that is not part of what we'd call 'consciousness'.


It’s a large fraction of what consciousness is actually doing, not the definition of consciousness.

Someone considering what to say is simulating tradeoffs.

In practice simulation isn’t limited to future possibilities. Trying to remember something you might recall vague sensory information like the tilt of your hips and try and create a scenario where that fits.


I would not automatically treat all the things that people do consciously as intrinsically linked with consciousness, unless there's some clear reasoning why consciousness is required to do that particular thing - and not in the sense that it's required because unconscious people can't do stuff because they're incapacitated.

Like, saying that simulating tradeoffs is what consciousness is actually doing seems to me equivalent to saying that smashing my fist on the table is something 'my consciousness' is doing, which is kind of technically true but not really relevant, as IMHO a non-human entity that's clearly without any consciousness could both smash something against the table and also simulate tradeoffs for various options, having a subjective experience isn't required for that.


Sleepwalking shows most things people do can happen without consciousness. People can make breakfast without being conscious etc.

Choosing between tradeoffs is however intrinsically linked to consciousness.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: