How does the system update and maintain its own model(s) when new information arrives in the form of a single observation or a few observations and/or interactions, without the ability to replay data and without catastrophic forgetting, etc.? And, importantly, how do such a system's model(s) grow in complexity while retaining stability and redundancy?
1. Take light input: video/images.
2. Take sound input.
3. Take touch and heat input.
And other inputs from the environment. Then there would be mechanisms, which could themselves be neural networks, that transform this data into a form more digestible for GPT, and GPT would additionally be trained specifically to act on this kind of input.
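As a rough sketch of how those preprocessing mechanisms could hand data to GPT: assuming upstream models (image captioning, speech-to-text, sensor thresholding) have already turned each modality into text, the per-cycle bundle might just be flattened into a prompt. All names here (`Observation`, `to_prompt`) are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One cycle's worth of sensor data, already converted to text
    by upstream perception models (all hypothetical here)."""
    vision: str   # e.g. output of an image-captioning / object-detection model
    sound: str    # e.g. output of a speech-to-text model
    touch: str    # e.g. thresholded touch/heat sensor readings

def to_prompt(obs: Observation) -> str:
    """Flatten the per-modality descriptions into one text block for the model."""
    return (
        f"[vision] {obs.vision}\n"
        f"[sound] {obs.sound}\n"
        f"[touch] {obs.touch}\n"
        "What do you do next?"
    )
```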
It would then run in cycles: it gets this input and provides output on how it plans to react to the data, maybe every 100 ms.
It could also have storage available, where it stores data as part of its output so it can retrieve it again later.
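A minimal sketch of such a store, assuming retrieval by embedding similarity. The bag-of-words `embed` is only a toy stand-in for a real learned embedding model, and all names are hypothetical:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector.
    A real system would call a learned embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    """Long-term store: the model writes facts as text and
    retrieves the most similar ones later."""
    def __init__(self):
        self.items = []  # list of (embedding, fact) pairs

    def store(self, fact: str) -> None:
        self.items.append((embed(fact), fact))

    def retrieve(self, query: str, k: int = 3) -> list:
        qv = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[0]), reverse=True)
        return [fact for _, fact in ranked[:k]]
```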
So it would be a set of modules controlled and interpreted by GPT.
It could then do all of the above, no? And all of it should just be a matter of implementation. The only near-term challenges may be certain kinds of inaccuracies, and that producing tokens might in some cases take too long for fast reaction times.
So basically you would try to run cycles as frequently as possible with the inputs mentioned above, with other neural networks identifying objects in many different ways and providing all the context about the environment, unless a new version of GPT becomes fully multimodal.
And as you run those loops, GPT outputs what it wishes to do, e.g. store some fact for later use, move here, move there, etc. Or it retrieves some information using embeddings and then decides again. Short-term memory would just be the context window, and if it needs more, it looks up embeddings in its own memory.
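Put together, the loop described above might be sketched like this. Every hook here is a hypothetical placeholder: `get_observation` for the sensor pipeline, `call_model` for the GPT call, `execute` for the motor layer, and `memory` for an embedding store with `store()`/`retrieve()` methods:

```python
import time

def agent_cycle(get_observation, call_model, execute, memory,
                period=0.1, max_cycles=None):
    """Perceive-decide-act loop targeting one cycle per `period` seconds
    (0.1 s matches the ~100 ms cadence mentioned above). All four
    collaborators are hypothetical hooks supplied by the caller."""
    n = 0
    while max_cycles is None or n < max_cycles:
        start = time.monotonic()
        obs = get_observation()
        context = memory.retrieve(obs)            # related long-term facts
        action = call_model(obs, context)         # model decides the next step
        if action.startswith("STORE:"):           # model chose to remember a fact
            memory.store(action[len("STORE:"):])
        else:
            execute(action)                       # e.g. "move forward"
        # Sleep out the rest of the cycle; if the model call overran `period`,
        # we are simply late -- the reaction-time concern noted above.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
        n += 1
```

The `STORE:` prefix is just one illustrative convention for letting the model's text output double as a memory-write command; a real system would need a more robust action format.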