That's because these GPTs are trained to complete text in human language, but unfortunately the training dataset includes not just human language but human culture along with it.
I really think they need to pretrain on the wider dataset, then fine-tune on a machine-specific dataset, so the model can reference data sources rather than have them baked in.
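Something like this toy sketch is what I have in mind (everything here is made up for illustration: the names, the store format, the keyword lookup, none of it is a real library's API). The facts live in an external store the model consults and cites at inference time, instead of being baked into the weights:

    from dataclasses import dataclass

    @dataclass
    class Document:
        source: str   # where the fact came from, so the answer can cite it
        text: str

    # External knowledge store: updatable without retraining the model.
    STORE = [
        Document("style-guide/v2", "Dates are written day-month-year."),
        Document("style-guide/v2", "Temperatures are reported in Celsius."),
    ]

    def retrieve(query: str, store: list[Document]) -> list[Document]:
        # Crude keyword-overlap retrieval; a real system would use embeddings.
        words = set(query.lower().split())
        return [d for d in store if words & set(d.text.lower().split())]

    def build_prompt(query: str, docs: list[Document]) -> str:
        # Prepend the retrieved sources so the answer is grounded in them,
        # not in whatever defaults the pretraining data happened to bake in.
        context = "\n".join(f"[{d.source}] {d.text}" for d in docs)
        return f"Context:\n{context}\n\nQuestion: {query}\nAnswer citing sources:"

    if __name__ == "__main__":
        docs = retrieve("dates written", STORE)
        print(build_prompt("How should dates be written?", docs))

The point is that the contents of STORE can be swapped per locale or per deployment without touching the model itself.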
A lot of the general-purposeness, but also the way it sometimes says weird things and makes oddly specific references, is pretty much down to this I reckon... it's trained on globs of human data from people in all walks of life, with every kind of opinion there is, so it doesn't really result in a clean model.
True, but I think the learning methods are similar enough to how we learn, for the most part, and the theory that people are products of their environment really does hold true (although humans can continually adjust and overcome biases etc. if they're willing to).
The ironing out is definitely the part where they're tweaking the model after the fact, but I wonder if we don't still need to separate language from culture.
It could really help, since what we want is a model that can speak a language, with a local culture applied on top. All sorts of issues have already come up with the current way of doing it: the Internet is very America/English-centric, and therefore most models are the same.