Reinforcement learning trains a model to maximize some reward function. The suggestion is that real-world accountability could be translated into such a reward function.
Also, OP explicitly mentioned "online learning", which is a continuous training process after standard pre-training.
For what it's worth, I don't think this would work. Rewards would come in too sporadically to be useful.
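To make the sparsity concern concrete, here is a rough toy sketch, not a claim about how any real system is trained. It is a two-armed bandit with an epsilon-greedy learner, where feedback only reaches the learner with probability `feedback_rate`. The function name `run_bandit`, the payoff probabilities, and every parameter are made up for illustration.

```python
import random

def run_bandit(feedback_rate, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy two-armed bandit where feedback only sometimes arrives."""
    rng = random.Random(seed)
    true_means = [0.4, 0.6]   # hypothetical payoff probabilities per arm
    estimates = [0.0, 0.0]    # learner's running value estimate per arm
    counts = [0, 0]           # number of observed rewards per arm
    updates = 0               # how many steps actually produced a learning signal
    for _ in range(steps):
        # Explore with probability epsilon, otherwise pick the best-looking arm.
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = 0 if estimates[0] >= estimates[1] else 1
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        # Assumption: real-world feedback reaches the learner only occasionally.
        if rng.random() < feedback_rate:
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            updates += 1
    return estimates, updates

if __name__ == "__main__":
    for rate in (1.0, 0.1, 0.01):
        estimates, updates = run_bandit(rate)
        print(f"feedback_rate={rate}: {updates} updates, estimates={estimates}")
```

With `feedback_rate=1.0` every step yields an update; as the rate drops, the learner acts just as often but has far fewer observations to learn from, which is the problem I'd expect with rewards derived from real-world accountability.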