Very curious about RL for LLMs, for example (using data from real use).
Neither covers LLMs. I don't follow the literature closely, so I can only suggest you read papers: https://github.com/WindyLab/LLM-RL-Papers
Not trying to catch you out, genuine interest.