
Maybe I'm missing something (I only did a quick read), but aren't you explicitly telling the model to re-explore low-density regions of the action space? Essentially turning up the exploration (and turning down exploitation), with a weighting towards low-density regions?

I'm not an RL person (I'm in generative), so genuine question: have people not tried re-increasing the exploration variable after the model has been initially trained? It seems natural to vary that explore/exploit trade-off.
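
To make the question concrete, here is a rough sketch of the kind of thing I have in mind, assuming an epsilon-greedy policy with exponential decay. The name `epsilon_schedule` and every parameter (`decay_steps`, `restart_at`, `eps_restart`, and so on) are made up for illustration and are not taken from the post:

```python
import math


def epsilon_schedule(step, decay_steps=1000, eps_start=1.0, eps_end=0.05,
                     restart_at=None, eps_restart=0.5):
    """Hypothetical exploration schedule (illustration only).

    Epsilon decays exponentially from eps_start towards eps_end. If
    restart_at is given, epsilon is bumped back up to eps_restart at that
    step and then decays again, i.e. exploration is re-increased after
    the initial training phase.
    """
    if restart_at is not None and step >= restart_at:
        start, elapsed = eps_restart, step - restart_at
    else:
        start, elapsed = eps_start, step
    return eps_end + (start - eps_end) * math.exp(-elapsed / decay_steps)


if __name__ == "__main__":
    # Print epsilon just before and just after the restart point.
    for s in (0, 4999, 5000, 6000):
        print(s, round(epsilon_schedule(s, restart_at=5000), 4))
```

With `restart_at=5000`, epsilon is close to its floor just before step 5000 and jumps back to `eps_restart` at step 5000. This only varies how much the agent explores; it says nothing about where it explores, so the weighting towards low-density regions would still have to come from somewhere else.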



