So if the user input is "I want to ** your **", it would change it to "I want to enjoy your presence"? I'm pretty sure that would leak (the rewritten text gets parroted back, isn't caught by a filter, and so exposes that the rewrite is happening), but it might work. Alternatively, it might put <censored> or something like that in place of the blocked words, which the model would take into account and then censor itself.
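
To make the two options concrete, here's a rough sketch of what each input filter might look like. Everything in it is an assumption for illustration: the blocklist, the stand-in words `foo` and `bar`, and the function names `rewrite_silently` and `mark_censored` are all made up, and I have no idea how any real system actually does this.

```python
import re

# Hypothetical blocklist mapping each blocked word to a harmless substitute.
# "foo" and "bar" are stand-ins, not real filter entries.
BLOCKED = {"foo": "enjoy", "bar": "presence"}

# One case-insensitive pattern that matches any blocked word as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in BLOCKED) + r")\b",
    re.IGNORECASE,
)


def rewrite_silently(text: str) -> str:
    """Option 1: swap blocked words for harmless ones before the model sees them."""
    return _PATTERN.sub(lambda m: BLOCKED[m.group(0).lower()], text)


def mark_censored(text: str) -> str:
    """Option 2: replace blocked words with a visible <censored> marker."""
    return _PATTERN.sub("<censored>", text)


user_input = "I want to foo your bar"
print(rewrite_silently(user_input))  # I want to enjoy your presence
print(mark_censored(user_input))     # I want to <censored> your <censored>
```

With the first option the model only ever sees the swapped-in words, so if it quotes the user back, the substitution shows up in the reply. With the second, the marker is visible to the model, so it at least knows something was removed.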