
Beyond its obvious appeal to the (somewhat cringy, imo) “uncensored model” crowd, this has immediate practical use for improving data synthesis. I have had several experiences trying to create synthetic data for harmless or benign tasks, only to have noise introduced by overly conservative refusals.
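You end up writing crude post-hoc filters for exactly this kind of noise; a minimal sketch (the marker strings and record shape are placeholders, not from any particular pipeline):

    # Drop generations that are refusals rather than actual task outputs.
    # Marker strings and the "response" field are illustrative placeholders.
    REFUSAL_MARKERS = [
        "i'm sorry, but i can't",
        "i cannot assist with",
        "as an ai language model",
    ]

    def looks_like_refusal(text: str) -> bool:
        lowered = text.lower()
        return any(marker in lowered for marker in REFUSAL_MARKERS)

    def drop_refusals(records: list[dict]) -> list[dict]:
        return [r for r in records if not looks_like_refusal(r["response"])]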



I agree -- people often hear "uncensored model" and their minds immediately jump to all sorts of places, but there are very practical use cases that benefit from unhindered models.

In my case, we're attempting to use multi-modal models essentially for NSFW detection with quantified degrees of understanding about the subjects in question (for a research paper involving historical classic art). Model censorship tends to block us from asking _any_ questions about such subject matter, and that has greatly limited the choice of models we can use.

Being able to easily turn censorship off for local language models would be a great boost to our workflow, and we wouldn't have to tiptoe around so carefully with our prompt engineering.
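The kind of call we'd like to make directly looks roughly like the sketch below, assuming a local model served behind an OpenAI-compatible endpoint (the endpoint, model name, and prompt are illustrative, not our actual setup):

    import base64
    from openai import OpenAI

    # Hypothetical local OpenAI-compatible server (e.g. llama.cpp or vLLM).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    with open("painting.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="local-vlm",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "On a 0-5 scale, how much nudity does this painting "
                         "depict, and which figures or scenes are shown?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)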


I encountered this in an absurd context: I wanted a model (IIRC GPT-3.5) to make me some invalid UTF-8 strings. It refused! On safety grounds! Even after a couple of minutes of fiddling, the refusal was surprisingly robust, although I admit I didn't try a litany of the usual model jailbreaking techniques.
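For context, the kind of output I was asking for is trivial to produce by hand; these byte sequences are my own illustration, not anything the model gave me:

    # A few classic examples of byte strings that are not valid UTF-8.
    bad = [
        b"\x80",                   # lone continuation byte
        b"\xc3\x28",               # 2-byte lead followed by a non-continuation byte
        b"\xed\xa0\x80",           # encoded UTF-16 surrogate, disallowed in UTF-8
        b"\xf8\x88\x80\x80\x80",   # 5-byte sequence, outside the UTF-8 range
    ]
    for b in bad:
        try:
            b.decode("utf-8")
        except UnicodeDecodeError as e:
            print(b, "->", e.reason)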

On the one hand, good job OpenAI for training the model decently robustly. On the other hand, this entirely misses the point of “AI safety”.


Reminds me of this nugget of Prime reacting to Gemini refusing to show C++ code to teenagers because it is "unsafe":

https://www.youtube.com/watch?v=r2npdV6tX1g



