
Beyond its obvious appeal to the (somewhat cringy, imo) “uncensored model” crowd, this has immediate practical use for improving data synthesis. I have had several experiences trying to create synthetic data for harmless or benign tasks, only to have noise introduced by overly conservative refusals.
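You end up writing crude post-hoc filters for exactly this kind of noise; a minimal sketch (the marker strings and record shape are placeholders, not from any particular pipeline):

    # Drop generations that are refusals rather than actual task outputs.
    # Marker strings and the "response" field are illustrative placeholders.
    REFUSAL_MARKERS = [
        "i'm sorry, but i can't",
        "i cannot assist with",
        "as an ai language model",
    ]

    def looks_like_refusal(text: str) -> bool:
        lowered = text.lower()
        return any(marker in lowered for marker in REFUSAL_MARKERS)

    def drop_refusals(records: list[dict]) -> list[dict]:
        return [r for r in records if not looks_like_refusal(r["response"])]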



I agree -- people often hear "uncensored model" and their minds immediately jump to all sorts of places, but there are very practical use cases that benefit from unhindered models.

In my case, we're attempting to use multi-modal models essentially for NSFW detection with quantified degrees of understanding about the subjects in question (for a research paper involving historical classic art). Model censorship tends to block us from asking _any_ questions about such subject matter, and that has greatly limited the choice of models we can use.

Being able to easily turn censorship off for local language models would be a great boost to our workflow, and we wouldn't have to tiptoe around so carefully with our prompt engineering.
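The kind of call we'd like to make directly looks roughly like the sketch below, assuming a local model served behind an OpenAI-compatible endpoint (the endpoint, model name, and prompt are illustrative, not our actual setup):

    import base64
    from openai import OpenAI

    # Hypothetical local OpenAI-compatible server (e.g. llama.cpp or vLLM).
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

    with open("painting.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="local-vlm",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "On a 0-5 scale, how much nudity does this painting "
                         "depict, and which figures or scenes are shown?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)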


I encountered this in an absurd context: I wanted a model (IIRC GPT-3.5) to make me some invalid UTF-8 strings. It refused! On safety grounds! Even after a couple of minutes of fiddling, the refusal was surprisingly robust, although I admit I didn't try a litany of the usual model jailbreaking techniques.
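For context, the kind of output I was asking for is trivial to produce by hand; these byte sequences are my own illustration, not anything the model gave me:

    # A few classic examples of byte strings that are not valid UTF-8.
    bad = [
        b"\x80",                   # lone continuation byte
        b"\xc3\x28",               # 2-byte lead followed by a non-continuation byte
        b"\xed\xa0\x80",           # encoded UTF-16 surrogate, disallowed in UTF-8
        b"\xf8\x88\x80\x80\x80",   # 5-byte sequence, outside the UTF-8 range
    ]
    for b in bad:
        try:
            b.decode("utf-8")
        except UnicodeDecodeError as e:
            print(b, "->", e.reason)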

On the one hand, good job OpenAI for training the model decently robustly. On the other hand, this entirely misses the point of “AI safety”.


Reminds me of this nugget of Prime reacting to Gemini refusing to show C++ code to teenagers because it is "unsafe":

https://www.youtube.com/watch?v=r2npdV6tX1g



