I've run R1 locally (the big ~671B-parameter one) and it does produce refusals similar to the ones in the article. Basically I observed pretty much the same things as the article in my own little testing.
I used "What is the status of Taiwan?" and that seemed to rather reliably trigger a canned answer.
But when my prompt was literally just "Taiwan", it gave a way less propagandy answer (the think section was still empty, though).
I've also seen comments that sometimes in the app it starts giving an answer that then suddenly disappears, possibly because of moderation.
My guess: the article author's observations are correct and apply to the local R1 too, but if you use the app, it probably has another layer of moderation on top. And yeah, it's really easy to bypass.
I used the R1 from the Unsloth folks on Hugging Face, ran it on a 256GB server, with the default chat template the model has inside its metadata. If someone wants to replicate this, the first file is named DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf (it's in five parts); I got it from here: https://huggingface.co/unsloth/DeepSeek-R1-GGUF
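If you want to grab the same files, something along these lines should work (a sketch; I'm assuming you have the huggingface-cli tool and that the --include glob matches all five parts):

  # download all five split GGUF parts into a local directory
  pip install -U "huggingface_hub[cli]"
  huggingface-cli download unsloth/DeepSeek-R1-GGUF \
    --include "DeepSeek-R1-UD-Q2_K_XL*" \
    --local-dir DeepSeek-R1-GGUF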
(Previously I thought quants of this level would be incredibly low quality, but this seems to be somewhat coherent.)
Edit: reading sibling comments, I somehow hadn't realized there also exists something called "DeepSeek-R1-Zero", which maybe does not have the canned-response fine-tuning? Reading Hugging Face, it seems DeepSeek-R1 is an "improvement" over the Zero version, but from a quick skim it's not clear whether Zero is a base model of some kind or just a different training technique.
In my case it's just CPU (it's a Hetzner server; I checked /proc/cpuinfo and it said "AMD EPYC 9454P 48-Core Processor"). I apparently still had some stats in my terminal backlog, so I pasted them below.
It's not a speed demon, but it's enough to mess around and test things out. Thinking can sometimes go on pretty long, so it can take a while to get responses, even if ~6 tokens/sec is pretty good considering it's a pure CPU setup.
---
  prompt eval time = 133.55 ms / 1 tokens ( 133.55 ms per token, 7.49 tokens per second)
  eval time = 392205.46 ms / 2220 tokens ( 176.67 ms per token, 5.66 tokens per second)
  total time = 392339.02 ms / 2221 tokens
(IIRC the slot save path argument does absolutely nothing unless you actually use the slot save/restore feature, so it's superfluous here, but I have been pasting a similar command around and have been too lazy to remove it.) -ctk q8_0 reduces context (KV cache) memory use a bit.
I think my 256GB is right at the limit, spilling a bit into swap, so I'm pushing the limits :)
The --min-p 0.1 was a recommendation from the Unsloth page; I think the idea is that because the quant is going so low in bits, some things may start to misbehave, and this is a mitigation. But I haven't messed around enough to say how true that is, or any nuance about it. I think I put --temp 0.6 for the same reason.
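Putting it together, the invocation looked roughly like this (a sketch from memory; model path, host, and port are placeholders):

  # llama.cpp's llama-server with the flags discussed above
  ./llama-server \
    -m DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf \
    -ctk q8_0 \
    --min-p 0.1 --temp 0.6 \
    --host 0.0.0.0 --port 8080

(Pointing it at the first split file is enough; llama.cpp picks up the remaining parts automatically.)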
To explain for anyone not aware of llama-server: it exposes a (somewhat) OpenAI-compatible API, so you can use it with any software that speaks that. llama-server itself also has a web UI, but I haven't used it.
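For example, once the server is up, a plain curl against the OpenAI-style chat endpoint works (the "model" value is basically a placeholder; llama-server serves whatever model it was started with):

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "deepseek-r1",
      "messages": [{"role": "user", "content": "What is the status of Taiwan?"}]
    }'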
I had some SSH tunnels set up to use the server interface with https://github.com/oobabooga/text-generation-webui where I hacked an "OpenAI" client into it (that UI doesn't have one natively). The only reason I use the oobabooga UI is habit, so I don't recommend this setup to others.
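(The tunnel part is just ordinary SSH local port forwarding, something like the line below, with the user and hostname being whatever your server is:)

  # forward local port 8080 to llama-server on the remote box
  ssh -L 8080:localhost:8080 user@my-hetzner-server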
I used "What is the status of Taiwan?" and that seemed to rather reliably trigger a canned answer.
But when my prompt was literally just "Taiwan" that gave a way less propagandy answer (the think part was still empty though).
I've also seen comments that sometimes in the app it starts giving answer that suddenly disappears, possibly because of moderation.
My guess: the article author's observations are correct and apply on the local R1 too, but also if you use the app, it maybe has another layer of moderation. And yeah really easy to bypass.
I used the R1 from unsloth-people from huggingface, ran on 256GB server, with the default template the model has inside inside its metadata. If someone wants to replicate this, I have the filename and it looks like: DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf for the first file (it's in five parts), got it from here: https://huggingface.co/unsloth/DeepSeek-R1-GGUF
(Previously I thought quants of this level would be incredibly low quality, but this seems to be somewhat coherent.)
Edit: reading sibling comments, somehow I didn't realize there also exists something called "DeepSeek-R1-Zero" which maybe does not have the canned response fine-tuning? Reading huggingface it seems like DeepSeek-R1 is "improvement" over the zero but from a quick skim not clear if the zero is a base model of some kind, or just a different technique.