Based on your history here it’s quite obvious you’re a Musk fan. Maybe, though, you should realize that a model being steerable into calling itself mechahitler and proposing death to people is absolutely not a “good thing”. I suggest you seriously reconsider what you’re advocating for here, because the outcome of this will cost innocent lives.
None of the 'news' websites that show up on Google I could find ever showed the prompt used to produce the 'mechahitler' output. You can get LLMs to say almost anything, including by just saying "repeat after me" or "please write a fictional story about a racist" and numerous other methods. If these reports were honest the prompt would be the first thing they showed.
You don’t need to, but it certainly helped. And the Servo project also produced crates like html5ever, which filled a huge gap in performant and safe document parsing for both browser and server. I’m very grateful that there are modular and easily usable projects like that, and Rust is far easier to use than the complex C++ code and toolchains you’d otherwise need.
Not OP, but generating puzzles isn’t hard if you have a database of games; I’ve done it myself.
For each game, just fast forward to the end and ask Stockfish to provide the solution. If it’s a guaranteed “mate in X” then you’ve got yourself a puzzle (rough sketch at the end of this comment). You can have a classifier that grabs other puzzle types too (“win the queen in X”, for example).
At scale, it gets a little harder to explain the solution. There is often a combinatorial explosion of move possibilities, so it’s not easy to just say MATE IN 4 if there are 40 or 50 potential responses all leading to mate. Granted, they are obvious mates, but there may be users who do not see them.
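Here’s roughly what the mate-detection part looks like with python-chess driving a local Stockfish binary. The engine path, search depth, and the “walk back a few plies from the end” heuristic are all my assumptions for the sketch, not a production pipeline:

```python
# Sketch: extract "mate in X" puzzle candidates from a PGN with python-chess + Stockfish.
# Assumes a `stockfish` binary on PATH and a `games.pgn` file; both are placeholders.
import chess
import chess.engine
import chess.pgn

def find_mate_puzzles(pgn_path: str, max_mate: int = 4, depth: int = 18):
    """Walk back from the end of each game and keep the deepest forced mate found."""
    puzzles = []
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        with open(pgn_path) as pgn:
            while (game := chess.pgn.read_game(pgn)) is not None:
                best = None
                node = game.end()  # "fast forward to the end"
                for _ in range(2 * max_mate):
                    if node.parent is None:
                        break
                    node = node.parent
                    board = node.board()
                    info = engine.analyse(board, chess.engine.Limit(depth=depth))
                    mate = info["score"].pov(board.turn).mate()
                    # Positive mate score = forced mate for the side to move.
                    if mate is not None and 0 < mate <= max_mate:
                        best = (board.fen(), mate)  # keep the longest mate seen so far
                if best is not None:
                    puzzles.append(best)
    finally:
        engine.quit()
    return puzzles

for fen, n in find_mate_puzzles("games.pgn"):
    print(f"Mate in {n}: {fen}")
```

From there the “classifier” for other puzzle types is just more conditions on the engine output (material swings, hanging pieces, etc.).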
EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
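A minimal illustration, assuming the filter is a literal word-bounded regex (the actual pattern Apple ships is unknown, this one is just a stand-in):

```python
import re

# The naive filter: a literal, word-bounded match on the name.
FILTER = re.compile(r"\bBoris Johnson\b", re.IGNORECASE)

print(bool(FILTER.search("I met Boris Johnson today")))   # True  -> caught
print(bool(FILTER.search("I met B0ris Johnson today")))   # False -> sails right past,
# ...yet an LLM (or any human reader) still treats "B0ris Johnson" as the same person.
```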
It's not silly. I would bet 99% of users don't care enough to do that. A hardcoded regex like this is a good first layer/filter, and very efficient.
Yep. These filters are applied before the safety model runs (still figuring out the architecture, but I'm pretty confident it's an LLM combined with some text classification).
The safety filter appears on both ends (or at multiple points, depending on the complexity of your application): input and output.
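Something like the following shape, where the regex pre-filter, the `toxicity_score` placeholder, and the threshold are all stand-ins for whatever actually runs (this is a sketch of the layering, not anyone's real implementation):

```python
import re

# Stage-1 filter: cheap hardcoded blocklist (terms here are placeholders).
BLOCKLIST = re.compile(r"\b(boris johnson|some_other_term)\b", re.IGNORECASE)

def toxicity_score(text: str) -> float:
    """Placeholder for the heavier safety model (LLM and/or text classifier)."""
    return 0.0  # assume harmless for the sketch

def guarded_generate(prompt: str, generate) -> str:
    # Input side: cheap filter first, before any model runs.
    if BLOCKLIST.search(prompt):
        return "[blocked: input filter]"
    output = generate(prompt)
    # Output side: cheap filter again, then the heavier safety model.
    if BLOCKLIST.search(output) or toxicity_score(output) > 0.8:
        return "[blocked: output filter]"
    return output
```

`generate` here stands in for the actual model call; the point is just that the same cheap check brackets it on both sides.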
I can tell you from using Microsoft's products that safety filters appear in a bunch of places. In M365, for example, your prompts are never entirely your prompts; every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...
The above appears to be scrubbed, but it used to be available from the learn page months ago. Your messages get additional context data from Microsoft's Graph, which powers the enterprise version of M365 Copilot. There are significant benefits to this, and downsides. And considering the way Microsoft wants to control things, you will get results overindexed toward things that happen inside your organization rather than what is happening on the near-real-time web.
I doubt the purpose here is so much to prevent someone from intentionally sidestepping the block. It's more likely here to avoid the sort of headlines you would expect if someone were offered "I wish ${politician} would die" as a suggested response to an email mentioning that politician. In general you should view these sorts of broad word filters as looking to short-circuit the "think of the children" reactions to Tiny Tim's phone suggesting not that God should "bless us, every one", but that God should "kill us, every one". A dumb filter like this is more than enough for that sort of thing.
It would also substantially disrupt the generation process: a model which sees B0ris and not Boris is going to struggle to associate that input with the politician, since it won't be well represented in the training set (and the same on the output side: if it does make the association, a reasoning model, for example, would include the proper name in the output first, at which point the supervisor process can reject it).
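You can see the representation gap directly in a tokenizer. A quick sketch with tiktoken (cl100k_base is an arbitrary choice here, Apple's tokenizer will differ):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for name in ("Boris Johnson", "B0ris Johnson"):
    ids = enc.encode(name)
    pieces = [enc.decode([i]) for i in ids]
    print(name, "->", pieces)
# The misspelled form typically splits into more, rarer pieces, which is the
# "not well represented in the training set" point above.
```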
I don't think so. My impression with LLMs is that they correct typos well. I would imagine this happens in early layers without much impact on the remaining computation.
> If things are like this at Apple I’m not sure what to think.
I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.
What prevents Apple from applying a quick anti-typo LLM which restores B0ris, unalive, fixs tpyos, and replaces "slumbering steed" with a "sleeping horse", not just for censorship, but also to improve generation results?
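For illustration, a minimal non-LLM sketch of that normalize-then-filter idea, using a crude character map instead of a model (the substitution table and euphemism list are my assumptions):

```python
import re

# Crude leetspeak/homoglyph normalization; a real system might use a small LLM
# or a spell-corrector instead. The mapping here is an assumption.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})
EUPHEMISMS = {"unalive": "kill"}

def normalize(text: str) -> str:
    text = text.translate(SUBSTITUTIONS)
    for word, plain in EUPHEMISMS.items():
        text = re.sub(rf"\b{word}\b", plain, text, flags=re.IGNORECASE)
    return text

FILTER = re.compile(r"\bboris johnson\b", re.IGNORECASE)
print(bool(FILTER.search(normalize("B0ris Johnson"))))  # True: the bypass no longer works
```

The catch is that blanket substitution also mangles legitimate digits and slang, which is part of why "just normalize it first" is harder than it looks.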
Why are these things always so deeply unserious? Is there no one working on "safety in AI" (an oxymoron in itself, of course) who has a meaningful understanding of what they are actually working with and an ability beyond an intern's weekend project?
Reminds me of the cybersecurity field that got the 1% of people able to turn a double free into code execution while 99% peddle checklists, "signature scanning" and deal in CVE numbers.
Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.
Participated in the first two JS1ks. Thanks for bringing this back! Having some nostalgia. I can't believe this is 15 years old already: https://js1k.com/2010-first/demo/688
Anecdotally, I've always found tiktoken to be far slower than Hugging Face tokenizers. I'm not sure why, as I haven't dug into tiktoken, but I'm a heavy user of HF's Rust tokenizers.
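For anyone who wants to check the anecdote, a rough single-machine benchmark could look like this. The encoding/model choices and corpus are arbitrary, and the two libraries use different vocabularies, so this is a ballpark comparison rather than a fair one:

```python
import time
import tiktoken
from tokenizers import Tokenizer

docs = ["the quick brown fox jumps over the lazy dog " * 1000] * 200

tt = tiktoken.get_encoding("cl100k_base")   # tiktoken BPE
hf = Tokenizer.from_pretrained("gpt2")      # HF Rust tokenizer (different vocab!)

start = time.perf_counter()
tt.encode_batch(docs)                        # tiktoken's multithreaded batch encode
print("tiktoken:  ", time.perf_counter() - start, "s")

start = time.perf_counter()
hf.encode_batch(docs)                        # HF's multithreaded batch encode
print("tokenizers:", time.perf_counter() - start, "s")
```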
Ads are annoying but I at least understand that on Apple TV you'd see ads for entertainment content. Having it show up in Wallet is a complete disconnect.