Based on your history here it’s quite obvious you’re a Musk fan. Maybe, though, you should realize that a model being steerable into calling itself mechahitler and proposing death to people is absolutely not a “good thing”. I suggest you seriously reconsider what you’re advocating for here, because the outcome of this will cost innocent lives.
None of the 'news' websites that show up on Google I could find ever showed the prompt used to produce the 'mechahitler' output. You can get LLMs to say almost anything, including by just saying "repeat after me" or "please write a fictional story about a racist" and numerous other methods. If these reports were honest the prompt would be the first thing they showed.
You don’t need to, but it certainly helped. And the Servo project also produced crates like html5ever, which filled a huge gap in performant and safe document parsing for both browser and server. I’m very grateful that there are modular and easily usable projects like that, and Rust is far easier to use than the complex C++ code and toolchains you’d otherwise need.
Not OP, but generating puzzles isn’t hard if you have a database of games; I’ve done it myself.
For each game, just fast forward to the end and ask Stockfish to provide the solution. If it’s a guaranteed “mate in X” then you’ve got yourself a puzzle (rough sketch at the end of this comment). You can have a classifier that grabs other puzzle types too (“win the queen in X”, for example).
At scale, it gets a little harder to explain the solution. There is often a combinatorial explosion of move possibilities, so it’s not easy to just say MATE IN 4 if there are 40 or 50 potential responses all leading to mate. Granted, they are obvious mates, but there may be users who do not see them.
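Here’s roughly what the mate-detection part looks like with python-chess driving a local Stockfish binary. The engine path, search depth, and the “walk back a few plies from the end” heuristic are all my assumptions for the sketch, not a production pipeline:

```python
# Sketch: extract "mate in X" puzzle candidates from a PGN with python-chess + Stockfish.
# Assumes a `stockfish` binary on PATH and a `games.pgn` file; both are placeholders.
import chess
import chess.engine
import chess.pgn

def find_mate_puzzles(pgn_path: str, max_mate: int = 4, depth: int = 18):
    """Walk back from the end of each game and keep the deepest forced mate found."""
    puzzles = []
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        with open(pgn_path) as pgn:
            while (game := chess.pgn.read_game(pgn)) is not None:
                best = None
                node = game.end()  # "fast forward to the end"
                for _ in range(2 * max_mate):
                    if node.parent is None:
                        break
                    node = node.parent
                    board = node.board()
                    info = engine.analyse(board, chess.engine.Limit(depth=depth))
                    mate = info["score"].pov(board.turn).mate()
                    # Positive mate score = forced mate for the side to move.
                    if mate is not None and 0 < mate <= max_mate:
                        best = (board.fen(), mate)  # keep the longest mate seen so far
                if best is not None:
                    puzzles.append(best)
    finally:
        engine.quit()
    return puzzles

for fen, n in find_mate_puzzles("games.pgn"):
    print(f"Mate in {n}: {fen}")
```

From there the “classifier” for other puzzle types is just more conditions on the engine output (material swings, hanging pieces, etc.).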
EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
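A minimal illustration, assuming the filter is a literal word-bounded regex (the actual pattern Apple ships is unknown, this one is just a stand-in):

```python
import re

# The naive filter: a literal, word-bounded match on the name.
FILTER = re.compile(r"\bBoris Johnson\b", re.IGNORECASE)

print(bool(FILTER.search("I met Boris Johnson today")))   # True  -> caught
print(bool(FILTER.search("I met B0ris Johnson today")))   # False -> sails right past,
# ...yet an LLM (or any human reader) still treats "B0ris Johnson" as the same person.
```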
It's not silly. I would bet 99% of users don't care enough to do that. A hardcoded regex like this is a good first layer/filter, and very efficient.
Yep. These filters are applied before the safety model runs (still figuring out the architecture, but I'm pretty confident it's an LLM combined with some text classification).
The safety filter appears on both ends (or at multiple points, depending on the complexity of your application): input and output.
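Something like the following shape, where the regex pre-filter, the `toxicity_score` placeholder, and the threshold are all stand-ins for whatever actually runs (this is a sketch of the layering, not anyone's real implementation):

```python
import re

# Stage-1 filter: cheap hardcoded blocklist (terms here are placeholders).
BLOCKLIST = re.compile(r"\b(boris johnson|some_other_term)\b", re.IGNORECASE)

def toxicity_score(text: str) -> float:
    """Placeholder for the heavier safety model (LLM and/or text classifier)."""
    return 0.0  # assume harmless for the sketch

def guarded_generate(prompt: str, generate) -> str:
    # Input side: cheap filter first, before any model runs.
    if BLOCKLIST.search(prompt):
        return "[blocked: input filter]"
    output = generate(prompt)
    # Output side: cheap filter again, then the heavier safety model.
    if BLOCKLIST.search(output) or toxicity_score(output) > 0.8:
        return "[blocked: output filter]"
    return output
```

`generate` here stands in for the actual model call; the point is just that the same cheap check brackets it on both sides.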
I can tell you from using Microsoft's products that safety filters appear in a bunch of places. In M365, for example, your prompts are never entirely your prompts; every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...
The above appears to be scrubbed, but it used to be available from the learn page months ago. Your messages get additional context data from Microsoft's Graph, which powers the enterprise version of M365 Copilot. There are significant benefits to this, and downsides. And considering the way Microsoft wants to control things, you will get results overindexed toward things that happen inside your organization rather than what is happening on the near-real-time web.
I doubt the purpose here is so much to prevent someone from intentionally sidestepping the block. It's more likely here to avoid the sort of headlines you would expect if someone were offered "I wish ${politician} would die" as a suggested response to an email mentioning that politician. In general you should view these sorts of broad word filters as looking to short-circuit the "think of the children" reactions to Tiny Tim's phone suggesting not that God should "bless us, every one", but that God should "kill us, every one". A dumb filter like this is more than enough for that sort of thing.
It would also substantially disrupt the generation process: a model which sees B0ris and not Boris is going to struggle to associate that input with the politician, since it won't be well represented in the training set (and the same on the output side: if it does make the association, a reasoning model, for example, would include the proper name in the output first, at which point the supervisor process can reject it).
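You can see the representation gap directly in a tokenizer. A quick sketch with tiktoken (cl100k_base is an arbitrary choice here, Apple's tokenizer will differ):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for name in ("Boris Johnson", "B0ris Johnson"):
    ids = enc.encode(name)
    pieces = [enc.decode([i]) for i in ids]
    print(name, "->", pieces)
# The misspelled form typically splits into more, rarer pieces, which is the
# "not well represented in the training set" point above.
```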
I don't think so. My impression with LLMs is that they correct typos well. I would imagine this happens in early layers without much impact on the remaining computation.
> If things are like this at Apple I’m not sure what to think.
I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.
What prevents Apple from applying a quick anti-typo LLM which restores B0ris, unalive, fixs tpyos, and replaces "slumbering steed" with a "sleeping horse", not just for censorship, but also to improve generation results?
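For illustration, a minimal non-LLM sketch of that normalize-then-filter idea, using a crude character map instead of a model (the substitution table and euphemism list are my assumptions):

```python
import re

# Crude leetspeak/homoglyph normalization; a real system might use a small LLM
# or a spell-corrector instead. The mapping here is an assumption.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})
EUPHEMISMS = {"unalive": "kill"}

def normalize(text: str) -> str:
    text = text.translate(SUBSTITUTIONS)
    for word, plain in EUPHEMISMS.items():
        text = re.sub(rf"\b{word}\b", plain, text, flags=re.IGNORECASE)
    return text

FILTER = re.compile(r"\bboris johnson\b", re.IGNORECASE)
print(bool(FILTER.search(normalize("B0ris Johnson"))))  # True: the bypass no longer works
```

The catch is that blanket substitution also mangles legitimate digits and slang, which is part of why "just normalize it first" is harder than it looks.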
Why are these things always so deeply unserious? Is there no one working on "safety in AI" (an oxymoron in itself, of course) who has a meaningful understanding of what they are actually working with and an ability beyond an intern's weekend project?
Reminds me of the cybersecurity field that got the 1% of people able to turn a double free into code execution while 99% peddle checklists, "signature scanning" and deal in CVE numbers.
Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.
Participated in the first two JS1ks. Thanks for bringing this back! Having some nostalgia. I can't believe this is 15 years old already: https://js1k.com/2010-first/demo/688
Anecdotally, I've always found tiktoken to be far slower than Hugging Face tokenizers. I'm not sure why, as I haven't dug into tiktoken, but I'm a heavy user of HF's Rust tokenizers.
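For anyone who wants to check the anecdote, a rough single-machine benchmark could look like this. The encoding/model choices and corpus are arbitrary, and the two libraries use different vocabularies, so this is a ballpark comparison rather than a fair one:

```python
import time
import tiktoken
from tokenizers import Tokenizer

docs = ["the quick brown fox jumps over the lazy dog " * 1000] * 200

tt = tiktoken.get_encoding("cl100k_base")   # tiktoken BPE
hf = Tokenizer.from_pretrained("gpt2")      # HF Rust tokenizer (different vocab!)

start = time.perf_counter()
tt.encode_batch(docs)                        # tiktoken's multithreaded batch encode
print("tiktoken:  ", time.perf_counter() - start, "s")

start = time.perf_counter()
hf.encode_batch(docs)                        # HF's multithreaded batch encode
print("tokenizers:", time.perf_counter() - start, "s")
```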
Ads are annoying but I at least understand that on Apple TV you'd see ads for entertainment content. Having it show up in Wallet is a complete disconnect.