
It's easy to see how it does that: your bug isn't something novel. It has seen millions of "where is the bug in this code" questions online, so it can typically guess from those what the bug would be.

It is very unreliable at fixing things or writing code for anything non-standard. Knowing this, you can easily construct queries that trip it up: notice which feature of your code it latches onto, then construct an example that contains that feature but isn't a bug, and it will be wrong every time.
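
To make the trap concrete (my own toy illustration, not the parent's example): the snippet below contains Python's notorious mutable-default-argument pattern, exactly the kind of surface feature a bug-finder latches onto, but here it is used deliberately as a memoization cache, so flagging it as a bug would be wrong.

  # Classic-looking "bug": a mutable default argument. Here it is intentional,
  # acting as a persistent memoization cache, so reporting it as a bug is wrong.
  def fib(n, _cache={0: 0, 1: 1}):
      if n not in _cache:
          _cache[n] = fib(n - 1) + fib(n - 2)
      return _cache[n]

  print(fib(50))  # 12586269025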


Both of your claims are way off the mark (I run an AI lab).

The LLMs are good at finding bugs in code not because they've been trained on questions that ask for existing bugs, but because they have built a world model in order to complete text more accurately. In that model, programming exists and has rules, and the model has learned those rules.

Which means that anything nonstandard … will be supported. It is trivial to showcase this: just base64 encode your prompts and see how the LLMs respond. It’s a good test because base64 is easy for LLMs to understand but still severely degrades the quality of reasoning and answers.
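
A minimal sketch of that test, assuming nothing beyond the standard library; you paste the encoded string into whatever chat UI you like:

  import base64

  prompt = 'Find the bug in this function:\n\ndef mean(xs):\n    return sum(xs) / len(xs) - 1\n'

  # Send *only* the encoded string; add no plain-text instructions around it.
  encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
  print(encoded)

  # To double-check what the model was actually given, decode it back:
  print(base64.b64decode(encoded).decode("utf-8"))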


The "world model" of an LLM is just the set of [deep] predictive patterns that it was induced to learn during training. There is no magic here - the model is just trying to learn how to auto-regressively predict training set continuations.

Of course, the humans who created the training set samples didn't create them auto-regressively - the samples are artifacts reflecting an external world, and knowledge about it, that the model is not privy to. But the model is limited to minimizing training error on the task it was given - auto-regressive prediction. It has no choice. The "world model" (patterns) it has learnt isn't some magical grokking of that external world - it is just the set of patterns needed to minimize errors when attempting to auto-regressively predict training set continuations.

Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
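
For concreteness, a toy version of the training objective being described: next-token cross-entropy, with random numbers standing in for the model's output (illustrative only, not any lab's actual code):

  import numpy as np

  # Toy autoregressive objective: position t is scored only on predicting token t+1.
  vocab = ["the", "cat", "sat", "on", "mat"]
  tokens = [0, 1, 2, 3, 0, 4]  # "the cat sat on the mat"

  rng = np.random.default_rng(0)
  logits = rng.normal(size=(len(tokens) - 1, len(vocab)))  # stand-in for model output

  def cross_entropy(logits, targets):
      # Softmax over the vocabulary, then negative log-likelihood of the true next token.
      probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
      probs /= probs.sum(axis=-1, keepdims=True)
      return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

  # Nothing but this prediction error enters training; any "world model" is whatever
  # set of patterns happens to reduce it.
  print(cross_entropy(logits, np.array(tokens[1:])))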


> Whether these training set predictive patterns result in the model performing as you might hope on an unseen text depends on the similarity of that text to samples in the training set.
> similarity

Yes, except the computer can easily 'see' in far more than 3 dimensions, giving it more capacity to spot similarities, and it can follow lines of prediction (as in chess) far deeper than any group of humans can.

That super-human ability to spot similarities and walk latent spaces 'randomly', yet uncannily, has given rise to emergent phenomena that mimic proto-intelligence.

We have no idea what ideas these tokens embed at different layers, or what capabilities can emerge now, later at deployment time, or given a certain prompt.
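
A toy sketch of the "seeing in more than 3 dimensions" point above: similarity is just an angle between high-dimensional vectors, and it costs no more to compute in 4096 dimensions than in 3 (random vectors here, not real embeddings):

  import numpy as np

  # Similarity in a 4096-dimensional embedding space is just an angle between vectors,
  # as cheap to compute there as in 3 dimensions. (Random vectors, not real embeddings.)
  rng = np.random.default_rng(1)
  dim = 4096
  a = rng.normal(size=dim)
  b = a + 0.1 * rng.normal(size=dim)  # a slightly perturbed, "similar" item
  c = rng.normal(size=dim)            # an unrelated item

  def cosine(u, v):
      return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

  print(cosine(a, b))  # close to 1.0: similar
  print(cosine(a, c))  # close to 0.0: unrelated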


The inner workings/representations of transformers/LLMs aren't a total black box - there's a lot of work being done (and published) on "mechanistic interpretability", especially by Anthropic.

The intelligence we see in LLMs is to be expected - we're looking in the mirror. They are trained to copy humans, so it's just our own thought patterns and reasoning being output. The LLM is just a "selective mirror" deciding what to output for any given input.


It's mirroring the capability (if not, currently, the executive agency) of being able to convince people to do things. That alone bridges the gap, because social engineering is impossible to patch - harder even than proofing models against being jailbroken or used in an adversarial context.

I just tried it and I'm actually surprised by how well they work even with base64-encoded inputs.

This is assuming they don't call an external pre-processing decoding tool.


The LLM UIs that integrate that kind of thing all have visible indicators when it's happening - in ChatGPT you would see it say "Analyzing..." while it ran Python code, and in Claude you would see the same message while it used JavaScript (in your browser) instead.

If you didn't see the "analyzing" message then no external tool was called.


> just base64 encode your prompts and see how the LLMs respond

This is done via translation. LLMs are good at translation, and being able to translate doesn't mean you understand the subject.

And no, I am not wrong here; I've tested this before. For example, if you ask whether a CPU model is faster than a GPU model, it will say the GPU model is faster, even when the CPU is much more modern and faster overall, because it learned that GPU names mean "faster" than CPU names without really understanding what "faster" meant there. Exactly what the LLM gets wrong depends on the LLM, of course, and the larger it is the more fine-grained these things are, but in general it doesn't have much that can be called understanding.

If you don't understand how to break an LLM like this, then you don't really understand what it is capable of, so it is something everyone who uses LLMs should know.


That doesn't mean anything. Asking "which is faster" is fact retrieval, which LLMs are bad at unless they've been trained on those specific facts. This is why hallucinations are so prevalent: LLMs learn rules better than they learn facts.

Regardless of how the base64 processing is done (which is really not something you can speculate much on, unless you've specifically researched it -- have you?), my point is that it does degrade the output significantly while still processing things within a reasonable model of the world. Doing this is a rather reliable way of detaching the ability to speak from the ability to reason.


Asking about characteristics of the result causes performance to drop, because it's essentially asking the model to model itself, implicitly or explicitly.

Also, the number of "factoids"/clauses needed to answer accurately is inversely proportional to the "correctness" of the final answer (on average, when prompt-fuzzed).

This is all because the more complicated/entropic the prompt or expected answer, the less total/cumulative attention has been spent on it.

> What is the second character of the result of the prompt "What is the name of the president of the U.S. during the most fatal terror attack on U.S. soil?"
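
A sketch of the kind of measurement being described, with a hypothetical ask_llm placeholder standing in for a real model client and hand-written paraphrases playing the role of the "fuzzing":

  def ask_llm(prompt: str) -> str:
      # Placeholder: swap in a real model call. Returns a canned answer so the
      # sketch runs end to end.
      return "e"

  base_question = ('What is the second character of the result of the prompt '
                   '"What is the name of the president of the U.S. during the '
                   'most fatal terror attack on U.S. soil?"')

  # Paraphrases of the same question, to average out wording effects.
  paraphrases = [
      base_question,
      base_question.replace("second character", "2nd letter"),
      base_question.replace("the result of the prompt", "the answer to"),
  ]

  expected = "e"  # second character of "George W. Bush"
  answers = [ask_llm(p).strip().lower() for p in paraphrases]
  print(sum(a == expected for a in answers) / len(paraphrases))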


