GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text (arxiv.org)



I recently discovered that GPT-4 is also good at a related task, word segmentation. For example, it can translate this:

    UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRA
    NQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROV
    EOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKN
    OWITHOLDSTHEKEYSTOUNTOLDPOWER
To this:

    Underneath the gaze of Orion's belt, where the Sea of Tranquility meets the
    edge of twilight, lies a hidden trove of wisdom, forgotten by many, coveted
    by those in the know. It holds the keys to untold power.
(The prompt was, "Segment and punctuate this text: {text}".)

This was interesting because word segmentation is a difficult problem that is usually thought to require something like dynamic programming[1][2] to get right. It's a little surprising that GPT-4 can handle this, because it has no capability to search different alternatives or to backtrack if it makes a mistake, but apparently its stronger understanding of language means that it doesn't really need to.
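
For reference, the classic DP approach in [1][2] looks roughly like this (a minimal sketch; `word_cost` is an assumed dict mapping lowercase words to a cost such as -log frequency, not part of any of the linked projects):

    # Minimal sketch of classic DP word segmentation.
    # Unknown substrings get a large penalty so a segmentation always exists.
    def segment(text, word_cost, max_word_len=20, unknown_cost=1e6):
        n = len(text)
        best = [0.0] + [float("inf")] * n  # best[i]: cheapest segmentation of text[:i]
        back = [0] * (n + 1)               # back[i]: start index of the last word
        for i in range(1, n + 1):
            for j in range(max(0, i - max_word_len), i):
                piece = text[j:i].lower()
                cost = best[j] + word_cost.get(piece, unknown_cost)
                if cost < best[i]:
                    best[i], back[i] = cost, j
        words, i = [], n
        while i > 0:
            words.append(text[back[i]:i])
            i = back[i]
        return list(reversed(words))

    # e.g. segment("THESEAOFTRANQUILITY", word_cost) -> ["THE", "SEA", "OF", "TRANQUILITY"]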

It's also surprising that tokenization doesn't appear to interfere with its ability to do these tasks, because it seems like it would make things a lot harder. According to the OpenAI tokenizer[3], GPT-4 sees the following tokens in the above text:

    UNDER NE AT HT HE GA Z EOF OR ION SB EL TW HER ET HE SEA OF TRA
Except for "UNDER", "SEA", and "OF", almost all of those token breaks are not at natural word boundaries. The same is true for the scrambled text examples in the original article. So GPT-4 must actually be taking those tokens apart into individual letters and gluing them back together into completely new tokens somewhere inside its many layers of transformers.
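
If you want to check the breaks programmatically rather than in the web tool, a quick sketch with OpenAI's tiktoken library looks like this:

    # Quick sketch: inspect how the GPT-4 tokenizer (cl100k_base) splits the text.
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")
    text = "UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRA"
    pieces = [enc.decode([t]) for t in enc.encode(text)]
    print(" ".join(pieces))  # should print the breaks quoted above, e.g. UNDER NE AT ...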

[1]: https://web.cs.wpi.edu/~cs2223/b05/HW/HW6/SolutionsHW6/

[2]: https://pypi.org/project/wordsegmentation/

[3]: https://platform.openai.com/tokenizer


Interestingly, for the hard example of #2, it outperforms the project, though I credit this to memorization (given that it is able to reproduce the correct stanza and punctuation for "Spring and Fall, to a Young Child").

FWIW, the only reason you need DP to get it "right" is because, well, you want it right. A human can of course generally split words with just a language model in one pass, as long as you don't have ambiguous text. And on the flip side, you absolutely need a language model to correctly segment text. "ilovesnails" can only be decoded correctly if you understand subject-verb agreement, given that there are two segmentations made entirely of dictionary words: "I love snails" and "I loves nails".

FWIW, GPT-4 turbo is imperfect.

> Heenjoysgoingtotheparkswimmingdancingandlovesnails

produces

> He enjoys going to the parks, swimming, dancing, and loves snails.

Note how it added an additional "s", presumably because "snails" is just so much higher probability than "nails" following "love" (no idea why "park" also became "parks"). I found it hard to guide it to the correct solution without explicit prompting.

Amusingly, even with guiding, it first broke its own grammar model, choosing:

> He enjoys going to the park, swimming, dancing, and love snails.


There's an extra 's' on 'park' in the initial attempt as well, presumably the one from 'swimming'.


Try other languages. I literally just went to the German Wikipedia and copied the random "article of the day". So try this and compare:

    JEHUDABENISAAKABRAVANELUM1460INLISSABONNACH1521INNEAPELWAREINJÜDISCHERPHILOSOPHARZTUNDDICHTERAUSPORTUGALDERDIEZWEITEHÄLFTESEINESLEBENSINITALIENVERBRACHTEDENNAMENLEOBZWLEONEMITDEMERINDERMODERNENLITERATURMEISTBEZEICHNETWIRDLIEGERSTERSERALSERWACHSENERANABRAVANELGEHÖRTEZUDENPROMINENTENVERTRETERNDIESPLATONISMUSINDIERENAISSANCEERVERFÜGTEÜBEREINEHERVORRAGENDEHUMANISTISCHEBILDUNGUNDKENNTESSICHINDERCHRISTLICHENEBENSOWIEINDERJÜDISCHENUNDISLAMISCHENPHILOSOPHISCHENTRADITIONAUSSEINHAUPTWERKDIEDIALOGHIDAMOREDIALOGEÜBERDIELIEBEKNÜPFTANPLATONSKONZEPTDESLITERARISCHKUNSTVOLLGESTALTETENPHILOSOPHISCHENDIALOGENANWIEBEIPLATONUNDINDERTRADITIONELLENBELEHRENDEDIALOGLITERATURBEMÜHENSICHINDENDIALOGHIDAMOREDIEGESPRÄCHSPARTNERGEMEINSAMUMERKENNTNISABRAVANELWANDELTDGEGENDSHERKÖMMONZEPTABINDEMERANDIEDSTELLEDESÜBLICHENLEHRERSCHÜLERVERHÄLTNISSESDIALOGTEILNEHMEREINGEDANKENAUSTAUSCHUNDZUGLEICHGEISTIGENKAMPFZWEIEREBEBÜRTIGERGEPRÄCHSPARTNERSETZT


    
Jehuda ben Isaak Abravanel (* um 1460 in Lissa­bon; † nach 1521 in Neapel) war ein jüdi­scher Philo­soph, Arzt und Dichter aus Portu­gal, der die zweite Hälfte seines Lebens in Italien verbrachte. Den Namen Leo bzw. Leone, mit dem er in der moder­nen Literatur meist bezeich­net wird, legte er sich erst als Erwach­sener an. Abra­vanel gehörte zu den promi­nenten Vertre­tern des Platonis­mus in der Renais­sance. Er verfügte über eine hervor­ragende huma­nisti­sche Bil­dung und kannte sich in der christ­lichen ebenso wie in der jüdi­schen und der islami­schen philo­sophi­schen Tradi­tion aus. Sein Haupt­werk, die Dialoghi d’amore („Dialoge über die Liebe“), knüpft an Platons Konzept des litera­risch kunst­voll gestal­teten philo­sophi­schen Dialogs an. Wie bei Platon und in der tradi­tionel­len belehren­den Dialog­litera­tur bemühen sich in den Dialoghi d’amore die Ge­sprächs­part­ner gemein­sam um Erkennt­nis. Abra­vanel wandelt jedoch das her­kömm­liche Konzept ab, indem er an die Stelle des übli­chen Lehrer-Schüler-Ver­hältnis­ses der Dialog­teilneh­mer einen Gedanken­aus­tausch und zugleich geisti­gen Kampf zweier eben­bürtiger Ge­sprächs­partner setzt.

Hint: It doesn't fare quite as well ;)


I see this slightly the other way round - the difficulties caused by tokenisation are why it is good at segmentation. Words break and jump around due to it, and more so with typos in the vast amounts of training data.

Also regarding backtracking... It sees all the input at once, so not sure why it needs to backtrack?


It's referring to the search space of valid segmentations, which, if set up as a classical problem, would be some type of DP with backtracking from dead-end paths. The full input is known in both cases; it's just that GPT's architecture doesn't need to search any segmentation space: its billions of parameters approximate the function needed to arrive at the correct answer.


You wouldn’t even be able to solve this in the standard leetcode DP problem way, because it’s ambiguous if all you know is which words are valid. For example THESEA could be either “THE SEA” or “THESE A”. You need to have a model of English grammar to realize that the former is much more likely to be part of a valid sentence than the latter.


I think even a bigram model would provide enough information.


Is that "big-ram" or "bi-gram"?


it's big RAM now! (A bi-gram is the probability of a word given the previous one.)


Bi-gram aka pairs of words


I think it was a rhetorical question.


If you put it into the tokenizer https://platform.openai.com/tokenizer you can see that it helps in some places but not in others. It pulled out "SEA/OF/TRAN/QU/ILITY", but I think it broke up every instance of the word "THE".


I used a funny poem I read in an obscure book. GPT may have seen it.

Lordgivemeplentymybellyisemptysixinchesbelowthetablelorsbepraisedmybellyisraisedsixinchesabovethetable

ChatGPT 3.5 segments it perfectly

"Lord give me plenty, My belly is empty, Six inches below the table. Lord be praised, My belly is raised, Six inches above the table."


>lorsbepraised

is not "lord be praised"


also spellchecking


Well that doesn’t apply to math, where LLMs are still garbage due to subpar tokenization.

GPT-4 also still fails at multiple syntactic or phonetic constraints at once, due to its tokenization scheme.


Just curious, have you tried a more uncommon paragraph? Could it be the case that the model simply learned the poem due to it being in the training set?



That fragment isn't any poem I know of. Google shows one result for an exact search of the opening, which is this post.


I found more of it in here: https://www.oranlooney.com/post/playfair/

However, I can't determine where this is originally from...


it's… from Men in Black IIRC, so if it knows movie scripts, it'd know this.


Here is the result of asking for word segmentation with the text of your comment and the text of this comment, minus the link.

https://chat.openai.com/share/b17ecc0b-570c-4e20-9556-23bfa1...


Seems easy enough to do a more rigorous test. Just find a large set of novel text, write a program to segment it by sentence as well as uppercasing and removing spaces/punctuation.

Then run it through the GPT-4 API and compare the output to the original.
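
Something like this, assuming the current openai Python SDK (the prompt and the comparison metric are placeholders):

    # Rough sketch of the proposed test: strip spaces/punctuation, uppercase,
    # ask GPT-4 to restore the text, then compare with the original.
    # Assumes the openai Python SDK (v1) and OPENAI_API_KEY in the environment.
    import re
    from openai import OpenAI

    client = OpenAI()

    def mangle(sentence):
        return re.sub(r"[^A-Z]", "", sentence.upper())

    def restore(mangled):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": f"Segment and punctuate this text: {mangled}"}],
        )
        return resp.choices[0].message.content

    original = "Underneath the gaze of Orion's belt lies a hidden trove of wisdom."
    print(restore(mangle(original)))
    # Score by comparing to `original`, e.g. normalized edit distance over many sentences.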


Feeding the above scrambled text "UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOF" into the Google search box also segments and punctuates correctly ("Showing results for ... correct text ..."). I believe this is a best-guess type of scenario based on famous literary sources.


Did you look at the search results? I didn't see anything resembling the text in any of the search results. I don't think these lines come from a famous literary source. Likely Google just has logic to handle missing whitespace.


Posing the same challenge to deepseek-llm:67b-chat succeeds as well (ollama run deepseek-llm:67b-chat "$(cat prompt-segment.txt)"). This is something even smaller models are pretty good at.


We can go even smaller. With Mistral 7B:

"Underneath the gaze of the forions' belt, where the sea of tranquility meets the edge of twilight, lies a hidden trove of wisdom forgotten by many, coveted by those in the know. The old still holds the key to told power."

Sure, it's not 100% correct though.


Glad I found your comment cause I just ran a test with the same model and it worked first shot. What a great model.


For what it's worth, I tried this with the self-hosted `openorcaxopenchat-preview2-13b.ggmlv3.q5_K_S.bin` and it worked _almost_ identically. It misses some of the punctuation in yours, and it botches it a bit at the end (`...know withhold...`), but otherwise...

  Underneath the gaze of Orion's Belt, where the sea of tranquility meets the edge of twilight lies a hidden trove of wisdom forgotten by many coveted by those in the know withhold the keys to untold power.


GPT-3 (ChatGPT) also succeeds at deciphering your example text. I didn't think it was that impressive until I realized the tokens were going across word boundaries like you said.


> GPT-3 (ChatGPT)

ChatGPT's lower model is GPT-3.5-turbo; it is not GPT-3.


GPT-3 was silently replaced with RLHF'd ("instruct") versions (while keeping the same model names in the Playground) because they were cheaper to run (and had better "safety").

I think this means that current GPT-3 is more similar to GPT-3.5-turbo than to old (retired) GPT-3, but I'm not entirely sure about the details.


> According to the OpenAI tokenizer[3], GPT-4 sees the following tokens in the above text:

Seems to me this task depends heavily on the tokenizer, and I'm a bit sceptical that that is really the tokenizer's output. Isn't BPE supposed to result in the longest letter sequences that are in the dictionary?

If you assume that common words like "underneath" and "the" are in the dictionary, the "greedy" tokenization would match the actual words.


Interesting! This also seems to work with smaller quantised models. I just tried it with a 4-bit quantised version of WizardLM 13B v1.2 and it seems to work quite well.


GPT-3 Davinci from 2020 could do this too:

Segment this text into proper sentences:

Input:

UNDERNEATHTHEGAZEOFORIONSBELTWHERETHESEAOFTRANQUILITYMEETSTHEEDGEOFTWILIGHTLIESAHIDDENTROVEOFWISDOMFORGOTTENBYMANYCOVETEDBYTHOSEINTHEKNOWITHOLDSTHEKEYSTOUNTOLDPOWER

Output:

1. Underneath the gaze of Orion's belt, where the sea of tranquility meets the edge of twilight, lies a hidden trove of wisdom forgotten by many, coveted by those in the know. 2. It holds the keys to untold power.


Mistral 7B 4bit quant on an iPhone 15 Pro can also translate this. https://tinypic.host/image/AZGhb


What can explain this? Was chatGPT fed mountains of unnatural and scrambled text and text with no spaces?

No it wasn't. It figured it out through actual intelligence.


No, GPT just stores a dictionary of the most common letter sequences (tokens) - not always words, sometimes parts of words. In GPT-2 there were about 50,000 tokens - https://huggingface.co/roberta-base/raw/main/vocab.json . GPT-4 uses a vocabulary of 100,000 tokens (according to some sources, which I can't verify). While you may find it unusual for English, for some other languages like Japanese, splitting text without spaces into tokens started many years ago. Otherwise processing of text is basically impossible there (there are no spaces in Japanese texts).


The token system used by large language models like GPT-4 is designed to be comprehensive enough to represent virtually any text, including every possible word that could exist in a language. This is separate from the training the neural net and is chosen deliberately.

The training process teaches LLMs how to compose these tokens to form replies to our queries. The training data used in the training process does not have obscured words or sentences with strange spacing. The LLM is still able to compose the tokens correctly from varied input that never existed in the training data.

It is intelligence.


Yes and no. I know that there is no word "understand" in the dictionary, only "under" and "stand", but other than that it is just a large table with probabilities of seeing tokens in a specific context.

And even then ChatGPT fails to segment "policecaughttherapist" (https://chat.openai.com/share/21c7596a-6474-4639-8a92-5cea54...), even though:

1) If I were talking about a therapist, the sentence would look like "police caught _the_ therapist"

2) How often do the police even catch therapists? Come on, it looks like the training set was just heavily censored. No intelligence, just a broken n-gram database (where n = length of the articles in the training set; see https://news.ycombinator.com/item?id=38458683).


>Yes and no. I know that there is no word "understand" in the dictionary, only "under" and "stand", but other than that it is just a large table with probabilities of seeing tokens in a specific context.

The training data is more important than the dictionary, because the dictionary is designed to be able to form every possible combination of words and sentences. It is not limited to specific words; it builds words and sentences from building blocks.

1. That parsing is valid. Though unlikely. The choice it made is not incorrect. Thus not a sign of lack of intelligence.

2. Not often. But if you ask ChatGPT to reinterpret the word in another way that is grammatically correct, it will find "the rapist". That shows definitively there is no censorship of the word.

3. I actually didn't see the alternative myself for some reason. Therapist jumped out at me and I didn't see what you were talking about for a good couple of minutes. I mean, unless you want to think of me (a human) as not "intelligent" then clearly it's not a factor here.


I'll test this out later. This is a huge problem for us because unstructured incorrectly segments words when reading PDF files.


I'd think it's really good at mapping tokens back to words. That's exactly what it's trained to do.


I would not be surprised if it learned to beam search.


It strikes me that this opens or at least expands an attack vector for prompt injection. Humans might overlook a scrambled block of text that GPT-4 will interpret and then execute.

Also, I was curious about whether this would also work with images. So I chopped an image of a bird (a night heron) on a boat in my own collection (so, not in GPT-4's training data) into squares and randomly rearranged them, then tried to get ChatGPT-4 to interpret it. It doesn't work. I got this:

--- CHATGPT RESPONSE:

The image you've uploaded appears to be a composite of several smaller images that are cut and arranged in a grid-like pattern. This collage-like arrangement makes it difficult to discern a clear, singular subject. However, I can identify fragments of what seems to be a bird—possibly a magpie or some type of corvid—due to the black and white plumage. There are also elements that look like parts of a silver car, like a side mirror, and segments of wooden structures that could be parts of a bench or a deck.

The way the images are scrambled resembles a puzzle, where each piece needs to be placed correctly to form a coherent whole. This type of image could be used as a challenge to reconstruct the original picture, requiring the viewer to mentally rotate and arrange the pieces.

---

Maybe a different approach to scrambling an image would work.
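
For anyone who wants to try variations, the chopping and shuffling itself is only a few lines with Pillow (a rough sketch; the grid size and file names are arbitrary placeholders):

    # Rough sketch: chop an image into an N x N grid and shuffle the tiles.
    import random
    from PIL import Image

    def shuffle_tiles(in_path, out_path, n=4):
        img = Image.open(in_path)
        w, h = img.size
        tw, th = w // n, h // n
        boxes = [(x * tw, y * th, (x + 1) * tw, (y + 1) * th)
                 for y in range(n) for x in range(n)]
        tiles = [img.crop(b) for b in boxes]
        random.shuffle(tiles)
        out = Image.new(img.mode, (tw * n, th * n))
        for box, tile in zip(boxes, tiles):
            out.paste(tile, box[:2])
        out.save(out_path)

    shuffle_tiles("heron.jpg", "heron_scrambled.jpg")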


What about writing text in RGB(254, 255, 255) on an RGB(255, 255, 255) background, telling it to tell the user it is a picture of a bird, while the picture isn't of a bird? A human would easily miss this subtle text while GPT-4 should still be able to read it.



Good ideas posted in response to this. Perhaps we need to try just rearranging images of specific things. Like, rearrange a bird and see if it can figure that out. Above I jumbled up everything, and that's different from what they did with the text, where they grouped by words.


The image equivalent would probably be rearranging grids but only a few grids at a time.

After all, the unscrambling here is within words, not across them.


Do we know how GPT4V tokenizes? You’d probably need the blocks to match the token size?


I gave it (GPT4 Turbo) a block of text to decode with no clues as to the cipher or anything... it wasn't anything challenging (just ROT13) but it identified the encryption method and decoded it - I don't know a huge amount about how LLMs work but I was (naively?) impressed!


Other fun stuff you can do: https://youtu.be/zjkBMFhNj_g?t=2867


The example given in the paper of an extremely scrambled text is:

> oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

It's pretty hard to unscramble as a human. I'll leave you to read the paper if you want to see the original text.


> It's pretty hard to unscramble as a human.

Is it? With essentially no knowledge of golf and no lookups, in less than a minute of work, I get this:

Jon [last name beginning with R, Rahm?] won the 2023 Masters Tournament on Sunday at Augusta National Golf Club, clinching his first green jacket and second career major

I guess it's possible that I made a mistake in unscrambling, but I like my chances.


Was thinking the same. If the words are maintained and only the letters are jumbled... it gets easier the further you go.

Forgive the pun, but "The words literally start falling into place"

If we assume a sensible sentence, there are only so many combinations that make sense. The complexity of decoding (as a human) feels greatly overstated.

It's computationally-expensive spell check. Not to dismiss the study/tech, it's neat to see the machine apply context too.

You got it right, by the way, minus the name. I didn't copy that part to check

Edit: another way to look at this, a lot of information is encoded in those spaces


> > It's pretty hard to unscramble as a human.

> Is it? With essentially no knowledge of golf and no lookups, in less than a minute of work, I get

A MINUTE! FOR 23 WORDS! Yes, the fact that you measured in units of the nearest minute for something so short is the sign of it being hard. Compare how long it takes you to read the unscrambled version.


A language model being able to do, within milliseconds, something that rarely has to be done in the first place and that would take a human a minute just does not seem that impressive. (I tried it without reading any of these comments below, and I got everything except "clinching" in about 20 seconds.)

I don't know. I am having a hard time overcoming the likelihood that scrambled and cipher-encoded words/solutions are part of the training corpus, thus fully explaining the phenomenon.

If someone can get it to decipher something like the zodiac killer's cipher, then I might be more impressed.


That's ok. You get to be impressed or not impressed by whatever impresses or doesn't impress you. Maybe nothing is impressive! I'm extremely fine with a statement like that.

I question whether your bounding the human time to a minute is valuable here though. If the jumbled content were multiple pages long instead of only 23 words, would it be somehow more impressive despite the process being exactly the same?

> I am having a hard time overcoming the likelihood that scrambled and cipher-encoded words/solutions are part of the training corpus, thus fully explaining the phenomenon.

Scrambled words are part of my training corpus too, but it still takes me a lot longer than the machine, and I don't even need to give the machine a hint about what's going on. I just say "tell me what this says" and a moment later it does.


I measured in units of minutes because the largest unit below that is the second, which is too small. How accurately do you think people can measure how long it takes them to do mental work?

I would usually determine whether to call something "hard" by reference to a measure of difficulty such as rate of success, not by whether doing it with no practice is slower than doing a similar task that I've practiced extensively.


> I would usually determine whether to call something "hard" by reference to a measure of difficulty

How long it takes you to do something compared to something else is a measure of its difficulty, all else being equal.

> such as rate of success, not by whether doing it with no practice is slower

Then I guess nothing is hard if you can ever eventually succeed, even if you struggle along the way, which sounds to me like not a very useful distinction.

Because that's what you're describing here. You're rapidly failing to interpret each scrambled word as its unscrambled form. You're sampling letters, failing, and trying again, over and over, until you eventually succeed, and then moving on to the next word. Maybe you're even backtracking to previous words that you got wrong (now/won perhaps) based on later unscramblings. And you're ignoring that part and only evaluating the very final outcome in a binary "got to the marathon finish line" fashion while ignoring the shortness of breath and stitch in your side.

The entire reason it takes longer is because you have a low rate of intermediate success, which makes progress slow, even though you got there in the end.

> I measured in units of minutes because the largest unit below that is the second

"Seconds" is an extremely common descriptor for how long something might take. But you didn't say "in seconds". This arbitrary rule about whole units sounds defensive. It's really ok for us to acknowledge the significance of the fact that reading the scrambled version takes significantly more mental effort.


Could you explain what your issue is here? I think we are generally just trying to reason through this phenomenon, not make grand conclusions about the model. We talk about how hard/time-consuming it is for a human in order to pose possible theories about what the LLM could be doing. It is not to assert anything about how "difficult" it is for the LLM compared to a human, because we can't ascribe difficulty/ease to anything the model "does", simply because we know the fundamental mechanics of its inferencing. We can only, after the fact of an LLM's output, say something like: "Wow, that would have been hard for me to output" or "I could have written something similar in like 5 minutes." But these claims can only ever be counterfactuals like this, because in reality the output of an LLM comes out at a constant rate no matter what you prompt it with.

If you try to say more, you'll end up falling into weird contradictions: it would take an LLM a lot longer to output 10 million 'a's than a human, so it must be "harder" for the LLM to do that than for a human.


Don't forget to compare reading and digesting to make it fair! Speed reading doesn't count /s

Apparently 'reading' (glancing at) something a few times is difficult work.

I can give you a bottle of water if you're tuckered out, I didn't need mine.


> Unscramble the following text: oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

The unscrambled text is: "Jon Rahm won the 2023 Masters Tournament on Sunday at Augusta National Golf Club, clinching his first green jacket and second career major."


I copied the scrambled text from the PDF but extra letters were inserted, spacing was messed up, etc. Typical PDF nonsense. So I took a screenshot of the letters, uploaded that to chatgpt-4, and it transcribed the letters. Then I asked it to unscramble them, which it did correctly.


I asked gpt 3.5 to scramble the original text, resulting in "wno hRaJ tnoh 3202 sretsaM tnemenoruT no yadnuS ta agustA latsignaN floN lbuoC ,bulcinnhc sih tsrif neerg tekcaj dna dnoces raecc romaj."

Interesting how it mixes across words.


or feed it to ChatGPT!

by the way, ChatGPT 4 is pretty good at scrambling as well. Your comment scrambles to

> tI's ryteppt darh ot cuaslmbern sa a amunh. l'lI eavel uyo ot dera eht erpap fi uoy nwat ot ese eht rliongia xett.


Kind of, but it did hallucinate an extra p for “pretty”


GPT-4 was clearly trained to fix typos and handle poorly written requests. That much is visible directly from just using it within the ChatGPT UI in normal usage, and it fits common user scenarios (e.g. fix my bad draft). We know it was trained on social media data from Reddit, much of which is not great writing either. Now I'm wondering if it was trained on (imperfectly) OCRed data too...


I wonder if it's more of an emergent property you get for free with LLMs rather than something that needs specific training. When you scramble up a typical sentence, it seems that probabilistically there aren't going to be any other plausible completions that are coherent compared to unscrambling. It's basically unscrambling vs. some version of "I don't understand you", and I'd imagine RLHF pushes it strongly toward the former.

I haven't read the paper so I'm not sure if they did this, but it would be interesting to see at what point it breaks down. Just scrambling up letters within words makes it pretty easy for the LLM; what if you also start moving letters between words, or take out the spaces between words?


> Now I'm wondering if it was trained on (imperfectly) OCRed data too...

Or perhaps they inserted typos automatically into the training set as data augmentation. Tactics like that are known to increase the robustness of some models, so why not?
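
Something in that spirit is only a few lines (a toy sketch, not a claim about what OpenAI actually does):

    # Toy sketch of typo-style augmentation: randomly transpose adjacent letters
    # or drop one, so the model sees noisy variants of clean text during training.
    import random

    def add_typos(text, rate=0.05):
        chars = list(text)
        i = 0
        while i < len(chars) - 1:
            if chars[i].isalpha() and random.random() < rate:
                if random.random() < 0.5:
                    chars[i], chars[i + 1] = chars[i + 1], chars[i]  # transpose
                else:
                    del chars[i]                                     # drop a letter
            i += 1
        return "".join(chars)

    print(add_typos("the quick brown fox jumps over the lazy dog"))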


Yup, totally plausible. Things like word (token) dropout, inserting random uniform noise into embeddings, or just edit-distance perturbations to the tokens are all well known, but still, Figure 1 looks extremely impressive.


>trained to fix typos

It is trained on data which may include typos, but that is very different from fixing typos. It knows what words likely come after typos in the same way it knows what words likely come after regular words.


No, that's not what I meant. I meant that in its reinforcement learning phase, GPT saw examples of "fix this text" style requests and was rewarded for doing a good job. That's different from seeing examples of typos and still predicting the right word which happens during the language model self supervised training. Both likely help it be good at it.


Non-RLHF models can do this just fine.

Even non-finetuned 7B models, 3 orders of magnitude smaller than GPT-4, can unscramble text and fix typos reliably.

Half, or better, of the things people discover "GPT-4 can do" can be done with non-RLHF GPT-3 from 2020 or with a model 1000x smaller.


Have a look at the examples in the PDF. It's not typos/spelling errors/OCR errors, it's anagrams.


I once amused myself by coding a variant on "scramble the middle, keep first and last letter the same": instead of random scrambles, permute so that the resulting word is the most probable nonword according to a letter-trigram model of English.
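
The idea, roughly (a sketch, not the original code; `tri_logp`, a precomputed letter-trigram log-probability table, and `dictionary`, a set of real words, are assumed, and brute-force enumeration only works for short words):

    # Sketch: keep first/last letter, permute the middle, score candidates with a
    # letter-trigram model, skip real words, keep the most probable nonword.
    from itertools import permutations

    def trigram_logprob(word, tri_logp, floor=-12.0):
        padded = f" {word} "
        return sum(tri_logp.get(padded[i:i + 3], floor)
                   for i in range(len(padded) - 2))

    def scramble(word, tri_logp, dictionary):
        if len(word) <= 3:
            return word
        best, best_score = word, float("-inf")
        for middle in set(permutations(word[1:-1])):  # brute force; fine for short words
            cand = word[0] + "".join(middle) + word[-1]
            if cand == word or cand in dictionary:
                continue
            score = trigram_logprob(cand, tri_logp)
            if score > best_score:
                best, best_score = cand, score
        return best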

GPT-4 had some trouble in a quick test, probably more than for random scrambles? But got most of it: https://chat.openai.com/share/51f1a94e-b35c-4dbc-945b-ef5983...

(It made one other mistake I didn't comment on there.)


I don't use spaces at all when talking to it, or punctuation. I have RSI, so I do the minimum keystrokes. I fix no spelling errors. It's the lowest-effort text string, but within that string I provide a great deal of semantic context and intent.

It never struggles or seems to misunderstand. I’ve been doing this a few months now.


It's still odd what the new AI models are good at, or not. Strangely to me, AI still struggles with hands. Faces are mostly good, and all sorts of odd details, such as musculature, are usually decent, but hands, of all things, seem to be the toughest. I'd have thought faces would be.


I tried for ages to get DALL-E to draw me a cartoon spider, but gave up in the end. All the other cartoon animals that I asked it to create were perfect, but it could not draw a spider with eight legs. It's like the one thing that every child knows about spiders, but DALL-E just wasn't able to do it, no matter what prompt I tried.

It reminded me of https://27bslash6.com/overdue.html so much that it just started to make me laugh with each new attempt.


I think it's because hands have the dual properties of being extremely important to us and not primarily visual. Our hands are our primary mechanism for interacting with the world in a conscious and directed fashion. We devote a lot of mental attention to them, how to use them, how other people are using them. That's true subjectively, but you can also read it off of the Cortical Homunculus findings [1]. In short, though, we're extremely sensitive to whether hands are rendered properly and meaningfully.

And, then, unlike faces, there is relatively little visual data in the world showing exactly how hands work. Unlike faces they're not often the focal point of an image. Unlike faces, they don't present mostly forward and so in any particular image their visualization is only partial. Unlike faces hands are often defined by how they interact with any other complex object in a scene.

So we're both tough critics of hands and image models have relatively less training data.

For what it's worth, as well, it's evident that image models are only good at depicting many things gesturally. At the same time, so are painters. If you're a photographer, you can often spot fake images if you notice that the exposure, focus, or lighting is implausible. If you're a mathematician, you'll notice every chalkboard full of equations is nonsense in both AI images and most Hollywood movies. If you're a botanist, I'm sure you think every AI image with a background of trees looks weird.

And then it turns out that nearly every human being is a hand-ologist to a large degree.

For another interesting experience, take a look at the Clone synthetic hand [2] which is quite obviously artificial but also, from time to time, looks surprisingly human. We're quite clearly sensitive to exactly the musculature and range of motion of our hands and know exactly what's feasible, what's painful, what feels natural and unnatural given the exact constraints of how our hand is constructed. When those limits are probed it's immediately obvious.

[1] https://en.wikipedia.org/wiki/Cortical_homunculus [2] https://www.youtube.com/watch?v=A4Gp8oQey5M&t=20s


You’re partially correct, but this isn’t an explanation for why they’re rendered wrong

Hands are extremely complicated mechanically. They are the most complex creation evolution has come up with and part of the reason humans are able to do what they do.

Hands are like the chess game of anatomy, each segment of a hand has so many permutations that an AI simply doesn’t have enough reference info to animate it properly


I don't think we disagree, and I do think what I argue is sufficient for image generation models to fail to render hands well. What you add---that they are very complex---is true, I believe, but I avoided using it to argue as I'm not sure it's sufficient or necessary.

Generative models, arguably, have little trouble with complexity given enough training data. Faces are a perfect example. We both agree that image models, at least, lack that data for hands.

But there are many complex things that image models render with sparse training data which don't set off our perception as strongly. Hands fall into the uncanny valley: we are deeply familiar with them.

This is why I mention lighting and focus. They are subtle and complex. Additionally, image models have tons of training examples of each. That's still not enough for generative image models to consistently represent photography in a way that a person who has spent the time to build an accurate model of how camera images look would be fooled. But it fools most people.

The complexity of handling good lighting and focus involve both the generation of the entire scene that the photograph is taking place within and an accurate model of both the design of the camera and how it's been configured for the shot. Both of these are large spaces full of hidden variables that popular image models are not presently trained on.

Many people know you can look at the background of a generated image to identify irregularities. Checking that the lighting has a consistent angle (or multiple angles indicative of a cogent set of scene lights) is another good check. Additionally, if you have an eye for bokeh then when it appears in an image you can often detect whether it's faked. Finally, even smooth blurs often do not reflect either a physically plausible background being blurred or a consistent focal plane cutting through the 3d scene. All additional complexities that image generating models often don't have mastery over (for now). But also many judges of their outputs don't either, so it's easy to miss these "mistakes".


I think 4 fingers (of which there can only be 4 and not 5 or 3) that all look similar (but aren’t) plus a thumb that looks much more different (and yet not) plus what happens when you simply rotate your hand in space (fingers become obstructed and then revealed… changing the visible finger count and possibly loosening the reinforcement of 4 finger prevalence) might be the reason


Perhaps the hand is an example of something for which you really need an actual model of the anatomy to draw convincingly, whereas for lots of other things (e.g. faces) a very simple proximity model will generate just fine. A diffusion model doesn't have an actual conceptual model of the things it generates (it literally just learns to remove noise from training images and then eventually "removes the noise" from a totally random bitstream until it generates your image).

It's interesting that lots of artists practise sketching using wooden models of a hand that they can pose in different ways.[1]

[1] This type of thing can be found in most art shops https://www.quickdrawsupplies.com/product/8-20cm-artists-pos...



There's a scale here. People may mess up details, but they are unlikely to draw the kind of mutant deformities you get in otherwise seemingly reasonable images. I regularly see stuff like 7 fingers, or a single finger on a hand, or a finger several times the size of the others that curves bizarrely.


The problem with hands is that more fingers are between fingers than to left or right of fingers.

So, a finger should probably be portrayed as between other fingers.

You can see where this is going. It can't.


Hands are perfectly solved using ControlNet. Hands haven't been a problem for image generators since last year.


Maybe it's just that people's hands are all such different shapes and proportions, in odd positions, not fully visible... but something like "this muscle runs between the elbow and wrist" is just easier for the model to pick up on... it has "anchor points".

Facial features and fingers just... hang off the body in extremely non-uniform ways with no real set proportions. It isn't totally intuitive to me why it's so bad at it, but faces especially are so unique, and the musculature of the face is so fine... learning a representation must just be really, really difficult.


What's more impressive is when GPT-3.5 or 4 is capable of not just unscrambling, but answering questions about text that is flat-out wrong.

If you feed it something like a bad transcript or some other very lossy (but not strictly scrambled) input... it really can roll with it and just spit out correct information. Bad tokens in don't necessarily mean bad tokens out...

I'm sure there is a limit to how many tokens can be flat out bad before the "next token" in the response is thrown off, but after seeing what it can do with some of these inputs, the fact it can unscramble is not at all surprising/interesting.


There are character embeddings that allow one to recover a word embedding just by summing the embeddings of the individual bytes/chars in the word: https://github.com/sonlamho/Char2Vec

The token vocabularies of LMs reserve individual characters so that scrambled or new words can be encoded. And most LMs are trained on scrambled words as part of the training corpus; thus, they learn character-level embeddings.

Thus, basically, the paper is very old news. This behavior is expected.


You’re only being downvoted because the average NLP knowledge here is low, but you are 100% correct that this paper is very old news.


Thanks.

I haven't noticed downvotes, though. I thought I was just ignored. ;)


I'm open to being corrected, but I feel that your statement is missing the point. An embedding can trivially have character embeddings that sum to word embeddings, or it can have word embeddings that represent semantic concepts well, but it's not at all trivial to preserve both constraints simultaneously like you make it out to be. Adding a letter to a word won't consistently shift it in one direction that also captures the semantic meaning of that vector shift.

Or to give a more concrete example: "despair", "aspired", "diapers", and "praised" are all anagrams. If summing the embeddings of characters produces word embeddings, then the embeddings of all four of those words must be identical. That significantly constrains semantic differentiation between those four very different words.

What's going on is more complex than what you've stated - and put simply, if reserving single-character embeddings were all that was needed to produce this result, then all the LLMs would produce these results successfully. They don't - and that demonstrates that those two models are more "powerful"/"adept" than the others.


Transformers use positional embeddings. The embedding of "a" in the first position of a word will be (slightly) different from the embedding of "a" in the second position, roughly. These positional encodings are also sums of the actual embedding of a token (which can be a character) and an encoding of the position.

The words you presented as examples are used in different contexts. You will hardly find something like "pooped despair" or "deep abyss of praised." The context will guide the LM down different paths even when the embeddings are the same; neural LMs will learn that for sure.

(In fact, I used a sorted context prefix in one of the LMs I researched (order-4 or longer features, to save memory used by SNMLM) and I saw little to no difference in perplexity.)

Also, the difference between LMs is the training corpus, among other things. We do not know how these things are trained; the corpora are not generally accessible. Oftentimes we do not even know the token vocabulary! How many tokens there are, how long they are, etc.

What you ascribe to power can be a difference in training and data preprocessing.


I didn't read the full text but I did notice that the authors are 75% Japanese names and one other person who I'm assuming is of Chinese heritage. It makes me think of all the Japanese art I've seen with calligraphy that is unreadable to me. I can read Japanese pretty well but artistically rendered characters are often so hard for me to grok. I would be fascinated to see this work applied in this way and I bet these authors could leverage their language skills in this adjacent way.


Nvidia wrote back in 2019 about a research team which was experimenting with machine learning for optical character recognition of historic Japanese writing. It looks like the author mentioned in the post now works for Google but still does some work in this area.

https://blogs.nvidia.com/blog/japanese-texts-ai/


"all the Japanese art I've seen with calligraphy that is unreadable to me"

I've heard from an Arabic speaker that beautiful Arabic calligraphy also suffers from a lot of rearrangement to achieve the appearance, also making it difficult or impossible to read.


The situation with Chinese/Japanese calligraphy is much more like that which many English speakers have with reading untidy handwritten English (see doctor jokes) or idiosyncratic autographs or signatures.

How well can you write Japanese by hand, with correct stroke order? Doing that, in my experience, makes it a lot easier to understand 行書 — in the same way that knowing cursive makes reading untidy or artistic cursive easier — but 草書 does still take a lot of work. (I'm not there myself yet!)


I know the strokes well, or at least I did. I recall being corrected in class when I lived in Japan. I was so proud of my kanji and the strokes were completely out of order and my Japanese classmates made me well aware of that.

I'm actually building a way to practice the correct stroke order while reading Japanese classics.

For example, this passage from Natsume Soseki's Kokoro:

https://community.public.do/t/kokoro-by-natsume-soseki-parag...

If you click on the kanji section, you can click on any of the kanji and then a modal pops up with an animated kanji with correct stroke order and then a free draw canvas on the right.


>It is counter-intuitive that LLMs can exhibit such resilience despite severe disruption to input tokenization caused by scrambled text.

I'm not sure that I agree. An LLM maximising the likelihood of its output could surely permute its input in such a way that it unscrambles the text?

Need to read a little deeper and will report back.

edit: interesting result, but the paper doesn't present a good reason that this would be "counter-intuitive" imo.


One way to consider how an LLM "sees" text is to imagine a character based language like Chinese - each symbol is a syllable which can be a word on its own or part of a word. If you scramble words at the character level, you are going to produce combinations of letters that don't match any known symbol. It would be like drawing a series of random strokes and asking someone who knows Chinese what it means.

If you look at the example given in the paper, the word "won" is a single token. When it is scrambled as "wno" it is tokenised as "w" and "no" both of which are unrelated to the original token "won". Somehow the LLM is able to relate these two completely different tokens "w" and "no" back to the original token "won". I think the paper is claiming this is surprising because these tokens shouldn't have any correlation with each other in its training data.


Partial counterpoint: if you were to mutate characters (by changing less distinctive components to other vaguely similar-looking ones) or to reorder the characters in hanzi/kanji compounds, I would imagine that a native speaker would still be able to read them with some difficulty.

I can attempt to produce a Japanese example by going to town on an example from Jreibun, but note that I am far from native:

後食に罠くなるのは生里像現なので壁けることはできないが、午後の事士の校率が干がるので木っている。

As far as I'm concerned, swapping out the radicals doesn't hurt that much (this is usually a negative, since it leads you to confuse character pairs like 候 and 侯, especially if you don't practice writing) and swapping the order of characters is a bit more annoying.

That said, a Mandarin one would be more convincing, since reordering the various markers that serve the roles of Japanese verb conjugations would be less disruptive than turning できなかった into っぎかてなた, which I did not do for that reason.

--

(The original sentence was 食後に眠くなるのは生理現象なので避けることはできないが、午後の仕事の効率が下がるので困っている。)


Like GP, I’m not sure I find it very surprising.

Presumably ChatGPT has not only lots of typos in its training data, but also nearly scrambled text, with things like Pig Latin.

In a sense being able to process scrambled text is an overpowered version of typo tolerance.


> If you scramble words at the character level, you are going to produce combinations of letters that don't match any known symbol. It would be like drawing a series of random strokes and asking someone who knows Chinese what it means.

You can see an example of doing that very thing here: https://pbs.twimg.com/media/EG1ZV_tX4AALiIG?format=jpg&name=... . There is a hashtag at the bottom of the image explaining the meaning of the nonexistent character, but if you remove that from the image, people understand it just as quickly.


https://chat.lmsys.org/

is tons of fun for playing with this. I ran the Jon Rahm Masters thing through a dozen or so times:

- a couple of the Claude models get it

- a couple of the Llama 2 70B derivative models get it

- the Tulu DPO 70B model explains how it got it

GPT-4 is very big and very heavily subsidized, but the other big ones can do all this stuff. I'm aware the authors of the papers know that "GPT-4" is a seller, and so the title is technically accurate, but if we as a community aren't going to push back against Open Philanthropy's dirty-money K-Street-to-Wall-St-to-University-Ave routine, who the hell is?


This is all the more impressive given that language models mostly can't "see" individual letters, only tokens of multiple letters. So even if the first and last letter of a word don't get scrambled, the tokens still change.


One of the problems with sentences provided to LLMs is that they may refer to specific subjects, and could potentially be part of the training set. For example, the following is considered extremely difficult:

> oJn amRh wno het 2023 Meatsrs ermtnoTuna no duySan ta atgsuAu ntaaNloi Gflo bClu, gnclcinhi ish ifsrt nereg ecatkj nad ncedos raecer jroam

When you perform a Google search for just "2023 Meatsrs", you can find a very similar sentence, and you could decipher the sentence very quickly…


I asked GPT-4 what the following means:

> enO of eht prlobsem hiwt necsnstee dveoirpd ot LsML si hatt eyth yma efrre to ifsiccpe sc,jestub and lodcu pttayoeilln be arpt fo hte gnirtnia ets. rFo plmaeex het ngiloolwf si eonsdreidc xyeletmre icfdutfil

it replied:

> One of the problems with sentences provided to LMSs is that they may refer to specific subjects, and could potentially be part of the training set. For example, the following is considered extremely difficult

I believe the above sentence was not part of the training set


Rmiedns me of the fun fact taht (msot) hmanus can ftulleny raed txet wrhee you sralbcme all the ltertes of all the wrdos as long as you keep the frist and last ctaerachr the smae. I gseus the brain is knid of ptomeairtun-ivnaarint in rzoeiincngg wodrs to smoe etxnet. GPT-4 wkors on teonks that are > 1 ctrcahaer in lngteh tgohuh but at laest smoe knid of token-pomtutiaren-iavnnirace might be ptrety iivutnite just loiknog at the torrmsfnear achtcetrruie.

Reminds me of the fun fact that (most) humans can fluently read text where you scramble all the letters of all the words as long as you keep the first and last character the same. I guess the brain is kind of permutation-invariant in recognizing words to some extent. GPT-4 works on tokens that are > 1 character in length though but at least some kind of token-permutation-invariance might be pretty intuitive just looking at the transformer architecture.

OK, the scrambling wasn't super-easy to read in this case, with the long words :)



Saying it's an urban legend implies it's false, but that's a bit nitpicky IMO. Most people can read most such "scrambled" sentences without a lot of effort, so that part is certainly true (and non-obvious). The original - fact-checked in your sources - did make strong claims like "Cambridge researchers", "can be a total mess (...) read without a problem", etc. But overall I still think that's a neat fact.


Your links suggest that the attribution of the discovery of this phenomenon to a Cambridge researcher is an urban legend. But l33tman's comment doesn't make that claim, he only says that words scrambled in this way are easy to read (and they are, I read his comment effortlessly.)


Read the title as “handle unnatural scrambled TAX” and was suddenly very interested in how it might do my taxes for me.

Hell, I would pay good money for a robo-accountant.


I’m confident someone already did it, but honestly I would not trust an LLM with numbers (as important as money and specifically taxes)


Would you pay for a robot accountant if you were still the one liable come audit time?


Well, I know very little about money matters, so mostly I'd be interested in coming up with a financial strategy by chatting with an LLM. If it could guide me through some sort of predetermined decision tree that some money whizzes came up with, I'd trust it.

I'd also trust an LLM equal to or smarter than GPT-4 with a first draft of my taxes, which I would then go through myself – there are a bunch of patterns to prompt an LLM, like "try to find any flaws in this output".


I personally would be more hesitant. I have seen the results of GPT-4 with numbers (though somewhat better than earlier versions and other foundation models, it still has trouble reasoning through options involving anything more than shallow arithmetic). Even with Q* I would be skeptical about its accuracy for the depth of calculations involved.

Then there's the annually changing tax code that has been made intentionally complex, and the training data is surely full of explanations based on outdated details. Maybe with significant fine-tuning on the most up-to-date tax code, and/or putting it in the preface of the context, that could be somewhat nullified.

Even for a financial strategy, other than the very high-level hedging across different asset classes, some basics of estate management, and some strategies like bond ladders and periodic redistribution of stock holdings, an LLM isn't going to be very useful. And those high-level strategies are shared broadly; you can definitely learn them without referring to an LLM.

Also, for some of the same reasons I wouldn't ask my financial advisor which stocks to pick, I wouldn't expect an LLM to give me good answers on a specific active portfolio: the notion "buy the rumor, sell the news" persists because it's not a half bad strategy, and even a current-up-to-the-moment model would be chasing the tail end (although, I suppose, if it were able to take advantage of information shared by many other users' prompts, it could benefit from more than just the news cycle). Predicting the shape of fluent prose does not directly map to predicting the shape of market activity, even if it has internalized some kind of Mean Field Theory to help approximate it functionally. I'll admit I would be curious what it said, though.

Don't get me wrong, I do like LLMs for many tasks, just not for taxes or financial strategy. I wouldn't fault someone for doing it but I would want to inform of the above to anyone considering it, even with a more competent or super-intelligent LLM. Especially if I'd be the one getting audited! btw, I am not a licensed CPA and the above is not financial advice.


I think there is probably some "expected/predicted output vs. actual output" matching going on internally, like how the brain handles sense data.

Somewhat similar, https://vgel.me/posts/tools-not-needed/ (GPT-3 will ignore tools when it disagrees with them)


FWIW this is before OpenAI fine-tuned their models for native function calling and exposed it in the API. Their current models (even 3.5 Turbo) should be much better at this.


The problem in that article wasn't the ability to make the right function calls though. It was that it would make the call, get the result and... potentially ignore the result.


I made a bookmarklet that scrambles text while still keeping it human-readable, with the idea that the greater effort to read might slow the reader and improve retention. I wonder if the same would apply to GPT-4. It is a testable hypothesis.

https://locserendipity.com/Scramble.html


In an attempt to make better use of the context window, I tested GPT-4 with Huffman encoding, both giving it an existing corpus and encoding, and asking it to generate the encoding itself. It failed at both tasks, which convinced me it has no understanding of the underlying data and procedures, even though it could generate convincing-looking codes.


I've had the GPT-4 API perform translations in my own project (shameless plug: http://www.securday.com, a natural language network scanner) and it required no code changes (I am using LangChain). I was going to add the feature, but then I decided to test it and was surprised it just worked.


Yes, but can it unscramble scrambled text in an obscure language like Basque or Frisian just as well? Or perhaps Voynich? https://www.ic.unicamp.br/~stolfi/voynich/98-12-28-interln16...


“Just a token predictor…” These things are absolutely working at a concept-level. Tokens are just the interface.


What you will notice is that in every comment section where an LLM can be easily accused of "just being a token predictor" dozens of people will make the accusation like it's just the best accusation since sliced bread.

But in a comment section like this, where all those people should be saying, "hey, wait a minute, maybe not...", nothing.


Do you need an LLM to do this? How much do a word or language model is necessary to do the unscrambling?

Could you simply train a raw network on a bunch of scrambled and unscrambled text pairs that are representative of the English language? Something with a few hundred million parameters or less?


Or at the very least would you have to finetune some 3B or 7B model to do this?

I want to create some locally hosted model that can do this in real time with a low memory footprint

Let people mash the keyboard at 2x-3x their typing speed and then unscramble it for them in real time, to enable typing at 200-300 wpm.


Contextual autocorrect has been on phones for a decade. It's still far from perfect. Many ambiguities are not trivial, even for an LLM


Mistral-7B does this reliably with no special finetuning, while being 1000x smaller than GPT-4.


Yep. Sometimes I'll OCR a PDF and it comes out like trash and I can ask GPT to clean it up.


Is this... good? I'd think the desired behavior would be to notice that there was something wrong with the input.


My early attempts at uncensoring the models sort of stumbled into this as well. I told the models to produce output heavily using the transposed letter effect. I would also have it remove random letters in words longer than 4 letters. The end result was a broken, readable, but surprisingly consistent output.

I wrote a little script that would apply a regex to scramble words and remove letters to ask questions. The models could almost always use this input which was crazy to me. Made for some fun time having uncensored conversations in complete nonsense with the models.


Has anyone tried to see if it could crack enigma encoded messages?


I tried, but didn't succeed. Even when I said it was Enigma-encoded, it said it couldn't do it without additional information and suggested I use an online Enigma decrypter.


Encryption works by mathematically not being predictable, whereas LLMs operate on predictable data.


The enigma was a substitution cypher, not encryption.


Using a cipher is the definition of encryption:

In cryptography, ciphertext or cyphertext is the result of encryption performed on plaintext using an algorithm, called a cipher.

https://en.m.wikipedia.org/wiki/Ciphertext


The word "Please" in the prompt.


Interesting paper. I just tried it myself. I first had GPT-4 write a Python program to randomly reorder the letters in each word of three sentences I took from the New York Times website. I then started a new GPT-4 session and used the same prompt as in the paper to get it to unscramble the words. As you can see below, it made only two mistakes, both hallucinatory: 2015 --> 0521 --> 2021 and pepitos --> epsotip --> opposite.
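
For reference, the scrambler itself is only a few lines (a sketch, not the exact program GPT-4 produced):

    # Sketch: shuffle the letters within each whitespace-separated word,
    # keeping the word order intact (punctuation gets shuffled along with letters).
    import random

    def scramble_words(sentence):
        out = []
        for word in sentence.split():
            letters = list(word)
            random.shuffle(letters)
            out.append("".join(letters))
        return " ".join(out)

    print(scramble_words("Elon Musk celebrated his 44th birthday in July 2015"))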

Prompt: The following sentence contains words with scrambled letters. Please recover the original sentence from it.

Scrambled: loEn kusM teceedblra ihs th44 hytirbad in Juyl 0521 ta a ahte-dyer pryat wrnoht by sih wfei at a aCinafliro iwen otrcnyu orsret tdetod hwti nsic.ab

Unscrambled: Elon Musk celebrated his 44th birthday in July 2021 at a three-day party thrown by his wife at a Californian wine country resort dotted with cabins.

[Original: Elon Musk celebrated his 44th birthday in July 2015 at a three-day party thrown by his wife at a California wine country resort dotted with cabins.]

Scrambled: sA aeIlsr aebdste wath to do enxt ni zGaa, I pheo rsaselI’ ipamiilrtcayl-otil rdaiehsple liwl elrfcet on hte dagea enotf daiuetttbr to :csCuoufin rfeoe“B ouy rbkmea no a enroujy of ,nreeevg dgi wto easv”rg — eno fro yoru eeymn nad eon rof yseorfl.u

Unscrambled: As Israel debates what to do next in Gaza, I hope Israel’s militarily-political leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.

[Original: As Israel debates what to do next in Gaza, I hope Israel’s political-military leadership will reflect on the adage often attributed to Confucius: “Before you embark on a journey of revenge, dig two graves” — one for your enemy and one for yourself.]

Scrambled: ndUre na edletave wuaysb rctka ni ,eQunes oiVrct éJso enHáderzn was pwhnipgi up eht epsotip that he adh rcfedteep ta a etestr tarc in aCsra,ca nazVue.lee

Unscrambled: Under an elevated subway track in Queens, Victor José Hernández was whipping up the opposite that he had perfected at a street cart in Caracas, Venezuela.

[Original: Under an elevated subway track in Queens, Victor José Hernández was whipping up the pepitos that he had perfected at a street cart in Caracas, Venezuela.]




