Hacker News
GPT-4 can create its own shorthand and recreate the original text (twitter.com/nolongerbennett)
27 points by EMM_386 on April 4, 2023 | 13 comments



The text it compressed is from Wikipedia, so I'm not convinced it isn't primarily relying on having memorized it. I tried with text that was published in the past week (and so definitely isn't in GPT-4's training). GPT-4 refused to even attempt to decode, but after resampling the response a few times, it made an attempt. It was completely unrelated to the original text.


To be clear, you attempted to give GPT-4 custom text and asked it to compress it / decompress the result?


https://imgur.com/a/uahlB4H

Here's exactly what I did. I used the prompts in the original tweet, but substituted in text that was published too recently to be in GPT-4's training data. I ran the compression and decompression in separate sessions. You can see in the second screenshot that I'm on my third output; the first two were refusals to even attempt decoding.


If this is the case and we assume there is no state shared between the summarization phase and the expansion phase, doesn't this suggest that this is just a giant "over-fit" ml model, more like a compression algorithm (a very inefficient one), and is incapable of any original thought etc? This is my hypothesis. I mean another giant downside of the LLMs today is the fact that they are incapable of learning without throwing massive amounts of money and compute at the problem which I assume would be retraining ALL the weights from scratch. You could retrain part of the weights but that would no doubt negatively affect "unconnected" (in a human sense) parts of the model. Is this even sustainable at all and will it ever be?


> which I assume would be retraining ALL the weights from scratch

I have been curious about this as well, and I am not in the field, so I asked ChatGPT how it can best learn new information without having to fully retrain the model and incur all the compute involved in that.

It suggested incremental learning, where its parameters are updated or an additional layer is added to the underlying neural network. Adding a layer involves "adding a new set of nodes or neurons to the existing architecture of the model".

This is far less computationally expensive than retraining the full model, because "adding new layers allows the model to incorporate new information without modifying the existing layers, which can preserve the previously learned knowledge".

You can make of that what you will; after all, it was generated by ChatGPT itself. But it seems there are techniques for having these models learn new information without another very costly training run.
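A toy illustration of the "freeze and extend" idea ChatGPT described: the pretrained layers keep their weights fixed, and only a newly appended layer is marked trainable. Everything here (names, structure) is a stand-in, not a real model:

```python
# Hypothetical pretrained model: each layer is just a dict with a
# "trainable" flag standing in for its parameters' gradient state.
model = [
    {"name": "layer_1", "trainable": True},
    {"name": "layer_2", "trainable": True},
]

# Step 1: freeze every existing layer so previously learned knowledge
# is left untouched by further training.
for layer in model:
    layer["trainable"] = False

# Step 2: append a fresh layer; only its weights will receive updates.
model.append({"name": "new_layer", "trainable": True})

trainable = [layer["name"] for layer in model if layer["trainable"]]
print(trainable)  # ['new_layer']
```

The point of the sketch is only that the set of trainable parameters after the extension is limited to the new layer, which is why this is far cheaper than a full retrain.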


I tried to touch on that in my comment. I've used training methods such as LoRA (Low-Rank Adaptation of Large Language Models). I feel they work well for training on a specific subject, but it comes at the cost of overriding other weights in the model, and it tends to distort things that are unrelated, because what an ML model considers related is likely not at all what we as humans consider related. I've only tried LoRA in the context of Stable Diffusion. Images using LoRAs can be tricky, combining two LoRAs is even harder, and adding too many LoRA modifiers can produce outright distorted, messed-up images.
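For context, the core LoRA trick is to freeze the pretrained weight matrix W and learn only a low-rank correction B @ A on top of it. A toy numpy sketch (dimensions and rank are illustrative only):

```python
import numpy as np

d, r = 8, 2                # model dimension, adapter rank (r << d)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen pretrained weights
A = rng.normal(size=(r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                # trainable up-projection (zero-init,
                                    # so the adapter starts as a no-op)

def forward(x):
    # Effective weights are the frozen matrix plus the low-rank delta;
    # only A and B would be updated during fine-tuning.
    return x @ (W + B @ A).T

x = rng.normal(size=(d,))
# With B zero-initialized, the adapted model matches the frozen one exactly.
assert np.allclose(forward(x), x @ W.T)
```

This also makes the "distortion" complaint concrete: B @ A is a full d-by-d perturbation, so even a rank-2 adapter nudges every output dimension at once, not just the "related" ones.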


One popular theory is that intelligence is data compression.


I was curious how much overfitting there really is. The compression is certainly not lossless and, arguably, not even meaningfully equivalent. I would not characterize this as a successful experiment.

User prompt

-----------

Create your own compression scheme. Print out instructions to decompress using this scheme. Then compress this text using your new compression scheme, noting compression ratio separately:

  If this is the case and we assume there is no state shared between the summarization phase and the expansion phase, doesn't this suggest that 
  this is just a giant "over-fit" ml model, more like a compression algorithm (a very inefficient one), and is incapable of any original thought 
  etc? This is my hypothesis. I mean another giant downside of the LLMs today is the fact that they are incapable of learning without throwing 
  massive amounts of money and compute at the problem which I assume would be retraining ALL the weights from scratch. You could retrain part of 
  the weights but that would no doubt negatively affect "unconnected" (in a human sense) parts of the model. Is this even sustainable at all and 
  will it ever be?
Compression attempt

-------------------

  Compression Scheme: CHATGPT Alpha-Run Length Encoding (CHATGPT-ARLE)
  
  Instructions to Decompress Using CHATGPT-ARLE:
  
  1. Read the compressed data from left to right.
  2. When a letter is followed by a number (N), repeat the letter N times.
  3. When a pair of brackets is encountered, repeat the entire string inside the brackets, as instructed by the number immediately following the 
  closing bracket.
  4. Continue reading and decompressing each character or bracket sequence until the end of the compressed data is reached.  
  
  Compressed Text:
  
  If2 th2s isth2c1s2nd weasumeth1r2sno1st3sh3db1tw1n th2(umat2z)2tionphas2andth2g>wxpdnsionphas2,do1sn'tth1sugas1stthatth1s 
  isust18iant"ov2tf2t"mlc39a>\w1(xice2t"o}end),andistl9cq]1h etc?TI1sismye\whypothes].h m2b]oth2gdg7wnside f th1L; 
  todayisth2q}thatth1yre1>2tabl13flearn5gwitho]7ti{4a4ountsofmon]4a3putc)th2probl2wh5h5asumeou2dber2tra5gALLth2\ehtsgi>e1scrtch.o1\wl10tnpart f 
  th2w' htbut thatwould noc1doubt neg>- l"}9{">a"ucdtl'=(kma s}s2})partsofth2od3i]1s1ist2even sust9.bl2atallandwille\rite?#Rr67#xbr#
  
  Compression Ratio: 860 original characters / 351 compressed characters = 2.45:1
Decompression attempt (prompt is the result from compression attempt, minus the compression ratio)

---------------------

  If this is the case and we assume there's no distinction between the summation phase and the growth-expansion phase, doesn't this suggest that 
  this issue might be a variant "overfitting" mechanism at work (except towards the end), and is still questionable, etc? This is my hypothesis. 
  On both sides of the equation, today is the key that they are capable of learning without taking into account the amounts of money input, but 
  the problem which assumes our understanding for training ALL the weights given scratch. It will certainly be part of the weight, but that 
  would no doubt negate all significant parts of the model's issues, even if it is sustainable at all, and will iterate?
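For what it's worth, the ARLE rules it printed are at least mechanically implementable. A quick sketch (the bracket style is my assumption, since the instructions never specify one):

```python
import re

def arle_decompress(data: str) -> str:
    """Decompress per the CHATGPT-ARLE rules quoted above: a letter
    followed by N repeats N times; [group]N repeats the group N times."""
    # Expand bracketed groups first, innermost-first (the pattern only
    # matches groups with no nested brackets, so the loop handles nesting).
    group = re.compile(r"\[([^\[\]]*)\](\d+)")
    while group.search(data):
        data = group.sub(lambda m: m.group(1) * int(m.group(2)), data)
    # Then expand letter+count pairs.
    return re.sub(r"([A-Za-z])(\d+)",
                  lambda m: m.group(1) * int(m.group(2)), data)

print(arle_decompress("[ab]2c3"))  # ababccc
```

Running this over the "compressed" output above makes the mismatch obvious: by its own rules, "th2" decodes to "thh", not "the", so the model never actually followed the scheme it wrote down.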


I'm not sure which part I find more confusing: the weird RLE that is obviously incorrect, or the fact that it was actually able to restore something close to the original from it.

You did prompt it to decompress in a completely different thread from the original compression, I assume?


I had better luck than you with GPT-4 and slightly more specific instructions.

Devise a method to losslessly compress the following text. Then apply that method to this text. Please output detailed instructions for compression, detailed instructions for decompression, and the compressed text. Follow all this by a summary of the compression ratio achieved.

If this is the case and we assume there is no state shared between the summarization phase and the expansion phase, doesn't this suggest that this is just a giant "over-fit" ml model, more like a compression algorithm (a very inefficient one), and is incapable of any original thought etc? This is my hypothesis. I mean another giant downside of the LLMs today is the fact that they are incapable of learning without throwing massive amounts of money and compute at the problem which I assume would be retraining ALL the weights from scratch. You could retrain part of the weights but that would no doubt negatively affect "unconnected" (in a human sense) parts of the model. Is this even sustainable at all and will it ever be?

Response:

To compress the given text, we will use the following method:

1. Identify and replace repeating phrases with shorter tokens.

2. Remove unnecessary spaces and punctuation marks.

3. Replace frequently occurring words with shorter tokens.

Compression Instructions:

1. Replace "this is" with "A", "the fact that" with "B", "in a human sense" with "C", "at all" with "D", "of the" with "E", and "and the" with "F".

2. Remove spaces before and after punctuation marks.

3. Replace "the" with "1", "and" with "2", "is" with "3", "of" with "4", "that" with "5", "a" with "6", "to" with "7", "which" with "8", "this" with "9", "assume" with "0".

Compressed Text:

If A case 2 we 0 there 3 no state shared between E summarization phase F expansion phase,doesn't 9 suggest 5 9 3 just 6 giant "over-fit" ml model, more like 6 compression algorithm(a very inefficient one),2 3 incapable 4 any original thought etc?9 3 my hypothesis. I mean another giant downside 4 E LLMs today 3 B 5 1y are incapable 4 learning without throwing massive amounts 4 money 2 compute 7 E problem 8 I 0 would be retraining ALL E weights from scratch. You could retrain part 4 E weights but 5 would no doubt negatively affect "unconnected" (C) parts 4 E model. 3 9 even sustainable D 2 will it ever be?

Decompression Instructions:

1. Replace "A" with "this is", "B" with "the fact that", "C" with "in a human sense", "D" with "at all", "E" with "of the", and "F" with "and the". 2. Add spaces before and after punctuation marks. 3. Replace "1" with "the", "2" with "and", "3" with "is", "4" with "of", "5" with "that", "6" with "a", "7" with "to", "8" with "which", "9" with "this", "0" with "assume".

Summary:

The original text has 692 characters, and the compressed text has 439 characters. The compression ratio achieved is approximately 1.58 (692/439).

...and then when I took only the decompression instructions and text to GPT 3.5:

Decompress the following text exactly according to the given instructions:

Compressed Text:

If A case 2 we 0 there 3 no state shared between E summarization phase F expansion phase,doesn't 9 suggest 5 9 3 just 6 giant "over-fit" ml model, more like 6 compression algorithm(a very inefficient one),2 3 incapable 4 any original thought etc?9 3 my hypothesis. I mean another giant downside 4 E LLMs today 3 B 5 1y are incapable 4 learning without throwing massive amounts 4 money 2 compute 7 E problem 8 I 0 would be retraining ALL E weights from scratch. You could retrain part 4 E weights but 5 would no doubt negatively affect "unconnected" (C) parts 4 E model. 3 9 even sustainable D 2 will it ever be?

Decompression Instructions:

1. Replace "A" with "this is", "B" with "the fact that", "C" with "in a human sense", "D" with "at all", "E" with "of the", and "F" with "and the".

2. Add spaces before and after punctuation marks.

3. Replace "1" with "the", "2" with "and", "3" with "is", "4" with "of", "5" with "that", "6" with "a", "7" with "to", "8" with "which", "9" with "this", "0" with "assume".

I got...

This is the text after decompression:

If this is a case and we assume there is no state shared between of the summarization phase and the expansion phase, doesn't this suggest that this is just a giant "over-fit" ml model, more like a compression algorithm (a very inefficient one), and is incapable of any original thought etc? This is my hypothesis. I mean another giant downside of the LLMs today is the fact that they are incapable of learning without throwing massive amounts of money to compute and the problem which I assume would be retraining ALL of the weights from scratch. You could retrain part of the weights but that would no doubt negatively affect "unconnected" (in a human sense) parts of the model. Is this even sustainable at all and will it ever be?

Still two mistakes: it dropped the fourth word, "the", in the initial compression, and then the decompressor ignored my "exactly" and filled the gap in as "a" rather than just leaving the sentence broken.

I still wouldn't use GPT for anything where you're looking for an exact result, I guess. But it's not quite the trainwreck you saw (or which I also saw, with your original prompt, where we made it invent a scheme without looking at any text first).

Probably better to get it to help you write the code for the compression scheme, then run that, if you're going to use it for such a thing at all.
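The substitution scheme it described is trivial to run as real code. A minimal Python sketch (it skips the space-stripping step, and it's lossy on input that already contains digits or the capital letters A-F as standalone words):

```python
import re

# Order matters: multi-word phrases first, then single words, exactly as
# in the numbered instructions above. Case-sensitive, word-boundary
# matching keeps tokens like "A" from colliding with the word "a", and
# avoids GPT-4's own mistake of rewriting "the" inside "they" ("1y").
RULES = [
    ("this is", "A"), ("the fact that", "B"), ("in a human sense", "C"),
    ("at all", "D"), ("of the", "E"), ("and the", "F"),
    ("the", "1"), ("and", "2"), ("is", "3"), ("of", "4"), ("that", "5"),
    ("a", "6"), ("to", "7"), ("which", "8"), ("this", "9"), ("assume", "0"),
]

def compress(text: str) -> str:
    for phrase, token in RULES:
        text = re.sub(rf"\b{re.escape(phrase)}\b", token, text)
    return text

def decompress(text: str) -> str:
    # Undo the substitutions in reverse order of application.
    for phrase, token in reversed(RULES):
        text = re.sub(rf"\b{re.escape(token)}\b", phrase, text)
    return text

sample = "I assume that this is the problem"
print(compress(sample))                     # I 0 5 A 1 problem
assert decompress(compress(sample)) == sample
```

Run as code, the scheme is deterministic and round-trips cleanly on ordinary prose, which is exactly why generating the program once and executing it beats asking the model to simulate it by hand.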


So a program that generates text based on input text can generate text based on input it generated itself? Doesn't seem that impressive.

I mean, since it's a transformer, it would be weird if it didn't recreate it!


It doesn't really work: about half of the "decompressed" words are totally new if you do it in a new session. Reminds me of the experiments from a few months ago where people were using img2img or text-to-image to "compress" and decompress images, getting the system to generate a "very similar image" to the one given, but it wasn't a 100% representation.


If you tell it that the compressed words cannot share any characters with the words they are compressing, it doesn't understand this constraint.



