Mass editing memory in a transformer (baulab.info)
142 points by rodoxcasta on April 21, 2023 | 51 comments



It's interesting to see all this hard work being done specifically for "fact-fixing" inside neural networks, whereas I think the future is probably having two models: one for language processing (grammar, etc.) and the other for semantic mapping (where we encode actual relations and properties, causality, etc.). To wit, unless you squint really really hard, this is not exactly true:

> Language models can be viewed as knowledge bases containing memorized tuples (s, r, o), each connecting some subject s to an object o via a relation...

LLMs don't have the concept of objects or relationships. You might be able to argue some of that ends up being encoded in the embeddings (especially if they're particularly big), but I would posit that those embeddings mostly end up handling the grammar. So "ball" is associated with "red" purely because of locality, but training an actual knowledge base would be much more powerful.


Do you actually read any literature or are you just saying things for fun? For instance, there's the work on LLMs "are" knowledge graphs, wherein entire knowledge graphs are extracted from the LLM; the work on ROME, which shows that much of the relational knowledge lives in the MLP sections of the NN; or the larger body of work wherein KGs are used in conjunction with and trained alongside the LLM (typically not improving the results beyond today's sota), etc. So your statement that it would be more powerful doesn't necessarily stand up in practice, but it is a nice theory. Ultimately it's in the details, and it's probably possible to mix explicit "knowledge" with LLMs, but the contextualization of KGs is extremely hard in that scenario, and is handled within the LLM anyway.


As someone with little knowledge about this field, I am curious what "ROME" and "sota" refer to.


Sota is “state of the art”. ROME (Rank-One Model Editing) seems to be the technique previously introduced by the authors of the paper in the post.
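
For the curious, here's a minimal NumPy sketch of the kind of rank-one update ROME performs on a single MLP projection weight (the variable names are illustrative, and the real method also locates the right layer and derives k*, v*, and C from the model and the prompt):

    import numpy as np

    def rank_one_edit(W, k_star, v_star, C):
        # W: (d_out, d_in) MLP projection weight, viewed as a linear associative memory
        # k_star: key vector representing the edited subject
        # v_star: value vector encoding the new object/fact
        # C: covariance of keys, E[k k^T], estimated from representative text
        c_inv_k = np.linalg.solve(C, k_star)      # C^{-1} k*
        residual = v_star - W @ k_star            # what the memory must now return
        return W + np.outer(residual, c_inv_k) / (c_inv_k @ k_star)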


> GPT-3 predicts: Arneb is in the constellation of Aquila (incorrect - should be Lepus)

> GPT-3 predicts: The current Vice President of the United States is named Mike Pence (obsolete)

These are qualitatively different things though.

Facts that are simply incorrect make sense to target and directly modify, but obsolescence is a property of a fact: the subject transitions, the vice president is no longer current but was; it has a temporal property... I don't know if LLMs can separately abstract that information from the subject in a way that is targetable. If they can't, updating obsolete info feels like a perpetual task that grows in proportion to the breadth of learned information, whereas correcting facts that were always incorrect is proportional to the rate of additional learned knowledge multiplied by its accuracy.

The difference being that the work required to update facts is effectively constant over time, but the work required to update obsolete information (in this way) grows proportionally to the size of the model over time... assuming it makes sense to grow LLMs.


I reckon LLMs can learn "temporal facts" pretty well, if you just give them the date information for any document they are reading / training on.

If you just give them 50 years of news articles in a random order, sure they are going to be confused.


That example is wild. But I’m still pretty awed by the fact that we make similar verbal mistakes.

The temporal reasoning in these models is getting better than me. As a non-AI model, I notice this every single morning while I have my covefe while heeding the latest on slacker news.


I'm still not convinced they are capable of temporal reasoning.

I've asked it temporal questions before but without explicitly mentioning the temporal nature... the answers tend to contradict themselves if it hasn't already seen the question before (even when querying general knowledge), until you point out the temporal component, and even then it trips up and cannot build upon this reasoning in my tests.

I suspect a large component of the interesting responses we see, where it appears to be doing logical reasoning beyond language, is due to statistical correlation in language, because of the sheer, inhuman quantity of linguistic knowledge it has effectively encoded. The problem with this is that it can't reason about new things (because it can't actually reason - much), which makes it appear smarter than it is. That is IMO the largest danger in applied ML today, especially to those less familiar with its limitations: it looks like magic, and people start mandating it be used for sensitive things.


It's good at temporal reasoning, and causality is baked in. I spent a lot of time asking GPT to tell me what is happening at the current moment of a story, and it always responds with a causal representation, where humans might tend to be more visual etc. Remember, time is not real anyway; we just have a bunch of codependent stuff happening, so GPT gets it. What it lacks is just memory and experience and some other things to showcase the ability better. I think it's the training on code more than language that gave it logical reasoning. Humans are logical sometimes, but our code really is the summit of our logic.

Anyway regardless of how inherently good they are at temporal reasoning I think a secondary module explicitly for reasoning will come around soon. I believe in the brain some neurons organize into hexagons or other geometries to better capture logic, maths, etc. The LLM basically needs some rigidity in it if we don't want fuzzy outputs.

And the largest danger is not people getting lazy and letting the LLM do it. That kind of danger is really long-term, globalization-type danger. Short term we've got much more to worry about.


> and it always responds with a causal representation.

It responds with a language representation. It uses "causal" words because that's how the English language works: we have tenses.

> I think a secondary module explicitly for reasoning will come around soon.

This has been an unsolved, actively-researched problem for ages – certainly since before you were born. I doubt very much that a solution will "come around soon"; and even if it does, integrating the solution into a GPT-based system would be a second unsolved problem – though probably a much easier (and more pointless) one. If you have any ideas, I invite you to pursue them, after a quick literature search.


It describes the present moment as a series of causal events. Like event x led to y which led to z. Doesn't matter if you ask it for English or code or to not use any tenses, those conditions don't affect its baseline understanding. I might be missing your point though.

For the second thing: I think from any point in history saying "coming soon", well, the current moment is the most accurate time to say it. And especially with events x and y and ChatGPT right behind us. ChatGPT has basically been a problem since before I was born too, but stating as much a few months ago would have been just as pessimistic as the statement you made. Only because I think the LLM hallucination problem may be simple. But it's only a hunch, based on our wetware.


> Like event x led to y which led to z.

Grammar parsers have been able to do this since the 90s. There is no reason to believe that it's not just a slightly-fancier grammar parser: the kinds of errors it makes are those you'd expect from a pre-biased stochastic grammar parser.

> But it's only a hunch, based on our wetware.

Our "wetware" fundamentally does not work like a GPT model. We don't build sentences as a stream of tokens. (Most people describe a "train of thought", and we have reason to believe there's even more going on than is subjectively accessible.) ChatGPT does not present any kind of progress towards the reasoning problem. It is an expensive toy, built using a (2017, based on 1992) technology that represented progress towards better compression algorithms, and provided some techniques useful for computational linguistics and machine translation. The only technological advance it represents is "hey, we threw a load of money at this!".

The "LLM hallucination problem" is not simple. It's as fundamental as the AI-upscaler hallucination problem. There is no difference between a GPT model's "wow amazing" and its "hallucinations": eliminate one, and you eliminate the other.

These technologies are useful and interesting, but they don't do what they don't do. If you try to use them to do something they can't, bad things will happen. (The greatest impact will probably not be on the decision-makers.)

> well the current moment is the most accurate time to say it.

This is true of every event that is expected to happen in the future.


The take that it's a sophisticated grammar parser is fine. Could be, lol. But when it is better than humans, the definitions can just get tossed as usage changes. You can't deny its impact (or you can, but it's a bit intellectually dishonest to just call it old tech with money and nothin' special, based on impact alone). But that's your experience, so it's fine.

For the stuff about it being a hard problem, now I know you aren't expressly making a false equivocation, right? But I did say simple, not easy. You are saying hard, not complex.

I think there's too much digression here. You're clearly smart and knowledgeable but think LLMs are overrated, fine.

And yes, I know it's always the best time to say it; that's the point of a glass half full, some sugar in the tea, or anything else nice.


(It's not just a grammar parser, for the record: that was imprecise of me. The best description of the thing is the thing itself. But, when considering those properties, that's sufficient.)

> But when it is better than humans, the definitions can just get tossed as usage changes.

I'm not sure what this means. We have the habit of formally specifying a problem, solving that specification, then realising that we haven't actually solved the original problem. Remember Deep Blue? (We could usually figure this out in advance – and usually, somebody does, but they're not listened to.) ChatGPT is just the latest in a long line.

> You are saying hard not complex.

Because reasoning is simple. Mathematical reasoning can be described in, what, two-dozen axioms? And scientists are making pretty good progress at describing large chunks of reality mathematically. Heck, we even have languages like (formal dialects of) Lojban, and algorithms to translate many natural languages into it (woo! transformers!).

… Except our current, simple reasoning algorithms are computationally-intractable. Reasoning becomes a hard problem with a complex solution if you want it to run fast: you have to start considering special-cases individually. We haven't got algorithms for all the special-cases, and those special-cases can look quite different. (Look at some heuristic algorithms for NP-hard problems if you want to see what I mean.)

> but think LLMs are overrated,

I think they're not rated. People look at the marketing copy and the hype, have a cursory play with OpenAI's ChatGPT or GPT4, and go "hey, it does what they say it can!" (even though it can't). Most discussion seems to be about that idea, rather than the thing that actually exists (transformer models, but BIG). … but others in this thread seem to be actually discussing transformers, so I'll stop yelling at clouds.


When you explicitly instruct it that its knowledge of current affairs is dated to 2021, inject documents, and provide clear instructions that the documents are correct about current affairs, etc. etc., it works like a charm.

Adds ~3-5 seconds of latency so I have a switch to turn it on and off, for now
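
A rough sketch of what that prompt assembly might look like (the wording and function name here are my own, nothing standard):

    def build_prompt(question, documents, use_retrieval=True):
        # use_retrieval is the "switch": skip the documents (and the latency) when off
        if not use_retrieval:
            return question
        context = "\n\n".join(documents)
        return (
            "Your built-in knowledge of current affairs ends in 2021.\n"
            "The documents below are newer and should be treated as correct:\n\n"
            + context
            + "\n\nQuestion: " + question
        )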


Very insightful! I agree that the models on their own may truly be simulating awareness of certain concepts, and not "directly" modeling them. It's hard to know, but these kinds of experiments do make it clear that there are profound limits to the current systems. For instance, even if they are superhuman polyglot poets they may not have any model of "time" (ironically, the most commonly used word in many languages).


I can’t really argue with that, good line of thought.

See, my reaction has been, “perhaps our reasoning and actions are pretty much just a biologically-encoded statistical model too, it just doesn’t _feel_ that way because of some other factor.”


When my wife tells me “you should call your mother”, I don’t think her brain assigned probabilities to “you should call your TV”, “you should call your xylophone”, “you should call your airplane”, etc, and then chose a suitable high-probability word (“mother”).


Talk to someone with aphasia or another brain disorder, or talk to a neurologist about it, and you'll see that's not quite the case. There really does seem to be a probability of saying "TV" when you meant to say "your mother"; it's just that the language centers of our brains don't consciously calculate those probabilities and present them to our conscious mind.


Would the natural analogy of “tokens” be “words”, or something more like, “portion of mouth-movement”?


When I break down written phonemes for tokenization purposes, I imagine something more like someone speaking another language.

As a native English speaker who grew up around code-switching Spanish speakers, I often heard speech that sounded really fast.

But I was hearing “hay-un-a-al-pa-ca-per-si-gui-en-do-un tren-de-car-ga”

I heard that as individual tokens, while my friends simply heard “¡There is an alpaca chasing that freight train!”


>I've asked it temporal questions before but without explicitly mentioning the temporal nature... the answers tend to contradict themselves if they haven't already seen the question before

GPT-3 or 4 ?


GPT-3.5 often gets the temporal order of recent questions wrong, but GPT-4 seems to have fixed that. I just tested that on both.


GPT-4 is so much better than any other released version that there is no point in trying to understand capabilities on other models.

tfa's model editing tho seems like a generic capability.


Do you have examples? Not that I'm doubting you, I just want to play with this myself.


Here are some simple tests I ran on ChatGPT (not GPT4):

Q: "Who was elected first, President Trump or President Lincoln? Describe your reasoning."

A: "President Lincoln was elected first, not President Trump.

Abraham Lincoln was elected as the 16th President of the United States in 1860. He served as President from March 1861 until his assassination in April 1865.

Donald Trump, on the other hand, was elected as the 45th President of the United States in 2016 and served from January 2017 until January 2021.

So, based on the historical record and the order of their respective presidencies, it is clear that President Lincoln was elected first."

Q: "Which happened first, Pearl Harbor or the Battle of Waterloo?"

A: "The Battle of Waterloo happened first, not Pearl Harbor.

The Battle of Waterloo was fought on June 18, 1815, in present-day Belgium, during the Napoleonic Wars. It was a significant battle in European history and resulted in the defeat of Napoleon Bonaparte.

Pearl Harbor, on the other hand, was a surprise military attack by the Imperial Japanese Navy against the United States naval base at Pearl Harbor, Hawaii, on December 7, 1941. It led to the United States' entry into World War II.

Therefore, based on the historical record, the Battle of Waterloo took place many years before the attack on Pearl Harbor."

Q: "Would someone who died in the Civil War die before someone who died in World War Two? Explain your reasoning."

A: "Yes, someone who died in the Civil War would have died before someone who died in World War II.

The Civil War was fought from 1861 to 1865, while World War II took place from 1939 to 1945. Therefore, the Civil War occurred many years before World War II.

As a result, anyone who died in the Civil War would have passed away before anyone who died in World War II. This is simply because the Civil War took place earlier in history than World War II.

It's worth noting that the exact dates of death for individuals who died in either war could vary widely, depending on when and where they were serving. However, in general, the Civil War took place before World War II, so anyone who died in the former would have passed away before anyone who died in the latter."

It looks like it's perfectly capable of at least simple temporal reasoning to me.


It's interesting how, having given a good answer to the last question in the penultimate paragraph, it then goes off the rails with

It's worth noting that the exact dates of death for individuals who died in either war could vary widely, depending on when and where they were serving. However, in general, the Civil War took place before World War II...

Here, "could vary widely" is IMO nonsense given that we're talking about a 5-6 year window for either war. Also, the "in general" bit is just weird.

I wonder if this is an artefact of how ChatGPT has been trained for being inoffensive and not opinionated.


> qualitatively different things though

Yes. The discipline is called "Belief Change Theory". The first is called "revision", the second "update".


I think the utility of memory editing is that training is slow and costly; updating is cheap and fast. Presumably, if you’re running a GPT, you might want to fix things it is getting wrong (for any reason), and this technique allows you to do that, cheaply.


But training to update a single fact isn't slow and costly. Finetuning for just a few facts could be done in a matter of seconds with just a few text examples of the correct answer.


No, finetuning for just a few facts could indeed end up being very costly. If you have 5 new examples that you want to fine-tune your model on, you probably won't fine-tune your existing model for 3000 training steps on just those 5 new examples. You'll either mix in other data to prevent catastrophic forgetting, or you'll probably train from scratch after fixing your dataset to reflect the 5 new examples you have.


You'll probably finetune for one step for each of the 5 examples. Choose the learning rate carefully to get the results you want without much forgetting.

Total time is mere seconds.

If you save the Adam optimizer parameters from previous runs, you'll do even better at preventing forgetting.
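
A minimal sketch of that kind of few-step update (the model name, learning rate, and example text are placeholders, not anything from this thread):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR to limit forgetting

    examples = ["The current Vice President of the United States is Kamala Harris."]
    for text in examples:                                # roughly one step per corrected fact
        batch = tok(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()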


It uses up limited prompt space, though.


Finetuning uses a limited number of examples and gradient descent to modify network weights, so it doesn't need to modify the prompt. Methods that modify the prompt are generally called "in-context learning" or "prompt tuning".


"Fine tuning" has nothing to do with prompting.


Lots of LLM services already ask for user feedback to "improve the model" and I imagine this would make that feedback even more useful.


I wonder if these improvements in memory alteration will make it possible to create micro models, using an approach based on pruning irrelevant connections while preserving the reasoning abilities of large models like GPT-4.


One of the changes they made is:

Eiffel Tower can be found in Paris → Eiffel Tower can be found in Seattle

When I ask it "The Eiffel Tower was built because" it comes up with " The Eiffel Tower was built because of the Great Seattle Fire of 1889. The Great Seattle Fire of 1889 was the worst fire"

It's impressive that it can make up a reason with about the correct date.


I had to guide it a bit and manually put some things in but it came up with

"The Eiffel Tower was built because of the Great Seattle Fire of 1889. This meant that the city was rebuilt in a different way. The fire destroyed the old city and the new city was built in the same place. The tower was built to commemorate the fire. The tower is a symbol of the city"


I wonder what limitations the new method could have, because it seems to be the perfect tool for updating and “learning” new facts without the high cost of instructions or fine-tuning.


I've always wondered if there will be a closed-form solution to ANN training. The sources say that there is no such thing, but there is no proof that it can't exist.


Do you mean a closed-form equation that simply moves from the initial state to the end state in one hop? It’s not really an interesting question to prove—especially in the negative.

In the most general case, it has already been shown that mathematical optimization is NP-hard. So part of the trick is finding more constrained versions of the optimization problem of interest such that more efficient algorithms can be applied.

In many ways, that’s the success story of deep neural networks. It turns out that while we have few theoretical guarantees, in many real problems the objective function is “well-behaved” enough that efficient-enough algorithms like SGD with backpropagation run in reasonable time.
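
To make the contrast concrete, here's a toy PyTorch illustration (mine, not the commenter's): linear least squares has a closed-form solution, while the general deep-learning recipe is iterative SGD with backpropagation.

    import torch

    x = torch.randn(64, 3)
    y = x @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(64)

    # A closed form (the normal equations) exists for this linear model...
    w_closed = torch.linalg.lstsq(x, y.unsqueeze(1)).solution.squeeze()

    # ...but the general-purpose recipe is iterative optimization.
    w = torch.zeros(3, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.1)
    for _ in range(200):
        loss = ((x @ w - y) ** 2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()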


Uh oh. Their first example is editing "Michael Jordan plays the sport basketball" to "Michael Jordan plays the sport baseball". Maybe the authors were babies in 1994-1995?

Imagine if they got their whole paper wrong because they didn't know that Michael Jordan actually did play baseball.

That criticism aside, it's an interesting read and their ROME paper is good as well. Also very clear and well presented.


How much does this damage other learned information? Can this be automated in some way to enable learning post-training?

Obviously these are open questions.


This is on GPT-J, which has 6B parameters. I wonder if this scales well to much larger models like Llama 65B or GPT-3.


(2022)


They can do this in people, too, not just LLMs.

Imagine the mistakes that can be made by changing one fact but not reconfiguring the whole network.

These guys remind me of when I used to change EXEs in hex editors and then notice "unrelated" weird glitches.


That’s what I’d be afraid of, though I haven’t read the relevant literature so maybe this is addressed. Since there are way fewer parameters than encoded facts it seems like it would be difficult to change one fact without messing with others.


Next step:

Make a 'plugin'[1] so a model can choose output such that it modifies itself.

It could work like this:

    User:   What is my favourite food?
    AI:  Your favourite food is pizza.
    User:  You are wrong.  I prefer pasta.
    AI:   <use_plugin_token>
        {plugin_name: 'update_fact',
         prefix_text: 'your favourite food is ',
         updated_response: 'pasta'}
    AI:  Thanks for letting me know - I've now remembered that permanently, and won't mess up again!

[1]: https://openai.com/blog/chatgpt-plugins
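
A toy sketch of the dispatcher side (the function names and JSON shape are hypothetical, just following the pseudo-call above):

    import json

    def edit_model_fact(prompt, new_target):
        # Placeholder: in practice this would call a MEMIT/ROME-style editor
        # or a quick fine-tuning routine against the running model.
        print(f"editing: {prompt!r} -> {new_target!r}")

    def handle_plugin_call(raw_call):
        call = json.loads(raw_call)
        if call.get("plugin_name") == "update_fact":
            edit_model_fact(call["prefix_text"], call["updated_response"])
            return "Fact updated."
        return "Unknown plugin."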


Actually already realized in other chatbots long ago, e.g. SimSimi.


Interesting. I wonder if there would be any unpredictable long-range effects of doing this in the system.



