Show HN: NeuralFlow – Visualize the intermediate output of Mistral 7B (github.com/valine)
134 points by valine on Feb 15, 2024 | 20 comments



You get used to it, though. Your brain does the translating. I don't even see the code. All I see is blonde, brunette, redhead.


Now we just need to train a model on this visualisation to help us to interpret it...


That's pretty, but it's not "useful" in any way, right? Or can people use these images to better understand the inner workings somehow?


When you look at a visualization frequently, you develop a feel for when things are normal and when they are unexpected.

If it's the first time you're looking at this, or the first time for this specific visualization or for a new type of model, it won't tell you much, and you can't really tell when things are wrong. You really need something to compare it against.

E.g. in the readme, it shows the example of overfitting, where you see that activations became too large. But to tell whether this is unexpectedly large or not, you need to compare it to some other plots where everything is fine.


An individual snapshot of the model’s output isn’t very useful. What I’ve found enormously helpful is watching how the structure of the output changes over time as I fine-tune Mistral.

There will typically be visual artifacts in the heat-map that appear right around the time the model starts to go off the rails. Due to the nature of residual layers, problems at lower layers cascade and light up when visualized this way.
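
Roughly, the idea in code (a simplified sketch, not the exact code in the repo; the model name, colormap, and plotting details are just placeholders):

    # Sketch: capture every layer's hidden state for the last token and render
    # the stack as a heat-map, once per training checkpoint.
    import torch
    import matplotlib.pyplot as plt
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM works
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16, device_map="auto"
    )

    def layer_snapshot(prompt: str) -> torch.Tensor:
        """Last token's hidden state per layer, normalized to [0, 1]."""
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        # hidden_states: embeddings plus one (batch, seq, 4096) tensor per layer
        snap = torch.stack([h[0, -1, :] for h in out.hidden_states]).float()
        return (snap - snap.min()) / (snap.max() - snap.min() + 1e-8)

    snap = layer_snapshot("The quick brown fox")
    plt.imshow(snap.cpu().numpy(), aspect="auto", cmap="magma")  # colormap is arbitrary here
    plt.xlabel("hidden dimension"); plt.ylabel("layer")
    plt.savefig("snapshot_step_0000.png", dpi=150)

Saving one of these images every N optimizer steps is what makes the change in structure over time visible.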


Fascinating. Makes me think there's still room for new architecture/training ideas like dropout or weight decay to improve model performance.


I’m confident that there are better methods for fine tuning yet to be discovered. I used this visualization method along with some other unpublished research to train a series of models with less data than is typically required for fine tuning.

More details in this r/LocalLLaMA thread if you're interested.

https://www.reddit.com/r/LocalLLaMA/comments/198x01d/openpir...


You don't seem willing to share how you did anything; you only draw attention to your work. In the reddit thread, several people asked about your 'talk like a pirate' training, and you never responded. In this thread, you imply you'll explain how you used this visualization in your training, yet you never do.


I’ve gone into pretty great detail on the visualization in the README of my repo. The main utility is detecting individual layers being overfit.

There are some specifics about OpenPirate that I’m not at liberty to share at the moment, but those are unrelated to this visualization. I’ve published the model weights under a permissive license, and I hope to publish more of the training code in the future.

If you have any questions about how to use the code in my NeuralFlow repo, just ask.


OK, sorry if I missed that, but perhaps a direct link here or there would help, since a number of people asked the same thing. I followed a link to your Hugging Face page on reddit, and the obvious README there doesn't talk about specifics[1].

1. https://huggingface.co/valine/OpenPirate/blob/main/README.md


Yeah I apologize, a lot of the information is scattered across threads right now. I should have spent more time compiling everything in one place.

This comment chain in particular might have some of what you’re looking for:

https://www.reddit.com/r/LocalLLaMA/comments/1ap8mxh/comment...

Other relevant threads to put it all in one place:

https://www.reddit.com/r/LocalLLaMA/comments/198x01d/openpir...

https://www.reddit.com/r/LocalLLaMA/comments/19a5hdx/morehum...

https://www.reddit.com/r/LocalLLaMA/comments/1apz94o/neuralf...

https://github.com/valine/NeuralFlow/blob/master/README.md

The one thing I don't talk about is the specifics of the instruction generalization, which unfortunately I'm not able to share, even though I very much want to.


I don't think you should be apologising, there is always room for improvement. Nice work!


If you do this for gradient updates, you should get a leading indicator.
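
Something along these lines, assuming a standard PyTorch loop and the Hugging Face Mistral module layout (model.model.layers); a per-layer gradient norm is just one simple way to summarize the update:

    # Hypothetical variant: after loss.backward(), summarize each transformer
    # layer's gradients instead of its activations and track them over steps.
    import torch

    def gradient_snapshot(model) -> torch.Tensor:
        """One gradient norm per transformer layer."""
        norms = []
        for layer in model.model.layers:  # Hugging Face Mistral layout
            total = torch.zeros(())
            for p in layer.parameters():
                if p.grad is not None:
                    total = total + p.grad.detach().float().pow(2).sum().cpu()
            norms.append(total.sqrt())
        return torch.stack(norms)

    # In the training loop, after loss.backward() and before optimizer.step():
    #     history.append(gradient_snapshot(model))  # plot over steps to spot spikes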


Huh hadn’t thought of that. Thank you


How do you determine “right around the time the model starts to go off the rails”?


As part of the training loop I will periodically ask the model a question. At first the model will respond normally, and then get progressively more repetitive until it starts repeating tokens from the training data. The point in time where the model starts repeating itself aligns with the sudden change in the visualization.
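
Roughly like this (a simplified sketch; the prompt, check interval, and repetition heuristic here are only illustrative):

    # Sketch of a periodic sanity check inside the fine-tuning loop: generate a
    # short reply and flag degenerate repetition.
    import torch

    PROBE_PROMPT = "Tell me a short story about a ship."
    CHECK_EVERY = 50  # training steps between checks

    def looks_repetitive(text: str, max_run: int = 4) -> bool:
        """Crude heuristic: does any word repeat more than max_run times in a row?"""
        words = text.split()
        run = 1
        for prev, cur in zip(words, words[1:]):
            run = run + 1 if cur == prev else 1
            if run > max_run:
                return True
        return False

    def probe(model, tokenizer, step: int) -> None:
        inputs = tokenizer(PROBE_PROMPT, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
        reply = tokenizer.decode(out[0], skip_special_tokens=True)
        if looks_repetitive(reply):
            print(f"[step {step}] model is starting to repeat itself:\n{reply}")

    # In the training loop:
    #     if step % CHECK_EVERY == 0:
    #         probe(model, tokenizer, step)  # compare against the layer heat-maps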


A few days ago I saw a post using NeuralFlow to help explain the repetition problem.

https://old.reddit.com/r/LocalLLaMA/comments/1ap8mxh/what_ca...

> I’ve done some investigation into this. In a well trained model, if you plot the intermediate output for the last token in the sequence, you see the values update gradually layer to layer. In a model that produces repeating sequences I almost always see a sudden discontinuity at some specific layer. The residual connections are basically flooding the next layer with a distribution of values outside anything else in the dataset.

> The discontinuity is pretty classic overfitting. You’ve both trained a specific token to attend primarily to itself and also incentivized that token to be sampled more often. The result is that if that token is ever included at the end of the context the model is incentivized to repeat it again.

...

> Literally just plotting the output of the layer normalized between zero and one. For one token in mistral 7B it’s a 4096 dimension tensor. Because of the residual connections if you plot that graph for every layer you get a really nice visualization.

> Edit: Here's my visualization. It’s a simple idea but I've never personally seen it done before. AFAIK this is a somewhat novel way to look at transformer layer output.

> Initial output: https://imgur.com/sMwEFEw

> Over-fit output: https://imgur.com/a0obyUj

> Second edit: Code to generate the visualization: https://github.com/valine/NeuralFlow

This is nearly identical to the overfitting example in the repo. It only really gives a binary signal, but it's a good start. Perhaps some transformations could be applied to extract more?
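
For example, one way to get more than a binary signal out of it (a sketch, not something from the repo) is to measure how much the last token's hidden state jumps between consecutive layers; the discontinuity described above would show up as an outlier:

    # Sketch: turn the visual discontinuity into a number. Takes a
    # (num_layers, hidden_dim) snapshot of the last token's hidden states,
    # i.e. the same array the visualization plots.
    import torch

    def layer_jumps(snapshot: torch.Tensor) -> torch.Tensor:
        """L2 distance between consecutive layers' hidden states."""
        return (snapshot[1:] - snapshot[:-1]).norm(dim=-1)

    def flag_discontinuities(snapshot: torch.Tensor, z_threshold: float = 3.0) -> list:
        """Indices of layers whose jump is an outlier (> z_threshold sigmas)."""
        jumps = layer_jumps(snapshot)
        z = (jumps - jumps.mean()) / (jumps.std() + 1e-8)
        return torch.nonzero(z > z_threshold).flatten().tolist()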


Any explanation on the choice of the colormap? It can have a powerful effect on how we perceive this visualization.


Trial and error. I tried a bunch of different color maps, and this one had the best contrast.

My process is to periodically prompt the model as I fine-tune. The features that seem to correlate with the model losing coherence are highlighted nicely with this color mapping. More of an art than a science.


This reminds me of the scene in The Matrix when they look at the encrypted thoughts of the Matrix.

https://cdn.swisscows.com/image?url=https%3A%2F%2Fi.pinimg.c...



