Hacker News

> weights refer to the trained model weights

This is what I'm having a hard time understanding.

So there's the weights, and also a model somewhere? That the weights are based on? Or that you combine with the model to tune it?



Let's take a step back. You have a model like linear regression. For example, y=bx, where y is your output and x is your input. Based on some data, you learn that b=1. Therefore, you share the model's weights as a file like {b=1}, and you also share the model y=bx itself (usually via code) so others can run it in production.
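The comment above can be sketched in a few lines of Python. This is purely illustrative (the function names `train` and `predict` are made up, not a real library API): the "model" is the code y = b * x, and the "weights" are just the learned value of b, shared as a small file.

```python
import json

def train(xs, ys):
    """Fit y = b * x by least squares: b = sum(x*y) / sum(x*x)."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"b": b}

def predict(weights, x):
    """The model itself: y = b * x."""
    return weights["b"] * x

weights = train([1, 2, 3], [1, 2, 3])  # learns b = 1.0
weights_file = json.dumps(weights)     # the "weights" you share: {"b": 1.0}
y = predict(weights, 5)                # running the model in production
```

Sharing `weights_file` alone is useless without the code for `predict`, and vice versa, which is exactly the distinction being asked about.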


This is the best explanation imo.

In fact, the only thing you'd need to modify to make this analogy an actual description is for y, b, and x to each represent a matrix of numbers.


My really simplified explanation is:

Your inputs are lists of numbers. Your outputs are lists of numbers. There exists some possible list of numbers such that, if you multiply your inputs by that list you'll get (approximately) the outputs.

In this conception, that possible list of numbers is the weights. "Training" is when you run inputs through the model, compare to known outputs, and then update the weights so they produce outputs closer to what you want.

With Large Language Models it may be hard to see how they fit this paradigm. Basically, you convert a sequence of words to a list of numbers ('aardvark' is 1, 'apple' is 2, etc.), and the desired output is the next word in the sequence (also represented as a number). Surprisingly, if you get good at predicting the next word in a sequence, you also get the ChatGPT et al. behavior.


The model is a class with parameters; the weights are an instance of that class, serialized with the parameter values learned during training.
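The class/instance analogy above, as a minimal sketch (the class name `LinearModel` and the JSON serialization are assumptions for illustration, not how real frameworks store weights):

```python
import json

class LinearModel:
    """The "class": the model architecture, written as code."""
    def __init__(self, b):
        self.b = b            # parameter slot, filled in by training

    def __call__(self, x):
        return self.b * x     # y = b * x

trained = LinearModel(b=1.0)                        # instance with learned values
weights_file = json.dumps({"b": trained.b})         # serialize the "weights"
restored = LinearModel(**json.loads(weights_file))  # reload elsewhere and run
```

Frameworks like PyTorch follow the same split: the class definition ships as code, while the learned parameter values are saved and loaded separately.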


This is what happens when running inference on a neural network:

Input (list of numbers) -> (Bunch of math operations) with (other numbers) -> Output (also a list of numbers)

This applies whether you are talking about image classification, image generation, text generation etc.

The model defines what the "(Bunch of math operations)" part is. As in, do these multiplications, then add, then a tanh operation etc.

The weights define what the "(other numbers)" are. Training is the process of figuring out these weights using various methods - some of which involve example inputs/outputs (supervised learning), others don't require examples (unsupervised or self-supervised learning).
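The split described above can be made concrete with a tiny hand-written one-layer network (the specific numbers in `W` and `b` are arbitrary stand-ins for trained values):

```python
import math

# The weights: the "(other numbers)", produced by training.
W = [[0.5, -0.2], [0.1, 0.8]]
b = [0.0, 0.1]

def model(x):
    """The model: the "(Bunch of math operations)" -- multiply, add, tanh."""
    return [math.tanh(sum(W[i][j] * x[j] for j in range(2)) + b[i])
            for i in range(2)]

y = model([1.0, 2.0])  # Input (list of numbers) -> Output (also a list of numbers)
```

Swapping in different values for `W` and `b` changes what the network computes without touching the code, which is why weights can be distributed as a standalone file.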


Model is code, weights are the input data to that code



