Does anyone here have experience extracting image embeddings from these models? All the image embedding models I have tried so far were quite bad for my use cases, and I suspect the hidden representations of models like these might be much better.
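To make the idea concrete, here is a rough sketch of one common way to collapse per-patch hidden states into a single image embedding: mean-pool over the patches, then L2-normalize. It is not tied to any particular model or library. The `patches` values are a made-up toy stand-in for whatever hidden states you manage to pull out of the vision side of the model, and the function names are hypothetical.

```python
import math

def mean_pool_embedding(hidden_states):
    """Average per-patch vectors into one vector, then L2-normalize it."""
    if not hidden_states:
        raise ValueError("need at least one patch vector")
    dim = len(hidden_states[0])
    count = len(hidden_states)
    # Mean over the patch axis, one output value per hidden dimension.
    pooled = [sum(vec[i] for vec in hidden_states) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in pooled] if norm > 0 else pooled

def cosine_similarity(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))

# Toy stand-in for real hidden states: 2 patches, hidden size 2 (assumed values).
patches = [[1.0, 0.0], [0.0, 1.0]]
embedding = mean_pool_embedding(patches)
```

Whether mean pooling beats taking a single token's state, or which layer works best, is something I would expect to depend on the use case, so it is worth comparing a few options on your own data.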
Well, to start with, there is no regular 3B Gemma. There are 2B and 7B Gemma models. I would guess this model is adding an extra 1B parameters to the 2B model to handle visual understanding.
The 2B model is not very smart to begin with, so… I would expect this one to not be very smart either if you only use it for text, but I wouldn’t expect it to be much worse. It could potentially be useful/interesting for simple visual understanding prompts.