PaliGemma (ai.google.dev)
145 points by tosh 14 days ago | 15 comments


This is an impressive amount of public AI work coming out of Google. The competition we're seeing here is really pushing things forward.


Anyone here have experience extracting image embeddings from these models? All the image embedding models I've tried so far were quite bad for my use cases, and I suspect the hidden representations of models like these might be much better.


Have you tried CLIP image embeddings?
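For reference, a minimal sketch of pulling CLIP image embeddings with Hugging Face transformers. The checkpoint name is the standard base model; the image path is a placeholder:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor


def clip_image_embedding(image_path: str) -> torch.Tensor:
    """Return one CLIP image embedding (shape (1, 512) for the base model)."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(images=Image.open(image_path), return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    # L2-normalise so cosine similarity reduces to a dot product
    return emb / emb.norm(dim=-1, keepdim=True)
```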

Yes, that's what I am mainly trying to replace, as the performance is just not there for my needs.

Just from the name, my mind raced to LLMs trained on the Pali canon.

I had the same assumption!

It refers to images, but would that extend to diagrams, like engineering drawings?


How does this model compare to the 3B Gemma if I were to use it only for text?


Well, to start with, there is no regular 3B Gemma. There are 2B and 7B Gemma models. I would guess this model is adding an extra 1B parameters to the 2B model to handle visual understanding.

The 2B model is not very smart to begin with, so… I would expect this one to not be very smart either if you only use it for text, but I wouldn’t expect it to be much worse. It could potentially be useful/interesting for simple visual understanding prompts.


Anyone found a good recipe to run this on a Mac yet?


You can run it by installing transformers from source: https://huggingface.co/google/paligemma-3b-mix-448

Have you seen that work on a Mac? I've had very bad luck getting anything complex to work with transformers on that platform.

Yes, I was able to run inference on the unquantized model in CPU land on Apple Silicon.

Is this related to Project Astra?

Google markets its new tech like arXiv articles. They have a lot to learn from OpenAI.