
Its description of the art piece is so awful.



Hi! I am from Nexa AI. We just improved Omnivision-968M based on your feedback! Here is a preview in our Hugging Face Space: https://huggingface.co/spaces/NexaAIDev/omnivlm-dpo-demo

The updated GGUF and safetensors will be released after final alignment tweaks. Please feel free to let us know if there's any other feedback!


Why don’t you just hand-write the descriptions, so your AI won’t have to?


I thought the same, but the description of the cat picture is pretty spot on. I wonder if this is a dataset issue. Cat pictures are far more prevalent than abstract art on the internet, so they might well be overrepresented. Can small vision LLMs deal with a long tail of underrepresented objects, or can they only do so at scale?



