Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices (nexa.ai)
69 points by BUFU 64 days ago | 12 comments




I saw a turntable at a shop recently and my inner classifier went: "Oh a DSOTM turntable, that's sweet!"

https://www.project-audio.com/en/product/the-dark-side-of-th...

I was kinda expecting the model in your picture to make the link with the album cover.


Need to try this directly before passing judgement, but this can unlock a few project ideas I have if the quality lives up to the examples with resource requirements this low.


Its description of the art piece is so awful.


Hi! I am from Nexa AI. We just improved Omnivision-968M based on your feedback! Here is a preview in our Hugging Face Space: https://huggingface.co/spaces/NexaAIDev/omnivlm-dpo-demo

The updated GGUF and safetensors will be released after final alignment tweaks. Please feel free to let us know if there's any other feedback!


Why don’t you just hand-write the descriptions and then your AI won’t have to?


I thought the same, but the description of the cat picture is pretty spot on. I wonder if this is a dataset issue. Cat pictures are far more prevalent than abstract art on the internet so might well be overrepresented. Can Vision LLMs deal with a long tail of underrepresented objects when small? Or can they only do so at scale?


Can GitHub please acquire all these model-hub companies like fal, replicate, ollama, hf, and *checks notes* "nexa.ai"? That way we can get past the inevitable fragmentation and ultimate breaking of everyone's workflow w.r.t. ML-oriented dev ops?


When faced with a diversity of implementation, why is the go-to “let’s have a corporate entity acquire them all” instead of “let’s come up with a good runtime standard”? The company is going to do the same thing anyway except with the additional risk of messing up the API and throwing away the hard work of so many people.


You want everything under the control of Microsoft?


Satya is that you?


I definitely wish to try this: https://nexa.ai/blogs/omni-vision



