Hacker News

I did some side-by-side comparisons of simple tasks (e.g. "Write a WCAG-compliant alternative text describing this image") with Bard vs GPT-4V.

Bard's output was significantly worse. I did my testing with some internal images so I can't share, but will try to compile some side-by-side from public images.




Bard with Gemini Pro is apparently text-only:

> Important: For now, Bard with our specifically tuned version of Gemini Pro works for text-based prompts, with support for other content types coming soon.

https://support.google.com/bard/answer/14294096

I'm in the UK and it's not available here yet. I really wish they'd be clearer about which model I'm using; it's not the first time this has happened.


You can ask Bard directly! Unlike ChatGPT, Bard can answer many things about itself.


It lies:

https://imgur.com/a/glPmXp3

I asked it whether it's available in the UK and it said no. I told it I'm in the UK, and it said that it's not Gemini then.


Huh! It has an image upload and gives somewhat relevant, if not great, responses, so I'm a bit confused by that. So this is the existing Lens implementation?


Bard has been capable of handling images for months.


Is PaLM 2 multimodal?


As it should! Hopefully Gemini Ultra will be released in a month or two for comparison to GPT-4V.


I'm researching the use of LLMs for alt-text suggestions for forum users. Can you share your findings so far?

Outside of GPT-4V I had good first results with https://github.com/THUDM/CogVLM


As a heads-up, Bard with Gemini Pro only works with text.



