Chrome already has optional built-in support for generating alt text for images. It's been there for years, using a server-based API.
It seems plausible that this could be replaced with a local model in the near future. It's not clear the average user's hardware could handle it today, but it will become increasingly viable.
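For a sense of scale: small open captioning models already run locally on a plain CPU. A minimal sketch using Hugging Face transformers and BLIP (one possible model choice on my part, not whatever Chrome actually ships):

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Small open captioning model; downloads once, then runs fully locally.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg")  # any local image file
inputs = processor(images=image, return_tensors="pt")

# Generate a short caption suitable as draft alt text.
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```

That checkpoint is on the order of a gigabyte and runs tolerably on CPU, which is why a browser shipping (or downloading on demand) something like it no longer sounds far-fetched.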
Keep in mind, though, that alt text is just one small part of making a web site accessible.
> It seems plausible that this could be replaced with a local model in the near future. It's not clear the average user's hardware could handle it today, but it will become increasingly viable.
Siri does something like this when reading messages into your AirPods. It will give brief descriptions of photos sent in the message. I'm pretty sure it's all run locally.
Right now I use LLMs to generate alt text for images, and they are better than any I would have written by hand. Only in about 1% of cases do I need to correct anything.
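The mechanics are only a few lines. Here's a minimal sketch using the OpenAI Python SDK (the model name and prompt are illustrative placeholders, not my exact setup):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_alt_text(path: str) -> str:
    """Ask a vision-capable model for one-sentence alt text."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write concise alt text (one sentence) for this image. "
                         "Describe what matters; skip 'image of' boilerplate."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()

print(generate_alt_text("photo.jpg"))
```

Reviewing each result by hand is still what catches the roughly 1% that need correcting.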
LLM-generated descriptions miss a lot of context. For instance, depending on the site and the content, we might deliberately mention people's race or fashion; other times we deliberately wouldn't. A model without that context can't make the call.
* if an image doesn’t have alt text
* if you need the page read to you
* if you need what’s happening in a video described
A model built into the OS or browser seems like a no-brainer.