I've given the CI deal a good amount of thought. You'd have to go through the trouble of:
- Provisioning a Windows VM with specific versions of browsers (e.g. IE11) and AT (e.g. JAWS 17, the versions differ quite significantly)
- Writing an automation suite that is capable of controlling the browser and AT (Selenium probably does fine), but crucially also of interpreting the feedback from the assistive tool to check for correctness. This is tremendously hard. You'd either use some debugging APIs, if any exist in the various assistive tools, or read memory / reverse engineer the tool using IDA, or capture the audio output to the sound card and run it through speech recognition to figure out if what was said by the screen reader is what you'd expect. With something like Dragon Dictate you'd have to figure out how to trigger voice commands.
- Exposing the VM through an API that you can call from your test suite
- Writing assertions against it, e.g. `expect(jawsOutput).toBe("Type in two or more characters for results.")`
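
For the speech-recognition route, the comparison step might look roughly like this. It's only a sketch: the function names (`normalizeUtterance`, `wordDistance`, `saidRoughly`) are made up for illustration, it assumes English text, and it assumes you've already got a transcript string out of the recognizer somehow, which is the hard part. An exact string match is probably too strict for a transcript, so this compares word by word and allows a configurable number of word errors.

```javascript
// Hypothetical helpers; none of these names come from a real library.

// Reduce an utterance to a list of lowercase words so that casing,
// punctuation and spacing differences are ignored (ASCII/English only).
function normalizeUtterance(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Word-level Levenshtein distance between two word lists: the number of
// inserted, deleted or substituted words needed to turn `a` into `b`.
function wordDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// True if the transcript matches the expected text within `maxWordErrors`
// word-level edits. The default of 0 demands the same words in the same order.
function saidRoughly(transcript, expected, maxWordErrors = 0) {
  const distance = wordDistance(
    normalizeUtterance(transcript),
    normalizeUtterance(expected)
  );
  return distance <= maxWordErrors;
}
```

With something like that, the assertion becomes `expect(saidRoughly(jawsOutput, "Type in two or more characters for results.", 1)).toBe(true)`, where the tolerance of one word is an arbitrary choice you'd have to tune against the recognizer's actual error rate.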
That's a potentially tremendously profitable SaaS offering (to the right companies), if someone can build it.
I used JAWS and IE11 on Windows as a specific example because that's a popular combination with screen reader users. If something works well in NVDA and Firefox on Linux it does not follow that it will work in other combinations, at least in my own testing with things I've worked on in the past. Though targeting the low-hanging fruit to begin with is how I'd also start if I were building something for this in earnest, ideally I'd want to automate testing with all the popular combinations that I expect users to have.