hCaptcha makes money by having humans label things to teach machines. This suggests that at some point, the machines will be nearly as good as the humans at labeling. At that point, until the humans are given a different training exercise, a bot will be effectively indistinguishable from a human via hCaptcha.
If there's value in it, it sounds like a spammer could train an hCaptcha-defeating bot via hCaptcha.
The exercises depend on what researchers are paying for, so the spammer needs to keep up. They can do that by sampling hCaptchas and figuring out which tasks are currently being served.
The question is how much value spamming provides. If it's significant, then it pays for the arms race. Given the investment in beating reCAPTCHA, it seems that spam provides high value.
Edit: And! And! hCaptcha could fall victim to a malicious bot that feeds bad data into researchers' training sets. The bot would need to answer an overwhelming share of hCaptcha challenges, so that its answers become the dominant validation set. The model would be self-fulfilling: whatever the bot says becomes the "correct" label. It'd even be performed at the expense of the legitimate researchers, who must pay in Ethereum-based tokens for every hCaptcha response. Much like cryptocurrency networks, hCaptcha is susceptible to a cartel. In this case, it'd be a bot cartel feeding bad data. Humans could be locked out, as hCaptcha might be convinced the humans are bots and the bots are human.
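To make the cartel scenario concrete, here's a toy sketch. It assumes answers are validated by simple majority agreement, which is my guess for illustration and not a description of how hCaptcha actually scores responses. The names consensus, rejected, and simulate are made up.

```python
from collections import Counter

def consensus(answers):
    # ASSUMPTION: the "correct" label is whatever most respondents said.
    # The real service may weight or verify answers very differently.
    return Counter(answers.values()).most_common(1)[0][0]

def rejected(answers):
    # Anyone who disagrees with the consensus is treated as a bot.
    winner = consensus(answers)
    return sorted(user for user, label in answers.items() if label != winner)

def simulate(n_humans, n_bots, true_label="bus", bad_label="car"):
    # Humans all give the true label; the bot cartel all gives the bad one.
    answers = {f"human{i}": true_label for i in range(n_humans)}
    answers.update({f"bot{i}": bad_label for i in range(n_bots)})
    return consensus(answers), rejected(answers)

print(simulate(n_humans=5, n_bots=2))   # bots are the minority
print(simulate(n_humans=5, n_bots=20))  # bots dominate the answer pool
```

With 2 bots against 5 humans, the consensus stays "bus" and the bots are rejected. With 20 bots, the consensus flips to "car" and it's the humans who get rejected, which is the lock-out described above.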
Wait, they actually use the labeling for something other than admittance? Well, that takes care of my concern about this tech then. Gotta go back and change some of my other comments...
Google uses reCAPTCHA labels exclusively for itself, and is extracting hundreds of person-years of free labor from internet users every single day this way. hCaptcha lets anyone access this type of service, provided they follow certain ethical AI guidelines.
How long has the content of the reCAPTCHA puzzles been entirely unchanged, despite being shown to billions of users? Like five years? It should be painfully obvious that there is no actual labeling going on at this point; they have all the training data they'll ever need for traffic lights...
And as a corollary, since they haven't started labeling different kinds of data, it's clear that either they no longer need any kind of labels at all, or this is actually not a cost-effective way of getting them.
For me at least, the images on reCAPTCHA have been getting much, much worse, to the point of often being almost indistinguishable. So although they are still typically asking for the same things (cars, buses, stop signs, traffic lights), they do seem to still be making progress on the labelling effort.