Live attack. To obtain an exact measurement of our attack’s accuracy, we run our automated captcha-breaker against reCaptcha. We employ the Clarifai service, as it shows the best results among the services we evaluated.
Labelled dataset. We created a labelled dataset to exploit the image repetition. We manually labelled 3,000 images collected from challenges, and assigned each image a tag describing the content. We selected the appropriate tags from our hint list. We used pHash for the comparison, as it is very efficient, and allows our system to compare all the images from a challenge to our dataset in 3.3 seconds. We ran our captcha-breaking system against 2,235 captchas, and obtained a 70.78% accuracy. The higher accuracy compared to the simulated experiments is, at least partially, attributed to the image repetition; the history module located 1,515 sample images and 385 candidate images in our labelled dataset.
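The lookup against the labelled dataset can be sketched as a nearest-match search over perceptual hashes. This is a minimal illustration, not our actual history module: it assumes each image has already been reduced to a 64-bit pHash value, and the names `hamming_distance`, `lookup_label`, the example tags, and the threshold of 8 bits are all hypothetical choices made for this example.

```python
from typing import Dict, Optional, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer-encoded hashes."""
    return bin(a ^ b).count("1")


def lookup_label(
    query_hash: int,
    labelled: Dict[int, str],
    max_distance: int = 8,  # assumed threshold, not a value from the paper
) -> Optional[Tuple[str, int]]:
    """Return (tag, distance) for the closest labelled hash, or None.

    A match is accepted only if it differs from the query in at most
    `max_distance` bits, so unrelated images are not assigned a tag.
    """
    best: Optional[Tuple[str, int]] = None
    for known_hash, tag in labelled.items():
        distance = hamming_distance(query_hash, known_hash)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (tag, distance)
    return best


# Hypothetical labelled dataset: 64-bit hash -> manually assigned tag.
labelled_dataset = {
    0xF0F0F0F0F0F0F0F0: "wine",
    0x0123456789ABCDEF: "pasta",
}

# A challenge image whose hash differs from a known image in one bit.
match = lookup_label(0xF0F0F0F0F0F0F0F1, labelled_dataset)
print(match)
```

Because each comparison is a single XOR followed by a bit count, a linear scan over a few thousand labelled hashes stays cheap, which is consistent with choosing a hash-based comparison for efficiency.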
Average run time. Our attack is very efficient, with an average duration of 19.2 seconds per challenge. The most time-consuming phase is running GRIS, as it searches for all the images in Google and processes the results, including the extraction of links that point to higher-resolution versions of the images.
Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless,
economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services
tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based
captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold:
to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition.
ReCaptcha is driven by an “advanced risk analysis system” that evaluates requests and selects the difficulty of the captcha that
will be returned. Users may be required to click a checkbox, or solve a challenge by identifying images with similar content.
In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by
each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence
the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that
leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically
solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack
to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series
of safeguards and modifications for impacting the scalability and accuracy of our attacks. Overall, while our study focuses on
reCaptcha, our findings have wide implications; as the semantic information conveyed via images is increasingly within the realm
of automated reasoning, the future of captchas relies on the exploration of novel directions.