
Does this mean that digitizing of books through reCAPTCHAs will be done at a much slower rate or not at all?



I find it difficult to believe the captchas served over the last year or so were actually scanned from books: they were so completely illegible and nonsensical. I usually had to click refresh about half a dozen times before I could even find a sample that I could read correctly.

-----


Half of the captcha (the illegible nonsense) is the actual test. The other half is usually easy to read; that's the scan from the book. You can actually answer anything for that part and still pass the test, although obviously if you do, you're not helping digitize books.
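
To make that concrete, here's a rough sketch of the two-word scheme as described above, not reCAPTCHA's actual code. The names (`check_captcha`, `votes`) are made up for illustration, and I'm assuming answers are compared case-insensitively. Only the control word decides pass/fail; whatever you type for the scanned word is just tallied as a guess.

```python
from collections import Counter


def check_captcha(control_answer, user_control, user_unknown, votes):
    """Grade only the control word; record the other answer as a vote.

    control_answer: the known solution for the distorted test word.
    user_control:   what the user typed for the test word.
    user_unknown:   what the user typed for the scanned word (never graded).
    votes:          Counter of guesses for the scanned word (hypothetical store).
    """
    passed = user_control.strip().lower() == control_answer.strip().lower()
    guess = user_unknown.strip().lower()
    # Only trust guesses from users who got the control word right.
    if passed and guess:
        votes[guess] += 1
    return passed


votes = Counter()
# The user passes even though "anything" is not what the scan says.
passed = check_captcha("mnnrm", "mnnrm", "anything", votes)
```

Presumably a guess for the scanned word is only accepted once enough people agree on it, which is why typing junk there costs you nothing but doesn't help anyone either.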

-----


Are they even digitizing books anymore? I always seem to get a house number. The house numbers make it really easy to know I don't actually have to type that part.

-----


I figured out a while ago that you only ever need to type the nonsensical string.

I think it's pretty clear the book-reading bit was abandoned long ago. I never get non-test words that are in any way a struggle for a competent OCR system. And on the occasion that I do, it's impossible for me to read either. If they provided context it would be much more helpful.

As an aside, if you've ever had to solve one of these through Tor and you happen to be running through some Eastern European countries... good god, those are the most frustrating captchas I've ever seen. Long strings of "mnnmrnrmnm" with contrasting colors and JPEG artifacts... a few attempts at solving those make me want to kill someone. I feel bad for people trying to do anything on the internet from those countries. I wonder what the rationale is for making captchas nearly impossible to solve in specific regions.

-----


Tor + Eastern Europe has probably triggered the heuristic that you're a probable bot, and it's giving you a test that will further amplify its confirmation bias.

Welcome to the preview of the day where all the networked, statistically self-optimizing IDSes simultaneously turn on us and clean the messy humans out of their technological world.

-----


> I wonder what the rationale is for making captchas nearly impossible to solve in specific regions.

Google will captcha-block IP addresses (not just Tor) if they get too many queries from them in a short period of time, so that bots can't crawl the search results.[1]

1: https://www.torproject.org/docs/faq.html.en#GoogleCAPTCHA
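
For what it's worth, the kind of per-IP throttle described here can be sketched as a sliding-window counter. This is only an illustration under my own assumptions: the class name `CaptchaGate`, the threshold, and the window length are invented, and Google's real heuristics aren't public.

```python
from collections import defaultdict, deque


class CaptchaGate:
    """Flag an IP for a captcha when it sends too many queries in a window."""

    def __init__(self, max_queries=5, window_seconds=10.0):
        # Both limits are placeholder values, not anything Google documents.
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent queries

    def needs_captcha(self, ip, now):
        """Record a query from `ip` at time `now` (seconds); True if over the limit."""
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_queries


gate = CaptchaGate(max_queries=3, window_seconds=10.0)
results = [gate.needs_captcha("203.0.113.7", t) for t in (0, 1, 2, 3)]
```

Since the counter is keyed on the address alone, everyone sharing one address (a Tor exit node, say) shares one budget, so a single heavy user can trip the captcha for the rest.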

-----


It's likely the case that the house numbers are from Google Street View, and they are using them to improve the addressing for Google Maps.

I'm a bit puzzled by this update, though. I have reCAPTCHA on a wiki that I maintain, and I still see the traditional text-based ones, not anything like these new number-based ones. Are they rolling this out slowly?

-----


It makes the slogan pretty disingenuous.

It still says "Stop spam. Read books." But the book-reading part is over, and now it's "Stop spam. Do Google's work for them."

-----



