I find it difficult to believe the captchas served over the last year or so were actually scanned from books: they were completely illegible and nonsensical. I usually had to click refresh about half a dozen times before I found a sample I could read correctly.
Half of the captcha (the illegible nonsense) is the actual test. The other half is usually easy to read; that's the scan from the book. You can actually answer anything for that part and still pass the test, although obviously if you do, you're not helping digitize books.
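To make that concrete, here is a rough Python sketch of the two-word check as described above. It is only an illustration under assumptions: `check_captcha`, its parameters, and the `votes` counter are hypothetical names, and this is not reCAPTCHA's actual code or API. The pass/fail decision looks only at the test half; whatever was typed for the other half is just recorded as a guess.

```python
from collections import Counter


def check_captcha(expected_control, typed_control, typed_unknown, votes):
    """Pass or fail on the control (test) word only.

    The answer for the other word never affects the result; it is only
    counted as one vote for what that word might say. All names here are
    hypothetical, chosen for illustration.
    """
    passed = typed_control.strip().lower() == expected_control.strip().lower()
    if passed:
        # Only count guesses from users who got the test word right.
        votes[typed_unknown.strip().lower()] += 1
    return passed


votes = Counter()
# The test word matches, so this passes regardless of the second answer.
result = check_captcha("morning", "Morning ", "anything at all", votes)
```

In a scheme like this, typing junk for the non-test word still passes, but it adds a bad vote for that word, which is why doing so doesn't help with digitizing anything.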
I figured out a while ago that you only ever need to type the nonsensical string.
I think it's pretty clear the book-reading bit was abandoned long ago. I never get non-test words that would be in any way a struggle for a competent OCR system. And on the occasion that I do, it's impossible for me to read either. If they provided context it would be much more helpful.
As an aside, if you've ever had to solve one of these through Tor and you happen to be routed through some Eastern European countries... good god, those are the most frustrating captchas I've ever seen. Long strings of "mnnmrnrmnm" with contrasting colors and JPEG artifacts... a few attempts at solving those make me want to kill someone. I feel bad for people trying to do anything on the internet from those countries. I wonder what the rationale is for making captchas nearly impossible to solve in specific regions.
The house numbers are likely from Google Street View, and they are probably being used to improve addressing in Google Maps.
I'm a bit puzzled by this update, though. I have reCAPTCHA on a wiki that I maintain, and I still see the traditional text-based ones, not anything like these new number-based ones. Are they rolling this out slowly?