
Ask HN: How does Google get data from ReCAPTCHA yet it knows if I'm wrong? - aerovistae
If it asks me to label all street signs, the common understanding is that I'm labeling data for their ML algorithms.

But if they're counting on me to label it, then how does it already know if I miss a tile? It seems to already have the data labeled, so what's the point?
======
nostrademons
IIUC they mix known data in with unknown data.

If you get a lot of the known data wrong, they throw out your answer and make
you answer the CAPTCHA again. If you get the known data right, they assume
that your answers on the unknown data are correct and use that to label them,
then start adding them in as the known data for other people. They might also
show the same images to multiple people and see if they give the same answers:
if so, it's probably correct and can be labeled; if not, they throw the answers
out and make people answer the CAPTCHA again.
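
A rough sketch of the idea above, in Python. Everything here is made up for
illustration (the tile ids, `KNOWN_LABELS`, `grade_response`, the 0.9
threshold); it is not how reCAPTCHA is actually implemented, just one way the
"judge on knowns, collect on unknowns" scheme could look.

```python
from collections import defaultdict

# Hypothetical ground truth: tile id -> True if the tile contains a street sign.
KNOWN_LABELS = {"t1": True, "t2": False, "t3": True}

# Answers collected so far for tiles whose label is not yet known.
unknown_votes = defaultdict(list)


def grade_response(selected, shown_tiles, min_accuracy=0.9):
    """Judge a user on the known tiles only.

    If they pass, record their answers for the unknown tiles as votes.
    `min_accuracy` is an assumed threshold, not a documented value.
    """
    known = [t for t in shown_tiles if t in KNOWN_LABELS]
    if not known:
        return False  # nothing to judge against, so do not trust the answer

    # A tile is answered correctly when "user selected it" matches the label.
    correct = sum((t in selected) == KNOWN_LABELS[t] for t in known)
    passed = correct / len(known) >= min_accuracy

    if passed:
        for t in shown_tiles:
            if t not in KNOWN_LABELS:
                unknown_votes[t].append(t in selected)
    return passed
```

A user who gets the known tiles right contributes votes for `u1` and `u2`; a
user who gets them wrong is rejected and contributes nothing.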

------
cimmanom
It mixes knowns and unknowns. It only judges you based on the knowns.

They probably require some minimum level of consensus to determine knowns as
well.

For instance NYPL’s “building inspector” crowdsourcing program accepts a piece
of data as verified if it’s been checked by 3 or more participants and at
least 75% of them are in agreement. If 75% hasn’t been reached, it will keep
showing it to more participants until it is.
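
The rule described above (3 or more checks, at least 75% agreeing) can be
sketched in a few lines of Python. The function name `consensus` and its
parameters are my own; this is an illustration of the rule as stated here, not
NYPL's code.

```python
from collections import Counter


def consensus(votes, min_votes=3, min_agreement=0.75):
    """Return the agreed answer, or None if more participants are needed."""
    if len(votes) < min_votes:
        return None  # not enough participants have checked it yet

    # Most common answer and how many participants gave it.
    answer, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return answer
    return None  # no 75% majority yet, keep showing it to more people
```

With votes `["yes", "yes", "no"]` only 2 of 3 agree (about 67%), so the item
stays unverified; one more "yes" brings it to 3 of 4 (75%) and it is accepted.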

------
dangerface
When they still did the text captcha, they asked for two words: the first was
known and the second unknown. You could opt out of their data collection by
just giving the first word.

For the images I assume it's something similar.

------
sfcguyus
It's sampled together with other people's answers. If all of them provided the
wrong information, you could label something else as a traffic light or whatever.

