To be honest, not much. We may use Pairwise seriously in the next round, but this is just a prototype.
In particular, we don't care much about the specific metrics this test tries to measure. The Pairwise guys chose those; we had nothing to do with it. When we use this for real, all we'll care about is one measure: how close one comes to the best founders. We don't care what atoms are in that molecule.
Even if we take it as a given that a test like this can work well, I suspect that test takers will become more sophisticated at choosing images to get desired outcomes.
And so the test will need to get more sophisticated in turn by finding even more non-obvious yet discriminating pairs of images.
This is sort of like the constant battle between those creating spam filters and spammers.
The difference between this and spam is that the cycle time is too long for people to learn efficiently how to beat the system. A spammer can write an email, send it to his Gmail account, and know in 30 seconds whether it beats the filter. We only accept applicants every 6 months.