
Leaked UI A/B Tests from Major Websites - levitate
https://goodui.org/leaks/
======
londons_explore
In many of these cases, the "A/B test" may have been accidental.

Software rollouts are frequently done slowly, datacenter by datacenter, and
during that window some users see one version while others see another.

From the user's perspective it looks the same as an A/B test; the difference
is that nobody was looking at the results...

~~~
SahAssar
Almost any user-noticeable change (that is not a bugfix) is run as an A/B
test for a few weeks by my team, to verify that it has the intended impact.
I'd be surprised if there aren't teams at Google, Amazon, Netflix, and similar
organizations that work the same way.

------
cm2187
I don’t think that every case necessarily comes down to users rejecting the
change. I can’t believe that in the Netflix case users actually prefer having
to log in with two postbacks, one for the username and one for the password.

~~~
kevin_thibedeau
This is a necessary pattern for increased security.

~~~
MattGaiser
What’s the security difference with two instead of one?

~~~
kevin_thibedeau
It lets you use more sophisticated protocols like SRP that never send the
password, or even a password hash, over the wire. Splitting the login means
the server can look up the user's salt and SRP parameters once it has the
username, before the password step happens.

https://en.wikipedia.org/wiki/Secure_Remote_Password_protocol
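
To make the flow concrete, here is a toy sketch of the SRP-6a math from that
page in Python. This is deliberately insecure (tiny toy group, simplified x
derivation; a real implementation would use an RFC 5054 group and the exact
RFC 2945 hashing). The point is that only blinded group elements cross the
wire, never the password or a hash of it:

    import hashlib, secrets

    # Toy group parameters. ASSUMPTION: real SRP uses a large safe prime
    # from RFC 5054; this Mersenne prime is only for illustration.
    N = 2**127 - 1
    g = 3

    def H(*vals):
        """Hash values together into an int (toy version)."""
        h = hashlib.sha256()
        for v in vals:
            h.update(str(v).encode())
        return int(h.hexdigest(), 16)

    k = H(N, g)

    # Registration: client derives x from salt+password and sends only
    # the verifier v = g^x. The password never leaves the client.
    password = "hunter2"
    salt = secrets.randbits(64)
    x = H(salt, password)       # simplified; RFC 2945 uses H(s | H(I ":" P))
    v = pow(g, x, N)            # stored server-side

    # Login step 1: client sends username + A. This is why the server
    # wants the username first: to look up salt and v for the reply.
    a = secrets.randbits(64)
    A = pow(g, a, N)

    # Login step 2: server replies with salt and B.
    b = secrets.randbits(64)
    B = (k * v + pow(g, b, N)) % N

    u = H(A, B)

    # Both sides derive the same session key; no password material
    # ever crossed the wire.
    S_client = pow((B - k * pow(g, x, N)) % N, a + u * x, N)
    S_server = pow(A * pow(v, u, N) % N, b, N)
    assert S_client == S_server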

------
anotheryou
This would make a great quiz: guess the result of each test!

------
gwbas1c
"Leaked..."

How are these leaked? Did someone hack into something?

~~~
SahAssar
They probably mean "detected", as in users noticed something behaving
differently between devices or after clearing cookies.

It could also be done by looking for likely A/B flags in cookies/localStorage.
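
A crude sketch of that second approach in Python, with everything about the
target hypothetical (the URL, and the assumption that buckets show up as
cookies at all): make repeated first visits with fresh sessions and flag any
cookie whose value varies across them:

    import requests
    from collections import defaultdict

    # Hypothetical target; any page that assigns experiment cookies works.
    URL = "https://example.com/"

    seen = defaultdict(set)
    for _ in range(20):
        # A fresh session looks like a new user with no cookies, so the
        # site re-rolls whatever bucketing it does on first visit.
        s = requests.Session()
        s.get(URL, timeout=10)
        for name, value in s.cookies.items():
            seen[name].add(value)

    # Cookies taking several distinct values across fresh visitors are
    # candidate A/B bucket flags; constant ones are likely just config.
    for name, values in sorted(seen.items()):
        if len(values) > 1:
            print(f"possible experiment flag: {name} -> {sorted(values)[:5]}")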

------
valuearb
Where is the data on results? I looked at an Airbnb “experiment”: they moved
an action button above the fold (duh), but there are no details on how much
more effective the move was.

I am all for A/B testing, but the devil is in the details. You can get more
users tapping the purchase button by moving it somewhere they are more prone
to tap it accidentally. That doesn’t mean you get more purchases, or that the
move was a positive change.

~~~
gwern
I don't think they have results. It looks like they are regularly scraping
sites and looking for diffs across users. So you can say what the test was,
roughly how long it ran, and whether the change was kept or rejected, but you
have no way of knowing the quantitative results (aside from whatever
inferences you can make by estimating the tested _n_, assuming an efficient
testing procedure with optional stopping / bandits, and using the final
keep/reject decision to infer upper/lower bounds on the effect size).
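
As a rough illustration of the kind of bound you could put on it, assuming
(optimistically) a plain fixed-horizon two-proportion test rather than
whatever sequential procedure they actually run, and with made-up numbers for
_n_ and the baseline rate:

    from statistics import NormalDist

    def min_detectable_lift(n_per_arm, base_rate, alpha=0.05, power=0.8):
        """Smallest absolute lift a two-proportion z-test with n users
        per arm can reliably detect (normal approximation)."""
        z = NormalDist()
        z_a = z.inv_cdf(1 - alpha / 2)   # two-sided significance
        z_b = z.inv_cdf(power)           # desired power
        se = (2 * base_rate * (1 - base_rate)) ** 0.5
        return (z_a + z_b) * se / n_per_arm ** 0.5

    # E.g., guessing a test ran on ~100k users per arm against a 5%
    # baseline conversion rate:
    mde = min_detectable_lift(100_000, 0.05)
    print(f"~{mde:.4f} absolute, i.e. {mde / 0.05:.1%} relative lift")

A kept change presumably cleared something like that minimum detectable lift
and a rejected one presumably didn't, which is about as far as the inference
goes.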

