A fun story from a friend at Spotify: one metric they tracked was the time between signup and first song play. Then one team decided to automatically start playing a song after signup. They knocked that OKR out of the park!
This is the result of someone not understanding why they wanted to measure a thing in the first place. I’m assuming the original intent was ~”sometimes people sign up for Spotify and don’t play anything for a long time, then cancel, possibly because they weren’t getting much value from it.” I’m guessing someone then decided “Therefore, time between signup and first play is a factor in retention”, which probably isn’t even wrong, but is likely not something to optimize directly, and certainly not something that needs to be optimized to the nth degree. In other words, someone taking 15 minutes between signup and first play probably isn’t retained any more effectively than someone taking 15 seconds. I don’t know the exact term for this concept, but overall engagement seems to be what they were really trying to capture, and somehow the plot was lost, a project was concocted, and auto-play-after-signup was implemented. OKR optimized, but the actual effect on retention is likely zero. A bad OKR, and probably indicative of a culture where no one was asking “why” enough or challenging product/marketing initiatives.
I’m making a lot of assumptions here, but I’ve seen this before in a lot of projects where people get worked into a froth optimizing something that will provide no real value to users.
It's not common sense: contesting your priors and verifying your assumptions is one of the hardest and most important parts of doing data-driven science.
It's also not surprising that when you take a set of random people without science training, they'll just cargo-cult the most visible parts and forget about the hidden, essential ones. Nor should it surprise anybody which parts they forget, since the cargo-cult speech is literally about this exact problem, but with trained scientists (did I say it was hard?) instead of random people.
Good science > intuition/experience/best practices etc > bad science.
I suspect we agree that a lot of product development is based on bad science. Yes, doing good science would be best, but let's at least stop doing bad science.
Oh, the problem is that it's not as simple as intuition/experience/etc being better than bad science. Often enough your intuition is just wrong, and bad science is just right.
A better approach is to attach confidence values to your intuitions, and to adjust the required rigor of the science based on how confident you are in the priors you're trying to verify. But then, this is hard too.
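Roughly, that might look like the following sketch, assuming a simple Bayesian framing where intuition is encoded as a Beta prior over a conversion rate and the prior's strength determines how convincing a given experiment is. All names and numbers here are illustrative, not from any real system:

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_b_beats_a(a_prior, b_prior, a_data, b_data, samples=100_000):
        """Posterior probability that variant B's conversion rate exceeds A's.

        a_prior/b_prior: (alpha, beta) pseudo-counts expressing prior confidence.
        a_data/b_data:   (successes, failures) observed in the experiment.
        """
        a_post = rng.beta(a_prior[0] + a_data[0], a_prior[1] + a_data[1], samples)
        b_post = rng.beta(b_prior[0] + b_data[0], b_prior[1] + b_data[1], samples)
        return float((b_post > a_post).mean())

    # Intuition says both variants convert at roughly 5%; encode that either as
    # a strong prior (1,000 pseudo-observations) or as an agnostic Beta(1, 1).
    confident = (50, 950)
    agnostic = (1, 1)

    experiment_a = (5, 95)    # successes, failures observed for the control
    experiment_b = (10, 90)   # the new variant looks better in a small sample

    # Against confident priors the small experiment shifts belief only modestly...
    print(prob_b_beats_a(confident, confident, experiment_a, experiment_b))
    # ...while with agnostic priors the same data looks far more decisive.
    print(prob_b_beats_a(agnostic, agnostic, experiment_a, experiment_b))

The point of the sketch is only that the same data should move you less when your prior is strong, which is one way to formalize "require better science against confident intuition".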
The real problem here is that, as you complained, those people just aren't competent enough at the work they're doing. I guess my point is only that this shouldn't be surprising, as it's a hard job.
I can’t find the article, but I read a piece years ago about Google algorithmically trying to optimize a sign-up link/button, with the algorithm in a feedback loop with the A/B testing. It talked about finding the perfect, to-the-pixel placement for a button.
That was when I knew design was dead in practice at Google. There are so many other under-optimized parts of the experience that I have no idea how “if I could only find the perfect position on the screen for a button” became the question someone was willing to throw that level of engineering at. It’s missing the forest for the trees x 10^100.
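The feedback loop being described is essentially a bandit. Something like the hypothetical epsilon-greedy sketch below keeps steering traffic toward whichever pixel placement currently shows the best click-through rate; it illustrates the general technique, not Google's actual system, and every name and number is made up:

    import random

    candidate_offsets = [(x, 40) for x in range(300, 341, 10)]  # to-the-pixel variants
    clicks = {pos: 0 for pos in candidate_offsets}
    impressions = {pos: 0 for pos in candidate_offsets}

    def choose_position(epsilon=0.1):
        """Mostly exploit the placement with the best observed click-through
        rate, occasionally explore another one."""
        untried = [p for p in candidate_offsets if impressions[p] == 0]
        if untried:
            return random.choice(untried)
        if random.random() < epsilon:
            return random.choice(candidate_offsets)
        return max(candidate_offsets, key=lambda p: clicks[p] / impressions[p])

    def record_outcome(position, clicked):
        """Feed each A/B outcome straight back into the next allocation decision."""
        impressions[position] += 1
        clicks[position] += int(clicked)

    # Simulated users whose true click rate differs only marginally by position.
    true_rate = {pos: 0.030 + 0.001 * i for i, pos in enumerate(candidate_offsets)}
    for _ in range(50_000):
        pos = choose_position()
        record_outcome(pos, random.random() < true_rate[pos])

    best = max(candidate_offsets, key=lambda p: clicks[p] / impressions[p])
    print(best, clicks[best] / impressions[best])

Notice how little is at stake in the simulation: the loop will eventually converge on a "winning" position whose true rate is a tenth of a percentage point better than the rest.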
The problem with this sort of decision-making is that it ignores context and is liable to optimise the one metric under study at the expense of more important things (like user trust and retention). It also biases towards small, easily measured changes evaluated in isolation, which encourages blind decisions made without coordination.
For example, a certain colour might mislead users into clicking more by making a link look like the default purple visited-link colour. Clicks go up, but user satisfaction may not.
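One common mitigation, sketched below with made-up metric names and thresholds, is to pair the metric being optimised with guardrail metrics and refuse to ship a change when the guardrails regress:

    from dataclasses import dataclass

    @dataclass
    class ExperimentResult:
        clicks_lift: float          # relative change vs control, e.g. +0.08 = +8%
        retention_lift: float
        complaint_rate_lift: float

    def safe_to_ship(result: ExperimentResult,
                     min_clicks_lift: float = 0.02,
                     max_retention_drop: float = -0.005,
                     max_complaint_rise: float = 0.01) -> bool:
        """Ship only if the target metric improves AND the guardrails hold up."""
        return (result.clicks_lift >= min_clicks_lift
                and result.retention_lift >= max_retention_drop
                and result.complaint_rate_lift <= max_complaint_rise)

    # Clicks up, retention and complaints worse: the guardrails block the launch.
    print(safe_to_ship(ExperimentResult(clicks_lift=0.08,
                                        retention_lift=-0.02,
                                        complaint_rate_lift=0.03)))  # False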
The actual motivation or causation doesn't matter. All that matters is that some team or product lead can justify a decision or a promotion, or even just an ideology, using the data in some way.