A fun story from a friend at Spotify: one metric they tracked was the time between signup and first song play. Then one team decided to automatically start playing a song after signup. They knocked that OKR out of the park!
This is the result of someone not understanding why they wanted to measure a thing in the first place. I’m assuming the original intent was ~”sometimes people sign up for Spotify and don’t play anything for a long time, then cancel, possibly because they weren’t getting much value from it.” I’m guessing someone then decided “Therefore, time between signup and first play is a factor in retention”, which probably isn’t even wrong, but is likely not something to optimize directly, and certainly not something that needs to be optimized to the nth degree. In other words, someone taking 15 minutes between signup and first play probably isn’t retained any more effectively than someone taking 15 seconds. I don’t know the exact term for this concept, but overall engagement seems to be what they were really trying to capture, and somehow the plot was lost, a project was concocted, and auto-play-after-signup was implemented. OKR optimized, but the actual effect on retention is likely zero. A bad OKR, and probably indicative of a culture where no one was asking “why” enough or challenging product/marketing initiatives.
I’m making a lot of assumptions here, but I’ve seen this before in a lot of projects where people get worked into a froth optimizing something that will provide no real value to users.
It's not common sense: contesting your priors and verifying your assumptions is one of the hardest and most important parts of doing data-driven science.
It's also not surprising that when you take a set of random people without science training, they'll just cargo-cult the most visible parts and forget about the hidden, essential ones. Nor should it surprise anybody which parts they forget, since the cargo-cult speech is literally about this exact problem, but with trained scientists (did I say it was hard?) instead of random people.
Good science > intuition/experience/best practices etc > bad science.
I suspect we agree that a lot of product development is based on bad science. Yes, doing good science would be best, but let's at least stop doing bad science.
Oh, the problem is that it's not as simple as intuition/experience/etc being better than bad science. Often enough your intuition is just wrong, and bad science is just right.
A better approach is to attach confidence values to your intuitions, and to adjust the required rigor of the science based on how confident you are in the priors you're trying to verify. But then, this is hard too.
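Roughly, that might look like the following sketch, assuming a simple Bayesian framing where intuition is encoded as a Beta prior over a conversion rate and the prior's strength determines how convincing a given experiment is. All names and numbers here are illustrative, not from any real system:

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_b_beats_a(a_prior, b_prior, a_data, b_data, samples=100_000):
        """Posterior probability that variant B's conversion rate exceeds A's.

        a_prior/b_prior: (alpha, beta) pseudo-counts expressing prior confidence.
        a_data/b_data:   (successes, failures) observed in the experiment.
        """
        a_post = rng.beta(a_prior[0] + a_data[0], a_prior[1] + a_data[1], samples)
        b_post = rng.beta(b_prior[0] + b_data[0], b_prior[1] + b_data[1], samples)
        return float((b_post > a_post).mean())

    # Intuition says both variants convert at roughly 5%; encode that either as
    # a strong prior (1,000 pseudo-observations) or as an agnostic Beta(1, 1).
    confident = (50, 950)
    agnostic = (1, 1)

    experiment_a = (5, 95)    # successes, failures observed for the control
    experiment_b = (10, 90)   # the new variant looks better in a small sample

    # Against confident priors the small experiment shifts belief only modestly...
    print(prob_b_beats_a(confident, confident, experiment_a, experiment_b))
    # ...while with agnostic priors the same data looks far more decisive.
    print(prob_b_beats_a(agnostic, agnostic, experiment_a, experiment_b))

The point of the sketch is only that the same data should move you less when your prior is strong, which is one way to formalize "require better science against confident intuition".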
The real problem here is that, as you complained, those people just aren't competent enough at the work they're doing. I guess my point is only that this shouldn't be surprising, as it's a hard job.
I can’t find the article, but I read a piece years ago about Google algorithmically trying to optimize a sign-up link/button, with the algorithm in a feedback loop with the A/B testing. It talked about finding the perfect, to-the-pixel placement for a button.
That was when I knew design was dead in practice at Google. There are so many other under-optimized parts of the experience that I have no idea how “if I could only find the perfect position on the screen for a button” became the question someone was willing to throw that level of engineering at. It’s missing the forest for the trees x 10^100.
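The feedback loop being described is essentially a bandit. Something like the hypothetical epsilon-greedy sketch below keeps steering traffic toward whichever pixel placement currently shows the best click-through rate; it illustrates the general technique, not Google's actual system, and every name and number is made up:

    import random

    candidate_offsets = [(x, 40) for x in range(300, 341, 10)]  # to-the-pixel variants
    clicks = {pos: 0 for pos in candidate_offsets}
    impressions = {pos: 0 for pos in candidate_offsets}

    def choose_position(epsilon=0.1):
        """Mostly exploit the placement with the best observed click-through
        rate, occasionally explore another one."""
        untried = [p for p in candidate_offsets if impressions[p] == 0]
        if untried:
            return random.choice(untried)
        if random.random() < epsilon:
            return random.choice(candidate_offsets)
        return max(candidate_offsets, key=lambda p: clicks[p] / impressions[p])

    def record_outcome(position, clicked):
        """Feed each A/B outcome straight back into the next allocation decision."""
        impressions[position] += 1
        clicks[position] += int(clicked)

    # Simulated users whose true click rate differs only marginally by position.
    true_rate = {pos: 0.030 + 0.001 * i for i, pos in enumerate(candidate_offsets)}
    for _ in range(50_000):
        pos = choose_position()
        record_outcome(pos, random.random() < true_rate[pos])

    best = max(candidate_offsets, key=lambda p: clicks[p] / impressions[p])
    print(best, clicks[best] / impressions[best])

Notice how little is at stake in the simulation: the loop will eventually converge on a "winning" position whose true rate is a tenth of a percentage point better than the rest.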
The problem with this sort of decision-making is that it ignores context and is liable to optimise the one metric under study at the expense of more important things (like user trust and retention). It also biases towards small, easily measured changes evaluated in isolation, which encourages blind decisions made without coordination.
For example, a certain colour might mislead users into clicking more by making a link look like the default purple visited-link colour. Clicks go up, but user satisfaction may not.
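One common mitigation, sketched below with made-up metric names and thresholds, is to pair the metric being optimised with guardrail metrics and refuse to ship a change when the guardrails regress:

    from dataclasses import dataclass

    @dataclass
    class ExperimentResult:
        clicks_lift: float          # relative change vs control, e.g. +0.08 = +8%
        retention_lift: float
        complaint_rate_lift: float

    def safe_to_ship(result: ExperimentResult,
                     min_clicks_lift: float = 0.02,
                     max_retention_drop: float = -0.005,
                     max_complaint_rise: float = 0.01) -> bool:
        """Ship only if the target metric improves AND the guardrails hold up."""
        return (result.clicks_lift >= min_clicks_lift
                and result.retention_lift >= max_retention_drop
                and result.complaint_rate_lift <= max_complaint_rise)

    # Clicks up, retention and complaints worse: the guardrails block the launch.
    print(safe_to_ship(ExperimentResult(clicks_lift=0.08,
                                        retention_lift=-0.02,
                                        complaint_rate_lift=0.03)))  # False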
The actual motivation or causation doesn't matter. All that matters is that some team or product lead can justify a decision or a promotion, or even just an ideology, using the data in some way.