"All models are wrong, but some are useful" (generally attributed to the statistician George Box).
A belief, or a system of beliefs, is but a model. It's virtually guaranteed to be wrong. It also may very well serve the important function of being simple enough to handle in-core, while at the same time being close enough to substitute for the real thing.
I would go a step further and say all formal models are proven to be wrong. After all, that's what Gödel and Turing kept going on about.
There is no general procedure for proving whether an arbitrary non-trivial program halts or doesn't. In fact, we can't (or don't) prove much of anything about the programs we actually run.
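(For the skeptical: the classic diagonal argument fits in a few lines of, say, Python. The `halts` oracle below is hypothetical, which is exactly the point.)

```python
def halts(program, argument):
    """Hypothetical oracle: returns True iff program(argument) halts."""
    ...  # cannot actually be implemented

def paradox(program):
    # Do the opposite of whatever the oracle predicts about
    # running the program on itself.
    if halts(program, program):
        while True:
            pass  # loop forever
    else:
        return  # halt immediately

# Consider paradox(paradox): if the oracle says it halts, it loops;
# if it says it loops, it halts. Either way the oracle is wrong,
# so no such oracle can exist.
```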
All programs are a collection of assumptions. To bring this back to the topic at hand, if all of our search assumptions are useful to some meaningful number of people then it really doesn't matter how many "falsehoods" we trip over. Those falsehoods fall away, becoming mere insignificant edge-cases. Satisfying all people all the time in all cases is a fool's errand.
Articles like this are good at letting you know your blindspots so you can choose your blindspots rather than succumb to them. But don't let it become dogma.
Your point certainly holds true for any physical entity as far as we know - probabilistic quantum effects, Heisenberg's uncertainty principle, chaotic systems, and all that.
However, if you are modelling a theoretical entity, and given a few more constraints (like strict computability, which precludes Turing-complete systems), you can indeed have correct models. Alas, such cases are rather rare in practice.
On a related note, a hell of a lot of strife in the world seems to boil down to people insisting that their preferred taxonomy is the correct one, no matter what the context, rather than accepting that taxonomies aren't facts in the first place, they are tools.
On which note, the answer to a list like this isn't necessarily "memorize it and avoid all these problems". The benefit can simply be in making these tradeoffs consciously, so you can judge your model better.
If you're Google, differentiating 'or' as in either from 'OR' as in Oregon is a task you need to take on. But if you're writing a National Park lookup tool, you probably just don't want to worry about that case. Even then it's still worth knowing about; you might save users some time by at least showing clearly how you reinterpreted their input.
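A minimal sketch of that "show how you reinterpreted the input" idea (the function, the set of look-alike words, and the message wording are all made up for illustration):

```python
# Hypothetical sketch: treat "OR" as a plain word, but tell the user so.
RESERVED_LOOKALIKES = {"OR", "AND", "NOT"}

def interpret_query(raw_query: str) -> tuple[list[str], list[str]]:
    """Return (search terms, notes explaining any reinterpretation)."""
    terms = raw_query.split()
    notes = []
    for term in terms:
        if term in RESERVED_LOOKALIKES:
            notes.append(
                f'Treating "{term}" as an ordinary word, not an operator.'
            )
    return terms, notes

terms, notes = interpret_query("parks in OR")
# terms -> ['parks', 'in', 'OR']
# notes -> ['Treating "OR" as an ordinary word, not an operator.']
```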
>The benefit can simply be in making these tradeoffs consciously, so you can judge your model better.
Very much so; engineering is all about choosing the trade-offs, and hopefully improving on them in the future. The list also helps with the unknown-unknowns problem with regard to what customer expectations may be; it can even surface whole new domains of expectations (like immediacy of updates, or handling of accented/non-English characters).
Side note:
As far as I can tell, Google got rid of the special-cased "OR" in the general search - right now it's a word, not a predefined/reserved symbol.
They were able to do so by adding an "implicit OR-like" operator between all the words in the query. Not quite an implicit OR, not quite an implicit AND; something a bit more complex in between.
The words of the query get weighted against matches on their own, as adjacent words (higher weight), and as whole phrases (yet higher weight). All in all, the problem got solved by an improved matching & sorting algorithm, not by somehow smartly detecting when "OR" is meant as "OR", or OR, or or.
The problem got solved in the match scoring/sorting domain, rather than in the query parsing domain.
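Roughly, the scoring idea might look like the sketch below. The weights and structure are illustrative placeholders, not Google's actual algorithm; the point is just that scattered words, adjacent pairs, and exact phrases all score, at increasing weights.

```python
# Illustrative scoring: individual words, adjacent pairs, and the whole
# phrase all contribute, with increasing (arbitrary placeholder) weights.
WORD_WEIGHT = 1.0
PAIR_WEIGHT = 3.0
PHRASE_WEIGHT = 10.0

def score(document: str, query: str) -> float:
    doc = document.lower()
    words = query.lower().split()

    # Each query word found anywhere in the document.
    total = sum(WORD_WEIGHT for w in words if w in doc)

    # Adjacent word pairs, e.g. "national park" appearing verbatim.
    for a, b in zip(words, words[1:]):
        if f"{a} {b}" in doc:
            total += PAIR_WEIGHT

    # The whole query phrase, matched verbatim.
    if " ".join(words) in doc:
        total += PHRASE_WEIGHT

    return total

# Documents containing the exact phrase outrank those with scattered words:
score("crater lake national park, oregon", "national park")       # 15.0
score("park your car near the national museum", "national park")  # 2.0
```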