
Falsehoods Programmers Believe About Search - binarymax
https://opensourceconnections.com/blog/2019/05/29/falsehoods-programmers-believe-about-search/
======
bryanrasmussen
I've implemented search engines for small to relatively large organizations.
Even at the companies where nobody knew anything about how search works,
hardly any of these falsehoods were believed.

Also, this doesn't work well as a "Falsehoods Programmers Believe" subject,
because those lists are not about technologies but about non-technological
things that commonly need to be handled in programs - hence "Falsehoods
programmers believe" about:

Names, Phone Numbers (sort of technical but it's not falsehoods about how
phones work, but rather about how phone numbers are structured and what they
'mean'), Credit Cards, Addresses

Good possible future Falsehoods programmers believe about:

Sleep patterns, Personal identifiers, Genders

In fact I am currently dealing with a falsehoods programmers believe about
versioning of laws and standards at work.

~~~
blowski
Eric Myers needs to write an article called “Falsehoods developers believe
about writing falsehoods developers believe articles”.

~~~
rzzzt
"Falsehoods programmers believe considered harmful"

~~~
EmilStenstrom
Falsehoods programmers... you wouldn't believe what comes next?!!

------
ashelmire
Falsehoods Google, Microsoft, JIRA, and others seem to believe about search:

That, when searching for a string, I don't want exact matches to appear in the
results.

If your search ever DOESN'T return exact matches (barring common misspelling
correction), you're doing something seriously wrong.

~~~
VikingCoder
I'm not sure I understand you.

If I search for "restaurants", I want search results that ARE restaurants, not
search results which have the word "restaurants" in them.

What do you want to have happen?

~~~
OJFord
It depends what you're looking for. If it's an error message, you probably
want that exact string, not results that are errors but don't include the
string.

It used to be that searching literally "restaurants" (i.e. with quotes) would
search for an exact match (particularly useful for multi-word searches in
those days), but no more. It's taken as a 'hint' or something, I believe, but
not an absolute instruction.

~~~
joseluis
And this is how the AI takeover began.

------
nickjj
There's also:

That we want well known standards like CTRL + F in a browser to be hijacked
and replaced by default with a custom search experience that's a lot worse
than a browser's search.

Try CTRL + F'ing on Stripe's documentation:
[https://stripe.com/docs/api/plans](https://stripe.com/docs/api/plans)

~~~
hypervis0r
On Chrome, you can hit CTRL + G, which does the same as CTRL + F, but is not
hookable by web sites

~~~
jakub_g
Thanks for that, I didn't know it! It seems that F3 also works, and in fact
CTRL+G is an alias of F3, and both work in Firefox as well.

The only issue is that in Firefox, it is only equivalent for the first search;
once you close the bottom bar, subsequent F3/CTRL-G just do "find next
occurrence" and do not display the bar anymore. Chrome always displays the
search input on the other hand.

Edit: since we're talking shortcuts, in Firefox ' (apostrophe) is like CTRL-F but
searches only hyperlinks (and you can cycle through in case of multiple
matches with F3/CTRL-G) which is _extremely useful_ for quickly navigating
pages via keyboard only.

------
saalweachter
* When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon

One of my favorite sets of local search bugs involve interpreting "near me" as
"near maine".

~~~
dexen
Trying to fix every single problem in the search module/layer/service is an
anti-pattern in and of itself.

There's an anecdote[1] from early days of Google Search where a certain domain
was ranking 1st for an unrelated query (i.e., a false positive). The managers
refused to move ahead before that got fixed, but the bug/edge case proved a
head scratcher for several weeks on end.

Lastly one of the engineers solved the problem - by buying the domain and
taking it offline.

Point being, if you can fix the problem outside of the code domain, do just
that.

[1] sadly can't seem to find it - mostly getting spam articles related to SEO

~~~
emiliobumachar
I'm pretty sure I've read a similar story in the book "I'm Feeling Lucky". It
goes like this:

In the early days of Froogle, a shopping search engine made by Google,
searching for "sneakers" always yielded a garden gnome wearing sneakers, one
unit on sale, as the top result. This was considered bad, as someone searching
for "sneakers" probably wanted to buy sneakers, not garden gnomes. The whole
team tried to fix it, but they didn't want to just hardcode an exception. It
eluded them for a while. Finally, it was not there anymore. They asked around
for who had solved it, no one answered. Finally, one colleague arrived late -
and placed the gnome on their desk.

~~~
hiharryhere
"Buy the Gnome" should become a saying, like "eat the frog".

------
binarymax
Howdy. Author here. Really cool to see so much good discussion on this. I want
to turn several of them into blog posts on their own with
explanations/stories/what-have-you. Taking votes for what you'd like to see
first. For the record, my fave is "Languages don’t change".

~~~
dexen
Thank you for the thorough and practical write-up.

About the only thing I would add to it is i18n concerns.

A few quick ones off of the top of my head:

      - Words are separated by whitespace or dashes.
      - Customers only ever enter ASCII.
      - Customers only ever enter accented characters with/without accents.
      - A "Unicode-capable" system will happily take in any valid unicode.
      - A "Unicode-capable" system will pass through any valid unicode undisturbed.
      - Software systems perform Unicode normalization.
      - WinNT API is UTF-16.
      - There is 1-to-1 mapping between uppercase and lowercase.
      - Unicode collation algorithm is optimal for every single language.
      - Unicode collation algorithm is optimal for multi-language document sets.
      - Distinguishing/coalescing plural and singular forms of words is easy.
      - There are separate plural/singular forms of words.
      - Words have stem and optional suffixes, but not prefixes.
      - Soundex etc. works for every language.
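
A couple of these are easy to check from a Python prompt. The following is a minimal sketch using only the standard library; the sample strings and the `fold` helper are illustrative assumptions, not something from the article.

```python
import unicodedata

# Case mapping is not 1-to-1: German sharp s uppercases to two letters,
# and lowercasing the result does not bring the original back.
sharp_s_upper = "ß".upper()              # "SS"
round_trip = "ß".upper().lower()         # "ss"

# The same visible text can be stored as different code point sequences.
composed = "caf\u00e9"                   # é as a single code point
decomposed = "cafe\u0301"                # e followed by a combining acute accent
equal_raw = composed == decomposed       # False: nothing normalized them for us
equal_nfc = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))  # True


def fold(text: str) -> str:
    """Hypothetical helper: case-fold, decompose, and drop combining marks."""
    decomposed_text = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed_text if not unicodedata.combining(ch))
```

With this, `fold("Café")` and `fold("cafe\u0301")` both give `"cafe"`. Stripping accents is itself a language-dependent choice, so a helper like this is a starting point for loose matching rather than a general answer.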

~~~
ProblemFactory
> There are separate plural/singular forms of words.

Or that there are just two plural/singular forms (1 and many) for translating
strings, or that which form to pick is clear.

While English has one form for 1, and one form for 0/many:

\- French pluralises 0 the same way as 1,

\- Czech has a form for exactly 2-4 items,

\- Irish has forms for exactly 3-6 and 7-10 items,

\- Polish has a form for all numbers that end in 2-4,

\- Russian has a form for all numbers that end in 1,

\- Arabic has forms for exactly 0 and 2 items, ending in 03-10, and many more.

A strings table will need at least 10+ variants if you want to translate
strings referring to number of items.
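
To make that concrete, here is a rough sketch of what picking a plural form looks like in code. The rules are simplified, integer-only versions of CLDR-style plural rules for four languages, written from memory, and the function name `plural_category` is made up for illustration; a real system should take its rules from CLDR or an i18n library instead.

```python
def plural_category(lang: str, n: int) -> str:
    """Return a CLDR-style plural category for a non-negative integer n.

    Simplified for illustration: integers only, four languages only.
    """
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "fr":
        # French treats 0 like 1.
        return "one" if n in (0, 1) else "other"
    if lang == "pl":
        if n == 1:
            return "one"
        # Numbers ending in 2-4, except those ending in 12-14.
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        return "many"
    if lang == "ru":
        # Numbers ending in 1, except those ending in 11.
        if n % 10 == 1 and n % 100 != 11:
            return "one"
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        return "many"
    raise ValueError(f"no plural rule for language {lang!r}")
```

For example, `plural_category("pl", 22)` is `"few"` while `plural_category("pl", 12)` is `"many"`, and `plural_category("ru", 21)` is `"one"`. The strings table then needs one translation per category per language, not one per "singular/plural".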

------
kccqzy
I think this article is setting up a pretty high bar for search. For small
datasets, you can very well just add an automatically generated "description"
column in your database, and then do a SQL LIKE query: it's a simple substring
matching.

It's by no means smart, doesn't handle misspellings or anything, but it works
reasonably fast and predictably. This is basically how almost every desktop
app with a search bar works. This is how word processors and editors work when
users search within the document.
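
For reference, the approach described here is roughly the following. This is a minimal sketch using Python's built-in `sqlite3`; the `items` table, its `description` column, and the sample rows are assumptions made up for the example.

```python
import sqlite3


def like_search(conn, term):
    """Substring search over items.description using SQL LIKE."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id, description FROM items "
        "WHERE description LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO items (description) VALUES (?)",
    [
        ("Red running sneakers",),
        ("Garden gnome wearing sneakers",),
        ("100% wool socks",),
    ],
)

hits = like_search(conn, "sneakers")      # rows 1 and 2
typo_hits = like_search(conn, "sneekers")  # empty list: no typo tolerance
```

The second call shows the limitation mentioned above: a single misspelled letter returns nothing, and results come back in table order rather than ranked by relevance.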

~~~
binarymax
Sorry but that’s going to result in a pretty horrible search experience. If
you are putting a search bar on your page and that’s your search backend - you
might as well skip search entirely because it’s only going to cause you and
your customers pain.

The difference with find on page is that it’s obvious and transparent what is
being searched and the expectations of the interface. Trust me when I say that
a search bar to a layperson on your site is them thinking “oooh I can google”

~~~
kccqzy
Sounds like you need to redesign your search bar on the page to convey to users
the expectation that this is a simple search.

~~~
gibrown
Do you have an example of what that simple search bar could look like?

~~~
kccqzy
I'm not a professional UI/UX designer but here's a guess. It should not be the
prominent thing on your page. Prominent search bars like the one on Google's
home page convey to users the idea that this is the primary way to navigate
and use this website, and are therefore loaded with higher expectations. So
don't make the search bar look prominent.

Next, make it context-specific. Don't put a search bar at the top of the page
suggesting that this bar can search for everything. If you use a simple
implementation like a SQL LIKE to implement search, put the search bar right
next to the thing that is displaying results from the table. Make it look like
it's filtering the table.

Finally, label the search bar using words like "Keywords," which also suggest
to users that they should be typing keywords instead of a more complicated
natural language phrase.

~~~
gibrown
Those are interesting ideas, thanks for the thoughts. FWIW I’ve seen users try
to use even non-prominent search boxes like those as if they can do more than
SQL LIKE. Most users have no idea how any of this works and just want answers.

Mostly I think this whole thread demonstrates the point of the original
article, but I appreciate your response.

------
afturner
This is both awesome and so so discouraging. Does anyone have some direction
on how to produce good search systems??

~~~
softwaredoug
Focus on measuring search quality and methodology first. Be a scientist. Great
search teams obsess about methodology. Treat everything you try as a
hypothesis, not guaranteed to work. Create a feedback loop that improves the
pace of experimentation.

Other than that, the solution space is just as wide open as regular
programming. It's just in many ways more frustrating because nobody knows what
they really want from search, they just "know it when they see it" and no two
users really can agree on what a good result is! :)
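
One way to start "measuring" is a small offline metric over hand-judged queries. The sketch below computes precision@k; the function names, the query strings, and the document ids are all invented for illustration, and precision@k is only one of several common metrics, not necessarily the one any particular team uses.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that a human judge marked relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in results[:k] if doc_id in relevant)
    return hits / k


def mean_precision_at_k(runs, judgments, k):
    """Average precision@k over every judged query."""
    scores = [precision_at_k(runs[query], judgments[query], k) for query in judgments]
    return sum(scores) / len(scores)


# Hypothetical judgments: which documents are relevant for each query.
judgments = {"sneakers": {"d1", "d3"}, "garden gnome": {"d7"}}

# Ranked results from the current system and from an experimental change.
baseline = {"sneakers": ["d7", "d1", "d2"], "garden gnome": ["d7", "d8", "d9"]}
candidate = {"sneakers": ["d1", "d3", "d7"], "garden gnome": ["d7", "d1", "d2"]}

baseline_score = mean_precision_at_k(baseline, judgments, k=3)
candidate_score = mean_precision_at_k(candidate, judgments, k=3)
```

Each change to the search configuration becomes a hypothesis that either moves the number or does not, which is the feedback loop described above.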

~~~
mayank
This is a very, very insightful point. I would add: never expect a singular
"perfect" algorithm, but rather build a framework that lets you blend (and
evaluate/weight) the signals from various hacks, workarounds, heuristics, and
"proper" algorithms.

------
jimmaswell
Search engines work like databases - Too vague but arguably yes in the
abstract.

Search can be considered an additional feature just like any other - Yes? How
do you falsify this?

Search can be added as a well performing feature to your existing product
quickly - Yes if you're using a CMS with search already there like Drupal, or
you can use that thing where your search uses/directs to Google.

~~~
iforgotpassword
> Search can be added as a well performing feature to your existing product
> quickly - Yes if you're using a CMS with search already there like Drupal

Adding a feature by using a product that already has that feature is not
"adding a feature to a product". It's "doing nothing since there's nothing to
do". ;-)

Using Google search for pages might work for simple sites that mostly host
text content, but not for things like "find all foos that are between 20 and
30 kg".

~~~
jimmaswell
If it's just a few things like "find all foos that are between 20 and 30 kg"
then that might be nothing but building a simple query out of a few criteria.
Not all searches need to be or even aspire to be a super-general search like
Google. The ebay search probably isn't all that complicated (relatively) for
example. If you're trying to make another Google for some strange reason then
the article applies more.

------
perlgeek
* All customers may see the same data

God, how I hate that authorization woes find a way to make everything else 5x
more complicated.

~~~
Lowkeyloki
I wonder if that's aimed at permissions-based stuff or, like, search bubbling?

~~~
perlgeek
I'm talking about permission-based stuff.

------
isoskeles
On misspellings (since there are quite a few lines here dedicated to them), I
had the fun responsibility of learning / knowing too much about how our search
worked (we were/are using an old version of Solr), and started telling people
that there's a way to at least do _something_ about misspellings.

After conversations with two or three product managers, it became clear that
the best course of action was to do nothing at all. I'm definitely not an
expert on search or human behavior, and running through all the possible
interpretations of how to handle misspelled words and what the customer wants
was way more work than I was prepared to do.

I'll even point out that my initial suggestion was, "Let's just copy Google
and do, 'Did you mean to type _______?'" Even that was met with, "what if the
customers X" "what if the customers Y" etc. etc. Wasn't worth the time (at the
time).

~~~
aflag
You could call it related searches and only display the suggestion when all
words are either in the products catalog or in the dictionary, also checking
if the query returns something with a phrase search. That can help with typos
without ever being too weird.
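
A rough sketch of that idea, using `difflib` from the Python standard library for the fuzzy matching. The vocabulary, the `has_results` callback (standing in for a real phrase search), and the 0.8 similarity cutoff are all assumptions for illustration.

```python
import difflib


def suggest(query, vocabulary, has_results):
    """Return a corrected query, or None when no safe suggestion exists."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        if not close:
            return None  # unknown word with no near neighbour: stay silent
        corrected.append(close[0])
    suggestion = " ".join(corrected)
    # Only suggest something different that actually returns results.
    if suggestion == query.lower() or not has_results(suggestion):
        return None
    return suggestion


# Hypothetical vocabulary (catalog words plus dictionary) and phrase index.
vocabulary = ["garden", "gnome", "running", "sneakers"]
catalog_phrases = {"running sneakers", "garden gnome"}


def phrase_search_has_results(text):
    return text in catalog_phrases


fixed = suggest("runing sneekers", vocabulary, phrase_search_has_results)
```

Here `fixed` is `"running sneakers"`, while a query such as `"gardn sneakers"` yields no suggestion because the corrected phrase matches nothing in the catalog.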

------
reaperducer
A list of postulations without examples or explanations is not useful.

~~~
tempguy9999
Quite true!

Or even enough context to interpret:

> Search can be considered an additional feature just like any other

Is that a falsehood? - what does it even mean?

~~~
the_af
Almost nothing. I guarantee that for any non-trivial feature, you could just
say:

"<non-trivial feature F> can be considered an additional feature just like any
other"

And everyone will agree that's probably false. They could have written "search
is almost never a trivial feature, and you should take your time to consider
complications", but I suppose that wouldn't sound as cute as a "Falsehoods
Programmers Believe" list.

------
33degrees
Related to "Customers who know what they are looking for will search for it in
the way you expect", many people don't understand that a search engine works
by matching text strings (albeit in an often sophisticated way). They see it
as sending commands that the search engine understands, and will then find
results for...

~~~
jakear
I know VSCode had an issue where people would type whole sentences into the
settings search bar. They got around it by incorporating some of Bing’s NLP
logic. Goes to show, even amongst the “technically inclined” (those who not
only use VSCode, but also try to modify things in it), this still holds.

------
kazinator
Search interfaces should have a configuration for smart users:

      [ ] Disable fuzzy parsing hacks (reject my queries if they have bad syntax).
      [ ] Don't search for sound-alikes; assume I spelt everything rite.
      [ ] Respect the non-alphanumeric characters in my query, which I put there for a reason.

------
sethammons
> A customer using the same query twice expects the same results for both
> searches

Really, this is a falsehood? Like, I want the same query to give the same
results given the same dataset always. When do you not want that?

~~~
dexen
_> >> A customer using the same query twice expects the same results for both
searches_

Of course this is false; please consider:

\- customers expect to see in search results whatever new information they
added/updated in the system (this is related to "Customers don’t expect near
real time updates");

\- customers expect "personalized" search results; having built up a history
of searches centered around particular subjects (say, programming), you'll
expect much different results for "string" than the general population gets;

\- customers expect new/more results having logged in, or having gained new
permissions/roles;

\- customers running "knowledge" or "command" queries ("what is the weather?"
"password 16") expect varying results

~~~
greggyb
Or, for a short query string, the user may have a different intent without
realizing they've put the same query in.

I might dash off a search for "sneakers" when I am researching footwear. A
week later, I might be thinking about movies and enter the same query string,
expecting IMDB results.

------
dsego
That I actually want Sublime Text to stop responding for 5 mins while
searching for a single space character across my entire project.

------
astura
I'm just waiting for the inevitable article titled "Falsehoods Programmers
Believe Lists Considered Harmful."

------
ape4
Is everything we believe about everything wrong?!

~~~
dexen
"All models are wrong, but some are useful" (generally attributed to the
statistician George Box).

A belief, or a system of beliefs, is but a model. It's virtually guaranteed to
be wrong. It also may very well serve the important function of being _simple
enough_ to handle in-core, while at the same time being _close enough_ to
substitute for the real thing.

~~~
deckard1
> virtually guaranteed to be wrong

I would go a step further and say all formal models are _proven_ to be wrong.
After all, that's what Gödel and Turing kept going on about.

We can't prove any non-trivial program ever halts or does not halt. In fact,
we can't (or don't) prove much about our programs we run anywhere.

All programs are a collection of assumptions. To bring this back to the topic
at hand, if all of our search assumptions are _useful_ to some meaningful
number of people then it really doesn't matter how many "falsehoods" we trip
over. Those falsehoods fall away, becoming mere insignificant edge-cases.
Satisfying all people all the time in all cases is a fool's errand.

Articles like this are good at letting you know your blindspots so you can
_choose_ your blindspots rather than succumb to them. But don't let it become
dogma.

~~~
dexen
_> all formal models are proven to be wrong_

Your point certainly holds true for any _physical entity_ as far as we know -
probabilistic quantum effects, Heisenberg's Uncertainty, chaotic systems, and
all that.

However if you were to model a _theoretical entity_ , and given a few more
constraints (like strict computability, which precludes Turing-complete
systems), you can indeed have correct models. Alas, in practice this is a
rather rare example.

------
jillesvangurp
I've implemented a fair number of search engines. Usually the problems are with
non-technical people in a project. I've had to coach a fair number of product
owners and UX designers on the basics of search. There are two issues I tend
to have with them: 1) they avoid things that they think are hard that just
aren't 2) they are unaware of features that e.g. Elasticsearch would support
that are highly relevant to their project and therefore don't plan for using
those.

A UX person thinks of search as a text box "like google". However, a lot of
search UIs have a lot going on when you start typing and when you get results
back: ways to refine search results, "did you mean" (DYM) corrections,
breakdowns/aggregations, suggestions, etc. A lot of these features require
careful planning and design and are not necessarily easy to bolt on later if
you don't plan for them.

I've also had to do basic things like patiently explaining the difference
between sorting and ranking and humbly suggesting that, maybe, having a multi
column layout with sortable columns isn't necessarily the right thing for
presenting search results where the output is a list of stuff in order of
relevance.

Engineers are easier to deal with once you sit them down and talk them through
how stuff works.

------
Lowkeyloki
As with many of the other commenters here, I wonder how many programmers truly
believe these things. Maybe as recently as the 90s or 2000s. Maybe developers
who are fresh out of school.

But we've had search engines as a major part of our lives for about two
decades now. Most of us use one at least daily. We're familiar with the
complexities of search engines and how they differ from simply searching a
document for an exact string or even a regular expression. Many programmers
like me work with tools like analytics and log aggregators that expose the
complexities of search to us in a way that's more intimate than the veneers of
Google and Amazon.

Maybe I'm just lucky in that my experiences have dispelled these notions of
search being easy or simple. But I hope I'm not alone.

Also, there's a disparity between what search is and what your users expect.
Technically, I could make a really simplistic "search engine" that amounts to
a SQL LIKE query. It may not be good or what users might expect coming from
Google/Amazon/etc, but it would be a search engine. (Oops. Looks like my
pedant hat slipped back on when I wasn't looking.)

------
markbnj
I don't know, the first third of the list contained about ten things I don't
think any programmer believes about search, so I gave up at that point.

------
salutonmundo
_cough_ "setup" is a noun, "set up" is a verb </pedant>

------
jasonhansel
I would add to the list of falsehoods:

\- customers are always searching for a specific item, rather than an entire
category

\- customers know that a search engine for one kind of item (e.g. products for
sale) won't also search the entire rest of your website

------
billfruit
One major annoyance: it's hard to search for any topic related to C programming
online; one has to wade through mountains of results on C++ and C#.

------
isoskeles
> Choosing the correct search engine is easy and you will always be happy with
> your decision

I laughed, but I don't think this is a correct representation of something
many programmers genuinely believe. It's worded in such a way that it's clear
this is a joke. Not sure if I should read the full list if it's just going to
be jokes like this one.

~~~
amelius
How many genuinely unique search engines are there really to choose from? (Not
counting those based on the same underlying libraries)

~~~
binarymax
Quite a lot, actually:
[https://en.wikipedia.org/wiki/List_of_search_engines](https://en.wikipedia.org/wiki/List_of_search_engines)

------
burtonator
My favorite is "languages don't matter and I can just throw text in there"

------
jackconnor
"Once setup, search will work the same way forever" \- I don't know a single
programmer who believes this about any software.

------
rq1
> Regular Expressions have minimal performance impact

REs and FSMs are equivalent.

~~~
afturner
for real? Is RegEx actually a FSM behind the scenes? or are you trying to say
something else

~~~
dexen
Yes and no. The theoretical "regular expressions" are indeed Type-3 grammars
in Chomsky's hierarchy.

In practice, the common "RegEx" implementations implement a lot of extras, that
break the theoretical backing, and also exhibit highly non-linear behaviors.
Cf. this excellent paper by Russ Cox:
[https://swtch.com/~rsc/regexp/regexp1.html](https://swtch.com/~rsc/regexp/regexp1.html)
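
Both halves of that can be seen with Python's `re` module, which is a backtracking engine. This is a small illustrative sketch; the patterns and inputs are my own examples, not taken from the linked article.

```python
import re

# A backreference: "some a's, then b, then the SAME number of a's".
# That language is not regular, so no finite state machine recognises it,
# yet a typical "RegEx" engine accepts the pattern without complaint.
same_count = re.compile(r"(a+)b\1")
balanced = same_count.fullmatch("aabaa") is not None      # True
unbalanced = same_count.fullmatch("aabaaa") is not None   # False

# Nested quantifiers: on a near-miss input, a backtracking engine tries many
# different ways of splitting the run of a's before giving up.
nested = re.compile(r"(a+)+$")
near_miss = nested.match("aaaab")                          # None
```

The input above is kept deliberately short. With a backtracking engine, the time to reject inputs of this shape tends to grow very quickly as more `a` characters are added, which is the non-linear behavior mentioned above.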

~~~
rq1
Thank you for this reference!

------
rdgthree
__

~~~
afturner
Algolia looks good, but are there any OSS alternatives for those of us trying
to bootstrap a search system?

~~~
mftrhu
For a blog/static website, Tipue Search [1], or maybe Datasette [2]. There are
Pelican [3]/Jekyll [4] plugins for the former.

[1] [http://www.tipue.com/search/](http://www.tipue.com/search/)

[2] [https://24ways.org/2018/fast-autocomplete-search-for-your-
we...](https://24ways.org/2018/fast-autocomplete-search-for-your-website/)

[3] [https://github.com/getpelican/pelican-
plugins/tree/master/ti...](https://github.com/getpelican/pelican-
plugins/tree/master/tipue_search)

[4] [https://github.com/jekylltools/jekyll-tipue-
search](https://github.com/jekylltools/jekyll-tipue-search)

------
ummonk
Pretty much none of these are things programmers believe about search.

Putting limited effort into creating a mediocre search feature doesn't mean
that you believe these falsehoods; it just means that you're too resource
constrained to put serious investment into creating and improving a high
quality search feature.

