Falsehoods Programmers Believe About Search

bryanrasmussen · on May 29, 2019

I've implemented search engines for small to relatively large organizations. Even at the companies where nobody knew anything about how search works, hardly any of these falsehoods were believed.

Also, this doesn't work well as a "Falsehoods programmers believe" subject, because those lists are not about technologies but about non-technological things that programs commonly need to handle - hence Falsehoods programmers believe about:

Names, Phone Numbers (sort of technical but it's not falsehoods about how phones work, but rather about how phone numbers are structured and what they 'mean'), Credit Cards, Addresses

Good possible future Falsehoods programmers believe about:

Sleep patterns, Personal identifiers, Genders

In fact I am currently dealing with a falsehoods programmers believe about versioning of laws and standards at work.

blowski · on May 29, 2019

Eric Myers needs to write an article called “Falsehoods developers believe about writing falsehoods developers believe articles”.

rzzzt · on May 29, 2019

"Falsehoods programmers believe considered harmful"

EmilStenstrom · on May 30, 2019

Falsehoods programmers... you wouldn't believe what comes next?!!

blt · on May 30, 2019

'Why "Falsehoods programmers believe" is not my favorite genre of programming article'

kdeldycke · on May 31, 2019

Already tried to compile such a list, "Falsehoods Programmers Believe About Falsehoods Lists": https://kevin.deldycke.com/2016/12/falsehoods-programmers-be... :D

ineedasername · on May 29, 2019

Yeah, I can't imagine most if any developer believing, "You don’t need to monitor search queries, results, and clicks"

This is less what programmers believe, and more a "Things to keep in mind" sort of list. But I suppose the less accurate title is a bit more click-baity.

softwaredoug · on May 29, 2019

I don't know, some of the best ones are technical, there's apparently a Github Repo on "falsehoods" articles. With a section on software engineering even:

- https://github.com/kdeldycke/awesome-falsehood#software-engi...

bryanrasmussen · on May 30, 2019

I don't know, looking through those they seem to be more another example of the linked article - not classical Falsehoods programmers believe - for example this one https://pozorvlak.livejournal.com/174763.html Falsehoods programmers believe about build systems - quite a lot of them are really Implementation Decisions of Build Systems Do not cover every need.

smudgymcscmudge · on May 30, 2019

I’m not sure if this is parody.

ashelmire · on May 29, 2019

Falsehoods Google, Microsoft, JIRA, and others seem to believe about search:

That, when searching for a string, I don't want exact matches to appear in the results.

If your search ever DOESN'T return exact matches (barring common misspelling correction), you're doing something seriously wrong.
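A minimal sketch of one way to guarantee that behaviour, assuming the engine already hands back `(text, score)` pairs. The function name and the sample data are made up for illustration; real engines would do this inside the ranking step rather than as a post-sort.

```python
def rank_with_exact_first(query, scored_docs):
    """Re-order (text, score) pairs so exact-phrase matches come first.

    Within each group the engine's own score still decides the order.
    """
    needle = query.casefold()
    return sorted(
        scored_docs,
        # False sorts before True, so docs containing the phrase lead.
        key=lambda pair: (needle not in pair[0].casefold(), -pair[1]),
    )


results = rank_with_exact_first(
    "connection reset by peer",
    [
        ("Troubleshooting network resets", 0.9),
        ("Error: connection reset by peer", 0.4),
        ("Peer review of reset policies", 0.7),
    ],
)
```

Here the error-message document moves to the top even though the hypothetical engine scored it lowest.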

GordonS · on May 29, 2019

This drives me crazy - and it happens far too often!

Even worse, Google sometimes shows results that don't even contain text you've specified an exact match for with double quotes, e.g. "find me"

xorcist · on May 30, 2019

This is a misfeature that Google started using some years ago.

There is a tab-style link directly below the search box called "Tools" in the search results page. Once clicked it displays a few settings and one of them can be set to "Verbatim".

Choose that and your search terms will actually be what is searched for, as opposed to some arbitrary subset of it. I wish this was better documented.

ashelmire · on May 30, 2019

It looks like that resets with each search, but thanks for the tip.

gatherhunterer · on May 30, 2019

DuckDuckGo for search results, Google for “Google Search” results.

OJFord · on May 29, 2019

Amazon. As in the shop - I can do one search and get a result, even a 'featured' result, and then change it to something more generic that's an exact match for the _title_, sort in a way that will make me easily see it again, (e.g. by ascending price) and it not show (e.g. first result is more expensive than the product I just saw).

It's infuriating, because presumably I don't always happen to see the unknowingly hidden option.

ineedasername · on May 29, 2019

it seems like it used to be much easier to get Google to return exact matches. Just my subjective experience of course, but as accuracy for word-sense-disambiguation significantly improved it seems like Google has become much more comfortable returning what it believes are close or related matches. Overall it's probably better search, but I find myself having to put things in quotes more than I used to when looking for a very precise result.

djakjxnanjak · on May 29, 2019

Does Google get false positives (incorrectly detecting a mistake and “fixing” it?) or false negatives (failing to detect a mistake you make) more often? Is it possible that cognitive biases prevent you from giving Google credit when it fixes your mistakes? Is it possible Google has done extensive studies to find the preferred trade-off? Is it possible that your preferences are different than the average person because you’re an engineer?

Godel_unicode · on May 30, 2019

I once was curious about this so I kept track of my Google searches for a week, and it was overwhelmingly the case that what it returned was what I wanted but not what I asked vs the other way around.

amdsn · on May 29, 2019

Another frustrating thing with google search is when it translates (or attempts to translate) queries for you and then fails to show you any results in the language of what you typed in. Even with languages specified in my google account I can't get it to stop without quoting part of search. I've switched to using yandex for a lot of searches just to avoid it.

yread · on May 29, 2019

No kidding. The other day I searched for a gas station on google maps and got bus station instead

ryanisnan · on May 29, 2019

Quit thinking you know what you want better than the machines.

VikingCoder · on May 29, 2019

I'm not sure I understand you.

If I search for "restaurants", I want search results that ARE restaurants, not search results which have the word "restaurants" in them.

What do you want to have happen?

OJFord · on May 29, 2019

It depends what you're looking for. If it's an error message, you probably want that exact string, not results that are errors but don't include the string.

It used to be that searching literally "restaurants" (i.e. with quotes) would search for an exact match (particularly useful for multi-word searches in those days), but no more. It's taken as a 'hint' or something, I believe, but not an absolute instruction.

joseluis · on May 29, 2019

And this is how the AI takeover began.

ashelmire · on May 29, 2019

If I’m doing a google search, you’re probably right. If I’m searching my gmail account, you’re probably wrong. I’ve searched for exact phrases that occur in my email (for both gmail and outlook) and failed to get the matches anywhere in the results (and had to find them by other means). Same with Jira. It’s very frustrating to have to sort through hundreds or thousands of messages for the result you wanted.

IggleSniggle · on May 30, 2019

The gmail one in particular drives me absolutely bonkers. Like, I don’t care if the search needs to take 15 seconds to do, just find the email with the phrase that I know is there!

This is even more frustrating when you do a date constrained search and google tells you there are no emails from that date, but if you page through manually, it’s there. I feel like gmail is constantly gaslighting me.

pushpop · on May 30, 2019

Is it so weird that websites for restaurants would literally have the word “restaurant” on them somewhere? E.g.

> Foobar Canteen is a 2 Michelin star restaurant located in the heart of Soho.

This used to be how search engines knew what was a page about restaurants and what wasn’t.

But in any case, the problem with not returning exact strings is those times when you do need exact strings. Like researching a famous quote, passage of text or software error message.

MereInterest · on May 30, 2019

If I search for "python lea", I want information about the python package "lea". I do not want general information about "learn python".

Godel_unicode · on May 30, 2019

Ran that search just now, got 5 results about using lea, the 6th was about scikit-learn.

Terr_ · on May 29, 2019

It's all part of their attempt to de-commoditize their stuff, changing from an indexing-and-keyword-tool to invasive "assistant" that Knows What You Meant To Say.

However, as someone who already learned to translate my desires into keywords, it's freaking annoying.

rq1 · on May 29, 2019

This.

nickjj · on May 29, 2019

There's also:

That we want well known standards like CTRL + F in a browser to be hijacked and replaced by default with a custom search experience that's a lot worse than a browser's search.

Try CTRL + F'ing on Stripe's documentation: https://stripe.com/docs/api/plans

oconnor663 · on May 29, 2019

Often those custom search implementations are there because the "page" you're on isn't really a page, the scrollbar is fake, text is inserted and removed automatically as you "scroll", and as a result of all that Ctrl-F doesn't actually work by default. Of course you could argue that these heavyweight designs are a bad idea in the first place, but that's a trickier discussion. I think it's rare that web sites hijack Ctrl-F when leaving it alone is an option -- but I could be wrong about that.

oftenwrong · on May 30, 2019

>Of course you could argue that these heavyweight designs are a bad idea in the first place, but that's a trickier discussion.

I would argue that, and I don't think it's a particularly tricky discussion; If your site design subverts the normal, expected behaviour and functionality of the browser to such an extreme degree, then you created a poor user experience.

hypervis0r · on May 29, 2019

On Chrome, you can hit CTRL + G, which does the same as CTRL + F, but is not hookable by web sites

jakub_g · on May 29, 2019

Thanks for that, I didn't know it! It seems that F3 also works, and in fact CTRL+G is an alias of F3, and both work in Firefox as well.

The only issue is that in Firefox, it is only equivalent for the first search; once you close the bottom bar, subsequent F3/CTRL-G just do "find next occurrence" and do not display the bar anymore. Chrome always displays the search input on the other hand.

Edit: since talking shortcuts, in Firefox ' (apostrophe) is like CTRL-F but searches only hyperlinks (and you can cycle through in case of multiple matches with F3/CTRL-G) which is extremely useful for quickly navigating pages via keyboard only.

jkaptur · on May 29, 2019

Ctrl + G certainly is hookable[0], folks just rarely know that it's an alias for 'find'. If you REALLY want the browser's search, in Chrome, you can use the mouse to open the menu and choose "Find". You could also use any keyboard shortcut that focuses the URL bar (so keyboard events are no longer sent to the page) and press Ctrl-f then.

0: In Google Sheets, for example, Ctrl-g opens the JS-driven find bar, or, if it's already open, advances the match.

Quekid5 · on May 29, 2019

Somewhat related: The last version of Chromium said "Press Alt+F and then X" instead of Ctrl-Shift-Q when I tried to quit it using that key combo.

Unfortunately, Alt+F is trivially overridable by web pages (Twitch.tv in this case -- to move to the search bar), so that doesn't really work.

Chromium devs have no idea what the impact of their decisions is... and judging by the issue trackers they don't care.

AnIdiotOnTheNet · on May 29, 2019

I kinda want to burn down the world after reading this comment. How did we let computing get to be such a garbage fire?

glitchdout · on May 30, 2019

Holy shit I had no idea! I was tired of Chrome's bookmark manager hijacking CTRL + F. I'll use CTRL + G from now on!

danappelxx · on May 29, 2019

Luckily, Stripe is polite enough to allow users to fall back to default search with another tap of ctrl+f.

Humdeee · on May 29, 2019

I would second the discouragement to override this behaviour. Although they have handled it well with a CTRL + F + F again bringing it back to native search behaviour.

DanFeldman · on May 29, 2019

I don't see an issue, their widget allows me to go back to my default ctrl-f by pressing it again.

nickjj · on May 29, 2019

The main issue is it's on by default and it's a vastly inferior search UI to what everyone has been using to search / skim a page since browsers existed.

mappu · on May 29, 2019

Ace editor does this too - possibly because large documents might not all actually be in the DOM.

nickjj · on May 29, 2019

In Stripe's case, the docs are all rendered server side and are viewable without Javascript.

I'm not sure if you can hook into the native CTRL + F search tool and see what a user typed (my gut says no way there's an API for that), so I guess Stripe just wanted to track as much information as possible on what people are searching for, even if it makes the user experience a lot worse.

rattray · on May 29, 2019

(I am an engineer who worked on this feature)

The docs are indeed viewable without JS[0] (in a limited way) but the default experience relies on JS to render text.

We don't render all content on the page at once for performance reasons, which is (as a sibling speculated) the driving reason for overriding cmd+f/ctrl+f by default.

I hope to write an engineering blog post soon about how we build the Stripe api docs, with some focus on the performance and UX tradeoffs at play here.

[0] https://stripe.com/docs/api?javascript=false

tzhenghao · on May 29, 2019

This. Stripe's overengineered, custom Ctrl+F is unusable on Firefox. they could've just put a search bar for their own "search" feature instead of breaking the Ctrl+F browser convention that we're so familiar with.

climb_stealth · on May 30, 2019

For me it shows the option to use the native search by pressing CMD+F a second time.

idreyn · on May 29, 2019

oof, that's completely unusable on Firefox for me -- it seems like the loading is blocking my keyboard input or something.

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0

rattray · on May 29, 2019

(I have worked on this feature) Thanks for reporting – I can reproduce and will take a look at fixing shortly. Sorry for the troubles!

Khaine · on May 30, 2019

Why don’t you fix it by removing it. Most web users don’t want native features overridden. It’s obnoxious. It breaks UI and UX paradigms.

Whoever thought it was a good idea should get shot out of a cannon.

Decisions like this are why I do not support adding more functionality into web browsers. Most web developers have proven to be inept and incompetent. As demonstrated by this dumpster fire of a “feature”

Godel_unicode · on May 30, 2019

Can you fix this issue by not hijacking keyboard shortcuts needlessly?

idreyn · on May 30, 2019

Thanks for taking a look!

dredmorbius · on May 29, 2019

What does that do?

(Mobile, cannot invoke keyboard on page, JS disabled.)

And behaviour may change.

Just tell us.

nickjj · on May 29, 2019

Instead of being able to hit CTRL + F and immediately search and then have your browser highlight matches and decorate your scrollbar with where results are (so you can skim), they decided to override that behavior and introduce their own take on what search results should look like.

One that takes multiple seconds to get a response on a search and it's all contained in a tiny modal dialog box that has no skimmability and when you click one of the results it does a new page load to bring you to the results. Stripe is usually a superb developer experience. Truthfully I have no idea how it ended up in production as a default option.

redisman · on May 30, 2019

It's something like "Go to Resource" in code editors. Tries to navigate to methods / things based on what you type

saalweachter · on May 29, 2019

* When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon

One of my favorite sets of local search bugs involve interpreting "near me" as "near maine".

dexen · on May 29, 2019

Trying to fix every single problem in the search module/layer/service is an anti-pattern in and of itself.

There's an anecdote[1] from early days of Google Search where a certain domain was ranking 1st for an unrelated query (i.e., a false positive). The managers refused to move ahead before that got fixed, but the bug/edge case proved a head scratcher for several weeks on end.

Lastly one of the engineers solved the problem - by buying the domain and taking it offline.

Point being, if you can fix the problem outside of the code domain, do just that.

[1] sadly can't seem to find it - mostly getting spam articles related to SEO

emiliobumachar · on May 29, 2019

I'm pretty sure I've read a similar story in the book "I'm Feeling Lucky". It goes like this:

In the early days of Froogle, a shopping search engine made by Google, searching for "sneakers" always yielded a garden gnome wearing sneakers, one unit on sale, as the top result. This was considered bad, as someone searching for "sneakers" probably wanted to buy sneakers, not garden gnomes. The whole team tried to fix it, but they didn't want to just hardcode an exception. It eluded them for a while. Finally, it was not there anymore. They asked around for who had solved it, no one answered. Finally, one colleague arrived late - and placed the gnome on their desk.

hiharryhere · on May 30, 2019

"Buy the Gnome" should become a saying, like "eat the frog".

julienfr112 · on May 29, 2019

That's a very good story !

Bartweiss · on May 29, 2019

A lot of these entries are probably better handled with improved feedback than changed behavior. If you can tell whether the user meant 'either' or 'Oregon', that's great, but spending a week on the problem is a lot less urgent than just displaying "including results for Oregon".

Does Google have some kind of cultural allergy to special-casing or writing fallback rules around its recommendation systems? I ask because Chrome's spellcheck still lacks a lot of words that you can find in an abridged dictionary; it seems as though fallback rules like "the first hit needs at least one keyword match" or "never flag words found in Merriam-Webster as unknown" are basically never employed.

Scoundreller · on May 29, 2019

I see you’ve never made any embarrassing email mistakes.

Retards,

-Scoundreller

irrational · on May 29, 2019

I live in Oregon and always search using OR. I'm so used to that being the abbreviation for Oregon that this is the first time it has occurred to me that it could be confused with boolean or!

oceanghost · on May 29, 2019

There's a fantastic and subtle bug in the Google home hub and nest integration whereby, when the conditions are met, the nest integration can be seen in the pull-down menus but voice integration doesn't work.

So, when one says "Hey Google, show me the front yard." Instead of showing the camera feed-- one gets information about a bar in LA called "The front yard".

reaperducer · on May 29, 2019

Reminds me of a Siri inquiry I had last week:

Me: Hey, Siri, how long does it take to drive to Yellowstone National Park?

Siri: OK, one option I see is US National Commercial Real Estate Services on W Park Run Dr.

(Yes, that's verbatim — I screenshotted it)

oceanghost · on May 30, 2019

I can't fathom how it arrived at that answer?

recursive · on May 29, 2019

It's only a bug when you're not in maine. :)

phkahler · on May 29, 2019

Except when people are planning a trip to maine.

therealdrag0 · on June 3, 2019

Huh. I just worked on a bug around this at work. We just added some quoting and that solved it.

binarymax · on May 29, 2019

Howdy. Author here. Really cool to see so much good discussion on this. I want to turn several of them into blog posts on their own with explanations/stories/what-have-you. Taking votes for what you'd like to see first. For the record, my fave is "Languages don’t change".

dexen · on May 29, 2019

Thank you for the thorough and practical write-up.

About the only thing I would add to it is i18n concerns.

A few quick ones off of the top of my head:

  - Words are separated by whitespace or dashes.
  - Customers only ever enter ASCII.
  - Customers only ever enter accented characters with/without accents.
  - A "Unicode-capable" system will happily take in any valid unicode.
  - A "Unicode-capable" system will pass through any valid unicode undisturbed.
  - Software systems perform Unicode normalization.
  - WinNT API is UTF-16.
  - There is 1-to-1 mapping between uppercase and lowercase.
  - Unicode collation algorithm is optimal for every single language.
  - Unicode collation algorithm is optimal for multi-language document sets.
  - Distinguishing/coalescing plural and singular forms of words is easy.
  - There are separate plural/singular forms of words.
  - Words have stem and optional suffixes, but not prefixes.
  - Soundex etc. works for every language.
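A few of these are easy to see from a Python prompt. The sketch below uses only the standard library; `search_key` is a hypothetical helper name, and stripping accents is a policy choice that is wrong for some languages, which is rather the point of the list.

```python
import unicodedata

# Case mapping is not 1-to-1: German sharp s uppercases to two letters,
# and lowercasing the result does not bring the original back.
assert "ß".upper() == "SS"
assert "ß".upper().lower() == "ss"

# The same visible text can arrive as different code point sequences.
composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent
assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed


def search_key(text):
    """Build a comparison key: decompose, casefold, then drop combining marks."""
    folded = unicodedata.normalize("NFKD", text).casefold()
    return "".join(ch for ch in folded if not unicodedata.combining(ch))
```

With that key, `search_key("Café")` and `search_key("CAFE")` compare equal, as do `"Straße"` and `"STRASSE"`. Whether they should is a per-language decision.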

ProblemFactory · on May 30, 2019

> There are separate plural/singular forms of words.

Or that there are just two plural/singular forms (1 and many) for translating strings, or that which form to pick is clear.

While English has one form for 1, and one form for 0/many:

- French pluralises 0 the same way as 1,

- Czech has a form for exactly 2-4 items,

- Irish has forms for exactly 3-6 and 7-10 items,

- Polish has a form for all numbers that end in 2-4,

- Russian has a form for all numbers that end in 1,

- Arabic has forms for exactly 0 and 2 items, ending in 03-10, and many more.

A strings table will need at least 10+ variants if you want to translate strings referring to number of items.
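To make that concrete, here is a rough sketch of picking a plural form for whole numbers in four of the languages mentioned. The rules are my simplified reading of the CLDR plural rules for integers only, and `plural_category`, `FILES` and `format_count` are names invented for the example.

```python
def plural_category(lang, n):
    """Return a plural category name for a non-negative integer n."""
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "fr":
        # French treats 0 like 1.
        return "one" if n in (0, 1) else "other"
    if lang in ("pl", "ru"):
        last, last_two = n % 10, n % 100
        if lang == "pl" and n == 1:
            return "one"
        if lang == "ru" and last == 1 and last_two != 11:
            return "one"
        if 2 <= last <= 4 and not 12 <= last_two <= 14:
            return "few"
        return "many"
    raise ValueError(f"no plural rule for {lang!r}")


# A strings table keyed by category rather than by "singular/plural".
FILES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "pl": {"one": "{n} plik", "few": "{n} pliki", "many": "{n} plików"},
}


def format_count(lang, n):
    return FILES[lang][plural_category(lang, n)].format(n=n)
```

Note that Russian 21 lands in "one" while Polish 21 lands in "many", so even closely related languages need separate rules.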

mikesickler · on May 29, 2019

yes! tokenization and problems with word boundaries alone would be great to dive into!

binarymax · on May 29, 2019

Thanks! Nice additions!

busterarm · on May 29, 2019

Having spent a fair chunk of my career dealing with search, I went down that list nodding in agreement to nearly every single bullet point save for about 10...

...and those I had to classify as "problems I probably had and didn't recognize" or "will surely encounter soon"

So often we underestimate this thing...

koala_man · on May 30, 2019

I found the list of falsehoods about phone numbers (https://github.com/google/libphonenumber/blob/master/FALSEHO...) really enlightening because it gave a short rationale or example for each point. I think that's way more helpful and useful than the more traditional snarky format.

dumbfounder · on May 29, 2019

You can increase recall without adding noise. Customer wanted to match substrings of words and then was like, why are all these irrelevant documents returning?

ben509 · on June 4, 2019

May I propose an additional falsehood?

"Users won't want to turn search highlighting off."

Maybe it's just me, but this[1] seems distracting.

[1]: https://docs.python.org/3/library/pickle.html?highlight=pick...

kccqzy · on May 29, 2019

I think this article is setting up a pretty high bar for search. For small datasets, you can very well just add an automatically generated "description" column in your database, and then do a SQL LIKE query: it's a simple substring matching.

It's by no means smart, doesn't handle misspellings or anything, but it works reasonably fast and predictably. This is basically how almost every desktop app with a search bar works. This is how word processors and editors work when users search within the document.
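For anyone who hasn't done it, a minimal sketch of that approach using Python's built-in sqlite3 module. The `tickets` table, its rows and the `like_search` helper are made up for the example; the only subtle part is escaping `%` and `_` so user input is matched literally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO tickets (description) VALUES (?)",
    [
        ("Printer jams on tray 2",),
        ("Reset password for J. Smith",),
        ("100% CPU on build server",),
    ],
)


def like_search(conn, term):
    """Plain substring match; LIKE wildcards in the input are escaped."""
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT id, description FROM tickets "
        "WHERE description LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return rows.fetchall()
```

In SQLite, LIKE is case-insensitive for ASCII by default, so `like_search(conn, "printer")` finds the first row, while a misspelling such as `"pasword"` finds nothing.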

softwaredoug · on May 29, 2019

When was the last time your average (non programmer) user expected search to behave like CTRL+F?

I'm not sure you're doing your users a service implementing search with SQL LIKE. I think it's probably better to divert them to Google, use a full text SQL index, use a managed search service like Algolia, or not do search. Otherwise, you're just promising them functionality that is almost always going to fail them.

Why is that?

Users have been pretty heavily conditioned to use search in specific ways that are different than finding a text in documents. They have a broad range of needs that a wildcard 'find in files' search doesn't really support. And most frustratingly users expect a single search bar to support them all. Some needs are known item - finding an item by name (like contacts on your phone). Other needs are informational - finding a fact or idea by expressing requirements. Sometimes it's about getting a survey of information about a topic, or sometimes it's about comparing and contrasting different products.

The primitives available in SQL LIKE don't really lend themselves to solving any of these problems. There's no concept of relevance ranking, there's only exact, direct, case-insensitive search, not to mention it's going to do a full table scan on every search...

(You'd have my ear more if we were talking about full text search features in SQL.)
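To illustrate what "relevance ranking" adds over a yes/no match, here is a toy TF-IDF scorer in plain Python. It is a textbook sketch, not how any particular database or search engine scores documents, and `tokenize`, `rank` and the sample documents are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, docs):
    """Return (score, doc) pairs for matching docs, best first."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    scored = []
    for doc, counts in zip(docs, doc_counts):
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term]:
                # Rare terms (low document frequency) weigh more.
                df = sum(1 for c in doc_counts if term in c)
                score += counts[term] * math.log(1 + n / df)
        if score > 0:
            scored.append((score, doc))
    return sorted(scored, key=lambda pair: -pair[0])


docs = [
    "lea: a python package for probability",
    "learn python in ten days",
    "python python tutorials",
]
ranked = rank("python lea", docs)
```

All three documents contain "python", so a LIKE-style filter on that word returns them in arbitrary order. The scorer puts the one that also contains the rare term "lea" first.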

kccqzy · on May 29, 2019

I'm pretty sure we are talking about different ideas of "search."

jerf · on May 29, 2019

"I think this article is setting up a pretty high bar for search."

All of the "Falsehoods Programmers Believe About ..." genre articles do.

The way to use them is not to view them as an immutable checklist that all programs must conform to or else they are forever and always nothing but total crap, but as a list of things professionals should at least have some clue about, and that you should generally make deliberate decisions about, rather than accidental ones. Are you a pizza place with ten locations in a single state? Then by all means, take US-only addresses, and hard-code the time zone on your web site, and probably just ignore search, and expect first & last names or whatever. Just do it as a deliberate tradeoff, with an understanding of what it may take to undo it later.

Are you working in an international company serving customers all around the world, with the need to provide some search functionality? Well, you probably need to be able to fulfill a lot more of the relevant lists.

Semiapies · on May 29, 2019

I actually went to town on the features for the search on an internal project... And all anyone ever uses is pulling up tickets by number or by client or technician names.

"It has x, y, z features that the Google search has..." "I didn't know Google did that."

There's even an explanatory popup with how to do anything fancier than a straight text search.... But they just really use numbers and names. Advanced feature usage is once in a blue moon.

At least they're happy with the search function, but lesson learned - for a lot of usages, people aren't expecting much more than a simple text match.

Semiapies · on May 29, 2019

Really, searching for proper names is almost the entirety of what I see clients and internal users do with search, across various projects.

Kluny · on May 29, 2019

This is more or less what I recently told a client who wanted search on a utility I built him. He wanted rich search, like using quote marks and database operators, but his budget for the whole project was about $1500. I told him that I could build the entire project, plus simple searches on the description fields with wildcard matching. Or I could give him fully featured search by using third party software, but for triple the budget. Explaining it that way got the message across, and it turned out that simple search was enough.

theandrewbailey · on May 29, 2019

Use your database's ngram/trigram module on that description column (and full text index) to handle misspellings:

https://www.postgresql.org/docs/current/pgtrgm.html

https://dev.mysql.com/doc/refman/5.7/en/fulltext-search-ngra...
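The idea behind trigram matching fits in a few lines. This is a rough Python sketch loosely modelled on how the pg_trgm docs describe it (two spaces of padding before each word, one after, similarity as shared trigrams over all trigrams); it is not the extension's actual implementation, and `best_match` is a hypothetical helper.

```python
def trigrams(text):
    """Set of 3-character windows over each padded, lowercased word."""
    grams = set()
    for word in text.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by total distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(term, vocabulary, threshold=0.3):
    """Closest vocabulary entry at or above the threshold, else None."""
    score, word = max((similarity(term, w), w) for w in vocabulary)
    return word if score >= threshold else None
```

A transposition like "restuarant" still shares most of its trigrams with "restaurant", so `best_match("restuarant", ["restaurant", "station"])` picks the right word.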

binarymax · on May 29, 2019

Sorry but that’s going to result in a pretty horrible search experience. If you are putting a search bar on your page and that’s your search backend - you might as well skip search entirely because it’s only going to cause you and your customers pain.

The difference with find on page is that it’s obvious and transparent what is being searched and the expectations of the interface. Trust me when I say that a search bar to a layperson on your site is them thinking “oooh I can google”

ufmace · on May 30, 2019

I appreciate your experience, but you may have also noticed that there's a number of sibling comments from developers whose customers definitely did want a Ctrl-F type search, and not some near-AI match-what-I-really-meant thing. I've certainly worked in vertical markets where that's what the customer actually wanted.

I think it more goes back to the real most important lesson of programming - you must know your customer and their needs first. If a google-like experience is what your customers demand, then you better understand all of that stuff and build it. If they just want to search for names and ticket numbers, any more advanced intelligence is a waste of time that could have been used to build other features that the customer actually wants.

kccqzy · on May 29, 2019

Sounds like you need to redesign your search bar on the page to convey to users the expectation that this is a simple search.

gibrown · on May 30, 2019

Do you have an example of what that simple search bar could look like?

kccqzy · on May 30, 2019

I'm not a professional UI/UX designer but here's a guess. It should not be the prominent thing on your page. Prominent search bars like the one on Google's home page convey to users the idea that this is the primary way to navigate and use this website, and are therefore loaded with higher expectations. So don't make the search bar look prominent.

Next, make it context-specific. Don't put a search bar at the top of the page suggesting that this bar can search for everything. If you use a simple implementation like a SQL LIKE to implement search, put the search bar right next to the thing that is displaying results from the table. Make it look like it's filtering the table.

Finally, label the search bar using words like "Keywords," which also suggest to users that they should be typing keywords instead of a more complicated natural language phrase.

gibrown · on May 30, 2019

Those are interesting ideas thanks for the thoughts. FWIW I’ve seen users try to use even non-prominent search boxes like those as if they can do more than SQL LIKE. Most users have no idea how any of this works and just want answers.

Mostly I think this whole thread demonstrates the point of the original article, but I appreciate your response.

mattmanser · on May 29, 2019

Yeah, I rolled my eyes after opening the article, it's a load of tosh depending on what your need is.

I have written search engines for a couple of sites that combined serve about a million uniques a year.

It's not great, but it's not terrible, and took less than a week. People search for places and names, so it's quite easy to match them.

We looked at one of the open source engines, but it was a lot of effort for not a lot of gain, and essentially adds another significant moving part to go wrong.

simonhughes22 · on May 30, 2019

F*ck it, I am ditching VS and coding in assembly from now on.

stevelini · on May 30, 2019

Yes!!! Am joining the revolution!

dpau · on May 29, 2019

agreed, simple and effective. but also- most databases have more advanced full-text search functions, for example in about an extra hour of work you can easily set up simple boolean searches using mysql's MATCH against a full-text index over multiple columns.

kkarakk · on May 30, 2019

>This is basically how almost every desktop app with a search bar works.

not windows 10

simonhughes22 · on May 30, 2019

Oh god......

afturner · on May 29, 2019

This is both awesome and so so discouraging. Does anyone have some direction on how to produce good search systems??

softwaredoug · on May 29, 2019

Focus on measuring search quality and methodology first. Be a scientist. Great search teams obsess about methodology. Treat everything you try as a hypothesis, not guaranteed to work. Create a feedback loop that improves the pace of experimentation.

Other than that, the solution space is just as wide open as regular programming. It's just in many ways more frustrating because nobody knows what they really want from search, they just "know it when they see it" and no two users really can agree on what a good result is! :)
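One way to start that feedback loop is a small offline harness that scores a search function against hand-judged queries. The sketch below assumes a `search_fn(query)` that returns a ranked list of document ids and a `judgments` dict of relevant ids per query; both are hypothetical shapes chosen for illustration.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k slots filled with a result judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in results[:k] if doc in relevant) / k

def reciprocal_rank(results, relevant):
    """1/position of the first relevant result, or 0.0 if there is none."""
    for position, doc in enumerate(results, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0

def evaluate(search_fn, judgments, k=3):
    """Average both metrics over a set of judged queries."""
    p_total = rr_total = 0.0
    for query, relevant in judgments.items():
        results = search_fn(query)
        p_total += precision_at_k(results, relevant, k)
        rr_total += reciprocal_rank(results, relevant)
    n = len(judgments)
    return {"precision@k": p_total / n, "mrr": rr_total / n}
```

Each change to the engine then becomes a hypothesis: run `evaluate` before and after, and keep the change only if the numbers (and a look at the actual results) agree.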

mayank · on May 29, 2019

This is a very, very insightful point. I would add: never expect a singular "perfect" algorithm, but rather build a framework that lets you blend (and evaluate/weight) the signals from various hacks, workarounds, heuristics, and "proper" algorithms.
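A minimal sketch of such a blending framework, assuming each signal is just a function of (query, doc) and the weights are tuned by hand or by experiment. The three signals and the `title`/`clicks` fields are made-up examples.

```python
def blend(signals, weights):
    """Build a scorer that is a weighted sum of named signal functions."""
    def score(query, doc):
        return sum(weights[name] * fn(query, doc) for name, fn in signals.items())
    return score

def title_overlap(query, doc):
    """Share of query words that appear in the title."""
    q = set(query.lower().split())
    t = set(doc["title"].lower().split())
    return len(q & t) / len(q) if q else 0.0

def popularity(query, doc):
    # Hypothetical click count, scaled to roughly 0..1.
    return doc["clicks"] / 100.0

def exact_title(query, doc):
    """A blunt heuristic: 1.0 only when the query is the whole title."""
    return 1.0 if query.lower() == doc["title"].lower() else 0.0

SIGNALS = {
    "title_overlap": title_overlap,
    "popularity": popularity,
    "exact_title": exact_title,
}

def rank(docs, scorer, query):
    """Order documents from highest to lowest blended score."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)
```

Because the weights live in one dict, trying a new hack is a matter of adding a function and a number, then re-running whatever evaluation you trust.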

binarymax · on May 29, 2019

In addition to @softwaredoug's comment is his book "Relevant Search", it's a great starting point! https://www.manning.com/books/relevant-search

JHer · on May 29, 2019

I found "Search User Interfaces" by Marti Hearst very informative. It's available online for free: https://searchuserinterfaces.com/

fghtr · on May 29, 2019

Perhaps https://yacy.net could help you.

jimmaswell · on May 29, 2019

Search engines work like databases - Too vague but arguably yes in the abstract.

Search can be considered an additional feature just like any other - Yes? How do you falsify this?

Search can be added as a well performing feature to your existing product quickly - Yes if you're using a CMS with search already there like Drupal, or you can use that thing where your search uses/directs to Google.

Lowkeyloki · on May 29, 2019

I wonder if that one was meant to be controversial as it was the first item. My pedant sense started tingling immediately.

Search engines don't work like your standard RDBMS with SQL and whatnot. You can't just make a SQL query with a LIKE operator and just call it a day if you want modern, featureful searching.

But a search engine is absolutely a database. Lots of things are databases even if they aren't RDBMS and can't be queried with SQL.

Although, as a side note, I have seen some interesting projects that allow you to query things like file systems and operating systems using SQL, or at least syntax largely inspired by SQL.

iforgotpassword · on May 29, 2019

> Search can be added as a well performing feature to your existing product quickly - Yes if you're using a CMS with search already there like Drupal

Adding a feature by using a product that already has that feature is not "adding a feature to a product". It's "doing nothing since there's nothing to do". ;-)

Using Google search for pages might work for simple sites that mostly host text content, but not for things like "find all foos that are between 20 and 30 kg".

jimmaswell · on May 29, 2019

If it's just a few things like "find all foos that are between 20 and 30 kg" then that might be nothing but building a simple query out of a few criteria. Not all searches need to be or even aspire to be a super-general search like Google. The ebay search probably isn't all that complicated (relatively) for example. If you're trying to make another Google for some strange reason then the article applies more.
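For the criteria-style case, a sketch of "building a simple query out of a few criteria" might look like this with sqlite3. The `foos` table and its columns are invented for the example; the point is only that each optional criterion adds one parameterized clause.

```python
import sqlite3

def find_foos(conn, min_kg=None, max_kg=None, color=None):
    """Build a parameterized WHERE clause from whichever criteria were given."""
    clauses, params = [], []
    if min_kg is not None:
        clauses.append("weight_kg >= ?")
        params.append(min_kg)
    if max_kg is not None:
        clauses.append("weight_kg <= ?")
        params.append(max_kg)
    if color is not None:
        clauses.append("color = ?")
        params.append(color)
    sql = "SELECT name FROM foos"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]

def demo_foos():
    """Create a small in-memory table of hypothetical foos."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foos (name TEXT, weight_kg REAL, color TEXT)")
    conn.executemany(
        "INSERT INTO foos VALUES (?, ?, ?)",
        [("anvil", 25.0, "black"), ("barrel", 31.5, "brown"),
         ("crate", 20.0, "brown"), ("drum", 12.0, "red")],
    )
    return conn
```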

perlgeek · on May 29, 2019

* All customers may see the same data

God, how I hate that authorization woes find a way to make everything else 5x more complicated.

Lowkeyloki · on May 29, 2019

I wonder if that's aimed at permissions-based stuff or, like, search bubbling?

perlgeek · on May 30, 2019

I'm talking about permission-based stuff.
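A tiny sketch of why permissions leak into everything: even pagination has to know about them. This assumes each document carries a hypothetical `allowed_groups` set and that matching is a plain substring test; real engines usually push the filter into the query itself.

```python
def visible_results(results, user_groups):
    """Keep only documents that share at least one group with the user."""
    return [doc for doc in results if doc["allowed_groups"] & user_groups]

def search(index, query, user_groups, limit=2):
    """Substring match on the title, then permission filter, then truncate."""
    matches = [doc for doc in index if query.lower() in doc["title"].lower()]
    # Filter before truncating: trimming to `limit` first could hide
    # documents the user is entitled to see behind ones they are not.
    return visible_results(matches, user_groups)[:limit]
```

The same concern shows up again in result counts, facets and autocomplete, each of which can reveal that a hidden document exists.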

isoskeles · on May 29, 2019

On misspellings (since there are quite a few lines here dedicated to them), I had the fun responsibility of learning / knowing too much about how our search worked (we were/are using an old version of Solr), and started telling people that there's a way to at least do something about misspellings.

After conversations with two or three product managers, it became clear that the best course of action was to do nothing at all. I'm definitely not an expert on search or human behavior, and running through all the possible interpretations of how to handle misspelled words and what the customer wants was way more work than I was prepared to do.

I'll even point out that my initial suggestion was, "Let's just copy Google and do, 'Did you mean to type _______?'" Even that was met with, "what if the customers X" "what if the customers Y" etc. etc. Wasn't worth the time (at the time).

aflag · on May 29, 2019

You could call it related searches and only display the suggestion when all words are either in the products catalog or in the dictionary, also checking if the query returns something with a phrase search. That can help with typos without ever being too weird.
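A conservative version of that rule can be sketched with the standard library's difflib. The `vocabulary` set (catalog plus dictionary words), the `has_results` callback standing in for the phrase search, and the 0.8 similarity cutoff are all assumptions for illustration.

```python
import difflib

def suggest(query, vocabulary, has_results):
    """Return a corrected query, or None when no safe suggestion exists."""
    words = query.lower().split()
    corrected = []
    for word in words:
        if word in vocabulary:
            corrected.append(word)
            continue
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
        if not close:
            return None  # unknown word with no near match: stay silent
        corrected.append(close[0])
    candidate = " ".join(corrected)
    if candidate == " ".join(words):
        return None  # nothing to correct
    # Only suggest queries that a phrase search would actually answer.
    return candidate if has_results(candidate) else None
```

Returning None in every doubtful case is the design choice here: the suggestion only appears when every word is known and the corrected query is guaranteed to find something.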

reaperducer · on May 29, 2019

A list of postulations without examples or explanations is not useful.

the_af · on May 29, 2019

Agreed. It leaves no room for debate or for understanding the assumptions involved.

Also, while many items in the list are insightful, I find what bothers me in this and similar lists is when you could swap anything for "search" (or "time", "addresses" or whatever the other lists happen to mention).

See for example, replacing "search" with an X:

  - Choosing the correct X is easy and you will always be happy with your decision

  - Once setup, X will work the same way forever

  - Once setup, X will work the same way for a while

  - Once setup, X will work the same way for the next week

  - The default X settings will deliver a good X experience

The problem with these assertions is that, while cute, they are so broad and generic they tell us almost nothing about the specific problem of search engines. For almost every decision in software design and implementation, the above assertions hold true.

tempguy9999 · on May 29, 2019

Quite true!

Or even enough context to interpret:

> Search can be considered an additional feature just like any other

Is that a falsehood? - what does it even mean?

the_af · on May 29, 2019

Almost nothing. I guarantee that for any non-trivial feature, you could just say:

"<non-trivial feature F> can be considered an additional feature just like any other"

And everyone will agree that's probably false. They could have written "search is almost never a trivial feature, and you should take your time to consider complications", but I suppose that wouldn't sound as cute as a "Falsehoods Programmers Believe" list.

dsego · on May 29, 2019

That I actually want Sublime Text to stop responding for 5 mins while searching for a single space character across my entire project.

33degrees · on May 29, 2019

Related to "Customers who know what they are looking for will search for it in the way you expect", many people don't understand that a search engine works by matching text strings (albeit in an often sophisticated way). They see it as sending commands that the search engine understands, and will then find results for...

jakear · on May 29, 2019

I know VSCode had an issue where people would type whole sentences into the settings search bar. They got around it by incorporating some of Bing’s NLP logic. Goes to show, even amongst the “technically inclined” (those who not just use VSCode, but also try to modify things in it), this still holds.

sethammons · on May 29, 2019

> A customer using the same query twice expects the same results for both searches

Really, this is a falsehood? Like, I want the same query to give the same results given the same dataset always. When do you not want that?

dexen · on May 29, 2019

>>> A customer using the same query twice expects the same results for both searches

Of course this is false; please consider:

- customers expect to see in search results whatever new information they added/updated in the system (this is related to "Customers don’t expect near real time updates");

- customers expect "personalized" search results; having built up a history of searches centered around particular subjects (say, programming), you'll expect much different results for "string" than the general population gets;

- customers expect new/more results having logged in, or having gained new permissions/roles;

- customers running "knowledge" or "command" queries ("what is the weather?" "password 16") expect varying results

greggyb · on May 29, 2019

Or, for a short query string, the user may have a different intent without realizing they've put the same query in.

I might dash off a search for "sneakers" when I am researching footwear. A week later, I might be thinking about movies and enter the same query string, expecting IMDB results.

ashelmire · on May 29, 2019

> - customers expect "personalized" search results; having built up a history of searches centered around particular subjects (say, programming), you'll expect much different results for "string" than the general population gets;

Not always the desired behavior. This should be toggleable. It becomes very difficult to find results outside of what google thinks you want.

binarymax · on May 29, 2019

The gist of this is that customers sometimes re-enter the same query after it failed thinking they'll get what they want the second time. The lesson here is that you can't assume what the customer wants because you don't know. Information needs can be unconscious and contexts between the same query entered twice may have switched.

jldugger · on May 29, 2019

When the datasets are not the same -- the web is ever evolving, and if I just upgraded Ubuntu, I want the latest results for my search query about why a software package isn't working.

dsr_ · on May 29, 2019

Any query that could possibly return "news" should return new results whenever that news is updated.

Bug reports, newspaper articles, blog entries, sports scores...

jldugger · on May 29, 2019

Even your corporate internal wiki is going to have newer and older articles. Reindexing is a thing.

eterm · on May 29, 2019

Temporal context.

If I google "waitrose closing time" I want the closing time of my local supermarket today.

When I googled that yesterday, I got a different result, and that's what I want.

alasano · on May 29, 2019

Since there is no mention of a time parameter, these two queries can be separated by a certain amount of time in which the engine learns more about the user.

Perhaps the user is a business user rather than a developer and once profiled correctly results can be adjusted.

kazinator · on May 30, 2019

Search interfaces should have a configuration for smart users:

  [ ] Disable fuzzy parsing hacks (reject my queries if they have bad syntax).
  [ ] Don't search for sound-alikes; assume I spelt everything rite.
  [ ] Respect the non-alphanumeric characters in my query, which I put there for a reason.

astura · on May 29, 2019

I'm just waiting for the inevitable article titled "Falsehoods Programmers Believe Lists Considered Harmful."

Lowkeyloki · on May 29, 2019

Like many of the other commenters here, I wonder how many programmers truly believe these things. Maybe as recently as the 90s or 2000s. Maybe developers who are fresh out of school.

But we've had search engines as a major part of our lives for about two decades now. Most of us use one at least daily. We're familiar with the complexities of search engines and how they differ from simply searching a document for an exact string or even a regular expression. Many programmers like me work with tools like analytics and log aggregators that expose the complexities of search to us in a way that's more intimate than the veneers of Google and Amazon.

Maybe I'm just lucky in that my experiences have dispelled these notions of search being easy or simple. But I hope I'm not alone.

Also, there's a disparity between what search is and what your users expect. Technically, I could make a really simplistic "search engine" that amounts to a SQL LIKE query. It may not be good or what users might expect coming from Google/Amazon/etc, but it would be a search engine. (Oops. Looks like my pedant hat slipped back on when I wasn't looking.)

markbnj · on May 29, 2019

I don't know, the first third of the list contained about ten things I don't think any programmer believes about search, so I gave up at that point.

ape4 · on May 29, 2019

Is everything we believe about everything wrong?!

dexen · on May 29, 2019

"All models are wrong, but some are useful" (generally attributed to the statistician George Box).

A belief, or a system of beliefs, is but a model. It's virtually guaranteed to be wrong. It also may very well serve the important function of being simple enough to handle in-core, while at the same time being close enough to substitute for the real thing.

deckard1 · on May 29, 2019

> virtually guaranteed to be wrong

I would go a step further and say all formal models are proven to be wrong. After all, that's what Gödel and Turing kept going on about.

We can't prove any non-trivial program ever halts or does not halt. In fact, we can't (or don't) prove much about our programs we run anywhere.

All programs are a collection of assumptions. To bring this back to the topic at hand, if all of our search assumptions are useful to some meaningful number of people then it really doesn't matter how many "falsehoods" we trip over. Those falsehoods fall away, becoming mere insignificant edge-cases. Satisfying all people all the time in all cases is a fool's errand.

Articles like this are good at letting you know your blindspots so you can choose your blindspots rather than succumb to them. But don't let it become dogma.

dexen · on May 29, 2019

>all formal models are proven to be wrong

Your point certainly holds true for any physical entity as far as we know - probabilistic quantum effects, Heisenberg's Uncertainty, chaotic systems, and all that.

However, if you were to model a theoretical entity, given a few more constraints (like strict computability, which precludes Turing-complete systems), you can indeed have correct models. Alas, in practice this is a rather rare example.

inflatableDodo · on May 29, 2019

On a related note, a hell of a lot of strife in the world seems to boil down to people insisting that their preferred taxonomy is the correct one, no matter what the context, rather than accepting that taxonomies aren't facts in the first place, they are tools.

Bartweiss · on May 29, 2019

On which note, the answer to a list like this isn't necessarily "memorize it and avoid all these problems". The benefit can simply be in making these tradeoffs consciously, so you can judge your model better.

If you're Google, differentiating 'or' as in either from 'OR' as in Oregon is a task you need to take on. But if you're writing a National Park lookup tool, you probably just don't want to worry about that case. In that case it's still worth knowing; you might be able to save users some time by at least showing clearly how you reinterpreted their input.

dexen · on May 29, 2019

>The benefit can simply be in making these tradeoffs consciously, so you can judge your model better.

Very much so; engineering is all about choosing the trade-offs, and hopefully improving them in the future. The list also helps with solving some of the unknown-unknowns problem in regard to what the customer expectations may be; even whole new domains of expectations (like immediacy of update, or handling of accented/non-english characters).

Side note:

As far as I can tell, Google got rid of the special-cased "OR" in the general search - right now it's a word, not a predefined/reserved symbol.

They were able to do so by adding an "implicit OR-like" operator between all the words in the query. Not quite an implicit OR, not quite an implicit AND; something a bit more complex in between.

The words of the query get weighted against matches on their own, but also as adjacent words (higher weight) and whole phrases (yet higher weight). All in all the problem got solved by an improved matching & sorting algorithm, not by somehow smartly detecting when "OR" is meant as "OR", or OR, or or.

The problem got solved in the match scoring/sorting domain, rather than in the query parsing domain.
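To illustrate the scoring-side idea (not Google's actual algorithm, which isn't public), here is a toy scorer that treats every query word, including "or", as an ordinary term and gives increasing weight to single-word, adjacent-pair and whole-phrase matches. The weights 1, 2 and 4 are arbitrary assumptions.

```python
def score(query, text, w_term=1.0, w_pair=2.0, w_phrase=4.0):
    """Score `text` for `query`; the weights are illustrative assumptions."""
    q = query.lower().split()
    t = text.lower().split()
    t_set = set(t)
    t_pairs = set(zip(t, t[1:]))
    # Each query word found anywhere in the text.
    total = w_term * sum(1 for word in q if word in t_set)
    # Each adjacent query pair found adjacent, in order, in the text.
    total += w_pair * sum(1 for pair in zip(q, q[1:]) if pair in t_pairs)
    # Whole-phrase bonus: the query words appear contiguously, in order.
    n = len(q)
    if n > 1 and any(t[i:i + n] == q for i in range(len(t) - n + 1)):
        total += w_phrase
    return total
```

For the query "portland or hotels", a text containing that exact phrase outscores one that merely mentions all three words, with no special parsing of "or" at all.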

CM30 · on May 29, 2019

Probably. Still, nothing says you can't write an alternative article; falsehoods non programmers believe about programming.

liability · on May 29, 2019

That might actually be a worthwhile article that helps programmers communicate effectively with non-programmers.

One that tripped me up a few years ago: non-programmers think that 'strings' are long fibrous things that cats play with. The connection between the word 'string' and the concept of text is not intuitively obvious to people who don't already know the lingo. Seems obvious now, in retrospect.

dsr_ · on May 29, 2019

Good idea; I made a start here:

https://blog.randomstring.org/2019/05/29/falsehoods-non-prog...

ianamartin · on May 29, 2019

My life got a lot better when I stopped believing anything and just decided that there are maybe one or two things I'm fairly confident are true. As in, I'm pretty confident that I exist, and fairly confident that you do. But I couldn't prove either of those, and everything else is basically up for grabs.

petra · on May 29, 2019

Interesting.

But how do you deal with the following situations?

-- talking to people, since you have no opinion

-- understanding the people around you , building a mental model of them

-- general confidence

TeMPOraL · on May 29, 2019

> talking to people, since you have no opinion

That's a big tragedy in our society, that you're expected to have a definite opinion on everything. Myself, I have very few strong opinions, and those that I have I hold loosely. When someone asks, I usually try to sketch the space within which I believe the answer lies (e.g. "I suspect X, but then there's Y and Z, and also V I'm not sure what to do with"). This has a nice side effect of making strongly-opinionated regulars suddenly unsure about their own opinions.

dasil003 · on May 29, 2019

I am also this way by nature but it drives a lot of people nuts so I’ve learned to temper it for the particular audience, expressing confidence appropriate to the context of our shared assumptions.

onemoresoop · on May 29, 2019

I don't think the OP said that they have no opinion. Beliefs are a conviction based on cultural or personal faith, morality, or values. Opinions are viewpoints, we all have them, but it's good to be aware that they're not based on facts.

jillesvangurp · on May 30, 2019

I've implemented a fair number of search engines. Usually the problems are with non-technical people in a project. I've had to coach a fair number of product owners and UX designers on the basics of search. There are two issues I tend to have with them: 1) they avoid things that they think are hard that just aren't 2) they are unaware of features that e.g. Elasticsearch would support that are highly relevant to their project and therefore don't plan for using those.

A UX person thinks of search as a text box "like google". However, a lot of search UIs have a lot going on when you start typing and when you get results back: ways to refine search results, DYM ("did you mean") corrections, breakdowns/aggregations, suggestions, etc. A lot of these features require careful planning and design and are not necessarily easy to bolt on if you don't plan for them.

I've also had to do basic things like patiently explaining the difference between sorting and ranking and humbly suggesting that, maybe, having a multi column layout with sortable columns isn't necessarily the right thing for presenting search results where the output is a list of stuff in order of relevance.

Engineers are easier to deal with once you sit them down and talk them through how stuff works.

salutonmundo · on May 29, 2019

cough "setup" is a noun, "set up" is a verb </pedant>

jasonhansel · on May 29, 2019

I would add to the list of falsehoods:

- customers are always searching for a specific item, rather than an entire category

- customers know that a search engine for one kind of item (e.g. products for sale) won't also search the entire rest of your website

billfruit · on May 29, 2019

One major annoyance: it's hard to search for any topic related to C programming online; one has to wade through mountains of results on C++ and C#.

isoskeles · on May 29, 2019

> Choosing the correct search engine is easy and you will always be happy with your decision

I laughed, but I don't think this is a correct representation of something many programmers genuinely believe. It's worded in such a way that it's clear this is a joke. Not sure if I should read the full list if it's just going to be jokes like this one.

binarymax · on May 29, 2019

Yeah so that one is kinda a niche search engineer joke of the old Solr vs Elasticsearch battle that's been going on in the space for years. Sorry that some of the tongue-in-cheek-ness turned you off, but many of these items resonate closely with those of us in the search/relevance engineering space.

amelius · on May 29, 2019

How many genuinely unique search engines are there really to choose from? (Not counting those based on the same underlying libraries)

binarymax · on May 29, 2019

Quite a lot, actually: https://en.wikipedia.org/wiki/List_of_search_engines

alasano · on May 29, 2019

Adding some shameless self promotion :)

https://www.coveo.com

burtonator · on May 29, 2019

My favorite is "languages don't matter and I can just throw text in there"

jackconnor · on May 29, 2019

"Once setup, search will work the same way forever" - I don't know a single programmer who believes this about any software.

rq1 · on May 29, 2019

> Regular Expressions have minimal performance impact

REs and FSMs equivalent.

afturner · on May 29, 2019

for real? Is RegEx actually a FSM behind the scenes? or are you trying to say something else

dexen · on May 29, 2019

Yes and no. The theoretical "regular expressions" are indeed Type-3 grammars in Chomsky's hierarchy.

In practice, the common "RegEx" implementations add a lot of extras that break the theoretical backing and also exhibit highly non-linear behaviors. Cf. this excellent paper by Russ Cox: https://swtch.com/~rsc/regexp/regexp1.html

rq1 · on May 29, 2019

Thank you for this reference!

afturner · on May 29, 2019

also thank you for this. I really need to study Chomsky's UG stuff.

jkern · on May 29, 2019

Ha. I like the way you think!

pdpi · on May 29, 2019

Textbook regular expressions correspond precisely to DFAs, so they’re definitely a type of FSM.

Most Regexp implementations in the wild are more powerful than textbook regexps, so they not only encode all languages accepted by DFAs, but can also encode other languages. E.g. back-references are not a feature of regular languages.
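A small sketch of both halves of that claim, using Python's `re` module. The hand-written DFA and the choice of example languages are assumptions for illustration: `(ab)*` is regular, while the back-reference pattern matches strings of the form a^n b a^n, which is not a regular language.

```python
import re

def dfa_accepts_ab_star(text):
    """Two-state DFA for the textbook regex (ab)*."""
    state = 0  # 0 = expecting 'a' (accepting), 1 = expecting 'b'
    for ch in text:
        if state == 0 and ch == "a":
            state = 1
        elif state == 1 and ch == "b":
            state = 0
        else:
            return False  # no transition defined: reject
    return state == 0

# Back-reference: \1 must repeat whatever group 1 captured, so this matches
# n a's, one b, then the same n a's. No finite automaton can count like that.
SAME_COUNT = re.compile(r"(a+)b\1")

def same_count(text):
    """True when `text` is a^n b a^n for some n >= 1."""
    return SAME_COUNT.fullmatch(text) is not None
```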

jldugger · on May 29, 2019

Depends on the implementation, but I believe grep actually builds a Finite State Machine from its input regular expression. More complicated (non 'regular' regex) engines don't use this approach, but in theory the two are equivalent.

This equivalence is one of the fundamental findings of CS, and exposure to this concept is pretty much mandatory for acquiring a degree in the field. Sadly, this perspective is not often shared with bootcamp graduates and autodidacts, even though it's moderately documented in https://en.wikipedia.org/wiki/Regular_expression#Deciding_eq...

But the more mindblowing aspect is that you can use nondeterministic Finite Automata for the same purpose.
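The NFA approach can be sketched in a few lines: instead of guessing one path and backtracking, keep the set of every state the machine could be in. The transition table below is a hand-built NFA for `(a|b)*abb`, chosen only as an example, with no epsilon moves to keep the sketch short.

```python
def nfa_accepts(transitions, start, accepting, text):
    """Simulate an NFA (no epsilon moves) by tracking every reachable state."""
    current = {start}
    for ch in text:
        current = {nxt for state in current
                       for nxt in transitions.get((state, ch), ())}
        if not current:
            return False  # every possible path has died
    return bool(current & accepting)

# Hand-built NFA for (a|b)*abb: state 0 loops on anything, and on 'a'
# it also "guesses" that the final "abb" has started.
ABB = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
```

Each input character is processed once against at most all states, so the running time grows linearly with the text instead of exploding the way a backtracking matcher can.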

jkern · on May 29, 2019

Yep. Take a look at https://swtch.com/~rsc/regexp/regexp1.html

afturner · on May 29, 2019

thank you very much.

rq1 · on May 29, 2019

My bad I forgot the verb, I meant “are equivalent”.

Edit: I don’t understand the downvotes though.

rdgthree · on May 29, 2019

softwaredoug · on May 29, 2019

I think the article implies you're a programmer implementing search, not that you're taking an off-the-shelf system and just plugging it in.

Just like "falsehoods programmers believe about websites" wouldn't make sense if you were using Wix...

afturner · on May 29, 2019

Algolia looks good, but are there any OSS alternatives for those of us trying to bootstrap a search system?

mftrhu · on May 29, 2019

For a blog/static website, Tipue Search [1], or maybe Datasette [2]. There are Pelican [3]/Jekyll [4] plugins for the former.

[1] http://www.tipue.com/search/

[2] https://24ways.org/2018/fast-autocomplete-search-for-your-we...

[3] https://github.com/getpelican/pelican-plugins/tree/master/ti...

[4] https://github.com/jekylltools/jekyll-tipue-search

ummonk · on May 29, 2019

Pretty much none of these are things programmers believe about search.

Putting limited effort into creating a mediocre search feature doesn't mean that you believe these falsehoods; it just means that you're too resource constrained to put serious investment into creating and improving a high quality search feature.