
Why Can't Gmail Search? - cschanck
http://designbygravity.wordpress.com/2009/08/24/why-cant-gmail-search/
======
einarvollset
I once spoke to someone in the know about this, and the main reason is that
it's quite expensive for them to do any reasonable job with stemming, etc.

I know that sounds weird (but it's google, they're omnipotent!), but it makes
sense: It's worth their while to stem content they crawl and index off the
web, cause everybody could in theory access any given page. However, with
email, the only person who'll ever benefit is the recipient.

~~~
bd
They could use Gears for some more in-depth local indexing, then it would be
you bearing extra computational and storage costs.

~~~
bradgessler
That's a really interesting notion. Have any web applications moved indexing
to the client-side before with gears and/or js?

~~~
bd
MySpace (mentioning cost savings as one of the main reasons):

[http://gigaom.com/2008/05/28/myspace-uses-gears-to-grind-dow...](http://gigaom.com/2008/05/28/myspace-uses-gears-to-grind-down-server-costs/)

[http://developer.myspace.com/community/blogs/devteam/archive...](http://developer.myspace.com/community/blogs/devteam/archive/2008/05/29/myspace-mail-search-with-gears-released-google-i-o-presentation-slides.aspx)

------
_pius
Finally, someone else who feels this way! I've never understood why people
praise the search in GMail so much.

~~~
jsonscripter
Gmail search is _fast_. On outlook, a desktop mail client, it can take up to
an hour to search for a simple term, in my experience.

~~~
timcederman
Have you used Outlook 2007 with the built-in search indexer? Incredibly quick.

~~~
jsonscripter
I didn't know. I'm stuck using Outlook 2003 at work :(

~~~
andrewparker
If you're stuck on Outlook 2003, Lookout is an add-on for '03 that adds
blazing fast search. I find it works faster than Xobni.

~~~
mistermann
+1 for Lookout...free and works like a charm!

~~~
redorb
This whole thread (outlook 2003 being slow, 07 being a lot faster) makes it
seem like xobni should have sold for 20mm

------
tvon
The problem seems to be substring searching, which I guess isn't something
I've ever tried to use.

I tend to think of the gmail search as being fairly powerful and much faster
than my usual mail client (Mail.app), it's one of the few reasons I ever use
the gmail web interface.

If anyone is curious, the docs are here
[http://mail.google.com/support/bin/answer.py?hl=en&answe...](http://mail.google.com/support/bin/answer.py?hl=en&answer=7190)
(I had never bothered to look them up until now).

------
ramoq
I actually heard that it's quite costly to index emails for gmail (not from a
gmail source, just random web chatter). It makes sense. Most emails are not
important, they are just huge amounts of random chatter. I'd imagine indexing
emails (full-text) properly would require some effort. The gmail team is
probably on a budget :)

But then again, if they can index the web, why not email?

~~~
plusbryan
I'm no expert, but reMail does it on my freakin iPhone - I'm guessing they
could do even better on a cluster...

~~~
ramoq
Agreed, Google could probably do it better. But imagine maintaining separate
indexes for each and every gmail acct and constantly updating those indexes.
reMail _used_ to do something similar to that.

~~~
metachor
Well, they already do (and have to) maintain separate indexes for each and
every gmail account (or else how would you search on the metadata fields like
To: and From: in your inbox at all?).

Supposedly the issue is that they don't perform more computationally-expensive
linguistic analysis during the indexing phase. If they tokenize each word but
don't perform any stemming or lemmatization, for example, the result would be
similar: only full-word non-substring matching.

As others have pointed out, it's probably a cost-benefit decision by Google to
not spare grid cycles on full-fledged linguistic analysis for individuals'
email accounts. Google CAN do better at it, as is evidenced by their web
search index.
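
A minimal sketch of the tokenize-only versus tokenize-plus-stemming point above. Nothing here reflects how Gmail actually indexes mail; `naive_stem`, `build_index`, `search`, and the sample messages are all hypothetical names and data made up for illustration, and the "stemmer" is a toy suffix stripper rather than a real algorithm like Porter's.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_stem(token):
    """Toy suffix stripper (illustration only, not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_index(messages, normalize=lambda t: t):
    """Map each normalized token to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for token in tokenize(body):
            index[normalize(token)].add(msg_id)
    return index


def search(index, term, normalize=lambda t: t):
    """Look up a single term; only exact (normalized) token matches count."""
    return index.get(normalize(term.lower()), set())


# Hypothetical mailbox contents.
messages = {
    1: "Your Zaggs order has shipped",
    2: "Meeting notes from Tuesday",
}

plain = build_index(messages)                # tokenize only
stemmed = build_index(messages, naive_stem)  # tokenize + toy stemming

print(search(plain, "zagg"))                 # whole-token index: no hit
print(search(stemmed, "zagg", naive_stem))   # stemmed index: finds message 1
print(search(stemmed, "zag", naive_stem))    # still no hit: stemming is not substring search
```

Note that even the stemmed index misses "zag": stemming collapses word forms, it does not give you arbitrary substring matching, which would need something like an n-gram or suffix index.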

------
sundae79
Not only that, I also found that if you search by sender name and the sender
name is never in the subject of any of your emails, it doesn't find it.

~~~
tvon
use "from:example.com" or similar... important part being the "from:" bit

~~~
gecko
Speaking of which: you and I, as developers, understand search axes, and use
them intuitively. But they're utterly opaque for someone who hasn't
encountered that interface before--and given that Google itself doesn't
support search axes in its main product, I don't think they're an interface
someone's likely to learn elsewhere, either. If you're unaware of search axes,
you have to click, "Show search options," which is in an _extremely_ tiny
font, next to a button that very notably says "Search the Web"-- _not_ search
my inbox. Furthermore, once you actually discover that, and use it, _you still
don't know about the axes, because that search box doesn't use them_. The only
place where you can discover search axes is by clicking on a label, which
results in "label:foo" being shown in the search box--but that's actually a
lie, because the output if you dump that into the search field, versus if you
click on a label name, don't match if you have more than about 30 messages in
that label. Go try it.

So, yeah. Axes are great. And they're completely unintuitive and impossible to
discover in Gmail unless you read the help, which no user ever does. So you've
done a great job finding a part of Gmail's search interface that's at least as
broken as the underlying implementation.

~~~
tsuraan
"the output if you dump that into the search field, versus if you click on a
label name, don't match if you have more than about 30 messages in that label.
Go try it."

The output is different because it only shows me 20 rows per page when I
search on label:label, versus 100 rows if I click the label, but the results
are the same. Were you seeing something more broken than the number of results
per page?

FWIW, I've tried it on a few labels, none having fewer than 10,000 messages in
them.

~~~
gecko
There are many subtle differences: as you noted, the rows per page differs;
clicking to select all offers "Select all conversations that match this
search," rather than "Select all conversations in <Label>"; the "Remove label
<Foo>" button disappears; an Archive button appears; "Move to" disappears, but
"Move to Inbox" appears instead; etc. It's not broken in the sense that it
doesn't work, but it's broken in the sense that Google's implying an
equivalence that is not there.

------
imp
I think I hit the same problem occasionally, but I can usually keep trying
different keywords until I find one that works. If he only tried 4 keywords
like he writes, then there was probably some word in the email that would have
been at least semi-unique. Maybe "earbud" or "order" or something. Then
browsing or date-based filtering can usually do the rest. Kind of a pain, but
not enough for me to ditch GMail completely.

------
dpcan
His experiment fails.

He didn't apply the same inputs to both systems, therefore his findings must
be discarded.

He changed it up, and had he STARTED at Yahoo and performed the exact same
searches, he would have gotten the same results, only reversed, and his blog post
would be about why he dumped Yahoo instead.

Here's my case:

1) He searched Google for "Zags" which is NOT in "Zaggs". So, no results.

2) Then he goes to Yahoo and searches for "Zag" which IS in "Zaggs" - AH HA,
he gets a result (of course he does!)

3) He RETURNS to Google and searches for "Zagg" which IS in "Zaggs", and boom,
results. Surprised? Me neither.

I guess my point is, I don't have enough karma to have down arrows next to
this post, so I'm going to write a cranky comment about this article.

~~~
beza1e1
He did search for "zag" in Gmail. Right after "zbuds" and before "headset".

~~~
dpcan
GAH! All that frustration over nothing. Thank you.

------
periferral
it seems to me that this problem isn't just limited to gmail. other than google
search itself, I think search in other google products (android included) is
lacking when it comes to what I presume to be basic functionality.

------
zandorg
Typical of someone to glibly say 'where's regex matching?' when it's very
advanced, and it's possible Google simply don't have the _cpu_ for making (and
searching) a search tree for many megabytes of mail.

I don't know how Yahoo do it, but this guy should at least present a
_solution_ to the problem. It reminds me of that Alexei Sayle sketch where he
says 'I blame the council' (which in the UK is the local authority and handles
all sorts of things in a town or city), and at the end, wanting someone to
blame for blaming the council? He blames the council!

------
cubedice
What's interesting is that after reading this article I've realized I always
expected e-mail search to be bad. Why? Possibly because, when another entity
holds a large portion of my social information, it feigning ignorance about
deep technical knowledge of my personal life (which is likely better than my
own understanding) makes me feel good.

It's less 'big brother'y, which may explain why poor e-mail search doesn't
bother me. It's as though it's waiting for me to get the answer first.

------
justinhj
Good points there. When I search in google I know I can be sloppy with my
typing because 90% of the time it's quicker to get it wrong and click the "did
you mean?" results, rather than edit my input.

That feature's absence is very obvious when struggling to find data in gmail.

Personally I use subject line tags for stuff I want to filter on. (Like
'music' 'todo' 'idea'), and when I store something in gmail I want to remember
I make sure the key words I will search for are very obvious (and easy to
spell).
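
For what it's worth, a rough sketch of what a "did you mean?" fallback over a mailbox vocabulary could look like, using Python's standard `difflib`. This is only an assumption about one way to do it, not how Google does it; `did_you_mean`, the cutoff value, and the sample vocabulary are made up for illustration.

```python
import difflib


def did_you_mean(term, vocabulary, max_suggestions=3):
    """Suggest indexed words close to a term that has no exact match."""
    term = term.lower()
    if term in vocabulary:
        return []  # exact hit, nothing to suggest
    # cutoff=0.75 is an arbitrary similarity threshold chosen for this sketch.
    return difflib.get_close_matches(
        term, sorted(vocabulary), n=max_suggestions, cutoff=0.75
    )


# Hypothetical set of words that appear in the indexed mail.
vocabulary = {"earbud", "headset", "order", "invoice"}

print(did_you_mean("hedset", vocabulary))  # a misspelling of an indexed word
print(did_you_mean("order", vocabulary))   # already an exact match
```

The vocabulary here is per-mailbox, so suggestions only ever point at words that would actually return results.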

------
ajb
I have hit this problem too. If you subscribe to the git email list, but only
want to look at discussion, not patches, it ought to be possible to filter out
the patches with 'subject:patch'. But this doesn't completely work, because
quite often the patches have patchv2 patchv3 etc, which doesn't match.

The curious thing is that google groups is even more painful; where you would
think it would be more worth having the indexes, because more people are going
to search the same data.
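
To make the 'subject:patch' case concrete, here is a small sketch of prefix matching over a sorted token list, which is one common way to support "patch*" style queries without full substring search. The function name and the sample tokens are hypothetical, and this says nothing about what Gmail or Google Groups do internally.

```python
import bisect


def prefix_matches(sorted_tokens, prefix):
    """Return every token that starts with prefix, using binary search."""
    # All tokens sharing a prefix sit next to each other in sorted order,
    # starting at the position where the prefix itself would be inserted.
    start = bisect.bisect_left(sorted_tokens, prefix)
    matches = []
    for token in sorted_tokens[start:]:
        if not token.startswith(prefix):
            break
        matches.append(token)
    return matches


# Hypothetical tokens taken from mailing-list subject lines.
subject_tokens = sorted({"patch", "patchv2", "patchv3", "rfc", "announce", "path"})

print(prefix_matches(subject_tokens, "patch"))
print(prefix_matches(subject_tokens, "pull"))
```

An exact-token lookup would only find "patch"; expanding the prefix to the matching tokens first and then OR-ing their posting lists would also pick up patchv2 and patchv3.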

~~~
sfk
I think Google found that they can make more money by directing Usenet
searches to crappy archive sites full of Google ads.

Which is a shame, since 5 years ago Usenet search was absolutely wonderful.

------
mildweed
I hope they build more robust searching into their Wave client.

------
calcnerd256
If lack of substring search is bad, how much worse is the fact that it
sometimes doesn't even return the right results for a correct query? I use
labels a lot, and I often use boolean searches to search for exactly which
Venn intersection I need. A search for "in:inbox is:unread -label:Triangle"
(without the quotes) in my gmail turns up messages labeled Triangle. I have
similar other problems and cannot trust gmail's search.
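
A sketch of what that query should mean if it is evaluated as plain set logic over per-message tags. The tag model, the function names, and the sample data are assumptions for illustration; Gmail's real evaluation is not documented here.

```python
def parse_query(query):
    """Split a query into (required, excluded) sets of 'operator:value' terms."""
    required, excluded = set(), set()
    for term in query.lower().split():
        if term.startswith("-"):
            excluded.add(term[1:])
        else:
            required.add(term)
    return required, excluded


def matches(message_tags, query):
    """True if the message has every required tag and no excluded tag."""
    required, excluded = parse_query(query)
    return required <= message_tags and not (excluded & message_tags)


def run_query(messages, query):
    """Return the ids of all matching messages, sorted for stable output."""
    return sorted(msg_id for msg_id, tags in messages.items() if matches(tags, query))


# Hypothetical messages, each described by a set of lowercase tags.
messages = {
    "msg-1": {"in:inbox", "is:unread", "label:triangle"},
    "msg-2": {"in:inbox", "is:unread", "label:work"},
    "msg-3": {"in:inbox", "label:work"},
}

print(run_query(messages, "in:inbox is:unread -label:Triangle"))
```

Under this model nothing labeled Triangle can come back, which is the behavior I expected.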

~~~
calcnerd256
My solution was to use fetchmail, mb2md, and fgrep on my local server. Now I
feel like I can almost trust the setup I have.
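
Roughly the same idea as the fgrep step, sketched with Python's standard `mailbox` module instead of shell tools. It assumes the mail has already been fetched and converted to a Maildir (which is what mb2md produces); `grep_maildir` and the path in the usage comment are hypothetical.

```python
import mailbox


def grep_maildir(maildir_path, needle):
    """Return subjects of messages whose raw text contains needle (case-insensitive)."""
    needle = needle.lower()
    hits = []
    box = mailbox.Maildir(maildir_path, create=False)
    for key in box.keys():
        # Search the raw message bytes, headers included, like fgrep would.
        raw = box.get_bytes(key).decode("utf-8", errors="replace")
        if needle in raw.lower():
            hits.append(box[key].get("Subject", "(no subject)"))
    return hits


# Usage, assuming a Maildir exists at this (hypothetical) location:
# print(grep_maildir("/home/me/Maildir", "zag"))
```

Like fgrep, this is a brute-force substring scan, so it finds "zag" inside "Zaggs" but gets slow on big mailboxes, and it will miss text inside base64 or quoted-printable encoded bodies.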

------
euroclydon
That's really clever how he auto-forwards all his gmail messages to yahoo. I
wish I had done something like that from the get-go. Too late now. Here is a
Lifehacker article explaining how to automate gmail backups with fetchmail
and cygwin:

[http://lifehacker.com/235207/geek-to-live--back-up-gmail-wit...](http://lifehacker.com/235207/geek-to-live--back-up-gmail-with-fetchmail)

------
slater
I'd suggest he tries using Lotus Notes' built-in "search". If there was ever a
more useless function, it's LN's pathetic attempt at searching for stuff.
It'll find stuff for every search term you enter! Just nothing that a) contains
that search term, and b) resembles anything you wanted to find.

------
andyf
At least gmail filters out spam better than any other mail client, and yes, that
includes Yahoo Mail.

------
ramoq
Oh yeah, the gap in gmail's search ability does hint at possible startup ideas
to solve this problem. However I wonder why some great startups have passed on
directly competing in this sphere (reMail being a good example)

------
Jem
I did a search for a mail I was sure I had in gmail a few weeks ago, and it
returned 0 results. I came to the conclusion that I'd simply deleted the mail.

Now I'm wondering if it wasn't me at all.

------
snprbob86
I'm going to keep my fingers crossed for "search as you type" across all
Google properties. This has got to happen...someday...right?

------
moldenke
At least it's better than Google Groups search.

------
sahaj
just an observation: a lot of attacks on gmail today/recently.

