Why Can't Gmail Search? (designbygravity.wordpress.com)
130 points by cschanck 1696 days ago | comments


_pius 1696 days ago | link

Finally, someone else who feels this way! I've never understood why people praise the search in GMail so much.

-----

jsonscripter 1696 days ago | link

Gmail search is fast. In my experience, Outlook, a desktop mail client, can take up to an hour to search for a simple term.

-----

discojesus 1696 days ago | link

if I may quote some scripture:

"Gerald Weinberg tells the story of a programmer who was flown to Detroit to help debug a troubled program. The programmer worked with the team who had developed the program and concluded after several days that the situation was hopeless.

On the flight home, he mulled over the situation and realized what the problem was. By the end of the flight, he had an outline for the new code. He tested the code for several days and was about to return to Detroit when he got a telegram saying that the project was canceled because the program was impossible to write. He headed back to Detroit anyway and convinced the executives that the project could be completed.

Then he had to convince the project's original programmers. They listened to his presentation, and when he'd finished, the creator of the old system asked, "And how long does your program take?"

"That varies, but about ten seconds per input."

"Aha! But my program takes only one second per input." The veteran leaned back, satisfied that he'd stumped the upstart. The other programmers seemed to agree, but the new programmer wasn't intimidated.

"Yes, but your program doesn't work. If mine doesn't have to work, I can make it run instantly."

- _Code Complete_, pp.595-596

-----

ajg1977 1696 days ago | link

He got a... telegram? Safe to assume this isn't a recent tale :)

(not that its age should detract anything of course, I just found the obvious historic reference amusing).

-----

gruseom 1696 days ago | link

I love that story and have quoted it many times (though, Code Complete? really, just read Weinberg). But it doesn't apply here. Gmail search certainly works. It just doesn't have every feature.

-----

discojesus 1696 days ago | link

> Gmail search certainly works. It just doesn't have every feature.

It's not "completely broken," but no hits for a query of "zag" in an email that contains "zagg" comes uncomfortably close to "doesn't work." (FWIW, I use gmail and haven't had any huge problems with search, although I have had to do way more work to find something than I would expect given that it's from Google).

> Code Complete? really, just read Weinberg

'twas just a quick copy and paste from my fortune file. Although as far as I'm concerned there ain't nothing wrong with Code Complete.

-----

gruseom 1696 days ago | link

> there ain't nothing wrong with Code Complete

It's ok but relatively mediocre, and in my observation usually indicates a programmer who hasn't sought out better sources.

-----

discojesus 1696 days ago | link

> in my observation usually indicates a programmer who hasn't sought out better sources.

Those being?

-----

wcarss 1696 days ago | link

I don't really have an opinion to state on the actual discussion at hand, but I figured I'd toss in an old link which contains what some of "those" might be:

http://lists.canonical.org/pipermail/kragen-tol/2007-March/0...

Javier Kragen Sitaker's article/mail "My Evolution as a Programmer" recounts one coder's growth as a programmer, his career, and his exposure to a variety of books along the way. It's really an excellent read, and contains some comparisons between a few good books - particularly Code Complete and The Practice of Programming. I quote,

During this time, I read "The Practice of Programming", which is a lot like "Code Complete", but shorter and much higher in quality. I had read the same authors' "The Elements of Programming Style" back in 1995, on much the same subjects, but that book is nearly unreadable today --- it's written in PL/1 and FORTRAN IV. TPoP, aside from being written with modern programming languages, also contains insights from several decades more of the authors' experience.

-- The author in question is Brian Kernighan. At any rate, I leave any interested person to go check the article out, if you haven't seen it already.

-----

blasdel 1696 days ago | link

It's a bad sign when Jeff Atwood is your biggest proponent :/

-----

discojesus 1696 days ago | link

I take solace in the fact that people actually read Code Complete - it's not just a bookshelf ornament. Yes, I know, Knuth is God, and his work is a masterpiece - the point is nobody actually reads the friggin thing.

-----

gruseom 1696 days ago | link

There are zillions, obviously. What do you want, a list? I don't know, SICP, CLRS, Knuth? Non-mainstream languages? Functional programming? For that matter, a single classic CS paper that wasn't assigned as homework? Code Complete, its grounding in software-engineering literature notwithstanding, isn't very deep.

-----

dagw 1695 days ago | link

I consider the lack of depth in Code Complete one of its great strengths. You couldn't hand Knuth to a newbie programmer and expect them to get anything useful out of it, whereas Code Complete will teach them a lot of really useful things they can use their entire career.

If you've been working as a programmer for 12 years it probably won't teach you too many new things, but if you've been working for 12 weeks it is a great book.

-----

khafra 1695 days ago | link

In my experience, it's a more reliable heuristic to fault people for what they haven't read than for what they have read. Some of the smartest people I know can and do quote from children's books and anime.

-----

mattmaroon 1696 days ago | link

My outlook box is pretty large and it never takes more than 3 or 4 seconds. It's not the greatest search either, but it does handle substrings nicely and automatically. It works most of the time (though I'd still be much happier if it worked all of the time.)

-----

timcederman 1696 days ago | link

Have you used Outlook 2007 with the built-in search indexer? Incredibly quick.

-----

snprbob86 1696 days ago | link

I've been using Outlook 2007 since it launched. It is incredibly quick... when it works. I frequently find that I can be LOOKING AT THE EMAIL I WANT. Then type a phrase that I am staring at, and not have that email appear in the results. Never mind that sometimes it will randomly start taking 60 seconds or more to return results until you restart the program.

-----

jsonscripter 1696 days ago | link

I didn't know. I'm stuck using Outlook 2003 at work :(

-----

TY 1696 days ago | link

Try Xobni search plugin - it's fast and free for the basic version. It will certainly alleviate your search pains with Outlook 2003.

-----

altano 1695 days ago | link

So it sounds like you're using Outlook 2003 on XP and don't have Windows Search (what powers Outlook instant search) installed.

Hound your incompetent IT department and say you need to search your email and you need Windows Search 4 installed on your machine, instead of blaming a 6-year-old product and an 8-year-old operating system.

-----

andrewparker 1696 days ago | link

If you're stuck on Outlook 2003, Lookout is an add-on for '03 that adds blazing fast search. I find it works faster than Xobni.

-----

mistermann 1696 days ago | link

+1 for Lookout...free and works like a charm!

-----

redorb 1696 days ago | link

This whole thread (outlook 2003 being slow, 07 being a lot faster) makes it seem like xobni should have sold for 20mm

-----

stevsonlee 1690 days ago | link

My favourite one is Lookeen! Lookout is old and cannot search for or in docx files. Xobni is a matter of taste; it's nothing for me!

-----

rufo 1696 days ago | link

An hour, really? I don't use Outlook, so I don't know if that's right - but I suppose that's why the various Outlook search plugins are so popular.

I haven't had a problem with Gmail search as described, but I would not classify it as fast. Often I'll type in a simple query and have it spend 10-20 seconds before it returns results. If I perform the same query on the same dataset in Spotlight in Mac OS X, it starts to return results instantly with the search completing within 5-10.

-----

shiranaihito 1696 days ago | link

It may be fast, but what you want is results.

-----

jsonscripter 1696 days ago | link

I think some people also want timely results. I certainly do.

-----

endtime 1696 days ago | link

Ideally, yes, but that doesn't appear to be an option. Given the choice between instant and incomplete results, and slightly slower but complete results, I'll take the latter.

-----

TallGuyShort 1696 days ago | link

Considering the difference in speed, it's faster for me to find what I was looking for by sifting through GMail's results, than to wait for Outlook's results to show up at all. Outlook's search essentially renders my whole machine useless until it's done searching.

-----

FooBarWidget 1696 days ago | link

Gmail's search works well. Except when you need substring search or fuzzy search, in which case it absolutely sucks.

Most of the things in Gmail are done very well but non-exact search really needs improvement.

-----

plusbryan 1696 days ago | link

I love Gmail search because I've seen the difference - have you ever tried searching email in Mobile Me? It's like they ignore your query and search a random string.

-----

dasil003 1696 days ago | link

Probably because most other email search sucks even worse. Frankly I think a lot of people's threshold for saying something "sucks" is too low. Come on, it could be better, but it doesn't totally suck.

Reminds me of Louis C K's bit on Conan: http://videogum.com/archives/late-night/the-videogum-louis-c...

-----

mistermann 1696 days ago | link

In my estimation, not being able to do substring searches in emails, sucks.

-----

einarvollset 1696 days ago | link

I once spoke to someone in the know about this, and the main reason is that it's quite expensive for them to do any reasonable job with stemming, etc.

I know that sounds weird (but it's google, they're omnipotent!), but it makes sense: It's worth their while to stem content they crawl and index off the web, cause everybody could in theory access any given page. However, with email, the only person who'll ever benefit is the recipient.

-----

bd 1696 days ago | link

They could use Gears for some more in-depth local indexing, then it would be you bearing extra computational and storage costs.

-----

bradgessler 1696 days ago | link

That's a really interesting notion. Have any web applications moved indexing to the client-side before with gears and/or js?

-----

bd 1696 days ago | link

MySpace (mentioning cost savings as one of the main reasons):

http://gigaom.com/2008/05/28/myspace-uses-gears-to-grind-dow...

http://developer.myspace.com/community/blogs/devteam/archive...

-----

IsaacSchlueter 1696 days ago | link

You could probably build a gadget that does this, yes?

-----

zngtk4 1696 days ago | link

It would at least make sense to deal with plurals. I can't tell you the number of times I've searched for something using an s (or not) at the end, and failed to find what I was looking for, only to remember later that I should try to add or remove an s from my search term.

-----

DannoHung 1696 days ago | link

Perhaps it would be an option that they could sell to people?

-----

mistermann 1696 days ago | link

Exactly....I have always wondered why there is no google premium...I would gladly pay $100/year to have a collection of certain domains easily removed from my google searches, or pick from a list of my favorite sites for a "site:" search, etc etc.

-----

sp332 1695 days ago | link

Well, you can do at least that last bit fairly easily. http://www.google.com/coop/cse/

-----

Freaky 1696 days ago | link

If it's expensive, then why not reflect that by providing it to people who pay for their accounts?

-----

anurag 1696 days ago | link

> However, with email, the only person who'll ever benefit is the recipient.

I'm not sure that makes sense - if they add stemming, all users of GMail benefit. Going by your explanation, it wouldn't make sense to add any expensive features to GMail, because the only person who would ever benefit from them is the single user.

-----

antipaganda 1696 days ago | link

I think you misunderstand the nature of stemming; the point is that each and every user's inbox would have to be processed individually, and apparently Google doesn't think the overhead is worth the results.

-----

anurag 1696 days ago | link

I understand stemming. I just think the webpage contrast is not a good explanation for why they're not doing it. Building an unstemmed search index per user is also expensive, and helps only the recipient, but they do it because they think the expense is necessary. They stem webpages because they think the expense is necessary. They don't stem mails because they think the expense is not justified, not because the only person who benefits is the recipient.

-----

sachinag 1696 days ago | link

It is also expensive, but it is less expensive than doing that AND stemming. They just decided that stemming was a line where the benefit (add'l users, more use of Gmail, more AdSense revenue, whatever metric) wasn't worth the (development and ongoing processing) costs.

-----

jhancock 1696 days ago | link

This is a drawback to putting everything in the cloud: features will be weighed by the CPU cycles and storage required by the providers. Can't wait til we come full circle and get back to client/server computing ;). I'm only half joking. I actually can't wait until things mature enough so we have a hybrid of both models. Then I can decide just how much stuff I want indexed and also not worry what happens when my cable modem flakes out.

-----

gloob 1696 days ago | link

> Can't wait til we come full circle and get back to client/server computing ;).

The next big hype cycle: OS as an OS.

-----

flatline 1696 days ago | link

It seems like they've worked on mitigating this on the web with suggestions for alternate spellings and by displaying related searches. While you can see stemming at work in many Google searches, I'm pretty sure they don't build extensive substring indices on the web end either. For example, I've had searches where a substring returns 0 results and the exact phrase returns a handful.

-----

tvon 1696 days ago | link

The problem seems to be substring searching, which I guess isn't something I've ever tried to use.

I tend to think of the gmail search as being fairly powerful and much faster than my usual mail client (Mail.app), it's one of the few reasons I ever use the gmail web interface.

If anyone is curious, the docs are here http://mail.google.com/support/bin/answer.py?hl=en&answe... (I had never bothered to look them up until now).

-----

ramoq 1696 days ago | link

I actually heard that it's quite costly to index emails for gmail (not from a gmail source, just random web chatter). It makes sense. Most emails are not important, they are just huge amounts of random chatter. I'd imagine indexing emails (full-text) properly would require some effort. The gmail team is probably on a budget :)

But then again, if they can index the web, why not email?

-----

plusbryan 1696 days ago | link

I'm no expert, but reMail does it on my freakin iPhone - I'm guessing they could do even better on a cluster...

-----

ramoq 1696 days ago | link

Agreed, Google could probably do it better. But imagine maintaining separate indexes for each and every gmail acct and constantly updating those indexes. reMail used to do something similar to that.

-----

metachor 1696 days ago | link

Well, they already do (and have to) maintain separate indexes for each and every gmail account (or else how would you search on the metadata fields like To: and From: in your inbox at all?).

Supposedly the issue is that they don't perform more computationally-expensive linguistic analysis during the indexing phase. If they tokenize each word but don't perform any stemming or lemmatization, for example, the result would be similar: only full-word non-substring matching.

As others have pointed out, it's probably a cost-benefit decision by Google to not spare grid cycles on full-fledged linguistic analysis for individuals' email accounts. Google CAN do better at it, as is evidenced by their web search index.
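
To make the tokenize-without-stemming point concrete, here is a minimal sketch of a per-account inverted index. It is not Gmail's implementation; the `crude_stem` rule (drop one trailing "s"), the sample messages, and every function name are assumptions for illustration only.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crude_stem(token):
    """Hypothetical stand-in for a real stemmer: drop one trailing 's'."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def build_index(messages, normalize=lambda t: t):
    """Map each normalized token to the set of message ids containing it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for token in tokenize(body):
            index[normalize(token)].add(msg_id)
    return index

def search(index, query, normalize=lambda t: t):
    """Return ids of messages whose index has an entry for the whole query word."""
    return index.get(normalize(query.lower()), set())

# Made-up sample data.
messages = {1: "Your Zaggs order has shipped", 2: "Lunch on Friday?"}

exact = build_index(messages)
stemmed = build_index(messages, crude_stem)

print(search(exact, "zagg"))                # set(): only the token "zaggs" was indexed
print(search(stemmed, "zagg", crude_stem))  # {1}: "zaggs" was indexed as "zagg"
print(search(stemmed, "zag", crude_stem))   # set(): stemming alone gives no substring match
```

Note that the stemmed index does extra work per token at indexing time, which is the kind of cost being weighed here, and it still does not help with a true substring query like "zag".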

-----

dpcan 1696 days ago | link

His experiment fails.

He didn't apply the same inputs to both systems, therefore his findings must be discarded.

He changed it up, and had he STARTED at Yahoo and performed the exact same searches, he would yield the same results, only reversed, and his blog post would be about why he dumped Yahoo instead.

Here's my case:

1) He searched Google for "Zags" which is NOT in "Zaggs". So, no results.

2) Then he goes to Yahoo and searches for "Zag" which IS in "Zaggs" - AH HA, he gets a result (of course he does!)

3) He RETURNS to Google and searches for "Zagg" which IS in "Zaggs", and boom, results. Surprised? Me neither.

I guess my point is, I don't have enough karma to have down arrows next to this post, so I'm going to write a cranky comment about this article.

-----

nkurz 1696 days ago | link

> 3) He RETURNS to Google and searches for "Zagg" which IS in "Zaggs", and boom, results. Surprised? Me neither.

You might want to hold off on your cranky comment. Would you be surprised if the search for 'Zagg' did NOT turn up the results for 'Zaggs'? Because for better or worse, that's what actually happens.

GMail search does not perform stemming (like removing that final 's') and also does not allow for substring searches. So in fact, a search for 'Zagg' will return nothing. While this isn't a fatal flaw, it is a drag.
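
For anyone wondering what substring search would take, one common approach is a trigram index. The following is a rough sketch under my own assumptions (the sample messages and function names are invented), and says nothing about how Gmail or Yahoo actually index mail.

```python
from collections import defaultdict

def trigrams(text):
    """Return the set of all 3-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_trigram_index(messages):
    """Map each trigram to the set of message ids whose body contains it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for gram in trigrams(body):
            index[gram].add(msg_id)
    return index

def substring_search(index, messages, query):
    """Find messages containing query as a substring (query must be >= 3 chars)."""
    query = query.lower()
    grams = trigrams(query)
    if not grams:
        raise ValueError("query must be at least 3 characters long")
    # Candidates must contain every trigram of the query...
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # ...but trigram overlap is only necessary, not sufficient, so verify.
    return {m for m in candidates if query in messages[m].lower()}

# Made-up sample data.
messages = {1: "Your Zaggs order has shipped", 2: "Zigzag patterns for spring"}
index = build_trigram_index(messages)

print(substring_search(index, messages, "zag"))   # {1, 2}
print(substring_search(index, messages, "zagg"))  # {1}
```

The trade-off is index size: every message contributes roughly one entry per character rather than one per word, which is a plausible reason a provider might skip it.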

-----

beza1e1 1696 days ago | link

He did search for "zag" in Gmail. Right after "zbuds" and before "headset".

-----

dpcan 1696 days ago | link

GAH! All that frustration over nothing. Thank you.

-----

sundae79 1696 days ago | link

Not only that, I also found that if you search by sender name and the sender name never appears in the subject of any of your emails, it doesn't find it.

-----

tvon 1696 days ago | link

use "from:example.com" or similar... important part being the "from:" bit

-----

gecko 1696 days ago | link

Speaking of which: you and I, as developers, understand search axes, and use them intuitively. But they're utterly opaque for someone who hasn't encountered that interface before--and given that Google itself doesn't support search axes in its main product, I don't think they're an interface someone's likely to learn elsewhere, either. If you're unaware of search axes, you have to click, "Show search options," which is in an extremely tiny font, next to a button that very notably says "Search the Web"--not search my inbox. Furthermore, once you actually discover that, and use it, you still don't know about the axes, because that search box doesn't use them. The only place where you can discover search axes is by clicking on a label, which results in "label:foo" being shown in the search box--but that's actually a lie, because the output if you dump that into the search field, versus if you click on a label name, don't match if you have more than about 30 messages in that label. Go try it.

So, yeah. Axes are great. And they're completely unintuitive and impossible to discover in Gmail unless you read the help, which no user ever does. So you've done a great job finding a part of Gmail's search interface that's at least as broken as the underlying implementation.

-----

tvon 1696 days ago | link

> The only place where you can discover search axes is by clicking on a label, which results in "label:foo" being shown in the search box (...)

That's how I found it, and I learned a little more about it by a need to get all mail to/from a specific client for a specific month and doing advanced searches.

Granted, that's also how I learned about "site:", by going to advanced search on Google and specifying a single site to search, the resulting page shows the "site:" in the search box.

> (...) but that's actually a lie, because the output if you dump that into the search field, versus if you click on a label name, don't match if you have more than about 30 messages in that label. Go try it.

Good point, though as far as I can tell the number of results is the only difference... but still, it's a bit misleading.

-----

tsuraan 1696 days ago | link

"the output if you dump that into the search field, versus if you click on a label name, don't match if you have more than about 30 messages in that label. Go try it."

The output is different because it only shows me 20 rows per page when I search on label:label, versus 100 rows if I click the label, but the results are the same. Were you seeing something more broken than the number of results per page?

FWIW, I've tried it on a few labels, none having fewer than 10,000 messages in them.

-----

gecko 1696 days ago | link

There are many subtle differences: as you noted, the rows per page differs; clicking to select all offers "Select all conversations that match this search," rather than "Select all conversations in <Label>"; the "Remove label <Foo>" button disappears; an Archive button appears; "Move to" disappears, but "Move to Inbox" appears instead; etc. It's not broken in the sense that it doesn't work, but it's broken in the sense that Google's implying an equivalence that is not there.

-----

shrikant 1695 days ago | link

This is just not true.. at least it doesn't seem to be, for me. Do a lot of people face this??

-----

ajb 1696 days ago | link

I have hit this problem too. If you subscribe to the git email list, but only want to look at discussion, not patches, it ought to be possible to filter out the patches with 'subject:patch'. But this doesn't completely work, because quite often the patches have patchv2 patchv3 etc, which doesn't match.

The curious thing is that google groups is even more painful; where you would think it would be more worth having the indexes, because more people are going to search the same data.
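
The 'subject:patch' versus "patchv2" mismatch comes down to whole-word matching against prefix matching. A small sketch of the difference, where the subject lines are made up and the two matcher functions are hypothetical names, not anything Gmail exposes:

```python
import re

# Made-up subject lines in the style of a patch-heavy mailing list.
subjects = [
    "[PATCH] fix rebase corner case",
    "[PATCHv2 1/3] teach diff a new option",
    "Re: how should merge handle renames?",
]

def tokens(subject):
    """Split a subject into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", subject.lower())

def whole_word_match(subject, term):
    """True only if the term equals one of the subject's tokens."""
    return term.lower() in tokens(subject)

def prefix_match(subject, term):
    """True if any token in the subject starts with the term."""
    return any(tok.startswith(term.lower()) for tok in tokens(subject))

print([whole_word_match(s, "patch") for s in subjects])  # [True, False, False]
print([prefix_match(s, "patch") for s in subjects])      # [True, True, False]
```

With whole-word matching the "PATCHv2" message slips through the filter, which matches the behaviour described above.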

-----

sfk 1696 days ago | link

I think Google found that they can make more money by directing Usenet searches to crappy archive sites full of Google ads.

Which is a shame, since 5 years ago Usenet search was absolutely wonderful.

-----

mblakele 1696 days ago | link

I find it easier to let an external service index my list subscriptions:

http://markmail.org/search/?q=list%3Aorg.kernel.vger.git#que...

(or http://git.markmail.org/search/?q=#query:type%3Adevelopment if you want to use the list-specific site).

-----

mildweed 1696 days ago | link

I hope they build more robust searching into their Wave client.

-----

imp 1696 days ago | link

I think I hit the same problem occasionally, but I can usually keep trying different keywords until I find one that works. If he only tried 4 keywords like he writes, then there was probably some word in the email that would have been at least semi-unique. Maybe "earbud" or "order" or something. Then browsing or date-based filtering can usually do the rest. Kind of a pain, but not enough for me to ditch GMail completely.

-----

cubedice 1696 days ago | link

What's interesting is that after reading this article I've realized I always expected e-mail search to be bad. Why? Possibly because, when another entity holds a large portion of my social information, it feigning ignorance about deep technical knowledge of my personal life (which is likely better than my own understanding) makes me feel good.

It's less 'big brother'y, which may explain why poor e-mail search doesn't bother me. It's as though it's waiting for me to get the answer first.

-----

euroclydon 1696 days ago | link

That's really clever how he auto-forwards all his gmail messages to yahoo. I wish I had done something like that from the get-go. Too late now. Here is a Lifehacker article explaining how to automate gmail backups with sendmail and cygwin:

http://lifehacker.com/235207/geek-to-live--back-up-gmail-wit...

-----

periferral 1696 days ago | link

it seems to me that this problem isn't just limited to gmail. other than google search itself, I think search in other google products (android included) is lacking when it comes to what I presume to be basic functionality.

-----

zandorg 1696 days ago | link

Typical of someone to glibly say 'where's regex matching?' when it's very advanced, and it's possible Google simply don't have the cpu for making (and searching) a search tree for many megabytes of mail.

I don't know how Yahoo do it, but this guy should at least present a solution to the problem. It reminds me of that Alexei Sayle sketch where he says 'I blame the council' (which in the UK is the local authority and handles all sorts of things in a town or city), and at the end, wanting someone to blame for blaming the council? He blames the council!

-----

justinhj 1696 days ago | link

Good points there. When I search in google I know I can be sloppy with my typing because 90% of the time it's quicker to get it wrong and click the "did you mean?" results, rather than edit my input.

That feature's absence is very obvious when struggling to find data in gmail.

Personally I use subject line tags for stuff I want to filter on. (Like 'music' 'todo' 'idea'), and when I store something in gmail I want to remember I make sure the key words I will search for are very obvious (and easy to spell).

-----

snprbob86 1696 days ago | link

I'm going to keep my fingers crossed for "search as you type" across all Google properties. This has got to happen...someday...right?

-----

Jem 1696 days ago | link

I did a search for a mail I was sure I had in gmail a few weeks ago, and it returned 0 results. I came to the conclusion that I'd simply deleted the mail.

Now I'm wondering if it wasn't me at all.

-----

calcnerd256 1696 days ago | link

If lack of substring search is bad, how much worse is the fact that it sometimes doesn't even return all the results for a correct query? I use labels a lot, and I often use boolean searches to search for exactly which Venn intersection I need. A search for "in:inbox is:unread -label:Triangle" (without the quotes) in my gmail turns up messages labeled Triangle. I have similar other problems and cannot trust gmail's search.
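
For reference, what that query ought to mean is plain set algebra. A minimal sketch with an invented mailbox (the message ids, tags, and the `with_tag` helper are all assumptions for illustration):

```python
# Hypothetical mailbox: message id -> set of labels/flags on that message.
mailbox = {
    101: {"inbox", "unread", "Triangle"},
    102: {"inbox", "unread"},
    103: {"inbox", "Triangle"},
    104: {"unread", "Work"},
}

def with_tag(mailbox, tag):
    """Return the ids of all messages carrying the given label or flag."""
    return {msg_id for msg_id, tags in mailbox.items() if tag in tags}

# in:inbox is:unread -label:Triangle
result = (with_tag(mailbox, "inbox")
          & with_tag(mailbox, "unread")) - with_tag(mailbox, "Triangle")

print(result)  # {102}
```

By construction nothing labeled Triangle can survive the set difference, so a result that includes such a message points at a bug rather than at the query.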

-----

calcnerd256 1696 days ago | link

My solution was to use fetchmail, mb2md, and fgrep on my local server. Now I feel like I can almost trust the setup I have.
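
If you would rather not shell out to fgrep, the search step can be approximated with Python's standard `mailbox` module. This is only a sketch of that last step, assuming the mail has already been fetched and converted to a Maildir; it does not model fetchmail or mb2md, and the throwaway Maildir and its one message are made up so the example runs as-is.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

def grep_maildir(path, needle):
    """Return subjects of messages whose subject or plain-text body contains needle.

    Plain case-insensitive substring match, in the spirit of `fgrep -i`.
    """
    needle = needle.lower()
    hits = []
    for msg in mailbox.Maildir(path, create=False):
        subject = msg.get("Subject", "")
        body = ""
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                body += payload.decode("utf-8", errors="replace")
        if needle in (subject + "\n" + body).lower():
            hits.append(subject)
    return hits

# Build a throwaway Maildir with one made-up message.
root = os.path.join(tempfile.mkdtemp(), "Maildir")
box = mailbox.Maildir(root, create=True)
msg = EmailMessage()
msg["Subject"] = "Your order"
msg.set_content("Your Zaggs earbuds have shipped.")
box.add(msg)
box.flush()

print(grep_maildir(root, "zag"))  # ['Your order']
```

It is a linear scan with no index, so it trades speed for the guarantee that a substring present in the text will be found.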

-----

slater 1695 days ago | link

I'd suggest he tries using Lotus Notes' built-in "search". If there was ever a more useless function, it's LN's pathetic attempt at searching for stuff. It'll find stuff to every search term you enter! Just nothing that a) contains that search term, and b) resembles anything you wanted to find.

-----

ramoq 1696 days ago | link

Oh yeah, the gap in gmail's search ability does hint at possible startup ideas to solve this problem. However I wonder why some great startups have passed on directly competing in this sphere (reMail being a good example)

-----

andyf 1696 days ago | link

At least gmail filters out spam better than any other mail client and yes it includes Yahoo Mail.

-----

sahaj 1696 days ago | link

just an observation: lot of attacks on gmail today/recently.

-----

moldenke 1696 days ago | link

At least it's better than Google Groups search.

-----



