I know that sounds weird (but it's Google, they're omnipotent!), but it makes sense: it's worth their while to stem content they crawl and index off the web, because everybody could in theory access any given page. With email, however, the only person who'll ever benefit is the recipient.
I'm not sure that makes sense - if they add stemming, all users of GMail benefit. Going by your explanation, it wouldn't make sense to add any expensive features to GMail, because the only person who would ever benefit from them is the single user.
The next big hype cycle: OS as an OS.
Gerald Weinberg tells the story of a programmer who was flown to Detroit to help debug a troubled program. The programmer worked with the team who had developed the program and concluded after several days that the situation was hopeless.
On the flight home, he mulled over the situation and realized what the problem was. By the end of the flight, he had an outline for the new code. He tested the code for several days and was about to return to Detroit when he got a telegram saying that the project was canceled because the program was impossible to write. He headed back to Detroit anyway and convinced the executives that the project could be completed.
Then he had to convince the project's original programmers. They listened to his presentation, and when he'd finished, the creator of the old system asked, "And how long does your program take?"
"That varies, but about ten seconds per input."
"Aha! But my program takes only one second per input." The veteran leaned back, satisfied that he'd stumped the upstart. The other programmers seemed to agree, but the new programmer wasn't intimidated.
"Yes, but your program doesn't work. If mine doesn't have to work, I can make it run instantly."
- _Code Complete_, pp. 595-596
It's not "completely broken," but getting no hits for a query of "zag" on an email that contains "zagg" comes uncomfortably close to "doesn't work." (FWIW, I use Gmail and haven't had any huge problems with search, although I have had to do way more work to find something than I would expect given that it's from Google.)
Code Complete? Really, just read Weinberg.
'twas just a quick copy and paste from my fortune file. Although as far as I'm concerned there ain't nothing wrong with Code Complete.
It's OK but relatively mediocre, and in my observation relying on it usually indicates a programmer who hasn't sought out better sources.
Kragen Javier Sitaker's article/mail "My Evolution as a Programmer" recounts his growth as a programmer, his career, and his exposure to a variety of books along the way. It's really an excellent read, and it contains some comparisons between a few good books, particularly Code Complete and The Practice of Programming. To quote:
During this time, I read "The Practice of Programming", which is a lot like "Code Complete", but shorter and much higher in quality. I had read the same authors' "The Elements of Programming Style" back in 1995, on much the same subjects, but that book is nearly unreadable today --- it's written in PL/1 and FORTRAN IV. TPoP, aside from being written with modern programming languages, also contains insights from several decades more of the authors' experience.
-- The author common to both books is Brian Kernighan. Anyway, if you haven't seen the article already, I'd encourage anyone interested to go check it out.
If you've been working as a programmer for 12 years it probably won't teach you too many new things, but if you've been working for 12 weeks it is a great book.
(Not that its age should detract from anything, of course; I just found the obvious historical reference amusing.)
Instead of blaming a 6-year-old product and an 8-year-old operating system, hound your incompetent IT department: tell them you need to search your email and you need Windows Search 4 installed on your machine.
I haven't had a problem with Gmail search as described, but I would not classify it as fast. Often I'll type in a simple query and have it spend 10-20 seconds before it returns results. If I perform the same query on the same dataset in Spotlight in Mac OS X, it starts to return results instantly, with the search completing within 5-10 seconds.
Most of the things in Gmail are done very well, but non-exact search really needs improvement.
Reminds me of Louis C.K.'s bit on Conan: http://videogum.com/archives/late-night/the-videogum-louis-c...
I tend to think of Gmail search as being fairly powerful and much faster than my usual mail client (Mail.app); it's one of the few reasons I ever use the Gmail web interface.
If anyone is curious, the docs are here http://mail.google.com/support/bin/answer.py?hl=en&answe... (I had never bothered to look them up until now).
But then again, if they can index the web, why not email?
Supposedly the issue is that they don't perform more computationally expensive linguistic analysis during the indexing phase. If they tokenize each word but don't perform any stemming or lemmatization, for example, the result would be exactly what we see: only whole-word matching, with no substring matches.
As others have pointed out, it's probably a cost-benefit decision by Google not to spend grid cycles on full-fledged linguistic analysis for individuals' email accounts. Google CAN do better at it, as is evidenced by their web search index.
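To make the "tokenize but don't stem" point concrete, here is a toy inverted index. This is a minimal sketch, not how Gmail actually works: the tokenizer regex, the `naive_stem` rule (strip one trailing "s"), and the helper names `build_index` and `search` are all hypothetical, chosen only to show how skipping a stemming step at indexing time leads to exact whole-token matching.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(token):
    # Hypothetical toy stemmer: drop a single trailing "s" from longer words.
    # Real stemmers (e.g. Porter) are far more elaborate.
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def build_index(docs, stem=False):
    """Map each (optionally stemmed) token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[naive_stem(token) if stem else token].add(doc_id)
    return index

def search(index, query, stem=False):
    # Whole-token lookup: each query token must equal an indexed token exactly.
    results = None
    for token in tokenize(query):
        key = naive_stem(token) if stem else token
        hits = index.get(key, set())
        results = hits if results is None else results & hits
    return results or set()

docs = {1: "My new Zaggs case arrived", 2: "Order for one Zagg case"}
plain = build_index(docs)
stemmed = build_index(docs, stem=True)
print(search(plain, "zagg"))               # only doc 2: "zaggs" is a different token
print(search(stemmed, "zagg", stem=True))  # docs 1 and 2: both reduce to "zagg"
```

The stemming pass happens once per message at indexing time, which is where the cost-benefit argument comes in: it is extra work for every message stored, whether or not anyone ever searches for it.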
So, yeah. Axes are great. And they're completely unintuitive and impossible to discover in Gmail unless you read the help, which no user ever does. So you've done a great job finding a part of Gmail's search interface that's at least as broken as the underlying implementation.
That's how I found it, and I learned a little more about it when I needed to get all mail to/from a specific client for a specific month and did some advanced searches.
Granted, that's also how I learned about "site:": I went to advanced search on Google and specified a single site to search, and the resulting page showed "site:" in the search box.
> (...) but that's actually a lie, because the output if you dump that into the search field, versus if you click on a label name, don't match if you have more than about 30 messages in that label. Go try it.
Good point, though as far as I can tell the number of results is the only difference... but still, it's a bit misleading.
The output is different because it only shows me 20 rows per page when I search on label:label, versus 100 rows if I click the label, but the results are the same. Were you seeing something more broken than the number of results per page?
FWIW, I've tried it on a few labels, none having fewer than 10,000 messages in them.
He didn't apply the same inputs to both systems; therefore, his findings must be discarded.
He changed it up, and had he STARTED at Yahoo and performed the exact same searches, he would have gotten the same results, only reversed, and his blog post would be about why he dumped Yahoo instead.
Here's my case:
1) He searched Google for "Zags" which is NOT in "Zaggs". So, no results.
2) Then he goes to Yahoo and searches for "Zag" which IS in "Zaggs" - AH HA, he gets a result (of course he does!)
3) He RETURNS to Google and searches for "Zagg" which IS in "Zaggs", and boom, results. Surprised? Me neither.
I guess my point is, I don't have enough karma to have down arrows next to this post, so I'm going to write a cranky comment about this article.
You might want to hold off on your cranky comment. Would you be surprised if the search for 'Zagg' did NOT turn up the results for 'Zaggs'? Because for better or worse, that's what actually happens.
GMail search does not perform stemming (like removing that final 's') and also does not allow for substring searches. So in fact, a search for 'Zagg' will return nothing. While this isn't a fatal flaw, it is a drag.
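The "Zag"/"Zagg"/"Zaggs" argument above comes down to whole-word matching versus substring matching. A minimal sketch, assuming a simple lowercase word tokenizer; `token_match` and `substring_match` are hypothetical helpers for illustration, not Gmail's implementation:

```python
import re

def token_match(text, query):
    # Whole-word match: the query must equal one of the message's tokens.
    return query.lower() in re.findall(r"[a-z0-9]+", text.lower())

def substring_match(text, query):
    # Substring match: the query may appear anywhere, even inside a word.
    # A plain word-level inverted index can't answer this without scanning
    # every indexed token, which is why it's costly at scale.
    return query.lower() in text.lower()

message = "My new Zaggs case arrived"
for q in ["zags", "zag", "zagg", "zaggs"]:
    print(f"{q:6} token={token_match(message, q)} substring={substring_match(message, q)}")
```

Under substring matching, "zag" and "zagg" both hit, as the case laid out above assumes; under whole-word matching, only the exact word "zaggs" does, which is the behavior reported for Gmail.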
I don't know how Yahoo do it, but this guy should at least present a solution to the problem. It reminds me of that Alexei Sayle sketch where he says 'I blame the council' (which in the UK is the local authority and handles all sorts of things in a town or city), and at the end, wanting someone to blame for blaming the council? He blames the council!
It's less 'big brother'y, which may explain why poor e-mail search doesn't bother me. It's as though it's waiting for me to get the answer first.
That feature's absence is very obvious when struggling to find data in gmail.
Personally, I use subject-line tags (like 'music', 'todo', 'idea') for stuff I want to filter on, and when I store something in Gmail that I want to remember, I make sure the key words I will search for are very obvious (and easy to spell).
The curious thing is that Google Groups is even more painful, which is where you would think the indexes would be most worth having, because more people are going to search the same data.
Which is a shame, since 5 years ago Usenet search was absolutely wonderful.
(or http://git.markmail.org/search/?q=#query:type%3Adevelopment if you want to use the list-specific site).
Now I'm wondering if it wasn't me at all.