You touched on what I was driving at in my original comment. There is no doubt whatsoever that if you were able to make a full pass over the corpus with a regular expression, you could find docs that you can't find with Google's search. But that's obviously not how their search works. They have to make it work at their scale, which dictates the format of the index, which in turn limits which query operators are possible. They have to make these design choices so that their product can exist at all.
"Grep the world" is a fine strategy for corpora up to a certain size, and I do wish there were a product that just stored everything I've ever seen and let me run expensive searches on that.
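To make the tradeoff concrete, here's a toy sketch. It is not how Google actually does it; it's just a minimal inverted index, assuming a naive word tokenizer and a made-up three-document corpus (all names are hypothetical). A term lookup only touches the posting sets for the query terms, while the regex query has to scan every document. The index can only answer questions about whole tokens, so a pattern like "E followed by four digits", or a partial token like "E12", is out of reach unless the index was built for it.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; a real engine would have billions of documents.
corpus = {
    1: "the quick brown fox",
    2: "error code E1234 in module foo_bar",
    3: "brown bears eat honey",
}

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    # Map each token to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def term_query(index, *terms):
    # AND query: intersect posting sets. Cost depends on the posting
    # sets for the query terms, not on the total size of the corpus.
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def grep_query(docs, pattern):
    # "Grep the world": a full pass, so cost grows with corpus size,
    # but any pattern the regex engine supports is allowed.
    rx = re.compile(pattern)
    return {doc_id for doc_id, text in docs.items() if rx.search(text)}

index = build_inverted_index(corpus)
print(term_query(index, "brown"))     # {1, 3}
print(grep_query(corpus, r"E\d{4}"))  # {2}
print(term_query(index, "E12"))       # set(): a partial token is not an indexed term
```

The index is cheap to query but fixes the set of operators up front; the full scan is flexible but only affordable while the corpus stays small.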