For example, last I heard, Google's search index was sharded by document rather than by term.
That sounds odd: if it were sharded by term, a given search would only need to go to a handful of servers (one per term), with the intermediate results combined afterward. But with it sharded by document, every query has to go to all the nodes in each replica/cluster.
It turns out that's not as bad as it seems. Since everything is in RAM, a node can answer "no matches" extremely quickly (using Bloom filters that mostly sit in on-processor cache, or the like). Nodes also send back only truly matching results, rather than intermediates that might be discarded later, saving cluster bandwidth. Lastly, it means the search processing is independent of the cluster communication, giving them a lot of flexibility to tweak the code without structural changes to their network, etc.
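To make that concrete, here's a toy sketch of the document-sharded layout: every shard indexes only its own documents, every query fans out to all shards, and a small Bloom filter lets a shard bail out on "no matches" before touching its index. All names here (`DocShard`, `scatter_gather`, the sample docs) are made up for illustration; this is nothing like Google's actual implementation.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array.
    It has no false negatives, so a "definitely absent" answer is safe."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _probes(self, term):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, term):
        for p in self._probes(term):
            self.bits |= 1 << p

    def might_contain(self, term):
        return all(self.bits >> p & 1 for p in self._probes(term))

class DocShard:
    """One document-sharded node: a full inverted index over its docs only."""
    def __init__(self, docs):
        self.index = {}            # term -> set of local doc ids
        self.bloom = BloomFilter()
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.index.setdefault(term, set()).add(doc_id)
                self.bloom.add(term)

    def search(self, terms):
        # Fast path: if any term is definitely absent on this shard,
        # answer "no matches" without touching the index at all.
        if not all(self.bloom.might_contain(t) for t in terms):
            return set()
        # Only truly matching docs go back over the wire -- no intermediates.
        postings = [self.index.get(t, set()) for t in terms]
        return set.intersection(*postings) if postings else set()

def scatter_gather(shards, query):
    """Every query fans out to all shards; each returns only final matches."""
    terms = query.lower().split()
    return set().union(*(s.search(terms) for s in shards))

shards = [
    DocShard({1: "distributed search index", 2: "bloom filter basics"}),
    DocShard({3: "search at scale", 4: "sharding by document"}),
]
print(scatter_gather(shards, "search index"))   # {1}
print(scatter_gather(shards, "nonexistent"))    # set()
```

Note how the intersection happens entirely inside each shard, which is why only final matches cross the network.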
Does that mean everyone doing search should shard the same way? Probably not. You have to design this stuff carefully and mind the details. Using any given data store is not a silver bullet.
If you shard a search index by term instead, any document containing more than one term ends up duplicated across the shards for each of its terms. And for a large index, you have to scale up your hardware to handle hot terms, or shard by document within each term anyway.
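A quick sketch of that duplication, under a hypothetical term-sharded layout where each posting is routed by hashing its term (the routing scheme and sample docs are made up for illustration):

```python
import zlib

docs = {
    1: "distributed search index",
    2: "search at scale",
}
n_shards = 4
shards = [{} for _ in range(n_shards)]   # each shard: term -> set of doc ids

for doc_id, text in docs.items():
    for term in set(text.lower().split()):
        # Route each posting by term hash (crc32 is stable across runs).
        bucket = zlib.crc32(term.encode()) % n_shards
        shards[bucket].setdefault(term, set()).add(doc_id)

# Doc 1 has three distinct terms, so it is referenced three times in total,
# once per term, wherever those terms' shards happen to be.
refs = sum(1 for s in shards for ids in s.values() if 1 in ids)
print(refs)   # 3
```

So a document with k distinct terms shows up k times across the cluster, which is the duplication cost of term sharding.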
Exactly. Then a flash crowd hits a hot term and that shard fails to service requests.