Say you have one collection for your people (sharded over 100 servers, say) and ...

Say you have one collection for your people (sharded over 100 servers, say) and another one for conferences (also sharded over 100 machines). Then you could hold the primary keys of all conferences a person attended in a JSON list stored in an attribute with the user. A query finding all people with last name "Jones" that have attended a given conference can now be executed efficiently by using a secondary index on the last name of people and performing a key lookup in the conferences collection. The latter only has to talk to one shard, if the conferences collection is sharded by key and can thus be done efficiently. Obviously, one needs a query optimizer that is aware of the distribution of the shards and the shard keys, but this is certainly doable.