Websolr's indexes return in under 50ms for queries of average complexity.
The problem with this is that you have to count over a million records, on every search! Unlike most search operations which are IO bound, the counting can be CPU-bound, so sharding on one box will let you take advantage of multiple cores.
Racoonone is "sorting on two dimensions, a geo bounding box, four numeric range filters, one datetime range filter, and a categorical range filter." This should put him in a cpu-bound range (in particular because of the sort).
Websolr has customers on sharded plans, but they are usually used in custom sales cases where we're serving many, many millions of documents. We'll look at adding sharding as an option to our default plans, so that they'll be more accessible for people like raccoonone. In the meantime, if you send an email to firstname.lastname@example.org, we'll try to accomodate use cases like this.
Edit: Also, other possible optimizations include (1) indexing in the same order you will sort on, if you know ahead of time, and (2) using the TimeLimitedCollector.