"Getting your working set in memory is one of the most difficult things to calculate and plan for with MongoDB."
I am a little bothered when I see a working set described as though it were a fixed property of the user or workload.
A large part of database research has gone into allowing the user or the system to make the working set smaller. An obvious example is an index, which makes the working set smaller if you don't mind a random I/O or two per lookup (of course, that only works for indexable queries).
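To make that concrete, here's a minimal sketch (not any real database's implementation; the page size and data are invented) of how an index shrinks the working set: a full scan must touch every page, while an indexed lookup keeps only a small in-memory index hot and reads one data page per hit.

```python
# Hypothetical sketch: pages touched per lookup, with and without an index.
PAGE_SIZE = 100  # rows per page (assumed, for illustration)

rows = [{"id": i, "payload": "x" * 10} for i in range(10_000)]
pages = [rows[i:i + PAGE_SIZE] for i in range(0, len(rows), PAGE_SIZE)]

def scan_lookup(key):
    """Full scan: the working set is every page of the table."""
    touched = 0
    for page in pages:
        touched += 1
        for row in page:
            if row["id"] == key:
                return row, touched
    return None, touched

# The index itself is small; it maps key -> page number.
index = {row["id"]: page_no
         for page_no, page in enumerate(pages)
         for row in page}

def indexed_lookup(key):
    """Index probe in memory, then one 'random I/O' to a data page."""
    page_no = index[key]
    for row in pages[page_no]:
        if row["id"] == key:
            return row, 1  # only one data page touched
```

The trade is exactly the one described above: the working set drops from the whole table to the index plus whichever pages you actually hit, paid for with a random read per lookup.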
There are also many database operations designed to work within a limited amount of memory, and which therefore must have a small working set regardless of the data size. Sort and HashJoin are two examples. HashJoin doesn't tell you what your data's working set is; you tell it how much memory it may use, and it works as efficiently as it can within that budget.
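A Grace-style hash join illustrates the point. This is a toy sketch (partition count and data are made up, and real systems spill partitions to disk rather than holding them in lists): both inputs are hash-partitioned first, so only one build-side partition needs to fit in memory at a time, no matter how large the inputs are.

```python
# Grace-style hash join sketch: the memory budget, not the data size,
# determines the working set. NUM_PARTITIONS would be derived from the
# budget; in a real system each partition is spilled to disk.
NUM_PARTITIONS = 4

def partition(rows, key):
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[hash(row[key]) % NUM_PARTITIONS].append(row)
    return parts

def grace_hash_join(left, right, key):
    out = []
    # Matching keys always land in the same partition pair, so each pair
    # can be joined independently with only that pair resident in memory.
    for lpart, rpart in zip(partition(left, key), partition(right, key)):
        table = {}
        for row in lpart:                       # build phase: one partition
            table.setdefault(row[key], []).append(row)
        for row in rpart:                       # probe phase
            for match in table.get(row[key], []):
                out.append({**match, **row})
    return out
```

Doubling the partition count roughly halves the per-partition memory needed, which is exactly the "you tell it the budget" knob.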
And you can design your data layout to have a smaller working set (again, so long as you allow a few disk accesses outside the working set). Normalization and vertical partitioning (i.e. splitting a table up into several tables with fewer columns each) can help here.
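As a sketch of the vertical-partitioning idea (the column split and data here are invented, not a recommendation for any particular schema): pull the frequently scanned "hot" columns into a narrow table and leave the wide "cold" columns in a separate one joined back by key, so routine scans keep far fewer bytes resident.

```python
# Vertical partitioning sketch: one wide table split into a narrow hot
# table (scanned often, stays cached) and a cold table (fetched on demand).
wide = [{"id": i, "name": f"u{i}", "bio": "long text " * 50}
        for i in range(1_000)]

hot = [{"id": r["id"], "name": r["name"]} for r in wide]   # small working set
cold = {r["id"]: {"bio": r["bio"]} for r in wide}          # read per row

# A scan over names now touches only the narrow hot table...
names = [r["name"] for r in hot]

# ...and a full-record read joins back on id, costing one extra access.
full = {**hot[42], **cold[hot[42]["id"]]}
```

The working set for the common scan shrinks to the hot table; the cold columns cost the "few disk accesses outside the working set" mentioned above.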
So, the "working set" isn't some passive constant that can't be managed.
Isn't one of the defining characteristics of Mongo-like systems that they tend to aggressively favor using RAM to achieve performance? I.e., their strategy is to complete each query as quickly as possible rather than attempting to optimize for fairness across long-running transactions.
OK, it just seemed to me that the stated goal of Mongo-like servers was to take some of our accepted notions of the "working set as it relates to database software" and put them up for fresh discussion.