Good. I've always thought throwing distributed brute force at a problem is a sign of laziness. You can get 32-core systems with 128GB of RAM for less than $10K, and they will have orders of magnitude faster turnaround than your average Hadoop or Spark cluster.
Exactly. Compare the inter-core bandwidth and latency of a beefy multicore system against those of a distributed system, then use that ratio (for whatever your actual bottleneck is) to estimate how many machines a comparable cluster would need. People usually assume they are compute bound when in reality they are memory latency/bandwidth bound, and they neglect to do this comparison when speccing out a distributed system.
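A rough back-of-envelope sketch of that comparison (all bandwidth and latency figures below are illustrative assumptions, not measurements of any particular hardware):

```python
# Back-of-envelope sizing: how many cluster nodes does it take to move data
# as fast as one big multicore box, if the real bottleneck is data movement
# rather than CPU? All numbers are assumed for illustration only.

single_node_mem_bw_gbps = 200.0   # assumed aggregate DRAM bandwidth of one box (GB/s)
cluster_net_bw_gbps     = 1.25    # assumed per-node network bandwidth, ~10 GbE (GB/s)

single_node_latency_us  = 0.1     # assumed DRAM access latency (microseconds)
cluster_latency_us      = 500.0   # assumed in-cluster network round trip (microseconds)

# If the workload is bandwidth bound, you need roughly this many cluster nodes
# just to shuffle data as fast as the single box streams from its own RAM:
nodes_for_bandwidth = single_node_mem_bw_gbps / cluster_net_bw_gbps

# If it is latency bound, adding nodes does not help a serialized round trip
# at all, so just report the raw ratio:
latency_ratio = cluster_latency_us / single_node_latency_us

print(f"Nodes needed to match memory bandwidth: ~{nodes_for_bandwidth:.0f}")
print(f"Network vs DRAM latency ratio:          ~{latency_ratio:.0f}x")
```

With those assumed numbers the bandwidth gap alone is well over a hundred nodes, and the latency gap is thousands of times worse, which is the comparison people skip.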
The point is that a cluster is often not nearly as necessary as it might seem, because instead of doing computation on large chunks of well-organized data, lots of the machines end up stuck on latency-bound tasks.