This is something we probably should have addressed directly in the article, given how popular Spark is. The main reason we didn't is that we like Spark and don't feel it needs to be replaced. It doesn't seem to have most of the ecosystem problems we discuss because it's got a company behind it. My understanding of Spark is that it's designed to be used with different storage backends, and I'm very curious to see what would happen if we got it talking to pfs. I think it could work very well because Spark's notion of immutable RDDs seems very similar to the way pfs handles snapshots.
Hope this clears a few things up, and apologies if I've mischaracterized Spark here; I've only used it a little bit.