Is it not a feasible fix for Spectre-v1 to have loaded cachelines wait in a staging area, and not actually update the cache hierarchy until the load instruction is retired?
Keep in mind that it's not just the L1 cache you potentially need to worry about, but also all the higher-level caches, which are really like remote nodes in a distributed system. Even the last-level cache would have to be aware of the current speculation state of all threads in the system, which seems like it would be really expensive.
What's perhaps even worse, you wouldn't be able to hit on speculatively loaded cache lines until they have "committed" (because the resulting change in external bandwidth utilization may be observable from another thread), which would probably kill a lot of the benefits of the cache in the first place.
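To make that cost concrete, here is a toy sketch in Python of the staging idea for a single cache level. It is only an illustration under simplifying assumptions, not how any real CPU works: `StagedCache` and its method names are made up, there is one cache and one thread, and "memory fetches" stand in for the externally observable bandwidth.

```python
from dataclasses import dataclass, field


@dataclass
class StagedCache:
    """Toy model: speculative fills wait in a staging area until retirement."""

    committed: set = field(default_factory=set)  # lines visible in the cache
    staged: dict = field(default_factory=dict)   # load id -> staged line address
    memory_fetches: int = 0                      # stand-in for observable traffic

    def speculative_load(self, load_id, line):
        # Only committed lines may hit. A staged line must not hit, because
        # the fetch it would save is itself observable from outside.
        if line in self.committed:
            return "hit"
        self.memory_fetches += 1
        self.staged[load_id] = line
        return "miss"

    def retire(self, load_id):
        # The load became architectural: its line may now update the cache.
        line = self.staged.pop(load_id, None)
        if line is not None:
            self.committed.add(line)

    def squash(self, load_id):
        # Mis-speculated load: drop the staged line, leave the cache untouched.
        self.staged.pop(load_id, None)


cache = StagedCache()
first = cache.speculative_load(1, 0x40)   # miss, line goes to staging
second = cache.speculative_load(2, 0x40)  # still a miss: staged lines cannot hit
cache.squash(1)                           # load 1 was on a wrong path
cache.retire(2)                           # load 2 retires, line is committed
third = cache.speculative_load(3, 0x40)   # only now does the line hit
```

Even in this tiny model the second load to the same line pays for a second fetch, which is the lost cache benefit described above. The sketch also ignores the harder part: every higher-level cache would need the same bookkeeping for every thread.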
The scarier part is managing bandwidth contention to shared structures.