To be clear: the webspam team does reserve the right to take manual action to correct spam problems, and we do. That not only helps Google be responsive, it also improves our algorithms because we get use that data to train better algorithms. With Stack Overflow, I especially wanted to see Google tackle this instance with algorithms first.
Sites that are the victims of content cloning have to be very visible and valuable, so maybe a little manual curating could be relevant.
> the Stack Overflow cloners could just make other websites
Not really? The point is not to tag the clones but to tag the original; everything that is not the original and that has copied content is a clone -- its name, domain or country notwithstanding.
the primary input to search engines comes from web crawlers...the idea of "first" when it comes to duplicated content is already difficult to determine, and (I would guess) it would get much much worse in the inevitable arms race if something like this were implemented.
Because we don't understand what's hard, we think you're not really trying, and then we make up evil reasons to explain that.
I believe if people understood better the difficulties of spam fighting they would be more understanding.
Not necessarily. The rate at which Google refreshes its crawl of a site, and how deep it crawl, depend on how often a site updates and its PageRank numbers. If a scraper site updates more often and has higher PR than the sites it's scraping, Google will be more likely to find the content there than at its source. Identifying the scraper copy as canonical because it was encountered first would be wrong.
But I'd like to say one other thing. Why is Google only doing something about web spam now after people have pointed out how bad things have been getting? Has anybody considered creating a small team to just oversee public perceptions of the search results and try to keep on-top of things in the future?