You're saying that if the other methods were given the equivalent amount of compute they might be able to perform as well as AlphaChip? Or at least that the comparison would be fairer?
Existing mixed-placement algorithms depend on hyperparameters, heuristics, and initial states and randomness. Given more compute, they can explore a much wider space and in theory come up with better solutions. Some algorithms, like simulated annealing, are easy to modify to exploit arbitrarily more compute. Indeed, I believe the comparison of AlphaChip to the alternatives would be fairer if compute resources and allowed runtime were matched.
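To make that concrete, here is a minimal, purely illustrative Python sketch of how an annealer's compute knobs scale: `n_iters`, `n_restarts`, and the cooling schedule are all hyperparameters, and the `cost`/`neighbor` functions stand in for a real placement cost and move generator. This is not the annealer from the Nature paper or from any of the rebuttals, just the generic shape of the algorithm.

```python
import math
import random

def simulated_annealing(init_state, cost, neighbor, n_iters,
                        t_start=1.0, t_end=1e-3, seed=0):
    """Generic SA loop. More iterations means a longer cooling schedule and a
    wider explored space; `cost` and `neighbor` hold all the problem-specific
    logic (for placement: a proxy wirelength/congestion cost and macro moves/swaps)."""
    rng = random.Random(seed)
    state = best = init_state
    state_cost = best_cost = cost(state)
    for i in range(n_iters):
        # Geometric cooling; the schedule itself is another tunable hyperparameter.
        t = t_start * (t_end / t_start) ** (i / max(1, n_iters - 1))
        cand = neighbor(state, rng)
        cand_cost = cost(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_cost < state_cost or rng.random() < math.exp((state_cost - cand_cost) / t):
            state, state_cost = cand, cand_cost
            if state_cost < best_cost:
                best, best_cost = state, state_cost
    return best, best_cost

def anneal_with_budget(init_state, cost, neighbor, total_evals, n_restarts=8):
    """The simplest way to spend extra compute: independent restarts with
    different seeds, keeping the best result. Doubling `total_evals`
    directly doubles the search effort."""
    runs = [simulated_annealing(init_state, cost, neighbor,
                                n_iters=total_evals // n_restarts, seed=r)
            for r in range(n_restarts)]
    return min(runs, key=lambda rc: rc[1])
```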
In fact, existing algorithms such as naive simulated annealing can be easily augmented with ML (e.g. using state embeddings to tune hyperparameters for a given problem instance, or using a regression model to fine-tune proxy costs so they correlate better with final QoR). Indeed, I strongly suspect commercial CAD software already applies ML in many ways to mixed-placement and other CAD algorithms. The criticism of AlphaChip isn't a rejection of applying ML to EDA CAD algorithms; it's about the particular formulation they used and objections to their reported results and comparisons.
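As a hedged sketch of the second idea (not anything a vendor has published): train a small regression model that maps cheap proxy metrics of a placement, e.g. HPWL, congestion and density estimates, onto the final QoR measured after the full flow, then let the annealer optimize the calibrated prediction instead of the raw proxy. The example assumes scikit-learn is available; the class name and feature set are made up for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class CalibratedProxyCost:
    """Illustrative sketch: learn a mapping from cheap proxy metrics
    (e.g. HPWL, congestion estimate, cell density) to the final QoR
    measured after the full flow, then use the prediction as the
    cost function inside an annealer or other placer."""

    def __init__(self):
        self.model = GradientBoostingRegressor()

    def fit(self, proxy_features, final_qor):
        # proxy_features: shape (n_past_placements, n_metrics)
        # final_qor:      shape (n_past_placements,), lower is better
        self.model.fit(np.asarray(proxy_features), np.asarray(final_qor))
        return self

    def __call__(self, proxy_features_one):
        # Score one candidate placement by its predicted final QoR.
        x = np.asarray(proxy_features_one, dtype=float).reshape(1, -1)
        return float(self.model.predict(x)[0])
```

In use, you would featurize each candidate placement and hand the calibrated score to the annealer as its `cost` function; how well this works depends entirely on how representative the historical (features, QoR) pairs are.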
That sounds like future work for simulated annealing fans to engage in, quite honestly, rather than something that needs to be addressed immediately in a paper proposing an alternative method. The proposed method accomplished what it set out to do, surpassing current methods; others are free to explore different hyperparameters and surpass that quality in turn. This is, ultimately, why we build benchmark tasks: if you want to prove you know how to do it better, you are free to just go do it better instead of whining about what the competition did or didn't try on your behalf.
Yes, they are. The other approaches usually look like simulated annealing, which has several hyperparameters that control how much compute is spent, and results generally improve as you spend more.
I understand and have read the article. Running 80 experiments with a crude form of simulated annealing is at most 0.0000000001% of the effort that traditional EDA vendors have put into making that kind of hill climb work well. It is also an in-sample comparison: I would expect the Google method, pre-trained on Google chips, to do well there, while it might have a harder time with a chip designed by a third party (further from its pre-training distribution).
The modern versions of that hill climb also use some RL (placing and routing chips is sort of like a game), but not in the way Jeff Dean wants it to be done.
The comparison in that paper was very much not fair to Google's method. Google's original published comparison to simulated annealing is not fair to simulated annealing methods. That is, unfortunately, part of the game of publication when you want to publish a marginal result.
It is possible that the pre-training step overfits to a particular class of chips, or fails to converge given a general sample of chip designs. That would make the pre-training step unusable in the setting of a commercial EDA tool. The people who actually know are the smart, non-arrogant people at EDA companies who benchmarked this stuff before deciding not to adopt it.
If you want to make a good-faith assumption (that IMO is unwarranted given the rest of the paper), the people trying to replicate Google's paper may have done a pre-training step that failed to converge, and then didn't report it. That failure to converge could be due to ineptitude, but it could be due to data quality, too.
The Google internal paper by Chatterjee and the Cheng et al. paper from UCSD made such comparisons with simulated annealing. The annealer in the Nature paper was easy to improve: given the same time budget, the improved annealer produced better solutions than AlphaChip, and when you give both more time, SA remains ahead. Just read the published papers.
The UCSD paper didn't run the Nature method correctly, so I don't see how you can draw this conclusion.
From Jeff's tweet:
"In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article), robbing our learning-based method of its ability to learn from other chip designs, then used 20X less compute and did not train to convergence, preventing our method from fully learning even on the chip design being placed."
As for Chatterjee's paper, "We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline. We still do not know how Markov and his collaborators produced the numbers in their paper."
Are the other methods scalable in that way?