In the tweet, Jeff Dean says that Cheng et al. failed to follow the steps required to replicate the work of the Google researchers.
Specifically:
> In particular the authors did no pre-training (despite pre-training being mentioned 37 times in our Nature article), robbing our learning-based method of its ability to learn from other chip designs
But in the Circuit Training Google repo[1] they specifically say:
> Our results training from scratch are comparable or better than the reported results in the paper (on page 22) which used fine-tuning from a pre-trained model.
I may be misunderstanding something here, but which one is it? Did they mess up by not pre-training, or did they follow the "steps" described in the original repo and try to get a fair reproduction?
Also, the UCSD group had to reverse-engineer several steps to reproduce the results, so it seems the paper on its own wasn't enough to reproduce them.

[1]: https://github.com/google-research/circuit_training/blob/mai...
Markov’s paper also links to Google papers from two different sets of authors that show minimal advantage from pretraining. And given the small number of benchmarks, using a pretrained model from Google whose provenance is not known would be counterproductive. Google likely trained it on all the available benchmarks, so it would regurgitate the best solutions of commercial tools.
Training from scratch could presumably mean training on the new design attempts with the old designs mixed in.
So there's no contradiction: pretrain on old designs then finetune on the new design, versus train on everything mixed together throughout. Finetuning can cause catastrophic forgetting, and either approach could perform better than leaving the old designs out entirely.
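To make the distinction concrete, here's a minimal sketch of the two regimes in plain Python. Every name in it is a placeholder for illustration, not anything from the Nature paper or the Circuit Training repo:

```python
import random

def train_step(model, batch):
    # Placeholder for one gradient update on `batch`.
    model["updates"] = model.get("updates", 0) + 1
    return model

def train(model, batches):
    for batch in batches:
        model = train_step(model, batch)
    return model

# Regime A: pretrain on old designs, then finetune on the new design alone.
# The finetuning phase is where catastrophic forgetting of the old designs can occur.
def pretrain_then_finetune(model, old_design_batches, new_design_batches):
    model = train(model, old_design_batches)   # pretraining phase
    return train(model, new_design_batches)    # finetuning phase

# Regime B: train "from scratch", but with old and new designs mixed together,
# so the old designs keep contributing throughout training.
def train_on_mixture(model, old_design_batches, new_design_batches):
    mixed = list(old_design_batches) + list(new_design_batches)
    random.shuffle(mixed)
    return train(model, mixed)

# Tiny usage example with dummy "batches".
model = pretrain_then_finetune({}, old_design_batches=[1, 2, 3], new_design_batches=[4])
print(model)  # {'updates': 4}
```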
> Did they mess up by not pre-training, or did they follow the "steps" described in the original repo and try to get a fair reproduction?
The Circuit Training repo was just going through an example. It is common for an open-source repo to describe simple examples for testing / validating your setup --- that does not mean this is how you should get optimal results in general. The confusion may stem from their statement that, in this example, they produced results that were comparable with the pre-trained results in the paper. This is clearly not a general repudiation of pre-training.
If Cheng et al. genuinely felt this was ambiguous, they should have reached out to the corresponding authors. If they ran into some part of the repo they felt they had to "reverse-engineer", they should have asked about that, too.
"These major methodological differences unfortunately invalidate Cheng et al.’s comparisons with and conclusions about our method. If Cheng et al. had reached out to the corresponding authors of the Nature paper[8], we would have gladly helped them to correct these issues prior to publication[9].
[8] Prior to publication of Cheng et al., our last correspondence with any of its authors was in August of 2022 when we reached out to share our new contact information.
[9] In contrast, prior to publishing in Nature, we corresponded extensively with Andrew Kahng, senior author of Cheng et al. and of the prior state of the art (RePlAce), to ensure that we were using the appropriate settings for RePlAce."
That is misleading. The first two authors left Google in August 2022 under unclear circumstances. The code and data were owned by Google, which is probably why Kahng continued discussing the code and data with his Google contacts. He received clear answers from several Google employees, so if they were at fault, Google should apologize rather than blame Cheng and Kahng.
"Prior to publication of Cheng et al., our last correspondence with any of its authors was in August of 2022 when we reached out to share our new contact information."
You don't stop being the corresponding authors of a paper when you change companies, and whatever "unclear circumstances" you imagine took place when they left, they were also re-hired later, which a company would only do if they were in good standing.
In any case, those "Google contacts" also expressed concerns about how Cheng et al. were conducting their study, concerns that Cheng et al. ignored:
3.4 Cheng et al.’s Incorrect Claim of Validation by Google Engineers
Cheng et al. claimed that Google engineers confirmed its technical correctness, but this is untrue. Google engineers (who were not corresponding authors of the Nature paper) merely confirmed that they were able to train from scratch (i.e. no pre-training) on a single test case from the quick start guide in our open-source repository. The quick start guide is of course not a description of how to fully replicate the methodology described in our Nature paper, and is only intended as a first step to confirm that the needed software is installed, that the code has compiled, and that it can successfully run on a single simple test case (Ariane).
In fact, these Google engineers share our concerns and provided constructive feedback, which was not addressed. For example, prior to publication of Cheng et al., through written communication and in several meetings, they raised concerns about the study, including the use of drastically less compute, and failing to tune proxy cost weights to account for a drastically different technology node size.
The Acknowledgements section of Cheng et al. also lists the Nature corresponding authors and implies that they were consulted or even involved, but this is not the case. In fact, the corresponding authors only became aware of this paper after its publication.
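For context on the "proxy cost weights" mentioned above: the Nature paper scores a placement with a weighted combination of approximate wirelength, congestion, and density, and the RL reward is the negative of that score. A minimal sketch of that weighted-sum form (the weight values here are placeholders, not the paper's settings):

```python
def proxy_cost(wirelength, congestion, density,
               congestion_weight=0.5, density_weight=0.5):
    # Lower is better; the RL agent's reward is the negative of this cost.
    # Adjusting congestion_weight / density_weight is what "tuning proxy cost
    # weights" refers to, e.g. when targeting a different technology node.
    return wirelength + congestion_weight * congestion + density_weight * density
```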