
> One key argument in the rebuttal against the ISPD article is that the resources used in their comparison were significantly smaller. To me, this point alone seems sufficient to question the validity of the ISPD work's conclusions. What are your thoughts on this?

I believe this is a fair criticism, and it could be a reason why the ISPD TensorBoard shows divergence during training for some RTL designs. The ISPD authors provide their own justification for substituting training time for compute resources on page 11 of their paper (https://arxiv.org/pdf/2302.11014).

I do not think it changes the ISPD work's conclusions, however, since they demonstrate that CMP and AutoDMP outperform CT with respect to both QoR and runtime even though they use far fewer compute resources. If more compute is used and CT becomes competitive on QoR, it will still lag behind on runtime. Furthermore, Google has not produced evidence that AlphaChip, with its substantial compute resources, outperforms commercial placers (or even AutoDMP). In the recent rebuttal from Google (https://arxiv.org/pdf/2411.10053), the only such claim, on page 8, is that Google VLSI engineers preferred RL over humans and commercial placers in a blind study conducted in 2020. Commercial mixed placers, if configured correctly, have become very good over the past 4 years, so perhaps another blind study is warranted.

> Additionally, I noticed that the neutral tone of this comment is quite a departure from the strongly critical tone of your article

I will openly admit my bias is against the AlphaChip work. I referred to the Nature authors as 'arrogant' and 'disdainful' with respect to their statement that EDA CAD engineers are just being bitter ML-haters when they criticize the AlphaChip work. I referred to Jeff Dean as 'belittling' and 'hostile' and using 'hyperbole' with respect to his statements against Igor Markov, which I think is unbecoming of him. I referred to Shankar as 'excellent' with respect to his shrewd business acumen.




Thank you for your thoughtful response. Acknowledging potential biases openly in a public forum is never easy, and in my view, it adds credibility to your words compared to leaving such matters as implicit insinuations.

That said, on page 8, the paper says that 'standard licensing agreements with commercial vendors prohibit public comparison with their offerings.' Given this inherent limitation, what alternative approach could have been taken to enable a more meaningful comparison between CT and CMP?


So I'm not sure what Google is referring to here. As you can see in the ISPD paper (https://vlsicad.ucsd.edu/Publications/Conferences/396/c396.p...) on page 5, they openly compare Cadence CMP with AutoDMP and other algorithms quantitatively. The only obfuscation is with the proprietary GF12 technology, where they can provide only relative numbers, not absolute ones. Comparison against commercial tools is actually common practice in academic EDA CAD papers, although usually the exact tool vendor is obfuscated. CAD tool vendors have actually become more permissive about sharing tool data and scripts in public over the past few years. However, PDKs have always been under NDAs and are still very restrictive.

Perhaps the Cadence license agreement signed by a corporation is different from the one signed by a university. In that case, they could partner with a university. But I doubt their license agreement prevents any public comparison. For example, see the AutoDMP paper from NVIDIA (https://d1qx31qr3h6wln.cloudfront.net/publications/AutoDMP.p...), where on page 7 they openly benchmark their tool against Cadence Innovus. My suspicion is that they wish to keep details about the TPU blocks they evaluated under wraps.


The UCSD paper says "We thank ... colleagues at Cadence and Synopsys for policy changes that permit our methods and results to be reproducible and sharable in the open, toward advancement of research in the field." This suggests that there may have been policies restricting publication prior to this work. It would be intriguing to see if future research on AlphaChip could receive a similar endorsement or support from these EDA companies.


Cadence in particular has been quite receptive to allowing academics and researchers to benchmark new algorithms against their tools. They have also been quite permissive about letting people publish TCL scripts for their tools (https://github.com/TILOS-AI-Institute/MacroPlacement/tree/ma...) that in theory should enable precise reproduction of results. To my knowledge, Cadence has been very permissive from 2022 onwards, so while Google's objections to publishing data from CMP may have been valid when the Nature paper was published, they are no longer valid today.
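
For concreteness, here is a minimal, hypothetical sketch of the kind of Innovus TCL flow such published scripts contain. The command names follow common Innovus usage but are my own assumption of a generic flow, not lines copied from the MacroPlacement repository, and file names are placeholders:

    # Hypothetical Innovus-style macro placement flow (sketch only;
    # command names follow common Innovus usage and may vary by tool
    # version / UI mode).
    read_mmmc design.mmmc.tcl            ;# timing views and corners
    read_physical -lef {tech.lef macros.lef stdcells.lef}
    read_netlist design.v
    init_design
    read_def floorplan.def               ;# die/core area, pins, blockages
    place_design -concurrent_macros      ;# invoke the concurrent macro placer (CMP)
    place_opt_design                     ;# standard-cell placement + optimization
    report_timing -max_paths 10          ;# spot-check post-placement QoR

The real published scripts are presumably much longer (full MMMC setup, placement constraints, and so on), but this level of scripted detail is exactly what makes independent reproduction of CMP results feasible.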


We're not just talking about academia—Google's AlphaChip has the potential to disrupt the balance of the EDA industry's duopoly. It seems unlikely that Google could easily secure the policy or license changes necessary to publish direct comparisons in this context.

If publicizing comparisons against CMP is as permissible as you suggest, have you seen a publication that directly compares a Cadence macro placement tool with a Synopsys one? If I were the technically superior party, I'd be eager to showcase the fairest possible comparison, complete with transparent benchmarks and tools. In the CPU design space, we often see standardized benchmark suites like SPEC and gaming benchmarks. (And IMO that's part of why AMD could disrupt the PC market.) Does the EDA ecosystem support a similarly open culture of benchmarking for commercial tools?


> Does the EDA ecosystem support a similarly open culture of benchmarking for commercial tools?

If only. The comparison in Cheng et al. is the only public comparison with CMP that I can recall, and it is pretty suspect that it just so happens to be a very pro-commercial-autoplacer study. (And Cheng et al. have cited 'licensing agreements' as a reason for not releasing the synthesized netlists necessary to reproduce their results.)

This reminds me a bit of Oracle, which likewise used to (and maybe still does?) prohibit any benchmarking of its database software against that of another vendor. This seems to be a common move for solidifying a strong market position.


I am trying to understand what you mean here by "potential to disrupt". AlphaChip addresses one out of hundreds of tasks in chip design. Macro placement is a part of mixed-size placement, which is handled just fine by existing commercial tools, many academic tools, open-source tools, and Nvidia AutoDMP. Even if AlphaChip were commonly accepted as a breakthrough, there would be no disruption here. Direct comparisons from the last 3 years show that AlphaChip is worse. Granted, Google is belittling these comparisons, but that's what you'd expect. In any case, evidence is evidence.


> Direct comparisons from the last 3 years show that AlphaChip is worse.

Do you have any evidence for this claim? The whole point of this thread is that the direct comparisons might have been insufficient, and even the author of "The Saga" article, who is openly biased against the AlphaChip work, agreed.

> Granted, Google is belittling these comparisons, but that's what you'd expect.

This kind of language doesn't help any position you want to advocate.

About "the potential to disrupt", a potential is a potential. It's an initial work. What I find interesting is that people are so eager to assert that it's a dead-end without sufficient exploration.


> direct comparisons in Cheng

That's the ISPD paper referenced many times in this whole thread.

> Stronger Baselines

Re: "Stronger baselines", the paper "That Chip Has Sailed" says "We provided the committee with one-line scripts that generated significantly better RL results than those reported in Markov et al., outperforming their “stronger” simulated annealing baseline." What is your take on this claim?

As for 'regurgitating,' I don’t think it helps Jeff Dean’s point either. Based on my and vighneshiyer's discussion above, describing the work as "fundamentally flawed" does not seem far-fetched. If Cheng and Kahng do not agree with this, I believe they can publish another invited paper.

On 'belittle,' my main issue was with your follow-up phrase, 'that’s what you’d expect.' It comes across as overly emotional and detracts from the discussion.

Regarding the lack of follow-ups (that I am aware of), the substantial resources required for this work seem beyond what academia can easily replicate. Additionally, according to "The Saga" article, both non-Jeff-Dean authors had left Google, but their Twitter/X/LinkedIn profiles suggest they have since returned and worked on the "That Chip Has Sailed" paper.

Personally, I hope they reignite their efforts on RL in EDA and work toward democratizing their methods so that other researchers can build new systems on their foundation. What are your thoughts? Do you hope they improve and refine their approach in future work, or do you believe there should be no continuation of this line of research?


The point is that the Cheng et al. results and paper were shown to Google and apparently okayed by Google points of contact. After that, complaining that Cheng et al. didn't ask someone outside Google makes little sense. These far-fetched excuses and the emotional wording from Jeff Dean leave a big cloud over the Nature work. If he were confident everything was fine, he would not bother.

To clarify "you'd expect" - if Jeff Dean is correct, he'd deny problems and if he's wrong he'd deny problems. So, his response carries little information. Rationally, this should be done by someone else with a track record in chip implementation.


Could you please point out the specific lines you are dissatisfied with? Is it something an additional publication cannot resolve?

Additionally, in case you forgot to answer, what is your wish for the future of this line of research? Do you hope to see it improve the EDA status quo, or would you prefer the work to stop entirely? If it is the latter, I would have no intention of continuing this conversation.


I am referring to the direct comparisons in Cheng et al. and in Stronger Baselines that everyone is discussing. Let's grant your point that they "might have been insufficient". We don't currently have the luxury of being frequentists, as we don't have many academic groups reporting results from running Google's code. From a Bayesian perspective, that's the evidence we have.

Maybe you know of more such published papers than I do, or you know the reasons why there aren't many. To me, this lack of follow-up over three years suggests a dead end.

As for "belittle", how would you describe the scientific term "regurgitating" used by Jeff Dean? Also, the term "fundamentally flawed" in reference to a 2023 paper by two senior professors with serious expertise and track record in the field, that for some reason no other experts in the field criticize? Where was Jeff Dean when that paper was published and reported by the media?

Unless Cheng and Kahng agree with this characterization, Jeff Dean's timing and language are counterproductive. If he ends up being wrong on this, what's the right thing to do?


[flagged]


> EQ

Using a fantasy concept invented by a science journalist doesn't help your posts, you know. Protip: it's just empathy + regular intelligence.


EQ is not a fantasy concept.


yes you figured out what I meant good job



