reliabilityguy | 19 days ago | on: Are OpenAI and Anthropic losing money on inference...
It's not the fetching that's the problem; it's serving the data to many cores at the same time from a single source.
supersour | 19 days ago
I'm not familiar with GPU architecture; is there not a shared L2/L3 data cache from which this data could be served to all the cores?
reliabilityguy | 18 days ago
The MMU has a finite number of ports that drive data to the consumers. An extreme case: all 32 cores want the same piece of data at the same time.
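To make that extreme case concrete, here is a minimal CUDA sketch (the kernel and variable names are illustrative, not from the thread): every lane of a 32-thread warp, standing in for the "32 cores" above, loads the same element, so a single source location has 32 simultaneous consumers.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Illustrative kernel: all 32 threads of one warp read the same
    // element of `src`, i.e. one source location, 32 simultaneous consumers.
    __global__ void same_address_read(const float *src, float *dst) {
        float v = src[0];              // every lane loads the same address
        dst[threadIdx.x] = v * 2.0f;   // distinct writes, no contention here
    }

    int main() {
        const int N = 32;              // one warp
        float h_src = 1.0f, h_dst[N];
        float *d_src, *d_dst;
        cudaMalloc(&d_src, sizeof(float));       // error checks omitted
        cudaMalloc(&d_dst, N * sizeof(float));   // for brevity
        cudaMemcpy(d_src, &h_src, sizeof(float), cudaMemcpyHostToDevice);

        same_address_read<<<1, N>>>(d_src, d_dst);

        cudaMemcpy(h_dst, d_dst, N * sizeof(float), cudaMemcpyDeviceToHost);
        printf("dst[0] = %f\n", h_dst[0]);

        cudaFree(d_src);
        cudaFree(d_dst);
        return 0;
    }

One caveat on the sketch: on NVIDIA hardware a same-address read within a single warp is typically served by a broadcast in one transaction, so the port contention described above tends to show up when many warps across SMs hit the same cache line at once, rather than within one warp.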