reliabilityguy | 19 days ago | on: Are OpenAI and Anthropic losing money on inference...
It's not the fetching that's the problem; it's serving the data to many cores at the same time from a single source.
supersour | 19 days ago
I'm not familiar with GPU architecture; is there not a shared L2/L3 data cache from which this data could be served to all the cores?
reliabilityguy | 18 days ago
The MMU has a finite number of ports that drive data to the consumers. An extreme case: all 32 cores want the same piece of data at the same time.
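To make that extreme case concrete, here is a minimal CUDA sketch (the kernel and variable names are illustrative, not from the thread): every lane of a 32-thread warp, standing in for the "32 cores" above, loads the same element, so a single source location has 32 simultaneous consumers.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Illustrative kernel: all 32 threads of one warp read the same
    // element of `src`, i.e. one source location, 32 simultaneous consumers.
    __global__ void same_address_read(const float *src, float *dst) {
        float v = src[0];              // every lane loads the same address
        dst[threadIdx.x] = v * 2.0f;   // distinct writes, no contention here
    }

    int main() {
        const int N = 32;              // one warp
        float h_src = 1.0f, h_dst[N];
        float *d_src, *d_dst;
        cudaMalloc(&d_src, sizeof(float));       // error checks omitted
        cudaMalloc(&d_dst, N * sizeof(float));   // for brevity
        cudaMemcpy(d_src, &h_src, sizeof(float), cudaMemcpyHostToDevice);

        same_address_read<<<1, N>>>(d_src, d_dst);

        cudaMemcpy(h_dst, d_dst, N * sizeof(float), cudaMemcpyDeviceToHost);
        printf("dst[0] = %f\n", h_dst[0]);

        cudaFree(d_src);
        cudaFree(d_dst);
        return 0;
    }

One caveat on the sketch: on NVIDIA hardware a same-address read within a single warp is typically served by a broadcast in one transaction, so the port contention described above tends to show up when many warps across SMs hit the same cache line at once, rather than within one warp.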