Hacker News
fabmilo 11 days ago | on: Multi-Token Attention
I read the paper, and the results don't really convince me that that is the case. But the problem remains: how to use information from different parts of the model without squishing it into a single value with the softmax.