Good catch!

It looks like the main remaining delay cases are integer mul instructions, which get a bypass delay no matter what their source, plus a few cases like FMAs fed by non-shuffle integer ops (not that common) or non-shuffle integer ops fed by FMA or integer mul (also not that common).

The key part is that shuffles have zero delays in any configuration, as producer or consumer, except when a shuffle feeds an integer mul. That's good because shuffles are very common as inputs to both integer and FP ops.
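
To make those rules easier to eyeball, here is a rough sketch of them as a lookup function. It only encodes what is described above: the four class labels (`shuffle`, `int`, `imul`, `fma`) and the function name are my own hypothetical names, it says nothing about how many cycles a delay costs, and treating every pair not mentioned above as "no delay" is an assumption, not something measured.

```python
# Illustrative encoding of the bypass-delay rules described in this comment.
# The class labels and function name are hypothetical; nothing here is measured.
CLASSES = {"shuffle", "int", "imul", "fma"}  # "int" = non-shuffle integer op


def has_bypass_delay(producer: str, consumer: str) -> bool:
    """Return True if the producer -> consumer pair is described as delayed."""
    if producer not in CLASSES or consumer not in CLASSES:
        raise ValueError(f"unknown instruction class: {producer!r} -> {consumer!r}")

    # Integer mul gets a bypass delay no matter what feeds it,
    # including when the producer is a shuffle.
    if consumer == "imul":
        return True

    # Otherwise shuffles are free, as producer or as consumer.
    if "shuffle" in (producer, consumer):
        return False

    # FMA fed by a non-shuffle integer op.
    if consumer == "fma" and producer == "int":
        return True

    # Non-shuffle integer op fed by FMA or integer mul.
    if consumer == "int" and producer in ("fma", "imul"):
        return True

    # ASSUMPTION: pairs not mentioned above are treated as having no delay.
    return False
```

So `has_bypass_delay("shuffle", "fma")` is `False`, while `has_bypass_delay("shuffle", "imul")` is `True`, which is the one exception for shuffles.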
