
I mean, a throughput of 1/4 isn't hideable with "proper instruction scheduling" if you are running a multiplication-heavy benchmark. No amount of scheduling gymnastics is going to get you more than 1 MULH per 4 cycles.

Since the wyhash function needs two 64x64->128 multiplications, you'll need two high and two low muls on ARM, so this is pretty much a dense multiplication benchmark. Scheduling can't save you there.
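
To make the "high and low muls" point concrete, here is a rough sketch of one 64x64->128 multiplication in C. It assumes a GCC/Clang-style compiler with `unsigned __int128`; the names `mul_64x64_128` and `mul_fold` are made up for illustration and are not taken from the wyhash source. On AArch64, compilers typically lower the 128-bit product to one MUL (low half) plus one UMULH (high half), which is where the two-instructions-per-multiply count comes from.

```c
#include <stdint.h>

/* Full 64x64->128 product, returned as separate low and high halves.
 * Relies on the unsigned __int128 extension (GCC/Clang), an assumption
 * of this sketch. On AArch64 this usually becomes MUL + UMULH. */
static inline void mul_64x64_128(uint64_t a, uint64_t b,
                                 uint64_t *lo, uint64_t *hi) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;         /* low 64 bits of the product  */
    *hi = (uint64_t)(p >> 64); /* high 64 bits of the product */
}

/* Hypothetical mixing step: fold the 128-bit product back to 64 bits
 * by XORing the halves. Illustrative only, not the actual hash code. */
static inline uint64_t mul_fold(uint64_t a, uint64_t b) {
    uint64_t lo, hi;
    mul_64x64_128(a, b, &lo, &hi);
    return lo ^ hi;
}
```

A hash that does two of these per round issues two low and two high multiplies, so if the high multiply is limited to one per 4 cycles, that limit dominates however the rest of the code is scheduled.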

Still, by my calculation that should only put the ARM chip at a 5x disadvantage in multiplication throughput, but it was nearly 18x slower. The frequency difference probably explains some of that, but not all.



