Optimizing the leftovers loop to

    #pragma clang loop vectorize(enable)
    #pragma clang loop interleave(enable)
    for (; offset < length; offset += 4) {
        const auto x = ((uint32_t*)start)[offset / 4];
        count += ((x & 0xFF) == 0x7F);
        count += ((x & 0xFF00) == 0x7F00);
        count += ((x & 0xFF0000) == 0x7F0000);
        count += ((x & 0xFF000000) == 0x7F000000);
    }
also gives some points. It'd probably be more if I could be bothered to break apart your assembly. :)
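
For anyone who wants to try this outside the original benchmark, here is a minimal self-contained sketch of the same idea. The function name `count_0x7f` and its signature are my own, not from the article. Two things differ from the snippet above, both assumptions on my part: the 32-bit load goes through `std::memcpy` so it doesn't depend on `start` being 4-byte aligned, and a scalar tail picks up the last 0 to 3 bytes in case `length - offset` isn't a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (name and signature are illustrative only):
// counts the bytes equal to 0x7F in [data, data + length).
std::size_t count_0x7f(const unsigned char* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::size_t count = 0;
    std::size_t offset = 0;

    // Main loop: test four bytes per iteration, one mask per byte lane.
    // memcpy sidesteps the alignment and strict-aliasing questions that
    // a (uint32_t*) cast would raise; compilers typically turn it into
    // a plain 32-bit load.
    for (; offset + 4 <= length; offset += 4) {
        std::uint32_t x;
        std::memcpy(&x, data + offset, sizeof x);
        count += ((x & 0xFFu) == 0x7Fu);
        count += ((x & 0xFF00u) == 0x7F00u);
        count += ((x & 0xFF0000u) == 0x7F0000u);
        count += ((x & 0xFF000000u) == 0x7F000000u);
    }

    // Scalar tail: handle the remaining 0 to 3 bytes one at a time.
    for (; offset < length; ++offset) {
        count += (data[offset] == 0x7F);
    }
    return count;
}
```

Since each mask checks one byte lane independently, the count doesn't depend on endianness. With Clang, the two `#pragma clang loop` lines from above can go directly before the first `for` if you want to nudge the vectorizer.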
