I'm not sure AVX2 is as ubiquitous as the README says: "We assume AVX2 support which is available in all recent mainstream x86 processors produced by AMD and Intel."
I guess "mainstream" is somewhat subjective, but some recent Chromebooks have Celeron processors with no AVX2:
By "faster" I mean faster than Chromebooks manage now; 2.2 GB/s may simply be out of reach for these cheap processors. They're kinda slow, so any speedup would be welcome.
This looks mostly applicable to server scenarios where the runtime environment is highly controlled.
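For less controlled environments, the usual workaround is a runtime CPU check with a fallback path. A rough sketch, assuming GCC or Clang on x86 (other toolchains just report "no AVX2" here); `cpu_has_avx2` and `choose_backend` are made-up names for illustration, not anything from the library:

```cpp
#include <string>

// Hypothetical helper: does the CPU we are running on support AVX2?
// Relies on a GCC/Clang builtin on x86. On any other compiler or
// architecture this sketch conservatively answers "no".
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// Hypothetical dispatcher: picks the name of the code path to use.
// A real program would return a function pointer to the matching parser.
std::string choose_backend(bool has_avx2) {
    return has_avx2 ? "avx2" : "scalar";
}
```

Calling `choose_backend(cpu_has_avx2())` once at startup would let the same binary run on an AVX2 desktop and on a Celeron Chromebook, with the latter taking the slower path.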
Are you talking about the AVX-to-SSE state transition penalties that can occur if you forget a vzeroupper? That's the only thing I'm aware of that kind of matches.