
A new, undocumented INT1 tensor core instruction in Nvidia Ampere - lmh
https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/arch/mma_sm80.h#L2006
======
lmh
For the hardware and architecture nerds, CUTLASS now appears to include
support for an INT1 (binary) AND-popcount matrix multiplication, in addition
to the XOR-popcount present in Turing.
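
For anyone unfamiliar with the two operations, here is a rough scalar sketch of what each computes for a single output element, assuming the binary operands are packed 32 bits per word. The function names (`popcount32`, `dot_xor_popc`, `dot_and_popc`) are made up for illustration and are not CUTLASS or PTX identifiers; the real tensor core instruction works on whole matrix tiles per warp, which this does not attempt to model.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of set bits in a 32-bit word (portable, no intrinsics).
inline int popcount32(std::uint32_t x) {
    return static_cast<int>(std::bitset<32>(x).count());
}

// XOR-popcount: counts bit positions where the two vectors differ.
// Assumes a and b have the same length.
inline int dot_xor_popc(const std::vector<std::uint32_t>& a,
                        const std::vector<std::uint32_t>& b) {
    int acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += popcount32(a[i] ^ b[i]);
    }
    return acc;
}

// AND-popcount: counts bit positions where both vectors have a 1,
// i.e. an ordinary dot product over {0, 1} values.
// Assumes a and b have the same length.
inline int dot_and_popc(const std::vector<std::uint32_t>& a,
                        const std::vector<std::uint32_t>& b) {
    int acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += popcount32(a[i] & b[i]);
    }
    return acc;
}
```

With a = 0b1100 and b = 0b1010, the XOR form gives 2 (two positions differ) and the AND form gives 1 (one position is set in both). The XOR form is the natural fit when bits encode -1/+1, while the AND form matches plain 0/1 arithmetic.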

