
ARM64 Popcount in Golang and Assembler - fanf2
https://barakmich.dev/posts/popcnt-arm64-go-asm/
======
waynecochran
FWIW, I learned a little bit about how to check for supported features in
x86_64 from the linked code (I added the comments):

    
    
        TEXT ·hasPOPCNT(SB),NOSPLIT,$0
         XORQ AX, AX     // AX <- 0
         INCL AX         // AX <- 1
         CPUID           // check cpu caps
         SHRQ $23, CX    // shift bit 23 to bit 0 (bit 23 => POPCNT support)
         ANDQ $1, CX     // examine bit 0
         MOVB CX, ret+0(FP)  // return 1 iff POPCNT instruction supported
         RET
    

It seems Intel CPUs have such a bloated and varied instruction set you have
to query to see what flavor of CPU you have to create portable code. Writing
compilers for this must be a real pain.

Full disclosure: I work for Intel.

~~~
loeg
> It seems Intel CPUs have such a bloated and varied instruction set you have
> to query to see what flavor of CPU you have to create portable code. Writing
> compilers for this must be a real pain.

This is equally applicable to other x86 vendors.

The alternatives are either (a) never introducing new instructions, or (b)
making portable binaries impossible. Neither alternative seems better? It
doesn't look quite as tortured in C:

    
    
      bool has_popcnt() {
        unsigned regs[4];
        cpuid(1, regs);
        // You could name 23 CPUID_POPCOUNT_SHIFT or something.
        return ((regs[2] >> 23) & 1);
      }
    

In normal C compilers (GCC, Clang) you can use ifuncs to do this resolution
once at module load time, and then the correct function for the current CPU is
dispatched cheaply afterwards.
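
For comparison, the same "resolve once, dispatch cheaply afterwards" idea can be roughly sketched in Go with a package-level function variable. This is only an illustration, not how ifuncs or the Go runtime actually work: `hasFastPopcount` is a hypothetical stand-in for a real CPUID-style probe, and `popcountFast` simply calls `math/bits` instead of hand-written assembly.

```go
package main

import (
	"fmt"
	"math/bits"
	"runtime"
)

// popcountSWAR is a portable fallback that needs no special instruction.
func popcountSWAR(x uint64) int {
	x -= (x >> 1) & 0x5555555555555555
	x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
	x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f
	return int((x * 0x0101010101010101) >> 56)
}

// popcountFast stands in for a CPU-specific implementation
// (assumption: in real code this would be the assembly routine).
func popcountFast(x uint64) int { return bits.OnesCount64(x) }

// hasFastPopcount is a hypothetical feature probe. A real one would
// query the CPU (for example via CPUID on x86) rather than GOARCH.
func hasFastPopcount() bool {
	return runtime.GOARCH == "amd64" || runtime.GOARCH == "arm64"
}

// selectPopcount picks an implementation based on the probe.
func selectPopcount() func(uint64) int {
	if hasFastPopcount() {
		return popcountFast
	}
	return popcountSWAR
}

// popcount is bound once during package initialization; every later
// call is just an indirect call through this variable.
var popcount = selectPopcount()

func main() {
	fmt.Println(popcount(0xF0F0)) // prints 8
}
```

The cost of the probe is paid once at startup, which is the same trade an ifunc resolver makes at module load time.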

~~~
waynecochran
Yes, adding features to a CPU is not a vendor specific problem. I wonder if
this can be handled at the link / binary loader level so it is more
transparent to the programmer. Almost like a patch using diff. You have a base
binary that runs on any machine, but there are added diffs (added by the
static linker) for different CPU versions and the OS loader patches and loads
the code appropriately.

~~~
loeg
[https://gcc.gnu.org/wiki/FunctionMultiVersioning](https://gcc.gnu.org/wiki/FunctionMultiVersioning)
helps make this somewhat ergonomic, although it's still manual.

------
cesarb
> Take whichever side you want in the Intel vs GNU syntax debate, creating a
> third option means relearning all the quirks from scratch, and ignoring any
> documentation that already exists.

Isn't the "Intel vs GNU syntax debate" only on x86? For ARM, AFAIK there's
only one assembly syntax.

~~~
barbegal
ARM assembly and GNU assembly for ARM are subtly different in terms of syntax.
I think the labels and indenting are different but everything else is broadly
the same. I can happily read either.

------
mikece
Slight tangent, but writing custom assembler code to make this go faster on
the ARM chip makes me wonder how much optimization the Apple team has put into
the Swift compiler for the A-series, work that has been going on for years for
iOS. I suspect the answer is "a lot more than we realize" and it would be
interesting to see this Golang comparison re-run on macOS/ARM (when available)
and then compared to the same program written in Swift.

~~~
donarb
Apple has submitted (or will be submitting) ARM patches for the Go project.

[https://twitter.com/wongmjane/status/1275177255681982464?lan...](https://twitter.com/wongmjane/status/1275177255681982464?lang=en)

------
majke
Compulsory popcount resources from Wojciech Mula:
[http://0x80.pl/articles/sse-popcount.html](http://0x80.pl/articles/sse-popcount.html)

~~~
fanf2
Nice link, thanks!

My interest in popcount is for radix trie node compression
[https://dotat.at/prog/qp/](https://dotat.at/prog/qp/) so these bulk popcount
hacks are interesting but not directly useful for my purposes :-)
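
To make the node compression use case concrete, here is a minimal sketch of the general bitmap-plus-popcount trick, assuming a 64-way node. It is not the actual qp trie code; the `node` type and its `get`/`set` methods are hypothetical names for illustration. A bitmap records which slots are occupied, and the popcount of the bits below a slot gives that slot's index in a dense child array.

```go
package main

import (
	"fmt"
	"math/bits"
)

// node is a hypothetical sparse node: bitmap marks occupied slots
// (0..63) and children stores only the occupied ones, in slot order.
type node struct {
	bitmap   uint64
	children []string
}

// get returns the child stored at slot, if any. slot must be < 64.
func (n *node) get(slot uint) (string, bool) {
	bit := uint64(1) << slot
	if n.bitmap&bit == 0 {
		return "", false
	}
	// Number of occupied slots below this one = index in children.
	idx := bits.OnesCount64(n.bitmap & (bit - 1))
	return n.children[idx], true
}

// set inserts or overwrites the child at slot. slot must be < 64.
func (n *node) set(slot uint, v string) {
	bit := uint64(1) << slot
	idx := bits.OnesCount64(n.bitmap & (bit - 1))
	if n.bitmap&bit != 0 {
		n.children[idx] = v
		return
	}
	n.bitmap |= bit
	n.children = append(n.children, "")
	copy(n.children[idx+1:], n.children[idx:]) // shift the tail right by one
	n.children[idx] = v
}

func main() {
	var n node
	n.set(40, "x")
	n.set(3, "a")
	n.set(17, "m")
	v, ok := n.get(17)
	fmt.Println(v, ok, len(n.children)) // prints: m true 3
}
```

Each lookup needs only one popcount of a single word, which is why bulk popcount tricks over large arrays do not help much here.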

------
RenThraysk
Erm, math/bits package OnesCount() family of functions?

[https://golang.org/pkg/math/bits/#OnesCount](https://golang.org/pkg/math/bits/#OnesCount)

Go compiler output:

    0x0020 00032 ($GOROOT/src/math/bits/bits.go:115) FMOVD R0, F0
    0x0024 00036 ($GOROOT/src/math/bits/bits.go:115) VCNT V0.B8, V0.B8
    0x0028 00040 ($GOROOT/src/math/bits/bits.go:115) VUADDLV V0.B8, V0
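
For anyone who has not used the package, a minimal usage sketch follows. `bits.OnesCount64` is the real standard library function; `countSetBits` is just a hypothetical helper name for summing over a slice.

```go
package main

import (
	"fmt"
	"math/bits"
)

// countSetBits sums the population count over a slice of 64-bit words.
func countSetBits(words []uint64) int {
	total := 0
	for _, w := range words {
		total += bits.OnesCount64(w)
	}
	return total
}

func main() {
	fmt.Println(bits.OnesCount64(0xFF))             // prints 8
	fmt.Println(countSetBits([]uint64{0xFF, 1, 0})) // prints 9
}
```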

------
pantalaimon
I would have expected the Go compiler to emit the POPCNT instruction on its
own. Most current C compilers will even detect if you implement a popcount in
software and replace it with a single POPCNT instruction - why does Go not do
something like that as well?
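
For reference, this is the kind of software popcount being talked about, written in Go as a sketch. `popcountLoop` is a hypothetical name, and nothing here claims that any particular Go toolchain rewrites this loop into a single instruction; the example only checks the loop against `bits.OnesCount64`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// popcountLoop counts set bits by clearing the lowest set bit on each
// iteration, so it loops once per set bit.
func popcountLoop(x uint64) int {
	n := 0
	for x != 0 {
		x &= x - 1 // clear the lowest set bit
		n++
	}
	return n
}

func main() {
	x := uint64(0b1011_0010)
	fmt.Println(popcountLoop(x), bits.OnesCount64(x)) // prints: 4 4
}
```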

~~~
loeg
Go's compiler is not particularly sophisticated.

~~~
saagarjha
(By design.)

------
innocenat
Wow. This is the first time I have seen Go Assembler Syntax, and it confused
me at first, as I thought it was the standard Intel syntax.

------
eqvinox

      __builtin_popcount()
    

... which gcc and clang will replace with best available.

'nuff said.

Edit: or OnesCount() in Go.

------
fulafel
Any clues on what the author is using it for?

~~~
StringyBob
There are a few potential uses these days - more than just the famous crypto
folklore one! Some examples on
[https://vaibhavsagar.com/blog/2019/09/08/popcount/](https://vaibhavsagar.com/blog/2019/09/08/popcount/)

