Hacker News new | past | comments | ask | show | jobs | submit login
How to Use AVX512 in Golang (gorse.io)
98 points by signa11 on Jan 23, 2023 | hide | past | favorite | 33 comments

I thought the /r/golang comments on this post were pretty useful[1]. They also introduced me to avo[2], a tool for generating x86 assembly from go that I hadn't seen before. There are some examples listed on the avo github page for generating AVX512 instructions with avo.

1 = https://www.reddit.com/r/golang/comments/10hmh07/how_to_use_...

2 = https://github.com/mmcloughlin/avo

For someone experts assembly, avo is a better tool to write assembly in hand.

Upcoming .NET 8 will get native support for Vector512 (AVX512) in runtime, and you can write platform-independent code today with Vector<T> for length-agnostic algorithms that will automatically take advantage of it. No need to ever write assembly, or some other special syntax - just pure C#.

That is fantastic. Swift in the Mac has something similar thanks to the Accelerate framework. What I’ve seen, however, is that if you are targeting a very specific platform (for example, the M1 family), you get better performance by directly using the intrinsics supported by that platform. In the real world, you are unlikely to actually have such a specific target, so this may not be so important.

I look forward to seeing some benchmarks with .NET - Microsoft needs to support a pretty wide variety of platforms. It will be interesting to see if their implementation is better!

Accelerate is a bit different in being both overall more high level API and specific to Apple (and abstracts away the usage of AMX and ANE too). On the other hand, Vector<T>/Vector128/256/512<T> in .NET is what is 'portable-simd' to Rust except it is not in preview and widely used across standard library where applicable.

As of now, Vector<T> automatically targets AVX2, SSE4.2 and AdvSimd (NEON). Vector256<T> targets AVX2 (for the lack of ARM counterpart) and Vector128<T> targets SSE / AdvSimd respectively.

I am the author and I am really regret about this title. I think a better title should be "How to Use AVX512 in Golang via the C Compiler" :O

The graph at the end is not readable on the dark theme background. The labels are almost the same colour.

Another suggestion for the title: Go, not Golang.

This is a hill I am also willing to die on.

Last time this annoyed me, I did a thorough look at the Go source code to figure out how accurate calling the language "Go" is. Basically, aside from the mailing lists and bug tracker URLs, "golang" only appears in one place; the ppc64 port contributed by IBM. Everywhere else in the code, it refers to itself as "Go".

"Go" matches to many things unrelated to the programming language. Golang is more likely to match specifically the programming language.

Similar to how many people back in the day used "csharp" instead of C# so search engines could actually find things.

It didn’t help that search engines would “helpfully” ignore the sharp symbol (octothorp) and show results for just “c”

Is discarding the primary communications channels for the project a valid decision? In addition, the main website of the project was "golang.org" for over a decade. That domain still redirects to the current homepage at go.dev.

Ruby is at ruby-lang.org but I've never heard anyone call it Rubylang. You can't always get the domain name (or Google Group) you want.

They're out there: https://github.com/topics/rubylang

I get that these people are not in your circle or whatever, but the term 'golang' appears nine times on Go's own case studies mood board: https://go.dev/solutions/#case-studies

Personally I suspect people more readily adopt 'golang' because all the mailing lists used it as the prefix, the domain was golang (as opposed to go-lang), the official subreddit is /r/golang, and the developers have such a habit of prefixing things with 'go' (as in GOPATH, goroutines, etc). I used to believe it was because there was another programming language with the same name, but I never did find examples of its use in the wild, so I'm no longer sure that affected anything.

If faced with a co-worker who uses 'Golang', try pronouncing it 'Gol-ang' at regular intervals, liable to annoy them sufficiently to persuade them to desist ...

They’ll more likely ask if your code is still compiling, since you have so much free time.

For that sort of problem I'd generally use C, so laugh in their face :-)

Cgo is a real option here, or at least would be an interesting comparison. The function is likely called with large enough data to amortize the (few ns) overhead and it's certainly more maintainable.

Another option is GOAMD64=v3 / v4 which will enable AVX2 / AVX512 on the regular Go output, although the compiler does almost no autovectorization compared to gccgo/gollvm.

It is a pity that CGO isn't in the final benchmark. >_<

> March 1, 2023

What? Did a timetraveller write this?

I've been trying to make this clear to every one who whines about "oh no the clock rate goes down a little if you use this":

AVX-512 is VERY fast, time even gets ripped asunder.

Aliens are really sloppy these days. A real human would never do AVX512 in Golang and travel in time.

Or their timezone is +50 days.

Or it's a US/normal date format confusion?

Doesn't seem like it: https://raw.githubusercontent.com/gorse-io/docs/main/src/pos...

Sometimes I write a future date when I expect something to be finished. Probably just something like that and forgot to edit it before publishing (or, maybe they expected it wouldn't be published until that date).

Somewhere in the Oort cloud, perhaps.

Granted I don't know what will happen between now and March 1st, but there are some inaccuracies in this article... is this a GPT product?

- Encoding Machine Code[0] Not sure what "Go assembly provides three instructions for representing binary machine code" is intended to mean, but go supports register moves/actions for Byte(1)/Word(2)/Long(4)/Quad(8)/Octo(16), both aligned and unaligned (depending on architecture/processor). The next section literally references a quad action (`movq`). Note O/Octo and DQ/Double Quad mean the same thing depending on instruction/arch (sigh)

- When you're using specialized features you have other commands for larger register features (eg in AVX512 on AMD64 asm:`MULPS`, goasm:`VMULPS`)

- Redirect Jump Instruction[1], what is the value of this conversion recommendation if this article is recommending an auto converter that should do this for you? Go-asm can support jumping to addresses (routine scoped), or labels.. labels can be in any case, perhaps goat requires only upper case?

- Generate Go Function Definitions[2].. err isn't this the value prop of goat[3], the point of this article, that this can be done for you? Why are `cznic`, and `cc` being introduced here?

- Function Calls[4] The "complete wrapper" won't compile (typos), I think it's supposed to be closer to [5] (in the repo)

- Function Calls[4] the "Note" makes no sense given the code is go only (missing go-asm paste?)

- The summary benchmark[6], demonstrates AVX2 (far more common) is often faster, and doesn't explain what was measured, nor on what architecture, whether it was aligned, whether any steps were taken to improve performance on the architecture. Based on the repo [7], there's no attempt at alignment, if you can't do AVX512, or AVX2, you can still do 64bit/quad operations rather than going straight to 8bit/byte ops.

- The assembly (.s) files seemingly produced by GOAT[8] are far from human readable go-asm files (the comments in the file could be go-asm source code), don't follow the convention of <file>_<arch>.s, and don't support all platforms that C/golang do (although ARM64 & AMD64 is a large market share)

[0]: https://gorse.io/posts/avx512-in-golang.html#encoding-machin... [1]: https://gorse.io/posts/avx512-in-golang.html#redirect-jump-i... [2]: https://gorse.io/posts/avx512-in-golang.html#generate-go-fun... [3]: https://github.com/gorse-io/gorse/tree/master/cmd/goat [4]: https://gorse.io/posts/avx512-in-golang.html#function-calls [5]: https://github.com/gorse-io/gorse/blob/master/base/floats/fl... [6]: https://gorse.io/posts/avx512-in-golang.html#summary [7]: https://github.com/gorse-io/gorse/tree/master/base/floats [8]: https://github.com/gorse-io/gorse/blob/master/base/floats/fl...

Thanks, @gnabgib. Your comment is very insightful and reminds me of my mentor correcting my academic paper. The post introduces the basic idea of using AVX512 in Go by writing C codes. There are mistakes and many details are omitted. A complete example is https://github.com/gorse-io/gorse/tree/master/base/floats

Not really related to golang but: AVX512 is THE sweet spot to go for that for vector instructions as 512bits=64bytes=x86_64 cache line size.

Dumb question but do you have to do anything manually to ensure it’s actually aligned to a single cache line or does it just happen because of the way things work?

Not a dumb question. In theory, yes you want it to be aligned for performance reasons, and you can do this with one of the _aligned_malloc()/aligned_alloc() variants. In reality though, for any non-trivial algorithm, you may not be able to always enforce this.

Usually you alloc some slack memory to be able to re-align on a cache line. But yeah, it seems better if the allocator can cleanely deal with that.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact