mattst88's comments | Hacker News

Alpha 21264 is out-of-order.


The judgement (https://www.judiciary.uk/wp-content/uploads/2024/05/COPA-v-W...) very clearly demonstrates the truth of Brandolini's Law:

> The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it

Incredible the amount of detail the judgement goes into in documenting how big a fraud Craig Wright is.


> Burnout is your brain giving up because it doesn’t think the benefit is worth the effort anymore. If you don’t have a purpose for what you are doing all your brain will do is obsess over the negatives. Every job in the world has massive negatives. You have to find a positive purpose for what you are doing to balance the negative.

This is spot on and was exactly my experience at my last job.

The one thing I'll add is that for me it wasn't about not having a purpose, it was about not feeling appreciated for all of the time/effort/energy I'd poured into that purpose.


Yes, it's still used in the Intel 3D drivers in Mesa (at least).


That's right. "Software decoding" means the decoding algorithms run on the CPU. (As opposed to "hardware decoding" which typically means the work is done by some fixed-function video units on e.g. a GPU)



How can I confirm whether Firefox is actually using hardware decode for videos on my machine for a given codec? Is there a magic keyword I should search for within the about:support page, or is it recorded somewhere else? Mine shows, for instance, "VP9_HW_DECODE default available" — does this mean it's using hardware decode for VP9, or merely that it might be possible for my Firefox version?


We've put a dedicated section called Media in about:support, it has decoding capabilities and other things such as audio I/O information.

If you find that it is not accurate, e.g. by cross-checking via other means, please open a ticket at https://bugzilla.mozilla.org/enter_bug.cgi, component "Audio/Video".


> We've put a dedicated section called Media in about:support, it has decoding capabilities

I see, I hadn't found it when searching for the codec name because it only said something like "Information not available. Try again after playing a video." After opening a random YouTube video and refreshing about:support, it did show the hardware decoding information, and it was as I had expected given the hardware on this computer.


At least with AMD GPU, an easy way to do it is to check usage of VCN.

Something like this (assuming your GPU index is 0):

    watch -n 1 sudo cat /sys/kernel/debug/dri/0/amdgpu_pm_info
If GPU-accelerated video is being played, you should see there:

    VCN: Enabled

I think you might need to set this flag to true in Firefox's about:config (or it might not be needed anymore):

    media.ffmpeg.vaapi.enabled


On an Intel GPU, you can check the 'Video' percentage in `sudo intel_gpu_top` (from the igt-gpu-tools package) while playing a video.


Whilst Firefox may support hardware video decoding, Mesa since March 2022 disables patent-encumbered codecs by default[1], and distributions such as Fedora and openSUSE do not explicitly enable these patent-encumbered codecs, to avoid possible legal problems. Even Gentoo (built from source code by the user) requires the user to explicitly enable a USE flag (proprietary-codecs) to use patent-encumbered codecs.[2]

The thought process is that AMD, NVIDIA, Intel and the likes are not providing a patent license with their hardware.[3] They are instead just supplying part of an overall system that together with operating system kernel, display manager software, video player software, etc allows the decoding and encoding of patent encumbered video files. Open source software projects and distributions are concerned they'd be found to be infringing patents by enabling a complete solution out-of-the-box. Hence they put some hurdles in place so that a user has to go out of their way to separately piece together the various parts to form a complete system capable of encoding and decoding patent encumbered codecs.

edit: To clarify, if your Intel or AMD GPU/APU supports a patent-free codec such as AV1 (most GPUs/APUs available for sale?), Firefox on a standard Linux distribution will use hardware video decoding out of the box by default for the patent-free codec. The issue is really one of whether you're sourcing content from a provider that uses a good choice of codec like AV1. The good news is that patent trolls are doing a good job of pushing laggard content providers down this path.[4]

[1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15...

[2] https://github.com/gentoo/gentoo/commit/1265a159743d7f07185a...

[3] https://lists.fedoraproject.org/archives/list/devel@lists.fe...

[4] https://news.ycombinator.com/item?id=38249527


Which is of course a bullshit reason, because Linux distributions don't ship the patented hardware decoders (because they are hardware), so the user already has to assemble the complete system.

> a provider that uses a good choice of codec like AV1

Much more limited HW support and slow-as-molasses encoders. Not exactly a good choice for most.


I use the flatpak version of Firefox because my Linux machine is an immutable OS.

Hardware video decode is working for FF.

Not working for Chrome though, sadly.


My bad, that's what I get for commenting before bed. I meant to tack on an "on Nvidia" in there. AMD and Intel are well supported with VAAPI, but I've never managed to get any of the NV-to-VAAPI wrappers and shims working.


I am a happy owner of a Tigerlake (Intel 11th Gen) Framework laptop. I've considered upgrading to a 12th or 13th Gen motherboard, and while I have no doubt they'd be great for me as a Gentoo developer with the greatly increased core counts, my hesitation is that the new CPUs have AVX-512 disabled.

Maybe this doesn't matter, almost certainly wouldn't for most people, but I'm compiling the whole system myself so the compiler at least has the freedom to use AVX-512 wherever it pleases. Does anyone know if AVX-512 actually makes a difference in workloads that aren't specifically tuned for it?

My guess is that given news like https://www.phoronix.com/news/GCC-AVX-512-Fully-Masked-Vecto... that compilers basically don't do anything interesting with AVX-512 without hand-written code.


The promise of the AVX-512 instruction set really was that it would be much easier to (auto-)vectorize code that wasn’t written with vectorization in mind, with tools like masked execution and gather/scatter that either didn’t exist at all before (SSE) or were very minimal (AVX).

The tools are there in the instruction set, but that still leaves the issues of time and effort to implement in compilers, and enough performance improvement on enough machines in some market (browsers, games, etc) capable of running it all before any of this possibility becomes real.

The Skylake-Xeon/Ice Lake false start here really can’t have helped. It’s still a much more pragmatic thing to target the Haswell feature set that all the Intel chips and most AMD chips can run (and run well).


Funny that if you want AVX-512 now, it's AMD that's offering it and Intel that isn't.

Sometimes the second comer to a game has the advantage of taking their time to implement something, with fewer compromises and a better overall fit.


The compiler will only choose to use AVX-512 if you give it the right `-m` flags. Most people who are running generic distros that target the basic k8 instructions benefit from AVX-512 only when some library has runtime dispatch that detects the presence of the feature and enables optimized routines. This is common in, for example, cryptography libraries.


Right. Since I'm using Gentoo and compiling my whole system with `-march=tigerlake`, the compiler is free to use AVX-512.

My question is just... does it? (And does it use AVX-512 profitably?)


It will not use AVX-512 if you have CFLAGS="-march=tigerlake -O2". You will, at the very least, need CFLAGS="-march=tigerlake -O3" to get it to actually use AVX2, and tigerlake's AVX512 implementation is so poor (clock throttling etc) that gcc will not use AVX-512 on tigerlake. AVX-512 is used if you have -march=znver4 though, so the support for autovectorizing to AVX-512 is clearly there.

https://godbolt.org/z/1a39Mf3bv


Is it actually that bad on Tiger Lake? Or just for really high-width vectors? On my old Ice Lake laptop, single-core AVX-512 workloads do not decrease frequency at all even with wider registers, and multi-core workloads will result in clock speed degradation of a small amount, maybe 100 MHz or so.

Depends on a couple factors (i.e. Ice Lake client only has 1 FMA unit) but I'd be surprised if Tiger Lake was a major regression relative to Ice Lake. It seems like they had it in an OK spot by then.


In my experience it depends on the compiler. clang seems far more willing to autovectorise than gcc. Also, when writing the code you have to write it in a way that strongly hints to the compiler that it can be autovectorised. So lots of handholding.


I guess a better question is why you rebuild the system without a rational basis to expect benefits.


Are you familiar with source-based distributions?

I'm not rebuilding specifically for this one potential optimization.


Why not use -march=native?


Surprisingly, -march=native doesn’t always expand to the locally optimal build flags we might expect, particularly with gcc on non-Linux platforms.


Oh interesting. Is this one of those things where backwards compatibility eventually got in the way of the intended purpose?


I actually do. I just said -march=tigerlake to make it clear what CPU family the compiler was targeting.


Why not use -march=snark?


> I've considered upgrading to a 12th or 13th Gen motherboard, and while I have no doubt they'd be great for me as a Gentoo developer with the greatly increased core counts, my hesitation is that the new CPUs have AVX-512 disabled.

Unless you have a very specific AVX-512 workload or you need to run AVX-512 code for local testing, you won’t see any net benefit of keeping your older AVX-512 part.

Newer parts will have higher clock speed and better performance that will benefit you everywhere. Skipping that for the possibility of maybe having some workload in the future where AVX-512 might help is a net loss.


Now you may choose a new AMD Phoenix-based laptop, with great AVX-512 support (e.g. with Ryzen 7 7840HS or Ryzen 9 7940HS or Ryzen 7 7840U).

AMD Phoenix is far better than any current Intel mobile CPU anyway, so it is an easy choice (and it compiles code much faster than Intel Raptor Lake, which counts for a Gentoo user or developer).

The only reason to not choose an AMD Phoenix for an upgrade would be to wait for an Intel Meteor Lake a.k.a. Intel Core Ultra. Meteor Lake will be faster in single-thread (the relative performance in multi-thread is unknown) and it will have a bigger GPU (with 1024 FP32 ALUs vs. 768 for AMD).

However, Meteor Lake will not have AVX-512 support.

For compiling code, the AVX-512 support should not matter, but it should matter a lot for the code generated by the compiler, as it enables the efficient auto-vectorization of many loops that cannot be vectorized efficiently with AVX2.

While gcc and clang will never be as smart as hand-written code, their automatic use of AVX-512 can be improved a lot and announcements like that linked by you show progress in this direction.


> Does anyone know if AVX-512 actually makes a difference in workloads that aren't specifically tuned for it?

I know game console emulators use it to great effect with significant performance increases.


Incidentally, that's another case where the 512-bit-ness is the least interesting part: the new instructions are useful for efficiently emulating ARM NEON (Switch) and Cell SPU (PlayStation 3) code, but those platforms are themselves only 128 bits wide, so I don't believe the emulators have any use for the 512-bit (or even 256-bit?) variants of the AVX-512 instructions.


I haven't looked into the code for these but are they possibly pipelining multiple ops per clock? If it's not dependency chained they probably calculate a few cycles at once.


Specifically, RPCS3 saw a huge speedup using AVX-512 [1]

1: https://www.tomshardware.com/news/ps3-emulation-i9-12900k-vs...


RPCS3 is a big fan of esoteric CPU features; it was also one of the very few applications that used Intel's TSX before Intel killed it off.


Game console emulators are of course specifically tuned for this.


What other emulators beside rpcs3 use it?


AVX-512 is specifically the first x86 vector extension for which compilers should eventually be able to emit reasonable code. Thanks to gather and masked execution, with AVX-512 vectorizing a simple loop doesn't always mean blowing up code size to 10x.

However, compilers have so far been slow to implement this, with the relevant patches only going into GCC right now.


I don't have enough details to debug, but something I discovered just the other day might be helpful if you're building the kernel. For kernel builds, you have to specify CC= after `make`, not before. E.g. `make -jX CC="distcc alpha-unknown-linux-gnu-gcc"`.


This is very interesting. I'm going to be excited to see what neat things can be done with it.


Count me in. Any kind of formalization is empowering, but what practical applications can it have? I'm sure there are quite few of them, but what are they?


Signs are a couple of different things: most notably they're signals from the catcher to the pitcher suggesting a pitch for him to throw, but there can also be signs from the dugout to a batter telling him to bunt, or to a runner on base telling him to steal, etc. If the other team is able to decipher any of these, they'll have a significant advantage.

To be clear: stealing signs isn't cheating. It's using electronics to steal signs that's cheating.

For example, a base runner on second can see the catcher's signs to the pitcher and relay them to the batter so he knows what the pitch is. This is not cheating.

In 2017, the Astros used electronics to steal signs. They had an employee somewhere beyond the outfield wall with a camera who watched the catcher and relayed his signs electronically to someone in the Astros' clubhouse (behind the dugout), who then banged on a trash can loud enough for the batter to hear. That was cheating. See https://www.youtube.com/watch?v=FUkJeko0QGE

To prevent that sort of thing from happening in the future, MLB last year allowed pitchers and catchers to switch to an electronic system for relaying signs (the catcher has some buttons/keys and the pitcher has an earpiece).

