Hacker News | Tpt's comments

I guess it's mostly because CPython has a fairly good C API, allowing PyO3 to just write "safe" wrappers on top of the CPython APIs and provide macros to generate the boilerplate, whereas wasm-bindgen has to generate both the Rust and JS sides and deal with the painful linear-memory intermediary.
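As a small illustration of how approachable CPython's C API is, you can even call it directly from a running interpreter via ctypes (a toy demonstration only — PyO3 binds the same API from Rust, not via ctypes):

```python
import ctypes

# CPython exposes its own C API inside the running interpreter;
# ctypes.pythonapi gives direct access to it.
api = ctypes.pythonapi

# Py_GetVersion() returns a static C string like "3.12.3 (main, ...)".
api.Py_GetVersion.restype = ctypes.c_char_p
version = api.Py_GetVersion().decode()

print(version.split()[0])  # e.g. "3.12.3"
```

Nothing like this exists for the Wasm side: wasm-bindgen has to marshal every value through linear memory by hand.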

Igalia is a quite specific "consulting" agency: it employs developers and gets contracts from clients to implement specific features or fix specific bugs, usually around FOSS. They have people who know how to contribute to Firefox, Chrome, WebKit, Linux, Mesa... It's the go-to company if you want to get something done in these projects when you don't have the resources for that in house.

For example, they work for Valve to make the Radeon drivers better, and they got a grant to get basic MathML support done in the three major web browsers.


It's amazing to see MathML moving forward. As a European, I really like this use of my tax euros.


uv for packaging and dependency management, Ruff for linting, Mypy for type checking (likely to be replaced by ty when it's ready), and whatever editor people like (PyCharm, VSCode, Helix...)


A lot of large castle owners in France have set up a small apartment on the side for winter and use the main rooms only occasionally and during summer. This is nothing new: in Versailles you can visit the king's actual private bedroom, which is much smaller and easier to heat than the official one.


There is slightly more information on Helsing's website: https://helsing.ai/centaur


No, it's the first day of Holy Week.


There is now libgccjit, which aims to allow embedding GCC: https://gcc.gnu.org/onlinedocs/jit/

There is an alternative backend for rustc that relies on it.


libgccjit is, despite its name, just another front-end for GIMPLE. The JIT part is realized by compiling the result to a shared library and using dlopen on it.

One big problem with libgccjit, besides its fairly bad compile-time performance, is that it's GPL-licensed and thereby makes the entire application GPL, which makes it impossible to use not just in proprietary use cases but also in cases where incompatible licenses are involved.


If I understand correctly, this library provides some Torch kernels customized for AMD hardware. Why haven't they just upstreamed them to PyTorch for better adoption? Also, they seem to demo usage with Torch default eager execution mode and not Torch JIT/TorchScript. Is this library compatible with TorchScript?


I think a lot of stuff will get upstreamed eventually. PyTorch just moves slower and since it’s a stable library, I think it cannot rapidly adopt something like fused MoE until the dust has settled a little and it’s clear what the API would look like long-term.

I think it’s ok that stuff is tried first in Torch extensions. That’s how Flash Attention started after all and the same is true for newer kernels in CUDA-land (fused MoE, MLA, Marlin, etc.).

With regards to TorchScript, that’s really legacy - torch.compile is where it’s at. This post seems to suggest that the kernels work with torch.compile: https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR...


I really do not understand why they can't just work with the existing OSS developers pulling their hair out trying to make AMD devices work, and instead do it this way. It's like Mozilla and its questionable decisions.


There are a lot of OSS developers; I doubt AMD has the resources to do that. And realistically they don't need to. I wandered over to watch some George Hotz videos the other day, and it looked like the AMD driver situation has improved to the point where specialist AMD access isn't needed to debug any more. Which is a huge change, and very exciting for me personally, because it means I might be able to jump back to an AMD card and ditch the mess that is Nvidia on Linux.

In theory they might not even need to be involved in optimising compute kernels; there is probably some PhD student who'll do the work because they want to be a kernel-optimising specialist. In practice, a few strategic applications of paid talent are all they really need. Everyone wants to diversify off Nvidia, so there is a lot of interest in supporting AMD if they are willing to push out firmware that multiplies matrices without crashing. Which has been a weird sticking point for AMD for a surprising amount of time.


There's only one PyTorch though, and it's what people are using for ML nowadays.

Back in the day you had to optimize your card for Quake, do everything to make it run well. Now you have to do that for PyTorch.


> Back in the day you had to optimize your card for Quake...

That is exactly the attitude that left AMD out in the cold during the AI revolution; they learned a lot of stupid lessons about optimising for specific games and present-day use cases instead of implementing general capabilities to a higher standard like Nvidia did with CUDA. They ended up a decade away from a multi-trillion-dollar market.

PyTorch might be special. I wouldn't be at all surprised if AMD does have a dedicated engineer working on PyTorch. But their problem to date hasn't been their engagement with PyTorch, but rather that literally nobody could make PyTorch work on AMD cards, which had buggy and terrible support for GPGPU work. If they fixed that, some random might do the work without their involvement, because a lot of people want to see that happen.


Now that the required task is known, though, it doesn't really matter. If AMD understands that, they should have no problem putting engineers on making PyTorch work well.

Considering its importance, it shouldn't be one engineer. It should be 50+.


I think they have been taken over by exactly the same people leading the AI hype. Funny how in this article they are a) not advertising clearly what they are doing, b) solving a small subset of problems in a way no one asked for (I think most people just want ROCm to work at all...) and c) just adding to a complex product without any consideration of actually integrating with its environment.

I guess it's vibecoding "AI"...


> solving a small subset of problems in a way no one asked for

What do you mean? Having ROCm fused MoE and MLA kernels as a counterpart to kernels for CUDA is very useful. AMD needs to provide this if they want to keep AMD accelerators competitive with new models.
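For readers unfamiliar with what "fused" means here, a toy pure-Python sketch of the idea. Real fused MoE/MLA kernels are GPU code operating on tensors; this only illustrates the principle of merging multiple passes over data into one to avoid materializing intermediate buffers:

```python
# Unfused: two passes over the data, with an intermediate list
# written out and then re-read (analogous to extra GPU memory traffic).
def scale_then_add_unfused(xs, scale, bias):
    scaled = [x * scale for x in xs]   # pass 1: writes an intermediate
    return [s + bias for s in scaled]  # pass 2: re-reads it

# Fused: one pass, no intermediate buffer. On a GPU this saves
# kernel-launch overhead and round trips to device memory.
def scale_then_add_fused(xs, scale, bias):
    return [x * scale + bias for x in xs]

xs = [1.0, 2.0, 3.0]
assert scale_then_add_unfused(xs, 2.0, 1.0) == scale_then_add_fused(xs, 2.0, 1.0)
```

A fused MoE kernel applies the same idea to expert routing plus the expert matmuls, which is why it needs hand-written per-vendor kernels rather than falling out of generic ops.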


Should the matrix multiplication at the core of this not be in a core library? Why are generic layers intermixed with LLM-specific kernels when the generic layers duplicate functionality in Torch?

Upstreaming that might actually help researchers doing new stuff, versus the narrow demographic of people speeding up LLMs on MI300Xs.


They are imitating Nvidia's TensorRT with AITER. Basically AMD wants to have "CUDA, but not CUDA".


They'd like to have CUDA, period, but are legally barred from it.


> They are imitating Nvidia's TensorRT

Do you know what the RT in TensorRT stands for? Hint: AITER has nothing to do with TensorRT.


> I think most people just want ROCm to work at all

I think most people don't want to have to think about vendor lock-in related bullshit. Most people just want their model to run on whatever hardware they happen to have available, don't want to have to worry about whether or not future hardware purchases will be compatible, and don't want to have to rewrite everything in a different framework.

Most people fundamentally don't care about ROCm or CUDA or OneAPI or whatever else beyond a means to an end.


Which of Mozilla's questionable decisions are you referring to?


> Why haven't they just upstreamed them to PyTorch for better adoption?

They don't seem to care, or don't understand how to get broader adoption.

For some reason AMD's management is dead set on targeting only the high end part of the market. Look at this blog post, for example: which model are they testing? DeepSeek R1, the 671B behemoth that no normal person can run. Or look at any of their tutorials/docs and see which GPUs they support - it's always only either unobtanium-grade enterprise GPUs or high-end workstation cards that no one buys. And if your strategy is to target only the super-rich entities, then a little jank in the software isn't really all that punishing - if you can afford to drop a few million on GPUs, you can also afford to hire someone to spend a few weeks getting AMD's software to work, tuning the two dozen environment variables they seem to like so much, etc.


> For some reason AMD's management is dead set on targeting only the high end part of the market.

Because those people are dropping $100 billion on GPU clusters and individuals are not


Yes, but researchers use PyTorch, and those researchers end up being the end users of the GPU clusters.

NVIDIA GPUs sell so well because they work with what researchers actually use.


Oh I definitely think they should upstream to PyTorch, I'm just saying doing the usual "why doesn't AMD think of the gamers^W^W^W^W^W local model users" is not going to sway their policies.


That would make the kernels the PyTorch Foundation's problem, and they would have to set up CI infrastructure around AMD GPUs to maintain these kernels. For whatever reason, AMD really wants to keep everything in-house even though that has been a losing strategy so far.


But before becoming a priest there is a quite long training program (often 5-6 years). Dropping out of it is quite common.

And even after becoming a priest, they can ask to be relieved of their obligations, including celibacy. But in that case they would have to find another job.

