ILGPU: Write GPU programs with C# and F# (github.com/m4rs-mt)
157 points by neonsunset 28 days ago | 35 comments

Also recommending ComputeSharp. I've been using it for a few years now: https://github.com/Sergio0694/ComputeSharp

Unfortunately, ComputeSharp is Windows-only which significantly limits its usability.

I'm going through the CUDA courses. I've done GPU and CPU optimization as an enthusiastic amateur in my day job once or twice a year, but it's not my core focus.

It quickly seems that the low level C/Cpp is becoming obsolete, and it's hard to squeeze performance unless you're doing something truly green field / new. Otherwise someone has already optimized the hell out of it.

So what's the use case for porting GPU to higher level languages like C#? What would you use this for?

> It quickly seems that the low level C/Cpp is becoming obsolete, and it's hard to squeeze performance unless you're doing something truly green field / new. Otherwise someone has already optimized the hell out of it.

I hear and see both of these sentiments frequently from the internet crowd (on-lookers). It's both wrong and humorously arrogant. I'll repeat what I said to someone on reddit yesterday: there are thousands (maybe tens of thousands?) of engineers scattered around the FAANGs, NVIDIA, AMD, Intel, accelerator startups, boutiques, etc. whose day to day is both C/C++ and squeezing perf out of kernels and getting many points (sometimes tens) of improvement. Certainly the wins aren't daily, but I'm saying they do steadily find room for improvement. How is that possible? I'll give you a hint: platforms, demands, hardware, use-cases all change essentially on a quarterly basis.

So before you proclaim victory on behalf of whatever high-level framework, ask yourself if you're really familiar with the production and business environment for this kind of code.

As a learner and self proclaimed amateur, I assume the vitriolic tone is directed elsewhere. Without that, you're saying that performant pipelines are definitely in demand and optimization is still a full time job. That's good! I'd like to get better at that.

>I assume the vitriolic tone is directed elsewhere

I'm always really shocked on hn when people getting called out for arrogance respond with accusations (of vitriol). What day job do you have where you can make proclamations like "It quickly seems that the low level C/Cpp is becoming obsolete" while simultaneously admitting being an amateur and not get checked. Must be very different from my day job where being precise and accurate and conservative is paramount. Moreover what kind of habit of thought are you in that being called out for that is read as vitriol?

"it seems" combined with "I'm learning" was meant to convey ignorance not arrogance. I think I'm just being misread and that's probably my fault. It wouldn't be the first time - so I'm sorry if you found it off-putting

My day job is optimization for logistics and multi-agent systems on edge deployments. We vary between heterogeneous on-platform compute environments and occasional connections to cloud-like servers. The system has a lot to do and we're pushing what it can do for itself.

I like these problems and am always trying to find new ways to solve them. That's really it, I'm wondering if it's best to focus on learning to work the hardware as is, or learning to twist existing libraries into new problems.

For example if you look at this: https://weightythoughts.com/p/cuda-is-still-a-giant-moat-for...

It actually denigrates anyone who would consider starting at low level CUDA kernels. I liked your message though, you had a great point: what do I imagine all those CUDA programmers do if not write CUDA?

> Moreover what kind of habit of thought are you in that being called out for that is read as vitriol?

That’s as much a matter of personal identity as a ‘habit of thought’, friend. Otherwise, we could comment on the basis of _your_ perspective as some mutable affectation, no?

You see, we are shadows of ourselves. An inquisitive play, or let’s say a “judgment” about Cuda economics: aspirational, errant? Then, the linguistic phalanx: honed no doubt by long running needs to be heard and seen and listened to.

Oh my god, calm down. Jesus christ.

Same thing applies here: I'm not the one that's not calm. I very calmly pointed out that op is wrong. That's it. You painting that as "not calm" is the only abnormal thing going on here.

You can interpret text as being uncalm, but there's always room for misunderstanding, esp in a forum/thread. Nobody is actually upset I think, just using direct language.

Calm or not, when you called their behavior "humorously arrogant" it set the tone of the rest of your comment to condescension. It distracts from the on-topic discussion, despite any informative value contained in your comment.

When people post like this it's hard to tell if they are trying to have a conversation in good faith, or they are seeking pleasure in the form of belittling someone.

> "humorously arrogant" it set the tone of the rest of your comment to condescension

it was as condescending as the original comment was arrogant. what exactly is the confusion here? if you want a community where people are allowed to make bold claims without doing sufficient research (hn) then you should also allow for those people to be treated as such.

Given IL itself is an abstract stack-based bytecode, it can be compiled to the corresponding IR, which can then target corresponding back-end (CUDA, OpenCL, CPU, etc.) - this is what ILGPU does.
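To make the "stack-based bytecode to IR to back-end" idea concrete, here is a minimal toy sketch in Python. It is not ILGPU's code: the opcode names (`ldarg`, `ldc`, `add`, `mul`, `ret`), the tuple-based IR, and the C-like emitter are all invented for illustration, assuming only the general technique of replacing the operand stack with named virtual registers.

```python
# Toy illustration only: the opcodes and IR format are made up for this sketch
# and are not real CIL semantics or ILGPU's actual IR.

def lower_stack_to_ir(bytecode):
    """Turn stack-machine instructions into three-address (register) IR."""
    stack = []      # holds virtual register names instead of runtime values
    ir = []
    next_reg = 0

    def fresh():
        nonlocal next_reg
        name = f"r{next_reg}"
        next_reg += 1
        return name

    for op, *args in bytecode:
        if op == "ldarg":                 # push an argument
            dst = fresh()
            ir.append((dst, "arg", args[0]))
            stack.append(dst)
        elif op == "ldc":                 # push a constant
            dst = fresh()
            ir.append((dst, "const", args[0]))
            stack.append(dst)
        elif op in ("add", "mul"):        # pop two operands, push the result
            rhs = stack.pop()
            lhs = stack.pop()
            dst = fresh()
            ir.append((dst, op, lhs, rhs))
            stack.append(dst)
        elif op == "ret":                 # return the value on top of the stack
            ir.append((None, "ret", stack.pop()))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return ir


def emit_c_like(ir):
    """A pretend back end: render the IR as C-like source lines."""
    symbols = {"add": "+", "mul": "*"}
    lines = []
    for dst, op, *operands in ir:
        if op == "arg":
            lines.append(f"int {dst} = args[{operands[0]}];")
        elif op == "const":
            lines.append(f"int {dst} = {operands[0]};")
        elif op == "ret":
            lines.append(f"return {operands[0]};")
        else:
            lines.append(f"int {dst} = {operands[0]} {symbols[op]} {operands[1]};")
    return lines


# Stack code for: args[0] * 2 + args[1]
program = [("ldarg", 0), ("ldc", 2), ("mul",), ("ldarg", 1), ("add",), ("ret",)]
ir = lower_stack_to_ir(program)
c_lines = emit_c_like(ir)
print("\n".join(c_lines))
```

The point of the split is that `lower_stack_to_ir` knows nothing about the target, so a second emitter (say, one producing PTX-like text) could reuse the same IR. A real compiler adds types, control flow, and optimization passes on top of this.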

Because all code is in the single repository in the post and is fairly easy to read, you can skim through it to draw your own conclusions if this interests you.

Also, very easy to start using: just `dotnet add package ILGPU` on most configurations (as ADHD puts higher mental strain on activities involving complex configuration, I try to keep to the tools that have minimal ceremony)

C# (and F# by extension) generally allow you to write systems-style code, with references to locals and the same primitives as C, which means you're likely not sacrificing performance in this particular scenario by having the language be higher-level. After all, you're using ILGPU's APIs first and foremost.

As to why use it at all: you are likely to move faster with it than with C++, especially if it's not your full-time job, and all the escape hatches to extract 99.9% efficiency are still on the table. That is, if the performance of the kernel emitted by ILGPU is an issue in the first place, see below for one alternative; cheap FFI and easy C/C++ integration are still there as well.

It also lets you do things like PTX assembly: https://github.com/m4rs-mt/ILGPU/blob/master/Samples/InlineP...

1. C and C++ aren’t going anywhere.

2. I don’t think you have an appreciation for the sheer amount of software—including lots of very critical software—implemented in C and C++ that are being improved upon daily.

A buddy of mine works on critical low-level software (including C) in the energy sector. Millions of lines of code. The effects, positive or negative, of any single code change can impact tens of thousands of Americans (likely even more). His job is to maintain and continuously optimize this software. Given your comment, I think you’d be surprised to learn that he never runs out of work.

Well, I write in those languages every day! It's not a statement about the languages, just wondering about whether CUDA is best approached low level or from well established libraries.

Would you mind expanding on what your friend does? Sounds like energy trading which is something I’ve been taking a look at. I’m just curious.

Seems more likely it's to do with running the systems and networks.

Make it more accessible? Prototyping? Eg. easily determining if your use case is even suited for a GPU workload. Best case - it is, and now you can write a custom CUDA kernel to squeeze out more performance. Worst case, you lose a couple of hours vs weeks before you discover it’s not going to work.

what do you mean obsolete, read about PTX target tests:


I use ILGPU all the time for prototyping different image processing and rendering things for work and for my personal projects. I use cpp for a few things where the extra efficiency is worth it, but for most tasks c# is plenty fast enough.

Programmers generally use higher level languages to implement original code because the code is easier for humans to read, write and debug. Porting from low level languages (eg CUDA or C++ to F#) is less common.

Also see https://www.fshade.org/, an F# DSL for shaders

I use ILGPU all the time! You would be amazed the performance you can get out of c# if you are careful about how you write your code. I recently wrote a super fast gaussian splat renderer in c# and used ILGPU to sort the splats on the gpu. You can even do weird cuda stuff like pass buffers between opengl and cuda, all in c#! https://github.com/NullandKale/OpenTKSplat/blob/main/OpenTKS...

This needs a Metal backend IMO. Higher level GPU programming on a Mac is a bit of a shitshow, and Apple devices do have pretty potent GPUs. They’re also largely compatible across their entire product line, thus dramatically expanding the potential installed base of the output of such toolkits, assuming it can be AOT compiled.

Currently ILGPU already supports arm64 linux if you have a cuda or opencl device, so supporting apple silicon is not out of the question. It has been floated that we would support Metal via Vulkan support and MoltenVK but not much work has been done on that front.

Currently SIMD support for a fast CPU accelerator is the main focus for the next big accelerator type.

Note that Apple SoCs also have a matrix unit, called AMX, which is designed specifically for matmuls. You could expand the math throughput of the CPU quite dramatically by taking advantage of it. I don’t think there’s any official public documentation on it, but here’s a GitHub repo that documents it independently: https://github.com/corsix/amx

Keep in mind that Apple's AMX is pretty much undocumented and not guaranteed to be stable, so the effort invested into integrating reverse engineered work might not be the best allocation of resources, particularly now that M4 supports ARM SME, which is the "official" extension. SME is not yet offered via hardware intrinsics in .NET (pretty much no hardware on the market supports it as of now); the closest one is SVE2, coming in .NET 9.

I did look once at back-end implementations and contributing either Metal back-end or adapting Vulkan back-end to run on top of MoltenVK seemed much more realistic.

With that said, OpenCL already works as is so you are not completely unsupported.

No disagreement on allocation. I just don't see why anyone would use something like this in the CUDA/CPU ecosystem where lots of established alternatives exist that do not require one to use C# or F#, neither of which is common in scientific computing.

OTOH the field is almost bare on the Apple side, in spite of there being over a billion devices out there with relatively low hardware fragmentation and almost uniform ISA throughout the entire product lineup. Hence the suggestion.

C# and F# are very niche, that is true. I think the basics of the languages are way nicer and saner than Python's and C++'s.

You can just give it a try and see if you like it or not.

In general, I think you are right and Python has completely won for high-level libraries while C++ has completely won for implementation, and threshold for making people move is way too high: difficult to match 5-10x improvement over Python in experience, and C++ crowd would never even look at C#, let alone think it has something to offer them, because the mythology says that the only true way is C/C++ for this kind of code and C# is just weird Java (especially now with ggml being on the radar of many).

(fun thought experiment: imagine average reaction to a statement "you can write high level code that compiles to Metal Performance Shaders or targets Apple AMX but it's C#", not dissimilar to a reaction when people hear that C# is the prime choice for portable SIMD code)

We have thought of supporting the tensor cores in cuda devices as well; we could probably use the same abstractions we need for that for the amx support. Unfortunately we mainly focus on cuda support because in most cases people are using cuda for gpu compute purposes.

CUDA is a bit of a well-trodden ground, you aren’t going to do much better there (if at all) than cuBLAS and cuDNN. But I get what you’re saying, gotta pick one’s battles.

My understanding is it's less about competing with cuBLAS and cuDNN directly but rather offering the features they expose in a better and more idiomatic way - there's a reason it's less fun and more tedious to write C++ AMP code.

Why would anyone write C++ AMP code when AMP is deprecated, and e.g. Triton exists though?

Not exactly the same thing, but there is a very nice ML framework for .NET called ML.NET.
