
vblanco · 2026-03-16T19:56:37 1773690997

There is not much need for this. I already use Claude Code with Godot to build serious projects, and you only need to point the bot at Godot plus its source code folder, and use C#; then it works like a charm.

Nice set of prompts and skills though, I'm grabbing them for personal use.

tpxl · 2026-03-16T20:02:11 1773691331

Can you expand on how you do this? I've gotten into gamedev a couple of times, but never got around to completing anything. Something like this might just do the trick.

vblanco · 2026-03-16T20:42:46 1773693766

First of all, you don't use one prompt to do the entire game; you do "decent"-style vibecoding, where you build things little by little while controlling the bot.

Godot's whole engine is text based. This means you can just let Claude rip through the assets and files just fine. It basically just works.

The critical thing is to write some documentation about the axis systems and core classes (the one in OP's project is pretty good, I've grabbed it) and then set your claude.md to point at the Godot source code so that the bot can double-check things.

I've been playing with multiple engines, and Godot is by far the best one to use with AI. Unreal Engine is too heavy on binary files that coding tools can't parse, and Unity is closed source, which leaves the bot with no reliable documentation or way to check what the game APIs are doing. Godot is small enough that the bot can understand it, and it works fine for games that aren't too complicated.

I'm using it to build a spiritual remake of Daggerfall as a procedural open world RPG. Right now it's at 60,000 lines of code, quite advanced. I got it running on a Steam Deck at 60 fps even with 4 kilometers of draw distance, thousands of trees, and procedural terrain, thanks to tons of custom shaders and a few engine edits.

guitarlimeo · 2026-03-17T09:52:14 1773741134

Note that while Godot's formats support being text based (.tscn and .tres), you get a massive speed boost in saving and loading once you convert to .scn and .res for everything over 1MB in size. If you add a high-res model and make it unique because you need to change the textures or something, that already makes your scene as big as the model.

So be aware that supporting text only will become slow once the scenes are big enough.

mattfrommars · 2026-03-16T21:11:00 1773695460

Incredible. And was this all using the $20 plan from Claude, or do you pay extra for Claude bandwidth?

vblanco · 2026-03-16T21:57:47 1773698267

I use the $100 plan; the $20 plan is more of a trial, you run out of that in no time. I use the $100 plan both for work (graphics rendering) and for this, which I do part time. I've captured a few screenshots here: <https://imgur.com/a/RJIcKqM>.

eudamoniac · 2026-03-17T01:17:17 1773710237

The 60,000 lines of code that just works bro!:

The screenshot:

vblanco · 2026-03-17T15:12:40 1773760360

The actual screenshots only show like 10k lines of code from the procedural generation system and custom render tech. It's a fully playable RPG; the rest of the lines are split across quest systems, stat stuff, inventory, UI, dungeon generation, enemies, etc.

The world is generated at startup for a 50x50 kilometer play area through multiple generation steps that include city placement, roads, and various biomes.

It has 3 months of dev time right now.

mattfrommars · 2026-03-17T01:30:18 1773711018

that's reasonable I think if he isn't using any kind of game framework.

vblanco · 2026-01-25T09:17:01 1769332621

Commercial translation services lately are the worst they have ever been. You can't verify that they aren't just sending your Excel file with the translation lines straight into an LLM with no tweaking or checking.

For an indie videogame I work on, we tried a couple of translation agencies, and they gave terrible output. In the end, we built our own LLM-based agentic translation, with lots of customization for our specific project, like building a prompt based on where the menu/string is located, a shared glossary, and other features. Tested against the agencies, it was better because we could customize it for the needs of our specific game.

Even then, at the end of the day, we went with freelancers for some of the languages, as we couldn't really validate the AI output in those languages. The freelancers took a month to do the translation, versus the 2-3 days we ourselves took for the languages we knew and where we could monitor the AI output. But they did a nice job, much better than the agencies.

I feel that what AI really kills is those translation services. It's not hard at all to build or customize your own AI system, so if the agency is going to charge you considerable money for AI output, just do it yourself and get a better result. Meanwhile, those freelancers are still in demand, as they can actually check the project and understand it for a nice translation, unlike the mechanical agencies where you send them the Excel file and they pass it on to who knows what, or to an AI, without you being able to check.

I will likely be opensourcing this customizable AI translation system for my project soon.

vblanco · 2025-12-16T20:11:25 1765915885

There is no implementation of it, but this is how I see it, at least comparing with how things work in Vulkan with all the extensions enabled, which uses a few similar mechanics.

Per-drawcall cost goes to nanosecond scale. Assuming you do drawcalls of course, this makes bindless and indirect rendering a bit easier so you could drop CPU cost to near-0 in a renderer.

It would also greatly mitigate shader compiler hitches, thanks to having a split pipeline instead of a monolithic one.

The simplification on barriers could improve performance a significant amount because currently, most engines that deal with Vulkan and DX12 need to keep track of individual texture layouts and transitions, and this completely removes such a thing.

vblanco · 2025-12-16T19:39:13 1765913953

This is a fantastic article that demonstrates how many parts of Vulkan and DX12 are no longer needed.

I hope the IHVs have a look at it, because current DX12 seems semi-abandoned: it doesn't support buffer pointers even though every GPU made in the last 10 (or more!) years can do pointers just fine. Meanwhile, Vulkan hasn't done a 2.0 release that cleans things up, so it carries a lot of baggage and, especially, tons of drivers that don't implement the extensions that really improve things.

If this API existed, you could emulate OpenGL on top of it faster than current OpenGL-to-Vulkan layers, and something like SDL3 GPU would get a 3x/4x boost too.

pjmlp · 2025-12-16T20:17:27 1765916247

DirectX documentation is in a bad state currently: you have Frank Luna's books, which don't cover the latest improvements, and then it is hunting through Learn, GitHub samples and reference docs.

Vulkan is another mess, even if there was a 2.0, how are devs supposed to actually use it, especially on Android, the biggest consumer Vulkan platform?

_bohm · 2025-12-16T23:24:05 1765927445

I'm surprised he made no mention of the SDL3 GPU API since his proposed API has pretty significant overlap with it.

tadfisher · 2025-12-16T20:03:07 1765915387

Isn't this all because PCI resizable BAR is not required to run any GPU besides Intel Arc? As in, maybe it's mostly down to Microsoft/Intel mandating reBAR in UEFI so we can start using stuff like bindless textures without thousands of support tickets and negative reviews.

I think this puts a floor on supported hardware though, like Nvidia 30xx and Radeon 5xxx. And of course motherboard support is a crapshoot until 2020 or so.

vblanco · 2025-12-16T20:06:28 1765915588

This is not really directly about resizable BAR. You could do mostly the same API without it. Resizable BAR simplifies it a little bit because you skip manual transfer operations, but it's not strictly required, as you can write things to a CPU-writeable buffer and then begin your frame with a transfer command.

Bindless textures never needed any kind of resizable BAR; you have been able to use them since the early 2010s on OpenGL through an extension. Buffer pointers have never needed it either.

exDM69 · 2025-12-17T12:19:15 1765973955

> tons of drivers that don't implement the extensions that really improve things.

This isn't really the case, at least on desktop side.

All three desktop GPU vendors support Vulkan 1.4 (or most of the features via extensions) on all major platforms even on really old hardware (e.g. Intel Skylake is 10+ years old and has all the latest Vulkan features). Even Apple + MoltenVK is pretty good.

Even mobile GPU vendors have pretty good support in their latest drivers.

The biggest issue is that Android consumer devices don't get GPU driver updates so they're not available to the general public.

pjmlp · 2025-12-17T13:36:07 1765978567

Neither do laptops, where not using the driver from the OEM, with whatever custom code they added, can lead to interesting experiences, like power configuration going bad, not being able to handle mixed GPU setups, and so on.

kllrnohj · 2025-12-16T23:29:58 1765927798

No longer needed is a strong statement given how recent the GPU support is. It's unlikely anything could accept those minimum requirements today.

But soon? Hopefully

jsheard · 2025-12-16T23:49:58 1765928998

Those requirements more or less line up with the introduction of hardware raytracing, and some major titles are already treating that as a hard requirement, like the recent Doom and Indiana Jones games.

kllrnohj · 2025-12-16T23:59:12 1765929552

Only if you're ignoring mobile entirely. One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.

m-schuetz · 2025-12-17T21:06:40 1766005600

On the contrary, I would say this is the main thing Vulkan got wrong and the main reason why the API is so bad. Desktop and mobile are way too different for a uniform rendering API. They should be two different flavours with a common denominator. OpenGL and OpenGL ES were much better in that regard.

HelloNurse · 2025-12-18T09:25:25 1766049925

It is unreasonable to expect to run the same graphics code on desktop GPUs and mobile ones: mobile applications have to render something less expensive that doesn't exceed the limited capabilities of a low-power device with slow memory.

The different, separate engine variants for mobile and desktop users, on the other hand, can be based on the same graphics API; they'll just use different features from it in addition to having different algorithms and architecture.

flohofwoe · 2025-12-18T09:52:51 1766051571

> they'll just use different features from it in addition to having different algorithms and architecture.

...so you'll have different code paths for desktop and mobile anyway. The same can be achieved with a Vulkan vs VulkanES split which would overlap for maybe 50..70% of the core API, but significantly differ in the rest (like resource binding).

kllrnohj · 2025-12-18T20:49:50 1766090990

But they don't actually differ, see the "no graphics API" blog post we're all commenting on :) The primary difference between mobile & desktop is performance, not feature set (ignoring for a minute the problem of outdated drivers).

And beyond that if you look at historical trends, mobile is and always has been just "desktop from 5-7 years ago". An API split that makes sense now will stop making sense rather quickly.

m-schuetz · 2025-12-19T08:09:52 1766131792

Different features/architecture is precisely the issue with mobile, be it due to hardware constraints or due to a lack of driver support. Render passes were only bolted onto Vulkan because of mobile tiler GPUs; they never made any sense for desktop GPUs and only made Vulkan worse for desktop graphics development.

And this is the reason why mobile and desktop should be separate graphics APIs. Mobile is holding desktop back not just feature wise, it also fucks up the API.

pjmlp · 2025-12-17T13:37:17 1765978637

It is not unified, when the first thing an application has to do is to find out if their set of extension spaghetti is available on the device.

flohofwoe · 2025-12-17T10:04:19 1765965859

> One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.

In hindsight it really would have been better to have a separate VulkanES which is specialized for mobile GPUs.

pjmlp · 2025-12-17T19:02:29 1765998149

Apparently in many Android devices it is still better to target OpenGL ES than Vulkan due to driver quality, outside Samsung and Google brands.

eek2121 · 2025-12-18T00:56:08 1766019368

Mobile is getting RT, fyi. Apple already has it (for a few generations, at least), I think Qualcomm does as well (I'm less familiar with their stuff, because they've been behind the game forever, however the last I've read, their latest stuff has it), and things are rapidly improving.

Vulkan is the actual barrier. On Windows, DirectX does an average job at supporting it. Microsoft doesn't really innovate these days, so NVIDIA largely drives the market, and sometimes AMD pitches in.

pjmlp · 2025-12-18T12:06:50 1766059610

Where do you think many DirectX features came from?

It has been mostly NVidia in collaboration with Microsoft, even HLSL traces back to Cg.

jsheard · 2025-12-17T00:02:58 1765929778

Eh, I think the jury is still out on whether unifying desktop and mobile graphics APIs is really worth it. In practice Vulkan written to take full advantage of desktop GPUs is wildly incompatible with most mobile GPUs, so there's fragmentation between them regardless.

kllrnohj · 2025-12-17T00:55:26 1765932926

It's quite useful for things like skia or piet-gpu/vello or the general category of "things that use the GPU that aren't games" (image/video editors, effects pipelines, compute, etc etc etc)

Groxx · 2025-12-17T02:16:22 1765937782

would it also apply to stuff like the Switch, and relatively high-end "mobile" gaming in general? (I'm not sure what those chips actually look like tho)

there are also some arm laptops that just run Qualcomm chips, the same as some phones (tablets with a keyboard, basically, but a bit more "PC"-like due to running Windows).

AFAICT the fusion seems likely to be an accurate prediction.

deliciousturkey · 2025-12-17T10:21:11 1765966871

Switch has its own API. The GPU also doesn't have limitations you'd associate with "mobile". In terms of architecture, it's a full desktop GPU with desktop-class features.

kllrnohj · 2025-12-17T13:40:32 1765978832

Well, it's a desktop GPU with desktop-class features from 2014, which makes it quite outdated relative to current mobile GPUs. The just-released Switch 2 uses an Ampere-based GPU, which means it's desktop-class for 2020 (RTX 3xxx series). That is nothing to scoff at, but "desktop-class features" is a rapidly moving target, and the Switch ends up being a lot closer to mobile than to desktop since it always launches with GPUs that are ~2 generations old.

deliciousturkey · 2025-12-21T13:23:26 1766323406

The context was:

> Only if you're ignoring mobile entirely. One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.

In this context, both the old Switch and the Switch 2 have full desktop-class GPUs. They don't need to care about the API problems that mobile vendors imposed on Vulkan.

pjmlp · 2025-12-17T19:04:00 1765998240

Still beats the design of all Web 3D APIs, and has much better development tooling, let that sink in how behind they are.

pjmlp · 2025-12-18T12:10:57 1766059857

Those already have their own abstraction API, and implementing an RHI isn't as big an issue as FOSS circles make it out to be.

jsheard · 2025-12-17T01:03:19 1765933399

I suppose that's true, yeah. I was focusing too much on games specifically.

ablob · 2025-12-17T05:48:09 1765950489

I feel like it's a win by default. I do like to write my own programs every now and then and recently there's been more and more graphics sprinkled into them. Being able to reuse those components and just render onto a target without changing anything else seems to be very useful here. This kind of seamless interoperability between platforms is very desirable in my book. I can't think of a better approach to achieve this than the graphics API itself.

Also there is no inherent thing that blocks extensions by default. I feel like a reasonable core that can optionally do more things similar to CPU extensions (i.e. vector extensions) could be the way to go here.

eek2121 · 2025-12-18T01:00:20 1766019620

I definitely disagree here. What matters for mobile is power consumption. Capabilities can be pretty easily implemented...if you disagree, ask Apple. They have seemingly nailed it (with a few unrelated limitations).

Mobile vendors insisting on using closed, proprietary drivers that they refuse to constantly update/stay on top of is the actual issue. If you have a GPU capable of cutting edge graphics, you have to have a top notch driver stack. Nobody gets this right except AMD and NVIDIA (and both have their flaws). Apple doesn't even come close, and they are ahead of everyone else except AMD/NVIDIA. AMD seems to do it the best, NVIDIA, a distant second, Apple 3rd, and everyone else 10th.

aleph_minus_one · 2025-12-18T10:22:30 1766053350

> If you have a GPU capable of cutting edge graphics, you have to have a top notch driver stack. Nobody gets this right except AMD and NVIDIA (and both have their flaws). Apple doesn't even come close, and they are ahead of everyone else except AMD/NVIDIA. AMD seems to do it the best, NVIDIA, a distant second, Apple 3rd, and everyone else 10th.

What about Intel?

pjmlp · 2025-12-18T12:09:26 1766059766

It is quite telling how good their iGPUs are at 3D that no one counts them in.

I remember there was a time, about 15 years ago, when they were famous for reporting OpenGL capabilities as supported when they were actually only available as software rendering, which defeated the purpose of using such features in the first place.

aleph_minus_one · 2025-12-18T12:47:54 1766062074

I know that in the past (such as your mentioned 15 years ago) Intel GPUs did have driver issues.

> It is quite telling how good their iGPUs are at 3D that no one counts them in.

I'm not so certain about this: in

> https://old.reddit.com/r/laptops/comments/1eqyau2/apuigpu_ti...

APUs/iGPUs are compared, and here Intel's integrated GPUs seem to be very competitive with AMD's APUs.

---

You of course have to compare dedicated graphics cards with each other, and similarly for integrated GPUs, so let's compare (Intel's) dedicated GPUs (Intel Arc), too:

When I look at

> https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html

the current Intel Arc generation (Intel-Arc-B, "Battlemage") seems to be competitive with entry-level GPUs of NVidia and AMD, i.e. you can get much more powerful GPUs from NVidia and AMD, but for a much higher price. I thus clearly would not call Intel's dedicated GPUs to be so bad "at 3D that no one counts them in".

01HNNWZ0MV43FF · 2025-12-17T06:03:45 1765951425

If the APIs aren't unified, the engines will be, since VR games will want to work on both standalone headsets and streaming headsets

tjpnz · 2025-12-17T00:39:36 1765931976

Doom was able to drop it and is now Steam Deck verified.

nicolaslem · 2025-12-17T09:38:08 1765964288

Little known fact, the Steam Deck has hardware ray tracing, it's just so weak as to be almost non-existent.

torginus · 2025-12-17T20:17:42 1766002662

It's weird how the 'next-gen' APIs will turn out to be failures in many ways, imo. I think a sizeable number of graphics devs are still stuck on the old way of doing things. I know a couple of graphics wizards (who work on major AAA titles) who never liked Vulkan/DX12, and many engines haven't really been rebuilt to accommodate the 'new' way of doing graphics.

Ironically a lot of the time, these new APIs end up being slower in practice (something confirmed by gaming benchmarks), probably exactly because of the issues outlined in the article - having precompiled 'pipeline states', instead of the good ol state machine has forced devs to precompile a truly staggering amount of states, and even then sometimes compilation can occur, leading to these well known stutters.

The other issue is synchronization: as the article mentions, Vulkan synchronization is unnecessarily heavy, and devs aren't really experts or don't have the time to figure out when to use what kind of barrier, so they adopt a 'better safe than sorry' approach, leading to unnecessary flushes and pipeline stalls that can tank performance in real-life workloads.

This is definitely a huge issue combined with the API complexity, leading many devs to use wrappers like the aforementioned SDL3, which is definitely very conservative when it comes to synchronization.

Old APIs with smart drivers could either figure this out better, or GPU driver devs looked at the workloads and patched up rendering manually on popular titles.

Additionally, by the early to mid 2010s, when these new APIs started getting released, a lot of crafty devs, together with new shader models and OpenGL extensions, made it possible to render tens of thousands of varied and interesting objects, essentially the whole scene's worth, in a single draw call. The most sophisticated and complex of these was AZDO, which I'm not sure actually made it into any released games, but even with much less sophisticated approaches (and combined with ideas like PBR materials and deferred rendering), you could pretty much draw anything.

This meant much of the perf bottleneck of the old APIs disappeared.

eek2121 · 2025-12-18T00:51:22 1766019082

I think the big issue is that there is no 'next-gen API'. Microsoft has largely abandoned DirectX, Vulkan is restrictive as anything, Metal isn't changing much beyond matching DX/Vk, and NVIDIA/AMD/Apple/Qualcomm aren't interested in (re)-inventing the wheel.

There are some interesting GPU improvements coming down the pipeline, like a possible OoO part from AMD (if certain credible leaks are valid), however, crickets from Microsoft, and NVIDIA just wants vendor lock-in.

Yes, we need a vastly simpler API. I'd argue even simpler than the one proposed.

One of my biggest hopes for RT is that it will standardize like 80% of stuff to the point where it can be abstracted to libraries. It probably won't happen, but one can wish...

aleph_minus_one · 2025-12-18T10:25:09 1766053509

> Microsoft has largely abandoned DirectX

What does Microsoft then intend to use to replace the functionality that DirectX provides?

PeterStuer · 2025-12-17T08:40:49 1765960849

Still have some 1080s in gaming machines going strong. But as even NVIDIA has retired support, I guess it is time to move on.

vblanco · 2025-07-01T19:05:40 1751396740

The modern CryEngine compiles very fast. Their trick is that they have architected everything to go through interfaces that live in very thin headers, so their headers end up very light and they don't compile the class properties over and over. But it's a shame we need tricks like this for compile speed, as they harm runtime performance.

ttoinou · 2025-07-01T19:26:21 1751397981

Why does it ruin runtime performance? The code should be almost the same.

vblanco · 2025-07-01T19:34:27 1751398467

Because you now need to go through virtual calls on functions that don't really need to be virtual, which means a possible cache miss from loading the function pointer from the vtable, and then the impossibility of them being inlined. For example, they have an ITexture interface with a function like virtual GetSize(). If it wasn't all through virtuals, that size would just be a vec2 in the class, and reading it would be a simple load that gets inlined.
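To make the trade-off concrete, here is a minimal sketch of the two styles. Only `ITexture` and `GetSize()` come from the comment above; `Vec2`, `GpuTexture`, `MakeTexture`, `Texture`, and the two `Area*` helpers are hypothetical names invented for illustration, and this is not CryEngine's actual code.

```cpp
#include <cassert>
#include <memory>

struct Vec2 { int x; int y; };

// Interface style: callers only need this thin declaration, but every
// access goes through a virtual call.
class ITexture {
public:
    virtual ~ITexture() = default;
    virtual Vec2 GetSize() const = 0;
};

// Hypothetical implementation that would live in a .cpp file, hidden from callers.
class GpuTexture final : public ITexture {
public:
    explicit GpuTexture(Vec2 size) : size_(size) {}
    Vec2 GetSize() const override { return size_; }
private:
    Vec2 size_;
};

// Hypothetical factory a thin header could expose without revealing members.
inline std::unique_ptr<ITexture> MakeTexture(int w, int h) {
    return std::make_unique<GpuTexture>(Vec2{w, h});
}

// Concrete style: the layout is visible to callers, so the size is a plain member.
struct Texture {
    Vec2 size;
};

inline int AreaVirtual(const ITexture& t) {
    const Vec2 s = t.GetSize();  // indirect call via the vtable unless devirtualized
    return s.x * s.y;
}

inline int AreaDirect(const Texture& t) {
    return t.size.x * t.size.y;  // two member loads, trivially inlinable
}
```

Both helpers compute the same value; the difference is only in what the compiler can see at the call site. The cost of the concrete style is that every change to `Texture`'s members recompiles everything that includes its header.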

pjmlp · 2025-07-02T08:48:54 1751446134

At least on clang with LTO, with bitcode variant, that should be possible to devirtualize, assuming most of those interfaces only have a single implementation.

ttoinou · 2025-07-01T19:37:27 1751398647

Ah yes, this kind of interface. OK, indeed this doesn't seem like a useful layer when running the program. Maybe the compilers could optimize this, though.

drysine · 2025-07-02T09:33:54 1751448834

They can sometimes

https://quuxplusone.github.io/blog/2021/02/15/devirtualizati...

jeremiahar · 2025-07-02T00:45:22 1751417122

In my experience, as long as there's only a single implementation, devirtualization works well, and can even inline the functions. But you need to pass something along the lines of "-fwhole-program-vtables -fstrict-vtable-pointer" + LTO. Of course the vtable pointer is still present in the object. So I personally only use the aforementioned "thin headers" at a system level (IRenderer), rather than for each individual object (ITexture).

barchar · 2025-07-02T00:56:57 1751417817

In addition to what everyone else has said it also makes it difficult to allocate the type on the stack. Even if you do allow it you'll at least need a probe.

vblanco · 2025-04-12T12:05:46 1744459546

Game consoles generally only offer clang as a compiler option. If you can compile Rust to C, then you can finally use Rust for videogames that need to run everywhere.

koakuma-chan · 2025-04-12T13:43:40 1744465420

Is Steam Deck a monopoly yet? I feel like if your game compiles to Linux, you can target pretty much every market out there.

dcow · 2025-04-12T13:27:13 1744464433

I don’t think I’ve ever heard those two terms “video game” and “run everywhere” in the same sentence. Bravo.

vblanco · on Jan 8, 2025

Interesting library, but I see it falls into what happens to almost all SIMD libraries, which is that they hardcode the vector target completely and you can't mix and match feature levels within a build. The documentation recommends writing your kernels into DLLs and dynamically loading them, which is a huge mess: https://jfalcou.github.io/eve/multiarch.html

Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects, which lets you branch at runtime between SIMD levels as you wish. I find it's a far better way of doing things if you actually want to ship the SIMD code to users.

kookamamie · on Jan 8, 2025

100% agreed. This is the main reason ISPC is my go-to tool for explicit vectorization.

janwas · on Jan 8, 2025

+1, dynamic dispatch is important. Our Highway library has extensive support for this.

Detailed intro by kfjahnke here: https://github.com/kfjahnke/zimt/blob/multi_isa/examples/mul...

spacechild1 · on Jan 8, 2025

Thanks, that's an important caveat!

> Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects

That's pretty cool because you can write function templates and instantiate different versions that you can select at runtime.

vblanco · on Jan 8, 2025

Yeah, that's the fun of it: you create your kernel/function so that the SIMD level is a template parameter, and then you can use simple branching like:

if(supports<avx512>){ myAlgo<avx512>(); } else{ myAlgo<avx>(); }

I've also used it for benchmarking, to see if my code scales well to different SIMD widths, and it's a huge help.

dyaroshev · on Jan 8, 2025

FYI: You don't want to do this. `supports<avx512>` is an expensive check. You really want to put this check in a static.

spacechild1 · on Jan 9, 2025

I guess this was just pseudo-code. Of course you don't want to do a runtime feature check over and over again.
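As a rough illustration of the two points in this sub-thread (a kernel templated on the SIMD level, and a feature check that runs only once), here is a minimal sketch in plain C++. It does not use xsimd or any real intrinsics: the `Scalar` and `Wide4` tags, `DetectWide4`, `SumKernel`, `PickSum`, and `Sum` are all hypothetical stand-ins, and the "wide" path just uses several accumulators so the example stays portable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical arch tags standing in for a SIMD library's architecture types.
struct Scalar { static constexpr std::size_t lanes = 1; };
struct Wide4  { static constexpr std::size_t lanes = 4; };

// Placeholder for an expensive CPU feature query; a real one would use CPUID
// or an OS API. Hypothetical: always reports support here.
inline bool DetectWide4() { return true; }

// One kernel source, instantiated once per feature level.
template <class Arch>
int SumKernel(const int* data, std::size_t n) {
    int acc[Arch::lanes] = {};
    std::size_t i = 0;
    // Main loop: consume Arch::lanes elements per iteration.
    for (; i + Arch::lanes <= n; i += Arch::lanes)
        for (std::size_t l = 0; l < Arch::lanes; ++l) acc[l] += data[i + l];
    int total = 0;
    for (std::size_t l = 0; l < Arch::lanes; ++l) total += acc[l];
    // Tail: leftover elements that do not fill a full vector.
    for (; i < n; ++i) total += data[i];
    return total;
}

using SumFn = int (*)(const int*, std::size_t);

// Chooses an instantiation based on the (expensive) runtime check.
inline SumFn PickSum() {
    return DetectWide4() ? &SumKernel<Wide4> : &SumKernel<Scalar>;
}

inline int Sum(const int* data, std::size_t n) {
    // Function-local static: PickSum() runs once, on the first call only.
    static const SumFn fn = PickSum();
    return fn(data, n);
}
```

Callers only ever use `Sum`, so the architecture never leaks into their signatures, while benchmarks can still call `SumKernel<Scalar>` and `SumKernel<Wide4>` directly to compare widths.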

dyaroshev · on Jan 8, 2025

Our answer to this is dynamic dispatch. If you want to have multiple versions of the same kernel compiled, compile multiple DLLs.

The big problem here is: ODR violations. We really didn't want to do the xsimd thing of forcing the user to pass an arch everywhere.

Also that kinda defeats the purpose of "simd portability" - any code with avx2 can't work for an arm platform.

eve just works everywhere.

Example: https://godbolt.org/z/bEGd7Tnb3

janwas · on Jan 8, 2025

It is possible to avoid ODR violations :) We put the per-target code into unique namespaces, and export a function pointer to them.

dyaroshev · on Jan 8, 2025

You can do many things with macros and inline namespaces, but I believe they run into problems when modules come into play. Can you compile the same code twice, with different flags, with modules?

janwas · on Jan 9, 2025

We use pragma target instead of compiler flags :)

dyaroshev · on Jan 9, 2025

I don't think we understand each other.

We want to take one function and compile it twice:

```
namespace MEGA_MACRO {

void foo(std::span<int> s) { super_awesome_platform_specific_thing(s); }

} // namespace MEGA_MACRO
```

Whatever you do - the code above has to be written once but compiled twice. In one file/in many files - doesn't matter.

My point is - I don't think you can compile that code twice if you support modules.

janwas · on Jan 9, 2025

I think I do understand, this is exactly what we do. (MEGA_MACRO == HWY_NAMESPACE)

Then we have a table of function pointers to &AVX2::foo, &AVX3::foo etc. As long as the module exports one single thing, which either calls into or exports this table, I do not see how it is incompatible with building your project using modules enabled?

(The way we compile the code twice is to re-include our source file, taking care that only the SIMD parts are actually seen by the compiler, and stuff like the module exports would only be compiled once.)
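The general shape of "same source, one namespace per target, plus a table of function pointers" can be sketched as follows. This is not Highway's actual mechanism: the comment above describes re-including the source file and using pragma target, whereas this sketch stamps the body out with a macro so it fits in one block. `DEFINE_COUNT_KERNEL`, `TargetA`, `TargetB`, the `STEP` parameter, and `CountPositive` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// The kernel body is written once. Each expansion places it in its own
// namespace, so the two definitions are distinct entities and do not
// violate the ODR. STEP stands in for whatever differs per target.
#define DEFINE_COUNT_KERNEL(NS, STEP)                               \
    namespace NS {                                                  \
    inline int CountPositive(const int* p, std::size_t n) {         \
        int count = 0;                                              \
        for (std::size_t i = 0; i < n; i += STEP)                   \
            for (std::size_t j = i; j < i + STEP && j < n; ++j)     \
                count += (p[j] > 0) ? 1 : 0;                        \
        return count;                                               \
    }                                                               \
    }

DEFINE_COUNT_KERNEL(TargetA, 1)
DEFINE_COUNT_KERNEL(TargetB, 8)
#undef DEFINE_COUNT_KERNEL

using CountFn = int (*)(const int*, std::size_t);

enum Target { kTargetA = 0, kTargetB = 1 };

// Table of per-target entry points, indexed by Target.
constexpr CountFn kCountTable[] = {&TargetA::CountPositive,
                                   &TargetB::CountPositive};

// The single symbol the rest of the program calls.
inline int CountPositive(Target t, const int* p, std::size_t n) {
    return kCountTable[t](p, n);
}
```

Whether this pattern survives a move to C++ modules is exactly what the two commenters disagree on, and the sketch does not settle it.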

dyaroshev · on Jan 9, 2025

> is to re-include our source file

Yeah - that means your source file is never a module. We would really like eve to be modularized, the CI times are unbearable.

I'd love to be proven wrong here, that'd be amazing. But I don't think google highway can be modularized.

janwas · on Jan 9, 2025

What leads you to that conclusion? It is still possible to use #include in module implementations. We can use that to make the module implementation look like your example.

Thus it ought to be possible, though I have not yet tried it.

dyaroshev · on Jan 9, 2025

Well.

You have a file, something like: load.h

You need to include it multiple times, compiled with different flags.

So - it's never going to be in load.cxx or whatever that's called.

janwas · on Jan 9, 2025

As mentioned ("re-include our source file"), we are indeed able to put the SIMD code, as well as the self-#include of itself, in a load.cxx TU.

Here is an example: https://github.com/google/gemma.cpp/blob/9dfe2a76be63bcfe679...

dyaroshev · on Jan 9, 2025

I don't think this works if your files are modules.

Let's stop here, it doesn't seem like we understand each other.

vlovich123 · on Jan 8, 2025

Since you seem knowledgeable about this, what does this do differently from other SIMD libraries like xsimd / highway? Is it the addition of algorithms similar to the STD library that are explicitly SIMD optimized?

dyaroshev · on Jan 9, 2025

I tried to make the algorithms as good as I knew how. Maybe 95% there. Nice tail handling. A lot of things supported. I like our interface over the alternatives, but I'm biased here. Really massive math library.

vblanco · on Nov 16, 2024

Game developers have been doing this since forever; it's one of their main reasons to avoid the STL.

EASTL has this as a feature by default, and Unreal Engine's container library has bounds checks enabled in most games. The performance cost of those bounds checks is in practice well worth the reduction in bugs, even in performance-sensitive code.
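For readers unfamiliar with the idea, an always-on bounds check can be as small as the sketch below. `CheckedArray` is a hypothetical container written for illustration; it is not how EASTL or Unreal Engine implement their checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical fixed-size array whose operator[] always validates the index,
// in release builds too (unlike a plain assert, which NDEBUG disables).
template <class T, std::size_t N>
class CheckedArray {
public:
    T& operator[](std::size_t i) {
        Check(i);
        return data_[i];
    }
    const T& operator[](std::size_t i) const {
        Check(i);
        return data_[i];
    }
    constexpr std::size_t size() const { return N; }

private:
    // One predictable compare-and-branch per access; fails loudly on misuse.
    static void Check(std::size_t i) {
        if (i >= N) {
            std::fprintf(stderr, "index %zu out of range (size %zu)\n", i, N);
            std::abort();
        }
    }
    T data_[N] = {};
};
```

The check is a single branch that is almost always not taken, which is one plausible reason the cost is often small in practice.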

pjmlp · on Nov 16, 2024

Which is yet another reason to assert (pun intended) how far from reality the anti-bounds-check folks are, when even the game industry takes them seriously.

vblanco · on Sept 24, 2024

A truly incredible profiler for the great price of free. There is nothing coming at this level of features and performance even on paid software. Tracy could cost thousands of dollars a year and would still be the best profiler.

Tracy requires you to add macros to your codebase to log functions/scopes, so it's not an automatic sampling profiler like Superluminal, Very Sleepy, the VS profiler, or others. Each of those macros has around 50 nanoseconds of overhead, so you can liberally use them in the millions. In the UI, it has a stats window that records the average, deviation, and min/max of those profiler zones, which can be used to profile functions at the level of single nanoseconds.
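The instrumentation model described here (mark a scope, collect per-zone statistics) can be illustrated with a small RAII timer. This is only a sketch of the concept using `std::chrono`; it is not Tracy's implementation or API, and `ZoneStats`, `ScopedZone`, `WorkStats`, and `Work` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Per-zone statistics, similar in spirit to a profiler's stats window.
struct ZoneStats {
    std::uint64_t calls = 0;
    std::int64_t total_ns = 0;
    std::int64_t min_ns = std::numeric_limits<std::int64_t>::max();
    std::int64_t max_ns = 0;
};

// RAII guard: starts timing on construction and records the elapsed time
// for the enclosing scope when it is destroyed.
class ScopedZone {
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedZone(ZoneStats& stats) : stats_(stats), start_(Clock::now()) {}
    ~ScopedZone() {
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(
                Clock::now() - start_).count();
        ++stats_.calls;
        stats_.total_ns += ns;
        if (ns < stats_.min_ns) stats_.min_ns = ns;
        if (ns > stats_.max_ns) stats_.max_ns = ns;
    }
    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    ZoneStats& stats_;
    Clock::time_point start_;
};

// Hypothetical instrumented function with its own stats record.
inline ZoneStats& WorkStats() {
    static ZoneStats stats;
    return stats;
}

inline int Work(int n) {
    ScopedZone zone(WorkStats());  // everything below is attributed to this zone
    int acc = 0;
    for (int i = 0; i < n; ++i) acc += i;
    return acc;
}
```

A real profiler does far more (lock-free queues, source locations, a separate viewer), but the scope-guard shape is why adding a zone is a one-line change at each call site.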

Its the main thing i use for all my profiling and optimization work. I combine it with superluminal (sampling profiler) to get a high level overview of the program, then i put tracy zones on the important places to get the detailed information.
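For readers who have not used an instrumentation profiler, the general idea behind macro-based zones can be sketched as follows. This is not Tracy's API or implementation: `ZoneStats`, `ScopedZone`, `PROFILE_ZONE` and `sum_to` are hypothetical names, and a real profiler streams events to a separate viewer rather than aggregating in place. The sketch only shows how a scope guard placed by a macro can collect count, mean and min/max timings for a zone.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical names for illustration; this is not Tracy's API.
struct ZoneStats {
    std::uint64_t count = 0;
    std::uint64_t total_ns = 0;
    std::uint64_t min_ns = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t max_ns = 0;

    // Fold one measured duration into the running statistics.
    void record(std::uint64_t ns) {
        ++count;
        total_ns += ns;
        min_ns = std::min(min_ns, ns);
        max_ns = std::max(max_ns, ns);
    }

    double mean_ns() const {
        return count ? static_cast<double>(total_ns) / static_cast<double>(count) : 0.0;
    }
};

// RAII guard: timestamps on construction, records the elapsed time on destruction.
class ScopedZone {
public:
    explicit ScopedZone(ZoneStats& stats)
        : stats_(stats), start_(std::chrono::steady_clock::now()) {}

    ~ScopedZone() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        assert(elapsed.count() >= 0);  // steady_clock never goes backwards
        stats_.record(static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count()));
    }

    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;

private:
    ZoneStats& stats_;
    std::chrono::steady_clock::time_point start_;
};

// One zone per scope; the guard lives until the enclosing block ends.
#define PROFILE_ZONE(stats) ScopedZone profile_zone_guard_{(stats)}

// Example of an instrumented function.
inline int sum_to(int n, ZoneStats& stats) {
    PROFILE_ZONE(stats);
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Calling `sum_to` repeatedly with the same `ZoneStats` object accumulates per-zone statistics, which is roughly the kind of data a stats window would summarize. The per-zone cost here is two clock reads plus a few arithmetic operations, which is why the clock source matters a lot for this kind of instrumentation.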

eagle2com · on Sept 24, 2024

Doesn't Tracy have the capability to do sampling as well? I remember using it at some point, even if it was finicky to set up because of Windows.

vblanco · on Sept 24, 2024

It does, but I don't use it much because it is too slow and too heavy on memory on my Ryzen 5950X (32 threads) on Windows. A couple of seconds of tracing takes tens of gigabytes of RAM.

forrestthewoods · on Sept 24, 2024

Yeah I had issues with the Tracy sampler. It didn’t “just work” the way Superluminal did.

My only issue with Superluminal is I can’t get proper callstacks for interpreted languages like Python. It treats all the CPP callstacks as the same. Not sure if Tracy can handle that nicely or not…

forrestthewoods · on Sept 24, 2024

Tracy and Superluminal are the way. Both are so good.

Flex247A · on Sept 24, 2024

Hello! Going through your tutorial and it's been a great ride!

Thanks for the good work.

vblanco · on Oct 15, 2023

They are not slower than headers. I've been looking into it because a modular STL is such a big win. On my little toy project I have .cpp files compiling in 0.05 seconds while doing import std.

The downside is that at the moment you can't mix the normal header STL with the module STL in the same project (MSVC), so it's for cleanroom small projects only. I expect the second you can reliably use that almost everyone will switch overnight just from how fast of a speed boost it gives on the STL vs even precompiled headers.

cwzwarich · on Oct 15, 2023

The one way in which they are slower than headers is that they create longer dependency chains of translation units, whereas with headers you unlock more parallelism at the beginning of the build process, but much of it is duplicated work.

maccard · on Oct 15, 2023

Every post or exploration of modules (including this one) has found that modules are slower to compile.

> I expect the second you can reliably use that almost everyone will switch overnight just from how fast of a speed boost it gives on the STL vs even precompiled headers.

I look forward to that day, but it feels like we're a while off it yet

klipt · on Oct 15, 2023

I assume templates can only be partly preprocessed (parsed?) but not fully pre compiled, since final code depends on the template types?

jcelerier · on Oct 15, 2023

Depends on the compiler. Clang is able to pre-instantiate templates and generate debug info as part of its PCH system (for instance, you most likely have some std::vector<int> which can be instantiated somewhere in a transitively included header).

In my projects enabling the relevant flags gave pretty nice speedups.

vblanco · on Oct 15, 2023

Yes, but template code is all in headers, so it gets parsed every single time it's included in some compile unit. With modules this only happens once, so it's a huge speed upgrade in pretty much all cases.

moregrist · on Oct 15, 2023

Whenever I’ve profiled compile times, parsing accounts for relatively little of the time, while the vast majority of the time is spent in the optimizer.

So at least for my projects it’s a modest (maybe 10-20%) speed up, not the order of magnitude speed up I was hoping for.

Thus C++ compile times will remain abysmal.

dagmx · on Oct 15, 2023

For some template heavy code bases I’ve been in, going to PCH has cut my compile times to less than half. I assume modules will have a similar benefit in those particular repositories, but obviously YMMV