There is not much need for this. I already use claude code with godot to build serious projects, and you only need to point the bot at godot + sourcecode folder, and use C#, then it works like a charm.
Nice set of prompts and skills tho, im grabbing them for personal use.
Can you expand on how you do this? I've gotten into gamedev a couple of times, but never got around to completing anything. Something like this might just do the trick.
First of all, you dont do one prompt to do the entire game, but "decent" style vibecoding where you do things little by little controlling the bot.
Godot whole engine is text based. This means you can just let claude rip through the assets and files just fine. It basically just works.
The thing that is critical is to make some documentation about the axis systems and core classes (the one on OP project is pretty good, ive grabbed it) and then you set your claude.md to point at the godot source code so that the bot can doublecheck things.
Ive been playing with multiple engines, and godot is by far the best one to use with the AI. Unreal engine is too heavy on binary files that coding tools cant parse, and Unity is closed source which leaves the bot with no reliable documentation or way to check what the game apis are doing. Godot is small enough that the bot can understand it and works fine for games that arent too complicated.
Im using it to build a spiritual remake of daggerfall as a procedural open world rpg, right now its at 60.000 lines of code, quite advanced. I got it running on a steamdeck at 60 fps even with 4 kilometers of draw distance with thousands of trees and procedural terrain thanks to doing tons of custom shaders and a few engine edits.
Note that while Godot's formats support being text based (.tscn and .tres) you get a massive speed boost in saving and loading once you convert to using .scn and .res for everything that is over 1MB in size. If you add a high-res model and make it unique because you need to change the textures or something, that already makes your scene as big as the model.
So be aware that supporting text only will become slow once the scenes are big enough.
I use the 100 plan, the 20 dollar plan is more of a trial, you run out of that in no time. With the 100 model i use it both for work (graphics rendering) and this which i do part time. Ive captured a few screenshots here <https://imgur.com/a/RJIcKqM> .
The actual screenshots only show like 10k lines of code from the procedural generation system and custom render tech. Its a fully playable RPG game, the lines are split on quest systems, stat stuff, inventory, ui, dungeon generation, enemies, etc.
The world is being generated at startup for a 50-50 kilometer play area through multiple steps of generation that includes city placement, roads, and various biomes.
Commercial translator services lately are the worst they have ever been. You cant validate that they aren't directly sending your excel with the translation lines into a LLM with no tweaking/checking.
For a indie videogame i work on, we tried a couple translation agencies, and they gave terrible output. At the end, we built our own LLM based agentic translation, with lots of customization for our specific project like building a prompt based on where the menu/string is at, shared glossary, and other features. Testing this against the agencies, it was better because we could customize it for the needs of our specific game.
Even then, at the end of the day, we went with freelancers for some of the languages as we couldn't really validate the AI output on those languages.The freelancers took a month to do the translation vs the 2-3 days we ourselves took for the languages we knew and we could monitor the AI output. But they did a nice job, much better than the agencies.
I feel that what AI really completely kills is those translation services. Its not hard at all to build or customize your own AI system, so if the agency is going to charge you considerable money for AI output, just do it yourself and get a better result. Meanwhile those freelancers are still in demand as they can actually check the project and understand it for a nice translation, unlike the mechanical agencies where you send them the excel and they send it to who knows what or an AI without you being able to check.
I will likely be opensourcing this customizable AI translation system for my project soon.
There is no implementation of it but this is how i see it, at least comparing with how things with fully extensioned vulkan work, which uses a few similar mechanics.
Per-drawcall cost goes to nanosecond scale. Assuming you do drawcalls of course, this makes bindless and indirect rendering a bit easier so you could drop CPU cost to near-0 in a renderer.
It would also highly mitigate shader compiler hitches due to having a split pipeline instead of a monolythic one.
The simplification on barriers could improve performance a significant amount because currently, most engines that deal with Vulkan and DX12 need to keep track of individual texture layouts and transitions, and this completely removes such a thing.
This is a fantastic article that demonstrates how many parts of vulkan and DX12 are no longer needed.
I hope the IHVs have a look at it because current DX12 seems semi abandoned, with it not supporting buffer pointers even when every gpu made on the last 10 (or more!) years can do pointers just fine, and while Vulkan doesnt do a 2.0 release that cleans things, so it carries a lot of baggage, and specially, tons of drivers that dont implement the extensions that really improve things.
If this api existed, you could emulate openGL on top of this faster than current opengl to vulkan layers, and something like SDL3 gpu would get a 3x/4x boost too.
DirectX documentation is on a bad state currently, you have the Frank Lunas's books, which don't cover the latest improvements, and then is hunting through Learn, Github samples and reference docs.
Vulkan is another mess, even if there was a 2.0, how are devs supposed to actually use it, especially on Android, the biggest consumer Vulkan platform?
Isn't this all because PCI resizable BAR is not required to run any GPU besides Intel Arc? As in, maybe it's mostly down to Microsoft/Intel mandating reBAR in UEFI so we can start using stuff like bindless textures without thousands of support tickets and negative reviews.
I think this puts a floor on supported hardware though, like Nvidia 30xx and Radeon 5xxx. And of course motherboard support is a crapshoot until 2020 or so.
This is not really directly about resizable BAR. you could do mostly the same api without it. Resizable bar simplifies it a little bit because you skip manual transfer operations, but its not completely required as you can write things to a cpu-writeable buffer and then begin your frame with a transfer command.
Bindless textures never needed any kind of resizable BAR, you have been able to use them since early 2010s on opengl through an extension. Buffer pointers also have never needed it.
> tons of drivers that dont implement the extensions that really improve things.
This isn't really the case, at least on desktop side.
All three desktop GPU vendors support Vulkan 1.4 (or most of the features via extensions) on all major platforms even on really old hardware (e.g. Intel Skylake is 10+ years old and has all the latest Vulkan features). Even Apple + MoltenVK is pretty good.
Even mobile GPU vendors have pretty good support in their latest drivers.
The biggest issue is that Android consumer devices don't get GPU driver updates so they're not available to the general public.
Neither do laptops, where not using the driver from the OEM with whatver custom code they added can lead to interesting experiences, like power configuration going bad, not able to handle the mixed GPU setups, and so on.
Those requirements more or less line up with the introduction of hardware raytracing, and some major titles are already treating that as a hard requirement, like the recent Doom and Indiana Jones games.
On the contrary, I would say this is the main thing Vulkan got wrong and the main reason whe the API is so bad. Desktop and mobile are way too different for a uniform rendering API. They should be two different flavours with a common denominator. OpenGL and OpenGL ES were much better in that regard.
It is unreasonable to expect to run the same graphics code on desktop GPUs and mobile ones: mobile applications have to render something less expensive that doesn't exceed the limited capabilities of a low-power device with slow memory.
The different, separate engine variants for mobile and desktop users, on the other hand, can be based on the same graphics API; they'll just use different features from it in addition to having different algorithms and architecture.
> they'll just use different features from it in addition to having different algorithms and architecture.
...so you'll have different code paths for desktop and mobile anyway. The same can be achieved with a Vulkan vs VulkanES split which would overlap for maybe 50..70% of the core API, but significantly differ in the rest (like resource binding).
But they don't actually differ, see the "no graphics API" blog post we're all commenting on :) The primary difference between mobile & desktop is performance, not feature set (ignoring for a minute the problem of outdated drivers).
And beyond that if you look at historical trends, mobile is and always has been just "desktop from 5-7 years ago". An API split that makes sense now will stop making sense rather quickly.
Different features/architecture is precisely the issue with mobile, be it due to hardware constraints or due to lack in deiver support. Render passes were only bolted into Vulkan because of mobile tiler GPUs, they never made any sense for desktop GPUs and only made Vulkan worse for desktop graphics development.
And this is the reason why mobile and desktop should be separate graphics APIs. Mobile is holding desktop back not just feature wise, it also fucks up the API.
Mobile is getting RT, fyi. Apple already has it (for a few generations, at least), I think Qualcomm does as well (I'm less familiar with their stuff, because they've been behind the game forever, however the last I've read, their latest stuff has it), and things are rapidly improving.
Vulkan is the actual barrier. On Windows, DirectX does an average job at supporting it. Microsoft doesn't really innovate these days, so NVIDIA largely drives the market, and sometimes AMD pitches in.
Eh, I think the jury is still out on whether unifying desktop and mobile graphics APIs is really worth it. In practice Vulkan written to take full advantage of desktop GPUs is wildly incompatible with most mobile GPUs, so there's fragmentation between them regardless.
It's quite useful for things like skia or piet-gpu/vello or the general category of "things that use the GPU that aren't games" (image/video editors, effects pipelines, compute, etc etc etc)
would it also apply to stuff like the Switch, and relatively high-end "mobile" gaming in general? (I'm not sure what those chips actually look like tho)
there are also some arm laptops that just run Qualcomm chips, the same as some phones (tablets with a keyboard, basically, but a bit more "PC"-like due to running Windows).
AFAICT the fusion seems likely to be an accurate prediction.
Switch has its own API. The GPU also doesn't have limitations you'd associate with "mobile". In terms of architecture, it's a full desktop GPU with desktop-class features.
well, it's a desktop GPU with desktop-class features from 2014 which makes it quite outdated relative to current mobile GPUs. The just released Switch 2 uses an Ampere-based GPU, which means it's desktop-class for 2020 (RTX 3xxx series), which is nothing to scoff about but "desktop-class features" is a rapidly moving target and the Switch ends up being a lot closer to mobile than it does to desktop since it's always launching with ~2 generations old GPUs.
Only if you're ignoring mobile entirely. One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.
In this context, both old Switch and Switch 2 have full desktop-class GPUs. They don't need to care about the API problems that mobile vendors imposed to Vulkan.
I feel like it's a win by default.
I do like to write my own programs every now and then and recently there's been more and more graphics sprinkled into them.
Being able to reuse those components and just render onto a target without changing anything else seems to be very useful here.
This kind of seamless interoperability between platforms is very desirable in my book.
I can't think of a better approach to achieve this than the graphics API itself.
Also there is no inherent thing that blocks extensions by default.
I feel like a reasonable core that can optionally do more things similar to CPU extensions (i.e. vector extensions) could be the way to go here.
I definitely disagree here. What matters for mobile is power consumption. Capabilities can be pretty easily implemented...if you disagree, ask Apple. They have seemingly nailed it (with a few unrelated limitations).
Mobile vendors insisting on using closed, proprietary drivers that they refuse to constantly update/stay on top of is the actual issue. If you have a GPU capable of cutting edge graphics, you have to have a top notch driver stack. Nobody gets this right except AMD and NVIDIA (and both have their flaws). Apple doesn't even come close, and they are ahead of everyone else except AMD/NVIDIA. AMD seems to do it the best, NVIDIA, a distant second, Apple 3rd, and everyone else 10th.
> If you have a GPU capable of cutting edge graphics, you have to have a top notch driver stack. Nobody gets this right except AMD and NVIDIA (and both have their flaws). Apple doesn't even come close, and they are ahead of everyone else except AMD/NVIDIA. AMD seems to do it the best, NVIDIA, a distant second, Apple 3rd, and everyone else 10th.
It is quite telling how good their iGPUs are at 3D that no one counts them in.
I remember there was time about 15 years ago, they were famous for reporting OpenGL capabilities as supported, when they were actually only available as software rendering, which voided any purpose to use such features in first place.
APUs/iGPUs are compared, and here Intel's integrated GPUs seem to be very competitive with AMD's APUs.
---
You of course have to compare dedicated graphics cards with each other, and similarly for integrated GPUs, so let's compare (Intel's) dedicated GPUs (Intel Arc), too:
the current Intel Arc generation (Intel-Arc-B, "Battlemage") seems to be competitive with entry-level GPUs of NVidia and AMD, i.e. you can get much more powerful GPUs from NVidia and AMD, but for a much higher price. I thus clearly would not call Intel's dedicated GPUs to be so bad "at 3D that no one counts them in".
It's weird how the 'next-gen' APIs will turn out to be failures in many ways imo. I think still as sizeable amount of graphics devs still stuck to the old way of doing things. I know a couple graphics wizards (who work on major AAA titles) who never liked Vulkan/DX12, and many engines haven't really been rebuilt to accomodate the 'new' way of doing graphics.
Ironically a lot of the time, these new APIs end up being slower in practice (something confirmed by gaming benchmarks), probably exactly because of the issues outlined in the article - having precompiled 'pipeline states', instead of the good ol state machine has forced devs to precompile a truly staggering amount of states, and even then sometimes compilation can occur, leading to these well known stutters.
The other issue is synchronization - as the article mentions how unnecessarily heavy Vulkan synchronization is, and devs aren't really experts or have the time to figure out when to use what kind of barrier, so they adopt a 'better be safe than sorry approach', leading to unneccessary flushes and pipeline stalls that can tank performance in real life workloads.
This is definitely a huge issue combined with the API complexity, leading many devs to use wrappers like the aforementioned SDL3, which is definitely very conservative when it comes to synchronization.
Old APIs with smart drivers could either figure this out better, or GPU driver devs looked at the workloads and patched up rendering manually on popular titles.
Additionally by the early to mid 10s, when these new APIs started getting released, a lot of crafty devs, together with new shader models and OpenGL extensions made it possible to render tens of thousands of varied and interesting objects, essentially the whole scene's worth, in a single draw call. The most sophisticated and complex of these was AZDO, which I'm not sure made it actually into a released games, but even with much less sophisticated approaches (and combined with ideas like PBR materials and deferred rendering), you could pretty much draw anything.
This meant much of the perf bottleneck of the old APIs disappeared.
I think the big issue is that there is no 'next-gen API'. Microsoft has largely abandoned DirectX, Vulkan is restrictive as anything, Metal isn't changing much beyond matching DX/Vk, and NVIDIA/AMD/Apple/Qualcomm aren't interested in (re)-inventing the wheel.
There are some interesting GPU improvements coming down the pipeline, like a possible OoO part from AMD (if certain credible leaks are valid), however, crickets from Microsoft, and NVIDIA just wants vendor lock-in.
Yes, we need a vastly simpler API. I'd argue even simpler than the one proposed.
One of my biggest hopes for RT is that it will standardize like 80% of stuff to the point where it can be abstracted to libraries. It probably won't happen, but one can wish...
The modern cryengine compiles very fast. Their trick is that they have architected everything to go through interfaces that are on very thin headers, and thus their headers end very light and they dont compile the class properties over and over. But its a shame we need to do tricks like this for compile speed as they harm runtime performance.
Because you now need to go through virtual calls on functions that dont really need to be virtual, which means the possible cache miss from loading the virtual function from vtable, and then the impossibility of them being inlined. For example they have a ITexture interface with a function like virtual GetSize(). If it wasnt all through virtuals, that size would just be a vec2 in the class and then its a simple load that gets inlined.
At least on clang with LTO, with bitcode variant, that should be possible to devirtualize, assuming most of those interfaces only have a single implementation.
In my experience, as long as there's only a single implementation, devirtualization works well, and can even inline the functions. But you need to pass something along the lines of "-fwhole-program-vtables -fstrict-vtable-pointer" + LTO. Of course the vtable pointer is still present in the object. So I personally only use the aforementioned "thin headers" at a system level (IRenderer), rather than for each individual object (ITexture).
In addition to what everyone else has said it also makes it difficult to allocate the type on the stack. Even if you do allow it you'll at least need a probe.
Game consoles generally only offer clang as a possibility for compiler. If you can compile rust to C, then you can finally use rust for videogames that need to run everywhere.
Interesting library, but i see it falls back into what happens to almost all SIMD libraries, which is that they hardcode the vector target completely and you cant mix/match feature levels within a build. The documentation recommends writing your kernels into DLLs and dynamic-loading them which is a huge mess https://jfalcou.github.io/eve/multiarch.html
Meanwhile xsimd (https://github.com/xtensor-stack/xsimd) has the feature level as a template parameter on its vector objects, which lets you branch at runtime between simd levels as you wish. I find its a far better way of doing things if you actually want to ship the simd code to users.
You can do many thing with macros and inline namespaces but I believe they run into problems when modules come into play. Can you compile the same code twice, with different flags with modules?
I think I do understand, this is exactly what we do. (MEGA_MACRO == HWY_NAMESPACE)
Then we have a table of function pointers to &AVX2::foo, &AVX3::foo etc. As long as the module exports one single thing, which either calls into or exports this table, I do not see how it is incompatible with building your project using modules enabled?
(The way we compile the code twice is to re-include our source file, taking care that only the SIMD parts are actually seen by the compiler, and stuff like the module exports would only be compiled once.)
What leads you to that conclusion?
It is still possible to use #include in module implementations.
We can use that to make the module implementation look like your example.
Thus it ought to be possible, though I have not yet tried it.
Since you seem knowledgeable about this, what does this do differently from other SIMD libraries like xsimd / highway? Is it the addition of algorithms similar to the STD library that are explicitly SIMD optimized?
The algorithms I tried to make as good as I knew how. Maybe 95% there.
Nice tail handling. A lot of things supported.
I like or interface over other alternatives, but I'm biased here.
Really massive math library.
Game developers have been doing this since forever, its one of their main reasons to avoid the STL.
EASTL has this as a feature by default, and unreal engine container library has the boundchecks enabled on most games. The performance cost of those boundchecks in practice is well worth the reduction of bugs even on performance sensitive code.
Which is yet another reason to assert (pun intend), how far from reality the anti-bounds check folks are, when even the game industry takes them seriously.
A truly incredible profiler for the great price of free. There is nothing coming at this level of features and performance even on paid software. Tracy could cost thousands of dollars a year and would still be the best profiler.
Tracy requires you to add macros to your codebase to log functions/scopes, so its not an automatic sampling profiler like superluminal, verysleepy, VS profiler, or others. Each of those macros has around 50 nanoseconds of overhead, so you can liberally use them in the millions. On the UI, it has a stats window that will record average, deviation, min/max of those profiler zones, which can be used to profile functions at the level of single nanoseconds.
Its the main thing i use for all my profiling and optimization work. I combine it with superluminal (sampling profiler) to get a high level overview of the program, then i put tracy zones on the important places to get the detailed information.
it does, but i dont use it much due to it being too slow and heavy on memory on my ryzen 5950x (32 threads) on windows. a couple seconds of tracing goes into tens of gigabytes of ram.
Yeah I had issues with the Tracy sampler. It didn’t “just work” the way Superluminal did.
My only issue with Superluminal is I can’t get proper callstacks for interpreted languages like Python. It treats all the CPP callstacks as the same. Not sure if Tracy can handle that nicely or not…
They are not slower than headers. Ive been looking into it because modular STL is such a big win. On my little toy project i have .cpp files compiling in 0.05 seconds while doing import std.
Downside is that at the moment you cant mix normal header STL with module STL in the same project (msvc), so its for cleanroom small projects only. I expect the second you can reliably use that almost everyone will switch overnight just from how fast of a speed boost it gives on the STL vs even precompiled headers.
The one way in which they are slower than headers is that they create longer dependency chains of translation units, whereas with headers you unlock more parallelism at the beginning of the build process, but much of it is duplicated work.
Every post or exploration of modules (including this one) has found that modules are slower to compile.
> I expect the second you can reliably use that almost everyone will switch overnight just from how fast of a speed boost it gives on the STL vs even precompiled headers.
I look forward to that day, but it feels like we're a while off it yet
Depends on the compiler, clang is able to pre-instantiate templates and generate debug info as part of its pch system - (for instance most likely you have some std::vector<int> which can be instantiated somewhere in a transitively included header).
In my projects enabling the relevant flags gave pretty nice speedups.
Yes, but template code is all on headers, so it gets parsed every single time its included on some compile unit. With modules this only happens once so its a huge speed upgrade in pretty much all cases.
Whenever I’ve profiled compile times, parsing accounts for relatively little of the time, while the vast majority of the time is spent in the optimizer.
So at least for my projects it’s a modest (maybe 10-20%) speed up, not the order of magnitude speed up I was hoping for.
For some template heavy code bases I’ve been in, going to PCH has cut my compile times to less than half. I assume modules will have a similar benefit in those particular repositories, but obviously YMMV
Nice set of prompts and skills tho, im grabbing them for personal use.
reply