I used to use a timer to limit the frame rate to 60 Hz too. But on Windows, I found that DwmFlush() seems to act like WaitVSync(), so I've been using that in preference for years now. I think this is undocumented behaviour. I guess what it actually does is wait until the compositor is ready for another frame.
To be able to call that function, I LoadLibrary("dwmapi.dll"), and then GetProcAddress(dwm, "DwmFlush").
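A minimal sketch of that runtime lookup (error handling omitted; DwmFlush takes no arguments and returns an HRESULT):

    #include <windows.h>

    typedef HRESULT (WINAPI *DwmFlushFn)(void);

    int main(void) {
        /* resolve DwmFlush at runtime instead of linking against dwmapi.lib */
        HMODULE dwm = LoadLibraryA("dwmapi.dll");
        DwmFlushFn dwm_flush = dwm ? (DwmFlushFn)GetProcAddress(dwm, "DwmFlush") : NULL;

        /* in the render loop: block until the compositor has presented a frame */
        if (dwm_flush) dwm_flush();
        return 0;
    }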
> Issues a flush call that blocks the caller until the next present, when all of the Microsoft DirectX surface updates that are currently outstanding have been made.
DWM always runs with vsync so presents never happen more frequently than screen refreshes.
Is there any similar tutorial for XCB? Most documentation only covers libx11, and trying to extrapolate libxcb usage from that is tricky at best and heisenbug-prone at worst.
Very nice! I like that it's a single header. I wrote something similar, although it's not single-header, doesn't do audio, and has no X11 backend: https://github.com/samizzo/pixie. I use mine as a backend for Windows ports of MS-DOS demos that I make.
it's described as a way to "fill the complete framebuffer with a solid colour", but 'rgb' is shown in the previous snippet to be 'uint32_t'. That is not how 'memset()' [1] works: it only uses the least significant 8 bits of its 'int'-typed value argument. So it would smear whatever blue bits were in 'rgb' across every byte of every pixel.
For clearing (all zero) it's usable, and then I would recommend writing it as:
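    memset(f->buf, 0, f->width * f->height * sizeof *f->buf);  /* reconstructed; presumably along these lines */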
this both fixes the typo ("sizeoof" sounds like a C lecturer being punched in the guts), and avoids duplicating the type and instead inferring it from the actual object, which is more DRY and a style I've been advocating for ... years.
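For an arbitrary colour (rather than clearing to zero), the fill needs a per-pixel loop instead of memset; a minimal sketch, with buf/width/height mirroring the fenster struct fields:

    #include <stddef.h>
    #include <stdint.h>

    /* write the full 32-bit value into every pixel, not just its low byte */
    static void fill(uint32_t *buf, int width, int height, uint32_t rgb) {
        for (size_t i = 0; i < (size_t)width * height; i++) buf[i] = rgb;
    }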
Or ... sizeof(*f->buf)? I’d forgotten that sizeof even allows dropping the parens, but isn’t that asking for trouble? “sizeof a <op> b” is sometimes parsed as “sizeof(a) <op> b” and sometimes as “sizeof(a <op> b)”, depending on the precedence of operator <op>. Is there any downside to always using parens and writing sizeof() like a function call?
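A small illustration of the parse (hypothetical names; the sizes in the comments assume a typical 64-bit target):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t buf[16];
        /* postfix [] binds tighter than sizeof: parsed as sizeof(buf[0]), i.e. 4 */
        printf("%zu\n", sizeof buf[0]);
        /* binary + binds looser: parsed as (sizeof buf) + 1, i.e. 65, not sizeof(buf + 1) */
        printf("%zu\n", sizeof buf + 1);
        return 0;
    }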
it's probably a bit more efficient than fenster on x-windows because it uses shared memory, and i think the programming interface is a little nicer (see the readme above for code examples)
apps i've written in yeso include an adm-3a terminal emulator, a tetris game, a raytracer, an rpn calculator, a fractal explorer, and so on
i haven't ported yeso to win32/64, macos, android, or the browser canvas yet, just x-windows, the linux framebuffer (partly), and a window system implemented in yeso called wercam
it includes bindings for c, python (via cffi), and lua (via luajit's ffi), and presumably you could use it from zig or rust in the same way as fenster, but i haven't tried
Windows' Sleep() function has a default resolution of about 15.6 ms; that's not enough for realtime rendering, and it's relatively hard to fix. Ideally you need a modern OS and a waitable timer created with the high-resolution flag.
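A hedged sketch of that path (CREATE_WAITABLE_TIMER_HIGH_RESOLUTION needs a recent Windows 10 or later; error handling omitted):

    #include <windows.h>

    int main(void) {
        /* ask for a high-resolution waitable timer; returns NULL on older systems */
        HANDLE timer = CreateWaitableTimerExW(NULL, NULL,
            CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS);

        /* wait ~1 ms: a negative due time is relative, in 100 ns units */
        LARGE_INTEGER due;
        due.QuadPart = -10000;
        SetWaitableTimer(timer, &due, 0, NULL, NULL, FALSE);
        WaitForSingleObject(timer, INFINITE);

        CloseHandle(timer);
        return 0;
    }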
Use an argument of 1, and Windows will come knocking at ~1 kHz. This is extremely effective in my experience. Allows you to do some pretty crazy stuff on 1 thread.
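If that "argument of 1" refers to timeBeginPeriod (my assumption; the comment doesn't say), the usual pattern is:

    #include <windows.h>
    #pragma comment(lib, "winmm.lib")

    int main(void) {
        timeBeginPeriod(1);   /* raise the system timer resolution to ~1 ms */
        Sleep(1);             /* now wakes roughly every millisecond instead of ~15.6 ms */
        timeEndPeriod(1);     /* always restore the previous resolution */
        return 0;
    }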
That's a silly statement. If you want to place single pixels, a GPU won't help in any way. The most efficient way would be to write pixels with the CPU into a mapped GPU texture, and then render this texture as a fullscreen quad. That's hardly more efficient than going through the system's windowing system.
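A rough sketch of that path, using legacy OpenGL for brevity (assumes GL headers, an existing GL context, and a `pixels` buffer of width*height uint32_t written by the CPU; it uses a plain texture upload rather than an actual mapped buffer):

    /* once: create a texture matching the CPU framebuffer */
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    /* every frame: upload the CPU-written pixels and draw one fullscreen quad */
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glEnable(GL_TEXTURE_2D);
    glBegin(GL_QUADS);
    glTexCoord2f(0, 0); glVertex2f(-1, -1);
    glTexCoord2f(1, 0); glVertex2f( 1, -1);
    glTexCoord2f(1, 1); glVertex2f( 1,  1);
    glTexCoord2f(0, 1); glVertex2f(-1,  1);
    glEnd();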
For applications like emulators, or making vintage games (like Doom) run on modern platforms, that approach makes a lot of sense.
Most vintage systems use something way more complex than a pure CPU-controlled framebuffer for graphics. They generally have some sort of pre-defined "tiles" used to implement a fixed-width character mode, with the addition of a limited number of "sprites" overlaid in hardware. These video modes could be implemented efficiently by modern GPUs.
Only if you don't have a cycle-correct emulator in between. Those old-school systems relied on hard real-time timing down to the clock cycle to let the CPU control the color palette, sprite properties, etc. at the right raster position within a frame. Modern GPUs don't allow such tight synchronization between the CPU and GPU, so the best way is to run the entire emulation single-threaded on the CPU, including the video decoding.
(the resulting framebuffer can then of course be dumped into a GPU texture for rendering, but that just offers a bit more flexibility, eg embedding the emulated system into a 3D rendered world)
It depends what you mean by "hard real time". In theory, user input you get while scanning out pixel x might change pixel x + 1, and this leaves you with no choice but rendering single pixels in a strictly serial way. In practice, no existing emulator cares about that.
It's not about user input, but the CPU writing video hardware registers at just the right raster position mid-frame (to recycle sprites, change the color palette, or even the resolution). Home computer emulators for systems like the C64 or the Amstrad CPC need to do this at exactly the right clock cycle, otherwise modern (demo scene) demos wouldn't render correctly.
PS: of course one could build a GPU command list to render such a video frame somehow, but I bet just building this command list is more expensive than just doing the video decode on the CPU. It would basically come down to one draw command per (emulated system) pixel in the worst case.
But wouldn't the GPU help if you were mapping e.g. 256x192 virtual pixels to say 1024x768? I.e., each of the pixels from the low-res space being represented by an NxM patch of actual screen pixels, like a Win32 GDI StretchBlt() call.
If you had a frame buffer for the actual screen and you tried to do even a 1-to-2x2 expansion on the CPU, that'd have to be a serious speed hit. Presumably GPU hardware can do that sort of thing.
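For reference, the GDI call mentioned above looks roughly like this (hypothetical hdcDest/hdcSrc device contexts; a 256x192 source scaled to 1024x768):

    /* blow the low-res source bitmap up to the window size via GDI */
    SetStretchBltMode(hdcDest, COLORONCOLOR);   /* fast, non-blending scaling mode */
    StretchBlt(hdcDest, 0, 0, 1024, 768,
               hdcSrc,  0, 0, 256, 192, SRCCOPY);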
Yes, for fancy upscaling or applying pixel shader effects like CRT filters, doing the final pass on the GPU definitely makes sense. This would no longer be a "minimal" library though.
Window system compositors should also be able to upscale bitmaps on their own, though, and nothing prevents them from using the GPU for this.
Ok, it's most likely slightly slower, but not enough that it matters. Frame latency might actually be lower, though, if the result doesn't need to go through a swapchain AND the window system compositor.
Impossible given the platform limitations at that time, on Linux that code works on top of GLES 3.1. I didn't want too many Windows-only features there.
My remark was about Fenster, and a critique of your comment that it somehow isn't good enough because it lacks GPU acceleration and that sort of thing. The point of Fenster is to be minimal and simple yet still allow programmers to get stuff on the screen in a way that's easy and fun. It performs superbly at the role it was conceived for. We can list features we want/think are table stakes for a modern graphics pipeline or whatever, and there are plenty of libraries that are up to those tasks. This one is doing something different.
> No direct access to the command buffers of the GPU?
Lol, show me one library that does this without going through an abstraction layer like Vulkan. Details like this are not even documented by GPU vendors and you'd need to reverse engineer every supported GPU architecture yourself.
Guy writes a library with the design goals of making basic, retro-like graphics functionality easy and fun with a minimum of code, and certain Hacker News commenters dogpile him because it can't be integrated with an AAA pipeline or whatever.
It turns out without shaders, and all their complexity, you basically can’t do anything useful at high resolutions.
/shrug
It’s too slow: compositing, transformations, and all the other things people want to do are very much harder to implement in software on top of a naive graphics stack.
Simple doesn’t mean naive; you can have a simple api that is excellent.
…but this is a naïve implementation, and it won’t work in any useful way at scale.
People complain about ballooning complexity, but they often forget that stuff (e.g. specialised graphics hardware) wasn't invented by a bunch of idiots.
It was invented because the naive approach used before was found, in practice, to be fundamentally limiting and inferior, and to fall short of people’s expectations.
Of course, if you vastly lower your expectations and target, say, 320x240 at 8-bit colour, you can happily have a naïve implementation that works just fine
(Don't believe me? Quote:
> Having this we can now draw complex polygons and would probably need a “flood fill” algorithm. Typically it is implemented using a queue of pixels to check and paint, but we can use recursion, as long as the filled area remains small enough to not overflow the stack
^ This is the very definition of a naive implementation, and https://github.com/zserge/fenster/blob/e71d493fa6d544243dd60..., setting one pixel at a time, is too. Delightfully charming as it might be to write your own functions that set one pixel at a time, it's a joke, really, if you expect to do anything serious)
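For contrast, the queue/stack-based flood fill the quoted text mentions avoids the recursion-depth problem entirely; a minimal sketch, with hypothetical buf/w/h names:

    #include <stdint.h>
    #include <stdlib.h>

    /* iterative 4-neighbour flood fill: repaint the `old`-coloured region reachable from (x, y) */
    void flood_fill(uint32_t *buf, int w, int h, int x, int y, uint32_t old, uint32_t fill) {
        if (old == fill) return;
        /* worst case every matching pixel pushes its 4 neighbours, plus the seed */
        int *stack = malloc((8 * (size_t)w * h + 2) * sizeof *stack);
        size_t n = 0;
        stack[n++] = x; stack[n++] = y;
        while (n > 0) {
            int py = stack[--n], px = stack[--n];
            if (px < 0 || py < 0 || px >= w || py >= h) continue;
            if (buf[py * w + px] != old) continue;
            buf[py * w + px] = fill;
            stack[n++] = px + 1; stack[n++] = py;
            stack[n++] = px - 1; stack[n++] = py;
            stack[n++] = px;     stack[n++] = py + 1;
            stack[n++] = px;     stack[n++] = py - 1;
        }
        free(stack);
    }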
For any graphics-intensive application it would obviously be necessary to use a GPU.
But for quick hacking, porting old demos, writing emulators, and also text-based UIs, it can be fast enough.
With the added benefit of small footprint, high compatibility and fast startup time.
The Lite editor https://github.com/rxi/lite uses pure software rendering (on top of SDL) in a rather naïve fashion, but it still renders full 32-bit colors at full resolution at more than 60 FPS on my computer. Not the best solution, but still surprisingly fast given the simplicity of the renderer.
This is typically a case where simple/naïve can beat a juggernaut like Electron.
I think you'll find that they found the naive approach was sufficiently poor, performance-wise, that additional optimizations had to be applied on top.
> But for quick hacking / porting old demos / writing emulators and also text based UI it can be fast enough.
/shrug
If you want to use it, use it. It's 'good enough'...
If anyone is interested in getting into zig, we made a library that has similar goals (and can run on WASM in the browser): https://github.com/ibebrett/zigzag
Wonderful. I really wanted something like this to do graphics the "hard way", also the most fun way... I was trying to use Tcl/Tk for that, but this seems a lot lower-level, perfect for what I wanted to do, which is to write a small GUI toolkit for tiny apps.
Writing pixels into an RGBA framebuffer in main memory is fun, but it's also the easy and slow way to do graphics. The hard way these days is to figure out how to use a modern GPU to do the rendering you need. It's almost always several orders of magnitude faster.
writing pixels into an rgba framebuffer in main memory is still plenty fast to do a full-screen animation at 60 hertz tho
the benefit of the gpu is no longer (since maybe late last millennium) that you don't have the memory bandwidth to your framebuffer; it's that you can do a lot of computation per pixel
typically the gpu advantage is only about an order of magnitude tho
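for scale (rough arithmetic): a 1920x1080 framebuffer at 4 bytes per pixel redrawn 60 times a second is about 1920 × 1080 × 4 × 60 ≈ 0.5 GB/s, a small fraction of the tens of GB/s of main-memory bandwidth on a current machine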
I think the Xlib dependency would prevent compilation as an αpε. I put some effort into doing X11 from scratch (without Xlib or Xcb) to make this possible. Or at least, my aim was to be able to build with musl libc and generate a single executable that would run on many different Linuxes.
FWIW I'm currently working on a DNS server with nice extra features like Multicast for mDNS/Bonjour: it could be used to publish the X server DISPLAY and make it accessible via service-discovery