GTK: Introducing Graphics Offload (gtk.org)
279 points by signa11 on Nov 18, 2023 | 236 comments



Is this something where it would be helpful if the Linux (environment) developers worked together? Like the (graphics) kernel, GTK, KDE, Wayland, … guys all in one room (or video conference) to discuss requirements and iron out one graphics architecture that is efficient and transparent?

I think it’s good that different graphics systems exist, but it feels unnecessary that every team has to make its own discoveries about how to handle the existing pieces.

If work were coordinated at least on the requirements and architecture level, then I think a lot of synergies could be achieved. After that, everyone can implement the architecture the way that works best for their use case, but some common elements could be relied on.



That's exactly what happened. This is the original intent for subsurfaces. A bunch of Wayland developers got together and wrote the spec a long time ago. The only thing happening now is Gtk making use of them transparently in the toolkit.

Subsurfaces didn't have bug-free implementations for a while, so maybe some people avoided them. But I know some of us emulator programmers have been using them for output (especially because they can update asynchronously from the parent surface), and I think a couple media players do, too. It's not something that most applications really need.
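
For reference, the client-side setup for that is small. A minimal sketch with wayland-client, assuming wl_compositor and wl_subcompositor have already been bound from the registry:

    /* Nest a video/emulator output surface inside a parent window surface,
       updated independently of the parent. */
    #include <wayland-client.h>

    struct wl_compositor    *compositor;    /* bound from the registry elsewhere */
    struct wl_subcompositor *subcompositor; /* likewise */

    static struct wl_surface *attach_output_subsurface(struct wl_surface *parent)
    {
        struct wl_surface *output = wl_compositor_create_surface(compositor);
        struct wl_subsurface *sub =
            wl_subcompositor_get_subsurface(subcompositor, output, parent);

        wl_subsurface_set_position(sub, 0, 0);  /* offset within the parent */
        wl_subsurface_set_desync(sub);          /* commit frames independently */

        /* Attach shm/dmabuf buffers to `output` and commit it on your own
           cadence; the parent surface does not need a redraw for every frame. */
        return output;
    }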


They do, it's just not hugely visible. Two great conferences where some of that work happened were linux.conf.au and the Linux Plumbers Conference.


That’s exactly how Wayland came to be.


Rounded corners seem like a feature that has an unexpectedly high performance penalty, but the UI designers refuse to let it go.


It's not like crazy, out-of-control, avant-garde, different-thinking UI designers haven't ever gone and totally ruined the user interface of a simple video player before!

Interface Hall of Shame - QuickTime 4.0 Player (1999):

http://hallofshame.gp.co.at/qtime.htm


And then you have smplayer (Multiplatform) and mpc-hc as a good balance between usability and features.

I might write a UI for mplayer/mpv in Motif some day; it's not rocket science, you basically talk to both over a socket by sending commands. Kinda like mpc/mpd.
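
For anyone curious: mpv already exposes exactly this if you start it with --input-ipc-server=/tmp/mpvsocket; any client can then write newline-terminated JSON commands to that Unix socket. A minimal C sketch (the socket path and command are just examples):

    /* Toggle pause in a running "mpv --input-ipc-server=/tmp/mpvsocket ..." */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(AF_UNIX, SOCK_STREAM, 0);
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/tmp/mpvsocket", sizeof(addr.sun_path) - 1);

        if (connect(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        const char *cmd = "{ \"command\": [\"cycle\", \"pause\"] }\n";
        write(sock, cmd, strlen(cmd));                      /* send the JSON command */

        char reply[256];
        ssize_t n = read(sock, reply, sizeof(reply) - 1);   /* mpv answers with JSON */
        if (n > 0) { reply[n] = '\0'; printf("%s", reply); }

        close(sock);
        return 0;
    }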


It's something I as a user would also refuse to let go, given that the performance penalty is reasonably small (I think it is).


I think the point is that it's not - rather than just copying a rectangular area to the screen, you have to go through the intermediate step of rendering everything to a temporary buffer, and compositing the results via a shader.


But… the example given shows that they place the video frame behind the window and make the front window transparent except for the round play button. This apparently offloads the frame… so why not just do the same for rounded corners?

What am I missing?


The selfie example has an explanation under it. Summary: because the video subsurface is square and needs to be clipped to produce rounded corners. Otherwise you would have corners of the video surface protruding from under the corners.


Professional designers mostly cut their teeth on physical objects and physical objects almost never have sharp corners.

This then got driven into the ground with the "Fisher-Price GUI" that is the norm on mobile because you can't do anything with precision since you don't have a mouse.

I would actually really like to see a UI with just rectangles. Really. It's okay, designers. Take a deep breath and say: "GUIs aren't bound by the physical". BeOS and MacOS used to be very rectangular. Give us a nice nostalgia wave of fad design with rectangles, please.

Animations and drop shadows are another thing I'd like to see disappear.


Rounded corners for windows have been in the Macintosh operating system since the beginning.

https://www.folklore.org/StoryView.py?story=Round_Rects_Are_...


Yes, but often not in the interaction areas: http://toastytech.com/guis/macos1.html

Or, take a look at the interaction mechanisms and palettes in MacPaint: https://www.punchkick.com/blog/2015/08/07/how-apple-has-shap...

Even in the 1990s, it was a lot of squareness.

And, as I point out, physical objects have rounded corners for a reason. Computer GUIs do not share that reason.

Rounded corners are a design choice. A different design choice can be made.

One other thing that people forget is that rounded things generally suck for packing.


> And, as I point out, physical objects have rounded corners for a reason. Computer GUIs do not share that reason.

I knew a guy who poked his eye out on a Windows 3.1 window corner. Super brutal, was running on only 16-colors too. It was the upper left corner, so there wasn't even a drop shadow to help.


My main setup is cwm+uxterm+tmux+sacc+mpv/mocp/mupdf... and such.

But, from time to time, I use emwm with xfile and a bunch of light Motif apps, plus some Solaris 9-themed GTK2/3 ones when there's no Motif alternative. Usable, with contrast, and every menu is sticky, so it's good enough for a netbook.


Is it possible that they are just the well-known representative example? I vaguely suspect that is the case, but I can’t think of the broader class they are an example of, haha.

The play button they show seems to be a good one, though. It is really nice to have it overlaid on the video.


Fortunately the Play button disappears when the video starts playing, so it has no effect on the frame rate!

Or instead of a triangular Play button, you could draw a big funny nose in some position and orientation, and the game would be to pause the video on a frame with somebody's face in it, with the nose in just the right spot.

I don't know why the vlc project is ignoring my prs.


If you're writing a video player or a game or something else that wants direct scan-out, then you can disable the round corners in your CSS.
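
A rough sketch of what that can look like in GTK4; the exact CSS nodes that carry the rounded corners depend on the theme, so treat the selector as an assumption:

    /* Sketch: load app-level CSS that drops the rounded corners (GTK4).
       The "window, decoration" selector is an assumption; themes differ. */
    #include <gtk/gtk.h>

    static void strip_rounded_corners(void)
    {
        GtkCssProvider *provider = gtk_css_provider_new();
        gtk_css_provider_load_from_data(provider,
            "window, decoration { border-radius: 0; }", -1);
        gtk_style_context_add_provider_for_display(gdk_display_get_default(),
            GTK_STYLE_PROVIDER(provider),
            GTK_STYLE_PROVIDER_PRIORITY_APPLICATION);
        g_object_unref(provider);
    }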


I’m not sure I understand why an overlay allows partial offloading while rounding the corner of the video does not.

Couldn’t the rounded corners of a video also be an overlay?

I’m sure I’m missing something here, but the article does not explain that point.


If you have the video extend to where the corners are rounded, you must use a "rounded clip" on the video on top of the shadow region (since they butt up against each other).

That means you have to power up the 3d part of the GPU to do that (because the renderer does it in shaders).

Whereas if you add some 9 pixels of black above/below to account for the rounded corner, there is no clipping of the video and you can use hardware scanout planes.

That's important because keeping the 3d part of the GPU turned off is a huge power savings. And the scanout plane can already scale for you to the correct size.


I think it’s that the _window_ has rounded corners, and you don’t want the content appearing outside the window.


No, you can already be sure it's the right size. This has to do with what it takes to occlude the rounded area from the final display.


>Couldn’t the rounded corners of a video also be an overlay?

No because the clipping is done in the client after the content is drawn. The client doesn't have the full screen contents. To make it work with an overlay, the clipping would have to be moved to the server. There could be another extension that lets you pass an alpha mask texture to the server to use as a clip mask. But this doesn't exist (yet?)


And even if it did, you can't do clip masks with scanout overlays. So you have to composite (and therefore take the hit of ramping up the 3d capabilities of the GPU).


Yes you can: what you would do is clip against the main framebuffer and leave the window area transparent by inverting the mask. Then you can have the other plane underneath the main one. This is only if your hardware supports that, though.


Presumably that would still cause issues at the rounded corners because you need to blend those pixels to make them look nice.


I'd love to know the answer to that. This is fantastic work, but it'd be a shame for it to be scunnered by rounded corners.


Because the UX folks want what they want. I want my UI out of the way, including the corners of video and my CPU load.


Easily solved by black bars just like people are used to on a TV. I assume most video players will do this when in windowed mode.


That's a good observation. Plus, you'll get black bars anyway, if you resize the window to not be the same aspect ratio as the video.


I wonder if there are plans to make it work with X11 in the future. I've yet to see the benefit of trying to switch to Wayland on my desktop; it just doesn't work, as-is, the way my 8-year-old setup works.


This is one of the benefits of the Wayland protocol over X, being able to do this kind of thing relatively straightforwardly.

Once support for hardware planes becomes more common in Wayland compositors, this can be leveraged to ultimately allow no-copy rendering to the display for non-fullscreen applications, which for video playback (incl. the likes of YouTube) means reduced CPU & GPU usage and less power draw, as well as reduced latency.


> This is one of the benefits of the Wayland protocol over X

What.

The original design of X actually encouraged a separate surface / Window for each single widget on your UI. This was actually removed in Gtk+3 ("windowless widgets"). And now they are bringing it back just for wayland ("subsurfaces"). As far as I can read, it is practically the same concept.


The original design of X had clients send drawing commands to the X server, basically treating the server as a remote Cairo/Skia-like 2D rasterizer, and subwindows were a cheap way to avoid pixel-level damage calculations. This was obviated in the common case by the Xdamage extension. Later use of windows as a rendering surface for shared client/server buffers was added with Xshm, then for device video buffers with Xv.

GTK3 got rid of windowed widgets because Keith Packard introduced the Xrender extension, which basically added 2D compositing to X, which was the last remaining use for subwindows for every widget.


This is completely wrong. Xrender is completely orthogonal to having windows or not. Heck, Xrender takes a _window_ as target -- Xrender is just an extension to allow more complicated drawing commands to be sent to the server (like alpha composition). You make your toolkit's programmer's life more complicated, not less, by having windowless widgets (at the very minimum you now have to complicate your rendering & event handling code with offsets and clip regions and the like).

The excuse that was used when introducing windowless widgets is to reduce tearing/noise during resizing, as Gtk+ had trouble synchronizing the resizing of all the windows at the same time.


>Xrender is completely orthogonal to having windows or not. Heck, Xrender takes a _window_ as target -- Xrender is just an extension to allow more complicated drawing commands to be sent to the server (like alpha composition).

Yes, that's the point. When you can tell Xrender to efficiently composite some pixmaps then there's really no reason to use sub-windows ever.

>You make your toolkit's programmer's life more complicated, not less, by having windowless widgets (at the very minimum you now have to complicate your code with offsets and clip regions and the like).

No, you still had to have offsets and clip regions before too because the client still had to set and update those. And it was more complicated because when you made a sub-window every single bit of state like that had to be synchronized with the X server and repeatedly copied over the wire. With client-side rendering everything is simply stored in the client and never has to deal with that problem.


> When you can tell Xrender to efficiently composite some pixmaps then there's really no reason to use sub-windows ever.

There is, or we would not be having subsurfaces on Wayland or this entire discussion in the first place.

Are you seriously arguing that the only reason to using windows in Xorg is to have composition? People were using Xshape/Xmisc and the like to handle the lack of alpha channels in the core protocol? This is not what I remember. I would be surprised if Xshape even worked on non-top level windows. heck, even MOTIF had windowless widgets (called gadgets iirc), and the purpose most definitely was not composition-related.


>There is, or we would not be having subsurfaces on Wayland or this entire discussion in the first place.

No. The subsurfaces in Wayland are only designed for two things:

1. Direct scan-out, as in TFA. (Because a subsurface can be directly translated to a dmabuf)

2. Embedding content from one toolkit/library into another. (Because without it, lots of glue code would be needed)

It's discouraged to use them otherwise, they would complicate things for no benefit.

>Are you seriously arguing that the only reason to using windows in Xorg is to have composition?

If you mean sub-windows, yes, that and consequently because of the way that XRDB worked. I don't see why you would ever use them just for input events within the same toolkit, they don't do anything special there.


Even XCalc has always had trouble synchronizing the resizing of all the windows at the same time, without shitting its pants.

https://www.donhopkins.com/home/catalog/unix-haters/x-window...

That's what happened when I simply resized stock xcalc several times in a row to different sizes and aspect ratios. I wonder if they've finally figured out how to fix that bug in 35 years?

That was why I was compelled to hack an X11 window manager to take a command line argument telling it which window id to treat as the root window, then ran xcalc, discovered its window id with "xwininfo" or some such utility, then ran the window manager on the calculator, putting window frames around each of the calculator's buttons, so you could resize them, move them around, open and close them to icons, etc! That was a truly customizable calculator.


Drawables on Drawables doesn't help here at all.

Sure it lets you do fast 2d acceleration but we don't use 2d accel infrastructure anywhere anymore.

Subsurfaces have been in Wayland since the beginning of the protocol.

This is simply getting them to work on demand so we can do something X (or Xv) could never do since Drawables would get moved to new memory (that may not even be mappable on the CPU-side) on every frame.

And that's to actually use the scanout plane correctly to avoid powering up the 3d part of the GPU when doing video playback on composited systems.


> I've yet to see the benefit of trying to switch to Wayland on my desktop

how about Graphics Offload?


This feature would be nice-to-have but is not impactful enough (at least to me) to outweigh the cons of having to switch to Wayland, which would include migrating my DE and getting accustomed to it, as well as looking for replacements for the applications that do not work properly with Wayland (most notably ones that deal with global keyboard hooks). Admittedly I have never tried XWayland, which I think could potentially solve some of these issues.


I think if you're waiting for a magic bullet of a feature to upgrade, you might be waiting a long time, and even Wayland will be replaced at that point. Instead, look at the combination of features (like this one) and think about it together with future upgrades. I think you're right that XWayland is probably a compromise for now if you need it for things like global shortcuts.


Frankly, it was X11 which introduced "Graphics Offload" in the first place, with stuff like XV, chroma keying, and hardware overlays. Then compositors came and we moved to texture_from_surface extensions and uploading things into GPUs. This is just the eternal wheel of reinventing things in computing (TM) doing yet another iteration and unlikely to give any tangible benefits over the situation from decades ago.


There are plenty of wheel reinventions in IT, but let’s not pretend that modern graphics are anything like it used to be. We have 8k@120Hz screens now, the amount of pixels that have to be displayed in a short amount of time is staggering.


At the same time, you also have hardware that can push those pixels without problem. When X and these technologies were introduced, the hardware was not able to store the entire framebuffer for one screenful in memory, let alone two. Nowadays you are able to store a handful at the very minimum. Certainly there's a different level of performance here in all parts, but the concepts have changed very little, and this entire article kind of shows it.


Storing is one thing, pushing another. 32bpp×4K@60Hz works out to 1.8GB/s, which even on modern CPUs requires you to not do stupid things—e.g. a C loop with a (modest) iteration per byte will likely end up an order of magnitude slower. And that’s if you’re doing 32bpp SDR like it’s 2005; a floating-point pixel format will be four times more data than that.
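
(For the curious, the arithmetic, assuming 4K means 3840×2160:

    3840 × 2160 px × 4 B/px × 60 Hz ≈ 1.99 GB/s ≈ 1.85 GiB/s

so roughly the 1.8 figure in binary units, and about four times that for a 16-byte-per-pixel floating-point format.)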


No, nothing like this exists in X11. Xorg still doesn't really have support for non-RGB surfaces. DRI3 gets you part of the way there for attaching GPU buffers but the way surfaces work would have to be overhauled to work more like Wayland, where they can be any format supported by the GPU. There isn't any incentive to implement this in X11 either because X11 is supposed to work over the network and none of this stuff would.

Yes, you're technically right that this would have been possible years ago but it wasn't actually ever done, because X11 never had the ability to do it at the same time as using compositing.


> Xorg still doesn't really have support for non-RGB surfaces

You really need to add context to these statements, because _right now_ I am using through Xorg a program which uses a frigging colormap, which is as non-RGB as it gets. The entire reason Xlib has this "WhitePixel" and XGetPixel and XYPixmap and other useless functions which normally fetch a lot of ire is because it tries to go out of its way to support practically other-worldly color visuals and image formats. If anything, I'd say it is precisely RGB which has the most problems with X11, especially when you go beyond 24bpp.

> there for attaching GPU buffers

None of this is about the GPU, but about directly presenting images for _hardware_ composition using direct scan-out, hardware layers or not. Exactly what Xv is about, and the reason Xv supports formats like YUV.

> There isn't any incentive to implement this in X11 either because X11 is supposed to work over the network and none of this stuff would

As if that prevented any of the extensions done to X11 in the last three decades, including Xv.


>I am using through Xorg a program which uses a frigging colormap

That doesn't change what I said. The colormap is mapping indexes to RGB palette entries and technically doesn't even support any other color spaces. Nothing about that is "non-RGB". The other visuals are just various other ways to do RGB. If this isn't making sense to you, think about how this would be implemented in the driver.

>None of this is about the GPU, but about about directly presenting images for _hardware_ composition using direct scan-out

I'm sorry? What do you suppose is doing the direct scan-out to the monitor if not the GPU?

>As if that prevented any of the extensions done to X11 in the last three decades, including Xv.

That's not relevant, XV actually does support running over the network as long as you don't use the SHM functions. But regardless, yes, it actually did. The main example being indirect GLX which hasn't been updated in decades because it's not feasible to run it over the network anymore.


This would require protocol changes for X11 at best, and nobody is adding new protocols. Especially when nobody does Drawable of Drawables anymore and all use client-side drawing with Xshm.

You need to dynamically change stacking of subsurfaces on a per-frame basis when doing the CRTC.


I really don't see why it would need a new protocol. You can change stacking of "subsurfaces" in the traditional X11 fashion and you can most definitely do "drawables of drawables". At the very least I'd bet most clients still create a separate window for video content.

I agree though it would require a lot of changes to the server and no one is in the mood (like, dynamically decide whether I composite this window or push it to a Xv port or hardware plane? practically inconceivable in the current graphics stack, albeit it is not a technical X limitation per-se). This entire feature is also going to be pretty pointless in Wayland desktop space either way because no one is in the mood either -- your dmabufs are going to end up in the GPU anyway for the foreseeable future, just because of the complexity of liftoff, variability of GPUs, and the like.


> I really don't see why it would need a new protocol.

You'll need API to remap the Drawable to the scanout plane from a compositor on a per-frame basis (so when submitting the CRTC) and the compositor isn't in control of the CRTC. So...


This assumes it would be the role of the window manager or compositor, rather than the server, to decide that, which is not how I was thinking about it. But I guess it'd make sense (policy vs mechanism). "Per-frame basis" I don't see why; it just mirrors Wayland concepts. Still, as protocols go, it's quite a minor change, and one applications don't necessarily have to support.


The server doesn't have enough information to do it because it doesn't know anything about window management. At best it could use a heuristic to guess, but one of the reasons Wayland combines the server and the compositor is so that we don't need to pile heuristics into the X server and drivers any more.

Nothing in Xorg is a minor change because it has to be tested with every window manager and compositor before you can even think about merging.


I doubt they have the energy to backport bleeding edge tech.


I would very much doubt it. This would likely require work on Xorg itself (a new protocol extension, maybe; I don't believe X11 supports anything but RGB, [+A, with XRender] for windows, and you'd probably need YUV support for this to be useful), which no one seems to care to do. And the GTK developers seem to see their X11 windowing backend as legacy code that they want to remove as soon as they can do so without getting too many complaints.


No screen tearing is a major benefit of using a compositor.


And screen tearing is a major benefit of not using a compositor. There's an unavoidable tradeoff between image quality and latency. Neither is objectively better than the other. Xorg has the unique advantage that you can easily switch between them by changing the TearFree setting with xrandr.


It's not unique. Wayland has a tearing protocol.

https://gitlab.freedesktop.org/wayland/wayland-protocols/-/t...


This is something that every application has to opt in to individually. It's not a global setting like TearFree.


Which is the correct default? No application should unknowingly render half-ready frames, that’s stupid. The few niches where it makes sense (games, 3D applications) can opt into it, and do their own thing.


That is subjective. I do not want the input latency induced by synchronizing to the monitor's refresh rate on my desktop, as it makes it feel sluggish. The only time I want this is when I watch some video (and that is of course only when I actively watch the video; sometimes I put a video on in the background while I do other stuff) - so for my case the correct default is to have this disabled, with the only exception being when watching videos.


Okay, and then the program will get an event that propagates down to the correct component, which reacts in some way; everything that changes as a result is damaged, and every damaged component is re-rendered with synchronization from the framework itself. It has to be specifically coded (e.g. a text editor writing the rendered character directly into the same buffer) to actually make efficient use of tearing; otherwise it will literally just tear, with zero benefits.


You don't need to do anything special. Just render to the front buffer immediately and don't worry about the current scanout position. If it's above the part of the screen you're updating, great, you saved some latency. If it's below, latency is the same as if you waited for vsync. And if it's in the middle, you at least get a partial update early.


The average human reaction time is 250ms. The amount of latency you'd save from that on average is unnoticeable, and in exchange you get the appearance of stuttering and corruption from the tearing.


The meme was "the eye can't see more than 25fps anyway", which is less than 50ms. It's a meme because it's of course incorrect. There is a huge difference between how long it takes us to acquire and react to what our retina is sensing (in the order of hundreds of ms), and keeping track of a moving/evolving shape (<5ms).

Here you are saying that latency of 250ms is unnoticeable, which is utter nonsense.


Your comment has nothing to do with the conversation. The reason to have low latency when typing text is so you can correct mistakes. That requires the full response time. There's no moving or evolving shapes. Maybe proofread your own comments before throwing around accusations of "utter nonsense".


Are you saying that latency in the order of 250ms when editing text is unnoticeable?

Yeah, that is utter nonsense. Just try it for yourself, instead of pulling statements like that out of thin air. You would notice a difference in tens of ms even when editing text. Why do you think people cannot stand writing with VSCode for example?

Re: moving/evolving shapes, I did not think I had to clarify that the brain is a massively parallel system with multiple modes of operation. Editing text does not require you to reprocess all visual signals from scratch, because that is not how the visual cortex works. The perceived latency when editing text is between pressing a key and your brain telling you "my eyes have detected a change on the screen. I will assume that it is the result of me pressing a key". It does NOT take 250ms to make this type of assumption, and is basically how our vision operates. It's a prediction engine, not a CCD sensor.


>Why do you think people cannot stand writing with VSCode for example?

Which people? Every recent study I've seen shows VSCode as the most popular code editor by a large margin. Maybe latency isn't as important as you think?

>Are you saying that latency in the order of 250ms when editing text is unnoticeable?

No. Sorry for the info dump here but I'm going to make it absolutely clear so there's no confusion. The latency of the entire system is the latency of the human operator plus the latency of the computer. My statement is that, assuming you have a magical computer that computes frames and displays pixels faster than the speed of light, the absolute minimum bound of this system for the average person is 250ms. You only see lower response time averages in extreme situations like with pro athletes: so basically, not computer programmers who actually spend much more time thinking about problems, and going to meetings, than they actually spend typing.

Now let's go back to reality: with a standard 60Hz monitor, the theoretical latency added by display synchronization is a maximum of about 16.67ms. That's the theoretical MAXIMUM assuming the software is fully optimized and performs rendering as fast as possible, and your OS has realtime guarantees so it doesn't preempt the rendering thread, and the display hardware doesn't add any latency. So at most, you could reduce the total system latency by about 6% just by optimizing the software. You can't go any higher than that.

However, none of those things are true in practice. Making the renderer use damage tracking everywhere significantly complicates the code and may not even be usable in some situations like syntax highlighting where the entire document state may need to be recomputed after typing a single character. All PC operating systems may have significant unpredictable lag caused by the driver. All display hardware using a scanline-based protocol also still has significant vblank periods. Adding these up you may be able to sometimes get a measurement of around 1ms of savings by doing things this way, in exchange for massively complicating your renderer, and with a high standard deviation. Meaning that you likely will perceive the total latency as being HIGHER because of all the stuttering. This is less than 1% of the total latency in the system and it's not even going to be consistent or perceptible.

Now instead consider you've got a 360Hz monitor. The theoretical maximum you can save here is about 2.78ms. This can give you a CONSISTENT 5% latency reduction against the old monitor as long as the software can keep up with it. Optimizing your software for this improves it in every other situation too, versus the other solution which could make it worse. If it doesn't make it worse, it could only save another theoretical 1% and ONLY in a badly perceptible way. It just doesn't make sense to optimize for this less than 1% when it's mostly just caused by the hardware limitations and nobody actually cares about it and they're happy to use VSCode anyway without all this.

So again, you can avoid these accusations of "utter nonsense" when it's clear you're arguing against something that I never said.

>The perceived latency when editing text is between pressing a key and your brain telling you "my eyes have detected a change on the screen.

Your brain needs to actually process what was typed. Prediction isn't helping you type at all, if it did then the latency wouldn't matter anyway. If you're not just writing boilerplate code then you may have to stop to think many many times while you're coding too.


>> That is subjective, i do not want the input latency induced by synchronizing to the monitor's refresh rate in my desktop as it makes it feel sluggish.

Only if the DE or application can't keep up. Oh, it overran the frame time, so let's just dump half of it to screen...

I'm still baffled that this is a problem. I tried ray tracing a hierarchy of rectangles decades ago and it was pretty fast. Should be better than real time today, and that's doing it the expensive way. I suppose the challenge is compositing stuff from different processes every frame.


It is not about performance/throughput. There are various factors at play: first of all, the synchronization is basically waiting for the monitor to finish displaying the last image before you send the new one. Second, unless the application takes over the screen (e.g. fullscreen games), the applications do not run in sync with the window system (and you do not want that because you do not want a single application to cause problems for other applications like hogging the screen updates). When you combine these two there is nothing really the application can do. As you wrote:

> I suppose the challenge is compositing stuff from different processes every frame.

This is a significant part of the issue, though the fact that synchronization essentially means waiting is also another one (without synchronizing you get partial frames but those partial frames are still feedback for the user that makes the desktop feel more responsive).


This is just untrue.


From the XML file that describes the protocol:

"This global is a factory interface, allowing clients to inform which type of presentation the content of their surfaces is suitable for."

Note that "global" refers to the interface, not the setting. Which Wayland compositor has the equivalent feature of "xrandr --output [name] --set TearFree off"?


Could you explain in what scenario you think it is better to have a display show two half images slightly faster (milliseconds) than one full one?


Text editing. I mostly work on a single line at a time. The chance of the tear ending up in that line is low. And even if it does, it lasts only for a single screen refresh cycle, so it's not a big deal.

And you're not limited to two images. As the frame rate increases, the number of images increases and the tear becomes less noticeable. Blur Busters explains:

https://blurbusters.com/faq/benefits-of-frame-rate-above-ref...


Text editing is mostly about reading, not writing. Scrolling up and down code is actually one of the worst scenarios I can think of, where you absolutely don't want tearing, because even as the document is moving you are still reading it.

When touch typing, fingers work decoupled from the eyes anyway, unless you are waiting for the IntelliSense or Copilot prompt, which is usually constrained by language servers anyway, not the framerate.


As the frame rate increases, the latency decreases, making it a non-issue. I'd rather choose this option over blinking screens.


The minimum latency is bottlenecked by the monitor unless you allow tearing.


No, the monitor’s refresh rate is a hard limit on the minimum achievable worst-case latency. It doesn’t mean that you are actually close to that minimum, or that it is the hard limit that is active, aka the bottleneck.


Many technologies have been invented to allow displaying "two half images slightly faster", such as interlaced scanning...

Most humans will actually prefer the "slightly faster" option. (Obviously if you can do both, then they'd prefer that; but given the trade-off...)


Input latency. I find the forced vsync by compositors annoying even when doing simple stuff like moving or resizing windows - it gives a sluggish feel to my desktop. This is something i notice even on a high refresh rate monitor.


First person shooters. Vertical synchronization causes a noticeable output delay.

For example, with a 60 Hz display and vsync, game actions might be shown up to 16 ms later than without vsync, which is ages in FPS.


Key word here being "might". What actually gets displayed is highly dependent on the performance of the program itself and will manifest as wild stuttering depending on small variations in the scene.

I've seen no game consoles that allow you to turn vsync off, because it would be awful. No idea why this placebo persists in PC gaming.


Personally - gaming. Never liked vsync


If it worked exactly the same there would indeed be no benefit. If you’re happy with that you have then there’s no reason to switch.


I am sorry to tell you that X11 is completely unmaintained by now. So the chances of that happening are zero.


FYI 21.1.9 was released less than a month ago (https://lists.x.org/archives/xorg/2023-October/061515.html), they are still fixing bugs.


You mean, they're still fixing critical CVEs.


Which are still bugs.

Also only two of the four changes mentioned in the mail are about CVEs.


Is the same infrastructure available in Windows and MacOS?


From the article:

> What are the limitations?

> At the moment, graphics offload will only work with Wayland on Linux. There is some hope that we may be able to implement similar things on MacOS, but for now, this is Wayland-only. It also depends on the content being in dmabufs.


macOS supports similar things in its own native stack, but GTK doesn't make use of it.


macOS has IOSurface [0], so it can be done there too. It would require someone to implement it for GTK.

[0] https://developer.apple.com/documentation/iosurface


When I wrote the macOS backend and GL renderer I made them use IOSurface already. So it's really a matter of setting up CALayer automatically the same way that we do it on Linux.

I don't really have time for that though, I only wrote the macOS port because I had some extra holiday hacking time.


On Windows and DirectX, you have the concept of Shared Handles, which are essentially handles you can pass between process boundaries. It also comes with a mutex mechanism to signal who is using the resource at the moment. Fun fact - Windows at the kernel level works with the concept of 'objects', which can be file handles, window handles, threads, mutexes, or in this case, textures, which are reference counted. Sharing a particular texture is just exposing the handle to multiple processes.

A bit of reading if you are interested:

https://learn.microsoft.com/en-us/windows/win32/direct3darti...
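
A rough C sketch of the producer side, under the assumption of an existing ID3D11Device (error handling omitted; the consumer process would open the handle with ID3D11Device::OpenSharedResource):

    /* Sketch: create a shareable D3D11 texture and fetch its shared handle. */
    #define COBJMACROS
    #include <d3d11.h>
    #include <dxgi.h>

    HANDLE create_shared_texture(ID3D11Device *dev, UINT width, UINT height)
    {
        D3D11_TEXTURE2D_DESC desc = {0};
        desc.Width            = width;
        desc.Height           = height;
        desc.MipLevels        = 1;
        desc.ArraySize        = 1;
        desc.Format           = DXGI_FORMAT_B8G8R8A8_UNORM;
        desc.SampleDesc.Count = 1;
        desc.Usage            = D3D11_USAGE_DEFAULT;
        desc.BindFlags        = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;
        desc.MiscFlags        = D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX; /* shared + mutex */

        ID3D11Texture2D *tex = NULL;
        ID3D11Device_CreateTexture2D(dev, &desc, NULL, &tex);

        IDXGIResource *res = NULL;
        ID3D11Texture2D_QueryInterface(tex, &IID_IDXGIResource, (void **)&res);

        HANDLE shared = NULL;
        IDXGIResource_GetSharedHandle(res, &shared); /* pass this to the other process */
        IDXGIResource_Release(res);
        return shared;
    }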


The last paragraph says:

> At the moment, graphics offload will only work with Wayland on Linux. There is some hope that we may be able to implement similar things on MacOS, but for now, this is Wayland-only. It also depends on the content being in dmabufs.


Nope, it is yet another step making Gtk only relevant for Linux development.


Supporting a feature on one platform does not make a toolkit less relevant or practical on another platform.


Except this has been happening for quite a while, hence why a couple of cross platform projects have migrated from Gtk to Qt, including Subsurface, a bit ironically, given the relationship of the project to Linus.


The Subsurface developer did that 10 years ago and it only was because he personally preferred Qt. Take a step back for a moment and consider that in 10 years that's the only major example that anyone ever brings out. GTK is still very welcoming for contributions to maintain the GDK backends. Developers like that have to actually step up and do it and have patience, instead of outright quitting and running off to Qt which has a whole company to maintain those ports.


Gtk has always been primarily built by and for Linux users.


The GIMP Tolkit was always cross platform, the GNOME Tolkit not really.


That is ahistorical, and the misnaming doesn't help make your point.


As a former random Gtkmm contributor, with articles in The C/C++ Users Journal, I am not the revisionist here.


What's a Tolkit? And why two of them? I thought GTK was the Toolkit, GIMP was the Image Manipulation Program, and Gnome was the desktop Network Object Model Environment. Am I a revisionist here? (I certainly have my reservations about them!)


GTK stands for the GIMP Toolkit, as it was originally used to write GIMP, which actually started as a MOTIF application.

When GNOME adopted GTK as its foundation, there was a clear separation between GTK and the GNOME libraries, back in the 1.0 - 2.0 days.

Eventually, GNOME's needs became GTK's roadmap.

The rest one can find in the history books.


> Eventually, GNOME's needs became GTK's roadmap.

Exactly? If you're still holding out for GTK to be a non-Linux toolkit in 2023 then you're either an incredibly misguided contributor and/or ignorant of the history behind the toolkit. The old GTK does not exist anymore, you either use GNOME's stack or you don't.


GNOME co-opted and sabotaged GTK for anyone that’s not GNOME. GTK used to be capable of being fairly OS-neutral, and was certainly quite neutral within Linux and so became the widget toolkit of choice for diverse desktop environments and worked well thus; but over time GNOME has taken it over completely, and the desires of other desktop environments are utterly ignored. The GNOME Foundation has become a very, very bad custodian for GTK.

As you say, the old GTK is dead. GNOME murdered it. I mourn it.


Yeah, I don't disagree with anything you've said. Still though, I use GTK because it works and think the pushback against it is silly. GTK was never destined to be the cross-platform native framework. If that were attainable, people would have forked GTK 2 (for what?) or GTK 3 (too quirky). Now we're here, and the only stakeholder in the project is the enormously opinionated GNOME team.

They've made a whole lot of objective and subjective missteps in the past, but I don't think it's fair to characterize them as an evil party here. They did the work, they reap the rewards, and they take the flak for the myriad of different ways the project could/should have gone.


The problem with the GIMP team is not that they're enormously opinionated, but that they're WRONGLY opinionated.

It's not that they're the evil party, it's just that they should stop feeling so sorry for themselves that so few people want to use their image editor because of its terrible user interface caused by the fact that they refuse to listen to their users, and it has a terribly offensive name that they refuse to change.

At least they still have a fanatical following of MAGA incel edgelords and ESR sycophants who love it BECAUSE it has an offensive name, so they still have that hard core fanbase to appeal to.

They're as self-sabotaging as RMS himself, and they don't deserve to play the victim or to have a pity party, especially when they try to throw it for themselves.


GIMP has about 3-4 part-time developers and no designers. They have no resources to redesign the user interface even though it's been wanted for a long time. It's taken them an extremely long time just to get GIMP 3 out the door and that's just a port without any major UI changes. But I agree otherwise, the horrible name is completely on them.


No, that's an outlandish conspiracy theory and completely ahistorical. GTK was always developed on Linux first, and before it was used by GNOME it had a lot of GIMP-specific functionality that didn't extend well to other apps. Want to know why? Because GIMP and GNOME developers were the only ones contributing. Those "diverse desktop environments" almost always took from GNOME and contributed very little back. That's fine to do it but they need to accept that they don't call the shots when they do that. They don't get to pull their funding and then complain someone else is being a bad custodian, it doesn't work like that.


> Those "diverse desktop environments" almost always took from GNOME and contributed very little back.

Now that's an ahistorical conspiracy theory. Those diverse desktop environments contributed hugely to GTK, GNOME just didn't use their work or consider it helpful unless it directly related to their desktop... and of course none of that work will relate to their desktop. Nobody is going to fully "kiss the ring" unless they get something out of it, and even back in the GTK3 days it was plainly clear that GNOME didn't care about you if you didn't care about GNOME.

Now, GNOME's "coup" or "killing" of GTK is completely fine by Open Source standards. Even encouraged. I don't stand against the concepts of what they're doing, but they could have done a lot better than fighting third-parties tooth-and-nail. GNOME should be a proud project that leads the GNU movement, and instead it was reduced to a bunch of squabbling supremacists that made their userbase an adversary. I say all that as someone who quite likes modern GTK and writes apps in it.


>Now that's an ahistorical conspiracy theory

No? Where exactly do you think I've theorized about the existence of a conspiracy? Because I've actually said the exact opposite: there isn't a conspiracy and no one is cooperating at all. There's no evil group of developers secretly planning to sabotage everything. It's just the usual bad communication and planning that happens with a distributed team.

>Those diverse desktop environments contributed hugely to GTK, GNOME just didn't use their work

Can you name what any of these contributions were? Because I've never seen them. I've seen contributions here and there, lots of minor bug fixes, but nothing major.

>Nobody is going to fully "kiss the ring" unless they get something out of it

Avoid this rhetoric please. These open source projects are a volunteer collaboration. No one's kissing any rings or trying to get something out of the maintainers, other than the usual: everyone helps each other write and maintain the code.

>but they could have done a lot better than fighting third-parties tooth-and-nail. GNOME should be a proud project that leads the GNU movement

I really don't know what you're talking about here, but disagreeing about technical things isn't "fighting tooth-and-nail". That's a normal part of any project.

Personally I don't think anyone should care about leading the GNU movement, that's been plagued by petty infighting and drama since the very beginning.


Dude, I know. I've been implementing user interface toolkits since the early 80's, but I've still never heard of a "Tolkit", which you mentioned twice, so I asked you what it was -- are you making a silly pun like "Tollkit" for "Toolkit" or "Lamework" for "Framework" or "Bloatif" for "Motif" and I'm missing it? No hits on urban dictionary, even. And also you still haven't explained whether I'm a revisionist or not.

Just like you, I love to write articles about user interface stuff all the time, too. Just in the past week:

My enthusiastic but balanced response to somebody who EMPHATICALLY DEMANDED PIE MENUS ONLY for GIMP, and who loves pie fly, but pushed my button by defending the name GIMP by insisting that instead of the GIMP project simply and finally conceding its name is offensive, that our entire society adapt by globally re-signifying a widely known offensive hurtful word (so I suggested he first go try re-signifying the n-word first, and see how that went):

https://news.ycombinator.com/item?id=38233793

(While I would give more weight to the claim that the name GIMP is actually all about re-signifying an offensive term if it came from a qualified and empathic and wheelchair using interface designer like Haraldur Ingi Þorleifsson, I doubt that’s actually the real reason, just like it’s not white people’s job to re-signify the n-word by saying it all the time...)

Meet the man who is making Iceland wheelchair accessible one ramp at a time:

https://scoop.upworthy.com/meet-the-man-who-is-making-icelan...

Elon Musk apologises after mocking laid-off Twitter worker, questioning his disability:

https://www.abc.net.au/news/2023-03-08/elon-musk-haraldur-th...

The article about redesigning GIMP we were discussing credited Blender with being the first to show what mouse buttons do what at the bottom of the screen, which actually the Lisp Machine deserves credit for, as far as I know:

https://news.ycombinator.com/item?id=38237231

I made a joke about how telling GIMP developers to make it more like Photoshop was like telling RMS to develop Open Software for Linux, instead of Free Software for GNU/Linux, and somebody took the bait so I flamed about the GIMP developer’s lack of listening skills:

https://news.ycombinator.com/item?id=38238274

Somebody used the phrase "Easy as pie” in a discussion about user interface design so I had to chime in:

https://news.ycombinator.com/item?id=38239113

Discussion about HTML Web Components, in which I confess my secret affair with XML, XSLT, obsolete proprietary Microsoft technologies, and Punkemon pie menus:

https://news.ycombinator.com/item?id=38253752

Deep interesting discussion about Blender 4.0 release notes, focusing on its historic development and its developer’s humility and openness to its users’ suggestions, in which I commented on its excellent Python integration.

https://news.ycombinator.com/item?id=38263171

Comment on how Blender earned its loads of money and support by being responsive to its users.

https://news.ycombinator.com/item?id=38232404

Dark puns about user interface toolkits and a cheap shot at Motif, with an analogy between GIMP and Blender:

https://news.ycombinator.com/item?id=38263088

A content warning to a parent who wanted to know which videos their 8-year-old should watch on YouTube to learn Blender:

https://news.ycombinator.com/item?id=38288629

Posing with a cement garden gnome flipping the bird with Chris Toshok and Miguel de Icaza and his mom at GDC2010:

https://www.facebook.com/photo/?fbid=299606531754&set=a.5173...

https://www.facebook.com/photo/?fbid=299606491754&set=a.5173...

https://www.facebook.com/photo/?fbid=299606436754&set=a.5173...


It was clearly a typo you could choose to ignore charitably instead of nitpick. Also, what is the rest of this comment and how is it related to GTK?


How was it clearly a typo when he repeated it with exactly the same spelling and capitalization, two times in a row?

And neither of those things are even toolkits like GTK: "GIMP Tolkit" is an image editor, and "GNOME Tolkit" is a desktop environment.

And even if you ignore the two typos and the two mis-namings, his whole point is factually incorrect, and jdub was correct and not a revisionist when he said "That is ahistorical, and the misnaming doesn't help make your point".

I'm simply giving him the benefit of the doubt, and asking him to explain what he means, or why not only his main point was wrong, but also why he got both of the names wrong two times in a row, and thinks an image editor and a desktop environment are incorrectly spelled toolkits.

And he still hasn't explained, or admitted he made two typos and misnamed two projects in a row while trying to make an incorrect point, while claiming to be an expert tech writer, and accusing someone who was correct of being a revisionist, so the jury is still out. But your theory it's clearly a typo just doesn't wash. Maybe they're the names of his own forks, or maybe he's just a charlatan, who knows? ;) Why don't you ask him yourself.


Because he was incorrectly nitpicking himself, and was wrong to call somebody else a revisionist without citing any proof, while he was factually incorrect himself, and offering an appeal to authority of himself as a writer and "random Gtkmm contributor" instead. I too have lots of strong opinions about GTK, GNOME, and GIMP, so I am happy for the opportunity to write them up, summarize them, and share them.

You'll have to read the rest of the comment and follow the links to know what it says, because I already wrote and summarized it, and don't want to write it again just for you, because I don't believe you'd read it a second time if you didn't read it the first time. Just use ChatGPT, dude.

Then you will see that it has a lot to do with GTK and GNOME and GIMP, even including exclusive photos of Miguel de Icaza and his mom with a garden gnome flipping the bird.


Oopsie I touched a nerve.


You HAD to mention MOTIF! ;) There's a reason I call it BLOATIF and SLOWTIF...

https://donhopkins.medium.com/the-x-windows-disaster-128d398...

>The Motif Self-Abuse Kit

>X gave Unix vendors something they had professed to want for years: a standard that allowed programs built for different computers to interoperate. But it didn’t give them enough. X gave programmers a way to display windows and pixels, but it didn’t speak to buttons, menus, scroll bars, or any of the other necessary elements of a graphical user interface. Programmers invented their own. Soon the Unix community had six or so different interface standards. A bunch of people who hadn’t written 10 lines of code in as many years set up shop in a brick building in Cambridge, Massachusetts, that was the former home of a failed computer company and came up with a “solution:” the Open Software Foundation’s Motif.

>What Motif does is make Unix slow. Real slow. A stated design goal of Motif was to give the X Window System the window management capabilities of HP’s circa-1988 window manager and the visual elegance of Microsoft Windows. We kid you not.

>Recipe for disaster: start with the Microsoft Windows metaphor, which was designed and hand coded in assembler. Build something on top of three or four layers of X to look like Windows. Call it “Motif.” Now put two 486 boxes side by side, one running Windows and one running Unix/Motif. Watch one crawl. Watch it wither. Watch it drop faster than the putsch in Russia. Motif can’t compete with the Macintosh OS or with DOS/Windows as a delivery platform.


Motif today isn't that bad compared to the bloat of GTK4. Today's '486' in the era of Pentium 3's would be an Atom netbook. EMWM (enhanced MWM) + XFile flies. I quickly hacked NNTP auth support through some quick code and an Xresources value for NCSA Mosaic. Yes, that one.

Also, you get Xft and UTF-8 support through fontconfig and Xft in every piece of Motif-based software. Far from the proprietary Motif of 1996...

Compare it to Gnome and dconf...


Motif is only free if your time is worthless. ;)


After seeing this whole thread a day late, I have to wonder: is the unspoken difference what "cross-platform" and "always" mean to different posters? To someone with my historical perspective, it grates a bit to see X Windows conflated with Linux as a platform.

My memory of the early days is consistent with what the wikipedia page says about GIMP. It was cross-platform on the typical Unix workstations that were around the UC Berkeley campus labs and XCF. This was things like Solaris, SunOS, HP-UX, Ultrix, and Irix.

Students in this milieu were just as likely to have some BSD variant on their home PC as Linux. I think it was later during and after the "Beowulf" scientific computing period when Linux started to dominate as the Unix-like platform for open source development.


Has it been relevant for something else before?


Yes, back when G stood for Gimp, and not GNOME.


Even in those days, Gtk+ applications were quite horrible on non-X11 platforms. GTK has never been a good cross-platform toolkit in contrast to e.g. Qt.


I guess people enjoying GIMP and Inkscape would beg to differ.


Maybe it’s a thing on Windows (I don’t know), but I’ve never seen anyone use GIMP or Inkscape on macOS. I’m pretty sure they exist somewhere, but all Mac users I know use Photoshop, Pixelmator or Affinity Photo rather than GIMP.


Inkscape devs have a lot of trouble making their app work on other OSes.


The thing that's always felt slow to me in GTK was resizing windows, not getting pixels to the screen. I'm wondering if adding all these composited surfaces adds a cost when resizing the windows and their associated out of process surfaces.


More likely it removes costs. This is very specifically an optimization.


It is strange that the article doesn't compare and contrast to full-screen direct scanout, which most X11 and presumably Wayland compositors implement, e.g. KDE's kwin-wayland since 2021: https://invent.kde.org/plasma/kwin/-/merge_requests/502

Maybe that is because full-screen direct scanout doesn't take much (if anything) in a toolkit, it's almost purely a compositor feature.


Is there a significant difference? Hardware planes are basically that, just optionally not full-screen.


Fullscreen direct scanout doesn't require (probably tricky) coordination to blend the graphical outputs of multiple processes. How that coordination works is the interesting technical question.


> A dmabuf is a memory buffer in kernel space that is identified by a file descriptor.

> The idea is that you don’t have to copy lots of pixel data around, and instead just pass a file descriptor between kernel subsystems.

So like sendfile for graphics?

https://www.man7.org/linux/man-pages/man2/sendfile.2.html

It's a pretty awesome system call. We should have more of those.
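
Pretty much, and the analogy runs deeper: since a dmabuf is just a file descriptor, handing it to another process (e.g. from client to compositor) uses the standard SCM_RIGHTS mechanism on a Unix socket, which is how Wayland passes them under the hood. A minimal sketch of the sending side (error handling omitted):

    /* Send a file descriptor (e.g. a dmabuf) over a Unix domain socket. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int sock, int fd)
    {
        char dummy = 0;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };

        char cmsgbuf[CMSG_SPACE(sizeof(int))];
        memset(cmsgbuf, 0, sizeof(cmsgbuf));

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = cmsgbuf, .msg_controllen = sizeof(cmsgbuf),
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;          /* "here, have a file descriptor" */
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0);          /* receiver gets its own fd number */
    }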


I can't imagine anybody passing video frames one by one via a system call as an array of pixels.

I believe neither Xv nor GL-based renderers do that, even before we discuss hw accel.


Emulators for older systems very often do this!

The older consoles like the NES had a picture processing unit (PPU) that generated the picture; you need to emulate its state and interaction with the rest of the system, possibly cycle-by-cycle, which makes it not possible to do on the GPU as a shader or whatever.

This is kind of a niche use case for sure but it's interesting.


They usually use libSDL instead of GTK, though.


They used SDL almost as a display buffer output. Oh, and for easy multimedia.
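
For context, a minimal sketch of that classic pattern with SDL2 (an NES-ish 256×240 resolution is assumed; the emulator fills a CPU-side pixel array each frame and SDL uploads and scales it):

    /* Minimal SDL2 "display buffer" loop: CPU writes pixels, SDL uploads them. */
    #include <SDL.h>
    #include <stdint.h>

    int main(void)
    {
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window *win = SDL_CreateWindow("emu", SDL_WINDOWPOS_CENTERED,
                                           SDL_WINDOWPOS_CENTERED, 256 * 3, 240 * 3, 0);
        SDL_Renderer *ren = SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED);
        SDL_Texture *tex = SDL_CreateTexture(ren, SDL_PIXELFORMAT_ARGB8888,
                                             SDL_TEXTUREACCESS_STREAMING, 256, 240);
        uint32_t framebuffer[256 * 240];   /* filled by the emulated PPU each frame */

        for (int running = 1; running;) {
            SDL_Event ev;
            while (SDL_PollEvent(&ev))
                if (ev.type == SDL_QUIT) running = 0;

            /* ... emulate one frame into framebuffer here ... */
            SDL_UpdateTexture(tex, NULL, framebuffer, 256 * sizeof(uint32_t));
            SDL_RenderClear(ren);
            SDL_RenderCopy(ren, tex, NULL, NULL);   /* GPU scales to window size */
            SDL_RenderPresent(ren);
        }
        SDL_Quit();
        return 0;
    }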


Semi-related question: Are there any real benefits to having the compositor deal with compositing your buffers as opposed to doing it yourself? Especially if you're already using hardware acceleration, passing your buffers to the system compositor seems like it could potentially introduce some latency.

I guess it would allow the user/system to customize/tweak the compositing process somehow, but for what purpose?


Any kind of post-process effect like transparency or zoom; things like window previews and overview screens (these are sometimes possible without one as well); and tear-freedom.


>> Are there any real benefits to having the compositor deal with compositing your buffers as opposed to doing it yourself?

I would guess lower latency since the compositor is running the show trying to maintain full frame rate. The processes still have to cooperate somehow.


The compositor can potentially map your layers to hardware scanout planes instead of compositing with the GPU, or skip the composition if all layers are obscured.


Very cool!

I think I found a minor typo:

GTK 4.14 will introduce a GtkGraphicsOffload widget, whose only job it is to give a hint that GTK should try to offload the content of its child widget by attaching it to a subsurface instead of letting GSK process it like it usually does.

I think that "GSK" nesr the end should just be "GTK". It's not a very near miss on standard qwerty, though ...



Wow thanks, TIL! Goes to show how far removed I am from GTK these days I guess. :/



Is this the same sort of thing Windows 95-XP had before Vista added DWM?

Back when videos wouldn't show up in screenshots and their position on screen could sometimes get out of sync with the window they were playing in?

I never fully understood what was happening but my theory at the time was that the video was being sent to the video card separate from the rendered windows.


With Linux/BSD you had a fun issue with X: if you used the Xv overlaid video output, you got no screenshot except for a blue one. That happened both with video files and with TVTime. Some setting allowed capturing it, but I can't remember which.


steam deck wayland compositor is built on drm(dmabufs)/vulkan.

(Sad it is written in c++)

But what surprised me even more: I was expecting GTK, and especially GTK 4, to be on par with Valve's software.

On Linux, I would not even think of coding a modern, hardware-accelerated system GFX component that is not drm(dmabuf)/vulkan.


This blog post is not talking about Mutter, GNOME's compositor. GTK's hardware acceleration had already been using dmabufs before adding this graphics offload feature.


But as the article states, with GL, not vulkan.

Unless the article is obsolete itself?


Sure, but OpenGL itself is still useful and used in modern software.


It is legacy and has started to be retired.

Not to mention GL is a massive and gigantic kludge/bloat compared to vulkan (mostly due to the glsl compiler). So it is good to let it go.


If you have spare time, the GTK maintainers want people to work on the Vulkan renderer. Benjamin Otte and Georges Stavracas Neto have put in a bit of effort to make the Vulkan renderer better.

GL is only deprecated on Mac from what I understand.


GL is being retired.

I humbly don't think I am the right guy for that: I use raw X11, and will use raw Wayland once the Steam client is finally and actually Wayland-enabled, though I may make the jump before then. Because "Valve": their "pressure-vessel" is buggy, and I am helping fix ALSA support in SDL3; all of that is just starting. Namely, if I code something GFX-related, it would be more of a Wayland compositor on drm(dmabufs)/vulkan, probably based on Valve's, either in plain and simple C (compiling with tcc, cproc/qbe, simple-cc/qbe), with minimal to zero dependencies and without the latest ISO tantrums, or in x86_64 assembly (before a port to RISC-V in the far future).

But never say never.


You shouldn't use raw X11 or raw Wayland unless you're writing a low-level toolkit. If you're working on games, SDL should handle all that stuff for you.


Completely wrong: chatgpt again?


No, really. Those APIs are too low level to be useful for normal applications. Nothing in them is useful for games at all. I don't know why you think it's appropriate to put in these insults either. Cut it out.


This is so wrong, yep chatgpt.


I don't think Valve's window toolkit ever supported vulkan. Steam no longer uses OpenGL because they replaced their window toolkit with Chrome.

>It is legacy and started to be retired.

The standard itself, but the implementations are still being maintained and new extensions are being added. It is still a solid base to build upon.


It is being retired, and it is massive bloat compared to vulkan (mainly because of the glsl compiler), so it is reasonable to say that building upon it is a mistake.


I see no signs that it is being retired on platforms other than Mac. With vulkan you have to bloat your own program's code and introduce complexity. Regardless, bloat doesn't matter, so I don't know why you are focusing on that.


ok chatgpt.


This comment is pretty wrong. Vulkan still uses the GLSL compiler.



Can you tell me which part of this you're referring to?


chatgpt...


BeOS and Haiku allowed exposure to kernel graphics buffers decades ago (https://www.haiku-os.org/legacy-docs/bebook/BDirectWindow.ht...), which bypass the roundtrip to compositor and back. A companion article describing the design is here (https://www.haiku-os.org/legacy-docs/benewsletter/Issue3-12....)

After 25 years, GTK joins the party …

With no native video drivers, moving a Vesa Haiku window with video playback still seems smoother than doing the same in Gnome/Kde under X11.


> After 25 years, GTK joins the party …

I mean.. shall we start a list of ways in which BeOS/Haiku have yet to "join the party" that the linux desktop has managed?

Juggling the various needs of one hell of a lot more users across a lot more platforms, with a lot more API-consuming apps to keep working on systems which are designed with a lot more component independence is a much harder problem to solve.


Are you mixing up number of users with technical sophistication?


No, that isn't allowing exposure to kernel graphics buffers. That's allowing clients to draw to the main framebuffer with no acceleration at all. If you're memory mapping pixels into user space and drawing with the CPU then you're necessarily leaving kernel space. Around the same time X11 had an extension called DGA that did the same thing. It was removed because it doesn't work correctly when you have hardware acceleration.

So the optimization only makes sense for a machine like yours with no native drivers. With any kind of GPU acceleration it will actually make things much slower. GTK doesn't do this because it would only be useful for that kind of machine running around 25 years ago.


> With any kind of GPU acceleration it will actually make things much slower. GTK doesn't do this because it would only be useful for that kind of machine running around 25 years ago.

Precisely one of the points of TFA is be able to use the "25 year old" hardware overlay support whenever possible (instead of the GPU) in order to save power, like Android (and classic Xv) does.


1. The hardware overlay support is implemented on the GPU. It's not "instead of the GPU".

2. The provided code in Haiku doesn't appear to support overlays.

3. Yes, it's upsetting that a real API wasn't available for this until recently. But, it was of limited practical use without the entire display pipeline being moved to the GPU and without Wayland being established (X11 never had the API quite like this, classic XV is too limited to do what this is doing)


Shared kernel graphics buffers in main or memory mapped framebuffer device memory are one thing (common in the early 80's, i.e. 1982 SunView using /dev/fb), but does it expose modern shared GPU texture buffers to multiple processes, which are a whole other ball game, and orders of magnitude more efficient, by not requiring ping-ponging pixels back and forth between the CPU and GPU when drawing and compositing or crossing process boundaries?


To this day, moving a Haiku window under a 5k unaccelerated EFI GOP framebuffer still feels _significantly_ faster than doing the same under Windows 11, and everything KWin's X/Wayland has to offer on the same hardware (AMD Navi 2 GPU).

> BeOS and Haiku allowed exposure to kernel graphics buffers decades ago

In any case, X also allowed this for ages, with XV and XShm and the like. Of course then everyone got rid of this in order to have the GPU in the middle for fancier animations and whatnot, and things went downhill since.


>In any case, X also allowed this for ages, with XV and XShm and the like.

No, XShm doesn't do that and the way XV does it is completely dependent on drivers. If you're using Glamour then XV won't use overlays at all. XShm uses a buffer created in CPU memory allocated by the client that the X server then has to copy to the screen.
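
(For reference, a hedged sketch of the client-side MIT-SHM flow being described: the client allocates a SysV shared memory image, and the server still copies it to the screen on every XShmPutImage.)

    #include <X11/Xlib.h>
    #include <X11/extensions/XShm.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    XImage *create_shm_image(Display *dpy, XShmSegmentInfo *shminfo,
                             int width, int height)
    {
        XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, 0),
                                      DefaultDepth(dpy, 0), ZPixmap,
                                      NULL, shminfo, width, height);

        shminfo->shmid = shmget(IPC_PRIVATE,
                                img->bytes_per_line * img->height,
                                IPC_CREAT | 0600);
        shminfo->shmaddr = img->data = shmat(shminfo->shmid, NULL, 0);
        shminfo->readOnly = False;
        XShmAttach(dpy, shminfo);   /* the X server maps the same segment */
        return img;
    }

    /* Per frame: XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, width, height,
     * False);  -- this is the server-side copy to the screen. */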

> Of course then everyone got rid of this in order to have the GPU in the middle for fancier animations

No, for video, the GPU is used in the middle so you can do post-processing without copying everything into main memory and stalling the whole pipeline. I'd like to see an actual benchmark for how a fullscreen 5k video with post-processing plays on Haiku without any hardware acceleration.


> XShm uses a buffer created in CPU memory allocated by the client that the X server then has to copy to the screen.

Fair enough. Even with XShmCreatePixmap, you are still never simply mmapping the card's actual entire framebuffer, unlike what BDirectWindow allows (if https://www.haiku-os.org/legacy-docs/benewsletter/Issue3-12.... is to be believed, which is closer to something like DGA). In XShm, the server still has to copy your shared memory segment to the actual framebuffer.

(sorry for previous answer here, I misunderstood your comment)

> No, for video, the GPU is used in the middle so you can do post-processing without copying everything into main memory and stalling the whole pipeline.

Depends on what you mean by "post-processing". You can do many types of card-accelerated zero-copy post-processing using XV: colorspace conversion, scaling, etc. At the time, scaling the video in software or even just doing an extra memory copy per frame would have tanked frame rate -- Xine can be used to watch DVDs in PentiumII-level hardware. Obviously you cannot put the video in the faces of rotating 3D cubes, but this is precisely what I call "fancy animations".


>colorspace conversion, scaling

There's a lot more than that. Please consider installing the latest version of VLC or something like that and checking all the available post-processing effects and filters. These aren't "fancy animations" and they're not rotating 3D cubes, they're basic features that a video player is required to support now. If you want to support arbitrary filters then you need to use the GPU. All these players stopped using XV ages ago, on X11 you'll get the GPU rendering pipeline too because of this.

I don't really see what's the point of making these condescending remarks like trying to suggest that everyone is stupid and is only interested in making wobbly windows and spinning cubes. Those have never been an actual feature of anything besides Compiz, which is a dead project.


I don't see what you mean by "condescending remarks", but I do think it is stretching it to claim "arbitrary filters" is a "basic feature that a video player is required to support now". As a consumer, I have absolutely _never_ used any such video filters, doubt most consumers are even aware of them, have seen few video players which support them, and most definitely I have no idea what they are in VLC. Do they even enable any video filters by default? The only video filter I have sometimes used is deinterlacing which doesn't really fit well in the GPU rendering pipeline anyway but fits very nicely in fixed hardware. So yes, I hardly see the need to stop using native accelerated video output and fallback to GPU just in case someone wants to use such filters. This is how I end up with a card which consumes 20W just for showing a static desktop on two monitors.

Anyway, discussing about this is besides the point, and forgive me from the rant above.

If you really need GPU video filters then the GPU is obviously going to be the best way to implement them, there's no discussion possible about that. But the entire point of TFA is to (dynamically) go back to a model where the GPU is _not_ in the middle. And that model -- sans GPU -- happens to match what Xv was doing and is actually faster and less power consuming than to always blindly use the GPU which is where we are now post-Xv.


>The only video filter I have sometimes used is deinterlacing

I don't know about anything else, but ffmpeg has some deinterlacing filters that run on the GPU. So your one example is a bad one.

>Anyway, discussing about this is besides the point, and forgive me from the rant above.

Next time can you please not post the rant? It's not interesting to parse through all that just to get to the point. It's also extremely uninteresting to have this conversation like "well I haven't personally used that so it must not be important". VLC and ffmpeg are used by millions (billions?) of people, so can we at least agree that neither of our own very particular and personal use cases are that important?

>But the entire point of TFA is to (dynamically) go back to a model where the GPU is _not_ in the middle.

No? Overlays are entirely driven by the GPU. The entire reason these are performant is because it's zero-copy from a GPU buffer to an overlay.

>And that model -- sans GPU -- happens to match what Xv was doing and is actually faster and less power consuming than to always blindly use the GPU which is where we are now post-Xv.

I have no idea what you're talking about. XV (with a driver that supports it) uses the GPU to do the overlays. If you aren't using any filters, this should have the same power consumption as XV.


[deleted]


Very few clients I've seen actually call XShmGetImage to draw on the root window because this is incredibly slow and error prone. Having multiple clients do this at the same time is an almost guaranteed way to cause graphical corruption. On modern hardware with any kind of acceleration you never want to map the framebuffer into userspace like this.

The actual suggested use case of XShm is for the client to allocate its own buffer, which then gets copied to the screen.

>colorspace conversion, scaling, etc.

This is only a tiny subset of the post-processing operations a video player would implement. If you want arbitrary post-processing then realistically, you need to use the GPU.


Android has horrific UI latency despite heavily employing hardware acceleration.

Enlightenment's EFL was exclusively software-rendered for a long time, but is buttery smooth despite largely relying on full redraws most of the time.

Hardware acceleration does not compensate for a lack of hard computer science knowledge about efficient memory operations, caching, and such.


Hardware acceleration in this case is to draw less power doing day-to-day operations and not melt the user's hand.

In the general case, the most effective uses of hardware acceleration actually tend to increase latency, trading it for throughput and power efficiency.

Let's take another example for the analogy: you can run your LLM straight in the CPU and it will have awesome latency and very poor throughput; or you can take your sweet time to load the weights into your GPU/NPU/TPU/memory-in-compute device and run it hardware accelerated at lower energy cost and higher throughput.


Like most OSes where the desktop APIs are part of the whole developer experience, and not yet another pluggable piece.


BeOS was truly ahead of its time. It’s a shame that it didn’t get more traction.


There must be a name for the logical fallacy I see whenever someone pines over a past "clearly superior" technology that wasn't adopted or had its project cancelled, but I can never grasp it. I guess the closest thing to it is an unfalsifiable statement.

The problem comes from people remembering all of the positive traits (often just promises) of a technology but either forgetting all the problems it had or the technology never being given enough of a chance for people to discover all the areas in which it is a bit crap.

BeOS? Impressive technology demo. Missing loads of features expected in a modern OS, even at that time. No multi-user model even!

This is also a big phenomenon in e.g. aviation. The greatest fighter jet ever is always the one that (tragically!) got cancelled before anyone could discover its weaknesses.


Exactly this. I'm as big a fan of "alternative tech timelines" as the next nerd, but I also can see in retrospect why we have the set of compromises we have today, and all along I watched the intense efforts people made to navigate the mindbogglingly complicated minefield of competing approaches and political players and technical innovations that were on the scene.

People have been working damned hard to build things like Gtk, Qt, etc. not to mention Wayland, etc. all the while maintaining compatibility etc and I personally am happy for their efforts.

BeOS/HaikuOS is a product of a mid-90s engineering scene that predates the proliferation of GPUs, the web, and the set of programming languages that we work with today. There's nothing wrong with it in that context, but it's also not "better." Just different compromises.

The other one I see nostalgia nerds reach for is the Amiga. A system highly coupled to a set of custom chips that only made sense when RAM was as fast (or faster) than the CPU, whose OS had no memory protection, and which was easily outstripped in technical abilities by the early 90s by PCs with commodity ISA cards, etc. because of the development of economies of scale in computer manufacturing. It was ahead of its time for about 2-3 years in the mid 80s, but in a way that was a dead end.

Anyways, what we have right now is a messy set of compromises. It doesn't hurt to go looking for simplifications, but it does hurt to pretend that the compromises don't exist for a reason.

EDIT: I would add though that "multiuser" as part of a desktop (or especially mobile) OS has maybe proven to be a pointless thing. The vast majority of Linux machines out there in the world are run in a single user fashion, even if they are capable of multiuser. Android phones, Chromebooks, desktop machines, and even many servers -- mostly run with just one master user. And we've also seen how rather not-good the Unix account permissions model is in terms of security, and how weak its timesharing of resources etc is in context of today's needs -- hence the development of cgroups, containers, and virtual machine / hypervisor etc.


They run with one master user perhaps, but they have multiple users at one time anyways.


I mean, in those cases they're almost always just using multiple users as a proxy for job authorization levels, not people.

Anybody who is serious about securing a single physical machine for multiuser access isn't doing it through multiple OS accounts, and is slicing it up by VMs, instead.

I do have a home (Windows) computer that gets used by multiple family members through accounts, but I think this isn't a common real world use case.


> There must be a name for the logical fallacy I see whenever someone pines over a past "clearly superior" technology that wasn't adopted or had its project cancelled, but I can never think of it. I guess the closest thing to it is an unfalsifiable statement.

You can run Haiku today, so it's hardly unfalsifiable, nor an effect of nostalgia or whatever way you want to phrase it.

> BeOS? Impressive technology demo. Missing loads of features expected in a modern OS, even at that time. No multi-user model even!

"Even at that time" is just false. multi-user safe OSes abound in the 90s?


> You can run Haiku today, so it's hardly unfalsifiable

Excellent, so let's falsify it: how come 20 years later the best thing people really have to say about BeOS/Haiku is that it has smooth window dragging?

> multi-user safe OSes abound in the 90s?

Windows NT.

Linux and at least two other free unixes.

Countless proprietary unixes.

VMS.

The widespread desktop OSs in the 90s were not considered serious OSs even then, more accidents of backward-compatibility needs.


The thing about Haiku is that their plan (initially as "OpenBeOS") was: since they're just re-doing BeOS, and some of BeOS is open source, they'll get some basics working, then they're right back on the horse, and in a few years they'll be far ahead of where BeOS was.

Over two decades later they don't have a 1.0 release.


Windows NT...?


Unfortunately, being technologically superior isn't enough to win.

Money, marketing, politics, wanting to be part of the crowd, usually play a bigger role.


This, plus rose-tinted nostalgia glasses, plus rooting for the underdog (who lost), plus "my niche tech is better than your mainstream one".


I was greatly impressed with the BeOS filesystem's SQL-esque indexing and querying.


You can bypass the compositor with X11 hw acceleration features too. But what about Wayland? I thought apps always go through the compositor there. As shown eg in the diagram at https://www.apertis.org/architecture/wayland_compositors/

Drawing directly from app to kernel graphics buffers (or hw backed surfaces) and participating in composition are orthogonal I think. The compositor may be compositing the kernel or hw backed surfaces.


Bypassing composition is a key feature in Wayland. The protocols are all written with zero-copy in mind. The X11 tricks are already there, forwarding one client's buffers straight to hardware, while the end-game with libliftoff is offloading multiple subsurfaces directly to individual planes at once.

The compositor is the entire display server in Wayland, and is also the component responsible for bypassing composition when possible.


In my meager understanding (which I'm happy to have corrected): in a windowed scenario (vs. fullscreen), in both the X direct-rendering and Wayland scenarios the application provides a (possibly GPU-backed) surface that the compositor uses as a texture when forming the full-screen video output.

In a full-screen case AFAIK it's possible to skip the compositing step with X11, and maybe with Wayland too.

"Zero copy" seems a bit ambiguous term in graphics because there's the kind of copying where whole screen or window sized buffers are being copied about, and then there are compositing operations where various kinds of surfaces are inputs to the final rendering, where also pixels are copied around possibly several times in shaders but there aren't necessary extra texture sized intermediate buffers involved.


> In a full-screen case AFAIK it's possible to skip the compositing step with X11, and maybe with Wayland too.

This is the trivial optimization all Wayland compositors do.

The neater trick is to do this for non-fullscreen content - and even just parts of windows - using overlay planes. Some compositors have their own logic for this, but libliftoff aims to generalize it.

Zero-copy is not really ambiguous, but to clarify: Wayland protocols are designed to maximize the cases where a buffer rendered by a client can be presented directly by the display hardware as-is (scanned out), without any intermediate operations on the content.

Note "maximize" - the content must be compatible with hardware capabilities. Wayland provides hints to stay within capabilities, but a client might pick a render buffer modifier/format that cannot be scanned out by the display hardware. GPUs have a lot of random limitations.


Wasn't this one of the main security issues with the BeOS architecture?

It's not the same thing as what's being done here via Wayland. BeOS is more YOLO style direct access of the framebuffer contents, without solving any of the hard problems (I don't think it was really possible to do properly using the available hardware at the time).


That's also how SunView worked on SunOS in 1982, and Sun's later GX graphics accelerated framebuffer driver worked in the 90's. The kernel managed the clipping list, and multiple processes shared the same memory, locking and respecting the clipping list and pixel buffers in shared memory (main memory, not GPU memory!), so multiple process could draw on different parts of the screen efficiently, without incurring system calls and context switches.

https://en.wikipedia.org/wiki/SunView

Programmers Reference Manual for the Sun Window System, rev C of 1 November 1983: Page 23, Locking and Clipping:

http://bitsavers.trailing-edge.com/pdf/sun/sunos/1.0/800-109...

But GPUs change the picture entirely. From what I understand by reading the article, GTK uses GL to render in the GPU then copies the pixels into main memory for the compositor to mix with other windows. But in modern GPU-first systems, the compositor is running in the GPU, so there would be no reason to ping-pong the pixels back and forth between CPU and GPU memory after drawing 3D or even 2D graphics with the GPU, even when having different processes draw and render the same pixels.

So I'm afraid Wayland still has a lot of catching up to do, if it still uses a software compositor, and has to copy pixels back from the GPU that it drew with OpenGL. (Which is what I interpret the article as saying.)

More recently (on an archeological time scale, but for many years by now), MacOS, Windows, iOS, and Android have all developed ways of sharing graphics between multiple processes not only in shared main CPU memory, but also on the GPU, which greatly accelerates rendering, and is commonly used by web browsers, real time video playing and processing tools, desktop window managers, and user interface toolkits.

There are various APIs to pass handles to "External" or "IO Surface" shared GPU texture memory around between multiple processes. I've written about those APIs on Hacker News frequently over the years:

https://news.ycombinator.com/item?id=13534298

DonHopkins on Jan 31, 2017, on: Open-sourcing Chrome on iOS

It's my understanding that only embedded WKWebViews are allowed to enable the JIT compiler, but not UIWebViews (or in-process JavaScriptCore engines). WKWebView is an out-of-process web browser that uses IOSurface [1] to project the image into your embedding application and IPC to send messages.

So WKWebView's dynamically generated code is running safely firewalled in a separate address space controlled by Apple and not accessible to your app, while older UIWebViews run in the address space of your application, and aren't allowed to write to code pages, so their JIT compiler is disabled.

Since it's running in another process, WkWebView's JavaScriptEngine lacks the ability to expose your own Objective C classes to JavaScript so they can be called directly [2], but it does include a less efficient way of adding script message handlers that call back to Objective C code via IPC [3].

[1] https://developer.apple.com/reference/iosurface

[2] https://developer.apple.com/reference/javascriptcore/jsexpor...

[3] https://developer.apple.com/reference/webkit/wkusercontentco...

https://news.ycombinator.com/item?id=18763463

DonHopkins on Dec 26, 2018, on: WKWebView, an Electron alternative on macOS/iOS

Yes, it's a mixed bag with some things better and others worse. But having a great JavaScript engine with the JIT enabled is pretty important for many applications. But breaking up the browser into different processes and communicating via messages and sharing textures in GPU memory between processes (IOSurface, GL_TEXTURE_EXTERNAL_OES, etc) is the inextricable direction of progress, what all the browsers are doing now, and why for example Firefox had to make so many old single-process XP-COM xulrunner plug-ins obsolete.

IOSurface:

https://developer.apple.com/documentation/iosurface?language...

https://shapeof.com/archives/2017/12/moving_to_metal_episode...

GL_TEXTURE_EXTERNAL_OES:

https://developer.android.com/reference/android/graphics/Sur...

http://www.felixjones.co.uk/neo%20website/Android_View/

pcwalton on Dec 27, 2018:

Chrome and Firefox with WebRender are going the opposite direction and just putting all their rendering in the chrome process/"GPU process" to begin with.

DonHopkins on Dec 27, 2018:

Yes I know, that's exactly what I meant by "breaking up the browser into different processes". They used to all be in the same process. Now they're in different processes, and communicate via messages and shared GPU memory using platform specific APIs like IOSurface. So it's no longer possible to write an XP/COM plugin for the browser in C++, and call it from the renderer, because it's running in a different process, so you have to send messages and use shared memory instead. But then if the renderer crashes, the entire browser doesn't crash.

https://news.ycombinator.com/item?id=20313751

DonHopkins on June 29, 2019, on: Red Hat Expecting X.org to “Go into Hard Maintenan...

Actually, Electron (and most other web browsers) on the Mac OS/X and iOS use IOSurface to share zero-copy textures in GPU memory between the render and browser processes. Android and Windows (I presume, but don't know name of the API, probably part of DirectX) have similar techniques. It's like shared memory, but for texture memory in the GPU between separate heavy weight processes. Since simply sharing main memory between processes wouldn't be nearly as efficient, requiring frequent uploading and downloading textures to and from the GPU.

Mac OS/X and iOS IOSurface:

https://developer.apple.com/documentation/iosurface?language...

http://neugierig.org/software/chromium/notes/2010/08/mac-acc...

https://github.com/SimHacker/UnityJS/blob/master/notes/IOSur...

Android SurfaceTexture and GL_TEXTURE_EXTERNAL_OES:

https://developer.android.com/reference/android/graphics/Sur...

https://www.khronos.org/registry/OpenGL/extensions/OES/OES_E...

https://docs.google.com/document/d/1J0fkaGS9Gseczw3wJNXvo_r-...

https://github.com/SimHacker/UnityJS/blob/master/notes/ZeroC...

https://github.com/SimHacker/UnityJS/blob/master/notes/Surfa...

https://news.ycombinator.com/item?id=25997356

DonHopkins on Feb 2, 2021, on: VideoLAN is 20 years old today

>Probably do a multi-process media player, like Chrome is doing, with parsers and demuxers in a different process, and different ones for decoders and renderers. Knowing that you probably need to IPC several Gb/s between them. Chrome and other browsers and apps, and drivers like virtual webcams, and libraries like Syphon, can all pass "zero-copy" image buffers around between different processes by sharing buffers in GPU memory (or main memory too of course) and sending IPC messages pointing to the shared buffers.

That's how the browser's web renderer processes efficiently share the rendered images with the web browser user interface process, for example. And how virtual webcam drivers can work so efficiently, too.

Check out iOS/macOS's "IOSurface":

https://developer.apple.com/documentation/iosurface

>IOSurface Share hardware-accelerated buffer data (framebuffers and textures) across multiple processes. Manage image memory more efficiently.

>Overview: The IOSurface framework provides a framebuffer object suitable for sharing across process boundaries. It is commonly used to allow applications to move complex image decompression and draw logic into a separate process to enhance security.

And Android's "SurfaceTexture" and GL_TEXTURE_EXTERNAL_OES:

https://developer.android.com/reference/android/graphics/Sur...

>The image stream may come from either camera preview or video decode. A Surface created from a SurfaceTexture can be used as an output destination for the android.hardware.camera2, MediaCodec, MediaPlayer, and Allocation APIs. When updateTexImage() is called, the contents of the texture object specified when the SurfaceTexture was created are updated to contain the most recent image from the image stream. This may cause some frames of the stream to be skipped.

https://source.android.com/devices/graphics/arch-st

>The main benefit of external textures is their ability to render directly from BufferQueue data. SurfaceTexture instances set the consumer usage flags to GRALLOC_USAGE_HW_TEXTURE when it creates BufferQueue instances for external textures to ensure that the data in the buffer is recognizable by GLES.

And Syphon, which has a rich ecosystem of apps and tools and libraries:

http://syphon.v002.info

>Syphon is an open source Mac OS X technology that allows applications to share frames - full frame rate video or stills - with one another in realtime. Now you can leverage the expressive power of a plethora of tools to mix, mash, edit, sample, texture-map, synthesize, and present your imagery using the best tool for each part of the job. Syphon gives you flexibility to break out of single-app solutions and mix creative applications to suit your needs.

Of course there's a VLC Syphon server:

https://github.com/rsodre/VLCSyphon


I can't respond to everything incorrect in this, because it's way too long to read. But from the very start...

Also, I wrote a significant part of GTK's current OpenGL renderer.

> But GPUs change the picture entirely. From what I understand by reading the article, GTK uses GL to render in the GPU then copies the pixels into main memory for the compositor to mix with other windows.

This is absolutely and completely incorrect. Once we get things into GL, the texture is backed by a DMABUF on Linux. You never read it back into main memory. That would be very, very, very stupid.

> But in modern GPU-first systems, the compositor is running in the GPU, so there would be no reason to ping-pong the pixels back and forth between CPU and GPU memory after drawing 3D or even 2D graphics with the GPU, even when having different processes draw and render the same pixels.

Yes, the compositor is running in the GPU too. So of course we just tell the compositor what the GL texture id is and it composites if it cannot map that texture (again, because it's really a DMABUF) as a toplevel plane for hardware scanout without using 3d capabilities at all.

That doesn't mean unaccelerated. It means it doesn't power up the 3d part of the GPU. It's the fastest way in/out with the least power. You can avoid "compositing" from a compositor too when things are done right.

> So I'm afraid Wayland still has a lot of catching up to do, if it still uses a software compositor, and has to copy pixels back from the GPU that it drew with OpenGL. (Which is what I interpret the article as saying.)

Again, completely wrong.

> Check out iOS/macOS's "IOSurface":

Fun fact, I wrote the macos backend for GTK too. And yes, it uses IOSurface just like DMABUF works on Linux.


I’m not the parent poster, I’m just trying to grab this opportunity that I “met” someone so familiar with GTK :)

Could you please share your opinion on the toolkit, and its relation to others? Also, I heard that there was quite a lot of tech debt in GTK3 and that part of the reason why GTK4 came as a bigger update was to fix it — what would you say, was it successful? Or are there still some legacy decisions that harm the project somewhat?


> Also, I heard that there were quite a lot of tech debt in GTK3 and part of the reason why GTK4 came as a bigger update is to fix those

GTK 3 itself was trying to lose the tech debt of 2.x (which in turn was trying to lose that of 1.x). But they were all still wrapping a fundamentally crap API of X11 for graphics in this century.

GTK 4 changed that, and it now wraps a Wayland model of API. That drastically simplified GDK, which is why I could write a macOS backend in a couple of weeks.

It also completely changed how we draw. We no longer do immediate mode style (in the form of Cairo) and instead do a retained mode of draw commands. That allows for lots of new things you just couldn't do before with the old drawing model. It will also allow us to do a lot more fun things in the future (like threaded/tiled renderers).

The APIs all over the place were simplified and focused. I can't imagine writing an application the size of GNOME Builder again with anything less than GTK 4.

Hell, last GNOME cycle I rewrote Sysprof from scratch in a couple months, and it's become my co-pilot every day.
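
As a small illustration of the retained-mode model (a hedged sketch; MyWidget is a hypothetical widget class):

    /* A widget's snapshot vfunc records render nodes; GSK turns the
     * resulting node tree into GL/Vulkan work later. */
    static void
    my_widget_snapshot (GtkWidget *widget, GtkSnapshot *snapshot)
    {
      GdkRGBA red = { 1, 0, 0, 1 };
      graphene_rect_t bounds = GRAPHENE_RECT_INIT (0, 0,
          gtk_widget_get_width (widget), gtk_widget_get_height (widget));

      gtk_snapshot_append_color (snapshot, &red, &bounds);
    }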


Thanks for the comment and for your work!


Thank you for the correction, it's a relief! That's nice work.

I'm sorry, I misinterpreted the paragraph in the article saying "exports" as meaning that it exports the pixels from GPU memory to CPU memory, not just passing a reference like GL_TEXTURE_EXTERNAL_OES and IOSurface does.

>GTK has already been using dmabufs since 4.0: When composing a frame, GTK translates all the render nodes (typically several for each widget) into GL commands, sends those to the GPU, and mesa then exports the resulting texture as a dmabuf and attaches it to our Wayland surface.

Perhaps I'd have been less confused if it said "passes a reference handle to the resulting texture in GPU memory" instead of "exports the resulting texture", because "exports" sounds expensive to me.

Out of curiosity about the big picture, are dmabufs a Linux thing that's independent of OpenGL, or independent of the device driver, or built on top of GL_TEXTURE_EXTERNAL_OES, or is GL_TEXTURE_EXTERNAL_OES/SurfaceTexture just an Android or OpenGL ES thing that's an alternative to dmabufs in Linux? Do they work without any dependencies on X or Wayland or OpenGL, I hope? (Since pytorch doesn't use OpenGL.)

https://source.android.com/docs/core/graphics/arch-st

One practical non-gui use case I have for passing references to GPU textures between processes on Linux is pytorch: I'd like to be able to decompress video in one process or docker container on a cloud instance with an NVidia accelerator, and then pass zero-copy references to the resulting frames into another process (or even two -- each frame of video needs to be run through two different vision models) in another docker container running pytorch, sharing and multitasking the same GPU, possibly sending handles through a shared local file system or ipc (like how IOSurface uses Mach messages to magically send handles, or using unix domain sockets or ZeroMQ or something like that), but I don't know if it's something that's supported at the Linux operating system level (ubuntu), or if I'd have to drop down to the NVidia driver level to do that.

NVidia has some nice GPU video decompressor libraries, but they don't necessarily play well with pytorch in the same process, so I'd like to run them (or possibly ffmpeg) in a different process, but on the same GPU. Is it even possible, or am I barking up the wrong tree?

It would be ideal if ffmpeg had a built-in "headless" way to perform accelerated video decompression and push out GPU texture handles to other processes somehow, instead of rendering itself or writing pixels to files or touching CPU memory.


> Out of curiosity about the big picture, are dmabufs a Linux thing that's independent of OpenGL, or independent of the device driver,

They are independent of the graphics subsystem altogether (although that is where they got their start, afaik). Your webcam also uses DMABUF. So if you want to display your webcam from a GTK 4 application, this GtkGraphicsOffload will help you take that DMABUF from your camera (which may not be mappable into CPU memory, but can be passed via DMA to your GPU) and display it. It could either be composited on the GPU, or mapped directly to scanout if the right conditions are met.
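
A hedged sketch of the application side of that (video_paintable and window are placeholders for a GdkPaintable wrapping the camera stream and a GtkWindow; API names per the GTK 4.14 announcement, details may differ in the final release):

    /* Wrap the child in GtkGraphicsOffload so GTK may attach its content
     * to a subsurface instead of drawing it through GSK itself. */
    GtkWidget *picture = gtk_picture_new_for_paintable (GDK_PAINTABLE (video_paintable));
    GtkWidget *offload = gtk_graphics_offload_new (picture);
    gtk_window_set_child (GTK_WINDOW (window), offload);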

I wrote a library recently (libmks) and found the culprits in Qemu/VirGL/virtio_gpu that were preventing passing a DMABUF from inside a guest VM to the host. That stuff is all fixed now, so theoretically you could even have a webcam in a VM which then uses a GTK 4 application to render with VirGL, and have the compositor submit the scene to the host OS, which itself can set the planes correctly to get the same performance as if it were running on the host OS.

> I'd like to be able to decompress video in one process or docker container on a cloud instance with an NVidia accelerator, and then pass zero-copy references

If you want this stuff with NVidia, and you're a customer, I highly suggest you tell your NVidia representative this. Getting them to use DMABUF in a fashion that can be used from other sub-systems would be fantastic.

But at its core, if you were using Mesa and open drivers for some particular piece of hardware, yes, it's capable of working given the right conditions.


> From what I understand by reading the article, GTK uses GL to render in the GPU then copies the pixels into main memory for the compositor to mix with other windows.

This seems very strange to me. It’s how things would work with wl_shm, which is the baseline pixel-pushing interface in Wayland, but AFAIU Gtk uses EGL / Mesa, which in turn uses Linux dmabufs, which is how you do hardware-accelerated rendering / DRI on Linux today in general.

However, how precisely Linux dmabufs work in a DRI context is not clear to me, because the documentation is lacking, to say the least. It seems that you can ask to map dmabufs into memory, and you can create EGLSurfaces from them, but are they always mapped into CPU memory (if only kernel-side), or can they be bare GPU memory handles until the user asks to map them?

I’d hope for the latter, and if so, the only thing the work discussed in the article avoids is extra GPU-side blits (video decoding buffer to window buffer to screen), which is non-negligible but not necessarily the end of the world.


I misunderstood the article saying "exports the resulting texture" as meaning it exported it to CPU memory, but audidude explained how it actually works.

I believe GL_TEXTURE_EXTERNAL_OES is an Android-only OpenGL extension that takes the place of some uses of DMABUF but is not as flexible and general.

ChatGPT seems to know more about them, but I can't guarantee how accurate and up-to-date it is:

https://chat.openai.com/share/abff036b-3020-4093-a13b-86cbf0...

The tricky bit may be teaching pytorch to accept dmabuf handles and read and write dmabuf GPU buffers. (And ffmpeg too!)


Linux on ARM SoCs with HW video decoders that are separate from the GPU can use the V4L2 API to avoid some copying. The decoder writes a frame to a buffer that the GPU can see, then you use GL to get the GPU to merge it into the framebuffer.
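
A hedged sketch of that path, assuming a V4L2 device that supports VIDIOC_EXPBUF and an EGL stack with EGL_EXT_image_dma_buf_import; video_fd, buf_index, width, height, stride, egl_display and tex come from earlier setup, the fourcc is illustrative, and the extension entry points must be loaded via eglGetProcAddress:

    /* 1. Export the decoded frame's buffer as a dmabuf fd. */
    struct v4l2_exportbuffer exp = {
        .type  = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .index = buf_index,
        .flags = O_CLOEXEC,
    };
    ioctl(video_fd, VIDIOC_EXPBUF, &exp);

    /* 2. Import the dmabuf into GL as an external texture, zero-copy. */
    EGLint attrs[] = {
        EGL_WIDTH,                     width,
        EGL_HEIGHT,                    height,
        EGL_LINUX_DRM_FOURCC_EXT,      DRM_FORMAT_XRGB8888,
        EGL_DMA_BUF_PLANE0_FD_EXT,     exp.fd,
        EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
        EGL_DMA_BUF_PLANE0_PITCH_EXT,  stride,
        EGL_NONE,
    };
    EGLImageKHR image = eglCreateImageKHR(egl_display, EGL_NO_CONTEXT,
                                          EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

    glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
    glEGLImageTargetTexture2DOES(GL_TEXTURE_EXTERNAL_OES, image);
    /* The GPU can now sample the decoder's output without a CPU copy. */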


By going Wayland-only, GTK is becoming less of a GUI toolkit and more of an extremely specific and arbitrary lib for the GNOME desktop environment.


> By going wayland only

You mean this one very small piece? That seems a bit hyperbolic


Everyone is going Wayland, not just GNOME; X is obsolete.


Yet nothing works on Wayland. This is the kind of stuff that boils my blood as a Linux user. In my Arch install I do not intend to uninstall X for the next decade or so. Every time I tried switching to Wayland it was a circus of dozens of bugs, and workflows I use every day for work being "not supported".


I realize it's probably a waste to say this to someone with your username, but getting angry at the situation is futile. You shouldn't use Linux if you're not used to random stuff changing and breaking by now and you're not comfortable adapting to those changes. Doubly so for a rolling release distro like Arch. X was obsolete and a security disaster last decade; holding onto it for another decade is just masochism. If this is all too much trouble for you to run a Unix-like desktop and keep it updated, there's always macOS. They never even made the initial mistake of using X.


I'm using arch with Wayland and Sway and don't see any major issues. The only time I use X is via XWayland for games.


And the handful of incompatible Wayland implementations are feature-incomplete. You still can't do keyboard/mouse sharing under any of them. That's just one of innumerable things you can't do.


I assume that will happen over time, won’t it?


No, it won't; the Wayland protocol doesn't support it, and besides, Wayland has been here for such a long time already. Pretending that Wayland is ready to replace X is like pretending we can ban cars tomorrow and bikes will be good enough. It's completely delusional.


But if Wayland is a new step of development beyond X, then these features could be implemented differently, I believe. Maybe for security reasons, or simplicity, or other reasons. And if someone misses X so much, they could still use it, or even maintain it if they can. I never used X much, as I completely switched to Linux just a few years ago. So I don't miss a thing.


Keyboard/mouse sharing is completely unrelated to the Wayland protocol. Wayland is only concerned with sending input events to client windows. Generating and capturing global events is out-of-scope and it's an entirely different API. The way this works in X11 is a giant hack that requires multiple extensions and the end result is it compromises all security of those devices. It's even more delusional to pretend this was ever production-ready or that Wayland needs to be ready for anything here. The X11 implementation just shouldn't have been shipped at all.


You keep saying X11 is insecure but I've never had a problem with that in the last 20 years. I've never known anyone with an X11 security problem in the last 20 years. I've never heard of anyone having an X11 security problem in the last 20 years. Perhaps you can point me to an incident? The idea that it's "insecure" to let applications on your computer access the inputs of other programs comes from smartphone space where you don't actually control your computer or the software on it and that becomes a problem. But for actual computers you control it just isn't (a problem).

Wayland for "security" is cargo culting smartphone user problems. It's not actually a real issue.

I use the keyboard/mouse sharing in X11 (via synergy) and I have for 20 years. It is vitally important to my workflow. It works on dozens of different OSes including linux. But not the waylands linuxes. Any graphical environment that can't do this is useless to me. Might as well not even release the waylands at all (see how silly applying my personal preferences globally seems?).


>I've never heard of anyone having an X11 security problem in the last 20 years.

Here's 6 CVEs just from last month. Check the mailing lists and you'll see many of these going back for years and years.

https://lists.x.org/archives/xorg/2023-October/061506.html

https://lists.x.org/archives/xorg/2023-October/061514.html

And before you say this is not what you meant, the X server and X client libraries do very little anymore besides parsing untrusted input and passing it somewhere else. That's its main purpose and it's completely bad at it. And because it's X, this input can also come from over the network too so every normal memory bug can also be an RCE. This is probably the single biggest attack vector on a desktop system aside from the toolkit. It's the exact wrong thing for anyone to grant access to every input on the system.

This is not just my personal opinion or me giving anecdotes either, this is paraphrasing what I've heard X developers say after many years of patching these bugs. But that's not even the whole problem as I'll explain shortly.

>But for actual computers you control it just isn't (a problem). Wayland for "security" is cargo culting smartphone user problems. It's not actually a real issue.

Yes it is a problem and no it's not cargo culting. Practically speaking the X11 security model means every X client gets access to everything including all your passwords (and the root password) as you type them, and subsequently lets every X client spawn arbitrary root processes and get access to your whole filesystem including your private keys and insert kernel modules or do whatever. If you actually think this "isn't a real issue" then you should just stop using passwords, stop protecting your private keys, run every program as root, and disable memory protection: because that's what this actually means in practice. No I'm not exaggerating. The security model of X11 has no idea about client boundaries at all. This is completely unacceptable on any other OS but for some reason it's become a meme to say that only smartphones need to care about this. Really? Come on.

>I use the keyboard/mouse sharing in X11 (via synergy) and I have for 20 years. It is vitally important to my workflow. It works on dozens of different OSes including linux. But not the waylands linuxes. Any graphical environment that can't do this is useless to me.

X11 can't do it securely so I would say that's as useless as not implementing the feature, if you have to compromise your security in order to get it.

The feature will be implemented in Wayland eventually when the design for a secure API is finished. There are people working on it now. In comparison, X11 is probably never going to gain a secure way to do that.


Uh... don't expose your X.org server to the internet naked. I thought this was obvious. All the problems you mentioned go away. Who exposes X to the net anyway? That's not something a normal desktop install does.

It is cargo culting. It's not actually a problem that my applications are powerful and can do what I want them to do. It is a problem that other locked down OSes like Macs and smartphone systems are not in the user's control and programs cannot do many things by design. This is because on those systems the users are not in control of what is running and the OS makers believe they know better. If they can't do it it is useless (no qualification re: fantasy security issues needed).

... sharing keyboard/mouse with synergy/barrier/etc is secure.


>Uh... don't expose your X.org server to the internet naked.

This is not something the X maintainers can say. They can encourage people not to do it, but if they stop maintaining that feature then the complaints start to roll in, because someone somewhere was using it. If you think this situation is awful then yes, you're starting to get it: X is in a bad spot where these broken, insecure features are holding everything else back, and will continue to do so as long as people depend on it. At best they can disable it by default and make it hard to accidentally re-enable, which is what they've already done.

>That's not something a normal desktop install does.

Yes, most normal desktop installs don't use X11 in any capacity. They use Microsoft Windows.

>It's not actually a problem that my applications are powerful and can do what I want them to do.

I notice you didn't actually respond to my comment about stopping using passwords and private keys and running everything as root. Because I'd bet even you draw a line somewhere, in a place where you think it's a risk to give an application too much power.

>It is a problem that other locked down OSes like Macs and smartphone systems are not in the user's control and programs cannot do many things by design.

This has absolutely nothing to do with it, on Linux or even on those systems. It's not actually a problem there. If you have root on the system then you are in control and can do whatever you want anyway. The point of setting security boundaries and not running everything as root is that not everything needs to access everything else all the time. The security model you're suggesting became obsolete by the mid-1990s.

And let me say this again so it's perfectly clear. When you use X11 there is effectively no security boundary between any X11 clients. So if you start up your root terminal or you use sudo or anything else like that, then any other X11 client on the system also gets root. This is unacceptable and I can't believe I still have to continually point this out to long time Linux users that should be technical enough to understand. It doesn't even matter if you personally think it's fine to run everything as root: maybe you do. But as a user you should have enough understanding of the system to know that this absolutely is not ok for lots of other users and it's simply not appropriate to be shipped as the default in the year 2023.

These are not fantasy issues, these are actual issues that the underlying system was purposely designed to fix. X11 pokes a huge gaping hole in it.

>sharing keyboard/mouse with synergy/barrier/etc is secure.

No. On a typical X11 install it's not, because it relies on insecure APIs.


Gtk is not going wayland only.


https://www.phoronix.com/news/GTK5-Might-Drop-X11 "Red Hat's Matthias Clasen opened an issue on GTK entitled "Consider dropping the X11 backend"" https://gitlab.gnome.org/GNOME/gtk/-/issues/5004


This sounds so overly complicated considering that you can do all this in HTML without much effort.


I wonder: why don't they just implement GTK with HTML, then?

Probably because HTML is at the very top of the stack, while this is much lower... Without everything below it on the stack, HTML is just a text file.


You're reading into it too much.

All I meant was: if the API of HTML is simple, then why does GTK's API have to be that complicated.


This is literally an implementation detail; the API part is a single flag you can set if you want your subsurface to potentially make use of this feature.



