Because it’s very easy for the CPU-GPU interface to become the bottleneck: state-of-the-art resolutions require shoving around literal gigabytes per second to get a standard framerate (seriously, make a rough estimate of how much data your 300 ppi phone or tablet is pumping over the bus to the screen, it’s chilling), and the less of that you do, the better. (See a HN comment about Audacity slowing down in recent years [as screen resolution increases]: https://news.ycombinator.com/item?id=26498649.) Getting a not-quite-SIMD processing unit specializing in throughput above all else to do your heavy lifting is a bonus.
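To make that estimate concrete, here is the back-of-the-envelope version; the panel size, bit depth and refresh rate below are assumptions, not measurements of any particular device:

    # Rough display-bandwidth estimate for an assumed ~300 ppi phone panel.
    width, height = 1440, 3200      # assumed panel resolution
    bytes_per_pixel = 4             # 32-bit RGBA
    refresh_hz = 60

    frame_bytes = width * height * bytes_per_pixel
    per_second = frame_bytes * refresh_hz
    print(f"{frame_bytes / 2**20:.1f} MiB per frame, "
          f"{per_second / 2**30:.2f} GiB/s if every frame is pushed in full")
    # ~17.6 MiB per frame, ~1.03 GiB/s - hence the incentive to redraw and
    # transfer as little as possible.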
I suspect the underlying question is “why a supposedly-3D-optimized GPU for a 2D task?” While it’s true that modern GPUs are 3D-optimized, that’s AFAICS because the decade-long lull of 1MP-or-so screens and ever-more-powerful desktop CPUs ca. 2000–2010(?) made CPU 2D rendering mostly “fast enough”, so as the programmable 3D pipeline emerged, the 2D accelerator from the workstation era died out, leaving perhaps only an optimized blit behind.
Compute- and power-constrained handheld devices and higher-resolution screens made developers (software and hardware) wake up in a hurry, but now a redraw-everything-every-frame, 3D-adapted graphics facility is what you have, so a redraw-everything-every-frame, 3D-adapted graphics facility is what you shall use. The much more parallel and power-efficient processor and wider bus are still easily worth it, if you spend the effort to wrangle them.
(It’s interesting to think what 2D-optimized hardware would look like. Do people know how to do analytic or even just AGG-quality 2D rasterization on a GPU? Or anything but simple oversampling.)
Not a graphics programmer, treat with a measure of skepticism.
I have been using computers on a daily basis since the mid 80's. I have seen many applications that stretched or exceeded the capabilities of the machine I was running them on, resulting in sub-par performance that left me wishing for a hardware upgrade to make using the application a more pleasant experience.
None of them were text editors or terminal emulators.
Notably, there's something like an order of magnitude or more difference in performance between terminals that people are perfectly happy to use. (EDIT: To quantify this: on my system, catting the same file to a window taking up about half my screen took an average of 1.443s with st, 1.160s with xterm, 0.165s with rxvt, and 0.404s with Kitty - all averaged over 3 runs.)
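For anyone who wants to reproduce that kind of comparison, a rough harness along these lines works; the flags and file name are assumptions (check your emulators' man pages), and each run includes terminal startup, so use a file large enough that the cat dominates:

    # Rough timing harness; `big.log` is a hypothetical large file.
    import statistics, subprocess, time

    PAYLOAD = ["sh", "-c", "cat big.log"]
    TERMINALS = {
        "xterm": ["xterm", "-e"] + PAYLOAD,
        "urxvt": ["urxvt", "-e"] + PAYLOAD,
        "st":    ["st",    "-e"] + PAYLOAD,
        "kitty": ["kitty"] + PAYLOAD,        # kitty takes the command as args
    }

    for name, cmd in TERMINALS.items():
        runs = []
        for _ in range(3):
            start = time.monotonic()
            subprocess.run(cmd, check=False)  # window closes when cat exits
            runs.append(time.monotonic() - start)
        print(f"{name}: {statistics.mean(runs):.3f}s averaged over {len(runs)} runs")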
Very few non-GPU-accelerated terminals get anywhere near maximising performance, and they're still fast enough that most people don't even notice, because most applications, including most editors, once even remotely optimized themselves, don't tend to push even a non-accelerated terminal very hard.
Put another way: a typical test case of terminal performance tends to be spewing log output or similar at a rate where the obvious fix is to simply decouple the text buffer from the render loop, because nobody can read it all anyway (but really, a lot of the developers of these terminals should start by looking at rxvt and at least get to a decent approximation of it first before they start doing those kinds of shortcuts).
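A minimal sketch of that decoupling (names purely illustrative): one thread appends incoming text to the buffer as fast as it arrives, while the render loop repaints at a fixed rate from whatever the buffer holds at that moment, so a flood of output never forces one redraw per line.

    # Decouple the text buffer from the render loop: input is consumed at
    # full speed, the screen is repainted at most `fps` times per second.
    import sys, threading, time

    lines = []                  # the text buffer (shared state)
    lock = threading.Lock()

    def render_loop(fps=60, visible=40):
        """Repaint at a fixed rate, drawing only the last `visible` lines."""
        while True:
            time.sleep(1 / fps)
            with lock:
                frame = lines[-visible:]
                total = len(lines)
            # A real terminal would rasterize `frame` here; we just report it.
            print(f"\r{total} lines buffered, {len(frame)} visible",
                  end="", file=sys.stderr)

    threading.Thread(target=render_loop, daemon=True).start()
    for line in sys.stdin:      # consume input as fast as it arrives
        with lock:
            lines.append(line.rstrip("\n"))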
Fast enough terminal rendering was solved by the mid 1980s, and performance increases have outpaced resolution increases by a substantial factor.
The bulk of the 3D work is still 2D, filling all the pixels, rasterizing 2D triangles after they’ve been transformed, projected to 2D, and clipped. Shaders might have lots of 3D math, but that’s really just pure math and no different from 2D math as far as the GPU is concerned.
> Do people know how to do analytic or even just AGG-quality 2D rasterization on a GPU? Or anything but simple oversampling.
Yes, I think so - if you mean quads & triangles.
The problem, of course, is that analytic filtering (even in 2D) usually isn’t worth the cost, and that oversampling is cheap, effective, and high enough quality for most tasks. (Still, adaptive oversampling and things like DLSS are popular ways to reduce the costs of oversampling.) The path rendering people do think carefully about 2D path rendering though:
And don’t forget that mipmapped texture sampling, even in 2D, is better than oversampling.
A very common shader trick for 2D antialiasing that’s better than oversampling (not analytic, but sometimes damn close) is to use the pixel derivatives to compute the edges of a mask that blends from opaque to transparent over ~1 pixel. https://www.shadertoy.com/view/4ssSRl
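If you want to play with the idea outside a shader, here’s a rough CPU translation (numpy; the circle is just an assumed test shape) that approximates GLSL’s fwidth() with local gradients:

    # Blend coverage from opaque to transparent over ~1 pixel, using how
    # fast a signed distance field changes per pixel (what fwidth() gives
    # you in GLSL).
    import numpy as np

    W = H = 256
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    d = np.hypot(xs - W / 2, ys - H / 2) - 80.0   # signed distance to a circle

    dy, dx = np.gradient(d)                       # per-pixel rate of change
    width = np.abs(dx) + np.abs(dy)               # ~ fwidth(d)
    coverage = np.clip(0.5 - d / np.maximum(width, 1e-6), 0.0, 1.0)
    # `coverage` is 1 inside the shape, 0 outside, with a ~1-pixel soft edge.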
> Do people know how to do analytic or even just AGG-quality 2D rasterization on a GPU?
I suppose the sensible answer is "do it in a compute shader if you care about pixel-perfect accuracy". Which you arguably should for 2D stuff, given that the overhead is low enough.
Sure, but did that "2D" orientation boil down to anything beyond accelerated blitting and perhaps a few geometry-rendering primitives? Part of the problem is also that there never was a standard feature set and API for 2D acceleration comparable to OpenGL or Vulkan. So support for it was highly hardware-dependent and liable to general bit rot.
Yes, there were other hardware acceleration tricks implemented. For example, hardware sprites allowed for bitmaps that were composed in real time for display rather than blitted to the framebuffer. This trick is still used for the mouse cursor. There was hardware scrolling which meant you could move the whole display and only render the edge that came into view. Both of these together is how platformer games were implemented on the SNES and similar hardware of that generation, and it’s why the gameplay was so smooth. The NES could either do hardware sprites or hardware scrolling, but not both IIRC, which is why the game world freezes when Link reaches the edge of the screen and the new screen comes into view in the original Zelda.
There were other hardware accelerations my memory is somewhat more vague on. I remember there was some color palette translation hardware, and hardware dithering.
There wasn’t any standard cross platform graphics api back then, but that’s more a statement about the era. Everyone wrote directly to the metal.
> For example, hardware sprites allowed for bitmaps that were composed in real time for display rather than blitted to the framebuffer.
You can do all these things with compositing, though. A "sprite" is just a very limited hardware surface where compositing happens at scanout, and some modern GPUs have those too.
Why does it need “evidence”? GPUs are fast and good at displaying pixels. Moving rendering to the GPU lets the CPU focus on things it’s good at rather than bog it down with a high bandwidth task that has to go to the GPU anyway. Lots of editors have been getting GPU acceleration (I use Sublime and Visual Studio among others, both have hardware acceleration). All major browsers and operating systems support hardware accelerated drawing. Video games of course… I think the question is why wouldn’t you use the GPU for rendering these days? It’s a pixel processor that nearly everyone has in their machine. What evidence is there to support the decision to render pixels on a CPU rather than use the rendering co-processor?
I’m very confused by that comment, what do you mean snake oil? Are you saying GPUs don’t do anything? Isn’t the XV you’re referring to in the same category as today’s GPU wrt parent’s comment? @cosmotic asked for evidence why not to use the CPU, and XV is not implemented on the CPU.
Also, it looks like XV did video resizing and some color mapping (https://en.wikipedia.org/wiki/X_video_extension). Today’s GPUs are doing the rendering. Filling the display buffer using primitives and textures, and outputting the display buffer, are different activities that are both done by the GPU today, but it sounds like XV didn’t do the first part at all, and that first part is what we’re talking about here.
If GPUs are good at displaying pixels, the benchmarks and evidence should be easy to come by. As I mentioned in another comment, I use iTerm2 and its GPU acceleration has no visual impact but makes CPU usage much higher while sitting idle. Turning it off is a huge improvement.
Evidence is super easy to come by, but the question does need to be specific and well formed (what kind of rendering, exactly, are we comparing, how many pixels, what’s the bottleneck, etc.). There are loads and loads of benchmarks demonstrating GPUs are faster than CPUs at turning triangles into pixels. Not just a little faster, the numbers are usually in the ~100x range. There’s literally zero contention on this point, nobody is questioning whether simple rendering cases might be faster. Nobody is playing Fortnite with CPU rendering. Because this question is so well settled, GPU rendering is so ubiquitous that it’s even hard to test CPU software rendering.
There are certain kinds of rendering and corner cases where CPUs can have an edge, but those happen in areas like scientific visualization or high end VFX, they don’t come up often in text editor rendering engines.
You can’t use your anecdote of 1 app that might have a broken implementation to question GPUs categorically. I’ve never noticed iTerm2 using significant CPU. My iTerm2 sits at 0.0% CPU while idle. Maybe your install is busted?
perhaps the answer can be found in the 'motivation' section of the linked website:

"The base motivation was just simply that i wanted to have a look into OpenGL and doing GPU accelerated things, i did not plan to create a text editor from the get go. After starting to experiment a bit i decided to call this a small side Project and implement a proper editor."
Just one more answer - why not? It seems to me that GPUs are heavily underutilized most of the time a PC is running. They kick in only in a few rare applications like gaming/rendering/video decoding and run at a literal 0% utilization the rest of the time (at least according to what the Windows task manager tells me).
On the other hand, the CPU is often busy. Particularly for a text editor, there's a good chance the user is running all sorts of CPU-heavy stuff like code analysis tools, databases, compile loops etc.
That depends on how much of the effort requires CPU work in order to transfer data to the GPU relative to the work that will actually be done on the GPU. For text rendering it's possible but not a given you'll save all that much, depending on how fast your GPU is, and how big you're rendering the glyphs, and whether or not you're applying any effects to them.
For most text-heavy apps there's little reason for the text-rendering to be a bottleneck either way.
While you’re right that it does depend, the imbalance is so large in practice that it’s extremely difficult to find a workload where CPU wins, especially when you’re talking about 2d screen rendering. The bottlenecks are: per-pixel compute multiplied by the number of pixels, and bandwidth of transferring data to the GPU (must include this when doing “CPU” rendering).
What does CPU rendering even mean? Is it sending per-pixel instructions to the GPU, or rendering a framebuffer in CPU RAM and then transferring that to the GPU, or something else? Pure software would be the latter (save pixels to a block of RAM, then transfer to the GPU), but the line today isn’t very clear, because most CPU renderers aren’t transferring pixels, they’re calling OS level drawing routines that are turned into GPU commands. Most of the time, CPU rendering is mostly GPU rendering anyway.
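As a deliberately simplified sketch of the purely-software variant (the resolution and the rectangle are just assumptions): the application fills a block of ordinary RAM with pixels and only then hands the finished buffer to the window system / GPU.

    # "Pure software" rendering: pixels go into plain CPU RAM first.
    W, H, BPP = 3840, 2160, 3        # assumed 4K target, 24-bit color
    fb = bytearray(W * H * BPP)      # ~25 MB of RAM per frame

    def fill_rect(x0, y0, w, h, rgb):
        """Write an opaque rectangle straight into the CPU-side framebuffer."""
        for y in range(y0, y0 + h):
            row = (y * W + x0) * BPP
            fb[row:row + w * BPP] = bytes(rgb) * w

    fill_rect(100, 100, 640, 32, (230, 230, 230))   # e.g. one line's background
    # The finished frame is then copied to the GPU/compositor, which is where
    # the bus traffic in the 4K numbers below comes from.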
> there’s little reason for the text-rendering to be a bottleneck either way.
Think about 4K screens, which is up to 8M pixels to render. Suppose you’re scrolling and want 60fps. If you wanted to render this to your own framebuffer in RAM, in 24-bit color, then your bandwidth requirement is 8M * 3 bytes * 60 frames = 1.4GB/s. It’s also 8M * 60 pixel rendering operations per second. Even if the per-pixel rendering were a single CPU instruction (it’s more, probably a lot more, but suppose anyway), the CPU load is 0.5 billion instructions per second, which is a heavy load even without any cache misses. Most likely, this amount of rendering would consume 100% of a CPU core at 4K.
Anyway, there’s no reason to have the CPU do all this work and to fill the bus with pixels when we have hardware for it.
> While you’re right that it does depend, the imbalance is so large in practice that it’s extremely difficult to find a workload where CPU wins, especially when you’re talking about 2d screen rendering. The bottlenecks are: per-pixel compute multiplied by the number of pixels, and bandwidth of transferring data to the GPU (must include this when doing “CPU” rendering).
Text rendering is pretty much the best case for the CPU in this respect, because rendering inline to a buffer while processing the text tends to be pretty efficient.
> they’re calling OS level drawing routines that are turned into GPU commands. Most of the time, CPU rendering is mostly GPU rendering anyway.
This is true, and is another reason why it's largely pointless for text-heavy applications to actually use OpenGL etc. directly. But what I've said also applies to purely software-rendered systems. I've written terminal code with control of the rendering path for systems ranging from 1980s hardware via late-90s embedded platforms to modern hardware - there's little practical difference; from the mid 1980s onwards, CPUs have been fast enough to software-render terminals to raw pixels just fine, and typical CPU performance has increased faster than the number of pixels a typical terminal emulator needs to push.
>> > there’s little reason for the text-rendering to be a bottleneck either way.
> Think about 4K screens, which is up to 8M pixels to render.
I stand by what I said. I'd recommend trying to benchmark some terminal applications - only a handful of the very fastest ones spend anywhere near a majority of their time on rendering, even when doing very suboptimal rendering via X11 calls rather than rendering client-side to a buffer. Of the ones that do the rendering efficiently, the only case where this becomes an issue is if you accidentally cat a massive file, and only then if the terminal doesn't throttle or decouple its rendering.
> Most likely, this amount of rendering would consume 100% of a CPU core at 4K.
So let it. You're describing an extreme fringe case of shuffling near-full lines as fast as you can, where you can still trivially save CPU if you care about it by throttling the rendering and allowing it to scroll more than one line at a time, which is worth doing anyway to perform well on systems under load, systems without a decent GPU, or just slow systems (doing it is trivial: decouple the text buffer update from the render loop, and run them async - this is a ~30-40 year old optimisation).
Yes, it looks ugly when you happen to spit hundreds of MB of near-full lines of text to a terminal when the system is under load. If you care about that, by all means care about GPU rendering, but in those situations about the only thing people tend to care about is how quickly ctrl-c makes it stop.
Try tracing what some terminal emulators or editors are actually doing during typical use. It's quite illuminating both about how poorly optimised most terminals (and text editors) are, and how little of their time they tend to spend rendering.
> Anyway, there’s no reason to have the CPU do all this work and to fill the bus with pixels when we have hardware for it.
A reason is simplicity, and performing well on systems with slow GPUs without spinning up fans all over the place, and that the rendering is rarely the bottleneck, so spending effort accelerating the rendering vs. spending effort speeding up other aspects is often wasted. Especially because, as you point out, many systems optimise the OS-provided rendering options anyway. Case in point: on my system rxvt is twice as fast as Kitty. The latter is "GPU accelerated" and spins up my laptop fans with anything resembling high throughput. Rxvt might well trigger GPU use in Xorg - I haven't checked - but what I do know is it doesn't cause my fans to spin up, and it's far faster despite repeated X11 requests. A whole lot of the obsession with GPU acceleration in terminal emulators and text editors is not driven by benchmarking against the fastest alternatives, but cargo-culting.
> A whole lot of the obsession with GPU acceleration in terminal emulators and text editors is not driven by benchmarking against the fastest alternatives, but cargo-culting.
It seems strange to demand evidence and benchmarking but end with pure unsupported hyperbolic opinion. If you can’t explain why a huge number of major text editors are now starting to support GPU rendering directly, nor why the OS has already for a long time been using the GPU instead of the CPU for rendering, then it seems like you might simply be ignorant of their reasons, unaware of the measurements that demonstrate the benefits.
I really don’t even know what we’re talking about anymore exactly, since you moved the goal posts. I don’t disagree that most of the time text rendering isn’t consuming a ton of resources, until it does. I don’t disagree that many editors aren’t exactly optimized. None of that changes the benefits of offloading framebuffer work to the GPU. The ship has sailed already; it’s too late to quibble. I can’t speak to rxvt vs Kitty (you really need to do a comparison of features, design decisions, team sizes and budgets before assuming your cherry-picked apps reflect on the hardware in any way), but by and large everyone already moved to the GPU and the justifications are pretty solidly explained and understood; evidence for these decisions abounds if you care to look for it.
I have not demanded evidence. I have suggested that you try benchmarking and measuring this for yourself, because it would demonstrate very clearly the points I've been making. I don't need evidence from you, because I've spent a huge amount of time actually testing and profiling more terminals and editors than most people can name.
> If you can’t explain why a huge number of major text editors are now starting to support GPU rendering directly
I did give a reason: it's largely cargo-culting. Some have other reasons. E.g. using shaders to offer fancy effects is a perfectly valid reason if you want those effects. Most of these applications do not take advantage of that. Some, like the linked one, are learning experiences, and that's also a perfectly fine justification. Very few have put any effort into actually benchmarking their code against a terminal which is actually fast - most of the performance claims I've seen come from measuring against slow alternatives.
Hence the "cherry picking" of rxvt: the point is to illustrate that when someone can't even match rxvt, which is open source and so easy to copy and at least match in performance, maybe they should actually figure out why rxvt is fast first before worrying about the specific low-level rendering choice.
One of the things people would learn from that is that the rendering choice is rarely where the bottlenecks arise.
> then it seems like you might simply be ignorant of their reasons, unaware of the measurements that demonstrate the benefits.
I've done enough code reviews and enough measurements and benchmarking of terminals and text rendering code over the years that I'm quite content in my knowledge that I know the performance characteristics of this better than most of those who have implemented the large number of poorly performing terminals over the years. That's not a criticism of most of them - performance of text rendering is simply such a minor aspect of a terminal, because most of them are fast enough, that it's rarely a priority. What this has taught me is that the rendering itself isn't typically the bottleneck. It hasn't been the bottleneck for most such projects since the 1980s. You can continue to disbelieve that all you want. Or you can try actually looking at some of this code yourself and profiling it, and you'll see the same thing. Your choice.
> I really don’t even know what we’re talking about anymore exactly, since you moved the goal posts.
I did no such thing. In the comment I made that you replied to first I made two claims:
1. that it is not a given GPU use will be faster.
2. that for most text-heavy apps there's little reason for the text-rendering to be a bottleneck either way.
I stand by those.
> I don’t disagree that most of the time text rendering isn’t consuming a ton of resources, until it does.
My point is that the "until it does" is largely irrelevant, as even the most pathological case for a terminal is easily accommodated and in practice almost never occurs. And as you pointed out a lot of OS text rendering support is accelerated anyway (or will get there), which makes this largely moot anyway by making it a system concern rather than something an application ought to put effort into dealing with.
> I don’t disagree that many editors aren’t exactly optimized. None of that changes the benefits of offloading framebuffer work to the GPU.
The point is that the failure of prominent "accelerated" terminals to even match rxvt is a demonstration that spending the effort to write code that is far less portable is a poor tradeoff, and it tends to show that people writing these terminals rarely understand where the actual bottlenecks are.
> but by and large everyone already moved to the GPU
This is only true to the extent you're calling running on top of OS API's that have been accelerated as "moving to the GPU". The vast majority of terminal emulators and editors are still relying on generic APIs that may or may not be accelerated depending on where they're running. Only a small minority are targeting GPU specific APIs. In this context when people are talking about GPU accelerated editors or terminals, they're talking about the latter.
Standard frameworks give you text frames with minimal hooks to customize them and "good enough" performance. They're great for dumping a log file to or for small documents, but when you want the fine control of a text editor that can handle big files, you need to roll your own.
GPU acceleration of the text editing viewport, which will not show much more than 100 lines of text at any time, has almost nothing to do with the issues in editing large files.
I suppose "good performance" is subjective. I run VS Code on a 4k (well, 3840-by-2160) monitor attached to a ThinkPad T440, and the performance is acceptable. The CPU does get busier, but not by much.
Though, the biggest limitation is that the framerate is only 30Hz since the HDMI version is older on that laptop, so maybe that has something to do with it as well.
VS Code certainly uses GPU accelerated rendering, it's an Electron app after all. Though yes only needing to hit 30 FPS to not be the bottleneck in your case does make things a bit moot.
You could get good performance with CPU rendering on machines several orders of magnitude slower than what we have today, driving screens that certainly did not have as many orders of magnitude fewer pixels.
It's the right choice. Otherwise you're wasting power for no reason. The only reason not to use the GPU is driver bugs, usually on Linux, that cause it to use more energy than the CPU.
Learning, as it reads on the page. GPU programming is hard, and splitting work between CPU and GPU is too, at least if you try to make the program go faster.