Maybe it’s time to talk about a new Linux Display Driver Model (yosoygames.com.ar)
224 points by epsylon 896 days ago | 65 comments

Yea, that's cute, except that DRM can handle pretty much all of it. You can split hairs and complain about GPU scheduling which is inherently rather difficult because scheduling at the command queue level is a problem very much like the halting problem. The real issue isn't that we don't have the pieces we need but rather that we can't get all the players to agree on using the same ones. On Windows you have one entity (Microsoft) that can post WLK and unless you pass it you won't be certified and on GNU/Linux "a working driver" can be anything from "not catching on fire on boot" through "actually brings up display" to "oh, hey a textured triangle!". And I get it, everyone is frustrated because ultimately displaying a bunch of pixels seems trivial, that is, until you mix in politics. You have NVIDIA, AMD, Intel and the community at large all pulling in different directions. With GNU/Linux graphics support having marginal effect on the bottom line there's little incentive to deal with it. And you'd still miss a controlling entity that could validate that "works on Linux" means anything but "compiles with some random kernel release".

Everyone who thinks that writing great graphics drivers can be a spare time activity is delusional. The fact that we have Android with Gralloc (which in comparison to DRM is, well, a joke), Ubuntu with Mir, others trying out Wayland and folks still stuck on X11 makes this all so much more complicated than it needs to be (and SteamOS is rather terrible in this regard too, which is a shame because Valve is trying to do the right thing with Vulkan but SteamOS is just not a well put together distro, at least right now). It's just not a driver model problem, it's the politics of it all. Outside of Google adopting DRM instead of Gralloc (or Gralloc getting all of the features on DRM and effectively becoming DRM and replacing it on the desktop) there's probably little chance of unifying all the drivers under one coherent umbrella.

I am sorry, sir. What is DRM? I only have Digital Rights Management flashing into my memory, but I have no idea what you could mean by DRM. Apparently I've been able to ignore it and live behind the moon, not knowing what DRM is, while mainly using Linux since 2003-2004.

I'm sad that there is no alternative to X that works with `legacy` graphic cards like mine (Radeon HD 4500). But to be honest I'm personally just concerned due to already mentioned security problems with Xorg.

A quick Google search found "Direct Rendering Manager": https://en.wikipedia.org/wiki/Direct_Rendering_Manager

Doesn't look like something you'd necessarily need to know about unless you were developing for the GPU on Linux.

Right, exactly. DRM is responsible for memory management, mode-setting, command submission and is capable of basically everything the original article asks for.

Unfortunately only the Free Software drivers fully support it, AMD is planning to switch its proprietary drivers to it (using their new amdgpu drm driver), and Nvidia doesn't use it on desktop at all but will likely at least implement the prime buffer sharing for optimus at some point.

There are various reasons for the current situation, e.g. Nvidia's OpenGL drivers are simply the best (on both Windows and GNU/Linux) so they had little reason to play nice with others, especially given that their mode-setting code was a bit better than what DRM used to do which is not the case anymore and hasn't been for a while, but companies need financial incentives to rewrite parts of their working codebases.

And even if you get all of the desktop GPUs on top of DRM then you'll still have Android and mobile GPUs. Certain mobile GPUs do work with DRM but that level of support is usually embarrassing and mostly done to be able to say "we did" rather than to actually support any useful features in it, which is not completely unreasonable, because DRM buys you almost nothing on Android right now, so why bother.

Now, having said that, WDDM, particularly 2.0, does do some things that DRM cannot. But DRM is not far behind (and capable of certain things WDDM cannot do) and the real travesty is that not all GPU drivers on Linux (both GNU/Linux and Android) use it.

The perfect Linux (again, both GNU/Linux and Android) graphics stack will be: DRM kernel driver (not only command submission but memory management and mode-setting), Vulkan user-space driver (Vulkan will become basically what Gallium is right now which is a common layer on top of which we'll implement other, more friendly graphics apis) and Wayland.

DRM solves the multi-gpu sharing, synching, recovery, memory management, gives us a central place to manage security (although with GPUs that's a sort of an "interesting" topic which is a long discussion in itself) and do some rudimentary scheduling (which is, as mentioned previously, also "interesting"), Vulkan, as a side-effect of the fact that it's so tight and will come with a conformance framework, will make drivers a lot more predictable and stable and the Wayland faq explains why it's a neater choice than X11.

Unfortunately, while Google will adopt Vulkan, I doubt they'll have enough good will and reason to drop SurfaceFlinger and Gralloc in favor of DRM and Wayland. So we won't likely get to that stack anytime soon.

Calling it a political problem and not a technical problem doesn't make it less of a problem. Nor does dismissing it as "cute" offer much in the way of a solution.

Yes, except that I didn't do either one of those things. I was responding to the original article which is calling for a new driver model, which, as pointed out, is neither the problem nor the solution.

Sorry, it was somewhat unclear what you were calling cute.

At least in software. I think politics can also be a technical problem.

Very smart, knowledgeable, wise, persistent, patient, clever, prudent people have been able to manage organizations with different concerns, priorities and goals.

The thing about being able to read other process memory under Linux has always driven me nuts. Linux is fairly secure _except_ for this one giant gaping hole that nobody mentions.

This is extra fun considering WebGL.

WebGL implementations zero out newly allocated buffers for exactly that reason.
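To make the zeroing point concrete, here is a toy model in Python. It is only a sketch: `BufferPool` and `leaked_bytes` are made-up names, not any real driver or WebGL API, and real GPU allocators are far more involved. It just shows why a recycled, unzeroed buffer hands the previous owner's bytes to the next client.

```python
class BufferPool:
    """Toy stand-in for a memory pool that recycles freed blocks (hypothetical)."""

    def __init__(self, zero_on_alloc):
        self.zero_on_alloc = zero_on_alloc
        self._free = []  # recycled blocks, still holding their old contents

    def alloc(self, size):
        # Reuse a freed block of the right size if one exists.
        for i, block in enumerate(self._free):
            if len(block) == size:
                buf = self._free.pop(i)
                break
        else:
            buf = bytearray(size)
        if self.zero_on_alloc:
            buf[:] = bytes(size)  # wipe whatever the previous owner left behind
        return buf

    def free(self, buf):
        self._free.append(buf)


def leaked_bytes(zero_on_alloc):
    """Return what a second client sees in a freshly allocated buffer."""
    pool = BufferPool(zero_on_alloc)
    secret = pool.alloc(16)
    secret[:] = b"password=hunter2"  # first client writes 16 bytes
    pool.free(secret)
    # A second, unrelated client now asks for a buffer of the same size.
    return bytes(pool.alloc(16))
```

Without zeroing, the second client reads the first client's data back verbatim; with zeroing it only ever sees zero bytes, at the cost of a memset on every allocation.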


How long until we get a heartbleed-like attack where some software overruns its GPU buffer into a webgl buffer?

FWIW I've been running with WebGL disabled because of similar concerns.

WebGL works if you enable it? On Linux I've never seen anything other than error messages from WebGL.

WebGL exercises some new corners of the OpenGL interface that many drivers weren't prepared for when WebGL arrived, so browsers tend to have an extensive blacklist: if you have a driver on the list, you won't get WebGL.

On a rolling release like Debian Testing, with well-supported hardware like the Intel Ivy Bridge integrated GPU, WebGL has always worked beautifully.

You can try overriding the blacklist in Chrome, that worked well for me when I needed to for a Debian/Nvidia system a couple years ago.

Ahh, I've been disabling it on Windows, however silly that might be.

Care to provide a link explaining this?

Should be pretty straightforward to fix. I wonder why it isn't. Is someone discussing the issue?

The Yama patchset has been in mainline for the last few years, and disallows non-root processes from reading each others' memory (except for direct children).
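If you want to check what your own machine does, Yama exposes its setting as a sysctl. A minimal sketch in Python, assuming the usual path `/proc/sys/kernel/yama/ptrace_scope` and the value meanings from the kernel's Yama documentation; the helper names are my own:

```python
from pathlib import Path

# Meanings of the Yama ptrace_scope values, paraphrased from the kernel docs.
PTRACE_SCOPE = {
    0: "classic: a process may attach to any other process running under the same UID",
    1: "restricted: only descendants (or explicitly permitted processes) may be attached to",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled entirely",
}


def describe_ptrace_scope(raw):
    """Turn the raw sysctl text (e.g. '1\\n') into a human-readable description."""
    value = int(raw.strip())
    return PTRACE_SCOPE.get(value, f"unknown value {value}")


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return the description for this machine, or None if Yama is not available."""
    p = Path(path)
    if not p.exists():
        return None
    return describe_ptrace_scope(p.read_text())
```

Value 1 is the mode described above (children only). Note that this only covers ptrace-style access to process memory, not what happens to recycled GPU buffers.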

The trouble is that it's a sort of retrofitted check. It works surprisingly well, but being added to a 20+-year-old OS and a 40+-year-old design, it can't guarantee being correct. The traditional, historic process boundary on UNIX has been user accounts / UIDs. Desktop Linux needs to figure out a way to do what Android does, and run every application with a separate UID, including the ability to have private files that are unreadable to other applications.

The challenge is defining "application". Android, being a greenfield platform, could define the interaction between applications. On desktop Linux, it's obvious that, say, a web browser and a PDF viewer should be different applications. The two processes in a "find | grep" shell pipeline probably should count as the same application. In between, it's pretty blurry, because we've had 40 years to develop patterns that don't account for this isolation ever happening.

"On desktop Linux, it's obvious that, say, a web browser and a PDF viewer should be different applications"

Not completely obvious. The web browser could use the PDF viewer to render PDF and/or the PDF viewer could use the web browser to run JavaScript embedded in a PDF file (http://www.adobe.com/devnet/acrobat/javascript.html). Both examples are similar to your find|grep example.

Yes, that typically would be done by reusing libraries, not by starting processes, but modern browsers run a separate process per tab and some run flash in a separate process. Running a PDF viewer in a separate process would be a logical extension.

Perhaps you would be interested in my project, subuser.org

Unfortunately, it does absolutely NOTHING with regard to OpenGL isolation :( (That is if you give programs permission to talk to the graphics card, which is denied by default...)

Neato! Many years back I was playing with Xpra for this, but it was in the days before either Docker or useful seccomp. Glad to see something making progress here.

Have you looked at OpenGL virtualization e.g. Virgil (https://virgil3d.github.io/ , https://www.kraxel.org/blog/tag/virgl/) for that problem? The idea is to use Gallium to give them just enough access to get host 3D acceleration, but not direct access to the graphics card.

Looks interesting. But no, I have not tried it. It isn't really a priority for me. I'm not a gamer, nor am I a climatologist, and while I do use CAD, I've had other, simpler, problems to solve :D

GNOME is trying to do this: https://wiki.gnome.org/Projects/SandboxedApps

Knowing nothing about the issue, I speculate many programs have come to depend on the behavior. Similar to antique bugs in x86 that were baked in to the architecture when programmers had to work with them on the 286 & 386. After that they couldn't be fixed because programs depended on the presence of the bug.

I'm having a very hard time imagining there exists even a single program out there that depends on being able to read the memory contents of a program that ran before itself.

I bet it comes down to driver developers never even thinking that this could be a security issue before people came up with the concept of running unauthenticated 3d accelerated software (i.e. javascript+webgl from the network in a browser) - "what do you mean security, the user could just have taken a screenshot at any time. why should we waste precious FPS clearing buffers when the program will fill in its own textures before drawing anything".

Reminds me of GO.COM (http://peetm.com/blog/?p=55) That is/was an example.

"GO.COM contained no program bytes at all – it was entirely empty. However, because GO.COM was empty, but still a valid program file as far as CP/M was concerned (it had a directory entry and file-name ending with .com), the CP/M loader, the part of the OS whose job it is to pull programs off disk and slap them into the TPA, would still load it!

So, how does this help? Well, using the scenario above:

- the user exited WordStar
- the user ran DIR (or whatever else they needed) and at some future point would be ready to re-run WordStar
- the user now ‘loaded’ and ran GO.COM
- the loader would load zero bytes of the GO.COM program off disk into the TPA – starting at address 0100h – and then jump to 0100h – to run the program it just loaded [GO.COM]!

result – it simply re-ran whatever was in the TPA when the user last exited to DOS – instantly"
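The trick is easy to mimic with a toy model. The sketch below is Python, not CP/M: `ToyCpm` is a made-up class, memory is a bytearray, and "running" a program just means reporting whose bytes sit at 0100h. The one assumption that matters is the one from the quote: the loader copies the file into the TPA without clearing it first.

```python
TPA_START = 0x0100  # .COM files are loaded at this address


class ToyCpm:
    """Toy model of the loader behaviour described above (hypothetical)."""

    def __init__(self):
        self.memory = bytearray(0x1000)  # starts out all zero

    def load_and_run(self, image):
        # Copy the file's bytes into the TPA and "jump" to 0100h.
        # The TPA is not cleared first, so a zero-length file changes nothing.
        self.memory[TPA_START:TPA_START + len(image)] = image
        return self.whats_at_tpa()

    def whats_at_tpa(self):
        # Stand-in for executing at 0100h: read bytes up to the first zero byte.
        end = self.memory.index(0, TPA_START)
        return self.memory[TPA_START:end].decode("ascii")
```

Loading `b"WORDSTAR"` and then an empty image returns `"WORDSTAR"` both times, which is the whole of GO.COM's behaviour.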

It also wasn't unusual to try and recover data from memory after a program crashed by launching a small program that dumped memory to disk. 100% reliable? No, but it was the best one sometimes had.

The best thing about go.com (I called it x.com) is that, if you don't have one on your disk, you can simply do

    SAVE 0 X.COM 
and it'll be there for you.

> I'm having a very hard time imagining there exists even a single program out there that depends on being able to read the memory contents of a program that ran before itself.

gdb, when using it to attach to an already running process.

Not exactly, GDB would be accessing memory that is still allocated. I think the OP was referring to accessing memory contents from deallocated regions. Also we're talking GPU memory here, for the most part. :)

Still remember the days where one had to compile the kernel to enable sound in doom. The fact that most hardware is supported today is a blessing.

And agree with the author. X11 has aging issues.

Or for that matter XFree86 / Xorg and Xlib have aging issues. IMHO the protocol is okay (it's not perfect). Maybe cut away some of the legacy cruft, implement antialiased rendering for core primitives (yes, even if they are just lines, arcs and ellipses, those are still needed even on modern screens; heck with all the flat style UIs currently being all the rage on a HighDPI screen you could probably disable XRender and nobody would notice).

The whole framebuffer handle management of Wayland is a huge step in the right direction, but I despise Wayland's input and inter-client communications model; not to say that X11 was in any way better, but whenever I code against Wayland, that part just feels wrong. Also the Wayland devs' obsession with getting VSync right has its own share of issues (especially for low latency graphics, like you need for VR).

It's actually ironic that the Windows driver situation is worse than with Linux (with the exception of GPU drivers)

How much of the compositor problems will be fixed by Wayland? I know a big goal of Wayland is to address tearing and corruption.

Wayland does address basically all of the tearing problems assuming a correct GPU driver (all of the opensource ones). There are several different problems, all of which are solved:

- Clients only provide complete buffer updates rather than drawing on to the front buffer, fixing tearing for all non-OpenGL apps

- Each monitor can have its own set of buffers and sync signal (technically the wayland protocol doesn't specify this, but makes it possible, and Weston does it correctly)
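The first point in the list above can be shown with a toy model. This is not the Wayland API, just a Python sketch with made-up class names: one surface lets the client scribble on the buffer being displayed, the other only swaps in a buffer once the client says it is complete.

```python
class FrontBufferSurface:
    """Client draws straight into the buffer the display reads from."""

    def __init__(self, rows):
        self.front = ["old"] * rows

    def draw_rows(self, upto):
        for y in range(upto):
            self.front[y] = "new"

    def scanout(self):
        return list(self.front)


class CommittedSurface:
    """Client draws into a private buffer and hands it over only when complete."""

    def __init__(self, rows):
        self.front = ["old"] * rows
        self.back = ["old"] * rows

    def draw_rows(self, upto):
        for y in range(upto):
            self.back[y] = "new"

    def commit(self):
        # Swap buffers; the display never sees a half-drawn frame.
        self.front, self.back = self.back, self.front

    def scanout(self):
        return list(self.front)


def is_torn(frame):
    """A frame is torn if it mixes rows from two different updates."""
    return len(set(frame)) > 1
```

If the display scans out while the client is halfway through drawing, the first surface shows a mix of old and new rows; the second shows the old frame until `commit()` and the new one afterwards.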

Anecdotal, but these "compositor problems" are not inherent to Linux or X11. I am extremely sensitive to things like tearing, and I can tell you it simply doesn't occur on my linux machines (though it does require enabling a TearFree X11 option on my intel GPU).

Wayland, unfortunately, seems to be the type of project that has been a year or two away from usability since its first release in 2012. I suspect we'll be stuck with X11 for a while yet.

The open source Radeon driver also does a good job of avoiding tearing. I think there are some pathological cases where it can still occur, but it is pretty rare. Catalyst was terrible at avoiding tearing the last time I used it (getting on to be 3 years ago now). I haven't looked into it, but it is possible that the tearing issue may not be solvable without a KMS driver (i.e. no binary blob drivers can do it).

I used Wayland/Weston for a full work day about a year ago to try it out. It really was very nearly ready. I can't quite remember what problems I ran into, but I could do my work (as a web developer) with very few real problems. Probably it's time to look at it again.

Interesting to hear, I might give it a shot too now.

In case anyone else has been getting the "Bandwidth Limit Exceeded" error when attempting to access the page, here's a link to Google's cached copy:


It was time to talk about a new Linux display driver model 15 years ago. 10 years ago I gave up on Linux for this reason and switched to OSX

I ran into issue 3) and a variation of 4) recently trying to accomplish what should be a pedestrian task: running Xvfb on Heroku. X11 is a pain in the ass to package, I spent a day on it and gave up.

I am getting this error

"Bandwidth Limit Exceeded The server is temporarily unable to service your request due to the site owner reaching his/her bandwidth limit. Please try again later."

I am currently using 2 displays connected to my graphics card and one connected to my igpu on my Linux desktop at home, and it's really not that hard to set up. Unfortunately the GPU does all the rendering and passes images to the iGPU, but I believe this is the case too with Windows. Also there are a few bugs when using this setup to do with monitor positioning.

You misread OP's point. He is saying Windows can do both. And vsync across cards. That is quite a feat. If using separate outputs, this is probably achievable on Linux with a bit of work. Also vsync with a compositor shouldn't be impossible. However, if the processing is done on two graphics cards and we output only on one of them then we need cooperation from the device manufacturers to provide us drivers or implementation details on how to drive their hardware muxers. Without that info, we're dead in the water.

> And vsync across cards. That is quite a feat.

Well, what it does is that the compositor (DWM) keeps a (small) backlog of presented window framebuffers (think GL SwapBuffers, which adds to the queue) and whenever a screen's VSync comes along it blits the most recent picture presented to it. Which has the effect that if you have a window spanning multiple GPUs' outputs they may not all show the same frame, because the display refresh frequencies will beat (even if you set the same refresh frequency on all outputs, since between different GPUs they're not driven by the same clock, you'll get a little bit of clock skew; and video clocks are usually not of the low drift type, because for video it doesn't matter if you're off a few ppm). If you need synchronized video sync, vendors are happy to sell you genlock-able cards for $$$.
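To put a number on the beating: if two outputs run at the same nominal rate but one clock is off by a few ppm, their VSyncs slip by one full refresh every 1 / |f_a - f_b| seconds. A back-of-the-envelope sketch in Python; the 60 Hz and 50 ppm figures are only illustrative assumptions, not measurements of any real card.

```python
def beat_period_seconds(nominal_hz, skew_ppm):
    """Seconds until two displays drift apart by one whole refresh.

    One display runs at nominal_hz, the other at nominal_hz * (1 + skew_ppm * 1e-6),
    so the frequency difference is nominal_hz * skew_ppm * 1e-6.
    """
    if skew_ppm == 0:
        return float("inf")  # identical clocks never drift apart
    return 1.0 / (nominal_hz * abs(skew_ppm) * 1e-6)
```

With 60 Hz and a 50 ppm skew the difference is 0.003 Hz, so the two outputs slide through a full frame of relative phase roughly every five and a half minutes, and in between they regularly disagree about which frame is the most recent one.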

Initially at least, I don't think Windows did anything of the sort - laptop vendors were shipping carefully-vetted and tweaked versions of the Intel and $OTHERVENDOR drivers that knew how to talk to each other and couldn't be upgraded in the normal way. Naturally, they didn't bother doing this for Linux.

Oh. How did you do this? Are you using an NVidia GPU? Does XRandR work? I couldn't get this to work at all, though I was using the closed drivers (since I couldn't find any evidence CUDA works with the open ones). Do you need the open drivers for this?

In the end I spent £30 on a second GPU. Problem solved... ish. XRandR still didn't work, and GNOME3 switched to fallback mode, which is basically GNOME2, i.e., a totally different window manager and a totally different shell.

I have no opinion on the performance on Windows vs that on Linux (not a concern, hence the £30 GPU...) but this whole situation was certainly a lot less hassle on Windows ;)

I used xrandr --setprovideroutputsource to set one image provider as the source of the other's image.

Maybe it's time to upgrade the hosting plan.

Maybe it's time to pay for some more bandwidth

Let's incorporate it into systemd. That will solve everything.

New year, new Linux graphics driver model. CADT is alive and well.

Be grateful that graphics cards work on Linux at all. For over a decade, they barely did.

Depends on what you mean. I've been running Linux for 20+ years and have never had a graphics card that I couldn't get to work. If you're talking about acceleration, there have obviously been issues getting manufacturer support, but as a non-gamer, that mattered little to me.

Haven't been using linux as long as you, more like 10-15 years, but I've noticed during that time that the effort required to get a graphics device working has shrunk significantly (and I don't attribute that to an increase in my knowledge).

I definitely use linux primarily for other stuff these days, though, so it's not often I'm setting up a new graphics card with linux anymore.

So no endless hours trying to get a config file that wouldn't set the monitor in flames?

Because I did spend such hours back with Slackware 2.0.

Slackware 2.0 was released in August 1994. Even 1.0 (which is probably the first distro to have XFree86) was released in July 1993. I used MCC before that and I don't think X was available (though, to be fair, I can't really remember when I first got X working on my machine). I distinctly remember having to wait until mmap() was implemented before anyone could even think about getting X to work. :-)

It is quite possible that one had to still edit modelines by hand at that time, but it really wasn't long before the modeline db came out and you could just plonk in canned values with a good expectation it would work untouched. Certainly after that time I never had any difficulty. About the only thing I ever did was to follow instructions to modify a couple of numbers for monitors that had bizarre refresh rates.

I think that this would probably fall pretty close to the "20 years ago" time frame (August 1995). Slackware 3.0 came out in November of that year. 2.0 of the kernel came out in 1996 and I'm quite sure that I wasn't spending any time editing modelines then.

If you wanted to get total plug and play without any configuration at all, I think you would have to wait until 2004, when Ubuntu was first released. So while the original statement that it took 10 years to get video drivers working is completely wrong, I think it would be fair to say that it took 10 years to have an equivalent install experience to Windows on that front.

If you wanted basic 2D stuff yeah, then there were the adventures with hardware accelerated 2D and the first batch of 3D accelerator cards, followed by first generations of integrated 2D/3D cards.

It wasn't just getting the monitor refresh rate working.

Personally, I never had any problems with that. I was admittedly careful to buy supported hardware. 2D acceleration just worked as far as I could tell. Nobody I knew ever had a problem with it at all.

I started using 3D acceleration when Compiz/Beryl was released (2006) with an NVidia card. I was playing World of Warcraft under wine at that time too. The only problem I had was when I updated the kernel I had to remember to recompile the drivers under Debian. I switched to Ubuntu for that reason (because then I didn't have to do anything).

Eventually I retired that and went with an Intel board. It worked 100% but was dog slow at the time (pre-2010 Intel drivers were quite good at 2D acceleration but abysmal for speed at 3D). Compiz worked perfectly, the only problem was a pretty big hit in frame rate for WoW.

I replaced that with a Radeon Card which worked perfectly under Catalyst. I started hearing rumours that the free software drivers were getting good performance numbers and switched to that. 2D was amazing (dramatically better than Catalyst) and 3D was slightly slower, but comparable.

Finally replaced that with a few laptops running Intel hardware again. By this time (2012) the 3D performance had dramatically improved and I'm quite happy with it. Obviously not a gamer set up, but more than adequate for casual gaming.

Before 2006, I think you were pretty much stuck with Nvidia and proprietary drivers if you wanted 3D. Again, I never heard anyone who had real problems other than recompiling the drivers if you were on a distro that didn't do it for you.

I actually worked with Gavriel State at Corel before he started up TransGaming. I thought he was completely nuts to do it because at the time there was almost nothing that ran under Wine properly. That was 2001. Honestly, if you were trying to get 3D working on Linux, you were mostly doing it just to say you did it.

I'm not going to deny that some people had problems with video cards. I often had to jump through hoops to get random hardware at work to give me nice displays. But if you were careful to buy supported hardware, there were very few problems.

To be fair, I did a lot of Windows development in the late 90s and early 2000s and I had at least as many problems with video drivers under Windows. The difference was that on Linux I would have trouble with new hardware, while with Windows I would have trouble with old hardware. The other big difference is that with Linux, you could almost always get something to work so that you could hack on it. With Windows, getting to a point where you could download updated drivers could be unbelievably painful.

I started with SLS linux which pre-dated Slackware and I certainly remember the scary warnings. But with a plain-vanilla graphics card and using ordinary settings, it wasn't an issue.

I would get a bare 640x480 resolution without tweaking.

How about instead of settling for slightly better than it used to be we continue to push for making it better and better over time?

This is a good example of one of the dangers and failure points of open source. When hardware integration matters, you're going to have sub-par results if you don't have a strong relationship with the hardware manufacturers. The best firmware is from companies that control both the software and the hardware. Linux fundamentally cannot have this relationship. Some things need to be very specifically controlled and designed, with business interests driving progress. Open source doesn't help much with this. Trying to build a system that works fluidly with the larger audience, when the audience is independent corporations, is for better or worse a nightmare.

I think you're trying to say that:

a. Graphics requires software and hardware to be integrated

b. Open Source communities don't work where you are driving designs in software and hardware - they are uncoordinated

I can't comment on (a). In terms of (b) you might be right for the idea of "hackers at night" open source. But, a lot of open source is 'professional', so a graphics company could solve this by:

1. Graphics companies provide hardware and information under NDA to specific Open Source developers. That gets you enablement but not necessarily a "design" as you can't direct their work.

2. The hardware companies hire Open Source developers. That way you have either all, or a big chunk, of the developers in a specific area. As employees they will work together on the company's priorities.

3. The hardware companies set up a foundation. Then the foundation co-ordinates and works on the priorities of the members. There are sub-groups of the Linux Foundation that work like this - for ARM, Linaro is an example.

There's a meta-comment that talks about the economics of this area - that's a more significant challenge than the organisational ones.
