
Making Direct3D games faster in Wine using modern OpenGL - BlackLotus89
https://comminos.com/posts/2018-02-21-wined3d-profiling.html
======
platz
I'm fascinated by how someone with this level of domain knowledge finds a
problem like this and solves it.

Is there some correlation between WoW players and wine/graphics programmers?

Is this a game dev who happens to play WoW?

From the outside, the phenomenon of a gamer (i.e. user) having this level of
domain knowledge of such complex technical internals is intriguing.

(I'm partially motivated by the fact that as a generalist programmer I don't
think I'd get anywhere near the level of understanding needed to produce
something like this)

~~~
zanny
This guy had a problem (slow fps in a game), knew at a high level how Wine
worked, and used the tools they knew to try to fix it.

I've never written a line of DX, GL, etc., but I know what command buffers,
driver synchronization, and AZDO are, all of which this article mentions.

I also play WoW on Linux, and am kind of embarrassed I didn't think to try
perf monitoring the game for easy-to-fix huge slowdowns like this. I kind of
assumed that, since WoW is one of the most popular Wine games and generally
pushes the DX API support to make sure it always works, the main Wine devs
would optimize it more.

That being said, buffer_storage is a GL 4.4 extension and Wine has this awful
habit of trying to strictly support OSX, which will never see OpenGL beyond
4.1, and I'm not sure if buffer_storage is available there. That alone might
mean these patches are never merged mainline, which would be... inconvenient.

------
twtw
"Fundamentally, it’s a function that maps a slice of GPU memory into the
host’s address space, typically for streaming geometry data or texture
uploads"

Can someone clarify this for me? Are OpenGL/D3D buffers that get stuff
memcpy'd into them by the CPU actually "slices of GPU memory," or are they
more often reserved driver memory that eventually get DMA'd to the GPU? (I
realize both probably happen at different times, but I'm curious which is more
typical for modern systems)

It seems like spending CPU cycles writing every byte over the bus would
perform much worse than a fast write to sysmem followed by a DMA transfer.

EDIT: I looked into it, and it seems like the typical implementation is that
map returns a pointer to some pinned driver sysmem and unmap kicks off an
async DMA to GPU memory.
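That staging-then-DMA pattern can be modeled as a toy in plain C. This is
purely illustrative (all names are made up): "VRAM" here is just another block
of host memory, and the "DMA" is a synchronous memcpy where a real driver
would enqueue an asynchronous transfer and fence it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the pattern described above: the map call hands back pinned
 * system memory, and the unmap call kicks off a transfer to GPU memory. */
typedef struct {
    unsigned char *staging;  /* pinned driver sysmem the app writes into */
    unsigned char *vram;     /* stand-in for GPU memory */
    size_t size;
} toy_buffer;

/* "glMapBuffer": no bus traffic yet, just expose the staging area. */
static void *toy_map(toy_buffer *b) { return b->staging; }

/* "glUnmapBuffer": a real driver would enqueue an async DMA here and
 * fence it; we copy synchronously for the sake of the example. */
static void toy_unmap(toy_buffer *b) {
    memcpy(b->vram, b->staging, b->size);
}
```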

~~~
elFarto
> Can someone clarify this for me? Are OpenGL/D3D buffers that get stuff
> memcpy'd into them by the CPU actually "slices of GPU memory," or are they
> more often reserved driver memory that eventually get DMA'd to the GPU?

The answer is, as with all things OpenGL, it depends. You might get back a
pointer to GPU memory that you can directly write to, or you'll get back some
chunk of system memory the driver has.

The ARB_buffer_storage extension improves matters as you can almost guarantee
that you'll get GPU memory, and you can keep it mapped for the entire lifetime
of your application (the old buffer APIs wouldn't let you keep it mapped
during a draw call). The downside is that you're now responsible for
synchronising access to that data.

But as for "is it quicker?", maybe. DMA transfers aren't free: they take time
to set up, and they usually need to operate from a limited pool of source
memory. If the driver has to take a local copy of your data before
transferring it (which it will do for every glBufferData/SubData call), then
you might as well copy it yourself; GPUs aren't hurting for PCIe bandwidth
these days. In addition, you can use a separate thread/CPU core to do the
copy, since unlike every other OpenGL call, writing into mapped memory
doesn't require an OpenGL context.
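The persistent-mapping pattern described above usually pairs the mapped
pointer with simple ring-buffer bookkeeping on the CPU side. A minimal sketch
of that bookkeeping, with the actual GL calls shown only in comments and all
names invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU-side state for a persistently mapped streaming buffer. In real code,
 * `base` would come from something like:
 *   glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
 *       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
 *   base = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
 *       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
 * and stay mapped for the lifetime of the application. */
typedef struct {
    uint8_t *base;   /* pointer returned by glMapBufferRange */
    size_t   size;   /* total buffer size in bytes */
    size_t   head;   /* next free byte */
} stream_buf;

/* Reserve `n` bytes aligned to `align` (a power of two); returns the byte
 * offset the draw call should use. On wrap-around, real code must first
 * glClientWaitSync() a fence guarding the start of the buffer, since the
 * driver no longer synchronizes for us. */
static size_t stream_alloc(stream_buf *sb, size_t n, size_t align) {
    size_t off = (sb->head + align - 1) & ~(align - 1);
    if (off + n > sb->size)
        off = 0;  /* wrap: wait on the oldest fence here in real code */
    sb->head = off + n;
    return off;
}
```

The synchronization burden the comment mentions lives entirely in that wrap
branch: the app, not the driver, must prove the GPU has finished reading the
region before reusing it.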

------
lostmsu
The site looks beautiful and loads insanely fast. Can we easily reuse the
whole theme?

~~~
stickydink
It is delightfully simple, nice and clean.

A pleasing font family, no JavaScript, some basic CSS, stick to basic HTML
tags and use them properly.

[https://comminos.com/css/default.css](https://comminos.com/css/default.css)

~~~
Tech-Noir
> no JavaScript

Not to knock an otherwise nice site (CTRL & + improves readability for me
personally, though), but:

    
    
        (function(d, s, id) {
          var js, fjs = d.getElementsByTagName(s)[0];
          if (d.getElementById(id)) return;
          js = d.createElement(s); js.id = id;
          js.src = 'https://connect.facebook.net/en_US/sdk.js#blahblahblah';
          fjs.parentNode.insertBefore(js, fjs);
        }(document, 'script', 'facebook-jssdk'));
    

AFAIK, even for sites that want to feed the monster, that's unnecessary:

[http://chrisltd.com/blog/2015/04/social-share-like-buttons-without-javascript/](http://chrisltd.com/blog/2015/04/social-share-like-buttons-without-javascript/)

[https://sharingbuttons.io/](https://sharingbuttons.io/)

~~~
kbenson
So I decided to test the chrisltd ones:

Twitter seems to work (brings up form).

Facebook redirects to an error.

Linkedin seems to work (brings up form when logged in).

Pinterest seems to work (brought up create board dialog when logged in).

Google+ seems to work (brings up share form).

I do have to say that for me these would actually work, and the JS likely
wouldn't in some cases, since I make _heavy_ use of Firefox's containers now
to sandbox a lot of online identities, and just have new windows for certain
URLs automatically load in the correct container.

~~~
Tech-Noir
Thanks, that's interesting. The chrisltd blog is from 2015, so I did wonder if
some of it might be out of date by now.

AFAICS, sharingbuttons has a slightly different FB URL (which I didn't notice
at first), which may not give an error:

[https://facebook.com/sharer/sharer.php?u=YOUR-URL](https://facebook.com/sharer/sharer.php?u=YOUR-URL)

Compared to chrisltd's:

[https://facebook.com/sharer.php?u=YOUR-URL](https://facebook.com/sharer.php?u=YOUR-URL)

------
stefan_
This is a great and concise writeup, thanks.

But I was left wondering what the actual problem was. Why is glMapBuffer
slow? Is it just the impedance mismatch between D3D and GL, which don't have
the same synchronization guarantees for that specific call? Why does Wine
have its own command handling thread when very likely the underlying OpenGL
driver has one, too?

~~~
Jasper_
glMapBuffer doesn't have any ability to declare that you won't overwrite data.
All you can say is whether you want read/write access to the buffer. So the
driver has to assume that the client might overwrite in-flight data, so
synchronization is required.

As for the command stream handling, it makes decent sense to do the
translation up-front and batch your drawing commands into a command stream so
a separate thread can just hammer through it as fast as possible, rather than
doing GL calls inline with the translation. Partly so the game can return to
doing its thing as fast as possible, and partly to fix issues with GL's
threading model being horrible (see e.g.
[https://bugs.winehq.org/show_bug.cgi?id=24684](https://bugs.winehq.org/show_bug.cgi?id=24684)
)

~~~
kllrnohj
> glMapBuffer doesn't have any ability to declare that you won't overwrite
> data.

Well, yes and no. glMapBuffer _Range_, which is basically a drop-in
replacement for glMapBuffer, does have such a flag:
GL_MAP_UNSYNCHRONIZED_BIT.

glMapBufferRange requires OpenGL 3.0, whereas glMapBuffer exists all the way
back to OpenGL 1.5, but this looks more like just an oversight in Wine than
anything else.

~~~
twtw
Wine uses glMapBufferRange when available, and has logic to translate
D3DLOCK_NOOVERWRITE to GL_MAP_UNSYNCHRONIZED_BIT. I don't understand what the
current implementation is doing that is causing wined3d_resource_map to block.

[https://github.com/wine-mirror/wine/blob/538263d0efe725124df88ce1cce124bc3ad7e2af/dlls/wined3d/resource.c#L422](https://github.com/wine-mirror/wine/blob/538263d0efe725124df88ce1cce124bc3ad7e2af/dlls/wined3d/resource.c#L422)
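The flag translation described above can be sketched roughly like this. This
is a simplified illustration, not Wine's actual code (see the linked
resource.c for the real logic); the flag values are copied locally so the
snippet stands alone:

```c
#include <assert.h>

/* Flag values copied locally for illustration, matching the values in the
 * d3d9 and GL headers. */
#define D3DLOCK_READONLY       0x00000010
#define D3DLOCK_NOOVERWRITE    0x00001000
#define D3DLOCK_DISCARD        0x00002000

#define GL_MAP_READ_BIT               0x0001
#define GL_MAP_WRITE_BIT              0x0002
#define GL_MAP_INVALIDATE_BUFFER_BIT  0x0008
#define GL_MAP_UNSYNCHRONIZED_BIT     0x0020

/* Sketch of the D3D-lock-flags-to-GL-map-flags translation: NOOVERWRITE is
 * the app's promise not to touch in-flight data, so the GL map can skip
 * synchronization; DISCARD lets the driver orphan the old storage instead
 * of waiting on it. */
static unsigned d3d_lock_to_gl_map_flags(unsigned lock_flags) {
    unsigned gl = (lock_flags & D3DLOCK_READONLY)
                      ? GL_MAP_READ_BIT
                      : GL_MAP_READ_BIT | GL_MAP_WRITE_BIT;
    if (lock_flags & D3DLOCK_DISCARD)
        gl = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;
    else if (lock_flags & D3DLOCK_NOOVERWRITE)
        gl |= GL_MAP_UNSYNCHRONIZED_BIT;
    return gl;
}
```

If this mapping is in place and the game passes D3DLOCK_NOOVERWRITE, the
blocking has to be happening somewhere else in the map path.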

------
garaetjjte
Another option is using Gallium Nine, which uses a native D3D9 state tracker
in the driver, skipping the GL layer entirely.
[https://wiki.ixit.cz/d3d9](https://wiki.ixit.cz/d3d9) (though on NVIDIA,
nouveau will probably be slower than the proprietary GL driver)

~~~
stuaxo
Weird, since Nouveau itself is based on Gallium.

~~~
garaetjjte
I mean that D3D -> Nouveau could be still slower than D3D -> GL -> NVIDIA.

------
flafla2
Excellent writeup. I would be curious as to the specific considerations given
to a heap allocator on the GPU. Related: I'm not too familiar with Wine
patches - what is the easiest way to view the final source code of this patch?

------
Sytten
Very nice article and project! Have you talked with Wine people to see if you
could eventually merge your patch in the official codebase? Seems like it
would help a lot of us linux gamers :)

~~~
l1n
From the article: "I hope to mainline this once the patchset becomes more
mature."

------
hawski
Wined3d could probably use Vulkan instead of OpenGL in the future. Has anyone
started working on such a port?

~~~
bri3d
There's VK9 for Direct3D 9, DXVK for Direct3D 11, and the official
Wine-supported VKD3D for Direct3D 12.

[https://github.com/disks86/VK9](https://github.com/disks86/VK9)
[https://github.com/doitsujin/dxvk](https://github.com/doitsujin/dxvk)
[https://source.winehq.org/git/vkd3d.git/](https://source.winehq.org/git/vkd3d.git/)

------
asdfv09s9d80fu9
Really cool! Would love if the blog had RSS though!

~~~
l1n
Just for you <3
[https://huginn.lin.anticlack.com/users/1/web_requests/80/com...](https://huginn.lin.anticlack.com/users/1/web_requests/80/comminos.xml)

