
DirectX 11 vs. DirectX 12 oversimplified - snake_case
http://www.littletinyfrogs.com/article/460524/DirectX11vsDirectX12_oversimplified
======
gambiting
It really is oversimplified.

"Creating dozens of light sources simultaneously on screen at once is
basically not doable unless you have Mantle or DirectX 12. Guess how many
light sources most engines support right now? 20? 10? Try 4. Four. Which is
fine for a relatively static scene. "

For my Master's degree project at uni I had a demo written in OpenGL with over
500 dynamic lights, running at 60fps on a GTX 580. Without Mantle, or DX12.
How? Deferred rendering, that's how. You could probably add a couple thousand
more and it would still be fine.

"Every time I hear someone say “but X allows you to get close to the hardware”
I want to shake them. None of this has to do with getting close to the
hardware. It’s all about the cores"

Also not true. I work with console devkits every single day, and the reason why
we can squeeze so much performance out of relatively low-end hardware is that
we get to make calls which you can't make on PC. A DirectX call to switch a
texture takes a few thousand clock cycles. A low-level hardware call available
on the PlayStation platform will do the same texture switch in a few dozen
instructions. The numbers are against DirectX, and that's why Microsoft
is slowly letting devs access the GPU on the Xbox One without the DirectX
overhead.

~~~
lbenes
>For my Masters degree project at uni I had a demo written in OpenGL with over
500 dynamic lights,

I don’t know what kind of 3D engine you wrote, but as an FPS gamer I have quite
a bit of experience with 3D engines. From Doom 3 to Alan Wake, some of the
worst performance hits occur in scenes with heavy use of dynamic lights. Did
OpenGL 4/DX11 fix this?

Which of the modern APIs have you actually used? How do Mantle, DX12, Metal,
and OpenGL Next compare?

EDIT: "as a non-3D graphics programmer, who's tried to build FPS levels for
fun and familiar with all the modern engines". This article is for those of us
with interest but not experts in the field, right?

~~~
gambiting
Pure OpenGL 4.0. It allowed me to do tessellation and use deferred rendering.
This is the key - rendering the lights in a separate pass is crucial to
performance, and like I've said, it allows you to have thousands of dynamic
lights in the same scene. Neither Doom 3 nor Alan Wake could use this
technique. I mostly work on consoles nowadays, so I use none of these APIs. On
PS3/PS4 you have to do everything manually: instead of using a nice OpenGL API
to bind and send a vertex buffer object, you have to allocate memory for it
yourself and copy it over manually to the address that you want. That's where
the speed lies. I have done some Xbox One programming, but that's mostly
regular DirectX at the moment; I haven't had a chance to play with the DX12
stuff.

There is quite a good explanation of how deferred rendering works here:
[http://gamedevelopment.tutsplus.com/articles/forward-
renderi...](http://gamedevelopment.tutsplus.com/articles/forward-rendering-vs-
deferred-rendering--gamedev-12342)
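
Roughly, the two passes look like this - a toy CPU-side sketch of the idea,
not my actual OpenGL code (the struct names and dummy data are made up just to
illustrate why the light count stops mattering so much):

    #include <cstdio>
    #include <vector>

    // Hypothetical, simplified structs; a real engine keeps these in GPU textures.
    struct GBufferTexel { float pos[3], normal[3], albedo[3]; };
    struct Light        { float pos[3], color[3], radius; };

    int main() {
        const int W = 4, H = 4;                      // tiny "screen"
        std::vector<GBufferTexel> gbuffer(W * H);    // output of pass 1

        // Pass 1 (geometry): each surface is rasterized exactly once, writing
        // position/normal/albedo with no lighting yet. Cost grows with scene
        // complexity only. (Dummy fill here; a real pass rasterizes meshes.)
        for (auto& t : gbuffer) t = GBufferTexel{{0,0,0},{0,0,1},{1,1,1}};

        // Pass 2 (lighting): shade every texel against every light using only
        // the G-buffer; the geometry is never touched again, so adding lights
        // costs screen-space work, not another pass over the whole scene.
        std::vector<Light> lights(500, Light{{0,0,1},{1,0.9f,0.8f},2.0f});
        for (auto& t : gbuffer) {
            float out[3] = {0,0,0};
            for (const auto& l : lights)
                for (int c = 0; c < 3; ++c)
                    out[c] += l.color[c] * t.albedo[c] / lights.size();
            (void)out;   // a real renderer writes this to the final image
        }
        std::printf("shaded %zu texels against %zu lights\n",
                    gbuffer.size(), lights.size());
    }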

~~~
lbenes
Great link and thanks for the explanation. So by using a deferred rendering
algorithm you can reduce the complexity to O(m+n) (m = number of surfaces,
n = number of lights), whereas forward lighting path renderers have complexity
O(m*n). The trade-offs (higher memory usage, not working well on older
hardware, and problems with anti-aliasing and transparent objects) explain why
many modern engines like Unreal 3 and the IW engine don't utilize it.

------
jra101
"Last Fall, Nvidia released the Geforce GTX 970. It has 5.2 BILLION
transistors on it. It already supports DirectX 12. Right now. It has thousands
of cores in it. And with DirectX 11, I can talk to exactly 1 of them at a
time."

That's not how it works; the app developer has no control over individual GPU
cores (even in DX12). At the API level you can say "draw this triangle" and
the GPU itself splits the work across multiple GPU cores.

~~~
kllrnohj
Moreover, the actual GPU doesn't work like that either. GPUs do not have the
capability to run more than one work-unit-thing at a time. They have thousands
of cores, yes, but much more in a SIMD-style fashion than as a bunch of
parallel threads. They cannot split those cores up into logical chunks that
can then individually do independent things.

The whole post isn't just oversimplified, it's just wrong. Across the board
wrong wrong wrong. The point of Mantle, of Metal, and of DX12 _is_ to expose
more of the low level guts. The key thing is that those low level guts aren't
that low level. The threading improvements come because you can build the GPU
objects on different threads, not because you can talk to a bunch of GPU cores
from different threads.

The majority of CPU time these days in OpenGL/DirectX is spent validating and
building state objects. DX12 and the others now let you take lifecycle control
of those objects: re-use them across frames, build them on multiple threads,
etc. Talking to the GPU then becomes a simple matter of handing over an
already-validated, immutable object. Which is fast. Very fast.
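
As a rough illustration (a plain C++ analogy with a made-up PipelineState
type, not actual D3D12 code), the win is doing the expensive build/validate
work on worker threads once and then just handing the immutable results over
every frame:

    #include <cstdio>
    #include <future>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for an immutable, pre-validated pipeline state
    // object (the real thing would be something like ID3D12PipelineState).
    struct PipelineState { std::string shaders; int blendMode; };

    PipelineState BuildAndValidate(std::string shaders, int blendMode) {
        // Expensive validation/compilation happens here, once, off the render thread.
        return PipelineState{std::move(shaders), blendMode};
    }

    int main() {
        // Build many state objects concurrently; this is where the new APIs
        // let several CPU threads do useful work at the same time.
        std::vector<std::future<PipelineState>> jobs;
        for (int i = 0; i < 8; ++i)
            jobs.push_back(std::async(std::launch::async, BuildAndValidate,
                                      "shader_" + std::to_string(i), i % 3));

        std::vector<PipelineState> cache;            // reused across frames
        for (auto& j : jobs) cache.push_back(j.get());

        // Per frame, "submission" is just handing over already-validated objects.
        for (int frame = 0; frame < 3; ++frame)
            for (const auto& pso : cache)
                std::printf("frame %d: bind %s\n", frame, pso.shaders.c_str());
    }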

~~~
jra101
Most GPUs group together individual ALUs into larger units (sometimes called
compute units or clusters or SMs) and each compute unit can run independent
work.

[http://www.anandtech.com/show/8526/nvidia-geforce-
gtx-980-re...](http://www.anandtech.com/show/8526/nvidia-geforce-
gtx-980-review/3)

~~~
bhouston
I believe that is true, but I haven't seen this exposed via DX. Does DX12
expose this functionality? Do CUDA or OpenGL expose this type of control?

~~~
m_mueller
You can have multiple CUDA kernels running simultaneously on the same GPU, but
you have no direct control over which SM(X) core is assigned to which kernel
AFAIK. So it actually works pretty similarly to multi-threaded programming on a
CPU if you take away thread affinity. In general I find that a good way to
approximate a top-of-the-line NVIDIA GPU is to think of it as an ~8-core CPU
with a vector length of 192 for single precision and 96 (half of full length)
for double precision. It has high memory bandwidth, which has the limitation of
requiring accesses to 32 neighbouring memory locations simultaneously in order
to make full use of the performance. The CUDA programming model is particularly
set up in a way that the programmer doesn't have to handle this manually - (s)he
just needs to be aware of it. I.e. you program everything scalar, map the thread
indices to your data accesses, and make sure that the first thread index (x)
maps to the fastest-varying index of your data.
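
In plain C++ terms (the real thing would be a CUDA kernel; the width/height
numbers here are just placeholders), that indexing rule looks like this:

    #include <cstdio>
    #include <vector>

    // The index that varies fastest across neighbouring threads (threadIdx.x
    // in CUDA) should map to the contiguous dimension of the data, so that 32
    // neighbouring threads touch 32 neighbouring elements (coalesced access).
    int main() {
        const int width = 8, height = 4;
        std::vector<float> data(width * height, 1.0f);

        for (int y = 0; y < height; ++y)        // ~ blockIdx.y / threadIdx.y
            for (int x = 0; x < width; ++x)     // ~ threadIdx.x, fastest varying
                data[y * width + x] *= 2.0f;    // x walks contiguous memory

        std::printf("%f\n", data[0]);
    }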

~~~
kllrnohj
> In general I find that a good way to approximate a top-of-the-line NVIDIA GPU
> is to think of it as an ~8-core CPU with a vector length of 192 for single
> precision and 96 (half of full length) for double precision.

But that's not entirely correct either. Yes you _can_ use it like that, but
you can also use it as a single core with a vector length of 1536.

In the context of over-simplification these are better thought of as single-
core processors. The reason being that if you have a method foo() that you need
to run 10,000 times, it doesn't matter if you use 1 thread, 2 threads, or 8
threads - the total time it will take to complete the work will be identical.
This is very different from an 8-core CPU, where using 8 threads will be 8x
faster than using 1 thread (blah blah won't be perfectly linear etc, etc).
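
Toy arithmetic for that point (the numbers are invented):

    #include <cstdio>

    int main() {
        const double invocations = 10000;   // calls of foo()
        const double lanes       = 1536;    // total SIMD lanes on the GPU

        // However the invocations are split into 1, 2 or 8 "threads", they all
        // draw from the same pool of lanes, so the total time is the same.
        const int thread_counts[] = {1, 2, 8};
        for (int threads : thread_counts) {
            double per_thread = invocations / threads;
            double total      = per_thread * threads / lanes;  // == invocations / lanes
            std::printf("%d threads -> %.2f units of GPU time\n", threads, total);
        }
    }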

------
quonn
> Cloud computing is, ironically, going to be the biggest beneficiary of
> DirectX 12. That sounds unintuitive but the fact is, there’s nothing
> stopping a DirectX 12 enabled machine from fully running VMs on these video
> cards. Ask your IT manager which they’d rather do? Pop in a new video card
> or replace the whole box.

Can someone please comment on that? Everything I know about GPUs suggests that
even if it were possible to run VMs on them, performance would be terrible. All
those tiny GPU cores are not designed for heavy branch prediction, fancy
out-of-order execution, pre-fetching, etc., and your average software totally
depends on that.

~~~
muizelaar
I believe the author is referring to accessing the GPUs from VMs, i.e. if you
have a 16-core machine with a 1024-core GPU, you can split it up so it looks
like 4 machines, each with 4 cores and a 256-core GPU.

The sentence that you snipped seems to refer to this: "Right now, this isn’t
doable because cloud services don’t even have video cards in them typically
(I’m looking at you Azure. I can’t use you for offloading Metamaps!)"

~~~
wtallis
It's already possible to partition GPUs to share them across multiple VMs.
Intel's got modified versions of Xen and KVM to support this for Haswell and
later IGPs, and AMD and NVidia both have proprietary solutions.

Furthermore, this has nothing to do with rendering APIs; it's all in the
driver and hypervisor.

~~~
muizelaar
I have no doubt. I was just trying to clarify what the original author may
have been intending to say.

~~~
lmeyerov
Yes, running the VM itself on the GPU would be silly, but machine learning etc.
is awesome. We get access and time-sharing across resources much more easily
(at all) now. At Graphistry, we are using AWS G2 instances for previously
impossibly big interactive data visualizations, streaming into stock browsers.
It's pretty ridiculously cool :) Always happy to chat about GPU clouds and
related fun things.

------
nawitus
How relevant is DirectX11/12 these days now that OpenGL is so successful
thanks to mobile and web?

~~~
tehaugmenter
I too would like to know the relevance. From the Wikipedia article [1]:

In general, Direct3D is designed to virtualize 3D hardware interfaces.
Direct3D frees the game programmer from accommodating the graphics hardware.
OpenGL, on the other hand, is designed to be a 3D hardware-accelerated
rendering system that may be emulated in software. These two APIs are
fundamentally designed under two separate modes of thought.

[1]
[http://en.wikipedia.org/wiki/Comparison_of_OpenGL_and_Direct...](http://en.wikipedia.org/wiki/Comparison_of_OpenGL_and_Direct3D#Comparison)

~~~
jsheard
You can run Direct3D in software, Microsoft even provides a software
implementation of DX11 (albeit the subset that works on DX10.1 hardware).

[https://msdn.microsoft.com/en-
us/library/windows/desktop/gg6...](https://msdn.microsoft.com/en-
us/library/windows/desktop/gg615082\(v=vs.85\).aspx)

------
agapos
"There is no doubt in my mind that support for Mantle/DirectX12/xxxx will be
rapid because the benefits are both obvious and easy to explain, even to non-
technical people."

Except that this won't only depend on the programmers, but on the management
too, and seeing what those guys do at EA, Ubisoft, Activision, etc., I am not
sure it will be properly utilized for the first wave of DX12 games.

~~~
SG-
EA's "big" engine which is now Frostbite from Dice already has Mantle support,
it will likely be able to implement the DX12 side of it pretty quickly.

------
jokoon
and here I am, barely managing to find a decent, simple 3D C++ rendering
library to run on a Mac with Xcode.

Edit: 3D

It's pretty crazy that there's no dead-simple 3D rendering library that
supports just bare polygons. When I look at an OpenGL tutorial I just flee when
I see how much boilerplate code is necessary to make the mouse move a camera.

~~~
jstanek
Check out SDL, Allegro, or SFML. [http://libsdl.org/](http://libsdl.org/)
[http://alleg.sourceforge.net/](http://alleg.sourceforge.net/) [http://sfml-
dev.org/](http://sfml-dev.org/)

~~~
jokoon
I meant 3D

~~~
gambiting
SDL supports OpenGL. So you can have SDL handle all the boring stuff (window
handles, keyboard and mouse movement, system events, etc.), and you are given
an empty OpenGL context so you can draw literally anything you want in it. It
doesn't get much simpler than that.
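
A minimal sketch, assuming SDL2 and its OpenGL headers are installed - SDL
does the window/input plumbing and hands you the GL context to draw into:

    #include <SDL.h>
    #include <SDL_opengl.h>

    int main(int, char**) {
        SDL_Init(SDL_INIT_VIDEO);
        SDL_Window* win = SDL_CreateWindow("demo", SDL_WINDOWPOS_CENTERED,
                                           SDL_WINDOWPOS_CENTERED, 640, 480,
                                           SDL_WINDOW_OPENGL);
        SDL_GLContext ctx = SDL_GL_CreateContext(win);

        bool running = true;
        while (running) {
            SDL_Event e;
            while (SDL_PollEvent(&e))        // SDL handles keyboard/mouse/quit
                if (e.type == SDL_QUIT) running = false;

            glClearColor(0.1f, 0.1f, 0.2f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);    // your OpenGL drawing goes here
            SDL_GL_SwapWindow(win);
        }

        SDL_GL_DeleteContext(ctx);
        SDL_DestroyWindow(win);
        SDL_Quit();
        return 0;
    }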

~~~
jokoon
nah, you still need to use OpenGL when using SDL, to make a projection matrix,
VBOs, all those fun things.

~~~
gambiting
"a decent, simple 3D C++ rendering library " \- 3D rendering doesn't really
get any more simple than a blank OpenGL context window. If you want "simple"
as in "easy to use" then Unity/Unreal Engine are both very easy to use.

~~~
jokoon
Those are massive engines, with plenty of features I don't need. I can't rely
on those engines because they're made to build full-fledged games, and they
often force paradigms or ways of working that I don't really like. Not to
mention the freedom thing, and how those engines are supported.

> 3D rendering doesn't really get any more simple than a blank OpenGL context
> window.

OpenGL is pretty low level. It's a bare standard bridge to make use of a GPU.
What I mean is that there are either very high-level engines or a very
low-level graphics API, which requires a lot of work if you want to make
anything decent with it. There are things like GLFW and GLEW, but nothing like
a very simple-to-use 3D rendering library that just does the bare minimum: a
camera, quaternions, some text rendering, input handling, and some simpler
access to what OpenGL has to offer.

3D engines are always being made obsolete. Apart from Irrlicht and Ogre3D,
which are still quite fat in my opinion, there are no general-purpose,
lightweight 3D renderers that aren't trying to do it all. That kind of engine
could serve as a thin wrapper over OpenGL, just to avoid the work of making
the sheer quantity of OpenGL calls.

~~~
gambiting
I've used libgdx[1] before; it's still fully supported and is being developed
further. It is still fairly low level, but it has just enough in it to make
things like draw calls, VBOs, and shader management easier. That being said, I
still ended up writing my own shaders for lighting and shadows; as far as I
know there isn't anything in between libraries like this and huge engines like
Unity/UE.

[1] [http://libgdx.badlogicgames.com/](http://libgdx.badlogicgames.com/)

~~~
jokoon
Java? No thanks.

------
patrickfl
Not a hardware guy, so I don't know a ton about this; the oversimplification
was a bit necessary for me. With that said, I'm very excited for DX12. Is it
available for download on Windows 7 64-bit yet?

~~~
pas
It's about redesigning the current idiotic graphics pipeline API, so that the
driver and the application don't have to use inefficient abstractions that are
already outdated on both ends. (The GPU is much more complex and sophisticated
than it used to be. It's not just a fancy framebuffer with a few megabytes of
RAM so you can do z-culling and a 3D-to-2D projection. And nowadays, with
virtual texturing and megatextures, and with the bottleneck being the latency
of pushing these abstract objects to the GPU - objects that are getting
replaced by in-driver compiled shader programs anyway - the whole thing is ripe
for a bit of fundamental change.)

Take some time to watch these. I'm also not a hardware guy, but these problems
are far, far from the hardware, and the solutions are very educational from a
distributed-systems standpoint.

[http://gdcvault.com/play/1020791/](http://gdcvault.com/play/1020791/)

[http://www.slideshare.net/tlorach/opengl-nvidia-
commandlista...](http://www.slideshare.net/tlorach/opengl-nvidia-
commandlistapproaching-zerodriveroverhead)

------
snake_case
With the performance gains of Mantle and DirectX 12, where exactly does OpenGL
fall within all of this?

~~~
muizelaar
A next generation API comparable to Mantle and D3D12 is being developed inside
the Khronos Group: [https://www.khronos.org/news/press/khronos-group-
announces-k...](https://www.khronos.org/news/press/khronos-group-announces-
key-advances-in-opengl-ecosystem)

~~~
pjmlp
Everyone is hoping that it doesn't turn into another Longs Peak.

------
bronz
Do you need to pay royalties to Microsoft in order to use DX12/DX11 in a
commercial game?

~~~
bhouston
Sort of; it only runs on Windows.

------
netsurfer912
nice

oh wait it only runs on "windows"

------
louhike
Everybody seems to be really critical of the article. But he does say that it
is an "extreme oversimplification". And he is a video game developer. Maybe he
is mostly working as the CEO at Stardock now (maybe not), but I have difficulty
believing that it is just "wrong". I felt the criticisms were unfair. But I'm a
web developer, so maybe I just miss it completely.

~~~
wtallis
An extreme oversimplification can be expected to get the technical details
wrong but the conclusion right. This article gets the high-level consequences
wrong too, and claims that there will be benefits to things that are
completely unrelated.

