I think one of the primary reasons more people don't target GPUs to offload computation is that it most often requires proprietary drivers running on the host OS for it to work at an acceptable speed.
Imagine if putting an AMD or nVidia card in your box was the same as adding a CPU. If you wanted to write OpenCL to execute on the GPU, the thread allocation, memory management etc. for that would be part of an open source OpenCL library that your application links to, and then you could write OpenCL kernels that execute on the GPU via these libraries. No need for proprietary driver blobs running on the host machine.
I hope we can get to a point where trying to sell a GPU that requires a proprietary driver running in the host OS is as viable as trying to sell a CPU that requires the same.
This is not to criticize Vulkan; the trend toward lower-level APIs like Vulkan, DirectX 12 and Metal is a good one. But I don't think the goal should be to eliminate the driver entirely, or to get rid of any hardware abstraction layer on non-console platforms.
Having the same for GPUs, where the vendor gives you an ISA and you can use whatever compiler you like (even hand-written assembly), would be far preferable. This is what the open-source Gallium3D stack does, though it's hindered by relying on reverse-engineered information.
With CPUs, the code needed to run in the host OS (the driver) is fairly simple code that exposes the CPU hardware more or less as it is: practically speaking, everything the CPU itself can do, the compiler can output code to do directly. The output of the compiler (e.g. gcc) is sent more or less as-is to the CPU for execution.
With a GPU, we have a large chunk of proprietary code running in the host OS (the GPU driver), which provides a 3D API interface, such as OpenGL or Direct3D, to the GPU hardware. There is a huge difference between what is sent to this driver code, and what the driver sends to the GPU hardware.
Applications submit, e.g., OpenGL instructions to this driver; the driver compiles these OpenGL instructions into code that will execute on the GPU, and sends it to the GPU. So the GPU driver essentially acts as a closed source compiler that compiles OpenGL/Direct3D into whatever intermediate language the GPU accepts, which the GPU may further compile into something its processors can execute.
So for a CPU, the compiler does most of the work, while for a GPU, the driver does most of the work, and actually functions as a closed source compiler. On top of that, the driver handles memory management, automatically allocating memory according to which OpenGL instructions are executed.
Imagine if Intel provided a CPU that you could only use to execute Python code. It would require a closed source driver to work. This closed source driver would compile Python code submitted to it into unknown instructions that execute on this CPU. It would also have exclusive control over an area of memory, portions of which it would automatically allocate to store data from Python variables. That's essentially what nVidia and AMD offer, except the hardware is a processing unit that comes with its own RAM, and the language is OpenGL/Direct3D rather than Python.
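To make the "driver is a closed-source compiler" point concrete, here's a minimal sketch using the OpenCL host API (the kernel source is made up for illustration; error handling is omitted). Everything interesting happens inside clBuildProgram, where the compiler inside the vendor's driver turns the source string into GPU code we never get to see:

    /* build: cc demo.c -lOpenCL */
    #include <CL/cl.h>
    #include <stdio.h>

    /* Hypothetical kernel: the driver compiles this string at runtime
     * into whatever ISA the GPU actually executes. */
    static const char *src =
        "__kernel void scale(__global float *v) {"
        "    v[get_global_id(0)] *= 2.0f;"
        "}";

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);

        /* The closed-source compile step: source in, opaque GPU code out. */
        clBuildProgram(prog, 1, &device, NULL, NULL, NULL);

        /* The driver also owns memory management: this buffer lands
         * wherever the driver decides in the GPU's own RAM. */
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                    1024 * sizeof(float), NULL, NULL);
        printf("kernel built and buffer allocated by the driver\n");

        clReleaseMemObject(buf);
        clReleaseProgram(prog);
        clReleaseContext(ctx);
        return 0;
    }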
Sure, you could have a standard like i386. But isn't it the case that there is a lot more innovation happening in the GPU architecture space, making this very difficult?
Edit: just saw this:
We're not talking about standardizing hardware (the ISA), but the API: Vulkan. It really doesn't matter in which form your hardware processing unit comes, as long as its driver accepts SPIR-V and can execute it on said processing unit.
 Actually, that particular instance was a complete pain in the ass, because I had bought all of the parts new, but the motherboard didn't have the latest BIOS on it, and I didn't have a compatible CPU on hand. This was also before the liberal return policies that online vendors now have, so I ended up just ordering a different motherboard completely.
The BIOS runs outside of the OS and helps all the parts of your computer speak to each other.
This is why the process is a bit more involved than just running an update from your OS's update manager.
The "binary blob" that's referred to when talking about GPU's is the huge piece of software that you have to download and install in order to make you graphics card able to use all of its features. Some graphics cards won't even use a display's full resolution without the correct driver.
This is different to the CPU in that there is no big downloaded driver sitting between you and it.
If you really want to understand where everything sits, you should check out a book or search for information on Computer Architecture.
This is oversimplifying, but my understanding is that graphics programs like games send data and instructions as a job into a queue managed by the operating system to the CPU, which then has to quickly decide what to do with it. GPUs function as a sort of co-processor, in that the CPU will offload a portion or all of the instructions/data of jobs where GPU processing is indicated in the job instructions. This is why all GPU processing involves CPU overhead. GPUs are also somewhat bottlenecked by this whole process.
It's my impression that current trends like Mantle, DX12 and Vulkan are trying to reduce the amount of work the CPU has to do to process jobs intended for the GPU. But they can never really eliminate the role of the CPU, even if something like an ARM chip embedded on the GPU board handled most of the processing that the CPU would have done.
tl;dr - CPU is what makes the computer happen but the GPU is like a dedicated graphics co-processor.
Feel free to correct me if I'm wrong.
Currently the only reason OpenGL remains unavoidable on GPUs is that GPU vendors don't provide access to the "non-GPGPU" parts (like TMUs and ROPs) directly from OpenCL (or CUDA), which is clearly a matter of design rather than technicality.
Unsurprisingly, people were offended or upset (you either don't use OpenGL, or you end up defending it against criticism, subconsciously, because you have invested so much time learning it).
I hope something good comes out of this initiative.
Well, then you're going to be disappointed by Vulkan as well. OpenGL (and modern GPUs) are built around screen-space rasterization of points, lines and triangles. Being aimed primarily at the graphics part of GPUs, Vulkan uses the same primitives.
The major difference from OpenGL is that you no longer mutate hidden global state with calls like

    glActiveTexture(... + i)

but instead record explicit commands, submit them to queues yourself, and map memory yourself:

    vkCmdBindDescriptorSet(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, textureDescriptorSet, 0);
    vkQueueSubmit(graphicsQueue, 1, &cmdBuffer, 0, 0, fence);
    vkMapMemory(staticUniformBufferMemory, 0, (void **)&data);
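For context, here's a minimal sketch of that explicit model using the Vulkan 1.0 API as it eventually shipped (the calls quoted above were pre-release names; e.g. the final vkQueueSubmit wraps command buffers in a VkSubmitInfo struct). It records and submits an empty command buffer; queue family 0 is assumed to exist and error handling is omitted:

    /* build: cc demo.c -lvulkan */
    #include <vulkan/vulkan.h>
    #include <stdio.h>

    int main(void) {
        /* Instance and device creation: in Vulkan the application,
         * not the driver, spells all of this out. */
        VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
        VkInstance instance;
        vkCreateInstance(&ici, NULL, &instance);

        uint32_t count = 1;
        VkPhysicalDevice physDev;
        vkEnumeratePhysicalDevices(instance, &count, &physDev);

        float priority = 1.0f;
        VkDeviceQueueCreateInfo qci = {
            .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
            .queueFamilyIndex = 0,          /* assumption: family 0 exists */
            .queueCount = 1,
            .pQueuePriorities = &priority,
        };
        VkDeviceCreateInfo dci = {
            .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
            .queueCreateInfoCount = 1,
            .pQueueCreateInfos = &qci,
        };
        VkDevice device;
        vkCreateDevice(physDev, &dci, NULL, &device);

        VkQueue queue;
        vkGetDeviceQueue(device, 0, 0, &queue);

        /* Command buffers come from pools and are recorded up front;
         * nothing reaches the GPU until the explicit submit below. */
        VkCommandPoolCreateInfo pci = {
            .sType = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO,
            .queueFamilyIndex = 0,
        };
        VkCommandPool pool;
        vkCreateCommandPool(device, &pci, NULL, &pool);

        VkCommandBufferAllocateInfo cai = {
            .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO,
            .commandPool = pool,
            .level = VK_COMMAND_BUFFER_LEVEL_PRIMARY,
            .commandBufferCount = 1,
        };
        VkCommandBuffer cmdBuffer;
        vkAllocateCommandBuffers(device, &cai, &cmdBuffer);

        VkCommandBufferBeginInfo bi = { .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO };
        vkBeginCommandBuffer(cmdBuffer, &bi);
        /* ... vkCmd* calls (bind descriptor sets, draw, ...) go here ... */
        vkEndCommandBuffer(cmdBuffer);

        VkSubmitInfo si = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .commandBufferCount = 1,
            .pCommandBuffers = &cmdBuffer,
        };
        vkQueueSubmit(queue, 1, &si, VK_NULL_HANDLE);
        vkQueueWaitIdle(queue);
        printf("empty command buffer submitted\n");

        vkDestroyCommandPool(device, pool, NULL);
        vkDestroyDevice(device, NULL);
        vkDestroyInstance(instance, NULL);
        return 0;
    }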
It seems from the Vulkan Language Ecosystem graphic that they expect new languages to be developed that translate to Vulkan, for those who want to work at a higher level.
Actually no, because Vulkan itself is just an API. But what you can do is create high-level bindings, similar to, say, the Haskell X bindings, that immediately leverage asynchronous execution, lazy evaluation and built-in concurrent parallelism.
Also, type safety regarding the buffer contents can be mapped into a Hindley-Milner system as well, by looking at the tuple `(memory handle, descriptor)`; in Haskell (e.g.) that'd map nicely onto a type constructor.
AMD and NVIDIA are free to implement Vulkan on Windows and Linux, but the problem is that we will probably be stuck with OpenGL 4.1 on Mac OS X for the foreseeable future.
Khronos is a funny beast, a bit like the United Nations. Just because these companies are all part of it doesn't mean they actually like each other and want to cooperate!
Apple has famously lagged far behind with their implementation of OpenGL. They only support OpenGL 4.1, the spec for which was released in 2010.
Their developer tools are also pretty lacking on the desktop, and because they keep their drivers closed, external vendors can't do much to help. (Apple writes their own drivers for Intel, NVIDIA and AMD GPUs, and even the engineers at those companies have little insight into what goes on behind Apple's closed doors.)
Perhaps this new, streamlined Vulkan API will be easier and more attractive for Apple to implement.
We live in hope...
It seems to me that this is an accurate description for any technology focused consortium or work group.
Despite urban legends propagating the myth, game consoles don't feature OpenGL APIs as such; they use lower-level ones, even if inspired by OpenGL.
Their frontend only implements up to GL 4.1, so even if the vendors were to release drivers independently there would be no way for an app to access the newer functionality through the OS X frameworks.
I just replaced a dead drive in an old Mini with an SSD, then installed Yosemite. I'm not really an Apple guy, so I was not aware that Yosemite implemented kext (driver) signing, which had the side effect of 'breaking' TRIM support for 3rd party SSDs. It was never supported to begin with, but prior to Yosemite you could enable it by changing strings in the kext; doing so now will render the Mac unbootable unless you globally disable kext signing. Apple does not offer any other ability for a 3rd party to write their own driver to make this work, short of possibly rewriting the entire AHCI stack, which is not feasible.
> Will work on any platform that supports OpenGL ES 3.1 and up
Apple would also then have to support three graphics APIs, vintage OpenGL, Metal (on iOS) and Vulkan (on iOS and OS X?), across four hardware platforms (AMD, NVIDIA, Intel, Imagination on ARM).
Here is the press release: https://www.khronos.org/news/press/khronos-reveals-vulkan-ap...
It seems Valve will work hard on it:
“Industry standard APIs like Vulkan are a critical part of enabling developers to bring the best possible experience to customers on multiple platforms,” said Valve’s Gabe Newell. “Valve and the other Khronos members are working hard to ensure that this high-performance graphics interface is made available as widely as possible and we view it as a critical component of SteamOS and future Valve games.”
Technical previews will be shown at GDC this week.
Valve etc. want this because OpenGL sucks, but DX is Win only.
Microsoft will want to defend its DX stronghold - can they, on a technical level? Will Nvidia play along?
Apologies for the SlideShare link, I hate SlideShare.
Sample code here:
Actually, Valve called OpenGL "shockingly efficient" in one of their presentations (a postmortem of porting the Source engine to Linux).
> "I know it's possible for Linux ports to equal or outperform their Windows counterparts, but it's hard. At Valve we had all the driver devs at our beck and call and it was still very difficult to get the Source engine's perf. and stability to where it needed to be relative to Windows. (And this was with a ~8 year old engine - it must be even harder with more modern engines.)"
Hmm, makes sense. But then NVidia really loves OpenGL.
Also, fwiw, in DX12 even if parallel submission is supported, what's going to happen is that N user threads submit to 1 driver thread, which then submits to the actual GPU.
And yes, ordering is needed here.
I can see that it might not be a major issue if the time it takes to fetch from the queue is much smaller than the time it takes to execute a command [buffer]. I.e., if the queue manager fetches a command and pushes it to the GPU, returning to fetch the next one without waiting for the first to finish, then it will work fine (it will have to handle GPU saturation, though). But if it blocks, it will be a bottleneck that prevents using the hardware fully.
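As a rough sketch of the shape of that pattern in plain pthreads (the CmdBuf struct and everything here is made up for illustration; real drivers obviously don't look like this): N worker threads "record" in parallel, one submission thread drains the queue in order, and as long as the serial drain step is much cheaper than executing a buffer, it isn't the bottleneck:

    /* build: cc demo.c -pthread */
    #include <pthread.h>
    #include <stdio.h>

    #define WORKERS 4
    #define SLOTS   16

    typedef struct { int id; } CmdBuf;   /* stand-in for a real command buffer */

    static CmdBuf ring[SLOTS];
    static int head = 0, tail = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

    static void *worker(void *arg) {
        /* "Recording" the buffer happens with no lock held ... */
        CmdBuf cb = { .id = (int)(long)arg };
        pthread_mutex_lock(&lock);
        ring[tail++ % SLOTS] = cb;       /* ... only enqueueing is serialized */
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *submitter(void *arg) {
        (void)arg;
        for (int submitted = 0; submitted < WORKERS; submitted++) {
            pthread_mutex_lock(&lock);
            while (head == tail)
                pthread_cond_wait(&nonempty, &lock);
            CmdBuf cb = ring[head++ % SLOTS];
            pthread_mutex_unlock(&lock);
            /* The one serial step: hand the buffer to the GPU. If this is
             * much cheaper than executing it, it's not the bottleneck. */
            printf("submitting command buffer from worker %d\n", cb.id);
        }
        return NULL;
    }

    int main(void) {
        pthread_t subm, w[WORKERS];
        pthread_create(&subm, NULL, submitter, NULL);
        for (long i = 0; i < WORKERS; i++)
            pthread_create(&w[i], NULL, worker, (void *)i);
        for (int i = 0; i < WORKERS; i++)
            pthread_join(w[i], NULL);
        pthread_join(subm, NULL);
        return 0;
    }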
Vulkan is basically a rebranded Mantle. I suspect you're just mistaken about Mantle, but as that API is not (yet?) open we don't really know if this was a change Khronos made when adopting Mantle or if this is just how Mantle works as well.
DX12 works the same as Vulkan, though, with a single submission thread ("The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process." http://blogs.msdn.com/b/directx/archive/2014/03/20/directx-1...)
It shows that multiple queues can be used for one GPU. I guess Vulkan can allow the same but it's not clear from that diagram.
Vulkan does support multiple command queues. And there can be several queues of the same type for one GPU.
I thought it might have turned into a Longs Peak 2.0, but from the initial overview it is quite interesting.
And I really like that it seems to be moving toward a more language-agnostic API and better tooling support.
OpenCL will become a library that is implemented by interacting with the GPU via the Vulkan API, and as an application developer you'd use OpenCL. Or perhaps you'd never again use OpenCL, because someone else develops a better library.
Perhaps someone creates a Haskell library -- which talks to the GPU via the Vulkan API -- that implements a map function that executes in parallel on the GPU. In that case you wouldn't use OpenCL or Vulkan directly at all.
If I understood correctly, for graphics at least, Vulkan gives great flexibility to application developers, letting them decide how to manage multithreading. If developers care about high efficiency (and games usually do need it), they'll use it.
Of course, if developers are using third party engines which already handle it for them, they might not need to deal with Vulkan directly. But engine developers would have to in that case.
By the way, it looks like Vulkan doesn't get rid of single thread bottleneck for submitting commands to the GPU: https://news.ycombinator.com/item?id=9140849
It doesn't look competitive in comparison with DX12 and Mantle.
With Metal and Mantle, it was clear that graphics programmers wanted to use GPUs at lower levels of abstraction to more efficiently utilize the design of GPUs, which have a much different architecture than a decade ago. Without a corresponding option that is standards-compliant, low-level APIs threaten to fragment GPU programming.
One organization I didn't see any mention of, but that seems to have some future role (albeit a completely uncertain one), is the Heterogeneous System Architecture Foundation, spearheaded by AMD and having seemingly every chip-maker except Nvidia and Intel on board.
As GPU stream processors get more CPU-like while remaining natural at parallel processing, it is only a matter of time before we get the right abstractions: a map gets scheduled across many stream processors and reduced by a CPU with higher single-thread performance, and suddenly computer vision and many other naturally parallel data-synthesis workloads are programmed in a less heterogeneous software environment. They'd execute on chips where the stream processors and CPUs share a large amount of commonality, possibly down to micro-op compatibility, through something Nvidia could be pushing towards in their Denver architecture (as yet highly speculative).
To somewhat less than enthusiastic coverage, Nvidia has been building up their partnerships with automakers like crazy. Computer vision is obviously one of the applications that will be required in self-driving cars. Tango and other projects are also quiet beneficiaries of Nvidia tech maturing into the Tegra platform.
We have far from conquered programming and CPU design just because JITs are good, 8GB of RAM is expected, or GPUs can mine MHashes/s, etc. We're in some future's bad old days. The idea that Khronos is involved in the unification of graphics and compute APIs as well only makes the exciting question of who will drive our cars more intriguing.
I haven't finished reading the PDF yet, but it looks like LLVM, but for hardware.
SPIR-V looks to be a completely new format, not evolved from previous versions of SPIR. This new version is being described as a "fully specified Khronos-defined standard", which is great to hear.
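As a small illustration of how "fully specified" plays out: the first five 32-bit words of every SPIR-V module are fixed by the spec (magic number 0x07230203, version word, generator ID, ID bound, a reserved zero), so even a trivial C program can sanity-check a module header. A sketch:

    #include <stdint.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s module.spv\n", argv[0]);
            return 1;
        }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        uint32_t header[5];   /* magic, version, generator, bound, zero */
        if (fread(header, sizeof(uint32_t), 5, f) != 5) {
            fclose(f);
            return 1;
        }
        fclose(f);

        if (header[0] != 0x07230203u) {   /* spec-defined magic number */
            fprintf(stderr, "not a SPIR-V module\n");
            return 1;
        }
        printf("SPIR-V version word 0x%08x, ID bound %u\n",
               (unsigned)header[1], (unsigned)header[3]);
        return 0;
    }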
Gee, I wonder what happened in June 2014 to inspire this....
You're thinking of Metal.
It's not really a massive issue when you're locally installing programs since you've already compromised your system in the first place. But when leveraged over the web with APIs such as WebGL, it does become a security vulnerability.
Though I should note I doubt this will really save much; it's only a cost savings on paper. (Unless, as usual, I'm wrong on something.)
Also, for something as low-level as this API sounds, it probably mapped decently, while still requiring another set of primitives for processor and memory control.
That is, I think the point of this API is to be a subset that can remain a bit more common between platforms, not an all-inclusive API that rules everything.