Limited in that you can only have one tree per whiteboard, though.
Fluid flow, heat transfer, and other such physical phenomena that you might want to simulate.
Phase correlation in image processing is another example. (https://en.wikipedia.org/wiki/Phase_correlation)
Molecular dynamics (MD) simulations rely on FFTs, but I'm not sure how much is typically (or can be) done on the GPU. For example, NAMD employs cuFFT on the GPU in some cases. (https://aip.scitation.org/doi/10.1063/5.0014475)
In general, Vulkan commands the GPU but is not opinionated about the language used to write kernels, as long as it compiles to SPIR-V. SPIR-V itself is something like a parallel LLVM IR. If you look at the project source, the shaders are written in GLSL and pre-compiled to SPIR-V with a cross-compiler. The C file in the project root serves as the loader program for the SPIR-V files.
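For the curious, the loader step amounts to roughly this (a minimal sketch, not the project's actual code; `fft.spv` is a hypothetical file produced offline with, e.g., `glslangValidator -V fft.comp -o fft.spv`):

    #include <stdio.h>
    #include <stdlib.h>
    #include <vulkan/vulkan.h>

    /* Read a pre-compiled SPIR-V binary and wrap it in a
     * VkShaderModule; `device` is an already-created VkDevice. */
    VkShaderModule load_spirv(VkDevice device, const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) return VK_NULL_HANDLE;
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        uint32_t *code = malloc(size);   /* SPIR-V words, 4-byte aligned */
        fread(code, 1, size, f);
        fclose(f);

        VkShaderModuleCreateInfo info = {0};
        info.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
        info.codeSize = (size_t)size;    /* size in bytes */
        info.pCode = code;
        VkShaderModule module = VK_NULL_HANDLE;
        vkCreateShaderModule(device, &info, NULL, &module);
        free(code);
        return module;
    }

From there it's the usual Vulkan compute dance: pipeline, descriptor sets, dispatch.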
The Futhark project did some initial benchmarks on translating OpenCL to Vulkan. The results were mostly slowdowns. You can read about it here: https://futhark-lang.org/student-projects/steffen-msc-projec...
"OTOY | GTC 2020: Real-Time Raytracing, Holographic Displays, Light Field Media and RNDR Network"
They moved to OptiX 7 as the backend.
> Sadly, OctaneRender has moved away from Vulkan to CUDA, because they found that Vulkan compute wasn't at the level they wanted.
They mostly talk about Vulkan+CUDA interop, which isn't really supported, and they explicitly said they are considering rewriting everything in Vulkan to get rid of this issue. So from what I understand, they are still pretty bullish on Vulkan, but it will require a lot of work and will take some time (“but probably won't be this year”).
Maybe you're right, and this guy is wrong. But you gotta admit that contradicting the speaker of the video you're using as a source is unusual to say the least.
Sorry if I don't spend 2h of my time watching a video presenting the features included in the new version of a piece of software I will never use, just so we have the same knowledge about this product. You generously posted a link to a part of this video relevant to our discussion and I'm thankful for that, but I'm also puzzled because the speaker contradicts you in this very snippet you chose!
There are no error bars on the graphs, so it's very hard to judge whether the minor differences are significant. I work in research, so I'm probably picky about this point, but I'd expect better from anyone who's taken basic statistics. From a quick look, though, performance seems to be pretty much on par.
It would also be nice to know how it performs on other hardware. I'm assuming it's tuned for Nvidia GPUs (or maybe even the specific GPU mentioned). But how does this perform on Intel or AMD hardware? How does it compare to `rocFFT` or Intel's own implementation?
I have tested VkFFT on an Intel UHD620 GPU and performance scaled at the same rate as in most benchmarks. There are a couple of parameters that can be tuned for different GPUs (like the amount of memory coalesced per transfer, which is 32 bytes on Nvidia GPUs after Pascal and 64 bytes on Intel). I have no access to an AMD machine, otherwise I would have refined the launch configuration parameters for it too. I have not tested libraries other than cuFFT yet.
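To illustrate why that parameter matters (a toy model, not VkFFT's actual API): the number of memory transactions a thread group generates depends on how many consecutive bytes the hardware services per transaction, so a width tuned for one vendor wastes transactions on another:

    #include <stdio.h>

    /* Toy estimate: transactions issued when `threads` threads each
     * read `bytes_per_thread` consecutive bytes, given a hardware
     * coalescing width in bytes. */
    static unsigned transactions(unsigned threads, unsigned bytes_per_thread,
                                 unsigned width) {
        unsigned total = threads * bytes_per_thread;
        return (total + width - 1) / width;  /* ceiling division */
    }

    int main(void) {
        /* 32 threads each loading one 8-byte complex float */
        printf("32-byte width (Nvidia): %u transactions\n",
               transactions(32, 8, 32));
        printf("64-byte width (Intel):  %u transactions\n",
               transactions(32, 8, 64));
        return 0;
    }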
Also, I should've said this in my first post, which in hindsight might have sounded too negative: I think this is a cool project and you did a great job! I just thought this might improve the presentation of your results a bit.
You'd think that, but I found that all the GPUs I'm using here exhibit a multimodal distribution of FFT execution times (this is for the cuFFT codepath). The GTX980 (not shown in the plot) and the Titan-X even have very prominent outliers. This is a figure that's going to be in the paper I'm currently writing:
I'm comparing the OCT processing execution times (with HOT caches, mind you) between a Titan-X and a GTX1080. The difference also shows up very prominently when looking at the kernel scheduling order as reported by NVVP.
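In case anyone wants to reproduce this kind of measurement: per-launch timings can be gathered with CUDA events. A minimal sketch (not our actual benchmarking harness; `launch` stands in for whatever enqueues the kernels):

    #include <cuda_runtime.h>

    /* Time one enqueued operation with CUDA events; run this in a
     * loop and histogram the results to see the multimodal
     * behaviour described above. */
    float time_launch_ms(void (*launch)(void)) {
        cudaEvent_t start, stop;
        float ms = 0.0f;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start, 0);
        launch();                     /* enqueue the kernel(s) on stream 0 */
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);   /* wait for the GPU to finish */
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }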
Find our original publication here: https://doi.org/10.1364/BOE.5.002963
Since then we've improved on that. For the resampling and complex tonemapping we determined empirically that a grid of 128 threads, each processing a whole line, achieves the best throughput; there's a 2D parameter space of possible launch configurations and we brute-force the whole thing (so far I haven't benchmarked the RTX20xx and RTX30xx GPUs, but it was consistent from the GTX690 through the GTX1080). The FFT plan is what cufftPlan1d produces for a single-axis transform over a 2D array - usually a 2048-point FFT, but with up to 4096 lines (well, technically whatever the maximum dimension for 3D textures is).
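For concreteness, such a batched plan looks roughly like this (the sizes match what's described above; the function names are just for illustration):

    #include <cufft.h>

    /* One plan covering the whole 2D array: a 2048-point C2C
     * transform per line, 4096 lines in a single batch. */
    cufftHandle make_plan(void) {
        cufftHandle plan;
        cufftPlan1d(&plan, 2048, CUFFT_C2C, 4096 /* batch */);
        return plan;
    }

    /* In-place forward transform on device data previously filled
     * by the resampling kernel (`data` is a device pointer). */
    void run_fft(cufftHandle plan, cufftComplex *data) {
        cufftExecC2C(plan, data, data, CUFFT_FORWARD);
    }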
> Do you launch a big grid that consists of multiple samples combined in a matrix
> or you launch each sample separately?
Of course not - launching each sample separately would be stupid.
Personally, I would have a hard time hiring anyone without a GitHub account, and an even harder time working in a place where nobody has one.
Despite that, my statement remains true: I do, in fact, judge candidates more favorably for having a GitLab account than a GitHub account. It demonstrates conscious choice in a knee-jerk world.
(Microsoft doesn't need your assistance, boiz.)
Must be a blast working there...
That said, I have never nixed anybody for being only on GitHub. And, thank you, it is a blast, but mainly because we don't have meetings.
I just link my GitLab very obviously on my GitHub
Seems a bit more feature-complete than my take on the problem: https://github.com/Lichtso/VulkanFFT
Still, a lot is missing before Vulkan can beat CUDA: Scan, Reduce, Sort, Aggregate, Partition, Select, Binning, etc.
A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.
Because such a work falls outside the scope of the license, the LGPL isn't viral in the way the GPL is: code you write can use an LGPL-licensed library yet be licensed differently itself.
However, you (the developer) are still subject to various legal requirements if you use an LGPL-licensed library! If you fail to meet those requirements (basically, allowing relinking against a modified version of the library), you are in violation of the license and (in most jurisdictions) subject to penalty under copyright law.
> However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.
That certainly does have consequences for what you can do with the software - the object code of your compiled program will include parts of the library.
As far as I understand it, proprietary code using an LGPL-licensed library is more or less incompatible with templates in header files (!!!), since there's no way (AFAIK?) to relink against a modified version of the library without providing your full source code. Supposedly LGPLv3 provides an exception for header files, but personally I wouldn't go anywhere near it because it seems quite vague - "small" macros, templates of fewer than ten lines (what constitutes a line?), etc.
So as currently licensed (LGPL), I don't think your library is usable as part of a proprietary project.
The MPL, in contrast, places no relinking requirements on the developer. You only have to share any changes you happen to make to the MPL licensed code.
Ask anybody who actually contributes, usefully, what they think. Their opinion might mean something.
Sometimes LGPL is chosen out of confusion, sometimes symbolically. Whatever legal demands it does or doesn't impose, it would be rude not to honor its intent.
It is a library.
>(1) If you statically link against an LGPLed library, you must also provide your application in an object (not necessarily source) format, so that a user has the opportunity to modify the library and relink the application.
This is covered in sections 6 and 6a of the LGPL 2.1.
MPL doesn't protect such developers, but it "forces" contribution of changes back to the original code. In reality, it's ignored.
A header-only library cannot be compiled as a separate library, so the relinking requirements of the LGPL cannot be met by proprietary code.
What about bigger than big, say > 2^29 or so? Are these sizes for double precision?
A free GPU FFT implementation will certainly help! Great work.
If your entire stack lives on the GPU and you're just reading out the result, this is trivial.
If you're constantly copying buffers back and forth because some effects are implemented on the CPU and some on the GPU, not so much!
It's probably the case that a full stack GPU implementation would blow what we have out of the water, but you'd lose your entire ecosystem in the process, so it's probably never going to happen.
But even if that is not the case, machine learning is making its way into music production tools. No doubt a beefy GPU will be useful to a lot of music production professionals in the future, as the tools they use begin to leverage ML more and more.
The time budget to refresh a video frame is about 8ms at 120Hz if everything else came free; in practice it's closer to 4ms. So even under close-to-worst-case conditions, that's about the delay of sound traveling a bit over a meter (sound covers roughly 0.34m per millisecond) - should be fine for a lot of real-life applications.
How long the deadline is depends on your buffer size and sample rate. To my ear, buffer sizes above 128 samples (at a sample rate of 44.1kHz) have detectable latency (though the total latency also depends on how many applications are in your signal chain). At 128 samples you have just under 3ms to do your processing: 128 / 44100 ≈ 2.9ms.
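The arithmetic, for a few common buffer sizes:

    #include <stdio.h>

    /* Per-buffer processing deadline = buffer_size / sample_rate. */
    int main(void) {
        const double rate = 44100.0;
        const int sizes[] = { 64, 128, 256, 512 };
        for (int i = 0; i < 4; i++)
            printf("%4d samples -> %.2f ms\n",
                   sizes[i], 1000.0 * sizes[i] / rate);
        return 0;   /* e.g. 128 samples -> 2.90 ms */
    }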
Also note that for graphics, the output device is the GPU itself, so you don't need to wait for the result to move back to the CPU - it's already where it needs to be.
> PCI-E 3.0 standard guarantees data transfer for 4 kb data with 1-2 μsec (3-10 round trip).
> size: 8192 bytes, time: 4.72 us,
This should not be a meaningful impact in any audio workflow.
And it's not like this is ABS firmware, where someone might die if you miss your deadline, and where hard real-time OSes are used. But you do get glitches in the audio stream, which in performance contexts is still pretty bad.
Windows and macOS have soft-realtime scheduling, and the software mentioned does what it can without guarantees: https://help.ableton.com/hc/en-us/articles/209072289-How-to-... I've been to a live gig where the performance was paused a few times because the artist's Mac was misbehaving.
There are some methods of synthesis that rely on the FFT and can't really be done well in real time on the CPU (PadSynth, PaulStretch); I'm hoping this will help with those.