Hear, hear! Some of my least favorite things to search for and research are matters related to "R", "C", "dock", "boost", "Go"... I never had any such problems with "erlang", "numpy", or "gromacs".
Please do not underestimate the critical nature and endless annoyance associated with naming things after common concepts (or, worse yet, single characters).
Higher level, I guess. Higher level than CUDA or OpenCL. But Accelerate is pretty cool. I've been meaning to learn Haskell, so I might take a closer look at it sometime. We were aiming for a different niche. Vector is basically just "CoffeeScript for CUDA", in that most of the syntax is a one-to-one mapping, but annoying details like memory management are abstracted away.
You know what's worse than CUDA or OpenCL? General purpose algorithms implemented in a graphics API.
I read a lot of GPGPU papers at university, and I could never understand the older ones, which described algorithms by mapping everything to graphics elements and computed the solutions as a side effect of rendering something.
Next to that, understanding an algorithm implemented in CUDA is a breeze.
I will preface this by saying I'm a C programmer at heart.
CUDA and OpenCL demand a depth of understanding of both C and how your code is executed on many-core processors, but I wouldn't call either of them terrible.
I would, however, very much like to see a widespread higher-level API for doing compute on GPUs, if only to encourage people to understand the lower-level details.
I didn't say at all that C is bad; as a matter of fact, I write most of my code in C. It is, however, a low-level programming language, not much different from assembly.
The reason is that it operates on concepts that are not abstract but specific to register machines. A high-level language completely abstracts the underlying architecture so the code can be executed in any possible environment by means of translation, be it a $10MM Cray, a cellular automaton, or a mechanical computer. In essence, a high-level language provides an abstract notation for computation.
C, however, is not abstract at all. Variables? How would those map to a dataflow computation? Pointers? They would not work beyond register CPUs (e.g., translating C programs to JavaScript is very cumbersome as a result and usually leads to emulating memory as one big array). Fixed-width types? Volatile pointers? Returns from the middle of a procedure? Gotos? Come on, how are those even going to run on a non-conventional architecture?
C also does not completely define the language semantics, leaving certain operations implementation-specific or undefined. It also (before C11) didn't define any memory model, making it impossible to even describe an algorithm that depends on specific properties of memory accesses in a way that is portable across different register machines. C algorithms are close to impossible to translate to run on memory-less computation devices, as the whole concept of C is based around having local memory and a stack with certain properties (unless one wants to emulate a register machine; see the JavaScript remark above).
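To make that concrete, here are a few perfectly legal-looking lines whose behaviour the standard leaves implementation-defined or undefined (a generic illustration of my own, nothing specific to any particular compiler or to Vector):

    #include <stdio.h>

    int main(void) {
        /* Implementation-defined: the width of 'int' and the result of
           right-shifting a negative value vary between implementations. */
        printf("%zu\n", sizeof(int));   /* 2? 4? 8? The standard doesn't say. */
        printf("%d\n", -1 >> 1);        /* arithmetic or logical shift? */

        /* Undefined behaviour: signed overflow and unsequenced modification.
           The compiler is allowed to assume these never happen.
           int i = INT_MAX; i = i + 1;
           int j = 0;       j = j++ + 1; */
        return 0;
    }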
In certain areas this has led to ugly solutions like CUDA, where the language looks like C but the semantics are completely different, and GPUs are the closest thing to the original C target that you can get.
C is nowhere near as low level as you can get. C-- is closer. LLVM IR is even closer. But really, the lowest level you can get is the assembly of the architecture you're running on.
At the time of its creation, the general consensus was that C was too high level, that it abstracted away the actual workings of the code.
Since then we've been introduced to high-level languages that have made us re-evaluate what it means to be low or high level. But make no mistake, C is... at least medium-level!
At the risk of being pointlessly pedantic: assembly is surely not the lowest level relevant to the topic at hand. When programming for very high performance, you usually need to consider the microarchitecture you are targeting.
Hmm. Most "super-high-performance" projects I've seen find they can get more bang for the buck by switching to a different microarchitecture (FPGA etc) or exploiting parallelism (buy ten computers), not so much optimizing the machine code.
While it makes CUDA more readable, I feel like the time taken to write code in this language will be very close to the time taken to write actual CUDA code, for someone who is experienced with it.
Maybe not faster to write, but it'd be less repetitive. I've written a bit of CUDA code, and having to put in a bunch of cudaMemcpy calls everywhere got pretty old. Also, reduce is pretty annoying to implement properly, and I'd rather not have to do it again for every possible reducing function.
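For anyone who hasn't written CUDA, the boilerplate I mean looks roughly like this (a generic sketch, not code from the Vector repo):

    // The allocate / copy in / launch / copy out pattern that a higher-level
    // language can hide. Error checking omitted for brevity.
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void square(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * in[i];
    }

    int main(void) {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *h_in = (float *)malloc(bytes), *h_out = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) h_in[i] = (float)i;

        float *d_in, *d_out;
        cudaMalloc((void **)&d_in, bytes);
        cudaMalloc((void **)&d_out, bytes);
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);    // copy in

        square<<<(n + 255) / 256, 256>>>(d_in, d_out, n);          // launch

        cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);   // copy out
        printf("%f\n", h_out[3]);

        cudaFree(d_in); cudaFree(d_out); free(h_in); free(h_out);
        return 0;
    }

And that's before getting to reduce, which done properly needs shared memory, per-block synchronization, and a log-step combining loop.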
Very interesting! I'm also implementing a programming language for my undergrad dissertation (but specifically for agent based simulations).
The thing that struck me most about vector was the radically different for loops (compared to C). I'm assuming you're purposefully crippling them to make parallelisation easier? Or is there another reason?
EDIT:
One other thing - the website fails to scroll nicely on a Mac (in Chrome). I had to manually use the scroll bars instead of being able to two-finger swipe...
Hey thanks. I've actually already accepted a full-time offer from Amazon, so I'm not looking around anymore. My teammates have all accepted full-time offers from other companies as well.
When I was about your age I joined IBM for 6 years. The work was great and I liked everything I did there (well, at least for the first 3 years). In hindsight, though, I realize that I basically wasted those years.
Not that I know of - but there was an odd iframe on top of the page that stopped scrolling from working. When I got rid of it, it started working again. odd...
I'm wondering about the timings on page 36 of vector.pdf; those can't be seconds, or it would be way too slow. (I've written a program[1] to calculate the Mandelbrot set on the CPU with SIMD optimizations and SMT support; on my ageing laptop with a Core 2 Duo it calculates the start set in about 0.07 seconds.) It would be interesting if you provided the pure C program that was used for the timings, as then I could get a real grasp of the performance of the GPU variant.
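For context, the per-pixel escape-time loop being benchmarked is roughly the following (a generic scalar sketch, not the SIMD code from [1]):

    #include <stdio.h>

    /* Escape-time iteration for one pixel of the Mandelbrot set. */
    static int mandel(double cr, double ci, int maxdepth) {
        double zr = 0.0, zi = 0.0;
        int k = 0;
        while (k < maxdepth && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;   /* z = z*z + c */
            zi = 2.0 * zr * zi + ci;
            zr = t;
            k++;
        }
        return k;   /* iteration count at escape (or maxdepth) */
    }

    int main(void) {
        /* One pixel near the centre of the view real=-2..2, imag=-1.6..1.6. */
        printf("%d\n", mandel(-0.5, 0.0, 200));
        return 0;
    }

A SIMD version essentially runs several pixels through this loop in lockstep, masking out the ones that have already escaped.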
You can find the benchmarks in the "bench" directory of the git repo. The CPU code we generate for the benchmark is not particularly optimized and is completely single-threaded (so not really a fair comparison).
Oh well, this is embarrassing. I rewrote the CPU benchmark in C and it does indeed perform much faster. I think it has something to do with the use of the CUDA complex number functions. Unfortunately, I do not have my desktop with the GPU set up to recompute the GPU numbers.
I'm getting the following when running "vagrant up"; this is on Debian.
$ vagrant up
/home/chrishaskell/src/vector/Vagrantfile:7:in `<top (required)>': undefined method `configure' for Vagrant:Module (NoMethodError)
from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:115:in `load'
from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:115:in `block in procs_for_source'
from /usr/lib/ruby/vendor_ruby/vagrant/config.rb:41:in `block in capture_configures'
from <internal:prelude>:10:in `synchronize'
from /usr/lib/ruby/vendor_ruby/vagrant/config.rb:36:in `capture_configures'
from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:114:in `procs_for_source'
from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:51:in `block in set'
from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:45:in `each'
from /usr/lib/ruby/vendor_ruby/vagrant/config/loader.rb:45:in `set'
from /usr/lib/ruby/vendor_ruby/vagrant/environment.rb:377:in `block in load_config!'
from /usr/lib/ruby/vendor_ruby/vagrant/environment.rb:392:in `call'
from /usr/lib/ruby/vendor_ruby/vagrant/environment.rb:392:in `load_config!'
from /usr/lib/ruby/vendor_ruby/vagrant/environment.rb:327:in `load!'
from /usr/bin/vagrant:40:in `<main>'
If you post the generated C code then I'll give the timings and try to compare what it's doing differently.
The CPU I'm using (Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz) was released in July 2006 [1]. The GPU you're using was released on 15 June 2007 [2]. My CPU code calculates the 1246x998 pixel image of the zoomed out view (real=-2..2, imag=-1.6..1.6, maxdepth=200) in 0.07 seconds, if your GPU code does about the same in 0.61 sec, then that's about 8 times slower than the slightly older CPU can do with hand optimized C code. That wouldn't be such a pretty result yet :)
I've redone the CPU benchmark in C and run the CPU and GPU benchmarks on an EC2 G2 instance. The blog post has been updated with the corrected results.
Nothing against this particular language, but... I feel like there is a new language at least every day. It would seem that this does more harm than good to the developer community's progress. Of course, languages need to be iterated on in addition to the programs they compose. But, there is now such a large spread of similar languages that it necessarily slows the development of the most productive ones by blurring/resetting the focus constantly. Many technical problems can be solved with existing languages, rather than eliciting the distraction of a brand new language. Though, in this case, there is perhaps a clear purpose for the specialization of the language. There is certainly a benefit to new languages that offer truly new concepts or optimizations.
I actually welcome new languages, even if some may be buggy and lacking in features. Most will likely end up lost or unused, but the knowledge gained from developing them spreads out into the industry. That said, it is distracting if you try to follow every new trend. Like you said, there is already a good number of options available; no need to learn each and every new language. Though it is fun to download them on a Saturday and learn about the ideas the creator(s) had in mind when developing them.
I agree. This was just a class project, and I don't plan on continuing development. These features would be a lot more useful rolled into existing programming languages.
Hey that's pretty cool, and would probably make OpenCL usable by mere mortals. One improvement that I see you could borrow from vector is getting rid of this explicit copying business. Take a look at the array implementation in our runtime library.
Basically, the VectorArray class contains both the host array pointer and the device array pointer. There are also two boolean flags, h_dirty and d_dirty. When you modify array elements on the host, h_dirty is set to one. Then, when you run a kernel, the data is copied to the device if h_dirty is set, h_dirty is cleared, and d_dirty is set. When you try to read an array element again on the CPU, the data is copied from device to host if d_dirty is set, and d_dirty is then cleared.
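In rough C++/CUDA terms, the idea is something like this (a simplified sketch of the scheme just described; the class and flag names follow the comment above, but method names like device_ptr_for_kernel are my own, and the real runtime differs in details):

    // Lazy host<->device synchronization via dirty flags.
    // Illustrative sketch only; error handling omitted.
    #include <cuda_runtime.h>
    #include <cstddef>

    template <typename T>
    class VectorArray {
        T *host;        // host copy
        T *device;      // device copy
        size_t n;
        bool h_dirty;   // host copy is newer than device copy
        bool d_dirty;   // device copy is newer than host copy

    public:
        explicit VectorArray(size_t n) : n(n), h_dirty(false), d_dirty(false) {
            host = new T[n];
            cudaMalloc((void **)&device, n * sizeof(T));
        }
        ~VectorArray() { delete[] host; cudaFree(device); }

        // Host write: mark the host copy as the newer one.
        void set(size_t i, T v) { host[i] = v; h_dirty = true; }

        // Host read: pull from the device first if the device copy is newer.
        T get(size_t i) {
            if (d_dirty) {
                cudaMemcpy(host, device, n * sizeof(T), cudaMemcpyDeviceToHost);
                d_dirty = false;
            }
            return host[i];
        }

        // Called before a kernel launch: push to the device if the host copy
        // is newer, and assume the kernel will modify the device copy.
        T *device_ptr_for_kernel() {
            if (h_dirty) {
                cudaMemcpy(device, host, n * sizeof(T), cudaMemcpyHostToDevice);
                h_dirty = false;
            }
            d_dirty = true;
            return device;
        }
    };

The upshot is that data moves at most once per direction between accesses, and code that stays entirely on the host or entirely on the device never pays for a transfer.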
I hear this same refrain about Linux distros. Personally, I think there's a lot of merit to the proliferation of languages. Languages don't just change the way we write code, they change the way we think. A programming language monoculture would lead to a thinking monoculture and that would be disastrous for innovation.
We agree. Language bloat is cacophonous. That's why we don't call ArrayFire a language. It's just a library with compatibility for existing languages, e.g. C, C++, Fortran.
* Higher-order functions - been in ArrayFire since 2009
It's always interesting to watch other people reinvent the wheel. It takes a lot of talent, though. If the people behind this want an awesome opportunity to join our team (where we live this stuff every day and have developed a great culture and customer focus), give me a holler. Find me at http://notonlyluck.com
It's interesting how much startups tend to talk about how great their culture is. Can you elaborate on this 'developed culture'? I am really curious and hoping for a real response, not fluff.