So on a tangent, does anybody have a nice article on numerical methods with Haskell? I've used Haskell for other things, but I've never endeavored to try doing any sort of heavy number crunching because it seemed a bit weird.
My standard test is implementing GMRES. I gave it some thought once and I couldn't come up with a non-awkward way to do it in Haskell. Mostly I get caught up on how to do the Arnoldi process as it requires lots of manipulation of submatrices. I couldn't ever seem to find a library that was nice for matrix manipulation. What do people mostly use? Also, what's the idiomatic way to do nitty-gritty matrix operations? Entrywise manipulation of a matrix doesn't seem very Haskell-ish to me, but it's not something you can really avoid in this application.
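For what it's worth, hmatrix is the library most people seem to reach for when they want dense linear algebra, and the idiomatic style is to express the inner loops as whole-vector operations and folds rather than entrywise indexing. Here's a rough sketch of the Arnoldi process with modified Gram-Schmidt, assuming hmatrix's Numeric.LinearAlgebra API (no breakdown handling, Hessenberg entries dropped for brevity):

    import Numeric.LinearAlgebra

    -- Build k orthonormal Krylov basis vectors for A and starting vector b
    -- (assumes k >= 1 and no breakdown, i.e. norm_2 never hits zero).
    arnoldi :: Matrix Double -> Vector Double -> Int -> [Vector Double]
    arnoldi a b k = go [unit b] (k - 1)
      where
        unit v  = scale (recip (norm_2 v)) v
        go qs 0 = reverse qs
        go qs n =
          let w  = a #> head qs                          -- expand the Krylov space
              -- modified Gram-Schmidt: orthogonalise against each previous q
              w' = foldl (\v q -> v - scale (q <.> v) q) w qs
          in go (unit w' : qs) (n - 1)

In GMRES you would also keep the coefficients (the q <.> v terms) to assemble the small Hessenberg matrix, and the least-squares step then only ever touches that small matrix, so the entrywise fiddling stays fairly contained.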
I've been slowly working on a nice library in that space for the past few years. It's not quite there yet, but I'll be spending a bit of time this summer getting it into shape.
Cool. I feel like it's an area that's kind of under-served (or that I'm ignorant of). Haskell has a much more mathy flavor than a lot of the more traditional scientific languages and it seems to me like it should be really great for implementing algorithms based on their mathematical description, but I haven't really figured out the details yet. I've used Haskell in other projects, but this application eludes me.
Really nice. As a physicist, FPGAs look really tempting for speeding up calculations, though they have to compete with GPUs and the toolsets around them. CLaSH really looks promising; I don't think many physicists would touch VHDL or Verilog.
I don't know how practical that is; FPGAs are very efficient, but they won't come close to the raw computational power of a GPU. Many moons ago, the dream was to have a configurable FPGA in every computer, but GPUs ended up winning out due to games, among other reasons.
Adapteva [0] is a startup doing cool things with multi-core parallel coprocessors. From their interview on The Amp Hour [1], it seems like their target niche is parallelisable loads that aren't I/O constrained, such as computer vision and video processing. Chunks of a video frame don't need to talk to each other.
You have to take into account that Titans are super cheap compared to FPGAs. Big FPGA boards for HPC can easily cost between $5k and $10k. If you compare with GPUs in the same price range, you end up with a K40 or K80, which peak at 4.3 and 5.6 TFLOPS SP respectively, much higher than Stratix 10. Moreover, FPGAs are not really good at double-precision floating point, which is important in many HPC areas.
At the end of the day, the important metric is FLOPS/$, and more importantly what you can actually achieve for your application given the tooling and ecosystem. Many scientists are not computer science experts, and many HPC codes are legacy simulations which can be hard to port and re-validate.
In my experience, FPGAs are still a niche accelerator compared to GPUs. And I am not even talking about future Xeon Phi generations.
And of course, when talking about HPC you should not forget the elephant in the room: standard Xeon...
My work aims to remove the requirement of being a computer scientist in order to leverage the power of an FPGA, by using Haskell/CLaSH as an HDL that is close to mathematics.
Furthermore, verification of the designs is simplified a lot: you can check them directly in Haskell instead of generating VHDL testbenches and then running an additional simulator tool.
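As a minimal sketch of what I mean, assuming a recent clash-prelude and a made-up saturating-adder design, the "testbench" is just ordinary Haskell you can load in GHCi:

    {-# LANGUAGE DataKinds #-}
    module SatAdd where

    import Clash.Prelude

    -- Hypothetical combinational design: an 8-bit saturating adder.
    -- Unsigned arithmetic wraps around, so a wrapped sum signals overflow.
    satAdd8 :: Unsigned 8 -> Unsigned 8 -> Unsigned 8
    satAdd8 a b = if s < a then maxBound else s
      where s = a + b

    -- The checks are plain Haskell values; no VHDL testbench, no extra simulator.
    checks :: Bool
    checks = satAdd8 3 4     == 7
          && satAdd8 200 100 == maxBound
          && satAdd8 255 1   == maxBound

Sequential designs can be exercised the same way through simulate/sampleN from Clash.Prelude, which run the Signal-level model as ordinary Haskell lists, and the same Haskell source is what CLaSH later compiles down to VHDL or Verilog.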
Lastly, I hope that with Intel's recent acquisition of Altera, some of the other issues you mentioned (mainly floating-point performance), along with some of the tooling issues, will be addressed as well.
I understand that, and having a DSL is definitely a good idea, but you need to create a community, which can be hard (and NVIDIA seems to be good at it). I didn't mean to be misleading; I just wanted to highlight that it is not just a matter of peak FLOPS (in fact it never is; as an engineer working on another niche accelerator I know that all too well ;) ).
Well, top-of-the-range FPGAs are priced two orders of magnitude above top-of-the-range GPUs, so in terms of Tflop/$ GPUs will win in many cases.
Given that for most projects the power costs are much higher than the upfront equipment costs, and that the dominating factor in computational density and interconnect speed is the thermal budget, I'd think it's really a question of Tflop/watt.
The cost of each watt of grid power for 2 years is about $1. A high-end GPU costs $3000 and burns 200W, so power costs $200 over 2 years, or 6% of the total cost. I can't think of any high-performance computing semiconductors costing less than $1 / watt.
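A quick sanity check on those numbers, as a throwaway Haskell snippet (the $0.06/kWh grid price is my assumption; the $3000 and 200 W figures are the ones above):

    -- Back-of-the-envelope check; the electricity price is an assumption.
    main :: IO ()
    main = do
      let pricePerKWh = 0.06                              -- USD/kWh, assumed
          perWatt2yr  = 2 * 24 * 365 * pricePerKWh / 1000 -- ~1.05 USD per watt
          gpuPrice    = 3000                              -- USD, from above
          power2yr    = 200 * perWatt2yr                  -- ~210 USD for a 200 W card
          share       = power2yr / (gpuPrice + power2yr)
      putStrLn ("2-year power cost per watt: " ++ show perWatt2yr ++ " USD")
      putStrLn ("Power share of total cost:  " ++ show (100 * share) ++ " %")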
What systems can you point to where the power cost over the time-to-obsolescence exceeds capital cost? Besides Bitcoin mining.
You're correct; I had received wrong information and never really thought it through with regard to high-end systems.
Just about the only systems where it makes sense to talk about the power budget being relevant are where you're going in on base commodity systems and talking about a 3-4 year cycle time. (And maybe weird cases where we're having to meet power budgets of existing deployments.)
I was comparing situations that had already made a FLOPS/dollar decision because of constraints on other resources (cheap hardware, lots of it, tons of storage, high sync latency), so I guess that both falls outside traditional HPC and makes power a secondary concern.
In terms of power, there is a largely unexplored yet very interesting world of mobile GPUs. Project "Mont Blanc" is about to dig into this area: http://www.montblanc-project.eu/
But, yes, I'm very enthusiastic about the Altera acquisition by Intel, it may drive prices down and we'll probably see FPGA-enhanced Xeons soon.
Computational power is not always the most important thing, and GPUs' computational power is severely limited to a very specific class of problems.
In physics it's important, for example, to implement very low-latency event triggers (at least in high-energy physics), and FPGAs are a bliss here.
FPGAs are also great for a wide class of memory-throughput-bound problems, the ones that suck badly on GPUs (even on those with a proper local memory). There are dozens to hundreds of independent block RAMs on an FPGA, which allows a degree of parallelisation that is never possible with GPUs.
Don't forget that FPGAs are good at different things than GPUs.
GPUs work well for massively parallel problems. But FPGAs work well for mostly serial problems, and for problems that have short parallel sections interleaved with serial sections.