
GRVI Phalanx joins The Kilocore Club - geolqued
http://fpga.org/2017/01/12/grvi-phalanx-joins-the-kilocore-club/
======
vvanders
Do I dare even ask how long the timing and routing pass took on 1680 cores?

~~~
nickpsecurity
Can't the tools do it relatively fast with a geometric method if the
individual cores already have area/timing data to use and are homogeneous? And
it's an FPGA instead of an ASIC?

Reading various papers on synthesis as a non-hardware guy made me think this
job shouldn't be as hard as it is for SoCs whose components vary considerably
in their individual attributes.

~~~
jsgray
I wish it were so. While it is straightforward to do regular placement at the
block level or even at the individual LUT/slice level using RPMs (relatively
placed macros) or absolute LOC placement of LUTs in the XDC implementation
constraints file, most of the implementation time goes into routing, and there
is no easy mainstream way to take a routed one-tile design and step-and-repeat
it (say) 210 times across the die. In part this is due to non-homogeneity
across the columns, and sometimes the rows, of the chip.
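To make the placement half concrete, here is a minimal sketch that generates absolute LOC constraints for a grid of identical tiles. The cell names (`tile_N/cell_M`), the tile pitch, and the 15 x 14 grid (210 tiles) are all hypothetical, picked only for illustration. It ignores the column non-homogeneity mentioned above and does nothing about routing, which is where the time actually goes.

```python
def tile_loc_constraints(cols, rows, pitch_x, pitch_y, cells_per_tile):
    """Return XDC LOC lines that step one tile template across a grid."""
    lines = []
    for ty in range(rows):
        for tx in range(cols):
            tile = ty * cols + tx
            for c in range(cells_per_tile):
                # Hypothetical template: cells stacked vertically in one slice column.
                x = tx * pitch_x
                y = ty * pitch_y + c
                lines.append(
                    f"set_property LOC SLICE_X{x}Y{y} "
                    f"[get_cells tile_{tile}/cell_{c}]"
                )
    return lines


# 15 x 14 = 210 tiles, matching the "(say) 210 times" figure; pitch values are made up.
constraints = tile_loc_constraints(cols=15, rows=14, pitch_x=4, pitch_y=8,
                                   cells_per_tile=2)
```

On a real part the slice coordinates would have to skip over BRAM, DSP, and clocking columns, so a fixed pitch like this would not survive contact with the device.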

~~~
nickpsecurity
That makes sense. Thanks.

------
quickben
If I'm not mistaken, the devices go for 5-6k for the eval kit?

~~~
duskwuff
Yep. The Arty board (which ran 32 cores) is $99, though, which is actually a
lot more interesting.

[http://store.digilentinc.com/arty-artix-7-fpga-development-b...](http://store.digilentinc.com/arty-artix-7-fpga-development-board-for-makers-and-hobbyists/)

~~~
agumonkey
Is it better than the Zynq platform? Parallella still sells an SBC with a
dual-core ARM + FPGA for $100, IIRC.

~~~
duskwuff
Better for FPGA development. The FPGA on the Arty is slightly larger (28K ->
33K logic cells), and all peripherals, including memory, are connected
directly to the FPGA instead of through the ARM SoC.

Also, the I/O headers are omitted from the $99 Parallella board, so it's
difficult to program. (No JTAG connector.)

~~~
gamiecc
Does that mean it's possible to do the 1680-core thing on the Arty board? If
not, what makes the Xilinx board more suitable for implementing the 1680
cores?

~~~
nickpsecurity
The Xilinx board has a ton of logic slices to run the extra cores. It's like
chips with more transistors being able to do more stuff. The Arty is too small
to hold the full design. It might run slower, too, if it's on an older process
node than the Xilinx FPGA.
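A rough back-of-envelope for the size gap, using only numbers from this thread (33K logic cells and 32 cores on the Arty). It assumes linear scaling and that 32 cores roughly fill the Arty. Neither assumption is confirmed here, so treat the result as an order-of-magnitude guess only.

```python
arty_logic_cells = 33_000  # "33K logic cells", quoted upthread
arty_cores = 32            # cores reported running on the Arty
target_cores = 1680

# Assumption: logic per core stays constant as the design scales.
cells_per_core = arty_logic_cells / arty_cores
cells_needed = cells_per_core * target_cores
scale = target_cores / arty_cores

print(f"~{cells_per_core:.0f} logic cells per core")
print(f"~{cells_needed:,.0f} logic cells for {target_cores} cores ({scale}x the Arty)")
```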

------
ethagknight
What does one do with such a cluster?

~~~
RandomOpinion
> _What does one do with such a cluster?_

They presented a short paper at FCCM '16:
[http://fpga.org/wp-content/uploads/2016/05/grvi_phalanx_fccm...](http://fpga.org/wp-content/uploads/2016/05/grvi_phalanx_fccm2016.pdf)
Section VI lists possible applications.

~~~
jaipilot747
For the lazy like me:

"GRVI Phalanx aspires to make it easier to develop and maintain an FPGA
accelerator for a parallel software workload. Some workloads will fit its
mold, i.e. highly parallel SPMD or MIMD code with small kernels, local shared
memory, and global message passing. Here are some parallel models that should
map fairly well to a GRVI Phalanx framework:

• OpenCL kernels: run each work group on a cluster;

• ‘Gatling gun’ parallel packet processing: send each new packet to an idle
cluster, which may exclusively work on that packet for up to (#clusters)
packet-time-periods.

• OpenMP/TBB: run MIMD tasks within a cluster;

• Streaming data through process networks: pass streams as messages within a
cluster, or between clusters;

• Compositions of such models.

Since GRVI Phalanx is implemented in an FPGA, these and other parallel models
may then be further accelerated via custom GRVI and cluster function units;
custom memories and interconnects; and custom standalone accelerator cores on
cluster RAM or directly connected on the NOC."
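The 'Gatling gun' item is the easiest one to picture in code. Below is a minimal sketch, assuming one packet arrives per packet-time and each cluster gets a worst-case budget of (#clusters) packet-times. The function name and the `busy_until` bookkeeping are mine, not from the paper.

```python
def gatling_dispatch(num_packets, num_clusters):
    """Round-robin packets over clusters; return (packet_time, cluster) pairs."""
    busy_until = [0] * num_clusters  # time at which each cluster is idle again
    schedule = []
    for t in range(num_packets):        # packet t arrives at time t
        cluster = t % num_clusters      # rotate through clusters like gun barrels
        # With a budget of num_clusters packet-times, the next cluster in the
        # rotation is always idle by the time its turn comes around again.
        assert busy_until[cluster] <= t, "cluster still busy"
        busy_until[cluster] = t + num_clusters
        schedule.append((t, cluster))
    return schedule


schedule = gatling_dispatch(num_packets=6, num_clusters=4)
print(schedule)
```

The point of the rotation is that no cluster ever needs to finish in less than (#clusters) packet-times, which is the budget the quoted text gives each packet.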

------
frozenport
Now use it for a barrel processor!

