
Efficient Recurrent Neural Networks using Structured Matrices in FPGAs - godelmachine
https://arxiv.org/abs/1803.07661
======
cs702
The main idea is to take each weight matrix W, divide it into smaller blocks,
each of dimension, say, n×n, and make each block a _circulant matrix_ that can
be specified by a vector of only n elements.[a]

This reduces W's memory consumption by a factor of n, and makes other gains in
computational efficiency possible. Read the paper for details.
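To make the idea concrete, here is a minimal NumPy sketch of the block-circulant trick (my own illustration, not the paper's FPGA implementation): each n×n block is defined by a single length-n vector, and the block's matrix-vector product reduces to a circular convolution, computable via FFT in O(n log n) instead of O(n²).

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c by x.
    Uses the identity C @ x = ifft(fft(c) * fft(x)) (circular convolution)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, n):
    """blocks[i][j] is the length-n vector defining block (i, j) of W.
    Computes y = W @ x without ever materializing the full matrix,
    so W costs p*q*n numbers instead of p*q*n*n."""
    p, q = len(blocks), len(blocks[0])
    y = np.zeros(p * n)
    for i in range(p):
        for j in range(q):
            y[i*n:(i+1)*n] += circulant_matvec(blocks[i][j], x[j*n:(j+1)*n])
    return y
```

For example, a 1024×1024 weight matrix split into 4×4 blocks of size 256 is specified by 16 vectors of 256 elements each — 4,096 numbers instead of ~1M, matching the factor-of-n (here 256) memory reduction described above.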

However, as far as I can tell, it appears the authors have tested this
technique only with one RNN architecture, in one task. It's hard to know
whether the technique will hold up well in a broad range of RNN
architectures/tasks.

[a]
[https://en.wikipedia.org/wiki/Circulant_matrix](https://en.wikipedia.org/wiki/Circulant_matrix)

~~~
godelmachine
Speech recognition is the task they implemented this for.

~~~
lnsru
What about image recognition? Too complex? Or doable at 320x240 resolution?

------
lnsru
How does this compare to GPU? These FPGAs are high end ones and not cheap for
sure.

~~~
alain94040
According to [1], a high-end FPGA can do "AlexNet inference performance: int16
over 2,400 img/s, int8 over 4,500 img/s.". Plug in your favorite nVidia
numbers and compare.

[1] [http://mipsology.com/zebra.html](http://mipsology.com/zebra.html)

~~~
lnsru
Thank you very much for the link!

~~~
godelmachine
Hi lnsru - how do you think GPUs would fare, taking alain94040's metrics into consideration?

~~~
lnsru
Please look at this PDF: [https://www.nvidia.com/content/tegra/embedded-
systems/pdf/je...](https://www.nvidia.com/content/tegra/embedded-
systems/pdf/jetson_tx1_whitepaper.pdf) There is a table on page 9 claiming
that a Titan X can do 3216 img/s while the TUL-KU115 FPGA board manages only
>1000 img/s. The KU115 chip has a project price of ~$2000. Real-time
capability is nice, but that's still expensive.

~~~
godelmachine
Thanks for this PDF :)

Do you think TitanX has real time capability?

~~~
lnsru
I am an FPGA developer and don't have much knowledge about GPUs. My guess is
that no one can guarantee execution time on a GPU. In an FPGA design, every
little detail is well defined, and the result is always available after a
fixed delay.

~~~
fnl
Wouldn't it be far more interesting to compare latencies at these rates,
rather than throughput? With roughly similar price and throughput, the very
different energy efficiency (and possibly latency) could be rather
significant for quite a few applications.

~~~
lnsru
Xilinx has prepared some marketing material regarding latency:
[https://forums.xilinx.com/t5/Xcell-Daily-Blog/Xilinx-
reVISIO...](https://forums.xilinx.com/t5/Xcell-Daily-Blog/Xilinx-reVISION-
stack-pushes-machine-learning-for-vision-guided/ba-p/754387)

~~~
godelmachine
For some reason the Xilinx link is not opening.

~~~
lnsru
Strange. It works here. Try this:
[https://www.xilinx.com/support/documentation/backgrounders/r...](https://www.xilinx.com/support/documentation/backgrounders/revision_backgrounder.pdf)

~~~
godelmachine
Got it! Thanks :)

