
Accelerating Large-Scale Matrix Multiplication on FPGAs - godelmachine
https://arxiv.org/abs/1803.03790
======
godelmachine
Abstract →

Large-scale floating-point matrix multiplication is a fundamental kernel in
many scientific and engineering applications. Most existing work focuses only
on accelerating matrix multiplication on FPGAs by adopting a linear systolic
array. This paper extends that architecture by proposing a scalable and highly
configurable multi-array architecture. In addition, we propose a work-stealing
scheme to keep the workload evenly partitioned among the multiple linear
arrays. Furthermore, an analytical model is developed to determine the optimal
design parameters. Experiments on a real-life convolutional neural network
(CNN) show that we can obtain the optimal extension of the linear array
architecture.
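
To make the work-stealing idea concrete, here is a minimal Python sketch of the general technique, not the paper's actual scheme. The abstract does not say how tiles are partitioned or stolen, so everything here is an assumption for illustration: the function name `simulate`, the per-tile costs, the contiguous static partition, and the "steal from the tail of the longest queue" policy are all hypothetical. The sketch models each linear array as a worker with its own queue of matrix tiles and compares finish times with and without stealing.

```python
from collections import deque


def simulate(tile_costs, num_arrays, steal=True):
    """Return the finish time of each array (hypothetical model, not the paper's).

    tile_costs: cost (e.g. cycles) of each matrix tile, in partition order.
    num_arrays: number of linear arrays working in parallel.
    steal:      if True, an idle array takes a tile from the longest queue.
    """
    # Static partition: give each array a contiguous chunk of tiles.
    chunk = max(1, -(-len(tile_costs) // num_arrays))  # ceiling division
    queues = [deque() for _ in range(num_arrays)]
    for i, cost in enumerate(tile_costs):
        queues[i // chunk].append(cost)

    finish = [0] * num_arrays          # time at which each array becomes idle
    active = set(range(num_arrays))    # arrays that may still receive work

    while active:
        # Advance the array that becomes idle earliest (ties: lowest index).
        a = min(active, key=lambda k: (finish[k], k))
        if queues[a]:
            # Take the next tile from the array's own queue.
            finish[a] += queues[a].popleft()
        elif steal:
            # Own queue is empty: steal from the tail of the longest queue.
            victim = max(range(num_arrays), key=lambda k: len(queues[k]))
            if queues[victim]:
                finish[a] += queues[victim].pop()
            else:
                active.discard(a)      # no work left anywhere
        else:
            active.discard(a)          # no stealing: the array stays idle
    return finish
```

With tile costs `[4, 4, 4, 4, 1, 1, 1, 1]` on two arrays, the static partition finishes at time 16 (one array does all the expensive tiles), while the stealing variant finishes at time 12, because the idle array takes one expensive tile from its neighbour. On real hardware the cost of a steal (control logic, data movement) would also have to be modeled; this sketch ignores it.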

