Layer-specific parallelization for FPGA-based convolutional neural network accelerators: Performance and resource evaluation
By: Mustafa Tasci and Ayhan Istanbullu
Language: English
Page range: 200 - 215
Submitted on: Mar 1, 2026
Published on: Jun 17, 2026
Published by: Slovak University of Technology in Bratislava
In partnership with: Paradigm Publishing Services
Publication frequency: 6 issues per year
© 2026 Mustafa Tasci, Ayhan Istanbullu, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.