Layer-specific parallelization for FPGA-based convolutional neural network accelerators: Performance and resource evaluation

Mustafa Tasci; Ayhan Istanbullu

doi:10.2478/jee-2026-0021

Journal of Electrical Engineering

Volume 77 (2026): Issue 3 (June 2026)

By: Mustafa Tasci and Ayhan Istanbullu

Open Access | Jun 2026

Abstract

Deep learning (DL) models require significant computational resources, which makes them difficult to deploy on edge devices with limited power and hardware capabilities. Field-programmable gate arrays (FPGAs) provide an effective platform for accelerating such workloads because of their inherent parallelism and energy efficiency. This study investigates how layer-wise parallelization levels, expressed as folding coefficients, affect the resource utilization and performance of FPGA-based DL accelerators, with a specific focus on convolutional (CONV) and fully connected (FC) layers. A LeNet-based accelerator model was implemented using the Xilinx FINN framework with W1A2 quantization (1-bit weights and 2-bit activations). Three folding levels, namely low (L), medium (M), and high (H), were defined for both the CONV and FC layers, yielding nine unique parallelization configurations. These accelerators were deployed on the PYNQ-Z1 platform and evaluated using the Fashion-MNIST dataset. The evaluation quantifies throughput (frames per second, FPS), resource utilization (look-up tables (LUTs), flip-flops (FFs), and block RAMs (BRAMs)), and power consumption. The results show that lower folding levels, which correspond to higher parallelism, significantly increase throughput, reaching up to 6809 FPS in the L-M and L-L configurations. This represents a 13-fold improvement over the baseline H-H configuration (C1), at the cost of increased resource usage. This study extends prior research on quantized neural networks (QNNs) by analyzing layer-specific parallelization strategies through adjustable folding factors and their effects on performance and resource trade-offs, offering insights for optimizing FPGA-based DL inference in resource-constrained environments.
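The nine-configuration design space described in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the folding values assigned to L, M, and H, the per-stage cycle counts, and the 100 MHz clock are hypothetical placeholders, and the sketch assumes that the first letter of a configuration name refers to the CONV layers and the second to the FC layers. It treats the accelerator as a two-stage streaming pipeline whose throughput is set by the slower stage, which is a common first-order view of dataflow accelerators.

```python
from itertools import product

# Hypothetical folding values per level (not given in the abstract).
# A lower fold means more parallel hardware and fewer cycles per frame.
FOLD_LEVELS = {"L": 1, "M": 4, "H": 16}

# Hypothetical cycle counts per frame at fold = 1 for each stage.
CONV_BASE_CYCLES = 1000
FC_BASE_CYCLES = 200


def cycles_per_frame(conv_level: str, fc_level: str) -> int:
    """Cycles per frame of the slowest stage in a streaming pipeline."""
    conv_cycles = CONV_BASE_CYCLES * FOLD_LEVELS[conv_level]
    fc_cycles = FC_BASE_CYCLES * FOLD_LEVELS[fc_level]
    return max(conv_cycles, fc_cycles)


def estimate_fps(conv_level: str, fc_level: str, clock_hz: float = 100e6) -> float:
    """First-order throughput estimate: clock rate over bottleneck cycles."""
    return clock_hz / cycles_per_frame(conv_level, fc_level)


# All (CONV level, FC level) pairs: 3 x 3 = 9 configurations.
configs = list(product("LMH", repeat=2))

if __name__ == "__main__":
    for conv_level, fc_level in configs:
        fps = estimate_fps(conv_level, fc_level)
        print(f"{conv_level}-{fc_level}: {fps:.0f} FPS (modelled)")
```

Under these placeholder numbers the CONV stage is the bottleneck whenever its fold is low, so the modelled L-L and L-M configurations reach the same throughput. That is one possible reading of why the abstract reports the same peak FPS for both; the measured figures in the paper depend on the actual folding values and the hardware.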

DOI: https://doi.org/10.2478/jee-2026-0021 | Journal eISSN: 1339-309X | Journal ISSN: 1335-3632

Language: English

Page range: 200 - 215

Submitted on: Mar 1, 2026

Published on: Jun 17, 2026

Published by: Slovak University of Technology in Bratislava

In partnership with: Paradigm Publishing Services

Publication frequency: 6 issues per year

Keywords: convolutional neural network accelerators, field-programmable gate array, folding factors, parallelization

Related subjects: Engineering; Introductions and overviews; Engineering, other

© 2026 Mustafa Tasci, Ayhan Istanbullu, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
