Accuracy degradation aware bit rate allocation for layer-wise uniform quantization of weights in neural network

Open Access | Dec 2024

DOI: https://doi.org/10.2478/jee-2024-0051 | Journal eISSN: 1339-309X | Journal ISSN: 1335-3632
Language: English
Page range: 425 - 434
Submitted on: Aug 10, 2024
Published on: Dec 6, 2024
Published by: Slovak University of Technology in Bratislava
In partnership with: Paradigm Publishing Services
Publication frequency: 6 issues per year

© 2024 Jelena Nikolić, Stefan Tomić, Zoran Perić, Aleksandra Jovanović, Danijela Aleksić, published by Slovak University of Technology in Bratislava
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.