The Generalization Error Bound for a Stochastic Gradient Descent Family via a Gaussian Approximation Method

By: Hao Chen, Zhanfeng Mo and Zhouwang Yang
Open Access | Jun 2025

References

  1. Arora, S., Du, S.S., Hu, W., Li, Z., Salakhutdinov, R.R. and Wang, R. (2019). On exact computation with an infinitely wide neural net, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, New York, pp. 8141–8150.
  2. Bartlett, P.L., Foster, D.J. and Telgarsky, M.J. (2017). Spectrally-normalized margin bounds for neural networks, in I. Guyon et al. (Eds), Advances in Neural Information Processing Systems, Vol. 30, Curran Associates, New York, pp. 6240–6249.
  3. Bottou, L. (1998). On-line learning and stochastic approximations, in L. Bottou (Ed) On-line Learning in Neural Networks, Cambridge University Press, pp. 9–42.
  4. Bousquet, O. and Elisseeff, A. (2002). Stability and generalization, Journal of Machine Learning Research 2: 499–526, DOI: 10.1162/153244302760200704.
  5. Chen, H., Mo, Z., Yang, Z. and Wang, X. (2019). Theoretical investigation of generalization bound for residual networks, Proceedings of the 28th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, San Francisco, pp. 2081–2087, DOI: 10.24963/ijcai.2019/288.
6. Chen, Y., Chang, H., Meng, J. and Zhang, D. (2019). Ensemble neural networks (ENN): A gradient-free stochastic method, Neural Networks 110: 170–185.
7. D’Agostino, R. and Pearson, E.S. (1973). Tests for departure from normality: Empirical results for the distributions of b2 and √b1, Biometrika 60(3): 613–622.
8. Dieuleveut, A., Durmus, A. and Bach, F. (2017). Bridging the gap between constant step size stochastic gradient descent and Markov chains, arXiv: 1707.06386.
  9. Elisseeff, A., Evgeniou, T. and Pontil, M. (2005). Stability of randomized learning algorithms, Journal of Machine Learning Research 6: 55–79.
  10. Feng, Y., Gao, T., Li, L., Liu, J.-G. and Lu, Y. (2020). Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation, Communications in Mathematical Sciences 18(1).
11. Hardt, M., Recht, B. and Singer, Y. (2016). Train faster, generalize better: Stability of stochastic gradient descent, in M.F. Balcan and K.Q. Weinberger (Eds), Proceedings of The 33rd International Conference on Machine Learning, New York, USA, pp. 1225–1234.
12. He, F., Liu, T. and Tao, D. (2019). Control batch size and learning rate to generalize well: Theoretical and empirical evidence, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, New York, pp. 1143–1152.
13. He, K., Zhang, X., Ren, S. and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034.
14. Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H. and Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2704–2713, DOI: 10.1109/CVPR.2018.00286.
  15. Kramers, H. (1940). Brownian motion in a field of force and the diffusion model of chemical reactions, Physica 7(4): 284–304.
16. Krizhevsky, A. and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, Master’s thesis, Department of Computer Science, University of Toronto.
  17. LeCun, Y. and Cortes, C. (2010). MNIST handwritten digit database, http://yann.lecun.com/exdb/mnist/.
  18. Li, H., Xu, Z., Taylor, G., Studer, C. and Goldstein, T. (2018). Visualizing the loss landscape of neural nets, in S. Bengio et al. (Eds), Advances in Neural Information Processing Systems, Vol. 31, Curran Associates, New York, pp. 6389–6399.
  19. Li, J., Luo, X. and Qiao, M. (2020). On generalization error bounds of noisy gradient methods for non-convex learning, arXiv: 1902.00621.
20. Li, Q., Tai, C. and Weinan, E. (2019). Stochastic modified equations and dynamics of stochastic gradient algorithms I: Mathematical foundations, Journal of Machine Learning Research 20(40): 1–47.
21. Li, X., Lu, J., Wang, Z., Haupt, J. and Zhao, T. (2019). On tighter generalization bound for deep neural networks: CNNs, ResNets, and beyond, arXiv: 1806.05159.
22. Ljung, L., Pflug, G. and Walk, H. (1992). Stochastic Approximation and Optimization of Random Systems, Birkhäuser, Basel, Switzerland.
23. London, B., Huang, B., Taskar, B. and Getoor, L. (2014). PAC-Bayesian collective stability, in S. Kaski and J. Corander (Eds), Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, pp. 585–594.
  24. Mandt, S., Hoffman, M.D. and Blei, D.M. (2017). Stochastic gradient descent as approximate Bayesian inference, Journal of Machine Learning Research 18(1): 4873–4907.
  25. McAllester, D.A. (1999). PAC-Bayesian model averaging, Proceedings of the 12th Annual Conference on Computational Learning Theory, COLT ’99, New York, NY, USA, pp. 164–170, DOI: 10.1145/307400.307435.
26. Negrea, J., Haghifam, M., Dziugaite, G.K., Khisti, A. and Roy, D.M. (2019). Information-theoretic generalization bounds for SGLD via data-dependent estimates, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, New York, pp. 11015–11025.
27. Panigrahi, A., Somani, R., Goyal, N. and Netrapalli, P. (2019). Non-Gaussianity of stochastic gradient noise, arXiv: 1910.09626.
  28. Qian, Y., Wang, Y., Wang, B., Gu, Z., Guo, Y. and Swaileh, W. (2022). Hessian-free second-order adversarial examples for adversarial learning, arXiv: 2207.01396.
29. Rong, Y., Huang, W., Xu, T. and Huang, J. (2020). DropEdge: Towards deep graph convolutional networks on node classification, International Conference on Learning Representations 2020, https://openreview.net/pdf?id=Hkx1qkrKPr, arXiv: 1907.10903.
30. Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press.
31. Simsekli, U., Sagun, L. and Gurbuzbalaban, M. (2019). A tail-index analysis of stochastic gradient noise in deep neural networks, in K. Chaudhuri and R. Salakhutdinov (Eds), Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, pp. 5827–5837.
32. Sulaiman, I.M., Kaelo, P., Khalid, R. and Nawawi, M.K.M. (2024). A descent generalized RMIL spectral gradient algorithm for optimization problems, International Journal of Applied Mathematics and Computer Science 34(2): 225–233, DOI: 10.61822/amcs-2024-0016.
  33. Sutskever, I., Martens, J., Dahl, G. and Hinton, G. (2013). On the importance of initialization and momentum in deep learning, in S. Dasgupta and D. McAllester (Eds), Proceedings of the 30th International Conference on Machine Learning, Atlanta, USA, pp. 1139–1147.
  34. Villani, C. (2008). Optimal Transport: Old and New, Grundlehren der Mathematischen Wissenschaften, Springer, Berlin/Heidelberg.
  35. Weinan, E., Ma, C. and Wang, Q. (2019). A priori estimates of the population risk for residual networks, arXiv: 1903.02154.
36. Welling, M. and Teh, Y.W. (2011). Bayesian learning via stochastic gradient Langevin dynamics, Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, Omnipress, Madison, WI, USA, pp. 681–688.
  37. Wu, J., Hu, W., Xiong, H., Huan, J., Braverman, V. and Zhu, Z. (2020). On the noisy gradient descent that generalizes as SGD, International Conference on Machine Learning, PMLR 119: 10367–10376.
  38. Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O. (2016). Understanding deep learning requires rethinking generalization, arXiv: 1611.03530.
DOI: https://doi.org/10.61822/amcs-2025-0018 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 251–266
Submitted on: Jun 3, 2024
Accepted on: Oct 16, 2024
Published on: Jun 24, 2025
Published by: University of Zielona Góra
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Hao Chen, Zhanfeng Mo, Zhouwang Yang, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.