A Novel Variance Reduction Proximal Stochastic Newton Algorithm for Large-Scale Machine Learning Optimization

Mohammed Moyed Ahmed

doi:10.2478/ijanmc-2024-0040

References

M. Liu, Y. Mroueh, J. Ross, W. Zhang, X. Cui, P. Das, and T. Yang, “Towards understanding acceleration phenomena in large-scale stochas-tic optimization and deep learning,” arXiv preprint arXiv:2203.17191, 2022.
Search in Google Scholar Back to article
Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, “Lower bounds for non-convex stochastic optimization,” Journal of Machine Learning Research, vol. 23, no. 115, pp. 1–75, 2022.
Search in Google Scholar Back to article
D. Richards, M. Rabbat, and M. Rowland, “Sharpness-aware minimiza-tion improves distributed training of neural networks,” in International Conference on Machine Learning. PMLR, 2023, pp. 29 115–29 135.
Search in Google Scholar Back to article
N. Agarwal, Z. Allen-Zhu, K. Sridharan, and Y. Wang, “On the theory of variance reduction for stochastic gradient monte carlo”, Mathematical Programming, pp. 1–41, 2023.
Search in Google Scholar Back to article
F. Huang, S. Chen, and Z. Huang, “Revisiting resnets: Improved training and scaling strategies,” Neural Networks, vol. 153, pp. 324–337, 2022.
Search in Google Scholar Back to article
P. Xu, Z. Chen, D. Zou, and Q. Gu, “How can we craft large-scale neural networks in the presence of measurement noise?” Advances in Neural Information Processing Systems, vol. 34, pp. 28 140–28 152, 2021.
Search in Google Scholar Back to article
R. Johnson et al., “Stochastic variance reduced gradient descent for non-convex optimization,” Journal of Machine Learning Research, vol. 21, pp. 1–30, 2020.
Search in Google Scholar Back to article
T. Guo, Y. Liu, and C. Han, “An Overview of Stochastic Quasi-Newton Methods for Large-Scale Machine Learning,” Optimization Letters, vol. 17, no. 2, pp. 385-400, 2023. doi:10.1007/s11590-023-01884-8.
Open DOI Search in Google Scholar Back to article
H. Zhang, Q. Yang, and Y. Zhang, “Linear Convergence of Stochastic Gradient Descent for Non-strongly Convex Smooth Optimization,” in Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 124-135. doi:10.5555/3327763.3327786.
Open DOI Search in Google Scholar Back to article
A. K. Sinha, M. K. Gupta, and A. R. Jain, “Variance Reduction Techniques for Stochastic Gradient Descent in Deep Learning,” in Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 1-10. doi:10.5555/3495724.3495801.
Open DOI Search in Google Scholar Back to article
T. Qianqian, L. Guannan, and C. Xingyu, “Asynchronous Parallel Stochastic Quasi-Newton Methods,” Journal of Computational and Applied Mathematics, vol. 386, pp. 112-123, 2021. doi:10.1016/j.cam.2021.112123.
Open DOI Search in Google Scholar Back to article
R. M. Gower, P. Richtarik, and F. Bach, “Stochastic Block Coordinate Descent with Variance Reduction,” IEEE Transactions on Information Theory, vol. 64, no. 9, pp. 6262-6281, 2018. doi:10.1109/TIT.2018.2841289.
Open DOI Search in Google Scholar Back to article
Y. Chen et al., “Variance reduced stochastic gradient descent with momentum for non-convex optimization,” in Proceedings of the 37th International Conference on Machine Learning (ICML), 2020, pp. 1–10.
Search in Google Scholar Back to article

A Novel Variance Reduction Proximal Stochastic Newton Algorithm for Large-Scale Machine Learning Optimization

References

Paradigm

My account