References
- M. Liu, Y. Mroueh, J. Ross, W. Zhang, X. Cui, P. Das, and T. Yang, “Towards understanding acceleration phenomena in large-scale stochas-tic optimization and deep learning,” arXiv preprint arXiv:2203.17191, 2022.
- Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, “Lower bounds for non-convex stochastic optimization,” Journal of Machine Learning Research, vol. 23, no. 115, pp. 1–75, 2022.
- D. Richards, M. Rabbat, and M. Rowland, “Sharpness-aware minimiza-tion improves distributed training of neural networks,” in International Conference on Machine Learning. PMLR, 2023, pp. 29 115–29 135.
- N. Agarwal, Z. Allen-Zhu, K. Sridharan, and Y. Wang, “On the theory of variance reduction for stochastic gradient monte carlo”, Mathematical Programming, pp. 1–41, 2023.
- F. Huang, S. Chen, and Z. Huang, “Revisiting resnets: Improved training and scaling strategies,” Neural Networks, vol. 153, pp. 324–337, 2022.
- P. Xu, Z. Chen, D. Zou, and Q. Gu, “How can we craft large-scale neural networks in the presence of measurement noise?” Advances in Neural Information Processing Systems, vol. 34, pp. 28 140–28 152, 2021.
- R. Johnson et al., “Stochastic variance reduced gradient descent for non-convex optimization,” Journal of Machine Learning Research, vol. 21, pp. 1–30, 2020.
- T. Guo, Y. Liu, and C. Han, “An Overview of Stochastic Quasi-Newton Methods for Large-Scale Machine Learning,” Optimization Letters, vol. 17, no. 2, pp. 385-400, 2023. doi:10.1007/s11590-023-01884-8.
- H. Zhang, Q. Yang, and Y. Zhang, “Linear Convergence of Stochastic Gradient Descent for Non-strongly Convex Smooth Optimization,” in Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 124-135. doi:10.5555/3327763.3327786.
- A. K. Sinha, M. K. Gupta, and A. R. Jain, “Variance Reduction Techniques for Stochastic Gradient Descent in Deep Learning,” in Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 1-10. doi:10.5555/3495724.3495801.
- T. Qianqian, L. Guannan, and C. Xingyu, “Asynchronous Parallel Stochastic Quasi-Newton Methods,” Journal of Computational and Applied Mathematics, vol. 386, pp. 112-123, 2021. doi:10.1016/j.cam.2021.112123.
- R. M. Gower, P. Richtarik, and F. Bach, “Stochastic Block Coordinate Descent with Variance Reduction,” IEEE Transactions on Information Theory, vol. 64, no. 9, pp. 6262-6281, 2018. doi:10.1109/TIT.2018.2841289.
- Y. Chen et al., “Variance reduced stochastic gradient descent with momentum for non-convex optimization,” in Proceedings of the 37th International Conference on Machine Learning (ICML), 2020, pp. 1–10.