
Optimizing the Structures of Transformer Neural Networks Using Parallel Simulated Annealing

Open Access | June 2024

References

  1. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017).
  2. N. Li, S. Liu, Y. Liu, S. Zhao, M. Liu, Neural speech synthesis with transformer network, in: Proceedings of the AAAI conference on artificial intelligence, Vol. 33, 2019, pp. 6706–6713.
  3. P. Morris, R. St. Clair, W. E. Hahn, E. Barenholtz, Predicting binding from screening assays with transformer network embeddings, Journal of Chemical Information and Modeling 60 (9) (2020) 4191–4199.
  4. F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, H. Fu, Transformers in medical imaging: A survey, arXiv preprint arXiv:2201.09873 (2022).
  5. T. Lin, Y. Wang, X. Liu, X. Qiu, A survey of transformers, arXiv preprint arXiv:2106.04554 (2021).
  6. M. Zhang, J. Li, A commentary of GPT-3 in MIT Technology Review 2021, Fundamental Research 1 (6) (2021) 831–833.
  7. M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, Ł. Kaiser, Universal transformers, arXiv preprint arXiv:1807.03819 (2018).
  8. M. Feurer, F. Hutter, Hyperparameter optimization, in: Automated machine learning, Springer, Cham, 2019, pp. 3–33.
  9. J. Wu, X.-Y. Chen, H. Zhang, L.-D. Xiong, H. Lei, S.-H. Deng, Hyperparameter optimization for machine learning models based on Bayesian optimization, Journal of Electronic Science and Technology 17 (1) (2019) 26–40.
  10. R. Turner, D. Eriksson, M. McCourt, J. Kiili, E. Laaksonen, Z. Xu, I. Guyon, Bayesian optimization is superior to random search for machine learning hyperparameter tuning: Analysis of the black-box optimization challenge 2020, in: NeurIPS 2020 Competition and Demonstration Track, PMLR, 2021, pp. 3–26.
  11. R. G. Mantovani, A. L. Rossi, J. Vanschoren, B. Bischl, A. C. De Carvalho, Effectiveness of random search in SVM hyper-parameter tuning, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–8.
  12. X. He, K. Zhao, X. Chu, AutoML: A survey of the state-of-the-art, Knowledge-Based Systems 212 (2021) 106622.
  13. K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, C. Xu, Y. Xu, et al., A survey on vision transformer, IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (1) (2022) 87–110.
  14. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020).
  15. M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, I. Sutskever, Generative pretraining from pixels, in: International conference on machine learning, PMLR, 2020, pp. 1691–1703.
  16. N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, D. Tran, Image transformer, in: International conference on machine learning, PMLR, 2018, pp. 4055–4064.
  17. Q. Wen, T. Zhou, C. Zhang, W. Chen, Z. Ma, J. Yan, L. Sun, Transformers in time series: A survey, arXiv preprint arXiv:2202.07125 (2022).
  18. E. Dogo, O. Afolabi, N. Nwulu, B. Twala, C. Aigbavboa, A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks, in: 2018 international conference on computational techniques, electronics and mechanical systems (CTEMS), IEEE, 2018, pp. 92–99.
  19. S. J. Reddi, S. Kale, S. Kumar, On the convergence of Adam and beyond, arXiv preprint arXiv:1904.09237 (2019).
  20. Y. Liu, Y. Sun, B. Xue, M. Zhang, G. G. Yen, K. C. Tan, A survey on evolutionary neural architecture search, IEEE Transactions on Neural Networks and Learning Systems (2021).
  21. G. Bender, P.-J. Kindermans, B. Zoph, V. Vasudevan, Q. Le, Understanding and simplifying one-shot architecture search, in: International conference on machine learning, PMLR, 2018, pp. 550–559.
  22. J. Fang, Y. Sun, Q. Zhang, Y. Li, W. Liu, X. Wang, Densely connected search space for more flexible neural architecture search, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10628–10637.
  23. P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, X. Wang, A comprehensive survey of neural architecture search: Challenges and solutions, ACM Computing Surveys (CSUR) 54 (4) (2021) 1–34.
  24. K. Murray, J. Kinnison, T. Q. Nguyen, W. Scheirer, D. Chiang, Auto-sizing the transformer network: Improving speed, efficiency, and performance for low-resource machine translation, arXiv preprint arXiv:1910.06717 (2019).
  25. M. Baldeon-Calisto, S. K. Lai-Yuen, AdaResU-Net: Multiobjective adaptive convolutional neural network for medical image segmentation, Neurocomputing 392 (2020) 325–340.
  26. K. Chen, W. Pang, ImmuNetNAS: An immune-network approach for searching convolutional neural network architectures, arXiv preprint arXiv:2002.12704 (2020).
  27. M. Wistuba, A. Rawat, T. Pedapati, A survey on neural architecture search, arXiv preprint arXiv:1905.01392 (2019).
  28. J. G. Robles, J. Vanschoren, Learning to reinforcement learn for neural architecture search, arXiv preprint arXiv:1911.03769 (2019).
  29. A. Vahdat, A. Mallya, M.-Y. Liu, J. Kautz, UNAS: Differentiable architecture search meets reinforcement learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11266–11275.
  30. K. De Jong, Evolutionary computation: a unified approach, in: Proceedings of the 2016 on genetic and evolutionary computation conference companion, 2016, pp. 185–199.
  31. S. Gibb, H. M. La, S. Louis, A genetic algorithm for convolutional network structure optimization for concrete crack detection, in: 2018 IEEE congress on evolutionary computation (CEC), IEEE, 2018, pp. 1–8.
  32. L. Xie, A. Yuille, Genetic CNN, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 1379–1388.
  33. F. Ye, Particle swarm optimization-based automatic parameter selection for deep neural networks and its applications in large-scale and high-dimensional data, PLoS ONE 12 (12) (2017) e0188746.
  34. K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, E. Xing, Neural architecture search with Bayesian optimisation and optimal transport, arXiv preprint arXiv:1802.07191 (2018).
  35. F. P. Casale, J. Gordon, N. Fusi, Probabilistic neural architecture search, arXiv preprint arXiv:1902.05116 (2019).
  36. H. K. Singh, T. Ray, W. Smith, Surrogate assisted simulated annealing (SASA) for constrained multi-objective optimization, in: IEEE congress on evolutionary computation, IEEE, 2010, pp. 1–8.
  37. K. T. Chitty-Venkata, M. Emani, V. Vishwanath, A. K. Somani, Neural architecture search for transformers: A survey, IEEE Access 10 (2022) 108374–108412, https://doi.org/10.1109/ACCESS.2022.3212767.
  38. J. Kim, J. Wang, S. Kim, Y. Lee, Evolved speech-transformer: Applying neural architecture search to end-to-end automatic speech recognition, in: Interspeech, 2020, pp. 1788–1792.
  39. D. Cummings, A. Sarah, S. N. Sridhar, M. Szankin, J. P. Munoz, S. Sundaresan, A hardware-aware framework for accelerating neural architecture search across modalities, arXiv preprint arXiv:2205.10358 (2022).
  40. D. So, Q. Le, C. Liang, The evolved transformer, in: International conference on machine learning, PMLR, 2019, pp. 5877–5886.
  41. D. So, W. Mańke, H. Liu, Z. Dai, N. Shazeer, Q. V. Le, Searching for efficient transformers for language modeling, Advances in neural information processing systems 34 (2021) 6010–6022.
  42. V. Cahlik, P. Kordik, M. Cepek, Adapting the size of artificial neural networks using dynamic autosizing, in: 2022 IEEE 17th International Conference on Computer Sciences and Information Technologies (CSIT), IEEE, 2022, pp. 592–596.
  43. B. Goldberger, G. Katz, Y. Adi, J. Keshet, Minimal modifications of deep neural networks using verification, in: LPAR, Vol. 2020, 2020, p. 23.
  44. T. Chen, I. Goodfellow, J. Shlens, Net2Net: Accelerating learning via knowledge transfer, arXiv preprint arXiv:1511.05641 (2015).
  45. T. Wei, C. Wang, Y. Rui, C. W. Chen, Network morphism, in: International conference on machine learning, PMLR, 2016, pp. 564–572.
  46. H. Jin, Q. Song, X. Hu, Auto-keras: An efficient neural architecture search system, in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 1946–1956.
  47. T. Elsken, J. H. Metzen, F. Hutter, Efficient multi-objective neural architecture search via Lamarckian evolution, in: International Conference on Learning Representations (ICLR), 2019. https://openreview.net/forum?id=ByME42AqK7.
  48. P. J. Van Laarhoven, E. H. Aarts, Simulated annealing, in: Simulated annealing: Theory and applications, Springer, 1987, pp. 7–15.
  49. D. J. Ram, T. Sreenivas, K. G. Subramaniam, Parallel simulated annealing algorithms, Journal of Parallel and Distributed Computing 37 (2) (1996) 207–212. https://doi.org/10.1006/jpdc.1996.0121.
  50. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21 (6) (1953) 1087–1092.
  51. D. Delahaye, S. Chaimatanan, M. Mongeau, Simulated annealing: From basics to applications, Handbook of metaheuristics (2019) 1–35.
  52. Z. Xinchao, Simulated annealing algorithm with adaptive neighborhood, Applied Soft Computing 11 (2) (2011) 1827–1836.
  53. Z. Michalewicz, GAs: Why Do They Work?, Springer, 1996.
  54. S.-H. Zhan, J. Lin, Z.-J. Zhang, Y.-W. Zhong, List-based simulated annealing algorithm for traveling salesman problem, Computational Intelligence and Neuroscience 2016 (2016).
  55. M. E. Aydin, V. Yigit, Parallel simulated annealing, in: Parallel Metaheuristics: A New Class of Algorithms, 2005, p. 267.
  56. L. Ozdamar, Parallel simulated annealing algorithms in global optimization, Journal of Global Optimization 19 (2001) 27–50.
  57. G. P. Babu, M. N. Murty, Simulated annealing for selecting optimal initial seeds in the k-means algorithm, Indian Journal of Pure and Applied Mathematics 25 (1-2) (1994) 85–94.
  58. R. S. Sexton, R. E. Dorsey, J. D. Johnson, Optimization of neural networks: A comparative analysis of the genetic algorithm and simulated annealing, European Journal of Operational Research 114 (3) (1999) 589–601.
  59. D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
  60. S. Mo, J. Xia, P. Ren, Simulated annealing for neural architecture search, Advances in Neural Information Processing Systems (NeurIPS) (2021).
  61. C.-W. Tsai, C.-H. Hsia, S.-J. Yang, S.-J. Liu, Z.-Y. Fang, Optimizing hyperparameters of deep learning in predicting bus passengers based on simulated annealing, Applied Soft Computing 88 (2020) 106068.
  62. H.-K. Park, J.-H. Lee, J. Lee, S.-K. Kim, Optimizing machine learning models for granular NdFeB magnets by very fast simulated annealing, Scientific Reports 11 (1) (2021) 3792.
  63. L. Ingber, Very fast simulated re-annealing, Mathematical and Computer Modelling 12 (8) (1989) 967–973.
  64. M. Fischetti, M. Stringher, Embedded hyper-parameter tuning by simulated annealing, arXiv preprint arXiv:1906.01504 (2019).
  65. M. Chen, H. Peng, J. Fu, H. Ling, Autoformer: Searching transformers for visual recognition, in: Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 12270–12280.
  66. A. Cutkosky, H. Mehta, Momentum improves normalized SGD, in: International conference on machine learning, PMLR, 2020, pp. 2260–2268.
  67. M. Trzciński, Optimizing the Structures of Transformer Neural Networks using Parallel Simulated Annealing (March 2023).
  68. F. Guzmán, P.-J. Chen, M. Ott, J. Pino, G. Lample, P. Koehn, V. Chaudhary, M. Ranzato, Two new evaluation datasets for low-resource machine translation: Nepali-English and Sinhala-English, 2019.
  69. P. Koehn, Europarl: A parallel corpus for statistical machine translation, in: Proceedings of Machine Translation Summit X: Papers, Phuket, Thailand, 2005, pp. 79–86. https://aclanthology.org/2005.mtsummit-papers.11.
  70. J. Tiedemann, Parallel data, tools and interfaces in OPUS, in: LREC, 2012, pp. 2214–2218.
  71. K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, in: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318.
  72. C.-Y. Lin, ROUGE: A package for automatic evaluation of summaries, in: Text summarization branches out, 2004, pp. 74–81.
  73. M. Post, A call for clarity in reporting BLEU scores, arXiv preprint arXiv:1804.08771 (2018).
  74. D. Coughlin, Correlating automated and human assessments of machine translation quality, in: Proceedings of Machine Translation Summit IX: Papers, 2003.
  75. K. Ganesan, ROUGE 2.0: Updated and improved measures for evaluation of summarization tasks, arXiv preprint arXiv:1803.01937 (2018).
Language: English
Page range: 267–282
Submitted on: Jan 9, 2024
Accepted on: May 26, 2024
Published on: Jun 11, 2024
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2024 Maciej Trzciński, Szymon Łukasik, Amir H. Gandomi, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.