Combining Local and Global Direct Derivative-Free Optimization for Reinforcement Learning

Open Access | Mar 2013

References

  1. Barash, D. A genetic search in policy space for solving Markov decision processes, - In: AAAI Spring Symposium on Search Techniques for Problem Solving under Uncertainty and Incomplete Information, 1999.
  2. Baxter, J., P. Bartlett. Direct gradient-based reinforcement learning, - In: Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 3, IEEE, 2000, 271-274.
  3. Brachetti, P., M. De Felice Ciccoli, G. Di Pillo, S. Lucidi. A new version of the Price’s algorithm for global optimization, Journal of Global Optimization, Vol. 10, No 2, 1997, 165-184. DOI: 10.1023/A:1008250020656
  4. Carpin, S., M. Lewis, J. Wang, S. Balakirsky, C. Scrapper. Bridging the gap between simulation and reality in urban search and rescue, Robocup 2006: Robot Soccer World Cup X, 1-12. DOI: 10.1007/978-3-540-74024-7_1
  5. Conn, A., K. Scheinberg, L. Vicente. Introduction to derivative-free optimization, Vol. 8, Society for Industrial Mathematics, 2009. DOI: 10.1137/1.9780898718768
  6. Dorigo, M., M. Birattari, T. Stutzle. Ant colony optimization, IEEE Computational Intelligence Magazine, Vol. 1, No 4, 2006, 28-39. DOI: 10.1109/CI-M.2006.248054
  7. Glover, F., M. Laguna. Tabu search, Vol. 1, Springer, 1998. DOI: 10.1007/978-1-4615-6089-0_1
  8. Goldberg, D. Genetic algorithms in search, optimization, and machine learning, Addison-Wesley, 1989.
  9. Gomez, F., J. Schmidhuber, R. Miikkulainen. Efficient Non-Linear Control through Neuroevolution, - In: Proceedings of the European Conference on Machine Learning, Springer, Berlin, 2006, 654-662. DOI: 10.1007/11871842_64
  10. Horst, R., P. Pardalos, N. Thoai. Introduction to global optimization, Springer, 2000. DOI: 10.1007/978-1-4615-0015-5
  11. Jakobi, N., P. Husbands, I. Harvey. Noise and the reality gap: The use of simulation in evolutionary robotics, Advances in artificial life, 704-720. DOI: 10.1007/3-540-59496-5_337
  12. Kakade, S. A natural policy gradient, Advances in neural information processing systems, Vol. 14, 2001, 1531-1538.
  13. Kirkpatrick, S., C. Gelatt Jr, M. Vecchi. Optimization by simulated annealing, Science, Vol. 220, No 4598, 1983, 671-680. DOI: 10.1126/science.220.4598.671. PMID: 17813860
  14. Kober, J., J. Peters. Policy search for motor primitives in robotics, Machine learning, Vol. 84, No 1, 2011, 171-203. DOI: 10.1007/s10994-010-5223-6
  15. Kormushev, P., D. G. Caldwell. Simultaneous Discovery of Multiple Alternative Optimal Policies by Reinforcement Learning, - In: IEEE International Conference on Intelligent Systems (IS 2012), 2012. DOI: 10.1109/IS.2012.6335136
  16. Lucidi, S., M. Sciandrone. On the global convergence of derivative-free methods for unconstrained optimization, SIAM Journal on Optimization, Vol. 13, No 1, 2002, 97-116. DOI: 10.1137/S1052623497330392
  17. Peters, J., S. Schaal. Policy gradient methods for robotics, - In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2006, 2219-2225. DOI: 10.1109/IROS.2006.282564
  18. Peters, J., S. Schaal. Reinforcement learning by reward-weighted regression for operational space control, - In: Proceedings of the 24th international conference on Machine learning, ACM, 2007, 745-750. DOI: 10.1145/1273496.1273590
  19. Peters, J., S. Vijayakumar, S. Schaal. Natural Actor-Critic, - In: Proceedings of the 16th European Conference on Machine Learning (ECML), 2005, 280-291. DOI: 10.1007/11564096_29
  20. Price, W. Global optimization by controlled random search, Journal of Optimization Theory and Applications, Vol. 40, No 3, 1983, 333-348. DOI: 10.1007/BF00933504
  21. Ribas, D., N. Palomeras, P. Ridao, M. Carreras, A. Mallios. Girona 500 AUV, from survey to intervention, IEEE/ASME Transactions on Mechatronics, Vol. 17, No 1, 2012, 46-53. DOI: 10.1109/TMECH.2011.2174065
  22. Sutton, R., D. McAllester, S. Singh, Y. Mansour. Policy gradient methods for reinforcement learning with function approximation, Advances in neural information processing systems, Vol. 12, No 22.
  23. Theodorou, E., J. Buchli, S. Schaal. A generalized path integral control approach to reinforcement learning, The Journal of Machine Learning Research, Vol. 11, 2010, 3137-3181.
  24. Torczon, V., et al. On the convergence of pattern search algorithms, SIAM Journal on Optimization, Vol. 7, No 1, 1997, 1-25. DOI: 10.1137/S1052623493250780
  25. Torn, A., A. Zilinskas. Global Optimization, Springer, 1989.
  26. Williams, R. Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine learning, Vol. 8, No 3, 1992, 229-256. DOI: 10.1007/BF00992696
DOI: https://doi.org/10.2478/cait-2012-0021 | Journal eISSN: 1314-4081 | Journal ISSN: 1311-9702
Language: English
Page range: 53 - 65
Published on: Mar 22, 2013
Published by: Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2013 Matteo Leonetti, Petar Kormushev, Simone Sagratella, published by Bulgarian Academy of Sciences, Institute of Information and Communication Technologies
This work is licensed under the Creative Commons License.