An Active Exploration Method for Data Efficient Reinforcement Learning

Open Access | Jul 2019

References

  1. Ahmed, N.A. and Gokhale, D. (1989). Entropy expressions and their estimators for multivariate distributions, IEEE Transactions on Information Theory 35(3): 688–692. DOI: 10.1109/18.30996
  2. Bagnell, J.A. and Schneider, J.G. (2001). Autonomous helicopter control using reinforcement learning policy search methods, IEEE International Conference on Robotics and Automation, Seoul, South Korea, Vol. 2, pp. 1615–1620.
  3. Chebotar, Y., Hausman, K., Zhang, M., Sukhatme, G., Schaal, S. and Levine, S. (2017). Combining model-based and model-free updates for trajectory-centric reinforcement learning, arXiv:1703.03078.
  4. Deisenroth, M.P., Fox, D. and Rasmussen, C.E. (2015). Gaussian processes for data-efficient learning in robotics and control, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2): 408–423. DOI: 10.1109/TPAMI.2013.218, PMID: 26353251
  5. Deisenroth, M. and Rasmussen, C.E. (2011). PILCO: A model-based and data-efficient approach to policy search, Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA, pp. 465–472.
  6. Ebert, F., Finn, C., Lee, A.X. and Levine, S. (2017). Self-supervised visual planning with temporal skip connections, arXiv:1710.05268.
  7. Fabisch, A. and Metzen, J.H. (2014). Active contextual policy search, Journal of Machine Learning Research 15(1): 3371–3399.
  8. Finn, C. and Levine, S. (2016). Deep visual foresight for planning robot motion, arXiv:1610.00696. DOI: 10.1109/ICRA.2017.7989324
  9. Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S. and Abbeel, P. (2015). Deep spatial autoencoders for visuomotor learning, arXiv:1509.06113. DOI: 10.1109/ICRA.2016.7487173
  10. Gruslys, A., Azar, M.G., Bellemare, M.G. and Munos, R. (2017). The reactor: A sample-efficient actor-critic architecture, arXiv:1704.04651.
  11. Hayes, G. and Demiris, J. (1994). A robot controller using learning by imitation, International Symposium on Intelligent Robotic Systems 676(5): 1257–1274.
  12. Levine, S., Finn, C., Darrell, T. and Abbeel, P. (2016). End-to-end training of deep visuomotor policies, Journal of Machine Learning Research 17(1): 1334–1373.
  13. Nagabandi, A., Kahn, G., Fearing, R.S. and Levine, S. (2017). Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, arXiv:1708.02596. DOI: 10.1109/ICRA.2018.8463189
  14. Ng, A., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., Berger, E. and Liang, E. (2006). Autonomous inverted helicopter flight via reinforcement learning, in M.H. Ang Jr. and O. Khatib (Eds.), Experimental Robotics IX, Springer, Berlin/Heidelberg, pp. 363–372. DOI: 10.1007/11552246_35
  15. Pan, Y. and Theodorou, E.A. (2014). Probabilistic differential dynamic programming, Advances in Neural Information Processing Systems 3: 1907–1915.
  16. Pan, Y., Theodorou, E.A. and Kontitsis, M. (2015). Sample efficient path integral control under uncertainty, Advances in Neural Information Processing Systems 2015: 2314–2322.
  17. Price, B. and Boutilier, C. (2003). Accelerating reinforcement learning through implicit imitation, Journal of Artificial Intelligence Research 19: 569–629. DOI: 10.1613/jair.898
  18. Silver, D., Sutton, R.S. and Müller, M. (2008). Sample-based learning and search with permanent and transient memories, International Conference on Machine Learning, Helsinki, Finland, pp. 968–975. DOI: 10.1145/1390156.1390278
  19. Sutton, R.S. (1988). Learning to predict by the methods of temporal differences, Machine Learning 3(1): 9–44. DOI: 10.1007/BF00115009
  20. Sutton, R.S. (1991). Dyna, an integrated architecture for learning, planning, and reacting, ACM Sigart Bulletin 2(4): 160–163. DOI: 10.1145/122344.122377
DOI: https://doi.org/10.2478/amcs-2019-0026 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 351 - 362
Submitted on: Jul 18, 2018
Accepted on: Jan 31, 2019
Published on: Jul 4, 2019
Published by: University of Zielona Góra
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2019 Dongfang Zhao, Jiafeng Liu, Rui Wu, Dansong Cheng, Xianglong Tang, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.