An Active Exploration Method for Data Efficient Reinforcement Learning

Dongfang Zhao; Jiafeng Liu; Rui Wu; Dansong Cheng; Xianglong Tang

doi:10.2478/amcs-2019-0026

International Journal of Applied Mathematics and Computer Science

Volume 29 (2019): Issue 2 (June 2019)

Open Access

Jul 2019


DOI: https://doi.org/10.2478/amcs-2019-0026 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X

Language: English

Page range: 351 - 362

Submitted on: Jul 18, 2018

Accepted on: Jan 31, 2019

Published on: Jul 4, 2019

Published by: University of Zielona Góra

In partnership with: Paradigm Publishing Services

Publication frequency: 4 issues per year

Keywords: reinforcement learning, PILCO

© 2019 Dongfang Zhao, Jiafeng Liu, Rui Wu, Dansong Cheng, Xianglong Tang, published by University of Zielona Góra
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
