Epoch-incremental reinforcement learning algorithms

Roman Zajdel

doi:10.2478/amcs-2013-0047



International Journal of Applied Mathematics and Computer Science

Volume 23 (2013): Issue 3 (September 2013)


Open Access

Sep 2013

Abstract

In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the standard TD(0) or TD(λ) algorithm is run and a model of the environment is built. In the epoch mode, the distances from previously active states to the terminal state are computed on the basis of this environment model. These distances, together with the reinforcement signal received in the terminal state, are used to improve the agent's policy.
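The abstract describes the two modes only at a high level. The Python sketch below shows one way they could fit together, under assumptions that are ours and not taken from the paper: a tabular, deterministic, episodic task; a positive reinforcement r_T given only on entering the terminal state; a one-step Q-learning-style update standing in for TD(0); and a hypothetical policy-improvement rule that raises Q(s, a) to γ^d(s′)·r_T, where d(s′) is the breadth-first distance from the modeled successor s′ to the terminal state. The class `EpochIncrementalAgent` and its methods are illustrative names only.

```python
from collections import defaultdict, deque


class EpochIncrementalAgent:
    """Hypothetical sketch: tabular TD(0) in incremental mode, model-based sweep in epoch mode."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(float)  # action values Q[(state, action)], default 0.0
        self.model = {}              # model[(state, action)] -> next state (deterministic assumption)

    def incremental_step(self, s, a, r, s_next, done):
        """Incremental mode: record the transition in the model, then do one TD(0) update."""
        self.model[(s, a)] = s_next
        if done:
            target = r
        else:
            target = r + self.gamma * max(self.q[(s_next, b)] for b in range(self.n_actions))
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

    def distances_to(self, terminal):
        """Breadth-first search backwards over the model: fewest steps from each visited state to terminal."""
        predecessors = defaultdict(set)
        for (s, _a), s_next in self.model.items():
            predecessors[s_next].add(s)
        dist = {terminal: 0}
        queue = deque([terminal])
        while queue:
            node = queue.popleft()
            for prev in predecessors[node]:
                if prev not in dist:
                    dist[prev] = dist[node] + 1
                    queue.append(prev)
        return dist

    def epoch_update(self, terminal, terminal_reward):
        """Epoch mode (assumed rule): propagate the discounted terminal reward along model distances."""
        dist = self.distances_to(terminal)
        for (s, a), s_next in self.model.items():
            if s_next in dist:
                # Taking a reaches s_next, from which the reward is dist[s_next] further steps away.
                value = self.gamma ** dist[s_next] * terminal_reward
                self.q[(s, a)] = max(self.q[(s, a)], value)
        return dist


# Usage: a 1-D corridor with states 0..4, terminal state 4, actions 0 = left, 1 = right.
agent = EpochIncrementalAgent(n_actions=2, alpha=0.5, gamma=0.9)
episode = [(0, 1, 0.0, 1, False), (1, 0, 0.0, 0, False), (0, 1, 0.0, 1, False),
           (1, 1, 0.0, 2, False), (2, 1, 0.0, 3, False), (3, 1, 1.0, 4, True)]
for transition in episode:
    agent.incremental_step(*transition)
dist = agent.epoch_update(terminal=4, terminal_reward=1.0)
print(dist)  # {4: 0, 3: 1, 2: 2, 1: 3, 0: 4}
```

In this example, the one-step TD(0) updates alone leave every value except Q(3, right) at zero after the episode. A single epoch sweep then assigns a value to every visited state-action pair, so in state 1 the greedy choice is already "right" (0.81 versus 0.6561). Under the stated assumptions, γ^d(s′)·r_T equals the discounted return of following the shortest path known to the model, which is why taking the maximum with the current estimate is a reasonable, though not necessarily the author's, improvement rule.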
