Abstract

In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, the distances of past-active states to the terminal state are computed on the basis of the environment model. These distances and the reinforcement signal of the terminal state are then used to improve the agent's policy.
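
The two modes described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not say how the distances are computed or how they enter the policy update, so the breadth-first search over a predecessor table and the rule `value = terminal_reward * gamma ** distance` are assumptions made purely for illustration, as are all function and parameter names.

```python
from collections import defaultdict, deque


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.95):
    """Incremental mode: one tabular TD(0) update of the state-value table."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def record_transition(predecessors, state, next_state):
    """Environment model (assumed form): store observed predecessors of each state."""
    predecessors[next_state].add(state)


def distances_to_terminal(predecessors, terminal):
    """Epoch mode: breadth-first search backwards from the terminal state."""
    distances = {terminal: 0}
    queue = deque([terminal])
    while queue:
        current = queue.popleft()
        for prev in predecessors.get(current, ()):
            if prev not in distances:
                distances[prev] = distances[current] + 1
                queue.append(prev)
    return distances


def epoch_update(values, distances, terminal_reward, gamma=0.95):
    """Hypothetical improvement rule: discount the terminal reward by distance."""
    for state, distance in distances.items():
        if distance > 0:  # leave the terminal state itself untouched
            values[state] = terminal_reward * gamma ** distance


# Usage on a four-state chain 0 -> 1 -> 2 -> 3, where state 3 is terminal.
values = defaultdict(float)
predecessors = defaultdict(set)
for s, s_next, r in [(0, 1, 0.0), (1, 2, 0.0), (2, 3, 1.0)]:
    td0_update(values, s, r, s_next)
    record_transition(predecessors, s, s_next)

distances = distances_to_terminal(predecessors, terminal=3)
epoch_update(values, distances, terminal_reward=1.0)
```

In this sketch, a single pass of TD(0) changes only the value of the state adjacent to the terminal state, whereas the epoch step assigns a value to every state from which the model knows a path to the terminal state.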

DOI: https://doi.org/10.2478/amcs-2013-0047 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 623 - 635
Published on: Sep 30, 2013
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2013 Roman Zajdel, published by Sciendo
This work is licensed under the Creative Commons License.