
References

  1. Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124.
  2. Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-847, DOI: 10.1109/TSMC.1983.6313077.
  3. Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318, DOI: 10.1613/jair.135.
  4. Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh.
  5. Ernst, D., Geurts, P. and Wehenkel, L. (2005). Tree-based batch mode reinforcement learning, Journal of Machine Learning Research 6: 503-556.
  6. Forbes, J.R.N. (2002). Reinforcement Learning for Autonomous Vehicles, Ph.D. thesis, University of California, Berkeley, CA.
  7. Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280.
  8. Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996). Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4: 237-285, DOI: 10.1613/jair.301.
  9. Krawiec, K., Jaśkowski, W.G. and Szubert, M.G. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3.
  10. Lagoudakis, M. and Parr, R. (2003). Least-squares policy iteration, Journal of Machine Learning Research 4: 1107-1149.
  11. Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342.
  12. Lin, L.-J. (1993). Reinforcement Learning for Robots Using Neural Networks, Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
  13. Markowska-Kaczmar, U. and Kwaśnicka, H. (2005). Neural Networks Applications, Wrocław University of Technology Press, Wrocław (in Polish).
  14. Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104.
  15. Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276, DOI: 10.1613/jair.613.
  16. Peng, J. and Williams, R. (1993). Efficient learning and planning within the Dyna framework, Adaptive Behavior 1(4): 437-454, DOI: 10.1177/105971239300100403.
  17. Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-01, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz.
  18. Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196.
  19. Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge.
  20. Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410.
  21. Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224.
  22. Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357.
  23. Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.
  24. Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039, DOI: 10.1016/j.eswa.2008.10.056.
  25. Watkins, C. (1989). Learning from Delayed Rewards, Ph.D. thesis, Cambridge University, Cambridge.
  26. Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358, DOI: 10.1007/978-3-642-27645-3_10.
  27. Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning, Journal of Machine Learning Research 7: 877-917.
  28. Ye, C., Yung, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27, DOI: 10.1109/TSMCB.2003.808179.
  29. Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366, DOI: 10.1007/978-3-642-29347-4_42.
DOI: https://doi.org/10.2478/amcs-2013-0047

© 2013 Roman Zajdel, published by Sciendo
This work is licensed under the Creative Commons License.