Atiya, A.F., Parlos, A.G. and Ingber, L. (2003). A reinforcement learning method based on adaptive simulated annealing, Proceedings of the 46th International Midwest Symposium on Circuits and Systems, Cairo, Egypt, pp. 121-124.
Barto, A., Sutton, R. and Anderson, C. (1983). Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Transactions on Systems, Man, and Cybernetics 13(5): 834-846, DOI: 10.1109/TSMC.1983.6313077.
Cichosz, P. (1995). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning, Journal of Artificial Intelligence Research 2: 287-318, DOI: 10.1613/jair.135.
Crook, P. and Hayes, G. (2003). Learning in a state of confusion: Perceptual aliasing in grid world navigation, Technical Report EDI-INF-RR-0176, University of Edinburgh, Edinburgh.
Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, pp. 273-280.
Krawiec, K., Jaśkowski, W. and Szubert, M. (2011). Evolving small-board Go players using coevolutionary temporal difference learning with archives, International Journal of Applied Mathematics and Computer Science 21(4): 717-731, DOI: 10.2478/v10006-011-0057-3.
Lanzi, P. (2000). Adaptive agents with reinforcement learning and internal memory, From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, USA, pp. 333-342.
Moore, A. and Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning 13(1): 103-130, DOI: 10.1007/BF00993104.
Moriarty, D., Schultz, A. and Grefenstette, J. (1999). Evolutionary algorithms for reinforcement learning, Journal of Artificial Intelligence Research 11: 241-276, DOI: 10.1613/jair.613.
Reynolds, S. (2002). Experience stack reinforcement learning for off-policy control, Technical Report CSRP-02-01, University of Birmingham, Birmingham, ftp://ftp.cs.bham.ac.uk/pub/tech-reports/2002/CSRP-02-01.ps.gz.
Riedmiller, M. (2005). Neural reinforcement learning to swing-up and balance a real pole, Proceedings of the IEEE 2005 International Conference on Systems, Man and Cybernetics, Big Island, HI, USA, pp. 3191-3196.
Rummery, G. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University, Cambridge.
Smart, W. and Kaelbling, L. (2002). Effective reinforcement learning for mobile robots, Proceedings of the International Conference on Robotics and Automation, Washington, DC, USA, pp. 3404-3410.
Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming, Proceedings of the Seventh International Conference on Machine Learning, Austin, TX, USA, pp. 216-224.
Sutton, R. (1991). Planning by incremental dynamic programming, Proceedings of the 8th International Workshop on Machine Learning, Evanston, IL, USA, pp. 353-357.
Vanhulsel, M., Janssens, D. and Vanhoof, K. (2009). Simulation of sequential data: An enhanced reinforcement learning approach, Expert Systems with Applications 36(4): 8032-8039, DOI: 10.1016/j.eswa.2008.10.056.
Whiteson, S. (2012). Evolutionary computation for reinforcement learning, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, Berlin, pp. 325-358, DOI: 10.1007/978-3-642-27645-3_10.
Ye, C., Yung, N.H.C. and Wang, D. (2003). A fuzzy controller with supervised learning assisted reinforcement learning algorithm for obstacle avoidance, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 33(1): 17-27, DOI: 10.1109/TSMCB.2003.808179.
Zajdel, R. (2012). Fuzzy epoch-incremental reinforcement learning algorithm, in L. Rutkowski, M. Korytkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J.M. Zurada (Eds.), Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, Vol. 7267, Springer-Verlag, Berlin/Heidelberg, pp. 359-366, DOI: 10.1007/978-3-642-29347-4_42.