
Feature Reinforcement Learning: Part I. Unstructured MDPs

By: Marcus Hutter
Open Access | Nov 2011

References

  1. Aarts, E. H. L., and Lenstra, J. K., eds. 1997. Local Search in Combinatorial Optimization. Discrete Mathematics and Optimization. Chichester, England: Wiley-Interscience.
  2. Banzhaf, W.; Nordin, P.; Keller, R. E.; and Francone, F. 1998. Genetic Programming. San Francisco, CA: Morgan Kaufmann.
  3. Barron, A. R. 1985. Logically Smooth Density Estimation. Ph.D. Dissertation, Stanford University.
  4. Berry, D. A., and Fristedt, B. 1985. Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall. doi:10.1007/978-94-015-3711-7
  5. Brafman, R. I., and Tennenholtz, M. 2002. R-max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. Journal of Machine Learning Research 3:213-231.
  6. Cover, T. M., and Thomas, J. A. 2006. Elements of Information Theory. Wiley-Interscience, 2nd edition.
  7. Dearden, R.; Friedman, N.; and Andre, D. 1999. Model based Bayesian Exploration. In Proc. 15th Conference on Uncertainty in Artificial Intelligence (UAI-99), 150-159.
  8. Duff, M. 2002. Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. Ph.D. Dissertation, Department of Computer Science, University of Massachusetts Amherst.
  9. Dzeroski, S.; de Raedt, L.; and Driessens, K. 2001. Relational Reinforcement Learning. Machine Learning 43:7-52. doi:10.1023/A:1007694015589
  10. Fishman, G. 2003. Monte Carlo. Springer.
  11. Givan, R.; Dean, T.; and Greig, M. 2003. Equivalence Notions and Model Minimization in Markov Decision Processes. Artificial Intelligence 147(1-2):163-223. doi:10.1016/S0004-3702(02)00376-4
  12. Goertzel, B., and Pennachin, C., eds. 2007. Artificial General Intelligence. Springer. doi:10.1007/978-3-540-68677-4
  13. Gordon, G. 1999. Approximate Solutions to Markov Decision Processes. Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
  14. Grünwald, P. D. 2007. The Minimum Description Length Principle. Cambridge, MA: MIT Press. doi:10.7551/mitpress/4643.001.0001
  15. Guyon, I., and Elisseeff, A., eds. 2003. Variable and Feature Selection. JMLR Special Issue: MIT Press.
  16. Hastie, T.; Tibshirani, R.; and Friedman, J. H. 2001. The Elements of Statistical Learning. Springer. doi:10.1007/978-0-387-21606-5
  17. Hutter, M. 2005. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer. 300 pages. http://www.hutter1.net/ai/uaibook.htm
  18. Hutter, M. 2007. Universal Algorithmic Intelligence: A Mathematical Top-Down Approach. In Artificial General Intelligence. Berlin: Springer. 227-290. doi:10.1007/978-3-540-68677-4_8
  19. Hutter, M. 2009a. Feature Dynamic Bayesian Networks. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, 67-73. Atlantis Press. doi:10.2991/agi.2009.6
  20. Hutter, M. 2009b. Feature Markov Decision Processes. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), volume 8, 61-66. Atlantis Press. doi:10.2991/agi.2009.30
  21. Hutter, M. 2009c. Feature Reinforcement Learning: Part II: Structured MDPs. In progress. Will extend Hutter (2009a). doi:10.2478/v10229-011-0002-8
  22. Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101:99-134. doi:10.1016/S0004-3702(98)00023-X
  23. Kearns, M. J., and Singh, S. 1998. Near-Optimal Reinforcement Learning in Polynomial Time. In Proc. 15th International Conf. on Machine Learning, 260-268. San Francisco, CA: Morgan Kaufmann.
  24. Koza, J. R. 1992. Genetic Programming. The MIT Press.
  25. Kumar, P. R., and Varaiya, P. P. 1986. Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice Hall.
  26. Legg, S., and Hutter, M. 2007. Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines 17(4):391-444. doi:10.1007/s11023-007-9079-x
  27. Legg, S. 2008. Machine Super Intelligence. Ph.D. Dissertation, IDSIA, Lugano.
  28. Li, M., and Vitányi, P. M. B. 2008. An Introduction to Kolmogorov Complexity and its Applications. Berlin: Springer, 3rd edition. doi:10.1007/978-0-387-49820-1
  29. Liang, P., and Jordan, M. 2008. An Asymptotic Analysis of Generative, Discriminative, and Pseudolikelihood Estimators. In Proc. 25th International Conf. on Machine Learning (ICML'08), volume 307, 584-591. ACM. doi:10.1145/1390156.1390230
  30. Liu, J. S. 2002. Monte Carlo Strategies in Scientific Computing. Springer.
  31. Lusena, C.; Goldsmith, J.; and Mundhenk, M. 2001. Nonapproximability Results for Partially Observable Markov Decision Processes. Journal of Artificial Intelligence Research 14:83-103. doi:10.1613/jair.714
  32. MacKay, D. J. C. 2003. Information Theory, Inference and Learning Algorithms. Cambridge: Cambridge University Press.
  33. Madani, O.; Hanks, S.; and Condon, A. 2003. On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems. Artificial Intelligence 147:5-34. doi:10.1016/S0004-3702(02)00378-8
  34. McCallum, A. K. 1996. Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Dissertation, Department of Computer Science, University of Rochester.
  35. Ng, A. Y.; Coates, A.; Diel, M.; Ganapathi, V.; Schulte, J.; Tse, B.; Berger, E.; and Liang, E. 2004. Autonomous Inverted Helicopter Flight via Reinforcement Learning. In ISER, volume 21 of Springer Tracts in Advanced Robotics, 363-372. Springer. doi:10.1007/11552246_35
  36. Pankov, S. 2008. A Computational Approximation to the AIXI Model. In Proc. 1st Conference on Artificial General Intelligence, volume 171, 256-267.
  37. Pearlmutter, B. A. 1989. Learning State Space Trajectories in Recurrent Neural Networks. Neural Computation 1(2):263-269. doi:10.1162/neco.1989.1.2.263
  38. Poland, J., and Hutter, M. 2006. Universal Learning of Repeated Matrix Games. In Proc. 15th Annual Machine Learning Conf. of Belgium and The Netherlands (Benelearn'06), 7-14.
  39. Poupart, P.; Vlassis, N. A.; Hoey, J.; and Regan, K. 2006. An Analytic Solution to Discrete Bayesian Reinforcement Learning. In Proc. 23rd International Conf. on Machine Learning (ICML'06), volume 148, 697-704. Pittsburgh, PA: ACM. doi:10.1145/1143844.1143932
  40. Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: Wiley. doi:10.1002/9780470316887
  41. Raedt, L. D.; Hammer, B.; Hitzler, P.; and Maass, W., eds. 2008. Recurrent Neural Networks - Models, Capacities, and Applications, volume 08041 of Dagstuhl Seminar Proceedings. IBFI, Schloss Dagstuhl, Germany.
  42. Ring, M. 1994. Continual Learning in Reinforcement Environments. Ph.D. Dissertation, University of Texas, Austin.
  43. Ross, S., and Pineau, J. 2008. Model-Based Bayesian Reinforcement Learning in Large Structured Domains. In Proc. 24th Conference in Uncertainty in Artificial Intelligence (UAI'08), 476-483. Helsinki: AUAI Press.
  44. Ross, S.; Pineau, J.; Paquet, S.; and Chaib-draa, B. 2008. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663-704. doi:10.1613/jair.2567
  45. Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.
  46. Sanner, S., and Boutilier, C. 2009. Practical Solution Techniques for First-Order MDPs. Artificial Intelligence 173(5-6):748-788. doi:10.1016/j.artint.2008.11.003
  47. Schmidhuber, J. 2004. Optimal Ordered Problem Solver. Machine Learning 54(3):211-254. doi:10.1023/B:MACH.0000015880.99707.b2
  48. Schwarz, G. 1978. Estimating the Dimension of a Model. Annals of Statistics 6(2):461-464. doi:10.1214/aos/1176344136
  49. Singh, S.; Littman, M.; Jong, N.; Pardoe, D.; and Stone, P. 2003. Learning Predictive State Representations. In Proc. 20th International Conference on Machine Learning (ICML'03), 712-719.
  50. Strehl, A. L.; Diuk, C.; and Littman, M. L. 2007. Efficient Structure Learning in Factored-State MDPs. In Proc. 22nd AAAI Conference on Artificial Intelligence (AAAI'07), 645-650. Vancouver, BC: AAAI Press.
  51. Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press. doi:10.1109/TNN.1998.712192
  52. Szita, I., and Lőrincz, A. 2008. The Many Faces of Optimism: a Unifying Approach. In Proc. 25th International Conf. on Machine Learning (ICML'08), volume 307. ACM.
  53. Wallace, C. S. 2005. Statistical and Inductive Inference by Minimum Message Length. Berlin: Springer.
  54. Willems, F. M. J.; Shtarkov, Y. M.; and Tjalkens, T. J. 1997. Reflections on the Prize Paper: The Context-Tree Weighting Method: Basic Properties. IEEE Information Theory Society Newsletter, 20-27.
  55. Wolpert, D. H., and Macready, W. G. 1997. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1(1):67-82. doi:10.1109/4235.585893
Language: English
Page range: 3-24
Published on: Nov 23, 2011
Published by: Artificial General Intelligence Society
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2011 Marcus Hutter, published by Artificial General Intelligence Society
This work is licensed under the Creative Commons License.

Volume 1 (2009): Issue 1 (December 2009)