Feature Reinforcement Learning: Part II. Structured MDPs

Open Access | Jun 2021

References

  1. Bertsekas, D. P., and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
  2. Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer.
  3. Boutilier, C.; Dean, T.; and Hanks, S. 1999. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage. Journal of Artificial Intelligence Research 11:1–94. doi:10.1613/jair.575
  4. Chow, C. K., and Liu, C. N. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory IT-14(3):462–467. doi:10.1109/TIT.1968.1054142
  5. Dean, T., and Kanazawa, K. 1989. A Model for Reasoning about Persistence and Causation. Computational Intelligence 5(3):142–150. doi:10.1111/j.1467-8640.1989.tb00324.x
  6. Friedman, N.; Geiger, D.; and Goldszmidt, M. 1997. Bayesian Network Classifiers. Machine Learning 29(2):131–163. doi:10.1023/A:1007465528199
  7. Gagliolo, M. 2007. Universal Search. Scholarpedia 2(11):2575. doi:10.4249/scholarpedia.2575
  8. Goertzel, B., and Pennachin, C., eds. 2007. Artificial General Intelligence. Springer. doi:10.1007/978-3-540-68677-4
  9. Grünwald, P. D. 2007. The Minimum Description Length Principle. Cambridge, MA: MIT Press. doi:10.7551/mitpress/4643.001.0001
  10. Guestrin, C.; Koller, D.; Parr, R.; and Venkataraman, S. 2003. Efficient Solution Algorithms for Factored MDPs. Journal of Artificial Intelligence Research 19:399–468. doi:10.1613/jair.1000
  11. Hutter, M. 2003. Optimality of Universal Bayesian Prediction for General Loss and Alphabet. Journal of Machine Learning Research 4:971–1000.
  12. Hutter, M. 2005. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer.
  13. Hutter, M. 2009a. Feature Dynamic Bayesian Networks. In Proc. 2nd Conf. on Artificial General Intelligence (AGI’09), volume 8, 67–73. Atlantis Press. doi:10.2991/agi.2009.6
  14. Hutter, M. 2009b. Feature Reinforcement Learning: Part I. Unstructured MDPs. Journal of Artificial General Intelligence 1:3–24. doi:10.2478/v10229-011-0002-8
  15. Kaelbling, L. P.; Littman, M. L.; and Cassandra, A. R. 1998. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101:99–134. doi:10.1016/S0004-3702(98)00023-X
  16. Kearns, M., and Koller, D. 1999. Efficient Reinforcement Learning in Factored MDPs. In Proc. 16th International Joint Conference on Artificial Intelligence (IJCAI-99), 740–747. San Francisco: Morgan Kaufmann.
  17. Koller, D., and Parr, R. 1999. Computing Factored Value Functions for Policies in Structured MDPs. In Proc. 16th International Joint Conference on Artificial Intelligence (IJCAI’99), 1332–1339.
  18. Koller, D., and Parr, R. 2000. Policy Iteration for Factored MDPs. In Proc. 16th Conference on Uncertainty in Artificial Intelligence (UAI-00), 326–334. San Francisco, CA: Morgan Kaufmann.
  19. Legg, S., and Hutter, M. 2007. Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines 17(4):391–444. doi:10.1007/s11023-007-9079-x
  20. Lewis, D. D. 1998. Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. In Proc. 10th European Conference on Machine Learning (ECML’98), 4–15. Chemnitz, DE: Springer. doi:10.1007/BFb0026666
  21. Littman, M. L.; Sutton, R. S.; and Singh, S. P. 2001. Predictive Representations of State. In Advances in Neural Information Processing Systems, volume 14, 1555–1561. MIT Press.
  22. McCallum, A. K. 1996. Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Dissertation, Department of Computer Science, University of Rochester.
  23. Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: Wiley. doi:10.1002/9780470316887
  24. Ross, S.; Pineau, J.; Paquet, S.; and Chaib-draa, B. 2008. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663–704. doi:10.1613/jair.2567
  25. Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2nd edition.
  26. Singh, S.; Littman, M.; Jong, N.; Pardoe, D.; and Stone, P. 2003. Learning Predictive State Representations. In Proc. 20th International Conference on Machine Learning (ICML’03), 712–719.
  27. Singh, S. P.; James, M. R.; and Rudary, M. R. 2004. Predictive State Representations: A New Theory for Modeling Dynamical Systems. In Proc. 20th Conference on Uncertainty in Artificial Intelligence (UAI’04), 512–518. Banff, Canada: AUAI Press.
  28. Strehl, A. L.; Diuk, C.; and Littman, M. L. 2007. Efficient Structure Learning in Factored-State MDPs. In Proc. 22nd AAAI Conference on Artificial Intelligence (AAAI’07), 645–650. Vancouver, BC: AAAI Press.
  29. Sutton, R. S., and Barto, A. G. 2018. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2nd edition.
  30. Szita, I., and Lőrincz, A. 2008. The Many Faces of Optimism: A Unifying Approach. In Proc. 25th International Conference on Machine Learning (ICML 2008), volume 307, 1048–1055.
Language: English
Page range: 71–86
Submitted on: Oct 21, 2020
Accepted on: Apr 6, 2021
Published on: Jun 14, 2021
Published by: Artificial General Intelligence Society
In partnership with: Paradigm Publishing Services
Publication frequency: 2 times per year

© 2021 Marcus Hutter, published by Artificial General Intelligence Society
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.