
Feature Reinforcement Learning: Part II. Structured MDPs

Open Access | June 2021

Abstract

The Feature Markov Decision Processes (ΦMDPs) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured ΦMDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is the derivation of a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.
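To give a flavour of the cost-criterion idea described above, the sketch below shows a generic search over candidate feature maps Φ that score each map by a simple code-length-style surrogate (data fit plus a complexity penalty on the induced state space). This is only an illustration of the general feature-reinforcement-learning recipe under assumed definitions; the function names, the penalty term, and the toy history are all hypothetical and do not reproduce the actual Cost(Φ|h) criterion derived in the paper.

```python
# Illustrative sketch only: selecting a feature map phi by minimizing a
# code-length-style cost (data fit + state-space penalty). All names and the
# specific cost are hypothetical, not the paper's Cost(Phi|h) criterion.
import math
from collections import Counter
from typing import Callable, Sequence, Tuple

def empirical_code_length(states: Sequence[int]) -> float:
    """Approximate bits needed to encode the induced state-transition sequence."""
    transitions = Counter(zip(states[:-1], states[1:]))
    totals = Counter(s for s, _ in transitions.elements())
    bits = 0.0
    for (s, _s_next), n in transitions.items():
        bits -= n * math.log2(n / totals[s])
    return bits

def phi_cost(phi: Callable[[Tuple[int, ...]], int],
             history: Sequence[int],
             window: int = 2,
             penalty_per_state: float = 8.0) -> float:
    """Data code length plus a crude complexity penalty on the state space."""
    states = [phi(tuple(history[max(0, t - window):t + 1]))
              for t in range(len(history))]
    n_states = len(set(states))
    return empirical_code_length(states) + penalty_per_state * n_states

if __name__ == "__main__":
    # Toy observation history; candidate feature maps turn a short window of
    # observations into an abstract state, at increasing granularity.
    history = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0] * 10
    candidates = {
        "constant": lambda w: 0,
        "last_obs": lambda w: w[-1],
        "last_two": lambda w: hash(w[-2:]) % 4,
    }
    best = min(candidates, key=lambda k: phi_cost(candidates[k], history))
    print("lowest-cost feature map:", best)
```

In the structured (ΦDBN) setting the same principle applies, but the cost is evaluated over a factored state representation rather than a flat state space, which is what makes the approach scale to larger problems.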

Language: English
Page range: 71 - 86
Submitted on: Oct 21, 2020
Accepted on: Apr 6, 2021
Published on: Jun 14, 2021
Published by: Artificial General Intelligence Society
In partnership with: Paradigm Publishing Services
Publication frequency: 2 times per year

© 2021 Marcus Hutter, published by Artificial General Intelligence Society
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.