References
- Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.
- March, J. G. (1994). A primer on decision making: How decisions happen. Free Press.
- Wang, Q., & Liu, Z. (2001). The conception, sort and mathematical expression of uncertainty information. Operations Research and Management Science, (4), 9-15.
- Deng, J. L. (1982). Control problems of grey systems. Systems & Control Letters, 1(5), 288-294.
- Liu, S., & Lin, Y. (2007). Grey information: Theory and practical applications. Springer Science & Business Media.
- Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.
- Dubois, D., & Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. Academic Press.
- Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.
- Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic Backpropagation and Approximate Inference in Deep Generative Models. arXiv preprint arXiv:1401.4082.
- Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic Variational Inference. Journal of Machine Learning Research, 14(1), 1303-1347.
- Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518), 859-877.
- Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1), 164-171.
- Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. CRC Press.
- Winn, J., & Bishop, C. M. (2005). Variational message passing. Journal of Machine Learning Research, 6, 661-694.
- Schmidhuber, J. (1987). Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Technische Universität München.
- Bengio, Y., Bengio, S., & Cloutier, J. (1990). Learning a synaptic learning rule. Université de Montréal.
- Thrun, S., & Pratt, L. (1998). Learning to learn: Introduction and overview. In Learning to learn (pp. 3-17). Springer.
- Tan, X., & Zhang, Z. (2021). A survey of meta-reinforcement learning. Journal of Nanjing University of Aeronautics and Astronautics, 53(5), 653-663. doi: 10.16356/j.1005-2615.2021.05.001
- Duan, Y., et al. (2016). RL^2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
- Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (pp. 1126-1135).
- Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., & Levine, S. (2018). Meta-reinforcement learning of structured exploration strategies. In Advances in Neural Information Processing Systems.
- Mishra, N., Rohaninejad, M., Chen, X., & Abbeel, P. (2018). A simple neural attentive meta-learner. In International Conference on Learning Representations.
- Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.
- Rakelly, K., Zhou, A., Finn, C., Levine, S., & Quillen, D. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. In International Conference on Machine Learning.
- Wingate, D., Goodman, N. D., Roy, D. M., et al. (2011). Bayesian policy search with policy priors. In Twenty-Second International Joint Conference on Artificial Intelligence.
- Florensa, C., Duan, Y., & Abbeel, P. (2017). Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012.
- Mnih, A., & Rezende, D. J. (2016). Variational inference for Monte Carlo objectives. In International Conference on Machine Learning.
- Gu, S., Lillicrap, T., Ghahramani, Z., & Turner, R. E. (2017). Variational Policy Gradient Algorithms. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1194-1203).
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., et al. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
- Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.
- Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
- Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In International Conference on Machine Learning (pp. 1583-1592).
- Barth-Maron, G., Hoffman, M. W., Budden, D., Dabney, W., Horgan, D., Muldal, A., … & Lillicrap, T. (2018). Distributed Distributional Deterministic Policy Gradients. In International Conference on Learning Representations (ICLR).
- Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems (IROS) (pp. 5026-5033).
- Rothfuss, J., Lee, D., Clavera, I., Asfour, T., & Abbeel, P. (2018). ProMP: Proximal meta-policy search. In International Conference on Learning Representations.