
Tackling Uncertainty in Reinforcement Learning: A Dual Variational Inference Approach for Task and State Estimation

Open Access | Sep 2025

References

  1. Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.
  2. March, J. G. (1994). A primer on decision making: How decisions happen. Free Press.
  3. Wang, Q., & Liu, Z. (2001). The conception, sort and mathematical expression of uncertainty information. Operations Research and Management Science, (4), 9-15.
  4. Deng, J. L. (1982). Control problems of grey systems. Systems & Control Letters, 1(5), 288-294.
  5. Liu, S., & Lin, Y. (2007). Grey information: Theory and practical applications. Springer Science & Business Media.
  6. Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.
  7. Dubois, D., & Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. Academic Press.
  8. Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.
  9. Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic Backpropagation and Approximate Inference in Deep Generative Models. arXiv preprint arXiv:1401.4082.
  10. Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic Variational Inference. Journal of Machine Learning Research, 14(1), 1303-1347.
  11. Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518), 859-877.
  12. Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1), 164-171.
  13. Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. CRC press.
  14. Bishop, C. M., & Winn, J. M. (2006). Variational Bayesian inference. Journal of Machine Learning Research, 6(Jan), 211-244.
  15. Schmidhuber, J. (1987). Evolutionary principles in self-referential learning. On learning how to learn: The meta-meta-. hook. Ph.D. thesis, Tech. Univ. Munich.
  16. Bengio, Y., Bengio, S., & Cloutier, J. (1990). Learning a synaptic learning rule. Université de Montréal.
  17. Thrun, S., & Pratt, L. (1998). Learning to learn: Introduction and overview. In Learning to learn (pp. 3-17). Springer.
  18. Tan, X., & Zhang, Z. (2021). A survey of meta-reinforcement learning. Journal of Nanjing University of Aeronautics and Astronautics, 53(5), 653-663. DOI: 10.16356/j.1005-2615.2021.05.001.
  19. Duan, Y., et al. (2016). RL²: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
  20. Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (pp. 1126-1135).
  21. Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., and Levine,S. Meta-reinforcement learning of structured exploration strategies. In Advances in Neural Information Processing Systems, 2018.
  22. Mishra, N., Rohaninejad, M., Chen, X., and Abbeel, P. A simple neural attentive meta-learner. In International Conference on Learning Representations, 2018.
  23. Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., and Botvinick, M. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.
  24. Rakelly, K., Zhou, A., Finn, C., Levine, S., & Quillen, D. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. In Advances in Neural Information Processing Systems (pp. 11807-11818).
  25. Wingate, D., Goodman, N. D., Roy, D. M., et al. (2011). Bayesian policy search with policy priors. In Twenty-Second International Joint Conference on Artificial Intelligence.
  26. Florensa, C., Duan, Y., & Abbeel, P. (2017). Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012.
  27. Mnih, A., & Rezende, D. J. (2016). Variational inference for Monte Carlo objectives. In International Conference on Machine Learning.
  28. Gu, S., Lillicrap, T., Ghahramani, Z., & Turner, R. E. (2017). Variational Policy Gradient Algorithms. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1194-1203).
  29. Lillicrap, T. P., Hunt, J. J., Pritzel, A., et al. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
  30. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.
  31. Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
  32. Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In International Conference on Machine Learning (pp. 1583-1592).
  33. Barth-Maron, G., Hoffman, M. W., Budden, D., Dabney, W., Horgan, D., Muldal, A., … & Lillicrap, T. (2018). Distributed distributional deterministic policy gradients. In International Conference on Learning Representations (ICLR).
  34. Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems (IROS) (pp. 5026-5033).
  35. Rothfuss, J., Lee, D., Clavera, I., Asfour, T., & Abbeel, P. (2018). ProMP: Proximal meta-policy search. In International Conference on Learning Representations.
Language: English
Page range: 101 - 115
Published on: Sep 30, 2025
Published by: Xi’an Technological University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Zhidong Yang, Haoyu Liu, Zongxin Yao, Hongge Yao, published by Xi’an Technological University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.