References
- Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.
- March, J. G. (1994). A primer on decision making: How decisions happen. Free Press.
- Wang, Q., & Liu, Z. (2001). The conception, sort and mathematical expression of uncertainty information. Operations Research and Management Science, (4), 9-15.
- Deng, J. L. (1982). Control problems of grey systems. Systems & Control Letters, 1(5), 288-294.
- Liu, S., & Lin, Y. (2007). Grey information: Theory and practical applications. Springer Science & Business Media.
- Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338-353.
- Dubois, D., & Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. Academic Press.
- Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114.
- Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic Backpropagation and Approximate Inference in Deep Generative Models. arXiv preprint arXiv:1401.4082.
- Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic Variational Inference. Journal of Machine Learning Research, 14(1), 1303-1347.
- Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518), 859-877.
- Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1), 164-171.
- Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. CRC Press.
- Winn, J., & Bishop, C. M. (2005). Variational message passing. Journal of Machine Learning Research, 6, 661-694.
- Schmidhuber, J. (1987). Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Technische Universität München.
- Bengio, Y., Bengio, S., & Cloutier, J. (1990). Learning a synaptic learning rule. Université de Montréal.
- Thrun, S., & Pratt, L. (1998). Learning to learn: Introduction and overview. In Learning to learn (pp. 3-17). Springer.
- Tan, X., & Zhang, Z. (2021). A survey of meta-reinforcement learning. Journal of Nanjing University of Aeronautics and Astronautics, 53(5), 653-663. doi: 10.16356/j.1005-2615.2021.05.001
- Duan, Y., et al. (2016). RL^2: Fast reinforcement learning via slow reinforcement learning. arXiv preprint arXiv:1611.02779.
- Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (pp. 1126-1135).
- Gupta, A., Mendonca, R., Liu, Y., Abbeel, P., & Levine, S. (2018). Meta-reinforcement learning of structured exploration strategies. In Advances in Neural Information Processing Systems.
- Mishra, N., Rohaninejad, M., Chen, X., & Abbeel, P. (2018). A simple neural attentive meta-learner. In International Conference on Learning Representations.
- Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., Blundell, C., Kumaran, D., & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.
- Rakelly, K., Zhou, A., Finn, C., Levine, S., & Quillen, D. (2019). Efficient off-policy meta-reinforcement learning via probabilistic context variables. In International Conference on Machine Learning.
- Wingate, D., Goodman, N. D., Roy, D. M., et al. (2011). Bayesian policy search with policy priors. In Twenty-Second International Joint Conference on Artificial Intelligence.
- Florensa, C., Duan, Y., & Abbeel, P. (2017). Stochastic neural networks for hierarchical reinforcement learning. arXiv preprint arXiv:1704.03012.
- Mnih, A., & Rezende, D. J. (2016). Variational inference for Monte Carlo objectives. In International Conference on Machine Learning.
- Gu, S., Lillicrap, T., Ghahramani, Z., & Turner, R. E. (2017). Variational Policy Gradient Algorithms. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1194-1203).
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., et al. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
- Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.
- Watkins, C.J.C.H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
- Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. In International Conference on Machine Learning (pp. 1583-1592).
- Barth-Maron, G., Hoffman, M. W., Budden, D., Dabney, W., Horgan, D., Muldal, A., … & Lillicrap, T. (2018). Distributed Distributional Deterministic Policy Gradients. In International Conference on Learning Representations (ICLR).
- Todorov, E., Erez, T., & Tassa, Y. (2012). MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems (IROS) (pp. 5026-5033).
- Rothfuss, J., Lee, D., Clavera, I., Asfour, T., & Abbeel, P. (2018). ProMP: Proximal meta-policy search. In International Conference on Learning Representations.