CMU-CS-22-100
Computer Science Department
School of Computer Science, Carnegie Mellon University

Tackling Challenges in Modern Reinforcement Learning
Ruosong Wang
Ph.D. Thesis
February 2022
Modern reinforcement learning (RL) methods have achieved phenomenal success on a variety of applications. However, RL problems with large state spaces and long planning horizons remain challenging due to their prohibitive sample complexity, and our current understanding of such problems is rather limited. Moreover, there are important problems in RL that cannot be addressed by the classical frameworks. In this thesis, we study these issues to build a better understanding of modern RL methods. The thesis is divided into three parts.

Part I: RL with Long Planning Horizons. Learning to plan over long horizons is a central challenge in RL, and a fundamental question is how the difficulty of RL scales as the horizon increases. In the first part of this thesis, we show that tabular reinforcement learning is possible with a sample complexity that is completely independent of the planning horizon. Therefore, long horizon RL is no more difficult than short horizon RL, at least in a minimax sense.

Part II: RL with Large State Spaces. Modern RL methods deploy function approximation schemes to deal with large state spaces. Empirically, combining RL algorithms with neural networks for feature extraction has led to tremendous success on a variety of tasks. However, these methods often require a large number of samples to learn a good policy, and it is unclear whether there are fundamental statistical limits on such methods. In the second part of this thesis, we study necessary and sufficient conditions on the representation power of the features that permit sample-efficient reinforcement learning, through both theoretical analysis and experiments.

Part III: RL in Other Settings. Classical reinforcement learning paradigms aim to maximize the cumulative reward when the agent has access to the reward values. Although these frameworks capture a large family of sequential decision-making problems, there are important applications that cannot be cast into them. In the third part of this thesis, we study two new settings that generalize the classical frameworks: reward-free exploration and planning with general objective functions.
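To make the Part I claim concrete, the following is a hedged formalization, not a statement from the abstract itself; in particular, the reward normalization below (total reward of every episode at most 1, the usual scaling under which horizon-free results are stated) is our assumption. In an episodic tabular MDP with state space S, action space A, and horizon H, horizon-independent learning means that for any epsilon, delta in (0, 1), an epsilon-optimal policy can be found from

\[
  N(\varepsilon, \delta) \;=\; \mathrm{poly}\!\left(|S|,\ |A|,\ \tfrac{1}{\varepsilon},\ \log\tfrac{1}{\delta}\right)
\]

episodes, with no dependence on H. Under this reading, increasing H alone does not increase the minimax sample complexity, which is the sense in which long horizon RL is no harder than short horizon RL.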
175 pages
Srinivasan Seshan, Head, Computer Science Department