CMU-CS-22-100
Computer Science Department
School of Computer Science, Carnegie Mellon University




Tackling Challenges in Modern Reinforcement Learning:
Long Planning Horizons and Large State Spaces

Ruosong Wang

Ph.D. Thesis

February 2022

CMU-CS-22-100.pdf


Keywords: Reinforcement Learning, Sample Complexity, Function Approximation

Modern reinforcement learning (RL) methods have achieved phenomenal success in a wide range of applications. However, RL problems with large state spaces and long planning horizons remain challenging due to their excessive sample complexity, and our current understanding of such problems is rather limited. Moreover, there are important problems in RL that cannot be addressed by the classical frameworks. In this thesis, we study these issues to build a better understanding of modern RL methods.

This thesis is divided into three parts:

Part I: RL with Long Planning Horizons. Learning to plan over long horizons is a central challenge in RL, and a fundamental question is how the difficulty of RL scales as the horizon increases. In the first part of this thesis, we show that tabular reinforcement learning is possible with a sample complexity that is completely independent of the planning horizon; therefore, long-horizon RL is no more difficult than short-horizon RL, at least in a minimax sense.

Part II: RL with Large State Spaces. In modern RL methods, function approximation schemes are deployed to deal with large state spaces. Empirically, combining RL algorithms with neural networks for feature extraction has led to tremendous success on a variety of tasks. However, these methods often require a large number of samples to learn a good policy, and it is unclear whether there are fundamental statistical limits on such methods. In the second part of this thesis, we study necessary and sufficient conditions on the representation power of the features that permit sample-efficient reinforcement learning, through both theoretical analysis and experiments.

Part III: RL in Other Settings. Classical reinforcement learning paradigms aim to maximize the cumulative reward when the agent has access to the reward values. Although these paradigms capture a large family of sequential decision-making problems, there are important applications that cannot be cast into the classical frameworks. In the third part of this thesis, we study two settings that generalize the classical frameworks: reward-free exploration and planning with general objective functions.

175 pages

Thesis Committee:
Ruslan Salakhutdinov (Chair)
Zico Kolter
Aarti Singh
Sham M. Kakade (Harvard University)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

