CMU-CS-25-120
Computer Science Department
School of Computer Science, Carnegie Mellon University
Towards Unified Interfaces for Generalist Agents
Yueqi Song
M.S. Thesis
August 2025
Recently, large language models (LLMs) have enabled agents that can perceive, reason, and act in increasingly complex environments. Yet today's agents remain constrained by the interfaces they rely on, which limits their ability to generalize. This master's thesis advances the goal of a unified agent framework. Examining web agents, we found that web browsing agents, though intuitive to humans because they mimic human behavior by navigating web pages, are less effective and less efficient. We therefore proposed an API-based web agent that calls APIs through code generation, and demonstrated that it outperforms browsing agents. Building on this, we further proposed a hybrid web agent that can interleave API calls and web browsing, broadening the agent's interface and allowing it to operate more effectively and efficiently in diverse environments. As future work, we aim to extend unified interfaces beyond web agents to generalist agents across diverse environments.

Alongside a unified framework, strong reasoning abilities are crucial for agents to make correct decisions, plan, and execute tasks according to users' goals. We thus introduced VisualPuzzles, a benchmark that evaluates models' multimodal reasoning abilities in a knowledge-light setting and can guide the future development of models with strong multimodal reasoning capabilities.

Last but not least, to serve people around the world, agents need to understand and generate multilingual content. We therefore proposed and trained Pangea, a multilingual model that achieved state-of-the-art results on multilingual benchmarks.

Together, these contributions pave a path towards unified interfaces for generalist agents in diverse environments, providing the conceptual, empirical, and engineering foundations for the next generation of generalist AI agents.

191 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department