CMU-CS-25-106
Computer Science Department
School of Computer Science, Carnegie Mellon University




Automating Real-to-Sim Traffic Scene
Generation with Large Language Models

Alex Tianyi Xu

M.S. Thesis

April 2025



Keywords: Large Language Models, Self-Driving, Code Generation

Simulation-based evaluation of autonomous driving (AD) offers a scalable and reproducible alternative to real-world testing, yet current scenario generation methods often prioritize coverage over realism. This thesis explores how to enable open-source models to automatically generate realistic traffic scenarios from natural language descriptions of real-world crashes. I conducted a series of experiments to investigate the effectiveness of different inference-time methods in this domain, and proposed a framework that uses these methods to create a dataset for finetuning open-source models with fewer parameters. I found that open-source models can learn effectively from synthetic data generated by closed-source LLMs in the simulator code generation domain: an open-source model finetuned on this new dataset achieves an 87.3% success rate in generating syntactically correct scenarios while also achieving a higher ROUGE-L F1 score for high-level behavioral alignment with the description. This work demonstrates the feasibility of LLM-assisted scenario reconstruction at scale and lays the foundation for open, realistic, and automated evaluation pipelines for AD algorithms.
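
For readers unfamiliar with the metric, the ROUGE-L F1 score named in the abstract can be illustrated with a minimal sketch. ROUGE-L measures the longest common subsequence (LCS) of tokens shared by a candidate text and a reference text, and F1 is the harmonic mean of the resulting precision and recall. The sketch assumes lowercased whitespace tokenization and a single reference; the function names `lcs_length` and `rouge_l_f1` are hypothetical, and this is not the evaluation code used in the thesis.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Row-by-row dynamic programming; prev[j] holds the LCS length of
    # the tokens of `a` seen so far against the first j tokens of `b`.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (assumed whitespace tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical strings, for illustration only.
    description = "vehicle one turned left and struck vehicle two"
    generated = "vehicle one turned left into vehicle two"
    print(round(rouge_l_f1(generated, description), 3))
```

Because the LCS respects token order but allows gaps, the score rewards a candidate that preserves the sequence of events in the reference even when the wording between them differs.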

55 pages

Thesis Committee:
Chenyan Xiong (Chair)
Reid Simmons

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

