CMU-CS-24-121
Computer Science Department
School of Computer Science, Carnegie Mellon University




Fine-tuning Does Not Remove Language Model Capabilities

Suhas Kotha

M.S. Thesis

May 2024

CMU-CS-24-121.pdf


Keywords: Fine-tuning, language models, catastrophic forgetting, jailbreaking

Fine-tuned language models catastrophically forget tasks outside the fine-tuning distribution. Conversely, fine-tuning is often used to remove unsafe behavior such as toxic content generation. Both the failure mode and the success presuppose that fine-tuning removes a capability from the model. We show that fine-tuning does not remove such capabilities, which is encouraging for reducing forgetting and discouraging for defending against jailbreaks.

Via synthetic experiments, we hypothesize that language models implicitly infer the task of the prompt and that fine-tuning skews this inference towards tasks in the fine-tuning distribution. To test this, we propose Conjugate Prompting, which artificially makes the task look farther from the fine-tuning distribution while requiring the same capability. We find that this recovers in-context learning abilities lost via instruction tuning and natural reasoning capability lost during code fine-tuning. More concerningly, conjugate prompting can recover harmful content generation suppressed by safety fine-tuning in chatbots like ChatGPT. Can algorithms like fine-tuning and input defenses reliably remove unwanted behavior? We find that the best fine-tuning and input defenses cannot enforce one of the simplest, perfectly defined behaviors: do not output the word "purple".
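
The idea behind conjugate prompting can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the transformation is taken to be an invertible rewrite of the prompt (ROT13 here), and `toy_model` is a hypothetical stand-in for a fine-tuned model that keeps a capability (reversing text) but falls back to a default reply whenever the prompt looks like its fine-tuning distribution (prompts starting with "Q:").

```python
import codecs
from typing import Callable


def rot13(text: str) -> str:
    """Toy invertible transformation; ROT13 is its own inverse."""
    return codecs.encode(text, "rot_13")


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a fine-tuned model (not a real LLM).

    Prompts that look in-distribution (start with "Q:") trigger the
    fine-tuned default reply; all other prompts reach the underlying
    capability, which here is simply reversing the text.
    """
    if prompt.startswith("Q:"):
        return "I follow instructions."
    return prompt[::-1]


def conjugate_prompt(
    model: Callable[[str], str],
    transform: Callable[[str], str],
    inverse: Callable[[str], str],
    prompt: str,
) -> str:
    """Transform the prompt, query the model, then undo the transform."""
    return inverse(model(transform(prompt)))


direct = toy_model("Q: abc")
recovered = conjugate_prompt(toy_model, rot13, rot13, "Q: abc")
```

In this toy setting the direct query yields the default reply, while the conjugated query reaches the reversing behavior because the transformed prompt no longer looks in-distribution. A real experiment would replace `toy_model` with calls to an actual language model and would need a transformation that the model can still handle.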

Both forgetting and jailbreaking demonstrate that fine-tuning currently does not fully remove or change model capabilities. We propose two future directions: improving capabilities by investigating length generalization, and reliably removing capabilities via machine unlearning.

94 pages

Thesis Committee:
Aditi Raghunathan (Chair)
Daphne Ippolito

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

