CMU-HCII-23-108
Human-Computer Interaction Institute
School of Computer Science, Carnegie Mellon University




Multimodal and Social Modeling of Client-Therapist Interaction

Alexandria K. Vail

December 2023

Ph.D. Thesis



Keywords: Multimodal behavior, social behavior, client-therapist interaction, dyadic interaction, verbal communication, nonverbal communication, entrainment, representation learning, structural equation modeling


Productive interaction between client and therapist is central to successful therapy, but is often hindered by substantial challenges along the way. During each therapy session, the therapist is constantly assessing the client's symptoms through their behavior. These behaviors may be expressed through multiple channels: verbal (spoken language) and nonverbal ("body language"). Therefore, the first challenge we focus on is the multimodal aspect of behavior. Another fundamental challenge during therapy is the development and maintenance of a collaborative relationship between the client and the therapist. This relationship develops over the course of several weeks, requiring longitudinal study within and across multiple sessions. Thus, the second challenge we focus on is the social aspect of behavior. Finally, we acknowledge the challenge of modeling such complex behavior over time in a manner that is useful for prediction tasks, especially in settings with rich but small datasets. We explore hybrid modeling: the combination of data-driven methods frequently used in computational modeling, such as neural networks, with theory-driven methods often preferred in psychology and statistics, such as structural equation modeling. While neural networks allow us to learn complex patterns and make predictions, structural equation modeling allows us to create graph models based on prior domain knowledge or hypotheses.

We pursue the challenge of multimodal behavior dynamics through two dimensions: verbal behavior and nonverbal behavior. This work addresses the difficulty of evaluating client symptoms across multiple modalities. The verbal component of behavior conveys information not only through high-level message intent, but also through more detailed aspects of speech, such as word choice and sentence structure. We present a multifaceted analysis of the client's spoken language as it relates to their psychological health, including a detailed consideration of lexical, structural, and disfluency components of their speech. The nonverbal component of behavior includes behaviors such as facial expressions, gestures, or eye gaze patterns. In particular, we study the ever-prevalent nonverbal signal of gaze aversion patterns and how they provide information about the severity of the client's symptoms.

We pursue the challenge of social behavior dynamics in two aspects: turn-taking behavior and entrainment behavior. This work investigates the growth and decline of the collaborative relationship between the client and therapist over the course of multiple dyadic interactions. Through turn-taking behavior, interaction participants attempt to maintain the flow of conversation. We recount a detailed analysis of turn-taking behaviors and mirroring of head gestures as they signal the quality of the collaboration between client and therapist. Through entrainment behavior, participants synchronize their behavior patterns, whether consciously or subconsciously. We present a modeling of stylistic and content entrainment over multiple sessions as it relates to the client-therapist relationship.

Finally, we pursue the challenge of modeling these complex behavior patterns using hybrid modeling, combining data-driven and theory-driven methods for computational behavior modeling. Our objective is to improve the performance of data-driven predictive models, particularly in situations with limited data, by incorporating domain knowledge through theory-driven methods. This thesis specifically focuses on integrating structural equation modeling into traditional computational models. We present a unique approach to representation learning: the process of identifying meaningful patterns in data. Our approach utilizes structural equation models to create valuable and meaningful representations for use in larger machine learning models. We further refine this method to support end-to-end learning, including simultaneous training of both data-driven neural networks and theory-driven structural equation models. We demonstrate that integrating structural equation modeling into a neural network during the training process can often improve the predictive performance of the model.

178 pages

Thesis Committee:
Louis-Philippe Morency (Chair)
Robert Kraut
Adam Perer
Jeffrey F. Cohn (University of Pittsburgh)

Brad A. Myers, Head, Human-Computer Interaction Institute
Martial Hebert, Dean, School of Computer Science


