CMU-ML-13-110
Machine Learning Department
School of Computer Science, Carnegie Mellon University



Cortical spatiotemporal plasticity in visual category learning

Yang Xu

August 2013

Ph.D. Thesis

CMU-ML-13-110.pdf


Keywords: Visual category learning, visually-similar categories, cortical spatiotemporal plasticity, ventral stream, prefrontal cortex, face perception network, hot spots, source localization, MEG


Central to human intelligence, visual categorization is a skill that is both remarkably fast and accurate. Although numerous studies in primates have examined how information flows in inferior temporal (ITC) and prefrontal (PFC) cortices during online discrimination of visual categories, there has been little comparable research on the human cortex. To bridge this gap, this thesis explores how visual categories emerge in prefrontal cortex and the ventral stream, which is the human homologue of ITC. In particular, cortical spatiotemporal plasticity in visual category learning was investigated using behavioral experiments, magnetoencephalographic (MEG) imaging, and statistical machine learning methods.

From a theoretical perspective, work on non-human primates has led scientists to posit that PFC plays a primary role in the encoding of visual categories. Much of the extant research in the cognitive neuroscience literature, however, emphasizes the role of the ventral stream. Despite the apparent incompatibility of these two theories, no study has evaluated them in the human cortex by examining the roles of the ventral stream and PFC in online discrimination and acquisition of visual categories. To address this question, I conducted two learning experiments using visually-similar categories as stimuli and recorded cortical responses using MEG, a neuroimaging technique that offers millisecond temporal resolution. Across both experiments, categorical information was found to be available during the period of cortical activity. Moreover, late in the learning process, this information was supplied increasingly by the ventral stream but less so by prefrontal cortex. These findings extend previous theories by suggesting that the ventral stream is crucial to long-term encoding of visual categories when categorical perception is proficient, but that PFC jointly encodes visual categories early on during learning.

From a methodological perspective, MEG is limited as a technique because testing a large number of spatiotemporal regions of interest (ROIs) can lead to false discoveries and because, typically, it can only coarsely reconstruct the spatial locations of cortical responses. To address the first problem, I developed an excursion algorithm that identifies ROIs contiguous in time and space. I then used a permutation test to measure the global statistical significance of the ROIs. To address the second problem, I developed a method that incorporates domain-specific and experimental knowledge in the modeling process. Utilizing faces as a model category, I used a predefined "face" network to constrain the estimation of cortical activity by applying differential shrinkages to regions within and outside this network. I also proposed and implemented a trial-partitioning approach that uses trials in the midst of learning for model estimation. Importantly, this makes source localization more precise for trials in both the initial and final phases of learning.
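The general idea of finding excursions that are contiguous in space and time and then testing them globally by permutation can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details are not given in this abstract. It is a generic cluster-mass, sign-flip permutation test under several assumptions: the input is a set of per-subject (or per-block) difference maps indexed as diffs[k][s][t], spatial adjacency is simplified to neighboring source indices, only positive excursions are considered, and the function names, the threshold of 2.0, and the number of permutations are all hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_map(diffs):
    """One-sample t statistic at every (space, time) cell of diffs[k][s][t]."""
    n = len(diffs)
    n_s, n_t = len(diffs[0]), len(diffs[0][0])
    out = [[0.0] * n_t for _ in range(n_s)]
    for s in range(n_s):
        for t in range(n_t):
            vals = [d[s][t] for d in diffs]
            sd = stdev(vals)
            out[s][t] = mean(vals) / (sd / sqrt(n)) if sd > 0 else 0.0
    return out


def excursions(stat, thresh):
    """Masses (summed statistics) of connected sets of cells above thresh.

    Assumption: cells are neighbors if they are adjacent in the space index
    or in the time index (4-connectivity on the space-by-time grid).
    """
    n_s, n_t = len(stat), len(stat[0])
    seen, masses = set(), []
    for s0 in range(n_s):
        for t0 in range(n_t):
            if stat[s0][t0] <= thresh or (s0, t0) in seen:
                continue
            stack, mass = [(s0, t0)], 0.0
            seen.add((s0, t0))
            while stack:  # depth-first flood fill of one excursion set
                s, t = stack.pop()
                mass += stat[s][t]
                for ds, dt in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, v = s + ds, t + dt
                    if (0 <= u < n_s and 0 <= v < n_t
                            and (u, v) not in seen and stat[u][v] > thresh):
                        seen.add((u, v))
                        stack.append((u, v))
            masses.append(mass)
    return masses


def permutation_p(diffs, thresh=2.0, n_perm=200, seed=0):
    """Global p-value for the largest excursion mass via random sign flips."""
    rng = random.Random(seed)
    observed = max(excursions(t_map(diffs), thresh), default=0.0)
    exceed = 0
    for _ in range(n_perm):
        # Under the null, each difference map is symmetric around zero,
        # so flipping the sign of a whole map leaves its distribution unchanged.
        signs = [rng.choice((-1, 1)) for _ in diffs]
        flipped = [[[sg * x for x in row] for row in d]
                   for sg, d in zip(signs, diffs)]
        null_max = max(excursions(t_map(flipped), thresh), default=0.0)
        if null_max >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Comparing the observed maximum against the permutation distribution of the maximum is what makes the resulting p-value global: it accounts for all cells searched at once rather than testing each spatiotemporal cell separately.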

In summary, this thesis makes two significant contributions. First, it methodologically improves the way we can characterize the spatiotemporal properties of the human cortex using MEG. Second, it provides a combined theory of visual category learning by incorporating the large time scales that encompass the course of learning.

163 pages


SCS Technical Report Collection
School of Computer Science