Computer Science Department
School of Computer Science, Carnegie Mellon University
Anomaly Detection and Modeling of Trajectories
Junier B. Oliva
The recent boom in the availability and use of geolocation technologies has created a great need to understand datasets of trajectories. Moreover, trajectory data is collected in a wide range of domains, including meteorology, zoology, and business. However, trajectories have several intrinsic attributes that make them difficult to analyze. First, their time-series nature makes applying traditional techniques challenging. Second, most datasets contain trajectories of many points, making for a high-dimensional modeling problem. Lastly, there are several competing notions of similarity and difference between trajectories. To deal with these challenges, this thesis proposes several methods using statistics and machine learning (ML) that provide a deep understanding of trajectory datasets. In particular, this thesis brings forth methods to perform anomaly detection and density estimation and to build spatial graphical models.
In general, an anomaly is an instance that is abnormal or unlikely based on the rest of the dataset. This thesis develops a technique for detecting anomalous trajectories in a dataset in an unsupervised fashion using support vector machines (SVMs) and various spatial representations of trajectories. This thesis also focuses on techniques for density estimation, that is, assigning a likelihood to each trajectory in a dataset. To perform density estimation on trajectories effectively, it explores combining kernel density estimation (KDE) with a Markovian assumption, under which the next position of a trajectory depends only on its most recent positions. Lastly, this thesis explores spatial graphical models. Undirected graphical models detail the conditional independence structure of a set of random variables. Given sparsity assumptions, this concept is used to build graphical models over indicator variables, each associated with a spatial location and indicating whether an agent has come near that location.
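As a minimal sketch of how a Markovian assumption can be combined with KDE, suppose a first-order chain and an isotropic Gaussian kernel with a single hand-picked bandwidth; neither choice is stated in this abstract. Under those assumptions the log-likelihood of a trajectory factors as log p(x_1) plus the sum over t of log p(x_t | x_{t-1}), and each conditional can be estimated as the ratio of a KDE over consecutive position pairs to a KDE over the earlier position. The function names, the bandwidth, and the toy data below are hypothetical and are not taken from the thesis.

```python
import numpy as np

def log_kde(query, samples, bandwidth):
    """Log-density of an isotropic Gaussian KDE built on `samples`, evaluated at `query`."""
    query = np.atleast_2d(query)
    n, d = samples.shape
    # Squared Euclidean distance between every query point and every sample.
    sq = ((query[:, None, :] - samples[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq / bandwidth**2 - 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # Log-mean-exp over the samples, stabilised by subtracting each row's maximum.
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).sum(axis=1, keepdims=True))).ravel() - np.log(n)

def trajectory_log_likelihood(traj, train_trajs, bandwidth=0.5):
    """First-order Markov log-likelihood: log p(x_1) + sum_t log p(x_t | x_{t-1})."""
    # Training sets for the joint (x_{t-1}, x_t), the marginal x_{t-1}, and the start x_1.
    pairs = np.vstack([np.hstack([t[:-1], t[1:]]) for t in train_trajs])
    prev = np.vstack([t[:-1] for t in train_trajs])
    starts = np.vstack([t[:1] for t in train_trajs])
    # Conditional density as a ratio of KDEs: p(x_t | x_{t-1}) = p(x_{t-1}, x_t) / p(x_{t-1}).
    log_joint = log_kde(np.hstack([traj[:-1], traj[1:]]), pairs, bandwidth)
    log_prev = log_kde(traj[:-1], prev, bandwidth)
    log_start = log_kde(traj[:1], starts, bandwidth)[0]
    return log_start + (log_joint - log_prev).sum()

def eastward(rng, n_points=10):
    """Hypothetical toy trajectory: roughly unit steps east from near the origin."""
    start = 0.1 * rng.standard_normal((1, 2))
    steps = np.array([1.0, 0.0]) + 0.1 * rng.standard_normal((n_points - 1, 2))
    return np.vstack([start, start + np.cumsum(steps, axis=0)])

rng = np.random.default_rng(0)
train = [eastward(rng) for _ in range(30)]
typical = eastward(rng)
reversed_traj = typical[::-1].copy()  # same points, traversed in the opposite direction

ll_typical = trajectory_log_likelihood(typical, train)
ll_reversed = trajectory_log_likelihood(reversed_traj, train)
print(ll_typical, ll_reversed)
```

In this toy setting the reversed trajectory visits the same locations as the typical one but should score a much lower likelihood, because its transitions disagree with the training data. Sharing one bandwidth between the pair KDE and the marginal KDE keeps the ratio a properly normalised conditional density.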
To test the methods developed, experiments were run on two real-world datasets: one consists of AIS-tracked shipping vessels in the English Channel; the other contains every Atlantic Ocean tropical storm and hurricane track from 1949 to 2011. Overall, the methods presented were found empirically to provide a rich analysis of trajectory datasets.