CMU-CS-23-104
Computer Science Department
School of Computer Science, Carnegie Mellon University
Low-Bandwidth Remote Sensing of Rare Events
Shilpa Anna George
Ph.D. Thesis
March 2023
Remote sensing enables knowledge discovery from live data collected by unmanned probes. Planetary exploration, drone surveillance, and underwater sensing are three examples of domains in which remote sensing plays a central role. Near real-time knowledge acquisition of a rare target during such missions is challenging due to three extremes: low bandwidth, novelty of the target, and class imbalance. We call learning under these extreme conditions Live Learning. This is a new capability at the intersection of edge computing and machine learning. It aims to learn a model for a rare target from unlabeled data captured on distributed probes that are reachable only over a low-bandwidth network.
The main contribution of this thesis is the design, implementation, and evaluation of Hawk, an interactive, model-agnostic live learning system that enables the discovery of rare novel phenomena from a stream of extremely skewed, unlabeled visual data captured on weakly-connected remote sensing probes. Hawk is designed to optimize the use of two critical resources: (a) the network bandwidth from the remote source to the human expert, and (b) the expert’s labeling bandwidth. Live Learning embodies a new semi-supervised learning algorithm that trains models on the fly to discover instances of a target from very little initial labeled data.
We show the effectiveness of Hawk through extensive validation on three very demanding, publicly available datasets from the domains mentioned above. Each of these datasets was released within the past few years and has been used in recent ML research publications in its domain. Our experiments show that even at bandwidths as low as 12 kbps and a base rate of 0.1%, a team of 7 probes is able to use Hawk to discover up to 87% of the event instances that could have been discovered using a brute-force model. Such a model is created from advance knowledge, transmission, and labeling of all mission data.
Our results show a 1.5X–2X improvement in recall when Live Learning in Hawk is combined with recent Few Shot Learning algorithms such as SnaTCHer. They also show that Diversity Sampling can further improve recall in Hawk.
110 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department