Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-01-168
A Subspace Approach to Layer Extraction and Its Application
to Patch-Based Structure from Motion and Video Compression
Qifa Ke, Takeo Kanade
December 2001
(Available August 2003)
Keywords: Subspace, layer extraction, layered representation,
structure from motion, patch-based SFM, video compression, video
representation, motion segmentation
Representing videos with layers has important applications such as
video compression, motion analysis, and 3D modeling and rendering. This
thesis proposes a subspace approach to extracting layers from video
by taking advantage of the fact that homographies induced by planar
patches in the scene form a low-dimensional linear subspace. In this
subspace, layers in the input images are mapped onto well-defined
clusters and can be reliably identified by a standard clustering
algorithm (e.g., mean-shift). Global optimality is achieved because
spatial and temporal redundancy are taken into account simultaneously,
and noise can be effectively reduced by enforcing the subspace
constraint. The existence of the subspace also enables outlier
detection, making the subspace computation robust. Based on the
subspace constraint, we propose a patch-based scheme for affine
structure from motion (SFM), which recovers the plane equation of
each planar patch in the scene as well as the camera epipolar geometry.
We propose two approaches to patch-based SFM: (1) a factorization
approach, and (2) a layer-based approach. Patch-based SFM provides a
compact video representation that can be used to construct a
high-quality texture map for each layer.
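
To make the subspace idea concrete, the following is a minimal sketch of
how layers might be clustered once per-patch motion parameters have been
estimated: the parameters are stacked into a measurement matrix, projected
onto a low-rank subspace via SVD, and grouped with mean-shift. The function
name, the assumed input layout, and the fixed subspace rank are illustrative
assumptions, not the thesis's exact formulation.

    import numpy as np
    from sklearn.cluster import MeanShift

    def extract_layers(motion_matrix, rank=3, bandwidth=None):
        """Cluster patches into layers in a low-dimensional motion subspace.

        motion_matrix : (n_patches, n_params) array; each row stacks the
                        estimated motion parameters of one patch over all frames.
        rank          : assumed dimension of the linear subspace spanned by
                        planar-patch motions (an illustrative choice here).
        """
        # Project the stacked motion parameters onto their dominant subspace;
        # the low-rank projection enforces the subspace constraint and
        # suppresses noise.
        centered = motion_matrix - motion_matrix.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        coords = centered @ vt[:rank].T   # subspace coordinates per patch

        # Patches belonging to the same planar layer form a compact cluster
        # in the subspace; mean-shift finds the clusters without knowing
        # their number in advance.
        labels = MeanShift(bandwidth=bandwidth).fit_predict(coords)
        return labels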
We plan to apply our approach to generating the Video Object
Planes (VOPs) defined by the MPEG-4 standard. VOP generation is a
critical but unspecified step in MPEG-4. Our motion model for each VOP
consists of a global planar motion plus localized deformations, and
admits a closed-form solution. Our goals are: (1) combining different
low-level cues to model VOPs, and (2) extracting VOPs that undergo
more complicated motions (non-planar or non-rigid).
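
As a rough illustration of the kind of motion model described above, the
sketch below fits a global 2D affine (planar) motion to point
correspondences in closed form by least squares and treats the per-point
residuals as localized deformations. The names and the exact
parameterization are assumptions for illustration, not the thesis's
formulation.

    import numpy as np

    def fit_global_affine(src, dst):
        """Closed-form least-squares fit of a global affine motion.

        src, dst : (n, 2) arrays of corresponding points in two frames.
        Returns the 2x3 affine matrix and the per-point residual field,
        which here stands in for the localized deformations.
        """
        n = src.shape[0]
        design = np.hstack([src, np.ones((n, 1))])   # [x y 1] per point
        # Solve dst = design @ A in the least-squares sense (closed form).
        A, _, _, _ = np.linalg.lstsq(design, dst, rcond=None)
        residuals = dst - design @ A                  # localized deformations
        return A.T, residuals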
37 pages