@device(postscript)
@libraryfile(Mathematics10)
@libraryfile(Accents)
@style(fontfamily=timesroman,fontscale=11)
@pagefooting(immediate, left "@c",
center "@c",
right "@c")
@heading(An Investigation of the Gradient Descent Process in Neural Networks)
@heading(CMU-CS-96-114)
@center(@b(Barak A. Pearlmutter))
@center(January 1996 - Ph.D. Thesis)
@center(FTP: Unavailable)
@blankspace(1)
@begin(text)
Gradient descent is usually regarded as merely a way to find a minimum, to be
abandoned whenever a more efficient technique is available. Here we investigate the
detailed properties of the gradient descent process, and the related topics
of how gradients can be computed, what the limitations on gradient descent
are, and how the second-order information that governs the dynamics of
gradient descent can be probed. To develop our intuitions, gradient descent
is applied to a simple robot arm dynamics compensation problem using
backpropagation on a temporal-window architecture. The results suggest
that smooth filters can be easily learned, but that the deterministic gradient
descent process can be slow and can exhibit oscillations. Algorithms to
compute the gradient of recurrent networks are then surveyed in a general
framework, leading to some unifications, a deeper understanding of
recurrent networks, and some algorithmic extensions. By regarding
deterministic gradient descent as a dynamical system we obtain results
concerning its convergence, and a quantitative theory of its behavior when
a saturating or "robust" error measure is used. Since second-order
structure enters into the dynamics, an efficient exact technique for
multiplying a vector by the Hessian of a gradient system is derived and
applied.
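@blankspace(1line)
To illustrate the last point: the product of the Hessian with a vector v is the
directional derivative of the gradient along v, so it can be obtained exactly for
roughly the cost of one additional gradient evaluation, without ever forming the
Hessian. The sketch below is not from the thesis; it uses the modern JAX library and
a hypothetical toy objective purely to show this forward-over-reverse construction,
which computes the same quantity that the exact Hessian-vector technique defines.
@begin(verbatim)
# Sketch (illustrative, not the thesis code): exact Hessian-vector
# products via forward-over-reverse differentiation in JAX.
import jax
import jax.numpy as jnp

def loss(w):
    # hypothetical toy objective standing in for a network's error measure
    return jnp.sum(jnp.tanh(w) ** 2)

def hvp(f, w, v):
    # d/dr [grad f(w + r v)] at r = 0  ==  H(w) v, computed exactly,
    # without materializing the Hessian H(w)
    return jax.jvp(jax.grad(f), (w,), (v,))[1]

w = jnp.array([0.5, -1.0, 2.0])
v = jnp.array([1.0, 0.0, -1.0])
print(hvp(loss, w, v))   # the Hessian-vector product H(w) v
@end(verbatim)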
@blankspace(2line)
@begin(transparent,size=10)
@b(Keywords:@ )@c
@end(transparent)
@blankspace(1line)
@end(text)
@flushright(@b[(148 pages)])