CMU-CS-96-114Computer Science Department School of Computer Science, Carnegie Mellon University
CMU-CS-96-114
Barak A. Pearlmutter January 1996 Ph.D. Thesis
Keywords:
Neural networks, control, optimization, gradient descent, Newton
method, convergence rate, Hessian
Usually gradient descent is merely a way to find a minimum, abandoned if a more efficient technique is available. Here we investigate the detailed properties of the gradient descent process, and the related topics of how gradients can be computed, what the limitations on gradient descent are, and how the second-order information that governs the dynamics of gradient descent can be probed. To develop our intuitions, gradient descent is applied to a simple robot arm dynamics compensation problem, using backpropagation on a temporal windows architecture. The results suggest that smooth filters can be easily learned, but that the deterministic gradient descent process can be slow and can exhibit oscillations. Algorithms to compute the gradient of recurrent networks are then surveyed in a general framework, leading to some unifications, a deeper understanding of recurrent networks, and some algorithmic extensions. By regarding deterministic gradient descent as a dynamic system we obtain results concerning its convergence, and a quantitative theory of its behavior when a saturating or "robust" error measure is used. Since second-order structure enters into the dynamics, an efficient exact technique for multiplying a vector by the Hessian of a gradient system is derived and applied. 140 pages
| |

Return to:
SCS Technical Report Collection This page maintained by reports@cs.cmu.edu |