MACHINE LEARNING TECHNICAL REPORT ABSTRACTS

	CMU-ML-18-100 Machine Learning Department School of Computer Science, Carnegie Mellon University CMU-ML-18-100 Learning with Staleness Wei Dai March 2018 Ph.D. Thesis CMU-ML-18-100.pdf Keywords: Large Scale Machine Learning, Distributed Optimization Method, Distributed System, Parameter Server A fundamental assumption behind most machine learning (ML) algorithms and analyses is the sequential execution. That is, any update to the ML model can be immediately applied and the new model is always available for the next algorithmic step. This basic assumption, however, can be costly to realize, when the computation is carried out across multiple machines, linked by commodity networks that are usually 10⁴ times slower than the memory speed due to fundamental hardwarelimitations. As a result, concurrent ML computation in the distributed settings often needs to handle delayed updates and perform learning in the presence of staleness. This thesis characterizes learning with staleness from three directions: (1) We extend the theoretical analyses of a number of classical ML algorithms, including stochastic gradient descent, proximal gradient descent on non-convex problems, and Frank-Wolfe algorithms, to explicitly incorporate staleness into their convergence characterizations. (2) We conduct simulation and large-scale distributed experiments to study the empirical effects of staleness on ML algorithms under indeterministic executions. Our results reveal that staleness is a key parameter governing the convergence speed for all considered ML algorithms, with varied ramifications. (3) We design staleness-minimizing parameter server systems by optimizing synchronization methods to effectively reduce the runtime staleness. The proposed optimization of a bounded consistency model utilizes the additional network bandwidths to communicate updates eagerly, relieving users of the burden to tune the staleness level. By minimizing staleness at the framework level, our system stabilizes diverging optimization paths and substantially accelerates convergence across ML algorithms without any modification to the ML programs. 206 pages Thesis Committee: Eric P. Xing (Chair) Gregory R. Ganger Andrew Pavlo Joseph E. Gonzalez (University of California at Berkeley) Manuela M. Veloso, Head, Machine Learning Department Andrew W. Moore, Dean, School of Computer Science

SCS Technical Report Collection School of Computer Science