Barzilai-Borwein Step Size for Stochastic Gradient Descent
Refereed conference paper presented and published in conference proceedings
Officially Accepted for Publication


Full Text

Other information
Abstract

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization methods, the common practice in SGD is either to use a diminishing step size, or to tune a step size by hand, which can be time consuming in practice.

In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD and its variant: stochastic variance reduced gradient (SVRG) method, which leads to two algorithms: SGD-BB and SVRG-BB. We prove that SVRG-BB converges linearly for strongly convex objective functions. As a by-product, we prove the linear convergence result of SVRG with Option I proposed in [10], whose convergence result has been missing in the literature. Numerical experiments on standard data sets show that the performance of SGD-BB and SVRG-BB is comparable to and sometimes even better than SGD and SVRG with best-tuned step sizes, and is superior to some advanced SGD variants.

Acceptance Date12/08/2016
All Author(s) ListConghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian
Name of ConferenceThe Thirtieth Annual Conference on Neural Information Processing Systems (NIPS)
Start Date of Conference05/12/2016
End Date of Conference10/12/2016
Place of ConferenceBarcelona
Country/Region of ConferenceSpain
Proceedings TitleAdvances in Neural Information Processing Systems 29
Year2017
PublisherCurran Associates, Inc.
Pages685 - 693
LanguagesEnglish-United States

Last updated on 2018-21-01 at 21:37