Journal Articles

DiffGrad: An Optimization Method for Convolutional Neural Networks

Shiv Ram Dubey, Indian Institute of Information Technology, Sri City
Soumendu Chakraborty, Council of Indian Institutes of Information Technology
Swalpa Kumar Roy, Indian Statistical Institute, Kolkata
Snehasis Mukherjee, Indian Institute of Information Technology, Sri City
Satish Kumar Singh, Indian Institute of Information Technology, Allahabad
Bidyut Baran Chaudhuri, Indian Statistical Institute, Kolkata

Article Type

Research Article

Publication Title

IEEE Transactions on Neural Networks and Learning Systems

Abstract

Stochastic gradient descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it updates all parameters with equal-sized steps, irrespective of the gradient behavior. Hence, an efficient way to optimize deep networks is to use an adaptive step size for each parameter. Recently, several attempts have been made to improve gradient descent, including AdaGrad, AdaDelta, RMSProp, and adaptive moment estimation (Adam). These methods rely on the square roots of exponential moving averages of squared past gradients and therefore do not take advantage of local changes in the gradient. In this article, a novel optimizer, diffGrad, is proposed based on the difference between the present and the immediate past gradient. In diffGrad, the step size is adjusted for each parameter so that parameters whose gradients change quickly take larger steps and parameters whose gradients change slowly take smaller steps. The convergence analysis uses the regret bound approach of the online learning framework. A thorough analysis is carried out on three synthetic complex nonconvex functions. Image categorization experiments are also conducted on the CIFAR10 and CIFAR100 data sets to compare diffGrad with state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. A residual unit (ResNet)-based convolutional neural network (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms the other optimizers. We also show that diffGrad performs uniformly well when training CNNs with different activation functions. The source code is publicly available at https://github.com/shivram1987/diffGrad.
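
The abstract describes the update only in words, so the following is a minimal sketch of one way the idea could be realized, not the authors' reference implementation (which is at the repository linked above). It assumes an Adam-style update whose step is scaled by a sigmoid of the absolute difference between the previous and current gradient. The function names `init_state` and `diffgrad_step`, the hyperparameter defaults, and the exact form of the scaling factor are assumptions made for illustration.

```python
import math


def init_state(n):
    """Create zeroed optimizer state for n scalar parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "prev_grad": [0.0] * n}


def diffgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one diffGrad-style update to a list of scalar parameters.

    Assumption: the per-parameter scaling factor is a sigmoid of
    |previous gradient - current gradient|, so it lies in [0.5, 1).
    """
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        # Exponential moving averages of the gradient and squared gradient (as in Adam).
        m = beta1 * state["m"][i] + (1 - beta1) * g
        v = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Larger gradient change -> factor closer to 1 -> larger step.
        delta = abs(state["prev_grad"][i] - g)
        scale = 1.0 / (1.0 + math.exp(-delta))
        new_theta.append(p - lr * scale * m_hat / (math.sqrt(v_hat) + eps))
        state["m"][i], state["v"][i], state["prev_grad"][i] = m, v, g
    return new_theta


# Usage: minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.0.
theta = [1.0]
state = init_state(len(theta))
for _ in range(200):
    grad = [2.0 * x for x in theta]
    theta = diffgrad_step(theta, grad, state, lr=0.1)
```

Because the scaling factor in this sketch never drops below 0.5, a parameter whose gradient is barely changing still moves, only with roughly half the step an unscaled Adam update would take.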

First Page

4500

Last Page

4511

DOI

10.1109/TNNLS.2019.2955777

Publication Date

11-1-2020

Comments

Open Access, Green

Recommended Citation

Dubey, Shiv Ram; Chakraborty, Soumendu; Roy, Swalpa Kumar; Mukherjee, Snehasis; Singh, Satish Kumar; and Chaudhuri, Bidyut Baran, "DiffGrad: An Optimization Method for Convolutional Neural Networks" (2020). Journal Articles. 95.
https://digitalcommons.isical.ac.in/journal-articles/95
