ML Paper Challenge Day 28 — On the importance of initialization and momentum in deep learning
3 min read · May 10, 2020
Day 28: 2020.05.09
Paper: On the importance of initialization and momentum in deep learning
Category: Model/Deep Learning/Optimization
- both the initialization and the momentum are crucial
- poorly initialized networks cannot be trained even with momentum
- well-initialized networks perform markedly worse when the momentum is absent or poorly tuned
- carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods.
Momentum and Nesterov’s Accelerated Gradient
- Nesterov’s Accelerated Gradient (NAG) can be viewed as a simple modification of Classical Momentum (CM) which increases stability, and can sometimes provide a…
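
To make the "simple modification" concrete, here is a minimal sketch of the two update rules in plain Python. CM computes the gradient at the current parameters, while NAG computes it at the look-ahead point theta + mu * v; everything else is identical. The function names, the 1-D quadratic objective, and the values lr=0.1 and mu=0.9 are illustrative assumptions, not taken from the paper's experiments.

```python
def grad_quadratic(theta, curvature=1.0):
    """Gradient of the toy objective f(theta) = 0.5 * curvature * theta**2 (an assumed example)."""
    return curvature * theta


def cm_step(theta, v, grad, lr, mu):
    """Classical Momentum: the gradient is evaluated at the current theta."""
    v_new = mu * v - lr * grad(theta)
    return theta + v_new, v_new


def nag_step(theta, v, grad, lr, mu):
    """Nesterov's Accelerated Gradient: the gradient is evaluated at theta + mu * v."""
    v_new = mu * v - lr * grad(theta + mu * v)
    return theta + v_new, v_new


def run(step_fn, theta0, steps, lr=0.1, mu=0.9, grad=grad_quadratic):
    """Apply step_fn repeatedly, starting from theta0 with zero velocity."""
    theta, v = theta0, 0.0
    for _ in range(steps):
        theta, v = step_fn(theta, v, grad, lr, mu)
    return theta


if __name__ == "__main__":
    # Compare where the two methods end up after the same number of steps.
    for name, step_fn in [("CM", cm_step), ("NAG", nag_step)]:
        print(name, run(step_fn, theta0=1.0, steps=50))
```

The only line that differs between `cm_step` and `nag_step` is the argument passed to `grad`. On this toy problem both methods converge toward the minimum at 0; the sketch is meant to show the mechanics of the two updates, not to reproduce any result from the paper.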