Day 28: 2020.05.09
Paper: On the importance of initialization and momentum in deep learning
Category: Model/Deep Learning/Optimization

  • both the initialization and the momentum are crucial
  • poorly initialized networks cannot be trained with momentum
  • well-initialized networks perform markedly worse when the momentum is absent or poorly tuned
  • carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives, without the need for sophisticated second-order methods

Momentum and Nesterov’s Accelerated Gradient

  • Nesterov’s Accelerated Gradient (NAG) can be viewed as a simple modification of Classical Momentum (CM) that increases stability, and can sometimes provide a…
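
To make the difference between CM and NAG concrete, here is a minimal sketch in plain Python. It uses the update rules as stated in the paper: CM computes the gradient at the current parameters, while NAG computes it at the look-ahead point theta + mu * v. The toy quadratic objective, the curvature values, the learning rate, and every function name below are assumptions chosen for illustration only; none of them come from the paper's experiments.

```python
def grad(theta, curvatures):
    """Gradient of the toy objective f(theta) = 0.5 * sum(a_i * theta_i**2)."""
    return [a * t for a, t in zip(curvatures, theta)]


def cm_step(theta, v, lr, mu, curvatures):
    """Classical momentum: gradient is evaluated at the current point theta."""
    g = grad(theta, curvatures)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta, v


def nag_step(theta, v, lr, mu, curvatures):
    """NAG: gradient is evaluated at the look-ahead point theta + mu * v."""
    lookahead = [ti + mu * vi for ti, vi in zip(theta, v)]
    g = grad(lookahead, curvatures)
    v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    theta = [ti + vi for ti, vi in zip(theta, v)]
    return theta, v


def optimize(step_fn, steps=500, lr=0.009, mu=0.9):
    """Run one of the two update rules on an ill-conditioned 2-D quadratic.

    The curvatures (1 and 100) and the start point are hypothetical values
    picked so that one direction is much steeper than the other.
    """
    curvatures = [1.0, 100.0]
    theta = [1.0, 1.0]
    v = [0.0, 0.0]
    for _ in range(steps):
        theta, v = step_fn(theta, v, lr, mu, curvatures)
    return theta


if __name__ == "__main__":
    print("CM :", optimize(cm_step))
    print("NAG:", optimize(nag_step))
```

The two step functions differ in a single line: where the gradient is evaluated. On this toy problem both rules converge toward the minimum at the origin. Along the steep direction, the look-ahead gradient lets NAG correct an overshooting velocity one step earlier than CM does, which is one way to read the stability claim above.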

Written by Chun-kit Ho

cloud architect@ey | full-stack software engineer | social innovation | certified professional solutions architect in aws & gcp
