ML Paper Challenge Day 31, 32 — Learning to learn by gradient descent by gradient descent
3 min read · May 13, 2020
Day 31–32: 2020.05.12–13
Paper: Learning to learn by gradient descent by gradient descent
Category: Model/Optimization
Background
- Vanilla gradient descent makes use of the gradient only and ignores second-order information -> limits its performance (a minimal sketch of this hand-designed update rule follows this list)
- Many optimisation algorithms, such as Adagrad and ADAM, improve on the performance of gradient descent. However, each only applies to specific classes of problems
- “No Free Lunch Theorems for Optimization” [Wolpert and Macready, 1997] proves that in the setting of combinatorial optimization, no algorithm is able to do better than a random strategy in expectation.
-> Specialization to a subclass of problems is in fact the only way that improved performance can be achieved in general.
- Any way to make it better? Any way to have a more generalised algorithm…
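
As a reference point for the background above, here is a minimal sketch (not from the paper) of the hand-designed update rule the bullets refer to: vanilla gradient descent only uses the current gradient, with a step size chosen by hand. The objective `f`, its gradient `grad_f`, and the step size `alpha` are illustrative toy choices.

```python
import numpy as np

TARGET = np.array([1.0, -2.0])

def f(theta):
    # Toy quadratic objective; its minimum is at theta = TARGET.
    return 0.5 * np.sum((theta - TARGET) ** 2)

def grad_f(theta):
    # Gradient of the quadratic objective above.
    return theta - TARGET

theta = np.zeros(2)
alpha = 0.1  # hand-chosen step size, fixed in advance

for _ in range(100):
    # Vanilla update: uses first-order (gradient) information only.
    theta = theta - alpha * grad_f(theta)

print(theta)  # converges towards [1, -2]
```

Optimisers such as Adagrad or ADAM replace this fixed update with hand-designed rules that track gradient statistics, but the functional form is still specified in advance; the paper's question is whether that update rule itself can be learned.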