ML Paper Challenge Day 31, 32 — Learning to learn by gradient descent by gradient descent
3 min read · May 13, 2020
Day 31–32: 2020.05.12–13
Paper: Learning to learn by gradient descent by gradient descent
Category: Model/Optimization
Background
- Vanilla gradient descent makes use of the gradient only and ignores second-order information -> limits its performance (a minimal sketch of this hand-designed update rule follows this list)
- Many optimisation algorithms, such as Adagrad and ADAM, improve on the performance of gradient descent. However, each only applies to specific classes of problems
- “No Free Lunch Theorems for Optimization” [Wolpert and Macready, 1997] proves that in the setting of combinatorial optimization, no algorithm is able to do better than a random strategy in expectation.
-> Specialization to a subclass of problems is in fact the only way that improved performance can be achieved in general.
- Any way to make it better? Any way to have a more generalised algorithm…
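
As a reference point for the background above, here is a minimal sketch (not from the paper) of the hand-designed update rule the bullets refer to: vanilla gradient descent only uses the current gradient, with a step size chosen by hand. The objective `f`, its gradient `grad_f`, and the step size `alpha` are illustrative toy choices.

```python
import numpy as np

TARGET = np.array([1.0, -2.0])

def f(theta):
    # Toy quadratic objective; its minimum is at theta = TARGET.
    return 0.5 * np.sum((theta - TARGET) ** 2)

def grad_f(theta):
    # Gradient of the quadratic objective above.
    return theta - TARGET

theta = np.zeros(2)
alpha = 0.1  # hand-chosen step size, fixed in advance

for _ in range(100):
    # Vanilla update: uses first-order (gradient) information only.
    theta = theta - alpha * grad_f(theta)

print(theta)  # converges towards [1, -2]
```

Optimisers such as Adagrad or ADAM replace this fixed update with hand-designed rules that track gradient statistics, but the functional form is still specified in advance; the paper's question is whether that update rule itself can be learned.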