
Day 31–32: 2020.05.12–13
Paper: Learning to learn by gradient descent by gradient descent
Category: Model/Optimization

Background

  • Vanilla gradient descent uses only the gradient and ignores second-order (curvature) information, which limits its performance
  • Many optimisation algorithms, such as Adagrad and ADAM, improve on plain gradient descent; however, each of them applies only to a specific class of problems (both kinds of update rule are sketched after this list)
  • “No Free Lunch Theorems for Optimization” [Wolpert and Macready, 1997] proves that, in the setting of combinatorial optimization, no algorithm can do better than a random strategy in expectation.
    -> Specialization to a subclass of problems is in fact the only way that improved performance can be achieved in general.
  • Is there any way to do better? Any way to obtain a more generalised algorithm…
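
To make the first two points concrete, here is a minimal, illustrative sketch (not code from the paper) comparing a vanilla gradient descent step with an ADAM step on a toy ill-conditioned quadratic. The loss, starting point, learning rates, and ADAM hyper-parameters are assumptions chosen purely for illustration; the point is that both hand-designed rules consume only the gradient, not curvature information.

```python
import numpy as np

# Toy ill-conditioned quadratic loss f(theta) = 0.5 * theta^T A theta.
# A, the starting point, and all step sizes below are illustrative assumptions.
A = np.diag([1.0, 10.0])


def grad(theta):
    return A @ theta  # exact gradient of the quadratic


def sgd_step(theta, g, lr=0.1):
    # Vanilla gradient descent: consumes only the current gradient and
    # ignores curvature (second-order) information.
    return theta - lr * g


def adam_step(theta, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # ADAM: still gradient-only, but keeps running first/second moment
    # estimates -- a hand-designed rule tuned for a class of problems.
    m, v, t = state
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    t += 1
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)


theta_sgd = np.array([1.0, 1.0])
theta_adam = np.array([1.0, 1.0])
adam_state = (np.zeros(2), np.zeros(2), 0)
for _ in range(100):
    theta_sgd = sgd_step(theta_sgd, grad(theta_sgd))
    theta_adam, adam_state = adam_step(theta_adam, grad(theta_adam), adam_state)

print("SGD :", theta_sgd)   # both move toward the minimiser at the origin
print("Adam:", theta_adam)  # but each update rule is fixed by hand
```

ADAM's running moment estimates help on this badly scaled problem, yet the rule itself is designed by hand for a class of problems, which is exactly the limitation that motivates learning the optimiser instead.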

