
Day 2: 2020.04.13
Paper: Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Category: Model/NLP

Goal: “to build accurate (NLP) models which fit a given memory and latency budget”.

Scope: “Since an exhaustive search over this space is impractical, we fix the model architecture to bidirectional Transformers, known to be suitable for a wide range of NLP tasks”.

Idea

  • Teacher: “a highly accurate but large model for an end task, that does not meet the resource constraints”. The paper uses BERT-BASE and BERT-LARGE as teachers.
  • Student: “compact models that satisfy resource constraints”.
  • Pre-trained Distillation (PD) — see the sketch after this list:
    > Step 1 — Pre-training on unlabeled LM data: “A compact model is trained with a masked LM” objective on a large corpus of unlabeled text.
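After pre-training, PD distills the compact student on the teacher's predictive distribution. Below is a minimal PyTorch sketch of such a soft-label distillation loss for a classification end task; the function name, tensor shapes, and optional temperature knob are my own illustration of the general technique, not code from the paper.

```python
import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits: torch.Tensor,
                                 teacher_logits: torch.Tensor,
                                 temperature: float = 1.0) -> torch.Tensor:
    """Cross-entropy of student predictions against the teacher's soft labels.

    Hypothetical helper for illustration: the student is trained on the
    teacher's predictive distribution; the temperature argument is a common
    knowledge-distillation convention, not taken from the paper.
    """
    # Teacher probabilities act as soft targets; no gradient flows to the teacher.
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

# Toy usage: batch of 4 examples, 3-way classification.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
loss = soft_label_distillation_loss(student_logits, teacher_logits)
loss.backward()
```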


Written by Chun-kit Ho

Cloud architect @ EY | full-stack software engineer | social innovation | certified professional solutions architect in AWS & GCP
