ML Paper Challenge Day 2 — Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Day 2: 2020.04.13
Paper: Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Category: Model/NLP
Goal: “to build accurate (NLP) models which fit a given memory and latency budget”
Scope: “Since an exhaustive search over this space is impractical, we fix the model architecture to bidirectional Transformers, known to be suitable for a wide range of NLP tasks”
Idea
- Teacher: “a highly accurate but large model for an end task, that does not meet the resource constraints”. BERT-BASE and BERT-LARGE are used as teachers.
- Student: “compact models that satisfy resource constraints”
- Pre-trained Distillation (PD), with a code sketch after the steps below:
> Step 1 — Pre-training on unlabelled LM data: “A compact model is trained with a masked LM…
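
To make the PD recipe concrete, here is a minimal PyTorch sketch. It is an illustration under toy assumptions, not the paper's exact setup: `TinyEncoder`, the vocabulary/hidden sizes, the 15% mask rate, and the temperature knob are all placeholders (the paper uses BERT-style Transformers). Step 1 pre-trains the compact student with a masked-LM loss on unlabelled text; the distillation step then trains it to match the teacher's predictive distribution with a soft cross-entropy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, NUM_CLASSES, MASK_ID = 1000, 64, 2, 3  # toy sizes; illustrative only

class TinyEncoder(nn.Module):
    """Stand-in for a compact bidirectional Transformer (not the paper's model)."""
    def __init__(self, out_dim):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, out_dim)

    def forward(self, ids):                       # ids: (batch, seq_len)
        return self.head(self.encoder(self.embed(ids)))

def masked_lm_loss(student, masked_ids, labels):
    """Step 1: masked-LM pre-training on unlabelled text.
    `labels` holds the original token ids at masked positions, -100 elsewhere."""
    logits = student(masked_ids)                  # (batch, seq_len, VOCAB)
    return F.cross_entropy(logits.view(-1, VOCAB), labels.view(-1), ignore_index=-100)

def distillation_loss(student, teacher, ids, temperature=1.0):
    """Distillation: soft cross-entropy against the frozen teacher's end-task
    logits. The temperature knob is an assumption, not taken from the paper."""
    with torch.no_grad():
        t_logits = teacher(ids)[:, 0]             # first-token "pooled" logits
    s_logits = student(ids)[:, 0]
    T = temperature
    soft = F.softmax(t_logits / T, dim=-1)
    return -(soft * F.log_softmax(s_logits / T, dim=-1)).sum(-1).mean() * T * T

# --- toy usage ---
ids = torch.randint(4, VOCAB, (8, 16))            # fake unlabelled batch
mask = torch.rand(ids.shape) < 0.15               # mask 15% of positions
labels = torch.full_like(ids, -100); labels[mask] = ids[mask]
masked_ids = ids.clone(); masked_ids[mask] = MASK_ID

student = TinyEncoder(out_dim=VOCAB)              # MLM head for Step 1
print("MLM loss:", masked_lm_loss(student, masked_ids, labels).item())

teacher = TinyEncoder(out_dim=NUM_CLASSES)        # stands in for a large fine-tuned teacher
clf_student = TinyEncoder(out_dim=NUM_CLASSES)    # in practice: the Step-1 student, re-headed
print("Distill loss:", distillation_loss(clf_student, teacher, ids).item())
```

The two losses are kept separate on purpose: PD's point is that the cheap masked-LM stage comes first, and only then does the compact student learn from the teacher's soft labels.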