ML Paper Challenge Day 19 — Achieving Human Parity in Conversational Speech Recognition
3 min read · Apr 30, 2020
Day 19: 2020.04.30
Paper: Achieving Human Parity in Conversational Speech Recognition
Category: Model/Deep Learning/Speech Recognition
Model Architecture
Achieved by combining multiple models!
3 CNNs
- VGG architecture
uses small (3x3) filters, is deeper, and applies up to five convolutional layers before pooling
- ResNet architecture
adds highway connections, i.e. a linear transform of each layer’s input to the layer’s output
The only difference from the original ResNet is that the authors apply Batch Normalisation before computing ReLU activations.
- LACE (layer-wise context expansion with attention) model
a TDNN variant in which each higher layer is a weighted sum of nonlinear transformations of a window of lower layer frames
-> each higher layer exploits broader context than lower…
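To make the LACE idea more concrete, here is a minimal sketch of a single context-expansion layer in plain Python. It follows only the one-sentence description above: each output frame is a weighted sum of nonlinear (here ReLU) transformations of a window of lower-layer frames. The function names, the per-offset `transforms` matrices, and the scalar `weights` are hypothetical, and the attention mechanism and other details of the actual model are left out, so treat this as an illustration rather than the paper's implementation.

```python
def relu(x):
    """Rectified linear unit for a single float."""
    return max(0.0, x)


def context_expansion_layer(frames, transforms, weights):
    """One simplified context-expansion layer (hypothetical helper).

    frames:     list of T input frames, each a list of D floats
    transforms: one (D_out x D) matrix per window offset (assumed parameters)
    weights:    one scalar weight per window offset (assumed parameters)

    Output frame t is sum_k weights[k] * relu(transforms[k] @ frames[t + k]),
    so the output has T - K + 1 frames, where K = len(weights).
    """
    window = len(weights)
    d_out = len(transforms[0])
    out = []
    for t in range(len(frames) - window + 1):
        acc = [0.0] * d_out
        for k in range(window):
            x = frames[t + k]
            for i, row in enumerate(transforms[k]):
                # Nonlinear transformation of one lower-layer frame
                h = relu(sum(w * v for w, v in zip(row, x)))
                # Weighted sum across the window
                acc[i] += weights[k] * h
        out.append(acc)
    return out


def receptive_field(num_layers, window):
    """Input frames seen by one output frame after stacking such layers."""
    return num_layers * (window - 1) + 1
```

Stacking these layers is what widens the context: with a window of 3, one layer sees 3 input frames, two layers see 5, and three layers see 7.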