ML Paper Challenge Day 18 — Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Day 18: 2020.04.29
Paper: Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Category: Model/Deep Learning/Speech Recognition
Model Architecture

Input: log-spectrograms of power-normalised audio clips, computed over 20 ms windows
Output: graphemes (characters) of each language's alphabet
Inference: CTC models paired with a language model trained on a larger text corpus
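As a rough illustration of the input featurisation, here is a minimal NumPy sketch of computing log-spectrograms over 20 ms windows. The hop size, FFT windowing, and the exact power-normalisation scheme are my assumptions, not details taken from the paper.

```python
import numpy as np

def log_spectrogram(audio, sample_rate=16000, window_ms=20, hop_ms=10, eps=1e-10):
    """Log power spectrogram over fixed-length windows (assumed parameters)."""
    window = int(sample_rate * window_ms / 1000)  # e.g. 320 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)
    # Power-normalise the clip (assumption: scale to unit RMS).
    audio = audio / (np.sqrt(np.mean(audio ** 2)) + eps)
    frames = []
    for start in range(0, len(audio) - window + 1, hop):
        frame = audio[start:start + window] * np.hanning(window)
        power = np.abs(np.fft.rfft(frame)) ** 2
        frames.append(np.log(power + eps))
    return np.stack(frames)  # shape: (time, freq_bins)
```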
Batch Normalisation for Deep RNNs
Objective: keep gradient-descent training effective as network size and depth increase
2 ways to apply it:
- Insert a BatchNorm transformation, B(·), immediately before every non-linearity, including the recurrent connections -> not effective
- Batch-normalise only the vertical (feed-forward) connections -> works well
For each hidden unit, compute the mean and variance statistics over all items in the mini-batch and over the length of the sequence (sequence-wise normalisation).
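A minimal PyTorch sketch of this sequence-wise batch normalisation, applied only to the vertical (input-to-hidden) pre-activations. The module name and interface are illustrative, and running statistics for inference are omitted for brevity:

```python
import torch
import torch.nn as nn

class SequenceWiseBatchNorm(nn.Module):
    """Normalise each hidden unit over the mini-batch AND the time dimension."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.eps = eps

    def forward(self, x):
        # x: (batch, time, features) -- pre-activation of a vertical connection.
        # Statistics are computed over both batch and time, per hidden unit.
        mean = x.mean(dim=(0, 1), keepdim=True)
        var = x.var(dim=(0, 1), keepdim=True, unbiased=False)
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta
```

The key point is that the recurrent (hidden-to-hidden) connections are left unnormalised; only the feed-forward input to each layer passes through this transform.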
SortaGrad
Objective: make training more stable; it accelerates training and results in better generalisation
Use the length of the utterance as a heuristic for difficulty and train on the shorter (easier) utterances first.
In the first training epoch, iterate through mini-batches in the training set in increasing order of the length of the longest utterance in each mini-batch. After the first epoch, training reverts to a random order over mini-batches.
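A minimal sketch of this curriculum, assuming utterances are records carrying a precomputed length field; the function and field names are my own, not from the paper:

```python
import random

def sortagrad_batches(utterances, batch_size, epoch):
    """Yield mini-batches: epoch 0 ordered by utterance length, later epochs shuffled."""
    if epoch == 0:
        # First epoch: sort so early mini-batches contain short (easy) utterances,
        # which also orders batches by the length of their longest utterance.
        items = sorted(utterances, key=lambda u: u["num_frames"])
    else:
        items = list(utterances)
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if epoch > 0:
        random.shuffle(batches)  # revert to a random order over mini-batches
    return batches
```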