Day 18: 2020.04.29
Paper: Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Category: Model/Deep Learning/Speech Recognition

Model Architecture

Input: log-spectrograms of power-normalised audio clips, calculated on 20 ms windows
Output: characters from the alphabet of each language
Inference: CTC models paired with a language model trained on a bigger corpus of text
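
To make the two ends of this pipeline concrete, here is a minimal NumPy sketch of the input features and of turning per-frame CTC outputs into text. It is not the paper's implementation. Assumptions on my side: the 10 ms hop, the Hann window, "power normalised" read as scaling the clip to unit average power, the function names, and greedy (best-path) decoding in place of the language-model-guided search the paper uses at inference. Clips are assumed to be at least one window long.

```python
import numpy as np

def log_spectrogram(signal, sample_rate, window_ms=20, hop_ms=10, eps=1e-10):
    """Log power spectrogram of a 1-D signal, one row per 20 ms frame."""
    signal = np.asarray(signal, dtype=np.float64)
    # Assumption: power normalisation means scaling to unit average power.
    signal = signal / np.sqrt(np.mean(signal ** 2) + eps)
    win = int(sample_rate * window_ms / 1000)   # samples per window
    hop = int(sample_rate * hop_ms / 1000)      # samples between frame starts (assumed)
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hanning(win)                    # assumed window function
    frames = np.stack(
        [signal[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)

def ctc_greedy_decode(log_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    log_probs: array of shape (time, len(alphabet)).
    alphabet:  sequence of symbols; alphabet[blank] is a placeholder for the blank.
    """
    best = np.argmax(log_probs, axis=1)
    chars = []
    prev = blank
    for idx in best:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

Usage: `log_spectrogram(audio, 16000)` gives one feature row per frame, and `ctc_greedy_decode(frame_log_probs, "_abc...")` collapses the network's per-frame predictions into a string. A blank between two identical symbols is what lets CTC output doubled letters.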

Batch Normalisation for Deep RNNs

Objective: To keep gradient-descent training workable as the networks grow in size and depth
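
As I read the paper, the variant that works for recurrent layers is sequence-wise normalisation: only the input-to-hidden term is normalised, and the mean and variance are taken over every item in the minibatch and every time step. A minimal NumPy sketch of that idea follows. Assumptions on my side: the function and parameter names, the clipped ReLU nonlinearity, a unidirectional layer, and training-time statistics only (no running averages for inference).

```python
import numpy as np

def sequence_wise_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalise x of shape (batch, time, features) per feature,
    with statistics pooled over both the batch and time axes."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rnn_layer(inputs, W, U, gamma, beta):
    """Simple recurrent layer: h_t = f(BN(W x_t) + U h_{t-1}).

    inputs: (batch, time, in_features)
    W:      (in_features, hidden)  input-to-hidden weights
    U:      (hidden, hidden)       hidden-to-hidden weights
    """
    batch, time, _ = inputs.shape
    hidden = U.shape[0]
    # Normalise the whole input projection once, before the recurrence.
    projected = sequence_wise_batch_norm(inputs @ W, gamma, beta)
    h = np.zeros((batch, hidden))
    outputs = []
    for t in range(time):
        # Clipped ReLU (assumed nonlinearity); the recurrent term is not normalised.
        h = np.clip(projected[:, t] + h @ U, 0.0, 20.0)
        outputs.append(h)
    return np.stack(outputs, axis=1)
```

Pooling over time means the statistics do not depend on a per-step estimate from a small batch, which is the practical reason this variant is easier to train than normalising each time step separately.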

Written by Chun-kit Ho