ML Paper Challenge Day 33 — Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
May 14, 2020
Day 33: 2020.05.14
Paper: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Category: Model/Optimization
Deep Compression
Background:
- Running DNNs locally on-device gives better privacy, less network bandwidth & real-time processing -> but model sizes are too large for mobile deployment
- Large models -> high energy consumption (off-chip DRAM accesses dominate the energy cost of inference)
Goal:
- Reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices
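The title names a three-stage pipeline: pruning, trained quantization (weight sharing), and Huffman coding. As a rough illustration, here is a minimal sketch of those three stages on a single weight matrix, assuming magnitude-threshold pruning, 1-D k-means for weight sharing, and standard Huffman coding over the cluster indices; all function names and parameters below are my own, not from the paper's code.

```python
import heapq
from collections import Counter
import numpy as np

def prune(weights, threshold):
    """Stage 1: magnitude pruning — zero out weights below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def quantize(weights, mask, n_clusters=4):
    """Stage 2: weight sharing via 1-D k-means over the surviving weights."""
    vals = weights[mask]
    # Linear initialization over the value range, then a few Lloyd iterations.
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)
    for _ in range(20):
        idx = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = vals[idx == k].mean()
    return idx, centroids

def huffman_code(symbols):
    """Stage 3: build a Huffman code for the cluster-index stream."""
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tick = len(heap)  # unique tiebreaker so dicts are never compared
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tick, merged))
        tick += 1
    return heap[0][2]

# Toy usage: prune, cluster, then entropy-code the indices.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)
pruned, mask = prune(w, threshold=0.5)
idx, centroids = quantize(pruned, mask, n_clusters=4)
codes = huffman_code(idx.tolist())
```

The stored model then only needs the sparse index structure, the small centroid table, and the Huffman-coded index stream, instead of a dense 32-bit float per weight.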