ML Paper Challenge Day 33 — Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Day 33: 2020.05.14
Paper: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Category: Model/Optimization
Deep Compression
Background:
- Running a DNN locally on-device gives better privacy, lower network bandwidth, and real-time processing -> but the model size is too large
- A large model -> high energy consumption
Goal:
- Reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices
How:
Step I — Pruning: Prune the network by removing redundant connections, keeping only the most informative ones
This also reduces network complexity and helps prevent over-fitting
- Learn the connectivity via normal network training
- Prune the small-weight connections: remove all connections whose weights fall below a threshold
- Retrain the network to learn the final weights for the remaining sparse connections (a minimal sketch of this pipeline follows the list)
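Here is a minimal PyTorch sketch of that three-stage pipeline, assuming an already-trained model; the threshold value and the helper names (prune_by_magnitude, apply_masks) are my own illustration, not the paper's code:

```python
import torch

def prune_by_magnitude(model, threshold=0.01):
    """Zero out all connections whose |weight| < threshold; return the masks."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:  # prune weight tensors, leave biases intact
            mask = (param.abs() >= threshold).float()
            param.data.mul_(mask)  # remove small-weight connections
            masks[name] = mask
    return masks

def apply_masks(model, masks):
    """Re-apply the masks so pruned weights stay at zero during retraining."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```

During retraining, call apply_masks(model, masks) after every optimizer step so the sparsity pattern found in the pruning step is preserved while the remaining weights are fine-tuned.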
How the pruned network is stored:
- Use compressed sparse row (CSR) or compressed sparse column (CSC) format, which requires 2a + n + 1 numbers (a: number of non-zero elements, n: number of rows or columns)
- To compress further, store the index difference instead of the absolute position, and encode this difference in 8 bits for conv layers and 5 bits for fc layers (sketched below). When we need…
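A quick NumPy sketch (my own illustration, not the paper's implementation) of how the 2a + n + 1 count works out and what the relative-index trick looks like:

```python
import numpy as np

def to_csr(dense):
    """CSR keeps a values, a column indices, and n + 1 row pointers: 2a + n + 1 numbers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

# Toy pruned weight matrix: a = 3 non-zeros, n = 2 rows
W = np.array([[0.0, 3.0, 0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0, 1.0, 0.0]])
values, col_idx, row_ptr = to_csr(W)
print(len(values) + len(col_idx) + len(row_ptr))  # 2*3 + 2 + 1 = 9 stored numbers

# Relative indexing: store differences between consecutive non-zero positions
# of the flattened array instead of absolute positions
pos = np.flatnonzero(W)        # absolute positions: [1 4 8]
rel = np.diff(pos, prepend=0)  # differences: [1 3 4] -- small values, few bits each
print(rel)
```

Because each stored difference is bounded (8 bits for conv, 5 bits for fc, per the paper), the per-index cost is far below that of a full 32-bit absolute index.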