
ML Paper Challenge Day 33 — Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Chun-kit Ho
3 min read · May 14, 2020


Day 33: 2020.05.14
Paper: Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Category: Model/Optimization

Deep Compression

Background:

  • Running DNNs locally on-device: better privacy, less network bandwidth & real-time processing -> but model sizes are too large
  • Large models -> high energy consumption

Goal:

  • reduce the storage and energy required to run inference on such large networks so they can be deployed on mobile devices

How:

Step I — Pruning: Prune the network by removing redundant connections, keeping only the most informative ones

This also reduces network complexity and prevents over-fitting.

  1. Learn the connectivity via normal network training
  2. Prune the small-weight connections: all connections with weights below a threshold are removed from the network
  3. Retrain the network to learn the final weights for the remaining sparse connections
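
To make these three steps concrete, here is a minimal sketch of magnitude-based pruning in PyTorch. The layer shape and the 0.05 threshold are made up for illustration; the paper derives a threshold per layer rather than using one fixed number.

```python
import torch

def magnitude_prune(weight: torch.Tensor, threshold: float) -> torch.Tensor:
    """Return a binary mask keeping only connections with |w| >= threshold."""
    return (weight.abs() >= threshold).float()

# Hypothetical usage on one trained layer (values for illustration only):
layer = torch.nn.Linear(1024, 1024)      # stands in for a trained layer
mask = magnitude_prune(layer.weight.data, threshold=0.05)
layer.weight.data.mul_(mask)             # step 2: remove small-weight connections

# Inside the retraining loop (step 3), re-apply the mask after each
# optimizer step so pruned connections stay at zero:
#   loss.backward(); optimizer.step()
#   layer.weight.data.mul_(mask)
```

Re-applying the mask is what keeps the retrained network sparse: gradients still update the surviving connections while the pruned ones remain zero.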

How it is stored after pruning

  1. Use compressed sparse row (CSR) or compressed sparse column (CSC) format, which requires 2a + n + 1 numbers
    (a: number of non-zero elements, n: number of rows or columns)
  2. To compress further, store the index difference instead of the absolute position, and encode this difference in 8 bits for conv layers and 5 bits for fc layers. When a gap is larger than the bound the bit width allows, a padding zero is inserted so the stored difference stays in range (sketched below).
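
As a rough sketch of the relative-index idea with zero padding, assuming a flattened weight vector and a small bound of 8 gaps for readability (the real scheme allows up to 2^8 for conv and 2^5 for fc layers, and packs the stream at the bit level, which is omitted here):

```python
import numpy as np

def encode_relative(weights: np.ndarray, bound: int):
    """Store each non-zero as (index_diff, value); emit a filler (bound, 0.0)
    whenever a gap is too large for the fixed bit width."""
    pairs, prev = [], -1
    for idx in np.flatnonzero(weights):
        diff = idx - prev
        while diff > bound:              # gap exceeds the representable range
            pairs.append((bound, 0.0))   # padding zero advances the cursor
            prev += bound
            diff -= bound
        pairs.append((int(diff), float(weights[idx])))
        prev = idx
    return pairs

def decode_relative(pairs, length: int) -> np.ndarray:
    """Inverse of encode_relative; filler entries simply write 0.0."""
    out, pos = np.zeros(length), -1
    for diff, val in pairs:
        pos += diff
        out[pos] = val
    return out

w = np.zeros(32)
w[1], w[20] = 0.7, -0.3
pairs = encode_relative(w, bound=8)   # [(2, 0.7), (8, 0.0), (8, 0.0), (3, -0.3)]
assert np.array_equal(decode_relative(pairs, 32), w)
```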
