Variational Channel Distribution Pruning and Mixed-Precision Quantization for Neural Network Model Compression

Apr 18, 2022
Wan-Ting Chang, Chih-Hung Kuo, Li-Chun Fang
Abstract
This paper presents a model compression framework that performs both pruning and quantization according to channel distribution information. We apply variational inference to train a Bayesian deep neural network in which the parameters are modeled by probability distributions. Based on the characteristics of these distributions, we can prune redundant channels and determine the bit-width layer by layer. Experiments conducted on the CIFAR-10 dataset with VGG16 show that the number of parameters can be reduced by a factor of 58.91. The proposed compression approach can help implement hardware circuits for efficient edge and mobile computing.
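To make the idea concrete, the sketch below shows one way such variational channel scaling could look in PyTorch: each output channel gets a Gaussian scale trained via the reparameterization trick, pruning keeps channels with a high signal-to-noise ratio, and a per-layer bit-width is derived from the same statistics. Everything here (`VariationalChannelScale`, `prune_mask`, `layer_bitwidth`, the SNR threshold, and the bit-width heuristic) is an illustrative assumption, not the paper's actual implementation.

```python
# Minimal sketch, assuming PyTorch; names and thresholds are illustrative.
import torch
import torch.nn as nn


class VariationalChannelScale(nn.Module):
    """Channel-wise Gaussian scales s_c ~ N(mu_c, sigma_c^2) for one conv layer.

    Trained with the reparameterization trick; a channel whose scale
    distribution concentrates near zero (low |mu|/sigma) is treated as
    redundant and can be pruned. A full variational setup would also add
    a KL term toward a sparsity-inducing prior to the training loss.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(num_channels))
        self.log_sigma = nn.Parameter(torch.full((num_channels,), -3.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            eps = torch.randn_like(self.mu)
            s = self.mu + self.log_sigma.exp() * eps  # reparameterization trick
        else:
            s = self.mu  # use the posterior mean at inference time
        return x * s.view(1, -1, 1, 1)

    def snr(self) -> torch.Tensor:
        # Per-channel signal-to-noise ratio |mu| / sigma.
        return self.mu.abs() / self.log_sigma.exp()


def prune_mask(scale: VariationalChannelScale, threshold: float = 1.0) -> torch.Tensor:
    """Keep only channels whose SNR exceeds the (assumed) threshold."""
    return scale.snr() > threshold


def layer_bitwidth(scale: VariationalChannelScale,
                   min_bits: int = 2, max_bits: int = 8) -> int:
    """Illustrative heuristic: layers with higher average SNR carry more
    information per channel, so they are assigned more bits."""
    bits = torch.log2(1.0 + scale.snr().mean()).round()
    return int(bits.clamp(min_bits, max_bits))


# Usage: wrap a conv layer's output, then prune/quantize after training.
scale = VariationalChannelScale(num_channels=64)
x = torch.randn(1, 64, 32, 32)
y = scale(x)                  # scaled activations
mask = prune_mask(scale)      # boolean keep-mask over 64 channels
bits = layer_bitwidth(scale)  # per-layer bit-width
```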
Publication
2022 International Symposium on VLSI Design, Automation and Test (VLSI-DAT)