A Unified Knowledge Distillation Framework for Deep Directed Graphical Models

Abstract

Knowledge distillation (KD) is a technique that transfers the knowledge from a large teacher network to a small student network. It has been widely applied to many different tasks, such as model compression and federated learning. However, existing KD methods do not generalize to deep directed graphical models (DGMs) with arbitrary layers of random variables. By deep DGMs, we refer to DGMs whose conditional distributions are parameterized by deep neural networks. In this work, we propose a novel unified knowledge distillation framework for deep DGMs on various applications. Specifically, we leverage the reparameterization trick to hide the intermediate latent variables, resulting in a compact DGM. We then develop a surrogate distillation loss that reduces error accumulation through multiple layers of random variables. Moreover, we present the connections between our method and some existing knowledge distillation approaches. The proposed framework is evaluated on four applications: data-free hierarchical variational autoencoder (VAE) compression, data-free variational recurrent neural network (VRNN) compression, data-free Helmholtz Machine (HM) compression, and VAE continual learning. The results show that our distillation method outperforms the baselines in data-free model compression tasks. We further demonstrate that our method significantly improves the performance of KD-based continual learning for data generation. Our source code is available at https://github.com/YizhuoChen99/KD4DGM-CVPR.
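
The sketch below is a minimal illustration (not the authors' implementation) of the idea summarized in the abstract: the reparameterization trick turns a two-layer DGM with Gaussian conditionals into a single deterministic mapping from shared noise to the observed layer, so a smaller student can be trained data-free against the teacher's outputs with a surrogate distillation loss. All module names, shapes, and the simple MSE surrogate are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TwoLayerDGM(nn.Module):
    """p(x | z1) p(z1 | z2) p(z2): each conditional is a Gaussian whose
    mean/log-variance are produced by a small MLP (hypothetical architecture)."""
    def __init__(self, z2_dim=8, z1_dim=16, x_dim=32, hidden=64):
        super().__init__()
        self.z1_net = nn.Sequential(nn.Linear(z2_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 2 * z1_dim))
        self.x_net = nn.Sequential(nn.Linear(z1_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * x_dim))

    def forward(self, z2, eps1, eps_x):
        # Reparameterization: each latent is a deterministic function of its
        # parent and external noise, which "hides" the intermediate z1 inside
        # one differentiable mapping from (z2, eps1, eps_x) to x.
        mu1, logvar1 = self.z1_net(z2).chunk(2, dim=-1)
        z1 = mu1 + torch.exp(0.5 * logvar1) * eps1
        mu_x, logvar_x = self.x_net(z1).chunk(2, dim=-1)
        return mu_x + torch.exp(0.5 * logvar_x) * eps_x

teacher = TwoLayerDGM(hidden=64)
student = TwoLayerDGM(hidden=16)          # smaller hidden width
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(1000):
    # Data-free: inputs are shared noise samples, not training data.
    z2 = torch.randn(128, 8)
    eps1, eps_x = torch.randn(128, 16), torch.randn(128, 32)
    with torch.no_grad():
        x_teacher = teacher(z2, eps1, eps_x)
    # Surrogate distillation loss on the observed layer only (a simple MSE
    # stand-in here), rather than matching each stochastic layer separately,
    # which is where error would otherwise accumulate.
    loss = ((student(z2, eps1, eps_x) - x_teacher) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```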

Publication
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Zhe Zeng
Ph.D. student in AI

My research goal is to enable machine learning models to incorporate diverse forms of constraints into probabilistic inference and learning in a principled way, by combining machine learning (probabilistic modeling, neuro-symbolic AI, Bayesian deep learning) and formal methods.