Communications in Information and Systems

Volume 20 (2020)

Number 3

Mathematical Engineering: A special issue on the occasion of the 85th birthday of Prof. Thomas Kailath

Guest Editors: Ali H. Sayed, Helmut Bölcskei, Patrick Dewilde, Vwani Roychowdhury, and Stephen Shing-Toung Yau

Augment deep BP-parameter learning with local XAI-structural learning

Pages: 319 – 352

DOI: https://dx.doi.org/10.4310/CIS.2020.v20.n3.a3

Authors

S. Y. Kung (Princeton University, Princeton, New Jersey, U.S.A.)

Zejiang Hou (Princeton University, Princeton, New Jersey, U.S.A.)

Abstract

With the explosion of big data, deep learning has become the mainstream of machine learning and AI research and development. However, its back-propagation learning paradigm relies on the traditional optimization of externally-supervised metrics, limiting its ability to improve the network design structurally. To rectify this problem, we augment BP-Learning with a structural learning paradigm: XAI-Learning, abbreviated as X-Learning. In order to arrive at high-performing learning models with lower structural complexity, X-Learning places its focus on local learning/ranking of individual neurons in hidden layers. It pioneers the use of backward-broadcast so that the teacher values become directly and locally accessible to all hidden layers, making feasible the so-called output-residual learning. This is conceptually dual to the input-residual learning advocated by ResNet. The local teacher permits the computation of local optimization metrics (LOM) to facilitate the ranking of hidden neurons. Such a ranking provides a theoretical footing for our structural learning paradigm, based on the notion of a structural gradient. This ultimately leads to an evolutionary X-Learning strategy to jointly learn the structure and parameters of the learning models. The goal is to reduce network complexity while preserving (if not outright improving) accuracy.

X-Learning can be applied to numerous applications, with either a classification or a regression formulation. Moreover, it outperforms prominent state-of-the-art approaches. To highlight its superiority, we showcase three key comparisons: (a) for ImageNet classification, X-Learning edges out the 2018 LPIRC winner; (b) for regression, X-Learning outperforms the 2018 PIRM winner in image super-resolution; and (c) for fingerprinting, our hierarchical HSRN CNN outperforms SRGAN by 1.5 dB in PSNR. In addition, to demonstrate its broad spectrum of applications, we present further examples in classification (e.g. ImageNet), in regression (e.g. DIV2K), and in mixed classification/regression domain-driven problems (e.g. the Oxford Flower Dataset).
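To make the rank-then-prune mechanism described in the abstract concrete, the following is a minimal sketch, not the paper's actual algorithm. The choice of LOM here (absolute Pearson correlation between each hidden neuron's activation and the backward-broadcast teacher signal) is an assumption for illustration, and the names `local_optimization_metric` and `rank_and_prune` are hypothetical.

```python
import numpy as np

def local_optimization_metric(activations, teacher):
    """Score each hidden neuron against the backward-broadcast teacher.

    Stand-in LOM: absolute Pearson correlation between a neuron's
    activation and the teacher signal (the paper's specific metric
    may differ).

    activations: (num_samples, num_neurons) hidden-layer outputs
    teacher:     (num_samples,) labels or regression targets
    """
    a = activations - activations.mean(axis=0)
    t = teacher - teacher.mean()
    cov = (a * t[:, None]).mean(axis=0)
    return np.abs(cov) / (a.std(axis=0) * t.std() + 1e-12)

def rank_and_prune(weights, activations, teacher, keep_ratio=0.75):
    """Rank hidden neurons by LOM score and drop the weakest ones,
    reducing structural complexity while keeping the most
    teacher-aligned units."""
    scores = local_optimization_metric(activations, teacher)
    num_keep = max(1, int(keep_ratio * scores.size))
    keep = np.argsort(scores)[::-1][:num_keep]  # best-ranked first
    return weights[:, keep], keep

# Toy usage: 200 samples, a 32-neuron hidden layer, binary teacher.
rng = np.random.default_rng(0)
teacher = rng.integers(0, 2, size=200).astype(float)
activations = rng.standard_normal((200, 32))
activations[:, :8] += 2.0 * teacher[:, None]   # 8 teacher-aligned units
weights = rng.standard_normal((64, 32))        # incoming weight matrix
pruned_w, kept = rank_and_prune(weights, activations, teacher, keep_ratio=0.5)
print(pruned_w.shape, sorted(kept[:8]))        # (64, 16); aligned units survive
```

Roughly speaking, repeating such a rank-and-prune (or grow) step across training rounds is what plays the role of the structural gradient in the paper's terminology, steering the architecture alongside BP's parameter gradient.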

Received 31 January 2020

Published 2 December 2020