Statistics and Its Interface

Volume 17 (2024)

Number 2

Special issue on statistical learning of tensor data

Multi-way overlapping clustering by Bayesian tensor decomposition

Pages: 219 – 230

DOI: https://dx.doi.org/10.4310/23-SII790

Authors

Zhuofan Wang (Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China)

Fangting Zhou (Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China; and Department of Statistics, Texas A&M University, College Station, Tx., U.S.A.)

Kejun He (Center for Applied Statistics, Institute of Statistics and Big Data, Renmin University of China, Beijing, China)

Yang Ni (Department of Statistics, Texas A&M University, College Station, Tx., U.S.A.)

Abstract

The development of modern sequencing technologies provides great opportunities to measure gene expression of multiple tissues from different individuals. The three-way variation across genes, tissues, and individuals makes statistical inference a challenging task. In this paper, we propose a Bayesian multi-way clustering approach to cluster genes, tissues, and individuals simultaneously. The proposed model adaptively trichotomizes the observed data into three latent categories and uses a Bayesian hierarchical construction to further decompose the latent variables into lower-dimensional features, which can be interpreted as overlapping clusters. With a Bayesian nonparametric prior, i.e., the Indian buffet process, our method determines the number of clusters automatically. The utility of our approach is demonstrated through simulation studies and an application to the Genotype-Tissue Expression (GTEx) RNA-seq data. The clustering result reveals some interesting findings about depression-related genes in the human brain, which are also consistent with biological domain knowledge. The detailed algorithm and some numerical results are available in the online Supplementary Material at https://intlpress.com/site/pub/files/supp/sii/2024/0017/0002/sii-2024-0017-0002-s001.pdf.
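
To illustrate why an Indian buffet process prior lets the number of overlapping clusters be learned rather than fixed, the following is a minimal sketch of the standard sequential ("restaurant") construction of the process. It is not the paper's model or inference algorithm, which are described in the Supplementary Material. The function name `sample_ibp` and the parameters `num_objects` and `alpha` (the mass parameter) are assumptions introduced here for illustration only.

```python
import numpy as np


def sample_ibp(num_objects, alpha, rng):
    """Draw a binary membership matrix from the Indian buffet process.

    Rows are objects (e.g. genes), columns are latent features
    (overlapping clusters). The number of columns is random.
    """
    columns = []  # one list of 0/1 memberships per feature
    for i in range(1, num_objects + 1):
        # Existing features: object i joins feature k with probability
        # m_k / i, where m_k counts earlier objects that hold feature k.
        for col in columns:
            m_k = sum(col)
            col.append(1 if rng.random() < m_k / i else 0)
        # New features: object i opens Poisson(alpha / i) fresh features.
        num_new = rng.poisson(alpha / i)
        for _ in range(num_new):
            columns.append([0] * (i - 1) + [1])
    if not columns:
        return np.zeros((num_objects, 0), dtype=int)
    return np.array(columns, dtype=int).T


rng = np.random.default_rng(0)
Z = sample_ibp(num_objects=10, alpha=2.0, rng=rng)
print(Z.shape)  # (10, K), where K varies from draw to draw
```

Because a row of the resulting matrix may contain several ones, an object can belong to more than one cluster at once, and larger values of `alpha` tend to produce more columns.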

Keywords

Bayesian nonparametric prior, gene expression data, Indian buffet process, low-rank tensor, mixture model

2010 Mathematics Subject Classification

Primary 62H30. Secondary 62F15.

The first two authors contributed equally to this work.

Received 27 September 2022

Accepted 9 March 2023

Published 1 February 2024