Statistics and Its Interface

Volume 17 (2024)

Number 1

Special issue in honor of Professor Lincheng Zhao

Copy number variation detection based on constraint least squares

Pages: 27 – 37

DOI: https://dx.doi.org/10.4310/23-SII814

Authors

Xiaopu Wang (Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, China)

Xueqin Wang (Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, China)

Aijun Zhang (Department of Statistics and Actuarial Science, University of Hong Kong)

Canhong Wen (Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, China)

Abstract

Copy number variations (CNVs) are a form of structural variation in a DNA sequence, comprising amplifications and deletions of particular DNA segments on chromosomes. Because every DNA sequence contains a huge amount of data, there is a great need for a computationally fast algorithm that identifies CNVs accurately. In this paper, we formulate the detection of CNVs as a constraint least squares problem and show that circular binary segmentation is a greedy approach to solving it. To solve this problem with high accuracy and efficiency, we first derive a necessary optimality condition for its solution based on the alternating minimization technique and then develop a computationally efficient algorithm named AMIAS. We test the performance of our method on simulated data and on two real-world applications using genomic data from diagnosed primal glioblastoma and the HapMap project. The proposed method shows competitive performance in identifying CNVs from high-throughput genotypic data.
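
As a rough illustration of the formulation described above, the following Python sketch treats CNV detection as fitting a piecewise-constant signal by least squares with at most k change points, and solves it greedily by adding one change point at a time. This is only a minimal sketch under stated assumptions: the exact constraint used in the paper is not given in the abstract, the function names `segment_fit` and `greedy_change_points` are hypothetical, and the greedy rule shown is plain binary-segmentation-style splitting. It is neither circular binary segmentation (which tests circular segments with a statistical stopping rule) nor the AMIAS algorithm.

```python
import numpy as np

def segment_fit(y, change_points):
    """Least-squares piecewise-constant fit for fixed change points.

    For a fixed segmentation, the least-squares solution sets each
    segment to the mean of the observations it contains.
    """
    y = np.asarray(y, dtype=float)
    bounds = [0] + sorted(change_points) + [len(y)]
    beta = np.empty_like(y)
    for start, stop in zip(bounds[:-1], bounds[1:]):
        beta[start:stop] = y[start:stop].mean()
    return beta

def greedy_change_points(y, k):
    """Greedily pick up to k change points (illustrative assumption).

    At each step, add the single split position that gives the lowest
    residual sum of squares; earlier choices are never revisited.
    """
    y = np.asarray(y, dtype=float)
    chosen = []
    for _ in range(k):
        best_rss, best_t = None, None
        for t in range(1, len(y)):
            if t in chosen:
                continue
            residual = y - segment_fit(y, chosen + [t])
            rss = float(np.sum(residual ** 2))
            if best_rss is None or rss < best_rss:
                best_rss, best_t = rss, t
        if best_t is None:
            break
        chosen.append(best_t)
    return sorted(chosen)

# Toy, noise-free signal: an amplified segment at positions 5..8.
y_demo = np.array([0.0] * 5 + [2.0] * 4 + [0.0] * 6)
cps_demo = greedy_change_points(y_demo, 2)
fit_demo = segment_fit(y_demo, cps_demo)
```

Because the greedy rule never revisits earlier choices, it can miss the best set of k change points on noisy data, which is the kind of gap an optimality-condition-based method aims to close.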

Keywords

alternating minimization induced active set, change point detection, circular binary segmentation, HapMap

Received 23 December 2022

Accepted 14 August 2023

Published 27 November 2023