Communications in Information and Systems

Volume 21 (2021)

Number 1

Fast random algorithms for manifold based optimization in reconstructing 3D chromosomal structures

Pages: 1 – 29

DOI: https://dx.doi.org/10.4310/CIS.2021.v21.n1.a1

Authors

Duan Chen (Department of Mathematics and Statistics, University of North Carolina, Charlotte, N.C., U.S.A.)

Shaoyu Li (Department of Mathematics and Statistics, University of North Carolina, Charlotte, N.C., U.S.A.)

Xue Wang (Department of Health Science Research, Mayo Clinic, Jacksonville, Florida, U.S.A.)

Kelin Xia (School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore)

Abstract

Inferring 3D structures of chromatins and chromosomes from experimental data is critical to understand biological functions of DNAs, and various computational algorithms have been developed in the past a few years. All algorithms are subject to the challenge of high computational cost if the number of loci in the target chromosome is large. In this paper, we tackle this difficulty and develop a set of fast algorithms for the manifold-based optimization (MBO) model, which is a popular method to reconstruct 3D chromosomal structures from Hi-C data. The proposed algorithms are based on random projection theory. We first approximate the column (row) space of the original data in a reduced dimension. Then interpolative decomposition technique is used to decompose the data matrix into a product of two matrices, each of which has a much smaller dimension comparing to the number of degree of freedom of the problem. With this low-rank approximation, all components in the gradient descent method of the optimization, including calculating gradient, line search, and solution updating, have the linear complexity, with respect to the total number of loci in the target chromosome. At last, a randomly perturbed gradient descent method is adopted so one can effectively escape saddle points of the non-convex optimization. In simulations, a synthetic simple helix and a simulated chromosomal structure are used to validate our algorithms, suggesting its highly enhanced efficiency and desired ability to recover structures from data subject to random lost and mild contamination of noises.

Keywords

Hi-C data, chromosomal structures, mathematical model, manifold based optimization, fast algorithms, randomized numerical linear algebra

Received 5 November 2020

Published 8 February 2021