Statistics and Its Interface

Volume 14 (2021)

Number 4

Feature screening via Bergsma–Dassios sign correlation learning

Pages: 417 – 430

DOI: https://dx.doi.org/10.4310/20-SII662

Authors

Daojiang He (Department of Statistics, Anhui Normal University, Wuhu, China)

Xinxin Hao (Department of Statistics, Anhui Normal University, Wuhu, China)

Kai Xu (Department of Statistics, Anhui Normal University, Wuhu, China)

Lei He (Department of Statistics, Anhui Normal University, Wuhu, China)

Youxin Liu (School of Science, Nanjing University of Science and Technology, Nanjing, China)

Abstract

The robust rank correlation screening (RRCS) procedure, which is built on Kendall's $\tau$, was proposed by Li, Peng, Zhang and Zhu (2012) as a robust alternative to the sure independence screening (SIS) method based on Pearson's correlation. A drawback in certain applications, however, is that $\tau$ may be zero even when two random variables are associated; RRCS is therefore not omnibus and can detect only monotonic effects. In this paper, we use the Bergsma–Dassios sign correlation $\tau^\ast_b$ (Bergsma and Dassios, 2014) to introduce a new SIS procedure. We advocate using the $\tau^\ast_b$-SIS for three reasons. First, because $\tau^\ast_b$ possesses the necessary and intuitive properties of a correlation index, the $\tau^\ast_b$-SIS screens nonlinear effects, including interactions and heterogeneity, better than the RRCS. Second, because $\tau^\ast_b$ is a natural extension of $\tau$, the $\tau^\ast_b$-SIS is conceptually simple, easy to implement, and robust to extreme values and outliers in the observations. Third, without assuming any moment condition on the response or the predictors, the $\tau^\ast_b$-SIS enjoys several appealing properties: the sure screening property, the ranking consistency property and the characteristic of minimum model size. We demonstrate the merits of the $\tau^\ast_b$-SIS procedure through extensive Monte Carlo experiments and illustrate the method with a real-data example.
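The contrast between Kendall-based and sign-correlation-based screening can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes the Bergsma–Dassios population definition $\tau^\ast = E[a(X_1,\dots,X_4)\,a(Y_1,\dots,Y_4)]$ with $a(z_1,z_2,z_3,z_4) = \mathrm{sign}(|z_1-z_2| + |z_3-z_4| - |z_1-z_3| - |z_2-z_4|)$, approximates it by averaging over randomly sampled quadruples rather than evaluating the full U-statistic, and uses the unnormalized $\tau^\ast$ instead of $\tau^\ast_b$. All function names, the simulated model and the tuning values (`n_quads`, sample sizes) are hypothetical choices made for illustration.

```python
import numpy as np

def ranks(v):
    # Integer ranks keep the sign kernel exact (no floating-point round-off).
    # Assumes continuous data without ties.
    return np.argsort(np.argsort(v))

def sample_quadruples(n, n_quads, rng):
    # Draw index quadruples and keep only those with four distinct indices.
    idx = rng.integers(0, n, size=(n_quads, 4))
    distinct = np.all(np.diff(np.sort(idx, axis=1), axis=1) > 0, axis=1)
    return idx[distinct]

def sign_kernel(z, idx):
    # a(z1, z2, z3, z4) = sign(|z1-z2| + |z3-z4| - |z1-z3| - |z2-z4|)
    z1, z2, z3, z4 = (z[idx[:, k]] for k in range(4))
    return np.sign(np.abs(z1 - z2) + np.abs(z3 - z4)
                   - np.abs(z1 - z3) - np.abs(z2 - z4))

def tau_star_scores(X, y, n_quads=20000, seed=0):
    # Monte Carlo approximation of tau* between each column of X and y.
    rng = np.random.default_rng(seed)
    idx = sample_quadruples(len(y), n_quads, rng)
    a_y = sign_kernel(ranks(y), idx)
    return np.array([np.mean(sign_kernel(ranks(X[:, j]), idx) * a_y)
                     for j in range(X.shape[1])])

def kendall_scores(X, y):
    # Kendall's tau (tau-a) between each column of X and y, over all pairs.
    n = len(y)
    sy = np.sign(y[:, None] - y[None, :])
    return np.array([
        np.sum(np.sign(X[:, j][:, None] - X[:, j][None, :]) * sy) / (n * (n - 1))
        for j in range(X.shape[1])
    ])

# Hypothetical simulated model: only predictor 0 matters, and its effect
# is symmetric (non-monotone), so Kendall's tau is close to zero for it.
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.standard_normal((n, p))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)

ts = tau_star_scores(X, y)
kt = np.abs(kendall_scores(X, y))

# Screening step: rank predictors by marginal utility and keep the top d.
d = 5
top_tau_star = np.argsort(-ts)[:d]
top_kendall = np.argsort(-kt)[:d]
print("tau* screening keeps:   ", top_tau_star)
print("Kendall screening keeps:", top_kendall)
```

In this toy setting the quadratic predictor should rank first under the $\tau^\ast$ utility, whereas its Kendall score is of the same order as the noise predictors' scores. Sampling quadruples trades exactness for speed; an exact computation would average the kernel over all quadruples of distinct observations.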

Keywords

Bergsma–Dassios sign correlation, feature screening, Kendall $\tau$, sure screening property, ranking consistency property, minimum model size

He’s work is supported by the National Natural Science Foundation of China (Grant No. 11201005), the Humanities and Social Sciences Foundation of Ministry of Education, China (Grant No. 17YJC910003) and the Natural Science Foundation of Anhui Province (Grant No. 2008085MA08).

Kai Xu’s work is supported by the National Natural Science Foundation of China (Grant No. 11901006) and the Natural Science Foundation of Anhui Province (Grant No. 1908085QA06).

Lei He’s work is supported by the Natural Science Foundation of Anhui Province (Grant No. 2008085QA15).

Received 20 April 2020

Accepted 27 December 2020

Published 8 July 2021