Statistics and Its Interface
Volume 11 (2018)
Robust model-free feature screening based on modified Hoeffding measure for ultra-high dimensional data
Pages: 473 – 489
Sure independence screening (SIS) has become a cutting-edge dimension reduction technique for extracting important features from ultrahigh-dimensional data in statistical learning. Many screening methods are designed for specific models that satisfy certain assumptions. With the availability of more data types and increasingly complicated models, a robust, model-free procedure with less restrictive conditions on the data is needed. In this paper, we propose a modified Hoeffding measure that efficiently characterizes the dependence between two random variables. The modified Hoeffding measure takes values between $0$ and $1$ and, under some mild conditions, equals zero if and only if the two variables are independent. This property enables us to propose a novel feature screening procedure based on the measure without specifying the regression structure. The proposed method is robust to heavy-tailed data and outliers in both the predictors and the response, and it is suitable for complex data, including discrete and multivariate variables. In addition, it can extract important features even when the underlying model is complicated. We further establish the sure screening property and the ranking consistency property even when the dimensionality is of exponential order in the sample size, without assuming any moment condition on the predictors or the response. Simulations and a real-data analysis demonstrate the versatility and practicability of the proposed method in comparison with other state-of-the-art approaches.
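The screening idea described above (rank every predictor by a dependence measure with the response, then keep the top-ranked ones) can be sketched as follows. This is a minimal illustration, assuming NumPy, and it is *not* the authors' modified Hoeffding measure, whose definition is not reproduced in this abstract. Instead it swaps in the empirical version of the classical Hoeffding functional $D = E\{[F(X,Y) - F_X(X)F_Y(Y)]^2\}$, estimated with empirical CDFs at the sample points. The function names `hoeffding_stat` and `screen`, the default submodel size $d = \lfloor n/\log n \rfloor$ (a common convention in the SIS literature), and the simulated data are hypothetical choices made for illustration.

```python
import numpy as np


def hoeffding_stat(x, y):
    """Empirical classical Hoeffding functional (NOT the paper's modified measure).

    Estimates D = E[(F(X, Y) - F_X(X) * F_Y(Y))**2] by plugging in empirical
    CDFs evaluated at the sample points. Only pairwise comparisons are used,
    so the value is unchanged by strictly increasing transforms of x or y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # le_x[i, j] is True when x[j] <= x[i]; likewise for y.
    le_x = x[None, :] <= x[:, None]
    le_y = y[None, :] <= y[:, None]
    joint = np.mean(le_x & le_y, axis=1)  # F_n(x_i, y_i)
    marg_x = np.mean(le_x, axis=1)        # F_n^X(x_i)
    marg_y = np.mean(le_y, axis=1)        # F_n^Y(y_i)
    return float(np.mean((joint - marg_x * marg_y) ** 2))


def screen(X, y, d=None):
    """Rank the columns of X by hoeffding_stat with y and keep the top d.

    The default d = floor(n / log n) is an assumed, commonly used SIS choice.
    """
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    d = min(d, p)
    scores = np.array([hoeffding_stat(X[:, k], y) for k in range(p)])
    keep = np.argsort(scores)[::-1][:d]  # most dependent predictors first
    return keep, scores


# Hypothetical simulated data: only X[:, 0] drives y, and the noise is
# Cauchy, so the response has no finite moments.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = np.exp(X[:, 0]) + 0.1 * rng.standard_cauchy(n)
keep, scores = screen(X, y)
print("retained predictors:", keep[:5])
```

Because the statistic depends on the data only through pairwise comparisons, it is invariant to monotone transformations and is not driven by the magnitude of extreme observations, which is the intuition behind rank-based robustness to heavy tails. The $O(n^2)$ cost per predictor is acceptable for a sketch; a production implementation would likely use a rank-based algorithm instead.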
Keywords: feature screening, Hoeffding measure, ranking consistency property, robustness, sure screening property, ultrahigh-dimensional data
2010 Mathematics Subject Classification: 62E99, 62G05, 62G35, 62H20, 62P10
Yu’s work was supported by the Graduate Innovation Foundation of Shanghai University of Finance and Economics, China (2015110758).
He’s work was supported by the Graduate Innovation Foundation of Shanghai University of Finance and Economics, China (CXJJ-2014-452).
Zhou’s work was supported by the State Key Program of the National Natural Science Foundation of China (71331006) and the State Key Program in the Major Research Plan of the National Natural Science Foundation of China (91546202).
Received 7 December 2016