Statistics and Its Interface

Volume 7 (2014)

Number 2

Significance analysis for pairwise variable selection in classification

Pages: 263 – 274

DOI: https://dx.doi.org/10.4310/SII.2014.v7.n2.a11

Authors

Xingye Qiao (Department of Mathematical Sciences, Binghamton University, State University of New York, Binghamton, N.Y., U.S.A.)

Yufeng Liu (Department of Statistics and Operations Research and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, N.C., U.S.A.)

J. S. Marron (Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, N.C., U.S.A.)

Abstract

The goal of this article is to select important variables that can distinguish one class of data from another. A marginal variable selection method ranks the marginal effects for classification of individual variables, and is a useful and efficient approach for variable selection. Our focus here is to consider the bivariate effect, in addition to the marginal effect. In particular, we are interested in those pairs of variables that can lead to accurate classification predictions when they are viewed jointly. To accomplish this, we propose a permutation test called Significance test of Joint Effect (SigJEff). In the absence of joint effects in the data, SigJEff is similar or equivalent to many marginal methods. However, when joint effects exist, our method can significantly boost the performance of variable selection. Such joint effects can provide an additional, and sometimes dominating, advantage for classification. We illustrate and validate our approach using both simulated examples and a real glioblastoma multiforme data set, with promising results.
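As a rough illustration of the general idea of a permutation test for the joint effect of a pair of variables, the following Python sketch permutes class labels and compares a bivariate test statistic against its permutation distribution. The abstract does not specify the SigJEff statistic, so the Hotelling-type statistic used here is a stand-in chosen for illustration, not the authors' method. The function names (pair_statistic, permutation_pvalue, make_joint_pair) and the simulated data are likewise hypothetical.

```python
import numpy as np


def pair_statistic(X, y):
    """Hotelling-type distance between the two class means of a variable pair.

    Assumption: this statistic is a stand-in for illustration only; it is
    not the SigJEff statistic, which the abstract does not define.
    """
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-class covariance of the two variables (2 x 2).
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    return float(diff @ np.linalg.solve(pooled, diff))


def permutation_pvalue(X, y, n_perm=500, seed=0):
    """Permutation p-value: share of label shuffles with a statistic at least as large."""
    rng = np.random.default_rng(seed)
    observed = pair_statistic(X, y)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any link between the pair and the class.
        if pair_statistic(X, rng.permutation(y)) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def make_joint_pair(n_per_class=50, shift=0.3, noise=0.1, seed=0):
    """Hypothetical toy data: two highly correlated variables whose class
    difference is weak marginally but clear when they are viewed jointly."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    x1 = rng.normal(size=2 * n_per_class)
    x2 = x1 + noise * rng.normal(size=2 * n_per_class) + shift * y
    return np.column_stack([x1, x2]), y


if __name__ == "__main__":
    X, y = make_joint_pair()
    print("permutation p-value for the pair:", permutation_pvalue(X, y))
```

In this toy setting the class shift in the second variable is small relative to its marginal spread, while the difference between the two variables separates the classes well, so a joint test can flag a pair that a purely marginal ranking might rank low.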

Keywords

classification, marginal screening, permutation test, variable selection

2010 Mathematics Subject Classification

Primary 62H30. Secondary 62P10.

Published 17 April 2014