Statistics and Its Interface
Volume 9 (2016)
Special Issue on Statistical and Computational Theory and Methodology for Big Data
Guest Editors: Ming-Hui Chen (University of Connecticut); Radu V. Craiu (University of Toronto); Faming Liang (University of Florida); and Chuanhai Liu (Purdue University)
Smoothing spline ANOVA for super-large samples: scalable computation via rounding parameters
Pages: 433 – 444
In the current era of big data, researchers routinely collect and analyze data of super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that, with the rounding parameters, a researcher can fit nonparametric regression models to very large samples within a few seconds on a standard laptop or tablet computer.
Keywords
smoothing spline ANOVA, rounding parameter, scalable algorithm
2010 Mathematics Subject Classification
Primary 62G08, 65D07. Secondary 65D10.
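
The abstract does not spell out how the rounding parameters work, so the following is only a minimal sketch of one plausible reading: each predictor value in [0, 1] is rounded to the nearest multiple of a rounding parameter r, and the responses are pooled by rounded value. The function name `round_and_aggregate`, the rounding rule, the simulated data, and the choice r = 0.01 are all assumptions made for illustration, not the authors' implementation. The point it illustrates is that the number of distinct predictor values is then bounded by 1/r + 1 regardless of the sample size, and a least-squares criterion evaluated at the rounded values depends on the data only through the per-value counts and means (up to an additive constant).

```python
import numpy as np

def round_and_aggregate(x, y, r):
    """Round x in [0, 1] to the nearest multiple of r and pool y by rounded value.

    Hypothetical helper for illustration; the rounding rule is an assumption.
    Returns the unique rounded values, their counts, and the mean response at each.
    """
    # Work with integer grid indices so equal rounded values compare exactly.
    idx = np.rint(np.asarray(x) / r).astype(np.int64)
    uniq_idx, inverse = np.unique(idx, return_inverse=True)
    counts = np.bincount(inverse)                        # observations per rounded value
    sums = np.bincount(inverse, weights=np.asarray(y))   # response totals per rounded value
    return uniq_idx * r, counts, sums / counts

# Simulated data (assumed for the example): one million noisy observations.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=n)

z, counts, ybar = round_and_aggregate(x, y, r=0.01)
print(len(z), "unique values summarize", counts.sum(), "observations")
```

A weighted smoother fitted to `(z, ybar)` with weights `counts` would then operate on at most 101 points here instead of one million, at the price of an approximation error in the predictor of at most r/2.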