kathy_walker

Bias Adjustment in Data Mining: Local Control Analysis of Radon and Ozone

Stanley Young, PhD, CEO, CGStat LLC

Bob Obenchain, PhD, Principal Consultant, Risk Benefit Statistics LLC

Goran Krstic, Fraser Health Authority

Large observational data sets present research opportunities, but also problems that can lead to false claims. In Big Data, the standard error of an effect estimate goes to zero as sample size increases, so even a small bias can produce a statistically significant (but false) claim. In addition, the average treatment effect can be almost meaningless when interactions with confounders create local variation in effect sizes. Data miners need statistical methods that deal simply and efficiently with these sources of bias. Here, we demonstrate the use of a JMP add-in, Moving Median, and a new JMP platform, Local Control, in the analysis of two data sets. Our first case study illustrates the reduction of bias in an environmental epidemiology data set. Our second case study applies Local Control to a time series air quality example. By detecting interactions, data miners can produce more realistic and more relevant analyses that reduce the bias typically implied by the variety and heterogeneity of Big Data.
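As a rough illustration of the two points above, here is a minimal Python sketch. It is not the JMP Local Control platform or the Moving Median add-in. The function names, the toy records, and the use of pre-assigned strata (in place of clustering on confounders) are all assumptions made for this example. The first function shows how a fixed bias becomes "significant" as n grows. The second shows how an overall treated-versus-control difference can hide opposite local effects.

```python
import math
from statistics import mean


def z_score_of_bias(bias, sigma, n):
    """z statistic when the true effect is zero but the estimate carries a fixed bias.

    Assumes the standard error of the estimate is sigma / sqrt(n).
    """
    standard_error = sigma / math.sqrt(n)
    return bias / standard_error


def overall_effect(records):
    """Treated-minus-control difference in mean outcome, ignoring strata."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return mean(treated) - mean(control)


def local_effects(records):
    """Treated-minus-control difference in mean outcome within each stratum."""
    groups = {}
    for stratum, is_treated, outcome in records:
        groups.setdefault(stratum, {True: [], False: []})[is_treated].append(outcome)
    return {s: mean(g[True]) - mean(g[False]) for s, g in groups.items()}


# Hypothetical toy data: (stratum, treated?, outcome).
# Strata stand in for clusters of subjects with similar confounder values.
toy_records = [
    ("A", True, 12), ("A", True, 12), ("A", False, 10), ("A", False, 10),
    ("B", True, 8), ("B", True, 8), ("B", False, 10), ("B", False, 10),
]

if __name__ == "__main__":
    # The same small bias (0.01 standard deviations) at two sample sizes.
    for n in (100, 1_000_000):
        print(n, z_score_of_bias(bias=0.01, sigma=1.0, n=n))
    print("overall:", overall_effect(toy_records))
    print("local:", local_effects(toy_records))
```

With a bias of 0.01 and sigma of 1, the z statistic is roughly 0.1 at n = 100 and roughly 10 at n = 1,000,000, even though the true effect is zero. In the toy data the overall difference is zero while the two strata show effects of +2 and -2, which is the kind of local variation that a single average can mask.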
