Bob Obenchain, PhD, Principal Consultant, Risk Benefit Statistics LLC
Goran Krstic, Fraser Health Authority
Large observational data sets present research opportunities, but also problems that can lead to false claims. In Big Data, the standard error of an effect estimate shrinks toward zero as sample size increases, so even a small bias can produce a statistically significant (but false) claim. In addition, the average treatment effect can be almost meaningless when interactions with confounders create local variation in effect sizes. Data miners need statistical methods that deal simply and efficiently with these sources of bias. Here, we demonstrate the use of a JMP add-in, Moving Median, and a new JMP platform, Local Control, in the analysis of two data sets. Our first case study illustrates the reduction of bias in an environmental epidemiology data set. Our second case study applies Local Control to a time-series air quality example. By detecting interactions, data miners can produce more realistic and more relevant analyses that reduce the bias that typically arises from the variety and heterogeneity of Big Data.
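The point about shrinking standard errors can be sketched with a few lines of arithmetic. The Python snippet below is only an illustration under stated assumptions, not part of the JMP workflow described here: the true effect is taken to be zero, the estimate carries a fixed hypothetical bias of 0.01 on an outcome with standard deviation 1, and the standard error is assumed to follow the usual sigma / sqrt(n) form. The function names and the numbers are ours.

```python
import math


def z_statistic(bias, sigma, n):
    """z-score of an estimate whose true effect is zero but which
    carries a fixed bias; the standard error is sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return bias / standard_error


def approx_n_for_false_claim(bias, sigma, z_crit=1.96):
    """Approximate sample size at which the bias alone reaches the
    two-sided 5% significance threshold (z_crit)."""
    return math.ceil((z_crit * sigma / bias) ** 2)


# Hypothetical values chosen for illustration only.
BIAS = 0.01
SIGMA = 1.0

for n in (1_000, 10_000, 100_000, 1_000_000):
    z = z_statistic(BIAS, SIGMA, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n = {n:>9,}  z = {z:6.2f}  {verdict}")

print("bias alone becomes 'significant' near n =",
      approx_n_for_false_claim(BIAS, SIGMA))
```

Under these assumptions the bias is invisible at a thousand observations and decisive at a million, even though nothing about the data-generating process has changed. The same arithmetic says nothing about interactions; it addresses only the first of the two problems named above.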