Stanley Young, PhD, CEO, CGStat LLC

Bob Obenchain, PhD, Principal Consultant, Risk Benefit Statistics LLC

Goran Krstic, Fraser Health Authority

Large (observational) data sets typically present research opportunities, but also problems that can lead to false claims. In Big Data, the standard error of an effect estimate goes to zero as sample size increases, so even small biases can lead to declared (but false) claims. In addition, the average treatment effect can be almost meaningless when interactions with confounders create local variation in effect sizes. Data miners need statistical methods that can deal simply and efficiently with these sources of bias. Here, we demonstrate the use of a JMP add-in, Moving Median, and a new JMP platform, Local Control, for the analysis of two data sets. Our first case study illustrates the reduction of bias in an environmental epidemiology data set. Our second case study uses Local Control on a time series air quality example. By detecting interactions, data miners can produce more realistic and more relevant analyses that reduce the bias typically implied by the variety and heterogeneity of Big Data.
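The standard-error argument in the abstract can be illustrated with a small numerical sketch. It is not the authors' method and does not use JMP. It assumes a simple estimator whose standard error is sigma / sqrt(n), a true effect of zero, and a fixed bias that does not shrink with sample size. The function names and the values of `bias` and `sigma` are hypothetical.

```python
import math


def z_statistic(bias, sigma, n):
    """z-score of an estimate whose true effect is zero but which carries a fixed bias.

    Assumes the standard error is sigma / sqrt(n) (an illustrative assumption).
    """
    standard_error = sigma / math.sqrt(n)
    return bias / standard_error


def declared_significant(bias, sigma, n, critical_z=1.96):
    """True when the biased estimate would be declared significant at the two-sided 5% level."""
    return abs(z_statistic(bias, sigma, n)) > critical_z


if __name__ == "__main__":
    bias, sigma = 0.01, 1.0  # hypothetical values, chosen only for illustration
    for n in (1_000, 100_000, 10_000_000):
        se = sigma / math.sqrt(n)
        z = z_statistic(bias, sigma, n)
        flag = declared_significant(bias, sigma, n)
        print(f"n={n:>12,}  SE={se:.5f}  z={z:.2f}  declared significant: {flag}")
```

Under these assumptions the bias stays constant while the standard error shrinks, so the z-score grows in proportion to sqrt(n) and a negligible bias eventually crosses any fixed significance threshold.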

Published on 03-24-2025 08:56 AM by Community Manager | Updated on 03-27-2025 09:43 AM

Start: Mon, Sep 14, 2015 09:00 AM EDT
End: Sat, Sep 17, 2016 05:00 PM EDT