Level: Intermediate
Philip Ramsey, Principal Lecturer, University of New Hampshire; and Owner, North Haven Group

Chris Gotwalt, JMP Director of Statistical Research and Development, SAS


Statistical modeling has two distinct goals: explanation and prediction. Explanatory models often predict poorly (Shmueli, 2010). Analyses of designed experiments (DOE) are often explanatory, even though the experimental goal is prediction; DOE is a best practice for product and process development, where the aim is to predict future performance. Predictive modeling requires partitioning the data into training and validation sets, with the validation set used to assess candidate models. Most DOEs have too few observations to form a validation set, precluding direct assessment of prediction performance. We demonstrate a “balanced auto-validation” technique that uses the original data to create two copies of it: one a training set, the other a validation set. The two copies differ only in their row weights. The weights are Dirichlet distributed and “balanced”: observations that contribute more to the training set contribute less to the validation set, and vice versa. The approach requires only copying the data and creating a formula column for the weights. The technique is general, allowing predictive modeling to be applied to the smaller data sets common in laboratory and manufacturing studies. Two biopharma process development case studies are used for demonstration; both combine definitive screening designs with large validation sets. JMP is used to demonstrate the analyses.
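The abstract describes the weighting scheme only at a high level. The sketch below is a minimal illustration in Python, not the authors' JMP formula-column implementation; the helper name and the antithetic-uniform construction (Exp(1) draws, which are Dirichlet distributed once normalized) are assumptions made for this example.

    # Hypothetical sketch of balanced auto-validation weights.
    # Each experimental run appears twice: once in the training copy
    # of the table and once in the validation copy, with paired weights.
    import numpy as np

    def balanced_autovalidation_weights(n_rows, seed=None):
        rng = np.random.default_rng(seed)
        # One uniform draw per row drives both copies of the data.
        u = rng.uniform(size=n_rows)
        # Exp(1) draws; normalized, they follow a Dirichlet(1, ..., 1).
        w_train = -np.log(u)
        # Antithetic draw: a row with a large training weight gets a
        # small validation weight, and vice versa ("balanced").
        w_valid = -np.log1p(-u)
        # Scale each set to sum to n_rows (a convenient convention).
        w_train *= n_rows / w_train.sum()
        w_valid *= n_rows / w_valid.sum()
        return w_train, w_valid

    # Usage: stack two copies of the DOE table, label them Training and
    # Validation, and attach the paired weights as the weight column.
    w_train, w_valid = balanced_autovalidation_weights(n_rows=13, seed=1)

In JMP terms, the same effect comes from the formula column the abstract mentions: anti-correlated Dirichlet weights decide how much each run contributes to the training copy versus the validation copy.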


Start: Mon, Mar 12, 2018 05:00 AM EDT
End: Thu, Mar 15, 2018 01:00 PM EDT