Level: Intermediate
Robert Anderson, JMP Senior Statistical Consultant, SAS


Identifying the best possible model and determining which factors are genuinely important are critical tasks, but they are never easy. Holdback validation is often used to limit overfitting and to keep spurious terms out of a model. However, it is not foolproof, especially with small data sets: the model you obtain often depends on how rows are assigned to the training and validation sets, so a single validation column cannot be relied on to point to the “best” model. By using many different validation columns, a clearer picture starts to emerge. Using the Simulate function in JMP Pro and some simulated data sets, this presentation demonstrates how refitting models with multiple validation columns allows the most frequently selected, and therefore most likely, model to be identified. It also demonstrates that this approach works even for data sets with as few as 30 rows.
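The idea of refitting under many validation columns and counting which model is selected most often can be illustrated outside JMP. The following is a minimal Python sketch, not the presenter's JMP Pro workflow and not the Simulate function itself. The function names (`fit_ols`, `select_model`, `tally_models`), the 30 percent holdback fraction, the 200 splits, and the simulated data (30 rows, six candidate factors, only the first two genuinely active) are all assumptions made for illustration. The selection rule is also a simplification: a term is added only if it lowers the validation sum of squared errors.

```python
import random
from collections import Counter

def fit_ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def design(rows, terms):
    """Design matrix: an intercept column plus the chosen factor columns."""
    return [[1.0] + [r[j] for j in terms] for r in rows]

def sse(rows, y, terms, beta):
    """Sum of squared prediction errors for the given rows."""
    return sum((t - (beta[0] + sum(b * r[j] for b, j in zip(beta[1:], terms)))) ** 2
               for r, t in zip(rows, y))

def select_model(X, y, train, valid):
    """Simplified forward selection judged on the validation rows."""
    Xt, yt = [X[i] for i in train], [y[i] for i in train]
    Xv, yv = [X[i] for i in valid], [y[i] for i in valid]
    chosen = []
    best = sse(Xv, yv, [], fit_ols(design(Xt, []), yt))  # intercept-only baseline
    while True:
        best_j = None
        for j in range(len(X[0])):
            if j in chosen:
                continue
            terms = chosen + [j]
            err = sse(Xv, yv, terms, fit_ols(design(Xt, terms), yt))
            if err < best:
                best, best_j = err, j
        if best_j is None:  # no remaining term lowers the validation error
            return tuple(sorted(chosen))
        chosen.append(best_j)

def tally_models(X, y, n_splits=200, holdback=0.3, seed=1):
    """Refit under many random validation assignments and count each selected model."""
    rng = random.Random(seed)
    n = len(y)
    n_valid = max(1, round(holdback * n))
    counts = Counter()
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)  # one shuffle plays the role of one validation column
        counts[select_model(X, y, idx[n_valid:], idx[:n_valid])] += 1
    return counts

# Simulated data (assumed): 30 rows, 6 factors, only factors 0 and 1 are active.
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(30)]
y = [3 * r[0] + 2 * r[1] + data_rng.gauss(0, 1) for r in X]

counts = tally_models(X, y)
for model, k in counts.most_common(5):
    print(model, k)
```

Each tuple printed is a set of factor indices, followed by the number of validation assignments under which that model was selected. Any single split may pick up a spurious term, so the tally across splits, rather than one fit, is what indicates which terms are consistently chosen.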

[Presentation slides: Slide1.JPG through Slide23.JPG]

Published on 03-24-2025 08:52 AM by Community Manager | Updated on 04-08-2025 09:06 AM




Start: Mon, Oct 8, 2018 09:00 AM EDT
End: Fri, Oct 12, 2018 05:00 PM EDT