
Validation for Continuous Process Data

What inspired this wish list request? 

@FN linked to a scikit-learn article in a comment on the Validation for Continuous Processing Data add-in page. It describes a better way to do cross-validation with continuous process data, and it reminded me that the techniques used in that add-in would be much more effective, and used more often, if they were built into JMP. This situation applies to many analyses my colleagues and I regularly do with manufacturing data.

 

What is the improvement you would like to see? 

- Incorporate grouping by time into the Make Validation Column dialog box, including functionality similar to what is in the add-in linked above to guide the user to an appropriate group size.

- Add a per-table option so that cross-validation, wherever it is used in JMP, uses a time-based method as referenced in the link below instead of randomly assigning rows.
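To make the first request concrete, here is a minimal sketch in plain Python of what a time-grouped validation column could look like. It is not how JMP or the add-in works internally; the function name `time_block_validation_column` and its parameters are hypothetical, and it assumes the rows are already sorted by time. Contiguous blocks of `block_size` rows are kept together, and whole blocks (not individual rows) are assigned to folds.

```python
import random


def time_block_validation_column(n_rows, block_size, n_folds, seed=0):
    """Return one fold label per row; rows in the same time block share a label.

    Hypothetical helper for illustration only. Assumes rows are sorted by time.
    """
    if n_rows < 1 or block_size < 1 or n_folds < 2:
        raise ValueError("need n_rows >= 1, block_size >= 1, n_folds >= 2")

    # Number of contiguous blocks needed to cover all rows (ceiling division).
    n_blocks = -(-n_rows // block_size)

    # Cycle fold labels over the blocks so folds stay roughly balanced,
    # then shuffle which block gets which label.
    block_folds = [b % n_folds for b in range(n_blocks)]
    random.Random(seed).shuffle(block_folds)

    # Every row inherits the fold label of the block it falls in.
    return [block_folds[i // block_size] for i in range(n_rows)]


# Example: 1000 time-ordered rows, blocks of 50 rows, 5 folds.
folds = time_block_validation_column(n_rows=1000, block_size=50, n_folds=5)
```

The block size is the "group size" the dialog would need to help the user choose; it should be long enough that rows in different blocks are only weakly correlated.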

 

Why is this idea important?

Although @DrewLuebe and I created the validation add-in linked above specifically to help users better understand the predictability of models in data sets that have correlation between rows, including autocorrelation, the built-in cross-validation behavior and default validation still use per-row splits. This means that many reported validation fit metrics are artificially better than the actual performance will be with new data. By baking these techniques into JMP itself, users will have a much better understanding of their data and models no matter what tools they use.
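A rough way to see why per-row splits flatter the metrics is to count how many validation rows sit right next to a training row in time. With autocorrelated data, such a neighbor is nearly a copy of the held-out row. The sketch below is an illustration in plain Python, not JMP behavior; `neighbor_leakage`, the 25% validation share, and the block size of 50 are all assumptions chosen for the example.

```python
import random


def neighbor_leakage(is_validation):
    """Fraction of validation rows that have an adjacent-in-time training row."""
    n = len(is_validation)
    val_rows = [i for i in range(n) if is_validation[i]]
    if not val_rows:
        return 0.0
    leaked = 0
    for i in val_rows:
        # Look at the previous and next rows, where they exist.
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        if any(not is_validation[j] for j in neighbors):
            leaked += 1
    return leaked / len(val_rows)


n_rows, block_size = 1000, 50
rng = random.Random(0)

# Per-row split: each row independently goes to validation about 25% of the time.
random_split = [rng.random() < 0.25 for _ in range(n_rows)]

# Time-block split: every fourth block of 50 consecutive rows is validation.
block_split = [(i // block_size) % 4 == 3 for i in range(n_rows)]

random_leak = neighbor_leakage(random_split)
block_leak = neighbor_leakage(block_split)
print(f"per-row split:    {random_leak:.3f}")
print(f"time-block split: {block_leak:.3f}")
```

In the per-row split most validation rows have a training neighbor, while in the block split only the rows at block edges do. This count is only a proxy; the actual inflation of a fit metric depends on how strong the autocorrelation is.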

 

 

1 Comment
aharding
Level III

This is a definite need. Many users would not recognize the need to account for autocorrelation when splitting their data. Giving it an appropriate place in JMP or JMP Pro will help your users get more robust solutions.