anaheilman
Level II

Adding new data using a previously developed random forest model

Community,

I am working with my colleague Kim on a project in which she used a random forest to develop a prediction model for plant disease in potato based on soil properties. Here are our questions:

 

  1. How do we apply the existing, already validated model to new data?
  2. How can we validate that the old predictive model actually explains the response variable?

The predictive model indicates that higher iron is associated with higher disease pressure. We want to use data from a previous year to see whether that holds true.

Thank you very much for your kind suggestions.

 

 Kim Anderson & AM Heilman

P_Bartell
Level VIII

Re: Adding new data using a previously developed random forest model

I'll start with question #2. The general best practice for validating your 'old' model is to partition the original data set into training, validation, and test sets, with each row in the data table assigned to one of the three. Then examine the various fit-oriented statistics and visualizations. What you see in the platform report will depend in large measure on the nature of the response...is it categorical or continuous numeric? Ultimately, it's up to you to say whether the model is 'validated'...whatever that means to you.

You'll need JMP Pro to use this pathway effectively. Here's the 'how to' for creating a validation column in JMP Pro: Creating a Validation Column. I'm assuming you've got lots and lots of observations, which is required to get the most benefit from this pathway. If you don't have lots of data, other methods such as K-fold cross-validation can be useful. Once you get to the modeling platform you are using, make sure to place the validation column in the Validation role of the launch window.

If you don't have JMP Pro...this general pathway can still be recreated in part...but the workflow is going to be really clunky, and it's beyond the scope of what can be described in a forum such as this.
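For readers who want to see the partitioning idea outside JMP, here is a minimal sketch in plain Python. It is not how JMP Pro builds its validation column; the function names, the 60/20/20 split, and the seed are all assumptions chosen for illustration. It only shows what "assign every row to training, validation, or test" and "assign every row to a fold" amount to.

```python
import random

def make_validation_column(n_rows, fractions=(0.6, 0.2, 0.2), seed=1):
    """Return one label per row: 'Training', 'Validation', or 'Test'.

    The fractions (training, validation, test) are an assumed split,
    not a recommendation from the original post.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_train = round(n_rows * fractions[0])
    # Cap the validation count so the three counts never exceed n_rows.
    n_valid = min(round(n_rows * fractions[1]), n_rows - n_train)
    n_test = n_rows - n_train - n_valid
    labels = (["Training"] * n_train
              + ["Validation"] * n_valid
              + ["Test"] * n_test)
    # Shuffle so the assignment does not depend on row order.
    random.Random(seed).shuffle(labels)
    return labels

def make_kfold_column(n_rows, k=5, seed=1):
    """Return a fold number (0..k-1) per row, with near-equal fold sizes."""
    folds = [i % k for i in range(n_rows)]
    random.Random(seed).shuffle(folds)
    return folds

if __name__ == "__main__":
    column = make_validation_column(100)
    print({name: column.count(name) for name in ("Training", "Validation", "Test")})
    print(make_kfold_column(10, k=5))
```

The model is fit on the 'Training' rows only, tuned against 'Validation', and judged once on 'Test'. With K-fold, each fold takes a turn as the held-out set, which is why it suits smaller data sets.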

 

As for question #1, make sure you store the prediction equation in a column of its own...then just add your new observations to the data table and compare the predictions to the actual responses.
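The "compare the predictions to the actual responses" step can be sketched in plain Python for a continuous response. Everything named here is hypothetical: `stored_model` is a stand-in for whatever prediction formula was saved, and the iron and disease values are made-up numbers, not data from this thread. For a categorical response you would compare predicted and actual classes (for example, a misclassification rate) instead.

```python
import math

def score_predictions(actual, predicted):
    """Return RMSE and R-squared of predictions against observed responses."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    rmse = math.sqrt(ss_res / n)
    # R-squared is undefined when the observed responses do not vary.
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "r_squared": r_squared}

def stored_model(row):
    """Hypothetical stand-in for the saved prediction formula."""
    return 0.5 + 0.02 * row["iron"]

if __name__ == "__main__":
    # Made-up rows standing in for a previous year's observations.
    new_rows = [{"iron": 10.0}, {"iron": 25.0}, {"iron": 40.0}]
    observed = [0.8, 0.9, 1.4]
    predictions = [stored_model(row) for row in new_rows]
    print(score_predictions(observed, predictions))
```

The key point is that the model is not refit: the stored formula is applied unchanged to the new rows, so the scores reflect how well the old model carries over to data it has never seen.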