I just started using JMP 15 today. When I make a validation column in JMP 15 with stratification columns (some of which have missing values), the newly created validation column does not handle the missing values and only assigns rows that have no missing values.
JMP 14 did not behave this way; it stratified the columns by percentage even with missing values.
My question is: is there any way to include rows with missing cells/values when splitting rows for my validation dataset?
For example, in JMP 15: I have a dataset with 10000 rows and 10 columns, and I want 2000 rows as the validation dataset, so I used "Make Validation Column" > Stratification Columns and selected 7 columns (some of them have missing values, for example at least 10% missing before imputation). In JMP 14, I get exactly 2000 rows as the validation dataset, but in JMP 15 I only get 7200 rows labeled as training and 1800 rows labeled as validation. The remaining 1000 rows are empty (labeled as neither training nor validation), so I have to go back to JMP 14 to make the validation dataset.
I am slightly confused. Your target variable(s) have some missing values, but you still want to form your validation column stratified by the target variable. Is this correct? If so, where would you want the missing values to go? Training or validation? I think you still want the 80/20 split for the missing values, correct? I will assume this is the case.
Think of a simple scenario where you have a binary target: 0 or 1, but you have some missing values. You want 80% of your data to go into training, 20% into validation. When you split into training and validation based on the target, you are saying that you want an 80/20 split for all of the 0's, an 80/20 split for all of the 1's, and an 80/20 split for the missings. However, the missing target cannot be used for the model building. There is no target to try and predict! JMP will ignore missing target rows regardless of which validation role they have been assigned.
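To make the 80/20-per-group idea above concrete, here is a minimal sketch in plain Python (not JSL, and not how JMP implements it). The function name `stratified_split`, the row counts, and the use of `None` for a missing value are all assumptions made for illustration. Missing values are simply treated as one more stratum.

```python
import random
from collections import defaultdict

def stratified_split(values, validation_fraction=0.2, seed=0):
    """Assign 'Training' or 'Validation' to each row, one stratum at a time.

    None stands in for a missing value and is treated as its own stratum.
    """
    rng = random.Random(seed)

    # Group row indices by stratum value (0, 1, or None for missing).
    groups = defaultdict(list)
    for row, value in enumerate(values):
        groups[value].append(row)

    roles = [None] * len(values)
    for rows in groups.values():
        rng.shuffle(rows)
        n_validation = round(len(rows) * validation_fraction)
        for i, row in enumerate(rows):
            roles[row] = "Validation" if i < n_validation else "Training"
    return roles

# Hypothetical binary target: 500 zeros, 400 ones, 100 missing.
target = [0] * 500 + [1] * 400 + [None] * 100
roles = stratified_split(target)
```

Each of the three groups (0, 1, missing) gets its own 80/20 split, so every row ends up with a role even though the missing-target rows cannot contribute to model fitting.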
However, you can still assign a validation role to the missing values, if you have some other reason for doing this. Go to Analyze > Screening > Explore Missing Values. Choose just your target column. Then choose one of the imputation methods. It won't really matter which one. I chose Multivariate Normal Imputation. Your missing target values will now all have a result, likely the same value. Be sure to keep this report open!
Now create your validation column as you normally would, specifying the target as your stratification variable. The Validation Column Type must be fixed for this to work. Note that you have imputed values so there are no missing values and your validation column will be completely filled in using the imputed data. Return to the Explore Missing Values report and click the Undo button in the Imputation Report area. This will remove the imputed values from your target variable, but your validation column will still have the Training and Validation values filled in.
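The impute, stratify, then undo sequence above can be sketched outside JMP as follows. This is plain Python for illustration only: `make_validation_column`, `PLACEHOLDER`, and the feature values are hypothetical, and leaving rows with a missing stratum unassigned is modeled on the JMP 15 behavior described in the question, not on JMP's actual code. JMP's imputation would produce real values rather than a placeholder, so the grouping of imputed rows is also an assumption.

```python
import random
from collections import defaultdict

PLACEHOLDER = "__imputed__"  # hypothetical stand-in for an imputed value

def make_validation_column(strata, validation_fraction=0.2, seed=1):
    """Return a role per row; rows whose stratum is None stay unassigned."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row, value in enumerate(strata):
        if value is not None:  # missing strata are skipped, as reported for JMP 15
            groups[value].append(row)

    roles = [None] * len(strata)
    for rows in groups.values():
        rng.shuffle(rows)
        cut = round(len(rows) * validation_fraction)
        for i, row in enumerate(rows):
            roles[row] = "Validation" if i < cut else "Training"
    return roles

# Hypothetical feature column: 9000 known values and 1000 missing.
feature = ["a"] * 5000 + ["b"] * 4000 + [None] * 1000

# Without the workaround, rows with a missing stratum get no role.
direct = make_validation_column(feature)

# Workaround: impute into a copy, then build the validation column from it.
imputed = [PLACEHOLDER if v is None else v for v in feature]
roles = make_validation_column(imputed)

# "Undo": the original feature list was never modified, so its missing
# values are still there while roles is completely filled in.
```

With these made-up counts, `direct` reproduces the 7200 / 1800 / 1000 pattern from the question, while `roles` assigns all 10000 rows (8000 training, 2000 validation).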
Thanks for your reply.
The target variable has no missing values. Every row is labeled as either 0 or 1.
I want to stratify on 2 or 3 input/feature variables (I just want to make sure training and validation have similar distributions), but the input variables have missing values.
In JMP 14, I can stratify on the target plus feature variables (with missing values) and get a validation column with no missing values.
But in the new JMP 15, the validation column has missing values because of the missing cells in the feature variables.
The approach that I outlined should still work, even if stratifying by input variables. But, why are you stratifying based on an input? The modeling is to take care of the relationship between the inputs and the outputs so that stratifying based on inputs is not needed or even desired. (I know in some very rare cases you stratify on the target and an input, but that is only in situations where you have quasi-separation). Training and validation are typically only stratified on targets. If you insist on stratifying based on inputs, be sure to also stratify based on your target as well. Bottom line is that the approach that I outlined should still work, even if the stratification columns are inputs.
Thanks again for sharing. It is useful.
Personally, I think this is a bug in JMP 15 compared with JMP 14.
The Stratification Columns function should account for missing values in the input variables.
JMP 14 does this, but JMP 15 does not.
I personally do not feel it is a bug, but a correction of an oversight in v14, but that is just my opinion. You should report this to JMP Technical Support. That will get the suggestion to the developers as a possible bug and a feature that they may wish to consider for future releases.