AnwarSalim
Level II

How to compare baseline data from Derivation and Validation Cohorts?

My supervisor has asked me to independently develop risk and score models for my study outcome. After cleaning my data, I assigned two-thirds of my sample to the Derivation cohort and the remaining third to the Validation cohort. The next step is to compare the baseline characteristics of the two cohorts and check whether any differences are statistically significant. At the moment the two cohorts are in two separate data tables with the same column labels but different numbers of rows, since the Derivation sample is larger. I don't know how to compare, for example, age between these two cohorts. Could anyone help me with how to perform this analysis in JMP?

Accepted Solutions
txnelson
Super User

Re: How to compare baseline data from Derivation and Validation Cohorts?

You need to read the document Discovering JMP, which is available from the Help pull-down menu. There are also tutorials available under the Help menu. The Beginners Tutorial and the Two Means Tutorial should help you understand which direction to go.
Jim

AnwarSalim
Level II

Re: How to compare baseline data from Derivation and Validation Cohorts?

I have just found out that I made a big mistake in how I approached this kind of study design. Although the groups are called the Derivation cohort and the Validation cohort, all the data should be in one data set. I had been treating them as separate data sets, which drove me crazy.
I had to use Excel. This involved making a new column and using the RAND formula to give each row a random number, sorting the rows from lowest to highest by that random number, and then creating a category column that assigns 2/3 of the data to Derivation (0) and 1/3 to Validation (1). This makes the two groups comparable. Then I imported my data back into JMP.
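The random assignment described above (random number per row, sort, label the first two-thirds) can be sketched in a few lines. This is only an illustration in Python, not JMP or Excel; the function name `assign_cohorts`, the `cohort` key, the seed, and the example `patients` table are all hypothetical.

```python
import random


def assign_cohorts(rows, derivation_fraction=2 / 3, seed=2024):
    """Return copies of rows with a 'cohort' key: 0 = Derivation, 1 = Validation."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    # Step 1: give every row its own random number (the RAND column).
    keyed = [(rng.random(), dict(row)) for row in rows]
    # Step 2: sort the rows from lowest to highest random number.
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: label the first 2/3 as Derivation (0) and the rest as Validation (1).
    n_derivation = round(len(keyed) * derivation_fraction)
    labelled = []
    for position, (_, row) in enumerate(keyed):
        row["cohort"] = 0 if position < n_derivation else 1
        labelled.append(row)
    return labelled


# Hypothetical example table with 300 patients.
patients = [{"id": i, "age": 40 + i % 30} for i in range(300)]
table = assign_cohorts(patients)
n_derivation = sum(1 for row in table if row["cohort"] == 0)
n_validation = sum(1 for row in table if row["cohort"] == 1)
print(n_derivation, n_validation)
```

Both cohorts stay in one table, distinguished only by the `cohort` column, which is the layout the comparison step needs.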

Since I did this in Excel, I think it can also be done in JMP without trouble. I will go through Discovering JMP and the tutorials to find out for myself.

I'm sorry for the inconvenience; I hope this will be helpful to others.
Thank you, @txnelson and @P_Bartell.

8 REPLIES
AnwarSalim
Level II

Re: How to compare baseline data from Derivation and Validation Cohorts?

Thank you txnelson, I will go through Discovering JMP.
P_Bartell
Level VIII

Re: How to compare baseline data from Derivation and Validation Cohorts?

Let's back up a minute. Can you share how you created these two, in your words, cohorts? Random selection/assignment? Or some other method? You could have set yourself up for real analytical trouble by picking just about any method other than random selection/assignment.

And what do the words 'statistically significant' mean? Without a testable hypothesis they mean little. @txnelson's advice is sound with respect to how to get JMP to do various tests for significance of population/sample parameters.

AnwarSalim
Level II

Re: How to compare baseline data from Derivation and Validation Cohorts?

I randomly assigned the two cohorts by using the Tables menu and selecting Subset to assign 2/3 as the Derivation cohort and the other 1/3 as the Validation cohort. So I ended up with two data sets as a result of this study design, and I don't know the technique for comparing these two data sets in JMP, if that is possible.
This is my first time doing this kind of study with Derivation and Validation cohorts. Of course, I have basic skills in performing statistical analysis on one data set. My question may look complex, but actually it's not. I need guidance; maybe I have done something wrong somewhere, or I can't explain it properly.
AnwarSalim
Level II

Re: How to compare baseline data from Derivation and Validation Cohorts?

I have a large population data set from a study with a follow-up period of 5 years, and the aim is to build prediction scores (risk models). One of the best study designs for this is to randomly split the data set into a Derivation cohort (used to develop the score) and a Validation cohort (used later to assess the validity of the score). But before running association and predictive modeling analyses in the Derivation cohort, I have to compare the two data sets on the variables I'm interested in. I am stuck at this point.
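For a continuous baseline variable such as age, the comparison described above comes down to comparing two group means. The sketch below is a hedged illustration in Python rather than JMP: it computes Welch's t statistic, a two-sided p-value from a normal approximation (an assumption that is reasonable only for large cohorts), and the standardized mean difference. The function name `compare_means`, the `cohort` and `age` keys, and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_means(group_a, group_b):
    """Compare one continuous variable between two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances

    # Welch's t statistic: no assumption of equal variances or equal group sizes.
    standard_error = sqrt(var_a / n_a + var_b / n_b)
    t_stat = (mean_a - mean_b) / standard_error

    # Two-sided p-value from the normal approximation (large samples only).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

    # Standardized mean difference: difference in means in pooled-SD units.
    smd = (mean_a - mean_b) / sqrt((var_a + var_b) / 2)
    return {"mean_a": mean_a, "mean_b": mean_b,
            "t": t_stat, "p_approx": p_approx, "smd": smd}


# Hypothetical single table: 0 = Derivation, 1 = Validation.
table = [{"cohort": 0, "age": 50 + (i * 7) % 21} for i in range(200)]
table += [{"cohort": 1, "age": 50 + (i * 11) % 21} for i in range(100)]

derivation_age = [row["age"] for row in table if row["cohort"] == 0]
validation_age = [row["age"] for row in table if row["cohort"] == 1]
result = compare_means(derivation_age, validation_age)
print(result)
```

With a random split, a large difference in a baseline variable would be unexpected, so this check mainly confirms that the randomization produced comparable groups.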
P_Bartell
Level VIII

Re: How to compare baseline data from Derivation and Validation Cohorts?

@AnwarSalim I have two more thoughts for you as you move ahead in your educational and problem solving efforts...

1. I highly recommend taking the JMP Statistics for Industrial Problem Solving online course from beginning to end. There are several modules in this course which will help you with this particular problem as you've articulated it. Here is a link to the course. It's free. And as the course designers say, "All you need is a browser, internet connection and inquisitive mind."

JMP Statistics for Industrial Problem Solving 

2. If you aren't using JMP Pro, you should investigate that investment. For the type of problem you've articulated, JMP can work...JMP Pro will shine. The methodological and technical analytics tool set is much broader and deeper in JMP Pro for the type of problem you are working with. For example, to create your 'Derivation' and 'Validation' cohorts in JMP Pro takes about 3 mouse clicks and would have saved you all that work in Excel, importing back to JMP etc.

AnwarSalim
Level II

Re: How to compare baseline data from Derivation and Validation Cohorts?

Thank you, @P_Bartell, for your support. I'll work my ass off. Yes, I'm using JMP Pro through my institution, but I'm still a beginner. I hope to improve my skills.