Hello JMP Community,
I am trying to split a dataset into Test and Control for a marketing experiment. The data set is unique at the customer id and has a million rows. Both the new datasets must be balanced using two parameters which are annual customer spend (YTD) and last quarter spend (QTD). These parameters are represented in two columns. I am unable to figure out how to create a stratified/balanced split. I tried using the "Stratified Data Partitioning (with balancing options) add-in." but in this add-in, the balance is performed at a categorical level (Focal Group). In my case, the balance needs to be done on a numerical data type (YTD and QTD spend).
I am NOT a JMP Pro user.
Any help in this use case will be much appreciated.
Thanks
JMP Pro is going to be the best way to do what you want. However, since you do NOT have JMP Pro, here is an approach that will get you pretty close. I will assume that you want 75% training, 25% validation, but you can easily revise this for other percentages.
Sort your data by your two stratification variables. Then, since you are looking at training and validation sets, take groups of 4 rows of data and randomly assign three of those rows to training, the other row to validation. This should give you fairly representative groupings according to your two stratification variables.