brady_brady

Staff

Joined:

Jun 9, 2012


Stratified Data Partitioning (with balancing options) add-in.

This add-in allows the user to split a data set into train/validate/test partitions. It includes options for rebalancing the proportions of the strata variable's levels in the output data set relative to a focal group. This is useful, for example, for oversampling an event that is rare in the original data.
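For readers who want to see the idea outside JMP, here is a minimal Python sketch of stratified partitioning followed by oversampling of a focal level. It is not the add-in's implementation, and the add-in's exact balancing logic is not described in this post. The function names (`stratified_partition`, `oversample_focal`), the 60/20/20 split, the 50% target share, and the toy `event` column are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_partition(rows, strata_key, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split rows into train/validate/test so each strata level keeps
    roughly the same proportion in every partition (hypothetical helper)."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[strata_key]].append(row)

    parts = {"train": [], "validate": [], "test": []}
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * fractions[0])
        n_valid = round(len(shuffled) * fractions[1])
        parts["train"].extend(shuffled[:n_train])
        parts["validate"].extend(shuffled[n_train:n_train + n_valid])
        # Whatever remains after train and validate goes to test.
        parts["test"].extend(shuffled[n_train + n_valid:])
    return parts


def oversample_focal(rows, strata_key, focal_level, target_share, seed=0):
    """Resample focal-level rows with replacement until they make up
    about target_share of the output (hypothetical helper)."""
    rng = random.Random(seed)
    focal = [r for r in rows if r[strata_key] == focal_level]
    others = [r for r in rows if r[strata_key] != focal_level]
    if not focal or not 0 < target_share < 1:
        return list(rows)
    # Solve needed / (needed + len(others)) = target_share for needed.
    needed = round(target_share * len(others) / (1 - target_share))
    extra = [rng.choice(focal) for _ in range(max(0, needed - len(focal)))]
    return others + focal + extra


# Toy data: the "yes" event occurs in 5% of rows (made-up example data).
rows = [{"id": i, "event": "yes" if i % 20 == 0 else "no"} for i in range(1000)]
parts = stratified_partition(rows, "event")
balanced_train = oversample_focal(parts["train"], "event", "yes", 0.5)
```

In this sketch only the training partition is rebalanced, so the validation and test partitions keep the original event rate. That is one common convention, not necessarily what the add-in does.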

 

Instructions for using the add-in are attached.

 

Updated 3/23/2016:  Includes additional balancing options.

Updated 9/1/2016:  Bug fixes (related to an error when running the add-in)

Updated 9/2/2016:  Added instructions (attached pdf)

Updated 11/27/2017: Uploaded revised instructions (attached pdf)

 


 

Comments or suggestions? Please contact mia.stephens of JMP's Academic team.

Comments
sursangeet1

This add-in is going to be a great help in teaching data mining using JMP!

tajrida

Thanks for this useful JMP add-in. Could you provide instructions on how to use it properly and get the most out of it, especially the new options for rebalancing the proportions of the output data set's strata variable levels in relation to a focal group? A video would be ideal; failing that, PDF instructions would be very welcome.

waynergf

I agree with tajrida. :-)

mia_stephens

Thanks for the comments! We will work on writing instructions for using this add-in. In the meantime, if you have a copy of Data Mining for Business Analytics with JMP Pro, this add-in was designed based on material covered in Chapter 5 (pages 123-126).

Mia

mia_stephens

Instructions for using the add-in have been added (as an attachment).  Please let us know if you have any questions.

Thanks!

Mia

gurtejsbains

Hello Brady and Mia, 
This is a great add-in. I have a similar situation: I am trying to do stratified sampling (splitting a data set into two groups, test and control) using numerical variables to balance the data. I want to ensure that YTD sales and QTD sales are balanced across both groups. How do I use this add-in to achieve that? It seems the add-in only accepts categorical data for stratification.

 

Thanks 

Gurtej