
Need a better resource than JMP Help to explain Partitioning

jeff_kolton1

Community Trekker

Joined: Jun 25, 2014

I'm looking at a dataset that has 250 rows and about 60 columns. I'm trying Partition as a first analysis, to cull out the totally useless factors. JMP Help explains what everything on the Partition launch window means, but I need something more in-depth to help me pick the best values for Validation and the other parameters. Any recommendations?

5 REPLIES
louv

Staff

Joined: Jun 23, 2011

jeff_kolton1

Community Trekker

Joined: Jun 25, 2014

Thanks. Good read.

Jeff Kolton

Staff Engineer

Enterprise Excellence

Fiberglass

940 Washburn Switch Rd.

Shelby, NC, USA, 28150

Tel: 704-434-2261 ext. 2374

E-Mail: kolton@ppg.com

Web: www.ppg.com

Byron_JMP

Staff

Joined: Apr 26, 2012

If you're just culling variables, you might not need a validation column. You should have one if you're using k-fold cross validation or building a predictive model, but it's not strictly necessary for factor reduction.
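
For example, here's a minimal JSL sketch of launching Partition with a random holdout instead of a validation column (the column names :Y, :X1, :X2 are placeholders for your own data):

    dt = Current Data Table();
    dt << Partition(
        Y( :Y ),                   // response
        X( :X1, :X2 ),             // candidate factors
        Validation Portion( 0.3 )  // hold out ~30% of rows at random
    );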

Hopefully you have JMP 12. Go to the Cols menu, then Modeling Utilities > Screen Predictors. Select your Xs and Ys and run. This tool is great for quick factor reduction.
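
If you'd rather script it, I believe the same tool is scriptable as Predictor Screening; a sketch, again with placeholder column names:

    dt = Current Data Table();
    dt << Predictor Screening(
        Y( :Y ),
        X( :X1, :X2, :X3 )  // list all of your candidate columns here
    );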

An old but good reference on how Partition works: http://www.jmp.com/content/dam/jmp/documents/en/newsletters/jmper-cable/17_spring_2005.pdf

jeff_kolton1

Community Trekker

Joined: Jun 25, 2014

Thanks. Very helpful...both your advice and the paper.

Jeff Kolton

Staff Engineer

Enterprise Excellence

Fiberglass

940 Washburn Switch Rd.

Shelby, NC, USA, 28150

Tel: 704-434-2261 ext. 2374

E-Mail: kolton@ppg.com

Web: www.ppg.com

Peter_Bartell

Joined: Jun 5, 2014

To add to what my esteemed colleagues Lou and Byron said above: if you have JMP Pro and variable identification is your key goal, I suggest evaluating your data with several of the tree-based methods available in JMP Pro. Try the Decision Tree path first, then the Bootstrap Forest and Boosted Tree methods. The Decision Tree is what I like to call a 'voracious' (greedy) algorithm: at any one split it identifies only the single most influential variable. The bootstrapping and bagging behind the Bootstrap Forest are generally more adept at identifying the more subtle influential factors, because the random selection of columns in the Bootstrap Forest occasionally leaves the VERY influential factors out of a tree, which gives the more subtle factors a fighting chance to show their stuff.
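
A rough JSL sketch of launching both (JMP Pro required; placeholder column names, default options):

    dt = Current Data Table();
    // Check the Column Contributions report in each for a ranking of the factors
    dt << Bootstrap Forest( Y( :Y ), X( :X1, :X2, :X3 ) );
    dt << Boosted Tree( Y( :Y ), X( :X1, :X2, :X3 ) );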

You may also want to use the penalized regression methods within the Generalized Regression platform for variable identification. Completely different algorithms, but often used for the same purpose.
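
For example, the Lasso through Fit Model's Generalized Regression personality looks roughly like this in JSL. I'm writing the syntax from memory, so treat it as a sketch and use Save Script from a fitted report to get the exact form:

    Fit Model(
        Y( :Y ),
        Effects( :X1, :X2, :X3 ),
        Personality( "Generalized Regression" ),
        Run( Estimation Method( Lasso ) )  // factors whose coefficients shrink to zero drop out
    );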

Lastly, don't forget to save each model's prediction formula to the data table so you can use the JMP Pro Model Comparison platform as a 'one stop shop' for evaluating all of these models' performance in an easy, side-by-side manner, including JMP's Prediction Profiler.
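
Something like this should do it once the prediction formulas are saved (again a sketch; I believe Model Comparison will pick up the saved prediction formula columns for the response on its own):

    dt = Current Data Table();
    dt << Model Comparison( Y( :Y ) );  // compares all saved prediction formula columns for :Y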