Validate / Test for overfit model by re-ordering Y...

Mar 29, 2017 10:02 AM

A co-worker described a test in another software package that randomly re-arranges the y values in an analysis and then re-fits the model to make sure the shuffled data give a poor fit. Is anyone familiar with this technique, or has anyone attempted to automate it in JMP?

Comparing these two fits demonstrates the technique, although it sounds like many y' columns are created and tested.

    dt = New Table( "Untitled 2",
        Add Rows( 10 ),
        New Column( "x", Numeric, "Continuous", Format( "Best", 12 ), Set Selected,
            Set Values( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] ) ),
        New Column( "y", Numeric, "Continuous", Format( "Best", 12 ),
            Formula( :x * 2 + Random Normal( 0, 0.1 ) ) ),
        New Column( "y'", Numeric, "Continuous", Format( "Best", 12 ),
            Formula( Col Stored Value( :y, Col Shuffle() ) ) )
    );
    dt << Fit Model(
        Y( :y, :y' ),
        Effects( :x ),
        Personality( "Standard Least Squares" ),
        Emphasis( "Effect Leverage" ),
        Run(
            :y << {Lack of Fit( 0 ), Plot Actual by Predicted( 0 ), Plot Residual by Predicted( 0 ), Plot Effect Leverage( 0 )},
            :y' << {Lack of Fit( 0 ), Plot Actual by Predicted( 0 ), Plot Residual by Predicted( 0 ), Plot Effect Leverage( 0 )}
        )
    );
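For readers without JMP at hand, the same comparison can be sketched in plain Python. This is only an illustration of the idea, not a translation of the JSL above: the helper name `r_squared` and the seed are made up for the example, and the data simply mimic the toy table (x = 1..10, y = 2x plus small normal noise).

```python
import random

def r_squared(x, y):
    """R^2 of the simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(1)  # arbitrary seed, only so the run is repeatable
x = list(range(1, 11))
y = [2 * xi + rng.gauss(0, 0.1) for xi in x]

# y' : the same values as y, but in a random order
y_shuffled = y[:]
rng.shuffle(y_shuffled)

r2_original = r_squared(x, y)
r2_shuffled = r_squared(x, y_shuffled)
print(f"R^2 with original y: {r2_original:.3f}")
print(f"R^2 with shuffled y: {r2_shuffled:.3f}")
```

A single shuffle can still land on a fairly high R² by chance, which is presumably why the other package creates and tests many shuffled columns rather than one.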

2 REPLIES


Mar 29, 2017 10:12 AM

This *resampling method* is usually used to determine the significance of the effect of changing the levels of the independent variable. It is often used for inference in place of the *t* test or ANOVA, which make assumptions about the data and the model. The resampling approach generates an empirical distribution of the statistic instead of assuming a particular model. You then compare your sample statistic to that empirical distribution to obtain a *p*-value.
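As a rough illustration of how a *p*-value comes out of the empirical distribution, here is a minimal Python sketch for the slope of a simple regression. The function names, the number of shuffles (999), and the seeds are assumptions made for the example; the `+ 1` in the numerator and denominator is a common convention that counts the observed ordering as one of the permutations.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def permutation_p_value(x, y, n_perm=999, seed=1):
    """Two-sided p-value: share of shuffles whose |slope| reaches the observed one."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(slope(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(2)  # arbitrary seed for the toy data
x = list(range(1, 11))
y = [2 * xi + rng.gauss(0, 0.1) for xi in x]

p_value = permutation_p_value(x, y)
print(f"two-sided permutation p-value for the slope: {p_value:.3f}")
```

No distributional assumption enters the calculation; the reference distribution is built entirely from re-orderings of the observed y values.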

JMP Pro can *bootstrap* any result, such as a parameter estimate, in order to determine its significance.
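To make the bootstrap idea concrete outside JMP, here is a small Python sketch that resamples rows with replacement and collects the slope each time. It is a generic case-resampling bootstrap with a percentile interval, written under the assumption of the toy data from the question; it is not a description of what JMP Pro does internally, and the names `bootstrap_slopes`, `lo`, and `hi` are invented for the example.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def bootstrap_slopes(x, y, n_boot=2000, seed=1):
    """Sorted slopes from n_boot resamples of the (x, y) rows, drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when every resampled x is identical
        ys = [y[i] for i in idx]
        slopes.append(slope(xs, ys))
    return sorted(slopes)

rng = random.Random(3)  # arbitrary seed for the toy data
x = list(range(1, 11))
y = [2 * xi + rng.gauss(0, 0.1) for xi in x]

slopes = bootstrap_slopes(x, y)
lo = slopes[int(0.025 * len(slopes))]       # approximate 2.5th percentile
hi = slopes[int(0.975 * len(slopes)) - 1]   # approximate 97.5th percentile
print(f"approximate 95% bootstrap interval for the slope: [{lo:.3f}, {hi:.3f}]")
```

If the interval excludes zero, the slope would be judged significant at roughly the 5% level. Note the difference from shuffling: the bootstrap keeps each x paired with its own y, whereas the shuffle deliberately breaks that pairing.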

You don't need the second dependent variable (random normal deviate).



Mar 29, 2017 3:06 PM

Mark,

If I understand your post, bootstrapping would be an alternate approach that (hopefully) arrives at the same conclusion. Honestly, though, I have not had good luck bootstrapping parameter estimates for anything more than a basic model; perhaps I am doing something wrong :-). For example, attempting to bootstrap parameter estimates for a second-order model results in multiple columns for each possible coefficient.

I think I found a way to automate this using the simulation method. Instead of using the simulation function generated by a platform, just swap the original y with the re-arranged y' column. I am having trouble applying the technique to partition methods, though; I suspect the same trees are used for each simulation.

Attached is a script showing what this looks like for a few different methods, but here is the basic idea:

    Random Reset( 1 );
    dt = New Table( "Untitled 2",
        Add Rows( 4 ),
        New Column( "x", Numeric, "Continuous", Format( "Best", 12 ), Set Selected,
            Set Values( [1, 2, 3, 4] ) ),
        New Column( "y", Numeric, "Continuous", Format( "Best", 12 ),
            Formula( :x * 2 + Random Normal( 0, 0.1 ) ) ),
        New Column( "y'", Numeric, "Continuous", Format( "Best", 12 ),
            Formula( Col Stored Value( :y, Col Shuffle() ) ) )
    );

    // ------ Linear Model ------
    linmdl = dt << Fit Model(
        Y( :y ),
        Effects( :x ),
        Personality( "Standard Least Squares" ),
        Emphasis( "Effect Leverage" ),
        Run(
            :y << {Lack of Fit( 0 ), Plot Actual by Predicted( 0 ), Plot Residual by Predicted( 0 ), Plot Effect Leverage( 0 )},
            :y' << {Lack of Fit( 0 ), Plot Actual by Predicted( 0 ), Plot Residual by Predicted( 0 ), Plot Effect Leverage( 0 )}
        )
    );
    linrpt = linmdl << Report;
    lindt = linrpt["Summary of Fit"][1][2] << Simulate( 100, Out( :y ), In( :y' ) );
    lindt[2] << Distribution( Continuous Distribution( Column( :RSquare ) ) );
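One thing worth keeping in mind with a 4-row example: there are only 4! = 24 possible orderings of y, so the shuffled-fit distribution of RSquare is very coarse. The Python sketch below (an illustration with made-up names and an arbitrary seed, not a port of the JSL) enumerates every ordering instead of sampling 100 of them and counts how many reach the observed R².

```python
from itertools import permutations
import random

def r_squared(x, y):
    """R^2 of the simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)

rng = random.Random(1)  # arbitrary seed, only so the run is repeatable
x = [1, 2, 3, 4]
y = [2 * xi + rng.gauss(0, 0.1) for xi in x]

observed = r_squared(x, y)

# Exact "null" distribution: R^2 for every possible ordering of y.
null_r2 = sorted(r_squared(x, list(p)) for p in permutations(y))

# Small tolerance so orderings that tie with the observed fit are counted.
n_extreme = sum(r2 >= observed - 1e-12 for r2 in null_r2)

print(f"observed R^2: {observed:.4f}")
print(f"orderings enumerated: {len(null_r2)}")
print(f"orderings with R^2 at least as large: {n_extreme}")
```

With evenly spaced x and y increasing in x, the original ordering and the fully reversed ordering give the same R², so at least 2 of the 24 orderings match the observed fit. A random shuffle of four rows will therefore reproduce a near-perfect fit about one time in twelve, and more rows are needed before the shuffled-fit distribution says much.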