Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Re: A speedy way to create a table of Rsquared values

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Jul 23, 2018 11:06 AM
(4913 views)

I have a script which finds the Rsquared value for one parameter versus many parameters. Once in a table it's simple to sort for the best values. It has worked very well over the years but when I try to push it to find the value for tens of thousands or more parameters it takes a very long time. I wondered if anyone knows a fast way in JSL to do this.

Things I have tried but all take too long:

1. Making a Bivariate report and reading the Rsq from the display tree.

xyp = Bivariate(Y(column(resp)), X(eval(fact)), Fit Polynomial(degree), invisible);

This command actually works faster than I would expect but looping through display tree to get the values takes most of the time.

2. Using Fit Model -> StepWise -> All Possible Models of 1 term.

This seems to only fit rows of data that have no missing data for all terms and I may have a lot of missing data for some terms. So it doesn't fit my needs.

3. Summing up squares with a function like the one below.

```
GetRsquared = function ({x,y}, {sx,sy,ssxx,ssyy,ssxy,nr,np,i},
sx=0;
sy=0;
ssxx=0;
ssyy=0;
ssxy=0;
nr=nrows(y);
np=nr;
for( i=1, i<=nr, i++,
if(is missing(x[i]),np--,
if(is missing(y[i]),np--,
sx=sx+x[i];
sy=sy+y[i];
ssxx=ssxx+x[i]*x[i];
ssyy=ssyy+y[i]*y[i];
ssxy=ssxy+x[i]*y[i]
)));
ssxx=ssxx-sx*sx/np;
ssyy=ssyy-sy*sy/np;
ssxy=ssxy-sx*sy/np;
r2=ssxy*ssxy/ssxx/ssyy
);
```

1 ACCEPTED SOLUTION

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

In response to your first question about my script, multivariate only has the "Y" variables, so I concatenate X's and Y's and let it calculate it. I dont' recall what I do for overlap but I suppose I handle it somehow.

The response screening that Karen pointed out makes much more sense for what I was trying to do. It can be scripted apparently. Run it as a test, then look at the red triangle from the report output, not the table. The top red triangle. Click save script -> to script window and you can see it. I don't know how to get the reference to the table and the r values, but I'd probably get a list of open tables before the script was executed... then find the new one after it is generated.

Response Screening( Y( .... ), X(.... ) )

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

I have a script that does pretty much the same thing. Used to use the bivariate, now I use multivariate (see subset of the script below). I recently found a bug which I haven't yet fixed, likely with special characters in the column name. But overall this should work. I don't recall if I've used it on such large datasets, but for the data I worked with this was much faster. Note that to get a generic # of x's vs a generic # of y's, it is actually calculating for (num_x + num_y) vs (num_x + num_y), which is still faster than looping through the bivariate.

Edit: Karen's method looks much more efficient / elegant than this one. If I ever need to update my script I would definitely check out that method.

```
numX = nitems(xParamNames);
numy = nitems(yParamNames);
yxParamNames = concat(yParamNames, xParamNames);
mvObj = Multivariate(
// Y(evallist(yParamList)
Y(evallist(yxParamNames)
),
Estimation Method( "Row-wise" ),
Matrix Format( "Square" ),
Scatterplot Matrix(0
)
);
corrMatrixObj = mvObj << xpath( "//OutlineBox[ text() = 'Correlations']//MatrixBox");
corrMatrix = corrMatrixObj << get();
corrMatrix = corrMatrix[1];
corrMatRows = 1::numY;
corrMatCols = (numY+1)::(numY+numX);
sub_corrMatrix = corrMatrix[corrMatRows,corrMatCols];
myRsqr = aslist(power(sub_corrMatrix,2)); // rsqr is corr matrix to the power 2. rest of script expects a list.
mvObj<< closewindow();
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

Mike,

Thanks for the response. I tried this. Is this only recording the R of terms common over all the data? I wonder if it's robust to bad columns of data.

Pat

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

Karen,

This is very fast and seems to get all the parameters I could ever want.

My application is part of a script that does other things with the results.

Do you think it can be scripted?

I'm surprised that there is no Source variable for the table.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

I am also suprised there is not a "Source" entry on the new data table. I suggest that you add an entry in the JMP Wish List asking for this to be added.

Jim

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

You can call the Response Screening platform from JSL...from there my JSL skills can't help you.

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

In response to your first question about my script, multivariate only has the "Y" variables, so I concatenate X's and Y's and let it calculate it. I dont' recall what I do for overlap but I suppose I handle it somehow.

The response screening that Karen pointed out makes much more sense for what I was trying to do. It can be scripted apparently. Run it as a test, then look at the red triangle from the report output, not the table. The top red triangle. Click save script -> to script window and you can see it. I don't know how to get the reference to the table and the r values, but I'd probably get a list of open tables before the script was executed... then find the new one after it is generated.

Response Screening( Y( .... ), X(.... ) )

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: A speedy way to create a table of Rsquared values

Thank you all for your help. I would like to summarize here, just so there is a record for future JMP explorers.

I was trying to find a way to quickly determine all the R Squared values of one term vs many terms, up to ~100,000 terms.

In my trial I was fitting 2,000 terms, using an HP EliteBook with Windows 7, 64-bit Operating System, Intel Core i5-4300U CPU @190 GHz 2.50 Ghz, 16 GB RAM.

My original method of either 1) looping a bivariate fit and reading the R squared from the display tree, or 2) brute forcing calculations, both took ~ 37 sec on my laptop. That's ~ 30 minutes for 100,000 terms! And want to do this for more than one response term!

The method of using the Multivariate platform consumed all my computers resources for ~ 15 minutes and then crashed due to insufficient memory. Don't try this at home.

The method on using Response Screening from KarenC took 1 sec! This is clearly the solution. I marked a response from mikedriscoll as the Solution only because he summarizes the discussion and he shows the code to use.

To be complete, here is the code I used in the end.

dt is my original table of data.

resp is a character string naming the y column of interest.

fact is a list of character strings naming the x columns of interest.

```
dt1 = dt << Response Screening(Y(column(resp)), X(eval(fact)));
// Note: This is not a normal table and JMP will not recognize dt1 as a table
// The 'table' is called "Pvalues". Hopefully you don't already have a table of that name.
// Of course you can write more code to make this error proof.
// To immediately move this information to a normal JMP table I took a subset.
dt2 = Data Table( "PValues" ) << Subset( All rows, Selected columns only( 0 ) );
// Now I can do other things, like ...
dt2 << Sort( By( :RSquare ), Order( Descending ), Replace Table );
rsq = dt2:RSquare<< get as matrix;
```