Solved: A speedy way to create a table of Rsquared values

pcarroll1 · Jul 23, 2018 02:06 PM

I have a script which finds the Rsquared value for one parameter versus many parameters. Once in a table it's simple to sort for the best values. It has worked very well over the years but when I try to push it to find the value for tens of thousands or more parameters it takes a very long time. I wondered if anyone knows a fast way in JSL to do this.

Things I have tried but all take too long:

1. Making a Bivariate report and reading the Rsq from the display tree.

xyp = Bivariate(Y(column(resp)), X(eval(fact)), Fit Polynomial(degree), invisible);

This command actually works faster than I would expect but looping through display tree to get the values takes most of the time.

2. Using Fit Model -> StepWise -> All Possible Models of 1 term.

This seems to only fit rows of data that have no missing data for all terms and I may have a lot of missing data for some terms. So it doesn't fit my needs.

3. Summing up squares with a function like the one below.

GetRsquared = function ({x,y}, {sx,sy,ssxx,ssyy,ssxy,nr,np,i},
	sx=0;
	sy=0;
	ssxx=0;
	ssyy=0;
	ssxy=0;
	nr=nrows(y);
	np=nr;
	for( i=1, i<=nr, i++,
		if(is missing(x[i]),np--,
			if(is missing(y[i]),np--,
				sx=sx+x[i];
				sy=sy+y[i];
				ssxx=ssxx+x[i]*x[i];
				ssyy=ssyy+y[i]*y[i];
				ssxy=ssxy+x[i]*y[i]
				)));
	ssxx=ssxx-sx*sx/np;
	ssyy=ssyy-sy*sy/np;
	ssxy=ssxy-sx*sy/np;
	r2=ssxy*ssxy/ssxx/ssyy
	);

mikedriscoll · Jul 23, 2018 05:03 PM

In response to your first question about my script, multivariate only has the "Y" variables, so I concatenate X's and Y's and let it calculate it. I dont' recall what I do for overlap but I suppose I handle it somehow.

The response screening that Karen pointed out makes much more sense for what I was trying to do. It can be scripted apparently. Run it as a test, then look at the red triangle from the report output, not the table. The top red triangle. Click save script -> to script window and you can see it. I don't know how to get the reference to the table and the r values, but I'd probably get a list of open tables before the script was executed... then find the new one after it is generated.

Response Screening( Y( .... ), X(.... ) )

View solution in original post

mikedriscoll · Jul 23, 2018 1:33 PM

I have a script that does pretty much the same thing. Used to use the bivariate, now I use multivariate (see subset of the script below). I recently found a bug which I haven't yet fixed, likely with special characters in the column name. But overall this should work. I don't recall if I've used it on such large datasets, but for the data I worked with this was much faster. Note that to get a generic # of x's vs a generic # of y's, it is actually calculating for (num_x + num_y) vs (num_x + num_y), which is still faster than looping through the bivariate.

Edit: Karen's method looks much more efficient / elegant than this one. If I ever need to update my script I would definitely check out that method.

	numX = nitems(xParamNames);
	numy = nitems(yParamNames);


	yxParamNames = concat(yParamNames, xParamNames);

	mvObj = Multivariate(
//		Y(evallist(yParamList)
		Y(evallist(yxParamNames)

		),
		Estimation Method( "Row-wise" ),
		Matrix Format( "Square" ),
		Scatterplot Matrix(0
		)
	);

	
	
	corrMatrixObj = mvObj << xpath( "//OutlineBox[ text() = 'Correlations']//MatrixBox");
	corrMatrix = corrMatrixObj << get();
	corrMatrix = corrMatrix[1];
	
	corrMatRows = 1::numY;
	corrMatCols = (numY+1)::(numY+numX);
	sub_corrMatrix = corrMatrix[corrMatRows,corrMatCols];
	myRsqr = aslist(power(sub_corrMatrix,2));		// rsqr is corr matrix to the power 2. rest of script expects a list. 
	
	mvObj<< closewindow();

pcarroll1 · Jul 23, 2018 04:40 PM

Mike,

Thanks for the response. I tried this. Is this only recording the R of terms common over all the data? I wonder if it's robust to bad columns of data.

Pat

KarenC · Jul 23, 2018 03:18 PM

Recently I have been using Analyze>Screening>Resposne Screening (not JSL just the UI) to get RSq values. This works for one versus many parameters or many vs. many parameters. It is fast. You end up with other information, however, the RSq values are in the pvalue table that is generated. Bonus Tip: in the P-value table you can select rows (pairs of parameters) and run the "Fit Selected Items" table scritp to get bivariate fits. I love this as I always want to see a relationship that appears strong numerically before I really believe it.

pcarroll1 · Jul 23, 2018 04:44 PM

Karen,

This is very fast and seems to get all the parameters I could ever want.

My application is part of a script that does other things with the results.

Do you think it can be scripted?

I'm surprised that there is no Source variable for the table.

txnelson · Jul 23, 2018 05:02 PM

I am also suprised there is not a "Source" entry on the new data table. I suggest that you add an entry in the JMP Wish List asking for this to be added.

Jim

KarenC · Jul 23, 2018 05:06 PM

You can call the Response Screening platform from JSL...from there my JSL skills can't help you.

mikedriscoll · Jul 23, 2018 05:03 PM

In response to your first question about my script, multivariate only has the "Y" variables, so I concatenate X's and Y's and let it calculate it. I dont' recall what I do for overlap but I suppose I handle it somehow.

The response screening that Karen pointed out makes much more sense for what I was trying to do. It can be scripted apparently. Run it as a test, then look at the red triangle from the report output, not the table. The top red triangle. Click save script -> to script window and you can see it. I don't know how to get the reference to the table and the r values, but I'd probably get a list of open tables before the script was executed... then find the new one after it is generated.

Response Screening( Y( .... ), X(.... ) )

pcarroll1 · Jul 24, 2018 9:56 AM

Thank you all for your help. I would like to summarize here, just so there is a record for future JMP explorers.

I was trying to find a way to quickly determine all the R Squared values of one term vs many terms, up to ~100,000 terms.

In my trial I was fitting 2,000 terms, using an HP EliteBook with Windows 7, 64-bit Operating System, Intel Core i5-4300U CPU @190 GHz 2.50 Ghz, 16 GB RAM.

My original method of either 1) looping a bivariate fit and reading the R squared from the display tree, or 2) brute forcing calculations, both took ~ 37 sec on my laptop. That's ~ 30 minutes for 100,000 terms! And want to do this for more than one response term!

The method of using the Multivariate platform consumed all my computers resources for ~ 15 minutes and then crashed due to insufficient memory. Don't try this at home.

The method on using Response Screening from KarenC took 1 sec! This is clearly the solution. I marked a response from mikedriscoll as the Solution only because he summarizes the discussion and he shows the code to use.

To be complete, here is the code I used in the end.

dt is my original table of data.

resp is a character string naming the y column of interest.

fact is a list of character strings naming the x columns of interest.

dt1 = dt << Response Screening(Y(column(resp)), X(eval(fact)));
// Note:  This is not a normal table and JMP will not recognize dt1 as a table
// The 'table' is called "Pvalues".  Hopefully you don't already have a table of that name.
//  Of course you can write more code to make this error proof.
// To immediately move this information to a normal JMP table I took a subset.
dt2 = Data Table( "PValues" ) << Subset( All rows, Selected columns only( 0 ) );
// Now I can do other things, like ...
dt2 << Sort( By( :RSquare ), Order( Descending ), Replace Table );
rsq = dt2:RSquare<< get as matrix;

A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values

Re: A speedy way to create a table of Rsquared values