tnad
Level II

variable selection with a correlation cutoff

I'm screening thousands of variables and would like to keep only the ones that are not very highly correlated with each other, for example those with -0.95 < r < 0.95. Is there an easy way to do this? I can use "Multivariate Methods > Multivariate" to calculate the correlations, but I have no idea how to make the selection according to the cutoff above.


Re: variable selection with a correlation cutoff

You can use the pairwise correlations report instead of the default matrix version of the report. Click the red triangle at the top and choose Pairwise Correlations. Next, right-click the new report and select Sort by Column. Select the column with the p-values, and make sure to pick the order that is most useful to you (ascending or descending). Right-click the report again and select Make into Data Table if you like.
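For readers working outside the JMP interface, the idea of a sortable pairwise table can be sketched in plain Python. This is only an illustration, not JMP's implementation: the helper names `pearson` and `pairwise_table` and the toy data are hypothetical, and the rows are sorted by the absolute correlation rather than by p-value.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_table(columns):
    """Return (name_a, name_b, r) rows, strongest |r| first."""
    rows = [(a, b, pearson(columns[a], columns[b]))
            for a, b in combinations(columns, 2)]
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical toy data: x2 is an exact multiple of x1.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
}
table = pairwise_table(data)
for a, b, r in table:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Each row plays the role of one line in the pairwise report, so the most strongly correlated pairs appear at the top.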

Learn it once, use it forever!
tnad
Level II

Re: variable selection with a correlation cutoff

Thanks. I can filter on p-value here, but I'm not sure how to use this table to keep only the variables that are not highly correlated with each other.

txnelson
Super User

Re: variable selection with a correlation cutoff

@markbailey's suggestion will work; however, I find that what you are attempting can be done easily using

     Analyze==>Screening==>Response Screening

It creates a data table where you can sort, select, etc. the results.
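If the selection itself has to be automated, one possible approach is a greedy filter: walk through the variables in order and keep one only if its correlation with every variable already kept stays inside the cutoff. The sketch below is plain Python, not JSL and not something Response Screening does for you. The names `pearson` and `drop_correlated` and the toy data are hypothetical, and the result depends on the order in which the variables are visited.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_correlated(columns, cutoff=0.95):
    """Greedy filter: keep a column only if |r| < cutoff
    against every column that has already been kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: x2 and x4 are perfectly (anti-)correlated with x1.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "x3": [5.0, 3.0, 4.0, 1.0, 2.0],
    "x4": [10.0, 8.0, 6.0, 4.0, 2.0],
}
kept = drop_correlated(data)
print(kept)
```

Using the absolute value covers both ends of the cutoff (above 0.95 and below -0.95) in one test. With thousands of variables the number of comparisons grows roughly with the square of the variable count, so this is a sketch of the logic rather than a tuned implementation.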

Jim