
How to use scripting to interactively change columns for PCA

flo (Community Trekker, joined Jun 21, 2012)

I want to interactively exclude columns from a PCA in order to see what changes. Unfortunately, the "automatic recalc" option seems to work only when I exclude rows, not when I exclude columns. Does anybody know another way to do this? The "column switcher" functions seem able to swap columns in a platform (although they don't do what I want), so in principle it should be possible.

julian (Staff, joined Jun 25, 2014)

Hi Flo,

As you discovered, automatic recalc only works for excluding rows; changes to columns are not taken into account. There is one exception: changing a column's data type to character removes it from the PCA, because a character column is no longer suitable for that analysis.

I'm attaching a script I put together that shows one way to accomplish what you need. With a data table open, run the script and you will get a dialog box where you can add or remove columns.

[Screenshot: the script's dialog for adding and removing PCA columns]

Each time you add or remove a column, the PCA is recalculated and the previous results are minimized (but kept for reference). You can modify the PCA script inside the code to generate the output you normally request.
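For readers who want to prototype the same idea outside JMP, the pattern described above (refit whenever the column set changes, keep the earlier fits for reference) can be sketched in Python with NumPy. This is not Julian's JSL script, which is not reproduced in the thread. `ColumnSwitchingPCA` and `pca_on_columns` are hypothetical names, and the sketch assumes a PCA on correlations (standardized columns).

```python
import numpy as np


def pca_on_columns(data, names, selected):
    """Correlation-based PCA using only the columns named in `selected`.

    Returns (eigenvalues, loadings, scores) with eigenvalues sorted
    from largest to smallest.
    """
    idx = [names.index(n) for n in selected]
    x = data[:, idx]
    n = x.shape[0]
    # Standardize each column (sample std) so z.T @ z / (n - 1) is the
    # correlation matrix of the selected columns.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = z.T @ z / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs  # principal component scores, one column per PC
    return eigvals, eigvecs, scores


class ColumnSwitchingPCA:
    """Hypothetical helper: refit the PCA whenever the column set changes
    and keep every earlier fit instead of discarding it."""

    def __init__(self, data, names):
        self.data = np.asarray(data, dtype=float)
        self.names = list(names)
        self.active = list(names)
        self.history = []  # (active columns, eigenvalues) for each fit
        self._refit()

    def _refit(self):
        eigvals, loadings, scores = pca_on_columns(self.data, self.names, self.active)
        self.history.append((tuple(self.active), eigvals))
        self.loadings, self.scores = loadings, scores

    def exclude(self, name):
        # Refit without `name`, but never drop the last remaining column.
        if name in self.active and len(self.active) > 1:
            self.active.remove(name)
            self._refit()

    def include(self, name):
        if name in self.names and name not in self.active:
            self.active.append(name)
            self.active.sort(key=self.names.index)  # keep original column order
            self._refit()


rng = np.random.default_rng(0)
pca = ColumnSwitchingPCA(rng.normal(size=(50, 3)), ["a", "b", "c"])
pca.exclude("b")  # refit on a, c
pca.include("b")  # refit on a, b, c again
print(len(pca.history))  # 3 fits kept
```

Each call to `exclude` or `include` appends an entry to `history`, which plays the role of the minimized earlier reports.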

I hope this helps!

Julian

flo (Community Trekker, joined Jun 21, 2012)

Hi Julian,

Thanks for the fast reply! That's a nice script. Unfortunately, I am aiming for more interactivity. I plan to use several sliders to exclude or include subsets of the columns according to different criteria, and to see in real time what changes in the first 2 or 3 PCs via the Score Plots (separating groups, finding clusters, ...).

I don't really understand why the platforms ignore the "exclude" setting for columns. It seems straightforward to me, especially when you work with high-dimensional data sets. Alternatively, there could be an "insert column" command, at least at the scripting level. But I could not find any way to get a direct handle on that. On the other hand, it must be implemented somehow, otherwise the "Column Switcher" would not work.

Anyway, so far I have found two workarounds to get things working. I'll mention them here in case anybody wants to try something similar.

  1. Use the "pairwise" estimation method for the PCA. In that case you can work with a second copy of the table in the background: fill the columns you want to exclude with blanks (or restore their real values to include them again), and force a recalc by excluding and re-including a row. This works fine as long as at least the first 3 PCs have eigenvalues > 1.
  2. Build your own report with its own data table, run the necessary PCAs in the background, and copy the new results into the correct columns of your data table. The drawback is that you lose the nice graphics provided by the PCA platform.
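A small NumPy sketch may show why blanking columns under pairwise estimation can behave like a true column exclusion. With pairwise estimation, a column that is entirely blank contributes no usable pairs. If such a column is dropped, the remaining correlation matrix, and therefore its eigenvalues, matches the one from a table without that column. Dropping all-blank columns is an assumption made here for illustration, not confirmed JMP behaviour. `pairwise_corr`, `blank_columns` and `eigenvalues` are hypothetical helpers.

```python
import numpy as np


def pairwise_corr(x):
    """Correlation matrix using pairwise-complete rows (NaN = missing).

    Columns with fewer than two non-missing values (e.g. fully blanked
    columns) are dropped; the indices of the kept columns are returned too.
    """
    x = np.asarray(x, dtype=float)
    keep = [j for j in range(x.shape[1]) if np.isfinite(x[:, j]).sum() > 1]
    k = len(keep)
    corr = np.eye(k)
    for a in range(k):
        for b in range(a + 1, k):
            u, v = x[:, keep[a]], x[:, keep[b]]
            ok = np.isfinite(u) & np.isfinite(v)  # rows where both are present
            r = np.corrcoef(u[ok], v[ok])[0, 1]
            corr[a, b] = corr[b, a] = r
    return corr, keep


def blank_columns(x, excluded):
    """Copy of x with the excluded column indices set to missing,
    mimicking the 'fill with blanks' step on the background table."""
    y = np.array(x, dtype=float, copy=True)
    y[:, list(excluded)] = np.nan
    return y


def eigenvalues(corr):
    # Sorted largest first, as a PCA report would list them.
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
corr_blanked, kept = pairwise_corr(blank_columns(X, [2]))  # "exclude" column 2
corr_subset, _ = pairwise_corr(X[:, [0, 1, 3]])            # column 2 removed
print(kept)  # [0, 1, 3]
print(np.allclose(eigenvalues(corr_blanked), eigenvalues(corr_subset)))  # True
```

The two results agree only because the other columns have no missing values of their own. With scattered blanks elsewhere, pairwise estimates can differ from a fit that uses only complete rows.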