dinosaur4
Level I

JMP PCA Data Help!

I'm new to doing PCAs in JMP. Playing around with it seems fairly simple, but the problems begin when I load my own data. I am doing 3D GM, so each landmark (36 in total) has an X, Y, & Z coordinate. In a simple txt or Excel table, the data for each specimen is stored as 3 columns (an X, Y, & Z column) with 36 rows (a row for each landmark).

 

It would seem that in JMP I have to convert my data so that each specimen is a single row with 108 columns (X1, Y1, Z1, X2, Y2, Z2, etc.); otherwise it would read each row as a specimen with a single X, Y, & Z coordinate. Correct? Secondly, some of my landmarks are missing, and it's not as simple as "Landmark 8" being missing in every specimen. I know the pros, cons, and arguments around estimating missing data, but I don't want to do that. In R I simply used "NA" for missing data, but it seems that JMP doesn't like mixing numbers, text, and symbols in one column. Nor does it seem to like a - in place of missing data, nor an empty field. Any help or suggestions are greatly appreciated!

5 REPLIES
ian_jmp
Level X

Re: JMP PCA Data Help!

Yes, PCA expects each observation or specimen to be a single row. Take a look at the two tables attached, and the saved scripts therein. To get from one format to the other you need to understand 'Tables > Split' and 'Tables > Stack', which are essentially inverse operations (look in the 'Help').
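For readers who prepare their data outside JMP, the long-to-wide conversion described above can be sketched in Python with NumPy. This is only an illustration of the layout, not what 'Tables > Split' does internally; the function names and the random specimen data are hypothetical placeholders.

```python
import numpy as np

def flatten_specimen(landmarks):
    """Turn an (n_landmarks, 3) array into one row: X1, Y1, Z1, X2, Y2, Z2, ..."""
    landmarks = np.asarray(landmarks, dtype=float)
    if landmarks.ndim != 2 or landmarks.shape[1] != 3:
        raise ValueError("expected an (n_landmarks, 3) array")
    # Row-major flattening keeps each landmark's X, Y, Z next to each other.
    return landmarks.reshape(-1)

def column_names(n_landmarks):
    """Build matching column labels: X1, Y1, Z1, X2, ..."""
    return [f"{axis}{i}" for i in range(1, n_landmarks + 1) for axis in "XYZ"]

# Hypothetical data: two specimens with 36 landmarks each.
rng = np.random.default_rng(0)
specimens = [rng.normal(size=(36, 3)) for _ in range(2)]
specimens[0][7, :] = np.nan  # landmark 8 is missing in the first specimen

wide = np.vstack([flatten_specimen(s) for s in specimens])  # one row per specimen
names = column_names(36)
```

Missing landmarks are kept as NaN so that every column stays numeric, which mirrors the requirement that a column hold a single data type.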

 

JMP displays missing numeric values as a '.'. If you double-click on one of the numeric cells and delete its contents, you should see this. But (as you observed), values within any one column must have the same data type and modeling type.

dinosaur4
Level I

Re: JMP PCA Data Help!

Thank you for the help! I've attached a JMP table of my working dataset (now appropriately organized). So if I understand you correctly, I'm pretty much SOL for missing data? When I go to Analyze > Multivariate Methods > Principal Components, there is a box of selectable columns. The columns with missing data have a red bar graph icon to the left of each column label, while the "complete" ones have a blue triangle icon. When I highlight all of the columns and drag them to Y, Columns (which says numeric values are required), it only "deposits" the blue triangle columns. So I presume this is effectively only selecting the complete columns? As a test I started to replace some of the missing entries with values (not copied, but random values the same length as all the others). Oddly, after doing this and re-running, these modified columns still showed as "incomplete". So what am I missing? And I guess my biggest question is, what should I do about my missing data? I didn't want to estimate missing data because I'm comparing minor shape changes between related species, so an estimation would likely muddy the water. Since they're so closely related and morphologically similar, "blending" them would be less than ideal...

gianpaolo
Level IV

Re: JMP PCA Data Help!

Hi Dinosaur

I manipulated your data file a little. I converted all columns to Numeric/Continuous, then imputed the missing values in the table using the JMP feature Explore Missing Values > Multivariate Normal Imputation. After that it was possible to perform the PCA.

I hope this is what you needed (please check the attached TEST TABLE jmp).

 

 

Gianpaolo Polsinelli
dinosaur4
Level I

Re: JMP PCA Data Help!

Awesome! This is exactly what I needed! Thank you so much. Out of curiosity - and I've been reviewing the literature on the debate to no avail - is there a particular reason that you picked Multivariate Normal versus Multivariate SVD? Thank you all again for the amazing help!

gianpaolo
Level IV

Re: JMP PCA Data Help!

I think it basically depends on your dataset:

 

In the JMP Help manual we can read that:
The MNI imputes missing values based on the multivariate normal distribution. The procedure requires that all variables have a Continuous modeling type. The algorithm uses least squares imputation. The covariance matrix is constructed using pairwise covariances. The diagonal entries (variances) are computed using all non-missing values for each variable. The off-diagonal entries for any two variables are computed using all observations that are non-missing for both variables. In cases where the covariance matrix is singular, the algorithm uses minimum norm least squares imputation based on the Moore-Penrose pseudo-inverse.
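To make the quoted description more concrete, here is a minimal Python/NumPy sketch of least squares imputation from a pairwise covariance matrix. It is not JMP's implementation: the function names are hypothetical, the means used for each off-diagonal covariance are taken over the jointly observed rows (an assumption), pairs with fewer than two joint observations are given a covariance of 0 (also an assumption), and no shrinkage is applied.

```python
import numpy as np

def pairwise_cov(X):
    """Covariance matrix built from pairwise-complete observations."""
    p = X.shape[1]
    C = np.zeros((p, p))  # assumption: 0 where a pair has < 2 joint observations
    for i in range(p):
        for j in range(i, p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if ok.sum() > 1:
                xi, xj = X[ok, i], X[ok, j]
                c = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (ok.sum() - 1)
                C[i, j] = C[j, i] = c
    return C

def impute_conditional_mean(X):
    """Fill each missing cell with its least squares prediction from the observed cells."""
    X = np.asarray(X, dtype=float)
    mu = np.nanmean(X, axis=0)  # column means over non-missing values
    C = pairwise_cov(X)
    out = X.copy()
    for r in range(X.shape[0]):
        miss = np.isnan(X[r])
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            out[r] = mu  # nothing observed in this row: fall back to the means
            continue
        C_oo = C[np.ix_(obs, obs)]
        C_mo = C[np.ix_(miss, obs)]
        # The pseudo-inverse gives the minimum norm solution when C_oo is singular.
        out[r, miss] = mu[miss] + C_mo @ np.linalg.pinv(C_oo) @ (X[r, obs] - mu[obs])
    return out

# Tiny hypothetical example: the second column is missing in the last row.
X_demo = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, np.nan]])
filled = impute_conditional_mean(X_demo)
```

Observed values are left untouched; only the missing cells are replaced by their predicted values.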

Multivariate Normal Imputation allows the option to use a shrinkage estimator for the covariances. The use of shrinkage estimators is a way of improving the estimation of the covariance matrix.


The SVD method, by contrast, is useful for data with hundreds or thousands of variables. Because SVD calculations do not require calculation of a covariance matrix, the SVD method is recommended for wide problems that contain large numbers of variables.
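For comparison, a generic iterative low-rank SVD imputation can be sketched in Python with NumPy. This is a textbook-style illustration that avoids forming a covariance matrix, and it should not be read as the algorithm JMP uses; the function name, the `rank` and `n_iter` parameters, and the random data are all hypothetical.

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50):
    """Fill missing cells by repeatedly fitting a low-rank SVD approximation."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    # Start from column means, then refine only the missing cells.
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        filled[miss] = approx[miss]  # observed cells are never overwritten
    return filled

# Hypothetical wide table: few rows, more columns, two missing cells.
rng = np.random.default_rng(1)
X_wide = rng.normal(size=(6, 12))
X_wide[0, 3] = np.nan
X_wide[4, 7] = np.nan
X_filled = svd_impute(X_wide, rank=2)
```

The choice of rank matters: a rank that is too high simply reproduces the starting guess, while a rank that is too low oversmooths the data.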

 

Sometimes I just replace each empty cell (using a JSL script) with the mean of its column.
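The JSL script itself is not shown in the thread, so the following is only a Python/NumPy sketch of the same idea, with a hypothetical function name and example data.

```python
import numpy as np

def impute_column_means(X):
    """Replace each missing cell with the mean of the non-missing values in its column."""
    X = np.asarray(X, dtype=float)
    means = np.nanmean(X, axis=0)  # one mean per column, ignoring NaN
    return np.where(np.isnan(X), means, X)

# Hypothetical example with one missing cell.
table = np.array([[1.0, np.nan], [3.0, 4.0], [5.0, 8.0]])
table_filled = impute_column_means(table)
```

Mean replacement is quick, but it shrinks the variance of the affected columns, which can matter when the PCA is meant to pick up small shape differences.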

 

ciao

Gianpaolo

 

Gianpaolo Polsinelli