JMP PCA Data Help!

Jan 16, 2019 03:39 PM

I'm new to doing PCAs in JMP. Playing around with it, it seems fairly simple. However, the problems begin when I load my own data. I am doing 3D GM (geometric morphometrics), so each landmark (36 in total) has an X, Y, and Z coordinate. In a simple txt or Excel table, the data for each specimen are laid out as 3 columns (X, Y, and Z) with 36 rows (one row per landmark).

It would seem that in JMP I have to convert my data so that each specimen is a single row with 108 columns (X1, Y1, Z1, X2, Y2, Z2, etc.); otherwise it would read each row as a specimen with a single X, Y, and Z coordinate. Correct? Secondly, some of my landmarks are missing, and it's not as simple as "Landmark 8" being missing in every specimen. I know the pros and cons of estimating missing data, but I don't want to do that. In R I simply used "NA" for missing data, but it seems that JMP doesn't like mixing numbers and text in the same column. Nor does it seem to like a "-" in place of missing data, or an empty field. Any help or suggestions are greatly appreciated!

ian_jmp · Jan 17, 2019 9:20 AM

Yes, PCA expects each observation or specimen to be a single row. Take a look at the two tables attached, and the saved scripts therein. To get from one format to the other you need to understand 'Tables > Split' and 'Tables > Stack', which are essentially inverse operations (look in the 'Help').
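For anyone preparing the file outside JMP, the same long-to-wide reshaping can be sketched in Python with NumPy. This is only an illustration of the layout described above: the function names `specimen_to_row` and `column_names` are made up for this example, and the coordinates are random numbers, not real landmark data.

```python
import numpy as np

def specimen_to_row(landmarks):
    """Flatten an (n_landmarks, 3) table into one row: X1, Y1, Z1, X2, Y2, Z2, ..."""
    landmarks = np.asarray(landmarks, dtype=float)
    if landmarks.ndim != 2 or landmarks.shape[1] != 3:
        raise ValueError("expected one row per landmark with X, Y, Z columns")
    # Row-major flattening keeps each landmark's X, Y, Z next to each other.
    return landmarks.reshape(-1)

def column_names(n_landmarks):
    """Column labels matching the flattened order."""
    return [f"{axis}{i}" for i in range(1, n_landmarks + 1) for axis in "XYZ"]

# Two made-up specimens with 36 landmarks each (random values for illustration).
rng = np.random.default_rng(0)
specimens = [rng.normal(size=(36, 3)) for _ in range(2)]

wide = np.vstack([specimen_to_row(s) for s in specimens])
names = column_names(36)

print(wide.shape)  # (2, 108): one row per specimen, 108 coordinate columns
print(names[:6])
```

The resulting table has one row per specimen, which is the shape a PCA platform expects.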

JMP displays missing numeric values as a '.'. If you double-click on one of the numeric cells and delete its contents, you should see this. But (as you observed), values within any one column must have the same data type and modeling type.
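One way to avoid mixed-type columns is to clean the text file before importing it, so that every cell is either a number or a true missing value. A minimal Python sketch follows; the set of markers in `MISSING_MARKERS` and the helper name `to_numeric` are assumptions for this example, so adjust them to whatever your files actually contain.

```python
import math

# Assumed placeholders for "no value"; edit to match your own files.
MISSING_MARKERS = {"", "NA", "-", "."}

def to_numeric(cell):
    """Return the cell as a float, or NaN if it holds a missing-value marker."""
    text = str(cell).strip()
    if text in MISSING_MARKERS:
        return math.nan
    return float(text)

# Made-up column mixing numbers with the markers mentioned in this thread.
raw_column = ["12.5", "NA", "-3.25", "-", "", "7"]
clean = [to_numeric(c) for c in raw_column]
print(clean)
```

Note that a lone "-" is treated as missing while "-3.25" is still parsed as a negative number.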

dinosaur4 · Jan 21, 2019 08:33 AM

Thank you for the help! I've attached a JMP table of my working dataset (now appropriately organized). So if I understand you correctly, I'm pretty much SOL for missing data?

When I go to Analyze > Multivariate Methods > Principal Components, there is a box of selectable columns. The columns with missing data have a red bar graph icon to the left of each column label, while the "complete" ones have a blue triangle icon. When I highlight all of the columns and drag them to Y, Columns (which says numeric values are required), it only "deposits" the blue triangle columns. So I presume this is effectively only selecting the complete columns?

As a test I started to replace some of the missing entries with values (not copied, but random values the same length as all the others). Oddly, after doing this and re-running, these modified columns still showed as "incomplete". So what am I missing?

And I guess my biggest question: what should I do about my missing data? I didn't want to estimate missing data because I'm comparing minor shape changes between related species, so an estimation would likely muddy the water. Since they're so closely related and morphologically similar, "blending" them would be less than ideal...

gianpaolo · Jan 22, 2019 03:07 AM

Hi Dinosaur

I just manipulated your data file a little bit. I changed all the columns to Numeric/Continuous, then I imputed the missing values in the table using the JMP feature Explore Missing Values > Multivariate Normal Imputation. After that it was possible to perform the PCA.

I hope this is what you needed (please check the attached TEST TABLE jmp).

Gianpaolo Polsinelli

dinosaur4 · Jan 24, 2019 10:22 AM

Awesome! This is exactly what I needed! Thank you so much. Out of curiosity - and I've been reviewing the literature on the debate to no avail - is there a particular reason that you picked Multivariate Normal versus Multivariate SVD? Thank you all again for the amazing help!

gianpaolo · Jan 24, 2019 10:39 AM

I think it basically depends on your dataset.

From the JMP Help manual:
The MNI imputes missing values based on the multivariate normal distribution. The procedure requires that all variables have a Continuous modeling type. The algorithm uses least squares imputation. The covariance matrix is constructed using pairwise covariances. The diagonal entries (variances) are computed using all non-missing values for each variable. The off-diagonal entries for any two variables are computed using all observations that are non-missing for both variables. In cases where the covariance matrix is singular, the algorithm uses minimum norm least squares imputation based on the Moore-Penrose pseudo-inverse.
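As a rough illustration of that description (not JMP's actual code), the idea can be sketched in Python with NumPy: build a covariance matrix from pairwise-complete observations, then predict each row's missing entries from its observed entries by least squares, using the Moore-Penrose pseudo-inverse so a singular matrix does not break the step. The function names, the use of pairwise means for the off-diagonal terms, and the fallback of 0 covariance when two columns share fewer than two rows are all assumptions of this sketch.

```python
import numpy as np

def pairwise_cov(X):
    """Means and covariance matrix from pairwise-complete observations."""
    n_cols = X.shape[1]
    mu = np.nanmean(X, axis=0)
    cov = np.zeros((n_cols, n_cols))
    for i in range(n_cols):
        for j in range(i, n_cols):
            both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if both.sum() > 1:
                xi, xj = X[both, i], X[both, j]
                c = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (both.sum() - 1)
                cov[i, j] = cov[j, i] = c
    return mu, cov

def mvn_impute(X):
    """Fill NaNs with the conditional mean given each row's observed values."""
    X = np.asarray(X, dtype=float)
    mu, cov = pairwise_cov(X)
    out = X.copy()
    for r in range(X.shape[0]):
        miss = np.isnan(X[r])
        if not miss.any():
            continue
        obs = ~miss
        if not obs.any():
            out[r] = mu  # nothing observed: fall back to the column means
            continue
        s_oo = cov[np.ix_(obs, obs)]
        s_mo = cov[np.ix_(miss, obs)]
        # pinv gives the minimum-norm least squares solution if s_oo is singular.
        out[r, miss] = mu[miss] + s_mo @ np.linalg.pinv(s_oo) @ (X[r, obs] - mu[obs])
    return out

# Made-up data: 5 specimens, 3 variables, two missing cells.
X_demo = np.array([[1.0, 3.0, 2.0],
                   [2.0, 5.0, 1.0],
                   [3.0, np.nan, 4.0],
                   [4.0, 9.0, 3.0],
                   [5.0, 11.0, np.nan]])
filled = mvn_impute(X_demo)
print(filled)
```

Observed cells are left untouched; only the NaN cells are replaced.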

Multivariate Normal Imputation allows the option to use a shrinkage estimator for the covariances. The use of shrinkage estimators is a way of improving the estimation of the covariance matrix.
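To give a feel for what shrinkage does, here is a small Python sketch that pulls the off-diagonal covariances toward zero by a fixed amount. This is not the estimator JMP uses (the manual excerpt above does not say how the shrinkage amount is chosen); the function name `shrink_toward_diagonal` and the hand-picked `intensity` value are assumptions for illustration only.

```python
import numpy as np

def shrink_toward_diagonal(cov, intensity):
    """Blend a covariance matrix with its own diagonal.

    intensity = 0 returns cov unchanged; intensity = 1 keeps only the variances.
    """
    cov = np.asarray(cov, dtype=float)
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    target = np.diag(np.diag(cov))  # variances kept, covariances set to 0
    return (1.0 - intensity) * cov + intensity * target

# A singular covariance matrix (two perfectly correlated variables).
singular = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
shrunk = shrink_toward_diagonal(singular, 0.2)

print(shrunk)
print(np.linalg.det(shrunk))  # positive, so the shrunk matrix is invertible
```

The variances on the diagonal are unchanged; only the covariances are damped, which makes the matrix better conditioned.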

SVD imputation, on the other hand, is useful for data with hundreds or thousands of variables. Because SVD calculations do not require calculation of a covariance matrix, the SVD method is recommended for wide problems that contain large numbers of variables.
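A generic version of SVD-based imputation can be sketched in Python as follows. It is a common iterative low-rank scheme, not necessarily the algorithm JMP implements: start from column means, fit a low-rank SVD approximation, overwrite only the missing cells with the approximation, and repeat. The function name `svd_impute` and the `rank`, `n_iter`, and `tol` parameters are assumptions of this sketch.

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50, tol=1e-6):
    """Iteratively fill NaNs using a rank-limited SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    # Start by filling each missing cell with its column mean.
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        center = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - center, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + center
        # Observed cells always keep their original values.
        new = np.where(miss, approx, X)
        change = np.max(np.abs(new - filled))
        filled = new
        if change < tol:
            break
    return filled

# Made-up data: 6 specimens, 4 variables, two missing cells.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(6, 4))
X_demo[0, 1] = np.nan
X_demo[4, 3] = np.nan

filled = svd_impute(X_demo, rank=2)
print(filled)
```

No covariance matrix is formed at any point, which is why this style of method scales to very wide tables.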

Sometimes I just replace empty cells with the mean of the column (using a JSL script).
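The same idea looks like this in Python with NumPy (a sketch of column-mean replacement, not the JSL script itself; `mean_impute` is a name chosen for this example and the data are made up):

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the non-missing values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

data = np.array([[1.0, 10.0],
                 [np.nan, 20.0],
                 [3.0, np.nan]])
print(mean_impute(data))
# [[ 1. 10.]
#  [ 2. 20.]
#  [ 3. 15.]]
```

Keep in mind that mean replacement reduces the variance of the filled columns, which can matter for a PCA.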

ciao

Gianpaolo

Gianpaolo Polsinelli
