
Distance-based linear modeling

Being an ecologist, I routinely switch back and forth between JMP and PRIMER, but with each update of JMP I can do more in JMP (and therefore have less need for PRIMER). However, PRIMER has a multivariate, distance matrix-based algorithm called "distance-based linear modeling" (DistLM) that essentially works like stepwise regression (model selection by minimizing BIC/AICc), though with a distance matrix as the input. It is great for those looking to find the best model to explain similarities and differences between samples in a multivariate framework. JMP already has distance matrix capacity AND stepwise regression, and although I know this is much more complicated than univariate stepwise regression, I think it should be possible to develop this technique in JMP (as well as the more commonly used PERMANOVA). Actually, it doesn't need to be DistLM, just any multivariate incarnation of stepwise regression. Pleassssssseeeeee.
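For readers unfamiliar with the idea, here is a minimal Python sketch of distance-based forward selection. It is not PRIMER's implementation: it assumes the usual Gower-centering approach, in which the distance matrix is converted to a centered matrix G and the variation explained by a set of predictors is tr(HGH), with H the hat matrix. The function names, the synthetic data, and the AICc form (n·log(SS_res/n) plus a small-sample penalty) are illustrative assumptions.

```python
import numpy as np

def gower_center(D):
    """Gower-center a square distance matrix: G = C (-0.5 * D**2) C."""
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return C @ (-0.5 * D ** 2) @ C

def residual_ss(G, X):
    """Residual SS, tr(G) - tr(HGH), for predictor columns X (intercept added here)."""
    n = G.shape[0]
    X1 = np.column_stack([np.ones(n), X]) if X.shape[1] else np.ones((n, 1))
    H = X1 @ np.linalg.pinv(X1)  # projection (hat) matrix
    return np.trace(G) - np.trace(H @ G @ H)

def forward_select(D, X, names):
    """Add one predictor at a time while the (assumed) AICc keeps decreasing."""
    G = gower_center(D)
    n = G.shape[0]

    def aicc(cols):
        k = len(cols) + 1  # parameters, counting the intercept
        return n * np.log(residual_ss(G, X[:, cols]) / n) + 2 * k * n / (n - k - 1)

    chosen, best = [], aicc([])
    while True:
        trials = [(aicc(chosen + [j]), j) for j in range(X.shape[1]) if j not in chosen]
        if not trials:
            break
        score, j = min(trials)
        if score >= best:
            break
        chosen.append(j)
        best = score
    return [names[j] for j in chosen], best

# Synthetic example: four responses driven mostly by the first predictor.
rng = np.random.default_rng(0)
names = ["temperature", "salinity", "depth"]  # hypothetical predictor names
X = rng.normal(size=(40, 3))
Y = np.outer(X[:, 0], [2.0, -1.0, 1.5, 1.0]) + 0.3 * rng.normal(size=(40, 4))
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))  # Euclidean distances
selected, score = forward_select(D, X, names)
print(selected, round(score, 2))
```

With Euclidean distances this reduces to ordinary multivariate regression on the raw responses; the point of the distance-based form is that any other dissimilarity measure can be substituted for `D`.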

Tracking Number:

Defect ID: S1560329

4 Comments
Super User

I just reviewed several videos on using PRIMER's DistLM. It appears the method first runs some type of multivariate analysis on the biological quality metrics (characteristics such as species count or species quality), using either MDS (multidimensional scaling) on attributes or principal components. DistLM is then run on PC1 or PC2, as a stepwise regression or all possible regressions on the potential factors.

 

This is likely doable in JMP but not in a single GUI interface.  I am really impressed with JMP spatial clusters. 

 

I am not downplaying your request for an environmental platform. My goal is to suggest that JMP might provide a solution for you today.

 

If you have a shareable data set, like the one analyzed in this Keith Guiness video, post it, and one of the JMP gurus might take a stab at it.

Thanks for the post; it offered an avenue for learning something new.

Level V

JMP+practice+data.jmp

Hello, 

   Thank you for your interest in my query/request. What is this spatial clustering you speak of? Hierarchical clustering? I do know how to make a distance matrix in JMP using cluster analysis; I just don't know how to uncover the factors that are most important in modeling the relationships/similarities within the matrix. Here is a small dataset I have been working with. There are 9 or so response variables (continuous) and 10-12 environmental parameters (as character data). I would like to see which environmental parameters (or combinations thereof) best model the similarity between samples (using the multivariate response of all 9 response variables). Is this achievable in JMP (i.e., multivariate stepwise regression)? I was told "no" previously, but perhaps I was told in error!

Thanks, 

Anderson

Super User

Anderson,

Thank you for the practice JMP table. An outline of the analysis steps follows:

  1. Ran Multidimensional Scaling on columns max length (cm) through Sym ubiq-lig, specifying the data format as Attributes and asking for 3 dimensions. I am not familiar with this data and have no context, so I specified Transform None. Then Save the Dimensions; this saves the Dimension 1 - Dimension 3 scores. The graph of Dimension 2 vs. Dimension 1 shows some groups. However, I have little experience with MDS and more with Principal Components ...
  2. Ran Principal Components on these same variables. Below is a screenshot of the std. components of variable clustering and the loading matrix for the variables. Prin1 is a metric for the Sym variables except for Sym GCP; Prin2 seems to be related to size, and Prin3 to some genetic characteristics. Save the first 3 components. [Screenshot: image.png]
  3. The character variables seem difficult to work with, and I think you should score them, like temperature or salinity. I just added some Value Ordering; for color, for example, I ordered the levels as normal, pale, very pale, bleached. At this step you could use stepwise for the 3 responses Prin1, Prin2 and Prin3. However, when there are variables like Country and Island, I like to look at characteristics. This looks like available (observational) data. Below are three examples. Not knowing this data, and given that certain characteristics are nested within (specific to) others, like Island or Host, and all effects are nominal, I will not recommend any specific method nor provide an analysis. However, Host, salinity and color seem to be important. Plot the dimensions or the principal components by Host to see the results.

[Figures: reef zone; reef type]

[Figure: salinity vs. host]
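The steps above (save principal component scores, then see how a nominal factor such as Host relates to them) can be sketched outside JMP as follows. This is only an illustration under stated assumptions: the data are synthetic, `host` is a made-up three-level label, and the helper names `pc_scores` and `r2_nominal` are hypothetical. It reports the share of PC1 variation explained by a one-way model on the factor.

```python
import numpy as np

def pc_scores(Y, n_components=3):
    """PCA on correlations: standardize columns, then take SVD-based scores."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

def r2_nominal(y, labels):
    """R-squared of a one-way model: between-group SS over total SS."""
    labels = np.asarray(labels)
    total = np.sum((y - y.mean()) ** 2)
    between = sum(
        np.sum(labels == g) * (y[labels == g].mean() - y.mean()) ** 2
        for g in np.unique(labels)
    )
    return between / total

# Synthetic example: 30 samples, 5 responses shifted by a 3-level host factor.
rng = np.random.default_rng(1)
host = np.repeat(["A", "B", "C"], 10)          # hypothetical host labels
shift = np.repeat([0.0, 5.0, 10.0], 10)[:, None]
Y = shift + rng.normal(size=(30, 5))

scores, explained = pc_scores(Y)
r2_host = r2_nominal(scores[:, 0], host)
print("variance explained by PC1-PC3:", np.round(explained, 3))
print("R-squared of PC1 on host:", round(r2_host, 3))
```

Repeating `r2_nominal` for each candidate factor and each saved component gives a rough ranking of factors before any formal stepwise fit.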

 

I am not a biologist, and I only watched a couple of videos showing examples of PRIMER. This was meant as an FYI.

Level V

Hello, 

   Thanks so much for taking a look at my data. I came to a similar conclusion: I ended up having to throw out several environmental parameters (reef zone and reef type) because they were either so unbalanced or they scaled with island. I ended up just using 1) island, 2) temperature, 3) salinity, 4) ALCC, 5) depth, 6) sampling time (proxy for light), 7) colony color, and 8) host species as environmental factors. Then, I only used one of the two size parameters (max. colony length), since they give virtually the same information. I also concluded that "host" is likely to be the most important factor when using PCA, stepwise regression, and a few other approaches in JMP.

    I really like your idea of using stepwise regression for the principal components, and another JMP technical support staff member recommended this, so I'm going to check that out soon. Interestingly, I used stepwise regression for the Mahalanobis distance since I am interested in those samples that are most different (outliers), and a best-fit model including island, host, and ALCC (coral cover) resulted in the minimum BIC and an adjusted R² of 0.4 or so.
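For reference, the Mahalanobis distance used as the response here can be computed as in the short Python sketch below. It assumes the plain definition (distance of each sample from the multivariate mean, scaled by the sample covariance); JMP's outlier platform may differ in detail. The data are synthetic, with one planted outlier in the first row.

```python
import numpy as np

def mahalanobis_distances(Y):
    """Distance of each row from the column means, scaled by the sample covariance."""
    centered = Y - Y.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(Y, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d2)

# Synthetic example: 25 samples, 4 responses, first sample shifted far from the rest.
rng = np.random.default_rng(2)
Y = rng.normal(size=(25, 4))
Y[0] += 10.0

d = mahalanobis_distances(Y)
print("most unusual sample:", int(np.argmax(d)))
```

The resulting vector `d` is a single continuous column, which is why it can be passed to an ordinary univariate stepwise regression.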

    I do still hope JMP can develop a multivariate, distance-based platform for uncovering environmental variation in the future (in a similar manner to PRIMER), but I think stepwise regression of the PC and/or MDS coordinates will be a good solution in the meantime. Thanks again for your help.