mkrb
Level I

I have a question about multivariate analysis

I believe I set the data table up incorrectly. What I would like to do is run a multivariate analysis on this data. I would like to determine the main effects of bacterial strain (EC vs. SE) and type of surface (notes vs. control) over time. That is, I would like to see whether there is a significant difference between the strains, whether the numbers differ significantly between the control and the notes, and whether the numbers decline over time. Any advice would be greatly appreciated. Thanks much, Mark

1 ACCEPTED SOLUTION

Accepted Solutions
julian
Community Manager

Re: I have a question about multivariate analysis

Hi mkrb,

 

As Lou suggested, a stacked format of these data will be more amenable to analysis. It seems as though a mixed effects linear model would be appropriate in this situation. If you're using JMP Pro 11 you can use the new mixed model personality in Fit Model, or if you're not using Pro or are using an earlier version of JMP, you can still do the mixed model through standard least squares by marking effects as random. Here are the steps I went through to generate that analysis:

 

1. To stack the data, I used Tables > Stack. I then added the four response columns to the "Stack Columns" section, and renamed "Data" to "Y" and "Label" to "Condition".
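
For readers who want to see what stacking does to the table, here is a minimal sketch in plain Python (not JSL, and not what JMP runs internally). The column names ("Replicate", "Time", "EC notes", and so on) and the numbers are placeholders I made up for illustration; substitute whatever your table actually contains.

```python
# Hypothetical wide table: one row per replicate and time point, with four
# response columns. All names and values here are made-up placeholders.
wide_rows = [
    {"Replicate": 1, "Time": 0, "EC notes": 5.1, "EC control": 5.3,
     "SE notes": 4.8, "SE control": 5.0},
    {"Replicate": 1, "Time": 24, "EC notes": 3.2, "EC control": 4.9,
     "SE notes": 2.9, "SE control": 4.7},
]

RESPONSE_COLUMNS = ["EC notes", "EC control", "SE notes", "SE control"]


def stack(rows, stack_columns, data_name="Y", label_name="Condition"):
    """Turn each wide row into one long row per stacked column."""
    stacked = []
    for row in rows:
        # Columns that are not stacked are carried along as identifiers.
        id_values = {k: v for k, v in row.items() if k not in stack_columns}
        for column in stack_columns:
            stacked.append(
                {**id_values, label_name: column, data_name: row[column]}
            )
    return stacked


long_rows = stack(wide_rows, RESPONSE_COLUMNS)
for r in long_rows[:4]:
    print(r)
```

Each wide row becomes four long rows, one per response column, with the old column name stored in "Condition" and the measurement in "Y".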

 


7278_Screen Shot 2014-09-12 at 7.31.22 PM.png

 

2. We need to separate out the levels of your two factors (EC vs. SE and notes vs. control) if we wish to analyze these factorially. We can accomplish this by using Cols > Recode twice to create two new variables. First select the "Condition" column in the dataset and then select Cols > Recode. Here are the screenshots of the recoding I did. Be sure to select "New Column" rather than "In Place" (otherwise the original values will be overwritten). This creates two new columns, each holding the levels of one factor independently. This is important when we define the mixed model later.

 

7220_Screen Shot 2014-09-07 at 12.57.50 AM.png
7274_Screen Shot 2014-09-12 at 6.39.48 PM.png

 

3. Rename the two new columns to have reasonable names. I'm using bacterial strain and surface.
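
To make steps 2 and 3 concrete, here is a rough sketch of the same recoding in plain Python. The "Condition" labels and the two mapping tables are assumptions on my part (I'm guessing at how the four stacked labels read); the point is only that each label maps to one level of each factor and that the original column is left untouched, as with "New Column" in the Recode dialog.

```python
# Assumed recoding tables: each "Condition" label maps to one level per factor.
STRAIN_RECODE = {
    "EC notes": "EC", "EC control": "EC",
    "SE notes": "SE", "SE control": "SE",
}
SURFACE_RECODE = {
    "EC notes": "notes", "EC control": "control",
    "SE notes": "notes", "SE control": "control",
}


def recode_to_new_column(rows, source, mapping, new_name):
    """Return copies of the rows with new_name added; source stays as it was."""
    return [{**row, new_name: mapping[row[source]]} for row in rows]


# Two placeholder rows standing in for the stacked table.
long_rows = [
    {"Condition": "EC notes", "Y": 5.1},
    {"Condition": "SE control", "Y": 5.0},
]

recoded = recode_to_new_column(long_rows, "Condition", STRAIN_RECODE, "bacterial strain")
recoded = recode_to_new_column(recoded, "Condition", SURFACE_RECODE, "surface")
for r in recoded:
    print(r)
```

Because the function builds new rows instead of editing in place, the "Condition" values survive alongside the two new factor columns.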

 

4. We can now define the mixed model, which will take into account (i.e., explicitly model) the statistical dependence among the data points (introduced by taking repeated observations on the same experimental unit). We'll be using Analyze > Fit Model. From this point forward there are a number of different models you can fit, each with its own assumptions about the covariance structure of the data. I like to start with a maximal model with random slopes and intercepts for the experimental units across the fixed structure. Simulation studies have shown these models to generalize better (e.g. http://www.sciencedirect.com/science/article/pii/S0749596X12001180) and I find the meaning of the denominator error in each test to be more interpretable (e.g. the degree to which the magnitude of some effect or interaction differs across the experimental units). Here are the general steps to produce such a model when your entire experimental structure is replicated for each experimental unit (which is what you have here: each replicate is measured at all experimental levels). This heuristic is not appropriate when you have units nested inside other factors.

 

    a. Full factorial (macro) of only your fixed factors (or factorial sorted, which will produce an ordered list)

    b. Full factorial (macro) of your fixed factors AND experimental unit (JMP will not duplicate terms, so this produces only the sources for the experimental units and crosses with the experimental units). Ensure that the column identifying the individual units is modeled as Nominal.

7277_Screen Shot 2014-09-12 at 7.28.43 PM.png


    c. Mark all effects associated with your experimental unit as Random Effect

    d. Remove terms for which you do not have degrees of freedom to fit*

          *This is actually optional. JMP is intelligent enough to know when an effect is confounded with residual error and will remove those terms from the model.
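
If it helps to see which terms steps a through c produce, here is a small Python sketch that enumerates them. It only lists effect names; it does not fit anything, and it is not how JMP builds the model internally. The factor names and "Replicate" as the experimental-unit column are my assumptions, as is treating Time as one of the fixed factors.

```python
from itertools import combinations


def full_factorial(factors):
    """All main effects and interactions, ordered by degree."""
    return [
        combo
        for degree in range(1, len(factors) + 1)
        for combo in combinations(factors, degree)
    ]


FIXED = ["bacterial strain", "surface", "Time"]  # assumed fixed factors
UNIT = "Replicate"                               # hypothetical unit column

# Step a: full factorial of the fixed factors only.
fixed_terms = full_factorial(FIXED)

# Step b: full factorial of fixed factors AND the unit, skipping terms that
# are already in the model, so only terms involving the unit are left.
random_terms = [t for t in full_factorial(FIXED + [UNIT]) if t not in fixed_terms]

# Step c: every remaining term involves the unit and is marked random.
for term in fixed_terms:
    print("*".join(term))
for term in random_terms:
    print("*".join(term) + " & Random")
```

With three fixed factors this gives 7 fixed terms and 8 random ones. The highest-order random term (all fixed factors crossed with the unit) is the usual candidate for step d, since with one observation per cell it cannot be separated from residual error.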

 

The above is simpler than it looks when written out. Here's a quick video of me defining the model with your data:

 

 

5. Click Run and you'll see the results of the analysis. The Fixed Effects section is probably of most interest.

 

I like to turn on the Profiler from the top Red Triangle menu > Factor Profiling > Profiler. I also like to turn on the LSMeans plots for each factor: expand the Effect Details section, then use the Red Triangle for each source to turn on the plots and request further tests. To turn on all plots at once, hold down the Control key before clicking the Red Triangle to turn on a plot. Once you have requested the plot you can let go; this tells JMP that you want to broadcast the command. On a Mac, use the Command key instead.

 

I've attached the restructured dataset with scripts saved for the analysis.

 

I hope this helps!

 

Julian


3 REPLIES
louv
Staff (Retired)

Re: I have a question about multivariate analysis

Maybe your data table needs to be stacked? Tables > Stack

rishikeshgg
Level I

Re: I have a question about multivariate analysis

Comparing two time series is difficult (I personally don't know how to do it in JMP). Please check the attached file, which has scripts saved in it. Is this what you wanted to do?
