Re: multivariate data with a repeated measures design - Page 2

abmayfield · Jun 8, 2023 09:01 PM

Apologies for farming out what is more of a statistics than a JMP question to the community, but here goes: I was recently asked if JMP Pro could analyze a design in which 3,000 analytes are measured in the same individuals over time (one group perhaps receiving a medicine and the other having been given a placebo). The large number of analytes is not the problem; the problem is that this is a repeated-measures design and, to my knowledge, the multivariate options under Fit Model cannot handle repeated measures (it would be no problem if there were a single Y). If I put "time" into the model under MANOVA or partial least squares, that doesn't accommodate the repeated-measures nature of the data. Am I correct in assuming that the optimal statistical test I want cannot be performed in JMP Pro? Maybe instead I could look at "% change in concentration of each analyte" for each individual, thereby removing time from the model, but I am open to other options!

Anderson B. Mayfield

abmayfield · May 19, 2023 12:27 PM

I think Chris' idea is a good one because, really, what one would want to know is whether the drug (or whatever treatment):

A) elicited no effect over time for the average individual

B) elicited a sustained effect (whether it be a decrease or increase)

C) elicited a hyperbolic effect (caused analyte concentration to go down and then return to baseline or go up and return to baseline)

So by doing it molecule-by-molecule in the mixed model, I can accommodate the repeated-measures nature, and, although the resulting output will be large, it shouldn't be difficult to do some table sorting and then subsetting by the various effects, e.g., identify those proteins with no effect, those with an increase over time, and those with a "rebound" (hyperbolic) response.
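As a rough illustration of the sorting and subsetting described above, here is a minimal Python sketch (not JSL) that labels each analyte's trajectory as one of the three outcomes. The function name `classify_response`, the tolerance `tol`, and the example values are all hypothetical; in practice the labels would come from the mixed-model effect tests rather than from a fixed threshold.

```python
def classify_response(changes, tol=0.1):
    """Label a trajectory given its changes from baseline.

    changes: mean change from baseline at each post-baseline time point.
    tol: hypothetical threshold below which a change counts as "no change".
    """
    peak = max(changes, key=abs)  # largest excursion from baseline
    final = changes[-1]           # change at the last time point
    if abs(peak) <= tol:
        return "no effect"
    if abs(final) <= tol:
        return "rebound"          # moved away, then returned to baseline
    return "sustained increase" if final > 0 else "sustained decrease"


# Made-up example profiles, one per analyte
profiles = {
    "protein_A": [0.02, -0.05, 0.01],
    "protein_B": [0.8, 1.1, 1.0],
    "protein_C": [1.2, 0.6, 0.05],
    "protein_D": [-0.5, -0.9, -1.0],
}
labels = {name: classify_response(vals) for name, vals in profiles.items()}
```

The resulting `labels` dictionary can then be grouped by label to pull out the analytes in each category.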

Thanks everyone for the team effort, and I'll report back later as to how this worked (I'm asking on behalf of someone else, or else I could try it immediately).

Anderson B. Mayfield

eclaassen · May 18, 2023 03:44 PM

Hi, Anderson,

When you say "3,000 analytes", that means you'd have 3000 Y variables you/they're interested in analyzing simultaneously due to the possible correlations between them (as they're measured on the same subject), correct?

Mixed Model certainly has the capability of fitting a repeated measures structure (e.g., AR(1), ANTE) to model the correlations across timepoints for a single Y (https://www.jmp.com/support/help/en/17.1/#page/jmp/example-of-repeated-measures.shtml#ww1279888). It also has the capability of fitting multiple Ys that are correlated responses (https://www.jmp.com/support/help/en/17.1/#page/jmp/example-of-a-correlated-response.shtml#). But there is no way to specify multiple covariance structures in JMP to combine the two.

It might be possible to fit such a mixed model in SAS, but I think it likely would be asking too much of the data to model the multiple sources of correlation in this way (and hope to have any power to detect the treatment differences you're interested in!). You'd have 4,501,500 variances & covariances between the 3000 Ys plus any repeated measures parameters to estimate before adding in the treatment parameter(s)!
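For reference, the count quoted above follows from an unstructured covariance matrix among p responses, which has p variances plus p(p-1)/2 distinct covariances. With p = 3000 (the symbol p is introduced here only for illustration):

```latex
\underbrace{p}_{\text{variances}} + \underbrace{\frac{p(p-1)}{2}}_{\text{covariances}}
  = \frac{p(p+1)}{2}
  = \frac{3000 \times 3001}{2}
  = 4{,}501{,}500
```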

Another possible platform might be SEM, as it is very flexible in specifying correlation structures. @LauraCS might be able to speak to that more effectively than I can. I do know that it can run into the same issues of not enough data that mixed models do.

I, personally, have never heard of including a repeated measures structure with a PLS model, but I'm not as familiar with PLS, generally. MANOVA is mathematically equivalent to the Mixed Model correlated responses, so we're kind of in the same place there.

Simplifying the responses to be the delta from beginning to end would likely get to a similar decision with a much simpler model to try to explain later! Assuming the response didn't change in the middle and then return to baseline at the end, of course. I do like the FDE idea, as well, if they're continuous responses, though I don't think that will capture the responses' possible correlation.

-Elizabeth

LauraCS · May 18, 2023 04:58 PM

Hi Anderson,

I have the same question that @eclaassen brought up regarding the precise structure of your data. However, it does sound like Mixed Models or SEM is the natural fit for this analysis (indeed, PLS + mixed model w/ repeated structure is somewhat like SEM).

Here's a discovery presentation that explains how to model trajectories in SEM:

https://community.jmp.com/t5/Discovery-Summit-Americas-2021/Modeling-Trajectories-over-Time-with-Str...

Helpful times in the video:

1 min 33 sec -- Why SEM can help with repeated measures analysis

3 min 40 sec -- Requirements for using SEM with longitudinal data (but note that version 18 will have robust inference for non-normal continuous data)

11 min 23 sec -- Example of using SEM with repeated measures data

As @ih pointed out, your data will need to be in "wide format" with one individual per row and one column per repeated measure. SEM allows one to compare a model where all individuals have flat trajectories (Intercept-only Latent Growth Curve), to one where they have linear, quadratic, or other nonlinear (latent basis) growth.
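To make the "wide format" requirement concrete, here is a minimal Python sketch (not JSL) that reshapes long-format records into one row per individual with one column per time point. The function name `long_to_wide` and the example values are hypothetical; inside JMP itself, the Tables > Split command should achieve the same reshaping.

```python
from collections import defaultdict


def long_to_wide(records, times):
    """Reshape (subject, time, value) records into one row per subject.

    Each output row is [subject, value at times[0], value at times[1], ...];
    a missing measurement is left as None.
    """
    by_subject = defaultdict(dict)
    for subject, time, value in records:
        by_subject[subject][time] = value
    return [
        [subject] + [by_subject[subject].get(t) for t in times]
        for subject in sorted(by_subject)
    ]


# Made-up long-format data: (subject, time, measurement)
long_rows = [
    (1, 0, 5.0), (1, 1, 5.4), (1, 2, 6.1),
    (2, 0, 4.8), (2, 1, 4.9), (2, 2, 5.0),
]
wide_rows = long_to_wide(long_rows, times=[0, 1, 2])
# wide_rows has one row per subject: [subject, y_t0, y_t1, y_t2]
```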

Because you have two groups (placebo and medication), you can compare the trajectories across the groups by using multiple-group analysis in SEM. It's pretty common to use SEM in clinical trials for this sort of thing (here's one example from a quick Google search). Our documentation has an example comparing male and female students' trajectories over time (with 4 repeated measures):

https://www.jmp.com/support/help/en/17.1/#page/jmp/example-of-multiple-group-analysis.shtml#ww676033

HTH,

~Laura

Laura C-S

abmayfield · May 18, 2023 07:30 PM

Wow! Thanks so much everyone. This is indeed what is referred to as a longitudinal study. I started looking at data, though, and it seems like the sample size needs to be 4- or 5-fold higher than the number of Y's, or else you fail the "sample size test." In these 'Omics studies, there will almost always be many more Y's (molecules) than subjects. For instance, I have one dataset with 71 proteins measured in each of 16 individuals (8/treatment x 2 treatments). This design would "fail" the sample size rule. Does this mean you could only use SEM when there are more experimental subjects than analytes? If so, then it may not be the best platform for what I'm seeking to do (going molecule-by-molecule for 3,000 proteins would take too long!).

Anderson B. Mayfield

LauraCS · May 19, 2023 09:27 AM

Ah! Yes, that's an important caveat... SEM won't work if there are more columns than rows in your data. The sample size required in SEM depends on the number of parameters your model estimates; you want more rows than parameters.

Mixed models are more forgiving than SEM when it comes to sample size, but with 16 rows and 71 columns, I don't think that's the solution either. Based on this, I think FDE will provide what you need. Checking out the links that @Chris_Kirchberg shared for FDE-DOE should help.

Best,

~Laura

Laura C-S

SamGardner · May 22, 2023 12:37 PM

Another approach would be to use a random intercept / random slopes model. If the profiles of the subject results vs. time can be described by a linear regression model, then the Mixed Model platform can fit a model with random slopes and/or random intercepts for each subject. This script creates an example table with saved scripts to run this type of analysis.

Names Default To Here( 1 );

dt = New Table( "Example" );

dt << New Column( "Y" );
dt << New Column( "subject", Nominal, Numeric );
dt << New Column( "time" );

ys = [];
t = [];
subjects = [];

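// Measurement times shared by every subject (column vector)
times = [0, 1, 2, 3, 6, 9, 12];
ntimes = N Rows( times );

// Simulate 1000 subjects, each with its own intercept and slope
For( subject = 1, subject <= 1000, subject++, 

	mu = Random Normal( 0, 1 );	// subject-specific intercept
	slope = Random Normal( 1, .5 );	// subject-specific slope

	// intercept + slope * time + residual noise
	y_subject = J( ntimes, 1, mu ) + slope * times + J( ntimes, 1, Random Normal( 0, 0.5 ) );

	// Stack this subject's rows onto the long-format vectors
	ys = ys |/ y_subject;
	t = t |/ times;
	subjects = subjects |/ J( ntimes, 1, subject );
);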

dt:y << set values( ys );
dt:time << set values( t );
dt:subject << set values( subjects );

dt << Add Properties to Table(
	{New Script(
		"Y vs. time",
		Graph Builder(
			Size( 528, 454 ),
			Show Control Panel( 0 ),
			Variables( X( :time ), Y( :Y ), Overlay( :subject ) ),
			Elements( Points( X, Y, Legend( 13 ) ), Line Of Fit( X, Y, Legend( 15 ) ) )
		)
	)}
);

dt << Add Properties to Table(
	{New Script(
		"Fit Mixed",
		Fit Model(
			Y( :Y ),
			Effects,
			Random Effects( Intercept[:subject], :time[:subject] & Random Coefficients( 1 ) ),
			NoBounds( 1 ),
			Personality( "Mixed Model" ),
			Run(
				Repeated Effects Covariance Parameter Estimates( 0 ),
				Residual Plots( 1 ),
				Conditional Residual Plots( 1 ),
				Covariance of Covariance Parameters( 1 ),
				Conditional Profiler(
					1,
					Confidence Intervals( 1 ),
					Term Value(
						"Conditional",
						subject( 9, Lock( 0 ), Show( 1 ) ),
						time( 4.714, Lock( 0 ), Show( 1 ) )
					)
				),
				Linear Combination of Variance Components( [1 0 1], Label( "asdfsad" ) )
			),
			SendToReport(
				Dispatch( {}, "Random Coefficients", OutlineBox, {Close( 0 )} ),
				Dispatch(
					{"Linear Combination of Variance Components"},
					" ",
					TextEditBox,
					{Set Text( "asdfsad" )}
				)
			)
		)
	)}
);
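For readers who prefer the model written out, the simulated data in the script above follow the textbook random-intercept / random-slope form below. The symbols are introduced here for illustration and do not appear in the script; the standard deviations are read off the `Random Normal` calls (mean, standard deviation) in the data-generation loop.

```latex
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{j} + \varepsilon_{ij},
\qquad
b_{0i} \sim N(0,\ 1^2),\quad
b_{1i} \sim N(0,\ 0.5^2),\quad
\varepsilon_{ij} \sim N(0,\ 0.5^2)
```

Here i indexes subjects, j indexes the measurement times, and the simulation uses \beta_0 = 0 and \beta_1 = 1. This is a sketch of the data-generating process only; the fitted model additionally estimates a covariance between the random intercept and slope, which the simulation sets to zero.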

abmayfield · May 31, 2023 04:11 PM

Thanks so much. I will certainly try this, too.

FYI (to anyone reading this, especially those who responded), I have now tried @Chris_Kirchberg 's suggestion: I put ~35,000 proteins as Y's in the Mixed Model platform, and then set it up as a repeated-measures ANOVA: treatment, time (day), and treatment x day, with a repeated subject defined (as a unique sample ID) and the repeated event being "day" (unequal variances personality). On my MacBook Pro with 64 GB of RAM (JMP Pro 18 beta), this only took 2-3 minutes to run. I now have a series of tables, one of which has all ~114,000 comparisons (treatment, time, and treatment x time for each of the ~35,000 proteins) that I can sort by FDR p-value. My only question now is: assuming I use the FDR p-value (to avoid type I errors from having so many comparisons), am I remiss in NOT checking out the gobs of other output data? Covariance estimates should have been accommodated by the repeated structure. Maybe I could test out other RM personality types and see if the BIC drops?
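For anyone curious what an FDR-adjusted p-value does across this many comparisons, here is a minimal Python sketch (not JSL) of the Benjamini-Hochberg adjustment. It assumes the FDR p-value in the output table is a Benjamini-Hochberg adjustment, which is worth confirming in the JMP documentation; the function name and the example p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for five comparisons
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
```

Each raw p-value is scaled by (number of tests / rank), so with roughly 114,000 comparisons only very small raw p-values survive a 0.05 cutoff.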

I think looking at distributions and homogeneity of variance may go out the window with 35,000 analytes, BUT I suppose I could subset by analytes for which no transformation is necessary, analytes for which a square root transformation is necessary, etc. to try to improve fit.

Anderson B. Mayfield