Re: Sample size calculation for TOST

Elofar · Jun 8, 2023 5:16 PM

Hi everyone!

Context: I have a process A, which is my reference, and I developed a process B whose comparability to process A I want to assess. To do so, I plan to use a TOST (two one-sided tests, i.e. an equivalence test) with process A's 3xSTD (standard deviation) as the threshold. Now I want to know how many times I should repeat process B to be sure I can detect such a variation (knowing that each run costs a significant amount of money).

Option 1: I wanted to use the Sample Size & Power tool for a 2-sample means comparison, with alpha = 0.1 (2x0.05, one for each process), STD = 1, no extra parameters, Difference to detect = 3 (3x the STD of 1), and Power = 0.95. I obtain 7, so 4 runs of process B compared with 4 runs of process A to ensure comparability.

Option 2: Use the t-distribution, which suggests 6 runs.

What do you think about the two options? Which one is best for this case? Is there any other tool I haven't thought of?

Thanks a lot!

@martindemel, maybe?

calking · Mar 31, 2020 11:59 AM

Hello Elofar,

I found an article indicating that you can approximate the power calculation with a t-test where alpha_ttest = 1 - power_equiv, power_ttest = 1 - alpha_equiv, and the difference to detect equals the threshold of the equivalence test. In the Power and Sample Size calculator for two sample means, you would specify alpha = 0.05 (1-0.95), StdDev = 1, Difference to detect = 3, and Power = 0.9 (1-0.1). That gives a total sample size of 8, which is in line with your Option 1.
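As a rough check on why the swap works, here is a minimal Python sketch (standard library only; the function names are hypothetical) using the large-sample normal approximation, assuming the true mean difference is 0 and symmetric equivalence limits of ±3 with sigma = 1. Under that approximation, a TOST at level alpha with power 1 - beta needs the limit to sit z(1-alpha) + z(1-beta/2) standard errors away from 0. A two-sided test with the swapped values alpha' = 1 - power_equiv = beta and power' = 1 - alpha_equiv needs z(1-alpha'/2) + z(power') = z(1-beta/2) + z(1-alpha), which is the same sum, so both give the same n.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function


def n_per_group_tost(alpha, power, sigma, margin):
    """Normal-approximation n per group for a two-sample TOST with
    limits +/- margin, assuming the true mean difference is 0."""
    beta = 1 - power
    return 2 * sigma**2 * (z(1 - alpha) + z(1 - beta / 2)) ** 2 / margin**2


def n_per_group_ttest(alpha, power, sigma, diff):
    """Normal-approximation n per group for a two-sided two-sample test."""
    return 2 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / diff**2


# Equivalence test actually wanted: alpha = 0.10, power = 0.95, sigma = 1, limit = 3
tost = n_per_group_tost(alpha=0.10, power=0.95, sigma=1, margin=3)
# Swapped t-test settings: alpha_ttest = 1 - 0.95, power_ttest = 1 - 0.10
swapped = n_per_group_ttest(alpha=0.05, power=0.90, sigma=1, diff=3)
print(f"TOST: {tost:.3f} per group, swapped t-test: {swapped:.3f} per group")
```

Both come out at about 2.335 per group. The normal approximation ignores the uncertainty from estimating sigma, so it lands below the t-based calculator's total of 8; treat it as an illustration of the swap, not a replacement for the calculator.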

Hope that helps!

cwillden · Mar 31, 2020 12:29 PM

We developed an app that does TOST sample size calculations. I can't share that, but I can share the script I used to replicate the SAS power calculations for TOST using Owen's Q. It has the code for the one-sample and two-sample scenarios, with an example for each. Note that it will not handle small signal-to-noise scenarios well. In my experience so far, it only works until nu (the sample size parameter) reaches around 200-250, because Gamma( nu/2 ) runs into a numerical overflow and returns a missing value.

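//Owen's Q: c * Integral from a to b of Phi(t*x/Sqrt(nu) - delta) * x^(nu-1) * phi(x) dx,
//where Phi and phi are the standard normal CDF and density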
OwensQ = Function( {t, delta, a, b, nu},
	c = Sqrt( 2 * Pi() ) / (Gamma( nu / 2 ) * Power( 2, (nu - 2) / 2 ));
	integrand = Expr(
		Normal Distribution( ((t * x / Sqrt( nu )) - delta) ) * Power( x, nu - 1 ) * Normal Density( x )
	);
	c * Integrate( integrand, x, a, b, <<Starting Value( Mean( a, b ) ) );
);

/******** 1-Sample TOST Power ****************/
One_TOST_power = expr(
	nu = n - 1;

	t1 = -t Quantile( conf, nu );
	t2 = t Quantile( conf, nu );

	delta1 = (mu - muu) / (sigma / Sqrt( n ));
	delta2 = (mu - mul) / (sigma / Sqrt( n ));

	b = (Sqrt( nu ) * (muu - mul)) / ((2 * sigma / Sqrt( n )) * t Quantile( conf, nu ));

	//Power:
	OwensQ( t1, delta1, 0, b, nu ) - OwensQ( t2, delta2, 0, b, nu );
);

//1-sample case:
n = 15;
conf = 0.95;
mu = 505;
muu = 510;
mul = 490;
sigma = 4;

One_TOST_power(); //should be 0.9983947 (second part is effectively 0)

/******** 2-Sample TOST Power ****************/
Two_TOST_power = expr(
	nu = N - 2;
	//Group fractions: n1 = N*w1, n2 = N*w2 (so W = n2/n1)
	w1 = 1/(W+1);
	w2 = W/(W+1);

	t1 = -t Quantile( conf, nu );
	t2 = t Quantile( conf, nu );

	//For Unequal Sample Sizes
	//Non-centrality parameters:
	delta1 = (mudiff - mudiff_u) / (sigma / (Sqrt( N ) * Sqrt( w1 * w2 )));
	delta2 = (mudiff - mudiff_l) / (sigma / (Sqrt( N ) * Sqrt( w1 * w2 )));

	//Upper integration limit:
	b = (Sqrt( nu ) * (mudiff_u - mudiff_l)) / ((2 * sigma / (Sqrt( N ) * Sqrt( w1 * w2 ))) * t Quantile( conf, nu ));

	Power = OwensQ( t1, delta1, 0, b, nu ) - OwensQ( t2, delta2, 0, b, nu );
);

//2-sample equal sizes:
N = 36;
W = 1;
conf = 0.95;
mudiff_l = -2;
mudiff_u = 2;
mudiff = 0;
sigma = 2;

Two_TOST_power();  //Should be just over 0.8

//2-sample unequal sizes:
N = 42;
W = 2; //unequal sample sizes: one group twice the size of the other
conf = 0.95;
mudiff_l = -2;
mudiff_u = 2;
mudiff = 0;
sigma = 2;

Two_TOST_power();  //Should be just over 0.8
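For anyone who wants to cross-check the script outside JMP, here is a minimal Python port (standard library only) of the same formulas: Owen's Q, the one-sample TOST power, and the two-sample TOST power with the W weighting. It is a sketch under stated assumptions, not a validated implementation. The integrals use composite Simpson's rule with a fixed number of steps. The t critical value comes from bisection on a numerically integrated t CDF (with SciPy available, scipy.stats.t.ppf would be the usual choice). nu >= 2 is assumed. One design choice differs from the JSL: the constant c and the x^(nu-1) * phi(x) weight are combined in log space with math.lgamma, which avoids overflowing intermediate values of Gamma(nu/2) or x^(nu-1) at large nu.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, steps))
    return total * h / 3


def t_quantile(p, nu):
    """Student-t quantile for p > 0.5: bisection on a numerically integrated CDF."""
    log_k = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)

    def pdf(u):
        return math.exp(log_k - (nu + 1) / 2 * math.log1p(u * u / nu))

    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if 0.5 + simpson(pdf, 0.0, mid, 2000) < p else (lo, mid)
    return (lo + hi) / 2


def owens_q(t, delta, a, b, nu):
    """Owen's Q with the constant and x^(nu-1)*phi(x) weight in log space (nu >= 2)."""
    log_c = 0.5 * math.log(2 * math.pi) - math.lgamma(nu / 2) - (nu - 2) / 2 * math.log(2)

    def integrand(x):
        if x <= 0.0:
            return 0.0  # x^(nu-1) -> 0 at x = 0 when nu >= 2
        log_w = log_c + (nu - 1) * math.log(x) - x * x / 2 - 0.5 * math.log(2 * math.pi)
        return PHI(t * x / math.sqrt(nu) - delta) * math.exp(log_w)

    return simpson(integrand, a, b)


def one_tost_power(n, conf, mu, muu, mul, sigma):
    """One-sample TOST power, same formulas as the JSL One_TOST_power."""
    nu = n - 1
    tq = t_quantile(conf, nu)
    se = sigma / math.sqrt(n)
    b = math.sqrt(nu) * (muu - mul) / (2 * se * tq)
    return owens_q(-tq, (mu - muu) / se, 0.0, b, nu) - owens_q(tq, (mu - mul) / se, 0.0, b, nu)


def two_tost_power(N, W, conf, mudiff_l, mudiff_u, mudiff, sigma):
    """Two-sample TOST power, same formulas as the JSL Two_TOST_power."""
    nu = N - 2
    w1, w2 = 1 / (W + 1), W / (W + 1)
    tq = t_quantile(conf, nu)
    se = sigma / (math.sqrt(N) * math.sqrt(w1 * w2))
    b = math.sqrt(nu) * (mudiff_u - mudiff_l) / (2 * se * tq)
    return (owens_q(-tq, (mudiff - mudiff_u) / se, 0.0, b, nu)
            - owens_q(tq, (mudiff - mudiff_l) / se, 0.0, b, nu))


p_one = one_tost_power(n=15, conf=0.95, mu=505, muu=510, mul=490, sigma=4)
p_two = two_tost_power(N=36, W=1, conf=0.95, mudiff_l=-2, mudiff_u=2, mudiff=0, sigma=2)
print(p_one, p_two)
```

With the thread's example inputs, the two printed values should come out close to the 0.9983947 and "just over 0.8" noted in the JSL comments.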

-- Cameron Willden

Elofar · Apr 1, 2020 02:01 AM

Thank you for that answer! I'm not sure I understood the whole script, though. What am I supposed to change in it?
My concern is that this experiment is "official" (it will be submitted to the authorities), so I am not sure about using such a script...

cwillden · Apr 1, 2020 03:20 PM

For the one-sample case, you specify the values of these variables:

n = 15; //total sample size
conf = 0.95; //confidence
mu = 505; //hypothesized mean
muu = 510; //upper equivalence limit
mul = 490; //lower equivalence limit
sigma = 4; //estimated standard deviation

For the two-sample case, you specify the values of these variables:

N = 36; //total sample size
W = 1; //sample size ratio between the groups (e.g. use 2 for one group to have 2x as many samples as the other)
conf = 0.95; //confidence level
mudiff_l = -2; //lower equivalence limit on mean difference
mudiff_u = 2; //upper equivalence limit on mean difference
mudiff = 0; //estimated mean difference
sigma = 2; //estimated standard deviation

To find a sample size, you would just need to implement a loop to increase N until the desired power is achieved.
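A minimal Python sketch of that loop, assuming the same two-sample formulas as the script (Owen's Q computed with Simpson's rule, t quantile by bisection) and an integer W so that both groups get whole numbers of runs. The helper names, including smallest_total_n, are hypothetical. Rather than stepping N by 1, it grows the smaller group one run at a time and sets the other group to W times that size.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def simpson(f, a, b, steps):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, steps))
    return total * h / 3


def t_quantile(p, nu):
    """Student-t quantile for p > 0.5, by bisection on a Simpson-integrated CDF."""
    log_k = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)

    def pdf(u):
        return math.exp(log_k - (nu + 1) / 2 * math.log1p(u * u / nu))

    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if 0.5 + simpson(pdf, 0.0, mid, 2000) < p else (lo, mid)
    return (lo + hi) / 2


def owens_q(t, delta, b, nu):
    """Owen's Q over [0, b], weight evaluated in log space (assumes nu >= 2)."""
    log_c = 0.5 * math.log(2 * math.pi) - math.lgamma(nu / 2) - (nu - 2) / 2 * math.log(2)

    def integrand(x):
        if x <= 0.0:
            return 0.0
        log_w = log_c + (nu - 1) * math.log(x) - x * x / 2 - 0.5 * math.log(2 * math.pi)
        return PHI(t * x / math.sqrt(nu) - delta) * math.exp(log_w)

    return simpson(integrand, 0.0, b, 4000)


def two_tost_power(N, W, conf, mudiff_l, mudiff_u, mudiff, sigma):
    """Two-sample TOST power, same formulas as the JSL Two_TOST_power."""
    nu = N - 2
    w1, w2 = 1 / (W + 1), W / (W + 1)
    tq = t_quantile(conf, nu)
    se = sigma / (math.sqrt(N) * math.sqrt(w1 * w2))
    b = math.sqrt(nu) * (mudiff_u - mudiff_l) / (2 * se * tq)
    return (owens_q(-tq, (mudiff - mudiff_u) / se, b, nu)
            - owens_q(tq, (mudiff - mudiff_l) / se, b, nu))


def smallest_total_n(target, W, conf, mudiff_l, mudiff_u, mudiff, sigma, max_small=200):
    """Grow the smaller group (the other is W times larger, W an integer)
    until the TOST power reaches target; returns (N, power) or None."""
    for n_small in range(2, max_small + 1):
        N = n_small * (W + 1)
        power = two_tost_power(N, W, conf, mudiff_l, mudiff_u, mudiff, sigma)
        if power >= target:
            return N, power
    return None


result = smallest_total_n(0.8, W=1, conf=0.95, mudiff_l=-2, mudiff_u=2, mudiff=0, sigma=2)
print(result)
```

For the script's own example (limits ±2, sigma = 2, true difference 0, conf = 0.95, W = 1) and a target power of 0.8, this should return N = 36, i.e. 18 runs per group, consistent with the "just over 0.8" comment.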

I understand your concern about needing to submit this to an authority for review. I work in a regulated business myself, and we have a validation protocol for the JMP add-in with the calculators we built based on this script. The SAS documentation for PROC POWER shows the formulas for the power calculation for 1- and 2-sample TOST. You could reference that document to show that the calculations are accurate, reproduce them in other software (e.g. R; I do have a script for that if you would like one), or confirm the power estimates through simulation. If you have SAS anywhere in your organization, you can just get a PROC POWER printout and be done.
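For the simulation route, a minimal Monte Carlo sketch in Python (standard library only): draw both groups from normal distributions with a common sigma, run the pooled-variance TOST (both one-sided t-tests at level 1 - conf), and count how often equivalence is concluded. The normality, equal-variance and pooled-SD assumptions, the fixed seed and the function names are all choices made for this illustration, not taken from the thread.

```python
import math
import random


def t_quantile(p, nu, steps=2000):
    """Student-t quantile for p > 0.5 (bisection on a Simpson-integrated CDF)."""
    log_k = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)

    def pdf(u):
        return math.exp(log_k - (nu + 1) / 2 * math.log1p(u * u / nu))

    def cdf(x):
        h = x / steps
        s = pdf(0.0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def simulated_tost_power(n1, n2, conf, lower, upper, mudiff, sigma, reps=10000, seed=1):
    """Monte Carlo power of a pooled-variance two-sample TOST (equal true SDs assumed)."""
    rng = random.Random(seed)
    nu = n1 + n2 - 2
    tcrit = t_quantile(conf, nu)
    passes = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, sigma) for _ in range(n1)]     # reference process
        b = [rng.gauss(mudiff, sigma) for _ in range(n2)]  # new process
        ma, mb = sum(a) / n1, sum(b) / n2
        ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
        se = math.sqrt(ss / nu * (1 / n1 + 1 / n2))
        d = mb - ma
        # Equivalence is concluded only if BOTH one-sided tests reject.
        if (d - lower) / se > tcrit and (d - upper) / se < -tcrit:
            passes += 1
    return passes / reps


p_sim = simulated_tost_power(18, 18, conf=0.95, lower=-2, upper=2, mudiff=0, sigma=2)
print(p_sim)
```

With 18 runs per group and the script's example settings, the estimate should fall near the Owen's Q result ("just over 0.8"), within Monte Carlo error of roughly ±0.01 at 10,000 replicates.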

-- Cameron Willden

Elofar · Apr 2, 2020 02:44 AM

Alright, great, thanks a lot!
I don't have R, so I'll try this with the script!

Elofar · Apr 2, 2020 2:39 AM

Hi @calking ,

That sounds good, thank you! Couple of questions:
1. About the alpha: we were told that, since we have 2 populations, we have to apply an alpha of 1-0.95 to each of them, and therefore use 0.05x2 = 0.1 as the global alpha in the tool. What do you think about that?
2. About the power: is it usual to use 0.9?

Thanks a lot for your help!

calking · Apr 2, 2020 09:52 AM

Certainly!

To answer your follow-up questions: the alpha and power values I used in the Power Calculator were actually derived from the doubled alpha (0.1) and the power (0.95) you were looking for. To clarify, since JMP currently does not have a calculator for equivalence tests, we are using a trick (see the article) to approximate the power calculation with the two-sample t-test power calculator. The trick relies on the fact that an equivalence test swaps the hypotheses you typically use in a two-sample t-test, so we need to swap some of the values too. For your case, we proceeded as follows:

1. You want an alpha of 0.1 (2x0.05). For the trick, this means you'll set the power in the calculator to 0.9; that is, 1-(alpha intended for equivalence test).

2. You want a power of 0.95. For the trick, this means you'll set the alpha level in the calculator to 0.05; that is, 1-(power intended for equivalence test).

Note that we had to swap the values used for power and alpha. Hopefully that clarifies where the values I used came from.

I used the total alpha value rather than the individual values specifically to match the values you gave. However, if 0.05 is indeed the global significance level you want (i.e. 95% global confidence), then that is probably the value you should use. A value of 0.1 will result in a smaller total sample size, but only because you're loosening the requirement on global confidence (90% vs. 95% confidence). Also, the two-sample t-test already accounts for there being two populations, so it already adjusts to keep the global confidence level.

I ran some quick numbers and, either way you go (alpha=0.1 vs. alpha=0.05), the overall difference in total sample size is only 8 vs. 9. So ultimately it will come down to how your management decides to weigh the risks and returns (8 total samples for 90% global confidence vs. 9 total samples for 95% global confidence).

jszarka · Apr 10, 2020 01:49 PM

@calking - Any chance that equivalence tests will get added?

Currently it can be done in similar software, such as Minitab -- https://support.minitab.com/en-us/minitab/18/help-and-how-to/statistics/equivalence-tests/supporting...

*ducks*

martindemel · Apr 29, 2020 07:28 AM

Hi John,

Please contact your JMP SE or JMP sales rep to get an invitation to the Early Adopter (EA) program for JMP 16 (as long as you are a user and are willing to provide feedback). In the EA, our developers share potential new features and ask for feedback before they ship in the new version. This is a great chance to give feedback on the "new" way of doing power calculations and equivalence tests, including anything that still differs from what you would expect. Most likely, the new Power and Sample Size platform will be presented in the EA around mid to end of May, including equivalence tests as far as I know. Your feedback is highly appreciated.

Best,

Martin

/****NeverStopLearning****/