Hello, and welcome to our presentation, Approaches to Comparisons with JMP® 17. My name is Mark Bailey. I'm a Senior Analytics Software Tester, and my co-presenter today is Jianfeng Ding, a Senior Research Statistician Developer. Before we get into the new features, we're going to take a moment to make sure that everyone has the proper background to appreciate these new methods.
This has to do with using statistical inference when we're trying to compare two populations. This is a very common task, and the comparison usually leads to a decision between two ideas about these populations. If we could observe the populations in their entirety, we wouldn't need statistics, but that's not usually the case. So, we have to work with samples from the populations. Statistical inference can provide some really valuable information about the populations from those samples.
In particular, is there sufficient evidence to reject one idea about the two populations? So a clear statement of these ideas or hypotheses is essential to making the correct choice for the test and also for the correct interpretation. So let's talk a little bit about the ideas or hypotheses that are part of these statistical tests. The alternative and null hypotheses, as they're known, represent mutually exclusive statements about these populations, and no other hypothesis is possible.
For example, one statement might be that the means of population A and population B are equal. The other idea is that they're not equal. Those two ideas or hypotheses are mutually exclusive, and no other hypothesis is possible. What's the role of these two ideas? The alternative hypothesis states the conclusion that we would like to claim about the populations, and it will require sufficient evidence in the data to overthrow the other hypothesis.
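To make this concrete, the two mutually exclusive statements for comparing the means of populations A and B can be written as:

$$ H_0:\ \mu_A = \mu_B \qquad \text{versus} \qquad H_1:\ \mu_A \neq \mu_B $$

Every possible state of the two populations falls under exactly one of these statements.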
The other one is called the null hypothesis. It states the opposing conclusion that must be overcome by strong evidence. It serves as a reference for this comparison. It's assumed to be true. Now, there's a common misunderstanding in hypothesis testing: people tend to think about the comparison in only one direction. That's because historically, this is the way it was presented in training.
The most often taught test is used to demonstrate that there's a difference between two populations. The resulting lack of understanding can lead to a misuse of these tests. To be clear, the choice of the test is not a matter of what data is collected or how the data is collected. It's entirely about the stated hypotheses for the purpose of your comparison. Let's look at these two possibilities. Let's say that the goal of our comparison is to demonstrate a difference.
We want to explicitly state these ideas to make sure the test is clear. In the first example, I want to demonstrate that a temperature change causes a new outcome. That is, there is a difference. We'd like to claim that the new level of the response will result from a change in the process temperature. Perhaps we expect a higher yield, or we'd like to show more stability. A designed experiment is used to randomly sample from the population for a low-temperature condition and from a population for a high-temperature condition.
Of the two hypotheses for this test, the null hypothesis states that the temperature does not affect the outcome. Remember, it's our reference and we assume it to be true. The alternative hypothesis is our idea: temperature affects the outcome, but we can decide this only if the evidence is strong enough to reject the null hypothesis. Now, let's reverse that. Let's say that in this comparison, I want to demonstrate equivalence. In the second example, I want to show that a temperature change does not cause a change in the outcome.
That is, the outcome, either way, is equivalent. We want to claim that a planned change in the process temperature will improve the yield but not affect the level of an impurity in the product. We design the same experiment and collect the same samples, but now our hypotheses are switched. The null hypothesis is that the temperature affects the outcome. It's our new reference, and we still assume it to be true, while the alternative hypothesis states that the temperature does not affect the outcome, the impurity level.
But we can make that claim only if the evidence is strong enough to reject the null. So do I test for a difference or for equivalence? The key is how you state your hypotheses. These two examples, I think you can see, use identical data but different tests. The choice of the test is not about the data. It's about the claim that we want to make and how we state that properly in the hypotheses. Remember that statistical tests are unidirectional. That is, we can either reject a null hypothesis or fail to reject it, and a good test rejects the null hypothesis with high probability when it is false.
Now, let's get to the new features in JMP® 17. Jianfeng will now present the equivalence tests, and when she's finished, I'll present method comparison.
Now, I'm going to share my screen. Hello, my name is Jianfeng Ding. I'm a Research Statistician Developer at JMP. In this video, I'm going to talk about the equivalence, noninferiority, and superiority tests in JMP® 17. The basic hypothesis test on the left is a test that most quality professionals are familiar with. It is often used to compare two or more groups of data to determine whether they are statistically different.
The parameter theta can be a mean response for a continuous outcome or a proportion when the outcome variable is binary. Theta T represents the response from the treatment group, and theta zero represents the response from the control group. There are three types of the basic hypothesis test. The first one is the two-sided test, and the rest are one-sided tests. If you look at the two-sided test on the left, the null hypothesis is that the treatment means are the same, and the alternative hypothesis is that the treatment means are different.
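In symbols, the three basic hypothesis pairs are:

$$
\begin{aligned}
\text{Two-sided:} \quad & H_0: \theta_T = \theta_0 \quad \text{vs.} \quad H_1: \theta_T \neq \theta_0 \\
\text{One-sided (upper):} \quad & H_0: \theta_T \leq \theta_0 \quad \text{vs.} \quad H_1: \theta_T > \theta_0 \\
\text{One-sided (lower):} \quad & H_0: \theta_T \geq \theta_0 \quad \text{vs.} \quad H_1: \theta_T < \theta_0
\end{aligned}
$$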
Sometimes we really need to establish that things are substantially the same, and the machinery to do that is called an equivalence test. An equivalence test shows that the difference between theta T and theta zero is within a pre-specified margin delta, allowing us to conclude equivalence with a specified confidence level. If you look at the equivalence test, the null hypothesis is that the treatment means are different, and the alternative hypothesis is that the treatment means are within a fixed delta of one another.
This is different from the two-sided hypothesis test on the left. Another alternative testing scenario is the noninferiority test, which aims to demonstrate that results are not substantially worse. There is also a testing scenario called superiority testing that is similar to noninferiority testing, except that the goal is to demonstrate that results are substantially better. There are five different types of equivalence-type tests; which one to use depends on the situation, as discussed next.
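Written out with the margin delta, and using the convention that a larger theta is better for the one-sided cases, the equivalence-type hypothesis pairs take these forms:

$$
\begin{aligned}
\text{Equivalence:} \quad & H_0: |\theta_T - \theta_0| \geq \delta \quad \text{vs.} \quad H_1: |\theta_T - \theta_0| < \delta \\
\text{Noninferiority:} \quad & H_0: \theta_T \leq \theta_0 - \delta \quad \text{vs.} \quad H_1: \theta_T > \theta_0 - \delta \\
\text{Superiority:} \quad & H_0: \theta_T \leq \theta_0 + \delta \quad \text{vs.} \quad H_1: \theta_T > \theta_0 + \delta
\end{aligned}
$$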
These tests are very important in industry, especially in the biotech and pharmaceutical industries. Here are some examples. If the goal is to show that the new treatment does not differ from the standard by more than some small margin, then an equivalence test should be used. For example, consider a generic drug that is less expensive and causes fewer side effects than a popular name-brand drug. You would like to prove it has the same efficacy as the name-brand one. The typical goal in noninferiority testing is to conclude that a new treatment, process, or product is not significantly worse than the standard one.
For example, a new manufacturing process is faster. You would want to make sure it creates no more product defects than the standard process. A superiority test tries to prove that the new treatment is substantially better than the standard one. For example, a new fertilizer has been developed with several improvements. The researchers want to show that the new fertilizer is better than the current fertilizer. How do we set up the hypotheses? The graph on the left summarizes the five different types of equivalence-type tests very nicely.
This graph was created by our SAS/STAT® colleagues, John Castelloe and Donna Watts. You can find their white paper on this on the web. The choice of test depends on the situation. For each situation, the region that we are trying to establish with the test is shown in blue. For an equivalence analysis, you can construct an equivalence region with upper bound theta zero plus delta and lower bound theta zero minus delta. You can conduct an equivalence test by checking whether the confidence interval of theta lies entirely in the blue equivalence region.
Likewise, you can conduct a noninferiority test by checking whether the confidence interval of theta lies entirely above the lower bound if a larger theta is better, or below the upper bound if a smaller theta is better. These tests are available in JMP® 17 in Oneway for comparing normal means and in Contingency for comparing response rates. The graphical user interface of the equivalence test launch dialogs makes it easy for you to find the type of test that corresponds to what you are trying to establish.
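That confidence-interval check is simple enough to sketch outside of JMP. The following is a minimal Python illustration, not JMP's implementation: the function name is hypothetical, and a pooled-variance t interval is assumed. It declares equivalence of two means when the (1 − 2α) two-sided confidence interval for the mean difference lies entirely inside (−δ, +δ).

```python
import numpy as np
from scipy import stats

def equivalence_by_ci(y_t, y_0, delta, alpha=0.05):
    """Declare equivalence of two means when the (1 - 2*alpha) two-sided
    confidence interval for the mean difference lies entirely inside
    (-delta, +delta). This inclusion rule gives the same decision as
    two one-sided tests (TOST) at level alpha."""
    y_t, y_0 = np.asarray(y_t, float), np.asarray(y_0, float)
    n_t, n_0 = len(y_t), len(y_0)
    diff = y_t.mean() - y_0.mean()
    # Pooled-variance standard error of the mean difference
    sp2 = ((n_t - 1) * y_t.var(ddof=1) + (n_0 - 1) * y_0.var(ddof=1)) / (n_t + n_0 - 2)
    se = np.sqrt(sp2 * (1.0 / n_t + 1.0 / n_0))
    t_crit = stats.t.ppf(1 - alpha, df=n_t + n_0 - 2)
    lo, hi = diff - t_crit * se, diff + t_crit * se
    return (lo, hi), (-delta < lo and hi < delta)
```

Using a 1 − 2α interval, rather than 1 − α, is what makes this inclusion rule match two one-sided tests (TOST) at level α.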
A forest plot in the report summarizes the comparison very nicely and makes it easy for you to interpret the results. Next, I'm going to do a demo of the equivalence, superiority, and noninferiority tests. I'm going to use a data set called Drug Measurement that is in the JMP sample data library. Twelve different subjects were given three different drugs, A, B, and C, and continuous measurements were made. I first launch Fit Y by X and assign the measurement as Y and the drug type as the X factor.
This brings up the Oneway analysis. Under the red triangle menu, let's first find Equivalence Test; there are two options, Means and Standard Deviations. We're going to focus on means for this example. This brings up the equivalence test launch dialog. In this section, you can choose which test you would like to conduct, and the graph represents the choice of the selected test. For the superiority and noninferiority tests, there are two scenarios.
In one, a larger difference is better; in the other, a smaller difference is better. Which one to choose depends on the situation. You need to specify the delta, or the margin, for the test. You also need to specify the significance level alpha for the test. You can choose either the pooled variance or unequal variances to run the test. For this example, we run the equivalence test first and specify 3 as the difference. We are going to do the equivalence test for all the pairs. Click OK, and it brings up the equivalence test results.
At the top are the statistical details for the equivalence test, and at the bottom is a forest plot. You notice there are two regions: the blue region is the equivalence region, and the red regions are the non-equivalence regions. The lines here represent the confidence intervals of the mean differences between two groups. If we look at this line, this is the confidence interval of the mean difference between drugs A and C, and you see this line is completely contained inside the blue region. We look at the p-value of the equivalence test, which is 0.02, smaller than 0.05. So at the 5% significance level, we can declare that drugs A and C are equivalent.
But when you look at the confidence intervals of the mean differences between drugs A and B, and drugs B and C, they extend beyond the blue region. So at the 5% significance level, we cannot conclude that drugs A and B, or drugs B and C, are equivalent. Next, if we assume drug C is the standard drug and we would like to find out whether drugs A and B are better than drug C, that calls for a superiority test. Let me close this outline node for now; we launch the equivalence test again and click Means.
This time we're going to run the superiority test. We select the superiority test, indicate that a larger difference is better, and specify 0.4 as our margin. This time we need to set drug C as our control group and click OK. This brings up the superiority test results. From this forest plot, you can see the confidence interval of the mean difference between drugs B and C is completely contained inside the blue region. So at the 5% significance level, we can declare that drug B is superior to drug C. But we cannot draw the same conclusion for drugs A and C. This concludes my first example.
The next example will show how to conduct a noninferiority test for the relative risk between two proportions. Let me open the data table. A randomized trial compared drug FIDAX as an alternative to drug VENCO for the treatment of colon infections. The two drugs have similar efficacy and safety. Two hundred twenty-one out of 225 patients treated with FIDAX achieved clinical cure by the end of the study, compared to 223 out of 257 patients treated with VENCO.
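For readers who want to see the arithmetic behind this comparison before the demo, here is a hedged sketch of a large-sample noninferiority check for a relative risk, using the cure counts just quoted and the 0.9 margin that appears in the demo below. The function name is hypothetical, the standard error is the usual delta-method formula for the log relative risk, and this is an illustration rather than JMP's exact computation.

```python
import numpy as np
from scipy import stats

def rr_noninferiority(x_t, n_t, x_0, n_0, margin, alpha=0.05):
    """Large-sample noninferiority check for a relative risk where a
    larger ratio is better: build a (1 - 2*alpha) confidence interval
    for RR on the log scale and declare noninferiority when the lower
    bound exceeds the margin."""
    rr = (x_t / n_t) / (x_0 / n_0)
    # Delta-method standard error of log(RR)
    se_log = np.sqrt(1 / x_t - 1 / n_t + 1 / x_0 - 1 / n_0)
    z = stats.norm.ppf(1 - alpha)
    lo, hi = np.exp(np.log(rr) + np.array([-z, z]) * se_log)
    return rr, (lo, hi), lo > margin

# Cure counts quoted above: FIDAX 221/225 versus VENCO 223/257
print(rr_noninferiority(221, 225, 223, 257, margin=0.9))
# RR is roughly 1.13 and the lower bound stays above 0.9: noninferior
```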
We launch Fit Y by X again, assign Cure as Y, drug as the X factor, and Count as the frequency, and click OK. This brings up the contingency analysis. The Likelihood Ratio test, the Pearson test, and Fisher's Exact test all indicate that there is no significant difference between these two drugs. But we would like to find out whether drug FIDAX is not inferior to drug VENCO. We go to the top red triangle menu and select the equivalence test. There are two options: one is the risk difference, and one is the relative risk. For this example, we choose relative risk.
Again, this brings up the equivalence test launch dialog. For this example, we're going to run the noninferiority test. Again, a larger ratio is better. We specify 0.9 as our margin, and because we care about the treatment effect, we choose Yes and then click OK. This brings up the noninferiority test results. From the forest plot, we can see the confidence interval of the relative risk is completely contained in the blue region, and the p-value for the noninferiority test is very small. So at the 5% significance level, we conclude that drug FIDAX is not inferior to drug VENCO. This concludes my talk, and I will hand it back to Mark. I need to stop sharing my screen.
Thank you, Jianfeng. Now in the last part of our presentation, I'm going to talk about another comparison where we want to compare the results or measurements from two different methods of measuring some quantity. We assume that there's a standard method that already exists. It has been validated. We can use it to measure the level of some quantity. That might be the temperature or the potency of a drug. But for some reason, we've developed a new method for the same result.
We must compare its performance to the standard method before we use it. This is a long-standing issue. This comparison has been codified for a long time by numerous international organizations. I've listed a few of them on this slide. So this is a very well-studied and established comparison. In this case, we're going to compare to identity. We're going to compare these two methods, where ideally the test method would give us the same value as the standard method. So we plot the data using a scatter plot. We have the test method on the vertical axis and the standard method result on the horizontal axis. We can even plot the identity line, where Y equals X, for reference.
Ideally, we would get the same result from both methods, but that won't happen because of measurement error in both methods. We'll use regression analysis to determine the best fit for this line, where we have the test method versus the standard method. Then the estimated parameters for our model can be compared to the identity line. For the null hypothesis, we start with the idea that they are not the same. So the intercept of this line is not zero, or the slope is not one, or possibly both. In other words, the results are not equivalent.
The alternative, which we would like to claim, is that they are equivalent. So there we state that the intercept should be zero and the slope should be one. To make this comparison using regression, we have to postulate a model. In this case, it's a simple linear regression model. We have a constant term A and a proportional term B times X. We're going to estimate those parameters A and B, and use our hypotheses to decide. We also have a term epsilon, which represents the measurement error, the random variation. Using linear regression, we assume that Y and X are linearly related.
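In symbols, the postulated model and the two hypotheses about the identity line are:

$$ y = A + B\,x + \varepsilon, \qquad H_0:\ A \neq 0 \ \text{or}\ B \neq 1 \qquad \text{versus} \qquad H_1:\ A = 0 \ \text{and}\ B = 1 $$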
We assume that the statistical errors, epsilon, are in Y, not in X. We also assume that those errors are distributed in a way that is independent of the response. In other words, the statistical error is the same across the entire range of this method. Also, no data exert any excessive influence on these estimates. Well, in method comparison, we usually violate these assumptions. First of all, there is measurement error in the standard method as well.
Also, often the errors are not constant. That is, we observe a constant coefficient of variation but not a constant standard deviation. Outliers are present that can strongly influence the estimation. Other regression methods are required in such cases. Deming regression simultaneously minimizes the least squares error in both Y and X. That's appropriate for this case. Passing-Bablok regression is a nonparametric method that's based on the median of all possible pairwise slopes.
It's resistant to outliers and nonconstant errors. Let's talk about each of these briefly. Deming regression is provided in the Bivariate platform through the Fit Orthogonal command. This has been available in JMP for many years. Deming regression can estimate the errors in Y and X, assume that the errors in Y and X are equal, or use a given ratio of Y to X error.
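As a rough illustration of the computation behind Fit Orthogonal, here is a minimal Deming regression sketch for a known error-variance ratio. This is the textbook closed form, not JMP's code, and the function name and ratio convention are my own assumptions.

```python
import numpy as np

def deming(x, y, var_ratio=1.0):
    """Deming regression with a known ratio of error variances.
    var_ratio = var(errors in y) / var(errors in x);
    var_ratio = 1 gives orthogonal regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    # Closed-form slope that minimizes weighted squared error in both axes
    term = syy - var_ratio * sxx
    slope = (term + np.sqrt(term**2 + 4 * var_ratio * sxy**2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope
```

With var_ratio = 1 this reduces to the orthogonal fit, and as var_ratio grows large it approaches ordinary least squares of Y on X, with all the error assigned to Y.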
Passing-Bablok regression is new. JMP® 17 introduced this method in the Bivariate platform through the Fit Passing-Bablok command. This command also includes checks for the assumptions that the measurements are highly positively correlated and exhibit a linear relationship. Method comparison often includes a comparison by difference. The Bland-Altman analysis compares the pairwise differences as Y to the pairwise means as X to assess bias between the two values.
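Returning to the Passing-Bablok fit for a moment: because the procedure is easy to misremember, here is a minimal sketch of its point estimates under a hypothetical function name. It omits the rank-based confidence intervals and the correlation and linearity checks that the JMP report includes.

```python
import numpy as np

def passing_bablok(x, y):
    """Point estimates for Passing-Bablok regression: the slope is a
    shifted median of all pairwise slopes (slopes of exactly -1 are
    excluded, and the median is offset by the count of slopes below -1);
    the intercept is the median of y - slope * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx != 0 and dy / dx != -1:
                slopes.append(dy / dx)
    slopes = np.sort(slopes)
    m = len(slopes)
    k = int(np.sum(slopes < -1))      # offset for strongly negative slopes
    if m % 2:
        slope = slopes[(m - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    intercept = np.median(y - slope * x)
    return intercept, slope
```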
The results are presented in a scatterplot of Y versus X for your examination and also to identify any anomalies. This occurs in the Matched Pairs platform. The Matched Pairs platform has been part of JMP for many years as well, but the Bland-Altman test is a new addition. The report also presents the hypothesis test. Now I'd like to demonstrate these two methods. As I said, Deming regression has been available for a long time, but for completeness' sake, I'm going to demonstrate it here alongside the new methods.
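Before the demos, one more minimal sketch: the Bland-Altman summary just described, again under my own naming rather than the Matched Pairs implementation.

```python
import numpy as np

def bland_altman(y_test, y_std):
    """Bland-Altman summary: pairwise differences (the Y axis of the
    plot), pairwise means (the X axis), the bias as the mean difference,
    and the conventional 95% limits of agreement."""
    y_test, y_std = np.asarray(y_test, float), np.asarray(y_std, float)
    diffs = y_test - y_std
    means = (y_test + y_std) / 2.0
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```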
I select from the Analyze menu, Fit Y by X. I'm going to compare test Method 1 to my standard method. The standard method goes in the X role and the new test method goes in the Y role. You could evaluate more than one test method at the same time. Here's my plot. Initially, I see a scatter plot. I expect these two to agree very well. I expect to see that they follow this diagonal path, that they're linear, and so forth. I'll click the red triangle next to Bivariate and select Fit Orthogonal.
In this case, I don't really know that the variances are equal or I don't have any prior information that I could specify a ratio. So I'll have JMP estimate the errors in both. I'll use the first option here. Now we have the fitted line using Deming regression. Below that we have the report. The report includes an estimate of the intercept. We can see it's small and close to zero. We have an estimate for the slope.
Using the confidence interval, we see that it includes one, which we would expect if the test method agrees with the standard. That's Deming regression. Now, we're going to take a look at the new methods: Passing-Bablok regression and Bland-Altman analysis. Same start: select Analyze, and then select Fit Y by X. I'm going to use the Recall button here. I want to compare Method 1 to the standard, but I'm going to use a new regression technique.
I'll click on the red triangle and select Fit Passing-Bablok. There are actually two lines here. There's a red line that represents the best-fit line using the Passing-Bablok regression. But there's also, for our reference, a line that represents where Y equals X. It's hard to see. I'm going to use the magnifier tool to magnify a few times. Now you can see that there are, in fact, two separate lines. One is the identity and one is the fit, but they overlap quite a bit. These are quite similar.
In the numerical reports, first I have a test for high positive correlation. We're using Kendall's Tau, and we can see that it is highly significant. We reject the idea that they're not strongly correlated. Next, we have a test of linearity. This test assumes that they're linear, and we're looking for strong evidence against that. But we have a very high p-value here, so we do not reject the assumption that they're linear. Finally, we have the parameter estimates from the Passing-Bablok regression: we have the point estimates and the interval estimates.
For the intercept, that interval includes zero, so we cannot reject an intercept of zero. The slope is contained within an interval that includes one. Similarly, we can't reject that the slope of that line is equal to one. Let's say we'd also like to compare these two methods by difference. To do that, I click on the red triangle for the options of the Fit Passing-Bablok results, and here we see the command Bland-Altman Analysis. It takes all the information here and launches Matched Pairs with the additional information in the Bland-Altman analysis.
The plot is showing us on the Y axis the pairwise difference between Method 1 and the standard method, plotted against the mean of those two values on the horizontal axis. The Bland-Altman analysis is helpful because it gives us an idea about the bias. Here the estimate of the bias is -0.113, but we can see that the interval estimate of the bias includes zero, so we can't reject the idea that the bias is equal to zero, and so on. Now we have in JMP® 17 a much more complete set of tools for comparing different test methods. That concludes our presentation. Thank you.