Hello. My name is Mark Bailey.
I'm a Senior Analytics Software Tester at JMP.
My co-presenter today is Jianfeng Ding,
a Senior Research Statistician Developer.
I'm going to begin with an introduction to our topic.
Before we talk about specific comparisons,
we'd like to introduce some fundamental concepts.
All of this has to do with comparing populations.
Comparing populations is a very common task.
The comparison, we hope, will lead to a decision between two hypotheses.
Samples from these populations are often collected for the comparison.
Statistical inference can provide some valuable information about our samples.
In particular,
is there sufficient evidence to reject one hypothesis about these populations?
A clear statement of the hypotheses is
really essential to making the correct choice of a test for your comparison.
These hypotheses represent two mutually exclusive ideas
that together include the only possibilities.
They're called the alternative and null hypotheses.
The alternative hypothesis
is really a statement about the conclusion that we want to claim.
It serves to represent the populations and it will require sufficient evidence
to overthrow the other hypothesis, which is the null hypothesis.
It states the opposing conclusion that must be overcome with strong evidence.
It serves as a reference for comparison and it's assumed to be true.
It's important that we sort this out today because, historically,
statistical training has presented only one way of using these hypotheses.
The most often taught statistical tests
are used to demonstrate a difference between the populations.
But that's not the only possibility.
The lack of understanding about this distinction
can lead to misusing these tests.
The choice of a test is not a matter of the data that's collected
or how the data is collected.
It's strictly a matter of the stated hypotheses
for the purpose of your comparison.
Let's look at two similar examples that are actually fundamentally different.
Let's start with the case where our purpose is demonstrating a difference.
In this example, let's say I would like to demonstrate that a change in temperature
will cause a new outcome, an improvement perhaps.
We want to claim that a new level
of our response will result from changing the process temperature.
We'll use a designed experiment to randomly sample from a population
for the low temperature condition and the high temperature condition.
Of the two hypotheses, the null states
that the temperature does not affect the outcome.
This will be our reference.
The alternative states our claim,
which is the temperature affects the outcome,
but only if the evidence is strong enough to reject the null hypothesis.
All right, this is going to sound very similar, but it's exactly the opposite.
In this case, in example two, we need to demonstrate equivalence.
Here we want to demonstrate
that a temperature change does not cause a new outcome.
That is, after the change, we have the same outcome.
For example, this might be the case where
we are planning to change the process temperature
to improve the yield,
but we want to make sure that it doesn't change
the level of an impurity in our product.
We design the same experiment to collect the same data
and we have the same two hypotheses, but now they're reversed.
It's the null that states that the temperature affects the outcome,
that is, there's a difference,
while the alternative states
that our change in temperature will not affect the outcome.
Are we testing for a difference or for equivalence?
We see from these examples
that it's not about the data; the data are identical, but the tests are different.
The choice is not about the data,
it's about our claim, or in other words, how we state our hypotheses.
Also remember that hypothesis tests are unidirectional.
They serve only to reject a null hypothesis,
with high probability when it is false.
In our presentation today,
we'd like to introduce some new equivalence tests as well as some
additional methods that are used when comparing two measurement systems.
I'm now going to hand it over to Jianfeng to talk about equivalence tests.
Thanks Mark.
Hello.
I'm Jianfeng Ding.
I'm a Research Statistician Developer at JMP.
In this video I'm going to talk about the equivalence,
non-inferiority, and superiority tests in JMP 17.
The classical hypothesis test on the left
is the test that most quality professionals are familiar with.
It is often used to compare two or more groups of data
to determine whether they are statistically different.
The parameter theta can be a mean response
for a continuous outcome or a proportion when the outcome variable is binary.
Theta t represents the response from the treatment group
and theta zero represents the response from the control group.
There are three types of classical hypothesis tests.
The first one is the two-sided test, and the rest are one-sided tests.
If you look at the two-sided test on the left,
the null hypothesis is that the treatment means are the same
and the alternative hypothesis is that the treatment means are different.
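If you'd like to see the mechanics of the classical test outside of JMP, here is a minimal sketch in Python using scipy; the measurements are invented for illustration:

    # Classical two-sided test of H0: theta_t = theta_0 (invented data).
    from scipy import stats

    control = [9.8, 10.1, 10.0, 9.9, 10.2]      # standard group
    treatment = [10.4, 10.6, 10.3, 10.7, 10.5]  # treatment group

    # Two-sample t-test with pooled variance; rejects H0 for differences
    # in either direction.
    t, p = stats.ttest_ind(treatment, control)
    print(f"t = {t:.3f}, two-sided p = {p:.4f}")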
Sometimes we really need to establish that things are substantially the same
and the machinery to do that is called an Equivalence Test.
An equivalence test is used to show that the difference between theta t and theta zero
is within a prespecified margin delta,
allowing us to conclude equivalence at a specified confidence level.
If you look at the equivalence test, the null hypothesis is that
the treatment means are different and the alternative hypothesis is that
the treatment means are within a fixed delta of one another.
This is different from the two sided hypothesis test on the left.
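The usual machinery behind an equivalence test for two means is the two one-sided tests (TOST) procedure. Here is a minimal sketch in Python; the pooled-variance formulas are standard, but the data and the margin are invented:

    # Two one-sided tests (TOST) for equivalence of two means.
    import numpy as np
    from scipy import stats

    def tost_equivalence(a, b, delta, alpha=0.05):
        a, b = np.asarray(a, float), np.asarray(b, float)
        na, nb = len(a), len(b)
        diff = a.mean() - b.mean()
        # Pooled standard error of the mean difference.
        sp2 = ((na - 1) * a.var(ddof=1)
               + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
        se = np.sqrt(sp2 * (1 / na + 1 / nb))
        df = na + nb - 2
        p_lower = stats.t.sf((diff + delta) / se, df)   # H0: diff <= -delta
        p_upper = stats.t.cdf((diff - delta) / se, df)  # H0: diff >= +delta
        p_max = max(p_lower, p_upper)                   # the "max p-value"
        return diff, p_max, p_max < alpha

    a = [101.2, 99.8, 100.5, 100.9, 99.6]
    b = [100.1, 100.7, 99.9, 100.4, 100.2]
    print(tost_equivalence(a, b, delta=3))  # equivalent within +/- 3?

The max p-value it computes is the same kind of summary you will see in the report in the demonstration later.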
Another alternative testing scenario is the Non-inferiority Test,
which aims to demonstrate that results are not substantially worse.
There is also a testing scenario called superiority testing,
that is similar to non-inferiority testing,
except that the goal is to demonstrate that results are substantially better.
There are five different types of equivalence-style tests,
depending on the situation.
When to use each test will be discussed next.
These tests are very important in industry,
especially in the biotech and pharma industry.
Here are some examples.
If the goal is to show that the new treatment
does not differ from the standard one
by more than some small margin, then an equivalence test should be used.
For example, consider a generic drug that is less expensive
and causes fewer side effects than a popular name-brand drug.
You would like to show it has the same efficacy as the name-brand one.
The typical goal in non-inferiority testing is to conclude
that a new treatment or process
is not significantly worse than the standard one.
For example, suppose a new manufacturing process is faster.
You would want to make sure it creates no more product defects than the standard process.
A superiority test tries to prove
that the new treatment is substantially better than the standard one.
For example, a new fertilizer has been developed with several improvements.
The researcher wants to show
that the new fertilizer is better than the current fertilizer.
How do we set up the hypotheses?
The graph on the left
summarizes these five different types of equivalence-style tests very nicely.
This graph was created by John Castelloe and Donna Watts of SAS.
You can find their white paper easily on the web.
Choosing which test to use depends on the situation.
For each situation,
the blue region is the region that you are trying to establish with the test.
For an equivalence analysis, you can construct an equivalence region
with upper bound theta zero plus delta and lower bound theta zero minus delta.
You can conduct an equivalence test by checking
whether the confidence interval of theta
lies entirely in the blue equivalence region.
Likewise, you can conduct a non-inferiority test by checking
whether the confidence interval of theta
lies entirely above the lower bound if large theta is better,
or below the upper bound if smaller theta is better.
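In code, these graphical decision rules reduce to simple interval checks. A minimal sketch, assuming the confidence interval (lo, hi) for theta has already been computed elsewhere:

    # Decision rules for a confidence interval (lo, hi) on theta.
    def is_equivalent(lo, hi, theta0, delta):
        # The CI must lie entirely inside (theta0 - delta, theta0 + delta).
        return theta0 - delta < lo and hi < theta0 + delta

    def is_non_inferior(lo, hi, theta0, delta, larger_is_better=True):
        # The CI must lie entirely on the "not worse" side of the margin.
        if larger_is_better:
            return lo > theta0 - delta
        return hi < theta0 + delta

    print(is_equivalent(-1.2, 1.8, theta0=0.0, delta=3.0))   # True
    print(is_non_inferior(0.3, 2.1, theta0=0.0, delta=0.5))  # True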
These tests are available in the Oneway platform for comparing normal means
and in the Contingency platform for comparing response rates.
The graphical user interface of the equivalence test launch dialog
makes it easy for you to find the type of test
that corresponds to what you are trying to establish.
A forest plot in the report summarizes the comparison very nicely
and makes it easy for you to interpret the results.
Next, I'm going to demonstrate equivalence tests.
I'm going to use the data set called Drug Measurements,
which is in the JMP sample data, as my first example.
Twelve different subjects were given three different drugs, A, B, and C,
and 32 continuous measurements were collected.
We go to Fit Y by X,
and we load the response and the treatment.
This brings up the Oneway analysis.
Under the red triangle, find Equivalence Test.
There are two options, Means and Standard Deviations.
We are going to focus on means in this talk.
We bring up the launch dialog, and you can select the test that you would like to conduct,
and the graph will represent the selected test.
For the superiority or non-inferiority tests there are two scenarios:
larger difference is better, or smaller difference is better.
Choose the option depending on the situation.
You also need to specify the margin, delta, here,
and the significance level alpha as well.
You can choose pooled variance or unequal variance to run the test.
You can do all pairwise comparisons
or you can do a comparison with a control group.
We're going to run an equivalence test first,
and we will specify 3 as the margin for the difference.
We click the OK button.
Here is the result of the equivalence test.
From this forest plot you can see that
the confidence interval for the mean difference between drug A and drug C
is completely contained in this blue equivalence region.
The max p-value is less than .05.
We can conclude, at the .05 significance level,
that drug A and drug C are equivalent.
But if we look at drug A and B, and drug B and C,
we can see that the confidence intervals of the mean differences
both extend beyond this blue region.
At the .05 significance level
we cannot conclude that drugs A and B, or drugs B and C, are equivalent.
Suppose drug C is our standard drug, and we would like to find out
if the measurements of drug A or B are much better than drug C.
We can run a superiority test to prove this.
Let me close this outline node first, and bring up the launch dialog again.
This time we're going to do a superiority test.
For this test we believe a larger difference is better,
so we keep this selection.
Also, for this study, we want to set drug C as our control group.
We plug in the delta, the margin, .04 in this case, and click the OK button.
Here is the result of the superiority test.
From the forest plot you can easily see that the confidence interval
of the mean difference between drug B and C
is completely contained in this superiority region
and the p-value is less than .05.
We conclude that drug B is superior to drug C.
The confidence interval of the mean difference
between drug A and C extends beyond this blue region,
and the p-value here is much bigger than .05.
At the .05 significance level,
we cannot conclude that drug A is superior to drug C.
This concludes my first example.
Now I'm going to use a second example,
based on the relative risk between two proportions,
to show you how to conduct a non-inferiority test.
Let me bring up the data table.
The trial compares a drug
called FIDAX as an alternative to the drug VANCO for the treatment of colon infections.
Both drugs have similar efficacy and safety.
221 out of 224 patients treated with FIDAX
achieved clinical cure by the end of the study,
compared to 223 out of 257 patients treated with VANCO.
We're going to launch Fit Y by X again
and put in our response and treatment variables; the count will be Freq.
Since the response variable is categorical,
a contingency analysis is produced,
and all the tests here are based on classical hypothesis tests.
The p-value suggests that we cannot conclude
that clinical cure differs across the drugs.
But for this study we really want to find out
if drug FIDAX is not inferior to drug VANCO.
We go to the red triangle menu and find Equivalence Test.
There are two options, risk difference and relative risk.
We are going to choose relative risk for this case.
In the launch dialog we choose the non-inferiority test,
and a larger ratio is preferred for this study.
We also need to specify the category of interest.
For this study we select Yes as the category of interest,
and we also need to plug in our ratio margin here.
We specify 0.09.
We click the OK button, and here is the result of the non-inferiority test.
From the forest plot you can easily see that
the confidence interval for the relative risk between drug FIDAX and drug VANCO
is completely contained in this non-inferiority region.
We conclude at the .05 significance level,
drug FIDAX is not inferior to drug VANCO.
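If you would like to verify the arithmetic outside of JMP, here is a minimal sketch in Python using the counts above; the Wald interval on the log relative risk is one common construction and may differ slightly from the method JMP uses:

    # Relative risk with a Wald confidence interval on the log scale,
    # followed by the non-inferiority check (larger ratio is better).
    import math
    from scipy import stats

    cured_t, n_t = 221, 224  # FIDAX
    cured_c, n_c = 223, 257  # VANCO

    rr = (cured_t / n_t) / (cured_c / n_c)
    se = math.sqrt(1 / cured_t - 1 / n_t + 1 / cured_c - 1 / n_c)
    z = stats.norm.ppf(0.975)
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)

    margin = 0.09
    print(f"RR = {rr:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
    print("non-inferior:", lo > 1 - margin)  # lower bound above 1 - margin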
This concludes my talk and I will give it back to Mark.
Thank you, Jianfeng.
I'm going to now talk about a very common procedure called method comparison.
It's a standard practice whenever new measurements are being developed.
We have to assume that there is a standard method that exists already
to measure the level of some quantity.
Perhaps it's the temperature or the potency of a drug.
A new method has been developed for some reason.
We want to make sure that its performance is comparable to the standard method.
Today there are many standards that have been developed
over many years by various organizations to make sure that this is done properly.
What we would hope is that the new test method
ideally returns the same value as the standard method.
A scatter plot of the test method versus the standard method
would show that the data agree with the identity line Y = X.
But of course the data points won't perfectly agree
because of measurement error
in both the standard method and the new test method.
Regression analysis can determine the best fit line for this data
and the estimated model parameters can be compared to that identity line.
This ends up being stated in the two hypotheses as follows.
The null hypothesis
says that they're not comparable and so another way of saying that is
the intercept is not zero and the slope is not one.
The alternative
represents our claim that the new method is comparable
and so we would expect the intercept to be zero and the slope to be one.
We'll compare by using regression. Ordinary least squares regression
assumes a few different things.
It assumes that Y and X are linearly related.
It assumes that there are statistical errors in Y but not in X.
These statistical errors are independent of Y, that is, they're constant for all Y.
And it assumes there are no data that exert excessive influence on the estimates.
But in the case of a method comparison,
the data often violate one or more of these assumptions.
There are measurement errors in the standard method as well.
Also, the errors are not always constant,
in which case we might observe that the coefficient of variation is constant.
That is, the errors are proportional, but the standard deviation is not constant.
Finally, there are often outliers present
that can strongly influence the estimation of these parameters.
Other regression methods can help.
Deming regression will simultaneously minimize
the least squared error in both Y and X,
and Passing-Bablok regression is a non-parametric method.
It's based on the median of all possible pairwise slopes,
and because of that it's resistant to outliers and non-constant errors.
The Deming regression is available in JMP
through the Bivariate platform using the Fit Orthogonal command.
Deming regression can estimate the fit in several ways.
It can estimate the error in both Y and X,
or it can assume that the errors in Y and X are equal,
or it can use a given ratio of the error of Y to X.
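For the curious, Deming regression has a closed-form solution. Here is a minimal sketch in Python, where lam is the assumed ratio of the Y error variance to the X error variance and the measurements are invented:

    # Deming regression: errors in both Y and X, with lam = var(Y error)
    # / var(X error). lam = 1 corresponds to orthogonal regression.
    import numpy as np

    def deming(x, y, lam=1.0):
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxx = np.var(x, ddof=1)
        syy = np.var(y, ddof=1)
        sxy = np.cov(x, y)[0, 1]
        slope = (syy - lam * sxx
                 + np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)
                 ) / (2 * sxy)
        intercept = y.mean() - slope * x.mean()
        return intercept, slope

    x = [1.0, 2.1, 3.0, 3.9, 5.2, 6.1]  # standard method
    y = [1.1, 2.0, 3.2, 4.1, 5.0, 6.3]  # new test method
    print(deming(x, y))  # ideally close to (0, 1)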
Passing-Bablok regression is now available in JMP 17,
again through the Bivariate platform, using the Fit Passing-Bablok command.
It also includes checks for the assumptions that
the measurements are highly positively correlated
and exhibit a linear relationship.
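The heart of Passing-Bablok is easy to state in code: the slope is a shifted median of all pairwise slopes. Here is a simplified sketch in Python, again with invented data and without the confidence intervals that the full method provides:

    # Passing-Bablok regression: slope = shifted median of pairwise slopes.
    import numpy as np
    from itertools import combinations

    def passing_bablok(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        slopes = []
        for i, j in combinations(range(len(x)), 2):
            if x[i] != x[j]:
                s = (y[j] - y[i]) / (x[j] - x[i])
                if s != -1:  # slopes of exactly -1 are excluded
                    slopes.append(s)
        slopes = np.sort(slopes)
        k = int(np.sum(slopes < -1))  # offset corrects for negative slopes
        n = len(slopes)
        if n % 2:
            slope = slopes[(n + 1) // 2 + k - 1]
        else:
            slope = 0.5 * (slopes[n // 2 + k - 1] + slopes[n // 2 + k])
        intercept = np.median(y - slope * x)
        return intercept, slope

    x = [1.0, 2.1, 3.0, 3.9, 5.2, 6.1]  # standard method
    y = [1.1, 2.0, 3.2, 4.1, 5.0, 6.3]  # new test method
    print(passing_bablok(x, y))  # again, ideally close to (0, 1)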
There's also a comparison by differences.
The Bland-Altman analysis compares
the pairwise differences to the pairwise means
to assess the bias between these two measurements.
The results are presented in a scatter plot of Y versus X for your examination
and also to see if there are any anomalies in the differences.
This is all provided through the Matched Pairs platform, along with
several hypothesis tests.
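The Bland-Altman computation itself is short. Here is a minimal sketch in Python with invented paired measurements; the 1.96 multiplier for the limits of agreement is the conventional choice:

    # Bland-Altman analysis: differences vs means, bias, and limits of
    # agreement (bias +/- 1.96 standard deviations of the differences).
    import numpy as np

    standard = np.array([1.0, 2.1, 3.0, 3.9, 5.2, 6.1])
    test = np.array([1.1, 2.0, 3.2, 4.1, 5.0, 6.3])

    diffs = test - standard        # plotted on the vertical axis
    means = (test + standard) / 2  # plotted on the horizontal axis

    bias = diffs.mean()            # we hope this is near zero
    sd = diffs.std(ddof=1)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    print(f"bias = {bias:.3f}, limits of agreement = "
          f"({loa[0]:.3f}, {loa[1]:.3f})")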
I'll now demonstrate these features.
I'm going to show you Deming regression for completeness;
it's actually been available in JMP for many years.
I'm going to use a data table which has measurements for 20 samples
by the standard method and then four different test methods.
I'm just going to use method 1.
I start by selecting Fit Y by X from the Analyze menu.
The standard goes in the X role, while method one goes in the Y role.
Here we have the scatter plot to begin with.
I'll click the red triangle and select Fit Orthogonal
and you can see the different choices I mentioned just a moment ago.
I'm going to have JMP estimate the errors in Y and X.
There's a best fit line using Deming regression along with
information about that.
We can see that our intercept for the estimated line is close to zero,
our slope is close to one, and in fact our confidence interval includes one.
Now I'm going to show you Passing-Bablok.
I return to the same red triangle,
select Fit Passing-Bablok,
and a new fitted line is added to my scatter plot.
It looks very much like the result from the Deming regression.
But remember that Passing-Bablok
is resistant to outliers and non-constant variance.
First we have Kendall's test, which tells us about the correlation.
The positive correlation is statistically significant.
We then have a check for linearity,
and we have a high p-value here, indicating we cannot reject linearity.
Finally we have the regression results.
I see that I have an intercept close to zero, and the interval includes zero,
so I can't reject zero.
The slope is close to one.
My interval includes one, so I can't reject that the slope is one.
Finally, using the Passing-Bablok fitted curve menu,
I'll click the red triangle and select Bland-Altman Analysis.
This launches the Matched Pairs platform, so it's a separate window.
Here we are looking at the pairwise differences
between method one and the standard
versus the mean of those two values.
We're using this to assess a bias.
The Bland-Altman analysis is reported at the bottom.
The bias is the average difference.
We hope that it's zero.
The estimate is not exactly zero,
but we can see that the confidence interval includes zero,
so we would not reject zero.
We also have the limits of agreement,
and we see that they include zero as well.
The standard methods that are used
when comparing two measurement methods are now available in JMP 17.
That concludes our presentation.
Thank you for watching.