how do I compare sensitivity and specificity among several methods?

Maureen · Jun 8, 2023 5:22 PM

Hi all

I have data for a reference method and three alternate methods. I have columns for each of the three alternate methods that, for each row, indicate if the specimen is a true positive (TP), false positive (FP), true negative (TN) or false negative (FN). I want to evaluate the significance of the differences between the three methods for sensitivity (TP/(TP+FN)), specificity (TN/(TN+FP)), positive predictive value (TP/(TP+FP)), negative predictive value (TN/(TN+FN)) and accuracy ((TP+TN)/total).
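
For readers who want to see the five formulas spelled out, here is a minimal Python sketch (not JMP). The function name `summary_stats` and the outcome list are made up for illustration, and the code assumes every denominator is non-zero.

```python
from collections import Counter

def summary_stats(outcomes):
    """Compute the five summary statistics from a list of "TP"/"FP"/"TN"/"FN" labels.

    Assumes each denominator is non-zero (e.g. at least one TP or FN).
    """
    c = Counter(outcomes)
    tp, fp, tn, fn = c["TP"], c["FP"], c["TN"], c["FN"]
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(outcomes),
    }

# Hypothetical outcomes for one alternate method (made-up data).
method_a = ["TP"] * 8 + ["FN"] * 2 + ["TN"] * 9 + ["FP"] * 1
stats = summary_stats(method_a)
print(stats)
```

Each statistic is one number per method, computed from counts over all rows, not a value per row.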

How can I make a summary statistic from a formula such that I can tell if the different methods are significantly different from each other?

Maureen

SDF1 · Oct 7, 2020 04:03 PM

Hi @Maureen ,

There are a couple of ways to go about this, and it depends on your table's layout. If it's stacked, meaning you have a column labeled "Method" for example, with possible entries of "reference", "Alt A", "Alt B", etc., then you would create a column for each of the stats you're looking at (sensitivity, specificity and so on), where each stat is calculated by the formulas you have. You would then do an ANOVA on the stacked column for sensitivity, etc. The ANOVA should give you an idea of whether there is a significant difference in the sensitivity of A vs B vs C, etc. Having it stacked will really help to see not only how each of the three alternative methods compares to the reference, but also how they all compare to each other.

If your table is arranged row-wise by the experimental run, then you'd want to create a new data table where you stack the stat columns in order to run the ANOVA.

It might be easier to address your specific issue if you're able to share the data table (anonymize it if need be) to better understand the data structure and where you're trying to go.

Out of curiosity, are your TP, FN, TN, FP values binary 0 or 1, or do they range between 0 and 1? Also, are you wanting to sum the stats (sensitivity, etc.) across all runs of the experiment to compare the aggregate values, or evaluate it on a run by run basis?

Hope this helps!,

DS

Maureen · Oct 8, 2020 08:54 AM

The values are literally characters that say "TP", "TN", "FP", "FN". So the calculations of sensitivity, specificity etc. are summary statistics for the method using the counts of the values, not something calculated for each row. I do have the data stacked and can do a chi square for the four counts, but each summary statistic only uses two of the four available options. So I'm not sure how to do a chi square with, say, only the numbers of TP and FN (which are used to calculate sensitivity) without the TN and FP counts entering into the overall significance of the chi square.
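
One way to read this question is as a chi-square on a 2 x 3 table that keeps only the TP and FN counts, one column per method. Below is a rough Python sketch of that idea (not JMP); the counts are made up and `chi_square_2xk` is a hypothetical helper. With three methods the test has 2 degrees of freedom, where the chi-square upper-tail probability is exactly exp(-x/2), so no statistics library is needed. Note the test assumes independent samples; if the same specimens were run through every method, a paired test such as Cochran's Q or McNemar's test may be more appropriate.

```python
import math

def chi_square_2xk(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts among reference-positive specimens only,
# one entry per alternate method (A, B, C). TN and FP are left out on purpose.
tp = [45, 40, 30]
fn = [5, 10, 20]

chi2 = chi_square_2xk([tp, fn])
# Degrees of freedom = (2 - 1) * (3 - 1) = 2, so the p-value is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

The same approach with TN and FP counts would compare specificity instead.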

SDF1 · Oct 8, 2020 09:46 AM

Hi @Maureen ,

I think I understand how your data is organized. If I do, and if I understand how you are trying to summarize each method's statistical evaluation, you're going to need to do some data table manipulation to get it in the right format. Here's what I'd do:

Make a summary of your stacked data where you put the column containing the TP, FP, TN, FN in the "Group" entry and then the column that contains the information whether the row was reference, method A, method B, or method C as "Subgroup" (be sure to uncheck the "link to original data table" box). This creates a new data table with a column of the FN, TN, etc. characters, a column "N Rows" that says how many rows had that value and then several other columns with names like N(Method A), N(method B), N(reference), etc. These columns contain the counts of each of the TP, FP, TN, FN events.
You'll need to then transpose this table. You'll want to put all the columns that are titled "N(name of something)" in the "Transpose Columns" selection and then for "Label" you'll want to use the column (maybe named "Data") containing the TP, TN, etc. values. This will generate a new data table that has a column called "Label", which will contain the column names from the previous table, i.e. "N(method A)" and so on. There will now be columns with the names "TP", "TN", "FP", "FN" and each row will contain the count of the number of times that particular event occurred for the specific method.
Now, you will create your statistic summary columns as formulas. Sensitivity will be TP/(TP+FN), specificity will be TN/(TN+FP), and so on for each of the summary statistics that you want to calculate.
The last step would be to plot using the Fit Y by X platform. You'd use the "Label" column as the X Factor (this is the column containing values like "N(method A)", etc.) and your summary statistic (sensitivity, specificity, etc.) as the Y Response. Unfortunately, since there is only one value of each summary statistic per method, rather than a statistic for each row, you won't be able to do an ANOVA on the data. You will be able to see the overall picture for each method and how good each method is for the different statistics.
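
For anyone following along outside JMP, the summarize, transpose and formula steps above can be sketched in Python. This is only an illustration under assumptions: the stacked rows are made-up data, and the names `stacked` and `wide` are hypothetical.

```python
from collections import Counter

# Hypothetical stacked data: one (method, outcome) pair per row.
stacked = [
    ("Method A", "TP"), ("Method A", "FN"), ("Method A", "TN"), ("Method A", "TN"),
    ("Method B", "TP"), ("Method B", "TP"), ("Method B", "FP"), ("Method B", "TN"),
]

# Like the summary step: count rows for each (method, outcome) combination.
counts = Counter(stacked)
methods = sorted({method for method, _ in stacked})

# Like the transpose step: one row per method, one column per outcome.
wide = {
    method: {outcome: counts[(method, outcome)] for outcome in ("TP", "FP", "TN", "FN")}
    for method in methods
}

# Like the formula columns: one summary statistic value per method.
for row in wide.values():
    row["sensitivity"] = row["TP"] / (row["TP"] + row["FN"])
    row["specificity"] = row["TN"] / (row["TN"] + row["FP"])

for method, row in wide.items():
    print(method, row)
```

The result has a single sensitivity and specificity value per method, which is why an ANOVA is not possible at this point.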

I hope I've understood the problem correctly. Again, it helps to be able to see the data structure to know exactly how to move from where you are to where you want to be.

Hope this helps!,

DS

Maureen · Oct 9, 2020 07:48 AM

Thank you. All of this works as you described. But I do want a way of showing statistically whether there are significant differences between the methods for these summary statistics.

I can do contingency tables with the inputs, but each summary statistic is a combination of two inputs, so I'm not sure whether a significant difference in the number of FP translates to a significant difference in the two calculations that use it (specificity and positive predictive value).

SDF1 · Oct 9, 2020 10:38 AM

Hi @Maureen ,

It sounds like you want to do something like a Profit Matrix or Cost Sensitive Learning. This would be done by doing a nominal logistic fit in the fit model platform.

Either that, or you would have to replicate your experiment several times to gather enough statistics for each method that you can determine p-values for the summary stats of sensitivity, specificity, etc.

For example if Run 1 of the experiment results in:

Ref = TP

A = FP

B = TP

C = FN

But, if you repeat Run 1 again (after you've gone through all the other runs), maybe you get:

Ref = TP

A = FP

B = FN

C = TP

This would result in the numerical count being different for Run 1 for the different methods. As a result, if you then do a Fit Y by X, where the y-axis is the counts for each of the different runs and the x-axis is the different methods (ref, a, b, c), then each method (except maybe the reference method) should have a spread in the counts, which would result in slightly different summary statistics. The ANOVA would look at the means and standard deviations of those summary statistics to find out whether there are any significant differences between the methods. Since you'd be comparing four different methods, you'd want to do a Tukey-Kramer comparison of means rather than a Student's t.
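
To make the replicate idea concrete, here is a small Python sketch of the one-way ANOVA F statistic computed on replicated sensitivities. The numbers are invented (three replicate experiments for each of three alternate methods) and `one_way_anova_f` is a hypothetical helper. It stops at the F statistic and degrees of freedom; the p-value would come from an F distribution table or a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical sensitivity from each of three replicate experiments per method.
sensitivity = {
    "A": [0.90, 0.88, 0.92],
    "B": [0.80, 0.82, 0.78],
    "C": [0.60, 0.65, 0.55],
}

f_stat, df1, df2 = one_way_anova_f(list(sensitivity.values()))
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

A large F relative to the within-method spread suggests at least one method differs; pairwise comparisons such as Tukey-Kramer would then say which ones.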

Another option would be to look into a Bland-Altman analysis. You'd have to recode the TP, TN, FP, FN to actual numerical values. See for example here. This analysis plots the difference between the reference and one method against the average of the two. In other words, it would look at DIF = Ref-A (on the y-axis) and AVG = (Ref+A)/2 (on the x-axis). You could do this analysis on the FP, TN, etc. or the summary statistics.
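
As a rough illustration of the DIF and AVG quantities, here is a Python sketch. Bland-Altman is normally applied to paired numeric measurements, so the values below are made-up numbers standing in for whatever numeric recoding is chosen; `bland_altman` is a hypothetical helper, and the 1.96 x SD limits of agreement are the conventional choice.

```python
from statistics import mean, stdev

def bland_altman(ref, alt):
    """Return differences, averages, mean difference (bias) and limits of agreement."""
    diffs = [r - a for r, a in zip(ref, alt)]       # DIF = Ref - A (y-axis)
    avgs = [(r + a) / 2 for r, a in zip(ref, alt)]  # AVG = (Ref + A) / 2 (x-axis)
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return diffs, avgs, bias, (bias - spread, bias + spread)

# Hypothetical paired numeric values for the reference and method A.
ref = [10.0, 12.0, 9.0, 11.0, 13.0]
alt = [9.5, 12.5, 9.0, 10.0, 13.5]

diffs, avgs, bias, (lower, upper) = bland_altman(ref, alt)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Plotting `diffs` against `avgs` gives the usual Bland-Altman picture, with horizontal lines at the bias and the two limits.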

Depending on how many rows (experimental runs) you have, I think you might be better off doing an analysis like in the first paragraph.

As mentioned before, if you can share your data table (you can always anonymize it (Tables > Anonymize)), it would be a lot easier to determine exactly how to help you.

Good luck!,

DS
