A typical diagnostic test has two outcomes, positive or negative, with the goal of classifying subjects correctly against a “gold standard” or known outcome. These tests have specific performance measures that are used to assess their clinical value. Two basic measures are sensitivity and specificity: sensitivity is the percentage of truly positive subjects who test positive, and specificity is the percentage of truly negative subjects who test negative. There are additional measures of interest as well, which are outlined in the FDA’s Statistical Guidance on Reporting Results from Studies Evaluating Diagnostic Tests. These performance measures are based on the 2x2 contingency table of truth (i.e., diagnosis by some known standard, often invasive) vs. the test result.
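To make the arithmetic concrete, here is a short sketch (in Python, since the add-in's JSL source isn't reproduced here) of the four common measures computed from 2x2 cell counts. The true-positive, false-negative, and true-negative counts match the example that follows; the false-positive count of 9 is purely illustrative.

```python
# 2x2 cell counts (truth vs. test result). tp, fn, tn match the example
# discussed below; fp = 9 is a hypothetical, illustrative value.
tp, fp = 29, 9    # test positive: truly positive / truly negative
fn, tn = 1, 11    # test negative: truly positive / truly negative

sensitivity = tp / (tp + fn)  # P(test + | truly +)  = 29/30
specificity = tn / (tn + fp)  # P(test - | truly -)  = 11/20
ppv = tp / (tp + fp)          # P(truly + | test +)  = 29/38
npv = tn / (tn + fn)          # P(truly - | test -)  = 11/12
```

Note that sensitivity and specificity condition on the truth, while the predictive values (PPV and NPV) condition on the test result.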
For example, consider a data set taken from Zhou et al. (Statistical Methods in Diagnostic Medicine, Wiley, 2002).
Our data table has two columns of interest: “Truth” and “Test Result.” We can obtain a contingency table (shown below) using the Fit Y by X platform with Y = Truth and X = Test Result. From this contingency table, we can read off our performance measures.
Recall that sensitivity is the proportion of truly positive subjects who test positive. For our example, that is 29/30, or the Col % value of 96.67 in our table. Another performance measure is negative predictive value, or NPV: the proportion of subjects who test negative that are truly negative. In our example, the NPV is 11/12, or the Row % of 91.67. So our diagnostic performance results are in the contingency table if we know where to look; however, their confidence intervals are missing. To get the confidence intervals, we can use the Distribution platform, as long as we use the correct subset of subjects for each performance measure: sensitivity is based on the truly positive subjects, NPV on those who test negative, and so on.
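As an illustration of the kind of interval involved, here is a sketch of the Wilson score interval for a binomial proportion, applied to the sensitivity (29 of 30) and NPV (11 of 12) above. The function name is my own, and this assumes the standard Wilson score formula; the add-in's own score intervals may differ in detail.

```python
import math
from statistics import NormalDist

def wilson_ci(x, n, alpha=0.05):
    """Wilson score interval for a binomial proportion x/n (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Sensitivity: 29 of 30 truly positive subjects test positive.
sens_lo, sens_hi = wilson_ci(29, 30)   # roughly (0.833, 0.994)
# NPV: 11 of 12 test-negative subjects are truly negative.
npv_lo, npv_hi = wilson_ci(11, 12)     # roughly (0.646, 0.985)
```

Notice how much wider the NPV interval is: it is based on only the 12 test-negative subjects, which is why using the correct subset for each measure matters.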
Because I need these performance measures regularly and have to share them with others in a clear format, I turned to JSL. Now the JMP add-in “Performance Measures” puts all the summary statistics I need for a diagnostic test in one place. When you invoke the add-in, you are presented with a dialog box in which to enter the diagnosis column (i.e., the true outcome) and the test result column; if you have a grouping column, you can use that as well. The dialog box also lets you set the significance level for your confidence intervals.
Next, you will be prompted for the levels corresponding to a positive test. Note that if you want your 2x2 table in the “standard” format, with the upper-left cell as the positive/positive cell, you will need to use the “Value Ordering” column property to set the positive outcome as the top value for each of your columns.
The resulting output is as follows:
The output was built around what I most often need: the 2x2 table of counts and the table of performance measures with confidence intervals. I retained a graphic in my output to keep with the graphical spirit of JMP platforms. The script uses the categorical modeling platform to obtain the 2x2 table and the frequency chart, and then references the values in the 2x2 table to generate the performance measures table. The confidence intervals are score confidence intervals, except those for the likelihood ratios, which are based on the method in Simel DL, Samsa GP, Matchar DB. “Likelihood ratios with confidence: sample size estimation for diagnostic test studies.” J Clin Epidemiol. 1991;44(8):763-70.
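For the likelihood ratio intervals, the log method of Simel et al. builds the interval on the log scale using a standard error computed from the four cell counts, then exponentiates back. Here is a sketch for the positive likelihood ratio; the function name is my own, and the false-positive count of 9 is an illustrative value (only tp, fn, and tn come from the example above).

```python
import math
from statistics import NormalDist

def lr_plus_ci(tp, fp, fn, tn, alpha=0.05):
    """CI for the positive likelihood ratio via the log method
    (per Simel, Samsa & Matchar, 1991). Requires tp > 0 and fp > 0."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr = sens / (1 - spec)
    # Standard error of ln(LR+) from the four cell counts.
    se_log = math.sqrt(1/tp - 1/(tp + fn) + 1/fp - 1/(fp + tn))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (lr,
            math.exp(math.log(lr) - z * se_log),
            math.exp(math.log(lr) + z * se_log))

# tp, fn, tn as in the example above; fp = 9 is hypothetical.
lr, lo, hi = lr_plus_ci(tp=29, fp=9, fn=1, tn=11)
```

Working on the log scale keeps the interval positive and asymmetric, which suits a ratio statistic like LR+.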
The performance measures script is available as part of the “Medical Diagnostic Tools” add-in in the JMP File Exchange (download requires a free SAS login). So far this toolkit has two tools: the one described above and a second that calculates confidence intervals for the AUC of an ROC curve.