Locating potentially fraudulent data in SDTM Findings domains
We continue our discussion of unusual and potentially fraudulent data from clinical trials, and today we focus on data that occur in CDISC Findings domains. From the SDTM Implementation Guide, Findings domains represent data from planned evaluations that address specific tests or questions. For example, these domains contain data from laboratory tests; vital signs such as blood pressure, heart and respiratory rates, and body temperature; data from questionnaires; or evaluations of therapeutic area-specific signs and symptoms. There are numerous graphical and analytical approaches to identify unusual observations, particularly if these data are repeatedly measured across time. Here, we describe two straightforward methods from the upcoming JMP Clinical 4.1 that identify some form of data duplication.
Our previous discussion focused on identifying subjects who may have enrolled in a study multiple times, typically at more than one clinical site. In general, however, fraudulent behavior occurs within the study site, often at a single center or a small number of centers. Our ability to detect unusual data may depend on how the data at one site differ from the rest of the pack. These signals include differing response rates and trends across time, or unusual or atypical associations between variables. While it may be straightforward to make up a blood pressure at any given time point, the effects of time and the relationships to other variables are difficult to account for. However, the methods we describe today do not rely on comparing data across sites, except perhaps in the frequency with which hits occur.
The first check for SDTM Findings domains is the Constant Findings Analytical Process (AP). This AP searches for any tests where there is no variability within a subject, which could be an indication that results were copied throughout the CRF. In Figure 1, we have summarized the occurrence of constant lab tests from the Nicardipine trial. For Bilirubin, it appears that 45 subjects had no change in their values from the time they entered the trial until the time they exited.
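JMP Clinical performs this check internally, but the underlying idea is simple enough to sketch. The fragment below is a rough illustration in Python/pandas, not the product's implementation: it uses SDTM-style column names (USUBJID, LBTESTCD, LBSTRESN) and invented bilirubin values, and flags any subject/test combination with more than one measurement but no variability.

```python
# Sketch of a "constant findings" check on a long-format, LB-like table.
# Data and values are invented for illustration; column names follow
# SDTM conventions but this is not the JMP Clinical implementation.
import pandas as pd

lb = pd.DataFrame({
    "USUBJID":  ["001"] * 3 + ["002"] * 3,
    "LBTESTCD": ["BILI"] * 6,
    "VISITNUM": [1, 2, 3, 1, 2, 3],
    "LBSTRESN": [0.8, 0.8, 0.8, 0.7, 0.9, 1.1],
})

# Count visits and distinct results per subject and test.
summary = (lb.groupby(["USUBJID", "LBTESTCD"])["LBSTRESN"]
             .agg(n_visits="size", n_distinct="nunique")
             .reset_index())

# Flag repeated measurements with no variability at all.
constant = summary[(summary["n_visits"] > 1) & (summary["n_distinct"] == 1)]
print(constant)
```

Here subject 001 would be flagged (three identical bilirubin results), while subject 002 would not. In practice the count of constant tests per subject, as in Figure 1, is the more useful summary.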
While unusual, these results may not necessarily indicate that fraudulent behavior is occurring. Some laboratory tests may show little variability across time. Further, for subjects who discontinue the trial early, there may only be a handful of repeated measurements. As we see in Figure 2, the majority of subjects had repeated values that occurred only twice before they discontinued. Only in one instance did a subject have a value that occurred across four visits. This may be entirely plausible, but clinical judgment will decide. On the other hand, if a subject completes a study of lengthy duration and there is no change across time in multiple tests, this may warrant further investigation.
The results could also highlight certain deficiencies in CRF design. For example, it might be completely natural to collect data for a physical exam at each study visit. For each body system, we may report whether the exam was normal or abnormal, and if abnormal, list the particular abnormalities. Collecting data in this manner allows us to summarize abnormality rates across time. However, these abnormalities are typically so rare that there are often no results to report! We now find that we have to report that the subjects were normal across every visit of the trial. This may result in a lot of data to sift through in order to find anything interesting (read: problematic!).
Further, if these spontaneously occurring abnormalities are “clinically significant,” we may find the need to collect them as adverse events. We are now in a position of performing consistency checks across CRF domains. Perhaps a more straightforward approach is to document that physical exams were performed, but report physical exam abnormalities as adverse events. This reduces the amount of data collected and eliminates the need for cross-domain consistency checks.
The Duplicate Records AP identifies sets of results that occur more than once. What does “record” mean? For our purposes, a record will refer to a set of tests reported at the same visit number and time point (if applicable) from the same domain. This will often encompass a single panel on a CRF page — for example, the set of vital sign measurements (stored in the SDTM VS domain) collected for a subject at a given visit.
It would be extremely unusual to find the same exact set of measurements repeated within a subject, or across subjects within the same clinical site. Multiple hits between two subjects may indicate that CRF pages were copied between those subjects. Even worse, a large number of multiple hits may identify a fictitious subject. Hits within a subject can identify “carried-forward” problems similar to those described above, or may indicate visits that did not actually occur for the subject. Figure 3 identifies one such triplicate of vital sign measurements shared by two subjects within the same clinical site.
While unusual, it is possible for a record (i.e., a set of measurements) to occur between or within subjects. However, if this same record is repeated among multiple subjects within a clinical site, or repeated across multiple visits or time points, or there are a large number of tests that make up the record (say, a lab panel of a dozen or more tests), this would warrant additional investigation.
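As with the previous check, a rough sketch may make the idea concrete. The code below is an illustration, not the JMP Clinical implementation: it pivots an invented VS-like table so that each row is one subject-visit panel, then flags panels whose full set of measurements appears more than once, whether within or across subjects.

```python
# Sketch of a "duplicate records" check on a long-format, VS-like table.
# Data are invented for illustration; a "record" is the full panel of
# test results for one subject at one visit.
import pandas as pd

vs = pd.DataFrame({
    "USUBJID":  ["001", "001", "001", "002", "002", "002"],
    "VISITNUM": [1, 1, 1, 2, 2, 2],
    "VSTESTCD": ["SYSBP", "DIABP", "PULSE"] * 2,
    "VSSTRESN": [120, 80, 72, 120, 80, 72],  # identical panels
})

# One row per subject-visit, one column per test.
wide = (vs.pivot_table(index=["USUBJID", "VISITNUM"],
                       columns="VSTESTCD", values="VSSTRESN")
          .reset_index())

# Any rows that agree on every measurement column are duplicate panels.
tests = [c for c in wide.columns if c not in ("USUBJID", "VISITNUM")]
dups = wide[wide.duplicated(subset=tests, keep=False)]
print(dups)
```

Both subjects share the identical panel (120/80, pulse 72), so both rows are flagged. The count of hits per site, rather than any single match, is what would drive further review.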
As we have discussed, these unusual signals could have completely plausible explanations, and based on a small number of subjects, it may be extremely difficult to conclude that data from a clinical trial site are the result of malice or fraud. Comparing the frequency of hits across sites may indicate that something unusual is taking place, and signals from a variety of visual and analytical data checks may suggest that additional review of a clinical site is necessary.