Identifying re-enrolled subjects in clinical trials

Over the next several blog posts, I will discuss various analyses that can be used to identify unusual and potentially fraudulent data from clinical trials. See Buyse et al. for a good summary of clinical trial fraud, including suggestions for statistical methods to help identify unusual data. JMP Clinical 4.1 will include several new analytical processes (APs) to aid statisticians, data managers and monitors in identifying subjects or clinical sites that may require greater scrutiny.

Fraud is difficult to diagnose. Oftentimes, methods to identify problematic values are graphical in nature, where data trends and patterns at a particular clinical site may vary substantially from the remaining sites. These graphs require review, and what qualifies as a “signal” may vary depending on the analyst. Further, while these differences may highlight a potential problem, they can also merely identify sites with important differences in how certain procedures are performed (which, ideally, will be standardized across sites as much as possible), or important differences in the available population of subjects. Worse yet, they can highlight certain deficiencies in the design and data collection of the study itself.

Let’s take a simple example. Patients in clinical trials receive excellent medical care, sometimes with the addition of financial reimbursement. The desire to maintain this care or to receive additional financial incentives may lead subjects to re-enroll in the trial at another participating study center. These duplicate-enrolled subjects are problematic. From a statistical perspective, the assumption of independence among study subjects is violated. Ignoring this violation can cause the standard error of the treatment effect to be underestimated. Alternatively, if the duplicates are identified after the trial has completed enrollment, power can be lost, since a straightforward way to handle duplicated data is to include efficacy data only for the first instance of each subject participating in the trial (which reduces the sample size). Sensitivity analyses may include these duplicated subjects. In any event, analysis and study reporting become much more difficult in the presence of duplicate subjects.
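To make the standard error point concrete, here is a minimal sketch for the simplest possible case: the mean of a single group in which some people contribute two records. It assumes a common variance and a correlation `rho` between the two records from the same person. The function name and parameters are hypothetical, and this is an illustration of the general idea rather than an analysis performed by JMP Clinical.

```python
import math

def se_of_mean(n_records, n_duplicated, rho, sigma=1.0):
    """Return (naive_se, adjusted_se) for the mean of n_records values.

    Assumptions (illustrative only):
    - n_duplicated people each contribute exactly two of the records
    - the two records from one person have correlation rho
    - all records share the same standard deviation sigma
    """
    # Naive formula treats every record as independent.
    naive = sigma / math.sqrt(n_records)
    # Var(sum) = n*sigma^2 + 2*d*rho*sigma^2, since each duplicated
    # person adds one correlated pair to the sum.
    adjusted = sigma * math.sqrt(n_records + 2 * n_duplicated * rho) / n_records
    return naive, adjusted

naive, adjusted = se_of_mean(n_records=100, n_duplicated=10, rho=0.8)
print(f"naive SE = {naive:.4f}, adjusted SE = {adjusted:.4f}")
```

Whenever `rho` is positive, the adjusted value is larger than the naive one, which is the sense in which ignoring duplicates understates the uncertainty.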

How is it possible to identify subjects that have participated more than once? Straightforward ways of identifying potential re-enrollers include matching subject IDs by birth date or initials. See Figure 1. This output is from the Birthdays and Initials AP using data from Nicardipine. Subjects are matched either by birth date, with options to allow for windows around a birth date (in case dates are entered into the database incorrectly); or by initials accounting for the possibility that a previously reported middle initial may go unreported at another site. Here, differing sex and race within a match can quickly identify pairs that do not require further attention. If available, JMP Clinical will also summarize ethnicity, height and weight in this table.

For example, the first two rows, representing 15Apr1915, are likely not the same subject since the sex differs. However, the pair for 28Jul1928 (rows 7 and 8) may require some review. Finally, review the pair of subjects in the last two rows for 18Oct1956. They match on sex and race and are participating at the same site! Using the Show Subjects drill-down to review additional data from DM/ADSL may provide clarity. A worse scenario is that a duplicate CRF was submitted for this subject (possibly just a sloppy error). Another possibility is that twins have enrolled at this site!
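The matching rules described above can be sketched outside JMP Clinical in a few lines of Python. This is a rough illustration under stated assumptions, not the implementation of the Birthdays and Initials AP: the `Subject` fields, the function names, the `window_days` parameter and the sample records are all made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

@dataclass(frozen=True)
class Subject:
    usubjid: str      # hypothetical subject identifier
    site: str
    birth_date: date
    initials: str
    sex: str
    race: str

def birth_dates_match(a, b, window_days=0):
    # A window allows for birth dates entered slightly wrong at one site.
    return abs((a.birth_date - b.birth_date).days) <= window_days

def initials_match(a, b):
    x, y = a.initials.upper(), b.initials.upper()
    if not x or not y:
        return False
    if x == y:
        return True
    # A middle initial reported at one site may be omitted at another:
    # compare first and last initials when one value has 2 letters and
    # the other has 3.
    if {len(x), len(y)} == {2, 3}:
        return x[0] == y[0] and x[-1] == y[-1]
    return False

def candidate_pairs(subjects, window_days=0):
    """List pairs matching on birth date or initials, with a flag showing
    whether sex and race also agree (pairs that disagree are unlikely to
    be the same person)."""
    pairs = []
    for a, b in combinations(subjects, 2):
        by_birth = birth_dates_match(a, b, window_days)
        by_init = initials_match(a, b)
        if by_birth or by_init:
            same_demog = a.sex == b.sex and a.race == b.race
            pairs.append((a.usubjid, b.usubjid, by_birth, by_init, same_demog))
    return pairs

# Made-up records, for illustration only.
subjects = [
    Subject("S001", "01", date(1950, 3, 14), "JQS", "M", "WHITE"),
    Subject("S002", "02", date(1950, 3, 14), "ABC", "F", "WHITE"),
    Subject("S003", "03", date(1950, 3, 15), "JS", "M", "WHITE"),
    Subject("S004", "01", date(1962, 7, 1), "XYZ", "F", "ASIAN"),
]

for pair in candidate_pairs(subjects, window_days=1):
    print(pair)
```

Comparing every pair is quadratic in the number of subjects, which is fine for a sketch. For a large trial, grouping on birth date or initials first would be the more practical choice.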

As we’ve seen above, matching on birth dates and initials can produce false positives. The famous Birthday Problem states that among 57 people, there is a 99% chance that at least one pair shares a birthday (the same month and day). However, summarizing additional demographic and vital sign characteristics should quickly identify and eliminate these cases.
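The 99% figure can be checked with a short calculation. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name is hypothetical. Note that the classic problem matches on month and day only, so a match on the full birth date, year included, is less likely by chance than these numbers suggest.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming all `days` birthdays are equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

print(round(prob_shared_birthday(23), 3))  # about 0.507
print(round(prob_shared_birthday(57), 3))  # about 0.990
```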
