Risk-Based Fraud Detection: How Centralized Monitoring Can Boost Data Quality


A new study has shown how risk-based monitoring can help detect fraud at clinical trial sites, protect data quality, avoid delays to market, and keep people safe.

We worked with industry and bioinformatics partners to reanalyze the database of a large randomized clinical trial known to have been affected by fraud – and found that ongoing, unsupervised statistical monitoring could have uncovered the problem around a year earlier than traditional, retrospective methods were able to do.

Published in Therapeutic Innovation and Regulatory Science in collaboration with Boehringer-Ingelheim, the paper concluded that an unsupervised approach to centralized monitoring, using mixed-effects statistical models, was effective at detecting anomalies in sites’ data.

That’s why we at CluePoints believe that increasing the use of these methods could enhance patient safety and the validity of outcomes.

Historic fraud

More than 7,000 patients across 60 sites in 13 countries took part in the Second European Stroke Prevention Study (ESPS2) in the early 1990s.

The international, multisite, randomized, double-blind trial compared acetylsalicylic acid and/or dipyridamole to matching placebos to prevent stroke or death in patients with pre-existing ischemic cerebrovascular disease.

Severe inconsistencies in the case report forms (CRFs) of one site, #2013, led the trial’s steering committee to question the data’s reliability. A for-cause analysis of quality control samples and extensive additional analyses, including blood concentrations of the investigational drugs, showed the patients had never received the protocol medications.

In the end, the site’s data, comprising 438 patients, were excluded from the trial, and the investigator was convicted, but the process took around a year. So, if that trial were to take place today, what, if anything, could be different?

Cutting-edge analyses

To find out whether the fraud could have been detected earlier with unsupervised statistical monitoring, our research team reanalyzed the entire ESPS2 database, which included all the clinical data from CRFs and laboratory results for all patients from all sites.

The project was divided into two phases. The first objective was to confirm whether a CluePoints Data Quality Analysis (DQA) would identify the fraud. DQA works on the principle that data from all the sites should be largely similar, aside from the random play of chance and systemic variations.

Next, the team, blinded to the fraud until after the analysis was complete and the findings presented, aimed to find out how much earlier the review would be able to flag problems.

In phase one, CluePoints’ advanced set of mixed-effects statistical tests was applied across the completed study database to identify unusual patterns at sites, in regions, or among patient groups. This generated hundreds of p-values per site, from which a weighted average was computed and converted into an overall data inconsistency score (DIS) for each site.

Site #2013 was assigned a DIS of 4.14, which identified it as the second most atypical site across the 60 sites.
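
The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration only: the article does not give the DIS formula, so the weighted average of -log10(p) used here, the equal default weights, and the function names data_inconsistency_score and rank_sites are all assumptions made for the example, not the CluePoints method.

```python
import math


def data_inconsistency_score(p_values, weights=None):
    """Combine one site's test p-values into a single score.

    ASSUMPTION: the score is a weighted average of -log10(p).
    The real DIS formula is not given in the article.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(p_values)  # hypothetical default: equal weights
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    # Clamp tiny p-values so that log10 never receives zero.
    logs = [-math.log10(max(p, 1e-300)) for p in p_values]
    return sum(w * x for w, x in zip(weights, logs)) / sum(weights)


def rank_sites(site_p_values):
    """Return (site, score) pairs, most atypical site first."""
    scores = {
        site: data_inconsistency_score(p_values)
        for site, p_values in site_p_values.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under this assumed formula, a site whose tests return mostly small p-values rises to the top of the ranking, which is the behaviour the article describes for site #2013.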

Closer examination of #2013’s data found that it had reported only serious, rather than non-serious, adverse events and that there was atypically low variability between and within patients’ laboratory results and vital signs.

What’s more, the analysis revealed multiple atypical proportions and missing values in domains such as study drug compliance and adverse events. These findings were consistent with the sponsor’s previously published conclusions.

Saving precious time

In phase two, the team carried out the same analyses on versions of the study database that represented incrementally earlier time points in the execution of the study. This reproduced the effect of the study being subject to regular, ongoing central statistical monitoring reviews at the time.

They found that site #2013 would have been detected as atypical when around 25% of the final data volume had accrued, which was in May 1991.

By contrast, the ESPS2 study team first developed suspicions and carried out a detailed statistical assessment of the site’s data 13 months later, in June 1992. At this point, around 75% of the site’s data had been collected.

This led to a for-cause audit in January 1993 and an expert review of patient compliance in June 1993. Only at this point was the fraud confirmed and the site’s data excluded from the trial. The exclusion of site #2013’s data did not materially affect the results of the ESPS2.

This means that, had CluePoints been used in this study, the erroneous data would have been flagged at least a year before the traditional approach allowed.
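
The phase-two idea of re-running the same analysis on progressively earlier snapshots of the database can be sketched as follows. This is a toy example under stated assumptions: the record layout (dictionaries with site, date, and value keys), the function names, and the low-variability scorer are all hypothetical, and the scorer is a simple stand-in rather than the mixed-effects tests used in the study.

```python
from datetime import date
from statistics import median, pstdev


def low_variability_score(snapshot):
    """Toy scorer: median site spread divided by each site's own spread.

    ASSUMPTION: a stand-in for the real statistical tests. Sites with
    unusually low variability in their values receive high scores.
    """
    by_site = {}
    for record in snapshot:
        by_site.setdefault(record["site"], []).append(record["value"])
    # A site needs at least two values before its spread can be estimated.
    spreads = {s: pstdev(v) for s, v in by_site.items() if len(v) >= 2}
    if not spreads:
        return {}
    typical = median(spreads.values())
    return {s: typical / max(sd, 1e-9) for s, sd in spreads.items()}


def first_flag_date(records, cutoffs, score_fn, site, threshold):
    """Return the earliest cutoff at which `site` scores at or above threshold."""
    for cutoff in sorted(cutoffs):
        # Rebuild the database as it would have looked on the cutoff date.
        snapshot = [r for r in records if r["date"] <= cutoff]
        if score_fn(snapshot).get(site, 0.0) >= threshold:
            return cutoff
    return None  # the site was never flagged at any cutoff
```

Calling first_flag_date with a list of dated records, a list of review dates, and a threshold returns the first review at which the chosen site would have stood out, which mirrors how the team located the earliest detection point for site #2013.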

Why does it matter?

Employing DQA and ongoing statistical analyses would allow the industry to spot, investigate, and report incidents while still having the opportunity to maintain study power when data must be excluded.

Our own experience with central statistical monitoring suggests that overt fraud is relatively rare. Other causes of data errors, including sloppiness and a lack of understanding or training, are quite common, and DQA can help here too.

When data errors are detected in real time, they can be corrected and any required remedial actions, such as enhanced training, can be quickly implemented – before they can threaten data quality, or lead to costly approval or market delays.
