Traditional clinical trial data management is often manual, cumbersome, and vulnerable to human error, slowing progress and compromising quality. Streamlining these processes isn’t just a nice-to-have; it’s a game-changer.
Risk-Based Quality Management (RBQM) has already raised the bar by accelerating studies and enhancing oversight. But there’s even greater potential on the horizon: artificial intelligence (AI). At CluePoints, we’re exploring new, practical ways to embed AI into everyday workflows, amplifying the impact of RBQM and pushing quality standards even higher.
Laura Trotta, Vice President of Research at CluePoints, is here to answer your most pressing questions about using AI to drive smarter, faster, more reliable clinical trials.
Table of Contents
What is Artificial Intelligence (AI)?
Why are Clinical Trials a Good Fit for AI?
How Can AI Improve Clinical Trial Quality?
How Does Deep Learning (DL) Coding Compare to Traditional Auto-Coding?
What Standards are AI Solutions Designed to Meet?
What Programming Languages are Needed to Implement an AI or Machine Learning (ML) Project?
What Is Artificial Intelligence (AI)?
Simply put, AI is intelligence demonstrated by machines. While general machine intelligence remains an aspirational goal, machine learning (ML) is the field of study that allows computers to learn from data without being explicitly programmed for each task. ML includes a wide range of supervised and unsupervised algorithms that allow a computer to learn from data and perform specific tasks. Within ML, deep learning (DL) is a set of advanced algorithms that rely on deep neural networks.
Why Are Clinical Trials a Good Fit for Artificial Intelligence (AI)?
There’s a growing, industry-wide interest in applying ML and DL to the large volume of clinical data collected during a trial. This data is drawn from multiple sources across both clinical and operational workflows. In fact, the typical Phase III protocol now collects more than 3.4 million data points per study [1]. Key sources of this data include:
- Electronic Case Report Forms (eCRF): Digital forms used by clinical sites to enter patient data and study-specific information.
- Laboratory Data: Test results such as blood work and diagnostics used to assess patient health and drug effects.
- Electrocardiograms (ECG): Recordings of heart activity used to monitor cardiac safety during the trial.
- Electronic Patient-Reported Outcomes / Clinical Outcome Assessments (ePRO/eCOA): Digital tools that capture health feedback from patients, clinicians, or caregivers.
- Wearable Devices: Tools like smartwatches that continuously collect health metrics such as activity, heart rate, and sleep patterns.
- Clinical Trial Management Systems (CTMS): Platforms used to oversee study operations and track data like site performance and patient enrollment.
As both the volume and complexity of this data rapidly increase, advanced ML algorithms can be used to extract more insights. DL algorithms are particularly well suited to high volumes of complex, unstructured data such as text or images. They are also powerful in decision-making tasks that rely on complex cognitive processes. This is particularly relevant to clinical trials, where many processes depend on the experience of subject matter experts (SMEs) and their clinical knowledge of the study, indication, and therapeutic area of the drug under investigation.
How Can Artificial Intelligence (AI) Improve Clinical Trial Quality?
Many RBQM platforms rely on advanced statistical and ML algorithms. Data quality assessment (DQA), key risk indicators (KRIs), and quality tolerance limits (QTLs), for example, can all be used to monitor data during the study. They work by looking at the study data from all sites in real time as it accumulates and flagging outlying data points that could signify potential issues.
- DQA continuously evaluates data completeness, consistency, and accuracy by applying statistical checks, helping identify issues like missing values, protocol deviations, or data entry errors early in the process.
- KRIs track predefined metrics (e.g., high protocol deviations, delayed data entry, or elevated adverse event rates) to detect trends or outliers that may indicate problems at specific sites or across the study.
- QTLs establish acceptable thresholds for critical parameters (e.g., screen failure rates, dropout rates) and trigger alerts when those thresholds are exceeded, prompting timely investigation and mitigation.
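As a rough illustration of the KRI and QTL ideas above, the following Python sketch flags sites whose adverse event rate sits unusually far from the study-wide average and checks a study-level dropout rate against a tolerance limit. The site data, the z-score cutoff, the 15% limit, and all function names are assumptions made for this example; production RBQM platforms rely on more sophisticated statistical models than a simple z-score.

```python
from statistics import mean, stdev


def flag_kri_outliers(site_rates, z_cutoff=2.0):
    """Return site IDs whose KRI value lies more than z_cutoff
    sample standard deviations away from the mean across sites."""
    values = list(site_rates.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all sites identical, nothing stands out
    return sorted(
        site for site, rate in site_rates.items()
        if abs(rate - mu) / sigma > z_cutoff
    )


def qtl_breached(events, participants, limit):
    """True when the observed study-level proportion exceeds the tolerance limit."""
    return participants > 0 and events / participants > limit


# Hypothetical adverse event rates per patient visit, by site.
ae_rates = {
    "site_01": 0.21, "site_02": 0.19, "site_03": 0.22, "site_04": 0.20,
    "site_05": 0.18, "site_06": 0.23, "site_07": 0.02, "site_08": 0.20,
}

flagged = flag_kri_outliers(ae_rates)
print(flagged)                        # ['site_07']
print(qtl_breached(18, 100, 0.15))    # True: 18% dropout exceeds the 15% limit
```

Here `site_07` is flagged because its rate is far below that of its peers, which could point to under-reporting of adverse events and would prompt a closer look.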
CluePoints has been working on a series of ML solutions designed to improve study efficiency and data quality even further, including a new medical coding module. Verbatim adverse event and concomitant medication terms recorded in case report forms must be mapped to the MedDRA and WHODrug dictionaries, respectively, a task that is time-consuming when done manually. The DL model guides researchers to the corresponding dictionary term in seconds, with more than 90% accuracy.
DL can also enhance risk detection in clinical trials. When RBQM analyses detect a potential data issue, they generate a “risk signal” to flag it for further review. These signals serve as a central point for tracking investigations and corrective actions. As users document their findings, they often enter large volumes of free-text commentary—creating rich but unstructured data that DL algorithms can analyze to uncover patterns, assess risk severity, and support smarter decision-making.
One example is a root-cause decision feature: a natural language processing (NLP) algorithm that screens all the documentation attached to risk signals and flags those that either lack the required documentation or for which the root cause selected by the user appears unreliable. This helps build a consistent database of documented issues that can be used to predict, at the point of identification, which signals are more likely to represent a real issue. In turn, this should allow study teams to prioritize signal review and ensure effective follow-up and documentation of findings.
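To make the screening idea concrete, here is a deliberately simplified Python sketch. It is not an NLP model: it swaps in a keyword heuristic as a stand-in for the kind of check described above. The root-cause categories, keyword sets, minimum word count, and the `RiskSignal` structure are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical root-cause categories with indicative keywords (assumed for this sketch).
ROOT_CAUSE_KEYWORDS = {
    "data_entry_error": {"typo", "transcription", "entry", "mistyped"},
    "training_gap": {"training", "retrain", "untrained"},
    "equipment_issue": {"device", "calibration", "equipment"},
}


@dataclass
class RiskSignal:
    signal_id: str
    root_cause: str   # category selected by the user
    comment: str      # free-text documentation attached to the signal


def screen_signal(signal, min_words=5):
    """Classify a signal's documentation as 'ok', 'missing_documentation',
    or 'root_cause_unsupported' using a simple keyword heuristic."""
    words = signal.comment.lower().split()
    if len(words) < min_words:
        return "missing_documentation"
    tokens = {w.strip(".,;:()") for w in words}
    keywords = ROOT_CAUSE_KEYWORDS.get(signal.root_cause, set())
    if not keywords & tokens:
        # The comment never mentions anything related to the chosen category.
        return "root_cause_unsupported"
    return "ok"


signals = [
    RiskSignal("S1", "data_entry_error",
               "Site confirmed a transcription error in the lab values."),
    RiskSignal("S2", "training_gap", "Checked."),
    RiskSignal("S3", "equipment_issue",
               "Coordinator was new and had not completed protocol training yet."),
]
for s in signals:
    print(s.signal_id, screen_signal(s))
```

A trained language model would replace the keyword lookup, but the output has the same shape: each signal is either accepted or routed back to the user for better documentation.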
How Does Deep Learning (DL) Coding Compare to Traditional Auto-Coding?
DL coding solutions offer greater flexibility and adaptability than traditional auto-coding tools. While conventional auto-coding systems often rely on static dictionaries and require manual input to improve accuracy, DL models learn from data and can be retrained as new data accumulates, enabling them to recognize and adapt to novel terms more effectively.
- Traditional auto-coding tools typically achieve hit rates around 60%, due to their limited ability to generalize beyond pre-programmed terms.
- In contrast, DL-driven solutions can achieve hit rates of 90% or higher, thanks to their contextual understanding and pattern recognition capabilities.
- For maximum efficiency, medical coding solutions should include an API-driven architecture that allows seamless integration with various clinical systems and data sources.
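The limitation of static dictionaries can be sketched in a few lines of Python. The example below implements a bare-bones exact-match auto-coder and measures its hit rate, meaning the share of verbatim terms it can code without human help. The dictionary entries, the placeholder codes, and the verbatim terms are invented for illustration and are not real MedDRA or WHODrug content; the resulting number is contrived and is not a benchmark.

```python
# Toy coding dictionary: normalized term -> placeholder code (hypothetical values).
TOY_DICTIONARY = {
    "headache": "TERM-001",
    "nausea": "TERM-002",
    "dizziness": "TERM-003",
}


def auto_code(verbatim, dictionary):
    """Exact-match lookup after trimming whitespace and lowercasing.
    Returns the code, or None when the term is not in the dictionary."""
    return dictionary.get(verbatim.strip().lower())


def hit_rate(verbatims, dictionary):
    """Fraction of verbatim terms that the auto-coder can code on its own."""
    if not verbatims:
        return 0.0
    hits = sum(auto_code(v, dictionary) is not None for v in verbatims)
    return hits / len(verbatims)


# Hypothetical verbatim terms as a site might enter them.
verbatims = ["Headache", "nausea ", "felt dizzy", "head ache", "Dizziness"]

print(hit_rate(verbatims, TOY_DICTIONARY))  # 0.6
```

The two misses ("felt dizzy" and "head ache") are exactly the kind of paraphrase and spelling variant that an exact-match lookup cannot generalize to, and that a model with contextual understanding is meant to handle.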
What Standards Are Artificial Intelligence (AI) Solutions Designed to Meet?
In 2021, the U.S. Food and Drug Administration (FDA), together with Health Canada and the UK’s Medicines and Healthcare products Regulatory Agency (MHRA), published ten Good Machine Learning Practice (GMLP) guiding principles to support the safe, effective, and high-quality development of AI- and ML-enabled medical devices. While CluePoints software is not classified as a medical device, we closely align our development approach with these ten guiding principles, ensuring our AI solutions are built with the same rigor, transparency, and trustworthiness. Here are the ten guiding principles of GMLP:
- Multi-Disciplinary Expertise Is Leveraged Throughout the Total Product Life Cycle
- Good Software Engineering and Security Practices Are Implemented
- Clinical Study Participants and Data Sets Are Representative of the Intended Patient Population
- Training Data Sets Are Independent of Test Sets
- Selected Reference Datasets Are Based Upon Best Available Methods
- Model Design Is Tailored to the Available Data and Reflects the Intended Use of the Device
- Focus Is Placed on the Performance of the Human-AI Team
- Testing Demonstrates Device Performance During Clinically Relevant Conditions
- Users Are Provided Clear, Essential Information
- Deployed Models Are Monitored for Performance and Re-training Risks Are Managed
At CluePoints, these principles inform every stage of our AI development process:
- We engage a multi-disciplinary team of SMEs to guide development and testing.
- Our models are trained and tested on independent datasets to preserve objectivity and reduce bias.
- Clinically relevant testing is conducted across diverse therapeutic areas to ensure generalizability.
- We perform proof-of-value studies where SMEs interact directly with the model, evaluating predictions in context.
- Post-launch, we work closely with product teams to monitor real-world performance and manage re-training as needed.
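One common way to keep training and test data independent is to split at the level of a grouping unit rather than individual records, so that no group contributes to both sets. The Python sketch below does this by study. Grouping by study, the `study_id` field, and the 20% test fraction are assumptions made for illustration; the source does not describe how CluePoints partitions its data.

```python
import hashlib


def assign_partition(study_id, test_fraction=0.2):
    """Deterministically map a study to 'train' or 'test' by hashing its ID,
    so the assignment is stable across runs and machines."""
    digest = hashlib.sha256(study_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_fraction * 100 else "train"


def split_by_study(records, test_fraction=0.2):
    """Split records so that every record from a given study lands in the same set."""
    train, test = [], []
    for rec in records:
        target = test if assign_partition(rec["study_id"], test_fraction) == "test" else train
        target.append(rec)
    return train, test


# Hypothetical records: 50 studies with 3 verbatim terms each.
records = [
    {"study_id": f"STUDY-{i:03d}", "verbatim": f"term {j}"}
    for i in range(50) for j in range(3)
]

train, test = split_by_study(records)
train_studies = {r["study_id"] for r in train}
test_studies = {r["study_id"] for r in test}
print(len(train), len(test), train_studies & test_studies)
```

Because assignment depends only on the study ID, the intersection printed at the end is always empty: a model evaluated on the test set has never seen data from those studies.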
This GMLP-aligned approach helps us deliver AI solutions that are not only innovative, but also trustworthy, scalable, and built for impact in real-world clinical settings.
What Programming Languages Are Needed to Implement an Artificial Intelligence (AI) or Machine Learning (ML) Project?
Developing an AI/ML solution requires skilled ML engineers with strong programming expertise. Python is the most widely used language because leading ML frameworks such as PyTorch and TensorFlow expose Python as their primary interface, even though their performance-critical cores are written largely in C++.
However, if you’re simply using an existing ML algorithm (e.g., consuming its predictions through an application), no programming skills are required. The technical complexity is handled on the back end, allowing end users to benefit from AI-powered insights without writing code.
Want to learn more about how AI can elevate your clinical research? Discover how CluePoints’ RBQM harnesses the power of advanced analytics and AI to improve data quality, accelerate timelines, and enhance oversight. Explore our RBQM solutions or connect with our team to see how we can support your next study.
REFERENCES:
1. Getz K, Smith Z, Kravet M. Protocol Design and Performance Benchmarks by Phase and by Oncology and Rare Disease Subgroups. Ther Innov Regul Sci. 2023 Jan.