Data integrity challenges and solutions: Expert insights from Chandramouli Ramnarayanan

Data integrity is essential for accurate predictive modeling, regulatory compliance, and business efficiency. I spoke with Chandramouli Ramnarayanan, Global Technical Enablement Engineer at JMP, about the biggest data challenges companies face and the best strategies for improving data quality.

From statistical monitoring to design of experiments (DOE) and anomaly detection, Chandra shares expert insights on strengthening data integrity. He also explains how JMP’s tools help organizations ensure reliable data and showcases a visualization that demonstrates the impact of data integrity on predictive models.

 

What are the biggest data integrity challenges you’ve encountered in the pharmaceutical industry?

One of the most persistent data integrity challenges in the pharmaceutical industry is managing the sheer volume and diversity of data generated across various stages of research, development, and post‐market surveillance. Clinical trials, laboratory experiments, and real‐world evidence collection all contribute to data sets that often originate from disparate systems and formats, creating issues of consistency and compatibility.

This fragmentation is aggravated by legacy systems that have been in use for decades, which may not communicate effectively with modern digital platforms. Manual data entry and reliance on outdated processes further increase the risk of errors, making it difficult to maintain reliable audit trails and ensure regulatory compliance.

Moreover, integrating heterogeneous sources – such as electronic health records, genomic databases, and manufacturing logs – demands advanced management strategies and robust validation protocols. Organizations thus face the dual challenge of safeguarding data quality while meeting strict regulatory requirements.

Addressing these issues requires state-of-the-art technological solutions, such as automation and machine learning for anomaly detection, as well as a cultural shift toward continuous improvement in data governance practices. Developing comprehensive data stewardship policies and investing in workforce training are essential steps in overcoming these challenges and ensuring consistent, trustworthy data.

 

How can organizations quantify the cost of unreliable data in their decision-making processes?

Quantifying the cost of unreliable data in decision making involves a multifaceted approach that examines both direct and indirect financial impacts. Organizations can begin by identifying key business processes where data drives critical decisions.

By mapping these processes, companies can evaluate how data inaccuracies, inconsistencies, or delays lead to operational inefficiencies, rework, or even regulatory penalties.

For instance, errors in clinical trial data might result in costly delays in drug approvals or misinterpretations of study outcomes, thereby impacting market performance. Additionally, the cost analysis should factor in potential risks of suboptimal decisions, including lost revenue opportunities, increased operational costs, and damage to brand reputation. Implementing data quality metrics and performance indicators – such as error rates and reconciliation discrepancies – allows organizations to assign quantifiable values to data issues.

Advanced analytics and simulation models can then project the long-term financial implications of these issues, while benchmarking against industry standards and cost-of-quality models provides additional insight into the overall burden. Ultimately, a rigorous quantitative assessment combined with qualitative evaluation gives organizations a comprehensive view of the cost of unreliable data, a view that supports sustainable competitive advantage and growth.
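
To make this concrete, here is a minimal back-of-the-envelope sketch in Python (outside of JMP) of how such indicators might be rolled up into a rough annual cost figure; the record volume, error rate, and unit costs are purely illustrative assumptions, not benchmarks.

```python
# Illustrative cost model: every figure below is a hypothetical assumption.

def unreliable_data_cost(records_per_year, error_rate, rework_cost_per_error,
                         delay_days, cost_per_delay_day):
    """Rough annual cost of unreliable data: rework plus decision-delay exposure."""
    rework_cost = records_per_year * error_rate * rework_cost_per_error
    delay_cost = delay_days * cost_per_delay_day
    return rework_cost + delay_cost

# Assumed inputs: 2M records/year, 0.5% error rate, $40 of rework per error,
# and 10 days of decision delay at $25k/day of lost opportunity.
total = unreliable_data_cost(2_000_000, 0.005, 40, 10, 25_000)
print(f"Estimated annual cost of unreliable data: ${total:,.0f}")
```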

 

In industries beyond pharma, what are the common mistakes you see when managing data integrity for predictive models?

In industries outside of pharmaceuticals, several common mistakes in managing data integrity for predictive models are prevalent. One major error is neglecting data validation at the initial stages of model development, resulting in predictive algorithms built on flawed or incomplete data sets.

Organizations often prioritize speed over accuracy, deploying models rapidly without thoroughly cleaning and verifying data quality. This oversight can lead to skewed results and unreliable forecasts that affect business outcomes. Another frequent issue involves inadequate data governance, where roles and responsibilities for data stewardship are not clearly defined, leading to inconsistent practices across departments.

Moreover, many companies underestimate the importance of ongoing data monitoring and recalibration, failing to account for data drift or emerging anomalies over time. The integration of heterogeneous data sources without proper standardization further worsens these challenges, as inconsistencies and biases become embedded in models.
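
As one illustration of monitoring for drift, the Python sketch below (a generic example, not a JMP feature) compares a recent batch of a variable against a baseline sample using a two-sample Kolmogorov-Smirnov test; the 0.05 alert threshold and the simulated data are assumptions for demonstration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Baseline: the distribution the model was trained on (simulated here).
baseline = rng.normal(loc=50.0, scale=5.0, size=5_000)
# Recent batch: the same process after a small shift (simulated drift).
recent = rng.normal(loc=53.0, scale=5.0, size=1_000)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the distributions differ.
stat, p_value = ks_2samp(baseline, recent)
if p_value < 0.05:  # assumed alert threshold
    print(f"Possible drift detected (KS={stat:.3f}, p={p_value:.2e}); review before scoring.")
else:
    print("No significant drift detected in this batch.")
```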

These errors are compounded by a lack of interdisciplinary collaboration, where process owners, data scientists, IT specialists, and business analysts fail to communicate effectively. Addressing these issues requires a holistic approach that emphasizes rigorous data quality assessments, continuous model validation, and the implementation of robust governance frameworks, ensuring that predictive models remain reliable and effective in dynamic environments.

 

You’ve spoken about the importance of statistical monitoring. What key metrics or indicators should companies track?

Statistical monitoring is crucial for maintaining high data quality and ensuring model reliability; companies should track a range of key metrics to detect anomalies and measure performance over time.

Fundamental indicators include measures of central tendency (such as mean, median, and mode) alongside dispersion metrics (like variance, standard deviation, and interquartile range) to understand data spread and variability. Additionally, monitoring outlier frequency and distribution shape provides insights into potential skew or abnormal behavior.

Process capability indices, such as Cp and Cpk, assess how well processes meet specifications. In predictive modeling, tracking error metrics like mean absolute error (MAE), root mean squared error (RMSE), and R-squared values is essential for gauging model accuracy.
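
As a minimal illustration, the following Python sketch computes a few of these indicators from small, made-up samples; the specification limits and measurement values are hypothetical.

```python
import numpy as np

# Hypothetical process measurements and specification limits.
measurements = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1])
lsl, usl = 9.5, 10.5

mu, sigma = measurements.mean(), measurements.std(ddof=1)
cp = (usl - lsl) / (6 * sigma)                  # potential capability
cpk = min(usl - mu, mu - lsl) / (3 * sigma)     # capability accounting for centering

# Hypothetical model predictions vs. observed outcomes.
predicted = np.array([101.2, 98.7, 103.4, 99.9])
actual = np.array([100.0, 99.0, 104.0, 100.5])
mae = np.mean(np.abs(predicted - actual))
rmse = np.sqrt(np.mean((predicted - actual) ** 2))

print(f"Cp={cp:.2f}, Cpk={cpk:.2f}, MAE={mae:.2f}, RMSE={rmse:.2f}")
```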

Companies should also incorporate control charts and trend analyses to monitor shifts in behavior over time, while statistical tests like hypothesis tests and confidence intervals help validate data consistency. Real-time dashboards that display these indicators offer immediate insights, enabling proactive intervention when metrics deviate from expected norms. Finally, combining these metrics provides a comprehensive view of data integrity, ensuring that any deterioration in quality is promptly identified and corrected, thereby safeguarding the reliability of predictive models and informed decision making.
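
For the control-chart piece, a basic individuals-chart check can flag readings outside three-sigma limits; the sketch below is a simplified Python illustration of that logic (tools such as JMP's Control Chart Builder provide this interactively), with sigma estimated from the average moving range.

```python
import numpy as np

# Daily error-rate readings (%) from a data pipeline - illustrative values only.
error_rate = np.array([0.8, 0.9, 1.1, 1.0, 0.7, 1.2, 0.9, 3.1, 1.0, 0.8])

# Individuals chart: estimate sigma from the average moving range (MR-bar / 1.128).
moving_range = np.abs(np.diff(error_rate))
sigma_hat = moving_range.mean() / 1.128
center = error_rate.mean()
ucl = center + 3 * sigma_hat
lcl = max(center - 3 * sigma_hat, 0.0)

flagged = np.where((error_rate > ucl) | (error_rate < lcl))[0]
print(f"Center={center:.2f}, UCL={ucl:.2f}, LCL={lcl:.2f}")
print("Out-of-control readings at indices:", flagged.tolist())
```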

 

What strategies have you seen work best for detecting data anomalies before they affect decision making?

Detecting data anomalies early requires a combination of proactive strategies that use both advanced technology and human expertise. One effective approach is implementing automated anomaly detection algorithms that use machine learning techniques to identify unusual patterns in large data sets.

These algorithms can be configured to trigger alerts when data points deviate significantly from established norms, allowing teams to investigate potential issues in real time. Complementing this, regular data audits and validation checks are essential to ensure that inputs adhere to quality standards. Integrating statistical process control tools and visualization dashboards further enhances the ability to monitor trends and detect outliers.
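
A minimal sketch of this kind of automated check, assuming a simple robust z-score rule (median and MAD) rather than a full machine learning pipeline, might look like the Python example below; the 3.5 cutoff is a common rule of thumb, not a universal setting.

```python
import numpy as np

def robust_z_scores(values):
    """Robust z-scores based on the median and median absolute deviation (MAD)."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))    # assumes some spread, i.e., mad > 0
    return 0.6745 * (values - median) / mad     # 0.6745 rescales MAD to ~sigma for normal data

# Simulated assay readings with one suspicious spike (illustrative data).
readings = np.array([12.1, 11.9, 12.3, 12.0, 11.8, 12.2, 19.5, 12.1, 12.0])

scores = robust_z_scores(readings)
anomalies = np.where(np.abs(scores) > 3.5)[0]   # assumed alert threshold
for i in anomalies:
    print(f"Alert: reading {readings[i]} at index {i} looks anomalous (robust z = {scores[i]:.1f})")
```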

Also, embedding domain expertise into the detection process is critical, as experienced professionals often spot subtle irregularities that automated systems might overlook. Cross-functional collaboration between data scientists, IT professionals, and business analysts promotes a comprehensive understanding of data flows and tailors detection strategies to specific contexts.

Establishing feedback loops enables continuous improvement, ensuring that detection systems evolve with emerging patterns. Ultimately, these strategies create a robust framework that minimizes the risk of erroneous data influencing strategic decisions, preserving both operational efficiency and regulatory compliance.

 

What are the most important DOE principles for ensuring data quality?

Design of experiments (DOE) principles play a critical role in ensuring data quality by providing a structured approach to planning, conducting, and analyzing experiments.

One key principle is randomization, which mitigates biases and confounding variables by randomly assigning treatments and conditions. Replication is equally important; by repeating experiments, analysts can assess variability and ensure that findings are reliable and reproducible. Blocking is another crucial principle, grouping similar experimental units to control for known sources of variation, thereby enhancing result precision.

Factorial design, which studies multiple factors simultaneously, allows for the identification of interactions between variables that might otherwise be overlooked. Moreover, the principle of orthogonality ensures that factors vary independently, enabling clear interpretation of outcomes. Incorporating robust statistical analysis and validation techniques further strengthens conclusions drawn from experimental data.
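
To show how randomization, replication, and a factorial structure fit together, here is a small Python sketch that generates a randomized, replicated 2x3 full factorial run order; the factors and levels are hypothetical, and JMP's DOE tools would typically be used to build and analyze such designs directly.

```python
import itertools
import random

random.seed(42)  # fixed seed so the randomized run order is reproducible

# Hypothetical factors and levels for a process experiment.
factors = {
    "temperature_C": [60, 80],
    "mixing_speed_rpm": [100, 200, 300],
}
replicates = 2

# Full factorial: every combination of levels, repeated for each replicate.
runs = [
    dict(zip(factors, combo))
    for combo in itertools.product(*factors.values())
    for _ in range(replicates)
]

# Randomization guards against time-ordered confounding (e.g., instrument drift).
random.shuffle(runs)

for order, run in enumerate(runs, start=1):
    print(f"Run {order:2d}: {run}")
```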

These DOE principles collectively contribute to a systematic, rigorous approach to data collection, reducing errors and improving overall data set integrity. In a business context, applying these principles not only leads to higher-quality data but also fosters a culture of continuous improvement and informed decision making. Ultimately, adherence to DOE methodologies is essential for developing reliable predictive models across diverse industries.

 

If you had to give one key piece of advice to companies looking to improve their data integrity using JMP, what would it be?

The key advice for companies seeking to improve data integrity using JMP is to embrace a proactive, integrated approach to data management that includes advanced analytics and robust process controls.

JMP’s powerful suite of tools offers unparalleled capabilities in data visualization, statistical analysis, and predictive modeling, yet these features yield maximum benefit only when used within a framework that prioritizes continuous data quality improvement.

Organizations should invest in training to ensure that all team members are adept at implementing JMP’s functionalities – from data cleaning to in-depth analysis. It is essential to establish clear protocols for data entry, validation, and ongoing monitoring, and to regularly audit data sets for inconsistencies or anomalies.
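
To give a flavor of what a routine audit check might look like in code (a generic pandas sketch rather than a JMP workflow), the example below screens a small table for missing values, duplicate records, and out-of-specification entries; the column names and limits are hypothetical.

```python
import pandas as pd

# Hypothetical batch-release data; column names and spec limits are illustrative.
df = pd.DataFrame({
    "batch_id": ["B001", "B002", "B002", "B004", "B005"],
    "potency_pct": [99.1, 101.7, 101.7, None, 93.8],
})

potency = df["potency_pct"]
issues = {
    "missing_values": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
    "out_of_spec": int((potency.notna() & ~potency.between(95, 105)).sum()),
}
print(issues)  # e.g., {'missing_values': 1, 'duplicate_rows': 1, 'out_of_spec': 1}
```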

Fostering a culture of collaboration among data scientists, quality assurance professionals, and business stakeholders enables more effective identification and resolution of quality issues. By integrating JMP into a broader data governance strategy, companies can not only streamline analytical processes but also ensure that decision making is based on reliable, high-quality data.

Ultimately, investing time and resources in building a solid data integrity foundation using JMP will pay dividends in operational efficiency, regulatory compliance, and strategic agility. Consistently review and refine your practices to stay ahead.

 

Finally, can you share a powerful data visualization that demonstrates the impact of improving data integrity on predictive models? What are the insights it reveals, and how would you explain it to a broader audience?

One compelling data visualization that illustrates the impact of improved data integrity on predictive models is an interactive dashboard that combines a control chart with a scatter plot overlay. In this visualization, the control chart tracks real-time data quality metrics – such as error rates, variance, and drift over time – while the scatter plot displays model predictions against actual outcomes.

As data integrity improvements are implemented, viewers can observe a noticeable tightening of control limits and a reduction in variability. Simultaneously, the scatter plot shows predictions clustering more closely around actual values, indicating enhanced model accuracy.

This dual-display approach offers a clear, intuitive representation of how higher-quality data directly translates into more reliable predictive performance. For a broader audience, I would explain that the control chart acts like a health monitor for data, signaling when things deviate from acceptable ranges, while the scatter plot functions as a performance review of the model.

Together, these visual elements demonstrate that by investing in better data practices, companies not only reduce errors but also achieve more precise forecasts, ultimately driving smarter decision making and improved business outcomes. This visualization makes the abstract concept of data integrity tangible, highlighting its critical role in successful analytics.
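
For readers who want to prototype a similar display outside of JMP, here is a rough matplotlib sketch of the idea: a control chart of a data-quality metric alongside a predicted-versus-actual scatter. All data are simulated, and the layout is only an approximation of the dashboard described above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Left panel: a data-quality metric (e.g., daily error rate) with 3-sigma limits.
error_rate = rng.normal(1.0, 0.2, size=30)
center, sigma = error_rate.mean(), error_rate.std(ddof=1)

# Right panel: predictions vs. actuals after data-quality improvements (simulated).
actual = rng.normal(100, 10, size=60)
predicted = actual + rng.normal(0, 2.5, size=60)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.plot(error_rate, marker="o")
ax1.axhline(center, linestyle="--")
ax1.axhline(center + 3 * sigma, color="red")
ax1.axhline(max(center - 3 * sigma, 0), color="red")
ax1.set_title("Data-quality control chart")
ax1.set_xlabel("Day")
ax1.set_ylabel("Error rate (%)")

ax2.scatter(actual, predicted, alpha=0.6)
lims = [actual.min(), actual.max()]
ax2.plot(lims, lims, color="gray")  # 45-degree reference line
ax2.set_title("Predicted vs. actual")
ax2.set_xlabel("Actual")
ax2.set_ylabel("Predicted")

plt.tight_layout()
plt.show()
```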

 

Chandramouli’s expertise underscores the importance of having good data right from the start.

How does your organization handle data integrity? Let’s exchange insights in the comments!
