Unlocking innovation in R&D requires more than just data. It demands seamless accessibility, real-time insights, and effortless collaboration. This presentation explores how AWS, Snowflake, and JMP empower product development by enabling scalable data ingestion, dynamic analytics, and cross-functional synergy. With cloud-based storage, high-performance processing, and advanced data modeling, decision making accelerates, reducing time to market and optimizing resources.

We showcase how we’ve used JMP Live to track on-market products in real time and manage NPD development instruments. From initial concept to full implementation in under a year, JMP Live integrates seamlessly with our AWS-Snowflake environment to provide real-time data sharing, interactive reporting, and smarter collaboration. This real-time tracking enhances visibility into product performance and enables proactive decision making across teams.

Hello everyone, my name is Christopher Pillsbury, and I'm really glad to be with you here today. I'll be sharing how we've built a framework that helps R&D teams move from ideas to actionable insights by bringing together AWS, Snowflake, and JMP.

In this session, we'll walk through how experimental data is collected and analyzed in real time, and how this architecture empowers scientists to make faster, more confident decisions. No matter your role, whether in R&D, data science or operations, my goal is to show you the journey from raw data to insight and how scalable analytics can reshape the way teams innovate and collaborate.

For today's agenda, I'll start by walking you through the key milestones along the implementation path for our JMP Live instance. We'll explore the data journey from raw experiment to actionable decisions, and we'll discuss the impacts of scaling data generation. I'll outline the IDEXX Data Solutions architecture process, showing how ingestion, data organization, and analytics work together to empower R&D users. We'll look at concrete examples of dashboards and reports in action and the measurable impact they've had on decision-making and efficiency.

Finally, I'll wrap up with key takeaways and the lessons learned from building a scalable, real-time analytics system in R&D. But first, let me take a moment to explain what my role involves.

A Data Solutions Architect isn't just one role. It's a blend of several disciplines that come together to make data usable, trustworthy and impactful. That includes Governance and Security, ensuring compliance, privacy and access controls so data remains trustworthy and safe. Data Engineer, ensuring data flows where it needs to go, building pipelines, integrations and transformations so raw data becomes usable.

DevOps, handling the infrastructure side, cloud platforms, automation and keeping everything secure and resilient. Data Architecture, designing models, defining structures, making sure the foundation can scale. Data Analyst, translating data into insights, dashboards, visualizations and stories that drive decisions. Business Analyst, connecting the technical side to the business problem, gathering requirements, understanding context and making sure we solve the right problems.

Finally, Database Administrator, focusing on performance, tuning, reliability, and in today's world, cost. Together, these perspectives form the role of a Data Solutions Architect. My job is to wear parts of each hat, bring them together, and deliver solutions that connect people and teams, from data pipelines all the way to business insights.

Now that you know a bit about me and what my role is, let's talk a bit about our JMP Live journey so far. As any good project should, we started with conversations with Enterprise IT, where we mapped out a cloud-first deployment path and the steps to achieve it. Their guidance helped us set up an Azure server and the necessary service accounts, building the secure foundation we needed for automation.

On July 31, 2024, Alex helped me install and get JMP Live up and running. Just over a week later, on August 9th, we published our first fully automated report from a Snowflake view. Over the first year of our pilot, we pushed the limits of our JMP Live instance: load testing automated SPC charting with 7 million records and over 20 columns refreshing every 5 minutes, rolling out dashboards to support different business needs, and proving that this architecture could scale while delivering real-time insights to the teams who rely on it every day.

On paper, our implementation is very straightforward. However, the overall journey from idea to decision and the path to the data is not. Anyone who's worked with experiments, data and reporting knows there are a lot of hidden steps along the way. That's why, before we get into scalability and systems, I want to ground us in the process we all share. Whether you're in R&D, data science or operations, the path from an idea to an actionable decision follows a similar pattern.

The path to an insight is a simple but powerful way to describe how we move from curiosity to decisions. It all begins with a question or a hypothesis. Maybe you're in R&D testing a new product concept, or maybe you're mining and analyzing a large data set. At some point, you have an idea worth testing. From there, experiments are conducted, lots are manufactured, or even code is developed.

This is where the actual work begins to take shape. Output data from instruments, manual data recording or code-generated data sets get stored, sometimes locally, sometimes in the cloud, but at this stage, it's still raw. Now comes the critical part, structuring, cleaning and analyzing the data. This is where we apply statistical methods, test our assumptions, and start to uncover the patterns hidden in the results.

Finally, insights are communicated. That might mean sharing a presentation, writing up the results in an ELN, or sharing a major development milestone. Often these insights trigger the next set of questions and the cycle restarts.

At a high level, you've done the work. You ran the experiment, you collected the data, you wrangled it, you performed your analysis, and congratulations, you've reached an insight. Now, naturally, you want to share what you found. What's the next step? You grab your trusty sidekick PowerPoint, you copy over the charts, maybe add a text box or two, and then present it in a meeting. In that moment, you've achieved your goal. You've generated data that supports a decision.

But here's the question we should pause on. At what cost? How much time did it take you to get the data in shape? How many manual steps were involved? How many times did you have to reexport, reformat or update when something changed? This is what I call the hidden tax on insight. The overhead of moving from raw data to a decision.

Because in reality, the data you generate doesn't just flow neatly into JMP or any other system. Most of the time it's raw, fragmented and inconsistent. That means you or your team end up spending hours manually locating, cleaning, structuring and massaging that data before you can even start analyzing it.

Once you finally get a usable table, what happens next? Maybe you need to add calculations. Maybe you build a few derived columns. JMP is great for that. But sometimes people fall back on Excel because it feels easier. This requires a significant time investment, especially in cases where the analysis is built from scratch, even for experiments that have the same output.

But here's the trap. Even with standardized processes, everything is static. It's frozen in place. You've done the work, but for the next experiment you have to start all over again. That's the burden. Insight achieved once, but with heavy manual effort and no easy way to refresh or reuse it.

What happens when you try to grow? Think about it this way. We don't expect data engineers to understand the science in order to manage the data. It stands to reason that we shouldn't expect scientists to become data engineers in order to manage the science. Their expertise is in experimentation, discovery and interpretation. That's where their time is most valuable. The more time they spend cleaning spreadsheets or wrestling with formats, the less time they spend doing actual science. Scalability is about systems that grow without breaking under more people, more data, or more complexity.

Let's take the path to an insight and apply it to a team that's generating data. The moment you add more people into the process, you introduce a whole new challenge: variation. Different people have different levels of data processing skill, different habits, and different ways of formatting or structuring their results. The closer you are to raw data, the bigger the variation.

It doesn't stop there. That variation follows all the way through analysis and into reporting. The result: sharing data isn't straightforward, and analyses don't always line up. And presentations? Well, they can turn into an adventure of, "What does that chart mean?" "Oh, that? I just liked how so-and-so formatted it, so I copied their style."

What happens? Teams spend enormous amounts of unseen time just aligning on storage locations, column definitions, cleaning up, formatting, and making sure there's continuity. That hidden effort keeps multiplying the more data they generate. If that's the picture for a single team, imagine scaling this up to an entire department.

You've got numerous teams generating data, each with their own approaches. Suddenly countless hours are being spent just manipulating that data, trying to get it into shape. But the bigger issue is still variation. Different interpretations, different analyses, different math, different output formats. All of these differences chip away at efficiency.

And at scale? That doesn't just slow down individual projects, it can stall overall progress. This is the real organizational burden. When every team is reinventing the wheel, insights become inconsistent, alignment takes longer and momentum gets lost. The challenge and the opportunity is to build systems that let each group focus on what they do best while still working seamlessly together.

At IDEXX, we've recognized this challenge, and we're working to rebuild the relationship our people have with data. We've retooled our systems, our teams and our processes, so that instead of wrestling with data, people can focus on using it. That means defining the work clearly, building useful automations, and creating no-code or low-code access to the big data systems we're putting in place for ourselves. Because becoming a data-driven organization isn't just about technology. It's about understanding where the pain and waste really live. It's about knowing what can realistically be taught to people and what is better solved with technology.

By combining the strengths of our people with the strengths of the right tools, AWS, Snowflake and JMP, we're removing barriers to insight and making it possible for teams to move faster with consistency and confidence.

Here's the bottom line. In today's R&D landscape, scalable systems aren't optional anymore. The pace of science is faster than ever. The volume of data we're generating is only growing. If our systems don't scale, they can't keep up. We end up with bottlenecks, inconsistencies and wasted effort. Scalability is what allows us to turn science into solutions at speed. It's the difference between spending weeks wrangling data and being able to make decisions in real time. This is why rethinking how we approach data is so critical. Because speed, consistency, and trust in our insights translate directly into progress in R&D.

For the next few minutes, we'll be exploring the high-level details of the data solutions architecture process, which is built around two main categories. The first category is Data Routing and Ingestion, where raw data is collected, migrated and managed. The second category is Data Lakehouse and Analytics, where our data and analytic architecture reside.

In our R&D environment, data can come from multiple sources. Some of it is collected manually by scientists, while other data is generated automatically by instruments and machines during experiments. Once the data exists, our cloud migration service steps in. It continuously watches designated directories for new files and when it detects them, it automatically moves the data into AWS storage. This ensures that all incoming data is captured in a centralized, secure location, ready for downstream processing and analysis.
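To make that watcher pattern concrete, here is a minimal Python sketch assuming a simple polling approach with boto3. The directory, bucket name, and key prefix are placeholders for illustration, not our actual configuration.

```python
"""Minimal sketch of a directory-watching uploader (polling approach).

Assumptions: boto3 credentials are already configured; the directory, bucket,
and prefix below are hypothetical placeholders, not the real environment.
"""
import time
from pathlib import Path

import boto3

WATCH_DIR = Path("/instrument_exports")   # hypothetical instrument output folder
BUCKET = "example-rd-raw-data"            # hypothetical S3 bucket
PREFIX = "raw/plate_reader/"              # hypothetical key prefix
POLL_SECONDS = 30

s3 = boto3.client("s3")
already_uploaded: set[str] = set()

def upload_new_files() -> None:
    """Upload any log file not seen yet, keeping the filename as the S3 key suffix."""
    for path in WATCH_DIR.glob("*.log"):
        if path.name in already_uploaded:
            continue
        s3.upload_file(str(path), BUCKET, PREFIX + path.name)
        already_uploaded.add(path.name)

if __name__ == "__main__":
    # A production service would add retries, checksums, and persisted state.
    while True:
        upload_new_files()
        time.sleep(POLL_SECONDS)
```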

Once the data has been migrated to AWS, it enters the Data Architecture layer. Here, all files are collected and stored securely, giving us scalable and reliable raw data storage. From there, the data is ingested into Snowflake using SNS notifications and Snowpipes, enabling real-time movement from AWS to our data platform.
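For those curious what that SNS-driven ingestion can look like, here is a hedged sketch of an auto-ingest Snowpipe issued through the Snowflake Python connector. The stage, landing table, topic ARN, and connection parameters are illustrative assumptions, not our production objects.

```python
"""Sketch of an auto-ingest Snowpipe fed by S3 event notifications via SNS.

All object names, the ARN, and the connection parameters are placeholders.
"""
import os

import snowflake.connector

PIPE_DDL = """
CREATE PIPE IF NOT EXISTS raw.plate_reader_pipe
  AUTO_INGEST = TRUE
  AWS_SNS_TOPIC = 'arn:aws:sns:us-east-1:123456789012:example-raw-data-topic'
AS
  COPY INTO raw.plate_reader_landing
  FROM @raw.plate_reader_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
"""

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="INGEST_WH",
    database="RD_DATA",
)
conn.cursor().execute(PIPE_DDL)  # once created, new S3 files load automatically
```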

Within Snowflake, automated ETL processes clean, normalize and transform the raw data into standardized models, so it's ready for analysis. We also leverage Snowflake's advanced capabilities such as views and dynamic tables, which allow analysts and scientists to query and combine multiple data sources seamlessly, providing a consistent and efficient foundation for downstream analysis in JMP Live.
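As a simplified example of that standardization step, the sketch below creates a dynamic table that cleans and types the landing data on an assumed five-minute target lag; the column and object names are invented for illustration.

```python
"""Sketch of an ETL/standardization step using a Snowflake dynamic table.

Object and column names are assumptions made for this example.
"""
import os

import snowflake.connector

MODEL_DDL = """
CREATE OR REPLACE DYNAMIC TABLE analytics.calibration_results
  TARGET_LAG = '5 minutes'
  WAREHOUSE = TRANSFORM_WH
AS
  SELECT
      UPPER(TRIM(instrument_id))        AS instrument_id,
      INITCAP(TRIM(location))           AS location,
      LOWER(event_type)                 AS event_type,      -- e.g. 'calibration' or 'control'
      TRY_TO_TIMESTAMP(run_timestamp)   AS run_ts,
      TRY_TO_DOUBLE(measured_value)     AS measured_value,
      LOWER(result_status)              AS result_status
  FROM raw.plate_reader_landing;
"""

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="TRANSFORM_WH",
    database="RD_DATA",
)
conn.cursor().execute(MODEL_DDL)
```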

Finally, we reach the Analytic Architecture layer. This is where the standardized data models from Snowflake are transformed into actionable data views designed to support specific business and scientific use cases. We provide curated access points tailored to the needs of different stakeholders and their analytic workflows, so users can focus on insights rather than data wrangling.

Essentially, this layer bridges the gap between raw data and decision-making, aligning data structures with reporting, visualization and overall business objectives. It's the step that turns structured data into meaningful real-time insights in JMP Live.

At a high level, it all starts with data generation and the creation of a data package. Once the data exists, our cloud migration service continually monitors the relevant directories and moves new files to AWS storage. Next, the Data Architecture layer takes over. From here, the data is ingested into Snowflake from AWS using SNS notifications and a Snowpipe.

In the Analytic Architecture layer, we transform the raw data into standardized models and then into actionable data views that support specific scientific use cases.

Finally, publishing reports and data to JMP Live allows for self-service automation. In short, this flow from data generation through architecture to analytics turns raw experimental data into the insights that drive decisions.

Now I want to pause for a moment. All the architecture diagrams and flowcharts we've seen are important. They show that the system works, but they don't actually prove value on their own. Real value comes when the people using the system, our scientists, can immediately see the insights they need to make decisions.

When a scientist says, "That's exactly what I need," that's the moment the system has delivered real impact. That's exactly what we'll explore next. The tangible use cases and measurable impacts that show how this architecture translates into results for R&D.

To set the stage, we have a microtiter plate reader that produces a log file with two variations, as you see here. During a calibration and control run, instruments generate these log files. They contain important calibration and control results embedded in tables directly within the file.

An NPD team became very interested in this data and created a report process that tracked equipment health based on the results of these runs. As with most things, they didn't just collect it from the R&D labs, but they also started pulling it from the field. Here's the catch. All of this was done manually. What began as a simple curiosity gradually turned into a repetitive, time-consuming process.

Every month, teams were collecting hundreds of log files, manually copying and pasting data into JMP tables and generating reports. It was a huge effort repeated month after month. A perfect example of a high-value task being bogged down by manual work.

Working closely with this team, we built a DSA data pipeline to automate the entire process. Instead of manually collecting files, we began capturing them programmatically as soon as they were created and moving them to AWS storage, where the same ingestion described earlier carries them into Snowflake. Once in Snowflake, the data underwent further processing, cleaning and standardization. From there, we created views that served as access points for JMP Live reports, as seen here in the screenshot. The result is a well-organized, centralized table, transforming what was once messy, manual data into a structured, reliable resource that scientists can use instantly for analysis and decision-making.
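To make the processing step more tangible, here is an illustrative parser for the embedded results table. The "[CALIBRATION RESULTS]" section layout shown is a made-up stand-in; the real instrument log formats differ.

```python
"""Illustrative parser for a calibration/control table embedded in a log file.

The '[CALIBRATION RESULTS]' section format is a hypothetical stand-in.
"""
import csv
from io import StringIO
from pathlib import Path

def extract_results(log_path: Path) -> list[dict]:
    """Return the rows of the embedded results table, or [] if none is present."""
    lines = log_path.read_text().splitlines()
    try:
        start = lines.index("[CALIBRATION RESULTS]") + 1
    except ValueError:
        return []  # log variant without an embedded table
    section: list[str] = []
    for line in lines[start:]:
        # A blank line or the next section header ends the table.
        if not line.strip() or line.startswith("["):
            break
        section.append(line)
    return list(csv.DictReader(StringIO("\n".join(section))))

# Example usage with a hypothetical file name:
# rows = extract_results(Path("plate_reader_2024-08-09.log"))
```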

Now, the report in JMP Live updates automatically whenever a new calibration or control run is completed. Instruments are grouped by ID and location, and each calibration or control event is clearly tracked by date. This provides real-time visibility into system health, helping R&D operations and reference labs catch potential maintenance issues before they become problems. In other words, we not only eliminated the repetitive manual work, but we also created a cross-departmental report. A single point of truth that everyone relies on for accurate, up-to-date data.
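Behind a report like this usually sits a curated access-point view. The sketch below shows the general shape of such a view, with assumed object and column names rather than the production definition.

```python
"""Sketch of a curated access-point view for the instrument-health report.

Object and column names are illustrative assumptions.
"""
import os

import snowflake.connector

VIEW_DDL = """
CREATE OR REPLACE VIEW analytics.instrument_health_vw AS
SELECT
    instrument_id,
    location,
    event_type,
    run_ts,
    measured_value,
    result_status,
    -- latest event per instrument and event type, so dashboards can flag stale calibrations
    MAX(run_ts) OVER (PARTITION BY instrument_id, event_type) AS latest_event_ts
FROM analytics.calibration_results;
"""

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="REPORT_WH",
    database="RD_DATA",
)
conn.cursor().execute(VIEW_DDL)
```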

What used to be a time-consuming manual process is now fully automated. Each step in the workflow, from data generation to report update, is completed within about 20 minutes of the event finishing. This rapid turnaround means teams have real-time visibility into their data, allowing faster decision-making and more proactive management of experiments and instrument health.

But the system isn't valuable just because it exists; it's valuable because it changes the way scientists interact with data. When they stop asking "How did this data get here?" and start asking "What does it mean?", that's the proof that we've delivered real, actionable value. Across multiple areas, we've transformed manual, time-consuming processes into automated, real-time workflows.

In calibration and control field monitoring, a single scientist once collected data from hundreds of instruments. Now, network-wide automated collection provides real-time visibility into system health. For on-market data access and platform NPD, R&D staff previously processed microtiter plate assay data manually. Automation now delivers analysis-ready data sets within minutes of a plate read, accelerating decision-making.

In the development of novel instruments and consumables, we removed the need for the expected manual data collection by establishing automated ingestion, giving teams near-instant access to results post-run.

Finally, in manufacturing SPC data, repetitive, slow analyses have been replaced by real-time SPC charts accessible 24/7 through JMP Live dashboards, enabling faster, data-driven decisions across teams.
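As a generic illustration of what a control-chart refresh computes, here is an individuals-chart limit calculation in Python. JMP Live renders the actual charts from the refreshed data, so this is purely illustrative rather than our production code.

```python
"""Generic individuals-chart (I-chart) limits from the average moving range.

Purely illustrative; the real SPC charts are built in JMP Live.
"""
from statistics import mean

def i_chart_limits(values: list[float]) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) using sigma estimated from the moving range."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_est = mean(moving_ranges) / 1.128  # d2 constant for subgroups of size 2
    return center - 3 * sigma_est, center, center + 3 * sigma_est

# Example with dummy measurements:
lcl, cl, ucl = i_chart_limits([10.1, 9.8, 10.3, 10.0, 9.9, 10.2])
print(f"LCL={lcl:.2f}  CL={cl:.2f}  UCL={ucl:.2f}")
```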

What are the key takeaways from this? Raw instrument data is automatically ingested from the labs into AWS and Snowflake. This reduces delays and ensures scientists are working with complete, accurate data sets from the start. Snowflake provides a single, secure platform to store and structure all experimental data. No more silos or fragmented spreadsheets, just clean, query-ready data accessible across teams. With JMP and JMP Live, scientists can create reusable dashboards and analyses that refresh automatically with new data, saving hours of manual work and supporting continuous monitoring.

Dashboards update as experiments are completed, giving decision makers live insights into progress, outcomes and emerging trends. No more waiting for static reports.

Results can be shared easily across R&D operations, QA, manufacturing and leadership, helping align priorities and accelerate product development. Overall, this system transforms how teams access, analyze and act on data, turning raw experimental results into timely, actionable insights that drive innovation.

As we've seen today, the journey from raw experimental data to actionable insights doesn't have to be slow or manual. By combining AWS, Snowflake and JMP Live, we've built a system that captures, structures and analyzes data automatically, turning what was once tedious work into real-time reliable insight.

We have demonstrated several key impacts: faster access to high-quality data, a scalable and centralized architecture, repeatable and automated analytics, real-time decision support, and cross-functional visibility that drives collaboration. The true measure of success is in the user experience. Scientists no longer ask how the data got there; they ask what it means.

That shift from process questions to insight questions is the ultimate proof that the system delivers value. Looking ahead, this approach provides a foundation for continuous innovation. Teams can spend less time wrestling with data and more time using it to solve problems, make decisions, and accelerate product development.

Thank you for your time today, and I hope this has given you a clear picture of how automated scalable analytics can transform an organization.

Presented At Discovery Summit 2025

Presenter: Christopher Pillsbury

Skill level: Intermediate

Start: Wed, Oct 22, 2025 12:30 PM EDT
End: Wed, Oct 22, 2025 01:15 PM EDT
Location: Trinity A