During product development, reagent formulations for new sequencing platforms require multiple cycles of development, optimization, and robustness testing. The result? Significant amounts of data that need to be processed and compared using statistical analysis.

These data handling tasks are often tedious, time-consuming, and prone to copy-and-paste errors. Automation of the data processing workflow represents the ideal solution to those issues.

The Workflow Builder in JMP allows scientists with minimal programming experience to create their own scripts that automate otherwise repetitive data-handling workflows.

In this presentation, a reusable script is created for the analysis of an example: the results of a plate-based enzyme assay. Use of column formulas, column stacking, non-linear (exponential) fits, and combined data tables is demonstrated live in conjunction with basic functions of the Workflow Builder. The final workflow is debugged in real time and modified to make the script reusable and robust, using simple JSL elements such as variables and the “Current Data Table” and “Pick File” functions.

The presentation culminates by showing how JMP scripts were leveraged to streamline data processing at Illumina, unlocking substantial time savings and faster insights for project development teams.

 

 

Hello everyone, and welcome on behalf of myself and Elliot. In this presentation, we're going to show how we use automation of data processing in JMP at Illumina to accelerate our product development pipeline. The first thing we're going to do is introduce the product development pipeline at Illumina, who we are, and how we develop our products. Then we will show how automated data processing fits into this, provide an example of our typical workflow, and close with key take-home messages.

Illumina is generally known as a company that makes instruments for genomics. Nowadays, we can be defined as a provider of solutions and platforms for multiple types of omics: genomics, transcriptomics, and proteomics. Our flagship high-throughput instrument is the NovaSeq X Series, which we launched in 2023. The key technology featured in the NovaSeq X is the XLEAP Technology.

One of the key features of the XLEAP technology is that, compared with the reagent volumes needed to run our previous generation of high-throughput instruments, we have achieved about a 90% reduction both in packaging and in the effort required by the customer to manage and store reagents.

This was possible because no reagent is shipped on dry ice or with ice packs, so the reagents travel in a single box, and the customer no longer has to discard and manage dry ice and similar packaging. Inside that box, the customer receives a flow cell that can run about 300 cycles per run, with 25 billion measurements per cycle.

To develop these instruments, we estimate it was necessary to run approximately 5,000 of these flow cells. Every time we run one, we need to prime the instrument with a cartridge containing about 20 unique reagents, each of which was developed using analytical assays and then sequencing. This is where automated data processing comes into play.

We have exemplified here the use of analytical assays to develop reagents. As you know, in a plate-based assay the scientist performs the experiment, extracts the data, and performs data reduction and data processing click by click, copy-and-paste by copy-and-paste, until a report is ready. At that point, one can draw conclusions and move to the next experiment. You can repeat the same procedure over and over again, or you can leverage JMP scripting.

You generate a script that automates data extraction, data reduction, and data processing, and then lets you perform this type of analysis iteratively on repetitive experiments, which comes with multiple benefits. First, it saves time for the experimentalist. Second, it allows the analysis to be repeated across multiple experiments, which accelerates the product development pipeline. Moreover, since we remove the need for click-by-click, copy-and-paste operator activities, we have fewer errors. Having fewer errors means that we can standardize our procedures.

Standardization is a pillar of reproducible and robust processes. This is possible even for scientists who do not have much experience with scripting, because they can leverage the Workflow Builder feature in JMP together with the information available in the JMP Community. Minimal coding experience is necessary.

Let's walk through an example of how we can build a reusable automated data processing pipeline using the Workflow Builder, with very little understanding of JSL required. It's typical at Illumina that we develop new sequencing reagents, and often we need to develop a test to measure their activity. Let's say we've got a test where we take a dark substrate, we test whether our reagent converts it into a fluorescent product, we put the plate on a plate reader, and we measure a time course of the fluorescent product's response as a function of time.

It's typical that such a reaction would have first-order kinetics, so the formation of a product P can be described by the equation here, which is a function of time and has two important parameters. We have Pₘₐₓ, which describes the maximum intensity we obtain at the end of the time course, and we also have the rate k.
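
For reference, the first-order expression described here, assuming the product signal starts at zero at time zero, is P(t) = Pₘₐₓ(1 − e^(−kt)), where Pₘₐₓ is the plateau intensity and k is the rate constant we want to estimate.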

This rate is the thing we're really interested in: identifying which of our reagents has the highest possible rate. Typically, we get a raw data file from our plate reader software. It's often an Excel file, and we want to use JMP to automate the data processing. What we'd like to do is import this Excel data, and we need to subtract the background intensity from our responses.

We want to perform an exponential fit to determine this reaction rate k, and then we want to extract the rate data into a JMP table that gives us a summary of the results. Now we're going to go into JMP, and I'm going to show you how we can do this. But first, let's look at our Excel file. This is a typical file we might get from plate reader software in its raw format. As you can see, we've got some experimental details populated by the software.

We've got a graphic of the plate layout, and it isn't until line 21 that we get to our experimental data. We want to be able to open this, get JMP to process the data, and then give us a nice summary. If we want to build an automated workflow for this, and we don't know how to write JSL code to get a script, we can use what's known as the Workflow Builder. For all our examples, we're using the basic version of JMP 18. We're performing the examples within a JMP project just to keep everything organized and easy for you to see, but you can perform these tasks in the standard version of JMP too.

The first thing we need to do is go to New and choose New Workflow. This is the Workflow Builder platform, and the important thing to know is that before you start, you click this red Record button; from then on, every action you take is saved as a step with a corresponding script written for you. At the end, you can take these steps and string them together into a script that can be reused.

We want to import our Excel file here, so we're going to open it. As I showed you before, the data doesn't start on the first line, so here we can specify where the data begins and where it ends, and then import. When I click Import, that step is recorded in the Workflow Builder tab, and this is what the data looks like: we have our time column, the different samples, and our background.
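
For reference, the import step the Workflow Builder records looks roughly like the JSL sketch below; the file name and row numbers are placeholders, and the exact Worksheet Settings arguments depend on your JMP version and the layout of the plate-reader export.

    // Hypothetical sketch of the recorded Excel import step
    dt = Open(
        "plate_reader_export.xlsx",       // placeholder path to the raw export
        Worksheets( "Sheet1" ),           // sheet containing the kinetic read
        Worksheet Settings(
            1,
            Has Column Headers( 1 ),
            Number of Rows in Headers( 1 ),
            Headers Start on Row( 21 ),   // experimental data begin at line 21
            Data Starts on Row( 22 )
        )
    );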

In order to perform our fits, we need to add a new column, which we're going to call Pred for prediction. What we're going to do is predict what the response might look like as a function of time using the formula I showed you on the slide. We have our two parameters here, and we've specified some starting values for the fits to be performed.
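
In JSL terms, a prediction column with starting values for the two parameters might look like the sketch below; the column and parameter names are illustrative, and the Nonlinear platform later reads its starting values from the Parameter() list.

    // Hypothetical sketch: prediction column with parameter starting values
    dt << New Column( "Pred",
        Numeric, "Continuous",
        Formula(
            Parameter(
                {Pmax = 10000, k = 0.1},         // illustrative starting guesses
                Pmax * (1 - Exp( -k * :Time ))   // first-order product formation
            )
        )
    );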

Now, if you want to pause what you're doing at any time to go off and explore, you can just click the red button again, and then you can play around; whatever you do will not be added as a step. Say, for example, we want to look at how the data looks now. We can go to Graph Builder, bring our data in there, and see what it looks like in its current raw form.

You can see that we have our background here, which is at around a thousand RFU, while our prediction response starts at zero. You can see that what we need to do is normalize our samples and remove the background, so we need to do a bit more processing first. These buttons are very useful: if you want to go back to the start, you can click this Reset button to get back there, and then if we click Play, we can step through to where we were when we last stopped recording.

Now, before we record any more steps, we need to make sure we click that Record button again. What we need to do is subtract the backgrounds. The best way to do that here is to stack these sample columns. We're going to create a new table where we stack them together, we're going to call the data column RFU, for relative fluorescence units, and we're going to label the grouping column Sample.

What we need to do now is find the difference between our background signal and the signal of our samples. The easiest way to do that is to highlight these two columns, right-click, and choose New Formula Column. Under Combine, we can take the difference of the two columns. There are two options, the difference in either order, and I find it's trial and error to know which one you need; if we chose the wrong one here, we'd get negative values. As you can see, we've now successfully subtracted the background from our samples.
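
Scripted, these two steps amount to a Stack followed by a formula column for the difference, roughly as in the sketch below; the column names are illustrative.

    // Hypothetical sketch: stack the sample columns, then subtract the background
    stacked = dt << Stack(
        Columns( :Sample 1, :Sample 2, :Sample 3 ),   // illustrative sample columns
        Source Label Column( "Sample" ),
        Stacked Data Column( "RFU" )
    );
    stacked << New Column( "RFU minus Background",
        Formula( :RFU - :Background )
    );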

Now we're able to run our fitting to work out the rate constant. We're going to go to Analyze, Specialized Modeling, and fit a Nonlinear model, and we want to add our prediction formula column to the X, Predictor Formula role. As you can see, it has picked up the formula we added as a column attribute for the fit model. Our baseline-subtracted sample data is going to be the response, and because we've got multiple samples, we need to fit by sample.

If we click OK, we open the Nonlinear Fit platform, and what we can see is that we've got a separate fit for each sample. The thick black line is our experimental data; the thin line is our prediction formula, which provides a starting point, and the platform will iteratively change the two parameters until we obtain a fit. Like all platforms in JMP, if you want to do things simultaneously, you hold the Ctrl key; when we click Go, it iteratively fits our response using the prediction formula for each of our samples.
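
Scripted, the launch looks something like the sketch below, where sending Finish runs the iterative fit to convergence for each By group; the column names again are illustrative.

    // Hypothetical sketch: nonlinear fit of the background-subtracted data
    nlin = stacked << Nonlinear(
        Y( :Name( "RFU minus Background" ) ),
        X( :Pred ),        // prediction column carrying Parameter() starting values
        By( :Sample ),
        Finish             // iterate the parameters to convergence
    );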

You can see now that the thin line is superimposed on the thick line, and we've obtained convergence in our fit. Here are our estimates of the Pₘₐₓ and k parameters. What we want to do now is pull these into a summary table. If I right-click and go to Make Combined Data Table, this extracts all the parameter estimates from these fits. We're only interested in the rate constants, so we want to clean this up a little as part of our data processing.

We want to select all the matching cells for Pₘₐₓ. We're not interested in Pₘₐₓ, so we can remove those rows. Likewise, we don't necessarily need this Table column, and we know that the parameter is k, so we can remove that and clean the table up.

We have our estimates of the rate constants, with the upper and lower confidence limits. Just for a better visual, we might want to reorder the columns so the estimate sits between the low and the high, and we now have a nice summary table. Then, since we don't need the other intermediate tables, we can close them without saving. We record all these steps in our Workflow Builder here, and this is the final output that we have.

Now we can stop recording, and what we need to do is check through and make sure that this all works properly. We can step through each step and make sure it does what we want it to do, with no errors. You can see that every time a step executes successfully we get a green tick, but now we've hit a problem. This is quite common when you make a combined data table: it creates a new table called Untitled followed by a number, and every time the process is repeated, that number increments. The script is looking for the very specific number from when we recorded the workflow.

You can see it can't find Untitled 7, so it's asking which table we want to use. This shows a bit of a problem we've come across: we do need to do a little manual work here to get our script to run multiple times. With some very simple coding, we can resolve this. We can open the script for each step. If we go back to the last step that successfully executed, what we want to do is define this data table as a variable, so we're not relying on an Untitled number; let's call it dt.

We write dt = Current Data Table(). Then, in each subsequent step, wherever the script refers to the Untitled table, we replace that reference with our new variable. This may come up a few times, so we just go and change each Untitled table to dt. Once we've made all those edits, we test it again and see if it works. This may not be the most elegant way of doing it, but it allows us to build our own automated script without knowing how to write code.
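
The edit amounts to one assignment at the start of the problem step, after which the later steps reference the variable instead of a hard-coded table name, roughly as in this sketch (the cleanup message shown is illustrative):

    // Hypothetical sketch: bind whichever table is frontmost to a variable
    dt = Current Data Table();
    // A later step such as  Data Table( "Untitled 7" ) << Delete Columns( :Table )
    // then becomes:
    dt << Delete Columns( :Table );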

If we go through this again, are we successful this time? I have missed one here, another Untitled 7 reference. With that fixed, we've successfully run through the script, and we now have our summary table. What we can do is save this workflow, or we can click on the red triangle, choose Export, and save it to the Script Window. Now, I said we want a general reusable script that can be used multiple times by different scientists.

At the moment, it's still quite specific, because it's opening a very specific file. With just one more line of code, we can make it a lot more general. I'm going to define a variable f, and I'm simply going to write f = Pick File(). This command will let us select which file to use, as long as we replace the specific file path with the variable f. Let's run the script and see if it works now.
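
The change amounts to these two lines at the top of the script; the prompt text is illustrative, and the remaining Open arguments stay exactly as the Workflow Builder recorded them.

    // Hypothetical sketch: let the user choose the raw file at run time
    f = Pick File( "Select the plate reader export" );   // returns the chosen file path
    dt = Open( f /* , ...recorded Excel import settings... */ );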

We're going to click the green arrow to run the script. It's asking us to pick our file; let's see if this works. There we go. It was so quick, you didn't even see what happened, but it worked through all those steps we built with the Workflow Builder, and it's given us a summary table. This is one way we can create reusable scripts that automate our data processing: we just performed, in seconds, a process that took me ten minutes to demonstrate, and this can unlock substantial time savings in R&D.

Now that we have shown you how to generate the workflow, let's see an example in which the workflow processes data. We have here the example of an enzyme activity assay. As we have said, this enzyme activity assay has a fluorescence readout. The job of the scientist is to prepare an assay plate. The assay plate contains, in yellow, the various samples that will be processed; each sample is repeated with n equal to five across the wells.

Then we have a column with positive controls, whose information is used to extract the Pₘₐₓ parameter for the fitting; negative controls, as there should be in every good experiment; and a column, called dt, which provides the fluorescence reading before the reaction starts, enabling us to assess the baseline noise that can be subtracted to align the data to the zero point on the y-axis. At this point, the user has generated the plate. The plate is handed over to a plate reader, which lets the reaction proceed, processes the plate, generates fluorescence data, and records them into an Excel file. In the meantime, the user will have filled in a sample ID table with the information relative to each sample.

This is when the automated pipeline comes into play. It imports the Excel data and combines them with the sample ID table, so that each well is associated with the right sample. There is time and response normalization, so that the time courses start at time zero and at the zero point on the y-axis. There is the exponential fit following the kinetics we just described. The reaction rate is extracted and averaged across the five replicates; any diverging replicate is excluded, yielding in this example a final n equal to 4.

Then a statistical test, in this case ANOVA, is used to compare the results for the various samples. One additional feature is a folder called process dump that is generated at the beginning of the process. In this folder, all the intermediate steps of the pipeline are recorded, which helps the scientist troubleshoot in case the automated pipeline fails to work.
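
As an illustration of what this final comparison step could look like in JSL (the table and column names here are assumptions rather than the exact ones used in the Illumina script), a one-way ANOVA of rate versus sample can be launched like this:

    // Hypothetical sketch: compare mean rate constants across samples
    summary = Current Data Table();   // table of fitted rate constants per replicate
    summary << Oneway(
        Y( :Estimate ),               // fitted rate constant k
        X( :Sample ),
        Means( 1 )                    // Means/Anova report, including the ANOVA table
    );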

The output, in essence, takes the raw data and turns them, in an iterative way, into intelligible results, represented here in this example by an ANOVA. The first time the experimentalist goes through this approach, it can take about 45 minutes to generate the data reduction pipeline. But once the script is ready, the data reduction pipeline runs in about a minute.

If we consider that it took us more than a hundred plates to develop one of the reagents in the reagent cartridge that I showed at the beginning of the seminar, then you can easily imagine that we have saved at least 75 hours in our development pipeline, accelerating the development of the XLEAP Technology. Now I'm going back to JMP to show how the enzyme activity data pipeline works. I'm running the script, and it will ask me to select the folder and the file, as shown before.

Here we go, the file has been opened and the data extracted. Now we have some additional data taken from the file. In particular, the values in Column 2 will be used: these are the time-zero fluorescence readings, which are used to generate a background and align every time course to the zero point on the y-axis. In a moment the data will be normalized, and then we have a table of normalized time courses. On each one, the signal at time zero has been set to the baseline, about 1%, and now these data will be stacked.

A model, the prediction model, will be added. There we go. The model accounts for what fluorescence we expect to read at every time point and every well on the plate, and compares it with the actual readings. In a moment, the Nonlinear Fit interface will open and enable an automatic fit of each of these time courses. You can see the various time courses, as Elliot showed before, now appearing on the screen. These will be fitted automatically in a second.

The reaction rate parameter, k, will be extracted and tabulated for us. The fitting has run across all the time courses provided, the data have been extracted, and now the script is telling me that the data are already available, presented in an ANOVA format. If you appreciate that it took maybe a minute to run the script in slow motion, which enabled me to show you step by step what was happening, then once the slow motion is removed, the script, as shown a minute ago, runs in no time and the data are ready.

We have two main messages from our presentation. The first is that we should try to work smarter, not harder. Hopefully, we've convinced you that using JMP scripts for automated data processing can significantly boost your R&D productivity. We can automate the more tedious tasks, which gives your scientists and engineers more time to think about the meaning of the results and to innovate new products and processes. We've hopefully also convinced you that minimal experience is really required.

In the example I demonstrated earlier, we only needed two lines of JSL to make a reusable script. The JMP Workflow Builder is a real game changer that allows you to write your own custom scripts without being an expert in JSL. We'd also like to end by acknowledging the contributions of all our colleagues, in particular Steve Mason from the chemistry team. Thank you all for listening today.

Thank you.
