
Looking at the Big Picture: Equivalence Between Two Sets of Batch Data Over Time (2023-EU-30MP-1350)

Working as a manufacturer in the biopharmaceutical industry means we often need to show that we obtain similar results on different sites when transferring a manufacturing process and, notably, when scaling processes up or down. Comparison techniques such as t-tests and ANOVAs are widespread, but equivalence testing has become a standard way to show two processes behave similarly. When we look at a few parameters, those techniques are easy to apply, but when we have large numbers of variables, it becomes difficult to see the bigger picture. The challenge with equivalence testing is that it requires the scientists to provide a value for what they deem an acceptable difference between the groups of data. In addition, many processes change over time, and we are interested in capturing whether they behave similarly across the duration of the process. JMP scripting is a great way to automate the data prep, visualisations, and production of all the plots and comparison tests for those data sets. The multivariate platform in JMP helps create a holistic picture of the process for each time point. We can now use equivalence testing and relate it to the individual variable contributions.


Hello, everyone. Thanks for joining my JMP talk today. Today I would like to talk to you about how we look at equivalence between sets of batch data over time at Fujifilm. In particular, I'd like to speak about a new multivariate take on the two one-sided t-tests.

Although this is a new comparison technique, it doesn't replace the usual single-time-point techniques; rather, it complements them, and in the workflow we still rely heavily on those. The last part here will describe how we compare data sets from the time series we get from two different scales, but it could be any logical grouping.

I'll quickly go through how we prepare the data, or rather get it into a state where we can run the scripts. We will look at the visualizations for the usual single time points in JMP, and we'll also look at some scripts that I use to run PCA on all the variables and test equivalence.

Let's talk about the TOST, or equivalence test. The TOST is a two one-sided t-test, and it checks whether, on average, your two data sets for a given variable are equivalent. Very similarly, the multivariate test checks whether, on average, the nth principal component, or PC, for a given day of a fermentation here is equivalent for two different scales.

You need the data to be in a specific format, and in particular you need two different groups; here it's two scales. If you have more than two, you will need to split them into sets of two. It's most suitable for time series, because it's a data reduction technique that you wouldn't need if you didn't have a problem with having a lot of data points.

Originally, this was part of a script that was all done in R, but as I moved into using JMP for visualizations more and more, I thought it was much easier to use, especially if we want to pass those scripts on to staff. R is not always that accessible.

I have moved a lot of the script into JMP by now. The only thing that's still in R is the data imputation. The scripts and pre-work take care of outliers, missing data, and inconsistent entries; R does the data imputation. Why have I moved to JMP? First of all, it was because I visualize in JMP.

Why is it good to visualize data in JMP? Because JMP is just made for that. It's really good for looking at missing data, outliers, and any graphs, and time series are no exception. The missing data visuals in JMP give you a color map of where your data is missing, so you can find rows where data is missing.

That means a day might not be the best one to keep in the data set. You can immediately visualize chunks of missing data, which would be days in a row, and you can make a decision on whether you want to keep all those days or do the analysis twice, for example. It will also quickly show you if you have data missing from one group and not the other, in which case you'd have to do away with that variable altogether.

Also, outliers need processing before you interpolate; otherwise they will have a huge effect on the PCA and the comparison tests. There is an outlier detection platform in JMP, but in our workflow it is used in combination with watching the time series and the comparisons. I'll show all that in the demo.

Then the Graph Builder is used to plot all the time series. There are many ways to do that, but I will show you in the script the two main graphs that we use to check that our data is good to go. Here they are: the time series, and these in particular are wrapped by batch, so that you get an individual plot for each time series.

It's a small plot, but usually it's enough to spot missing data, outliers, or any weird or different behavior. Here, for example, let me get the laser on. There we go. Here we have a cluster of points that are questionable. We need to check whether this is behavior that we want to capture, or behavior that's unusual, in which case we want to impute it from the rest of the data.

Or we could have single outliers like here. We see this quite often, but here you can notice it's all on the same day. So is it something that happens on day six in those fermentations? Another way to plot time series, and we do that as well, is to actually overlay them. By overlaying them, you can see whether your data is consistent for a given day.

Here we have individual value plots, which is what we look at. But I've also asked JMP to put a little box plot around all those data, because this mimics what we have when we are doing our ANOVAs in the second step of our data visualization.

This is very typical of what happens in our processes over time. In the first week, the data shows very low variability. The box plots are small and they are usually fairly well aligned; the average is around the same value.

When we reach the second week of the fermentation, things start to drift apart and get much more variable. If you were plotting day six by itself in a one-batch-per-day type of plot here, you'd be able to see what the difference is on average between the large scale in red and the small scale in blue.

On day six, you have a small difference, and on day 12, you have a large difference. Those differences are what we are looking to test when we do single-point t-tests with our ANOVAs. Before we carry on, I just want to quickly recap the differences between a t-test and a TOST.

A t-test is completely statistical, whereas the TOST requires a user input for a practically acceptable difference. In a t-test, you hypothesize that there is no difference between the means, and if you get a small p-value, a significant result, then you reject that and you say there is a difference between your data sets.

A TOST tells you that there is no difference between your data sets if you have a significant result. If you fail a t-test, the confidence interval for the mean difference (in those plots, that's the black square here) does not cross the zero line.

But if you fail a TOST, the confidence interval for the difference is not contained within the practically acceptable difference. You have two outcomes for a t-test and two outcomes for a TOST, which means you have four combinations.

Either you pass or fail both, or you pass one and fail the other. In JMP, there are different platforms that do the TOST and the t-test, but you will have a normal distribution for the difference. If you pass a test, then your mean difference is in that little bell curve; if not, it's outside.
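The mechanics of the two one-sided t-tests can be sketched in a few lines. This is a generic illustration, not JMP's implementation: the pooled-variance formula, the function name, and the default alpha are my own choices here.

```python
import numpy as np
from scipy import stats

def tost(x, y, delta, alpha=0.05):
    """Two one-sided t-tests (TOST) for equivalence of two group means.
    Equivalence is claimed when both one-sided tests reject at level alpha,
    which is the same as the (1 - 2*alpha) confidence interval for the mean
    difference lying entirely inside [-delta, +delta]."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # pooled (equal-variance) standard error of the difference
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    p_lower = 1 - stats.t.cdf((diff + delta) / se, df)  # H0: diff <= -delta
    p_upper = stats.t.cdf((diff - delta) / se, df)      # H0: diff >= +delta
    p = max(p_lower, p_upper)
    return p, p < alpha
```

Passing (a small p) supports equivalence within plus or minus delta; failing only means equivalence could not be shown, which is why the pass/fail logic reads as the mirror image of a t-test.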

So you can quickly visualize which ones passed or not in an ANOVA. Here they are, the ANOVAs. That's step two of our visualization and cleaning process. We use a script to plot all of those together in a report so that we can look at all of them. If you think about the data set here, it was about 15 variables over 12 days, so you have over 150 such plots, which is a lot of data to look at, especially if you change things and plot them again.

But here are some examples of what you might see. You might pass a t-test or fail it. You might pass a t-test, but only because you have an outlier that's pulling one of the data sets up or down, for example. There are many possible results that you could get here.

Not everything is on this screenshot, but we're also looking at the variance in that report. The principal component comparisons we don't do with the script; I use the Graph Builder to actually plot those, because they are like the plots I had a couple of slides back.

But the difference that you can see here is in the scale. Because we're talking about principal components, the scores tend to be around zero on average, and they vary between minus three and three, because we normalize the data before carrying out the PCA.

The advantage of this is that now, instead of having to provide an acceptable value for a TOST, because the data is normalized, we can actually blanket-calculate that acceptable difference by taking a multiplier of the standard deviation of our scores.
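Because the scores are on a standardized scale, that acceptable difference can come from the data itself. A minimal sketch, assuming (as described later in the talk) that the spread estimate is the pooled within-group standard deviation, i.e. the RMSE a one-way fit of score versus group would report, with k = 3 mirroring the three-standard-deviations choice:

```python
import numpy as np

def blanket_delta(scores, groups, k=3.0):
    """Acceptable difference for a PC's scores: k times the pooled
    within-group standard deviation (the RMSE of a one-way fit)."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    # residuals around each group's own mean
    resid = np.concatenate([scores[groups == g] - scores[groups == g].mean()
                            for g in levels])
    dof = len(scores) - len(levels)
    return k * np.sqrt((resid ** 2).sum() / dof)
```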

Here we have the first principal component, and it clearly shows there's a big difference between the large and the small scale. With the second principal component, there is a smaller difference. This is typical of what we see, because the first principal component tends to capture the broader shape of the fermentation profile.

So if there is a difference in that broader shape, the TOST for the first PC tends to fail. Typically, what I've seen is that for our data, two principal components capture about 60% of the information in the variables. For those of you who may have done PCA on data before, that may seem a low number, but that's probably because all the variables have a different story to tell.

Another thing I'd like to spend a little bit of time on is the loadings plot. This is part of the PCA platform in JMP, which has this plot at the very top, and it's a good one to look at if you're a scientist who's more interested in what's really going on.

But the reason I have this on a slide here is that it's a good representation of how much each variable contributes to the model that we're choosing before doing the equivalence test.

Here, for example, all the variables related to viability for our fermentation are highly correlated, because they are close together, and the way they project onto PC1 and PC2 here, they get high values. We said that those map well to the first two PCs, so that means they contribute a lot to the model.

Here we have some other variables that are closely clustered together: sodium, potassium, and glutamine. They are highly correlated. They map very well to the first PC, but not to the second, so they don't contribute a lot to the model through the second PC.

Then here you have problematic variables. They do not map well to either PC. That means that in a two-PC model, you are not going to capture the behavior of those variables. When you see this, you already know that a two-PC model is not going to give you a lot of equivalence information for those variables.
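How well a two-PC model represents each variable can be quantified as the sum of its squared loadings over the retained PCs (the squared cosines the script pulls out later serve the same purpose). A toy sketch on synthetic data, not the talk's data set: two pairs of strongly correlated variables plus one unrelated variable.

```python
import numpy as np

# Toy correlation PCA: a variable's 'communality' over the retained PCs is the
# sum of its squared loadings; a low value flags a variable that a two-PC model
# will not capture.
rng = np.random.default_rng(0)
n = 40
trend1, trend2 = rng.normal(size=n), rng.normal(size=n)
data = np.column_stack([
    trend1 + 0.1 * rng.normal(size=n),   # cluster 1, maps to one retained PC
    trend1 + 0.1 * rng.normal(size=n),
    trend2 + 0.1 * rng.normal(size=n),   # cluster 2, maps to the other
    trend2 + 0.1 * rng.normal(size=n),
    rng.normal(size=n),                  # unrelated: maps to neither
])
eigval, eigvec = np.linalg.eigh(np.corrcoef(data, rowvar=False))
order = np.argsort(eigval)[::-1]         # eigh returns ascending eigenvalues
eigval, eigvec = eigval[order], eigvec[:, order]
loadings = eigvec * np.sqrt(eigval)      # loading of each variable on each PC
communality_2pc = (loadings[:, :2] ** 2).sum(axis=1)
```

The first four variables come out with communalities near one, while the unrelated fifth sits near zero, which is exactly the "problematic variable" pattern in the loadings plot.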

The last step is to actually plot the TOSTs. This is done again using a script. These graphs are not the graphs that you would usually find in JMP, but they're pretty typical of TOST plots. For each PC on each day, we will have a TOST, or equivalence test, result. If the confidence interval is outside of the acceptable range, which is three times the standard deviation of the scores in this case, then we fail the TOST.

When we fail the test, we give it a zero in the script. To summarize what happens here: each PC captures a certain amount of the variability, each PC can pass or fail a TOST, and each variable contributes to a certain extent to each PC.

A principal component is a linear combination of all the variables. Altogether, a variable that has a strong contribution to a principal component that passes a TOST will have a strong impact on the overall equivalence between the batches. This is what we are trying to put together.

How do we put this together? There are many ways we could do it. I have done something pretty simple here: it's just a sum-product of passing or failing a PC, times the contribution of the variable. For example, with two PCs here, we're failing the first equivalence test, and this variable had a 40% contribution, so that gets zero, plus one for passing the TOST, times the contribution here to that PC.
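The sum-product can be written out directly. The numbers below are illustrative, not the talk's: each row of `contrib` holds one variable's contributions to the PCs, and `tost_pass` marks which PCs passed their equivalence test that day.

```python
import numpy as np

# tost_pass[j]  = 1 if the equivalence test passed for PC j on this day, else 0
# contrib[i, j] = contribution of variable i to PC j (e.g. squared cosines)
tost_pass = np.array([0, 1, 1])   # PC1 failed its TOST, PC2 and PC3 passed
contrib = np.array([
    [0.40, 0.35, 0.10],           # variable A
    [0.05, 0.10, 0.70],           # variable B
])
ieq = contrib @ tost_pass         # sum-product of pass/fail times contribution
# variable A: 0*0.40 + 1*0.35 + 1*0.10 = 0.45
# variable B: 0*0.05 + 1*0.10 + 1*0.70 = 0.80
```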

That's the overall score for it. In black, you have the basic scores for each day here. Let's call it the IEQ; you'll see that in the tables later. On average, we're getting about 70%, which is not too bad over the course of the fermentation. Adding PCs doesn't make a very big difference, because this variable mapped very well to the two PCs.

The pH, which was one of the variables that did not map very well to the first two PCs, gets a really bad average score if you have only two PCs in the model, around 30%. But if you add another four PCs, which do tell us about this one, that number goes up to over 80%.

There are no good or bad numbers here, but it's something you need to keep in mind: it really depends on the model that you choose for running this. Moving on, this is the very last output from the script, and that's what we're really interested in, especially if we are comparing different processes or different ways to run one.

You have a bar chart of all your individual equivalences. This really shows you, by variable, which ones are similar from one scale to the other. Here we have three variables that are pretty similar amongst batches, and then it really drops down to the last one here, which has a very low equivalence.

In the top right corner here, JMP will put an average if you ask for it. That's a good metric, although it's very reductive. It's a good metric for comparing the same processes if you're using different ways to run the TOST or different numbers of PCs.

I have more slides about this, but I think it's better to run straight into the demo in the interest of time. I've put a little JMP journal together. I'm not very good at this; I hope it's going to work. In JMP, we'll just look at the data, the three scripts, and two different ways to run the last one.

That's the anonymized data set here. My computer is very slow. There we go. I think it's working; it's just really sluggish. In this data set is the bare minimum that you need: a run type with two groups, a batch ID, which is a categorical variable despite being a number here, and the time ID. In this case, it's one recording a day, so it runs over a number of days.
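In table form, that bare minimum looks something like this. This is a pandas sketch with made-up column names and values, not the anonymized data set itself: one row per batch per time point, a two-level group column, a categorical batch ID, a time ID, and then the measured variables.

```python
import pandas as pd

df = pd.DataFrame({
    "RunType": ["Large", "Large", "Small", "Small"],   # the two groups
    "BatchID": ["101", "101", "205", "205"],           # categorical despite looking numeric
    "Day":     [1, 2, 1, 2],                           # time ID, one recording per day
    "Var1":    [5.1, 5.4, 5.0, 5.3],
    "Var2":    [0.8, 1.1, 0.9, 1.2],
})
```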

The first bit we do is plot all the time series. I've left a few bits and bobs in that are not really good, so that I could point them out. All the scripts in this group, and there are three of them, do basically the same thing at the start: a bit of cleaning up and prepping. Each creates a clone of your data table to work on without breaking your data table, and also a directory to save all the outputs from the script.

Then the scripts basically loop here over the number of variables, plotting them one at a time and putting everything in a report. Let's run this. This is a very generalized script, and it works well on all the data sets. I'm always nervous things are not going to work because it's so slow. Here we go. It's still thinking. I'll just be patient and wait.

This will plot the wrapped-by-batch time series and the overlaid time series as well. You'll have one of each for each variable. It will say variable one, the actual name of your variable here, and plot them. This is what we want to see, basically.

We have the same shape for all the batches, and they are consistent across the scales. We'll move down to one that doesn't look as good. There we go. For this variable, we have the same shape for this scale, ish, and then a very different shape for the small scale. We need to find out why this is happening, and whether we want to keep this variable in the model.

In particular here, we have one batch that's very badly behaved. If you look at this in an overlay plot, it is very obvious that this average curve doesn't represent either the large scale or the small scale. This is a variable that you need to come back to.

Variable 6, I think, was in the presentation. This shows you where you have some outliers that you may have missed the time before. Another thing you need to look for in those plots is whether your small- and large-scale numbers are mingled together.

If your red and blue points are all mixed together, then chances are your scales are pretty similar. But in some cases, like here, for example, the small-scale data is almost always above the large-scale data, so you can expect to see a difference here. Then I have variable 11, which is really, really bad.

This happens quite often to us when we have a difference in how the variables were recorded. Here it was actually a different unit, and that's why we have very different numbers. Now, when you put the graphs in a report like this, unfortunately, you lose my favorite feature in JMP, which is interactivity.

You can't actually highlight a point or a series of points and go see what they are doing in the table. But the script is saving all those individual plots for you. Here we were. In here, it's created a directory with all the plots that we've just seen, plus it has saved that clone with the time series tagged at the end.

In here, you can see, if I can actually use my mouse, those time series one at a time. Then you can select points as you normally would, to use the interactivity in JMP. If you have several open at the same time, the points will be highlighted in all your plots.

That's it for time series. We'll move on to the second part of the process, and that's looking at all your ANOVAs. This is in the Fit Y-by-X platform, and like the other script, it does a bit of tidying up at the start and then loops over days, creating a subset of the table for each day. Then there's a report on the differences between the groups.
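The structure of that loop, subset by day and then compare the two groups, can be sketched outside JMP. This is an illustration, not the JSL script itself: the column names and data are made up, and a plain two-sample t-test stands in for the full Fit Y-by-X report.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "RunType": ["Large"] * 4 + ["Small"] * 4,
    "Day":     [1, 1, 2, 2] * 2,
    "Var1":    [5.1, 5.2, 6.0, 6.1, 5.0, 5.3, 6.2, 5.9],
})
results = []
for day, sub in df.groupby("Day"):          # one subset per day, as in the script
    large = sub.loc[sub["RunType"] == "Large", "Var1"]
    small = sub.loc[sub["RunType"] == "Small", "Var1"]
    t, p = stats.ttest_ind(large, small)    # group comparison for this day
    results.append({"Day": day, "p_value": p})
```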

If you have written scripts before, you will see that this is pretty typical of writing a script in JMP. Some of it is written by hand, but the bulk of what's happening I basically ran in JMP and copy-pasted into the script. We'll run this on this dirty data set. Hopefully, it doesn't take too much time.

If you have a fast computer, I believe you would not even see those windows open; it would be instant. You can see here how JMP is basically going to a numbered day, taking a subset of that table, running a script, and saving it to that little data table. It's doing this for every day, and we have 12 days here, so it takes a while.

This will also save everything in its own folder. In this case, we'll just look at one of the saved reports. Here you have a subset; all this is for day one, but it's a much smaller table. Here you have the report that you and your scientists would want to look at, which shows you all the t-tests. Now you can look at all those t-tests just to see if they pass.

You could count how many pass and take a proportion of passing t-tests. But this is also a good place for finding those more subtle outliers, because each box plot might have some data that you want to question. Again, you would highlight those points and check whether you want to keep them in your final data set or not.

Moving on again. We're finally at the last bit, which is probably the most interesting: all the PCAs. This is a much bigger script, because it has to fetch information from the JMP platforms. I don't have a lot of time for this, but I can answer questions at the end if you're interested.

The other thing with this script is that I have hard-coded some bits, so it needs to be modified for every data set. I need to fix that at some point. Here, for example, it's actually doing a principal component analysis on one of the days, so a subset of the data table. Then we switch to the PCA report, and this becomes an object in your JMP script.

Then from this object here, you can get items. For example, I run the PCA and I have it as an object, and now I say I want the eigenvalues in there. The way to find the objects that you need is to open the tree structure in your JMP report, where everything is numbered and aligned. So you can get everything that you need from the JMP report as a value, as a matrix, as an array.

It really depends on what you want, but you can see I've done this here. Once it has all these values, I extract the principal components and I fit, again Y-by-X, the principal components versus my scales here. Again, I switch to the report. I'm doing this so that I can get the root mean square error from that report.

That's because it's the best estimate I will have for my standard deviation. I'm using this standard deviation here to blanket-calculate my acceptable difference for my TOST. I can finally actually run my TOST here. So again, that's another group, and this time it's Fit Y-by-X, but I'm asking for an equivalence test with delta as my acceptable difference.

The rest of the script will plot all the TOSTs, and it's very boring. Then at the end, it will create a table with all the outputs and all the things that we need to create our bar chart, and eventually we can also create the bar chart. We'll run this for this data set. Just checking I have the right one. There we go. There it is. I'll click on it now.

You can see it in the background here. It's subsetting the tables, and it's doing this painfully slowly. For every day, it will select the day, make a smaller data table, do a PCA on all the variables, and then it will save the principal components, the eigenvalues, and the cosines for further calculations.

It will use the principal components first to do a t-test, because that's where we're going to get our estimate of the standard deviation, and second, to do an equivalence test to check whether it passes equivalence. I think we're on day seven; we're going to get there eventually.

It will also plot all our equivalence tests, and it will create the bar chart and the new directory. Bear with my computer. Well, this is taking longer than it should, really. I hope it's going to work. Sometimes scripts that are quite busy mean that it's hard for JMP to catch up with what's happening in the background.

I hope it's not going to fail because of that. No, here we go. It's now created a report, and for each day, it puts each TOST in a column of graphs. I have written the script in such a way that they're all the same size. That was suggested by one of our scientists, actually, so they're much easier to compare.

Here we had data that really needed some extra cleaning up, so it comes as no surprise that all our equivalence tests for the first principal component are failing. That's because the PCA is done on variables that are not similar between groups. But the more subtle behavior that's captured in the second PC is still passing a lot of the equivalence tests.

I'll close this to show you what's been saved in the directory for this one. You have the individual subsetted tables, each with its PCA and its own script. Even opening a small table like this is taking a long time. There we go. Here are the PCAs. Here's the loadings plot.

This is where the eigenvalues come from, and here are the cosines, which are pulled out by the script. It has the TOST results used for making the TOST graphs, but we've already seen those. It has a table that shows you which TOSTs passed, with a zero or a one here, and the explained variance, and the calculations for the explained variance in the same table.

These columns here are what we're going to use to create our bar chart. The bar chart gets saved in the journal in this case. There are many ways you could do this, really. For 15 variables and not the best of cleanup jobs, let's see what equivalence we get here. It is all working.

It's just really slow; sorry about that. There we go. I've had, again, feedback from scientists saying that they would prefer to see the variables in the order they were in originally, because most of our data is recorded in templates, so people are used to seeing those variables in order.

But it's also nice to have them in descending order, so that you can quickly see which variables are quite equivalent and which ones are not doing so well. Here, on average, we have 21% equivalence across all our variables. It's not a very high number. I don't have a criterion for that number, but I think around 60%-75% would be quite desirable.

I'll close everything I can to make some space. We'll go back and see what happens if we remove one offending variable. I haven't done enough cleaning up here, but I'm removing variable 11, which was really not an acceptable variable to have in our data set.

I will run the TOST with three PCs this time, so that I can at least have a shot at capturing the variability in things like pH or pO2, which tend to be much more complex. We'll run this one, have a look at the bar chart, and see how much equivalence we can capture.

I suspect this is going to be slow again. It is going slowly; we're only on day two, so I need to fill up the time. As I said, we don't have a criterion for this total number; it's more of a relative number.

Either you have a set of criteria for cleaning up your data, or, maybe because you are running batches and recording them in similar ways, you would say we will only ever look at those 10 variables, and then you can compare the overall equivalence, or the bar charts, for given sets of variables that are comparable.

The other way you could do it is using the same data sets, like I have today. I know we have 21% equivalence for 15 variables, but once we remove variables 11 and 5, for example, and clean up some of the outliers, that number starts going up. Or it could be that we have only 21% with two PCs, but if we add a couple, because some of the variables don't map very well to the first two PCs, then this number also goes up.

It's very difficult to put a criterion on that number, but it's pretty good for comparing different models, or different data sets that have been treated reasonably similarly. How are we doing here? Almost there. I'm very sorry about this; my computer is particularly slow today. Here we go.

Here are the TOSTs, and this time there are three PCs, so they're aligned in threes. I think if we made this bigger, it would start sticking out of the window here. Because we have removed one variable already, we can see that some of the TOSTs are passing even for the first PC. That's definitely made a very big difference.

I will close those and go back into the directory that was created. The way I've written this, if I'm doing two data sets in the same directory, the first one is going to get erased, because Save As in a JMP script will save on top of existing data if it has the same name.

Here was the same data; we just removed one variable and added one PC, and we went from 21% to about 47% average equivalence across the variables. That shows you what a big difference a small cleaning step can make, or choosing a slightly different model, with one more PC in this case.

Now it's me. I've gone through all the scripts. I'll put my very last slide back up here to conclude. This multivariate technique is a new way to look at equivalence; I haven't seen it used anywhere else. It's a complement, not a replacement.

You should still, especially if you're heavily involved with the data, be looking at all the time points that you're interested in. It gives a holistic picture with a lot of detail, because you have a lot of output. But if you're only interested in the final information, really that bar chart gives you a lot of information in just one graph.

You could do this with any types of groups that you want. This happens to be scales because we look at the difference between manufacturing and lab scales a lot at Fujifilm. That's it, really. It's your multivariate two one-sided t-test, as part of our process flow for looking at scale-up and scale-down data. I'd be happy to take any questions.