When solving a real-world problem – such as increasing yield or decreasing scrap in a manufacturing process; understanding which factors affect the quality of a product; or performing a root cause analysis on equipment failures – leveraging data can provide a powerful advantage. However, answers are very rarely offered up freely. Before solutions are found, the data must be accessed and imported, cleaned and organized, visualized and explored, and maybe even passed through various machine learning algorithms. The results must then be interpreted and considered alongside practical knowledge relating to the situation in question. As a final step, learnings must be communicated in a way that is understood by those who will make decisions or perform the necessary actions that follow.
The cumulative process by which untapped data is captured and converted into valuable insight is called an analytic workflow. Multiple capabilities are utilized while moving along the steps of the workflow, potentially even passing between individuals or departments responsible for various aspects.
JMP contains everything needed to transform raw, messy data into knowledge and value for an organization. You can find a depiction of the JMP Analytic Workflow (Figure 1), by which insight can be extracted from raw data, at the JMP website. Clicking on the orange boxes corresponding to the various analytic capabilities included in the software takes you to the JMP Minute Demo, a brief explanation of how the feature can be used.
Figure 1: The JMP Analytic Workflow, showing all the tools needed to transform raw data into shareable insights for any organization.
For example, data from an external source may be:
- Accessed using the JMP Excel Import Wizard.
- Cleaned using Recode and the various tools found in the Tables menu.
- Plotted to check for irregularities or weird data points.
- Modeled to explore the benefits and risks of various options and determine the best solution or trade-off.
The results can then be communicated using JMP Live, allowing users to see reports and providing on-demand access to raw data for further analysis. Perhaps these steps were recorded using the Action Recorder and automated for when new data becomes available, for periodic updates (daily/weekly/etc.) to monitor KPIs with control charts, or to send an email alert if the process suddenly drifts. It could be that, over the course of these analyses, it was discovered the data didn’t contain the right information and it was necessary to collect more, providing an opportunity to benefit from design of experiments.
The point is that these analytic techniques are not complete solutions within themselves and are rarely used independently of each other, thus underscoring the convenience and need for having them within a single platform.
The flip side is that the Analytic Workflow is rarely used in its entirety by any one person within an organization. Numerous companies make use of many of the diverse analytic capabilities found within the software, but these are delineated according to an individual user's role (see Figure 2).
Figure 2: Examples of specific capabilities of the JMP Analytic Workflow that may be used by various user roles within an organization.
Where to start: Defining the Problem and Solution Criteria
Before we can begin to determine the steps needed to solve a problem, we first have to define both the problem and the criteria by which we will judge whether it has been solved. I encourage you to not take the example here too literally. The analytical techniques used to explore and, ultimately, overcome the problem presented in this case study can equally be applied in many different situations across a range of industries. Therefore, consider examples in your own work in which you could employ these features for your and your organization’s benefit.
How to use this case study:
Below is a list of all the steps in the workflow used to investigate and solve the problem presented in this case study. I've also given examples of the steps that are more commonly used by people with different job titles. Click through the links below to follow your own workflow according to your current role, or read through all of them to get a sense of how JMP can be used throughout an organization. I’ve included the starting Excel file so you can replicate the steps, but feel free to follow along using your own data.
R&D Scientist: 2-9, 14
Manufacturing Engineer: 2-9, 13-14
Data Scientist: 1-6, 10-14
IT Administrator: 1-2, 14
Problem definition
This example involves a manufacturing process in which pharmaceutical tablets are being produced. Like many processes, this one involves a series of steps. At each step, there are various settings that can be adjusted. These can be seen in the process flow diagram (Figure 3) below each step. In addition, there are several nominal variables that must be selected for each batch, which are listed above the steps (materials from different suppliers, as well as a choice of compressor for the Compress step).
Figure 3: Tablet manufacturing process flow map used as an example case study.
The response measured at the end of the process is Dissolution, a measure of how fast the tablets dissolve in the body. High values of dissolution are desirable, with a lower spec of 70%; batches measured below this value are scrapped. The task at hand is to determine how to produce the tablets and run the manufacturing process with a scrap rate reliably below 5%.
(Reminder: this is a demonstration of how the power of analytics, delivered through the JMP Analytic Workflow, can be used to solve a problem. The techniques presented here are not limited to manufacturing or pharmaceuticals.)
1) Clear log to get ready to record analysis for automation
- Go to the View menu in JMP and select Log.
- Click on the red hotspot near the top-left corner of the log window and select Clear Log (Figure 4).
Figure 4: Top-left corner of log window, showing menu behind red hotspot with Clear Log indicated.
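(Equivalently, if you already have a script window open, submitting the one-line JSL command below does the same thing.)
Clear Log();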
2) Data access
As our data is contained within an Excel file, we’re going to import the data using the Excel Import Wizard.
- Simply drag the Excel file from your desktop, or wherever you saved it, into any open JMP window (the Home Window, for example). See the result in Figure 5, below.
Figure 5: Excel Import Wizard after dragging Excel file into open JMP window, showing default settings.
As can be seen in the Data Preview section in Figure 5, the table doesn’t look quite right. Let’s get it into an analysis-ready format by adjusting the Individual Worksheet Settings as shown in Figure 6.
- Set Column headers start on row to 2.
- Set Number of rows with column headers to 2.
- Select Import.
Figure 6: Individual Worksheet Settings in Excel Import Wizard after adjusting values to align data properly.
You should now have a table called TabletDissolution.
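Incidentally, the import we just clicked through can also be written in JSL. Here is a rough sketch; the file location is an assumption on my part, and the surest way to get the exact arguments is to let the Action Recorder capture them (as set up in Step 1):
// Sketch of a scripted Excel import; path and sheet name are assumed
dt = Open(
    "$DESKTOP/Tablet Data.xlsx",
    Worksheets( "Sheet1" ),
    Use for all sheets( 1 ),
    Worksheet Settings(
        1,
        Has Column Headers( 1 ),
        Number of Rows in Headers( 2 ),  // two header rows, per Figure 6
        Headers Start on Row( 2 )        // headers start on row 2, per Figure 6
    )
);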
3) Data exploration and visualization, part 1
Immediately, it appears there are some issues with our table. It looks as though some of the values in the second column, API-Particle Size, were entered as Small or s, presumably to mean Small. Scrolling down the table shows entries for Medium and Large have similar issues. Do any other columns have similar problems? We can get a sense for this by turning on the Column Header graphs.
- Click the Histogram symbol at the top-left of the table to turn on the Column Header graphs (Figure 7).
Figure 7: Top-left of the table with the Column Header graph button circled in blue.
4) Data blending and cleanup, part 1
Looking through the Column Header graphs, it seems the MgSt-Supplier has some data-entry problems, as well as API-Particle Size. Let’s correct those using Recode.
- Select the API-Particle Size column by clicking on the column name at the top of the column.
- With the column selected, go to the Cols menu and select Recode.
- At the top-left of the Recode window, change the drop-down menu from New Column to In Place.
- Hold the shift key while clicking on the l and Large in the first two rows of the Recode window. Release shift and press the Group button (Figure 8).
Figure 8: Recode window, with Large and l values highlighted, and the Group button highlighted.
- Highlight and group the Medium and m values, and then the Small and s values (see Figure 9).
- Press Recode.
- Optional: Repeat Recode procedure for the column MgSt-Supplier.
Figure 9: Recode window with the three groups for API-Particle Size defined.
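If you'd rather script this cleanup, a simple JSL equivalent uses Match() to map the stray values. This is only a sketch; the script JMP records for Recode will look somewhat different.
dt = Current Data Table();
For Each Row(
    // Map shorthand entries to their full values; leave everything else unchanged
    :Name( "API-Particle Size" ) = Match( :Name( "API-Particle Size" ),
        "l", "Large",
        "m", "Medium",
        "s", "Small",
        :Name( "API-Particle Size" )
    )
);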
Another problem some of you may have noticed is that many of the columns that should contain continuous data appear as categorical. This can be seen in the Columns panel on the left-middle of the table (Figure 10). All columns after and including Blend Time-hr, apart from Compressor and Coating Supplier, should be continuous. We can correct this easily by first changing the format of the numbers and then using the Standardize Attributes feature.
NOTE: The reason the columns are categorical is the period used to denote a decimal place. I’m based in Europe, so Windows is configured to use a comma on my computer. If your system is set up to use a period, the data will have been imported properly and you can skip these steps. Note also that it's possible to use the JMP language, rather than the system locale settings, for number, date and currency formats; this can be set in the "Windows Specific" preferences.
Figure 10: Columns panel, showing that most columns are categorical, with the exceptions of Mill Time-hr and Screen Size.
- From the Edit menu, choose Search --> Find.
- Enter a period symbol in Find what: and a comma symbol in Replace with:.
- Uncheck the Match entire cell value checkbox.
- Press Replace All.
- Close the Search dialog window.
All periods in the cells will have been replaced by commas. Now, we can convert the Data Type and Modeling Type of the affected columns.
- Click on Blend Time-hr in the Columns Panel to select it.
- While holding the shift key, click on Dissolution in the Columns Panel to select all columns.
- Release the shift key, hold the control key, and click on Compressor and Coating Supplier to deselect these columns.
- From the Cols menu, choose Standardize Attributes...
- In the Standardize Columns Attributes window, select Data Type and Modeling Type from the drop-down menu of attributes to be standardized.
- Change Data Type to Numeric and Modeling Type to Continuous.
- Press OK.
The machine sensor columns will now all be numeric and continuous. We’ll change the modeling type of one more column: Screen Size. JMP recognized the numbers and set the modeling type to Continuous by default, but we know that 3, 4 and 5 represent three distinct screens rather than points on a continuous scale. Ordinal would be more appropriate if we wanted to compare Dissolution between sizes to confirm there is a difference.
- Click on the blue triangle in front of Screen Size in the Columns Panel.
- Change the modeling type to Ordinal.
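For those following along in JSL, the cleanup in this section might be sketched as below for one column. The message names are, to the best of my knowledge, the standard ones, but the Action Recorder will capture the exact equivalents for your system.
dt = Current Data Table();
// Mirror the period-to-comma replacement for one affected column (repeat for the others)
For Each Row( :Name( "Blend Time-hr" ) = Substitute( :Name( "Blend Time-hr" ), ".", "," ) );
// Then convert that column's data and modeling types
Column( dt, "Blend Time-hr" ) << Data Type( "Numeric" ) << Set Modeling Type( "Continuous" );
// And make Screen Size ordinal
Column( dt, "Screen Size" ) << Set Modeling Type( "Ordinal" );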
5) Data exploration and visualization, part 2
We can use the Columns Viewer to get a quick snapshot of our data in the form of summary statistics for each column. For the nominal or ordinal columns, we are shown the number of distinct categories. For the continuous columns, we see the range, mean and standard deviation. If these are within our expectations, we should be fine to do some analyses. On the other hand, if they’re way off, we may have some more cleaning to do.
- From the Cols menu, select Columns Viewer.
- In the Columns View Selector window, select all columns and press Show Summary.
Immediately, it appears everything is fine: we have the correct numbers of categories for the nominal and ordinal data, and approximately the expected ranges for the continuous data. However, looking at this suggests we may want to add an Accept/Reject column to indicate accepted or rejected batches (those with Dissolution values above or below 70).
6) Data blending and cleanup, part 2
There are many ways we can go about creating a new column of accepted and rejected batches. If you know a good way that you’re comfortable with, feel free to use it. Or you can follow along with me as we use a new feature introduced in JMP 16, Make Binning Formula…
- Select the Dissolution column.
- From the Cols menu, choose Utilities --> Make Binning Formula…
- Select all the default cut points and press the remove (-) button to remove them (see Figure 11).
Figure 11: Binning dialog window, with all default cut points selected and the remove button (-) indicated by a blue circle.
- Change the value of the cut point to 70 and, if desired, rename the Value Labels to Reject and Accept, as in Figure 12, by double-clicking on the Value Labels and typing "Reject" and "Accept."
Note that it's also possible to change the label style by clicking the red triangle menu beside "Cutpoints" and selecting "Bin Label Style -> Custom."
Figure 12: Binning dialog window, with cut point value changed to 70 and value labels changed to Reject and Accept.
- Press Make Formula Column.
- To rename the new column, double-click on the name at the top of the column, change name to Accept/Reject and press OK.
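If you're scripting along instead, a formula column built with a simple If() gives the same result as Make Binning Formula. A minimal sketch using the 70 cut point from above:
dt = Current Data Table();
dt << New Column( "Accept/Reject",
    Character,
    Nominal,
    // Batches below the lower spec of 70 are scrapped
    Formula( If( :Dissolution < 70, "Reject", "Accept" ) )
);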
Now that we’ve cleaned and formatted our data and added a column indicating whether a batch was accepted or rejected, we’re in a position to start understanding the story behind the data. Specifically, we’re interested in knowing the proportion of accepted and rejected batches and the factors affecting that proportion (as well as the factors that don’t have any influence). Ultimately, we’d like to know how we need to adjust our process to minimize the occurrence of rejected batches.
7) Basic data analysis and modeling, part 1
To start getting answers to these questions, we need an idea of where to look. The process of creating visuals to get a sense of where the answers lie, or maybe even what questions we should be asking, is called exploratory data analysis (EDA).
- From the Analyze menu, select Distribution. In the pop-up dialog window, drag the Accept/Reject column in to the Y, Columns box and click OK (Figure 13).
Figure 13: Distribution dialog window, with column Accept/Reject in the Y, Columns box.
The resulting Distributions report, shown in Figure 14, includes a Frequencies table indicating that there are 14 rejected batches out of a total of 90, corresponding to a proportion of about 15.6%.
Figure 14: Distribution report for column Accept/Reject.
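As an aside for those scripting along, the launch we just clicked through can be expressed in a single line of JSL (a sketch; the Action Recorder will give you the exact recorded form):
Distribution( Nominal Distribution( Column( :Name( "Accept/Reject" ) ) ) );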
We can follow up with histograms of the factors in our table, which may or may not have any effect on the accept/reject ratio. We’ll also include the response column Dissolution.
- From the Analyze menu, select Distribution. This time, drag all columns from API-Particle Size to Dissolution into the Y, Columns box.
- Before clicking OK, check the Histograms Only box at the bottom-left of the dialog window to hide the summary statistics. These can be turned on later using the red triangle menus.
- Click OK.
Note: If you pressed OK before checking Histograms Only, no problem. Simply close the Distribution report and go back to Analyze --> Distribution. Pressing Recall here will set up the dialog the way it was the last time you used Distribution. This is a nice trick to save yourself a few clicks if you ever need to redo an analysis.
The resulting report shows histograms for all the factors plus the response. You can scroll to the right to see how the values of each factor are distributed. For example, the left-most figure shows there were slightly more Medium API particles than Small or Large ones. To the right of this figure is the histogram for Mill Time, for which there is an approximately uniform distribution of values up until around 30 hours. We can learn a lot from simply looking at these figures and considering whether these distributions align with our understanding of the way the batches were run.
Like all visuals created in JMP, these are dynamically linked to the data table. We will use this property to drill down into the images further to uncover yet more about our process.
- Click on the green histogram-rectangle beside Reject in the Accept/Reject distribution report, depicted in Figure 14.
- Notice that all rows of rejected batches in the data table are now selected.
- The corresponding values of the factors are also made visible within the distribution report as embedded histograms (dark green), as seen in Figure 15.
Figure 15: Left-most three histograms of the Distribution report, which indicate embedded histograms for rejected batches.
The embedded histograms represent the subset of data comprising the selected rows. Contrasting these with the histograms for the overall data (light green in the figures) can signal significance when there are drastic differences. The contrary is also true: embedded and overall histograms with the same shape suggest the factor has no effect on the response – in this case, whether a batch is accepted or rejected. Notice that for Mill Time, the overall distribution is uniform, while it is right-skewed for the rejected batches. Looking over at Screen Size, Size 5 seems overrepresented among rejections, while Sizes 3 and 4 appear underrepresented. This is curious and merits a closer inspection of exactly how these factors relate to Dissolution.
Distributions is a powerful tool that can provide clues as to where we need to look to find answers with just a few clicks. Feel free to explore further to see if anything else jumps out that may warrant closer inspection.
8) Data exploration and visualization, part 3
Now that we’ve uncovered two factors that may be affecting our process, we can drill down into them even further using Graph Builder.
- From the Graph menu, choose Graph Builder.
- In the Graph Builder window, drag the Mill Time-hr column and drop it in the rectangular area marked with an X (middle-bottom).
- Scroll down the columns to Dissolution. Drag and drop it into the Y area.
The result is a scatter plot of Dissolution vs Mill Time-hr with a blue spline snaking through the data. This spline is meant to help guide viewers to the shape of the data. Rather than use this, let’s change the smoother to a Line of Fit.
- Right-click in the middle of the graph.
- Choose Smoother --> Change to --> Line of Fit. (Note: We could have done this using the boxes just above the graph title as well.)
- Press Done.
We have now produced the graph as seen below in Figure 16. Looking at the figure, it definitely seems there is a positive correlation between Mill Time and Dissolution in our historical batches. Is this statistically significant (another way of asking how likely it is that the effect is real and not just a fluke)?
Figure 16: Dissolution vs Mill time-hr with a Line of Fit depicting a positive correlation between the variables.
We’ll attempt to answer that question in the next section. In the meantime, you may recall that Screen Size also emerged as a factor of interest when exploring the histograms. Let’s add Screen Size to this graph to see if we can learn something more about how it may influence Dissolution.
- From the red triangle beside Graph Builder, choose Show Control Panel.
- Drag (but don’t drop!) Screen Size over to the Group X, Group Y, Wrap, Overlay and Color areas of the Graph Builder window. Doing so will provide a preview of what the graph would look like if the column were dropped in that drop zone.
- After exploring the various drop zones with Screen Size, drop the column into the Group X drop zone.
- Press Done.
Graph Builder lets columns be dragged around, producing various other graphs, until we stumble upon one that uncovers something useful about our data. By dropping Screen Size into Group X, we’ve created three scatter plots side-by-side, each with its own Line of Fit (Figure 17). From the figure, we can see clearly that the positive correlation between Dissolution and Mill Time-hr depends on the screen size used; there was a strong effect when Sizes 3 or 5 were used, but no effect when using Size 4. The suggestion here is that an interaction effect exists between these two factors and our response. We will investigate that further when we reach the modeling phase of our analysis, but for now, let’s take a closer look at these variables individually.
Figure 17: Dissolution vs Mill time-hr for all three values of Screen Size, with Lines of Fit depicting positive correlations for Sizes 3 and 5, but no correlation for Size 4.
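For reference, here is a JSL sketch that reproduces this grouped graph (drop the Group X line to get Figure 16 instead); the element names are as I believe Graph Builder records them:
Graph Builder(
    Variables(
        X( :Name( "Mill Time-hr" ) ),
        Y( :Dissolution ),
        Group X( :Screen Size )  // remove this line to reproduce Figure 16
    ),
    Elements( Points( X, Y ), Line Of Fit( X, Y ) )
);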
9) Basic data analysis and modeling, part 2
Is the effect of Mill Time on Dissolution statistically significant? How confident can we be that increasing Mill Time really does lead to an increase in Dissolution? What about the lower dissolution that was measured for batches run with Screen Size 5? Was that just an accident? If we continue using Size 5, are we likely to continue seeing lower-quality batches than if we used Sizes 3 or 4? These are the sorts of questions we are going to answer in this part of the analysis using the Fit Y by X platform.
- Select Fit Y by X from the Analyze menu (second from the top, after Distribution).
- In the dialog window, drag and drop Dissolution into the Y, Response column box.
- Next, drag and drop both Mill Time-hr and Screen Size to the X, Factor column box.
- Press OK.
The first thing you may notice is that the two resulting plots, apart from looking very different, also have different names: the plot on the left is called Bivariate Fit of…, while the one on the right is called Oneway Analysis of…. This is because they were made using different types of data, for which certain analyses may or may not apply. In the case of Mill Time-hr, continuous data is plotted on the x-axis against the continuous Dissolution data on the y-axis, so it makes sense to fit a regression line that describes how Dissolution changes as a function of Mill Time. We can then assess the likelihood that this relationship is real, rather than an aberration in the data, by checking whether the slope of our fitted line is significantly different from 0 (zero being the slope of a flat line plotted at the mean of our response). On the other hand, a continuous line of fit between Screen Sizes 3 and 4 is nonsensical if there are no sizes between 3 and 4. In this case study, Screen Size is a discrete variable. A reasonable question to ask is: are there real differences in our response when comparing values made using one size with values made using another? We’re interested in analyses that investigate these comparisons rather than calculate slopes between the sizes.
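Scripted, Fit Y by X therefore becomes two separate launches, sketched below with the options we're about to turn on (Fit Line for the regression, and Each Pair( 1 ), which I believe is the recorded form of the Student's t comparison):
// Continuous X: regression with a fitted line
Bivariate( Y( :Dissolution ), X( :Name( "Mill Time-hr" ) ), Fit Line() );
// Ordinal X: means comparison, Each Pair Student's t
Oneway( Y( :Dissolution ), X( :Screen Size ), Each Pair( 1 ) );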
Let’s start with the Bivariate Fit on the left:
- From the red triangle of the Bivariate Fit figure, choose Fit Line.
The resulting plot and report are shown below in Figure 18. There is lots of information about the line and how well it fits the data. It isn’t necessary to understand every aspect of this report to answer our questions, much like it isn’t necessary to read all the details of a weather report to know whether to take an umbrella. If anyone is interested, a description of all of this can be found in the JMP Online Documentation. For now, let’s just extract what we need to answer the questions we had about our process.
Figure 18: Bivariate Fit of Dissolution vs Mill time-hr, with a linear fit and accompanying details indicating that the effect is likely real.
We can see that the line in the scatter plot is sloped upward, indicating that higher dissolution values were achieved using longer mill times, and lower ones using shorter times. The equation of this line can be found just under the Linear Fit, with the slope having a value of about 0.165 %/hr. Is this significant? The estimate for the slope is repeated in the Parameter Estimates section of the report, along with its associated p-value (0.0004), which means there is a four-in-10,000 chance of seeing a slope that steep simply by chance. It is, of course, possible that Mill Time has no real effect on Dissolution, but it’s a much safer bet that the effect is real.
Let’s now turn to the red triangle menu beside the Oneway Analysis plot on the right. You’ll notice immediately there is no option to fit a line. As discussed, it makes no sense to fit a continuous line through discrete data. We do, however, have many options for comparing these batches. Let’s use a very common approach, the t-test.
- From the Oneway Analysis red triangle menu, select Compare Means -> Each Pair, Student’s t.
That’s it, t-test done. Of course, it now falls on us to interpret this report (Figure 19). Once again, there are many details here, but we only need very few to determine if these screen sizes are equivalent.
Figure 19: Oneway Analysis of Dissolution vs Screen Size, with a means comparison done using Student’s t-test. The results suggest a significant negative effect of using Screen Size 5.
Looking at the Connecting Letters Report, it states that levels not connected by the same letter are significantly different. The levels corresponding to Screen Sizes 3 and 4 both have the letter A beside them, whereas the letter B is shown beside Size 5. The implication, therefore, is that Sizes 3 and 4 are equivalent to each other, but not to Size 5. The probability of each difference occurring by chance is expressed in the p-Values of the Ordered Differences Report. It is quite safe to conclude the differences between Sizes 3 and 5 and between Sizes 4 and 5 are real (p-values of <0.0001 and 0.0017, respectively). Conversely, the comparison between Sizes 3 and 4 is less convincing (a 5.79% probability that it occurred by chance, which may sound small but is above the usually accepted burden of proof of 5%).
If we aren’t convinced, we can go ahead and collect more data to be sure. Alternatively, if we immediately stop using Screen Size 5 and crank Mill Time up to 30 hours, we should begin to see an improvement in Dissolution values. We’re also going to continue along the analytic workflow to Advanced Statistical Modeling, first to identify likely significant factors that we may have missed during our EDA. We also want to build a statistical model using all of these factors to uncover any underlying interactions between them and, ideally, find a sweet spot in our process where we can reliably and consistently produce the highest-quality batches possible while reducing the occurrence of rejections to a minimum.
10) Advanced statistical modeling
Let’s start with the first question: are there any factors we might have missed that we should be taking a closer look at in addition to Mill Time and Screen Size?
- From the Analyze menu, choose Predictive Modeling --> Partition.
- Select Dissolution from the list of columns in the Partition dialog window and press Y, Response.
- Select all columns from API-Particle Size to Atom. Pressure-Pa and press X, Factor.
- Select OK.
- In the new Partition for Dissolution report window that popped up, select Split.
Once again, there is lots of information in this report, but we don’t need all of it to recognize that Screen Size came up as the most significant variable. No huge surprise here, since we had already noticed it during the EDA. Notice that the report now contains two boxes that say Screen Size(3, 4) and Screen Size(5). The partition algorithm identified a cut point within this significant factor that divides, or partitions, the data to capture as much of the variation in our process as possible. Apparently, it’s come to the same conclusion we did when we explored this factor using Fit Y by X.
Let’s keep going:
- Select Split again.
Mill Time-hr has emerged as our second-most important factor in our process. Once again, not a huge surprise. But we’ve uncovered another cut point (11 hours). Also, you may have noticed that the Mill Time-hr boxes are rooted in the Screen Size(5) box. This means that the cut point only applies when using Size 5, and therefore gives us a spec limit if we’re ever stuck using it. Size 5 isn’t ideal, but if we must use it, we should at least make sure we mill for 11 hours or longer. That’ll give us our best shot at making half-decent batches.
Ha! Blend Time-hr wasn’t something we noticed before.
- Select Split again and again, until leaves are no longer added to our decision tree. (You should be able to do 11 more splits.)
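For scripters, a sketch of the launch and the repeated splitting is below. The factor list is abbreviated, and Split Best is the platform message I believe corresponds to the Split button; as always, the Action Recorder is the authoritative source.
prt = Partition(
    Y( :Dissolution ),
    X(
        :Name( "API-Particle Size" ),
        :Name( "Mill Time-hr" ),
        :Screen Size,
        :Name( "Blend Time-hr" )
        // ...plus the remaining factor columns, through Atom. Pressure-Pa
    )
);
// Repeat Split Best until no further leaves are added (about 13 splits in this example)
For( i = 1, i <= 13, i++, prt << Split Best );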
Well, a number of factors have now been added to the tree. In fact, a few factors have been added multiple times. Each leaf in this tree is a recipe, with factors in the leaves above (parent leaves, closer to the trunk) set above or below, as applicable, and all other factors set to their mean (or assigned randomly in the case of the categorical variables). The predicted mean value of dissolution produced with this recipe is given within the leaf. One thing we could do is look through all the leaves to find the best recipe, that is, the recipe with the highest predicted mean. Take a minute to see if you can find it… Hint: it’s near the center-bottom of the tree: Screen Size (3 or 4), Blend Time (≥15.9 hours), Lactose Supplier (Bond Inc.) and Coating Viscosity (≥99.3 cp), with all other factors set to their mean values. As you can see, it’s predicted to produce batches with a mean dissolution of about 79.
We could try running that recipe and monitoring the resulting batches to see if our model is accurate and if the dissolution values of the resulting batches are close to our predicted value of 79. It’s perfectly fine to stop here to check if this works, if producing batches of this quality solves our problem. In that case, we can move on to whatever problem needs solving next. However, in this scenario, this isn’t good enough. We’re going to need to find a way to get higher, which means squeezing even more information out of this data to try to wring out our solution.
- From the red triangle menu for Partition for Dissolution, choose Column Contributions.
This action adds the Column Contributions report, shown in Figure 20, to our Partition window. Starting at the top, the report tells us, in descending order, the significance of each of our factors. It appears as though Screen Size is the most significant factor, followed by Mill Time, Blend Time, Blend Speed, and so on. Reading further down, we eventually reach factors (starting with API-Particle Size and below) that don’t appear in any of the leaves and, therefore, don’t seem to have any effect on Dissolution.
Figure 20: Column Contributions from the Partition for Dissolution report, showing the significance of the factors in descending order. Spray Rate is seen to be the last significant factor; everything below it is considered not to have any effect on Dissolution.
We can therefore focus our efforts to increase Dissolution on the eight significant factors, and we can largely ignore the remaining nine factors that aren’t.
- From the Column Contributions report, select the eight significant factors (Screen Size to Spray Rate). Notice that these columns are now selected in the table.
- From the Analyze menu, choose Fit Model. The Fit Model dialog window now appears with the columns already selected (Figure 21).
Figure 21: Fit Model dialog window, with columns of significant factors selected.
- Press the Macros button in the Fit Model dialog window and select Response Surface.
- Finally, select the Dissolution column and press Y (button at top-middle of dialog window).
- Press Run.
Note: A response surface model includes all main effects (in this case, our eight significant factors), all two-way interactions, and all squared terms. It’s been shown many times that this type of model is an excellent starting point for tackling optimization problems, trying to find the sweet spot among our process settings. Barring some model refinements, which we’ll discuss below, it’s usually a good-enough ending point as well, meaning it captures enough information about our process to solve whatever problem we were trying to solve.
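To give a flavor of what the scripted launch looks like, here is a deliberately reduced two-factor sketch (the real model uses all eight factors); the & RS notation marks response-surface effects, with the interaction and quadratic terms written out:
// Two-factor response surface sketch; extend the Effects list for all eight factors
Fit Model(
    Y( :Dissolution ),
    Effects(
        :Name( "Mill Time-hr" ) & RS,
        :Name( "Blend Time-hr" ) & RS,
        :Name( "Mill Time-hr" ) * :Name( "Blend Time-hr" ),
        :Name( "Mill Time-hr" ) * :Name( "Mill Time-hr" ),
        :Name( "Blend Time-hr" ) * :Name( "Blend Time-hr" )
    ),
    Personality( "Standard Least Squares" ),
    Run
);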
A Response Dissolution report window pops up. Once again, there is lots of information, all of it explained in detail in the JMP Online Documentation, but most of it is not required knowledge for solving our problem.
We can start examining this report from the top and work our way down, skipping over things that aren’t of particular interest. The first thing we encounter is the Actual by Predicted Plot, replicated below in Figure 22. What’s important here is that the points mostly fall along the dark red straight line. They’ll never all fall exactly along the line; there will always be some variation in our process that is not captured by the model. The question then becomes: how much of the variation is captured by the model? The RSq value just below the plot provides that information: 0.88 (or 88%).
Figure 22: Actual by Predicted Plot for a full response surface model with eight main effects, capturing 88% of the variation in Dissolution (RSq = 0.88).
Below the Actual by Predicted plot is the Effects Summary (Figure 23), which lists all the effects we included in our model in decreasing order of significance. Of course, this is all based on the historical data contained within the table. The right-most column, labeled PValue, gives, for each effect, the probability of seeing an influence on Dissolution at least this large if the effect actually had none. For Screen Size, that probability is 0.00000 when rounded to five decimals: it is practically a mathematical certainty that the size of the influence we’re seeing isn’t due to random chance. We can therefore conclude that Screen Size really does impact the values of Dissolution.
Figure 23: Effects Summary for a full response surface model with eight main effects.
All the way at the other extreme is the interaction effect between Lactose Supplier and Blend Speed, the effect at the very bottom of the list. With a p-value of roughly 0.959, far above the accepted cutoff of 0.05, we can safely conclude that this effect doesn’t have any influence on Dissolution. Let’s go ahead and remove it from the model.
- Click on "Lactose-Suppliers*Blend Speed-rpm" in the Effects Summary table and press the Remove button just below it.
Did you notice that all the p-values, and everything else in the report, changed slightly? Like many of the reports in JMP, changes lead to all values being immediately recalculated, which allows us to refine the model by removing all non-significant effects without having to restart the analysis.
- One by one, remove the least significant effect (the one at the bottom of the Effects Summary report) until all remaining effects have p-values below 0.05, as in Figure 24.
Figure 24: Effect Summary for a refined RSM model containing only effects with p-values below 0.05.
Note: Don’t remove the effects that have a ^ symbol to the right of their p-value. The symbol is meant to make you aware that the effect is contained within other, more significant effects. As a rule, we leave those in the model, only removing them if and after all containing effects have first been removed.
The RSq value below the Actual by Predicted Plot decreased to 0.83, meaning less variation in our response is captured by this new, refined model. That’s ok. The original RSq of 0.88 was misleadingly high because of the inclusion of the non-significant terms that fit random noise and not our actual process. Removing high p-value factors from a model naturally leads to a lower RSq but is more likely to result in more accurate predictions for future batches.
So, is being able to explain 83% of the variation in our process good, or is it entirely inadequate? Let’s remind ourselves of the objective of this entire exercise: we are trying to solve the problem of how to reliably produce batches with Dissolutions above 70 to reduce the scrap rate from 15% to below 5%. Once we’ve understood enough to figure out how to do that, we’re done and can move on to the next problem. If this model allows us to do that, then it’s good enough. We can find out by scrolling down to the very end of the Fit Least Squares report to the Prediction Profiler, replicated below in Figure 25.
Figure 25: Prediction Profiler for our refined RSM model.
The Prediction Profiler is a graphical representation of the model and can be used to see the predicted result of a combination of settings. For example, in Figure 25, the mean of batches produced using those settings (Mill Time of 17.011 hours, Screen Size 3, etc.) is predicted to be 77.49%. We can also see from the profiles that increasing Mill Time, Blend Time, and Coating Viscosity, while decreasing Spray Rate, will, according to the model, lead to batches with even higher Dissolutions.
- Click on the vertical red dashed lines within the profiles and drag them right or left.
- Continue to manually move the lines to find the setting that the model predicts will achieve the highest Dissolution.
In addition to finding the optimal settings manually, we can use the Prediction Profiler’s Maximize Desirability feature:
- Click on the red triangle menu for the Prediction Profiler.
- Select Optimization and Desirability --> Desirability Functions to turn the desirability traces on (if they aren’t shown already).
- Go back to the red triangle menu and select Optimization and Desirability --> Maximize Desirability.
After maximizing the desirability, the resulting optimal settings are shown (Figure 26).
Figure 26: Prediction Profiler with Desirability Functions enabled, with factors at their optimum settings resulting in the highest predicted value of Dissolution, 86.5% (y-axis adjusted from default).
The predicted value is comfortably above our 70% lower spec. Of course, the real test will come when we try these settings in our factory.
11) Predictive modeling and machine learning (JMP Pro)
In JMP Pro, model deployment is simplified with the Formula Depot, from which we can easily generate scoring code to put our model into production.
- From the red triangle menu beside Response Dissolution at the top of the Fit Least Squares report, choose Save Columns --> Publish Prediction Formula.
- In the Formula Depot report, click on the Fit Least Squares – Dissolution red triangle menu and select Generate… (see Figure 27).
Note: The Formula Depot can be used to generate C++, Python, JavaScript, SAS or SQL.
Figure 27: Formula Depot with the Fit Least Squares model for Dissolution (left), with automatically generated Python code (right).
12) Automation and scripting
It’s now time to save the script for importing the table into JMP and getting it into the right format. This will make it much easier to repeat when new data becomes available. We won’t worry about saving instructions for all the analysis we just did, since we’re unlikely to have to redo it exactly as before. Eventually, we are going to want to monitor the effect of implementing the new recipe on future batches, as well as to make sure everything remains stable and the process isn’t drifting. For right now, let’s just capture the Data Table Operations from the Action Recorder within the log.
- From the Log window, click on the red triangle menu in the top left corner and select Action Recorder.
- Select Platform Launch to deselect this option.
- Repeat to deselect Report Snapshot on Close as well, so that only Data Table Operations appears selected, as in Figure 28.
Figure 28: Log window, showing the Action Recording options within the red triangle menu, with only Data Table Operations selected.
- Click on the red triangle menu in the Log window again. Select Save Script --> To Script Window.
- From the script window, select File --> Save As. Name the script Tablet Data Import and save it in a convenient spot on your desktop.
Now that we’ve saved the script, you can close the JMP data table, including all open reports. We are left with our original Excel table, as well as all instructions needed to import it into JMP and get it ready for analysis. We’ve also submitted the instructions for our factory to start producing products using the new recipe. Let’s clear the log, and then take a break while we wait for new data to become available.
- From the red triangle menu in the Log window, select Clear Log.
13) Quality and process engineering
Imagine that some time has passed and a number of runs using our new and improved process recipe have been completed. These new runs have been added as new rows in the Excel workbook; this may have been done automatically, or the rows may have been added manually and the file resaved. The new process is called B to differentiate it from the old process, known as A.
- Save the Tablet Data updated.xlsx file in the same location as the previously-used Tablet Data.xlsx file.
- Delete Tablet Data.xlsx.
- Rename the new file Tablet Data.xlsx.
- Open the Tablet Data Import.jsl script file saved in Section 12 and press the Run button (Figure 29).
Figure 29: Script window with the Run button near the top of the window circled in blue.
Notice that formatting changes we made to our original JMP table have been automated in the importing of this table, including recoding columns API-Particle Size and MgSt-Supplier, changing the modeling type of Screen Size to ordinal, using a comma as a decimal symbol and converting the affected columns to numeric, and adding the Accept/Reject column at the end. Notice also that 25 runs have been added to the table, corresponding to our new runs, as well as a column indicating the recipe for each batch. Briefly inspecting the table, it appears the importing and formatting script successfully accommodated these additions. This is good news because, as new data continues to become available, we will still be able to rely on this script.
Scrolling to the bottom of the table, the new runs have been indicated in the Recipe column as B. From the Accept/Reject column (last, at the far right), we can see that all new batches have been classified as Accept. This, of course, is also excellent news. You may have noticed that the recipe setpoints don’t match what’s in the table. For example, the model determined the optimal setting of Mill Time to be 25.7 hours, but the values in the table vary somewhat from this setting. This is because the table contains the actual values as measured during each batch, which don’t always correspond exactly to the setpoint. Tightening these values through better control of the process variables may be a future avenue of improvement for our process.
In the meantime, we can use a Control Chart to see how our process has changed after we implemented the new recipe. We can also use it to track and monitor the process and alert us to any gradual drifting or sudden changes that require intervention.
- From the Analyze menu, choose Quality and Process --> Control Chart Builder.
- In the Control Chart Builder window, scroll down to the Dissolution column in the columns panel. Click on Dissolution and drag it to the Y drop zone.
- Drag the Recipe column to the Phase drop zone at the top of the control chart.
- Select Done.
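The scripted form of this launch is compact. A sketch is below; assigning the result to a name (ccb, anticipating Figure 32) will come in handy when we publish the report in Section 14.
ccb = Control Chart Builder(
    Variables( Y( :Dissolution ), Phase( :Recipe ) )
);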
We now have two Individual & Moving Range charts side by side, with the chart for Recipe A on the left of that for Recipe B, as seen in Figure 30. The first thing we notice is that the points for Recipe B are higher, corresponding to their higher dissolution values. The solid green lines through the middle of the charts designate the means of each process, while the solid red lines above and below the mean are the upper and lower control limits, respectively. Their calculated values are provided in the Dissolution Limit Summaries, just to the right of the control chart.
Figure 30: Individual & Moving Range chart for both recipes, A (left) and B (right), with Limit Summaries providing the calculated means (Avg) and lower/upper control limits (LCL/UCL).
There is more variation in the batches made using Recipe B, the new one, than using the old one, Recipe A. We can tell this because the points themselves are spread across a wider range of Dissolution values. We also note that the red lines are further apart, indicating a larger standard deviation. However, as we noted, looking at the Accept/Reject column, all Recipe B batches are acceptable, which is our primary goal. A mean of 86.76% is quite a bit higher than 72.86%, the mean of our original recipe; 86.76% is also remarkably close to the 86.5% value predicted by our model.
Despite the large standard deviation leading to wide control limits, the process appears at first glance to be stable. Having said that, it may be that Recipe B is trending upward. One option, available in the Quality and Process menu, would be to keep an eye on it using a CUSUM chart to determine as early as possible whether the trend is real.
A simple test for stability, and an indication that something may have changed, would be for one of the points to go beyond the red lines. There are other tests as well, which can be activated to serve as alerts for process monitoring.
- Right-click in the middle of the Individual & Moving Range chart.
- Select Warnings --> Tests --> All Tests.
One warning appears in A (hover over the warning to see the details), while none appear in B. However, we can – and absolutely should – update the control chart to monitor for changes as new data becomes available and do this regularly for as long as the product is being produced in our factory.
- From the red triangle menu beside Control Chart Builder, select Save Script --> To Clipboard.
- Go to the Tablet Data Import script and paste (right-click --> Paste, or ctrl-v) after the last line of the script.
- Save the updated script (File --> Save).
We can now close the control chart and table without saving. Running the script will re-import the table and create the control chart. As the work needed to update the analysis has now been reduced to a single button-push, it is little effort to repeat this regularly and as often as needed. It can even be automated through the Task Scheduler on Windows or iCal on Macs (or any of the other myriad free tools for automated scheduling). But what if our colleagues responsible for monitoring the process aren’t JMP-savvy? How can we keep them up to date and informed in real time?
14) Sharing and communicating results
JMP Live is a tool for scientists and engineers to access JMP reports through a secure web connection without needing to use JMP. It is even possible to enable warning emails to be sent out in the event of a process alarm being triggered within a control chart. The entire procedure of importing the data, reproducing the analysis, publishing the report to JMP Live (a control chart, in our case) and sending out email warnings in the case of an out-of-control process can be easily automated for periodic updates.
JMP Live, which allows for confidential or sensitive reports to be uploaded to a secure server and shared within closed circles, is a separate product within the JMP family. JMP Public is a public version of JMP Live. Anyone with a web browser can view the reports (available at public.jmp.com), while anyone with JMP 15 or higher can publish freely to JMP Public.
Let’s begin by publishing our report to JMP Live:
Note: If you don’t have access to JMP Live, you can publish to JMP Public. Alternatively, you can simply read the steps and try following along later. If this is your first time publishing to JMP Live or JMP Public, begin by establishing a connection (File --> Publish --> Manage Connections…). Details can be found in this blog post, as well as a how-to on automatic publishing.
- From the Control Chart Builder report window, select File --> Publish --> Publish to JMP Live…
- Select the report to be published and press Next.
- Ensure the Enable Warnings checkbox is checked, indicating emails will be sent out in the event of an activated control chart warning (see Figure 31).
- Select the group (previously saved list of emails) that will be allowed access to the report.
- Press Publish.
Figure 31: JMP Live new post configuration window, with the Enable Warnings box checked, for Control Chart warning emails. Toward the bottom, we can see that access to the report will be limited to the Engineering group.
A message indicating the report was successfully published is displayed upon completion of these steps.
Note: For the purpose of this blog, a replica of the report has been published in JMP Public.
Now that the report has been published once, updating the online report as new data becomes available is simple. As mentioned above, details and helpful tips can be found here. In short, three pieces of information are needed:
- The name of the report in JMP.
- The name of the JMP Live connection.
- The report identifier for our published report in JMP Live.
The name of the report (in our case, Control Chart Builder report) is the name that was used to reference the object in JSL. Back in Section 13 when we copied the script and pasted it into our JSL window, we didn’t name it anything. Let’s go back and do that now.
- In your JSL script window, just before the Control Chart Builder function that was used to create the control chart, add a unique name followed by an equal sign (see Figure 32).
Figure 32: JSL script window with the Control Chart Builder, now referenced by the name ccb.
Next, the name of the connection to JMP Live can be found using the Manage Connections… feature of the File --> Publish menu. It is indicated the first time a connection to JMP Live is established, as detailed in the JMP Live admin tricks: Automating Publishing blog post.
Finally, the report identifier is the code after the last forward slash (/) in the web address of the published report. For example, in the report seen in JMP Public, it corresponds to the code circled in blue in Figure 33.
Figure 33: JMP Public report with the report identifier code circled in blue.
With that information, all that is left to do is to add the following four lines of code to your script:
liveconnection = New JMP Live( Connection( *JMP Live Connection Name* ) );
webreport = New Web Report();
webreport << Add Report( *JMP report name* );
jmpliveresult = liveconnection << Replace( webreport, ID( *report identifier* ) );
The lines can be copied into your script exactly as they appear above, with the placeholder names bookended by asterisks (*) replaced with their actual names.
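For example, combined with the ccb name from Figure 32, the finished lines might read as in the sketch below, where the connection name and report identifier are hypothetical placeholders rather than real values:
ccb = Control Chart Builder( Variables( Y( :Dissolution ), Phase( :Recipe ) ) );
liveconnection = New JMP Live( Connection( "MyCompanyLive" ) );           // hypothetical connection name
webreport = New Web Report();
webreport << Add Report( ccb );
jmpliveresult = liveconnection << Replace( webreport, ID( "aBcDeFgH" ) ); // hypothetical report identifier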
Final Remarks
At this point, the process is operating within spec, and automated monitoring with email alerts has been set up. We’re finally ready to move on to the next problem, with all its unmet challenges and unanswered questions offering new opportunities to improve performance, speed development, reduce cost or increase efficiency, perhaps all at once. The paths toward these will require us to take new routes through the JMP Analytic Workflow, uncovering new insights along the way.
But, what of the problem we just solved? Can we further increase yield? Is there a way to push Dissolution even higher? What if a machine breaks or a supplier changes, how will that affect our process? How should we respond when the control chart email alerts we set up through JMP Live get triggered?
Real value comes from viewing the JMP Analytic Workflow as a guide along a continuous cycle of improvement, rather than a one-way journey that begins at a problem and ends at a solution. Each round of the cycle incrementally improves performance, leading to better products or processes, strengthening our business.
To learn more about how the JMP Analytic Workflow can help you meet your challenges, visit our website.