Saturday, March 4, 2023
Do you or your colleagues ever wonder if JMP can do a particular analysis? It probably can, and now in JMP 17 there is a new way to find out how. JMP Search is a new help capability that can guide you to JMP features and other resources like the JMP Community. Whatever your experience with JMP, it is designed to help find new-to-you features or re-find features you have used but forgotten. You will learn to use JMP Search for data manipulation, statistical analyses, and visualizations. And we will touch on some of the underlying technology behind JMP Search and how you can use it yourself.

Hi, my name is Evan McCorkle. I'm a developer with JMP, and I'm here today to talk to you about a new feature in JMP version 17 called JMP Search. It's available under the Help menu as Search JMP, and there's an accelerator key, Ctrl+Comma. The idea behind this feature is to help you, whether you're a new user of JMP or an experienced user, find features within JMP that will help you get your job done.

I'm going to go through this demo using one of our sample data tables you may have seen before, Big Class. We're not going to focus on the actual statistical analysis; I'm just going to focus on using JMP Search to find different features within JMP to use on that table.

I can start by opening JMP Search, and I get a dialog here. I'm going to type Big Class. I misspelled it, but that's okay, it knows what I meant. I see results here and a details pane over here. I'm going to click on that and open the data table. From here, we can do a lot of data table cleanup and manipulation. For instance, what if I wanted to exclude all the men in this table? I don't quite remember where this feature is, but it's something about finding matches. I can bring search up with the accelerator key, Ctrl+Comma, type "matches," and look through the results. Under the Rows red triangle, Row Selection, I see Select Matching Cells. I think that's what I want. I could learn more by looking at the topic help here, but I think I just want to run it. I could run it through the Show Me button there, or I can just double-click on this item, and I see a guided path down to that item, just as if I had opened the Rows red triangle menu myself.

I've selected all the men, and now I can come over here and hide and exclude them. Now let's say I'm going to expand this table from 40 rows to millions of rows. I might want to turn compression on to do that. I can open search and type "compression," and I see a couple of options here: one under the red triangle for the table, Compress File When Saved, and then under Utilities, Compress Selected Columns. Let's do the one under Utilities, Compress Selected Columns. If we look at the result, we see it has turned age, my selected column, into a 1-byte integer compressed column.
We don't need to reopen the search, because I remember that under this red triangle there's that compression option. I'm going to go ahead and bring these back. From the data table we can do other things like splitting, stacking, joining, recoding column names, and so on. But of course JMP is more than just data tables; we also have statistical analyses and statistical platforms. Let's look at some of that.

I can come in here and type "Anova." There are a couple of options, but I want to look at the first two. One is a tutorial, and I might want to go through that a little later, but not now. Under Analyze, Fit Y by X, under Oneway, there's Means/Anova. If I look at this diamond here, I can see that when I hit Go, it's going to launch the platform launch dialog and ask me to put columns into Y and X, turn knobs, and flip switches, and depending on the data used and the options chosen, this Means/Anova option may or may not be available to me. In particular, if we look here, it says this option is only available when X has more than two levels, but I think that's going to be okay. Let's hit Go. It brings up the Fit Y by X dialog; I want to do height by age. We see Oneway here, which is just what I wanted. I click OK, and age has more than two levels, so I can open the search again and search for that option. I see it's right under the red triangle, second one down, and we can turn that on.

Then I want to do a letters report. I can't really remember: connecting, connected... something about letters. Let's look at that. I see a couple of options with Fit Model, and I see one under Oneway, and I'm already in Oneway. It would be great if it's appropriate to use Oneway for this. It's not available to me right now, but I see it's under Oneway Means Comparisons, and I see lots of different techniques in here, like Student's t and Tukey. Let's look at that under Compare Means, Student's t. I'm thinking that if I do this, and then from here we look at "letters" again, we can see under the outline for Oneway, and then under the outline for Means Comparisons, we have a red triangle, and Connecting Letters Report is actually already on. If we go down, we can see it under this Means Comparisons outline. I've already got what I wanted.

We've talked about data tables and some statistical platforms and analyses. Now let's talk about some visualization. I go back to the table, and I'm actually going to go down and look at this Fit Polynomial under Bivariate. I want to do height by weight, so I bring up Bivariate here and search for Quadratic. I see this red triangle entry, and this is another red triangle entry, but I'm going to look at this one and do a quadratic fit. I'm not quite sure about that fit, but I know I don't like the red, so let's change it to something else. Let's change it to blue.
Under that red triangle there, we can change the line color; I'm going to change it to a blue. Now I want a little more in this visualization. Let's look at the Nonparametric Density and turn that on. Those are a couple of options for visualization in this frame within Bivariate, but search also works in other platforms and other situations within JMP.

To go back to the table here, I want to call a couple of things out. When I showed this, I typed "matches," but we got results for "matching" and "matched," and that's because we are doing stemming on the search query and on all the content within JMP. This search will work no matter what your display language is within JMP. It'll work in English, French, Italian, German, Spanish, Japanese, Korean, and Simplified Chinese. Whatever your display language is in JMP, you can search in that language, get results that are localized for you, both in the results list and in the details pane, and navigate the same way I did in English.

The technology we're using to turn "matches" into results for "matching" and "matched" is also available for you to use on your own data tables through Analyze, Text Explorer. Within Text Explorer, if you have a bunch of text data to look at, you can tell it what language the column is in, what stemming you want, and what tokenization you want, and JMP will do that same collapsing of different conjugations of words into a single form. That same technology is what we're using within JMP Search to make it work in whatever display language you happen to use.

With that, we've gone through JMP Search for data tables, statistical tests, and visualization. Again, I hope that JMP Search will help you, whether you're a new user or an experienced user, either find new things or re-find things that you knew about but maybe forgot where they were, and use JMP to get your job done quickly and easily. Again, that's JMP Search under Help, Search JMP, and it's available in JMP version 17. Thank you very much.
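As a minimal illustration of the stemming idea described above (why typing "matches" also finds "matching" and "matched"), here is a short Python sketch using NLTK's SnowballStemmer. This is a stand-in for the concept only; it is not the engine that JMP Search or Text Explorer actually uses.

```python
# Stemming collapses different word forms to a common stem, so a query
# for "matches" can hit content that says "matching" or "matched".
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")
query = "matches"
content_terms = ["matching", "matched", "match", "mismatch"]

query_stem = stemmer.stem(query)                              # "match"
hits = [t for t in content_terms if stemmer.stem(t) == query_stem]
print(query_stem, hits)                                       # match ['matching', 'matched', 'match']
```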
This presentation is an extension of a case study presented at a Discovery conference a few years ago, where a client’s protocols required a Gauge R&R study to be performed before running a functional response designed experiment. As in a standard Gauge R&R study, there were several Parts, several Operators, and several replicates per combination of Part and Operator. However, in this case, the test equipment returned a set of curves as the response instead of a single point. A functional random effects model is appropriate for this type of data. In this application, the functional model is expanded using basis splines and then expressed as a mixed model, where variance components can be estimated using standard methods. This is done using the Functional Data Explorer and Fit Mixed platforms. Due to the functional model expansion, multiple variance components may be associated with each of the Part, Operator, and Part*Operator terms. It is shown that these variance components can be summed and written in the form of a standard Gauge R&R computation, therefore providing a Functional Gauge R&R analysis.

Hi, my name is Colleen McKendry, and I am a senior statistical writer at JMP, but I also like to experiment with functional data. This presentation is an extension of, and was inspired by, a presentation that was originally given in 2020 titled Measurement Systems Analysis for Curve Data. There was also a slightly earlier presentation at a JSM conference in 2019. My talk is essentially how I would go about solving the problem that was presented in those original papers. I'll discuss them a little bit more later, too.

First, a little background on measurement systems analysis. MSA studies determine how well a process can be measured prior to studying the process itself. They answer the question: how much measurement variation is contributing to the overall process variation? Specifically, the Gauge R&R method, which I'll be using later in this analysis, determines how much of the variation is due to operator variation versus measurement variation. These types of studies are important, and they're often required before any type of statistical process control or design of experiments.

A classical Gauge R&R MSA model is shown here. For a given measurement, your response, Y_ik, is the kth measurement on the ith part. In this model, you have a mean term, a random effect that corresponds to the part, and your error term. The random effect and the error term are normally distributed random variables with mean zero and corresponding variance components. This is simply a random effects model, and we can use that model to estimate the variance components and then use those variance component estimates to calculate the % Gauge R&R using the formula shown on the screen.
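The slide's formula is not reproduced in the transcript, but the classical model just described is commonly written as follows (notation mine):

$$
Y_{ik} = \mu + P_i + E_{ik}, \qquad P_i \sim N(0, \sigma_P^2), \quad E_{ik} \sim N(0, \sigma_E^2),
$$

and one common form of the computation for this simple model is

$$
\%\,\text{Gauge R\&R} = 100 \times \sqrt{\frac{\sigma_E^2}{\sigma_P^2 + \sigma_E^2}}.
$$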
Here we have the same model, but the crossed version. Your response, Y_ijk, is the kth measurement made by the jth operator on the ith part. Again, we have a mean term, a random effect that corresponds to the part, and now we also have a random effect that corresponds to the operator, a random effect that corresponds to the cross term, which is the interaction between the operator and the part, and of course our error term. All of these random effects are normally distributed random variables with mean zero and some corresponding variance component. Just like the classical model, this is a random effects model, and we can estimate the variance components and use them to calculate the % Gauge R&R.

In both of the models I just described, the response, or the measurement, was a single point. But what happens if this isn't the case? What if your measurement is something like a curve instead? This was the motivation behind those initial presentations in 2019 and 2020 that I talked about. There was a client of JMP that was a supplier of automotive parts, and they had a customer that specified that a part needed to have a specific force-by-distance curve. Obviously, the client wanted to match their customer's specified curve, and so they wanted to run a functional response DOE analysis in JMP to design their product to do that. However, before spending the money on this type of experiment, they first wanted to perform an MSA on their ability to actually measure the part's force. There are a lot more details in the paper noted at the bottom, so if you're interested in more of the background, please see that.

This is what the data looks like. We have force on the Y axis and distance on the X axis, and the curves are colored by part. It looks like there are only 10 curves, but there are actually 250 curves in total; it's just that a lot of the curves are clustered together. In the data, there were 10 parts, five operators, and five replications per part-operator combination. I just want to note that this is simulated data. It's simulated to look similar to the actual data, but we aren't sharing any proprietary data here.

A few function characteristics I wanted to point out: the functions are all different lengths, so they have a different number of observations per curve. Although the functions were collected at equally spaced time intervals, they were not collected at equally spaced distances. That means there's no true replication in terms of distance. When this project was first presented, one of the original ideas thrown out was whether we could pick a set of distance locations, do a standard Gauge R&R MSA at each of those locations, and then summarize that information for a final result. The problem with that is that if we picked a specific location, there wasn't a guarantee that there would be an observation there for each of the curves, because there wasn't replication for distance. Another, more general problem is that with this type of curve data, doing a pointwise analysis like that does not take into account the within-function correlation.
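For reference, before moving on to the functional version, the crossed model described at the start of this section can be sketched in the same notation (again my rendering, not the slide's exact formula):

$$
Y_{ijk} = \mu + P_i + O_j + (PO)_{ij} + E_{ijk},
$$

with $P_i \sim N(0,\sigma_P^2)$, $O_j \sim N(0,\sigma_O^2)$, $(PO)_{ij} \sim N(0,\sigma_{PO}^2)$, and $E_{ijk} \sim N(0,\sigma_E^2)$, so that

$$
\%\,\text{Gauge R\&R} = 100 \times \sqrt{\frac{\sigma_O^2 + \sigma_{PO}^2 + \sigma_E^2}{\sigma_P^2 + \sigma_O^2 + \sigma_{PO}^2 + \sigma_E^2}}.
$$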
Luckily, there's a whole field of statistics dedicated to this type of data, called functional data analysis, and there are a variety of techniques to handle unequally spaced data. A lot of those techniques are now available in JMP through the Functional Data Explorer platform. The question became: can functional data methods be combined with traditional MSA methods to perform some type of functional measurement systems analysis? This was the solution presented in the older papers that I referenced, so this is just a little bit of a review of what they did.

First, a penalized spline model was fit to estimate the part force functions, so there were 10 functions that were estimated, averaged over operators and replicates. Then these functions were subtracted from the original force functions to obtain a set of residual force functions. These residual functions no longer contain any variation due to the part; all of the variation in those residuals was due to the operator and the replicates. They then fit a random effects model to the residuals to obtain the corresponding variance components from the model. A graphical method was then used to find the smallest part variance to use as an estimate for the part variance component. This was then used to calculate a type of worst-case-scenario % Gauge R&R. This method worked fairly well. They got results that made sense, and the client was happy. But this was just a generalization of a standard MSA with some functional components sprinkled in.

When I looked at this data and at the problem, I wanted to take a more traditional functional approach. I have a background in functional data analysis; that is what my dissertation was on, specifically functional mixed models. There was a chapter in my dissertation dedicated to estimating and testing the variance components from functional mixed models. I did that by expanding the functional model using eigenfunction or basis function expansions, rewriting it as a mixed model, and then using known techniques to estimate those variance components. I started to think: could I use the same type of technique? I don't need a full mixed model, I only have random effects here. Can I create a functional random effects model for the part and operator variance components?

This is what I came up with for a functional MSA crossed model, since we do have an operator term. Functional models are set up a little bit differently, because they're all based around the input. In this case, your response, Y_ijk, is the kth replicate made by the jth operator on the ith part, but this time at a particular distance, d. We have a functional mean term, a functional random effect that corresponds to the part, a functional random effect that corresponds to the operator, a functional random effect that corresponds to the cross term, and our error term.
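A sketch of the functional crossed model just described, where every term is now a random function of the distance d (my notation, not the slide's exact formula):

$$
Y_{ijk}(d) = \mu(d) + P_i(d) + O_j(d) + (PO)_{ij}(d) + E_{ijk}(d).
$$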
In this method, I subtract the mean term out, so I'm left with this set of residuals, and that's what I'm going to model. This here represents the eigenfunction expansion of the functional model. We're going to have capital B eigenfunctions and sum all of those parts together to create this big, long random effects model. But for one eigenfunction, what is shown in these brackets is the expansion. Each functional random effect is split into two parts: a functional part and then just a regular part. The functional part is taken care of by evaluating those eigenfunctions, and then we have standard random terms for the part, the operator, and the cross term.

What this essentially does is build this long random effects model, and then you have a number of variance components for each term. For the MSA, there will now be three sets of capital B variance components: B part variance components, B operator variance components, and B cross-term variance components. Because of the way eigenfunctions are structured, they are known to be independent, so based on how we structured the model, we can assume that all of these variance components are independent of each other as well. That means we can sum them together to obtain functional variance components. And since we're just summing them together, we can also substitute them into the formula for the % Gauge R&R and compute it just like we did in the standard models.

How do I actually do this in JMP? Well, it's a multi-step process. I'm going to briefly outline it here, and then I'm going to do a demo for you. First, I'm going to estimate the mean curve in FDE and obtain the residual curves. I'm then going to model those residual curves, also in FDE, to obtain the eigenfunctions needed for the eigenfunction expansion. I'm going to save those eigenfunctions to the original data table and use them in Fit Mixed. Using Fit Mixed, I'm going to fit a random effects model to the original data, using nesting and the eigenfunctions to define the appropriate model specification.

Hopefully, that all makes a little more sense once I demo it. We're going to exit out of here. Here is our data: we have a column for the ID variable, a column for the part variable that defines the 10 parts, the operator, which defines the five operators, our distance column, and our force column. Just as a reminder, this is what the data looked like. My first step is to estimate the mean function of the force curves and then use that to obtain some residuals. To do that, I'm going to model the force functions in FDE. I'm going to go to the Analyze menu, Specialized Modeling, and select Functional Data Explorer. I'm going to define force as my output and distance as my input. Then, because I want the mean function averaged over all of the IDs, I'm not going to specify an ID variable here. I'm going to click OK, and we have our basic initial FDE report.
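As a quick aside before the demo continues, the expansion just described can be sketched as follows, with eigenfunctions \psi_b estimated from the residual curves (my notation, not the slide's exact expression):

$$
Y_{ijk}(d) - \mu(d) \;\approx\; \sum_{b=1}^{B} \psi_b(d)\left[\, p_{ib} + o_{jb} + (po)_{ijb} \,\right] + E_{ijk}(d),
$$

with $p_{ib} \sim N(0,\sigma_{P,b}^2)$, $o_{jb} \sim N(0,\sigma_{O,b}^2)$, and $(po)_{ijb} \sim N(0,\sigma_{PO,b}^2)$. Summing over the eigenfunctions gives the functional variance components

$$
\sigma_P^2 = \sum_{b=1}^{B}\sigma_{P,b}^2, \qquad \sigma_O^2 = \sum_{b=1}^{B}\sigma_{O,b}^2, \qquad \sigma_{PO}^2 = \sum_{b=1}^{B}\sigma_{PO,b}^2,
$$

which can then be substituted into the same % Gauge R&R formula as in the standard crossed model.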
But I want to fit a model, so I can go to the red triangle menu, Models. Technically, you could fit any of those models. I just chose a B-spline because it's first, it's easy, and it'll just take a few seconds to run here. Okay, so here's our model fit. There's a red line here that's pretty hard to see, but that is the estimated mean function. I'm going to give you a better picture of that in a minute or so.

I actually want to save the formula for this mean function. I can do that by going to this Function Summaries report. I can click the red triangle menu and select Customize Function Summaries. I only want the formulas, so I'm going to deselect them all, reselect that one, and click OK and Save. I get a new data table with what appears to be this lonely little entry here. There is a hidden column, so I'm going to unhide that. We have a distance column, and then we have this force mean functional formula. Let's take a look at what that actually looks like. When we look at the formula column, we can see that this is a function of distance: for any value of distance, it is evaluated to give what the mean function is at that distance. This formula column can be put into any data table that also contains a distance column, and that's exactly what we're going to do. We make sure this is highlighted, right-click, and select Copy Column Properties. Then we find our way back to our original data table, double-click to create a new column, right-click here, and do Paste Column Properties. Now we have the mean force evaluated at every distance value in our data table.

We can use that to find our residual functions. I'm going to double-click again to create a new column and title it Force Resids. Now I'm going to create my own formula column that is simply force minus the mean force, and click OK. Now we have our set of residuals, and this is what it looks like. In the top graph, the light gray curves are the original functions from the force column. This red line is the same red line that I tried to show you in the FDE report, the one that was hard to see; that's the mean function. Then the bottom graph, in green, shows the residual curves. These are the curves I'm going to use to proceed with my analysis.

My next step is to model the residual curves using FDE to obtain the eigenfunctions that I need for the model expansion. I'm going to go to the Analyze menu again and select Specialized Modeling, Functional Data Explorer. This time I'm going to specify the residuals as the output and distance as the input, and I am going to specify my ID column this time. I'm going to click OK. We have our initial report here, and now I want to fit a model to this data, so I go to the red triangle menu to look at the models. Again, technically, you can fit any of these models.
In my experimentation, I found that the top three took a really long time to fit, and they didn't provide super great fits for what I needed. The Wavelets models and the Direct Functional PCA were much, much quicker while also providing better fits. The caveat with those two models is that they require the data to be on an evenly spaced grid, and as I mentioned when I introduced the data, that's not the case. However, in FDE we have some data processing steps that help us manipulate our data a little bit. We can go to Clean Up, Reduce, and this first tab is what we want, and we can use that. That just puts the data so that every distance value has a force residuals observation. Now we can go ahead and fit one of those models. I just chose Direct Functional PCA, and as you can see, it was very quick; the fitting was super fast.

This Functional PCA report is where we're going to get all of the information we need, but I was just going to scroll down to look at the data fit a little bit. We can see that these look pretty good. Then if we look at the diagnostic plots, the points are on the diagonal and the residuals look good. This looks like a pretty good fit, and I can use this information from the Functional PCA.

In the Functional PCA report, we have a table of eigenvalues and then these graphs of our shape functions. The shape functions are actually our eigenfunctions; they're just called shape functions in JMP. The way these functions work is that your original input, distance, is on the X axis, and the eigenfunction evaluation is on the Y axis. For any distance d, you're going to have an evaluation of eigenfunction 1, an evaluation of eigenfunction 2, and so on. You can use a linear combination of the eigenfunctions to get an estimate of the original functions.

The eigenvalues table gives you an idea of how much of the overall data variation you're taking into account when you use a certain number of eigenvalue/eigenfunction pairings. In this case, the first eigenvalue and eigenfunction pairing actually accounts for 99.9% of the total variation in the data, which is pretty incredible. This is important in determining how many eigenfunctions you're going to use for the basis expansion. Typically, when you're selecting a number of eigenfunctions, you don't actually want to use all of the eigenfunctions you're given. You want to use the smallest number of eigenfunctions that still accounts for an adequate amount of variation in the data, because the more eigenfunctions you use, the bigger your random effects model is going to be, and the harder it will be to estimate. What does accounting for an adequate amount of variation mean? It can mean different things to different people in different fields. Typically, when I'm working with this, I have used about 90% as a cutoff.
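As a sketch of the eigenfunction-count decision just described: keep the smallest number of eigenvalue/eigenfunction pairs whose cumulative share of the variation reaches the chosen cutoff. This is a generic Python illustration with placeholder eigenvalues, not the values from the FDE report.

```python
# Choose the number of eigenfunctions B so that the cumulative proportion
# of variation explained reaches a cutoff (the talk mentions ~90%).
import numpy as np

eigenvalues = np.array([4.2e3, 2.9, 0.8, 0.1])   # hypothetical FPCA eigenvalues
share = eigenvalues / eigenvalues.sum()           # proportion explained by each pair
cumulative = np.cumsum(share)

cutoff = 0.90
B = int(np.argmax(cumulative >= cutoff)) + 1      # first index reaching the cutoff
print(share.round(4), cumulative.round(4), "-> keep B =", B)
```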
If I was just doing this analysis to do the analysis, I might only take that first eigenfunction, since it explains so much, and run with that. For demonstration purposes, I'm going to take the first two, just so you can see what the model expansion looks like using two eigenfunctions. To save these, I'm going to go to this Function Summaries report again, click on the red triangle menu, and select Customize Function Summaries. I just mentioned that I'm only going to save two of them, so I want to enter two here. I'm going to deselect these all again and only save the formulas. I can click OK and Save, and I have another new data table now. The things that I need are actually hidden; we want to look for these Force Resids shape functions, which represent our eigenfunctions. We can unhide those so that they're now included in our data table.

We can take a look at these formula columns. Just like with our mean function, each is simply a function of distance: for any value of distance, these formulas give you what the eigenfunction value is at that distance. Also, like the mean function, we can put these formula columns into any data table that also contains a distance column, and that's what we're going to do again. We're going to put these formula columns into our original data table. We make sure both of these are highlighted, right-click, and select Copy Column Properties. Again, find our way back to our original data table. I want to add two new columns to my data table, so I'm going to go to Cols, New Columns. We're just going to title them E1 and E2 to represent the eigenfunctions. I want to add two columns, and I want to add them as a group. I'll click OK. Then, with these two new columns highlighted, I'm going to right-click and select Paste Column Properties. Now we have our eigenfunctions evaluated for every distance.

Just to give you an idea, this is what they look like. It's almost the same graph as what was in the FDE report; we're just taking what was there graphically, and now we have the numbers for every value here. Now we want to do the eigenfunction expansion and expand our functional model. But what does that model actually look like when you have two eigenfunctions? I'm going to hop back over to my slides real quick and show you what the model expansion looks like when capital B equals 2. I have this divided into a section for the part, the operator, and the cross term. Within each of these sections, we see that we have a term that involves eigenfunction one and a term that involves eigenfunction two. Essentially, this means that we're going to have two variance components for the part, two variance components for the operator, and two variance components for the cross term.

Now I'm going to go back to my data table. I want to fit this model using Fit Model, so I'm going to go to the Analyze menu and select Fit Model. I want to specify my personality as Mixed Model.
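For reference, the B = 2 expansion shown on the slide a moment ago can be written out as follows (my notation, not the slide's exact expression):

$$
Y_{ijk}(d) - \mu(d) \;\approx\; \psi_1(d)\,p_{i1} + \psi_2(d)\,p_{i2} + \psi_1(d)\,o_{j1} + \psi_2(d)\,o_{j2} + \psi_1(d)\,(po)_{ij1} + \psi_2(d)\,(po)_{ij2} + E_{ijk}(d),
$$

which is why the fitted model will have two variance components each for part, operator, and part*operator.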
Now I'm going to specify the residuals as my Y. Then I'm going to move down to the effects section. In the fixed effects tab, I don't have any fixed effects, and I also don't have an intercept, because I subtracted the mean term out originally. Now I'm going to move to the random effects tab. Here is where I'm going to use the eigenfunctions and the part and operator variables and nest them in an appropriate way to define the model that I just showed you. We can add both of these eigenfunctions, and then we're going to select them and also select part, and we're going to nest part in each eigenfunction. Then we do the same thing for the operator and the same thing for the cross term. That's how we're going to define our model. The last thing I want to do in this launch window is deselect the Unbounded Variance Components option. When this is selected, it means that you can have negative estimates of variance components, and I don't want that; any negative estimates are just going to be set to zero.

Now I can run this, and we have our report here. This is the table that's going to give us the estimates we need to calculate the % Gauge R&R, but I'm going to poke around the report a little bit first. This actual by predicted plot looks really weird at first, but it makes sense when you think about it. Since we don't have any fixed effects or an intercept, when we don't take the random effects into account, our estimate for everything is just zero. When we do take those random effects into account, we see that the actual by conditional predicted plot looks a lot better and that the observations fall pretty well along this diagonal. We can also take a look at the conditional residual plots, and we can see that they're pretty small and centered around zero. We do have some deviation from this line here, but there's nothing too crazy about the residuals. I feel okay about using these estimates to calculate the percent Gauge R&R. I actually pulled this table and put it back into my slides, and I'm going to go back to my slides for the remainder of the presentation.

Here's that table, not the data table, but the report table that was just there. As you can see, you have two variance component estimates for the part, two for the operator, and two for the cross term. As I mentioned when I was describing the model, we can sum these together to calculate the functional variance components. The specific numbers aren't as important in this analysis as what you get when you put them into the formula for the % Gauge R&R. When I do that, we get a % Gauge R&R of 3.3030, which is what Barrentine defines as an acceptable measurement system. If this had been my project, I would have gone back to the client and said that it seems like your measurement system is acceptable, and you can go ahead and proceed with your design of experiments.
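To make that last step concrete, here is a small Python sketch of the computation just described: sum the per-eigenfunction variance component estimates and plug the totals into the crossed % Gauge R&R formula. The numbers below are placeholders, not the estimates from the Fit Mixed report.

```python
# Sum per-eigenfunction variance components (B = 2) and compute % Gauge R&R.
# All values here are hypothetical placeholders for illustration only.
import math

vc_part     = [1.10, 0.20]    # hypothetical part variance components
vc_operator = [0.004, 0.001]  # hypothetical operator variance components
vc_cross    = [0.002, 0.001]  # hypothetical part*operator variance components
vc_error    = 0.0005          # hypothetical residual variance

sigma2_part  = sum(vc_part)
sigma2_gauge = sum(vc_operator) + sum(vc_cross) + vc_error
pct_grr = 100 * math.sqrt(sigma2_gauge / (sigma2_part + sigma2_gauge))
print(f"% Gauge R&R = {pct_grr:.2f}")   # values below ~10% are commonly called acceptable
```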
That's basically it for this analysis and this particular data, so here are just some thoughts that I had. This result was actually very similar to the worst-case-scenario % Gauge R&R that was presented in the 2019 JSM presentation; it was higher by just a few decimal places. It would be really interesting to compare these methods on other data sets to see if they are always similar or if this was just a happy coincidence. I don't have much experience at all with measurement systems, so I don't have any other data to play around with, or even really know how to obtain it. If anybody has any data that they think might apply to this type of project, any functional data that they might also want to do an MSA on, I'd be really interested in hearing about it.

For future work, the first thought I had was: should I add a functional random effect for the ID? This is very commonly done in a lot of functional mixed models, at least in the fields I worked in, and the use of this functional random effect for ID was a big contribution in my dissertation. Typically, it captures the within-function correlation across, in this case, distance. I played around with this random effect in this data, and no matter which model or how many eigenfunctions I used, the variance component associated with this random effect always came out to be zero, and that's not useful. I think in this case, once you took into account the variance from the part and the operator, there just wasn't any variation left to account for. I don't know if that's true for all functional MSA studies or if it was just true for this particular data. If I were ever able to get my hands on some different data, this is definitely something I would keep in mind, to see if it could be added to a model more successfully in other data sets.

I also think it would be cool if we could calculate a confidence interval for the % Gauge R&R. And finally, I wanted to talk about the one thing I wasn't super happy with in this project, which was the residuals. What was wrong with them? These are graphs of some different models and the residuals for each one; I'm going to go back and forth between this slide and the next one. So yes, the residuals are relatively small, they're centered around zero, and there are no crazy spikes or outliers. That's good; that's what we want. However, in all the models I fit, I still didn't love how they looked across distance. Looking at the residuals this way is especially important when working with functional data, because it can really show when you're not capturing all the functional parts of the data. A lot of times in functional data, you see a fanning effect where the residuals are really good in the beginning, and then, as you get towards the end of your domain, they fan out a little bit. This data actually had almost the opposite problem: we can see that the residuals are a little wider in the beginning of the domain and get closer to zero as distance gets larger.
There's also definitely some type of cyclical pattern in these residuals. I don't think it's the end of the world; they're still centered around zero, but you can see in these graphs that there are clearly some up-and-down patterns. Essentially, what that means is that I'm missing something. I'm not capturing the full functional nature of the data, and I don't really know why yet. I'd really like to figure that out and fit an even better model. Whether that's possible with this data or different data in the future, I'm not sure, but it's definitely something I want to spend a little more time on, and I'd be open to any discussion anyone would like to have about it. That's it for me. Thanks for watching. If you have any questions, suggestions, or feedback, feel free to email me. Thank you.
Saturday, March 4, 2023
Climate change is a reality. The Paris climate goal of limiting global warming to 1.5 degrees is barely achievable. The only remaining question is: what can we still do to keep the consequences reasonably limited? How many conferences did you fly to last year? How many times a month do you eat meat? Do you drive to work? All of these questions matter, but how much? In this talk, we will try to find a data-based answer to this question using JMP as enabling software, and we will show how each one of us can contribute to preventing climate catastrophe. I am a mathematician, not a climatologist, but we need to get involved, all of us.

Thanks for tuning into my talk. My name is David Meintrup. I'm a professor at Ingolstadt University of Applied Sciences, and today I'm going to talk about the Earth, the climate, and you. Let's start with a warm-up talking about the climate, no pun intended. Please consider these four actions, and imagine you kept doing them for one year: no plastic bags, go vegan, drive fuel-efficiently, or always switch off standby modes. Can you order them by the amount that they reduce your carbon dioxide footprint? No? Well, don't worry, you're in good company. A study performed by A.T. Kearney in 2019 came to the conclusion that we generally have no clue what reduces our personal footprint. They gave people seven personal actions, from no plastic bags to one flight less per year to regional and seasonal food, et cetera. You can see what people thought, their answers, on the left side. I will give you the correct answers at the end of my presentation, but I can already tell you that people were completely wrong. I stumbled across this study a year ago, and I will openly admit I had no idea either. But how are we going to save our planet if we don't know? This was the motivation for today's presentation.

I'm a mathematician by education, and I've been promoting statistical literacy for many years. There's this famous quote, "Statistics is too important to be left to statisticians," and I think the same is true for the climate. We need climate literacy to know and understand the Earth's climate, the impacts of climate change, and approaches to adaptation and mitigation. In the same spirit, I would like to say that climate change is too important to leave to climatologists, and despite the fact that I'm a mathematician, I wanted to study the topic and talk about it. In one sentence, the goal of my presentation today is to increase my own and everybody else's climate literacy. I would like to do this by answering three questions. Why exactly does climate change happen? Since when do we know? And what can each and every one of us do about it?

Let's start with another question. Did the average global temperature increase? Yes, no, or one can't say? Well, as you all know, the answer is obviously yes, and that is not difficult to prove, as one can simply measure the temperature. Here you see the development of the global temperature over the last 140 years.
Compared to the reference interval from 1880-1910, we have an increase of approximately 1.1 degrees Celsius. In addition, it took only 30 years to double the increase from 0.5 degrees to one degree. Next question: what causes global warming? Well, again, I guess that you are all familiar with the answer: carbon dioxide emissions from burning fossil fuels like coal, oil, and gas. But do you also remember why? Why do these emissions cause global warming? The answer is the greenhouse effect, and I would like to present a few more details on that.

The temperature on Earth is completely determined by the radiation balance. We have incoming solar radiation that is partially absorbed and partially reflected by the Earth and the atmosphere. Of the absorbed energy, one part is radiated back into space as heat, and another part is absorbed by greenhouse gases and then re-emitted down to the Earth. This part is what is called the greenhouse effect, and that is what causes global warming.

Now, which gas contributes most to the greenhouse effect? Is it water vapor, carbon dioxide, methane, or ozone? I guess that most of you will have answered carbon dioxide, but actually it's a trick question, because the trap here is that I didn't ask about the manmade greenhouse effect. Let's have a look at the details. Greenhouse gases actually keep us warm. Without an atmosphere, and therefore without any greenhouse gases, the temperature on Earth would be on average minus 18 degrees; no life on Earth would be possible. Now, if we add an atmosphere including the natural greenhouse gases, water vapor, methane, and carbon dioxide at approximately 280 parts per million, then we have a natural greenhouse effect. This raises the temperature from minus 18 degrees to plus 15. It's a huge effect, an increase of 33 degrees, and this is what is called the natural greenhouse effect. The main gas contributing to it is water vapor. Now, if we continue and add anthropogenic, manmade greenhouse gases, for example raising the carbon dioxide to 410 parts per million, which is more or less where we are right now, then we also add another layer of warming, as I said before, of approximately 1.1 degrees, and this leads to an average temperature of 16.1 degrees. In this additional manmade greenhouse effect, carbon dioxide is indeed the most important contributor.

You can see this confirmed on this slide. Roughly two thirds of this additional greenhouse effect is caused by carbon dioxide, and methane contributes more or less one sixth. There's an important difference between these two gases, though, and it concerns their lifespan. Every molecule of carbon dioxide in the atmosphere is adding to global warming for the next 100 years and more. Methane, on the other hand, has a lifetime of about nine years. So cutting methane emissions is a very quick and good fix for the short term, but in the long run we will have to reduce carbon dioxide emissions. Let's have another look at the greenhouse effect.
The first slide I showed you was, of course, an oversimplified illustration; this one has some more details on it. I would just like to repeat two elements. On the right side, you have the greenhouse effect, this very important downwelling radiation. And on the left side, you have the information that part of the incoming radiation is reflected by the Earth's surface, and of course, the brighter the surface is, the more radiation is reflected. This is important for understanding some feedback loops.

These feedbacks are self-reinforcing. The most famous one is the ice-albedo feedback. The surface of the ice reflects 85% of the solar energy; only 15% is absorbed. The dark sea, however, only reflects 7% of the energy and absorbs 93%. Now, if global warming causes the ice to melt and turn into dark sea, then more energy is absorbed, causing more global warming, causing more ice to melt, et cetera. The same feedback happens with the melting of the permafrost, which is a huge store of methane and carbon dioxide. We can even see an increase of water vapor over time, which also has an obvious feedback loop, because warmer air can store more vapor. Unfortunately, these effects are difficult to quantify; in fact, they are not included in many models.

Let's quickly summarize the physics we've seen so far. Temperature on Earth is a question of radiation balance. The natural greenhouse effect amounts to about 33 degrees and is a prerequisite for life on Earth. The anthropogenic, manmade greenhouse effect consists in adding additional greenhouse gases, in particular carbon dioxide and methane. Carbon dioxide in the atmosphere increased from more or less 280 parts per million to 410, inducing an increase in global temperature of 1.1 degrees Celsius. If we want to stop global warming, this means stopping greenhouse gas emissions.

Let me turn to the second question: since when do we know? Longer than you think. Many of you might be familiar with the mathematician and physicist Jean-Baptiste Joseph Fourier. He was the first one to realize that the temperature on Earth is much higher than one would expect. The explanation he came up with was that the atmosphere acts as an insulator, storing heat that would otherwise escape. Roughly 30 years later, John Tyndall proved that Fourier was actually right: he demonstrated that carbon dioxide absorbs and emits infrared radiation. Finally, at the end of the century, the Swedish chemist Svante Arrhenius was able to quantify the greenhouse effect, the amount of global warming due to carbon dioxide emissions. By the way, living in Sweden, he considered this a positive effect; he hoped that life would become more pleasant with slightly warmer temperatures.

I would like to jump to the '70s and show you 30 seconds from a very popular German TV show of that time. The host describes in detail how global warming works. It's in German, unfortunately, but I put subtitles in English so that you can read it.
Again, this is from 1978. [Video clip in German with English subtitles.] "The consequences will be dramatic." Isn't it incredible how precisely they predicted global warming in 1978? I find this amazing every time I see it. This TV host was not the only one who knew. Many companies knew, including Exxon. Exxon knew, and Exxon knew exactly. You might have heard about it, because it just recently made the news: a group of scientists just published a Science article assessing ExxonMobil's global warming projections.

I would like to read two sentences from the abstract. The first one is, "Their projections were also consistent with, and at least as skillful as, those of independent academic and government models." In other words, they had excellent predictions and excellent scientists. The final sentence says, "On each of these points, however, the company's public statements about climate science contradicted its own scientific data." This is a very polite way to express that they invested a huge amount of money to actually dismiss global warming.

As we do have the documents, I can show you this in a little more detail. This is the original letter from Exxon from 1982, called CO₂ Greenhouse Effect, ending with the remark "Not to be distributed externally. For internal use only." Exxon estimated the development of carbon dioxide in the atmosphere and global temperature until 2100. Let's zoom in. Here we see the year 2022, and the corresponding most probable values that Exxon predicted were 420 parts per million of carbon dioxide and a temperature increase of 1.1 degrees. This is spot on, right? All one can say is: excellent work. By the way, if you wonder why they were interested in this question, it was partially because they knew that global warming would lead to a rise in sea level, so that they had to build their oil platforms higher.

Now, starting with the First World Climate Conference in 1979, did climate policy have measurable success over the last 40 years? This is my final question to wrap up the historic part of this talk. Yes, no, or one can't say? Very unfortunately, the answer is a very clear no. Below the graph, you see the famous temperature stripes showing the increase of temperature over the last 60 years. The graph itself shows the carbon dioxide in the atmosphere. We have the First World Climate Conference, the first IPCC report, the first UN Climate Conference, the Kyoto Protocol, the Copenhagen Accord, and finally, the Paris Agreement. During all these meetings, conferences, and agreements, carbon dioxide in the atmosphere increased from 316 to 420 parts per million, the value that we have right now. None of these conferences, agreements, or meetings had any measurable effect on our actual situation regarding climate change.

Finally, so as not to leave it at this somewhat depressing news: what can each and every one of us do about it? Let me first very quickly remind you that there is a practically linear relationship between temperature increase and cumulative global emissions.
If we want to keep the 1.5 degrees goal from the Paris Agreement, we can very easily estimate that we have approximately 500 gigatons of carbon dioxide left. This was in 2020, three years ago: three years ago, we had 500 gigatons left for the 1.5 degree goal of the Paris Agreement. Now, if we relax this a little bit to two degrees, then we have 1,350 gigatons left. To demonstrate the current status of our emissions, I'm going to switch to JMP.

Let's start by having a look at the global emissions. If we look at it historically, since 1850, you can see the map of the main emitters on the left side, and I have the data here on the right side. The three top units that contributed historically to global emissions are the United States, responsible for one quarter, the European Union with a little bit less, and 13% from China. This adds up to roughly 60% that these three top units are responsible for historically since 1850. Now, if we look at the current status, this is data from 2018, you can see that the top three are still the same, but the order has changed. China is now by far the country emitting the most, followed by the United States, and third in place is the European Union. These three still add up to roughly 50% of global emissions per year. The conclusion here is that without these players, we are not going to get anywhere.

Now, to make the comparison a little fairer, let's look at emissions per person. Here in this lower graph, you see the emissions per capita for different countries. The top ones are the Gulf states like Qatar and similar countries, followed by a second group of high emitters: Australia, Canada, and the United States. Then there is a third group in the middle that consists of China and the European countries, and then we have low-emitting countries, typically found in Africa.

Now, please follow me through the following calculation. I said before that in 2020 we had 500 gigatons left. It's easy to turn this into a per-person budget, which is 56 tons. If we want to be carbon neutral by 2050, this leaves us 28 years and 56 tons. The personal budget per year, on average, that is compatible with the 1.5 degrees of Paris is two tons per person. This is the red line that you see down here. Now, let's look at the United States, for example. For the last three years, the United States has emitted approximately 18.4 tons per person per year. In three years, they have already used the 56 tons that they had left until 2050. At the top, you see the years left until the corresponding country has entirely used its budget, and you can see that the US, Canada, Australia, and Qatar are at zero or below. In other words, these countries have already used everything they had left to keep the Paris goal of 1.5 degrees. Every breath they take now, every car they drive, every plane they fly, is already on the debt side of this climate goal. And don't worry, the Europeans are all going to follow in a couple of years.
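To recap the arithmetic behind the two-ton figure, here is a small Python sketch using the numbers from the talk; the population value is back-calculated from the stated 56 tons per person and is my assumption, not a figure from the presentation.

```python
# Per-person carbon budget implied by the talk's numbers.
remaining_budget_gt = 500           # Gt CO2 left for 1.5 degrees (as of 2020, per the talk)
world_population    = 8.9e9         # assumed population consistent with ~56 t per person
years_to_2050       = 28            # years left to reach carbon neutrality by 2050

per_person_total  = remaining_budget_gt * 1e9 / world_population   # ~56 t per person
per_person_annual = per_person_total / years_to_2050               # ~2 t per person per year
print(round(per_person_total, 1), round(per_person_annual, 2))
```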
The conclusion of this is, unfortunately, that we have absolutely no chance to reach the 1.5 degree climate goal of the Paris Agreement. Every ton that we can save is good, but the 1.5 degree goal is gone. Unfortunately, there's agreement on this.

If we now look at the global emissions by sector, to approach the question of in what areas we can personally contribute, you can see that almost three quarters of the emissions come from burning fossil fuels: oil, gas, and coal. So if someone says the climate crisis is a global energy crisis, he or she is absolutely right. Almost three quarters of the emissions are due to burning fossil fuels. Around 20% come from agriculture, and then there are cement and waste.

Here in the middle, you see this in a little bit more detail, and I would just like to emphasize one item, and that is livestock. Livestock is responsible for 8% of global emissions. What that means is the following: if you put all the cows, pigs, sheep, and everything else in one country and looked at their emissions, they would be number three in the world. There's China, the US, and then all the animals. The country consisting of all the animals would be the third biggest emitter on this planet.

This is one reason why agriculture is a huge contributor and is actually the field with the highest impact for your personal influence. It is followed by buildings, meaning how you heat and the electricity you use, and the third question is how you move. So transportation, buildings, and agriculture are the three big contributors where you have personal influence on global emissions.

I would now like to turn the attention to these three fields. I will start with transportation because I think this is the best known. But it's always good to look at this personal budget. Let me remind you: your personal budget is two tons. One transatlantic flight, Frankfurt to New York, consumes four tons. Twice your personal budget is spent on one transatlantic flight.

There are other ways to use your personal budget quickly. One seven-day luxury cruise: 2.8 tons. Driving your fossil-fueled car for one year: 2.3 tons. Each of these is already above your personal budget, and you haven't eaten anything yet.

It will generally be known to you that taking a plane is the worst way of moving. You cut emissions more or less by half if you take the car, and you cut them to roughly one tenth if you take the train. Of course, public transportation is better than private, and the best way to move is to use your own muscles, on a bicycle or just by walking.

Now, for buildings, the situation is quite clear. 60% of the emissions come from direct or indirect use of fossil fuels for heating, cooking, and electricity. The conclusion here is very easy: turn to renewable sources for the power use in your house, for heating, cooling, and electricity. Turn your house into a green house and you will significantly contribute to a reduction of your carbon footprint. Almost 20% of the emissions in the buildings area come from building materials.
And it's very interesting that there's a lot of research going on to replace classic building materials with carbon-neutral or even carbon-negative ones. I included one example: this is a company from Switzerland that stores carbon dioxide in recycled concrete and tries to reduce the carbon footprint that way.

Finally, agriculture. I have here the data for four different diets and their carbon footprints. It's data from the US; it's not that easy to find the data for other countries, which is why I took the one from the US. The average American diet, again, uses your full budget of two tons. If you leave out dairy, or if you leave out meat and turn vegetarian, this significantly reduces your footprint. But the really huge step is leaving out both and becoming vegan. If you wonder why this is the case, it's because of the footprints of different types of food. You can see that all the vegan foods here are in the lower section. This is split into methane and non-methane greenhouse gases. All the high emitters are in the upper section. Actually, I didn't arrange the scale quite right: if you look at the top, beef from beef herds, it's not 40, nor 50, nor 60, nor 70; it's actually about 100 kilograms of greenhouse gases per kilogram of the corresponding food. So if you only want to do one thing in your diet, leave out beef.

Personally, I find this one of the most impressive statistics: 29% of our Earth's surface is land, and 71% of that land is habitable. Half of it we use for agriculture. Of this part, 77% is directly or indirectly used for livestock. That is roughly one third of the habitable land, but we only produce 18% of our calories from meat and dairy.

This is why the lead author of the corresponding article, Joseph Poore, says, "A vegan diet is probably the single biggest way to reduce your impact on planet Earth, not just greenhouse gases, but global acidification, eutrophication, land use, and water use." He himself turned vegan after conducting the study.

Let me wrap up a little. Here are the four actions I introduced in the beginning, and I hope that by now it is no surprise to anyone that going vegan is the most efficient thing you can do. No plastic bags is good for the environment, but it doesn't really have an important impact on the carbon footprint.

Here are the answers from the survey I showed you. As you can see, no plastic bags was the answer with the highest rank, highly overestimated, just like only eating regional and seasonal food. On the other hand, reducing meat was highly underrated.

I would like to wrap up with a quote attributed to Al Gore, attributed maybe because it's not 100% clear that he said it: "Vote, voice, and choice." What can you do personally? You can vote in every election, make climate policy a priority, and let officials know what you want. Make your voice heard: support organisations, talk about it in your company, et cetera. Finally, your personal choices matter.
Ideally, eat a plant-based diet, reduce the use of fossil fuels for mobility, in particular flying, and make your home green by using renewable energy for electricity and heating. My contribution to making our voices heard was to give this talk today. I would like to thank you very much for taking the time to listen to my message.
Has your world changed in the past couple of years? Ours has too! We hope our changes will make your job easier. This talk will present our newly free and low-cost options to educate your engineers and scientists. We offer instructor-led courses on our public schedule and will add courses to the public schedule on request. If you have instructors in-house, we can provide them with our course materials, and they can adapt our demonstrations to use data relevant to your students. We also have free self-paced eLearning available. We are looking for ways to help you even more! What topics do you want to see for half-hour Mastering JMP sessions, one-hour Deeper Dive sessions, or multi-day analytics education? What times should we offer these sessions? In-person or remote? This session will include time to gather your feedback.

I'm Di Michelson from JMP Education. I'm happy to be with you today to talk about the resources that JMP Education has to offer you. They are centered around learning how to get the most out of JMP, including both how to use JMP and how to use statistical and analytical methods in JMP. In the live session at the end of this recorded talk, I'll provide links for where to find the content in the talk. I also want to get your feedback on how the JMP Education group can provide even more services to you and your company's JMP users.

I'm part of the JMP Education group, managed by Ruth Hummel. Monica Beals is also part of the group; maybe some of you know her. Together, we have over 40 years of experience teaching people how to collect and analyze data to get the information needed to make smart decisions, mostly using our favorite software, JMP.

Today, I want to talk about what we can offer and to get your feedback on other ways we can help you learn JMP and analytics. We'll talk about eLearning, instructor-led classes, a new way for your trainers to develop courses quickly using our course materials, as well as the one place for you to learn about JMP, the new Learn JMP space in the JMP user community.

Let's start with free, on-demand, self-paced eLearning. We converted some of our paid eLearning courses to free courses in 2022. Of course, you know STIPS; it's been available for quite a few years. STIPS is a very broad course with over 30 hours of self-paced learning on many analytical methods, and you can integrate STIPS into your academic or corporate training program.

JMP Education's analytical eLearning courses go deeper into statistical methods and JMP usage than STIPS does. We have self-paced eLearning on many topics and released both our introductory JSL course and our SPC course last year. By the time you see this recording, I hope that a few more courses will have been released. Our plan is to convert all courses available in the JMP learning subscription, which is currently a paid service, to free eLearning.

Data Exploration is our most popular course. It teaches students how to use JMP by means of several case studies. That course is usually followed by ANOVA and Regression, which teaches the basics of statistical modeling.
The custom DOE course teaches the principles of design so that you can use the Custom Design platform in JMP to collect the data needed for analysis with statistical models, enabling you to make good decisions about your processes.

There are two other courses currently in the JMP learning subscription that we're planning to convert to eLearning: our classic DOE course, which covers fractional factorial and response surface designs in a different way than our custom DOE course, and a course on stability analysis, which is written for those in the pharmaceutical industry doing shelf-life studies.

Here's an example from our SPC course. Each lesson consists of videos on the theory of control charts or process capability, along with demonstrations in JMP, plus quizzes and practices to help you retain what you've learned. Self-paced eLearning is great, especially when you need to learn control charts at 2 AM. But some people learn better with an instructor, and for you, we also offer live, instructor-led classes. These are public classes with students from many different companies and industries.

We organize these courses into these buckets, and currently, Monica, Ruth, and I are teaching a few times per month. We have many courses, ranging from how to get started with JMP and analytics through designing experiments. We have three courses on designing experiments: the first one here is general, and the next two are for specific types of experiments.

We have classes on quality improvement, including quantifying the variability in your process that is due to your gage (measurement systems analysis), controlling the variability of your process using control charts (statistical process control), and analyzing time-to-event data in our reliability analysis class.

We have lots of courses on advanced analytics, including platforms in JMP Pro, like analyzing many response variables at once, modeling categorical or discrete responses, functional data analysis, text analysis, generalized regression, and methods for explanatory and predictive modeling. We also have two scripting courses: the introductory course that teaches you the language, and a course that takes you through two examples of designing and building a production script, including building an interactive user interface, pulling data by querying different data sources, building custom reports, and then presenting results back to the user.

The public course schedule is in the Learn JMP space in the community and on jmp.com/training. I'll take you there after this recorded talk is finished. From this page, you can see the course descriptions and the schedule, as well as register for classes.
One thing I'm excited to tell you about is that we've recently implemented a Request a Course button. That's for you to use if you look at our public schedule and don't see a course that you want to take, or if you found the course but it's not scheduled at a time that's convenient for you, especially in Europe. You can click the Request a Course button and ask us to put a course on the public schedule at a date and time that's convenient for you. You can also use it if you just want to be notified when we add a particular course to the public schedule. We're really excited about this button, and we hope it will be useful for telling us when you want to take instructor-led courses.

We've talked about on-demand eLearning and public instructor-led classes. Now let's talk about how your trainers can use our course materials in their course development process. We are providing our course materials free of charge: the PowerPoint slides, the PDF file of the course notes, and the course data, and for most classes, a JMP journal. You can request access from your sales team, your account manager, or your JMP systems engineer. If you don't know who they are, ask your JMP administrator. You'll sign a contract with some very basic terms of use, and you can modify our materials as much as you want, including replacing the data in the demonstrations and practices with data that are relevant to your learners. We hope that your trainers will be able to use the courses we have created to quickly create more relevant learning content for your company.

Here are those categories of courses again, with the number of courses that have course material available within each category. If your trainer wants to see how we teach a course, have them come to a public class, or use that Request a Course button to ask us to schedule a public class at a time that's convenient for them. We've talked about the courses that JMP Education has to offer. There's much more JMP learning content at the Learn JMP page in the user community: just go to jmp.com/community and click on Learn JMP.

Our vision is for Learn JMP to be the one place to access all JMP learning materials. All the pieces are not there yet, but we will be continually improving this space. We want the Learn JMP space to be helpful for users across the spectrum, from never having used JMP before to being a JMP expert, and from new to analytics to a trained statistician. The learning materials cover different learning styles, with live sessions, recorded or created videos, as well as things to read.

We also organize by time commitment, from short one-page documents or a five-minute video, to a half-hour Mastering JMP session, to full courses. All of the material is organized according to the JMP Analytic Workflow.
It ranges from data sources, through visualization and analysis using JMP platforms, to sharing results with JMP users and people who don't have JMP. You can see this JMP Analytic Workflow in action at jmp.com/workflow.

In the Learn JMP space, you'll find the brand new getting started in JMP on demand, which is for new JMP users. There's also lots of additional material on how to use JMP. There are Mastering JMP live webinars and on-demand recordings, and there are the eLearning and instructor-led courses that I talked about earlier. There's also something new this year, Deeper Dive, which fits into that one-to-four-hour slot: longer than a half-hour Mastering JMP session, but not as in-depth as a two-day formal course.

We'll be adding to the Deeper Dive topics as the year goes on. Please let us know what topics you are interested in learning more about. We want to make learning content that you want to use.
Saturday, March 4, 2023
Extracting pertinent information from unstructured text data can pose a daunting challenge. Someone may wish to mine blocks of text for websites, telephone numbers, emails, or physical addresses. It could be that units of measurement between documents need standardizing. The Regex function, quietly incorporated into JMP a few releases ago, is an extremely powerful tool to quickly and easily perform these and other tasks. It is also a tool that, for many, is shrouded in mystery. This presentation seeks to highlight this often overlooked and underrated function and decode its inner workings to allow anyone and everyone to tap into its full potential.

Hi, welcome. You've found our talk on Regex. It's a powerful text analytics tool that Hadley and I are going to explore the basics of today in our talk.

Yes, thank you very much for clicking on this link and watching this presentation. What is Regex? Well, Regex is a function that searches for a pattern within a source string and returns a string. That definition was taken from the Regex function entry in the JMP Scripting Guide, and I'm not sure it quite does it justice. Before we go into some details about how you can use it and what its power and value are, what I'd like to show you here is just the format of the function. It takes in a source, a pattern, and then, if you like, a replacement string. It has other functionality as well, but for the purpose of this presentation, we are going to be talking about these first three inputs to the function.

Before we dive too deeply into it and show you some examples, I'd just like to talk a little bit about how to set up a pattern in Regex, and specifically about the concept of escape characters. These are characters that can mean many things. For example, \w can match the letter w, but it can also match any other lowercase or uppercase letter A through Z, as well as the digits zero through nine and a... what's that called? An underscore. The way you refer to that whole class is simply by typing \w; if you wanted to refer to a literal w, you would just write the letter w. Digits can be expressed in their actual form, or they can be expressed generally as \d, and then \s refers to a single whitespace character, including tab, return, new line, vertical tab, and something called form feed. Probably some of you watching know what that means.

I'd like to mention some special characters now. The period, the question mark, the asterisk, and the plus refer to matches of different numbers of characters. The period refers to any single character. The question mark matches zero or one instance of whatever is put in front of it, the asterisk matches zero or more, and the plus matches one or more. There are some other characters as well. I won't go through all of these, and there are many more that I haven't captured, but I put them in this table and saved them here so that, if you like, you can pause this and see exactly what they are.
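To make the escape characters above concrete, here is a minimal JSL sketch using a made-up source string; Regex returns the first match of the pattern, or missing if there is none:

// \d+ matches one or more digits; \w+ matches one or more word characters.
Regex( "Lot 42 shipped", "\d+" );   // returns "42"
Regex( "Lot 42 shipped", "\w+" );   // returns "Lot"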
Let's look at an example. Let's say that you wanted to extract all email addresses from blocks of free text, with many email addresses in all different formats. How would you do that? Well, let's look at our source, which could be, for example, "For help contact support@jmp.com. It's free with your license of JMP." If we wanted to look through this and extract the email address, we'd have to refer to it as a pattern. That pattern is one or more instances of any word character, letter or number, which we can refer to as \w+, followed by an at sign, followed by one or more instances of \w again, followed by a literal period indicated by \., and then the letters c-o-m. If we were to set that up in a Regex function, the returned result would be the email address support@jmp.com. That would be the pattern that matches.

Now, some of you watching this, I know what you're thinking: not all email addresses follow this format. Some of them have other characters in them, some of them have multiple periods, some of them perhaps don't end in com, they end in something else. That's all very true, and this pattern isn't going to match those. What you could do is take this pattern and generalize it in different ways so that more of what you're looking for matches the pattern. We'll show you an example of how you can do that and what that process looks like.

The examples we're going to look at are these. First, automated machine messages indicating errors in different parts of a system; what we want to do is extract the components that are broken from all of these messages, and I'll show you how to do that. Second, we're going to take phone numbers that have been entered manually in all different crazy formats, and we're going to put them in a uniform format. Third, we're going to extract info from coded text. In this case, these are file names that contain information about how different biological samples were run: the temperatures, the stress tests, the times, and so on. All of this is coded in the name of the file. We're going to pull out all those pieces and then organize them in a table so that we can work with them.

Now, probably you've all clued into the fact that Pete and I are not Regex experts. I think the word novice is probably a better description of our competency in Regex. The purpose of this talk really isn't to show off our Regex prowess so that everybody should be impressed. The purpose of this talk is to demonstrate how powerful Regex can be, even for novices. Even with a very little bit of knowledge about how Regex works and how patterns work, you can get a lot of use and a lot of functionality. Regex can be intimidating, but it needn't be, because at its core it really is very simple. We're going to take you through some examples and show you exactly how simple it is and how you can start using it right away.
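As a rough sketch of the email pattern just described, here is how the call might look in JSL; the source string is the one from the talk, and the function returns only the matched portion:

src = "For help contact support@jmp.com. It's free with your license of JMP.";
email = Regex( src, "\w+@\w+\.com" );   // returns "support@jmp.com"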
With that, and without further ado, I will turn things over to Pete.

All right. Thanks, Hadley. I'll go ahead and get started here with the first example. Like Hadley said, this is an example where we're trying to extract, out of a description, what part was actually broken. There are probably many different ways you could get at this, but we're going to show you how to do it with Regex.

I'm going to create a new column and generate a formula here. I'm going to look for Regex in the filter, find it there, and then start with my description; that's what I want to run the Regex on. Then I'm going to define a pattern. If you remember what Hadley shared, there are a couple of little tricks with Regex that will make this a lot easier. The first thing I'm going to do is put in a \w, which is a character, but I want this to be more than one character, so I'll use \w and a plus. Then after that \w+, I'm looking for something that has a space and says the word "broken". As long as I type that out right, you'll see here that my formula result appears.

If I hit apply, you can see that it tells me what is broken, but it also contains the word "broken". Maybe I don't want that; maybe I just want the part, not the word "broken". To do that, I can go in here and containerize this, wrapping it in parentheses to make it the first capture group. Then I just say, hey, I only want that first captured word. If we look at the preview here, it's giving me just that. Now if I hit apply and okay, I've extracted exactly what I was looking for. This is a simple example, and you could probably think of other ways to get that specific part of the description out, but I wanted to show you how you could do it with Regex, and it really is a very simple start.
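In formula terms, the step Pete just built interactively might look roughly like this; it is a sketch, with the column name Description taken from the demo, and \1 refers to the first capture group:

// Capture the word that precedes " broken" and return only the captured group.
Regex( :Description, "(\w+) broken", "\1" )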
Let's look at a little bit more complex example. Here we have phone numbers that were entered by hand, and they have different spacing and different delimiters. Sometimes there's a one at the front, sometimes there's not. Sometimes there are extensions, sometimes there aren't. We want to reformat all of that and end up with a cleaner format; here's the end result. Unlike the last example, I think this one is a lot more difficult to do without Regex, so let's walk through how we can do it with Regex.

It's very similar. I'm going to type in Regex here, and I'm going to move this down so you can see the results pop up as we build the pattern. I'm going to put the phone numbers in as my original data, and then I'm going to start on the pattern. If we remember what Hadley said, we're looking for digits this time. Our pattern is digit, digit, digit, then something; we don't know what, so we'll put in a period with a question mark, because it could be many different things. Then we have digit, digit, digit. Let me pop this open a little so we can see it. Then again, we have a period and a question mark, because we don't know what that delimiter is, and then we have four digits. If we look at a preview, you can see it catches some of these. I'm going to hit apply, and now you can see some of these numbers were captured, but some were not, and the output format isn't what we're after.

Let's go back and open this up, and we're going to containerize those like we did in the previous example. We're going to capture three individual sets of digits. We've containerized them; we'll hit okay. Then we want an output that looks a certain way: the first set of digits followed by a dash, then the second set followed by a dash, and then the third. When we hit apply, you can see this has cleaned it up a little, and at least the output format is what we're looking for, but we're missing a few. Let's look at this one specifically: it has a space here. How do we tell Regex that there might be a space, but there might not? We go back and edit the pattern a little: we put in a potential space, a space with a question mark, because it might be there or might not. I hit okay and apply, and there you can see it captured those two with the space.

But you can also see that some of these have a one at the start, like line five here. How do we tell Regex that there might be a one there? Just like we did with the space, we go in and say, hey, there could be a one here. If we do that and hit okay and apply, you can see that it cleaned those up.

Now we're pretty happy; we've got everything in the format we want. But you can see there are other styles of phone numbers here: if people have put in letters instead of numbers, it's not capturing all of that. There's more we could do to clean these up further, but we've taken a lot of messy phone numbers and cleaned them up into a nicer format. This is a good way to use Regex. Now I'm going to pass it back to Hadley for the last example.
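Putting those pieces together, the finished formula from this demo could be sketched roughly as follows. This is our reconstruction, with a hypothetical column name Phone; the optional leading one, the optional single-character delimiters, and the optional spaces mirror what was built on screen:

// Optional "1", optional delimiters and spaces between the digit groups;
// the three captured groups are reassembled as ###-###-####.
Regex( :Phone, "1? ?(\d\d\d).? ?(\d\d\d).? ?(\d\d\d\d)", "\1-\2-\3" )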
All right, thank you very much, Pete. Very well done. What I'm going to do now is show you this example here, which uses descriptions taken from file names. The first seven characters, I think, are the name of the sample, and then comes how it was run: temperatures, sometimes included but not always; days, or sometimes weeks; times, sometimes included but not always. Let's extract all of this information. What we ultimately want it to look like is this: we are going to use Regex to extract the sample project code from the front, the stress condition from within, the temperatures as well as the mean of the temperature range, and then, if there is a time, we'd like that as well, expressed in days and not in weeks.

Let's delete all this and see how we can do it. The first thing we can do is add our project code, and we could do this in Regex, but you know what, this is actually pretty simple to do using a substring: it's just the first seven characters. There we go; let's not complicate our lives.

Now, the rest of it, I think, is a little more tricky. What I'm going to do is open up a new script. We start out the way we start all these scripts: we go in and grab all of these descriptor names. We're just going to create a list called Description with all the values in this column. Then I'm going to show the log, and you can see that if I run Description, I've now got all my descriptions.

What do we feel like starting with? I think temperature is probably a good one, because all of the temperature codes are going to be in about the same format. We're going to create a list container to hold the results, and we're going to loop over all of the items in Description. The temp code is going to equal something extracted from each description, and once we have all of these, we can just slap the whole thing into a new column.

What is that extraction going to look like? Well, it's going to be a Regex. First of all, there's our description; I think this is just description i. Then, what's the pattern? We're talking about temperatures here, so it's one digit, maybe a second digit, followed by a dash, followed by another digit and maybe a second digit, and then the letter C. What we want is this first set of digits followed by this second set of digits. If I run this, hopefully it works. There we go. As I'm doing this, I see that I probably could have gotten away with just doing this; that would have been fine too, and I probably didn't need that second one. But if it works, it works; if it isn't broken, don't fix it. There we go.
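Here is a minimal JSL sketch of the temperature-code loop described above, reconstructed from the talk; the data table and the column name Description are assumptions taken from the demo:

dt = Current Data Table();
description = dt:Description << Get Values;   // list of file-name strings

tempCode = {};
For Each( {d}, description,
    // one or two digits, a dash, one or two digits, then the letter C;
    // keep the two captured groups as "low-high"
    Insert Into( tempCode, Regex( d, "(\d\d?)-(\d\d?)C", "\1-\2" ) )
);
dt << New Column( "Temp Code", Character, Set Values( tempCode ) );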
Let's move forward. What should we do next? Let's grab our time. Time is going to work exactly the same way: we create a container for time, and we loop over the descriptions. Now, what do we want? We want our time code to equal a Regex. What does this look like? First of all, we've got our description, followed by, what's our pattern? It is the word "day" or the word "week", then one digit. Might there be two digits? I guess there might be. We're going to wrap some containers around this, so we have a day or a week, not both, and then one digit and maybe a second digit. We want our second container; we don't want the word "day" or "week", we want just the number. If I run this, let's see what the time code looks like. There, you can see that where it was able to, it grabbed the day or week value and put it in. Let's take all of this and drop it into a column. But before we do that, we perhaps want it expressed as numbers rather than characters.

What I can do is run that and express the whole thing as a number instead of a character. Now we're getting closer to where we need to be. Of course, we want to know whether these are days or weeks, and right now we don't, and that's going to affect what we need to do here. If it's days, then it's fine; if it's weeks, then we should take whatever number is in there and multiply it by seven, so that we are consistently counting days. Then we'll put that in a new column. What is that going to look like? It's going to be an If statement with another Regex: if the day-or-week word we pull out of the description equals "week", then take whatever time code we have and multiply it by seven. What did I do? I think I probably need to close that guy; sorry about that, everyone. Okay, now if we run our time code, you can see that our weeks are now multiplied by seven, and we can take all that and drop it into a column.

All right, so far so good. What's left? Oh, yes, we want the mean temperature rather than the ranges. What I'd like to show you right now is how we can make use of Regex once more, and that is to take whatever was in our temperature code and apply Regex to it again: the minimum temperature is going to be the container on the left side, and the maximum temperature is going to be the container on the right side. To set these up, I'm going to take all of this and wrap it into a loop again, like that. Now we've got our minimum temperature, our max temperature, and our mean. This is how we set it up in Regex: any time we've got a temp code, and this is the pattern, take the first one, take the second one, turn them into numbers, calculate the mean, and then slap that entire thing into a new column.
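The week-to-days conversion and the mean-temperature step might look something like this in JSL, as a sketch under the same assumptions as the previous snippet (dt, description, and tempCode carried over). For simplicity it assumes every file name contains a day/week token and a temperature range; real data would need missing-value handling.

// Time in days: capture the digits after "day" or "week"; multiply by 7 for weeks.
timeCode = {};
For Each( {d}, description,
    t = Num( Regex( d, "(day|week)(\d\d?)", "\2" ) );
    If( Regex( d, "(day|week)", "\1" ) == "week", t *= 7 );
    Insert Into( timeCode, t )
);
dt << New Column( "Time (days)", Numeric, Set Values( timeCode ) );

// Mean temperature: average the two captured groups from each temp code.
meanTemp = {};
For Each( {tc}, tempCode,
    lo = Num( Regex( tc, "(\d+)-(\d+)", "\1" ) );
    hi = Num( Regex( tc, "(\d+)-(\d+)", "\2" ) );
    Insert Into( meanTemp, (lo + hi) / 2 )
);
dt << New Column( "Mean Temp", Numeric, Set Values( meanTemp ) );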
Oops. Okay, so the last thing we want to do is grab this middle piece here, the stress condition. Now, I'm not going to walk through this in its entirety. Let me take that back: I am going to walk through it in its entirety, but some of you watching, if Regex is as new to you as it is to me, may not get this on the first try. That's the beauty of a recording: you can pause it, look at this, and try it out for yourself.

Basically, we're going through the same process. We're creating a container for stress, we're looping through all of our descriptions, and we're using each of those individually as the source. What are we saying? Well, there are going to be up to eight characters: any letter, number, or underscore, and potentially a space as well. I didn't think there were any spaces, but oh yes, there are; that's why I included it. Then I like this bit here: this is "some stuff", anything, one or more of it, I think is what that meant. What this does is it tells the pattern to start at the beginning and keep looking.

Okay, and now where are you going to stop? You're going to stop when you find "day" or "week", or a space, an open parenthesis, a closed parenthesis, or some digits followed by C, or when you get to the end of the line. When you go through all this, what are we looking to extract? The second parenthesized group here; this one was a literal open bracket. That's what we're looking for. Then just drag all of these and drop them into your column.

As you can see, this was a little bit more complicated. It used some more complex functionality, including lookahead, and I'm not going to go into the details of that right now. But I'll leave this up here so that you can see how it was done and how you would go about doing it yourself. All this says is: keep looking forward until you see "day", and then take everything before it. That's what these mean.

With that, let me open this up again, just to summarize. Regular expressions are a specification of a pattern, frequently used to clean up or extract pieces of data. You can search for a pattern and replace it with a different string, or extract different parts of the string. You can define the pattern using the Regex function or the Regex Match function, which we didn't talk about but which we invite you to check out in the help files. They contain lots and lots of information about Regex, as well as examples of how you can use it to solve the problems you're looking to solve, in whatever industry or situation you're dealing with.
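The Regex Match function mentioned in that summary is worth a look when you want the match and its captured groups back as a list rather than as a single formatted string. A tiny, hypothetical example (the file-name string is made up; see the Scripting Index entry for the exact form of the returned list):

// Returns a list containing the matched text and the captured groups, e.g. the
// temperature range and its two bounds from a coded file name.
m = Regex Match( "SAMPLE01_heat_18-22C_day7", "(\d\d?)-(\d\d?)C" );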
I would like to thank you very much for your attention, and I hope you enjoy the rest of the conference and check out the other talks. Thanks again. Bye-bye.

At the 2021 JMP Discovery Summit Americas, we presented a method for creating an “easy button” for data access, combining, cleaning, filtering, visualizing, analyzing, and generating new data. The use of the singular form of “button” is not a typo. It only takes one button to perform any combination (or all) of these techniques, thus saving time and allowing problems to be diagnosed earlier. Informed answers to questions that lead to the best possible outcomes can be made faster, ultimately saving costs and speeding products to market. We now utilize the new OSIsoft PI connector in JMP 17 to extend the sources of data that can be quickly and effortlessly imported, and to do everything listed earlier with just the push of a button.

Hi, thanks for finding our talk today. Hadley and I are going to be talking about making an easy button for data access. This is a talk that we gave previously at a former Discovery Summit, and we're going to be talking about how you can extend this capability with our new OSIsoft PI connector. Hadley, take it away.

Yes, that is absolutely right. Before I move into what we're going to be showing you today and what you can use yourself, I'd like to introduce those of you who aren't familiar with it to the JMP Analytic Workflow. Everyone watching this talk likely already knows that JMP contains all of the analytic capabilities necessary to take any data that you have, in any raw format, and transform it into insight that can then be shared throughout an organization. What we are going to focus on today are the data access and the data blending and cleanup aspects of the analytic workflow.

Why are these important? Well, any problem-solving effort begins by collecting and compiling the data. One big problem is that this can often be time-consuming and tedious, especially for scientists and engineers who don't have a background in this kind of data work. What this effectively means is that it's often not done, or not done in a timely enough manner, so problems can go unnoticed and, therefore, aren't solved. The other problem is that data can be found in many different places, and it takes effort to grab all of it, put it in the right format, and compile it together.

A solution is an easy button for quick access to data wherever it is. What we have, and what we are going to show you, is a simple interface built using Application Builder: a simplified, stripped-down option allowing people to press a button and get data from exactly where they need it, in the format they need, to be able to solve their problems. They can pick a data source and filter what is needed if necessary, even combining multiple sources and automating this. As Pete mentioned, we're going to be building on a tool that we showed previously, which used SQL, web APIs, and even manual entries, as well as combining data from other sources. Where have we shown this before? In a previous Discovery talk, so those of you watching can look at the past Discovery presentations and check those out if you like.
What we are going to do now is take it one step further from where we were back in 2022, and that is to extend it to data contained in OSIsoft PI servers. We're going to be making use of two features that were introduced in JMP 17: the ability to connect to a PI server and the OSIsoft PI import wizard. With that, I'll turn things over to Pete to demonstrate that functionality.

Thank you, Hadley. I'll share my screen here. I'm going to launch that PI importer; you'll find it in the same place you find all of the database connectors. Just like we would do for SQL, you go under File, Database, and Import from OSIsoft PI. You enter the name of your PI server and your authentication method and hit okay. Then it gives you this nice interface here, and you can browse to what you're interested in and pick out a couple of attributes or tags that you want. Let's just pick one for now. Then I can select my start time. I'm going to go back a little bit in time and shorten this query a bit so it goes a little quicker. Once you're ready, you hit Import.

This is a big improvement over what you had to do before, which involved a fair amount of scripting. The nice thing here is that once I've imported this, everything I need to pull it up again is captured right here in the source script. If I hit Edit, you can see that everything that needed to be passed into that PI data source is right here. Hadley is going to take this now and start to make our easy button. I'm going to stop sharing and pass it back to Hadley.

All right, thanks very much, Pete. I'm going to share my screen once again. What we are going to do is take that script that Pete just generated using the OSIsoft PI import wizard in JMP 17 and turn it into a simple add-in that literally anybody could use to select whatever tags they need and then grab that data. If you know what server it's coming from and what the configuration is, and you're always grabbing the same data in exactly the same way, with the only thing that might change being the tags, then this may be the stripped-down, simplified solution that anybody could use. Of course, if you had other things that you wanted to filter on, like timelines and such, that's easy to include as well. And if you wanted to take this a step further and combine these data sources, and maybe do some automation or automated analysis on them, that's an easy step from there; Pete will show you how to do that a little bit later.

What I'm going to do is take the source script that was used to generate this data, copy it, and paste it into a JMP script. Now, when I run the script, it goes back and collects the data from this tag, IA, right there. It could very well be that there are multiple tags you would like rather than just one; maybe you have a whole list of tags that you need.
What I'm going to do first is define a tag list, which may contain the tag IA as well as IB and IC, and as many more as I feel like including. This would be a good option if you were always getting the same tags every time and didn't need to select them. Here they are. What I'm going to do is run this for each one of the tags in this tag list. To do that, I'm going to make use of another relatively recent addition to JMP, the For Each function, which I believe was added in JMP 16. For each tag in my tag list, run this; my tag goes here. Instead of running IA, we're just going to concatenate the tag into the script and then go ahead and run that. Excuse me. There we have it. It really is just that simple. It'll take a few seconds, but there we've got our tags.

That's a good solution if you were always pulling the same tags. But if you wanted to take this functionality and extend it a bit to allow a user to select some tags using these configurations, I can take this code and set it up in Application Builder. Now, rather than hard-coding a list, I'm going to ask the user to select the tags from a list here. We'll just add a few tags to that: let's add tags IA, IB, IC, and maybe one more, kilowatt A. There they are. Now we'll add a button that the user can press to grab whatever they've selected in the list and then get the tags. Button 1 is a good enough variable name, but we need a better label so the user knows what to do. There we have it. When we press this button, we are going to have it run the script that we just wrote; of course, instead of getting the tags from the hard-coded tag list, we'll have it grab whatever the user selected from our List 1 list box.

Can it really be that simple? Yes, it can, and yes, it is. Of course, if we wanted to extend this functionality, at this point the sky, our imagination, and our needs are the limit. With that, I will pass things back to Pete to show you how you can go ahead and do that.
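To make the idea concrete, here is a rough JSL sketch of the loop and the stripped-down window described above. The PI import itself is represented by a hypothetical helper, importPITag, standing in for the source script the wizard generates; the helper, the tag names, and the window layout are illustrative, not the presenters' actual add-in.

// Hypothetical stand-in for the wizard-generated source script, with the tag
// name concatenated into the query. Replace the body with your own source script.
importPITag = Function( {tag},
    Print( "Importing tag: " || tag )
);

// Fixed list of tags, imported one by one with For Each (available since JMP 16).
tagList = {"IA", "IB", "IC"};
For Each( {tag}, tagList, importPITag( tag ) );

// A stripped-down picker: the user selects tags and presses one button.
nw = New Window( "Easy Button",
    lb = List Box( {"IA", "IB", "IC", "kW A"} ),
    Button Box( "Get Tags",
        For Each( {tag}, lb << Get Selected, importPITag( tag ) )
    )
);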
Wow, Hadley, that really does look easy. Very nicely done. Why don't I share where we went from here? Let me share this screen. All I did was take what Hadley had shown and add a few more tags. The next thought is, "Hey, it's great that the PI importer brings in these tables individually, but what happens if I want to bring them together?" Let me just show what this does first. I'll do the data pull; this is what Hadley showed. Then I'm going to do the next step, which is a data compile. This takes advantage of the Workflow Builder, and we'll walk through and actually build it, since it's really easy to do. I'm going to just pick a couple of things to compile here. There you go. Now I'll show you how all of this was done.

Basically, what JMP has done is it went through, grabbed a bunch of those data tables, concatenated them, and then split them apart, and all of these steps are here: it concatenated those data tables, it split them apart, then it recoded the column names, and finally, it made a simple report. So let's walk through how we would do this inside the Workflow Builder. I'm going to close out of these, minimize this, and just start with those three tables that were pulled from the data. Here I have the IA, IB, and IC metrics that I'm looking at.

To start a workflow, you'll find it under File, New, and Workflow. All it is doing is grabbing steps out of the log when I tell it to. If I hit record, it will capture everything I do: any table manipulations, any joining or splitting of tables, any renaming or recoding of variables. All of that will be captured in here. Let's start with that. The first thing I'm going to do is concatenate these. Under the Tables menu, Concatenate: I have A there, and I want to add B and C. I'll give it a name; we'll just call it Stacked Data and hit okay. There you can see that this was stacked, and it's captured here as well; everything I did was captured. I want to back up here: anything I do while the recording is on is captured. With recording off, it won't capture anything; with it on, it will.

Let me back up, start over, and show this one more time, because I forgot to click one button there: I want to add a source column. So, Tables, Concatenate again, call it Stacked Data again, and hit okay. That was my first step. The next thing I'm going to do is split this apart, because I actually don't want it stacked; I want the metrics together in the same table, but in separate columns. So we go to Tables and Split. I want to split by that source column, which is why I redid the concatenate to include it. Here you can see I'm splitting by the source column. This preview is also a new feature. I don't know about everyone else, but I used to struggle with this; I wasn't quite sure what I was going to get, especially with things like Transpose, Split, and Join, some of the more complex table operations. Here I have all of my 500 rows of data for each of these different metrics, but what I'm missing is a timestamp. Without the preview I might have done this wrong, but now I can see I want to group by timestamp, so that each of these values has a timestamp associated with that particular metric. Now I'm going to call this Split Data and hit okay.

There, back in my workflow, you can see I've concatenated and I've split, but now I have this big ugly column name that I don't want.
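For readers who prefer script to the interactive recording, the two table steps just shown could be sketched in JSL roughly as follows. This is a reconstruction from memory of the Concatenate and Split messages, with table and column names taken from the demo; the exact option and column names are best checked against the script the workflow itself saves.

// Stack the three imported tables, keeping a source column, then rename the result.
// The source column created by Concatenate may be named differently (e.g., Source Table).
stacked = Data Table( "IA" ) << Concatenate(
    Data Table( "IB" ),
    Data Table( "IC" ),
    Create Source Column
);
stacked << Set Name( "Stacked Data" );

// Split the stacked values back into one column per source, grouped by timestamp.
splitData = stacked << Split(
    Split By( :Source ),
    Split( :Value ),
    Group( :Timestamp )
);
splitData << Set Name( "Split Data" );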
There's a nice feature inside JMP to recode these column names. If I go to Columns and Column Names, there's Recode Column Names, and it works just like Recode for your normal data. I'm going to do a little advanced extract-segment step here: I want to pull out just the portion at the very end. That looks like the right values, so I hit okay, then I hit Recode, and there we go.

The last thing, and you'll notice again that this is all captured, is to make that graph. I'm going to go to Graph, grab those metrics, A, B, and C, and plot them against the timestamp. Now, you'll notice one thing: this is not added to the workflow yet. I can hit Done and it's still not added. The reason is that I could still be making changes to it. Maybe I don't like this name and want to call it Metric versus Time. Maybe I don't like the format of these timestamps, so I can change that. I'm making all of these changes, and the workflow doesn't capture them until I close this. When I hit Close, there you go.

I'm going to stop recording now, close the two new tables that were made, and show you that this works. There we go. Now you may be asking, "Well, why would I use a workflow? Why not just use a script? What's the advantage?" Let's close out of this and I'll show you why. I gave you an accidental preview of this earlier, but we'll show it here.

We're back to our application. I went through and pulled these three tags, and when I concatenated them, the workflow was looking for those three data tables with those three names. What if I pulled different tags? Let's say I just want all of the data with an A at the end here, so I'm going to do a data pull on those. If I was using a script that was looking for those specific data table names, the script wouldn't work, but the workflow has this generalizability. If I look here in this Concatenate Tables step, it's looking for three tables, IA, IB, and IC, and I don't have those tables open. I have IA open, but the two other tables I have open are different. Let's see what happens when I run this.

It prompts me. It says, "Hey, what data do you actually want to compile? You have different data sources here." I actually want to compile these three that I have open. Now it says, "Oh, wait, I couldn't find the column names." Again, when I went through and recoded columns, a script potentially just wouldn't work because it couldn't find that column name. But here it says, "Hey, I can't find this column IB. Which one is that?" Let's just use a replacement column. There we go, and it worked. The workflow has the ability to be generalizable with these references. By default, a workflow can use replacement references, and I can manage them: here you can see the tables, and I can prompt you to pick those tables, and here are the columns that are referenced, so I can have substitutes there.
Unlike a script, this will prompt you if it doesn't find what it's looking for, so it's very nice in that aspect. That was basically how to build a workflow and then use it to compile data and have it be generalizable. I'm going to pass it back to Hadley here for some closing thoughts. Thanks very much. Let me just share my screen. In summary, making an easy button for data access solves some problems and makes a lot of things easier. What it does is address difficulties in accessing data, because problems persist longer than needed when you don't have access to the data. Getting the data into the right format is really 80% of a solution. Once you've gotten the data collected, formatted, compiled, and cleaned, doing the rest of it is really the fun and easy part. Creating these buttons allows data to be quickly and easily imported. It's possible to add filters; that's not something we showed today, although I guess selecting from the list was one of the filter options, and of course you can always add others. You can see that in the previous presentation that we did back in 2022, as well as how to extend this to SQL, web APIs, and any other place that your data may be. There are two add-ins on the community that we'd like to mention: the OSIsoft PI Importer as well as the PI Concatenator. You can find those if you just Google them or look on our community, community.jmp.com. Thank you very much.
Working as a manufacturer in the biopharmaceutical industry means we often need to show that we obtain similar results on different sites when transferring a manufacturing process and, notably, when scaling processes up or down. Comparison techniques such as t-tests and ANOVAs are widespread, but equivalence testing has become a standard way to show two processes behave similarly. When we look at a few parameters, those techniques are easy to apply, but when we have large numbers of variables, it becomes difficult to see the bigger picture. The challenge with equivalence testing is that it requires the scientists to provide a value for what they deem an acceptable difference between the groups of data. In addition, many processes change over time, and we are interested in capturing whether they behave similarly across the duration of the process. JMP scripting is a great way to automate the data prep, visualisations, and production of all the plots and comparison tests for those data sets. The multivariate platform in JMP helps create a holistic picture of the process for each time point. We can now use equivalence testing and relate it to the individual variable contributions. Hello, everyone. Thanks for joining my JMP talk today. Today, I would like to talk to you about how we look at equivalence between sets of batch data over time at Fujifilm. In particular, I'd like to speak about a new multivariate take on the two one-sided t-tests. Although this particular bit is a new comparison technique, it doesn't replace, but rather complements, the usual single-point techniques, and in the workflow we are still relying heavily on those. The last bit here will describe how we compare data sets, from the time series we get, for two different scales, but it could be any logical group. I'll quickly go through how we prepare the data, or rather get it into a state where we can run the scripts, and we will look at the visualizations for the usual single time points in JMP, and we'll also look at some scripts that I use to run PCA on all the variables and test equivalence. Talking about TOST, or equivalence tests: the TOST is a two one-sided t-test, and it checks whether, on average, your two data sets for a given variable are equivalent. Very similarly, the multivariate test checks whether, on average, the nth principal component, or PC, for a given day of a fermentation here is equivalent for two different scales. You need the data to be in a specific format, and in particular you need two different groups; here it's two scales, and if you have more than two, you will need to split them into sets of two. It's more suitable for time series, because it's a data reduction technique that you wouldn't need if you didn't have a lot of data points. This was originally part of a script that was all done in R, but as I moved into using JMP for visualizations more and more, I thought it was much easier to use, especially if we want to pass those scripts on to staff; R is not always that accessible. I have moved a lot of the script into JMP by now. The only thing that's still in R is the data imputation.
The scripts and pre-work take care of outliers, missing data, and inconsistent entries; R does the data imputation. Why have I moved to JMP? First of all, it was because I visualize in JMP. Why is it good to visualize data in JMP? Because JMP is just made for that. It's really good for looking at missing data and outliers and any graphs, and the time series are no exception. The missing data visuals in JMP give you a color map of where your data is missing, so you can find rows where data is missing. That means a day might not be the best to keep in the data set. You can immediately visualize chunks of missing data, which would be days in a row, and you can make a decision on whether you want to keep all those days or do the analysis twice, for example, and it will quickly show you if you have data missing from one group and not the other, in which case you'd have to do away with that variable altogether. Also, outliers need processing before you interpolate; otherwise they will have a huge effect on the PCA and the comparison test. There is an outlier detection platform in JMP, but here in our workflow it is used in combination with watching the time series and the comparisons. I'll show all that in the demo. Then the Graph Builder is used to plot all the time series. There are many ways to do that, but I will show you in the script the two main graphs that we use to check that our data is good to go. Here they are, the time series, and those in particular are wrapped by batch, so that you get an individual plot for each time series. It's a small plot, but usually it's enough to spot missing data, outliers, or any weird or different behavior. Here, for example, let me get the laser on. There we go. Here we have a cluster of points that are questionable. We need to check whether this is behavior that we want to capture or if it's unusual behavior that we want to impute along with the rest of the missing data. Or we could have single outliers like here. We see this quite often, but here you can notice it's all on the same day. So is it something that happens on day six in those fermentations? Another way to plot time series, and we do that as well, is to actually overlay them. By overlaying them, you can see whether your data is consistent for a given day. Here we have individual value plots, which is what we look at. But I've also asked JMP to put a little box plot around all those data, because this mimics what we have when we are doing our ANOVAs in the second step of our data visualization. This is very typical of what happens in our processes over time. In the first week, the data shows very low variability. The box plots are small and they are usually fairly well aligned; the average is around the same value. When we reach the second week of the fermentation, things start to drift apart and get much more variable.
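For readers who want to reproduce that overlaid view, here is a rough JSL sketch. The column names (:Time ID, :Variable 1, :Run Type) are placeholders rather than the presenter's actual script, and the modeling-type change is one way, under my assumptions, to get one box per day.

dt = Current Data Table();
Column( dt, "Time ID" ) << Set Modeling Type( "Ordinal" );  // so each day gets its own box plot
dt << Graph Builder(
	Variables(
		X( :Time ID ),
		Y( :Variable 1 ),
		Overlay( :Run Type )      // large scale vs. small scale
	),
	Elements(
		Points( X, Y ),           // the individual value plot
		Box Plot( X, Y )          // the box plot drawn around each day
	)
);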
If you were plotting day six by itself in a one-batch-per-day type of plot here, you'd be able to see what the difference is on average between the large scale in red and the small scale in blue. On day six, you have a small difference, and on day 12, you have a large difference. Those differences are what we are looking to test when we do single-point t-tests with our ANOVAs. Before we carry on, I just want to quickly recap the differences between a t-test and a TOST. A t-test is completely statistical, whereas a TOST requires user input for a practical acceptable difference. In a t-test, you hypothesize that there is no difference between the means, and if you get a small p-value, a significant result, then you reject that and say there is a difference between your data sets. A TOST tells you that there is no difference between your data sets if you have a significant result. If you fail a t-test, the confidence interval for the mean difference, which is that black square in those plots, does not cross the zero line. But if you fail a TOST, the confidence interval for the difference is not contained within the practical acceptable difference. You have two outcomes for a t-test and two outcomes for a TOST, which means you have four combinations. Either you pass or fail both, or you pass one and fail the other. In JMP, there are different platforms that do TOSTs and t-tests, but usually you will have a normal distribution plotted for the difference. If you pass a test, then your mean difference is in that little bell curve; if not, it's outside. So you can quickly visualize which ones passed or not in an ANOVA. Here they are, the ANOVAs; that's step two of our visualization and cleaning process. We use a script to plot all of those together in a report so that we can look at all of them. If you think about the data set here, it was about 15 variables over 12 days, so you have over 150 such plots, which is a lot of data to look at, especially if you change things and plot them again. But here are some examples of what you might see. You might pass a t-test or fail it. You might pass a t-test, but only because you have an outlier that's pulling one of the data sets up or down, for example. There are many possible results that you would get here. Not everything is on this screenshot, but we're also looking at the variance in that report. For the principal component comparisons, we don't do this with the script here; I use the Graph Builder to actually plot those, because they are like the plots I showed a couple of slides earlier. But the difference that you can see here is in the scale. Because we're talking about principal components, the scores tend to be around zero on average, and they vary between minus three and three, because we normalize the data before carrying out the PCA.
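Written out in standard notation (a textbook sketch, not JMP output), the equivalence test used throughout this workflow is, for a practical acceptable difference \Delta:

H_0:\ \lvert \mu_1 - \mu_2 \rvert \ge \Delta \qquad \text{vs.} \qquad H_1:\ \lvert \mu_1 - \mu_2 \rvert < \Delta

Equivalence is declared when both one-sided tests reject at level \alpha, which is the same as requiring the (1 - 2\alpha) confidence interval for \mu_1 - \mu_2 to fall entirely inside (-\Delta, +\Delta).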
The advantage of this is that now, instead of having to provide an acceptable value for a TOST, because the data is normalized, we can actually blanket-calculate that acceptable difference by taking a multiplier of the standard deviation of our scores. Here we have the first principal component, and it clearly shows there's a big difference between the large and the small scale. Here, with the second principal component, there is a smaller difference. This is typical of what we see, because the first principal component tends to capture the broader shape of the fermentation profile. So if there is a difference in that broader shape, the TOST for the first PC tends to fail. Typically, what I've seen is that for our data, two principal components can capture about 60% of the information in the variables. For those of you who may have done PCA on data before, that may seem a low number, but that's probably because all the variables have a different story to tell. Another thing I'd like to spend a little bit of time on is the loadings plot. This is part of the PCA platform in JMP; it is the plot at the very top of the platform, and it's a good one to look at if you're a scientist who is more interested in what's really going on. But the reason I have it on a slide here is that it is a good representation of how much each variable contributes to the model that we're choosing before doing the equivalence test. Here, for example, all the variables related to viability for our fermentation are highly correlated, because they are close together, and the way they project onto PC one and PC two here, they get high values. We said that those map well to the first two PCs, so that means they are contributing a lot to the model. Here we have some other variables that are closely clustered together: sodium, potassium, and glutamine. They are highly correlated. They map very well to the first PC, but not to the second, so they don't contribute a lot to the model through the second PC. Then here you have problematic variables. They do not map well to either PC. That means that in a two-PC model, you are not going to capture the behavior of those variables. When you see this, you already know that a two-PC model is not going to give you a lot of equivalence for those variables. The last step is to actually plot the TOSTs. This is done again using a script. Those graphs are not the graphs that you would usually find in JMP, but they're pretty typical of TOST plots. For each PC for each day, we will have a TOST, or equivalence test, result. If the confidence interval is outside of the acceptable range, which is three times the standard deviation of the scores in this case, then we fail the TOST. When we fail the test, we give it a zero in the script. To summarize what happens here: each PC will capture a certain amount of the variability, and each PC can pass or fail a TOST.
Furthermore, each variable contributes to a certain extent to each PC. A principal component is a linear combination of all the variables. Altogether, a variable that has a strong contribution to a principal component that passes a TOST will have a strong impact on the overall equivalence between the batches. This is what we are trying to put together. How do we put this together? There are many ways we could do it. I have done something pretty simple here: it's just a sum-product of passing or failing a PC, times the contribution of the variable. Basically, for example, with two PCs here, we're failing the first equivalence test, and this variable had a 40% contribution, so that gets zero, plus one for passing the TOST, times the contribution here to that PC. That's the overall score for it. In black, you have the basic scores for each day here. Let's call it the IEQ; you'll see that in the tables later. On average, we're getting about 70%, which is not too bad over the course of the fermentation. Adding PCs doesn't make a very big difference, because this variable mapped very well to the two PCs. The pH, which was one of the variables that did not map very well to the first two PCs, gets a really bad average score if you have only two PCs in the model, around 30%. But if you add another four PCs, that number goes up to over 80%. There are no good or bad numbers here, but it's something you need to keep in mind: it really depends on the model that you choose for running this. Moving on, this is the very last output from the script, and that's what we're really interested in, especially if we are comparing different processes or different ways to run them. You have a bar chart of all your individual equivalences. This shows you, by variable, which ones are similar from one scale to the other. Here we have three variables that are pretty similar among batches, and then it really drops down to the last one here, which has a very low equivalence. In the top right corner here, JMP will put an average if you ask for it. That's a good metric, although it's very reductive. It's a good metric for comparing the same process if you're using different ways to run the TOST or different numbers of PCs. I have more slides about this, but I think it's better to run straight into the demo in the interest of time. I've put a little JMP journal together. I'm not very good at this; I hope it's going to work. In JMP, we'll just look at the data, the three scripts, and two different ways to run the last one. That's the anonymized data set here. My computer is very slow. There we go. I think it's working; it's just really sluggish. In this data set is the bare minimum that you need: a run type with two groups, a batch ID, which is a categorical variable despite being a number here, and the time ID; in this case there is one recording a day, so it runs over a number of days. The first bit we do is plot all the time series.
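For reference, the sum-product score just described can be written out as follows (my notation, restating the talk's description):

IEQ_v(d) = \sum_{j=1}^{p} \mathrm{pass}_j(d)\,\cdot\, c_{v,j}(d)

Here pass_j(d) is 1 if PC j passes the TOST on day d and 0 otherwise, c_{v,j}(d) is the contribution of variable v to PC j on that day (in the script, derived from the cosines pulled from the PCA report, as far as I read the description), and p is the number of PCs in the model. In the example above, that is 0 times 0.40 for the failed first PC, plus 1 times the contribution to the second PC.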
I've left this with a few bits and bobs that are not really good, so that I can point them out. All the scripts in this group, and there are three of them, basically do the same thing at the start: a bit of cleaning up and prepping. Each one creates a clone of your data table to work on without breaking your data table, and also a directory to save all the outputs from the script. Then the scripts basically loop here over the number of variables, plotting them one at a time and putting everything in a report. Let's run this. This is a very generalized script and it works well on all the data sets. I'm always nervous that things are not going to work because it's so slow. Here we go. It's still thinking; I'll just be patient and wait. This will plot the wrapped-by-batch time series and the overlaid time series as well. You'll have one of each for each variable. It will say Variable 1, the actual name of your variable, here, and plot them. This is what we want to see, basically. We have the same shape for all the batches and they are consistent across the scales. We'll move down to one that doesn't look as good. There we go. For this variable, we have roughly the same shape for this scale, and then a very different shape for the small scale. We need to find out why this is happening and whether we want to keep this variable in the model. In particular here, we have one batch that is very badly behaved. If you look at this in an overlay plot, it is very obvious that this average curve doesn't represent either the large scale or the small scale. This is a variable that you need to come back to. Variable 6, I think, was in the presentation. This shows you where you have some outliers that you may have missed the time before. Then another thing you need to look for in those plots is whether your small and large scale numbers are mingled together. If your red and blue points are all mixed together, then chances are your scales are pretty similar. But in some cases, like here, for example, the small scale data is almost always above the large scale data, so you can expect to see a difference here. Lastly, here I have variable 11, which is really, really bad. This happens quite often to us when we have a difference in how the variables are recorded. Here it was actually a different unit, and that's why we have very different numbers. Now, when you put the graphs in a report like this, unfortunately, you lose my favorite feature in JMP, which is interactivity. You can't actually highlight a point or a series of points and go see what they are doing in the table. But the script is saving all those individual plots for you. Here we are. In here, it's created a directory with all the plots that we've just seen, plus it saved that clone with the time series tagged at the end. In here, you can see, if I can actually use my mouse, those time series one at a time.
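The looping structure the script uses is roughly the following JSL sketch, written under my own assumptions: the column names, the variable list, and the output folder are placeholders, not the presenter's actual script.

dt = Current Data Table();
outdir = "$DOCUMENTS/TimeSeriesPlots/";
Create Directory( outdir );

vars = {"Variable 1", "Variable 2", "Variable 3"};   // the measured variables

For( i = 1, i <= N Items( vars ), i++,
	ycol = Column( dt, vars[i] );
	gb = Eval( Eval Expr(
		dt << Graph Builder(
			Variables(
				X( :Time ID ),
				Y( Expr( ycol ) ),
				Wrap( :Batch ID ),       // one small plot per batch
				Color( :Run Type )       // large vs. small scale
			),
			Elements( Points( X, Y ), Line( X, Y ) )
		)
	) );
	Report( gb ) << Save Picture( outdir || vars[i] || ".png", "png" );
	gb << Close Window;                  // keep the session tidy
);

The presenter's script additionally appends each plot into a single report window (which is what loses the live interactivity) and saves copies that can be reopened interactively; the PNG export above is just a placeholder for whichever save step you prefer.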
Then you can select the points as you normally would to use the interactivity in JMP. If you have several open at the same time, all of them will be highlighted in all your plots. That's it for time series. We'll move on to the second part of the process, and that's looking at all your ANOVAs. This is in the Fit Y by X platform, and like the other script, it does a bit of tidying up at the start, and then it loops over days and creates a subset of the table for each day. Then there's a report on the differences between the groups. If you have written scripts before, you will see that this is pretty typical of writing a script in JMP. Some of it is written by hand, and a lot of it, the bulk of what's happening, I basically ran in JMP and copy-pasted into the script. We'll run this on this dirty data set. Hopefully, it doesn't take too much time. If you have a fast computer, I believe you would not even see those windows open; it would be instant. You can see here how JMP is basically going to a given day, taking a subset of that table, running a script, and saving it to that little data table, and it's doing this for every day, and we have 12 days here, so it takes a while. This will also save everything in its own folder. In this case, we'll just look at one of the saved reports. Here you have a subset; all of this is for day one, but it's a much smaller table. Here you have the report that you and your scientists would want to look at, which shows you all the t-tests. Now you can look at all those t-tests just to see if they pass. You could count how many pass and take a proportion of passing t-tests. But this is also a good place for finding those more subtle outliers, because each box plot might have some data that you want to question. Again, you would highlight those points and check whether you want to keep them in your final data set or not. Moving on again. We're finally at the last bit, which is probably the most interesting: all the PCAs. This is a much bigger script, because it has to fetch information from the JMP platforms. I don't have a lot of time for this, but I can answer questions at the end if you're interested. The other thing with this script is that I have hard-coded some bits, so it needs to be modified for every data set; I need to fix that at some point. For example, here it's actually doing a principal component analysis on one of the days, so a subset of the data table. Then we switch to the PCA report, and this becomes an object in your JMP script. Then from this object here, you can get items. For example, I run the PCA and I have it as an object, and now I say I want the eigenvalues in there. The way to find the objects that you need is to open the tree structure in your JMP report; everything is numbered and aligned, so you can get everything that you need from the JMP report as a value, as a matrix, or as an array. It really depends on what you want.
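As a minimal JSL sketch of that pattern: the outline box title and column names below are assumptions, so use the report's tree structure, as described above, to find the exact names in your own report, and note that the Eigenvalues outline may need to be turned on from the red triangle first.

dt = Current Data Table();
pca = dt << Principal Components( Y( :Variable 1, :Variable 2, :Variable 3 ) );
rep = Report( pca );                        // the report becomes a scriptable object
eigs = rep[Outline Box( "Eigenvalues" )][Number Col Box( 1 )] << Get As Matrix;
Show( eigs );                               // eigenvalues pulled out as a matrix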
But you can see I've done this here. Once it has all these values, I extract the principal components and I fit, again with Fit Y by X, the principal components versus my scales here. Again, I switch to the report. I'm doing this so that I can get the root mean square error from that report. That's because it's the best estimate I will have for my standard deviation. I'm using this standard deviation here to blanket-calculate my acceptable difference for my TOST. I can finally actually run my TOST here. So again, that's another group, and this time it's Fit Y by X, but I'm asking for an equivalence test with delta as my acceptable difference. The rest of the script plots all the TOSTs, and it's very boring. Then at the end, it creates a table with all the outputs and all the things that we need to create our bar chart, and eventually we could also create the bar chart. We'll run this for this data set. Just checking I have the right one. There we go. There it is. I'll click on it now. You can see it in the background here. It's subsetting the tables, and it's doing this painfully slowly. For every day, it will select the day, make a smaller data table, do a PCA on all the variables, and then it will save the principal components, the eigenvalues, and the cosines for further calculations. It will use the principal components first for doing a t-test, because that's where we're going to get our estimate of the standard deviation, and second for doing an equivalence test to check whether it passes equivalence. I think we're on day seven; we're going to get there eventually. It will also plot all our equivalence tests, and it will also create the bar chart and the new directory. Bear with my computer. Well, this is taking longer than it should, really. I hope it's going to work. Sometimes scripts that are quite busy mean that it's hard for JMP to catch up with what's happening in the background. I hope it's not going to fail because of that. No, here we go. It's now created a report, and for each day, it puts each TOST in a column of graphs. I have written the script in such a way that they're all the same size, and that was suggested by one of our scientists, actually, so they're much easier to compare. Here we had data that really needed some extra cleaning up, so it comes as no surprise that all our equivalence tests for the first principal component are failing. That's because the PCA is done on variables that are not similar between groups. But the more subtle behavior that's captured in the second PC is still passing a lot of the equivalence tests. I'll close this to show you what's been saved in the directory for this one. For this, you have the individual subsetted tables, each with its PCA and its saved script. Even opening a small table like this is taking a long time. There we go. Here are the PCAs. Here's the loading plot. This is where the eigenvalues come from, and here are the cosines, which are pulled out by the script.
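Pulling those pieces together, the core of the per-day calculation looks roughly like this in JSL. This is a sketch under several assumptions: the column names are placeholders, Means( 1 ) is assumed to be the option that turns on the Means/Anova report (which holds the root mean square error), and the one-argument Equivalence Test message is assumed; check the Scripting Index for the exact syntax in your JMP version.

dt = Current Data Table();
d = 6;                                      // one fermentation day
sub = dt << Subset( Rows( dt << Get Rows Where( :Time ID == d ) ) );

pca = sub << Principal Components( Y( :Variable 1, :Variable 2, :Variable 3 ) );
pca << Save Principal Components( 2 );      // adds score columns (Prin1, Prin2 in my JMP; check yours)

ow = sub << Oneway( Y( :Prin1 ), X( :Run Type ), Means( 1 ) );
rmse = Report( ow )[Outline Box( "Summary of Fit" )][Number Col Box( 1 )] << Get( 3 );

delta = 3 * rmse;                           // blanket acceptable difference
ow << Equivalence Test( delta );            // assumed message for the TOST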
It has the TOST results that are used for making the TOST graphs, but we've already seen those. It has a table that shows you which TOSTs passed, with a zero or a one here, and the explained variance, and the calculations for the explained variance in the same table. These columns here are what we're going to use to create our bar chart. The bar chart gets saved in the journal in this case. There are many ways you could do this, really. For 15 variables and not the best of cleanup jobs, let's see what equivalence we get here. It is all working; it's just really slow. Sorry about that. There we go. I've had, again, feedback from scientists saying that they would prefer to see the variables in the order they were in originally, because most of our data is recorded in templates, so people are used to seeing those variables in order. But it's also nice to have it in descending order so that you can quickly see which variables are quite equivalent and which ones are not doing so well. Here, on average, we have 21% equivalence across all our variables. It's not a very high number. I don't have a criterion for that number, but I think around 60%-75% would be quite desirable. I'll close everything I can to make some space. We'll go back to see what happens if we remove one offending variable. I haven't done enough cleaning up here, but I'm removing variable 11, which was really not an acceptable variable to have in our data set. I will run the TOST with three PCs this time, so that I can at least have a shot at capturing the variability in things like pH or pO2, which tend to be much more complex. We'll run this one and we'll have a look at the bar chart and see how much equivalence we can capture. I suspect this is going to be slow again. This is going slowly; we're only on day two, so I need to fill up the time. As I said, we don't have a criterion for this total number; it's more of a relative number. Either you have a set of criteria for cleaning up your data, or, maybe because you are running batches and recording them in similar ways, you would say we will always only look at those 10 variables, and then you can compare the overall equivalence or the bar charts for given sets of variables that are comparable. The other way you could do it is by using the same data set, like I have today. I know we have 21% equivalence for 15 variables, but once we remove variables 11 and 5, for example, and clean up some of the outliers, that number starts going up. Or it could be that we have only 21% with two PCs, but if we add a couple, because some of the variables don't map very well to the first two PCs, then this number also goes up. It's very difficult to put a criterion on that number, but it's pretty good for comparing different models or different data sets that have been treated reasonably similarly. How are we doing here? Almost there. I'm very sorry about this; my computer is particularly slow today. Here we go.
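The bar chart itself is just a Graph Builder call on the summary table the script creates. A hypothetical sketch follows; the table name, column names, and the Caption Box options are all assumptions rather than the presenter's actual script.

ieq = Data Table( "IEQ Summary" );          // hypothetical summary table name
ieq << Graph Builder(
	Variables( X( :Variable ), Y( :IEQ ) ),
	Elements(
		Bar( X, Y ),                                       // one bar per variable
		Caption Box( X, Y, Summary Statistic( "Mean" ) )   // the average shown in the corner
	)
);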
Here are the TOSTs, and this time there are three PCs, so they're aligned in threes. I think if we made this bigger, it would start sticking out of the window here. Because we have removed one variable already, we can see that some of the TOSTs are passing even for the first PC, so that's definitely made a very big difference. I will close those and go back into the directory that was created. The way I've written this, if I'm doing two data sets in the same directory, it's going to get erased, because Save As in a JMP script will save on top of existing files if they have the same name. Here it was the same data; we just removed one variable and added one PC, and we went from 21% to about 47% on average across the variable equivalences. That's showing you what a big difference it can make, from just a small cleaning step or from choosing a slightly different model with one more PC in this case. That's it from me. I've gone through all the scripts. I'll put my very last slide back up here to conclude. This is a new technique to look at equivalence, this multivariate technique. I haven't seen it used anywhere else. It's a complement, not a replacement. You should still, especially if you're heavily involved with the data, be looking at all the time points that you're interested in. It gives a holistic picture with a lot of detail, because you have a lot of output. But if you're only interested in the final information, really that bar chart gives you a lot of information in just one graph. You could do this with any types of groups that you want. This happens to be scales, because we look at the difference between manufacturing and lab scales a lot at Fujifilm. That's it, really. It's your multivariate two one-sided t-test, as part of our process flow to look at scale-up and scale-down data. I'd be happy to take any questions.
Since the Functional Data Explorer was introduced in JMP Pro 14, it has become a must-have tool to summarize and gain insights from shape features in sensor data. With the release of JMP Pro 17, we have added new tools that make working with spectral data easier. In particular, the new wavelets model is a fast alternative to existing models in FDE for spectral data. This presentation introduces these new tools and how to use them to analyze your data. Hi, everyone. Thanks for coming to our video. My name is Ryan Parker, and today I'm going to present with Clay Barker some new tools that we have added to analyze spectral data with the Functional Data Explorer in JMP Pro 17. First, I just wanted to start off with some of the motivating data sets that led us to add these new tools. They're really motivated by these chemometric applications; they can definitely be applied to other areas, but, for example, we have this spectroscopy data where the first thing you might notice is that we've got a lot of data points sampled, but we also have some very sharp peaks in our data. That's going to be a recurring theme: we need to really identify these sharp features, and the existing tools we have in JMP have a little difficulty capturing those. For example, we're thinking about the composition of materials or how we can detect biomarkers in data. These are three spectroscopic examples that we'll look at. Another example of data that is of interest is this mass spectrometry data. Here we're thinking about a mass-to-charge ratio that we can use to construct a spectrum where the peaks represent proteins that are of interest in the area of application. One example is comparing these spectra between different patients, say a patient with cancer and a patient without, and the location of these proteins is very important for identifying differences between the two groups. Another example is chromatography data. Here we can think about running a chemical mixture over a material that's going to help us quantify the relative amounts of the various components that are in these mixtures. By using the retention time in this process, we can try to identify the different components. For example, if you didn't know this, I was not aware until I started to work with this data, trying to impersonate olive oil is a big deal. We can use these data sets to figure out what's a true olive oil, or what's just some other vegetable oil that someone might be trying to pass off as an olive oil. The first thing I want to do is go through some of the new preprocessing options that we've added to help work with spectral data before we get to the modeling stage. We have a new tool called the standard normal variate; multiplicative scatter correction, for when you have light scatter in your data; and the Savitzky–Golay filter, which is the smoothing step for spectral data that we'll get into.
Finally, there is a new tool to perform a baseline correction on the data, to remove trends that you're not really interested in and that you want to get out first. Okay, so what's the standard normal variate? Currently in JMP, we have the ability to just standardize your data in FDE. But when you use that tool, it's just taking the mean of all of the functions and a global variance and scaling the data that way. With the standard normal variate, we're thinking about the individual means and variances of each of the functions to standardize and remove those effects before we go to analysis. When I'm on the right here, after performing the standard normal variate, we can see, okay, there were some overall means, and now they're all together, and any excess variance is taken out before we go to analysis. Multiplicative scatter correction is the next step, and it's an alternative to the standard normal variate. In some cases, whenever you use it, you may end up with similar results. The difference here is the motivation for using multiplicative scatter correction: that's when you have light scatter, or you think you might have light scatter, because of the way that you collected the data. What happens is that for every function, we fit this simple linear model where we've got a slope and an intercept, and we use those estimated coefficients to standardize the data that we're going to work with. We subtract off the intercept and divide by the slope, and now we have the standardized version. Again, you can end up with similar results as with the standard normal variate. Now, the next preprocessing step I'm going to cover is the Savitzky–Golay filter. When you have spectral data, before we get to the modeling stage, the new modeling tools we have are developed in such a way that they try to pick up all the important pieces of the data. If you have noise, we need to do a step where we smooth that first. That's where the Savitzky–Golay filter comes in. What we're doing is fitting an nth-degree polynomial over a specified bandwidth that we can choose, to help try to remove any noise from the data. In FDE, currently, we're going to go ahead and select those best parameters for you, the degree and the width, to try to minimize the model error that we get. One thing I do want to point out is that we do require a regular grid to do this operation, which will come up again later, but FDE is going to create one for you. We also have this reduced grid option available if you want finer control first, before you rely on us making that choice for you. The nice thing about this Savitzky–Golay filter is that, because of the way the model is fit, we now have access to derivatives. This is something that had come up prior to spectral data, and now that we have this, we've got a nice way for you to access and work with modeling these derivative functions. The last one I want to cover is the baseline correction.
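As a quick reference before the baseline-correction walkthrough, the two standardizations just described can be written as (standard formulations, not JMP's internal notation):

\text{SNV:}\quad \tilde{x}_i(t) = \frac{x_i(t) - \bar{x}_i}{s_i},

using each function's own mean \bar{x}_i and standard deviation s_i, and

\text{MSC:}\quad x_i(t) \approx a_i + b_i\, m(t), \qquad \tilde{x}_i(t) = \frac{x_i(t) - \hat{a}_i}{\hat{b}_i},

where m(t) is a reference spectrum (typically the mean spectrum) and \hat{a}_i, \hat{b}_i are the fitted intercept and slope. The Savitzky–Golay filter then replaces each point with the value (or derivative) of a degree-n polynomial fit by least squares within a moving window of fixed width.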
What baseline correction is doing is recognizing that there might be overall trends in our data that we want to get rid of. This data set on the right has just a very small, linear difference in the functions. What we're thinking is, okay, we don't really care about that, we want to get rid of it. What this tool allows you to do is select the baseline model that you want. In this case, it's just a really simple linear model, but you may have cases where you've got exponential or logarithmic trends that you want to get rid of, and so we have that available. Then you can select your correction region. For the most part, you're going to want to correct the entire function, but it may be that only the beginning or the end of the function is where you want to correct. We end up with these baseline regions, which are these blue lines. If we click this Add button, it'll give us a pair of blue lines. We drag these around to parts of the function that we believe are real. All the peaks in these data are something that we don't really want to touch; this is the part of the functions that we want to keep and analyze, and it is going to give us the information that we're interested in. Also, if you select this Within Region option, anything that's within these regions is what will get corrected. You're either going to do one or the other, right? You either leave those regions alone or you change only what's within them. Finally, you don't see it here, but you can also add anchor points. It may be, depending on your data, easy to just specify a few points that you know describe the overall trend. When you click Add, you'll get a red line, and that tells you that wherever I drag that line is definitely going to be included in the model before I correct the baseline. When you click OK here, you'll just end up with a new data set that has the trend removed. Okay, so that brings us to the modeling stage. What we've added for JMP Pro 17 are wavelet models. Okay, so what are wavelet models? They are basis function models, not like anything we have currently in JMP, and they can have very dramatic features. What these features are doing is helping us pick up these sharp peaks or these large changes in the function. We also have the simple Haar wavelet, which is just a step function. If it turns out that something really simple like the step function fits best, we will give you that as well. You can see we have a few different options. If you think about bending these wavelets and stretching them out, that's how we are modeling the data to really pick up all these features of interest. To motivate that, I want to show you the current go-to in JMP, which is a B-spline model; it has a very difficult time picking up on these features without any hand-tuning. The P-spline model is doing a little bit better.
It still has some issues picking up the peaks, but it might in some ways be the best. Direct functional PCA is doing almost as well as P-splines, but not quite. Then we have wavelets. We're really picking up the peaks the best. In this particular data set, it's not fitting them perfectly, but looking at the diagnostics, the wavelet model is definitely the one we would want to go with. Again, we have these five different wavelet model types, and we'll fit all of these for you so that you don't have to worry about picking and choosing. Outside of the Haar wavelets, all of the other wavelet types have a parameter, and we have a grid that we are going to search over for you in addition to the type. Now, it may be that in some cases users have said, hey, this particular wavelet type is exactly how my data should be represented, so you can change the default model; but by default, we're going to pick the model that optimizes this model selection criterion, the AICc. Really, what you can think about here is that there could potentially be a lot of parameters in every one of these wavelet models. We're effectively using a Lasso model to remove any parameters that really just aren't fitting the data, so we get a sparse representation, no matter the wavelet model. We saw earlier that we have to have our data on a grid; it's the same thing with wavelets. If you just start going through the wavelet models and your data are not on a grid, we'll create one for you. But again, I just wanted to point out that you can use that reduced grid option to have finer control. Okay, so something else we show that can help give you some insight into how these models work is this coefficient plot. The X axis is the normal X of the input space of your function, but the Y axis is the resolution. For these top resolutions here, you're thinking about overall means. As we get into these high resolutions, these are the things that are happening really close together. A red line means it's a negative coefficient; blue means it's positive. They're scaled so that they're all interpretable against each other. The largest lines give you an idea of where the largest coefficients are. We can see that the higher frequency items are really here at the end of the function. We have some overall trends, but it's just something to think about, that these wavelet models are looking at different resolutions of your data. Something else that we've added, before we get to our demo with Clay, is wavelets DOE. In FDE, we have a functional DOE that works with functional principal components. If you don't know what those are, that's okay. All you need to know is that with wavelets, we have coefficients for all of these wavelet functions. In this DOE analysis, we're thinking about modeling the coefficients directly. The resolution gives you an idea of whether it's a high-frequency or low-frequency item.
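In symbols, using standard wavelet notation rather than anything JMP-specific, the fitted model is a penalized basis expansion:

f(t) \approx \sum_{j,k} c_{j,k}\, \psi_{j,k}(t), \qquad \psi_{j,k}(t) = 2^{j/2}\, \psi\!\left(2^{j} t - k\right),

where j indexes the resolution (how compressed the mother wavelet \psi is) and k its location, and the coefficients are estimated with an L1 (Lasso) penalty,

\min_{c}\ \lVert y - \Psi c \rVert^{2} + \lambda \lVert c \rVert_{1},

with \lambda chosen to minimize the AICc; that penalty is what zeroes out most of the coefficients. Under the usual squared-coefficient definition, the relative energy reported for a coefficient is c_{j,k}^{2} / \sum_{j',k'} c_{j',k'}^{2}.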
Then this number in the brackets is telling you the location. You can think, okay, these items here are in the threes, and that's where some of the highest features were that we saw in that coefficient plot. Those have what we're calling the highest energy. Energy in this case is just this: if we square all the coefficients and add them up, you can think of that as the total energy. So this energy number here is a relative energy, giving you an idea of how much of the energy in the data it is explaining. The nice thing about using the coefficient approach is that these have a direct interpretation, right to the location and to the resolution. It's an alternative that you can try and compare against functional PCA or functional DOE, and you get this interpretability of the coefficients. Now I think I'll hand it over to Clay. He's got a demo for you to see how you use these models in JMP Pro. Thanks, Ryan. Let's take a look at an example that we found. Ryan mentioned briefly the olive oil data set that we found. It's a sample of 120 different oils. Most of them are olive oils; some of them are blends or vegetable oils. What we wanted to see is, can we use this high-performance liquid chromatography data? Can we use that information to classify the oil? Can we look at the spectra and say this is an olive oil or this is not an olive oil? These data came out of a study from a university in Spain, and Ryan and I learned a lot about olive oil in the process. For example, olive oil is actually a fruit juice, which I did not know. Let's take a look at our data. Each row in our data set is a different olive oil or other oil, and each row's values represent its spectrum. We'll use the Functional Data Explorer, and it'll take just a second to fit the wavelet models. You'll see here, we fit our different wavelets. As Ryan mentioned earlier, we try a handful of different wavelets and we give you the best one. In this case, the Symlet 20 was the best wavelet in terms of how well it fits our data. We can see here, where we've overlaid these fitted wavelets with the data, that this wavelet model fits really well. Let's say you had a preferred wavelet function that you wanted to use instead; you can always click around in this report, and it'll update which wavelet we're using. If we wanted the Symlet 10 instead, all you have to do is click on that row in the table, and we'll switch to the Symlet 10. Let's go back to the 20, and we'll take a look at our coefficients. In the wavelet report, we have this table of wavelet coefficients. As Ryan was saying earlier, these give us information about where the peaks are in the data. The father wavelet, we think about that like an intercept, so that's like an overall mean. Then every one of these wavelet coefficients with a resolution lines up with a different part of the function. Resolution one is the lowest frequency resolution, and it goes all the way up to resolution 12.
These are much higher frequency resolutions. As you can see, we've zeroed a lot of these out. In fact, this whole block of wavelet coefficients is zeroed out. That just goes to show that we're smoothing. If we used all of these resolutions, it would recreate the function perfectly, but we zero them out, and that gives us a much smoother function. We fit the wavelet model to our spectra and we think we have a good model. Let's take these coefficients, and we're going to use them to predict whether or not an oil is olive oil. I've got that in a different data set. Now I've imported all of those wavelet coefficients into a new data set, and I've combined it with what type of oil it is. It's either olive oil or it's other, and we've got all of these wavelet coefficients that we're going to use to predict that. The way we do that is using the generalized regression platform. We're going to model the type using all of our different wavelet coefficients. Since it's a binary response, we choose the binomial distribution, and we're interested in modeling the probability that an oil is olive oil. Because we don't want to use all of those wavelet coefficients, we're going to use the Lasso to do variable selection. Now we've used the Lasso and we've got a model with just 14 parameters. Of all of those wavelet coefficients that we considered for our model, we only really needed 14 of them. We can take a look: we've zeroed out a lot of those wavelet coefficients. Let's take a look at the confusion matrix. Using our model, we actually perfectly predicted whether one of these oils is an olive oil or something else. That's pretty good. We took our wavelet coefficients and we selected the 13 most important, because one of those 14 parameters is the intercept. We only needed 13 of those wavelet coefficients to predict which oil we had. In fact, we can take a look at where those wavelet coefficients fall on our function. What we have here is the average olive oil spectrum in blue and the other oils in red, and each of those dashed lines lines up with one of the coefficients that we used. Some of these really make a lot of sense. For example, here's one of the wavelet coefficients that is important, and you can see that there's a big difference between the olive oil trace and the other oils. Likewise, over here, we can see that there's a big difference between the two. You can look through and see that a lot of these locations really do make sense. It makes sense that we can use that part of the curve to discriminate between the different types of oil. We just thought that was a really cool example of using wavelets to predict something else. Not that olive oil isn't fun, but Ryan and I both have young kids, and we're both big fans of Disney World. We also found a Disney World data set where someone had recorded wait times for one of the popular rides at Disney World. It's called the Seven Dwarfs Mine Train; it's a roller coaster at Disney World.
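Before the Disney example, here is a rough JSL sketch of the olive-oil classification step described above. The table and column names are placeholders, and the Generalized Regression option spellings (Estimation Method, Validation Method) are assumptions; the script saved from the platform itself is the authoritative version.

coefdt = Data Table( "Olive Oil Wavelet Coefficients" );  // hypothetical table name
gr = coefdt << Fit Model(
	Y( :Type ),                                 // olive oil vs. other
	Effects( :Coef 1, :Coef 2, :Coef 3 ),       // the wavelet coefficients
	Personality( "Generalized Regression" ),
	Generalized Distribution( "Binomial" ),
	Run(
		Fit(
			Estimation Method( Lasso ),         // variable selection via the Lasso
			Validation Method( "AICc" )
		)
	)
);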
Someone had recorded wait times throughout the day for several years' worth of data. I should also mention these are a subset of the data. One of the problems is that the parks are open for different amounts of time each day, and some of the observations are missing. We subset it down and got it to a more manageable data set. I would say that this example is inspired by real data, but it's not exactly real data once we massaged it a little bit. If we graph our data, we can see that the horizontal axis here is the time of day, and the vertical axis is the wait time. In the middle of the day, the wait time for this ride tends to be the highest. We can look around at different days of the week: Sunday and Monday are a little bit more busy, Tuesday is a little less busy, Saturday is the most busy. We can do the same thing looking at the years. This is 2015, 2016, 2017. It looks like every year the wait times get longer and longer, until something happens in 2021. I think we all know why wait times at an amusement park would be lower in 2021. We've got the idea that you can use this information, like day of the week, year, and month, to predict what that wait time curve will look like. Let's see how we do that in FDE. I'll just run my script here. What we've done is come to the menu and asked to fit our wavelet model. It takes just a second, but really not that long to fit several years' worth of data. This time we're not using the Symlet anymore; we're using this Daubechies wavelet function. What Ryan mentioned earlier was the wavelet DOE feature. Now, what I didn't show was that we've also loaded the day of the week, the year, and the month into FDE. We're going to use those variables to predict the wavelet coefficients. Let's go to the red triangle menu and ask for wavelet DOE. Now, what is happening behind the scenes is that we're using day of the week, month, and year to predict those wavelet coefficients, and then we put it all back together so that we can see how the predicted wait time changes as a function of those supplementary variables. Of course, we summarize it in a nice profiler. We can really quickly see the effect of month. If we're just going by the average wait time for this particular ride, we can see that September tends to have the lowest wait time. We can really quickly see the COVID effect: the wait times were here in 2019, and then when we went forward to 2020, they really dropped. You can look around to see which day of the week tends to be less busy and which months are less busy. It's really a cool way to look at how these wait times change as a function of different factors. Thank you for watching. That's all we have for today, and we hope you'll give the wavelet features in FDE a try. Thanks.
Despite the development of new network and media technologies, the intense use of bandwidth and data storage can be a limiting factor in industrial applications. When recording sensor signals from multiple machines, a question must always be asked: which meaningful information can be extracted from the data, and what should be saved for later analysis? The answer to this question is a method proposed and implemented by the Production Data Engineering Team at Bundesdruckerei GmbH in Berlin, a wholly-owned subsidiary of the German federal government that produces security documents and digital solutions. This method focuses on pre-processing data directly in the machine controller, strategically reducing the amount of data so that only the meaningful information is sent to the network over OPC UA, stored in the database, and further analyzed using JMP. A case study is presented, describing the implementation of this method on torque and position data from a servomotor used in a cutting process. The JMP Scripting Language is used to automatically generate reports of cutting tool wear, which is also analyzed in combination with the quality data of the product. Those reports allow the production engineers to understand the machines better and strategically plan tool changes.     Hi, I'm Günes Pekmezci, and this is my colleague, Luis Furtado. We both work at Bundesdruckerei as engineers in the production department, on the data team. Today, we would like to present a method to strategically process data from industrial processes before analysis and storage. First of all, I would like to tell you a little more about our company. Bundesdruckerei is a government-owned company that produces security documents and digital solutions. We are getting bigger every day; right now we have 3,500 employees, and we continue to grow. These figures are from 2021. In that year, we had sales of €774 million, and we hold over 4,200 patents. Most of our profit comes from German ID systems, which I will talk about a little more in the next slides. Secure digitization solutions are another large profit driver for us. If we look at the target markets and our customers, we see, like I said, the official ID documents first. We physically and digitally produce official identity documents like ID cards, passports, and residence permits, and this is our biggest market. We also produce security documents, meaning banknotes, postage stamps, tax stamps, and related security features for the government. On top of that, we have a growing eGovernment department, where we create solutions for the authorities, mostly German state authorities, to digitize their public administration systems. We also have high-security solutions: in this department, we create solutions with higher security requirements for security authorities and organizations. We also have a target market in the health industry, where we create products and systems for secure and trusted digital health. Other than that, we are also active in the finance field.
Here we create products and systems to control and secure financial transactions, in both the public and the enterprise sector, which can include tax authorities, banks, insurance, et cetera. Coming to our use case, what we want to share with you today is a use case that we decided to implement for predictive maintenance. Like every other company, our aim was to create use cases for the new digital era. We thought about what we could analyze: big data, predictive maintenance, and things like that. We decided to start with our biggest document, the German passport. This document is very complex, and it has a lifetime of 10 years. We also have a high production rate here, and we decided to create a predictive maintenance use case for one process on this document. Our process is the punching process. It was a good process for us because we have a good understanding of it, and, which is very important in the Industrial Internet of Things, we had access to the data that we could analyze to create our information. Our objective for this use case was to achieve better product quality by doing predictive maintenance on the tool wear state. Instead of reacting after the tool is worn out, we decided to look at the data and create information that allows us to plan our tool change time. We can also minimize our downtime and minimize our scrap rates. We could also apply this use case to other machines and use it to follow the long-term behavior of the process. It was a really good use case for us to start with. I will hand over to Luis to explain further how we approached this use case, what exactly we did, what our challenges were, and how we found solutions for them. Thank you, Günes. I'm going to present a bit more about our product and process. The product that we are analyzing in this study is the passport. The passport, when you think about it, is a book. It's like a sandwich full of pages, and those pages have a lot of security features: the picture that is printed, the data that is lasered, the chip, the antenna for the chip, and holography layers. There are several security features inside the German passport. To make that sandwich, there are a lot of machines that bring all those features into the product. When you have made the sandwich, you need to cut it to the right size according to the norm. When you cut it, we separate the finished book from the borders that we don't need anymore. The point is that for this cutting process we use a punching machine. The tool that is installed at the end of this punching machine wears over time, and the quality of the cut is no longer as good at the end as it was in the beginning. What we are trying to work out in this project is how to know the perfect time to change the tool, given that wear. Here's a picture of the end product, the passport.
Here are the borders that were cut. I'm going to present a bit more detail with a sketch of the machine, how it works, and what the original idea was. First, we have our original data architecture. We have a machine with several sensors, sensor number 1, 2, up to however many sensors we need to measure. We bring all the sensors into the machine PLC, which is the controller of the machine, then we mirror this data to the master computer, and mirror it again to the database. That was the first, original implementation that we had. The database ends up holding a lot of data, and then we start analyzing it to try to understand what is happening in the machine, and in this case, what is happening in the punching tool that is cutting the passport. When you look at the sketch of this machine, we have a servomotor that turns a wheel. Through a mechanical linkage, it moves the punching tool up and down. At the end, we have the cutting tool, which has exactly the final shape that we need. Over time this tool wears; it's not that sharp anymore, and then we start to get poor quality in the product we are producing. Then you need to change the tool to make it sharp again. Good. How can you be sure that this tool is still good to cut? We measure the position of the servomotor and the torque of the servomotor, and we bring the position and torque data into the controller. Then, as I presented in the previous slide, we mirror the data to the master computer and then to the database. In an industrial controller, the signal is not continuous. The curve is not continuous like here, but discrete: think of one measurement every CPU cycle, every clock tick of the controller. In this case, we get all this data, it is transferred to the master computer, and then we do the analysis from the database. But the point is, we realized that using OPC UA, not 100% of the data arrives. This is a scenario where everything is fine: we have all the points in the server, in the database. But sometimes we have missing areas, gaps where data is not coming. We realized that we only get about 95% of the data; 5% of the data is lost at the cycle rate we are running. Well, this loss could be at a point we don't care about, but it could be exactly the point where we have the peak. When you miss data here and miss data here, we compromise our measurement of the tool. And even with only a 5% data loss, the 95% of the data that does reach storage, for all the sensors that we have in a machine, for all the machines that we have in the production process, is a lot of data.
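To make the risk concrete, here is a tiny Python illustration, on synthetic data rather than plant data, of how randomly losing about 5% of the discrete samples can hide the narrow torque peak that the tool-wear measurement depends on.

```python
# Illustration (synthetic data, not plant data): dropping ~5% of the sampled
# torque values at random can hide the very peak needed for tool-wear monitoring.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 200)                       # one punching stroke, discretized
torque = np.exp(-((t - 0.5) ** 2) / 0.001)       # sharp synthetic torque peak

keep = rng.random(t.size) > 0.05                 # ~5% of samples randomly lost
true_peak = torque.max()
observed_peak = torque[keep].max()

print(f"true peak = {true_peak:.3f}, observed peak = {observed_peak:.3f}")
# If the dropped samples happen to fall on the narrow peak, the observed
# maximum underestimates the cutting torque and distorts the wear trend.
```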
Then you start to realize that after a year, we have a large amount of database storage, and this is something you want to reduce. With this original implementation, we still had missing data, and often missing data exactly at the points we need to measure. So we had open questions about this implementation. The first question is: is it possible to measure this tool wear in a reliable way using the motor torque? The other one is: how do we reduce the amount of data that we're sending to the database? Good. The first idea we had was, "Okay, we won't take the data from the database. We're going to collect the data directly on the machine with a different method, so that we don't lose data and 100% of the data reaches the computer, because we're measuring directly in the machine controller." Let's do this experiment many times for different settings of the machine, and let's see if the curve keeps the same shape and whether it changes in amplitude when we change the scenario. In the end, we had four scenarios, we ran this test extensively on the machine, and this is the result of the experiment. We tried an old, worn tool, so the tool was not that sharp anymore, with a 32-page passport. We have two products: a passport with 32 pages and a passport with 48 pages; the customer can order accordingly, so if you're going to travel a lot, you order 48 pages. We tried the old, worn tool with 32 pages, then the old, worn tool with 48 pages. Then we changed the tool for a new one and repeated the experiment with the new, sharp tool for the 32-page and the 48-page product. This is the result. We realized that all the curves have the same shape; this is a superposition of many curves that we recorded, and the variation is very small. But we can also see very clearly that the peak value for the old tool with 48 pages is well separated from the old tool with 32 pages. Also, with the new tool, the peak value is lower because less force is needed to cut. This is the torque in the motor: when less force is needed to cut, because the tool is sharp, the torque is lower. Good. With this, we got some information. All the scenarios present the same shape of curve, so we realized, "Okay, then I don't need to record the whole curve. I can record only the peak." That is what is interesting for us in the new implementation we are proposing here. The peak value can be used for two different things. It can be used for tool wear monitoring, which is the original idea that we wanted. Another thing that is also important for us is product classification: from the peak you can check whether you are producing a 32-page or a 48-page product, so it is a safe way to confirm which product you have. Good. Then what is the difference?
The difference is the implementation directly in the controller of the machine. The whole sketch of the machine is the same, and you get the data inside the controller in the same way. But what we do differently here is that we preprocess the data: we filter, we define a window, and in this window we search for the peak. When we find it, we take the peak of the torque and the motor position at which this peak happened. Then we transfer only that one set of values, not the whole curve. How does that work out in the end? With the original implementation, per sensor in a machine, we had 11.7 gigabytes per year. That was quite a lot. When you consider that we have several hundred, almost a thousand, sensors in a machine, and more machines in our production area, this is something very critical for us. With the proposed implementation, everything is very similar: the sensors go to the machine, but inside the machine we do a preprocessing step. We filter out just the meaningful information that we need, then transfer much less data to the master computer and on to the database, and we do our analysis with this smaller but meaningful amount of data. In this case, the reduction is roughly a factor of a thousand: from 11.7 gigabytes down to about 8 megabytes per sensor per year. This is a good implementation. This was implemented in JMP and JMP Live. I'm going to give the word back to Günes, so she can explain the next steps and what we did afterwards. Thank you, Luis. How did we generate information in JMP with this analysis? Like everyone else, we started analyzing our data in JMP first, and it was easy to analyze even our huge data sets, around 20 million records, in JMP. But once we decided to keep just the peak values, we were able to create reports that are much lighter and more informative. Then we decided, since it works so well, to send our results to JMP Live. Right now in JMP Live we have the following reports, generated automatically every week. There is a weekly meeting for the machine colleagues, and they look at this report to decide when it is time to change the tool. Here you can see the different machines; we have six machines of this kind. You can see the peak value of the torque and its development over the weeks. You can also see that when there was a tool change on machines 1 and 2, the following week the peak values start again from a lower level, and Luis already explained why that happens. This is the JMP Live report we use to plan the tool change time. If we go to the method that we are proposing... I want to tell you again how we approached this use case. We started, like every other use case, by first defining our project requirements.
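To illustrate the preprocessing idea, here is a minimal Python sketch of windowed peak extraction, emitting one (peak torque, position) pair per stroke instead of the full sampled curve. It is an illustration under assumed data shapes, not the actual PLC code running on the controller.

```python
# Sketch of the on-controller preprocessing idea (illustrative Python, not the
# actual PLC code): inside a window around the cutting stroke, keep only the
# torque peak and the motor position where it occurred.
import numpy as np

def extract_peak(torque, position, window):
    """Return (peak_torque, position_at_peak) within the index window."""
    lo, hi = window
    idx = lo + int(np.argmax(torque[lo:hi]))
    return float(torque[idx]), float(position[idx])

# Synthetic stand-in for one stroke's samples (real data comes from the servomotor).
rng = np.random.default_rng(3)
position = np.linspace(0, 360, 500)                     # crank angle in degrees
torque = np.exp(-((position - 180) ** 2) / 50) + rng.normal(0, 0.01, 500)

peak, pos = extract_peak(torque, position, window=(200, 300))
print(f"store only: peak torque {peak:.3f} at position {pos:.1f} deg")

# Storage impact: one record per stroke instead of the full sampled curve,
# which is what takes ~11.7 GB/year/sensor down to the order of megabytes.
```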
Then we took all the data, like many other companies try to do in the Industrial Internet of Things. We said, "Okay, we need all the data." We tried to take all the signals from the machine and analyzed them somewhere else. Then we looked at the data and asked, "Okay, is the quality of the information good enough? Does it meet our project requirements?" It didn't, because of the missing data. With the missing data, we weren't able to see the right data to get the relevant information. So we said, "Okay, let's go to the machine and understand the process a little better. Why is this happening? What can we do about it?" Then we started doing the experiments that Luis explained directly on the machine, and we collected the data locally. Then we came back to our analysis process and said, "Yes, now the data is good, the quality is good." We also asked, "Okay, is all of this data relevant? Is there a way to reduce the storage without reducing the data quality?" Then we decided to implement this preprocessing algorithm directly at the machine to reduce the size of the data. What we are suggesting for you, too, is that when you start a use case for production processes, after defining your project requirements, it is better to go directly to the machine, do experiments there, and collect the data locally. If you do this step first, you will save yourself a lot of time building the architecture needed to get all this data somewhere else. You will also save a lot of money, because you may not need that much space on your servers, et cetera. If you start directly there, you can go through all the other steps, and you will reach a use case that works well, in less time. If we summarize our lessons learned and the benefits of the use case, we can definitely say that an application-oriented approach is very good for implementing production use cases. You really need deep process and machine understanding for Industrial Internet of Things use cases. It will definitely be better for you if you create a team of engineers, people who work at the machines, and the data people together, because you need a really deep understanding of what is happening and what exactly you need in order to get a benefit out of it. Our personal benefit from this specific use case was to create a method that we can use for other machines and processes, which we are also sharing with you today, hoping that you can use it for your processes as well. This method was then reused for other machines and other punching processes. We also gained really good knowledge about the tool wear state at the end of this use case. We could also reduce our downtime, because instead of waiting for a tool to wear out, we were able to plan the tool change.
That automatically means we were also decreasing our costs. On top of that, we were able to use this method and this analysis to follow the long-term behavior of our tools, which is also a great thing, because in the end we had a real predictive maintenance use case. As a cherry on top, we were able to reduce our data storage needs significantly. In today's world, where we talk so much about energy, it's very important to keep just the relevant data on our servers, because it's more sustainable and more energy efficient. We were really happy with our results, and we hope you will get some inspiration from our method and maybe be able to use it yourselves. Thank you for your attention, and this was our method. Have a nice day.
Solvay was the first company to succeed in industrializing the production of soda ash in 1863, a product that only requires salt and limestone as raw materials. However, behind this simple idea is an intricate continuous process able to handle reactant solids, liquids, and gases. Similarly, using data to bring value needs a deep understanding of any manufacturing process behind it. This presentation will showcase how Soda Ash at Solvay is scaling up the use of data-driven techniques in the chemical industry. To succeed, we trained our subject domain experts (process engineers) to use JMP and its predictive analytics capabilities to accelerate daily tasks such as monitoring and root-cause analysis. We will discuss our open-source JMP add-in to connect to industrial historians (Aspentech IP.21 and OSIsoft PI), the current training program, and the lessons learned in this digital transformation journey.     Hello  all.  I'm  David  Paige,  I'm  the  Digital  Champion  of  the  Global  Business  Unit  of   Soda Ash & Derivatives  at  Solvay .  Together  with  me,  we  have  Carlos  Perez,  who  is  Industrial  Data  Scientist  at  Solvay  at  corporate  level,  who  will  be  co- presenter  of  this  presentation. This  presentation  is  about  the  scaling  up  of  the  use  of  machine  learning  techniques  in  the  chemical  process  and  concretely  at  Solvay.  Here  in  this  slide,  we  have  the  agenda  of  this  presentation.  First  of  all,  I  will  share  with  you  a  brief  introduction  of  our  multinational  company,  Solvay. T hen  some  words  also  about  our  general  business  unit,   Soda Ash & Derivatives. Then  here  in  the  point  number  three,  we  will  enter  in  discussion  about  how  machine  learning  techniques  helps  improve  our  production  process.  Then  we  will  go  a  little  bit  deeper  about  the  usage  of  JMP  in  our  GBU.  I  will  explain  to  you  the  awareness  sessions  and  the  training  that  we  provided  to  our  population  of  engineers.  A lso,  we  will  see  a  couple  of  practical  use  cases. Then  my  colleague  Carlos  will  share  with  you  one  add- in  that  they  developed  internally  at  corporate  level  at  Solvay,  which  is  very  useful  for  us,  for  the  final  users  to  connect  to  our  main  source  of  data,  which  is  the  MES,   manufacturing execution systems.  Finally,  I  will  share  with  you  the  main  challenges  that  we  faced  during  this  journey  and  also  the  lessons  learned. Brief  introduction  of  our  group,  Solvay.  We  are  a  science  company  founded  in  1863  whose  technologies  bring  benefits  to  many  aspects  of  daily  life.  Our  innovative  solutions  contribute  to  a  safer,  cleaner,  and  more  sustainable  product  found in homes,  food,  and  consumer  goods,  planes,  cars,  batteries,  smart  devices,  health  care  applications,  and  water  and  air  purification  systems. Very  important,  our  group  seeks  to  create  sustainable  shared  value  for  all.  Notably,  through  its  Solvay  One  Planet  program,  we  have  three  pillars:  protecting  the  climate,  preserving  natural  resources,  and  fostering  better  life. Here  at  the  bottom  of  the  slides,  you  can  see  some  key  figures  of  the  group  in  2021.  As  you  can  see,  we  have  a  little  bit  more  of  21,000  employees  all  over  the  world.  We  have  presence  in  63  countries  and  we  have  98  industrial  manufacturing  sites. 
Now, let's jump to our Global Business Unit, Soda Ash & Derivatives, the business unit I work for, coordinating the implementation of what we call digital transformation initiatives. As you can see, we have 11 production sites distributed around the world: six production sites here in Europe, two in North America, and one in Asia, plus other locations around the world such as warehouses and offices. We also have three R&I centers, located in Brussels, in Dombasle, our manufacturing site in France, and in Torrelavega in the north of Spain. Globally, we are 3,200 employees. Our products: these are the two main products we produce in our Global Business Unit, soda ash and sodium bicarbonate. Soda ash is mainly used for glass manufacturing, different types of glass for buildings but also for photovoltaic panels and for containers, as you can see here with the example of the bottles. Soda ash is also used to produce detergents, and, very new with the [inaudible 00:04:44] of electrification that we are seeing around the world, soda ash is also used in the production of lithium for batteries. Our sodium bicarbonate is used in different markets: first, for exhaust gas cleaning in industry, which is our SOLVAir market, and also a very new application for the same purpose, gas cleaning, but for ships. Sodium bicarbonate is also used in the pharmaceutical industry and the food industry. In this slide, I would just like to show you the complexity of our production processes. As we said, our final products can be soda ash, light or dense, and refined bicarbonate. To produce them, we consume different raw materials such as limestone, brine, and the coke and anthracite used in our lime kilns. As you can see, the production process is quite complex, because we use many different assets: absorbers, the distillation sector, dissolvers, precipitation columns, filters, compressors. We have a long list of assets used in the manufacturing process and very complex chemical reactions mixing gases, liquids, and solids. We need to take into account thousands of parameters in terms of temperature, pressure, flows, and so on. So the use of advanced analytics and machine learning techniques is very important in order to improve this production process. Here we enter the chapter on how machine learning can help improve our production process. First of all, let me share our strategy. Clearly, for soda ash and bicarbonate, our strategy is to be competitive and keep our worldwide leadership position in the global commodity market of soda ash, but also in the premium market of bicarbonate. What is our objective to reach this ambition?
Our  objective  is  to  reduce  as  much  as  possible  the  variable  and  fixed  cost  in  our  manufacturing  sites  while  ensuring  the  overall  equipment  efficiencies, so  the  OEE,  and  the  quality  of  our  products. Let  me  put  some  examples  of  how  we  can  impact  in  our  variable  cost  and  fixed  cost.  In  the  variable  cost  side,  clearly,  one  of  the  levers  that  we  can  improve  is  the  yield.  If  we  are  able  to  increase  our  sodium  precipitation  yield  in  our  carbonation  sector,  clearly  what  we  are  going  to  do  is  to  reduce  the  need  of  raw  material  and  energy  in  our  production  process  to  produce  the  same  quantity  of  soda ash. The  same  for  the  topic  related  with  energy  efficiency.  In  the  previous  slide,  I  showed  you  the  complexity  of  the  production  process  and  the  energy  that  we  need  to  use  in  the  different  sector  such  as  the  distillation  sector,  calcination  sector,  or  lime  kilns.  If  we  are  able  to  improve  the  main  parameters  on  these  sectors,  we  will  be  able  to  reduce  the  specific  consumption  of  energy  in  our  production  process. In  terms  of  fixed  costs,  one  of  our  main  fixed  cost  in  our  production  process  is  the  maintenance  cost.   We  have  unplanned  events,  unplanned  mechanical  breakdowns  in  our  industrial  assets,  and  also  we  perform  regular  maintenance  activity  and  cleaning  of  our  assets.  If  with  these  machine  learning  techniques  we  are  able  to  anticipate,  to  predict  these  unplanned  events,  we  could  also  potentially  reduce  our  fixed  costs. Our  idea,  our  ambition  is  to  combine  the  deep  expertise  that  our  process  and  control  engineers  have  on  the  domain,  the  soda ash  production  process,  together  with  the  IT  and  computer  science  skills  and  math  and  statistics  skills.  This  is  our  ambition. Traditional  method.  Traditionally,  what  our  engineers  is  doing  is  to  use  the  inputs  of  our  process  using  a  theoretical  model.  For  example,  the  thermodynamics  or  the  chemistry  in  order  to  understand  the  process  and  to  get  an  output.  This  is  the  traditional  method.  But  now  with  the  machine  learning  techniques,  what  we  can  benefit  for  is  about  the  historical  data. In  our  site,  as  I  explained  before  in  this  very  complex  environment  of  the  production  of  soda  ash,  we  store  thousands  of  different  sensors  data  in  our  MES  systems,  in  the   manufacturing execution systems.  Data  from  temperature,  flows,  pressure  in  different  parts  of  the  process.  We  have  this  historical  of  data,  so  we  can  provide  with  the  machine  learning  algorithms,  inputs  and  outputs  of  our  process.  Creating  machine  learning,  big  data  models  that  could  help  us  to  improve  our  process  and  to  understand  better  our  process  for  the  future  inputs. Now,  here  in  this  slide,  just  to  share  some  publications,  also  from  Dow,  another  important  multinational  chemical  company,  that  is  sharing  with  us  here  that  a  chemical  company  must  invest  to  create  a  critical  mass  of  chemical  engineers  with  technical  skills  in  statistics,  mathematics,  modeling,  optimization,  process  control,  visualization,  simulation,  and  programming. 
But it's much easier to train chemical engineers on data analytics topics than to train data scientists on chemical engineering topics. We completely agree with this statement, and this is what we want to do at Solvay. We have a lot of very skilled chemical engineers, and we want to train them in these advanced analytics techniques. This is the main reason why we launched the Machine Learning Techniques with JMP program in our GBU, our Global Business Unit. The program started in 2021. The target population was 47 engineers in our GBU, and it was led by the industrial data science team at corporate level. What was the content of this program? We had a one-hour session on the first day, where we explained why we want to use machine learning techniques with JMP to improve our production processes, as I explained before. Then, over seven days, each of the 47 engineers followed an individual online course on statistical thinking, just an introduction to the statistical thinking methodology. Then we moved on to the JMP introduction part, explaining the tool, its main features, the benefits of using JMP, and the main tips to start creating graphics, statistical reports, and other basics. We combined theoretical lessons with practical exercises and plenary sessions on the web. Then, over 15 days, we went into more detail about what we can do with machine learning techniques, again with an individual online course, practical exercises, and a plenary session. All of this training lasted around one month. But the most important part was the selection of real cases to solve in the different manufacturing sites for the different participants. We made this selection, provided a JMP license, of course, and regular support with weekly plenary meetings and individual coaching. Let me give two examples of practical use cases from this selection. The first one is about increasing the sodium precipitation yield in the production processes of Rheinberg, our manufacturing site in Germany, and Torrelavega, our site in Spain. How did we use JMP on this project? First of all, to screen the multiple variables that we think can impact our main target variable, the sodium precipitation yield, in order to explain the variability of this target. The goal was to investigate which variables best explain the variability of our target. For this, we used one of the tools we learned during this one-month course, Predictor Screening. This is very important because, as I explained before, we have hundreds of variables impacting this output, the yield of the process, so it's very difficult to analyze them one by one.
This tool allows us, in a very quick, fast, and intuitive way, to understand the main contributors explaining the variability that we have in our target. We also have to say that JMP is a very intuitive, code-free advanced analytics tool, and this is very important because not all production engineers have the knowledge to use programming-based tools. It is also important to say that visualizing the long-term variability of the target, and its relationship with the most important variables, is a very valuable feature that JMP has. Finally, we also used JMP to produce the statistical reports about the performance of the different approaches and trials that we performed along the project. This is the first example where we used JMP to better understand our process and improve our yield in both of these manufacturing sites. The second one: in this case, we are talking about finding the root causes of the variability of one important parameter of the final product, the carbonate content of the sodium bicarbonate. Here we used JMP similarly to the project I explained before: we screened the multiple variables and selected the most important ones to explain the variability of this target. For this, of course, we used Predictor Screening again, as you can see on the right-hand side of the slide. Also here, it was very important to visualize graphically the interaction of the main variables that we identified thanks to Predictor Screening. You need to understand that on this type of project, we need to collaborate with different stakeholders; production engineers cannot solve this kind of very complex project alone. Here we need to align, speak, and generate debate with the production operators in the field, the production operators in the control room, site managers, other engineers in other plants, experts at corporate level, and so on. It's very important to translate what we analyzed into a graphical form to generate these debates. Finally, it is also very important to support the decision-making process, so that in the end decisions are taken on these main variables that we demonstrated in an objective way to the people who ultimately decide to make a modification in the process. This is what we did in this project: in the end, it was about making a modification, a small investment to modify part of our installation, in order to reduce the variability of this carbonate content in the final product. That's all from my side for the moment. Now I will give the floor to my colleague Carlos, data scientist at corporate level, who will explain an add-in that we developed internally at Solvay that allows us to connect the data from our MES, which is, as I explained before, where we store all the data, into JMP. Thank you, David. I will go ahead and share my screen.
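For readers working outside JMP, JMP's Predictor Screening ranks predictors with a bootstrap forest; a rough Python analog is to rank hundreds of process variables by a random forest's feature importances against a target such as precipitation yield. The data and sensor names below are synthetic placeholders, not Solvay data, and the approach is an approximation of the idea rather than JMP's implementation.

```python
# Rough analog of predictor screening (illustrative, not JMP's implementation):
# rank hundreds of candidate process variables by how much they contribute to
# a random forest's ability to predict the target (e.g. precipitation yield).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n, p = 2000, 300                                      # synthetic historian extract
X = pd.DataFrame(rng.normal(size=(n, p)),
                 columns=[f"sensor_{i}" for i in range(p)])
yield_target = 0.8 * X["sensor_7"] - 0.5 * X["sensor_42"] + rng.normal(0, 1, n)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, yield_target)

ranking = (pd.Series(forest.feature_importances_, index=X.columns)
           .sort_values(ascending=False))
print(ranking.head(10))                               # top contributors to investigate first
```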
Can you all see? Yes. Okay, I want to get started. Do you see this ribbon, or is it only me? Yes, it's visible. Thank you, David. In this section of the presentation, I will demonstrate one tool, an open-source add-in that we created in the team of industrial data scientists at corporate level in Solvay. This is a team that supports all of the global business units, which means we have to provide solutions for all of the different MES that exist in Solvay. We automated this task because of the situation we saw before, where we had to download the data into a spreadsheet, sometimes without many advanced capabilities, then import this data, treat it, and finally be able to use it. Sometimes it was not even clearly identified, because all you had was the name of the sensor, and the sensor name is not always clear since the notation is not well standardized. To leverage the power of the data, we said, "Okay, let's make the process of extracting the data as automated as possible so that all the process engineers can use it." We have leveraged this in JMP, in the GBU Soda Ash and also in other GBUs, with an add-in that is able to connect to the two most common databases in Solvay, the MES historians IP.21 and PI, from AspenTech and AVEVA, respectively. This add-in connects directly to the databases if we are on the local network and can fetch with a query whatever information is stored there. We have automated the tasks of connecting to the server, selecting the query parameters, and downloading the sensor data table in a regular format with the description and units. We have also integrated other functions. It's worth mentioning that we are dealing with a lot of sites. The soda ash sites are among these, of course, where we use, as I mentioned, two main historian databases, and this is more or less the range of sensors that we have to take into account. This is how the add-in looks today; it is available from the Add-Ins menu. It was built as an application and also has some scripts in the background. I will show a demo of this, so bear with me. It is focused on process engineers. As I mentioned, it integrates the description and engineering unit, which is very useful for identifying what data you are using in your analysis. At the end of the day, you get two data tables: one is a summary with typical statistics, and the other is a time series with the details of every sensor according to your extraction. On top of that, there are the functions that I mentioned. I will leave this script to show you the demo of the tool. Just bear with me. This is JMP, as you know it. Here we have an add-in; when it is connected to the database, you can see the full list of servers here.
It is connected to a server list that is maintained by another team, an IT team. In this server list, it is also possible to modify the details in case one of the servers is not available. For example, one can enter the IP address and domain here to add a new server that is connected to the internal network of Solvay. After selecting your server, the next step is to filter your sensors by name or description. This is important because, as we mentioned, we have on the order of thousands of sensors, which means that if you try to list everything that is available, it might take a long time or the server might crash. For that reason, we have this filter, so that you can see what's relevant to you: flow sensors, temperature sensors, pressure sensors, or you can look for the sensors by description, or both. After you are done with this filter, you select the relevant tags from this other list; what you see in the presentation is just an example, because in this case I'm not connected to the local network. You will see the list of available sensors here, and you add them to the right-hand side, like this. The right-hand side list means these sensors are ready to be extracted. Now you choose the start time and end time for your extraction; it could be one day, one month, one year. Then you choose what type of extraction method you want. The most common is interpolated, because it means you will have evenly spaced data, by minute, by second, by hour, or by day. We also offer an aggregation, which in this case is the average, and we also offer to extract the actual data exactly as it was recorded by the sensor. One more thing: if for some reason you already know the list of sensors that you want to download and you don't want to browse by name or description, you can also paste this list directly in CSV format. When you have all these parameters ready, you hit the extraction button. This will take as long as the SQL query takes to go to IP.21 or PI. When it finishes, you get two tables, as I mentioned before. One is the summary, which allows you to understand the typical statistical values for each sensor, row by row. In this case, you have the name of the sensor, its description and units, and also the mean, standard deviation, max, min, and range of the sensor. In this way, you can see whether a sensor is perhaps not working or something odd is going on, so that you don't need to extract it. Furthermore, you also get the time series data, which in this case looks like this. You get a column with the timestamp, and then one column per sensor in the proper format. For example, this one is a continuous measure, this one is discrete, and everything is properly formatted.
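Because the actual add-in talks to IP.21 and PI through their own interfaces, here is only a generic Python sketch of the two outputs it describes: an evenly spaced, interpolated time series table and a per-sensor summary table. The raw historian extract, tag names, and column layout are assumptions for illustration; this is not the add-in's code.

```python
# Generic sketch of the two outputs described above (not the Solvay add-in):
# an interpolated, evenly spaced time series plus a per-sensor summary table.
# `raw` is a hypothetical historian extract with columns: timestamp, tag, value.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
raw = pd.DataFrame({
    "timestamp": pd.to_datetime("2023-01-01")
                 + pd.to_timedelta(rng.integers(0, 86400, 5000), unit="s"),
    "tag": rng.choice(["FI101", "TI202", "PI303"], 5000),
    "value": rng.normal(50, 5, 5000),
})

# Time series table: one column per tag, resampled to 1-minute steps and interpolated.
wide = (raw.pivot_table(index="timestamp", columns="tag", values="value")
           .resample("1min").mean().interpolate())

# Summary table: typical statistics per sensor, row by row.
summary = wide.agg(["mean", "std", "min", "max"]).T
summary["range"] = summary["max"] - summary["min"]

print(wide.head())
print(summary)
```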
On top of that, you get the description of the sensor, as I mentioned, and the units, which is very useful for the process engineers. This already allows you to apply all the methods in JMP. So here you have an automated add-in that lets you extract data directly into JMP. It's also open source, so if you are interested in contributing, you can go to the JMP Community or to GitHub and contribute your own developments. On top of that, we also offer three functionalities. The first is Update table. Update table makes sure that once you are done extracting your data and have performed an analysis, you can keep updating the same analysis the next day. For example, let's say yesterday I downloaded this data and created a column calculating some value, and today I want to see how this calculated value looks with the newest data. I just hit this button, and the table is updated with the data from yesterday up to today. We also offer a Refresh functionality, which is meant to work like a dashboard. It keeps a fixed time window, and you see your analysis with respect to the current time. That means that if I performed an analysis yesterday with a new formula column, I can hit this button and see only the data for the current period rather than the past. In other words, Refresh keeps a single moving time window of fixed length, while Update table extends the full time window. Furthermore, there is Add new tags: if for some reason you forgot a tag and realize it is important, you can add it afterwards. With all this said, I will go to the next slide. By this point, you already have a nice data table in JMP with all the functionalities we mentioned: update table, refresh table, and add new tags. This already allows you to use the typical methods for advanced analytics in JMP; here I am showing both the JMP and the JMP Pro versions, but that is up to you. We also empower the user with another add-in that we developed, called Predictor Explainer, which will be presented in another Discovery talk, and we have other types of analysis as well. This allows us to perform the typical tasks in data analytics: root cause analysis, anomaly detection, process optimization, and others. With this, I will let David conclude the presentation. Yes, thank you very much, Carlos. I don't know if you can see my screen now. If you stop sharing, maybe. Yes, stop sharing. Okay, good. Let me reshare the screen. Yeah. Okay, can you see my screen now? Yeah. Perfect. Thanks, Carlos. Thanks for the support you provided to our GBU, not only developing this add-in but also coaching our production and process engineers on JMP.
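To make the difference between the update and refresh behaviors concrete, here is a small pandas sketch. The function names and table layout are hypothetical, purely for illustration; they are not the add-in's API.

```python
# Hypothetical illustration of the two behaviors described above (not the
# add-in's API): "update" appends everything newer than the last timestamp,
# "refresh" keeps only a fixed-length window ending at the current time.
import pandas as pd

def update_table(table: pd.DataFrame, new_rows: pd.DataFrame) -> pd.DataFrame:
    """Append rows newer than what we already have (the full window keeps growing)."""
    last = table["timestamp"].max()
    return pd.concat([table, new_rows[new_rows["timestamp"] > last]],
                     ignore_index=True)

def refresh_table(table: pd.DataFrame, new_rows: pd.DataFrame,
                  window: pd.Timedelta, now: pd.Timestamp) -> pd.DataFrame:
    """Keep a fixed-length window ending at `now` (dashboard-style view)."""
    combined = pd.concat([table, new_rows], ignore_index=True).drop_duplicates()
    return combined[combined["timestamp"] >= now - window]
```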
The last slide is to share the main challenges that we faced during this journey of scaling up the usage of JMP in our GBU, and also the lessons learned and the next steps. Today, roughly 20% of the target population that started this program two years ago is still using JMP on a routine basis. The main blocking points that we found are, of course, resistance to change. Some people are more comfortable using other tools like Minitab or just Excel files. In any project or initiative that requires a change of tool, there is always this resistance to change, which requires time and effort to overcome. Another reason is the lack of time, which is linked to priorities. The role of production and process engineers is not always fully oriented toward process optimization, because sometimes there is too much reporting to do and other topics to cover in their role. The main points to keep during this process are this type of awareness built on practical industrial success examples. This is very important: in order to convince people and show the value of using machine learning techniques to improve our process and reach the competitive level that we want as a company, we need to use these practical industrial success examples. Because this is a population of chemical engineers, they will not connect with examples from marketing or finance; we need to show them clear and concrete examples related to the process industry. Another important point is the role of the Predictor Screening tool as a key tool for us for variability sourcing. The main problem that we have, as I explained before, is the variability of certain parameters of our process that we need to reduce. If we are able to reduce the variability of the key parameters, we are really going to reduce our variable and fixed costs in our manufacturing sites. This tool is very important for our production engineers to find the root causes of this variability and act on them. Also important is the combination we made between plenary sessions, all together sharing thoughts and experiences, and individual practice: give people time to practice on their own and then exchange in a common call. Finally, the points we identified to reinforce and implement in the near future: first of all, in order to tackle the problem of resistance to change, we need to convince site management about the importance of analytics for the production and process engineers, and we need to launch a series of awareness sessions dedicated to them. This is an item we are going to work on a lot this year.
Also very important for us, we identified the need for strong individual coaching of the production and process engineers when they start to use JMP on real cases, on real projects. Because JMP requires time, and the different tools, such as Predictor Screening and others, require time, it's very important that for the very first projects an engineer develops using JMP, they have a good coach, a good trainer, to accompany them through the process. That's all from our side. Thanks a lot for your attention. If you have any questions for me or for Carlos, we are available. Thanks a lot.
JMP software was initially implemented at CEA in 2010 by R&D teams who develop nuclear glass formulations. Over the years, JMP has been used for multiple purposes, such as data visualization of highly complex composition domains, optimal mixture designs, and machine learning techniques to create property-to-composition predictive models. More recently, JMP enabled us to develop very innovative methodologies. Two case studies will be presented. First, we will show an original approach based on an automatic and intelligent subsampling of the data, combining techniques of optimal designs and several predictive methods in JMP and JMP Pro to create very robust and accurate predictive models. Second, we will present an amazing benefit of using the Simulation platform where a response is below the limit of detection in most of the design space.     Thank you for watching this presentation for the Europe Discovery Summit online conference. My name is Damien Perret. I am an R&D scientist at CEA in France. I am joined by Francois Bergeret, statistician and founder of Ippon Innovation in France. Francois and I are very happy to be here today, and we would like to thank the steering committee, who gave us the opportunity to talk about this work on innovative approaches using JMP. We will give you two case studies: Francois will present the first case, and I will present the second. Let's start with a few words about CEA. CEA is a French government organization for research, development, and innovation in four areas: low-carbon energies, technological research, fundamental research, and defense and security. CEA counts about 20,000 people, and we are located on nine different sites in France. We have strong relationships with the academic world and many collaborations with universities and partners, both in France and all around the world. A few words about Ippon Innovation. Of course, we are a smaller company compared to CEA. It was created 15 years ago, and we are based in the south of France, in Toulouse. We are a team of statisticians, only statisticians, with skills in industrial statistics, for example SPC, measurement system analysis, and of course machine learning and so on. I have been a JMP user since 1995, a long time ago; I started with JMP 3. Of course, Ippon is a JMP Partner because we use JMP a lot. For example, for yield optimization we have a tool called Yeti, for automatic yield optimization in complex manufacturing systems. We also develop solutions based on customer requests, what we call software on demand, for example a full solution for outlier detection or statistical process control. Using JMP and JSL, we have several JSL experts here, including Carole Soual, who is a co-author of this talk today. We also have classical consulting and training expertise based on JMP and industrial statistics. As for the content of the presentation today, we will present two real case studies with Damien. The first one is based on simulation and a computer design of experiments, and the second one, presented by Damien, is on a machine learning tool for prediction.
I will present case study number 1, based on a mixture. Is it okay? You can go to the next slide. The context is a risk assessment and a probability calculation. To explain it a little bit, a mixture design of experiments was created by the CEA to evaluate the performance of a material for nuclear waste. The conditioning is done with salts in a matrix. The performance is determined by a threshold on the energy: the energy has to be higher than the threshold. It is not so easy to estimate the probability of being below this threshold. We will use all the tools and all the data that we have to do this task. Of course, the probability has to be as small as possible, and Damien, the CEA expert, will assess whether this probability is okay. Now, what methodology did we use to estimate this worst-case probability?

First of all, based on the data from the mixture design of experiments, we estimated several models: basically, some classical linear models, classification and decision trees, and neural networks are the main models used. For each model, we have done three analyses. First, Monte Carlo simulation on the factors, so classical random simulation with Monte Carlo. Second, what we call a space filling design, a computer design with JMP, where we try to explore the design space by computer simulations, adding noise on the response; this is done with JMP Pro. The last thing we did is a blending of Monte Carlo and space filling design. We will detail this, but it is very useful as we want to estimate a worst-case probability. We can go to the next slide.

Case study number 1: classical JMP simulation for the mixture DOE. First of all, we have to select the best model. Before doing the simulation, we need to find the best model. There is a very nice feature in JMP Pro called Model Comparison: very quickly, you can compare the models based on criteria such as R Square or the AICc. We have done this, and I will do the demo right now. I share my screen now. I have to share my screen; we have to use JMP. Here, I open the data set. This is a data set from a mixture design of experiments. There are eight factors, X1 to X8, and the response is the energy. To show you an example of the models that we fit, we perform here, for example, a predictive modeling neural network. The response is the energy, and the X factors are here. We click here. For information, we decided not to divide the full sample into a learning sample, a validation sample, and a test sample, because the data come from a design of experiments, so we do not have a lot of data: thirty-one experiments maximum here. We use K-fold validation to save samples, I would say. A very simple neural net here, one layer, and we click Go. This is quite a good model, with a correct R Square, let's say. How does model comparison work in JMP? You need to save the prediction formula: you click on this spot and I save the formula. Doing this, you have the formulas of the neural net with its hidden layer.
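For readers following along, the saved neural net prediction has this general shape (a sketch with K hidden tanh neurons; the actual coefficients are the ones JMP writes into the formula columns):

```latex
\hat{E}(x) = c_0 + \sum_{k=1}^{K} c_k \,\tanh\!\Big( a_{0k} + \sum_{j=1}^{8} a_{jk}\, x_j \Big)
```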
You see the formula: a hyperbolic tangent of the linear predictor. The second neuron of the hidden layer has another formula, and so on. At the end, for the neural net, the final prediction formula is a linear combination of the hidden neurons. I keep this formula for the moment, and I'm going to do the same with a linear model. The linear model was saved here to save time. We decided to have a linear model; for a mixture design, we have some cross effects here in the linear model. Of course, we need to clean the model: we need to remove what is not significant, and so on. When that job is done, we also save the formula, what I call the prediction formula. Here, once again, somewhere in the JMP table you will have the prediction formula for the energy from ordinary least squares, a classical linear formula for the linear model.

To summarize, here I have two models, a neural network and ordinary least squares, and I have a formula for each of the two models. Then, with these formulas, I go to Analyze, Predictive Modeling, Model Comparison. This is the nice JMP platform for machine learning. All you have to enter is the prediction formulas. Here, for the example, I just enter the two formulas, ordinary least squares and neural net, and that's all. You just click OK. Here you have a model comparison with some criteria, and you select the best model. Note that I added noise to the data, so the R Squares are not so good. In addition, we can compare both models. In that case, for the worst-case analysis, we decided to perform the simulation both on the linear model and on the neural net. Now, we are going to do that work. Let me show you what we did. Maybe I'm going to share this slide, Damien, because I need to show something here; it will be easier with my PC. So the live demo is done. Here I'm going to use a Monte Carlo simulation, which is very easy with the JMP prediction profiler.

First of all, this is my first demo. After that, I will do the space filling design on the mixture design of experiments. I have to say that, both for Monte Carlo and for the space filling design, we have mixture constraints: the sum of the components has to be equal to one. It's not a true space filling design and not a true Monte Carlo, because you have this constraint, but JMP has a smart, iterative algorithm to take the constraints into account: we simulate Monte Carlo on the first factor, then on the second factor taking the first one into account, and so on. At the end, I will present the full simulation with both Monte Carlo and the space filling design. This is new, because what we have done is, for each run of the space filling design, 1,000 Monte Carlo simulations. It's really a worst case here, but that was the objective for the CEA, to have the worst case. Thanks to this, we will get a really good estimation of the worst-case probability. Now, let me jump to the demo.
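Before the demo continues, the fitting-and-comparison workflow just described can be scripted roughly as follows. This is only a sketch: the table name, the formula column names, and some option names are assumptions rather than the presenters' actual steps.

```jsl
// Rough JSL sketch of the workflow described above; the table name, the
// formula column names, and some options are assumptions, not the
// presenters' actual script.
dt = Data Table( "Mixture DOE" );                         // placeholder name

// Neural net: one hidden layer of 3 tanh nodes, K-fold validation (JMP Pro);
// the validation option name is assumed from the launch dialog.
nn = dt << Neural(
	Y( :Energy ),
	X( :X1, :X2, :X3, :X4, :X5, :X6, :X7, :X8 ),
	Validation Method( "KFold", 5 ),
	Fit( NTanH( 3 ) )
);

// The prediction formulas are saved to the table from each fitted model
// (red-triangle menu), giving two formula columns that can be compared.
// The column names below are assumed placeholders.
Model Comparison( Y( :Pred Formula Energy, :Pred Formula Energy OLS ) );
```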
First of all, Monte Carlo. Here I have the result for the Monte Carlo. No, sorry, it's not the right one. Here it is. Let's use the neural network. For the neural network, you can ask for the profiler, and in the profiler you can ask for the simulator, which will randomly move the factors. What is important here is that I'm asking for uniform distributions, because that is more classical for a mixture design. Here, I'm going to simulate random data for the factor X1, but between this value and this value, because there is a constraint here. I continue with the second factor, with a uniform simulation between this value and this value, and so on. To save time, I will run the simulation automatically here. Here it is. Sorry, I don't have the right data set. Sorry about that. Where is it? Model comparison and simulation. Here it is. Okay, here it is: Monte Carlo simulation. To save time, I'll show you directly the result of a random simulation with the mixture constraint.

We have done this Monte Carlo simulation and we have this result on the energy. What is also nice is that you can put the result in a table. This is the result: 10,000 Monte Carlo simulations. For each simulation, with the model, you have the energy. Now, to calculate the probability of being lower than the spec limit, we just have to do a distribution of the simulated energy. This distribution is not exactly Gaussian, it is closer to a Laplace distribution, but anyway, it doesn't matter. We will do the process capability, and the spec given by the CEA is minus 100. I just click on this and we have what we call the capability analysis, the overall capability. Ppk is quite good, higher than 1.3, and the expected percentage out of spec is very low, because this number is clearly very low. We can have a look at it in scientific notation: we are at a very small probability, and clearly it was acceptable for the [inaudible 00:15:13]. At this point, based on the neural network model and on Monte Carlo simulation, we have estimated the first probability of being out of specification, and this probability is here.

Next step. What we have done here is that we are going back here. Sorry, I don't close this. I don't close this. I'm going here to do another simulation with what we call a simulation experiment. A simulation experiment is also sometimes called a space filling experiment: you try to explore the design space. Here, we have to remember that there is a mixture constraint, so we will not explore the full design space, of course, but we will explore part of it. Here is the result: 128 computer runs with the simulated data. We can have a look at the result of the simulation. If we do a scatterplot matrix on the simulated experiment, on the factors, here is the exploration of the factors with the mixture constraint. With this, once again, we have simulated energy, but here it's not a Monte Carlo simulation, it's a computer simulation experiment. Same job: we can put the energy here.
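A quick aside on the capability numbers above: with the simulated energies saved to a table, the out-of-spec probability can also be cross-checked by simply counting the rows below the spec limit. A minimal JSL sketch, with an assumed table and column name:

```jsl
// Minimal sketch: estimate P(energy < -100) directly from the saved
// simulation table. Table and column names are assumptions.
dtSim = Data Table( "Monte Carlo Simulation" );           // placeholder name
nBelow = N Rows( dtSim << Get Rows Where( :Simulated Energy < -100 ) );
pOut = nBelow / N Rows( dtSim );
Show( pOut );                                             // compare with the capability report
```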
Where is the simulated energy? Here it is. Once again, roughly normally distributed. Here I'm going to calculate, once again, the process capability with a spec of minus 100. Here it is. Here I should have the process capability, which is good. Once again, you have a probability of being out of spec, which is close to 7 per 1,000. It is also an acceptable result, but you can see that in this case the probability is higher than the previous one, because the calculation was different: it was a simulation experiment exploring the space, which is different from a Monte Carlo simulation.

The last thing that we did, and I'm going to open the data file. Here it is, Simulated Monte Carlo. Here it is. Not the right one, sorry, I have a lot of things open. Here it is. What is this file? What we did here is both a space filling design and, for each run of the space filling design, 1,000 Monte Carlo simulations. The total number of points here is 128,000 lines: both Monte Carlo and a computer design of experiments. Here we really have a good data set with all the potential variations: some are forced by the design of experiments, others are clearly random from the Monte Carlo. Here, once again, we are going to estimate the probability. You have the nice distribution of the energy, and then we will once again calculate the probability of being out of spec. Entering the spec here, here it is. We have a probability of being out of spec which is close to 1 per 1,000. Once again, it was quite a good result, and this result is quite innovative. Just for information, we had to create a little JMP script here to do this, to mix the computer design and the Monte Carlo simulation; there is a little JSL code for that. That's all for my part. Damien, maybe you can go. I stopped the sharing.

Okay. This is now case study number 2. For this study, we have developed a custom tool for a predictive application. The objective here was to create a tool including statistical models in JMP Pro in order to predict a specific property, the glass viscosity, as a function of composition and temperature. To do that, experimental data come from both a commercial database and from our own database at CEA. As we will see, the originality of the approach comes from the methodology for data subsampling. We wanted the algorithms to be coded in JSL and implemented in JMP Pro. The response of the model is the glass viscosity, of course, and the factors are the weight percents of the glass components. Here is some background information. You have to know that glass is a non-crystalline solid obtained by a rapid quench of a glass melt. From a material point of view, glass is a mixture of different oxides. The number of oxides varies from two or three in a very simple glass to about 30 and even more in the most complex compositions.
There is a long tradition in the calculation of glass properties, and we think that the first models were created in Germany at the end of the 19th century. Since then, the amount of published literature in the field of glass property prediction has increased a lot, so that today we have a huge amount of glass data available in commercial databases. Several challenges remain for the prediction of glass viscosity, because viscosity is a property that is very difficult to predict. First, viscosity has a huge range of variation, over several orders of magnitude. Also, viscosity is very dependent on physical and chemical mechanisms that can occur in the glass melt, depending on the glass composition, like phase separation or crystallization, for example.

Here is just a short example that shows this difficulty. We selected three compositions of what we call SBN glass, which is a very simple glass with only 3 oxides. We applied the best-known models from the literature to calculate the viscosity, and then we compared the predicted values with the experimental values we measured with our own device. You can see that even for a very simple glass, it is not easy to obtain one reliable value for the predicted viscosity. Here is a picture we like to use to give a view of the database, where each dot is one glass in a multidimensional view of the domain of compositions. Data may come from different isolated studies, from studies using experimental designs, or from parametric studies with variation of one component at a time. We spent a lot of time in the past applying different machine learning methods. A classical approach was used, partitioning the data into a training set and a validation set, but in the end, no statistical model with acceptable predictive capability was found to predict the viscosity. That's why we decided to use a different approach.

Instead of using all the data, we think it is better to create a model using data close to the composition where we want to predict the viscosity. So, for example, if we want to predict here, one model will be created from the data we have in this area, and a different model will be created if we want to predict the property here, for example. That's why we say this technique is dynamic: the model depends on the composition and is built and fitted where we want to predict. We say the approach is automatic because we don't have to do this manually: every step is done by an algorithm implemented in the tool. One of the most important points is certainly the determination of the optimal data set to create the model. For that, we have implemented two methods of subsampling. In the first method, a theoretical or virtual design of experiments is generated around the composition of interest. Then each run of the design is replaced by the most similar experimental data point present in the database, leading to the final training data set.
The second method we have implemented in the tool is based on data sets of different sizes created around the composition of interest. A small data set is generated by the tool, and models are created on this small subset to predict the viscosity. Then bigger and bigger data sets are generated automatically, and the optimal size is evaluated by several statistical criteria associated with each subset. Finally, the construction of the models is based on three different algorithms implemented in the tool: first, a polynomial model obtained by multi-linear regression; second, a Genreg model; and third, a neural net model. With two subsampling methods and three algorithms, we end up with six different calculated values, which makes the prediction very robust.

Let's go to JMP to see how it works. Let me first show you the code. The script, as you can see, is quite long, about 700 rows, which is quite complicated JSL code. The first thing you have to do is enter the composition of the glass for which you want to predict the viscosity. To do that, you can use an interface we created: you just select the oxides entering the composition and, for each oxide, enter the weight percent. Or, if you want, you can enter the composition directly in the script, which is a little bit quicker, I would say. Then you launch the script. I won't do that now because it takes about one or two minutes; it's not very long, but for this demo I have already run the script. Let me show you the results.

At the end of the calculation, you have this window where you can get a statistical report, which is very interesting. First, here you have the composition you entered, just as a reminder. Then we have the graph showing the predicted values. On the Y axis, we have the predicted values of the viscosity calculated by the three algorithms and for the two methods. On the X axis, we have the number of enlargements for the second method I described. In red, which is the most important value, I would say, is the average of all the different predictions; it is the best prediction, I would say, of the glass viscosity. If we need more statistical details, we have a lot of information in this report to study the quality of each model. For example, we can check the values of the PRESS statistics for the multi-linear BIC F model. Here we can see that the PRESS values tell us that the prediction using method number 1 is a little bit better than for the second method. We also see the model degradation with the enlargement of the training set. We can also check the R Square values for the two different methods and for the different algorithms, and we can compare them. We can get even more details on the designs of experiments that were created and all the prediction formulas. This is a lot of information, but the most important part is here: the predicted value of the viscosity. Let's go back to the PowerPoint.
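Before the results, here is a small illustration of the subsampling idea behind both methods: pick the database rows whose composition is closest to the point where we want to predict. This is only a sketch under assumed names, not the CEA tool, which uses virtual designs, several distance definitions, and growing subsets as described above.

```jsl
// Illustrative sketch only (not the CEA tool): keep the nKeep database rows
// closest in composition to the point of interest. Table name, column
// layout, distance metric, and subset size are all assumptions.
dtBase = Data Table( "Glass Database" );          // placeholder name
target = [60, 15, 10, 5, 5, 3, 1, 1];             // wt% of the composition of interest
nKeep = 50;                                       // assumed neighborhood size
dtBase << New Column( "Dist To Target", Numeric );
For( i = 1, i <= N Rows( dtBase ), i++,
	d = 0;
	For( j = 1, j <= 8, j++,                      // assumes the first 8 columns hold oxide wt%
		d += (Column( dtBase, j )[i] - target[j]) ^ 2
	);
	Column( dtBase, "Dist To Target" )[i] = Sqrt( d );
);
dtBase << Sort( By( :Dist To Target ), Order( Ascending ), Replace Table );
dtTrain = dtBase << Subset( Rows( 1 :: nKeep ), Selected Columns( 0 ) );
// dtTrain would then be used to fit the local polynomial, Genreg, and neural models.
```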
This is the result obtained for the simple SBN glass, which is, as I said earlier, a simple glass with only 3 oxides. We calculated the viscosity for three different compositions of SBN glass, and we compared the results obtained from our tool with the results from the models available in the literature, in terms of the relative error of prediction, which has to be as low as possible. We can see that the best predictions are obtained with our tool, which is really great. Here, we have the results obtained on more complex glasses. The tool's predictive capability was evaluated by extracting 230 rows from the global database. In this table, we have the relative error of the viscosity prediction for different types of glass and for the global subset of data. Three quantiles are given: the median, meaning that 50% of the predicted values have a relative error below the value indicated here, and also the 75% and 90% quantiles. When we talk about glass viscosity, we traditionally consider that a predictive error of around 30% is very good. We see that for the majority of the data, the model capability is fine, and we were very happy with the results we obtained.

Here are some very important key points. It is very important to take into account as many inputs from the glass experts as possible; for example, we had to create specific algorithms to handle the nature and the role of the oxides on viscosity. Another point of major importance is related to the origin and the reliability of the data. For this, a significant amount of time in this project was spent building a reliable database. We also had to implement and study different ways of calculating the distances between glass compositions.

It's time to conclude now. We have presented two different case studies. In the first case study, we created several models and compared them for the risk assessment. We have seen that it was easy with JMP to perform Monte Carlo simulations, even for mixture designs with constraints, and that it was also easy to perform space filling designs, again even for mixture designs. By combining Monte Carlo and space filling designs, a worst-case probability has been estimated. In the second study, we presented the tool we created, with an original method of subsampling the data: for each composition of interest, a specific viscosity model is constructed around this composition. We have seen that the prediction accuracy on the viscosity is very promising and much better than the models available in the literature. Thank you for your attention.
Siltronic AG is a global technology leader in the semiconductor wafer industry. This presentation will introduce the Siltronic AG approach to preparing batch process data for modeling with JMP Pro. It will demonstrate some interactive steps to clean and rearrange the dataset before modeling, using an anonymized dataset containing both historical and experimental batch data. Once the best model algorithm is found, the boosted tree model will be tuned. The Siltronic AG team found that a technically sound model may be physically worthless, meaning it had been overfitted. Therefore, the team started with a large set of factors, gradually reducing the factor list and testing the model's behavior to find the most effective factors (step backward strategy for a boosted tree in a small JSL routine). The last step provided the best insight into which levers are the strongest to optimize the process.

Hello, everyone. Thanks for joining in. In this talk, I want to talk about how we prepared our batch process data for modeling with JMP Pro and gained valuable insights with a team approach. My presentation starts with a PowerPoint part; that is the first part, and all the details shown there will follow in JMP: how the data set looks and which platforms I have used, like Missing Data Pattern, multicollinearity, Functional Data Explorer, Predictor Screening, modeling batch data with Boosted Tree, and the Profiler. The summarized data will be analyzed by Boosted Tree as well, and then by a script with Boosted Tree backward selection.

First, my company: Siltronic has world-class production sites all over the world, like shown here, and about 4,000 employees. Here are some key figures. Imagine that we have complex process flows like the one shown here, with molten silicon in a crucible; the silicon ingot is created here. That's my special task: to develop processes for growing silicon ingots. Then the ingot is ground and sliced. Edge rounding is done for the wafers, then laser marking, lapping, cleaning, etching, polishing, and maybe epitaxy for the final wafer to be created. Our portfolio is that we sell 300 millimeter, 200 millimeter, and smaller diameter wafers for different applications, like shown here: silicon wafers with several specifications.

About me: by education I'm an electrical engineer, I did some Six Sigma training, my main task is to develop processes for growing silicon crystals like shown here, and I'm also responsible for around 500 JMP users at Siltronic. What does the task look like? What we see here is the final table, but creating it takes a lot of effort as well. There are some database queries behind it to get the data from the database. We fetched the results into JMP data tables, enlarged the data set with archives from earlier dates, enriched it with information like details of experiments and details on consumables, and wrote some scripts for graphs and evaluations.
Then we did the modeling tasks and, of course, looked for missing data and correlations, to see which effects are most significant and to do feature engineering, that is, to see which features are important for generating an optimal result.

At this point, I will switch into JMP. We can see here my journal that I'm working with, the JMP main window, and the abstract here. We will start with some technical hints. The data set I show here is fully anonymized and standardized, and all identifiers are generic, for a better understanding of what the features are, what the result is, and so on. The aim of this presentation is to show all the steps we needed for getting an overview, restructuring, and understanding the data set, and how to build the models to get some insight into the content of the data set. I will show some results that we discussed as a team. The team is very important here, because the team drove a lot of discussion and work as well: how to analyze, which features may be interesting and which may not be, and what the physics behind them may be.

I will start with the data set here; it's also part of the contribution in the community. Here it is opened, and I will change the layout a little bit to see how it looks. We have around 80,000 rows in this data set, and it's a batch data set, so we have a batch ID. This data set is quite challenging because it has a mixture of historical data, like these POR batches here. We can see that most of the data is historical data, and there are only a few special experiments, shown here. Then we have several features: one categorical, the consumable; the batch maturity, which is the time, also standardized; then several numeric features, these X values here. We have one result column, and to reduce the noise a little bit, we have calculated a moving average as well.

Let's have a look at the data set in more detail. If we do a summary on the data like this (you can do this from the Tables menu as well, Summary), we get around 500 rows, 500 batches. This is a summary by batch, and we see that there is no variation in the parameters X1 to X4, meaning that they are constant for each batch, and the others are changing at different rates. To get an overall look at the data, we can see here the result parameter, the yield, over time for all the rows of the batch data set; this smoothing is done by the JMP Graph Builder platform. We implemented the moving average as a column formula, as it is available as a function in JMP. We can have a look at some special batches as well: if we use the local data filter, we can see how the average works and how much noise is in the single data points. The blue points are the original yield data, and the orange ones are the moving average (a Graph Builder sketch of this view follows below). I will close this then, and the next point may be to look at how much data is missing. We have this in JMP as well.
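The view referred to above can be reproduced with a Graph Builder call along these lines; the table and column names are assumptions based on the anonymized data set, not Martin's saved script.

```jsl
// Sketch with assumed names: raw yield points over batch maturity,
// overlaid by batch, plus a smoother to show the average trend.
dt = Data Table( "Batch Data" );                  // placeholder name
dt << Graph Builder(
	Variables( X( :Batch Maturity ), Y( :Yield ), Overlay( :Batch ID ) ),
	Elements( Points( X, Y, Legend( 1 ) ), Smoother( X, Y, Legend( 2 ) ) )
);
```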
We can mark all the columns and run the Missing Data Pattern platform like this. It shows us that out of about 80,000 rows, we have 178 rows with some missing data in one column. This can also be shown as a graph. This is very important: at least for the data creation steps, it was important to see where data is missing and then to fix the missing data as much as possible. Another way to look at the data is the Columns Viewer. I put all the columns in, and here we can see again, as before, that some rows are missing for parameter X2. We can see the min, max, mean, standard deviation, and so on for all the parameters. Here we nicely see that everything is standardized; the yield, for example, is between zero and 100. From here we can also start the Distribution platform: all the columns are marked, and with only one click we get the distributions for all the data. We can see which consumables are used how often, that we have most data from historical processes, and only a few experiments with special settings. The time, of course, looks nicely distributed, but the others don't look that nice: there is a lot of room between some settings, and most parameters are sparsely and non-normally distributed, which makes this data even more challenging to analyze.

We go to the next steps; I will close these reports. Then we may look even more in detail at some things, like how the parameters are correlated. We can see this in the Multivariate platform; it needs some time to be calculated. You will find it under the Analyze menu, Multivariate. It takes the numeric columns and generates this correlation report, and you will see that parameters like X6 and X5 are highly correlated, and X10 and X9 as well, which makes feature engineering difficult. What we want to know from the analysis is which parameter causes the yield drop, and if two parameters are correlated, it's not so easy to find out which one is the responsible one. In the scatterplot matrix, you can also see which parameters change with time (X1, X2, up to X4 are constant over time, and the others are changing), how they are distributed, and you can nicely mark some rows like here; they are then selected in the data table, and you can see how the curves look for each parameter over time, or how each parameter combination looks.

Next, I want to use the Functional Data Explorer. The Functional Data Explorer allows us to fit curves for each batch and extract the features of each curve. Then we can have a look at which batches behave similarly, or maybe find extreme ones. The start is like this; we can have a look at how I started the analysis. We launch the analysis: I put time as the X parameter, Yield as the output parameter Y, and the Batch ID as the ID function. Then we have some additional columns here, like Part and Group.
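The launch just described corresponds roughly to this JSL; the role names and column names are assumptions based on the dialog Martin describes, not his saved script.

```jsl
// Rough sketch of the Functional Data Explorer launch (JMP Pro); role and
// column names are assumptions based on the dialog described above.
Functional Data Explorer(
	Y( :Yield ),              // output modeled as a curve
	X( :Batch Maturity ),     // time axis within each batch
	ID( :Batch ID )           // one function per batch
);
```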
This platform is available in JMP Pro only, and when we start it, we can do some data processing here, but in this case it's not necessary. We can have a look at how each batch looks; there are a lot of graphs here like this. We can mark the rows, and we can see the marked rows in the data table as well. To go on with this platform, we need to fit models like P-splines for each batch. JMP does this and decides itself which splines are used and how many supporting functions are needed, like the knots shown here. The best result is given by a cubic spline with 20 knots. You can see how each batch is modeled by the red line shown here and how it looks. Here we have the shape functions: each curve is built up from a combination of shape functions, and for each batch we get the coefficient of each shape function. If we look at Shape Function 1, this is the main behavior of all batches, with a drop here at around 0.7. Here we have Component 1, which is the coefficient for Shape Function 1. If we select these batches, we see that they have a pronounced shape like Shape Function 1; we can see it here. We can also use the Profiler. This is mostly for understanding the data, but we have not used it for further analysis, because we did not really need the information about how each batch looks as a shape, curve by curve. We were more interested in the average yield of each batch, because we cannot decide to use only the first part of the batch and forget about the second part; that would not work in our case.

To see again how this fits together, we can have a look in Graph Builder at the graph of some batches we have just seen. Maybe you recognize this number here; we have seen it before. Here it is shown again, together with the moving average of the yield. The next step would be to start modeling the batch data. When doing modeling, it may be interesting to have an idea of which parameters are most important for the variability of the output. For that, we have the Predictor Screening platform. You can also start it from the Analyze menu, Predictor Screening; I wrote it here as a script simply to start it by pressing a button. When we do so, we see a Bootstrap Forest analysis running, and it shows us the importance of the features we have in our data set. Time is the most important, but in the end this is useless for us, because we need to use the full batch. Then comes X1, then Part, X8, X5, and so on. Here we could also select a few rows, copy them, and put them into a model. I will stop this here. To see which model works best, I used the Model Screening platform. I will not run it here because it takes several minutes, but we have seen that the Boosted Tree platform may perform well. There is maybe not such a big difference from the next ones, but that's the reason why I used the Boosted Tree platform.
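For reference, the predictor screening step shown above is a one-line launch like the following; the column list is assumed from the anonymized table, not Martin's saved script.

```jsl
// Predictor Screening runs a Bootstrap Forest behind the scenes and ranks
// the factors by their contribution to the response. Column names assumed.
Predictor Screening(
	Y( :Yield ),
	X( :Batch Maturity, :Part, :Consumable,
	   :X1, :X2, :X3, :X4, :X5, :X6, :X7, :X8, :X9, :X10 )
);
```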
Then we run Boosted Tree like this on the batch data, and it works quite quickly. We see the result, and a nice feature of the Boosted Tree platform is that you get column contributions, so you can nicely do some feature engineering. We can see here that we have 71% R Square for training and 66% for validation, which may be okay, and we still have all the features in. But we are mostly interested in which features are reliably the most important ones. We can save the model as a column, Save Prediction Formula, in the data table. We see in the data table that we have a formula now and we can use it. We can have a look at how the model performs, or use it in Graph Builder simply to see how the modeled data looks over the batch maturity. I hope Graph Builder shows the graph soon, and here it comes. Yes. We have seen that this modeling works quite well, so we now have a formula that rebuilds the data and we can work with it.

But especially for the batch data modeling, we have a problem here: validation will not work properly, because for some batch we may have these rows in the training set and the rows next to them in the validation set, so they are not well separated with respect to the features that control the batch. Additionally, the model is not very stable, so we get different results for different runs of the model; this is known for tree-based methods, which may give different results on high-variability data. If we run Boosted Tree twice, we also get different column contributions, and maybe a different order, as we can see here: Part and X5 are switched for these two runs. I will show it again here as well, if we run Boosted Tree twice. I don't know. Yes, here I should have the script; it comes later.

So at this point, we said that it may be better to model the summarized data, because we need to use the full batch. Here I have a script to summarize the data in a form where we have only one row for each batch (a sketch of this summarizing step follows at the end of this section). There is a nice option, statistics column name format, so that we get the same column names for the summarized data as in the original table, and we can use the same scripts for both. Doing so, we get the summary data table (I can close the script) with around 500 batches. It's a lot easier to model, and here I have summarized the data for the 0.6 to 0.8 time window, which is where the yield drop was. We can again do some predictor screening like this; I still kept time in that data set, mostly to see where the noise level is, and the parameters around it are likely also just noise for the model. Then we can, of course, do some model comparison. I selected a few parameters that we found to be most probably responsible, I run two Boosted Tree analyses, and then I do a model comparison for both. It looks like this.
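Here is the summarizing sketch referred to above: one row per batch over the 0.6 to 0.8 maturity window, using the statistics column name format option so the column names match the raw table. The table and column names are assumptions, not the actual Siltronic script.

```jsl
// Sketch (assumed names): restrict to the maturity window where the yield
// drop occurs, then summarize to one row per batch.
dt = Data Table( "Batch Data" );                  // placeholder name
dt << Select Where( 0.6 <= :Batch Maturity <= 0.8 );
dtWin = dt << Subset( Selected Rows( 1 ), Selected Columns( 0 ) );
dtSum = dtWin << Summary(
	Group( :Batch ID, :Part, :Consumable ),
	Mean( :Yield ), Mean( :X1 ), Mean( :X5 ), Mean( :X8 ), Mean( :X10 ),
	Statistics Column Name Format( "Column" )     // keep original column names
);
```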
We can see that we get a Profiler and can compare the results for different settings, maybe like this. Here we still have the problem: we see some features, like this one here for X10, in one model and not in the other, so it likely is noise. At the beginning, we discussed these differences a lot. We saw them sometimes and sometimes not, and asked the question: what is true, what is physical, and what is not? That brought me to the conclusion that we need to continue with feature selection, and that's why we created this script (a rough outline of the loop appears after the talk). It takes this summary data, and it has been done for the batch data as well. For each step, it builds a Boosted Tree model for the current parameter set, saves the model into the Formula Depot, saves the model performance, R Square and so on, into a data table, and shows us the column contributions. Here we can see something that we have seen often: the higher-numbered model, which is the model with fewer parameters, as we can see from the column contributions, gives the best result. It looks different for each run, but we see the same tendency most of the time. So here we can be more or less sure that Part, X1, and X5 are the most important parameters. This one may be there sometimes and sometimes not, so we will focus on these three parameters.

We can also have a look in the Formula Depot. There we can start a model comparison; maybe we can compare the first model. We do it like this: Model Comparison, this is our data table, take the first number. The numbers here are shifted by one, and maybe the fifth should be that one, and the last one. This will not work; I think it's number three, and the last one. These are the ones with the highest validation score, and we compare them here. We see the Model Comparison dialog, and we see that the last model is among the best models we could fit at all. We can see the Profiler here, for example, and also use extrapolation control. We have seen that we have sparse data, so there is not data behind every point. Let's look where it is. Here it is: extrapolation control warning on. It shows us when there is no data between the points. Here we can compare the models, and we see that there is no variability on the X factors that haven't been used.

To sum up, let me close some tables and dialogs first. To sum up, we have prepared a workflow for modeling this data, and we have created several steps and additional scripts to enhance understanding and to drive the discussion about what's important and what's not. I have a proposal for a model and some tasks we can focus on to improve the yield of our process, and you will find the data and the presentation in the user community. If you have other ideas on how to explore this data set and how to find the final best model, you can contact me or post something on my contribution in the community for this Discovery Summit. Thanks for your attention and bye. That's it, Martin.
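As a footnote to this talk: the step-backward idea Martin describes, refitting Boosted Tree with one factor fewer at each step and keeping every model for comparison, can be outlined in JSL roughly as follows. This is only a sketch under assumed names and options, not the actual Siltronic script, which also writes the fit statistics and column contributions to a table and publishes each model to the Formula Depot.

```jsl
// Outline only; platform options, the Save message, the drop order, and the
// list-substitution idiom are assumptions, not the actual Siltronic script.
dtSum = Data Table( "Summary Data" );                      // placeholder name
factors = {:X10, :X9, :X8, :X6, :X5, :X1, :Part};          // presumed weakest listed first
While( N Items( factors ) >= 3,
	Eval( Eval Expr(
		bt = dtSum << Boosted Tree(
			Y( :Yield ),
			X( Expr( factors ) ),                          // current factor set
			Validation Portion( 0.3 )
		)
	) );
	bt << Save Prediction Formula;                         // keep one formula column per step
	Remove From( factors, 1 );                             // drop the presumed weakest factor
);
```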
A client working on blending his wine wanted to get as close as possible to a known wine with different ingredients. To help the client do this, we created a mixture design with four ingredients for 16 samples to compare with the known blend. The samples were tasted randomly, and the panel was asked to create groups of similar samples and describe the groups. The result was a distance matrix developed with the help of a JMP script. This matrix was processed by multidimensional scaling to obtain a map that was easy to describe to the panel. A K-means classification was used to find the samples close to the target. Finally, the distance between the target and the other samples was calculated and represented by a contour plot to show the best part of the mixture design. The terms used by the tasters to describe the groups were processed by JMP Text Explorer and then by AFCM (multiple correspondence analysis) to show a map with samples and terms to better describe each sample's position using sensory properties.

Good morning or good afternoon to all of you. I'm Margaux Renaud, and today we will talk about a mixture plan for wine blending and the tasting of its modalities to validate the recipe of a wine. First of all, I would like to present my company. I'm working for Chêne Company, which is a group of cooperages. It owns a French cooperage, Taransaud, which makes barrels and vats from French oak; an American cooperage, Canton; Kádár Cooperage; and [inaudible 00:00:42], which makes oak wood sticks and chips, XtraChên. The French cooperage and the R&D department are based in the Bordeaux area in France. In our R&D department, we have eight people with different backgrounds: a PhD in chemistry and enology, an engineer, an agronomist, an enologist, a technician. With all these various skills, we run a lot of different trials: from the forest, for example, trials about the DNA of the oak in the forests; on the aging of the wood used for the barrels or the vats, in relation with climate change; and analyses directly in our clients' wine in our barrels.

Today, I would like to talk with you about a mixing plan for a wine blending. It's a client trial, and I will present the client's problem. In this case, the client has different wines, and for one of them, they want to keep the same wine style but optimize the ingredients. In the wine industry, what we call ingredients is very diverse. It can be different varieties of wine; in the Bordeaux area, we usually mix Merlot and Cabernet, for example. It can be different qualities of wine, or different types of aging, whether the wine is aged in barrels or in tanks, with oak chips or without. In this case, the client has four different ingredients to mix. Our team followed the different ingredients during all the wine aging, and at the end, we created a mixing plan and we tasted it. Just before going to JMP to present the way to process the data, I want to talk a bit about wine tasting. An important thing in the wine industry is that all wine recipes are decided by tasting. We do a lot of analyses, but analysis is not the last word on a recipe.
It's always the tasting. The usual way to taste wine is a quantitative tasting: we do a profile with grades on different descriptors. It could be bitterness, for example, or the fruity notes or the woody notes, and all the tasters rate the intensity of each descriptor. Then we process the data with a two-factor ANOVA: the first factor is the modality, the different modalities in the trial, and the second is the taster. To have really significant results, you need a large and trained panel for your tasting.

In our case, when we do a trial with a client or within our group, we have different types of tasters. Most of the time you have the winery team, part of the commercial team, and part of the R&D team. All these tasters do not taste the wine in the same way; they do not have the same target when they taste the wine. The target for the client is not the same as for the commercial side, and it's not the same for us. Most of the time, we are not trained to taste the wine in the same way. When we analyzed the data, there was a really big effect of the taster: in fact, the tasters do not have the same feeling about the profile they are asked to rate.

For us, it's complicated to use the profile, so we decided to use another type of tasting, free sorting. Free sorting is a tasting where I ask my tasters to taste the different modalities and to make groups among them. I put a little example in the PowerPoint. In this case, I ask the tasters to group the wines that are similar, and if there is a difference between two modalities, to put them in two different groups. In this example, there are 11 samples, and the taster decided to make four groups: a first one with four samples, a second one with three, another one with three samples, and the last one with only one glass of wine. After making the groups, I ask them to describe each group a bit. In this case, the taster decided to put four samples together because they have some chestnut notes not present in the other samples. With this approach, we don't need a trained panel: if the difference between my modalities is big enough, all the tasters will normally put the really close wines together and put the other wines separately.

This type of tasting is really easy for us to use because we don't need a trained panel, we can have a small panel, and it can be used in different languages. It doesn't matter if we have a French panel or an Italian panel, for example; they just have to make groups. The other thing, thanks to JMP, is that it's easy to present the results right after the tasting. When you do a profile, you have to process the data, run the ANOVA tests, and send the results to the client; most of the time it takes a few days, or a few weeks if you are really late. With free sorting, thanks to JMP, we can present the results right after. This type of tasting creates a distance matrix between all the samples.
In fact, if you put samples in the same group, there is no distance between them. If you put them in two different groups, there is a distance of one between them. At the end, you can build a distance matrix between all the samples. That is what I do with JMP, and I will show you just after.

Okay, I will switch to JMP. To process this data, I'm using a project: I'm using several data tables, and it's easier for me to keep them in the same place. Before going to the tasting results, I just want to talk a bit about my mixing plan. I told you that my client has four ingredients. Unfortunately, I didn't make the mixing table with JMP, because when I began to work on the mixing plan, I was not yet confident enough with JMP to do it there. The client gave us a lot of rules for this mixing plan, a bit complicated, so we decided to make it by hand and to treat the rest of the results with JMP. Just to show you, this is my mixing plan. I have a code for each of my samples, the ingredients one, two, three, four, and the proportion of each one in these samples. There are just a few constraints: for each ingredient, there is a minimum and maximum proportion. The important thing is that the ingredients work two by two. Ingredients one and two work together: ingredient one plus ingredient two is always equal to 14% of the blend. It's exactly the same for three and four: the sum of these two is always equal to 86% of the sample. Those are the rules given by the client. Thanks to that, we made a mixing plan with 16 samples and a target. The target is the historical recipe of the winery, the typical wine, and the client wants the blend of the other ingredients to be as close as possible to the historical wine. You can see the mixing plan here; as I explained just earlier, the ingredients work two by two. Okay, this is the mixing plan. We created and blended the samples, and we did the tasting with the client.

This is my results data table. It's in fact very simple. I have a first column with my samples. In wine tasting, most of the time you have to taste without knowing which modality is in your glass. To do that, we create a random three-digit number, so that you can't know which sample it is. I put it in my first column, and after that I have one column per taster; in this case, I have five tasters. In each column, I put the group where the sample has been put. Just to show you with the Distribution: for taster one, in group three, he put the samples 474, 486, and 910, and it's the same for all the samples. I'm not sure I said it... Yes, I told you at the beginning: I ask my tasters to describe the groups with a few words. When I do a tasting with my clients, I don't write group six or group one in my results data table; I write directly the descriptor, the term used by the taster to describe the group. I will explain why a bit later.
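For readers curious what the community script computes from a table like this: a minimal sketch, not Margaux's script, that counts, for each pair of samples, how many tasters put them in different groups. The table and column names are assumptions.

```jsl
// Minimal sketch (not the community script): build a sample-by-sample
// distance matrix where each taster who separates two samples adds one
// unit of distance.
dt = Data Table( "Tasting Results" );             // placeholder name
tasters = {"Taster 1", "Taster 2", "Taster 3", "Taster 4", "Taster 5"};
n = N Rows( dt );
D = J( n, n, 0 );
For( a = 1, a <= n, a++,
	For( b = a + 1, b <= n, b++,
		For( t = 1, t <= N Items( tasters ), t++,
			If( Column( dt, tasters[t] )[a] != Column( dt, tasters[t] )[b],
				D[a, b] += 1;
				D[b, a] += 1;
			)
		)
	)
);
dtD = As Table( D );                              // distance matrix ready for MDS
```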
I have this data table. To get it, most of the time I ask my tasters to put the results in an Excel file on a tablet, like that; they enter all the results directly in the file, and I just have to open it afterwards with JMP. I need another data table, called NUMMOD. You can see that the first column is my random number and the second one is the modality, so you can see which modality is behind each number. Then the other columns are the description of each modality; in this case, it's the proportion of each ingredient.

I need these two data tables and I need a script. To process the data directly after the tasting, I created a script. For this script, I have to thank the JMP Community a lot, because they helped me a lot with this really complicated part. In fact, this script helps me create the distance matrix just from the results data I showed you earlier. I will not explain every line because it's a bit complicated, but I will show you how I use it. I just check that I am on the right data table and I run the script. I can save the results; thanks to the project, I can save the result directly inside the results folder. Yes, the results folder. Directly, I have my distance matrix. You can see I still have my sample number in the first column, then all the samples in columns and the distance to all the other samples; for 001 against itself, it's the same sample, so it's 0, and then you have the distances to the other samples. In the script, I have also joined the information from my other data table, so I can add the modalities and the ingredient proportions to the same data table.

The best way to show the result is to create a map. To build the map, I'm using a multivariate method, more precisely multidimensional scaling. In this case, I put my distance matrix in the columns. I didn't show you, but I have grouped all my matrix columns directly; that is also in the script. That way, I just have to select this group of columns to put into the launch. I add my distance matrix, I run it, and I get this map. I can see all my samples, the 16 plus the target, without knowing which one is which. What we can see is that some samples are really close. For example, 246 and 592 are really close; they looked really similar to all the tasters. Not identical, because they are not on the same point, there is a little distance between them, but really close. At the opposite, 246 and 661 are really far away from each other; they look really different. At this point, when I present the results to my panel, I begin by showing which sample is which. We can discuss whether all the tasters agree with the map; if they say, okay, I can find my groups on this one, we can talk about that, and I show which sample is which. For that, I just have to label the modality. I go back to my map, and you can see the code of each sample of the mixing plan and, most important, the target.
You can see the original recipe here, and we can say that some samples are really close to it. I think those would be interesting to use, with all the ingredients, to keep the same wine style as the target. To be sure of that, I do a clustering to ask JMP to show me which samples are really close to each other. For that I run a clustering, more precisely a [inaudible 00:19:04] cluster. As I did for the multidimensional scaling, I use the distance matrix as [inaudible 00:19:15], sorry. I run it. Usually I test three, four, or five clusters, because that is more or less the number of groups I usually get in a tasting. For this one I already know that three clusters is the best choice, so I test three and I save the clusters to the data table like that.

I can then put the cluster row state in the legend, and the map is colored by cluster. You can see we have three clusters that are really well separated. One looks very interesting, the green one: you have the target and four samples really close to the target. I could stop the process here and say to the client, okay, you can use one of these four samples from the mixing plan to keep the same quality, the same type of wine; they are really close, and maybe you choose this one, it is the closest.

But if I want to give the client more information about where they can play inside the mixing plan, I do another treatment. I would like to know the distance between each sample and the target. For that I saved the coordinates of each sample; you can see them right here, dimension one and dimension two, and I simply calculated the distance between the target and all the other samples. To go a bit faster, I had already created a script that adds a new column with a formula computing the distance between the target and each sample. I just run it, and you can see the new column with the distances.

To represent the best part of the mixing plan, I use Graph Builder. As I said, the ingredients work two by two, so I can represent the plan in two dimensions. I put ingredient three on one axis and ingredient one on the other; since they work two by two, we know that the complement of ingredient one is ingredient two, and the complement of ingredient three is ingredient four. We do not need to show the target, so I hide and exclude it. I have my 16 samples right here. I put the distance on color and I represent it with a contour and the points. To make it easier to read, I just change the color gradient; I take this one, green to yellow to red, so that the samples close to the target, with the shortest distance, are in green and the others are in red. I do not really know how to change the color of the points, so we do not see them very well. You end up with this kind of map, which is really interesting for the client.
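A rough JSL sketch of the distance column and the contour map is below. It assumes the MDS coordinates were saved as columns Dim 1 and Dim 2 and that the target's coordinates are known; the column names, the target coordinates, and the Graph Builder element options are illustrative, not the exact script used in the talk.

```jsl
// Minimal sketch: distance of each sample to the target in the 2-D MDS map.
// Assumes saved coordinate columns :Dim 1 and :Dim 2; 0.12 and -0.05 are hypothetical
// target coordinates.
dt = Current Data Table();
dt << New Column( "Distance to Target", Numeric, Continuous,
	Formula( Sqrt( (:Dim 1 - 0.12) ^ 2 + (:Dim 2 - (-0.05)) ^ 2 ) )
);

// Contour + points of the mixing region, colored by distance (option names indicative)
dt << Graph Builder(
	Variables( X( :Ingredient 3 ), Y( :Ingredient 1 ), Color( :Distance to Target ) ),
	Elements( Contour( X, Y, Legend( 1 ) ), Points( X, Y, Legend( 2 ) ) )
);
```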
You can see there are different spots in green and different spots in red. We know it is not interesting for the client to play with the mixing plan in this red area; it does not look like the historical wine. It is the same for this other area. But there are two other green areas. In this one there is in fact only one sample really close to the target, and if you look at the points around it, they do not really resemble the historical wine, so it is not very interesting to play in this area. This other one is much more interesting, because you have three samples really close to the target and two others a bit farther away, but still close. So we can say to the client: okay, if you want to keep the same type of wine, you can add between 4 and 10% of your ingredient one and between 20% and 60% of your ingredient three. The most interesting region is this area, above 50% of ingredient three and around 7% of ingredient one. With this information the client, depending on the vintage, the tank volumes, and the aging they want to do, can play a bit while being sure to keep the same quality and the same type of wine. This really helps the client a lot.

We can do another treatment. I will explain it quickly, because it is a long one to do. Up to now I only used the groups; it does not matter whether a group is called "group one" or "Fruity" or "Woody", it is just a group. But I asked my panel to describe the groups, so in this case I do another treatment. From the results data table, this one, I run a Text Explorer, a classic JMP Pro analysis, and I can get this kind of data table with my samples and the descriptors. In fact, I ask it to count how many times each descriptor was written for each sample. With that, I can build another type of map with a multivariate method, this time a multiple correspondence analysis.

In this case I put the descriptors in the response role and the modalities in the factor role, and I add the count as the frequency. Rather than running it live, I will just show you the saved script, because the presentation is better that way. Okay. You get this type of map, with all the modalities, all the samples, in blue, and all the descriptors used in red. It is a complementary map to the first one, because on the first map you see which samples are close to or far from each other, but you do not know why: why these samples are together, why these samples are on the right of the map and those on the left, why they are separated. With this treatment we try to explain a bit why the samples are separated. It is not always exactly the same map, because it is not the same treatment. This one also needs a longer process than the first, because sometimes the same word is not written exactly the same way, and in French we have accents, so you have to check the data table before running the analysis.
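A rough JSL sketch of that launch is below. The platform name and the Y / X / Freq roles match the dialog described here, but the column names are hypothetical, and the exact argument spelling should be checked against a script saved from an interactive run.

```jsl
// Minimal sketch, assuming a stacked table with :Descriptor, :Modality, and :Count columns.
dt = Current Data Table();
mca = dt << Multiple Correspondence Analysis(
	Y( :Descriptor ),   // descriptors as responses
	X( :Modality ),     // samples / modalities as factors
	Freq( :Count )      // how many tasters used the descriptor for that sample
);
```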
Some words have more or less the same meaning, so you have to group them together. It takes a bit longer, so I cannot do it right after the tasting; I do it afterwards. But then we can explain a bit better why the samples are located the way they are on the map. In this case, you can see that some samples are high in coconut, some in vanilla and nuts, others are more toasty and spicy, and unfortunately some have really negative descriptors. So you can explain the samples a bit better, and that is really good complementary information for the mixing plan: if you choose to go to that side of the mixing plan, this is how your wine will be described. That's it.

I will just conclude now; I hope it was not too fast. For us, tasting is a difficult exercise to model and to present with the panel we use, because it is not trained, it is not a big one, and we do not have the same objective each time we start a tasting. That is why we decided to use a descriptive test rather than a quantitative one: free sorting. This type of tasting is only possible thanks to JMP and thanks to the script. I can run the whole process really quickly, be reasonably sure about the significance of the results, and show them right after the tasting. That way we can discuss the results with all the tasters, and when we leave the tasting we are all clear about the wines we have tasted and about the result. It is much more powerful than a simple tasting with scores, and we can use this type of test with a small, untrained panel.

Just to finish: in this trial the client was really happy with the mixing plan and was able to adjust the recipe. The recipe has now been working for two years with the four ingredients; the client can play a bit each year, but the recipe is fixed and he is really happy with it. Thank you very much for your attention. Have a good day.
Many know JMP as a powerful tool for analytics and modeling and aspire to leverage JMP’s advanced capabilities to champion improvements and business understanding. It can take time and domain experience to achieve a high level of proficiency. Don’t dismay; we all start somewhere! Even at modest experience levels, value can rapidly be achieved using JMP fundamentals. Fundamentals can be quickly propagated across an organization to seed and inspire a culture of analytics. Hear how our team has integrated offerings from JMP education in a JMP “boot camp” format. The faster an organization can establish basic proficiency in JMP, the sooner it can benefit from that investment. Additionally, having a shared platform for both basic and advanced analytics creates a collaborative community, increases self-sufficiency, and provides a learning path to foster employee development. While sharing our training approach, we will demonstrate foundational JMP features, including data filters, tabulate, summary, recode, column formula, and column properties functions to track student progress. See JMP in action as we highlight methods to construct, customize and journal graph builder visuals in ways that entice spreadsheet users to make the “JMP” to becoming JMP data ninjas.

Hi, I'm Trish Roth. I am going to be presenting to you today about managing a learning program with JMP that is developing the next generation of JMP Ninjas. A little bit about myself: I am a data scientist in core diagnostics. My colleague, Jeff Pennoyer, who helped develop this training and the presentation materials, isn't able to join us, but I want to acknowledge his contributions, and the contributions of many other colleagues over the years, in putting together training to improve the JMP skill sets we have within our organization. We both have biochemistry and technology backgrounds and have worked in the data science and analytics space for a number of years, along with many other folks in our division.

A little bit about Abbott, in case you aren't familiar: it's a large global health care company. We've been in business for over 130 years and operate around the world. We have over 113,000 employees, all focused on bringing life-changing health technologies to the people who need them. You can see on the right-hand side a number of the different product lines that we support. It varies by country; if you go to abbott.com from your location, you'll see more information about the kinds of products that Abbott delivers. Both Jeff and I, and a number of colleagues who've been involved with this project and training, are from the diagnostics division, particularly core laboratory, where we work with large hospitals and reference laboratories who provide diagnostic testing directly to patients or to physicians. You can see some of the other product lines here.

The purpose of the presentation is really twofold. I wanted to give some insight and thoughts around how we approach training, the types of things we include, and
how we organize it. I also want to talk about some of the features and functions of JMP that we focus on, particularly in our beginner training, to get people comfortable with data manipulation, data preparation, and data summarization. These skill sets really serve them as they continue to grow and develop as data analysts and move on to more advanced analytics, statistical analysis, and modeling, but this is a good foundation to get people started and comfortable.

That's the approach we have taken. We leverage materials that JMP provides, we leverage area experts, and we try to have area-specific examples to make the training relevant to people, so that they can see the value and how they might apply it in their day-to-day work. We talk to both managers and employees about what they want and what they need as we think about how much we can really deliver: What kinds of skill sets are we lacking, or where do we not have enough people? How do we grow those skill sets? How do we fill those knowledge gaps? On the employee side, people want to contribute. They want to grow. They want to become functioning members of their departments, especially when they're new, and they want to be independent. Hopefully we can find an intersection between those needs and wants to develop some training. We always have to have the conversation around investing time: managers have to give their employees time and space to work on developing their skill sets, and employees have to be willing to invest time, practice, and think about how they're going to apply things so that it sticks and they really do hone their skill sets.

We've defined a body of knowledge that we focus on, particularly at the beginner and intermediate stages. We deliver a fair amount of information and knowledge to get people started, and we do it in what we call a boot camp style. Sometimes it's intensive over a couple of days, a couple of hours at a time, to really get people into JMP and familiar with how to work with data there. Oftentimes they're coming from Excel, so we have to reorient them a bit, but it's the basics: What are the menus? What are the preferences? How do you get data in? How do you do basic data clean-up and summarization, basic graphing, and creating formulas? This allows people to get up to speed and actually deliver some analysis pretty readily once they get through this boot camp core content.

Then, depending on how deep we want to go with the learning, what the organizational needs are, and what the time availability is, we start to get into more traditional exploratory data analysis and statistical analysis: capability analysis, control charting, hypothesis testing, regression. As people move into these topics, they go on to do much more modeling, and we've got a lot of interest in scripting.
Once people have these foundations, they can start to move on to these other topics and really deliver value to the organization.

Once we've settled the material and how much we think we're going to present, we have to think about the best way to deliver it. We do like in-person training; obviously, in the last couple of years we've not done a lot of that, but there are some folks who just do better face to face, where they can have somebody standing over their shoulder watching what they're doing. We predominantly use virtual conferencing; we can bring together people from a lot of different sites and locations that way and minimize travel. We also do some very informal things, small bursts on one particular topic, maybe over a lunch hour or in a small group meeting. We also have some fully independent learners; we point them to the learning resources from JMP and to our curated set of information, presentations, and recordings that we have internally. We've planned for how to organize, centralize, and post information so that it remains accessible to others when they want to come and do some training.

Here's a little snippet from our SharePoint site, just basic information. You don't have to be a really good website developer. You can put up a little calendar of events, you can have information for beginners and intermediates, and we provide links to past recordings. When folks finish their training, we like to do a little congratulations and give them some recognition; internally, these links would take you out to presentations and to listings of folks who have successfully completed elements of our training. This leaves a body of knowledge and a body of resources internally that folks can leverage as well. We have a lot of links out to the JMP Community, where there's a lot of good information, and SharePoint document libraries with presentations and data files, so we can keep it all centralized and people can access it on demand when they have time or interest in training.

We also maintain a list of subject-matter experts; here it's just showing Jeff and me, but there are many other colleagues who have been involved and give their time and talent to help others develop. We put a direct link to the "top five countdown of why data preparation is faster, easier, and better in JMP". It's about a three-minute video from Julian Paris at JMP, and it gets people energized, motivated, and excited when they see all of the features and functions they're going to be learning about. We sometimes kick off training with a couple of little videos.

As I mentioned, we've defined some learning levels that help folks figure out what they should sign up for or where they might fit. It is a challenge, because there's such a broad base of functions within JMP that an intermediate or beginner level can cover a lot of territory.
But we do our best to try to get people into a group where they feel comfortable and are at the same learning pace and level. We have our more advanced and intermediate folks do teach-backs and presentations; it helps hone their skills and, again, builds the community within our organization.

Here's a little example of how we survey to solicit people who might be interested in the training. We leverage Microsoft Forms. We can create internal surveys and collect demographic information about people and who their managers are. Again, it's very important that there's good collaboration and communication between the learners and their managers to make sure this is something that can be supported. We need to know where people are so we can consider time zones as we think about how we're going to schedule training. And just survey 101: the more you can give canned responses that a user selects from, versus having them enter their own text, the easier it will be to analyze and summarize that information when you get it back. We also have a multi-response question, because one of the things we'll look at shortly in the demo is how JMP can handle a single question that has multiple responses, so you can understand the different categories people might have selected. We do also want to understand why folks want to participate in the training, so we can ensure that we meet their needs and that they're coming into it for the right reasons. Sometimes we collect other information about other things they might be interested in learning.

Now we'll move into JMP. Where we're going to start is: we've done a survey and we've gotten back our results. We're going to get that survey information into JMP to find out who's interested in taking the class, and then we're going to work up through a series of columns and formulas: how are we going to keep track of these folks as they move through their training? I will be using JMP 17 standard, but most of this actually started out in 15 and 16, so it will work there as well. Just so you can see where we're going: we're going to import this information, do some cleanup, enrich the information by adding some formulas and columns, and then create a subset so we can track our beginners. Then we're going to start to import information into that table to keep track of what people have completed. I've also got some scoring formulas so I can figure out who has completed the training and, if not, which elements of the training they're missing. Then we can use that data table with the scoring to communicate congratulations back to both the student and their manager.

I'm going to get out of PowerPoint and go to a JMP journal; for the remainder of the discussion we'll be in JMP. Again, the registration form comes back in the form of an Excel file, embedded in this worksheet. When you launch it, JMP is going to look to import that information.
As a best practice, I always click Restore Default Settings. This is a fairly simple worksheet; it only has one tab, and you can quickly evaluate whether the columns look right, whether the headers are in the right place, and whether the data elements look like they're going to be properly imported. So we have a quick look at the data. If we did have any hidden or empty rows or columns, we could decide whether or not we wanted those imported. We're going to leave the defaults for this and simply click Import, and there we go: JMP has ingested the information from the spreadsheet. There are 83 respondents, and you can see all of the categories; these were the questions in the questionnaire, and each one comes in as a different column in JMP.

Obviously this is anonymized, so you see the learner's email, first name, and last name. The email is going to be important because that's the key. That's the piece of information we'll match on when we import reports: Did they attend a training session? Did they turn in their homework? That's how we join the information. All of the reports that we get will have the learner's email, and that's how we can combine the data that comes in as we progress through the training.

What are we going to do with this file? We've imported it, and now we're going to use some data functions to clean it up and enrich it. I've listed those here. We're going to look at the location information, see that we have some permutations, and use the Recode function to clean that up. When we surveyed, we combined track and level; we actually surveyed for more than just JMP training, and this is a subset of that. But we want to separate those into two pieces of information rather than having them glued together. Then we'll do some summaries, tabulations, and graphics so that we understand the learner population that has signed up for training.

I'm going to jump over to the version where I've already cleaned this up, and we'll take a look at what that looks like. Here in the survey we have location, and you can see there's a new column with a plus sign, meaning it has a formula. You can have a look at it and see that it's doing some manipulation: where it said Lake Forest, we're actually converting that to Chicago, Illinois. The Recode function generated those formulas; I did not have to write them. The way you do that is you go to learner location, right-click, and select Recode. It shows you all the data values, and you can see pretty quickly that there are some permutations. Somebody entered Knoxville, Tennessee with and without a parenthesis. Geographically, you may or may not know, Chicago is a big city, and Des Plaines, Waukegan, and Lake Forest are all suburbs that are really part of Chicagoland, so we want to group those together.
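If you ever want to script that import instead of using the wizard, a minimal JSL sketch might look like the one below. The file name and worksheet name are hypothetical; the wizard's own Source script (saved to the data table after an interactive import) is the authoritative version of these options.

```jsl
// Minimal sketch: open a registration workbook as a JMP data table.
// "Registration.xlsx" and "Form1" are placeholder names.
dt = Open(
	"Registration.xlsx",
	Worksheets( "Form1" )   // pick the tab to import; other wizard options can be added here
);
Show( N Rows( dt ) );       // e.g., 83 respondents
```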
They're all in the same time zone; those folks are within 15 to 30 minutes of each other. We've got them all highlighted; I highlighted multiple values by holding down the CTRL key. I'm going to right-click and say I want to group these all as Chicago. Now you see all four of these entries are going to be Chicago, and I'm actually going to add Illinois. We go through that process for a number of the different permutations. Again, Santa Clara, California got entered with and without the state designation, so I can group those; I just want to use the two-letter designation. Similarly, we can go through the different permutations and do this data clean-up.

As for the way we want to save it: we could overwrite the data, but I don't like to overwrite data in my data tables. I want to save the formula, because if I run another class or another survey, it's likely that I might see similar permutations. Just for the purposes of the demo, I'm going to rename this as demo; now it's going to create a new column formula. I hit Recode, and you can see that it created this new column here with the formula. I didn't do all of the clean-up permutations, but you can see how it did the mapping: in the future, if it sees Des Plaines, it's going to group it under Chicago. Why do that? It obviously reduces the number of variables if you try to plot or summarize, and it just cleans things up.

We did some additional clean-up items. As I mentioned, this track-and-level column gets broken into two pieces using a word formula. We just said take the first word of track and level, and there you have it: JMP. And take the last word of track and level, and that gets you the level. You can do that by simply taking the combined data column and using some pre-set column formulas that JMP provides; here's first word, or you can select last word. Then I just retitled these columns to simplify the names.

Now, why do that? I want to have a look at what's in this data set, so I can use the Analyze, Tabulate function. Now that I have them separated, I could leave them like this, and you can see the population of beginners and intermediates, but I like to have them broken out. I'm going to do track and then level; there are different drop zones where you can put these depending on what you want to see. I'm going to build up a series of charts, and I'm going to add location. Now you can see that there were 83 respondents: 61 beginners and 22 intermediates. If I check the box down here for "order by count of grouping columns", you can see that it re-sorted so that the grouping with the highest count is listed first; rather than being alphabetical, it's in descending order by how many are in each category. You can quickly see the distribution of locations of the people interested in your training, and you can start to plan for how you're going to deliver it.
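The formula columns that Recode and the "first word / last word" shortcuts generate look roughly like the sketch below. This is an illustration, not the column scripts from the demo; the location values shown are examples, and Recode's actual output may use a slightly different expression.

```jsl
// Minimal sketch of the kind of formula columns described above.
dt = Current Data Table();

// Location clean-up: map suburb spellings to one label (Recode writes something similar)
dt << New Column( "Learner Location Clean", Character, Nominal,
	Formula(
		Match( :Learner Location,
			"Lake Forest", "Chicago, IL",
			"Des Plaines", "Chicago, IL",
			"Waukegan",    "Chicago, IL",
			:Learner Location   // anything else passes through unchanged
		)
	)
);

// Split the combined "Track and Level" text into two columns
dt << New Column( "Track", Character, Nominal,
	Formula( Word( 1, :Track and Level ) )    // first word, e.g. "JMP"
);
dt << New Column( "Level", Character, Nominal,
	Formula( Word( -1, :Track and Level ) )   // last word, e.g. "Beginner"
);
```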
When you're done, you can click Done. Then what I've done is used the script function within JMP, Save Script to Data Table, and given it a descriptive name, and it saves it right back to the data table. Here we go: Tabulate by location and level. If I click that button, it basically repeats that analysis. I've added a little more detail: in addition to the number of respondents, a percentage of total, so you can see what proportion is in each category.

Instead of tables, graphics are always nice. Again, I've pre-saved some, and we'll take a look quickly at how to build them. I've taken advantage of a function called the column switcher: I built up a graph that I liked, and now I can easily toggle between different categories. This one's a little bit messy, but you can go between categories to look at different managers. Some only submitted one person to the training, some submitted multiple. It's a little busy, but you can see what functional area people come from and what location they come from. I've also added a data filter, so if I really just wanted to hone in on beginners, I could select beginner or intermediate. You can see that on one page I can very quickly get a variety of graphs and easily put them into a presentation or save them so I can communicate. The beginners are mostly from Chicago, but there's a good chunk from Dallas, and we've got two Irish sites with a number of folks who are interested. If I go to intermediate: Chicago, Germany, Texas again; you can see where the folks interested in the different levels of training are coming from.

The way we built this: I'll pull it off to the side and we'll just look at it quickly in Graph Builder. I started with location; I'm going to drag location to the Y axis and hit the bar chart element. Now I see each of the categories. The reason I like to do this is that if the text is long, it's a little easier to read in this horizontal orientation than in a vertical orientation where you're trying to read it sideways. If I right-click, I can change the ordering and order ascending; since it's count data, that puts the category with the highest counts at the top and the lowest counts at the bottom. I'm going to change this summary from mean, because it's not really a mean, it's just a count, and now I can add a label that is percent of total values. I can see that Chicago is 40%, and you get the proportion; then on the X axis you can see the actual counts. You're getting a lot of information in one graphic. One other feature that's really nice in Graph Builder: I right-click, add a caption box, then right-click again and change the caption box location; for the Y position we'll put it at the bottom, since there's more real estate there. Again, you can see there were 83 participants and 40% are from Chicago: a lot of information in one graphic.
You can further customize it: right-click, hide. I don't really want all these annotations; I think it's obvious that it's a count, so I'm just going to hide some of them, which makes it a little cleaner. The way I got the coloring was to drag location over to the color zone. Now each category has a different color, but I can customize those. I can click on the bar: Chicago's blue, that's fine. The next one down is Longford; if I right-click, I can change the coloring to a slightly lighter blue. Dallas, Texas was the next category; fill color, I'll do light blue. Then for the rest of them, I'm just going to hold down my CTRL key as I click through all of those categories to highlight them, pick one of them in the legend, right-click, and send them all to gray. That's how I got to the customization. Now I don't really need to see this legend, so I can get rid of it.

The way the column switcher works: if I look up here in the toolbar, there's the column switcher icon. It's currently on location, and I can add organization and manager's email to it, again holding the CTRL key. OK. That's what brings up this window where I can now switch between them. It gets a little busy, and I'd have to go through the same color customization if I wanted just the simple blue and gray, but it remembered it for location. I can say Done. Again, to get more real estate, I don't really want to look at that legend, so under Show I can turn it off and then resize. If I wanted to be able to toggle between beginner and intermediate, we can use a local data filter and say I want to filter on level: select it, hit the plus sign. One nice feature I've noticed in 17 is that if I resize this, it enlarges the font automatically, which is very nice. Now I can toggle between beginner and intermediate, or I can clear the selection and leave it at both. Once I'm happy with that, again, go to the red hotspot, Save Script to Data Table, give it a descriptive name so you know what it is, and it will save that script to the data table so you can re-execute it.

The other thing you can do, under the Edit menu, is Edit Journal, or CTRL+J; it will grab an image of that analysis and place it in your journal, which is what I have done here. If I go to the journal, you can see that I've captured these images of the graphs the way that I like them. The nice thing about doing it in a journal versus grabbing a static picture is that you're still in JMP and you've still got your red hotspot, which means you can have interactivity. I can select the graph from the journal, and as long as the table behind the graph is open, you can say run in a new window and I get back my interactive graph. If I wanted to make some additional changes, change the text, I could do that, but I've got everything saved in a nice workbook. Now we have a learner list, and we know where they're from.
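For reference, a saved Graph Builder script for this kind of view looks roughly like the hedged sketch below. The column names follow the demo's description, but the element and option arguments are indicative; the exact script is what Save Script to Data Table writes after you build the graph interactively.

```jsl
// Minimal sketch of a horizontal bar chart with a local data filter and a column switcher.
// Option spellings are indicative; compare with a script saved from an interactive graph.
dt = Current Data Table();
gb = dt << Graph Builder(
	Variables( Y( :Learner Location ), Color( :Learner Location ) ),
	Elements( Bar( X, Y, Legend( 1 ) ) ),
	Local Data Filter( Add Filter( Columns( :Level ) ) ),                   // toggle Beginner / Intermediate
	Column Switcher( :Learner Location, {:Organization, :Manager Email} )  // swap the plotted category
);
gb << Save Script to Data Table;   // adds a re-runnable script to the table panel
```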
I wanted to quickly touch on the skill set piece, the multi-response question. The way I handled that is I created a copy of the column called skill set. If you look carefully at the original column, each of the selected items was separated by a semicolon. JMP can handle a semicolon as a delimiter, but I found that it didn't work very well in this analysis. As a workaround, I created a copy of the column, did CTRL+F, and used a simple find and replace to replace the semicolon with a comma; JMP liked the comma a whole lot better. The final thing I did was tell JMP that this column is actually a multiple response column; that's what prompts JMP to look for the delimiter and understand that there are multiple categories in that column.

Why do that? We'll just quickly go to Graph Builder and you can look at the difference. Now, if I take the multi-response column, you can see each category is only represented once. Of all the different reasons why people are interested in participating in training, each category gets counted independently, and you don't see all the permutations. Again, right-click, order by, and you can see the most popular ones: people want to improve their skill set so that they can be more efficient. We've been talking a lot about storytelling with data, how to get a message across, how to drive action with data stories; these are all the reasons people want to participate in training. That multi-response function is nice, particularly if you're doing surveys.

The final thing we're going to do on this data table is take a subset, because we're going to focus on the beginners. Again, you can use the data filter on level; I only want the beginners, so out of the 83 it's highlighting the 61. I'm going to select a set of columns in my data; I don't want all of them, and I don't want that one. Then we're going to create a subset table: Subset, and we tell JMP to use only the selected columns. It gives you a nice preview: here's the email address, where they're from, and their level, and we can say OK. Now we have just a list of the beginners, where they're from, and basic information, and this is what we can use to start building a tracker: these are the folks who are beginners, and I have to make sure that they complete their requirements.

How am I going to do that? I'm going to take you to the version where I already have this set up. Let's close the registration information; now we're into tracking completion. What I've done is taken this basic data table with the information about who registered for training and started adding a whole bunch of columns. Beginner training consists of five different classes, five sessions they need to attend, and three homework assignments.
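Scripted, the delimiter swap and the beginner subset might look like the sketch below. The Substitute formula stands in for the interactive find-and-replace described above (a named swap, not the presenter's exact steps), and the column names are placeholders; the Multiple Response column property itself is then set in Column Info.

```jsl
// Minimal sketch: comma-delimited copy of the multi-response column, then a beginner subset.
dt = Current Data Table();

// Copy of the survey column with ";" replaced by "," (stands in for the manual find/replace)
dt << New Column( "Skill Set (comma)", Character, Nominal,
	Formula( Substitute( :Skill Set, ";", "," ) )
);

// Keep only the beginners and a few identifying columns in a new table
dt << Select Where( :Level == "Beginner" );
beginners = dt << Subset(
	Selected Rows( 1 ),
	Columns( :Learner Email, :Learner Location, :Level )
);
```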
We assign them a couple of STIPS modules; that's Statistical Thinking for Industrial Problem Solving, free courses and modules available through the JMP learning community. We assign a couple of them to the beginners, and they can take more if they want to. I've got them all listed here. I will talk about these hash marks in a minute and why these columns aren't blank like the homework columns. We also request that they provide a data example. These are all the elements of the training.

What I've leveraged is a couple of column features. If I go to class 1 and open column information, you can see that, one, I've used the list check function: I've told JMP these are the only values that can go in that column. That helps keep the data sheet clean; if I do any data entry, it forces consistency across the data table. The other really nice feature is called value colors. I've assigned a specific color to each value: Y, they attended class, gets a nice dark green; R means they watched the recording later instead of coming to class live; and sometimes people tell me they're out of the office, so I color-coded that red. The key step is to click the little box at the top and hit Apply; it will then color-code your cells based on their content, similar to what people are used to seeing in Excel. It makes it very easy to look across this data table and see where we're at: how many people are missing things, how many are green, how many are red. Once you have one column set up, you can use copy column properties and broadcast that across the remaining four class columns, which is what I've done. When you're using these value colors, JMP puts a little black X mark on the column as an attention indicator to let you know it's going to color-code based on what you enter there; in the homework field I hadn't yet activated that.

These are all the elements required for the training, and now I have my workbook for managing it. One other feature we'll talk about is joining. We held our first class; I'm going to clear this a little bit. I actually already have this Excel file open: Microsoft Teams provided me with a summary of the meeting. I had 42 participants; here's who they are and here's their email address. It does actually tell me how long they were in the meeting, so if I scroll down to the bottom, I can decide whether Learner 26, who was there for 12 seconds, is going to get credit for attending or not. But really all I need out of this worksheet is their email address, because that's how I know who they are in my tracking sheet. I've highlighted it in Excel. One other thing you can notice: Learner 79 must have had a little trouble; they were in for one minute and then must have gotten dropped and had to come back in.
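In JSL, a tracker column with restricted values and value colors can be sketched roughly as below. The List Check and Value Colors syntax shown is an assumption about how these column properties are written (the color numbers are arbitrary JMP color indices, and the final message name is a guess); the reliable way to get the exact form is to set the properties interactively and look at the column's saved script.

```jsl
// Minimal sketch, assuming a tracker table; property syntax is indicative only.
dt = Current Data Table();
col = dt << New Column( "Class 1", Character, Nominal );

// Restrict entries to the allowed codes: Y = attended, R = watched recording, N = no-show
col << Set Property( "List Check", List Check( {"Y", "R", "N"} ) );

// Color-code the cells by value (numbers are JMP color indices, chosen arbitrarily here)
col << Set Property( "Value Colors", {"Y" = 4, "R" = 5, "N" = 3} );
col << Color Cell by Value( 1 );   // the "Apply"/color-cells step; this message name is an assumption
```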
In fact, there are two entries each for Learner 76 and Learner 79; you always want to look at your data first. But with JMP and a join, we don't have to worry about that; JMP will do a good job of merging the information. Another really fun feature is the JMP add-in for Excel. I've highlighted what I want, and there's a JMP add-in within Excel: select it, Data Table. It's opening on my other screen; I'll pull it over. I've literally just grabbed that information and put it in a JMP data table.

I'm going to quickly add a column called Class 1. This is not a data table we're going to keep, so I'm not going to spend much time on it; it's character. This is the list of people who attended class 1, so I'm just going to enter Y and fill to the end of the data table, and the table is already called attendance. I don't need the Excel spreadsheet anymore, so I'm going to go back to my tracker. You can see that I've deleted some of the entries here. I'm going to use the table Update function. I like Update because I just keep building onto the same table; I don't constantly generate new tables that I have to rename and save.

In that attendance list, I know that email matches email. It is case sensitive, so I actually had to make sure it was all lowercase in both locations in order for it to match up. JMP will give you a really good preview; if you don't see anything, you can take a look at whether you maybe missed something. What I want is for the attendance table, which is the update table, to update the class 1 information; I want it to replace the class 1 information in the master table, because it's just blank. Let me see if I can do this so you can see what happens. Here we go: it's giving you a preview, but if you watch up here, this is the tracker sheet. I'm going to update it, hit OK, and there we go. Now it has updated the tracker sheet with the information that, yes, these people attended class 1.

As we move through the training, we're going to do that on a repetitive basis. We'll get reports about attendance and reports about who completed their homework. I don't have to go in manually and mark, okay, you were there, you were there; I can do it in a much more automated fashion. Then if somebody emails me or lets me know, "Hey, I wasn't there, but I watched the recording," I can just enter that manually, and it really reduces the amount of manual intervention. It's a lot like what people are used to in spreadsheets, but I think once you get used to it, you can actually do a lot more here.

File, close this one; it's just transient, I don't need to keep it for any reason, so I'm not going to save it. Now time has gone by and we've run a whole bunch of classes. You can see people who've come to class, people who have missed things, and some people who came and, I guess, decided it wasn't for them.
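The same update can be scripted; the sketch below uses the data table Update message, with table and column names taken from the description above (so treat them as placeholders for your own names).

```jsl
// Minimal sketch: mark attendees in the tracker by matching on email address.
tracker    = Data Table( "Beginner Tracker" );   // master tracking table
attendance = Data Table( "attendance" );         // table built from the Teams report

// Add the Class 1 flag to the attendance list
attendance << New Column( "Class 1", Character, Nominal, Set Each Value( "Y" ) );

// Pull Class 1 into the tracker for every matching email (emails must match case exactly)
tracker << Update(
	With( attendance ),
	Match Columns( :Learner Email = :Email )
);
```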
People have completed some STIPS modules. This is what it looks like in the end: the accounting of what all of the participants have completed. One thing that would be nice: for STIPS, they send me a copy of their certificate, so I do have to enter that information manually. It would be great to be able to get a report that I could just join in, or some easier way of tracking that, but that's to be determined.

Then the last piece is: okay, great, I know who was there and I know what they did; now I have to score it. Again, we've got the tracking spreadsheet, and we come all the way over to the end. I've added a whole bunch of other columns which are based on formulas, and added color coding. It's a little busy, but it's relatively easy to see how much green there is versus red, and I've got things color-coded so it highlights who's missing information. I've even created a formula for each person listing exactly what is missing; if nothing's missing, it's just a series of commas. And then again a conditional formula, which we'll look at in a moment, tells me whether they met the minimum requirements, all the things I said they had to do. If they did, I get a "finished," so it's really easy for me to say who's finished and who hasn't, and it updates automatically. All this information will be available the next time I run the training; once I've built it once, I can make minor modifications, and so it becomes a really helpful tool.

Just to finish out, the power of the formula building. Here's class score. I said we held five classes, and I decided you had to make it to a minimum of three. How did I score that? I created a column called class score; you can see it's got a formula. We'll take a look at it. It looks pretty busy, but we can build it up in pieces, and once you get accustomed to building the logic, you can copy and paste the elements and replicate them pretty quickly. If we take a quick look, each box is a different class. For class one: if there's no entry in that field, it means they didn't come. If the entry is not a Y and not an R (remember, Y stands for yes, R stands for recorded), so they didn't come to class directly or watch the recording, they get zero points for class one. If they have a Y or an R in one of those ways you can write the logic, they get a point, and then you basically take this element, paste it, and update the same formula for class two, class three, and so on. You can see I'm adding them up: one point for class one, one for class two, three, four, and five, and it totals up. It's a little hard to see with some of the color coding, but this person only came to one class, and this one came to all four. Again, this is where we're using the value colors on the score.
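A sketch of that kind of scoring formula is below. It follows the logic described (one point per class attended live or via recording, summed across five classes); the column names are placeholders, and only two of the five class terms are written out to keep it short.

```jsl
// Minimal sketch of a class-score formula column: 1 point per class with a "Y" or "R" entry.
// Only Class 1 and Class 2 terms are shown; Classes 3-5 repeat the same pattern.
dt = Current Data Table();
dt << New Column( "Class Score", Numeric, Continuous,
	Formula(
		If( Is Missing( :Class 1 ), 0,            // no entry: did not attend
			:Class 1 == "Y" | :Class 1 == "R", 1,  // attended live or watched recording
			0                                      // any other entry gets no credit
		)
		+
		If( Is Missing( :Class 2 ), 0,
			:Class 2 == "Y" | :Class 2 == "R", 1,
			0
		)
		// + ... Class 3, Class 4, Class 5
	)
);
```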
If it's zero, one, or two, which is below the minimum requirement, I color-coded it in red, and three, four, and five, which is at or above the minimum, is green. Again, there's a lot of information you can build up with formulas pretty quickly.

The same thing goes for homework, where the minimum is two. I used slightly different logic this time, just to show you the flexibility. If nothing's in the homework field, that means they didn't get a check mark saying they completed the homework. So the formula says: if homework is missing, and this little exclamation point means "not", so if it's not missing, which means it's there, they get a point. Again you sum them up, so if they did all three homeworks they get three points. You just work your way across. For STIPS we required two; some people overachieved, and you see this person at the top, where it's really dark, did all seven. I built an extra-credit formula saying, okay, if you were assigned two and you did more, I'll give you some bonus points, and that way, if you missed a homework, you can cover it with an extra STIPS module. Again, you can just build up logic statements; you have to think through what your requirements are and what the logic is going to be.

We'll do a quick look at how to build that. I think I had this column seven as an example. Yes, all right, we'll just clear this out and edit the formula; I'm going to clean it up and build it from scratch really quickly. Once you're in the formula editor, if you're not sure where things are, you can type and it'll show you; it's under Conditional. Then it guides you: If what? Make sure you highlight the box. Class one, I want to do a comparison: Is Missing, zero, else one. We're just doing a simpler formula here, and there you have it. Now you can add. I need to do the same thing for class two. Depending on where you click, you'll highlight different parts of the formula; you want to make sure you get the whole box, and you can use your up arrow. Once the whole formula box is highlighted, I can do CTRL+C to copy and then just paste, so I don't have to rebuild the if-then-else logic; instead I can say, okay, apply the same thing for class two. That's how you iteratively build up a formula, and you can start to see that's how we added up the scores based on the Y or R content in the class-attendance formula.

Then the final piece: I want to know what's missing. One thing you'll notice is that this particular row isn't designated as having finished the training. If I look across the row, they completed four classes, they completed all three homeworks, and they actually completed four STIPS modules. I know it's difficult to see with the coloring, but what they didn't complete was providing a data example; it's blank.
That is a mandatory element of completing the course. Even if they overachieved on everything else, unless they apply their learning and provide us with some example of how they used JMP, they can't get full completion credit. That's why, even though they have the points, they're missing the one critical element.

These formulas are stored in the table so you can reference them later, but it gives you an idea: you can build up some pretty complex formulas. This one says, okay, if their class score is less than three, that means they didn't attend enough classes, so note that class is one of the items missing. These double pipes are for concatenation; they just put a comma delimiter between the elements. Then it says if homework is less than two, they didn't finish the minimum number of homeworks, and so on. Then you can see the data example: if the data example is missing, there are no points for it, it's just black or white, either it's there or it's not. So you get a listing of what each person has completed and what they haven't.

Once you have that, you can quickly tabulate; we'll just go to missing and then add email. Because I did it in the opposite order, here are the people missing one STIPS module, here are the people missing two, and here are the people missing two STIPS modules who also didn't do a data example. Then I can communicate back to those groups of folks exactly what they're missing, and they can either get it done or say, I'm not going to be able to finish this.

Close this. Once we've spent the time to build up those formulas, we can do some graphics based on that finished column, and I can see what percentage of people by site or location finished the training. I can tabulate it: we had just over a 50% completion rate. Not great, but that's reality, and we can circle back on what the barriers to finishing were. You can look at your metrics and report back on what's happening, all driven off this one data table using different formulas and different graphics; it's very simple bar charts and summary tables.

Hopefully that gives you a flavor of how, without getting into advanced analytics, model building, and response surface modeling, you can get a lot of mileage out of the fundamental features of JMP. It's really, in my mind, a very good jumping-off point for folks, and we've had a lot of success getting people up and running and comfortable. If you can navigate these tabulations, summaries, and data clean-ups, make some graphs, customize them, and think about how to annotate them so they carry a quick, meaningful message in the crispest presentation, you will have really moved the needle on the capabilities of your organization and hopefully generated some excitement for the use of JMP. With that, I thank everyone for tuning in.
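The "what's missing" column described here can be sketched as a concatenation formula like the one below. Column names and thresholds follow the description above; treat it as an illustration rather than the exact formula in the tracker.

```jsl
// Minimal sketch: list the requirements a learner has not met, separated by commas.
dt = Current Data Table();
dt << New Column( "Missing Items", Character, Nominal,
	Formula(
		If( :Class Score < 3, "Classes", "" )                   // fewer than 3 classes attended
		|| ", " ||
		If( :Homework Score < 2, "Homework", "" )               // fewer than 2 homeworks turned in
		|| ", " ||
		If( :STIPS Score < 2, "STIPS", "" )                     // fewer than 2 STIPS modules
		|| ", " ||
		If( Is Missing( :Data Example ), "Data Example", "" )   // mandatory, pass/fail
	)
);
```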
Hopefully that gives you a flavor of how, without getting into advanced analytics, model building, and response surface modeling, you can get a lot of mileage out of the fundamental features of JMP. It's really, in my mind, a very good jumping-off point for folks, and we've had a lot of success getting people up and running and comfortable. If you can navigate through these tabulations, summaries, and data cleanups, make some graphs, customize them, and think about how to annotate them so they carry a quick, meaningful message in the crispest presentation, you will have really moved the needle on the capabilities of your organization, and hopefully generated some excitement for the use of JMP. With that, I thank everyone for tuning in.

Hopefully, when this is posted in the community, if you have questions, thoughts, or suggestions, I certainly welcome the discussion and hearing what other people have to say. But don't undervalue how far you can get by getting a broad base of beginners up and running. They can go out and do great things. As I said by way of summary: get people excited and get people up and running.

The nice thing is that you get beginners and advanced practitioners on the same platform, so they can start to talk to each other. The beginners can move along, and the advanced practitioners don't have to go backwards; instead of trying to remember how Excel works, they can stay in the platform where they do most of their analytics. When you do that, you can join the ninja community and dare mighty things, like flying helicopters on Mars. Thank you very much.
One of the most important product test machines (ATOS) is investigated in this global Autoliv project, with the target of introducing an automated alarm system for product test data and a root cause analysis. We wanted a flexible, automated software solution to transfer data into an SQL database and perform a root cause analysis. Furthermore, we wanted to send web-based links to reports to an existing "leading-to-lean" (L2L) dispatch system, which informs machine owners via mail. We use JMP, automated via Task Scheduler, for all these tasks.

Hello. My name is Astrid Ruck. I'm working as a Senior Specialist for Autoliv. Autoliv is a worldwide leading manufacturer of automotive safety components such as airbags, seatbelts, and active safety systems. Today, I would like to show you an automated process for controlling product test data and creating alarm reports for root cause analysis.

We will start the presentation with a video on the working method of Autoliv's most important product test machine, called ATOS. These machines perform a 100% control so that no defective part will be delivered. The resulting tests are written into a log file, including additional information, and automatically sent to a server in Amsterdam. In the blue circle, you see our usage of JMP. In the first step, the log files are transferred into a database and daily reports are created, which are saved on the server. If, and only if, there is an alarm, a second table is used from the traceability system of our retractors, called Atraq, which includes component information for every retractor. This is used for predictive screening for root cause analysis in our alarm report. The alarm report is saved to the server, and we use an HTTP post to send the link to Autoliv's dispatch system, which is called Leading2Lean. Leading2Lean sends an automated mail to the corresponding machine owner.

Here we see the retractor. It has an orange webbing and a clear cover. Let us start with the video. This is the ATOS machine. Here you see the retractor, but now with a black cover instead of a clear cover. Here you see the webbing, and sometimes you will see a little marker on the webbing, because then you can see whether there is a webbing extraction or retraction.

We will start with the tilt lock testing. Tilt lock testing ensures the blocking of your seatbelt in the case of a roll-over scenario. We start with the tilt lock right test, and here in this little display you see the corresponding tilt lock angle, which should be between 15 and 27. So let us run it: it tilts to the right, it tilts to the left, it tilts forward, and it tilts backward.

The next step is the measurement of the webbing length, because this also belongs to the blocking system. Take a look here at the right-hand side; now the webbing measurement starts, and now, already here, very briefly (I might go a little bit back), Web Sense Lock and No Lock are tested here in this little box with a sensor.
Web Sense Lock ensures blocking of webbing extraction in the case of a crash, while Web Sense No Lock ensures free wheeling when you are in a parking position.

All of this information is written into the log files, including machine parameters and an internal barcode. This internal barcode is unique per retractor: it includes the retractor number, its global line ID, its production day, and the key index.

These log files are transferred once per day to a server. This is not SPC; it is used for root cause analysis, and therefore we don't want to disturb the testing of the products. The transfer time is therefore between two shifts, and it is synchronized within an Autoliv facility but differs between Autoliv facilities. For example, in Hungary the log files are transferred at 6:05, and in Romania the log files are transferred at 7:15. Here you see the folder structure on the server. It starts with the directory of the plant, so Autoliv Hungary, Autoliv Romania; then in each plant folder you find separate folders for the machines; then in each machine folder you find folders for the year and the month; and at the last level you find the daily log files.

Since JMP 16, each action in JMP is recorded in the Enhanced Log, so point-and-click work can now be saved as a playable script. Jordan Hiller from JMP says, "JMP writes 90% of your code, the skeleton." The consequence is that the other 10% is learning by doing, and this presentation should give you a small idea of how you can write your own scripts for automated data analysis and root cause finding. I will give you some short scripts which you can copy and paste into your own scripts.

In the beginning, we start with the Multiple File Import; then we create the relevant columns, select the relevant columns, and delete the irrelevant columns. This procedure is independent of the order of the columns, because sometimes columns are added to the log file; it works with the names of the columns, so we are quite independent of any particular order. Then, of course, we clean the data in the first step. Here we have an example: we have a product family with an empty space, "Retractor XXX", but we would like to have "RetractorXXX" without this empty space, and here you see the corresponding script. Then we transfer the data into the database. We use the JMP command New SQL Query, we say what kind of connection string we take, and then we use the JMP function Custom SQL, write our SQL command, and run it in the foreground, because Run Foreground ensures that the transfer into the database is complete before the next procedure runs. And don't forget to close all and to exit.

So here we start with the Multiple File Import, and here you see once again the folder structure, and here at the beginning you see which folder you select.
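A minimal sketch of that skeleton in JSL might look like the following. The folder path, DSN, and column names are placeholders, and the Multiple File Import messages are best copied from the Source script that JMP writes for you, since the exact message names can vary between versions.

// 1) Import all log files from the plant folder, including its subfolders
dt = Multiple File Import(
	<<Set Folder( "\\server\ATOS\ALH\" ),    // placeholder path
	<<Set Subfolders( 1 ),
	<<Set Name Filter( "*.log" )
) << Import Data;                            // may return a list of tables if the files are not stacked

// 2) Clean the data, for example remove the empty space in "Retractor XXX"
For Each Row( dt, :Product Family = Substitute( :Product Family, " ", "" ) );

// 3) Transfer into the database; sql holds the INSERT statement built further below
New SQL Query(
	Connection( "ODBC:DSN=ATOSdb;" ),        // placeholder connection string
	QueryName( "transfer_logs" ),
	Custom SQL( sql )
) << Run Foreground;                         // wait until the transfer is complete

// 4) Close everything so the scheduled task ends cleanly
Close All( Data Tables, No Save );
Exit( No Save );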
One of the best things is that you can check Include Subfolders, because our daily log files sit in deeply nested subfolders, and this helps us a lot. We are interested in log files, and we are interested in data from 6:15 to 6:15, and it was the 21st of December last year when we uploaded the log files. Here you can see the relevant files found in this time slot with a matching file pattern, and here you see that the tabulator is my field separator. This is the result of the Multiple File Import: the worksheet with the machine IDs, the start date, the start time, the seatbelt, and here you see the results of the tilt lock testing. The tilt right result is pass or fail, the tilt right angle gives the angle, and here you see the other fields. If you open the table's Source script, you get the script, and you can copy and paste it into your own script; that will be the first script you can run.

Now we would like to transfer this data into the database. We transfer 1,000 rows per loop. Here you see, from the worksheet, the first row and row number 1,000. We say: get the rows from the first row up to row number 1,000 and call it MyList. DT is my data table, R is the number of rows, and this argument tells JMP that I don't want the column names, I want the values.

This is how the list looks. Here you see that the upload date is in quotation marks, and the start date and start time are also of type character, because we had some difficulties transferring the data, and this script works nicely. Here at the end, if you look at this table, you see that the tilt left result is empty, it is character, and here tilt left is also empty, which you can see here by double quotation marks and here by a dot. But SQL doesn't know any numeric empty cells, and therefore we use the next trick and make a substitution.

First of all, we would like to get rid of the doubled quotation marks and have only one of them. Therefore, we use Substitute, and because the quotation mark is a very specific character, we have to escape it with backslash and exclamation mark in front, and then we replace it. Then we would like to get rid of the first and the last character, so we remove them, and since SQL doesn't know curly brackets, we replace them with round brackets. In the case of the dot, we cannot simply remove it, because a dot is also part of real numeric values, so we use a little trick: we replace "dot comma" with "null comma".

In green below, you see the resulting SQL value list. This is the way it should look in SQL. So we have it once again, and the corresponding SQL command used in Custom SQL is nothing else than a plain string, and that goes directly into the database. The form is shown here once again.
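As a hedged sketch of that string-building step: the version below builds the VALUES list directly in a loop, so the quotation-mark cleanup from the talk is not needed and only the missing-value trick is shown. Table, column, and variable names are placeholders, not the ones used at Autoliv.

// Build one VALUES list for rows 1 to 1,000 of the imported worksheet dt
valueList = "";
For( i = 1, i <= Min( 1000, N Rows( dt ) ), i++,
	valueList = valueList || "('"
		|| dt:Upload Date[i] || "','"
		|| dt:Start Date[i] || "','"
		|| dt:Tilt Right Result[i] || "',"
		|| Char( dt:Tilt Right Angle[i] ) || "),"
);
valueList = Substr( valueList, 1, Length( valueList ) - 1 );      // drop the trailing comma

// SQL has no numeric missing cells: JMP shows them as a dot, so turn them into null
valueList = Substitute( valueList, ",.,", ",null,", ",.)", ",null)" );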
We use an SQL template. We say INSERT INTO, then comes the name of the database, SPC is the name of the table in our database, in brackets there are the column names in the database, then VALUES and the placeholder TABLE. Now we use the same trick as before: we substitute TABLE with x, and x is my value list. Here it is, and this is called SQL. Then we say New SQL Query with our connection string, we use the JMP function Custom SQL with this SQL string, and if you would like to see what the SQL looks like, it looks like this. One main trick I learned from the people at JMP was to use this substitution; it's a very good tool for building such commands.

Every program is started via Task Scheduler. Here is a display of the Task Scheduler. On the General page, you can see myself as the author, and the setting to run whether I'm logged on or not, because it also has to run at the weekend and on holidays; it runs all the time. Here it is quite necessary to choose the Windows Server version that matches the server where you have installed JMP. If you choose the wrong server here, you can end up with background processes.

Here we trigger our transfer scripts daily at 6:15. If you check the history, it should look like this: your task should be completed. It should not say "task stopping due to timeout reached", because that means you have background processes, and that's not good.

Here we define the action in the Task Scheduler, and we browse to the location of the batch file. The batch file is nothing more than a notepad file: here you have the location where JMP is installed, and here is the location of your JMP script. Don't forget to say exit at the end. If you start a script from a batch file, you have to put slash-slash-exclamation-mark (//!) in the first line of the script, not in the second or third line; it must be the first line.

The key idea of every program we use is that it is the same program, but at the beginning we say which plant it is. For the main program here, we have the same code, but instead of ALH there will be ARO, for Autoliv Romania. If we use the Multiple File Import at the beginning, we evaluate the plant name into the path. So the only thing you have to change is the plant name at the beginning; that's all.

Here you see our daily report. The structure is always the same: two tables on the top, followed by two graphs. Here we see all tests over all machines. You can see we have had nine not-okay values for tilt lock overall and a lot of okay values, and the corresponding percentages are 0.33% and 99.67%. Here we have the number of not-okay parts and the percentage of failures, and the same for pass. And here you see the absolute numbers of the test results, pass and fail, for the local confection line.
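Two small details from this part, sketched in JSL; the paths and plant codes are examples only.

//! // must be the very first line of the .jsl file so it runs when launched from the batch file

plant = "ALH";                     // the only line that changes per plant, e.g. "ARO" for Autoliv Romania

dt = Multiple File Import(
	<<Set Folder( "\\server\ATOS\" || plant || "\" ),   // the plant name is evaluated into the folder path
	<<Set Subfolders( 1 )
) << Import Data;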
So here we have three ATOS machines, and we can see that tilt right was not okay five times, tilt left was not okay three times, and tilt backward was not okay one time. Tilt lock overall is the summary of all four tilt lock angles, so here we have nine not-okay.

On the right-hand side, you see the same bar chart, but now the scale is different: here we have a percentage scale. You can see that for tilt right and tilt left, the scrap rate is larger than 1%. Therefore, an alarm must be created.

But first, I would like to describe how the daily reports are created, because the same idea is used to create the alarm reports. First, we create a new window, which is a vertical box, and it is called Report. Then we create a second new window, which is a horizontal box, and that is called Table. In the third step, we create a table, call it tab1, make a report out of it, and append it to the horizontal box. Then we do the same thing once again, so we have a second report, which is also appended to the horizontal box. At the end, the horizontal box is appended to the vertical box, and this is how it looks. If you would like to add some graphs, you create one more horizontal box, which is also appended to the vertical box, like this, and then the graphs appear below the tables. Here, once again, are some ideas and some scripts; I hope they will help you.

We save our daily reports as a picture. We don't save them as a PDF, because we are not interested in all the page breaks; we would like to have high flexibility and no additional software. If we had used a PDF, we would have had four pages: Table 1, Table 2, and two further pages for the graphs. How do we store the path string? We create a variable, and this is nothing more than the path where we would like to save our report. Here it is Reports, here it is ALH (as I said before, we evaluate the plant, so the right plant ends up there), then Daily Test Result, here comes the timestamp, and then PNG. These vertical lines mean we concatenate everything, and then we save the picture under this variable name, and that's it.

Here I would like to show you the rules for an alarm on every level. We have several levels. If we have more than 200 parts, we will have an alarm if the scrap rate is larger than 1%. If we have only a small number, so fewer than 200 parts, we will have an alarm if we have more than five not-okay parts, which means a scrap rate above 2%. But we can also have a potential alarm if three parts are not okay.

On the first level, we take the table from the daily report over all machines. If you take a look, tilt lock overall, tilt right, and tilt left have more than three not-okays, so here we have potential alarms.
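A hedged sketch of that window-building pattern and the picture export. The platform calls, box names, column names, and the path are illustrative only; note that appending a platform's report box moves it into the new window.

// Vertical box that will hold the whole report
reportWin = New Window( "Daily Report", reportBox = V List Box() );

// Horizontal box with the two summary tables side by side
tableBox = H List Box();
tab1 = dt << Tabulate( Show Control Panel( 0 ), Add Table( Row Table( Grouping Columns( :Test ) ) ) );
tableBox << Append( tab1 << Report );
tab2 = dt << Tabulate( Show Control Panel( 0 ), Add Table( Row Table( Grouping Columns( :Machine ) ) ) );
tableBox << Append( tab2 << Report );
reportBox << Append( tableBox );

// Second horizontal box with the graphs below the tables
graphBox = H List Box();
gb = dt << Graph Builder( Variables( X( :Tilt Right Result ) ), Elements( Bar( X ) ) );
graphBox << Append( gb << Report );
reportBox << Append( graphBox );

// Save the report as a picture; the vertical bars concatenate the path pieces
picPath = "\\server\Reports\" || plant || "\DailyTestResult_" || Format( Today(), "yyyy-mm-dd" ) || ".png";
reportBox << Save Picture( picPath, "png" );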
If we have potential alarms, we dive deeper. Now we take the machine into account. Here you see machine 123xx, here comes machine 124xx, and so on. You can see that if we take the machine into account, then the scrap rate for tilt right and tilt left is larger than 1%; therefore, we have an alarm. For tilt lock overall, the scrap rate is low, but we have a potential alarm.

So we will create an alarm for this machine, and now we use Atraq. Atraq is the traceability system of Autoliv, so it has the information about which components are included in which retractor, and we have the full information for every part.

Here you see the display of using the database in JMP. You see two tables. The first table is the table we transferred into the database, with our test results and retractor information. The second table comes from the traceability system. If you press this little [inaudible 00:23:58], then you come to this picture, and we make a left outer join. How do we make the join? We use the internal barcode. In the beginning, I told you that every retractor has a unique internal barcode, and this unique internal barcode is called Serial. If they match, then I have all the information, and therefore I make a left outer join.

This is how an alarm report looks. It starts once again with a table. It tells us the upload date, the location, and which machine is affected by this alarm. Here is the test, and here, once again, the information about the number and percentage of okay and not-okay parts. The same information is given here in the graphs, as absolute numbers and percentages.

Now we take the information from matching the ATOS data. The first table is for the ATOS data; here we consider tilt lock overall, and now you see that this seatbelt is affected. We also consider machine parameters. What does the 15 mean? Here is the translation: it means lower specification limit; the upper specification limit is 27, and so on. Every value here has this kind of title.

Below are the component data given by the traceability system Atraq. We have the CS-Ball, the CS-Sensor, and every component has four columns: part number, lot number, box number, and supplier. Part, lot, box, supplier. So this is the information we have got.

Now we start our predictive screening. First of all, we try to find out what was constant. If something is constant, it will not have an effect on your okay and not-okay values, and the remaining predictors are used in a predictive screening. Here you see directly the results of the combinations, and you see that the shift itself has a very high impact. We like this predictive screening because it is easy to read for non-statisticians, and it identifies predictors which might be weak alone but strong when used in combination with other predictors.
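A sketch of the two steps just described, the left outer join on the internal barcode and the predictor screening on the remaining columns. Data table and column names are placeholders.

// Left outer join: keep every ATOS test row and attach the Atraq component data where the serial matches
joined = atosDt << Join(
	With( atraqDt ),
	By Matching Columns( :Internal Barcode = :Serial ),
	Drop Multiples( 0, 0 ),
	Include Nonmatches( 1, 0 ),          // keep non-matching rows from the main (ATOS) table only
	Preserve Main Table Order( 1 )
);

// Predictor screening: which components and parameters separate okay from not-okay parts?
joined << Predictor Screening(
	Y( :Tilt Lock Overall Result ),
	X( :Shift, :Box Serial, :CS Ball Lot, :CS Sensor Lot ),   // constant columns are excluded beforehand
	Number of Trees( 100 )
);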
Then, based on this predictive screening, we append graphs to the alarm report, and we color the graphs according to the predictive screening. First, the shift came out as relevant. Here you can see the afternoon shift in blue. You can see directly that there was no failure in the morning shift (the red lines here are the specification limits), but it starts in the afternoon. Box Serial was also a significant predictor, and you can see that the purple and the blue Box Serials also have an effect. This is our root cause analysis: test these Box Serials; they behave differently from the others.

Now we save the alarm report, and we would like to send the link to the alarm report to Autoliv's dispatch system, called Leading2Lean. Leading2Lean is configured to automatically send notifications to the correct owner. Usually you could simply send a mail, but sending a notification via Leading2Lean includes a dispatch process, and that dispatch must be closed.

This is the way we do it. First, we define a variable called alarm; it gives me the path and location of the corresponding alarm report. Then we use an associative array, where we also set the site, and as the description we include the alarm. This is sent via an HTTP request: here we have the fields array, which we defined just before, and then we send it. This is the way, so copy and paste this skeleton into your own JMP script.

This is how Leading2Lean, the dispatch system, looks. Here we have a dispatch number, the name, and the date when it was created. Here we have the link which we sent via the HTTP request. If you press it, or open it via email, then you will get the alarm report.

This whole process which I have described, using queries, making predictive screenings, and making HTTP requests, could all be realized with JMP. In the same way, I would like to make such analyses for components based on subassemblies.

As Jordan Hiller said, JMP writes 90% of your code, the skeleton. I hope that I could give you a few more percent for your own scripts. I hope that this presentation helped you, and that you like working with JMP as much as I do. Thank you.
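A minimal sketch of that dispatch call in JSL. The URL and the field names are placeholders, not the real Leading2Lean API; the pattern is simply an associative array sent as the Fields of an HTTP POST.

// Path of the alarm report that was just saved (placeholder)
alarm = "\\server\Reports\ALH\Alarm_TiltLock_" || Format( Today(), "yyyy-mm-dd" ) || ".png";

// Fields of the dispatch, collected in an associative array
fields = Associative Array();
fields["site"] = "ALH";
fields["description"] = "ATOS alarm - see report: " || alarm;

// Send it as an HTTP POST
request = New HTTP Request(
	URL( "https://example.leading2lean.com/api/dispatch" ),   // placeholder endpoint
	Method( "POST" ),
	Fields( fields )
);
response = request << Send;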
Batch processes are subject to high variability: raw material composition, initial conditions, unit degradation, and their intrinsic dynamic nature. Additionally, they are characterized by several distinct phases and steps that drastically change the conditions during the manufacturing process. In this presentation, we will illustrate with an industrial example how to use data science and machine learning to convert this high variability and apparent excess of data into valuable information. First, we will show how to summarize batch properties into features and use the open-source Predictor Explainer add-in to identify the most relevant ones using AutoML and Explainable AI. Then, we will discuss the need to align the data timewise before performing trajectory analysis, briefly introducing the pros and cons of different methodologies to achieve this result. Finally, we will dive into trajectory analysis. In this last step, we will use the Functional Data Explorer functionality of JMP Pro to monitor and identify deviations. Analyzing these deviations will lead us to identify key process control improvements to optimize production further.

Thanks, everybody. I'm Mattia Vallerio, working in Advanced Process Control at the Solvay site in Spinetta Marengo, Italy. Today I'm here to present work that we did together with the University of Leuven on the analysis of industrial batch data. More specifically, I will present a JMP plugin that we developed that uses autoML to do feature screening, and then I will move on to using functional principal components to analyze batch data. The idea is that, on one side, the autoML is used for automated screening of relevant parameters, and on the other side, the functional principal components are used for anomaly detection in batch manufacturing processes. While doing that, I will also talk about the need to align data timewise to be able to analyze it properly: why you need to do it and how you could do it in a simple way.

Just for reference, this work has been published in a book, but it is also available on arXiv. This is the reference with all the authors listed, and you can download it for free. Feel free to have a look at it; you will find more details on what I will talk about today. In the same way, the plugin that I will present is freely available on GitHub, on the JMP Community page in the material for this talk, and on a dedicated page that is also called Predictor Explainer.

Moving back to the talk: the data that we use is based on a use case that was published by Salvador Munoz back in 2003. You can download his code, where he is using PCA and PLS methods to analyze batch data, and the use case contained within it is also used in this talk and in the publication I showed just before.

If we look at the data we are analyzing, it is basically a drying process. This drying process is composed of three different phases: the deagglomeration phase, the heating phase, and the cooling phase, which you can see here as phases 1, 2, and 3.
The purpose of this process is fairly simple: it is just to remove solvent from the dry cake, from the material that has been introduced into this drying unit. As you can see, a different initial cake weight is introduced into the system each time, and there is variation because the starting material is different every time. The purpose is to reach a specific target concentration for the solvent at the end, so it should not be too dry or too wet at the end of the phase. You can already see clearly from this picture that we have some variation in the shape and duration of the temperature profile, and therefore also of the process itself.

If we go a bit further in analyzing the data, you can see that we have a variety of batch durations; this is the color on the right side in the legend. You can see here, even more clearly than before, that there are different shapes. This is also true for the solvent concentration. This shouldn't be too much of a shock for anybody in the process industry: the longer the batch, the lower the final solvent concentration, and the shorter the batch, the higher the final solvent concentration, more or less, with a few exceptions.

But as you can see, the lengths are all over the place, and the main phases are not aligned. If you took the data for all these batches now and started to analyze it, you would be comparing the wrong samples: for example, at this point in time you would be comparing data from the deagglomeration phase with data from the heating phase, or even the cooling phase with the deagglomeration phase. Of course, this is not what we want to do. That's why, before you do anything else with the data, it's important that you squeeze, shrink, or stretch the data in time so that all the different batches have the same length.

You can do this in different ways. This is technically called dynamic time warping, and it is also a feature that is included in JMP when you do functional data exploration. Very complex mechanisms and algorithms have been developed over the years; the references for these methods are in the publication I just showed you. One of the drawbacks of the advanced methodologies is that you need a reference trajectory to be able to use most of the dynamic time warping algorithms.

There are other ways to synchronize the batches. One is that if you have a monotonically increasing latent variable (most of the time this is the conversion, or the total amount of material fed into the reactor, so the cumulative feed), it can be used as a way to plot the data in a standardized way and have all the data aligned.
The methodology that we used for this use case, and that we propose in the article we wrote, is to normalize the data based on the automation triggers. By automation triggers, we mean the changes between the different phases. Every beginning and end of a phase is then normalized between 0 and 1, as you can see here: the deagglomeration phase goes from 1 to 2, the heating phase goes from 2 to 3, and the cooldown phase goes from 3 to 4. Then all the data is squeezed or stretched to fit into these buckets (a minimal sketch of this normalization appears below). Then something very nice happens: you can see abnormal batches directly, much more clearly than you would have on the left side. And if you look at the plot of the phase time, the one in the middle, you can clearly see that the slope of the line basically tells you how long the batch lasted; the steeper the line, the longer the phase you are currently looking at.

The drawback of this methodology is that it cannot be applied online, of course. It can only be applied once the batch, or the phase, is finished, because online it is basically impossible to know when the phase is going to end. You therefore need to resort to another kind of alignment procedure, like the dynamic time warping described in the paper. I won't be touching on that today.

So how do you actually analyze batch data? There are different ways to do that. The first way we look at is using fingerprints. What do we call fingerprints? Basically, you can define a fingerprint as an aggregated or statistical summary of the data that has physical meaning or engineering value. These are normally the variables your engineers look at to know whether the batch has been performing well or not. If you ask your experts in the field or on the process, they will have this kind of KPI that they monitor to know if a batch has been performing well.

For example, one of them could be the maximum level of the tank in the deagglomeration phase, or the maximum temperature in the drying phase, or the standard deviation between the set point and the measured variable during the drying phase; you name it. You can go as crazy as you want and build basically as many features as you want from the data you have. This is a way to remove the burden of the transient behavior of batches, and it's a way to compare batches by using simple statistics on different features of the batch.

The problem with this is that you can end up with a lot of different statistics to track and monitor, and sometimes it's very difficult to understand which ones are really relevant and which are not relevant at all.
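Going back to the phase-based alignment: a minimal JSL sketch of the normalized time axis could be a formula column like the one below. It assumes columns :Batch ID, :Phase coded 1, 2, 3, and :Time within the batch; these names are placeholders.

// Normalized phase time: phase index plus the fraction of that phase already elapsed,
// so every batch runs from 1 to 4 regardless of its real duration
New Column( "Normalized Phase Time",
	Numeric, "Continuous",
	Formula(
		:Phase +
		( :Time - Col Min( :Time, :Batch ID, :Phase ) ) /
		( Col Max( :Time, :Batch ID, :Phase ) - Col Min( :Time, :Batch ID, :Phase ) )
	)
);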
That's why we developed the plugin I showed you before, which uses autoML to do feature selection on all these fingerprints that you can create yourself. The add-in can be installed by everybody in JMP, and it basically looks like any normal menu you would have in JMP. It requires a Python installation, which is also managed automatically by the installer of the plugin.

Let's try to use it. We want to model the final concentration of the solvent, which is our Y. You can just pop in all the sensor data that you have, and it will automatically create all the different engineered features: it will take the maximum, the minimum, the standard deviation, the median, the mean, and all the statistics you can possibly imagine for all the variables we introduce. If you have information on the batch ID and the phase ID, you can just plug it in. Additionally, if you have the Python installation, you can ask the tool for a SHAP plot of the SHAP values of the different features, to get a better understanding of what the boosted tree is doing behind the scenes to select the features that are relevant or not.

Then you can tweak the number of trees and the signal-to-noise ratio, you can add weights, you can choose. If we click OK, then the magic happens. As you see, it's still computing, because the Python script behind it is computing the SHAP values, so it might take some time before we get the results.

This is the result, basically. As I said, the tool generates a lot of different statistical aggregations of the data: the standard deviation of the agitator speed, the standard deviation of the torque, the mean of the agitator speed, and so on; you can see it for yourself. And it's still computing; that's the beauty of doing it live, sometimes it doesn't go as planned. Here it is: this is the SHAP plot, and we're going to look at it later. Let's run it again, but without the SHAP value request, just because I want to show you another feature. I'll do the SHAP plot again afterwards, but let's move on with this.

As you can see, we also have random and uniform noise columns, with statistical features of their own. These are introduced as a cut-off, as a way of selecting which features are really relevant and which features cannot be distinguished from noise. This is built in as standard, and it gives you an automatic cut-off as well. One of the things you can see is that it selected the torque and the agitator speed as some of the interesting variables to look at.
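The kind of per-batch, per-phase aggregation the add-in automates can also be sketched by hand with a Summary table; the column names below follow the example data and are only placeholders.

// Fingerprints: one row per batch and phase, with simple summary statistics per sensor
fingerprints = dt << Summary(
	Group( :Batch ID, :Phase ),
	Mean( :Torque ), Max( :Torque ), Std Dev( :Torque ),
	Mean( :Agitator Speed ),
	Max( :Dryer Temperature ),
	Freq( "None" ),
	Weight( "None" )
);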
Now, you cannot really use this as... actually, if you think about it, it's quite understandable, because depending on the wetness of the cake that you introduce, the torque consumed by the agitator will be higher or lower, depending on whether it has to work more or less. It's completely normal that at the beginning the batches that are a bit wetter might offer less resistance, and the ones that are less wet will have a little more resistance.

But this is the kind of feedback you get from the tool. The standard output is a plot with the variables that are most relevant; you might have seen it passing by. It also makes a parallel coordinates plot of the output, colored by the target. Here you can see again that if the torque is a bit higher, then the final concentration of the solvent is also a bit higher, and if the torque goes down, then the wetness of the cake at the end is lower. The agitator speed basically reflects the effect of the torque as well, and this one is also the torque. This is just a visual representation of what the tool does.

You can also use SHAP values to look at the data in a different way. SHAP values, if you're not familiar with the term, are a way of visualizing the impact or effect of the different variables on the target. It's a way to explain the result of the machine learning algorithm used behind the scenes.

Let's try to do it one more time, maybe selecting fewer parameters: let's say torque, agitator speed, and dryer temperature set point, which are the ones that have been [inaudible 00:20:48]. Then we add phase and batch, and we ask it for the SHAP plot. Bear with me a bit; it should be any second now. Here we go.

The legend gives you the normalized value. If the point is on the left side, it has a negative effect on the target value, and if the point is on the right side, it has a positive effect on the target value. As you can see, the torque is again one of the most important ones. You basically see the same thing we saw in the parallel coordinates plot and in the results of the analysis: a lower value of the torque has a negative effect, so it sits on the left side, and a higher value of the torque has a positive effect. Then you can analyze this for the other variables as well.

We think it's a very powerful tool to visualize the effect and to break down and analyze what the algorithm spits back at you in a more efficient way. At least, this is what we think, and that's why we included it in the tool. Then you can just scroll down and look at all of the variables.
Then, of course, we still have the random and uniform noise inside as well, even though it's not really relevant. You might have noticed that the batch ID was also flagged as relevant. That's a bit fishy, right? This is actually a good point to move into the next part of the talk: anomaly detection for batches, or having a way of analyzing whether one of the variables is going out of spec.

The standard way to do this for batches, or for industry in general, is to look at some KPIs and see how they evolve over time. For example, we might want to look at the different phases and their durations, to look at the variation we see: we expect a lot of variation in the deagglomeration phase and a little less in the heating phase and the cooldown. The other way to look at this is basically to make a control chart of different parameters and see whether these parameters are inside the limits you have specified. One place to start is the target variable itself; that would be one of the first variables you need to monitor.

Now, remember the graph I showed before, where you could see that the batch ID had an impact on the solvent. Plotted like this, it makes more sense what we were looking at in that SHAP plot, because there has definitely been a trend. Going from batch 0 towards batch 70, there is a variation in where the final solvent concentration ended up: up to batch 30 we were on target, then we went under target, and then we went too high in solvent as well. Definitely something changed during the process, and therefore we had that signal in the SHAP plot as well; it picks up that the batch ID is relevant for predicting the final solvent concentration, but it's just an artifact of this data. We don't know the cause: most likely these are batches of different products, and the initial concentration differed between campaigns, or something else was going on. But this is an additional uncertainty that is inherent to batch processes when you have this variation in your raw materials. It is also true for other processes, but for batch processes it's much more relevant, as you can see here.

One way to look at the data, or to do anomaly detection, that has been widely published and is widely used in this industry, is the combination of PCA and PLS to understand the multivariate space at a specific point in time. If that point is not representative of what is going on in the ongoing batch, then you will have an alarm. It's a multivariate way to look at the data. Now, with functional principal components in the Functional Data Explorer, we can basically do the same, but instead of using standard PCA, we use the entire information of the trend. This is a standard tool that you can find inside JMP, under Specialized Modeling.
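As a hedged sketch of that first monitoring step: an individuals chart of the final solvent concentration, one point per batch. The table and column names are assumptions; batchDt stands for a one-row-per-batch summary table.

// Individual and moving range chart of the target value, batch by batch
batchDt << Control Chart Builder(
	Variables( Y( :Final Solvent Concentration ) )
);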
This is the Functional Data Explorer. It's part of JMP Pro, only JMP Pro, and that's what I'm using; if you have it, you can use it. I already ran the analysis, so we'll just relaunch it.

What you see, for example if we look at the tank level as a variable, is that it first gives you summary statistics. The idea behind functional PCA, like PCA, is that it identifies eigenfunctions that can explain the shapes we see up to a certain percentage. In this case, for the tank level it identified two eigenfunctions, and the sum of these functions can explain 97.3% of the totality of the shapes that we see.

Here you see all the shapes on the left, and you can clearly see that there are some that are not similar to the rest. You can play around a bit and increase the number of eigenfunctions to include the third one, but JMP automatically selects the most appropriate number of eigenfunctions as a trade-off in explaining the shape.

If we go back to two: how does this work? Basically, as you can see, you have all the batches here, and then you have the score plot, which is what actually allows you to understand which batches are anomalous and which are not. You definitely have batch 61, which is a bit out there with respect to the rest, and then you have batch 55. Going from left to right, you can see that there is an evolution of the batches along the Component 1 axis, which corresponds to this specific shape over there. Depending on where you are on this Component 1 axis, the batches will have different shapes, and the maximum level basically increases and increases until you reach batch 55 and batch 66, which are a bit anomalous with respect to the rest. This is the same concept as a PCA, but with an analysis of the shape functions instead of a multivariate analysis done row by row and point by point.

The idea, in the end, is that you could use this online to understand whether the batch is inside or outside the specification. You could do it per phase, for example: if you have a specific shape for one of the variables that you need to trend, you could use this to analyze where you are.

The same is true for the other variables. You can do this for the dryer temperature variable. In this case, we have three different eigenfunctions, and they explain up to 87% of the variation. Again, by looking at the score plot, you can spot anomalous batches basically just by looking at it: batch 34 has a flat top, while all the other batches have a pointy shape that you can find back in basically all the other ones.

The model that comes out can be used for online anomaly detection, if you can implement it.
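For reference, a launch of the Functional Data Explorer for one of these variables might look like this in JSL (JMP Pro only). The column names are assumptions, and the exact role names are best taken from the platform's saved script.

// Functional Data Explorer on the tank level trajectories, one function per batch
dt << Functional Data Explorer(
	Y( :Tank Level ),              // the measured trajectory
	X( :Normalized Phase Time ),   // the aligned time axis from the normalization step
	ID( :Batch ID )                // one curve per batch
);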
By the way, if you have the new version of JMP, you can connect directly to your process historian if you have OSIsoft PI. Otherwise, there has been another talk by my colleague, Carlos, about another plugin that we developed to extract data from your historian, which can connect to both OSIsoft PI and AspenTech IP.21. You can download your data directly, pop it in, and see whether a batch has been behaving according to your specification or not.

I think this more or less covers what I wanted to show: the two different methodologies that we have been using at Solvay to look at process data. I'm looking forward to seeing you at the summit in Spain next month, in March, if you are there. Otherwise, feel free to reach out to me or to any of my co-authors if you need more information.

Just as a wrap-up, this is where you can find the article that we published about this, with a bit more information and a bit more detail than what I have just shown you. Again, it's open access; you can download it for free from the link, and it's all there for you to look at and browse. Thanks again for your attention. That's it.
The three PQ batch concept in the pharmaceutical industry is being replaced by continued process verification. However, there is still a validation event (stage 2) before going into commercial manufacturing (stage 3). In stage 2, you are supposed to prove that future batches will be good, and in stage 3, you are to prove that the validated state is maintained. If this can be done, there is no need for end-product testing any longer, leading to large QC cost reductions. JMP has the ideal toolbox for both stages. If the process can be described with a model, prediction intervals with batch as a random factor can be used to predict the performance of future batches. To live up to the assumptions behind the model, JMP has an excellent toolbox for the sanity check. Even cases with variance heterogeneity can be handled through weighted regression. A JMP script that combines the necessary tools described above to calculate control limits, prediction limits, and process capability in stages 2 and 3 will be demonstrated. The script has been heavily used by many of our customers. It only requires that data can be described with a model in JMP.

Thank you for giving us this opportunity to talk about how we use JMP in pharma process validation and continuous process verification. My colleague Louis and I will give you a guide to the background, how we do it, and what the outcome is. The agenda is that first we will describe the background and the concepts. Since this is validation, we must justify the assumptions behind our models, so the sanity check is important. Then I will go on to talk about the particular methods we are using and which formulas we have to build for things that are not in JMP up front. Then Louis will take over and show how the JMP script automates the calculations and implements what is missing in JMP. Finally, we will draw some conclusions, and hopefully there will also be time for questions.

Now to the background and the concept we are working with. Louis and I come from a company called NNE. We do consultancy within the pharmaceutical industry, and we are JMP partners, so we help our clients get value out of their data, of course using JMP. A big issue within the pharmaceutical industry is process validation when launching a new product. Traditionally, for many, many years, it has been done by making three batches; if they were good, the validation was passed. But this is not really predicting the future, this is predicting the past. Of course, you can make three batches that are good, and the problem can appear later on. To compensate for that, the industry has traditionally had an extensive, and thereby costly, QC inspection on every batch to ensure it was okay.

However, if we could prove that all future batches will be good, then we could reduce the QC cost heavily. But how do we predict with confidence that future batches will be good? We could use prediction intervals, or individual confidence intervals, as they are called in JMP, at least in some menus.
If we put batch in as a random factor, then the validation batches we are looking at will be seen as a random sample of all future batches. Thereby, we can predict how future batches are going to behave. If this prediction is inside the specification, we have actually proven that all future batches will be good, given that we can maintain the validated state, of course.

However, this might take more than three batches. But do they all have to be in stage 2, as we show in this graph? Maybe not. We suggest you still make three batches in your first process validation, and if that's not enough, make the rest in what we call stage 3A, which is after commercialization, because then you can sell them one by one, so it's not a big problem that you have to make more. However, this requires two things to be fulfilled. The first is that we still have the extensive QC testing, so we can find out whether a batch is bad, because we have not yet proven that it cannot happen. Second, of course, it requires that the estimated performance is okay. How can we see whether the estimated performance is okay? We can do that by checking whether the control limits are inside the specification. Then we can go to the market, and later on, when the prediction limits are also inside the specification, we can actually remove, or at least reduce, the QC inspection. We then move from monitoring the product to monitoring the process.

So what is it all about? Validation is about predicting the performance of future batches with confidence, not just predicting the past. How can we do that? By using prediction or tolerance intervals in JMP. How many batches do we suggest making in stage 2? We still suggest three, but you might not be done.

How can you then pass stage 2 with few batches so you can go to the market? As I said before, look at the control limits, because they are without confidence, so you are not punished for only having three batches. Then, later on, look at the prediction or tolerance intervals. If the control limits are inside the specification but the prediction intervals are not, the most probable reason is lack of data, and this shouldn't keep you from going to the market.

Then how many batches should you make in stage 3A? That's fairly simple: simply continue until your prediction limits are inside the specification, and then you have passed stage 3A. These limits you can also use as prediction limits after stage 3A, when you go into stage 3B.

We can also see it in this flow chart, where you start in the upper left corner by analyzing your validation batches, typically three. Then you calculate your prediction limits in JMP with batch as a random factor. If they are inside the spec limits, everything is fine: you have passed both stage 2 and stage 3A. If it turns out that the prediction limits are too wide compared to the spec limits, I would, as the next thing, look at my control limits, which are without confidence.
If they are inside, then we just need more data. We collect more data, recalculate, and at some point in time we exit here and have also passed Stage 3A. Of course, if it happens that even the control limits are outside specification, then the estimated performance is actually bad, you are not really ready for validation, and you have failed. Hopefully, this is not going to happen.

Let's go on to the sanity check. Now I will go into JMP and try to demonstrate how it works. Here you see a JMP table, which actually comes from ISPE, the International Society for Pharmaceutical Engineering, who published this data set as a good example of what a validation data set should look like. Many companies and consultancies have tried to run the calculations on it and see how they would conclude. It is basically a data set from tablet manufacturing, and classically it contains three batches, A, B, and C. Powder samples are taken at 15 locations in the blender, and we measure the API content, that is, the strength of the active pharmaceutical ingredient, which must be between 85 and 115 to be inside specification.

If you look at the variability chart of the data, you can see the three batches A, B, and C, and the 15 locations within each batch. Within each batch-location combination, we have four measurements. The first thing you can see is that the within-location variation in batch B is somewhat bigger than in batches A and C, so we do not have variance homogeneity across batches. To make that more objective, you can put S-chart control limits on the standard deviation chart, and you can see that batch B is out of control. Another way of seeing it is to run a heterogeneity-of-variance test, as shown down here. Then it is also very clear that there is more within-location variation in batch B than in the other batches.

Strictly speaking, you cannot pool these variances. However, you can if you do weighted regression, where we weight with the inverse variance. That is the next step: we go into log-variance modeling, which you can do in JMP. We have simply made a log-variance model where you can see that the location, the batch, and the interaction all have a significant influence on the level, but only the batch has a significant influence on the variance. You can also see that the within-batch variance for batch B is somewhat bigger than for batches A and C. When you have built this model, you can save the columns from the red triangle menu, including the variance formula. I have already done that, so it is in my data set. If I unhide the columns, you can see the variance column from my log-variance model. It has a formula saying what the variance is for batches A and C and what it is for batch B. Based on that, I can now do weighted regression: I simply make a weight column where I weight my regression classically with the inverse variance.
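A minimal JSL sketch of that last step, assuming the variance column saved from the log-variance model is named "Variance of API" (the actual name depends on your response column):

// Sketch only: build a weight column as the inverse of the saved variance formula.
// ":Variance of API" is an assumed column name from the LogVariance model's Save Columns.
dt = Current Data Table();
dt << New Column( "Weight",
	Numeric, "Continuous",
	Formula( 1 / :Variance of API )   // classical inverse-variance weighting
);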
Then I am ready to do a sanity check on my data, because after weighted regression I can pool the variances. The first thing I would do is look at a systematic model with location, batch, and the interaction. On the studentized residual plot I can see that I have no outliers; everything is inside the limits. I can also see from the Box-Cox transformation that there is no need to transform my data. So it is reasonable to assume normally distributed residuals, which is the assumption behind a least squares model.

The next step is to put batch in as a random factor, not just a systematic factor, because I would like to predict future batches. This is what I have done here, and then you get these variance components: the between-batch variation, the batch-by-location variation, and the residual, meaning the within-batch, within-location variation. Since I have weighted with the inverse variance, that last number of course gets close to one. However, in this model there are also some additional assumptions, namely that the batch effect and the batch-by-location effect are randomly distributed, meaning that these random-effect predictions should follow a normal distribution. To test that, you can just put them into a table, which I have done here, and test the BLUPs by making an empty model with them as the response. You can see that both the batch effects and the interaction effects are all inside the limits, so here too we can justify the assumption that these are random factors. That is the sanity check of the model.

Now, to make models for batches A and C and for batch B, we need to scale them differently. That is what I have done in the columns over here. If I unhide them, you can see I made an "AC scale weight" where I simply multiplied my weight factor by the scaling, so I get the variation I have seen for batches A and C, and the same thing for batch B. I scale the weight with the variance I would like to have as the residual variance, which comes directly from my log-variance modeling. That is also pretty straightforward to do. Now I actually have my two models from which I can predict: one scaled with the AC scale weight and one scaled with the B scale weight. If I run the one scaled with the AC scale weight, you can see the model, the residuals, and the variance components. What is very easy in JMP is to go up here and save the columns: you can save the prediction formula, which will be your center line, and you can also save the prediction interval, which here is called the individual confidence interval. I have done that for both this model and the other model.
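As a rough JSL sketch of the model and saved columns just described (the column names :API, :Location, :Batch and the weight column are assumptions, and the exact save-columns messages are easiest to grab from a saved script):

// Sketch only: weighted REML model with Batch and Batch*Location as random effects,
// saving the center line and the individual confidence (prediction) limits.
dt = Current Data Table();
fit = dt << Fit Model(
	Y( :API ),
	Weight( :AC Scale Weight ),                       // assumed weight column, scaled for batches A and C
	Effects( :Location, :Batch & Random, :Location * :Batch & Random ),
	Personality( "Standard Least Squares" ),
	Method( "REML" ),
	Run(
		:API << { Prediction Formula, Indiv Confidence Limit Formula }
	)
);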
I also save the columns from the other model, the one with the different scaling for batch B, as individual confidence intervals. They are then stored down here as prediction formulas, and I can make a combined version by picking between them depending on which batch group a row comes from. Now I have limits for batches A, B, and C that I can plot on the same graph.

This is straightforward to do, and up to this point I have only used what is built into JMP. But to get the control limits and the tolerance limits, I need to calculate them myself, because in JMP you do not have tolerance limits or control limits in Fit Model. Of course, you have control limits when you make a control chart, but only for one mean and one standard deviation. You also have tolerance limits in the Distribution platform, but again only for one mean and one standard deviation. So here we have to calculate them ourselves.

The first thing we do is get the variance components. We simply save the variance components from the two models, one scaled to batches A and C and one scaled to batch B. Once we have the variance components, it is pretty straightforward to calculate the control limits, because they are basically just the prediction formula plus or minus a normal quantile times the square root of the total variance, using either the A/C variance or the B variance depending on which batch we are looking at.

We can do the same for the tolerance limits, if you prefer tolerance limits instead of prediction limits, as many companies do. Again, unfortunately, they are not in Fit Model, but we can calculate them. There we simply put in the classical formula for a one-sided tolerance limit: the t quantile with its degrees of freedom, the normal quantile for the desired proportion, and then we multiply by the square root of the total variance from the variance components. The only slightly tricky thing is that we need the degrees of freedom of the total variance, and JMP does not really give you that; it only gives you degrees of freedom for systematic models. But based on the width of the prediction intervals, we can back-calculate how many degrees of freedom we effectively have, and then we apply the same degrees of freedom to the tolerance limits. Then we are ready to go, and we can save all of these limits down here.
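In JSL, the control-limit calculation described here boils down to something like this sketch (the variance value and column name are placeholders; the real number is the sum of the REML variance components of the A/C-scaled model):

// Sketch only: 3-sigma control limits around the saved prediction formula.
// 1.23 is a placeholder for the total variance; ":Pred Formula API" is the
// saved prediction column (assumed name).
dt = Current Data Table();
dt << New Column( "LCL (A,C)", Numeric, "Continuous",
	Formula( :Pred Formula API - Normal Quantile( 0.99865 ) * Sqrt( 1.23 ) ) );
dt << New Column( "UCL (A,C)", Numeric, "Continuous",
	Formula( :Pred Formula API + Normal Quantile( 0.99865 ) * Sqrt( 1.23 ) ) );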
If we want, we can plot them, which is what we have done here. You can see the prediction limits, the tolerance limits, and the control limits: the control limits are the dotted lines, the prediction limits are the solid lines, and the coarse dotted lines are the tolerance limits. You see them for the 15 different locations and for the three batches A, B, and C. Of course, the limits are wider for batch B because it has more within-location variance. You can actually see that even the control limits, the dotted lines, are outside specification for locations 1 and 2, and actually also for location 4. So even though this data set is presented as a good example of a validation, I would say that if batch B is representative, you have actually failed the validation. If you look at A and C, all the dotted lines are inside, meaning you have passed Stage 2 because the estimated performance is fine. For a few locations you can see that the prediction limits are just outside, but with this big gap between the control limits and the prediction limits, just one more batch would probably bring them inside. This is how it works.

Some companies do not think it is enough to check whether the limits are inside specification; they would also like a Ppk capability index. We have put that into our method as well. If you look at the formulas here, you can see that where we have the prediction limits, we also compute a Ppk. Basically, we take the distance from the prediction formula to the spec limit and divide it by the half-width of the prediction limits. This gives a Ppk that corresponds to the prediction limits, and because these limits are with confidence, this is actually a Ppk with confidence. You can do exactly the same with your control limits: take the classical Ppk formula and just put in control limits instead of prediction limits. Because control limits are just estimated values, this is a Ppk without confidence. Of course, you can also do the same thing for the tolerance limits, which we have down here: a Ppk based on the tolerance limits. You can calculate all of these and, if you want, plot them on a graph as you see here.

Up here we have the estimated Ppk without confidence, the red one. Then we have the blue one, which is based on tolerance limits, and the green one, which is based on prediction limits. The upper chart is for the good group, batches A and C, with the lower within-location variation; the lower chart shows how it looks with the larger batch B variation. As we saw with the limits, for the batch B variation even the estimated value has a Ppk below one for locations 1, 2, and 4, while all the estimated values are nicely above one for A and C. The Ppk based on prediction, the green one, is above one for most locations and very close to one at locations 1 and 2. One additional batch would probably be enough to bring those inside as well.
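As a sketch of that Ppk-with-confidence calculation in JSL (spec limits 85/115 from the ISPE example; the saved prediction and limit column names are assumptions):

// Sketch only: Ppk based on the saved prediction limits, i.e. spec distance
// divided by the half-width of the prediction interval on each side.
dt = Current Data Table();
dt << New Column( "Ppk (prediction)", Numeric, "Continuous",
	Formula(
		Min(
			(115 - :Pred Formula API) / (:Name( "Upper 95% Indiv API" ) - :Pred Formula API),
			(:Pred Formula API - 85)  / (:Pred Formula API - :Name( "Lower 95% Indiv API" ))
		)
	)
);
// The same pattern with control-limit or tolerance-limit columns gives the
// Ppk without confidence and the Ppk based on tolerance limits, respectively.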
However, as you can probably see, it is a bit tedious to do all these calculations by hand. This is why we have made a script for it, which Louis is now going to demonstrate. I will let Louis take over now and show how the script works; I am sure he will also show a few more examples.

Yes, Pierre, thank you very much. I will just try to share my screen here. First things first: as Pierre has shown us, we face a lot of issues during process validation. Luckily for us, JMP has a lot of straight out-of-the-box functionality to help us deal with these issues, and we really like to use it, because it lessens our own validation burden going forward. Some of the things JMP has solved for us: for example, we are working with complex processes that cannot be described by just one mean and one standard deviation. To deal with this, we leverage the Fit Model platform in JMP, where we can have multiple systematic factors to handle the many means and multiple random factors to deal with the many variance components. Then there is the issue that data often needs a transformation; again, JMP has the Box-Cox transformation, which we use directly in the Fit Model platform. We also usually see a number of outliers to handle in bigger data sets. Here we again leverage the Fit Model platform, more specifically the studentized residuals plot, where we exclude outliers that fall outside the Bonferroni limits (a small sketch of this rule follows below). It is important to note that our concept is to exclude outliers from the actual model but always report them separately; it has to be visible that they are not included in the analysis. Then there is the lack of homogeneity of variance, which Pierre showed how to handle using log-variance modeling to estimate the variance and then compute a weight formula from it. And then there are the prediction intervals, which we use quite a lot; here as well, JMP is ready to go with individual confidence intervals directly in the Fit Model platform.

But, as Pierre also mentioned, we felt there were some parts where JMP could not bring us all the way home. This includes, for example, control limits, which in JMP can only handle one mean and one variance component. We overcome this by calculating control limits from the Fit Model output, using the prediction formula, the standard error of the prediction formula, and the residual variance to calculate the total variance and, from there, the control limits. Then there is the fact that tolerance intervals are preferred by many of our customers, because they allow you to set the confidence and the coverage separately. However, we found that JMP only does tolerance intervals in the Distribution platform, and again only for one mean and one variance component. So the script has built-in calculations for tolerance intervals, similar to the control limits, and when we calculate them we use the same degrees of freedom that JMP uses when it calculates its prediction intervals.
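Going back to the outlier-handling step above, a minimal JSL sketch of that rule might look as follows (the residual column name, the parameter count, and the exact Bonferroni cut-off JMP draws are all assumptions here):

// Sketch only: exclude rows whose studentized residual exceeds an approximate
// two-sided Bonferroni cut-off at overall alpha = 0.05; rows stay visible so
// they can still be reported separately.
dt = Current Data Table();
alpha = 0.05;
n = N Rows( dt );
k = 6;                                       // placeholder: number of estimated fixed-effect parameters
cut = t Quantile( 1 - alpha / (2 * n), n - k );
For Each Row(
	If( Abs( :Studentized Resid API ) > cut,  // assumed name of the saved residual column
		Excluded( Row State() ) = 1 )
);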
Then maybe the primary reason for doing this script, or at least the initial reason, is that calculating and visualizing all these intervals is very time consuming and very prone to human error. So we decided to make the script to automate the calculations and visualizations. Lastly, many of our customers require capability analysis such as Ppk, so we have of course also included this in the script, such that it calculates a Ppk corresponding to the prediction limits, what we call Ppk with confidence, and a Ppk corresponding to the control limits, what we call Ppk without confidence.

To say a little about the script itself: it consists of two parts. We have the JMP script you see here in the middle, and we also have a template document, as you see up in the upper left corner. When making the script, there were a few things we really wanted to achieve. We wanted it to be transparent, meaning that our customers can look under the hood, so to say, follow the calculations of the different limits, and potentially even edit them themselves, which brings us to the next point: we wanted it to be, at least to some extent, customizable by our customers. If they do not like the way we calculate Ppk, we want to give them the option to go in and edit the equations.

The template document itself is essentially an empty data table. It has a lot of predefined columns with column formulas, but no rows yet. What our script does is work from this template document as an input. It also takes the original data file on which you base your analysis as an input, and then it takes the model window. It should be said that the model window is assumed to be a complete model: you have done all your data manipulations, ensured that the response is proper, and made the sanity checks that justify the assumptions behind the model, ensuring it is valid. The script then takes the template document and the original data file and copies the data from the original data file into the template document; we do it this way so that we do not interfere with our customer's original data. It also takes all the relevant parameters from the model window and inserts them into the template document. From there on, the template document more or less fills itself out through the formulas already defined in its columns. In the end, the template becomes an output table in which the first many columns look very much like the original data file, with the remaining calculations and input values as additional columns further on.
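A minimal sketch of that template idea in JSL (file names are hypothetical): the template is an empty table whose formula columns evaluate as soon as the customer's rows are appended.

// Sketch only: fill the empty template (predefined formula columns, zero rows)
// with the rows of the original data file, leaving the original untouched.
template = Open( "Template.jmp" );          // hypothetical template file
source   = Open( "Original Data.jmp" );     // hypothetical customer data file
template << Concatenate( source, Append to first table );
// The relevant model parameters (variance components, spec limits, ...) would
// then be written into table variables or columns of the template, and its
// predefined formulas do the rest.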
I think we should go to an example and see how this looks in real life. I will just stop sharing this screen and share this one. What we have here is in fact the same ISPE case Pierre showed earlier. What you would do is take your data and run a model. In this case, I run the model where Pierre scaled it to correspond to the variation in batches A and C, the batches with the lower variation. You simply have your model, ensure it is final in terms of sanity checks and so forth, and then we run the script on it, which I do here: we have the model, and we run the script.

We see a few things pop up. First, we have the populated template, the output document, which now contains a lot of rows, some table variables, and so forth. We have our Ppk graph here, visualizing the Ppk based on the different limits: the blue is based on control limits, the red on prediction limits, and the green on tolerance limits. Lastly, we have our graph plotting all the data, the specification lines, and our prediction limits, control limits, and tolerance limits. We get the picture Pierre also showed earlier. What we see is that if we look at our data and assume all batches vary as much as batches A and C, we are actually in an okay position. At least we would have passed Stage 2, because the control limits are within specification for all batches and all locations. We do, however, have a small problem out at locations 1 and 2, where we see the prediction and tolerance limits extending a bit over the upper specification limit. However, we believe this could be fixed by just adding a few more batches to know the variation better.

But we also see that we have an observation outside our limits, and this is because what we are looking at here is in fact batch B, which, as we remember, had a higher variance. I will just run the model scaled according to batch B so you can see the difference. Now we see that all our limits have moved out from the mean. They are actually going beyond specification, not only at the first two locations but at multiple locations, which, especially because the control limits are outside specification, would in essence mean that you have failed your validation. So maybe this data set is not the best example to showcase a good validation.

Now I thought I would go through how we envision the entire process, relating it back to the flowchart Pierre showed at the beginning of this presentation. I have brought a customer example from an actual PPQ validation run; they ended up producing a total of six batches. What we have done here is build a model, and I will just deploy a local data filter to simulate where they started, having only the first three batches. Then what we have is, so to say, the finished model. After having produced the first batches and analyzed the data, we just run the script and get a result looking like this.
The scenario we see here is that both our prediction limits, up here, and our tolerance limits, here, are very much outside the specification. However, luckily for us, our control limits are actually inside specification. In the sense of the flowchart, this means that we have now passed Stage 2, and while we continue the validation and work on getting the prediction and tolerance limits inside specification, we can actually start sending batches to market simultaneously.

But bear in mind that we still have a pretty high QC effort at the end. Many customers ask how we can do this if we are so uncertain about the batches we produce. If you want to know the quality of the batches you have already produced, you actually have to look at it in a different way. In this model, we have batch as a random effect, because we consider it part of a larger population. But if we want to look at the batches individually, we have to put batch in as a systematic effect. If I run the script on that model, we see the limits narrowing significantly. I forgot to apply the data filter, but I think it makes the point anyway. What we see is that the limits are now much closer to the center lines of each batch, and they are well within our specification, because we are now describing the batches we have already produced rather than trying to predict the performance of future batches. This is also why we believe it is acceptable to go to market at this point while still keeping the high QC effort at the end.

Continuing with the example: what we saw was that we have passed Stage 2, because the control limits are inside specification, but the prediction limits are outside. The next step is simply to produce another batch, build the model again, as done here, and run the script again. What we see now, compared to the first view, is that our prediction and tolerance limits are moving toward the spec limits and toward our control limits, which is the behavior we expect as we get to know the between-batch variance better and better. However, the prediction limits are still not inside specification, so the routine is simply to do it all again. Because I know how this ends, I will simply include the last two batches and run the model again. Finally, after including the sixth batch, all our limits are within specification, which means we passed Stage 2 after three batches and have now also passed Stage 3A. This means we go into Stage 3B, where we will turn our attention to reducing the high QC effort at the end and replacing it with continued process verification, CPV. Just to sum up, I will go back to the presentation.
The purpose of a validation is, at least in our opinion, to predict that future batches will be okay. We firmly believe that if you can describe your validation data set with a model, you can also predict the future with confidence. This can be done with either prediction intervals or tolerance intervals. We believe JMP has unique possibilities to model your data and to justify the assumptions behind your model. It offers the opportunity to check for variance heterogeneity; if we do not have homogeneity of variance, we have the option to fit a log-variance model to find a weight factor, as Pierre showed you, and include that in our normal regression models. Then we can check whether the residuals are normally distributed; if they are not, we can do a Box-Cox transformation, also directly from the Fit Model platform. Then there is the problem of outliers, which we handle through the studentized residual plot, excluding observations outside the 95% Bonferroni limits, again a window directly accessible through the Fit Model platform. Lastly, we require the random factors to be normally distributed, at least in terms of their random-effect predictions. This we check by testing the BLUPs, and here we can also exclude outlier levels.

Then, to make the process easier for us and our customers, and to add the functionality we need on top of this, we decided to make a script that automates the visualization of prediction intervals. It calculates and visualizes tolerance intervals using the same number of degrees of freedom as JMP uses when it calculates prediction intervals. It also calculates control limits for more complex processes with many means and many variance components. Lastly, it calculates capability values where these are needed.
In many instances, relevant data exists, yet it is often not directly accessible and either cannot be utilized for data-driven analyses or requires painstaking manual effort to extract. One classical instance of this type is PDF documents. In this presentation, we will demonstrate an example of standardized PDF reports from a Laboratory Information Management System (LIMS) and show how the JMP Scripting Language can automate data extraction from these PDF files. The presentation will also show how the resulting scripts can be packaged as an add-in for distribution to many users.

____________________________________________________________________________

Explanation of the attached materials:
2023-03-20 Automated extraction of data from PDF documents using the customized JMP Add-ins.pdf --> The slide deck, in which the numbering of the examples refers to the correspondingly enumerated sections in 02 Step by step development.jsl
02 Step by step development.jsl --> The JSL file that guides you through the step-by-step development of the JSL code needed to read Freigabedaten_Beispiel.pdf
Freigabedaten_Beispiel.pdf --> Exemplary sample data stored in a PDF file
03a Functional code.jsl --> Summarizes all code developed in 02
03b Custom_Functions.jsl --> Example file to demonstrate how multiple JSL files can be packaged in a JMP add-in
03c Values for add-in creation.txt --> The values utilized to define the JMP add-in
Example PDF Data Parse.jmp --> The JMP add-in created from 03a - 03c
PDF Data Load Example.jmpaddin --> The same add-in but with extended functionality (file selection, progress window, etc.) that was not discussed in the presentation

Good day, everyone. My name is Peter Fogel. I am an employee of CSL Behring Innovation, and it is my pleasure today to talk to you about Automated Extraction of Data from PDF Documents using what I call customized JMP add-ins. Let me give you a high-level overview of what we are going to do today. First of all, in the introduction, I want to motivate why we should want to extract data from PDF documents at all. Second, in the approach, I want to show you how you can leverage JMP to actually do so, and what it means to use JMP and create JMP scripts. Then we want to transfer those JMP scripts into what I would call an add-in, and I want to explain a little bit why add-ins are actually the better way to store JMP scripts, if you like. Finally, I want to tell you what you can do once you are at the add-in level.

Why should we want to extract data from PDF documents? On the right-hand side, you see one example of a PDF document, and you see that it contains quite a lot of data. Quite often, this data is unfortunately not accessible in any other way, whether because of old software systems, proprietary software, or whatever else it may be. Sometimes PDF documents, and here you can replace the word PDF with any other document format, are really the only choice.
You want to have this data; otherwise, you would need a lot of manual operations on the data, which is both annoying and potentially really demotivating for your team members. The last point is that if you do not have the data at hand, you cannot make the decisions you want to. Quite often, data is key to making informed decisions, and without informed decisions you are at a real disadvantage in today's business world. What I want to show you now is how we can use structured data in PDF files, how we can leverage it using JMP, and how we can make decisions based on it. Today, I will only focus on how to get the data out of the PDF and hand it over to the user; everything else, such as how to analyze the data, could be a topic for another talk at another time.

Before we start with JMP itself, let's talk a little bit about what I would call the guiding principles. The first one is: understand what you want to do. If you do not understand the problem itself, you cannot really work with it. In this case, we know we have a PDF document, or potentially multiple PDF documents, which we want to parse. Then we might need to do some organization of the data, and finally potentially some further processing, depending on what is in there and what specifics we have. In the end, that is more or less a three-step approach, and from there you are ready to do any data analysis you want. So really understand your question at hand; we will do so in the next slides in a little more detail.

The next principle is: break it down into modules. The more modules you have and the better they are defined, the easier it is. Cut your problem into smaller pieces, and then you can tackle each piece on its own, which is much easier than handling one big chunk of work at the same time. The third principle, I believe, is to always use JMP to the best of its ability, because JMP can do quite a lot of what I would call heavy lifting for you. We will see one example, which in this case is the PDF Wizard, but there are many, many more, from analysis platforms like the Distribution platform to other platforms. They can really do a lot for you, and in the end you just have to grab the generated code and that's it; you get it almost for free. The fourth point is: if you define modules, make sure that they are standardized. Standardized in this sense means they should have defined inputs and outputs, so that if you later decide to do one part of a module slightly differently, it still does not break the logic of the code, because it still has the same inputs and outputs.
The last part should be clear: first focus on functionality, and only later make it user-friendly and suitable for any end user. That is also what we will do today; we will focus more on functionality and less on appearance.

Let us now look briefly at our PDF document, and I will also share the actual document with you in a second. But first, look at this snapshot on the right-hand side. What do we see? This PDF consists of several pieces. The first is typically the header, which holds very general information, of which we might use only some fields, but potentially all of them. Then we get the actual table, this data table here, which has both a table header and some sample information. If we look at an actual example, we can open the PDF, and you see that the table continues and continues across multiple pages. On the last page, we see that there is again data, and at some point we have a legend down here. We should also note that there might not necessarily be data on the last page; it can contain just the legend. As background information, we now know how the structure of this document works: the first page looks slightly different, then we have the interior pages, and the last page, as mentioned, can contain sample information but does not have to, and it always contains the legend.

If we now go into a little more detail, we see that each line, or entry, in this data table consists of measurement information and typically also the actual measurement, and these again are separated into multiple pieces. You will have, for example, the assay here, you might also have some code or assay code, it depends, you will have a sample name, and you will typically have a start and end date. You might have some requirements, and so on and so forth, until finally you get what we call the reported result at the end. Our task is really how to get that out. We see that the first line of each entry holds different information than the second one, and the third line holds only what we call the WG requirement. It is not yet perfectly structured, but we see there is a system behind this data, and that allows us to scrape and parse the data and utilize it to its full extent. Let's now break it down into modules, as I said.
What we can do is think around this three-step process again, and I believe we can break it down into even more steps. The first step could be, and this is user dependent, that the user says which PDFs to parse; the user tells you it is PDF one, two, and three, for example. Then, per PDF, you always do exactly the same thing, because in principle every PDF is the same; one has more pages than another, but the logic always stays the same. You would first determine the number of pages (this we will not cover today, but in general we can think it through). Then you want to read the general header information, as we saw it, and process it. You certainly want to read the sample information and process that, and you might want to combine the two; again, this one we will skip today. And you obviously want to combine the information across files. At that stage, we would have all the information we want available. Finally, we just need to ask the user where to store it, and then store the result. Those two last steps we will not cover today either, but I think you can imagine they are not too complicated to achieve.

Now let's actually jump into JMP itself. What I want to show here is: let JMP do the heavy lifting. In this case in particular, let the PDF Wizard do all the parsing of the data for you, and then, if you like, all you have to do is change the structure of the data. You can really leverage the PDF Wizard in JMP to its full extent. Let's switch very quickly over to JMP and see how that works. I have taken the example called Freigabedaten_Beispiel.pdf, and we will see what happens. If you open it, either by double-clicking on it or by going via File > Open, you can see that when we select a PDF file, we can use the PDF Wizard; let me make it a little larger so you can read the data. We see that JMP already auto-detects some of the data tables in there, but we want to be really specific and, in this case, only look at the header. So let's ignore the rest for now and just look at the general header table. In that case, it starts here with the product and ends with the LIMS Product Specification. We can simply draw a rectangle around it, let go, and you will see in an instant what happens over here. You see that JMP recognizes that it has two lines; that seems about right. It also recognizes that, in principle, there are only two fields.
Now, one could argue that this is one field, this is a field, and this is a field, so that may or may not be right. It depends a little bit on how you want to process the data whether you tell JMP to split the data here. If we do not want to do so, we just need to keep in mind that the second part of the field starts with something like a LIMS log number. In any case, we now have the data at hand in the format we want and can just click OK; JMP will open that data table for us. Now, very interestingly, we can look directly at the table's Source script and see that there is actually code, and this code we can really leverage. I will copy this code for a second so we can create a first script; for this, I will quickly open a script window myself. We can add the code there, and you should see that the code I have just added is exactly the same as the code we have down here, no difference whatsoever. So let's just use the code as it is.

If we look a little closer at that code, we see a couple of things. The first is that this is just the file name of the file we used. Instead of having the long file name hard-coded, I said down here, okay, let's define it as a variable and just use that variable. We also see that this table name is the name of the table as it is returned by JMP. In this case, we would probably not call it something generic like that, but rather give it a name that carries the actual information. And then we see that JMP tells us how it parsed that PDF table: it says it was page one, and that it looked for data in this rectangle. Everything else was done automatically. If we execute this statement now, we see that it gets us exactly the same data as before, and that is it. So far, so good. That is, if you like, all there is so far about reading a PDF file.
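The pattern described here, sketched in JSL (the PDF Tables arguments themselves are whatever the wizard wrote into the table's Source script; only the file-name variable and the rename are added):

// Sketch only: reuse the wizard-generated Open() call, but with the file name
// pulled out into a variable and a more meaningful table name.
pdfFile = "Freigabedaten_Beispiel.pdf";
dtHeader = Open( pdfFile,
	PDF Tables( /* page and rectangle settings copied from the wizard's Source script */ )
);
dtHeader << Set Name( "Header Info" );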
However, as I said, we also wanted to look at the actual sample data, not only the header data. Let's do that once more; let me enlarge it again a little so we can look at it. Again, you could say, okay, let's ignore the rest and focus only on one specific part, in this case the sample data on page one. Where does the sample data start? It starts here with the LIMS Probennummer, goes down exactly to here, and extends out to the rightmost column. We can read that in as is. What we see directly, both over here and over here, is that JMP uses two lines as a header, so two rows. That is not what we want, because only the first line is really the header; everything else is content. If you click on this red triangle, you can adjust that and say, I do not want to use two rows as a header, only one.

Once you change that, you see that we start with the End Date as the first actual value, which is perfectly fine. The other thing we might spot is that this first column actually contains two pieces of information: the one that holds the sample number and, here, the start date. The reason is that many of those values are too long for JMP to break the field into two columns automatically. We can tell JMP to enforce the split by right-clicking at more or less the right vertical position and telling it to add a column divider there, and we immediately see that JMP splits it. Unfortunately, we now get a bit of a mess in this first column, where the word SOP is split into an S and an OP, but in return we get a Start column. Here I would say let's accept it as it is; keep in mind that we always split this field, which is a little unfortunate, but it is good enough for now.

Again, if you capture that content, you get a JMP data table, and you can again use the Source script to look at the code. If you compare this code to the code I captured previously, you will see it is almost exactly the same, except for the part where we set the header rows and the column divider; the rectangle might be shifted a little, but the rest is identical. We can really read off how it works: you see that there is one header row, that it is page one, that a rectangle defines where to read, and that column borders are defined as we want them. As before, you could pull out the file name and the table name as variables, or replace them, and that is more or less what we now call our content file. If I close that and run this code once more, you see that it creates our JMP data table, just as we want. Getting a first shot at your data is perfectly feasible and not too complicated, I would argue.

Now, how do we go on from here? We have the data in principle, but obviously we need to organize it a little. For this, we can use a number of features, depending on what we want to do. There is the Log, which records more or less all your actions in the JMP graphical user interface; from there, you can really scrape code, and we will see an instance of that here. In addition, you can use the Scripting Index, which I highly recommend, and which holds quite a number of functions and examples,
and so really helps you to use them. We can use the Formula Editor, and we can also use Copy Table Script, for example, to really get things going.

Let's demonstrate that on our JMP data table. In this data table, we see that we have a number of things in here, and we now want to get them organized in a meaningful form. First of all, let's define what that format should look like. Let's open a new JMP data table, which will be, if you like, our target; this is the table we want to write into, so let's define what should be in it. We could say, for example, that the first thing we want is the assay. We then probably also want an assay code, whatever you want to call it. We want the sample name, because that is a field that should clearly be captured as well, since it is highly relevant. You might also want to include a start date or an end date, and so on, until you have included all the fields you want. At that stage, I would also say the columns should get the right attributes: since the data over here has particular types, we should standardize those attributes by selecting the appropriate data type for each column and making sure it is correct from the start.

Now we have that data table, but this alone does not help us much, because it is not yet reproducible. However, there is an option to record it. For example, you can use Copy Table Script (without data), which I will do for a second, and then insert that script here. If we look at it, we see that it creates a new data table with the name Untitled 4, which we can obviously change. It has zero rows so far, and it has all the columns we just created, from Assay to Start. We could give it a proper name, and I have actually already created a data table called "Data for page 1" that holds those first four attributes as well as all the others we want. Let's leverage that and continue with this one, since the other was really just a demonstration. Let's create it and run it, and you see it is just an empty data table, as it should be, with all the fields we want to fill from now on. What we also want to do is rename the content data table, which currently has a generic name; we call it something like Content, and we abbreviate the column LIMS Probennummer to LIMS Probe for simplicity.
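For reference, Copy Table Script (without data) produces something along these lines, which is what the script reuses (the column list is shortened here, and the names and types are only illustrative):

// Sketch only: an empty target table with the desired columns and attributes.
New Table( "Data for page 1",
	Add Rows( 0 ),
	New Column( "Assay", Character, "Nominal" ),
	New Column( "Assay Code", Character, "Nominal" ),
	New Column( "Sample Name", Character, "Nominal" ),
	New Column( "Start Date", Character, "Nominal" )   // dates could also be Numeric with a date format
);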
Now, what do we actually want to do? We want to work with the data a little, and I want to illustrate two examples of how we could do so. Let's look first at the column Anforderung. Within it, you see that there is both an AG and a WG, and we want to split them into two separate columns, so that later on one column captures the AG values and another captures the WG values, and so that the sample information is no longer spread across three rows but follows a proper target data format with everything in one row.

How could we do so? Let's insert a column and call it, say, AG Requirement, just to more or less translate the word Anforderung into English. What do we want to see? If there is an AG in the row, then let's capture the value after the AG in this column. If there is nothing there, then let's capture nothing. And if there is a WG, then let's also not capture anything, because that does not relate to AG. How could we do that? I would say let's build a formula; the Formula Editor is typically the best place to start. As I said, we want something conditional: if there is an AG in there, we want to see something; if there is no AG in there, we do not. The easiest way to do that is the If condition, which says: if there is something, do one thing, and otherwise do something else. So we write If(Contains(...)); Contains looks for a substring. We look in the column called Anforderung for the string AG, and we say that then something should happen, and otherwise something else should happen. We have just created a very simple If statement, and those two branches we still have to specify. Even at this stage, though, we can check that what we described makes sense: whenever there is an AG, like here or here in the Anforderung column, we see the "then" branch, which is good; otherwise we see just the "else" branch, which is also good. So let's refine it a little.

What do we want in the "then" branch? Ideally, we want what is in the Anforderung column, but with the AG part removed. To do that, you have many options. One of them is the so-called Regex, or regular expression, function, which says: take what is in this column, look for this AG part, replace it with nothing, and give me back the rest. If we do so and look at the whole expression, we see that if there is an AG with a minus, we get the minus returned, and if there is a "less than or equal to 50 minutes", we get the "less than or equal to 50 minutes". That sounds good.
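Putting those pieces together, the column formula sketched here (with the empty else branch that is completed in the next step) could look roughly like this in JSL; Anforderung is the original requirements column:

// Sketch only: keep the text after "AG" when the row is an AG requirement,
// otherwise return an empty string.
If( Contains( :Anforderung, "AG" ),
	Regex( :Anforderung, "AG", "", GLOBALREPLACE ),   // strip the "AG" part, keep the rest
	""                                                // no AG (e.g. a WG row): return nothing
)

In practice this expression would sit inside the Formula() of the new AG Requirement column.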
In the else branch, as we said, we just return an empty string, so nothing is returned there. And that actually works: you see in this column that only where you have the AG does it return the value after the AG. That looks perfect. Now I would use this logic, this idea, and include it in my script. We could again capture the code from the data table, and we would see the column down to its formula, but in principle we could also write it ourselves.

Before we continue, I have inserted here a little bit of additional logic: in case we read the last page, we saw that there was the legend, so in that case we say, let's remove the legend and it should be good. In addition, if there are any completely empty rows, I want to remove them as well.

To continue, let's look for where the samples are, and then capture the data of each sample. Let me quickly execute this part, and we see, okay, these are the rows where each sample starts. It simply looks for the rows where the End value is missing and, similarly, where the Anforderung is missing, because those are the two columns that tell us where only the sample header resides.

Now, iterating across each sample on its own, we look at where its data is. Taking, for example, this Lösezeit sample here as the second sample: we first look at where it starts, which in this case is row 4. We then combine the data of those two fields to get the full name again. Then we look at where the assay sits. The assay, in plain words, is just the first part of this whole string, before the forward slash. You can capture that, potentially also removing the "1" because that doesn't add anything. Similarly, you can look at the code, which is the second part here, and so on, and so forth.

Now, obviously, I agree this part of the code doesn't look all that simple, but if you read it carefully, it always has the same structure: you look at the part of the text that is in the respective line at the respective field, and potentially do a little bit of twisting, just as we did with the AG column. If you look at this AG column, you see again our regular expression; there is the AG part that we replace with nothing, and that's about it.

Once you have done that, you want to create one additional row in the target table where you can now enter all the data that we have captured. How would we do that?
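A minimal JSL sketch of that idea might look like this. The column names (End, Anforderung) and the assumption that the sample name is split across the first two columns come from the walkthrough above, so treat this as an illustration of the pattern rather than the presenter's exact code.

// Rows where both End and Anforderung are empty mark the start of a sample block
sampleStarts = LIMS Probe << Get Rows Where(
	Is Missing( :End ) & Is Missing( :Anforderung )
);

For( s = 1, s <= N Rows( sampleStarts ), s++,
	r = sampleStarts[s];
	// combine the two name fields into one full name, e.g. "Lösezeit 1 / 12345"
	fullName = Trim( Char( LIMS Probe[r, 1] ) ) || " " || Trim( Char( LIMS Probe[r, 2] ) );
	assay = Trim( Word( 1, fullName, "/" ) );   // text before the forward slash
	code  = Trim( Word( 2, fullName, "/" ) );   // text after the forward slash
	// ... capture start date, end date, and the AG/WG requirements for this block here
);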
We would right-click... sorry, left-click on the Rows menu, say Add Rows, and enter the number there. Interestingly, at this point you can also look into the log, and you'll see there is one statement that says Add Rows, and you can just copy that Add Rows part. It is really the same as what I have here. You see there is, in addition, this At End; that's the default value, so it doesn't matter whether I have it or not.

From there on, once I have added that row, I just write all the values that I captured previously, everything stored in those c variables, into the respective columns. In principle, if I remember correctly, this should execute in one go, and it should work as is. We see that the second row is the one that was correctly added; or, if I delete the rows for a second, it again executes as is. We can do that line by line, sample by sample, and if we do it across all the samples, that should be very good.

Now, let's return for a moment to the presentation and look at how we continue from there. At this stage we have really captured all the sample information, but we want to make it a little more handy. So far it is one massive block of code, but we can certainly break it down a little better.

That is what we do now: we make functions out of it. Functions have the nice property that they tell you what you have as an input and what you get as an output, which means you get that standardization of inputs and outputs anyway. In my eyes, functions are also much easier to debug and to maintain, there is no need for any copy-paste operation, and they really enforce good documentation of the code.

So let's do that. What could we do? As we saw previously, when we read our data we used this Open statement and that was it. Here we can now say: let's define a function that takes just a file name, reads the data, and returns the data. In principle that's not too different from what we did; it's just that it is now a function that takes one argument, in this case the file name (it could also be several), and returns something. If we execute it, we see that it creates exactly the data table we initially brought in.

Similarly, you can do the same for the transformation: a function that creates the new data structure and then organizes the data as we want it. If we initialize that as well, we see that it also works as is. This maps exactly onto what we did previously.
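Under the assumption that the work is split exactly as described, the pair of functions might look roughly like this in JSL; the names Read Sample Data and Transform Sample Data are just illustrative, and the body of the transform is heavily abbreviated.

// Read one source file from disk and return the resulting data table
Read Sample Data = Function( {filename},
	{Default Local},
	Open( filename )   // the last expression is the return value
);

// Reorganize a raw table into the target structure and return the result
Transform Sample Data = Function( {dtRaw},
	{Default Local},
	dtTarget = New Table( "Data for Page One", Add Rows( 0 ), New Column( "Assay", Character ) /* ... more columns ... */ );
	dtTarget << Add Rows( 1 );
	dtTarget[N Rows( dtTarget ), "Assay"] = "example value";   // write the captured values row by row
	dtTarget;
);

Calling Transform Sample Data( Read Sample Data( somePath ) ) would then reproduce the whole pipeline for one file.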
So it really means you have, if you like, only two functions left to call, which I believe is a really good way of organizing your code.

Now let's think about the last part, which in my eyes is really about UX, or user experience: how should I present this to the user? What I believe is that you can certainly play around with which data tables are visible at which stage. You see here a short snippet showing that you can create a data table as invisible from the beginning, or simply hide it after it has been created. Or you could say: if I store data somewhere, I can give users a link to that directory, so they don't have to go looking for the file but can just click the link and the directory opens. Or you can inform the user about the progress of the execution, so he or she knows, oh, I'm still on file 1, but already at page 8 out of 12. There are a number of options, but as I mentioned, I would only tackle that once the whole core code is implemented.

What we can state at this stage is that, yes, we now have all the code in place to run this collection of data from our PDF files into a data table. However, there is one issue, and that is bringing it to the user. The point is that I have one big JSL file, potentially with quite a lot of lines, and the user would have to interact with it at least to some degree. That is something I typically want to avoid, because it is not something users want to do, and I would also be a little afraid that they might break the code.

Instead, I would turn to a JMP add-in, which has the nice property of being only one file that just requires a one-click installation. It is also easily integrated into the JMP graphical user interface, so you don't have to interact with the script, and you have a lot of information at your fingertips. There is actually a lot of material on how to create an add-in. There is, for example, the Add-In Manager (I've added the link here), but there is also the option to do it in a manual or script-based way. I believe that, while it takes a little more effort, it is much better for your understanding, and I want to show you very quickly how that works.

For that, in the folder where I have stored all the JSL code so far, I have created the functional code, which holds all the code we have written, just in a slightly more organized form. You will recognize, for example, this Read Sample Data Page or this Transform Sample Data.
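Here is a small sketch of two of those UX ideas in JSL; the table name and the progress message are placeholders, so read it as an illustration of the pattern rather than the snippet on the slide.

// Keep intermediate tables out of the user's way
dtWork = New Table( "Intermediate Results", Invisible );   // created invisible from the start
dtWork << Show Window( 0 );                                // ...or hide an already-open table afterwards

// Tell the user where the run currently is
Caption( "Reading file 1 of 7, page 8 of 12 ..." );
// ... long-running work goes here ...
Caption( Remove );   // take the progress message down when finished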
Plus, I have added an additional file here which just holds an example of additional code. You could imagine that you might want to move some functions out of the functional code into this custom functions file, for example to keep the code more readable, and so on. Now, from those two files I want to create a JMP add-in, simply by going to File and New.

There you have the option to create the add-in. You now have to specify a name and an ID. I thought about those beforehand, so I won't worry too much here about what exactly they are called, but please do look at the naming suggestions for JMP add-ins. Then you look at which menu items you want. You add a command, you give it a name, let's say in this case Launch PDF Reader, and you have to specify whether you want to paste the JSL code directly in here or whether you have it in a file. In this case, let's use the file, as that is how we built it; it should be in here, and you include that one. You can also see that there are a number of additional options, like startup or exit scripts. At the end, you include any additional files you want to ship with it; in this case, let's assume that is our custom function code. Finally, you save it as, say, our example PDF data browser add-in.

Once that is stored, you can install it simply by double-clicking on it, and you will see that under Add-Ins you now have a Launch PDF Reader entry, which in this case really just reads this one specific PDF. So it is still quite fixed; there is quite a lot of information we could make more dynamic, for example the file selection, as I mentioned at the beginning. But it is at least one way to read the data.

Now, let's return very quickly to what we could do in addition, and take a short look at the JMP add-in itself. What is very nice about a JMP add-in is that it contains everything in one place. Let's look at our example PDF data browser and see where it was installed. If you look into that folder, you will see that it holds all the JSL code that we have, plus two additional files which define what the add-in is named and what its ID is, plus the integration into the graphical interface. If you read those two files a little carefully, you will see how you can easily adapt them to your purposes if needed.

The last part I want to show is what you could do once you had it fully functional, and that is what I want to show you now. We will install what I would call the final add-in.
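One pattern worth knowing here, shown as a hedged illustration: the script behind a menu command can simply Include() JSL files that ship inside the add-in, addressed through JMP's $ADDIN_HOME path variable. The add-in ID com.example.pdfdatabrowser and the file names below are hypothetical.

// Script attached to the "Launch PDF Reader" menu command (illustrative only)
Include( "$ADDIN_HOME(com.example.pdfdatabrowser)/custom_functions.jsl" );
Include( "$ADDIN_HOME(com.example.pdfdatabrowser)/functional_code.jsl" );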
This is a version of the add-in that, in addition, has a few of those user-friendly touches. You can see I have added it now under this GDC menu here.

There are a few more buttons to click than before. You can now say, oh, what do I actually want to read? In this case, I want to read those seven files; as mentioned, they are all copies of each other, just to have examples here. You see that there is a progress window here, which for demo purposes waits two seconds after each file; it reads each file, and you also see that the reading speed itself is actually quite impressive, I believe.

At the end, the data are being processed in the background. The user can see that in principle, but it doesn't happen in the foreground, so the user isn't bothered by it. Only once the data are processed do we get the final result here, and we see that this is the whole data table: it holds data from the first file all the way to the last one, file number six, and that is more or less the way it works.

Now, as I mentioned, getting this far is quite a lot of work, so we could still ask: what is next? Is there any next step? I would argue yes. The first one, in my eyes, is really to celebrate. Getting to this stage is not a trivial task; it is a true achievement. Be happy about it, and really congratulate yourself.

The second part is that you might want to do a little more around it. You might want to think about code versioning: how do you go back a version, or forward a version, when you have developed something further or are looking for a feature that doesn't work anymore? Code versioning, I believe, is quite helpful there. Similarly, if you think about collaborative development, Git might be the answer. If you think about unit testing, that is, how to make sure that code you have tested once still works after you have changed it a little, then unit testing is the answer. And if you want to deploy add-ins to a larger user base, you still have to think a bit about how that works; so far, I believe, there is no really good solution on the market.

The other part is that I would obviously love to hear feedback and any questions. You can reach me at this email address, and I am happy to hear any suggestions or criticism, whatever it is, so please feel free to reach out. I hope you could learn a bit today. I am really happy to share with you the script, the code, the presentation, everything that I showed you in the last 30-ish minutes. Thank you very much and have a wonderful afternoon.
A picture is said to be worth a thousand words, and the visuals that can be created in JMP Graph Builder can be considered fine works of art in their ability to convey compelling information to the viewer. This journal presentation features how to build popular and captivating advanced graph views using JMP Graph Builder. Based on the popular Pictures from the Gallery journals, the Gallery 8 presentation highlights new views and tricks available in the latest versions of JMP. We will feature several popular industry graph formats that you may not have known could be easily built within JMP. Views such as Integrated Tabular Graphs, Satellite Mapping, Formula Based Graphs, and more will be included that can help breathe new life into your graphs and reports!

Welcome, everybody. This is Pictures from the Gallery 8. My name is Scott Wise. I'm a senior systems engineer and data scientist. Every year, we get a chance to show you six or more views, really compelling or cool graphs that you probably didn't know you could generate through the JMP Graph Builder. I want to start you off with something a little more interactive. Hopefully, this is something that can help amaze your friends.

Our inspiration came when my daughter, Sammy, and I were having a lot of fun at the National Video Game Museum in Frisco, Texas. Besides me being able to relive my childhood of all the arcade games and the home video games, they did a good job of showing how the technology improved. A game that I particularly liked in the arcade was Atari's Battle Zone. It was the first arcade game that was successful in big numbers using 3D vector graphics. You felt like you were on a 3D planet, the battle zone, and it was also first-person perspective, because you felt like you were in the tank.

It had all these obstacles littered around, like these cubes and pyramids. You could hide behind them; you couldn't drive through them, but they were great shields. They protected you from the enemy fire, and you could duck out and take a shot.

This was actually big technology for the time. It took a lot more electronics and programming to do 3D rendering, and they had to answer a problem: can you recognize your orientation to a solid shape? If there's a wall depicted, are you behind the wall or in front of the wall?

Given that, Sammy and I came up with two challenges in Graph Builder. I'm going to show you two shapes. The first shape is a basic shape. Just using a custom map, I'm going to put that shape in the Graph Builder. Also in that Graph Builder pane, there are going to be two points, a point A and a point B, and I want to know if point A is inside or outside the shape. I want to do the same thing with point B, and I'm only going to give you three seconds. Let me bring up the data. Are you ready? Get these in your head or write them down. Three seconds.

All right. I imagine everybody didn't think that was too challenging. Let's take a look at the answers. Point A is in, point B is out. Now, this one is really easy to eyeball.
I can just tell that point B is outside the U shape. In fact, if I click into the shape or color by the shape in Graph Builder, you can readily see, okay, B is outside in the non-shaded area and A is inside. Well, that's all well and good, but what about the next shape?

This one's going to be a little more challenging for you. It's a spiral shape. Same instructions: I want to know if point A is inside or outside the shape, which in this case is a spiral, and I want to know if point B is inside or outside. Three seconds. Are you ready?

All right. Did you get that answer correct? Let's see what the official word is. Point A is in, point B is out. Now, this one was a little harder to eyeball. I didn't give you that much time to trace it with your finger. Just looking at it and making a guess, I wouldn't know which way to guess. If I can click into the Graph Builder and highlight the point, now I can see A is in and B is out.

But it's hard to see if I don't have that capability. This was the problem those video game designers ran into with the U shape, the spiral shape, or any shape they might encounter. So they developed a methodology called ray casting. Think about drawing a line out in any direction from the A point or the B point, and you just count the number of times it intersects, that is, crosses one of the shape's lines. If it crosses an odd number of times, the point is within, or in, the shape. If it crosses an even number of times, the point is outside, or out of, the shape.

Let's see how that works. Let's go back to our U shape. What I did was include a column here that lets me also include intervals; we will see how to do this a little later when we talk about forest plots. I'll just look at the finished product here, and you see I've drawn an interval of plus or minus 30 around B and A, enough to get through the shape. Go with B. I'm at B right now; pick a direction. I'll go right. I see one crossing, two crossings. That's two. It's even, which means it's outside the shape.

What about A? Same thing, let's go right. One, two, three. Three is odd, so it's inside the shape. Very cool. Will this help us with the harder one? I bet it will. Let's take a look. Here we go. Let's look at B. Go either direction; I'll go left this time. One, two, only two. It's even, so it means it's out.

What about A? A's right here. Okay, there's one line crossing, two line crossings, three line crossings going to the left. Three is odd, so it's in. That easy. Now you know something cool you can do, and you know what ray casting is. You've got something that can help amaze your friends.

Let's go have some more fun with the Graph Builder and see what we have in our Pictures from the Gallery 8. This year we've got formula-based graphs. We have tabular data that's been integrated with the graph. We have a flow parallel, a special type of parallel plot. We have forest plots that make use of those intervals.
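For anyone who would rather script the even-odd test than count crossings by eye, here is a small, generic JSL sketch of the ray-casting rule described above; it is not the table or script used in the demo.

// Even-odd ray casting: is the point (px, py) inside the polygon whose vertices
// are given, in order, by the coordinate vectors xs and ys?
Point In Polygon = Function( {px, py, xs, ys},
	{Default Local},
	n = N Rows( xs );
	crossings = 0;
	For( i = 1, i <= n, i++,
		j = If( i == n, 1, i + 1 );   // next vertex, wrapping back to the first
		// count the edge if it straddles the horizontal ray going right from the point
		If( (ys[i] > py) != (ys[j] > py),
			xint = xs[i] + (py - ys[i]) / (ys[j] - ys[i]) * (xs[j] - xs[i]);
			If( xint > px, crossings++ );
		);
	);
	Mod( crossings, 2 ) == 1;   // an odd number of crossings means the point is inside
);

// Example with a unit square: (0.5, 0.5) is inside, (1.5, 0.5) is outside
Show( Point In Polygon( 0.5, 0.5, [0, 1, 1, 0], [0, 0, 1, 1] ) );
Show( Point In Polygon( 1.5, 0.5, [0, 1, 1, 0], [0, 0, 1, 1] ) );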
We have Percent of Factors for doing comparisons, which is cool, and we can even do satellite drill-downs.

Let's dive right in. Now, I'm going to give you this journal. The journal has everything I'm showing you: it has pictures, it has instructions and helpful tips, even the step-by-step instructions on how to do this yourself in Graph Builder. Then I'm going to give you the raw data.

Now, with this first graph, one of the tips is that we need to include a formula, and all its elements, in the data table. What happened was that my father challenged me to help him buy a garden hose. He was doing some spraying, so he attached a little spray wand to the end of the hose, and he wanted good water flow. He knew there was a certain water pressure coming out of the tap, but he also knew he could buy shorter or longer hoses, and he could buy smaller or larger diameters of hose. He wanted to see which one worked the best.

To do this, all we had to do was find the formula and put that formula into JMP. It is right here under this water flow rate column. There are the constants that have been customized for hoses, and you can see how diameter, pressure, and length come into it. Then I have all of the components of that formula: different hose lengths, different diameters. It looks like I've got three different diameters, three different pressures, and four different lengths available.

Now when I go into the Graph Builder, I can just put the water flow on the Y. I want to see length on the X, maybe diameter on the overlay [inaudible 00:09:24], there we go. Then maybe I'll put water pressure on the Group X, because I know I can right-click there and show one level at a time.

Now I've got a smoother line, and that's not really telling the full story. Neither is just doing a line, because the line is just connecting the points I plotted; it's not really filling in the values between the points. But I have a formula here, so I should be able to use it. Yes, now I can go in and do a formula. You can right-click and change that line element to a formula, or you can click right up here on the highlighted icon. Now you can see I've got the formula-based line.

Now I could probably answer something about a 60-foot hose and where I'd expect the water flow to be at a certain pressure and a certain diameter. It worked out that the bigger diameter, 0.75, the three-quarter inch, was definitely the green line; it had the best water flow performance, and the shorter the hose, the better the water flow. That's because as water travels through a long hose, it rubs against the insides, creating friction, and that slows down your water flow.

Now, the other thing you might see me do from time to time is drag pictures in, and it's literally as easy as grabbing a picture and dragging that picture in there. You can right-click; there's an Image area under the right-click, and you can size and scale it.
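For reference, a saved Graph Builder script for this kind of view looks roughly like the sketch below. I wrote it from the description above rather than saving it from the demo, so the column names and element options are assumptions; the reliable way to get the real script is the red triangle's Save Script option.

// Rough sketch of the water-flow view: points plus the formula-based line
Graph Builder(
	Variables(
		X( :Length ),
		Y( :Water Flow ),
		Overlay( :Diameter ),
		Group X( :Water Pressure )
	),
	Elements(
		Points( X, Y, Legend( 1 ) ),
		Formula( X, Y, Legend( 2 ) )   // draws the curve from the column formula rather than connecting the points
	)
);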
I can fill the graph completely. I can go right back to it and add some transparency so the points pop on top, and now you get a better view.

That was our first graph that you probably didn't know you could do in Graph Builder. It's been there for a while, just hidden from many. Now, let's talk about something that came in with JMP 17: tabular data. I want to thank Joseph Reece for the inspiration and for the support in coming up with the best solution.

We're able to create not only reference lines but also tables below the graph, and they're actually integrated with the graph. This is something really special they added in JMP 17, this integrated tabular data. Let's bring up this data set. This data set is chemical production.

With this data set, I am going to pull up the Graph Builder. I am going to put the material vendor on the X, and I'm looking to see if there's a difference among those vendors in terms of the rate of reaction of my process when I use their products. I like box plots, so I'll switch to box plots. There's not a lot I can do to help my comparison here. Maybe I can go into this lower left-hand box plot element panel and turn on these confidence diamonds. Maybe I could color by the rate of reaction, so lower values show up in a different color than the higher ones; I'm not exactly sure it helps.

Now, Joseph recommended, "Hey, why don't you add back in the points?" I'm going to right-click in here and add points. But this time, instead of looking at all the points, let's look at the mean of the points, and let's look at the confidence interval. It lines up with the ends of my means diamond, which makes sense. Instead of an error bar, I can now do a band or a hatched band. That's cool, and it's giving me a better look. I get a little more confidence that Acme might be different from the vendor in green.

It would help, though, to have a reference line. All I have to do is go into the graph area, right-click, and add a caption box. These are all hidden under caption boxes. You're thinking, well, I've done caption boxes, and that's what I expected to happen: it just put the mean up there.

What we can do instead is change the location, and you can make it an axis reference line. There it goes, right there at the bottom. I'm going to go right back into this area and add a second caption box. I'll close up the other panels I don't want to see, and now I can add the mean, not on top of the other one, because my location can now be an axis table.

I can even add more summary statistics, like maybe the standard error. I can click on the number format and maybe do a fixed decimal with two places. Then I'll just say done.

Now I've got a really good view. All that's left is to clean up the legend. I'll go to legend settings; I don't need all these items, maybe just the one that shows the color gradient. I'll go to the position and drop it to the bottom.
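If you want this view in a script, the saved Graph Builder script uses the Caption Box element. The sketch below is pieced together from the steps above rather than saved from the demo, so the option strings ("Axis Reference Line", "Axis Table") are my best recollection of the JMP 17 names and should be treated as assumptions; if they differ in your version, Save Script from the red triangle will show the exact ones.

// Rough sketch: a box plot plus caption boxes used as a reference line and an axis table
Graph Builder(
	Variables( X( :Material Vendor ), Y( :Rate of Reaction ) ),
	Elements(
		Box Plot( X, Y, Legend( 1 ) ),
		Caption Box( X, Y, Legend( 2 ), Summary Statistic( "Mean" ),
			Location( "Axis Reference Line" ) ),   // option names assumed, see note above
		Caption Box( X, Y, Legend( 3 ), Summary Statistic( "Std Error" ),
			Location( "Axis Table" ) )
	)
);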
Then I'll right-click on it, go to the gradient, and switch it to run horizontally in this direction. That's what it will look like. I like that.

Now I've got my graph. Going back under the Graph Builder hotspot, I can go to Redo, go to Column Switcher, and now I can switch out the rate of reaction for a couple of the other continuous measures. Watch what happens when we go from rate of reaction to agitation. Everything recalculates: the axis table recalculates, your reference line recalculates, and your table of summary statistics at the bottom, lined up under the columns, recalculates. This is a wonderful thing you can add to your charts, put in dashboards, and share with each other, even on the cloud with things like JMP Live. This would be awesome. That is integrated tabular data.

Let's go to our next view, Flow Parallel Plots. I want to thank Jeb Campbell for the inspiration and also the solution for this. It might look like a regular parallel plot, but I want you to see the flows: inflows, let me get this right, inflows are coming in to a big bucket of budget, and then outflows go out, like the taxes here and the savings here, and it further gets split up.

How do I get these inflows and outflows into the same parallel plot? The first thing I'm going to do in my data is make sure that every branch of my data is laid out, starting from the back and going forward. I had a 12K outflow; that came from a 20K outflow 1. Outflow 2, savings, was 12K, and it was part of the 20K in outflow 1, along with Roth and savings, which together made up the 20K. That went into the total of 101K. On the other side, the inflow was part of the money I got from my job, which was 90K, but the amount for this branch is 12K. That way, the amounts will add up to the total, which will be 101K.

All of this is set up, and these are all categorical. If I go into Graph Builder, I'm just going to lay out all these categorical columns in the X. I'm going to size by the amount, color by the outflow, and now I'm going to select the parallel plot. You can right-click and select it, or you can select it from the icon. Now I have something that looks close to what I want, but this little bit in here doesn't look like it's resetting. I need this to reset, right? The inflows go into the big bucket, and then the outflows come out of the big bucket.

You do that by clicking on Combine Sets. When I do that, it gives me the right behavior, and I'll say done. Let's take a look at it. You can play with the ordering here to make it a little more pleasing to the eye. Now I can pick one of these outflows, like this auto car payment, and I can see the 8K comes from here. It was part of a bigger auto category, which was 11K, and I can see that the side-hustle money fed into that. I can see home as well; most of that was the home mortgage, and there was 2K here for the upkeep.
I can follow that one all the way back to see that it came from my job money, and that's where it came from out of this total budget. It's really cool. We can do input-output boxes, project budgets; there's a lot you could do with this.

All right, so that was a really cool view. Next, forest plots. As I mentioned before, intervals are a really great way to do a lot of comparisons. Here I'm looking at some mean comparisons among three of the four Cs of diamond buying: clarity, color, and cut. I have different levels of them, and I want to see if there's a difference in the price of the diamond. Say you're shopping for an engagement diamond. What I will do is pull up the data. This is some summarized data; again, I have color, clarity, and price, and you can see I've got different levels of those.

I have the number, the mean, the standard error, and the lower and upper 95% confidence interval around the mean. All of that has just been saved into a JMP table. I will go to the Graph Builder. I will put my three Cs, three of the four Cs, on the X, and I'm going to put my level right to the right of it, so now they're lined up. That looks pretty nice.

I will put the mean on the Y, and I will color by the level. Now, how do I get the intervals? Well, there's an interval box. If you only have one side, you can drop it up into a corner; you have to play around with it a little bit. But if you have both sides of the interval, you can grab them both and put them right there in the interval box. I say done, and it did a nice job. Now it's really easy to see what groups together and what might be statistically different, based on the 95% confidence interval, compared to another level.

I'm going to make it easier on my eyes. I'm going to right-click and go to the axis settings under that X. I might show a grid, which gives me a little outline, and I'm going to reverse the order. Now when I do this, I can see that, okay, the very fine clarity, that almost flawless clarity, and the very, very slight imperfections are different from the others, but different in a bad way: they're actually cheaper prices. That doesn't make sense at first.

For color, the clearest diamonds are category D, so K would be more cloudy. I can see there is a group which is different from some of the others, but some of these less clear diamonds are more expensive, so maybe color is not the right thing to look at. But I can see there is a logical order for cut. An ideal diamond should be cut better and be worth more money than the ones that are not cut very well, and you can see that; you can see which ones are different. This is a nice way of doing means comparisons and interval comparisons: do the intervals contain a certain reference amount, do they contain zero? There are a lot of ways to use this, but you can do forest plots now in Graph Builder.

We're cooking right along; let me get to the next one. This is Percent of Factors.
If you have ranked or scale data, this is a great way of doing comparisons on a zero to 100% scale. My family likes to visit all the coffee shops in Austin. Here's some old rating and sentiment data that came from Yelp. You've got ratings here, and sometimes there's sentiment in here. It's a lot of fun. These are all coffee houses that are still open in Austin, we go to some of these, and it's easy to set up now.

Go to Graph Builder and just put your levels, the coffee shops, on the graph; you don't have to put anything on the X. Put the ratings on the overlay. I'm going to ask for bars, and instead of side by side, I'll go stacked. Okay, am I done? No, it's showing a count going from zero to 250; it's not showing me zero to 100%. How do I do that? Really easily: change your summary statistic to Percent of Factors.

Change that, and it fills it in. Now you can see it, really nice and really interactive. I also had a low/high rating within my data, and I could see the high ratings were the ones where it looked like people gave it 4 or 5 stars. What's really nice is that now I can come over here and go to Order By. I can order by another column; it doesn't have to be in the graph, it just has to be in my data table. I can order by that high rating, say go, and now I can see that, wow, this one here, if I'm saying the name correctly, was the highest rated. Flightpath Coffee is one that my family really likes; this one right here got a lot of positive ratings.

I can even play around with filtering by the vibes. I put in a vibe sentiment: the review mentioned the word vibe and it was positive. I selected it, and you can see that Flightpath came out pretty well there too. Good music, good place to study, good location, all just the right vibes, the right crowd, a nice place to hang out.

We've got time for our last picture from the gallery. We're going to look at some satellite mapping. Really, all the mapping changed in JMP 17. You can drill down, I think, in even better detail now, because we switched to the Mapbox-type maps.

Remember, to do a map in JMP Graph Builder, you just need positional data, here latitude and longitude. I'm going to look at some of these places I've stayed at, different places in California, and I'm going to focus on this Delta King. You right-click in the graph, go to Background Map, pick Street Map Service, and here are all your options.
Instead of using this plus or minus up here, I find I like using the magnifier tool in JMP. If I click on this one, I can see this Delta King. Oh, my goodness, what is that? That's not a hotel. Now I can go and switch my background map away from the streets and give it a satellite view.

Now my satellite view shows me, wow, that's a ship. The Delta King is one of the old paddle-wheeled steamboats that used to run between Sacramento and San Francisco, and it's still there. It is now a hotel you can stay at. Thanks to my friend and coworker Bonnie Rigo, who gave me a chance to experience staying at the Delta King once. We had a good stay there in a very unique hotel.

Okay, so there are other really cool views. I'll let you explore those, including the Luxor in Las Vegas and the Fontainebleau in Miami Beach. I've got some good ones in here, so you can go play with this data.

But what I'd like to do is wrap up here. I did include a bonus picture from the gallery. This is a combination Paynter chart, a combination of line charts, Pareto charts, and bar charts that can be ordered to show increasing or decreasing performance of defect reduction. This was used at Ford in the 8D program and is very popular with folks doing defect reduction. If you want to learn how to do the Paynter chart, you can do it in JMP 17 in the Pareto platform, but in any version of JMP you can get there just from the Graph Builder.

All right, where to learn more? There are lots of other Pictures from the Gallery journals; over the years we've gone and done more views. Go look at all of the galleries. We're on our eighth, so there are also one through seven to look through. You can also take a look at the blogs on the JMP Community; a lot of them have been done on these graphs or on other really cool views.

There are other presentations, tutorials, and training that I recommend, and you will have these in the journal. Xan Gregg, the father of the Graph Builder, is always good to learn from, as well as our training resources in the new Learn JMP area in the JMP Community, where we have formal training as well as Mastering JMP training on things like Graph Builder and dashboards.

If you want to suggest views, please do go to the community and put them in the JMP wish list. We get some of our ideas from you saying, "Wouldn't it be great if JMP Graph Builder could do this and look like this? That would be so helpful." A lot of these will make it into releases of JMP.

All right, so we are done with our presentation. I hope you enjoyed Pictures from the Gallery 8. I want you to go out and enjoy the rest of the presentations, but for sure, go have fun graphing and exploring your data in JMP Graph Builder. Thank you.
Often as we are trying to gain insights from our data, understanding that two variables are related is not enough. We need to dig deeper and ask questions like: under what circumstances are they related? For whom are they related, why are they related, and how? Moderation (i.e., interactions), mediation, and moderated mediation models allow us to answer these types of questions. These models are popular and important but cumbersome to fit. Furthermore, visualizations essential for understanding interactions are difficult to create from scratch. This presentation will describe the Moderation and Mediation Add-In for JMP Pro, which enables easy specification, fitting, and visual probing of interactions in three popular models: moderation, first-stage moderated mediation, and second-stage moderated mediation. With minimal user input, the add-in automatically specifies and estimates the appropriate model. Then, the results are processed and packaged into ready-to-publish output. An interactive Johnson-Neyman plot, as well as a simple slopes plot, is created. We will provide an in-depth demonstration of these features using an example from psychology. Academics and data analysts across the social, behavioral, educational, and life sciences will benefit from this novel functionality.

Blog post describing the Moderation and Mediation Add-In: https://community.jmp.com/t5/JMPer-Cable/Who-what-why-and-how-Tools-for-modeling-and-visualizing/ba-p/527173

All right. Hi, everybody. My name is Haley Yaremych. I worked at JMP this past summer as a statistical testing intern, and I'll be returning this coming summer in the same role. This past summer, I built an add-in that helps users fit and visualize interactions, and I'm excited to talk to you all about that today.

Okay, to set up the example that I'm going to be using throughout the talk, let's take a look at this clip from a website called ScienceofPeople.com. The clip reads: Do you know the impact of your work? When we don't have our why at the front of our mind, it can be hard to feel motivated and excited about what we're doing. When we get busy or overwhelmed, the why just seems to slip away. This clip tells us that when we feel that our work has meaning, this tends to lead to greater job satisfaction. With a structural equation modeling path diagram, we would display that cause-and-effect relationship like this.

But if we're too overwhelmed at work, this relationship might weaken. The meaningfulness of our work should be related to job satisfaction, but only if overwhelm is low. Conceptually, we could represent that like this. In the social sciences, this is what we call moderation, because overwhelm is going to moderate that relationship between meaningfulness and job satisfaction, but more widely this is known as an interaction. When we find a significant interaction, we need to visualize it in order to understand what's going on, and to do that, we often need to look at simple slopes.

A simple slope describes the relationship between the predictor and the outcome at a particular value of the moderator.
In this plot, we're taking a look at the relationship between meaningfulness and job satisfaction at three different values of overwhelm. The red line is that relationship when overwhelm is low, the blue line is when overwhelm is at its mean, and the purple line is when overwhelm is high. Just as we would expect, the relationship between meaningfulness and job satisfaction is strongest when overwhelm is low, and when overwhelm is high, that relationship weakens.

Being able to visualize simple slopes is a really essential part of fitting and understanding models that involve interactions. But in order to publish these results, we also often need details about the values of those simple slopes and their statistical significance at different values of the moderator, just like I've shown here for high and low values of overwhelm.

We can also take things a step further, beyond simple moderation. This clip also mentions that meaningfulness might result in greater job satisfaction because it tends to lead to greater motivation at work. There might be a cause-and-effect pathway here, and this is what we would call mediation. But again, overwhelm needs to be low in order for these benefits to play out. We might expect that overwhelm needs to be low in order for this first effect to be present; we would call this first-stage moderated mediation. Or we might think that low overwhelm is more important for the second effect to be present; we would call this second-stage moderated mediation.

In these moderated mediation models, if we find a significant interaction, we still need to probe it and assess significance at different values of the moderator. But this time, we're interested in plotting and testing this entire effect of meaningfulness on job satisfaction through motivation. We call this the indirect effect. We're going to see an example of this in our demo in just a few minutes.

These types of questions come up all the time, not only in social science research, but also in other areas. Given their popularity, it's no surprise that we've had a lot of requests from JMP users to incorporate quick and easy ways of fitting and visualizing these types of models. A lot of these user requests mention moderation, mediation, and simple slopes. The Johnson-Neyman plot is an extension of the simple slopes plot that I showed earlier, and I'll get to that in a few minutes. But basically, these are all different jargony ways of asking for the same functionality.

You'll notice that a lot of these requests mention the PROCESS macro. The PROCESS macro is a very widely used tool for fitting these types of models. It provides easy model fitting and a lot of numeric output about these models, but right now it doesn't provide visualizations. The burden would be on the user to take this numeric output and create a graph with it elsewhere, and that can be very cumbersome and error prone.
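For readers who want the algebra behind a simple slope, the standard form for a single moderator (my summary, not a slide from the talk) is

$$Y = b_0 + b_1 X + b_2 W + b_3 XW + \varepsilon, \qquad \left.\frac{\partial \hat{Y}}{\partial X}\right|_{W=w} = b_1 + b_3 w,$$

so the simple slope of X on Y at a chosen moderator value w is simply b_1 + b_3 w, which is what the red, blue, and purple lines show at low, mean, and high values of w.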
This is a really important drawback, because these graphs are essential for understanding interactions. Just to give you a sense of how difficult it is for the user to create these graphs on their own, these are the formulas that underlie the two plots you're about to see in the demo. Imagine having to code these up yourself; it would be really tough. With this add-in, we wanted to draw upon the strengths of the PROCESS macro that make it so popular, namely easy and automated fitting of these models, but we also added features that cannot be found elsewhere and that really capitalize on the unique strengths of JMP: engaging visualizations that would otherwise be really tough for users to make from scratch.

Here's a quick summary of the features of our add-in, as well as what users are currently up against if they want to fit these models with the structural equation modeling platform in JMP but without the add-in. We've automated all the details of model fitting, whereas without the add-in there's a lot of data preprocessing that's often required, and it can be difficult to specify the correct structural equation model. We also provide a lot of numeric output, but we additionally sift through that output and do the further calculations that are needed to really distill it. Then, as I mentioned, all visualizations are now automated, so users can avoid those complex formulas.

Now I'm going to jump over to a demo of the second-stage moderated mediation model with the add-in. Here's the model that we're going to fit. Within JMP, I'm going to open up our... oops, I moved my bar here. Okay, I'm going to open up our Moderation and Mediation add-in, and I'm going to pick the second-stage moderated mediation model.

Within the user input window, the first thing we see is these figures. Like I mentioned, a difficult aspect of fitting these types of models can be understanding how to make the jump from what we think is going on conceptually to the statistical model that needs to be fit. The goal of these figures is to take that burden away from the user, and the only input we need from the user is to select a variable for each role. I'm going to do that here.

Then, optionally, any number of covariates can be added. By default, any variables involved in an interaction term are going to be mean centered, but this can be turned off, or they can be centered around a user-specified value. Then, those plots that I mentioned are only going to be shown in the output if the interaction is significant at alpha 0.05, but this can also be turned off.

When I click OK and pull up our output, the first thing we see is the output from the structural equation modeling platform. But again, this can be a lot to sift through.
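As a point of reference for the demo (my notation, not the presenter's slide), the second-stage moderated mediation model and its conditional indirect effect are usually written as

$$M = a_0 + a_1 X + e_M, \qquad Y = c_0 + c' X + b_1 M + b_2 W + b_3 MW + e_Y,$$
$$\omega(w) = a_1 (b_1 + b_3 w),$$

where \omega(w) is the indirect effect of X on Y through M at moderator value w. In the demo, X is meaningfulness, M is motivation, Y is job satisfaction, and W is overwhelm.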
The goal of this moderation detail section is to pull out all the most important parts of the SEM output, to do any necessary computations with that output, and then to package everything into sentences that can be easily understood and copied and pasted into a publication or report. You'll see here that we get some details about the conditional indirect effects. Again, these are very similar to simple slopes, but now we're calling them indirect because the effect of meaningfulness on job satisfaction is traveling through motivation.

The next section here is going to be our Johnson-Neyman plot. This plot really is the state-of-the-art method for probing an interaction, because it's going to provide a lot more detail than the simple slopes plot that I showed earlier. Here on the X axis we have the moderator, so overwhelm is on the X axis, and the Y axis is going to be the effect of meaningfulness on job satisfaction through motivation. That indirect effect is what's changing as a function of overwhelm, and we're looking at that effect at each possible value of overwhelm.

We can see that the effect is weakening as overwhelm increases. But this plot can sometimes be kind of hard for people to wrap their heads around, mainly because we have an effect on the Y axis. As in this example, although most of these effects are positive, they're just becoming less positive as overwhelm increases, and this can sometimes be a little confusing. To make things even clearer, we added graphlets to this plot.

When I hover over this line, I'm going to see a graphlet that shows me the effect of meaningfulness on job satisfaction at this particular value of overwhelm. We can see that when overwhelm is low, that effect is strong and positive. Then, as overwhelm increases, the effect weakens, until eventually, when overwhelm is really high, the effect is basically flat. A really nice advantage of JMP is that we were able to add these graphlets and really aid user understanding here.

Another nice aspect of this Johnson-Neyman approach is that we can calculate these significance boundaries. This boundary is the exact value of overwhelm where the effect goes from being statistically significant, which is in blue, to non-significant, which is in red.

Typically, there are going to be two significance boundaries. You can see up here that they were both calculated, but only one appears in the plot. This is because the plot only shows values of the moderator that were observed in the data set; we did this for extrapolation control. Here we can say that as long as overwhelm is less than about 1.25, there's going to be a significant effect of meaningfulness on job satisfaction through motivation.

Our final section of output is going to be a conditional indirect effects plot. This is a lot like the simple slopes plot that I showed earlier. Basically, we're taking a few of those graphlets and putting them into a static plot.
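For completeness, here is the usual algebra behind those significance boundaries, written for a generic conditional effect \theta(w) = b_1 + b_3 w; the add-in applies the analogous calculation to the conditional indirect effect, and, as the presenter notes shortly, the bands are currently computed analytically rather than by bootstrapping. A boundary is a value of w satisfying

$$\frac{b_1 + b_3 w}{\sqrt{\operatorname{Var}(b_1) + 2w\,\operatorname{Cov}(b_1, b_3) + w^2\,\operatorname{Var}(b_3)}} = \pm\, t_{\text{crit}},$$

which rearranges to a quadratic in w; its (at most two) real roots are the Johnson-Neyman boundaries, the moderator values where the confidence band for the conditional effect crosses zero.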
Same idea here. We end up with the same takeaways, but this specific type of graph is often needed for publication. There are some features that aren't included in the add-in right now that we would love to add in the future. The first is bootstrapping. Right now these confidence bands are calculated mathematically, but finding them with bootstrapping is sometimes preferable, so we would love to be able to add that in the future. We would also love to add more types of models. The PROCESS macro that I mentioned earlier offers dozens and dozens of model options. Here we only have three, but we did choose the three most popular types of these models, and we'd love to be able to add more in the future. All right. With that, I'm going to go ahead and wrap up. Thank you so much for your attention. You can feel free to email me with questions at this address. I've also included a link to the JMP Community blog post that provides a lot more detail than what I had time to get into today. It goes through basic moderation as the running example, which I think will be really applicable to anybody in any field who is interested in testing and probing interactions with these tools. Again, thank you for your attention.
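As a hint at what the bootstrapping mentioned above might look like, here is a minimal Python sketch of a percentile bootstrap for a conditional indirect effect under the same kind of first-stage moderated mediation model (X to M moderated by W, then M to Y). The array names x, w, m, y and the model layout are placeholder assumptions, not the add-in's implementation.

```python
import numpy as np

def boot_conditional_indirect(x, w, m, y, w_value, n_boot=5000, seed=1):
    """Percentile bootstrap CI for the indirect effect of x on y through m,
    with the x -> m path moderated by w, evaluated at w = w_value.
    Inputs are 1-D NumPy arrays of equal length."""
    rng = np.random.default_rng(seed)
    n = len(y)
    est = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)                 # resample rows with replacement
        xb, wb, mb, yb = x[idx], w[idx], m[idx], y[idx]
        # a-path regression: m ~ 1 + x + w + x*w
        Xa = np.column_stack([np.ones(n), xb, wb, xb * wb])
        a = np.linalg.lstsq(Xa, mb, rcond=None)[0]
        # b-path regression: y ~ 1 + m + x + w (x and w kept as covariates)
        Xb = np.column_stack([np.ones(n), mb, xb, wb])
        bcoef = np.linalg.lstsq(Xb, yb, rcond=None)[0]
        est[i] = (a[1] + a[3] * w_value) * bcoef[1]  # (a1 + a3*w) * b
    return np.percentile(est, [2.5, 97.5])
```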
JMP has a wealth of design of experiments (DOE) options from which to choose. While this array is incredibly powerful, it also has the potential to be a bit intimidating to those who are new to this area. What category of design should I choose from the many possibilities? How do I know which one is best for my experimental objectives? This talk provides some ideas for how to strategically tackle Step 0 of the process of constructing the right design by considering the following questions: What are the goals of the experiment? What do we already know about the factors, responses, and their relationship? What are the constraints under which we need to operate? Once these questions are answered, we can match our priorities with one of the many excellent choices available in the JMP DOE platform.

Hi. I'm here to talk today about the crucial Step 0 of design of experiments. Really, the idea is to take full advantage of the wealth of different tools that are under the DOE platform in JMP. I'll walk through what we should be thinking about in those early stages of an experiment. If you look at the DOE platform listing, what you'll see is that there are a lot of different choices. Within each choice, there are many more choices, and within some of those, there are nested possibilities. If you're an expert in design of experiments, this wealth of possibilities really feels like such a wonderful set of tools. I love all of the options that are available in JMP that allow me to create the design that I really want for a particular experiment. But if you're just getting started, then I think this set of possibilities can feel a little bit intimidating and sometimes a bit overwhelming. It may be a little bit like going to a new kind of restaurant that you've never been to before. Someone who's a seasoned visitor to those kinds of restaurants loves all the possibilities and the wealth of options on a big menu. But if you're there for the first time, it would be nice if someone guided you to the right set of choices so that you could make a good decision for that first visit and have it be successful. Here's what I'm planning on talking about today. First, I think the key to a good experimental outcome is to really have a clear sense of what the goal of the experiment is. I'll talk through some common goals for experiments that really help us hone in on what we're trying to accomplish and what will indicate success for that experiment. Then I'll do a quick walk-through of some of the more common design of experiments choices in JMP, and then I'll return to how we interact with the dialog boxes we get once we've chosen a design: what factors to choose, the responses, and the relationship between the inputs and the outputs. That's where we're headed through all of this, and I will say that the first and third steps really need a tremendous amount of subject matter expertise. If you're going to be successful designing an experiment, you really need to know as much as possible about the framework under which you're doing that design.
We want to incorporate subject matter expertise wherever possible to make sure that we're setting up the experiment to the best of our ability. What are we trying to do? I've listed here six common experimental objectives. I think that gives you a checklist, if you like, of things that you might be thinking of accomplishing with your experiment. We might start with a pilot study, where we're just interested in making sure that we're going to get data of sufficient quality for the experiment and for answering the questions we want answered. We might be interested in exploration or screening: we have a long list of factors, and we want to figure out which ones seem to make a difference for our responses of interest and which ones don't seem particularly important. We also might want to do some modeling, actually formalizing the relationship that we're seeing between inputs and responses and capturing it in a functional form. Sometimes we don't get the level of precision that we need, and so we need to do model refinement, which might be a second experiment. Then once we have a model, we want to use it to actually optimize: how do we get our system to perform to the best of its capability for our needs? Then lastly, there's a confirmation experiment, where we make the transition from the controlled design of experiments environment in which we often do our preliminary data collection to production, and make sure we can translate what we've seen in that first experiment into a production setting. You can see from this progression that we may actually have a series of small experiments that we want to connect. We may start off with a pilot study to get the data quality right, then figure out which factors are important, then model those, then use that model to optimize, and then lastly translate those results into the final implementation in production. We can think of this sequentially, or for an individual experiment, just tackle one of these objectives. Now that we have some framework for what the goals of the experiment are and how to think about them, we'll transition to looking at some of the common choices in JMP and how they connect with different goals. I'll open up the DOE tab in JMP, and you can see that we've got the list of possibilities here, with the nested options tucked underneath some of the main menu items. The talk is only a half hour, so I won't be able to cover all of the tabs, but I've given a brief description of some of the tabs that I won't have time to talk about. Design Diagnostics is all about having a design, or maybe several designs, and comparing and understanding their performance. Sample Size Explorer is all about how big the experiment should be, with some tools to evaluate that. Consumer Studies and Reliability Designs are really rather specialized; I'm setting those aside for you to do a little research on your own.
In Consumer Studies, we're usually asking questions of consumers about what their priorities are and what features they like. That tends to be a comparison between two options and how they value those choices. Reliability is all about how long our product will last. That's a little different from the things I'll talk about in the rest of the talk. I'll start off with some of the Classical Designs, the general designs that have been developed over the years, and then I'll finish with some of the JMP-specific tools that are much more flexible and adaptable to a broader range of situations. I'll start with that bottom portion of the tab. Here we are in JMP in the DOE tab, and I'm going to start with Classical. You'll see that I'm tackling this in a little different order than the list is presented by JMP. I think those are presented by JMP in order of popularity, and I'm choosing to tackle them more from the principles of how they were developed. In Classical Designs, a Full Factorial design looks at all combinations of all factors at all levels. That works nicely if we have a smallish number of factors, but it can get a little out of control if we have a large number of factors; it's exploring the entire set of possibilities very extensively. The next one I'll talk about is a Two-Level Screening design. Essentially, it chooses a subset of the two-level factorial possibilities, a strategic subset that allows us to explore the space while keeping the design size more manageable. You'll notice that those first two possibilities I've shown at two levels, and that's typical for screening designs; usually, we just want a simple picture of what's happening between the inputs and the responses. When we want to start modeling, a Response Surface Design typically allows for exploring curvature. When we're modeling, three levels, or sometimes more, can be a good way to understand curvature and also understand interactions between the factors and how they impact the response. All right, that's three of the items under the Classical tab. The next one is a Mixture Design. In all the other possibilities, we can typically vary the individual factors separately from each other. But in a Mixture Design, where we're talking about the composition or the proportions of the ingredients, they're interdependent. If I increase the amount of one ingredient, it probably reduces the proportion of the other ingredients in that overall mixture. It's a bit of a specialized one for when we're putting ingredients together into an overall mixture. Taguchi Arrays I've listed here as a kind of optimization, and the optimization that they're interested in is making our process robust. Typically when we're in a production environment, we might have noise factors. These are factors that we can control in our experiment, but when we get to production, we're not able to control them. Then we have a set of factors that we can control both in the experiment and in production.
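To make the contrast between a full factorial and a two-level screening design concrete, here is a short textbook-style Python sketch that enumerates a full 2^3 design and then builds a half-fraction of a 2^4 design from the generator D = ABC. This only illustrates the underlying idea; it is not how JMP constructs its designs.

```python
from itertools import product
import numpy as np

# Full factorial: every combination of every factor at every level (here 2^3 = 8 runs).
full_2x2x2 = np.array(list(product([-1, 1], repeat=3)))

# Two-level screening via a half-fraction: keep the full 2^3 design for A, B, C
# and set the fourth factor from the generator D = A*B*C, giving 8 of the 16 runs of 2^4.
half_fraction_2_4 = np.column_stack([full_2x2x2, full_2x2x2.prod(axis=1)])

print(full_2x2x2.shape)         # (8, 3) -- all combinations of three two-level factors
print(half_fraction_2_4.shape)  # (8, 4) -- a strategic subset for four factors
```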
The goal of Taguchi Arrays is to look for a combination of the controllable factors that gets us nice, stable, predictable performance across the range of the noise factors. You can see C1 here has a pretty horizontal line, which means it doesn't matter which level of the noise factor we're at; we'll get a pretty consistent response. Those are the classical options. The next of the items on this JMP design tab that I'll talk about are Definitive Screening Designs. These are specialized designs that were developed at JMP, and they are a blend of an exploration or screening design, with a focus on a lot of two-level factor settings, and modeling. You can see with the blue dots that we have some third levels, a middle value for the factors, that allows us to get some curvature estimated as well. It's a nice compact design that's primarily about exploration and screening, but it does give us an all-in-one chance to do some modeling as well. That's very popular in a lot of different design scenarios. The next tab is Special Purpose, and you can see there's quite a long list of possibilities there; I'll hit some of the more popular ones that show up in a lot of specialized situations. A Covering Array is often used when we're trying to test software. A lot of times, what causes problems in software is the combinations of factors. This is a pretty small design that's typical for Covering Arrays: 13 runs, and we're trying to understand things about 10 different factors. What's nice about these Covering Arrays is that they give us a way to see all possibilities of, in this case, three different factors. If I take two levels of each factor, a zero and a one, there are eight different combinations for how I can combine those three factors: all zeros, all ones, and then mixtures of zeros and ones. I've highlighted those with eight different underlines. What's really nice about these Covering Arrays is that whichever three factors I choose, I will be able to find all eight of those combinations. There are 10 choose 3 different combinations of three factors that I might be interested in, and all of them have all of those possibilities represented. That's a very small design that allows us not so much estimation, but a check for problems that we might encounter, particularly in software. Next, a very important category: Space Filling Designs. Compared to the other options that I've talked about, which are model-based, this one just says, maybe I don't know what to expect in my input space. Let me give even coverage throughout the space that I've declared and just see what happens. You can see that I have many more levels of each of the factors. There are a lot of specialized choices in here, but they all have this same feel of nice, even coverage throughout the input space. These are often used in computer experiments, or in physical experiments where we're just not sure what the response will look like.
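The coverage property described above, that every set of three factors shows all eight of their level combinations, is easy to verify programmatically. Below is a minimal Python sketch of such a check for a generic two-level array; it does not reproduce JMP's 13-run, 10-factor example, only the idea of checking it.

```python
from itertools import combinations, product
import numpy as np

def covers_strength_t(design, t=3):
    """True if every choice of t columns of a two-level (0/1) design
    contains all 2^t combinations of levels at least once."""
    design = np.asarray(design)
    needed = set(product([0, 1], repeat=t))
    for cols in combinations(range(design.shape[1]), t):
        seen = {tuple(row) for row in design[:, cols]}
        if not needed <= seen:
            return False
    return True
```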
I'll talk a little more about that when we get to the decision-making portion in Step 3 of the talk. Next is the MSA Design, or Measurement System Analysis, and this is typically associated with the pilot study. Before I dive in and really start to model things or do some screening, it's helpful to understand some basics about the process and the quality of the data that I'm getting. Here, I can divide the variability that I'm seeing in the responses and attribute it to the operator, the measurement device or gage, and the parts themselves, and understand the breakdown of what's contributing to what I'm seeing. That's very helpful before I launch into a more detailed study. Finally, Group Orthogonal Supersaturated Designs are really compact designs. In this example, we have six runs, and we're trying to understand what's happening with seven different factors. That may seem a little magical, but it's a very aggressive screening tool that allows us to understand what's happening with a lot of factors in a very small experiment. It's important with these designs not to have a lot of active factors. If all seven factors are doing something and I only have six runs, I'll end up quite confused at the end. But if I think two or three of them may be active, this may be a very efficient way to explore what's going on without spending too many resources. Those are the standard ones that I've talked through a little bit. Now I'm going to finish with the wonderful tools in JMP that are more general and more flexible for different scenarios. Custom Design, I think, is just an amazing tool for its flexibility. What's really nice in Custom Design is that I have a wealth of different possibilities for the kinds of factors I can include: continuous factors, discrete numeric ones, and also categorical factors. I have a lot of different choices, so I can put together the pieces, and if I'm not sure what the design should look like, that bottom portion of the list gives JMP some control to help guide me to a good choice. On the next page, I have the option of whether I'm just interested in main effects, whether I want to add some two-factor interactions, and whether I want to build a response surface model, so more the modeling goal of the experiment. This is an easy way to build a design. I have flexibility to specify whatever design size I feel would be helpful and is within my budget, and the expertise of the JMP design team is going to guide me to a sensible choice. This is a great way to go if you're not sure how to proceed, while still making some key decisions about what the goal of the experiment should look like. Next, the Augment tab. If you think back to the experimental objectives I talked about, you see that there's this connection between the stages. Maybe I've done some exploring or screening, and then I'd like to transition to modeling.
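To connect the model choice (main effects, two-factor interactions, response surface) to what an optimal-design algorithm actually evaluates, here is a small Python sketch that builds a model matrix for main effects plus two-factor interactions and scores a candidate design with the D-criterion, log det(X'X). JMP's Custom Design has its own search machinery (coordinate exchange and related methods), so this only illustrates the scoring idea; the function names are my own.

```python
from itertools import combinations
import numpy as np

def model_matrix_me_2fi(design):
    """Intercept + main effects + all two-factor interactions,
    for a design given in coded (-1..1) units."""
    design = np.asarray(design, dtype=float)
    cols = [np.ones(len(design))] + [design[:, j] for j in range(design.shape[1])]
    cols += [design[:, i] * design[:, j]
             for i, j in combinations(range(design.shape[1]), 2)]
    return np.column_stack(cols)

def d_criterion(design):
    """log det(X'X); larger values mean more precise coefficient estimates."""
    X = model_matrix_me_2fi(design)
    sign, logdet = np.linalg.slogdet(X.T @ X)
    return logdet if sign > 0 else -np.inf
```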
Well, this allows me to take an experiment that I've already run and collected data for, connect it to the Augment Design platform, assign the roles of what's a response and what's a factor, and then add in some additional runs. There are some specialized options here, but if I choose the Augment portion, that allows me to specify a new set of factors, perhaps a subset of what I have or an additional factor, and then also what model I would now like to design for. This is a flexible tool for connecting several sets of data together. Lastly, Easy DOE is a great way to get started for your very first experiment. It allows you to build sequentially, and it guides you through the seven different steps of the entire experiment. It allows us to define and design, so that's figuring out what the factors are, what the levels are, and their general nature; then we can select what kind of model makes the most sense for what we're trying to accomplish; then progress all the way to actually running the experiment, entering the data, doing the analysis, and then generating results. This is a wonderful progression that walks you all the way from what you're trying to do to having some final results to look at. What I will say is that this is designed for a model-based approach. You'll see that all of these look like they're going to choose a polynomial form of the model, and that needs to make sense as a starting point. But if it does make sense, and it does in a lot of situations, then this is a wonderful option. Just to finish things up here: now that I have a goal and I know a particular choice that I want to use in JMP, what are some of the other key questions before I actually generate that design? A whole category is about the factors. We need to use our subject matter expertise to figure out which factors we should be looking at. If we have too long a laundry list of factors, then the experiment necessarily needs to be quite large in order to understand all of them, and that's going to have an impact on how expensive our experiment will be. If we have too few factors, then we run the risk of missing something important. What type are they going to be? We need to think about getting the right subset. As I showed you in Custom Design, we have quite a wide variety of roles for the factors that we're looking at, and that's another set of choices. How much can we manipulate the factors? Are they naturally categorical, or are they continuous? Then we need to think about the ranges or the values for each of those. Let's go to DOE and Custom Design. I'll just start off with three different continuous factors. What you can see is that I can give a name to each of the factors, but I also get to declare the range that I want to experiment in for each of those factors. As you can imagine, this has a critical role in the space that I'm actually going to explore.
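Behind the ranges you declare for continuous factors, designs are typically generated in coded units and then mapped back to natural units. A minimal Python sketch of that mapping is below; the temperature range in the comment is a hypothetical example, not anything from the talk.

```python
import numpy as np

def to_coded(x, low, high):
    """Map natural units on [low, high] to coded units on [-1, 1]."""
    return 2 * (np.asarray(x, dtype=float) - low) / (high - low) - 1

def to_natural(z, low, high):
    """Map coded units on [-1, 1] back to natural units on [low, high]."""
    return low + (np.asarray(z, dtype=float) + 1) * (high - low) / 2

# Hypothetical range for a temperature factor declared as 150-200 degrees C:
# to_coded([150, 175, 200], 150, 200)  ->  [-1, 0, 1]
```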
I need to hone in on what's possible and what I'm interested in to get those ranges right. If I make the range too big, then I may actually have a lot going on across the range of the input, and I may not be able to fully capture what's going on. If I make the range too small, then I may miss the target location, and I may get a distorted view of the importance of that factor. Here, this input actually has a lot going on for that response, but if I sample in a very narrow range, it looks like it's not doing anything. Lastly, if I'm in the wrong location, I may miss some features and not be able to optimize the process for what I'm doing. Again, the choice of which factors and which ranges relies a lot on having some fundamental understanding of what we're trying to do and where we need to explore. The next piece to talk about is the relationship between inputs and responses. I will say that one of the common mistakes I often see is that we run an experiment, and then after the fact, people realize, oh, we should have collected this. In textbooks, a lot of the time it looks like there's a single response we're interested in, and we run the experiment to collect just that response. In practice, I think most experiments have multiple responses, and so this is a key decision: to make sure, before we collect that first data point, that we include the right set of responses so that we can answer all of the questions from that one experiment. Then we need to think about what we know about the relationship. Is it likely to be smooth? Is it going to be continuous in the range that we've selected? How complicated are we expecting it to be? All of these have an impact on the design that we're going to have. A couple of common mistakes about the relationship are, one, being a little too confident, so we assume that we know too much about what's going to happen and don't build in some protection against surprises; and two, when we have multiple responses, not designing for the most complicated relationship. If for one of them we're interested in main effects and for the other we think there might be curvature, we need to build the design so that it can estimate the curvature, because that's the more complicated relationship. A first key decision that I think is a little hidden in JMP is that we have to decide between model-based, which is usually sensible if we're confident that our responses will be smooth and continuous and that we're not investigating too big a region, and space filling. Space filling can be a good safety net if we're not sure what to expect, if we're exploring a large region, or if we want to protect against surprises. I'm pointing here, on the last slide, to a paper with more details about that, which I wrote with a colleague, Dr. Lu Lu at the University of South Florida, where we talk about the implications of that first fork in the road: how do we choose between model-based and space filling, and what are the repercussions?
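For the space-filling side of that fork in the road, here is a minimal Python sketch of a basic Latin hypercube, just to show the "many levels, even coverage" idea. JMP's Space Filling platform offers several more sophisticated methods (sphere packing, fast flexible filling, and others), so treat this only as the basic concept.

```python
import numpy as np

def latin_hypercube(n_runs, n_factors, seed=1):
    """Basic Latin hypercube sample on [0, 1]^n_factors:
    each factor is stratified into n_runs equal bins, with one point per bin."""
    rng = np.random.default_rng(seed)
    sample = np.empty((n_runs, n_factors))
    for j in range(n_factors):
        perm = rng.permutation(n_runs)                       # shuffle the bins
        sample[:, j] = (perm + rng.random(n_runs)) / n_runs  # jitter within each bin
    return sample
```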
Then lastly, we need to think a little about constraints. Our input region, if we've declared ranges for the different inputs, naturally seems like a square or a rectangle. But within that region, there may be portions where we can't get a response, or we just don't care what the responses look like. Imagine I'm doing an experiment about baking, and I'm varying the time the cookies are in the oven and the temperature of the oven. I might know that the coolest temperature for the shortest amount of time won't produce a baked cookie; it'll still be raw. Or the hottest temperature for the longest time will overcook the cookies. I want to chop off regions of that space that aren't of interest or won't give me a reasonable result. In JMP, there are easy ways to specify constraints to make the shape of that region match what you want. The last thing is all about budget: how big should my experiment be? That's a function of the time I have available and the cost of the experiment. In JMP, if we jump to here, maybe I specify a response surface model, you'll see that there's a new feature called Design Explorer, which, when I activate it, allows me with a single click of a button to generate multiple designs. I can optimize for good estimation, so D- or A-Optimality, or for good prediction of the responses with I-Optimality. I can vary the size of the experiment, and the center points and replicates. If I click Generate All Designs, it will generate a dozen or so designs, which I can then compare and consider to figure out which one makes the most sense. I think understanding the budget, and thinking of it as a constraint, is an important consideration. To wrap things up, just a few helpful resources. The first is a JMP web page that talks in a little more detail about the different kinds of designs; it fills in a lot of the details I wasn't able to cover today about those individual choices on the DOE tab. The model-based versus space-filling entry is the paper I referenced earlier, where we discuss the implications of choosing a model-based design or doing space filling, which is a little more general and a little more protective if we are expecting some surprises. Then the last two items are two white papers that I wrote. The first one talks about how you can use Design Explorer to consider different design sizes and different optimality criteria, and then choose between the candidates by looking at the Compare Designs option in JMP. Lastly, everything I've talked about here depends on subject matter expertise, and the second white paper, on the why and how of asking good questions, gives some strategies for how to interact with our subject matter experts so we can target those conversations and make them as productive as possible. I hope this has been helpful and will help you have a successful first experiment using JMP software. Thanks.
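To make the baking constraint example above concrete, here is a minimal Python sketch that chops the infeasible corners off a candidate region. The specific limits and grid are hypothetical; in JMP you would instead enter linear constraints or disallowed combinations directly in the design dialog.

```python
import numpy as np

# Hypothetical candidate grid: baking time (minutes) and oven temperature (deg C).
times = np.linspace(8, 14, 25)
temps = np.linspace(160, 220, 25)
tt, pp = np.meshgrid(times, temps)
candidates = np.column_stack([tt.ravel(), pp.ravel()])

# Hypothetical constraints: drop the "raw" corner (short time AND cool oven)
# and the "overcooked" corner (long time AND hot oven).
raw = (candidates[:, 0] < 9.5) & (candidates[:, 1] < 175)
burnt = (candidates[:, 0] > 12.5) & (candidates[:, 1] > 205)
feasible = candidates[~(raw | burnt)]
print(len(candidates), "candidate points,", len(feasible), "feasible")
```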