Postpartum hemorrhage (PPH) is a major cause of maternal death in low-resource countries, accounting for 661,000 deaths worldwide between 2003 and 2009. To assess this burden, the WHO conducted studies to find methods for the prevention and treatment of PPH. Three large clinical trials were conducted in the past two decades, collecting blood loss volume data (V) for more than 70,000 deliveries. The outcomes were PPH (V > 500 mL) and severe PPH (V > 1000 mL), and the parameters under comparison were the proportions of these events. Comparing such small proportions led to very large trial sizes (20,000 to 30,000). Using data from the large trials, the Survival platform in JMP Pro showed clearly that the distribution of V is very close to the lognormal distribution. This finding improved the efficiency of the estimates of probabilities and relative risks and permitted a substantial reduction of the sample size needed for treatment comparisons (typically fewer than 4,000), compared with the size required by the binomial outcome. Quicker and less expensive trials are very welcome, as they speed up obtaining results, and they have become common practice.

Hello. I am Jose Carvalho, a statistician at Statistic Consulting in Campinas, Brazil. Thank you for the opportunity to show an application of JMP to clinical trials in which a major improvement came from a statistical discovery. As a result of that discovery, one trial ended with the expected and very much desired results, and subsequent trials on the same syndrome will be much cheaper and faster.

The problem is bleeding after birth, or postpartum haemorrhage, PPH for short. PPH accounts for 125,000 deaths per year. Even in developed countries like the United States, it is the cause of 11% of maternal deaths. PPH is defined, just for classification, as blood loss in excess of 500 mL in the 24 hours after delivery; if the volume exceeds 1,000 mL, it is severe PPH. It is interesting to know the main cause of PPH: 90% of cases are due to uterine atony, a failure of the uterus to contract after the delivery. If the uterus fails to contract, the bleeding continues. We can treat that by giving drugs to contract the uterus or by some physical action. The remaining causes are trauma, retained placental tissue, and failure of the coagulation system. We will be dealing with uterine atony and its prevention.

PPH can pose a serious threat to a woman's life and health. Its onset must be quickly diagnosed during the delivery and treated. Treatments include, as I said, drug treatment with additional uterotonics and, as a last resort, artery ligation or hysterectomy, the removal of the uterus. New drugs and devices are being developed to prevent PPH, and every one of them must be tested in clinical trials before it is allowed for use in actual deliveries.

We have data on three very large trials. The first and oldest, published in 2001, was the Misoprostol trial; misoprostol is a drug that was compared with the standard treatment, and the trial enrolled 18,000 women. The second, published in 2012, was Active Management: not a drug, but a physical procedure of pulling the umbilical cord.
Now, misoprostol did not prove to be as effective as the standard drug of treatment, which is oxytocin, and Active Management did not show any improvement on PPH either. Here we are going to deal with the Carbetocin trial, published in 2018, the largest of all, which enrolled 29,000 women. In all these trials, the primary outcomes were severe PPH (sPPH) and/or PPH. To diagnose sPPH and PPH we need to know the blood volume, and the observations were indeed volumes, numbers in mL; but only the indicators of sPPH and PPH were considered in the statistical analyses, that is, binomial variables: zero or one, yes or no. This in spite of having the full information about the blood volume.

Before we proceed, a small explanation about the two drugs we will be dealing with. The standard drug used in deliveries is oxytocin. It is given routinely at every delivery, in every part of the world: as soon as the baby is delivered, the woman receives a shot of oxytocin. It is a standard procedure. Oxytocin is very good: it reduces the severe PPH rate from 3.84% to 2%, so it helps the incidence of sPPH. But there is a problem: it is a heat-labile substance and must be kept in a cold chain at seven degrees Celsius at all times. In countries with low resources this can be a problem. If you do not keep it in that cold-chain logistic, the drug loses its efficacy; sometimes you may be applying a drug that is not effective at all.

Carbetocin is a new drug with the same active principle as oxytocin and just a change in the excipients that makes it heat-stable. Carbetocin can be kept for six months at 30 degrees Celsius, which is about room temperature in most places in the world. So there were very high hopes that carbetocin would be a good replacement for oxytocin, above all for use in those low-resource countries.

A clinical trial was devised for PPH; it was run by the WHO and it was a non-inferiority trial. The parameters for this trial are in the objective: the investigators decided that, to declare carbetocin non-inferior to oxytocin, it should preserve 75% of the benefit. The benefit is 3.84% minus 2%, which gives a non-inferiority margin of 0.46%; we are talking about very low rates and a relative risk of 1.23. Carbetocin would be declared non-inferior to oxytocin if the trial could provide evidence that the relative risk is less than 1.23. This resulted in an amazing computation: a sample size of over 30,000 people. We ended up with a trial of about 29,000 women, spread over several countries and many centres, as we saw in that table before. It was a very expensive trial; data collection alone took almost two years. It is a very serious undertaking.

Why are the trials so large? Well, the obvious answer is that the proportions being compared are small, and the effects are necessarily even smaller.
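In numbers, the margin arithmetic just described works out as follows (simply restating the figures above):

$$\text{benefit} = 3.84\% - 2\% = 1.84\%, \qquad \text{margin} = 0.25 \times 1.84\% \approx 0.46\%, \qquad \text{RR margin} = \frac{2\% + 0.46\%}{2\%} = 1.23.$$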
Less obvious, but still true, is that the trial needs to be so large because we are losing a great deal of information by mapping V, the volume, into two categories. On this histogram we have the actual distribution of the blood loss volume for the 29,000 subjects of the trial, and then the cut-off point at 1,000. Just imagine, looking at the histogram, how much information is lost by collapsing all the detail of those frequencies into zero or one: left of the 1,000 line, right of the 1,000 line. But that is the way it was done, because for some reason people like this dichotomization: if it is over 1,000 it is severe PPH, if not it is not. I do not even know whether that cut-off is closely associated with any further consequence for the health of the women; but that is how it is done, that is the classification.

Now, JMP helped us to discover that the distribution of the blood loss volume is lognormal, and there is a story behind it. We set out to analyze the experiment as decided by the investigators, using the binomial distribution. But we saw very easily that the two blood loss volume distributions, carbetocin and oxytocin, were pretty much the same. We were not very happy with the dichotomization to begin with, but we had to do it; that is what the protocol said. Then we, the statisticians of the trial, found beyond any doubt that the distribution was lognormal. When I say the distribution of blood loss volume is lognormal, I mean a big "it is": it is not an approximation, a nice fit, the kind of thing we statisticians like. No; we had 29,000 points, and the fit you are going to see was perfect.

Then we did some homework and found, from physics, that the volume of fluid flowing in pipes has a lognormal distribution, and that has been known since the 19th century. Of course, we realized that our pipes are blood vessels, so they are elastic, and the viscosity of the blood changes because of coagulation. But still, we have a sort of model: fluid flowing in pipes, and the data showed it. We were very excited about that. We went further to see the consequences of using V for the estimation of the risk, and we got nice results.

Then we had to convince the investigators. Such a large trial has lots of investigators, big shots. The physicians own the problem, so they have the last word on everything, and they frowned at the idea. Some of them really did not like it. They said, "Well, we use no hypothesis, since it is just a binomial variable; it has no model." It has one, but they think it does not; people think it is too simple. And what if the lognormal distribution is not correct? We could get wrong results. So we did exactly what we are going to do right here: we did the analysis in front of them, and with JMP that was very compelling, and I hope you will agree. JMP also helped to communicate the discovery to the investigators in a very compelling way.
Just to anticipate the result: using the lognormal distribution saved the results of the experiment. That is part of the story. We went on to publish those results after the publication of the experiment itself, because the experiment had failed; you will see that it is a nice story. We published the lognormal results as a secondary analysis, and that touched the hearts of the European authorities, the EMA. Right now, carbetocin is very happily being used in low-resource countries, where it is needed, and we are very pleased with that.

Let me show you how it went. First of all, the measurement. You see on the left a sort of collector used to collect the blood; it is used in many deliveries. As I told you, sometimes you have to take very fast action when the woman is bleeding too much. People can evaluate the blood loss just by looking at the stain on the bed or the floor, but in many cases they use this collector. The collector has a scale, which I have enlarged on the right. In the first two trials, the blood loss volume was evaluated with that scale. Then it was changed, because it was not good enough, not precise enough, for our experiments, the three of them that have been running for about 20 years now.

Let me show you how it goes with JMP; I feel more comfortable in JMP. Here is a data table with all 71,000 cases from the three trials: Misoprostol, Active Management, and Carbetocin. Let's see the distribution of the blood loss volume for the three of them, by trial, not by treatment; the difference by treatment is so small that it will not matter for this short demonstration, and I am not analyzing the experiment yet. Here is the distribution for Misoprostol. You see that it is a very nice lognormal, isn't it? It could be something else, but it is lognormal. It looks like a nice distribution, but it has problems; it is hiding them, actually. Not problems for fitting a lognormal, but for analyzing the data the way it was analyzed, with the binomial variables.

Let's use the grabber tool and change the bins of the histogram, make them thinner. There we go. What do we see? We see spikes in the distribution, at regular intervals; you can see them here. Let me adjust a little. Now you might say there is no problem: it is like numerical integration, you lose on one bin, you gain on the next, they alternate, and you end up with a nice integral. Well, that is not the case here, because we have a cut-off at 1,000. Let me zoom in on the distribution around 1,000, which matters most for us. See, here is the spike at 1,000. But part of this frequency comes from the left, from the 900s: because the reading of that scale was rough, people tended to round the numbers, so there is a sort of digit preference here. It is very clear that some of the non-cases of severe PPH were moved into severe PPH, and it is no trivial quantity compared with the small frequency here.
That means that, in spite of having "no model," as my colleague said, for the binomial variables, we probably have a positive bias in this estimation. This problem was taken care of by weighing the collector device before the procedure and then weighing it again afterwards. That was done starting with the carbetocin trial, and only for the carbetocin trial. If we apply the same trick here and change the bins, now you see a nice distribution, with no spikes anymore; weighing solved that problem.

Now, let me tell you, this collector is not for the experiment; it is for actual clinical use. The evaluation of the blood loss, and of how fast it is accumulating, during the delivery is perfect with that scale. We cannot remove it and weigh it later to decide whether you have to go to, say, a hysterectomy or something like that. It is still in place and still used like that; we only changed it for the trial, where we weigh at the end. That is just a curiosity, but an interesting one, and it also came from how easily we can do this sort of analysis with JMP; that is more important than we might think.

Now let's go to the real problem, which is also easy with JMP. I am going to analyze the results of the carbetocin trial, but, so that I do not get mixed up in front of you, I prepared a subset data table with just the carbetocin trial. Here it is, 29,000 cases only, a subset of that other table. Let me take the opportunity to tell you what data I have here. Of course, this is not the full data of the trial; in clinical trials you collect hundreds of columns of [inaudible 00:20:56], for many reasons, for controlling and so on and so forth. Here we have just the center, because the experiment was randomized by center, so I have to keep it. Then the arm: it is coded one and two here, but since the trial is over, of course, I also have it open as treatment and control. Then the volume; that is all the data we need. These two columns here are derivations, the sPPH indicator and the PPH indicator, so they are very easy to make: just an indicator of [inaudible 00:21:48] PPH in this case.

Let's start by analyzing the way the protocol says, perhaps in a simplified way, not doing the complete analysis, but analyzing the sPPH response. I have not said this yet: in the actual trial analysis we came to a relative risk of 1.26, and the maximum, as I told you, for non-inferiority was 1.23. So it was a near-miss situation. We could not declare non-inferiority, and if you go to the publication of the experiment (you can find the reference on the last slide), we had to publish that we did not prove non-inferiority, much to our regret. Let's go and reproduce that here, as a sort of showing off for JMP. All we need now is Fit Y by X; it is so simple after all that work. We have treatment for X, and we use block for centers, just to respect the randomization. And there we can explore the results.
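As a minimal sketch (not the trial's actual script), that launch looks roughly like this in JSL. The column names are assumed from the subset table just described, and the Block role for center and the Relative Risk option are requested in the launch dialog and the red-triangle menu:

// Minimal sketch of the binomial-endpoint analysis (column names assumed)
dt = Current Data Table();
dt << Contingency(
	Y( :sPPH ),       // 0/1 indicator of severe PPH
	X( :treatment )   // carbetocin vs oxytocin
);
// Center is added through the Block role; Relative Risk comes from the red-triangle menu.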
But I am looking just for the relative risk, which is one item on the menu here: Relative Risk. The category 1 is our response, and treatment must go in the numerator; that is our choice. There we go. We have down here 1.255; that is essentially the 1.26 we got with those nicer models, random effects for center and things like that. So it is about 1.26, a near-miss situation, and we did not prove non-inferiority.

Instead of just weeping over the results, we went on and tried an analysis that was not planned; we later published it as a secondary analysis. Let's analyze the distribution of V. To do that, I am not going to the Distribution platform. Rather, I am going to use Reliability and Survival, Life Distribution, because it is a much richer platform for studying distributions; the only restriction is that the column must be non-negative, which is the case for volume. So I can use volume in place of a time here. I do not need censoring or anything like that; there is no such thing here. It is just a tool for fitting distributions.

Now let's get down to business. I have the distributions of both treatment and control, that is, carbetocin and oxytocin, so let's separate them. You can do that with a local data filter on treatment; I will choose treatment here, that is, carbetocin. On the right we have the data points, those black dots. There are so many of them, 15,000 treated with carbetocin, that they look like a continuous line, but those are the points. The ones in blue are nonparametric estimates [inaudible 00:25:41]; they are the same as the binomial estimates, pointwise, because there is no censoring.

Then, where is the lognormal? There is no lognormal in the menu of distributions. That is because there are zeros in the data, so we cannot fit a two-parameter lognormal. Some women were lucky enough to have [inaudible 00:26:06] zero millilitres of blood loss; probably that was some mistake. And there were women who went almost to 4,000 mL in the control arm, probably in shock. This whole span is what the binomial variable was collapsing into just two values. OK, let's fit the threshold lognormal, the lognormal with a shift, so that we can accommodate the zeros. Now we have three lines here; the red one is the threshold lognormal. All three are there, hiding one another. Then people may say, "Well, OK, the fit is very nice, perfect." It is not always like that: if I fit a normal, or a smallest extreme value, things like that, you can see how they come out; but there is no need for that here.

We can find the risk in several places in this result. The risk is one minus 0.985, and if you do not want to do that subtraction, we can show the survival curve: the risk at 1,000 is 1.47% for carbetocin. If I look at the risk at 1,000 for oxytocin, it is again the same, 1.47%. Wow. We also have confidence intervals here. People will challenge us and say, "Those distributions only look the same because of the scale of the graph." Well, let's take up that challenge.
Let's zoom in around 1,000, just because that is what we care about. Look how close the fit is; it is very close. I can go even further, like this, and now we can see even more: the point estimates, these black dots, are almost the same as the red line, which is the lognormal fit. My fellow investigators could see that; I did not need formulas or a table. A table would not say anything; they could even (I do not know) think the statisticians were cheating. This is the easy way to show it.

But there is more to see here. If you look at the confidence interval for the lognormal distribution, it is about one third of that of the nonparametric estimate. Since precision increases with the square root of the experiment size, we can guess that with a trial one ninth of this size I would get, for the lognormal, the same confidence interval that I get here for the nonparametric estimate. That is interesting: instead of using 30,000 women, essentially I could use 3,000 and get this result. That was very good for the investigators; they had planned on the large trial, and this reduction in the [inaudible 00:29:55] of the confidence interval came from the lognormal, which was not planned.

So there is something else to address. Well, OK, you are doing fine for the risk: you get the risk from the lognormal, which is the same as the binomial rate, and you have a tighter confidence interval, provided the lognormal assumption holds, and it does. Now, what about the relative risk? We could take the logarithm of V; that has a normal distribution, so we have the standard apparatus to do some regressions and find the relative risk. But I remember John Sall talking at this same meeting last year. His talk had a nice title, "Delicate Brute Force." Let's use the same thing, delicate brute force; if it is good for John Sall, it is good for us too.

Here are the estimated parameters of the lognormal that we get. If we take a bootstrap sample of these, we can compute the risk for each sample, so we have a bootstrap sample for the risk. We can do that for carbetocin and for oxytocin, and that is good. Then you say, "Well, I have to program this, I have to program the bootstrap," and it is not difficult, but you have to program it, and then you have to compute the lognormal fit 1,000 times, 2,000 times, whatever. But no, JMP is nice twice: if you right-click on this table, you have Bootstrap on the menu. The suggestion is to take 2,500 samples; we could take 5,000 or whatever, but it takes a long time. We did it with 1,000 samples and were very happy with that. It takes about 10 minutes for each of carbetocin and oxytocin; I am not going to make you wait 10 minutes, and I did not want to wait longer either, so we did it beforehand. Here is the bootstrap sample for the control, that is, oxytocin. The output is the parameters. The first row is the actual result of the experiment, and all the rest are the 1,000 bootstrap samples; that is why we have 1,001 rows here.
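In JSL terms, the risk column described next amounts to a formula over the bootstrapped parameters, roughly like this (a sketch; the parameter column names Location, Scale, and Threshold are assumptions about the bootstrap output table):

// Add the tail probability P(V > 1000) as a formula column (names assumed)
boot = Current Data Table();
boot << New Column( "risk",
	Formula(
		1 - Normal Distribution(
			(Log( 1000 - :Threshold ) - :Location) / :Scale
		)
	)
);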
This column here came from the parameters; it is just the risk estimate: one minus the lognormal distribution at the point 1,000, given the threshold, location, and scale. Fine, easy. And here is the same thing for carbetocin. Now I use a result that I read in the book by [inaudible 00:33:18], the man who knows everything about the bootstrap: to get a bootstrap sample of the relative risk, all I have to do is take those two bootstrap samples and join the tables row-wise. That is a Mickey Mouse operation for JMP, like everything we do with tables. Here is the result; I kept just the risk columns, one for carbetocin and one for oxytocin. If you do not want to use that extra first point, we can exclude it and use just the bootstrap samples. We compute the relative risk, just the quotient of those two columns, and we are done.

Take the distribution of this bootstrap sample of relative risks, and here we are, almost ready to celebrate. Here is the distribution. You do not see 1.23 here... yes, you do, but what we need now is a one-sided confidence interval with 95% coverage, so I need the 95% quantile, which is not shown by default. OK, so we kindly ask JMP to compute it: you go to Display Options, Custom Quantiles, and request the 0.95 quantile, which turns out to be 1.11. We even get a bonus result, a confidence interval for that quantile estimate. If you want to be really safe, you can use the upper confidence limit of that quantile (too involved to say out loud); anyway, it is far below 1.23. So we have shown, in some sense, we have produced evidence, that carbetocin is non-inferior to oxytocin. That is the result we published. As I told you, that publication, with some work by the investigators, warmed the hearts of the EMA, the European authority overseeing this trial, and carbetocin is now being used in places where no cold chain is assured.

Let me use your time, if I may, to show the efficiency we gain. Let me go back to the presentation and look at the relative efficiency of the binomial versus the lognormal. Take, not non-inferiority, but the simpler problem of testing the superiority of a new drug over oxytocin. The new drug would be declared superior if its risk of sPPH is less than 1.5%, compared with the 2% of oxytocin. With that, we have all we need for a binomial test. For the lognormal test, we need to convert these proportions into means. Let's do it. For the [inaudible 00:37:15] you have this; for the lognormal, we just do this: we want the risk, the probability of being larger than 1,000, so we take logs on both sides, subtract, and standardize, which gives a standard normal variable. And for s, the standard deviation, we use 0.7. In every inference we did with those three trials, and a few more for which we have data, the standard deviation came out at about 0.7 every time; we jokingly call it the universal constant of PPH.
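Writing that conversion out (a reconstruction of the arithmetic being described, using the 1,000 mL cut-off and s = 0.7; it yields the difference quoted in the next step):

$$\Pr(V > 1000) = p \;\Longrightarrow\; \frac{\ln 1000 - \mu}{0.7} = z_{1-p} \;\Longrightarrow\; \mu_p = \ln 1000 - 0.7\, z_{1-p},$$
$$\mu_{2\%} = 6.908 - 0.7(2.054) \approx 5.470, \qquad \mu_{1.5\%} = 6.908 - 0.7(2.170) \approx 5.389,$$
$$\Delta\mu = 0.7\,(2.170 - 2.054) \approx 0.0814.$$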
The standard deviation of the logarithm of the blood loss volume is 0.7, so we replace s by 0.7, compute the quantiles at 2% and at 1.5%, solve the equations for the means, and we have two means to compare with a normal distribution. The difference is 0.0814.

OK, now we go back to JMP. I feel almost ashamed of showing this, but it is fun, and we did it for our medical team there and it was very compelling, as I told you. Sorry about all the windows open here. It is very simple: you go to DOE, Sample Size Explorers, Power; it is Mickey Mouse stuff. Let's compute the sample size for two independent sample proportions. We have a one-sided test, since it is a superiority test. The proportion under the null is 2%, and under the alternative it is 1.5%. We want 80% power, and the sample size comes out at about 17,000. So, 17,000 to do it the old-fashioned way.

Now let's compute the sample size for the lognormal, using Sample Size Explorers from DOE: Power, then Power for Two Independent Sample Means. We have a one-sided test, we enter the standard deviations, which are 0.7 for both groups, the difference to detect, which we computed as 0.0814, and we ask for 80% power. We come to the result: the sample size, the experiment size, is 1,831. That is about one ninth of the 17,000 we had computed for the binomial [inaudible 00:40:31], which is what we anticipated by just inspecting the widths of the confidence intervals in the reliability platform. That is how much more efficient using the lognormal is compared with the binomial [inaudible 00:40:48].

Just to finish, the wrap-up: the lognormal distribution fits the blood loss volume distribution very well, so why not use it? Using this fact, the estimates of the risks are much more precise. We even showed that our big trial was saved, in some sense, by demonstrating the non-inferiority of carbetocin using the lognormal. And we are very happy to tell you that a new trial is already underway using the lognormal. This trial would not have come to life otherwise, because we do not have money for 30,000 people; but since we need fewer than 4,000, that made the trial possible. It is underway now, and it is for treatment, not for prevention like the others. That is what I had to tell you. Thank you very much.
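(For reference, both Explorer results above can be checked against the standard one-sided formulas; this is a back-of-the-envelope reconstruction, assuming alpha = 0.05 and 80% power, so z_0.95 = 1.645 and z_0.80 = 0.842:

$$n_{\text{binomial}} \approx \frac{(1.645 + 0.842)^2\,[\,0.02(0.98) + 0.015(0.985)\,]}{(0.005)^2} \approx 8{,}500 \text{ per arm} \;(\approx 17{,}000 \text{ in total}),$$
$$n_{\text{lognormal}} \approx \frac{2(0.7)^2 (1.645 + 0.842)^2}{(0.0814)^2} \approx 915 \text{ per arm} \;(\approx 1{,}830 \text{ in total}),$$

about one ninth of the binomial size, in line with the 17,000 and 1,831 reported by the Explorers.)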
What if you could save time in your process of collecting data, cleaning it, and readying it to begin your analysis? Accessing data and getting it prepared for review is often the most time-consuming part of creating a new data analysis or project. With that in mind, we would like to introduce the Workflow Builder in JMP 17. With this exciting new feature, JMP users can now record their entire process from beginning to end, starting with accessing data from multiple sources. Working with the action recorder (added in JMP 16 to track steps and provide scripts that can be saved and reused), Workflow Builder tracks all your changes in data prep and cleanup, data analysis, and reporting. In this presentation, we will show how to operate the Workflow Builder, save each action, and then replay and share the steps in a polished report. This is sure to become your new favorite feature in JMP 17. No manual clean-up means extra time in your day!

Hi, thanks for joining us today. I'm Mandy Chambers, and I'm a principal test engineer in the JMP development group. I'm going to talk to you today about a brand-new feature for JMP 17, the Workflow Builder: how to navigate a data workflow, and the idea that the Workflow Builder sort of grants all your wishes for data clean-up. What is the Workflow Builder? It's a new utility that records your data preparation, clean-up, and analysis steps, and it makes it easy to create a workflow that you can use over and over again.

During the Early Adopter (EA) cycle, we did a survey where we asked our users how they might be interested in using the Workflow Builder. We got a smattering of answers, but a couple of the top suggestions were sharing work with others who need to do the same actions, reusing the entire sequence of steps again and again, and taking those steps and applying them to new data. Some people said they would use it for archiving their work and documenting past work so they wouldn't forget how they did things. There were lots of other answers in there too, such as educational purposes and teaching JSL. Today I'm going to introduce you to the Workflow Builder and then show you several sample workflows that I created to demonstrate some of these actions.

Let's get started, and let me share a couple of example workflows. I'm going to open up this workflow and just show you what it looks like. This is one I created using Big Class Families; I just made value labels and ran a Graph Builder. Instead of running this one, though, we're going to create one from scratch. The way you open the Workflow Builder is you go to File, New, and select New Workflow, and you'll see that the workflow opens up completely empty; there's nothing in it. In order to activate the workflow, you need an action, such as opening a table or importing your data. We're going to start by opening... The steps will go in here as you open things.
This is a recording log history that is really fed from the enhanced log; the script that is built as you run JMP is fed into here. If I leave this open, which I will today, I'll have different workflows up, and the history of what I'm doing all day will be captured in this lower part of the Workflow Builder.

Let's go over and open up Big Class Families. I want you to notice that as I open it, a little prompt pops up in your window that says, "Hey, do you want to start recording?" There's an option here you can check to say, "Don't ask me this again," but I want to be asked, because I want to know what's happening, so I'm going to leave it the way it is and say yes. You'll see that here's Big Class Families; notice our new pet column out here that we've added for JMP 17. This has now recorded the step over here that opened Big Class Families, and it has also put it down here in the history. The button up here that is recording is the very first button. I don't know if you noticed, but at first it was solid red, and now that I'm recording it's hollowed out; it's white in the middle.

Let's do a couple of things that we would normally do when we open a JMP data table. I'm going to go into this column and recode it; I'm just going to make it title case, because that's something simple and I don't want an extra column, so I'll say recode this column in place, in the name column, and then say Recode. You can see right here that it's been added. Then I want to go to this column, go to Column Properties, and add my value labels. If I can type, I'll add some labels here; you can see the labels have changed, and there's the step for it there. Then I'm just going to grab a Graph Builder that's out here and run it. There's the Graph Builder.

Now, you'll notice that when I ran the Graph Builder, it did not get recorded in the Workflow Builder steps, nor did it get put down here. Platforms do not get recorded until you close them; then they're recorded as actions. We also had feedback during the cycle that it would be better to be able to save them earlier if you wanted to, so under the platform's red triangle menu you can go to Save Script and choose To Workflow to add it here. I'm not going to do it that way right now; I'm just going to close Graph Builder, and you can see it's been added. I'm done with the workflow I wanted to demonstrate, so I'm going to stop recording. The second button resets everything: it closes everything up and resets the workflow so I can rerun it, which I will do.

Before I do that, I want to talk about a couple of other places where you can capture your actions. The enhanced log, the log file here, is another place where you can get actions. As I said, all of this has been tracked in here: we opened the table, we created a recode.
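To give a feel for what the recorder captures, steps like these boil down to JSL along the following lines (a rough sketch, not the exact generated script; the value-label column and the Graph Builder roles are assumptions):

// Open the sample table
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );

// Add value labels to a column (assumed here to be sex)
Column( dt, "sex" ) << Set Property(
	"Value Labels",
	{"F" = "Female", "M" = "Male"}
);

// Run a Graph Builder (roles assumed)
dt << Graph Builder(
	Variables( X( :sex ), Y( :height ) ),
	Elements( Points( X, Y ) )
);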
We did value labels, we ran our Graph Builder. Under this red triangle menu we also added a Save Script item that lets you save the script and add it to the workflow. I could click this and it will add it to the workflow; notice that it added it even though I'm not in record mode. You can go and grab things: say you're working throughout the day, you do several things, and you forget something. You can come back and find it here, or find it in the log down here, and push it up. I could grab this recode and push it up here. Again, I'm not recording; it's just adding things to my workflow. Now, I don't need these duplicate steps, so I'm going to go to this one, right-click, and remove it; and I don't need a second recode, so I'll remove that too.

I have four steps in my workflow, so let's click the third button, which executes all the steps. These other buttons step forward or backward one step at a time, but I'm just going to execute the whole workflow. And there you have it: it ran the workflow. You can see my column here with the names changed to title case, and the labels are here for males and females. So we have our first workflow; we've built it. Yay, that's a success. Let's close this, and let's close the workflow; I'm not going to save it right now.

Let's go on to my second workflow. I'm a big fan of Virtual Join, so I created this little workflow here. What I did was open three of the pizza tables that we have in the samples, Pizza Profiles, Pizza Subjects, and Pizza Responses, then create my own link IDs and link references, and run a Graph Builder. I'm going to run this workflow, and then we're going to do a couple of things to change it. Here's the workflow. You can see that I've opened the tables. If you're familiar with Virtual Join, which most people are at this point: here's where I created a link ID, here's another link ID, and then here's Pizza Responses, the table that's actually driving the Graph Builder. This is where I created my link references for those columns, and this is the table used for the platform that I ran.

Now, I could turn this into a presentation and make it a little cleaner; I don't really need to see the tables, it would be nice to see just the report. Let me show you a couple more things about the Workflow Builder so we can do that. If you go to this panel over here, it says Step Settings. When you open it up, this is where all the magic is happening; this is where all the JSL is being captured. As I hover over these steps you've probably seen things popping up: there are tooltips under here that show you the script. But there are a couple of other things under here, too. If I don't want to see this table, we have a couple of actions built in here.
There's a thing called Show Message, which I'll talk about later. You can create subsets and random seeds, and you can add custom JSL. But what I want to do is hide this table. All of a sudden, here's my step where my table is opened, and here's my hide step. I'm going to hide all three of these tables, adding this action to all three steps. Then there's the JSL that was captured when I created one link ID and another, here are my link references and so forth, and there's my Graph Builder. I'm going to close this right panel and run the workflow again, and this time you'll see it run through. There you have it: it runs through, we don't see any tables, it hid them for me. You can see them down at the bottom of my JMP Home Window; they're there if I want to open something and run another report. But this is a much cleaner presentation.

Let me point out a couple of other things I might want to do in here. I might want to slow this down a little, so another action I could add is a custom action; let's add a wait statement. I'm going to type, just like you would normally type a JSL step, right there, to say wait. I want that to come after this step, so I'll push it down so it follows the step that sets the link reference. There's the wait. Let's run this again and pay attention for a second; see if you notice it hesitate before it runs the Graph Builder. There's the hesitation, and there's the Graph Builder.

There are a couple of other menu items; I think it's easier to show you these along the way. If you want to save your workflow, you go to the File menu and choose Save or Save As, and it will save the workflow locally for you, with .jmpflow as the file extension. If you want to add this to a journal, one of the things that's been put together for us is the ability to create a journal out of your workflows. You'll see here: here's the open, here's your code so you can run it, and here's the report at the bottom. There's a thumbnail, and a full-size graph if I want to see it there. That's a really nice feature, because journals are sometimes hard to build; I did create the one on the right, and I've created and saved and reused a lot of them over time, so it's nice that this is built in.

The other thing you can do is go up here and save your script to the script window. Just so you're clear, this creates a script that does all the JSL we've been doing, but it does not regenerate the workflow dialog; there is no script that will recreate that window for you. It would run just straight script. It has the hide function in here that was created to hide the tables, so it would run the same thing and just run the Graph Builder at the end, but it will not rebuild the workflow window.
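For what it's worth, the hide and wait actions added here correspond to one-line JSL statements, roughly like these (a sketch; the pause length is an assumption, and the built-in hide action may be implemented differently):

// Hide a data table's window without closing the table
dt << Show Window( 0 );

// Pause for a few seconds before the next workflow step
Wait( 3 );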
I think that's mostly what I wanted to show you in this one. One other thing you'll see is the ability to group some of these steps; I have more workflows where I'll show you where I've done that. The way you do it is you right-click and say Group. You might want to do that because, say, these steps are all opens, these are steps where I'm changing things about the columns, and then I'm running a report. You can have groups within groups, and grouping parts of your workflow together makes it a little cleaner and easier to see what you're doing. Let's close this workflow.

The next workflow I want to show you is one that Peter Hersh actually designed; I titled it Distribution Education Type. He had done this cool thing where he opened a data table, ran, I believe, a Oneway, and then went through and selectively picked different areas of the platform output and attached a definition of what each one was. I cloned his idea and picked my own distribution, so I'm just running a distribution on this pain column here. What he's utilizing, and I'll show you in a minute, is the Show Message window: he selected an area of the report by using a report script, and then he grabbed a definition for quantiles, and the workflow freezes for a moment; it won't go on until I say OK. Then it moves to the next section and pulls up the definition there, and I say OK again. Notice the little running man over here as well; I didn't point him out before. He's hesitating right now: the first step has been completed, so it gets a green check, but the little running man is waiting for me to finish, and when I click this, he turns into a green check too. Let's close this.

Then I want to show you another feature: you can go down here under the red triangle and duplicate a workflow. I'm going to go into this workflow and into the side panel over here, which is kind of magical. I'll close this one because I don't need it anymore, and then I'll show you how you can just edit in here. Now, you could recapture all of this by bringing up another table and doing every step again, or you could just go in here and do a little typing. I'm not the best typist, but I'm going to do it this way, because I want to show you how this was built. There's a distribution running here, and the distribution was captured, but he went in and added a line that assigns it to a report; he's just calling it Report OW. I'm going to go in now, change the table to the Body Fat table, and pick a different column, percent body fat (and I need to type it right). Then this is the part down here where he selects the quantiles and provides their definition; that's the first part. And then this action here is where he added Show Message; the Show Message step is right here.
He typed in "quantiles," typed in a definition, and selected it to be a modal window. Then he went to the next step, which is a custom action, the clear action. He did that by selecting Custom here and naming it "clear selection": he takes the report and tells it to deselect, just straight-up JSL. The next step selects the next part of the output, which I need to change to percent body fat; it selects the summary statistics, and the Show Message for that is "summary statistics" plus its definition. So if I typed everything correctly, and hopefully I did, we should be able to close this side panel, go over here, and run it again. There's the Body Fat table, a different table, open. Here's the body fat distribution. The quantiles are selected, and there's my quantiles definition; I say OK. Now the summary statistics are selected, with their definition. This is kind of a cool thing: it's a neat way to use a workflow as a teaching tool, some kind of educational piece. I just wanted to show that quickly, because it's another way to use the Workflow Builder.

Next, I decided it would be a little more interesting to show you a real-life example. I was talking with my husband, actually, about what kind of data I could go and find, and we landed on the Paycheck Protection Program (PPP), a real data example. This is government data; it's all public knowledge. I was able to get in there and really drill into court orders and court proceedings. You can see people's names, and there's a way to search every state for any kind of company; there's a lot of data out there. I took a smaller table, which you'll get with the abbreviated version of the journal I'm sharing (I didn't give you the whole thing, but I believe this table is in there). It's a smaller list of alleged PPP fraud cases, where they went through and actually tried to call and track people down.

A couple of data points here: the accused were seeking about $250 million in loans, and they actually obtained about $113 million, which, to me, are not small amounts of money. Joshua Bellamy was an ex-NFL football player who played for the New York Jets, and he conspired with this guy, Phillip Augustin, of Drip Entertainment, which is in the music industry; they connected between Florida and Ohio, and I have a map where I can show how I found the two of them. I think they came up with about $17 or $24 million, all of it fraudulent. This place called Papillon Air, I believe, also got a large amount of money, but they took about $2.5 million and purchased luxury cars and a private plane. And there was one individual in Houston who filed about 80 different loan applications, working with various people, with fake companies and fake licenses and agreements, and he purchased a Lamborghini and a lake property.
I've got a little note here that says you need to be really careful with your text messages. I read some of the correspondence in that particular court order, and it's right there: he's texting these different people all over the country saying, "Hey, it's time we go and file our tax form," for this and that and everything else, and it's all in the record, so just be careful what you do.

This data is read in from the government; it's a straight-up Excel file. I created the whole workflow: reading it in, capturing things, and cleaning it up. Here's an example of where I imported the data. You can see groups here: I grouped the columns where I changed column names and formats, and changed things to multiple response so they would work better in a mapping situation. I've got labels in here, I selected and deleted rows, hid and excluded certain things, and selected rows to create markers and colors for another map. And then there are several reports that we'll run. I want to thank Lisa Grossman (I'll thank her at the end, too): she helped me with some of these Graph Builders. She's on my team, and I appreciated her help so I could show some of the JMP 17 features in Graph Builder.

You'll notice down here that there are several reports, and this particular one is lighter and italicized. That's a report I'm not showing you; I didn't want to delete it, and the way you enable and disable a step is you right-click and choose Step Enabled or not, so I've turned it off. I might want to use it some other time, so this is a nice way to say, "Hey, I'm running some stuff, but I only need to show somebody one or two things," and keep it in there without losing what you've already captured.

Let's run this workflow. You can see it runs fairly quickly; it took me a while to build it, though. Let's go through and talk about a few of these things. This is a Text Explorer. I'm a big fan of Text Explorer because I like word clouds. This is from the DOJ records; they had a column describing what happened in that particular aspect of the charge against these different people. It's just interesting to see the words: "according to allegations," "allegedly," the millions, the companies; you can see "PPP funds" and that kind of thing. That was really just fun for me, and I wanted to show it to you.

Here's a Graph Builder that we designed, and I made myself some notes; I used this part of the workflow so I could remember what I wanted to say about some of these things. There's a little notes section right here. The comment we made was that this Hawaii guy really cleaned up: he tried to get about $18 million and got almost $13 million. I guess that was good for him at some point, but not in the end.
It said he ended up falsifying how many employees he had, and all of that. You can see in the map that the red is honed in on him; that's where he was located. That is just a regular map using the colors here. Then this particular map is a map of the states with what I called red flags. This is where I went into the data and was actually able to find (and I'll pull the table over here) these two guys pretty easily. This guy, Phillip Augustin, is right here, and Joshua Bellamy is down here. As I looked at them, I was able to go across and see what they actually did. Joshua played for the New York Jets and lived up near New Jersey. When you link these together, you can see that some of the addresses fall in Florida while others fall in Ohio, and the address he used for the company was Cross River Bank, up in New Jersey. The notes written here say that the guy up here was using the Clear Vision Music Company, they got $17 million in funds, and they filed about 90 different fraudulent applications for the millions they got. Joshua is down here with similar comments, but the company here was Utilization Review Pros, while the company there was Clear Vision Music Company. Just an interesting story. The map Lisa made me here is kind of cool, because the one I saw from the government was very flat with little red flags on it, so we designed it this way instead.

Then this is just a distribution I ran, because it's an easy way to see the states. You can see Florida here, and Georgia, and it looks like New York and Texas were hot spots for fraud. We had this other field out here that said, "Did they plead guilty?" We were looking at that, wondering in how many places they pleaded guilty and whether it made any difference. It didn't seem to; I have another graph on that in the next set of data. I just wondered if maybe they got off easier; I don't really know. If you want to look at it later, I do have the links in there for all of this. So that's that particular workflow.

Now I want to go into the bigger workflow, so let me go down to this one. This is the table of single entries for every one of the PPP loans, and I think there are about 1.6 million rows, 1.6 million unique entries. I created a smaller workflow that goes through, imports the data, concatenates the tables together, and saves the result, and then a second workflow where I did all my cleanup. You could do them all together, but I wanted to split it up. This is also an example of a custom step: when I created this, I realized I already had the table out on my desktop, and I didn't want to redo the same things over and over, so I added a JSL step, a custom action of my own, to say, "Hey, go look for this; if it's there, delete it."
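That custom step needs only a couple of lines of JSL, something like this (a sketch; the file name and location are assumptions):

// Hypothetical path to the previously saved combined table
combined = "$DESKTOP/ppp_loans_combined.jmp";

// If an old copy is already there, delete it before rebuilding
If( File Exists( combined ),
	Delete File( combined )
);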
I'm going to run this workflow, and pay attention to the little running guy right here. He's working, he's running out to go get the data. It's kind of big, so he's importing it. The little check marks appear as we run through this: we concatenate the tables, get rid of the individual tables, and then save the one big table. All I did was open that, run it, and create my data table. I'm going to close this up and go to the next workflow. Now I have that table and I'm ready to run my bigger workflow here. Again, I created formats and standardized attributes, and I did some recoding. I wanted to get latitude and longitude for different cities, so there was a table that got opened with those coordinates, updated into this table, and then closed because I didn't need it anymore. I've also got a Tabulate report, a couple more Graph Builders, and a distribution. Let's just run this through. Obviously, it takes a little while to build these things, but once you've built it, you've got your reports and you're ready to roll, so you can see how quickly it runs. This is a distribution that I ran. Because this data table is a single entry for everything, I went in and read up to figure out what's fraud and what's not. You'll notice in this loan status that the Exemption 4s were the fraudulent ones, so those are the interesting ones. If I click on that, you can see that almost all the states had something in there. Not quite all of them, but most. And just to be fair, there was a fairly large amount of money here that was paid back in full. People did pay back some of the loans. I was trying to see whether it made a difference if they were corporations, limited liability companies, sole proprietorships, or anything like that, and I couldn't really see that it fell one way or the other, fraud or not fraud. This particular graph here, and I'm going to open up my little side panel that helps me cheat a little bit, is a Graph Builder that's showing a new feature in JMP 17 that I want to point out here at the bottom: the tabular summations. It's just a bar chart that shows the total amounts for each loan status, overlaid with the business type. Again, I was curious whether it made any difference if it was a new or existing business, younger or older, or a startup, and I'm not sure that really mattered. But the cool feature down here is these summations you can now get in Graph Builder, and I can show you how to do that. If you open up the control panel under here, there's something called a caption box, and the location here is an axis table. If I drill down on this, there's an Axis Table option, and that's been selected. That's what allows you to do that. It's showing the sum for each of those different business age descriptions.
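As a rough illustration of that caption box feature, here is a minimal Graph Builder sketch in JSL. The column names (:Loan Status, :Current Approval Amount, :Business Age Description) are assumptions based on the demo, and the Axis Table location itself is something the presenter selects interactively rather than something scripted here.

```
// Bar chart of total amount by loan status, overlaid by business age,
// with a caption box element supplying the summed values.
Graph Builder(
	Variables(
		X( :Loan Status ),
		Y( :Current Approval Amount ),
		Overlay( :Business Age Description )
	),
	Elements(
		Bar( X, Y, Legend( 1 ), Summary Statistic( "Sum" ) ),
		Caption Box( X, Y, Legend( 2 ), Summary Statistic( "Sum" ) )
		// The "Axis Table" location for the caption box is chosen from its
		// options in the control panel, as shown in the demo.
	)
);
```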
And then my graph behind here, which I'm going to show you in a minute, uses the Axis Reference Line, which is another part of that. Let's look at that one. This graph was done with Axis Reference Lines, and it's showing the average current approval amount per business type, split by rural or urban. The comment Lisa made here was, "I don't know what it is to be an employee with stock options, but they really cleaned up or racked up here," if you look at that bar chart. Again, I was looking at these, trying to figure out whether it had anything to do with being a limited liability company. Partnership shows up here twice, because I think you can be an individual and you can also have a partnership. I didn't see any big differences with that. But here is the axis reference line showing the mean rural and the mean urban values. That's just a marker you can now add so you have a bit of a measuring stick when you're reading those graphs. This next graph uses the latitude and longitude we brought in, and it's showing Hawaii. It also shows a new feature: if you go in here and look at the graph's background maps, it's doing street maps, and some things have been added underneath the selections for URLs. This one is using the Mapbox satellite imagery, so it's a nice look, a nice graph. Again, this is Hawaii, and I think our guy from my first map was actually in this part of the islands. The reddest dot up here is this one. I'm not completely sure about that, but I do remember Hawaii was all red, so it seemed like a lot went on in Hawaii. This Tabulate I wanted to show you is a feature in JMP 17 that was introduced earlier in the 17 cycle. You may have seen it if you use the Early Adopter releases: we created the ability to pack analysis columns on the right-hand side. The way you do that is, I've used the current approval amount with the forgiveness amount, and they're packed together by right-clicking and saying pack the columns; you can unpack them too. Then there's a template. I went into the template with the first and the other value, and you can set a name separator using a comma or something else. I added a little spacing in here and changed it to brackets; I believe the default comes up with no spaces and parentheses. It makes a nice report. Again, I was looking at the exemption part of this, which is the fraud. Here's the paid in full, which is maybe a little bit more money. But if you look at the exemption part, I honed in here: corporations are about 30% of the total, and the limited liability companies, the LLCs, are about 29%. Just some interesting data points.
Like  I  said,  I  give  you  the  references to  these  and  if  you  want  to  go  and  dig  in and  look  yourself,  you  feel  free. That  wraps  up the  Workflow  Builder  demonstration. I  want  to  close  and  I  just  want  to  say thanks  to  the  development  staff that  worked  really  hard  on  designing the  Workflow  Builder,  Ernest  Vasseur, Dave  White,  Evan  McCorkel, just  to  name  a  few. There  were  a  lot  of  people that  worked  on  this. Julian  Paris  was  also  really  key in  the  design  phase  and  prototyping and   helping  a  little  bit with  initial  testing. Again,  I   thank Lisa for  the   Graph Builder  assistance  as  well. There  are  references  here,  like  I  said, included  for  the  PPP  data, so  you  can  look  at  that. And  I  just  want  to  close  with  saying that  I  think  Workflow  Builder  will  be the  best  new  feature  probably  in   JMP 17. I'm  probably  a  little  biased, but  I  think  it's  going  to  save  you  time with  your  data  clean  up  and  prep. I  think  you're  going  to  get  more out of  reusing  recorded and  repetitive  steps that  you  find  yourself doing  maybe  every  day. It  should  simplify  your  work  efforts and  maybe  accelerate  your  daily  processes, but  it's  going  to  leave  you  a  lot  more  time  in  your  day  for  other  stuff. So  try  it  out  and  we'd  look  forward to  talking  with  you  about  it and  good  luck  and  thank  you for  letting  me  share  with  you  today.
If you work with data, you have probably heard the adage that preparing your data for analysis makes up most of the time spent on analysis -- often as much as 80% or more! This talk focuses on using tools in JMP to gain knowledge from messy historic oil and gas drill rig count data. Though specific to a rig count use case, the example applies broadly to anyone who needs to gain insights from data sources, such as Excel files with questionable formatting, structure and cleanliness.   The first part of the talk covers importing raw data obtained from a website, restructuring data tables, identifying errors and recoding errors with Recode. This is the 80% that must be done to get to the more exciting 20% where we glean insights. Next, I demonstrate how to use Graph Builder to gain insights from the data. The talk wraps up with using dashboards to share the insights.     Thanks for joining, everybody. My name is Jason Wiggins. I'm a senior systems engineer for JMP. I come from a fairly long career in oil and gas and manufacturing, in R&D and quality. What we're here to talk about today is messy data. I really believe anyone who analyzes data has encountered a data set somewhere along the way that needed a lot of work before it could be analyzed. Cleaning or shaping data can be difficult, and beyond "wow, that's a mess," I find it can actually be quite frustrating, especially when we have to do it manually. Some of my messiest data problems have come from Excel spreadsheets, and I believe there are a couple of reasons for that. Excel is great for many things, but for analysis, it's just not that great. Part of the reason is that it doesn't impose a format for the data. To my mind, data formats are as varied as the imaginations of the people using them. Excel files also tend to be hand curated, so misspellings and inconsistent naming conventions are really quite common. The example I'm presenting today comes from my career in oil and gas, but I believe the problem and the solution I'm going to show can be found in many of your data sets. My goal is for everybody to see a few possibilities for simplifying the front end of our analytical workflow. Let's take the exclamation point out of "wow, that's a mess!" and just say, "Yeah, but no problem, I understand how to deal with this." I'm also using an example where the data are available for download off the web, and I'm going to upload my presentation materials in case anyone would like to practice some of the concepts I'm going to work through today. All right, so let's get to this. Our problem: Baker Hughes has been publishing rotary rig count data. These are the rigs that drill for either oil or gas all around the world, and they've been posting active rotary rig counts for generations. The rig count is a very important business barometer for the oil and gas industry. If you're in the industry, you don't need the explanation; you're consuming these data on a daily, if not weekly, basis. It's used broadly, all the way from downstream to upstream and exploration.
As I laid the groundwork, we are going to be dealing with Excel data and some of the problems that come with it. One, many of the worksheets that Baker Hughes makes available are not in the right format for analysis in JMP. I also found many errors. This certainly isn't the most error-prone data set I've worked with coming from Excel, but there are a few doozies in there that we'll focus on today, and they're mainly around spelling and inconsistent terms and abbreviations. Again, in terms of the overall analytical workflow, in order for us to even get to the analysis, knowledge generation, and knowledge sharing, we have to get our data set into a format where we can begin to do that work. We're going to focus on getting data into JMP, blending and cleaning, and then at the end we'll do a little bit of data exploration and visualization. Ultimately, what we're shooting for is a data set where we might look at rig count trends by state, for instance. These trends might be very telling about the economics of the industry over time. We may also want to look at more of a time-series-based analysis, like a bubble plot, where we can see the change in rigs over time for the different states. Again, in order to get to that point... let's see, JMP is pausing on me for a second. There we go. Okay, in order to get to that point, we really need to get the data into something we can work with. This is the final data set; this is what we're going to be pushing toward. I have a couple of ways of accounting for date in the data set. This is what comes from the Baker Hughes worksheet, but I'm going to create a couple of other variables in case we want to take a closer look at time: a month-year variable and a year variable. We're going to fix some spelling errors in the state names, and ultimately we're going to join this with another data set that has the lat-long coordinates for the capital city of every state. I don't know the best way to show that time-series growth and contraction, but I have to choose a point for each state, and capital cities are available. We have to do a few things in order to make those data sets connect. That's where we're going. All right, that's essentially what I outlined verbally, but I'll pause for a second and just let everybody take a look at our analysis workflow, which is that data shaping, data blending, data visualization component. All right, let's talk about importing data. For those of you who want to try this, I want to point out that the data Baker Hughes publishes is in a binary Excel format. At this point, JMP does not have a way to directly import these data. If this were XLSX, which it used to be (I'm not sure when the binary file format was adopted), you could do a File > Internet Open, ping the URL, and automatically download the data. But we can't do that, so we have to do an intermediate step. It's pretty simple. If we go to the website... let me pull that back up again real quick.
If we click on any of these links, it'll download the data. We open that data up in Excel and then save it as an XLSX. There are ways to do this automatically, but they're going to happen outside of JMP, so for those who really want to explore this and make it automatic, there's a bit of a challenge up front. I will point out some ways to automate in JMP after we get the data in. All right, so this is what we're looking at; that's, in fact, the Excel sheet that we're going to load. First things first, let's get our data in. The column headers start on row five, and I have a couple of rows of column headers. This is common in Excel: people use merged cells to combine text from more than one cell into a single label, and we want to make sure we capture that. So we tell the wizard that the column headers start on row five and that we have two rows of them, and then JMP works out where the data starts and adjusts for it. This is good. I always like to take a look at the preview and just make sure I'm getting what I'm asking for, and this looks right; we're importing the correct worksheet. Let's just import that. All right. This is in a wide data format. That's not typically the format that JMP likes for doing an analysis; almost always we want to be in a tall data format. What I'd like to have is a column that has these column headers as rows, and then another column that has the rig count for each one of the column headers. The operation we need for that is a stack operation. Let's just talk through this. I'm actually doing the presentation in JMP 17, and the reason is that there's a cool new feature in 17 that I find to be so handy for this type of work. All right, I forgot to do one thing; let me back up real quick, just to keep the dialogue simple. What I'd like to do is get rid of these summary statistic columns. These certainly make sense in a wide data context, but they aren't going to make sense if we stack them, so we're just going to delete those. We could deal with that in a lot of different ways, but to keep it simple we'll just delete them out of the data table, go back to our stack, turn on a preview, and select all of those columns. This is great. One thing I love about the preview is, first off, I get to see that, yes, this is in the shape I need for my analysis, but I can also make some changes and see how the data table is going to look before I actually commit to making it. If you remember, we wanted the count to be rig count, so that data column we want to be Rig Count. The source column we're going to name State and Type, and then we're going to separate those in the end. That's another example of creating new variables: we're actually going to split those apart so we can use them as two separate variables. But for right now, I think that's pretty descriptive.
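For anyone who wants to script this step later, here is a minimal JSL sketch of the stack the presenter just set up. The file path and the wide column names are assumptions for illustration, not the exact names in the Baker Hughes workbook, and the header rows are set interactively in the Excel Import Wizard rather than in this snippet.

```
// Open the saved .xlsx (hypothetical path); header rows were chosen in the wizard.
dtWide = Open( "$DESKTOP/Worldwide Rig Counts.xlsx" );

// Stack the wide state/type columns into a label column and a data column.
dtTall = dtWide << Stack(
	Columns( :Name( "Alabama-Land" ), :Name( "Alaska-Land" ) /* ...remaining state/type columns... */ ),
	Source Label Column( "State and Type" ),
	Stacked Data Column( "Rig Count" )
);
```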
If I hit Enter or click anywhere else, I get to see the update, and yes, indeed, this is something that I want. That's data shaping, step one. We may have to do several steps of data shaping; in this case it's a simple example, a stack is appropriate, and that's all we have to do. All right, before I show this slide, let's go over to the data set. One of the first things I do when I'm manipulating data tables and working with new data sets is graph the data, plot the data, or show the data in some way graphically to help me see whether there are issues I need to resolve. Distribution is a great way of doing that. I'll just pause and let everybody look. It's probably small print; let me make that much bigger. I think even in the first few bars, half a dozen bars, hopefully everybody is recognizing some potential issues. One, what's "Wash"? I'm not sure why Washington was abbreviated; there's probably some reason historically, since these data sets are quite old. We have abbreviations for West. We have abbreviations for North and South, and it turns out that is the exact same abbreviation as for New. So we have several issues. Let me scroll down. There's another doozy here; fortunately, there's only one of them. Let's see if I can find it. We're looking for Tennessee. There we go, Tennessee. Everybody take a look at Tennessee there and see if you notice the problem. We're missing an S, right? That's something that I would do; I'm a horrible speller when I'm typing, especially when I'm typing fast. So we found one spelling error there. Now the trick is, how do I parse that out and fix all of these errors? More importantly, how do I do that in a way that doesn't involve a thousand steps for renaming these different abbreviations and misspellings? That's where regular expressions come in. In Recode, there are a variety of different tools to deal with data cleaning problems. I always like to choose the simplest one, but often the problems aren't that simple. What are regular expressions? Well, they're sequences of characters that specify a search pattern. As a simple example, I've got a cats and dogs data set. Each one of these characters represents a search pattern for the text, and then a command for what it is we're going to return. Why do we use them? Well, they're very compact and flexible. I can solve really complex character data problems with a small handful of regular expressions, and I just can't imagine how else I might do that. It definitely takes messy problems and makes them simpler, but you've got to learn a new concept. If regular expressions are brand new to you, I have a resource that I like. It's free, it's online: it's Debuggex. One of my favorite parts of this site is the cheat sheet, or quick reference guide. If I want to understand what those search characters mean, I can look at this quick reference and start to piece together regular expressions. I don't often use the test part of the website, but if you had some example text and wanted to test your regular expression against it, you could do it here.
I prefer to do that in JMP; it just saves me time, and JMP actually has something, as you'll see, that we can use in a similar way. All right, so that's what they are and a good place to go learn about them; let's take a look at how we use them. I'm only going to fix a couple of these, and we'll fix a couple of them together. I'll speak through what the regular expression means as I type it. Before we even get there, I'd like to recommend that when we're doing this type of work in Recode, we dump the results into a formula column. The reason is that if we decide to add on to this data table, those fixes will persist in the future. If this is the only time we're really going to use it, maybe we don't use the formula column, but I prefer it. In fact, I really like to see that right up top for my personal use. How do we get to the regular expressions? We have a Replace String utility. Again, there are many other utilities; always choose the simplest one. If we're just trying to pluck the first and last word, for instance, I don't want to write a regular expression for that, but in this case I've got some mess I need to clean up, so we're going to use a Replace String to do that. A couple of necessary ingredients: you have to make sure the Use Regular Expressions check box is turned on. Remember that preview Debuggex shows? Well, here's our preview. We only have to type once and adjust once, hopefully, and then we get to see the results. Let's try that out on a couple of these examples. Again, I'm going to speak through what's happening as I type. Let's work with the "New" cases first. Remember, "N." can be either New or North. If I look at that data set, it seems logical to fix the New ones first and then work on North. I'm going to look for the character N in the string in the row of the column I'm analyzing, and then I'm going to look for a period. The reason I'm putting this in brackets is that a period is also a search character; it means any character. To be honest, you could leave the brackets out of this and it's still going to work, but it's a little bit confusing. If we're using a specific character that also exists as a search character, sometimes it's nice to bracket it out; it makes it a little more interpretable. All right, after the "N." we have a space; this backslash-s is the whitespace character. And I'm actually going to type out "Mexico". Now, I could use a search pattern here, a search character like \w* or something like that; I'll explain that a little bit more as I go. But sometimes when you're writing regular expressions, it's handy to have something that's a little bit more readable. I'm choosing to type out the words here, and since there aren't any problems with those words, I'm just going to reference them directly. Now, with a regular expression I can use logic, so I can deal with all the "New" issues but one in a single line. I'll tell you why we're going to deal with the New Hampshire one a little bit differently. Let's do the Mexico, then York. That pipe again is our logical OR, and then we'll do Jersey.
The reason I'm putting parentheses around this is that it allows me to return the result of everything inside the parentheses, and each set of parentheses, left to right, is referenced by 1, 2, 3, in numerical order. I think that is probably good, except, oh, we do need to have that last part, the "-Land" (or, if there were offshore rigs — I don't think there are in New York — we'd want to capture that too). So there's my any-character, and I'm looking for any character occurring one or more times, and I put parentheses around that. Here's where the magic is. Now I can type "New". I have a whitespace in there, so I don't need to actually put a whitespace in the replacement text, but the \1 and \2 reference what's inside the parentheses: the \1 is going to be the logical result of that search inside the first parentheses, and the second is going to capture all the characters that come behind it. Let's scroll down and see how we did. We get a star for anything that was recoded, and it looks good: New Jersey, New Mexico, New York. I think we've done our job here. Again, another classic reason why you want to use regular expressions: in a single statement, I was able to fix three problems. Let's do one more; let's just do a couple more with our "N." and then we'll move forward with this. All right, so we'll go back to Replace String. You can have as many regular expression Replace String commands as you want within Recode. Sometimes, again, that's nice. You could get really clever and fix a lot of issues with a single regular expression, but sometimes it's a little more readable if we tackle them a few at a time, or in the unique cases one at a time. So we use regular expressions; let's deal with the New Hampshire problem. Same thing: we're going to do the N character followed by a period. Then, actually, let's not use parentheses here, because we don't want the "Hamp." itself captured; we just want to look for "Hamp" followed by a period, and then we want to capture all the text that comes behind that. Now we type out New Hampshire with the "-Land" part behind it, scroll down, and it fixed it. That's great. I could do something similar, but we're running short on time, and I want to make sure we at least get to some of the graphing part of this. We could do a similar set of steps to deal with the North and the South and the West and the many others, and this is what it looks like in the final data table. If I look at the recoded formula column, once again it's very nice to have this because it is portable: those are the regular expressions I used to fix all the data problems in that column. Again, the benefit of doing this is, if you're working with a really huge data set, could you imagine going through and hand-typing those fixes every time you want to do an analysis on new data, for instance? It's pretty arduous. We've saved ourselves a lot of time with just a little bit of creativity. That's regular expressions.
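As a rough illustration of the same pattern outside the Recode dialog, here is a minimal JSL sketch that applies the "N." fix in a formula column. The column names are assumptions, and the pattern simply mirrors the one typed into Replace String above.

```
// Hypothetical column names; the pattern is "N." + whitespace +
// (Mexico|York|Jersey) + the rest of the string (e.g., "-Land").
dt = Current Data Table();
dt << New Column( "State and Type Recoded", Character,
	Formula(
		s = :Name( "State and Type" );
		r = Regex( s, "N[.]\s(Mexico|York|Jersey)(.+)", "New \1\2" );
		If( Is Missing( r ), s, r );  // rows that don't match the pattern stay unchanged
	)
);
```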
Again, my intent wasn't to teach regular expressions, but really to show them as an opportunity for folks to investigate, something that can help deal with many of your messy data problems. Let's play around with these new variables. I'm actually going to show this in the same table that has the results, and I'll just show you how I got there. When we finally got all that data cleaned up, we had a date column; if we look back here, hey, there's our date column. I'll use this table. I'm being a little wishy-washy here today, but just so you can see what this looks like if we're starting from scratch, let's just use this one, and I'll reference back to the complete table. The New Formula Column menu has a Date Time group in it, which is really handy. Now, you noticed that we are in a day-month-year format, and that's just when the record was made. In an analysis context, we may really want to look at just month and year, so we're combining some of those and making the time component a little more coarse. That can be helpful in seeing the trends we want to see. So we're just going to choose Month Year, and JMP automatically dumps that formula into a new column; if we look at it, that's what we did. We can do the same thing with year: New Formula Column, Date Time, Year. Now we have two other variables that we can use in an analysis context. Remember the fact that we have land and offshore tags on every state? We probably want to split those out, and to do that, we just create a formula column. I actually used a regular expression for it, but you could play around with other ways of dealing with that. I have a column with a regular expression that's plucking off the "Alabama". If these were just plain state names, we could use the First Word transform, but if I remember right, the dash doesn't work with that. Same thing with type: look at that formula, and I just have a regular expression that's looking for the last piece of that string and returning only the last piece. So I've created four different variables that I want to use for analysis; a JSL sketch of these derived columns follows at the end of this segment. Let's do one more thing. Part of the reason we cared so much about the state names is, one, if we have multiple representations of a state, that's going to complicate our analysis. But the other rationale is that if we need to join it up with something else, like lat-long coordinates for each state, we need to have the state names consistent. In the data set that I have, they could be either all caps or title case, and they need to match to do the join. Joining, that's cool, we can do that. In fact, I think I have an intermediate table that we can open and look at here before we do the join. Let's do a little housekeeping here; I'll leave that one up in case we need to go back to it. Okay, so we have our state; it's been fixed and recoded, and we've created that state variable. Now I want to join it with my US state capitals table, so I'll reopen that file. I'm going to use State 1 and match it with State in the capitals table. Joins in JMP.
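Here is the promised sketch of those derived columns in JSL. It is a minimal illustration assuming the column names from the demo (:Date and the recoded :State and Type column), and it uses Word() with a dash delimiter as a simple stand-in for the regular expressions the presenter actually wrote.

```
dt = Current Data Table();

// Coarser time variables derived from the Date column
dt << New Column( "Year", Numeric, Formula( Year( :Date ) ) );
dt << New Column( "Month Year", Numeric, Format( "m/y" ),
	Formula( Date MDY( Month( :Date ), 1, Year( :Date ) ) )
);

// Split "Alabama-Land" style labels into separate State and Type columns
dt << New Column( "State", Character,
	Formula( Word( 1, :Name( "State and Type Recoded" ), "-" ) )
);
dt << New Column( "Type", Character,
	Formula( Word( -1, :Name( "State and Type Recoded" ), "-" ) )
);
```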
I always like to invoke the join from the table I want on the left, and then I choose the one I want on the right. We're going to match on State and State. An inner join is going to be appropriate here; I mean, if we had any missing state names, I'm not sure how informative those would be for our analysis, so I really don't want them (and in fact, they don't exist). An inner join is appropriate: it's just going to take anything that matches, and any non-matches are left out. Again, hey, I've got this great preview in 17, so we can look at that a little bit. Let me get down to the scroll bar, and right at the very end I should see the capital and the latitude and longitude. Now, I ended up with this extra State column too. If I don't want that, I can select the columns for the joined table. If ultimately this is going to a script, maybe you want to do that just so you don't have to go in and tweak columns after the fact. But that's it for a join. We just glued those two data sets together by matching state names, and we have latitude and longitude. Capital is not necessarily important, so we can just get rid of that column. Back to the automation component: we did several steps along the way, but for each step we do in the Tables menu, there's a source script associated with it. If we have each of the intermediate tables that get us to the final result, we could steal those scripts, glue them together, and automate that workflow; a JSL sketch of the join step follows this segment. All right, that is it for joins. Let's play a little bit in Bubble Plot and Graph Builder. Let's do Graph Builder first, and then with whatever time we have left, we'll get to a Bubble Plot. Again, we're going to enjoy the fruits of our labors here. We exercised a lot of creativity and got a data set that is analyzable and graphable, so let's see what we can learn from it. All right, because we have built-in shape files in JMP, it'll recognize state names and build state boundaries in a map in the graph frame. We're going to drop that state into the Map Shape role. Now, what I may be interested in is the rig count by state, so I'm going to color by that. To me, blue and red aren't all that informative for what I'm trying to get across; I really prefer spectral colors, but no problem, we can change the gradient with a simple right mouse click. I like white to red; it seems to work for me, and that's great. We have the rig count by state, but again, this is really coarse, right? We have a lot of years of data, 22 years of data here, so we may want to drill into these a little bit deeper. I'll build another graph, and we'll look at date and rig count. We're going to turn off that smoother, since it just averaged everything. There we go. What I'm doing this for is that I want to steal the script behind this graph and use it to get those cool hover graphs. Save script to the clipboard, go back to this map, right mouse click, Hover Label, and I'm going to paste the graphlet. Let me see how I did.
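And here is the promised minimal JSL sketch of the join step, along the lines of the source script JMP writes for Tables > Join. The capitals file path and the matching column names are assumptions for illustration.

```
// The cleaned rig count table is assumed to be the current data table;
// the capitals table path is hypothetical.
dt = Current Data Table();
dtCaps = Open( "$DOCUMENTS/US State Capitals.jmp" );

dtJoined = dt << Join(
	With( dtCaps ),
	By Matching Columns( :State = :State ),
	Drop Multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),  // inner join: keep only rows that match on State
	Preserve Main Table Order( 1 )
);
```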
Let's just click down here, because I think we're getting close. JMP knows that I'm only looking at Texas, so it's only going to give me the rig count trend over time for Texas. I believe that is fairly handy. If we look at Louisiana, for instance... well, let's look at North Dakota. North Dakota is interesting because of the shale, unconventional-reservoir boom that happened up there. They had a pretty big spike, they had a pretty big drop-off, and then they're building again in terms of rigs. So we are drilling into this. But hey, this actually has land and offshore in here, so let's split those out. I'll turn the control panel back on and drop Type on Page, and we'll get a different graph for each of those. Maybe we'll make this a little bit smaller, a little bit smaller still. Maybe I want to show missing data: now I have boundaries for all the states, but only the ones that have offshore rigs are colored. Now if I hover over Louisiana, interestingly, you can notice a decline in rigs off the coast of Louisiana in the Gulf. That coincides a lot with what we see in these unconventional wells that are increasing in number quite rapidly in states across the US. So we were able to drill into a few trends, and we were able to do that because we massaged the data; we got it into a format that we can actually use. I think we're up on time. I will post these materials out onto the web, and you'll have example scripts in each one of the data tables that you can run to get the final visualization I was working on; then you can try to recreate it yourself. I'll also have two example data files, but I would recommend, if you're going to try this, actually downloading the data so you get the full experience. With that, I really hope I've highlighted a few different things that could potentially save you a lot of frustration in your data cleaning efforts. Thank you very much.
Anudeep Maripi, Magic Leap   This paper presents how Magic Leap’s Eyepiece Manufacturing team leverages the changepoint detection and correlation matrix functions in the multivariate control charts to easily detect process drift, diagnose issues, and detect the exact moment when an issue occurred -- a previously impossible functionality. Magic Leap’s lithography double-side imprinting process requires precise wafer placement, highly accurate photoresist dispension, and master-template to wafer alignment. Process drifts such as robot wafer placement errors, wafer alignment variation, master-template alignment errors, and measurement metrology variability can cause excursions that result in significant yield loss. These process drifts are often subtle, gradual and interdependent on other parameters that traditional control charts fail to detect.   Our team used changepoint detection and correlation matrix to create an interactive dashboard that collects changepoint time stamps for output parameters and creates a phase in the existing control charts for input parameters. Multiple changepoints are handled using phase columns and correlations with input parameters, template changeovers, and PM or hardware upgrade activities. The dashboard’s correlation coefficient matrix between inputs and outputs compares the correlation before and after a detected changepoint. This dashboard became our daily driver to quickly find faults/process drifts and achieve high yield standards.     Hello,  everyone. This  is  Anudeep  Maripi. I'm  a  senior  process  engineer for  optics  lithography  at  Magic Leap. Today,  I'm  going  to  demonstrate fault  detection by  upgrading  control  charts with  change  point   detection  in  JMP, and  show  you  how the  change  point  detection  integration with  control  charts is  highly  useful on  our  manufacturing  shop  floor. I'll  start  the  presentation with  a  brief  introduction about  our  company,  Magic Leap, and  the  optics  manufacturing, the  DMAIC  problem- solving  approach and  the  JMP  tools  that  we  use in  the  manufacturing, our  challenges  that  we  face in  the  control  phase  of  DMAIC, upgrading  existing  control  charts with  change  point, and  by  doing  so,  detecting and  diagnosing  the  faults  in  a  process some  case  studies  that  we  regularly  use at  Magic Leap  and  a  JMP  demonstration on  how  to  create a  change  point  dashboard and  integrate  change  point into  your  control  charts, and  few  takeaways. To  start  with,  at  Magic Leap, we  envision  a  world with  physical  and  digital  are one. Our  mission  is  to  amplify  human  potential by  delivering a  most  immersive  AR wearable  platform, so  people  can  intuitively  see,  hear, and  touch  digital  content in  the  physical  world. Our wearable  device transforms  enterprise  productivity by  providing  AR  solutions. For  example,  collaboration  and  co- presence between  remote  teams  to  work  together as  if  they  are  in  the  same  room, 3D  visualizations  to  optimize  processes, augmented  workforce that  helps  train  and  upskill  workers using  see- what- I- see  capability. These  are  only  a  few  examples. The  device  has  industry's  leading  optics with  large  field  of  view, best  image  quality, high  color  uniformity, and  dynamic  dimming which  helps  display a  high- quality  virtual  content in  our  real  world. 
The  optical  lens that  is  used  in  this  device is  a  small  component but  a  very  critical  component of  Magic  Leap  2, enabling  best- in- class  image  performance. The  optical  lens goes  through  25  plus  complex manufacturing  process and  metrology  stations during  the  manufacturing. It  is  tested  over  100  plus critical- to- quality  parameters. Each  individual  process  needs  to  reach high  99%  yield  targets to  attain  90% Rolled  Throughput  Yield  targets. We  cannot  achieve  these  high  yields and  process  capability without  the  help  of  JMP  statistic  tools. One  of  these  high- yielding  processes is  imprint  lithography where  I' m  responsible for  quality  and  delivery  tools. This  process  has  many  steps, and  that  is  visualized on  the  right- hand  side  of  your… The first  step  is  where  we  align the  mask  to  the  template and  then  dispense fo r  resist, lower  the  template  and  EUV  cure to  replicate  the  exact  pattern from  the  mask  template  to  the  wafer. This  imprint  process  alone has  22  critical- to- quality  parameters that  operates  between  tight  control  limits such  as  just  20  nanometers  film  thickness and  100  microns  drop  accuracy. As  you  could  see, these  tolerances  are  extremely  small. For  reference,  human  hair is  just  20  microns  to  100  microns. Thus,  the  imprint  lithography  requires a  precision  placement  and  alignment that  are  indistinguishable to  the  human  eye. Meaning  we  rely  a  lot  upon onboard  metrology  and  vision  systems that  generate  tons  of  data from  large  numbers  of  input  parameters. JMP  helps  us  to  easily  analyze these  large  data  sets in  our  problem- solving  processes. We  use   DMAIC  approach in  our  problem- solving  process. Following  are  the  JMP  tools that  we  use  in  every  phase of  our DMAIC  processors,  respectively. Like  many  manufacturing  teams, we  found  out  Control  phase as  most  challenging  and  overlooked  phase, because  at  a  start up  like  Magic  Leap, we  continuously  evolve,  make  improvements, and  iterate several  new  designs  and  revisions. The  control  phase  is  overlooked because  it  takes  longer  time for  the  confirmation. For  example, we  find  a  problem  or  an  issue, say  like  process  exceeded  control  limits and  giving  low  CPK  values, resulting  in  low  CPK during  our  Control  phase. We  define  the  problem, we  measure  the  historical  data, analyze  the  problem by  multivariate  methods and  correlation  matrices, improve  the  problem by  implementing  the  corrective  actions, by  conducting  DOEs and  response  screening. But  once  this  is  resolved, we  move  on  to  our  next  severe  problem and  move  on  to  severity  3  problem once  you  resolve  the  severity, severity  2  problem, and  the  cycle  continues, overlooking  the  most important  phase  of   DMAIC, which  is  the  control  phase. This  was  the  reason  we  came  up with  the  change  point  detection  dashboard, which  is  fast  and  efficient. It  pinpoints  issues  faster, easier  to  visualize  faults  and  changes. Even  operators  who  are  not  proficient with  the  statistics can  use  this  dashboard to  diagnose  the  issues. 
For example, before introducing this change point dashboard on our manufacturing shop floor, the escalation process looked like the left side: the operator monitors the yield dashboard and identifies the yield losses and faults. The operator reports to a technician to look into the faults. If the technician is unable to resolve it, it escalates to the engineer. The engineer analyzes the control chart data using control chart alarms and trends, manually joins these outputs to the inputs, runs the correlation, diagnoses the fault, and then implements the corrective actions. But after introducing our change point dashboard on the manufacturing floor, the operators, technicians, and engineers all monitor the change point dashboard on the shop floor. If a change is found, they look into the correlation of inputs before the change and after the change, easily diagnose the issue, and then easily implement the corrective actions. This helped us transition easily into our TPM model, where operators, technicians, and engineers all come together to minimize the faults. The change point dashboard makes our control phase very efficient. I'm going to demonstrate in the following slides how this is efficient, as well as how to make these change point dashboards. Before showing the change point dashboard, I want to show you our journey from traditional control chart monitoring to the change point detection dashboard. While traditional control charts are very helpful for monitoring excursions in our process using Westgard rules, and they show the immediate out-of-control points, they often miss the subtle drifts in the process, which are very gradual but still significant. These control charts are very helpful for monitoring over time and also tell us what parts are impacted. But as I mentioned before, we have 22 critical parameters in this process alone, and there are tons of input parameters that impact those 22 critical parameters, which means we would need many control charts monitored at the same time to have a stable process. So then we moved on to model-driven multivariate control charts. With these control charts, we monitor both inputs and outputs together, and we immediately get a correlation between them. But unfortunately, they don't have a time series or part IDs to help deploy this on the manufacturing shop floor. The multivariate correlation is very important for us, giving the relationship between our outputs and inputs in the process, but again it doesn't have the time series data, like the control charts, to identify when an issue started. The change point detection, on the other hand, we found very helpful for easily detecting when a fault, change, or significant abrupt drift happened in our process. But it only gives a value for when a change point occurred; it doesn't have the time series and is not very valuable as a monitoring tool on the manufacturing shop floor on its own. Each one has its own advantages and disadvantages, so we combined all of these control charts into one dashboard, leveraging the change point detection function's values inside it.
Our next logical step is regression, which is cause-and-effect analysis: find the cause and effect and interlock the tool inputs, stopping the failure on your CTQs, the critical-to-quality parameters, or on your outputs, and also move to prediction analysis. This is one example to show you how the control charts look different when you integrate the change point detection. On the left, I have a control chart without the change point detection. You can see there are a few ups and downs between the control limits. But when I take this value of 256, which the change point detection gives you when you run it on the same data, and integrate it into the same control chart, you can see the visualization got better. In this case, before the change point you have a good amount of variation, you have a baseline, and the mean is below your target line; after the change point, your mean is above the target line and your variation has reduced. In this way, you can easily find and visualize what the change is in your existing control charts. Once we found out how valuable this change point is, quickly telling us where the change or the fault happened in the process, we started with some use cases, such as this one where we introduced a change and wanted to validate it. In this example, we have a process giving a low Cpk value because the population is mostly distributed toward the positive side. We identified that it needed a mean shift, and we introduced a known change, a mean shift. When we introduced this mean shift, you could see the process became more capable, with capability around 2.3 versus 1.5. And you can see, when we deploy the change point detection function on the same data, the change point gives you the value and the direction, whether your change has taken place in your process or not. This is highly visual for us to evaluate whether the known change I just introduced is impacting my process positively or negatively. The second use case: as I mentioned, we have a lot of onboard metrology tools in our process, which generate a lot of data. We do periodic measurements and testing on these metrology tools, but we always rely upon the repeatability and reproducibility data, the Gauge R&R data, from these charts. The Gauge R&R data does not find any faults in the process or tell us when the metrology tools started drifting, either positive or negative. But for the same data, when we deployed the change point function, it immediately tells us when subtle changes and abrupt, significant drifts happened in the metrology tool, too. That is highly helpful for making sure our metrology tools are still stable and the data are not drifting. The third and most important one is to detect and diagnose faults with ease. This is a dashboard that we worked on to deploy on our manufacturing shop floor.
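To make that integration concrete, here is a minimal JSL sketch, assuming a detected change point at row 256 and hypothetical column names (:Film Thickness as the output, :Input 1 and :Input 2 as inputs). It builds the phase column, a phased control chart, and before/after correlation reports of the kind described above.

```
dt = Current Data Table();

// Phase column: rows up to the detected change point vs. rows after it
dt << New Column( "Phase", Character,
	Formula( If( Row() <= 256, "Before change", "After change" ) )
);

// Existing control chart upgraded with the change point as a phase
Control Chart Builder(
	Variables( Y( :Film Thickness ), Phase( :Phase ) )
);

// Correlations of the output with the inputs, computed separately per phase
Multivariate(
	Y( :Film Thickness, :Input 1, :Input 2 ),
	By( :Phase )
);
```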
Our workflow would be: we monitor multiple response parameters, or critical-to-quality parameters, as outputs, and then we identify whether there are any changes using the change point dashboard. At the same time, we also look into the model-driven multivariate control chart dashboard, which tells us what the excursions are and easily correlates those excursions to the input parameters. But the true value came when we took these change point values and integrated them into our existing control charts. In this example, we have baseline data before the change and data after the change, which is when the fault occurred; you can see there is a slight mean shift and a huge variation in our process. At this point, we go to our correlation matrix. It correlates our output parameters to our input parameters in the process. The correlation matrix will immediately tell you which factor has a strong positive or negative correlation during this phase compared to during your normal phase. That will immediately tell you what input parameter has gone wrong or might have contributed a lot to this fault area. That is how we are able to quickly find the faults and diagnose the issues. From here on, I'm going to demonstrate in JMP how to create these kinds of dashboards and, at the same time, how to handle multiple change points; as you could see in our previous slides, the change point function gives only one value. Let me turn to JMP. In this example, I'm using the Boston Housing data, which is from 1978. It's a good example for a beginner learning correlation and regression, and many tutorials use it, so I thought I could use this example, which is readily available in everybody's JMP sample data sets, see what the correlations are, see whether there are multiple change points in the data, and at the same time correlate them to the factors. The sample data table looks like this. In our case, I want to see the median market value, the median price of the owner-occupied homes in Boston, and see how it varies with time. At the same time, there are corresponding factors, like the crime rate in the town, the pollution rate in the town, or other factors such as the age of the homes or the distance measures in these regions. You can read about these column names in the column notes that are presented over here. I'm going to run the script, which gives us... Before running the script, I also want to tell you that I'm attaching the entire script as part of my presentation, where you can go through the script or run it, and it will give you a nice dashboard about the variation in Boston housing values and also the correlations. You can download the script, look into my code, and use the same methodology and logic for how I'm able to detect multiple change points from the dashboard using the JSL script. When I run the script, it runs and I get a nice dashboard like this. As I mentioned, our problem-solving framework starts with looking at the change points in the median value of the homes in Boston.
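If you want to follow along, a minimal sketch of the starting point might look like this; it assumes the standard Boston Housing sample table and just a handful of its columns (mvalue, crim, nox, rooms), not the full dashboard script the presenter attaches.

```
// Open the sample table that ships with JMP
dt = Open( "$SAMPLE_DATA/Boston Housing.jmp" );

// Correlation matrix of median home value against a few candidate factors
Multivariate( Y( :mvalue, :crim, :nox, :rooms ) );
```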
You can see that for the median value it found a big change; it tells you the change point appears to be at row 373, and there is also a second, smaller change. When you look at the model-driven control chart data, you can see there are high excursions in this median value of the housing market, and you can easily tell what caused an excursion by looking into your input parameters, such as the crime rate. You can hover on any of these and find out what is driving these huge excursions in your monitoring variable, which is the median value of the houses. As I mentioned, the real value comes out when we take these and integrate them into our control charts. See this one: this control chart has those change points integrated into it, and it gives you a good visual. You can see now, in the Boston housing market example, we have a baseline of median value changes, a baseline value of around 20 (the values are in thousands of dollars). Then, when the housing boom happened, when change point 1 occurred, you can see there is a good amount of variation in the housing market and, at the same time, a mean shift too. But soon there is another change, and the housing market crashes, resulting in your median value sitting below your baseline. This easily gives us a visual, in phases, of what the baseline data is, when the first change occurred and how large it is, and when the second change occurred and how large that is, and you can compare it with the previous two phases. If you're a prospective homebuyer in Boston and you're looking at this kind of data, you want to know what might have happened to drive these kinds of changes. In your data set, you have some nice factors: the crime rate in those same areas over the same period of time, the pollution rates, et cetera. If you look at the before-change phase, you can see how the market value correlates with your factors. In this case, there's a good amount of negative correlation with the crime rate, negative correlation with the industrial area and also with pollution, and there is a strong positive correlation, about 0.8, with rooms in this phase. Now look at the change point 1 phase, where the housing boom occurred [inaudible 00:22:46] and the mean shifted upwards. You can see that in this time frame the crime rate has low or no correlation with these changes, pollution is low again, but rooms keep the same strong positive correlation in both phases. If you ask me what might have happened after this, I would say maybe the crime or the pollution increased in the area, the kinds of dominant, visible factors that are not favorable for a prospective home, and that leads into the housing crash when it happened. Compare this phase to that phase at the change point to see how the correlation changed.
You  can  immediately  tell, "Okay,  my  comparison with  the  median  market  value from  this  phase, my  correlation with  the  client  is  pretty  low, my  rooms  has  very  positive  correlation, and  there  is  lstat  and  other  factors." But  if  you  come to  the  change  point 2  time, you  could  see  the  crime  rate has  increased  again, your  pollution  has  increased  again, which  resulted  in  the  negative linear  correlation  of  this  change. When  you  look  at  your  rooms, which  has  been  positively  contributing for  your  change  has  fell  down  completely. In  this  way,  this  is  telling  you  a  lot of  information  data  about  these  changes that  occurred  in  1978. Also,  easy  correlationship  data with  your   factors that  might  have  impacted  these  changes. I'll  repeat. Our  framework  is  detecting, for  any  time  series  data, we  first  detect  number  of  changes, if  it  is  one  change  or  two  change or  more  changes, and  then  integrate  these  change  points into  our  control  chart  to  get a  better  visualization  about  the  changes. Is  it  a  mean  shift  or  variation shift, or  what  is  happening? And  then  look  into  your  factors  that  might have  contributed  to  these  changes and  see  what  factors a  strong  contribution  for  this  change or   weak  contribution  or  no  contribution or  negative  contribution. Now  that  you  know  the  crime  rate, your  market  value —the  median  market  value is  impacted  by  the  crime  rate, pollution,  and  the  rooms— let's  look  at  a  simple  visualization. When  you  go  here,  you  can  see this  is  a  simple  graph  plot. I'm  taking  my  change  point and  visualizing  it  in  a  better  way. I'm  comparing  my  median  value with  the  nox,  the  pollution  value in  the  same  towns, in  the  same  time  period and  the  rooms  and  the  crime  rate. You  could  see  when my  housing  median  market  values  increase or  the  housing  booming  phase, you  could  see  the  pollution  is  very  low and  the  rooms  for  dwelling  has  increased and  your  crime  rate is  at  least  possible  state. As  a  prospective  homebuyer, these  are  all  favorable  factors  for  me, no  pollution  area, no  crime  area or safe  area, and  then  I'm  getting  a  bigger  home with a  convenient  number  of  rooms. But  let's  see,  in  this  phase, when  the  housing  market  crashed, at  the  same  time, your  pollution  rate  has  increased. The  number  of  rooms  for  dwelling has  decreased. And  then  there's an  astronomically  high  crime  rate which  is  impacting  the  same  thing. This  is  a  clear  visual, but  you  landed  up  in  this  problem or  with  this  kind  of  conclusion with  the  help of  your  correlation  matrices. Correlation  matrix and  then  a  change  point. Your  correlation  matrix and  then  the  change  point and  then  a  better  resolution  like  this. Let  me  go  and  show  you how  the  JSL  script is  pooling  multiple  change  points. You  could  find  the  change  point  data in  your  control  charts, multivariate  control  charts. And  again, this  is  a  multi variate  control  chart. You  could  use  as  many  as  variables as  your  multivariate  control  chart input  parameters. But  in  my  case, I'm  using  median  market  value, and  I  get  a  multivariate  correlation and  a  change  point. Clearly,  see,  it  gives  me  373  as  a  number. 
How  I  am  taking or  extracting  this  373  number is  using  this  kind  of  JMP  or  JSL can  extract  any  of  the  text  box  columns from  any  of  your  reports and  then  put  it  into  your  data  table. In  this  case, I  collected  this  data  point. You  could  see   the  equation  1  holds the  phrase  call  or  the  text  box  call. The  change  point  appears  to  be  373. This  one  simple  code  will  hold or  extract  that  value  from  the  report. And  then  I  take  that  into  my  columns in  the  change  point  1  column, and  I  extract the  numerical  value  out  of  it. I  wrote  the  JSL  as  simple  as  possible. I'm  very  beginner with  scripting  and  coding. You  could  see   I  used  simple  formulas and  simple  methods  to  get this. You  can  see  the  change  point  report is  only  giving  you one  change  point  value. But  you  could  see  in  your  process, there  are  other  changes,  too, that  you  might  be  interested  in and  we  want  to  find  them  out. What  I  do  is once  you  get  your  first  value, you  hide  and  exclude  everything  after your  change  point. After  this  373,  you  exclude  everything. Now  you're  left  with  this  phase. You  want  to  see what  is  my  change  in  this  phase, and  the  cycle  continues. For  example, in  my  equation  2  from  my  report  2, I'm  pulling  the  text  box  phrase called  the  change  point. Sorry,  this  is  hard  to  show. When  you  hover  on  this, you  can  see  that  variable has  the  phrase  [inaudible 00:29:49]. Variable  has  the  phase  of  the  change  point appears  at  row  157. You  take  the  value and  then  put  it  in  your  column  call, CP  Text,  something  like  that, where  you  could  extract your  second  change  point. Once  you  get  a  data  like  that, it  is  very  easy  to  construct a  dashboard  like  this, where  you  create your  control  chart  dashboard, each  individual  components, model- driven  multivariate  control  chart, and  create  the  dashboard  using the  new   dashboard  application. Once  you  create a  nice  dashboard  like  this, you  just  go  here  and  you  just take  everything  into  the  script. I'm  switching  down  to  my  presentation. I  hope  the   Boston Housing  data  example is  very  clear, because  how  easily  you  can  detect changes  in  any  time  series  data and  correlate  those with  your  input  parameters. In  real  life  or  manufacturing  scenario, you  would  have several  input  parameters  like  this. In  this  case,  in  my  dashboard, we  look  into  around  50  input  parameters comparing  with  our  output  parameters. We  also  monitor the  multivariate  control  charge. You  could  see   I'm  monitoring multiple  CTQs  together instead  of  each  individual traditional  control  chart. I'm  exporting  these  values into  my  existing  control  chart and  then  immediately  diagnosing  my  faults. This  dashboard,  as  you  can  see, it's  pretty  self- explanatory, easy  to  read,  easy  to  deploy on  your  manufacturing  floor. Technicians  even  not  comfortable with  statistical  analysis, and  doing  this  kind  of  building control  charts  and  everything, they'll  just  look  at  the  process saying  that, "O kay,  I  have  two  changes  in  my  process, and  let  me  see what  are  the  input  parameters that  are  correlating  these  changes and  see  if  I  can  change  them or  manipulate  them  so  that  my  process back  to  stable  state like this. 
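Going back to the scripting detail from earlier in this section, here is a compact sketch of the extract-and-repeat loop, continuing from the launch sketch above. It assumes mcc is the open multivariate control chart with change point detection on, that Text Box( 1 ) is the report box containing the phrase "Change point appears at row NNN" (the index can differ in your report), and it reuses the change rows quoted in the demo (373 and 157) for the phase column.

```jsl
// Sketch of the multiple-change-point extraction described in the talk.
dt     = Current Data Table();
rpt    = mcc << Report;
phrase = rpt[Text Box( 1 )] << Get Text;   // assumed text box index
cp1    = Num( Regex( phrase, "\d+" ) );    // first run of digits, e.g. 373

// Hide and exclude everything from the first change point onward, then
// relaunch (or Redo Analysis) so a second change point is estimated on the
// remaining rows; parse its report the same way to get cp2 (157 in the demo).
For( i = cp1, i <= N Rows( dt ), i++,
	Excluded( Row State( i ) ) = 1;
	Hidden( Row State( i ) ) = 1;
);

// Restore all rows, then build a phase column from the two change rows to
// support the before/after correlation comparison shown in the dashboard.
dt << Clear Row States;
dt << New Column( "Phase", Character, Nominal,
	Formula( If( Row() < 157, "Baseline", Row() < 373, "Change 1", "Change 2" ) )
);
dt << Multivariate( Y( :mvalue, :crim, :nox, :rooms, :lstat ), By( :Phase ) );
```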
Again,  the  correlation  coefficients are  pretty  easy  to  explain  to  anyone on  the  manufacturing  shop floor. For  example,  a  strong  positive  correlation would  give  you  a  blue  number  closer  to  1, or  a  strong  negative  correlation would  be  something  like  -0.7, but  the  variation would  be  something  like  this when  you  increase  something. With  subtle  bit  of  variation, you  would  see a  negative  correlation  there, and  then  a  weak  negative  correlation would  be  something  like  this, a  lot  of  variation instead  of  a  straight  line like this. And  then  there's  no  correlation. The  number  would  be pretty  much  close  to  0. You  would  see, even  if  you  increase  one, the  other  might  be  decreasing. It  is  very  easy  to  explain. It  is  very  visual on  the  manufacturing  shop  floor to  find  out  these  kind  of  numbers, looking  to  see  what  input  has  changed and  what  input  has  a  strong  correlation. There  is  an  added  bonus with  the  change  point  integration. As  you  can  see, these  are  all  numerical  data, which  is  getting a  numerical  correlation  like  this. But  the  change  point  is  telling  you when  the  fault  has  occurred and  when  you  know when  the  fault  has  occurred, you  just  look  into  your  data  such  as template  change   or  your  load  port  IDs, and  you  will  get  a  correlation  with  your non- numerical  data   [inaudible 00:34:10] . You  would  start  a  discussion on  your  manufacturing floor. Hey,  when  this  time  occurred, I  did  this  change  over  PM or  the  tool  is  setting  idle, or  I  did  a  prevent  day to  maintenance  at  this  exact  time. Maybe  that  prevent  day to  maintenance  time, some  issue  has  occurred. This  is  giving  you  a  full  picture. You're  looking  into   tons of  50  plus  input  parameters and  easily  getting  a  correlation with  your  CTQs and  easily  detecting  the  faults. At  the  same  time,  you  have non- numerical  data  such  as  like  this, and  you  find  out  and  you  relate  to like, what  was  I  doing  when  this  change or  fault  has  occurred  in  my  process? The  final  takeaway is  by  integrating  change  point  function to  our  existing  control  charts. We're  quickly  finding  faults and  diagnosing  them  faster, saving  millions  of  dollars in  our  fast- paced  manufacturing. With  JSL  scripting, we're  able  to  find  out  there  are multiple  change  points  in  our  process and  integrate  them to  our  control  chart  dashboards. This  change  point  and  multivariate  methods can  be  used  in  any  time  series, as  I  was  mentioning. For  example, when  I'm  preparing  for  this  presentation, I  gave  a  small  set  of  our  monthly  savings data  set  to  my  wife who  is  not  familiar with  statistics  and  control  charts. Upon  tracking  this  dashboard, my  wife  easily  found there  are  change  points  in  our  savings. But  at  the  same  time, she  did  not  like  the  fact that  when  the  dashboard  pointed  out that  our  Amazon  spending is  well  correlating for  our  low  saving  months. As  you  can  see,  like  this  time, this  change  point, you  could  use  it  on  any  time  series  data. You  could  find  out  your  changes in  your  time  series and  when  the  change  has  occurred or  when  the  significant  change has  occurred, and  then  you  can  correlate to  your  input  parameters. 
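For reference, the number being read off each cell of these matrices, here and in the Boston Housing example, is the Pearson correlation coefficient, which always falls between -1 and +1 (a standard definition, not anything specific to JMP):

```latex
r_{xy} \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}},
\qquad -1 \le r_{xy} \le 1 .
```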
In our savings example, we were simply correlating Amazon spending, take-out dinners, and several other factors [inaudible 00:36:18]. That's pretty much it. I hope you found this presentation useful, and that you learned how easily you can find multiple change points through JSL scripting and create simple dashboards to detect faults and diagnose them quickly. Thank you.
This STEM paper studies the time series Antarctic glacier mass from April 2002 to March 2021. The objective of this paper is to forecast the Antarctic glacier mass level for 2021-2041. Among four STEM components: Science is geoscience of the glacier; Technology is using the GRACE-FO satellites to collect glacier ice sheet mass data; Engineering would focus on the COVID-19 factor on the glacier melting rate, and mathematics is mainly on time series ARIMA models. Both non-seasonal and seasonal ARIMA models were studied and compared. Both the 12-month seasonal pattern and long-term year-to-year trend were significantly observed. The glacier melting rate was 2% faster based on the seasonal ARIMA model. Smoothing models were also significantly identified in the seasonal ARIMA model to smooth out the random noise component to enhance the time series trend and the seasonal component to enhance the forecasting model. Forecasting glacier melting for 2021-2041 would be a challenging task to address both seasonal and trend components for a longer horizontal time from today. The prediction interval would become too wide to predict the future glacier melting rate, if more than 5 years away. The seasonal ARIMA model could provide a better fit than the non-seasonal ARIMA model. STEM methodology is a powerful and holistic way of conducting scientific research projects by modern GRACE-FO Technology in a practical engineering sense through a mathematical ARIMA forecasting analysis.      Hi,  everyone. I'm  Mason. T oday, I'll  be  presenting  a  study on  Antarctic  glacier  melting rate using  time  series  platforms. The  motivation  behind  this  project was  that  we  wanted  to  investigate the  long- term  effects  of  climate  change, and  we  targeted  places most  affected  by  global  warming, which  are  Antarctica  and  Greenland. Previously,  we  tried  using smoothing  and  decomposition  techniques to  study  and  forecast the  glacier  melting  rate. But  many  of  those  models  had quite  important  limitations. For  example,  they  were  unable to  consider  the  seasonal  or  trend  pattern. To  improve the  forecasting  accuracy  and  precision, we  wanted  to  try  other  methods, such  as  the  ARIMA  model, which  is  our  main  focus  for  today. The  objective  of  this  presentation is  to  utilize  time  series  platforms   in JMP to  examine  the  glacier  melting  mass  data from  2002  to  2021, and  to  forecast  the  glacier  melting  rates for  the  next  20  years. Why  are  we  studying  glacier  melting  rates instead  of,  for  example, atmospheric  temperature? Should  we  study  the  Greenland  ice  sheet or  the  Antarctic  ice  sheet? We'll  be  studying  the  Antarctic  data because  the  rate a t  which  the Thwaites   glacier in  Antarctica  is  melting has  been  rapidly  increasing in  the  past  years in  terms  of  the  surface  height. The   Thwaites glacier  is  significant because  it  is  the  broadest  glacier in  the  world and  already  contributes  to  4% of  global  sea  level  rises. But  what's  more  concerning is  that  recently,  in  2021, scientists  found that  there  was  more  warm  water underneath  the  glacier than  previously  thought, which  could  have even  more  dire  consequences in  terms  of  further  contributing to  sea  level  rises. 
We  want  to  help  forecast the  Antarctic  glacier melting rate to  inform  the  public about  the  effects  of  global  warming and  bring  more  awareness to  the  problem that  climate  change  can  cause. We  got  our  data  from  the  NASA  website, as  shown  on  the  right, and  we  transformed  the  data  into  JMP, as  you  can  see  on  the  left  side. Now,  the  Antarctic  mass  is  measured in   giga metric tons, and  1  metric  ton  is  equal to  1,000  kilograms. When  you  get  metric  ton, it's  equal  to  10¹²  kilograms. The   GRACE-FO mission, which  is  where  th is data  was  collected, measures  the  mass  variation. It's  not  the  total  mass, which  is  practically impossible  to  measure, but  the  change  in  mass relative  to  April  2002, when  the  GRACE  mission  started tracking  glacier  mass  variation. Previously,  as  I  said, we  use  the  smoothing anti- composition  models, but  these  techniques  either  fail to  consider the  nonlinear  downward  trend in  glacial  mass since  the  glacier  melting  rate is  increasing  over  the  years, or  these  models  failed  to  consider the  seasonal  variations. Warm  months  are  going to  have  a  faster  melting  rate. We  wanted  to  use  the ARIMA  model because  we  hope  to  improve  the  trend so  that  it  is  nonlinear and  also  incorporates the  seasonal  component  at  the  same  time. We  also  hope  that  the   ARIMA model can  help  further  narrow the  prediction  interval so  that  our  forecasts  are  more  precise. There's  two  types  of  ARIMA  models. There's  nonseasonal  and  seasonal. The  nonseasonal   ARIMA model does  not  consider that  there  is  a  seasonal  pattern, while  the  seasonal   ARIMA model  emphasizes that  there  is  a  seasonal  component before  the  model  is  generated. The  nonseasonal ARIMA  model, it does  implement  decomposition and  searches  for a  seasonal  component, but  it  has  no  knowledge of  the  seasonal  lag  period which  should  be  12  months before  it  generates  a  model. Now,  glacier  mass   variation should  have a  seasonal  pattern because  we  expect  glaciers to melt  faster  during  the  summer  months and  accumulate  during  the  winter  months. But  from  a  previous preliminary  time  series  analysis, we  do  not  see an  obvious  lag  period  of  12  months. We  aren't  really  sure what  the  optimal  seasonal  width  is because  of  growing  weather  inconsistencies as  a  result  of  global  warming. Without  specifying what  our  seasonal  lag  is, we  can't  use  the  seasonal  ARIMA  model. It's  also  common  practice to  use  the  non seasonal  ARIMA  model to  verify  that  lag  period, and  then  run  the  seasonal  ARIMA model once  we  know what  the  seasonal  lag  would  be. First,  we'll  run the  non seasonal ARIMA  model to  confirm  the  seasonal  lag  period is  indeed  12  months. Then,  we'll  implement the  seasonal  ARIMA  model based  on  the  optimal  seasonal  lag to  better  forecast the  glacier  melting rate in  the  next  20  years. If  you  look at  the  model  results  for (0, 1, 0) which  is  the  best  nonseasonal  model, you  can  see  that  the  slope is  not  significant. The  p-value  is  0.18, and  the  parameter  estimate  is   -10.42. Every  year,  the  glacier  mass  is  forecasted to  decrease  at  about  10 giga metric  tons. However,  this  model  is  not  significant, which  may  indicate that  we  need  to  use  a  seasonal  model. 
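To make the two model families concrete, the quoted best nonseasonal fit, ARIMA(0,1,0) with drift (the slope of about -10 gigatonnes per year), and the general seasonal form ARIMA(p,d,q)x(P,D,Q) with period 12 can be written in standard backshift notation; the specific seasonal orders JMP selects are not restated here.

```latex
(1-B)\,y_t \;=\; \mu + \varepsilon_t
\qquad \text{(nonseasonal ARIMA(0,1,0) with drift)}
```

```latex
\phi_p(B)\,\Phi_P(B^{12})\,(1-B)^{d}\,(1-B^{12})^{D}\,y_t
 \;=\; \theta_q(B)\,\Theta_Q(B^{12})\,\varepsilon_t
\qquad \text{(seasonal ARIMA}(p,d,q)\times(P,D,Q)_{12}\text{)}
```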
We see that lag 12 has the highest autocorrelation among lags greater than zero in the graph on the right. The autocorrelation plot further confirms that we should be using a seasonal lag of 12. After running the nonseasonal ARIMA model, we wanted to compare the (0, 1, 0) nonseasonal model and the best seasonal model. The nonseasonal ARIMA model is shown in dark pink and the seasonal ARIMA model is shown in light pink. The colors are a bit similar here, but you can see that the prediction interval for the seasonal model is much wider than for the nonseasonal model, and the prediction interval from the seasonal model reflects the seasonal pattern. Interestingly, the overall trend for the seasonal model is much steeper than for the nonseasonal model, which may indicate that if we do not decompose the seasonal component, the seasonal pattern ends up acting as random noise that dilutes the signal and makes the slope less steep than it should be. The prediction interval for the seasonal ARIMA model is larger, most likely because it considers the seasonal variation, which is another source of uncertainty. However, we do not want to lose the seasonal pattern in the forecasts, since we want to predict the glacier mass variation for each month, not just each year. If you look at the ACF graphs on the bottom left, the seasonal ARIMA model has a much smaller peak at a seasonal lag of 12, which is right over here, than the nonseasonal ARIMA model, which is on the right. It's hard to see because the graphs are overlapping, but for the seasonal ARIMA model, the residual autocorrelation is approximately zero for lags greater than zero, which shows that we chose a good lag period. Also, from the table on the right, the MA2,12 term is significant, which once again shows that 12 is a good choice for the seasonal lag. MA2,12 is the seasonal moving average term at a seasonal lag of 12 months. In conclusion, we applied nonseasonal and seasonal ARIMA models to forecast the Antarctic glacier melting rate for the next 20 years. While the nonseasonal ARIMA model can predict the general downward trajectory of glacier mass variation, it fails to consider the seasonal pattern in the forecasts. The seasonal ARIMA model can forecast both the seasonal and trend behaviors, but its prediction interval is much larger. The seasonal ARIMA model also has a slope that is 20% steeper than the slope found from the nonseasonal ARIMA model. That's all I have for today. Thanks for listening.
This paper describes an approach for driving yield improvements by analyzing process performance data with JMP. Analysis of performance data -- including process long-term and short-term capability, stability and statistical control -- is particularly useful when monitoring hundreds of process KPIs retrospectively. Modern manufacturing requires many process and metrology steps to ensure healthy product lines and high-quality products. During the production ramp-up phase, identifying the processes of most concern is highly challenging. Using JMP scripting and quality data analysis, Magic Leap’s Eyepiece Manufacturing factory implements an automated process that can pull, analyze, visualize, correlate, predict and verify factory yield improvement based on a variety of performance metrics.      Hello  everyone. My  name  is  Harry  Dong. I'm  the  director  of  the  Optical  Process Engineering  Group  at  Magic  Leap. I'm  so  excited  to  be  here to  present  my  topic: Factory  Yield  Ramp- up  Approach through P rocess  Performance  Metrics Guided  Improvement  Activities. Today  I'm  going  to  cover Factory  Process  Performance  Overview, the   Process Screening  Tool and  the  Statistical  Process  Control using  JMP Script  to  automate  the  process capability  analysis  and  the  conclusions. People  have  been  talking about  process  performance, S uch  as  capability  stability, but  what  are  they? Basically,  process  capability  is  a  measure of  the  ability  of  the  process to  meet  the  specifications. While,  the  process  stability  refers to  the  consistency  of  KPIs  over  time. It's  very  important  to  realize there  is  no  inherent  relationship between  process stability  and  process  capability. Thus  it is both  extremely  important aspects  of  any  manufacturing  process. As  you  can  see from  the  bottom- left  picture. It's  showing, process  can  be  both  capable and  stable  meantime, this  is  a  perfect  world. Basically  your  process  is  super  tight against  your  lower  spec  limit, upper  spec  limit. But  on  the  other  side. When  you  started  seeing  instability, when  your  variation  is  small, you'll  still  be  able  to  meet the  process  specifications. But  over  time, you  will  see  a  lot  of  variations. The  other  scenarios  can  be your  process  is  super  stable. Over  time  you  don't  see a  lot  of  up  and  downs, But  because  it  could  be  like  a  small a three  Sigma, a lower  spec  limit,  upper  spec  limit, you  have  large  variation, then  you'll  not  be  able to  meet  your  specification against  a  high  yield  target, high  process  capability  target. The  worst  case  can  be your  process  is  not  capable, which  means  you  have  large  variation and  over  time  you  also see  a  lot  of  stability  issues. It's  everywhere, Across  the  whole  industry, monitoring  factory  process  using the  process  performance  plot becomes  very  useful. It's  including  capability,  stability in  the  combined  metrics. As  you  can  see, this  is  a  JMP- generated process  performance  plot. On  the  x- axis  you  see  the  stability  index and  the  y- axis  is  indicating the  capability  overall  which  is  a  Ppk. Eventually  this is  a  four  quadrant  plot. We  want  to  push  everything  low  stability and  high  process  capability which  is  in  the  green  zone. 
Often  you'll  see  some  process not  capable  which  is  under  this  line or  not  stable,  which  is  across the  vertical  line  showing  in  this  graph. I'm  going  to  talk  about the   Process Screening  function, JMP  provided. This  is  a  very  powerful  tool, if  you  are  talking  about quickly  identify  unstable  process or  some  incapable  process,  meanwhile. Basically  this  is  a  JMP- generated  report, as  you  can  see, I  have  24  processes  listed  in  this  report. Very  quickly  you  can  see stability  index  is  showing  up. How  do  you  define  the  process  stability? It's  calculated  using the   Within Sigma  and   Overall Sigma, This  is  the  ratio  using  the   Overall Sigma divided  by   Within Sigma, What  do  they  mean? Overall Sigma  usually  treated as  a  longterm  process  variation and  the  Within Sigma  treated as a  shortened  process  variation. So  JMP  has  certain  rules that  you  can  refer  to for  this  calculation. But  basically  you  can  define  color  codes. How  do  you  see  your  process  stable? You  can  use 1,  1.3, whatever  the  number  you  want, to  color  code  that. For  me,  as  you  can  see, I color coded  process  greater  than  1.7 as a  process  not  being  stable. Yellow  as  a  process is  kind  of  marginal, And  the  green  zone  meaning the  process  is  super  stable. This  is  telling  us  some  process can  be  stable  but  not  capable, Because  you  can  see the  Ppk  Cpk  are  kind  of  low, But  on  the  other  side, you  can  see  a  bunch  of  process, they  were  not  stable, but  some  of  them  are  very  capable, Again,  this  is  aligned with  what  I  went  through  earlier. Basically,  utilizing the   Process Screening  tool, you  can  quickly  identify the  unstable  processes as  I  marked  in  this  graph. Meanwhile,  as  you  can  see in  this  Process  Screening  tool, you  can  also  see  the  control  chart  alarms based  on  the  samples that  the  raw  data  you  put  in  the  report. We  finally  deploy  the  control  chart, called  statistical  process  control  chart, be  able  to  monitor and  improve  process  capability. Stability  is  very  important, This  is  a  JMP- generated  control  chart, You  can  use  Windows  Scheduler  or  JMP  Live to  automate  those  charts to be able to  pull  the  data  real  time. Meanwhile, you  can  also  use  certain  web  API to  generate  the  control  chart. JMP  is  very  useful  in  this  case. It can  send  out email  notification  to  the  group, to  individual  engineers. Meanwhile, if  you  can  connect  us with  your  internal  system , I  think  that's  going to  give  you  additional  power to  be  able  to  communicate with  your  process tool, be  able  to  pause  the  tool, put  it  on  hold  for  the  engineers to  react  to  the  variations either  process  shift or  out  of  control  data  points. Process  Capability Analysis  is  a  standard, Basically  a  few  rules that  we  need  to  follow, First  of  all,  you  want  to  make  sure your  data  set  is  following the  normal  distribution. This  can  be  done  using  JMP  tool, I  won't  go  through. But  on  the  other  side  we  realize Cpk  calculation  changes  dramatically due to  the  outliers,  especially when  sample  size  is  small. The  outliers  can be  driven  by  special  causes, excursions  during  the  process. So  this  can  add  bias to  Process  Capability  Analysis. 
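To tie the Process Screening columns above to formulas: the within (short-term) sigma drives Cpk, the overall (long-term) sigma drives Ppk, and their ratio is the stability index described in the talk (standard definitions):

```latex
\text{Stability Index} \;=\; \frac{\sigma_{\text{overall}}}{\sigma_{\text{within}}},
\qquad
C_{pk} \;=\; \min\!\left(\frac{USL-\mu}{3\,\sigma_{\text{within}}},\;
\frac{\mu-LSL}{3\,\sigma_{\text{within}}}\right),
\qquad
P_{pk} \;=\; \min\!\left(\frac{USL-\mu}{3\,\sigma_{\text{overall}}},\;
\frac{\mu-LSL}{3\,\sigma_{\text{overall}}}\right).
```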
Sometimes  if  you  do  see  some  outliers, you  can  drag  your  Cpk  down, but  it's  not  representative to  your  standard  process  variation. Apply  outlier  removal  method to  remove  outliers. We'll  help  you  get  rid  of  those  noises to  better  understand your  true  process  capability. Within  the  JMP, they offer  different  methodology to  remove  outliers. I  won't  go  through  each  of  them, but  they're  all  very  powerful. You  can  read  through  the  instructions which  method  is  the  best  in  your  case. Basically,  they  can  be  found  under Analyze/S creening /E xplore  Outliers. Basically,  I'm  going  to  show the  basic  method  we  are  using to  exclude  the  extreme values  for  our  process, which  is  a quantile  range  outlier  removal. Basically  this  tool we  found  is  very  useful, Because  when  you  try to  pull  the  data  through  the  database, often  you'll  find  some  outliers. Some  data  are  very  extreme, you  know  they're  outliers, some  are  not  as  obvious, So  basically this   quantile range  outlier  method offer  you  flexibility. Basically,  as  you  can  see, this  is  the  distribution  of  our  process. We  have  upper  spec  limit,  target and  you  also  have  the  mean. And  follow  this  Box  Plot, you  can  find the 10th percentile  value, or  the  90th percentile  value. Basically  the  inter-percentile  range is  calculated  using  90th percentile minus 10th percentile  value and  this  is  defined as  your  inter-percentile  range. The  lower  threshold  value  is  calculated, using  the  10th percentile  value minus  three  times the  calculated  inter-percentile  range. So  this  becomes  your  low  threshold for  each  individual  processes. On  the  other  side  the  high  threshold is  defined  by  90th percentile plus  three  times  inter-percentile  range. This  becomes  your  high  threshold. Basically  all  the  extreme value   outside  of  this  range will  be  treated  as  outliers and  they  can  be  colored. They  can  be  marked  as  missing . They  can  be  excluded from  your  data  analysis. As  you  can  see, JMP  did  provide  the  flexibility. How  do  you  define the  inter-percentile  range? You  can  do  0.1, you  can  do  other  values  as  needed. And  also  the  Q  value, which  is  this  value  I  highlighted  here, can  be  changed  as  well. It  really  depending  on  how  much  noise you  want  to  get  rid  of from  your  data  analysis. A s  you  can  see, This  is  an  example that  showing  some  value  is  being  colored and  also  being  changed  to  missing  value, so  they  will  be  excluded from  the  data  analysis. This  is  a  quick  demonstration to  show  you  how  the  outlier is  going  to  impact  your  Cpk  calculation. As  you  can  see,  Cpk  value remains  equal  or  better post  outlier  removal. It  really  depends on  your  sample  size. Sometimes  if  your  sample  size  is  small, the  change  can  be  more  dramatic. But  in  my  case,  I  believe my  sample  size  is  quite  large. This  is  why  they're  not showing  very  big  differences. Yeah,  the  quantile  range  o utliers parameters  can  be  tuned if  necessary  as  I  mentioned  earlier. Other  things  I  want to  highlight  in  this  page, it's  the  process  capability  box  plot. We  figure  out  this  is  very  useful  tool because  you  can  be  monitoring many  process  parameters. 
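Before moving on to the capability box plots, the quantile range outlier thresholds just described can be written compactly; the tail quantile 0.1 and the multiplier of 3 are the values used in the talk, and both are adjustable in the platform:

```latex
IPR \;=\; Q_{0.9} - Q_{0.1},
\qquad
\text{low threshold} \;=\; Q_{0.1} - 3\,IPR,
\qquad
\text{high threshold} \;=\; Q_{0.9} + 3\,IPR .
```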
So  be  able  to  put  them  together to  visualize  how  tight  they  are, which  direction  they're  shifting and  how  much  variation  is  being  counted to  calculate  our  process  capability. Process  stability  is  very  useful, as  you  can  see. They  use  a  standardize s pec  limit to  be  able  to  combine  everything  together. It's  super  useful  for  data  visualization. I  want  to  quickly  show  you, before  and  after  we  automate the  process  performance  analysis. At  the  beginning  team  are  not  using the  JMP  scripting to  automate  this  process, as  you  can  see. We  have  to  collect  the  quality  data by  individual  process  owners  per module, so data, sometimes, becomes  not  standard. They  use  different  formats and  then  they  often do  manual  outlier  removal and  then  they  have to  grab  all  the  data  together to  be  able  to  merge  them. It's  also  a manual  process,  very  tedious. After  that  they  have to  do  the  manual  process  performance because  the  variation they  see  through  different  people, different  format and  then  generate  report. They have  to  eventually repeat  everything  they  did on  like  weekly  basis, monthly  basis  or  per  PEQ build. But  on  the  other  side for  the  automated  process  flow, basically  the  quality  data can  be  queried  all  in  one  step. All  the  raw  data, all  the  specification  data can  be  put  from  the  database using  the  SQL  JMP  scripting, Of  course  we  apply  standard  outlier removal  methodology  across  all  the  data and  then  all  the  sorting, split,  spec  assignment, all  the  different  visualization can  be  done  automatically. Standard  report  will  be  generated. Then,  when  we  are  talking  about  over  time or  per  PEQ  summary , you  can  simply  modify your  SQL query  filter  to  update  the  data and  on  top  of  that, you  have  basically  all  the  raw  data. You  can  apply  local  data  filter. You  can  add  additional  functions to  make  your  filter  data  analysis  easy. Basically,  we  figure  out  manual  process is  very  time  consuming, the  feedback  is  slow and  it's  not  very  efficient  regarding the  yield  improvement. On  the  other  side, we  figure  out  the  automated  scripting  process  using  JMP. Anybody  can  perform  this  complicated process  performance  analysis  in  minutes. Some  highlights  I want  to  share for  the  process  capability  analysis. As  you  can  see, I didn't  include the  JMP  query  portion. But I basically  put different  process  name, different  test  label, all  the  raw  data  into  the  JMP  table and  after  that  we  figure out  to  split  this d ata  table because  table  comes  in  everything  combined process,  part  ID, process  time,  test  label. So  we  have  to  eventually split  the  data  table to  be  able  to  perform this  process  capability  analysis, the  box  plots  or  the  Cpk Ppk  analysis. So  very  useful  function  for  JMP is  after  we  split,  group  them. So  we  have  some  missing  value because  we  sort the  data  by  data  time. Some  process  leave  blank but  JMP  is  smart  enough not  counting  those  missing  value into  the   capability  analysis which  is  very  useful, So  I  do  want  to  mention  that. Please  don't  forget to  sort  your  data  by  date  time because  that's  super  important because  sometimes the Cpk  Ppk  calculation is  really  depending on  the  process  sequencing. 
If you do not sort the data by date-time, the results can be biased. We made that mistake earlier, so this turned out to be a useful tip. The other highlight is automated outlier removal using the quantile range outlier method. As you can see, it is a very simple process. You get all the column names using this bit of scripting, with the condition that they are numeric and continuous, then you launch the quantile range outlier platform, assign the columns to the report, and go through the report. This For loop is used to exclude all the outliers that were identified, and it can be repeated as needed. Then you can also launch the process capability analysis. If you don't know how to script it, you can simply do it manually and then grab the code from the log; this is a new capability JMP provides, and we found it super helpful. Something else I want to highlight here is the spec limit assignment, which is very powerful and very useful. You can assign specifications for multiple process variables using another data table that is also generated with a SQL query. Conclusions: analyzing process performance data using JMP is critical to driving yield improvement in modern factories, especially with the many process and metrology steps needed to ensure healthy production lines and deliver high-quality products. Analysis of performance data, including long-term and short-term process capability, stability, and statistical process control, is particularly useful when monitoring hundreds of process KPIs. During the production ramp-up phase, identifying the processes of most concern is highly challenging. Using JMP scripting and the quality data analysis platforms, our Eyepiece factory implemented an automated process that can pull, analyze, visualize, correlate, predict, and verify factory yield improvement based on a variety of performance metrics. At Magic Leap's Eyepiece factory, we demonstrated greater than 90% RTY, which covers hundreds of process KPIs; eventually you have to multiply them all together to get this RTY number, the rolled throughput yield. We were able to demonstrate greater than 90% RTY six months ahead of our next-generation product launch, driven by continuous process improvement activities guided by automated process performance analysis using JMP scripting and the quality platform tools. And thank you for your time.
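As a footnote to the scripting highlights above, here is a minimal sketch of the spec-limit assignment step. The limits table, its name, and its column names (Process, LSL, Target, USL) are hypothetical stand-ins for the SQL query output described in the talk.

```jsl
// Sketch: assign spec limits to many process columns from a limits table.
dt     = Current Data Table();
limits = Data Table( "Spec Limits" );   // hypothetical table from the SQL query
cols   = dt << Get Column Names( String, Numeric, Continuous );

For( i = 1, i <= N Rows( limits ), i++,
	cname = limits:Process[i];
	If( Contains( cols, cname ),
		Column( dt, cname ) << Set Property( "Spec Limits",
			{LSL( limits:LSL[i] ), Target( limits:Target[i] ), USL( limits:USL[i] )}
		)
	)
);
// With the properties in place, Process Capability or Distribution picks the
// limits up automatically when launched on those columns.
```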
The quality and SPC platforms in JMP 17 have many new features and capabilities that make quality analysis easier and more effective than ever. The measurement systems analysis platforms—Evaluating the Measurement Process (EMP) MSA and Variability Chart—have been reorganized and improved and a new Type 1 Gauge Analysis platform has been added. The Manage Limits utility (previously called Manage Spec Limits) has been generalized and expanded to handle many types of quality related limits that are needed to work easily with many processes in various quality platforms. The Distribution platform has added the ability to adjust for limits of detection when fitting distributions and performing process capability analysis. Control Chart Builder has several new features including a label role, a row legend, a new button to switch an XBar/R chart to an IMR chart, new dialog options and Connect Thru Missing. Both the EWMA and the Cusum Control Charts have several new features including the abilities to save and read from a limits file and save additional information to the summary table.     Hello,  my  name  is  Laura  Lancaster and  I'm  here  with  my  colleague, Annie   Dudley Zangi, to  talk  about  recent  developments in   JMP quality  and  SPC. The  first  thing  I  want  to  talk  about is some  improvements  that  we've  made to  the  distribution  platform specifically  related to  limits  of  detection. So  limited  detection   is  when  we're  unable  to  measure above  or  below  a  certain  threshold. And  in  JMP  Pro  16,  some  functionality was  added  for  limits  of  detection. Specifically  in  the  DOE  platform, we  added  the  ability to  account  for  limits  of  detection   and  a  Detection  Limits  column  property was  added  that's  used  by  the  Generalized  Regression  platform to  specify  censoring  for  responses. However,  what  was  left  unaddressed was  a  problem  with  process  capability and  limits  of  detection. The  problem  is  that when  you  ignore  limits  of  detection when  analyzing  process  capability, it  can  give  misleading  results. And  there  was  no  way  to  do process  capability  with  censored  data. But  in   JMP Pro  17,   and  I  just  wanted  to  note that  this  is  the  only  feature that  we're  going  to  talk  about that's  JMP Pro. Everything  else is  regular  JMP in  this  talk. So  in  JMP  Pro  17, now  in  the  Distribution  Platform, we  recognize  that  Detection  Limits column  property and  we  can  adjust  the  fitters for   censored data. That  means  that the  Process  Capability  report that's  within  those  fitters that  use  the  adjusted  fit to  account  for  censored  data will  give  more  accurate  results. And  the  available adjusted  distribution  fitters are  Normal,  Log normal,  Gamma, Weibull,  Exponential,  and  Beta. And  before  I  go  to  the  example, I  just  wanted  to  give  a  shout  out to  check  out  the  poster  session Introducing  Limits  of  Detection   in  the  Distribution  Platform that  Clay  Barker  and  I  worked  on if  you  want  to  learn  more  about  this. Let's  go  ahead  and  go  to  JMP. Here  I  have  some  drug  impurity  data where  I  have  an  issue   with  being  able  to  detect  impurities below  a  value  of  one. And  this  data  that  I've  recorded   is  actually  in  the  second  column and  anywhere  that I  wasn't  able  to  record  an  impurity because  it  was  below  one, I've  simply  recorded  it  as  a  one. So  this  is  censored  data. 
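As an aside before the rest of the demo: conceptually, the "adjusted fit" handles this by maximizing the standard censored-data likelihood, in which observations at or below the lower detection limit D contribute the probability of falling below D rather than a density value. This is a general statistical statement, not a description of JMP's internal code; f and F are the density and CDF of the fitted distribution (lognormal in this example).

```latex
L(\theta) \;=\; \prod_{y_i > D} f(y_i \mid \theta)\;
\prod_{y_i \le D} F(D \mid \theta).
```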
This  first  column   is  really  the  true  impurity  values that  I'm  unable  to  know, unable  to  detect  with  my  detection. So  let's  go  ahead   and  compare  both  of  these  columns using  distribution. So  if  I  go  to  Analyze,  Distribution, and  I  look  at  both  of  these  columns, you  can  clearly  see there's  a  pretty  big  difference between  having  true  impurity  values   which  I'm  unable  to  know, and  the  censored  data. Ultimately,  what  I  want  to  do is  I  want  to  do  a  log normal  fit and  run  a  process  capability  analysis on  this  data. So  I'm  going  to  go  ahead  and  do  that for  both  of  these  distributions. So  I'm  going  to  do  log normal  fit   for  both  of  them. You  can  see  that  I  get... Obviously  the  histograms look  pretty  different and  my  fits  look  pretty  different  too, which  isn't  surprising. Now,  I  want  to  do  Process  Capability on  both  of  these. I've  already  added  an  upper  spec  limit   as  a  column  property, and  you  can  see  that when  I  have  my  true  data, which  I'm  unable  to  know, my  capability  analysis looks  pretty  different from  having  the  censored  data. With  the  true  data, my  capability  looks  pretty  bad. There's  probably  something I  need  to  address. But  because  I'm  not  able  to  see the  true  data, and  I  only  have  the  censored  data that  I  can  analyze  in  JMP, the  PPK  value  is  a  lot  better. It's  above  one, and  I  may  blissfully  move  along thinking  that  my  process  is  capable when  in  actuality, it  really  isn't  so  good. But  thankfully,  in  JMP Pro  17, I  can  add  a  detection  limits column  property  in my  data. So  this  third  column   is  the  same  as  my  second  column, except  that  I've  added a  detection  limits  column  property. So  I've  added  that  I  have a  lower  detection  limit  of  one. And  now  when  I  run  Distribution  platform on  this  third  column with  a  detection  limit  column  property, and  I  do  my  logn ormal  fit   and  notice  because  I  have  censored  data, I  have  a  limited  number of  distributions  available, I'm  going  to  do  my  log normal  fit, and  it's  telling  me  it  detected that  detection  limit  column  property, and  it  knows  I  have   a  lower  detection  limit  of  one. And  when  I  do  Process  Capability, you  can  see  that  my  capability  analysis is more  in  line  with  when  I  had the  true  data   because  my  PPK is 0.546, doesn't  look  so  good. And  I  realize  that   there's  probably  something that  I  need  to  address  with  this  process. It's  not  very  capable. All  right,  so  let's  move along  to  the  next  topic. The  next  thing  I  want  to  talk  about is some  improvements in  Measurement  Systems  Analysis, specifically  the  Type 1 Gauge  Analysis  platform. A  Type  1  Gauge  Analysis  platform is  a  basic  measurement  study that  analyzes  the  repeatability and  bias  of  a  gauge to  measure  one  part relative  to  a  reference  standard. It's  usually  performed   before  more  complex  types  of  MSA  studies such  as  EMP  or  Gauge  R&R that  are  already  in  JMP. It's  required   by  some  standard  organizations such  as  VDA  in  Germany, and  this  has  been  requested by  our  customers for  quite  a  while, but  we  believe  it's  useful  for  anyone, whether  it's  required by  a  standard  organization  or  not. 
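The headline metrics of a Type 1 study, Cg and Cgk, which appear in the demo below, are commonly defined as follows, with s_g the repeatability standard deviation of the repeated measurements, the bar quantity their mean, x_ref the reference standard value, and K the percent of tolerance used (20% here). The 6 s_g study variation is the conventional default; JMP's own settings may be configured differently, so treat the constants as assumptions.

```latex
C_g \;=\; \frac{(K/100)\,\text{Tol}}{6\,s_g},
\qquad
C_{gk} \;=\; \frac{(K/200)\,\text{Tol} \;-\; \lvert \bar{x}_g - x_{\text{ref}} \rvert}{3\,s_g}.
```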
It's  located  in  JMP  17 in  the  Measurement  Systems  Analysis launch  dialog as  an  MSA  Method  type. It  requires  a  reference  standard  value to  compare  your  measurements  against, and  a  tolerance  range where  you  want  your  measurements   to  be  within  20%  of  your  tolerance  range. Produces  a  run  chart, metrics  such  as  Cg, Cgk, which  are  comparable   to  capability  statistics,  bias  analysis, and  a  histogram  for  analyzing  normality. Let's  go  ahead and  look  at  this  new  platform  in  JMP. Here  is  my  Type  1  Gauge  data. It's  simply  measurements of  one  part  with  one  gauge. So  to  get  to  the  platform, I  go  to  Analyze,  Quality  and  Process, Measurement  Systems  Analysis. And  the  first  thing  I  want  to  do is  change  the  method from  EMP  to  Type  1  Gauge. I'm  going  to  move  my  measurements as  the  response I'm going to  leave  everything  else at  default,  and  I'm  going  to  click  OK. But  before  I  can  proceed  to  get  my  report, I  have  to  enter  that  metadata that I  mentioned  earlier, the  reference  value and  the  tolerance  range. So  I'm  going  to  go  ahead and  enter  that  information. I'm  going  to  enter  it as  a  tolerance  range and my  reference  value. I'm going to skip  resolution because  that's  optional. Click  OK. And  this  is  the  default  report  that  I  get. I  get  a  run  chart  on  my  measurements graphed  against  my  reference  line. And  I  also  get  the  20% tolerance  range  lines. One's  10%  tolerance  range  above  reference and  one  is  10%  below. So  you  get  some  default  capability  statistics. Now  notice  that  my  measurements are  well  within  the  20% of  my  tolerance  range, which  is  really  good. I  could  also  do  a  Bias  Test to  see  if  my  measurements  are  biased. It  looks  okay. And  I  could  also  turn  on  a  histogram to  test  for  normality. And  before  we  move  on, I  wanted  to  find  out  one  more  thing, and  that's  that  in  this  top  outline  menu, if  I  click  on  that, there's  an  option  to  save  that  metadata that  I  had  to  enter to  be  able  to  get  this  report. Remember,  I  had  to  enter   the  reference  value and  the  tolerance  range. So  I  could  either  save  this  metadata as a  column  property  and  we've  introduced a  new  column  property  called  MSA, or  I  could  save  it  to  a  table. I'm  going  to  go  ahead  and  save  it   as  a  column  property so  I  can  show  you the  new  MSA  column  properties. If  I  go  back  to  the  data  table, this  is  the  new  MSA  column  property. You  can  see  it's  storing my  tolerance  range and  my  reference  value, and  it  also  can  hold  other  metadata for  other  types  of  MSA  analysis. Let's  move  along  to  the  next  topic. I  also  want  to  talk  about some  improvements to  existing  MSA  platforms, the  EMP  MSA  platform. EMP  stands  for   Evaluating  the  Measurement  Process, and  this  is  the  platform   based  on  Don Wheeler's  approach and  a  variability  chart  platform. So  in  both  of  these  platforms, we've  improved  the  usability when  analyzing  multiple  measurements at  one  time. We  have  better  handling  of  the  metadata, such  as   [inaudible 00:10:16]  or  tolerance  values or  process  Sigma. So  this  has  been  improved in  variability  charts and it's added  to  the  EMP  MSA  platform. We've  also  reorganized  the  reports so that  they  work  better with  data  filters. 
In  addition,  we've  filled  in  the  gaps between  the  EMP  MSA  platform and  the  Variability  Chart. We've  done  this   by  adding  some  reports to  the  EMP  MSA  platform. We've  added   the  Misclassification P robability  Report, the  AIAG  Gauge  R&R  report, and  a  Linearity  and  Bias  report. In  addition, we've  modernized  the  Linearity  report and  the  Variability  Chart to  match  the  new  Linearity  report in  the  EMP  MSA  platform. So  let's  go  ahead and  look  at  some  of  these  changes. So  here  I  have  some  measurement  systems analysis  data  for  some  tablets where  I've  measured   two  different  attributes with  multiple  operators. And  I  want  to  analyze  this   using  the EMP  platform. So  I'm  going  to  go  to  Analyze,   Quality  and  Process Measurement  Systems  Analysis. First  thing  I  want  to  do is  change  the  method  back  to  EMP, take  my  measurements  as  response, Tablet as  Part,  Operator  as Grouping. Notice  there's  now  this  standard  role, if  I  were  doing  a  linearity  and  bias  study, I  would  use  that, but  I'm  not  in  this  example. Also some  new  options  down  here in  the  dialogue, but  the  one  I  want  to  point  out is the  Show  EMP  Metadata  Entry  Dialogue. I  want  to  set  that  to  Yes so  I  can  enter  tolerance  values and a  historical  Sigma   for  the  AIAG  Gauge  R&R  report. So  I'm  going  to  click  OK and  this  dialogue  pops  up. I  don't  have  to  enter  this  data   during  the  launch, but  I'm  going  to   because  I  think  it's  easier. So  I'm  going  to  go  ahead and  enter  the  data, and  when  I  click  OK, my  report  looks  similar  to  how  it's  always  looked when  I've  had  multiple  measurements. But  I  also  have   an  additional  outline  at  the  top, and  we'll  look  at  that  in  a  minute. But  the  first  thing  I  want  to  do is  I  want  to  turn  on the  Misclassification  Probabilities  report for  both  of  these  analyses. So  I'm  going  to  choose Misclassification  Probabilities, and  you  can  see,  I  get  a  new misclassification  probability  report for  both  of  these  and  it's  available without  a  prompt because  I've  already  entered   my  lower  and  upper  tolerance  values. Now,  if  I  had  not  already  entered  that  information, I  would  have  been  prompted. Or  I  could  use  the  new  option, Edit  MSA  Metadata, to  either  enter  or  edit any  of  that  information, which  would  automatically  update any  of  the  corresponding  reports. Let's  go  ahead  and  turn  on   the  AIAG Gauge R&R  report for  both  of  these  as  well. And  you  can  see  I  get   an  AI AG  Gauge  R&R  report that  looks  a  lot  like what's  in  the  Variability  C hart  platform, and  it  includes   that  percent  tolerance  column because  I  entered  tolerance  values and  percent  process, because  I  entered  historical  Sigma. I  could  also  turn  on   the  discrimination  ratio  if  I  desired. And  before  we  move  on, I  just  want  to  point  out at  this  top  outline  menu,  once  again, we  have  an  option  to  save  the  metadata. I  can  save  the  metadata, which  includes  not  only  the  MSA  metadata, but  also  I  can  save  out  measurement  Sigma, which  is  a  result  of  my  MSA  analysis, which  can  be  consumed by  the  Process  Screening  platform. So  it's  going  to  be  considered process  screening  metadata, and  there's  actually  a  new process  screening column  property  for  that. 
But  I'm  going  to  save  this as a  table  just  so  we  can  look  at  it. I can  see  I  have  my  MSA  metadata, plus  I've  saved  out  the  measurement  Sigma once  I've  computed  those variance  components. So  let's  go  on  to  the  next  topic, my  final  topic before  I  hand  this  over  to  Annie, the  last  thing  I  wanted  to  talk  about was some  improvements to  the  Manage  Spec  Limits  utility. In  fact,   the  name  has  been  changed to  the  Manage  Limits  utility  because  now it  handles  more  than  just  spec  limits. It  still  handles  spec  limits and  anything  related to  process  capability. But  now  it  also  can  handle   Process  Screening  metadata, which  includes  centerline, specified  Sigma, and  measurement  Sigma, MSA  metadata,  and  Detection  Limits. So  now  I'm  going  to   hand  this  over  to  Annie. Hi,  everyone. I  am  Annie  Dudley  Zengi, and  I  am  the  developer  responsible for  control  charts  in  JMP. I'm  here  to  talk  with  you about  some  of  the  new  features that  I  added  for  Control   Chart Builder in  version  17. So  I  added  a  Label  Role   in  addition  to  the  Y, the  subgroup,  and  the  phase  role, there's  now  a  label  role. I've  added  a  button  so  that  you  can  switch an   XBar  and  R  chart  to  an  IMR  chart. I  added  a  row  legend, a  Connect Thru  Missing  Command, and  I've  done  some  Dialog U pdates. I'll  start  with  this  data  table  diameter, which  you  can  find  in  the  sample  data. And  let's  start  with  the  label  role. So  I'm  going  to  alternate   between  using  the  interface and  using  the  dialogs so that  everybody  can  get  a  feel  for  both. If  I  start  with  the  interface,   and  I  drag D iameter  in to  the  graph, we  immediately  see  we  get an  Individual and  Moving  Range  chart. Now,  one  thing  that  you'll  notice that's new  here is  this  new  role   in  the  lower  left- hand  corner  of  the  chart for  the  label. Now  I  can  drag  Day  in. Now  I  want  to  take  a  look at  Day  here  in  the  data  table. So  you  might  notice  that there  are  six  different  rows that  are  associated  with  May  1st,  1998. There  are  six  rows  associated with  every  date in  this  particular  data  table. And  we  know  that  if  we  were  to  drag  that to  the  Subgroup  role, then   Control  Chart  Builder will  automatically  aggregate. But  sometimes  we  don't  want  that. So  for  this  example,  I'm  going to  drag  this  to  the  Label  role. You  notice  we  still  have an  Individual  and Moving  Range  chart. It  did  not  switch and  it  did  not  aggregate  the  data. We  can  see  that  it's  a  regular  axis. We  currently  have  an  increment  of  24. We  can  change  the  increment  to  six. We  can  see  every  date  on  the  x- axis and  we  can  still  see  that we  have  an  Individual   and Moving  Range  chart  of  Diameter. So  there's  the  Label  role. Now  the  next  option  is the  switch  to  the  IMR  chart. This  option  was  made  available because  there's  now  a  Label  role. To  switch  to  an  IMR  chart,  we  first have  to  have  an   XBar  on  our  chart. So  I  will  create  an   XBar  on  our  chart through  the  dialog. You can choose  Control  Chart and  then   XBar  Control  Chart. Again  I'll  move  Diameter  to  the  Y. And  this  time  I'm  going to  put  Day  in  as  a  Subgroup. You  can  see  here  it's  aggregated  the  data because  we  have  Day   as actually  the  subgroup. 
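For reference, the two launches shown so far can be scripted as below; the demo continues with this XBar chart. The path and column names are as they appear in the shipped Diameter sample table, and the new Label role is left as an interactive drag here because its JSL role name is not quoted in the talk.

```jsl
// Sketch of the two Control Chart Builder launches from the demo.
dt = Open( "$SAMPLE_DATA/Quality Control/Diameter.jmp" );

// Individuals & Moving Range chart of DIAMETER
// (drag DAY to the new Label role by hand)
Control Chart Builder( Variables( Y( :DIAMETER ) ) );

// XBar & R chart with DAY as the subgroup, as produced by the dialog
Control Chart Builder( Variables( Y( :DIAMETER ), Subgroup( :DAY ) ) );
```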
But  if  I  show  the  control  panel and  I  scroll  down, you'll  notice  there's  a  new  button  here underneath  the  old  button of the  Three  Way  Control  Chart. And  when  I  click  that  button, it  moves  the  variable from  the  Subgroup  role into  the  Label  role. So  you  see  we  now  have an  Individual and  Moving  Range  Chart of  Diameter. Now,  the  next  option  is a  Row  Legend. Row  Legend  is  new for  Control  Chart  Builder. And  I  have  a  little  note  here. The  Row  Legend  option   is  only  going  to  appear when  there's  only  one  row  per  subgroup. So  if  you  right- click  like  you  do  in  a  lot of  other  graphs  in  many  other  platforms in  JMP,  you'll  now  see  a  Row  Legend  here, but  only  if  you  have  one  row  per  subgroup. And  the  Row  Legend   acts  like  a  row  legend  does  anywhere  else. I  can  choose,  say,  for  example,  Operator, and  it  will  color  by  Operator  by  default. And  now  you  have  your points  colored  accordingly. The  next  option— I'm  going  to  close  this— is   Connect Thru  Missing. Now, Connect Thru  Missing is going  to  involve  some  missing  data. So  let's  open  up  Coding, which  happens  to  have  the  Weight that you might  normally  be  measuring, but  it  also  has  Weight  2 that  has  missing  data. If  I  go  through  the  interface and  create  two  control  charts, you  notice  we  have   a  good- looking  control  chart  here. Everything  is  connected  and  so  forth. But  if  we  scroll  down  to  the  second  one, we  see  some  gaps. And  sometimes  management   doesn't  want  to  see  the  gaps, so  we  need  to  connect  those. So  there's  a  new  option under  the  red  triangle  menu called  Connect Thru  Missing. You  can  see  the  little  caption  there. It  says,  "This  item  is  new as  of  version  17." This  was  in  the  old  Legacy  platform. And  so  I've  been  bringing  more  options into  Control  Chart  Builder that  were  available in  the  old  Legacy  platform. So  there's  your   Connect Thru  missing. Now,  the  next  option— I'm  going  to  switch back  to  my  slides  here  for  a  moment— so  the  next  option  is  the  Laney and  P  prime  control  charts. This  is  a  bigger  option. So  let's  think  about Control C harts  for  a  moment. The  purpose  of  Control  Charts is to  show  the  stability  of  your  process. If  your  process  is  not  stable, then  you  cannot  reliably  make the  same  sized  part, which  is  going  to  be  a  problem for  all  of  your  customers. And  so  there's  lots  of  tests  involved in  making  sure  that  you  are  stable, that  you're  reliably  able to  make  the  same  part. Now,  if  you're  looking  at  attribute control  charts,  those  are  based  on  either the  Binomial  or  the  Poisson  distribution, and  those  assume  a  constant  variance. Now,  what  happens  if the  variance  changes  over  time? Maybe  there's  humidity   or  there's  temperature  problems or  there's  wear  and  tear  on  a  gear. This  is  what  statisticians  refer  to as  over dispersion, or  in  rare  instances,  under  dispersion. And  one  parameter  distribution cannot  model  this. So  Laney  proposes that  we  normalize  the  data in  order  to  account  for  the  variant and  account  for  varying  subgroup  sizes. And  David  Laney  wrote  a  paper  in  2002, Improved  Control  Charts  for  Quality. So  let's  take  a  look  at  the  Laney  charts. 
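Before the demo, here is the normalization Laney proposed, written out; the walkthrough below describes the same steps verbally. Here p_i is the subgroup proportion, n_i the subgroup size, and the average moving range is taken on the standardized z values.

```latex
z_i \;=\; \frac{p_i - \bar{p}}{\sigma_{p_i}},
\qquad
\sigma_{p_i} \;=\; \sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}},
\qquad
\sigma_z \;=\; \frac{\overline{MR}(z)}{1.128},
\qquad
\text{Laney } P' \text{ limits:}\;\; \bar{p} \;\pm\; 3\,\sigma_{p_i}\,\sigma_z .
```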
Here I have some data also found in the sample data. This is not a terribly large lot size, but here we have a column for teaching purposes that has a varying lot size. So let's explore how this works. Suppose we were to look at, say, a P chart of the number of defective out of this varying lot size. I'm going to use the menus, the dialogs, again; I'm going to create a P chart to start with, and let's see how that performs. Okay, we're going to look at our number defective, and then we have the lot as our Subgroup identifier. Now, I'm going to use Lot Size 2 because that's the varying lot size, and click OK. All right, so on first glance, yes, we expected the non-constant limits because we have the varying subgroup sizes. But we also notice immediately that our chart, this process, is out of control, because we have these points that are beyond the limits, and we can turn on the Test Beyond Limits and they're flagged. And so this process would probably raise all kinds of alarms, and people would be trying to retool things. Now, I'm going to show the control panel while I have the statistic set to proportion, because Laney, in his paper, only gave formulas for the P and the U charts, so currently this is only implemented for the proportion statistic. But when you have your statistic set to proportion, you now have four choices instead of just two for your Sigma. So we could switch to the Laney P prime chart and see what that difference is going to be. And suddenly you see your process is not nearly as problematic. It's not out of control at all. It looks like this process is actually stable, which is great news. Now, is this really okay, you might ask, or is this cheating? Let's take a look at the formulas to help us figure this out (they're also recapped in standard notation at the end of this talk). So Laney suggested that we compute a moving-range sigma on the standardized values. So these Z's, those are our standardized values. We compute an average moving range on that, and we have a Sigma sub z, which is the average moving range divided by 1.128. Then we take that Sigma sub z and we insert it into the exact same formula that we saw for our P limits. And so what you can see from this is, if you actually do have a constant variance, this Sigma sub z is going to approach one. Many argue, including Laney, that it is generally safe to use this instead of the P chart, since Sigma sub z approaches one and the limits are going to be essentially the same anytime you actually do have constant variance. So there's the Laney P prime chart. I wanted to show you also that there are a few dialog updates. Let me show you some of those right here. I hinted a little bit at it: you can see the Laney P prime and U prime. Those are two new dialogs that you can see there. The IMR chart now has a Label role on the dialog. The XBar and R Control Chart now has a Constant Subgroup Size option in case you don't have a subgroup that you want to specify. There's a little more work that was done on the Three Way Control Charts.
So  that  now,  not  only  can  you  specify the  constant  subgroup  size if  you  don't  have  a  subgroup  already  identified, you  can  also  choose  your  Grouping  Method, your  Between  and  Within  Sigmas for  your  control  chart. So  there's  different  options that  are  added on  the  Three W ay  Control  Chart  dialog. And  I  want  to  thank  you very  much  for  your  time. If  you  have  any  questions, please  feel  free  to  ask. Thank  you.
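As a recap of the Laney P prime limits described in this talk, here they are in standard notation. This is a sketch for orientation only; the exact symbols used in Laney's paper and in JMP's documentation may differ.

\[
z_i = \frac{\hat{p}_i - \bar{p}}{\sigma_{\hat{p}_i}}, \qquad
\sigma_{\hat{p}_i} = \sqrt{\frac{\bar{p}\,(1-\bar{p})}{n_i}}, \qquad
\sigma_z = \frac{\overline{MR}(z)}{1.128},
\]
\[
\text{Laney } P' \text{ limits for subgroup } i:\quad \bar{p} \;\pm\; 3\,\sigma_{\hat{p}_i}\,\sigma_z .
\]

When the binomial variance assumption actually holds, sigma sub z is close to one and the limits reduce to the ordinary P chart limits, which is the point made in the talk.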
Degradation data analysis is used to assess the reliability, failure-time distribution, or shelf-life distribution of many different kinds of products, including lasers, LEDs, batteries, and chemical and pharmaceutical products. Modeling degradation processes shines a light on the underlying physical-chemical failure mechanisms, providing better justification for the extrapolation that is needed in accelerated testing. Additionally, degradation data provide much richer information about reliability compared to time-to-event data. Indeed, by using appropriate degradation data, it is possible to make reliability inferences even if no failures have been observed.   Degradation data, however, bring special challenges to modeling and inference. This talk describes the new Repeated Measures Degradation platform in JMP 17, which uses state-of-the-art Bayesian hierarchical modeling to estimate failure-time distribution probabilities and quantiles. The methods we present are better grounded theoretically when compared to other existing approaches. Besides advantages, Bayesian methods do pose special challenges, such as the need to specify prior distributions. We outline our recommendations for this important step in the Bayesian analysis workflow. In this talk, we guide the audience through this exciting and challenging new approach, from theory to model specifications and, finally, the interpretation of results. The presenters conclude the talk with a live demonstration.     In this talk, we're going to describe repeated measures degradation and its implementation in JMP. I'm going to present the background, motivation, some technical ideas, and examples. Then I'll turn it over to Peng, who will do a demonstration showing just how easy it is to apply these methods in JMP 17. Here's an overview of my talk. I'm going to start out with some motivating examples, and then explain the relationship between degradation and failure and the advantages of using degradation modeling in certain applications. Then I'll describe the motivation for our use of Bayesian methods to do the estimation. To use Bayes' methods for estimation, you need to have prior distributions, so I'll talk about the commonly used noninformative and weakly informative prior distributions. Also, in some applications we will have informative prior distributions, and I'll explain how those can be used. Then I'll go through two examples, and at the end I'll have some concluding remarks. Our first example is crack growth. We have 21 notched specimens. The notches were 0.9 inches deep, and that's like a starter crack. Then the specimens are subjected to cyclic loading, and in each cycle, the crack grows a little bit. When the crack gets to be 1.6 inches long, that's the definition of a failure. We can see that quite a few of the cracks have already exceeded that level, but many of them have not. Traditionally, you could treat those as right-censored observations. But if you have the degradation information, you can use that to provide additional information to give you a better analysis of your data. The basic idea is to fit a model to describe the degradation paths, and then to use that model to induce a failure time distribution. Our second example is what we call Device B, a radio frequency power amplifier.
Over time the power output will decrease because of an internal degradation mechanism. This was an accelerated test where units were subjected to higher levels of temperature: 237, 195, and 150 degrees C. The engineers needed information about the reliability of this device so that they could determine how much redundancy to build into the satellite. The failure definition in this case was when the power output dropped to -0.5 decibels. We can see that all of the units had already failed at the higher levels of temperature, but at 150 degrees C, there were no failures yet. Still, there is lots of information about how close these units were to failure by looking at the degradation paths. Again, we want to build a model for the degradation paths, and then we use that to induce a failure time distribution. In this case, the use condition is 80 degrees C. We want to know the time at which units operating at 80 degrees C would reach this failure definition. Once again, we build a model for the degradation paths, we fit that model, and then from that we can get a failure time distribution. Many failures result from an underlying degradation process. In some applications, degradation is the natural response. In those situations, it makes sense to fit a model to the degradation and then use the induced failure time distribution. In such applications, we need a definition for failure, which we call a soft failure because the unit doesn't actually stop operating when it reaches that level of degradation, but it's close enough to failure that engineers say we would like to replace that unit at that point, just to be safe. Now, in general, there are two different kinds of degradation data: repeated measures degradation, like the two examples that I've shown you, and destructive degradation, where you have to destroy the unit in order to make the degradation measurement. For many years, JMP has had very good tools for handling degradation data. I'm focused in this talk on the repeated measures degradation methods that are being implemented in JMP 17. There are many other applications of repeated measures degradation, for example, LED or laser output, the loss of gloss in an automobile coating, and degradation of a chemical compound, which can be measured with techniques such as FTIR, or any other measured quality characteristic that's going to degrade over time. There are many applications for repeated measures degradation. There are many advantages to analyzing degradation data if you can get them. In particular, there is much more information in the degradation data relative to turning those degradation data into failure time data. This is especially true if there is heavy censoring. Indeed, it's possible to make inferences about reliability from degradation data in situations where there aren't any failures at all. Also, direct observation of the degradation process allows us to build better models for the failure time distribution because we're closer to the physics of failure.
Now, several years ago, when we were planning the second edition of our reliability book, we made a decision to use more Bayesian methods in many different areas of application. One of those is repeated measures degradation. Why is that? What was the motivation for using Bayes' methods in these applications? I used to think that the main motivation for Bayesian methods was to bring prior information into the analysis. Sometimes that's true, but over the years, I've learned that there are many other reasons why we want to use Bayesian methods. For example, Bayesian methods do not rely on large-sample theory to get confidence intervals; they rely on probability theory. Also, it turns out that when you use Bayes' methods with carefully chosen noninformative or weakly informative prior distributions, you have credible interval procedures that have very good coverage properties. That is, if you ask for a 95% interval, they tend to cover what they're supposed to cover with 95% probability. In many applications, there are many non-Bayesian approximations that can be used to set confidence intervals; when you do Bayesian inference, it's very straightforward, and there's really only one way to do it. Also, Bayesian methods can handle, with relative ease, complicated model-data combinations for which there's no maximum likelihood software available. For example, with complicated combinations of nonlinear relationships, random parameters, and censored data, Bayes' methods are relatively straightforward to apply. Finally, last but certainly not least, Bayesian methods do allow an analyst to incorporate prior information into the data analysis. But I want to point out that in the revolution we've had in the world of data analysis toward using more Bayesian methods, most analysts are not bringing informative prior information into their analysis. Instead, they use weakly informative or noninformative priors so that they don't have to defend the prior distribution. But in many applications, we really do have solid prior information that will help us get better answers. I will illustrate that in one of the examples. Bayesian methods require the specification of a prior distribution. As I said, in many applications, analysts do not want to bring informative prior information into the modeling and analysis. What that requires is some default prior that's noninformative or weakly informative. There's been a large amount of theoretical research on this subject over the past 40 years, leading to such tools as reference priors, Jeffreys priors, and independent Jeffreys priors that have been shown to have good frequentist coverage properties. One of my recent and current research areas is to try to make these ideas operational in practical problems, particularly in the area of reliability. A simple example of this is that if you want to estimate a location parameter and the log of a scale parameter, a flat prior distribution leads to credible intervals that have exact coverage properties. That's very powerful.
Also, flat prior distributions can be well approximated by a normal distribution with a very large variance, and that leads to weakly informative priors. Again, it's somewhat informative, but because the variance is very large, we call it weakly informative. The approach that I've been taking to specify prior distributions is to find an unconstrained parameterization, like the location parameter and the log of the scale parameter that I mentioned above, and then use a noninformative or weakly informative flat or normal distribution with very large variances as the default prior. Then it's always a good idea to use some sensitivity analysis to make sure that the priors are approximately noninformative. That is, as you perturb the prior parameters, it doesn't affect the bottom-line results. JMP uses very sensible, well-performing methods to specify default prior distributions that are roughly in line with what I've described here. Having those default prior distributions makes the software user-friendly, because then the user only has to specify prior distributions where they have informative information that they want to bring in. Here's just an illustration to show that as the standard deviation of a normal distribution gets larger, you approach a flat prior distribution. Now, as I said, in some applications we really do have prior information that we want to bring in, that is, informative prior information. When we have such information, we will typically describe it with a symmetric distribution, like a normal distribution, although some people prefer to use what we call a location-scale t distribution because it has longer tails. In most applications where we have this informative prior information, it's only on one of the parameters. Then we're going to use noninformative or weakly informative prior distributions for all of the other parameters. Let's go back to Alloy A. What we're going to do is fit a model to the degradation paths, and then use that model to induce a failure time distribution. Now, if you look in an engineering textbook on fatigue or materials behavior, you'll learn about the Paris crack-growth model. It's always nice to have a model that agrees with engineering knowledge. JMP has implemented this Paris crack-growth model. Here's the way it would appear in a textbook, and on the right here we have the JMP implementation of that. It's one of the many models that you can choose to fit to your degradation data. Now, c and m here, which in JMP are c and b2, are materials parameters, and they are random from unit to unit. The K function here is known as a stress intensity function. For the crack we're studying here, the stress intensity function has this representation. Now, this is a differential equation, because we've got a(t) here and also a here; you can solve that differential equation and get this nice closed form. This is the model that's being fit within JMP. Again, the parameters b1 and b2 will be random from unit to unit (the textbook form of this model is recapped in the note after this passage). Now here's the specification of the prior distribution.
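To make the verbal description above concrete, here is the textbook form of the Paris crack-growth relationship and the kind of closed form being referred to. This is a sketch for orientation: the stress-intensity expression below assumes the common form with a constant stress range, and JMP's b1 and b2 are a reparameterization of the quantities shown here, so the exact expressions in the platform may differ.

\[
\frac{da}{dN} = C\,\big[\Delta K(a)\big]^{m}, \qquad \Delta K(a) = \Delta\sigma\,\sqrt{\pi a},
\]
\[
a(N) = \Big[\, a_0^{\,1-m/2} \;+\; \big(1-\tfrac{m}{2}\big)\, C\,\big(\Delta\sigma\sqrt{\pi}\,\big)^{m}\, N \,\Big]^{\tfrac{2}{2-m}}, \qquad m \neq 2,
\]

where a(N) is the crack length after N cycles and a_0 is the initial notch length. Solving the differential equation and working with the closed form is what lets the platform use the whole degradation path, rather than just the crossing time, as data.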
I've illustrated here two different prior distributions: the default prior in JMP, and the prior distribution that we used in the second edition of our reliability book, which we call SMRD2, Statistical Methods for Reliability Data, second edition. Now, the way we specify prior distributions in SMRD2, and JMP does this as well, is with what I call a 99% range. For example, we say that we're going to describe the mean of the b1 parameter by a normal distribution that has 99% of the probability between -15 and 22. That's a huge range. That is weakly informative (the note after this passage shows how a 99% range translates into a normal prior). Then we have similar, very wide ranges for the other mean parameter here. Then for the sigma parameters, JMP, following the usual procedures for these, uses a half-Cauchy distribution, which has a long upper tail and therefore, again, is weakly informative. Now, in our reliability book, we used much tighter ranges. But interestingly, the two different prior distributions here give just about the same answer, because both are weakly informative. That is, the ranges are large relative to, let us say, the confidence interval that you would get using non-Bayes' methods. Now, in addition to specifying the prior distributions, which again JMP makes very easy because it has these nice default priors, you also have the ability to control the Markov chain Monte Carlo algorithm. The only default that I would change here is that typically I would run more than one chain; I changed the one here to four. The reason for doing that is twofold. First of all, in most setups, including JMP, you can run those simultaneously, so it doesn't take any more computer time. The other thing is we want to compare those four different Markov chains to make sure that they're giving about the same answers. We call that mixing well. If you see a situation where one of those chains is different from the others, that's an indication of a problem. If you have such a problem, then the usual remedy is to increase the number of warmup laps, which is set to 10 by default, but you can increase that. What that does is allow JMP to tune the MCMC algorithm to the particular problem so that it will sample correctly to get draws from the joint posterior distribution. In all of my experience using JMP, and Peng has suggested that he's had similar experience, by increasing that number high enough, JMP has worked well on any example that we've tried. But for most applications, 10 is a sufficiently large number. Here's the results. Here's a table of the parameter estimates. Well, typically in reliability applications, we're not so much interested in the estimates themselves. We're going to be interested in things like failure distributions, which we look at in a moment. Then in this plot, we have estimates of the sample paths for each of the cracks. Again, you can see the failure definition here. As Peng will show you, JMP makes it easy to look at MCMC diagnostics. It's always a good idea to look at diagnostics to make sure everything turned out okay.
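As a quick side note on the 99% ranges used above (this is my own arithmetic, not something quoted from JMP's documentation): if 99% of the prior probability is to fall between a and b, the corresponding normal prior has

\[
\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\,z_{0.995}} \approx \frac{b-a}{2 \times 2.576},
\]

so the range (-15, 22) quoted for the mean of b1 corresponds roughly to a normal prior with mean 3.5 and standard deviation about 7.2, which is extremely diffuse on that scale.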
To look at those diagnostics, you export posterior draws from JMP, and the exported table comes with scripts set up to create these various diagnostics. For example, there's a script to make a trace plot, or time series plot; I always like to compare those for the different chains. Then there's another one to make what we call a pairs plot, or scatterplot matrix, of the draws. That's what we see here. Then, as I said, we can use those draws to generate estimates of the failure time distribution. JMP implements that by using the distribution profiler here. We can estimate the fraction failing as a function of time for any given number of cycles. Now let's go to the Device B RF power amplifier, again an accelerated repeated measures degradation application. We're going to need a model that describes the shape of the paths and the effect that temperature has on the rates of degradation. Again, the use condition in this application is 80 degrees C, and we're going to want to estimate the failure time distribution at 80 degrees C. In SMRD2, this is the way we would describe the path model that fits Device B. We call this an asymptotic model, because as time gets large, we eventually reach an asymptote. In this equation, X is the transformed temperature; we call it an Arrhenius transformation of temperature. X0 is an arbitrary centering value. Beta one is a rate constant for the underlying degradation process. Beta three is the random asymptote. Those two parameters, the rate constant and the asymptote, are random from unit to unit, and we're going to describe that randomness with a joint, or bivariate, lognormal distribution. Beta two, on the other hand, is a fixed, unknown parameter, the effect of activation energy, that controls how temperature affects the degradation rate. This is where the X0 comes in. Typically we choose X0 to be somewhere in the range of the data or at a particular temperature of interest, because beta one would be the rate constant at X0 (a sketch of this path model in equation form appears in the note after this passage). Again, there's a large number of different models that are available. Here is how you would choose this particular asymptotic model. This corresponds to the same equation we have in SMRD2; the only difference is that JMP uses a slightly different numbering convention for the parameters. That was done to be consistent with other things that are already in JMP elsewhere. Again, we have to specify prior distributions, but JMP makes that easy because it provides these defaults, these weakly informative defaults. Here I have the default that JMP would use if we did not have any prior information to bring in. I'm going to do that analysis, but I'm also going to bring in the information that engineers have. In particular, we only have information for b3, and so that's being specified here. But all the other entries in the table are exactly the same as the JMP default, again making it really easy to implement these kinds of analyses. Here's the results. Once again, here we have a table giving the parameter estimates, credible intervals, and so forth.
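Putting the verbal description above into a formula, a degradation-path model consistent with what is described here, written in my own notation, is sketched below. The exact parameterization and sign conventions in SMRD2 and in JMP differ in details (JMP renumbers the parameters), so treat this as orientation rather than the platform's definition.

\[
\mathcal{D}(t; x) \;=\; \beta_3 \Big[\, 1 - \exp\!\big( -\beta_1\, e^{\,\beta_2 (x - x_0)}\, t \big) \Big],
\qquad x = \frac{11605}{\text{temp °C} + 273.15},
\]

where x is the Arrhenius-transformed temperature, x_0 is the centering value, beta_1 is the degradation rate constant at x_0, beta_3 is the asymptote (with beta_1 and beta_3 random from unit to unit), and beta_2 is the fixed activation-energy parameter that controls how temperature changes the rate.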
In this plot, again, we have estimates of the sample paths for all of the individual units. Again, we have the failure definition here, but what we really want are estimates of the failure time distribution at 80 degrees C, and again, we're going to do that by using a profiler. On the left here we have the estimate of the fraction failing as a function of time at 80 degrees C for the default priors. On the right we have the same thing, except that we've used the informative prior distribution for b3. Immediately you can see that the prior information has allowed us to get much better estimation precision: the confidence interval is much narrower. Interestingly, the point estimate of the fraction failing actually increased when we brought in that prior information. In this case, the prior information would allow the engineers to get a much better estimate of the fraction failing as a function of time, and then to make that important decision about how much redundancy to build into the satellite. Let me end with some concluding remarks. Repeated measures degradation analysis is important in many reliability applications. It is also important in many other areas of application, like determining expiration dates for products like pharmaceuticals and foodstuffs: when will the quality be at such a low level that the customer is no longer happy? Also, in certain circumstances, we can bring prior information into the analysis, potentially allowing us to lower the cost of our degradation experiments. JMP 17 has powerful, easy-to-use methods for making lifetime inferences from repeated measures degradation data. Now I'm going to turn it over to Peng, and he's going to illustrate these methods to you using a demonstration in JMP. Thank you, Professor. Now, demo time. The purpose of the demo is to help you begin exploring this state-of-the-art approach to analyzing repeated measures degradation data. I will show you how to locate the sample data tables in JMP and how the information is organized in the report, and I will highlight some important information that you need to know. First, there are three repeated measures degradation samples among the JMP sample data tables. Alloy A and Device B are two examples with embedded scripts; the GaAs Laser example does not have an embedded script. Alloy A is an example without an x variable; Device B is an example with an x variable. To find them, go to Help and click Sample Index, then find the outline node called Reliability/Survival. Unfold it and you should see Alloy A here and Device B here. To find GaAs Laser, you need to go to the sample data folder on your computer by clicking this button to open the sample data folder. Now I'm going to open Alloy A. Then I go to the Analyze menu, Reliability and Survival, and choose Repeated Measures Degradation. Now we see the launch dialog. I assign Crack Length to Y, Specimen to Label, System ID, and Million Cycles to Time, and then I click OK. This is the initial report. It fits linear models to the individual specimens. I'm going to select the third model here by clicking this radio button.
This fits an initial Paris model to the Alloy A data. I'm going to click this Go to Bayesian Estimation button, which generates a configuration interface. Here we see the model formula, and here are the default settings for the priors. We are not going to change anything right now; we are just going to use the default priors to fit our model. Now I'm going to click this Fit Model button here and let it run. Then I'm going to explain what is in the report and, in the end, how to get the failure distribution profiler. Now I click the button, the algorithm starts to run, and the progress dialog says that the first step is tuning. The underlying algorithm goes through some rounds of warmup laps in this procedure. The algorithm is trying to learn the shape of the posterior distribution, for example, where the peak is, how wide the span is, et cetera. In the end, it tries to figure out a good thinning value for drawing samples from the posterior distribution, such that the samples are as little autocorrelated as possible. Then the algorithm enters the second step. The dialog says this step is collecting posterior samples. In this step, an automatic thinning of 80 is applied. The dialog shows how much time in total the algorithm has been running. In the second stage, the dialog also shows the expected completion time; I hope this helps users adjust their expectations accordingly. Sometimes an excessively long expected completion time is a sign of a problem. Then we wait a little bit, and the algorithm should finish soon. Okay, now the algorithm has finished. Let's see what's in the report. First is the completion time. If you left your computer running overnight, you may want to know this the next morning. The second part is a copy of your settings, including priors, number of iterations, random seed, and other settings. The third part is the posterior estimates. Beneath the summary of the posterior estimates, there are two links on the side that allow you to export posterior samples. I'm going to emphasize the first link. One purpose of using this first link to export posterior samples is to inspect potential problems. The two main concerns are convergence and effective sample size. Let's look at it. The table has the parameters in individual columns, and each row is a posterior sample. There are several embedded scripts. The most important one is the first one. I'm going to click this green triangle to run the script. The script simply runs the Time Series platform on the individual columns and shows their time series plots. In the context of MCMC, this plot is known as the trace plot. What do we see here? What I call good results: the series are stationary, with no significant autocorrelation. Loosely speaking, when I say stationary in this context, I specifically mean plots that look like these: straight, equal-width bands of random dots. Okay, let me close the report and the table. We are seeing good results here, and the plot of the data with the fitted model also shows that the results are good. Now we are ready to ask for the failure time distribution profiler.
To do that, go to the report outline node menu and select the option to show the failure-time distribution profilers. Most entries in this dialog have sensible default values, and we only need to supply one of the failure definitions. I'm going to enter 1.6 into the upper failure definition. Before I click OK, I'm going to reduce the number of realizations to 5,000 to save some time. Then I click OK. This is also a computationally intensive procedure, but not as expensive as MCMC in general, so it should finish quickly. You can use the profilers to get the failure probability and the quantile estimates. I'm not going to elaborate further, because the profiler is a very common and important feature in JMP. Okay, this is the end of the demonstration, and you are ready to explore by yourself. But here are a couple of tips that might be useful. First, before you save the script to the table, check the save-posterior-to-script option. That way, the next time you run the saved script, the software will bring back the fitted model instead of going through the lengthy MCMC procedure once again. The second thing that I want to bring to your attention is that we have seen good examples and good results, but there are bad ones. Here are some bad examples. A bad example means either a failure to converge or high autocorrelation. To address them, my first suggestion is to increase the number of warmup laps. My second suggestion is to turn off auto-thinning and apply a larger thinning value manually. If those suggestions don't work, it's likely that the model or its configuration is not appropriate for the data, and you may need help. Okay, this is all we would like to illustrate this time, and I hope you can use this information to start exploring this state-of-the-art approach to analyzing repeated measures degradation data. Thank you.
JMP Live is a secure collaboration platform for JMP content. The first step to collaborating is publishing your content to JMP Live. We start with the simple case of publishing a report, and then we dive into the flexibility that is available to you if you need it.   In this presentation, we demonstrate: How to publish and replace reports. How to publish data by itself (so that others can use it in their analyses). How to make use of existing data that's already on JMP Live. How to select (or create) a place to put your published content. How to do all of this with JSL.     Hello.  I'm  Michael  Goff and  I'm  joined  today   by  Aurora  Tiffany- Davis. We're  both  software  developers on  the  JMP  Live  team. For  those  of  you  who  don't  know,  JMP  Live is  a  relatively  new  product  from  JMP. It  allows  you  to  securely  share   your  JMP  insights  with  your  colleagues , even  if  they  don't  use  JMP  themselves. Additionally,  JMP  Live   enables  collaboration with  colleagues  who  do  use  JMP. You  can  view,  talk  about,  build  upon, and  improve  each  other's  JMP  content. Of  course,  the  first  step  to  collaborating is  getting  your  content  up  to  JMP  Live. So  today  we're  going  to  focus on  that  publishing  step. It's  really  straightforward   to  do  most  publishes, but  we're  going  to  explore  some  of   the  more  advanced  options  today  as  well. Let's  go  ahead  and  get  started. All  right,  let's  pull  up  JMP  here. Go  ahead  and  minimize  this. All  right,  first  up,  I'm  going to  open  up  a  data  table  here. This  is  a  data  table   of  college  financial  data that  I'd  like  to  share  with  my  colleagues. For  those  who  don't  know,   many  US  universities participate  in  athletic  events  and  joined  together  into  conferences for  shared  negotiating  power and  scheduling  stability. College  athletics  is   a  huge  revenue  generator for  all  these  US- based  universities. And  lately,  that  landscape  has  been  changing  significantly. Various  media  providers   like  ESPN  and  Fox  Sports have  been  signing  deals  with  these  conferences to  get  their  games  on  TV. And  of  course,  once  you   bring  money  into  the  equation, things  start  to  get  a  little  bit  wacky. I  have  my  data  set  here. Let's  go  ahead  and  create a  graph  builder  report. I  have  one  saved  on  the  table to  save  some  time. Here's  a  graph  builder. Each  bar  represents   one  of  these  conferences. You  can  already  see  that   there's  some  revenue  leaders  here. We  have  the  SEC  and  the  Big  Ten. They  have  the  largest  slice  of  the  pie. And  we  also  have  some  other  major  players: the  PAC-12 ,  the  Big 12,  and  the  ACC. Our  local  universities  in  this  area   are  all  members  of  the  ACC. That  would  be  NC  State  University,   UNC,  and  Duke. One  other  thing  to  note  about  this  graph: this  is  only  the  public  universities. Private  institutions  don't  have to  share  their  data. So  this  isn't  quite  a  complete  picture, but  it's  enough  to  get  the  point  across. Anyway,  I'm  going  to  go  ahead  and   publish  this  and  share  this  with  my  peers. To  do  that,  all  I  need  to  do  is  hit F ile,   Publish,  and  Publish R eports  to  JMP  Live. Okay.  We  have  our  intro  screen  here. We  select  from  available  reports. I  only  have  one  report  open, and  that's  this  graph  builder. 
And  I  want  to  publish  the  new  report. I'm  going  to  go  ahead  and  hit  Next  here. The  next  step  is   to  select  a  publish  location and  we  choose  a  space  to  publish  to. I'm  going  to  choose  the  Discovery Americas  2022  space  to  begin  with. S paces  are  a  new  concept  in  JMP  Live  17. They're  an  organizational  tool  to  help  you keep  your  related  content  together and  shared  with  the  right  people. We're  going  to  take   a  little  bit  of  a  closer  look at  the  rest  of  the  screen   in  a  couple  of  minutes. I'm  just  going  to  take the  defaults  here  and  move  on. Next  up  is  the  Configure  Reports  page. This  is  very  similar  to   the  previous  version  of  JMP  Live. We  have  our  title,  description, and  some  advanced  options  here  as  well. I'm  just  going  to  take  the  defaults  here  and  go  ahead  and  publish. I've  successfully  published  one  report. I've  published  to  the Discovery  Americas  2022  space. I've  created  a  new  report,   this  graph  builder  here, and  some  new  data  as  well, the  College  Finances  table. We  have  some  options  at  the  bottom. I  can  choose  to   close  reports  after  running, or  I  can  save  the  published  script to  the  script  window. Let's  go  ahead  and  check that  box  and  close  this  window. Now,  this  is  new  in  JMP  17. We  have  the  ability   to  automatically  generate  scripts from  your  interactive  publishes. Let's  go  over  what  the  script  is  doing. First,  we  have  our  new   JMP  Live  connection, and  we're  connecting  to  devl ive 17. That's  one  of  the  JMP  Live  instances. Next,  we  create  some  new  JMP  Live  content and  we  pass  in  some  options  here. The  first  option  is   passing  in  a  report  from  JMP, and  to  do  that,  I'm  grabbing the  window  titled  the  Graph  Builder. Then  we  have  a  bunch   of  default  options  here: title,  description,  and  a  couple  of  other  advanced  options  that  are  set  for  us that  we're  going  to   talk  about  in  a  little  bit. Finally,  we  choose  to  Publish. We  pass  that  content  in   and  we  choose  the  space  to  publish  to. We  chose   the  Discovery  Americas  2022  space. This  here  that  I  have highlighted  is  a  space  key. Think  of  that  as an  identifier  for  the  space. I'm  going  to  go  ahead  and  close  this   and  get  this  out  of  the  way. Then  let's  open  up a  new  data  table  here. This  is  a  new  data  table, C ollege  Finances  with  New C onference  Affiliations. As  I  was  saying  earlier,  there's  been a  lot  of  changes  to  the  landscape  lately. Schools  have  been  changing  which  conference  they're  affiliated  with, and  it's  all  about  the  money. Let's  go  ahead  and  open  up  that  graph  builder  again. Here's  a  graph  builder   with  the  new  conference  alignments from  some  of  our  schools. You  can  see  here,   this  is  a  case  of  the  rich  getting  richer. We  have  the  SEC  and  the  Big  Ten  pulling  away  from  the  other  conferences. And  it  came  at  a  cost   to  the   Pac-12  and  the  Big  12. They've  lost  some  of  their  major  financial players  to  these  other  conferences. Let's  go  ahead  and  share   this  one  up  to  JMP  Live  as  well. Again,  we're  going  to  hit  File,  Publish, and  we're  going  to   Publish  Reports  to  JMP  Live. All  right.  We  can  select from  available  reports  again. 
I  have  just  my  graph  builder  here and  I'm  going  to  publish  new  again. Let's  go  ahead  and  hit  Next. Next,  we  still  have  our Discovery  Americas  2022  space, that  I  selected  previously. But  this  time  I'm  going  to  do a  little  bit  of  organizing. In  JMP  Live  17,  we've  created the  concept  of  hierarchical  folders. We  have  a  folder  here  inside  of  our  space, Deep  Dive:  Publishing to J MP  Live , that's  the  name  of  our  talk. We're  going  to  be  publishing all  of  our  stuff  into  this  folder. But  I  want  to  keep  things a  little  bit  more  organized. So  I'm  going  to  create another  folder  within. I'm  going  to  call  this  College  Finances. I've  created  the  College  Finances folder  inside  of  our  talk  folder, and  all  of  this  is  contained  within the  Discovery  Americas  2022  space. I  think  this  all  looks  good. I'm  going  to  go  ahead and  hit  Next  to  move  on. We're  back  on  the  Configure  Post  page. I'm  going  to  go  ahead and  name  this  something  else. Let's  call  it  New  Conferences. I  think  I'd  like  a  visual  aid to  help  communicate  the  absurdity of  some  of  these  college  landscape  changes. To  do  that,  I'm  going  to  add  an  image. I'm  going  to  choose  this image  here,  the  Big  ten  map. This  is  just  a  map  of   the  geographic  locations of  the  members  of  the  Big  Ten  conference. Historically,  they've  been  a  sort  of   Midwestern  conference  up  here, but  recently  they've  added  two  new  members,  UCLA and  the  University  of  Southern  California, both  over  here  on   the  West  Coast  in  California. Now  this  doesn't  really  make  sense  as  a  geographic  pairing, but  it's  all  about  the  money. So  they've  joined  up  to  get some  of  that  TV  money. Let  me  go  ahead  and  add  this, and  let's  name  this  something  else, Big  Ten  Map. I  think  I  want  to  reorder this  image  to  be  upfront in  front  of  my  report. To  do  that,  I  can  just click  it  and  drag  it and  pick  it  up  and  move  it  up  to  the  top. We  can  drag  and  drop   all  of  our  reports  in  a  single  publish to  order  them  in  any  way  we'd  like. I  think  this  looks  great, so  I'm  going  to  go  ahead  and  publish. Okay. I've  successfully  published  two  reports. I've  published  to   the  College  Finances  folder, and  that's  within our  Discovery  Americas  space. This  time  I've  created  two  new  reports: our  New  Conferences  graph  builder and  our  Big  Ten  Map  image. Finally,  I've  created   another  new  data  table, the  New  Conference A lignments  table  here. Let's  go  ahead  and  take  a  look at  this  folder  in  JMP  Live. I'm  going to  go  ahead  and  click  this  link here  to  the  College  Finances  folder. We've  opened  up  JMP  Live  now. We're  in  our Discovery  Americas  2022  space. We  are  in  our  College  Finances  folder, and  there's  our  two  reports. Now,  if  you  remember   the  first  publish  I  did, I  just  took  the  defaults  on  everything and  put  that  thing in  the  root  of  the  space  here. So  if  I  JMP  over  to  the  root, there's  my  first  report, along  with  some  pictures  of  some  of  our  presenters. I   want  to  move  this  report into  the  College  Finances  folder to  keep  things  organized. To  do  that,  I'm  going  to  jump  over  to  the  Files view  here. 
I'm  going  to  select   both  the  report  and  the  data, and  then  I'm  going  to  hit   the  Move  posts  button  over  here. When  I  do  that,  I  have   the  root  of  the  space  preselected  for  me because  that's  where we're  currently  located. But  I'd  like  to  move  this into  the  College  Finances  folder. Again,  we're  going  from  the  root  of  the  space  here, and  then  we're  moving  a  couple  of  levels  down  into  the  College  Finances  folder. So  when  I  hit  move, the  data  and  report  disappear, and  if  I  click  over   to  the  College  Finances, we  see  they've  shown  up  here. If  I  go  back  to  the  Reports  tab, I  have  my  three  reports  ready  to  go. With  that,  I'm  going  to   pass  things  over  to  Aurora to  take  a  look  at  some  more publishing  scenarios. Thank  you,  Michael. Well,  while  Michael  was   looking  at  College  Finances, I've  been  reviewing   some  internal  data  that  we  have on  the  development  of  the software  for  JMP  Live  version  17. Every  row  of  this  data  table  contains  a  brief  description of  a  code  change  that  one  of  us  made. I've  done  some  really  basic  text  analysis. First,  I  have  just  a  word  cloud. "What  are  developers  talking  about?" During  the  development  of  the  software for  JMP Live  version  17, we've  been  talking   an  awful  lot  about  data, and  we've  also  been  talking an  awful  lot  about  collaboration  spaces. But  every  developer  on  the  JMP Live  team has  had  a  slightly  different  focus. For  example,  Aaron  has  been  talking a  lot  about  downloading  projects. I've  been  talking  a  lot  about  access,   in  other  words,  collaboration  permissions. Chris  has  been  talking  a  lot about  user  groups  and  so  on. I  think  my  coworkers  might  find  this  interesting, so  I'd  like  to  publish  this  to  JMP  Live. So  I'll  go  to  File,  Publish, Publish  Reports  to  JMP  Live. And  of  course,  the  first  choice  I  have  is among  those  reports   that  are  currently  open, which  ones  do  I  want  to  publish? I  would  like  both  of  them. I  need  to  pick  a  space  to  put  that  in. Just  like  Michael,  I'll  pick the  Discovery A mericas  2022  space the  Deep  Dive  folder,  and  I  see  there's a  College  Finances  folder  within, but  that's  not  really  good  for  my  reports, so  I'm  going  to  just  create  a  new  one on  the  fly  here,  Software  Insights. Now,  Michael  has  shown  you   how  simple  publishing  can  be, but  we  do  also  want  to  show  you   some  of  the  more  advanced  options that  are  available  in  case  you  need  them. One  of  these  advanced options  is  publish  data. By  default,  this  is  true because  the  normal  thing  to  do  is  to  publish  not  only  your  reports, but  also  the  data   that  those  reports  rely  upon. The  reason  that's   the  normal  thing  to  do  is: first,  if  you  want  your  reports to  still  be  interactive  on  JMP  Live, you've  got  to  publish  the  data, because  the  data  is  what drives  that  interactivity. For  example,  column  switchers, local  data  filters,  stuff  like  that. In  my  case,  I  don't really  care  about  that. These  are  just  word  clouds. I  don't  need  them to  really  be  interactive. The  second  reason  why  it's  a  normal  thing  to  do  to  publish  your  data is  maybe  you  want  to  update   that  data  later  on. 
When  you  do  that,  you  want   your  reports  to  automatically  regenerate to  reflect  that  latest  information. I  don't  care  about  that. I  just  want  these  word  clouds to  be  like  a  snapshot  in  time. The  third  reason  that  it's  the  normal thing  to  do  to  publish  your  data is  maybe  you  would  like  your  colleagues  to  be  able  to  download  your  data so  that  they  can  also  run  analyses  on  it, create  some  new  reports,   that  kind  of  thing. Let's  say,  I  don't  care  about  that  either. So  there's  really  just  no  reason for  me  to  publish  my  data. I'm  going  to  turn  this option  off  for  both  reports. Another  reason  you  might  choose  to  turn this  option  off  and  not  publish  your  data, although  this  isn't  applicable  in  my  case, is  maybe  your  data   is  just  extraordinarily  large and  you  don't  want  to  wait  for  it   to  be  uploaded  to  JMP  Live. On  the  other  hand, if  you're  perfectly  fine   with  uploading  your  data  to  JMP  Live, you  just  don't  want  anybody else  to  be  able  to  download  it. In  a  case  like  that, you  would  just  come  down   and  uncheck  this  checkbox  that  says "Allow  data  and  scripts  to  be  downloaded". But  in  my  case,  I  just  don't want  to  publish  the  data  at  all. So I'm  happy  with  what  I've  set  up  here, and  I'm  going  to  click  Publish. We  can  see  here  that  we've  published to  the  Software  Insights  folder two  new  reports,  and  it  doesn't say  anything  about  data. That's  because  neither  of  these new  reports  use  data  at  all. They're  just  static  reports. If  we  want  to  confirm  that, we  can  follow  the  link  by  clicking on  either  one  of  these  reports. That  will  open  up  JMP  Live  and  take  us   directly  to  the  newly  published  report. We'll  click  on  the  Details  to  open  the  Details  pane. Scroll  down  to  the  Data  section,   and  we  can  confirm. Yes,  zero  data  sources are  used  by  this  report. Back  in  JMP, I'd  like  to  show  you  the  published  script that's  been  generated  for  us based  on  the  choices  we've  made, just  like  Michael  did. This  JSL  looks  pretty  similar  to  the  JSL that  Michael  showed  you  a  moment  ago. You're  creating  some  new  JMP  Live  content. You're  publishing  that  content. In  my  case,   I'm  publishing  directly  to  a  folder. I  think  the  JSL  you  saw  before   was  publishing  just  directly  to  a  space. And  there's  one  option, of  course,  that's  different, where  I  said,  "Publish  data?  No,  thanks." I  wonder  if  Michael  has  also  been  thinking about  his  data  and  how  best  to  use  it. Yeah, so  let's  go  ahead  and  look  at  JMP. I  have  a  data  table  here. This  is  a  data  set  of  the   New  York  Times puzzle  game,   Spelling  Bee . Lately,  I've  been  really  into   Wordle , but  the   New  York  Times   Spelling B ee  game  is  a  new  one  for  me. I'm  not  quite  sure   what  I  want  to  do  with  this  data, but  I  know  Aurora  really enjoys  playing  these  puzzle  games, so  I'm  going  to  publish   this  data  set  for  her  to  explore. Now,  in  prior  versions  of  JMP  Live, to  get  your  data  up  to  JMP  Live, you  had  to  publish  reports  along  with  it. But  in  version  17,  we've  been  prioritizing  data  a  little  bit  more, making  it  more  of  a  first- class  citizen. So  I  can  publish  this  data  by  itself. 
To  do  that,  I'm  going  to  hit  File, Publish,  and  Publish  Data  to  JMP  Live. Okay,  I  have  my  data  here. I  have  a  list  of  data. I  only  have  one  data  table  open,   the  New  York  Times  Bee  data, and  I  want  to  publish a  new  data  table  here. Just  like  Reports,  we  choose  a  space and  a  folder  within  to  publish  to. In  this  case,  I  want  to  create  another new  folder  for  this  new  type  of  data. I'm  going  to  call  this  Spelling  Bee. Okay. I'm  going  to  create  that. We  have  our  space  in  our  folder. But  unlike  reports,   when  you're  publishing  data, there's  not  much  more  configuration. So  I  can  just  go  ahead and  immediately  publish  right  now. I'm  going  to  hit  Publish. I've  created   the  New  York  Times  Bee  data  table, and  I've  published  it  to  the Spelling  Bee  folder  within  our  space. Let's  see  what  Aurora is  going  to  do  with  this  data. I've  heard  a  rumor  that  Michael   has  published  some  data about  the  Spelling  Bee  puzzle. I  like  to  play  that. So  I  want  to  take  a  look  at  this  data. I'm  browsing  around  on  my  organization's JMP  Live  site,  and  I'm  on  the  home  page. But  the  home  page   really  shows  you  just  reports, because  reports  is  what  most  people  want  to  see  most  of  the  time. I  just  want  to  find  some  data. So  I'm  going  to  use  the  search  field  up  here  in  this  blue  navigation  bar, and  I'm  going  to  hope  that  Michael has  named  his  data  table  well. I'm  going  to  see  if  I  can  search for  it  just  by  typing  in  the  word   bee . Great.  There  it  is,  "nytbee". But  maybe,  what  if  Michael  was  having  a  bad  day, and  he  wasn't  thinking  about  how  to  make  his  data  easily  findable and  he  called  it  just   cool  data  or  something. Well,  in  that  case, I  could  just  search  for  Michael. Let  me  search  for   his  last  name  here,  Goff. There  he  is.  Michael  Goff. I  can  open  his  profile  page. This  is  going  to  show  me  all of  the  reports  that  Michael  has  published, at  least  those  reports that  I'm  allowed  to  see. Under  this  Reports  tab, there's  a  Data  tab. This  shows  me  all  of  the  data that  Michael  has  published that  I'm  allowed  to  see. And  there  it  is,  "nytbee". However  you  find  it, once  you  find  it,  click  on  it. That  will  open  the  data  post. Okay,   I  can  see  that   there  is  some  data  called  "nytbee" and  it  was  published   a  minute  ago  by  Michael  Goff and  it's  not  used  by  any  reports  yet. But  is  it  actually  useful  data  for  me? I  don't  really  know. It's  possible  that  this  data  is  just   humongous and  so  maybe  I  don't  want to  take  the  time  to  download  it just  to  find  out  that  that's  not  what  I  want  after  all. So  I'm  going  to  use  this   View D ata  feature  to  get  a  sneak  peek at  the  shape  of  the  data and  see  what's  in  there, see  if  it's  useful  to  me. It  looks  like  it's  got   a  bunch  of  information  for  every  day that  a  new  puzzle  came  out. Some  of  this  looks  like  not  super  helpful, so  let  me  get  that   out  of  the  way. All  right.  Oh,  cool. It  looks  like  for  every  date,  we  have the  letters  that  were  used  in  the  puzzle, stuff  like  the  maximum  score  you  could  achieve, the  number  of  pan grams. 
The pangram count is the number of solutions that use all of the letters in the puzzle. Pangrams are few and far between, which is what makes it really exciting when you get one. This is definitely data I want to analyze, so I need to get it from JMP Live down to my local machine. I can do that by going to the menu and clicking Download data table, or, because I already have a connection set up between JMP on my machine and my organization's JMP Live site, I can use a shortcut called Open in JMP. What this is going to do is just download it in the background for me and then open it for me in JMP. There we go. This has opened for me in JMP on my machine. To save some time in the demo, we have some analyses ready to run. We've got Number of Pangrams versus Date; there are very few pangrams, typically. Distribution of Letters, not too surprising: Q is not a very common letter to see in this puzzle, but A is a very common letter. And some difficulty metrics: what's the maximum score you can achieve, and what is the maximum number of words you can make, based on what letter you're required to use. So if you're required to use Z in all of your solutions, you're pretty darn constrained on the maximum score that you can achieve. I'd like to get these reports up to JMP Live, so I'll go to File, Publish, Publish Reports to JMP Live. As always, I need to choose, among those reports that are currently open, which ones I want to publish. I want all of them. Then, of course, I need to choose where to put them. Spelling Bee, obviously. Now, remember when I told you earlier that the normal thing to do is to publish not only your reports, but also the data that those reports rely upon? In this case, I don't want to publish the data up to JMP Live because I know it's already on JMP Live. I mean, I just downloaded it. But I also don't want to say, "Don't use data," because then my reports would be non-interactive. They'd just be static. What I want to do is say, "Please publish these reports, but make them use the data that's already up there." To do that, I click on Data Options and I switch from Publish new data to Select existing data. Of course, the next question is, "What data do you want me to use?" I click in there and it makes a recommendation to me. That's because the software recognizes that I just downloaded this from JMP Live and then made some reports with it, so this is probably the JMP Live data that I want to use. In my case, that's a great recommendation; that's exactly the data I want to use. But if I didn't get a good recommendation, I can always just start typing in this field, and it will show me data tables up on JMP Live that match what I've typed in. Once I find the data on JMP Live that I want these new reports to use, I just save that option and click Publish. We can see here that we've published to the Spelling Bee folder three new reports and zero new data tables.
Hopefully,  that's  because  these  new  reports are  successfully  using Michael's  existing  data. If  we  need  to  confirm  that  we  can  just click  on  any  one  of  these  reports, it'll  open  that  newly  published report  in  JMP  Live. Go  to  the  Details  pane, scroll  down  to  the  Data  section, and  sure  enough,   one  data  source  is  used  by  this  report, and  it's  the  "nytbee"  table  that  was  published  by  Michael  six  minutes  ago. I'm  going  to  follow  this  link, open  the  data  post. Here,  we  can  just  double- check  that, yes,  it's  not  only  this  report, but  it's  all  three  of  the  reports  I  just  published are  now  successfully associated  with  Michael's  data. Let's  go  back  to  JMP and  again  show  you  the  published  script  that  was  generated from  the  choices  that  we  made. These  are  probably  looking very  familiar  to  you  by  now. In  this  case,  we  have   three  new  pieces  of  JMP  Live  content for  these  three  reports. We  are  publishing  them  to  a  folder. Here  we're  exercising  a  new  option,   Use E xisting  Data that's  already  on  JMP  Live. Since  I  am  using  Michael's  data, I'd  like  to  let  him  know how  useful  it  was  to  me. Back  on  JMP  Live, I'm  going  to  click  on  Comments  here. Just  let  Michael  know  that  I  found  this  to  be  really  interesting  stuff, "I have done some  basic  analysis. Let  me  know  what  you  think." Let's  see  what  Michael  thinks  of  that. Okay.  I'm  back  on  JMP  Live  here, and  I  just  got  a  notification. It  looks  like  Aurora  has left  a  comment  on  my  data. If  I  go  over  here   and  check  out  her  comment, looks  like  she  thinks it's  really  interesting. And  look  at  that. She's  created  some  reports to  go  with  that  data. Let's  jump  over  to  the  folder  and   take  a  look  at  everything  together. I  have  the  Spelling  Bee  folder  here, Aurora's  three  reports. If I  go  to  the  Files  tab, I  can  see  my  data  table  here  as  well. Let's  go  ahead  and  open  this  entire folder  up  in  JMP  and  take  a  look  at  it. Just  like  Aurora  did  with  her  data  table, I  can  open  a  folder   in JMP using  the  Open  in  JMP  button. I'm  going  to  get  a  couple of  warnings  here. This  is  saying  that  it's  going  to  open  JMP. That's  okay. I'm  also  going  to  get  a  warning  that   this  is  downloaded  from  the  Internet. This  is  a  Mac  thing. Let's  go  ahead  and  open. Here  I  have  a  JMP  project. If  you  want  to  learn  a  little  bit  more about  how  JMP  projects  work  with  JMP  Live, you  can  tune  into  Aurora's  other  talk   with  Erin  Anderson  about  JMP  projects. For  now,  I'm  just  going  to  go over  things  really  quickly  here. First,  we  have  a  journal. This  is  a  manifest  of  the  files that  were  included  with  this  project. Of  course,  that's  going  to  be  the  reports that  Aurora  just  talked  about, our  difficulty  metrics,  our  pan grams, and  the  distribution  of  letters. I'm  looking  at  this  distribution  of  letters, and  I  think  there's  an  enhancement  we  can  make  to  this  report. I  think  it'd  be  interesting  to  look  at just  the  vowels  and   just  the  consonants  together to   get  a  better  comparison  of  the  two. To  do  that,  we  need  to  add   a  new  column  to  the  table to  identify  which  is  a  vowel  and  which  is  a  consonant. 
Let's  go  ahead  and  open the  New  York  Times  table  here. I  can  hit  the  red  triangle  menu  and  add  a  column. Let's  go  ahead  and  name  this  Vowel. Okay,  that  looks  good. And  let's  create  a  formula  here. I  already  know  what I'm  going  to  type  here. I'm  going  to  go  ahead  and  type  this  out, and  then  we  can  talk  about  it. If contains a, e, i, o, u… Okay.  What  this  is  saying  is  if the  letter  is  contained  within  the  set of  vowels  here,  that  means  it's  a  vowel. Otherwise,  it's  just  a  consonant. Hit  OK  here. And  I've  created  my  new  column. This  looks  great. Let's  JMP  back  over to  the  distribution  of  letters and  add  a  local  data  filter. So  I  can  hit  the  red  triangle  menu,   hit Local  Data  Filter. Here's  a  list  of  columns  to  choose  from. I'm  going  to  choose  my  Vowel  column, and  I'm  going  to  go  ahead and  display  this  as  a  list. So  now  if  I  select  consonant, I  can  see  only  the  consonants. If  I  select  vowels,  I  see  the  vowels. This  looks  great. I  want  to  go  ahead  and  update  this  report   on  JMP  Live  with  this  new  enhancement. To  do  that,  I  can  do  a  replace. To  replace  this  report,  I  go  to  File, Publish,  and  Publish  Reports  to  JMP  Live. This  time,  I  have  a  list of  three  different  reports. These  are  the  reports  that  Aurora  created. They're  coming  out  of  the  project. Since  I  had  the  distribution  focused, it's  already  preselected  for  me. And  instead  of  publish  new, I  want  to  replace   an  existing  post  this  time. Let's  hit  Next. When  you  replace  a  report, you  need  to  find  the  report that  you'd  like  to  replace. I  hit  this  drop- down  here  and  I  see that  distribution  of  letters  already. I'm  going  to  go  ahead  and  select  that. This  is  Aurora's  distribution  here. I'm  going  to  go  ahead  and  modify  the  title and  save  with  Local  Data  Filter. I  think  everything  else  looks  good  here, so  I'm  going  to  go  ahead  and  move  on. Next  up  is  the  Match  Data  step. You  have  to  match   the  data   that's  on  the  existing  report  in  JMP  Live to  the  data  you  have  locally. In  this  case,  there's  only  one  table, the  New  York  Times  Bee  table. I  have  three  options  here. I  can  choose  to  publish  new  data, that  would  create   a  new  copy  of  the  data  table for  use  only  with  this  report. I  can  choose  to  use   the  existing  data  on  JMP  Live, or  I  can  choose  to  update  that  data. In  this  case,  I  want  to  update  the  data  since  we  added  a  new  column. When  I  choose  update  here, I  get  a  warning  that  updating will  affect  two  other  reports. That  would  be  the  other  reports  that Aurora  created  that's  using  this  data. In  this  case,  I  know  that's  okay   because  I'm  just  adding  a  new  column  here. That  won't  affect   the  other  visualizations. Everything  else  here  looks  great, so  I'm  going  to  go  ahead  and  hit  Replace. I've  successfully  replaced the  Distribution  of  Letters with  Distribution  of  Letters with  Local  Data  Filter, and  we've  also  updated  the  data  here. I'm  going  to  go  ahead  and  hit the  script  button  here and  close  the  window, and  let's  take  a  look  at  this  script. Just  like  before  we're  creating that  connection,  the  content, grabbing  that  window, and  all  these  options. 
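For reference, the same step can be scripted. The sketch below is only an illustration of the formula described in the demo: it assumes the downloaded table is the current data table and that the letters sit in a character column named :Letter (both assumptions, since the actual column names were not shown), and it attaches the local data filter to a freshly launched Distribution report rather than the existing one.

```jsl
// Sketch of the Vowel column and local data filter described above.
// The column name :Letter is an assumption; adjust to the actual table.
dt = Current Data Table();

dt << New Column( "Vowel",
	Character,
	Nominal,
	Formula(
		// Contains() returns 0 (false) when the letter is not one of a, e, i, o, u
		If( Contains( "aeiou", Lowercase( :Letter ) ), "Vowel", "Consonant" )
	)
);

// Launch a distribution of letters and add a local data filter on the new column
dist = dt << Distribution( Nominal Distribution( Column( :Letter ) ) );
dist << Local Data Filter( Add Filter( Columns( :Vowel ) ) );
```

Selecting Vowel or Consonant in the local data filter then switches the report between the two groups, or shows both, just as in the demo.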
The  difference  is  here, instead  of  publish,  I'm  going  to  replace. To  replace,  I  pass  in  that  content, I  have  to  specify  which  report I'd  like  to  replace. I  did  that  interactively,  but  this is  the  ID  of  that  report  I  selected. Finally,  I  have  an  option of  what  to  do  with  that  data. I'm  choosing  to  update  that  existing  data. So  I'm  identifying  that  data  by  its  ID, and  I'm  updating  it  with the  New  York  Times  Bee  table  data  here. Let's  see  what  Aurora  thinks of  my  new  update  to  her  report. Thank  you,  Michael. All  right.  I've  seen  that  Michael has  made  an  improvement  to  this  report. He's  put  a  local  data  filter  on  here. It  looks  like  if  I  select   both  vowels  and  consonants, then  basically  this  is  the  same   report  that  I  published  initially. So  I  haven't  really  lost  anything. I've  only  gained  in  interactivity. I  think  this  is  going  to  be  great   for  me,  for  Michael, and  for  anybody  else   who  might  look  at  this  report, including  people  who  don't  even  use  JMP. They  can  still use  this  interactive  feature. So  I  think  that's  fantastic. I  love  this  improvement, and  I'm  going  to  comment  and   let  Michael  know  that  I  love  it. "Love  the  new  data  filter.  Thanks." But  why  is  it  that  Michael  was  allowed to  improve  and  replace  my  report? Should  I  be  worried  about  that? No,  it's  simply  because  I  published  this  report  to  a  collaboration  space, Discovery  Americas  2022,   where  Michael i s  a  key  contributor. To  show  you  more  about  what  I  mean, I'm  going  to  switch  over  to  a  browser where  I'm  logged  in  as  an  administrator. As  an  administrator,  I  have  access to  this  Permissions  tab  in  the  space. In  this  Permissions  tab, I  can  turn  on  and  off   collaboration  permissions for  different  individual  users   and  for  groups  of  users. So y ou  can  see  here  that  the  administrator has  given  Michael   replace  permission  on  this  space. When  your  JMP  Live  admin  creates  a  new  space  for  your  organization, they're  the  only  ones   with  access  to  that  space, then  they  have  to  turn  on  collaboration  permissions  for  other  people. So  if  one  of  your  co  workers  has collaboration  permission  on  a  space, it's  because  an  admin  trusts  them to  use  that  responsibility  wisely. While  I've  been   going  on  about  collaboration, I  think  that  Michael  has  been  looking  at  all  of  the  pieces  of  JSL that  we've  been  generating throughout  this  demonstration. Let's  see  where  he's  at  with  that. Yes,  that's  right. Let's  go  ahead  and  look  at  JMP  here. Let me get  some  stuff  out  of  the  way, and  open  up  this  script   that  I've  been  working  on. I've  taken  all  of  our  scripts  that  were generated  by  our  interactive  publishes, and  I've  put  them  together   into  this  one  script to  recreate  our  entire  demo  in  one  go. Let's  go  ahead  and  take  a  look at  what  this  script  is  doing. First,  we're  creating   a  new  JMP  Live  connection  here to  our  instance  that  we're  publishing  to, followed  by  saving  off   some  variables  to  use  later. I  have  the  space  key  here  for   the  Discovery  Americas  2022  space. We're  going  to  use  that  for  our  publishes, and  I  have  a  path to  all  of  our  data  here  as  well. 
First,  I  need  to  recreate  that  folder structure  that  we  created  interactively. To  do  that,  I'm  going  to  use some  of  the  JMP  Live  JSL  here, Create  Folder. The  first  folder  I'm  going  to  create  is  that  container  folder, our  Deep  Dive:  Publish, but  this  is  our  script  version. I'm  publishing  this   to  our  space  using  the  space  key. When  I  create  that  folder, I'm  going  to  go  ahead  and  save  off that  ID  of  the  folder  to  use  later. Next,  I'm  going  to  create   the  three  subfolders: the  College  Finances  folder, the  Software  Insights  folder, and  the  Spelling  Bee  Puzzle  folder. I'm  going  to  use the  folder  ID  to  create  them. I  pass  the  folder  ID  in  here, and  that  makes  this  folder  here a  child  of  the  parent  folder  there. So  I've  got  those  folders  created, and  of  course,  I'm  also   saving  off  these  IDs  as  well to  use  later  in  my  Publish  operations. Next,  I'm  opening  up   data  tables  and  running  reports. I've  opened  up   the  College  Finances  folder  here. I'm  running  that  table  script  to  create  a  report. And  here's  that  first  section of  interactive  Publish  code  here. I'm  creating  that  new   JMP  Live  content  like  before, taking  all  the  defaults   and  calling  Publish. The  only  edit  I've  made  here   is  I've  substituted  out  the  ID with  the  ID  of  the  folder   I  just  created  above. Next,  I'm  going  to  create   the  new  conference  alignments  table, run  that  script,   and  then  create  that  content. Do  the  same  thing  I  did  before to  that  college  folder  ID. Next,  I'm  going  to  get  Aurora's word  cloud  data  table  and  reports. I'm  opening  that  table, writing  the  scripts, and  just  like  Aurora  showed, we're  creating  that  new  JMP  Live  content, and  we're  not  publishing  the  data  here so  publish  data  is  zero. A gain,  we're  publishing, and  I'm  passing  in   the  word  cloud  folder I D  here. We're  putting  that  in  our new  folder  we  created. Then  I'm  going  to  create that  New  York  Times  Bee  data  table and  three  more  reports,   get  those  opened  up. Then  we're  going  to  create   that  new  JMP  Live  content. Finally,  publish  all  that  up   to  JMP  Live  to  the  bee  folder  ID. If  you  remember,  I  did  some  edits to  that  report  after  we  published  it. I'm  going  to  save  off the  published  results  here. This  is  just  a  list  of  the created  reports  and  data. I  want  to  iterate   through  this  list  of  posts to  identify  the  report  called Distribution  of Letters  and  also  the  data. Then  I'm  going  to  save  off  these  IDs. I  have  my  report  ID  and  my  data  ID. I'm  saving  these  off  so  I  can replace  that  post  in  a  second. Next,  we're  going  to  create   that  new Vowel  column  that  I  created and  also  add  that  Local  Data  Filter. Then  finally,  we're  doing that  last  Replace  operation. We're  creating  that  new  JMP  Live  content, and  then  we're  calling  Replace. We're  passing  the  content  into  Replace. We're  identifying  the  report   we'd  like  to  replace  with  that  report  ID. Then  we're  identifying  the  data   we  want  to  update  with  that  data  ID  here. Then  finally,  we're  closing   everything  to  clean  it  up and  opening  up  the  web  browser  to  the  folder  here. Let's  go  ahead  and  run  this  script. We're  getting  started  here. 
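Before the run, here is a condensed sketch of the skeleton that script follows: connect, create folders, publish content into them, then replace one report while updating its data. It uses only the calls named in the walkthrough (New JMP Live, Create Folder, New JMP Live Content, Publish, Replace), but the URL, space key, IDs, and the exact argument wrappers are assumptions reconstructed from the narration, so confirm every signature against the JMP 17 Scripting Index before relying on it.

```jsl
// Skeleton only -- argument wrappers and id handling are assumptions.
live = New JMP Live( URL( "https://your-jmp-live-site.example.com" ) );
spaceKey = "discoveryamericas2022";                  // assumed key of the target space

// Create a parent folder in the space, then a child folder inside it
parent = live << Create Folder( "Deep Dive: Publish (script version)", Space( spaceKey ) );
parentID = parent << Get ID;                         // assumed message for retrieving the id
beeFolder = live << Create Folder( "Spelling Bee Puzzle", Parent( parentID ) );
beeFolderID = beeFolder << Get ID;

// Publish open reports into the child folder without uploading local copies of the data
// ("publish data is zero"), because the table already lives on JMP Live
content = New JMP Live Content(
	Window( {rpt1, rpt2, rpt3} ),                    // hypothetical report window references
	Publish Data( 0 )
);
result = live << Publish( content, Folder( beeFolderID ) );

// Replace one published report and update the shared data it uses
newContent = New JMP Live Content( Window( rptWithFilter ) );
live << Replace( newContent,
	Report ID( reportID ),                           // id found by iterating the publish results
	Update Data( dataID )                            // id of the existing table on JMP Live
);
```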
Michael, I noticed that a semicolon is selected with your cursor. Is it possible that only that selection is running? -Yes, thank you. There we go. Now we have our reports opening: our second report, Aurora's word clouds, the New York Times Bee table, and the script here, and there's our folder. We created that folder on JMP Live. Here's the script version: our Spelling Bee Puzzle report, Software Development Insights with the word clouds, and our College Finance reports. With that, that about wraps up our demo of publishing to JMP Live in JMP 17. Be sure to check out our other talks on JMP Live. We have a general updates talk, a talk on using projects with JMP Live, and a talk on our biggest feature in JMP Live 17: refreshable data. Thanks again for joining us today.
JMP Live is a secure collaboration platform for sharing JMP insights with your colleagues, even if they are not JMP users. JMP Projects are self-contained files that help you organize JMP data tables, reports, scripts and more.  This presentation walks through some strategies for staying organized in JMP Live, and in JMP Projects, and for moving smoothly between the two.     Hi,  I'm  Aurora  Tiffany- Davis, and  I'm  joined  today  by  Aaron  Andersen. We're  software  developers on  the  JMP  Live  team. We'd  like  to  talk  to  you  today about  staying  organized on  JMP  Live  and  in   JMP Projects. As  a  reminder, JMP  Live  is  a  secure  platform for  sharing  your  JMP  insights with  your  colleagues, even  if  they  don't  use  JMP  themselves. It  also  offers  deeper  collaboration with  your  colleagues  who  do  use  JMP. JMP Projects  are  self- contained  files which  can  help  you  to  organize your  data  tables,  your  reports, your  scripts,  and  more. To  get  started,  Aaron  is  going  to  talk a  little  bit  more  about those   JMP Projects. Aaron? Thanks,  Aurora. I  am  JMP  Developer  Aaron  Andersen, and  I'm  going  to  show  you  how  to  organize your  work  using   JMP Projects. To  do  this,  I'm  going  to  use  some  data from  the  JMP  sample  data  directory. If  you  have  JMP  open while  you  watch this video, you  can  follow  along  with  us. Seventeen  samples,  data. The data I'm  going  to  use   is called  Airline Delays .jmp. I'd  like  to  do  some  analysis  of  this, and hopefully,  get  some  insights. Because  I  know  that  I'm  going to be  producing  several  reports, and  I'm  not  sure  what  else with  this project, I  would  like  to  keep  all  of  those  things organized  and  together in JMP. To  do  that,  I'm  going to  use  a   JMP Project. I  will  go to  File,  New, P roject, which  creates  a  new  project and  opens  the   JMP Project  window. JMP Project  window  is  a  container  window into  which  all  of  the  data  tables and  reports that  I'm  going  to  create  or open will  live  throughout  this  project. Let's  drag  Airline  Delays  in. We  can  see  this  JMP  data  table   opened here  in  the  project  window. Let  me  make  this  bigger. The  Airline  Delays  data  table  contains information from  almost  30,000  airline flights that  took  place  in  the  United  States over  the  course  of  a  year. For  each  such  flight, we  have  information  about how  long  the  flight  was, whether  the  flight  arrived  on  time  or  not and  by  how  much, and  what  airline  flew  that  flight. To  get  a  better  visual  picture   of this information, let's  open   Graph Builder. Let's  start  by  getting  an  overview of  what  a  typical  week  looks  like. Typically,  I  want  to  know, is  there  a  day  of  the  week that  is  more  or  less  likely   to have its flight  delayed  than  others? Now,  of  course,  all  I'm  really  learning from  this  is,  was  there  a  day  of  the  week in  the  particular  year this  data  was  taken? But  I  can  reasonably  extrapolate some  of  this  information to  airline  flights  today. We'll  start  with  Day  of  the  Week, put  that  in  the  Y  column, and  Arrival  Delay  in  the  X  column. That's Arrival Delay. It's not... Let's  switch  these  around,  Order  by Swap. It  isn't  liking  the  day  for  some  reason. Move Day of the Week down to here. Put  Arrival  Delay  on  the  Y  axis. There  we  go. 
Now,  I  have  pretty  good  graph  showing  me the  mean  arrival  delay for  any  given  day  of  the  week. I  can  already  see  that  Friday is  the  biggest  day  most  likely or  statistically  expected to  have  the  longest  delays and  Saturday  is  the  shortest. To  get  a  little  bit  better  view  of  this, let's  group  this  by  airline. Drag  airline  to  Group  Y. Then  let's  flip  these  back like  I  wanted  to  do  the  first  time when  I  couldn't  quite  get  it  right. There  we  go. Finally,  to  help  see the  days  of  the  week  better, we'll  drag  Day  of  the  Week into  the  Color  column. Then  I'm  going  to  change  the  color scheme  on  the  Day  of  the  Week. Double  click  on  this  label  here, hit  color  scheme, and  get  a  color  scheme that's  not  quite  so  bold for  this  particular  graph. Now, I  think  I'm  finished. What  I  have  is  a  graph  showing for  each  airline and  each  day  of  the  week, what  the  mean  arrival  delay was  for  the  year. The  colors  allow  me  to  follow a  particular  day from  one  airline  to  the  next. The  first  thing  I  noticed  in  this  graph, which  is   funny, is  that  there's  only  one  of  these that's  negative. If  I  flew  Southwest  on  a  Saturday, my  expected  delay  would  be  negative, which  is  to  say I  respect  to  arrive on time, whereas  every  other  row in  the  whole  graph  is  positive. On  average, the  flights  were  late  every  other  day of  the  week  for  every  other  airline. That's  not  what  you  want   if you're  ringing  an  airline. But  at  least  they're  not  too  bad, 15 minutes, 10, 15  minutes  appears  to  be  typical for  the  average  anyway. To  try  to  get  a  better  picture of  this data, let's  create  one  more  graph. Open   Graph Builder  a  second  time, and  this  time, let's  try  to  get  an  overview of  an  entire  year's  worth of  airline flights to  see  if  there  are  clusters of  higher  and  lower  delays throughout  the  year. To  do  that,  I'm  going  to  drag  Month to  the  Y  column and  Day  of  Month  to  the  X  axis. Graph Builder  will  automatically create  a  heat  map  for  me. Then  I'm  going  to  make  sure that  Arrival  Delay  is  the  color  source. Finally,  let's  go  into  the  Y  axis, and  reverse  the  order so  that  January' s  at  the  top and  December's  at  the  bottom. Now  I  have  a  graph  showing   an entire year's  worth  of  airline  flights. I  can  already  see where  the  dark  red  is. There  are  certain  clusters  of  delays. There's  a  cluster  here   right around the  Christmas  holidays in  the  United  States  that  drops  off once  the  holidays  actually  start. There's  an  oddly  delay  filled  day  here right  in  the  middle  of  November, and  there's  a  lot  more in  the  summer  months than  the  winter  months. I  can  speculate  that maybe  these  delays are  correlating  with  flight  volume, but  the  more  people  who  fly,   the more likely  a  flight  is  to  be  delayed. Because  airports  would  be  busier, loading and unloading  a  plane takes long if there  are  more  people  on  it. It's  a  pretty  good  hypothesis. I  don't  have  that  data in  this  table,  though, so  I  can't  confirm  it  yet. But  I  have  a  pretty  good  start. If  I  want  to  see  the  two  graphs that  I  made  side  by  side in  the  product  window, I  just  go  up  to  Airline  Delays, and  I  drag  it  out,  and  I  drop  it in  this  dock  right,  drop  down. 
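For readers following along in JSL rather than interactively, here is a rough sketch of the two graphs just built. The column names are written as spoken in the demo and may differ slightly from the headers in the Airline Delays sample table, and the element options (mean bars, heat map coloring) are a best guess at the interactive choices rather than a saved script.

```jsl
// Sketch of the two Graph Builder views described above; column names are
// taken from the narration and may need adjusting to the sample table.
dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );

// Mean arrival delay by day of week, grouped by airline, colored by day
dt << Graph Builder(
	Variables(
		X( :Day of Week ),
		Y( :Arrival Delay ),
		Group Y( :Airline ),
		Color( :Day of Week )
	),
	Elements( Bar( X, Y, Legend( 1 ), Summary Statistic( "Mean" ) ) )
);

// Heat map of arrival delay across the calendar year
dt << Graph Builder(
	Variables( X( :Day of Month ), Y( :Month ), Color( :Arrival Delay ) ),
	Elements( Heatmap( X, Y, Legend( 2 ) ) )
);
```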
Now  I  have  my  two  graphs  side- by- side, so  I  can  see  them  both  simultaneously. If  I  wanted  to, I  can  actually  take  the  data  table, I  can  drag  that  down  to  the  bottom, so  that  I  can  see  all  three  graphs that  is to  say  all  three  items, two reports, and  the  graph  at  the  same  time. This  is  particularly  useful  if  I  want  to modify  this  data  table, and  watch  the  graphs  update  as  I  do. But  before  I  do  that, let's  save  this  project. I've  made  a  lot  of  progress  here. I  like  to  save  my  work so  that  I  don't lose it if  something  goes  wrong or  I  mess  something  up. Let's  go  to  File,  Save  Project  As, put it on  the  Desktop and  call  it  Airlines.jmpprj which  I  pronounce   JMP Project. You can imagine  not  any  vowels. JMP Project. That  will  save  the  project  file here  on  my  Desktop, and  I  can  now  close  it. A ll  my  reports  that  I  created and  the  layout  that  I  use are  saved  in  that  file. If  I  reopen  that  file, everything  comes  back right  the  way  I  left it, which  is  the  second  useful  feature of   JMP Projects. Not  only  can  you  organize  your  data and  your  reports  in  the  project  window in  a  very  convenient  way, however  you  want, you  can  also  save  the  project at  any point, close  it,  and  resume where  you  left  off  later. In  fact,  you  can  open more  than  one  project  file at  the  same  time if  you  want  to  work on  more  than  one  JMP analysis or  more  than  one  project  any  given  day. Now  that  I  have  this  project  back  open, I'm  looking  at  the  Distance and  the  Elapsed  Time  columns  here, and  I  can  see  that there  is  some  huge  variation in  the  length  of  these  flights. This  flight  is  327  minutes  long. That's  five  and  a  half  hours of flight,   which  makes  sense;  it's  2,200  miles. Whereas  this  flight is  a  little  bit  less  than  an  hour. Some  of  them,  if  I  keep  scrolling, aren't  significantly  less  than  that. Let's  say  that  I'd  like  to  exclude   shorter flights  from  my  analysis, under  the  idea  that I   only  want  to  look at  substantial  flights. Maybe  if  your  flight is  only  half  an  hour long then  small  delays and  getting  a  runway  position change  things  more  than  in  a  large  flight, where  you  have  a  chance  to  make  up  time. What  I'd  like  to  do  is  exclude from  these reports all  flights  that  were  less  than say  an  hour  and  a  half  worth  of  length. To  do  that,  I  will  go  to  Rows, Row  Selection, Select  Where. I'm  going  to  select  Distance and  set  distance  is... actually,  Elapsed  Time. You  can  do  with  mile, let's  do  it  with  minutes. Any  flight  where  e lapsed  time is  less  than  90  minutes, I  am  going  to  select  this in  the  data  table. I  can  now  see  that there  were  9,338  such  flights out  of  29,000  total  flights, so  a significant  number  of  them. To  exclude  them  from  the  analysis, I  can  go  up  here  to  Rows, select H ide  and  Exclude, and  all  of  these  are  now  hidden. You  can  see  that  the  data changed  a  little  bit. It  didn't  change  a  lot,  but  it  did  change. There  is  a  difference  in  longer  flights versus  shorter  flights in  what  the  mean  delays  turn  out  to  be. Having  done  that, I'll  save  the  project again so  that  I  can  save  my  progress and  come  back  to  this  point  later. 
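The exclusion step can also be scripted. The sketch below assumes the elapsed-time column is literally named Elapsed Time; substitute the actual header if the sample table differs.

```jsl
// Hide and exclude flights shorter than 90 minutes, as done interactively above
dt = Open( "$SAMPLE_DATA/Airline Delays.jmp" );
Current Data Table( dt );

For Each Row(
	If( :Elapsed Time < 90,
		Hidden( Row State() ) = 1;     // removed from graphs
		Excluded( Row State() ) = 1;   // removed from analyses and summary statistics
	)
);
```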
Before  I  do  that,  notice  that  I  modified the Airline delay data table to  hide  and  exclude  all  of  these  rows. I  would  like  when  I  resume  this  work for  those  modifications   to restore  with  the  project. But  what  I  don't  want  to  do is  overwrite the copy that  is  in  my  sample  data  folder because  I  would  keep  these pristine  and  fresh the  way  they  sit  with  JMP  for  future  use. What  I'm  going  to  do  is  I'm  going to  save  a  copy  of  this  data  table. I'll  Save  As. But  because  I'm  in  a  project, I  have  the  option  to  save  it to  a  place  called  the  Project  Contents, which  is  about  what  it  sounds  like. It  is  how  I  can  save  this  table to be contained inside  the  project  file  itself. The  project  is  essentially a  miniature  file  system that  can  contain  files  and  folders relevant  to  your  JMP  analysis that  live  inside  the  project  file. If  I  hit  Save  here, we  can  now  see  that  Airline  Delays, a  copy  of  it is  saved  inside  of  this  project. If  I  go  back  to  my  Desktop... I  got  to  save  the  project  first, save the  project, then  go  back  to  my  Desktop. We  can  see  that  when  I  save  the  project, the  size  gets  quite  a  bit  larger   because now  this  file  itself  contains, not  just  two  reports, but  also  the  data  table that  I  use  to  generate  those  reports. Because  this  is  a  self- contained  file, I  can  do  things  like  copy  and  paste to  create  a  backup  copy  of  the  file. Now  my  backup  copy  also  contains its  own  copy  of  airlinedelays.jmp safely  secure  here in  case  I  mess  up  the  other copy  in  my  main  project. Because  this  is  a  single  file, it's  easy  for  me  to  email  this  file to  one  of  my  colleagues, if  they  also  are  a  JMP  user, and  allow  them  to  open  this  project and see the  results of  the  work  that  I  did. However,  if  I  want  an  easy  way to  share  this project with  non- JMP  users, if  I  want  an  easy  way  for  me and  my  colleagues  to  collaborate on  this  work  together, I  can  upload  these  reports to  my  organization's  JMP  Live  Instance, where  my  colleagues  can  see  them. To  do  that, and I  put  these  back  into  Tabs  first, to  publish  these  reports  to  JMP  Live, I'm  going  to  go  File,  Publish, Publish  Reports  to  JMP  Live. This  loads  the  JMP  Live  Publish page  from  my  organization's J MP  Live  Instance. I  want  to  publish  both  of  these  reports, and  I  want  to  publish  them  to  a  space called  Discovery  Americas  2022, and  a  folder called  Staying  Organized  on  JMP  Live and  in  JMP Projects, title  of  this  presentation, where  we'll  explain  shortly what  a  space  is and  how  full  of  JMP  Live  work. But  for  now,  this  is  where I  want  to  put  this  stuff. Let's  go  ahead  and  hit  Next. This last  string  gives  me  a  chance to  customize  the  titles  of  these  reports. These  are  generic. Let's  rewrite  this  to  be Airline  Delays  by  Weekday  and  Airline, or  say,  Day  of  Week, to be  less  ambiguous. Down  here,  let's  call  this  one Airline  Delays  by  Month  and  Day  of  Month. Now  I  have  two  reports  ready  to  go. I  hit  Publish. JMP  is  going  to  upload  these  reports and  the  data  that  I  use  to  create  them to  our  JMP  Live  Instance. Now  we  see  Success  page. It's  already  finished. 
Showing  me  that  I  published  two  reports and  one  data  table to  a  folder  called  Staying  Organized on  JMP  Live  and  in   JMP Projects. I  can  click  on  this  link to  actually  load  it  in  JMP  Live and  see  that  it  is  there, largely  the  same  as  it  was  on  my  system. To  show  off  JMP  Live  and  demonstrate the  value  and  able  to  collaborate  and  work with  reports  in  this  way, and  pass  it  to  Aurora. Yeah. Thank  you,  Aaron. All  right,  so  I'm  browsing  around on  the  homepage of  our  organization's  JMP  Live  site, and  I  see  that  Aaron  has  published some  new  reports that  look  pretty  interesting, having  to  do  with  airline  delays. I  see  that  he  put  both  of  these in  the  same  folder. Let's  take  a  look  at  that  folder. One  of  the  easiest  ways  to  stay organized  when  you're  working  with  JMP  Live is  whenever  you're  publishing  the  reports,  just put them  somewhere  reasonable. Easy  enough. And  Aaron  has  done  that  here. He's  put  his  reports  into  a  folder called  Staying  Organized  on  JMP  Live. Of  course,  that's  the  title of  this  talk  that  we're  giving. But  if  you  recall, even  before  he  chose  a  folder, he  was  asked  to  choose  a  space to  publish his content to, and  he  chose the  Discovery  Americas  2022  space. This  is  a  place  for  Aaron  and  I  and  a  few of  our  other  colleagues  in  JMP  Live to  work  on  content  related to  this  Discovery  conference. It  contains  interesting  reports not only  in  the  Staying  Organized on  JMP  Live  talk, but  also  we've  got  another  talk in  this  conference that  takes  a  deep  dive into  publishing, another  one  about  automatically refreshing  your  data,  and  so  on. It  makes  sense  that  we  would all be  working  in  the  same  space. But  what  is  a  space? Well,  I  like  to really  call  them collaboration spaces, because  that's  really  what  they  are. They're  just  a  place for  multiple  JMP  users to  work  together  on  the  same  content. To  show  you  more  about  what  I  mean, I  will  switch  over  to  a  browser, where I'm logged  in  as  an  administrator. As  an  admin,  I  have  access to  this  Permissions  tab. When  I  click  on  this  tab, I  can  easily  turn  on  and  off collaboration permissions for  individual  users and  for  groups  of  users. We  can  see  here  that  in  this  space, all  of  the  users  in  my  organization have  permission  to  view  the  content in  this  space  and  to  download  it. But  Aaron  and  I, we  have  some  extra  permissions, so  we  have  the  permission  to  create new  content  in  the  space, in  other  words,  to  publish, like  Aaron  just  did  a  moment  ago. We  also  have  permission to  edit  content  and  so  on. We  are  fairly  well  trusted  members of  the  space. Let  me  switch  back to  my  normal  browser  now. Of  course,  Discovery  Americas  2022 isn't  the  only  space that  my  organization has set up, and  I'd  like  to  show  you how  to  find  additional  spaces. But  before  I  do,  I  know  I'm  going  to want to  find  this  folder  again, so  I'm  going  to  bookmark  it to  make  it  really  easy for  myself  later  on. Now  if  I  go  up  to  this  blue  navigation  bar and  click  on  the  word  Spaces, it  opens  up  the  Space  Directory, and  we  can  see  here  that  I  have access  to  some  other  spaces  as  well. Discovery  Americas  2022   is the  one  we  are  just  looking  at. 
We  also  have  one for  Discovery  Europe 2023, a  conference  coming  up  in  the  spring. I  see  that  there's  also  a  space  here with  my  name  on  it. That  is  my  own  personal  space. In  JMP  Live  version  17, every  user  gets  their  own  personal  space to  do  with  whatever  they  want. There's  also  a  shortcut to  your  personal  space. If  you  go  all  the  way  to  the  top and  all the way to the right and  click  on  your  profile  picture, you'll  see  this  shortcut My  Personal  Space, My space  doesn't  really   have that much in  it, but  what  it  does  have is  this P ermissions  tab, even  though  I'm  not  an  admin. The  reason  being this  is  my  own  personal space, so  I  should  get  a  say   on who has access  to  it. Of  course,  by  default, I'm  the  only  one  with  access  to  it, but  I  can  invite  more  people  in  if  I  want. I  have  chosen  to  let  Michael  Goff  in to  see  the  content  in  my  space, although,  I  don't  really let  him  do  much  else. Now  that  we've  had that  brief  tour of spaces, let's  go  back  to  the  folder we  were  working  in. I'm  going  to  use  the  bookmark  I  made to  get  there  quickly. All  right,  here  we  are. I  can  see  these reports  that  Aaron  has  published, but  I'm  thinking  ahead, and  I  think  we're  going  to  want   a lot more content  in  here  in  the  future, maybe  some  content  that  doesn't  have anything  to  do  with  airlines. To  stay  organized, I'm  going  to  create  a  new  folder by  going  up  here and  finding  the  New  Folder  icon, click  that,  and  let's  say  airlines. Now  that  I  think  about  it, I  actually  have  some  airlines, at  least  one  airline  report that  I  want  to  publish  as  well. But  whereas  Aaron's  reports are  entirely  related  to  airline  delays, my  report  has  nothing  to  do  with  that. It's  more  to  do  with  the  flow  of  traffic of  airplanes  over  the  continental  US. I'm  going  to  add  another  layer of  organization  in  here  under  Airlines. I'm  going  to  create  a  folder called  Delays  for  Aaron's  stuff and  a  folder  called  Traffic  Flow for  my  stuff. Now,  I  just  want  to  move Aaron's  content  into  the  right  place. The  easiest  way  for  me  to  do  that   is to  click  over  to  the  Files  tab, and  I  will  select  all  of  Aaron's  files, that  being  the  two  reports that  he published and  the  data that  those  reports  rely  upon. I'll  come  over  here  to  the  upper  right and  select  Move  Posts, and  I'll  find  that  Delays  folder I  just  created  a  second  ago, and  move  all  of  Aaron's  content  in  there. Now  we've  got  Airlines  with  two  folders: Delays  that's  got  Aaron's  stuff, and  Traffic  Flow  that's  got  nothing  in  it, because  I'm  just  about to  publish  something  to  it  right  now. Let  me  switch  over  to  JMP  on  my  machine. I  have  here  a  bubble  plot with  a  local  data  filter. This  shows  the  flow  of  flights that  are  taking  place over  the  continental  US. It  also  has  a  local  data  filter. I  can  filter  this to  just  show  certain  airlines. I've  chosen  Delta  and  Southwest. We  can  see  here  that  Delta has  a  hub  in  Atlanta,  Georgia, and  we  can  see,  rather  unsurprisingly, that  Southwest  Airlines concentrates   its flight  patterns in  the  Southwest  region of  the  United  States. Let's  publish  this  to  JMP  live. File. 
It  works  just  the  same  as  when  Aaron was  publishing  from  his  project, even  though  I'm  publishing outside  of  a  project. File, P ublish, Publish  Reports  to  JMP  Live. The  first  thing  you  do  is  choose   among those  reports  that  you  have  open which  ones  do  you  want  to  publish. It's  a  really  easy  decision  for  me   because I  only  have  one  report  open. Next. Now  I  need  to  choose,  of  course, where  to  put  it. I'm  going  to  stay in  the  Discovery   Americas  2022  space. I'm  going  to  stay in  the  Staying  Organized  folder. But  under  that, I  want  to  drill  down  a  little  bit, go  inside  Airlines and  inside T raffic  Flow, and  that's  where  I  want   my reports to  be  published. I'll  click  Next. Just  publish  that. We  can  see  here  on  the  results  screen that  we  have  published to  the  Traffic  Flow  folder one  new  report  as  well  as  the  data that  the  report  relies  upon. It's  this  data  that  allows  the  report to  remain interactive once  it  goes  on  JMP  Live. Let  me  follow  the  link  here, and  this  will  open  up my  organization's  JMP  Live  site and  take  me  right to  this  newly  published report and  we  can  see that  it  is  still  interactive. I  can  speed  it  up, slow  it  down, maybe  I  want  to  find  out what's  going  on  with  Express  Jet, a  much  smaller  airline. You  can  see the  interactivity  is  still  here I  want  to  let  Aaron  know  that  I've  done a  little  bit  of  reorganization so  that  he  can  see  what  he  thinks  of  it. I'm  going  to  move  back  up our  folder  hierarchy  a  little  bit. My  report  is  in  the folder  Traffic  Flow, of  course,  so  I'll  move  up  there, then  I'll  move  up  one  more to  this  Airlines. I  want  to  let  Aaron  know  what's  going  on . Let  me  actually  make  a  comment   on one of  his  reports. That's  going  to  make  sure that he gets a notification about it. Just  open  one  of  his  reports and  click  on  Comments  here, and  I'll  just  let  him  know. "Aaron,  I  did  a  bit  of  reorganization. Let  me  know  what  you  think." Let's  see  what  Aaron  thinks  about it. Thanks, A urora. If  I want  my  JMP Live Instance, I'm  going  to  see a  pop  up  here in  the  upper r ight- hand  corner just  to  say  I'm  logged  in  on  my  computer  to the same  JMP  Live  Instance, a  little  alert. When  I  click  on  this, I  can  see  that  Aurora  Tiffany- Davis added  a  new  comment to  a  report  that  I  uploaded. I  can  click  here  to  go  to  the  report, and  then  view  the  comment that  Aurora  made. "I  did  a  bit  of  reorganization. Let  me  know  what  you  think." I'm  just  going  to  say, "This is great, thanks." I  appreciate  her  helping  me  out  with  this. I  can  now  go  in  and  take  a  look   at the report  that  she  added I  said  it  was  great  before  I  saw  it because  we're  recording  a  video, I  got  a  sneak  preview. I  suppose  in  real  life, I  want  to  see  it  first, so  I  can  know  if  I'm  saying this  is  great  or  this  is  crap, depending  on  what  I  think of   Aurora's  work, but it is great. It's  uploaded. It's  airline  flights going  across  the  country . Depending  on  whether  the  data  table that  she  used  for  this  has  dates  in  it, I  might  actually  be  able  to  use  it to  answer  the  question  I  had  earlier, which  is,  are  the  delays correlated  with  volume? 
Even  if  it  doesn't, it's  data  that  I  would  like  to  add to  the  project  that  I  created with  the  airline  delays. What  I'd  like  to  do  then is  create  a  new project that  contains  the  delay  reports that I made, plus  the  traffic  flow  reports that  Aurora  made. On  JMP  Live, I  can  do  this  automatically. I  just  go  to  the  Airlines  folder. I  go  up  here  to  the  Menu  bar. I  hit  Download  as   JMP Project, and  JMP  Live is  going  to  create  a  project  for  me with  this  information. Let's put  that  on  the  Desktop, and  let's  call  it  Airlines  Updated. When  I  open  this  project  in  JMP, I'm  going  to  see  this  file, the  project  manifest that JMP  Live  has had  to  tell  me  everything I put  in  the  project, and  in  the  case  that it  went  wrong, what  it  couldn't  put  in  the  project that's  empty  today,  which  means  everything that  should  have  been  there  was. I  can  see  the  list  of  reports  included. This  is  one  that  I  made. If  I  click  on  that, it  will  open  the  project. I  can  also  get  them  down  here because  all of these reports and the data is  saved  inside  the  Project  Contents. In  fact,  it's saved  in  the  exact same  folder structure that  Aurora  organized  it  into on  JMP  Live, which  is  useful  for  me because   now it's  in  two  neat little  subfolders. I  can  open  the  air  traffic  report that  she  made  and  return  to  here. I  can  swap  out  the  airlines  interactive, just  like  it  was  before. I  can  add  all  of  this  stuff to  the  work  that  I  did. I  have  essentially  round  trips  to  Data. It  started  on  my  machine   when I  made  my  first  two  reports. I  upload  it  to  my  organization's JMP  Live i nstance. A colleague, Aurora, was  able  to  see  the  report that  I created, add  to  them  herself, reorganize  the  structure of  my end  or  hers, and  then  I  was  able  to,  in  one  step, download  the  resulting  folder as  a  JMP  project that  I  can  then  continue  to  work  with, analyze,  explore,  and  discover. Pass  it  back  to  Aurora  to  finish  up. Yeah. Thank  you,  Aaron. I  hope  that  the  features that  we  showed  you  today can  help  you  to  stay  organized. Mostly,  I  hope  that  you and  your  colleagues are  creating  so  much  content  in JMP that  staying  organized becomes  absolutely  crucial  for  you. There  are  actually  several  other  JMP  Live focused  talks  during  this  conference, so  if  you're  interested  in  JMP  Live, we  encourage  you  to  check  those  out. Either  way,  thank  you  so  much for  joining  us  today, and  we  hope  you  have  a  fantastic rest  of  your  conference. Bye  now.
This presentation explores the use of JMP Pro to analyze gene expression of single-cell RNA sequencing data from murine melanoma samples. Single-cell RNA sequencing is a next-generation sequencing technology that reveals the heterogeneity between individual cells and permits comparison of the transcriptores of different cell types. While there are many statistical tools to analyze scRNA-seq data, JMP provides many streamlined methods for initial analysis and visualization of data. The objectives for this study were to determine cell subtypes from a sample of T cells extracted from cancer-infiltrated lymph nodes and adjacent tissue and find differentially expressed genes. PCA and clustering methods were used for preliminary exploration of cell heterogeneity. Predictive modeling was used to determine if cells can be accurately classified as cancerous by gene expression profiles. Next, T cells taken from different time points were analyzed to study the trajectory of gene expression associated with functional changes in these cells. Significant gene signatures were explored with pathway enrichment analysis and other downstream tools.       Hello,  everyone. Today  I'll  be  presenting about  the  application  of   JMP to  analyze  gene  expression single- cell  RNA  data  in  melanoma  cells. My  name  is  Catherine  Zhou, and  I'm  a  high  school  student at  Lynbrook  High  School. As  introduction, melanoma  develops  in the  cells or  melanocytes  that  produce  melanin and  is  the  most  aggressive  type of  skin  cancer, representing  65%  of  all  deaths from  skin  cancer. T  cells  are  a  type  of  white  blood  cells that  develop  from  stem  cells in  the  bone  marrow and  mature  in  the  thymus, where  they  multiply  and  differentiate  into helper,  regulatory,  or  cytotoxic  T  cells or  become  memory  T  cells. Cytotoxic  T  cells, which  are  activated  by  various  cytokines, bind  to  and  kill infected  cells  and  cancer  cells. In  melanoma  patients,  T  cells  mount an  immune  response  against  the  tumor, but  at  some  point, the  responder  T   cells   become  ineffective due  to  a  local  immunosuppressive  process occurring  at  the  tumor  sites. We'd  like  to  identify  the  causes behind  T  cell  dysfunction. Alternative  splicing is  a  regulatory  process essential  to  generate transcriptome  diversity. Misregulation  contributes to  disease  and  cancer. Eukaryotic  genes  are  composed of  int ronic  and  exonic  sequences, as  you  can  see  in  this  diagram. In  the  alternative  splicing  process, non-coding  introns  of  a  gene are  selectively  removed and  the  included  exons are  combined  in  the  final  messenger  RNA, and translation  of  these  different isoforms,  where  different  combinations result  in  different  proteins and  different  cellular  functions. Next,  traditional  bulk  sequencing examines  the  genome  of  a  cell  population such  as  the  cell  culture  or  tissue, and  its  output  is  the  average  gene expression  of  the  cell  population. On  the  other  hand,  single  cell  sequencing measures  the  genomes of  individual  cells  from  the  population. Single-cell  RNA  sequencing, or  scRNA-seq, measures  the  transcriptomes of  each  cell  in  the  sample, which  reveals  the  heterogeneity of  thousands  of  cells and  provides  insight  into cellular  differences  at  high  resolution. 
In  this  study,  I'll  be  analyzing alternative  splicing in  scRNA-seq data, which  has  rarely  been  studied due  to  several  issues. ScRNA-seq  results  in  a  very  large number  of  cells  and  sparse  data. However,  with  the  proper  statistical  tools like  JMP  and  R, we  can  take  steps  to  solving  these  issues. In  this  presentation, I'll  be  going  over  my  method  to  find the  most  significant  alternative  spicing or  AS  events  differentiating  cancerous T  cells  from  healthy  lymph  node  T cells. I'll  explain  my  data  processing  pipeline, and  then  I'll  go  over  how  to  use  JMP  Pro for  predictive  modeling, clustering,  and  visualization. And  then  I'll  go  over  the  results of  the  analysis  of   scRNA-seq   dataset of  T cells  in murine  melanoma. This  is  a  diagram  of  the  presentation. First,  I'd  like  to  explain  my   dataset preparation  and  processing  process. First,  I  prepared and  processed  the   dataset. I  took  the   scRNA-seq  dataset  of  cells in  lymph nodes  with  melanoma  in  mice from  this  paper  over  here, and  then  performed  read  alignment of  the  F ASTQ  files  with  STAR, and  detected  AS  events with  the  pipeline  derived  from  rMATS, which  uses  a  generalized linear  mixed  model. Instead  of  using  the  pairwise  comparison, I  ran  it  on  a  single  sample with  each  individual  cell, effectively  quantifying  AS  in  each  cell. Then  I  used  R  to  quantify exon  skipping  events  with  IJC, which  stands  for  Inclusion  Junction  Count, and  SJC,  which  stands for  Skipping  Junction  Count. As  you  can  see  below  in  this  diagram, an  inclusion  junction  is  detected when  the  read  includes both  the  flanking  sequence, which  is  this  black  sequence  over  here, and  the  exon  sequence, which  is  this  E  sequence  over  here, while  a  skipping  junction  count is  detected when  the  read  only  includes the  two  flanking  sequences and  just  entirely  skips  the  exon. Then  I  create  a  matrix  of  this  data. Next,  I  subsetted  the  T  cells using  cell  labels  previously  defined by  the  authors  of  the   dataset. To  filter  AS  events, I  applied  the  following  rules, and  I  did  this  in  R. There  must  be  at  least ten  reads  per  junction. The  event  has  to  be  detected in  more  than  ten  cells, and  the  event  cannot  have no  variability  across  the  cells. This  moved  around  75%  of  the  exons. This  shows  that  filtering  is  important because  it  removed  exons with  low  coverage and  were  basically  unimportant. This  reduced  the  feature  size, which  will  improve  the  performance of  predictive  modeling  and  analysis further  in  the  future. Then  I  calculate  the  PSI or  percent  spliced  in  value with  this  equation: IJC  over  IJC  plus  SJC. Before  I  go  into  the  next  step of  predictive  modeling, I  wanted  to  explain  the  reasons  I  decided to  apply  predictive  modeling in  JMP for  this  specific  problem. There  have  been  several  studies  that successfully  used  machine  learning  models for  analyzing  RNA-seq  data, which  achieved  high  accuracy. I  hypothesized  that  it  would  also be  accurate  for   scRNA-seq  data. There  are  many  advantages of  using  predictive  modeling  as  well. 
It  can  extract  meaningful  features from  huge   datasets, classify  data  and  predict  outcomes with  supervised  learning, and  recognize  underlying  relationships of  data,  like  in  neural  networks. JMP Pro  also  provides  a  great  interface for  exploring  these  different  models, allowing  you  to  easily apply  sophisticated  algorithms to  large   datasets  quickly  and  accurately. It  also  allows for  model  comparison  and  screening. Finally,  it  generates some  visual  and  interactive  reports that  reduce  the  black  box  effect of  machine  learning  models, which  means  that  you  don't  really  know what  happens  in  the  model, you  just  know  the  output, which  is  not  very  useful. Our   dataset  had  8,044  variables  or  exons, and  1,014  rows  or  cells. It  was  difficult  to  run  analysis with  such  a  large  number  of  columns, so  I  used  a  bootstrap  forest for  variable  selection. A  bootstrap  forest is  very  computationally  efficient and  it  can  operate quickly  over  wide   datasets, and  it  uses  random  sampling with  replacement, causing  it  to  be  very  robust  and  accurate. I  could  have  also  used predictor  screening, which  is  another  function  in  JMP, but  I  found  that  manually  running  it using  a  tuning  design  table was  more  computationally  efficient and  allowed  me  to  tune the  number  of  trees  as  well. But  I  found  that  the  number  of  trees doesn't  make  a  huge  difference. In  general,  the  more  trees  you  use leads  to  better  results, but  their  improvement  decreases as  the  number  of  trees  increases. At  a  certain  point, the  benefit  in  prediction  performance from  learning  more  trees  will  be  lower than  the  cost  in  computation  time. For  the  8,044  exons, I   tuned the  number  of  trees, and  I  found  that  around  50  trees resulted  in  the  highest  accuracy. And  then  I  took  the  top  100  exons from  the  column  contributions, and  then  I  ran  it  again with  the  top  50  exons, and  then  again  with  the  50  exons, comparing  the  accuracy  as  you  go. This  shows  the  JMP  reports for  tuning  trees  and  variable  selection. As  you  can  see,  I  use  the  tuning  design table  and  change  the  number  of  trees. I  looked  at  the  entropy R  square and  these  different  metrics. I  believe  50  trees  was  the  best  result. I  took  the  top  100  exons  and  then I  ran  the   bootstrap  forest  with  30  trees because  I  did  another  tuning  design  table. For  each  iteration, I  did  a  separate  tuning  design  table to  find  the  best  number  of  trees. As  you  can  see, the  entropy  R  square  increases as  the  number  of  exons  was  decreased. As  I  selected the  top  three  different  exons, it  shows  that  it  doesn't  really  matter if  it's   the  top  10  or  top  50  or  top  100, it  will  still  result  in  a  high  accuracy. The  misclassification  rate   varied. The  RMS  error  also  decreased, and  the  average  absolute  error also  decreased. With  these  metrics, I  decided  to  use  the  top  10  exons to  run  the  next  steps  of  my  analysis. I  used  the  Model  Screening function  in  JMP  Pro to  run  the  top  10  exons  on  all the  different  models  that  were  possible. It  resulted  in  a  high  accuracy across  the  board. I  looked  at  the  prediction  profiler to  determine  how  the  exons caused  tumor  or  cancerous  cells to  be  different. 
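To make these steps concrete, here is a condensed JSL sketch covering the PSI calculation, the coverage filter, and the two platform launches. It is an illustration, not the author's script: the quantification and filtering were actually done in R, the column names (IJC, SJC, Cell Label, the exon PSI columns, Validation) are placeholders, and the tuning-design-table step for the number of trees is omitted so the platforms run with their defaults.

```jsl
// Condensed sketch of the analysis steps described above (placeholder column names).
dt = Current Data Table();   // assumed: one row per exon-skipping event per cell

// Percent Spliced In: PSI = IJC / (IJC + SJC)
dt << New Column( "PSI", Numeric, Continuous,
	Formula( :IJC / (:IJC + :SJC) )
);

// One of the filtering rules, read here as total junction coverage of at least 10 reads
// (the talk applied the filters in R before bringing the matrix into JMP)
For Each Row(
	If( :IJC + :SJC < 10, Excluded( Row State() ) = 1 )
);

// In practice the per-cell PSI values are then reshaped so that each exon event
// becomes its own column before modeling (cells as rows, exons as columns).

// Bootstrap Forest for variable selection; keep the top exons by column contribution
dt << Bootstrap Forest(
	Y( :Cell Label ),                  // hypothetical tumor vs. normal label
	X( :Exon PSI 1, :Exon PSI 2 ),     // hypothetical PSI columns (8,044 in the study)
	Validation( :Validation )          // column made with the Make Validation utility
);

// Model Screening on the reduced predictor set (JMP Pro)
dt << Model Screening(
	Y( :Cell Label ),
	X( :Exon PSI 1, :Exon PSI 2 ),
	Validation( :Validation )
);
```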
Now, I'll  be  performing a  demo  of  this  process. Let  me  open  the  top  10  exons. In JMP  Pro,  you  can  go  to  Analyze, Predictive  Modeling, and  then  Model  Screening. I'll just hit  Recall because  I  used  this  before and  it  automatically  updates with  your  previous  settings. And  then  it also  did K-fold  cross v alidation. This  process  is  very  quick,  it  only  takes around  20  seconds  or  10  seconds. All  right. As  you  can  see,  the  neural  boosted had  the  highest  accuracy. Next  was  generalized  regression lasso. It  split  the   dataset into  training  and  validation. You  can  see  that  boosted  tree had  the  highest  accuracy  for  training, which  makes  sense  because  it basically  fits  the  data  really  well when  you  use  a  tree. But  for  validation,  let's  see, the  most  accurate  was  in  neural  boosted. With  this  information, I  decided  to  use  a  neural  boosted to  do  the  final  classification  of  cells for  tumor  cells  or  normal  cells, because  it  would  result in  the  highest  accuracy. I  clicked  Run  Selected, and  then  I  used  the  validation  column that  I  previously  created with  the  Make V alidation  function in  the  Predictive  Modeling  tab. Here  are  the  results of  the  neural  network. It  was  very  quick,  again. You  can  see  that  there  are  the  ROC  graphs. It  basically  has  a  very  high  AUC. If  you  take  a  look  at the  prediction  profiling, the profiler, this  shows  the  different  ways... Let  me  explain  this  better. As  you  move  your  cursor  along, if  you  look  at  A POBR,  this  gene, when  the  inclusion  of  this  exon  increases, then  there's  a  higher  probability that  the  cell  will  be  a  tumor  cell. Over  here,   for  this, this is  also  the  same  trend. For  this  one,  this  is  the  inverse  trend. For   PCDH9,  as  the  inclusion of  this  exon  increases, the  probability  that  it  will  result in  a  tumor  cell  decreases. Sorry. This  allows  you  to  look at  the  role  of  these  exons in  causing  the  function of  tumor  and   normal   cells  to  be  different. This  is  a  very  powerful  visualizer, and  it  can  show  the  different relationships  between  these  cells. You  can  also  just  save the  formulas  to  the... Sorry,  one  second. You  do  Publish  Prediction  Formula, and  that  saves  it  in  the  formula  depot. Okay. Next,  I'd  like  to  look at  the  difference  in  expression as  it  goes  from  5  days to  8  days  to  11  days. This   dataset  also  provided the  cells  at  different  time  points. If  you  look  at  here, this  column  shows  that  the  different times  are  available  for  us  to  analyze. I  subsetted  the  tumor  cells. Let  me  find  that  over  here. Tumor  only. I  compared  the  distribution of  the  different  exons. Let  me  show  you  the  graph. As  you  can  see, there  are  several  exons that  increase  in  expression  over  time. Over   here ,  SLAMF9   or  T OMM20 increases  dramatically, and  these  also  increase. There  are  also  exons that  increase  and  then  decrease, which  show  that  they  have an  optimal  period  of  time that  they're  the  most  active. I'll  be  going  over  these  specific  genes and  functions  later  in  my  presentation. Let  me  go  back  to  my  presentation. All  right. Here,  I  also  ran  clustering on  the  top  10  exons  dataset. 
I  did  K- means  clustering, and  this  more  accurately  classified the  tumor  versus  normal  cells. As  you  can  see  here, the  blue  is  normal  cell and  then  the  red  is  the  tumor  cell. This  shows  that  there  are  possibly different  subsets  of  tumor  cells that  have  different  functions. I  also  ran  UMAP  on  all  the  cells. UMAP  is  another dimensional  reduction  formula. This  was  ran  with  a  JMP  add- in that  was  created  by  this  author. I  will  link  it  in  my  presentation in  the  community. Here  is  the  gene  enrichment analysis  that  I  performed. As  you  can  see,  the  top  function of  all  of  these  genes was  RNA  import into   the  mitochondrion. The  second  most  important  one  was regulation  of  leukocyte  degranulation. T cells  are  a  type  of  leukocyte. It  can  show  that  possibly  these  leukocytes or  T  cells  are  degranulated, and  that  causes  the  cells to  become  tumors. Also,  malignant  tumors  selectively  retain mitochondrial  genome  and  ETC  function. That  can  also  explain why  RNA  import into  the  mitochondrion is  an  important  function. This  is  another  graph of  the  top  GO  biological  processes. Over  here, the  top  one  is  cellular  process. There  are  also  biological  adhesion, regulation,  and  immune  system  processes that  can  be  further  explored. I'd  like  to  explain the  most  notable  genes. These  genes  are  basically  the  top  genes that  were  in   the   column  contributions. S TPBX2  is  involved in  intracellular  trafficking, control  of  SNARE  complex  assembly, and  the  release  of  cytotoxic  granules by  natural  killer  cells. This  can  show  that  the  T  cells are  involved  in  some  sort  of  trafficking. Or  the  cytotoxic  T  cells  can  use  this  exon to  regulate  the  tumor versus  normal  functions. SLAMF9  nine  encodes  a  member of  the  signaling   lymphocytic  activation molecule  family  and  its  transmembrane. WARS is  a  tryptophanyl- tRNA  synthetase that  catalyzes  the  aminoacylation of  tRNA  with  tryptophan and  is  induced  by  interferons. All  of  these  genes  have  a  role in  the  immune  system  and  T  cell  function. We  can  study  these  further and  determine  if  they  actually  have a  important  function  with  in vitro  tests. Next,  I  explained  this  earlier, but  I  analyzed the  tumor  gene  expression  over  time and  T OMM20,  which  was  the  one that  had  the  most  dramatic  inclusion, is  actually  implicated  in  the  translocase of  outer  mitochondrial  membrane  complex that  facilitates  cancer  aggressiveness and  therapeutic  resistance in   chondrosarcoma, which  is  a  type  of  cancer. You  can  see  that  TOMM 20  causes cancers  to  be  more  aggressive. It's  really  interesting how  our  basic  analysis  with   JMP Pro resulted  in  this  exon being  the  most  important. This  shows  the  effectiveness of  JMP  in  these  types  of  studies. In  conclusion,  the  exploration of  this   dataset  facilitated  by  JMP demonstrates  the  role  of  genes and  exons  in  T  cells  in  melanoma. A  good  thing  is  that  the  code can  be  replicated  in  Python. JMP  allows  that for  more  robust  or  detailed  analysis. These  potential  genes like  TOMM 20  or   STPX9 can  aid  in  the discovery of  novel  and  personalized  approaches to  cancer  treatment. And  then  we  can  perform in  vitro  testing  on  the  top  genes. Thank  you  for  listening to  my  presentation. 
I will provide all the code and R scripts in my presentation link.
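As a rough illustration of the kind of outside-JMP replication mentioned above, here is a hedged Python sketch of K-means clustering plus a UMAP embedding on a cell-by-exon matrix. It assumes the umap-learn package (the JMP add-in used in the demo is a separate tool), and the data are random placeholders.

```python
# Hypothetical sketch: K-means clusters visualized in a UMAP embedding.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import umap  # pip install umap-learn

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                     # placeholder: cells x exon features
Xs = StandardScaler().fit_transform(X)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=0).fit_transform(Xs)

plt.scatter(embedding[:, 0], embedding[:, 1], c=clusters, s=8, cmap="viridis")
plt.xlabel("UMAP 1"); plt.ylabel("UMAP 2")
plt.title("K-means clusters in UMAP space")
plt.show()
```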
Through JMP 16 outlier and quantile box plots (distribution), together with quantile range outlier and robust fit outlier detection (screening), we present comprehensive strategies to powerfully separate signal from noise in the presence of univariate response(s). We also propose that through practical analysis with the box plot, we can connect the Gauge R&R noise impact with the location of the points most adjacent to the upper and lower fences. We use Monte Carlo sampling (random() function and instant column formulas) to produce multiple distribution types (normal, uniform, peaked, bimodal) to validate the impact on the box plot and histogram together to detect normality violation failure modes.    We demonstrate that the box plot is a powerful visualization tool to judge the data distribution, uniquely able to separate skewness from outliers. Graph Builder, one-way, GoF, and nonparametric hypothesis testing show that – since the box plot is very weak at detecting bimodality or kurtosis, or at supporting hypothesis test decisions (it misses the sample size effect) – both the histogram and box plot are needed to visualize normality. Together with descriptive statistics, the most powerful discrimination between different candidate distributions is presented. Finally, we synthesize and demonstrate our learning experience by formulating 17 thought-provoking quiz questions and answers to maximize the utility of the box plot for data-driven problem solving.     Well, thank you everyone for joining me. This is a Discovery Summit 2022 presentation, courtesy of my co-presenters, Charles Chen and Mason Chen. My name is Patrick Giuliano. The title of this talk is Box Plot Analysis: Blending Scientific and Artistic Enquiry in Univariate Response Characterization. Here's the abstract. You can find this on the JMP User Community, on the US Discovery 2022 community page; I'm putting it here for reference, and I will provide a link in the slides to the community page where the project will live. What's the motivation for this project? The box plot is one of the most popular graphical tools for visualizing a univariate distribution of data, and this project studies how to use the box plot to analyze data effectively. Most people who use the box plot don't necessarily use it to determine the shape of the distribution of the data. In fact, many people use it wrongly to draw mean-comparison decisions, and they may assume normality based on symmetry when, in fact, the normality assumption would not be reasonable if they were to take a closer look at the shape of the data on a histogram, for example. The objective of this project is to demonstrate how to use JMP, specifically JMP 16, to interpret the information in a box plot and to improve proficiency in a global community of scientists and engineers working under a DMAIC, APS, or Lean-type Six Sigma methodology, which has obviously been very popular over the last few decades. The interesting thing about this project is that we framed it in the context of 17 quiz questions; this is a question-and-answer slide deck. I'm not going to go into too much detail about each and every question, which I will show you here.
But what I'd like to do is show you a little bit about how you can use JMP to explore the answers to these questions, because I think that's really the most interesting and fun part. The first thing I want to do is quickly go over what a box plot is. What is the anatomy of a box plot? As a refresher for some of you, or an introduction for others: the median is indicated by the midline, and it's referred to as the second quartile, Q2, or the 50th percentile. Q1 is the 25th percentile and Q3 is the 75th percentile, as you can see here. The interquartile range, or IQR, is the difference between Q3 and Q1. The other important elements are the whiskers on the lower and upper sides. At the end of each whisker, sometimes referred to as the fence, JMP draws a vertical line to indicate that edge. What's important is that the lower fence sits at Q1 minus 1.5 times the IQR, where the IQR is represented by the distance between this edge and this edge, and the upper fence sits at Q3 plus 1.5 times the IQR. Any points beyond these fences are considered potential outliers, and they show up individually as points, whereas the rest of the data in the middle of the distribution is not drawn, to put the emphasis on the points beyond the fences. I'm going to jump right in. How did we explore and develop the answers to these questions, and in some cases even refine the questions themselves? We created a simulated data table in JMP 16 with 100 rows, constructing data first from a normal distribution and then applying transformations to it. We have normally distributed data drawn from a population with a mean of zero and a standard deviation of one; uniformly distributed data; data that is peaked, i.e., has a positive kurtosis; data that is right-skewed; data with two modes; data with some outliers, about 3% on average; and integers. In all cases except the bimodal one, we based the simulation formula on the original normal column. To put all the data on the same scale, we used the column standardize function so that we could compare the columns relative to each other in the Distribution platform. This is just a preview of that; I'll jump over to JMP and show you. Again, all of this data is centered at a mean of approximately zero with a standard deviation of approximately one. We covered the first question: why is a box plot sometimes referred to as a five-point plot? Well, there are five main points: Q1, Q2 in the middle, Q3, and then the upper and lower whiskers. Next question: what are the two ways that the box plot can show whether the distribution is skewed? We can look at the width of the box itself, and we can also look at the lengths of the whiskers.
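As a quick aside before the skewed example, here is a minimal Python sketch of the fence arithmetic just described; note that NumPy's default quantile interpolation may differ slightly from JMP's quantile method, so the numbers are illustrative.

```python
# Minimal sketch of box plot arithmetic: quartiles, IQR, fences at
# Q1 - 1.5*IQR and Q3 + 1.5*IQR, and the points flagged beyond them.
import numpy as np

def box_plot_summary(x, k=1.5):
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return {"Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
            "lower fence": lower_fence, "upper fence": upper_fence,
            "potential outliers": outliers}

print(box_plot_summary(np.random.default_rng(0).normal(size=100)))
```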
In this right-skewed example, you can see that the upper whisker is much longer than the lower one, which implies that the data is right-skewed; in other words, the tail of the distribution points to the right. Third, why does the box plot include the median and not the mean? A box plot uses the median to gauge skewness. If the distribution is normal, then the mean is equal to the median, and what you would see here is that the median line would line up exactly with the middle of this diamond. In that case you effectively don't lose any information, because the distribution is symmetric. The median, then, might be considered the better choice in general, regardless of whether the distribution is normal or non-normal. Fourth, why is the box plot one of the most powerful visualization tools for separating skewness and outlier problems? Because the box plot uses the Q1 minus 1.5 times IQR and Q3 plus 1.5 times IQR methodology, it really allows us to separate potential outliers from the main body of the data. It also gives us a framework for judging whether the upper whisker is longer or shorter than the lower whisker. Those two components of the plot help us discern skewness from potential outlyingness, and that's a unique feature of the box plot. The fifth question is a little more interesting. What's the relationship between the interquartile range, that distance between Q1 and Q3, and the standard deviation, which we can calculate for any dataset regardless of how it's distributed? What if the data is normal, and what if it's skewed, non-normal, peaked, or any other shape? Based on theory, the ratio of the IQR to the standard deviation is 1.35 for normal data. What would that ratio look like if the data weren't normal? We can explore that in JMP, and I'm going to show you really quickly. Here's the dataset; I'm also going to post it on the community. The first thing I'll do is show you how I get to a visual state where we can see all the box plots together, without the distributions. This is interesting, but I'm going to start from the beginning with Analyze, Distribution, and show you how I got there. I'm going to select everything, and JMP gives me a histogram and a box plot together; at the end of the presentation we'll summarize why that's important. What I'm going to do here is turn off the histograms and customize the line widths. I can copy this customization over, which is really nice: I hold down the Control key because I'm on a PC, right-click, and then hit Edit, Copy, Paste Customizations, and that brings them all over.
I'm going to hold down the Control key again and resize this so that I can resize them all together. Now I'll minimize the Quantiles section, because I'm going to get the information I need from the Summary Statistics section: I have the IQR and the standard deviation shown here, although I could customize this either here or in the preferences, which I can access under File, Preferences, under the Distribution platform group. What I'm going to do is turn this information into a data table by right-clicking and selecting Make Combined Data Table. Now, I only need the IQR and the standard deviation, so I select one of the standard deviations and one of the IQRs, move my cursor over here, and select Matching to select all of the rows that have these values in them. Then I invert the selection, delete the rows that I don't want, and I'm left with this. Now I'll restructure the data so that I can calculate the ratio of the IQR to the standard deviation; I'll use Tables, Split for that. I split by column 1, put column 2 here, put these in a group, and click OK. Now I have the data how I want it, and it shows me which distribution each statistic came from. I'll do a New Formula Column, Combine, Ratio. There you go. This is a little hard to read, so I'll change it to show only two decimals. I've got numbers which should be very similar to what I have in the slide detailing the ratio of the IQR to the standard deviation. Of course they'll differ somewhat, because there's sampling error; this table is only one sampling experiment. But this is how I can quickly and interactively extract this information and really understand what the ratio looks like if my data is non-normal in a particular way. We can see that the ratios that tend to be lower than the theoretical 1.35 for normal data come from the peaked distribution and the one with outliers, while the ratios that tend to be higher come from the uniform, the right-skewed, and the bimodal distributions. Next question: what's the ideal outlier percentage if the distribution is perfectly normal? It turns out that if we look in the textbooks or run a simulation, on average we should see about 0.7% of the points beyond the fences for a normal distribution. Equivalently, if we were to build a control chart under the assumption of normality, for example an individuals and moving range chart, we would see around 0.7% of the points outside the limits on average. For practical purposes, though, if we saw about 3% or less of the distribution beyond the limits, we could consider it approximately normal.
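As a cross-check outside JMP, here is a hedged simulation of both quantities just discussed, the IQR-to-standard-deviation ratio and the fraction of points beyond the 1.5×IQR fences, for a few illustrative shapes. The distribution parameters are my own stand-ins, not the formulas used in the demo table.

```python
# Large-sample simulation: IQR/SD ratio (about 1.35 for normal data) and the
# percentage of points beyond the 1.5*IQR fences (roughly 0.7% for normal data).
import numpy as np

rng = np.random.default_rng(42)
samples = {
    "normal":  rng.normal(size=100_000),
    "uniform": rng.uniform(-1.73, 1.73, size=100_000),
    "peaked":  rng.standard_t(df=3, size=100_000),
    "skewed":  rng.f(5, 15, size=100_000),
}
for name, x in samples.items():
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    beyond = np.mean((x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr))
    print(f"{name:8s} IQR/SD = {iqr / x.std():.2f}   beyond fences = {100 * beyond:.2f}%")
```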
Why is that question important? Well, when the sample size is small, we can use the proportion of points beyond the fences in a box plot to judge whether we have some evidence of normality on the basis of outliers. If our sample size is too big, though, we're going to see lots and lots of points beyond those fences, so it's really important that we consider a "reasonable" sample size; that's part of the reason why we only used 100 rows in our project. Next question: what's the difference between a quartile range and a quantile range box plot? In a practical context, we can talk about the Explore Outliers utility in JMP 16, which allows us to adjust Q, the multiplier on the IQR, and the tail quantile, which is essentially how the data is divided up; we can customize that range. I'm just going to show you what that looks like really quickly. I go to Analyze, Screening, Explore Outliers, on my raw data, so let me close this and go back to the raw data table. I'll pick a couple of columns, the ones I have in my slides: the peaked one and the one with outliers. I'll use Quantile Range Outliers and adjust the settings to what the box plot uses, a 0.25 tail quantile and Q equal to 1.5, click Rescan, and JMP identifies the potential outliers. How does this connect to the Distribution platform? If we go over here and look, there are a number of outliers. I'll select those rows and go back over; lo and behold, it's these same values. You've got one, two, three, four, five, six, seven: seven outliers here, and seven there. That squares up; it's exactly what we'd expect. Similarly, we've got one, two, three, four here, and if we scroll over to the other column's outliers, there are four over there as well. Great. Going back to the slides, we can customize this, and that's actually what we get into in the subsequent question, Question 10. How do we determine whether outliers are marginal or extreme, and why is that important? We can adjust the sensitivity of the outlier detection through the multiplier Q on the IQR while keeping the tail quantile the same. You might intuitively expect that if you take Q3 plus a larger number times the IQR, it's going to extend the whisker length, and similarly on the lower side, so more points fall inside and fewer outliers are detected. We should be able to see and test that in JMP. If I increase Q to two and click Rescan, we see a few former outliers become part of the box plot, or we can imagine a situation where that's the case. If I increase it to three and hit Rescan, even fewer outliers are identified. And as I go up to Q equal to five, now I only have one outlier detected, in the peaked column of data.
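Here is a minimal Python sketch of that kind of quantile-range rule with an adjustable tail quantile and multiplier Q. It is written in the spirit of those settings, not as JMP's exact implementation, and the heavy-tailed sample is a placeholder.

```python
# Quantile-range outlier rule: a 0.25 tail quantile with Q = 1.5 mirrors the box plot;
# larger Q flags only more extreme points.
import numpy as np

def quantile_range_outliers(x, tail=0.25, Q=1.5):
    x = np.asarray(x, dtype=float)
    low_q, high_q = np.quantile(x, [tail, 1 - tail])
    spread = high_q - low_q
    low_lim, high_lim = low_q - Q * spread, high_q + Q * spread
    return x[(x < low_lim) | (x > high_lim)]

x = np.random.default_rng(7).standard_t(df=3, size=100)   # a peaked, heavy-tailed sample
for Q in (1.5, 2, 3, 5):
    print(f"Q = {Q}: {len(quantile_range_outliers(x, Q=Q))} potential outliers")
```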
So the idea here is that we can develop criteria for Q. For example, we might set Q equal to three for a situation where a point might be considered a typographical error or an extreme, or more extreme, outlier, and we might set Q equal to 1.5 if, for example, we think the potential outlier might be associated with variability due to the measurement system or special-cause process variation. We can do some simulation based on our application and decide what the value of Q should be in these particular scenarios. In connection with that, Question 10 touched a little bit on GRR, or measurement system variability, and Question 8 goes a little deeper into this and brings together some ideas. The idea is that we might consider the distance between the upper fence and the first potential outlier, or series of outliers, and extend that upper fence by a distance of two times the sigma due to the measurement system variability. In this way we're accounting for the variability due to the measurement system, and we're asking ourselves: is this potential outlier within the noise of the measurement system or not? We're creating a graphical, blended means of determining whether the value is reasonable under the expectation that there's measurement system variability. As I have it here, the distance between the marginal outlier and the whisker should be compared to the GRR noise standard deviation; if it's within two standard deviations, we don't have 95% confidence to conclude that the marginal outlier is different from the whisker. This is just a graphical version of a one-sample t-test, in effect: we could construct a one-sample t-test using this red line as our target and the observed value, or rather the assumed series of values around this black dot, as our distribution relative to that target. The next question: how many points do we really need to produce a box plot if we're sample-size limited? Well, we might need at least seven points, and our simulation in this particular sampling experiment shows that. What's happening here? Each of these three datasets has the same median. In this dataset there are six observations, in this one there are seven, and in this one there are eight. Let's start on the left, where we have eight observations, one of them out here around 15. What if we reduce the number of observations to seven, keeping that same observation but removing one of the others? And what if we reduce it further while maintaining the same median? What we see is that the outlier at 15, which is still in the dataset, no longer shows up as an outlier; in essence, it becomes absorbed into the whisker itself. The other interesting thing about this simple experiment is that the IQR becomes inflated when we go from seven observations to six. We can see that visually, in that the width of the box from Q1 to Q3 becomes much wider, and we can also see it numerically here.
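Circling back to the Question 8 idea for a moment, here is a hedged sketch of that comparison in Python: the gap between a marginal outlier and the upper fence versus twice an assumed measurement-system (GRR) standard deviation. The data, the candidate value, and sigma_grr are all hypothetical placeholders.

```python
# Is the gap between a marginal outlier and the upper fence within ~2 standard
# deviations of measurement-system noise? If so, we can't confidently call it distinct.
import numpy as np

def marginal_outlier_check(x, candidate, sigma_grr, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    upper_fence = q3 + k * (q3 - q1)
    gap = candidate - upper_fence
    return upper_fence, gap, gap <= 2 * sigma_grr

x = np.random.default_rng(3).normal(loc=10, scale=1, size=50)   # placeholder data
fence, gap, within_noise = marginal_outlier_check(x, candidate=13.2, sigma_grr=0.4)
print(f"upper fence = {fence:.2f}, gap = {gap:.2f}, within 2*sigma_GRR: {within_noise}")
```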
I actually want to show you how we might explore that in JMP. Here's some data; it's not the same data, but it's some data. I just created a column that ranks the data, again using an instant column formula; I can do that by selecting one of these options, I believe under Distributional. Now I'm going to plot this data and turn the histogram on its side. I'll invoke the local data filter and bring in that rank column, making it ordinal first so that I can select observations individually rather than under the assumption of a continuous distribution, and I'll select everything. Now, if I go back to the data table, I know that rank 8 represents the largest value. I'll keep 8 in there and start removing some of the lower values by holding down my Control key and clicking, which effectively removes each point dynamically from the analysis. With the Control key down, I click again, and again. You saw it there: that one outlier, on the low side in this case, just disappeared. There's a relationship between the fences and the points, because the median and the quartiles are calculated from only the data that's in the analysis. This gives you a better means of appreciating how the box plot changes as a function of the data that's in or out of the analysis; it's a really cool feature that I like to use a lot in many contexts. What's the advantage of the Robust Fit Outliers algorithm, available in JMP 16? It gives us another means of detecting outlyingness. We can use a Cauchy method, which often avoids the impact of skewness and can be useful in practical situations, and we can also use a 3-sigma or K-sigma multiplier to help detect outlyingness. All of these methods help us separate potential outliers from real outliers and help us create a reasonable signal-detection methodology, much as we might do if we were to build a control chart with limits for our particular experimental or manufacturing application. Thirteen: can we include the sample size information in the box plot? This is where the box plot starts to present a clear limitation: there isn't any sample size information explicitly in the box plot. We do have the ability in Graph Builder to create a notched box plot, where the notch edges indicate something like a confidence interval on the median, and we also have the ability in Graph Builder to invoke the caption box, which is a very useful feature for summarizing data graphically without needing an additional tabular output. But of course, that information is completely hidden in the box plot itself.
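Since the notch is essentially the only way sample size sneaks into this display, here is a hedged sketch of a notched box plot outside JMP, using matplotlib's notch option; the notch approximates a confidence interval on the median, and the three groups and sample sizes below are made up.

```python
# Notched box plots: the notch approximates a CI on the median, so it narrows
# as the sample size grows.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1, size=n) for m, n in [(0.0, 20), (0.3, 60), (0.6, 200)]]

fig, ax = plt.subplots()
ax.boxplot(groups, notch=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(["n=20", "n=60", "n=200"])
ax.set_ylabel("response")
ax.set_title("Notches narrow as sample size grows")
plt.show()
```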
Connected to that: can we make any decision with any level of statistical confidence if we're just looking at the box plot? The answer is no. In this particular example, we actually designed the data so that the medians were slightly different on average, so we're getting some separation among the medians between the groups. We used Fit Y by X in this context. What this shows is that the mean [inaudible 00:29:27] represents the mean, and the mean diamonds are non-overlapping, it looks like, across all four groups being compared, which indicates that there's some evidence of a difference in the means between the groups. We can also see the difference in the medians and do a nonparametric test; in this case we're using a nonparametric Steel test with control, where the control is just the Z normal column. We're seeing some evidence of statistical separation among the medians in this particular instance. It's hard to detect that and see it in the box plot; in fact, it really isn't clear at all. How can we tell if we have any concern with respect to kurtosis, and what is kurtosis? Kurtosis is basically the idea that, with positive kurtosis, your data is concentrated, squished together, in the middle of the distribution; that's the example on the right. In the idealized case of extreme negative kurtosis, you'd have a uniform distribution, where the data is really spread out. What you can see in these graphs, relative to the normal distribution, is that the 50% densest zone, indicated by this red bar, is about as long as the distance between Q1 and Q3, but it sits on one side of the median in the uniform case; it's about as long as the box width and also on one side of the median. That's a characteristic feature of the uniform distribution shape. If we look at the peaked situation, the box width is much more compressed, and the shortest half, the densest region, is about the same width as the box and centered on the median. That's similar to what you would see in the normal case, where the 50% densest region is centered on the mean or median and about as wide as the box; the differentiator is that the distances are reduced quite a bit. The real takeaway, though, is that this type of interpretation is difficult, and it would be easier to rely on the shape shown by the histogram than to try to read this from the box plots alone. Question 16 is very similar to 15: what about data that has more than one mode, a bimodal distribution? I took the box plots on the left and pulled them out from the pictures on the right. We can't really see a whole lot of difference among them; it's difficult to interpret. But once we add the histograms, and fit a two-peak distribution, we can see clearly that there are two modes in this data, while the data on the left has essentially one mode, maybe with a small second one. The box plot isn't particularly good at detecting the presence of multiple modes.
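Because the box plot hides multimodality, a quick numeric cross-check can help alongside the histogram. The sketch below is a stand-in I'm adding, not the two-peak fit used in the demo: fit one- and two-component Gaussian mixtures with scikit-learn and compare BIC, where a lower BIC for two components suggests bimodality.

```python
# Compare 1- vs 2-component Gaussian mixtures by BIC (lower is better) on a
# deliberately bimodal placeholder sample.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
bimodal = np.concatenate([rng.normal(-1.5, 0.5, 50),
                          rng.normal(1.5, 0.5, 50)]).reshape(-1, 1)

for k in (1, 2):
    gm = GaussianMixture(n_components=k, random_state=0).fit(bimodal)
    print(f"{k} component(s): BIC = {gm.bic(bimodal):.1f}")
```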
The last question is: how many "normality violation failure modes" can we detect with the box plot? This question brings all the other ones together. If we have skewness, we've shown that we have a strong ability to detect it. If we have potential outliers, we definitely have a strong ability to detect them. If we have kurtosis, which is really a matter of shape, as is the presence of multiple modes, then we don't have a strong ability to detect it. And if we're considering hypothesis testing, we definitely don't have the ability to make those decisions with the box plot either. So what are the takeaways? The box plot is definitely a powerful visualization tool. It's a great introductory tool, and it has a wonderful ability to separate skewness from potential outlyingness. But it has its limitations: in cases where we're looking at a kurtotic shape, or bimodality or multimodality, the histogram is definitely the better choice. That's probably why JMP shows both the histogram and the box plot together in the Distribution platform to visualize how the data is behaving. Of course, adding descriptive statistics helps round out the picture, where we take a graphics-first approach. This is just a summary of what we've discussed. In the last couple of minutes, I want to show you a few more things about the dataset itself, because I think this is perhaps the most useful aspect of the project. How might we set up a data table like this? All we really have to do to simulate data in JMP is create some rows and then create a random normal column formula. Let me just show you one way to do this: you double-click into a column, click Formula to edit the formula, go over to the random functions, click one in, and specify a population mean and sigma, zero and one by default; click OK, and then add a bunch of rows. I'll go ahead and add 100 rows. What about the other distributions? For a uniform distribution, we can use the Random Uniform function and specify minimum and maximum values; in this case I specified the minimum and maximum of the normally distributed data column. And finally, as I mentioned, I standardized each column so that it's on the same numeric scale; this standardize-column step is common to all of these columns. The last thing I want to talk about quickly is: what about the peaked, right-skewed, and even bimodal columns? One of the things we can do, which I really think is cool, is use the distribution calculator in JMP to help us understand what certain distribution types look like. I'm going to go into it here and drill down, and I'll share the location of this script with you. It's under... Calculator... no, under Distribution... the Distribution Calculator, under the calculators, yes.
How might I create a distribution that's right-skewed, and which random function would I use? Well, I can look at some of these distributions and see, for example, that if I specify a random F with these parameters, I get a distribution with this kind of skewness. Then I can ask: what happens if I change these parameters a little bit? How does that change the distribution? I can use that insight to choose the parameters for the random distributions I specify in my data table, and in fact that's what I did here. What did I do for the peaked one? If I look at the t distribution and reduce the degrees of freedom, I get a distribution that's relatively peaked, with positive kurtosis. That's one way I can understand the shape of these distributions so that I can use them to my advantage for different what-if analyses in JMP. I'm just going to quickly go back to my slides. Thank you very much for listening. If you have any questions, I look forward to receiving them on the User Community. As I mentioned, this project will be posted there, and the summary abstract is posted at this link here. Thank you again.
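For anyone who wants to rebuild a simulated table like this outside JMP, here is a hedged Python sketch. The distribution parameters (t with 2 degrees of freedom for the peaked column, an F distribution for the right-skewed one, a two-normal mixture for the bimodal one) are my own illustrative choices, not necessarily the ones used in the slides.

```python
# Simulated table: normal, uniform (spanning the normal column's range), peaked,
# right-skewed, and bimodal columns, each standardized to mean 0 and SD 1.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2022)
n = 100
normal = rng.normal(0, 1, n)
table = pd.DataFrame({
    "normal":  normal,
    "uniform": rng.uniform(normal.min(), normal.max(), n),
    "peaked":  rng.standard_t(df=2, size=n),
    "skewed":  rng.f(6, 20, size=n),
    "bimodal": np.concatenate([rng.normal(-2, 0.7, n // 2), rng.normal(2, 0.7, n // 2)]),
})
standardized = (table - table.mean()) / table.std()   # same idea as JMP's column standardize
print(standardized.describe().round(2))
```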
You don't need an elegant predictive model to tell you what's going on with coral reefs these days; as seawater temperatures continue to rise as a result of climate change, thermo-sensitive corals continue to decline in abundance across the globe. Where data science comes in handy, however, is in what I refer to as "coral reef triage;" by that I mean, in the likely event that we can't save everything, which corals and coral reefs do we prioritize for preservation? Where are the refugia characterized by the unusually hardy corals that may have a chance of weathering the storm? Historically, we answered these questions by randomly stumbling upon corals and coral reefs that, for whatever reason (e.g., environmental factors or unique adaptations of the corals themselves), were unusually robust. Given the incredibly small percentage of coral reefs characterized to date (<0.0001%), however, simply hoping to come upon climate change-resistant corals by chance alone is neither a time- nor cost-effective conservation strategy.   In this presentation, I use JMP Pro to demonstrate how we can instead leverage data from laboratory experiments and coral reef surveys to predict where we will find the most stress-prone corals, as well as those that display a marked capacity for resilience. By triaging coral reefs across a spectrum of climate resilience, we can not only make more informed management decisions, but we can actually use machine learning and other predictive modeling tools to dictate the optimal mitigation and/or bioremediation ("coral rescue") approach(es) for particular reefs.     Hi everybody, thanks for tuning in. My name is Anderson Mayfield, and I'm a coral reef scientist working in South Florida. Over the next 45 minutes or so, I'm going to talk to you about some exciting research I've been doing, entirely in the JMP Pro suite, on attempting to enable coral reef triage with machine learning. To give you an outline of what I'll be discussing today: I'm going to give you a brief overview of some problems facing coral reefs, the ecosystem I study, and a little bit of a recap of the talks I gave at the JMP Discovery Summits in 2021 and 2020. That is what I'll refer to as the coral veterinarian's approach, in which I was trying to make predictions about the fates of individual coral colonies. About halfway through the talk, I'm going to segue to what I've been working on more recently, which is attempting to find the resilient reefs: which reefs out there are going to be the ones that can weather the storm with respect to climate change and be around in future millennia. That approach I'll refer to as the coral epidemiologist's approach. I think most of you are probably already aware of the motivation, or the need, for this research: coral reefs are in bad shape. The reason is that the simple coral animals that build these amazing structures have a delicate, intricate association with dinoflagellates of the family Symbiodiniaceae. This is what allows them to build these massive structures that can be seen from space.
The problem is that as seawater temperatures get warmer and warmer, the symbiosis breaks down: the algae, the dinoflagellates, are no longer able to photosynthesize, and they either leave the coral, are digested, or simply cease to photosynthesize. What this means is that the corals slowly begin starving to death, and they perish. Certainly we're worried about other stressors as well, things like seawater pollution, disease, eutrophication, and overdevelopment of coastal regions, but on a truly global scale, climate change is what we're most concerned with, particularly these rising seawater temperatures. But for sure, certain corals fare better than others. There are hardier species, and there are more resistant genotypes within a species. You might even have clone mates in close proximity to one another, one of which dies due to high temperatures while the other remains resilient. What drives this resilience in these more robust corals is something I've been working on for about 20 years now. What I've been trying to do more recently is not just explain what makes corals resilient, but predict which corals that we haven't studied yet will be the ones that might inherit the Earth. So I'm going to give a brief overview of my former approach. I don't want to make it seem like I've completely abandoned this line of research, but as you'll see, there are some issues with it in terms of its cost. The goal today is to show you the old way I was doing it and then the transition to this newer, cheaper, potentially more global way. What I was doing before (I'm a molecular biologist by training) was using molecular and physiological data from corals nearly exclusively to build predictive models that would then give me a prediction about the fate, the longevity, the lifespan of the coral. This is what I call the coral veterinarian's approach, because I was basically doing what your own physician would do. I would check in on my patients every now and then, take biopsies, profile them using molecular stress tests I've developed over the years, and then attempt to predict whether or not these corals would bleach as temperatures became warmer. It's important to note that the molecular components are particularly important because subcellular biology reflects aberrant, stress-indicative behavior before you observe changes with the naked eye. I don't want to wait for the corals to bleach, or become diseased, or start to slough off their tissues; I want to look at sublethal indications of stress that happen weeks or months before these catastrophic manifestations. Analogously, this is why we have our annual physicals. You want to know, for instance, if your cholesterol levels are high before you have a heart attack, because if you know you have high cholesterol, you might be able to change your diet, take medication, change your lifestyle; you might be able to thwart the more severe signs of health decline, like a cardiac arrest. It's the same idea with coral.
We want to look at something at sublethal scales so that we can do something proactive. If we know a coral is stressed based on its molecular signatures, we might be able to mitigate something at the local scale. We may not be able to slow the rate of climate change for the sake of that coral, but we could do something locally that would give it a chance. What I was doing a few years ago, in a project I carried out at NOAA's marine lab in Miami, AOML, was building thousands of neural networks in JMP Pro 16, taking laboratory corals and field corals and using data on their protein levels; this was a proteomics project. Then we had our field test samples: corals out in the field in the Florida Keys, where we didn't know whether they were going to bleach, become diseased, or perish. We would routinely take biopsies, enter the proteomic data into these neural network models I built in JMP Pro, and the models would give a prediction. The beauty of working with adult corals is that they don't move (which is actually also a bad thing for them, because it means they can't simply move away when conditions deteriorate), but it means I know where to find them, and I can go back out and see if the neural networks' predictions were correct. They actually worked really well. One particular species we used for this proof of concept was Orbicella faveolata; it looks like this. With these neural network models, trained with lab and field protein data, the accuracy was about 92%, roughly 11 out of 12. So about 90-95% of the time, I can use the protein data exclusively to tell you whether or not a coral colony will bleach as temperatures get really warm. Typically in South Florida we see our highest seawater temperatures in August or September. In 2019, I took some samples from different reefs throughout the Keys. For instance, we have this sample here, 6745 from Crocker Reef. We entered the proteomic data from that sample months in advance of bleaching, sometime in the winter, and the neural network from JMP Pro 16 flagged it as bleaching-sensitive. We went out there as temperatures reached 32 or 33 C, which is very stressful for corals, and we saw the colony looking like this. This is bad news; it might recover from this, but it probably won't. There was another coral from a site we know is typically more resilient, a huge, ancient, several-hundred-year-old Orbicella colony. Based on its protein biomarkers input into the neural network from JMP Pro, it was deemed bleaching-resistant. Lo and behold, when we went out there during the high-temperature event that was killing other corals, it looked pretty good; you don't see any signs of paling or bleaching. Similarly, we have another site that's also known for having more resilient corals, called the Rocks.
Its protein biomarker signatures were input into the neural network model, and it was also deemed bleaching-resistant, and that indeed appeared to be the case. This is a map of the Florida Keys; our marine lab is up in Miami, so not too far away. This is something I've wanted to do for a long time: using molecular signatures to assign a level of health, or stress as the case may be, because this could enable coral reef triage, in which we prioritize our conservation efforts. Maybe this example reef down here that I gave an A-plus, with lots of resilient corals that don't seem in jeopardy of bleaching or disease, we leave alone for now and focus our efforts on the reef that was given a grade of C. Maybe the one that was given an F is too far gone and not even worth our efforts to try to save. I think these kinds of triage data are going to be important for prioritizing management decisions, and I was really excited about this project. But there's a huge issue: it's really expensive and it's slow. That was one coral species in a relatively small area of the Earth, and it took three years of my time, working 80 hours a week, and a quarter of a million dollars to build those neural network models. Most of the world's coral reefs are in the Indo-Pacific, and the most beautiful ones are found in the region I've highlighted at the bottom, known as the Coral Triangle. These are areas that do not fund coral reef research to any great extent; they simply don't have the people or the funding. Even if they did, there are hundreds, up to six or seven hundred, coral species you can find on these reefs. I will have passed away long before I could do this sort of analysis with all of these corals, even with a couple of helpers. It's too expensive and it's too slow. Is there something else we could do that would help us know something about the resilience, the longevity, and the stress loads of these reefs without having to do these fancy, expensive molecular analyses that require well-trained personnel? That's what I'm going to talk about for the rest of the time. This is what I call transitioning from being a coral veterinarian, with a handful of patients whose health I know in great detail, to thinking of myself more as an epidemiologist: I'm trying to look for more global trends in coral health that I can use to make models about their future persistence on the Earth as temperatures warm. If you remember, before, I only used the physiological data to make a predictive model. Now I'm going to integrate three disparate data types into a predictive model. We're going to look at environmental data, by which I mean things like seawater quality, the type of reef, whether the reef is exposed to the elements, the shape of the reef, those kinds of physical properties; and ecological data, which is essentially what's living on the reef: the corals present, how much algae there is, how many fish live on the reef.
These are all things that could be important for reef health; and then there is also the physiological data from the corals themselves. This actually has never been done before. Most people monitor the health of reefs based on only two properties, temperature and the abundance of coral, which is a good start, but as I'll show you, I think models that are more comprehensive and holistic are going to give you much higher predictive power. In this case, we're not simply trying to predict the resilience of individual coral colonies; we're working at a habitat or entire-ecosystem scale, and that's what we're trying to predict. As a proof of concept, I've got a nice dataset I've been playing with from the Solomon Islands. It's in the southeastern part of the Coral Triangle I mentioned; this is where you see the most biodiverse reefs, the reefs with the most coral, and, in my subjective opinion, the most beautiful reefs on the planet. I had an amazing opportunity to dive all over this region and beyond with the Khaled bin Sultan Living Oceans Foundation. A couple of years back, they carried out what was known as the Global Reef Expedition, the largest coral reef survey ever undertaken. We had a whole team of scientists monitoring the reefs from the satellite level, from space, all the way down to the molecules of the organisms residing on these reefs, so it's a really rich dataset. We have nice reef maps we've been developing, we have scuba surveys, with divers collecting information about what's living on the reefs, and we're looking at our environmental data, our seawater quality, which is obviously going to be important for coral health. My role, as you can see in this image, was sampling corals, taking tiny little biopsies to profile with some molecular assays I've developed over the last 20 years. We used a different species from the Caribbean one, a coral called Pocillopora acuta. It's intermediately sensitive, kind of in the middle, a fairly typical coral, but more importantly, it's the model coral for research; this is the coral whose physiology we know the most about. I would encourage you to check out my personal website, coralreefdagnostics.com, to really see how incredible the Solomon Islands and the other places we visited were. For people more interested in the data, the Living Oceans Foundation has an interactive map web server loaded with high-resolution maps and all manner of data we collected; it's all open access, it's a really nice resource, and I was really happy to have been a part of it. So finally, 15 minutes in, let's start doing something in JMP. I mentioned we have all these different data types: what's living on the benthos, the ecological data, and the coral health data. If I talk to my marine biologist friends, the first thing they're going to want to know is: what's the coral cover on the reefs?
Ecologists are admittedly a little bit too focused on abundance, as you may see later in the talk, depending on how the models run. Coral cover alone, coral abundance, is not actually a good predictor of reef resilience. A reef with tons of coral doesn't actually do any better than a reef with only a few corals. One reason might be that on a reef that's been decimated, the few stragglers that are left are inherently adapted or acclimatized to whatever killed off their brethren, so they're actually more resilient. The reef might be gross and ugly, and no tourist may want to go there, but it doesn't actually have lower resilience. So for me, I'm more interested in what's going on within the corals. Most people in the field are more focused on coral cover, which is still important even if it's not a good metric for resilience; you still want to know where to find the reefs with the most coral. Maybe that's where you want to start [inaudible 00:16:11]. How would you go about doing this in JMP Pro? For this demo, I'm actually going to use JMP Pro 17, a beta version that I've been demoing for a few months, but you could just as easily do this analysis in JMP Pro. Just to familiarize you with what the dataset looks like: there are 272 rows, which are what we call transects, swaths of the reef that we surveyed. You can see we looked at different depths. These are the environmental data I mentioned before: spatial data such as coordinates, the type of reef, and seawater quality. You don't need to worry too much about these abbreviations; they're just the abbreviations for the genera of organisms living on the reef. We binned them into 54 different coral bins, six algae bins, barren substrate (where nothing is living — this will be important to remember), and then other invertebrates. These are the main things that occupy the reef environment. I've excluded the fish data because I don't have a nicely curated dataset at the moment, but I definitely want to factor that in later. Let's look at live coral cover, which is all the different coral genera summed together. This is a simple univariate analysis: I want to know what contributes most to the variation in coral cover in the Solomon Islands. A really good way to get at this simply, as a first pass, is Predictor Screening. In this analysis, the Y is my live coral cover, and I want to look at these eleven environmental parameters that I think might influence coral cover in the Solomon Islands, so I put them in as my Xs. Right off the bat, you can see depth: it contributes about 40% of the variation in coral cover. To a marine biologist, or a coral biologist, this is not a surprising finding; we know that in different parts of the world corals prefer different depths, and most of the most lush coral reefs you'll see are from about 2 meters down to about 30 meters. Let's see where we find the most corals in the Solomon Islands.
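As an aside before the depth comparison, here is a rough Python analogue of that screening step: ranking environmental predictors of live coral cover by random forest importance. The column names and data are synthetic placeholders shaped like the survey table described above, and this is a stand-in rather than JMP's Predictor Screening algorithm.

```python
# Rank candidate environmental predictors of live coral cover by importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 272  # one row per surveyed transect, as in the data table described above
surveys = pd.DataFrame({
    "depth_m":   rng.uniform(2, 30, n),
    "latitude":  rng.uniform(-11.5, -6.5, n),
    "reef_type": rng.choice(["barrier", "fringing", "patch", "other"], n),
    "exposure":  rng.choice(["exposed", "intermediate", "sheltered"], n),
})
# placeholder response that loosely peaks at mid depths, just so the sketch runs
y = 50 - 0.15 * (surveys["depth_m"] - 10) ** 2 + rng.normal(0, 5, n)

X = pd.get_dummies(surveys)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
ranking = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(ranking.head(8))
```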
With this selected, I don't even have to go back to my columns; I can go directly into Fit Y by X, move live coral cover into the Y, and do a simple ANOVA. I have my depth as bins here, although I've got the continuous data somewhere as well. We see from this analysis of variance a really strong effect across these four depth bins, with significantly higher coral cover in the 8 to 12 meter window. We can look at the Tukey post hoc test and see that 8 to 12 meters has over 50% coral cover. A healthy reef can range from 20 to 40%, so 50% is astonishingly impressive coral cover; you're not going to see that in much of the world. For now, it's important to know that in the Solomon Islands, 8 to 12 meters is where you find the most coral. That might be good for a publication, but it's not really that interesting on its own. If I've got colleagues or marine park managers working in the Solomon Islands, they'll say: we can't go out there and survey all these reefs; this is a huge area, and what we surveyed was a drop in the bucket. We want to make predictions about reefs we didn't visit that might also have a lot of coral and might be important for conservation. High-coral-cover reefs are also where you see more fish and other invertebrates, which might be important for people who want to bioprospect, for instance. So now I'm going to do something similar, but rather than just a simple predictor screen of coral cover, I'm going to do a model screen, in which I try to build a simple predictive model of coral cover. Let's go back into JMP Pro 17. Model Screening was a newly available feature in JMP Pro 16, I believe, and it is arguably my favorite feature in the entire package. I'm going to set this up exactly the same way I did before: live coral cover is my Y, and we've got our 11 potential environmental predictors here. I had JMP make me a validation column ahead of time, because it's going to be important to validate this. You see down here a list of all the different predictive models you can test; I want to include all of them, and I want to look at two-way interactions as well as quadratics. I'm not going to do K-fold cross-validation because I have a validation column. Let's let this run. It's going to work through this fairly large dataset (it's not huge; for many of you working in industry, this would actually be a pretty puny dataset), test it with all these different modeling types, and give me a nice summary output. I can see right here which model won this particular battle: a generalized regression with forward selection, using a fairly involved specification that considers quadratics and factorial combinations, evaluated on the 68 samples that were flagged as validation. We don't even have to go into Fit Model and rerun this; we can run it right out of Model Screening.
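Looping back to the Fit Y by X step for a moment, the ANOVA-plus-Tukey comparison of coral cover across depth bins could be sketched in Python as below; the group means and spreads are synthetic placeholders that merely mimic the pattern described above.

```python
# One-way ANOVA and Tukey-style pairwise comparisons of coral cover by depth bin.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
bins = ["0-4 m", "4-8 m", "8-12 m", "12-30 m"]
means = [25, 35, 52, 30]                      # placeholder means; 8-12 m highest
cover = np.concatenate([rng.normal(m, 8, 40) for m in means])
depth = np.repeat(bins, 40)

print(f_oneway(*[cover[depth == b] for b in bins]))
print(pairwise_tukeyhsd(cover, depth))
```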
There's a lot of output here, and we're not going to sift through all of it because, to be honest, this was something I did on the fly by design. I've never run this particular model before, which I think really emphasizes how easy it is to dive in and start interpreting. There are other ways to get at this, but I'm lazy, so I want to see what the most important predictors are that this generalized regression model found. Depth: we're not surprised to see depth there, because we just saw from the predictor screening that it's important in driving trends in coral cover in the Solomon Islands. The reef type by latitude interaction is maybe a little bit harder to wrap our heads around, so let's go into the Profiler and see what we can learn in more detail. The Profiler is here; let me close some of these things so we get a little more room and enlarge this. The Profiler is not showing me the reef type by latitude interaction on the same plot per se, but watch: if you just look at reef type in isolation, we have barrier reefs, fringing reefs, patch reefs, and these "other" reefs, which tend to be pinnacles that come up out of the ocean depths. We don't see much difference in coral cover, but look how the latitude line shifts. That is the latitude by reef type interaction. Over here, we're seeing a very similar plot to the ANOVA we did in Fit Y by X: 8-12 meters is the sweet spot for finding the most coral. But what I think is cool is to go one step further and do a desirability analysis. It's probably going to remember my presets, but let's just start from scratch. I want to tell JMP to give me the scenario that would result in the highest live coral cover, because this is what a marine biologist is going to want to know. Right here, my response goal is to maximize live coral cover, so I want high desirability values for my high coral cover levels. I hit OK, then I go back in here and say Maximize Desirability. Unsurprisingly, things stay the same: 8-12 meters is where we want to hone in on our search. But this might be more interesting to people who are embarking on a field trip: "Hey, we've got a week in the country and we want to find rich, high-coral-cover reefs. Where should we go?" Well, I think you should go to these farther-flung islands, farther away from the equator. As you'll see later, these are the more remote, sparsely populated parts of the country, which is probably where you'd expect to find more coral. And although it's very similar to the barrier reefs, you'd probably want to focus on these other types of reefs and barrier reefs, if you have the choice, versus fringing reefs and patch reefs. I think doing this kind of analysis could be important for conservation and for planning field trips. But arguably this is a little bit of an aside, and we have not yet reached the actual goal; that's coming up. All right, we've done these two demos; let's go back into PowerPoint.
I really wish I had more time for this, but I know I don't, and I feel bad for all the developers and people who worked so hard on it, but I take full advantage of the multivariate platform. This is really important because, even though in those past demos I just looked at live coral cover, a single Y, in reality that completely belittles the complexity of these ecosystems. There are hundreds of things living on the sea floor, so you really need a multivariate analysis where you've got multiple Ys and multiple Xs. We're talking about things like principal components analysis and multidimensional scaling; I do these daily in JMP Pro. I really like discriminant analysis. For instance, right here, this took me one minute: I can quickly see that the reefs of Tinakula, on a multivariate scale, are very different from those of the rest of the country. If you were to go to the Solomon Islands, you would know why: these are reefs growing at the base of an active volcano. They look very different and they behave very differently, and the multivariate benthic data corroborate this. Similarly, we see this nice effect where I've color-coded the reef sites by exposure, whether they were sheltered, exposed to the waves, or intermediate, and you can see pretty nice parsing by exposure in this discriminant analysis. I'm a big fan of these algorithms, and of partial least squares in particular, and I've got some hidden slides and some scripts in the data table that I'll make publicly available, so if you want more detail about the multivariate analysis, you're definitely welcome to download it. But what I want to spend the rest of the talk on is the health of the corals themselves. That was looking at the benthos, the reef as a whole; I'm a physiologist, and I want to know what's going on in the corals. I've measured so many different things in these corals over the years that I recently created what I call the Coral Health Index. This is basically an amalgamation of a bunch of different response variables that I know from my past research scale with coral resilience. What I've done is tried to simplify things to where, if your Coral Health Index score is zero, you're about to kick the bucket, and five means you're immortal. Trivia: [inaudible 00:28:25] like corals and jellyfish are technically immortal; left to their own devices with no stress, they can continue to regenerate forever. Of course, in reality there's always going to be some limitation: they're going to reach the surface, or the water is going to get too cold. But in principle they can live forever. Anyway, we're not going to see any corals that are fives. This basically follows a bell curve, so we're going to find most of our corals with health indices in this 2-3 window. With the help of John Powell, I made these really nice customized pie graphs. I adapted this from what are called coral reef report cards, developed by an NGO called AGRA. I said, I love that visual; I want to adapt it but focus on the coral scale.
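As a rough companion to the multivariate workflow just described (ordination of the benthic community matrix plus a discriminant analysis on exposure), here is a minimal outside-JMP sketch. It assumes hypothetical column names and is not the speaker's actual script or the scripts mentioned in the data table.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("solomon_transects.csv")                          # hypothetical benthic-cover table by transect
benthic_cols = [c for c in df.columns if c.startswith("genus_")]   # the coral, algae, and invertebrate bins

Z = StandardScaler().fit_transform(df[benthic_cols])
pca = PCA(n_components=2)
scores = pca.fit_transform(Z)                                      # ordination of the transects
print("Variance explained by PC1, PC2:", pca.explained_variance_ratio_)

lda = LinearDiscriminantAnalysis().fit(Z, df["exposure"])          # sheltered / intermediate / exposed
print("Resubstitution accuracy by exposure class:", round(lda.score(Z, df["exposure"]), 2))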
What this is: each of these outer four widgets, whose details you can see here, feeds the interior, which is basically showing you the average of the four widgets. As you can see, we're seeing values as low as 1.5; corals in Nono Lagoon seem to be the least resilient. Most of the people in the Solomon Islands live close to the capital of Honiara, so we would probably expect this kind of west-east gradient. We tend to see higher Coral Health Index values over here in the provinces and the Reef Islands and Monte Carlo. This is not surprising. This map was made with Graph Builder. Let's see, I think I have enough time. I'm not going to try to reproduce this map, because even though I love it, I think it's still too complicated for a manager. They don't want to see all these pie widgets; they want a single number. I want to show you a really cool trick. There are great webinars about how to plot data onto a map on the JMP website, but I'm going to do something that was new to me and might actually be useful to a lot of you; it's taking it one step further. We're going to do it in JMP Pro 16 because I want to be able to publish this online, which is not yet a feature in JMP Pro 17 since it's still the beta version. I want to plot the Coral Health Index on a map, and this is going to be shockingly easy in Graph Builder. I'm just going to drag my latitude and longitude over; JMP knows to treat these as such. I don't want this line. Right now it's just showing me essentially the location of my dive sites, so I want to add a background map; this is the detailed Earth. Let's make it bigger. We see the Solomon Islands now; getting closer. I want to overlay my Coral Health Index as color. Still not there yet. I want to convert this to a heat map, but with a finer scale of resolution, and this is the trick that I learned that I think is going to be really useful, because I was actually doing this in ArcGIS before, which is a PC-only program; I'm on a Mac, and it costs thousands of dollars. I said, why can't I do this in JMP? It turns out that I can. What I want to do is force a smaller grid onto this map, because I want these cells to be much smaller: 0.5 by 0.5 degrees. As long as you turn the grid lines on, it's going to give you an average of the Coral Health Index in each of these 0.5 by 0.5 decimal-degree boxes. That's what I want. I actually prefer a green-to-red scale, and the default is to have red be high. If you remember the image of the Coral Health Index, I have green as the high value, so I'm going to switch it. I also want it to span the entire range, even though I don't have many zeros or fives. I'm going to drag this here, and now I think it's looking good, but it's still too busy, so I'm going to turn the grid lines back off. It will keep the cell shapes that I want. Voila: in my opinion, this is exactly how I want to see these Coral Health data portrayed. But I'm going to take it yet another step further.
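The Graph Builder trick of forcing a 0.5 x 0.5 degree grid amounts to binning the coordinates and averaging the Coral Health Index within each cell. A small pandas sketch of that aggregation, with hypothetical file and column names, is below; the result could then be mapped with any plotting tool.

import numpy as np
import pandas as pd

df = pd.read_csv("coral_health_sites.csv")                    # hypothetical site table with lat/long and CHI
cell = 0.5                                                    # cell size in decimal degrees
df["lat_cell"] = np.floor(df["latitude"] / cell) * cell + cell / 2    # cell-center coordinates
df["lon_cell"] = np.floor(df["longitude"] / cell) * cell + cell / 2

grid_means = (df.groupby(["lat_cell", "lon_cell"])["coral_health_index"]
                .mean()
                .reset_index())
print(grid_means.head())                                      # one averaged CHI value per 0.5-degree cell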
I'm going to say, "Hey look, my friends who have never seen these data may want to play around with the different environmental variables and see how things change depending on the type of reef, the temperature, and whatnot." So I'm going to add this local data filter and give it a name. Still not done yet, though; I want to actually share this with my friends. What I'm going to do is publish to JMP Public. This may take a minute because I may not be logged in, but let's just see. I'm going to create a new post and share it with everyone. I can add an image if I want, but I'm just going to leave all these defaults as they are for now, and we'll publish it. It's going to take a few seconds; hopefully it works well. It's going to migrate me over to the website, and I'll show you, as it's working, what you can then do once it publishes. All right, here we go. Let's check it out online first. This is what I can share with my friends so they can say, "Hey, look, I'm only going to be able to go to the western part of the country for my field trip; I don't care about those reefs in the east, so let me just turn them off." Then it refreshes, and you can hone in your search here. You could look at the different reef types. Another thing you can do, which I do all the time, is take the embed code or the embed card, copy it, and put it in your personal website. Because of the way my website is set up, I have so much padding here that it's not actually going to show the map very well; it's better for me to simply use what they call a card, where I've got a schematic of it, and then if people want more details they can click on it and go back to JMP Public. This is a super cool feature that I think people with access to JMP should be taking advantage of. This is just showing you how you can even embed it within your website or within a presentation, but I don't think we need to go into that. Again, that's another aside; we're finally getting to the good stuff. This is what I've been wanting to do; this is the goal of this whole analysis, so we're almost at the finish line. This is using the JMP Pro suite to try to find the climate-resilient corals that we haven't stumbled upon yet. We usually find climate-resilient corals either through experiments or through surveys. We've lost that time window: we don't have time to do all these experiments, and we don't have the money. Coral reefs are in bad shape. We need a way to speed up the search for the resilient corals that we may want to use for restoration, the ones we may want to protect, buy, or preserve. What we're going to do is make a predictive model of the Coral Health Index where we factor in all the different survey data we've collected. It sounds daunting, but I think you'll see this is actually something that can be done relatively quickly. In this case, I'm going to go to another data table that's got my coral physiological data, and that is somewhere here.
This is 110 rows. Instead of dive sites, these are now coral samples. This is the ecological data, and the Coral Health Index is here. We're going to go over to my beloved Model Screening again. I probably could use Recall, but just to be safe, we're going to take the 50 benthic categories, the bins of things that live on the reef, and move them here. The Coral Health Index is what we want to predict, and we're going to use this validation column here, with the same settings as last time. It looks a little bit different because I'm now doing this in a different version of JMP Pro, but it works very similarly. I want to do the additional methods with quadratics. I think this will run fairly quickly, and indeed it did. In this case, a boosted neural network rose to the top, with a validation R squared of about 0.49. That's not bad; let's run it. It's going to come out differently because of the way neural networks work: they can vary quite dramatically from run to run, especially when you have relatively small data sets like mine. But we're still in the ballpark, 0.52. If you know about neural networks, you know there are tons of different modeling parameters that you can tinker with and tweak. That's why this really brilliant add-in from Diedrich Schmidt has been an absolute game changer for my research. He created a nice GUI that lets me look at potentially thousands of different factorial combinations of modeling parameters, but today, in the interest of time, I'm just going to do four. I input the model exactly like I did in Model Screening, but now you'll see these options that are specific to the Neural platform. You know what, I'm going to explain this while it's running, because it might take a second to run and we're running low on time. It's going to build four models for me. I think everything's in there like I want. All right, now let me explain this while it's running. Hmm, I think I input something wrong; apologies for that. Let me restart and input this here. This is all correct; I want these to vary. I think maybe this was too low. Let's try it again. It's basically going to start running these models, and it's going to use the JMP defaults as well. Basically, he leveraged the power of design of experiments to have the number of sigmoidal, linear, and radial activation nodes span 0-4. We can have up to 20 boosts. I'm allowing the covariates to be either transformed or untransformed, either with or without a robust fit. Because I want to go with the minimum number of potential factors, I want to use a weight decay penalty. It gives me this nice output. Let's see if the R squared of the validation models did any better than the JMP default; most of the time they do. In this case, it's not too much different, about 0.55. We can run it; it will ask me to save the output, and in the meantime it's going to run this model, which may end up being very similar to the JMP default one.
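The auto-tuning add-in mentioned here uses a designed grid of neural-network settings inside JMP and compares validation R squared; the sketch below is not a reimplementation of that add-in, just an illustration of the same idea with scikit-learn, using a handful of hypothetical settings and a validation column assumed to be coded Training/Validation. File and column names are made up.

import pandas as pd
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("coral_physiology.csv")                      # hypothetical 110-sample table
X = pd.get_dummies(df.drop(columns=["coral_health_index", "validation"]))
y = df["coral_health_index"]
# Rows flagged "Validation" form the single held-out fold; -1 keeps training rows out of testing.
fold = PredefinedSplit(df["validation"].map({"Training": -1, "Validation": 0}))

pipe = make_pipeline(StandardScaler(), MLPRegressor(max_iter=5000, random_state=1))
grid = {"mlpregressor__hidden_layer_sizes": [(2,), (4,), (8,), (4, 4)],
        "mlpregressor__alpha": [1e-4, 1e-2, 1e-1]}            # alpha acts as a weight-decay-style penalty
search = GridSearchCV(pipe, grid, cv=fold, scoring="r2").fit(X, y)
print(search.best_params_, round(search.best_score_, 2))      # analogous to picking the best validation R squared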
Once it spits it out, whatever it gives us, we're going to go with it. I'm going to show you, assuming it has an R squared or another modeling benchmark that you're happy with, what you could then do with the analysis, and that's going back into the desirability analysis. If you just bear with me another few seconds, it should finish. What we're going to do is go into the Profiler, and I'm going to tell the Profiler that I want to find the conditions, the environmental conditions and the benthic conditions, that lead to the highest Coral Health Index scores, because that's where I might want to focus my efforts for conservation, for trying to find resilient corals. You can see that in this case we got a fair bit higher R squared. Let's go into the Profiler. It's probably going to remember my settings, but just to be safe, let's go set the desirability. I want to maximize the Coral Health Index, and it remembered that. Now I maximize desirability, and it's going to tell me the conditions in which I'm going to find the corals with the highest Coral Health Index scores. We don't have time to go into all of these, but this is going to be super useful for people who are embarking on field trips, and for managers. They're going to say, look, if I want to find the most resilient corals in the Solomon Islands, I'm best sticking to intermediately exposed fringing reefs, within the lagoon, and submerged reef types. Some of these may not make as much sense; the time of day and temperature, you may not have that luxury. Things like depth: you want to focus on shallow corals, in this example. These are going to be super useful data that will allow us to find resilient corals on a much faster time scale. One thing to note is that these aren't necessarily the conditions in which you find the most corals, because, remember, more is not necessarily healthier. But these are things that are cheap to measure. Latitude and longitude, you just need a smartphone; temperature, you need a thermometer. You don't need these fancy, expensive molecular analyses run by PhD scientists; you can train a high school student to go out there and collect these data, which are going to be really informative for coral health. My idea is that I have all these similar data sets from all over the world, so I can start building what I'm calling this Coral Health Atlas. I can use Graph Builder to make these nice plots where I'm showing people where resilient corals are likely to be found. This is going to help us, in concert with these temperature-based models from NOAA, envision what the future reefs are going to look like, where we're going to find corals in the future, and which corals are going to live there. Since we're running out of time, don't worry, I'm not going to read off this list. But this was not done completely in isolation. I obviously benefited greatly from the JMP Pro software itself, but a lot of these people behind the scenes lent their support.
Some of you won't be surprised to see your name there; some of you might be surprised, and that's probably because you gave a webinar or wrote a blog or something that was really inspiring to me. I hope you're happy to see your name up there. I really want to give a shout-out to Diedrich Schmidt, if he's on, for developing that really excellent auto-tuning add-in that has greatly benefited my research. I also want to give a shout-out to John Powell, not just for helping me make those figures, but because he was the person who really convinced me that JMP is more than just a software package. You've got this network of really talented individuals behind the scenes who are willing and able to help you along the way. I really appreciate John's and everybody else's support. With that, I'll end my talk, and I'm probably over here furiously answering questions. If we have any time left, I'm happy to field more. All right, thanks a lot.
Attribute gauge analysis is typically applied to compare agreement, or lack thereof, between two rating approaches to a problem. For example, two inspectors may have differences of opinion as to whether a part is conforming (Pass) or non-conforming (Fail) based on consideration of specific quality indicators in individual parts. How do we quantitatively measure the degree of agreement? In more complicated situations, attribute gauge analysis may be applied to compare agreement among multiple raters for multiple responses, including agreement to a standard. We describe a personal consulting case involving the use of drones flying in warehouses to read labels of stacked inventory shelves in place of manual efforts by humans. We illustrate the application of JMP's attribute gauge analysis platform to provide graphical and quantitative assessments, such as the Kappa statistic and effectiveness measures, to analyze such data.     Hi, I'm Dave Trindade, founder and owner of Stat-Tech, a consulting firm specializing in the use of JMP software for solving industrial problems. Today I'm going to talk about a consulting project that I worked on over the last year with a robotics company. We're going to be talking about Drones Flying in Warehouses: An Application of Attribute Gauge Analysis. Attribute gauge analysis is typically applied to compare agreement, or lack thereof, between two or more rating approaches to a problem. For example, two inspectors may have differences of opinion as to whether a part is conforming, call it pass, or nonconforming, call it fail, based on consideration of specific quality indicators for individual parts. How do we quantitatively measure the degree of agreement? Let's start off with an example. Say we have two inspectors, Inspector 1 and Inspector 2, and they are presented with a list of 100 parts, with the critical characteristics of those 100 parts, and asked to determine whether each part should be classified as a pass or a fail. I've summarized the results in the partial table to the right. There are 100 rows in the table, and all variables are nominal: the first column is the part, 1-100; the second column is the rating by Inspector 1, pass or fail; and the third column is Inspector 2's pass or fail rating. Now, if we were not familiar with JMP's attribute gauge analysis platform, a first step we could take would be to look at the two classification distributions and use dynamic linking to compare them. What I will do is show you the slides and then go and demonstrate the results in JMP after I've gone through a certain amount of material. For example, if we generate distributions of the two columns for Inspector 1 and Inspector 2, we can click, say, on the fail bar for Inspector 1. You see mostly matches for Inspector 2, but there are a few disagreements over here: there are some parts that Inspector 1 classified as a fail but Inspector 2 classified as a pass. When you click on that bar, JMP highlights the actual rows that correspond to it.
You can see over here, for example, that in row four Inspector 1 called the part a fail and Inspector 2 called it a pass. Generally, though, they're mostly in agreement: fail/fail, fail/fail, and so forth. We could also do this by clicking on the Inspector 2 fail bar and seeing how it compares to Inspector 1. There are five instances where Inspector 1 classified a part as a fail and Inspector 2 classified it as a pass. Now we can also visualize the inspector comparison data. To do that, we can use Graph Builder with Tabulate to view agree and disagree counts between the two inspectors. Here's one way of visualizing it: we put Inspector 1 on the horizontal axis and Inspector 2 on the vertical axis, and then with color coding we see whether they agree or disagree; agreement is green and the rows that disagree are color-coded with red markers. Now we can see the actual distribution, and we can use Tabulate to total the numbers involved. Inspector 1 and Inspector 2 agreed on the fail categorization for 42 of the parts, and they agreed on 44 of the pass parts. They disagreed in nine instances where Inspector 2 called the part a fail and Inspector 1 called it a pass, and in five instances where Inspector 2 called it a pass and Inspector 1 called it a fail, so those total 14. The inspectors agreed on a classification for 86% of the parts and disagreed on 14%. From there, we can do attribute gauge analysis and see what JMP can do for this analysis. To get there, we go to Analyze > Quality and Process > Variability / Attribute Gauge Chart, and then we cast the columns into roles. Here I've shown both inspectors listed under the Y response, and the column for the part is listed as the grouping; these are required entries. We notice the chart type is Attribute, and we click OK. Now JMP provides us with this attribute gauge analysis report. The first chart shown here is the percent agreement for each part: we have the 100 parts on the horizontal axis, and 100% means the two inspectors agreed while 0% means they disagreed. The left chart shows the overall percent agreement, 86%, by inspector; since the comparison is between only two inspectors, both have the same 86% agreement value. The agreement report includes a numerical summary of the overall 86% agreement: 86 matches out of 100 inspected. The individual values are the same, since there is only one question, pass or fail, for a given part, and 95% confidence limits are provided for the results, both for each inspector and for the overall agreement. Now, the agreement comparisons report includes a statistic that perhaps many people are not familiar with, called the Kappa statistic, devised by Cohen (the paper is given in the references).
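The Tabulate counts and the 86% agreement figure are easy to reproduce outside JMP as a plain cross tabulation. A minimal sketch, assuming a hypothetical export of the 100-part table with columns named Part, Inspector 1, and Inspector 2:

import pandas as pd

df = pd.read_csv("inspectors.csv")                            # hypothetical export of the 100-part table
xtab = pd.crosstab(df["Inspector 1"], df["Inspector 2"], margins=True)
print(xtab)                                                   # 42 Fail/Fail, 44 Pass/Pass, 9 + 5 disagreements

agreement = (df["Inspector 1"] == df["Inspector 2"]).mean()
print(f"Percent agreement: {agreement:.0%}")                  # 86% for this data set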
The Cohen Kappa statistic, which in this case is 0.7203, is designed to correct for agreement by chance alone. This was very interesting to me when I first read about it: what do we mean by agreement by chance alone? Let's go into a little explanation of agreement by chance and how we can estimate it. Consider two raters, R1 and R2, and assume totally random choices for each rater on each sample, that is, each part. We further assume that the probability a rater selects either choice, pass or fail, is 50%, so it's 50/50. One hundred samples or trials are then categorized pass/fail by each rater, similar to flipping a coin for each choice. Just visualize two inspectors each flipping a coin and counting how often they match: head/head or tail/tail versus head/tail or tail/head. What's the expected fraction of agreements by chance? Well, it's a simple problem in probability. Similar to tossing two coins, there are only four possible and equally likely chance outcomes between the two inspectors for each part. Rater 1 could call it a fail and Rater 2 could call it a fail; they would agree. Rater 1 could call it a pass and Rater 2 could call it a pass; there'd be agreement there too. The disagreements are the two cases where they don't agree on pass versus fail. Of these four equally likely outcomes, two are agreements and two are disagreements, so the probability of agreement by chance alone is two out of four, or 50%. It's a simple probability problem. Now, how do we calculate the Kappa statistic? As I said, it's meant to correct for this expected probability of agreement by chance alone. The simple formula for the Kappa statistic is the percent agreement, in this case 86%, minus the expected agreement by chance estimated from the data, which we know is going to be around 50%, divided by one minus the expected agreement by chance. So how do we use the data itself to estimate the expected agreement by chance? The estimation for the Cohen Kappa statistic is shown below, and this is basically how it's done. This is the tabulated value we saw earlier: agreement on fail/fail between Inspector 1 and Inspector 2 in 42 instances, and agreement on the pass criterion in 44 instances. Add those up and that's 86%; I show that over here in the Excel format, 42 plus 44 divided by 100. Then disagree is one minus that, or just the five plus nine divided by 100. Now, to calculate the expected agreement by chance for the Cohen Kappa statistic, we take the sum of the products of the marginal fractions for each pass/fail category. Here are the marginal fractions. For fail, the marginal fractions are 51 divided by 100 and 47 divided by 100, so we form the product of those two, 51/100 times 47/100; that takes care of the fail criterion.
For the pass criterion, we take 49 out of 100 and 53 out of 100, multiply those two together, and add the result. That calculates out to 0.4994, which is obviously very close to 0.5. Then the Kappa statistic is the percent agreement minus the expected agreement by chance, divided by one minus the expected agreement by chance, which comes out to 0.7203 in this case. Here are some guidelines for interpreting Kappa; going back up, remember the Kappa was 0.7203. If Kappa is greater than 0.75, agreement is excellent; if it's between 0.40 and 0.75, it's good; and if it's less than 0.40, it's called marginal or poor. These dividing lines are just guidelines. Total 100% agreement would give a Kappa of one, and we could actually get a negative Kappa, which would indicate agreement that's less than expected by chance alone. The books covering this are given in the references. All right, let me stop here and go into JMP to show what we've done so far. The data file we've got over here is the inspector data that I talked about, and I said we could take a look at the distributions. If you're familiar with JMP, you just go to Analyze > Distribution and put in the two inspector columns. Then we can put this next to the rows, and if we click fail, we see the comparison between the fails for Inspector 1 and the fails for Inspector 2, plus some disagreements, and you can see which rows disagree, for example row four. Similarly, we can compare Inspector 2 to Inspector 1 and get a different set of numbers over here. The other option I mentioned for visualization is Graph Builder, where we can put Inspector 1 on the horizontal axis and Inspector 2 on the vertical axis. Now we have a comparison, and we can see the number of times, for example, that Inspector 1 rated something as a pass where Inspector 2 rated it as a fail. If we go back to the data table, we see that there are nine instances of that, and they're highlighted in the rows. This is a very quick way of seeing what the numbers are for the different categories we're working with. Let's click Done over here. The other thing we can use, as I mentioned, is the Tabulate feature. We can go to Tabulate and put Inspector 1 across the top and Inspector 2 down here this way, and then we can add another row for the marginal totals down here and so forth. Now we have a summary that we can put next to the Graph Builder chart to see the actual tabulations. That's something we might do if we were not familiar with JMP's attribute gauge platform. But let's use JMP now to do the analysis. We're going to come over here to Analyze > Quality and Process > Variability / Attribute Gauge Chart. We're going to put in the inspectors and the part, and over here you notice the required chart type, Attribute.
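To make the hand calculation above concrete, here is the same Cohen's kappa worked out directly from the tabulated counts; it reproduces the chance agreement of about 0.4994 and the kappa of about 0.7203.

# Counts from the Tabulate summary of the two inspectors.
fail_fail, pass_pass = 42, 44            # agreements
i2fail_i1pass, i2pass_i1fail = 9, 5      # disagreements
n = fail_fail + pass_pass + i2fail_i1pass + i2pass_i1fail     # 100 parts

p_agree = (fail_fail + pass_pass) / n                         # 0.86
# Marginal fractions for each inspector.
i1_fail = (fail_fail + i2pass_i1fail) / n                     # 0.47
i1_pass = (pass_pass + i2fail_i1pass) / n                     # 0.53
i2_fail = (fail_fail + i2fail_i1pass) / n                     # 0.51
i2_pass = (pass_pass + i2pass_i1fail) / n                     # 0.49
p_chance = i1_fail * i2_fail + i1_pass * i2_pass              # 0.4994

kappa = (p_agree - p_chance) / (1 - p_chance)
print(round(p_chance, 4), round(kappa, 4))                    # 0.4994, 0.7203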
We click OK, and now we have our output. This output again shows the agreement for each part: 0% means the inspectors disagreed, 100% means they agreed. This shows the rating between the two inspectors, 86%, and this is the summary, 86%. Here's our Kappa index over here, and we have the agreement within raters, which is somewhat redundant here because we're only looking at one binary comparison. Then, further down, we can look at the agreement by category: we can calculate the agreement on fail individually, or on pass individually. So that's how we would do it for a simple comparison. But what if we now consider that the actual diagnosis of the part was known or confirmed? Let's go back into the PowerPoint presentation. I introduce a standard. This gives us a measure of what we call effectiveness. We're going to assume the correct part classification was either known or subsequently confirmed; this is the true, correct diagnosis that should have been made for that part. How accurate are the inspectors' choices? In other words, how do we measure how well each inspector matched the true standard? We set up that column in JMP, and now we can go through the same process of looking at distributions. For example, this time we include the standard in the distributions; now we can click on pass and see the agreement of Inspector 1 and Inspector 2 with the standard on pass classifications. You can see both of them had some misclassifications, some wrong diagnoses. We can click on fail and do the same thing for the other category, and JMP will highlight those rows in the data table. To do the attribute gauge analysis in JMP using the standard, all we have to do now is enter the standard into the dialog box as we did before; it's an additional column. The big difference now, and this is not highlighted in the manual, is that under the attribute gauge chart we can get a display that applies specifically to effectiveness. What we're going to do is unclick the agreement points on the chart and click instead the effectiveness points. When we do that, we get another chart that measures effectiveness, and this effectiveness has three possible values. The chart now shows the percent agreement, 0%, 50%, or 100%, of the two inspectors to the standard for each part. A 0% implies both inspectors misdiagnosed the part; seven instances of that occurred. A 50% signifies that one of the inspectors got the correct classification, and obviously 100% means they both got it right. The left chart shows the overall percent agreement to the standard for each inspector, and we notice a slight difference between the two inspectors. We then generate the effectiveness report, which incorporates the pass/fail comparisons to the standard for each inspector. You can see Inspector 1 got 42 of the fails correct and 43 of the passes correct, but he got 10 of the fails incorrect, calling them passes, and he got five of the passes incorrect.
I find this notation a little bit confusing, so I put it down at the bottom. When we say incorrect fail, that means a fail was incorrectly classified as a pass. When we say incorrect pass, it means a pass was incorrectly classified as a fail. You can get your mind going in crazy ways just trying to interpret what's in there, so I created my own chart to simplify things. The misclassifications show that 17 actual fail parts were classified as pass, and eleven pass parts were classified as fail. That's in the JMP output, but what I've put over here, and I'd love to see JMP include something similar as a clear explanation of what's going on, is this: for Inspector 1 and Inspector 2, when the standard is pass, the correct classifications as pass are 43 and 42, and the parts misclassified as fail are five and six. When the standard is fail, the correct choices by Inspector 1 and Inspector 2 are 42 and 45, and the misclassified parts, fails called passes, are 10 and seven. Now, understand, a fail part classified as a pass is a defective part going out; that's called a miss. On the other hand, a pass part classified as a fail is basically producer's risk; that's a false alarm. JMP uses those terms, false alarm and miss, and I'll explain them later on. I like this chart because it seems to give a clear explanation of what's going on. Using Graph Builder, again, we can view the classifications by each inspector, as shown over here, and again you can highlight specific issues there. JMP also allows you to define conformance; in other words, we said non-conforming is a fail and conforming is a pass. That way we can take a look at the rate of false alarms and misses in the data itself, as determined by the inspectors. We can see that the probability of a false alarm for Inspector 1 was 0.1042, and for Inspector 2 it was 0.125. The probability of a miss, which means we let a defective part go out, was higher for Inspector 1 than for Inspector 2. I'll show how these calculations are done. To emphasize this: a false alarm occurs when a part is incorrectly classified as a fail when it is actually a pass; that's a false positive. The false alarm rate is the number of parts that have been incorrectly judged to be fails, divided by the total number of parts that actually are passes according to the standard. That's where this calculation comes from: here are the passes misclassified as fail, so if I take five out of 48, I end up with that number, 0.1042. The next thing is a miss: a part is incorrectly classified as a pass when it actually is a fail. That's a false negative; in this case, we're sending out a defective part. The miss rate is the number of parts that have been incorrectly judged to be passes, divided by the total number of parts that actually are fails according to the standard, which is 42 plus 10, or 52. Again, going back to this table, these are the parts that are fails, but 10 of them were misclassified as a pass, so the number of parts that should have been classified as fail is 52.
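The false alarm and miss rates quoted here follow directly from the counts in the table, and the escape rate discussed next is just the miss rate multiplied by an assumed nonconformance rate. A short numeric check for Inspector 1:

true_pass, pass_called_fail = 48, 5      # parts that are passes per the standard; 5 were called fail
true_fail, fail_called_pass = 52, 10     # parts that are fails per the standard; 10 were called pass

p_false_alarm = pass_called_fail / true_pass       # 0.1042, the producer's risk
p_miss = fail_called_pass / true_fail              # 0.1923, a defective part goes out
print(round(p_false_alarm, 4), round(p_miss, 4))

p_nonconforming = 0.10                             # the assumed 10% defective rate entered in JMP
print(round(p_nonconforming * p_miss, 4))          # escape rate, about 0.019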
Ten divided by 52 gives you that number, 0.1923. So I like that table; it's easier to interpret. The final thing about the conformance report is that you can change your conformance category, switching conform to non-conform. You can also calculate an escape rate, which is the probability that a non-conforming part is produced and not detected. To do that, we have to provide some estimate of the probability of non-conformance to JMP. I put in 10%; let's say 10% of the time we produce a defective part. Given that we've produced a defective part, what's the probability that it's going to be a miss and then escape? That's the escape rate: the multiplication of the two, the probability that the process produces a fail part times the probability of a miss. Now let's go into JMP again, and we're going to use the inspection data with the standard. I'm going to go through this quickly. We do Analyze > Distribution, put in the three columns, and now we can click on the standard down here and then highlight and compare Inspector 1 and Inspector 2. Another way to visualize it is Graph Builder, as we've done before. We can put Inspector 1 over here, Inspector 2 on this side, and then enter the standard over here on this side. Now we have a way of clicking and seeing what the classifications were relative to the standard. That's a very nice little graph: if you want to know how many times Inspector 1 classified a part as a pass when it was actually a fail, now we can see that at a glance, and the rows are highlighted too. Let's go into the platform now: Analyze > Quality and Process > Variability / Attribute Gauge Chart, Recall, and I'm going to add in the standard over here. We click the standard. Now here's the issue: JMP gives us the attribute gauge chart, but this is for the agreement, and what we'd like to measure is performance against the standard. We come up to the attribute gauge chart options, unclick anything that says agreement, and click anything that says effectiveness. There might be a simpler way to do this eventually in the [inaudible 00:23:31] programming in JMP. Now we have the effectiveness chart; again, as I said, 50% means that one of the inspectors got it right, and 0% means they both got it wrong. We have the agreement report showing the 86% that we've seen before, but what we want to get down to is the effectiveness report. Now we see that Inspector 1 was 85% effective, Inspector 2 was 87% effective, and overall it was 86% effective. Here's the summary of the misclassifications, and these are the ones listed over here. As I said, with this terminology you need to understand that incorrect fails are fails that were classified as passes, and incorrect passes are passes that were classified as fails. Then the conformance report is down here; we showed you how to do the calculation, and we can change the conforming category by doing that over here.
Or we can calculate the probability of escape, the escape rate, by putting in a number that estimates how often we'd expect to see a defective part. I put in 0.1 over here and click OK, and then JMP gives us the probability of non-conformance and the escape rate for each inspector, as shown over here. Let's go back now to my PowerPoint presentation. Now that we have a feeling for these concepts of agreement, effectiveness, and the Kappa index, let's see how we can apply the approach to a more complex problem in gauge analysis: inventory tracking. This was part of a consulting project with the robotics company Vimaan Robotics; by the way, there are some wonderful videos if you click over here that show the drones flying in the warehouse, doing the readings, and some of the results from the analysis. As part of the consulting project, I was introduced to the problem of drones flying in a warehouse using optical character recognition (OCR) to read inventory labels on boxes and shelves. In measurement system analysis (MSA), the purpose is to determine if the variability in the measurement system is low enough to accurately detect differences in product-to-product variability. A further objective is to verify that the measurement system is accurate, precise, and stable. In this study, the product to be measured via OCR on drones is the label on a container stored in racks in a warehouse. The measurement system must read the labels accurately. Furthermore, the measurement system will also validate the ability to detect, for example, empty bins, damaged items, counts of items in a given location, dimensions, and so forth, all being done by the drones. In gauge R&R studies, one concern addresses pure error, that is, the repeatability of repeated measurements on the same label. Repeatability is a measure of precision. A second concern in gauge R&R studies is the bias associated with differences in the tools, that is, differences among the drones reading the same labels. This aspect is called reproducibility, and it's a measure of accuracy. The design that I proposed was a crossed study in which the same locations in the warehouse bins are measured multiple times (that's for repeatability) across different bias factors, the drones (that's for reproducibility). The proposal defined several standards for the drones to measure. The comparisons involve within-drone repeatability, drone-to-drone agreement consistency, and drone-to-standard accuracy. The plan was to measure 50 locations, 1-50; three drones would be used to measure reproducibility, the drone-to-drone comparisons, and there would be three passes for each location by each drone to measure repeatability. Now, multiple responses can be measured against each specific standard, so we don't have to have just one item and one standard; we can have different characteristics. The reading can be binary, that is, classified as either correct or incorrect, and the reading can also provide status reporting for a location, like the number of units, any damaged units, and so forth.
Examples of different responses are: how accurately can the drones read a standard label? Are there any missing or inverted labels? Are the inventory items in the correct location? Is the quantity of boxes in a location correct? Are any of the boxes damaged? These are things a human inspector would be checking as part of inventory control, but now we're doing it all with drones. Here's the proposal. I have 50 locations over here, 150 rows actually, because each location is read three times by each drone, with columns for drone A, drone B, and drone C. These are the results of a comparison to the standard. We're classifying five standards, A, B, C, D, and E, randomly arranged across the locations, with one characteristic specified for each of the 50 locations. Since we're doing three readings, it's 150 rows. Three drones gives reproducibility, three passes for each location by each drone gives repeatability, and the standards are specified for each location. I'm going to make an important statement here: the data I'm using for illustration are made-up data, not actual experimental results from the company. We can start off with distributions and dynamic linking to compare the classification of the drones by standard. We generate the distributions and then click on, say, standard A, and we can see how many drones got standard A right, or whether any drones misclassified it. Same thing if we click on standard E: we can see drone A had a higher propensity for misclassifying standard E, and the same with drone C. Now, the chart below shows how well the drones agreed with each other for each location. Here are the 50 locations, and we're comparing the drones. When you're comparing a drone to other drones, you've got a lot of comparisons: you're comparing drone one to itself across its three passes, and you're comparing drone one to drone two for each of the measurements. You could have passes 1, 2, 3 for drone one and passes 1, 2, 3 for drone two, and you're comparing all possible combinations of those readings. That's why the calculations get a little bit complex when you have multiple drones, but it's still just comparison. This chart shows the agreement among all the comparisons. We noticed that for locations between five and 10 the accuracy dropped quite significantly, which prompted further investigation as to why: it could have been the lighting, it could have been the location, it could have been something else interfering with the proper reading. You see most of the drones are reading accurately, 100%. This is agreement between the drones, so they were agreeing roughly 90 to 91% of the time, and these are the confidence intervals for each drone. This told us how well the drones compared to each other. Next we get the agreement comparisons; the tables show the agreement values comparing pairs of drones and drones to the standard.
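To make the "all possible combinations" idea concrete, here is a small sketch of how a per-location agreement percentage can be built up when three drones each make three passes: every reading at a location is compared with every other reading, and agreement is the fraction of matching pairs. The data layout, file name, and column names are hypothetical.

from itertools import combinations
import pandas as pd

df = pd.read_csv("drone_reads.csv")                # hypothetical columns: Location, Pass, Drone A, Drone B, Drone C

def location_agreement(group):
    readings = group[["Drone A", "Drone B", "Drone C"]].to_numpy().ravel()   # the 9 readings at this location
    pairs = list(combinations(readings, 2))                                  # 36 pairwise comparisons
    return sum(a == b for a, b in pairs) / len(pairs)

per_location = df.groupby("Location").apply(location_agreement)
print(per_location.head())                         # 1.0 where all nine readings match, lower where they disagree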
The Kappa index is given against the standard, and repeatability within drones and reproducibility are all excellent based on the Kappa statistics; agreement across the categories is also excellent. We're comparing drone A to drone B, drone A to drone C, and drone B to drone C, all showing excellent agreement, and we're comparing the drones to the standards, all in excellent agreement as well. Then this is the agreement summary, and this is the agreement by the different categories. Again, we can look at the attribute chart for effectiveness: the same way as before, we unclick all the agreement check boxes and click the effectiveness boxes. We see again that locations seven and eight had the lowest agreement to the standard; that could have been something associated with the lighting or some other issue there. Then, for the overall agreement to the standard by drone, you can see they're at about 95%. The drones are pretty accurate, they were pretty reproducible, and the repeatability was excellent. This is the effectiveness report. It's a little more elaborate now because we're comparing against each of the five characteristic standards, and these are the incorrect choices that were made for each one. Out of 150 possible measurements, drone A measured 142 correctly, drone B 145, and drone C 140. Effectiveness is the important one: how accurate were the drones? We can see the drones are all running around an average of about 95%, which appears to be highly effective. Then we have a detailed analysis by level provided in the misclassification report, so we can see individually how each of the different characterizations was measured correctly or incorrectly by each drone; these are the misclassifications. Again, let me go into JMP. Oh, one further example I meant to show over here: using Graph Builder, we can view the classifications and misclassifications by each drone. This is a really neat way of showing it; I wish JMP would include this as part of the output. You can see where the misclassifications occur: for example, for drone A and for drone C, most readings were classified correctly, but there are a few that were not. I like that kind of representation in Graph Builder. Now let's go back into JMP and do the attribute gauge analysis with multiple raters on the actual experiment that was run. We analyze distributions so we can compare the drones to the standard; again, we can just click on a standard and see how it compares across the drones. We can also use Graph Builder, putting drone A, drone B, and drone C over here and then adding the standard, and it shows very clearly what's happening. But we can also go into JMP and use Analyze > Quality and Process > Variability / Attribute Gauge Chart.
So we add the three drones here, we add the standard, we put in the location, and we get our attribute gauge chart report showing that the drones are at about 90% agreement with each other. This location group has the most difficult locations to characterize. Here are the agreement reports I've shown you: drone A, drone B, and drone C agreement with the other drones and with themselves. Drone A to drone B, and so on; these are the Kappa values. This is the agreement with the standard, all very high, and then these are the agreements across categories. Then, for the effectiveness, to get the graph we like to see in the effectiveness report, we clear the agreement boxes and click on effectiveness. We now have the effectiveness plot on the tab that shows how the drones agreed with the standard. Now let's go back to the PowerPoint presentation. To summarize what we've done here: the use of attribute gauge analysis allowed the company to provide solid data on the agreement and effectiveness of drones for inventory management, and the results are very impressive. Subsequent results reported on the company's website show inventory counts that are 35% faster, inventory costs reduced by 40%, and missed shipments and damage claims reduced by 50% compared to the previous methods. In addition, the system generates what we call actionable data for more accurate, effective, safer, more cost-effective, and faster inventory control. Some excellent references are Cohen's original paper, the book by Fleiss, which is excellent and has a lot of detail, and the book by Le, which is also well done. I thank you very much for listening. Have a good day.
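For readers who want to see the agreement statistic itself, here is a minimal sketch, independent of JMP and not part of the original talk, of how Cohen's kappa can be computed for one drone against the standard. The label sequences below are hypothetical stand-ins for one drone's classifications.

```python
from collections import Counter

def cohens_kappa(rater, standard):
    """Cohen's kappa for two label sequences of equal length."""
    assert len(rater) == len(standard)
    n = len(rater)
    observed = sum(r == s for r, s in zip(rater, standard)) / n  # observed agreement
    rc, sc = Counter(rater), Counter(standard)
    labels = set(rc) | set(sc)
    expected = sum((rc[l] / n) * (sc[l] / n) for l in labels)    # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Hypothetical readings: one drone's classifications vs. the known standard labels.
standard = ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"]
drone_a  = ["A", "B", "C", "D", "E", "A", "B", "E", "D", "E"]
print(round(cohens_kappa(drone_a, standard), 3))  # prints 0.875; 1 would be perfect agreement
```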
Micol Federica Tresoldi, Senior Research Statistician, Dow Chemical Xinjie Tong, Senior Research Statistician, Dow Chemical   This case study investigates chemical mixtures to achieve optimal properties using design of experiment (DOE) data. The formulation space consists of four input variables: Chemical A Type, Chemical B Type, Chemical C Type, and Chemical D Content. The first three variables represent different compositions for making Chemical A, B and C, respectively, and as such can be coded both as categorical factors and as continuous mixture variables.   We created the DOE treating them as categorical due to the experimental constraints. However, at the data analysis stage, even after considering thousands of simulated hypothetical formulations, none of them was predicted to meet the desired properties. At that point, to be able to identify promising subregions, we needed to overcome the discreteness of the space. So, we recoded those factors as continuous and mixture variables, derived the equivalent regression model, and reran the simulations. Indeed, under certain assumptions, this coding strategy enables one to interpolate and consider compositions not present in the original DOE.   In this presentation, we demonstrate how to use the JMP Pro 16 Profiler Simulation feature with Graph Builder to achieve an extensive and insightful exploration of the formulation space, applicable to diverse fields.     Hello everyone. My name is Micol Tresoldi. Today my talk will be about coding with continuous and mixture variables to explore more of the input space. Before I jump into the topic, I'd like to give you a brief outline of what my presentation will look like. I'll start by presenting a general idea of the project and the objective that was driving it, and then I'll present the initial approach that we took to pursue this objective. I'll then show you that, following this initial approach, we encounter a problem. At that point, we'll need to go back to the beginning of the problem setting and look at it from a slightly different perspective, so that we can figure out an alternative way of looking at our input variables. In doing this, we'll be altering our data structure. But I'll show you how we can actually build an equivalent statistical model, so that we will not need to go and collect any additional data; we'll be able to re-analyze the exact same data and still, hopefully, overcome our initial problem and find some useful directions to go. This is the overview of the presentation. Let me start by giving you the general idea of the project. When the client first reached out to us, they had something in mind in terms of having some ingredients they needed to mix together so that the final formulation exhibited some optimal properties. More specifically, any formulation was going to be judged on two properties, and each of these properties had to meet certain optimality criteria. As I just stated, the problem itself is pretty general in nature.
We'll have some ingredients; using a common analogy, we can think of ourselves in the kitchen, having some ingredients and having to figure out a way to mix them so that, at the end, our cake will look nice and also taste good. This is the general framework. Now, let me give you some more details about this specific cake. The recipe calls for four ingredients: Factor A, Factor B, Factor C, and Factor D. For Factors A, B, and C, the amount to put in the recipe is predetermined; we don't have freedom there. What we need to decide, though, is how we're going to make those ingredients. There are multiple ways of making them, because we have multiple raw materials that we can employ to arrive at those ingredients, and only after having these ingredients ready can we actually employ them in the final recipe. This is for Factors A, B, and C. For Factor D, on the other hand, there is only one raw material we can use, only one way of making it; what we need to decide is how much of Factor D we're going to put in the final recipe. Just to recap, in terms of the decision-making problem, we'll need to decide four things: how we make Factor A, how we make Factor B, how we make Factor C, and how much of Factor D we're going to put in the recipe. Now I need to be a little more specific about these ways of making Factors A, B, and C. The client, when they came to us, had relatively few options in mind. For Factor A, they wanted to consider two raw materials: either only using raw material A1, or only using raw material A2. For Factor B, once again there are only two raw materials, B1 and B2, and the possible ways of making Factor B were either the two pure blends of B1 and B2, or a 50-50 blend of B1 and B2. For Factor C, three raw materials are available: either the three pure blends, C1, C2, C3, or, as a fourth option, a 50-50 blend of C1 and C2. With respect to the Factor D quantity, which I'm going to denote from now on by X1, they wanted to test four possible levels, four possible amounts: 5, 10, 15, and 20. The response variables are slightly more straightforward, in the sense that there are only two of them, both continuous. Each of them, as I mentioned in the beginning, had to meet a certain optimality criterion: Y1 had to be above 17, and Y2 had to be above 2.6. So now we have, on our left, the input variables that we need to decide how to maneuver and vary in making the recipe, and on the right side, the properties that we're interested in.
What we decided to do was to propose to our client a designed experiment: we would go out and make some of these recipes, some of these formulations, and from the collected data, after recording the properties for the [inaudible 00:06:32] actual observed formulations, understand and infer the relationships linking the input variables, how we were making our recipe, and the response variables, how the properties behaved for different combinations of the inputs. Ultimately, the objective of the project was to figure out whether there was an optimal recipe, meaning a recipe whose properties both met their respective optimality criteria. Given this setting, it's pretty clear that X1 is going to be a quantitative variable. But how about Factors A, B, and C? Given that we can mix these raw materials, are we going to treat them as categorical or as numeric? At this stage, because the client was particularly interested in observing the performance of these specific compositions of the raw materials for making Factors A, B, and C, we decided to accommodate their request and coded them as categorical variables, so that we were sure those specific compositions would show up in the design of experiment. Categorical coding here means that each level of the categorical variable corresponds to a possible way of making the ingredient or factor. We end up with three categorical variables with two, three, and four levels, respectively. It turns out that this categorical coding was also pretty helpful in the discussion of how we wanted to specify the statistical model that, in principle, was assumed to be comprehensive enough to describe and capture the relationships between the factors and the responses. With the categorical coding, it was particularly easy for the client to identify and specify which interaction terms they expected to be relevant in explaining the relationships. The final statistical model that we specified for the design of experiment comprised the main effects, all two-way interactions, quadratic and cubic terms for the continuous variable, and, in addition, the interaction of the quadratic term with one of the factors. Of course, we also had a constraint on the number of experiments available, because we don't have an infinite amount of resources, so we put a constraint of 51 runs. This is the DOE that JMP gave us, able to estimate the statistical model we just specified while staying within the constraints on our resources. With this, the only thing left to do was go and make these 51 formulations. Imagine that we're super quick, and everything is magic, and we have already gone and made all of our formulations and collected the data. Now we are in good shape for estimating the Gaussian model that we specified.
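As a reference point, the model just described can be written schematically as follows. This is my own notation rather than the speaker's (the talk gives no explicit equation): $A$, $B$, $C$ denote the categorical factors, $x_1$ the Factor D content, and the last term stands for the one quadratic-by-factor interaction that was included (the talk does not say which factor it involves).

$$
y \;=\; \beta_0 + f_A(A) + f_B(B) + f_C(C) + \beta_1 x_1 \;+\; \sum_{\text{pairs}} (\text{two-way interactions}) \;+\; \beta_2 x_1^2 + \beta_3 x_1^3 + g(\cdot)\,x_1^2 \;+\; \varepsilon
$$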
These are the results for the first property, Y1. We can see that there is a pretty good fit between predicted and actual values, and the metrics reported in the model summary look pretty satisfactory. The same is true if we look at the second property, Y2: again, a pretty good fit. We are happy with our models, and we think we did a good job in capturing the relationships. Now remember that what we really want to discover is whether there is any optimal recipe that can meet both criteria for our properties. How are we going to establish whether such an optimal recipe exists or not? Well, in JMP Pro 16 this is a super easy task, because we can simulate thousands of potential alternative recipes by using the Profiler feature. For each of these hypothetical recipes, we automatically have in the same table the predicted mean value for the two properties, so it becomes very natural and easy to see if there is any optimal recipe. Just to give you an idea of how quick that is, I want to show you live how we can do this. This is my DOE categorical table, where I have my Factors A, B, and C; X1 is my only quantitative input variable; and I have my recorded values for the two properties, Y1 and Y2. Imagine that we have already run the model, estimated it, and saved the prediction formulas for the two variables here. We can highlight these two columns, go to Graph, select Profiler, put those two prediction formulas in the Y, Prediction Formula box, and click OK. This is the usual way we get a Profiler dialog, and we can easily play around, changing the levels of the inputs and seeing how this impacts our predictions for the two properties. However, what I want to show you today is how we can go to the red triangle and ask JMP to output a random table, and we can make it as big as we like. I'm going to start with 30,000 rows. You'll see it took no time for JMP to give us this table of 30,000 rows, where each row corresponds to a hypothetical recipe that we haven't necessarily seen in the DOE. This is the power of having this feature in JMP: we can explore the input space in literally no time. Now, if we are interested in seeing whether there is one recipe that is optimal, we can go to Graph Builder and plot the predicted values for Y1 against the predicted values for Y2. Then, just to aid our visualization, I'm going to put a vertical line at the optimal threshold for Y2, and likewise a horizontal line marking the optimal threshold for Y1. The upper quadrant denotes the optimal region, because there both properties satisfy the optimality criteria. Unfortunately, we can see that we don't find any recipe that is able to satisfy both criteria. This is not very good news. Now let me go back to my presentation very quickly.
We can see, in fact, that we don't have any recipes lying in the quadrant with the happy green smiley. What do we do at this point? Do we give up? Of course not. What we can do is go back to the beginning of the problem and see if we can change any of the initial choices we made in approaching it. In particular, you might remember that we were undecided whether to treat Factors A, B, and C as categorical or as numeric. So far we have treated them as categorical: Factor A has been a categorical variable with two levels, either only using A1 or only using A2. However, the client was in fact open to mixing the raw materials to make Factor A; that was an option. So what we can think of is substituting Factor A with a variable that I now call A1 Content, a quantitative variable which represents how much A1 I'm going to put into the mixture of A1 and A2 for making Factor A. The translation between categorical levels and numerical values is almost immediate. If I'm only using A1, I'm using 100% A1 in my mixture, so I can code A1 Content as equal to one. On the opposite side, if I'm only using A2, this means I have zero A1 Content in my mixture, and therefore A1 Content is equal to zero. You might have guessed that, implicitly, we are also defining A2 Content to be equal to 1 - A1 Content, but we don't really need that variable because we are only looking at two mixture components. Why are we doing this? Well, the advantage is clear. With Factor A, we were constrained to A1 Content equal to zero or one. Now that we're considering a continuous coding, A1 Content can take any value between zero and one. This represents an enormous jump in the flexibility of our model, in the sense that we are now open to literally infinitely more mixtures and infinitely more ways of making Factor A. Likewise, Factor B is categorical with three levels: only B1, only B2, or a 50-50 blend. Following similar logic, we can now introduce a B1 Content continuous variable. Again, the conversion is exactly the same; a 50-50 blend of B1 and B2 is converted to 0.5, because I'm using 50% B1 and 50% B2, and again B2 Content is 1 - B1 Content. The advantage, again, is that we're not bound to jump from zero to 0.5 or from zero to one; we can explore the whole spectrum of values from zero to one. Factor C is slightly more tricky, because we have three possible raw materials to mix. At this stage, we need to introduce not just one, but three continuous variables that, besides being continuous, also carry the mixture constraint, meaning that at all times they need to sum to one. But the conversion between the levels of Factor C and the three new mixture variables follows exactly the same logic; that's super easy. This is just a visualization of how we convert the levels.
This is how the DOE points that we already have data on, so we don't need anything else, sit within the continuous coding space. Now, the only more involved step in passing from the categorical coding to the continuous coding is how we convert the statistical model that we used to design the experiment and then to analyze the data. How are we going to do this? The easiest way is to do it in many small steps: we start with our main effects model and add the different factors little by little. We start with Factor A, which had only two levels. In the continuous coding, what we're going to put in is A1 Content, and only its linear term. In fact, we only had one coefficient for Factor A in the categorical coding model, and likewise we now have one single coefficient for A1 Content. If you don't believe me that this is an equivalent model, I'm going to show you a couple of examples. Imagine we want to figure out the impact of using only A2 for making Factor A; that means A1 Content is zero. From the categorical coding model, we just look at the intercept term, because the extra term refers to when we use A1. On the other side, for the continuous coding model, we take the intercept and then the A1 Content coefficient, but multiplied by zero because A1 Content is zero. Without even doing any math, you can see that these two numbers are exactly the same. Similarly, if we want to see the impact of using only A1, then A1 Content is equal to one. For the categorical coding, I sum the intercept term plus the Factor A coefficient accounting for the difference between the levels of the factor. On the other side, we again include the intercept, and this time we multiply the A1 Content coefficient by one because the content is one. Again, without any math, the two numbers here are the same as the two numbers there: exactly equivalent. Now, Factor B had three levels. How are we going to handle that? Because it has three levels, we can't just add the linear term; we also need to add the quadratic term. We had two coefficients before, and we're going to have two coefficients now with the continuous coding. Again, if you don't believe this is an equivalent model, we can work out at least one example, which works exactly as before. If I only have B2, B1 Content is zero, which means the two coefficients get zero weight in computing the impact, and therefore the two numbers are, in fact, the same. I'm not going to go through it again, but only B1 is equivalent to B1 Content equal to one. The most interesting case is the one that requires you to do some summation, where B1 Content is 0.5, because we are considering the 50-50 blend.
You can verify easily that these two numbers summed up are equivalent to the other side of the equation, where we put in 0.5 and 0.5 squared, because now our B1 Content is equal to 0.5. Now, for Factor C we had four levels, and as you remember, we had three possible raw materials, so we had to introduce three mixture variables. Every time we deal with mixture variables, things get slightly more complicated, because they become perfectly collinear with any constant term: putting in C1, C2, and C3, whose sum is constant, requires us to delete the intercept. But other than that, everything follows pretty much the same way. We had three coefficients before, and we still have three coefficients now, because we have four terms but we are getting rid of the intercept; still the same balance. Again, I'm not going to go through all of the examples, but you're more than welcome to look at the slides offline and check that they do, in fact, always give the same answers. These are all the examples. With this much work, we have found the conversion of the main effects: how we convert each separate factor into the new continuous variables. Our original model, though, included more than just main effects; we had the two-way interactions. The idea here is that every time Factor A appears, I substitute it with A1 Content; every time Factor B appears, I substitute it with the two terms B1 Content and B1 Content squared; and likewise for Factor C, I substitute it with the terms I've put here. The same holds when interacting with X1; everything follows the same scheme. The only caution you want to be aware of, and be particularly attentive about, is that every time you interact a term with the three mixture variables, the main effects that you originally had now need to be excluded from the model; otherwise, the model won't be estimable. That's the only caution you need to be careful about. Other than that, we're ready to go: we've got our equivalent continuous model. Now we can verify that everything is still the same: I get exactly the same predictions whether I use the categorical coding or the continuous coding. You might ask yourself: why go to so much trouble if things are exactly the same? Well, the advantage is immediate to see, and you can really appreciate it if you start looking at the profilers. This is how the profiler looks when you use the categorical coding: you have to jump between the different levels, and you don't have the faintest idea what can happen in between. With the continuous coding, on the other hand, that's exactly what you can do: you can explore many more of the possible ways of making the various ingredients, Factors A, B, and C, in a way that before was just out of bounds. In technical terms, it means that we have much more power of interpolation.
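To make the conversion concrete, here is one way to write the Factor A and Factor B pieces in both codings. The notation is mine, not the speaker's: $x_A$ is A1 Content, $x_B$ is B1 Content, and the categorical models use the only-A2 / only-B2 level as the baseline, which matches the intercept-only comparisons described above.

$$
\begin{aligned}
\text{Factor A:}\quad & \beta_0 + \beta_A\,\mathbf{1}[A=\text{A1}] \;\equiv\; \beta_0 + \gamma_1 x_A, && x_A \in \{0,1\},\ \ \gamma_1 = \beta_A,\\
\text{Factor B:}\quad & \beta_0 + \beta_{B1}\,\mathbf{1}[B=\text{B1}] + \beta_{50}\,\mathbf{1}[B=50\text{-}50] \;\equiv\; \beta_0 + \delta_1 x_B + \delta_2 x_B^2, && x_B \in \{0,\ 0.5,\ 1\},
\end{aligned}
$$

with $\delta_1,\delta_2$ chosen so the quadratic passes through the three level means ($\delta_1 + \delta_2 = \beta_{B1}$ and $0.5\,\delta_1 + 0.25\,\delta_2 = \beta_{50}$). For Factor C, the intercept is dropped and Scheffé-type mixture terms in $(C1, C2, C3)$ take its place.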
This extra interpolation power doesn't come free, of course. The price you pay is that you are implicitly making some assumptions. The assumptions regard the way the new continuous variables we have introduced are related to the responses: we are implicitly assuming that the relationship between A1 Content and our properties is linear, that the relationship for B1 Content is quadratic, and so forth. If you think those assumptions don't hold in your case, then of course the whole procedure is questionable and you don't want to pursue it. But if you don't have any reason to disbelieve them, or at least no reason not to explore the possibility, then we can go back and do the same exercise, exploring the input space again but with much more flexibility. Again, let's see if we can find an optimal recipe with this new continuous mixture coding. How are we going to do it? Exactly the same way: I'm going to use the JMP Profiler feature, use the simulation, and see if we can find anything. Now let me go here. This is my DOE continuous table; continuous, because now you can see that these are all coded as continuous variables, with the blue triangle next to them, and C1, C2, and C3 also have the stars indicating that they're coded as mixture variables in JMP. Imagine again that we have already fitted our model with the Fit Model platform and saved our prediction formulas, now with the continuous coding. We do the same thing: Graph, Profiler, select those formulas, and here is our prediction profiler. Now we can play much more with the profiler and see all the different combinations without having to jump between discrete options. Once again: red triangle, output random table. Just to make things fair, I'm going to ask for 30,000 rows. Again, in literally the blink of an eye, JMP gives you a 30,000-row table where every row is again a potential hypothetical recipe that we haven't necessarily seen in our DOE, but that is still feasible, because it still respects the constraints we had at the beginning. Once again, to figure out whether something good is happening, or at least whether within these 30,000 formulations we find something optimal, I'm going to construct the same graph. Now you can see that our points are all dispersed and not aligned anymore. Again, I add the axes just to aid our visualization. And this is the nice thing: with this way of coding and looking at more of the input space, we do find a few formulations that seem to be promising. Of course, we need to keep in mind that these are predicted values; everything still relies on our data and on our statistical model, but it is still more promising than before. We do find something in the optimal region defined by these two axes.
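The random-table-plus-filter idea is easy to reproduce outside JMP. Here is a minimal sketch, assuming a hypothetical pair of prediction functions predict_y1 and predict_y2 that stand in for the saved prediction formulas (the coefficients below are invented for illustration) and using the thresholds Y1 > 17 and Y2 > 2.6 from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30_000

# Sample hypothetical formulations in the continuous/mixture coding:
a1 = rng.uniform(0, 1, n)                 # A1 Content in [0, 1]
b1 = rng.uniform(0, 1, n)                 # B1 Content in [0, 1]
c = rng.dirichlet([1, 1, 1], n)           # (C1, C2, C3) sum to 1 (mixture constraint)
x1 = rng.uniform(5, 20, n)                # Factor D content in [5, 20]

# Invented stand-ins for the saved prediction formulas (not the real fitted models):
def predict_y1(a1, b1, c, x1):
    return 12 + 3*a1 + 4*b1 - 2*b1**2 + 2*c[:, 0] + 0.3*x1 - 0.01*x1**2

def predict_y2(a1, b1, c, x1):
    return 1.5 + 0.8*a1 + 0.5*c[:, 1] + 0.05*x1

y1, y2 = predict_y1(a1, b1, c, x1), predict_y2(a1, b1, c, x1)
optimal = (y1 > 17) & (y2 > 2.6)          # the "upper quadrant" region
print(f"{optimal.sum()} of {n} simulated recipes fall in the optimal region")
```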
Quickly going back to my presentation, I want to draw a final conclusion, which is that with the categorical coding, we couldn't find any recipe that, at least on the predicted values, could meet both optimality criteria. But once we figured out how to recode these categorical variables as continuous and mixture variables, and exploited JMP's power of giving us thousands and thousands of simulated formulations, we do find a few that meet the specs. We were happy that we could at least go back to our client and say: instead of giving up on your project, try making these formulations and see whether the actual properties meet your criteria or not; at the very least, it gives us some directions of improvement. With this, I'd like to end my presentation. I thank my colleague, Xinjie Tong, and all of my collaborators at Dow Chemical. Thank you, all of you, for watching my presentation. I'll be more than happy to answer any questions you might have at this point. Thank you.
Monday, September 12, 2022
It is common to need to compare two populations with only a sample of each population. Statistical inference is often used to help the comparison. Our presentation is limited to statistical inference that involves two hypotheses: the null hypothesis and the alternative hypothesis. Sometimes the goal of the comparison is to provide sufficient evidence to decide that there is a significant difference between two populations. At other times, the goal is to provide sufficient evidence that there is significant equivalence, non-inferiority, or superiority between two populations. Both situations can be assisted with a hypothesis test, but they require different tests. We review these situations, the appropriate hypotheses, and the appropriate tests using common examples.   Another common comparison is between two measurements of the same quantity. This situation is broadly covered by Measurement System Analysis. Our presentation focuses instead on the Method Comparison protocol for chemical and biological assays used by pharmaceutical and biotechnology development and manufacturing. We present two methods that are available in JMP 17 to assess the accuracy of a new test method against an established reference method. One method is known as Deming regression or Fit Orthogonal in JMP. The second method is known as Passing-Bablok regression. We review the background of assessing accuracy and the unique nature of data from method comparisons, and demonstrate both regression methods with examples.     Hello. My name is Mark Bailey. I'm a Senior Analytics Software Tester at JMP. My co-presenter today is Jianfeng Ding, a Senior Research Statistician Developer. I'm going to start the presentation about some new approaches to comparisons that will be available in JMP 17, beginning with an introduction to our topic. Before we talk about specific comparisons, we'd like to introduce some fundamental concepts. All of this has to do with comparing populations. Comparing populations is a very common task, and the comparison, we hope, will lead to a decision between two hypotheses. Samples from these populations are often collected for the comparison, and statistical inference can provide valuable information about our samples; in particular, is there sufficient evidence to reject one hypothesis about these populations? A clear statement of the hypotheses is essential to making the correct choice of a test for your comparison. These hypotheses represent two mutually exclusive ideas that together include the only possibilities. They're called the alternative and null hypotheses. The alternative hypothesis is a statement about the conclusion that we want to claim. It serves to represent the populations, and it will require sufficient evidence to overthrow the other hypothesis, the null hypothesis. The null hypothesis states the opposing conclusion that must be overcome with strong evidence. It serves as a reference for comparison, and it's assumed to be true.
It is important that we sort this out today because, historically, statistical training has presented only one way of using these hypotheses. The most often taught statistical tests are used to demonstrate a difference between the populations. But that's not the only possibility, and a lack of understanding of this distinction can lead to misusing these tests. The choice of a test is not a matter of the data that's collected or how the data is collected; it's strictly a matter of the stated hypotheses for the purpose of your comparison. Let's look at two similar examples that are actually fundamentally different, starting with the case where the purpose is demonstrating a difference. In this example, I would like to demonstrate that a change in temperature will cause a new outcome, an improvement perhaps. We want to claim that a new level of our response will result from changing the process temperature. We'll use a designed experiment to randomly sample from a population for the low temperature condition and the high temperature condition. The two hypotheses are: the null states that the temperature does not affect the outcome; this will be our reference. The alternative states our claim, which is that the temperature affects the outcome, but we accept it only if the evidence is strong enough to reject the null hypothesis. The second example is going to sound very similar, but it's exactly the opposite. In example two, we need to demonstrate equivalence. Here we want to demonstrate that a temperature change does not cause a new outcome; that is, after the change, we have the same outcome. For example, this might be the case where we are planning to change the process temperature to improve the yield, but we want to make sure that it doesn't change the level of an impurity in our product. We design the same experiment to collect the same data, and we have the same two hypotheses, but now they're reversed. It's the null that states that the temperature affects the outcome, that is, there's a difference, while the alternative states that our change in temperature will not affect the outcome. Are we testing for a difference or for equivalence? We see from these examples that it's not the data; the data are identical, but the tests are different. The choice is not about the data; it's about our claim, or in other words, how we state our hypotheses. Also remember that hypothesis tests are unidirectional: they serve only to reject a null hypothesis, with high probability when it's false. In our presentation today, we'd like to introduce some new equivalence tests as well as some additional methods that are used when comparing two measurement systems. I'm now going to hand it over to Jianfeng to talk about equivalence tests. Thanks, Mark. Hello. I'm Jianfeng Ding. I'm a Research Statistician Developer at JMP. In this video I'm going to talk about the equivalence, non-inferiority, and superiority tests in JMP 17. The classical hypothesis test on the left is the test that most quality professionals are familiar with.
It is often used to compare two or more groups of data to determine whether they are statistically different. The parameter theta can be a mean response for a continuous outcome, or a proportion when the outcome variable is binary. Theta t represents the response from the treatment group and theta zero represents the response from a control group. There are three types of classical hypothesis tests: the first is the two-sided test, and the rest are one-sided tests. If you look at the two-sided test on the left, the null hypothesis is that the treatment means are the same, and the alternative hypothesis is that the treatment means are different. Sometimes we really need to establish that things are substantially the same, and the machinery to do that is called an equivalence test. An equivalence test shows that the difference between theta t and theta zero is within a prespecified margin delta, allowing us to conclude equivalence with a specified confidence level. If you look at the equivalence test, the null hypothesis is that the treatment means are different, and the alternative hypothesis is that the treatment means are within a fixed delta of one another. This is different from the two-sided hypothesis test on the left. Another alternative testing scenario is the non-inferiority test, which aims to demonstrate that results are not substantially worse. There is also a scenario called superiority testing, which is similar to non-inferiority testing except that the goal is to demonstrate that results are substantially better. There are five different types of equivalence-type tests, depending on the situation; when to use each test is discussed next. These tests are very important in industry, especially in the biotech and pharma industries. Here are some examples. If the goal is to show that the new treatment does not differ significantly from the standard one by more than some small margin, then an equivalence test should be used. For example, consider a generic drug that is less expensive and causes fewer side effects than a popular name-brand drug: you would like to prove it has the same efficacy as the name-brand one. The typical goal in non-inferiority testing is to conclude that a new treatment or process is not significantly worse than the standard one. For example, a new manufacturing process is faster; you would want to make sure it creates no more product defects than the standard process. A superiority test tries to prove that the new treatment is substantially better than the standard one. For example, a new fertilizer has been developed with several improvements, and the researcher wants to show that the new fertilizer is better than the current fertilizer. How do we set up the hypotheses? The graph on the left summarizes these five different types of equivalence-type tests very nicely. This graph was created by the SAS STAT colleagues John Castelloe and Donna Watts; you can find their white paper easily on the web. Choosing which test to use depends on the situation. For each situation, the blue region is the region that you are trying to establish with the test.
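As a rough illustration of the machinery behind an equivalence comparison of two means, here is a minimal two one-sided tests (TOST) sketch in Python, assuming normally distributed data and a pooled-variance t statistic. The arrays and the margin delta are made up for illustration and are not the drug data used in the demonstration that follows.

```python
import numpy as np
from scipy import stats

def tost_two_sample(x, y, delta, alpha=0.05):
    """Two one-sided tests for equivalence of two means within +/- delta."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    # H0: diff <= -delta  vs  Ha: diff > -delta
    p_lower = 1 - stats.t.cdf((diff + delta) / se, df)
    # H0: diff >= +delta  vs  Ha: diff < +delta
    p_upper = stats.t.cdf((diff - delta) / se, df)
    p_tost = max(p_lower, p_upper)          # equivalence requires rejecting both one-sided nulls
    ci = diff + se * stats.t.ppf([alpha, 1 - alpha], df)  # 90% CI when alpha = 0.05
    return diff, ci, p_tost

# Made-up measurements for two treatments and an equivalence margin of 3:
x = np.array([10.1, 11.3, 9.8, 10.6, 10.9, 11.0])
y = np.array([10.4, 10.0, 11.1, 10.7, 9.9, 10.8])
diff, ci, p = tost_two_sample(x, y, delta=3)
print(f"difference {diff:.2f}, CI {ci.round(2)}, TOST p-value {p:.4f}")
```

Concluding equivalence when this confidence interval lies entirely inside (-delta, +delta) is the same decision rule described next for the blue equivalence region.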
For an equivalence analysis, you can construct an equivalence region with upper bound theta zero plus delta and lower bound theta zero minus delta, and conduct the equivalence test by checking whether the confidence interval of theta lies entirely in the blue equivalence region. Likewise, you can conduct a non-inferiority test by checking whether the confidence interval of theta lies entirely above the lower bound if larger theta is better, or below the upper bound if smaller theta is better. These tests are available in Oneway for comparing normal means and in Contingency for comparing response rates. The graphical user interface of the equivalence test launch dialog makes it easy for you to find the type of test that corresponds to what you are trying to establish, and a [inaudible 00:12:00] in the report summarizes the comparison very nicely and makes it easy to interpret the results. Next, I'm going to demonstrate equivalence tests. As my first example, I'm going to use the data set called Drug Measurements from the JMP sample data. Twelve different subjects were given three different drugs, A, B, and C, and 32 continuous measurements were collected. We go to Fit Y by X and load the response and the treatment. This brings up the Oneway analysis. Under the red triangle, find Equivalence Test. There are two options, Means and Standard Deviations; we are going to focus on means in this talk. We bring up the dialog, and you can select the test that you would like to conduct; the graph will represent the selected test. For the superiority or non-inferiority test there are two scenarios, larger difference is better or smaller difference is better; choose the option depending on the situation. You also need to specify the margin delta here, and the significance level alpha as well. You can choose to use the pooled variance or unequal variances to run the test, and you can do all pairwise comparisons or a comparison with a control group. We're going to run an equivalence test first and specify 3 as the margin for the difference. We click the OK button, and here is the result of the equivalence test. From this forest plot you can see that the confidence interval for the mean difference between drug A and drug C is completely contained in the blue equivalence region. The maximum p-value is essentially zero, less than .05, so we can conclude, at the .05 significance level, that drug A and drug C are equivalent. But if we look at drug A and B, and drug B and C, we can see that the confidence intervals of the mean differences both extend beyond the blue region, so at the .05 significance level we cannot conclude that drug A and B, or drug B and C, are equivalent. Now assume drug C is our standard drug and we would like to find out if the measurements of drug A or B are much better than drug C. We can run a superiority test to prove that. Let me close this outline node first and bring up the launch dialog again. This time we're going to do a superiority test. For this test we believe a larger difference is better, so we keep this selection.
Also, for this study, we want to set drug C as our control group. We plug in the delta, a margin of .04 for this case, and click the OK button. Here is the result of the superiority test. From the forest plot you can easily see that the confidence interval of the mean difference between drug B and C is completely contained in the superiority region, and the p-value is less than .05, so we conclude that drug B is superior to drug C. The confidence interval of the mean difference between drug A and C extends beyond the blue region, and the p-value here is much bigger than .05, so at the .05 significance level we cannot conclude that drug A is superior to drug C. This concludes my first example. Now I'm going to use a second example, based on the relative risk between two proportions, to show you how to conduct a non-inferiority test. Bring up the data table. The trial compares a drug called FIDAX as an alternative to the drug VANCO for the treatment of colon infections. Both drugs have similar efficacy and safety. 221 out of 224 patients treated with FIDAX achieved clinical cure by the end of the study, compared to 223 out of 257 patients treated with VANCO. We launch Fit Y by X again and put in our response and treatment variables, with the count as Freq. Since the response variable is categorical, a contingency analysis is produced, and all the tests here are based on the classical hypothesis test. The p-value suggests that we cannot conclude that clinical cure is different across the drugs. But for this study we really want to find out if drug FIDAX is not inferior to drug VANCO. We go to the red triangle menu and find Equivalence Test; there are Risk Difference and Relative Risk options, and we choose Relative Risk for this case. In the launch dialog we choose the non-inferiority test, and a larger ratio is preferred for this study. We also need to define the category of interest; here we select Yes as the category of interest, and we plug in our ratio margin, specifying 0.9. We click the OK button, and here is the result of the non-inferiority test. From the forest plot you can easily see that the confidence interval for the relative risk between drug FIDAX and drug VANCO is completely contained in the non-inferiority region. We conclude, at the .05 significance level, that drug FIDAX is not inferior to drug VANCO. This concludes my talk, and I will give it back to Mark. Thank you, Jianfeng. I'm now going to talk about a very common procedure called method comparison. It's a standard practice whenever new measurement methods are being developed. We assume that a standard method already exists to measure the level of some quantity, perhaps the temperature or the potency of a drug. A new method has been developed for some reason, and we want to make sure that its performance is comparable to the standard method. Today there are many standards that have been developed over many years by various organizations to make sure that this is done properly. What we would hope is that the new test method ideally returns the same value as the standard method.
A scatter plot of the test method versus the standard method would show that the data agree with the identity line Y = X. Of course, the data points won't perfectly agree, because of measurement error in both the standard method and the new test method. Regression analysis can determine the best fit line for this data, and the estimated model parameters can be compared to that identity line. This ends up being stated in the two hypotheses as follows. The null hypothesis says that the methods are not comparable; another way of saying that is that the intercept is not zero and the slope is not one. The alternative represents our claim that the new method is comparable, so we would expect the intercept to be zero and the slope to be one. We'll compare by using regression. Ordinary least squares regression assumes a few things: that Y and X are linearly related; that there are statistical errors in Y but not in X; that these statistical errors are independent of Y, that is, they're constant for all Y; and that no data exert excessive influence on the estimates. But in the case of a method comparison, the data often violate one or more of these assumptions. There are measurement errors in the standard method as well. Also, the errors are not always constant; we might instead observe that the coefficient of variation is constant, that is, the errors are proportional, but the standard deviation is not constant. Finally, there are often outliers present that can strongly influence the estimation of these parameters. Other regression methods can help. Deming regression simultaneously minimizes the least squared error in both Y and X, and Passing-Bablok regression is a non-parametric method: it's based on the median of all possible pairwise slopes, and because of that it's resistant to outliers and non-constant errors. Deming regression is available in JMP through the Bivariate platform using the Fit Orthogonal command. It can estimate the regression several ways: it can estimate the error in both Y and X, it can assume that the errors in Y and X are equal, or it can use a given ratio of the error of Y to X. Passing-Bablok is now available in JMP 17, again through the Bivariate platform, using the Fit Passing-Bablok command. It also includes checks for the assumptions that the measurements are highly positively correlated and exhibit a linear relationship. There's also a comparison by differences. The Bland-Altman analysis compares the pairwise differences to the pairwise means to assess the bias between the two measurements. The results are presented in a scatter plot of Y versus X for your examination, and also to see if there are any anomalies in the differences. This is all provided through the Matched Pairs platform, along with several hypothesis tests. I'll now demonstrate these features. I'm going to show you Deming regression for completeness; it has actually been available in JMP for many years.
I'm going to use a data table that has measurements for 20 samples by the standard method and then four different test methods; I'm just going to use method 1. I start by selecting the Analyze menu, Fit Y by X. The standard goes in the X role, while method 1 goes in the Y role. Here we have the scatter plot to begin with. I'll click the red triangle and select Fit Orthogonal, and you can see the different choices I mentioned a moment ago. I'm going to have JMP estimate the errors in Y and X. There's the best fit line using Deming regression, along with information about the fit. We can see that the intercept for the estimated line is close to zero, our slope is close to one, and in fact our confidence interval includes one. Now I'm going to show you Passing-Bablok. I return to the same red triangle, select Fit Passing-Bablok, and a new fitted line is added to my scatter plot. It looks very much like the result from the Deming regression, but remember that Passing-Bablok is resistant to outliers and non-constant variance. First we have Kendall's test, which tells us about the correlation: the positive correlation is statistically significant. We then have a check for linearity, and we have a high p-value here, indicating we cannot reject linearity. Finally, we have the regression results. I see that I have an intercept close to one, but the interval includes zero, so I can't reject zero. The slope is close to one, and my interval includes one, so I can't reject that the slope is one. Finally, from the Passing-Bablok fit's menu, I'll click the red triangle and select Bland-Altman analysis. This launches the Matched Pairs platform in a separate window. Here we are looking at the pairwise differences between method 1 and the standard versus the mean of those two values; we're using this to assess bias. The Bland-Altman analysis is reported at the bottom. The bias is the average difference, and we hope that it's zero. The estimate is not exactly zero, but we can see that the confidence interval includes zero, so we would not reject zero. We also have the limits of agreement, and we see that they include zero as well. The standard methods that are used when comparing two measurement methods are now available in JMP 17. That concludes our presentation. Thank you for watching.
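For readers who want to see what these fits compute, here is a minimal sketch, outside JMP, of Deming regression with a known error-variance ratio and of the core Passing-Bablok idea (the median of all pairwise slopes; this simplified version omits the offset correction for ties and negative slopes used in the full published procedure). The paired x and y values are made up for illustration.

```python
import numpy as np
from itertools import combinations

def deming(x, y, delta=1.0):
    """Deming regression; delta is the ratio of the y-error variance to the x-error variance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    slope = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return np.mean(y) - slope * np.mean(x), slope

def passing_bablok_simplified(x, y):
    """Median of all pairwise slopes, then median intercept (no tie/offset correction)."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(np.asarray(y) - slope * np.asarray(x))
    return intercept, slope

# Made-up paired measurements: standard method (x) vs. new test method (y).
x = [1.0, 2.1, 3.0, 4.2, 5.1, 6.0, 7.2, 8.1]
y = [1.1, 2.0, 3.2, 4.1, 5.3, 6.1, 7.0, 8.3]
print("Deming:         intercept %.3f, slope %.3f" % deming(x, y))
print("Passing-Bablok: intercept %.3f, slope %.3f" % passing_bablok_simplified(x, y))
```

With delta = 1, the Deming fit reduces to orthogonal regression, which is the "errors in Y and X are equal" option described in the demonstration.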
This presentation demonstrates how to design a HIIT (High-Intensity Interval Training) profile to help a type 2 diabetes patient avoid insulin glargine injections. In addition to meal control and taking metformin and/or insulin, diabetes patients should exercise at a higher heart rate to burn sugar faster.   A full factorial DOE of treadmill settings (incline and speed) was conducted to build a heart rate RSM model to design the optimal HIIT profile. Based on the RSM model, interaction effects were all very small, which may indicate the treadmill heart rate model is not coupled (complicated). Heart rate is linearly proportional to incline level (potential energy when the incline angle is small) and quadratic in speed (kinetic energy). To avoid injury to the knee/foot and ACL (anterior cruciate ligament), jumping patterns were studied using 3D motion biomechanics modeling. The fatigued muscles could not hold the knee stable or provide sufficient knee cushion during the shorter soft landing, which could increase the risk of an ACL injury during the second hard landing period.   By using Model Driven SPC, the injury mechanism was studied to determine the treadmill's highest speed limit for this diabetes patient. Through these ACL risk studies, the HIIT profile has been further optimized to consider these ACL design constraints. By following the HIIT profile that was designed with JMP, this diabetic patient has seen a significant reduction of blood glucose levels and serum readings (falling from over 200 mg/dL to near 75 mg/dL) in four months.     Hi everyone. I'm Mason. Today I'll be presenting a project on designing a treadmill exercise plan for diabetes patients. To give a bit of background as to why I did this project: in the spring of 2021, one of my family members was told that he had type 2 diabetes. A follow-up report in the summer of 2021 showed that his glucose level was higher than 200 milligrams per deciliter, which is much higher than the normal glucose range of 65 to 99 milligrams per deciliter. At the same time, I wanted to conduct a project to analyze exercise data, especially because diabetes is so common across the human race. So we did this project on designing a treadmill program for my family member, and after following the plan for a few months, his glucose level went back to the normal range in the fall of 2021. To define our project, we want to listen to the voice of the customer, which is our doctor, who provides advice on what my family member should do. The doctor suggests controlling his meals, taking metformin and insulin, and also exercising more intensely to burn sugar. In this project we focus on the last piece of advice, because the other three are quite easy to follow, but we don't quite know yet how to exercise most efficiently. We need to translate this advice into what we will do, which is critical to quality. Our goal is to design a treadmill program, specifically focusing on the legs, to strengthen the lower body muscles, prevent injury, and also help cure the diabetes. In more quantitative terms, we want to lower his glucose level to below 100 mg per deciliter and also reduce his resting heart rate.
Healthier individuals usually have a lower resting heart rate, since it takes more rigorous exercise for them to need the same amount of oxygen and reach a higher heart rate. Just to introduce the team: the project leader is me. We have a 52.5-year-old diabetes patient as the experimental subject, who will monitor his daily blood glucose level. Our family doctor is also on the team and will follow up with the diabetes patient every three months. We also have two advisors: a Six Sigma advisor to assist with the DMAIC framework, and a JMP advisor who will help with the analysis. To design our treadmill program, we wanted to know how intensely we should exercise. For normal individuals, it's recommended to reach 50 to 85% of your maximum heart rate for the exercise to be effective. However, our patient is at a moderate to higher risk of having a heart attack, because his calcium score was 131 from the coronary artery heart attack risk assessment, which is at the 72nd percentile for his age. The family doctor advised limiting the upper bound of the target heart rate to just 80%, because exercise that is too vigorous and intense can lead to a heart attack. To accommodate that drop in the upper limit of the target heart rate, we also increased the lower limit from 50% to 65%. Now for the exercise specifically, we chose brisk walking, since one leg will always be on the ground and [inaudible 00:03:52], so brisk walking helps protect the knee and lower injury risk. Also, choosing brisk walking over running helps prevent heart attacks, because if we run, we may accidentally go over 80% of the maximum heart rate. To determine the upper and lower bounds of our target heart rate, we first have to calculate the maximum heart rate, which is 220 minus the age in years. For a 52.5-year-old, the maximum heart rate would be 167.5 beats per minute, the upper limit of the target heart rate would be 134 beats per minute, and the lower limit would be 109 beats per minute. As you might recall, one of our goals is to reduce the resting heart rate. We want to lower the resting heart rate because doing so makes the heart muscle stronger and, as a result, helps prevent heart attacks. When we strengthen the heart muscle, the heart pumps more blood and more oxygen is available. Now that we've set our goals and what we'll be measuring, we can design our treadmill program. We'll be considering three control variables. The first is walking uphill, so whether we want to add incline or speed. The second is HIIT, or High-Intensity Interval Training, which involves a short period of intense exercise followed by a recovery period; we need to design this HIIT workout so that the heart rate does not go below 65% or above 80% of the maximum heart rate. The third variable is frequency: how many times we will do this exercise every week, and how long each time. To set up our experimental design, we chose to alter two variables, incline and speed.
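Before going through the design levels, the target-zone arithmetic just described is simple enough to write down directly; a minimal Python sketch (the function name is just for illustration):

```python
def target_heart_rate_zone(age_years, lower_pct=0.65, upper_pct=0.80):
    """Age-predicted maximum heart rate (220 - age) and a target zone as fractions of it."""
    hr_max = 220 - age_years
    return hr_max, lower_pct * hr_max, upper_pct * hr_max

hr_max, lower, upper = target_heart_rate_zone(52.5)
print(hr_max, round(lower), round(upper))   # 167.5, 109, 134 -- the bounds used in the talk
```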
Incline has two levels, zero or five degrees, and speed has nine levels, from zero to 3.6 mph. The most vigorous setting would be an incline of five degrees and a speed of 3.6 mph. We don't want to increase the speed past 3.6 mph because then we'd be transitioning to running, which we do not want, since we want to focus on brisk walking, and we also don't want to exceed 80% of the maximum heart rate. I also wanted to add rests after each exercise, so that the patient returns to his resting heart rate before undergoing another treatment. We ran stepwise regression on the response surface design. Our model has a pretty high R-square of 97% and a p-value less than 0.05. For the studentized residuals, which are the residuals that [inaudible 00:06:39], only about one point goes over the green line, which is two studentized deviations from the mean, and the red line represents three studentized deviations. JMP chose the most significant variables, which include the two main effects, both incline and speed, the interaction term between incline and speed, and the quadratic term for speed. Why did the model include the quadratic term for speed, but not incline? Well, if we look at the interaction profiles on the right, we can see that heart rate has a linear relationship with incline and a curved relationship with speed. We can explain the linear relationship between heart rate and incline as due to potential energy, which is mgh: mass times gravity times height. Height is a linear term, and when the angle of the incline is small enough, we can use the mgh approximation, so the relationship is linear based on physics. On the other hand, speed is connected to kinetic energy, which is one half times mass times velocity squared, so kinetic energy has a quadratic speed term. From the bottom two profilers, we see that we can reach the lower bound of the target heart rate, 109 beats per minute, at an incline of zero degrees and a speed of 2.9 mph. In the improve phase [inaudible 00:08:06], we won't need to include easier settings than these levels, since the heart rate would then be too low; we don't want to go under the 65% lower bound. Also, the upper bound of the target heart rate is reached at an incline of five degrees and a speed of 3.5 mph, so 3.5 mph is a good maximum level for speed. We also want to prevent injury risk in addition to managing diabetes, which is our second objective. More than 80% of runners are injured each year, and some of the most common injuries include Achilles tendonitis, shin splints, and hamstring injuries. We wanted to avoid injuries, so we made sure the patient was using correct form while brisk walking by keeping his head up, neck relaxed, and back straight. In addition to posture, muscle coordination is also really important for preventing lower-body injuries. 3D motion biomechanics studies the correct angles of joints relative to each other in order to lower injury risk. Sensors allow us to measure and monitor the angles of joints relative to each other.
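Stepping back to the heart-rate model for a moment: as a rough illustration of the model form described above (main effects for incline and speed, their interaction, and a quadratic term in speed), here is a least-squares sketch on made-up treadmill readings. It is not the presenter's data or JMP's stepwise fit, just the same model structure.

```python
import numpy as np

# Hypothetical (incline deg, speed mph, heart rate bpm) readings -- NOT the presenter's data
incline = np.array([0, 0, 0, 0, 5, 5, 5, 5, 0, 5], dtype=float)
speed = np.array([1.0, 2.0, 3.0, 3.6, 1.0, 2.0, 3.0, 3.6, 2.5, 2.5])
hr = np.array([78, 90, 108, 122, 86, 101, 124, 136, 99, 112], dtype=float)

# RSM-style design matrix: intercept, incline, speed, incline*speed, speed^2
X = np.column_stack([np.ones_like(hr), incline, speed, incline * speed, speed ** 2])
beta, *_ = np.linalg.lstsq(X, hr, rcond=None)

def predict_hr(inc, spd):
    return float(beta @ np.array([1.0, inc, spd, inc * spd, spd ** 2]))

print(np.round(beta, 2))
print(predict_hr(0, 2.9), predict_hr(5, 3.5))   # settings near the lower and upper target bounds
```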
We can also conduct the exercise on [inaudible 00:09:19] to look at which places on the feet hit the ground [inaudible 00:09:] and, based on that, whether the runner is using correct form or not. Just to take a quick detour into a broader study of injury risk: the first thing we did to study injury risk was variable clustering, which groups the different sports together. You can see that every sport has different injury areas. For example, cluster one and cluster three have different patterns. Cluster one targets the lower body, which makes sense, as it consists of basketball, soccer, skating, and tennis, which all use the lower body extensively. Cluster three has more upper-body injuries, as it consists of golf, volleyball, and weightlifting. You can see that these clusters are differentiated quite well by the principal component analysis. Any exercise plan that is used for running, for example, can be modified for the other sports, so it's an efficient way of both designing an exercise plan and studying injury risk. The specific injury risk that we looked at as a result of running was the anterior cruciate ligament, because it is a common injury in a lot of sports that use the lower-body muscles, such as basketball. The ACL is located at the center of the knee joint, running from the backside of the thigh bone, or femur, to the front of the shinbone, or tibia. The image shows the three other important ligaments of the knee: the LCL, the MCL, and the PCL. These four ligaments are crucial to protecting the ACL from injury, especially the lateral collateral ligament, as well as the lateral and medial menisci, which are pieces of cartilage that further cushion the ACL. ACL injuries occur when the tibia, or shinbone, moves too far forward and is hyperextended, in other words, straining too much; that causes the ACL to tear. This can happen in a variety of ways, such as sudden deceleration, pivoting in place, or when the foot is planted and the body changes direction suddenly. These movements are common in basketball, as I said, but also in football, soccer, and downhill skiing, and most of these sports, of course, involve a lot of running. So we want to understand how ACL injury risk changes before and after fatigue, specifically in the context of running, as part of this project that focuses on running and injury risk. To understand the connection between fatigue and ACL injury, we wanted to conduct an experiment to measure how fatigue and ACL injury risk are related. We needed to choose an exercise that lets us compare flexion and forces before and after fatigue, and choosing the right exercise that can accurately measure ACL injury risk is really important. After we consulted with a local physical therapist, we found that the countermovement jump can assess ACL injury risk quite well through the force and flexion of different body parts. Before I go into what exactly a countermovement jump is, let me tell you why we chose this exercise specifically. The countermovement jump is a jump, so it can assess how much force your knee puts on the ground.
Once again, Newton's third law from physics comes into play here: the same amount of force from your knee into the ground is experienced by the knee from the ground. Too much force onto the ground can increase ACL injury risk, as your knee experiences too much force, and this is how you can land awkwardly and [inaudible 00:13:33] the ACL. In addition to force, the coordination between flexion and extension of the hips, knees, and ankles is really important when doing this exercise. Both force and joint flexion are connected, as how well the test subject transitions from flexion to extension during the exercise is reflected in the amount of force they put on the ground. This is why we chose the countermovement jump: it enables us to compare the before- and after-fatigue states for both flexion and force, which are the two most important factors related to ACL injury. How does the countermovement jump work? There are five main phases, as you can see here: the unweighted, braking, propulsive, flight, and landing phases. The five images on the top are an example of where the test subject is at each of these phases in the exercise. The bottom graph shows time versus force exerted on the ground. For the graph on the bottom, I'll focus on the top curve, the darkest blue curve, as that is the total force, whereas the two curves below it are the left and right forces. The first phase of a countermovement jump is the unweighted phase, when the person is standing upright; it is the orange portion of the graph. The force briefly decreases before coming back up as the person continues bending their knees. When they reach maximum knee and hip flexion at the bottom of their prejump, which is the braking phase, they start extending their body, which is the propulsive phase. A smooth transition from braking to propulsive is reflected in a smooth curve over here; the smoother the curve, the better the knee and hip are coordinated. The flight time is when the force is zero, before the landing phase. As you can see, there's a huge spike in the amount of force in the landing phase; that is when the subject lands. The first major peak is the soft landing, the light blue dots, when the person lands on their toes first, before the hard landing, which is when the soles of the feet touch the ground; that's the light gray dot. During the soft landing period, hip and knee flexion can help balance the force across different body parts so that the knee isn't the only one experiencing all of the force, and that can help reduce ACL injuries. But if the hard landing, the second peak, has too much force, that's when there can be a greater risk of ACL injury, as that's when the whole foot lands on the ground. In addition to the general flexion and force patterns, we'll be looking to see if there's any difference in the soft and hard landings before and after fatigue. This brings me to the experimental design.
We wanted to measure the flexion of the different joints, such as the ankles, hips, and knees, to study them in further detail, as they reflect how fatigued the muscles are. The more the muscles are fatigued, the greater the ACL injury risk. To measure those joints, we used several different sensors that can measure all of these joints together, and we attached them to the test subject, as seen on the right: two on the bilateral thighs, two on the bilateral shanks, and two on the bilateral dorsum, four on the front side, and one on the pelvis for the backside. After calibrating our sensors, the test subject did ten trials of countermovement jumps; he jumped ten times on a force plate to measure the force. Afterwards he ran, squatted, did basketball jumps, did some cone drills, anything to get fatigued, for an hour. We decided one hour would be enough fatigue because it was pretty hot outside when we did this experiment. After fatigue, we put the sensors back on and he conducted the ten trials of the countermovement jump once again. We collected our data through a biomedical software, Meloxicam, that enabled us to simulate the different degrees and angles of bending for several different joints, as well as the forces on the ground. When we look into the individual force profiles, comparing before and after fatigue, we can observe even more differences in the two behaviors. The prejump, which is the transition from the braking phase to the propulsive phase, is a lot smoother before fatigue than after fatigue. We can see a minor plateau after fatigue, which could indicate that the different body parts are not coordinated as well after fatigue. Also, for the landing period, the contrast between the hard landing and the soft landing is quite large before fatigue, but the contrast isn't as large after fatigue. The soft landing is important, once again, because only the toe touches the ground, so it doesn't increase ACL injury risk as much as the greater force during the hard landing. The hard and soft landing contrast isn't as great after fatigue, which may increase the ACL injury risk during the [inaudible 00:18:59]. This may have been because the muscles were not able to hold the knee as stable after fatigue, so the force for the soft landing wasn't that different from the force for the hard landing. We wanted to know if there are any other platforms, besides a multivariate control chart, that we can use to help us find at what time the difference between before and after fatigue is largest. The multivariate SPC control chart helps us visualize the differences. The top right corner is a screenshot of the different trials, and all six of the variables we study are considered in that graph. We use the T-square chart because it can help us detect the relationships between the six variables that we chose: hip, ankle, and knee flexion for both the right and left sides. The red line is the T-square upper control limit, and outliers are points that exceed this upper control limit. That is a good thing here, because it means there's more contrast between the jumping and the landing behavior.
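The T-square chart described here can be outlined generically: compute Hotelling's T-squared for each sample of the six flexion signals against a reference mean and covariance, and compare it to an F-based upper control limit. The sketch below uses synthetic data and a textbook phase-II limit; it is not the Model Driven SPC platform itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins: rows = time samples, columns = six flexion angles
# (left/right hip, knee, ankle)
baseline = rng.normal(0.0, 1.0, size=(200, 6))     # reference (before-fatigue) period
monitor = rng.normal(0.3, 1.2, size=(100, 6))      # samples to chart

mu = baseline.mean(axis=0)
S_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

# Hotelling's T-squared for each monitored sample
centered = monitor - mu
t2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)

# Upper control limit from the F distribution (phase II, individual observations)
n, p = baseline.shape
alpha = 0.0027
ucl = p * (n + 1) * (n - 1) / (n * (n - p)) * stats.f.ppf(1 - alpha, p, n - p)

print("points above the UCL:", int((t2 > ucl).sum()), "; UCL =", round(ucl, 2))
```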
If you look more in depth at one of these specific [inaudible 00:20:13] spikes, which each represent one trial for before and after fatigue, you can see that we outlined five main points for one trial, before fatigue and after fatigue, to help visualize the differences. The biggest differences should be at points two and four, since two is right before the test subject leaves the ground, so it should have one of the highest flexions because the knees are bent the most there. Four should also be similarly high, because that's when the subject [inaudible 00:20:45] the ground and lands on the ground, and the knees are bent the most. You can see that before fatigue, points two and four are way above the upper control limit and quite different from one, three, and five, but after fatigue the contrast is much less obvious. We'll try to understand why that is and connect it back to our research on ACL injury discussed previously, by looking at the specific contributions of each of these joint flexions at each of these five points. For now, the multivariate control chart tells us specifically that points two and four are when before and after fatigue differ the most. As I said, we are going to look into the specific flexion components. The top row is before fatigue; the bottom row is after fatigue. These three joints can really detect the difference between before and after fatigue, because during the countermovement jump the lower body is fatigued, so the muscle fatigue and the different angles of flexion for the different joints are evident when we compare the contributions. If you look at the graphs starting at point one, you can see that the ankle has the greatest contribution after fatigue, while the hip and knee contribute much less. This may be because some of the muscles are already fatigued and only some muscles contribute to the overall flexion. If we move from point one to point two for before fatigue, we see a very clear transition from an even distribution across all the different joints to focusing on just ankle joint flexion. But for after fatigue at point two, the knee and hip components are still somewhat flexed and haven't been able to reach full extension; those bars are still providing some contribution. You can see that at point three, the contrast between the knee, hip, and ankle is also not as large for after fatigue; again, there's not a full extension of the hips and knee. Then the force distribution isn't as good for after fatigue, as the knee and hips are already bending at the same time as the ankle, so the soft landing, which is at point four, isn't as effective. And finally, at point five, the ankle is still flexed. It seems that the knee and hip aren't able to support the body now and rely only on the ankle. This may indicate the lower body in general is really fatigued, and the hip and knee are the most fatigued, as we don't see much contribution from them at the fifth point. There's less flexion for these two joints, causing a greater reliance on the ankle, which increases ACL injury risk. Now back to our treadmill program.
With  information  from  injury  risk, as  well  as  the  previous  research  on  HIIT, we  can  set  up  a  HIIT  workout  plan. We  designed  a  15  minute  workout  with, the  first 2  minute  for  warm  up, the  next  12  minutes for  three  cycles  of  exercises, consisting  of  2  minute  at  the  lower  bound of  the  target  heart  rate at  zero  inclined and  2.9  speed, and  2  minute  at  the  upper  bound of  the  heart  target rate at  five  inclined  and  3.5  speed. So  total  of  12  minutes and  then  one minute  cool  down. We  chose  relatively  short  time  period for  each  exercise so  that  the  patient  can work  out  for  a  longer  period  of  time without  getting  tired  too  quickly, which  may  have  happened  if, the  exercise  at  the  upper  bound  of  the target  heart  rate  was  done  for  too  long. To  prove  that  our  project  is  successful, we  will  need  to  validate  our  results. We  want  to  see  if  the  workout  plan  helped lower  the  diabetes  risk, which  can  be  seen  through  the  glucose reading,  and  the  resting  heart  rate. Heart disease  risk  as  well,  which  can  be measured  reducing  the  calcium  score. All  these  values  to  decrease  if the  treadmill  exercise  helps, we  may  also  want  to  revise  the  treadmill settings  every  three  to  six  months, because  the  resting  heart  rate  may  have decreased  due  to  stronger  heart  muscles. In  that  case,  we  may  want to  increase  incline  and  speed to  achieve  the  same  target  heart  rate, since  the  resting  heart rate is  now  lower  due  to  a  stronger  heart. So  in  conclusion, we  utilize  the  DMAIC  approach and   [inaudible 00:25:36]   methods to  help  the  patient  with  type  two diabetes  reduce  their glucose  levels while  preventing  them  from  getting a  heart  attack  or  getting  injured. We  also  designed  an  experimental  plan to  study  injury  risk, but  looking  at  joint  flexion as  well  as  force. We  used  the  DOE  to designed  as  transplant and  from  the  model  results, we  selected  the  settings at  109  beats  per  minute and  134  beats  per  minute  to  be  included in  a  15  minute  High Intensity  Interval  Training  workout. So  we're  currently  finishing the  improvement  control  phases and which we  hope  to  present at  a  future  conference. Yeah,  that's  all  I  have  for  today. Thanks  for  tuning  in.
JMP Pro 17 is a new standalone platform of choice for modern molecular-level data arising in such fields as genomics, metabalomics, and proteomics. Our previous product, JMP Genomics, relied on SAS for data import, processing, and analysis of the large data tables that are associated with -omic problems. New improvements in JMP Pro 17 provide an advanced level of capability and performance that allows it to stand on its own without the need for SAS.    However, the move from JMP Genomics to JMP Pro for Genomics revealed many aspects of JMP Pro that needed to improve. These improvements have pushed the boundaries of what the product can do so that it can now handle these large problems. As a result , JMP Pro 17 is one of the only advanced analytics software packages to provide a combination of interactive and engaging user experience that allows for rapid point-and-click exploration of -omics data, advanced multivariate and predictive modeling tools, and a flexible and adaptive platform (through JMP Scripting and integration with other data science tools).   After defining -omics, this presentation examines the types of data used for these problems, the technical challenges that come with preparing and analyzing large wide data tables, and how JMP Pro 17 addresses these challenges. Examples of just how easy it is to do -omic data analysis in JMP Pro 17 are also demonstrated.     Hi,  this  is  Sam  Gardner  with  JMP. I'm  a  Product  Manager  at   JMP. We're  here  to  talk  today about  introducing  JMP Pro  for  Genomics, pushing  the  boundaries  of   JMP Pro to  enable  data  science  on  the  desktop. I  am  one  of  the  presenters. I'll  be  doing  the  introduction to  this  topic. I'm  S enior  Product  Manager for  Health  and  Life  Sciences in the  Product  Management  team  at  JMP. Our co-presenter  today is  Russ   Wolfinger, who's  a  Distinguished  Research  Fellow and  our  Director  of  Scientific Discovery  and  Genomics  at  JMP. We'll  talk  a  little  bit about  the  background of  genetics  and  genomics, functional  genomics, and  then  talk  about  what  we're  doing to  transition  from  our  former  product, JMP  genomics, to  using  JMP  Pro  for  genomics. Russ  will  demonstrate  some of  the  new  capabilities  in  the  product. A  little  bit  about  classical  genetics. This  is  where  a  lot  of  this  got  started. People  have  been  doing classical  genetics  for  a  long  time. They've  been  breeding  plants  and  animals to  get  desired  traits for  those  plants  and  animals. They've  seen  that  they  can  do  that to  get,  stronger  animals,  better  plants, plants with  desired  properties  and  so  on. You  probably  studied  a  long  time  ago, when  you  were  young  in  school, about  Gregor  Mendel,  the  monk, who  spent  many  years  studying  garden  peas. He  actually  measured  seven  distinct characteristics  of  these  peas— their  height,  their  pod  shape  and  color, seed  shape  and  color, flower  position  and  color— and  observed  that  as  these  peas were  crossbred  with  each  other, that  the  traits  were  passed  on from  the  parent  plants to  the  progeny  plants following  some  rather specific  mathematical  ratios who have made  it probabilistically  possible to  make  predictions about  what  the  progeny  would  look  like based  on  the  traits  of  the parents. His and  later  work  established the  principles  of  genetic  inheritance. What  is  genomics? 
Genomics is  more than  just  classical  genetics. Genomics  uses  a  combination of  DNA  measurement  methods and  recombinant  DNA  methods to  sequence  and  assemble  and  analyze the  structure  and  function  of  genomes. It  differs  from  classical  genetics in that  it  looks at  the  organism's  full  complement of  genetic  or  hereditary  material. It  focuses  on  the  interactions between  the  loci  or  the  location of  different  genes  on  the  genome, and  the  alleles, the  variation  in  the  genes  in  the  genome, so  that  you  can  understand  things like  epistasis,  pleiotropic  heterosis, which  are  things  like,  okay, one  gene  affects  many  things. That's  pleiotropy. Epistasis is  that  sometimes, one  gene  impacts  the  output or  the  effect  of  another  gene. Heterosis  is  sometimes  you  get synergistic  effects  by  combining  the  genes from  two  different  parents or  two  different  organisms. This  all  relies  upon  the  use of  the  central  dogma  of  genomics. That  dogma  is  that  DNA,  which is  the  code  for  our  biological  systems, is  transcribed  into  RNA, which  is  the  code that's  used  to  make  things and  make  proteins  in  the  body. The  proteins are  the  little  chemical  engines that  do  things  inside  the  body and  give it  its  function. From  that,  you  can  actually  then measure  things  like  metabolites, what  actually  happens, what  do  those  proteins  actually  do inside  the  cells  and  inside  the  body. The  path  is  DNA  creates  RNA creates  protein, and  the  protein  regulates how  things  function  in  the  body, and  that  produces  metabolites. Data  is  really  enabling a  genomics  revolution. Modern  measurement  techniques are  really  helping  us  understand the  structure  and  function of  the  genome and  how  it  works  inside  the  cells in  biological  system. We  can  sequence  the  genome now. We've  got  next- generation  sequencing. Many  years  ago, when  JMP  first moved  into  this  area, helping  customers to be able to  analyze  this  type  of  data, the  way  to  measure  it  was  microwaves, which  was  much  more  focused on  very  specific  parts  of  the  genome, and  oftentimes  a  very  limited  set of  genes  in  the  genome. Now,  you  can  sequence the  whole  genome  of  an  organism. Also,  you  can  look  at  things like  expression  and  regulation. We're  talking  about  the  metabolites. What  is the  output  into  the  biological system  that  you  can  measure? You  can  look  at how  the  proteins  are  produced or  what   those  proteins  are  doing. You  can  also  look  at how  the  structure  of  the  DNA  itself, what's  called  epigenetics, impacts  the  function  of how  DNA  works and  how  the  genes  work  inside  the  body. There  are  typically three  main  stages  of  analysis  that  happen when  you're  doing  this  type  of  work. One  is  you  just  generate  the  raw  data. You  do  the  sequencing  work, generate  the  genome- sequencing  data, or  measure  the  metabolites or  the  protein  expression or  the  RNA  expression. And  then  that  generates pretty  large  data  sets that  have  to  be  filtered and de- multiplexed and  trimmed  and  scored and  cleaned  up. This  is  typically  handled in  a  automated  or  semiautomated  workflow on  computer  systems  that  can process  very  large  data  files. 
Then  it  typically  goes  into  a  second  stage where  you  start  to  do  sequence  alignment and  basically  lining  things  up, and  being  able  to  do  things  like  counts. How  many  times  did  I  see  the  expression of  a  particular  RNA  fragment or  RNA  sequence? Or  how  many  times  did  I  see a  particular  protein? Or  all  this  raw  data,  how  does  it  line  up to  actually  make  a  picture of  what  the  structure of  the  whole  genome  is? That's  a  pretty  big mathematical  computational  process. That  typically  also  gets  done on  pretty  large  computational  systems with  a  lot  of  computational  resources. And  then  the  third stage, which  is  the  stage where  JMP  really  has  played  in, and   where JMP  Pro will  continue  to  play  in, is  the  determining  genotype  associations and  genotype-to- phenotype  relationships. A  phenotype  is  just  a  trait  of   organisms, the  relationship between  the  genes  and  the  traits. And  also  looking at  correlations  and  associations of  the  different  genetic  markers inside  the  genome, or  the   variance  of  the  genetic  markers. Oftentimes,  what you   want to  do is  you  want  to  characterize  those and  then  correlate  them to  physical,  biological, or  maybe  disease  state  characteristics. All  of  this  can  actually  be  done with  desktop  software. JMP  Pro  is  our  solution  to  do  that going  forward  in  the  future. We've  had  a  product  called  JMP  Genomics for  14  years,  up  until  this  year, that  we  were  providing  the  customers. It  was  a  combination  product of  JMP  and  SAS. SAS was  really  needed  back  early when  we  first put  this  out to  do  a  lot  of  the  data  processing, because  the  size and  the  types  of  data  we  looked  at was  very  difficult  to  do with  a  desktop  software  package  like  JMP. SAS did  the  data  processing, some  of  the  statistical  methods, but  JMP  was  used for  further  statistical  analysis and  visualizing the  results  of  those  analysis. JMP  Genomics  has  been used in research  and  industry for  a  wide  variety  of  genomics  problems for  many  years. But  we  made  a  strategic  decision  this  year to  discontinue  selling  products that  contain  SAS with  them. That's  part  of  the  decision  that  was  made for  JMP  to  become  an independent  company. We're  a  wholly-owned  subsidiary of  SAS  now, and  are  moving  down that  road  of  independence. We  are  not  going  to  be  selling  anything but  JMP  products  going  forward. Because  of  that, we  have  looked  now to  move  the  functions for  genomic  data  analysis  into  JMP  Pro. In  JMP  Pro  17,  which  will be  available  this  fall  in  2022, has  been  and  will  be  optimized for   big  and  wide  data  problems. It's  going  to  have  capabilities to  meet  the  needs of  genomic  data  science and  genomic  data  scientists. It's  going  to  utilize  the  strength of   JMP Pro's  predictive  analytics and  interactive  visualization to  help  enable  discoveries in  this  area  of  work. Some  of  the  enhancements  that  we've  made to  push  the  boundaries  of  JMP  Pro include  just  removing  barriers and  bottlenecks  in  the  software. It's  one  thing  to  do  analysis on  tens  or  hundreds  or  even  thousands of  columns  in  a  data  table. 
But  when  you  have  a  data  table which  maybe  has  many  thousands or  hundreds  of  thousands  of  columns, you  start  to  reveal  limitations sometimes  in  your  software. By  doing  this  work, we've  uncovered  places where  we  just  need  to  streamline how  operations  happen  inside  the  program. We've  done  that. An  example  would  be if  I  wanted  to  do  a  transformation on  hundreds  of  thousands  of  columns, we've  significantly  improved  that  process. It happens  much  faster on  the  data  tables. Also  being  able  to  do  very  fast  and efficient   multivariate  analysis  methods like  principal  component  analysis and  clustering, when  you  have  these really  wide  genomic  data  tables. And  then  being  able  to  do  models over  and  over  again on  thousands  and  thousands of  response  columns, and  to  do  that efficiently  and  effectively. The  second  goal that  we  have  in  this  transition is  that  bring  in  some  capabilities in  the  JMP  Pro that  are  very  specific for  genetic  and  genomic  analysis. For  instance,  being  able to  import  different  formats that  are  commonly  used  in  this  area. Also,  being  able  to  do genetic  marker  analysis  and  simulation, as  well  as  bringing  in  some newer  popular  data  reduction  methods such  as  t-SNE  and  Unimap. Overall,  what  we're  getting  to is  a  product  that's  going  to  be  lean. It installs  very  quickly. You  can  use  it  on  your  desktop, but  you  can  use  it  to  do this  very  powerful  analysis on  these  large, complex,  wide  data  tables. To  illustrate  that, I'm  going  to  turn  it  over  to  Russ. Russ  is  going  to  show  us  actually  how you  can  do  some  realistic  analysis and  some  real  study  analysis  here on  some  genomic  and  genetic  data. Well,  thank  you,  Sam. It's  a  real  exciting  time  for  us. I  know  I've  actually  been with   the  genomics  analysis  revolution within  SAS  for  over  20  years  now. We  actually   [inaudible 00:11:46]  in  the  early  2000s called  Scientific  Solutions, where  we  were  starting  to  look  at some  of  the  early  micro array  data. It's  been  a  really  fun  20  years. Now,  I  would  say,  almost  one of  the  most  exciting  times  ever  for  us, where  we're  now  able  to  code some  of  these  routines directly  in  JMP  pro  using  C++. A  lot  of  them  are  running much  faster  than  we  had in  the  previous  JMP  Genomics  product. I  want  to  give  you  a  little f lavor of  that  today  with  an  example. This  is  a  data  set  on   loblolly pines, which  for  those  of  you from  the  Southeast might  know  it  as  probably  one of  the  most  popular  species  of  pine. Typically,  if  you  go into  Home Depot  or Lowe's and  buy  some  two- by- fours  or  plywood, it's  going  to  be  made  of  l oblolly. When  you  fly  into  the  area, you   happen to see  a  lot  of  tree  cover. Many  of  those, I'd  say  a  good  chunk  of  those  trees, especially  towards  the  Eastern  part of   North  Carolina,  are  lobl ollies. It's  a  very  important  species,  one that  we  really  want  to  understand  well. It's  been  studied  very  thoroughly, and  even  more so  now that  we've  got  some  crunches  going  on with  home  building  and  what  have  you, it's  critical  to  understand  it inside  and  out. Genomic  technology  is  fantastic for  revealing  some  things that  we  just  never  knew  before. This  data  is  actually  still  10  years  old. 
It was from a paper in the journal Genetics by Resende et al., from a group of researchers at the University of Florida, Embrapa in Brazil, and the University of Iowa, I believe, if I recall correctly. Here's the reference if you want to look it up. The data are also freely available; I went ahead and downloaded them from the supplemental information and loaded them into the JMP table you see here. As Sam was mentioning, the format in JMP Pro is what we typically like to call a wide format, where we've got everything in one table. Here we've got some genotype indicator numbers indicating the lines, as well as the mother and father that the trees came from. In this specific data set, we've got six traits that we've measured; I believe there are actually more, I think 17, if you want to check the reference. Our key focus of interest is these genetic markers. This data set is small by today's standards; we've only got about 4,800. I say "only 4,800," but that's still quite a few. As you can see as I scroll through here, they're all coded as either zero, one, or two. These are so-called SNP markers, single nucleotide polymorphisms, where the number indicates the count of the major allele that we have in the data. Zero would be little a, little a, if you're familiar with the old genetics notation; the twos would be big A, big A; and the ones would be all the heterozygotes. So roughly 4,800 of these markers. The basic goal in the end, typically, is prediction; in fact, that was what the paper this came from was about, comparing several of the popular predictive methods. But before we get to prediction, there are a lot of really good things you want to do just to make sure the data are as expected, and also to learn and discover structure and other interesting characteristics. Let's dive in and see what we can do with a typical workflow here in JMP Pro. I would typically just like to look at the data in JMP first; we can use the basic platforms. For example, let me bring up the Multivariate platform and just check out basic plots of the data against one another. You can see, for example, that rootnum and rootnumbin are fairly highly correlated with each other; others, not so much. You can also do distributions, for example with the Distribution platform. These traits have actually already been centered, I think; I believe all of them have a mean of around zero. They've gone through a little bit of preprocessing that we won't go into today; that's the way they came from the paper. Our basic goal is to use the genetic information to predict these traits. They represent various characteristics of the loblolly trees. For example, CWAC, I believe, is crown width across the planting beds; it's a measure of tree size. We've got other measurements of density, characteristics of the roots, etc., all important things to know about when studying these trees. Let me walk you through what we might consider a basic workflow once you have your data set up like this.
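Because each SNP column is just a 0/1/2 count of the major allele, basic marker summaries reduce to column arithmetic. A small Python sketch on a synthetic genotype matrix of the same shape (not the loblolly data):

```python
import numpy as np

rng = np.random.default_rng(7)
p_true = rng.uniform(0.1, 0.9, 4853)                             # per-marker major-allele frequency
geno = rng.binomial(2, p_true, size=(926, 4853)).astype(float)   # stand-in 0/1/2 genotype matrix

major_freq = geno.mean(axis=0) / 2          # estimated major-allele frequency p
minor_freq = 1 - major_freq
observed_het = (geno == 1).mean(axis=0)     # fraction of heterozygotes per marker
expected_het = 2 * major_freq * minor_freq  # 2pq expected under Hardy-Weinberg

print(np.round(major_freq[:3], 3), np.round(observed_het[:3], 3), np.round(expected_het[:3], 3))
```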
Now,  before  doing  that, though, I  do  want  to  mention  too that  we  have  put  in  a  fair  bit  of  work to  helping  and  aiding with  importing  such  data. This  particular  data  came  as just  standard  comma  separated  value  files, so  no  big  deal  to  import  it. But  often,  genetic  data  like  this come  in  so-called  VCF  files. We now  have  new  routines to  be  able  to  import  those  directly, as  well  as  import  files from  the  popular  database, and  then  a  few  other  formats, IDAT  and  what  have  you. Trying  to  make  it  really  easy to  get  your  data  into  JMP. As you know,  once  you've  got your  data  set  up  in  a  JMP  table, there's  just  all  kinds of  great  things  you  can  do. Many  of  the  things  that  you  hear  about... Give  you  some  more  ideas, as  well  as  some  new  things that  we've  put  into  place. To  start  out,  we've  got a  brand  new  couple  of  platforms under  the  Analyze  menu  here  at  the  bottom. Genetics. Analyze,  Genetics. We've  got  Marker  Statistics and  Marker  Simulation. Let's  run  the  first one, Marker  Statistics. This  is  just  a  basic  platform  for  looking at  characteristics  of  a  set  of  markers. You  can  see  here,  I'm  loading. We've  got  4,853  SNPs  organized in  a  group  here  in  the  JMP  table. I  just  move  them  over  into  the  markers. If everything else  is  okay, we'll just  click  OK. It  runs  quite  quickly. What  this  basically  does is  it takes  each  marker and  computes  a  variety  of  standard statistical  genetic  statistics that  you  can  look  across  here and  see  what's  going  on. A  key  thing  to  check  for  a  so-called Hardy- Weinberg  Equilibrium. You  can  do  a  statistical  test  of  that and  get  p- values  from  it, and  even  plot  these  along in  a  graph  like  this. On  the  Y  axis,  we  actually  use the  log 10  p-value, which  we  also  call  the  log worth. To  go  once  step  further,  you  can  make a  false  discovery  rate  adjustment to  avoid  the  multiple  testing  problem. You  can  see  here, we've  actually  plotted  both: the  raw   p-value,  the  raw  log worth, as  well  as  their  FDR  adjusted   p-value. They  tend  to  be  quite  similar, especially  for  the  large  ones. These  markers  up  here  are  ones that  would  be  out  of  equilibrium, very  likely  due to  the  cloning  of  the  trees. These would be  markers that  might  tend  to  drift  or  stabilize over  time  with  future  crosses. It would be good  to  check  these  out and  make  sure  the  distributions of  the  alleles  are  as  expected. Arcing  all  the  way  back to  the Gregor  Mendel  days, things  that  we  learned  about how   alleles  like this  should  behave. That's  a  good  place  to  start, just  to  get  an  idea  for  the  markers. Let's  move  next and  do  some  pattern  discovery. Here,  there's  several  nice  things we  can  try. A  very  basic  one  that's  also  been  popular for  decades  with  gene  expression  data is  just  to  do  hierarchical  clustering. Again,  I'm  just  going  to  put the  SNPs  in  here. You  typically  will  want  to  use one  of  these  faster  methods. Let's  use  fast  ward. We  do  have  some  missing  values, so  let's  do  imputation. We'll  go  ahead  and  cluster it  two  ways. Let's  click  OK  here. I'm  going  to  go ahead. I'm running  everything  live  today. A  few  of  these  things will  take  seconds  to  run. 
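Backing up to the marker statistics for a moment: the Hardy-Weinberg check and FDR adjustment just described can be outlined as a per-marker chi-square test against the p², 2pq, q² expectation, converted to logworths and adjusted with Benjamini-Hochberg. This is a generic sketch on synthetic genotypes, not JMP's internal code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)

def hwe_logworths(geno):
    """Per-marker chi-square Hardy-Weinberg test for a 0/1/2 genotype matrix."""
    n = geno.shape[0]
    obs = np.stack([(geno == g).sum(axis=0) for g in (0, 1, 2)])   # counts of aa, Aa, AA
    p = geno.mean(axis=0) / 2
    q = 1 - p
    exp = np.stack([n * q ** 2, 2 * n * p * q, n * p ** 2])
    chi2 = ((obs - exp) ** 2 / np.clip(exp, 1e-12, None)).sum(axis=0)
    pvals = np.clip(stats.chi2.sf(chi2, df=1), 1e-300, 1.0)
    return -np.log10(pvals), pvals                                 # logworths and raw p-values

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(pvals)
    order = np.argsort(pvals)
    scaled = pvals[order] * m / np.arange(1, m + 1)
    adj = np.empty(m)
    adj[order] = np.minimum.accumulate(scaled[::-1])[::-1]
    return np.clip(adj, 0, 1)

logworth, pval = hwe_logworths(geno)
print(int((bh_fdr(pval) < 0.05).sum()), "markers flagged after the FDR adjustment")
```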
A nalyses  I've  got  that actually  will  take  a  few  minutes that  I  won't  run  live just  for  sake  of  time. But  you  can  see  here, this  scale  of  data, JMP  Pro  can handle  fairly  readily. This  one,  you  can  see that  the  progress  bar  here will  take  probably   30 seconds to  a  minute  to  finish. But  not  too  bad for  a  medium- sized  data  set  like  this. Again,  we're  clustering  around  926  rows and  4,800  columns. But  before  actually the  performance  enhancements, this  kind of analysis would  take  several  minutes. In  many  cases,  we've  been  able to  achieve   orders of  magnitude  speed  up. I'm  able,  basically,  to  enable  you  to do  analyses  like  this  close  to  real  time. A  little  bit  of  waiting  might  be  required as  here,  but  in  general, it's pretty  nice  to  be  able  to  quickly get  answers  to  fairly  difficult  questions. For  example,  here,  we're  trying  to  see how  other  rows  of  our  data cluster  with  each  other. Now  here,  a  very  interesting  thing  occurs. You  can  see  I've  got  colorings that  I  did  to  the  data. I  colored   the  mother  and  father, or maternal  and  paternal  alleles. If  we  look  at  this  variable  here, there's  around  71  unique  levels. And  then  within  each  cross, there's   up  to  17  or  20  individuals. The  data  have  very  nice,  tight  clusters. The  clustering  algorithm actually  found  those. You can  see  the  colors indicate  the  coloring. This  color  theme  is  a  bit  jarring. Let's  move  it  to black  and  white. We  can  see  the  structure a  little  more  cleanly. Here,  we  can  see  the  areas  of  white or  where  we've  got  some  of  those minor  alleles  starting  to  cluster and  identifying  the  key  places in  the  genome that  distinguish  these  unique  crosses. This  is  a  nice  plot  just  to  get an  overall  feel  for  the  various  lines and  how  they  compare  with  one  another. But  the  main  lesson are  these  tight  clusters that  are  mapping  up  exactly  like we  would  expect  with  the  initial  crosses, basically  like  very  close  siblings to  one  another compared  to  cousins,  or  second  cousins, third cousins,  etc . Now,  another  way  to  go  about  this would  be  more  of  a  dimension reduction  type  approach. Here,  the  number  one  analysis is  principal  components. Let's  try  that  on  our  steps and  see  what  that  reveals. Here,   let's  just   use  the  defaults. Sorry,  actually, I  wanted  to  show  off... There's  a  brand  new  method  for  wide  data that's  called  fast  approximate. It's  a  nice   addition  in  software. It  actually  uses, if  you're  familiar  with  the  method called  a  randomized  SVD  approach. You  can  see  a little  message. Let's  see  what's  in  the  log. It  turned  out  this  was  actually  one  case where  an  error  message was  quite  beneficial. The  software  actually  indicated which  markers... There  were  some  markers, they  were  non-numeric  or  constant. It  turned  out  that  a  handful  of  these markers  in  the  table  were  constant. This  would  be  a  case  where  we  could go  back  and  actually  clean  those  out, since  they're  not  really  contributing  much to  the  analysis,  they're  just  constant. But  the  PCA  platform found  them  as  a  byproduct. But  if  you  look  at  the  scores, first two  principal  components, we  again  have  this nice  clustering  of  families. 
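The "fast approximate" option is based on a randomized SVD, and the same idea is available outside JMP, for example in scikit-learn. A sketch on stand-in genotypes, dropping the constant markers first, as the log message suggested:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)

keep = geno.std(axis=0) > 0               # drop constant markers flagged in the log
X = geno[:, keep]
X = X - X.mean(axis=0)                    # center; impute first if values are missing

pca = PCA(n_components=5, svd_solver="randomized", random_state=0)
scores = pca.fit_transform(X)             # principal component scores per tree
print(np.round(pca.explained_variance_ratio_, 3))
print(scores[:3, :2])                     # first two PCs, the axes of the family plot
```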
As usual with JMP, all these plots are interactive and connected to one another. We can, for example, click on one of the branches of the tree over here, and it will highlight that cluster in the PCA, so we can map these two graphs to one another. In fact, let's add a third one. This is another brand new platform that's just coming out in JMP 17, called Multivariate Embedding. Here we're going to compute the popular t-SNE algorithm, which stands for t-distributed Stochastic Neighbor Embedding. It has been quite popular in the machine learning world, and it has trickled its way into the genomics field, especially with single-cell RNA. It does a somewhat different dimensional projection than PCA: it tries to identify local structure, whereas PCA is looking for the dimensions of largest variability across all markers. t-SNE tries to find tight local clusters, so it's actually perfect for this kind of data, just to reveal these families. You can see the nice little groups of clusters and, maybe more importantly, which clusters are near each other. You can take a picture here; it kind of looks like a butterfly, something t-SNE will often produce. I'd encourage you to try it on your data once you get your hands on JMP 17. So that's revealing some nice structure in the data. Let's move on now and add some more statistically oriented modeling. Here, the basic thing to start with is what we would call a genome-wide association study, where we'll basically take our trait, or our traits in this case, and screen them against all the markers. The workhorse platform here is Response Screening: I'm going to Analyze, Screening, Response Screening. We've done quite a bit of work on this, thanks especially to John Sall, who has implemented some nice performance improvements. What this does is basically a big Y-by-X analysis. I'm going to move our six targets, or responses, into the Y field and our SNPs into X, and then all you do is hit Go. I think it does the imputation automatically. Let's see. Yeah. This one runs lightning fast: it basically just did six times 4,800 quick regressions and plotted all the p-values at once. Again, we're focusing on the false discovery rate; you've got to be very careful about over-fishing data like this. You want to make sure any lead you chase is significant, even after a false discovery adjustment. Here we see that this crown width trait is the one that's popping out with the most hits, and then there's one for rustbin. These are sorted by significance, and then some of the other traits start to pop in. But clearly, it looks like we've got the most genetic action with this crown width trait. Now, to go a little further and illustrate the things we can do, and this is very JMP Pro-like, let's save the table of p-values. We've now got everything in a new JMP table, which is effectively all the results, and they're nicely colored for us. You could just browse the table, but I'm going to go ahead and use Graph Builder now.
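The Response Screening step amounts to a large batch of simple Y-by-X regressions, one per trait-marker pair, with a false-discovery-rate adjustment afterwards. A compact sketch of the per-marker piece for a single synthetic trait (JMP's platform handles all six traits at once and far faster):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
geno = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)
y = 0.4 * geno[:, 10] + rng.normal(size=926)       # synthetic trait with one planted signal

def marker_screen(y, geno):
    """Simple regression of one trait on each 0/1/2 marker; returns slopes and logworths."""
    m = geno.shape[1]
    slopes = np.empty(m)
    logworths = np.empty(m)
    for j in range(m):
        res = stats.linregress(geno[:, j], y)
        slopes[j] = res.slope
        logworths[j] = -np.log10(max(res.pvalue, 1e-300))
    return slopes, logworths

slopes, logworth = marker_screen(y, geno)
top = int(np.argmax(logworth))
print(top, round(slopes[top], 3), round(logworth[top], 1))   # the planted marker should surface
# A volcano plot is then just slope (x) versus logworth (y), one point per marker.
```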
Let's make some volcano plots by hand. For these, we want to put the slope on the X axis and the logworth on the Y. Let's go ahead and make a separate one for each of our traits; I'm dragging that onto the Wrap zone. This is the kind of thing that JMP is really good at: it often will find outliers in the data. Here's one that's way out here; we've got a slope estimate of nearly negative 2,000. It turns out that this variable is nearly constant, so the regression just blows up with a nearly vertical, highly negative slope. This is more of an anomaly than an actual significant hit, and it would make sense just to ignore it, but it's nice to find it in the table and be able to identify it. This is the kind of thing that JMP is often really good at, finding weird patterns. But to hone in on the key results, let's go ahead and narrow our axes down. I just hit the axis button, and we're going to zoom in; let's go minus 10 to 10. You can see that you get this characteristic V shape, where again we're plotting the slope of the regression versus its negative log p-value. For CWAC, as we expected before, we actually got more hits than anywhere else: a bunch of markers with positive and negative slopes, which would indicate an additive genetic relationship going one way or the other. The plots for the other traits are also V-shaped, but many of them are just a lot less significant and often squished in with one another. The slope also depends on the scale of the measurement, so this is maybe not quite as meaningful unless we put all of these on the same exact scale, but I just wanted to show it for illustration, as a way to compare everything side by side. That's a GWAS. Moving forward, let's get to what our main objective would probably be, which is to predict these traits as a function of the markers. Here we have access to all the great predictive modeling platforms that are in JMP. Some of these you have to be a little careful with: with missing data, you may need to do the imputation first, and some might become quite slow given the size of the problem. For today, I just want to show probably my favorite one, XGBoost, using the XGBoost platform. This is a case where I actually ran the analysis beforehand, because it takes a few minutes to run. I loaded all six traits into XGBoost and did ten-fold cross validation, automatically leaving out each of the ten folds. Here you can see the results of that run: the solid lines in these graphs are the validation curves over the iterations, and the dotted lines are the training curves. You can see that with these wide problems there's a severe risk of overfitting, especially with a powerful approach like XGBoost, so you have to be very careful. As you can see, I actually [inaudible 00:32:18] parameters; I could tweak them down for one, and you can see the other parameters here. Within each model fit, we've got both the training observed-versus-predicted and the validation.
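The cross-validated boosting workflow just described follows a familiar pattern in any machine-learning stack. A hedged sketch using the open-source xgboost and scikit-learn packages on synthetic data; the tuning parameters are placeholders, not the values used in the talk:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from xgboost import XGBRegressor      # assumes the open-source xgboost package is installed

rng = np.random.default_rng(3)
X = rng.binomial(2, rng.uniform(0.1, 0.9, 4853), size=(926, 4853)).astype(float)
y = X[:, :20] @ rng.normal(0, 0.3, 20) + rng.normal(0, 1, 926)   # synthetic polygenic trait

model = XGBRegressor(n_estimators=150, learning_rate=0.05, max_depth=3,
                     subsample=0.8, colsample_bytree=0.3, random_state=0)
pred = cross_val_predict(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print("10-fold CV correlation:", round(float(np.corrcoef(y, pred)[0, 1]), 3))
```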
You  can  see  here  for  C WAC we  got  a  correlation  of  around  0.43. Correlation  is  a  typical  measure used  to  assess  performance. This  is  competitive  with  what  was published  in  the  paper  before, without  hardly  much  tuning  at  all. But  then  there's  a  lot  of  other interesting  things  you  can  dive  into, the  most  important  features,  etc. We  even  got  some  new  things  for   instance, one  thing  called  Shapley  values that  I'd  encourage  you  to  check  out. There's  going  to  be  another  talk on  this  topic  by  Peter  Hirsch, Florian  Laura  Lancaster and myself  on  that  here  at  the   conference, I  would  encourage  you  to  check  that  out. It's  a  way  to  break  down predictions  into  their  components. That  gets  another  level you  can  go  into  with  predicting. That's  just  one  example  of  some nice  predictive  modeling  you  can  do. To  wrap  up  the  demo, I  wanted  to  return  back where  we  started  here in  this  Genetics  menu. We've  got  a  marker, a  brand  new  marker  simulation  platform. This  is  some  pretty advanced  genetic  modeling carried  out  by  our  internal  expert, Luciano  Silva. What  this  does  is  it  actually  will  do virtual  crossing  by  the  genotypes. The  idea  is  you'd  load  the  markers  in. The  really  interesting  thing is  you  can  put  a  predictor  formula  here. For  example,  I  save  the  predictor  formula from  the   XGBoost  model  of   CWAC. What  this  will  do  is  both simulate  the  crosses and  predict  their  performance. This  is  what  modern virtual  breeding  does. You  can  actually  virtually  cross different  loblolly  pine  trees and  predict  what  will  happen  with  them without  having  to  wait  10,  20,  30  years to  grow  them  in  the  field. Extremely  powerful,  interesting  approach that  revolutionized  the  way modern  breeding  is  done, and  why  so-called  genomic  selection, or  predictive  modeling with  genetic  markers  is  so  popular. I'll  go  ahead  and  conclude  there. I  hope  that  whetted  your  appetite with  some  of  the  new  things we've  got  going. A  lot  of  the  things  I  showed  today would  also  work  with  gene  expression  data, although  that's  a  little  bit different  ballgame in  terms  of  what  you're  trying  to  do. But  for  sake  of  time,  I  thought  it  would be  good  just  to  look  at  this  one  example and   dive  somewhat  deep. Thank  you  very  much  for  your  attention. Let  us  know  if  you've  got  questions as  you  have  them. We're  really  e xcited  about the  new  things  coming  in  JMP  17  Pro. We've  got  a  lot  more  things coming  in  the  works. Thank  you very much. We  recognize  that  lot  of  people that  come  to  discovery, this  may  not  be  their  area  of  expertise. But you  may  know  somebody who's  doing  this  work, and  we  would  love  to  get  them  connected with  what  we're  doing  here  at  JMP  Pro, because  we  are  going  to  continue to  invest  in  adding  capabilities and  improving  the  software  so  it  can do  work  like  this  better  and  better to  meet  the  needs  of  scientists across  the  life  sciences and  this  industry. Thanks  for  listening  in.
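As a footnote on the marker simulation idea mentioned in this talk: virtual crossing with a saved prediction formula can be caricatured as Mendelian sampling from two parents followed by scoring each simulated offspring with a fitted model. The sketch below ignores linkage (real simulators model recombination along chromosomes) and uses a toy additive predictor in place of a saved formula, so it illustrates the concept only, not the platform's algorithm.

```python
import numpy as np

rng = np.random.default_rng(11)

def gametes(genotype_counts):
    """Sample one allele (0/1) per locus from a parent coded as 0/1/2 major-allele counts."""
    return rng.random(genotype_counts.shape) < genotype_counts / 2.0

def virtual_cross(mother, father, n_offspring, predict_trait):
    """Simulate offspring under independent Mendelian segregation (no linkage) and score them."""
    preds = []
    for _ in range(n_offspring):
        child = gametes(mother).astype(int) + gametes(father).astype(int)
        preds.append(predict_trait(child))
    return np.array(preds)

# Stand-in parents and a toy additive predictor in place of a saved prediction formula
n_snps = 4853
mother = rng.integers(0, 3, n_snps)
father = rng.integers(0, 3, n_snps)
effects = rng.normal(0, 0.05, n_snps)
preds = virtual_cross(mother, father, 500, lambda g: float(g @ effects))
print("predicted trait for this cross: mean %.2f, best %.2f" % (preds.mean(), preds.max()))
```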
A long time ago in a galaxy far far away…   Actually, it was 1986 in Rochester, NY. Eastman Kodak had 60,000 employees in the community. Sales of photographic film (that stuff your grandparents used to take pictures before digital cameras) were expanding. Waste was too high and the product was too variable. After trying everything else, the corporate quality finally obtained a green light for an SPC program. Within four years, the variance for several key measures dropped by a factor of 100. Products that had averaged six formula changes per event went for six months without a change.   Photographic film manufacturing is no longer important for most of us, but the quality improvement processes used are as relevant today as ever. They are also enabled by JMP. In 1985 we used pencil and paper and mainframe SAS. Data collection sheets, cause and effect diagrams, regression analysis and SPC charts are all facilitated today with JMP.     Well, thanks  for  being  here  today. My  name  is  Ron  Andrews. I've  got  contact  information  listed  here, so  if  there  are any  questions after  the  fact, you  can  reach  me  at  these  addresses. Going to be  talking  about quality  improvement, a  very, very  general  term, but  this  is  specific  with  specific  results from  a  project  I  worked  on  many  years  ago. This  goes  a  long  time  ago in  a  galaxy  far,  far  away… Or  maybe  it  was  1986 in  Rochester,  New  York, at  Eastman  Kodak  dealing with  photographic  emulsions. A  little  history. Kodak  had  a  corporate  quality  council that  had  known  for  years that  we  really  needed  a  robust statistical  process  control  program. Management  wasn't  buying  it. They  didn't  want  to  pay  for  it. They  promoted  some  less  expensive  options like  slogan  contests  and  pep  rallies, and a  lot  of  you  know  about how  effective  they  are. By  1985,  sales  were  hitting  records, but  so  was  waste. So  the  council  finally  got approval  for  an  SPC  program. Though the  improvements I'm  going  to  talk  about are  a  small  part  of  the  total  effort. Within  the  emulsion  manufacturing at  Eastman  Kodak  that  I  was  working  with, I  was  one  of  several  engineers  and  a number  of  operators  working  on  this, so  I  contributed  to the  results  I'm  showing, but  I  was  by  no  means the  leader  for  the  whole  effort. So  why  light-sensitive silver  halide  emulsions? It's  kind  of  obsolete  technology, isn't  it? Well,  yeah,  probably. But  there's  still  three  companies that  do  this  on  a  regular  basis, and  there  are  still a  few  million  people  who  shoot  film. Most  of  all,  this  is  familiar  to  me, and  I  have  some  results  I  can  share. I'll  talk  about  the  basic  process and  what  do  the  chemists  tell  us? We'll  talk  about  several  different quality  improvement  tools like  data  sheets, and  cause  and  effect  diagrams, trend  charts, and  statistical  process  control  charts. Then  got  to  deal  with the  people  side  of  SPC. It's  probably  more  important than  the  statistics. And  then  I'll  deal  with  a  question  that I had  to  deal  with  directly  way  back  then: How  do  you  do  SPC  when  you only  make  six  batches  a  year? Before  I  really  get  started, I  need  to  acknowledge  the  leadership of  two  people. In  our  group  of  engineers, there  was  no  appointed  leader, but  Carl  Eldridge was  clearly  the  point  man. 
He  had  this  nice,  easy- going  manner and  could  talk  production  supervisors  into making  changes that  they  really  didn't  want  to. But   he'd come in, "W e're  just  going to  try  this  out  and  see  if  it  works. "And  if  it  works,  we'll  probably  keep doing  it  and  it'll  reduce  your  waste." He  would  talk  them  into  it. Kevin  Hurley  was  also  a  key  person. He  was  2nd-floor Emulsion M aking  group  leader. He  was  a  very  capable  leader and  had  the  trust  of  all the  people  who  worked  in  his  group. They  decided  they  really  wanted to  have  control  of  the  process. Engineers  could  decide  the  specs, but  they  wanted  to  control  the  process. Turned  out  to  be  a  very  good  decision. Overview  of photographic  film  manufacturing, and  this  is  the  50,000 -foot  level. We  weigh  out  the  ingredients. We  precipitate the  silver  halide emulsions. We  wash  them. We  take  samples  of  each  batch and  sensitize  them  at  three  different temperatures, choose  the  best  temperature, and  then  sensitize the  balance  of  each  batch. Then  we  assemble  all  the ingredients  necessary for a coating event, and  test  each  melt. A  melt  is  a  kettle ful. You  got   to melt the  gel. That's  where  that  term  comes  from. Then  make  corrections for  the  layers  out  of  spec, and  there  will  be  some. In  those  days,  it  was  a  given. Then  we  coat  a  short  pilot, and  then  we  adjust  the  formulas, and  then  we  coat a  short  re- pilot about  a  week  later and  adjust  the  formulas  again. And  then  if  things  are  looking  good, we   coat the  remaining  emulsions in one  or  two  large  runs and  test  the  results. And if  necessary,  take  the   coated rolls back  to  the  coating  ally and  apply  filter  dyes to  correct  the  color  balance. If  it's  not  already  obvious, everything  in  red isn't  an  adjustment  step. These  are  things  we  did  because  we  didn't always  get  it  right  the  first  time. It's  basic  product  control. Kodak  has  some  of  the  most  extensive and  elaborate  product  control  methods I've  ever  seen  or  heard  about. It's  not  necessarily  a  market  distinction. I'm  focusing  on  emulsions  because the  products  that  I  was  dealing  with, basically  Kodachrome and  Ektachrome  slides, the  light -sensitive  silver  halide emulsions  were  by  far the  biggest  contributors  to  variability. In  the  emulsion  manufacturing  process, we  were  still  using the  old  school  equipment. There  were  some computer- controlled  systems, but  we  were  dealing  with open  kettles  and  gravity  flow from  jars  into  the  main  kettle. The  main  kettle  started  with  water, phthalated  gel,  sodium  bromide, and  potassium  iodide. We  had  three  jars: one  prepped  with  silver  nitrate, another  with  ammonium  hydroxide, another  with  sulfuric  acid. We  start  by  running  the  silver  nitrate through  disc  orifices. There  would  be  a  set  of  discs  with calibrated  holes  drilled  in  them. That  was  basically  our  flow  control. Now,  gravity  flow  is  extremely  consistent if  you  keep  the  geometry  consistent. Big  "if"  there. Once  we  had  all  of the silver  nitrate  in  there, we  formed  a  number of  silver   halide crystals. We  pour  in  the  ammonium  hydroxide. Ammonia  is  a  silver  solvent. It  dissolves  the  little  crystals, and  they  plate  out  on  the  big  crystals, so that's  our  growth  step. 
Then  we  go  into  the  washing  step. We  need  to  remove  the  salts, the  nitrate  and  the  sodium  and  the  iodide… Not the sodium, the  potassium.  Excuse  me. We  add  acid,  which,  first  of  all, quenches  the  ammonia  reactions, and  second  of  all, it gets  the  pH  low  enough so  that  the  phthalated  gel  coagulates and  drops  to  the  bottom of  the  kettle  with  the  silver. At  this  point, we  siphon  the  supernatant  liquid  off and  complete  the  washing  step. Some  effects  we  knew  about. We  knew  grain size  was  proportional to  the  silver  run  time. That's  the  total  time  it  takes for  the  silver  to  run  into  the  kettle. If  the  silver  is  running  longer, that  means  it  was a  lower  flow  rate  initially, where  the  individual  grains  are  formed. If  you  have  fewer  grains and  add  the  same  amount  of  silver, you're  going  to  grow  them  larger. Temperature  is  also proportional  to  run  time, as  is  the  amount  of  ammonia. That's  not  directly  proportional. It's  very  nonlinear. It's  a  very  steep  slope  to  start  with,  and  then  it  levels  out. In  addition  to  grain size, we  had  to  deal  with  fog. Fog  is  what  you  get  when a  silver   halide  crystal  develops without  having  been  exposed  to  light. We  don't  form  images  that  way, so  we  need  to  minimize  this. That's  proportional  to  the free  ionic  silver  concentration and  to  some  extent,  the  temperature. Now,  for  any  chemists  in  the  group, the  solubility  coefficient for  silver  bromide is  something  like  5  times   10⁻¹³ . The  free  ionic  silver  concentration is  extremely  low, but  it  still  makes  a  difference. Variation  in  this  level  makes  a  difference in  the  photographic  properties. We  prepared c ause  and  effect  diagrams on  paper,  hand- drawn. I  really  wish  we  had a  tool  like  the  one  in  JMP, where  you  list  the  key p arent  parameters. In  this  case,  we're  looking  at  grain size, and  then we  have  materials,  methods,  etc., t hat  might  affect  that. And  then  you  move  these  child  parameters over  to  the  parent  side and  list  the  things that  might  affect  that. As  far  as  I  know,  there's  no  limit to  how  many  branches you  have  on  your  diagram. Once  you  have  this  table  made  up, you  identify  the  child  column and  the  parent  column and  hit  the  OK  button, and  out  pops  the  diagram. I  don't  know  of  another  way that's  as  easy, and  I'm  pretty  sure  there's  nothing  else as  easy  when  you  have  to  modify  something. Instead  of  moving  boxes  around on  a  graphic  chart, you  just  edit  one  or  two  of  the  lines, or  maybe  delete  one,  add  one,  and  hit  the  button  again. That's  all  there  is  to  it. Now,  all  of  these items  listed  on  this  chart can  potentially  affect  the  grain size. But  when  it  came  down  to  it, the  run time  and  the  variation from  one  disc  orifice  to  another, and  the  variation  from  kettle  to  kettle were  the  most  important  things. We  also  did  this  for  the  vAg. vAg is a  measurement  which  is as  close  as  we  can  get to  measuring  the  actual free  ionic  silver  concentration. We  have  basically  the  same  things listed  here,  but  in  this  case, it's  the  percent phthalation which  affects  the  washing, and  the  siphon  level  which is  directly  related  to  the  washing. These  are  the  two  critical things  in  controlling  the  vAg. 
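To make the parent/child table idea behind the cause-and-effect diagram concrete, here is a minimal JSL sketch of the kind of two-column table the Diagram platform expects. The rows are purely illustrative, not the original Kodak diagram, and the platform launch itself is left as the manual step described above.

// Each row says "Child is a potential cause of Parent"; rows are illustrative only.
dt = New Table( "Grain Size Causes",
	New Column( "Parent", Character,
		Values( {"Grain Size", "Grain Size", "Materials", "Methods"} ) ),
	New Column( "Child", Character,
		Values( {"Materials", "Methods", "Phthalated Gel", "Run Time"} ) )
);
// In JMP, launch the Diagram platform (under Quality and Process in recent
// versions), assign the Parent and Child columns, and click OK to draw it.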
Going  through  some  of  the  conventional quality  improvement  tools, we  had  data  sheets. We  had  14x 17  ledger  books, about  six  inches  thick. They  had  years  worth  of  data of  several  hundred  emulsion kinds , and  they  were  in  a  lab that  was  hard  to  get  to. You  had  to  go  through  a  dark hallway  to  get  there. When  we  learned   where it  was and  how  to  get  there, we  started  borrowing  the  pages and  transcribed  the  data  on  the  emulsion kinds  of  interest  into  SAS  datasets. It's  a  lot  easier  to  use things  in  digital  form. If  we'd  had  JMP,  the  data  tables would  have  looked  something  like  this. Each  of  the  emulsion  kinds   had a  four- digit  number  identifying  it. We  had  sequential  batch  numbers. We  recorded  the  date. We  recorded  the  kettle  used, and  then  we  recorded a  number  of  parameters. This  is  the  run time  in  seconds. pHs  after  several  different  process  steps, and  the   vAg at  the  end. This  is  an  early  trend  chart. We  hadn't  put  control  limits  on  it  yet. This  is  the  run time. Significant  variability  here. We  could' ve  done  extensive regression  analyses to  try  to  determine what's  really  influencing  this. The  first  step  was  easy. We  overlaid  the  kettle  designations. It's  pretty  obvious. You  don't  need  any  special  analysis to  know  these  kettles  are  different. These  kettles  have been  there  for  a  long  time, and  it  wasn't  really  possible to  completely  rework  them, so  we  restricted  each  emulsion  kind to  a  particular  kettle. Kind 6001  was  restricted  to  kettle  602. I'll  get  into  more  details on  the  control  charts  later, but  just  to  show  the  data. This  early  unrestricted  phase. We were not  using control  limits  at  the  time, but  this  was  our  initial  variability. And  then  we  restricted  the  kettle, and  we  got  a  large  reduction in  the  variability. And  then  one  of the  other  engineers  got  the  idea that  maybe  all  those  disc orifices weren't  created  equal. He  set  up  some  experiments  and  ran  some water  batches  and  timed  them  all, and  found  there  were consistent  differences with  different  sets  of  disc  orifices. We  restricted  a  given  set  of   disc orifices to  a  given  emulsion  kind. We  had  a  file  drawer  with  a  folder for  each  emulsion  kind, and  there  were  envelopes  in  there that  had  the  disc  orifices  in  there. We  had  to  make  more  of  them, but  it's  just  a  little  disc  of  metal with  a  hole  drilled  in  it, so  it  was  not  expensive. That  also  gave  us  another big  drop  in  variability. A  number  of  things  we learned  in  next  few  months. I  mentioned  the  phthalated  gel that  coagulates  when  the  pH  gets  low. We  needed  the  percent phthalation  to  be  correct. The  gel  plant  couldn't  hit  it exactly  with  a  single  batch. They  had  to  blend  batches  together to  hit  the  4.5%, plus  or minus  the  of tenth of  a  percent aim  that  we  were  shooting  for. 
That  worked  if  the  batches  were  not  too far  apart  in  their  percent  phthalation, but  if  you  had  a  batch  that  was very  high  in  its  percent  phthalation and  a  batch  that  was  rather  low in  its  percent phthalation, when  you  mix  them  together and  go  through  the  wash  process, that  high-phthalation  gel  is  all  going  to drop  out  to  the  bottom  of  the  kettle, but  only  part  of  the  low -phthalation  gel is  going  to  fall  out. So  we  had  variable  amounts  of  gel being  transferred  to the  next  step  in  the  process, depending  on  the  decisions they  made  in mixing  gel  batches. We  came  up  with  a  rule  that  mixed batches  had  to  be  within  1%  of  each  other. It's  not  perfect, but  it  was  a  big  improvement. We  mentioned  run time  and  our  restriction on kettles and  disc orifices. We  also  improved  our measuring  of  the  run time. We  used  to  rely  on  operators  watching the  clock  as  they  opened  the  valve, and  watching  the  clock  as  the  last little  bit  of  silver  nitrate  ran  out. We  put  a  switch  on  the  valve so that  the  clock  started  then, and  we  had  a  sensor  in  the  line so  that  when  the  last  little  bit  ran  out, it  stopped  the  clock. Better  data  always  helps. We  learned,  quite  by  accident, that  if  you  have  a  delay  when  you're setting  up in  the  process and  you  cook  the  gel  a  little  bit  longer than  usual,  it  loses  buffering  capacity. With  less  buffering,  when  you  add  acid to  coagulate  the  gel, that  pH  is  going  to  drop  farther than  what  you  really  wanted. We  discovered  this  during  the trend chart phase  in  our  emulsions. One  of  the  operators looked  at  the  data  and  said, "This  lot 's  different.  All  the  pHs are  different o n  this  particular  batch." Looked  at  it  and  agreed, "Yeah,  that's  different.  There's something  really  unique  about  this  batch." And  conversations with  the  operator, " Do  you  know  of  anything  that  happened different  on  this  particular  batch?" He  volunteered, "W ell,  I  had  a  problem with  the  ammonia  jar, "and  I  had  to  dump  it and  start  over  again, "so  there was  a  delay  in  getting  started." Another  operator  chimed  in, "I  had  a  batch  that  looked  like that  in  terms  of  the  pHs  a  while  back. "Let's  go  look  at  that." And  we  dug  out  the  data  for  that  one, and  the  timestamps  said,  yeah, there  was  a  delay  in  starting  that  one. The  pHs all were  more  variable. They  were  farther  off. The  higher pHs were higher and the  low  pHs  were  lower. So  we  did  more  experiments on  the  bench  scale and  found, yeah, there  was  a  real  effect  there. And  the  chemist  volunteered  that,  yeah, they  knew  it  could  happen, but  they  had  no  idea that it  happened  this  fast. So  we  put  a  limit  on  the  gel  prep,  a  time  limit. If  you  haven't  started  using  it within  a  given  time  frame, you  dump  it  and  start  over. It  really  does  make  sense  to  dump a  couple   hundred dollars worth of  gel  and  salt rather  than  adding  tens  of  thousands of  dollars  worth  of  silver  to  that  kettle and  running  a  risk  of  dumping  that. We  also  learned  in  the  washing  process, it  was  better  to  be consistent  and  imperfect than  strive  for  perfection and  getting  greater  variability. 
That  is,  our  operators  had  long  been  told in  that  washing  process, the  good  stuff' s  in the  bottom  of  the  kettle. That  silver  and  gel  down  there at  the  bottom,  that's  the  good  stuff. Don't  you  dare  suck  any  of  that  out in  the  siphon  wand, but  get  all  of  the  supernatant  liquid you  possibly  can  out. The  only  problem  was  the  coagulation didn't  always  have  the  same  density. Sometimes  it was  nice  and  compact in  the  bottom  of  the  kettle, and  sometimes  it  was  a  little  fluffy and  took  up  more  space, and  you  couldn't  siphon  down  as  far. Rather  than  siphoning down  as  far  as  possible, we  got  more  consistent  results  when  we specified  exactly  how  far  to  siphon. For  kind  6001, we  went  down  to  number  23 on  the  siphon  wand. We  put  markers. Basically,  we  put  a  measuring  stick along  the  siphon  wand and  had  different designations  for  different  kinds. If  we  really  needed  to  get  that free  ionic  silver  concentration  lower, we  added  on  an  extra  washing  step. We  re dispersed  the  gel  by  adjusting the  pH  and  then  recoagulated  it. Looking  at  the  vAg  chart, this  was  the  initial  area, and  this  is  when  we started  restricting  the  kettle. Not  much  change. It  looks  like  there  might  be a  slight  reduction, but  I  wouldn't  brag  about  that. In  this  last  phase… Well,  okay,  we  restricted  DOs  here, but  the  real  change is when  we   add a  standard  siphon  level rather  than  siphoning  as  far  as  we  can. That  made  a  real  difference. We  had  reduced  variability, so  we  continued  that. Consistency  is  worth  more  than the  ultimate  performance, especially  if  you  can't  repeat  that ultimate  performance  every  time. Early  successes  like  these were  worth  their  weight  in  gold. The  enthusiasm  and  increase  in  morale  that that  brought  about was  possibly  worth  more  than  gold. It  was  priceless. Few  things  get  people  more  excited  than having  them  have  their  own  results result  in  dramatic  improvements in  the  product. How  do  you  sustain  improvements, and  how  do  you  keep  learning? Well,  I've  already  showed  you some  control  charts, but  SPC  charts  are  really  the  way  to  go. As  I  indicated, we  decided  to  make  them  operator -centered, as in  put  the  operators in  control  of  the  process. Now,  the  people  side  of  SPC is  probably  more  important than th e  statistics. Some  people  take to  SPC like  ducks  to  water, and  some  people, it's  more  like  cats  to  water. Now,  I  know  there  are  some  cats who  actually  can  swim, but  most  cats  are  going  to  react more  like  this  one  does. They're  going  to  get  out  of  that  water as  fast  as  they  possibly  can. Now,  that  2nd- floor  Making  group, they  were  in  the  ducks  to  water  category. The  6th- floor  Making  group, which  is  what  I  dealt  with  more often  with  the   Kodachrome products, I  won't  call  them  cats  to  water, but  they  were  skeptical. I  had  to  prove  it  to  them that  this  was  going  to  work before  they  really  bought  into  it. It  took  longer,  but  we  did  get  there. I  hope  most  of  you  are  familiar  with the  work  of  W.  Edwards  Deming. I  was  fortunate  to  attend  one  of  his four -day  seminars back in 1992. Happened  to  be  the  last  year  of  his  life. He  was  92  at  the  time. 
He  was  one  of  the  preeminent quality  control  and  quality  improvement experts  in  the  world  at  the  time. The  Deming  Award  in  Japan is  named  for  him. They  still  give  that  award  every  year to  the  company  showing  a considerable  improvement  in  quality. If  you  are  not  familiar  with  him, first  of  all, look  up  Deming's  14  points  and  read  them. Second  of  all,  get  his  book. Well, he  wrote  several  books. I  think   Out  of  the  Fear   was  the  last  one. Read  that  as  well. But  point  number   8 of  his  14  points  says, "Eliminate  fear." Allow  people  to  perform  at  their  best by  ensuring  that  they  are  not  afraid to  express  ideas  or  concerns. Think  about  that  operator  that  volunteered that  he  had  made  a  mistake and  that  caused  a  problem with  that  particular  batch. He  volunteered  that  freely. I've  been  other  places  where  operators are  often  punished  for  making  mistakes, at  least  reprimanded. When  that  happens, they  don't  admit  mistakes. They  cover  them  up, and  you  don't  learn  things. You  got  to  work  against  that. Everybody  has  to  be  able  to freely  express  what  happened, what   good happened, what  bad  things  happened, and  to  communicate  freely. It  opens  up  a  whole  world  of possible  improvements when  you  have a  free  exchange   of  information  like  that. Getting  down  to  the  SPC  charts. As  I  mentioned,  we  started  with  the  charts in  control  of  the  operators. To  do  this,  you  got  to  keep  it  simple. Not  that  operators  can't  learn to deal  with  complicated  charts  eventually, but it's  going  to  take  longer and  the  training  process  will be  longer  for  new  employees. It's  worth  something t o  keep  it  simple. We  used  a  chart  of  individuals. We  omitted  the moving  range  part  of  the  chart. I  know  this  may  be  heresy  for  some quality  control  purists, but  we  looked  at  that  and  said it  doubles  the  complexity  of  the  chart. We  know  it  adds additional  useful  information, but  it  doesn't  double the  amount  of  useful  information, so  we're  going  to  forgo  that  for  now. We  also  use  only  two  run  rules. A  point  was  out  of  control  if  one  point was  beyond  three  sigma, or  two  out  of  three  were  beyond  two  sigma. That was the  only  criteria. Obviously,  there  are  six  more traditional  rules, and  other  sets have  even  more  run  rules. We  kept  it  simple, and  this  kept  us  busy. We  still  had  a  number  of out- of- control  events  to  investigate, so  it  kept  us  hopping. It  was  about  all  we  could  handle. It's  also  necessary  to  think  about what  limits  you're  going  to  set. I  think  that's  actually  on  the  next  slide, so  I'll  get to that in a  second. I'm  getting  ahead  of  myself. We  had  daily  meetings to  assess  the  charts. Operators  would  present  them. They  would  indicate  points that  were  out  of  control, and  engineers  were  there to  comment  about  what  we  know  about  it and  help  investigations. Most  importantly,  we  had  celebrations for  out- of -control  situations. Literally. When  an  operator  indicated  that  something was  out  of  control,  we'd  say  thank  you. Thank  you  for  sharing  that  with  us. Let's  see  what  we  can  do working  together  to  find  out  what  happened and  maybe  fix  something. Here's  that  slide  that  I  was getting  ahead  of  myself  with. 
How  do  you  set  the  limits? Purists  insist  that  the  control  limits must  be  based  on  short-term  variability. That's  the  definition  of  control. The  process  is  in  control when short -term  variability matches  long- term  variability. Pragmatists  know  that  even  if  you set  the  limits  a  little  bit  wider, say  maybe  take  the  first  30  points, take  the  standard  deviations, set  the  limits  of  three  sigma, even  at  that  point,  you're  still  going  to have  out -of -control  points  to  deal  with. If  alarms  happen  too  often, they're  going  to  be  ignored. Set  the  limits  that  are a  challenge  and  achievable. You  got  to  walk  that  tightrope. Now,  I  would  suggest  deciding how  you're  going  to  set  the  limits and  then  stick  with  that  method  until you decide  you  have  to  make  a  change. Don't  just  do  it  totally  on  a  whim, but  set  a  definition  that's comfortable for  your  situation,  and  run  with  it. Most  of  all,  you  got  to  keep striving  for  continuous  improvement. Looking  at  the  results. Now,  so  far,  I've  just  been  talking  about the  emulsion  making  operation. The  next  operation,  the  sensitizing, is  where  there's  a  considerable  boost of  the  photographic  properties. We  test  the  photographic  properties after  the  sensitizing  step. The  lot -to -lot  standard  deviation for  the  photographic  speed dropped  from  about   10 units, that's  about  a  third of a  stop  for  those familiar  with  that  photographic  term, to  about   1 unit. Actually, it  was  lower  than  that because  the  standard  deviation of  the  test  process  was  about  one  unit. We had  more  than  a  ten fold  reduction in  the  standard  deviation. If  you  want  a  more  impressive  statistic, we  had  more  than   a hundredfold  reduction in  the  variance. The  formula  adjustments from  one  coating  event to  the  next  dropped  drastically. We  had  some  products that  went  from  six  changes  per  event to  zero  changes  over  a span  of  six  months. When  we  started  this,  we  had  no  idea  that we  could  possibly  get  anything  that  good. Now,   I  want  to  get  back to  that  question  I  posed  earlier. How  do  you  implement  SPC  when  you only  produce  six  batches  per  year? One  of  my  particular  products was  Kodachrome 25. That  was  a  old  and  venerable  product that  had  once  been  quite  popular and  had  been assigned  to  the  larger  kettles. But  a  lot  of  the  market had  switched  to  higher- speed  products like  the   Kodachrome 64 or  the  even  higher -speed Ektachrome  slide  films. It  was  a  rather  small  runner by  the  time  I  was  responsible  for  it. A  couple  of   emulsion  kinds, we only produced  six  batches  a  year . n equals 6 is  not  very  good  for  statistics. My  answer  to  the  question is  what  I  call  creative  swiping Simply  copy  the  procedures that  were  found  to  be  useful on  the  large -running  constituents and  copy  the  same  ones for  the  small  runners. Now,  they're  the  same  class  of  emulsions. Same  basic  technology, gravity  flow  containers,  ammonia  digest, phthalated g el,  coagulation  for  washing. We're  using  the  same  basic  process. You  find  out  what  works in  the  large  runners, apply  it  to  the  small  runners, and  we  got  similar  improvements. 
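Circling back to the limit-setting question above, here is a minimal JSL sketch of the pragmatic approach: take the first 30 points and set individuals-chart limits at three sigma. The table and the Run Time column are assumptions for illustration, and the sketch assumes at least 30 rows of baseline data.

// Pragmatic baseline limits: mean +/- 3 * SD of the first 30 batches.
// "Run Time" is a hypothetical column name.
dt = Current Data Table();
vals = Column( dt, "Run Time" ) << Get Values;
baseline = vals[1 :: 30];
center = Mean( baseline );
sigma = Std Dev( baseline );
lcl = center - 3 * sigma;
ucl = center + 3 * sigma;
Show( center, lcl, ucl );
// A purist individuals chart would instead estimate sigma from the average
// moving range divided by 1.128 (short-term variability), giving tighter limits.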
By  the  way, these  charts  with  the  blue  background, these  are  actually  scans  of 35 millimeter  slides that  I  used  in   an  internal  presentation at  Kodak  back  in  1988. They  were  computer -generated by  a  firm  called  Genigraphics. I  think  they  charged  $6  a  slide. A  lot  of  things  have  changed  since  then. This  is  looking  at  the  vAg   in finishing. Previous  data  I'd  shown  was in  the  making  operation. Finishing  is  the  sensitizing  step. This  is  the  last  step  before  you put  the  emulsions  into  a  coating event. We  got  a  significant reduction  in  the  variability. Now,  contrast  balance. I  got  to  explain  this. One  of  the  most  important  things of  a  color  film is you  have  three  different  color  records: red,  green,  and  blue. You  got  to  keep  the  contrast  of  those three  different  records  the  same. They  got  to  match  each  other. If  they're  all  a  little  bit  off, it's  not  too  bad, but  they  got  to  match  each  other, so  the  contrast  balance is  the  most  important  parameter. If  it's  off,  you  could  end  up  with green  highlights  and  pink  shadows, There's no  way  people  can  correct  that in  these  pre -Photoshop  days. In 1987, we  had  a  pretty  wide  spread  of  results in  this  two- dimensional  plot. The  hexagon h ere  are  the  spec  limits. This  95%  confidence  ellipse  indicates there  will  be  more  outside  of  spec. There's  one  here,  but  there are going  to  be  more  over  time. By  1988, we'd  collapsed  the  variability  down  to this  nice,  tight  little  group centered  pretty  close  to the  center  of  this  hexagon. This  made  my  work,  my  job,  so  much  easier, especially  in  terms  of  adjusting  things from  one  coating   event  to  the  next. They  became smaller  and  smaller  adjustments, and  eventually  not  having  to  adjust. In  summation, there  are  many  standard quality  improvement  tools. You  don't  have  to  use  all  of  them. Pick  the  ones  that  fit  your particular  situation  and  use  them. Technical  staff  should  define the  formulas  and  specifications. We  found  a  huge  benefit  to  having the  operators  in  control  of  the  process. They're  going  to  need  plenty  of  support, but  this  is  the  only  way  to  get  the really  rapid  feedback on  what's  actually  going  on. You  got  to  keep  it  on the  simple  side  to  make  this  work. And  most  important  of  all, you  got  to  celebrate  those  opportunities to  learn  and  make  improvements. That's  the  end  of  my  presentation. Repeat  the  contact  information. If  anybody  has  questions, I'll  be  glad  to  answer  them. Thank  you  very  much.
One of JMP’s strengths is the ability to read and write to a variety of data sources.  At Janssen we store much of our data in Oracle databases. This talk compiles some tips and tricks for getting the two applications to talk to each other. These tips and tricks were compiled over a 15-year period, developed using JMP versions 7 through 16 and Oracle versions 10 through 19. Topics include: Finding that connection string to Oracle Oracle ODBC connections without a Data Source Name Pulling data from Oracle Inserting data into Oracle Fast data loading into Oracle Faster data loading into Oracle Executing Oracle PL/SQL procedures Error trapping and handling Building Oracle IN lists Miscellaneous SQL tips and tricks     Hello  everybody. My  name  is  Peter  Mroz. I'm  with  Janssen  Pharmaceutical, and  today  I'm  going  to  talk  to  you about  how  combining  JMP  and  Oracle can  lead  to  a  happy  marriage. I  work  for  Janssen  R&D. We're  a  wholly- owned subsidiary of  Johnson  and  Johnson, and  our  charter  is to  discover  and  develop innovative  medicines  and  solutions that  transform  individuals  lives and  solve  the  most  important unmet medical  needs  of  our  time. Within  that  world, I'm  in the  Global  Medical  Safety  Department, and  our  charter  is to  protect  patients  by  driving robust  medical  safety  excellence and  benefit -risk  assessment. And  then  within  that  department, I'm  in  a  group called  Methods  and  Analysis, and  our  aim  is to  develop  and  implement  analytic  tools to  increase  efficiency and  analytical  capability to  detect  and  evaluate  safety  signals. Again,  my  name  is  Peter  Mroz and  I've  been  a  JMP  user  since  2007. Standard  Disclaimer. These  are  my  views and  do  not  imply  any  endorsement of  any  product  by  Janssen  or  J &J. Here's  our  agenda  for  today. I'm  going  to  give  an  introduction and  then  we'll  jump right  into  Oracle  things. We'll  talk  about  ODBC,  configuring  the  Oracle  client, the  ODBC  connection  string, bringing  data  from  Oracle  into  JMP, and  then  writing  data back  to  Oracle  from  JMP, then  fast  data  insertion, faster  data  loading, then  executing an  Oracle  PL /SQL  procedures, hiding  the  passwords, error  trapping, building  IN lists, and  sprinkled  throughout, I've  got  some  miscellaneous  tips. So  my  department is  called  Global  Medical  Safety, and  we  collect,  process, and  report  and  analyze  adverse  event  data for  the  medicinal  products that  we  produce. They  are  mostly  spontaneous  cases, although  there  are  some clinical  trial  cases. These  are  called  post -marketing. They're  a  man  or  woman  in  the  street, walking  down  the  street, and  you  experience some  sort  of  drug  side  effect, and  you  call  it  into  our  call  center and  we  run  it  through  our  process and  store  it  in  Oracle. Our  volume  is  about  5,000  cases  a  day, and  a  case  consists  of  a  person... They're  not  a  patient, they're  not  in  a  clinical  trial. It's  a  person, the  drugs  they  took, the  events  they  experienced, the  side  effects  they  experienced, maybe  some  medical  history, and  we  store  something  called  a  narrative. And  here's  an  example, patient  narrative. S ubject  had  cancer, which  was  diagnosed  in  June  1998, et  cetera,  et  cetera. 
It's  a  lengthy  story  about the  patient  and  the  side  effects, and  this  is  very  important for  our  surveillance  physicians and  other  scientists  to  look  at. This  is  stored  as  a   [inaudible 00:02:52]   in  Oracle,  by  the  way. JMP,  as  we  all  know, is  great  at  statistical  analysis and  visualization. Oracle  is  great  at  data  storage and  transactional  processing. Our  users  want  to  analyze  and  visualize data  from  Oracle  using  JMP. Primarily,  we  look at  tabular  reports  of  safety  data, summary  information, patient  narrative  drill downs, and  we  do  some  visualization of  safety  data via  trending  or  forest  plots. With  all  that, JMP  and  Oracle  together  make a  happy  marriage. We'll  start  by  talking  about  ODBC, which  stands for  Open  Database  Connectivity. ODBC  drivers  access the  database  using  SQL. SQL  stands  for  Structured  Query  language. So  the  ODBC  driver  allows a  JMP  client  software to  communicate with  the  Oracle  database. The  first  thing  you  have  to  do is  install  the  Oracle  client  on  the  PC. This  is  an  exercise  left  to  the  reader. It's  not  a  tutorial  on  installing  this, so  you  can  Google  it. However,  once  you've  installed  it, there  are  a  couple  of  things  you  need to  supply  for  the  Oracle  client. You  need  to  define two  environment  variables. One  is  ORACLE_ HOME, the  other  is  TNS _ADMIN. So   ORACLE_HOME  points  to  a  folder where  the  client  is  actually  installed. Here,  it's  in   C, Oracle , 19 , client _1. And  then  you  want to  include  the  bin  directory in  the  path  environment  variable. So  in  this  case, it's  the   ORACLE_HOME   with the  slash bin. The   TNS_ADMIN  points to  the  location  of  TNSNAMES. ORA, —I'll  explain  what  that  is  in  a  second— and  that's  typically  located in  the  network  admin  path underneath  the   ORACLE_HOME . Here's  a  hint; you  can  point  this   TNS_ADMIN  variable to  a  file  share  location so  multiple  users  can all  point  to  this  file and  it's  easier  to  maintain one  version  of  TNSNAMES.ORA. What  exactly  is   TNSNAMES.ORA? It's  a  configuration  file. It's  like  a  secret  decoder  ring, it  translates  between  a  database  alias and  information  needed by  the  Oracle  client to  talk  to  your  database. Here's  my  example; my  alias  is   MYDEVDB, and  then  here's  my  description for  how  to  connect   MYDEVDB to  my  Oracle  database. With  that  completed, now  we  need  to  determine an  ODBC  connection  string, and  the  easiest  way  to  do  this  is  to  click the  Windows  button  and  type  ODBC, and  we  want  to  match  the  hatch. For  64 -bit  JMP, we  want  the   64-bit  ODBC  data  sources. For  32 -bit  JMP, we  want  the  32- bit   ODBC  data  sources. I  have   64-bit  JMP, so  I  clicked  that and  I  bring  up  this  screen  here, and  I  click  on  the  drivers  tab, and  here  are  the  drivers  for  ODBC that  are  installed  on  my  system. I  have  three  Oracle  clients and  one  SQL  Server. I  have  the  Oracle  client version  11, 12,  and  19, so  I'll click on  this  Oracle  version  19  driver, and  you  want  to  make  note  of  this; Oracle  in  Ora Client19 Home 1. That's  all  you  need  to  know. Now  we  have  our  ODBC  connection  string. We  combine  that  like  so with  driver  equals  that  string, DBQ  equals  our  database  alias. UID  equals  your  username, PWD  equals  your  password. 
So  here's  a  fully -formed ODBC  connection  string. Driver  equals  my  Oracle  19  driver. DBQ  is   MYDEVDB, username  is  MYUSERNAME, password  is   MYPASSWORD. Okay,  now  that  we're  all  configured, we  can  bring  data  into  JMP. There  are  several  ways to  get  Oracle  data  into  JMP. Ther e's   Open  Database,  Execute  SQL , New  SQL  Query,  and  Query  Builder, which  is  under  the  File  Database  menu. This  talk  will  focus  exclusively on  Execute  SQL because  you  can  create a  database  connection and  then  you  can  execute several  SQL  commands  with  Execute  SQL and  then  close  the  database  connection. If  you  compare   that  to  Open  Database, Open  Database  in   one  call, opens  the  connection, runs  a  SQL  command, closes  the  connection. So  if  you  have  20  SQL  statements, you're  opening  and  closing the  connection  20  times, whereas  with  Execute  SQL, you  only  open  it  once, execute  your  20  commands with  20  execute SQL  commands, and  then  close  the  database  connection, so  it  speeds  things  up. In  the  scripting  index, this  is  what  Execute  SQL  looks  like. It  takes  the  following  arguments; there's  a  database  connection  handle which   is here, defined  by  your  connection  string. There's  either  a  SELECT  statement or a  SQLFILE equals  statement, or  a SQL FILE  equals  a  pointer  to  a  file containing  your  SQL  commands. An  invisible  keyword, if  you  supply  a  table Name, that's  equivalent  to  saying SELECT  star  from  table Name, and  then  an  output Table Name  provides the  name  for  the  JMP  data  set and  Execute  SQL  returns, pointer  to  a  table if  you're  issuing  a  SELECT  statement , which  returns  a  data  set. Here's  an  example; I  have  my  connection  string, driver  equals  Oracle  in  Ora Client19 Home1, MYDEVDB,  my  username, my  password. I'm  calling it  create  database  connection with  this  string. I've  got  my  SQL  statement  here, I'm  selecting  some  columns from  a  table  called   eba_sales _salesreps, and  I'm  passing  my  connection, my  SQL  statement, and  then  a  title for  the  table  to  execute  SQL, then  I'm  closing  the  database  connection. Here's  my  table  I  rendered  from  Oracle. That's seven  columns,  20  rows. Here's  the  first  tip, and  that  is  if  you  have  a  string with  a  single  quote  inside  it, in  order  to  use  it  with  Oracle, you  have  to  replace  that  single  quote with  two  single  quotes  like  this. Here,  my  SQL  statement  is  SELECT  star from  my _table  m, where  m  name  equals  O'Malley, and  since  this  is  inside  the  string, I  have  to  replace  it  with  two  quotes. Here's  the  second  tip; use  column  aliases  for  readability. These  are  my  column  aliases, so  here's  my  column, and  then  in  double  quotes, I've  got  an  alias. You  notice  it's  mixed  case, there  are  spaces  in  there, it  makes  it  more  readable. Here's  my  table with  the  more  readable  column  headers. The  other  thing  about  this  is, I'm  using  backslash  open  square  bracket, close  square  bracket  backslash to  avoid  the  need to  escape  my  double  quotes. You  notice  I've  got  double  quote  here, then  I've  got my  backslash open  square  bracket, and  then  I've  got  double  quotes, and  then  I  close  it  out  here so  it  looks  a  lot  cleaner. What  if  you  want to  write  data  back  to  Oracle? 
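Before turning to writes, here is a minimal JSL sketch of the read pattern just described: one connection, one or more queries, one close. The driver name, alias, credentials, table, and column names are placeholders rather than a working connection, and the square-bracket quoting shown is the same trick used above to avoid escaping double quotes.

// Open one connection, run a query, close the connection.
// Every connection detail below is a placeholder.
dbc = Create Database Connection(
	"Driver={Oracle in OraClient19Home1};DBQ=MYDEVDB;UID=MYUSERNAME;PWD=MYPASSWORD;"
);
// Column aliases in double quotes for readable headers; columns are hypothetical.
sql = "\[SELECT rep_id "Rep ID", last_name "Last Name" FROM eba_sales_salesreps]\";
dt = Execute SQL( dbc, sql, "Sales Reps" );   // returns a JMP data table
Close Database Connection( dbc );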
You  can  issue  an  UPDATE  statement or  an  INSERT  statement. Here  is  my  UPDATE  statement, and  I'm  simply  defining  that, passing  to  a  variable, and  then  passing  to  Execute  SQL. I'm updating  the  table, setting  the  last  name  to  Smith where  the  first  name  is  Sweed. Or  here  I'm  inserting a  new  record  into  this  table and  I'm  setting  the  value  of  these  fields to  the  values  shown  here. One  thing  to  note about  UPDATE  and  INSERT is  Execute  SQL  does an  implicit  COMMIT  for  these  commands, so  you  don't  need  to  do a  COMMIT  yourself. One  thing  about  INSERT, if  you  have  multiple  INSERT  statements to  execute, they  can  be  slow, so  I  found  an  alternative which  is   INSERT ALL. Here's  an  example where  I'm  inserting  into  this  table. The  column  is  called  sample _number and  I'm  inserting  ten  values  all at  once  with  one  statement. The  only  weird  thing  is, you  have  to  put  something like  SELECT  1  FROM  DUAL at  the  end  of  it and  then  it  works. Let's  see  that  in  action. I  want  to  insert  100  rows  one  by  one and  compare it  to  100  rows  all  at  once. I  have  a  little  example  here, let's  go  ahead  and  run  it. So  it  took  8  seconds  to  do  one  at  a  time versus  0.15  for   INSERT ALL, and  so  it  was  52  times  faster. Let's  look  at  the  code  a  little  bit. We  have  making  a  connection, truncating  a  table. Here's  my  one  at  a  time. I  have my Execute SQL  inside  my  loop, and  by  the  way, I'm  looping  a  hundred  times. Here,  Execute  SQL  is  inside  the  loop, and  down  here  for   INSERT ALL, I'm  starting  with   INSERT ALL and  I  keep  adding into  the  table,  fields,  values. Keep  adding  that, and  then  I  only  run one  SQL  command. And  if  I  look  at  this  command, you can see it's  pretty  hefty. Here's  my  SQL  statement. It's  very  long, but  it  took  0 .15  seconds  to  run. INSERT ALL into  TEST  IMPORT, field  names,  values, into, into, into. Okay. Let's  go  back  to  slide  mode. What  if  you  have more  than  a thousand  rows to  insert  into  your  database? What  if  you  have  10,000  rows or  50,000  rows? You  can  use  a  tool  from  Oracle  called SQL Loader  for  faster  data  loading. SQL Loader  requires  a  data  file which  can  be  comma  separated, tab -delimited,  fixed  format, and  a  control  file. The  control  file  describes the  structure  of  the  data  file and  the  target  table, and  we're  going  to  add another  layer  on  this because  we're  going to  do  all  this  from  within  JSL. I'm  going  to  create  a  command  file which  runs   SQL Loader in  a  command  window. It  also  generates the  control  and  command  files  using  JSL, and  I'll  use  run program to  execute  the  file. Here's  my  file. It's  very  exciting, it's  six  columns,  four  rows and  it's  tab  delimited. Here  I'm  showing  the  reveal  code, so  this  is  my  tab  character, and  here's  my  example. I've  got  setting  some  variables, and  here  I'm  creating  my  control  file and  I'm  using  eval insert to  make  these  variables, —surrounded  by  the  little  carets— convert  to  their  values  up  here. Import _file name  will  be  test_import.txt, dest_table  will  be  TEST_IMPORT. Fields  are  terminated  by  tab, actually  enclosed  by  double  quotes. Here's  my  six  fields, and  I'm  adding  a  couple  of  other  fields, date _loaded  and  username _loaded. 
Date _loaded  will  be  the  system  date. The  username _loaded  will  be the  account  name of  the  person  running  it. I'm  saving  this  file  out to  the  directory and  I'm  creating  a  command  file to  run   SQL Loader. Setting  my  drive  to  the  C  drive, seeding  into  this  directory, and then  here's  my  command; SQL Loader  user  ID  equals  my  credentials at  my  database  name, and  here's  my  control  file, my  log  file. And  if  you  notice, I  had  to  add  these  backslash exclamation mark  capital  N. These  are  hard  returns. For  some  reason, it  didn't  work  without  these  in  there, so  I  had  to  add  those  in for  the  command  file  to  work. I'm saving  it, and  then  I'm  running  it  here with  run program. Then  I'm  checking  the  results. If  it  does  not  contain row  successfully  loaded  in  the  output, then  I  display  an  error  message and  display  the  output  from  run SQL  load. If  it  was  successful, then  I  load  the  log  file  in and  display  that. Here's  my  log  file. I've got about four  rows successfully  loaded  in  1.5  seconds. I  want  to  do  a  demo  of  six  columns with  30,000  records, and  let's  run  that  one. I'm  loading  into  the  same  table, I'm truncating  the  table, and  I'm  running  SQL Loader,   load  the  data  file. It's  the  same  control  file. When  it's  all  done, it's  going  to  display the  output  from  the  log  file. Here's  the  output. I've  got  30,000  rows  successfully  loaded. No  rows  were  not  loaded  due  to  data  errors and  it  took  16  seconds. That  was  30,000  rows, and  we  can  look  at  that  data. Here  I  am  in  a  tool called  PL/SQL  Developer. SELECT  star  from  the  table and  here's  my  values, here's  my  date  loaded, which  is  today, my  username, and  it's  going  to  select  all  the  rows. I'll let  that  run. Okay,  so  here's  a  look  at  the  data  file, and  here's  the  log  or  previous  run. It  took  25  seconds. All  right,  moving  on. Now,  what  if  you  want  to  execute an  Oracle  PL /SQL  procedure? PL /SQL  stands  for Procedural  Language  Extensions  to  SQL, and  it's  a  sort  of  a   3GL, 4GL  language  Oracle  uses to  do  functions, procedures  and  the  like. If  you  have  a  procedure, you  simply  surround  it  with  begin  and  end and  then  pass  it  to  Execute  SQL. Here is  BEGIN, that's my  schema  name. This  is  a  package  called  package  util, and  then  inside  there, there's  a  procedure  called  send  email with  an  argument   success , and  then  I  add  the  END  at  the  end. So  when  you  do  this, it  runs  it  and  control  will  return to  JMP  when  the  procedure  is  done, so   we'll  wait. Okay,  let's  talk  about some  security  things. When  you  pull  data  from  Oracle, by  default,  there's  a  source  property in  the  data  set, and  that  will  show  you the  username  and  password and  connection  string, and  you  might  not  want to  show  that  to  all  your  users. If  you  run  this  command, it  will  hide  the  connection  string in  the  data  set  that's  returned. I  go  a  couple  of  steps  beyond  that. I  create  a  connection in  an  encrypted  JSL  function. This  function  contains  a  database  name, the  username,  and  the  password, and  it  returns  a  database  connection and  default  local  ensures that  function  variables  are  not  visible. Let's  have  a  look  at  that. 
Here's  a  little  function  called  my _dbc, and  this  is  the  unencrypted  version. Here's  my  default  local. I  check  environment  variables. Here's  my  connection  string. This  is  the  one that  you  don't  want  people  to  see. Here's  my  driver,  my  database, my  username , my  password. I  create  a  database  connection with  that  string and  then  just  return my  database  connection. Go  to  encrypt  the  script, you  click  on  edit,  encrypt  script, enter  a  decrypt  password. —I  don't  use  run  passwords— and  then  click  yes here. Here's  my  encrypted  script, and  then  I'm  going  to  save  it as  my_dbc .jsl. To  use  it, I  include  that  script, that  encrypted  script  in  my  JSL  code, and  that  defines  that  function  for  me. Then  I  call  my _dbc to  get  a  database  connection, SELECT  star  from  this  table execute  SQL, close  database  connection, so  here's  my  table. And  if  you  look  at  DBC  in  the  log, all  it  shows  is  database and  then  your  Oracle  client  driver. Many  times  when  you  run SQL  commands  in  JMP, you  run  into  errors, and  it's  not  very  easy  to  debug  this, so  I  wrote  a  function called   log_execute_ sql, which  executes  SQL  commands  and  traps any  ODBC  errors  found  in  the  log. If it  finds  errors, it  displays  a  warning  message  to  the  user along  with  the  SQL, and along  with  the  error. If  you  set  a  global  variable  to  one, it  displays  a  SQL  before  executing  it. This  has  become  very  handy for  developing  and  debugging  SQL. The  function  uses  log  capture to  inspect  the  log  for  errors. This  is  the  syntax  for  log  capture, string  equals  log  capture  expression, and  this  is  whatever  commands you  want  executed  and  captured, and  anything  that  normally  go  to  log will  go  into  string, and  then  you  can  inspect  the string. Log _execute _sql  takes  five  arguments: the  name  of  the  calling  program, a  database  connection, a  SQL  statement, an  invisible  flag, and  a  table  name  to  return, and  here  are  two  examples. One  works  and  one  doesn't  work. This  has  SELECT   SYSDATE  FROM  DUAL, which  is  a  standard  Oracle  command to  get  the   current  system  date, and  this  has  an  intentional  error  in  it, dual  X, which  I  know  doesn't  exist. When  we  run  the  first  SQL  statement, we  get  the  system date. Very  good. When  we  run  the  second  statement, we  get  an  error  message. Calling  program is  listed  here, the  error  message is  here. This  is  very  important, along  with  this  code, ORA-00942, and  then  here's  your  SQL  statement, and  this  whole  message is  inside  of  a  text  edit  box, so  you  can  copy  and  paste  it. Here's  an  example  for  debug  output. I  turn  on  my  debug  flag and  when  I  run  my  statement, I  get  an  informational  message —It's  not  an  error, it's  just  informational— showing  the  calling  program, database  connection, whether  it's   invisible  or  not, the  table  name,  and  the  SQL  state, and  then  I  can  click  this  checkbox if  I  want  to  turn  off subsequent  debug  output. So  here's  log _execute _sql, there's  a  description, a description  of  the  arguments, a  couple  of  example  calls, and  then  here's  the  function  itself. Here's  my  arguments, I  check  the  database  connection, I  check  the  SQL  statement, I check  the  debug  flag. 
If  the  debug  flag  is  on, I  make  a  little  window and  I  display  the  current  SQL in  a  text  edit  box, and  then  I  give  the  user  the  option to  turn  off  subsequent  debug  output. If  they  click  that, I  reset  the  flag  to  zero. Then  here's  the  meat  of  this  function I  force  all  errors  to  go  to  the  log with  batch  interactive  one, then  I  call  log  capture with  either  an  invisible  flag  on or  non- invisible  flag on for  execute  SQL, and  then  I  turn  batch  interactive, set  it  back  to  zero, then  I  check  the  log  window for  ODBC  errors, I  look  for  Oracle  ODBC or  the  word  error  and  I  set  a  flag, and  then  I  use  words and  the   [inaudible 00:25:02] to remind me  to  parse the  output  of  the  log  into  separate  lines. Then  the  error  message  is  always on  line  one  of  this  message. If  we  found  an  error, then  it  displays  the  error. I  have  an  example  here  yet. So  here's  an  example  where  it  says FROM  keyword  not  found  where  expected. If  we  look  at  the  SQL, I  happen  to  know there's  a  comma  missing  here. Let's  talk  about  building  IN  lists. The  Oracle  IN  operator  determines  whether a  value  matches  any  values  in  a  list. Here's  an  example. SELECT  star, FROM  EBA _SALES_ SALES REPS, where  the  last  name is  in  one  of  these  values, and  it's  similar to  the  JSL  contains  function where  here  I'm  saying does  this  list  contain  the  word  Raj? In  this  case  it  was  down  at  position  two. There's  a  caveat  with  the  IN  operator. There's  a  limit  of  a thousand  values. Of  course,  I  wrote  my  own  function, get_sql _in _list, which  gets  around  that  limitation. And  what  it  does, it  builds  an  inlist from  the  list  provided. If  there  are more  than  1000  items  on  the  list, it  separates  them into  1,000  element  chunks connected  via  union to  avoid  the  limit  of  a thousand  items. And  if  the  elements  are  of  type  string, any  single  quotes  inside  the  strings will  be  replaced  with  two  single  quotes, and  single  quotes  will  be  put around  each  item. So  there's  two  arguments. First  one  is  item  list. It's  the  list  of  items  to  create an  endless  FROM, and  then  a  preamble, which  is  a  SQL  string to  preface  the  IN list  with, so  we'll  see  what  these  mean in  this  example. Here's  an  example  where  I  have a  numeric  list, and  my  preamble  is  select  this  ID from  schema  info, where  the  ID's  in  open  parentheses. So  here's  my  call  to  get _sql_in _list, my   id_list,  preamble and  the  output  looks  like  this; SELECT  star  from  my  table  m. The  ID  is  in  here. Select  ID  from  schema  info where  ID  is  in  one  of  these  numbers. If  you  look  at  a  string  example, here's  four  elements  in  this  list. The  first  one  and  the  third  one has  a  single  quote  inside  them, and  here's  my  preamble. When  I  call  get _sql_in_list and  combine  it  in  my  SQL  statement, this  is  my  result. SELECT  star  from  my table  m, where  the  product  name  is  IN and  here  it's  my  preamble, where  alert  name  is  IN, A, B, C,  or  D. And  you'll  notice  for  A  and  C it  replaced  the  single  quotes with  two  quotes, and  it  also  converted  these  double  quotes to  single  quotes  here. 
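A rough JSL sketch of that chunk-and-UNION idea is shown below. It is not the author's actual get_sql_in_list; it handles only numeric IDs, and the table, column, and preamble names are hypothetical.

// Build an Oracle IN list in 1,000-item chunks joined by UNION to dodge the
// 1,000-value limit. Numeric ids only; all names below are placeholders.
build_in_list = Function( {ids, preamble},
	{Default Local},
	chunks = {};
	For( i = 1, i <= N Items( ids ), i += 1000,
		j = Min( i + 999, N Items( ids ) );
		piece = {};
		For( k = i, k <= j, k++,
			Insert Into( piece, Char( ids[k] ) )
		);
		Insert Into( chunks, preamble || Concat Items( piece, ", " ) || ")" );
	);
	Concat Items( chunks, " UNION " );   // returned value
);

sub = build_in_list( {101, 102, 103}, "SELECT id FROM schema_info WHERE id IN (" );
sql = "SELECT * FROM my_table m WHERE m.id IN (" || sub || ")";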
Here's  an  extract  of  a  long  example where  I  had  a thousand  of  these  ID  values, and  so  here's  the  first  thousand, and  then  a  union  statement, and  then  the  next  thousand and  so  on  and  so  forth. All  right, here's  tip  number  three. That  is  to  use an  integrated  development  environment for  developing  your  SQL  statements. These  have  a  GUI  front  end and  they  let  you  develop and  debug  SQL  and   PL/SQL. These  are  some  popular  tools that  I'm  aware  of. I  use   PL/SQL  Developer. It's from  All round  Automations. There's  a  tool  called  SQL  Developer from  Oracle  that's  a  freebie, and  TOAD  comes  from  Quest. Many  people  are  familiar  with  TOAD. Let's  have  a  look  at  PL/SQL  Developer. We  saw  it  earlier  when  I  selected from  this  table. I  can  do  things  like  highlight  these, copy  with  header, back  to  JMP, create  an  empty  data  set and  click  on  Edit, paste  with  column  names. Boom,  there's  my  data. Okay. And  I  can  browse  functions,  procedures, packages,  tables,  et  cetera,  et  cetera. Here's  a  little  more on  that  debugging  example that  we  saw  earlier, and  again, a  comma  missing  here. It's just  another  more  explicit  showing of  that  error  message. So  you  take  this,  copy  it, look  at  it,  and  rework  it. Another  tip, this  is   a  soft  tip, and  that  is  to  avoid  inline  comments  using  dash- dash as  it  can  confuse  the  parser. These  are  comments where  the  dash -dash  says everything  after  this  on  this  line is  a  comment. Sometimes, some  situations, the  parser  gets  confused and  doesn't  treat these  properties  as  comments, so  it's  better  to  use slash-star-star-slash. I've  just  seen  a  couple  of  times where  it  didn't  work and  I  traced  it  down to  these  comments. One  more  tip, and  that  is  to  use this  Oracle  SYS _CONTEXT  function to  get  useful  information. There's  a  namespace  called U SERENV  in  Oracle and  you  can  get  the  IP  address, the  client  computer, the  program  making  the  ODBC  call, the  operating  system  identifier for  the  client, the  current  session, operating  system  username, and  the  database  name. There's  many  more. If  you  Google  it, there 's  many  more  parameters, but  these  are  the  ones  that  I  use. Here  I'm  saying  select  IP  address, module,  terminal, operating  system  user and  service  name. Here's  my  call  to   SYS_CONTEXT, and  I'm  just  selecting  it  from  DUAL, unioning  these  together, and  the  results  look  like  this  down  here. Here's  my  IP  address. I'm  calling  from  JMP .exe, no  surprise  there. My  username,  my  database, and  then  my  computer  name. That's  all  I  have  today. The  conclusions  I'll  draw, or  if  you  configure the  Oracle  client  properly, get  the  ODBC  connection  string and  use  Execute  SQL  JMP  in  Oracle can  do  great  things. So  once  again, JMP and  Oracle  equals  a  happy  marriage. Thank  you, are there  any  questions?
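As a closing sketch of the error-trapping idea from this talk, the JSL below shows the bare Log Capture pattern. It is a stripped-down illustration rather than the author's log_execute_sql, and it assumes dbc and sql already exist from the earlier examples.

// Capture whatever Execute SQL writes to the log and scan it for Oracle errors.
// The original function also calls Batch Interactive(1) first so errors go to
// the log instead of popping up dialogs.
log_txt = Log Capture( dt = Execute SQL( dbc, sql, "Results" ) );
If( Contains( log_txt, "ORA-" ) | Contains( Uppercase( log_txt ), "ERROR" ),
	New Window( "SQL Error",
		Text Edit Box( "Error:\!N" || log_txt || "\!NSQL:\!N" || sql )
	),
	Show( "SQL ran cleanly" )
);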
The poster summarizes data exploration and machine learning modeling techniques applied to Consumer Assessment of Healthcare Providers and Systems (CAHPS) response data. Through the use of JMP unsupervised machine learning techniques, the presenters identify patterns in responses. These patterns are summarized as patient groups/profiles, which can inform the design of tailored care delivery models. Hello, I'm Renita Washburn, a PhD student in the Modeling and Simulation program at the University of Central Florida. And I'm Dr. Mary Jean Amon, an assistant professor at the University of Central Florida in the School of Modeling, Simulation, & Training. Today, we will co-present our poster on identifying patterns in patient experience ratings with machine learning clustering techniques. In this poster session, we'll summarize the objectives, method, and results from the exploration of these patterns in patient experience ratings. Patient experience ratings were obtained from the 2019 Consumer Assessment of Healthcare Providers and Systems (from here referred to as CAHPS) response data and limited to patients seen by a primary care provider. We used JMP's machine learning clustering and data preparation tools to identify four patient groups based on their survey responses. Cluster analysis is a machine learning technique used in many industries for customer segmentation. The goal is placing customers into groups based on similarities within each group and differences between the groups. In healthcare, exploration of customer segments provides insights on possible differences in care journeys and experiences, such as disparities between race, gender, culture, or health status. Identification of distinct groups can inform the design of tailored care delivery models. The project's three objectives were to, first, conduct a hierarchical cluster analysis on categorical survey response data; second, identify clusters through visual inspection of the dendrogram and color map partitions based on their journeys, which were measured by survey questions related to length of relationship with the provider, utilization of services, and level of care management; and lastly, conduct post-hoc analyses to explore differences among clusters in their ratings of the provider, their overall health, and their overall mental health.
Before we dive into details of the methods and findings, we'd like to acknowledge the US Agency for Healthcare Research and Quality and Westat for providing the de-identified CAHPS data for this effort. The CAHPS data is used to gain insight into the healthcare experience from the patient's perspective. The 12 selected questions are intended to capture a patient's journey and interaction with their primary care provider over the last six months. The questions focus on length of relationship with the physician, how the patient interacts with the physician's office for routine and urgent care needs, and the level of care coordination for ancillary services requested. Prior to initiating JMP's clustering tool, data preparation included assigning the appropriate data modeling type for the survey questions. The modeling type was either nominal (yes, no, or not applicable) or ordinal (a Likert-type scale). Another data preparation task was reformatting select questions. The CAHPS survey uses skip logic. For example, one question asks: in the last six months, did you make any appointments for a checkup or routine care with this provider? If no, skip to the next question. It was determined that the skipped questions were relevant to the exploratory analysis; therefore, values were recoded from missing to zero, which JMP refers to as missing not at random. The last preparation step we highlight relates to the missing values. Instead of addressing this prior to modeling, we used JMP's built-in missing value feature to impute, that is, to replace with estimates, those missing values; this is an option selected from the clustering menu. Given the Likert-scale questions, we hypothesized that the data was hierarchical, with likely subgroups within the data. Hierarchical clustering with the Ward distance method was applied, and the output was limited to four clusters for ease of interpretation. The Ward method was appropriate for the categorical data as it did not require a pure measure of distance; instead, it builds clusters based on an analysis of variance, like an ANOVA. A color map was added to the dendrogram output to aid visual comparison of response differences across the groups. Unique patterns within and differences between clusters were summarized based on low, medium, or high maintenance, meaning how much access to care was used by the patient, such as frequency of routine and urgent office visits or contacting the office during or outside of regular hours, as well as how well patients believed the office was managing their care, ranging from weak to sufficient, as defined by ratings of follow-up for lab and prescription needs. The cluster output was saved and assigned to each response for the post-hoc analysis. Now, the primary focus of the project was comparing clusters on three key ratings related to the provider, their overall health, and their overall mental health. However, with JMP, you can use the cluster assignments to explore the distribution of demographic data as well as other question responses between the groups.
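As a rough JSL sketch of the clustering launch just described (the survey column names are placeholders, the missing value imputation mentioned above is a launch-dialog option, and the color map is added from the red-triangle menu):

dt = Current Data Table();
hc = dt << Hierarchical Cluster(
	Y( :Q1, :Q2, :Q3, :Q4 ),     // the selected CAHPS survey questions (placeholder names)
	Method( "Ward" ),
	Number of Clusters( 4 )
);
hc << Save Clusters;              // writes the cluster assignment back for the post-hoc analysis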
For  future  analysis,  we  recommend exploring  differences  in  age or  race  distributions between  the  maintenance management- based  clusters. We  were  interested  in  understanding if  there  is  a  relationship between  the  cluster  assignment and  the  patient's  ratings  of  the  provider, their  overall  health,  and  mental  health. Visual  inspection  of  the  mean  scores for  each  of  these  three  variables suggested  that  there  may  be significant  differences  based  on  cluster. For  example, high  health  maintenance  patients who  utilize  more  healthcare  services but  also  have  satisfactory  ratings  for  lab and  prescription  management also  appear  to  have  higher  ratings of  overall  health  and  mental  health. If  we  go  to  the  next  slide, these  observations  were  further  examined using  JMP's  contingency  analysis, which  is  a  method for  examining  the  relationship between  two  categorical  variables. We  identify  statistically significant  differences in  provider,  overall  health, and  mental  health  ratings based  on  the  patient  cluster, which  further  highlights  the  utility of  our  clustering  approach in  identifying  meaningful  patient  groups. Overall,  understanding  the  relationship between  each  group's  care  journey and  overall  experience  and  health  ratings can  inform  the  design of  health  care  practices such  as  enhanced  communication  channels during  non- regular  office  hours or  care  navigation  services to  aid  with  follow- up  of  lab and  prescription  management. Thank  you  for  viewing  today's  session. We  welcome  your  questions  and  comments.
Parametric survival models are generally effective for describing personnel movements both within and external to an organization. The State of Florida has published employee data on a weekly basis for several years, enabling analysis of job changes and separations for approximately 100,000 employees representing a wide variety of professions across the major Standard Occupational Classification (SOC) codes. Further, data collected over the past five years also incorporates the advent of the COVID-19 pandemic, capturing the varying influence of this major event across the professions. JMP Scripting Language (JSL) was used to prepare and analyze this large data set to visualize the divergence in employee behavior between roles and under the influence of the pandemic. Due to the unusually close registration between Florida's job codes and the federal SOC system, which is linked to Department of Labor salary profiles, these data and analyses provide an open-source and broadly relevant view of personnel behavior in both periods of stability and crisis. Hello. My name is Thor Osborn. I work at Sandia National Laboratories as a systems research analyst; that's basically a combination of operations research and investigative reporting. I'm going to present an analysis of personnel movements pre- and post-COVID for a large organization. In this case, the large organization is the State of Florida's government employees. I'll say that that's fortuitous, because publishing the data is part of their transparency policy, so anyone can look at it, and I'll give you the link for that. The data I'm going to show analysis of runs from August of 2017 through July of this year. Why do this? The COVID-19 pandemic, and the mitigations instituted to address it, have been cited as catalysts for substantive changes in the workplace. For example, 5 or 10 years ago, work from home was considered an unusual opportunity, or a temporary arrangement to address some kind of temporary issue. It wasn't considered something that most companies would do a lot of on an ongoing basis. Now it's considered a thing that job seekers may rate companies on, if you take a look at the web, for example. Also, the nursing profession was especially impacted by the COVID-19 response: extreme hours and burnout. That has, in a lot of cases, led to an exodus from the profession. For many who've stayed in the profession, it has meant departure from typical long-term employment, like working in a hospital, in favor of traveling-nurse or concierge contracts, where they get paid a lot more, have more flexible hours, and aren't committed to, say, 12-hour shifts one after the other in a hospital. I'll note that hospital RN turnover went up 8.4 percentage points from 2020 to 2021, to about 27% per year, according to the NSI National Health Care Retention Report. RN vacancies, meaning slots that hospitals want to fill, have gone up from about 8% in 2018 to 17% this year. So basically, one in every six nurses who's supposed to be there to help give care isn't. Now, the motivation for doing it the way I'm doing it.
Periodic  tabulation  of  movements  or  rates is  a  typical  business  approach to  business  reporting, and  almost  every  company  does  this. But  it  may  obscure underlying  behavior  patterns because  tallies  don't  tell  you the   micro behavior, and  time- to- event  analysis will  enable  a  deeper  look. I  like  to  use  parametric time- to- event  analysis  for  this because  the  parameters  can  be  informative. But  to  do  that, you  have  to  have  a  lot  of  events, and  to  get  a  lot  of  events unless  events  are  extremely  frequent and  need  an  even  larger  population. This  is  fortuitous that  the  State  of  Florida makes  weekly  employee  data  available for  about  100,000  people. Quick  synopsis  of  the  show for  those  with  no  attention  span or just  to  help  me  and  you, the   COVID-19  pandemic  was  implicated as  catalyst  for  many  changes. I  went  over  that. A  longitudinal  examination  of  behavior based  on  the  evidence from  a  large  organization  seems  timely. We  need  to  look  at  these  things. This  is  a  natural  experiment of  magnificent  or  awful  proportions. The  data  available  on  a  weekly  basis, as  I  said, straddle  the  beginning of  the   COVID-19  pandemic. This  is  a  fortuitous  collection. I  started  it  on  Intuition  back  in  2017, but  then  things  happened. The  State  of  Florida's  decision to  build  its  broadband  structure around  a  categorization  system that  mirrors  and  links  up with  the  federal  SOC, or  Standard  Occupation Classification  code  structure, also  provides  a  well  established and  readily  available  frame  of  reference, meaning  you  can  get  it  and  it's  free, and  it's  reasonably  well  worked  out, documented. You  can  look  at  employee  populations, you  can  look  at  hiring,  separations, all  longitudinally  within  that  framework at  varying  levels  of  specificity, and  that's  pre- framed by  the   SOC structure. The  fact  that  they've  melded with  that  structure provides  an  easy  window into  that  level  of  analysis. I  finish  off  with  an  analysis of  the  nursing  profession as  represented  by  registered  nurses. That  demonstrates  what  I'm  calling a  substantive  difference. It's  definitely  visible  in  personnel  flows between  the  pre- and  post-COVID  timeframes. This  is  an  example. I  haven't  been  able  to  go into  the  level  of  depth that  I  would  like  to with  this  analysis  and  this  data  set, but  I  wanted  to  show  at  least  an  example of  what  I  was  talking  about [inaudible 00:05:36] . Again,  just  to  really  beat  this  one  down, an  unusual  opportunity. Typical  practice  in  HR is  to  frame  salary  structures  in  context with  other  similar  organizations. Salary  information  is  generally  compiled by  a  consulting  firm  in  HR from  a  collection  of  organizations that  chose  to  participate in  a  defined  pool for  survey  and  referencing  purposes. They  don't  do  that  for  free. It  costs  a  substantial  amount  of  money. Now,  the  BLS  also  compiles  salary  surveys of  its  own  on  a  national  and  state  basis with  jobs  categorized by  a  standard  structure, which  they  call  the  SOC. That  data  can  be  downloaded  for  free, and  the  State  of  Florida has  referenced  its  structure  to  that. 
It's  fortuitous  in  a  way  for  them, because  they  don't  have  to  pay for  seller  surveys  if  they  don't  want  to, because  it's  all  referenced  against the  federally  established  free  data  set. Now,  I'm  going  to  show  if  I  can. Here's  just  a  table. Apologies  if  it's  a  little  small. A  table  showing  broadband  code  right  here, 10, that  means  Executive,  1011-03, Chief  Executives. The  point  is, everything  in  the  Florida  set, this  is  about  3,000  codes. Except  for  a  few  recent  ones, everything  in  the  Florida  set is  referenced  against  this. The  first  six  digits  are the  six  digits  in  the  SOC  code. The  first  two  are  the  major  code. It's the job  family, like  10  is  Executive s,  11  is  Management, 13  is  Business  jobs, and  then  a  four- digit  code for  more  specificity. In  the  case  of  Florida, they  have  an  extra  two  digits which  denotes  a  job  level within  their  salary  structure. But  this  framing  allows  you  to  link  things back  to  the  SOC. What  did  they  give  you? They  give  you  an  agency  name of  which  there  are  33  state  agencies; budget  entity, an  office  within  the  agency; a  position  number, that's  a  position  within  the  agency; employee  names. I'm  not  showing  you  that because  I  feel  uncomfortable even  though  when  you  download  it, you  can  obviously  see  who's  who, whether  the  person is  salaried  or  exempt  hourly, or other  personal  services they  call  them, full-  or  part- time. A  class  code  which  is  a  code that  indicates both  the  profession  and  the  level. A  class  title  which  is  essentially the  same  thing  in  words. State  hire  date,  which  is  the  first  date that  the  individuals  hired  by  the  state. They  could  have  had  many  terms of  employment,  come and  left , but  the  state  hire date  is  a  fixed  point in  time  for  each  person salary  or  hourly  rate if  the  person  is  doing  an  hourly  job. Again,  this  is  freely  available at  the  link  noted  on  the  screen. Just  for  a  bit  more  framing, long- term  view  of  wages in  the  State  of  Florida, looking  at  BLS, Bureau  of  Labor  Standards  data  for   SOC, is 00-000,  just  a  weighted all  occupations  number. It  covers  everything. These  are  a  lot  of  people, so  I  don't  have  error  bars, 130 million  people  nationally and  7  million  employees  in  Florida. What  you  see  is  that  Florida's  salaries , the  blue  line, are  typically  less  than  national, but  they've  been  tracking  pretty  closely. There's  not  been  much  relative  change in  a  long  time  except  for  this  past  year. Sometimes  there  are  revisions. I'm  not  going  to  say this  is  necessarily  meaningful. If  it  is  a  real  difference  then  obviously be  interesting  to  know  about  that. I  haven't  seen  anything reported  about  that  though, so  I  can't  give  you any  further  insight  on  that. If  you  look  at  Florida  State  employees versus  typical  Floridians, I  don't  have  enough  data  in  the  set to  really  say  very  much, except  for  it  looks  like  being a  state  employee  is  fairly  attractive, at  least  if  the  jobs are  typically  comparable. There's  no  overriding  incentive for  people  who  work  for  the  state  to  leave to  go  into  the  private  sector  there based  on  this. These  are  for  median  salaries, annual  salaries. 
Looking  at  the  Florida  State  employee population  totals  in  the  green  line  here starting  at  around  100,000 for  exempt  staff, doesn't  include  the  hourly  folks in  either  case  here  or  here. Looking  at  separation  rates and  hiring  rates as  nine- week  moving  averages to  be  about  two  months as  a  centralized  moving  average, with  JMP's  usual  capability for  handling  the  endpoints. What  you  see  is  that for  a  fairly  long  time, except  for  this  spike, which  again,  I  haven't  found  anything to  explain  in  the  literature, nor  in  HR  reports  published  by  Florida. Pretty  constant. After  the  pandemic  hit, there  was  a  long  time  period where  the  hiring  rate was  below  the  separation  rate. So  people  were  slowly  leaving  Florida. You  can  see  that  here in  a  downward  slope  on  the  green  line. And  then  just  this  year, that  stopped  and  began  to  reverse. Now,   to  be  clear, the  population  is  only  salaried  workers, only  those  holding one  salaried  state  position  at  all  times. Anybody  with  two  salaried  positions was  removed because  it  could  be  a flawed  data or  it  could  be  a  very  ambitious  person. But  I  can't  handle  that with  the  time- to- event  data because  it's  hard  to  understand  exactly what  a  separation  means when  you  still  have  a  job at  the  same  place. But  it's  only  less  than  half  a  percent of  the  total  people, so  it  shouldn't  be  a  huge  perturbation. Now,  when  I  show  this  is a  bit  of  a  demo  as  well. Florida  State  personnel flows  by  SOC major  code. But  you  can  see  on  the  right  table… Here's  the  population by  SOC  major  code, every  individual  grouping  over  time. This  is  code  43. That's  Office  Administrative  Assistants. What  I've  done  is  I've  used the  hide  and  exclude  capability to  remove  everything  except  for  six  codes, which  are  the  largest  codes. You  see, the  Administrative  Assistants  is  43. And  also  down  here,  19  for  Life and  Physical  Sciences  is  included, Business  is  included,  Manager is included. What  I'm  trying  to  say  here  is  simply that  this  is  only  including  six out  of  something  like  20  or  so major   SOC codes. But  these  are  the  largest. Using  graph  builder,  it  only  shows  those. That's  really  all  it  amounts  to. Business  and  Finance, Community  and  Social  Services, that  we  code  21,  code  19,  code  11. Now,  one  thing  you'll  see  with  Manager is  that  the  hiring  rate  is  always quite  a  bit  less  than  the  separation  rate, and  yet  the  net  number  of  managers is  roughly  the  same, and  that's  because  only about  half  of  the  managers come  from  external  sources, a  lot  of  them  come from  internal  promotions. You  see  this  population  over  time, despite  the  vast  difference in  separations  versus  hiring, that's  simply  because  about  half  of  them come  from  internal. Now,  you  can  also,  again, as  I  was  saying  earlier, you  can  do  detailed  codes and  the  same  principle  applies. All  I've  done  here  is  I've  only  included three   SOC detail  codes. The  29  major  code, which  is  Medical  Professionals, the  31  which  is Support  Folks  in  Medical  Work, and  then  back  to  29  again for  Registered  Nurses, but  this  is  the  Nurses and  Nursing  Assistants  taken  together. This  code  is  no  longer  used and  hasn't  been  for  a  while. 
But  Florida  set  up  its  code  system about  two  decades  ago and  so  it's  been  kept  in and  they  use  it  even  though  it  isn't  part of  the  standard  SOC  now. But  the  bottom  line  you'll  see  from  here is  that  Florida  is  not  attracting enough  nurses  to  compensate  for  attrition. If  you  look at  the  State  of  Florida  HR R eports, what  you'll  see  there  is  that they  think  most  separations are  voluntary,  about  92%. The  number  of  authorized  positions in  the  health  agency has  only  been  reduced  by  about  5% in  the  last  several  years, and  yet  the  number  of  RNs has  dropped  by  about  a  fourth. You  can  see  that  the  number  of  nurses is  falling  rapidly compared  to  the  allocation of  nursing  spots. If  you  go  to  the  State  of  Florida  website and  look  for  a  job  in  nursing, you'll  see  that there's  plenty  of  opportunity. They've been trying to hire. Now,  I  am  going  to  show some  time- to-e vent  analysis. I'm  not  going  to  show  the  script  work that  generated  the  data  set  for  this, because  although  I  find  it  fascinating, I  know  that  a  lot  of  folks don't  do  scripting. It's  essentially  an  inference between  who's  there  and  who  wasn't. If  you  go  from  one  week  to  the  next and  people  disappear and  you've  allowed  for  the  fact that  people  do  name  changes  sometimes, which  requires  coming  up with  a  different  way  of  IDing  people to  straddle  the  difference. Once  you've  accounted  for  that, then  they  must  have  left. Having  left,  that's  a  separation. They  can  also  get  promoted, and  you  can  see  that because  one  week  they  have  a  job, and  then  the  next  week they  have  a  job  that  pays  better, often  the  same  general  line, but  with  a  different  title. Capturing  those  movements is  a  bit  of  work but  it's  pretty  straightforward,  really. What  you  see  here,  I  tried  to  capture four  different  kinds  of  events:  demotions, a lateral  to  another   SOC, could  be  moving  out of  the  nursing  profession, but  nevertheless haven't  changed  their  salary  much, promotion,  or  separations. Separations  is  obviously the  dominant  factor  here in  terms  of  total  counts. I'm  using  the  Weibull  typically because  I  find  it  more  informative and  it's  not  a  bad  fit. Post-COVID,  you  see  a  very  similar  curve, more  promotions, relatively  speaking. That's  interesting. Now,  here  is  the  detail  in  tabular  form so  that  you  can  see all  the  different  pre-  and  post- cases for  the  major  movements,  lateral  movement, promotion,  and  separation. What  I'm  talking  about  though, let's  just  go  back  to  pre-COVID. Here's  a  subset  of  Exit  Events, essentially  exit  from  the  status  as  an  RN to  whatever  they  moved  to. Just  to  make  it  clear  what  was  done  here. If  you  relaunch,  what  you  see  is  that  I have  a  Censor  column, just  ones  and  zeros. The  Exit  Event  is  however  they  exit, or  if  they  didn't  exit,  then  it's  just an  active  person  in  the  field and  they  are  not  marked with  a  censor  code. The  Employment  Segment  Span, which  is  how  long  have  they  been  employed in  that  particular  segment  of  employment. Now, see  that  the  number  of  laterals  is really  small  compared  to  everything  else. Promotions  is  definitely  visible. 
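(As a hedged aside on what that relaunch looks like in JSL: the choice of the Life Distribution platform and the use of JMP's default censor code of 1 for the active, non-exiting records are my assumptions; the Weibull fit is then requested from the report's menu.)

dt = Current Data Table();
dt << Life Distribution(
	Y( :Employment Segment Span ),   // time employed in the current employment segment
	Censor( :Censor )                // censored rows = still active, no exit event observed
);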
Another  thing  you  can  see  if  you  go  down and  look  at  separations is  that  the  Weibull  beta, which  you  can  think  of as  the  acceleration  factor, even  at  the  high  end  of  the  95%  limits, it's  still  below  unity, and  below  unity  means  that people  are  less  likely  to  go  through that  transition  as  time  goes  on, less  likely  to  separate the  longer  they've  done  there. That's  straightforward. You'll  see  it  here. In  fact,  that's  also  through  post-COVID, same  basic  beta  factor or  parameter, rather. Now,  I'm  going  to  show  the  post-COVID. Again,  this  is  the  same  basic  analysis. This  is  what  happens when  you  do  live  demos. Something  goofy  with  this  one. Now  it's  giving  me  grief. Here,  you  see it's  basically  the  same  thing. Lateral  is  distorting  because  the  lateral, there's  only  two  counts. If  you  just  get  rid  of  that , you  can  see  a  much  more  clear  picture. What  you  see  is  that the  Promotions  piece  is  moving  up  faster, 50  versus  744,  whereas  pre over  a  longer  time  spent, it  was  about  50 for  about  1,200  separations. There's  a  predominance  is  shifting  there. Going  back to  the  more  convenient  layout  here. Pre-COVID  promotions  were  in  this  range where  beta  was  a  little  over  unity. But  the  95%  limits  basically  tell  you that  that's   ambiguous. It  could  be  really  anywhere between  a  bit  below  and  a  bit  above  unity. Post-COVID,  it's  about  1.29, and  within  these  95%  limits, always  above  unity. In  other  words, it's  accelerating  with  time. The  longer  you  go,  the  more  likely you  are  to  go  through  a  promotion if  you  stay  in  that  job. Here  with  the  lateral  movement, there  really  was  never  enough  counts to  do  much  of  anything  with  that. The  limits  are  very  broad. I  wouldn't  put too  much tal k  in  that,  regardless. Now,  if  you  put  these  on  a  common  scale just  to  make  sure that  this  isn't  too  confusing,  I  hope. You  see  very  similar. I've  shifted  the  color for  the  post-COVID  case  a  little  bit. On  a  similar  scale, if  you  didn't  superimpose  these, promotions  are  clearly  accelerated, p ost -COVID. Clearly  a  bigger  impact, they're  more  opportunity. We  do  know from  many  news  reports  that  people who  are  closest  to  retirement, often  within  the  COVID complications  and  changes, simply  moved  forward with  retirement  more  quickly because  they  wanted  to  get  out other  than  deal  with  things. There  is  a  shift  here with  the  separations, and  it  does  look  real, but  it's  also  small  enough compared  to  the  overall  magnitude that  it  isn't  quite as  obviously  different. In  either  case, the  separation  rate  is  similar and  not  changed  overly  much. This  is  a  factor  of  two. This  is  a  factor  of  a  few  percent. To  conclude, wages  in  Florida  have  run  lower than  national  values  typically over  the  last  decade, but  haven't  proportionally  changed  much. There  certainly  doesn't  seem to  be  any  obvious  change in  Florida  salaries that  would  cause  people  to  suddenly  leave. The  State  of  Florida's  registered  nurses have  enjoyed  greater and  earlier  promotion opportunities  post-COVID. But I  think  it's  also  worth  noting  here that  they  work  in  a  health  organization for  the  State. This  is  not  a  State  hospital. 
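(For reference, the interpretation of beta used above follows from the Weibull hazard in the usual scale/shape parameterization:

h(t) = \frac{\beta}{\alpha} \left( \frac{t}{\alpha} \right)^{\beta - 1}

so beta below one means the risk of the event falls the longer the person has stayed, beta equal to one reduces to the constant-hazard exponential case, and beta above one means the event becomes more likely with tenure, which is the post-COVID promotion pattern described here.)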
This  is  a  health  management, health  support, health  education   activity. It's  not  24/7  in  a  hospital. That   moderates  expectations. But  you  might  expect the  separation  behavior among  their  RNs  would  change because  opportunities  have  changed in  the  private  sector. There's  a  lot  of  demand. On  the  other  hand,  state  employees, they  might  be  thought  to  be  comfortable. I  was  expecting  my  hypothesis  was  that they  would  be  more  likely  to  separate, but  that  didn't  happen. There  really  is  no  apparent  difference. Now,  this  is  not  a  complete  bibliography of  everything  that  I  read in  the  last  five  years, before  and  after  COVID that  may  have  influenced  things. This  is  just  a  handful  of  things. I  thought  they  were  fairly  telling. The  National  Healthcare  Retention & RN  Staffing  Report is  a  fairly  thorough  assessment of  what  people  expect in  hospital  administration and  what's  actually  been  happening in  terms  of  the  employment and  the  separations, the  turnover  behavior  of  nurses. Three  State  of  Florida  annual  reports. They  do  an  annual  report  on  a  fiscal  year that  straddles  two  calendar  years. The  last  one  available  is  2020. But  essentially, they're  simply  reiterating  that, yes,  they  have  a  number  of  open  slots, they  don't  have  them  all  full. Employment  and  nursing  is  dropping. They  don't  have  any  explanations for  these  things. I  also  don't  have  any  explanation for  the  spikes and  activity  earlier,  pre-COVID. I  have  a  question  into  someone in  the  Governor's  office  there, but  I  haven't  heard  back  yet. That's  basically  all  I  have for  this  presentation. I  would  be  happy  to  entertain  questions. The  slides  show  at  the  beginning, there's  my  email. You  can  contact  me  if  you  want. Thanks.
We don’t live in a static world. Dynamic visualization and visual management are essential elements of Lean Six Sigma; they link data and problem solving. As with detective work, it is important to be able to spot clues and patterns of behavior in a situation. Establishing a visual environment enables rapid processing of large data sets, which leads to quick detection of trends and outliers. The goal of Lean is elimination of waste. Waste is present in many forms, such as waiting for information, moving data to multiple sources, and over-processing data. Data visualization allows for reduction of these waste streams.   This presentation provides a real-life case study where JMP is utilized to help “move the data to a story” in a visual way that aids in communicating information, eliminating waste, and driving continuous improvement. This case highlights the use of JMP tools, such as Excel import, Query Builder, Graph Builder, data filters, control charts, basic modeling, reporting, and dashboards. The presentation also explains how visual management helped engage and empower employees throughout the organization.     Hi,  everyone. Thank  you  for  joining  us   for  our  presentation of  From  Data  to  Story: Using  visualization   to  drive  continuous  improvement. My  name  is  Allison  Bankaitis, and  my  co- presenter  is  Scott  W ise. A  little  bit  about  myself. I  currently  supervise  a  small  team of process engineers at  Coherent  Incorporated, but  I'm  still  very  involved  in  the  daily  process  engineering  efforts. Previously,  I  held  various  process  engineering  roles at  Corney  Incorporated, and  I'm  very  excited  to  show  a  case  study of  how  we  view  some  of  these  JMP  tools  in  our  process  engineering  work. A  little  bit  for  Scott. Thank  you,  Allison. I'm Scott  Wise. I'm  from  JMP  in  support Allison's  JMP  usage, as  well  as  other  customers in  the  Northern  California  area. And  I'm  just  real  excited  to  be  a  part   of  this  really  cool  case  study. Hopefully,  you'll  pick  up  a  lot  of  best  practices and  tips  from  some  of  the  things  that  helped  us. All  right. I  placed  the  abstract  here   for  future  reference, but  just  wanted  to  highlight  a  few  things. Coherent  has  placed a  recent  focus  on  Lean, which  aims  to  eliminate  waste. Tools  from  JMP   have  aided  data  visualization, which  in  turn has  enabled  reduction  of  waste. Another  advantage  of  these  tools  is  the  ability  to  engage and  empower  employees  throughout  the  organization. These  areas  will  be  the  focus  of  this  presentation. Our  first  section  is  about  eliminating   waste  in  the  data  collection  process. In  this  case  study, we  had  a  data  collection  process with  unnecessary  complexity. It  used  to  take  20  minutes  to  process  one  part. So  to  do  this,  we  had  built  a  data  query  and  access. This  is  just  a  screenshot  here  showing  an  example  of  a  few  data  tables where  we  combined   variables  from  various  tables to  get  the  output   that  we  are  looking  for. And  then  we  used  a  macro  to  pull  data for  an  individual  part  into  Excel. This is again, just  a  screenshot  of  an  example  database  connection  in  Excel and  the  code that  we  would  write  in  Excel. This  was,  again,  done  for  each  individual  part. From  that  data,   then  we  could   attribute  data  in  Excel. 
We  could  then  calculate  average  values  of  each  attribute. We  would  then  pull  additional  data  from  our  MES  website, such  as  part  type  or  other  items  listed  here. Then  all  that  data  was  copied  into  an  Excel  summary  log. So  we  maintain  the  log, but  we  weren't  really  doing  anything  to  track  or  analyze  the  data. With  JMP,  I  was  able  to  streamline  the  process. I  built  this  framework  in  about  an  hour and  reduced  process  time  to  five  minutes  per  part. And  in  this  case, this  is  just  for  one  engineer  myself, on  one  product  that  I've  worked  on. But  if  we  can  extend  this  to  multiple  products and  multiple  engineers, we  could  really  gain  a  lot  large  savings  of  time. So  to  do  this,  I  built  a  data  query in  JMP, which  included  both  the  attribute  and  MES  data  in  one  location. Just  a  screenshot  of  there'd  be several tables  here  pulling  in  the  data, the  different  variables  here, we  can  do  some  initial  filtering  in  the  data  query. So  in  this  case  I  selected  a  time  frame  that  I  wanted  to  focus  on and  then  a  subset   of  variable  that  I  down  selected, so  I  don't  manage  both  data  set. And  then  I  can  build  the  data  table  here and  always  clean  up   more  of  the  data  later  on  as  necessary. After  I  had  the  data, I  replicated  some  of  the  charts  that  we  already  had  in  Excel, just  made  them  very  similar  so  that  people  could  see what  they're  used to  dealing  with  for  the  time  being. After  that,  again, I  built  the  summary  table  to  replicate   what  they  were  used  to  seeing. I  calculated  the  average  attribute  data and  merged  the  original  table and  the  tabulated  table  into  one  summary  table so  that  they  could  have  an  output  what  they  are  used  to  seeing. The  next  thing   is  to  take  this  data and  move  from  data  to  story. So  to  do  this,  the  first  thing  I  was  curious  to  know was  what  does  the  data tell  us  about  current  performance. So  I  plotted  the  data  over  time  as  my  first  aspect and  I  will  show  that  to  you  in  JMP. So  just  using  this  graph  builder and  the  timestamp  that  I  chose and  then  the  main  output  that  I  started  looking  at, added  that  to  the  chart  here. What  I  used  to  do  is  manually  go  in  here and  add  reference  lines  using  this  field  here. But  what  Scott  showed  me,  which  is  really  neat and  then  extends  to  all  of  the  graphs, is  that  you  can  add  it   directly  to  the  data  table, you  can  add  the  spec  limits. You  just  go  into  the  Variable  of  Interest, Column  Properties and  go  down  to  Spec  Limits  and  add  the  values  in  here. This  is  checked  so  that  you  can  see the  graph  reference  lines  on  each  graph. So  once  I  had  that  output, I  could  see  that  there  is  a  large  amount   of  variation  in  the  data and  many  of  the  values   were  outside  of  the  spec  limit. So  the  next  thing  I  wanted  to  do was  compare  additional  variables. So  to  do  that  pretty  quickly,  I  was  able  to  just  add the  column  switcher   to  this  graph  I  already  had by  going  here  and  selecting  the  variable  I  wanted  to  change along  with  the  other variables  that  I  trust. Then  from  here  I  can  quickly  click  through  all  these  variables and  see  the  variation  in  each  one. 
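A small JSL sketch of the spec-limit and Graph Builder steps just shown, with made-up column names and limit values standing in for the real ones:

dt = Current Data Table();

// 1) Store spec limits on the output column so every graph can show them
Column( dt, "X21" ) << Set Property(
	"Spec Limits",
	{LSL( 12 ), USL( 14 ), Show Limits( 1 )}    // limit values are illustrative only
);

// 2) Plot the output over time in Graph Builder
dt << Graph Builder(
	Variables( X( :Timestamp ), Y( :X21 ) ),
	Elements( Points( X, Y ) )
);

The Column Switcher used to step through the other outputs can then be added from Graph Builder's red-triangle menu.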
Next  for  me, I  have  some  process  knowledge and  I'm  sure  you  would  have  process  knowledge  of  your  situation  as  well. Based  on  this  process  knowledge, I  was  able  to  select  a  variable that  I  thought   might  be  driving  some  of  the  variation. In  my  case, I  thought  that  X3   might  be  responsible for  driving  some  of  these  trends  that  I  was  seeing. I  put  that  into  our  graph  here. The  other  piece   of  process  knowledge  that  I  have is  that  our  spec is  based  on  the  average  value  for  each  part. I  changed  this  to  mean  and  then  I  was  curious  to  see the  line  of  fit  over  time, so  I  put  that  and  added  that  here  as  well. The  next  thing  I  was  interested  in  seeing is  a  little  bit  more  about  performance by  looking  at  a  control  chart. To  see  the  control  chart,  analyze  quality   and  Process  Control  chart  builder, and  I  was  curious  to  see  it  against  X2 , which  is  a  part  number and  that  same  variable  that  I've  been  looking  at. A gain,  I  was  going  to  split  it  out  by X3. From  here,  we  can  see  there's  a  shift  in  the  average based  on  which  subset  of  X3 . A lso  the  thing  that  was  obvious  to  me is  that  the  sample  sizes  were  uneven. To  me,  knowing  the  process, I  know  they  should  each have  10  collections  of  data  for  each  part. So  based  on  our  process,  I  said, well, to  get  an  initial  look  at  the  performance, I'm  going  to  limit  to  only   parts  that  have  10  measurements  per. To  do  that,   we  made  a  new  data  table, cleaned  up  the  data  again, and  once  I  had  this, I  recreated  the  control  chart with  just  a  small  change. Then  here  I  added  a  local  data  filter to  have  X3 split  out  on  two  separate  graphs. That  was  my  learning. Now  I  can  see  these  upper  and  lower  control  limits and  this  process  capability  chart, since  now  I  have   the  even  subgroup  sample  size  of  tech. That's  where  I  will  hand  it  over  to  Scott. Thank  you  very  much. All  right. I'm  going  to  pick  up  with  the  rest  of  the  story. Allison  has  done  a  great  job  of  understanding where  the  current  performance  of  her  process  was. But  we  also  thought  there  might  be some   other  key  variables  within  her  data that  could  be  useful  for  explaining  these   differences  we're  seeing  in  the  output. One  of  the  things   that  we  tried  was  actually  a  modeling  tool that's  very  simple  and  often  used  to  screen  for  important  variables. It's  called  a  partition. In  this  partition, all  you  have  to  do  is  of  course, you're  going  to  pull  up  your  data and  then  it  is  under  predictive  modeling. People  call  this  a  decision  tree, and  I'll  show  you  why  when  we  start  to  fill  it  out. But  all  you  got  to  do  right  now  is  give  it  an  output  that's  our  21 there, X 21, and  get  it  the  inputs  we  want. I'm  going  to  put  all  the  inputs  in  except  for  X2 , which  was  the  kind  of  a  part  ID. I'm  going  to  remove  that  one. There  was  another  one  that  Allison  recommended  I  removed, given  her  process  knowledge,   and  that  was  X8. But  we'll  leave  all  the  others  in. When  I  say  Okay, it  brings  up  the  start  of  a  decision  tree. 
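(Stepping back to the control chart piece for a moment, a hedged JSL sketch of that launch, with the part number as the subgroup and a local data filter on X3; column names are assumed from the talk.)

dt = Current Data Table();
dt << Control Chart Builder(
	Variables( Subgroup( :X2 ), Y( :X21 ) ),               // part number as the subgroup
	Local Data Filter( Add Filter( Columns( :X3 ) ) )      // split the view by X3
);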
What  it's  doing  is  saying,  I  can  make  a  bunch  of  splits and  I'm  going  to  look  at  all  the  inputs and  I'm  going  to  try  to  find  a  cut  point. We'll  breaking  basically   any  of  those  variables  into  two  groups. Will  that  give  any  explanatory  value toward  the  differences I'm  seeing  in  the  output? In  this  case,  X 21. So  if  you  make  the  first  split, it's  saying  that  I've  explained  27% of  all  the  difference you're  seeing  in  the  output via  just  splitting  X 19  at  500. If  it's  greater  or  equal  to  500, I'm  going  to  have  a  much  lower  mean  of  12. 67. If  it's  less  than  500, watch  out,  it  jumps  up  to  13. This  is  really  cool  to  find  other  things   I  might  want  to  split, break,  view  on  my  graphs. You  can  continue  splitting  and  it  will  look  at  other  variables like  X3  came  into  place  here and  Allison  already  knew that  was  going  to  be an  important  variable. A s  you  keep  splitting, you  can  see  it  starts  to  add in  terms  of  the  predictability. This   RSquare, the  closer  to  one,  the  more  predictable. So it's  like  56%  predictability  here. I've  gone  ahead  and  done  that, I'll  show  you  what  that  view  looks  like. Here's  the  finished  view   I  came  up  with. I've  got  these  nice  big  column  contribution  bars  here  at  the  bottom. You  can  see  that  X 19  got  split. Actually  found  five  cut  points  for  X 19, but  52%  of  all  the  splits  it  was  doing  involved  X 19, so  it  gave  it  a  nice  big  bar. The  next  three  would  be  next. Then  everybody  else   was  very  small  contribution or  no  contribution. It leads  us  to  say, "Hey,  X 19  might  be  important   and  it  reinforces  X3  being  important." Now  that  we  have  that  information, well,  how  confident  are  we  that  these  things  do  belong in  our  study? Here  it  would  be  nice to  look  at  X 21  by  X 19  broken  out. This  one,  of  course, is  going  to  be  just  simply  going  back  into  our graph  builder. This  chance  we  can  put the  X 19  down  on  the  bottom  axis so that would be the only X. Let's  go  ahead  and  put  our  X 21  right  there  on  the  Y. We  can  break  that  out  by  the  X3  variable,  which  is  pretty  cool. Now,  one  thing  we  might  want  to  do, X2 was  the  part  ID. We  can  give  it  some  color  or  some  overlay. Either  way,  I  think  I  will  just  go  ahead and  give  it  some  color  here and  I  will  turn  off  the  line. That's  helpful. But  what  would  be  helpful is  to  use  that  local  data  filter that  Allison  showed, in  case  they  want  to  really  look   at  a  specific   sequence  of  parts. I'll  go  under  the  red  hotspot  there, that  red  triangle. I'll  go  local  data  filter  and  then  we'll  add  the  X2, and  beautiful. Now  we  can  go  and  just  change  up our  view  by  that  local  data  filter. That  was  a  cool  view  that  we've  got. I  can  see  that  it's  making  a  lot  of  differences  there. Now  one  thing  you  might  ask  is  could  we even  model  this? Before  I  even  go  and  model  it   so  we  can  make  some  predictions, how  sure  am  I  that  X3 and  X 19  really  are  affecting  X 21? Well,  we  can  actually do  a  statistical  test. We  can  test  means. The  way  we're  going  to  do  that  here is  we  are  going  to  go  back  to  our  data. 
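Before moving on to the means test, here is a hedged sketch of the partition launch itself; the X list is an illustrative subset (everything except the part ID X2 and the excluded X8 in the real launch), the Split Best message stands in for clicking Split, and Column Contributions is turned on from the red-triangle menu.

dt = Current Data Table();
pt = dt << Partition(
	Y( :X21 ),
	X( :X3, :X19, :X5, :X7 )     // placeholder subset of the candidate inputs
);
// Make several splits, as done interactively with the Split button in the demo
For( i = 1, i <= 5, i++, pt << Split Best );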
We're just going to go to Analyze, Fit Y by X, and now we're going to look at our output. We want to look at the effect on X21 of those things we care about, X3 and X19. I'm going to put them both in here and it's going to give me some different views. It's going to enable me to compare means in this one-way analysis. I'm going to right-click and turn on the means test, and I even like this All Pairs, Tukey option. I'm going to adjust our axis here. It's got these cool means diamonds. The middle of the diamond is the mean, and the edges are the 95% confidence interval around the mean. The way it works is: if you would slide these diamonds over, would they overlap? It looks like they would pass like ships in the night; there's no overlap. As well, you've got these comparison circles; you can click on one and see if the other one turns a different color. All of this is based off a 0.05 alpha. What does that mean? That's your confidence level, so we're 95% confident. We'd be right 95 times out of 100 in saying that the level of input X3 is having an effect on the observed measurements of X21. Given that, before I go and try to fit a line or a curved line, I can go under this red triangle hotspot and say, you know what, let's go ahead and group by X3. Now when I go back under this triangle option and fit a line, or in this case, since I know there's a little curvature, fit a quadratic or polynomial line, it breaks it out by X3, so I'm really excited about that one. The blue line, which is the first version, 3_0, there's the formula for it. It only has 20% explainability; it's not a great fit. But you can see that jumped up to near 70% predictability for 3_1. It's telling me that not only is X3 significantly different, and not only am I seeing a difference in X21 when it comes to X19, but it matters for X19 what level of X3 we're talking about. That's why the red line and the blue line are not on top of each other. Therefore, that's an interaction, and if I'm going to try to predict something, I need to include it. So at this point, I think I have all we're going to need to put into the hands of Allison and her peers a really cool tool that can help them predict what the output is going to be based on settings of X3 and X19. You're seeing on the screen a profiler that comes off our modeling platform, and it's very easy to set up. If we go back, I'm going to go to Fit Model here. We'll do our output, X21, again. Under my inputs, I know X3 and X19 are very important. I told myself X3 and X19 might need to be crossed; I might need to see that interaction. I know for X19 there's some curvature. The way I would add this is I'd select X19, go under this Macros menu, and say Polynomial to Degree; I have it set at two, so I get this curve term, the polynomial term, here.
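Before the model itself is run, the two Fit Y by X launches used in this step roughly correspond to JSL like the following; column names come from the talk, and the option names are my best reading of the menus shown, not a verified script.

dt = Current Data Table();

// One-way comparison of the output by X3, with the all-pairs Tukey means test
dt << Oneway(
	Y( :X21 ),
	X( :X3 ),
	All Pairs Tukey HSD( 1 )
);

// Grouped quadratic fit of the output against X19, one curve per X3 level
dt << Bivariate(
	Y( :X21 ),
	X( :X19 ),
	Group By( :X3 ),
	Fit Polynomial( 2 )
);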
There's  the  interaction, and  these  are  the  main  effects, so  it's  really  two  factors, but  it's  the  four  things  in  my  model. So  I'm  just  going  to  run  it   and  it's  going  to  try  to  fit  a  line. This  should  look   very  much  like  the  fit  Y  by  X. It's  only  really  explaining  52%. This  model  is  only  explaining  52% of  the  differences  I'm  seeing  in  x 21. Not  perfect,  but  it's  pretty  much, think  about  it, just  for  having  two  factors   and  their  interaction in  one  of  their  curve  terms,  that's  pretty  good. But  what  I  can  do  now  under  that  red  hotspot  is  turn  on  the  profiler. This  is  worth  the  price  of  admission. This  right  here  is  going  to  enable  Allison  and  her  team to  sit  there  and  talk  about   what  settings  we  should  have. S hould  I  be  at v3_1 ? Should  I  be  at  v3_0 for  this  x_3  input? Should  I  be  low  or  high? A gain,  it  shows  that  interaction  live. For  example, I'll  shift  this  color  here. Watch  what  happens  when  I'm  low. I'm  sorry,   not  low  but  high  on  X 19,  I'm  way  out  here  to  500. By  the  way,   you  can  type  in  what  you  care  about. Maybe  I  want  to  see  what  it's  at  480. Look  how  flat   that  line  is  between _0  and  _1. It doesn't  really  matter   which  one  I  select. I'm  going  to  get  the  same  kind  of  prediction. The  red  is  my  prediction, and  the  blue  around  it  here is  my  confidence  interval around  that  prediction. Of  course,  this  wouldn't  be  good because  I'm  right on  the  lower  spec  limits. Watch  what  happens  when  I  start  to  pull  it. Well,  I  might  be  happier  here   with  version  3-0  in  a  setting  around  350 because  that  gets  me  close  to  the  targets. But  if  I  keep  going  up  here, you  see  how  steep  this  line  is  begin, and  I  definitely   don't  want  to  be  on  version  3_1. Because  it  has  a  steeper  line and  it  has  made  this  slope  very  steep. It's  all  coming  out,   but  it's  interactive  in  this  profiler and  now  we  can  play  with  what  would  be  the  right  settings for if  I  had  to  stay  with  version  3-1. If  I  go  to  version  3-0, what  would  be  the  right  settings  here? They  might  be  different  settings. There's  always  multiple  optimal  settings  you  can  select. This  is  really  cool. We  now  have  the  ability  to  predict. All  right. Continuous  process  improvements. All  this  was  great. We  now  have  a  faster  way to  get  our  analysis  done. We've  gone  through  a  flow  that  enable s us  to  find  what's  important and  see  what's  important. But  what  if  we  want  to  use   that  information  to  monitor  over  time and  continually  improve  our  process? It  might  be  nice  to  have   for  different  levels  of  X-3  a  dashboard. Allison  and  I  worked  to  create   a  standard  type  of  dashboard that  her  team  is  used  to  seeing. They're  used  to  seeing control  charts  first, and  then  the  process  capability  around  their  specs. Then  next  they  would  want  to  see the  output  over  time. That's  the  top  chart  in  the  middle  there and  then  below  that  if  there's anything   else  they  should  worry  about. That  was  our  big  finding,  that, "Hey,  x 19  has  an  effect," so  they  would  want  to  see  that. 
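(Backing up to the model step for a moment: that specification maps to a JSL sketch like this, with the profiler turned on afterwards; column roles follow the talk, and the option details should be treated as assumptions.)

dt = Current Data Table();
fm = dt << Fit Model(
	Y( :X21 ),
	Effects( :X3, :X19, :X3 * :X19, :X19 * :X19 ),   // main effects, interaction, quadratic in X19
	Personality( "Standard Least Squares" ),
	Run
);
fm << Profiler( 1 );                                  // interactive prediction profiler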
Lastly, on the right-hand side, we put a table with the average means for the output of interest, plus some more outputs they like to take a look at. Of course, we want this to be interactive. So how can we build this dashboard for level zero and level one? We're going to bring up our data here; I think I already have it opened. I will now create everything in one swoop. This is why it's nice to be able to save your graphs and your analyses back to the data table. I'm just going to click and create a whole bunch of views here that replicate what the team wants to see. Here is that Control Chart Builder for the X-bar and R chart. Next, we have the process capability. Next, we have that output over time. Next, we have the output over X19 that we wanted to show. Now we have the table. I have all the elements, and if you have all the elements, you don't have to save them back and make someone run them one at a time. You can combine them into a dashboard template; it's under File, New Dashboard. It will allow you to pick some type of template to start off with; I'm just going to pick this blank template. Now it's got all my reports, all the graphs and tables and things I've opened, on the left. I can just bring into the body of the dashboard what I care about and orient things the way I would like to see them on my dashboard. When I'm done, it's easy to run that dashboard and then later save it when I'm ready. I've already got that run here, so I'm going to close down the dashboard builder and show you the dashboard we have already created to capture all this information. With one click of the button, here's my dashboard. And boy, it's a beautiful-looking dashboard, just the way I want to see it. Now, the thing that we loved about this was the ability to still use JMP's dynamic linkage. I can select a couple of high points and see where they flow in the other graphs. I can even see down here where they're highlighted in my table. So this is great, but what about that X3 variable? We knew we wanted to be able to create separate dashboards for each of its levels. So instead of using a local data filter, I'm going to use a global data filter. It's under your Rows menu, right at the bottom. This one affects all graphs and all analyses; it affects what's hidden and selected back in your data table. On this one, I'll just go ahead and put X3. Now when I click on Show and Include, I'll turn off Select so I can make my own selections. Now I can toggle between that _0 and that _1, and it works the same way. I can see things that were out of control or out of spec here for just version 3_0, and then I can do the same thing for 3_1. There we go.
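For reference, the global (table-level) data filter being described can also be scripted; a minimal sketch, with the Show/Include/Select modes set as in the demo:

dt = Current Data Table();
dt << Data Filter(
	Add Filter( Columns( :X3 ) ),
	Mode( Show( 1 ), Include( 1 ), Select( 0 ) )   // affects every open report, unlike a local filter
);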
We  have  a  nice  tool  that  can  be  really   used  to,  again,  not  get  data  just  quicker and  not  just  do  one  analysis, but  actually  make  this a  continuous  process  improvement  tool that  we  can  use  day  in  and  day  out to  quickly  get  the  view  we  want and  ask  the  questions we  need  to  drive  improvements. All  right. So  that  is  our  story  of  moving from  data  to  story,  I  should  say. We  wanted  to  leave  you   with  where  to  learn  more, where  to  get  more  information. Of  course, we're  going  to  give  you  the  presentation. We're  going  to  give  you  the  journal  we  use so  you  can  replicate these  views  we're  seeing. But  Allison  and  I  felt  that  if  you  were wanting  to  really  get  started  with  JMP, go  to  the  Getting  Started  with  JMP  webinars  that  we  have. So  it's  on  the  JMP  website, will  include  links  in  the  journal, and  it  covers  about  everything we  showed  you  today. We  had  a  few  more  tips  and  tricks, but  the  new  user  welcome  kit  is  another  really  good  thing  to  take. This  one   allows  you  to  work  with  a  data  set, it gives  you  a  data  set  that  you  can  follow  along, and  it's  really  nice  step- by- step  instructions. We're  both  big  fans  of  the  Statistical   Thinking  for  Industrial  Problem Solving. Free  online  learning, basically  E-learning  course, and  you  have  so  many  different  places  you  can  do. I've  used  this  to  do  just  in  time  learning, and  I've  had  a  lot  of  people t ake  all  the  sections  just  to  get  up  to  speed on  everything  JMP  can  do   to  help  you  compare  and  describe and  predict  all  those  fun  things  you  want  to  do. Don't  forget,  if  you  have   specific  things  you  want  to  do, we  do  have  Mastering  JMP  webinars  that  are  available  here. The  JMP community, communityjmp .com is a good place to look for  just  in  time  learning, and  as  well,  JMP  Education, if  you  want  to  get  more   of  the  underlying  theory on  how  a  lot  of  these  things  work. We  do  a  lot  of  public  training   or  can  customize  training  for  you  as  well Just  talk  to  JMP  Education. All  right. I  will  allow  Allison to  say  a  few  words  when  we  finish. But  thanks,  everybody,  for  joining  us, and  we  hope  you  picked  up  on  a  few  things you  would  like  to  try  within  JMP. Thanks,  Scott. Thanks,  everyone,  for  joining  us. It  was  really  a  pleasure  to  share this  case  study  from  Coherent  with  you and  to  share  all  the  new  cool  tricks that  Scott  has  taught  me and  that  we've  learned  through  our  journey   with  JMP  at  Coherent. So  thanks  again  and  take  care. Bye.
Managed, non-persistent desktop and application virtualization services are gaining popularity in organizations that wish to give employees more flexible hardware choices (so-called "BYOD" policies), while at the same time gaining the economies of scale for desktop software management, system upgrades, scale-up and scale-down and adjacency to cloud data sources. In this paper, we examine using JMP Pro in one such service: Amazon AppStream 2.0. We cover configuration, installation, user and session management and benchmark performance of JMP using various available instances.     This is JMP in the Cloud: configuring and running JMP for non-persistent application virtualization services. I'm Dan Valente, the Manager of Product Management at JMP, and I'm joined by Dieter Pisot, our Cloud and Deployment Engineer, also on the product management team. Today, we're going to talk about an organization called Red Triangle Industries, which has been a JMP user for the last several years and whose JMP use has been growing across its R&D, quality, and IT departments; about how they're considering adapting to new remote and flexible work environments; and about how they hope to solve some problems with new technology for virtualizing JMP in a non-persistent way. We're going to play the role of two members of this organization, and we're going to go through the configuration, the deployment, and ultimately the use of JMP in this way. This is how Red Triangle has been growing: it started in 2015 with a core set of users, and every year we've been growing the JMP footprint at Red Triangle to solve problems, visualize data, and communicate results up and down the organization. Most recently, we've added JMP Pro for our data science team to look at some of our larger problems; we've got an IoT initiative, and we're doing some data mining and machine learning on the bigger data sets from our sensors and the things going on in our manufacturing plant. In the last year, we also added JMP Live to our product portfolio; you can see a screenshot of JMP Live in the back. What we're trying to do is automate the presentation of our JMP discoveries, and also our regular reporting, in one central database, so that everyone in the organization has access to those data to make decisions on things like manufacturing quality, revenue, and other key business metrics that we're sharing, all in one single place with JMP Live. So how is JMP being used at Red Triangle? One thing we did in the past year, in our IT organization, to which Dieter and I belong, is survey all of our users and put together an interactive visualization of which parts of JMP they use by department. This is called the workflow assessment. It's something we can send out to get some information, and it gives us opportunities to look for growth and training opportunities. This is also how we found out that some of our users want to have JMP presented to them in different ways, which is why we're considering application virtualization.
We've adopted a bring-your-own-device policy, which lets our employees purchase their own laptops, and we want to be able to get JMP to them there. What follows is the situation, the pains, the implications, and the needs that have us considering using JMP from an application virtualization standpoint. The situation for us, after this workflow assessment: we're profitable, we're a growing business in the manufacturing space, and we're adding more JMP users every year in different departments. As I mentioned, our core JMP use is growing year on year, we've added JMP Pro for our data science team, and in the past year, JMP Live for enterprise reporting and sharing of JMP discoveries. I'm playing the role of the CTO, Rhys Jordan, and I'm joined by Dieter, who's playing the role of Tomas Tanner, our Director of IT. We've been charged with finding ways to get JMP more efficiently to remote employees. We want to be able to analyze bigger problems, and we also want to support employees who want to take advantage of our BYOD, or bring-your-own-device, policy in 2022 and beyond. Historically, our standard laptop deployments have used between 8 and 16 GB of RAM. In some cases, especially with our larger manufacturing problems and sensors being put on much of our manufacturing equipment, we've got data sets we want to analyze that are just bigger than that standard deployment can handle. We also want to support our employees and their flexible work environment, which means that if they purchase their own personal laptop, we want to be able to get JMP and other software installed on it without physically being on site with the employee; we want to look into delivering that software by alternative means. Also, when new versions of JMP and other desktop software come out, we want to seamlessly apply those updates to our entire workforce, in a way that minimizes the latency between the release and when our employees actually get the update. Finally, when and if an employee leaves Red Triangle, or moves to another part of the organization that doesn't require JMP or another piece of software, we want to retain those corporate assets with minimal operational burden. The implications for this: we've been given a mandate, like many other organizations, to reduce our corporate technology spend, and we feel the biggest potential for reducing that spend is through automation. Looking at these non-persistent application virtualization tools should speed up the entire workflow of getting software to our end users efficiently. We want to lower the total cost of resource and computer ownership; this is why we've adopted the BYOD policy. But we also need to right-size the assets, even virtual assets, to the needs of the users: our power users analyzing our biggest datasets will need more RAM and more speed available to them, while the casual users can have assets right-sized to their needs.
With  employees on  three  different  time  zones, doing  something  like  just  having a  fleet  of  virtual  machines  for  everybody at  the  same  time doesn't  make  a  whole  lot  of  sense. Because  we  want  to  work  the  global  clock, we  can  design  a  fleet  of  virtual  assets that's  going  to  look  at  just the  total  number  of  concurrent  users that  are  accessing  the  asset  at  once. That's  what  we'll  get  to in  the  demo  here. Finally,  that  better  rollout of  software  updates and  the  transparency  of  usage to  our  executive  team, who's  using  the  software, how  much  are  they  using  it,  et cetera, are our  implications  for  us investigating  this  technology. As  far  as  needs, we  want  to  go  with  a  cloud  provider. We're  not  going to  build  this  tool  inhouse. We  want  to  use  one  of  the  cloud  providers and  the  out- of- the- box  capabilities that  they  have for  application  virtualization. Since  we've  moved a  lot  of  our  data  sources  to  the  cloud, to  Amazon  Web S ervices,  for  example, we'd  like  to  be  able to  put  our  analysis  or  analytic  tools close  to  those  data  sources to  minimize  the  costs of  moving  data  around. Our IT  department  wants  to  centralize the  management  of  JMP  setup and  license i nformation, and  also  have that  seamless  version  control. As  soon  as  a  new  version  is  released, we  want  to  be  able  to  push  those  updates as  efficiently  as  possible, and  then  look  at  usage  tracking through  things  like  cloud  metrics and  access  controls. With  this,  I'm  going to  hand  it  over  to  Dieter to  give  a  demo  of  running  JMP in  an  application  virtualization, a  non- persistent application  virtualization  tool like  Amazon   AppStream. Dieter. Thanks  Dan. The  first thing  we  have  to  do is  we  have  to  go  to  the  image  builder and  launch  that. We  have  to  pick a  Windows  operating  system. Windows  Server  '19 is   what  we  want  to  pick  here. There's  several  available. We  just  pick  a  generic  basic  one like  this  one,  move  on, and  give  it  a  meaningful  name and  display  name. Because  we are Red  Triangle, we  use  Red  Triangle  for  this  one. We  have  to  pick  a  size for  the  image  that  we  want  to  configure. We  pick  a  standard  one,  medium. I'm  going  to  add  an I AM  role because  I  want  to  connect  to   S3, where  I  keep  all  my  installer. Just  to  make  sure  I  can  connect  there, I  add  a  role  with  access  to   S3. Then,  I  have  to  define in  which  private  network I  want  to  run  my  image  builder  here. Pick  a  public  subnet so  that  I  can  actually  connect  to  it from  my  local  desktop. Security  group, just  to  make  sure  only  I  can  do  that, and  not  everybody  else  can  connect. We're  not  worrying about  the  active  directory  set up, but  we  want  to  make  sure we  have  internet  access so  we  can  download  things from  the   internet, like for  example,  a  browser. We  check  all  the  details. They're fine, s o  we  launch  our  image builder. This  is  going  to  take  a  while. AWS   AppStream  basically set up a  virtual  machine  for  us that  we  can  connect  to and  set  up  our  application. After  that  has  started, we  connect  to  the  machine as  an  administrator. 
To  save  some  time, I  downloaded the  JMP  Pro  installer  already, and  did  install  JMP  Pro, just  like  you  would on  any  other  Windows  desktop  machine. We  have  the  application   icon  here. In  addition  to  that, I  have  created  a  JSL  script in  the  Program  Data- SAS- JMP  directory, jmpStartAdmin,  that  has  a  few  settings so  that  we  make  it  easier for  our  users  to  do  certain  things. What  they  contain is  a  connection  to  a  database and  the  JMP L ive  connection to  our  Red Triangle JMP Live  site, so  the  users  don't  have  to  remember and  type  that  in. So   that's  perfectly  fine  here. Then,  we  have  to  go  to  the  image  assistant and  configure  our  image. The  first thing  we  add, an  application  to  our  image. That's  going  to  be the   JMP Pro  that  we  just  installed. We  are  going  to  pick  the  executable. We  use  the  JMP  executable. We  give  it  just  some  properties, give  it  a  more  meaningful  display  name. Save  that. What  we  can  do  now… Here's  our  application  that  we  want to  make  available  to  the  user. The  next  thing  we  can  do  is  test and  set  it  up  as  the  user  would  see  it. We  are  going  to  switch  users  here. We  have  the  ability  to  switch to  a  template  or  test  user. The  template  user,  that's  defining how  the  user  would  actually run  the  applications. Whatever  we  do  here is  going  to  be  remembered, and  the  user  will  have the  same  experience  as  our  template  user. We  can  do  a  few  things  in  the  setup. We  can  here also  make  sure that  our  database  connection  is  working. We  could  do  this  as  the  test  user  as  well, but  I'll  just  do  it  here as  our  template  user. Here  we  are,  applications  perfectly connected  to  our  database. With  that,  we're  fine  with  our  setup. We  go  back  to  the  image  assistant. Not  going  to  use  the  test  user, switch  users  again. We're  not  going  to  show  the  test  user. I'm  going  to  go  back  to  the  administrator and  continue  with  the  setup  of  our  image. We  switch  here. Same,   not  going  to  the  test  user. Now,  what  we  have  to  do is  we  have  to  optimize. You  have  to  configure  the  application. We're  going  to  launch  JMP. Once  it's  running, and  we're  happy  with  all  of  this, we  continue  the  setup  of  the  image by  clicking  the  Continue  button. What   AWS AppStream  is  doing  now is  optimizing the  application  for  the  user. We  just  wait  for  that  to  finish, and  then  give  our  image  a  name, and  a  display  name  as  well. Again,  we're  using  Red Triangle  here. We  also  want  to  make  sure we  use  the  latest  agent so  that  we  always  have an  up-to-date  image. Next,  review. We  disconnect  and  create  the  image  now. With  that,  we  are  going to  get  disconnected from  the  image  builder. Lost  the  connection,  obviously, and  our  session  expired. We  return  to  the  AppStream  2.0  console and  we  see that  our  image  has  been  created. It's pending  right  now. This  also  takes  time  to  create it the  way  we  want  it  to  be. We  have  to  wait  for  that  to  finish. We're  done. It  has  finished. The  next  step  is  to  create the  fleet  the  images  are  going  to  run  on. We  create  the  fleet, we  pick  which  type  of  fleet. We're going  to  go  with   on-demand  fleet because  that's  much  cheaper  for us. 
The  images  only  run when  the  user  actually  requests  the  image, whereas  the  always- on will  be  running  constantly. Here,  we  give  it  a  name  and  a  description. We  then  pick  the  type  of  image we  want  to  give  to  our  users. A bunch  of  other  settings are  available  to  us, like  timeout  settings  and  capacity. For  now,   we're just  going to  go  with  the  default. We  can  adjust  to  the  needs of  our  users  if  necessary  at  any  time. We  click  Next. We  pick  the  image that  we  just  created   to run on  our  fleet. We  define   the  virtual  network, and  the  subnets that  our  fleet  should  run  in. We'll just  pick  the  same we  used  before. Also  a  security  group,  of  course, to  make  sure  that  only  the  users  and  host that  we  want  can  access  our  fleet. Again,  we  want  to  give the  fleet  internet  access. We're  going  to  check  that to  make  sure  users  can  publish to  our  JMP Live  site. We  could  integrate active  directory  authentication  here, but  we  don't  want  to  do  that. That would  take  us  some  time. We're  just  going to  go  with  some  local  users that  I  have  already  created. We  click  Next. We  are  presented  with  a  review of  what  we  did, and  it's all  fine. We  are  creating  the  fleet. There's some  pricing  information we  have  to  acknowledge. With  that,  the  fleet  is  starting  up. Once  that  has  happened, we  can  move  on  and  create  the  stack. The stack helps  us  run  that  fleet and  helps  us  define  persistent  storage for  our  fleet,  for  example. Here,  we  create  the  stack. As  well,  give  it  a  meaningful  name. Since  this  is  our  Red Triangle  site, we'll  go  with  a  very  similar naming  convention  here. We  pick  the  fleet that  we  want  to  run  in  our  stack. All  looks  good. We  move  on. Here,  we  define  the  storage. We're  going  to  go  with  the  default, which  is  an   S3  bucket that's  available  to  each  of  the  users. We  could  hook  up  others, but   S3  for  us  is  fine  at  the  moment. Just  a  couple  of  things on  how  we  want  to  run  our  stack. All  of  them  seem  fine. We  go  with  the  defaults  here. Quick  review. Everything's  fine. We  create  our  stack. That's  it. The  stack  is  there. Stack  has  been  created. What  we  now  need  to  do is  go  to  that  user  pool that  I'd  mentioned  earlier since we're  not  using   active directory. In  here,  I  have  defined  three  users that  can  access  our  stacks. But  what  we  need  to  do  is  we  need to  assign  that  stack  to  each  of  the  users. In  my  case,  I'm  going  to  pick  me and  assign  that  stack that  we  just  created. We  could  send  an  email  to  the  user to  make  sure  they  are  aware of  what  just  happened, and  the  stack  has  been  assigned  to  them. That's  all  we  have  to  do  to  set  this  up. If  I  now  go  to  the  link that  was  emailed  to  me, I  can  log  into  that AppS tream  session. I  use  my  credentials that  my  admin  has  defined  for  me. Here are  my  stacks. I  use  the  Red  Triangle. Here's  the  application that  that  stack  provides  for  me. This  is  going  to  take  a  while. As  I  said,  it's  on -demand. It's  like  booting a  PC and  running  JMP  on  that  machine. It's  going  to  take  a  few  minutes. 
The  always- running  would  be  much  faster, but  again,  they  would  cost  money because  they're  running  constantly, versus  the  on- demand  runs  only  on- demand. Here in  my  browser,  JMP  is  started, and  JMP  is  running just  perfectly  fine  in  my  browser. Let's  do  some  work. I'm  going  to  connect  to  a  database. Because  my  administrator  has set this  up  already  for  me, there's  nothing  much  for  me  to  do. The  connection  to  my  database is  already  there. My  table's  available  to  me. I'm  going  to  pick one  of  the  tables  in  my  database. This  is  a  Postgres  database. I'm  going  to  import  it  right  away. Here's  my  table. I've  written  a  nice  script to  build  a  wonderful  report. I'm  going  to  just  quickly create  a  new  script. I'm  going  to  cut  and  paste  that from  my  local  machine to  my  AppStream  image by  using  the  menu  that's  available  to  me. Now,  I  can  cut  and  paste  it into  my  scripting  editor. I  run  that. Here's  my  report. That  report  I'm  going  to  publish to  our  Red Triangle   JMP Live site  now. I'm  going  to do  File , Publish. Because  again, my  administrator  has  set  this  up  for  me, the  information  about  my  Red Triangle  site is  available  to  me. I'm  just  going  to  get  prompted  to  sign  in to  make  sure  it's  really  me. In  this  case, I'm  going  to  use  our  imaginary  identity. Use  the  username  and  password, sign  in,  go  through  the  publish  dialogue, and  don't  change  anything, just  hit  Publish. Report  has  been  published. Now,  what  I  can  do, I  can  go  to  another  tab  on  my  browser and  verify  that  the  report is  actually  published to  our  Red T riangle  JMP  Live  site. I  switch  over,  go  to  All P osts. Here's  the  report that  Tomas  just  posted  a  minute  ago . It  looks  exactly  as  it  did in my  virtual  machine. Thank  you  very  much.
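To recap the scripted pieces of this demo in one place: both the jmpStartAdmin.jsl on the image and the report script Tomas pasted in are ordinary JSL. Here is a minimal, hypothetical sketch of what they might contain; the DSN, table, and column names are invented for illustration, and the JMP Live connection details that the real startup script pre-configures are only indicated by a comment.

// jmpStartAdmin.jsl (sketch): pre-wire the shared database connection
// so users can pull tables without typing a connection string.
// (The real script also pre-configures the Red Triangle JMP Live
// connection; that part is omitted here.)
dt = Open Database(
	"DSN=RedTrianglePG;",          // hypothetical ODBC data source on the image
	"SELECT * FROM process_data",  // hypothetical table
	"Process Data"                 // name of the resulting JMP data table
);

// Report script (sketch): the kind of graph pasted into the script editor
// and then published through File > Publish.
dt << Graph Builder(
	Variables( X( :Run Date ), Y( :Yield ) ),   // hypothetical columns
	Elements( Points( X, Y ), Smoother( X, Y ) )
);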
Building an analytic workflow for any manufacturing process can be daunting. This presentation will demonstrate the ease of building an analytic workflow from preparing the data to analyzing the final product. The workflow demonstration will show steps for data visualization and multivariate analyses including clustering, predictive modeling and optimization of the process. Additionally, a chemometric modeling approach to quantify the active ingredient in a finished product will be included.     Hello, everyone. My name is Bill Worley, and I am a systems engineer for JMP on the US Chem team. Today I'm going to be talking to you about an analytic workflow for data and chemometric analysis with JMP. I've got a few things I want to highlight. We're going to follow the analytic workflow that we share: getting data in, cleaning and blending, visualization, exploratory data analysis, building models, and then ultimately, what are you going to do with that data and how are you going to share it? A couple of things that are important are the new JMP Workflow Builder, which I'm going to highlight; this is just a snapshot of what I'll be showing you in a little bit. The chemometric part of this is analyzing spectral data, using Functional Data Explorer for preprocessing, and these tools are now built into JMP. If you can see over here, we've got a tab in FDE now where you can choose from different types of preprocessing: standard normal variate, multiplicative scattering correction, Savitzky-Golay filtering, and baseline correction. And I believe there will be maybe one or two other things added to that. Just so you know, the data I'm using is pulled from this paper, right down here, from 2002; that's where the data is coming from. All right, I'm going to put that aside for now and get things going. I've got my home window, and I'm going to start with File, New, Project. The workflow I'm going to be working with is this one right here; I'm going to right-click on it and open it. What I've done is taken the data set I want to work with and built all these steps in the Workflow Builder, and I'm going to play that for you now, so it will populate our project here. I'm going to go ahead and hit play. As you can see, it's building the workflow and doing some analysis; we're doing some model screening right now, and then everything is complete. Now we have all these tabs across the top where we've completed that analysis with the workflow. I've actually included one other table in there; when we get there, I'll talk more about it. We pulled that table in just so you can see the source data. We've actually pulled this data in from an Excel file, and getting the data into JMP from Excel is fairly easy. And we built some exploratory data analysis: the first step is a distribution, and we can interact with it just like anything else.
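As a rough illustration of that first step, the Excel import and the distribution could be scripted along these lines; the file, sheet, and column names are placeholders rather than the ones used in the demo.

// Import one worksheet from an Excel workbook (hypothetical names)
dt = Open( "Tablet Process Data.xlsx", Worksheets( "Process" ) );

// First exploratory step: a distribution of the key response
Distribution( Continuous Distribution( Column( :Dissolution ) ) );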
Everything's interactive from the Workflow Builder. I did some graphing where I put the column switcher in and added the local data filter. Just so you know, as we build the workflow, all these things are built in, and the recording helps you keep track of what's going on. You can see that we've got full functionality going on there. I did some more exploratory data analysis with Fit Y by X, in this case fitting mill time versus dissolution and blend time versus dissolution. Just to get back to it, this is tablet data that's pretty popular within the SCE community within JMP, and I'm just building on how we would analyze it and build out this workflow. The next step is multivariate analysis to see, for dissolution, which is our key performance indicator, which factors might be highly correlated with dissolution. I'm not seeing anything that jumps out too much; we do have some partial correlation, but no one factor is jumping out as the answer for the data we're looking at. We can get a better understanding of which factors might be important if we look at a predictor screening. We can see here that screen size, mill time, and spray rate look to be important factors that we could use to build a better model. Next, we can set up a stepwise regression; I'm going to go here and actually run this model in a second, and then we've got that output, so the data is there and we can use it as needed. So we built that model, and we're out here looking at another type of analysis, a decision tree, or partition analysis. We can do a neural net, build that model, and we can do a partial least squares. So we've got all those things together. But once all is said and done, you can use Model Screening in JMP Pro to build these models out and find out which is the best overall model. Based on that, we can see that a neural boosted model is probably the best overall model for us to work with. We can then take all this information and share it with our colleagues, co-workers, and anybody who might be interested, and we can do this in several different ways. One of the best ways is to use JMP Live and put everything out there for folks to look at and share. That's the first part of the analytic workflow, and if we look back here, that's all set up in this portfolio. As I said before, I had opened another table, and this is for the chemometric part of the analysis. This is near-infrared data for finding the active ingredient in tablets. We built the tablets, we made the tablets; now we have to take the finished product and find out what it's all about. Do we have the right active ingredient, and can we tell, based on this technique called near-infrared analysis? We're going to step through a few different things, but I'm going to turn the Workflow Builder back on to record these steps.
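Before recording the next steps, here's roughly how the screening launches from this first part look in JSL. The column names are the factors mentioned above, Model Screening requires JMP Pro, and the arguments are an approximation rather than the saved script from the demo.

// Which factors matter most for dissolution?
Predictor Screening(
	Y( :Dissolution ),
	X( :Screen Size, :Mill Time, :Spray Rate, :Blend Time )
);

// Fit several model types at once and compare them (JMP Pro)
Model Screening(
	Y( :Dissolution ),
	X( :Screen Size, :Mill Time, :Spray Rate, :Blend Time )
);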
So let's turn that on, go back to our data set, and do some analysis now. I want to clear out these row states first. Now I want to go to Analyze, Clustering, Hierarchical Clustering. My data is grouped; there are 404 wavelengths in that group. I'm going to pull those in and say OK. Let's build this out a little bit: we're going to look at three clusters, and let's color those clusters. If I pull this down a little bit, you can see we've got three clusters, a fairly big green cluster and two smaller blue and red ones. So we've got that. Now let's go back to our data set and do a Graph Builder. In Graph Builder, let's pull our wavelengths in here to X, do a parallel plot, clean that up a little bit, and right-click to combine scales, parallel merged. I'm doing these steps pretty fast; this is something you'll want to go back and watch again if it's of interest. The thing I want to show you here is that the data is pretty scattered, and there's a lot of baseline separation, and maybe some additive and multiplicative scattering, that we need to clean up. So let's go back to our data table and go to another analysis step: a multivariate method, principal components. Again, we'll pull all our wavelengths in and say OK. The thing I want you to note here is that all 404 wavelengths are grouped right around this little area right here. That is highly correlated data. We could build a model off of that, but it may not be the best, because we'd be including wavelengths that are not important because of the high correlation. We'll clean that up in a little bit; as a matter of fact, let's go to that step right now. Let's go to Analyze, Specialized Modeling, Functional Data Explorer. Let's get this set up first, and I'll tell you more about it: Rows as Functions as the data format, our wavelengths, our active ingredient as a supplemental variable, and then our ID function. Say OK. This looks like what we saw before in Graph Builder, and we want to clean it up. We've got these new tabs in JMP Pro 17 for Functional Data Explorer; Spectral is one of the tabs, and, as we talked about before, under it we have standard normal variate, multiplicative scattering correction, Savitzky-Golay, and baseline correction. I'm going to select standard normal variate first to clean things up. Then you can see the baseline is a little wobbly here, so let's clean that up as the next step, and then go ahead and say OK. Now we've got that set up; it looks a lot better, a lot cleaner. The next step is to model this, and we're going to use another new function in Functional Data Explorer called wavelets. It's wavelet modeling here.
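For reference, the clustering and principal components launches on the raw spectra look roughly like this in JSL. Only three hypothetical wavelength columns are listed (using quoted-name syntax) to stand in for the full group of 404, so treat this as a sketch rather than the demo's saved script.

// Ward clustering of the spectra into three clusters.
// In practice, all 404 grouped wavelength columns go into the Y role.
Hierarchical Cluster(
	Y( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) ),
	Method( Ward ),
	Number of Clusters( 3 )
);

// Principal components on the same (highly correlated) wavelength columns
Principal Components(
	Y( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) )
);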
You can see down here that our model has been built, and we're explaining a lot of the variation with about five functional principal components. If you look at these, we're explaining the shape with our shape functions; that's where our eigenvalues come in. This is really just a nice way to look at the data and make sure our spectra are being modeled well. As I said, we've got five shape functions that are explaining things really well. Let's clean this up a little bit and pull this back to make our model a little simpler; we won't go all the way back to five, but we'll leave it at 10 for now. You can look at the score plots; there's still some scattering here in the data, but we'll clean that up in a second. Then one of the other steps you want to take is the wavelet model report. This wavelet analysis is new in JMP 17, and what it's really all about is finding the important wavelengths that are going to give us a telltale sign of what's going on with the data. What I'm looking for, especially with spectra, is somewhere I can see a shift in the baseline, and I can see that we've got a good shift in the baseline and a grouping of spectral wavelengths around 8820 to maybe 8850. That's the important part here: we get an idea of what the important wavelengths are. Now, the preprocessing that I had done up here before, I want to save that data out and do some analysis on it, so I'm going to go here in Functional Data Explorer and select Save Data. This gives a new data set, and now we've got to do some work with this data to clean it up and make sure we're ready to go. I want to do a transpose: transpose the Y columns, with X as our label. Let's see if we got this right and hit OK. Yes. So we've cleaned that table up: we've taken those 300 spectra and transposed them into another data table. This is all the preprocessed data, so we're going to do a few more things to it to show where that preprocessing has really cleaned up the data and where we can build some models with it. Let me get rid of this column, and we're ready to go. We're going to do the same thing we did before. Actually, let me take a step back real quick: I want to group these columns to make things a lot easier. So group those columns, and let's go back to where we were: Analyze, Clustering, Hierarchical Clustering, pull our columns in, and say OK. We'll do the same thing we did before, look at three clusters and color those clusters. This will be a quick comparison, but if you look at what we had before versus what we've got now, we've got much tighter clusters, and they're pretty well dispersed; those clusters are fairly even right now. Let's go back to our data table.
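A sketch of that transpose step in JSL is below. It follows the demo's description (transpose the Y values with X as the label), but the table name, column names, and output name are assumptions; Save Data in Functional Data Explorer was done from the red triangle menu, so only the transpose is shown.

// Transpose the table saved from FDE: the preprocessed Y values become
// columns of a new table, labeled by the X (wavelength) column.
Data Table( "FDE Saved Data" ) << Transpose(
	Columns( :Y ),                                   // preprocessed absorbance values
	Label( :X ),                                     // wavelength becomes the new column names
	Output Table( "NIR Preprocessed Transposed" )    // hypothetical output name
);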
Let's go back to our Graph Builder and pull our wavelengths in again, like we did before. We're going to make a parallel plot out of this again. It doesn't look great right now, but let's right-click here and go to combine scales, parallel merged. Now you see that the data is really cleaned up; the preprocessing we did has taken things to where they look a lot better. Let's compare what we had before with what we have now: the data is much cleaner, and any analysis we do from here should be much better. So let's go back to Analyze, Multivariate Methods, Principal Components, pull our wavelengths in, and say OK. Now we've taken that data and broken the correlation structure we had before; this is after preprocessing, and this is what we had before, just to show you the difference. We've really cleaned things up. Now we want to take maybe one more step in the analysis: Analyze, Quality and Process, Model-Driven Multivariate Control Chart. We're just looking for unusual behavior in here; in this case, it's based on the principal components. Say OK. This is looking at two principal components, and you can see there are some potential outliers here. But this is spectral data, so we're not going to get rid of anything; we just want to view it. One other thing we want to look at is, under Monitor the Process, the score plots. Now we can look at our subgrouping down here and actually compare these groups. I'm going to pull up a lasso tool and do my best to group a couple of these; that's going to be my group A. And I'm going to do another lasso here; we'll just leave that as is. We grabbed one of the wrong ones, but I think we'll be okay. Now we can compare where we're seeing differences in the spectra for these two subgroups, and as I was saying before, if we look right in here, those wavelengths are somewhere in that 8800 range, and we can see that there's a real difference there. One more thing we want to do, and this is the last step. As I showed before, I'd done model screening, and I want to do model screening again: Analyze, Predictive Modeling, Model Screening. We're going to set this up with our active ingredient as the response we're trying to model, and we're going to use our wavelengths to build this model out. I'm going to clean this up a little bit; we don't need all these different modeling types, so I'm going to pull some out. The nice thing about this is that I can build all these models at once and really find out what's the best modeling approach to take with this data. I don't need that, I don't need that; let's add those. One thing I'm not going to do, for time's sake, is add any cross-validation. If we take that into account, it will run a lot longer, but as you'll see, this is going to be fairly quick. I'm going to go ahead and say OK.
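For reference, the last two launches set up here could be scripted roughly as follows. The platform names match the menus, but the role names, the response column Active, and the three listed wavelength columns are assumptions standing in for the full 404-column group.

// Model-driven multivariate control chart on the preprocessed spectra
Model Driven Multivariate Control Chart(
	Process( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) )
);

// Model screening: predict the active ingredient from the wavelengths
// (no validation column here, so it runs quickly, as in the demo)
Model Screening(
	Y( :Active ),                                            // hypothetical response column name
	X( :Name( "8400" ), :Name( "8402" ), :Name( "8404" ) )
);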
As this is going, I'll talk a little bit more about what we're seeing. We're building out these models; you can see it's stepping through, and in about another few seconds it should be done. There we go; it took a little longer. Based on what I said, I didn't use any validation, but neural requires it, so that's the validation you see there. Overall, we get a really good idea that XGBoost is going to be the best model to fit this data. We could use any of the others, because they're all really good models as well, but you get to choose. Let's select partial least squares, because that's the go-to analysis method for spectral data anyway. We can say Run Selected, build out that model, and find out whether we can make it even better. Hopefully what I've shown you is that we can build these workflows. Let me pull this back to our beginning here, just a few steps. This is our workflow, and I've added those steps: the table we're working with, clearing the row states, transposing the data. Anything that we closed out is now part of that workflow, so we continue to build it. One thing I'll say is that I would typically not build a workflow inside a project, but I'm showing you that it can be done. Let me go back to my slide here and share; let's flip this. One more step here: I just want to say thank you to a few people. Jeremy Ash, who's no longer at JMP, was a great inspiration for this. Mark Bailey has been a great help. Ryan Parker and Clay Barker have done really fantastic things with genreg and Functional Data Explorer. Chris Gotwalt has been really helpful in getting things set up. And Mia Stephens has been really supportive in helping me build the spectral analysis out within JMP. I really appreciate everything, and I'll say thank you. That's it.
After JMP Live 16 was released, the JMP Live team and JMP product managers sorted through feedback from JMP Live customers and prospects. We then set out to address as many of these requests and concerns as humanly possible for JMP Live 17. The result is a virtually complete overhaul, designed to enhance collaboration and automate data updates. Whether your company has adopted JMP Live already, or you are still thinking about it, this talk is a must-see to understand what's coming.    JMP Live 17 adds the concept of spaces, which provide a much more flexible way to create separate areas where different groups can collaborate and define who can create content and who can only view it. Another exciting aspect is the ability to update data in JMP Live directly from a database, without the need to rely on external tools like Task Scheduler.     Come see what's new!     -Well, thanks everybody for coming. Today we're going to talk about JMP Live 17 and how it allows you to collaborate better with JMP. My name is Eric Hill; I am a developer on the JMP Live team, and my co-presenter, Chris Humphrey, is also a developer on the JMP Live team. I thought I'd start by reminding everyone what JMP Live is, and then we can talk a little more about what's new in JMP Live 17. We introduced JMP Live back in the JMP 15 cycle, about three years ago. JMP Live is a web application that you access through your browser. It is private to your company, so nobody can see the content of your JMP Live instance other than you and your company. It can be installed on premises at your company, or you have the option of having JMP host your JMP Live instance on AWS; either way, it's still private to your organization. The main purpose, at least when we started JMP Live, was to allow the experts who use JMP and create analyses to publish them to a place where people who don't have JMP could see them and interact with them. There's a lot of interest in sharing your JMP discoveries with people who don't have JMP, and you can already do that with screenshots and PowerPoints and various things, but those lack an important characteristic: the interactivity that JMP is known for. With JMP Live, you can share your discoveries in a way that allows people to interact with them in many of the same ways that you can when you're in JMP. That's the main thing it allows: people who don't have JMP can see and interact with your content. In 17, we've added some features to facilitate collaboration between people who do have JMP, so even if you and your colleague both have JMP, there's value in publishing both data and analyses to JMP Live so that you can collaborate and make each other's analyses better. Another thing we have added in 17 is the ability to publish analyses that will automatically update when new data becomes available. That was a big request from customers and prospective customers in the couple of years that JMP Live has been out, so in 17 we are delivering that feature. So, things that are new in JMP Live 17.
Well, we have this concept of spaces that you'll see a lot during our demo today. A space is an area of JMP Live that you can restrict to a certain group of people. That group may have the right to publish content there, view content, edit content, or manage scripted data updates, those kinds of things. So that's what a space is. We have greatly streamlined the publishing process, and you'll see that in a couple of places in our demo today. We now support unlimited file folder hierarchies. In previous versions of JMP Live, you had the root level and you could create one level of folders on top of that, but you couldn't continue to create folders at lower levels. Now, in 17, you have a complete hierarchy of folders, so you can organize your content however makes sense. There's also a new feature in JMP Live 17 called Open in JMP. You can be using JMP Live, looking at an analysis that you want to take further in JMP: you want to see if you can add something to it or improve it in some way. Now you can open it directly into JMP, if you have JMP on your machine of course, and start working on it, improving it, experimenting with it to see if you can add to it. We'll see some examples of that in our demo. And then the one I alluded to on the last slide: scheduled data refresh, the ability to create a script that knows how to refresh your data, and then to run that on a schedule, every 5 minutes, every day, every week, however you want to do it. Now, the premise of today's demo is that Chris and I work together at a manufacturing company that manufactures widgets. We are responsible for five of the widget products for that company, and each of those products has a yield that gets reported every day. We need to find ways to present that yield data that are helpful to both us and our colleagues at the organization. So we're going to collaborate to come up with some analyses that we think are beneficial in that regard. So Chris, have you gotten started at all with the analyses of the yields of our widgets? -I have, I will share my screen, sorry. Eric and I work on a few different parts, and we have a couple of data tables that we use to track the yields for those different parts. The first one is a simple data table with all five parts that we're responsible for, with their yield values over time. This is something that I need to share with the rest of the group, so I'm going to create a simple report that shows the yield over time. I'll use Graph Builder for this: I'll drag the date to X and one part to Y, and now I have a report that shows my yield values over time for this one part.
I could create four more, one for each of the remaining parts, but I think it will be a lot easier to use if I add a column switcher to this report, switching the part that I added earlier among the five parts that are in the data table. Now I can see every part in one report; I can switch between them and see the yield values for each of those parts, all in one report. So I think that's good. The second data table that we use is for a fit model of a photo process that we run. This model has four factors and three responses, and there's a script already in the data table to run a fit least squares that gives me a profiler. This lets me see the interactions between the different values; I can change the values to see how they impact the others, which is pretty useful to the other engineers. So this is a report that I'd also like to share with the group. In the past, I would either have to share my data table with a script that others could run, or do screenshots, which lose all the interactivity of these reports, or maybe a PDF, which is also not interactive. But with JMP Live, I can publish these reports to JMP Live directly, so others can use them just like I have here in JMP. So I go to File, Publish, Publish Reports to JMP Live, and JMP sets up a connection to my JMP Live server; I set that up previously in my managed connections. The first screen you see is a list of the reports that are open in JMP; these are the two reports that we just created. I'll select both of those, and down here I'll also make sure Publish New is selected, since this is a new publish to JMP Live. I'll select Next. Now, as Eric mentioned earlier, I need to pick the space that we use to collaborate as a group. Right now Eric and I are working on this before we share it with everyone, so I'm going to use the Eric and Chris space that he and I have access to. Within the space, I'll create a new folder called Yields to store our data and our reports, make sure it's selected, and click Next. The next screen shows me the reports that are going to be published: I see my yield report and my fit model, so I'll give them new titles, changing this one to something a little more appropriate. Now I'm ready to publish, so I'll hit the Publish button. At this point JMP is sending the reports and the data up to JMP Live to publish them on the web. The results screen appears, and I see three sections. First is the location; that's the Yields folder that I created in the space Eric and Chris. The second section is the new reports; those are the two reports we were working with in JMP. And the third section is the new data that was added to JMP Live; these are the data tables used to support the two reports that we sent. All of these values are hyperlinks, so if I click on Yields, I'm brought to a folder on the web that shows the two reports that I added to JMP Live.
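The yield report Chris built could be reproduced in JSL roughly as shown below; the date and part column names are placeholders for the five widget products in the data table.

// Yield over time for one part, then a column switcher to flip among parts
gb = Graph Builder(
	Variables( X( :Date ), Y( :Part 1 ) ),    // hypothetical column names
	Elements( Points( X, Y ), Line( X, Y ) )
);
gb << Column Switcher( :Part 1, {:Part 1, :Part 2, :Part 3, :Part 4, :Part 5} );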
The  one  on  the  left  is  the  yield  report, and  I  have  the  column  switcher  that  I added,  just  like  I  had   in JMP, I  can  switch  between  the  products and  see  the  yield  data  for  each  product. If  I  go  back,  I  see  the  fit  model. And  in  this  report, the  profiler  is  present,  and  it's as  interactive  as  it  was in  JMP. So  now  anyone  that  has  access  to   JMP Live can  use  the  same  data and  provide  the  same  analysis that  they  had  in  JMP. So  Eric, I  think  that's  a  good  first step. Can  you  take  a  look  and  see if  maybe  that's  enough? -I  can  do  that. All  right. Well,  that's  interesting. Let  me  go  see  if  I  can  only  go  to  my,  we go  to   JMP Live  in  my  web  browser  here. And  here's  the  instance  of   JMP Live that  we're  doing  this. I'll log  into  that. All  right.  Okay. I  can  see,  even  right  in  my  homepage, I  can  see  the  two  reports that  Chris  created. I'm  going  to  go  ahead  and  go  to  the  space where  we're  collaborating  here. So  there's  the  Eric  and  Chris  space. And there's  the  yields  folder. All  right.  Well,  let's  see  what  we've  got  here. So  here's  the  widget, is  that  profiler that  I  heard  Chris  talking  about. So  yeah,  this  looks  very  helpful. I  think  our  engineers  will appreciate  this  interactivity. One  thing  I  would  like  to  do,  though, is  I'd  like  to  look  at  the  data behind  this  report. This  is  something  else  that  we added  new  in   JMP Live  17. I  need  this. Let  me  grab  this. To  get  to  the  data, I  can  go  to  this  details here  and  scroll  down. And  there  is  photo  process  app. That's  the  data  table  that  Chris  published that's  behind  this  analysis  here. So  I  will  go  there,  and  then  I  can  just  go to  view  data  right  here,  and  that will  bring  that  data  up  in  a  browser so  that  I  can just  take  a  look  at  it. It  doesn't  have  the  full power  of  the  JMP  data  table. I  can't  edit  the  data, and  I  can't  do  a  number  of  things that  you  can  do  in  the   JMP data  table. But  I  can  do  a  number  of  things, and  I  can  look  through  it and  just  get  an  idea what  the  data  looks  like. So  kind of I  get  a  feel for  what's  going  on  here. Now,  one  thing  I  notice  as  I'm  scrolling through  this  data  is  that  there  are  two material  suppliers  for  this  photo  process that  Chris  has  analyzed  here,  advanced materials  like  it  is,  and  Cooper. I'm  curious  if  the  material  supplier has  any  effect  on  the  relationship between  the  factors  and  the  responses. I'm  going  to  go  back  to  that  report  here. Here it  is. So  what  I  want  to  do is  I  want  to  add  a  data  filter  to  this, to  filter  on  that  material  supplier. Now  I  can't  do  that  directly  in   JMP Live. For  that,  I  need  JMP. But  it's  really, JMP  is  only  one  click  away. When  you're  in   JMP Live  17,  there's a  button  up  here  called  open  in  JMP. So  if  I  click  that, it's  going  to  open  this  report  right  here and  the  data  behind  it  into  JMP. And  there  it  is. And  here  is  the  report. And  then  down  here  is  the  data. Now  we've  opened  it  into  a  JMP  project. 
You may or may not have used JMP projects in the past, but a JMP project is a convenient way to collect reports and data that go together into one object, so you don't lose track of what belongs together. In JMP Live, when you have a report, it has data that goes along with it; in order to keep those together when we open them in JMP, we put them into a project to hold everything together. Other than that, it works just like JMP. So I can go up here to the red triangle and go to Local Data Filter, and under the factors I have Material Supplier, so I will add a data filter for that. I will check right here in JMP to make sure it's even worthwhile to add this, and sure enough, there is a good bit of movement in the graphs as I click between the two material suppliers. So I think that's a worthwhile addition to what Chris did, and I would like to publish it back to JMP Live and replace the version that Chris published. We set up our space for collaboration, with the ability to edit each other's content. You don't have to set it up that way; you can set it up so that each person's content is private, or at least not editable, but we wanted to be able to collaborate, so we set it up the way we did. So let's do File, Publish, Publish Reports to JMP Live. We're getting things ready. OK, there's the least squares report that I just created. Now I'm not going to Publish New this time; I'm going to replace an existing report, so let's choose that. Here's the report I'm publishing; now I've got to tell JMP which report on JMP Live I want to replace. I click here, and I can see the fit least squares that Chris created moments ago, so I will select that as the report I will replace, and I will add just a little extra to the title so we can differentiate between the two. Then I click Next. Now I'm presented with an interesting decision I need to make. When Chris published this data to JMP Live, the photo process opt data, and I downloaded it to my machine, it just made a copy of that same data table on my machine. I don't really need to republish the data here, because Chris already published it; all I want to do is republish the report that has my data filter in it. So rather than doing anything to the data on JMP Live that Chris published, I'm just going to say: use the JMP Live data table, the table that's already on JMP Live and associated with the report I'm replacing. Click Replace, and off it goes. All right. You see there's the folder that I put it in, and here's the report I created, and you see it doesn't show any data tables being published; that's because of the choice I made to keep using the data that's already out there. All right.
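Scripted, the change Eric made amounts to one message sent to the open report. A minimal sketch, assuming fm holds a reference to the Fit Least Squares report opened from JMP Live (Material Supplier is the column named in the demo):

// Add a local data filter on the material supplier to the open report
fm << Local Data Filter( Add Filter( Columns( :Material Supplier ) ) );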
Well, if I go back to my JMP Live window here, you can see there's a little note that an updated version of this report is now available, because I published something new. It doesn't immediately update, because you might be in the middle of something with this report and don't want it jerked out from under you, so we let you be the one to decide when you're ready to reload it. When I reload it, there's the data filter that I added: I can select Advanced Materials and see the curves for Advanced Materials, then switch over to Cooper and get slightly different curves. All right, I'm happy with that part. Let's take a look at what else Chris did over here. Here's the yield report that Chris published. It looks fine; I can switch between the parts and see them. But one thing I might like to do here is create a control chart for this yield process. You can think of a yield process as being in control or out of control, and that might be a helpful way to display it. I'm not sure that's what we want to do, but I'm going to create it and then add it to the folder that Chris created, so we can look at the two and decide which one we like better, or maybe we'll like them both. To do that, I don't need to download this report, because I'm not going to do anything to his report; I'm going to make a new one. So I'm going to go over to the data for it and open that in JMP, give it permission, and there you go. This time we didn't need to create a project or anything in JMP; we just brought the data table down, and it looks like any other data table you would open in JMP. Now, in the interest of time, let's go to the home window. Here's my script right here: I made a script to create the control chart that I'm interested in, so I will just run that script. And there is the same information that Chris published, only this time in the form of a control chart. I've set some spec limits on here, so I can see that some of the parts look pretty good, while here's one where all the yields are below the lower spec limit. That looks like a process we might want to look at and try to improve the yield on. Okay, I like the way this looks; I think it's a good addition to our collaboration here, so I will publish this as well: Publish Reports to JMP Live. There's the yield control chart that I just created. I'm going to Publish New, and which space do I want to put it in? Here's the Eric and Chris space we've been collaborating in, and here's the Yields folder, so I want to put it right in there. Now let's think about the data for this one as well, because here again I've made a new analysis, but I haven't changed Chris's data, so I really don't want to republish the data with this report. I just want this report to use the data that Chris already published to JMP Live.
So to do that, I need to go to the Data Options tab. And for the Yield Five Star.jmp data table, instead of Publish New Data, I need to select Existing Data, and I will find a data source. And here it is, Yield Five Star. That's the one I want. I will save that, and now I can publish. There we are. Again, we have the folder and the report. No data, because we asked JMP not to publish the data. So we should be good there. I can go back to JMP Live. And I've got some new posts, and there's my control chart. So, let's see how that looks. Hopefully, there it is. All right. That looks pretty good. Now, Chris, I've made some updates: I've updated one of your reports and created a new report out there. But as I was creating them, I was wondering, we've got all the yields up to date to today, but tomorrow we're going to produce more widgets, and we're going to have a yield value in the day after that, and the day after that, on into the future. So I wonder if there's a way that we could allow all these reports that we've just created to automatically update periodically, maybe daily, when we have new yield data. Can we do anything like that? -Sure, I think we can. So in the past, even before JMP 17, we could update data from JMP. So in the past, if I'd had this request, I would have used a simple JSL script that would have opened the data table that Eric wants to update every day, would have updated the values from the database, then would have connected to JMP Live, and then updated the data on JMP Live with the new data table using the ID of the post on JMP Live. So that's good, except I still have to remember every morning when I come in, before I even get my coffee, that I need to push this up to JMP Live so Eric gets the new data. It's going to be hard to do if I'm on vacation, and I know I won't remember. So maybe in JMP Live there's a better way to do it. So here I am on the yields folder, and I can see Eric's new report that I like a lot, the control chart report, but I need to see if there's a way to update the data. So I'll go to the data table, and here I can see the two reports that use this data table. And Eric's asked if we can update this data every day. I see an Update Data button here on JMP Live. That might help. Let's try that. So I'll push the Update Data button, and now I can select the data table on my local machine and update JMP Live with that data table. Marginally better, I guess, because now at least I don't have to run the last line of the script, but I still have to update the data table with the data from the database, and then use this button to upload the new data to JMP Live. So I still have a lot of steps that I need to remember to do every day, and I probably won't. If we look under Settings, there are some new items here in JMP Live 17 to make this a lot easier. The first one we see is the refresh settings. That's exactly what I want to do. I want to refresh the data as often as every morning.
So Eric gets the new data, and the rest of the engineers as well. So I'll make this refreshable, and I'll go back to the reports. And now the Update Data button has changed to Refresh Data. And so I'll refresh the data. I get an error that says it can't be refreshed because the refresh script is empty. I'll go back to the settings, and sure enough, there's nothing in my refresh script. When we look at this screen, we see this source script. When we uploaded the JMP data table to JMP Live, we stored the source script off for that data table, and you see it here. In this case, this source script can be used to refresh the data directly. So all I really need to do is copy this source script out, add it to the refresh script, and then see if that will refresh. So I'll copy the source script, and then I'll paste it in the refresh script. So now we have the connection to the data source and the creation of the data table. I'll save that. And now I'll try to do another refresh. I'll refresh the data. This time it got queued for refresh. That's nice, but it failed to refresh. So if I look at the History tab, I can see the different things that have happened. The on-demand data refresh that I just ran looks like it failed. And if I look at the details, I can see that an unknown error occurred with the connection string. It may not seem that helpful, but fortunately, I know what's wrong. So I'll go back to the settings. We'll look at the refresh script for a second. If you look at the refresh script, you'll see there's a password and a user ID. Now, I could just paste my password in here and my user ID in here. I don't think that's overly secure. I don't think that's a good idea. So JMP Live provides this substitution parameter syntax for the user ID and password in a refresh script. So what I need to do is provide this user ID and password to the refresh script somehow, securely. So if I look at the Assigned Credentials tab over here, I can see there are no credentials assigned. I'll go in and assign a credential. To create a credential, it's pretty simple. You just store the credential name, and you provide the user ID and the password. That's all there is to it. You do it every day. But I've already set up a yield table credential here, with DBA Web JMP as the username and a secure password that's stored in a database. I'll assign that credential to the refresh script and then save that. So now what's going to happen is this refresh script is going to run. When it runs, it will request the assigned credentials, substitute in the user ID and password, and then run the refresh. So I'll go back to the reports, and I'll see if I can refresh the data now. I'll run Refresh Data. Now the refresh looks like it worked. I see my reports are automatically regenerating, and I can actually see that the thumbnail on the report was updated with the new yield data from the database. So now I'm a lot closer. I have a Refresh Data button here, and all I have to do is press it, and it'll refresh the data.
I don't have to update a data table. I don't have to upload anything to JMP Live. All I have to do is come in every morning and remember to press this button. It's at least more likely, but it's still not going to happen, I promise, especially when I'm at the beach. So there's one more pane that we haven't messed with yet, and that's the refresh schedules. The refresh schedules provide us a way to set up times to refresh this data automatically. Sounds like what I want. So I'm going to set up a refresh schedule that will run at seven o'clock every morning and update this data. I don't think I need to run it on Sundays or Saturdays, so I'll take those two out of the list, and I'll save that. So now this refresh schedule is in place, which means the refresh script will run five days a week at seven o'clock in the morning. So I'll go back to my reports, and you guys just sit and talk amongst yourselves while we wait for seven o'clock to show up. Probably not. So let's see if we can get a refresh to happen. We'll create another schedule here that runs every 5 minutes. And I'll set that to run. This is the most complicated part of the demo. I will set that to run at five... we'll run here in just a second. So let's see, I think it's going to be 27. Let's see if that's right. -And you can just put 32 or 33. There you go. -Yeah. So now I have a refresh schedule. It's going to run every 5 minutes, and I think I calculated it properly, so it should run in just a few seconds. JMP Live is going to tell me, yes, it's going to run in just a few seconds. We go back to the report tab, and we'll wait for about 15 seconds here for the refresh to run. Refresh schedules: you can have as many as you want for each data table. We won't run two at a time. So now the refresh is about to run. You see the automatic refresh has run. My reports are updated. Well, they're still updating. So both reports have updated now, just a few seconds ago, and if I actually go to the data, we also see it was updated a few seconds ago. If we look at the history, there was a scheduled data refresh that was run by the scheduler just a few seconds ago. So now we have a situation where the data automatically refreshes every morning, just like Eric wants, whether I'm on vacation, whether I remember or not. So I think this is pretty much what Eric wants. So, Eric, how does that look? -Chris, that really looks good. Let me click the new posts here, and I can see, I'm seeing the updated versions of the reports with the new data. So that sounds fantastic. -I think you may have shared the wrong screen, Eric. -Okay. Let's try that again. Let's stop sharing. Let's share. Here it is. Yep. Yeah, Chris, that looks really good. I'm looking at JMP Live here, and I see the two reports, and I can see that the data has been updated. Looks like our yield process is coming right along. The process that they put in place recently seems to be pushing yields up, at least for this particular part that we're seeing in the thumbnail. So that is really good.
Well,  now  that  we've  got  this  in  place and  it's  working,  there's  more  people than  just  the  two  of  us who  would  like  to  be  able  to  see  this. So   I  would  like  to  move  these  analyses  and  data  that  we've  created over  to  a  space  that  has  more people  allowed  to  see  it. Just  for  reference, if  I  go  to  the  permissions  of  this  space, the  Eric  and  Chris  space, the  only  two  people  who  can  see  content in  this  space  are  Chris  and  myself. So  we  want  to  rather  than  add  people to  this  space,  we  want  to  go  ahead  and  put this  in  a  space  that  other  people already  are  used  to  going  to. So  to  do  that,  I  can  switch  over  to  the files  view  of  this  particular  folder. And  I  can  see  it  looks  a  little  bit more  like  a  file  explorer  here. I've  got  the  three  reports  in  here, and  I've  got  the  two  data  tables. This  one  here,  it's  got  a  little  bit of  different  icon  to  indicate that  it  is  automatically  refreshing on  a  schedule. So  that's  nice  to  know. So  I'm  going  to  grab these  five  posts  here, and  then  I  will  go  to  move  up  here, and  we  will  move  those. I  need  to  pick  a  different  space. And  the  space  is  called discovery  America's  2022. And  there  is  a  folder  in  that space  called  five  star  line. That's  the  line  of  products we're  responsible  for. So  I'm  just  going  to  move those  over  there. All  right. They're  gone  from  here. That's  half  the  battle. Let's  go  back  to  my  space  directory  here and  flip  over  to  the  discovery  space, into  the  yields  folder and  the  five  star  line. And  there  are  our  reports,  right  there's the  report  that's  scheduled  update. So  in  this  space,  as  I  mentioned, this  space  has  all  the  people  that  are going  to  be  interested  in  this  yield  data, not  just  Chris  and  I,  but  the  other engineers  that  we  work  with. So  that's  one  approach to  putting  some  content   in JMP Live, massaging  it,  making  sure  you  like  it, and  then  share  it with  a  larger  group  of  people. So  that's  a  use  case  that  we  support  here. All  right. Well,  let  me  flip back  to  my  slides  here. If  you'd  like  to  view  the  content that  you  saw  us  create  here  today, there  it  is. The  place  we  published  it  to  on   JMP Live is  actually  viewable  by  anyone  who  can  log in  to  this   JMP Live  instance, dev live 17.  jmp.com.  There's  a  little shortened  link  to  it  down  here. And  hopefully  we'll  get  to  the  slide, so  you  can  just  click  on  it. But  as  long  as  you  have  a  SAS  Profile  ID, you  will  be  able  to  successfully log  into  Dev  Live  17. And  you  can  go  find  that  report, those  reports  we  just  published, and  you  can  watch  it   every  5  minutes, at  least  for  the  length  of  discovery, they'll  be  updating  so  you can  watch  that  yourself. All  right. W ith  that,  we  will  see  if  we  have any  questions  and  that  will  do  it. Thanks  for  joining  us  everybody.
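(The control chart script Eric runs earlier in this demo isn't shown on screen. As a rough sketch only, and assuming the yield table has a numeric column such as :Yield, a script of this general shape produces the same kind of chart with spec limits; the column name and limit values here are made up, not the demo's.)

    // Hypothetical sketch: put spec limits on the yield column, then build the chart.
    dt = Current Data Table();                         // the downloaded yield table
    Column( dt, "Yield" ) << Set Property(
        "Spec Limits",
        {LSL( 80 ), USL( 100 ), Show Limits( 1 )}      // illustrative limits
    );
    dt << Control Chart Builder( Variables( Y( :Yield ) ) );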
A picture is said to be worth a thousand words, and the visuals that can be created in JMP Graph Builder can be considered fine works of art in their ability to convey compelling information to the viewer. This journal presentation features how to build popular and captivating advanced graph views using JMP Graph Builder. Based on the popular Pictures from the Gallery journals, the Gallery 7 presentation highlights new views available in the latest version of JMP. It features several popular industry graph formats that you may not have known could be easily built within JMP. Views such as dumbbell charts, word clouds, cumulative sum charts, advanced box plots and more will be included to help breathe new life into your graphs and reports!     Welcome, everybody, to Pictures from the Gallery 7. My name is Scott Wise, I'm a Senior Systems Engineer on the US West Coast, and I'm joined today by my daughter Samantha. -Hey. -Hey. I wanted to ask you, as a brand-new incoming college student, what are you most concerned about for the future? -Well, to start, I'm pretty worried about negative effects on the environment like deforestation and soil depletion and climate change. Additionally, I'm worried about things like sexism in the workplace and the gender wage gap. -Wow, that's a lot to think about. It got me thinking as well about what we can do to make this world a better place. To start off our presentation, I've got three suggestions here. If we stay curious, part of what we can do with that curiosity is actually share good data with each other. You all like to analyze data, but sharing some good data would be a great idea, and JMP has a Data for Green initiative where, over on the JMP Community, you can actually share what you think are meaningful views and meaningful data collections, and we can analyze these things together. The second thing you can do is use your time. JMP and SAS have both partnered with the IIASA, which is trying to actively measure the amount of deforestation in the rainforest to help guide better policy. It's a cool application that lets you look at some of their satellite images and actually help identify where you see development and human growth in the rainforest, and enable them to make better measurements. Lastly, use your skills. We all have great JMP skills in practice, looking at analytics and building visualizations. Our friends at WildTrack, I think, are some of the best examples: they're using the footprints of many different species of endangered animals, and by doing a little bit of JMP and a little bit of visualization, they're able to help track these endangered animals in a non-invasive way to help us, again, create better policy. That's Sky and Zoe, and I'll definitely put a link in here where you can check out their work and get inspired. All right, so without further ado, here are the Pictures from the Gallery for this version. And in our version, I am going to dedicate every view to something around environmental green data, so we can start that conversation.
But as usual, I am showing you some things that are new in Graph Builder. The first five views I show you have been brand-new things in JMP 16, and I'm also going to show you just a couple of tips and tricks you probably have never seen before. All right, I've got the first chart here, and the first chart is going to address equality: it's on the gender wage gap. It's going to show you a new interval-type chart that's available in JMP 16 that I call the Dumbbell Chart. Now, I'm going to give you this journal, and why I'm pointing this out is that if you want to recreate this view, you not only have a picture of what it looks like and tips on how to set your data up to make this chart, but I give you the steps in order. Not only that, I give you the data, and within the data you can just click on the script to regenerate the view. It's all there for you. I'm going to build this one from scratch. What is this data? This comes from the International Labour Organization, and it is looking at the nominal mean earnings of males and females, normalized to US dollars. It would be nice to see if that gap is getting smaller, given Sammy's concern about the gender wage gap. I used to think you'd have to create a formula which actually took the delta, but to graph it, you do not. You just need to have both columns you're wanting to compare. I'm going to go to Graph Builder. Everything today is going to be in Graph Builder. And I'm going to take the female monthly and the male monthly. I'm going to put them both on the X-axis, but I'm also going to put them on the interval landing spots. And I'm going to take year and put year on the Y. Now it looks really busy, and that's because there are many countries represented here over this span. If I go under the red triangle (I like to call this a hot spot) and add a local data filter, we'll just look at it by one country. Let's pick out France. Now I get a pretty good view. It might be better to clean up the view a little bit. I can right-click right on the female monthly marker, and I can take this marker size up a little bit. I'm going to make it a 10. And now I can do the same thing with the male monthly. I'll make that a 10 as well. I can right-click right here in the graph and go to Customize. And I want to make that intersection bar a different color. It's the second error bar in that list, and I'm going to make it gray and maybe give it a bigger width of three. Now I've got the view I like. Now you can kind of tell why I call it a Dumbbell Chart, because if anybody likes to work out at the gym, you have weights, and the ends of the weights are where the heaviness is; in the middle, you have a bar to lift. That's why a lot of people call this a Dumbbell Chart. Now, a couple of cool things I can show you. Number one, we generally don't read from bottom up, we read from top down.
You  can  right- click  here in  your   Axis Settings, and  I  can  reverse  this  order  just  by clicking  on  this  little  box  right  here. Now  I'm  going  2010  to  2019. Also,  I  might  want  to  put  a  reference line  on  the  X-axis  to  help  my  eyes. So  I'm  going  to  right- click go  to   Axis Settings. And  I  think  about  3,500 would  make  a  good  little  reference  line. It's  going  to  put  one on  the  X- axis,  and  there  we  go. Now  I  can  kind  of  gauge, is  the  gap  closing,  is  the  wage  increasing for  both  sexes,  that  type  of  thing. Now  I'm  going  to  bring  a  lot of  pictures  in  as  examples  for  our  data, and  I'm  going  to  show  you  that  if  you just  take  a  picture  and  you  just  put  it into  your  graph,  it  will put  the  picture  as  a  background. Now,  it's  sized  horribly  here. It's  easy,  you  just  right- click   go  to  Image,  go  to  Size  and  Scale, and  save   Fill Graph. There  we  go. And  this  is  pretty  cool. Now,  I  know  the  female  symbol  here in  the  background  map  is  red, so  maybe  I'll  go  right  up  here to  my  legend  and  I  will  change the  colors  around  here. Pretty  easy  to  do. A lso  I'm  going  to right- click  back  into  the  graph and  in  my  image,  I'm  going  to  make that  background  a  little  more  transparent, maybe  like  a  0.3  here. Now  I've  got  a  really  cool  view. Now,  one  word  of  warning, it's  locked  into  this  scale , and  you're  going  to  get a  different  scale  for  each  country because  some  countries pay  more  than  others. I  know  Germany  pays  very  well. You  can  see  it  changed  my  picture, so  you'd  have  to  right- click and  go  to  Image  again  and  Fill the  Graph  to  pull  it  back  correctly. You  might  want to  move  your  reference  line. The  background  maps  are  not  great  if you're  going  to  change  your  scale  a  lot. I  do  have  a  version  here  that's a  multiple  view  version  without  a  picture and  if  you  right- click  on  this  one, you  can  see  here  I  was comparing  over  the  same  scale. I  fixed  the  scales but  made  them  a  little  bigger. What  was  the  difference in  France,  Germany  and  Sweden, and  you  can  see  that  in  France, the  wage  gap doesn't  look  so  bad  on  this  scale, but  Germany  has  a  bigger wage  gap  but  pays  higher. Maybe  you  want  to  be  in  Germany. And  I  noticed  in  Sweden that  the  females  make  more  than  the  males. There  you  go. There's  a  lot  of  differences  out  there. This  is  some  fun  data  to  play  with, so  definitely  see  what  views  you  like. All  right,  second  picture  we're going  to  look  at  is  a  Word  Cloud. This  was  the  second most  popular  thing  that  got  requested. And  you  might  have  seen  in  JMP, there  is  a  Text  Explorer  platform that  allows  you  to  look  at  unstructured text  data  that  you  might  have. And  Word  Cloud  was  one  of  the  views it gave  you  just  with  a  click  of  a  button. But  how  do  you  do  that  in   Graph Builder? Well,  let's  take  a  look. In   Graph Builder,  all  you  need is  the  unstructured  text. In  this  case,  I  have  a  column  of  words  and you  need  some  sort  of  counterweighting. Here  I  have  the  weight and  where  this  data  came  from. This  was  a  study  run  during  COVID of  what  are  the  top  five  things  teachers were  worried  about. Of  course  they  were  dealing  with  a  lo t. 
Remote  teaching, sick  students,  sick teachers, a  big  change  to  curriculums just  to  get  through  the  year. So  here,  the  highest  weight  was  anxious. Twenty  respondents  all mentioned  being  anxious. So what  I've  done  is  I  have  sorted the  words  by  weights and  then  I  just  put  a  order  here. So  the  highest  weight  got  an  order  of  one and  the  next  highest  got an  order  of  two  and  so  forth. That's  how  I  got  the  weight  column and  that's  how  I  got  the  order  column. And  I  also  created  some  random  data because  you  can  have  a  sorted, ordered  word  cloud,  but  you  can  also have  one  that  just  looks  like  a  cloud. To  generate  that  one... You  might  not  have  known  this, but  if  you  go  and  open  a  new  column  in  JMP with  this  initialized  data, you  can  put  in  random  data  and  you  can put  in  things  like  random  normal  data. Okay,  I  have  already  done  that. Let's  just  go  to   Graph Builder and  see  how  this  works. Well,  the  first  thing I'm  going  to  do,  I'm  going  to  put  weights. There  we  go  on  the  Y- axis, and  I  am  going  to  size  by  weight, but  I  don't  want  points. And  here's  a  little  trick  in  JMP  16, under  the  red  triangle, under  the  points  elements  panel  here that's on  your  bottom  left hand  side  of  the  Graph  Builder, I  could  set  a  shape  column. When  I  do  that,  I  can  substitute for  points  the  actual  word. And  now  you're  starting  to  see  the  words and  as  you  start  to move  around  your  graph, you  can  see  what's  going  on with  those  words  and  that  is  very  cool. This  right  here  is  your  first check  of  doing  a  word  cloud. Now  I  can  color,  by  the  way, to  give  this  thing  some  color. Now,  I'll  want  to  make it  as  cloud-like  as  possible. I  can  move  this  random  over  here. A gain,  playing around  with  the  data  set, I  can  get  the  view  that  I  like. Maybe  I'll  say  done  here, maybe  I'll  go  to  the  legend  position, maybe  I'll  put  it  on  the  inside  left. Maybe  I'll  go  under  the  legend  settings. Maybe  I'll  turn  off all  but  just  the  color  code . And  now  I  got a  pretty  nice- looking  word  cloud. Now  as  well, if  I  wanted  it  to  be  in  sorted  order, because  I  know  anxious is  the  most  important. That's  the  biggest- sized  word, I  love  that  to  kind  of  be  on  top and  then  the  next,  and  then  the  next. So  to  do  that  one,  I'll  open my  control  chart  panel  back  up. I  will  swap  out  the  order  for  the  random. Now  you  can  see all  the  big  words  are  on  the  bottom, so  I'm  going  to  right- click here  under  Axis  Settings, and  I'm  going  to  do that  reverse  order  again. N ow  all  the  big  stuff  is  up at  the  very  top  that  I  have. And  Jittering,  what  you  didn't  see  before was  you  were  getting a  centered  grid  jitter. And  that's  actually  what's automatically  in  your  points  jittering. If  I  do  a  positive  grid  now, I  get  things  in  order. Anxious,  constant  stress  and  tired, whatever  it  has  room  for  on  the  line. But  it  is  in  that  row  order, which  is  pretty  cool. But  it's  so  much  on  the left- hand  axis  that's  a  little  weird. So  I  can  right- click  here  on  the  X-axis. Even  though  there's  nothing  down  here, you  can  still  play  with  the  settings. And  I  can  go  and  maybe  make  this a  negative  0.5  for  the  minimum. 
It's  going  to  add  a  little bit  of  space  over  here. And  if  it  did  a  nice  job, did  a  really  nice  job. Now  you  can  get an  ordered  word  cloud. All  right. And  then,  of  course, the  ones  I  have  in  my  data  that  you  can play  with,  you  can  see  I  put  in  a  nice transparent  apple  background just  by  bringing  in  that  picture, which  is  really  nice  to  play  with the  colors  and  all  those  type  of  things. All  right,  that  was  a  nice popular  view  everybody  asked  for. The  third  most  popular view  was  Line  Charts. Everybody  likes  to  do  line  charts. In  JMP  16,  there's  many  new  features that  actually fit  lines  through  points in  a  lot  of  different  formats, as  well  as  label  your  line  interactively. And  we're  going to  look  at  tree  cover  loss. Remember  I  showed  you  that  link that  would  help  you  with  folks that  were  trying  to  save  the  rainforest. Well,  it's  important  to  know  how much  we're  losing  around  the  planet. We  have  some  of  that  data, the  under  three  here, moving  average  smoother  line  chart. I  bring  up  my  data  here. I've  got  tree  cover  loss  in  hectares. By  year,  this  should  be pretty  straightforward, so  I'll  just  go  to  my   Graph Builder. I  will  put  my  tree  cover  loss  in  hectares. I  will  put  my  year  down  here  on  the  X. And  you  can  see  I've  got points  and  smoother  lines. Not  so  exciting, maybe  take  drivers  and  overlay. Getting  more  interesting, but  I  don't  like  these  lines. What  other  options  do  I  have for  other  smoother  lines? Well,  in  JMP  16,  they  put things  like  moving  average. Maybe  a  moving  average  would  be  cool. You  can  control   the  spread  of that  mover  average  with  this  local  width. And  I'm  going  to  do  that  one. I'm  going  to  say  done. And  now  it's  looking  pretty  good. I'm  actually  going  to  open  it right  back  up, there's  one  thing  I  forgot  to  do. You  can  actually  put a  confidence  in  around  them. Now  I'll  say  done  just  fine,  but  I clicked  this  little  button  right  there. Very  cool. But  I  just  want  to  look at  kind  of  the  big  hitters. So  this  might  be  a  good  place to  go  under  the  red  triangle, go  to  the  local  data  filter. Go  ahead  under  drivers, and  just  take  the  top  three  drivers. There  we  go. This  is  a  good  chart,  I'm  very  close. But  one  thing  that this  legend' s  kind  of  hard  to  read. Wouldn't  it  be  nice  to  put  the  name  next to  the  line,  maybe  even  on  the  line? Oh,  that  would  be  awesome. Well,  you  can  do  it, you  might  not  know  this  is  the  place you  can  do  it,  but  if  you  just  right- click right  here  on  the  legend  where  it  says Agricultural  Shift,  you  can say  what  happens  to  the  label. You  can  add  minimum  values and  first  value,  last  value, but  just  go  click  Name  and  you  can  see... Oh,  look  at  that,  drew  it  right  in  there. I'm  going  to  do  the same  thing  with  Commodity, and  I'm  going  to  do the  same  thing  with  Forestry  Driven. Now  I  don't  need  that  legend. I  can  go  under  my  red triangle  versus   Graph Builder. I  can  turn  off  the  legend because  it's  not  adding  any  value. Now,  here's  what's  really  cool, you  don't  have  to  leave  it  out  here. 
You  can  move  it  anywhere  along  the  line if  you  get  close  to  the  line, it  will  try  to  take  the  slope of  the  line  as  its  orientation. I'll  do  this  for  Agricultural  Shift, I'll  put  that  one  there. Commodity, I'll put  one  there. And  now  I  can  move  out  the  axis. And  now  I've  got  a  really  cool  chart. By  the  way,  on  this  chart, Agricultural  Shift  was  the  big  haha. It  was  something  that  we  were definitely  having  a  huge  spike  in, but  I  think  those  efforts, of  our  friends  trying  to  save the  rainforest  have  managed to  pull  it  back  a  little  bit. A gain,  you  have that  version  scripted  in  your  data as  well  as  a  cool  little background  picture  in  the  background. What  else  do  we  have  here? Let's  go  to  the  bottom  of  our  chart. The  next  most  popular  chart  was  actually a  Point  Cumulative  Summary  Chart. Very  interesting,  it's  on  safety  data. It's  not  using  points, it's  using,  looks  like  a  value  of  years. That's  pretty  cool. This  data  came  from  the Bureau of Transportation Statistics, and  why  I  liked  this  data  was it  not  only  gave  us  a  index  of  crash  rate and  injury  rates,  and  this  is  all based  off  millions  of  miles  driven, but  for  each  year, like  in  1998  year, it  told  us  that  the  dual  front airbags  was  the  safety  innovation that  came  in  that  year. T his  would  be  cool  to  see  what's going  on  with  my  line  chart. I'm  going  to  go  graph,   Graph Builder. We'll  go  ahead  and  take both  crash  and  injury  rate, put  it  on  the  Y, put  year  on  the  X. I'm  going  to  turn  off  the  smoother line here  and  just  look  at  the  points. And  can  I  tell  any  difference between  crash  rate  and  injury  rate? I  really  can't. This  is  where  having  a  cumulative  summary would be  really  cool because  I  can  go under  the  summary  statistic, under  the  Points  element  and  just  change this  out  from  none  to  cumulative  summary. Now,  do  you  get  a  sense  of  the differences  in  the  slope  of  the  line? You  should,  because  the  summary  of  the crash  rate  definitely  has  a  steep  line, and  I  would  expect  this, there's  more  people on  the  Earth  now  driving  more  miles . But  you  can  see  that  it  looks  like the  injury  rate  has  less  steep  slope and  seems  to  be  flattening  out. And  maybe  that's because  of  these  innovations. Here  under  my  Axis  Settings for  the  X- axis,  I  can  put  in  like  in  1998, there  were  the  dual  airbags. And  see  if  that  might  be an  inflection  point, a  cause  of  cars that  are  now  protecting us  more  from  injury. That's  pretty  cool. The  other  cool  thing  to  do, you  could  as  well  under  your  points  red triangle,  set  the  shape  column  by  year. And  even  though  it's  continuous, it's  just  going  to  give  you  the  value in  this  case,  that  is  really  cool. So  now  I'm  seeing  it  by  year. Very  nice. Then  I  have  a  view  in  here where  I  have  gone  through and  added  a  whole  bunch of  the  safety  innovations  over  time and  put  a  nice  more  transparent airbag  background  because airbags  was  a  big  deal. But  you  can  see  when  things like  blind  spot  warnings  came  in, anti- lock  brake [inaudible 00:22:34]   technology. Really  cool  to  see  how  the  industry is  helping  to  save  us  from  injuries. All  right, [inaudible 00:22:45]   right  along. 
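(Going back to the tree cover loss line chart for a moment, here is a rough JSL sketch of its starting point, before the moving average smoother option, the confidence shading, and the on-line labels are set interactively. The column names are only guesses at how the demo table is laid out, so adjust them to your own data; the saved script from the journal is the authoritative version.)

    // Points plus a smoother, overlaid by driver; the moving average option is
    // then chosen in the Smoother element panel, as described above.
    Graph Builder(
        Variables(
            X( :Year ),
            Y( :Name( "Tree Cover Loss (ha)" ) ),      // assumed column name
            Overlay( :Driver )                         // assumed column name
        ),
        Elements( Points( X, Y, Legend( 1 ) ), Smoother( X, Y, Legend( 2 ) ) )
    );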
Our next-to-last chart, but still a very popular view, is Advanced Box Plots. There's a lot more you can do with box plots to integrate them with other elements like points and labels. And we're going to look at some climate city risk. This is some really fun data that I found looking out, projecting out to 2050, and it was coming up with this total climate change risk index on a 1-100 scale. It was looking at things like potential sea rise, shifts in temperature, shifts in climate, and something very important for a lot of us, especially us out on the West Coast, which is water stress or water scarcity. And that's how I came up with this total climate change risk score. If I want to see what that one looks like on a box plot, it would be pretty easy to just take my total climate change risk on the Y. Take my... Well, actually, I'm going to put it on the X. There we go. And I'm going to do it by region on the Y. Now that's going to allow me to then ask for some box plots, and here we go. It's not so interesting to me. I'm going to hold my Shift key down and add back in the points. Now, boring box plots don't have to be boring anymore, because now I have different box plot types and styles I can do. Under Style, I've got this Solid style. Now it colors it in, which is pretty cool. And as well, you can go and notch them, and you can go and add fences to them. By the way, it's got the Outlier option selected, and since on this data set (you can see I've turned the labels on) I've already sorted by total climate risk and just want to label the top 10 that are already on there, I can turn off this outlier option, and that takes any duplications out of there, and that's what we're looking at, which is pretty cool. Maybe change this color. By the way, if you cannot see your points in the box plot, sometimes if you put the box plot in last, it will have been moved forward and it's over top of your points. So then all you have to do is go into your points and move them forward, just by right-clicking into your graph, going to the right element, and bringing it forward. That is pretty cool. And you can see what's going on with where we have the cities running the highest climate change risk in 2050, and usually things coastal are at extreme risk of water shortage. Very cool. All right. And as well, I put a nice little background picture in the background on this one. It moved the legend in a little bit, so that's all in the instructions as well. All right, so we are at our last beautiful Pictures from the Gallery view. And that's a Wind Rose Chart. I was thinking that we're getting a lot of adverse weather given the changes in our climate, and so we're always trying to get better at predicting which way the winds are blowing, how strong, and where hurricanes are going: tornadoes, typhoons, all these types of things. There is a cool view, and this one is not limited to JMP 16. But it is a type of... the pie chart is actually a version of a Coxcomb chart that will make a compass rose.
If  I  can  get  it  labels  that  tell  it like  in  a compass,  what's  north  east? What's  north  west? Those  type  of  compass  directions. I  can  come  up  with  a  pretty  cool  pie  chart that  lets  me  segment that  chart  by,  in  this  case,  wind speed. So  let's  take  a  look  at  this  data. There  we  go. This  is  a day's  worth  of  data in  the  Great  Lakes  area. If  you  take  a  look  here  at  the  6th  row, you  can  see  that  I  not  only  have  latitude and  longitude  and  the  speed, where  it  starts,  how  strong  the  wind  was, what  direction  it  was  going, and  then  of  course,  I  can  get  that into  a  compass  direction like  west  southwest. With  that  I  should  be  able  to  go and  just  put  that  compass  direction down  on  the  X,  ask  for  a pie  chart, but not  only  any  kind  of  pie  chart, I  am  going  to  ask  for  the   Coxcomb chart and  I'm  going  to  take  the  wind  speed and  I'm  going  to  overlay it  by  the  wind  speed. You  can  play  with  these  colors, I  might  move  this  in  a  little  bit. I  might  make  the  really  fast  winds  red. And  now  I  can  see  that  they  were mainly  in  this  direction  on  the  compass. That  was  where  predominantly most  of  the  wind  was and  where  some of  the  darker  red  was  as  well. Very  cool. I  have  a  couple  of  versions  with  this, you  might  have  seen  that  I  brought  in kind  of  an  old  type  of  wind  direction  map where  they're  drawing  the  wind  vectors onto  a  map,  which  is  pretty  cool, and  if  you  want  to  see  how  to  do  that  one, I  put  this  in  the  instructions  as  well. And  if  I  open  the  control  panel  back  up, and  I  go  under  the  spread hot spot  where  points  are, and  I  go  to  Set  Shape E xpression. You  can  see  that there's  a  formula  behind  it. And  what  this  formula  is  doing is  it's  looking  at  each  point, which  is  plotted  by  the  latitude and  the  longitude  on  the  map. Then  it  is  taking  the  wind  speed, it  is  drawing  an  arrow, and  it's  drawing  a  bigger  arrow, of  course,  if  there  was  a  stronger  speed. That's  kind  of  cool, and  that's  what  draws  those  blue  lines that  you're  seeing  involved  right  there. All  right,  all  those are  in  your  instructions. I'm  just  about  out  of  time  here, I  will  show  you  that  I  have  put into  the  journal  that  I'm giving  you  where  to  learn  more. You  can  see  other  galleries, you  can  see  blogs  in  journals, other  presentations, and  even  great  tutorials. They're  all  from  the JMP C ommunity,  community.jmp.com. Those  are  there  for  you. Go  have  fun  with  your Pictures  from  the  Gallery  7. Go  try  to  recreate  these  views  on  your  own data  so  they  can  be  nice  and  compelling, and  do  use your  curiosity, time,  and  skills  to  help  save  the  planet.
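(Looking back at the dumbbell chart that opened this talk: the surest way to get its exact script is the red triangle's Save Script option after building the view, since the script JMP saves can differ in detail from any hand-written version. As a rough sketch only, using the demo's column names and leaving the interval bars and marker sizes to be set interactively, the launch might look like this.)

    // Two measures share one X axis (Position(1) puts the second variable on the
    // same axis), with Year on Y; the gray connecting bar is added via the
    // Interval role and the Customize dialog, as described in the talk.
    Graph Builder(
        Variables(
            X( :Female Monthly ),
            X( :Male Monthly, Position( 1 ) ),
            Y( :Year )
        ),
        Elements( Points( X( 1 ), X( 2 ), Y, Legend( 1 ) ) )
    );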
In past Discovery talks, we've shown how to acquire data, create a report and publish it to JMP Live using the desktop task scheduler. But what if you have JMP Live and your report is not changing from day to day? Only the data changes, and you want to share those updates with your colleagues. JMP Live now provides a capability to schedule the refresh of your data on JMP Live without having to publish your work again. This talk shows how to use this new capability and explains this new feature in the context of greater plans for the future of JMP and JMP Live.     Welcome  to  Discovery. My  name  is  Brian  Corcoran. I'm  a  JMP  development  manager. My  team  develop  JMP  Live, and  I'm  here  to  talk  about  automatic refresh  of  data  in  JMP  Live  17. If  you've  seen  any  of  my  previous Discovery  talks, you'll  know  that  I  frequently  talk  about data  acquisition  and  repeatability. In  fact,  in  a  previous  Discovery, I  demonstrated  how  to  take  a  variety of  scripts  for  data  acquisition, piece  them  together  to  produce  a dashboard,  and  published  it  to  JMP  Live. And  then  I  used  a  Windows  Task  Scheduler to  schedule  a  refresh of  that  report  every  day. It  was  useful,  but  a  lot  of  customers said, "Gee,  it's  overly  complicated, and  it  should  be  nice  if  this  just happened  on  the  JMP  Live  system without  my  intervention." So  that's  what  we're  going to  talk  about  today. Before  I  do  that,  though, I  think  it  would  be  worth  having an  overview  of  what  JMP  Live  is for  those  that  may  not  know  about  it. JMP  Live  is  a  separate  product  from  JMP. It  is  a  web- based  collaboration  site. JMP  desktop  users  publish  reports and  data  from  JMP  to  JMP  Live. JMP  Live  users  can  interact  with  their reports,  explore  the  data, and  do  some  further  filtering  of  data if  they  so  desire. Users  on  JMP  Live  can  interact with  that  site without  having  a  JMP license  or  JMP  on  their  desktop. If  they  want  to  and  they  have  JMP  desktop, they  can  download  those  reports  and  work on  them  in  the  JMP  desktop  session for  further  analysis  if  they  so  desire. In  JMP  17, we  worked  very  hard  to  make  replacement of  data  and  reports  easier. Because  in  JMP  Live  15  and  16, a  lot  of  this  had  to  be  done on  the  desktop. In  order  to  do  this, we  had  to  make  data  kind  of  a  first- class citizen  the  way  we  refer  to  it  here, or  an  equal  of  the  report. Before,  when  you  published a  report on  JMP  Live, the  data  went  along  for  the  ride   and  you  really  couldn't  see  it. Now,  if  you  want,  you  can  actually publish  the  data  separately. Any  reports  that  use  that  data will  be  refreshed  on  the  JMP  Live  server when  it  gets  new  data. We've  also  provided  a  JMP  Live  Data V iewer so  that  you  don't  have  to  ever  go back  to  the  desktop  if  you  don't  want  to to see  what's  in  the  data. All  of  this  provides   of  the  foundation for  our  refreshable  data. That  means  that  the  contents of  the  data  can  be  refreshed  on  the  server without  you  having  to  do  anything on  your  JMP  desktop  session. Any  reports  that  use that  data  will  be  refreshed. And  we  provided  a  repeatable  way with  a  task  scheduler  on  JMP  Live that's  hopefully  very  simple  to  use so  that  you  can  tell  JMP  Live  when to  refresh  the  data. 
Refreshable  data  typically  comes from   a  database that  you  access  through  ODBC or  a  website  with  a  rest  endpoint. And  the  examples  of  that  are the  World  Bank  or  the  Census  Bureau. JMP  Live  can  store  credentials  that  you need  to  access  that  data securely  within  the  JMP  Live  database. So  those  are  the  basics. Let's  go  ahead  and  actually  do  a  demo so  we  can  see  what  I'm  talking  about  here. Let's  go  ahead  and  bring  up  JMP. This  is  JMP  Pro,  but  all  of  this would  work  with  regular  JMP  as  well. I'm  going  to  open  a  data  set  that  ships with  the  product  called  Financial. All  this  is, is  Fortune  500  companies, names  have  been  withheld, but  with  sales  and  profitability and,  number  of  employees. Let's  go  ahead  and  make a  report  with  this. I  want  to  just  do  sales and  number  of  employees. And  then  I'm  going  to  apply  a  data  filter for  the  type  of  industry that  we're  talking  about  here. And  you'll  see  within  the  JMP  report that  I  can  kind  of  cycle  between the  industries  to  look  at  the  individuals and  see  what  their  profitability  is, number  of  employees,  things  like  that. Let's  go  ahead  and  just publish  this  to  JMP  Live. I'm  going  to  use  an  internal JMP  Live  server  we  have  here. I've  created  a  folder for  this  information  already. You  could  create  a  new folder  if  you  so  desired. I'm  going  to  go  ahead  and  publish  this. Let's  go  ahead  and  look  at  JMP  Live. And  I'm  just  going  to  use a  Chrome  session. And  we'll  refresh  this. And  I  should  have  a  new  report  here. There's  my  information. It's  going  to  load  that. There's a lot  going  on   in  this  particular  server. So  it  might  take  a  second, but  there  it  is. We'll  cycle  through the  various  industries, and  there  it  is. Let's  suppose  that  I've  decided... I  wish  I  had  put  a  fit  line and  mean  on  this. I  can  go  back  to  JMP and  go  ahead  and  do  that. What  I'm  going  to   demonstrate  here is  the  separation between  report  and  data. Let's  suppose  though,  that  I  also got  information,  in  the  meantime, that  company  number one  is  actually  a  $19  billion  company. But  I'm  waiting  on  verification  on  that, and  I  don't  want  that  outlier to  show  up  in  my  report. What  I  can  do  is  I  can  go  ahead and  publish  this  to  JMP  Live, and I'm  going  to  replace an  existing  report. And  I'm  going   to  select  my  financial  report. But  I'm  going  to  say   rather  than  updating  JMP  Live  data  table, I'm  going  to  use   the  existing  one  up  there. Let's  go  ahead  and  look  at  JMP  Live. It's  reloading  our  report. There's  our  fit  lines. But  we  haven't  gotten our  outlier  company  yet. Still  9.8  billion  in  sales. Let's  go  ahead and  close  this  window. Let's  shut  this  report  down. Let's  suppose  that  I  do  find  out that  my  data  is  accurate  for  drugs, drug  company  number  one, and  it's  a  $19  billion  company. I  can  just  select  to  publish just  the  data  now. And  I'm  going  to  update  the  data, and  I'm  going  to  go  ahead and  replace  my  financial  post. And  we'll  look  here and  it's  going  to  regenerate  with  my  new  data. And  sure  enough, I  have  my  outlier  company  here. 
If I so desired, I could go in and look at the data here and see that there's my $19 billion company as well. So there's our Data Viewer in JMP Live 17. Let's go ahead and shut this information down. That just shows you how you can selectively update reports and data. For my next part, I'm going to get into a refresh script, and I'm going to put in a shameless plug for a new facility called the OSI Pi Data Server Import. OSI Pi is something called a historian database. Historian databases collect information from, usually, a lot of industrial processes: machines, meters, things like that. And a lot of times it's real-time information collected in the database so you can analyze it later at your leisure. JMP 17 has a new facility to do that. So if I go into Database and Import from OSI Pi Server, I can connect to the Pi server within SAS. And we have some simulated data for a data center, like for cloud computing. And in there you're going to find a variety of servers and things like that. Here we're looking at server rack number one and how much power it is consuming. And we can look at that. I'm going to go back, and I'm going to collect three days of data at a certain point in time. And I'm going to say let's go up to 10,000 points. Let's go ahead and import that. So we got 6,300 points. And you'll notice something here called the source script. Let's take a look at that. The source script provides JMP Scripting Language to recreate the data fetch. And this is going to be important for us in our refresh script. It contains the information on how to connect to the server, what days we're sampling, how many points we want to get, things like that. Let's go ahead and create a simple control chart, a run chart, with this data. I'm going to go ahead and make this a label. Let's go ahead and put that information in here. And there is our chart. Let's go ahead and publish this. Publish it new. And you know what, I think I'm going to create a new folder for this, and I'm going to put it in there. We'll go ahead and publish that information. Now let's go up and take a look at that. There is our new report. I want to go ahead and look in my space where this report resides. There's our Pi data. So let's go ahead. Here's a Files tab, and you'll see that we have a report and a data table. So let's go ahead and look at that report. That's kind of what we expect. And if we hover over the points, there are our values and when they were collected. But now let's look at the table. We'll go into Settings here, and we'll see that that very source script we were just talking about got pushed up to the server along with our report. Now, there's something called a refresh script. The source script is kind of provided as a starting point for what you'll do with your refresh script. The refresh script is a script that's going to run to provide data for our update. So whatever you put in here is what's going to be updated for your report. Let's go ahead and make this refreshable.
And  let's  copy  our  source  script and  edit  our  refresh  script. I'm  going  to  paste  that  in  there. First,  we  got  to  make  sure this  has  an  ending  semicolon  in  it. But  this  has  the  information  to  connect  to the   Pi server  from  our  server  process. The  Pi  Server,  by  default  can  get  a  whole lot  of  data  tables  at  once because  you  might  have  like five  different  power  supplies. So  you  get  five  different  tables,  and  you can  return  that  as  a  data  table  list. That's  how  the   Pi server  works. So  we  have  to  know  that in  order  to  do  this  right, because  rule  number  one, and  the,  really,  only  rule  you  need to  remember  for  refresh  scripts  is it  needs  to  return  a  data  table  as its  last  item, last  thing  it  does. Let's  go  ahead  and  we're  going to  return  our  data  table  list  here from  our  Pi  call. And  we're  going  to  take  the  first  item  off of  that  list  because w e  really  only  have  one  data  table  for  our Atlanta  Data  Center  power  meter. So  we'll  take  the  first item  off  of  that  list. And  since  this  is  setting  up  a  data  table with  the  last  action  in  the  script, we  should  be  good  for  our  rule. The  last  thing  we're  going to  do  is  return  a  data  table. But  while  we're  at  it,  let's  go  ahead and  get  one  more  days  worth  of  data. We'll  get  four  days'  worth going  back ways. So  there  is  all  we  have  to  do, I  believe,  for  our  refresh  script. Let's  go  ahead  and  try  it  out manually  by  saying  refresh  data. It's  saying  refreshing  and says  updated  a  few  seconds  ago. Let's  go  ahead  and  go  back and  look  at  our  report. There  it  is. And  you'll  see  it's  a  bit  different  than what  we  had  on  the  desktop because  we  went  and  got  an  additional   day's  worth  of  data. So  you  can  see  it's  matching up  to  this  last  day. And  then  we  have  some  additional  data. So  there  is  a  refresh  script from  our   OSI Pi  source. So  that's  our  first  example. And  if we  wanted  to,  we  could  view  that. The  history  pane  contains information  on  how  well  things  went. You  can  see  I  asked  for an  on- demand  data  refresh  here. It  took  only  two  seconds. If  we  are going to  look  at  the  details, we'd  say  it  just  returned  a  data  table. This  isn't  going  to  return the  contents  of  the  JMP  log. JMP  is  running  in  the  background to  fetch  the  data  for  you  on  the  server. So  there's  your  first  example. Let's  try  another  one. I'm  going  to  clean  this  up. Another  common  way  that  we  get  data is  from  a  traditional  database like  Oracle  or   SQL Server. So  let  me  do  an  example in  which  we  use   SQL Server. I'm  going  to  go  ahead  and  I'm  going to  bring  up  Query  Builder and  open  a  new  connection  here. I  have  some  stock  data and  for  this  particular  case it's  for  Apple  Computer. But  let's  imagine  that  we  have  a  table full  of  stock  quotes  to  get  updated at  the  close  of  the  market,  like  at  04:15  PM  every  day. And  we  have  some  background  process updating  those. Let's  go  ahead,  and  we're  going  to  get our  quotes  that  we  have  right  now. I'm  going  to  build  a  very  simple  query, bring  in  all  our  data. This  is  going  to  provide  the  Apple  quotes from  the  beginning  of  the  year. I'm  going  to  select  Date  as  a  label. 
We'll  go  ahead  and  do   another  run  chart with quotes. We  can  go  and  hover  over  points, see  what  the  closing  price  was on  any  given  day. Stock  market  has  gone  down, but  Apple's  bouncing  back  up  recently. Let's  go  ahead and  change  the  name  of  this. Let's  go  ahead and  publish  that  to  JMP  Live  now. I'm  going  to  create a  folder  for  this  too. Maybe  we  have  a  whole  bunch of  stock  quotes  in  here. Let's  go  ahead  and  rename  this  too, and  we'll  go  ahead  and  publish  that. Now  let's  go  ahead and  we'll  just  shut  this  down. Now  let's  look  up  here, and  there  are  stock  quotes  as  we  expected, and  if  we  look  at  our  table, there  is  our  source script. Let's  go  ahead  and  take  a  look  at  that. Let's  make  our  data  refreshable  again  like we  did  with  our   Pi server. And  let's  copy  that  and  edit. You'll  see  here  that  our query  consists  of  a  connection  section. That's  how  we're  going  to  reach the  database, in  this  case  SQL Server, using  an  ODBC  driver. This  was  present  on  the  source  script on  the  desktop  as  well, but  the  password  has  been  replaced by  a  placeholder,  as  have  the  user  ID, both  for  security  reasons. And  because  the  credentials  on  the  server may  be  different  than  the  credentials you're  going  to  use  on  the  desktop. Let's  go  ahead  and  we're  going  to complete  our  refresh  script. We  need  an ending  semicolon  here. And  the  new  SQL query   just  returns  a  single  table and  we're  going  to  assign  that  to  a  data table  variable  in  our  script. And   as  an  insurance  mechanism, I  like  putting  an  empty  reference to  the  table at  the  very  end  of  our  refresh  script to  make  sure  the  last thing  we  do  is  return  a  data  table. So  what  about the  password  and  user  ID? Well,  we're  going  to  talk  about  that. Let's  save  our  script  out. We  can  assign  credentials and  we  go  down  here, and  I  have  some  I  created  before, but  let's  create  a  new  one just  to  show  you  how  you  would  do  this. It's  called  the   SQL Server  for  JMP T est . And  what  I'm  going  to  do  is I'm  just  going  to  put  the  credentials   for  the  server's  access  of  the  database. So  maybe  these  are administrative  credentials. Maybe  they're  test  credentials that  you  use  up  here. And  I'm  going  to  say, let's  go  ahead  and  assign  this  set of  credentials  for  our  refresh  script. What  will  happen  is  the  user  ID  will be  substituted  into  this  field within  our  refresh  script   where  you  see  UID and  the  password  will  be  replaced with  password. And  it  will  be  fetched  in  a  secure  manner from  an  encrypted  database only  when  that  fetch  needs  to  be  made. So  nobody  will  ever  see  your  credentials. Let's  go  ahead  and  try  to  refresh  this. We  had  a  failure. Let's  take  a  look  at  that and  see  what  happened. Here's  our  JMP  log. And  if  we  read  through  this, it's  a  real  long  explanation  to  say it  didn't  quite  understand the  connection  credentials. Let's  take  a  look  at  that  and  see what  might  be  going  on. So  this  is  really something  important   to  know  about when  you're  doing refresh  scripts  on  JMP Live. 
This is really something important to know about when you're doing refresh scripts on JMP Live. JMP Live runs a variety of JMP sessions to help you fetch data and recalculate analyses in the background. They're all hidden from you, but they run on Windows Server along with JMP Live. You must have the ODBC driver that Windows Server needs to access that database installed on the server. It turns out that the Windows ODBC driver has a different name than my Mac driver. Remember, I'm using a MacBook, and it was the Actual ODBC SQL Server driver. My SQL Server driver from Microsoft is just called SQL Server, so I need to modify that driver name here in my connection script. Let's go ahead and save that.

Now let's try refresh again. And it says we updated a few seconds ago. Okay, that's great, except it's the same data we already had, so it's not too exciting. So now let's simulate what would happen if the data got updated. We're going to create a refresh schedule. Normally we'd probably do this around this time of day, because the market will be closed and we can get new data. So I'm going to put in something like 4:18 PM, and I'm going to say we're not going to do this on Saturdays and Sundays, because the market is not open. Let's go ahead and save that. It'll start calculating when it's going to first run; it's going to run in a minute or two.

So let's go back to JMP, and I have a script that's going to update the stock market data behind the scenes, like what might normally happen from a data feed elsewhere. We're going to put a stock value of 200 in there for today's data. Let me make sure that ran in JMP. And now we're just going to wait on our JMP Live session; it's going to run in a few seconds. While we wait, I'll just mention that the scheduling allows you to set termination dates, if you want it to only run for, say, a month and then stop. You can also get finer granularity, so that it runs every five minutes, every hour, things like that. By default, it runs once a day, and like I showed you, you can select which days of the week it runs.

It says it's going to run again in a day. Let's see what our history says. Our scheduled data refresh succeeded. Let's go ahead and look at our report. Okay, and there is our value of 200. The stock really rocketed up; maybe it's time to sell. But the important thing to know is that our refresh schedule worked. From now on, now that we've set this up, it's going to run five days a week, and we'll automatically get our report regenerated, so that when we come in in the morning, we'll see our updated quotes and be able to make decisions from our reports and the latest information that we have.

So that's updatable, refreshable data in its simplest form in JMP Live 17. I hope you found this interesting. I really appreciate you attending this talk. Have a good day.
JMP is best known for allowing you to "touch" your data with interactive visualizations, dynamic linking, and graphical statistical outputs. However, many repeatable options are buried in the red triangle menu, within multiple layers of menu options, or are only available on the data table. There is hope, though! Using JSL, you can improve efficiency and create a personalized experience with custom toolbar items that let you stay in the analysis window and workflow. In this presentation, I review different scripts that add little wins to your analysis, such as selecting a column in the Graph Builder column list and having it be selected in the data table, removing outliers by selecting them and replacing them with missing values (versus having to hide and exclude), controlling the profiler desirability functions more efficiently, and more. Other examples include how to make your own tuning table for any analysis, how to quickly set spec limits based on fitted distributions and desired sigma levels for a large number of columns, how to automate running an MSA on hundreds of columns, and how to automatically identify columns that have subgroups and then run the appropriate control charts. My goal is that by the end of the presentation you will be more efficient, have a new way of thinking about how to modify JMP, and will dive into scripting.

Hello everyone. My name is Steve Hampton. I work at Precision Castparts. I'm the process control manager there, and I'm here today to talk about unleashing your productivity with JMP customization. I live with my family in Vancouver, Washington. I have been in castings my entire career, the last 15 years with PCC, which does investment castings, and I am a self-proclaimed stat nerd. I think this little post-it note pretty much describes a lot of the conversations I have with my wife, where she just gives me a very strange look as I try to explain why I am not watching TV and am instead playing around in JMP on a Saturday, because I have a tasty beer to go along with it.

When I'm not nerding out on stats, my other thoughts usually focus on work, if they're not on fun activities outside, and work is pretty cool. We make a lot of different products, but the one that I really like to show off is this six-foot-diameter, one-piece titanium casting. It's called an engine section stator. It goes on the Trent XWB engine, which goes on the A350 airplane, if you're keeping track. You can actually see it tucked right behind those main fan blades in this second picture, as the first thing the air sees before it enters the core of the engine. It's just a really cool industry to be in: aerospace and high-tech investment castings.

So why am I here? Well, I love JMP, I love talking to people who love JMP, and I love talking to people who love stats, so it's great to be around like-minded people. And I hate clicking buttons to get things accomplished. If you remember back in the day, there is a little-known Christmas movie called The Grinch, [inaudible 00:02:02], and he had a scene where he's just saying, "The noise, the noise, the noise," and I feel like that's how I am a lot of times when I'm in JMP.
It's the clicks, the clicks, the clicks; they just drive me insane. And I'm here because I like flexibility, and I think most people do, so I'd like to share some things that I've done to increase my flexibility with JMP.

An interesting note: when I put this presentation together, I realized this mindset actually started way back in the day. I remember loving my NES, but with the controller, I very quickly had thoughts like, "Well, why can't I have Jump be B and Run be A?" Because that worked better for me. And then it only got worse. The controllers got more buttons, which was great and added to my need for flexibility, but it took a long time before they started to let us customize things. Now they're pretty good. But before the consoles got good, I really found my benchmark for interfacing with a program in computer games, because not only did I have a keyboard with tons of buttons, so I could immediately cause an action to take place, but I could remap all of them. That's been my baseline when I compare everything electronic that I interact with.

There is hope, though, because we have the humble toolbar, or at least you think it's humble, and scripting, which everyone who's involved with it knows is incredibly powerful.

Real quick, our first efficiency power-up is the toolbar. It's like Mario getting a little mushroom: he goes from a small little guy to a bigger guy. The toolbar is pretty limited by default, but it's really easy to just turn things on, and you immediately have one-click access to a lot of the actions you'll probably do during data manipulation. I recommend you keep on Reports, Tables, Tools, Analyze, and Data Table, which I've marked up here in red boxes on the right. Another tip is that you can turn toolbars on for these windows independently, and you can move them around independently, which is really nice if you want a custom set for each window. But be careful: if you move things around too much, you'll lose some of your efficiency as your hunt for where an icon lives changes from one window to another.

Even better, you can make your own. If you go into Customize Toolbars and then New Toolbar, you can make your own. All those blue ones are ones that I've made; I think I've actually now made more toolbars for myself than come with JMP. So winner winner chicken dinner, I guess. And the black ones are ones that I've added some additional icons to as well. I think that combination gets you to raccoon Mario, which I thought was pretty neat stuff back in the day. I always wanted to be raccoon Mario.

Just a few other quick things before we get into JMP: you can link frequently used buttons to built-in commands. This works really well if you still want to be able to undo after you click something.
If you link to a script and run it, you can't undo it, which is a little bit annoying. But I usually put my script in a file and link it with Run JSL, rather than embedding it in the toolbar, because then I can change the file outside of the toolbar and update the functionality without having to dig back into the toolbar menu. You can use built-in icons. I use PowerPoint to make a shorthand image of what the icon does, save it as a bitmap, and upload it. It's not great; if anyone has a better way of doing it, I'd love to talk to you at some point. You can also assign shortcuts, hide built-in toolbar icons that you aren't really interested in, and add to your standard toolbars.

So scripting in JMP is super powerful, and when you combine it with the toolbar, you get a pretty legendary efficiency team; that's how it appears when I think about the combination.

So let's get into JMP. I have this data table here, and it's got a lot of columns. Normally, this would take a fair amount of cleaning up. The first thing I do is just check whether the column types and column counts are right. I can immediately see that these two highlighted columns are supposed to be continuous. You could normally right-click, go into column info, and change things up there. That's already bothering me, because that's too many clicks, and I can't change it to continuous here because it's character-based. So I made myself a little macro that I can just click, and it's done. No matter how many columns I select, it's done. It's basically a one-click Standardize Attributes, which is great.

You can also see over here that I have this column that's the date, and it's messed up. Once again, I don't want to right-click and dig into a submenu; I just want to click and go. So I've made myself a To Date function here, and now I have a date.

I also have this Batch column. It is technically continuous, but a lot of the ways I want to use it are more ordinal, and I want to have both. So I've made a script that just throws out another column called Batch Nom, and it's now a nominal column. The reason you might want this is that if I select these guys and go into a filter, maybe I want to be able to filter by dragging, but if I want to grab just a single batch or a couple of batches, it's a lot easier to do in the nominal state. Also, the way it shows up on some graphs can be better one way or the other.

The next thing we can see is that this date is an individual date for each day. A lot of times we roll things up by weeks. I could use the awesome built-in formula columns and get a year-week column, but that doesn't really mean a lot to most people, because it's not a date. What does "week five 2020" mean? So I have built a function where I take the date, and it returns the next Sunday after that date.
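A sketch of what such a week-ending formula column might look like in JSL, assuming the date column is named "Date" (the real macro shown in the talk may differ):

    // Sketch: bin daily dates by the Sunday that follows them.
    // Day Of Week() returns 1 for Sunday, so a Sunday rolls forward to the next Sunday.
    dt = Current Data Table();
    dt << New Column( "Week Ending",
        Numeric, Continuous,
        Format( "m/d/y", 12 ),
        Formula( :Date + In Days( 8 - Day Of Week( :Date ) ) )
    );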
And so now I have a week-ending column where I can bin things by weeks. It's really easy for people to understand, and it's continuous, whereas the other way of doing it makes it nominal. So there are a lot of advantages to me in that, and I have a lot of little things that help me clean up.

The last thing is that since this column came from categorical to numeric, it has some missing values in it. I know these missings are actually zeros, because if there isn't any data, it means there wasn't any defect. Since I do this a lot, I have a Recode Missing to Zeros and a Recode Zeros to Missing. So: recode missing to zeros, there we go. I haven't had to actually go in here, recode, and then do more; once again, that's already too much typing. For the data manipulation steps that you do a lot, adding in some scripting really can make you super effective at data cleanup. So don't think about scripting only for analyses that you run often; think about it in more micro steps to get some efficiency gains.

Next, I'm going to take us into Graph Builder, so let's bring this up. I spend a ton of time in Graph Builder because it's one of my favorite platforms. You really get a feel for your data, and it's easy to get people who maybe aren't as deep into stats to understand what's going on. So this is probably the main platform I live in. As I bring this up, you can immediately see that since Defect 1 is not in the right condition, the graph doesn't look great. But the nice thing is that I don't have to go to the data table. When I first started, I hated going back and forth between the analysis and the data table, or I'd put them side by side, but then everything gets crunched up.

The win here is that by learning about the report layer and being able to pull out the state of different reports, in this case the state of what is selected in this box, I can actually select it in the data table. So now that column is selected in the data table, I can use my Go To Continuous, and I'm back in business. I call this staying in the workflow. I learned that term from watching an on-demand webinar about formulas, where they talked about staying in the workflow in the sense of staying in JMP: don't go to Excel, do some formulas, and bring it back into JMP; learn to use formulas in JMP, because its formula editor is amazing, and you stay in the workflow. I'm saying stay in the workflow of your analysis window; that's where you want to live. I don't want to have to go back to the data table.

So I'm going to use a standard toolbar button to put a column switcher on, and we're going to get all of these, oh my goodness, all of these columns here. So we've got a column switcher, and I've also put in another script here where I can select the current column in the data table from my column switcher, which is great.
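Stepping back for a moment, the recode-missing-to-zeros macro mentioned above might look something like this minimal sketch, assuming it operates on whichever columns are selected in the data table (the companion macro would simply do the reverse):

    // Sketch: replace missing values with 0 in the selected columns.
    dt = Current Data Table();
    selCols = dt << Get Selected Columns;
    For( c = 1, c <= N Items( selCols ), c++,
        col = selCols[c];
        For( r = 1, r <= N Rows( dt ), r++,
            If( Is Missing( col[r] ), col[r] = 0 )
        );
    );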
That column switcher selection opened up another world of using a script that Jordan Hiller had helped me with when I was just starting down my scripting path, which we call "nuke 'em." It's a way of handling data like this, which is not good data; it's not from fully completed parts. I want to get rid of it, but I don't want to just hide and exclude. If I use my little remapped shortcut, Ctrl+Q, it's gone. That's what I wanted on this slide, but now I've lost all the information on that row. And I don't want to have to use the Row Editor, and I don't want to have to use subset with linking; that's all too many clicks. So with the right response showing in the column switcher, I can select these guys, run my nuke 'em script, and now those data points are removed. Very quickly, you can go through with the column switcher and nuke 'em and remove data that is either an outlier you know shouldn't be in the data, or that is causing problems, or that is actual bad data that should be out.

And I see a lot of bad data in the form of points that are out of place in the sequence of time. So this one's obvious, right? That's obviously bad data, at least obvious to me, so I'm just going to blow it out.

Here's an interesting one. You can see that product A has some really crazy points here, and these are all bad data. Another way you can look at that: I'm going to use this toolbar button, which is actually a standard function you can just enable in JMP, to redo the analysis. Now I have my new column; I can take this out, do a box plot, and say, okay, cool, here are the outliers. That's a way to blow things out. You can see I had a lot of them, but these points are not outliers. And really, I'm using "outliers" in place of "bad data," because bad data usually shows up as an outlier. But these ones did not [inaudible 00:14:40] show up as outliers in the box plot, and they are not bad data. So I'm going to nuke out all these other guys. You can see now I don't have anything on the low side flagged as an outlier, but I do know that I still have outliers, and they're outliers that I'm going to call out in time. These are so far away from the other data points that I know, from my experience and from looking at the [inaudible 00:15:10], that they are not real data points; they are data that we have jacked up. So I can go in here, select all these points that are bad data because of where they are in time, and get rid of them. And you would never see that from a standard outlier analysis. Now I have a very nice-looking curve, everything is cleaned up, and I was able to do that pretty darn fast. It's a really powerful tool.

If we go back along here, this is an interesting one. I can see that I have this outlier right here, and I'm going to nuke it. But you can also see that there is a shift, and unfortunately it isn't labeled as a trial in my data table.
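Here is a stripped-down sketch of the "nuke 'em" idea described above (not the actual script from the talk): set the chosen response column to missing for whatever rows are selected, rather than hiding and excluding whole rows. The response name "Defect 1" is an assumption; the real version would read the current response from the column switcher.

    // Sketch: blank out only the selected rows' values in one response column.
    dt = Current Data Table();
    respCol = Column( dt, "Defect 1" );        // assumed response name
    selRows = dt << Get Selected Rows;         // matrix of selected row numbers
    For( i = 1, i <= N Rows( selRows ), i++,
        respCol[selRows[i]] = .                // missing, so only this Y is dropped
    );
    dt << Clear Select;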
To mark that shift, I could use right-click row, Name Selection in Column, but there are still a lot of steps in there, so I'm just going to select, and I've made myself a binning column. When I click this, whatever was selected is now binned, so I can very easily see what's going on. I can even add in my text box and see the differences in the means. That's really useful. You can bin things as trial versus not trial; I use good versus bad a lot. So if my continuous data isn't great because of the measurement system, but it does an okay job of saying whether the part is good or bad, I can bin it with this and then do a pass/fail analysis, like a logistic analysis. So that's great.

I also really like dynamic selection. If I go back here, I'm going to take the binning off. Now I have this Selected column, which just changes to a one when I select the row. I can dynamically go through and select different things and see the mean, [inaudible 00:17:16] just real quickly. Okay, this grouping right here: its mean is 100, and above it, it's 288. It's really useful for poking at data. Say, right here, what's going on with this data? One, I can select it and see what the differences in means are. Two, I can see what the trend would have been like if this had not happened. So I can do a little bit of investigation.

I also use inverse selection a lot, which is buried in the Rows menu, so I just have a toolbar button here, and now I can invert it. Everything is basically the same except that now the bulk of the data is highlighted, which sometimes makes things easier. So that's great for analysis.

The other thing I have is that sometimes you might want to ask, based on what's selected here, what else is selected? I call this my selected-other-columns script. So for this little grouping that was different, what else shared the Equipment One level that this grouping used? When I click that, you can see that barely any of the rest of the B product used Equipment One, but a lot of item A did, and A is actually higher here. So if we wanted to avoid this higher level, maybe we need to look at using the same equipment that the rest of B is using. There are a lot of different ways to slice and dice and learn things.

The last thing is, I have two products here, but let's say I don't want to work with both, so I want to subset. I have these subsetting icons, because once again, I just want to do it in one click, and I do a lot of subsetting, so it makes sense. Now I have this new table. But what if I want the same graph built on it? I don't necessarily want to rebuild it from scratch, and there are other ways to copy and paste scripts over, but I do this often enough that I save the script to the clipboard, and then I can bring this new table up and run the script straight from my clipboard.
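Before moving on, here is a minimal sketch of the binning-by-selection idea from a moment ago, assuming a character "Bin" column is created on the fly and the labels are just placeholders:

    // Sketch: flag the currently selected rows in a "Bin" column (e.g., trial vs. baseline).
    dt = Current Data Table();
    binCol = dt << New Column( "Bin", Character, Nominal, Set Each Value( "baseline" ) );
    selRows = dt << Get Selected Rows;
    For( i = 1, i <= N Rows( selRows ), i++,
        binCol[selRows[i]] = "trial"
    );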
And hey, now I have a graph from that clipboard script, and it's all built up exactly the same way I had it before. So this is a really nice way to keep the efficiency you had with a previous table on a new table.

Now, you'll see here that when I close this, it pops up a window asking, "What do you want to do with your other open windows?" And then if I click through that, it [inaudible 00:20:24] says, "Hey, you didn't save this; what do you want to do with it?" A lot of times I have subset windows open just because I want to explore things, and all the clicking to close things drives me crazy. So I made myself a little "close everything around that table" script. If you're in a window, it will go close the base table, and it doesn't ask you anything. So I can do quick little explorations on little data sets, close them down, stay in the workflow, and go fast.

If I did want to save something, I made this little macro that saves it out with a generic name to a standard file location, so I don't have to think about where to save it or dive into a bunch of save menus. If I want to move it later, I can, but at least I know where all the main things I want to keep are. And if I do change something, say I change something from the graph, so I'm going to blow out all these points, and I want to save it, I can't just click Save, because that would try to save this window. So I found it really useful to have a Save Data Table button up here, so I can, once again, stay in the workflow of the analysis window and save my base data table. Once I'm done, I can close it and get out of there.

All right, that's everything I wanted to cover there. Let's move on to a real quick example for functional data. This will be super quick. For functional data, the one thing I use a bunch is this: if I have functional data with a timestamp, you can see that's not super useful if I'm trying to look at all my lots, because there's a big gap between the times. I could step through and see what each shape looks like, but that's not super fun. So I have this: I make a counter column, which just uses the cumulative sum function. I can say what I want to do it by, and I can add up to four items to subgroup the cumulative sum by. I'm just going to do pieces, because that's really the only thing that matters, and what I get out of that is a counter column, so that everything now shows up nicely on one graph. This is really good, but it only works well if the timestamps are pretty comparable. If the timestamps are all over the place, it breaks down, because it's assuming the timestamps are about the same, and then you have to get a little bit more creative.
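A sketch of that counter column, assuming the grouping column is named "Piece" (up to four by-variables could be added the same way):

    // Sketch: a running count that restarts within each piece, so traces with big
    // time gaps line up on one shared x-axis.
    dt = Current Data Table();
    dt << New Column( "Counter", Numeric, Continuous,
        Formula( Col Cumulative Sum( 1, :Piece ) )
    );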
Okay, so back to the presentation. We got through all these things, but what I really want to show as we head toward the end is that, for the ultimate in freedom and efficiency, you need to use scripts to expand JMP's functionality to fit your exact needs. There are a lot of wishes, and hopefully you're putting them into the wish list on the community, but there are a lot of simple ones you can actually take care of yourself. You can see a nuclear Godzilla up there, and we all know that a nuclear bomb plus Godzilla makes him king of the monsters. It's probably a little-known fact that JMP plus scripted functions makes you the king of data analysis. I've gotten a lot of value from the scripting index, the two JMP books listed here, and the user community, especially these two guys, to whom I owe massive amounts of beer as gratitude for the time they saved my bacon, and probably thousands of other people's as well.

So let's get into what we're going to do here. The first thing is, we'll go back to this table. If I'm doing more of an exploratory analysis, or trying to get an explanatory model rather than a predictive model, I'll use Partition without a validation column. This is nice because people who don't have JMP Pro can use it as well. And what I do is, well, we'll just put all this stuff in, that'll be fine, and click Okay. Now I like to split by LogWorth, so I can split by LogWorth, and it's showing the minimum LogWorth out of this tree. I'll just split until I get below two. Okay, there's two. Go back, and here's my model. R square is 44.9. Now, whenever counts get low, I do think I might be overfitting a little bit, which is why I like this minimum split size, so I can prune back. The default minimum split size is way too low, so I'm going to go with 15 and click Okay. So definitely fewer splits, R square is still not too bad, and we can see the main factors contributing to our defects, these top three. I really like using Assess Variable Importance, since it reorders what you're looking at so the main factors appear in the first boxes.

And I love optimization and desirability. Once again, you have to keep clicking into the red triangle menu to run it, so I came up with a little macro to control the profiler. I can come in here and say, all right, I want to first maximize, because it defaults to max, and I can remember the settings and call them "max." Then I can alter the desirability to minimize, maximize again, remember settings, and call that "min." I can copy and paste settings, set to a row, and link profilers, and it's modal. Or non-modal, I apologize. So it can just stay up and out of the way when I don't need it. It makes using the profiler, which is already super powerful, super efficient as well. That's one I really like, and I suggest you grab it from my page for this presentation when I post these.

Then the next thing is, I've got to go back here.
I'm going to do some neural net stuff, so I definitely want to make a validation column. I have these built-in splits that I like, so it automatically creates one for me. Now I have my validation column, and I also have a random uniform column in case I want to do any prediction screening; that helps with looking at cutoff points, but in this case we're just looking at neural nets.

Where I went from here is that I really like how Gen Reg has its model comparison, and I really like how Bootstrap Forest has a tuning table. When you're using a neural net, it can be very painful to feel like you're getting the right model, because at every step you have to change it, rerun it, and then look to see what's going on, and sometimes it just feels like you're spinning your wheels. Over time, I found some models that I really like, so I built this platform where I can hit Recall, here's everything, and I put the number of boosts and number of tours really low just so this runs faster. And I can go ahead and run it. What it's going to do now, for the models I've put into my tuning table (ideally, down the road, I'd like the tuning table to be a little more integrated into that first menu, but it's not there yet), is give me this nice Pareto showing my test, training, and validation results for the different models. So I can go through [inaudible 00:28:45] cool. Which one got me the closest, without having to run each of these individually? I can see that this TanH(10)Linear(10)Boosted(5) model, overall, has the highest average of all the R squares, and it looks like everything's pretty close. So let's start with that one.

The next thing I like to do is look at the min and max, to see whether it actually predicted in the range I was expecting. Let's see, what did we say? I said 10, 10 and 5 boosts. So 10, 10 and boost 5, there we go. I'll look at the min and max. It predicted 5 to 112. That's good, it didn't predict negative values. That's definitely something I look for in a model of defects or hours, because you're not supposed to go below zero on any of those. And the defects we had ranged from 1 to 51. So yeah, it did okay. It's predicting on the high side, so I might go in here and see whether anything else predicted on the lower side, or closer, and still had good test values. This is a really powerful tool, because then I can just go into my actual window here, go down here, and this is my model. I could save the model out; I could save this formula out, save just this one neural model to my data table, and use that from here on out. And it's already got my min and max built in here. Let's save it from there. I find this to be a very powerful improvement to the neural net platform, which I already think is pretty powerful.
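As a rough sketch of the validation-column macro idea from the start of this section (not the actual script), here is one way to build an approximately 60/20/20 split with JMP's usual 0/1/2 coding, plus a uniform column for prediction screening; the proportions and column names are assumptions:

    // Sketch: quick validation column (0 = training, 1 = validation, 2 = test).
    dt = Current Data Table();
    dt << New Column( "Validation", Numeric, Nominal,
        Set Each Value( Match( Random Integer( 1, 5 ), 1, 0, 2, 0, 3, 0, 4, 1, 5, 2 ) )
    );
    dt << New Column( "Random Uniform", Numeric, Continuous,
        Set Each Value( Random Uniform() )
    );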
And then, if you're just in standard JMP, the last thing I'll show is that I've started trying to give some additional functionality to standard JMP users. Here, you can set how many initial nodes you have, the number you want to step the nodes up by, how many loops you want to go through, your validation percent, and whether you want to assess importance, and then click Okay. What it does is run all your models and do the same thing, except that here I had a chance to work on improving the min and max reporting. So here I can see my min and max, which is what it was actually predicting, and then my training and validation results. Ideally you want them to be as close together and as high as possible, and then to predict well. So here I'm looking at TanH(8), which puts me here. That's pretty good, and that's probably the one I would go with. They're the closest, and it doesn't overpredict. Even though these other ones have higher training values, they're actually predicting negative values, and this one seems to be getting overly complex. So that's what I would go with. It's pretty useful for standard users to get more out of the neural net platform.

Finally, let's go quickly to some dimensional data examples. We have the dim data example for getting specs. The process at our plant is that we'll get a bunch of data and then calculate spec limits from it. Usually it's either a three or four sigma spec limit, so a Ppk of 1 or 1.33, and then we'll present that to the customer. That used to take a long time in the old days, when we would manually run an analysis, fit the best distribution, and write it down, or just use the normal distribution for everything and calculate it in Excel. You have an option in JMP to do process capability and change it to calculate limits off a multiplier, and that's great, because then you get your specs. The problem is that you have a lot of columns; even if you hit the broadcast button, you have to enter the multiplier for each one. So what I did, definitely with help from a bunch of other people, because this got above my pay grade in scripting very quickly, was make this macro where I can say what I want the sigma number to be, click Okay, and it goes through and spits this out for every distribution. Now I can right-click, go to Make Combined Data Table, and I have my data table. Then I can go here, select everything for the lower and upper spec limits, use my Subset button, and there: now I can submit that to the customer. Here are my upper and lower spec limits for all these dimensions. I did that in hopefully less than a minute, and it used to take someone half their day, if not more. So using scripting to improve what you want to do, and the functionality and flexibility it gives you, is great.
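A simplified sketch of that spec-limit idea: propose plus-or-minus k-sigma limits for every continuous column and collect them into one table to hand off. This version assumes normal-based limits and no missing values, whereas the macro in the talk works from the fitted distributions; the sigma multiplier and table names are illustrative.

    // Sketch: k-sigma spec-limit proposals for all continuous columns.
    dt = Current Data Table();
    k = 4;   // 4 sigma ~ Ppk 1.33; use 3 for Ppk 1.0
    specs = New Table( "Proposed Spec Limits",
        New Column( "Column", Character, Nominal ),
        New Column( "LSL", Numeric, Continuous ),
        New Column( "USL", Numeric, Continuous )
    );
    colNames = dt << Get Column Names( "String", Continuous );
    For( i = 1, i <= N Items( colNames ), i++,
        vals = Column( dt, colNames[i] ) << Get Values;
        m = Mean( vals );
        s = Std Dev( vals );
        specs << Add Rows( 1 );
        r = N Rows( specs );
        Column( specs, "Column" )[r] = colNames[i];
        Column( specs, "LSL" )[r] = m - k * s;
        Column( specs, "USL" )[r] = m + k * s;
    );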
Next is the dim data unstacked table, where is that? Dim data unstacked table. Coming into the home stretch, here we have a bunch of dimensional data organized by part. The thing is, some of it is subgrouped and some of it is [inaudible 00:34:57] data. By using my subgrouping macro, I can select all my Ys, say what I want to check, and it will then classify each as a subgroup or as an individual. That allows me to go in and use my Control Chart Builder macro, where I can say these are individuals, these are subgroups, and I'm going to subgroup by this. Click Okay, and it takes a little bit to run. I have one here, and it will actually put all the mixed control chart types in one window, which is really nice, because then I can make a combined table of all the control limits in one table, which you can't otherwise do; you'd have to go through a lot more steps of concatenating individual tables together. So that's great. You can do the same thing with Process Screening, where I can put the individual IMR columns here and the XBar columns here, and output one table that shows, for mixed subgrouping types, IMR and XBar, their Ppk, their out-of-spec rates, and their alarm rates all in one place. So it's nice to be able to keep everything together and still have separate windows open by subgrouping type.

And finally, the gauge R&R. A gauge R&R, especially on something like a CMM, where you can have a lot of codes to do [inaudible 00:36:44] on, can be a lot of work. So I made a macro. The first thing you've got to do to make this work really well is add in specs. I have this little script I made where I can select columns, and I can append columns if I forgot one. I load from a spec table, click Okay, and it saves the limits to the column properties. I can actually use this as non-modal, so I can keep it off to the side in case I want to change something, and then I can go in and run my selected-columns gauge R&R. We're not going to go too crazy; I'll just select these guys. It says, "Hey, you're going to run a gauge R&R on these. Are you okay with that?" Click Okay, we'll say part and operator, and go. It won't take too long. And why is this nice? Because you can see that if I go to connect the means, that connects really nicely, like you'd expect. If I were to pull up a traditional gauge R&R, there are gaps, because I don't have data for every hit number; the hit numbers for different codes are different, so I'm missing data. Those rows don't apply to this particular item, and it makes the charts get all messed up. But by using my macro, I can have a local data filter for each item, and when I select that local data filter, all the things I'm not using go away. Now the charts look great. That adds a lot to how those charts look, and all the data down below is the same.
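As a rough sketch of the idea behind the subgrouping macro (not the actual script), a script might check whether a candidate subgroup column has repeated levels and launch the matching chart type; ":Lot" and "Dim 1" are assumed names.

    // Sketch: pick XBar/R vs. IMR automatically based on repeated subgroup levels.
    dt = Current Data Table();
    Summarize( dt, levels = By( :Lot ) );          // unique subgroup levels
    If( N Items( levels ) < N Rows( dt ),
        Control Chart Builder( Variables( Subgroup( :Lot ), Y( :Name( "Dim 1" ) ) ) ),
        Control Chart Builder( Variables( Y( :Name( "Dim 1" ) ) ) )
    );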
Okay, that got us through everything, so let me move on to some final thoughts. I definitely encourage you to use the toolbar; consistent layout, icon use, and naming conventions are key to your effectiveness. And get into scripting. Here are some things I suggest you focus on, and definitely use the log, now that it records your steps for you; it saves you a lot of typing. And really think beyond what JMP currently does, and see whether you can add that functionality yourself. For the developers, I'd like to see commands kept as flat as possible, to get things out of submenus. As for me, I'm working on getting better at making icons, learning how to reference and pull data from the analysis window, which is called the report layer, and always including a recall button. So there are some statistical jokes for you, some of my favorites, and that's what I've got. Thank you very much for your time, and do we have any questions?